Build a machine learning model to predict the rating of an app based on its features. - Refer to this dataset: https://www.kaggle.com/datasets/lava18/google-play-store-apps - You should code everything in Google Colab

To build a machine learning model for predicting the rating of an app based on its features using the given dataset, you can follow these steps using Google Colab:


Step 1: Set up the Environment

- Open Google Colab (colab.research.google.com).

- Click on "File" > "New Notebook" to create a new Python notebook.


Step 2: Import Libraries

- Begin by importing the necessary libraries. Add the following code to a code cell:

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

from sklearn.preprocessing import StandardScaler

Step 3: Load and Preprocess the Dataset

- Download the dataset from the provided Kaggle link and upload it to your Google Colab environment.

- Add the following code to load the dataset and preprocess it:

# Load the dataset

df = pd.read_csv("googleplaystore.csv")

# Remove rows with missing values

df.dropna(inplace=True)

# Select relevant features for the model

features = ['Category', 'Reviews', 'Size', 'Installs', 'Price']

target = 'Rating'

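df = df[features + [target]].copy()

# Convert Reviews, Size, Installs, and Price to numeric; values that cannot be parsed become NaN

df['Reviews'] = pd.to_numeric(df['Reviews'], errors='coerce')

# Size: values ending in 'M' are megabytes, values ending in 'k' are kilobytes (converted to MB)

size_num = pd.to_numeric(df['Size'].str.replace('[Mk]$', '', regex=True), errors='coerce')

df['Size'] = size_num.where(~df['Size'].str.endswith('k'), size_num / 1024)

# Installs looks like '10,000+'; Price looks like '$4.99' or '0'

df['Installs'] = pd.to_numeric(df['Installs'].str.replace('[+,]', '', regex=True), errors='coerce')

df['Price'] = pd.to_numeric(df['Price'].str.replace('$', '', regex=False), errors='coerce')

# Drop rows that could not be converted (for example, Size 'Varies with device')

df = df.dropna()

# Convert the remaining categorical feature (Category) to dummy variables

df = pd.get_dummies(df, columns=['Category'], dtype=float)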

# Split the data into training and testing sets

X = df.drop(target, axis=1)

y = df[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)

X_test_scaled = scaler.transform(X_test)
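Before moving on, it can help to check the numeric conversion on a few hand-made values. The sketch below uses made-up strings in the formats the dataset uses ('19M', '512k', 'Varies with device', '$4.99', '10,000+'). The helper names size_to_mb, price_to_float, and installs_to_number are illustrative and not part of the original code.

```python
import pandas as pd


def size_to_mb(size: pd.Series) -> pd.Series:
    """Convert strings such as '19M' or '512k' to megabytes; other text becomes NaN."""
    number = pd.to_numeric(size.str.replace('[Mk]$', '', regex=True), errors='coerce')
    is_kb = size.str.endswith('k')
    # Keep megabyte values as they are and divide kilobyte values by 1024
    return number.where(~is_kb, number / 1024)


def price_to_float(price: pd.Series) -> pd.Series:
    """Strip the leading '$' (if any) and convert to float."""
    return pd.to_numeric(price.str.replace('$', '', regex=False), errors='coerce')


def installs_to_number(installs: pd.Series) -> pd.Series:
    """Remove ',' and '+' from strings such as '10,000+' and convert to a number."""
    return pd.to_numeric(installs.str.replace('[+,]', '', regex=True), errors='coerce')


# Made-up sample values (an assumption for illustration, not rows from the dataset)
sample = pd.DataFrame({
    'Size': ['19M', '512k', 'Varies with device'],
    'Price': ['0', '$4.99', '$0.99'],
    'Installs': ['10,000+', '500+', '1,000,000+'],
})

converted = pd.DataFrame({
    'Size': size_to_mb(sample['Size']),
    'Price': price_to_float(sample['Price']),
    'Installs': installs_to_number(sample['Installs']),
})
print(converted)
```

Rows where Size is NaN (the 'Varies with device' case) can then be dropped or filled, depending on how much data you want to keep.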

Step 4: Train and Evaluate the Model

- Add the following code to train a linear regression model and evaluate its performance:

# Train the linear regression model

model = LinearRegression()

model.fit(X_train_scaled, y_train)

# Predict the ratings on the test set

y_pred = model.predict(X_test_scaled)

# Evaluate the model

mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)

print("R-squared:", r2)
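A mean squared error on its own is hard to judge, so it can be useful to compare the model against a baseline that always predicts the mean training rating. The sketch below shows the idea on synthetic data, which is an assumption made so the example runs without the CSV file. In the notebook you would pass X_train_scaled, X_test_scaled, y_train, and y_test instead.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (illustrative only, not the Play Store dataset)
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 4.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: always predict the mean of the training targets
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)


def report(name, estimator):
    """Print and return RMSE and R-squared on the test set."""
    pred = estimator.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    print(f"{name}: RMSE={rmse:.3f}, R-squared={r2:.3f}")
    return rmse, r2


baseline_rmse, baseline_r2 = report("Mean baseline", baseline)
model_rmse, model_r2 = report("Linear regression", model)
```

RMSE is in the same units as the rating, which makes it easier to read than MSE. If the linear model's RMSE is not clearly below the baseline's, the features are adding little predictive value.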

Step 5: Run the Model

- Run each code cell sequentially to load the dataset, preprocess it, train the model, and evaluate its performance.

Note: Make sure the "googleplaystore.csv" file is in the same directory as your notebook or provide the appropriate file path if it's saved in a different location.
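If the file is not where the notebook expects it, pd.read_csv raises a FileNotFoundError. A minimal sketch of one way to handle this is shown below, assuming the notebook runs in Colab, where google.colab.files.upload() opens a file picker. The helper name find_dataset is hypothetical and not part of the original code.

```python
import os


def find_dataset(filename="googleplaystore.csv"):
    """Return the path to the CSV, asking for an upload when running in Colab."""
    if os.path.exists(filename):
        return filename
    try:
        # This module is only available inside Google Colab
        from google.colab import files
    except ImportError:
        raise FileNotFoundError(
            f"{filename} not found; place it next to the notebook or pass its full path"
        ) from None
    # Opens a file picker; uploaded files are expected to land in the working directory
    files.upload()
    if not os.path.exists(filename):
        raise FileNotFoundError(f"{filename} was not among the uploaded files")
    return filename


# Usage in the notebook (left commented out so the sketch runs without the CSV):
# df = pd.read_csv(find_dataset())
```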

This code builds a linear regression model to predict the rating of an app based on its features, using the provided dataset. It preprocesses the data, converts categorical features to dummy variables, standardizes the features, and then trains the linear regression model. Finally, it evaluates the model's performance using mean squared error and R-squared metrics.
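The same steps (dummy variables, standardization, linear regression) can also be packaged into a single scikit-learn Pipeline, so the preprocessing fitted on the training data is reused automatically at prediction time. This is an alternative arrangement rather than the code used above, and the small data frame is made up for illustration. It assumes the numeric columns have already been converted from strings.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up frame with the tutorial's column names (values are illustrative)
toy = pd.DataFrame({
    'Category': ['GAME', 'TOOLS', 'GAME', 'FAMILY', 'TOOLS', 'FAMILY'],
    'Reviews': [1200, 35, 98000, 410, 15, 7600],
    'Size': [45.0, 3.2, 88.0, 21.0, 1.5, 30.0],
    'Installs': [100000, 1000, 5000000, 50000, 500, 1000000],
    'Price': [0.0, 0.0, 0.0, 2.99, 0.99, 0.0],
    'Rating': [4.3, 3.9, 4.6, 4.1, 3.5, 4.4],
})
X_toy = toy.drop(columns='Rating')
y_toy = toy['Rating']

numeric_cols = ['Reviews', 'Size', 'Installs', 'Price']

# Scale the numeric columns and one-hot encode Category in one step
preprocess = ColumnTransformer([
    ('num', StandardScaler(), numeric_cols),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['Category']),
])

pipeline = Pipeline([
    ('preprocess', preprocess),
    ('model', LinearRegression()),
])

pipeline.fit(X_toy, y_toy)
predictions = pipeline.predict(X_toy)
print(predictions.round(2))
```

With handle_unknown='ignore', a category that appears only in the test set is encoded as all zeros instead of raising an error.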

