Build a machine learning model to predict the rating of an app based on its features. - Refer to this dataset: https://www.kaggle.com/datasets/lava18/google-play-store-apps - Code everything in Google Colab
To build a machine learning model for predicting the rating of an app based on its features using the given dataset, you can follow these steps using Google Colab:
Step 1: Set up the Environment
- Open Google Colab (colab.research.google.com).
- Click on "File" > "New Notebook" to create a new Python notebook.
Step 2: Import Libraries
- Begin by importing the necessary libraries. Add the following code to a code cell:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
Step 3: Load and Preprocess the Dataset
- Download the dataset from the provided Kaggle link and upload it to your Google Colab environment.
- Add the following code to load the dataset and preprocess it:
# Load the dataset
df = pd.read_csv("googleplaystore.csv")
# Remove rows with missing values
df.dropna(inplace=True)
# Select relevant features for the model
features = ['Category', 'Reviews', 'Size', 'Installs', 'Price']
target = 'Rating'
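df = df[features + [target]].copy()
# Convert Reviews, Installs, Size, and Price columns to numeric
df['Reviews'] = pd.to_numeric(df['Reviews'], errors='coerce')
# Installs looks like "10,000+": strip the commas and the plus sign
df['Installs'] = pd.to_numeric(df['Installs'].str.replace(r'[,+]', '', regex=True), errors='coerce')
# Size is kept in MB: values ending in "M" are megabytes, values ending in "k" are kilobytes,
# and anything else (e.g. "Varies with device") becomes NaN
df['Size'] = df['Size'].apply(lambda x: float(x[:-1]) if x.endswith('M') else (float(x[:-1]) / 1024 if x.endswith('k') else float('nan')))
# Price is "0" for free apps and "$4.99"-style for paid apps
df['Price'] = pd.to_numeric(df['Price'].str.replace('$', '', regex=False), errors='coerce')
# Drop rows whose values could not be converted
df.dropna(inplace=True)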
# Convert categorical features to dummy variables
df = pd.get_dummies(df)
# Split the data into training and testing sets
X = df.drop(target, axis=1)
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
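The string-to-number conversions in Step 3 are easier to check in isolation. The sketch below uses plain Python and a few sample strings in the formats the dataset appears to use (such as "19M", "512k", "Varies with device", "10,000+", and "$4.99"). The helper names parse_size_mb, parse_installs, and parse_price are illustrative assumptions, not part of pandas or the dataset.

```python
import math

def parse_size_mb(text):
    """Return the size in megabytes, or NaN when the size is not a number."""
    text = text.strip()
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1024  # kilobytes to megabytes
    return math.nan  # e.g. "Varies with device"

def parse_installs(text):
    """Turn an install bucket such as "10,000+" into the integer 10000."""
    return int(text.replace(",", "").replace("+", ""))

def parse_price(text):
    """Turn "$4.99" into 4.99 and "0" into 0.0."""
    return float(text.replace("$", ""))

print(parse_size_mb("19M"))       # 19.0
print(parse_size_mb("512k"))      # 0.5
print(parse_installs("10,000+"))  # 10000
print(parse_price("$4.99"))       # 4.99
```

With pandas, each helper could be passed to Series.apply, for example df['Size'].apply(parse_size_mb), followed by dropping the rows where the result is NaN.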
Step 4: Train and Evaluate the Model
- Add the following code to train a linear regression model and evaluate its performance:
# Train the linear regression model
model = LinearRegression()
model.fit(X_train_scaled, y_train)
# Predict the ratings on the test set
y_pred = model.predict(X_test_scaled)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R-squared:", r2)
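To make the two metrics easier to interpret, here is a small worked example in plain Python that computes mean squared error and R-squared by hand. The ratings in y_true and the two prediction lists are invented for illustration and are not results from this dataset; the function names manual_mse and manual_r2 are likewise hypothetical.

```python
def manual_mse(y_true, y_hat):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_hat)) / len(y_true)

def manual_r2(y_true, y_hat):
    """1 minus (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [4.0, 3.0, 5.0, 4.0]      # made-up actual ratings
baseline = [4.0, 4.0, 4.0, 4.0]    # always predict the mean rating
better = [4.0, 3.5, 4.5, 4.0]      # predictions that track the actual ratings

print(manual_mse(y_true, baseline), manual_r2(y_true, baseline))  # 0.5 0.0
print(manual_mse(y_true, better), manual_r2(y_true, better))      # 0.125 0.75
```

An R-squared near 0 therefore means the model does little better than always predicting the mean rating, while a value closer to 1 means it explains more of the variation in ratings.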
Step 5: Run the Model
- Run each code cell sequentially to load the dataset, preprocess it, train the model, and evaluate its performance.
Note: Make sure the "googleplaystore.csv" file is in the same directory as your notebook or provide the appropriate file path if it's saved in a different location.
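One way to catch a wrong path early is to check for the file before calling pd.read_csv. The sketch below is plain Python; the function name find_dataset and the list of candidate folders are assumptions for illustration (/content is the usual working directory in Colab, but adjust the list to wherever you uploaded the file).

```python
from pathlib import Path

def find_dataset(filename="googleplaystore.csv", folders=(".", "/content")):
    """Return the first existing path to filename, or raise a clear error."""
    for folder in folders:
        candidate = Path(folder) / filename
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(Path(folder)) for folder in folders)
    raise FileNotFoundError(f"{filename} not found in: {searched}")
```

It could then be used as df = pd.read_csv(find_dataset()), so a missing upload produces a message listing the folders that were searched.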
This code builds a linear regression model to predict the rating of an app based on its features, using the provided dataset. It preprocesses the data, converts categorical features to dummy variables, standardizes the features, and then trains the linear regression model. Finally, it evaluates the model's performance using mean squared error and R-squared metrics.