How to Test and Use a Machine Learning Model

Many beginners train a Machine Learning model, evaluate its performance, and consider the project finished. What they often miss is how to actually test and use the model to make predictions on new data in real time. So, in this article, I’ll take you through a step-by-step guide on how to test and use a Machine Learning model to make predictions using Python.

In this guide, I’ll walk through how to test a machine learning model by making predictions in real time using the California Housing dataset from sklearn. This dataset contains information about California’s housing prices and related factors, which makes it a great choice for building a regression model.

Step 1: Load and Explore the Dataset

The first step in any machine learning project is to understand the dataset. The California Housing dataset includes eight features, such as MedInc (median income) and AveRooms (average number of rooms per household), and a target variable, MedHouseVal (median house value, expressed in units of $100,000). Here’s how to load and explore the data:

from sklearn.datasets import fetch_california_housing
import pandas as pd

# load the California Housing dataset
housing = fetch_california_housing(as_frame=True)

# convert to DataFrame for exploration
df = housing.frame

# display the first few rows of the dataset
print(df.head())

   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88   
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86   
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85   
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85   
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85   

   Longitude  MedHouseVal  
0    -122.23        4.526  
1    -122.22        3.585  
2    -122.24        3.521  
3    -122.25        3.413  
4    -122.25        3.422
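Beyond looking at the first rows, it usually helps to check the shape, missing values, and column types before modelling. The sketch below is one way to do that. To stay runnable offline it rebuilds the five rows printed above as a stand-in for housing.frame; the names df_sample and summarize are hypothetical helpers, not part of sklearn or pandas.

```python
import pandas as pd

# The five rows printed above, used as a small offline stand-in for housing.frame.
df_sample = pd.DataFrame({
    "MedInc": [8.3252, 8.3014, 7.2574, 5.6431, 3.8462],
    "HouseAge": [41.0, 21.0, 52.0, 52.0, 52.0],
    "AveRooms": [6.984127, 6.238137, 8.288136, 5.817352, 6.281853],
    "AveBedrms": [1.023810, 0.971880, 1.073446, 1.073059, 1.081081],
    "Population": [322.0, 2401.0, 496.0, 558.0, 565.0],
    "AveOccup": [2.555556, 2.109842, 2.802260, 2.547945, 2.181467],
    "Latitude": [37.88, 37.86, 37.85, 37.85, 37.85],
    "Longitude": [-122.23, -122.22, -122.24, -122.25, -122.25],
    "MedHouseVal": [4.526, 3.585, 3.521, 3.413, 3.422],
})


def summarize(frame: pd.DataFrame) -> dict:
    """Collect a few basic facts worth checking before modelling (hypothetical helper)."""
    return {
        "n_rows": frame.shape[0],
        "n_columns": frame.shape[1],
        "missing_values": int(frame.isna().sum().sum()),
        "all_numeric": all(pd.api.types.is_numeric_dtype(t) for t in frame.dtypes),
    }


summary = summarize(df_sample)
print(summary)

# Per-column ranges make unit differences between features easy to spot.
print(df_sample.describe().loc[["mean", "min", "max"]])
```

On the full dataset, the same call would be summarize(df).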

Step 2: Split the Dataset into Train and Test Sets

Next, we will divide the data into training and testing sets to ensure the model is trained on one part and evaluated on unseen data:

from sklearn.model_selection import train_test_split

# features (X) and target (y)
X = df.drop(columns=['MedHouseVal']) # drop the target column
y = df['MedHouseVal'] # target column

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set size: {X_train.shape[0]}")
print(f"Testing set size: {X_test.shape[0]}")

Training set size: 16512
Testing set size: 4128
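The random_state argument is what makes this split repeatable. As a rough illustration, the sketch below splits placeholder arrays with the same number of rows (16512 + 4128 = 20640) and checks that the same seed reproduces the same test set, while a different seed does not. The arrays X_demo and y_demo and the helper split are stand-ins invented for this example, not the housing data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 20640  # 16512 + 4128, the sizes printed above
X_demo = np.arange(n_samples * 2).reshape(n_samples, 2)  # placeholder features
y_demo = np.arange(n_samples)                            # placeholder target (row ids)


def split(seed):
    """Split the placeholder data 80/20 using the given seed (hypothetical helper)."""
    return train_test_split(X_demo, y_demo, test_size=0.2, random_state=seed)


X_tr_a, X_te_a, y_tr_a, y_te_a = split(42)
X_tr_b, X_te_b, y_tr_b, y_te_b = split(42)
X_tr_c, X_te_c, y_tr_c, y_te_c = split(0)

# Same seed: identical test rows. Different seed: a different selection.
same_seed_same_split = np.array_equal(y_te_a, y_te_b)
different_seed_same_split = np.array_equal(y_te_a, y_te_c)

# No row should appear in both the training and the testing set.
no_overlap = len(set(y_tr_a) & set(y_te_a)) == 0

print(len(y_tr_a), len(y_te_a))
print(same_seed_same_split, different_seed_same_split, no_overlap)
```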

Step 3: Train the Machine Learning Model

Next, we will use a Random Forest Regressor, a popular ensemble learning method, to build our regression model:

from sklearn.ensemble import RandomForestRegressor

# initialize the Random Forest Regressor
model = RandomForestRegressor(random_state=42)

# train the model
model.fit(X_train, y_train)
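To see what "ensemble" means here, the following sketch checks that a Random Forest Regressor's prediction is the average of the predictions of its individual trees. It uses synthetic data from make_regression and a smaller forest (n_estimators=20) so it runs quickly; those choices are assumptions for illustration, not the settings used above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data with eight features (a stand-in for the housing data).
X_syn, y_syn = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

forest = RandomForestRegressor(n_estimators=20, random_state=42)
forest.fit(X_syn, y_syn)

# Ask every fitted tree for its own prediction on the first five rows.
per_tree = np.stack([tree.predict(X_syn[:5]) for tree in forest.estimators_])

# Averaging across trees should reproduce the forest's prediction.
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X_syn[:5])

print(per_tree.shape)
print(np.allclose(manual_average, forest_prediction))
```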

Step 4: Evaluate the Model

Before making real-time predictions, it’s essential to check how well the model performs on unseen data using metrics like Mean Squared Error (MSE):

from sklearn.metrics import mean_squared_error

# make predictions on the test set
y_pred = model.predict(X_test)

# evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error on the Test Set: {mse}")

Mean Squared Error on the Test Set: 0.2553684927247781

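The MSE measures the average squared difference between actual and predicted values, so a lower MSE indicates better model performance. Because MedHouseVal is expressed in units of $100,000, the MSE is in squared units; taking its square root gives an error of about 0.505, or roughly $50,000 per prediction.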
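The MSE is easier to interpret alongside a few related metrics. The sketch below computes MSE by hand, compares it with sklearn's result, and derives RMSE, MAE, and R². The actual values are the five MedHouseVal entries shown in Step 1; the predictions in y_hat are made-up numbers used only to exercise the formulas. The last lines convert the test-set MSE reported above into dollars, assuming the target is in units of $100,000.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([4.526, 3.585, 3.521, 3.413, 3.422])  # MedHouseVal rows from Step 1
y_hat = np.array([4.30, 3.70, 3.40, 3.50, 3.30])        # hypothetical predictions

# MSE by definition: the mean of the squared errors.
mse_manual = np.mean((y_true - y_hat) ** 2)
mse_sklearn = mean_squared_error(y_true, y_hat)

rmse = np.sqrt(mse_sklearn)                # same units as the target
mae = mean_absolute_error(y_true, y_hat)   # average absolute error
r2 = r2_score(y_true, y_hat)               # share of variance explained

print(f"MSE: {mse_sklearn:.4f}  RMSE: {rmse:.4f}  MAE: {mae:.4f}  R2: {r2:.4f}")

# Convert the MSE reported on the test set above into dollars.
article_mse = 0.2553684927247781
article_rmse = np.sqrt(article_mse)
article_rmse_dollars = article_rmse * 100_000
print(f"RMSE on the test set: {article_rmse:.4f} (about ${article_rmse_dollars:,.0f})")
```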

Step 5: Make Real-Time Predictions

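This is the step that most beginners skip. To simulate real-world usage, we will pass a new data point to the trained model for prediction. The input must have the same structure as the training features: the same eight columns, in the same order. The example values below are the first row of the dataset, rounded to four decimal places, so the prediction can be compared with that row's actual MedHouseVal of 4.526. Because this row may have been part of the training set, treat it as a check of the input format rather than a true test of accuracy: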

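import pandas as pd

# define a new data point (real-time input) with the same columns,
# in the same order, as the training features; this avoids sklearn's
# warning about missing feature names
new_data = pd.DataFrame(
    [[8.3252, 41.0, 6.9841, 1.0238, 322.0, 2.5556, 37.88, -122.23]],  # example values
    columns=X_train.columns,
)

# make predictions for the new data point
new_prediction = model.predict(new_data)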

print(f"Predicted Median House Value: {new_prediction[0]}")

Predicted Median House Value: 4.265793

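Always make sure your input data matches the model’s training features: the same columns, in the same order, and in the same units. A mismatch may not raise an error, but it can produce meaningless predictions.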
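One way to enforce this alignment is to validate each incoming record before it reaches the model. The sketch below is a minimal version, assuming inputs arrive as dictionaries; prepare_input and FEATURE_NAMES are hypothetical names introduced for this example. With a model fitted on a DataFrame, the expected column list could likely be read from model.feature_names_in_ instead of being typed by hand.

```python
import pandas as pd

# Feature columns in training order, as shown in the dataset preview above.
FEATURE_NAMES = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]


def prepare_input(record: dict, feature_names=FEATURE_NAMES) -> pd.DataFrame:
    """Turn one record into a single-row DataFrame in training column order."""
    missing = [name for name in feature_names if name not in record]
    extra = [name for name in record if name not in feature_names]
    if missing or extra:
        raise ValueError(f"missing features: {missing}; unexpected features: {extra}")
    # Passing columns= reorders the keys to match the training layout.
    return pd.DataFrame([record], columns=list(feature_names)).astype(float)


# Keys arrive in an arbitrary order; the helper restores the training order.
incoming = {
    "Longitude": -122.23, "Latitude": 37.88, "MedInc": 8.3252, "HouseAge": 41.0,
    "AveOccup": 2.5556, "Population": 322.0, "AveBedrms": 1.0238, "AveRooms": 6.9841,
}
row = prepare_input(incoming)
print(row)

# A record with a missing feature is rejected instead of silently mispredicted.
try:
    prepare_input({"MedInc": 8.3252})
    rejected_incomplete = False
except ValueError:
    rejected_incomplete = True
```

The resulting row can then be passed to model.predict(row).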

By following these steps, you can test a Machine Learning model and use it to make predictions on new data. The natural next step is packaging the model for deployment.
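A common first step toward reusing a model outside the training script is saving it to disk and loading it back. The sketch below does this with joblib on a small model trained on synthetic data; the file name housing_model.joblib and the temporary directory are assumptions made for the example. Since joblib files are pickle-based, only load files from sources you trust.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic stand-in for the housing model trained above.
X_syn, y_syn = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)
trained = RandomForestRegressor(n_estimators=20, random_state=42).fit(X_syn, y_syn)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "housing_model.joblib")  # hypothetical file name
    joblib.dump(trained, model_path)    # save the fitted model
    restored = joblib.load(model_path)  # load it back, as a serving script would

# The restored model should give exactly the same predictions.
predictions_match = np.array_equal(
    trained.predict(X_syn[:5]), restored.predict(X_syn[:5])
)
print(predictions_match)
```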

Summary

To test and use a machine learning model, start by loading and exploring the dataset so you understand its features and target. Split the data into training and testing sets so the model is evaluated on data it has not seen. Train the model using an algorithm suited to the problem, then evaluate it with appropriate metrics such as MSE. Finally, pass new data points in the same format as the training features to make real-time predictions.

I hope you liked this article on how to test and use a Machine Learning model.