Hybrid machine learning models combine different types of algorithms to leverage their unique strengths, which results in improved predictive performance and robustness. If you want to learn about building a hybrid machine learning model, this article is for you. In this article, I’ll take you through a step-by-step guide on creating a hybrid machine learning model to use the predictive power of multiple algorithms together.
When to Build a Hybrid Machine Learning Model?
Build a hybrid machine learning model when a single algorithm cannot capture the complexity of your data, for example, when the data contains both sequential patterns and broader long-term trends. In that case, you can combine a model like LSTM for sequence learning with Linear Regression for trend analysis to improve overall performance.
A practical signal that you need a hybrid model is when every single model you try performs poorly on your evaluation metrics. Combining models lets each one contribute its unique strengths to the final prediction.
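The core idea can be sketched in a few lines: train two different models on the same target and blend their predictions with a weighted average. The data, the decision tree (standing in for the LSTM to keep the sketch dependency-free), and the 0.7/0.3 weights below are illustrative placeholders, not from this article's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic series: a linear trend plus a repeating pattern
rng = np.random.default_rng(42)
t = np.arange(200).reshape(-1, 1)
y = 0.5 * t.ravel() + 10 * np.sin(t.ravel() / 5) + rng.normal(0, 1, 200)

t_train, t_test = t[:160], t[160:]
y_train, y_test = y[:160], y[160:]

# Model 1: captures the broad linear trend
trend_model = LinearRegression().fit(t_train, y_train)

# Model 2: a nonlinear learner standing in for the LSTM in this sketch
pattern_model = DecisionTreeRegressor(max_depth=5).fit(t_train, y_train)

# Hybrid: weighted average of both predictions (the weights are a design choice)
hybrid_pred = 0.7 * pattern_model.predict(t_test) + 0.3 * trend_model.predict(t_test)
print(hybrid_pred.shape)  # one blended prediction per test point
```

The rest of this article follows exactly this pattern, with an LSTM in place of the decision tree.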
Building a Hybrid Machine Learning Model with Python
Now, I’ll take you through a step-by-step guide on building a hybrid machine learning model where we will be combining the predictive power of two different models to create a hybrid model. The dataset I will be using for this task can be downloaded from here.
Now, let’s get started with the task of building a hybrid machine learning model by importing the necessary Python libraries and the dataset:
import pandas as pd
data = pd.read_csv('/content/apple_stock_data.csv')
print(data.head())

                        Date   Adj Close       Close        High         Low  \
0 2023-11-02 00:00:00+00:00 176.665985 177.570007 177.779999 175.460007
1 2023-11-03 00:00:00+00:00 175.750671 176.649994 176.820007 173.350006
2 2023-11-06 00:00:00+00:00 178.317520 179.229996 179.429993 176.210007
3 2023-11-07 00:00:00+00:00 180.894333 181.820007 182.440002 178.970001
4 2023-11-08 00:00:00+00:00 181.958893 182.889999 183.449997 181.589996
Open Volume
0 175.520004 77334800
1 174.240005 79763700
2 176.380005 63841300
3 179.179993 70530000
4 182.350006 49340300
As the dataset is based on stock market data, I’ll convert the date column to a datetime type, set it as the index, and focus on the Close price:
data['Date'] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace=True)
data = data[['Close']]

Choosing the Hybrid Models
We will be using LSTM (Long Short-Term Memory) and Linear Regression models for this task. I chose LSTM because it effectively captures sequential dependencies and patterns in time-series data, which makes it suitable for modelling stock price movements influenced by historical trends.
Linear Regression, on the other hand, is a straightforward model that captures simple linear relationships and long-term trends in data. By combining these two models into a hybrid approach, we leverage the LSTM’s ability to model complex time-dependent patterns alongside the Linear Regression’s ability to identify and follow broader trends. This combination aims to create a more balanced and accurate prediction system.
So, let’s scale the Close price data between 0 and 1 using MinMaxScaler to ensure compatibility with the LSTM model:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
data['Close'] = scaler.fit_transform(data[['Close']])
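One caveat worth noting: fitting the scaler on the full series lets information from the test period leak into training. A leakage-free variant fits the scaler on the training portion only. The array below is an illustrative stand-in for the Close column, and the 80% split fraction mirrors the split used later in this article:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative series standing in for the Close column
prices = np.linspace(150, 250, 100).reshape(-1, 1)

split = int(len(prices) * 0.8)
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(prices[:split])          # fit on the training portion only
scaled = scaler.transform(prices)   # then transform the whole series

# Training values stay within [0, 1]; later test values may exceed 1,
# which is expected and avoids peeking at the test period
print(scaled[:split].max())
```

For a tutorial this simplification is common, but in production you would fit all preprocessing on training data only.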
Now, let’s prepare the data for LSTM by creating sequences of a defined length (e.g., 60 days) to predict the next day’s price:
import numpy as np
def create_sequences(data, seq_length=60):
X, y = [], []
for i in range(len(data) - seq_length):
X.append(data[i:i+seq_length])
y.append(data[i+seq_length])
return np.array(X), np.array(y)
seq_length = 60
X, y = create_sequences(data['Close'].values, seq_length)

Now, we will split the sequences into training and test sets (e.g., 80% training, 20% testing):
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
Now, we will build a sequential LSTM model with layers to capture the temporal dependencies in the data:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# LSTM layers expect 3D input: (samples, timesteps, features)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))

lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dense(1))
Now, we will compile the model with the Adam optimizer and mean squared error loss, then fit it to the training data:
lstm_model.compile(optimizer='adam', loss='mean_squared_error')
lstm_model.fit(X_train, y_train, epochs=20, batch_size=32)
Epoch 1/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 6s 64ms/step - loss: 0.2519
Epoch 2/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 59ms/step - loss: 0.0425
Epoch 3/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 59ms/step - loss: 0.0396
Epoch 4/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step - loss: 0.0167
Epoch 5/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 52ms/step - loss: 0.0199
Epoch 6/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - loss: 0.0152
Epoch 7/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - loss: 0.0127
Epoch 8/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - loss: 0.0130
Epoch 9/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 60ms/step - loss: 0.0104
Epoch 10/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 81ms/step - loss: 0.0107
Epoch 11/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 64ms/step - loss: 0.0091
Epoch 12/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 65ms/step - loss: 0.0094
Epoch 13/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 62ms/step - loss: 0.0098
Epoch 14/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 62ms/step - loss: 0.0087
Epoch 15/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 62ms/step - loss: 0.0081
Epoch 16/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step - loss: 0.0081
Epoch 17/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 64ms/step - loss: 0.0093
Epoch 18/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 61ms/step - loss: 0.0078
Epoch 19/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step - loss: 0.0078
Epoch 20/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 66ms/step - loss: 0.0073
Now, let’s train the second model. I’ll start by generating lagged features for Linear Regression (e.g., using the past 3 days as predictors):
data['Lag_1'] = data['Close'].shift(1)
data['Lag_2'] = data['Close'].shift(2)
data['Lag_3'] = data['Close'].shift(3)
data = data.dropna()
Now, we will split the data accordingly for training and testing:
X_lin = data[['Lag_1', 'Lag_2', 'Lag_3']]
y_lin = data['Close']

# Align the Linear Regression test set with the LSTM test set:
# dropping NaN lags and building 60-step sequences leave different
# numbers of rows, so we take the last len(X_test) rows for testing
X_train_lin, X_test_lin = X_lin[:-len(X_test)], X_lin[-len(X_test):]
y_train_lin, y_test_lin = y_lin[:-len(y_test)], y_lin[-len(y_test):]
Now, let’s train the linear regression model:
from sklearn.linear_model import LinearRegression

lin_model = LinearRegression()
lin_model.fit(X_train_lin, y_train_lin)
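To build intuition for what the lag-based regressor learns, here is a small self-contained sketch on a synthetic autoregressive series (not the article's dataset): when the data truly follows a one-lag process, the fitted Lag_1 coefficient recovers the underlying autoregressive parameter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative AR(1) series: y_t = 0.8 * y_{t-1} + noise
rng = np.random.default_rng(0)
y = np.zeros(500)
for i in range(1, 500):
    y[i] = 0.8 * y[i - 1] + rng.normal(0, 0.1)

# Build the same kind of lagged design matrix as in the article (3 lags)
X = np.column_stack([y[2:-1], y[1:-2], y[0:-3]])  # Lag_1, Lag_2, Lag_3
target = y[3:]

model = LinearRegression().fit(X, target)
print(model.coef_)  # the Lag_1 coefficient comes out close to 0.8
```

This is why the lag features work well for slow-moving trends: the regression essentially learns how strongly today's value depends on the last few days.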
Now, here’s how to make predictions using LSTM on the test set and inverse transform the scaled predictions:
X_test_lstm = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
lstm_predictions = lstm_model.predict(X_test_lstm)
lstm_predictions = scaler.inverse_transform(lstm_predictions)
Here’s how to generate predictions using Linear Regression and inverse-transform them:
lin_predictions = lin_model.predict(X_test_lin)
lin_predictions = scaler.inverse_transform(lin_predictions.reshape(-1, 1))
And, here’s how to use a weighted average to create hybrid predictions:
hybrid_predictions = (0.7 * lstm_predictions) + (0.3 * lin_predictions)
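The 0.7/0.3 weights here are a design choice rather than a derived optimum. A common refinement is to search the blend weight on held-out data and keep the one with the lowest error. The sketch below uses placeholder prediction arrays, since it illustrates the procedure rather than reproducing this article's results:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Placeholder ground truth and two imperfect prediction arrays
rng = np.random.default_rng(1)
y_true = rng.normal(100, 5, 50)
pred_a = y_true + rng.normal(0, 2, 50)  # stand-in for LSTM predictions
pred_b = y_true + rng.normal(0, 3, 50)  # stand-in for Linear Regression predictions

# Try blend weights from 0 to 1 and keep the one with the lowest RMSE
best_w, best_rmse = None, np.inf
for w in np.linspace(0, 1, 21):
    blended = w * pred_a + (1 - w) * pred_b
    rmse = np.sqrt(mean_squared_error(y_true, blended))
    if rmse < best_rmse:
        best_w, best_rmse = w, rmse

print(best_w, best_rmse)
```

With this article's variables, you would replace `pred_a` and `pred_b` with the inverse-transformed `lstm_predictions` and `lin_predictions`, and `y_true` with the inverse-transformed `y_test`, ideally searching on a validation slice rather than the final test set.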
Predicting using the Hybrid Model
Let’s see how to make predictions for the next 10 days using our hybrid model. The LSTM forecasts recursively: each predicted value is appended to the end of the input window to predict the following day. Here’s how to predict the next 10 days using LSTM:
lstm_future_predictions = []
last_sequence = X[-1].reshape(1, seq_length, 1)
for _ in range(10):
lstm_pred = lstm_model.predict(last_sequence)[0, 0]
lstm_future_predictions.append(lstm_pred)
lstm_pred_reshaped = np.array([[lstm_pred]]).reshape(1, 1, 1)
last_sequence = np.append(last_sequence[:, 1:, :], lstm_pred_reshaped, axis=1)
lstm_future_predictions = scaler.inverse_transform(np.array(lstm_future_predictions).reshape(-1, 1))

Here’s how to predict the next 10 days using Linear Regression:
recent_data = data['Close'].values[-3:]
lin_future_predictions = []
for _ in range(10):
lin_pred = lin_model.predict(recent_data.reshape(1, -1))[0]
lin_future_predictions.append(lin_pred)
recent_data = np.append(recent_data[1:], lin_pred)
lin_future_predictions = scaler.inverse_transform(np.array(lin_future_predictions).reshape(-1, 1))

And, here’s how to combine the predictive power of both models to make predictions for the next 10 days:
hybrid_future_predictions = (0.7 * lstm_future_predictions) + (0.3 * lin_future_predictions)
Here’s how to create the final DataFrame to look at the predictions:
future_dates = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=10)
predictions_df = pd.DataFrame({
'Date': future_dates,
'LSTM Predictions': lstm_future_predictions.flatten(),
'Linear Regression Predictions': lin_future_predictions.flatten(),
'Hybrid Model Predictions': hybrid_future_predictions.flatten()
})
print(predictions_df)

                        Date  LSTM Predictions  Linear Regression Predictions  \
0 2024-11-02 00:00:00+00:00 232.222122 230.355192
1 2024-11-03 00:00:00+00:00 231.944504 225.707291
2 2024-11-04 00:00:00+00:00 231.732971 222.703426
3 2024-11-05 00:00:00+00:00 231.574249 230.631535
4 2024-11-06 00:00:00+00:00 231.456375 225.486380
5 2024-11-07 00:00:00+00:00 231.368454 222.494588
6 2024-11-08 00:00:00+00:00 231.301605 230.930195
7 2024-11-09 00:00:00+00:00 231.249008 225.245599
8 2024-11-10 00:00:00+00:00 231.205811 222.284007
9 2024-11-11 00:00:00+00:00 231.168671 231.252375
Hybrid Model Predictions
0 231.662038
1 230.073332
2 229.024102
3 231.291435
4 229.665369
5 228.706293
6 231.190176
7 229.447978
8 228.529273
9 231.193782
Summary
So, this is how to build a hybrid machine learning model using Python. Build one when a single algorithm cannot capture the complexity of the data, or when different types of data or patterns are present. I hope you liked this article on building a hybrid machine learning model with Python. Feel free to ask your questions in the comments section below. You can follow me on Instagram for many more resources.