Hybrid machine learning models combine different types of algorithms to leverage their unique strengths, which results in improved predictive performance and robustness. If you want to learn about building a hybrid machine learning model, this article is for you. In this article, I’ll take you through a step-by-step guide on creating a hybrid machine learning model to use the predictive power of multiple algorithms together.
When to Build a Hybrid Machine Learning Model?
Build a hybrid machine learning model when a single algorithm cannot capture the complexity of your data, for example, when the data contains both sequential patterns and broader long-term trends. In that case, you can combine a model like LSTM for sequence learning with Linear Regression for trend analysis to improve overall performance.
A practical signal that you need a hybrid model is when every single model you try performs poorly on your evaluation metrics. Combining models lets each one contribute its unique strengths to the final prediction.
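The core idea can be sketched in a few lines: train two different models on the same target and blend their predictions with a weighted average. The data, the decision tree (standing in for the LSTM to keep the sketch dependency-free), and the 0.7/0.3 weights below are illustrative placeholders, not from this article's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic series: a linear trend plus a repeating pattern
rng = np.random.default_rng(42)
t = np.arange(200).reshape(-1, 1)
y = 0.5 * t.ravel() + 10 * np.sin(t.ravel() / 5) + rng.normal(0, 1, 200)

t_train, t_test = t[:160], t[160:]
y_train, y_test = y[:160], y[160:]

# Model 1: captures the broad linear trend
trend_model = LinearRegression().fit(t_train, y_train)

# Model 2: a nonlinear learner standing in for the LSTM in this sketch
pattern_model = DecisionTreeRegressor(max_depth=5).fit(t_train, y_train)

# Hybrid: weighted average of both predictions (the weights are a design choice)
hybrid_pred = 0.7 * pattern_model.predict(t_test) + 0.3 * trend_model.predict(t_test)
print(hybrid_pred.shape)  # one blended prediction per test point
```

The rest of this article follows exactly this pattern, with an LSTM in place of the decision tree.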
Building a Hybrid Machine Learning Model with Python
Now, I’ll take you through a step-by-step guide on building a hybrid machine learning model where we will be combining the predictive power of two different models to create a hybrid model. The dataset I will be using for this task can be downloaded from here.
Now, let’s get started with the task of building a hybrid machine learning model by importing the necessary Python libraries and the dataset:
import pandas as pd
data = pd.read_csv('/content/apple_stock_data.csv')
print(data.head())

                        Date   Adj Close       Close        High         Low  \
0 2023-11-02 00:00:00+00:00 176.665985 177.570007 177.779999 175.460007
1 2023-11-03 00:00:00+00:00 175.750671 176.649994 176.820007 173.350006
2 2023-11-06 00:00:00+00:00 178.317520 179.229996 179.429993 176.210007
3 2023-11-07 00:00:00+00:00 180.894333 181.820007 182.440002 178.970001
4 2023-11-08 00:00:00+00:00 181.958893 182.889999 183.449997 181.589996
Open Volume
0 175.520004 77334800
1 174.240005 79763700
2 176.380005 63841300
3 179.179993 70530000
4 182.350006 49340300
As the dataset is based on stock market data, I’ll convert the date column to a datetime type, set it as the index, and focus on the Close price:
data['Date'] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace=True)
data = data[['Close']]

Choosing the Hybrid Models
We will be using LSTM (Long Short-Term Memory) and Linear Regression models for this task. I chose LSTM because it effectively captures sequential dependencies and patterns in time-series data, which makes it suitable for modelling stock price movements influenced by historical trends.
Linear Regression, on the other hand, is a straightforward model that captures simple linear relationships and long-term trends in data. By combining these two models into a hybrid approach, we leverage the LSTM’s ability to model complex time-dependent patterns alongside the Linear Regression’s ability to identify and follow broader trends. This combination aims to create a more balanced and accurate prediction system.
So, let’s scale the Close price data between 0 and 1 using MinMaxScaler to ensure compatibility with the LSTM model:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
data['Close'] = scaler.fit_transform(data[['Close']])
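One caveat worth noting: fitting the scaler on the full series lets information from the test period leak into training. A leakage-free variant fits the scaler on the training portion only. The array below is an illustrative stand-in for the Close column, and the 80% split fraction mirrors the split used later in this article:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative series standing in for the Close column
prices = np.linspace(150, 250, 100).reshape(-1, 1)

split = int(len(prices) * 0.8)
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(prices[:split])          # fit on the training portion only
scaled = scaler.transform(prices)   # then transform the whole series

# Training values stay within [0, 1]; later test values may exceed 1,
# which is expected and avoids peeking at the test period
print(scaled[:split].max())
```

For a tutorial this simplification is common, but in production you would fit all preprocessing on training data only.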
Now, let’s prepare the data for LSTM by creating sequences of a defined length (e.g., 60 days) to predict the next day’s price:
import numpy as np
def create_sequences(data, seq_length=60):
X, y = [], []
for i in range(len(data) - seq_length):
X.append(data[i:i+seq_length])
y.append(data[i+seq_length])
return np.array(X), np.array(y)
seq_length = 60
X, y = create_sequences(data['Close'].values, seq_length)

Now, we will split the sequences into training and test sets (e.g., 80% training, 20% testing):
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
Now, we will build a sequential LSTM model with layers to capture the temporal dependencies in the data:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# LSTM layers expect 3D input: (samples, timesteps, features)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))

lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dense(1))
Now, we will compile the model with the Adam optimizer and mean squared error loss, then fit it to the training data:
lstm_model.compile(optimizer='adam', loss='mean_squared_error')
lstm_model.fit(X_train, y_train, epochs=20, batch_size=32)
Epoch 1/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 6s 64ms/step - loss: 0.2519
Epoch 2/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 59ms/step - loss: 0.0425
Epoch 3/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 59ms/step - loss: 0.0396
Epoch 4/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step - loss: 0.0167
Epoch 5/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 52ms/step - loss: 0.0199
Epoch 6/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - loss: 0.0152
Epoch 7/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - loss: 0.0127
Epoch 8/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - loss: 0.0130
Epoch 9/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 60ms/step - loss: 0.0104
Epoch 10/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 81ms/step - loss: 0.0107
Epoch 11/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 64ms/step - loss: 0.0091
Epoch 12/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 65ms/step - loss: 0.0094
Epoch 13/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 62ms/step - loss: 0.0098
Epoch 14/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 62ms/step - loss: 0.0087
Epoch 15/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 62ms/step - loss: 0.0081
Epoch 16/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step - loss: 0.0081
Epoch 17/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 64ms/step - loss: 0.0093
Epoch 18/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 61ms/step - loss: 0.0078
Epoch 19/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step - loss: 0.0078
Epoch 20/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 66ms/step - loss: 0.0073
Now, let’s train the second model. I’ll start by generating lagged features for Linear Regression (e.g., using the past 3 days as predictors):
data['Lag_1'] = data['Close'].shift(1)
data['Lag_2'] = data['Close'].shift(2)
data['Lag_3'] = data['Close'].shift(3)
data = data.dropna()
Now, we will split the data accordingly for training and testing:
X_lin = data[['Lag_1', 'Lag_2', 'Lag_3']]
y_lin = data['Close']

# Align the Linear Regression test set with the LSTM test set:
# dropping NaN lags and building 60-step sequences leave different
# numbers of rows, so we take the last len(X_test) rows for testing
X_train_lin, X_test_lin = X_lin[:-len(X_test)], X_lin[-len(X_test):]
y_train_lin, y_test_lin = y_lin[:-len(y_test)], y_lin[-len(y_test):]
Now, let’s train the linear regression model:
from sklearn.linear_model import LinearRegression

lin_model = LinearRegression()
lin_model.fit(X_train_lin, y_train_lin)
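To build intuition for what the lag-based regressor learns, here is a small self-contained sketch on a synthetic autoregressive series (not the article's dataset): when the data truly follows a one-lag process, the fitted Lag_1 coefficient recovers the underlying autoregressive parameter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative AR(1) series: y_t = 0.8 * y_{t-1} + noise
rng = np.random.default_rng(0)
y = np.zeros(500)
for i in range(1, 500):
    y[i] = 0.8 * y[i - 1] + rng.normal(0, 0.1)

# Build the same kind of lagged design matrix as in the article (3 lags)
X = np.column_stack([y[2:-1], y[1:-2], y[0:-3]])  # Lag_1, Lag_2, Lag_3
target = y[3:]

model = LinearRegression().fit(X, target)
print(model.coef_)  # the Lag_1 coefficient comes out close to 0.8
```

This is why the lag features work well for slow-moving trends: the regression essentially learns how strongly today's value depends on the last few days.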
Now, here’s how to make predictions using LSTM on the test set and inverse transform the scaled predictions:
X_test_lstm = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
lstm_predictions = lstm_model.predict(X_test_lstm)
lstm_predictions = scaler.inverse_transform(lstm_predictions)
Here’s how to generate predictions using Linear Regression and inverse-transform them:
lin_predictions = lin_model.predict(X_test_lin)
lin_predictions = scaler.inverse_transform(lin_predictions.reshape(-1, 1))
And, here’s how to use a weighted average to create hybrid predictions:
hybrid_predictions = (0.7 * lstm_predictions) + (0.3 * lin_predictions)
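The 0.7/0.3 weights here are a design choice rather than a derived optimum. A common refinement is to search the blend weight on held-out data and keep the one with the lowest error. The sketch below uses placeholder prediction arrays, since it illustrates the procedure rather than reproducing this article's results:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Placeholder ground truth and two imperfect prediction arrays
rng = np.random.default_rng(1)
y_true = rng.normal(100, 5, 50)
pred_a = y_true + rng.normal(0, 2, 50)  # stand-in for LSTM predictions
pred_b = y_true + rng.normal(0, 3, 50)  # stand-in for Linear Regression predictions

# Try blend weights from 0 to 1 and keep the one with the lowest RMSE
best_w, best_rmse = None, np.inf
for w in np.linspace(0, 1, 21):
    blended = w * pred_a + (1 - w) * pred_b
    rmse = np.sqrt(mean_squared_error(y_true, blended))
    if rmse < best_rmse:
        best_w, best_rmse = w, rmse

print(best_w, best_rmse)
```

With this article's variables, you would replace `pred_a` and `pred_b` with the inverse-transformed `lstm_predictions` and `lin_predictions`, and `y_true` with the inverse-transformed `y_test`, ideally searching on a validation slice rather than the final test set.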
Predicting using the Hybrid Model
Let’s see how to make predictions for the next 10 days using our hybrid model. The LSTM forecasts recursively: each predicted value is appended to the end of the input window to predict the following day. Here’s how to predict the next 10 days using LSTM:
lstm_future_predictions = []
last_sequence = X[-1].reshape(1, seq_length, 1)
for _ in range(10):
lstm_pred = lstm_model.predict(last_sequence)[0, 0]
lstm_future_predictions.append(lstm_pred)
lstm_pred_reshaped = np.array([[lstm_pred]]).reshape(1, 1, 1)
last_sequence = np.append(last_sequence[:, 1:, :], lstm_pred_reshaped, axis=1)
lstm_future_predictions = scaler.inverse_transform(np.array(lstm_future_predictions).reshape(-1, 1))

Here’s how to predict the next 10 days using Linear Regression:
recent_data = data['Close'].values[-3:]
lin_future_predictions = []
for _ in range(10):
lin_pred = lin_model.predict(recent_data.reshape(1, -1))[0]
lin_future_predictions.append(lin_pred)
recent_data = np.append(recent_data[1:], lin_pred)
lin_future_predictions = scaler.inverse_transform(np.array(lin_future_predictions).reshape(-1, 1))

And, here’s how to combine the predictive power of both models to make predictions for the next 10 days:
hybrid_future_predictions = (0.7 * lstm_future_predictions) + (0.3 * lin_future_predictions)
Here’s how to create the final DataFrame to look at the predictions:
future_dates = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=10)
predictions_df = pd.DataFrame({
'Date': future_dates,
'LSTM Predictions': lstm_future_predictions.flatten(),
'Linear Regression Predictions': lin_future_predictions.flatten(),
'Hybrid Model Predictions': hybrid_future_predictions.flatten()
})
print(predictions_df)

                        Date  LSTM Predictions  Linear Regression Predictions  \
0 2024-11-02 00:00:00+00:00 232.222122 230.355192
1 2024-11-03 00:00:00+00:00 231.944504 225.707291
2 2024-11-04 00:00:00+00:00 231.732971 222.703426
3 2024-11-05 00:00:00+00:00 231.574249 230.631535
4 2024-11-06 00:00:00+00:00 231.456375 225.486380
5 2024-11-07 00:00:00+00:00 231.368454 222.494588
6 2024-11-08 00:00:00+00:00 231.301605 230.930195
7 2024-11-09 00:00:00+00:00 231.249008 225.245599
8 2024-11-10 00:00:00+00:00 231.205811 222.284007
9 2024-11-11 00:00:00+00:00 231.168671 231.252375
Hybrid Model Predictions
0 231.662038
1 230.073332
2 229.024102
3 231.291435
4 229.665369
5 228.706293
6 231.190176
7 229.447978
8 228.529273
9 231.193782
Summary
So, this is how to build a hybrid machine learning model using Python. Build one when a single algorithm cannot capture the complexity of the data, or when different types of data or patterns are present. I hope you liked this article on building a hybrid machine learning model with Python. Feel free to ask your questions in the comments section below. You can follow me on Instagram for many more resources.