Time Series Forecasting means analyzing and modeling time-series data to predict future values. Some of the applications of Time Series Forecasting are weather forecasting, sales forecasting, business forecasting, and stock price forecasting. The ARIMA model is a popular statistical technique used for Time Series Forecasting. If you want to learn Time Series Forecasting with ARIMA, this article is for you. In this article, I will take you through the task of Time Series Forecasting with ARIMA using the Python programming language.
What is ARIMA?
ARIMA stands for Autoregressive Integrated Moving Average. It is an algorithm used for forecasting Time Series Data. ARIMA models have three parameters like ARIMA(p, d, q). Here p, d, and q are defined as:
- p is the number of lagged observations of the series (label column) used to predict the current value. It captures the autoregressive part of ARIMA.
- d represents the number of times the data needs to be differenced to produce a stationary signal. If the data is already stationary, the value of d should be 0; if it has a trend, differencing once (d = 1) is often enough. d captures the integrated part of ARIMA.
- q is the number of lagged forecast errors used to predict the current value. It captures the moving average part of ARIMA.
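To make the three parameters concrete, here is a minimal sketch in plain Python. The helper names (difference, arma_one_step) and the coefficient values are hypothetical and chosen only for illustration; a real model estimates the coefficients from data.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the "integrated" part)."""
    values = list(series)
    for _ in range(d):
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


def arma_one_step(history, errors, const, ar_coefs, ma_coefs):
    """One-step-ahead prediction on an already-differenced series.

    history: past values, most recent last (p = len(ar_coefs) are used)
    errors:  past forecast errors, most recent last (q = len(ma_coefs) are used)
    """
    ar_part = sum(phi * history[-i] for i, phi in enumerate(ar_coefs, start=1))
    ma_part = sum(theta * errors[-j] for j, theta in enumerate(ma_coefs, start=1))
    return const + ar_part + ma_part


trend = [1, 4, 9, 16, 25]          # a series with a curved trend
once = difference(trend, d=1)      # still trending
twice = difference(trend, d=2)     # constant, so the trend is gone

# Hypothetical coefficients for p = 2 and q = 1 (illustration only).
prediction = arma_one_step(
    history=[3.0, 5.0], errors=[0.5],
    const=1.0, ar_coefs=[0.5, 0.25], ma_coefs=[0.4],
)
print(once, twice, prediction)
```

Here p is simply the number of autoregressive coefficients, q is the number of moving average coefficients, and d is the number of differencing passes applied before those terms are used.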
I hope you have now understood the ARIMA model. In the section below, I will take you through the task of Time Series Forecasting of stock prices with ARIMA using the Python programming language.
Time Series Forecasting with ARIMA
Now let’s start with the task of Time Series Forecasting with ARIMA. I will first collect Google stock price data using the Yahoo Finance API. If you have never used Yahoo Finance API, you can learn more about it here.
Now here’s how to collect Google’s stock price data:
import pandas as pd
import yfinance as yf
import datetime
from datetime import date, timedelta
today = date.today()
d1 = today.strftime("%Y-%m-%d")
end_date = d1
d2 = date.today() - timedelta(days=365)
d2 = d2.strftime("%Y-%m-%d")
start_date = d2
data = yf.download('GOOG',
                   start=start_date,
                   end=end_date,
                   progress=False)
data["Date"] = data.index
data = data[["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"]]
data.reset_index(drop=True, inplace=True)
print(data.tail())
Date Open High Low Close \
247 2022-06-13 2148.919922 2184.370117 2131.760986 2137.530029
248 2022-06-14 2137.800049 2169.149902 2127.040039 2143.879883
249 2022-06-15 2177.989990 2241.260010 2162.375000 2207.810059
250 2022-06-16 2162.989990 2185.810059 2115.850098 2132.719971
251 2022-06-17 2130.699951 2184.989990 2112.571045 2157.310059
Adj Close Volume
247 2137.530029 1837800
248 2143.879883 1274000
249 2207.810059 1659600
250 2132.719971 1765700
251 2157.310059 2163500
We only need the date and close prices columns for the rest of the task, so let’s select both the columns and move further:
data = data[["Date", "Close"]]
print(data.head())
Date Close
0 2021-06-21 2529.100098
1 2021-06-22 2539.989990
2 2021-06-23 2529.229980
3 2021-06-24 2545.639893
4 2021-06-25 2539.899902
Now let’s visualize the close prices of Google before moving forward:
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
plt.figure(figsize=(15, 10))
plt.plot(data["Date"], data["Close"])
Using ARIMA for Time Series Forecasting
Before using the ARIMA model, we have to figure out whether our data is stationary and whether it has a seasonal component. The plot of the closing stock prices above shows that our dataset is not stationary. To check this more carefully, we can use the seasonal decomposition method, which splits the time series data into trend, seasonal, and residual components:
from statsmodels.tsa.seasonal import seasonal_decompose
# `period` replaces the deprecated `freq` argument (statsmodels 0.11 and later)
result = seasonal_decompose(data["Close"],
                            model='multiplicative', period=30)
fig = plt.figure()
fig = result.plot()
fig.set_size_inches(15, 10)
So our data is not stationary; it is seasonal. We need to use the Seasonal ARIMA (SARIMA) model for Time Series Forecasting on this data. But before using the SARIMA model, we will use the ARIMA model, so that you learn how to use both.
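To use ARIMA or SARIMA, we need to find the p, d, and q values. In this article, we will estimate p from the autocorrelation plot of the Close column and q from the partial autocorrelation plot. Note that the more common convention is the reverse (the partial autocorrelation plot suggests p and the autocorrelation plot suggests q), so treat the values found here as starting points. The value of d is the number of times the series must be differenced to become stationary: 0 if the data is already stationary, and typically 1 if it is not. As our data is not stationary, we will use 1 as the d value.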
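To see what an autocorrelation plot actually shows, here is a minimal sketch that computes the sample autocorrelation by hand. The function names are hypothetical, and the ±1.96/sqrt(n) band is the usual approximate 95% bound for white noise, which I am assuming corresponds to the inner boundary lines in the plot.

```python
import math


def sample_autocorrelation(values, lag):
    """Correlation between the series and itself shifted by `lag` steps."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return numer / denom


def confidence_bound(n, z=1.96):
    """Approximate 95% bound; lags beyond it are treated as significant."""
    return z / math.sqrt(n)


series = [1.0, -1.0] * 4                  # toy series that flips sign each step
r1 = sample_autocorrelation(series, 1)    # strongly negative
r2 = sample_autocorrelation(series, 2)    # strongly positive
bound = confidence_bound(len(series))

print(r1, r2, round(bound, 3))
```

Lags whose autocorrelation falls outside the band are the ones worth considering when choosing the order of the model.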
Now here’s how to find the value of p:
pd.plotting.autocorrelation_plot(data["Close"])

In the above autocorrelation plot, the curve is moving down after the 5th line of the first boundary. That is how to decide the value of p. Hence the value of p is 5. Now let’s find the value of q (moving average):
from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf(data["Close"], lags = 100)

In the above partial autocorrelation plot, we can see that only two points are far away from all the points. That is how to decide the q value. Hence the value of q is 2. Now let’s build an ARIMA model:
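p, d, q = 5, 1, 2
# Note: this import path and the `disp` argument belong to the legacy ARIMA
# class, which was removed in statsmodels 0.13. Newer versions provide
# statsmodels.tsa.arima.model.ARIMA, whose fit() takes no `disp` argument.
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(data["Close"], order=(p,d,q))
fitted = model.fit(disp=-1)
print(fitted.summary())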
ARIMA Model Results
==============================================================================
Dep. Variable: D.Close No. Observations: 251
Model: ARIMA(5, 1, 2) Log Likelihood -1328.041
Method: css-mle S.D. of innovations 48.034
Date: Tue, 21 Jun 2022 AIC 2674.083
Time: 06:12:58 BIC 2705.812
Sample: 1 HQIC 2686.851
=================================================================================
coef std err z P>|z| [0.025 0.975]
---------------------------------------------------------------------------------
const -1.5031 2.251 -0.668 0.505 -5.914 2.908
ar.L1.D.Close 0.0443 0.243 0.182 0.856 -0.432 0.520
ar.L2.D.Close 0.7582 0.204 3.712 0.000 0.358 1.158
ar.L3.D.Close -0.0690 0.079 -0.870 0.385 -0.224 0.086
ar.L4.D.Close -0.0623 0.069 -0.901 0.369 -0.198 0.073
ar.L5.D.Close 0.0992 0.075 1.327 0.186 -0.047 0.246
ma.L1.D.Close -0.0923 0.234 -0.394 0.694 -0.552 0.367
ma.L2.D.Close -0.7388 0.191 -3.877 0.000 -1.112 -0.365
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 1.1301 -0.0000j 1.1301 -0.0000
AR.2 -1.4091 -0.2578j 1.4325 -0.4712
AR.3 -1.4091 +0.2578j 1.4325 0.4712
AR.4 1.1583 -1.7339j 2.0852 -0.1563
AR.5 1.1583 +1.7339j 2.0852 0.1563
MA.1 1.1026 +0.0000j 1.1026 0.0000
MA.2 -1.2276 +0.0000j 1.2276 0.5000
-----------------------------------------------------------------------------
Here’s how to predict the values using the ARIMA model:
predictions = fitted.predict()
print(predictions)
2 -2.108482
3 -0.789990
4 -3.688940
5 -0.777623
6 -2.472432
...
247 2.866723
248 2.486679
249 7.659670
250 5.277199
251 8.960482
Length: 250, dtype: float64
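The summary above lists the dependent variable as D.Close, so the printed values are on the scale of day-to-day changes. Here is a minimal sketch of mapping one-step predicted changes back to price levels, assuming each predicted change is added to the previous observed price. The helper name and the numbers are hypothetical.

```python
def changes_to_levels(observed, predicted_changes, first_index):
    """Convert one-step predicted changes into predicted price levels.

    predicted_changes[i] is the predicted change at position first_index + i,
    so the predicted level is the previous observed value plus that change.
    """
    return [
        observed[first_index + i - 1] + change
        for i, change in enumerate(predicted_changes)
    ]


observed_prices = [100.0, 102.0, 101.0, 105.0]   # made-up prices
predicted_changes = [1.5, -0.5]                  # predictions for positions 2 and 3

levels = changes_to_levels(observed_prices, predicted_changes, first_index=2)
print(levels)
```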
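These values look nothing like stock prices because, with d = 1, the model is fitted to the differenced series (D.Close in the summary above), so predict() returns predicted day-to-day changes rather than price levels. The ARIMA model also has no seasonal terms, so it cannot capture the seasonal pattern in this data. So, here’s how to build a SARIMA model: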
import statsmodels.api as sm
import warnings
model=sm.tsa.statespace.SARIMAX(data['Close'],
                                order=(p, d, q),
                                seasonal_order=(p, d, q, 12))
model=model.fit()
print(model.summary())
Statespace Model Results
==========================================================================================
Dep. Variable: Close No. Observations: 252
Model: SARIMAX(5, 1, 2)x(5, 1, 2, 12) Log Likelihood -1280.516
Date: Tue, 21 Jun 2022 AIC 2591.032
Time: 06:15:00 BIC 2643.179
Sample: 0 HQIC 2612.046
- 252
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 -0.0803 3.857 -0.021 0.983 -7.639 7.479
ar.L2 0.9622 3.583 0.269 0.788 -6.060 7.984
ar.L3 -0.0029 0.182 -0.016 0.987 -0.360 0.354
ar.L4 0.0123 0.193 0.064 0.949 -0.365 0.390
ar.L5 0.0586 0.249 0.236 0.814 -0.429 0.546
ma.L1 0.0256 3.032 0.008 0.993 -5.918 5.969
ma.L2 -0.9726 2.979 -0.327 0.744 -6.811 4.866
ar.S.L12 0.2082 0.783 0.266 0.790 -1.327 1.743
ar.S.L24 0.1491 0.086 1.738 0.082 -0.019 0.317
ar.S.L36 -0.0226 0.182 -0.124 0.901 -0.379 0.334
ar.S.L48 -0.1415 0.089 -1.595 0.111 -0.315 0.032
ar.S.L60 -0.0981 0.132 -0.744 0.457 -0.356 0.160
ma.S.L12 -1.2637 0.717 -1.762 0.078 -2.669 0.142
ma.S.L24 0.2782 0.759 0.367 0.714 -1.210 1.766
sigma2 2203.0788 1934.635 1.139 0.255 -1588.737 5994.894
===================================================================================
Ljung-Box (Q): 29.16 Jarque-Bera (JB): 21.53
Prob(Q): 0.90 Prob(JB): 0.00
Heteroskedasticity (H): 2.69 Skew: 0.15
Prob(H) (two-sided): 0.00 Kurtosis: 4.44
===================================================================================
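Now let’s predict the future stock prices using the SARIMA model. Both endpoints of the range passed to predict() are included, so the call below returns 11 values, one for each of the next 11 trading days: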
predictions = model.predict(len(data), len(data)+10)
print(predictions)
252 2155.450727
253 2174.383879
254 2138.454522
255 2118.298381
256 2117.235728
257 2112.857380
258 2099.387811
259 2085.703155
260 2117.912628
261 2133.935300
262 2168.589946
dtype: float64
Here’s how you can plot the predictions:
data["Close"].plot(legend=True, label="Training Data", figsize=(15, 10))
predictions.plot(legend=True, label="Predictions")

So this is how you can use ARIMA or SARIMA models for Time Series Forecasting using Python.
Summary
ARIMA stands for Autoregressive Integrated Moving Average. It is an algorithm used for forecasting Time Series Data. If the data has no seasonal pattern, we can use ARIMA; if the data is seasonal, we need to use Seasonal ARIMA (SARIMA). I hope you liked this article about Time Series Forecasting with ARIMA using Python. Feel free to ask valuable questions in the comments section below.





