TIME SERIES FORECASTING MEANS ANALYZING

15 Jan


Time Series Forecasting means analyzing and modeling time-series data to make future decisions. Some of the applications of Time Series Forecasting are weather forecasting, sales forecasting, business forecasting, stock price forecasting, etc. The ARIMA model is a popular statistical technique used for Time Series Forecasting. If you want to learn Time Series Forecasting with ARIMA, this article is for you. In this article, I will take you through the task of Time Series Forecasting with ARIMA using the Python programming language.

What is ARIMA?

ARIMA stands for Autoregressive Integrated Moving Average. It is a statistical model used for forecasting time series data. An ARIMA model has three parameters, written ARIMA(p, d, q), where:

p is the number of lagged (past) values of the series that the model uses as predictors. It captures the autoregressive (AR) part of ARIMA.
d is the number of times the data needs to be differenced to produce a stationary series. If the data is already stationary, d is 0; a series with a trend, such as most stock price series, usually needs one difference (d = 1). Seasonal patterns are handled separately by the seasonal terms of SARIMA. d captures the integrated (I) part of ARIMA.
q is the number of lagged forecast errors that the model uses as predictors. It captures the moving average (MA) part of ARIMA.
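To make the three parameters concrete, here is a minimal sketch of how an ARIMA(p, d, q) model turns them into a one-step-ahead forecast once its coefficients are known. It is illustrative only: the function name and its arguments are hypothetical, the coefficients are made up, and in practice a library such as statsmodels estimates the coefficients and the past errors for you.

```python
import numpy as np

def arima_one_step_forecast(y, phi, theta, past_errors, c=0.0, d=1):
    """One-step-ahead ARIMA(p, d, q) forecast for KNOWN coefficients (illustrative).

    y:           observed series in levels
    phi:         AR coefficients [phi_1, ..., phi_p]
    theta:       MA coefficients [theta_1, ..., theta_q]
    past_errors: past one-step errors on the differenced series, oldest first
    """
    y = np.asarray(y, dtype=float)
    # I part: difference d times (np.diff with n=0 returns the input unchanged)
    w = np.diff(y, n=d)
    # AR part: weighted sum of the p most recent differenced values
    ar = sum(coef * w[-1 - i] for i, coef in enumerate(phi))
    # MA part: weighted sum of the q most recent forecast errors
    ma = sum(coef * past_errors[-1 - j] for j, coef in enumerate(theta))
    w_next = c + ar + ma
    # Undo the differencing by adding back the last value of each lower-order difference
    return float(w_next + sum(np.diff(y, n=k)[-1] for k in range(d)))

# Differences of [10, 12, 15, 19] are [2, 3, 4]; next change = 0.5 * 4 + 0.2 * 1.0 = 2.2
print(arima_one_step_forecast([10, 12, 15, 19], phi=[0.5], theta=[0.2], past_errors=[1.0]))
```

With p = 1, d = 1, and q = 1, the model predicts the next change (2.2) from the last change and the last error, then adds it to the last observed value (19) to get a forecast of 21.2.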

I hope you have now understood the ARIMA model. In the section below, I will take you through the task of Time Series Forecasting of stock prices with ARIMA using the Python programming language.

Time Series Forecasting with ARIMA

Now let’s start with the task of Time Series Forecasting with ARIMA. I will first collect the last year of Google (GOOG) stock price data from Yahoo Finance using the yfinance library. Here’s how to collect the data:

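import pandas as pd
import yfinance as yf
from datetime import date, timedelta

# Download the last 365 days of daily Google (GOOG) prices
end_date = date.today().strftime("%Y-%m-%d")
start_date = (date.today() - timedelta(days=365)).strftime("%Y-%m-%d")

data = yf.download('GOOG',
                   start=start_date,
                   end=end_date,
                   auto_adjust=False,  # keep the "Adj Close" column (newer yfinance versions default to True)
                   progress=False)

# Newer yfinance versions may return MultiIndex columns (price, ticker); flatten them
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)

# Turn the DatetimeIndex into a regular "Date" column
data["Date"] = data.index
data = data[["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"]].reset_index(drop=True)
print(data.tail())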

          Date         Open         High          Low        Close    Adj Close   Volume
247 2022-06-13  2148.919922  2184.370117  2131.760986  2137.530029  2137.530029  1837800
248 2022-06-14  2137.800049  2169.149902  2127.040039  2143.879883  2143.879883  1274000
249 2022-06-15  2177.989990  2241.260010  2162.375000  2207.810059  2207.810059  1659600
250 2022-06-16  2162.989990  2185.810059  2115.850098  2132.719971  2132.719971  1765700
251 2022-06-17  2130.699951  2184.989990  2112.571045  2157.310059  2157.310059  2163500

We only need the Date and Close columns for the rest of the task, so let’s select them:

data = data[["Date", "Close"]]

print(data.head())

        Date        Close
0 2021-06-21  2529.100098
1 2021-06-22  2539.989990
2 2021-06-23  2529.229980
3 2021-06-24  2545.639893
4 2021-06-25  2539.899902

Now let’s visualize the close prices of Google before moving forward:

import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')
plt.figure(figsize=(15, 10))
plt.plot(data["Date"], data["Close"])

Using ARIMA for Time Series Forecasting

Before using the ARIMA model, we have to figure out whether our data is stationary and whether it has a seasonal pattern. The plot of the closing prices above suggests that the series is not stationary: its level drifts over time instead of fluctuating around a constant mean. To look more closely, we can use seasonal decomposition, which splits the time series into trend, seasonal, and residual components:

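from statsmodels.tsa.seasonal import seasonal_decompose

# Split the closing prices into trend, seasonal, and residual components,
# assuming a 30-observation cycle (older statsmodels versions called period "freq")
result = seasonal_decompose(data["Close"],
                            model='multiplicative', period=30)
fig = result.plot()  # result.plot() creates and returns its own figure
fig.set_size_inches(15, 10)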
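Roughly what `seasonal_decompose` does with `model='multiplicative'` can be sketched in a few lines of pandas: estimate the trend with a centered moving average over one cycle, average the detrended ratios at each position in the cycle to get seasonal factors, and treat what is left as the residual. This is a simplified sketch (odd periods only, hypothetical function name), not the statsmodels implementation. Note that the method always returns seasonal factors for whatever period you ask for, so a visible seasonal panel does not by itself prove that the data is seasonal.

```python
import numpy as np
import pandas as pd

def classical_multiplicative_decompose(y, period):
    """Simplified classical decomposition: y = trend * seasonal * residual.

    Assumes an odd period so a plain centered moving average can be used.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    y = pd.Series(y, dtype=float).reset_index(drop=True)
    # Trend: centered moving average over one full cycle (NaN at both ends)
    trend = y.rolling(window=period, center=True).mean()
    # Seasonal factors: mean detrended ratio at each position in the cycle
    position = np.arange(len(y)) % period
    factors = (y / trend).groupby(position).mean()
    factors = factors / factors.mean()  # normalize so the factors average to 1
    seasonal = pd.Series(factors.to_numpy()[position])
    # Residual: whatever the trend and seasonal factors do not explain
    residual = y / (trend * seasonal)
    return trend, seasonal, residual

# Illustrative series: linear trend times a 7-step seasonal pattern
t = np.arange(120)
y = (50 + 0.5 * t) * (1 + 0.1 * np.sin(2 * np.pi * t / 7))
trend, seasonal, residual = classical_multiplicative_decompose(y, period=7)
print(seasonal[:7].round(3).tolist())
```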

The decomposition suggests that our data is not stationary and has a seasonal component, so the Seasonal ARIMA (SARIMA) model is a better fit for this data. Before using SARIMA, though, we will fit a plain ARIMA model so you can learn to use both models. To use ARIMA or SARIMA, we need to find the p, d, and q values. The value of d is the number of differences needed to make the series stationary: 0 if the series is already stationary, and usually 1 for a trending series like these closing prices, so we use d = 1. In this article, p is read from the autocorrelation plot of the Close column and q from the partial autocorrelation plot. Note that the standard Box-Jenkins rule of thumb assigns these roles the other way around (the partial autocorrelation plot suggests the AR order p, and the autocorrelation plot suggests the MA order q), usually applied to the differenced series. Now here’s how to find the value of p:

pd.plotting.autocorrelation_plot(data["Close"])

In the above autocorrelation plot, the curve starts moving down after the 5th line of the first boundary; reading the plot this way gives p = 5. Now let’s find the value of q (the moving average order):

from statsmodels.graphics.tsaplots import plot_pacf

plot_pacf(data["Close"], lags = 100)

In the above partial autocorrelation plot, only two points stand far apart from the rest; reading the plot this way gives q = 2. Now let’s build an ARIMA model:

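p, d, q = 5, 1, 2

# Legacy API: statsmodels.tsa.arima_model was removed in statsmodels 0.13,
# so this code (and the output below) requires an older statsmodels version
from statsmodels.tsa.arima_model import ARIMA

model = ARIMA(data["Close"], order=(p, d, q))
fitted = model.fit(disp=-1)  # a negative disp suppresses the optimizer's progress output
print(fitted.summary())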

                             ARIMA Model Results
==============================================================================
Dep. Variable:                D.Close   No. Observations:                  251
Model:                 ARIMA(5, 1, 2)   Log Likelihood               -1328.041
Method:                       css-mle   S.D. of innovations             48.034
Date:                Tue, 21 Jun 2022   AIC                           2674.083
Time:                        06:12:58   BIC                           2705.812
Sample:                             1   HQIC                          2686.851
=================================================================================
                    coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
const            -1.5031      2.251     -0.668      0.505      -5.914       2.908
ar.L1.D.Close     0.0443      0.243      0.182      0.856      -0.432       0.520
ar.L2.D.Close     0.7582      0.204      3.712      0.000       0.358       1.158
ar.L3.D.Close    -0.0690      0.079     -0.870      0.385      -0.224       0.086
ar.L4.D.Close    -0.0623      0.069     -0.901      0.369      -0.198       0.073
ar.L5.D.Close     0.0992      0.075      1.327      0.186      -0.047       0.246
ma.L1.D.Close    -0.0923      0.234     -0.394      0.694      -0.552       0.367
ma.L2.D.Close    -0.7388      0.191     -3.877      0.000      -1.112      -0.365
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.1301           -0.0000j            1.1301           -0.0000
AR.2           -1.4091           -0.2578j            1.4325           -0.4712
AR.3           -1.4091           +0.2578j            1.4325            0.4712
AR.4            1.1583           -1.7339j            2.0852           -0.1563
AR.5            1.1583           +1.7339j            2.0852            0.1563
MA.1            1.1026           +0.0000j            1.1026            0.0000
MA.2           -1.2276           +0.0000j            1.2276            0.5000
-----------------------------------------------------------------------------

Here’s how to predict the values using the ARIMA model:

predictions = fitted.predict()

print(predictions)

2     -2.108482
3     -0.789990
4     -3.688940
5     -0.777623
6     -2.472432
         ...
247    2.866723
248    2.486679
249    7.659670
250    5.277199
251    8.960482
Length: 250, dtype: float64
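Because the legacy model was fitted to the differenced series (the summary lists the dependent variable as D.Close), each printed value is a predicted day-to-day change rather than a price. A minimal pandas sketch of turning such one-step predicted changes back into prices, assuming d = 1 and that the predictions share the original series’ index (as the output above does): each predicted price is the previous actual close plus the predicted change. The function name and the numbers are made up for illustration.

```python
import pandas as pd

def differences_to_levels(close, diff_pred):
    """Turn one-step predictions of close.diff() into predicted close levels (d = 1)."""
    # Previous *actual* close for every index that has a predicted change
    prev_close = close.shift(1).reindex(diff_pred.index)
    return prev_close + diff_pred

# Tiny illustrative example (made-up numbers, not the article's data)
example_close = pd.Series([100.0, 102.0, 101.0, 105.0])
example_diff_pred = pd.Series([1.5, -0.5, 2.0], index=[1, 2, 3])
print(differences_to_levels(example_close, example_diff_pred))
```

With the article’s objects, the equivalent call would be `differences_to_levels(data["Close"], predictions)`.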

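The predictions printed by fitted.predict() are not prices: the legacy ARIMA’s predict() returns predictions for the differenced series (the daily changes, listed as D.Close in the summary) unless you pass typ='levels'. More importantly for this data, a non-seasonal ARIMA model cannot capture seasonal patterns, so for seasonal time series data we use a SARIMA model instead. Here’s how to build one: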

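import statsmodels.api as sm
import warnings

warnings.filterwarnings("ignore")  # hide optimizer convergence warnings

# Same (p, d, q) as before, plus seasonal (P, D, Q, s) terms with a period of 12 observations
model = sm.tsa.statespace.SARIMAX(data['Close'],
                                  order=(p, d, q),
                                  seasonal_order=(p, d, q, 12))
model = model.fit()  # model now holds the fitted results
print(model.summary())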

                                 Statespace Model Results
==========================================================================================
Dep. Variable:                              Close   No. Observations:                  252
Model:             SARIMAX(5, 1, 2)x(5, 1, 2, 12)   Log Likelihood               -1280.516
Date:                            Tue, 21 Jun 2022   AIC                           2591.032
Time:                                    06:15:00   BIC                           2643.179
Sample:                                         0   HQIC                          2612.046
                                            - 252
Covariance Type:                              opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.0803      3.857     -0.021      0.983      -7.639       7.479
ar.L2          0.9622      3.583      0.269      0.788      -6.060       7.984
ar.L3         -0.0029      0.182     -0.016      0.987      -0.360       0.354
ar.L4          0.0123      0.193      0.064      0.949      -0.365       0.390
ar.L5          0.0586      0.249      0.236      0.814      -0.429       0.546
ma.L1          0.0256      3.032      0.008      0.993      -5.918       5.969
ma.L2         -0.9726      2.979     -0.327      0.744      -6.811       4.866
ar.S.L12       0.2082      0.783      0.266      0.790      -1.327       1.743
ar.S.L24       0.1491      0.086      1.738      0.082      -0.019       0.317
ar.S.L36      -0.0226      0.182     -0.124      0.901      -0.379       0.334
ar.S.L48      -0.1415      0.089     -1.595      0.111      -0.315       0.032
ar.S.L60      -0.0981      0.132     -0.744      0.457      -0.356       0.160
ma.S.L12      -1.2637      0.717     -1.762      0.078      -2.669       0.142
ma.S.L24       0.2782      0.759      0.367      0.714      -1.210       1.766
sigma2      2203.0788   1934.635      1.139      0.255   -1588.737    5994.894
===================================================================================
Ljung-Box (Q):                       29.16   Jarque-Bera (JB):                21.53
Prob(Q):                              0.90   Prob(JB):                         0.00
Heteroskedasticity (H):               2.69   Skew:                             0.15
Prob(H) (two-sided):                  0.00   Kurtosis:                         4.44
===================================================================================
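The seasonal_order=(P, D, Q, s) argument adds terms that act at lag s (here s = 12 observations) rather than lag 1; with D = 1 the model is also defined on a seasonal difference y_t - y_{t-12}, on top of the ordinary difference. A minimal pandas sketch of that combined differencing for d = 1, D = 1, s = 12, on made-up data with a linear trend and a pattern that repeats every 12 steps (the function name is hypothetical):

```python
import numpy as np
import pandas as pd

def sarima_difference(y, d=1, D=1, s=12):
    """Apply D seasonal (lag-s) and d ordinary differences, as in SARIMA(p,d,q)x(P,D,Q,s)."""
    y = pd.Series(y, dtype=float)
    for _ in range(D):
        y = y.diff(s)   # seasonal difference: y_t - y_{t-s}
    for _ in range(d):
        y = y.diff(1)   # ordinary difference: y_t - y_{t-1}
    return y.dropna()

# Made-up data: linear trend plus a pattern that repeats every 12 steps
t = np.arange(48)
y = 2.0 * t + np.tile([0, 3, 5, 2, -1, -4, -6, -3, 0, 1, 2, 1], 4)
w = sarima_difference(y)
print(len(w), w.abs().max())
```

Here the two differences remove the trend and the repeating pattern entirely, leaving 35 values that are all zero; on real prices the differenced series is at best approximately stationary.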

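Now let’s predict future stock prices with the SARIMA model. The start and end positions passed to predict() are both inclusive, so len(data) to len(data)+10 gives 11 predictions, one for each trading day after the last observation: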

predictions = model.predict(len(data), len(data)+10)

print(predictions)

252    2155.450727
253    2174.383879
254    2138.454522
255    2118.298381
256    2117.235728
257    2112.857380
258    2099.387811
259    2085.703155
260    2117.912628
261    2133.935300
262    2168.589946
dtype: float64

Here’s how you can plot the predictions:

data["Close"].plot(legend=True, label="Training Data", figsize=(15, 10))

predictions.plot(legend=True, label="Predictions")

So this is how you can use ARIMA or SARIMA models for Time Series Forecasting using Python.

Summary

ARIMA stands for Autoregressive Integrated Moving Average. It is an algorithm used for forecasting time series data. ARIMA works for non-seasonal data, including trending data that becomes stationary after differencing; if the data is seasonal, we need to use Seasonal ARIMA (SARIMA). I hope you liked this article about Time Series Forecasting with ARIMA using Python. Feel free to ask valuable questions in the comments section below.
