05 Jan

Website Traffic Forecasting means forecasting traffic on a website during a particular period. It is one of the best use cases of Time Series Forecasting. If you want to learn how to forecast traffic on a website, this article is for you. In this article, I will take you through the task of Website Traffic Forecasting using Python.

Website Traffic Forecasting using Python

The dataset I am using for Website Traffic Forecasting is collected from the daily traffic data of thecleverprogrammer.com. It contains daily traffic data from June 2021 to June 2022. You can download the dataset from here. Now let’s get started with the task of website traffic forecasting by importing the necessary Python libraries and the dataset:

import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.tsa.arima.model import ARIMA  # current location; the old arima_model module is deprecated
import statsmodels.api as sm
data = pd.read_csv("Thecleverprogrammer.csv")
print(data.head())
         Date  Views
0  01/06/2021   7831
1  02/06/2021   7798
2  03/06/2021   7401
3  04/06/2021   7054
4  05/06/2021   7973

The dataset contains two columns, Date and Views. Before moving forward, I will convert the Date column into the Datetime data type:

data["Date"] = pd.to_datetime(data["Date"],  format="%d/%m/%Y")
print(data.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Date    391 non-null    datetime64[ns]
 1   Views   391 non-null    int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 6.2 KB
None
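The explicit format string matters because the dates are written day first. A minimal sketch on a few made-up dates (the `raw` series is hypothetical, not taken from the dataset file) shows how `format="%d/%m/%Y"` keeps every value in June:

```python
import pandas as pd

# Toy dates in the same day/month/year layout as the dataset's Date column.
raw = pd.Series(["01/06/2021", "02/06/2021", "13/06/2021"])

# With an explicit format, "02/06/2021" is read as 2 June, not 6 February.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

print(parsed.dt.month.tolist())  # [6, 6, 6]
print(parsed.dt.day.tolist())    # [1, 2, 13]
```

Without the format argument, pandas has to guess the layout, which can silently swap day and month for ambiguous dates.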

The Date column was an object initially, so I converted it into a Datetime column. Now let’s have a look at the daily traffic of the website:

plt.style.use('fivethirtyeight')
plt.figure(figsize=(15, 10))
plt.plot(data["Date"], data["Views"])
plt.title("Daily Traffic of Thecleverprogrammer.com")
plt.show()

Our website traffic data is seasonal because the traffic on the website increases during the weekdays and decreases during the weekends. It is valuable to know whether the dataset is seasonal while working on a Time Series Forecasting problem. Below is how we can decompose the series to look at its trend and seasonal components:

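# `period` replaces the `freq` argument used by older statsmodels releases
result = seasonal_decompose(data["Views"], model='multiplicative', period=30)
fig = result.plot()  # result.plot() creates its own figure
fig.set_size_inches(15, 10)
plt.show()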
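The weekday versus weekend pattern described above can also be checked directly by averaging views per day of the week. The sketch below uses a synthetic stand-in for the dataset (the `toy` frame and its 9000/6000 view counts are assumptions for illustration only); on the real data you would apply the same `groupby` to `data`:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 8 full weeks of daily views with lower weekend traffic.
dates = pd.date_range("2021-06-07", periods=56, freq="D")  # 2021-06-07 is a Monday
is_weekend = dates.dayofweek >= 5
toy = pd.DataFrame({"Date": dates, "Views": np.where(is_weekend, 6000, 9000)})

# Average views per day of the week (0 = Monday ... 6 = Sunday).
by_weekday = toy.groupby(toy["Date"].dt.dayofweek)["Views"].mean()
print(by_weekday)
```

If the weekend rows come out clearly lower on the real data, that suggests a weekly cycle, and a period of 7 might be worth trying alongside the values used in this article.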

I will be using the Seasonal ARIMA (SARIMA) model to forecast traffic on the website. Before using the SARIMA model, it is necessary to find the p, d, and q values. You can learn how to find p, d, and q values from here. As the data is not stationary, the value of d is 1. To find the values of p and q, we can use the autocorrelation and partial autocorrelation plots:

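# Here p is read from the autocorrelation plot and q from the partial autocorrelation plot.
# (Many references do the reverse: PACF for the AR order p, ACF for the MA order q.)
pd.plotting.autocorrelation_plot(data["Views"])
p = 5
plot_pacf(data["Views"], lags=100)
q = 2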
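Setting d to 1 means the model works on the first difference of the series, that is, each value minus the previous one. As a rough sketch of why this helps with a non-stationary series, the example below uses a synthetic upward trend (the `trend` series is made up, not the article's data) and shows that the mean drifts before differencing but is constant afterwards:

```python
import numpy as np
import pandas as pd

# Synthetic series with a steady upward trend (not the article's data).
trend = pd.Series(7000 + 10 * np.arange(100), dtype="float64")

# First-order differencing: each value minus the previous one.
diffed = trend.diff().dropna()

# The level drifts upward between the two halves of the series...
print(trend.iloc[:50].mean(), trend.iloc[50:].mean())
# ...while the differenced series has the same mean in both halves.
print(diffed.iloc[:50].mean(), diffed.iloc[50:].mean())
```

On the real data, the same check would be `data["Views"].diff().dropna()`; real traffic is noisier, so the differenced series will fluctuate around a stable level rather than being perfectly flat.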

Now here’s how we can train a SARIMA model for the task of website traffic forecasting:

p, d, q = 5, 1, 2
model=sm.tsa.statespace.SARIMAX(data['Views'], order=(p, d, q), seasonal_order=(p, d, q, 12))
model=model.fit()
print(model.summary())
                                 Statespace Model Results
==========================================================================================
Dep. Variable:                              Views   No. Observations:                  391
Model:             SARIMAX(5, 1, 2)x(5, 1, 2, 12)   Log Likelihood               -3099.402
Date:                            Tue, 28 Jun 2022   AIC                           6228.803
Time:                                    07:01:10   BIC                           6287.827
Sample:                                         0   HQIC                          6252.229
                                            - 391
Covariance Type:                              opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.7808      0.134      5.836      0.000       0.519       1.043
ar.L2         -0.7973      0.135     -5.920      0.000      -1.061      -0.533
ar.L3         -0.1442      0.170     -0.850      0.395      -0.477       0.188
ar.L4         -0.1833      0.151     -1.210      0.226      -0.480       0.114
ar.L5         -0.1548      0.139     -1.117      0.264      -0.426       0.117
ma.L1         -1.1826      0.094    -12.515      0.000      -1.368      -0.997
ma.L2          0.8856      0.078     11.304      0.000       0.732       1.039
ar.S.L12      -0.2606      4.608     -0.057      0.955      -9.293       8.772
ar.S.L24       0.0428      0.781      0.055      0.956      -1.488       1.573
ar.S.L36      -0.1880      0.246     -0.764      0.445      -0.670       0.294
ar.S.L48      -0.2151      0.959     -0.224      0.823      -2.095       1.664
ar.S.L60       0.0127      0.986      0.013      0.990      -1.920       1.946
ma.S.L12      -0.6902      4.611     -0.150      0.881      -9.728       8.348
ma.S.L24      -0.0994      3.637     -0.027      0.978      -7.228       7.029
sigma2      1.257e+06   1.59e+05      7.914      0.000    9.46e+05    1.57e+06
===================================================================================
Ljung-Box (Q):                      102.98   Jarque-Bera (JB):                 1.32
Prob(Q):                              0.00   Prob(JB):                         0.52
Heteroskedasticity (H):               1.03   Skew:                             0.14
Prob(H) (two-sided):                  0.85   Kurtosis:                         3.01
===================================================================================
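The AIC and BIC in the summary can be reproduced by hand from the log-likelihood and the number of estimated parameters, which is a useful sanity check when comparing candidate models. The sketch below uses the standard formulas AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The effective sample size of 378 is my assumption (391 observations minus the 1 + 12 consumed by regular and seasonal differencing); it is not stated in the output, but it is consistent with the reported BIC:

```python
import math

# Values read off the model summary.
log_likelihood = -3099.402
n_obs = 391

# Estimated parameters: 5 AR + 2 MA + 5 seasonal AR + 2 seasonal MA + sigma2.
k = 5 + 2 + 5 + 2 + 1

aic = 2 * k - 2 * log_likelihood

# Assumption: differencing with d=1 and D=1 at lag 12 costs 1 + 12 observations.
n_effective = n_obs - 1 - 12
bic = k * math.log(n_effective) - 2 * log_likelihood

print(round(aic, 3), round(bic, 3))
```

Both values agree with the summary to within rounding of the log-likelihood. Lower AIC or BIC indicates a better trade-off between fit and model size, so these numbers are mainly useful for comparing different (p, d, q) choices on the same data.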

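Now let’s forecast traffic on the website for the next 50 days. Both endpoints passed to predict are inclusive, so the call below returns 51 values (positions 391 to 441):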

predictions = model.predict(len(data), len(data)+50)
print(predictions)
391     9874.390136
392    10786.957398
393    10757.445305
394     9863.890552
395     8765.031698
396     8212.310651
397     8929.181869
398     9685.809771
399    10270.622236
400    10625.904093
401     9854.870630
402     9362.193417
403     9040.021193
404     9081.558484
405    10538.993124
406    11003.816870
407    10897.859601
408    10083.291284
409     9445.806523
410     8629.901288
411     9184.420361
412    10392.770399
413    10593.941868
414    10788.128238
415    10263.101427
416     9449.467789
417     9040.226113
418     9168.972091
419     9887.094079
420    10218.658067
421    10715.657122
422     9899.224399
423     9541.622897
424     9065.810941
425     8825.335634
426    10137.936392
427    10839.866240
428    10905.862922
429    10411.640309
430     9451.211368
431     8698.339931
432     8725.534103
433    10060.678587
434    10506.263524
435    10842.515622
436    10485.387495
437     9335.244813
438     9175.122336
439     9357.034382
440    10295.910655
441    11162.934817
dtype: float64
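The predictions are indexed by integer position (391 to 441) rather than by calendar date, because the model was fitted on the Views column alone. One possible way to attach dates is sketched below. It assumes the data has one row per consecutive day; `last_date` is a hypothetical placeholder, and the three forecast values are rounded stand-ins for the real output:

```python
import pandas as pd

# Hypothetical last observed date; in practice use data["Date"].max().
last_date = pd.Timestamp("2022-06-26")

# Short stand-in for the forecast, indexed by integer position.
forecast = pd.Series([9874.4, 10787.0, 10757.4], index=[391, 392, 393])

# One calendar day per forecast step, starting the day after the last observation.
forecast_dates = pd.date_range(
    last_date + pd.Timedelta(days=1), periods=len(forecast), freq="D"
)
dated_forecast = pd.Series(forecast.to_numpy(), index=forecast_dates)
print(dated_forecast)
```

With dates on the index, the forecast can be plotted on the same time axis as the Date column or joined with other daily data.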

Here’s how we can plot the predictions:

data["Views"].plot(legend=True, label="Training Data",  figsize=(15, 10))
predictions.plot(legend=True, label="Predictions")

Summary

So this is how you can forecast website traffic for a particular period. Website traffic prediction is one of the best data science project ideas you can mention on your resume. I hope this article has been helpful for you to learn website traffic prediction using the Python programming language. Feel free to ask valuable questions in the comments section below. 
