Predicting the stock market is one of the most popular applications of Machine Learning in finance. In this article, I will take you through a simple Data Science project on Stock Price Prediction using Machine Learning with Python. By the end of this article, you will know how to predict stock prices with a Linear Regression model in Python.
Predicting the stock market has been the bane and goal of investors since its inception. Every day, billions of dollars are traded on the stock exchange, and behind every dollar is an investor hoping to make a profit in one way or another. Entire companies rise and fall daily depending on market behaviour. An investor who could accurately predict market movements would hold a tantalizing promise of wealth and influence. Today, many people make money trading in the stock market from home, and combining stock market experience with machine learning skills is an advantage for the task of stock price prediction.
Let's see how to predict stock prices using Machine Learning and the Python programming language. I will start by importing all the Python libraries that we need for this task:
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
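In the above section, I started the task of stock price prediction by importing the Python libraries. Now I will write a function that prepares the dataset so that we can fit it easily to the Linear Regression model. The function builds the label by shifting the target column, scales the feature, sets aside the most recent rows for forecasting, and splits the rest into training and test sets: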
def prepare_data(df, forecast_col, forecast_out, test_size):
    # label: the value of forecast_col forecast_out rows ahead; the last forecast_out labels are NaN
    label = df[forecast_col].shift(-forecast_out)
    X = np.array(df[[forecast_col]])  # feature array (a single column)
    X = preprocessing.scale(X)  # standardize the feature to zero mean and unit variance
    X_lately = X[-forecast_out:]  # most recent rows, which have no label; used later for forecasting
    X = X[:-forecast_out]  # rows used for training and testing
    label.dropna(inplace=True)  # drop the NaN labels
    y = np.array(label)  # target array
    # random hold-out split into training and test sets (a single split, not cross-validation)
    X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=test_size, random_state=0)
    response = [X_train, X_test, Y_train, Y_test, X_lately]
    return response
df = pd.read_csv("prices.csv")
df = df[df.symbol == "GOOG"]
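To see what the label shift inside prepare_data does, here is a minimal sketch on a made-up five-row price series. The numbers and the name toy are illustrative and are not taken from prices.csv, and scaling is skipped so the values stay readable. With forecast_out = 2, each row's label is the close two rows later, so the last two rows have no label and are set aside, as X_lately is in the function:

```python
import numpy as np
import pandas as pd

# Made-up closing prices, for illustration only
toy = pd.DataFrame({"close": [10.0, 11.0, 12.0, 13.0, 14.0]})
forecast_out = 2

# Each label is the close forecast_out rows ahead; the last rows become NaN
label = toy["close"].shift(-forecast_out)

X = np.array(toy[["close"]])
X_lately = X[-forecast_out:]   # rows without a label, kept for forecasting
X = X[:-forecast_out]          # rows that do have a label
y = np.array(label.dropna())

# Print each feature value next to the label it is paired with
for features, target in zip(X, y):
    print(features[0], "->", target)
```

Here the first three closes (10, 11, 12) are paired with the closes two rows later (12, 13, 14), and the last two rows (13, 14) end up in X_lately.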
Now we need to set the three input variables that the function above expects. The first names the column we want to predict, the second says how far ahead we want to predict (in rows), and the third gives the size of the test set as a fraction of the data. Now let's declare all the variables:
forecast_col = 'close'
forecast_out = 5
test_size = 0.2
Now I will split the data and fit the linear regression model:
X_train, X_test, Y_train, Y_test, X_lately = prepare_data(df, forecast_col, forecast_out, test_size)  # prepare the data and split it into training and test sets
learner = LinearRegression() #initializing linear regression model
learner.fit(X_train,Y_train) #training the linear regression model
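With a single feature, the fitted model is just a straight line. As a small sketch on synthetic data (the arrays and the names x_demo, y_demo and demo_model are invented for illustration), the slope and intercept can be read from the coef_ and intercept_ attributes of a fitted LinearRegression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data that follows y = 2x + 1 exactly (illustration only)
x_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = 2.0 * x_demo.ravel() + 1.0

demo_model = LinearRegression()
demo_model.fit(x_demo, y_demo)

# coef_ holds one slope per feature; intercept_ is the constant term
slope = demo_model.coef_[0]
intercept = demo_model.intercept_
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
```

The same attributes are available on learner after the call to fit above.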
Now let's score the model on the test set and have a look at the forecasted stock prices:
score = learner.score(X_test, Y_test)  # R^2 score of the model on the test set
forecast = learner.predict(X_lately)  # forecasted values for the most recent rows
response = {}  # dictionary holding the results
response['test_score'] = score
response['forecast_set'] = forecast
print(response)
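The test_score printed above comes from the score method of LinearRegression, which returns the coefficient of determination R² of the predictions. As a rough sketch of what that number measures, R² can be computed by hand with NumPy. The function name r2_by_hand and the sample arrays are made up for illustration:

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the predictions
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of always predicting the mean
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r2_by_hand(y_true, y_true)        # predictions equal to the true values
mean_only = r2_by_hand(y_true, [6.0] * 4)   # always predicting the mean of y_true
print(perfect, mean_only)
```

A score of 1.0 means perfect predictions on the test set, and 0.0 means the model does no better than always predicting the mean of the test labels.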