Spam comment detection means classifying comments as spam or not spam. YouTube is one of the platforms that uses Machine Learning to filter spam comments automatically and protect its creators from them. If you want to learn how to detect spam comments with Machine Learning, this article is for you. In this article, I will take you through the task of spam comment detection with Machine Learning using Python.
Detecting spam comments is a text classification task in Machine Learning. Spam comments on social media platforms are comments posted to redirect users to another social media account, a website, or some other piece of content. To detect spam comments with Machine Learning, we need labelled examples of both spam and non-spam comments. Luckily, I found a dataset on Kaggle of YouTube spam comments that will be helpful for this task. You can download the dataset from here. In the section below, you will learn how to detect spam comments with Machine Learning using the Python programming language.
Let’s start this task by importing the necessary Python libraries and the dataset:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
data = pd.read_csv("Youtube01-Psy.csv")
print(data.sample(5))
                                COMMENT_ID          AUTHOR  \
287    z13vhnh5ewvdyzh3o23bjz55lxbwjznor04    diego acosta
43     z12jvnua2tifirkvk23cfjtpxwmgxfch004   Didier Drogba
265    z13ucxdzemugi1v5n04ccjloko25drfb4js  Haley Harmicar
322    z13uffbajziyw5cfp23bwbw5auytzdl5b04   Juris Dumagan
89   z12pzpvbfl2igbwhe04cihtpuwymvr5gvsg0k     NstyIC Gold

                    DATE                                            CONTENT  \
287  2014-11-08T10:05:27  If I get 100 subscribers, I will summon Freddy...
43   2014-01-20T06:57:25  http://www.twitch.tv/jaroadc come follow and w...
265  2014-11-08T05:35:42  9 year olds be like, 'How does this have 2 bil...
322  2014-11-12T11:03:25             I think he was drunk during this :) x)
89   2014-11-03T20:41:23  Ching Ching ling long ding ring yaaaaaa Ganga ...

     CLASS
287      1
43       1
265      0
322      0
89       0
We only need the CONTENT and CLASS columns from the dataset for the rest of the task, so let’s select both of them and move on:
data = data[["CONTENT", "CLASS"]]
print(data.sample(5))
                                               CONTENT  CLASS
160  CHECK MY CHANNEL FOR MY NEW SONG 'STATIC'!! YO...      1
157               Follow me on Twitter @mscalifornia95      1
336  To everyone joking about how he hacked to get ...      0
329     FOLLOW MY COMPANY ON TWITTER thanks. https:/...      1
79   Hi there~I'm group leader of Angel, a rookie K...      1
The CLASS column contains the values 0 and 1, where 0 means not spam and 1 means spam. To make the output easier to read, I will replace 0 and 1 with the labels "Not Spam" and "Spam Comment":
data["CLASS"] = data["CLASS"].map({0: "Not Spam", 1: "Spam Comment"})
print(data.sample(5))
                                               CONTENT         CLASS
161            Incmedia.org where the truth meets you.  Spam Comment
335  Hey guys can you check my YouTube channel I kn...  Spam Comment
134                           ❤️ ❤️ ❤️ ❤️ ❤️❤️❤️❤️      Not Spam
209  How can this music video get 2 billion views w...      Not Spam
45     ....subscribe...... ......to my........ .......  Spam Comment
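Because pandas' `Series.map` turns any value missing from the dictionary into NaN, it can be worth checking the relabelled column before training, and looking at how many comments fall into each class. The sketch below assumes a hypothetical two-row frame named `toy` in place of the real CSV (both comments are copied from the sample outputs above); it is a sanity check, not part of the original workflow.

```python
import pandas as pd

# Hypothetical two-row frame standing in for Youtube01-Psy.csv; both comments
# appear in the sample output above, labelled with the dataset's 0/1 codes.
toy = pd.DataFrame({
    "CONTENT": ["Follow me on Twitter @mscalifornia95",
                "I think he was drunk during this :) x)"],
    "CLASS": [1, 0],
})

toy["CLASS"] = toy["CLASS"].map({0: "Not Spam", 1: "Spam Comment"})

# Series.map yields NaN for any code not in the dict, so a nonzero count here
# would signal an unexpected value in the CLASS column.
unmapped = int(toy["CLASS"].isna().sum())
print("unmapped labels:", unmapped)

# Number of comments per label, to see how balanced the classes are.
counts = toy["CLASS"].value_counts()
print(counts)
```

On the full dataset, `data["CLASS"].value_counts()` gives the same per-label counts; a strongly skewed split would make the accuracy score reported later harder to interpret.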
Now let’s train a Machine Learning model to classify comments as spam or not spam. Since this is a binary classification problem, I will use the Bernoulli Naive Bayes algorithm, which models each word as either present in or absent from a comment:
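x = np.array(data["CONTENT"])  # comment texts
y = np.array(data["CLASS"])    # "Spam Comment" / "Not Spam" labels
cv = CountVectorizer()
x = cv.fit_transform(x)        # sparse bag-of-words matrix, one row per comment
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.2,
                                                random_state=42)
model = BernoulliNB()
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))  # accuracy on the held-out 20%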
0.9857142857142858
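To make the Bernoulli Naive Bayes idea more concrete, here is a minimal from-scratch sketch in plain Python. It is not scikit-learn's implementation, although it assumes the same Laplace smoothing (alpha = 1) and empirical class priors that `BernoulliNB` uses by default. The tokenizer only approximates `CountVectorizer`'s default lowercasing and token pattern, and the six training comments and the helper names `tokenize`, `fit_bernoulli_nb` and `predict_bernoulli_nb` are hypothetical. The key point is that every vocabulary word contributes to a class score, including words absent from the comment, through log(1 - p).

```python
import math
import re

# Approximates CountVectorizer's defaults: lowercase text, keep tokens of 2+ word chars.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Return the set of distinct tokens (presence/absence is all Bernoulli NB uses)."""
    return set(TOKEN_PATTERN.findall(text.lower()))


def fit_bernoulli_nb(texts, labels, alpha=1.0):
    """Estimate class priors and smoothed P(word present | class)."""
    docs = [tokenize(t) for t in texts]
    vocab = sorted(set().union(*docs))
    priors, probs = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)
        # Laplace smoothing: (comments containing w + alpha) / (class size + 2 * alpha)
        probs[c] = {
            w: (sum(w in d for d in class_docs) + alpha) / (len(class_docs) + 2 * alpha)
            for w in vocab
        }
    return vocab, priors, probs


def predict_bernoulli_nb(model, text):
    """Pick the class with the highest log posterior (up to a shared constant)."""
    vocab, priors, probs = model
    present = tokenize(text)
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for w in vocab:
            p = probs[c][w]
            # Present words add log(p); absent words add log(1 - p).
            score += math.log(p) if w in present else math.log(1 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data using the article's label strings.
texts = [
    "Check out my channel for new music",
    "Subscribe to my channel please",
    "Follow me on Twitter",
    "This song is amazing",
    "I love this video",
    "He was drunk during this video",
]
labels = ["Spam Comment"] * 3 + ["Not Spam"] * 3

nb_model = fit_bernoulli_nb(texts, labels)
print(predict_bernoulli_nb(nb_model, "Please subscribe to my channel"))  # → Spam Comment
print(predict_bernoulli_nb(nb_model, "What an amazing song"))            # → Not Spam
```

Words that never appeared in training, such as "what" in the second query, are simply ignored, which matches how `CountVectorizer.transform` drops tokens outside the fitted vocabulary.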
Now let’s test the model by giving spam and not spam comments as input:
sample = "Check this out: https://thecleverprogrammer.com/"
features = cv.transform([sample]).toarray()  # reuse the fitted vocabulary; keeps `data` intact
print(model.predict(features))
['Spam Comment']
sample = "Lack of information!"
features = cv.transform([sample]).toarray()
print(model.predict(features))
['Not Spam']
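The two tests above have to call `cv.transform` before `model.predict`. One way to avoid that step is to chain `CountVectorizer` and `BernoulliNB` with scikit-learn's `make_pipeline`, so the fitted object accepts raw strings directly. This is a sketch under assumptions: the six training comments, the name `spam_model` and the helper `classify_comments` are hypothetical, not from the article.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy comments; in the article these would come from data["CONTENT"].
comments = [
    "Check out my channel for new music",
    "Subscribe to my channel please",
    "Follow me on Twitter",
    "This song is amazing",
    "I love this video",
    "He was drunk during this video",
]
labels = ["Spam Comment"] * 3 + ["Not Spam"] * 3

# The pipeline fits the vectorizer and the classifier together, so raw strings
# can be passed straight to predict() without a separate cv.transform() call.
spam_model = make_pipeline(CountVectorizer(), BernoulliNB())
spam_model.fit(comments, labels)


def classify_comments(texts):
    """Return one label per comment string (hypothetical helper)."""
    return [str(label) for label in spam_model.predict(texts)]


print(classify_comments(["Please subscribe to my channel", "What an amazing song"]))
```

A side benefit of this design: if you split the raw comments with `train_test_split` first and then fit the pipeline on the training part only, the vocabulary is learned from training comments alone, whereas the article's code fits `CountVectorizer` on every comment before splitting.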
So this is how you can train a Machine Learning model for the task of spam detection using Python.
To summarize, spam comment detection is a text classification task: the goal is to flag comments posted to redirect users to another social media account, website, or piece of content, and a model trained on labelled examples can learn to do this automatically. I hope you liked this article on detecting spam comments with Machine Learning. Feel free to ask valuable questions in the comments section below.