Today is the 19th day of the war between Russia and Ukraine. Many countries are supporting Ukraine by imposing economic sanctions on Russia. People are posting a lot of tweets about the war, sharing updates from the ground, how they feel about it, and which side they support. So if you want to analyze people’s sentiments about the Ukraine and Russia war, this article is for you. In this article, I will take you through the task of Ukraine and Russia war Twitter Sentiment Analysis using Python.
The dataset that I am using for the task of Twitter sentiment analysis on the Ukraine and Russia war was downloaded from Kaggle. It was originally collected from Twitter and is updated regularly. You can download this dataset from here. Now let’s import the necessary Python libraries and the dataset to get started with this task:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import nltk
import re
from nltk.corpus import stopwords
import string
data = pd.read_csv("filename.csv")
print(data.head())
             id  conversation_id               created_at       date     time  \
0  1.502530e+18     1.502260e+18  2022-03-12 06:03:14 UTC  3/12/2022  6:03:14
1  1.502530e+18     1.502530e+18  2022-03-12 06:03:14 UTC  3/12/2022  6:03:14
2  1.502530e+18     1.502530e+18  2022-03-12 06:03:13 UTC  3/12/2022  6:03:13
3  1.502530e+18     1.502210e+18  2022-03-12 06:03:12 UTC  3/12/2022  6:03:12
4  1.502530e+18     1.500440e+18  2022-03-12 06:03:12 UTC  3/12/2022  6:03:12

   timezone       user_id         username  \
0         0  2.019880e+07         redcelia
1         0  2.275356e+08          eee_eff
2         0  8.431317e+07      mistify_007
3         0  9.898620e+17  reallivinghuman
4         0  1.164940e+18           rpcsas

                                        name place  ... geo source user_rt_id  \
0  Johnson Out🇺🇦 🇪🇺🇮🇹🇦🇫💙😷 #NeverVoteTory   NaN  ... NaN    NaN        NaN
1    Wearing Masks still saves lives 🇺🇦🇲🇨🏥🌹🌹   NaN  ... NaN    NaN        NaN
2                                  Brian🤸♀️   NaN  ... NaN    NaN        NaN
3                                      Basha   NaN  ... NaN    NaN        NaN
4                                     RonJon   NaN  ... NaN    NaN        NaN

   user_rt  retweet_id                                           reply_to  \
0      NaN         NaN  [{'screen_name': 'RussianEmbassy', 'name': 'Ru...
1      NaN         NaN                                                 []
2      NaN         NaN                                                 []
3      NaN         NaN  [{'screen_name': 'RussianEmbassy', 'name': 'Ru...
4      NaN         NaN  [{'screen_name': 'IsraeliPM', 'name': 'Prime M...

   retweet_date  translate  trans_src  trans_dest
0           NaN        NaN        NaN         NaN
1           NaN        NaN        NaN         NaN
2           NaN        NaN        NaN         NaN
3           NaN        NaN        NaN         NaN
4           NaN        NaN        NaN         NaN

[5 rows x 36 columns]
Let’s have a quick look at all the column names of the dataset:
print(data.columns)
Index(['id', 'conversation_id', 'created_at', 'date', 'time', 'timezone', 'user_id', 'username', 'name', 'place', 'tweet', 'language', 'mentions', 'urls', 'photos', 'replies_count', 'retweets_count', 'likes_count', 'hashtags', 'cashtags', 'link', 'retweet', 'quote_url', 'video', 'thumbnail', 'near', 'geo', 'source', 'user_rt_id', 'user_rt', 'retweet_id', 'reply_to', 'retweet_date', 'translate', 'trans_src', 'trans_dest'], dtype='object')
We only need three columns for this task (username, tweet, and language), so I will select only these columns and move forward:
data = data[["username", "tweet", "language"]]
Let’s check whether any of these columns contains null values:
data.isnull().sum()
username    0
tweet       0
language    0
dtype: int64
None of the columns has null values. Now let’s have a quick look at how many tweets were posted in each language:
data["language"].value_counts()
en     8812
pt      251
und     198
it      155
in      122
ru       85
hi       55
ja       52
es       40
ta       23
tr       19
ca       18
fr       16
et       16
tl       15
nl       14
de       13
pl       13
fi        9
ar        9
zh        9
sv        6
uk        6
te        6
mr        5
cs        4
el        4
gu        4
no        3
th        3
kn        3
ro        3
ur        2
or        2
eu        2
ko        2
ht        2
sl        2
bn        1
cy        1
ne        1
Name: language, dtype: int64
So most of the tweets are in English. Let’s prepare this data for the task of sentiment analysis. Here I will remove links, punctuation, bracketed text, HTML tags, words containing digits, and English stop words from the tweets, and then stem the remaining words:
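The stop-word list, the stemmer, and the VADER lexicon used later in this article are all English-language resources, so one option is to keep only the English tweets before cleaning. Here is a minimal sketch of that idea on a tiny made-up DataFrame. The rows and the name `english_only` are hypothetical and only mirror the three selected columns; the article itself keeps tweets in all languages:

```python
import pandas as pd

# Hypothetical sample rows that mirror the three selected columns
sample = pd.DataFrame({
    "username": ["user_a", "user_b", "user_c"],
    "tweet": ["Peace for everyone", "Paz para todos", "Stop the war"],
    "language": ["en", "pt", "en"],
})

# Keep only the rows tagged as English and renumber the index
english_only = sample[sample["language"] == "en"].reset_index(drop=True)
print(english_only)
```

The same boolean filter could be applied to `data` right after selecting the columns, if you prefer to score only English tweets.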
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword=set(stopwords.words('english'))
def clean(text):
    # Lowercase, then strip bracketed text, links, HTML tags, punctuation,
    # newlines, and any word that contains a digit
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    # Drop English stop words, then stem what remains
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text
data["tweet"] = data["tweet"].apply(clean)
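To see what the cleaning steps do to a single tweet, here is a small worked sketch that uses only the standard library. It is an illustration, not the article's exact function: `SAMPLE_STOPWORDS` is a hypothetical mini stop-word list standing in for NLTK's English list, the stemming step is left out, the sample tweet is made up, and `split()` is used so that leftover double spaces are collapsed:

```python
import re
import string

# Hypothetical mini stop-word list; the article uses NLTK's English list
SAMPLE_STOPWORDS = {"the", "is", "a", "in", "at"}

def clean_without_stemming(text):
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                # drop text in square brackets
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # drop links
    text = re.sub(r"<.*?>+", "", text)                 # drop HTML tags
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)  # drop punctuation
    text = re.sub(r"\n", "", text)                     # drop newlines
    text = re.sub(r"\w*\d\w*", "", text)               # drop words containing digits
    words = [w for w in text.split() if w not in SAMPLE_STOPWORDS]
    return " ".join(words)

sample_tweet = "The situation in Kyiv is tense! https://example.com #Day19"
print(clean_without_stemming(sample_tweet))  # prints: situation kyiv tense
```

The link, the punctuation, the hashtag word containing digits, and the stop words are all removed, which is the same kind of text the sentiment analyzer receives later.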
Now let’s have a look at the word cloud of the tweets, which shows the words used most frequently by people sharing their feelings and updates about the Ukraine and Russia war:
text = " ".join(i for i in data.tweet)
# Use a separate name so the nltk `stopwords` import is not overwritten
wordcloud_stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=wordcloud_stopwords, background_color="white").generate(text)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
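A word cloud is easy to read but hard to quote numbers from. As a rough complement, the most frequent words can also be counted directly. This is a minimal sketch with `collections.Counter`; the `cleaned_tweets` list is hypothetical and stands in for the cleaned `data["tweet"]` column. The word cloud applies its own stop-word filtering and sizing, so its ranking will not necessarily match these raw counts:

```python
from collections import Counter

# Hypothetical cleaned tweets standing in for data["tweet"]
cleaned_tweets = [
    "ukrain need peac",
    "russia ukrain war",
    "stop war ukrain",
]

# Join the tweets into one string, split on whitespace, and count each word
word_counts = Counter(" ".join(cleaned_tweets).split())

# The two most common words and how often each appears
print(word_counts.most_common(2))  # prints: [('ukrain', 3), ('war', 2)]
```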
Now I will add three more columns to this dataset, Positive, Negative, and Neutral, by calculating the sentiment scores of the tweets:
nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()
data["Positive"] = [sentiments.polarity_scores(i)["pos"] for i in data["tweet"]]
data["Negative"] = [sentiments.polarity_scores(i)["neg"] for i in data["tweet"]]
data["Neutral"] = [sentiments.polarity_scores(i)["neu"] for i in data["tweet"]]
data = data[["tweet", "Positive", "Negative", "Neutral"]]
print(data.head())
                                               tweet  Positive  Negative  \
0  russianembassi ft mfarussia jeffdsach csdcolum...     0.077     0.284
1  kidnap without charg access lawyer putin russi...     0.000     0.000
2  much western civil everyon feel compel find cr...     0.144     0.259
3  russianembassi love place ill visit sure next ...     0.291     0.126
4  israelipm iaeaorg didnt know state israel advi...     0.000     0.000

   Neutral
0    0.639
1    1.000
2    0.596
3    0.583
4    1.000
Now let’s have a look at the most frequent words in tweets whose positive score is higher than their negative score:
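The word clouds in the rest of this article split tweets by comparing the Positive and Negative scores. That comparison can be written as a small labeling function. This is only a sketch: `label_sentiment` is a hypothetical helper name, treating ties as "Neutral" is an assumption, and the score pairs are copied from the five rows printed above:

```python
from collections import Counter

def label_sentiment(positive, negative):
    """Label a tweet by comparing its positive and negative scores."""
    if positive > negative:
        return "Positive"
    if negative > positive:
        return "Negative"
    return "Neutral"  # assumption: equal scores count as neutral

# (Positive, Negative) pairs from the five rows of the printed output
scores = [(0.077, 0.284), (0.000, 0.000), (0.144, 0.259),
          (0.291, 0.126), (0.000, 0.000)]

labels = [label_sentiment(pos, neg) for pos, neg in scores]
print(labels)
print(Counter(labels))
```

For these five rows the labels are Negative, Neutral, Negative, Positive, and Neutral. Five tweets are far too few to say anything about the whole dataset, but the same function could be applied to every row to count how many tweets lean each way.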
# Tweets whose positive score exceeds their negative score
positive = ' '.join(data.loc[data['Positive'] > data['Negative'], 'tweet'])
wordcloud_stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=wordcloud_stopwords, background_color="white").generate(positive)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
Now let’s have a look at the most frequent words in tweets whose negative score is higher than their positive score:
# Tweets whose negative score exceeds their positive score
negative = ' '.join(data.loc[data['Negative'] > data['Positive'], 'tweet'])
wordcloud_stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=wordcloud_stopwords, background_color="white").generate(negative)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
So this is how you can analyze people’s sentiments about the Ukraine and Russia war. I hope this war ends soon and things get back to normal.
People are posting a lot of tweets about the Ukraine and Russia war, sharing updates from the ground, how they feel about it, and which side they support. I used those tweets for the task of Twitter sentiment analysis on the Ukraine and Russia war. I hope you liked this article. Feel free to ask questions in the comments section below.