Prediction of People Sentiments on Twitter Using Machine Learning Classifiers During Russian-Ukrainian Conflict

doi:10.21203/rs.3.rs-2410016/v1

Download PDF

Research Article

Prediction of People Sentiments on Twitter Using Machine Learning Classifiers During Russian-Ukrainian Conflict

https://doi.org/10.21203/rs.3.rs-2410016/v1

This work is licensed under a CC BY 4.0 License

Version 1

posted

You are reading this latest preprint version

Social media has become an excellent way to discover people's thoughts about various topics and situations. In recent years, many studies have focused on social media during crises, including natural disasters or wars caused by individuals. This study looks at how people expressed their feelings on Twitter during the conflict between Russia and Ukraine. This study met two goals: the collected data was unique, and it used machine learning (ML) to classify the tweets based on their effect on people's feelings. The first goal was to find the most relevant hashtags about the conflict to locate the data set. The second goal was to use several well-known ML models to organize the tweets into groups. The experimental results have shown that most of the performed ML classifiers have higher accuracy with a balanced dataset. However, the findings of the demonstrated experiments using data balancing strategies would not necessarily indicate that all classes would perform better. Therefore, it is important to highlight the importance of comparing and contrasting the data balancing strategies employed in SA and ML studies, including more classifiers and a more comprehensive range of use cases.

Sentiment Analysis

Machine Learning

Classification Algorithm

Imbalanced Data Classification

Russian-Ukrainian conflict

The occurrence of crises in human societies impacts the lives of people. The reactions of these societies may be investigated by listening to peoples' ideas and comprehending their sentiments. Sentimental analysis is; thus, an important subject of study in Natural Language Processing (NLP) and information extraction. It seeks to elicit the writers’ emotions reflected in positive or negative words by evaluating a wide range of information [1]. It is conducted that social networking platforms such as Facebook, Twitter, Linkedin, and others have become significant in expressing and exchanging opinions with others about particular political or social events or inevitable social crises [2]. There has been an increase in traffic on social media, consisting of instructive messages, emotional outbursts, helpful safety suggestions, and various rumors. It is crucial to obtain an understanding of the behavior that is expressed on social media sites to have a better comprehension of how to regulate and manage a crisis [3]. In times of natural disasters, it is possible to benefit from the analysis of sentiments that appear on social media in real-time with the occured event to increase the operations of human saving lives [4], [5].

Twitter has a micro-blogging service that allows users to share a message in conjunction with the event, and it is called tweets with a length of characters not exceeding 140 characters. Following related hashtags makes it easier to follow hot events and news worldwide [2]. Thus, here it can display the crisis subject into two fields, the first type is about natural disasters and the second type of crisis is available because of conflict or any natural cause.

It is noted that exploring feelings has become essential, especially in studying and processing the natural language [6]. For instance, about the first field and depending on Twitter, the Sentiment Analysis (SA) is used in the California Campfires, one of the most damaging and destructive wildfires in the history of California [7]. Despite this, studies of SA of natural disasters are insufficient. Many disasters, such as floods, wars, and earthquakes, have occurred frequently. In addition, disasters have victims in souls and properties with negative dimensions for society in many respects. Little research has appeared about these phenomena, which studies peoples reactions to mitigate, prepare, recover and respond to disasters and reduce their damage to general citizens and economies [8].

In the second field, Twitter and other social media platforms have taken the lead in expressing opinions and sentiments during conflicts by allowing researchers to conduct real-time assessments that may aid authorities to develope early response strategies [9]. For example, one study examined one of the most difficult challenges in international politics, the rule of Taliban in Afghanistan after the withdrawal of US soldiers, using public opinion expressed in tweets [10]. In addition, a new way proposed for real-time SA on the current Refugee Crisis to provide some prediction on polarity types of political improvement based on Twitter data [11]. It is critical to assess the effectiveness of various supervised Machine Learning (ML) classifiers that have yet to be discovered in the literature. The study uses ML techniques to present a method for SA on Twitter data comprising tweets about Afghanistan [10]. The study identified algorithms and measures for evaluating the performance of supervised ML classifiers on tweets on the US war in Afghanistan [9]. In addition, ML was applied to the refugee crisis, yielding final-level decisions regarding the number of individuals commenting in support of refugees and the number of individuals commenting against refugees through binomial classification of positive and negative using supervised ML algorithms [11]. The goal of this study is to predict people's sentiments on Twitter during the Russian-Ukrainian conflict by using machine learning classifiers.

Our paper direction will be as following: section 2 will be discuss the related works to this topic, in section 3 we will discuss our methodology, section 4 we will discuss the experimental results, and the conclusion will be placed at section 5.

This section is divided into two subsections: Literature review and Classification methods. In related works, researchs about crisis inflects will be discussed, and then related ML classification methods will be examined.

2.1. Literature Review

SA has garnered significant interest in the recent years due to the prominence of social media sites such as Twitter and Facebook. In addition, the availability of voluminous data in the form of tweets, reviews, and comments expedited its development. As a result, there is a substantial body of literature on SA [10]. The proposed method detects bogus news using sentiments such as positive and negative scores. Elmurngi and Gherbi [12] use statistical approaches to assess the efficacy of spambot systems in the sentimental analysis arena. The task-specific precision of various ML models is tested. Wael et al. [13] also use SA to identify Western media's bias toward the Palestinian-Israeli crisis. This process comprises finding deceptive terms, vocabularies, and idioms used to sway public opinion about the Israel-Palestine crisis.

The refugee issue was also considered utilizing SA. For instance, Ozturk and Ayvaz [14] analyze Turkish and English tweets to address the challenges of Syrian refugees. They examined public feelings and opinions regarding the Syrian refugee situation. The results demonstrated a substantial variation in sentiment between Turkish and English tweets. The data also indicate that the Turkish tweets include more optimistic sentiments. A comparative examination reveals that Turkish tweets contain somewhat more positive than negative or neutral sentiments toward Syrian refugees. Another considered issue was terrorism; for instance, Mansour [15] conducts SA on tweets related to ISIS to gain insights into how people feel about acts related to terrorism. The Term Frequency-Inverse Document Frequency (TF-IDF) approach was applied in the study to perform SA on tweets on ISIS.

Other researchers have also used Twitter SA to extract various public opinions and feelings expressed during crisis events such as civil wars and natural disasters [16], [17]. Identifying these feelings is important for understanding the situations' dynamics and their emotional impact on affected people. A related study shows that debriefing during a disaster can help authorities develop more critical situational awareness and other programs to manage future events [18]. Studies show users' emotions fluctuate depending on location and proximity to the disaster site. For example, a study was conducted to assess the situation and public opinion regarding Brexit, in which more than 16 million tweets were collected. This study uncovered the most popular daily Twitter debates on the topics and discovered a positive correlation between Twitter's attitude towards Brexit and British pound exchange rate using the VADER library [19].

According to the information reported above, SA has become a significant research topic in artificial intelligence. According to a survey of the available literature, various Twitter SA research employs classic ML algorithms to estimate sentiment from tweets [20]–[22]. These approaches tackle SA problems as if they were text classification problems. These algorithms provide strong accuracies with fewer computer resources and used widely in SA of Twitter data [21].

This study uses a pre-annotated dataset using TextBlob and then applies various ML classifiers for SA to analyze tweets related to the Russian-Ukrainian war. To the best of our knowledge, this research is among a few (if any) that seeks to provide valuable insights from tweet content related to the Russian-Ukrainian war. Findings can be a reputable source of information to help governments and international organizations understanding social media trends and public views on the situation in Ukraine.

2.2. Classification Methods

In SA, ML classifiers have been utilized in diverse research groups for text classification. These various classifiers have provided different results depending on the applied case study.

ML is a type of optimization programming that uses training data or past experiences to find the best way to meet a performance criterion. The optimization model could be used to predict what will happen in the future, learn something from the data, or for both uses. First, a good training algorithm is needed to solve the optimization issues and store and process a considerable amount of data. Second, whenever the model is trained and learned well, the representation and algorithm solutions for inference must be efficient and effective. Even though it takes a lot of space and time, the reliability of the learning algorithm is just as important as how well it can predict the future [23].

All of the ML classifiers that have been covered up to this point produce great results across a wide range of scenarios, whether they involve SA or other ML context issues. If a classifier for the SA problem is run on two different datasets or case studies, it may produce different results each time. Many points should be considered here, like the dataset's size, the classification's goal, and the number of classes needed to be predicted or put into groups, features and parameters. Therefore, there is no foolproof method for choosing a specific classifier for a given scenario unless its performance is good from the start or compared with the performance of other classifiers to confirm whether it is the best fit for such a scenario. For that purpose, a total of eight different classifiers will be evaluated side-by-side to check the most effective one in solving SA problem in this study. This subsection will describe the primary classifiers utilized during this work. These classifiers include K-Nearest Neighbor (KNN), Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), Naïve Bayes (NB), XGBoost, AdaBoost, and Multi-Layer Perception (MLP).

2.2.1. KNN

KNN is an essential ML classifier that uses instance-based learning. Text classification uses similarity measurements, which figure out how similar two points are by estimating their distance, proximity, or clustering function [24]. In KNN, all training documents are saved, and the calculations are postponed up to the classification stage [25]. KNN assigns a class based on the categories of the top neighbors of the labeled samples in the training set for each test document. The closer neighbors with the same category, the more likely that prediction is to be correct [26].

2.2.2. RF

RF is an ensemble classifier that uses bootstrapping and bagging to train several decision trees simultaneously [27]. When using an RF classifier, the final prediction is based on the most commonly observed class of objects, and this method is called bagging [28]. A large number of predictors necessitates a lot of planted trees. Individual decision trees can be randomly decorated in various ways, for example, by selecting random features or data subsets [29]. To avoid overfitting problems that can occur with individual decision trees because of their tremendous flexibility, RF uses many decision trees on a complex random subset of variables to create an effective solution [30].

2.2.3. DT

DT is a ML classification algorithm that works like a hierarchical tree. It utilizes attribute value constraints to split the training data into few parts and uses different tests to display the tree branch. Each branch slopes from the node matches the feature value. A DT works well as the text classification model does not have many features, but it is tough to make a classifier when there are a lot of features [31].

2.2.4. LR

LR is considered as one of the most prevalent approaches to ML classification [32]. It does this by employing the concept of probability for a single test result by utilizing a logistic function in which the resulting probability might be either 1 or 0 [33]. This methodology has been implemented in a wide range of SA research [40]. For this reason, it is deemed to be one of the ML classifiers to be evaluated in the classification problem conducted by this research.

2.2.5. NB

NB algorithm is one of the most straightforward examples of a probabilistic classifier [34]. The training documents are used to estimate a class-conditional document distribution, while Bayes' rule is used to get an estimate for test documents. The documents themselves are represented by their words. Furthermore, Naive Bayes may be better than discriminative classifiers for small sample sizes of data because it has a built-in regularization that makes this method less likely to overfit [35].

2.2.6. XGBoost

Extreme gradient boosting, further called XGBoost, is a fast and scalable ML classifier that uses the gradient tree boosting method. The classifier operates in parallel with great application performance, and its boosting section employs ensemble approaches [36]. This classifiers' settings can be adjusted to provide various good results, therefore, it has been evaluated for this research's performance comparison.

2.2.7. AdaBoost

AdaBoost is the first functional boosting classifier suggested by Freund [37]. It combines multiple base classifiers, usually decision trees, to build an accurate classifier. It invokes a weak classifier and provides a various training data distribution for each call. The classifier can remove unnecessary features in the training data so that important features are used in the training process. AdaBoost has been utilized in various application studies [38], and it has been deemed suitable for the comparative results provided in this study.

2.2.8. MLP

The MLP is an artificial neural network architecture, which is probably the most widely used for classification and regression today [39]. MLPs are feed-forward neural networks usually made up of several layers of nodes that only connect in one direction and are usually trained by backpropagation [40].

This section will cover many methodological approaches that were taken throughout this research. As can be shown in Fig. 1, a total of four main steps are provided. The first step is to examine the data collection process, beginning with identifying keywords and presenting the final results. In the following step, preprocessing procedures have been applied to the dataset, starting with the initial filtration and continuing to the finish of the processing. Following the conclusion of the previous step, the following step addresses the topic of preparing the dataset for ML classification with a discussion that includes activities such as the annotation technique and feature extraction. The last step takes the results from the previous step, applies the ML prediction models that were suggested for this research, and discusses the results of the models' performance.

3.1. Data Collection

This project's data collection began on the first day that Russia and Ukraine started talking. The tweets was gathered between February 24 and July 4, 2022. The goal of the data collection was Twitter since it is the most popular place for people to share their thoughts and Tweets about the conflict and crisis between Russia and Ukraine. Several keywords like “#UkraineVSRussia, #UkraineConflict, #UkraineCrisis, #UkraineWarCrimes, #stopwar, #UkraineWar, #UkraineRussia and #ukrainianwar” were used to get the tweets that were required. We have used these terms, with the "Twint" project tool to scrape 1,245,791 tweets [41]. This tool lets the users scrape tweets and other information from the past [42], [43]. All of the collected Tweets were saved as Comma-separated values (.csv) for analysis in this research.

3.2. Preprocessing

In this section, preprocessing is done as a necessary step before using the ML classifiers to do actual SA. Before using Twitter’s data for further processing, the data must undergo a filtration process to eliminate things that have no relation. Several Python tools and libraries were used, such as the Pandas-Python Data Analysis Library and the Python NLP toolkit. The first step is to normalize the data to clean it up. As part of this process, non-English tweets, emojis, URLs, and other special characters are filtered out. As a part of normalization, all capital letters are turned into small letters to simplify using text in the analysis. The following steps are Tokenization, which is a preprocessing method that separates text into small pieces called "tokens." During tokenization, all special characters are taken out, the spaces between words are found, and abbreviations, numbers, and other special characters are processed. The last step is removing English language stop words like, "is," "a," and "an", which do not add much to the text about a specific topic.

3.3. Data Annotation

Tweet annotation labels tweets categories by the research's main case for the classification [44]. Accordingly, the dataset was annotated into two categories for binary classification using TextBlob. The two categories are; negative tweets regarding the conflict annotated by 0, and positive tweets regarding the conflict annotated by 1. A total of 362,246 relevant tweets were considered after applying labeling to our collected dataset. The number of tweets in each of the two categories is depicted in Fig. 2.

As we can notice from the figure, there are two classes of repented tweets. The first class labelled 1, has 226,936 tweets about war, and the second class, labeled 0, has 135,310 tweets about the same conversations during the conflict. The number of tweets in the first category is greater than the number in the second category, which could affect the results of the ML classifiers in the studied case. Synthetic Minority Oversampling Technique (SMOTE), a distribution-balancing technique, will be implemented to address this problem. Furthermore, classification findings will be shown before and after the application of the SMOTE to establish the extent to which this technique improves categorization.

3.4. SMOTE

SMOTE, is used in ML research for data balance. Generating data samples of minority classification labels such as the number of samples from each group is nearly equal [45]. As described before, the dataset did not provide an equal distribution of categories, and this can cause the ML models to overfit.

3.5. Feature Extraction

For feature engineering, this research adopts TF-IDF. This approach operates by extracting weighted features from the data and assigning each data term with a few weight value into the model to enhance the performance of ML classifiers [46]. TF-IDF focuses on the most distinctive words, making its integration preferable to overcome the limitation of depending on word counts in SA research. Mathematical functions for TF-IDF are represented in Equations 1, and 2 as follows;

$$\text{t}\text{f}\left(\text{t},\text{d}\right)= \text{l}\text{o}\text{g}\left(1+{\text{f}}_{\text{t},\text{d}}\right)$$

$$\text{I}\text{d}\text{f}\left(\text{t}\right)= \text{log}\left(\frac{1+\text{N}}{1+ {\text{n}}_{t}}\right)$$

Where tf(t,d) represents the count of term $\text{t}$ in document d. N Represents the total documents number, and finally n representing documents containing term t.

3.6. Data Splitting

In this study, the data are shuffled to make the classification performance more generalizable, reduce the variance and avoid model overfitting then, and the data are split into 80:20 ratios where 80% are for training the model and 20% for testing the model.

3.7. Performance Metrics

The performance evaluation measures that are discussed in this research includes, Accuracy (Acc.), Precision (Pr.), Recall (Re.), F1 score, and Matthews Correlation Coefficient (MCC). These metrics are defined as follows:

$Accuracy\left(Acc\right)= \frac{TP+TN}{TP+TN+FP+FN}$	(1)
$Precision \left(Pr.\right)= \frac{TP}{TP+FP}$	(2)
$Recall \left(Re.\right)= \frac{TP}{TP+FN}$	(3)
$F1 Score=2x \frac{Pr. x Re.}{Pr. +Re.}$	(4)
$MCC= \frac{TP x TN-FP x FN}{\sqrt{\left(TP+FP\right)x \left(TP+FN\right)x\left(TN+FP\right)x(TN+FN)}}$	(5)

Where TP, FP, TN, and FN stand for true positive, false positive, true negative, and false positive, respectively.

All of the classifiers were subjected to two separate experiments: the first one, the data were imbalanced, and the second one, the issue of imbalanced data was addressed and handled using SMOTE. Table 1 shows evaluation metric, including accuracy, precision, recall, F-score, and MCC before applying SMOTE.

Table 1

Results of ML models before applying SMOTE
Model	Class	Precision	Recall	F-Score	Accuracy	MCC
KNN	0	0.76	0.69	0.72	0.80	0.571
	1	0.83	0.87	0.85
	Macro avg	0.79	0.78	0.78
RF	0	0.95	0.92	0.93	0.95	0.894
	1	0.95	0.97	0.96
	Macro avg	0.95	0.94	0.95
DT	0	0.89	0.92	0.90	0.93	0.847
	1	0.95	0.94	0.94
	Macro avg	0.92	0.93	0.92
LR	0	0.96	0.95	0.95	0.97	0.928
	1	0.97	0.98	0.97
	Macro avg	0.97	0.96	0.96
NB	0	0.85	0.86	0.85	0.89	0.762
	1	0.91	0.91	0.91
	Macro avg	0.88	0.88	0.88
XGBoost	0	0.93	0.83	0.88	0.91	0.816
	1	0.91	0.96	0.93
	Macro avg	0.92	0.90	0.91
AdaBoost	0	0.88	0.54	0.67	0.80	0.572
	1	0.78	0.96	0.96
	Macro avg	0.83	0.75	0.76
MLP	0	0.97	0.97	0.97	0.98	0.956
	1	0.98	0.98	0.98
	Macro avg	0.98	0.98	0.98

In the first experiment, it can be shown that MLP and LR are performed superiorly to all of the other ML classifiers in terms of accuracy, with scores of 0.98 and 0.97, respectively. Besides that, RF, DT, and XGBoost are followed with scores of 0.95, 0.93, and 0.91 correspondingly. Regarding the models lie on the left side, NB obtained the highest accuracy, which was 0.89, followed by KNN and AdaBoost with the lowest accuracy of 0.80. In addition to accuracy, the MCC has been recognized in the literature as a comprehensive performance evaluation for binary classification issues, especially true when using imbalanced and balanced datasets as an evaluating criteria. In this regard, the MCC scored the most for MLP with a value of 0.956, followed by LR with a value of 0.928, and MLP with a value of 0.956. However, the score had the lowest for AdaBoost and KNN with values of 0.572 and 0.571 correspondingly.

Next, the same classifiers were applied again after balancing the distributed dataset using SMOTE. This experiment was carried out to demonstrate how SMOTE can improve the performance of classifiers after they had been applied to an imbalanced dataset. As they are involved here, earlier employed evaluation measures can also be found in Table 2.

Table 2

Results of ML models after applying SMOTE
Model	Class	Precision	Recall	F-Score	Accuracy	MCC
KNN	0	0.62	0.99	0.76	0.69	0.423
	1	0.97	0.40	0.57
	Macro avg	0.79	0.69	0.66
RF	0	0.95	0.96	0.96	0.96	0.910
	1	0.96	0.95	0.96
	Macro avg	0.96	0.96	0.96
DT	0	0.95	0.94	0.94	0.94	0.884
	1	0.94	0.95	0.94
	Macro avg	0.94	0.94	0.94
LR	0	0.97	0.97	0.97	0.97	0.944
	1	0.97	0.97	0.97
	Macro avg	0.97	0.97	0.97
NB	0	0.86	0.91	0.89	0.88	0.767
	1	0.91	0.85	0.88
	Macro avg	0.88	0.88	0.88
XGBoost	0	0.95	0.90	0.92	0.93	0.853
	1	0.90	0.95	0.93
	Macro avg	0.93	0.93	0.93
AdaBoost	0	0.92	0.59	0.72	0.77	0.577
	1	0.70	0.95	0.80
	Macro avg	0.81	0.77	0.76
MLP	0	0.99	0.98	0.99	0.99	0.970
	1	0.98	0.99	0.98
	Macro avg	0.99	0.99	0.99

For the following experiment, it can be observed that the best performance was attributed for two MLP with 0.99 accuracy, followed by LR, RF, then DT with accuracies of 0.97, 0.96, and 0.94, respectively. Worst accuracy performance as attributed to AdaBoost with 0.77, then KNN with 0.71. As for MCC results, MLP was the highest classifier with 0.97, followed by LR with 0.944, then RF with 0.910. Worst MCC performance was observed at 0.477 in KNN classifier. It is indicated from the results that some classifiers' accuracies have improved after SMOTE was applied to the imbalanced dataset. At the same time, some classifiers' performance has degraded. Still, it is confirmed that MCC across all classifiers has improved, which shows its suitability in performance evaluation after balancing the dataset.

4.1. Comparative Analysis

This subsection compares the accuracy and MCC values of the results for all ML classifiers before and after using SMOTE. The comparison is illustrated in Figs. 3 and 4, respectively.

It is observed that accuracy and MCC are among the most important measures used to evaluate the performance of ML classifiers. Based on the analysis and the analyzed case in this research, it is evident that when the SMOTE technique was applied, the performance of four of the classifiers increased: RF from 0.95 to 0.96, DT from 0.93 to 0.94, XGBoost from 0.91 to 0.93, and MLP from 0.98 to 0.99. With 0.97, only LR kept its accuracy before and after SMOTE. However, the remaining three ML classifiers, Adaboost, KNN, and NB, did not demonstrate any gain in the accuracy. These results indicate the suitability of the SMOTE technique in terms of accuracy. However, another important measure known as MCC introduced in the literature is more robust and trustworthy than balanced accuracy in F1 score and binary classification analysis [47]. From the MCC data, it is evident that most classifiers exhibited an increase before and after implementing SMOTE, with the greatest improvement reported for MLP 0.97 MCC score, followed by LR 0.944 MCC score. The MCC scores for RF, DT, and XGBoost are 0.91, 0.884, and 0.853, respectively. Only the AdaBoost classifier showed a minor gain in the MCC score, bringing it to 0.577. However, the MCC score of the KNN classifier decreased after applying SMOTE, which is also consistent with the accuracy.

4.2. Result Discussion

In classifying Russian-Ukrainian conflict-related discussion on Twitter, it is evident that most basic ML classifiers improved their performance, which was confirmed by measuring the MCC score as identified in the literature to be one of the best approaches for classification problems, particularly when data are balanced utilizing techniques such as SMOTE. The only classifier that does not enhance by the used approach was the KNN classifier, which was validated by the MCC score and the accurate result. Even so, the KNN algorithm performed far higher when the data were imbalanced than when they were balanced. This demonstrates that despite the promise of data balancing methodologies, its application in producing a balanced dataset could not always be applicable across all ML classifiers. As a result, it is worthwhile to investigate the possibility of determining the performance of these various classifiers by employing additional data balancing methods to evaluate and compare them in terms of their performance.

In this study, sentiments about war during the Russian-Ukrainian conflict have been analyzed. This study achieved two goals: the uniqueness of the collected data and ML to categorize the tweets' sentiments. The first goal was to collect the dataset by searching for the most popular hashtags about the conflict. The second goal was to place the collected tweets into categories using a number of well-known ML models. The most basic ML classifiers improved their performance, confirmed by evaluating the MCC score, which is known in the literature as one of the best ways to solve classification problems, especially when data are balanced using techniques like SMOTE. Also, it was demonstrated that data balancing techniques would not guarantee that all classes could perform better. Nevertheless, the data-balancing approach must be tested and compared using different ML classifiers and SA evaluation datasets. Also, the current research can be expanded in the future by adding Deep Learning classifiers, various feature settings, different data balancing techniques, and more predictive analysis research on both the SA dataset presented here and other benchmarking datasets from the research literature. The results of this work and its future expansions will give us a deeper technical understanding of ML and its configurations and parameters.

Compliance with Ethical Standards

We certify that our work is original and does not plagiarize the work of others.

Competing Interests

We certify that there is no actual or potential conflict of interest in relation to this article.

Research Data Policy and Data Availability Statements

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

N. Afroz, M. Boral, V. Sharma, and M. Gupta, “Sentiment Analysis of COVID-19 Nationwide Lockdown effect in India,” in Proceedings - International Conference on Artificial Intelligence and Smart Systems, ICAIS 2021, 2021, pp. 561–567. doi: 10.1109/ICAIS50930.2021.9396038.
S. Hajrahnur, M. Nasrun, C. Setianingsih, and M. A. Murti, “Classification of posts Twitter traffic jam the city of Jakarta using algorithm C4.5,” in 2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings, 2018, pp. 294–300. doi: 10.1109/ICSIGSYS.2018.8372776.
P. Kostakos, M. Nykanen, M. Martinviita, A. Pandya, and M. Oussalah, “Meta-terrorism: Identifying linguistic patterns in public discourse after an attack,” in Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2018, 2018, pp. 1079–1083. doi: 10.1109/ASONAM.2018.8508647.
G. M. Demirci, S. R. Keskin, and G. Dogan, “Sentiment Analysis in Turkish with Deep Learning,” in Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019, 2019, pp. 2215–2221. doi: 10.1109/BigData47090.2019.9006066.
A. H. Alamoodi et al., “Public Sentiment Analysis and Topic Modeling Regarding COVID-19’s Three Waves of Total Lockdown: A Case Study on Movement Control Order in Malaysia.,” KSII Trans. Internet Informatıon Syst., vol. 16, no. 7, pp. 2169–2190, 2022.
A. H. Alamoodi et al., “Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: A systematic review,” Expert Systems with Applications, vol. 167. 2021. doi: 10.1016/j.eswa.2020.114155.
N. H. Khun, T. T. Zin, M. Yokota, and H. A. Thant, “Emotion analysis of twitter users on natural disasters,” in 2019 IEEE 8th Global Conference on Consumer Electronics, GCCE 2019, 2019, pp. 342–343. doi: 10.1109/GCCE46687.2019.9015234.
U. H. H. Zaki, R. Ibrahim, S. A. Halim, K. A. M. Khaidzir, and T. Yokoi, “Sentiflood: Process model for flood disaster sentiment analysis,” in 2017 IEEE Conference on Big Data and Analytics, ICBDA 2017, 2018, vol. 2018-Janua, pp. 37–42. doi: 10.1109/ICBDAA.2017.8284104.
S. K. Akpatsa et al., “Sentiment Analysis and Topic Modeling of Twitter Data: A Text Mining Approach to the US-Afghan War Crisis,” SSRN Electron. J., 2022, doi: 10.2139/ssrn.4064560.
E. Lee, F. Rustam, I. Ashraf, P. B. Washington, M. Narra, and R. Shafique, “Inquest of Current Situation in Afghanistan Under Taliban Rule Using Sentiment Analysis and Volume Analysis,” IEEE Access, vol. 10, pp. 10333–10348, 2022, doi: 10.1109/ACCESS.2022.3144659.
M. Mahiuddin, “Real time sentiment analysis and opinion mining on refugee crisis,” in 2019 5th International Conference on Advances in Electrical Engineering, ICAEE 2019, 2019, pp. 699–705. doi: 10.1109/ICAEE48663.2019.8975462.
E. Elmurngi and A. Gherbi, “Detecting Fake Reviews through Sentiment Analysis Using Machine Learning Techniques,” DATA Anal. 2017 Sixth Int. Conf. Data Anal. Detect., no. c, pp. 65–72, 2017.
W. F. Al-Sarraj and H. M. Lubbad, “Bias Detection of Palestinian/Israeli Conflict in Western Media: A Sentiment Analysis Experimental Study,” in Proceedings - 2018 International Conference on Promising Electronic Technologies, ICPET 2018, 2018, pp. 98–103. doi: 10.1109/ICPET.2018.00024.
N. Öztürk and S. Ayvaz, “Sentiment analysis on Twitter: A text mining approach to the Syrian refugee crisis,” Telemat. Informatics, vol. 35, no. 1, pp. 136–147, 2018, doi: 10.1016/j.tele.2017.10.006.
S. Mansour, “Social media analysis of user’s responses to terrorism using sentiment analysis and text mining,” in Procedia Computer Science, 2018, vol. 140, pp. 95–103. doi: 10.1016/j.procs.2018.10.297.
G. A. Ruz, P. A. Henríquez, and A. Mascareño, “Sentiment analysis of Twitter data during critical events through Bayesian networks classifiers,” Futur. Gener. Comput. Syst., vol. 106, pp. 92–104, 2020, doi: 10.1016/j.future.2020.01.005.
F. Yao and Y. Wang, “Domain-specific sentiment analysis for tweets during hurricanes (DSSA-H): A domain-adversarial neural-network-based approach,” Comput. Environ. Urban Syst., vol. 83, 2020, doi: 10.1016/j.compenvurbsys.2020.101522.
A. Squicciarini, A. Tapia, and S. Stehle, “Sentiment analysis during Hurricane Sandy in emergency response,” Int. J. Disaster Risk Reduct., vol. 21, pp. 213–222, 2017, doi: 10.1016/j.ijdrr.2016.12.011.
S. H. W. Ilyas, Z. T. Soomro, A. Anwar, H. Shahzad, and U. Yaqub, “Analyzing brexit’s impact using sentiment analysis and topic modeling on twitter discussion,” in ACM International Conference Proceeding Series, Jun. 2020, pp. 1–6. doi: 10.1145/3396956.3396973.
F. Rustam, M. Khalid, W. Aslam, V. Rupapara, A. Mehmood, and G. S. Choi, “A performance comparison of supervised machine learning models for Covid-19 tweets sentiment analysis,” PLoS One, vol. 16, no. 2, Feb. 2021, doi: 10.1371/journal.pone.0245909.
Imamah and F. H. Rachman, “Twitter sentiment analysis of Covid-19 using term weighting TF-IDF and logistic regresion,” in Proceeding - 6th Information Technology International Seminar, ITIS 2020, 2020, pp. 238–242. doi: 10.1109/ITIS50118.2020.9320958.
P. Sharma and A. K. Sharma, “Experimental investigation of automated system for twitter sentiment analysis to predict the public emotions using machine learning algorithms,” Mater. Today Proc., 2020, doi: 10.1016/j.matpr.2020.09.351.
E. Alpaydin, Introduction to Machine Learning, Fourth Edi. MIT Press, 2020. doi: 10.1007/978-3-030-74640-7_4.
V. K. Vijayan, K. R. Bindu, and L. Parameswaran, “A comprehensive study of text classification algorithms,” in 2017 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2017, 2017, vol. 2017-Janua, pp. 1109–1113. doi: 10.1109/ICACCI.2017.8125990.
F. Sebastiani, “Machine Learning in Automated Text Categorization,” ACM Comput. Surv., vol. 34, no. 1, pp. 1–47, 2002, doi: 10.1145/505282.505283.
Y. Yang and X. Liu, “A re-examination of text categorization methods,” in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1999, Aug. 1999, pp. 42–49. doi: 10.1145/312624.312647.
N. Jalal, A. Mehmood, G. S. Choi, and I. Ashraf, “A novel improved random forest for text classification using feature ranking and optimal number of trees,” J. King Saud Univ. - Comput. Inf. Sci., 2022, doi: 10.1016/j.jksuci.2022.03.012.
L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2, pp. 123–140, 1996, doi: 10.1007/bf00058655.
L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: 10.1023/A:1010933404324.
P. Domingos, “A few useful things to know about machine learning,” Commun. ACM, vol. 55, no. 10, pp. 78–87, Oct. 2012, doi: 10.1145/2347736.2347755.
B. Agarwal and N. Mittal, “Text classification using machine learning methods-a survey,” Adv. Intell. Syst. Comput., vol. 236, pp. 701–709, 2014, doi: 10.1007/978-81-322-1602-5_75.
A. Subasi, Practical Machine Learning for Data Analysis Using Python. Elsevier, 2020. doi: 10.1016/B978-0-12-821379-7.00008-4.
H. Belyadi and A. Haghighat, Machine Learning Guide for Oil and Gas Using Python. 2021. doi: 10.1016/c2019-0-03617-5.
Y. Yang, “An evaluation of statistical approaches to text categorization,” Inf. Retr. Boston., vol. 1, no. 1–2, pp. 69–90, 1999, doi: 10.1023/a:1009982220290.
S. Wang and C. D. Manning, “Baselines and bigrams: Simple, good sentiment and topic classification,” in 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference, 2012, vol. 2, pp. 90–94.
R. Can, S. Kocaman, and C. Gokceoglu, “A comprehensive assessment of XGBoost algorithm for landslide susceptibility mapping in the upper basin of Ataturk dam, Turkey,” Appl. Sci., vol. 11, no. 11, p. 4993, 2021, doi: 10.3390/app11114993.
Y. Freund and R. E. Schapire, “Experiments with a New Boosting Algorithm,” Proc. 13th Int. Conf. Mach. Learn., pp. 148–156, 1996, doi: 10.1.1.133.1040.
W. Wang and D. Sun, “The improved AdaBoost algorithms for imbalanced data classification,” Inf. Sci. (Ny)., vol. 563, pp. 358–374, Jul. 2021, doi: 10.1016/j.ins.2021.03.042.
A. Diera et al., “Bag-of-Words vs. Sequence vs. Graph vs. Hierarchy for Single- and Multi-Label Text Classification,” 2022. doi: https://doi.org/10.48550/arXiv.2204.03954.
A. Pinkus, “Approximation theory of the MLP model in neural networks,” Acta Numer., vol. 8, pp. 143–195, 1999, doi: 10.1017/S0962492900002919.
Twintproject, “Twint,” GitHub, 2022. https://github.com/twintproject/twint (accessed Aug. 01, 2022).
F. Najar and N. Bouguila, “Emotion recognition: A smoothed Dirichlet multinomial solution,” Eng. Appl. Artif. Intell., vol. 107, p. 104542, Jan. 2022, doi: 10.1016/j.engappai.2021.104542.
A. Maghraby and H. Ali, “Modern Standard Arabic mood changing and depression dataset,” Data Br., vol. 41, p. 107999, Apr. 2022, doi: 10.1016/j.dib.2022.107999.
A. Krouska, C. Troussas, and M. Virvou, “The effect of preprocessing techniques on Twitter sentiment analysis,” in IISA 2016 - 7th International Conference on Information, Intelligence, Systems and Applications, Dec. 2016, pp. 1–5. doi: 10.1109/IISA.2016.7785373.
M. A. Abid, S. Ullah, M. A. Siddique, M. F. Mushtaq, W. Aljedaani, and F. Rustam, “Spam SMS filtering based on text features and supervised machine learning techniques,” Multimed. Tools Appl., pp. 1–19, 2022, doi: 10.1007/s11042-022-12991-0.
K. Chen, Z. Zhang, J. Long, and H. Zhang, “Turning from TF-IDF to TF-IGM for term weighting in text classification,” Expert Syst. Appl., vol. 66, 2016, doi: 10.1016/j.eswa.2016.09.009.
D. Chicco and G. Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation,” BMC Genomics, vol. 21, no. 1, pp. 1–13, Jan. 2020, doi: 10.1186/s12864-019-6413-7.

No competing interests reported.

Download PDF

Version 1

posted

You are reading this latest preprint version

Prediction of People Sentiments on Twitter Using Machine Learning Classifiers During Russian-Ukrainian Conflict

Status:

Version 1

Abstract

Figures

1. Introduction

2. Related Works

2.1. Literature Review

2.2. Classification Methods

2.2.1. KNN

2.2.2. RF

2.2.3. DT

2.2.4. LR

2.2.5. NB

2.2.6. XGBoost

2.2.7. AdaBoost

2.2.8. MLP

3. Methodology

3.1. Data Collection

3.2. Preprocessing

3.3. Data Annotation

3.4. SMOTE

3.5. Feature Extraction

3.6. Data Splitting

3.7. Performance Metrics

4. Experimental Results

4.1. Comparative Analysis

4.2. Result Discussion

5. Conclusion

Declarations

References

Additional Declarations

Status:

Version 1

\(Accuracy\left(Acc\right)= \frac{TP+TN}{TP+TN+FP+FN}\)	(1)
\(Precision \left(Pr.\right)= \frac{TP}{TP+FP}\)	(2)
\(Recall \left(Re.\right)= \frac{TP}{TP+FN}\)	(3)
\(F1 Score=2x \frac{Pr. x Re.}{Pr. +Re.}\)	(4)
\(MCC= \frac{TP x TN-FP x FN}{\sqrt{\left(TP+FP\right)x \left(TP+FN\right)x\left(TN+FP\right)x(TN+FN)}}\)	(5)