This section reviews prior research on sentiment analysis techniques. By examining this body of work, we aim to gain insight into, and context for, the evolution of sentiment analysis.
Sentiment Analysis
Sentiment analysis is a method that uses NLP to automatically extract views, opinions, and feelings from text, speech, tweets, and database sources. In sentiment analysis, opinions in a text are labeled as "positive," "negative," or "neutral." It is also known as subjectivity analysis, opinion mining, or appraisal extraction [22]. Machine learning techniques that automatically classify sentiment are among the strategies proposed for assessing text sentiment, and both supervised and unsupervised learning techniques are effective in this respect [23]. Unlike typical lexicon-based methods, which depend on predefined sentiment lexicons, large language models can capture context, sarcasm, and other subtleties of sentiment expression. [24] investigated the performance of supervised and unsupervised machine learning approaches for sentiment analysis. Their results demonstrated that supervised techniques outperform unsupervised methods, such as lexicon-based algorithms, in terms of accuracy. However, obtaining adequate labeled training data for supervised techniques may be expensive and time-consuming. Recently, a large body of research has addressed sentiment analysis on Twitter. It was originally applied to binary classification, which assigns opinions to one of two classes, positive or negative [25].
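As an illustration of the lexicon-based methods mentioned above, the following minimal sketch sums word polarities from a predefined lexicon and maps the total to a positive, negative, or neutral label. The lexicon entries, the function name `lexicon_sentiment`, and the `threshold` parameter are assumptions made for this example and are not taken from any of the cited works.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# real sentiment lexicons contain thousands of scored entries.
LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "awful": -2, "hate": -2,
}


def lexicon_sentiment(text, threshold=0):
    """Label text by summing the lexicon polarity of each word."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# A sarcastic complaint is labeled positive because the scorer only sees
# the individual words "great" and "love", not the context.
sarcastic_label = lexicon_sentiment("Great, another delay. I love waiting.")
```

The sarcastic example shows why such a scorer can miss context: each word is looked up independently, so the overall label follows the surface vocabulary.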
Pak and Paroubek [26] proposed a model for classifying a tweet as neutral, positive, or negative. They built a Twitter corpus by collecting tweets through the Twitter API and labeling them automatically using emoticons. Using this corpus, they built a sentiment classifier based on the multinomial Naive Bayes method, with features such as n-grams and POS tags. Their training set was limited, however, because it contained only tweets with emoticons. [27] used a Naive Bayes bigram model and a Maximum Entropy model to classify tweets. They found that the Maximum Entropy model did not perform as well as the Naive Bayes model. [28] proposed a method for sentiment analysis of Twitter data based on distant supervision, with a training set consisting of tweets whose emoticons act as noisy labels. They built models using Naive Bayes, MaxEnt, and Support Vector Machines (SVM). Their feature space consisted of unigrams, bigrams, and POS tags. They found that SVM outperformed the other models and that unigrams were the more effective features. [29] developed a two-stage process for analyzing the sentiment of tweets. They first classified a tweet as objective or subjective, and then classified the subjective tweets as positive or negative. The feature space included retweets, hashtags, links, punctuation, and exclamation marks, in combination with features such as the prior polarity of words and POS tags [30].
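The distant-supervision setup described above, in which emoticons act as noisy labels for a Naive Bayes classifier over unigram and bigram features, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the emoticon sets, the tokenization, the add-one smoothing, and all function names are hypothetical choices made here, not the configurations used in the cited studies.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical emoticon sets used as noisy labels (an assumption).
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}
EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS


def noisy_label(tweet):
    """Return 'pos' or 'neg' from emoticons; None if absent or conflicting."""
    tokens = set(tweet.split())
    has_pos = bool(tokens & POSITIVE_EMOTICONS)
    has_neg = bool(tokens & NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def features(tweet):
    """Unigrams and bigrams, with emoticons removed so labels do not leak."""
    tokens = [t for t in tweet.split() if t not in EMOTICONS]
    words = re.findall(r"[a-z']+", " ".join(tokens).lower())
    bigrams = [a + "_" + b for a, b in zip(words, words[1:])]
    return words + bigrams


def train(tweets):
    """Count features per noisy class; unlabeled tweets are skipped."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for tweet in tweets:
        label = noisy_label(tweet)
        if label is None:
            continue
        class_counts[label] += 1
        for feat in features(tweet):
            feature_counts[label][feat] += 1
            vocab.add(feat)
    return class_counts, feature_counts, vocab


def predict(model, tweet):
    """Multinomial Naive Bayes with add-one smoothing, in log space."""
    class_counts, feature_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total = sum(feature_counts[label].values())
        for feat in features(tweet):
            if feat in vocab:  # ignore features never seen in training
                count = feature_counts[label][feat]
                score += math.log((count + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training tweets (invented for this example).
training_tweets = [
    "I love this phone :)",
    "what a great day :D",
    "I hate waiting in line :(",
    "this movie was awful :-(",
    "just landed in Paris",  # no emoticon, so it receives no label
]
model = train(training_tweets)
```

Calling `predict(model, "love this great phone")` on this toy model returns `"pos"`. Tweets without emoticons never enter training, which reflects the limitation of emoticon-only training sets noted above.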
Bifet and Frank [31] used real-time data from Twitter's Firehose API, which provided access to every user's public tweets. They evaluated the Hoeffding tree, stochastic gradient descent (SGD), and multinomial Naive Bayes. They concluded that the SGD-based model worked best with a moderate learning rate. [32] developed a three-way model that classifies sentiment as positive, negative, or neutral. They experimented with a unigram model, a feature-based model, and a tree kernel-based model. For the tree kernel-based model, they represented tweets as trees. The feature-based model uses only 100 features, whereas the unigram model uses around 10,000. They concluded that features combining the prior polarity of words with their part-of-speech (POS) tags are the most important for classification. Of the three models, the tree kernel-based model performed best.
[33] proposed a method that uses user-defined Twitter hashtags as sentiment-type labels. It combines punctuation, single words, n-grams, and patterns as feature types into a single feature vector for sentiment classification. The k-nearest neighbor method was used to assign sentiment labels to the feature vectors constructed for each example in the training and test sets.
Liang and Dai [34] gathered Twitter data using the Twitter API. Their training data fall into three categories: camera, video, and mobile. The data are labeled as positive, negative, or non-opinion, and tweets containing opinions were filtered. A unigram Naive Bayes model was applied, using the simplifying Naive Bayes independence assumption. They eliminated unnecessary features with a feature selection method based on Mutual Information and Chi-Square. Finally, the orientation of a tweet, positive or negative, is predicted. [35] presented variants of the Naive Bayes classifier for detecting the polarity of English tweets. Two variants were built: Baseline (trained to classify tweets as positive, negative, or neutral) and Binary (which uses a polarity lexicon to classify tweets as positive or negative and excludes neutral tweets). The features considered by the classifiers were lemmas (nouns, verbs, adjectives, and adverbs), polarity lexicons, multiwords from different sources, and valence shifters. [36] used the bag-of-words method for sentiment analysis. In this method, the relationships between words are not considered, and a text is treated as a collection of words [30]. To determine the sentiment of the whole text, the sentiment of each word was identified, and the values were combined using different aggregation methods. [37] used the lexical database WordNet to determine a word's emotional content along several dimensions. They built a WordNet distance measure and determined the semantic orientation of words.
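The Chi-Square feature elimination mentioned above can be illustrated with the standard chi-square statistic for a 2x2 contingency table of term presence against class. The sketch below is an assumption-laden example rather than the procedure of [34]: the toy documents, the function names `chi_square_scores` and `select_features`, and the choice to keep the top `k` terms are all hypothetical.

```python
from collections import Counter


def chi_square_scores(docs):
    """Score each term by chi-square association with a binary class label.

    docs is a list of (tokens, label) pairs with exactly two distinct labels.
    """
    n = len(docs)
    target = sorted({label for _, label in docs})[0]
    n_target = sum(1 for _, label in docs if label == target)
    with_term = Counter()         # documents containing the term
    with_term_target = Counter()  # ... that also belong to the target class
    for tokens, label in docs:
        for term in set(tokens):
            with_term[term] += 1
            if label == target:
                with_term_target[term] += 1
    scores = {}
    for term, present in with_term.items():
        a = with_term_target[term]  # term present, target class
        b = present - a             # term present, other class
        c = n_target - a            # term absent, target class
        d = n - n_target - b        # term absent, other class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi_square_scores(docs)
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Toy labeled documents (invented for this example).
toy_docs = [
    (["great", "camera"], "pos"),
    (["great", "video"], "pos"),
    (["poor", "camera"], "neg"),
    (["poor", "mobile"], "neg"),
]
selected = select_features(toy_docs, 2)
```

On this toy data the opinion words "great" and "poor" receive the highest scores, while "camera" scores zero because it occurs equally often in both classes and so carries no class information.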
Xia et al. [38] used an ensemble framework for sentiment classification, combining different feature sets and classification methods. They used two types of feature sets, part-of-speech information and word relations, and three base classifiers: Naive Bayes, Maximum Entropy, and Support Vector Machines. They applied ensemble methods for sentiment classification, such as fixed combination, weighted combination, and meta-classifier combination, and observed an improvement in accuracy. [39, 40] highlighted the challenges of mining opinions from tweets and an effective technique for doing so. Opinion mining on Twitter is challenging because of spam and highly variable language. The general model for sentiment analysis is shown in Fig. 1.
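The fixed and weighted combination schemes named above can be sketched as simple operations on the class probabilities produced by the base classifiers. In the example below, the probability values, the weights, and the function names are assumptions chosen for illustration; they are not results or settings from [38], and the meta-classifier combination is not shown.

```python
def weighted_combination(predictions, weights):
    """Combine per-classifier class probabilities with a weighted average."""
    total = sum(weights)
    labels = predictions[0].keys()
    return {
        label: sum(w * p[label] for w, p in zip(weights, predictions)) / total
        for label in labels
    }


def fixed_combination(predictions):
    """Fixed rule: every base classifier gets the same weight."""
    return weighted_combination(predictions, [1.0] * len(predictions))


def decide(combined):
    """Return the label with the highest combined probability."""
    return max(combined, key=combined.get)


# Hypothetical outputs of three base classifiers for a single text.
naive_bayes = {"pos": 0.60, "neg": 0.40}
max_entropy = {"pos": 0.55, "neg": 0.45}
svm = {"pos": 0.20, "neg": 0.80}
base_outputs = [naive_bayes, max_entropy, svm]

fixed = fixed_combination(base_outputs)
weighted = weighted_combination(base_outputs, [0.45, 0.45, 0.10])
```

With these invented numbers the fixed rule yields "neg" and the weighted rule yields "pos", which shows how the choice of weights can change the ensemble decision. In practice, the weights would typically be tuned on held-out data.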
The following steps are needed for sentiment analysis of Twitter data:
To the best of our knowledge, our work is the first study to perform a detailed analysis of positive and negative comments on quantum computing technology. We contribute to the literature by providing a snapshot of early public reactions to this emerging technology.