Sentiment Analysis Model of Imbalanced Comment Texts Based on BiLSTM

doi:10.21203/rs.3.rs-2434519/v1


Research Article


This work is licensed under a CC BY 4.0 License

Version 1


This paper aims to improve sentiment analysis of imbalanced comment texts by combining deep learning with class imbalance learning methods, and proposes a sentiment analysis method for imbalanced comment texts based on the BiLSTM framework. The method targets the case in which negative samples outnumber positive samples. When the degree of imbalance is low, the minority class samples are oversampled with Adaptive Synthetic Sampling (ADASYN), and a CNN-BiLSTM model with a Sigmoid output layer performs the sentiment classification. When the degree of imbalance is high, the majority class samples are sampled multiple times until the original dataset is divided into several datasets with low imbalance. Adaptive Synthetic Sampling is then applied to the minority class samples in each group to balance it, and a BiLSTM model is trained on each group of training data. Finally, ensemble learning combines the models to obtain the final sentiment classification results. Experimental results show that the proposed method outperforms traditional sentiment analysis methods for imbalanced comment texts.

sentiment analysis

ensemble learning

imbalanced data

text classification

BiLSTM

Sentiment analysis refers to analyzing subjective, emotionally colored texts, mining the emotional tendencies they contain, and classifying them by sentiment. It draws on artificial intelligence, machine learning, data mining, natural language processing and other research fields. The main approaches are sentiment analysis based on sentiment dictionaries[1] and sentiment analysis based on traditional machine learning[2]. Because texts are complex and variable, traditional machine learning methods cannot learn the deep semantic information in the text, which leads to inaccurate classification in some sentiment analysis tasks[4]. Deep learning methods offer better feature representation and higher classification ability, so sentiment analysis based on deep learning has attracted researchers at home and abroad in recent years.

Popular deep learning methods include Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), Gated Recurrent Unit (GRU), Bidirectional Gated Recurrent Unit (BiGRU), and Convolutional Neural Network (CNN). Tang D[3] used GRU for short text sentiment classification. Yao Ni[4] proposed an online review text sentiment classification model based on BERT (Bidirectional Encoder Representation from Transformers) and BiGRU, which classified better than the traditional deep learning models it was compared with. Guo Xianda[5] combined the advantages of BiLSTM and CNN to propose an online consumer review sentiment analysis method based on CNN-BiLSTM, which has a certain degree of domain scalability.

Most existing sentiment analysis studies assume that samples of different sentiment polarity are balanced. In practice, however, class imbalance is widespread. On e-commerce platforms in particular, managers limit the number of comments that can be displayed for an emotion category, so the resulting comment data is often imbalanced, and this imbalance often degrades or even defeats traditional machine learning classification models[6]. Sentiment analysis of class-imbalanced comment text data has therefore become a challenging practical problem in the field of natural language processing.

A variety of methods have been proposed to address class imbalance, and they can be broadly divided into three categories. The first works at the dataset level and includes undersampling and oversampling methods. These modify the data itself, reducing the imbalance by changing the size and feature distribution of the training set. Because undersampling discards some majority class samples, it may throw away potentially useful information, so the classifier learns less and the final classification suffers. Random oversampling simply copies minority class samples, and the classifier is very likely to learn sample information from the test set in advance during training, which makes the trained classifier prone to overfitting. To address these shortcomings, researchers have proposed methods such as Synthetic Minority Oversampling (SMOTE), Borderline-SMOTE, and Adaptive Synthetic Sampling (ADASYN). Sampling methods are often applied in sentiment analysis research based on machine learning, where model training and prediction use the class-balanced training set[7–9]. The second category works at the algorithmic level, for example cost-sensitive learning[10–11], which modifies the error cost function while the classifier is constructed so that misclassifying a minority class sample costs much more than misclassifying a majority class sample[12]. The third category is ensemble learning[6,13–16], which trains multiple single classifiers and combines them, generally classifying by majority voting.

Currently, the more popular way to classify class-imbalanced comment texts is to combine deep learning with class imbalance learning methods. Mukherjee A[17] explored the performance of simple RNN, GRU, LSTM, and BiLSTM on imbalanced Amazon review datasets, selected the best performing model, and then combined it with oversampling before training to explore the influence of oversampling on the model. Omara E[18] proposed an Arabic sentiment analysis method based on deep convolutional neural networks, and studied the performance of convolutional neural networks under cost-sensitive learning, undersampling and oversampling. Wen Tingxin[19] established an LWC-BiLSTM sentiment analysis model for imbalanced comment texts, which uses a topic model to sample the original comment set as the imbalance treatment, extracts features with the Word2vec word vector model, extracts local features of comments with a convolutional neural network, and finally performs sentiment classification with BiLSTM. Yin Hao[20] obtained multiple balanced training corpora of Weibo comments through undersampling, trained an LSTM model on each corpus, and finally fused the LSTM models to predict the sentiment class. Zhang Zhiwu[21] proposed an adaptive imbalanced data sentiment analysis method based on the LSTM deep learning framework, which oversamples minority class samples when the degree of imbalance is low, applies multi-group balanced undersampling to majority class samples when the degree of imbalance is high, and finally uses LSTM ensemble learning on the class-balanced datasets to obtain the final sentiment classification.

The above studies have achieved good results, but some shortcomings remain. The studies in [4–5] do not explore the performance of the proposed models on imbalanced data. In binary sentiment classification, datasets tend to cover the case in which positive samples outnumber negative samples. Although this matches most crawled comment data, the case in which negative samples outnumber positive samples cannot be ignored: beyond good reviews, merchants also want to know in which aspects a product with many bad reviews falls short, so that they can take corresponding measures. The study in [17] adopts only oversampling for imbalance treatment and does not consider undersampling. The undersampling methods in [20–21] are very likely to lose important text information that helps classification, which affects the sentiment classification results.

Inspired by the above literature, and on the premise of not sacrificing the majority class, this paper addresses the binary sentiment polarity classification of imbalanced comment texts, focuses on the practical case in which negative samples outnumber positive samples, and proposes a sentiment analysis model for imbalanced comment texts based on the BiLSTM framework. When the degree of imbalance is low, Adaptive Synthetic Sampling is applied to the minority class samples, a CNN is added on top of BiLSTM to form the CNN-BiLSTM model for deep learning training, and a Sigmoid layer performs the final sentiment classification. When the degree of imbalance is high, the minority class samples are kept unchanged and the majority class samples are sampled multiple times, each sample being two to three times the size of the minority class, until the dataset is divided into several datasets with low imbalance. Adaptive Synthetic Sampling is then applied to the minority class samples in each group to balance it, a BiLSTM model is trained on each group of training data, and the final sentiment polarity is obtained through ensemble learning at prediction time.

The paper is organized as follows: Section II details the model construction, Section III gives the experiment and analyzes the results, and Section IV summarizes the paper and proposes future research ideas.

Taking online product comment texts as an example, this paper designs a sentiment analysis model based on the BiLSTM framework, as shown in Fig. 1. First, the corpus is preprocessed, which includes word segmentation, stopword removal, word vector representation, and sentiment label assignment. Second, the comment texts are vectorized, that is, converted into the input representation of the deep learning model. Then, a training method is selected according to the degree of class imbalance. When the degree of imbalance is low (the number of majority class samples is less than three times the number of minority class samples), the minority class samples in the training set are oversampled with Adaptive Synthetic Sampling until they equal the majority class samples in number, which balances the training set. The balanced set is used to train the CNN-BiLSTM deep learning model, in which the added convolutional layers extract text features better. Finally, the model makes sentiment classification predictions on the test set, whose class imbalance is left unmodified. When the degree of imbalance is high (the number of majority class samples is greater than or equal to three times the number of minority class samples), a divide-and-conquer approach transforms the highly imbalanced dataset into several datasets with low imbalance. Specifically, the minority class samples are kept unchanged, the majority class samples are sampled multiple times, and Adaptive Synthetic Sampling is applied to the minority class samples in each group to balance it. A BiLSTM model is then trained on each group of training data. This paper adopts a soft voting strategy: the outputs of the multiple models on the test set are arithmetically averaged to obtain the final predicted classification on the test set.
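The choice between the two training routes can be sketched as a small dispatch function. This is only an illustration of the threshold rule stated above; the function name `imbalance_strategy` and the returned strings are hypothetical and not part of the paper's code.

```python
def imbalance_strategy(n_majority: int, n_minority: int, threshold: float = 3.0) -> str:
    """Pick the training route from the majority-to-minority ratio (hypothetical helper)."""
    if n_minority <= 0:
        raise ValueError("the minority class must contain at least one sample")
    ratio = n_majority / n_minority
    if ratio < threshold:
        # Low imbalance: ADASYN on the minority class, then one CNN-BiLSTM.
        return "adasyn + cnn-bilstm"
    # High imbalance: split the majority class into groups, ADASYN per group,
    # one BiLSTM per group, soft-voting ensemble at prediction time.
    return "split majority + adasyn + bilstm ensemble"


low_route = imbalance_strategy(2000, 1000)
high_route = imbalance_strategy(7000, 1000)
```

With these hypothetical counts, a 2:1 dataset takes the single-model route and a 7:1 dataset takes the ensemble route.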

A. Word2vec

Word2Vec[22] is a tool that Google open-sourced in 2013. It trains efficiently on vocabularies of millions of words and datasets of hundreds of millions of words, and the word vectors it produces can measure the similarity between words. It contains two training models, Skip-gram and CBOW. Skip-gram uses each current word as the input to a log-linear classifier with a continuous projection layer and predicts the words within a certain range before and after the current word. CBOW does the opposite: it predicts the current word from the words surrounding it.
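The difference between the two training models is easiest to see in the training pairs each one builds from a sentence. The sketch below only constructs those pairs for a toy English sentence; it does not train any vectors, and the function names are hypothetical.

```python
def skipgram_pairs(tokens, window):
    """Skip-gram: the current word is the input, each neighbour is a target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_pairs(tokens, window):
    """CBOW: the neighbours together are the input, the current word is the target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs


tokens = ["the", "food", "was", "cold"]
sg = skipgram_pairs(tokens, window=1)
cb = cbow_pairs(tokens, window=1)
```

For the word "food" with a window of 1, Skip-gram yields the two pairs `("food", "the")` and `("food", "was")`, while CBOW yields the single pair `(["the", "was"], "food")`.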

This paper uses jieba for word segmentation of the comment texts and the HIT stopword list to remove stopwords and punctuation marks. Word vectors are trained with the skip-gram model of Word2Vec in the Gensim package for Python. One-hot vectors serve as the label vectors for sentiment classification: [1,0] and [0,1] represent positive comments and negative comments, respectively. The label vector of each comment in the training dataset is set according to its sentiment polarity.

B. ADASYN

The collected corpus of product comment texts is often imbalanced. Directly applying simple oversampling easily leads to overfitting of the model, while simple undersampling loses information from the majority class samples. SMOTE is currently a popular oversampling method; its idea is to generate new samples by interpolation. SMOTE can effectively reduce overfitting, but it takes into account neither the distribution of the data and the influence of the majority class on the minority class, nor the specificity of individual sample points. Therefore, to preserve as much sample information as possible, this paper adopts Adaptive Synthetic Sampling (ADASYN), which automatically determines from the sample distribution how many synthetic samples to generate for each minority class sample: the more majority class samples surround a minority class sample, the more synthetic samples the algorithm generates for it, rather than synthesizing the same number for every minority class sample as SMOTE does.
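The weighting idea behind ADASYN can be sketched in a few lines of NumPy. This is a simplified illustration on generic numeric feature vectors, not the implementation used in the paper. Two choices are assumptions made here for brevity: the owner of each synthetic sample is drawn at random in proportion to its weight (instead of rounding a per-sample count), and interpolation partners are taken from the nearest minority class neighbours. The function name `adasyn_sketch` and the toy data are hypothetical.

```python
import numpy as np


def adasyn_sketch(X_min, X_maj, k=5, seed=0):
    """Generate synthetic minority samples until both classes have equal size."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    n_min, n_maj = len(X_min), len(X_maj)
    if n_min < 2:
        raise ValueError("need at least two minority samples to interpolate")
    n_new = n_maj - n_min
    if n_new <= 0:
        return np.empty((0, X_min.shape[1]))

    X_all = np.vstack([X_min, X_maj])
    is_majority = np.arange(len(X_all)) >= n_min
    k_all = min(k, len(X_all) - 1)

    # r_i: fraction of majority samples among the k nearest neighbours of x_i.
    ratios = np.empty(n_min)
    for i in range(n_min):
        dist = np.linalg.norm(X_all - X_min[i], axis=1)
        dist[i] = np.inf  # a sample is not its own neighbour
        ratios[i] = is_majority[np.argsort(dist)[:k_all]].mean()

    # Normalise r_i into a distribution; fall back to uniform weights if no
    # minority sample has any majority neighbour.
    total = ratios.sum()
    weights = ratios / total if total > 0 else np.full(n_min, 1.0 / n_min)

    # Assumption: draw the owner of each synthetic sample from the weights so
    # that exactly n_new samples are produced.
    owners = rng.choice(n_min, size=n_new, p=weights)
    k_min = min(k, n_min - 1)
    synthetic = np.empty((n_new, X_min.shape[1]))
    for row, i in enumerate(owners):
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        dist[i] = np.inf
        j = rng.choice(np.argsort(dist)[:k_min])  # a nearby minority sample
        # Interpolate between x_i and its neighbour with a random factor in [0, 1).
        synthetic[row] = X_min[i] + rng.random() * (X_min[j] - X_min[i])
    return synthetic


data_rng = np.random.default_rng(1)
X_maj = data_rng.normal(0.0, 1.0, size=(60, 2))
X_min = data_rng.normal(2.0, 1.0, size=(20, 2))
X_new = adasyn_sketch(X_min, X_maj)
```

Minority samples that sit deep inside majority territory receive larger weights and therefore spawn more synthetic neighbours, which is the behaviour that distinguishes ADASYN from SMOTE in the description above.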

C. CNN And BiLSTM

CNNs are often used for image classification, but they have also been shown to be effective in text classification. Unlike RNNs, CNNs do not rely on the sequential elements of language; they learn from text by perceiving each word in a sentence individually and learning its relationship to the surrounding words. The basic structure has three parts: the first is the input layer; the second, which is the core of the CNN, is a combination of multiple convolutional layers and pooling layers; the third is a fully connected multilayer perceptron classifier[5].

The combination of convolution and pooling can not only greatly reduce the number and complexity of the parameters of the model, but also extract more abstract features from the comment texts, as well as the position information of words and the relevant semantic information between words. After repeated experiments, the CNN designed in this paper uses three convolutional layers and one pooling layer.
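How a convolution followed by pooling turns a sequence into a shorter, more abstract feature vector can be shown on a toy one-dimensional signal. This sketch uses a single hand-written filter, no padding, and a ReLU activation; the real model works on word-vector sequences with many learned filters, and its padding mode is not stated in the text, so those details are assumptions.

```python
def conv1d_valid(sequence, kernel):
    """Slide the kernel over the sequence without padding, then apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        value = sum(w * x for w, x in zip(kernel, window))
        out.append(max(0.0, value))
    return out


def max_pool1d(features, size):
    """Keep the largest activation in each non-overlapping block."""
    return [max(features[i:i + size]) for i in range(0, len(features) - size + 1, size)]


signal = [0.0, 1.0, 3.0, 1.0, 0.0, -2.0, 0.0, 1.0]
kernel = [1.0, 0.0, -1.0]  # responds where the signal is falling
features = conv1d_valid(signal, kernel)  # [0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
pooled = max_pool1d(features, size=2)    # [0.0, 3.0, 0.0]
```

The eight input values shrink to six activations after the convolution and to three after pooling, while the position of the strongest local pattern is roughly preserved.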

Unlike LSTM, BiLSTM takes context in both directions into account, so the CNN is added to BiLSTM. The experiment also uses the Dropout strategy[23] to prevent the model from overfitting, which provides a degree of regularization.

D. Bagging

In this paper, the dataset with a high degree of imbalance has a negative-to-positive ratio of 7:1. The positive samples of the minority class are kept unchanged, and the negative samples of the majority class are sampled three times, which divides the data into three datasets with low imbalance in which the number of negative comments is about 2.3 times the number of positive comments. Next, Adaptive Synthetic Sampling is applied to the positive samples, which together with the negative samples form a balanced training set for each group, and a BiLSTM model is trained on each group. Each BiLSTM model acts as an independent individual learner. Ensemble learning adopts the Bagging method, which improves the stability of training while reducing variance, and the sampling enhances the diversity among individual learners.
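The division of the majority class can be sketched as follows. The text does not say whether the three samples overlap; this sketch assumes three disjoint groups, which is what reproduces the stated ratio of about 2.3 (7/3). The counts 7000 and 1000 and the helper name `split_majority` are illustrative.

```python
import random


def split_majority(majority, n_groups, seed=0):
    """Shuffle the majority samples and deal them into n_groups disjoint subsets."""
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[g::n_groups] for g in range(n_groups)]


negatives = [f"neg_{i}" for i in range(7000)]  # majority class (placeholder ids)
positives = [f"pos_{i}" for i in range(1000)]  # minority class, reused in every group

groups = split_majority(negatives, n_groups=3)
ratios = [len(group) / len(positives) for group in groups]
```

Each group is then paired with all 1000 positive samples, balanced with Adaptive Synthetic Sampling, and used to train one BiLSTM learner.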

At prediction time, each trained BiLSTM model first classifies the test sample, and the three predicted probability values are then arithmetically averaged to obtain the final sentiment prediction probability of the test sample. Finally, the probability is converted to a sentiment label: a value greater than 0.5 is recorded as 1, and any other value is recorded as 0, which gives the final sentiment label predictions for the test set.
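The averaging and thresholding step is short enough to write out directly. The probabilities below are made-up values for three models and three test samples; `soft_vote` is a hypothetical helper name.

```python
def soft_vote(probabilities_per_model, threshold=0.5):
    """Average the per-model probabilities sample by sample, then threshold."""
    n_models = len(probabilities_per_model)
    averaged = [sum(column) / n_models for column in zip(*probabilities_per_model)]
    labels = [1 if p > threshold else 0 for p in averaged]
    return averaged, labels


model_outputs = [
    [0.90, 0.45, 0.20],  # model 1, one probability per test sample
    [0.80, 0.95, 0.10],  # model 2
    [0.70, 0.45, 0.30],  # model 3
]
averaged, labels = soft_vote(model_outputs)
```

The second sample shows why soft voting can differ from majority voting: two of the three models fall below 0.5, yet the averaged probability is about 0.62, so the ensemble assigns label 1.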

All experiments in this paper were conducted on a computer with a 1.80GHz Intel Core i5 processor, 4GB of RAM, and 64-bit Windows 11. The data preprocessing and all algorithms used for sentiment classification run in a Python environment and are implemented with the TensorFlow framework.

A. Datasets and Data Pre-Processing

This paper selects two public datasets, a Chinese takeaway comment dataset and a Chinese hotel comment dataset, and also crawls 12663 reviews about masks from more than a dozen stores in Jingdong Mall. The crawled dataset needs a manually added label column: all comments with a star rating of three stars are deleted, comments with a rating below three stars are labeled 0, and comments with a rating above three stars are labeled 1. The three datasets were then merged as the original corpus for the experiments in this paper. After data preprocessing, the size and sentiment distribution of each dataset are shown in Table I.

Table 1. Experimental dataset

polarity	Hotel reviews	Take away reviews	Mask reviews	Total
negative	2437	7836	2867	13140
positive	5314	3843	9316	18473
total	7751	11679	12183	31613
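The labeling rule for the crawled mask reviews can be sketched as a small mapping. It assumes that ratings above three stars receive label 1 (positive) and ratings below three stars receive label 0 (negative); the example reviews and the function name `label_from_stars` are invented for illustration.

```python
def label_from_stars(stars):
    """Map a star rating to a sentiment label; None means the review is dropped."""
    if stars == 3:
        return None  # three-star reviews are deleted
    return 0 if stars < 3 else 1


reviews = [
    ("fits well", 5),
    ("average", 3),
    ("strap broke", 1),
    ("good value", 4),
    ("strange smell", 2),
]
labelled = [
    (text, label_from_stars(stars))
    for text, stars in reviews
    if label_from_stars(stars) is not None
]
```

Of the five hypothetical reviews, four survive the filter and receive a binary label.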

B. Evaluation

Accuracy is the commonly used evaluation index in sentiment analysis research, but this paper addresses binary sentiment classification of imbalanced data, so G_Mean and F_measure, the common indices for imbalanced data sentiment classification, are selected as well.

The evaluation indicators are calculated by the following formula:

$$Accuracy=\frac{{TP+TN}}{{TP+FP+TN+FN}}$$

$$Sensitivity=\frac{{TP}}{{TP+FN}}$$

$$Specificity=\frac{{TN}}{{TN+FP}}$$

$$G{\text{\_}}Mean=\sqrt {Specificity * Sensitivity}$$

$$F{\text{\_}}measure=\frac{{2 \times Recall \times Precision}}{{Recall+Precision}}$$

Here TP is the number of positive comments correctly assigned to the positive category, FP is the number of negative comments incorrectly assigned to the positive category, FN is the number of positive comments incorrectly assigned to the negative category, and TN is the number of negative comments correctly assigned to the negative category. In the F_measure formula, Recall is the same quantity as Sensitivity, and Precision is TP / (TP + FP).
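A short calculation illustrates why Accuracy alone is misleading on imbalanced data. The confusion-matrix counts below are hypothetical (100 positive and 700 negative test comments, with a classifier that predicts "negative" almost always); Precision and Recall use their standard definitions.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Compute the evaluation indices from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # identical to Recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    g_mean = math.sqrt(specificity * sensitivity)
    f_measure = 2 * sensitivity * precision / (sensitivity + precision)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "f_measure": f_measure,
    }


metrics = imbalance_metrics(tp=10, fp=5, tn=695, fn=90)
```

With these counts Accuracy is about 0.88, yet G_Mean is about 0.32 and F_measure about 0.17, because only 10 of the 100 positive comments are recognized.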

C. Experimental Design

Before the sentiment classification experiment, the ratio of negative comments to positive comments was set to 3:1 for the low-imbalance case and 7:1 for the high-imbalance case. The corresponding numbers of positive and negative comments are sampled from the original corpus and divided into training set, validation set and test set in the ratio 6:2:2.
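The sampling and splitting step can be sketched as below. The sketch assumes that each class is split 6:2:2 separately, so that every subset keeps the target ratio; the text does not state this explicitly. Pool sizes, the number of positive comments drawn, and the name `sample_and_split` are hypothetical.

```python
import random


def sample_and_split(negatives, positives, ratio, n_positive, seed=0):
    """Draw ratio:1 negatives to positives, then split each class 6:2:2."""
    rng = random.Random(seed)
    neg = rng.sample(negatives, ratio * n_positive)
    pos = rng.sample(positives, n_positive)
    splits = {"train": [], "val": [], "test": []}
    for label, items in ((0, neg), (1, pos)):
        n_train = len(items) * 6 // 10
        n_val = len(items) * 2 // 10
        splits["train"] += [(x, label) for x in items[:n_train]]
        splits["val"] += [(x, label) for x in items[n_train:n_train + n_val]]
        splits["test"] += [(x, label) for x in items[n_train + n_val:]]
    return splits


negatives = [("neg", i) for i in range(8000)]  # placeholder comment ids
positives = [("pos", i) for i in range(2000)]
splits = sample_and_split(negatives, positives, ratio=7, n_positive=1000)
train_neg = sum(1 for _, label in splits["train"] if label == 0)
train_pos = sum(1 for _, label in splits["train"] if label == 1)
```

In this example the training set holds 4200 negative and 600 positive comments, so the 7:1 ratio carries over to the training, validation and test sets alike.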

Because different selections of training and test samples can produce very different results, each experiment is repeated ten times and the results are averaged to reduce the experimental error. To verify the performance of the proposed model based on the BiLSTM framework, machine learning and deep learning algorithms are compared in terms of G_mean and F_measure under different degrees of class imbalance. The machine learning algorithms are SVM (Support Vector Machine), Random Forest, Naive Bayes and Logistic Regression, and the deep learning algorithms compared with BiLSTM are CNN, LSTM, and GRU.

D. Experimental parameter design

After parameter tuning, the hyperparameters of the CNN and Word2Vec models are shown in Table II and Table III, respectively. The machine learning algorithm parameters are set to the default values in sklearn. To keep the models comparable during training, each neural network is given two hidden layers, and the Keras callback functions ReduceLROnPlateau and EarlyStopping are used to adjust the learning rate and stop training. The specific parameter settings are shown in Table IV.

Table 2. CNN hyperparameter settings

Neural network layer	Tunable parameters	Value
The first layer of convolution	neurons	256
	convolution kernel size	5
	activation function	relu
Dropout	convolution kernel size	5
The second layer of convolution	neurons	128
	convolution kernel size	5
	activation function	relu
The third layer of convolution	neurons	32
	convolution kernel size	3
	activation function	tanh

Table 3. Word2Vec model parameter settings

Tunable parameters	Value
algorithm	Skip-gram
vector size	200
min_count	3
window	3

Table 4. Deep learning Model parameter settings

Tunable parameters	Value
number of neurons	32
optimizer	adam
loss function	binary_crossentropy
dropout	0.4
batch size	32
epoch	10

E. Experimental result

To ensure a fair comparison, all classification algorithms in the experiment use the same imbalanced data processing method. When the degree of imbalance is low, Adaptive Synthetic Sampling is applied to the minority class samples. When it is high, the minority class samples are kept unchanged, the majority class samples are sampled multiple times, and Adaptive Synthetic Sampling is applied to the minority class samples in each group to balance it.

1) Comparison of small proportion imbalanced sentiment analysis methods

Following the above experimental design, the different classification algorithms were run on the small scale imbalanced comment dataset, and the comparison results are shown in Table V.

Table 5. Experimental Performance Comparison of Different Classification Algorithms for Small Scale Unbalanced Datasets

model	G_Mean	F_measure	Accuracy
CNN-BiLSTM	0.8529	0.8343	0.8816
CNN	0.8314	0.8072	0.8646
BiLSTM	0.8474	0.8037	0.8666
LSTM	0.8416	0.8027	0.8650
GRU	0.8444	0.8074	0.8676
SVM	0.7537	0.7085	0.8223
Naive Bayes	0.5832	0.5122	0.6384
Logistic regression	0.7492	0.7012	0.8189
Random Forest	0.8013	0.6864	0.8511

The experimental results show that the CNN-BiLSTM model proposed in this paper performs best. The four commonly used deep learning algorithms outperform the four machine learning algorithms overall, with mean scores higher by 11.93, 15.32 and 8.33 percentage points in G_mean, F_measure and Accuracy, respectively. Among the deep learning algorithms, CNN, BiLSTM, LSTM and GRU are almost the same in Accuracy and F_measure, but G_mean separates them: CNN performs worst and BiLSTM performs best.

Specifically, Naive Bayes has the lowest Accuracy (0.6384) as well as the lowest G_mean (0.5832) and F_measure (0.5122). The CNN-BiLSTM model proposed in this paper has the highest Accuracy (0.8816), and its G_mean (0.8529) and F_measure (0.8343) are also better than those of the other algorithms. Compared with BiLSTM, CNN-BiLSTM improves G_mean, F_measure and Accuracy by 0.55 (0.8529 − 0.8474), 3.06 (0.8343 − 0.8037) and 1.50 (0.8816 − 0.8666) percentage points. Compared with CNN, the improvements are 2.15 (0.8529 − 0.8314), 2.71 (0.8343 − 0.8072) and 1.70 (0.8816 − 0.8646) percentage points.
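The improvement figures quoted for Table V are differences between scores, expressed in hundredths of a score unit. The short check below recomputes them from the table values; the variable and function names are illustrative.

```python
# (G_Mean, F_measure, Accuracy) for each model, copied from Table V.
table_v = {
    "CNN-BiLSTM": (0.8529, 0.8343, 0.8816),
    "CNN": (0.8314, 0.8072, 0.8646),
    "BiLSTM": (0.8474, 0.8037, 0.8666),
    "LSTM": (0.8416, 0.8027, 0.8650),
    "GRU": (0.8444, 0.8074, 0.8676),
    "SVM": (0.7537, 0.7085, 0.8223),
    "Naive Bayes": (0.5832, 0.5122, 0.6384),
    "Logistic regression": (0.7492, 0.7012, 0.8189),
    "Random Forest": (0.8013, 0.6864, 0.8511),
}
deep = ["CNN", "BiLSTM", "LSTM", "GRU"]
classical = ["SVM", "Naive Bayes", "Logistic regression", "Random Forest"]


def mean_scores(models):
    """Average each metric column over the given models."""
    rows = [table_v[m] for m in models]
    return [sum(column) / len(rows) for column in zip(*rows)]


# Gap between the mean deep learning score and the mean machine learning score.
mean_gaps = [100 * (d - c) for d, c in zip(mean_scores(deep), mean_scores(classical))]

# Direct difference between CNN-BiLSTM and BiLSTM on each metric.
direct_gaps = [100 * (a - b) for a, b in zip(table_v["CNN-BiLSTM"], table_v["BiLSTM"])]
```

The mean gaps come out at roughly 11.93, 15.32 and 8.33, and the direct gaps at roughly 0.55, 3.06 and 1.50, matching the figures in the text.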

2) Comparison of large proportion imbalanced sentiment analysis methods

Under the condition that the degree of imbalance is high, the ensemble learning performance of the deep learning algorithm is compared according to the experimental design of this paper, as shown in Table VI.

Table 6. Experimental Performance Comparison of Deep Learning Ensembles for Large Scale Unbalanced Datasets

model	G_Mean	F_measure	Accuracy
BiLSTM	0.8401	0.8209	0.8570
LSTM	0.8389	0.8217	0.8566
CNN	0.8204	0.8100	0.8485
GRU	0.8091	0.8028	0.8386

Table 7. Experimental Performance Comparison of Machine Learning Algorithms for Large Scale Unbalanced Datasets

model	G_Mean	F_measure	Accuracy
SVM	0.7721	0.7310	0.8151
Naive Bayes	0.6344	0.5874	0.6506
Logistic regression	0.7657	0.7258	0.8083
Random Forest	0.7916	0.7726	0.8302

The results show that the multiple BiLSTM ensemble has the best overall performance, with G_mean of 0.8401 and F_measure of 0.8209, while GRU has the worst, with G_mean of 0.8091 and F_measure of 0.8028. BiLSTM and LSTM differ little in F_measure (0.8209 against 0.8217), possibly because the dataset used in this paper consists of short texts, which cannot fully reflect the advantage BiLSTM gains from considering context. In G_mean, BiLSTM is only slightly higher than LSTM, by about 0.143% ((0.8401 − 0.8389)/0.8389), which suggests that the BiLSTM model has some potential in short text sentiment classification. Table VII shows that machine learning algorithms perform worse than the deep learning ensemble model proposed in this paper on imbalanced comment text sentiment classification.

3) Comparison of unbalance treatment methods under the framework of BiLSTM

In order to verify the experimental performance of Adaptive Synthetic Sampling imbalance processing method for different imbalance situations under the framework of BiLSTM, four sentiment classification methods under data imbalance processing are designed:

The first is complete training + BiLSTM framework, that is, the training set is not balanced, all training data is taken for training, and the BiLSTM framework is used for deep learning training and classification prediction;

The second is the random oversampling + BiLSTM framework, that is, random oversampling processing is done for the samples of minority class in the unbalanced training set, combined with the samples of majority class to form a balanced training set, and the BiLSTM framework is used for deep learning training and classification prediction;

The third is the random undersampling + BiLSTM framework, that is, the random undersampling processing is carried out on the samples of majority class in the unbalanced training set, combined with the samples of minority class to form a balanced training set, and the BiLSTM framework is used for deep learning training and classification prediction;

The fourth is the Adaptive Synthetic Sampling + BiLSTM framework in this paper, that is, the corresponding imbalance processing and training prediction framework is selected according to the degree of imbalance of the training set.

When the imbalance ratio is low, all four imbalance treatment methods above are based on the CNN-BiLSTM model proposed in this paper. When the imbalance ratio is high, the first method is complete training + BiLSTM: 7000 negative comments and 1000 positive comments are used for training, and BiLSTM is used directly for deep learning training and classification prediction. The second to fourth methods are based on the multiple BiLSTM ensemble framework proposed in this paper. To further verify the performance of the multiple BiLSTM ensemble framework when the degree of imbalance is high, an additional sentiment classification method is added for comparison, Adaptive Synthetic Sampling + BiLSTM: Adaptive Synthetic Sampling is applied to the minority class samples, which together with the majority class samples form a balanced training set, and a single BiLSTM is used directly for deep learning training and classification prediction.

Figures 2 and 3 compare the experimental results of different methods in the G_mean and F_measure indicators under the low degree of imbalance and high degree of imbalance, respectively.

The experimental results show that the Adaptive Synthetic Sampling sentiment analysis method based on the BiLSTM framework proposed in this paper, by adopting a learning strategy matched to the degree of imbalance, maintains a clear performance advantage and has the best overall performance. Specifically, Fig. 2 shows that every CNN-BiLSTM method with imbalance treatment performs better than complete training without imbalance treatment. Compared with the fully trained CNN-BiLSTM method without balancing, the Adaptive Synthetic Sampling CNN-BiLSTM method proposed for low imbalance raises G_mean by 53.10 percentage points (0.8529 − 0.3219) and F_measure by 22.69 percentage points (0.8343 − 0.6074). It also improves to some extent on the CNN-BiLSTM method with random undersampling or random oversampling.

In Fig. 3, from the perspective of the imbalance treatment method, the Adaptive Synthetic Sampling multiple BiLSTM ensemble method proposed for high imbalance is better than simple random oversampling and random undersampling in both G_mean (0.8401) and F_measure (0.8209). From the model perspective, the multiple BiLSTM ensemble performs better than a single BiLSTM model: its G_mean is higher by at least 5.03% ((0.8401 − 0.7999)/0.7999) and its F_measure by at least 5.55% ((0.8209 − 0.7777)/0.7777). In addition, G_mean and F_measure show that when the training set is highly imbalanced, simple random undersampling and random oversampling may fail and can even be inferior to the fully trained BiLSTM model.

F. Result analysis

In the comparison of sentiment analysis methods at low imbalance, machine learning methods are affected more by the imbalanced data distribution than deep learning methods, and they perform poorly. The commonly used deep learning methods selected in this paper rank, from lowest to highest G_mean, as CNN, LSTM, GRU, and BiLSTM. CNNs have often been used for sentiment analysis in recent years, but the experimental results of this paper show that a CNN used alone does not perform well. LSTM uses a gate mechanism to solve the vanishing gradient problem of traditional RNNs, which lets it preserve and control long-term memory. GRU is a variant of LSTM with fewer parameters and faster convergence. The experimental results show that GRU performs slightly better than LSTM. BiLSTM combines a forward LSTM and a backward LSTM, which solves the problem that LSTM cannot encode back-to-front information, and BiLSTM performs better than LSTM in this experiment. Therefore, a CNN is added on the basis of BiLSTM to give the better-performing CNN-BiLSTM model, which can both model temporal relationships and characterize local spatial features.

In the comparison at high imbalance, the ensemble performance of the deep learning methods ranks, from low to high, as GRU, CNN, LSTM, BiLSTM. The results show that, compared with the LSTM and CNN ensembles, the advantages of GRU, such as its small number of parameters and fast convergence, are not fully exploited in ensemble learning. This indicates that a model should not be chosen or rejected only on the basis of its general advantages and disadvantages; the most appropriate treatment method and model must be chosen according to the actual situation of the experiment.

Under the BiLSTM deep learning framework, simple random undersampling and random oversampling each have advantages and disadvantages under different imbalance ratios when compared with the BiLSTM method that is fully trained without imbalance treatment. On the one hand, this shows that an imbalanced data distribution affects model performance to a certain extent. On the other hand, when the degree of imbalance is large, simple random undersampling and random oversampling may not give satisfactory results, and more attention should be paid to improving performance while balancing the data distribution. After many experiments, Adaptive Synthetic Sampling was selected from among the imbalance treatment methods, and different learning strategies are adopted for different degrees of data imbalance. The sentiment analysis method based on the BiLSTM framework with Adaptive Synthetic Sampling maintains a clear performance advantage, which shows that model performance depends not only on the classification method but also on the data distribution and data quality.
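The idea of handling a high imbalance ratio by breaking the dataset into several low-imbalance datasets can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used in the experiments: the majority class is partitioned without replacement, each part is paired with the full minority class, and the cap `max_ratio` on the per-group imbalance ratio is a hypothetical parameter. The subsequent Adaptive Synthetic Sampling step and the model training are not shown.

```python
import math
import random


def split_majority(majority, minority, max_ratio=2.0, seed=0):
    """Partition the majority class into low-imbalance training groups.

    Each returned pair (majority_part, minority_copy) has an imbalance ratio
    len(majority_part) / len(minority_copy) of at most max_ratio.
    """
    # Largest majority group size that still respects max_ratio.
    cap = int(len(minority) * max_ratio)
    if cap < 1:
        raise ValueError("minority class too small for the requested ratio")

    n_groups = max(1, math.ceil(len(majority) / cap))
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)

    # Deal samples round-robin so group sizes differ by at most one.
    parts = [shuffled[i::n_groups] for i in range(n_groups)]
    return [(part, list(minority)) for part in parts]


# Hypothetical data: 1000 negative (majority) and 100 positive comments.
negative = [f"neg_{i}" for i in range(1000)]
positive = [f"pos_{i}" for i in range(100)]
groups = split_majority(negative, positive, max_ratio=2.0)
print(len(groups), [len(part) for part, _ in groups])
```

When the original imbalance ratio is already at or below `max_ratio`, the function returns a single group, which corresponds to the low-imbalance case where one model is trained on one resampled dataset.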

To address sentiment analysis of imbalanced comment text data, this paper combines deep learning with class imbalance processing and designs a sentiment analysis method based on the BiLSTM framework. The method handles datasets with low and high imbalance rates differently: it balances the training set through Adaptive Synthetic Sampling, then trains either a single CNN-BiLSTM model or multiple BiLSTM models in parallel, and finally classifies with the single model or with ensemble learning, respectively. Experiments on a combined dataset of public and crawled data show that the proposed method improves the performance of sentiment analysis on imbalanced data to a certain extent. In future work, the sampling technique will be optimized, and cost-sensitive techniques will be incorporated to further improve the performance of the sentiment classification model. In addition, sentiment analysis of multi-class imbalanced data is one of the main directions for future research.

LIU B, ZHANG L. A survey of opinion mining and sentiment analysis[M]// AGGARWAL C C, ZHAI C X. Mining text data. New York: Springer, 2012: 415–463.
PANG B, LEE L, VAITHYANATHAN S. Thumbs up? Sentiment classification using machine learning techniques[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg: Association for Computational Linguistics, 2002: 79.
Tang D, Qin B, Liu T. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015: 1422–1432.
Yao Ni, Gao Zheng-yuan, Lou Kun, et al. Research on sentiment classification for online reviews based on BERT and BIGRU[J]. Journal of Light Industry, 2020, 35(5): 80–86. DOI:10.12187/2020.05.011.
Guo Xian-da, Na Ri-sa, Cui Shao-ze. Consumer reviews sentiment analysis based on CNN-BiLSTM[J]. Systems Engineering-Theory & Practice, 2020, 40(3): 653–663. DOI:10.12011/1000-6788-2018-1890-11.
Wang Zhong-qing, Li Shou-shan, Zhu Qiao-ming, et al. Chinese sentiment classification on imbalanced data distribution[J]. Journal of Chinese Information Processing, 2012, 26(3):33–38.
Rodriguez Gonzalez A, Tunas J M, Santamaria L P, et al. Identifying polarity in tweets from an imbalanced dataset about diseases and vaccines using a meta-model based on machine learning techniques[J]. Applied Sciences Basel, 2020, 10(24): 9019.
Moscato V, Picariello A, Sperli G. A benchmark of machine learning approaches for credit score prediction[J]. Expert Systems with Applications, 2021, 165(9): 113986.
Gosain A, Sardana S. Handling class imbalance problem using oversampling techniques: A review[C]//2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2017.
Li Fang, Qu Yu-bin, Chen Xiang, et al. A sentiment analysis method based on class imbalanced learning[J]. Journal of Jilin University (Science Edition), 2021, 58(4): 929–935. DOI:10.13413/j.cnki.jdxblxb.2020252.
Li Wei-jiang, Tang Ming, Yu Zheng-tao. Sentiment classification of unbalanced samples based on Multi-Channel Bi-GRU and loss Re-Balance strat[J]. Journal of Chinese Information Processing, 2022(002):036.
Li Ang, Han Meng, Mu Dong-liang, Gao Zhi-hui, Liu Shu-juan. Survey of Multi-Class imbalanced data classification methods[J]. Application Research of Computers, 2022.03(0198): 1–15. DOI:10.19734/j.issn.1001-3695.2022.03.0198.
Zhang D M, Ma J, Yi J, et al. An ensemble method for unbalanced sentiment classification[C]//The 2015 11th International Conference on Natural Computation (ICNC), 2015: 440–445.
Tang T C, Tang X H, Yuan T Y. Fine-tuning BERT for multi-label sentiment analysis in unbalanced code-switching text[J]. IEEE Access, 2020, 8: 193248–193256.
Chen Li-fang, Dai Qi, Zhao Jia-liang. A multi-granularity ensemble classification algorithm for imbalanced data[J]. Computer Engineering & Science, 2021, 43(5): 917–925.
Duan JD, Ma K, Sun RY. Unbalanced data sentiment classification method based on ensemble learning[C]//International Conference on Big Data Technologies(ICBDT), 2019:34–38. DOI:10.1145/3358528.3358597.
Mukherjee A, Mukhopadhyay S, Panigrahi PK, et al. Utilization of Oversampling for multiclass sentiment analysis on Amazon Review Dataset [C]//IEEE International Conference on Awareness Science and Technology(ICAST), 2019:413–418.
Omara E, Mosa M, Ismail N. Deep Convolutional Arabic Sentiment Analysis With Imbalanced Data[C]//International Computer Engineering Conference (ICENCO), 2019: 198–203.
Wen Xin-ting, Chen Yi-lin. Sentiment analysis model of imbalanced comment texts based on deep learning[J]. Information Research, 2022(7): 14–22.
Yin Hao, Li Shou-shan, Gong Zheng-xian, et al. Imbalanced emotion classification based on Multi-Channel LSTM[J]. Journal of Chinese Information Processing, 2018, 32(1):7.
Zhang Zhi-wu, Xue Juan, Chen Guo-lan. Sentiment analysis of class imbalance data under the framework of deep learning[J]. Journal of Modern Information, 2021, 41(10):8.
MIKOLOV T, CHEN K, CORRADO G S, et al. Efficient estimation of word representations in vector space[C]//Proceedings of the 2013 International Conference on Learning Representations, 2013.
Hinton G E, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors[J]. Computer Science, 2010, 3(4): 212–223.

No competing interests reported.
