In deep learning, the BERT model is one of the word embedding and text representation models currently being studied for sentiment analysis. Unlike other word embedding algorithms, BERT can effectively read a sequence of words in both directions of the input text, and since it uses the attention mechanism to assign each word a vector that depends on the surrounding words, it is efficient in word vectorization [37]. However, although BERT considers the context of a word when assigning its vector, it does so for all the words in the input text, which leads to a resultant vector with high dimensionality. Second, word vectors built from BERT do not contain semantic information that is critical in sentiment classification. In contrast, a Sentiment Lexicon can be used to identify sentiment words in a text and assign a specific sentiment polarity to each word. However, a Sentiment Lexicon cannot generate representative word vectors, which leads to high data sparseness. Thus, to improve sentiment classification, this paper proposes the LeBERT model, which combines Sentiment Lexicon, N-grams and BERT algorithms.
The design idea of the LeBERT model is to first use N-grams to split the input text into sections, and then use a Sentiment Lexicon to identify the section or sections that contain a sentiment word. It is worth noting that text reviews such as social media posts are typically short and, characteristically, the semantic features of a short text are concentrated in a certain part of it [37]. Extracting features from such parts therefore leads to efficient and effective text representation. The words of the identified section(s) are then converted into a vector by BERT. The output word vector is used as the input to a CNN model, whose fully connected layer extracts features from the vector. The extracted features are then integrated by the dense output layer, and finally the sentiment class of the text is determined by a SoftMax classifier. The architecture of the proposed LeBERT model is shown in Fig. 1.
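As a minimal sketch of this classification stage, the following PyTorch code shows a CNN with a fully connected output layer and a SoftMax classifier operating on BERT word vectors; the class name `LeBERTClassifier` and all layer sizes are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the classification head described above, assuming the
# embedding layer yields one 768-dimensional BERT vector per word.
# Layer sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class LeBERTClassifier(nn.Module):
    def __init__(self, embed_dim=768, num_filters=100, kernel_size=3, num_classes=2):
        super().__init__()
        # 1-D convolution extracts local features from the word vectors
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size)
        self.pool = nn.AdaptiveMaxPool1d(1)
        # dense output layer integrates the extracted features
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, word_vectors):          # (batch, seq_len, embed_dim)
        x = word_vectors.transpose(1, 2)      # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))
        x = self.pool(x).squeeze(-1)          # (batch, num_filters)
        logits = self.fc(x)
        # SoftMax yields the sentiment class probabilities
        return torch.softmax(logits, dim=-1)
```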
As shown in Fig. 1, the Sentiment Lexicon, N-grams and the BERT algorithm are used in the embedding layer to build the word vector. The overall sentiment analysis model using LeBERT is presented in Fig. 2.
3.2 LeBERT Embedding
There are currently two common methods used to build text vectors for sentiment analysis: word-embedding-based methods and lexicon-based methods. Our proposed model seeks to utilize both methods through N-grams: the Sentiment Lexicon is used to identify the word N-grams that contain a sentiment word, and the vector is then built from the N-gram words using the BERT word embedding model.
To build the vector, we first generate word N-grams from the sentences. An N-gram is a combination of consecutive words from a sentence that forms a Markovian process, which is normally used to predict the next word in a sequence. Further, the Markovian process also captures the co-occurrence of words, a key aspect influencing the sentiment of a text. In this case we use N-gram sequences to partition a text, such as an online review or a sentence, into sections that represent the entire input; N-grams present the co-occurrence of words in a text more comprehensively than a mere bag of words (BoW). The size of each partition depends on the value of N.
For instance, if we consider a sentence S given as:
$$S=\left\{w_1, w_2, w_3, w_4, w_5, \dots, w_n\right\} \tag{1}$$
where $w_i$ are the words of the sentence.
For various values of N we have:
N = 1, the set of N-grams $N_1 = \{w_1, w_2, w_3, \dots, w_n\}$
N = 2, the set of N-grams $N_2 = \{w_1\_w_2, w_2\_w_3, w_3\_w_4, \dots, w_{n-1}\_w_n\}$
N = 3, the set of N-grams $N_3 = \{w_1\_w_2\_w_3, w_2\_w_3\_w_4, w_3\_w_4\_w_5, \dots, w_{n-2}\_w_{n-1}\_w_n\}$
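As an illustration, the following minimal Python sketch generates these N-gram sets for a short sentence; `generate_ngrams` is a hypothetical helper written for this example, not a function from the paper.

```python
# Minimal sketch of the N-gram partitioning shown above.
def generate_ngrams(words, n):
    """Return the N-grams of a tokenized sentence, joined with '_'."""
    return ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "the camera quality is very good".split()
print(generate_ngrams(sentence, 1))  # ['the', 'camera', ..., 'good']
print(generate_ngrams(sentence, 2))  # ['the_camera', 'camera_quality', ...]
print(generate_ngrams(sentence, 3))  # ['the_camera_quality', ...]
```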
The fundamental idea is that the set of N-grams makes it possible to select a section of the entire input text, which ensures that the most significant words are used when building the text vector for sentiment analysis. Once the N-gram(s) are identified from the text, they are reverted to a bag of words, and each word is then converted into a vector using the BERT word embedding algorithm.
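The following sketch illustrates this selection-and-embedding step using the Hugging Face transformers library; the toy lexicon, the `select_ngrams` helper, and the sample N-grams are illustrative assumptions, not artifacts from the paper.

```python
# Sketch of selecting the sentiment-bearing N-gram(s), reverting them to a
# bag of words, and embedding the words with BERT.
import torch
from transformers import AutoTokenizer, AutoModel

lexicon = {"good", "bad", "excellent", "poor"}   # toy sentiment lexicon

def select_ngrams(ngrams, lexicon):
    """Keep only the N-grams that contain at least one sentiment word."""
    return [g for g in ngrams if any(w in lexicon for w in g.split("_"))]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

selected = select_ngrams(["camera_quality_is", "is_very_good"], lexicon)
words = [w for g in selected for w in g.split("_")]   # revert to bag of words
inputs = tokenizer(" ".join(words), return_tensors="pt")
with torch.no_grad():
    word_vectors = bert(**inputs).last_hidden_state   # (1, tokens, 768)
```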
Definitions of parameters used in the algorithm
Let:
$L$: Sentiment Lexicon; $C$: corpus of subjective user reviews ($R_i$); $V_i$: vector representation of a subjective review $R_i$; $W_t$: sentiment term; $W_1$: the first word neighboring the sentiment term; $W_2$: the second word neighboring the sentiment term.
We define the text vector $V_i$ of a subjective review $R_i$ as the vector originating from a section $S_i$ of the review selected using the Sentiment Lexicon and converted by the BERT word embedding model ($B_e$). The algorithm listing of the sentence vector representation generation is presented in Fig. 3.
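Since the listing itself appears in Fig. 3, the following is only a hedged Python reconstruction of the algorithm from the definitions above; the helper name `review_vector`, the one-word neighbor window around $W_t$, and the mean pooling of BERT outputs are assumptions made for illustration.

```python
# Hedged reconstruction of the vector-generation algorithm: for each review
# Ri in corpus C, the sentiment term Wt found in lexicon L and its
# neighboring words W1 and W2 form the selected section Si, which the BERT
# model Be converts into the review vector Vi.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
be = AutoModel.from_pretrained("bert-base-uncased")   # BERT model Be

def review_vector(review, lexicon):
    words = review.split()
    for t, wt in enumerate(words):
        if wt in lexicon:                                  # Wt found in L
            w1 = words[t - 1] if t > 0 else ""             # W1: neighbor (assumed left)
            w2 = words[t + 1] if t + 1 < len(words) else ""  # W2: neighbor (assumed right)
            si = " ".join(w for w in (w1, wt, w2) if w)    # selected section Si
            inputs = tokenizer(si, return_tensors="pt")
            with torch.no_grad():
                # Vi: mean-pooled BERT vector of the selected section (assumed pooling)
                return be(**inputs).last_hidden_state.mean(dim=1)
    return None                                            # no sentiment word in Ri

corpus = ["the battery life is excellent", "screen looks poor at night"]   # C
vectors = [review_vector(r, {"excellent", "poor"}) for r in corpus]        # Vi per Ri
```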