In deep learning, the BERT model is one of the word embedding and text representation models currently being studied for sentiment analysis. Unlike other word embedding algorithms, BERT can effectively read a sequence of words in both directions of the input text, and since it uses the attention mechanism to assign each word a vector that depends on the surrounding words, it is efficient in word vectorization [37]. However, although BERT considers the context of a word when assigning its vector, it does so for all the words in the input text, which leads to a resultant vector with high dimensionality. Second, word vectors built from BERT do not contain semantic information that is critical in sentiment classification. In contrast, a Sentiment Lexicon can be used to identify sentiment words in a text and assign a specific sentiment polarity to each word. However, a Sentiment Lexicon cannot generate representative word vectors, which leads to high data sparseness. Thus, to improve sentiment classification, this paper proposes the LeBERT model, which combines Sentiment Lexicon, N-grams and BERT algorithms.
The design idea of the LeBERT model is to first use N-grams to split the input text into sections, and then use a Sentiment Lexicon to identify the section or sections that contain a sentiment word. It is worth noting that text reviews such as social media posts are typically short and, characteristically, the semantic features of a short text are concentrated in a certain part of it [37]. Extracting features from such parts therefore leads to efficient and effective text representation. The words of the identified section(s) are then converted into a vector by BERT. The output word vector is used as the input to a CNN model, whose fully connected layer extracts features from the vector. The extracted features are then integrated by the dense output layer, and finally the sentiment class of the text is determined by a SoftMax classifier. The architecture of the proposed LeBERT model is shown in Fig. 1.
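As a minimal sketch of this classification stage, the following PyTorch code shows a CNN with a fully connected output layer and a SoftMax classifier operating on BERT word vectors; the class name `LeBERTClassifier` and all layer sizes are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the classification head described above, assuming the
# embedding layer yields one 768-dimensional BERT vector per word.
# Layer sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class LeBERTClassifier(nn.Module):
    def __init__(self, embed_dim=768, num_filters=100, kernel_size=3, num_classes=2):
        super().__init__()
        # 1-D convolution extracts local features from the word vectors
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size)
        self.pool = nn.AdaptiveMaxPool1d(1)
        # dense output layer integrates the extracted features
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, word_vectors):          # (batch, seq_len, embed_dim)
        x = word_vectors.transpose(1, 2)      # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))
        x = self.pool(x).squeeze(-1)          # (batch, num_filters)
        logits = self.fc(x)
        # SoftMax yields the sentiment class probabilities
        return torch.softmax(logits, dim=-1)
```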
As shown in Fig. 1, the Sentiment Lexicon, N-grams and the BERT algorithm are used in the embedding layer to build the word vector. The overall sentiment analysis model using LeBERT is presented in Fig. 2.
3.2 LeBERT Embedding
There are currently two common methods used to build text vectors for sentiment analysis: word-embedding-based methods and lexicon-based methods. Our proposed model seeks to utilize both methods through N-grams: the Sentiment Lexicon is used to identify the word N-grams that contain a sentiment word, and the vector is then built from the N-gram words using the BERT word embedding model.
To build the vector, we first generate word N-grams from the sentences. An N-gram is a combination of consecutive words from a sentence that forms a Markovian process, which is normally used to predict the next word in a sequence. Further, the Markovian process also captures the co-occurrence of words, a key aspect influencing the sentiment of a text. In this case we use N-gram sequences to partition a text, such as an online review or a sentence, into sections that represent the entire input; N-grams present the co-occurrence of words in a text more comprehensively than a mere bag of words (BoW). The size of each partition depends on the value of N.
For instance, if we consider a sentence S given as:
$$S=\left\{w_1, w_2, w_3, w_4, w_5, \dots, w_n\right\} \tag{1}$$
where $w_i$ are the words of the sentence.
For various values of N we have:
N = 1, the set of N-grams $N_1 = \{w_1, w_2, w_3, \dots, w_n\}$
N = 2, the set of N-grams $N_2 = \{w_1\_w_2, w_2\_w_3, w_3\_w_4, \dots, w_{n-1}\_w_n\}$
N = 3, the set of N-grams $N_3 = \{w_1\_w_2\_w_3, w_2\_w_3\_w_4, w_3\_w_4\_w_5, \dots, w_{n-2}\_w_{n-1}\_w_n\}$
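As an illustration, the following minimal Python sketch generates these N-gram sets for a short sentence; `generate_ngrams` is a hypothetical helper written for this example, not a function from the paper.

```python
# Minimal sketch of the N-gram partitioning shown above.
def generate_ngrams(words, n):
    """Return the N-grams of a tokenized sentence, joined with '_'."""
    return ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "the camera quality is very good".split()
print(generate_ngrams(sentence, 1))  # ['the', 'camera', ..., 'good']
print(generate_ngrams(sentence, 2))  # ['the_camera', 'camera_quality', ...]
print(generate_ngrams(sentence, 3))  # ['the_camera_quality', ...]
```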
The fundamental idea is that the set of N-grams makes it possible to select a section of the entire input text, which ensures that the most significant words are used when building the text vector for sentiment analysis. Once the N-gram(s) are identified from the text, they are reverted to a bag of words, and each word is then converted into a vector using the BERT word embedding algorithm.
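The following sketch illustrates this selection-and-embedding step using the Hugging Face transformers library; the toy lexicon, the `select_ngrams` helper, and the sample N-grams are illustrative assumptions, not artifacts from the paper.

```python
# Sketch of selecting the sentiment-bearing N-gram(s), reverting them to a
# bag of words, and embedding the words with BERT.
import torch
from transformers import AutoTokenizer, AutoModel

lexicon = {"good", "bad", "excellent", "poor"}   # toy sentiment lexicon

def select_ngrams(ngrams, lexicon):
    """Keep only the N-grams that contain at least one sentiment word."""
    return [g for g in ngrams if any(w in lexicon for w in g.split("_"))]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

selected = select_ngrams(["camera_quality_is", "is_very_good"], lexicon)
words = [w for g in selected for w in g.split("_")]   # revert to bag of words
inputs = tokenizer(" ".join(words), return_tensors="pt")
with torch.no_grad():
    word_vectors = bert(**inputs).last_hidden_state   # (1, tokens, 768)
```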
Definitions of parameters used in the algorithm
Let:
$L$: Sentiment Lexicon; $C$: corpus of subjective user reviews ($R_i$); $V_i$: vector representation of a subjective review $R_i$; $W_t$: sentiment term; $W_1$: the first word neighboring the sentiment term; $W_2$: the second word neighboring the sentiment term.
We define the text vector $V_i$ of a subjective review $R_i$ as the vector originating from a section $S_i$ of the review selected using the Sentiment Lexicon and converted by the BERT word embedding model ($B_e$). The algorithm listing of the sentence vector representation generation is presented in Fig. 3.
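Since the listing itself appears in Fig. 3, the following is only a hedged Python reconstruction of the algorithm from the definitions above; the helper name `review_vector`, the one-word neighbor window around $W_t$, and the mean pooling of BERT outputs are assumptions made for illustration.

```python
# Hedged reconstruction of the vector-generation algorithm: for each review
# Ri in corpus C, the sentiment term Wt found in lexicon L and its
# neighboring words W1 and W2 form the selected section Si, which the BERT
# model Be converts into the review vector Vi.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
be = AutoModel.from_pretrained("bert-base-uncased")   # BERT model Be

def review_vector(review, lexicon):
    words = review.split()
    for t, wt in enumerate(words):
        if wt in lexicon:                                  # Wt found in L
            w1 = words[t - 1] if t > 0 else ""             # W1: neighbor (assumed left)
            w2 = words[t + 1] if t + 1 < len(words) else ""  # W2: neighbor (assumed right)
            si = " ".join(w for w in (w1, wt, w2) if w)    # selected section Si
            inputs = tokenizer(si, return_tensors="pt")
            with torch.no_grad():
                # Vi: mean-pooled BERT vector of the selected section (assumed pooling)
                return be(**inputs).last_hidden_state.mean(dim=1)
    return None                                            # no sentiment word in Ri

corpus = ["the battery life is excellent", "screen looks poor at night"]   # C
vectors = [review_vector(r, {"excellent", "poor"}) for r in corpus]        # Vi per Ri
```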