The Use of Machine Learning Algorithms in the Analysis of Sentiments of E-Commerce Customer Reviews and Recommendations Feedback

doi:10.21203/rs.3.rs-5030218/v1


Research Article


https://doi.org/10.21203/rs.3.rs-5030218/v1

This work is licensed under a CC BY 4.0 License


The aim of this research is to examine the use of machine learning models in the analysis of e-commerce customer reviews and, more specifically, to classify customers’ recommendations based on textual feedback. The huge volume of unstructured review data that accumulates on e-commerce platforms is difficult to interpret properly, particularly when it comes to identifying overall customer sentiment. In the present study, we used a dataset of women’s clothing reviews and five classification algorithms, namely logistic regression, support vector machine, Naive Bayes, random forest, and light gradient boosting machine, and assessed their performance based on accuracy, precision, recall, and F1 score. The results show that the support vector machine model had the highest overall performance, with 89.06% accuracy and 90.49% precision, and can be recommended for sentiment analysis where balanced performance is required. Logistic regression and light gradient boosting machine were also quite stable, especially in terms of precision and recall, while Naive Bayes and random forest were characterized by high recall: they are good at identifying positive sentiment, but at the cost of lower precision. The findings are then compared with the previous literature for similarities and differences, especially regarding ensemble methods such as random forest, whose performance fluctuated across studies. The study finds that no single model outperforms the others on every metric, and that the selection of the machine learning algorithm should be based on the characteristics of the dataset and the purpose of the analysis. Further studies are suggested to examine deep learning models, the effect of more elaborate data preprocessing, and the combination of different models in order to improve the performance of sentiment analysis in e-commerce.

Sentiment Analysis

Machine Learning

Support Vector Machine (SVM)

LightGBM

Naive Bayes

Logistic Regression

Random Forest

The increasing adoption of e-commerce has revolutionized the way consumers shop and what they can shop for. It has also led to a sharp increase in customer-generated data, especially feedback and reviews on shopping sites. By analyzing these reviews, it is possible to grasp customers’ attitudes and opinions in order to meet their needs and improve satisfaction. In this context, machine learning (ML) algorithms have become effective techniques for processing large amounts of textual data, such as customer feedback, to provide valuable insights [1]. This paper examines how applying ML to sentiment analysis can help in understanding product recommendations, which can greatly influence consumers’ choices and experience [2]. It also provides an overview of the current literature on the application of different ML algorithms to sentiment analysis of e-commerce customer reviews and to predicting customers’ propensity to recommend a product.

Currently, artificial intelligence (AI) is widely used in e-commerce to analyze the big data generated by consumers’ activity. Sentiment analysis, a subfield of natural language processing (NLP), deals with the use of AI to determine the sentiment of texts [3]. NLP, in turn, is an ML-based technology that allows computers to understand, analyze, and manipulate human language [4]. This paper employs five classification models (logistic regression, support vector machine (SVM), Naive Bayes, random forest, and light gradient boosting machine (LightGBM)) to recognize customers’ sentiments from e-commerce websites [5]. These models are compared on the task of predicting customer recommendations, which are vital to sales and to the competitiveness of an e-commerce business [6]. Reviews and recommendations are related in textual feedback: the recommendation decision depends on the positive or negative sentiment expressed [7]. The textual data is analyzed by ML algorithms that learn the frequencies and probabilities of word choices, their tone, and their context, and then classify the sentiment as positive or negative. SVM and LightGBM stand out for their ability to learn from these patterns, to pick out the relevant features from the text, and to give the probability of a recommendation depending on the sentiment shown [8]. Therefore, the performance of these algorithms is compared in this paper to identify the best approaches to sentiment analysis in the e-commerce domain.

The uses of sentiment analysis in e-commerce do not stop at customer satisfaction; it is also vital in formulating marketing strategies, in product design, and in customer relationship management. Arguably, the information derived from sentiment analysis can help businesses make decisions that are aligned with their customers’ preferences and increase brand loyalty and market share [9]. This paper adds to the existing literature on the use of ML in e-commerce by analyzing the performance of various algorithms in sentiment classification and recommendation prediction. It is anticipated that the work will be useful for e-commerce platforms intending to introduce or enhance sentiment analysis as a component of their service, improving customer relationships and overall business results.

Even though sentiment analysis has come a long way, many problems remain that affect the efficiency of the analysis, especially in the area of e-commerce. The first major research problem is data quality, including noise and the widespread use of unbalanced datasets, which may affect a model’s predictions and result in incorrect sentiment assessment [10]. Most e-commerce datasets are highly unbalanced, meaning that they include many more positive than negative reviews, which leads to models that do not generalize to new data [11]. Moreover, human language is rich in features such as irony, sarcasm, and other context-dependent meanings. Conventional models often fail to identify these subtle nuances, and the resulting misclassification hampers the application of sentiment analysis in decision-making [12]. In addition, the current focus on general-purpose models has shortcomings, since such models are not necessarily well adapted to the language and context of a particular e-commerce domain, for instance fashion or electronics [13]. This research aims to fill these gaps by applying state-of-the-art data cleaning techniques to reduce noise and by using domain-specific features in model development to increase efficiency and interpretability. By paying attention to the specific difficulties of e-commerce sentiment analysis, this research intends to add to the existing literature and help businesses gain more accurate and practical insights into their customers’ sentiments, paving the way for future developments in the field.

This paper is structured as follows. The introduction highlights the background of the study and the relevance of sentiment analysis in e-commerce. Next, the literature review covers previous work on the use of ML in sentiment analysis and recommendation systems, as well as the current state and future directions of the research. The methodology section then describes the research design, including the dataset, the feature extraction techniques, and the application of the ML algorithms. The results section analyzes the performance of all models in the experiment using accuracy, precision, recall, and F1 score. The discussion section relates the findings to previous research and presents practical implications and future research directions. Finally, the conclusion and future work sections present the main contributions of the study and recommendations for further investigation of ML for sentiment analysis in e-commerce. This structure ensures that the paper addresses the research objective in a systematic way.

2.1 Sentiment Analysis

Sentiment analysis has become a vital application in the e-commerce sector, since understanding customer sentiment is necessary for business success. Due to the increased use of e-commerce platforms, there is an enormous amount of unstructured data from customers’ reviews, feedback, and social media interactions. Sentiment analysis provides a way to organize this data into meaningful results [14]. By analyzing the emotions and opinions in customer reviews, companies can estimate customers’ satisfaction with products and services and adjust their strategies to meet customers’ expectations [15]. Sentiment analysis also allows companies to recognize trends and possible problems in real time, including customers’ needs and complaints, which is crucial in the fierce competition of the digital environment [16]. Knowing what customers feel and think is critical for enhancing the products and services offered to them and for formulating marketing strategies that appeal to them. Thus, sentiment analysis has become an essential part of contemporary e-commerce strategies as a tool for making decisions based on consumers’ opinions.

2.2 Traditional ML in Sentiment Analysis

Conventional ML models have been quite useful in advancing sentiment analysis, especially in the early stages of e-commerce. Among the most common algorithms applied in this field are logistic regression, SVM, and Naive Bayes. Logistic regression is considered simple and effective, especially in binary classification problems; however, the model has difficulty handling intricate patterns in data because of its linearity [17]. SVM, on the other hand, is efficient in dealing with high-dimensional spaces and can produce non-linear decision boundaries through the kernel trick. However, SVMs can be slow and often require fine-tuning of the parameters to get the best results [18]. Naive Bayes, one of the most popular probabilistic models in ML, is known for its effectiveness and applicability in handling big data, especially in text classification [19]. Its major drawback is the assumption of feature independence: if the features are highly correlated, it may not perform optimally [20]. To overcome the drawbacks of these classical approaches, more modern approaches have been developed, including ensemble techniques such as random forest and gradient boosting algorithms such as LightGBM [8]. These approaches combine several models to produce better and more reliable predictions, supporting sentiment analysis in the volatile context of e-commerce [21].

2.3 Advanced Models in Ensemble for Sentiment Analysis

Random forest and other ensemble methods are now widely used for sentiment analysis, as they combine numerous decision trees to improve prediction and reduce the chance of overfitting [22]. Random forest works by growing a ‘forest’ of decision trees, each built from a random sample of the data and features, which enhances the model’s accuracy and applicability [21]. This approach has been found to be useful in sentiment analysis because human language is diverse and often unpredictable, posing a challenge to models such as logistic regression or SVM. Besides random forest, LightGBM, which belongs to the gradient boosting family, advances sentiment analysis by constructing an ensemble of trees in which each tree corrects the errors of the previous ones to deliver more precise results [5]. Specifically, LightGBM is considered one of the most efficient and scalable models, which is why it works well with large e-commerce datasets. Compared to the basic ML models, ensemble methods are more accurate and less sensitive to noise, but they are computationally expensive and need proper parameter tuning [23]. Deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have provided better results in sentiment analysis, since they can identify complex and subtle relations and features in text data that even the best ensemble techniques cannot capture [24].

3.1 Proposed Model Graph

The dataset of customer reviews and recommendations is first collected and then preprocessed to clean, structure, and normalize the data so that the features do not bias the results (Fig. 1). Features are then extracted from the review text, and the ML models are trained on them. Immediately after the training phase, the models undergo thorough testing and validation. If the models meet the set criteria, the one with the highest accuracy is used for prediction; otherwise, the models are trained further and their parameters are readjusted to enhance performance. Once fine-tuned, the selected model is applied to classify customer recommendations based on the analysis of the reviews.

3.2 Models

3.2.1 Logistic regression

In this study, logistic regression is used as an initial baseline for sentiment classification, predicting customer recommendation from the respective reviews. Logistic regression is efficient in solving binary classification problems, for instance deciding whether a review is positive or negative. The model applies the logistic function to determine the likelihood that an input example belongs to a certain class [25]. Because it is easy to implement and to understand, it can be useful for discovering general trends in the correlation between features and sentiment. However, its linear structure may make it less effective in capturing more intricate features of the data [26].
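The role of the logistic function can be illustrated with a minimal sketch. The three word-count features, their weights, and the bias below are hypothetical values chosen for illustration; they are not taken from the trained model in this study.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability that a review belongs to the positive (recommended) class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights for three word-count features: "love", "perfect", "return".
weights = [1.2, 0.9, -1.5]
bias = 0.3
review = [1, 0, 2]  # one "love", no "perfect", two "return"

p = predict_proba(weights, bias, review)
label = 1 if p >= 0.5 else 0  # threshold the probability at 0.5
print(f"P(recommended) = {p:.4f}, predicted label = {label}")
```

The score is a linear combination of the features, which is why the decision boundary of the model is linear.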

3.2.2 SVM

SVM is used in this study to predict recommendation chances from customer reviews because of its capability in dealing with high-dimensional data, as the features used in text-based sentiment analysis are numerous [27]. SVM identifies the hyperplane that separates the classes with the largest margin and is efficient in distinguishing between positive and negative sentiments. Through the kernel trick, SVM can also model non-linear relationships, which makes it more flexible than linear models. However, SVM can be time-consuming to train, and its performance depends on the selection of the parameters [28].

3.2.3 Naive Bayes

Naive Bayes is included in this study because it is simple and efficient, particularly in text categorization, for predicting recommendation chances from customer reviews. It is based on Bayes’ theorem, with the assumption that all features are conditionally independent given the class label, hence the name Naive Bayes. It has been found to give good results even when this independence assumption does not strictly hold [29]. Naive Bayes is also efficient to train and to predict with on large datasets, which are common on e-commerce platforms. However, the independence assumption may not yield the best results when there is a strong relationship between the features [19].
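As a minimal sketch of the idea, the following implements a multinomial Naive Bayes classifier on word counts with additive smoothing. The multinomial variant, the whitespace tokenization, and the four toy reviews are assumptions made for illustration; the smoothing value 0.8 and the uniform class prior mirror the settings listed in Section 3.4.3.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=0.8):
    """Return, per class, the smoothed log-likelihood log P(word | class)."""
    vocab = sorted({w for d in docs for w in d.split()})
    counts = {c: Counter() for c in set(labels)}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.split())
    model = {}
    for c, cnt in counts.items():
        # Additive smoothing: every vocabulary word gets alpha pseudo-counts.
        total = sum(cnt.values()) + alpha * len(vocab)
        model[c] = {w: math.log((cnt[w] + alpha) / total) for w in vocab}
    return model

def predict_nb(model, doc):
    """Pick the class with the highest summed log-likelihood.

    The class prior is uniform (no prior term), and words outside the
    training vocabulary are ignored.
    """
    scores = {
        c: sum(loglik[w] for w in doc.split() if w in loglik)
        for c, loglik in model.items()
    }
    return max(scores, key=scores.get)

# Toy reviews (hypothetical): 1 = recommended, 0 = not recommended.
docs = ["love this dress", "perfect fit love it",
        "poor quality returned", "too small returned it"]
labels = [1, 1, 0, 0]

model = train_nb(docs, labels)
print(predict_nb(model, "love the fit"))   # class of a positive-sounding review
print(predict_nb(model, "poor quality"))   # class of a negative-sounding review
```

Summing per-word log-likelihoods is exactly where the conditional independence assumption enters: each word contributes to the score without regard to the other words in the review.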

3.2.4 Random Forest

Random forest is implemented in this study to predict recommendation chances from customer reviews. It is a model that aggregates the results of many decision trees in order to enhance classification accuracy and reduce overfitting [22]. Each tree in the forest is trained on a random sample of the instances, and the final classification is the class predicted most frequently by the individual trees. This approach is less prone to overfitting, which is a common issue with individual decision trees. Random forest also copes well with the noisy and imbalanced data that are common in sentiment analysis, which makes it an efficient tool here [30].
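The two ingredients described above, random sampling of the training instances and majority voting, can be sketched as follows. The tree predictions are hypothetical placeholders; no actual decision trees are trained in this sketch.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, forming one tree's training set."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(tree_predictions):
    """Final class = the label predicted most often by the individual trees.

    Ties are resolved in favour of the label that appears first.
    """
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(42)          # fixed seed for reproducibility
rows = list(range(10))           # stand-in for ten training reviews
sample = bootstrap_sample(rows, rng)

votes = [1, 1, 0, 1, 0]          # hypothetical predictions of five trees
final = majority_vote(votes)
print(sample, final)
```

Because each tree sees a different bootstrap sample, the errors of individual trees are partly decorrelated, and the vote averages them out.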

3.2.5 LightGBM

LightGBM is employed in this study to predict customers’ chances of recommending a product from their reviews because of its efficiency and its ability to handle large-scale sentiment analysis data. LightGBM is a gradient boosting model, which means that it constructs an ensemble of decision trees in which every new tree learns from the mistakes of the previous trees [8]. It is built to be fast and efficient through the use of a histogram-based algorithm, which optimizes memory usage and accelerates training. LightGBM has advantages when working with large and intricate datasets, as it performs well and yields high accuracy with lower computational cost than other boosting algorithms [5].
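The principle that every new tree learns from the mistakes of the previous trees can be illustrated with a deliberately simplified sketch. It is not LightGBM's algorithm: it uses one-split trees (stumps), squared-error loss, and one-dimensional toy data, and it omits the histogram-based splitting and sampling that LightGBM adds. The learning rate of 0.03 matches Section 3.4.5; the number of rounds is an assumption.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of 1-D inputs, minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_rounds=200, learning_rate=0.03):
    """Fit stumps sequentially, each one on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)               # initial prediction: the mean target
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # current mistakes
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each correction by the learning rate before adding it.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = boost(xs, ys)
print(round(model(2), 3), round(model(5), 3))
```

With a small learning rate each tree corrects only a fraction of the remaining error, so many rounds are needed, which is the usual trade-off between shrinkage and ensemble size.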

3.3 Dataset

The dataset applied in this work is the ‘Women’s E-commerce Clothing Reviews’ dataset, which can be obtained from Kaggle (https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews) and was accessed on 9 July 2024. The dataset contains 23,486 customer reviews and 10 features/variables [31]. The key variables used were the review text and Recommended IND, a binary variable stating whether the customer recommends the product (1 = recommended, 0 = not recommended). Preliminary analysis reveals that the dataset contains 19,314 positive and 4,172 negative recommendations. Every review contains the customer’s textual comments, a numerical rating, and a flag indicating whether the reviewer recommends the product, making it a valuable source for sentiment analysis. The dataset covers customer sentiments ranging from very positive to very negative across diverse clothing products. This heterogeneity offers an appropriate setting for a thorough assessment of how well the ML models used in this research identify customer sentiment and recommendation propensity. The dataset was split into training and testing sets at a ratio of 80% to 20%. The training set comprises 15,451 positive and 3,338 negative reviews, while the testing set includes 3,863 positive and 834 negative reviews.
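The reported class counts are what an 80/20 split applied separately to each class would give. The following sketch checks that arithmetic; the paper does not state that the split was stratified, so stratification is an assumption here.

```python
def stratified_counts(n_pos, n_neg, train_frac=0.8):
    """Per-class (positive, negative) counts for a split applied to each class."""
    train_pos = round(n_pos * train_frac)
    train_neg = round(n_neg * train_frac)
    return {
        "train": (train_pos, train_neg),
        "test": (n_pos - train_pos, n_neg - train_neg),
    }

counts = stratified_counts(19_314, 4_172)
positive_share = 19_314 / (19_314 + 4_172)

print(counts)                       # per-class counts in each split
print(f"{positive_share:.4f}")      # share of positive recommendations
```

Roughly 82% of the reviews carry a positive recommendation, which is the class imbalance that the evaluation metrics in Section 3.5 have to account for.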

3.4 Parameter Settings

3.4.1 Logistic Regression

C: 0.5 (inverse of regularization strength)
Penalty: L2 regularization
Solver: saga (stochastic average gradient descent)
Max iterations: 200
Random state: 42 (for reproducibility)

3.4.2 SVM

C: 0.8 (regularization parameter)
Kernel: polynomial
Degree: 2 (degree of the polynomial kernel function)
Gamma: auto (kernel coefficient calculated based on the number of features)
Probability: true (to enable probability estimates)
Random state: 42 (for reproducibility)
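To make the kernel settings concrete, the sketch below evaluates a degree-2 polynomial kernel on two small vectors. It assumes the scikit-learn conventions, in which `gamma="auto"` means 1 / n_features and the independent term `coef0` defaults to 0; the paper does not name the library it used, and the example vectors are hypothetical.

```python
def poly_kernel(x, y, degree=2, gamma=None, coef0=0.0):
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    if gamma is None:
        gamma = 1.0 / len(x)      # the "auto" setting: 1 / number of features
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

# Two hypothetical 4-feature review vectors.
x = [1.0, 2.0, 0.0, 1.0]
y = [2.0, 1.0, 1.0, 0.0]

k = poly_kernel(x, y)             # <x, y> = 4, gamma = 0.25, so (0.25 * 4) ** 2
print(k)
```

The kernel value acts as a similarity score in an implicit feature space of pairwise feature products, which is how the SVM obtains a non-linear decision boundary without computing those products explicitly.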

3.4.3 Naive Bayes

Alpha: 0.8 (additive smoothing parameter)
Fit prior: false (not learning class prior probabilities from data)

3.4.4 Random Forest

N estimators: 150 (number of trees in the forest)
Max depth: 20 (maximum depth of the tree)
Min samples split: 4 (minimum number of samples required to split an internal node)
Random state: 42 (for reproducibility)

3.4.5 LightGBM

Objective: multiclass classification
Boosting type: goss (gradient-based one-side sampling)
Num leaves: 40 (maximum number of leaves in one tree)
Learning rate: 0.03 (shrinkage rate to prevent overfitting)
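For reference, the settings above can be collected in one place. The sketch below expresses them as keyword dictionaries using the parameter names of scikit-learn and LightGBM. This mapping is an assumption, since the paper does not name its libraries or the Naive Bayes variant; the values are copied as reported.

```python
# Hyperparameters from Section 3.4, keyed by assumed library keyword names.
PARAMS = {
    # sklearn.linear_model.LogisticRegression
    "logistic_regression": {
        "C": 0.5, "penalty": "l2", "solver": "saga",
        "max_iter": 200, "random_state": 42,
    },
    # sklearn.svm.SVC
    "svm": {
        "C": 0.8, "kernel": "poly", "degree": 2, "gamma": "auto",
        "probability": True, "random_state": 42,
    },
    # sklearn.naive_bayes.MultinomialNB (variant assumed)
    "naive_bayes": {"alpha": 0.8, "fit_prior": False},
    # sklearn.ensemble.RandomForestClassifier
    "random_forest": {
        "n_estimators": 150, "max_depth": 20,
        "min_samples_split": 4, "random_state": 42,
    },
    # lightgbm.LGBMClassifier
    "lightgbm": {
        "objective": "multiclass", "boosting_type": "goss",
        "num_leaves": 40, "learning_rate": 0.03,
    },
}

for name, kwargs in PARAMS.items():
    print(name, kwargs)
```

If those libraries are used, each dictionary could be unpacked into the corresponding estimator, for example `SVC(**PARAMS["svm"])`.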

3.5 Performance Evaluation

Performance evaluation in this study relies on counting true and false positives and negatives, from which the final evaluation metrics are derived. In our sentiment analysis of e-commerce reviews, true positives (TP) are the cases in which the model correctly predicts a positive recommendation for a positive review. A true negative (TN) is a case in which the model predicts a negative recommendation for a negative review [32]. False positives (FP) are cases in which the model predicts a positive recommendation when the review was in fact negative, which gives a misleading impression of the level of customer satisfaction. False negatives (FN), on the other hand, occur when the model predicts a negative recommendation but the review was in fact positive, meaning that it may miss customers who were satisfied enough to recommend a product. These counts are central to assessing model performance, especially when balancing precision and recall. The specific metrics used in the evaluation are accuracy, precision, recall, and F1 score.

3.5.1 Accuracy

Accuracy is defined as the ratio of correctly classified instances to the total number of instances in the dataset. It is a simple score that estimates a model’s general classification performance [33]. However, accuracy can be uninformative for imbalanced datasets, where one class is much larger than the other, as it does not consider the distribution of the classes [32].
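The effect of class imbalance on accuracy can be seen with a short calculation. The sketch below computes the accuracy of a trivial classifier that always predicts the majority class, using the test-set class counts stated in Section 3.3; the helper function is hypothetical.

```python
def majority_baseline_accuracy(n_pos, n_neg):
    """Accuracy obtained by always predicting the larger class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)

# Test-set class counts from Section 3.3: 3,863 positive and 834 negative reviews.
baseline = majority_baseline_accuracy(3_863, 834)
print(f"{baseline:.4f}")
```

A model that labels every review as recommended would already reach this accuracy without learning anything, which is why precision, recall, and F1 score are reported alongside accuracy.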

3.5.2 Precision

Precision measures the proportion of cases the model classified as positive that are actually positive. It is helpful in situations where FPs are costly, for instance in sentiment analysis, where wrongly classifying a negative review as positive may lead to wrong business decisions [33].

3.5.3 Recall

Recall, also called the true positive rate or sensitivity, is the percentage of actual positives that the model classifies correctly. High recall is critical when FNs are undesirable, since each FN is a positive instance that the model missed, which is why it is vital in sentiment analysis [32].

3.5.4 F1 Score

The F1 score is the harmonic mean of precision and recall and therefore takes both FPs and FNs into account. It is most helpful when the classes are imbalanced, providing a single measure that reflects both the model’s precision and its recall [33].
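The four metrics can be written directly in terms of the confusion-matrix counts. The following is a minimal sketch using hypothetical counts; the function name and the example numbers are illustrative only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration.
m = classification_metrics(tp=80, fp=10, fn=20, tn=40)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model cannot obtain a high F1 score by maximizing recall alone at the expense of precision, or the reverse.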

4.1 Confusion Matrix

The confusion matrices illustrate that all models have a high true positive rate (TPR), meaning that they classify most of the positive cases correctly (bottom-right quadrant).

4.1.1 Logistic Regression: Logistic regression produced 3,562 true positives, with 404 FPs and 119 FNs. Thus, although the model predicts positive reviews well, it could be improved by reducing the number of FPs, that is, negative reviews predicted as positive.

4.1.2 SVM: SVM’s results were almost identical to those of logistic regression, with 3,560 true positives, 374 FPs, and 121 FNs. The lower FP count means that SVM balanced the true positive rate and the TN rate slightly better than logistic regression, yet it still misclassified some negative reviews as positive.

4.1.3 LightGBM: LightGBM correctly classified 3,538 positive reviews, with 410 FPs and 143 FNs, both higher than for logistic regression and SVM. This means that although the model is efficient in detecting positive recommendations, it does somewhat worse than those two models with regard to FPs and FNs, so some adjustments are needed.

4.1.4 Naive Bayes: Naive Bayes was outstanding in recall, with 3,665 true positives and only 16 FNs, which shows that it is good at recognizing positive reviews. However, the model produced the most FPs, with 716 instances. This means that although the model captures positive sentiment very reliably, its precision is weaker because it often fails to identify negative comments.

4.1.5 Random Forest: Random forest showed a pattern similar to that of Naive Bayes, with 3,660 true positives and only 21 FNs, which indicates high recall. However, it had 686 FPs: although the model correctly identifies most of the positive reviews, it labels 686 negative reviews as positive, so there is still room to improve its accuracy.
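Precision, recall, and F1 score can be recomputed directly from the true positive, FP, and FN counts reported above. The following sketch does so for all five models; the helper name is hypothetical, and accuracy is omitted because the TN counts are not stated in the text.

```python
# (true positives, false positives, false negatives) as reported in Section 4.1.
reported = {
    "Logistic regression": (3562, 404, 119),
    "SVM": (3560, 374, 121),
    "LightGBM": (3538, 410, 143),
    "Naive Bayes": (3665, 716, 16),
    "Random forest": (3660, 686, 21),
}

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)    # equivalent to the harmonic mean
    return precision, recall, f1

scores = {name: precision_recall_f1(*c) for name, c in reported.items()}
for name, (p, r, f1) in scores.items():
    print(f"{name:20s} precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

The printed values can be compared with the scores reported in Section 4.2, and they make the trade-off visible: the two models with the fewest FNs are also the ones with the most FPs.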

4.2 Model Performance Comparison

To compare the performance of the models, the comparison figure groups the scores by metric name.

4.2.1 Accuracy

In terms of accuracy, SVM leads by a small margin with 0.8906, slightly better than logistic regression at 0.8844. Both models achieve high classification rates, which shows that they correctly classify most of the instances in the dataset. LightGBM comes third with an accuracy of 0.8778, which indicates its capacity to handle large datasets and intricate patterns, although it remains slightly below SVM and logistic regression. Naive Bayes and random forest give relatively weaker results, with accuracy scores of 0.8483 and 0.8438, respectively. These outcomes imply that although random forest and Naive Bayes are reasonably accurate, they might not model the complex interactions in the data as well as SVM and logistic regression do, especially in sentiment analysis.

4.2.2 Precision

As for precision, the SVM model again gives slightly better results. Its precision of 0.9049 shows a strong ability to limit FPs. Next is logistic regression, with a precision of 0.8981, which shows that it is also effective in distinguishing real positives. LightGBM gives good results as well, with a precision of 0.8961, which makes it suitable for sentiment analysis tasks that demand high precision. Random forest and Naive Bayes have the lowest precision scores of 0.8422 and 0.8366, respectively, which indicates that these models are likely to give more FP predictions. This lower precision can be a serious disadvantage in applications where the cost of FPs is high, such as customer sentiment analysis, where misclassification may lead to wrong business decisions.

4.2.3 Recall

Recall reflects the models’ effectiveness in correctly predicting actual positive cases, and it is notable that Naive Bayes and random forest achieve very high recall scores of 0.9957 and 0.9943, respectively. These scores show that both models identify almost all the positive cases, which is useful where missing a positive case has adverse consequences. SVM and logistic regression also show good results, with recall of 0.9671 and 0.9677, respectively, which shows that they identify true positives reliably. LightGBM comes last with a recall of 0.9612, which still indicates good performance, though a little below the other models. The higher recall of Naive Bayes and random forest means that although they are less precise, they are effective in identifying positive sentiments, which may be valuable depending on the application.

4.2.4 F1 Score

The F1 score considers both precision and recall and gives an overall view of each model’s efficacy. SVM achieved the highest F1 score of 0.9350, which shows that its combination of precision and recall is better than that of the other models. Logistic regression follows with 0.9316, which supports its efficiency in sentiment analysis tasks as well. LightGBM has an F1 score of 0.9275, meaning that it performs well in all aspects despite being slightly inferior to SVM and logistic regression. Random forest and Naive Bayes had F1 scores of 0.9119 and 0.9092, respectively. While effective, they have some difficulty balancing precision and recall, which may affect their performance in practice. These results show that SVM, logistic regression, and LightGBM offer the most balanced performance for sentiment analysis in the context of e-commerce.

We established that SVM gave the best results, with an accuracy of 89.06%, and was identified as the preferred model, followed by logistic regression with 88.44% and LightGBM with 87.78%. From these results, SVM generalizes best across the whole dataset, slightly outperforming the other models in terms of accuracy. With regard to the literature, our findings are consistent with [4], who also found that SVM had a high accuracy of approximately 97.2%, whereas [34] reported an accuracy of 96.51% with random forest, indicating that with different data or more data processing, the ensemble method random forest could possibly outperform SVM.

Regarding precision, SVM once more emerged as the best of the models used in this study, with a precision of 90.49%, which indicates a good reduction in FPs. This is especially important in sentiment analysis, where it is rather easy to mistakenly label negative sentiment as positive, which might have severe consequences for the business. Logistic regression and LightGBM also gave good results, with precision of 89.81% and 89.61%, respectively. In agreement with our results, [4] also reported high precision for models such as LightGBM. However, [11] reported that the precision of REPTree was 93.75%, which was better than SVM in their work, meaning that ensemble methods may increase precision if properly configured. Similarly, [7] pointed out that Naive Bayes produced a precision of 94%, while in our results Naive Bayes had a precision of 83.66%. This difference shows that precision depends strongly on the dataset and the processing methodology applied. Our study re-emphasizes the need for care in choosing models for sentiment analysis, especially when FPs can be costly; models such as SVM and logistic regression are well suited to scenarios where a high level of precision is needed.

Based on our results, Naive Bayes and random forest have the highest recall rates, at 99.57% and 99.43%, respectively. These results show that these models are suitable for detecting positive instances and therefore for cases where it is important to capture all the positives, even at the cost of more FPs. In the study by [34], random forest also had high recall but was reported to have high precision as well, meaning that the model performed well overall. [35] observed the same trend with their model, obtaining high recall and a very good F1 score. The recall scores for SVM and logistic regression, at 96.71% and 96.77%, agree with these findings; however, they did not reach the highest recall rate, observed for Naive Bayes. These differences demonstrate how the trade-off between precision and recall varies across models. Although high recall is beneficial in reducing FNs, it can come with a high number of FPs, as observed with Naive Bayes in this study.

The F1 score is an important metric in our analysis because it combines precision and recall into a single measure. SVM achieved the highest F1 score in our study, 93.50%, which points to good overall performance. Logistic regression and LightGBM were second and third, with F1 scores of 93.16% and 92.75%, respectively. In the literature, [4] obtained a high F1 score of 97% for LightGBM, which is in line with the current study’s results, while [34] observed an F1 score of 96.50% for random forest. In this paper, however, random forest reached an F1 score of 91%. Random forest is evidently a reliable model, but it may need tuning to balance precision and recall and attain higher performance. The F1 scores recorded in the present study indicate that the models examined perform well on sentiment analysis when precision and recall are given equal importance. However, the comparison with F1 scores from other studies shows that it is crucial to choose the model and its parameters according to the goals of the analysis.
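As a quick consistency check, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below uses the SVM precision (90.49%) and recall (about 96.71%) quoted in this discussion; the function name is illustrative and is not taken from the study's code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# SVM precision and recall as quoted in the discussion, as fractions.
svm_f1 = f1_score(0.9049, 0.9671)
print(f"SVM F1 recomputed from precision and recall: {svm_f1:.4f}")
```

The recomputed value is about 0.935, which agrees with the F1 score of 93.50% reported for SVM.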

In this paper, the effectiveness of several ML models, namely logistic regression, SVM, Naive Bayes, random forest, and LightGBM, was examined for sentiment analysis of e-commerce customer reviews. Based on these findings, SVM is the model of choice when all four performance measures (accuracy, precision, recall, and F1 score) are critically important. Other algorithms, such as logistic regression and LightGBM, achieve relatively good accuracy and retain high precision and good recall. Naive Bayes and random forest were highly effective in terms of recall: they identified positive sentiment well but produced more FPs. Comparison with other studies shows that random forest and other ensemble methods could be improved through data preprocessing and feature selection.

This work adds to the theoretical background of sentiment analysis by showing that the choice of ML model should be matched to the objectives at hand and the characteristics of the dataset. It extends existing research by showing that no model under study outperforms all the others on every measure; hence, model selection should depend on the problem the model is meant to solve.

In practical terms, this work shows that SVM can be adopted as a robust model by organizations intending to employ sentiment analysis when balanced performance across all metrics is paramount. When recall is more important, for instance in customer feedback analysis where the goal is to capture all positive opinions, Naive Bayes or random forest might be used instead. It therefore remains crucial to refine and calibrate the models for each dataset and to draw on domain knowledge when adapting the ML models.
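The metric-driven choice described above can be sketched as a simple lookup. The table below contains only the figures quoted in this discussion; values that are not quoted here are left as None instead of being filled in, and the function name and table layout are illustrative assumptions.

```python
# Figures quoted in the discussion, in percent. None marks a value that is
# not quoted in this section and is deliberately left unfilled.
REPORTED = {
    "SVM": {"precision": 90.49, "recall": 96.71, "f1": 93.50},
    "Logistic regression": {"precision": 89.61, "recall": 96.77, "f1": 93.16},
    "LightGBM": {"precision": 89.81, "recall": None, "f1": 92.75},
    "Naive Bayes": {"precision": 83.66, "recall": 99.57, "f1": None},
    "Random forest": {"precision": None, "recall": 99.43, "f1": 91.0},
}


def best_model(metric, reported=REPORTED):
    """Return the model with the highest quoted value for the given metric."""
    candidates = {
        name: values[metric]
        for name, values in reported.items()
        if values.get(metric) is not None
    }
    return max(candidates, key=candidates.get)


for metric in ("precision", "recall", "f1"):
    print(metric, "->", best_model(metric))
```

Under these figures, the lookup selects SVM when precision or F1 score is the priority and Naive Bayes when recall is the priority, which mirrors the recommendation in the text.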

This study makes several contributions: it evaluates the performance of individual and ensemble classification models on a real-world dataset relevant to sentiment analysis in e-commerce; it reveals the advantages and limitations of each model in processing customer sentiment data; and it offers guidance to researchers and practitioners on which models to use for a given task.

The results of the present work can help businesses manage large volumes of customer reviews with automated ML models. Such models make it possible to process a large number of reviews in less time. A comparatively simple model, such as SVM or logistic regression, can produce quite stable results, which makes it practical for companies with limited computing power.

The contribution of this research is as follows. First, it applies ML models to sentiment analysis in e-commerce. Second, although many studies have addressed e-commerce sentiment analysis using classifiers and ensembles, this study compares ML classifiers and ensembles specifically on customer reviews of women’s clothing. The study also offers insight into how model performance is influenced by the characteristics of the data and the metric under consideration, adding to the knowledge base of this field.

The main limitation of this study is the use of a single dataset, so the findings may not generalize. In future research, similar models should be tested on datasets from other domains or other product categories. A further limitation is the use of only traditional and ensemble models to classify the reviews, without deep learning models, such as CNNs or RNNs, which could yield different findings.

Future research should extend this work to deep learning architectures, such as CNNs and RNNs, to deal with the complexities of sentiment analysis in e-commerce, since such architectures can capture many features of textual data. As suggested by the results of [35], it could be insightful to compare deep learning models with classical ML and the approaches in between to determine which methods best serve e-commerce platforms. Such studies should also compare the effect of more elaborate preprocessing, including more detailed feature engineering and dimensionality reduction. Hybrid methods, which combine several algorithms so that the output of each improves the outcome of the others, might also be researched further, especially for large datasets with a strong imbalance between positive and negative instances. Lastly, models could be tailored to particular product types or customer groups, which would give a deeper understanding of customers’ attitudes towards products and services and support better recommendation systems that improve the user experience across different kinds of e-commerce platforms.
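One simple way to combine several classifiers, offered here only as an illustration of the direction suggested above, is hard (majority) voting over their predicted labels. This is a simpler technique than the hybrid methods proposed for future work, and the three prediction lists below are hypothetical; they do not come from the models trained in this study.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three models on five reviews (1 = positive).
model_a = [1, 1, 0, 1, 0]
model_b = [1, 0, 0, 1, 1]
model_c = [0, 1, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print("ensemble labels:", ensemble)
```

Using an odd number of models avoids ties for binary labels. Weighted or probability-based (soft) voting and stacking are natural extensions but would need to be validated on the dataset.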

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest:

The authors declare no conflict of interest.

Funding:

This research received no external funding.

Author Contribution

Conceptualization, O.A.; Formal analysis, O.A.; Investigation, O.A.; Methodology, H.B.; Validation, H.B.; Writing – review & editing, H.B. All authors have read and agreed to the published version of the manuscript.

Data Availability

https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews

References

[1] Hemalatha B, Velmurugan T (2020) Impact of Customer Feedback System Using Machine Learning Algorithms for Sentiment Mining. Int J Innov Technol Explor Eng 9:1475–1483. https://doi.org/10.35940/ijitee.d1537.029420
[2] Yi S, Liu X (2020) Machine learning based customer sentiment analysis for recommending shoppers, shops based on customers’ review. Complex Intell Syst 6:621–634. https://doi.org/10.1007/s40747-020-00155-2
[3] Dang NC, Moreno-García MN, De la Prieta F (2020) Sentiment analysis based on deep learning: A comparative study. Electronics 9:483. https://doi.org/10.3390/electronics9030483
[4] Lin X (2020) Sentiment analysis of e-commerce customer reviews based on natural language processing. In: Proceedings of the 2020 2nd International Conference on Big Data and Artificial Intelligence, pp. 32–36. https://doi.org/10.1145/3436286.3436293
[5] Alzamzami F, Hoda M, El Saddik A (2020) Light gradient boosting machine for general sentiment classification on short texts: a comparative evaluation. IEEE Access 8:101840–101858. https://doi.org/10.1109/access.2020.2997330
[6] Zhao H, Liu Z, Yao X, Yang Q (2021) A machine learning-based sentiment analysis of online product reviews with a novel term weighting and feature selection approach. Inf Process Manag 58:102656. https://doi.org/10.1016/j.ipm.2021.102656
[7] Xie S (2019) Sentiment Analysis using machine learning algorithms: online women clothing reviews (Doctoral dissertation). Dublin, National College of Ireland
[8] NarendraBabu CR, Harsha S, Shaikh TS, LightGBM (2023) Next Point of Interest Location Prediction Using Ensemble Machine Learning. SN Comput Sci 4:764. https://doi.org/10.1007/s42979-023-02254-6
[9] Eriksson T, Bigi A, Bonera M (2020) Think with me, or think for me? On the future role of artificial intelligence in marketing strategy formulation. TQM J 32:795–814. https://doi.org/10.1108/tqm-12-2019-0303
[10] Wankhade M, Rao ACS, Kulkarni C (2022) A survey on sentiment analysis methods, applications, and challenges. Artif Intell Rev 55:5731–5780. https://doi.org/10.1007/s10462-022-10144-1
[11] Hamsagayathri P, Rajakumari K (2020) Machine learning algorithms to empower Indian women entrepreneur in E-commerce clothing. In: International Conference on Computer Communication and Informatics (ICCCI), IEEE, pp. 1–5. https://doi.org/10.1109/iccci48352.2020.9104111
[12] Nazir A, Rao Y, Wu L, Sun L (2020) Issues and challenges of aspect-based sentiment analysis: A comprehensive survey. IEEE Trans Affect Comput 13:845–863. https://doi.org/10.1109/taffc.2020.2970399
[13] Giri C, Chen Y (2022) Deep learning for demand forecasting in the fashion and apparel retail industry. Forecasting 4:565–581. https://doi.org/10.3390/forecast4020031
[14] Jain PK, Pamula R, Srivastava G (2021) A systematic literature review on machine learning applications for consumer sentiment analysis using online reviews. Comput Sci Rev 41:100413. https://doi.org/10.1016/j.cosrev.2021.100413
[15] Alantari HJ, Currim IS, Deng Y, Singh S (2022) An empirical comparison of machine learning methods for text-based sentiment analysis of online consumer reviews. Int J Res Mark 39:1–19. https://doi.org/10.1016/j.ijresmar.2021.10.011
[16] Wu SJ, Chiang RD, Chang HC (2024) Applying sentiment analysis in social web for smart decision support marketing. J Ambient Intell Humaniz Comput 15:1927–1936. https://doi.org/10.1007/s12652-018-0683-9
[17] Gomila R (2021) Logistic or linear? Estimating causal effects of experimental treatments on binary outcomes using regression analysis. J Exp Psychol Gen 150:700. https://doi.org/10.1037/xge0000920
[18] Boateng EY, Otoo J, Abaye DA (2020) Basic tenets of classification algorithms K-nearest-neighbor, support vector machine, random forest and neural network: A review. J Data Anal Inform Process 8:341–357. https://doi.org/10.4236/jdaip.2020.84020
[19] Vangara V, Vangara SP, Thirupathur K (2020) Opinion mining classification using naive bayes algorithm. Int J Innov Technol Explor Eng 9:495–498. https://doi.org/10.35940/ijitee.E2402.039520
[20] Alizadeh SH, Hediehloo A, Harzevili NS (2021) Multi independent latent component extension of naive Bayes classifier. Knowl Based Syst 213:106646. https://doi.org/10.1016/j.knosys.2020.106646
[21] Roy SS, Dey S, Chatterjee S (2020) Autocorrelation aided random forest classifier-based bearing fault detection framework. IEEE Sens J 20:10792–10800. https://doi.org/10.1109/JSEN.2020.2995109
[22] Chen H, Wu L, Chen J, Lu W, Ding J (2022) A comparative study of automated legal text classification using random forests and deep learning. Inf Process Manag 59:102798. https://doi.org/10.1016/j.ipm.2021.102798
[23] Kunapuli G (2023) Ensemble methods for machine learning. Simon and Schuster
[24] Yang Y, Lv H, Chen N (2023) A survey on ensemble learning under the era of deep learning. Artif Intell Rev 56:5545–5589. https://doi.org/10.1007/s10462-022-10283-5
[25] Dumitrescu E, Hué S, Hurlin C, Tokpavi S (2022) Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects. Eur J Oper Res 297:1178–1192. https://doi.org/10.1016/j.ejor.2021.06.053
[26] Nusinovici S, Tham YC, Yan MYC, Ting DSW, Li J, Sabanayagam C, Cheng CY (2020) Logistic regression was as good as machine learning for predicting major chronic diseases. J Clin Epidemiol 122:56–69. https://doi.org/10.1016/j.jclinepi.2020.03.002
[27] Ayyub K, Iqbal S, Munir EU, Nisar MW, Abbasi M (2020) Exploring diverse features for sentiment quantification using machine learning algorithms. IEEE Access 8:142819–142831. https://doi.org/10.1109/access.2020.3011202
[28] Zahoor K, Bawany NZ, Hamid S (2020) Sentiment analysis and classification of restaurant reviews using machine learning. In: 21st International Arab Conference on Information Technology (ACIT), IEEE, pp. 1–6. https://doi.org/10.1109/acit50332.2020.9300098
[29] Kewsuwun N, Kajornkasirat S (2022) A sentiment analysis model of agritech startup on Facebook comments using naive Bayes classifier. Int J Electr Comput Eng 12:229. https://doi.org/10.11591/ijece.v12i3.pp2829-2838
[30] Singh NK, Tomar DS, Sangaiah AK (2020) Sentiment analysis: a review and comparative analysis over social media. J Ambient Intell Humaniz Comput 11:97–117. https://doi.org/10.1007/s12652-018-0862-8
[31] Brooks N (2018) Women’s e-commerce clothing reviews. Kaggle
[32] Handelman GS, Kok HK, Chandra RV, Razavi AH, Huang S, Brooks M, Asadi H (2019) Peering into the black box of artificial intelligence: evaluation metrics of machine learning methods. Am J Roentgenol 212:38–43. https://doi.org/10.2214/ajr.18.20224
[33] Zhou J, Gandomi AH, Chen F, Holzinger A (2021) Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics 10:593. https://doi.org/10.3390/electronics10050593
[34] Mahmud FAM, Mullick SBRA, Anas TCM (2023) Sentiment Analysis of Women's Clothing Reviews on E-commerce Platforms: A Machine Learning Approach. Dhaka, Bangladesh, University of Liberal Arts Bangladesh
[35] Wassan S, Shen T, Xi C, Gulati K, Vasan D, Suhail B (2022) Customer Experience towards the Product during a Coronavirus Outbreak. Behavioural Neurology 2022:4279346. https://doi.org/10.1155/2022/4279346

No competing interests reported.


Reviewers agreed at journal: 06 Nov, 2024
Reviewers agreed at journal: 16 Oct, 2024
Reviewers agreed at journal: 13 Oct, 2024
Reviewers invited by journal: 11 Oct, 2024
Editor assigned by journal: 06 Sep, 2024
Submission checks completed at journal: 04 Sep, 2024
First submitted to journal: 04 Sep, 2024
