This section provides information about the dataset used in the study and the machine learning, deep learning, and ensemble learning methods that were applied to the dataset.
4.1. Dataset
The experiments were conducted on the Telco customer churn dataset [34], an open-source dataset describing the customers of a fictional telecommunications company operating in California in the third quarter. The dataset comprises 7043 customer records and includes information on customer retention, churn, and sign-ups for services. Each record has 21 attributes; Table 1 lists these attributes and their descriptions [34].
Table 1: Attributes and their Description
Attribute Name | Description
CustomerID | Unique customer identifier
Gender | Gender of the customer
SeniorCitizen | Customer's senior citizenship status
Partner | Whether the customer has a partner
Dependents | Whether the customer has dependents
Tenure | Number of months the customer has stayed with the company
PhoneService | Telephone service subscription status
MultipleLines | Whether the customer has multiple phone lines
InternetService | Customer's internet service provider
OnlineSecurity | Online security subscription status
OnlineBackup | Online backup subscription status
DeviceProtection | Device protection subscription status
TechSupport | Availability of technical support
StreamingTV | TV streaming subscription status
StreamingMovies | Movie streaming subscription status
Contract | The customer's contract duration
PaperlessBilling | Paperless billing status
PaymentMethod | Customer's payment method
MonthlyCharges | Amount charged to the customer per month
TotalCharges | Total amount charged to the customer
Churn | Customer churn status
The dataset was preprocessed to handle missing values and outliers. Missing values occur only in the TotalCharges attribute; they were imputed with the median of that attribute, as is commonly done in the literature.
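As an illustrative sketch (not the study's actual code), the median imputation of TotalCharges can be done with pandas; the toy values below are placeholders, not rows from the real dataset:

```python
import pandas as pd

# Toy stand-in for the Telco data; in the raw CSV, TotalCharges is stored as
# text and missing entries appear as blank strings.
df = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5", "108.15"]})

# Coerce to numeric so blanks become NaN, then fill NaN with the column median.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())
```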
4.2. Machine Learning
Machine learning is a branch of artificial intelligence that enables computer systems to learn from data, recognize patterns, and make decisions. Owing to its ability to work with a wide range of data, it is used in many fields today and has made solving complex problems more accessible and effective [35]. Information about the machine learning methods used in the study is given below.
4.2.1 Random Forest
RF is a method developed by Leo Breiman that combines the decisions of multiple trees trained on different training datasets, rather than building a single decision tree [36]. Each tree is trained on a different sample of the data, and their decisions are combined, resulting in a more powerful and stable classifier. RF is a popular algorithm that provides successful results for various machine learning problems such as classification and regression. It is also favored for its ability to work effectively on high-dimensional and large datasets. Because it has a wide range of applications and is generally resistant to overfitting, RF is widely used in data analysis [36].
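A minimal scikit-learn sketch of this idea, using synthetic data rather than the Telco features (the hyperparameters here are illustrative, not those of the study):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data as a stand-in for the real features.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Each of the 100 trees is trained on a bootstrap sample; their votes are combined.
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # accuracy on the held-out split
```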
4.2.2 Support Vector Machines
SVM can be defined as a vector-space-based machine learning method that finds a decision boundary between two classes and places this boundary as far as possible from the nearest training points. In statistical learning theory, SVM implements the structural risk minimization principle. One of the basic assumptions of SVM is that all samples in the training set are independent and identically distributed [37]. These properties make SVM an effective solution for linear and nonlinear classification and regression problems. Owing to its high performance and generalizability on both low- and high-dimensional datasets, SVM has a wide range of applications and is a powerful tool widely used in machine learning and data analysis [37].
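The maximum-margin idea can be sketched with scikit-learn on synthetic data (an illustration only, not the study's configuration):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters; a linear SVM places the
# maximum-margin boundary between them.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=7)
svm = SVC(kernel="linear").fit(X, y)
acc = svm.score(X, y)  # training accuracy on separable data
```

Swapping `kernel="linear"` for `kernel="rbf"` handles the nonlinear case mentioned above.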
4.2.3 Decision Tree
DT is an important tool in the field of data analysis and machine learning. Used in a wide range of data mining processes, decision trees facilitate information extraction by transforming complex data structures into simple decision rules. They provide an effective way to make data-driven decisions by analyzing the features in the dataset [38]. Building a decision tree is a fundamental part of the data mining workflow. First, the data is preprocessed and feature selection is performed. Then the tree is grown: nodes are created to identify the most discriminative decisions based on the features in the dataset. Finally, the tree is pruned to prevent overfitting and to ensure generalizability. Decision trees have yielded successful results in many fields such as disease diagnosis in medicine, risk assessment in finance, and customer segmentation in marketing. As a result, decision trees are an important tool that provides valuable information in data-driven decision-making processes and constitute one of the cornerstones of data analysis [38].
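The transformation of data into simple decision rules, and pruning via a depth limit, can be illustrated with scikit-learn on the standard iris dataset (a sketch, not the study's setup):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth acts as pre-pruning, keeping the tree small and generalizable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The fitted tree can be printed as simple if/else decision rules.
rules = export_text(tree, feature_names=list(iris.feature_names))
```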
4.2.4 Artificial Neural Networks
ANNs are artificial model systems inspired by the biological nervous system and provide impressive results in the fields of machine learning and data analysis. These artificial models can be used in various tasks thanks to their data processing and pattern recognition capabilities [39].
The basic components of artificial neural networks are called "neurons". These neurons mimic the functioning of real nerve cells when processing data inputs. Artificial neural networks are organized in layers: the input layer receives the data, the intermediate (hidden) layers process it, and the output layer produces the results. The internal connections of the network are determined by weights and thresholds. The learning process of the ANN takes place by adjusting the weights to recognize and understand patterns in the dataset.
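The layered structure described above can be sketched with scikit-learn's multilayer perceptron; the layer sizes and synthetic data here are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data as a stand-in for real features.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Input layer -> two hidden layers (16 and 8 neurons) -> output layer;
# training adjusts the connection weights to fit the patterns in the data.
net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=1)
net.fit(X, y)
acc = net.score(X, y)  # training accuracy
```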
Within the sub-field called deep learning, deeply structured networks can produce outstanding results by performing feature extraction on large and complex datasets. Used in many fields such as natural language processing and game strategies, ANNs play an important role in processing large amounts of data and understanding complex relationships [40].
4.2.5 Logistic Regression
LR is a classification technique widely used in statistical analysis and machine learning. Generally applied to binary classification problems, it aims to estimate the probability that a data point belongs to a particular class; the classification is then made by applying a threshold to this probability. LR derives its name from the "logit" function, which maps probabilities to an unbounded range of real numbers. The main purpose of the model is to estimate the probability of a given event occurring, based on the input features. The output is therefore a probability value between 0 and 1 [41].
At the heart of LR is the idea of finding a line or hyperplane (this line or plane is called the "decision boundary") that best separates the dataset. This line or plane tries to best capture the separation between classes. The training process takes place by adapting the parameters (weights) of the model to the data. At the end of training, the model can be used to predict new data points [42].
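A small scikit-learn sketch of these ideas (toy one-dimensional data, assumed for illustration): the model outputs a probability in (0, 1), and `predict()` applies the default 0.5 threshold to place points on either side of the decision boundary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: class 1 becomes more likely as x grows.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Probability of class 1 for a point near the decision boundary.
p = model.predict_proba([[2.25]])[0, 1]
```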
4.2.6 K-Nearest Neighbor
The KNN algorithm is a widely used technique for problems like pattern recognition, classification, and regression. The KNN algorithm has a simple structure. When a data point needs to be classified or predicted, the k nearest neighbors around this point are determined. The class labels or values of these neighbors are examined. In the case of classification, the class to which the largest number of neighbors belongs is selected. In the case of regression, the values of the neighbors are averaged and prediction is performed [43].
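The classification procedure just described can be written directly in NumPy (a minimal sketch using Euclidean distance; the toy clusters are assumptions of the example):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by the majority label among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(dists)[:k]               # indices of the k nearest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Two toy clusters: label 0 near the origin, label 1 near (5, 5).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
label = knn_predict(X_train, y_train, np.array([0.2, 0.3]), k=3)
```

For regression, the final line of `knn_predict` would instead average `y_train[nearest]`.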
4.3 Deep Learning
Deep learning is a branch of machine learning that aims to automatically learn data analysis, pattern recognition, and predictive capabilities using complex model structures such as artificial neural networks. By processing large datasets, this approach can identify more complex and high-level features, which can lead to more accurate results and better understanding [44]. Deep learning is characterized by making the layers of artificial neural networks wider and deeper. These neural networks use many intermediate layers to process the data, allowing more complex features to be discovered. The learning process involves automatically adjusting weights and features using large amounts of data. Deep learning aims to extract features directly from the data, which makes it possible to learn more information and achieve better results. The application areas of deep learning are quite wide: it is successfully used in fields such as image recognition, audio processing, natural language processing, medical diagnosis, and autonomous vehicles. For example, deep learning models can identify objects in images, understand text, and diagnose diseases [45]. The LSTM deep learning method used in this study is described below.
4.3.1. Long Short-Term Memory
LSTM is a powerful type of recurrent neural network developed for deep learning and especially for processing sequential data. LSTM has achieved great success in areas such as natural language processing, time series analysis, and video processing. This model is designed to overcome the limitations of traditional methods for learning and understanding sequential data [46]. The key feature of LSTM is its ability to capture long-term dependencies in sequential data, which gives it an advantage over standard recurrent neural networks (RNNs) [47].
An LSTM cell has three gates: the forget gate, the input gate, and the output gate. The forget gate controls how much of the previous cell state is retained. The input gate regulates how much new information is written to the cell state. The output gate determines how much of the updated cell state is exposed as the cell's output.
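The three gates can be sketched as a single LSTM time step in plain NumPy; the stacked layout of the weight matrix W and bias b is an assumption of this sketch, not a fixed standard:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W stacks the forget/input/output/candidate
    weights applied to the concatenated [h_prev, x] vector."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[:H])            # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])       # input gate: how much new information to write
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much of the state to expose
    g = np.tanh(z[3 * H:])        # candidate cell state
    c = f * c_prev + i * g        # updated cell state
    h = o * np.tanh(c)            # hidden state (the cell's output)
    return h, c

# One step with hidden size 4, input size 3, and random weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 7))      # 4 gates * hidden size 4, over [h (4); x (3)]
b = np.zeros(16)
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, b)
```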
4.4 Ensemble Learning
Ensemble learning combines multiple individual models to obtain predictions that are more accurate and robust than those of any single model. Because the base learners make different errors, aggregating their outputs can reduce variance and improve generalization. Information about the ensemble learning methods used in the study is given below [48].
4.4.1 XGBoost
XGBoost is an ensemble learning algorithm used in machine learning. It shows high performance, especially in classification and regression problems. XGBoost offers a more powerful and faster solution than previous Gradient Boosting methods.
XGBoost works as a tree-based algorithm. It combines multiple weak learners (usually decision trees) to build a robust prediction model, learning by sequentially adding trees, each of which corrects the errors of its predecessors [49].
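The core boosting loop this builds on can be illustrated with scikit-learn trees fitted sequentially to residuals. This is a toy sketch of the gradient boosting principle, not the xgboost library itself, which adds regularization and second-order gradient information:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data: a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Boosting loop: each shallow tree is fit to the residual errors of the
# current ensemble, and its shrunken prediction is added to the model.
pred = np.zeros_like(y)
learning_rate = 0.3
for _ in range(50):
    residual = y - pred
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)

mse = float(np.mean((y - pred) ** 2))  # should beat a constant predictor
```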
4.4.2 Voting
Voting is a method used in machine learning and statistical forecasting. In this method, the predictions of different models or classifiers are combined, and the final result is determined by vote. Voting is used to compensate for the weaknesses of individual models and to obtain more reliable predictions. There are two types: majority voting and probabilistic voting [50].
- Majority Voting (Hard Voting): The class predicted by the majority of the models is taken as the result. For example, consider three classifiers A, B, and C: if A and B predict an instance as Class 1 while C predicts Class 2, majority voting outputs Class 1 [51].
- Probabilistic Voting (Soft Voting): The predicted class probabilities of the different models are averaged (optionally with weights), and the class with the highest average probability is chosen. By taking the models' confidence into account, this method can yield more accurate predictions [50], [51].
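Both schemes can be sketched in a few lines (the probability values below are illustrative assumptions):

```python
import numpy as np
from collections import Counter

def hard_vote(predictions):
    """Majority (hard) voting: return the most common class prediction."""
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(probas):
    """Soft voting: average the models' class probabilities, return the
    index of the class with the highest average probability."""
    return int(np.argmax(np.mean(probas, axis=0)))

# Mirrors the example above: models A and B predict Class 1, model C predicts
# Class 2, so hard voting returns 1.
hard_result = hard_vote([1, 1, 2])

# Each row is one model's probabilities for [Class 1, Class 2]. The average is
# [0.45, 0.55], so soft voting returns index 1 (Class 2) even though two of
# three models favor Class 1 -- their confidence is low.
soft_result = soft_vote([[0.6, 0.4], [0.55, 0.45], [0.2, 0.8]])
```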
Voting is thus used to combine the predictions of different models or classifiers to obtain more robust and reliable results. In this study, all of the machine learning methods are combined using majority voting.