This section provides information about the dataset used in the study and the machine learning, deep learning, and ensemble learning methods that were applied to the dataset.
4.1. Dataset
The experiments were conducted on the Telco customer churn dataset [34], an open-source dataset describing the customers of a fictional telecommunications company operating in California in the third quarter. The dataset comprises 7043 customer records and includes information on customer retention, churn, and sign-ups for services. Each record has 21 attributes; Table 1 lists these attributes and their descriptions [34].
Table 1: Attributes and their Description
Attribute Name | Description
CustomerID | Unique customer identifier
Gender | Gender of the customer
SeniorCitizen | Customer's senior citizenship status
Partner | Whether the customer has a partner
Dependents | Whether the customer has dependents
Tenure | Number of months the customer has stayed with the company
PhoneService | Telephone service subscription status
MultipleLines | Whether the customer has multiple phone lines
InternetService | Customer's internet service provider
OnlineSecurity | Online security subscription status
OnlineBackup | Online backup subscription status
DeviceProtection | Device protection subscription status
TechSupport | Availability of technical support
StreamingTV | TV streaming subscription status
StreamingMovies | Movie streaming subscription status
Contract | The customer's contract duration
PaperlessBilling | Paperless billing status
PaymentMethod | Customer's payment method
MonthlyCharges | Amount charged to the customer per month
TotalCharges | Total amount charged to the customer
Churn | Customer churn status
The dataset was preprocessed to handle missing values and outliers. Missing values occur only in the TotalCharges attribute; they were imputed with the median of that attribute, as is commonly done in the literature.
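As an illustrative sketch (not the study's actual code), the median imputation of TotalCharges can be done with pandas; the toy values below are placeholders, not rows from the real dataset:

```python
import pandas as pd

# Toy stand-in for the Telco data; in the raw CSV, TotalCharges is stored as
# text and missing entries appear as blank strings.
df = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5", "108.15"]})

# Coerce to numeric so blanks become NaN, then fill NaN with the column median.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())
```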
4.2. Machine Learning
Machine learning is a branch of artificial intelligence that enables computer systems to learn from data, recognize patterns, and make decisions. Owing to its ability to work with a wide range of data, it is used in many fields today and has made solving complex problems more accessible and effective [35]. Information about the machine learning methods used in the study is given below.
4.2.1 Random Forest
RF is a method developed by Leo Breiman that combines the decisions of multiple trees trained on different training datasets, rather than building a single decision tree [36]. Each tree is trained on a different sample of the data, and their decisions are combined, resulting in a more powerful and stable classifier. RF is a popular algorithm that provides successful results for various machine learning problems such as classification and regression. It is also favored for its ability to work effectively on high-dimensional and large datasets. Because it has a wide range of applications and is generally resistant to overfitting, RF is widely used in data analysis [36].
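A minimal scikit-learn sketch of this idea, using synthetic data rather than the Telco features (the hyperparameters here are illustrative, not those of the study):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data as a stand-in for the real features.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Each of the 100 trees is trained on a bootstrap sample; their votes are combined.
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # accuracy on the held-out split
```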
4.2.2 Support Vector Machines
SVM can be defined as a vector-space-based machine learning method that finds a decision boundary between two classes and places this boundary as far as possible from the nearest training points. In statistical learning theory, SVM implements the structural risk minimization principle. One of the basic assumptions of SVM is that all samples in the training set are independent and identically distributed [37]. These properties make SVM an effective solution for linear and nonlinear classification and regression problems. Owing to its high performance and generalizability on both low- and high-dimensional datasets, SVM has a wide range of applications and is a powerful tool widely used in machine learning and data analysis [37].
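The maximum-margin idea can be sketched with scikit-learn on synthetic data (an illustration only, not the study's configuration):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters; a linear SVM places the
# maximum-margin boundary between them.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=7)
svm = SVC(kernel="linear").fit(X, y)
acc = svm.score(X, y)  # training accuracy on separable data
```

Swapping `kernel="linear"` for `kernel="rbf"` handles the nonlinear case mentioned above.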
4.2.3 Decision Tree
DT is an important tool in the field of data analysis and machine learning. Used in a wide range of data mining processes, decision trees facilitate information extraction by transforming complex data structures into simple decision rules. They provide an effective way to make data-driven decisions by analyzing the features in the dataset [38]. Building a decision tree is a fundamental part of the data mining workflow. First, the data is preprocessed and feature selection is performed. Then the tree is grown: nodes are created to identify the most discriminative decisions based on the features in the dataset. Finally, the tree is pruned to prevent overfitting and to ensure generalizability. Decision trees have yielded successful results in many fields such as disease diagnosis in medicine, risk assessment in finance, and customer segmentation in marketing. As a result, decision trees are an important tool that provides valuable information in data-driven decision-making processes and constitute one of the cornerstones of data analysis [38].
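The transformation of data into simple decision rules, and pruning via a depth limit, can be illustrated with scikit-learn on the standard iris dataset (a sketch, not the study's setup):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth acts as pre-pruning, keeping the tree small and generalizable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The fitted tree can be printed as simple if/else decision rules.
rules = export_text(tree, feature_names=list(iris.feature_names))
```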
4.2.4 Artificial Neural Networks
ANNs are artificial model systems inspired by the biological nervous system and provide impressive results in the fields of machine learning and data analysis. These artificial models can be used in various tasks thanks to their data processing and pattern recognition capabilities [39].
The basic components of artificial neural networks are called "neurons". These neurons mimic the functioning of real nerve cells when processing data inputs. Artificial neural networks are organized in layers: the input layer receives the data, the intermediate (hidden) layers process it, and the output layer produces the results. The internal connections of the network are determined by weights and thresholds. The learning process of the ANN takes place by adjusting the weights to recognize and understand patterns in the dataset.
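The layered structure described above can be sketched with scikit-learn's multilayer perceptron; the layer sizes and synthetic data here are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data as a stand-in for real features.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Input layer -> two hidden layers (16 and 8 neurons) -> output layer;
# training adjusts the connection weights to fit the patterns in the data.
net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=1)
net.fit(X, y)
acc = net.score(X, y)  # training accuracy
```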
Within the sub-field called deep learning, deeply structured networks can produce outstanding results by performing feature extraction on large and complex datasets. Used in many fields such as natural language processing and game strategies, ANNs play an important role in processing large amounts of data and understanding complex relationships [40].
4.2.5 Logistic Regression
LR is a classification technique widely used in statistical analysis and machine learning. Generally applied to binary classification problems, it aims to estimate the probability that a data point belongs to a particular class; the classification is then made by applying a threshold to this probability. LR derives its name from the "logit" function, which maps probabilities to an unbounded range of real numbers. The main purpose of the model is to estimate the probability of a given event occurring, based on the input features. The output is therefore a probability value between 0 and 1 [41].
At the heart of LR is the idea of finding a line or hyperplane (this line or plane is called the "decision boundary") that best separates the dataset. This line or plane tries to best capture the separation between classes. The training process takes place by adapting the parameters (weights) of the model to the data. At the end of training, the model can be used to predict new data points [42].
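A small scikit-learn sketch of these ideas (toy one-dimensional data, assumed for illustration): the model outputs a probability in (0, 1), and `predict()` applies the default 0.5 threshold to place points on either side of the decision boundary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: class 1 becomes more likely as x grows.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Probability of class 1 for a point near the decision boundary.
p = model.predict_proba([[2.25]])[0, 1]
```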
4.2.6 K-Nearest Neighbor
The KNN algorithm is a widely used technique for problems like pattern recognition, classification, and regression. The KNN algorithm has a simple structure. When a data point needs to be classified or predicted, the k nearest neighbors around this point are determined. The class labels or values of these neighbors are examined. In the case of classification, the class to which the largest number of neighbors belongs is selected. In the case of regression, the values of the neighbors are averaged and prediction is performed [43].
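The classification procedure just described can be written directly in NumPy (a minimal sketch using Euclidean distance; the toy clusters are assumptions of the example):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by the majority label among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(dists)[:k]               # indices of the k nearest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Two toy clusters: label 0 near the origin, label 1 near (5, 5).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
label = knn_predict(X_train, y_train, np.array([0.2, 0.3]), k=3)
```

For regression, the final line of `knn_predict` would instead average `y_train[nearest]`.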
4.3 Deep Learning
Deep learning is a branch of machine learning that aims to automatically learn data analysis, pattern recognition, and predictive capabilities using complex model structures such as artificial neural networks. By processing large datasets, this approach can identify more complex and high-level features, which can lead to more accurate results and better understanding [44]. Deep learning is characterized by making the layers of artificial neural networks wider and deeper. These neural networks use many intermediate layers to process the data, allowing more complex features to be discovered. The learning process involves automatically adjusting weights and features using large amounts of data. Deep learning aims to extract features directly from the data, which makes it possible to learn more information and achieve better results. The application areas of deep learning are quite wide: it is successfully used in fields such as image recognition, audio processing, natural language processing, medical diagnosis, and autonomous vehicles. For example, deep learning models can identify objects in images, understand text, and diagnose diseases [45]. The LSTM deep learning method used in this study is described below.
4.3.1. Long Short-Term Memory
LSTM is a powerful type of recurrent neural network developed for deep learning and especially for processing sequential data. LSTM has achieved great success in areas such as natural language processing, time series analysis, and video processing. This model is designed to overcome the limitations of traditional methods for learning and understanding sequential data [46]. The key feature of LSTM is its ability to capture long-term dependencies in sequential data, which gives it an advantage over standard recurrent neural networks (RNNs) [47].
An LSTM cell has three gates: the forget gate, the input gate, and the output gate. The forget gate controls how much of the previous cell state is retained. The input gate regulates how much new information is written to the cell state. The output gate determines how much of the updated cell state is exposed as the cell's output.
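The three gates can be sketched as a single LSTM time step in plain NumPy; the stacked layout of the weight matrix W and bias b is an assumption of this sketch, not a fixed standard:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W stacks the forget/input/output/candidate
    weights applied to the concatenated [h_prev, x] vector."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[:H])            # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])       # input gate: how much new information to write
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much of the state to expose
    g = np.tanh(z[3 * H:])        # candidate cell state
    c = f * c_prev + i * g        # updated cell state
    h = o * np.tanh(c)            # hidden state (the cell's output)
    return h, c

# One step with hidden size 4, input size 3, and random weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 7))      # 4 gates * hidden size 4, over [h (4); x (3)]
b = np.zeros(16)
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, b)
```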
4.4 Ensemble Learning
Ensemble learning combines multiple individual models to obtain predictions that are more accurate and robust than those of any single model. Because the base learners make different errors, aggregating their outputs can reduce variance and improve generalization. Information about the ensemble learning methods used in the study is given below [48].
4.4.1 XGBoost
XGBoost is an ensemble learning algorithm used in machine learning. It shows high performance, especially in classification and regression problems. XGBoost offers a more powerful and faster solution than previous Gradient Boosting methods.
XGBoost works as a tree-based algorithm. It combines multiple weak learners (usually decision trees) to build a robust prediction model, learning by sequentially adding trees, each of which corrects the errors of its predecessors [49].
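The core boosting loop this builds on can be illustrated with scikit-learn trees fitted sequentially to residuals. This is a toy sketch of the gradient boosting principle, not the xgboost library itself, which adds regularization and second-order gradient information:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data: a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Boosting loop: each shallow tree is fit to the residual errors of the
# current ensemble, and its shrunken prediction is added to the model.
pred = np.zeros_like(y)
learning_rate = 0.3
for _ in range(50):
    residual = y - pred
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)

mse = float(np.mean((y - pred) ** 2))  # should beat a constant predictor
```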
4.4.2 Voting
Voting is a method used in machine learning and statistical forecasting. In this method, the predictions of different models or classifiers are combined, and the final result is determined by vote. Voting is used to compensate for the weaknesses of individual models and to obtain more reliable predictions. There are two types: majority voting and probabilistic voting [50].
- Majority Voting (Hard Voting): The class predicted by the majority of the models is taken as the result. For example, consider three classifiers A, B, and C: if A and B predict an instance as Class 1 while C predicts Class 2, majority voting outputs Class 1 [51].
- Probabilistic Voting (Soft Voting): The predicted class probabilities of the different models are averaged (optionally with weights), and the class with the highest average probability is chosen. By taking the models' confidence into account, this method can yield more accurate predictions [50], [51].
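Both schemes can be sketched in a few lines (the probability values below are illustrative assumptions):

```python
import numpy as np
from collections import Counter

def hard_vote(predictions):
    """Majority (hard) voting: return the most common class prediction."""
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(probas):
    """Soft voting: average the models' class probabilities, return the
    index of the class with the highest average probability."""
    return int(np.argmax(np.mean(probas, axis=0)))

# Mirrors the example above: models A and B predict Class 1, model C predicts
# Class 2, so hard voting returns 1.
hard_result = hard_vote([1, 1, 2])

# Each row is one model's probabilities for [Class 1, Class 2]. The average is
# [0.45, 0.55], so soft voting returns index 1 (Class 2) even though two of
# three models favor Class 1 -- their confidence is low.
soft_result = soft_vote([[0.6, 0.4], [0.55, 0.45], [0.2, 0.8]])
```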
Voting is thus used to combine the predictions of different models or classifiers to obtain more robust and reliable results. In this study, all of the machine learning methods are combined using majority voting.