In this paper, a dataset is created to decide the necessity of spectrum handoff. Various machine learning classification algorithms are employed to learn the optimal decision boundary from the training data. Figures 7 to 10 show the training-set and test-set results for 100 and 500 users for various ML algorithms: Logistic Regression, the KNN algorithm, the SVM algorithm, the Naïve Bayes classifier, Decision Tree classification, and the Random Forest algorithm. These plots are made using the two independent variables, i.e., distance from the base station on the x-axis and power of each user on the y-axis. Each graph shows two regions, blue and yellow; the data points correspond to the users in the dataset, and each region contains the observations the model assigns to that class.
The blue observations are those for which the requirement of spectrum handoff (the dependent variable) is predicted to be 0, i.e., users who are under the coverage of base station 1. The yellow observations are those for which the requirement of spectrum handoff is predicted to be 1, meaning the user is not under the coverage of base station 1. Blue users therefore do not require spectrum handoff; once such a user crosses the boundary line, handoff is required. The models predict well, although a few data points fall in the wrong region; to quantify this error, we use the confusion matrix. The classification shown in Figs. 7(a), (b) and 9(a), (b) is the linear boundary produced by logistic regression; the remaining models use non-linear boundaries. The boundary shown in Figs. 7(c), (d) and 9(c), (d) is irregular because the K-NN algorithm classifies each point by its nearest neighbours. It also separates the users into their categories: the blue region for those who do not require handoff and the yellow region for those who do. Although the model shows good results, some yellow and blue points still lie in the wrong regions; this is not a serious issue, and tolerating it prevents overfitting. The SVM output shown in Figs. 7(e), (f) and 9(e), (f) is similar: a hyperplane separates the two classes into the blue and yellow regions.
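The blue/yellow decision regions in Figs. 7 to 10 are typically rendered by evaluating the trained classifier on a dense grid of (distance, power) points and colouring each grid point by its predicted class. A minimal sketch follows, with a toy K-NN model standing in; the training points, axis ranges, and grid resolution are illustrative assumptions, not the paper's data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: [distance (m), power (dBm)]; label 1 = handoff required.
X = np.array([[100, -40], [200, -45], [300, -50],
              [700, -75], [800, -80], [900, -85]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Dense grid over the feature plane (assumed ranges).
dd, pp = np.meshgrid(np.linspace(0, 1000, 200), np.linspace(-90, -30, 200))
grid = np.column_stack([dd.ravel(), pp.ravel()])

# 0 -> blue region (no handoff), 1 -> yellow region (handoff).
region = knn.predict(grid).reshape(dd.shape)
print("fraction of plane in the yellow region:", region.mean())
# With matplotlib, plt.contourf(dd, pp, region) would draw the two regions
# and a scatter of the users would overlay the data points.
```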
The Naïve Bayes classifier (see Figs. 8(a), (b) and 10(a), (b)) produces a fine Gaussian boundary that segregates the data points well. Some of its predictions are still in error, as the confusion matrix shows, but it remains a good classifier. The decision tree classification output shown in Figs. 8(c), (d) and 10(c), (d) differs from the other models: it splits the data with horizontal and vertical lines on the Distance and Power variables, because the tree tries to capture all the data. Figures 8(e), (f) and 10(e), (f) (Random Forest algorithm output) are very similar to the decision tree classifier. In the Random Forest classifier, we have taken 10 trees, each of which predicts Yes or No for the handoff; the classifier takes the majority of these predictions as its result. The number of incorrect predictions is minimal and the overfitting issue is avoided; changing the number of trees in the classifier would give different results.
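The six classifiers discussed above can be sketched with scikit-learn as follows. The synthetic (distance, power) dataset, the 600 m coverage rule, and the 25% test split are illustrative assumptions, not the authors' data; only the choice of models (including the 10-tree random forest) follows the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 100
distance = rng.uniform(0, 1000, n)    # distance from base station 1 (m), assumed range
power = rng.uniform(-90, -30, n)      # received power (dBm), assumed range
X = np.column_stack([distance, power])
y = (distance > 600).astype(int)      # 1 = handoff required (toy labelling rule)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(),
    "KNN Algorithm": KNeighborsClassifier(n_neighbors=5),
    "SVM Algorithm": SVC(kernel="linear"),
    "Naive Bayes Classifier": GaussianNB(),
    "Decision Tree Classification": DecisionTreeClassifier(random_state=0),
    "Random Forest Algorithm": RandomForestClassifier(n_estimators=10, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```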
In Tables 1 and 2, the predicted output and real test output are given for 100 and 500 users, respectively. We can clearly see that some values in the prediction vector differ from the real vector values. These are called prediction errors and are highlighted in the tables for better understanding. To count the correct and incorrect predictions, we use the confusion matrix: a table whose rows represent the actual classes the model should have produced and whose columns represent the classes the algorithm predicted, as shown in Figs. 11 and 12. Diagonal entries are correct predictions, while off-diagonal entries are wrong predictions, so the errors are easy to see. With the confusion matrix, we can measure the quality of the model. In Fig. 11(a), the confusion matrix has 0 + 3 = 3 incorrect predictions and 19 + 3 = 22 correct predictions. The numbers of correct and incorrect predictions are shown in Figs. 11 and 12 for 100 and 500 users.
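The confusion matrix described for Fig. 11(a) can be reproduced directly from the logistic regression vectors in Table 1 (100-user case); scikit-learn's convention of rows = actual class and columns = predicted class is assumed here.

```python
from sklearn.metrics import confusion_matrix

# Test set and predict set vectors, copied from Table 1 (Logistic Regression).
y_test = [1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,1,0,0,0]
y_pred = [1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0]

cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
# [[19  0]
#  [ 3  3]]
print("correct:", tn + tp, "incorrect:", fp + fn)  # correct: 22 incorrect: 3
```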
Table 1
Predicted output and real test output of various ML algorithms for 100 users

Logistic Regression
  Test set:    [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0]
  Predict set: [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]

KNN Algorithm
  Test set:    [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0]
  Predict set: [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0]

SVM Algorithm
  Test set:    [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0]
  Predict set: [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0]

Naïve Bayes Classifier
  Test set:    [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0]
  Predict set: [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0]

Decision Tree Classification
  Test set:    [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0]
  Predict set: [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0]

Random Forest Algorithm
  Test set:    [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0]
  Predict set: [1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0]
Table 2
Predicted output and real test output of various ML algorithms for 500 users

Logistic Regression
  Test set:    [0 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0]
  Predict set: [0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 1 0 0 1 0 1 0 1 1 1 0 0 0 1 1 0 0 1 0 0 0 0 1 0]

KNN Algorithm
  Test set:    [0 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0]
  Predict set: [1 1 1 0 1 1 0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 0 1 0 1 1 0 0 1 0 0 1 0 1 1 0 1 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0]

SVM Algorithm
  Test set:    [0 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0]
  Predict set: [0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 1 0 0 1 0 1 0 1 1 1 0 0 0 1 1 0 0 1 0 0 0 0 1 0]

Naïve Bayes Classifier
  Test set:    [0 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0]
  Predict set: [0 1 1 0 1 1 0 1 0 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 1 0 1 1 0 0 0 1 0 0 1 0 1 1 0 0 1 0 0 1 0 0 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0]

Decision Tree Classification
  Test set:    [0 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0]
  Predict set: [0 1 1 0 1 1 1 1 1 0 1 0 0 1 1 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 1 1 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0]

Random Forest Algorithm
  Test set:    [0 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0]
  Predict set: [1 1 1 0 1 1 0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 0 1 0 1 1 0 1 1 0 0 1 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1 1 0 1 1 1 0 1 0 0 1 0 0 0 0 0 1 1 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0]
We analyzed the performance of the ML algorithms on various parameters, namely Accuracy, Precision, Sensitivity, Specificity, F1_score, and the Confusion Matrix [30], by varying the number of users; the results are presented in Tables 3 and 4. Accuracy measures how often the model is correct overall. Precision is the fraction of predicted positives that are actually positive. Sensitivity (recall), on the other hand, is the fraction of actual positives the model correctly identifies, so it is useful for assessing how well the model predicts a positive outcome. Specificity is the corresponding measure for the negative class: the fraction of actual negatives the model correctly identifies. The F1 score is the harmonic mean of precision and sensitivity; it does not take the true-negative count into account. It is observed from Tables 3 and 4 that as the number of test users increases, the number of prediction errors grows. The system is therefore expected to become more efficient with fewer test users and more trained ones.
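As a consistency check, the logistic regression row of Table 3 follows from the standard metric definitions applied to its confusion-matrix counts (TN = 19, FP = 0, FN = 3, TP = 3, as derived from Table 1):

```python
# Confusion-matrix counts for logistic regression, 100 users.
tn, fp, fn, tp = 19, 0, 3, 3

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # 22/25 = 0.88
precision   = tp / (tp + fp)                    # 3/3   = 1.0
sensitivity = tp / (tp + fn)                    # 3/6   = 0.5  (recall)
specificity = tn / (tn + fp)                    # 19/19 = 1.0
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # 0.666...

print(accuracy, precision, sensitivity, specificity, round(f1, 3))
# 0.88 1.0 0.5 1.0 0.667
```

These values match the Logistic Regression row of Table 3 (0.88, 1.0, 0.5, 1.0, 0.666).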
Table 3
Performance analysis of various ML algorithms for 100 users

Algorithm                     | Accuracy | Precision | Sensitivity | Specificity | F1_score
Logistic Regression           | 0.88     | 1.0       | 0.5         | 1.0         | 0.666
KNN Algorithm                 | 0.96     | 1.0       | 0.833       | 1.0         | 0.909
SVM Algorithm                 | 0.92     | 1.0       | 0.666       | 1.0         | 0.8
Naïve Bayes Classifier        | 0.92     | 1.0       | 0.666       | 1.0         | 0.8
Decision Tree Classification  | 1.0      | 1.0       | 1.0         | 1.0         | 1.0
Random Forest Algorithm       | 0.96     | 1.0       | 0.833       | 1.0         | 0.909
Table 4
Performance analysis of various ML algorithms for 500 users

Algorithm                     | Accuracy | Precision | Sensitivity | Specificity | F1_score
Logistic Regression           | 0.776    | 0.704     | 0.673       | 0.835       | 0.688
KNN Algorithm                 | 0.856    | 0.769     | 0.869       | 0.848       | 0.816
SVM Algorithm                 | 0.792    | 0.738     | 0.673       | 0.860       | 0.704
Naïve Bayes Classifier        | 0.84     | 0.782     | 0.782       | 0.873       | 0.782
Decision Tree Classification  | 0.832    | 0.735     | 0.847       | 0.822       | 0.787
Random Forest Algorithm       | 0.832    | 0.711     | 0.913       | 0.784       | 0.799