5.1 Correlation analysis
The correlations among the features of the dataset indicate how the features relate to one another and how strongly each one influences the dependent variable. The Pearson correlation heat map of the features is shown in “Fig. 4”.
“Fig. 4” reveals a strong positive correlation between Throughput_DL_user, CQI_Avg and DL_perceived_throughput. A relatively weak positive correlation is observed between CDR and DL_perceived_throughput, and DL_perceived_throughput is weakly negatively correlated with the other variables.
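As an illustration of what the heat map encodes, the following minimal sketch computes pairwise Pearson coefficients in plain Python. The three column names come from the dataset, but the numeric values are hypothetical toy data and the helper names `pearson` and `correlation_matrix` are our own; this is not the code used to produce “Fig. 4”.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy values, NOT taken from the paper's dataset.
data = {
    "CQI_Avg": [7.0, 8.0, 9.0, 10.0, 11.0],
    "CDR": [0.5, 0.1, 0.4, 0.2, 0.6],
    "DL_perceived_throughput": [10.0, 13.0, 15.0, 19.0, 22.0],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each cell of a heat map is one entry of such a matrix: values near +1 or -1 indicate a strong linear relationship, and values near 0 a weak one.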
5.2 Predictive analysis
In this analysis, we use ML techniques to compare how well each input variable predicts the perceived downlink throughput, using the RMSE as the comparison metric. The RMSE measures the error between the predicted and actual throughput values in the test set and is given by:
$$RMSE=\sqrt{\frac{1}{N}{\sum }_{i=1}^{N}{\left({A}_{i}-{P}_{i}\right)}^{2}}$$
1
where Ai is the ith value of DL_perceived_throughput in the test set, Pi is the corresponding ith predicted value of DL_perceived_throughput, and N is the number of DL_perceived_throughput observations in the test set. Table 2 gives the RMSE between the actual values of DL_perceived_throughput in the test set and the values predicted with the LR technique.
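Eq. (1) can be checked on a small example. The sketch below is a direct transcription of the formula; the function name `rmse` and the four data points are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between actual values A_i and predictions P_i."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Hypothetical values: the errors are -1, 0, 2 and -2.
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 13.0, 22.0]
error = rmse(actual, predicted)
print(error)
```

Here the squared errors sum to 9, so the RMSE is the square root of 9/4, that is 1.5.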
Table 2
RMSE between actual values of DL_perceived_throughput in test set and predicted values when using LR model
Method | Variable used in prediction for DL_perceived_throughput | RMSE |
LR | DL_Traffic | 10.17 |
LR | Traffic_UL | 11.91 |
LR | Total_Traffic | 11.33 |
LR | Active_user_Max | 7.60 |
LR | RRC_user_Max | 33.51 |
LR | throughput_UL_user | 3331.72 |
LR | Throughput_DL_user | 11937.21 |
LR | DL_PRB_Rate | 8.54 |
LR | UL_PRB_Rate | 8.54 |
LR | CDR | 13.12 |
LR | CQI_Avg | 5.65 |
LR | Radio_DL_Delay_avg | 52.50 |
The lowest RMSE in Table 2, 5.65, is obtained when CQI_Avg in the test set is used with the fitted LR model to generate the predicted values of DL_perceived_throughput. For the DT model, the RMSE between the actual values of DL_perceived_throughput in the test set and the predicted values is given in Table 3.
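The procedure behind each row of such a table can be sketched as follows: fit a model on one predictor using the training set, predict on the test set, and compute the RMSE. The sketch assumes ordinary least squares with a single predictor; the data values and the helper names `fit_simple_lr` and `rmse` are hypothetical, and the paper does not state which software or exact settings were used.

```python
import math

def fit_simple_lr(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(actual, predicted):
    """Root mean square error, as in Eq. (1)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical train/test split for one predictor (e.g. CQI_Avg).
train_x = [6.0, 7.0, 8.0, 9.0, 10.0]
train_y = [8.0, 10.0, 12.0, 14.0, 16.0]
test_x = [11.0, 12.0]
test_y = [19.0, 19.0]

slope, intercept = fit_simple_lr(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]
score = rmse(test_y, predictions)
print(slope, intercept, score)
```

Repeating this for every candidate predictor and keeping the RMSE of each run gives one table per model; the predictor with the smallest RMSE is the most informative one for that model.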
Table 3
RMSE between actual values of DL_perceived_throughput in test set and predicted values when using DT model
Method | Variable used in prediction for DL_perceived_throughput | RMSE |
DT | DL_Traffic | 9.96 |
DT | Traffic_UL | 12.00 |
DT | Total_Traffic | 11.11 |
DT | Active_user_Max | 7.67 |
DT | RRC_user_Max | 33.49 |
DT | throughput_UL_user | 3331.70 |
DT | Throughput_DL_user | 11937.14 |
DT | DL_PRB_Rate | 8.52 |
DT | UL_PRB_Rate | 8.61 |
DT | CDR | 13.23 |
DT | CQI_Avg | 5.78 |
DT | Radio_DL_Delay_avg | 52.37 |
The lowest RMSE in Table 3, 5.78, is obtained when CQI_Avg in the test set is used with the fitted DT model to generate the predicted values of DL_perceived_throughput. For the RF model, the RMSE between the actual values of DL_perceived_throughput in the test set and the predicted values is given in Table 4.
Table 4
RMSE between actual values of DL_perceived_throughput in test set and predicted values when using RF model
Method | Variable used in prediction for DL_perceived_throughput | RMSE |
RF | DL_Traffic | 9.96 |
RF | Traffic_UL | 11.71 |
RF | Total_Traffic | 11.15 |
RF | Active_user_Max | 7.37 |
RF | RRC_user_Max | 33.54 |
RF | throughput_UL_user | 3331.85 |
RF | Throughput_DL_user | 11937.38 |
RF | DL_PRB_Rate | 8.28 |
RF | UL_PRB_Rate | 8.31 |
RF | CDR | 12.94 |
RF | CQI_Avg | 5.43 |
RF | Radio_DL_Delay_avg | 52.54 |
In Table 4, the lowest RMSE, 5.43, is obtained when CQI_Avg in the test set is used with the fitted RF model to generate the predicted values of DL_perceived_throughput. For the ANN (MLP) model, the RMSE between the actual values of DL_perceived_throughput in the test set and the predicted values is given in Table 5.
Table 5
RMSE between actual values of DL_perceived_throughput in test set and predicted values when using MLP model
Method | Variable used in prediction for DL_perceived_throughput | RMSE |
MLP | DL_Traffic | 10.12 |
MLP | Traffic_UL | 11.90 |
MLP | Total_Traffic | 11.26 |
MLP | Active_user_Max | 7.49 |
MLP | RRC_user_Max | 33.16 |
MLP | throughput_UL_user | 3331.74 |
MLP | Throughput_DL_user | 11937.26 |
MLP | DL_PRB_Rate | 8.49 |
MLP | UL_PRB_Rate | 8.53 |
MLP | CDR | 13.13 |
MLP | CQI_Avg | 5.72 |
MLP | Radio_DL_Delay_avg | 52.45 |
In Table 5, the lowest RMSE, 5.72, is obtained when CQI_Avg in the test set is used with the fitted MLP model to generate the predicted values of DL_perceived_throughput. For the DNN model, the RMSE between the actual values of DL_perceived_throughput in the test set and the predicted values is given in Table 6.
Table 6
RMSE between actual values of DL_perceived_throughput in test set and predicted values when using DNN model
Method | Variable used in prediction for DL_perceived_throughput | RMSE |
DNN | DL_Traffic | 11.96 |
DNN | Traffic_UL | 12.11 |
DNN | Total_Traffic | 13.30 |
DNN | Active_user_Max | 7.80 |
DNN | RRC_user_Max | 31.12 |
DNN | throughput_UL_user | 3159.79 |
DNN | Throughput_DL_user | 12292.75 |
DNN | DL_PRB_Rate | 9.01 |
DNN | UL_PRB_Rate | 8.77 |
DNN | CDR | 13.28 |
DNN | CQI_Avg | 6.04 |
DNN | Radio_DL_Delay_avg | 56.50 |
In Table 6, an RMSE of 6.04 is obtained when CQI_Avg in the test set is used with the fitted DNN model to generate the predicted values of DL_perceived_throughput. The DNN model thus further confirms that the Channel Quality Indicator (CQI), represented here by the CQI_Avg parameter, is a good predictor of throughput, because it gives the lowest RMSE of all the input parameters of the DNN model.
5.3 Model prediction performance
In this part, we use training and test datasets to train and evaluate each of our models.
“Fig. 5” illustrates the actual values of DL_perceived_throughput versus the predictions generated by the LR, DT, RF and MLP models. The black line, referred to as the test set in this figure, represents the actual values of DL_perceived_throughput in the test set.
We evaluate these five ML models (LR, DT, RF, MLP and DNN) on the dataset using two regression metrics: RMSE and R-Squared.
The RMSE is the square root of the mean of the squared errors; its expression is given in Eq. (1).
The R-Squared metric, also known as the coefficient of determination, indicates how well the model predicts unseen values. R-Squared is calculated using Eq. (2).
$${r}^{2}=1-\frac{SSE}{SSTO}$$
2
where SSE is the sum of squared errors and SSTO is the total sum of squares.
$$SSE={\sum }_{i=1}^{N}{\left({A}_{i}-{P}_{i}\right)}^{2}$$
3
$$SSTO={\sum }_{i=1}^{N}{\left({A}_{i}-\bar{A}\right)}^{2}$$
4
where Ai and Pi are the actual and predicted values of the target variable, as in Eq. (1), and Ā is the mean of the actual values.
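The coefficient of determination can be illustrated with a short sketch that uses the standard definitions, in which SSE sums the squared differences between actual and predicted values and SSTO sums the squared deviations of the actual values from their mean. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SSTO."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ssto = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / ssto

# Hypothetical values: mean of actual is 5, so SSTO = 9 + 1 + 1 + 9 = 20.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 6.0, 7.0]  # SSE = 1 + 0 + 0 + 1 = 2
r2 = r_squared(actual, predicted)

# A model that always predicts the mean explains none of the variance.
r2_baseline = r_squared(actual, [5.0, 5.0, 5.0, 5.0])
print(r2, r2_baseline)
```

With these numbers the score is 1 - 2/20 = 0.9, while the mean-only baseline scores 0, which is why values close to 1 indicate a good fit.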
The MAE metric measures the average of the magnitude of errors over the test set with N data points. Its expression is given in Eq. (5).
$$MAE=\frac{1}{N}{\sum }_{i=1}^{N}\left|{A}_{i}-{P}_{i}\right|$$
5
RMSE and MAE are negatively oriented scores, which means lower values are better.
In addition, RMSE is preferable to MAE when large errors (outliers) are particularly undesirable, as is the case in our study, because squaring the errors gives the largest ones more weight.
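The different sensitivity of the two metrics to large errors can be seen in a small sketch. Both prediction sets below are hypothetical and constructed to have the same MAE; only the second concentrates the error in a single outlier.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, as in Eq. (1)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [10.0, 10.0, 10.0, 10.0]
spread = [11.0, 9.0, 11.0, 9.0]     # four errors of magnitude 1
outlier = [10.0, 10.0, 10.0, 14.0]  # one error of magnitude 4

mae_spread, mae_outlier = mae(actual, spread), mae(actual, outlier)
rmse_spread, rmse_outlier = rmse(actual, spread), rmse(actual, outlier)
print(mae_spread, mae_outlier, rmse_spread, rmse_outlier)
```

The MAE is 1 in both cases, whereas the RMSE doubles from 1 to 2 when the same total error is concentrated in one point.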
The prediction performance of each ML algorithm is evaluated on a test set. The predicted values of DL_perceived_throughput are plotted against the actual values in “Fig. 6” for the LR, DT, RF and MLP models.
After implementing the DNN model, we use training and validation (test) data to measure its performance. The MAE is used as the loss, and the resulting training and validation losses are shown in “Fig. 7”.
“Fig. 7” shows low training and validation losses, which supports the throughput prediction performance of the DNN model.
The prediction performance of the DNN model can be seen in “Fig. 8”.
As can be observed in “Fig. 8”, our DNN model predicts the throughput with high accuracy.
“Fig. 9” illustrates the relationship between the actual values of each predictor (independent variable) and the values of DL_perceived_throughput predicted by the DNN model.
Performance Comparison of All Models
The comparison between the models in this study is based on three evaluation metrics: accuracy, MAE and RMSE. The results for these metrics are grouped in Table 7.
Table 7
Metrics | LR | DT | RF | MLP | DNN |
MAE | 0.72 | 0.90 | 0.63 | 1.10 | 0.49 |
RMSE | 0.92 | 1.40 | 0.83 | 1.58 | 0.73 |
accuracy | 91.75% | 81.01% | 93.31% | 75.73% | 96.1% |
The results listed in Table 7 show that the five models can be used effectively for prediction. According to Table 7, the DNN offers the best performance, with the lowest RMSE (0.73), the lowest MAE (0.49) and the highest accuracy (96.1%).
To compare the models in more detail, we have plotted each evaluation metric for each model as a bar chart. “Figs. 10, 11 and 12” show these bar charts.
As can be seen, “Figs. 10, 11 and 12” show that the DNN is the best model because it offers the smallest RMSE and MAE. The DNN model also offers the highest accuracy, 96.1%.
We present below the functional diagram of the model / system.
Random Forest – Decision tree
1. For b = 1 to B:
(a) Draw N sample points from the data collected from the MUEs and the neighboring small cell base stations (SBSs) to form a bootstrap sample at the designated SBS.
(b) Grow a random forest tree \({T}_{b}\) on the bootstrapped data by recursively repeating the following steps for each terminal node of the tree until the minimum node size \({n}_{min}\) is reached:
- Select m variables at random from the p variables.
- Pick the best variable/split-point among the m variables.
- Split the node into two daughter nodes.
2. Output the ensemble of trees \({\left\{{T}_{b}\right\}}_{1}^{B}\).
For regression, the prediction of a new parameter value from the input data x is given by
$${\hat{f}}_{RF}^{B}\left(x\right)=\frac{1}{B}{\sum }_{b=1}^{B}{T}_{b}\left(x\right)$$
For classification, the prediction is given by the majority vote as follows:
Let \({C}_{b}\left(x\right)\) be the class prediction of the bth random forest tree, then
$${C}_{RF}^{B}\left(x\right)=\text{majority vote}\;{\left\{{C}_{b}\left(x\right)\right\}}_{1}^{B}$$
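The bootstrap, random variable selection and aggregation steps above can be sketched in plain Python. This is a deliberately simplified illustration, not the proposed scheme: each tree is grown to a single split (a stump) rather than down to the minimum node size, the data sharing between SBSs is not modelled, and all names and data values are hypothetical.

```python
import random
from collections import Counter

def bootstrap(rows, targets, rng):
    """Step (a): draw len(rows) points with replacement."""
    picks = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in picks], [targets[i] for i in picks]

def grow_stump(rows, targets, m, rng):
    """Step (b), limited to one split: best split among m random variables."""
    mean = sum(targets) / len(targets)
    best = None  # (sse, variable, threshold, left_mean, right_mean)
    for j in rng.sample(range(len(rows[0])), m):
        for threshold in sorted({row[j] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            if not right:
                continue  # threshold is the maximum value: nothing to split off
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:
        return lambda x: mean  # no valid split: fall back to the sample mean
    _, j, threshold, lm, rm = best
    return lambda x: lm if x[j] <= threshold else rm

def fit_forest(rows, targets, B, m, seed=0):
    """Grow B trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(B):
        sample_rows, sample_targets = bootstrap(rows, targets, rng)
        forest.append(grow_stump(sample_rows, sample_targets, m, rng))
    return forest

def predict_regression(forest, x):
    """Regression output: average of the B tree predictions."""
    return sum(tree(x) for tree in forest) / len(forest)

def majority_vote(class_predictions):
    """Classification output: most common class among the tree predictions."""
    return Counter(class_predictions).most_common(1)[0][0]

# Hypothetical toy data: the target depends only on the first variable.
rows = [[float(i), float(i % 2)] for i in range(1, 9)]
targets = [10.0 if row[0] <= 4 else 20.0 for row in rows]

forest = fit_forest(rows, targets, B=25, m=1)
low = predict_regression(forest, [1.0, 0.0])
high = predict_regression(forest, [8.0, 0.0])
label = majority_vote(["good", "poor", "good"])
print(low, high, label)
```

Because m = 1 of the p = 2 variables is drawn per tree, some trees split on the uninformative second variable; averaging over the ensemble still separates the two test points, which is the effect the aggregation step is meant to provide.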
With the proposed RF algorithm, each SBS applies the RF locally, using its own data and the data received from the neighboring SBSs to construct the bootstrap sample. The contribution of this scheme is the sharing of data among the SBSs, which enables dynamic cooperative clustering and effective parameter classification. The size of the cluster of SBSs exchanging data is also variable.
The hyperparameters of our model are optimized by cross-validation, using a grid search implemented with GridSearchCV from Scikit-Learn.
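A minimal sketch of such a cross-validated grid search is given below. The estimator, the parameter grid, the number of folds and the synthetic data are all assumptions made for illustration; the paper does not list the values that were actually searched.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the paper's dataset is not reproduced here.
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid: these candidate values are assumptions.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,                                   # 3-fold cross-validation (assumed)
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence "neg"
)
search.fit(X_train, y_train)

best_params = search.best_params_
cv_rmse = -search.best_score_               # mean cross-validated RMSE of the best setting
test_rmse = -search.score(X_test, y_test)   # score() uses the same scorer
print(best_params, cv_rmse, test_rmse)
```

GridSearchCV fits every combination in the grid on each fold, keeps the combination with the best mean validation score, and then refits it on the whole training set, so the held-out test set is touched only once at the end.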