Three metrics are used to evaluate the performance of the proposed IMV-RE algorithm: the classification accuracy obtained with a RandomForest classifier, the root mean square error (RMSE), and the coefficient of determination (R2). The IMV-RE algorithm is applied to ten benchmark datasets, and its results are compared with those of five MVI techniques: CCMVI, KNNI, MiceForest, IterativeImputer and SimpleImputer.
Table 2 reports the average classification accuracies obtained on the ten benchmark datasets (Waveform, Glass, Wheat-Seed, Digits, Wine, Iris, Seeds, Ionosphere, SCADI and Ecoli) at missing rates of 10%, 20%, 30%, 40% and 50%. Averaged over all ten datasets, the proposed IMV-RE algorithm achieves an accuracy of 91.58%, which is higher than that of MiceForest (81.91%), KNNI (80.95%), SimpleImputer (81.70%) and IterativeImputer (82.81%). No overall average is reported for CCMVI, because it could not be applied to every dataset.
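The text does not state how the missing values were introduced at each rate. As a minimal sketch, assuming cells are removed completely at random (an assumption, not the authors' documented procedure), a dataset could be masked at a given rate as follows. The helper name `mask_cells` and the toy matrix are hypothetical.

```python
import random

def mask_cells(rows, missing_rate, seed=0):
    """Return a copy of `rows` with about `missing_rate` of the cells set to None.

    Assumption: cells are removed completely at random (MCAR); the source
    does not specify the missingness mechanism.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    n_missing = round(missing_rate * n_rows * n_cols)
    # Sample distinct flat cell indices so the requested count is hit exactly.
    chosen = set(rng.sample(range(n_rows * n_cols), n_missing))
    return [
        [None if r * n_cols + c in chosen else value for c, value in enumerate(row)]
        for r, row in enumerate(rows)
    ]

# Hypothetical 10 x 4 toy matrix, used only to exercise the helper.
toy = [[float(r + c) for c in range(4)] for r in range(10)]
for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    masked = mask_cells(toy, rate)
    n_none = sum(value is None for row in masked for value in row)
    print(rate, n_none)
```

Fixing the seed keeps the masks reproducible, so every MVI technique can be evaluated on the same missing cells.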
For the Waveform, Glass, Wheat-Seed and Digits datasets, the CCMVI technique is not applicable beyond a 10% missing rate, because it depends on samples without missing values to calculate the threshold used to estimate the missing values in a particular class. The corresponding entries are marked "-" in the tables.
Similarly, Table 3 compares the average RMSE values obtained for the ten datasets. The proposed IMV-RE algorithm achieves the lowest average RMSE (0.73) among the MVI techniques for which an average can be computed. The average RMSE for the CCMVI technique cannot be calculated for the reason given above.
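For reference, RMSE can be sketched as below. This assumes the standard definition, computed over paired actual and imputed values; the source does not say whether the error is taken over the imputed cells only or over the whole dataset, and the function name `rmse` and the sample values are illustrative.

```python
import math

def rmse(actual, imputed):
    """Root mean square error between paired actual and imputed values."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - b) ** 2 for a, b in zip(actual, imputed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values only: one of four cells is off by 2.0.
print(rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

Because RMSE is expressed in the units of the data, values are not directly comparable across datasets with different feature scales, which is worth keeping in mind when reading the Average row of Table 3.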
Table 4 compares the average coefficient of determination (R2) obtained for the ten datasets. The proposed IMV-RE algorithm achieves an average R2 of 0.887, which is higher than that of the other MVI techniques.
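The coefficient of determination can be sketched in the same spirit. This assumes the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares; the function name `coefficient_of_determination` and the sample values are illustrative, not taken from the experiments.

```python
def coefficient_of_determination(actual, imputed):
    """R2 = 1 - SS_res / SS_tot for paired actual and imputed values."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - b) ** 2 for a, b in zip(actual, imputed))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot

# Illustrative values only.
print(coefficient_of_determination([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

A value of 1 indicates that the imputed values reproduce the actual values exactly, so higher values in Table 4 indicate better imputation.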
Table 2
Average classification accuracies obtained from distinct MVI techniques
Datasets | IMV-RE | MiceForest | KNNI | CCMVI | SimpleImputer | IterativeImputer |
Waveform | 0.96443 | 0.78015 | 0.76447 | - | 0.77695 | 0.79092 |
Glass | 0.80498 | 0.54197 | 0.52140 | - | 0.55801 | 0.57763 |
Wheat-Seed | 0.95944 | 0.78583 | 0.78957 | - | 0.81168 | 0.82681 |
Digits | 0.98664 | 0.77682 | 0.73919 | - | 0.65356 | 0.71446 |
Wine | 0.95883 | 0.95197 | 0.93429 | 0.96990 | 0.96860 | 0.96537 |
Iris | 0.97333 | 0.92267 | 0.90667 | 0.97267 | 0.92933 | 0.94800 |
Seeds | 0.89143 | 0.87143 | 0.88095 | 0.90286 | 0.87714 | 0.87810 |
Ionosphere | 0.91851 | 0.91059 | 0.90365 | 0.92029 | 0.91285 | 0.90766 |
SCADI | 0.85275 | 0.84440 | 0.83846 | 0.84725 | 0.85626 | 0.84703 |
Ecoli | 0.84753 | 0.80530 | 0.81605 | 0.85595 | 0.82558 | 0.82551 |
Average | 0.91579 | 0.81911 | 0.80947 | - | 0.81700 | 0.82815 |
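The Average row of Table 2 appears to be the unweighted mean of the ten per-dataset accuracies. The short check below recomputes it for two columns from the tabulated values; the variable names are illustrative.

```python
# Per-dataset accuracies copied from Table 2, in row order
# (Waveform, Glass, Wheat-Seed, Digits, Wine, Iris, Seeds, Ionosphere, SCADI, Ecoli).
imv_re = [0.96443, 0.80498, 0.95944, 0.98664, 0.95883,
          0.97333, 0.89143, 0.91851, 0.85275, 0.84753]
mice_forest = [0.78015, 0.54197, 0.78583, 0.77682, 0.95197,
               0.92267, 0.87143, 0.91059, 0.84440, 0.80530]

def column_average(values):
    """Unweighted mean over datasets, rounded to the table's five decimals."""
    return round(sum(values) / len(values), 5)

print("IMV-RE:", column_average(imv_re))
print("MiceForest:", column_average(mice_forest))
```

Since the mean is unweighted, each dataset contributes equally regardless of its size.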
Table 3
Average RMSE obtained from distinct MVI techniques
Datasets | IMV-RE | MiceForest | KNNI | CCMVI | SimpleImputer | IterativeImputer |
Waveform | 0.68948 | 0.70871 | 0.73801 | - | 0.80543 | 0.71955 |
Glass | 0.32751 | 0.37197 | 0.38146 | - | 0.38244 | 0.39661 |
Wheat-Seed | 0.32499 | 0.46663 | 0.48078 | - | 0.63877 | 0.42437 |
Digits | 2.02541 | 2.07970 | 2.25400 | - | 2.60753 | 2.53113 |
Wine | 3.54842 | 4.65323 | 4.98252 | 3.79756 | 7.21529 | 6.15099 |
Iris | 0.15431 | 0.21552 | 0.21250 | 0.17177 | 0.40314 | 0.21158 |
Seeds | 0.14982 | 0.11402 | 0.15104 | 0.15284 | 0.30043 | 0.13956 |
Ionosphere | 0.07780 | 0.05712 | 0.04592 | 0.07728 | 0.06871 | 0.06354 |
SCADI | 0.00102 | 0.00082 | 0.00041 | 0.00100 | 0.00182 | 0.00159 |
Ecoli | 0.02252 | 0.03085 | 0.02623 | 0.02361 | 0.02984 | 0.02654 |
Average | 0.73213 | 0.86986 | 0.92729 | - | 1.24534 | 1.06655 |
Table 4
Average Coefficient of Determination (R2) obtained from distinct MVI techniques
Datasets | IMV-RE | MiceForest | KNNI | CCMVI | SimpleImputer | IterativeImputer |
Waveform | 0.78181 | 0.75131 | 0.73399 | - | 0.71399 | 0.76373 |
Glass | 0.71782 | 0.63663 | 0.60446 | - | 0.65406 | 0.65720 |
Wheat-Seed | 0.85907 | 0.68851 | 0.71047 | - | 0.57340 | 0.77365 |
Digits | 0.67990 | 0.67332 | 0.56739 | - | 0.53716 | 0.50253 |
Wine | 0.96225 | 0.96254 | 0.93562 | 0.95724 | 0.93715 | 0.95333 |
Iris | 0.95632 | 0.93200 | 0.92954 | 0.94407 | 0.82545 | 0.93378 |
Seeds | 0.97092 | 0.97381 | 0.96951 | 0.96438 | 0.90623 | 0.95981 |
Ionosphere | 0.97309 | 0.98352 | 0.98874 | 0.97279 | 0.97986 | 0.98238 |
SCADI | 0.99958 | 0.99940 | 0.99981 | 0.99929 | 0.99934 | 0.99948 |
Ecoli | 0.96658 | 0.94247 | 0.94557 | 0.96319 | 0.95191 | 0.95639 |
Average | 0.88673 | 0.85435 | 0.83851 | - | 0.80785 | 0.84823 |
Figure 2 graphically represents the performance comparison of the proposed IMV-RE algorithm across the different evaluation metrics.
Table 5
Percentage Error between mean values of actual datasets and imputed datasets
Datasets | Ecoli | Glass | Wheat-Seed | Digits | Wine | Iris | Seeds | Ionosphere | SCADI | Waveform |
Actual Mean | 0.4996 | 0.0968 | 6.8967 | 4.8843 | 0.0437 | 3.4645 | 0.0820 | 0.2477 | 0.2033 | 1.7123 |
Imputed Mean | 0.4988 | 0.0790 | 6.9166 | 4.8887 | 0.0371 | 3.4657 | 0.0749 | 0.2438 | 0.2035 | 1.7153 |
Percentage Error | 0.1520 | 18.3642 | 0.2890 | 0.0890 | 15.0280 | 0.0332 | 8.5944 | 1.5801 | 0.0914 | 0.1745 |
Table 6
Percentage Error between calculated standard deviation of actual datasets and imputed datasets
Datasets | Ecoli | Glass | Wheat-Seed | Digits | Wine | Iris | Seeds | Ionosphere | SCADI | Waveform |
Actual Stdev | 0.144 | 1.048 | 1.010 | 3.684 | 0.702 | 0.948 | 0.632 | 0.510 | 0.263 | 1.520 |
Imputed Stdev | 0.156 | 0.961 | 5.311 | 5.549 | 0.704 | 1.968 | 0.634 | 0.573 | 0.951 | 1.758 |
Percentage Error | 8.470 | 8.343 | 425.986 | 50.616 | 0.366 | 107.660 | 0.375 | 12.176 | 261.444 | 15.672 |
Table 5 reports the percentage error between the mean values of the actual and imputed datasets. For each dataset, the mean is averaged over the missing rates of 10%, 20%, 30%, 40% and 50%, with the imputation performed by the proposed IMV-RE algorithm.
Similarly, Table 6 reports the percentage error between the standard deviations of the actual and imputed datasets, with the standard deviation averaged over the same missing rates.
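The percentage error is not defined explicitly in the text. A minimal sketch, assuming the usual relative-error definition |imputed - actual| / |actual| x 100, is given below; this definition appears consistent with the tabulated values up to rounding. The function name `percentage_error` is illustrative.

```python
def percentage_error(actual: float, imputed: float) -> float:
    """Relative error of the imputed statistic with respect to the actual one, in percent."""
    if actual == 0:
        raise ValueError("percentage error is undefined for an actual value of zero")
    return abs(imputed - actual) / abs(actual) * 100.0

# Rounded statistics for Wheat-Seed, taken from Tables 5 and 6.
print(percentage_error(6.8967, 6.9166))  # mean; Table 5 lists 0.2890
print(percentage_error(1.010, 5.311))    # standard deviation; Table 6 lists 425.986
```

Small differences from the tabulated percentages are expected, because the inputs used here are the rounded means and standard deviations printed in the tables.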