We examine all 30 selected studies in this section and synthesize their data to answer the aforementioned research questions.
RQ1: Which hybridized metaheuristic algorithms are used for the FS problem? What is the purpose of the hybridization? What are their application domains?
Metaheuristic algorithms have been shown to solve difficult NP-hard computational problems such as the FS problem, and they have therefore gained significant interest from scholars,
Table 3. Summary of hybrid algorithms, purpose of hybridization, datasets utilized, and their application domains

| ID | Reference | Hybrid algorithms | Purpose of hybridization | Datasets | Application domain |
|---|---|---|---|---|---|
| P9 | El-Kenawy, Mirjalili et al. (2022) | SCA + WOA | To leverage the strengths of WOA and SCA for solving problems with continuous and binary decision variables. | 19 UCI benchmark datasets | Multiple domains |
| P10 | Ewees, Al-qaness et al. (2021) | AOA + GA | To preserve solution diversity and tackle the main weaknesses of conventional AOA by avoiding the local search problem and balancing the search strategies. | 20 benchmarks + 2 real-world problems containing gene datasets | Multiple domains |
| P11 | Liang, Wang et al. (2019) | ACO + BSO | To enhance the performance of ACO and, as a result, avoid stagnation in local optima and premature convergence. | Six binary classification UCI datasets | Multiple domains |
| P12 | Almazini, Ku-Mahamud et al. (2023) | GWO + ACO | To improve the initialization of the wolf population using the ACO algorithm concept. | NSL-KDD benchmark datasets | Intrusion detection |
| P13 | Mazini, Shirazi et al. (2019) | ABC + AdaBoost | ABC is utilized for FS, while AdaBoost is employed for feature evaluation and classification. | NSL-KDD + ISCXIDS2012 datasets | Intrusion detection |
| P14 | Thawkar (2021) | TLBO + SSA | To achieve better convergence and efficiency, the basic TLBO is modified using the SSA. | 651 breast cancer screenings | Medical |
| P15 | Fajri and Wiharto (2023) | BeeSO + Q-Learning | To enhance the efficiency of the feature search and ultimately the classification accuracy by combining BeeSO with Q-learning. | Four heart disease UCI datasets | Medical |
| P16 | El-Shafiey, Hagag et al. (2022) | PSO + GA | To target the individuals rejected in every generation, rehabilitating them and optimizing the contribution of all individuals in each generation. | Cleveland and Statlog (heart disease UCI datasets) | Medical |
| P17 | Li, Zhang et al. (2021) | PSO + GA | To reduce the chance of getting stuck in local optima; PSO was used to increase the convergence rate of GA. | Diabetes dataset | Medical |
| P18 | Bezdan, Zivkovic et al. (2022) | BSO + FA | To achieve a balance between exploration and exploitation and reduce the drawbacks of conventional BSO. | 21 UCI benchmark datasets + COVID-19 | Multiple domains |
| P19 | Hans and Kaur (2020) | ALO + SCA | To benefit from SCA in balancing exploration and exploitation; to increase the diversity of the solutions while allowing both algorithms to explore the search space. | 18 diverse real-time datasets obtained online | Multiple domains |
| P20 | Houssein, Hosney et al. (2020) | HHO + CS | To avoid falling into local optima and premature convergence; to balance exploration and exploitation; to address the limitations of the original HHO. | Two chemical datasets + 10 UCI benchmark datasets | Multiple domains |
| P21 | Osmani, Mohasefi et al. (2022) | ABC + ICA | To improve ABC exploitation. | 16 UCI benchmark datasets + 2 Amazon datasets | Sentiment classification |
| P22 | Akinola, Ezugwu et al. (2022) | DMO + SA | To improve the limited exploitative process of the DMO. | Three high-dimensional medical datasets + 18 UCI datasets (low and medium dimensions) | Multiple domains |
| P23 | Phogat and Kumar (2023) | IBCSO + WOA | To attain a good balance between exploration and exploitation and identify informative genes. | Six microarray datasets | Medical |
| P24 | Alkanhel, El-kenawy et al. (2023) | GWO + DTO | To avoid stagnation in local optima and early convergence of GWO; to improve the exploration and exploitation search. | IoT-IDS dataset (RPL-NIDDS17) | Intrusion detection |
| P25 | Alhussan, Abdelhamid et al. (2023) | DBER + DTO | To improve exploration and exploitation of the search space; BER was motivated by the behavior of swarm members in achieving their global goals. | Diabetes datasets | Medical |
| P26 | Alwan, AbuEl-Atta et al. (2021) | FA + GA | To prevent being stuck in local optima; a mutation operation improves the exploration abilities of the standard firefly. | NSL-KDD benchmark datasets | Intrusion detection |
| P27 | Alweshah, Aldabbas et al. (2023) | BWO + IG | To enhance the local search capabilities of the BWO algorithm. | Nine benchmark datasets from the gene expression data repository | Medical (gene selection) |
| P28 | Masrom, Rahman et al. (2022) | PSO + GA | To resolve the problem of immature convergence in PSO. | Real dataset on tax avoidance cases among companies in Malaysia | Taxation and financial compliance |
| P29 | Shanthi, Akshaya et al. (2022) | SDS + TS | SDS provides diversity to the TS candidate solutions when no good solutions remain. | Lung cancer dataset | Medical |
| P30 | Lee, Le et al. (2022) | GWO + HBO | To improve both global and local search capability; to resolve GWO's tendency to fall into local traps. | 4 UCI benchmark datasets | Engineering |
especially in the hybrid metaheuristics field (Talbi 2009). These hybrid approaches combine the strengths of their constituent algorithms to enhance overall performance. Figure 7 presents all population-based metaheuristic algorithms in this SLR that have been employed for solving the FS problem in classification.
The GWO is one of the more recent bio-inspired algorithms; it has gained significant interest within the field of hybrid metaheuristic algorithms for FS, accounting for 33% of publications. It is widely recognized that both exploration and exploitation are crucial for any population-based algorithm to perform well. In conventional GWO, all search agents (wolves) are updated based on the best search agent (α), the second best (β), and the third best (δ) throughout the entire optimization process. This position update method causes premature convergence because the search agents are not given sufficient opportunity to explore the search space efficiently (Arora, Singh et al. 2019). Moreover, the exploration/exploitation parameter does not depend on feedback from the search process but changes linearly, which greatly restricts the search because every wolf is driven by the same parameter value (Gu, Li et al. 2019).
To address the above-mentioned shortcomings of the original GWO and apply it to the FS problem in classification tasks, several studies have hybridized GWO with another algorithm. For example, it has been hybridized with PSO (Al-Tashi, Abdul Kadir et al. 2019, El-Kenawy and Eid 2020). El-Kenawy and Eid (2020) proposed dividing the population into two groups: the first follows the GWO procedures, whereas the second follows the PSO procedures, balancing exploitation and exploration and improving algorithm performance. Al-Tashi, Abdul Kadir et al. (2019) extended the hybrid GWO-PSO proposed in (Singh and Singh 2017) to make it capable of solving the FS problem. The authors enhanced the exploitation capability of PSO and the exploration capability of GWO by controlling the grey wolf's exploration and exploitation of the search space through the inertia constant parameter of PSO.
To solve the problem of immature convergence, a more effective exploration phase for GWO through high-level hybridization was suggested using HHO in (Al-Wajih, Abdulkadir et al. 2021) and WOA in (Mafarja, Qasem et al. 2020). The GWO algorithm modifies the position vectors of wolves by considering the three best agents in the population, which can lead to getting stuck in a local optimum. In contrast, the WOA algorithm uses a random factor to move certain agents inside the feature space; this factor lets WOA randomly evade local optima, although it may cause premature convergence. These drawbacks motivated Mafarja, Qasem et al. (2020) to propose three hybrid models (serial GWO-WOA, random switcher GWO-WOA, and adaptive switcher GWO-WOA). Almazini, Ku-Mahamud et al. (2023) improved the initial population of GWO by using a heuristic-based ACO for intrusion detection. The DTO, which is capable of identifying viable regions and providing the best solution, was hybridized with GWO (Alkanhel, El-kenawy et al. 2023, Sami Khafaga, M. El-kenawy et al. 2023) to improve GWO's performance.
Low-level hybridizations to enhance the performance of GWO were introduced using CSA in (Arora, Singh et al. 2019), SFS in (El-Kenawy, Eid et al. 2020), and HBO in (Lee, Le et al. 2022). Arora, Singh et al. (2019) used a control parameter of CSA in the position update equation of GWO to achieve a good balance between exploration and exploitation. A further improvement maintains population diversity: not all agents in the population are modified by the alpha and beta updating directions (some by alpha only). El-Kenawy, Eid et al. (2020) proposed a hybridization of GWO and SFS to increase exploration ability and obtain the optimal solution. The SFS diffusion process is implemented using the Gaussian distribution approach for random walks in the growth process, and crossover/mutation operations raise population diversity: the crossover operator improves exploitation, and the mutation operator improves exploration. Lee, Le et al. (2022) suggested hybridizing GWO with HBO to improve the chances of evading local optima and the effectiveness of both global and local search. The optimal solution acquired by GWO is kept as a record; if the new solution produced by HBO is more than 90% similar to this record, crossover is applied, and if the solution resulting from crossover is identical to the current record, mutation is executed.
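To make the critique of the canonical position update concrete, the following is a minimal, illustrative sketch of one GWO iteration in pure Python. It is not the implementation of any reviewed paper: the sphere test function, pack size, and iteration budget are assumptions chosen purely for demonstration. Note how the coefficient `a` decays linearly regardless of search feedback, which is exactly the limitation discussed above.

```python
import random

def gwo_step(wolves, fitness, a):
    """One iteration of the canonical GWO position update.

    Every wolf moves toward the three best agents (alpha, beta, delta).
    The parameter `a` decreases linearly over the run, independent of
    any feedback from the search process.
    """
    ranked = sorted(wolves, key=fitness)
    alpha, beta, delta = ranked[0], ranked[1], ranked[2]
    new_wolves = []
    for wolf in wolves:
        new_pos = []
        for d, x in enumerate(wolf):
            guided = []
            for leader in (alpha, beta, delta):
                r1, r2 = random.random(), random.random()
                A = 2 * a * r1 - a          # exploration/exploitation coefficient
                C = 2 * r2
                D = abs(C * leader[d] - x)  # distance to this leader
                guided.append(leader[d] - A * D)
            new_pos.append(sum(guided) / 3.0)  # average of the three guides
        new_wolves.append(tuple(new_pos))
    return new_wolves

# Minimal usage: minimize the 2-D sphere function (an illustrative benchmark).
random.seed(0)
sphere = lambda w: sum(x * x for x in w)
pack = [tuple(random.uniform(-5, 5) for _ in range(2)) for _ in range(10)]
for t in range(50):
    a = 2 - t * (2 / 50)                    # linear decay, as in the original GWO
    pack = gwo_step(pack, sphere, a)
best = min(pack, key=sphere)
```

Because every wolf averages the guidance of only α, β, and δ, the pack contracts toward a small region early, illustrating the premature-convergence risk the hybrid studies above aim to mitigate.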
The GA technique family accounted for up to 17% of the FS-based metaheuristic algorithms (five studies). It was hybridized with PSO in (Li, Zhang et al. 2021, El-Shafiey, Hagag et al. 2022, Masrom, Rahman et al. 2022). Due to its high convergence rate, PSO was employed to accelerate GA convergence (Li, Zhang et al. 2021); the proposed hybrid algorithm is able to minimize the likelihood of encountering local minima. Meanwhile, in (El-Shafiey, Hagag et al. 2022), PSO is employed to specifically target the individuals that have been rejected in each generation. This approach addresses the defect of premature convergence, which happens when PSO concludes before achieving the ideal solution. In addition, Masrom, Rahman et al. (2022) proposed adaptive GA operators to enhance the performance of PSO in tax avoidance detection; the authors introduced three models using hybrid PSO with adaptive crossover, adaptive mutation, or both.
The hybridization of GA with AOA was proposed by Ewees, Al-qaness et al. (2021). The suggested technique addressed the primary limitations of the traditional AOA by avoiding the difficulty of local search and achieving a good balance of exploration and exploitation. Similarly, the mutation operator of GA was incorporated into FA (Alwan, AbuEl-Atta et al. 2021); the resulting hybrid algorithm has a heightened capacity for exploration to prevent becoming stuck in local optimal solutions.
A summary of the metaheuristics and their respective application domains in the reviewed papers of this SLR is presented in Table 4. Moreover, Fig. 8 illustrates that 13% of the reviewed papers (4 out of 30) addressed the intrusion detection domain, while 27% of the FS-based metaheuristic research (8 out of 30) served the medical domain. Half of the papers (15) suggested different metaheuristics, either improved variants of conventional algorithms or new hybrid metaheuristics, and generally evaluated their performance across multiple domains, including the medical domain. Topics that received far less attention were engineering, financial compliance, and sentiment classification, each with only one publication.
Table 4. Summary of hybrid algorithms with their application domains

| Application domain | Hybrid algorithm | Reference |
|---|---|---|
| Multiple domains | GWO + PSO | Al-Tashi, Abdul Kadir et al. (2019) |
| | GWO + HHO | Al-Wajih, Abdulkadir et al. (2021) |
| | GWO + PSO | El-Kenawy and Eid (2020) |
| | GWO + DTO | Sami Khafaga, M. El-kenawy et al. (2023) |
| | GWO + CSA | Arora, Singh et al. (2019) |
| | GWO + WOA | Mafarja, Qasem et al. (2020) |
| | WOA + HHO | Alwajih, Abdulkadir et al. (2022) |
| | GWO + SFS | El-Kenawy, Eid et al. (2020) |
| | SCA + WOA | El-Kenawy, Mirjalili et al. (2022) |
| | AOA + GA | Ewees, Al-qaness et al. (2021) |
| | ACO + BSO | Liang, Wang et al. (2019) |
| | BSO + FA | Bezdan, Zivkovic et al. (2022) |
| | ALO + SCA | Hans and Kaur (2020) |
| | HHO + CS | Houssein, Hosney et al. (2020) |
| | DMO + SA | Akinola, Ezugwu et al. (2022) |
| Intrusion detection | GWO + ACO | Almazini, Ku-Mahamud et al. (2023) |
| | ABC + AdaBoost | Mazini, Shirazi et al. (2019) |
| | GWO + DTO | Alkanhel, El-kenawy et al. (2023) |
| | FA + GA | Alwan, AbuEl-Atta et al. (2021) |
| Medical | TLBO + SSA | Thawkar (2021) |
| | BeeSO + Q-Learning | Fajri and Wiharto (2023) |
| | PSO + GA | El-Shafiey, Hagag et al. (2022) |
| | PSO + GA | Li, Zhang et al. (2021) |
| | IBCSO + WOA | Phogat and Kumar (2023) |
| | DBER + DTO | Alhussan, Abdelhamid et al. (2023) |
| | BWO + IG | Alweshah, Aldabbas et al. (2023) |
| | SDS + TS | Shanthi, Akshaya et al. (2022) |
| Sentiment classification | ABC + ICA | Osmani, Mohasefi et al. (2022) |
| Taxation and financial compliance | PSO + GA | Masrom, Rahman et al. (2022) |
| Engineering | GWO + HBO | Lee, Le et al. (2022) |
RQ2: What type of hybridization is used? What are the evaluation metrics? What statistical tests are used?
Hybridization is essential for metaheuristic algorithms to solve FS problems effectively by integrating the advantages of several algorithms; its significance lies in its capacity to enhance the overall performance and efficacy of these algorithms. As shown in Table 5, different levels of hybridization were used in the reported studies. In parallel hybridization, the algorithms run simultaneously; they may operate independently and periodically exchange information, or they may cooperate to solve a common problem. In sequential hybridization, the algorithms run one after another. Hybridization that leaves the internals of each algorithm unchanged is known as high-level, while in low-level hybridization the functionality of the two algorithms is merged. The benefits of hybridization in these studies can be summarized as follows:
• It assists in enhancing the effectiveness of the original algorithm (e.g., improving weak exploration or exploitation capabilities and improving population diversity).
• It assists in resolving the problems of premature convergence and the local optimal trap.
• It assists in striking a balance between the exploration and exploitation processes.
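The distinction between the hybridization levels above can be sketched with a toy high-level sequential hybrid: a second algorithm starts from the first algorithm's final population, with neither algorithm's internals modified. The two routines below (`random_search` as a global explorer, `local_refine` as an exploiter) are illustrative stand-ins, not the actual algorithms from the reviewed papers, and the sphere objective is an assumption for demonstration.

```python
import random

random.seed(1)
sphere = lambda x: sum(v * v for v in x)

def random_search(pop, fit, iters):
    """Stand-in for an exploration-heavy metaheuristic (stage 1)."""
    for _ in range(iters):
        cand = tuple(random.uniform(-5, 5) for _ in range(len(pop[0])))
        worst = max(range(len(pop)), key=lambda i: fit(pop[i]))
        if fit(cand) < fit(pop[worst]):
            pop[worst] = cand        # replace the worst individual
    return pop

def local_refine(pop, fit, iters, step=0.1):
    """Stand-in for an exploitation-heavy metaheuristic (stage 2)."""
    for _ in range(iters):
        for i, x in enumerate(pop):
            cand = tuple(v + random.gauss(0, step) for v in x)
            if fit(cand) < fit(x):   # greedy acceptance
                pop[i] = cand
    return pop

# High-level sequential hybrid: stage 2 inherits stage 1's population;
# neither routine's internals are changed (the defining property of
# high-level hybridization).
pop = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(8)]
pop = random_search(pop, sphere, 200)   # explore
pop = local_refine(pop, sphere, 200)    # exploit
best = min(pop, key=sphere)
```

A low-level hybrid would instead merge the two update rules inside a single loop (for example, applying the Gaussian refinement inside each random-search iteration), which changes the internal operators rather than merely chaining the algorithms.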
An essential step in the classification process is evaluating the predictive model after it has been constructed and trained on a dataset. The researcher's primary focus is to assess the model's performance, usefulness, and generalizability, and to consider whether additional features or further training are needed to improve overall performance. The evaluation of FS-based metaheuristics involved various standard measures, including mean, best, and worst fitness, classification accuracy/error, sensitivity, specificity, and standard deviation. Nevertheless, the criteria most frequently employed in the reported studies were classification accuracy/error, mean fitness, and the number of selected features, as shown in Fig. 9.
Table 5. Summary of hybridization type, statistical test, and evaluation metrics in selected studies

| No | Reference | Sequential | Parallel | Low-level | High-level | Statistical test | Evaluation metrics |
|---|---|---|---|---|---|---|---|
| P1 | Al-Tashi, Abdul Kadir et al. (2019) | | √ | √ | | - | Average accuracy; average selected feature size; mean, best, and worst fitness; average computational time |
| P2 | Al-Wajih, Abdulkadir et al. (2021) | √ | | | √ | Wilcoxon test | Average accuracy; average selected feature size; mean, best, and worst fitness; average computational time |
| P3 | El-Kenawy and Eid (2020) | | √ | | √ | - | Average classification error; best and worst fitness; average fitness size; mean; standard deviation |
| P4 | Sami Khafaga, M. El-kenawy et al. (2023) | | √ | | √ | - | Best and worst fitness; average error; average select size; mean fitness size; standard deviation |
| P5 | Arora, Singh et al. (2019) | | √ | √ | | Wilcoxon and Friedman tests | Classification accuracy; statistical mean; standard deviation; average feature length |
| P6 | Mafarja, Qasem et al. (2020) | | √ | | √ | Wilcoxon test and F-test | Average classification accuracy; average selected size; mean fitness; average running time |
| P7 | Alwajih, Abdulkadir et al. (2022) | √ | | √ | | One-way ANOVA test | Classification accuracy; mean fitness; average selected features; computational time |
| P8 | El-Kenawy, Eid et al. (2020) | | √ | | √ | Wilcoxon test | Average error; mean fitness; mean; best and worst fitness; standard deviation |
| P9 | El-Kenawy, Mirjalili et al. (2022) | √ | | √ | | Wilcoxon; one-way ANOVA | Average error; mean fitness; mean; best and worst fitness; standard deviation |
| P10 | Ewees, Al-qaness et al. (2021) | | √ | √ | | Non-parametric Friedman test | Average accuracy; average of selected features; mean fitness; standard deviation |
| P11 | Liang, Wang et al. (2019) | - | - | √ | | - | Accuracy; percent rate; recall rate; F-measures; average time costs |
| P12 | Almazini, Ku-Mahamud et al. (2023) | √ | | | √ | Non-parametric Friedman test | Average classification accuracy; average number of selected features |
| P13 | Mazini, Shirazi et al. (2019) | √ | | | √ | - | Classification error and detection rate; time and space complexity; sensitivity |
| P14 | Thawkar (2021) | | √ | | √ | - | Sensitivity; specificity; classification accuracy; F-score; kappa coefficient; false positive rate; false negative rate |
| P15 | Fajri and Wiharto (2023) | | √ | | √ | - | Accuracy; precision; recall; selected features; execution time |
| P16 | El-Shafiey, Hagag et al. (2022) | √ | | | √ | - | Accuracy; recall; specificity; sensitivity; ROC curve |
| P17 | Li, Zhang et al. (2021) | - | - | | √ | - | Accuracy; sensitivity; specificity |
| P18 | Bezdan, Zivkovic et al. (2022) | | √ | | √ | Wilcoxon and Friedman tests | Average fitness; average accuracy |
| P19 | Hans and Kaur (2020) | √ | | | | Wilcoxon test | Average accuracy; mean, worst, and best fitness; standard deviation; average number of features selected; F-measure |
| P20 | Houssein, Hosney et al. (2020) | | √ | √ | | - | Accuracy; sensitivity; specificity; recall; precision; F-measure; worst, best, and mean; standard deviation |
| P21 | Osmani, Mohasefi et al. (2022) | | √ | √ | | T-test; Wilcoxon; Friedman test | Accuracy; precision; recall; F-measure |
| P22 | Akinola, Ezugwu et al. (2022) | √ | | √ | | Wilcoxon; Friedman mean ranking test | Accuracy; average feature size; respective algorithms' convergence characteristics |
| P23 | Phogat and Kumar (2023) | √ | | | √ | - | Accuracy; specificity; sensitivity; F-measure; MCC; standard deviation; optimal number of genes |
| P24 | Alkanhel, El-kenawy et al. (2023) | | √ | | √ | Wilcoxon test; ANOVA test | Mean fitness size; average error; standard deviation; worst, best, and mean fitness |
| P25 | Alhussan, Abdelhamid et al. (2023) | - | - | - | - | Wilcoxon test; ANOVA test | Average error; mean, best, and worst fitness; average fitness size; standard deviation |
| P26 | Alwan, AbuEl-Atta et al. (2021) | | √ | √ | | - | Accuracy; number of selected features |
| P27 | Alweshah, Aldabbas et al. (2023) | √ | | | √ | - | Convergence speed; classification accuracy; average number of genes selected; mean fitness |
| P28 | Masrom, Rahman et al. (2022) | | √ | √ | | - | Accuracy; number of selected features |
| P29 | Shanthi, Akshaya et al. (2022) | | √ | | | - | Accuracy; recall; precision; best fitness |
| P30 | Lee, Le et al. (2022) | | √ | √ | | - | Mean fitness value; average number of selected features; average operating times |
Statistical tests play an essential role in assessing the quality of a model. The common statistical tests used in FS with metaheuristics comprise the Wilcoxon, Friedman, ANOVA, and T-tests. As displayed in Table 5, 14 papers (47%) used statistical tests to evaluate their models. The most popular is the Wilcoxon test, used in 37% of publications (11 out of 30), followed by the Friedman test, employed in 6 papers. Additionally, ANOVA was used in 4 papers.
RQ3: What FS techniques are applied with metaheuristics to achieve good classification accuracy and a minimum number of features? What classifiers are used? What initial population methods are used?
The filter method assesses individual features in the dataset according to their information theoretical or statistical characteristics without using any classification algorithms (Nguyen, Xue et al. 2020). It is less expensive computationally and has a faster execution time than wrapper methods due to working independently of any classifier. However, it has the drawback of ignoring the performance of the selected features (Chaudhuri and Sahu 2021). Only two studies (Almazini, Ku-Mahamud et al. 2023, Phogat and Kumar 2023) in this review use a filter approach to solve the FS problem.
Wrapper techniques necessitate a specified learning algorithm and use its performance as an assessment criterion.
This dependence criterion requires a predefined learning approach in FS and relies on the effectiveness of that approach in determining which features are chosen (Liu and Yu 2005, Zhang, Xiong et al. 2019). The majority of articles (17 of the 18 studies that reported the FS method) employed wrapper-based metaheuristics to decrease the number of features in classification tasks, as shown in Table 6.
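The wrapper idea can be sketched as follows: the learning algorithm sits inside the evaluation loop, so the score of a candidate feature subset is the accuracy the classifier actually achieves on it. This is a minimal illustration, not any reviewed paper's pipeline; the tiny leave-one-out k-NN, the helper name `knn_loo_accuracy`, and the toy data are all assumptions for demonstration.

```python
import math

def knn_loo_accuracy(X, y, subset, k=1):
    """Wrapper-style evaluation: leave-one-out k-NN accuracy computed
    only over the feature columns listed in `subset`. The classifier
    being inside the loop is what makes this a wrapper method."""
    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in subset))
    correct = 0
    for i in range(len(X)):
        neighbors = sorted((j for j in range(len(X)) if j != i),
                           key=lambda j: dist(X[i], X[j]))[:k]
        votes = [y[j] for j in neighbors]
        pred = max(set(votes), key=votes.count)  # majority vote
        correct += pred == y[i]
    return correct / len(X)

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
X = [(0.0, 5.1), (0.2, 1.2), (0.1, 9.3), (1.0, 5.0), (1.2, 1.1), (0.9, 9.0)]
y = [0, 0, 0, 1, 1, 1]
acc_informative = knn_loo_accuracy(X, y, subset=[0])  # high accuracy
acc_noise = knn_loo_accuracy(X, y, subset=[1])        # low accuracy
```

A metaheuristic would call such an evaluator inside its fitness function, steering the search toward subsets like `[0]` and away from subsets like `[1]`; a filter method, by contrast, would score each feature from statistics alone without ever training a classifier.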
Classification techniques are used to develop models that are capable of automatically learning patterns and relationships in data and predicting which class unseen instances belong to. To categorize data effectively, these techniques make use of the capabilities of statistical analysis and pattern recognition. Several classification approaches are available, each with strengths and drawbacks that are appropriate for various datasets and problem domains. Among the most popular classification approaches are KNN, SVM, ANN, NB and RF.
The KNN method stands out as one of the simplest and most extensively applied methods integrated with metaheuristic algorithms for improving FS in classification tasks. Due to its efficacy and stability, it was the most commonly used classification method, appearing in 18 papers, as shown in Table 6. Only one study used KNN together with other classifiers when building the fitness function, while 17 papers reported that
Table 6. Summary of feature selection approach, classifier, and population initialization method in selected studies

| No | Reference | FS approach | Classifier | Population initialization method |
|---|---|---|---|---|
| P1, P2, P3, P6, P7, P8, P10, P17, P18, P22, P27 | Al-Tashi, Abdul Kadir et al. (2019); Al-Wajih, Abdulkadir et al. (2021); El-Kenawy and Eid (2020); Mafarja, Qasem et al. (2020); Alwajih, Abdulkadir et al. (2022); El-Kenawy, Eid et al. (2020); Ewees, Al-qaness et al. (2021); Li, Zhang et al. (2021); Bezdan, Zivkovic et al. (2022); Akinola, Ezugwu et al. (2022); Alweshah, Aldabbas et al. (2023) | Wrapper | KNN | Random |
| P4, P5, P11, P19, P24, P25 | Sami Khafaga, M. El-kenawy et al. (2023); Arora, Singh et al. (2019); Liang, Wang et al. (2019); Hans and Kaur (2020); Alkanhel, El-kenawy et al. (2023); Alhussan, Abdelhamid et al. (2023) | - | KNN | Random |
| P9 | El-Kenawy, Mirjalili et al. (2022) | - | - | Random |
| P12 | Almazini, Ku-Mahamud et al. (2023) | Filter | SVM | Heuristic |
| P13 | Mazini, Shirazi et al. (2019) | Wrapper | - | Random |
| P15 | Fajri and Wiharto (2023) | Wrapper | SVM, RF, LGBM, and XGBoost | Random |
| P16 | El-Shafiey, Hagag et al. (2022) | Wrapper | RF | Random |
| P20 | Houssein, Hosney et al. (2020) | - | SVM | Chaotic maps |
| P21 | Osmani, Mohasefi et al. (2022) | Wrapper | SVM | Random |
| P23 | Phogat and Kumar (2023) | Wrapper, Filter | ANN | Random |
| P26 | Alwan, AbuEl-Atta et al. (2021) | - | NB | Random |
| P28 | Masrom, Rahman et al. (2022) | - | KNN, SVM, RF | Random |
| P29 | Shanthi, Akshaya et al. (2022) | - | ANN, DT, NB | Random |
| P30 | Lee, Le et al. (2022) | Wrapper | SVM, LDA | Random |
kNN classification accuracy alone is used to construct the fitness function. In the majority of these studies (9 papers), a common practice is to set the parameter k to 5 (5 neighbors) as a suitable value for achieving high accuracy. Additionally, SVM attracted attention in 6 studies owing to its advantages, including efficacy in high-dimensional spaces, adaptability, and memory efficiency; the remaining classifiers were used in only a few studies, as displayed in Fig. 10.
By nature, P-metaheuristics are exploration-oriented search algorithms because of the wide diversity of their initial populations. This stage is essential to an algorithm's efficiency and efficacy; insufficient diversity in the initial population can lead to premature convergence (Talbi 2009). Usually the initial population is generated at random: as shown in Table 6, around 93% (28 of 30 papers) initialized the population randomly. Almazini, Ku-Mahamud et al. (2023), by contrast, initialized the wolf population in GWO using a heuristic-based ACO, aiming to generate solutions by choosing features that optimize the classification accuracy. Houssein, Hosney et al. (2020) introduced chaotic maps both to initialize solutions and to update the control energy parameter in HHO in order to prevent local optima and premature convergence.
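The two initialization styles above can be contrasted in a short sketch. The uniform-random version mirrors the dominant practice; the chaotic version uses a logistic map in the spirit of Houssein, Hosney et al. (2020), but the exact map, seed value, and 0.5 binarization threshold here are illustrative assumptions rather than that paper's settings.

```python
import random

def random_init(pop_size, n_features, seed=42):
    """Standard initialization used by ~93% of the reviewed studies:
    each bit (feature selected = 1 / rejected = 0) is drawn uniformly."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size)]

def chaotic_init(pop_size, n_features, x0=0.7, r=4.0):
    """Chaotic initialization: the logistic map x <- r*x*(1-x) replaces
    the uniform RNG, giving a deterministic but highly irregular
    sequence in (0, 1) that is thresholded into bits."""
    pop, x = [], x0
    for _ in range(pop_size):
        individual = []
        for _ in range(n_features):
            x = r * x * (1 - x)            # chaotic update
            individual.append(1 if x > 0.5 else 0)
        pop.append(individual)
    return pop

pop_r = random_init(6, 10)
pop_c = chaotic_init(6, 10)
```

Both routines produce a population of 0/1 feature masks; the claimed benefit of the chaotic variant is a more evenly spread initial coverage of the search space, which the reviewed study uses to delay premature convergence.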
RQ4: What are the crucial/optimal parameter values? Which transfer and fitness functions are used?
Controlling the parameters of metaheuristic algorithms is among the most crucial areas of research. The parameters common to all population-based metaheuristic algorithms are the population size and the number of iterations. For more efficient and high-quality computations, it is essential to fine-tune these parameters (Črepinšek, Liu et al. 2012). Furthermore, running an algorithm multiple times enables it to explore the search space widely, reducing the impact of randomness and increasing the probability of discovering optimal or near-optimal solutions.
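The multi-run protocol described above can be sketched as follows: repeat the stochastic algorithm with different seeds and report the mean, standard deviation, best, and worst results, which are exactly the evaluation metrics that recur in Table 5. The toy one-dimensional optimizer here is a placeholder assumption standing in for any of the reviewed metaheuristics.

```python
import random
import statistics

def one_run(seed, iters=200):
    """A single run of a toy stochastic optimizer (random search on a
    1-D sphere); stands in for one execution of a metaheuristic."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(iters):
        x = rng.uniform(-5, 5)
        best = min(best, x * x)
    return best

# Repeat the algorithm (studies typically use 20 or 30 runs) and
# aggregate across runs to dampen the effect of randomness.
results = [one_run(seed) for seed in range(30)]
summary = {
    "mean": statistics.mean(results),
    "std": statistics.stdev(results),
    "best": min(results),
    "worst": max(results),
}
```

Reporting the spread (std, best, worst) alongside the mean is what allows the statistical tests discussed under RQ2 (Wilcoxon, Friedman, ANOVA) to be applied between competing algorithms.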
Table 7. Summary of transfer functions, fitness functions, and values of crucial parameters in selected studies

| No | Reference | Transfer function | Fitness function | Population size | Iterations | Runs |
|---|---|---|---|---|---|---|
| P1, P2, P7 | Al-Tashi, Abdul Kadir et al. (2019); Al-Wajih, Abdulkadir et al. (2021); Alwajih, Abdulkadir et al. (2022) | S-shaped (sigmoid) | F = α·E(D) + β·(S/T), with α ∈ [0, 1] and β = 1 − α, where E(D) is the classification error rate, S the number of selected features, and T the total number of features in the dataset | 10 | 100 | 20 |
| P3 | El-Kenawy and Eid (2020) | S-shaped (sigmoid) | F = α·E(D) + β·(S/T) | 10 | 80 | 20 |
| P4 | Sami Khafaga, M. El-kenawy et al. (2023) | - | - | 20 | 50 | - |
| P5 | Arora, Singh et al. (2019) | - | - | 7 | 100 | 30 |
| P6 | Mafarja, Qasem et al. (2020) | - | - | 20 (10, 20, 30, 40, 50 tested) | 100 (50, 75, 100, 150, 200 tested) | 30 |
| P8 | El-Kenawy, Eid et al. (2020) | - | - | 10 | 80 | 20 |
| P9 | El-Kenawy, Mirjalili et al. (2022) | S-shaped (sigmoid) | - | 10 | 100 | 20 |
| P10 | Ewees, Al-qaness et al. (2021) | - | F = α·E(D) + β·(S/T) | 25 | 100 | 13 |
| P11 | Liang, Wang et al. (2019) | - | - | 150 | 300 | 100 |
| P12 | Almazini, Ku-Mahamud et al. (2023) | - | Fitness = AC·a + (1/NF)·b, where NF is the feature subset size, AC the accuracy, and a, b ∈ [0, 1] | - | - | - |
| P13 | Mazini, Shirazi et al. (2019) | - | fitᵢ = 1/(1 + f(xᵢ)) if f(xᵢ) ≥ 0, and 1 + abs(f(xᵢ)) if f(xᵢ) < 0 | - | 150, 200, 250, 500 | - |
| P14 | Thawkar (2021) | - | f(xᵢ) = (E(xᵢ)·(1 + 0.5·S/N))², where f(xᵢ) is the cost of xᵢ, E(xᵢ) the classification accuracy of the i-th feature set, S the number of selected features, and N the number of features in the original database | 25 | 25, 30, 40, 50, 100 | - |
| P16 | El-Shafiey, Hagag et al. (2022) | - | - | 50 | 30 | 5 |
| P18 | Bezdan, Zivkovic et al. (2022) | S-shaped (sigmoid) | F = α·E(D) + β·(S/T) | 8 | 70 | 20 |
| P19 | Hans and Kaur (2020) | S-shaped (sigmoid), V-shaped (tanh) | F = α·(S/T) + β·E(D) | 20 | 100 | 10 |
| P20 | Houssein, Hosney et al. (2020) | - | F = α + β·(R/C) − G, subject to F > T, where R is the classification error, C the total number of features, β the subset length, α the classification performance in [0, 1], T a necessary condition, and G the group column for the specific classifier; at each step the obtained fitness is compared with T and must exceed it to maximize the solution | 30, 41 | 100, 1000 | 30 |
| P22 | Akinola, Ezugwu et al. (2022) | Binarization function | F = µ·(1 − Ac) + (1 − µ)·(bs/Dt), where (1 − Ac) is the classification error, bs the feature subset dimension, Dt the total number of attributes, and µ ∈ [0, 1] | 10, 20, 30, 40, 50 (10 best) | 50 | 10 |
| P23 | Phogat and Kumar (2023) | S-shaped threshold function | F = α·(S/T) + (1 − α)·Γ, where Γ is the classification accuracy | 100, 200 | 20 | - |
| P24 | Alkanhel, El-kenawy et al. (2023) | S-shaped (sigmoid) | F = α·ER(D) + β·(R/C) | - | - | - |
| P25 | Alhussan, Abdelhamid et al. (2023) | - | F = α·ER(D) + β·(R/C) | - | 500 | 10 |
| P26 | Alwan, AbuEl-Atta et al. (2021) | - | min f(x) = 100 − Accuracy | 10, 20, 30, 40 | 500 | 15 |
| P27 | Alweshah, Aldabbas et al. (2023) | - | F = α·ER(D) + β·(R/C) | 10 | 100 | - |
| P28 | Masrom, Rahman et al. (2022) | - | - | 10, 20, 30 | 100–1000 (600 best) | - |
| P30 | Lee, Le et al. (2022) | S-shaped (sigmoid) | F = NT/(NT + NF) × 100%, where NT is the number of correctly predicted instances and NF the number of incorrectly predicted instances | 10 | 100 | 30 |
Table 7 provides a concise overview of the essential parameters reported across the metaheuristics-based FS studies within the scope of this SLR: the population size, the maximum number of iterations, and the number of executions (runs). Nevertheless, the metaheuristic-based FS algorithms described in (Li, Zhang et al. 2021, Shanthi, Akshaya et al. 2022, Alkanhel, El-kenawy et al. 2023, Almazini, Ku-Mahamud et al. 2023) lack the statistical analysis needed to illustrate the significance and superiority of these parameter choices, which is a crucial component of empirical research. The iteration count is most often set to 100 (43% of studies), and 33% of studies used 10 agents as the population size.
Most metaheuristic algorithms are designed for continuous problems. FS, however, is a binary problem: each candidate solution is represented by a d-dimensional vector of 0 and 1 values, where 0 indicates a rejected feature and 1 a selected feature. Hence, adopting a binary representation is a crucial step in the FS domain, and many researchers use S-shaped and V-shaped transfer functions for this task.
As shown in Table 7, 15 of the 16 studies that reported the transfer function employed S-shaped transfer functions, a family of sigmoid approaches, making it the dominant method. V-shaped functions were used only in (Hans and Kaur 2020). Akinola, Ezugwu et al. (2022) used a binarization function for solution representation in the range [0, 1], where a position value of 0.5 or above denotes a selected feature and a value below 0.5 denotes a rejected feature.
The fitness function determines the quality of a solution based on the features that have been chosen, and formulating an efficient fitness function is essential to the success of the process (Talbi 2009). Different fitness functions are frequently utilized in metaheuristic algorithms for FS; the most common form, used by 73% of the recent studies reviewed in this SLR, combines the classification error and the number of selected features.
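The dominant fitness form in Table 7, F = αE(D) + β·|S|/|T| with β = 1 − α, can be written as a one-line sketch. The default weighting α = 0.99 below is an illustrative assumption (individual studies choose their own α within [0, 1]), not a value mandated by the reviewed papers.

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """The common FS fitness: F = alpha * E(D) + beta * |S|/|T|,
    with beta = 1 - alpha. Lower values are better: the first term
    penalizes misclassification, the second penalizes large subsets."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)

# With equal error rates, the smaller feature subset wins.
f_small = fs_fitness(error_rate=0.10, n_selected=5, n_total=50)
f_large = fs_fitness(error_rate=0.10, n_selected=40, n_total=50)
```

The weight α encodes the trade-off discussed under RQ5: values close to 1 prioritize accuracy, while smaller values press harder toward compact feature subsets; a single scalar blend is precisely why the reviewed studies are single-objective.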
RQ5: What are the challenges in the current studies and their future directions?
Although the selected studies prove effective in solving the FS problem, they still suffer from shortcomings that call for further investigation in future studies.
• The selected studies are single-objective, employing only one fitness function. In contrast, FS serves two purposes: maximizing classification accuracy and decreasing the number of attributes. These two objectives often conflict, necessitating an optimization algorithm that can determine the ideal trade-offs between them. Regrettably, there is a limited amount of research on multi-objective FS; therefore, the hybridization of metaheuristics in multi-objective FS is an open research topic for scholars.
• Most selected studies use the wrapper approach as the traditional FS strategy. Hence, hybridizing wrapper and filter approaches remains an open research topic.
• Some selected studies hybridize the original metaheuristic algorithms without tuning the crucial parameters. Therefore, appropriate strategies for tuning crucial parameters in hybrid metaheuristic algorithms deserve further study.
• Further exploration of various application fields, such as finance, cybersecurity, and engineering, should be pursued. These domains exhibit high-dimensional data in various media, encompassing image, text, and audio.
• Computational complexity is a significant concern in hybrid metaheuristics for FS: some reported studies demonstrate performance improvements but suffer from computational burden and time complexity. More suitable methods to reduce this cost are advisable.