Hybrid Metaheuristic Algorithms for Feature Selection in Classification: A Systematic Literature Review

doi:10.21203/rs.3.rs-4286826/v1

Download PDF

Research Article

Hybrid Metaheuristic Algorithms for Feature Selection in Classification: A Systematic Literature Review

https://doi.org/10.21203/rs.3.rs-4286826/v1

This work is licensed under a CC BY 4.0 License

Version 1

posted

You are reading this latest preprint version

The effectiveness of a machine learning model is significantly impacted by feature selection. Feature selection is one of the most popular and highly effective techniques for eliminating irrelevant and redundant features to enhance the relevance of the collected data and improve the effectiveness of classification tasks. Feature selection is challenging because of the intricate relationship between features and large search space, which grows exponentially with the number of existing features in the original dataset. Metaheuristic algorithms are the most effective techniques for managing feature selection due to their robustness, intelligibility, and effectiveness in solving intricate optimization problems. Recent studies have focused on using hybrid metaheuristics as feature selection approaches. This systematic literature review explores recent studies from 2019 to 2023 that used hybrid metaheuristic algorithms for feature selection in classification. This paper aims to understand the existing hybrid algorithms, the goal of hybridization, the type of hybridization, and their application domains. Moreover, crucial parameters, fitness and transfer functions, initial population method, feature selection approach, classification algorithm, evaluation criteria, and statistical test are also investigated in this paper. A list of 30 relevant papers in line with the topic were extracted and examined to develop new insights in the domain of feature selection in classification. The focus is on a single fitness function (single objective). However, feature selection can be seen as a multi-objective problem, making hybridization in multi-objective feature selection problems a future research work for scholars.

Binary

classification

feature selection

hybridization

metaheuristic

Classification is a machine learning task in which algorithms for machine learning are utilized to extract knowledge from a set of given data (Soui, Mansouri et al. 2021). It is a supervised learning that aims to develop a decision model that searches for patterns in training set data to predict the class or category for new unknown data (Cervantes, Garcia-Lamont et al. 2020). These classes can be binary (e.g., spam or not spam) or multiclass (e.g., distinguishing flower species). Classification is a highly studied subject in data mining and machine learning. It has a wide range of applications across various domains, such as image recognition, speech recognition, and fraud detection (Bhavan, Chauhan et al. 2019, Das, Das et al. 2019, Ileberi, Sun et al. 2022).

Numerous different types of classification techniques exist. These techniques have been extensively researched and evolved over the years with the intention of improving accuracy, efficiency, and interpretability. Nonetheless, all these algorithms reflect problems with various levels of complexity (Kowsari, Meimandi et al. 2019). With the significant increase in the number of datasets, managing the classifiers implementation may be challenging, and it would be helpful if the desirable features were separated from the irrelevant and redundant ones. The procedure of choosing the relevant features is referred to as feature selection.

Feature selection (FS) aims to reduce features by removing redundant and irrelevant attributes from certain datasets (Al-Tashi, Abdul Kadir et al. 2019). In real-world applications, data representation commonly uses a significant number of features, some of which could be redundant. This redundancy means that certain features can perform the same function as others, and thus, unnecessary features can be eliminated. However, it is important to take into account the interdependence of relevant attributes, as they can impact output and provide essential data that would be lost if any were eliminated (Bell and Wang 2000).

Feature selection has been employed in many machine learning and data mining problems, especially with high-dimensional data. It is essential for four main reasons. The first is to reduce overfitting (Meyer, Reudenbach et al. 2018). A model can excel in the training data during the training stage but performs poorly in the testing stage. The model can improve generalization to new data by choosing just the most relevant features. Secondly, reducing the Complexity of a model in terms of computing will increase its speed and efficiency (Alwajih, Abdulkadir et al. 2022). This is particularly important in large-scale applications where the quantity of data and features is substantial. The third reason is to improve the interpretability of the model (Pirovano, Heuberger et al. 2021). It is simpler to comprehend the association between the input variables and the resulting output when only the most crucial features are chosen, which may assist in making wise decisions and boost a model's performance. Finally, FS helps to mitigate the "curse of dimensionality” by reducing the number of attributes used in datasets (El-Kenawy and Eid 2020). Overall, FS is a powerful classification tool that can enhance the classification model's efficiency, interpretability and effectiveness.

Feature selection can be reviewed using three primary techniques: filter, wrapper, and embedded. The filter technique is independent of the machine learning algorithm that is being utilized. It has the merit of being fast, computationally inexpensive, and able to scale up to large datasets (El-Kenawy and Eid 2020). Despite these advantages, it provides a poor-quality solution. The wrapper technique selects a subset of attributes based on the model's predictive power. This technique selects features by iterating through multiple combinations of features and evaluating their performance using a chosen machine learning algorithm (Zhang, Xiong et al. 2019). Embedded techniques select features during training rather than features before training the model. The benefits of embedded techniques are that they can be computationally effective since FS is incorporated into the training process. Additionally, they can be used to build more interpretable models since they only include the most relevant features. However, they may not perform similarly to wrapper methods in some cases, mainly when many features exist.

Finding the most practical features can be complex and computationally demanding. Despite various suggested approaches, most of them still encounter difficulties, such as becoming stuck in local optimal solutions and high computational expense problems mainly because of the large search area. As a result, it is necessary to possess a global search mechanism that effectively manages FS tasks. The most effective techniques for handling FS are metaheuristic algorithms because of their robustness, intelligibility, and efficiency in resolving complex optimization issues (Alzubi, Anbar et al. 2022, El-Shafiey, Hagag et al. 2022, Almazini, Ku-Mahamud et al. 2023).

Several metaheuristic methods have previously been employed to solve the FS problems. Scholars are currently encountering a challenge in effectively applying and providing accurate recommendations for modern metaheuristics because every metaheuristic algorithm has pros and cons (Piri, Mohapatra et al. 2023). Therefore, most recent studies have focused on using hybrid metaheuristics to enhance the algorithm's performance, adaptability, and efficiency by integrating strengths from different metaheuristic algorithms. These hybridizations increase the probability of finding high-quality feature subsets.

A. Previous review works summary

This section summarizes recent reviews of studies using metaheuristic algorithms for FS tasks. Review papers (Abiodun, Alabdulatif et al. 2021, Agrawal, Abutarboush et al. 2021, Rostami, Berahmand et al. 2021, Sharma and Kaur 2021, Akinola, Ezugwu et al. 2022, Alyasiri, Cheah et al. 2022, Dokeroglu, Deniz et al. 2022, Pham and Raahemi 2023, Piri, Mohapatra et al. 2023) evaluating the hybrid metaheuristic algorithms in addressing the FS problem and details of hybridization are limited in the existing literature.

A comprehensive survey on using metaheuristic algorithms for the FS problem was conducted by Dokeroglu. The authors focus only on the most remarkable recent 22 metaheuristics that have been developed in the last two decades, from HS in 2001 to HHO and BOA in 2019. Agrawal, Abutarboush et al. (2021) reviewed the binary metaheuristics developed for FS from 2009 to 2019. The discussion focused on the performance of algorithms and analyzed the classifiers, datasets, transfer function, and evaluation metrics, as well as identified the challenges and emphasized areas where further research is needed. Furthermore, a case study on UCI datasets was provided.

A systematic literature review (SLR) on FS-based metaheuristics in text classification has been conducted by Abiodun, Alabdulatif et al. (2021) and Alyasiri, Cheah et al. (2022). The SLR on metaheuristic-based FS approaches for developing text classification was presented by Abiodun et al. (2021). The review answered many questions, such as the datasets, classifiers, sub-field of metaheuristics, and its impact on the accuracy of text classification. On the other hand, a total of 37 papers from 2016 to 2021 that employed metaheuristic algorithms to solve FS problem were selected by Alyasiri et al. (2022). It significantly contributes to the current body of knowledge to comprehensively understand the methods and their advantages and disadvantages in terms of decreasing the computational resources needed to execute text classification. However, this SLR examines FS techniques-based metaheuristic algorithms for text classification in the English language text.

A comprehensive analysis of metaheuristics used for FS issues and their categorical list was presented by Sharma and Kaur (2021). The authors highlighted the performance and role of binary and chaotic variants of different nature-inspired metaheuristic algorithms in 10 major research domains. Metaheuristic algorithms for different disease predictions are also illustrated. However, the study did not analyze the transfer function or fitness employed or the parameter values that provide an important basis for the improvement of metaheuristics. A review of swarm intelligence algorithms for FS and their categorization is performed in Rostami, Berahmand et al. (2021). The authors focused on the strengths and drawbacks of 11 swarm intelligence-based FS algorithms studies that were evaluated. The algorithms are (ACO, PSO, DE, ABC, SSA, WOA, GWO, BA, GSA, FA, and COA) with six medical datasets and three classifiers. The review scope is considered narrow because it did not determine the open challenges and future directions of FS base swarm intelligence algorithms. Besides these shortcomings, there is also no discussion about transfer functions, population initialization method, parameters setting and hybridization.

Akinola, Ezugwu et al. (2022) provide a review of the literature (from 200 to 2022) on applying metaheuristic algorithms to multi-class FS problems, which enable classifiers to choose optimal or nearly optimal features with incredible speed and accuracy. The authors emphasize the review of metaheuristic algorithms and details of their six categories (evolutionary, swarm, human, physics, system, and bio)-based algorithms, as well as the categorization of the FS approaches. The SLR is performed on five prime digital databases of engineering and science to identify the bio-inspired algorithms that are employed in FS and their domains, such as healthcare, text classification, image processing, and cybersecurity (Pham and Raahemi 2023). It detailed the categories of bio-inspired FS algorithms, bio-inspired algorithms with frequency of occurrence and improvement techniques. Piri, Mohapatra et al. (2023) provide a critical assessment of the current hybrid FS based metaheuristic approaches, considering 35 studies published between 2009 and 2022. The review presents a thorough picture of the metaheuristic algorithms utilized for hybridization and also discusses a range of different classifiers and datasets, the application fields of the corresponding approaches, their fitness functions and evaluation metrics, and the various application fields.

To conclude, the reviews mentioned above serve as good resources for FS-based metaheuristic algorithms, and the summary of the reviews is presented in Table 1. The reviews offer valuable information and discussion about the challenges and future directions for future research. Nevertheless, none of the review papers concentrates on FS based hybrid metaheuristic algorithms and the essential details of hybridization in classification.

B. Scope of discussion

This systematic literature review focuses mostly on using hybrid metaheuristic algorithms for FS in classification. The aim is to investigate the algorithms, datasets, application domains, transfer function, fitness function, classifiers, FS approaches, population initialization method, and values of crucial parameters (population size, iterations, run times). In addition, important details of hybridization include the type (parallel, sequential, low-level, high-level), research trends and gaps.

C. Our contributions

This SLR presents a complete analysis and synthesis of five years (2019–2023) of studies that use hybrid metaheuristic algorithms to solve the FS problem in classification tasks. This attempt will yield advantages for forthcoming investigations in this domain and help determine a suitable modelling methodology for classification.

The essential contributions of this review are as follows:

Provide a comprehensive and critical review of 30 papers that have used hybrid metaheuristic algorithms to solve FS in classification. In addition, the result of an analysis of the fundamental information regarding the purpose of hybridization, datasets utilized, and application domains in the selected studies is provided.
Identify studies that have used statistical tests and highlight the most common evaluation metrics used to assess the performance of algorithms, analyze and summarize the type of hybridization used.
Conduct an analysis and summarize of the FS approach, machine learning techniques used to create fitness functions, and methods used to initialize the population.
Investigate the employed transfer function, fitness function and values of crucial parameters.
Address the challenges and potential future directions of the metaheuristics-FS field, which might serve as a reference framework for future studies.

A. Metaheuristic optimization algorithms

Optimization plays a significant role in everyday life, and numerous individuals apply optimization techniques to their daily routines. In computing, optimization refers to the process of enhancing system or application performance while minimizing the usage of resources (Abd Elaziz, Abualigah et al. 2021). Typically, there are two optimization techniques. The first category is classified as deterministic optimization techniques, whereas metaheuristic optimization techniques fall under the second category.

Table 1 Summary of previous review papers on metaheuristic for feature selection

Reference	Period		Aim	Scope	Analysis
Piri, Mohapatra et al. (2023)	2009-2022	Systematic Review for hybridization of different metaheuristics that have been employed for FS.		Peer-reviewed publications in multiple domains	-Used algorithms -FS approaches -Classifiers -Datasets -Applications -Evaluation metrics -Schemes of hybridization -Fitness function details -Issues and challenges -Future directions
Pham and Raahemi (2023)	2020-2023	Systematic Review to Identify of the bio-inspired algorithms that are employed in FS and their domains of application.		Peer-reviewed journals and conference publications in multiple domains. It does not cover hybridization between a bioinspired and a non-bioinspired method.	Algorithm and improvement technique FS approaches Parameters (population size and max iteration) Statistical test Application domains Data type Classifiers Transfer function Challenges and future directions
Alyasiri, Cheah et al. (2022)	2016-2021	Systematic Review to explore available studies of different metaheuristic (adaptation, modification, and hybridization) algorithms used for FS.		Papers indexed in Scopus and ISI in the English text classification domain.	Algorithms FS approaches Classifiers Datasets Evaluation metrics Compared with Number of population and generation Performance Gaps in the current studies Future directions
Akinola, Ezugwu et al. (2022)	2000-2022	A systematic survey of literature to solve multi-class FS problems using metaheuristics that can help classifiers choose optimal or nearly optimal features more quickly and accurately.		Papers on metaheuristics used for multi-class FS problems in multiple domains	Algorithms Application domains Datasets Classifier Issues or challenges Future directions
Sharma and Kaur (Sharma and Kaur 2021)	2000-2019	Systematic Review to provide a comprehensive analysis of nature-inspired metaheuristic algorithms used in the domain of FS.		Multiple domains	Algorithms Metaheuristics variant (binary or chaotic) FS approaches performance Datasets Evaluation metrics Compared with Application Challenges and future directions
Abiodun, Alabdulatif et al. (2021)	2015-2021	Systematic Review of the present state of FS with respect to metaheuristics and hyper-heuristic methods.		Text classification	FS method Metaheuristic performance Classifier Dataset Evaluation metrics Shortcomings and contributions Challenges and future directions
Rostami, Berahmand et al. (2021)	2008-2020	Review of swarm intelligence-based FS techniques.			Algorithms FS method Classifiers Application Search strategy Weakness and strength
Dokeroglu, Deniz et al. (2022)	last two decades	A comprehensive survey on recent metaheuristics for FS.		Recent 22 metaheuristics for FS (2001-2019)	Exploration/exploitation operators Transfer functions Selection approachs Fitness value evaluation Parameter setting methods Challenges and future research topics
Agrawal, Abutarboush et al. (2021)	2009-2019	Review the binary metaheuristic algorithms developed for FS.			Algorithm Classifiers Evaluation metrics Datasets Transfer function Compared with Challenges and future works

Metaheuristic optimization algorithms, known as stochastic optimization algorithms, differ from their deterministic counterparts as these algorithms embrace some random search in seeking the optimal solution. Similar to deterministic algorithms, these algorithms are iterative algorithms comprising multiple iterations. The categories of metaheuristic algorithms are summarized in Fig. 1.

Metaheuristic optimization algorithms are global optimization approaches, and they can be implemented in real-world issues that have several local optima. Furthermore, these methods are simple to use and do not need the objective function's derivatives (Talbi 2009) and having the ability to perform global search do not experience the issues associated with gradient-based techniques (Akay, Karaboga et al. 2022).

Two key concepts emerge (i.e. exploration/ diversification and exploitation/intensification in metaheuristic optimization algorithms. Exploration refers to the ability of the algorithm to widely search for promising areas throughout the entire search space, while exploitation refers to the capability of the algorithm to carry out a focused local search within the promising regions found in the process of exploration, aiming to identify the optimal solution inside these areas. In accordance with the distinct stages of exploration and exploitation, metaheuristic algorithms are categorized as either population-based or single solution-based metaheuristic algorithms.

1) Single solution-based metaheuristic algorithms

Algorithms such as simulated annealing (SA) (Kirkpatrick, Gelatt et al. 1983, Yu, Redi et al. 2017), tabu search (TS) (Glover 1989, Prajapati, Jain et al. 2020) and iterated local search (ILS) (Palhazi Cuervo, Goos et al. 2014), which fall under the category of single solution-based approach, manipulate a singular solution during the search process. Generally, the foundational single-solution-based metaheuristic (S-metaheuristic) tends to emphasize exploitation, which focuses on intensifying the search in local areas. In contrast, population-based metaheuristics place a greater emphasis on exploration, which potentially leads to more extensive searches across the solution space. However, they may be prone to get trapped in local optima, which complicates the identification of the global minimum. This is attributed to their generation of a single, randomly generated solution for a specific problem (Dhiman and Kaur 2018).

2)Population-based metaheuristic algorithms

Population-based metaheuristics (P-metaheuristic) begin the search within the initial population of solutions and start iteratively replacing the existing population with the developed solutions. New populations of solutions are developed throughout the generation phase. A selection from the present and new populations is made during the replacement phase. This procedure is carried out until the stopping criterion is reached (Al-Wajih, Abdulkadir et al. 2021). P-metaheuristic are explorative and exhibit the capability to discover an optimal or suboptimal solution, which may align closely with the true optimum or exist in its vicinity (Dhiman and Kaur 2018). Consequently, there has been a heightened interest in P-metaheuristic algorithms among contemporary researchers.

P-metaheuristic algorithms are categorized using the principles of evolutionary algorithms, swarm intelligence of particles, the biological behavior of bio-inspired algorithms, and physics laws-based algorithms.

Evolutionary algorithms (EA) are P-metaheuristic algorithms that imitate the survival of the fittest principle. The methodologies of these algorithms draw inspiration from natural evolutionary processes such as mutation, reproduction, and recombination, which are chosen processes that favor the most qualified candidates (Simoncini and Zhang 2018). Some well-known algorithms adopt evolutionary concepts such as GA (John 1992, Ileberi, Sun et al. 2022), Differential Evolution (DE) (Li, Nichols et al. 2009, Pant, Zaheer et al. 2020), and Evolutionary Strategy (ES) (Beyer and Schwefel 2002, She 2021).

Swarm intelligence seeks to model populations of agents endowed with the capacity to interact and self-organize for problem-solving purposes. These algorithms draw inspiration from the intelligent behaviors observed in swarms, flocks, or herds. These algorithms have gained great attention and are utilized in many events as they are robust, flexible, and have characteristics of distributive, parallel, and self-organization processes (Tang, Liu et al. 2021). Consequently, several related methods to collective behavior have emerged.

Bio-inspired algorithms are metaheuristic algorithms that mimic the behavior, processes, and principles found in biological systems. These algorithms are popularly utilized for optimization, such as GWO (Mirjalili, Mirjalili et al. 2014, Fan, Huang et al. 2021), HHO (Heidari, Mirjalili et al. 2019, Shehab, Mashal et al. 2022), bat

algorithm(BA) (Cui, Li et al. 2019) and firefly algorithm (FA) (Fister, Yang et al. 2013, Yang and Slowik 2020), thus assisting in reducing the classification error and the processing time (Yadav and Vishwakarma 2020). These algorithms possess extremely significant academic and practical value due to their robustness, simplicity, and efficiency in resolving complex optimization problems (Almazini and Ku-Mahamud 2021).

The metaheuristic optimization technique, based on physics, is an algorithm that utilizes principles and laws of physics to solve difficult issues, for example, the gravitational search algorithm(GSA) (Rashedi, Nezamabadi-Pour et al. 2009, Magdy, El Marhomy et al. 2019), black hole algorithm(BH) (Abdulwahab, Noraziah et al. 2019) and galaxy based search algorithm (GbSA) (Shah-Hosseini 2011, Sardari and Moghaddam 2017). Although they do not exactly fit into the existing categories of biologically inspired techniques, physics-based algorithms typically fall into the areas of metaheuristics and computational intelligence. They may just as readily be referred to as nature-inspired algorithms in light of this. These algorithms are effective and reliable for handling complex, high-dimensional problems.

B. Common framework of metaheuristics

The general flowchart representing the primary tasks carried out by metaheuristic algorithms is depicted in Fig. 2 (adopted from (Dokeroglu, Deniz et al. 2022)). In the first step, the population is initialized, and the parameters, such as the termination threshold and the number of iterations, are initialized. Typically, the initial population is produced in a random manner with the objective of encompassing a wide range of regions inside the search space. Then, the candidates' fitness values are computed by the fitness function, which is a metric used to assess the performance of individual solutions. The metaheuristic algorithms perform exploration and exploitation operators to produce new candidate solutions. Due to the high computational cost of these algorithms, their new variants, such as parallel and dynamic, are faster and can yield better solutions because they conduct a greater number of fitness evaluations within a shorter time (Dokeroglu, Deniz et al. 2022).

C. Exploration and exploitation

Exploration and exploitation are the two most essential operations in optimization algorithms. Exploration (diversification) refers to the ability of the algorithm to widely search for promising areas throughout the entire search space. In contrast, exploitation (intensification) refers to the capability of the algorithm to carry out a focused local search within the promising regions found in the process of exploration, aiming to identify the optimal solution inside these areas (Morales-Castañeda, Zaldivar et al. 2020). Balancing exploration and exploitation is a critical aspect of achieving good performance. For all optimization algorithms to converge efficiently toward high-quality solutions, they must consider this balancing between the processes of exploration and exploitation in a search area (Mittal, Singh et al. 2016, Arora, Singh et al. 2019).

The optimal balance between exploration and exploitation does not to be 50% of the entire optimization time. This issue should be effectively resolved by the dynamics of the suggested algorithm. It is important to explore the entire search space as much as possible and avoid being trapped in local optima during the search (Dokeroglu, Deniz et al. 2022). Through the concept of hybridization metaheuristics can adjust these phases adaptively and provide much better performance.

D. Premature convergence

Convergence in metaheuristics is generally rapid when the best location is in close proximity to the initial solutions (Ekinci, Izci et al. 2023). Premature convergence in swarm optimization algorithms, where the convergence in optimization algorithms is the point at which the algorithm ceases to make further improvements and achieves a stable solution. Immature convergence occurs when the algorithm converges to a sub-optimal solution too quickly without exploring the search space thoroughly. This convergence can happen when the algorithm gets stuck in a local minimum rather than determining the global minimum (Fister, Yang et al. 2013, Shami, El-Saleh et al. 2022). The researchers have been interested in improving the global search ability in metaheuristic algorithms and avoiding premature convergence.

E. Local optima stagnation

Local optimum, in optimization issues, refers to a solution that is the optimal possible solution within a certain region of the solution space, but it is not the globally optimal solution (Baxter 1981, D'Angelo and Palmieri 2021). It occurs when a particular algorithm or optimization method converges to a solution that is the best within a certain proximity but fails to consider other solutions that may exist outside of that proximity. When this happens, the algorithm determines the global minimum incorrectly. The position and the minimum level are both inaccurate, and this leads to wrong outcomes. The issue of local optima in FS prevents the identification of the most crucial attributes, but it chooses the features with higher classification error, resulting in diminished algorithm performance. The visualization of local and global optima is shown in Fig. 3.

F. Hybridization

Hybridization, in the context of metaheuristic algorithms, refers to the merging of various optimization approaches or algorithms to generate a novel and more efficient algorithm that capitalizes on the strengths of its individual components. Metaheuristics are sophisticated techniques developed to explore the search space in order to find the optimal solutions for optimization problems (Talbi 2009).

Hybridization can occur at various levels; these levels are categorized as low-level and high-level hybridization. These terms refer to the level of integration and interaction among various algorithms or components. The low-level hybridization focuses on the functional composition of a single metaheuristic algorithm. In this hybrid type, a specific function of a metaheuristic is substituted with another metaheuristic. Whereas in high-level hybrid algorithms, each metaheuristic operates independently and is self-contained. There is no connection to a metaheuristic's internal operations (Talbi 2009, Alobaedy and Ku-Mahamud 2015).

Hybridization is often used to overcome the limitations of individual algorithms, like the balance between exploration and exploitation, as well as to avoid the problems of premature convergence and being stuck in local optima (Arora, Singh et al. 2019, El-Shafiey, Hagag et al. 2022). The selection of hybridization depends on the characteristics of the optimization issue and the capabilities of the individual algorithms involved. In recent years, several studies have embraced hybrid algorithms as a way to address a wide range of challenges.

G. Feature selection

The FS is a significant method for decreasing data dimensionality by elimination the redundant and irrelevant features as a pre-processing phase. The FS methods are intended to choosing the most relevant attributes to minimize the computational cost and raise the performance, simplify the learned model and gain a better knowledge of the data in pattern recognition applications or machine learning (Chandrashekar and Sahin 2014, Khafaga 2022). The FS approaches are essentially categorized into three main groups: filter method, wrapper method, and embedded method. The majority of research efforts are focused on FS for classification (Liu and Yu 2005, Khafaga 2022, Sami Khafaga, M. El-kenawy et al. 2023).

The FS is sometimes referred to as attribute selection or variable selection. As seen in Fig. 4, five basic steps are typically involved in the FS process: feature initialization, feature subset creation, evaluation of feature subset, stopping criteria, and validation of the result (Zhang, Xiong et al. 2019). The initialization phase of any FS approach relies on all of the original attributes present in the dataset. That is, the dimensionality of the search space is commonly described as the entire number of features utilized within this procedure (Nguyen, Xue et al. 2020).

The subset discovery is done in the second phase of the FS process, which identifies a potential subset of features for assessment. The search for a subset can start with the following three options: without any features, with all features, or with a random search for a feature subset. Several search approaches, including conventional approaches, heuristic, and metaheuristic algorithms, are utilized at this phase to pick the optimal feature subsets.

The third phase involves evaluating the generated feature subset. The feature subset produced in the second phase will be assessed and compared to the previous best set using a specific assessment function. In the event that the new subset outperforms the prior best subset, it will take its place.

The stopping criteria are the fourth step; in the absence of an appropriate stopping condition, the FS process can continue endlessly over the subsets space. The procedures for generating and evaluating subsets are iterated until a predefined stopping threshold is met. The decision of stopping threshold is affected by whether a predefined number of iterations have been achieved, whether the optimum subset has been obtained, and whether eliminating any attribute does not produce an improved subset, according to a certain function of evaluation. The loop will terminate when the stopping threshold are met. Finally, the validity of the subset is verified in the validation procedure. The test set will be employed to validate the specified subset of features (Liu and Yu 2005, Zhang, Xiong et al. 2019).

1) Solution representation

Metaheuristic optimization algorithms utilize the population of candidate solutions, which is commonly represented as a vector of values. Due to the FS is binary problem the metaheuristic-based FS algorithms often utilize a binary encoding to express a solution, which consists of a chosen set of features (Abu Khurmaa, Aljarah et al. 2021, Dokeroglu, Deniz et al. 2022). This representation is shown in Fig. 5. The solution's length is represented by m, where m is an integer representing the number of features. One of two possible positions for the solution is "0" or "1". A binary value of "0" denotes non-selection of the feature, while a binary value of "1" signifies the inclusion of the feature. Thus, the magnitude of the subset feature is expressed through the presence of a "1" value.

2) Curse of dimensionality

The amount of data is growing exponentially, which complicates and demands high computing resources for data processing and analysis. The curse of dimensionality encompasses a range of issues that arise when dealing with data that has a high number of dimensions. The search space in FS grows at an exponential rate with regard to the number of existing attributes in dataset; the total number of possible solutions is 2ⁿ for a dataset with n attributes. This means that the majority of current FS approaches suffer from high computational cost and being trapped in local optimum (Alwajih, Abdulkadir et al. 2022). Moreover, data sparsity increases when the number of dimensions increases; consequently, the FS is an NP-hard problem (Nguyen, Xue et al. 2020). Fig. 6 illustrates the substantial growth of the exponential search space compared to the linear search space.

This section presents the methodology used to choose and examine the literature on hybrid metaheuristic algorithms for the FS problem, as well as the application domains in which they are applied. A SLR was conducted using the established guidelines in Kitchenham, Pretorius et al. (2010).

A. Research questions

The following questions are attempted to be addressed by this review.

RQ1: Which hybridized metaheuristic algorithms are used for the FS problem? What is the purpose of the hybridization? What are their application domains?
RQ2: What type of hybridization is used? What are the evaluation metrics? What statistical tests are used?
RQ3: What are the FS techniques applied with metaheuristic to achieve good classification accuracy and minimum number of features? What classifiers are used? What initial population methods are used?
RQ4: What are the crucial/optimal parameters values? Which transfer and fitness functions are used?
RQ5: What are the challenges in the current studies and their future directions?

B. Search strategy

One of the most important tasks in creating an SLR is figuring out what sufficient and correct keywords to use. Initially, we formulated the query using basic keywords and searched Google to find the most important publications in this field. These queries were designed to extract the primary keywords and important databases from those publications. Subsequently, the primary keywords from the aforementioned articles are chosen, and the queries are on the University Utara Malaysia library databases and Google scholar. The search is within the article's title, abstract, and keywords to locate papers that are relevant to this subject. The abstracts underwent an initial screening process to eliminate all articles that did not fulfil the inclusion criteria. Full-text papers were exported to the Endnote library for deep analysis.

C. Search query and data source

The query comprises three components. The first part of the query is "Hybrid". The second part collects terms linked to "metaheuristic" concepts, while the third part collects terms connected to "classification".

(Hybrid AND (metaheuristic OR meta-heuristic OR metaheuristics OR meta-heuristics) AND for AND "feature selection" AND (on OR in) AND classification)

We conducted a search using the specified search string in the main digital libraries, which are Elsevier-Scopus in the UUM library and Google scholar. This search was performed in December 2023, with the publication year limited to the range of 2019 to 2023.

D. Inclusion and exclusion criteria

The exclusion criteria were applied to the selected papers. The criteria for inclusion and exclusion are shown in Table 2.

The primary search included 320 papers after removing 48 papers that were published before 2019. There were still 272 studies that needed to be screened. Out of those, 242 papers were eliminated as they were either not relevant or did not meet the inclusion criteria. A total of 30 publications were ultimately chosen for inclusion in this study.

Table 2 Inclusion and exclusion criteria

Inclusion Criteria

Exclusion Criteria

1- papers on hybrid metaheuristics for FS for classification are included.

2- Only related works utilized the hybridization for metaheuristics.

3- papers that were published in the period (2019-2023)

Papers that do not satisfy the inclusion criteria

1- Irrelevant papers are discarded if they do not pertain to the search topic (hybridization used was in FS approaches rather than metaheuristics).

2- papers prior to 2019 are not included.

3- paper that is inaccessible

4- papers written in other language (only papers written in English are considered)

A. Data extraction

This stage involves extracting pertinent data and information from the chosen papers. The data that has been extracted is subsequently analyzed in order to address our study questions.

We conducted an analysis of this data in order to categorize studies that were comparable to one another. Extracting the

essential information from each study will enable us to identify the significant advancements and patterns in the research, assess any existing gaps, and establish directions for future work. As a result, we summarized the selected studies according to the research questions in Table 3 to identify the areas that need more investigation and guide future studies.

We examine all selected studies (30 papers) in this section. A synthesis of the data is performed in order to answer the aforementioned research questions.

RQ1: Which hybridized metaheuristic algorithms are used for the FS problem? What is the purpose of the hybridization? What are their application domains?

Metaheuristic algorithms have been shown as techniques that can solve difficult computational NP-problems like the FS problem, so it has gained significant interest from scholars,

Table 3Summary of hybrid algorithms, purpose of hybridization, datasets utilized, and their application domains

P9	El-Kenawy, Mirjalili et al. (2022)	SCA + WOA	To leverage the strengths of WOA and SCA for solving problems with continuous and binary decision variables.	19 UCI benchmark datasets	Multiple domains
P10	Ewees, Al-qaness et al. (2021)	AOA + GA	To guarantee the solutions’ diversity is kept. To tackle the main weaknesses of conventional AOA’s by avoiding the search strategies’ local search problem and search balancing.	20benchmarks + 2 real-world problems containing gene datasets	Multiple domains
P11	Liang, Wang et al. (2019)	ACO + BSO	To enhance the performance of ACO, as a result, avoid stagnation in the local optima and premature convergence.	Six binary classification UCI datasets	Multiple domains
P12	Almazini, Ku-Mahamud et al. (2023)	GWO + ACO	To improve the initialization of the wolf population using the ACO algorithm concept.	NSL-KDD benchmark datasets	Intrusion detection
P13	Mazini, Shirazi et al. (2019)	ABC + AdaBoost	ABC is utilised for FS, while AdaBoost is employed for feature evaluation and classification.	NSL-KDD + ISCXIDS2012 datasets	Intrusion detection
P14	Thawkar (2021)	TLBO + SSA	To achieve better convergence and efficiency, The basic TLBO is modified using the SSA.	651 breast cancer screenings	Medical
P15	Fajri and Wiharto (2023)	BeeSO + Q-Learning	To enhance the efficiency of the feature search method and ultimately enhance the classification accuracy by combining the BeeSO with Q-learning.	Four heart disease UCI datasets	Medical
P16	El-Shafiey, Hagag et al. (2022)	PSO + GA	To aim for the rejected individuals from every generation to achieve the goal of rehabilitating them and optimizing the contribution of all individuals in each generation.	Cleveland and Statlog (heart disease UCI datasets)	Medical
P17	Li, Zhang et al. (2021)	PSO + GA	To reduce the chance of getting stuck in local optima, PSO was used for increasing the convergence rate of GA.	Diabetes dataset	Medical
P18	Bezdan, Zivkovic et al. (2022)	BSO + FA	To achieve a balance between exploring and exploiting. To reduce the drawbacks of the conventional BSO.	21 UCI benchmark datasets + Covid-19	Multiple domains
P19	Hans and Kaur (2020)	ALO + SCA	To take benefits from the SCA to balance exploration and exploitation. To increase the diversity of the solutions, it also allows both algorithms to explore the search space.	18 diverse real-time datasets attained from online	Multiple domains
P20	Houssein, Hosney et al. (2020)	HHO + CS	To avoid falling into local optimum and premature convergence. To balance between exploration and exploitation. To enhance the limitations of the original HHO.	Two chemical datasets + 10 UCI benchmark datasets	Multiple domains
P21	Osmani, Mohasefi et al. (2022)	ABC + ICA	To improve ABC exploitation.	16 UCI benchmark datasets + 2 Amazon datasets	Sentiment classification
P22	Akinola, Ezugwu et al. (2022)	DMO + SA	To improve the limited exploitative process of the DMO.	Three high‑dimensional medical datasets + 18 UCI datasets (low and medium dimensions)	Multiple domains
P23	Phogat and Kumar (2023)	IBCSO + WOA	To attain a good balance between exploration and exploitation and identify informative genes.	Six microarray datasets	Medical
P24	Alkanhel, El-kenawy et al. (2023)	GWO + DTO	To avoid the local optima issue stagnation and early convergence of GWO. To improve the exploration and exploitation search.	IoT-IDS dataset (RPL-NIDDS17 dataset)	Intrusion detection
P25	Alhussan, Abdelhamid et al. (2023)	DBER + DTO	To improve exploration and exploitation of the search space where BER was motivated by the behavior of swarm members in achieving their global goals.	Diabetes datasets	Medical
P26	Alwan, AbuEl-Atta et al. (2021)	FA + GA	To prevent being stuck in local optima through improving the exploration abilities of the standard firefly, a mutation operation was employed.	NSL-KDD benchmark datasets	Intrusion detection
P27	Alweshah, Aldabbas et al. (2023)	BWO + IG	To enhance the local search capabilities of the BWO algorithm.	Nine benchmark datasets from the gene expression data repository	Medical (Gene selection)
P28	Masrom, Rahman et al. (2022)	PSO + GA	To resolve the problem of immature convergence in PSO	Real dataset on the tax avoidance cases among companies in Malaysia	Taxation and financial compliance
P29	Shanthi, Akshaya et al. (2022)	SDS + TS	Diversity is given by the SDS to the candidate solutions belonging to the TS if there are no good solutions.	Lung cancer dataset	Medical
P30	Lee, Le et al. (2022)	GWO + HBO	To improve the capability of both global and local searching. To resolve the problem of falling into local traps in GWO.	4 UCI benchmark datasets	Engineering

especially in the hybrid metaheuristics field (Talbi 2009). These hybrid approaches try to share each other's strengths to enhance their performance. Figure 7 presents all population-based metaheuristic algorithms in this SLR that have been employed for solving the FS problem in classification.

The GWO is one of the most modern bio-inspired algorithms; it has gained significant interest within the field of hybrid metaheuristic algorithms for FS at a rate of 33% of publications. It is widely recognized that both exploration and exploitation are crucial for any population-based algorithm to exhibit excellent performance. In conventional GWO, the primary focus is on updating all the search agents, which are represented by wolves, based on the best search agent (α), second best search agent (β), and third best search agent (δ) throughout the entire optimization process. Essentially, premature convergence occurs as a result of the position update method because the search agents are not given sufficient opportunity to efficiently explore the search space (Arora, Singh et al. 2019). Moreover, the value of the exploration and exploitation parameter does not depend on the feedback from the search process but changes linearly. This greatly restricts the search process as each wolf will have the same value for the solution it provides (Gu, Li et al. 2019).

To address the above-mentioned shortcomings of the original GWO and use it to solve the FS problem in classification tasks, several studies have conducted hybridization of GWO with another algorithm. For example, it has been hybrid with PSO in (Al-Tashi, Abdul Kadir et al. 2019, El-Kenawy and Eid 2020). El-Kenawy and Eid (2020) proposed to divide the population into two groups. The first group adheres to the GWO procedures, whereas the second group adheres to the PSO procedures to balance exploitation and exportation as well as improve algorithm performance. Al-Tashi, Abdul Kadir et al. (2019) developed the hybrid GWO with PSO, which was proposed in (Singh and Singh 2017) to make it capable of solving the FS problem. The authors enhanced the exploitation capability in PSO and the exploration capability in GWO by controlling the exploration and exploitation of the grey wolf in the search space by the inertia constant parameter of PSO.

To solve the problem of immature convergence, a more effective exploration phase of GWO in high-level hybridization is suggested by using HHO in (Al-Wajih, Abdulkadir et al. 2021) and WOA in (Mafarja, Qasem et al. 2020). The GWO algorithm modifies the position vector of wolves by considering the three best agents in the population. This can lead to being stuck at the local optimum. In contrast, the WOA algorithm uses a random factor to move certain agents inside the feature space. This factor enables the WOA algorithm to randomly evade local optima; however, it may cause premature convergence. These drawbacks motivated Mafarja, Qasem et al. (2020) to propose three hybrid models (serial GWO-WOA, random switcher GWO-WOA, and adaptive switcher GWO-WOA). Almazini, Ku-Mahamud et al. (2023) improved the initial population of the GWO by using a heuristic-based ACO in intrusion detection. The DTO, which is capable of identifying viable regions and providing the best solution, was hybridized with GWO (Alkanhel, El-kenawy et al. 2023, Sami Khafaga, M. El-kenawy et al. 2023) to improve the performance of GWO.

The hybridizations in low-level to enhance the performance of GWO were introduced by using CSA in (Arora, Singh et al. 2019), SFS in (El-Kenawy, Eid et al. 2020) and HBO in (Lee, Le et al. 2022). Arora, Singh et al. (2019) used a control parameter of CSA in the position update equation of GWO to achieve a good balance between exploration and exploitation. Another improvement is related to maintaining population diversity; not all agents in the population are modified by the alpha and beta updating directions (by alpha only). El-Kenawy, Eid et al. (2020) proposed a hybridization of GWO and SFS to increase the ability of exploration and obtain the optimal solution; the SFS diffusion process is implemented by utilizing the Gaussian distribution approach for random walks in the growth process, and they have employed the crossover/mutation operations to raise the population diversity; crossover operator improved the exploitation procedure, and the mutation operator improved the exploration procedure. Lee, Le et al. (2022) suggested a hybridized GWO with HBO to enhance the chances of evading local optima and improve the effectiveness of both global and local search. The optimal solution acquired from the GWO is kept as a record. If the new solution produced by HBO is more than 90% similar to the aforementioned record, the crossover is employed. If the new solution resulting from crossover is identical to the current record, then mutation is executed.

The GA technique family conducted up to 17% of the FS-based metaheuristic algorithms (five studies). It was hybrid with PSO in (Li, Zhang et al. 2021, El-Shafiey, Hagag et al. 2022, Masrom, Rahman et al. 2022). Due to the high convergence rate of PSO, it was employed to accelerate GA convergence (Li, Zhang et al. 2021); the proposed hybrid algorithm has the ability to minimize the likelihood of encountering local minima. Meanwhile, in [16], the PSO is employed to specifically target the individuals that have been rejected in each generation. This approach aims to address the defect of premature convergence, which happens when the PSO concludes this procedure before achieving the ideal solution. Hence, (Masrom, Rahman et al. 2022) proposed adaptive GA operators to enhance the performance of PSO in tax avoidance detection. The authors introduced three types of models using hybrid PSO with adaptive crossover, adaptive mutation, or both adaptive crossover mutation.

The hybridization of GA with AOA was proposed by Ewees, Al-qaness et al. (2021). The suggested technique addressed the primary limitations of the traditional AOA by avoiding the difficulty of local search and achieving a good balance of exploration and exploitation by incorporating the mutation operator of GA with FA (Alwan, AbuEl-Atta et al. 2021). The suggested hybrid algorithm has a heightened capacity for exploration in order to prevent becoming stuck in local optimal solutions.

A summary of the metaheuristics with their respective application domains employed in the reviewed papers in this SLR is presented in Table 4. Moreover, Fig. 8 illustrates that 14% of reviewed papers (4 out of 30) were employed in the intrusion detection domain. While 27% of the FS-based metaheuristic research (8 out of 30) in this SLR was used to serve the medical domain. 50% of papers (15) suggested different metaheuristics, either improved variants of conventional algorithms or new hybrid metaheuristics and generally evaluated their performance in different domains, including medical and others. Other topics that received far less attention from researchers were engineering, financial compliance, and Sentiment classification, each with only one publication.

Table 4

Summary of hybrid algorithms with their application domains
Application Domains	Hybrid Algorithm	Reference
Multiple Domains	GWO + PSO GWO + HHO GWO + PSO GWO + DTO GWO + CSA GWO + WOA WOA + HHO GWO + SFS SCA + WOA AOA + GA ACO + BSO BSO + FA ALO + SCA HHO + CS DMO + SA	Al-Tashi, Abdul Kadir et al. (2019) Al-Wajih, Abdulkadir et al. (2021) El-Kenawy and Eid (2020) Sami Khafaga, M. El-kenawy et al. (2023) Arora, Singh et al. (2019) Mafarja, Qasem et al. (2020) Alwajih, Abdulkadir et al. (2022) El-Kenawy, Eid et al. (2020) El-Kenawy, Mirjalili et al. (2022) Ewees, Al-qaness et al. (2021) Liang, Wang et al. (2019) Bezdan, Zivkovic et al. (2022) Hans and Kaur (2020) Houssein, Hosney et al. (2020) Akinola, Ezugwu et al. (2022)
Intrusion Detection	GWO + ACO ABC + AdaBoost GWO + DTO FA + GA	Almazini, Ku-Mahamud et al. (2023) Mazini, Shirazi et al. (2019) Alkanhel, El-kenawy et al. (2023) Alwan, AbuEl-Atta et al. (2021)
Medical	TLBO + SSA BeeSO + Q-Learning PSO + GA PSO + GA IBCSO + WOA DBER + DTO BWO + IG SDS + TS	Thawkar (2021) Fajri and Wiharto (2023) El-Shafiey, Hagag et al. (2022) Li, Zhang et al. (2021) Phogat and Kumar (2023) Alhussan, Abdelhamid et al. (2023) Alweshah, Aldabbas et al. (2023) Shanthi, Akshaya et al. (2022)
Sentiment Classification	ABC + ICA	Osmani, Mohasefi et al. (2022)
Taxation and financial compliance	PSO + GA	Masrom, Rahman et al. (2022)
Engineering	GWO + HBO	Lee, Le et al. (2022)

RQ2: What type of hybridization is used? What are the evaluation metrics? What statistical tests are used?

Hybridization is essential in metaheuristic algorithms for solving FS problems effectively by integrating the advantages of several algorithms. The significance of hybridization resides in its capacity to enhance the overall performance and efficacy of metaheuristic algorithms. As shown in Table 5, different levels of hybridization were used in reported studies. Algorithms run simultaneously, and they can operate independently and periodically exchange information, or they might cooperate to solve a common problem; these types are parallel. Whereas the algorithms run one after another, they are sequential. The hybridization without changing the internal algorithm is known as the high-level type, while in low-level hybridization, the functionality of the two algorithms is merged. We can conclude the benefits of hybridization in these studies as follows:

• It assists in enhancing the effectiveness of the original algorithm (e.g., improving weak exploration or exploitation capabilities and improving population diversity).

• It assists in resolving the problems of premature convergence and the local optimal trap.

• It assists to strike a balance between exploration and exploitation processes.

An essential step in the classification process is evaluating a predictive model. This occurs subsequent to the construction and training of the model using a set of data. The researcher's primary focus is to assess the model's performance, usefulness, and generalizability. They also consider whether additional features are necessary and if further training is required to enhance the model's overall performance. The evaluation of FS-based metaheuristics performance involved the use of various standard measures, including mean, best, worst fitness, classification accuracy/ error, sensitivity, specificity, and standard deviation. Nevertheless, the criteria that were most frequently employed in reported studies included classification accuracy/error, mean fitness and the number of selected features, as shown in Fig. 9.

Table 5

Summary of hybridization type, statistical test, and evaluation metrics in selected studies
No	Reference	Hybridization Type				Statistical Test	Evaluation Metrics
No	Reference	Sequential	parallel	Low-level	High-level	Statistical Test	Evaluation Metrics
P1	Al-Tashi, Abdul Kadir et al. (2019)		√	√		-	Average accuracy, Average selected Feature size Mean fitness Best fitness Worst fitness Average computational time
P2	Al-Wajih, Abdulkadir et al. (2021)	√			√	Wilcoxon test	Average accuracy Average selected feature size Mean fitness Best fitness Worst fitness Average computational time
P3	El-Kenawy and Eid (2020)		√		√	-	Average classification error Best fitness Worst Fitness Average fitness size Mean Standard deviation
P4	Sami Khafaga, M. El-kenawy et al. (2023)		√		√	-	Best fitness Worst fitness Average error Average select size mean fitness size Standard deviation
P5	Arora, Singh et al. (2019)		√	√		Wilcoxon and Friedman test	Classification accuracy Statistical mean Standard deviation Average features length
P6	Mafarja, Qasem et al. (2020)		√		√	Wilcoxon test and F-test	Average classification accuracy Average selected size mean fitness Average running time
P7	Alwajih, Abdulkadir et al. (2022)√			√		Statistical one-way ANOVA test	Classification accuracy mean fitness Average selected features Computational time
P8	El-Kenawy, Eid et al. (2020)		√		√	Wilcoxon test	Average error mean fitness Mean Best Fitness worst Fitness Standard Deviation
P9	El-Kenawy, Mirjalili et al. (2022)	√		√		Wilcoxon one-way ANOVA	Average error mean fitness Mean Best Fitness worst Fitness Standard Deviation
P10	Ewees, Al-qaness et al. (2021)		√	√		Non-parametric Friedman test	Average accuracy Average of selected features Mean fitness Standard deviation
P11	Liang, Wang et al. (2019)	-	-	√		-	Accuracy Percent rate Recall rate F-measures Average time costs
P12	Almazini, Ku-Mahamud et al. (2023)	√			√	Non-parametric Friedman test	Average classification accuracy Average number of selected features,
P13	Mazini, Shirazi et al. (2019)	√			√	-	Classification error and detection rate Time and space complexity Sensitivity
P14	Thawkar (2021)		√		√	-	Sensitivity Specificity Classification accuracy F-score Kappa coefficient False Positive Rate False Negative Rate
P15	Fajri and Wiharto (2023)		√		√	-	Accuracy Precision Recall Selected feature Execution time,
P16	El-Shafiey, Hagag et al. (2022)	√			√	-	Accuracy Recall Specificity Sensitivity ROC curve
P17	Li, Zhang et al. (2021)	-	-		√	-	Accuracy Sensitivity Specificity
P18	Bezdan, Zivkovic et al. (2022)		√		√	Wilcoxon and Friedman test	Average Fitness Average Accuracy
P19	Hans and Kaur (2020)	√				Wilcoxon test	Average Accuracy mean fitness Worst fitness Best fitness Standard Deviation Average number of features selected F-Measure
P20	Houssein, Hosney et al. (2020)		√	√		-	Accuracy Sensitivity Specificity Recall Precision F-measure Worst, Best, Mean Standard Deviation
P21	Osmani, Mohasefi et al. (2022)		√	√		T-test Wilcoxon Friedman test	Accuracy Precision Recall F-measure
P22	Akinola, Ezugwu et al. (2022)	√		√		Wilcoxon Friedman mean ranking test	Accuracy Average feature size Respective algorithms’ convergence characteristics
P23	Phogat and Kumar (2023)	√			√	-	Accuracy Specificity Sensitivity F-measure MCC Standard deviation Optimal number of genes
P24	Alkanhel, El-kenawy et al. (2023)		√		√	Wilcoxon test ANOVA test	mean fitness size Average error Standard deviation Worst, Best, and Mean fitness
P25	Alhussan, Abdelhamid et al. (2023)	-	-	-	-	Wilcoxon test ANOVA test	Average Error Mean, Best, Worst Fitness Average fitness size Standard deviation
P26	Alwan, AbuEl-Atta et al. (2021)		√	√		-	Accuracy Number of selected features
P27	Alweshah, Aldabbas et al. (2023)	√			√	-	Convergence speed Classification accuracy Average number of genes selected mean fitness
P28	Masrom, Rahman et al. (2022)		√	√		-	Accuracy Number of selected features
P29	Shanthi, Akshaya et al. (2022)		√			-	Accuracy Recall Precision Best Fitness
P30	Lee, Le et al. (2022)		√	√		-	Mean fitness value Average number of selected features Average operating times

Statistical tests play an essential role in assessing the quality of the model. The common statistical tests used in FS with metaheuristics comprise Wilcoxon, Friedman, ANOVA and T-test. As displayed in Table 5, 14 papers were rated (47%) as having used statistical tests to evaluate their models. The most popular test is the Wilcoxon test, which was used in 37% of publications (11 out 30), followed by the Friedman test, which was employed in 6 papers. Additionally, the ANOVA was used in 4 papers.

RQ3: What are the FS techniques applied with metahumans to achieve good classification accuracy and minimum number of features? What classifiers are used? What initial population methods are used?

The filter method assesses individual features in the dataset according to their information theoretical or statistical characteristics without using any classification algorithms (Nguyen, Xue et al. 2020). It is less expensive computationally and has a faster execution time than wrapper methods due to working independently of any classifier. However, it has the drawback of ignoring the performance of the selected features (Chaudhuri and Sahu 2021). Only two studies (Almazini, Ku-Mahamud et al. 2023, Phogat and Kumar 2023) in this review use a filter approach to solve the FS problem.

Wrapper techniques necessitate a specified learning algorithm and use its performance as an assessment criterion.

This dependence criterion necessitates the predefined learning approach in FS and relies on the effectiveness of this approach when used to determine which features are chosen (Liu and Yu 2005, Zhang, Xiong et al. 2019). The majority of articles (17 out of 18 studies reported the FS method) have employed wrapper technique-based metaheuristics to decrease the number of features in classification tasks, as shown in Table 6.

Classification techniques are used to develop models that are capable of automatically learning patterns and relationships in data and predicting which class unseen instances belong to. To categorize data effectively, these techniques make use of the capabilities of statistical analysis and pattern recognition. Several classification approaches are available, each with strengths and drawbacks that are appropriate for various datasets and problem domains. Among the most popular classification approaches are KNN, SVM, ANN, NB and RF.

The KNN method stands out as one of the simplest and most extensively applied methods that integrate with metaheuristic algorithms for improving FS in classification tasks. Due to its efficacy and stability, it was the most commonly used classification method, appearing in 18 papers, as shown in Table 6. Only one study used KNN with other classifiers when building the fitness function, while 17 papers reported that

Table 6

Summary of feature selection approach, classifier, and population initialization method in selected studies
No		FS Approach	Classifier	Population Initialization Method
P1, P2, P3, P6, P7, P8, P10, P17, P18, P22, P27	Al-Tashi, Abdul Kadir et al. (2019), Al-Wajih, Abdulkadir et al. (2021), El-Kenawy and Eid (2020), Mafarja, Qasem et al. (2020), Alwajih, Abdulkadir et al. (2022), El-Kenawy, Eid et al. (2020), Ewees, Al-qaness et al. (2021), Li, Zhang et al. (2021), Bezdan, Zivkovic et al. (2022), Akinola, Ezugwu et al. (2022), Alweshah, Aldabbas et al. (2023)	Wrapper	KNN	Random
P4, P5, P11, P19, P24, P25	Sami Khafaga, M. El-kenawy et al. (2023), Arora, Singh et al. (2019), Liang, Wang et al. (2019), Hans and Kaur (2020), Alkanhel, El-kenawy et al. (2023), Alhussan, Abdelhamid et al. (2023)	-	KNN	Random
P9	El-Kenawy, Mirjalili et al. (2022)	-	-	Random
P12	Almazini, Ku-Mahamud et al. (2023)	Filter	SVM	Heuristic
P13	Mazini, Shirazi et al. (2019)	Wrapper	-	Random
P15	Fajri and Wiharto (2023)	Wrapper	SVM, RF, LGBM and XGBoost	Random
P16	El-Shafiey, Hagag et al. (2022)	Wrapper	RF	Random
P20	Houssein, Hosney et al. (2020)	-	SVM	Chaotic maps
P21	Osmani, Mohasefi et al. (2022)	Wrapper	SVM	Random
P23	Phogat and Kumar (2023)	Wrapper, Filter	ANN	Random
P26	Alwan, AbuEl-Atta et al. (2021)	-	NB	Random
P28	Masrom, Rahman et al. (2022)	-	KNN, SVM, RF	Random
P29	Shanthi, Akshaya et al. (2022)	-	ANN, DT, NB	Random
P30	Lee, Le et al. (2022)	Wrapper	SVM, LDA	Random

kNN classification accuracy is only used to construct fitness functions. In the majority of these studies (9 papers), a common practice is to assign a value of 5 for the parameter "k" (5 neighbors) as a suitable value to achieve high accuracy. Additionally, the SVM attracted attention from 6 studies due to SVM’s advantages, including its efficacy in space with high dimensions, adaptability, and memory efficiency, regarding other classifiers used in a few studies, as displayed in Fig. 10.

Naturally, P-metaheuristics are more exploration search algorithms because of the wide diversity of initial populations. This stage is essential to the algorithm's efficiency and efficacy. Insufficient diversity in the initial population can lead to premature convergence (Talbi 2009). Usually, the initial population is generated at random, as shown in Table 6, around 93% (28 of 30 papers) initialized the population randomly. While Almazini, Ku-Mahamud et al. (2023) initialized the wolves' population in GWO by using a heuristic-based ACO aiming to generate solutions by choosing features that optimize the classification accuracy. Houssein, Hosney et al. (2020) introduced the utilization of chaotic maps for initializing solutions for updating the control energy parameters in HHO in order to prevent the occurrence of local optima and premature convergence.

RQ4: What are the crucial/optimal parameters values? Which transfer and fitness functions are used?

Controlling the parameters in metaheuristic algorithms is among the most crucial areas of research. The common parameters of all population-based metaheuristic algorithms are the size of the population and the number of iterations. For more efficient and high-quality computations, it is essential to fine-tune these parameters (Črepinšek, Liu et al. 2012). Furthermore, running the algorithm multiple times enables it to explore the search space widely, helping to reduce the impact of randomness and enhance the probability of discovering optimum or nearly optimal solutions.

Table 7

Summary of transfer, fitness functions and values of crucial parameters in selected studies
No	Reference	Transfer Function	Fitness Function	Parameters Sitting
No	Reference	Transfer Function	Fitness Function	Population size	Iteration	Run Times
P1, P2, P7	Al-Tashi, Abdul Kadir et al. (2019), Al-Wajih, Abdulkadir et al. (2021), Alwajih, Abdulkadir et al. (2022)	S-Shaped (Sigmoid)	\(F=\alpha E\left(D\right)+\beta \frac{\left\|S\right\|}{\left\|T\right\|}\) α = [0,1] and β = (1 − α) E(D): Classification error rate \|S\|: features selected subset \|T\|: whole features in the dataset	10	100	20
P3	El-Kenawy and Eid (2020)	S-Shaped (Sigmoid)	\(F=\alpha E\left(D\right)+\beta \frac{\left\|S\right\|}{\left\|T\right\|}\)	10	80	20
P4	Sami Khafaga, M. El-kenawy et al. (2023)			20	50	-
P5	Arora, Singh et al. (2019)			7	100	30
P6	Mafarja, Qasem et al. (2020)			20 (10, 20, 30, 40, 50)	100 (50, 75, 100, 150, 200)	30
P8	El-Kenawy, Eid et al. (2020)			10	80	20
P9	El-Kenawy, Mirjalili et al. (2022)	S-Shaped (Sigmoid)	-	10	100	20
P10	Ewees, Al-qaness et al. (2021)	-	\(F=\alpha E\left(D\right)+\beta \frac{\left\|S\right\|}{\left\|T\right\|}\)	25	100	13
P11	Liang, Wang et al. (2019)	-	-	150	300	100
P12	Almazini, Ku-Mahamud et al. (2023)	-	𝐹𝑖𝑡𝑛𝑒𝑠𝑠 = 𝐴𝐶. 𝑎 + (1/𝑁𝐹). 𝑏 where NF is the features subset, and AC is the accuracy and the values of parameters a, and b are in [0,1]	-	-	-
P13	Mazini, Shirazi et al. (2019)	-	\({fit}_{i}=\left\{\begin{array}{c}\frac{1}{(1+f\left({x}_{i} \right))}, f\left({x}_{i}\right)\ge 0\\ 1+\left\|f\left({x}_{i}\right)\right\|, f\left({x}_{i}\right)<0\end{array}\right.\)	-	150, 200, 250, 500	-
P14	Thawkar (2021)	-	\(f\left({x}_{i}\right)=\left({E}_{xi}\text{}\right(1+0.5\text{}\frac{S}{N}\) ))2 f(X_i) represents the cost of X_i. E_Xi is the performance value or classification accuracy of the ith feature set. S be the number of selected features, and N be the number of features in the original database.	25	25, 30, 40, 50, 100	-
P16	El-Shafiey, Hagag et al. (2022)	-	-	50	30	5
P18	Bezdan, Zivkovic et al. (2022)	S-Shaped (Sigmoid)	\(F=\alpha E\left(D\right)+\beta \frac{\left\|S\right\|}{\left\|T\right\|}\)	8	70	20
P19	Hans and Kaur (2020)	S-Shaped (sigmoid) V-Shaped (tanh)	\(\text{F}=\alpha \frac{\left\|S\right\|}{\left\|T\right\|}+\beta E\left(D\right)\)	20	100	10
P20	Houssein, Hosney et al. (2020)	-	\(F=\alpha +\beta \frac{\left\|R\right\|}{\left\|C\right\|}-G\) \(F>T\) R: classification error, C: total number features, Β: subset length, α: classification performance defined in the range [0, 1]. T: is a necessary condition, G: group column for the specific classifier. Each step in the algorithm is compared with T, where the obtained fitness value must be greater than in order to maximize the solution.	30, 41	100, 1000	30
P22	Akinola, Ezugwu et al. (2022)	Binarization Function	\(F=\mu .\left(1-{A}_{c}\right)+\left(1-\mu \right).\frac{{b}_{s}}{{D}_{t}}\) (1-Ac): classification error, bs: feature subset dimension, Dt: total number of attributes, µ ∈ [0,1]	10, 20, 30,40, 50 (10best)	50	10
P23	Phogat and Kumar (2023)	S-shaped threshold function	\(\text{F}=\alpha \frac{\left\|S\right\|}{\left\|T\right\|}+(1-\alpha )\varGamma\) Γ: classification accuracy	100, 200	20	-
P24	Alkanhel, El-kenawy et al. (2023)	S-shaped (sigmoid)	\(F=\alpha {E}_{R}\left(D\right)+\beta \frac{\left\|R\right\|}{\left\|C\right\|}\)	-	-	-
P25	Alhussan, Abdelhamid et al. (2023)		\(F=\alpha {E}_{R}\left(D\right)+\beta \frac{\left\|R\right\|}{\left\|C\right\|}\)	-	500	10
P26	Alwan, AbuEl-Atta et al. (2021)		min f(x) = (100 − Accuracy)	10, 20, 30, 40	500	15
P27	Alweshah, Aldabbas et al. (2023)	-	\(F=\alpha {E}_{R}\left(D\right)+\beta \frac{\left\|R\right\|}{\left\|C\right\|}\)	10	100	-
P28	Masrom, Rahman et al. (2022)	-	-	10, 20,30	100–1000 (600 best)	-
P30	Lee, Le et al. (2022)	S-Shaped (sigmoid)	\(F=\frac{{N}_{T}}{{N}_{T}+{N}_{F}}\text{*}100\text{\%}\) NT: number of instances that is true predicted, NF: number of instances that is false predicted.	10	100	30

Table 7 provides a concise overview of the essential parameters that are necessary for all studies on metaheuristics-based FS within the scope of this SLR. These parameters comprise the population size, the maximum number of iterations, and the number of executions (run times). Nevertheless, it has been noted that the metaheuristic-based FS algorithms described in (Li, Zhang et al. 2021, Shanthi, Akshaya et al. 2022, Alkanhel, El-kenawy et al. 2023, Almazini, Ku-Mahamud et al. 2023) lack the essential statistical analysis to illustrate the significance and superiority of these parameters, which is a crucial component of empirical research. In the majority of studies, 43%, the iteration parameter is assigned a value of 100, and 33% of studies used 10 agents as a population size.

Most metaheuristic algorithms are designed for solving continuous problems. However, FS is a binary problem, where each possible solution is represented by a d-dimensional vector consisting of either 0 or 1 values. In this representation, 0's indicate rejected features and 1's represent selected features. Hence, embracing a binary representation is a crucial step in the FS domain. Hence, many researchers frequently use S-shaped and V-shaped transfer functions for this task.

As shown in Table 7, 15 out of 16 studies that reported the used transfer function employed S-shaped transfer functions, which is a family of sigmoid approaches, making it the dominant method. It is worth mentioning the presence of V-shaped functions, which are used only in (Hans and Kaur 2020). Akinola, Ezugwu et al. (2022) used a binarization function for solution representation in the range [0, 1], where the selected feature is represented by 1 if the position index is equal to 0.5 and above, whereas a position index less than 0.5 denotes a rejected feature.

The fitness function determines the quality of the solution based on the features that have been chosen. The formulation of an efficient fitness function is absolutely necessary for the success of the process (Talbi 2009). There are different fitness functions that are frequently utilized in metaheuristic algorithms for the purpose of FS. The common fitness function used by recent studies reviewed in this SLR in 73% focuses on classification error and the number of selected features.

RQ5: What are the challenges in the current studies and their future directions?

Although the selected studies prove effective in solving the FS problem, they still suffer from shortcomings that indicate the need for more investigation in future studies.

• Selected studies are single objective that employed only one fitness function. In contrast, FS serve two purposes: maximizing the accuracy of classification and decreasing the number of attributes. These two objectives often conflict with each other, necessitating an optimization algorithm to determine the ideal trade-offs between them. Regrettably, there is a limited amount of research on multi-objective FS. Therefore, the use of hybridization of metaheuristics in multi-objective FS is an open research topic for scholars.

• Most selected studies use the wrapper approach as traditional FS. Hence, the hybridize wrapper and filter is an open research topic.

• Some selected studies hybridize the original metaheuristic algorithms without tuning the crucial parameters. Therefore, the appropriate strategies for tuning crucial parameters in hybrid metaheuristic algorithms are of interest for further study.

• Further exploration of various application fields, such as finance, cybersecurity, and engineering, should be expanded. These domains exhibit high-dimensional data in various media, encompassing image, text, and audio.

• Since computational Complexity is a significant concern in hybrid metaheuristics for FS, some reported studies have demonstrated improvement but have issues related to computational burden and time complexity. It is advisable to provide more suitable methods to reduce it.

Metaheuristic algorithms are a burgeoning subject that can effectively address complex optimization challenges. The growing interest in metaheuristics for FS is due to their high performance compared to other algorithms when handling high-dimensional data. This study conducted a systematic literature review analyzing and synthesizing 30 papers from a total of 320 papers obtained from well-known search databases aimed to provide a comprehensive understanding of how these metaheuristics are applied in classification. This SLR outlines the conducted statistical analysis to ensure that relevant studies are carefully chosen. The contributions of this study comprise the recognition of the hybrid metaheuristics that are employed to solve FS problem in classification and their domains of application. The most common algorithm used at 33% is GWO, followed by PSO and GA, each of them used at 17% of publications. It is noteworthy that 18 out of 30 studies, 43% of the total, employed a KNN classifier to construct fitness functions. The second popular classifier is SVM, which is used in 6 papers (20%). Regarding the transfer function, the method most frequently employed in the chosen publications is the S-shaped transfer function. The random initialize population is common in most reported studies.

The efficacy of hybridization in increasing the effectiveness of metaheuristics is generally recognized in the field of optimization, as observed in the analyzed studies. Hybrid metaheuristics frequently outperform the performance of their individual constituents, providing a balance between exploration and exploitation, avoiding immature convergence, which leads to trap in local optima, and improving the performance of algorithms to achieve an optimal or near-optimal solution. According to the analysis in this SLR, studies in additional application domains, like finance, cybersecurity, and engineering, should be expanded. These domains present high-dimensional data in diverse formats, ranging from text and images to audio. Moreover, future research should focus on the application of a hybrid metaheuristic to address the multi-objective FS issues.

Author Contribution

The first and second authors prepared the draft, made substantial contributions to the conception or design of the work, and revised it critically for important intellectual content. Authors approved the final version for submission.

Data availability

Data and materials included and referenced in the article.

Abd Elaziz, M., L. Abualigah, R. A. Ibrahim and I. Attiya (2021). "IoT Workflow Scheduling Using Intelligent Arithmetic Optimization Algorithm in Fog Computing." Computational Intelligence and Neuroscience 2021(Cc).
Abdulwahab, H. A., A. Noraziah, A. A. Alsewari and S. Q. Salih (2019). "An enhanced version of black hole algorithm via levy flight for optimization and data clustering problems." Ieee Access 7: 142085-142096.
Abiodun, E. O., A. Alabdulatif, O. I. Abiodun, M. Alawida, A. Alabdulatif and R. S. Alkhawaldeh (2021). "A systematic review of emerging feature selection optimization methods for optimal text classification: the present state and prospective opportunities." Neural Computing and Applications 33(22): 15091-15118.
Abu Khurmaa, R., I. Aljarah and A. Sharieh (2021). "An intelligent feature selection approach based on moth flame optimization for medical diagnosis." Neural Computing and Applications 33: 7165-7204.
Agrawal, P., H. F. Abutarboush, T. Ganesh and A. W. Mohamed (2021). "Metaheuristic algorithms on feature selection: A survey of one decade of research (2009-2019)." Ieee Access 9: 26766-26791.
Akay, B., D. Karaboga and R. Akay (2022). "A comprehensive survey on optimizing deep learning models by metaheuristics." Artificial Intelligence Review 55(2): 829-894.
Akinola, O. A., A. E. Ezugwu, O. N. Oyelade and J. O. Agushaka (2022). "A hybrid binary dwarf mongoose optimization algorithm with simulated annealing for feature selection on high dimensional multi-class datasets." Scientific Reports 12(1): 14945.
Akinola, O. O., A. E. Ezugwu, J. O. Agushaka, R. A. Zitar and L. Abualigah (2022). "Multiclass feature selection with metaheuristic optimization algorithms: a review." Neural Computing and Applications 34(22): 19751-19790.
Al-Tashi, Q., S. J. Abdul Kadir, H. M. Rais, S. Mirjalili and H. Alhussian (2019). "Binary Optimization Using Hybrid Grey Wolf Optimization for Feature Selection." IEEE Access 7: 39496-39508.
Al-Wajih, R., S. J. Abdulkadir, N. Aziz, Q. Al-Tashi and N. Talpur (2021). "Hybrid binary grey Wolf with Harris hawks optimizer for feature selection." IEEE Access 9: 31662-31677.
Alhussan, A. A., A. A. Abdelhamid, S. Towfek, A. Ibrahim, M. M. Eid, D. S. Khafaga and M. S. Saraya (2023). "Classification of Diabetes Using Feature Selection and Hybrid Al-Biruni Earth Radius and Dipper Throated Optimization." Diagnostics 13(12): 2038.
Alkanhel, R., E.-S. M. El-kenawy, A. A. Abdelhamid, A. Ibrahim, M. A. Alohali, M. Abotaleb and D. S. Khafaga (2023). "Network Intrusion Detection Based on Feature Selection and Hybrid Metaheuristic Optimization." Computers, Materials & Continua 74(2).
Almazini, H. and K. R. Ku-Mahamud (2021). "Grey Wolf Optimization Parameter Control for Feature Selection in Anomaly Detection." International Journal of Intelligent Engineering and Systems 14(2): 474-483.
Almazini, H. F., K. R. Ku-Mahamud and H. Almazini (2023). "Heuristic Initialization Using Grey Wolf Optimizer Algorithm for Feature Selection in Intrusion Detection." International Journal of Intelligent Engineering and Systems 16(1): 410-418.
Alobaedy, M. M. and K. R. Ku-Mahamud (2015). "Hybrid ant colony system and genetic algorithm approach for scheduling of jobs in computational grid." Research Journal of Applied Sciences, Engineering and Technology 11(7): 806-816.
Alwajih, R., S. J. Abdulkadir, H. Al Hussian, N. Aziz, Q. Al-Tashi, S. Mirjalili and A. Alqushaibi (2022). "Hybrid binary whale with harris hawks for feature selection." Neural Computing and Applications 34(21): 19377-19395.
Alwan, K. M., A. H. AbuEl-Atta and H. H. Zayed (2021). "Feature Selection Models Based on Hybrid Firefly Algorithm with Mutation Operator for Network Intrusion Detection." International Journal of Intelligent Engineering & Systems 14(1).
Alweshah, M., Y. Aldabbas, B. Abu-Salih, S. Oqeil, H. S. Hasan, S. Alkhalaileh and S. Kassaymeh (2023). "Hybrid black widow optimization with iterated greedy algorithm for gene selection problems." Heliyon 9(9).
Alyasiri, O. M., Y.-N. Cheah, A. K. Abasi and O. M. Al-Janabi (2022). "Wrapper and hybrid feature selection methods using metaheuristic algorithms for English text classification: A systematic review." IEEE Access 10: 39833-39852.
Alzubi, Q. M., M. Anbar, Y. Sanjalawe, M. A. Al-Betar and R. Abdullah (2022). "Intrusion detection system based on hybridizing a modified binary grey wolf optimization and particle swarm optimization." Expert Systems with Applications 204(May): 117597-117597.
Arora, S., H. Singh, M. Sharma, S. Sharma and P. Anand (2019). "A new hybrid algorithm based on grey wolf optimization and crow search algorithm for unconstrained function optimization and feature selection." Ieee Access 7: 26343-26361.
Baxter, J. (1981). "Local optima avoidance in depot location." Journal of the Operational Research Society 32(9): 815-819.
Bell, D. A. and H. Wang (2000). "Formalism for relevance and its application in feature subset selection." Machine Learning 41(2): 175-195.
Beyer, H.-G. and H.-P. Schwefel (2002). "Evolution strategies–a comprehensive introduction." Natural computing 1: 3-52.
Bezdan, T., M. Zivkovic, N. Bacanin, A. Chhabra and M. Suresh (2022). "Feature selection by hybrid brain storm optimization algorithm for covid-19 classification." Journal of Computational Biology 29(6): 515-529.
Bhavan, A., P. Chauhan and R. R. Shah (2019). "Bagged support vector machines for emotion recognition from speech." Knowledge-Based Systems 184: 104886.
Cervantes, J., F. Garcia-Lamont, L. Rodríguez-Mazahua and A. Lopez (2020). "A comprehensive survey on support vector machine classification: Applications, challenges and trends." Neurocomputing 408(xxxx): 189-215.
Chandrashekar, G. and F. Sahin (2014). "A survey on feature selection methods." Computers and Electrical Engineering 40(1): 16-28.
Chaudhuri, A. and T. P. Sahu (2021). "A hybrid feature selection method based on Binary Jaya algorithm for micro-array data classification." Computers & Electrical Engineering 90: 106963.
Črepinšek, M., S.-H. Liu and L. Mernik (2012). "A note on teaching–learning-based optimization algorithm." Information Sciences 212: 79-93.
Cui, Z., F. Li and W. Zhang (2019). "Bat algorithm with principal component analysis." International Journal of Machine Learning and Cybernetics 10: 603-622.
D'Angelo, G. and F. Palmieri (2021). "GGA: A modified genetic algorithm with gradient-based local search for solving constrained optimization problems." Information Sciences 547: 136-162.
Das, A., P. Das, S. Panda and S. Sabut (2019). "Detection of liver cancer using modified fuzzy clustering and decision tree classifier in CT images." Pattern Recognition and Image Analysis 29: 201-211.
Dhiman, G. and A. Kaur (2018). "Optimizing the design of airfoil and optical buffer problems using spotted hyena optimizer." Designs 2(3): 1-16.
Dokeroglu, T., A. Deniz and H. E. Kiziloz (2022). "A comprehensive survey on recent metaheuristics for feature selection." Neurocomputing 494: 269-296.
Ekinci, S., D. Izci, E. Eker and L. Abualigah (2023). "An effective control design approach based on novel enhanced aquila optimizer for automatic voltage regulator." Artificial Intelligence Review 56(2): 1731-1762.
El-Kenawy, E.-S. M., M. M. Eid, M. Saber and A. Ibrahim (2020). "MbGWO-SFS: Modified binary grey wolf optimizer based on stochastic fractal search for feature selection." IEEE Access 8: 107635-107649.
El-Kenawy, E.-S. M., S. Mirjalili, F. Alassery, Y.-D. Zhang, M. M. Eid, S. Y. El-Mashad, B. A. Aloyaydi, A. Ibrahim and A. A. Abdelhamid (2022). "Novel meta-heuristic algorithm for feature selection, unconstrained functions and engineering problems." IEEE Access 10: 40536-40555.
El-Kenawy, E. S. and M. Eid (2020). "Hybrid gray wolf and particle swarm optimization for feature selection." International Journal of Innovative Computing, Information and Control 16(3): 831-844.
El-Shafiey, M. G., A. Hagag, E. S. A. El-Dahshan and M. A. Ismail (2022). "A hybrid GA and PSO optimized approach for heart-disease prediction based on random forest." Multimedia Tools and Applications 81(13): 18155-18179.
Ewees, A. A., M. A. Al-qaness, L. Abualigah, D. Oliva, Z. Y. Algamal, A. M. Anter, R. Ali Ibrahim, R. M. Ghoniem and M. Abd Elaziz (2021). "Boosting arithmetic optimization algorithm with genetic algorithm operators for feature selection: case study on cox proportional hazards model." Mathematics 9(18): 2321.
Fajri, Y. A. Z. A. and W. Wiharto (2023). "Hybrid Model Feature Selection with the Bee Swarm Optimization Method and Q- Learning on the Diagnosis of Coronary Heart Disease."
Fan, Q., H. Huang, Y. Li, Z. Han, Y. Hu and D. Huang (2021). "Beetle antenna strategy based grey wolf optimization." Expert Systems with Applications 165(July 2020): 113882-113882.
Fister, I., X. S. Yang and J. Brest (2013). "A comprehensive review of firefly algorithms." Swarm and Evolutionary Computation 13: 34-46.
Glover, F. (1989). "Tabu search-part I & II." ORSA Journal on computing 1(3): 190-206.
Gu, Q., X. Li and S. Jiang (2019). "Hybrid genetic grey wolf algorithm for large-scale global optimization." Complexity 2019.
Hans, R. and H. Kaur (2020). "Hybrid binary Sine Cosine Algorithm and Ant Lion Optimization (SCALO) approaches for feature selection problem." International Journal of Computational Materials Science and Engineering 9(01): 1950021.
Heidari, A. A., S. Mirjalili, H. Faris, I. Aljarah, M. Mafarja and H. Chen (2019). "Harris hawks optimization: Algorithm and applications." Future Generation Computer Systems 97: 849-872.
Houssein, E. H., M. E. Hosney, M. Elhoseny, D. Oliva, W. M. Mohamed and M. Hassaballah (2020). "Hybrid Harris hawks optimization with cuckoo search for drug design and discovery in chemoinformatics." Scientific Reports 10(1): 14439.
Ileberi, E., Y. Sun and Z. Wang (2022). "A machine learning based credit card fraud detection using the GA algorithm for feature selection." Journal of Big Data 9(1).
John, H. H. (1992). "Genetic Algorithm." 267(1): 1-23.
Khafaga, D. (2022). "Meta-heuristics for feature selection and classification in diagnostic breast cancer." Computers, Materials and Continua 73(1): 749-765.
Kirkpatrick, S., C. D. Gelatt and M. P. Vecchi (1983). "Optimization by simulated annealing." Science 220(4598): 671-680.
Kitchenham, B., R. Pretorius, D. Budgen, O. P. Brereton, M. Turner, M. Niazi and S. Linkman (2010). "Systematic literature reviews in software engineering–a tertiary study." Information and software technology 52(8): 792-805.
Kowsari, K., K. J. Meimandi, M. Heidarysafa, S. Mendu, L. Barnes and D. Brown (2019). "Text classification algorithms: A survey." Information (Switzerland) 10(4): 1-68.
Lee, C.-Y., T.-A. Le and Y.-T. Lin (2022). "A Feature Selection Approach Hybrid Grey Wolf and Heap-Based Optimizer Applied in Bearing Fault Diagnosis." IEEE Access 10: 56691-56705.
Li, H., P. G. H. Nichols, S. Han, K. J. Foster, K. Sivasithamparam and M. J. Barbetti (2009). "Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces." Australasian Plant Pathology 38(3): 284- 287.
Li, X., J. Zhang and F. Safara (2021). "Improving the accuracy of diabetes diagnosis applications through a hybrid feature selection algorithm." Neural Processing Letters: 1-17.
Liang, H., Z. Wang and Y. Liu (2019). "A new hybrid ant colony optimization based on brain storm optimization for feature selection." IEICE Transactions on Information and Systems E102D(7): 1396-1399.
Liu, H. and L. Yu (2005). "Toward integrating feature selection algorithms for classification and clustering." IEEE Transactions on knowledge and data engineering 17(4): 491-502.
Mafarja, M., A. Qasem, A. A. Heidari, I. Aljarah, H. Faris and S. Mirjalili (2020). "Efficient hybrid nature-inspired binary optimizers for feature selection." Cognitive Computation 12: 150-175.
Magdy, M., A. El Marhomy and M. A. Attia (2019). "Modeling of inverted pendulum system with gravitational search algorithm optimized controller." Ain Shams Engineering Journal 10(1): 129-149.
Masrom, S., R. A. Rahman, M. Mohamad, A. S. Abd Rahman and N. Baharun (2022). "Machine learning of tax avoidance detection based on hybrid metaheuristics algorithms." IAES International Journal of Artificial Intelligence 11(3): 1153.
Mazini, M., B. Shirazi and I. Mahdavi (2019). "Anomaly network-based intrusion detection system using a reliable hybrid artificial bee colony and AdaBoost algorithms." Journal of King Saud University-Computer and Information Sciences 31(4): 541-553.
Meyer, H., C. Reudenbach, T. Hengl, M. Katurji and T. Nauss (2018). "Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation." Environmental Modelling and Software 101: 1-9.
Mirjalili, S., S. M. Mirjalili and A. Lewis (2014). "Grey Wolf Optimizer." Advances in Engineering Software 69: 46-61.
Mittal, N., U. Singh and B. S. Sohi (2016). "Modified grey wolf optimizer for global engineering optimization." Applied Computational Intelligence and Soft Computing 2016.
Morales-Castañeda, B., D. Zaldivar, E. Cuevas, F. Fausto and A. Rodríguez (2020). "A better balance in metaheuristic algorithms: Does it exist?" Swarm and Evolutionary Computation 54: 100671.
Nguyen, B. H., B. Xue and M. Zhang (2020). "A survey on swarm intelligence approaches to feature selection in data mining." Swarm and Evolutionary Computation 54: 100663.
Osmani, A., J. B. Mohasefi and F. S. Gharehchopogh (2022). "Sentiment classification using two effective optimization methods derived from the artificial bee colony optimization and imperialist competitive algorithm." The Computer Journal 65(1): 18-66.
Palhazi Cuervo, D., P. Goos, K. Sörensen and E. Arráiz (2014). "An iterated local search algorithm for the vehicle routing problem with backhauls." European Journal of Operational Research 237(2): 454-464.
Pant, M., H. Zaheer, L. Garcia-Hernandez and A. Abraham (2020). "Differential Evolution: A review of more than two decades of research." Engineering Applications of Artificial Intelligence 90: 103479.
Pham, T. H. and B. Raahemi (2023). "Bio-inspired feature selection algorithms with their applications: a systematic literature review." IEEE Access.
Phogat, M. and D. Kumar (2023). "A Hybrid Metaheuristics based technique for Mutation Based Disease Classification." International journal of electrical and computer engineering systems 14(6): 635-646.
Piri, J., P. Mohapatra, R. Dey, B. Acharya, V. C. Gerogiannis and A. Kanavos (2023). "Literature Review on Hybrid Evolutionary Approaches for Feature Selection." Algorithms 16(3): 167.
Pirovano, A., H. Heuberger, S. Berlemont, S. Ladjal and I. Bloch (2021). "Automatic Feature Selection for Improved Interpretability on Whole Slide Imaging." Machine Learning and Knowledge Extraction 3(1): 243-262.
Prajapati, V. K., M. Jain and L. Chouhan (2020). "Tabu Search Algorithm (TSA): A Comprehensive Survey." Proceedings of 3rd International Conference on Emerging Technologies in Computer Engineering: Machine Learning and Internet of Things, ICETCE 2020(February): 222-229.
Rashedi, E., H. Nezamabadi-Pour and S. Saryazdi (2009). "GSA: a gravitational search algorithm." Information sciences 179(13): 2232-2248.
Rostami, M., K. Berahmand, E. Nasiri and S. Forouzandeh (2021). "Review of swarm intelligence-based feature selection methods." Engineering Applications of Artificial Intelligence 100: 104210.
Sami Khafaga, D., E.-S. M. El-kenawy, F. Khalid Karim, M. Abotaleb, A. Ibrahim, A. A. Abdelhamid and D. L. Elsheweikh (2023). "Hybrid Dipper Throated and Grey Wolf Optimization for Feature Selection Applied to Life Benchmark Datasets." Computers, Materials & Continua 74(2): 4531-4545.
Sardari, F. and M. E. Moghaddam (2017). "A hybrid occlusion free object tracking method using particle filter and modified galaxy based search meta-heuristic algorithm." Applied Soft Computing 50: 280-299.
Shah-Hosseini, H. (2011). Otsu's criterion-based multilevel thresholding by a nature-inspired metaheuristic called galaxy- based search algorithm. 2011 third world congress on nature and biologically inspired computing, IEEE.
Shami, T. M., A. A. El-Saleh, M. Alswaitti, Q. Al-Tashi, M. A. Summakieh and S. Mirjalili (2022). "Particle Swarm Optimization: A Comprehensive Survey." IEEE Access 10: 10031-10061.
Shanthi, S., V. Akshaya, J. Smitha and M. Bommy (2022). "Hybrid TABU search with SDS based feature selection for lung cancer prediction." International Journal of Intelligent Networks 3: 143-149.
Sharma, M. and P. Kaur (2021). "A comprehensive analysis of nature-inspired meta-heuristic techniques for feature selection problem." Archives of Computational Methods in Engineering 28: 1103-1127.
She, J. (2021). "Combining PPO and Evolutionary Strategies for Better Policy Search." accessed: Nov. 6th.Shehab, M., I. Mashal, Z. Momani, M. K. Y. Shambour, A. Al-Badareen, S. Al-Dabet, N. Bataina, A. R. Alsoud and L. Abualigah (2022). Harris Hawks Optimization Algorithm: Variants and Applications. 29: 5579-5603.
Simoncini, D. and K. Y. J. Zhang (2018). Population-based sampling and fragment-based de novo protein structure prediction, Elsevier Ltd.
Singh, N. and S. B. Singh (2017). "Hybrid Algorithm of Particle Swarm Optimization and Grey Wolf Optimizer for Improving Convergence Performance." Journal of Applied Mathematics 2017.
Soui, M., N. Mansouri, R. Alhamad, M. Kessentini and K. Ghedira (2021). "NSGA-II as feature selection technique and AdaBoost classifier for COVID-19 prediction using patient’s symptoms." Nonlinear Dynamics 106(2): 1453-1475.
Talbi, E.-G. (2009). Metaheuristics: From Design To Implementation.Tang, J., G. Liu and Q. Pan (2021). "A review on representative swarm intelligence algorithms for solving optimization problems: Applications and trends." IEEE/CAA Journal of Automatica Sinica 8(10): 1627-1643.
Thawkar, S. (2021). "A hybrid model using teaching–learning-based optimization and Salp swarm algorithm for feature selection and classification in digital mammography." Journal of Ambient Intelligence and Humanized Computing 12: 8793-8808.
Yadav, A. and D. K. Vishwakarma (2020). "A comparative study on bio-inspired algorithms for sentiment analysis." Cluster Computing 23: 2969-2989.
Yang, X.-S. and A. Slowik (2020). Firefly algorithm. Swarm intelligence algorithms, CRC Press: 163-174.Yu, V. F., A. A. N. P. Redi, Y. A. Hidayat and O. J. Wibowo (2017). "A simulated annealing heuristic for the hybrid vehicle routing problem." Applied Soft Computing 53: 119-132.
Zhang, J., Y. Xiong and S. Min (2019). "A new hybrid filter/wrapper algorithm for feature selection in classification." Analytica chimica acta 1080: 43-54.

No competing interests reported.

APPENDIX.docx

Download PDF

Version 1

posted

You are reading this latest preprint version

Hybrid Metaheuristic Algorithms for Feature Selection in Classification: A Systematic Literature Review

Status:

Version 1

Abstract

Figures

I. Introduction

II. Basic concepts

A. Metaheuristic optimization algorithms

1) Single solution-based metaheuristic algorithms

2)Population-based metaheuristic algorithms

B. Common framework of metaheuristics

C. Exploration and exploitation

E. Local optima stagnation

F. Hybridization

G. Feature selection

III. Review methodology

IV. Data synthesis and results

V. Conclusion

Declarations

Author Contribution

Data availability

References

Additional Declarations

Supplementary Files

Status:

Version 1