While technological advances have made our lives easier, they have also deepened our dependence on technology, and cybercriminals develop various types of malware to exploit this dependence. The study of malware, for example malware classification, is therefore essential for security researchers and incident response teams to take the necessary actions to prevent or reduce the intended damage. This paper investigates the importance of features in a given malware dataset. To this end, we combine three recently proposed feature selection methods: Leave One Feature Out (LOFO) Importance, Feature Relevance-based Unsupervised Feature Selection (FRUFS), and A General Framework for Auto-Weighted Feature Selection via Global Redundancy Minimization (AGRM), to find the most effective subset of features for the objective at hand. This preprocessing step matters because working with fewer features reduces the dimensionality of the problem. We feed the feature subsets produced by these three feature selection methods into three different models, the Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Histogram-Based Gradient Boosting (HGB) classifiers, within a stacking ensemble. From the nine resulting sets of prediction probabilities, we eliminate those carrying similar information by applying a threshold to their correlation matrix. By feeding the remaining prediction probabilities to a Support Vector Machine (SVM) meta-classifier, we aim to demonstrate how effective the proposed scheme can be. Our proposed model achieves, on average, 1.2% higher classification accuracy than the tested feature selection methods on one of the best-known and most widely used malware datasets, the Microsoft Malware Prediction dataset.