Antivirus software must be able to detect harmful files on Android devices, yet the situation is far from ideal. Malware detection software often relies on the classic method of comparing suspect executable files against a database of known signatures, but malware authors are skilled enough to evade it, and the method also demands considerable time and resources. Machine learning models have been used to address this problem, with different researchers employing different features for malware identification and categorization. A researcher with domain expertise is more likely to select characteristics that precisely capture hazardous behaviour and yield superior results; otherwise, the outcome can be unsatisfactory.
Deep learning was proposed as an answer to these issues. In several earlier investigations, the APK was reverse engineered to extract properties [43] [44] for model construction. That technique produced more accurate results, but it is difficult and time-consuming to apply. Our research addresses this gap: we wanted to use a simpler model with less training data. Before training deep learning models, we preprocess the input executable files (APKs) of benign and malicious software into images.
Converting malware binaries into images and applying machine learning to those images enables Android malware to be detected successfully. In existing research, the target apps' entire executable files (e.g., DEX files in Android application packages) are converted into images and used for machine learning. However, the complete DEX file, which consists of a header section, an identifier section, a data section, an optional link data region, and so on, may carry noisy information that makes malware harder to detect. In this study, we convert only the data portions of DEX files into grayscale images and apply CNN-based machine learning to them.
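To illustrate the conversion step, the following is a minimal sketch that maps a raw byte sequence to a grayscale image, one byte per pixel. It assumes the relevant byte range (e.g., the DEX data section) has already been sliced out; the row width of 256 is an illustrative choice, not necessarily the value used in our experiments.

```python
import numpy as np
from PIL import Image

def bytes_to_grayscale(data: bytes, width: int = 256) -> Image.Image:
    """Map a raw byte sequence to a 2-D grayscale image.

    Each byte (0-255) becomes one pixel intensity; the byte stream is
    wrapped into rows of `width` pixels and zero-padded to fill the
    last row.
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    height = int(np.ceil(len(arr) / width))
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(arr)] = arr
    return Image.fromarray(padded.reshape(height, width))

# Hypothetical usage: "data_section.bin" stands for an already-extracted
# DEX data section.
# with open("data_section.bin", "rb") as f:
#     bytes_to_grayscale(f.read()).save("sample.png")
```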
a) Performance of the customized CNN model
Initially, we built a basic Convolutional Neural Network from scratch following the approach of [45], trained it on the training image dataset, and tested it. The training progress curves for accuracy and loss are shown in Fig. 7; they indicate that the model is well fitted, with no sign of overfitting. Table 2 displays the resulting CNN's performance. The confusion matrix in Fig. 8 shows that all benign samples in the test set were predicted correctly (recall: 100%). For the malignant samples, however, only 155 out of 176 images were predicted correctly (recall: 88.1%); the model misclassified the remaining malignant images as benign (11.9%).
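The exact architecture of [45] is not reproduced here, but a minimal Keras sketch of this kind of from-scratch CNN for binary (benign vs. malignant) image classification might look as follows; the layer sizes and the 256x256 grayscale input resolution are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = (256, 256)  # assumed input resolution

# A small stack of conv/pool blocks followed by a dense classifier.
model = models.Sequential([
    layers.Input(shape=(*IMG_SIZE, 1)),      # single-channel grayscale input
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),   # benign vs. malignant
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```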
b) Performance of the transfer learning
We used a VGG-16 model pretrained on ImageNet to extract features and passed the result to a new classifier. Setting weights="imagenet" retrieves the VGG-16 weights learned on the ImageNet dataset, and include_top=False is required to avoid downloading the pretrained model's fully connected layers. An early stopping callback monitors the model's validation loss and decides when to halt training based on its parameterization: with the patience set to 5, training stops if the validation loss fails to improve by at least the minimum delta for five consecutive epochs. A learning rate scheduler is responsible for lowering the learning rate as the epochs progress and the validation loss plateaus. The total number of parameters across the network, comprising the VGG-16 weights, the trainable weights in the top layers, and the fully connected layers, is 14,976,834. Because the VGG-16 layers were frozen, only 262,146 parameters remain trainable, while 14,714,688 are non-trainable. The loss and accuracy plots (Fig. 9) indicate that the model is well optimized, and its performance is shown in Table 3. Compared with the confusion matrix of the previous CNN model, the recall of both classes improved: the benign class reached 95.8% true predictions, while the malignant class reached 99.4%.
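A Keras sketch of this transfer learning setup is given below. The callback thresholds and the input resolution are assumptions; that said, a 512x512x3 input with a single dense softmax head on the frozen convolutional base would reproduce the reported parameter counts (16x16x512 = 131,072 flattened features, times 2 classes plus 2 biases = 262,146 trainable; 14,714,688 frozen). Grayscale inputs would need to be replicated across 3 channels, since the ImageNet weights expect RGB.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Convolutional base pretrained on ImageNet; include_top=False drops the
# original fully connected classifier head.
base = VGG16(weights="imagenet", include_top=False,
             input_shape=(512, 512, 3))  # assumed input resolution
base.trainable = False  # freeze the 14,714,688 pretrained parameters

model = models.Sequential([
    base,
    layers.Flatten(),                       # 16x16x512 -> 131,072 features
    layers.Dense(2, activation="softmax"),  # new trainable classifier head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Halt training if val_loss fails to improve by min_delta for 5 epochs.
    EarlyStopping(monitor="val_loss", patience=5, min_delta=1e-4,
                  restore_best_weights=True),
    # Lower the learning rate when the validation loss plateaus.
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=3),
]
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=callbacks)
```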
Table 3: Performance of the developed models.

Model            | Accuracy (%) | F1 score (%) | Precision (%) | Recall (%)
Customised CNN   | 93.43        | 93.42        | 93.63         | 94.03
Pretrained VGG16 | 97.81        | 97.78        | 97.98         | 97.63
The results indicate that the transfer-learned VGG-16 model outperforms the customized CNN model on the malware detection problem. As the images grow larger, a more complex model, such as one initialized with ImageNet weights, may be necessary to learn the informative features [46]. We developed the custom CNN to understand how a less complicated model performs on the malware detection task. However, because the images were created directly from the APKs, their size was too large, and the convolutional layers we designed could not handle the feature extraction as efficiently as the VGG-16 model. If the training images were generated only from the .dex content inside the APK file [47], they would be smaller and contain only the required information, so a smaller model might be able to classify them properly. However, extracting the relevant information from the .dex file requires more time and computation, since the APKs must be reverse engineered, as the sketch below illustrates.
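Because an APK is an ordinary ZIP archive, pulling out classes.dex itself is straightforward; the costly part is the subsequent parsing of the DEX structure into its sections. A minimal sketch follows; note that multi-dex apps also ship classes2.dex, classes3.dex, and so on, which this sketch ignores.

```python
import zipfile

def extract_dex(apk_path: str, out_path: str = "classes.dex") -> bytes:
    """Pull the primary DEX file out of an APK (APKs are ZIP archives).

    Returns the raw DEX bytes, which would then need to be parsed into
    sections (header, identifiers, data, ...) before image conversion.
    """
    with zipfile.ZipFile(apk_path) as apk:
        data = apk.read("classes.dex")
    with open(out_path, "wb") as f:
        f.write(data)
    return data
```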
c) Comparison of performance
As shown in Table 4, a number of works have trained CNNs on images to recognise malware by converting APKs or APK components into images. [25] created a variety of CNNs, including VGG, GoogleNet, AlexNet, and Inception-v3, using the bytecode of classes.dex extracted from the Android ZIP archive and converted to RGB images. That study proposes a detection method that can identify both known and unknown Android malware. The authors also intended to integrate the method into the backend of their main product to provide convenient usage scenarios for consumers and enterprises, so reducing computational cost mattered more to them than accuracy.
[27] used grayscale images taken directly from the executable of a mobile app sample. The feature set is fed into a three-layer DNN to determine whether the sample under examination is malware and, if so, to which family and variant it belongs. Their neural network was far more basic than our model, which would account for its lower accuracy.
[26] converted the DEX file into RGB images and plain text based on the section characteristics of the DEX file, then extracted image and text properties to classify Android malware. A variety of properties were examined in the images, and the results were used to train the classifier. With manual feature extraction and conventional machine learning it achieved 96% accuracy, but at the cost of considerable effort.
Similar to our work, [48] converted raw malware binaries into colour images that an optimised CNN architecture used to detect and classify malware families; that model's performance is comparable to ours.
The findings show that our model can match the performance of a variety of cutting-edge models that decide whether a file is benign or malignant using extracted data, such as API calls [49] recovered and turned into sequences [50]. To reduce false positives and false negatives in the prediction, it is fundamental to learn which characteristics contribute to malware's malignant nature, or which sequences are present across malware samples; combining attention networks with a standard CNN can achieve this [51].
Table 4: Comparison of performance to previous works.

Works                              | Accuracy
Hsien-De Huang and Kao (2018) [25] | 90%
Mercaldo and Santone (2020) [27]   | 91.8%
Fang et al. (2020) [26]            | 96%
Vasan et al. (2020) [48]           | 97.35%
Our model                          | 97.81%
Antivirus software must be able to detect harmful files on Android devices, and machine learning models have been applied to this problem. We create images by preprocessing both malicious and benign input files (APKs). Although earlier reverse-engineering-based procedures were more accurate, they were difficult and time-consuming to put into practice. To better understand how a simpler model performs on the malware detection task, we created a custom CNN; but because the images were made directly from the APKs, they were too large, and the convolutional layers we developed could not handle the feature extraction as effectively as the VGG-16 model. With the transfer-learned VGG-16, the benign class achieved 95.8% correct predictions, while the malignant class achieved 99.4%.