Evaluating Supervised Machine Learning Models for Zero-Day Phishing Attack Detection: A Comprehensive Study

doi:10.21203/rs.3.rs-3204260/v1

Case Report

This work is licensed under a CC BY 4.0 License

Version 1

Abstract

Detecting and preventing cyber-attacks is essential for keeping e-commerce websites secure. Among the diverse types of cyber-attacks, zero-day attacks are especially hard to identify because they are unknown to the security system: none of the existing defined patterns matches the new (malicious) case. Many machine learning models, particularly supervised ones, have been developed to analyze and detect phishing websites. However, the main issue with zero-day attacks is that they have not been seen before, so the model has not been trained on their patterns. Thus, supervised models designed for detecting phishing URLs must be very accurate in predicting the label of unseen data. This research addresses this issue by evaluating seven supervised machine learning models to assess their accuracy in predicting zero-day phishing attacks. Unlike previous studies that examined models only on features extracted from URLs, our evaluation framework uses a comprehensive data set that includes URL features, third-party (external) features, and content-based features. This research also examines the performance of the models under dimension reduction techniques. By reducing the dimensionality of the data set, we aim to improve computational efficiency without compromising the accuracy of the models. The results show that XGBoost performs best on the zero-day attack data set, with an accuracy and F1-score of 96.6%, and that PCA can be applied to high-dimensional data sets without adverse effects on the models’ performance.

Keywords: Zero-day attacks, Supervised Models, Machine Learning, Phishing URLs, Dimension Reduction

1. Introduction

Phishing attacks are a growing threat to individuals and businesses, as they can lead to financial losses and privacy violations. The rise of electronic services, such as e-banking and e-commerce, has made Uniform Resource Locator (URL) phishing attacks more prevalent, with cybercriminals using fake webpages to trick users into revealing sensitive information. The recent pandemic has also created opportunities for attackers to target remote workers and healthcare institutions (Belfedhal & Belfedhal, 2022). Despite the efforts of organizations such as the Anti-Phishing Working Group, web phishing attacks continue to evolve and exploit vulnerabilities in security mechanisms like HTTPS and SSL (Naresh Kumar D & Panimalar, 2020).

Existing solutions for detecting phishing attacks rely on reactive block listing and heuristic methods, which are limited by small trusted lists and by their difficulty in detecting phishing pages with new visual appearances (Abdelnabi et al., 2020). These methods are not applicable to zero-day attacks in particular, which have not been seen before and are not listed in phishing website lists. Zero-day attacks are typically used only once, to steal information from specific victims, and are then discarded to avoid being detected by security measures. In other words, the criminals generate a new URL for each attack to reduce the risk of being caught and to maximize the number of victims they can target (Bu & Cho, 2021). These attacks are particularly dangerous as they leave no time for a case report to be filed. The key challenge of zero-day phishing attacks is that the phishing URLs are created and then discarded immediately after the attacker obtains the required information. In some cases, a zero-day attack is even reported to remain undetected for a considerable period (several months) before it is recognized by the security team or software.

To address these limitations, machine learning (ML) approaches have been increasingly applied to enhance the accuracy of phishing detection and prevention, some of which are described in the literature review section. However, it is important to consider three crucial factors when implementing these techniques: (1) selecting appropriate and effective ML models, (2) utilizing distinctive and informative features, and (3) gathering a comprehensive set of representative samples for training the model (Hannousse & Yahiouche, 2021).

Considering these factors, we developed a framework to examine the performance of multiple supervised models in detecting zero-day attacks. These models are evaluated using a comprehensive benchmark data set with 11430 URLs, including both phishing and legitimate URLs. The data set includes a variety of features extracted from URLs, web content, and third-party service providers (external features). This research has three main objectives: i) examining the performance of multiple light-weight binary classification methods on the given benchmark data set and their performance in detecting zero-day attacks; ii) examining the impact of dimension reduction on the given data set and on identifying unseen attacks; and iii) examining the impact of removing external features from the data set. The reason for the last objective is that the benchmark data set provides valuable features that result in high performance for the ML techniques. However, extracting data for external features from third-party services is not always possible and takes time, so these features are challenging to use in real-time phishing detection. For this reason, we examined how removing these features from the data set affects the performance of the models.

This paper consists of the following sections: in section 2 the reviewed literature is discussed to show what relevant works have been performed in the field of applying ML methods in URL phishing detection. Section 3 explains the framework designed for this project with detail of its development and implementation. Section 4 shows the results of the implementation and evaluates them according to the reviewed literature. Section 5 provides a discussion of the work and findings of the research, and finally, section 6 gives the conclusion of the research.

2. Related Work

Despite the existence of various anti-phishing techniques, users still fall victim to these attacks. In the recent decade, ML-based models have been applied widely in URL phishing detection. Some of these models were designed to be applicable to zero-day attacks, while others were not. Table 1 summarizes the recent articles (published between 2018 and 2023) studied for this research and shows which ML methods and data sets are used in the proposed frameworks.

Table 1 Summaries of reviewed papers that applied ML models in detecting phishing URLs

Reference	ML methods	Type of data	0-day	Source of the data set
Hannousse & Yahiouche, 2021³	NLP feature extraction + RF*, DT, LR, NB, and SVM	Text (URLs)	No	Created a data set with 11430 samples and 87 features extracted from Alexa and Yandex (for legitimate) and PhishTank and OpenPhish⁴ (for malicious samples).
Abdelnabi et al., 2020 - VisualPhishNet	CNN and similarity metrics	Visual data (screenshot of pages)	No	Made the largest visual phishing detection data set with 10250 samples
Wei et al., 2020	CNN	Text (URLs)	Yes	10604 phishing URLs from PhishTank⁵/10604 legitimate URLs from Common Crawl Foundation⁶
Ariyadasa et al., 2022 - PhishDet	Graph CNN + Long-Term Recurrent Convolutional Network	Text (URLs and HTML codes)	Yes	Two different data sets (with 40000 and 50000 records) were built by collecting legitimate URLs from the Google search engine and phishing samples from PhishTank. A benchmark data set is used for testing (with 46096 samples)
Bu & Cho, 2021	Deep Convolutional AutoEncoder	Text (URLs)	Yes	three real-world URL data sets (ISCX-URL-2016 data set, PhishTank, and PhishStorm) consisting of 222,541 legitimate and phishing URLs
Belfedhal & Belfedhal, 2022	MLP + TF-IDF	Text (URLs)	No	A benchmark data set of 73575 phishing and legitimate URLs
Naresh Kumar D & Panimalar, 2020	RF, KNN, DT, SVM, LR	Text and structured data (URL, source code, session, type of security, protocol, and website type)	No	Not described in the article
Sahingoz et al., 2019	NLP feature extraction + DT, Adaboost, K-star, KNN (n = 3), RF, SMO, NB	Text (URLs)	Yes	Ebbu2017 Phishing Data set: a data set with 73,575 URLs (36,400 legitimate URLs and 37,175 phishing URLs)
Ghalati et al., 2020	Semantic feature extraction + RF, LR, NB	Text (Protocol, Domain, Path, URL, IP)	Yes	Ebbu2017 Phishing Data set 74k URLs, 36k are legitimate and 37k are phishing. DMOZ data set and the Alexa.com data set for providing benign data sources
Chatterjee & Namin, 2019	Deep Reinforcement	Text (URLs)	Yes	Ebbu2017 Phishing Data set

*RF: Random Forest, DT: Decision Tree, LR: Logistic Regression, SVM: Support Vector Machine, NB: Naïve Bayes, KNN: K-Nearest Neighbors

[3] This data set is applied in our research

[4] website: https://openphish.com/

[5] https://phishtank.org/

[6] http://commoncrawl.org/

As can be seen, all the studies utilized supervised classification models. Six out of ten papers used Neural Networks (NN) and Deep Learning in their models, and the remaining four applied non-NN classification models. In addition, except for one study (Abdelnabi et al., 2020) that used visual data in its model, the papers utilized URLs in their work, and some of them applied NLP-based feature extraction methods to build a structured data set (Ghalati et al., 2020, Sahingoz et al., 2019, Belfedhal & Belfedhal, 2022). The table also shows that six out of ten works were examined on detecting zero-day attacks. The following paragraphs give more details on how each of these studies designed and applied ML models in URL phishing detection.

(Abdelnabi et al., 2020) introduced VisualPhishNet, a similarity-based phishing detection framework that applies a Convolutional Neural Network (CNN). Using a similarity metric, this framework detects phishing websites, particularly pages with new appearances. The authors also built the largest visual phishing detection data set by crawling active phishing pages on the PhishTank website and collecting screenshots of the pages, which resulted in 10250 samples. The evaluation criteria they considered in detecting phishing samples were elements’ sizes, colors, locations, and website languages. A Convolutional Neural Network is also applied in other research to recognize phishing webpages by analyzing only URLs (Wei et al., 2020). The authors explained that, unlike other studies that separate URLs into different parts and analyze each part to identify phishing attacks, their model only needs to encode URLs as one-hot character-level vectors. The vector is then fed to the CNN, and the model can detect phishing samples with an accuracy of almost 100% and is effective for detecting zero-day attacks as well.
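To make the idea of character-level one-hot encoding concrete, the following is a minimal sketch, not the encoding used by Wei et al. (2020). The alphabet, the `max_len` padding length, and the function name `one_hot_url` are assumptions chosen for illustration.

```python
import string

# Assumed alphabet: lowercase letters, digits, and common URL punctuation.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_url(url, max_len=200):
    """Return a max_len x len(ALPHABET) matrix of 0/1 values for one URL."""
    matrix = [[0] * len(ALPHABET) for _ in range(max_len)]
    # Truncate long URLs; short URLs are padded with all-zero rows.
    for pos, ch in enumerate(url.lower()[:max_len]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:  # characters outside the alphabet stay all-zero
            matrix[pos][idx] = 1
    return matrix


encoded = one_hot_url("http://example.com/login", max_len=32)
```

Each row of `encoded` marks a single character position, so a CNN can learn patterns over neighboring characters without any manual splitting of the URL into protocol, domain, and path.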

PhishDet is another neural network-based framework that is applied to recognize zero-day phishing attacks. Built on Graph Convolutional Neural Networks and Long-Term Recurrent Convolutional Networks, this model can detect malicious websites by analyzing their URLs and HTML code with an accuracy of 96.42%. To maintain its high performance, PhishDet requires frequent retraining over time (Ariyadasa et al., 2022). Utilizing a Deep Convolutional AutoEncoder (CAE), (Bu & Cho, 2021) provided character-level URL feature modeling to detect zero-day phishing attacks. They applied three real-world phishing data sets with 222,541 URLs to examine their model and used the receiver operating characteristic (ROC) as the metric for comparing their results with those of other models in the literature.

(Belfedhal & Belfedhal, 2022) present a lightweight system for real-time detection of phishing webpages using a neural network method and features extracted from URLs. The system uses a multilayer perceptron (MLP) model and two categories of features: lexical features extracted from the URL strings, and token frequencies computed with the TF-IDF algorithm. Experimental results showed that the system achieved a false negative rate of only 1.35%.
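As an illustration of TF-IDF weighting over URL tokens, here is a minimal sketch that assumes a simple tokenizer (splitting on non-alphanumeric characters) and the plain `log(N / df)` form of inverse document frequency. The cited system may tokenize and weight differently; the example URLs are made up.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenizer)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def tfidf(urls):
    """Return one {token: weight} dict per URL using tf * log(N / df)."""
    docs = [Counter(tokenize(url)) for url in urls]
    n_docs = len(docs)
    # Document frequency: number of URLs in which each token appears.
    df = Counter(tok for doc in docs for tok in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append(
            {tok: (cnt / total) * math.log(n_docs / df[tok]) for tok, cnt in doc.items()}
        )
    return vectors


urls = [
    "http://example.com/login",
    "http://example.com/about",
    "http://paypal-secure.example.net/login",
]
vectors = tfidf(urls)
```

Tokens that occur in every URL, such as `http`, receive a weight of zero, while rarer tokens such as `about` or `paypal` receive higher weights and therefore carry more signal for the classifier.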

Among the reviewed articles, one considered the non-static nature of phishing websites and used reinforcement learning to overcome this issue. (Chatterjee & Namin, 2019) presents a new approach for detecting malicious URLs through a Deep Reinforcement learning-based model that adapts to the evolving nature of phishing websites. The model is developed using a deep neural network and is capable of learning features associated with phishing website detection. The performance of the model is evaluated using precision, recall, accuracy, and F-measure, and the results are compared with existing phishing URL classifiers.

Regarding non-NN models, four articles examined classification techniques. (Naresh Kumar D & Panimalar, 2020) propose an ML-based classification algorithm that uses heuristic features such as URL, source code, session, type of security, protocol, and website type to detect phishing websites. The algorithm is evaluated using five machine learning models, and the random forest algorithm is found to be the most effective, achieving an attack detection accuracy of 91.4%.

Also, (Sahingoz et al., 2019) propose a real-time anti-phishing system that uses seven different classification algorithms and natural language processing (NLP)-based features. The system is distinguished from other studies in literature by its language independence, use of a large data set, real-time execution, detection of new websites, and use of feature-rich classifiers. The experimental results showed that the Random Forest algorithm with only NLP-based features achieved the best performance for detecting phishing URLs.

Another study used semantic features from domains and URLs to detect malicious URLs (Ghalati et al., 2020). The authors introduced an adaptive method that can change dynamically based on new feedback received on zero-day attacks, and they found that Random Forest has the highest accuracy, over 96%, with additional interpretability and performance benefits.

One of the studies that applied non-NN models had the main objective of creating a balanced benchmark data set that includes an equal number of benign and malicious URLs (Hannousse & Yahiouche, 2021). The collected URLs are processed, and 87 features are extracted from URLs, web page contents, and external resources obtained by querying third-party service providers (such as Alexa and WHOIS). The authors then applied five different classification methods (RF, DT, LR, NB, and SVM) to measure their performance on the created data set. However, they did not consider the issue of zero-day attacks in their work. This data set is utilized in our research to examine the performance of multiple classifiers (RF, DT, SVM, LR, MLP, XGBoost, and AdaBoost) on the whole data set, on the data set after dimension reduction, and on the zero-day test set. We also examined whether tuning the hyperparameters of the given models leads to a higher performance score. In addition, we investigated the impact of the external features on the models’ performance. Our proposed model is described in detail in the next section.

3. The Evaluation Framework

In this section, the framework designed to detect zero-day phishing URLs is described with details about the data set and machine learning models applied in this work. This framework is designed to examine the performance of multiple supervised algorithms both on the whole data set and on the reduced dimension data set, to see the effects on the models’ performance.

The proposed framework consists of two major components. The first component is related to dimension reduction which is about decreasing the number of features in the data set. The second component is for examining multiple classification models on the data set in different conditions (with and without dimension reduction). We reduced the dimensions of the data set in two different ways. First, by a dimension reduction method, which automatically chooses fewer effective features, and second, by dropping the external features to see the impact of these features on the models’ performance. The reason why we chose to drop external features is that these features should be obtained from third-party services, and it is possible that in some cases the third-party services are not available or do not provide the required data. So, this research aimed to examine how important and effective these external features are in zero-day phishing prediction. Figure 1 illustrates the diagram of our framework model.

3.1. Dimension Reduction

For dimension reduction, two different methods are applied: algorithm-based reduction using Principal Component Analysis (PCA) and manual dropping of external features. PCA is a method of analyzing complex data sets with multiple variables to extract crucial information while minimizing the loss of relevant data, by reducing the dimensionality and retaining the maximum possible variation (Sanguansat, 2012). Applying PCA to the data set and then examining the output of the classification models shows how relevant and important the removed information is for the prediction. If the results show a decrease in performance, this indicates that the dropped data contain information the models need. To find the best number of components to use in PCA, Grid Search was applied to a range of values, and it showed that the 78 features of the main data set should be reduced to 65 (the best number of components was 65).
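The projection that PCA performs can be sketched with a singular value decomposition. This is a minimal illustration on synthetic data, assuming NumPy is available; it is not the implementation used in this study, and the helper name `pca_fit_transform` and the synthetic matrix are hypothetical.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project X onto its top principal components via SVD.

    Returns the projected data and the explained-variance ratio
    of each retained component.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA requires mean-centered features
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    variances = singular_values**2 / (X.shape[0] - 1)
    ratio = variances / variances.sum()
    projected = X_centered @ Vt[:n_components].T
    return projected, ratio[:n_components]


# Synthetic example: 5 observed features driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

Z, ratio = pca_fit_transform(X, n_components=2)
```

In this synthetic case two components retain nearly all of the variance. On a real feature matrix the number of components is a tuning choice, which is why the text above selects it by searching over a range of values.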

In addition, we evaluate the models after dropping the external features from the data set to see whether these features play a key role in phishing detection. According to (Hannousse & Yahiouche, 2021), these features are external, meaning that their values are obtained from third-party services.

These features provide valuable information on when the domain is registered and by whom, the website traffic and the number of its users, its DNS record, whether it is indexed by Google, and its rank in the Google search engine. In the Evaluation section, we will discuss the results of each of these conditions we applied in the framework.
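Dropping the external features amounts to removing a fixed set of columns from each record before training. The sketch below shows this on a single record represented as a dictionary. The column names are assumptions inferred from the description above and should be checked against the actual data set header; the sample values are made up.

```python
# Assumed names of the seven external-feature columns (verify against the CSV header).
EXTERNAL_FEATURES = {
    "whois_registered_domain",
    "domain_registration_length",
    "domain_age",
    "web_traffic",
    "dns_record",
    "google_index",
    "page_rank",
}


def drop_columns(record, columns):
    """Return a copy of the record without the given columns."""
    return {name: value for name, value in record.items() if name not in columns}


# Made-up record mixing URL-based and external features.
record = {
    "length_url": 37,
    "nb_dots": 3,
    "domain_age": 4563,
    "web_traffic": 0,
    "page_rank": 2,
    "status": "phishing",
}
reduced = drop_columns(record, EXTERNAL_FEATURES)
```

Because the remaining URL and content features can be computed locally, a model trained on `reduced` records does not depend on the availability or latency of third-party services.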

3.2. Classification Models

Before explaining the ML models used, it is worth explaining why we used supervised (classification) models rather than unsupervised (clustering) methods. The reason is that the data set is labeled, and each record holds the features of either a legitimate (normal) or a phishing URL. Supervised models are therefore a reasonable choice, and because there are exactly two labels, binary classification models are applicable. In this framework, seven different classification models are examined: Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), XGBoost (XGB), AdaBoost, and Multi-layer Perceptron (MLP). Brief definitions of these models follow.

A Decision Tree (DT) is a model that predicts outcomes by recursively dividing the feature space into subspaces, which serve as the basis for the prediction (Rokach, 2016). Random Forest (RF), introduced by Breiman (2001), is an ensemble method for classification that uses multiple decision trees to partition the data and predicts classes through a majority vote. The Support Vector Machine (SVM) is a machine learning algorithm used for supervised learning in both classification and regression tasks. It was developed by Boser et al. (1992) based on statistical learning theory. Extreme gradient boosting (XGBoost) is a gradient boosting machine (GBM) algorithm that is widely used in supervised learning for both classification and regression problems. It is favored by data scientists due to its ability to perform high-speed out-of-core computation (Chen & Guestrin, 2016). The fundamental concept of the AdaBoost algorithm is to train a group of weak classifiers on a single training set and then iteratively merge their outcomes into a more robust and effective final classifier (Zhang et al., 2017). Finally, the Multi-Layer Perceptron (MLP) is a type of feedforward artificial neural network that typically includes three or more layers of nodes. It uses backpropagation as a supervised learning technique for training, and it can separate data that are not linearly separable, making it a form of deep learning (Savalia & Emamian, 2018).

3.3. Grid Search

Each of these models has several hyperparameters that affect the learning process and the training time. Hyperparameters, which are called hparams in this article, should be tuned to the best possible values to obtain optimum performance from the model. Different methods can be used for tuning the hparams, one of which is Grid Search.

Grid search is a method that generates various model configurations by exploring a range of values for each hparam of interest. The approach involves training and testing models across all combinations of values for all hparams. While the technique is simple to use, it can be expensive: the number of combinations, and therefore the computing cost, grows exponentially with the number of hparams and the number of levels of each (Belete & Huchaiah, 2022).
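The exhaustive enumeration behind grid search can be sketched in a few lines. The search space and the scoring function below are hypothetical stand-ins (the values actually searched in this study are those listed in Fig. 2); in practice the score would be a cross-validated metric such as F1.

```python
from itertools import product

# Hypothetical search space, for illustration only.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
}


def expand_grid(grid):
    """Yield one dict per combination of hparam values (Cartesian product)."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, score_fn):
    """Return (best_params, best_score) under a caller-supplied scoring function."""
    best_params, best_score = None, float("-inf")
    for params in expand_grid(grid):
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Stand-in for a cross-validated score; peaks at max_depth=5, learning_rate=0.1.
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)


configs = list(expand_grid(param_grid))
best_params, best_score = grid_search(param_grid, toy_score)
```

With three hparams of three levels each, `configs` already holds 3 × 3 × 3 = 27 candidates, and each additional hparam multiplies that count again, which is the exponential cost noted above.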

To use Grid Search, the hparams are given a set of values that can be seen in Fig. 2.

Based on the results obtained from Grid Search, the classification models are set with the resulted values, which are shown in Fig. 3.

Then, these values are given to the models to see which model performs better than the others. We split the data set with a proportion of 30% for the test set. The results are discussed in the Evaluation section; in short, XGBoost shows the best performance among the models, and we selected it as the champion model in the framework. The results of running the models on the training set are shown in Table 2 and Fig. 4.

Table 2

Performance of ML models on the training set
ML Model	Recall	Precision	F1_Score	ROC_AUC
XGB Classifier	0.968	0.965	0.967	0.994
Random Forest	0.963	0.967	0.965	0.993
AdaBoostClassifier	0.963	0.965	0.964	0.993
Decision Tree	0.939	0.938	0.939	0.938
Logistic Regression	0.784	0.786	0.785	0.864
Multilayer Perceptrons (MLP)	0.750	0.794	0.746	0.737
SVM	0.889	0.567	0.692	0.778

We considered the F1-score as the major metric in evaluating the performance of the models; other metrics such as recall, precision, and ROC-AUC are also reported in the results. The next step after finding the champion model is running it on the test data set.
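For reference, precision, recall, and F1-score are related as sketched below. The confusion-matrix counts in the example are made up; the last line recomputes F1 from the rounded precision and recall reported for SVM in Table 2 as a consistency check.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


# Made-up counts: 90 phishing URLs caught, 10 false alarms, 30 phishing URLs missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)

# Consistency check against the SVM row of Table 2 (precision 0.567, recall 0.889).
svm_f1 = round(f1_from(0.567, 0.889), 3)  # 0.692
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, which is why the SVM's high recall does not compensate for its low precision.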

3.4. Data Set

The data set used in this project is structured and includes 87 features extracted from 11430 URLs. The data set was created by (Hannousse & Yahiouche, 2021) as a benchmark for ML-based phishing detection studies. It provides valuable data gathered from URLs and webpages, including 56 features about the structure and syntax of webpage links, 24 features from web contents, and 7 features extracted from external services. The data set is balanced, with 50% legitimate and 50% phishing data. None of the 87 features has null values, and all data are properly processed and cleaned, so no record was removed and all 11430 rows are used. The column “URL” is removed from the data set because its features are extracted and stored in other columns. We set the “status” feature as the label (Y variable) and the other features as input (X).

In this research, models are designed to predict zero-day attacks. To build the zero-day attack data set, we followed the method that (Abdelnabi et al., 2020) applied in their research. They examined 955 recently created PhishTank⁷ pages that targeted their trusted list. These pages were not included in the data set used to train and evaluate the model in previous experiments, and therefore they represent future pages that the model has not seen. They then evaluated the model on this new set of pages without retraining it. Following the same idea, we divided the data set into 70% for training and 30% for testing, and treated the held-out test set as zero-day attacks.
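A 70/30 hold-out split of this kind can be sketched as follows. This is an illustrative random split, not the exact procedure used in this study; the seed, the helper name `holdout_split`, and the use of integer row identifiers in place of real records are assumptions.

```python
import random


def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle row indices and hold out a fraction as an unseen test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_rows = [rows[i] for i in indices[:n_test]]
    train_rows = [rows[i] for i in indices[n_test:]]
    return train_rows, test_rows


# Integer ids stand in for the 11430 records of the benchmark data set.
rows = list(range(11430))
train_rows, test_rows = holdout_split(rows)
```

With 11430 records this yields 8001 training rows and 3429 test rows. The test rows are never shown to the model during training or tuning, which is what allows them to play the role of unseen URLs.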

[7] https://phishtank.org/

4. Evaluation

In this section, we present the results of the experiments conducted in this research. We examined the models on the benchmark data set in three different ways: i) on the whole data set; ii) on the data set reduced with PCA; and iii) on the data set reduced by dropping external features. Table 3 shows the results of each experiment.

In the original article that created the benchmark, Random Forest was the champion model with an accuracy of 96.61% (Hannousse & Yahiouche, 2021). In our framework, XGBoost shows the highest accuracy, a comparable 96.6%. This high accuracy demonstrates that the framework can be utilized for detecting zero-day attacks, which the original article did not consider in its experiments. We also observed that applying PCA to the benchmark data set results in only a very slight decrease in the performance of the models (96% for XGBoost). When the data set is very large, dimension reduction using PCA therefore seems a reasonable approach, since it helps to speed up the process without any considerable decrease in performance. In addition, dropping the external features from the data set resulted in a significant decrease in the accuracy of the models (93% for XGBoost), which shows that these features provide valuable information for detecting phishing attacks.

5. Discussion

The results of examining the framework showed that in all three ways of evaluating the models, XGBoost demonstrates the highest performance in detecting zero-day phishing URL attacks. Running the model on the whole data set resulted in an accuracy and F1-score of 96.6%, while for the data sets reduced with PCA and with the manual method it showed accuracies of 96% and 93.8%, respectively. This means that XGBoost is a reliable model for detecting phishing URLs that have not been seen before by security systems. It is followed by AdaBoost and Random Forest.

In addition, in many previous studies on phishing detection, the proposed frameworks were tested on data sets that only include URL features. Our model, however, is tested on a holistic data set containing a variety of features extracted from URLs, webpage contents, and third-party services. Our framework showed that when the data set is too large, PCA can be used to reduce the dimension without a considerable decline in performance. As can be seen, PCA had only a slight effect on the performance of XGBoost, AdaBoost, and Random Forest.

On the other hand, dropping the features extracted from third-party services caused a considerable decrease in the models’ performance, which means that external features include important information about phishing URLs. The effect of excluding this type of feature is particularly relevant when detecting phishing URLs in real time, because third-party services may not be available all the time.

Reviewing the literature shows that neural network (NN) techniques are very popular in designing phishing detection systems, due to their high performance in detecting phishing cases. However, in our proposed framework, MLP, as a neural network model, did not result in a high accuracy and F1-score. The main drawback of NN models is that they are very slow to train, so on large data sets they require considerable time and resources, which makes them unsuitable for real-time or light-weight phishing detection systems.

6. Conclusion

This study examined the performance of seven different classification models in detecting zero-day phishing URLs and concluded that XGBoost shows the highest performance. It also assessed how the models change as the data set’s dimensions decrease and found that PCA is a useful tool for dimension reduction without decreasing the model’s accuracy.

The data set utilized for this research is balanced, with the same number of benign and malicious URLs, while in the real world the number of attacks is far smaller than the number of normal cases. Therefore, in future work, we intend to apply an imbalanced data set with a small number of phishing URLs. Under this circumstance, some methods must be employed to overcome the imbalance in the data set, so we will use generative models (like GAN and VAE) to enhance the effectiveness of our framework. We also plan to develop this framework as a web-browser plugin with lightweight and fast algorithms to provide web users with a zero-day phishing detection service.

Acknowledgment

Author contribution- All authors contributed to this article equally.

Funding- The authors did not receive support from any organization for the submitted work.

Data availability- The dataset which is used for this research is open-source and available to the public on Kaggle website[2].

Conflict of interest- The authors have no conflicts of interest to declare that are relevant to the content of this article.

[2] https://www.kaggle.com/datasets/shashwatwork/web-page-phishing-detection-dataset

Ariyadasa, S., Fernando, S., & Fernando, S. (2022). Combining Long-Term Recurrent Convolutional and Graph Convolutional Networks to Detect Phishing Sites Using URL and HTML. IEEE Access, 10, 82355–82375. https://doi.org/10.1109/ACCESS.2022.3196018
Abdelnabi, S., Krombholz, K., & Fritz, M. (2020). VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity. Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 1681–1698. https://doi.org/10.1145/3372297.3417233
Belete, D. M., & Huchaiah, M. D. (2022). Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. International Journal of Computers and Applications, 44(9), 875–886.
Belfedhal, A. E., & Belfedhal, M. A. (2022). A Lightweight Phishing Detection System Based on Machine Learning and URL Features. In International Conference on Managing Business Through Web Analytics (pp. 307–319). Springer International Publishing. https://doi.org/10.1007/978-3-031-06971-0_22
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992, July). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory (pp. 144-152).
Bu, S.-J., & Cho, S.-B. (2021). Deep Character-Level Anomaly Detection Based on a Convolutional Autoencoder for Zero-Day Phishing URL Detection. Electronics, 10(12), 1492. https://doi.org/10.3390/electronics10121492
Chatterjee, M., & Namin, A.-S. (2019). Detecting Phishing Websites through Deep Reinforcement Learning. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), 227–232. https://doi.org/10.1109/COMPSAC.2019.10211
Chen, T., & Guestrin, C. (2016, August). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining (pp. 785-794).
Ghalati, N. F., Ghalaty, N. F., & Barata, J. (2020). Towards the Detection of Malicious URL and Domain Names Using Machine Learning. In L. M. Camarinha-Matos, N. Farhadi, F. Lopes, & H. Pereira (Eds.), Technological Innovation for Life Improvement (Vol. 577, pp. 109–117). Springer International Publishing. https://doi.org/10.1007/978-3-030-45124-0_10
Hannousse, A., & Yahiouche, S. (2021). Towards Benchmark Data sets for Machine Learning Based Website Phishing Detection: An experimental study. Engineering Applications of Artificial Intelligence, 104, 104347. https://doi.org/10.1016/j.engappai.2021.104347
Marchal, S., François, J., State, R., & Engel, T. (2014). PhishStorm: Detecting Phishing With Streaming Analytics. IEEE Transactions on Network and Service Management, 11(4), 458–471. https://doi.org/10.1109/TNSM.2014.2377295
Naresh Kumar D & Panimalar Engineering Collage. (2020). Detection of Phishing Websites using an Efficient Machine Learning Framework. International Journal of Engineering Research And, V9(05), IJERTV9IS050888. https://doi.org/10.17577/IJERTV9IS050888
Rokach, L. (2016). Decision forest: Twenty years of research. Information Fusion, 27, 111–125. https://doi.org/10.1016/j.inffus.2015.06.005
Sahingoz, O. K., Buber, E., Demir, O., & Diri, B. (2019). Machine learning based phishing detection from URLs. Expert Systems with Applications, 117, 345–357. https://doi.org/10.1016/j.eswa.2018.09.029
Sanguansat, P. (2012). Principal Component Analysis: Engineering Applications. BoD–Books on Demand.
Savalia, S., & Emamian, V. (2018). Cardiac Arrhythmia Classification by Multi-Layer Perceptron and Convolution Neural Networks. Bioengineering, 5(2), 35. https://doi.org/10.3390/bioengineering5020035
Wei, W., Ke, Q., Nowak, J., Korytkowski, M., Scherer, R., & Woźniak, M. (2020). Accurate and fast URL phishing detector: A convolutional neural network approach. Computer Networks, 178, 107275. https://doi.org/10.1016/j.comnet.2020.107275
Zhang, X., Zeng, Y., Jin, X.-B., Yan, Z.-W., & Geng, G.-G. (2017). Boosting the phishing detection performance by semantic analysis. 2017 IEEE International Conference on Big Data (Big Data), 1063–1070. https://doi.org/10.1109/BigData.2017.8258030

Table 3 is available in the Supplementary Files section.

No competing interests reported.

Table 3 The results of running the models on the test set with different conditions
