RF (Zhu et al., 2022) | Traditional | UCI dataset with 30 features | - Using a multi-objective evolutionary algorithm to increase accuracy and recall - Comparison across five different public datasets | - No feature extraction process - Not examining the response time | 98.37 |
CNN + Highway Deep Pyramid (Zheng et al., 2022) | DL | Imbalanced dataset containing 420,000 instances with a 1:5 ratio (legitimate to phishing), based on character-level and word-level features | - Getting raw URLs as input | - Not using balancing methods - Not examining the response time - Max URL length up to 120 characters | 98.3 |
MLP + RNN + CNN (Yu et al., 2022) | DL | Balanced dataset containing 6,000 instances with features based on URL, HTML, text, and images | - Hybrid architecture and features | - Not examining the response time - Using a powerful processor (GPU) - Max URL length up to 16 characters and HTML text up to 256 - Small dataset | 97.75 |
RF (Wei & Sekiya, 2022) | Traditional | 58,645 legitimate and 88,647 phishing instances, with 14 features selected from 111 | - Using three algorithms for feature selection | - Not presenting the feature extraction process - Not examining the response time | 97 |
Hybrid network (Wang & Chen, 2022) | DL | ISCX-URL2016 dataset | - High accuracy - Getting raw URLs as input - Integrating convolution branches (local correlation analysis) and transformer (encoding) | - Not examining the response time - Max URL length up to 200 characters | 99.77 |
CNN + LSTM (Shaiba et al., 2022) | DL | Ebbu2017 dataset based on character-level features | - Using an optimization algorithm for parameter tuning | - Not examining the response time | 99.01 |
LightGBM (Sanchez-Paniagua et al., 2022) | Traditional | 67,000 phishing and 67,000 legitimate instances; 54 features: 21 URL, 8 HTML, 14 hybrid, and web technology | - Login pages make up 62% of legitimate and 41% of phishing instances - 27 new features (including web technology) - Fast feature extraction (43.56 milliseconds after the website loads) | - Dependent on the English language (5 copyright features) | 97.95 |
RF (Rao et al., 2022) | Traditional | 5,400 phishing and 5,000 legitimate instances; feature extraction through tokenization and lemmatization of domain-specific names in the source code | - Using 5 different word embedding algorithms - High accuracy - Response in 1.56 seconds - Implemented as a plugin | - Inefficient when the source code cannot be accessed - Dependent on the English language | 99.34 |
SVM and NB (Orunsolu et al., 2022) | Traditional | Balanced dataset containing 5,000 instances with 15 features | - High accuracy | - Small dataset - Dependent on third-party services - No new features - Not examining the response time | 99.96 |
KNN (Minocha & Singh, 2022) | Traditional | UCI dataset with 27 selected features | - Using a Modified Equilibrium Optimizer (MEO) for feature selection | - No feature extraction process - Not examining the response time | 97.46 |
DT (Marimuthu et al., 2022) | Traditional | 9,500 legitimate and 13,500 phishing instances with 20 features | - Implemented as a plugin | - Using Alexa ranking | 99.4 |
Hybrid classifier (Hevapathige & Rathnayake, 2022) | Traditional | 476,000 legitimate and 273,000 malicious instances (including 142,000 phishing) with 53 URL-based features | - Combining 6 classification algorithms - Large dataset | - Not examining the response time - Using a powerful processor (GPU) - Low accuracy - High computational complexity - No parameter tuning | 95.14 |
MLP (Alsaedi et al., 2022) | DL | 428,000 legitimate and 223,000 malicious instances (including 94,000 phishing) with features based on URL, Google Search, and Whois | - Classifying each feature group with RF and making the final decision with MLP | - Dependent on third-party services - Not examining the response time - No comparison with existing methods - Not explaining the features | 96.8 |
Fuzzy system (Abdul-Hussein et al., 2022) | Fuzzy Logic | Balanced dataset containing 20,000 instances with 6 features | - Optimizing the rule set with the Differential Evolution algorithm | - Extracting one feature from Alexa page rank | 97.6 |
XGBoost (Das Guptta et al., 2022) | Traditional | Balanced dataset containing 6,000 instances from ISCX-URL2016 with 15 URL-based and 10 HTML-based features (hyperlinks in the source code) | - High accuracy | - Small dataset | 99.17 |
RF (Bustio-Martinez et al., 2022) | Traditional | Balanced dataset containing 52,000 instances with 9 features (6 new) | - High accuracy - Response in 100 milliseconds - Using a feature selection algorithm (among 46 features) | - Not yet implemented as a plugin or evaluated on different datasets | 99.57 |
LRCN + GCN (Ariyadasa et al., 2022) | DL | Balanced dataset containing 50,000 instances with automatic feature extraction based on URL and HTML | - Combining LSTM and CNN to check URLs and using GCN to check HTML - Response in 1.8 seconds | - Low accuracy - Using a powerful processor (Xeon with 4 cores) - No comparison with existing methods | 96.42 |
Deep Autoencoder (Alqahtani et al., 2022) | DL | UCI dataset with 30 features | - Using the Artificial Algae algorithm to remove unimportant samples - Using the Invasive Weed Optimization algorithm for parameter tuning - Highest accuracy on the UCI dataset | - Not examining the response time and model complexity - No feature extraction process | 99.28 |
RNN + GRU (Tang and Mahmoud, 2021) | DL | Balanced dataset containing 120,000 instances | - High accuracy - Feature extraction with NLP - Implementation as a browser extension | - Max URL length up to 200 characters - Not supporting short URLs | 99.18 |
CNN + Bi-LSTM (Ray & Kusshwaha, 2021) | DL | 7,500 instances with character-level and word-level features + 15 manual URL features | - Combining CNN and Bi-LSTM with manual features | - Manual features dependent on third-party services, including Alexa - Small dataset - Not examining the response time and model complexity | 97.5 |
DNN + BiLSTM (Ozcan et al., 2021) | DL | Balanced dataset containing 28,000 instances; 27 previously proposed NLP features fed to a DNN and character embeddings fed to an LSTM | - Combining the two networks at the output layer - High accuracy - Examining the computing time | - Extracting one feature from Alexa - Not examining the response time - Small dataset - Max URL length up to 150 characters | 99.21 |
CNN (Mourtaji et al., 2021) | DL | Imbalanced dataset containing 30,000 legitimate and 10,000 phishing instances with 37 features | - Using combined features and blacklists - Comparing several algorithms | - Dependent on third-party services and the target website address - Long running time (4 hours) | 97.94 |
RF (Lakshmanarao et al., 2021) | Traditional | Imbalanced dataset containing 393,000 legitimate and 146,000 phishing instances with URL features extracted by Hashing Vectorizer | - Applying three text feature extraction techniques (TF-IDF Vectorizer, Count Vectorizer, and Hashing Vectorizer) - Implemented as a web app - No limit on URL length | - Not examining the response time | 97.5 |
GB + RF (Indrasiri et al., 2021) | Traditional | Balanced dataset containing 75,000 instances with 22 features selected from 46 | - Hybrid classifier | - Two features dependent on third-party services - Computing time of 170 seconds | 98.27 |
PART (Barraclough et al., 2021) | Traditional | 10,000 legitimate and 20,500 malicious instances with 3,000 features | - High accuracy | - Large number of unclear features | 99.33 |
MLP (Deval et al., 2021) | DL | 38,500 legitimate and 40,000 phishing instances with 11 features (5 new) | - First collaborative approach with the ability to add and remove features | - Dependent on the English language (copyright feature) - Not examining the response time | 95–97 |
RF (Gupta et al., 2021) | Traditional | 20,000 instances from ISCX-URL2016 with 9 URL-based features | - High accuracy - Short response time (51 milliseconds) - Small number of features | - Not evaluating the robustness of the model on different datasets - Old dataset | 99.57 |
RF (Gandorta and Gupta, 2020) | Traditional | 2,500 phishing and 2,700 legitimate instances with 20 features based on URL, source code, and page rank | - High accuracy - Comparison of six different algorithms | - Small dataset - Not evaluating the robustness of the model - Dependent on third-party services | 99.5 |
RF (Stobbs et al., 2020) | Traditional | 20,000 legitimate and 10,000 phishing instances with 27 features | - Using optimization algorithms for parameter tuning and feature selection - High accuracy | - Dependent on third-party services - Not examining the response time | 99.33 |
LSTM (Somesha et al., 2020) | DL | 3,500 instances with 10 features: 3 URL-based, 6 HTML-based, and 1 based on page rank | - High accuracy | - Small dataset - Dependent on third-party services (Alexa) - Not examining the response time | 99.57 |
Ensemble-based (Sameen et al., 2020) | Traditional | Balanced dataset containing 100,000 instances with lexical features based on URL and HTML | - Including AI-generated phishing URLs in the dataset - Detecting tiny URLs using DeepPhish - Calculating computational complexity | - Not examining the response time | 98 |
RF (Sadique et al., 2020) | Traditional | Imbalanced dataset containing 60,000 legitimate and 38,000 phishing instances with 36 features | - Prioritizing features for extraction according to their computing time - Collecting legitimate websites from PhishTank | - Low accuracy - Dependent on third-party services - No comparison with existing methods - Not explaining the features | 87 |
RF (Nagunwa et al., 2020) | Traditional | Imbalanced dataset containing 9,000 phishing and 1,700 legitimate instances with 20 features, a blacklist, and sensitive words (e.g., login) | - Detecting tiny URLs | - Dependent on third-party services - 8.5 seconds per website | 98.45 |
RNN (Feng & Yue, 2020) | DL | 800,000 legitimate and 760,000 phishing instances with 17 URL-based features | - High accuracy - Large dataset - Automatic feature extraction | - Using a powerful processor (GPU) | 99 |
Gradient Boost (Arora & Misra, 2020) | Traditional | 5,500 legitimate and 5,000 phishing instances with 11 URL-based features | - Accuracy of 98.42% without the 3 features dependent on third-party services - Response in 2 seconds | - Small dataset - Not evaluating the robustness of the model | 99.93 |
CNN (Aljofey et al., 2020) | DL | 158,000 phishing and 161,000 legitimate instances based on character-level features | - Evaluating the robustness of the model (extracting four different feature groups to compare test results on multiple sets) - Classification time of 0.47 milliseconds | - Max URL length up to 200 characters | 95.02 |
LSTM + CNN (Adebowale et al., 2020) | DL | Dataset containing one million URLs and 10,000 phishing images with 35 features based on text, frames, and images | - Large dataset - Combination of two networks - Using hybrid features | - Low accuracy - High response time (25 seconds) - Dependent on third-party services - Source-code features extracted only from JavaScript | 93.28 |
RNN + CNN (Wang et al., 2019) | DL | Balanced dataset containing 490,000 instances | - Large dataset - Evaluating the robustness of the model - Getting raw URLs as input - Combining two neural networks | - Low accuracy - Long response time (40 seconds) - Max URL length up to 255 characters | 95.79 |
ANFIS (Adebowale et al., 2019) | Fuzzy Logic | 13,000 instances (5,000 phishing, 2,000 suspicious, and 6,000 legitimate) with 35 features based on text, frames, and images | - High accuracy - Using hybrid features | - Dependent on third-party services - Source-code features extracted only from JavaScript | 98.55 |
LR (Yan et al., 2019) | Traditional | Balanced dataset containing 800,000 instances | - Large dataset - Automatic feature extraction by a Stacked Denoising Autoencoder network - Testing time of 0.083 milliseconds per URL | - Using a powerful processor (Xeon with 16 cores + GPU) - Long training time (85 minutes) | 98.25 |
RF (Sahingoz et al., 2019) | Traditional | Ebbu2017 dataset with 40 features | - Extracting 27 features using NLP | - Dependent on third-party services (Alexa) - Not examining the response time | 97.98 |
RF (Jain and Gupta, 2018) | Traditional | 2,100 phishing and 1,900 legitimate instances with 19 features based on URL and source code | - Comparing five different algorithms - Response in 5.8 seconds | - Small dataset - Some features based on comparison with reputable websites | 99.09 |
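
Several of the deep-learning approaches above take raw URLs as input and encode them character by character. That is why a fixed maximum URL length (120, 150, 200, or 255 characters) appears as a limitation: longer URLs are truncated, and shorter ones are padded. The following is a minimal sketch of such an encoding. It assumes a hypothetical printable-ASCII vocabulary with index 0 reserved for padding and index 1 for unknown characters. The name `encode_url` and the vocabulary are illustrative and are not taken from any of the surveyed papers.

```python
import string

# Hypothetical vocabulary (an assumption, not from any surveyed paper):
# printable ASCII characters. Index 0 = padding, index 1 = unknown character.
PAD, UNK = 0, 1
VOCAB = {ch: i + 2 for i, ch in enumerate(string.printable)}


def encode_url(url: str, max_len: int = 200) -> list[int]:
    """Map a raw URL to a fixed-length list of character indices.

    Characters beyond max_len are discarded (the truncation listed as a
    limitation in the table); shorter URLs are right-padded with PAD.
    """
    ids = [VOCAB.get(ch, UNK) for ch in url[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    long_url = "http://example.com/" + "a" * 300
    encoded = encode_url(long_url, max_len=200)
    print(len(encoded))  # 200: everything after the 200th character is lost
    print(encode_url("http://é.io", max_len=16))  # 'é' maps to UNK, tail is PAD
```

A fixed length lets URLs be batched into equal-sized tensors for a CNN or RNN. The trade-off is that any phishing cue appearing after the cut-off, such as a deeply nested path or query string, is not seen by the model.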