We carefully assess the performance of the developed models throughout our study and benchmark the outcomes of each. To evaluate how well they recognize different kinds of network intrusions, we examined their detection accuracy, precision, recall, F1-score, and ROC-AUC. Table 1 shows that the models achieve high accuracy in detecting network intrusions.
Table 1
Performance metrics of our XAI-empowered architectures

| Proposed Architecture | Accuracy | Precision | Recall | F1 | ROC-AUC |
| ExplainDTC | 93.6% | 95.0% | 94.9% | 95.0% | 93.0% |
| SecureForest-RFE | 94.6% | 95.8% | 95.8% | 95.8% | 94.0% |
| RationaleNet | 93.7% | 95.1% | 95.0% | 95.0% | 93.0% |
| CNNShield | 93.6% | 95.5% | 94.4% | 94.8% | 93.0% |
The outcomes of our XAI-enhanced architectures, reported in Table 2, demonstrate their advantage over other state-of-the-art approaches: the experimental results confirm that our framework achieves the highest accuracy and detection rate among the compared methods on UNSW-NB15.
Table 2
Comparison of our XAI-empowered architectures with state-of-the-art ML/DL-based models

| Proposed Solution | Dataset | Accuracy (%) | XAI |
| Marwa et al. [50] | UNSW-NB15 | 86.6 | Yes |
| Sree et al. [52] DT | UNSW-NB15 | 85.0 | Yes |
| Sree et al. [52] XGBoost | UNSW-NB15 | 89.8 | Yes |
| Sree et al. [52] MLP | UNSW-NB15 | 89.9 | Yes |
| [53] XGBoost | UNSW-NB15 | 88.13 | No |
| [54] MLP | UNSW-NB15 | 84.24 | No |
| [55] DT | UNSW-NB15 | 89.7 | No |
| [55] RF | UNSW-NB15 | 90.3 | No |
| ExplainDTC | UNSW-NB15 | 93.6 | Yes |
| SecureForest-RFE | UNSW-NB15 | 94.6 | Yes |
| RationaleNet | UNSW-NB15 | 93.7 | Yes |
| CNNShield | UNSW-NB15 | 93.6 | Yes |
In the following subsections, we employ multiple explainability techniques to elucidate how the architectures operate. The core objective is to clarify and justify their outcomes.
5.1 ExplainDTC
We start our investigation with ExplainDTC, a model capable of distinguishing between normal and attack behavior in network traffic data with high accuracy.
We also determine the top 15 features and calculate their relative importance using both the scikit-learn library and ELI5's Permutation Importance toolkit (Fig. 7). The feature importance metric in use considers the decrease in node impurity weighted by the probability of reaching that node. Our experiments show that both outputs yield very similar feature importances, with "sttl" (the source-to-destination time-to-live value) ranked as the most important feature. Our feature significance analysis is further supported by the decision tree visualization: the most crucial features, including "sttl," appear prominently in the upper levels of the tree. This supports the idea that these features exert a stronger influence on the classification process and highlights their importance for network traffic analysis.
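To make the workflow concrete, a minimal sketch of this two-way importance check is given below; the synthetic dataset, model settings, and feature names (f0, f1, ...) are stand-ins for the preprocessed UNSW-NB15 features, not the authors' exact pipeline.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from eli5.sklearn import PermutationImportance

# Stand-in for the preprocessed UNSW-NB15 feature matrix.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(20)])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

dtc = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

# Impurity-based importance: mean decrease in node impurity, weighted by
# the probability of reaching that node (scikit-learn's built-in metric).
mdi = pd.Series(dtc.feature_importances_, index=X.columns)
print(mdi.sort_values(ascending=False).head(15))

# ELI5 permutation importance: shuffle one column at a time on held-out
# data and record the resulting drop in the model's score.
perm = PermutationImportance(dtc, random_state=0).fit(X_test, y_test)
print(pd.Series(perm.feature_importances_, index=X.columns)
        .sort_values(ascending=False).head(15))
```

Agreement between the two rankings, as observed for "sttl", is a useful sanity check, since the two methods measure importance in different ways.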
Decision trees such as ExplainDTC provide explainability by nature, because the resulting trees can be visualized directly. They are relatively simple to understand and interpret, which makes them an easy choice for human analysts who need to understand how the model reaches its predictions. Our research demonstrates high accuracy in identifying Normal and Attack activity by using decision trees for network traffic analysis. The tree lets us investigate each decision level, along with the corresponding feature and splitting value for each condition. By evaluating these conditions for each network traffic sample, the decision tree algorithm directs the categorization process: it starts at the root of the tree and works its way down, evaluating each condition along the way. If a condition is met, the sample goes down the left branch; otherwise, it goes down the right branch. Furthermore, the classification prediction for each class is governed by the tree's maximum depth. As Figs. 8 and 9 illustrate, explainability decreases as the depth of the tree increases.
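Reusing `dtc` and `X` from the sketch above, the per-level conditions and tree diagrams of the kind shown in Figs. 8 and 9 can be reproduced along the following lines:

```python
from sklearn.tree import export_text, plot_tree
import matplotlib.pyplot as plt

# Text view: every decision level with its test feature and splitting
# value; samples satisfying "feature <= threshold" follow the left branch.
print(export_text(dtc, feature_names=list(X.columns), max_depth=3))

# Graphical view analogous to Figs. 8-9; deeper trees quickly become
# harder to read, which is the explainability trade-off noted above.
plot_tree(dtc, feature_names=list(X.columns),
          class_names=["Normal", "Attack"], max_depth=3, filled=True)
plt.show()
```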
Figure 10 illustrates the important features for the two classes in ExplainDTC. Among the features, "sttl", "synack", "sbytes", "dbytes", and "spkts" have the highest scores, indicating their significance in determining the class predictions. ExplainDTC is a binary classifier capable of distinguishing between two classes, "Normal" and "Attack"; therefore, its SHAP feature importance plot shows the importance scores of the features for both classes separately. Each feature is evaluated based on its contribution to the prediction of each class, indicating how much it influences the classification decision for that class.
Among these features, "sttl" stands out as the most important: it has the largest impact on the model's predictions. On average, a change in the value of "sttl" leads to a substantial shift in the predicted probability of the "Normal" class, with an average change of 28 percentage points (0.28 on the x-axis). Other features, such as "synack", "sbytes", "dbytes", and "spkts", also contribute significantly to the model's predictions, but their impacts are relatively lower than that of "sttl".
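A sketch of how such a per-class SHAP importance plot can be produced, reusing the fitted `dtc` and `X_test` from the earlier sketch (the list-of-arrays output shown here reflects older SHAP releases; newer ones return a single three-dimensional array):

```python
import shap

explainer = shap.TreeExplainer(dtc)
shap_values = explainer.shap_values(X_test)  # one array per class

# Mean |SHAP| per feature and per class: the x-axis is the average change
# in predicted probability attributed to each feature (e.g., 0.28 for
# "sttl" in Fig. 10).
shap.summary_plot(shap_values, X_test, plot_type="bar",
                  class_names=["Normal", "Attack"])
```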
The waterfall plots in Fig. 11 represent the local interpretation of ExplainDTC for the second and thirteenth instances in the test data using SHAP values. Starting from the base value of 0.361 at the bottom of plot a) for the second instance, which is the expected prediction (the mean of all predictions), the plot breaks down the prediction for the given instance; at the top of plot a) is the model's prediction for that instance. The model predicts a value of 0, corresponding to the "Normal" class. Analyzing the SHAP values, we observe that the feature "sbytes" has the largest contribution to the model's prediction, with a SHAP value of -0.27. Following that, "sttl," "spkts," "ct_srv_dst," and "sload" have SHAP values of -0.18, +0.12, -0.07, and +0.06, respectively. Here, "sbytes," "sttl," and "ct_srv_dst" make negative contributions to the prediction, while "spkts" and "sload" make positive contributions. Summing all the SHAP values, -0.27 - 0.18 + 0.12 - 0.07 + 0.06 - 0.02 and so on through the last feature value, gives \(f\left(x\right)-E\left[f\left(x\right)\right]\), resulting in a prediction of 0, i.e., the "Normal" class. Therefore, based on the contributions of the individual features, the model predicts that this instance of the test dataset belongs to the "Normal" class. In a similar manner, the model arrives at a prediction of 1, the "Attack" class, for instance thirteen, as shown in plot b).
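The corresponding waterfall view can be sketched as follows on the stand-in model; `exp[i, :, 1]` selects the contributions towards the second ("Attack") output for instance `i`, and the indexing assumes the newer SHAP `Explanation` API:

```python
import shap

explainer = shap.TreeExplainer(dtc)
exp = explainer(X_test)  # Explanation of shape (rows, features, classes)

i = 1  # e.g., the second test instance
# The plot starts at the base value E[f(x)] and adds each feature's
# SHAP value phi_j until the model output f(x) is reached:
#   sum_j phi_j = f(x) - E[f(x)]
shap.plots.waterfall(exp[i, :, 1])
```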
5.2 SecureForest-RFE
The important features for the two classes in SecureForest-RFE are shown in Fig. 12. The features with the highest scores, indicating their significance in determining the class predictions, are "sbytes", "sinpkt", "dttl", and "proto". Among these, "sbytes" stands out as the most important: it has the largest impact on the model's predictions. On average, a change in the value of "sbytes" leads to a considerable shift in the predicted probability of the "Normal" class, with an average change of 17 percentage points (0.17 on the x-axis).
The dependence plot in Fig. 13 for the class prediction of SecureForest-RFE exhibits a partially monotonic pattern between the feature of interest, "sload", and the interaction feature, "proto". When the feature value of "sload" lies between 0.0 and 1.5 (on the x-axis), larger values of "proto" (highlighted in red) lead to a decrease in the SHAP value of "sload" (-0.10 to -0.15). This decrease pushes the model's prediction towards the "Normal" class. Over the same range of "sload" (0.0-1.5 on the x-axis), smaller values of "proto" (highlighted in blue) produce both increases (0.10 to 0.20) and decreases (-0.10 to -0.20) in the SHAP value of "sload" for the majority of instances in that region; the decreases push the model's prediction towards the "Attack" class. For larger "sload" values (greater than 2), the impact of "proto" on the SHAP values of "sload" becomes less pronounced, suggesting that variations in "proto" have a weaker influence on the model's prediction for instances with higher "sload" values. In this region, a larger proportion of instances are classified as "Normal," and the "Attack" class is barely visible. Therefore, the relationship between "proto" and the model's prediction becomes less significant as "sload" increases.
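A hedged sketch of such a dependence plot, reusing `shap_values` and `X_test` from the earlier sketch; the placeholder feature names "f3" and "f7" stand in for "sload" and "proto":

```python
# x-axis: the feature of interest; y-axis: its SHAP value; colour: the
# interaction feature (red = high, blue = low), as in Fig. 13.
shap.dependence_plot("f3", shap_values[1], X_test, interaction_index="f7")
```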
A decision plot is a useful tool for presenting multiple features of a dataset in a local explanation. In Fig. 14, we display the decision plot for 100 observations of the Recursive Feature Elimination (RFE) test data, using the SecureForest-RFE classifier. The x-axis represents the model's predicted output; the plot visualizes the model output values and corresponding feature values for the 100 observations. The top of the plot indicates the probability of each observation belonging to the "Normal" class or the "Attack" class: positive values indicate predictions towards the "Attack" class, while negative values indicate predictions towards the "Normal" class. The y-axis lists the features, 19 in total for the RFE observations, ordered by descending importance calculated over the plotted observations. Based on this importance, the top five features are "sttl," "sjit," "sinpkt," "dinpkt," and "dbytes." These have higher absolute SHAP values than the other 14 features, indicating a stronger contribution to the model's prediction. Small line segments connect consecutive features; when the slope of the segment between two features is less steep (0-45 degrees for a positive slope or 135-180 degrees for a negative slope), i.e., has a smaller absolute value, the feature contributes strongly to the model prediction. This shows how the feature values push the prediction towards either the "Normal" or the "Attack" class.
At the top of the plot, each sample's predicted value is represented by a colored line striking the x-axis, with the line's color corresponding to the prediction value on a spectrum. In almost half of the observations, features shown in blue push the probability towards the left (the "Normal" class), while in the other half, features shown in red push the probability towards the right (the "Attack" class). Moving from the bottom to the top of the plot, the SHAP values for each feature are cumulatively added to the model's base value of 0.63, resulting in an output of either 0 or 1. This demonstrates how each feature contributes to the overall prediction.
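Under the same stand-in assumptions, a decision plot over the first 100 test observations can be drawn as follows:

```python
# Each line starts at the base value at the bottom and accumulates
# per-feature SHAP contributions from bottom to top, ending at the
# model output at the top (cf. Fig. 14).
base = explainer.expected_value[1]  # base value for the "Attack" class
shap.decision_plot(base, shap_values[1][:100], X_test.iloc[:100])
```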
Force plots for single "Normal" and "Attack" predictions by SecureForest-RFE are explained using Fig. 15 and Fig. 16.
The model accurately predicts the instance in Fig. 15 as an "Attack" with a probability of 1.00 (against a base value of 0.3607). The majority of the features tend to push the score towards 1; features such as "sinpkt", "dttl", "sbytes", "spkts", "dpkts", and "dbytes" have the most significant influence in classifying the data sample as "Attack." However, the feature "djit" plays a crucial role in driving the probability of the data sample towards "Normal." Similarly, our model identifies the observation in Fig. 16 as "Normal" traffic, with \(f\left(x\right)\) equal to 0 and a base value of 0.6393; most features push the decision towards "Normal," while only "djit" pushes it towards "Attack."
The SHAP force plot for multiple predictions in Fig. 17 demonstrates the model's ability to distinguish effectively between the "Normal" and "Attack" classes. The graphic combines 1000 cases from the test dataset, showing that the model offers informative justifications of the feature contributions for each instance and assisting in the comprehension of its classification judgments.
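Both the single- and the multi-instance force plots can be sketched in the same stand-in setting:

```python
shap.initjs()  # enables the interactive JS rendering in notebooks

# Single prediction (cf. Figs. 15-16): red arrows push the score from
# the base value towards "Attack", blue arrows towards "Normal".
shap.force_plot(explainer.expected_value[1],
                shap_values[1][0], X_test.iloc[0])

# Many predictions stacked side by side (cf. Fig. 17).
shap.force_plot(explainer.expected_value[1],
                shap_values[1][:1000], X_test.iloc[:1000])
```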
5.3 RationaleNet
We showcase some instances of the LIME Tabular Explainer output, highlighting the top 15 features. This illustrative dashboard effectively demonstrates the features and their respective weights that contributed to the correct classification of a network traffic record as "Class 0" ("Normal") for instance number 2335 in Fig. 18, and as "Class 1" ("Attack") for instance number 1033 in Fig. 19. Features highlighted in orange contribute to the "Attack" category, and those in blue to the "Normal" category. This visual dashboard offers comprehensive and reliable individual explainability for the predicted classifications, enabling cybersecurity analysts to conduct in-depth analyses and follow-up assessments of the reasoning behind specific network traffic classifications made by the model.
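A minimal sketch of such a LIME explanation is given below; the stand-in classifier's `predict_proba` takes the place of RationaleNet's prediction function, and the data are the synthetic stand-ins from the earlier sketches. Any classifier exposing a probability function can be explained this way.

```python
from lime.lime_tabular import LimeTabularExplainer

lime_explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    class_names=["Normal", "Attack"],
    mode="classification",
)

# Explain one test instance with its top 15 features; in a notebook,
# exp.show_in_notebook() renders the dashboard view of Figs. 18-19.
exp = lime_explainer.explain_instance(
    X_test.values[0], dtc.predict_proba, num_features=15)
print(exp.as_list())  # (feature condition, weight) pairs
```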
RationaleNet is "black box" due to its complex internal workings, this tool significantly enhances the transparency of predictions. This increased transparency capacity facilitates future cybersecurity research, as analysts can exploit the insights provided by the dashboard to gain valuable understanding of the model's decision-making process. The combination of robust individual explainability and the advantages of neural net’s RationaleNet holds great potential for advancing the field of cybersecurity.
5.4 CNNShield
The feature importance of CNNShield, ordered from the highest to the lowest effect on the model's predictions, is displayed in Fig. 20a). The SHAP feature importance plot here reflects a single-class view, where the focus is on predicting one specific class, such as "Normal" or "Attack," rather than distinguishing between multiple classes; consequently, it shows the importance scores of the features for that single class. For CNNShield, the highest-scoring features are "ct_state_ttl", "sttl", "swin", and "ct_dst_sport_ltm". Among these, "ct_state_ttl" emerges as the most important, with an average change in the predicted absolute probability of 13 percentage points (0.13 on the x-axis). This indicates that variations in "ct_state_ttl" have a significant impact on the model's predictions, primarily influencing the probability of the class the model is designed to predict. Figure 20b) shows the top 20 features of the "Normal" class extracted through CNNShield. A higher feature value is indicated by red, and a lower feature value by blue. On the x-axis, a higher SHAP value to the right corresponds to a higher prediction value, i.e., the "Attack" class, and a lower SHAP value to the left corresponds to a lower prediction value, i.e., the "Normal" class.
This means that when the feature values of "ct_state_ttl", "sttl", "service", and "smean" are larger, their SHAP values correspond to a larger prediction value, so the model is more likely to label the data as the "Attack" class. The smaller these feature values (the bluer the color), the smaller their SHAP values, and the data is labeled as the "Normal" class from the perspective of those features. Conversely, the larger the values of "swin" and "ackdat" (the redder the color), the smaller their SHAP values; when these features take larger values, the model is more likely to consider the data "Normal".
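A sketch of how such a beeswarm summary can be generated follows. Since the exact explainer used for CNNShield is not restated here, the model-agnostic `KernelExplainer` is shown, with a stand-in prediction function where the trained network's probability output would go:

```python
import shap

# Stand-in predict function; for CNNShield this would be the trained
# network's probability output (e.g., the model's predict method).
predict_fn = lambda a: dtc.predict_proba(a)[:, 1]

background = shap.sample(X_train, 100)  # background sample for the explainer
kexpl = shap.KernelExplainer(predict_fn, background)
sv = kexpl.shap_values(X_test.iloc[:100])

# Beeswarm view as in Fig. 20b): each dot is one instance, coloured by
# feature value (red = high, blue = low); positive SHAP values push
# towards "Attack", negative towards "Normal".
shap.summary_plot(sv, X_test.iloc[:100], max_display=20)
```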
The instance in Fig. 21 is "Normal" traffic, and the model correctly detects it. The base value for the model is 0.65, and each feature contributes to the final prediction of Class 0, "Normal." The contributions of features such as "ct_state_ttl", "sttl", "dttl", and "tcprtt" are negative, whereas those of "is_ftp_login", "stcpb", and "dtcpb" are positive; both positive and negative values exist among the features, pulling the predicted probability in both directions. Among these, the features contributing most to classifying the sample as "Normal" are "ct_state_ttl", "swin", and "sttl."
The instance in Fig. 22 is an "Attack" and the model accurately detects it as an "Attack". The baseline value is 0.6197.
The SHAP force plot in Fig. 23 displays a combination of 1000 instances from the test dataset for the CNN model and demonstrates its ability to distinguish effectively between the "Normal" and "Attack" classes. In the first 200 samples, the prominence of blue values in the features "ct_state_ttl" and "sttl" indicates a tendency towards predicting the "Normal" class, representing normal traffic flow. However, from approximately sample 300 to nearly sample 900, red values in the features "ct_state_ttl", "swin", and "dttl" become more prominent, indicating a tendency towards predicting the "Attack" class. From around sample 900 to 1000, blue values again become more apparent in the features "stcpb", "ct_dst_srv", "swin", and "ackdat", suggesting a tendency towards predicting the "Normal" class. This trend is observed even though the prediction leans towards the "Attack" class for the majority of samples.
For the 934th instance, the model accurately detects the attack class in Fig. 24a). Likewise, the 1034th instance is an attack that is correctly predicted by our model, as represented in Fig. 24b).
The dependence plot in Fig. 25 for the class prediction of CNNShield reveals an approximately linear, positive relationship between the "service" feature and the interaction feature "sttl", suggesting that "service" and "sttl" frequently interact in influencing the model's prediction. When the feature value of "service" is less than 0 (on the x-axis), larger values of "sttl" (highlighted in red) lead to a decrease in the SHAP value of "service" (-0.10 to 0.05), pushing the model's prediction towards the "Normal" class. Conversely, when the feature value of "service" is around 1.5, larger values of "sttl" (also highlighted in red) result in an increase in the SHAP value of "service" (0.05 to 0.10), pushing the prediction towards the "Attack" class. For even larger values of "service" (from 3.0 to 4.0 on the x-axis), smaller values of "sttl" (highlighted in blue) increase the SHAP value of "service" (0.05 to 0.20), causing the prediction to lean further towards the "Attack" class.
5.5 Similarity Analysis of Predicted Result
It is advantageous to present cases from the training dataset that share commonalities with the test instance in question, as this improves comprehension of a model's decision-making process. In our study, we concentrate on the eighth test instance, which the model predicted as 1, an "attack." Figure 26a) displays similar instances from the training data, with their level of similarity denoted by the weight given in the last row. These tables also provide easily interpretable explanations by showing feature values alongside their corresponding weights. Figures 26a) and 26b) represent the twenty-three instances closest to the test instance. By analyzing the weights, we can determine that the instance listed under column 0 exhibits the highest similarity to the test instance, as indicated by its weight of 0.406798. This information equips analysts with greater confidence when making final decisions based on the system's output.
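The exact similarity weighting behind Fig. 26 is not restated here, so the following is a purely hypothetical sketch; an exponential kernel over Euclidean distance is one common way to produce weights in (0, 1] of the kind shown, again reusing the stand-in data from the earlier sketches.

```python
import numpy as np

# Hypothetical similarity lookup: not the paper's exact weighting.
test_x = X_test.values[7]                  # the eighth test instance
dists = np.linalg.norm(X_train.values - test_x, axis=1)
weights = np.exp(-dists / dists.std())     # similarity weights in (0, 1]

top = np.argsort(weights)[::-1][:23]       # twenty-three closest rows
table = X_train.iloc[top].T                # one column per neighbour
table.loc["weight"] = weights[top]         # similarity weight in the last row
print(table.iloc[:, :5])                   # preview the five closest
```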