The success of a Honeypot early detection system depends on the correct choice of the factors and features used to track attacks. This paper presents Honeypot EIDS technology using DL and SL algorithms because of their suitability for identifying attackers by extracting and collecting the most salient features from the attackers' performance logs. All models are implemented in the Python programming language with the open-source scikit-learn library.
Data used for testing the models should be easy to obtain for the proposed EIDS and should reflect the behavior of the host or network. Building a dataset from scratch is a complex and time-consuming process, so using a benchmark dataset shortens the diagnosis time. Because benchmark datasets are validated, they make the experimental results produced in laboratory research more convincing and allow the results of the proposed method to be compared with previous studies. To determine the most optimal and efficient detection model for the data stored by the Honeypot, the EIDS logs are used in the laboratory for this research to verify its results and accuracy. Three well-known datasets are used in this study: the NSL-KDD, CIC-IDS-2017, and Kyoto 2006 datasets, explained in the following sections.
Therefore, an executable implementation model is designed to classify the mentioned datasets according to Fig. 5; it covers steps such as importing the dataset, preprocessing the data, analyzing the data, and so on.
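Assuming a generic tabular dataset, the steps of this model (import, preprocessing, training, evaluation) can be sketched in Python with scikit-learn. The column names and the choice of a random-forest classifier are illustrative placeholders, not the paper's exact configuration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def run_pipeline(df: pd.DataFrame, label_col: str = "label") -> float:
    """Import -> preprocess -> train -> evaluate on one tabular dataset."""
    # One-hot encode categorical features; keep numeric features as-is.
    X = pd.get_dummies(df.drop(columns=[label_col]))
    y = df[label_col]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)
    # Scale features so that magnitude differences do not dominate.
    scaler = StandardScaler().fit(X_train)
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(scaler.transform(X_train), y_train)
    return accuracy_score(y_test, clf.predict(scaler.transform(X_test)))
```

Any of the benchmark datasets below can be fed through such a pipeline once its label column is identified.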
A. NSL-KDD
The NSL-KDD data comprise one training set and one test set of network connection records. This NSL-KDD version has 43 features: 41 describe the incoming traffic, one holds the normal-or-attack label, and the remaining one relates to the traffic-intensity (difficulty) score.
The most important attribute in this benchmark database is the label, which specifies whether a record is normal or an attack. The test set contains more attack types than the training set, including attacks unseen during training, so the data cannot be treated as if all attack types were known in advance. Given the attack types mentioned and the normal state, five classes are considered for this work: Normal, DoS, R2L, U2R, and Probe.
With these five labels in place, the preprocessing work is carried out, and the algorithms used in the DL and SL discussion are chosen so that the resulting process stays simple and is easy to test and scale.
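The five-class labeling described above can be implemented as a simple lookup from raw NSL-KDD attack names to the five classes. The mapping below covers only a representative subset of attack names, not the full NSL-KDD list:

```python
# Collapse raw NSL-KDD attack names into the five classes used in this study.
# Illustrative subset only; extend with the remaining NSL-KDD attack names.
ATTACK_TO_CLASS = {
    "normal": "Normal",
    "neptune": "DoS", "smurf": "DoS", "back": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R",
}

def to_five_classes(labels):
    """Map raw record labels to Normal/DoS/Probe/R2L/U2R ('Unknown' if unmapped)."""
    return [ATTACK_TO_CLASS.get(label, "Unknown") for label in labels]
```

Unseen attack names in the test set fall through to "Unknown", which mirrors the presence of unknown attacks mentioned above.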
B. CIC-IDS 2017
CIC-IDS2017 is a dataset with 78 features and a respective class label covering 14 different attack types, such as brute-force, Denial of Service (DoS), and web attacks. These can be grouped so that CICIDS2017 counts eight categories: benign, brute-force, DoS, DDoS, web attack, infiltration, botnet, and port scan.
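The grouping from the 14 raw labels into the eight coarse categories could be sketched as below. The exact label spellings vary between copies of the dataset's CSV files, so treat the string patterns here as assumptions to verify against the local copy:

```python
# Group raw CICIDS2017 label strings into eight coarse categories.
# Pattern matching is an assumption; verify against your copy of the CSVs.
def to_category(label: str) -> str:
    l = label.lower()
    if l == "benign":
        return "Benign"
    if "ddos" in l:
        return "DDoS"
    if l.startswith("dos"):
        return "DoS"
    if "web attack" in l:          # checked before brute-force: some web
        return "Web Attack"        # attack labels also contain "brute force"
    if "patator" in l or "brute" in l:
        return "Brute-Force"
    if "infiltration" in l:
        return "Infiltration"
    if "bot" in l:
        return "Botnet"
    if "portscan" in l:
        return "Port Scan"
    return "Other"
```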
C. Kyoto 2006
The Kyoto 2006 dataset consists of actual network traffic logs extracted from Honeypot sensors. These logs contain data collected from different types of Honeypots and consist of 23 attributes plus a tag attribute. The log also includes both normal and abnormal traffic (various types of attacks). In this dataset, several columns are defined as primary columns covering known attacks: the identification of all attacks in the Label column, attacks detected with exploit code in the Ashula detection column, Malware attacks in the Malware detection column, and attacks detected by the IDS firewall in the IDS detection column.
D. EIDS Database
As logs are generated in the industrial network, IDS Snort builds up the dataset of the EIDS. The proposed EIDS database is a lightweight yet powerful tool that allows the system to detect malicious network traffic early. By defining flexible and robust rules, almost any threat that crosses the network can be identified. To meet these needs, a solution for processing the alert data of this large dataset is required.
Therefore, the CSV format is used for processing the alert data, as it is the most flexible and compatible method for data collection. To configure IDS Snort to use the CSV output format, add the following line to the snort.conf file:
output alert_csv: alert.csv default
This command configures IDS Snort to write a CSV log file named alert.csv to the configured log directory using the default output fields; 30 features can be extracted from IDS Snort, as listed in Tab. 2.
Table 2
Generated features for the EIDS database.

| Feature   | Feature   | Feature       |
| time      | icmpseq   | icmpid        |
| icmpcode  | date      | sig_generator |
| icmptype  | iplen     | dgmlen        |
| id        | tos       | ttl           |
| tcpwindow | tcpln     | tcpack        |
| tcpseq    | tcpflags  | ethlen        |
| ethdst    | ethsrc    | dstport       |
| dst       | srcport   | src           |
| proto     | msg       | sig_rev       |
| sig_id    | timestamp |               |
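An alert.csv file produced by the configuration above can be loaded into a labeled table as sketched below. The column list follows Snort's documented default field order for the CSV output plugin; it should be verified against the Snort version in use before relying on it:

```python
# Load a Snort alert.csv produced by `output alert_csv: alert.csv default`.
# Column order reflects Snort's documented default CSV fields (an assumption
# to verify against your Snort version; note Snort spells it "tcplen").
import io
import pandas as pd

SNORT_CSV_COLUMNS = [
    "timestamp", "sig_generator", "sig_id", "sig_rev", "msg", "proto",
    "src", "srcport", "dst", "dstport", "ethsrc", "ethdst", "ethlen",
    "tcpflags", "tcpseq", "tcpack", "tcplen", "tcpwindow", "ttl", "tos",
    "id", "dgmlen", "iplen", "icmptype", "icmpcode", "icmpid", "icmpseq",
]

def load_alerts(path_or_buffer) -> pd.DataFrame:
    """Read an alert.csv file (which has no header row) into a DataFrame."""
    return pd.read_csv(path_or_buffer, names=SNORT_CSV_COLUMNS, header=None)
```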
Honeypot EIDSs are used to detect cyberattacks in ICS networks, and various studies have therefore been conducted on high-performance datasets based on ML techniques. In the IDS field, well-known datasets such as NSL-KDD, CIC-IDS 2017, and Kyoto 2006 are available for evaluation. However, these datasets do not reflect the recent cyberattack trends addressed in the proposed research. For this reason, the EIDS dataset is refined from the same traffic data in this study, together with the latest Snort logs. In addition, the new dataset is evaluated by applying several ML techniques and comparing the datasets' classification results.
E. Metrics
Methods and criteria are needed to measure Accuracy, Recall (R), Precision (P), and F1-Score in order to evaluate the ML methods used in the proposed design and to reach the most optimal model for analyzing the data properties. The criteria used in this article are therefore briefly explained below, together with the relevant formulas and equations.
1) Accuracy
The accuracy parameter expresses the number of correct predictions made by the classifier divided by the total number of predictions it makes. That is, it is the ratio of correct diagnoses, True Positive (TP) + True Negative (TN), to the total data, TP + TN + False Positive (FP) + False Negative (FN). This criterion is effective for many real-world classification problems because it accounts for all of the data, both correctly classified (numerator) and misclassified, as in equation 1. The goal of the proposed method is to approach accuracy = 1, or 100%.
$$Accuracy=\frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$
2) Recall (R)
The accuracy parameter is not suitable for imbalanced data, i.e., data whose numbers of positive and negative labels differ greatly, as they do in many real-world problems. This large imbalance makes the accuracy criterion inefficient, so a more objective benchmark is needed for measuring the accuracy and efficiency of the proposed classification algorithms. In such cases, it is better to focus on the number of TPs relative to the total number of positive samples. The R parameter serves this purpose and is defined as equation 2.
$$R=\frac{TP}{TP+FN} \quad (2)$$
3) Precision (P)
R alone can be misleading: a weak model that simply declares many samples positive can achieve a high R while producing many false positives. To address this, in addition to the recall criterion, another benchmark called Precision is defined: the ratio of TP samples to all samples declared positive, as in equation 3, so that the number of FPs is taken into account.
$$P=\frac{TP}{TP+FP} \quad (3)$$
4) F1-Score
It would be more convenient to combine the two criteria R and P into a single measure for classification algorithms rather than examining both simultaneously. A simple arithmetic mean is unsuitable: with a high R and a low P (or vice versa), the arithmetic mean can still grant a passing score to a weak algorithm. Therefore, the harmonic mean of R and P is used instead; it stays low unless both values are high.
According to equation 4, this harmonic mean of the two values R and P is known as the F1-Score and is equal to:
$$F1=2\,\frac{P\times R}{P+R} \quad (4)$$
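The four criteria of equations 1-4 can be computed directly with scikit-learn. The toy label vectors below (1 = attack, 0 = normal) are purely illustrative:

```python
# Compute Accuracy, Recall, Precision, and F1 (equations 1-4) for a
# binary attack/normal classification with scikit-learn.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # ground truth: 1 = attack, 0 = normal
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]  # model predictions (TP=3, TN=3, FP=1, FN=1)

acc = accuracy_score(y_true, y_pred)   # (TP+TN)/(TP+TN+FP+FN)
rec = recall_score(y_true, y_pred)     # TP/(TP+FN)
pre = precision_score(y_true, y_pred)  # TP/(TP+FP)
f1 = f1_score(y_true, y_pred)          # 2*P*R/(P+R)
```

With these counts, all four metrics evaluate to 0.75, which matches equations 1-4 computed by hand.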
5) False-Negative Rate (FNR)
According to equation 5, the FNR captures the case when the sensor fails to detect malicious traffic and lets it pass as if it were normal. Such traffic is neither blocked nor logged, so no alert is generated, and the system administrator has no trace of the attack to investigate. This non-detection can have various causes: for example, the sensor signatures have not been updated and new signatures have not been received, the sensor settings have not been done correctly, or the malicious traffic uses a new method that has not yet been addressed. In general, having many FNs endangers the network's resources, so they must be identified, investigated, and managed.
$$\text{FNR} = \frac{FN}{TP+FN} \quad (5)$$
6) False-Positive Rate (FPR)
The FPR in equation 6 covers the case when the sensor wrongly flags healthy traffic as malicious. Based on the signatures applied to the system, this healthy traffic will be blocked and not allowed to pass, or, if logging actions are configured, it will generate logs and alerts. It is therefore difficult for the system administrator to trace the root cause of the resulting false alarms when checking the logs and alerts. In general, having many FPs in the network hurts network performance, and they must be identified, investigated, and managed.
$$\text{FPR} = \frac{FP}{FP+TN} \quad (6)$$
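Both rates follow directly from confusion-matrix counts. The sketch below uses the standard definitions FNR = FN/(TP+FN) and FPR = FP/(FP+TN):

```python
# FNR and FPR from confusion-matrix counts (standard definitions).
def fnr(tp: int, fn: int) -> float:
    """Fraction of actual attacks the sensor missed."""
    return fn / (tp + fn)

def fpr(fp: int, tn: int) -> float:
    """Fraction of benign traffic the sensor wrongly flagged."""
    return fp / (fp + tn)
```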
F. Monitoring and data mining application
This program comprises 1056 lines of code, plus supporting code such as web recall applications and algorithm connectors, to execute the learning-model algorithms within the EIDS project. As shown in Fig. 6, the design is convenient, simple, and user-friendly: CSV files can be uploaded directly from the EIDS system log storage, and real-time analysis can be performed to detect new attacks. The advantage of this is that it increases the reliability of early intrusion detection alongside the detection system and gives a deeper, more comprehensive understanding of the various attacks, allowing the behavior and future actions of attackers to be analyzed.