Artificial Neural Networks (ANNs) are a form of artificial intelligence designed to mimic the way the human brain works [1]. ANNs consist of interconnected nodes that process and transmit information, much like biological neurons. Neural networks are used in a wide variety of applications, including image recognition, natural language processing, and predictive analytics [2].
One of the main advantages of neural networks is their ability to learn from data. They can be trained on large datasets to identify patterns and make predictions, which makes them particularly useful for tasks like image classification or speech recognition. Neural networks are also highly parallel, meaning their computations can be distributed across many processing units, which allows them to handle large amounts of data efficiently [3]. Another advantage of neural networks is their ability to generalize: once they have been trained on a dataset, they can make predictions on new data they have never seen before. This makes them particularly useful for applications like predictive maintenance, where the goal is to identify potential problems before they occur. Overall, neural networks are a powerful tool for solving complex problems and making predictions based on large datasets [4].
One of the main concerns about neural networks is that they can be vulnerable to attacks. There are several types of attacks that can be launched against neural networks, including adversarial attacks, poisoning attacks, and backdoor attacks. These attacks are as follows [5]:
Adversarial attacks are designed to trick a neural network into misclassifying data. These attacks work by adding small, carefully crafted perturbations to the input data that are imperceptible to humans but can cause the neural network to make incorrect predictions.
Poisoning attacks involve manipulating the training data used to train the neural network. The goal of this type of attack is to introduce subtle changes to the training data that can cause the neural network to behave maliciously.
Backdoor attacks involve inserting a backdoor into the neural network during the training process. This backdoor can then be triggered by specific inputs, allowing an attacker to force the network to produce attacker-chosen outputs.
To mitigate these types of attacks, researchers are developing new techniques to make neural networks more robust and secure. These contributions include techniques for detecting and removing adversarial examples, methods for detecting poisoning attacks, and ways to prevent backdoor attacks [6].
Overall, while neural networks are vulnerable to attacks, researchers are actively working to develop new techniques to make them more secure and resilient to these types of threats.
In this paper, we enhance the resistance of neural networks against the FGSM attack by implementing machine unlearning. Like many deep learning models, the models we consider are susceptible to adversarial attacks such as FGSM. To protect against FGSM attacks, machine unlearning can be implemented: the model is periodically retrained on a dataset that includes adversarial examples. This process essentially "unlearns" the incorrect patterns the model acquired during the adversarial attack, making it more robust to future attacks.
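As a high-level illustration of this retraining idea, the following is a minimal PyTorch sketch; it is not the implementation evaluated later in this paper, and the dataset objects, batch size, optimizer, and learning rate are placeholder assumptions.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, TensorDataset, DataLoader

def retrain_with_adversarial_examples(model, clean_dataset, adv_inputs, adv_labels,
                                      epochs=1, lr=1e-3):
    """Fine-tune `model` on the original data plus correctly labelled adversarial
    examples, so the incorrect patterns induced by the attack are 'unlearned'."""
    adv_dataset = TensorDataset(adv_inputs, adv_labels)          # adversarial pairs with true labels
    loader = DataLoader(ConcatDataset([clean_dataset, adv_dataset]),
                        batch_size=64, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```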
This paper is organized as follows. In Section 2, we examine the structure of neural networks. In Section 3, we introduce machine unlearning and explain how it protects neural networks in our proposed method. In Section 4, we describe the proposed protection of neural networks against FGSM attacks and present experimental results. Finally, we evaluate the obtained results.
1.2 Related work
A considerable amount of work has already been done on the security of neural networks. In this section, we briefly review and classify the literature. Several recent studies have explored the security vulnerabilities of neural networks; in particular, researchers have focused on the following topics:
Adversarial attacks
Goodfellow et al. (2014) [7] demonstrated that neural networks are vulnerable to adversarial attacks, in which small modifications to input data can cause the network to misclassify the data. This work has been extended by a number of subsequent papers, including Papernot et al. (2016) [8] and Carlini and Wagner (2017) [9].
Backdoor attacks
Gu et al. (2017) [10] introduced the concept of "backdoor attacks," in which an attacker can subtly modify the training data to introduce a flaw that can be exploited later. This work has been extended by Bhagoji et al. (2018) [11] and Liu et al. (2018) [12].
Model stealing
Tramer et al. (2016) [13] showed that it is possible for an attacker to "steal" a neural network model by querying it and using the responses to reconstruct the model's parameters and behavior. This work has been extended by Juuti et al. (2018) [14] and Orekondy et al. (2019) [15].
By reviewing the work that has been done in this area, we can better understand the challenges and opportunities for improving the security of neural networks. In this article, we focus on increasing the security of neural networks against adversarial examples. Neural networks are susceptible to adversarial examples, which are carefully crafted inputs that cause the network to make the wrong prediction. This can have serious consequences in applications such as autonomous driving and medical diagnosis.
One approach to increasing neural network security against adversarial examples, proposed by Madry et al. (2017) [16], is adversarial training, where the network is trained on both clean and adversarial examples. This can increase the network's robustness to adversarial attacks.
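For concreteness, a minimal PyTorch sketch of one such training step is shown below; it is our own illustration rather than the authors' code, and the step size, perturbation budget, and number of attack iterations are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.03, alpha=0.01, steps=7):
    """Craft adversarial examples with multi-step projected gradient descent."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()              # ascend the loss
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)   # project onto the L-inf ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One update on a batch containing both clean and adversarial examples."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```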
Another approach, proposed by Papernot et al. (2016) [17], is defensive distillation, which involves training a second network to mimic the behavior of the first. The second network is then used to make predictions, making it more difficult for attackers to craft adversarial examples.
Other techniques include randomized smoothing by Cohen et al. (2019) [18], where random noise is added to the input and the network's predictions over the noisy copies are aggregated, making it more difficult for attackers to craft adversarial examples, and feature squeezing by Xu et al. (2017) [19], where the number of colors or bits per pixel in the input image is reduced to make it more difficult to exploit small perturbations.
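As a simple illustration of the bit-depth reduction form of feature squeezing, the sketch below quantizes pixel intensities before they are fed to the network; the choice of 4 bits is an arbitrary assumption.

```python
import torch

def squeeze_bit_depth(x, bits=4):
    """Reduce images with values in [0, 1] to 2**bits intensity levels per channel,
    discarding the small perturbations an attacker relies on."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels
```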
Previous works have tried to make it more difficult to craft adversarial examples; in the method we present, the aim is instead to remove the influence of adversarial examples from the neural network, so that the accuracy of the targeted model can be significantly improved.
In this paper, we propose protecting neural networks against the FGSM attack using machine unlearning. The Fast Gradient Sign Method (FGSM) is a popular adversarial attack technique used to generate adversarial examples that fool machine learning models. The idea behind FGSM is to take the sign of the gradient of the loss function with respect to the input and use it to perturb the input by a small amount in the direction that maximizes the loss, i.e., x_adv = x + ε · sign(∇_x J(θ, x, y)), where J is the loss, θ the model parameters, x the input, y the label, and ε the perturbation budget.
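To make this concrete, the following is a minimal PyTorch sketch of the FGSM perturbation rule; the model, the input and label tensors, and the epsilon value are placeholders rather than the exact setup used in our experiments.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    """Return x + epsilon * sign(grad_x loss), clipped back to the valid pixel range."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction of the loss gradient's sign to maximize the loss.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```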