The purpose of this research is to explain the image classification results of the VGG16 model. The computing process of the VGG16 convolutional neural network (CNN) model does not meet transparency requirements, so we applied post-hoc explanation techniques to it. Following research on visualizing features in CNNs (Melnyk et al., 2019), which found visualization to be feasible, we used visual explanations to provide post-hoc explanations of the model's errors. This section describes how we used the visual explanation methods among post-hoc explanation techniques to design experiments that achieve our objective: explaining image classification results.
2.3. Practical Methods for Explaining DNNs
Because analyzing deep neural network (DNN) models poses many challenges, scientists have proposed a variety of practical methods to explain these models, each with its own advantages and disadvantages. The following discussion focuses on six main explanation techniques: sensitivity analysis, integrated gradients, smooth gradients, smooth integrated gradients, Grad-CAM, and XRAI.
2.3.1. Sensitivity analysis
Image data is typically represented as a vector \(\left\{x_{1}, x_{2}, \dots, x_{n}, \dots, x_{N}\right\}\). If the probability of an image being classified into class \(k\) is \(y_{k}\), then by adding a perturbation \(\Delta x\) to a pixel \(x_{n}\), the resulting change \(\Delta y\) in \(y_{k}\) can be observed, as shown in Eq. (1). If this perturbation \(\Delta x\) significantly impacts the final classification, pixel \(x_{n}\) is highly important to the model's judgment:
\(\left\{x_{1}, x_{2}, \dots, x_{n}, \dots, x_{N}\right\} \rightarrow \left\{x_{1}, x_{2}, \dots, x_{n}+\Delta x, \dots, x_{N}\right\}, \quad y_{k} \rightarrow y_{k}+\Delta y\)  (1)
To determine the impact of each pixel's perturbation on the prediction, one calculates \(\left|\frac{\Delta y}{\Delta x}\right|\), which in the limit is the gradient \(\left|\frac{\partial y_{k}}{\partial x_{n}}\right|\). This gradient represents the importance of pixel \(x_{n}\) to the predicted class. A saliency map is drawn according to the gradient, where an area with a higher level of brightness indicates that the pixels in that area have a more significant influence on the prediction.
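As an illustration of Eq. (1), the following Python sketch (assuming PyTorch and torchvision's pretrained VGG16, the model under study; the pixel index and step size delta are arbitrary illustrative choices, not prescribed by any of the cited methods) perturbs one pixel and measures the resulting change in the class probability:

import torch
from torchvision import models

# Pretrained VGG16 in evaluation mode (the model this research explains).
model = models.vgg16(pretrained=True).eval()

def pixel_sensitivity(x, k, idx, delta=1e-3):
    # Estimate |dy_k / dx_n| for one pixel by finite differences (Eq. 1).
    # x: preprocessed input tensor of shape (1, 3, 224, 224)
    # k: index of the class of interest
    # idx: (channel, row, col) of the pixel to perturb
    with torch.no_grad():
        y = torch.softmax(model(x), dim=1)[0, k]
        x_pert = x.clone()
        x_pert[0, idx[0], idx[1], idx[2]] += delta
        y_pert = torch.softmax(model(x_pert), dim=1)[0, k]
    return abs((y_pert - y) / delta)   # |Δy/Δx|

Repeating this for every pixel would be prohibitively slow, which is why the gradient-based formulation below computes all per-pixel sensitivities in a single backward pass.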
Sensitivity analysis (Simonyan et al., 2013) was the first scheme to use a gradient-based method for the post-hoc explanation of an artificial intelligence model. The process is as follows: first, assume a simple linear model \(S_{c}\left(I\right)\), as in Eq. (2):
\(S_{c}\left(I\right)= \omega_{c}^{T}I+ b_{c}\)  (2)
where image \(I\) is represented as a one-dimensional vector, \(c\) is the class, \(\omega_{c}\) is the weight vector, \(b_{c}\) is the model bias, and \(S_{c}\left(I\right)\) is the score of the linear model. The weight \(\omega_{c}\) in Eq. (2) defines each pixel's level of importance in image \(I\).
A neural network, however, is a highly complicated nonlinear model \(S_{c}\left(I\right)\), so the interpretation of Eq. (2) cannot be applied to it directly. Nevertheless, we can use a first-order Taylor expansion around a given image \(I_{0}\) to approximate \(S_{c}\left(I\right)\) with a linear model, as shown in Eq. (3):
\(S_{c}\left(I\right) \approx \omega^{T}I + b\)  (3)
where \(\omega\) is the derivative of the output \(S_{c}\) with respect to the input image, evaluated at \(I_{0}\); that is, \(\omega\) is the gradient of the output \(S_{c}\) relative to the input image \(I_{0}\), as in Eq. (4):
\(\omega = \frac{\partial S_{c}}{\partial I}\Big|_{I_{0}}\)  (4)
Finally, a saliency map presents the weight \(\omega\) of each pixel, so the impact of each pixel in the image on the classification result can be determined.
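As a minimal sketch (not the original authors' code), the following PyTorch fragment reuses the pretrained VGG16 above and implements Eq. (4) by backpropagating the class score to the input; taking the maximum absolute gradient over the color channels follows the convention of Simonyan et al. (2013):

def gradient_saliency(x, k):
    # Saliency map via the gradient of class score S_c w.r.t. input x (Eq. 4).
    # x: input tensor of shape (1, 3, 224, 224); k: target class index.
    x = x.detach().clone().requires_grad_(True)
    score = model(x)[0, k]        # pre-softmax class score S_c(I)
    score.backward()              # one backward pass to the input
    # Per-pixel importance: max of |gradient| over the RGB channels.
    return x.grad.abs().max(dim=1)[0].squeeze(0)   # shape (224, 224)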
2.3.2. Integrated Gradients
Gradient-based interpretation methods can indeed successfully explain many artificial intelligence prediction results, but unexplainable situations can also arise. Both Simonyan et al. (2013) and Baehrens et al. (2010) noted that gradient-based explanations fail in saturated regions: once the gradient enters a saturated region and approaches 0, no effective information can be obtained, as shown in Fig. 2.
If the gradient is taken as the importance score, then wherever the gradient is close to 0 it will appear as a low-intensity area on the saliency map. Therefore, Sundararajan et al. (2017) proposed integrated gradients to solve the problems encountered in saturated regions. Instead of the raw gradient, the gradients along a path are integrated to form the importance score of each feature, which prevents the score from vanishing where the gradient approaches 0. The difficulty is that, for a given image \(x\), the intensity is fixed, so a method is needed to obtain the gradients of images whose intensities lie between a baseline and \(x\).
First, suppose that the current image is \(x\), and set a baseline image \(x'\), usually a black image with all-zero pixels (or sometimes a random image). Then, a linear interpolation is performed between the baseline \(x'\) and the original image \(x\) to generate intermediate images, as expressed by Eq. (5):
\(\tilde{x}=x' + \alpha \left(x-x'\right)\)  (5)
As shown in Fig. 3, when \(\alpha =0\), the interpolated image is the baseline; when \(\alpha =1\), it is the original input image \(x\). The integrated gradient is defined as in Eq. (6):
\(\mathrm{IntegratedGrads}_{i}\left(x\right)=\left(x_{i}-x'_{i}\right)\times \int_{\alpha =0}^{1}\frac{\partial F\left(x'+\alpha \left(x-x'\right)\right)}{\partial x_{i}}\,d\alpha\)  (6)
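In practice, the integral in Eq. (6) is approximated by a Riemann sum over m interpolation steps. A minimal sketch under that approximation (the step count m = 50 and the all-zero baseline are common but arbitrary choices, not prescribed by the original paper):

def integrated_gradients(model, x, k, baseline=None, m=50):
    # Approximate Eq. (6) with a Riemann sum over m interpolation steps.
    if baseline is None:
        baseline = torch.zeros_like(x)   # black baseline image x'
    total_grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, m):
        # Intermediate image from Eq. (5): x~ = x' + alpha * (x - x')
        x_interp = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        model(x_interp)[0, k].backward()
        total_grads += x_interp.grad
    # (x_i - x'_i) times the average gradient along the path
    return (x - baseline) * total_grads / m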
2.3.3. Smooth Gradients and Smooth Integrated Gradients
Gradient-based interpretation methods such as sensitivity analysis and integrated gradients produce saliency maps through backpropagation. Such saliency maps typically contain a lot of visual noise. This noise means that the saliency map only roughly indicates the location of the relevant area, which differs from how a human would identify it: pixels surrounding the target exhibit a high degree of brightness on the saliency map even though they are unrelated to the target in the original image.
Smilkov et al. (2017) suggested that the reason for the noise may be that the derivative of the function \(S_{c}\) fluctuates sharply at small scales and may not even be continuously differentiable.
Their SmoothGrad method randomly adds a specific degree of noise to the input image and then calculates the gradient; after several such gradient calculations, the average is taken so that the gradient becomes more stable and the noise is removed. Let the saliency map \(M_{c}\left(x\right)\) be the gradient of the output with respect to the input, as in Eq. (7):
\(M_{c}\left(x\right)= \frac{\partial S_{c}\left(x\right)}{\partial x},\)  (7)
then, SmoothGrad perturbs the input image by adding Gaussian noise \(N\left(0, \sigma^{2}\right)\) to generate \(n\) perturbed images, calculates the average of their gradients, and obtains a stable saliency map, as shown in Eq. (8):
\(\widehat{M_{c}}\left(x\right)= \frac{1}{n}\sum_{i=1}^{n}M_{c}\left(x+N\left(0, \sigma^{2}\right)\right)\)  (8)
As the sample size \(n\) becomes larger, the saliency map contains less noise and is more stable. We applied the SmoothGrad scheme to both of the preceding methods to suppress noise, yielding two further methods: smooth gradients and smooth integrated gradients.
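A minimal sketch of Eq. (8) that wraps any saliency function, such as the gradient_saliency sketch above or an integrated-gradients function (the sample count n and noise level sigma are tunable hyperparameters; the values below are illustrative, and Smilkov et al. (2017) report sigma around 10–20% of the input range working well):

def smoothgrad(saliency_fn, x, k, n=25, sigma=0.15):
    # Average saliency maps over n noisy copies of x (Eq. 8).
    acc = None
    for _ in range(n):
        noise = torch.randn_like(x) * sigma      # N(0, sigma^2) perturbation
        m = saliency_fn(x + noise, k)
        acc = m if acc is None else acc + m
    return acc / n

# Smooth gradients:            smoothgrad(gradient_saliency, x, k)
# Smooth integrated gradients: smoothgrad(lambda xi, ki:
#     integrated_gradients(model, xi, ki), x, k)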
2.3.4. Grad-CAM
Class activation mapping (CAM) (Zhou et al., 2016) requires connecting a global average pooling (GAP) layer after the last convolutional layer's output and retraining the model, so the method is unsuitable for many practical applications. Therefore, Selvaraju et al. (2017) improved CAM and proposed Grad-CAM, which combines gradient information with the feature maps. Regardless of whether the convolutional layer is followed by a fully connected layer or another type of network, a heatmap can be obtained with Grad-CAM without modifying the network: Grad-CAM replaces the GAP weights with gradients, directly computing the gradient of the output score with respect to the final convolutional layer's feature maps and using the globally averaged gradients as channel weights.
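A minimal Grad-CAM sketch for VGG16 (assuming torchvision's layer layout, where model.features[28] is the final convolutional layer; hooks capture its activations and gradients, and the globally averaged gradients serve as the channel weights described above):

import torch.nn.functional as F

acts, grads = {}, {}
layer = model.features[28]   # last conv layer of torchvision's VGG16
layer.register_forward_hook(lambda mod, inp, out: acts.update(a=out))
layer.register_full_backward_hook(lambda mod, gi, go: grads.update(g=go[0]))

def grad_cam(x, k):
    # Heatmap = ReLU(sum_j alpha_j * A^j), alpha_j = GAP of dS_c/dA^j.
    model.zero_grad()
    model(x)[0, k].backward()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)   # GAP of gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    # Upsample the 14x14 map back to the 224x224 input resolution.
    return F.interpolate(cam, size=x.shape[2:], mode="bilinear")[0, 0].detach()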
To recover finer detail, Selvaraju et al. (2017) multiplied the result of Guided Backpropagation element-wise with the Grad-CAM result, obtaining a heatmap with higher resolution and superior localization accuracy.
2.3.5. XRAI
The XRAI method extracts a symbolic representation of a mathematical function, which serves as the basis of the representations learned by the neural network during training. XRAI adjusts the interpretation method applied after the neural network is trained: the trained weights and biases are used as the input to the interpretation method and determine how the neural network formula is expressed, and this formula is then converted into a symbolic representation.
In the study XRAI: Explainable Representations through AI (Christiann et al., 2020), Boolean functions and low-order polynomials are used as examples, with offline training on synthetic data, to explain different types of functions. Unlike integrated gradients, XRAI also evaluates overlapping areas of the image to reconstruct a saliency map that highlights relevant regions of the image rather than individual pixels.
Kapishnikov et al. (2019) first over-segmented an image, repeatedly tested the importance of each area, and merged smaller regions into larger ones based on their attribution scores. Their experiments confirmed that this strategy produces high-quality salient regions with tight boundaries and that XRAI outperforms other existing post-hoc explanation methods. More importantly, XRAI can be used with any DNN-based model as long as the input features can be grouped through a similarity calculation (for example, color similarity in an image).
Kapishnikov et al. (2019) showed that, for comparing post-hoc interpretations on the ImageNet dataset under a general neural network model, XRAI is more effective. This interpretation method is often applied to models that accept image input, such as natural images of real scenes containing multiple objects.
The XRAI algorithm (Kapishnikov et al., 2019) is as follows:
Given image I, model f, and attribution method g
Over-segment I into segments s ∈ S
Get attribution map A = g(f, I)
Let saliency mask M = ∅, trajectory T = []
while S ≠ ∅ and area(M) < area(I) do
    for s ∈ S do
        Compute gain: \({g}_{s}=\sum _{i\in s\setminus M}\frac{{A}_{i}}{area(s\setminus M)}\)
    end for
    \(\widehat{s} = \underset{s}{\arg\max}\; g_{s}\)
    \(S = S \setminus \{\widehat{s}\}\)
    \(M = M \cup \widehat{s}\)
    Add M to list T
end while
return T
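A minimal Python sketch of this greedy loop (assuming boolean segment masks from any over-segmentation method and a pixel-level attribution map A, e.g., from integrated gradients; this illustrates the listing above and is not the reference implementation):

import numpy as np

def xrai_trajectory(segments, A):
    # segments: list of boolean masks (H, W), the over-segmentation S
    # A:        attribution map (H, W)
    # Returns the trajectory T of growing saliency masks.
    S = list(segments)
    M = np.zeros(A.shape, dtype=bool)     # empty saliency mask
    T = []
    while S and M.sum() < A.size:
        gains = []
        for s in S:
            new = s & ~M                  # pixels in s \ M
            area = new.sum()
            gains.append(A[new].sum() / area if area > 0 else -np.inf)
        best = int(np.argmax(gains))      # ŝ = argmax_s g_s
        M = M | S.pop(best)               # M = M ∪ ŝ ; S = S \ {ŝ}
        T.append(M.copy())
    return T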
XRAI uses integrated gradients because they satisfy sensitivity-N (Ancona et al., 2017): the sum of the attributions over all input features equals the softmax value of the input minus the softmax value of the baseline. XRAI starts with an empty mask and then iteratively adds the block whose total attribution yields the highest gain, until the mask covers the complete image or every available block has been added. The order in which blocks enter this trajectory of masks ranks them by importance: image blocks contributing to the predicted category should have a high positive attribution, blocks unrelated to the prediction should have an attribution close to zero, and blocks containing competing classes should have a negative attribution.