A total of 4023 images were used for training, 501 for validation, and another 501 for testing. Eight different models were used in the implementation of the system. Model complexity is measured by two metrics: the number of parameters (Params) and floating-point operations (Flops) [34]. Table 4 presents the Flops and Params of each model. MobileNetV3 has the lowest complexity, with 58.79 M Flops and 1,519,906 Params, primarily due to its use of depthwise separable convolution, as discussed in Section 2.3. In contrast, VGG16 has the highest complexity among the models, with 15.53 G Flops and 134,277,186 Params, mainly due to its linear connection structure. Models such as ResNet and GoogLeNet fall between MobileNet and VGG in complexity, as they incorporate architectural features, such as the shortcut structure in ResNet and the Inception modules in GoogLeNet, that avoid a purely linear connection structure.
Table 4
Params and Flops of each model.
| Model | Flops | Params |
| --- | --- | --- |
| VGG16 | 15.53 G | 134,277,186 |
| ResNet18 | 1.82 G | 11,177,538 |
| ResNet34 | 3.67 G | 21,285,698 |
| ResNet50 | 4.12 G | 23,512,130 |
| MobileNetV2 | 318.96 M | 2,226,434 |
| MobileNetV3 | 58.79 M | 1,519,906 |
| GoogLeNet-InceptionV1 | 1.51 G | 5,601,954 |
| GoogLeNet-InceptionV3 | 2.85 G | 21,789,666 |
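Complexity figures of the kind reported in Table 4 can be reproduced with standard profiling tools. The following is a minimal sketch assuming PyTorch, torchvision model definitions and the third-party thop profiler; the 224 × 224 input resolution and the two-class output head are assumptions, and thop reports multiply-accumulate operations (MACs), which are often quoted interchangeably with Flops.

```python
import torch
from torchvision import models
from thop import profile  # pip install thop

# Assumed input resolution; the actual preprocessing in this study may differ.
dummy = torch.randn(1, 3, 224, 224)

candidates = {
    "VGG16": models.vgg16(num_classes=2),
    "ResNet18": models.resnet18(num_classes=2),
    "MobileNetV2": models.mobilenet_v2(num_classes=2),
    "MobileNetV3": models.mobilenet_v3_small(num_classes=2),
}

for name, model in candidates.items():
    model.eval()
    macs, _ = profile(model, inputs=(dummy,), verbose=False)  # MACs, not strict Flops
    n_params = sum(p.numel() for p in model.parameters())     # total parameter count
    print(f"{name}: {macs / 1e9:.2f} G MACs, {n_params:,} Params")
```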
Figure 9 shows the loss and accuracy curves of each CNN model during training and validation. Figure 9(a, b) shows the training and validation loss curves of each model as the epochs progress. Based on the training loss, the models can be divided into three groups: MobileNetV2, MobileNetV3 and InceptionV3 form the first group; ResNet18, ResNet34 and ResNet50 comprise the second group; VGG16 and GoogLeNet are in the third group. The first group starts to converge at approximately 15 epochs, the second group converges at around 10 epochs, and VGG16 begins to converge at around 20 epochs. After 50 epochs, the MobileNetV2, MobileNetV3 and InceptionV3 networks achieve the lowest training losses, with values of 0.00314, 0.004452 and 0.0007174, respectively. In Fig. 9(b), the validation loss curves show that all models start to converge before 10 epochs. Similar to Fig. 9(a), the models can be divided into three groups by validation loss. After 50 epochs, MobileNetV2, MobileNetV3 and InceptionV3 exhibit the lowest validation losses among the eight models, with values of 0.02386, 0.03566 and 0.08964, respectively.
Figure 9(c, d) shows the accuracy curves during training and validation for the eight models. In Fig. 9(c), except for the first group, the remaining five models show comparable accuracy, ranging from 0.8 to 0.9, while MobileNetV2, MobileNetV3 and InceptionV3 reach higher accuracy, between 0.95 and 1.0. In Fig. 9(d), the accuracy of ResNet, GoogLeNet and VGG16 is around 0.9, while MobileNet achieves an accuracy of about 1.0 and InceptionV3 is close to 0.95.
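The curves in Fig. 9 come from per-epoch bookkeeping of loss and accuracy on the training and validation sets. A condensed sketch of such a loop is shown below, assuming PyTorch with cross-entropy loss; `model`, `train_loader`, `val_loader` and `optimizer` are hypothetical placeholders for the networks and datasets of this study.

```python
import torch
import torch.nn as nn

def run_epoch(model, loader, criterion, optimizer=None, device="cpu"):
    """One pass over `loader`; trains if an optimizer is given, else evaluates."""
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            loss = criterion(logits, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # Accumulate sample-weighted loss and correct predictions.
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen

# Hypothetical usage over the 50 epochs reported in Fig. 9:
# for epoch in range(50):
#     tr_loss, tr_acc = run_epoch(model, train_loader, nn.CrossEntropyLoss(), optimizer)
#     va_loss, va_acc = run_epoch(model, val_loader, nn.CrossEntropyLoss())
```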
Overall, all the models converge well and perform strongly during the training and validation processes, and they also show good generalization ability during validation. Among them, the models of the first group exhibit the best performance in the modeling process. These trained models are further evaluated on the testing dataset, and the performance of each model can be analyzed in more depth using the confusion matrix, which reports the percentage of defect and non-defect fluorescent indications for each case.
Figure 10(a-h) displays the confusion matrices of the eight models on the testing set. As mentioned in Section 3.3, the TP, FN, FP and TN values can be read from these matrices. In aerospace precision castings, even a small defect can lead to significant losses. Therefore, in the detection process it is crucial to detect as many defects as possible, which means the TP value should be close to the number of true defects and FN should be close to 0. Based on this criterion, the fluorescent defect identification performance of the eight models is ranked as follows: MobileNetV2, MobileNetV3, InceptionV1, InceptionV3, ResNet50, ResNet34, ResNet18, VGG16.
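For reference, the four confusion matrix entries and the FN-driven criterion above can be computed with scikit-learn as in the sketch below; the label arrays are hypothetical stand-ins for the testing-set predictions, with 1 denoting the defect (positive) class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = defect (positive class), 0 = non-defect.
y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])

# With labels=[0, 1], rows are true classes and columns are predictions,
# so ravel() yields the entries in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# The criterion in the text: FN close to 0 (miss as few true defects as
# possible), i.e. maximize recall = TP / (TP + FN).
recall = tp / (tp + fn)
print(f"TP={tp} FN={fn} FP={fp} TN={tn} recall={recall:.3f}")
```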
However, cost reduction is also an important consideration, so the overall accuracy of defect detection must be taken into account as well; in this case, both FN and FP should be close to 0. From the confusion matrices in Fig. 10, the performance of the eight models under this criterion is ranked as follows: MobileNetV2, MobileNetV3, InceptionV3, ResNet18, ResNet34, InceptionV1, ResNet50, VGG16.
In both cases, MobileNetV2 consistently exhibits the best performance among the eight models. This can be attributed to the efficiency of the depthwise separable convolution and the inverted residual block in the feature extraction process [32].
The ROC curve plots the true positive rate against the false positive rate over varying discrimination thresholds [35]. Figure 11 shows the ROC curves obtained with the MobileNet, ResNet, GoogLeNet and VGG methods on the testing dataset. From Fig. 11, these models can be divided into two groups: MobileNetV2, MobileNetV3 and InceptionV3, and the remaining networks. In the first group, the three models exhibit similar capabilities at lower discrimination thresholds, with MobileNetV2 performing better at higher discrimination thresholds. In the second group, the ResNet18, ResNet34, ResNet50, VGG16 and InceptionV1 architectures have similar capabilities across all discrimination thresholds.
The area under the curve (AUC) of the ROC curve represents the classifier's ability to separate the classes: a larger AUC indicates better classification capability, meaning the classifier ranks more positive cases ahead of negative cases. The AUC scores for each architecture are listed in Table 5. Ranked by AUC from largest to smallest, the eight architectures are MobileNetV2, MobileNetV3, InceptionV3, ResNet18, ResNet50, ResNet34, InceptionV1 and VGG16, with corresponding AUC values of 0.999, 0.998, 0.988, 0.955, 0.954, 0.954, 0.952 and 0.950, respectively. Theoretically, a perfect classifier would have an AUC of 1. All these CNN architectures perform well on the binary task, with MobileNetV2 outperforming the rest.
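Curves of the kind shown in Fig. 11 are typically produced as in the following sketch, using scikit-learn; the score array here is a hypothetical placeholder for the softmax probability of the defect class produced by each trained model on the testing set.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical ground truth and defect-class probabilities.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.8, 0.9, 0.3, 0.2, 0.7, 0.6])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per threshold
auc = roc_auc_score(y_true, y_score)

plt.plot(fpr, tpr, label=f"model (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], "k--", label="chance")    # AUC = 0.5 baseline
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```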
Within the testing dataset, some false indications look more obviously defect-like than others. MobileNet is expected to exceed the other architectures once the system moves past these more obvious cases to the more difficult ones, as the inverted residual block is able to extract more complex features.
Precision-Recall (PR) analysis was conducted on four representative models from the three groups to investigate the effect of dataset imbalance on the fluorescent defect detection system, as shown in Fig. 12. The models trained on the balanced dataset exhibited superior performance compared to those trained on imbalanced datasets. In contrast to the balanced case, the models trained on imbalanced datasets showed similar performance to one another across all discrimination thresholds, whereas the models trained on balanced datasets split into two groups based on their performance at lower discrimination thresholds. This difference suggests that features may not have been adequately learned from the imbalanced dataset, leaving the imbalanced-trained models both less distinguishable from one another and weaker than the models trained on balanced datasets.
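A PR curve such as those in Fig. 12 can be computed as in the sketch below, again with scikit-learn and hypothetical score arrays; the PR curve is generally more sensitive than the ROC curve when the positive (defect) class is rare, which is why it is used here to probe dataset imbalance.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, average_precision_score

# Hypothetical defect-class probabilities on an imbalanced testing set.
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.3, 0.2, 0.6, 0.8, 0.9, 0.4, 0.7])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
ap = average_precision_score(y_true, y_score)  # area under the PR curve

plt.plot(recall, precision, label=f"model (AP = {ap:.3f})")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```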
Table 5 presents the classification evaluation metrics of the eight CNN architectures on the testing set; results for both the defect and non-defect types are listed. MobileNetV2 achieves the best performance in all metrics for the defect type, with 99.2% Precision, 99.2% Recall, 99.2% F1-score, 99.2% Accuracy and 0.999 AUC. These results indicate that MobileNetV2 exhibits the highest performance among the architectures considered, making it a strong candidate for the automated fluorescent defect detection system.
Table 5
Overall Precision, Recall, F1-score, Accuracy, AP and AUC of models on balanced testing set.
| Model | Type | Precision | Recall | F1-score | AP | AUC | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VGG16 | Defect | 87.2% | 88.9% | 88.0% | 0.945 | 0.950 | 87.8% |
|  | Non-defect | 88.5% | 86.7% | 87.6% | 0.959 | 0.951 |  |
| ResNet18 | Defect | 88.1% | 91.3% | 89.7% | 0.947 | 0.955 | 89.4% |
|  | Non-defect | 90.8% | 87.6% | 89.2% | 0.964 | 0.955 |  |
| ResNet34 | Defect | 86.6% | 92.1% | 89.2% | 0.945 | 0.954 | 88.8% |
|  | Non-defect | 91.4% | 85.5% | 88.4% | 0.964 | 0.954 |  |
| ResNet50 | Defect | 85.3% | 92.1% | 88.5% | 0.948 | 0.954 | 88.0% |
|  | Non-defect | 91.3% | 84.0% | 87.4% | 0.961 | 0.954 |  |
| MobileNetV2 | Defect | 99.2% | 99.2% | 99.2% | 0.999 | 0.999 | 99.2% |
|  | Non-defect | 99.2% | 99.2% | 99.2% | 0.999 | 0.999 |  |
| MobileNetV3 | Defect | 96.5% | 98.8% | 97.6% | 0.998 | 0.998 | 97.6% |
|  | Non-defect | 98.8% | 96.4% | 97.6% | 0.998 | 0.998 |  |
| GoogLeNet-InceptionV1 | Defect | 82.4% | 98.4% | 89.7% | 0.937 | 0.952 | 88.6% |
|  | Non-defect | 98.0% | 78.7% | 87.3% | 0.787 | 0.963 |  |
| GoogLeNet-InceptionV3 | Defect | 96.0% | 94.0% | 95.0% | 0.986 | 0.988 | 95.0% |
|  | Non-defect | 94.1% | 96.0% | 95.0% | 0.991 | 0.988 |  |
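Per-class metrics in the style of Table 5 can be assembled with scikit-learn as sketched below; the synthetic arrays mimic a 501-image testing set and are purely illustrative, with 1 denoting the defect class.

```python
import numpy as np
from sklearn.metrics import (classification_report, average_precision_score,
                             roc_auc_score)

# Hypothetical testing-set outputs: y_prob is the predicted probability of
# the defect class (label 1), y_true the ground truth.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 501)
y_prob = np.clip(y_true * 0.8 + rng.normal(0.1, 0.2, 501), 0, 1)
y_pred = (y_prob >= 0.5).astype(int)

# Per-class Precision, Recall and F1 as in Table 5 (label order: 0, 1).
print(classification_report(y_true, y_pred,
                            target_names=["Non-defect", "Defect"], digits=3))

# AP and AUC for the defect class; the non-defect class is scored with the
# complementary probability.
print("Defect AP :", average_precision_score(y_true, y_prob))
print("Defect AUC:", roc_auc_score(y_true, y_prob))
print("Non-defect AP:", average_precision_score(1 - y_true, 1 - y_prob))
```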
The good performance of the InceptionV3 architecture can be attributed to the InceptionV3 block, which effectively extracts features from the input images [30]. The MobileNetV2 and MobileNetV3 architectures perform even better among the considered architectures owing to the inverted residual block and depthwise convolution. However, MobileNetV2 slightly outperforms MobileNetV3, which could be due to the Neural Architecture Search (NAS) used to design MobileNetV3 not being optimally suited to the specific task of this study [31].
The overall performance of the ResNet architectures in this study aligns with the findings of Shipway et al. on a fluorescent crack defect dataset [11]. It is observed that as the ResNet architecture becomes deeper, the accuracy decreases while the recall increases. ResNet reaches an accuracy of about 89%, only slightly exceeding the VGG16 network in this study, which indicates that the residual module does not perform as effectively on the fluorescent defect dataset.
Interestingly, InceptionV1 shows excellent performance in recall (98.4%) but not in accuracy (88.6%). As shown in Fig. 10(e), 53 non-defect samples are misclassified as defect samples. This may be attributed to the InceptionV1 block's use of different-sized kernels to extract features at different scales [28]. Some non-defect samples contain backgrounds that resemble those of defect samples, making them difficult for the classifier to separate due to the similar background features.
Figure 13 shows the 2-D representation of the fluorescent images within the testing set, with the classes represented by color: red for defect indications and green for false indications. The initial distribution is semi-elliptical with small clusters, meaning that the images cannot be directly linearly classified. The high concentration of mixed defect and false indications in the bottom-left of the figure is the most important region to focus on for classification in industrial production.
Figure 14 shows the t-SNE visualization after feature extraction by four representative CNN models. Notably, the majority of the samples are well clustered, indicating effective feature extraction. However, for VGG16 and ResNet50, shown in Fig. 14(a, b), a small proportion of samples remain mixed together. Conversely, only a few false indication samples are mixed with the defect indications for MobileNetV2 and InceptionV3, as Fig. 14(c, d) shows. This validates the superior feature extraction of MobileNetV2 and InceptionV3 with respect to the VGG and ResNet models, and their increased accuracy in defect detection.
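Visualizations such as Figs. 13 and 14 can be generated as in the following sketch with scikit-learn's t-SNE; the feature matrix here is a random placeholder standing in for either raw pixels (Fig. 13) or penultimate-layer activations of a trained model (Fig. 14), and the perplexity value is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Hypothetical inputs: `features` would be flattened images or CNN
# activations; `labels` marks 1 = defect indication, 0 = false indication.
features = np.random.rand(200, 512)
labels = np.random.randint(0, 2, 200)

# Embed into 2-D; PCA initialization and a fixed seed keep runs repeatable.
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(features)

plt.scatter(emb[labels == 1, 0], emb[labels == 1, 1],
            c="red", s=8, label="defect indication")
plt.scatter(emb[labels == 0, 0], emb[labels == 0, 1],
            c="green", s=8, label="false indication")
plt.legend()
plt.show()
```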
To gain deeper insight into how the CNN models extract fluorescent defect features, Grad-CAM [36] was used to visualize the extracted features, taking the last feature extraction layer before the classifier as the input to the visualization. As shown in Fig. 15, the Grad-CAM (GC) heat map, Guided Grad-CAM (GGC) image and feature fusion image are used to visualize the features extracted by the models. Figure 15(a) reveals that the final extracted feature resembles a hollow polygon. Combined with Fig. 15(b), the GGC image indicates that the extracted features correspond to the boundaries of the fluorescent display, with a probability of being defective of 0.87 (Fig. 15(c)); VGG16 thus pays more attention to boundary features during classification. For ResNet50, shown in Fig. 15(d-f), the behavior of the extracted features is somewhat similar to that of VGG16, but it focuses more on the lower-right boundary. Conversely, InceptionV3 concentrates more on the bright region and the upper boundary of the display, as depicted in Fig. 15(g-i). For MobileNetV2, illustrated in Fig. 15(j-l), the GC heat map and GGC image show that the features are distributed at the top and bottom in an oval-like shape, with a high probability of being defective of 0.99. This suggests that MobileNetV2 pays particular attention to the upper and lower boundaries of the fluorescent display and gradually expands towards the center of the bright region.
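A GC heat map of the kind shown in Fig. 15 can be produced with a few lines of PyTorch using forward and backward hooks. The sketch below is a generic Grad-CAM implementation under stated assumptions, not the exact code of this study; the choice of `target_layer` (the last convolutional block, e.g. `model.features[-1]` for torchvision's MobileNetV2) is an assumption.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Grad-CAM heat map for one image of shape (1, 3, H, W);
    `target_layer` is the last convolutional block before the classifier."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))

    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()  # explain the predicted class
    model.zero_grad()
    logits[0, class_idx].backward()              # gradients w.r.t. that class
    h1.remove(); h2.remove()

    # Channel weights: global-average-pooled gradients; CAM: weighted sum of
    # activations, ReLU'd and upsampled to the input resolution.
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:],
                        mode="bilinear", align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze().detach()

# Hypothetical usage with a trained torchvision MobileNetV2:
# model.eval()
# heatmap = grad_cam(model, img_tensor, model.features[-1])
```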
The attention paid during feature extraction varies among the four models, but there are commonalities, particularly the boundary features and the texture of the bright region. These features are especially prominent in InceptionV3 and MobileNetV2, which also exhibit the better performance, and they are similar to the cues used in current guidelines for manual inspection. However, the visualization of features extracted by CNN models enables more precise localization of the upper and lower boundary features of fluorescent displays, as well as the texture of bright regions, providing valuable guidance for the design of white-box fluorescent defect display features.