This section describes how the experimental data were generated and how each model was constructed and trained. We also explain the tests conducted to assess the performance of the models and how the results were verified.
4.1 Problem Definition for Case Study Area
The case study organisation for this research is an aircraft manufacturer, and the studied processes are drawn from the wing assembly stage. The current installation process for components on the inside of aircraft wings requires workers to place and secure them in defined positions, as specified by the wing frame suppliers. A component of particular difficulty for the case study organisation's engineers is the wing bracket, which secures electrical and hydraulic lines (Fig 5). The engineer must adhere the bracket to the wing, within the marked boundary, with only limited visibility. A correct application requires that the bracket sit within a 7mm tolerance of the marked boundary and that adhesive flow around the whole circumference of the bracket base. Fig 5 shows these conditions for installation.
In the case study organisation’s drive to automate the verification of aircraft wing assembly, the installation of these brackets will be a key task. The automatic verification of the bracket installation process will have to respect the criteria that the human operators adhere to. Therefore, the proposed solution must be able to determine whether the 7mm tolerance is maintained around the edge of the bracket and whether there is complete adhesion around the whole circumference of the base. If both criteria are met, the bracket can be considered installed correctly. If either criterion is violated, the solution will have to flag the bracket for a second inspection by a human operator, followed by a corrective procedure if necessary.
4.2 Component Representation and Data Generation
A decision was taken to generate a custom dataset for this project because of the lack of open-source datasets containing the brackets used in aircraft wing installation. A suitable mock setup of the bracket installation process was developed: the bracket was represented by a replica built from LEGO® bricks, and the adhesive applied to the bracket base was represented by modelling compound. Modelling compound was chosen over a true adhesive for its ease of removal and reapplication. A solid blank background with a slight shine and texture was used to represent the wing surface, and a suitable boundary marking was applied to locate the placement of the bracket (see Fig 5).
Experimental data was generated by taking images of the bracket installed both correctly and incorrectly. Consideration was given to the subtleties in each of these cases, to identify the sub-cases that might exist within both and to ensure that data was gathered on as many variations of the installation as possible, thereby supporting generalisation of the trained model.
For the correct installation case, this involved all those installations which met the criteria as discussed previously. Special attention was paid to the edge cases, where the bracket was just within or outside the tolerance range from the marking. It was expected that these would be the most difficult for the models to distinguish between correct and incorrect installation, so sufficient images of these were obtained to support any conclusions drawn from the results. In the incorrect installation case, there were several sub-cases identified that the model would have to learn as belonging to the incorrect distribution of images. These were as follows: (1) Inside guide, no adhesion; (2) Outside guide, correct adhesion; (3) Outside guide, incorrect adhesion in one location; (4) Outside guide, incorrect adhesion in many locations; (5) Outside guide, no adhesion (Fig 6).
Images were obtained for each of these sub-cases, again with attention given to including the edge cases. By ensuring that all these cases were covered in the data generation, the risk of inherent bias in the dataset was mitigated; hence, any results obtained in testing would be representative of what would be expected given a true dataset. The dataset was validated by an aerospace expert as representative of what would be expected on a true production line. The images taken were RGB images of shape (256, 256, 3). For training, there were 70 images of correctly installed brackets and 70 images of incorrectly installed brackets, totalling 140 training images.
To replicate a low-data scenario, the 70 images of incorrectly installed brackets comprised 10 images of each of the 7 sub-classes identified previously. For validation, there were 100 images of correctly installed brackets and 100 images of incorrectly installed brackets, totalling 200 validation images. Furthermore, a framework was created to procedurally load the generated images and perform random augmentations. This included the affine transforms of translation, rotation, scaling, shear, and horizontal and vertical flipping, and the colour-space transform of scaling image brightness. The images were also normalised to aid the gradient descent algorithm in minimising the loss function.
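As an illustration, the sketch below shows one way such a loading and augmentation framework could be set up with the Keras ImageDataGenerator; the directory layout (data/train with one sub-folder per class) and the parameter ranges are assumptions made for illustration, not the exact values used in this study.

```python
# Illustrative sketch only: directory layout and augmentation ranges are assumed.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,            # normalise pixel values to aid gradient descent
    rotation_range=15,            # affine transform: rotation (degrees)
    width_shift_range=0.1,        # affine transform: horizontal translation
    height_shift_range=0.1,       # affine transform: vertical translation
    zoom_range=0.1,               # affine transform: scaling
    shear_range=0.1,              # affine transform: shear
    horizontal_flip=True,         # affine transform: horizontal flip
    vertical_flip=True,           # affine transform: vertical flip
    brightness_range=(0.8, 1.2),  # colour-space transform: brightness scaling
)

train_generator = train_datagen.flow_from_directory(
    "data/train",                 # hypothetical path: one sub-folder per class
    target_size=(256, 256),
    batch_size=16,
    class_mode="binary",          # correct vs incorrect installation
)
```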
4.3 Baseline CNN Model
A baseline CNN architecture was implemented (Fig 7) and trained as a binary classifier. Each convolutional layer was followed by batch normalisation, a ReLU activation function, 30% dropout, and a Keras max-pooling layer with default settings. Optimisation of the model was achieved using the Adam algorithm with a learning rate of 0.0001 and a binary cross-entropy loss function. Training was performed over 25 epochs with a batch size of 16. Images of either correctly installed or incorrectly installed brackets were passed to the network during training, along with their class labels. The model predicted the probability of the image belonging to the positive class, which in this case was an image showing an incorrectly installed bracket. The default threshold of 0.5 was applied to determine the binary class prediction.
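A minimal sketch of a baseline CNN of this kind is given below, using tf.keras. The number of convolutional blocks and the filter counts are assumptions made for illustration (the actual architecture is given in Fig 7); the per-block structure and training options follow the description above.

```python
# Sketch of the baseline binary classifier; block/filter counts are assumed.
from tensorflow.keras import layers, models, optimizers

def conv_block(x, filters):
    x = layers.Conv2D(filters, (3, 3), padding="same")(x)
    x = layers.BatchNormalization()(x)   # batch normalisation after each conv layer
    x = layers.ReLU()(x)                 # ReLU activation
    x = layers.Dropout(0.3)(x)           # 30% dropout
    return layers.MaxPooling2D()(x)      # Keras max pooling, default settings

inputs = layers.Input(shape=(256, 256, 3))
x = conv_block(inputs, 32)
x = conv_block(x, 64)
x = conv_block(x, 128)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # P(incorrectly installed)

baseline_cnn = models.Model(inputs, outputs)
baseline_cnn.compile(
    optimizer=optimizers.Adam(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# baseline_cnn.fit(train_generator, epochs=25)  # batch size of 16 set in the generator
```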
4.4 SNN Model
The SNN architecture was implemented with two copies of the baseline CNN (Fig 7) as the twin networks. The dense feature layer of each twin was also followed by 30% dropout; however, no dropout was added to the L1 distance layer, since this layer has no learnable parameters and hence no risk of overfitting. Optimisation of the model was achieved using the same parameters as for the baseline CNN model.
The SNN was trained by passing pairs of images to the twin CNNs (see Section 4.6 for further detail on the selection of this pair), along with a label indicating whether or not they were of the same class. The identical CNNs with shared weights would then perform the same feature extraction on the images, producing feature vectors to be compared in the L1 distance layer. The distance between the feature vectors would be calculated as in Equation 1, where p and q are the two n-dimensional feature vectors. This distance would then be converted to a similarity score at the output layer by passing it through a sigmoid function, where a value closer to 1 denoted a higher confidence that the images were of the same class. A default threshold of 0.5 was used to determine whether the final decision was that the images were of the same class or not.
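The sketch below illustrates one common way of wiring the twin networks, the L1 distance layer, and the sigmoid output in tf.keras; the feature extractor (block and layer sizes) is assumed for illustration rather than taken from Fig 7.

```python
# Sketch of the SNN: shared twin, element-wise L1 distance, sigmoid similarity score.
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

def build_twin():
    # Assumed feature extractor; in the study this is the baseline CNN of Fig 7.
    inp = layers.Input(shape=(256, 256, 3))
    x = inp
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, (3, 3), padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.Dropout(0.3)(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128)(x)      # dense feature layer (size assumed)
    x = layers.Dropout(0.3)(x)    # 30% dropout on the dense feature layer
    return models.Model(inp, x)

twin = build_twin()               # one instance => shared weights for both inputs
input_a = layers.Input(shape=(256, 256, 3))
input_b = layers.Input(shape=(256, 256, 3))
feat_a, feat_b = twin(input_a), twin(input_b)

# L1 distance (Equation 1): element-wise |p - q|; no learnable parameters, no dropout.
l1_distance = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([feat_a, feat_b])
similarity = layers.Dense(1, activation="sigmoid")(l1_distance)  # similarity score

snn = models.Model([input_a, input_b], similarity)
snn.compile(optimizer=optimizers.Adam(learning_rate=1e-4), loss="binary_crossentropy")
```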
4.5 Model Training with Transfer Learning
The VGG16 model was selected for use in this study due to its proven ability to work well in transfer learning. The VGG16 architecture (shown in Fig 8) is simple when compared to other state-of-the-art CNNs, such as ResNet [22] and Inception [23], which enabled us to focus on evaluating the overarching SNN architecture rather than the specifics of the underlying feature extractor. The input layer of the VGG16 model was replaced with a new input layer suitable for the images in the bracket installation dataset. The earlier layers of the VGG16 model were then kept fixed throughout training (i.e., given a learning rate of 0), but the last block of convolutional layers was left trainable. This would allow specific high-level features of the brackets and adhesive to be learned. Finally, a randomly initialised dense layer of 128 units was added to the end of the VGG16 model to create the feature vector, and the same output layer as the baseline CNN (Section 4.3) was added. The dense layer after the VGG16 model had a dropout of 50%, and the model was optimised using Adam with a learning rate of 0.0001 and a binary cross-entropy loss function. The model was trained over 25 epochs with a batch size of 16.

For the SNN, each of the twin CNNs was replaced with the VGG16 model, with the input layer of each modified as above. The outputs of each VGG16 model were then passed to separate dense layers of 1024 units with 30% dropout to create the feature vectors. The remainder of the model stayed the same as for the base SNN, and the model was optimised using the same optimiser and training options as before.
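A hedged sketch of the transfer-learning setup for the single-CNN case is given below, using the VGG16 model from tf.keras.applications with ImageNet weights; the pooling choice and the exact mechanism used to freeze layers are assumptions consistent with the description above.

```python
# Sketch of the VGG16 transfer-learning classifier; freezing mechanism assumed.
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16

# New input shape matching the bracket installation images.
base = VGG16(weights="imagenet", include_top=False, input_shape=(256, 256, 3))

# Freeze the earlier layers; leave the last convolutional block (block5) trainable
# so that specific high-level features of the brackets and adhesive can be learned.
for layer in base.layers:
    layer.trainable = layer.name.startswith("block5")

x = layers.Flatten()(base.output)
x = layers.Dense(128, activation="relu")(x)  # randomly initialised feature layer
x = layers.Dropout(0.5)(x)                   # 50% dropout after the dense layer
outputs = layers.Dense(1, activation="sigmoid")(x)

vgg_classifier = models.Model(base.input, outputs)
vgg_classifier.compile(
    optimizer=optimizers.Adam(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# Trained for 25 epochs with a batch size of 16; for the SNN variant the same base
# would feed a 1024-unit dense layer with 30% dropout in each twin.
```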
4.6 SNN Input Pair Consideration
The two images selected at each training step for an SNN are typically randomly sampled from the available classes. In this study, the image pairs could be one of: (correctly installed, correctly installed), (correctly installed, incorrectly installed), (incorrectly installed, correctly installed), or (incorrectly installed, incorrectly installed). Initially, this default method of selecting the images was used.
However, following on from our experimental results, the method of passing in the data was revisited. As discussed previously, the incorrectly installed bracket case had several sub-cases, each of which could differ significantly from the others. It was therefore hypothesised that attempting to label images from different incorrect sub-cases as being similar would cause optimisation problems for the SNN. An additional test was therefore conducted to investigate the performance impact of the input image pair choice. For this, one of the input images was always a reference image of a correctly installed bracket, and it was always presented to the same input of the network. The other input image could be of either a correctly installed bracket or any sub-case of incorrectly installed bracket. This method of passing the input images is referred to as the “custom input image pair” method. It was justified by the way in which the SNN would be used in deployment: one image would be a reference image of a correctly installed bracket and the second would be of a newly installed bracket, and the model would then verify whether the bracket had been installed correctly by comparing the images.
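The difference between the default pairing and the custom input image pair can be illustrated with the short sketch below; the function and variable names are hypothetical, and the image lists are assumed to be pre-loaded arrays.

```python
# Hypothetical pair-sampling helpers illustrating the two strategies described above.
import random

def default_pair(correct_imgs, incorrect_imgs):
    """Default pairing: both inputs drawn at random from either class."""
    pool = [(img, 0) for img in correct_imgs] + [(img, 1) for img in incorrect_imgs]
    (img_a, cls_a), (img_b, cls_b) = random.sample(pool, 2)
    return (img_a, img_b), int(cls_a == cls_b)   # label 1 if same class

def custom_pair(correct_imgs, incorrect_imgs):
    """Custom input image pair: the first input is always a correctly installed
    reference image; the second may show a correct or incorrect installation."""
    reference = random.choice(correct_imgs)
    if random.random() < 0.5:
        other, same_class = random.choice(correct_imgs), 1
    else:
        other, same_class = random.choice(incorrect_imgs), 0
    return (reference, other), same_class
```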
4.7 Edge Case Consideration
The edge case images were of particular interest in this study, as it was expected that these would be where most misclassifications would occur. The edge cases are those where the bracket is placed just within or just outside the tolerance of the marked boundary, with correct adhesion. All the edge case images in the original dataset were moved into a separate dataset, and additional images were then captured to create a suitably sized dataset specifically for the edge cases. The new edge case dataset had 50 images each of correctly and incorrectly installed brackets for training, giving 100 training images. For validation, there were 40 images of each case, giving 80 images in total.
4.8 SNN Similarity Voting
Ensemble methods involve training multiple models on subsets of the training data and then voting on predictions for the same test data. When verifying whether a new test image is the same as a known reference image, the similarity score will depend on the reference image used. This means that the resulting decision of whether the images are the same or not could change depending on which reference image is used and how similar it is to the test image provided. When deployed on aircraft assembly lines, verification will be done by providing the SNN with a reference image of a correctly installed bracket together with an image taken of the newly installed bracket. It is expected that, following training of the model, if the test image and a similar reference image were passed through the SNN, they would be identified as being similar, and the newly installed bracket would be verified as having been installed correctly.
If, however, an image of an incorrectly installed bracket were used as the reference image, the similarity score generated may be more uncertain and lie close to the threshold value of 0.5, and the bracket may then be identified as being installed incorrectly. Furthermore, using bagged model voting as a reference point, the hypothesis here is that the dependency of the verification result on the reference image used would be reduced if the test image were compared to multiple reference images. Given the limited distribution of possible reference images, the test image should be similar to more reference images than not; hence, when compared to enough reference images, the majority of the outcomes would be expected to indicate that the images are similar, and a majority voting rule would then produce the correct result. Fig 9 shows the method used at testing to implement this voting scheme. Compared to the typical bagging method, multiple test input pairs are evaluated on a single model and a vote taken on the final outcome, rather than evaluating multiple models on a single test input.
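A minimal sketch of this voting scheme at test time is given below, assuming a trained two-input SNN as in the earlier sketches; the function name and the set of reference images are hypothetical.

```python
# Hypothetical majority-voting verification over multiple reference images (Fig 9).
import numpy as np

def verify_installation(snn, test_image, reference_images, threshold=0.5):
    """Return True if the majority of reference comparisons judge the test image
    to be similar to a correctly installed bracket."""
    votes = []
    for ref in reference_images:
        pair = [ref[np.newaxis], test_image[np.newaxis]]   # batch dimension of 1
        score = snn.predict(pair, verbose=0)[0, 0]          # similarity score
        votes.append(score >= threshold)                    # similar => vote 'correct'
    return sum(votes) > len(votes) / 2                       # majority voting rule
```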
4.9 Testing and Evaluation
To evaluate how the changes made improved or hindered performance, suitable metrics were required to assess the models. Though the SNN performed image verification rather than classification, the typical classification performance metrics still applied; in particular, the accuracy, precision, and recall metrics were selected. The accuracy metric was selected as an initial indicator of model performance, which allowed the training progress to be monitored and simple comparisons to be made between models. Accuracy by itself, however, was not a sufficient indicator of performance, and so the precision and recall metrics were also observed.
The chosen metrics enabled quantitative assessment of model performance and also carried important industrial implications. The precision of any proposed model indicates what proportion of brackets identified as being incorrectly installed actually were incorrectly installed; from another perspective, it can be inferred from this value how often a bracket flagged as incorrectly installed was in fact correctly installed.
This quantity, known as the False Discovery Rate (FDR), is calculated directly from the precision metric. Aerospace companies would be interested in these two metrics, as they indicate how often a human operator would have to give a second opinion on the automated check. If the precision was too low, or equivalently the FDR was too high, then the solution would be returning many false positives (FPs) and would not be cost effective, as human operators would still be a common necessity in the procedure. Perhaps the most important metric in a safety-critical sector like Aerospace is the recall. Practically, this metric shows what proportion of incorrectly installed brackets are identified as such. From a different perspective, the number of incorrectly installed brackets falsely identified as being correctly installed can be evaluated by another metric, the False Negative Rate (FNR). A low recall or high FNR would indicate that many incorrectly installed brackets were passing through undetected, which could have severe consequences should aircraft be put into operation with these on board.
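For reference, with the incorrectly installed class taken as the positive class, these quantities follow the standard definitions

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{FDR} = \frac{FP}{TP + FP} = 1 - \text{Precision},$$

$$\text{Recall} = \frac{TP}{TP + FN}, \qquad \text{FNR} = \frac{FN}{TP + FN} = 1 - \text{Recall},$$

where TP, FP, and FN denote true positives, false positives, and false negatives respectively.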
4.10 Results Verification on a Comparable Dataset
In order to give confidence in the results obtained using the custom dataset, the decision was made to repeat the experiments on a well-known baseline dataset, so that it could be seen whether the same trends were present in the results. This was an important additional test for this study, where the data used was created specifically for the task: it would provide evidence that the previous results observed on the hand-crafted dataset were not due to any bias or over-simplicity in the data. The Omniglot dataset (see Fig 10) was selected for this, as it is a popular benchmark for SNNs [12].
Modifications to the dataset were necessary to formulate the problem in the same way as the bracket verification task. To make the dataset sufficiently similar to the bracket data, two alphabets that were similar in nature to each other were chosen, namely the Latin and Greek alphabets. This maintained the similarity that was present between the correctly and incorrectly installed brackets. Examples of similar and dissimilar images for this task are shown in Fig 10, where in this case similarity refers to whether or not the characters come from the same alphabet. With the new dataset, the task was to perform verification at the alphabet level: each alphabet was analogous to the correctly or incorrectly installed bracket class, and each character was analogous to one of the different sub-classes of bracket installation described previously.
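To illustrate the analogy, a short hypothetical sketch of how such alphabet-level pairs could be labelled is given below; the loading of the Omniglot character images is assumed to have been done elsewhere, and the function name is illustrative.

```python
# Hypothetical labelling of Omniglot pairs at the alphabet level (Latin vs Greek).
import random

def make_alphabet_pair(latin_imgs, greek_imgs):
    """Sample an image pair and label it 1 if both characters come from the same
    alphabet (analogous to 'same installation class'), else 0."""
    pools = {"Latin": latin_imgs, "Greek": greek_imgs}
    alpha_a = random.choice(list(pools))
    alpha_b = random.choice(list(pools))
    img_a = random.choice(pools[alpha_a])
    img_b = random.choice(pools[alpha_b])
    return (img_a, img_b), int(alpha_a == alpha_b)
```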