Steel is one of the most important of all metals in terms of both the volume and the variety of its use. It has played a vital role in advancing industrial societies and is often used as a key indicator of a country's level of development [1]. According to the World Steel Association, crude steel production in 2022 was 1,885,738 thousand tons (approximately 1,886 Mt). Compared with similar materials, steel stands out for its cost-effective production process. Extracting iron from ore demands only a quarter of the energy required for aluminium extraction, making steel an energy-efficient choice. With 5.6% of the Earth's crust composed of iron, it offers a stable source of raw materials. Steel production surpasses the combined production of all non-ferrous metals by a factor of 20 [2]. Steel's exceptional strength has revolutionized construction, enabling towering skyscrapers and intricate bridges. It underpins the transportation industry, providing both stability and flexibility for modern designs. Beyond its mechanical strength, steel drives industrialization as the material of machinery and tools. In the energy sector, it is vital for power plants and corrosion-resistant pipelines. Steel is also recyclable, which reduces resource consumption and waste and, in an era of heightened environmental consciousness, lets it play a crucial role in promoting sustainability across various sectors.
Quality issues with flat steel can result in substantial economic losses and damage the reputation of steel manufacturers. In the case of thin and wide flat steel, surface defects pose the most significant risk to product quality. Even in situations where internal defects occur sporadically, there is a high likelihood of visible changes in the surface's appearance [3]. Surface defects on steel components, originating from various sources including manufacturing, handling, and environmental exposure, have detrimental effects. Cracks, scratches, and inclusions create stress points, diminishing load-bearing capacity and increasing the risk of premature failure. These defects also heighten susceptibility to corrosion, weakening structural integrity. In industries where aesthetics matter, such as architecture and automotive manufacturing, surface irregularities mar visual appeal. Moreover, defects disrupt manufacturing processes, leading to higher costs, material wastage, and reduced efficiency. They adversely impact wear resistance, corrosion resistance, fatigue strength, and other vital properties of steel. Therefore, detecting these defects is crucial for safety, reliability, and cost-effectiveness.
Defect inspection is usually performed manually in industry, which is unreliable and time-consuming. To replace this manual work, it is desirable to have a machine automatically inspect steel plates for surface defects using computer vision technologies [4]. Traditional methods [5] also suffered from low accuracy and high labour intensity. Conventional machine learning was a significant leap forward from manual examination. It typically began with manual feature extraction; the extracted features were then fed into a classifier to perform defect classification. Because it depended on manually formulated feature extraction rules, this approach had weak robustness and limited ability to adapt to new situations. It was easily affected by external factors and noise, which diminished the accuracy of defect detection.
Since 2012, convolutional neural networks (CNNs) have been the standard model for vision tasks in the field of computer vision [6]. Since then, object detectors have been broadly classified as either single-stage detectors or region-based (two-stage) detectors. The YOLO family [11, 13–16] is a representative example of single-stage detection, and the R-CNN family [19–21] represents the two-stage detection algorithms. Deep learning is now applied in a variety of industrial research fields because it can exploit the latent characteristics of the data without requiring them to be designed manually. Luo et al. [7] introduced an algorithm for detecting surface defects using YOLO feature enhancement. This approach enhanced detection speed but exhibited limited accuracy. In a separate study, Liu et al. [8] presented YOLO-SO, a method for identifying insulators and detecting their defects in aerial images that extends the YOLO algorithm with attention mechanism modules. By combining these components, they addressed issues related to both the speed and the accuracy of insulator defect detection. Shun et al. [9] experimented with defect detection using the YOLOv5 model, which improved accuracy significantly compared with the YOLOv4 model. However, several hurdles still need to be addressed before further improvements can be made.
The typical imperfections found on steel surfaces include crazing, inclusions, patches, pitting, rolled-in scale, and scratches. These can be seen in Fig. 1. Defects within a single class differ considerably in appearance; for example, the scratches category contains horizontal, vertical, and slanted scratches. Meanwhile, defects from different classes, such as rolled-in scale, crazing, and pitted surfaces, have similar properties to each other. Variations in lighting and material characteristics within grayscale images make it exceptionally difficult to distinguish defects that look similar across different categories [10]. Furthermore, given the diverse range of steel surface defects, some defects may overlap in location and share similar features. Typical classification tasks focus on assigning a defect to the single category with the highest confidence level, which leads to less precise classification outcomes [4].
Therefore, to address poor detection and classification performance and to improve on existing methods for detecting defects on steel surfaces, we introduce AFF-YOLO (Attention and Feature Fusion based YOLOv5). All the modifications are made in the neck of the architecture. We incorporate an efficient channel attention (ECA-Net) mechanism into the backbone network, connect it in parallel to the C3 module, and term the result ECA-C3. This helps the YOLOv5 network suppress distracting information and concentrate on useful target objects.
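The channel attention idea behind ECA can be illustrated with a minimal, dependency-free sketch. It shows only the gating step as described in the ECA-Net literature (global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling); it does not reproduce the ECA-C3 wiring of our network. The function names `eca_kernel_size` and `eca_gate` and the fixed convolution weights are illustrative assumptions; in a trained network the weights are learned.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: nearest odd value derived from log2(channels)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_gate(feature_map, conv_weights):
    """Apply ECA-style channel attention.

    feature_map:  list of C channels, each an H x W nested list of floats.
    conv_weights: odd-length list of 1D convolution weights
                  (hypothetical fixed values here; learned in practice).
    Returns (rescaled feature map, per-channel gates).
    """
    k = len(conv_weights)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    pad = k // 2

    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_map]

    # Local cross-channel interaction: 1D convolution with zero padding.
    padded = [0.0] * pad + pooled + [0.0] * pad
    gates = []
    for i in range(len(pooled)):
        s = sum(conv_weights[j] * padded[i + j] for j in range(k))
        gates.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate in (0, 1)

    # Excite: rescale every value of a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates
```

Because the gate is computed from a small neighbourhood of channels rather than a fully connected layer, the number of extra parameters equals the kernel size, which is why this kind of attention adds little overhead to the detector.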
In the body, along with the ECA attention module, we introduce BiFPN feature fusion. BiFPN is a component introduced in the EfficientDet object detection architecture [12]. It is designed to address some limitations of the traditional Feature Pyramid Networks (FPN) used in object detection models. BiFPN enhances the feature pyramid by introducing bidirectional connections and combining different levels of features in a more adaptive manner. This allows information to flow both up and down the pyramid, helping to capture more context and detail at various scales. Adaptive spatial feature fusion (ASFF) is added in the prediction head to fuse features of different scales. We evaluated our algorithm on the NEU-DET dataset and compared it with the original YOLOv5 algorithm. By including additional attention mechanisms and multi-scale feature fusion techniques, our model outperforms the original YOLOv5 architecture. Our contributions are listed below:
- We introduce the ECA attention module within the body of the network, replacing the original C3 convolution modules. This module is referred to as the ECA-C3 module.
- We upgrade the conventional feature pyramid network by introducing BiFPN feature fusion in the body of the network.
- We attach the ASFF module before the prediction head to fuse features of varied scales and improve the network's learning.
- Our model outperforms the original YOLOv5 model, showing a 6.5% increase in mAP.
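The two fusion schemes named in the contributions can be sketched in a few lines of plain Python. The first function follows the fast normalized fusion used by BiFPN in EfficientDet [12], where each input gets one non-negative scalar weight. The second follows the ASFF idea of a softmax over per-pixel scores, so that the weights at every spatial location sum to one. This is a simplified illustration under stated assumptions, not the implementation used in AFF-YOLO: the feature maps are assumed to be single-channel and already resized to a common resolution, and the names `fast_normalized_fusion`, `asff_fusion`, `weights`, and `logits` are hypothetical. In a real network the weights and logits are learned.

```python
import math


def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fusion: one scalar weight per input feature map.

    features: list of same-shape H x W nested lists.
    weights:  list of floats, one per feature map (learned in practice).
    """
    w = [max(0.0, x) for x in weights]   # ReLU keeps weights non-negative
    total = eps + sum(w)                 # eps avoids division by zero
    rows, cols = len(features[0]), len(features[0][0])
    return [[sum(w[i] / total * features[i][r][c]
                 for i in range(len(features)))
             for c in range(cols)]
            for r in range(rows)]


def asff_fusion(features, logits):
    """ASFF-style fusion: a softmax over levels at every pixel.

    features: list of same-shape H x W nested lists (already resized).
    logits:   list of same-shape H x W score maps, one per level
              (produced by small convolutions in practice).
    """
    rows, cols = len(features[0]), len(features[0][0])
    fused = []
    for r in range(rows):
        fused_row = []
        for c in range(cols):
            scores = [lvl[r][c] for lvl in logits]
            m = max(scores)                          # for numerical stability
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            fused_row.append(sum(e / z * f[r][c]
                                 for e, f in zip(exps, features)))
        fused.append(fused_row)
    return fused
```

The difference between the two is the granularity of the weights: the BiFPN-style fusion learns one weight per input level, while the ASFF-style fusion lets the contribution of each level vary from pixel to pixel.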
The remainder of this paper follows a structured organization. In Section 2, we delve into the existing body of work related to our research. Section 3 offers an in-depth exploration of both the methodology employed in the original YOLOv5 model and the enhancements introduced in our novel AFF-YOLO. In Section 4, we engage in a detailed discussion of our experiments and the outcomes obtained while working with the NEU-DET dataset. This section also includes an ablation study to shed further light on our findings. Finally, Section 5 summarizes the conclusions we have drawn from our experimentation and outlines future improvements.