The oceans are a vast energy reservoir and an indispensable resource for humankind's sustainable development. Their rich resources can be rationally developed and utilized, and oceans, lakes, and rivers together provide the environment for human economic production. Aquaculture is a rapidly expanding agricultural sector that contributes significantly to both the fishery economy and the national economy. It can strongly drive the economic development of coastal areas and gradually become a pillar industry. Globally, the overall production of farmed aquatic products is rising every year. The scale of farming and the quality of its output directly affect the aquaculture industry's income. Traditional low-density, low-yield aquaculture methods can no longer deliver the product quality and yield that the developing marine economy demands. As a result, the aquaculture industry has adopted modern information technology and intelligent technology to optimize the aquaculture production chain. In particular, computer vision technologies are crucial for building a new style of aquaculture that modernizes and automates production, improves production efficiency, promotes the development of the fishery economy, and saves labor and material resources.
Since 2014, deep learning-based target detection technology has advanced quickly, and its range of applications keeps growing, including road surface collapse detection [1], crop pest detection [2], ship target detection [3], and automatic driving [4]. Mainstream underwater target recognition currently relies on sonar image detection [5] and underwater optical image detection [6–8]. Sonar equipment is expensive and causes a certain degree of noise pollution that affects the growth and development of aquatic organisms, so sonar-based underwater target recognition is poorly suited to aquaculture. By contrast, optical image processing applied to images captured by underwater cameras can recognize submerged targets accurately. Underwater cameras mounted on underwater robots and other automated equipment can feed and harvest specific types of aquatic products without polluting the underwater environment. As technology advances, the precision and frame rate of these cameras continue to improve, and they can satisfy the automation requirements of the aquaculture sector.
Target detection techniques built on deep learning have become increasingly popular because of deep learning's rapid progress and superior performance. Classical target detection algorithms fall mainly into single-stage and multi-stage approaches. Single-stage approaches include the YOLO (You Only Look Once) series [9], SSD [10], and RetinaNet [11]. A single-stage algorithm detects targets after extracting features only once, so its recognition speed is faster, but its accuracy is lower than that of multi-stage methods. Multi-stage methods include RCNN [12], Fast RCNN [13], Faster RCNN [14], and Mask RCNN [15]. These methods first extract candidate bounding boxes from the input image and then refine the candidate regions in a second pass to obtain the final detection results. This increases detection accuracy at the cost of inference speed, which suits domains that demand high accuracy but do not require real-time performance. Even though its prediction accuracy is still lower than that of multi-stage algorithms, the YOLO series has become a mainstay in industry thanks to its faster inference. The YOLOv5 network adds adaptive anchor box calculation and adaptive image scaling on top of YOLOv4 [16]. It also scales the CSPNet [17] structure in the backbone network to provide four model sizes, S, M, L, and XL, which can be applied in various fields to meet different needs. Compared with the more recent YOLOv7 and YOLOv8 networks, YOLOv5 is more mature, and its network structure is more stable and robust.
The model studied in this paper must perform well in real underwater scenes with a certain degree of real-time capability, which places high demands on its stability and flexibility. Given these considerations, this paper chooses YOLOv5 as the base network of the model. Because feature extraction capability is crucial for a target detection model, this paper replaces the original PANet [19] structure in the YOLOv5 neck network with an improved weighted bi-directional feature pyramid network (BiFPN [18]) to achieve a higher level of multi-scale feature fusion.
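To make the idea of weighted feature fusion concrete, the following is a minimal sketch of the fast normalized fusion rule used in the original BiFPN design, in which each input feature map receives a non-negative weight and the weights are normalized by their sum. It is an illustration only and not the improved variant proposed in this paper: the function name `fast_normalized_fusion`, the nested-list representation of feature maps, and the example weights are all assumptions. In a real network the weights are learned during training, the inputs are first resized to a common resolution, and the fused map is passed through a convolution.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped 2D feature maps with normalized non-negative weights.

    features: list of 2D maps (lists of rows of floats), all the same shape.
    weights:  one scalar per map (learned in a real network; fixed here).
    eps:      small constant that avoids division by zero.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")

    # ReLU keeps every weight non-negative, so the fusion is a bounded blend.
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped) + eps
    normalized = [w / total for w in clipped]

    rows, cols = len(features[0]), len(features[0][0])
    # Weighted sum of the input maps, position by position.
    return [
        [sum(n * f[r][c] for n, f in zip(normalized, features))
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical example: a shallow (fine) map and a deep (coarse) map
# that have already been resized to the same 2x2 resolution.
shallow = [[1.0, 1.0], [1.0, 1.0]]
deep = [[3.0, 3.0], [3.0, 3.0]]
fused = fast_normalized_fusion([shallow, deep], weights=[1.0, 1.0])
```

With equal weights the result is close to the plain average of the two maps. A weight that training drives to zero or below removes its input from the fusion, which is how the network can learn to rely more on some scales than on others.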
Traditional aquatic organism identification methods use machine learning techniques to recognize the class of the marine organisms being studied. For example, classifiers such as Naive Bayes [20] (NB), Decision Tree [21] (DT), and Support Vector Machine (SVM) rely on manual feature selection, i.e., choosing the relevant traits based on human judgment. Features selected this way are highly subjective, often insufficient, and tend to overlook essential characteristics, so the accuracy of these methods is limited. In the last several years, researchers have made rapid progress by using deep learning to implement target detection algorithms.
In 2020, Song et al. [22] achieved mAP values greater than 90% on a small dataset of aquatic organisms by combining Mask R-CNN with the MSRCR technique for image enhancement. Although the detection accuracy increased, the training period was lengthy. The same year, Han et al. [23] integrated a refined YOLOv3 algorithm into an underwater robot for real-time detection on enhanced images of marine creatures. Nevertheless, the method suffers from missed detections and cannot identify marine species with hazy edges. In 2021, Mao et al. [24] presented a model based on an enhanced YOLOv4 network for detecting marine species in shallow waters. They built an Embedded Connection (EC) component and integrated it into the YOLOv4 network, which lowers the computational cost while increasing detection accuracy. In 2022, Iqbal et al. [25] presented an end-to-end CNN that classifies fish behavior into two groups: normal and hungry. They assessed the CNN's performance by varying the number of fully connected layers and by enabling or disabling maximum pooling. According to their experimental results, adding a maximum pooling function to the CNN's shallow architecture together with three fully connected layers increased the accuracy by 10%, reaching 98%. In 2023, Kaya et al. [26] developed IsVoNet8, a CNN-based model for fish species classification that achieved a classification accuracy of 91.37%. The same year, Ren et al. [27] used LIBS and Raman spectroscopy to create a new method of fish species identification. They combined two machine learning algorithms, SVM and CNN, with Raman spectroscopic data from 13 fish species to build classification models, and the proposed CNN model achieved the highest classification accuracy, 96.2%.
Although underwater target recognition has made great strides, there is still much room for improvement, particularly in the detection, localization, species identification, and counting of aquatic creatures. At the same time, underwater target recognition requires high real-time performance, so designing a fast and accurate underwater target detection model is vital.
This study enhances the YOLOv5 algorithm and applies it to underwater target detection, aiming to achieve automated rearing and fishing in the aquaculture industry. The proposed real-time classification and detection model for aquaculture improves the speed and accuracy of underwater target recognition, saves fishermen's time, increases their productivity, and fosters the growth of the aquaculture sector. To summarize, our work makes the following major contributions: