Mosquito rearing:
All experiments were conducted in accordance with biosafety level 3 (BSL-3) guidelines at the insectary facility of the CSIRO Australian Centre for Disease Preparedness (ACDP). Aedes aegypti mosquitoes were reared under controlled conditions at 27.0°C and 70% relative humidity on a 12-hour light/dark cycle. Adult mosquitoes had continuous access to a 10% sucrose solution.
Experimental setup:
A plexiglass enclosure (25 cm x 25 cm x 40 cm) was used to isolate Aedes aegypti mosquitoes. The enclosure featured a movable white plexiglass wall to eliminate background interference from the standard netting sleeves. By positioning the white wall between the net and the recording area of the cage, we ensured that the video footage was unobstructed for accurate inspection. Video data were collected using infrared imaging (Flea3 camera, Point Grey Research, Canada) (Fig. 1).
Data collection:
Recording began two weeks post-emergence. A 120-minute video was recorded at 60 frames per second (FPS) with five mosquitoes inside the cage. Although the Flea3 camera's native resolution is 1280 x 1024 pixels (width x height), the frame size was set to 1040 x 1024 pixels to match the dimensions of the enclosure and avoid capturing extraneous background.
Data processing:
Most locomotion-related behavioural analyses focus only on the behaviour of individual mosquitoes [27, 28]. In this study, the recorded video was therefore subdivided into smaller segments, each corresponding to a single mosquito flight. Where multiple mosquitoes were in flight concurrently, segments were trimmed individually so that each captured the flight of a single mosquito. In total, the recorded video data were segmented into 52 individual video segments, one per flight, using [29].
Training, validation and test data:
Of the 52 flights, 11 lasted less than 10 seconds and were consequently excluded from the dataset. For customised training of the YOLOv8 model, we randomly selected 6 of the remaining video segments (around 15% of the remaining videos), each 31 to 99 seconds long, to serve as the training and validation datasets. The remaining 35 segments, with durations ranging from 12 to 286 seconds, were reserved for performance testing. The average flight duration was 52.13 seconds; therefore, 10 consecutive videos with durations from the average onward (54 to 131 seconds) were selected for trajectory testing, while all 35 videos were used for occlusion analysis. The training and validation videos were converted into images at 60 FPS using the Python OpenCV library. Of these images, 900 were randomly chosen for training and 100 for validation. The CNN model was set to train for 500 epochs, but with the early-stopping patience set to 50, training halted at epoch 267 because no further improvement was observed, indicating that the best performance occurred at epoch 217. An epoch represents one complete pass over the entire training dataset. To annotate the training and validation data, we used the Makesense web tool, which generates .txt annotation files [30].
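As a minimal illustration of the frame-extraction step, the sketch below uses the OpenCV library to write every frame of a video segment to disk; the file and folder names are hypothetical and not part of the original workflow.

```python
# Minimal sketch: extract every frame of a training video segment as an image
# with OpenCV. Paths and file names are illustrative only.
import os
import cv2

video_path = "train_segment_01.mp4"   # hypothetical segment name
output_dir = "train_frames"
os.makedirs(output_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
frame_idx = 0
while True:
    ret, frame = cap.read()
    if not ret:                        # end of video reached
        break
    cv2.imwrite(os.path.join(output_dir, f"frame_{frame_idx:05d}.png"), frame)
    frame_idx += 1
cap.release()
```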
Ground truth calculation:
Mosquito ground truth positions were also validated using the Makesense web tool [30]. The tool allows images to be imported and object positions to be confirmed by moving the cursor across the screen (Fig. 2a).
Trajectory estimation evaluation:
Evaluating how well a tool performs its function is crucial to understanding its effectiveness. The trajectory estimation performance of the FlightTrackAI tool is assessed using both Mean Absolute Error (MAE) and accuracy. The accuracy metric evaluates whether the detected centroids fall within an allowed tolerance of the ground truth centroids, while the MAE quantifies the deviations between the detected and ground truth centroids, offering insight into the precision of the FlightTrackAI tool. The MAE accounts for deviations along both the x and y axes in pixels and is calculated using Eq. 1. The Python library NumPy is used to compute the MAE.
$$MAE=\frac{1}{n}\sum_{f=1}^{n}\left(\left|x_{f}-\widehat{x}_{f}\right|+\left|y_{f}-\widehat{y}_{f}\right|\right)$$
1
where \(x_{f}\) is the estimated x-axis position in pixels, \(\widehat{x}_{f}\) is the ground truth x-axis position in pixels, \(y_{f}\) is the estimated y-axis position in pixels, and \(\widehat{y}_{f}\) is the ground truth y-axis position in pixels. The value of \(f\) represents the frame number, \(MAE\) is the mean absolute error across all frames, and \(n\) is the total number of frames.
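A minimal NumPy sketch of Eq. 1 is given below; the arrays are illustrative placeholders, not experimental data.

```python
import numpy as np

# Estimated and ground truth centroid positions (pixels), one entry per frame.
x_est = np.array([100.0, 105.0, 111.0])
y_est = np.array([200.0, 198.0, 195.0])
x_gt  = np.array([101.0, 104.0, 110.0])
y_gt  = np.array([199.0, 199.0, 196.0])

# Eq. 1: mean over frames of |x_f - x̂_f| + |y_f - ŷ_f|
mae = np.mean(np.abs(x_est - x_gt) + np.abs(y_est - y_gt))
print(f"MAE: {mae:.2f} pixels")
```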
In this context, accuracy refers to how closely the trajectory points captured by the FlightTrackAI tool match the ground truth trajectory points. A tolerance of 6 pixels is applied, meaning that estimated points whose absolute deviation from the ground truth centroids is below this value on both the horizontal (x) and vertical (y) axes are considered accurate (Eq. 2). This choice reflects the fact that mosquitoes are not single-pixel organisms and that, given their varied positions, exact centroid estimation is challenging. Moreover, manually calculating exact ground truth centroids is itself prone to error, further emphasising the need for a reasonable tolerance in the accuracy assessment. The 6-pixel threshold was determined by randomly selecting 20 frames, averaging the mosquito body lengths in those frames, and dividing the average by two. This value corresponds to 0.57% of the total x-axis pixels and 0.58% of the total y-axis pixels.
$$\begin{cases}\text{Accurate} & \text{if } \left|x_{f}-\widehat{x}_{f}\right|<6 \;\cap\; \left|y_{f}-\widehat{y}_{f}\right|<6\\ \text{Inaccurate} & \text{otherwise}\end{cases}$$
2
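A minimal sketch of the accuracy check in Eq. 2 is shown below; the arrays are illustrative placeholders, and a frame counts as accurate only when both axis deviations are below the 6-pixel tolerance.

```python
import numpy as np

# Illustrative estimated and ground truth centroids (pixels), one per frame.
x_est, x_gt = np.array([100.0, 105.0, 111.0]), np.array([101.0, 104.0, 120.0])
y_est, y_gt = np.array([200.0, 198.0, 195.0]), np.array([199.0, 199.0, 196.0])

TOL = 6  # pixel tolerance on each axis
accurate = (np.abs(x_est - x_gt) < TOL) & (np.abs(y_est - y_gt) < TOL)
print(f"Accuracy: {accurate.mean() * 100:.1f}%")   # fraction of accurate frames
```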
Occlusion calculation:
In our analysis, occlusions were determined in Python by computing the Intersection over Union (IoU) between the bounding boxes detected by the Convolutional Neural Network (CNN) for mosquitoes in each video frame. An IoU threshold of 0.25 (25%) was chosen; if the IoU between the detected bounding boxes of two mosquitoes exceeded this threshold, the mosquitoes were considered occluded. This threshold, corresponding to partial overlap, was chosen because a higher threshold could cause overlapping mosquitoes to be detected as a single mosquito before being counted as occluded. Occlusions were also manually verified by watching the videos. IoU was calculated using Eq. 3, given below.
$$IoU=\frac{\text{Area of intersection}}{\text{Area of union}}$$
3
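A minimal sketch of Eq. 3 for two axis-aligned bounding boxes is given below; the box coordinates and the helper function name are illustrative, and the 0.25 threshold flags the pair as occluded.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of the two areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two partially overlapping mosquito boxes (illustrative coordinates).
occluded = iou((10, 10, 30, 30), (15, 15, 35, 35)) > 0.25
print(occluded)   # True: IoU ≈ 0.39 exceeds the 0.25 threshold
```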
Computation and programming system:
FlightTrackAI was developed on a system with an AMD Ryzen 9 5900HX processor with integrated Radeon Graphics (3.30 GHz) and 2 x 16 GB SO-DIMM DDR4-3200 RAM, running Windows 11 Pro 64-bit. FlightTrackAI performs detection and tracking using Python, NumPy, SciPy, TensorFlow, and OpenCV.
Model architecture:
FlightTrackAI uses YOLOv8 for object detection [31], Deep SORT for multi-object tracking [32], and cubic spline interpolation to fill in missing data points. The YOLOv8 architecture comprises distinct backbone and head sections for object detection. CSPDarknet53 serves as the backbone, using convolutional layers to extract key features from the input image. The SPPF layer and the subsequent convolution layers process features at multiple scales, and the upsampling layers increase the resolution of the feature maps. The C2f module combines important features and contextual information to improve detection accuracy. Finally, the head section uses a mix of convolutional and linear layers to convert the resulting features into output bounding boxes and classes.
Deep SORT performs multi-object tracking using a Kalman filter and the Hungarian algorithm equipped with a deep association metric. The Kalman filter predicts the state of each object, encompassing parameters such as position, velocity, and acceleration, based on the object's last known state and its dynamic motion. The Hungarian algorithm then performs matching by evaluating a cost matrix that considers both the Mahalanobis distance for motion consistency and the cosine distance for appearance similarity. Incorporating a deep association metric enables Deep SORT to maintain tracking continuity even during short periods of occlusion. FlightTrackAI then employs cubic spline interpolation to fill in any missing data points: a sequence of cubic polynomials is fitted so that the resulting curve passes smoothly and without interruption through every data point (Fig. 2b). FlightTrackAI uses the Python library SciPy to perform the interpolation. Finally, the mosquito position data are saved in .xlsx format, and trajectories before and after interpolation are generated using the Matplotlib Python library (Fig. 2c).
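A minimal sketch of the gap-filling step with SciPy's cubic spline is shown below, assuming missing positions are represented as NaN; the coordinate values are illustrative only.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Illustrative x-coordinate of one mosquito over 10 frames; NaN marks missing detections.
frames = np.arange(10)
x = np.array([12.0, 14.1, np.nan, np.nan, 22.5, 25.0, 27.8, np.nan, 33.0, 35.2])

known = ~np.isnan(x)
spline = CubicSpline(frames[known], x[known])   # fit on observed frames only
x_filled = np.where(known, x, spline(frames))   # evaluate the spline at the gaps
print(np.round(x_filled, 1))
```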
Mosquito flight tracking using FlightTrackAI:
The FlightTrackAI software's user interface provides five adjustable input parameters for effective mosquito flight tracking. The software also requires a trained model, which can be selected by clicking the "Select Model" button. A pre-trained model is included with the software; however, if custom training is desired, users can follow the straightforward instructions provided in [31] to train their own model. The adjustable input parameters are Image Size, Confidence, Intersection over Union (IoU), Max Age, and Filtration. The Image Size parameter lets users set the maximum dimension of the input image, allowing adaptability across various resolutions. Its default value is 1040, matching the largest dimension of our experimental videos; it can be set to any required value, and FlightTrackAI will automatically re-adjust it to the nearest acceptable value, which must be a multiple of the maximum stride (32), for compatibility with the convolutional neural network's architecture (in our case, 1040 was adjusted to 1056; see the sketch after this paragraph). The Confidence parameter filters out detections with confidence scores below the specified threshold, ensuring that only sufficiently confident detections contribute to the final results; its default value is set low (0.05) to accommodate the small size of mosquitoes. The IoU parameter is critical for Non-Maximum Suppression (NMS) during the detection phase, as it helps eliminate redundant bounding box predictions. The default IoU threshold is 0.05, promoting a lenient evaluation of spatial overlap during the NMS step, which is particularly useful for detecting small and closely spaced objects.
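The sketch below illustrates how the Image Size adjustment could work, assuming the requested size is rounded up to the next multiple of the stride, which is consistent with 1040 being adjusted to 1056; the function name is hypothetical.

```python
import math

def adjust_image_size(size, stride=32):
    # Round the requested size up to the next multiple of the stride so it
    # is compatible with the network's downsampling stages.
    return math.ceil(size / stride) * stride

print(adjust_image_size(1040))   # 1056, as reported for our videos
```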
Additionally, FlightTrackAI includes the Max Age parameter, which determines how many frames a track is retained in the tracking system after its object goes undetected. This parameter is essential for accurate tracking over time, preventing the system from retaining outdated or incorrect information. The last parameter, Filtration, helps eliminate identities of short duration. It is essential for excluding identities caused by reflections on the plexiglass surfaces, ensuring more precise and clean final results. By default, its value is set to 5, meaning that any identity appearing in less than 5% of the total frames is removed from the final results (see the sketch below).
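A minimal sketch of the Filtration step is given below, assuming track identities are stored as one ID label per frame; the data structure and function name are illustrative, not FlightTrackAI's internal implementation.

```python
from collections import Counter

def filter_short_tracks(frame_ids, filtration=5):
    """Drop identities that appear in fewer than `filtration`% of all frames."""
    total_frames = len(frame_ids)
    counts = Counter(frame_ids)
    keep = {i for i, c in counts.items() if c / total_frames * 100 >= filtration}
    return [i for i in frame_ids if i in keep]

# Illustrative data: ID 2 appears in only 3% of frames (e.g. a reflection) and is removed.
ids_per_frame = [1] * 95 + [2] * 3 + [1] * 2
print(sorted(set(filter_short_tracks(ids_per_frame))))   # [1]
```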
After setting these parameters, the input folder containing the videos can be selected by clicking the "Select Folder" button; processing starts automatically once the folder is selected. After processing each video, FlightTrackAI creates a subfolder, named after the respective video, within the 'Results' directory inside the input folder. This subfolder stores the processed video, the mosquitoes' x- and y-axis positions, and the trajectories before and after interpolation. Besides saving the output to the 'Results' folder, FlightTrackAI also displays the unprocessed input and processed output frames in the user interface during processing. Users can halt the operation at any point by pressing the stop button at the bottom; any partially processed video is then automatically saved along with the position data and the trajectories before and after interpolation. The FlightTrackAI graphical user interface is shown in Fig. 3.