In agricultural disease detection, Internet of Things (IoT) devices, particularly cameras, play a central role in collecting the visual data used to assess plant health. The following describes the application of IoT cameras in agricultural data collection:
Cameras are strategically positioned on fixed mounts at specified locations within the field to facilitate data gathering. The placement of these cameras is contingent upon factors such as field size, crop type, and the targeted monitoring area. To cover expansive regions or specific points of interest within the crop field, multiple cameras may be deployed.
RGB cameras capture images of tomato plants within the visible spectrum, offering crucial color information for visual inspection and for identifying diseases based on color variations, lesions, or anomalies on plant leaves. The cameras can be configured for continuous monitoring, capturing images at predetermined intervals (e.g., daily). The acquired images and related data are then wirelessly transmitted to a cloud server through a LoRa gateway equipped with a Raspberry Pi. Once in the cloud, the tomato plant images undergo segmentation via Mask R-CNN, a technique prominently featured in the work of Yang et al. (2020). This model is chosen for its precision in separating objects from complex backgrounds, which is particularly beneficial for scenes of tomato plants against challenging backgrounds.
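As a concrete illustration of the capture and transmission step described above, the following is a minimal sketch of the loop that could run on the Raspberry Pi at each node. The picamera2 library, the cloud endpoint URL, the node identifier, and the daily capture interval are illustrative assumptions rather than details specified in this work.

```python
# Minimal sketch of the capture-and-upload loop on the Raspberry Pi node.
# Endpoint URL, node ID, and capture interval are illustrative assumptions.
import time
import requests
from picamera2 import Picamera2   # assumes a Pi camera module is attached

CLOUD_ENDPOINT = "https://example-cloud-server/api/upload"   # hypothetical endpoint
CAPTURE_INTERVAL_S = 24 * 60 * 60                            # e.g., one image per day

def capture_and_upload(camera: Picamera2, node_id: str) -> None:
    """Capture one RGB image and push it to the cloud server."""
    filename = f"/tmp/{node_id}_{int(time.time())}.jpg"
    camera.capture_file(filename)
    with open(filename, "rb") as f:
        requests.post(
            CLOUD_ENDPOINT,
            files={"image": f},
            data={"node_id": node_id},
            timeout=30,
        )

if __name__ == "__main__":
    cam = Picamera2()
    cam.start()
    while True:
        capture_and_upload(cam, node_id="field-node-01")
        time.sleep(CAPTURE_INTERVAL_S)
```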
Mask R-CNN is employed to segment the leaf image from the overall plant image. This deep learning algorithm is widely used for instance segmentation: it extends Faster R-CNN and performs both object detection and pixel-level segmentation, which makes it well suited to separating overlapping plant leaves. The pipeline proceeds in four stages. First, a pre-trained backbone network such as VGG16 or ResNet extracts high-quality feature maps from the entire image. Second, the Region Proposal Network (RPN) generates candidate object locations (bounding boxes) for further investigation. Third, ROI pooling extracts fixed-size regions from the feature maps so that each proposal can be analyzed individually. Finally, the mask branch uses a separate network head to predict a segmentation mask for each proposal, delineating the precise shape of each individual leaf even when foliage overlaps, while the classification branch identifies the object class associated with each proposal. This dual output provides both accurate segmentation and classification, which is invaluable for the analysis of plant leaves.

The strengths of Mask R-CNN are its ability to accurately segment individual objects against complex backgrounds, its efficient reuse of pre-trained models for feature extraction, and its versatility in handling datasets that contain multiple leaf species and overlapping structures. It also has demanding requirements: large datasets are crucial for optimal performance, training can be computationally expensive, and careful hyperparameter tuning and model selection are needed to reach peak performance.

In this work, therefore, Mask R-CNN segments the leaf from the plant image before classification. Through this approach, the image is divided into distinct regions, laying the groundwork for subsequent analysis, and the segmented tomato leaf image is then passed on for disease classification.
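A minimal inference sketch using the Mask R-CNN implementation available in torchvision is given below. The COCO-pretrained weights and the 0.5 confidence threshold are assumptions for illustration; in practice the model would be fine-tuned on annotated tomato-leaf images before deployment.

```python
# Sketch of leaf instance segmentation with an off-the-shelf Mask R-CNN.
# COCO weights and the score threshold are assumed values, not this work's settings.
import torch
from PIL import Image
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = maskrcnn_resnet50_fpn(pretrained=True).eval()

def segment_leaves(image_path: str, score_thresh: float = 0.5):
    """Return binary masks and boxes for detected object instances in the plant image."""
    img = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([img])[0]          # dict with boxes, labels, scores, masks
    keep = output["scores"] >= score_thresh
    masks = output["masks"][keep] > 0.5   # (N, 1, H, W) soft masks -> binary masks
    return masks, output["boxes"][keep]

masks, boxes = segment_leaves("tomato_plant.jpg")   # hypothetical input image
print(f"{masks.shape[0]} leaf instances segmented")
```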
For disease classification, pretrained deep learning models, namely GoogLeNet, SqueezeNet, and ResNet-50, are employed and their classification performance is compared. The best-performing model among the three, ResNet-50, is selected for deployment on the cloud server to classify incoming images. The tomato leaf disease detection dataset, comprising 3890 images (1926 healthy and 1974 diseased), is used to train and test the models. The dataset was split in an 80:20 ratio between training data and testing/validation data.
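The following sketch shows one way the three pretrained classifiers could be prepared for the two-class (healthy/diseased) task, assuming a PyTorch/torchvision setup. The dataset directory, batch size, and the simple random 80:20 split are illustrative assumptions, not the exact experimental configuration.

```python
# Sketch: load the three ImageNet-pretrained models and adapt them to 2 classes.
# Directory layout, transforms, and hyperparameters are illustrative assumptions.
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
full_ds = datasets.ImageFolder("tomato_leaf_dataset/", transform=tfm)  # hypothetical path
n_train = int(0.8 * len(full_ds))                                      # 80:20 split
train_ds, test_ds = random_split(full_ds, [n_train, len(full_ds) - n_train])
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)

def build(name: str, num_classes: int = 2) -> nn.Module:
    """Load a pretrained backbone and replace its classification head."""
    if name == "googlenet":
        m = models.googlenet(pretrained=True)
        m.fc = nn.Linear(m.fc.in_features, num_classes)
    elif name == "squeezenet":
        m = models.squeezenet1_1(pretrained=True)
        m.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=1)
        m.num_classes = num_classes
    else:  # "resnet50"
        m = models.resnet50(pretrained=True)
        m.fc = nn.Linear(m.fc.in_features, num_classes)
    return m

candidates = {name: build(name) for name in ("googlenet", "squeezenet", "resnet50")}
```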
Comparative analysis shows that ResNet-50 outperforms the other models in classification accuracy and the other performance metrics considered. The classification result is promptly updated in the cloud, and an alert mechanism notifies farmers on their smart devices, providing real-time insight into detected diseases, their locations, and recommended proactive measures. The selection of ResNet-50 as the deep learning model is justified by this performance, ensuring accurate and effective detection of tomato leaf diseases in agricultural settings.
Figure 1 provides direction for creating an automated solution that combines DL and IoT methods to help farmers practice smart farming.
Conventionally, the IoT architecture comprises five layers (Indira et al., 2023): a physical layer, network layer, middleware layer, processing layer, and application layer. In our proposed work, we adopt the following IoT layer architecture:
1. Device Layer:
Sensor nodes are deployed in the field and consist of various sensors such as IR cameras, visual cameras, and other relevant sensors. Each sensor node is equipped with a Raspberry Pi board acting as an IoT gateway, capable of processing and transmitting data. LoRa transceivers connected to the Raspberry Pi provide long-range wireless communication, transmitting sensor data from the sensor nodes to the gateway.
2. Connectivity Layer:
A central LoRa gateway equipped with a Raspberry Pi collects data transmitted by sensor nodes using LoRa transceivers. LoRaWAN or other suitable communication protocols are used for efficient data transmission between sensor nodes and the gateway.
3. Network Layer:
The connectivity backbone establishes the connection between the LoRa gateway and the cloud or central server, using suitable network protocols and infrastructure to ensure reliable data transfer from the gateway to the cloud.
4. Cloud Layer:
The cloud server receives and processes data transmitted by the gateway. Storage systems hold the incoming data in databases or data lakes for further analysis and processing. Initial data processing occurs here, including data cleaning, normalization, and pre-processing for use in the deep learning model.
5. Application Layer:
The trained deep learning model for leaf disease detection is deployed and operated in this layer. The model analyzes the pre-processed data to detect diseases, and decisions are made regarding the presence of diseases or potential threats based on this analysis. If a disease is detected, alerts or notifications are generated for the concerned stakeholders through the mobile application, which also includes several other modules alongside disease detection. A minimal cloud-side sketch of this flow is given below.
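The sketch below illustrates how the cloud and application layers could fit together: an HTTP endpoint accepts an image upload from the gateway, runs the deployed classifier, and triggers an alert when a diseased leaf is predicted. Flask, the route, the saved model file, and the notify() stub are assumptions for illustration, not details of the actual deployment stack.

```python
# Sketch of the cloud/application layers: receive an upload, classify, and alert.
# Flask, the route, "resnet50_tomato.pt", and notify() are hypothetical choices.
import torch
from flask import Flask, request, jsonify
from PIL import Image
from torchvision import transforms

app = Flask(__name__)
model = torch.load("resnet50_tomato.pt", map_location="cpu")   # hypothetical model file
model.eval()
preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
CLASSES = ["diseased", "healthy"]

def notify(node_id: str, label: str) -> None:
    """Placeholder for the mobile-app alert (e.g., push notification or SMS)."""
    print(f"ALERT: node {node_id} reports a {label} leaf")

@app.route("/api/upload", methods=["POST"])
def upload():
    node_id = request.form.get("node_id", "unknown")
    img = Image.open(request.files["image"].stream).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    label = CLASSES[int(logits.argmax(dim=1))]
    if label == "diseased":
        notify(node_id, label)
    return jsonify({"node_id": node_id, "prediction": label})
```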
- The health of a crop is estimated through field monitoring and information collected by devices such as IR cameras and visual cameras, and DL algorithms can operate on the live data sets obtained from these IoT devices (Patil et al., 2021). Early detection and examination of such problems is highly beneficial for crop disease prevention. Together, IoT and DL techniques enable automated pest and disease identification with alerts or warnings.
- Here, we present three distinct deep learning models that are appropriate for identifying leaf disease in tomato plants.
GoogLeNet/Inception
The GoogLeNet architecture was introduced in 2014 (Team, O.T.R, 2021) in response to the resource-utilization and parallel-processing challenges faced by the AlexNet model. The architecture consists of 22 layers, with 21 convolutional layers and 1 fully connected layer. It incorporates batch normalization to address the issue of vanishing gradients and has four million trainable parameters. The softmax activation function is used for multiclass classification, with 10 output units. GoogLeNet has proven effective, achieving a top-five test error rate of 6.67% in the ILSVRC-2014 competition, 8.63 percentage points lower than AlexNet. Notably, this model replaces fully connected layers with global average pooling, and its performance can be improved by increasing the network's depth and width.
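To make the idea of parallel filter sizes concrete, the following is a simplified Inception-style block of the kind GoogLeNet stacks. The branch channel counts are illustrative assumptions and do not correspond to any particular GoogLeNet stage.

```python
# Sketch of a simplified Inception-style block: four parallel branches whose
# outputs are concatenated along the channel dimension. Channel counts are illustrative.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch: int):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)                 # 1x1 branch
        self.b2 = nn.Sequential(                                      # 1x1 -> 3x3 branch
            nn.Conv2d(in_ch, 48, kernel_size=1),
            nn.Conv2d(48, 64, kernel_size=3, padding=1),
        )
        self.b3 = nn.Sequential(                                      # 1x1 -> 5x5 branch
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.b4 = nn.Sequential(                                      # pool -> 1x1 branch
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Concatenate branch outputs channel-wise: 64 + 64 + 32 + 32 = 192 channels.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```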
SqueezeNet
The SqueezeNet model was introduced in 2016 (Iandola et al., 2016) with the goal of developing a CNN that has low computational complexity while maintaining high accuracy. The model replaces many 3 × 3 filters with 1 × 1 filters and is composed of fire modules and pooling layers. Each fire module includes a squeeze layer that reduces the number of feature channels and an expand layer that increases it again; both layers produce feature maps of the same spatial size. SqueezeNet has 1.25 million trainable parameters, roughly 50 times fewer than AlexNet. Adding a bypass connection to the network resulted in a 2.9% increase in top-one accuracy and a 2.2% increase in top-five accuracy.
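A minimal sketch of a fire module is shown below. The channel counts follow the fire2 configuration reported by Iandola et al. (2016); the implementation is illustrative rather than the exact torchvision module.

```python
# Sketch of a SqueezeNet fire module: a 1x1 "squeeze" convolution reduces the channel
# count, then parallel 1x1 and 3x3 "expand" convolutions restore it and are concatenated.
import torch
import torch.nn as nn

class FireModule(nn.Module):
    def __init__(self, in_ch: int = 96, squeeze_ch: int = 16, expand_ch: int = 64):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))
        # Output has expand_ch + expand_ch channels (1x1 and 3x3 paths concatenated).
        return torch.cat(
            [self.relu(self.expand1x1(s)), self.relu(self.expand3x3(s))], dim=1
        )
```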
ResNet
To address the challenges of training deeper CNNs, which require extensive datasets and prolonged training times, researchers introduced residual networks in 2015 (He et al., 2016). ResNet has variations of 50, 101, or 152 layers, with the number of trainable parameters varying accordingly. ResNet employs residual functions instead of traditional learning methods to address the vanishing gradient problem. The softmax layer is integrated into ResNet for plant disease identification. However, these models require extended training periods, which can be challenging for real-time applications that require quick decision-making.
ResNet-50, which consists of 50 deep layers organized into five stages of convolution and identity blocks, serves as a backbone for many computer vision tasks. ResNet introduces the idea of stacking convolutional layers with skip connections to address the vanishing gradient issue: the original input bypasses a group of layers and is added to their output, which mitigates gradient problems. Placing the skip connection before the activation function further helps to minimize gradient issues, since deeper models would otherwise accumulate more error. The shortcut connections in the residual network are based on identity mapping, providing a solution to these challenges.
Let x represent the input image, F(x) the residual mapping learned by the stacked nonlinear layers, and H(x) the desired underlying mapping. The residual formulation can then be expressed as follows:
H(x) = F(x) + x        (1)
ResNet-50 is built from convolution blocks and identity blocks. Each identity block consists of three convolutional layers, and the full network contains about 23 million trainable parameters. The two tensors, input x and shortcut x, can be added only when the output of the shortcut and the output of the convolution path (after convolution and batch normalization) have the same dimensions; otherwise, shortcut x must pass through a convolution layer and batch normalization to ensure dimensional alignment, as sketched below.
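A minimal sketch of such a bottleneck block, assuming a PyTorch implementation, is given below: it adds the shortcut to the three-layer convolution path, realizing H(x) = F(x) + x, and applies a 1x1 projection with batch normalization when the dimensions differ.

```python
# Sketch of a ResNet-50 bottleneck block implementing H(x) = F(x) + x. When output
# dimensions differ from the input, the shortcut is projected with a 1x1 convolution
# and batch normalization so the two tensors can be added.
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_ch: int, mid_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.residual = nn.Sequential(                      # F(x): three conv layers
            nn.Conv2d(in_ch, mid_ch, 1, bias=False), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        if stride != 1 or in_ch != out_ch:                  # projection shortcut
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()                   # identity mapping
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.residual(x) + self.shortcut(x))   # H(x) = F(x) + x
```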
Table 1 illustrates typical configurations for each model. Variants may possess distinct structures and performance metrics. SqueezeNet employs fire modules, which merge 1x1 and 3x3 convolutional layers to achieve efficient feature extraction. GoogLeNet's inception modules facilitate parallel processing of various filter sizes, thereby enhancing flexibility. ResNet's residual blocks tackle the issue of the vanishing gradient by introducing shortcut connections, enabling the creation of deeper networks.
Table 1. Typical configurations of SqueezeNet, GoogLeNet, and ResNet-50.

| Feature | SqueezeNet | GoogLeNet | ResNet-50 |
| --- | --- | --- | --- |
| Overall Architecture | Fire modules with squeeze and expand paths | Inception modules with parallel convolutional layers | Residual blocks with shortcut connections |
| Layers (Total) | 53 | 22 | 174 |
| Convolutional Layers | 18 (within fire modules) | 22 (within inception modules) | 164 (within residual blocks) |
| Pooling Layers | 4 Max Pooling | 5 Max Pooling, 2 Global Average Pooling | 3 Max Pooling |
| Batch Norm Layers | 8 | 2, 2 Global Average Pooling | 0 (implicitly through residual connections) |
| Fully Connected Layers | 1 | 1 | 1 |
| Other Layers | Dropout | - | Activation functions (ReLU, Sigmoid) |
| Learnable Parameters | ~736,450 | ~6 million | ~23.5 million |
| Image Input Size | 227x227 | 224x224 | 224x224 |
| Target Outputs | 1000 classes (ImageNet) | 1000 classes (ImageNet) | 1000 classes (ImageNet) |
| Advantages | Very lightweight, computationally efficient | Good accuracy, innovative architecture | High accuracy, addresses vanishing gradient problem |
| Disadvantages | Lower accuracy than other models | Complex structure, computationally expensive | Large number of parameters, requires more training data |

The selection of a model is contingent upon specific requirements:
- Accuracy: ResNet-50 generally attains the highest accuracy.
- Computational cost: SqueezeNet is notably more lightweight and faster.
- Training data: ResNet-50 requires more data because of its larger parameter count.
- Implementation complexity: GoogLeNet's inception modules may be more intricate to implement in real time.
Hence, we have opted for ResNet-50, as it delivers satisfactory accuracy and is comparatively easier to implement in real-time scenarios.