MFC-NAS comprises three main stages: defining the search space, searching for optimal cells, and stacking the cells, followed by testing (as depicted in Fig. 1). In the search-space definition stage, four cell types are designed: the transfer cell, normal cell, pooling cell, and dropout cell. The transfer cell treats shallow blocks from multiple pre-trained models as searchable weight-sharing layers; by automatically selecting shallow blocks suited to different network structures, it strengthens the model's ability to represent general features and improves its generalization capability. The normal cell is a multi-branch structure composed of multiple block units, each comprising input-output nodes, candidate operations, and connection methods, whose outputs are merged through concatenation. The pooling cell is a single-branch structure formed by a single block unit containing input-output nodes and candidate operations, with various pooling operations as search candidates. The dropout cell takes multiple dropout rates as search candidates, automatically determining a suitable dropout rate to enhance the model's generalization capability.
During the optimal-cell search stage, the search parameters, such as the search strategy, search period, and controller parameters, are first initialized. A recurrent neural network (RNN)[25] then serves as the controller, generating sampling probabilities for the candidate options of each cell, while policy-gradient reinforcement learning is used to update the controller's weights. Suitable candidates are selected from the search space through probabilistic sampling, and periodic training of the controller yields the optimal structures for the transfer cell, normal cell, pooling cell, and dropout cell.
During the cell stacking and testing stage, the optimal transfer cell, normal cell, pooling cell, and dropout cell, derived from the search space, are combined to form the MFC-NAS model. Finally, the model undergoes training and testing, which includes comparison and ablation experiments, to validate this approach.
3.1 Transfer cell
Inspired by the observation that the shallow layers of CNNs are more adept at representing general features[17], we incorporate different shallow layers from various pre-trained models into the weight transfer module, named transfer cell (T-cell), within the search space. This inclusion allows for the automatic selection of shallow layers from different levels of various models, facilitating the sharing of pre-trained general feature information and accelerating model convergence while enhancing generalization capabilities.
We selected five pre-trained candidate models: MobileNet V3-Large, EfficientNet, ResNet-50, Xception, and Inception-v3. As shown in Table 1, each candidate model provides three searchable layers. For instance, the operation code 2,1 denotes ResNet-50 as the candidate model, with searchable layer 2 selected. To thoroughly explore the different shallow layers of the candidate models, the searchable layers are positioned at the end of different shared layers (Si) within the candidate models. As illustrated in Fig. 2, these shared layers comprise various shallow layers, and each shared layer (Si) contains multiple searchable layers (Lj).
Table 1
Candidates for the T-cell search space
Candidate model | OP code (searchable layer 1) | OP code (searchable layer 2) | OP code (searchable layer 3) |
MobileNet V3-Large | 0,0 | 0,1 | 0,2 |
EfficientNet | 1,0 | 1,1 | 1,2 |
ResNet-50 | 2,0 | 2,1 | 2,2 |
Xception | 3,0 | 3,1 | 3,2 |
Inception-v3 | 4,0 | 4,1 | 4,2 |
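For concreteness, decoding a T-cell operation code into a truncated pre-trained backbone can be sketched as follows, assuming a tf.keras implementation. The cut-point indices and the frozen-backbone setting are illustrative assumptions and do not reproduce the exact searchable-layer boundaries of Tables 2-6.

```python
import tensorflow as tf

# Candidate pre-trained models indexed as in Table 1.
CANDIDATE_MODELS = {
    0: tf.keras.applications.MobileNetV3Large,
    1: tf.keras.applications.EfficientNetB0,
    2: tf.keras.applications.ResNet50,
    3: tf.keras.applications.Xception,
    4: tf.keras.applications.InceptionV3,
}

def build_t_cell(op_code, input_shape=(224, 224, 3)):
    """Decode (model index, searchable-layer index) into a truncated backbone."""
    model_idx, layer_idx = op_code                       # e.g. (2, 1): ResNet-50, searchable layer 2
    backbone = CANDIDATE_MODELS[model_idx](
        include_top=False, weights="imagenet", input_shape=input_shape)
    # Illustrative cut points; the real boundaries follow the Si-Lj layers of Tables 2-6.
    cut_points = [len(backbone.layers) // 6,
                  len(backbone.layers) // 4,
                  len(backbone.layers) // 3]
    cut_layer = backbone.layers[cut_points[layer_idx]]
    t_cell = tf.keras.Model(backbone.input, cut_layer.output, name="t_cell")
    t_cell.trainable = False                             # share (freeze) pre-trained weights
    return t_cell

# Usage: op code 2,1 from Table 1 selects ResNet-50 with searchable layer 2.
# t_cell = build_t_cell((2, 1))
```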
The shared layers of the candidate model MobileNet V3-Large[18] are outlined in Table 2, where "conv2d" represents a convolutional layer with a 3×3 kernel and a stride of 2, and "bneck" denotes a depthwise separable convolution (DSC) layer. This model comprises three shared layers, denoted Si, \(i \in \{1,2,3\}\). For instance, S1-L1 refers to the first searchable layer of the first shared layer, namely the first "bneck_3×3" layer within the "3×bneck_3×3" searchable layers, and so forth. In EfficientNet[19], the shared layer "MBConv1" is composed of multiple "bneck" modules from MobileNet V3, as illustrated in Table 3. In ResNet-50[20], the shared layer "conv1" is a convolutional layer with a 7×7 kernel and a stride of 2, while "conv2_x" consists of three convolution blocks, each containing a 1×1 convolution, a 3×3 convolution, and another 1×1 convolution, as detailed in Table 4. The shared layers of Xception[21] (refer to Table 5) include two 3×3 convolutions and fifteen DSC_3×3 layers. The shared layers of Inception-v3[22] (see Table 6) comprise six 3×3 convolutions and three Inception blocks, where an Inception block is a convolution module for multiscale feature fusion.
Table 2
MobileNet V3-Large searchable weight sharing layers
Candidate model | Shared layer | Range of shared layer | Name | Searchable layers |
MobileNet V3-Large | S1 | conv2d, 3×bneck_3×3 | Li | 3×bneck_3×3 |
MobileNet V3-Large | S2 | S1, 3×bneck_5×5 | Li | 3×bneck_5×5 |
MobileNet V3-Large | S3 | S1, S2, 3×bneck_3×3 | Li | 3×bneck_3×3 |
Table 3
EfficientNet searchable weight sharing layers
Candidate model | Shared layer | Range of shared layer | Name | Searchable layers |
EfficientNet | S1 | Conv_3×3, MBConv1(3×3), 2×bneck_3×3, 1×bneck_5×5 | Li | 2×bneck_3×3, 1×bneck_5×5 |
EfficientNet | S2 | S1, 2×bneck_5×5, 1×bneck_3×3 | Li | 2×bneck_5×5, 1×bneck_3×3 |
EfficientNet | S3 | S1, S2, 3×bneck_3×3 | Li | 3×bneck_3×3 |
Table 4
ResNet-50 searchable weight sharing layers
Candidate model | Shared layer | Range of shared layer | Name | Searchable layers |
ResNet-50 | S1 | conv1, conv2_x, 1×[1×1, 3×3, 1×1] | Li | [1×1, 3×3, 1×1] |
ResNet-50 | S2 | S1, 3×[1×1, 3×3, 1×1] | Li | 3×[1×1, 3×3, 1×1] |
ResNet-50 | S3 | S1, S2, 1×[1×1, 3×3, 1×1] | Li | [1×1, 3×3, 1×1] |
Table 5
Xception searchable weight sharing layers
Candidate model | Shared layer | Range of shared layer | Name | Searchable layers |
Xception | S1 | 2×Conv_3×3, 9×DSC_3×3 | Li | 3×DSC_3×3 |
Xception | S2 | S1, 3×DSC_3×3 | Li | 3×DSC_3×3 |
Xception | S3 | S1, S2, 3×DSC_3×3 | Li | 3×DSC_3×3 |
Table 6
Inception-v3 searchable weight sharing layers
Candidate model | Shared layer | Range of shared layer | Name | Searchable layers |
Inception-v3 | S1 | 3×conv_3×3 | Li | 3×conv_3×3 |
Inception-v3 | S2 | S1, 3×conv_3×3 | Li | 3×conv_3×3 |
Inception-v3 | S3 | S1, S2, 3×Inception | Li | 3×Inception |
3.2 Improved normal cell
The structure of the normal cell (N-cell) is essentially similar to that of NASNet[12]. As illustrated in Fig. 3, the N-cell is a multi-branch structure composed of N blocks, each containing two candidate operations (OP) followed by a Combine operation. The Combine operation has two searchable connection candidates, Add and Concat. Here, H[i-1] and H[i] represent the inputs from the (i-1)th and ith cells, respectively, while H[i + 1] denotes the output of the current cell. Unlike the NASNet search space, the OPs here mainly comprise depthwise separable convolution (DSC) and attention-based DSC. The candidate operations are listed in Table 7; DSC reduces the parameter count, while the attention mechanisms help extract more critical features. Table 8 lists the candidate Combine operations.
Table 7
Candidate operations in the N-cell
Candidate OP | OP code | Candidate OP | OP code |
Identity | 0 | DSC 3×3 CBAM | 6 |
Conv 1×1 | 1 | DSC 5×5 CBAM | 7 |
DSC 3×3 | 2 | DSC 3×3 CA | 8 |
DSC 5×5 | 3 | DSC 5×5 CA | 9 |
DSC 3×3 SE | 4 | Max pooling 3×3 | 10 |
DSC 5×5 SE | 5 | Avg pooling 3×3 | 11 |
Table 8
Combine's candidate operations
Candidate OP | OP code |
Add | 0 |
Concat | 1 |
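A minimal sketch of one N-cell block under these definitions, assuming a tf.keras implementation; only a subset of the Table 7 operations is mapped here, and the attention-based DSC operations (codes 4-9) would reuse the module sketched below for Fig. 4.

```python
import tensorflow as tf
from tensorflow.keras import layers

def candidate_op(op_code, filters):
    """Map a Table 7 op code to a Keras layer (subset shown for brevity)."""
    factories = {
        0: lambda: layers.Lambda(lambda t: t),                                             # Identity
        1: lambda: layers.Conv2D(filters, 1, padding="same", activation="relu"),           # Conv 1x1
        2: lambda: layers.SeparableConv2D(filters, 3, padding="same", activation="relu"),  # DSC 3x3
        3: lambda: layers.SeparableConv2D(filters, 5, padding="same", activation="relu"),  # DSC 5x5
        10: lambda: layers.MaxPooling2D(3, strides=1, padding="same"),                     # Max pooling 3x3
        11: lambda: layers.AveragePooling2D(3, strides=1, padding="same"),                 # Avg pooling 3x3
    }
    return factories[op_code]()

def n_cell_block(h_prev, h_curr, op_codes, combine_code, filters=64):
    """One block: two sampled OPs merged by the sampled Combine (Table 8).
    For Add (code 0) the two branches must produce matching shapes."""
    a = candidate_op(op_codes[0], filters)(h_prev)   # OP on H[i-1]
    b = candidate_op(op_codes[1], filters)(h_curr)   # OP on H[i]
    if combine_code == 0:
        return layers.Add()([a, b])                  # Add
    return layers.Concatenate()([a, b])              # Concat
```

In the full N-cell, N such blocks are built from the sampled codes and their outputs are concatenated to form H[i + 1].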
Fig. 4 outlines the structure of the attention-based depthwise separable convolution (DSC) module. This module uses DSC 3×3 and DSC 5×5 as candidate operations within the search space, while SE[26], CBAM[27], and CA[28] are the candidate attention modules. The input feature map first passes through a 1×1 convolution layer followed by batch normalization (BN) and ReLU activation (BN + ReLU). The output then passes through a DSC 3×3 or DSC 5×5 layer, again followed by BN + ReLU. Next, it enters the candidate attention module to obtain attention weights, which are multiplied with the feature map to yield the weighted feature map. Finally, the module produces its output through a 1×1 convolution layer (Conv 1×1) and BN.
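A sketch of this module with SE[26] as the sampled attention candidate, assuming a tf.keras implementation; CBAM or CA would be substituted at the same point, and the reduction ratio of 16 is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

def se_attention(x, ratio=16):
    """Squeeze-and-Excitation: channel attention weights multiplied onto x."""
    c = x.shape[-1]
    w = layers.GlobalAveragePooling2D()(x)
    w = layers.Dense(max(c // ratio, 1), activation="relu")(w)
    w = layers.Dense(c, activation="sigmoid")(w)
    w = layers.Reshape((1, 1, c))(w)
    return layers.Multiply()([x, w])                 # weighted feature map

def attention_dsc(x, filters, kernel_size=3):
    """Conv 1x1 -> BN+ReLU -> DSC -> BN+ReLU -> attention -> Conv 1x1 -> BN."""
    y = layers.Conv2D(filters, 1, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.SeparableConv2D(filters, kernel_size, padding="same",
                               use_bias=False)(y)    # DSC 3x3 or DSC 5x5
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = se_attention(y)                              # candidate attention module
    y = layers.Conv2D(filters, 1, padding="same", use_bias=False)(y)
    return layers.BatchNormalization()(y)
```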
3.3 Pooling cell
A pooling cell (P-cell) is introduced to dynamically reduce the number of parameters. Similar to the N-cell, the P-cell is structured as a block, but its candidates are limited to two types of operations, max pooling and average pooling (as detailed in Table 9). It consists of two pooling operations followed by a Combine operation, as depicted in Fig. 5.
Table 9
Candidate operations in the P-cell
Candidate OP | OP code |
Max pooling 3×3 | 0 |
Max pooling 5×5 | 1 |
Max pooling 7×7 | 2 |
Avg pooling 3×3 | 3 |
Avg pooling 5×5 | 4 |
Avg pooling 7×7 | 5 |
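A minimal P-cell sketch under the same tf.keras assumptions; the stride of 2 used for downsampling is an assumption, as the text does not fix the pooling stride.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Table 9 op codes mapped to pooling layers (stride 2 assumed for downsampling).
POOLING_OPS = {
    0: lambda: layers.MaxPooling2D(3, strides=2, padding="same"),
    1: lambda: layers.MaxPooling2D(5, strides=2, padding="same"),
    2: lambda: layers.MaxPooling2D(7, strides=2, padding="same"),
    3: lambda: layers.AveragePooling2D(3, strides=2, padding="same"),
    4: lambda: layers.AveragePooling2D(5, strides=2, padding="same"),
    5: lambda: layers.AveragePooling2D(7, strides=2, padding="same"),
}

def p_cell(x, op_codes, combine_code):
    """Two sampled pooling OPs merged by the sampled Combine, as in Fig. 5."""
    a = POOLING_OPS[op_codes[0]]()(x)
    b = POOLING_OPS[op_codes[1]]()(x)
    return layers.Add()([a, b]) if combine_code == 0 else layers.Concatenate()([a, b])
```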
3.4 Dropout cell
In addition, a dropout cell (D-cell) has been devised for adaptive dropout rate selection to mitigate model overfitting. Illustrated in Fig. 6, the D-cell introduces multiple Dropouts with distinct dropout rates into the search space (refer to Table 10). This mechanism empowers the controller to dynamically choose an appropriate Dropout, culminating in a voting mechanism that selects the best prediction result, consequently enhancing the model's generalization capabilities.
Table 10
Candidate dropout rates in the D-cell
Dropout rate | OP code | Dropout rate | OP code |
0.1 | 0 | 0.6 | 5 |
0.2 | 1 | 0.7 | 6 |
0.3 | 2 | 0.8 | 7 |
0.4 | 3 | 0.9 | 8 |
0.5 | 4 | | |
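A minimal sketch of applying the sampled dropout rate ahead of the classifier, assuming a tf.keras implementation; the global-average-pooling head is an assumption, and the voting over multiple sampled branches is omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Table 10 op codes 0-8 mapped to dropout rates 0.1-0.9.
DROPOUT_RATES = {code: round(0.1 * (code + 1), 1) for code in range(9)}

def d_cell(x, op_code, num_classes):
    """Apply the searched dropout rate before the classification layer."""
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(DROPOUT_RATES[op_code])(x)    # sampled dropout rate
    return layers.Dense(num_classes, activation="softmax")(x)
```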
3.5 Search strategy
This study employs reinforcement learning to identify the optimal cell architecture. As illustrated in Fig. 1, the controller first estimates the sampling probabilities for each cell and samples from them to obtain the operation codes for each cell. These operation codes are used to construct MFC-NAS candidate models, which are trained on randomly selected subsets of the training data; the controller parameters are then updated by policy gradients, using the validation accuracy as the reward, and the next search iteration begins, continuing until the search completes. The T-cell explores various shallow layers across different pre-trained models, the N-cell explores candidate operations for feature extraction, the P-cell explores candidate operations for downsampling, and the D-cell explores different dropout rates.
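A sketch of the policy-gradient (REINFORCE) update, assuming a tf.keras implementation; for brevity the controller is reduced to independent categorical logits rather than the RNN of [25], validation accuracy serves as the reward, and the baseline (e.g. a moving average of past rewards) is an assumption.

```python
import tensorflow as tf

class SimpleController(tf.keras.Model):
    """Per-decision categorical logits standing in for the RNN controller."""
    def __init__(self, num_choices_per_decision):
        super().__init__()
        self.logits = [tf.Variable(tf.zeros(n), name=f"decision_{i}")
                       for i, n in enumerate(num_choices_per_decision)]

    def sample(self):
        """Sample one op code per searchable decision."""
        return [int(tf.random.categorical(tf.expand_dims(l, 0), 1)[0, 0])
                for l in self.logits]

    def log_prob(self, codes):
        """Log-probability of an op-code list under the current policy."""
        return tf.add_n([tf.nn.log_softmax(l)[c]
                         for l, c in zip(self.logits, codes)])

def controller_update(controller, optimizer, codes, reward, baseline):
    """REINFORCE with baseline: raise the probability of codes that beat the baseline."""
    with tf.GradientTape() as tape:
        loss = -(reward - baseline) * controller.log_prob(codes)
    grads = tape.gradient(loss, controller.trainable_variables)
    optimizer.apply_gradients(zip(grads, controller.trainable_variables))

# Usage: codes = controller.sample(); build and train the candidate model; then
# controller_update(controller, tf.keras.optimizers.Adam(1e-3), codes,
#                   reward=val_accuracy, baseline=running_mean_reward)
```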
3.6 Candidate neural network architecture
The proposed MFC-NAS model is constructed from the operation codes. As illustrated in Fig. 7, once the controller has generated the operation-code lists for all cells, each cell is built from its corresponding list. Construction begins with the T-cell, whose operation-code list indexes the candidates in the T-cell search space, followed sequentially by the N-cell, P-cell, and D-cell. Within the candidate model, the N-cell and P-cell appear in pairs, stacked L times, with the final P-cell feeding the D-cell.
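A minimal sketch of assembling a candidate model from the decoded operation codes, reusing the build_t_cell, n_cell_block, p_cell, and d_cell sketches above; the op-code dictionary layout, L = 3, and the single block per N-cell are illustrative assumptions.

```python
import tensorflow as tf

def build_candidate(op_codes, num_classes, L=3, input_shape=(224, 224, 3)):
    """Stack T-cell -> L x (N-cell, P-cell) -> D-cell from sampled op codes.

    Example op_codes layout (illustrative):
    {"t_cell": (2, 1), "n_cell": ([2, 3], 1), "p_cell": ([0, 3], 1), "d_cell": 4}
    """
    inputs = tf.keras.Input(shape=input_shape)
    x = build_t_cell(op_codes["t_cell"], input_shape)(inputs)   # shared shallow features

    n_ops, n_combine = op_codes["n_cell"]
    p_ops, p_combine = op_codes["p_cell"]
    for _ in range(L):                                          # N-cell / P-cell pairs
        # One block per N-cell, with both block inputs taken from the previous
        # cell; feeding H[i-1] as in Fig. 3 would additionally require shape
        # alignment after pooling, which is omitted here.
        x = n_cell_block(x, x, n_ops, n_combine)
        x = p_cell(x, p_ops, p_combine)

    outputs = d_cell(x, op_codes["d_cell"], num_classes)        # final P-cell feeds the D-cell
    return tf.keras.Model(inputs, outputs, name="mfc_nas_candidate")
```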