Comparison of Original and Annotated Data
Disease detection models were developed individually for original field data as well as annotated data to evaluate the influence of heterogeneous field data on model performance.
Table 1 shows the classification accuracy (CA), true-positive rate (TPR), and false-positive rate (FPR) for the detection of Esca leaf symptoms in individual years. For the original data, TPRs of 70 to 82 % per year were achieved depending on the camera, meaning that an acceptable share of the symptomatic pixels was correctly identified. However, 20 to 34 % of the non-symptomatic pixels were falsely classified as positive. This resulted in CAs of 73, 70, and 77 % (VNIR) and 73, 81, and 80 % (SWIR) for the years 2016, 2017, and 2018, respectively. When only symptomatic leaves were considered (annotated data), results improved considerably: TPRs of up to 100 % were reached with FPRs of 0 to 11 %, leading to CAs between 88 and 95 %. Since no consistent differences could be detected between the VNIR and SWIR ranges for either original or annotated data, both wavelength ranges seem to be suitable for the differentiation tasks.
Table 1: Results for the detection of Esca leaf symptoms using original field data and annotated data.
| Evaluation | Data | Metric | VNIR 2016 | VNIR 2017 | VNIR 2018 | SWIR 2016 | SWIR 2017 | SWIR 2018 |
|---|---|---|---|---|---|---|---|---|
| Modeling | Original Data | CA (%) | 73 ± 2 | 70 ± 2 | 77 ± 2 | 73 ± 2 | 81 ± 2 | 80 ± 2 |
| | | TPR (%) | 71 ± 2 | 73 ± 2 | 72 ± 2 | 70 ± 10 | 82 ± 5 | 74 ± 10 |
| | | FPR (%) | 29 ± 2 | 32 ± 2 | 22 ± 2 | 34 ± 9 | 26 ± 4 | 20 ± 7 |
| | Annotated Data | CA (%) | 92 ± 1 | 90 ± 1 | 94 ± 1 | 88 ± 1 | 95 ± 1 | 92 ± 1 |
| | | TPR (%) | 89 ± 1 | 90 ± 1 | 93 ± 1 | 86 ± 4 | 90 ± 1 | 100 ± 1 |
| | | FPR (%) | 0 ± 1 | 11 ± 1 | 5 ± 1 | 2 ± 7 | 6 ± 1 | 5 ± 1 |
| Application per Plant | Original Data | CA (%) | 81 | 73 | 88 | 74 | 84 | 95 |
| | | TPR (%) | 79 | 76 | 86 | 63 | 80 | 86 |
| | | FPR (%) | 19 | 27 | 12 | 23 | 16 | 5 |
| | Annotated Data | CA (%) | 78 | 75 | 91 | 79 | 91 | 90 |
| | | TPR (%) | 58 | 71 | 71 | 60 | 60 | 71 |
| | | FPR (%) | 17 | 25 | 72 | 21 | 8 | 8 |
CA = classification accuracy, TPR = true-positive rate, FPR = false-positive rate
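The three metrics reported in the tables can be computed from binary pixel or plant labels as in the following minimal sketch. The function name and the label encoding (1 = symptomatic, 0 = non-symptomatic) are assumptions made for illustration and are not taken from the study's code.

```python
def classification_metrics(y_true, y_pred):
    """Return (CA, TPR, FPR) in percent for binary labels.

    Assumed encoding: 1 = symptomatic, 0 = non-symptomatic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    ca = 100.0 * (tp + tn) / len(pairs)
    # TPR relates to all symptomatic samples, FPR to all non-symptomatic ones.
    tpr = 100.0 * tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = 100.0 * fp / (fp + tn) if (fp + tn) else float("nan")
    return ca, tpr, fpr


labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
ca, tpr, fpr = classification_metrics(labels, predictions)
print(f"CA = {ca:.0f} %, TPR = {tpr:.0f} %, FPR = {fpr:.0f} %")
```

Note that the TPR is normalized by the number of symptomatic samples and the FPR by the number of non-symptomatic samples, so the two rates do not add up to the CA.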
Model development is based on an evaluation of all pixels and does not consider spatial scale. In a next step, the detection models were therefore applied at plant scale. For these evaluations, a majority vote over the pixel results can be used to derive a prediction of the symptom status per plant. The number of false positives can be adjusted manually, which in turn influences the number of true positives. For the per-plant evaluation, the FPR was set between 5 and 27 % in order to achieve the best TPR. Both models showed good CAs of up to 95 % when applied at plant level, but the model developed with original field data performed better, as indicated by higher TPRs (see Table 1).
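The per-plant decision can be sketched as a vote over the pixel predictions of each plant. The text does not state how the false-positive rate was adjusted; the tunable vote threshold below is one plausible mechanism and is an assumption, as are the function names and plant identifiers.

```python
def plant_is_symptomatic(pixel_predictions, threshold=0.5):
    """Vote over binary pixel predictions (1 = symptomatic) of one plant.

    threshold=0.5 corresponds to a plain majority vote. Raising it makes
    the plant-level decision more conservative (fewer false positives,
    but also fewer true positives). This knob is an assumption.
    """
    if not pixel_predictions:
        raise ValueError("plant has no classified pixels")
    share = sum(pixel_predictions) / len(pixel_predictions)
    return share > threshold


def classify_plants(pixels_by_plant, threshold=0.5):
    """Map each plant identifier to its plant-level prediction."""
    return {
        plant: plant_is_symptomatic(pixels, threshold)
        for plant, pixels in pixels_by_plant.items()
    }


# Hypothetical pixel results for three plants.
pixels_by_plant = {
    "A": [1, 1, 1, 0],
    "B": [0, 0, 1, 0],
    "C": [1, 0, 1, 0, 0],
}
print(classify_plants(pixels_by_plant))                 # plain majority vote
print(classify_plants(pixels_by_plant, threshold=0.3))  # more sensitive setting
```

Sweeping the threshold over a range and recording the resulting plant-level TPR and FPR would show the trade-off described above.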
Testing the Transferability of Disease Detection Models
In order to test the suitability of disease detection models for practical application, models were developed using hyperspectral data from two years. These models were then applied to the third year, which was not included in model development. In this way, the transferability of such disease detection models to unseen data could be examined.
Performances of the different models using original field data as well as annotated data are shown in Table 2. The two-year models showed moderate CAs that were comparable to those of the models developed on a single experimental year. Again, the model using annotated data performed better. For the original field data, TPRs of 71, 73, and 72 % (VNIR) and 79, 67, and 79 % (SWIR) were achieved for the three models, respectively, with FPRs of 29, 32, and 22 % (VNIR) and 36, 25, and 25 % (SWIR). This resulted in CAs of 68, 72, and 72 % for VNIR and 72, 73, and 79 % for SWIR. For annotated data, higher performances were obtained: 86 to 90 % of the symptomatic pixels were correctly identified in the VNIR range and 82 to 90 % in the SWIR range. Only 6 to 18 % of the non-symptomatic pixels were falsely classified as symptomatic, leading to CAs of 86 to 93 % depending on the model and camera.
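The train and test combinations of this design can be enumerated as in the following sketch. It only builds the splits; the classifier itself is not part of the example. The function names and the `(year, spectrum, label)` sample layout are hypothetical.

```python
from itertools import combinations


def held_out_year_splits(years):
    """Return (training_years, held_out_year) pairs.

    Each model is trained on all years but one and then applied to the
    year that was left out.
    """
    years = sorted(years)
    splits = []
    for training_years in combinations(years, len(years) - 1):
        held_out = next(y for y in years if y not in training_years)
        splits.append((training_years, held_out))
    return splits


def select_samples(samples, years):
    """Keep samples recorded in the given years.

    Each sample is assumed to be a (year, spectrum, label) tuple.
    """
    return [s for s in samples if s[0] in years]


for training_years, held_out in held_out_year_splits([2016, 2017, 2018]):
    print(f"train on {training_years}, apply to {held_out}")
```

Keeping the held-out year completely out of model development, including any preprocessing that is fitted to the data, is what makes the third-year results an estimate of performance on unknown data.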
Table 2: Results for the transferability evaluation of disease detection models.
| Evaluation | Data | Metric | VNIR 2016/17 | VNIR 2016/18 | VNIR 2017/18 | SWIR 2016/17 | SWIR 2016/18 | SWIR 2017/18 |
|---|---|---|---|---|---|---|---|---|
| Modeling | Original Data | CA (%) | 68 ± 1 | 72 ± 1 | 72 ± 1 | 72 ± 1 | 73 ± 1 | 79 ± 1 |
| | | TPR (%) | 71 ± 4 | 73 ± 3 | 72 ± 4 | 79 ± 8 | 67 ± 5 | 79 ± 4 |
| | | FPR (%) | 29 ± 4 | 32 ± 3 | 22 ± 4 | 36 ± 7 | 25 ± 4 | 25 ± 5 |
| | Annotated Data | CA (%) | 86 ± 2 | 89 ± 1 | 89 ± 0 | 86 ± 1 | 86 ± 3 | 93 ± 0 |
| | | TPR (%) | 86 ± 3 | 89 ± 3 | 90 ± 1 | 83 ± 3 | 82 ± 4 | 90 ± 2 |
| | | FPR (%) | 18 ± 4 | 12 ± 1 | 10 ± 1 | 14 ± 3 | 13 ± 2 | 6 ± 2 |
| Application on third year | Original Data | CA (%) | 63 | 62 | 57 | 57 | 62 | 52 |
| | | TPR (%) | 61 | 61 | 47 | 38 | 71 | 51 |
| | | FPR (%) | 19 | 43 | 68 | 35 | 30 | 72 |
| | Annotated Data | CA (%) | 82 | 73 | 59 | 36 | 56 | 49 |
| | | TPR (%) | 85 | 79 | 92 | 47 | 42 | 99 |
| | | FPR (%) | 14 | 16 | 40 | 20 | 31 | 72 |
CA = classification accuracy, TPR = true-positive rate, FPR = false-positive rate
Applying the two-year models to the third year led to highly divergent results. Although similar CAs were reached for the models using original data, the TPRs and FPRs differed markedly, especially in the SWIR range, where TPRs between 38 and 71 % and FPRs of 30 to 72 % were obtained. Models using annotated data performed considerably better for VNIR, with TPRs of 85, 79, and 92 % and corresponding FPRs of 14, 16, and 40 %. Again, the three models showed high variation in the SWIR range. In general, the 2017/18 model performed worst when applied to the third year for both the original and the annotated data approach, as indicated by high FPRs. The results also suggest that the VNIR range provides the more stable wavelength set.
Pre-Symptomatic Detection
For pre-symptomatic detection, plants were considered in 2017 and 2018 that showed no symptoms at the time of hyperspectral imaging but expressed Esca symptoms within the following two weeks. Again, models were developed per year for all pixels combined and then applied at plant level.
Table 3 shows CAs as well as TPRs and FPRs for 2017 and 2018. In 2017, CAs of 62 and 74 % for VNIR and SWIR, respectively, and similar TPRs were achieved. However, about one third of the non-symptomatic pixels were falsely identified as symptomatic. Both the CAs and the TPRs were higher in 2018. At the same time, the FPRs were lower at 28 % (VNIR) and 21 % (SWIR), meaning that pre-symptomatic disease detection performed better in 2018. In general, higher CAs were reached when the models were applied per plant. Here, too, better results were achieved in 2018, with TPRs of 100 % for both cameras and only 20 and 9 % false positives for VNIR and SWIR, respectively. For both approaches, the SWIR range seemed to be more suitable. So far, the results indicate that pre-symptomatic disease detection should be possible.
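The selection rule for pre-symptomatic plants can be written as a small date check. This is a sketch under assumptions: the function name, the use of a first-symptom date per plant, and the inclusive 14-day window are illustrative choices, not details given in the text.

```python
from datetime import date, timedelta


def is_pre_symptomatic(imaging_date, first_symptom_date, window_days=14):
    """Return True if a plant counts as pre-symptomatic at imaging time.

    The plant must be symptom-free on the imaging date and show its first
    symptoms within `window_days` afterwards. Treating the window as
    inclusive is an assumption.
    """
    if first_symptom_date is None:
        return False  # no symptoms observed at all: healthy control
    delay = first_symptom_date - imaging_date
    return timedelta(0) < delay <= timedelta(days=window_days)


imaging = date(2018, 7, 10)  # hypothetical imaging date
print(is_pre_symptomatic(imaging, date(2018, 7, 20)))  # symptoms 10 days later
print(is_pre_symptomatic(imaging, date(2018, 7, 10)))  # already symptomatic
print(is_pre_symptomatic(imaging, date(2018, 7, 30)))  # outside the window
```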
Table 3: Results for the pre-symptomatic detection of Esca leaf symptoms.
| Evaluation | Metric | VNIR 2017 | VNIR 2018 | SWIR 2017 | SWIR 2018 |
|---|---|---|---|---|---|
| Modeling | CA (%) | 62 ± 2 | 79 ± 2 | 74 ± 1 | 86 ± 2 |
| | TPR (%) | 63 ± 2 | 83 ± 4 | 78 ± 5 | 87 ± 4 |
| | FPR (%) | 35 ± 1 | 28 ± 3 | 32 ± 5 | 21 ± 4 |
| Application per Plant | CA (%) | 73 | 81 | 79 | 91 |
| | TPR (%) | 69 | 100 | 75 | 100 |
| | FPR (%) | 26 | 20 | 21 | 9 |
CA = classification accuracy, TPR = true-positive rate, FPR = false-positive rate
Multispectral Simulation
The machine learning approach enables the calculation of relevance profiles, thereby providing information on the most important wavelengths for the differentiation tasks. The relevance profiles of the symptomatic detection models (original and annotated data) and of the pre-symptomatic detection model for the combined years are depicted in Figure 1. Depending on the differentiation task, individual spectral bands are of higher importance. The relevance profiles of the symptomatic detection show nearly identical curves for original and annotated data, in contrast to the pre-symptomatic differentiation task.
Figure 1: Relevance profiles for the differentiation tasks in the VNIR (a) and SWIR (b) range combined for the years 2016, 2017, and 2018. Initially, all relevance values are set to 1.0; a value higher than 1.0 indicates an above-neutral importance and a value smaller than 1.0 indicates an importance lower than the start condition. The sum of the relevance values is constrained to a fixed value, so that a winner-takes-all competition for feature importance is built into the system.
In order to improve the efficiency of hyperspectral data analysis and to transfer the detection systems to a more practical application, data dimensionality has to be reduced. Therefore, up to ten local maxima were selected based on the relevance profiles to simulate a multispectral system (see Additional File 1). More than ten wavebands were considered unrealistic for commercial multispectral cameras. A threshold prevents overlap between the selected spectral bands. An example of the band selection methodology is depicted in Figure 2.
Figure 2: Example of the multispectral simulation. Based on the relevance profile (a), multispectral bands can be formed in important wavelength ranges. Starting with the two most informative bands (b), single bands can be added (c) according to their importance until a maximum of ten bands (d) is reached.
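The band selection illustrated in Figure 2 can be sketched as picking local maxima of the relevance profile in order of importance while enforcing a minimum distance between chosen bands. The function name, the definition of a local maximum, and the `min_separation` parameter (standing in for the overlap threshold, measured in band indices) are assumptions; the exact procedure is not specified in the text.

```python
def select_bands(relevance, max_bands=10, min_separation=3):
    """Pick up to `max_bands` local maxima of a relevance profile.

    Returns band indices ordered from most to least relevant. A candidate
    is skipped if it lies closer than `min_separation` indices to an
    already selected band (assumed stand-in for the overlap threshold).
    """
    # Interior points that rise from the left and do not rise to the right.
    peaks = [
        i for i in range(1, len(relevance) - 1)
        if relevance[i] > relevance[i - 1] and relevance[i] >= relevance[i + 1]
    ]
    peaks.sort(key=lambda i: relevance[i], reverse=True)

    chosen = []
    for i in peaks:
        if all(abs(i - j) >= min_separation for j in chosen):
            chosen.append(i)
        if len(chosen) == max_bands:
            break
    return chosen


# Hypothetical relevance profile with 1.0 as the neutral value.
profile = [1.0, 1.4, 1.0, 0.8, 1.9, 1.8, 1.85, 0.7, 1.2, 0.9]
print(select_bands(profile, max_bands=2))  # the two most informative bands
print(select_bands(profile))               # all admissible bands
```

In this example the peak at index 6 is dropped because it lies too close to the stronger peak at index 4, which mirrors the role of the overlap threshold.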
Table 4 shows CAs, TPRs, and FPRs for the dimensionality reduction. In addition to the results for the two to ten most important spectral bands, the entire spectrum is represented, which consists of 186 and 288 bands in the VNIR and SWIR range, respectively. Furthermore, the wavelengths of the MicaSense RedEdge™ camera used for airborne multispectral data acquisition were selected and simulated. In general, for the symptom detection models, the TPRs increased and the FPRs decreased as more wavelengths were considered. Again, the model using annotated data performed better than the model using original field data. For the annotated detection model, the TPRs were usually lower and the FPRs usually higher for the multispectral simulation than for the complete spectra. For the model based on original field data, the TPRs of the best-performing multispectral approach were always higher than those of the entire spectrum. For the pre-symptomatic model, the FPRs decreased, but the TPRs did not necessarily increase when more spectral bands were considered. The RedEdge™ wavelengths showed satisfying results but performed worse than the entire spectrum and the best band selection approach, as indicated by higher FPRs and lower TPRs.
Table 4: Results of the multispectral simulation for the three differentiation tasks.
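A simple way to simulate a multispectral band from hyperspectral data is to average all hyperspectral bands that fall within the window of the simulated band. The sketch below assumes a rectangular sensitivity window of equal width for every band; this is an illustrative simplification and not necessarily the procedure used here. The wavelengths, reflectance values, and band centers are hypothetical and do not represent any camera specification.

```python
def simulate_multispectral(wavelengths, spectrum, centers, bandwidth):
    """Average hyperspectral values inside each simulated band window.

    wavelengths: center wavelength of each hyperspectral band (nm)
    spectrum:    reflectance value per hyperspectral band
    centers:     center wavelengths of the simulated bands (nm)
    bandwidth:   full width of the rectangular window (nm), an assumption
    """
    simulated = []
    for center in centers:
        values = [
            value for wl, value in zip(wavelengths, spectrum)
            if abs(wl - center) <= bandwidth / 2
        ]
        if not values:
            raise ValueError(f"no hyperspectral band within window at {center} nm")
        simulated.append(sum(values) / len(values))
    return simulated


# Hypothetical six-band spectrum reduced to two simulated bands.
wavelengths = [400, 410, 420, 430, 440, 450]
spectrum = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
print(simulate_multispectral(wavelengths, spectrum, centers=[410, 440], bandwidth=20))
```

A real camera has a band-specific, non-rectangular spectral response, so weighting the hyperspectral values by that response curve would be a more faithful simulation where such curves are available.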
Airborne Multispectral Detection of Esca Symptoms
The airborne multispectral detection of foliar symptoms was evaluated according to the vines' disease severity. Due to the small number of vines with < 25 % and > 75 % symptomatic leaves, only results for plants with 25 to 75 % symptomatic leaves are listed in Table 5. CAs between 58 and 73 % were achieved for both years and classes, indicating that airborne symptom detection is a challenging task, as expected. The FPRs were similar between disease severity classes within each year, but the TPRs varied. For disease severity class 2, 57 and 45 % of all symptomatic vines were detected in 2017 and 2018, respectively. For disease severity class 3, TPRs of 60 and 74 % were obtained, implying a slight trend towards better detection of Esca symptoms with higher disease severity.
Table 5: Results for the multispectral detection of Esca leaf symptoms.
| Disease severity | Metric | 2017 | 2018 |
|---|---|---|---|
| Disease severity class 2 (25 – 50 % symptomatic) | CA (%) | 60 | 58 |
| | TPR (%) | 57 | 45 |
| | FPR (%) | 36 | 28 |
| Disease severity class 3 (50 – 75 % symptomatic) | CA (%) | 65 | 73 |
| | TPR (%) | 60 | 74 |
| | FPR (%) | 30 | 28 |
CA = classification accuracy, TPR = true-positive rate, FPR = false-positive rate