Remote sensing retrieval model of daily maximum temperature in rice growing region of Sichuan province
MOD11A2 is an 8-day composite land surface temperature (LST) product. Each value is the average of the clear-sky surface temperatures within the 8-day period, so it represents the image with the least cloud cover in those 8 days. The 8-10 μm band is most sensitive to temperatures in the range of 15-90 ℃, which is consistent with the range of the daily maximum temperature [25]. The study used LST (MOD11A2) and NIR reflectance (MOD09A1, 0.841-0.876 μm) from June 1 to September 30 of 2019 to 2021. EVI and NDVI calculated from MOD09A1 were also used as spectral parameters of the model. As spatial features, longitude and latitude describe the spatial relationships in the horizontal direction and elevation describes them in the vertical direction, which can effectively characterize the spatial distribution of points [26].
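The spectral inputs named above can be derived from the raw product values roughly as follows. This is a minimal sketch, not the authors' processing chain: it assumes the standard MODIS scale factors (0.0001 for MOD09A1 reflectance, 0.02 K for MOD11A2 LST) and the standard MODIS EVI coefficients, and the function names and sample values are hypothetical.

```python
def scale_reflectance(dn, scale=0.0001):
    """Convert a MOD09A1 integer value to surface reflectance (assumed scale factor)."""
    return dn * scale


def lst_celsius(dn, scale=0.02):
    """Convert a MOD11A2 LST integer value (kelvin after scaling) to degrees Celsius."""
    return dn * scale - 273.15


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else float("nan")


def evi(nir, red, blue):
    """Enhanced vegetation index with the standard MODIS coefficients."""
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    return 2.5 * (nir - red) / denom if denom != 0 else float("nan")


# Hypothetical pixel values, for illustration only.
nir = scale_reflectance(4000)   # about 0.40
red = scale_reflectance(800)    # about 0.08
blue = scale_reflectance(400)   # about 0.04
spectral_features = {
    "LST": lst_celsius(15000),  # 300 K, about 26.85 degrees Celsius
    "NIR": nir,
    "NDVI": ndvi(nir, red),
    "EVI": evi(nir, red, blue),
}
```

The zero-denominator guard returns NaN so that such pixels can be dropped together with cloud-covered samples in a later filtering step.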
First, the above basic information and the image data were extracted at the locations of the conventional ground meteorological observation stations in the rice growing region of Sichuan Province. Combined with the measured maximum temperature data, this gave 4943 groups of data in total, of which 2968 groups were valid after cloud-covered and other invalid data were removed. The final model input parameters were 4 spectral features, 3 spatial features (position X and Y, elevation H) and 1 temporal feature (diurnal sequence, D).
This study set up six control experiments with different feature parameters: the spectral model (spectral features only), the elevation model (spectral and elevation features), the diurnal sequence model (spectral and diurnal sequence features), the positional model (spectral and positional features), the spatial model (spectral, elevation and positional features) and the spatiotemporal model (spectral, elevation, positional and temporal features). The input parameters are shown in Table 2.
Table 2. Model input parameters
| Spectral model | Elevation model | Positional model | Spatial model | Spatiotemporal model |
| --- | --- | --- | --- | --- |
| Spectral band | Spectral band | Spectral band | Spectral band | Spectral band |
| | Elevation (H) | Longitude (X) | Longitude (X) | Longitude (X) |
| | | Latitude (Y) | Latitude (Y) | Latitude (Y) |
| | | | Elevation (H) | Elevation (H) |
| | | | | Diurnal sequence (D) |
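The six control experiments described in the text can be expressed as a small feature configuration. The sketch below is illustrative only: the feature keys (`LST`, `NIR`, `EVI`, `NDVI`, `X`, `Y`, `H`, `D`), the model names and the sample record are hypothetical labels chosen here, not identifiers from the study.

```python
# Feature groups; the spectral names are assumed from the data description.
SPECTRAL = ["LST", "NIR", "EVI", "NDVI"]
POSITION = ["X", "Y"]   # longitude, latitude
ELEVATION = ["H"]
TIME = ["D"]            # diurnal sequence

# One entry per control experiment listed in the text.
FEATURE_SETS = {
    "spectral": SPECTRAL,
    "elevation": SPECTRAL + ELEVATION,
    "diurnal_sequence": SPECTRAL + TIME,
    "positional": SPECTRAL + POSITION,
    "spatial": SPECTRAL + POSITION + ELEVATION,
    "spatiotemporal": SPECTRAL + POSITION + ELEVATION + TIME,
}


def select_features(sample, model_name):
    """Return the input vector of one sample for the named experiment."""
    return [sample[name] for name in FEATURE_SETS[model_name]]


# Hypothetical station record, for illustration only.
sample = {"LST": 31.2, "NIR": 0.35, "EVI": 0.48, "NDVI": 0.71,
          "X": 104.1, "Y": 30.6, "H": 512.0, "D": 196}
x_spatiotemporal = select_features(sample, "spatiotemporal")
```

Keeping the feature lists in one place means the same training routine can be run unchanged for every experiment.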
Statistical indicators express the quality of the model fit through changes in the data. To evaluate the accuracy of the different models, this paper selected the correlation coefficient (R2), root-mean-square error (RMSE) and mean absolute error (MAE) as evaluation indicators. R2 measures how well the model fits the data. RMSE measures the deviation between predicted values and observed values and is commonly used as a standard for evaluating the predictions of machine learning models. MAE is the average of the absolute errors and reflects the actual size of the prediction errors [30].
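For reference, the three indicators can be computed as in the sketch below. The text does not give the formula used for R2, so computing it as 1 - SS_res / SS_tot is an assumption; the sample temperatures are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n


def r2(observed, predicted):
    """R2 as 1 - SS_res / SS_tot (assumed definition, see lead-in)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily maximum temperatures in degrees Celsius.
obs = [30.0, 32.0, 35.0, 31.0]
pred = [29.0, 33.0, 34.0, 33.0]
scores = {"RMSE": rmse(obs, pred), "MAE": mae(obs, pred), "R2": r2(obs, pred)}
```

RMSE penalizes large errors more strongly than MAE, which is why the two indicators can rank models differently.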
Inversion model accuracy based on genetic algorithm BP neural network method
A BP neural network based on genetic algorithm generally includes an input layer, a hidden layer, and an output layer. The main genetic algorithm parameters that affect network performance are the population size (20-160), the crossover probability (0.25-1), and the mutation probability (0.05-1). After tuning, a model with 6 input-layer nodes, 8 hidden-layer nodes, and 1 output-layer node was built. The population size was set to 100, with a crossover probability of 0.5 and a mutation probability of 0.1.
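The genetic algorithm step can be sketched as below. This is not the authors' implementation: it reads the 6, 8 and 1 in the text as node counts, uses tournament selection, single-point crossover, Gaussian mutation and elitism as assumed operators, evolves only the network weights, and omits the back-propagation fine-tuning that would normally follow. The data are synthetic.

```python
import math
import random

N_IN, N_HID = 6, 8                          # node counts read from the text (assumption)
N_GENES = N_IN * N_HID + N_HID + N_HID + 1  # all weights and biases, flattened


def predict(genes, x):
    """Forward pass of a 6-8-1 network stored as one flat chromosome."""
    out = genes[-1]                         # output bias
    for h in range(N_HID):
        w = genes[h * N_IN:(h + 1) * N_IN]
        b = genes[N_IN * N_HID + h]
        act = math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
        out += genes[N_IN * N_HID + N_HID + h] * act
    return out


def fitness(genes, X, y):
    """RMSE of the network on (X, y); lower is better."""
    return math.sqrt(sum((predict(genes, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y))


def evolve(X, y, pop_size=100, p_cross=0.5, p_mut=0.1, generations=5, seed=0):
    """Evolve network weights; returns the best chromosome and the best RMSE per generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(N_GENES)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=lambda g: fitness(g, X, y))  # best first
        history.append(fitness(ranked[0], X, y))
        nxt = [ranked[0][:]]                                  # elitism: keep the best
        while len(nxt) < pop_size:
            # Tournament of size 3 on ranks: the lowest index is the fittest.
            pa = ranked[min(rng.randrange(pop_size) for _ in range(3))]
            pb = ranked[min(rng.randrange(pop_size) for _ in range(3))]
            child = pa[:]
            if rng.random() < p_cross:                        # single-point crossover
                cut = rng.randrange(1, N_GENES)
                child = pa[:cut] + pb[cut:]
            # Per-gene Gaussian mutation with probability p_mut.
            child = [g + rng.gauss(0, 0.3) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=lambda g: fitness(g, X, y))
    return best, history


# Synthetic, standardized data for illustration only.
data_rng = random.Random(1)
X_demo = [[data_rng.random() for _ in range(N_IN)] for _ in range(30)]
y_demo = [x[0] - 0.6 * x[1] for x in X_demo]
best, history = evolve(X_demo, y_demo)
```

Because the best individual is copied unchanged into each new generation, the recorded RMSE can never get worse from one generation to the next.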
Inversion model accuracy for different numbers of training points
Table 3 shows the accuracy of the inversion model for different numbers of training points using the BP neural network method based on genetic algorithm. As the number of training points increased, RMSE first decreased and then increased. Because a smaller RMSE means a higher model accuracy, the accuracy first rose and then fell. The accuracy was highest with 2000 training points. With 2500 training points, the model showed overfitting and the accuracy began to decline. The results indicate that continuing to add training points does not necessarily improve model accuracy. Therefore, 2000 sets of data were randomly selected as training points, and an additional 400 sets of data were selected as validation points for model evaluation.
Table 3. Accuracy of the inversion model for different numbers of training points using the BP neural network method based on genetic algorithm
| Number of training points | RMSE (℃) | MAE (℃) |
| --- | --- | --- |
| 500 | 3.64 | 2.69 |
| 1000 | 3.22 | 2.49 |
| 2000 | 3.11 | 2.44 |
| 2500 | 3.63 | 2.51 |
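The choice of training set size from Table 3 amounts to picking the row with the lowest RMSE. A minimal sketch of that selection, using the values reported in Table 3 (the function name and the MAE tie-break are assumptions made here):

```python
# Table 3 values: number of training points -> (RMSE, MAE) in degrees Celsius.
TABLE_3 = {
    500: (3.64, 2.69),
    1000: (3.22, 2.49),
    2000: (3.11, 2.44),
    2500: (3.63, 2.51),
}


def best_training_size(results):
    """Return the training size with the lowest RMSE, using MAE to break ties."""
    return min(results, key=lambda n: results[n])


chosen = best_training_size(TABLE_3)  # 2000 for the Table 3 values
```

The same function applies to any table of this shape, so the selection rule stays identical across the three methods.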
Accuracy of Inversion Models with Different Parameters
Figure 1 shows the correlation between the predicted and actual values of the inversion models built with the BP neural network method based on genetic algorithm. The correlation coefficient was 0.62 for the spectral model, 0.65 for the positional model, 0.71 for the elevation model, 0.74 for the spatial model, 0.78 for the diurnal sequence model and 0.81 for the spatiotemporal model. After spatial and temporal features were added, the correlation improved to varying degrees. Compared to the spectral model, the correlation coefficient of the positional model increased by about 3%, that of the elevation model by about 9%, that of the spatial model by about 11%, and that of the diurnal sequence model by about 16%. The diurnal sequence information therefore improved the correlation more than the position and elevation information did. The spatiotemporal model combined all three feature types, and its correlation coefficient increased by about 19%, the largest improvement.
Table 4 shows the accuracy of the inversion models built with the BP neural network method based on genetic algorithm. Compared to the spectral model, which takes only spectral data as input, adding spatial and temporal features improved model accuracy to varying degrees. Adding elevation features decreased RMSE by 0.1 and MAE by 0.13, a slight improvement. Adding positional features decreased RMSE by 0.05 and MAE by 0.05, also a slight improvement. Adding the diurnal sequence feature decreased RMSE by 0.39 and MAE by 0.49, a significant improvement. Adding both elevation and positional features decreased RMSE by 0.16 and MAE by 0.22, a slight improvement. Temporal features had a more significant impact on the model than spatial features, as the diurnal sequence is an important factor affecting temperature. Elevation features had a more significant impact than positional features, as the higher the altitude, the lower the daily maximum temperature. The spatiotemporal model combined all three feature types and performed best, with an RMSE of 3.11 and an MAE of 2.44.
Table 4. Inversion model accuracy of the BP neural network method based on genetic algorithm
| | Spectral model | Elevation model | Positional model | Diurnal sequence model | Spatial model | Spatiotemporal model |
| --- | --- | --- | --- | --- | --- | --- |
| RMSE (℃) | 3.61 | 3.51 | 3.56 | 3.12 | 3.45 | 3.11 |
| MAE (℃) | 2.97 | 2.84 | 2.92 | 2.48 | 2.75 | 2.44 |
Inversion Model Accuracy Based on Support Vector Machine Method
Support vector machine is a small-sample learning method with good nonlinear processing ability, which can effectively avoid the iterative process falling into local minima. The method raises the feature dimensionality through a kernel function and achieves linear regression by constructing a decision function in the resulting high-dimensional space [27]. After tuning, a model with a radial basis function (RBF) kernel and a penalty parameter of 6.0 was constructed.
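To make the role of the kernel concrete, the sketch below evaluates an RBF kernel and the general form of a support vector regression decision function, f(x) = Σ c_i K(s_i, x) + b. It is a hedged illustration only: the kernel width `gamma`, the support vectors, the coefficients and the intercept are made-up values, since the text reports only the kernel type and the penalty parameter (which bounds the magnitude of each coefficient c_i).

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, intercept, gamma):
    """Decision function: a linear combination of kernel values plus an intercept."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + intercept


# Hypothetical fitted quantities, for illustration only.
PENALTY_C = 6.0                      # penalty parameter reported in the text
support_vectors = [[0.2, 0.4], [0.8, 0.1]]
dual_coefs = [2.5, -1.0]             # each must satisfy |c_i| <= PENALTY_C
y_hat = svr_predict([0.5, 0.3], support_vectors, dual_coefs, intercept=30.0, gamma=0.5)
```

The function is linear in the kernel values, which is the sense in which the regression is linear in the high-dimensional feature space while remaining nonlinear in the original inputs.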
Inversion model accuracy for different numbers of training points
Table 5 shows the accuracy of the inversion model for different numbers of training points using the support vector machine method. As the number of training points increased, RMSE first decreased and then increased, so the model accuracy first rose and then fell. The accuracy was highest at 1000 training points. With more than 1000 training points, the model showed overfitting and the accuracy began to decrease. The results indicate that increasing the number of training points in the support vector machine model cannot continuously increase the accuracy of the model. Therefore, 1000 sets of data were randomly selected as training points, and an additional 200 sets of data were selected as validation points for model evaluation.
Table 5. Accuracy of the inversion model for different numbers of training points using the support vector machine method
| Number of training points | RMSE (℃) | MAE (℃) |
| --- | --- | --- |
| 500 | 3.38 | 2.74 |
| 1000 | 3.01 | 2.36 |
| 1500 | 3.51 | 2.69 |
| 2000 | 3.56 | 2.73 |
| 2500 | 3.54 | 2.78 |
Accuracy of Inversion Models with Different Parameters
Figure 6 shows the correlation between the predicted and actual values of the inversion models with different parameters based on the support vector machine method. The correlation coefficient was 0.65 for the spectral model, 0.67 for the positional model, 0.66 for the elevation model, 0.69 for the spatial model, 0.71 for the diurnal sequence model and 0.79 for the spatiotemporal model. After spatial and temporal features were added, the correlation improved to varying degrees. Compared to the spectral model, the correlation coefficients of the positional model, the elevation model, the spatial model and the diurnal sequence model increased by about 2%, 1%, 3%, and 6%, respectively. In the support vector machine inversion model, the diurnal sequence information improved the correlation more than the position and elevation information did. The spatiotemporal model combined all three feature types, and its correlation coefficient increased by about 13%, the largest improvement.
Table 6 shows the accuracy of the inversion models with different input parameters based on the support vector machine method. Compared to the spectral model, which takes only spectral data as input, adding spatial and temporal features improved model accuracy to varying degrees. Adding elevation features decreased RMSE by 0.42 and MAE by 0.42, a slight improvement. Adding positional features decreased RMSE by 0.39 and MAE by 0.33, a slight improvement. Adding the diurnal sequence feature decreased RMSE by 0.47 and MAE by 0.4, a significant improvement. Adding both elevation and positional features decreased RMSE by 0.59 and MAE by 0.59, a significant improvement. In the support vector machine model, the spatial features improved the model more than the diurnal sequence feature did, and the elevation features improved it more than the positional features did. The spatiotemporal model combined all three feature types and performed best, with an RMSE of 2.78 and an MAE of 2.31.
Table 6. Accuracy of inversion models with different input parameters based on the support vector machine method
| | Spectral model | Elevation model | Positional model | Diurnal sequence model | Spatial model | Spatiotemporal model |
| --- | --- | --- | --- | --- | --- | --- |
| RMSE (℃) | 3.61 | 3.19 | 3.22 | 3.14 | 3.02 | 2.78 |
| MAE (℃) | 2.99 | 2.57 | 2.66 | 2.59 | 2.43 | 2.31 |
Accuracy of inversion model based on Random Forest method
Random Forest is an ensemble algorithm built from decision trees. A Random Forest for nonlinear regression is formed by growing trees according to random vectors. Because it contains multiple decision trees, it reduces the risk of overfitting. It is also easy to interpret, handles categorical features, extends easily to multi-class problems, and does not require feature scaling. After tuning, a Random Forest model with 90 decision trees and a minimum leaf size of 5 was constructed.
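The idea of growing many trees on random vectors and averaging them can be illustrated with a small pure-Python sketch. It is a simplified stand-in, not the implementation used in the study: each tree is grown on a bootstrap sample, each split considers a random third of the features (an assumed setting), and the data are synthetic. Only the 90 trees and the minimum leaf size of 5 come from the text.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, min_leaf, n_try, rng):
    """Grow a regression tree on rows of (features, target); leaves hold the mean target."""
    best = None
    for j in rng.sample(range(len(rows[0][0])), n_try):   # random feature subset
        ordered = sorted(rows, key=lambda r: r[0][j])
        for i in range(min_leaf, len(ordered) - min_leaf + 1):
            lo, hi = ordered[i - 1][0][j], ordered[i][0][j]
            if lo == hi:
                continue                                   # no threshold between equal values
            cost = sse([t for _, t in ordered[:i]]) + sse([t for _, t in ordered[i:]])
            if best is None or cost < best[0]:
                best = (cost, j, (lo + hi) / 2, ordered[:i], ordered[i:])
    if best is None:                                       # too few rows or no valid split
        return sum(t for _, t in rows) / len(rows)
    _, j, thr, left, right = best
    return (j, thr, build_tree(left, min_leaf, n_try, rng),
            build_tree(right, min_leaf, n_try, rng))


def predict_tree(node, x):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def fit_forest(rows, n_trees=90, min_leaf=5, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n_try = max(1, len(rows[0][0]) // 3)
    return [build_tree([rng.choice(rows) for _ in rows], min_leaf, n_try, rng)
            for _ in range(n_trees)]


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)


# Synthetic rows for illustration: the target falls with the third feature.
demo_rng = random.Random(2)
rows = []
for _ in range(60):
    f = [demo_rng.random(), demo_rng.random(), demo_rng.random()]
    rows.append((f, 35.0 - 6.0 * f[2] + demo_rng.gauss(0, 0.5)))
forest = fit_forest(rows)
y_hat = predict_forest(forest, [0.5, 0.5, 0.5])
```

Every prediction is an average of leaf means, so it always lies within the range of the training targets; this is one reason the method does not extrapolate beyond observed temperatures.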
Inversion model accuracy for different numbers of training points
Table 7 shows the model accuracy for different numbers of training points using the Random Forest method. As the number of training points increased, RMSE continued to decrease, and the model accuracy showed an upward trend. The results show that adding training points to the Random Forest model increases its accuracy, so using more training samples is more meaningful for retrieving the daily maximum temperature with the Random Forest method. Therefore, 2500 sets of data were randomly selected as training points, and the remaining 468 sets of data were used as validation samples for model accuracy evaluation.
Table 7. Accuracy of the inversion model for different numbers of training points using the Random Forest method
| Number of training points | RMSE (℃) | MAE (℃) |
| --- | --- | --- |
| 500 | 3.42 | 2.79 |
| 1000 | 3.15 | 2.47 |
| 1500 | 3.09 | 2.41 |
| 2000 | 2.97 | 2.37 |
| 2500 | 2.75 | 2.19 |
Inversion model accuracy with different input parameters
Figure 3 shows the correlation between the predicted and actual values of the inversion models with different input parameters based on the Random Forest method. The correlation coefficient was 0.56 for the spectral model, 0.65 for the positional model, 0.61 for the elevation model, 0.71 for the spatial model, 0.74 for the diurnal sequence model and 0.84 for the spatiotemporal model. After spatial and temporal features were added, the model correlation improved to varying degrees. Compared to the spectral model, the correlation coefficient of the positional model increased by about 9%, that of the elevation model by about 5%, that of the spatial model by about 15%, and that of the diurnal sequence model by about 18%. The results indicate that the diurnal sequence information increased the correlation more than the position and elevation information did. The spatiotemporal model combined all three feature types, and its correlation coefficient increased by about 28%, the largest improvement.
Table 8 shows the accuracy of the inversion models with different input parameters based on the Random Forest method. Compared to the spectral model, which takes only spectral data as input, adding spatial and temporal features improved model accuracy to varying degrees. Adding elevation features decreased RMSE by 0.24 and MAE by 0.15, and the improvement in model accuracy was not significant. Adding positional features decreased RMSE by 0.15 and MAE by 0.29, a slight improvement. Adding the diurnal sequence feature decreased RMSE by 0.65 and MAE by 0.6, a significant improvement. Adding both elevation and positional features decreased RMSE by 0.26 and MAE by 0.14, a moderate improvement. For the inversion model based on the Random Forest method, the diurnal sequence feature improved the accuracy of the model most significantly, followed by the elevation feature. The spatiotemporal model combined all three feature types and performed best, with an RMSE of 2.75 and an MAE of 2.19.
Table 8. Accuracy of inversion models with different input parameters based on the Random Forest method
| | Spectral model | Elevation model | Positional model | Diurnal sequence model | Spatial model | Spatiotemporal model |
| --- | --- | --- | --- | --- | --- | --- |
| RMSE (℃) | 3.76 | 3.54 | 3.61 | 3.11 | 3.50 | 2.75 |
| MAE (℃) | 3.07 | 2.92 | 2.88 | 2.47 | 2.93 | 2.19 |