Research on a Retrieval Model of Daily Maximum Temperature in the Rice-Growing Region of Sichuan Province Based on Remote Sensing Data

doi:10.21203/rs.3.rs-4318534/v1

Based on observation data from 103 conventional meteorological stations, MODIS data, and geographic information data, a BP neural network optimized by a genetic algorithm, Random Forest, and a support vector machine were applied to build retrieval models of daily maximum temperature in the rice-growing region of Sichuan Province. The results indicated that: (1) The model combining multispectral information with spatiotemporal information had the highest inversion accuracy, and the model using only spectral information had the lowest. Adding spatial and temporal information could effectively mitigate the impact of complex atmospheric environments on model inversion. (2) For the inversion models based on the genetic-algorithm BP neural network and on Random Forest, accuracy was significantly improved by the day sequence information; for the inversion model based on the support vector machine, accuracy was significantly improved by the spatial information. (3) The Random Forest model had the highest accuracy, with RMSE = 2.75 ℃ and MAE = 2.19 ℃. The support vector machine model had the second highest accuracy, with RMSE = 2.78 ℃ and MAE = 2.31 ℃. The genetic-algorithm BP neural network model had the lowest accuracy, with RMSE = 3.11 ℃ and MAE = 2.44 ℃.

Rice-growing region of Sichuan Province

Daily maximum temperature

Machine learning

Inversion model

Temperature is not only a key parameter characterizing surface climate, but also one of the most basic meteorological elements. At present, most temperature data in China are obtained from observations at meteorological stations. Because factors such as vegetation, altitude, soil, and water strongly affect temperature, the data measured at a single meteorological station represent only a limited area. In sparsely populated regions with harsh environments, meteorological stations are very sparse, and spatial interpolation methods often fail to produce high-precision spatial distributions of air temperature [1]. High-precision temperature estimation methods are therefore urgently needed. Compared with conventional surface meteorological observations, satellite remote sensing data offer several advantages. Land surface temperature (LST) retrieved by satellite remote sensing is related to air temperature through the energy balance. Satellite imagery also provides information about land vegetation, water bodies, and the atmosphere. For these reasons, it is possible to retrieve air temperature from land surface temperature.

A large body of research now exists on remote sensing inversion methods for temperature, mainly including single-factor statistics, multi-factor statistics, and machine learning. The single-factor statistical method establishes a linear or nonlinear relationship between near-surface air temperature and either the brightness temperature of the thermal infrared channel or the land surface temperature retrieved from it. Huang et al. (2015) used land surface temperature from MODIS remote sensing data and a spatial interpolation method to obtain the regional near-surface temperature in spring, summer, autumn, and winter, with a root-mean-square deviation (RMSE) of 2.41 [2]. Kawashima et al. (1996) used Landsat satellite data to estimate air temperature in Japan by establishing the relationship between surface temperature and air temperature under cloudless conditions [3]. The single-factor statistical method is simple and easy to implement, but using surface temperature as the only variable can cause significant deviation. Other relevant factors must therefore be considered when relating surface temperature to air temperature. Multi-factor statistics integrate multiple factors. For example, Zhou Hongmei et al. (2001) used building density, vegetation distribution, and water systems as factors to estimate the temperature of urban areas, industrial areas, urban-suburban fringe areas, and agricultural areas in Shanghai, with correlation coefficients > 0.87 [4].

Machine learning studies how to use data and algorithms to imitate the way humans learn, gradually improving accuracy. Cheolhee et al. (2018) designed eight different variable input schemes and simulated the daily maximum and minimum temperatures using the Random Forest method [5]. The best model had R² > 0.7 and a root-mean-square deviation < 1.7 ℃. Gao Liang et al. (2020) used machine learning methods such as Random Forest, SVM, AdaBoost, and ridge regression to calculate the correlation between temperature and various factors [6]. The results showed that the Random Forest model had the highest precision, with R² = 0.85 and a root-mean-square deviation of 0.50 ℃. Xing Liting et al. (2020) used the Random Forest model to simulate the near-surface temperature in Lanzhou, with R² values of 0.921 and 0.916 [7]. Guo Jianmao et al. (2018) used LST, EVI, and NDVI to invert the daily maximum temperature [8].

In this study, observation data from 103 conventional meteorological stations, MODIS data, and geographic information data were used. Spectral, spatial, and temporal characteristics were selected as model input parameters, and a BP neural network based on a genetic algorithm, Random Forest, and a support vector machine were used to build retrieval models of daily maximum temperature in the rice-growing region of Sichuan Province.

Introduction to the research area

The study area is located at 26°-33° N and 100°-103.6° E. It belongs to the subtropical monsoon climate zone, with a warm climate and abundant precipitation. The rice-growing region spans 19 cities and prefectures: Chengdu, Guang'an, Neijiang, Ya'an, Meishan, Leshan, Deyang, Dazhou, Panzhihua, Nanchong, Yibin, Suining, Guangyuan, Luzhou, Bazhong, Ziyang, Zigong, Mianyang, and Liangshan Yi Autonomous Prefecture. The area is warm and humid throughout the year, with an average annual temperature of 16-18 ℃, an accumulated temperature of 4000-6000 ℃, and a frost-free period of 230-340 days. Cloud cover is high, sunny days are few, and rainfall is abundant, with annual precipitation of 1000-1200 mm.

Data

The meteorological data used in this study were provided by the Sichuan Meteorological Bureau and include the daily average temperature and daily maximum temperature of 103 conventional meteorological stations in the rice-growing region from 1961 to 2022. The remote sensing data include MODIS data, Landsat 8 OLI data, and DEM elevation data for Sichuan Province.

MODIS is carried by the Terra and Aqua satellites. Terra passes over at 10:30 local time and is called the "morning star"; Aqua passes over at 13:30 local time and is called the "afternoon star". MODIS data products are divided into Level 0 (original data), Level 1 (with calibration parameters attached), Level 2 (data after calibration and geolocation), Level 3 (product data), and Level 4 (images to which geometric correction and radiometric calibration have been applied according to the parameter library, so that each pixel has an accurate geographic location and a reflectance or emissivity value). There are three main types of standard data products: land, atmosphere, and ocean standard data products, which are further divided into 44 standard data products.

MOD09 is a Level 2 surface reflectance product. In its 8-day composite, each pixel contains the best observation from the 8-day period, selected on the basis of high observation coverage, low view angle, absence of clouds or cloud shadow, and aerosol loading. This study used the MOD09Q1 product, which has a spatial resolution of 250 m × 250 m and a temporal resolution of 8 d. Table 1 introduces the MOD09Q1 bands.

Table 1. MOD09Q1 bands

Band	Spectral range (nm)	Spatial resolution
1(Red)	620-670	250m
2(NIR)	841-876	250m

MOD11A2 is synthesized from the daily 1 km land surface temperature product (MOD11A1) and has a spatial resolution of 1 km × 1 km. The stored value is the average land surface temperature over the clear-sky days within each 8-day period.

This study used MOD09Q1 and MOD11A2 data. The study area is covered by four MODIS tiles: h26v05, h26v06, h27v05, and h27v06. The study period was 2019 to 2021.

Research Methods

BP neural network based on genetic algorithm

A neural network is a widely interconnected network of many neurons that can simulate how a biological nervous system interacts with real-world objects. The BP neural network is a multilayer feedforward network trained with the error back-propagation algorithm and is one of the most widely used neural network models. It has a simple structure, many adjustable parameters, many available training algorithms, and good operability. However, it converges slowly and cannot guarantee convergence to the global minimum. A genetic algorithm can be used to optimize the neural network to address these weaknesses. The basic idea is to let each individual in the population encode the initial weights and thresholds of the network, to evaluate each individual with the BP neural network initialized from its values, and to apply selection, crossover, and mutation operations to search for the optimal weights and thresholds [9-13]. This study implemented the genetic algorithm by programming in MATLAB.
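The idea of encoding the network's initial weights and thresholds as individuals can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the tanh activation, tournament selection, single-point crossover, Gaussian mutation, and the toy data are all assumptions made for illustration, while the default population size and the crossover and mutation probabilities follow the values reported later in the paper. The subsequent back-propagation fine-tuning step is omitted.

```python
import math
import random

def n_params(n_in, n_hid):
    """Number of weights and thresholds in an n_in -> n_hid -> 1 network."""
    return n_hid * (n_in + 1) + n_hid + 1

def forward(chrom, x, n_in, n_hid):
    """Decode a chromosome into weights and thresholds, then run a forward pass."""
    pos, hidden = 0, []
    for _ in range(n_hid):
        w, b = chrom[pos:pos + n_in], chrom[pos + n_in]
        hidden.append(math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b))
        pos += n_in + 1
    w_out, b_out = chrom[pos:pos + n_hid], chrom[pos + n_hid]
    return sum(wi * hi for wi, hi in zip(w_out, hidden)) + b_out

def evolve(X, y, n_in, n_hid, pop_size=100, p_cross=0.5, p_mut=0.1,
           generations=20, seed=0):
    """Search for good initial weights; returns (best chromosome, best MSE per generation)."""
    rng = random.Random(seed)
    size = n_params(n_in, n_hid)

    def fitness(chrom):  # mean squared error on the samples (lower is better)
        return sum((forward(chrom, x, n_in, n_hid) - t) ** 2
                   for x, t in zip(X, y)) / len(y)

    pop = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        history.append(fitness(ranked[0]))
        children = [ranked[0][:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            # tournament selection: the lowest rank index among 3 random picks wins
            p1 = ranked[min(rng.randrange(pop_size) for _ in range(3))]
            p2 = ranked[min(rng.randrange(pop_size) for _ in range(3))]
            child = p1[:]
            if rng.random() < p_cross:  # single-point crossover
                cut = rng.randrange(1, size)
                child = p1[:cut] + p2[cut:]
            if rng.random() < p_mut:  # Gaussian mutation of one gene
                child[rng.randrange(size)] += rng.gauss(0, 0.5)
            children.append(child)
        pop = children
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, history

# Toy data (hypothetical): the target is a smooth function of two inputs.
X = [(i / 10, (i % 5) / 5) for i in range(20)]
y = [math.sin(a) + 0.5 * b for a, b in X]
# A small population keeps the demonstration fast.
best, history = evolve(X, y, n_in=2, n_hid=3, pop_size=30, generations=15)
print(history[0], history[-1])  # with elitism the best error never increases
```

In a full workflow the chromosome returned by `evolve` would seed the weights of the BP network before gradient-based training, which is the step that the genetic algorithm is meant to make less sensitive to poor initial values.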

Support Vector Machine

Support Vector Regression (SVR) is a small-sample learning method with good nonlinear processing ability that effectively avoids falling into local minima during iteration. The method uses a kernel function to map the features into a higher-dimensional space and constructs a decision function there, so that the regression becomes linear in that space [14]. Geometrically, the SVR algorithm seeks a regression hyperplane in n-dimensional space that minimizes the distance from each point to the hyperplane [15-20].
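The role of the kernel function can be made concrete with a short sketch of the standard SVR building blocks. The RBF kernel matches the kernel type reported later in the paper, but the `gamma` and `eps` values, the function names, and the example numbers are hypothetical; the sketch shows only how a fitted model predicts and how the epsilon-insensitive loss treats small errors, not how the coefficients are trained.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svr_predict(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias.

    The prediction is linear in the kernel values, i.e. linear in the
    high-dimensional feature space induced by the kernel.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Errors smaller than eps cost nothing; larger errors cost linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)

# Hypothetical fitted model with two support vectors.
svs = [(0.0, 0.0), (1.0, 1.0)]
coefs = [2.0, -1.0]
print(svr_predict((0.0, 0.0), svs, coefs, bias=30.0))
```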

Random Forest

Random Forest is an ensemble learner that contains multiple decision trees, with no association between the trees. Its basic unit is the decision tree, and it integrates multiple trees into one algorithm through ensemble learning. Ensemble learning solves a single prediction problem by building several models and combining them according to specific rules, achieving better results than a single learner. Random Forest works by generating multiple classifiers or models that learn and make predictions independently, and then combining their predictions [21-24].
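The ensemble principle described above can be sketched in plain Python for the regression case. This is an illustrative toy, not the implementation used in the study: each tree is grown on a bootstrap sample with a random subset of features tried at each split, and the forest prediction is the mean of the tree predictions. The function names, the `n_try` parameter, and the synthetic data are assumptions.

```python
import random

def sum_sq(ts):
    """Sum of squared deviations from the mean."""
    m = sum(ts) / len(ts)
    return sum((t - m) ** 2 for t in ts)

def build_tree(rows, min_leaf, n_try, rng):
    """rows: list of (features, target). Returns a nested dict or a leaf mean."""
    targets = [t for _, t in rows]
    mean = sum(targets) / len(targets)
    if len(rows) < 2 * min_leaf:
        return mean
    n_feat = len(rows[0][0])
    best = None  # (sse, feature index, threshold)
    for f in rng.sample(range(n_feat), min(n_try, n_feat)):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for x, t in rows if x[f] <= thr]
            right = [t for x, t in rows if x[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = sum_sq(left) + sum_sq(right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:
        return mean
    _, f, thr = best
    return {
        "feature": f, "threshold": thr,
        "left": build_tree([r for r in rows if r[0][f] <= thr], min_leaf, n_try, rng),
        "right": build_tree([r for r in rows if r[0][f] > thr], min_leaf, n_try, rng),
    }

def predict_tree(node, x):
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

def fit_forest(rows, n_trees=90, min_leaf=5, n_try=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap sample
        forest.append(build_tree(sample, min_leaf, n_try, rng))
    return forest

def predict_forest(forest, x):
    """Average the independent tree predictions."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)

# Synthetic data (hypothetical): the target rises with the first feature only.
rows = [((i / 10, (i * 7 % 10) / 10), 20 + 0.5 * i) for i in range(60)]
forest = fit_forest(rows, n_trees=20)
print(predict_forest(forest, (0.5, 0.5)), predict_forest(forest, (5.5, 0.5)))
```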

Remote sensing retrieval model of daily maximum temperature in rice growing region of Sichuan province

MOD11A2 is an 8-day composite of land surface temperature (LST): the average LST over the clear-sky days within the 8-day period, taken from the observations with the least cloud cover. The 8-10 μm band is most sensitive to temperatures in the range of 15-90 ℃, which is consistent with the range of daily maximum temperature [25]. The study therefore used LST (MOD11A2) and NIR reflectance (MOD09A1, 0.841-0.876 μm) from June 1 to September 30 of each year from 2019 to 2021, together with EVI and NDVI calculated from MOD09A1, as the spectral parameters of the model. As spatial features, longitude and latitude describe spatial relationships in the horizontal direction and elevation describes them in the vertical direction, which effectively characterizes the spatial distribution of points [26].
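For reference, the two vegetation indices can be computed from surface reflectance as in the following sketch. It uses the standard NDVI definition and the standard MODIS EVI coefficients (gain 2.5, C1 = 6, C2 = 7.5, L = 1); the function names and sample reflectances are hypothetical, and the 0.0001 scale factor for converting stored integers to reflectance is stated here as an assumption about the input files.

```python
def to_reflectance(dn, scale=0.0001):
    """Convert a stored integer value to reflectance (assumed scale factor)."""
    return dn * scale

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    denom = nir + red
    return None if denom == 0 else (nir - red) / denom

def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, l=1.0):
    """Enhanced Vegetation Index with the standard MODIS coefficients."""
    denom = nir + c1 * red - c2 * blue + l
    return None if denom == 0 else gain * (nir - red) / denom

# Hypothetical reflectances for a vegetated pixel.
nir, red, blue = to_reflectance(5000), to_reflectance(1000), to_reflectance(500)
print(round(ndvi(nir, red), 4), round(evi(nir, red, blue), 4))
```

EVI needs a blue band in addition to the red and NIR bands, whereas NDVI needs only red and NIR.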

First, the above basic information and image data were extracted at the geographic locations of the conventional ground meteorological stations in the rice-growing region of Sichuan Province and combined with the measured daily maximum temperature. This gave 4943 groups of data in total, of which 2968 groups remained valid after removing cloud-covered and other invalid data. The final model input parameters were 4 spectral features, 3 spatial features (position X, Y, and elevation H), and 1 temporal feature (diurnal sequence, D).
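The temporal feature D can be derived from the observation date. The sketch below assumes that the diurnal sequence is the day of year, which the paper does not state explicitly, and also shows how a date maps to the start of its MODIS 8-day compositing period (periods begin on day 1, 9, 17, ... of each year). The function names are hypothetical.

```python
from datetime import date

def day_of_year(d):
    """Day sequence D, assumed here to be the day of year (1-366)."""
    return d.timetuple().tm_yday

def composite_start_doy(doy):
    """First day of the 8-day compositing period that contains `doy`."""
    return ((doy - 1) // 8) * 8 + 1

def in_study_season(d):
    """June 1 to September 30 of 2019-2021, as used in the study."""
    return 2019 <= d.year <= 2021 and date(d.year, 6, 1) <= d <= date(d.year, 9, 30)

d = date(2019, 6, 1)
print(day_of_year(d), composite_start_doy(day_of_year(d)), in_study_season(d))
```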

This study set up six control experiments with different feature parameters: the spectral model (spectral features only), the elevation model (spectral and elevation features), the diurnal sequence model (spectral and diurnal sequence features), the positional model (spectral and positional features), the spatial model (spectral, elevation, and positional features), and the spatiotemporal model (spectral, elevation, positional, and temporal features). The input parameters are shown in Table 2.

Table 2. Model input parameters

spectral model	Elevation model	location model	spatial model	spatiotemporal model
spectral band	spectral band	spectral band	spectral band	spectral band
	Elevation （H）	longitude （X）	longitude （X）	longitude （X）
		latitude （Y）	latitude （Y）	latitude （Y）
			Elevation （H）	Elevation （H）
				Diurnal sequence （D）
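The six experiments described in the text can be written down as a small configuration, which makes it easy to train the same learner on each feature set in turn. This is a sketch: the key names are hypothetical, and the four spectral features are assumed to be LST, NIR, EVI, and NDVI, following the description of the spectral parameters above.

```python
# Assumed spectral inputs; X = longitude, Y = latitude, H = elevation, D = day sequence.
SPECTRAL = ["LST", "NIR", "EVI", "NDVI"]

FEATURE_SETS = {
    "spectral": SPECTRAL,
    "elevation": SPECTRAL + ["H"],
    "diurnal_sequence": SPECTRAL + ["D"],
    "positional": SPECTRAL + ["X", "Y"],
    "spatial": SPECTRAL + ["X", "Y", "H"],
    "spatiotemporal": SPECTRAL + ["X", "Y", "H", "D"],
}

def select_features(record, model_name):
    """Return the input vector of one sample for the named experiment."""
    return [record[name] for name in FEATURE_SETS[model_name]]

# Hypothetical sample.
sample = {"LST": 31.2, "NIR": 0.42, "EVI": 0.51, "NDVI": 0.68,
          "X": 104.1, "Y": 30.6, "H": 506.0, "D": 182}
print(select_features(sample, "elevation"))
```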

Statistical indicators characterize the quality of model fitting. To evaluate the accuracy of the different models, this paper selected the correlation coefficient (R²), root-mean-square deviation (RMSE), and mean absolute error (MAE) as evaluation parameters. R² measures the goodness of fit of the model. RMSE measures the deviation between predicted values and observed values and is commonly used as a standard for assessing the predictions of machine learning models. MAE is the average of the absolute errors and reflects the actual magnitude of the prediction errors [30].
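The three indicators can be computed as in the sketch below. R² is implemented here as the coefficient of determination, 1 - SS_res / SS_tot; this is an assumption, since the paper does not give its formula and a squared Pearson correlation would be an alternative reading. The sample values are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root-mean-square deviation between observed and predicted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted daily maximum temperatures (℃).
obs = [30.0, 32.0, 34.0, 36.0]
pred = [31.0, 32.0, 33.0, 38.0]
print(round(rmse(obs, pred), 3), mae(obs, pred), round(r2(obs, pred), 3))
```

Because RMSE squares the errors before averaging, it penalizes large errors more heavily than MAE, so RMSE is never smaller than MAE on the same data.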

Inversion model accuracy based on genetic algorithm BP neural network method

A BP neural network based on a genetic algorithm generally includes an input layer, a hidden layer, and an output layer. The main genetic algorithm parameters that affect network performance are the population size (20-160), the crossover probability (0.25-1), and the mutation probability (0.05-1). After tuning, a model with 6 input-layer nodes, 8 hidden-layer nodes, and 1 output-layer node was built. The population size was set to 100, with a crossover probability of 0.5 and a mutation probability of 0.1.

Inversion model accuracy for different numbers of training points

Table 3 shows the accuracy of the inversion model based on the genetic algorithm BP neural network method for different numbers of training points. As the number of training points increased, RMSE first decreased and then increased. Because a smaller RMSE means higher model accuracy, the model accuracy first increased and then decreased. The model accuracy was highest at 2000 training points. At 2500 training points, the model showed overfitting and its accuracy began to decline. The results indicate that continually adding training points does not necessarily increase model accuracy. Therefore, 2000 sets of data were randomly selected as training points, and an additional 400 sets of data were selected as validation points for model evaluation.

Table 3. Accuracy of the inversion model for different numbers of training points, BP neural network method based on genetic algorithm

The number of training points	RMSE（℃）	MAE（℃）
500	3.64	2.69
1000	3.22	2.49
2000	3.11	2.44
2500	3.63	2.51
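The choice of training-set size amounts to picking the row with the lowest RMSE. A minimal sketch using the values from Table 3 (the variable names are hypothetical):

```python
# Number of training points -> (RMSE, MAE) in ℃, copied from Table 3.
table3 = {
    500: (3.64, 2.69),
    1000: (3.22, 2.49),
    2000: (3.11, 2.44),
    2500: (3.63, 2.51),
}

# Select the training size with the smallest RMSE.
best_size = min(table3, key=lambda n: table3[n][0])
print(best_size, table3[best_size])
```

For these values the same size also minimizes MAE, so the choice does not depend on which of the two indicators is used.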

Accuracy of Inversion Models with Different Parameters

Figure 1 shows the correlation between the predicted and actual values for the inversion models built with the BP neural network method based on genetic algorithm. The correlation coefficient was 0.62 for the spectral model, 0.65 for the position model, 0.71 for the elevation model, 0.74 for the spatial model, 0.78 for the daily order model, and 0.81 for the spatiotemporal model. After spatial and temporal features were added, the correlation improved to varying degrees. Compared with the spectral model, the correlation coefficient increased by about 3% for the position model, about 9% for the elevation model, 11% for the spatial model, and about 16% for the diurnal sequence model. The diurnal sequence information improved the correlation more than the position and elevation information did. The spatiotemporal model combined all three feature types, and its correlation coefficient increased by about 19%, the best result.

Table 4 shows the accuracy of the inversion models built with the BP neural network method based on genetic algorithm. Compared with the spectral model, which takes only spectral data as input, adding spatial and temporal features improved model accuracy to varying degrees. Adding elevation features decreased RMSE by 0.1 and MAE by 0.13, a slight improvement. Adding positional features decreased RMSE by 0.05 and MAE by 0.05, a slight improvement. Adding daily sequence features decreased RMSE by 0.39 and MAE by 0.49, a significant improvement. Adding both elevation and position features decreased RMSE by 0.16 and MAE by 0.22, a slight improvement. Temporal features had a more significant impact on the model than spatial features, because the diurnal sequence is an important factor affecting temperature. Elevation features had a more significant impact than positional features, because the higher the altitude, the lower the daily maximum temperature. The spatiotemporal model combined all three feature types and performed best, with an RMSE of 3.11 and an MAE of 2.44.

Table 4. Inversion Model Accuracy of BP Neural Network Method Based on Genetic Algorithm

Metric	Spectral model	Elevation model	Location model	Daily order model	Spatial model	Spatiotemporal model
RMSE（℃）	3.61	3.51	3.56	3.12	3.45	3.11
MAE（℃）	2.97	2.84	2.92	2.48	2.75	2.44

Inversion Model Accuracy Based on Support Vector Machine Method

The support vector machine is a small-sample learning method with good nonlinear processing ability that effectively avoids falling into local minima during iteration. The method uses a kernel function to map the features into a higher-dimensional space and constructs a decision function there, so that the regression becomes linear in that space [27]. After tuning, a model with an RBF kernel and a penalty parameter of 6.0 was constructed.

Inversion model accuracy for different numbers of training points

Table 5 shows the accuracy of the inversion model based on the support vector machine method for different numbers of training points. As the number of training points increased, RMSE first decreased and then increased, so the model accuracy first increased and then decreased. The model accuracy was highest at 1000 training points. With more than 1000 training points, the model showed overfitting and its accuracy began to decrease. The results indicate that increasing the number of training points in the support vector machine model cannot continuously increase model accuracy. Therefore, 1000 sets of data were randomly selected as training points, and an additional 200 sets of data were selected as validation points for model evaluation.

Table 5. Accuracy of the inversion model for different numbers of training points, support vector machine method

The number of training points	RMSE（℃）	MAE（℃）
500	3.38	2.74
1000	3.01	2.36
1500	3.51	2.69
2000	3.56	2.73
2500	3.54	2.78

Accuracy of Inversion Models with Different Parameters

Figure 2 shows the correlation between the predicted and actual values for the inversion models with different input parameters based on the support vector machine method. The correlation coefficient was 0.65 for the spectral model, 0.67 for the position model, 0.66 for the elevation model, 0.69 for the spatial model, 0.71 for the daily order model, and 0.79 for the spatiotemporal model. After spatial and temporal features were added, the correlation improved to varying degrees. Compared with the spectral model, the correlation coefficients of the position model, the elevation model, the spatial model, and the diurnal model increased by about 2%, 1%, 3%, and 6%, respectively. In the support vector machine inversion model, the daily order information improved the correlation more than the position information and the elevation information did. The spatiotemporal model combined all three feature types, and its correlation coefficient increased by about 13%, the best result.

Table 6 shows the accuracy of the inversion models with different input parameters based on the support vector machine method. Compared with the spectral model, which takes only spectral data as input, adding spatial and temporal features improved model accuracy to varying degrees. Adding elevation features decreased RMSE by 0.42 and MAE by 0.42, a slight improvement. Adding positional features decreased RMSE by 0.39 and MAE by 0.33, a slight improvement. Adding daily sequence features decreased RMSE by 0.47 and MAE by 0.4, a significant improvement. Adding both elevation and position features decreased RMSE by 0.59 and MAE by 0.59, a significant improvement. In the support vector machine model, spatial features improved the model more than the daily order features did, and elevation features improved it more than positional features did. The spatiotemporal model combined all three feature types and performed best, with an RMSE of 2.78 and an MAE of 2.31.

Table 6. Precision of Inversion Models with Different Input Parameters Based on Support Vector Machine Method

Metric	Spectral model	Elevation model	Location model	Daily order model	Spatial model	Spatiotemporal model
RMSE（℃）	3.61	3.19	3.22	3.14	3.02	2.78
MAE（℃）	2.99	2.57	2.66	2.59	2.43	2.31

Accuracy of inversion model based on Random Forest method

Random Forest is an ensemble of decision trees. A Random Forest for nonlinear regression is formed by growing trees according to random vectors. Because it contains multiple decision trees, Random Forest reduces the risk of overfitting; it is also easy to interpret, handles categorical features, extends easily to multi-class problems, and does not require feature scaling. After tuning, a Random Forest model with 90 decision trees and a minimum leaf size of 5 was constructed.

Inversion model accuracy for different numbers of training points

Table 7 shows the model accuracy for different numbers of training points based on the Random Forest method. As the number of training points increased, RMSE continued to decrease and the model accuracy rose steadily. The results showed that adding training points increases the accuracy of the Random Forest model, so using more training samples is worthwhile when retrieving daily maximum temperature with the Random Forest method. Therefore, 2500 sets of data were randomly selected as training points, and the remaining 468 sets of data were used as validation samples for model accuracy evaluation.

Table 7. Accuracy of the inversion model for different numbers of training points, Random Forest method

The number of training points	RMSE（℃）	MAE（℃）
500	3.42	2.79
1000	3.15	2.47
1500	3.09	2.41
2000	2.97	2.37
2500	2.75	2.19
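The random division of the 2968 valid samples into training and validation sets can be sketched as follows. The function name, the fixed seed, and the use of integer indices in place of real samples are assumptions for illustration.

```python
import random

def split_samples(samples, n_train, n_valid=None, seed=0):
    """Shuffle once, then take n_train training samples and n_valid validation
    samples from the remainder (all remaining samples if n_valid is None)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    rest = shuffled[n_train:]
    valid = rest if n_valid is None else rest[:n_valid]
    return train, valid

samples = list(range(2968))  # stand-ins for the 2968 valid data groups
rf_train, rf_valid = split_samples(samples, 2500)        # Random Forest split
bp_train, bp_valid = split_samples(samples, 2000, 400)   # GA-BP split
print(len(rf_train), len(rf_valid), len(bp_train), len(bp_valid))
```

Drawing the validation points from the samples left over after the training draw keeps the two sets disjoint, so validation errors are not computed on points the model has already seen.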

Inversion model accuracy with different input parameters

Figure 3 shows the correlation between the predicted and actual values for the inversion models with different input parameters based on the Random Forest method. The correlation coefficient was 0.56 for the spectral model, 0.65 for the position model, 0.61 for the elevation model, 0.71 for the spatial model, 0.74 for the daily order model, and 0.84 for the spatiotemporal model. After spatial and temporal features were added, the correlation improved to varying degrees. Compared with the spectral model, the correlation coefficient increased by about 9% for the position model, about 5% for the elevation model, 15% for the spatial model, and about 18% for the diurnal model. The daily order information improved the correlation more than the location and elevation information did. The spatiotemporal model combined all three feature types, and its correlation coefficient increased by about 28%, the best result.

Table 8 shows the accuracy of the inversion models with different input parameters based on the Random Forest method. Compared with the spectral model, which takes only spectral data as input, adding spatial and temporal features improved model accuracy to varying degrees. Adding elevation features decreased RMSE by 0.24 and MAE by 0.15, an improvement that was not significant. Adding positional features decreased RMSE by 0.15 and MAE by 0.29, a slight improvement. Adding daily sequence features decreased RMSE by 0.65 and MAE by 0.6, a significant improvement. Adding both elevation and position features decreased RMSE by 0.26 and MAE by 0.14, a moderate improvement. For the inversion model based on the Random Forest method, the time series feature improved model accuracy the most, followed by the elevation feature. The spatiotemporal model combined all three feature types and performed best, with an RMSE of 2.75 and an MAE of 2.19.

Table 8. Accuracy of inversion model with different input parameters based on Random Forest method

Metric	Spectral model	Elevation model	Location model	Daily order model	Spatial model	Spatiotemporal model
RMSE（℃）	3.76	3.54	3.61	3.11	3.5	2.75
MAE（℃）	3.07	2.92	2.88	2.47	2.93	2.19

At present, research on high-temperature heat damage in rice draws mainly on five categories of data: field measurements, station data, high-temperature stress experiments, reanalysis data, and remote sensing data. Field studies generally measure local field data while high-temperature heat damage is occurring and examine how different climate conditions affect rice growth and development in the local area; this method is strongly local. Wang Cailin et al. (2004) combined field experimental data with a survey of the harm caused by local high temperatures to rice, analyzed the impact of high temperatures on the rice seed setting rate, and proposed corresponding defense strategies [28]. Station data are the most commonly used in studies of high-temperature heat damage in rice, as they cover a long time span and a wide area. Wan Suqin et al. (2009) and Yu Kun et al. (2010) used station data to study the relationship between high-temperature heat damage, empty shell rate, and rice yield in Hubei, Jiangsu, and other regions, and analyzed the spatiotemporal distribution characteristics of high-temperature heat damage in rice [29-30]. Based on high-temperature data from 68 meteorological stations in Zhejiang Province, Jin Zhifeng et al. (2009) analyzed the occurrence pattern of high-temperature heat damage in early rice and its impact on yield [31]. They established a simulation equation between circulation factors and high-temperature heat damage, which provides a reference for monitoring, warning, and impact assessment of high-temperature heat damage in rice.

After the United States launched the Earth Resources Satellite (Landsat) in 1972, satellite remote sensing technology was applied to agriculture. There have been three representative agricultural remote sensing monitoring and yield estimation programs at home and abroad. The first was the "Large Area Crop Inventory Experiment" (LACIE) jointly conducted by NASA, the Department of Agriculture, and the National Oceanic and Atmospheric Administration, which used image data received from Landsat 1-3 to estimate the area and yield of nine wheat-producing states in the United States. The second was the "Agriculture and Resources Inventory Surveys Through Aerospace Remote Sensing" (AgRISTARS) program, which aimed to develop the technological means needed by the United States to monitor global food production and to forecast the area and unit yield of major food crops in the United States and many other countries. The third was the "Monitoring Agriculture with Remote Sensing" (MARS) project implemented by the Sixth Division of the European Union, which established a crop assessment system for the European region and applied its results to the EU's common agricultural policy, for example in verifying agricultural subsidies and farmer declarations. Although China started relatively late in agricultural remote sensing, it has also achieved good results. Since the 1980s, the Chinese Ministry of Agriculture and other relevant departments have conducted research on crop yield estimation using remote sensing technology in northern China. Over the past 40 years, agricultural remote sensing in China has progressed from introducing technology and tackling scientific and technological challenges to supporting macro decision-making, accumulating a large amount of practical experience and producing good economic and social benefits. The purpose of agricultural remote sensing monitoring is to monitor the planting area, growth, yield, and other factors of crops. Current research in agricultural remote sensing therefore mainly includes monitoring changes in crop sowing area, monitoring crop growth, drought conditions, and yield changes, and predicting total crop yield. The satellites mainly used to monitor regional changes in crops include the Landsat satellites, the SPOT satellites, the Fengyun series satellites, and the Terra and Aqua satellites, among others.

This article used a 500 m MODIS reflectance dataset as the main data for rice area recognition. Because the study area is fragmented, low-spatial-resolution data reduce the accuracy of rice area extraction and of recognition of the rice heading and flowering stage. Landsat data were therefore added to the MODIS data. However, because Landsat data have low temporal resolution and are strongly affected by cloud cover, some errors may remain. In practical applications, more satellite data with high spatial and temporal resolution should be used to extract the area and developmental period of rice. Estimating daily maximum temperature from MODIS 8-day composite data through machine learning also requires data with higher temporal resolution to avoid errors. This article adopted the national meteorological industry standards. As high-temperature heat damage in rice becomes more frequent, more heat-resistant varieties have been developed and promoted for cultivation, so different high-temperature heat damage standards for different rice varieties are needed to achieve higher-precision remote sensing monitoring of high-temperature heat damage in rice.

In this study, a BP neural network optimized by genetic algorithm, a support vector machine, and the Random Forest method were used to build retrieval models of daily maximum temperature and to retrieve the daily maximum temperature in the rice growing region of Sichuan Province. The following conclusions were obtained.

(1) The performance of a machine learning inversion model varied with the number of training points, and different inversion models had different requirements for training points. When there were too many training points, the model was prone to overfitting and its accuracy began to decline.

(2) Machine learning methods were used to retrieve the daily maximum temperature in the rice growing region of Sichuan Province. For the BP neural network based on genetic algorithm, combining spectral information with day-sequence information reduced RMSE by 0.39 and MAE by 0.49, and adding spatial information reduced RMSE by 0.16 and MAE by 0.22. For the support vector machine, combining spectral information with day-sequence information reduced RMSE by 0.47 and MAE by 0.40, and adding spatial information reduced RMSE by 0.59 and MAE by 0.59. For Random Forest, combining spectral information with day-sequence information reduced RMSE by 0.65 and MAE by 0.60, and adding spatial information reduced RMSE by 0.26 and MAE by 0.14. Multispectral information combined with spatiotemporal information could therefore effectively retrieve the daily maximum temperature in the rice growing region of Sichuan Province. Compared with a model built only on spectral information, adding spatial information and day-sequence information significantly improved model accuracy and effectively mitigated the impact of complex atmospheric environments on model inversion. For the models based on the BP neural network with genetic algorithm and on Random Forest, the day-sequence information improved accuracy more, whereas for the model based on support vector machine, the spatial information improved accuracy more.

(3) The inversion model based on Random Forest had the highest accuracy (RMSE = 2.19 ℃) and strong applicability. The inversion model based on support vector machine had the second highest accuracy (RMSE = 2.31 ℃), while the model based on the BP neural network with genetic algorithm had the lowest accuracy (RMSE = 2.44 ℃).
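The error measures and the comparison in conclusion (2) can be made concrete with a short sketch. The `rmse` and `mae` functions follow the standard definitions of root mean square error and mean absolute error. The two dictionaries only restate the reductions reported in conclusion (2). All names are illustrative assumptions, not the code used in this study.

```python
import math
from typing import Dict, Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between observed and retrieved temperatures."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error between observed and retrieved temperatures."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


# Reductions reported in conclusion (2), keyed by the information that was added.
RMSE_REDUCTION = {
    "GA-BP": {"day_sequence": 0.39, "spatial": 0.16},
    "SVM": {"day_sequence": 0.47, "spatial": 0.59},
    "Random Forest": {"day_sequence": 0.65, "spatial": 0.26},
}
MAE_REDUCTION = {
    "GA-BP": {"day_sequence": 0.49, "spatial": 0.22},
    "SVM": {"day_sequence": 0.40, "spatial": 0.59},
    "Random Forest": {"day_sequence": 0.60, "spatial": 0.14},
}


def dominant_information(reductions: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each model, name the added information with the larger error reduction."""
    return {model: max(gains, key=gains.get) for model, gains in reductions.items()}
```

Applied to either dictionary, `dominant_information` returns the day-sequence information for the GA-BP and Random Forest models and the spatial information for the support vector machine, which matches the statement at the end of conclusion (2).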

Supporting information

Sichuan Natural Science Foundation project "Quantitative Assessment of Climate Impact of Northwest Sichuan Ecological Demonstration Zone on Surrounding Regions" (2022 NSFSC0208); Rainstorm, Drought and Flood disasters in Plateau and Basin, Science and Technology Development of Sichuan Provincial Key Laboratory (SCQXKJYJXMS202210); Key Project of Gansu Provincial Natural Science Foundation (21JR7RA694); Arid Meteorological Science Research Fund Project (IAM202004).

Author Contribution

Wang F.Z. & Yuan S.J.: Formal analysis, Writing - original draft. Yang S.H.: Conceptualization, Supervision, Writing - review & editing. Han L. & Wang T.: Formal analysis, Writing - review & editing. All authors have read and agreed to the published version of the manuscript.

Qi, S. H., Wang, B. J., Zhang, Q. Y., Luo, C. F., & Zheng, L. Study on the Estimation of Air Temperature from MODIS Data. National Remote Sensing Bulletin, 2005; (05):570–575.
Huang, R., Zhang, C., Huang, J. X., Zhu, D. H., Wang, L. M.,& Liu, J. Mapping of Daily Mean Air Temperature in Agricultural Regions Using Daytime and Nighttime Land Surface Temperatures Derived from TERRA and AQUA MODIS Data. Remote Sensing, 2015; 7:8728–8756.
Kawashima, S., Ishida, T., Minomura, M., & Tetsuhisa, M. Relations between Surface Temperature and Air Temperature on a Local Scale during Winter Nights. Collected Papers of Agricultural Meteorology,1996; 41:1570–1579.
Zhou, H. M., Zhou, C. H., Ge, W. Q, & Ding, J. C. The Surveying on Thermal Distribution in Urban Based on GIS and Remote Sensing. Acta Geographica Sinica, 2001; 56(2): 189–197.
Cheolhee, Y., Jungho, Im., Seonyoung, Park., Lindi, J,. & Quackenbush. Estimation of Daily Maximum and Minimum Air Temperatures in Urban Landscapes Using MODIS Time Series Satellite Data. ISPRS Journal of Photogrammetry and Remote Sensing, 2018; 137:149–162.
GAO, L., DU, X., LI, Q. Z., WANG, H.Y., ZHANG, Y., & WANG, S.Y. A Near-surface Air Temperature Spatialization Method Integrating Landuse and Soil Moisture Products. Journal of Geo-Information Science, 2020; 22(10):2023–2037.
XING, L. T., & LI, J. Temperature Simulation and Temporal Variation Based on Remote Sensing Data and Random Forest Algorithm: A Case Study in the Loess Plateau Region, China. Mountain Research, 2020; (6):873–880.
LI, F. F., Zhao, Y. K. & XU, B. X. Study of Neural Network Based on Hierarchical Genetic Algorithm. Machinery & Electronics, 2006; (02):41–44.
Wang, Z. L., & Gu, S. S. FNN Identifier Based on Real-Valued Genetic Algorithms. Journal of Northeastern University(Natural Science), 2020; (04):354–356.
Zhou, A. Z., Li, L. P., Zhong, G. Y., Yang, H. Q., & Chen, G. Prediction of Casing Window with Swirling Abrasive Jet Based on Genetic Algorithm Optimization of BP Artificial Neural Network. Science Technology and Engineering, 2014; 14(27):202–206.
Feng, H. D., Han, X., & Luo, H. S. Indoor air quality evaluation model based on GA-BP neural network. Microcomputer & Its Applications, 2017; 36(23):54–57.
Zhou, Y., Lu, J. A., & Liu, X. Prediction of Chlorophyll a Content in Water Body Based on BP Neural Network with Improved Genetic Algorithm. Electronic Test, 2022; (15):37–42.
Zhu, J. S., Song, Z. Z., & Zhao, L. L. Multi-spectral Water Depth Inversion Based on Bottom Sediment Classification and SVR Algorithm. Geospatial Information, 2019; 17(11): 44–46.
Niu, Y., & Guo, C. Prediction of atmospheric evaporation data in Fuping County based on support vector machine. Meteorological Science and Technology, 2022; 42(14):63–66.
Wu, Y. T., Chen, J. C., Xu, Y. X., Chai, J. W., & Zhang, X.Research on Temperature Forecast of Zhangye City Based on Support Vector Machine. Gansu Science and Technology, 2022; 38(05):26–28 + 36. 10.3969/j.issn.1000-0952.2022.05.009.
Guo, F., & Xie, L. Y. Air Quality index Estimation Method based on Meteorological Elements Data and Modified Support Vector Machine. Environmental Engineering, 2017; 35(10):151–155. Luo, F. Q. An Overview of Rainfall Forecasting Based on Support Vector Machine. Journal of Guangxi Science & Technology Normal University, 2017; 32(02):113–116.
Zhang, Q., Huang, S. Z., & Chen, X. H. Simulation and Prediction of Soil Moisture Based on Support Vector Machine Technique. Acta Pedologica Sinica, 2013; 50(01):59–67.
Wang, Z. W., Zheng, Z. F., Chen, & M., Gao, H. Prediction of Meteorological Elements Based on Nonlinear Support Vector Machine Regression Method. Journal of Applied Meteorological Science, 2012; 23(05):562–570.
Fan, S. X., Yang, C. X., Yang, Q. L., & HAN, S. C. Prediction model of Panax notoginseng leaf area growth based on particle swarm-optimization random forest algorithm and meteorological data. Chinese Herbal Medicines, 2022; 53(10):3103–3110.
Xu, Y. P. & Chen, Y. A. Urban Air Quality Prediction Model Based on Random Forest Regression and Meteorological Parameters:Take Chongqing as an Example. Journal of Chongqing Technology and Business University (Social Science Edition), 2021; 38(06):118–124.
Li, D., Chen, W. T., Le, Z. Y., Fan, X. L., Sun, Y., Meng, Y. T., & Yang, J. Forecast method for the first flowering date of Dangshansu pear based on random forest algorithm and meteorological factors. Transactions of the Chinese Society of Agricultural Engineering, 2020; 36(12):143–151.
Ren, C. R., & Xie, G. Prediction of PM_(2.5) Concentration Level Based on Random Forest and Meteorological Parameters. Computer Engineering and Applications, 2019; 55(02):213–220.
Guo, J. M., Wang, J. J., Wu, Y., Xie, X. Y., Shen, S. H., & Yu, G. G. Improvement of model on rice heat injury monitor and assessment by MODIS and meteorology station data. Journal of Natural Disasters, 2018; 27(01):163–174.
Zhang, X., Tang, Y. Y., & Wei, J. Research on retrieval of remote sensing images on surface temperature-sensitive band. Science of Surveying and Mapping, 2015; 40(05):37–43.
Meng, X. Y., Liu, Y. X., Chen, Y. L., Wang, Y. H., & Chen, Y. C. Spatial feature-based machine learning model for shallow water depth retrieval. Advances in Marine Science,2023; 1–12.
Zhang, J. S., He, C. Y., Pan, Y. Z., & Li, J. The High Spatial Resolution RS Image Classification Based on SVM Method with the Multi-Source Data. National Remote Sensing Bulletin,2016; (01):49–57.
Wang Cailin, Zhong Weigong. The effect of high temperature on rice seed setting rate and its defense strategies. Jiangsu Agricultural Science, 2004; (01): 15–18.
Wan Suqin, Chen Chen, Liu Zhixiong, Zhou Yuehua, Deng Huan, Gao Suhua. Temporal and Spatial Distribution of High Temperature Heat Damage to Rice in Hubei Province under the Background of Climate Change. China Agricultural Meteorology, 2009;30 (S2): 316–319.
Yu Kun, Song Jing, Gao Ping. The occurrence pattern and characteristics of high-temperature heat damage in rice in Jiangsu Province. Meteorological Science, 2010;30 (04): 530–533.
Jin Zhifeng, Yang Taiming, Li Renzhong, et al. The occurrence pattern of high temperature heat damage in Zhejiang Province and its impact on early rice yield. Chinese Journal of Agricultural Meteorology, 2009;30 (4): 628–631.

No competing interests reported.

Status:

Version 1
