3.1. Descriptive Statistics
The TSN values of the samples ranged from 0.21 to 1.40 g/kg, with a mean of 0.79 g/kg, a median of 0.92 g/kg, and a standard deviation of 0.44 g/kg. TSN content showed moderate variability: the standard deviation was smaller than the mean, corresponding to a coefficient of variation of about 56%. The distribution was moderately skewed, with a skewness coefficient of -0.40; the negative sign is consistent with the mean lying below the median. Five different models (Models A, B, C, D, and E) were created by combining various environmental variables for TSN mapping. Table 3 presents the summary statistics of the TSN values and predictive variables of the soil samples.
Table 3
Statistics of environmental parameters and TSN values.
Variable | Minimum | Maximum | Mean | Median | Standard Deviation | Skewness |
TSN (g/kg) | 0.21 | 1.40 | 0.79 | 0.92 | 0.44 | -0.40 |
Elevation (m) | 500 | 1700 | 1050 | 987 | 231.36 | 0.68 |
Aspect (Deg) | 0.0 | 345 | 210 | 134 | 101 | 0.41 |
Slope (Deg) | 0.28 | 15.32 | 12.01 | 4.65 | 4.68 | 0.94 |
TWI | 6.21 | 16.35 | 9.24 | 9.45 | 3.74 | 0.81 |
BC-1 (dB) | -24.13 | 1.37 | -14.26 | -14.56 | 3.79 | 1.27 |
BC-2 (dB) | -16.45 | -4.21 | -10.31 | -9.41 | 2.37 | -0.23 |
BC-3 (dB) | -16.87 | -2.31 | -10.23 | -9.4 | 1.84 | -0.28 |
BC-4 (dB) | -25.14 | -6.94 | -17.36 | -17.54 | 2.64 | 0.21 |
Band-4 (digital number) | 324.13 | 3145.36 | 1084.51 | 927.45 | 841.36 | 0.70 |
Band-5 (digital number) | 2047.31 | 4861.58 | 3316.74 | 3612.34 | 641.52 | -0.23 |
Band-6 (digital number) | 1362.47 | 4123.54 | 2145.78 | 3478.31 | 674.42 | 0.80 |
NDVI | 0.04 | 0.76 | 0.50 | 0.63 | 0.37 | -0.31 |
MAP (mm) | 92 | 149 | 124 | 91.5 | 19.6 | 0.48 |
MAT (°C) | 4.97 | 7.32 | 6.54 | 7.0 | 0.60 | -0.70 |
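The summary statistics reported in Table 3 can be computed per variable as in the minimal sketch below. The function name `describe` and the sample values are hypothetical (they are not the study's data), and the skewness formula shown is the population moment coefficient, which is an assumption because the text does not say which estimator was used.

```python
import statistics

def describe(values):
    """Return the Table 3 style summary statistics for one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Moment coefficient of skewness: third central moment / (second central moment)^1.5
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "skewness": skewness,
        "cv_percent": 100 * sd / mean,  # coefficient of variation
    }

# Hypothetical TSN values in g/kg, for illustration only
tsn = [0.21, 0.35, 0.60, 0.85, 0.92, 0.95, 1.10, 1.20, 1.40]
stats = describe(tsn)

# Coefficient of variation implied by the reported TSN mean (0.79) and SD (0.44)
reported_cv = 100 * 0.44 / 0.79

print({k: round(v, 2) for k, v in stats.items()})
print(f"Reported CV: {reported_cv:.1f}%")
```

A mean below the median, as in the reported TSN statistics, usually goes with a negative skewness coefficient, which is what the hypothetical sample also shows.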
Model A included neither SAR nor optical data, while Models B and C examined the effects of optical (Landsat-9) and SAR data, respectively. Model D used data from Sentinel-1 and Landsat-9 images, and Model E combined all variables, including the remote sensing data, without restrictions. The performance of these models was evaluated using three algorithms (RF, SVM, and BRT) and is presented in Table 4.
Table 4
Performance of the five models with the RF, SVM, and BRT algorithms.
Algorithm | Model | MAE | RMSE | R2 |
RF | Model A | 0.20 | 0.26 | 0.53 |
RF | Model B | 0.23 | 0.30 | 0.38 |
RF | Model C | 0.23 | 0.28 | 0.45 |
RF | Model D | 0.21 | 0.26 | 0.51 |
RF | Model E | 0.20 | 0.25 | 0.56 |
SVM | Model A | 0.21 | 0.27 | 0.48 |
SVM | Model B | 0.23 | 0.29 | 0.37 |
SVM | Model C | 0.23 | 0.29 | 0.41 |
SVM | Model D | 0.22 | 0.28 | 0.45 |
SVM | Model E | 0.21 | 0.26 | 0.51 |
BRT | Model A | 0.20 | 0.26 | 0.53 |
BRT | Model B | 0.24 | 0.30 | 0.38 |
BRT | Model C | 0.24 | 0.29 | 0.41 |
BRT | Model D | 0.20 | 0.25 | 0.56 |
BRT | Model E | 0.19 | 0.25 | 0.58 |
The accuracy of the five models was evaluated using the R2, RMSE, and MAE indices, where a higher R2 and lower RMSE and MAE indicate higher accuracy. The BRT algorithm with Model E gave the highest accuracy in predicting the TSN distribution, with an R2 of 0.58, an RMSE of 0.25, and an MAE of 0.19 (Table 4). With all three algorithms, Model C (including multi-temporal SAR data) outperformed Model B (including Landsat-9 variables) in terms of R2. These results indicated that incorporating SAR data was successful in TSN mapping and could potentially replace optical images.
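The three accuracy indices can be written out as in the sketch below. It assumes that R2 is the coefficient of determination (one minus the ratio of residual to total sum of squares); some studies report the squared correlation coefficient instead, and the text does not say which was used. The observed and predicted values are hypothetical.

```python
import math

def mae(observed, predicted):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error: penalizes large errors more than MAE."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def r2(observed, predicted):
    """Coefficient of determination: share of observed variance explained."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted TSN values (g/kg), for illustration only
obs = [0.30, 0.55, 0.80, 0.95, 1.20]
pred = [0.40, 0.50, 0.85, 0.90, 1.05]

print(f"MAE={mae(obs, pred):.2f}  RMSE={rmse(obs, pred):.2f}  R2={r2(obs, pred):.2f}")
```

MAE and RMSE are in the units of the target variable (g/kg here), while R2 is unitless, so an R2 of 0.58 is read as 58% of the variation in TSN being explained.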
According to the results, Model C performed best with the RF algorithm, which gave its highest R2 value and its lowest MAE and RMSE values (0.45, 0.23, and 0.28, respectively), meaning that it explained 45% of the variation in TSN (Table 4). The success of Model C may be attributed to the use of multi-temporal Sentinel-1A and Landsat-9 data. The BRT algorithm with SAR images also improved the accuracy of TSN prediction by 50%, 18%, and 18% in terms of R2, RMSE, and MAE, respectively. Model A's accuracy was also increased by incorporating Sentinel-1A data, resulting in R2 values of 0.58 and 0.56 for the BRT and RF algorithms, respectively. Finally, Model E, which included multiple remote sensing variables, climate, topography, and land use and land cover (LULC), had the highest performance in predicting TSN changes, with the BRT and RF algorithms able to explain 58% and 56% of TSN changes, respectively.
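The comparisons drawn from Table 4 can be checked with a short script. The sketch below transcribes only the R2 column of Table 4; the dictionary layout and variable names are illustrative choices, not part of the study.

```python
# R2 values transcribed from Table 4, keyed by algorithm and then by model
R2 = {
    "RF":  {"A": 0.53, "B": 0.38, "C": 0.45, "D": 0.51, "E": 0.56},
    "SVM": {"A": 0.48, "B": 0.37, "C": 0.41, "D": 0.45, "E": 0.51},
    "BRT": {"A": 0.53, "B": 0.38, "C": 0.41, "D": 0.56, "E": 0.58},
}

# Best model for each algorithm (highest R2)
best_model = {alg: max(scores, key=scores.get) for alg, scores in R2.items()}

# Does the SAR-based Model C beat the optical-based Model B with every algorithm?
c_beats_b = all(scores["C"] > scores["B"] for scores in R2.values())

# Which algorithm gives Model C its highest R2?
best_alg_for_c = max(R2, key=lambda alg: R2[alg]["C"])

# Best algorithm/model combination overall
overall = max(
    ((alg, model) for alg in R2 for model in R2[alg]),
    key=lambda pair: R2[pair[0]][pair[1]],
)

print(best_model)      # best model per algorithm
print(c_beats_b)       # Model C vs Model B
print(best_alg_for_c)  # best algorithm for Model C
print(overall)         # best combination overall
```

On these values, Model E is the best model for every algorithm, Model C exceeds Model B in R2 with all three algorithms, and BRT with Model E is the best combination overall.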
3.2. Relative importance of environmental data
To make the variables comparable, they were normalized in this study. Figure 2 shows the relative importance of each environmental variable in Model E for the BRT and RF algorithms.
The importance of the same environmental variables differed between the two algorithms. The most important variables in the RF algorithm were Band-4, Band-6, NDVI, BC-3, and LULC, whereas those in the BRT algorithm were elevation, NDVI, BC-2, BC-3, and LULC.
Both the BRT and RF algorithms assigned almost equal importance to the three variables of land use, land cover, and BC-3. In the RF algorithm, climatic variables, topographical variables, LULC, and Sentinel-1A data accounted for 10%, 13%, 16%, and 28%, respectively, of the estimation of TSN changes, while Landsat-9 variables were the most important, with a share of 33%. In the BRT algorithm, climatic variables, topographical variables, LULC, Sentinel-1A data, and Landsat-9 data contributed 7%, 17%, 25%, 31%, and 20%, respectively. Remote sensing variables (Sentinel-1A and Landsat-9 combined) therefore accounted for 51% of the total importance in the BRT algorithm and 61% in the RF algorithm.
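The group shares quoted above can be summed to obtain the combined contribution of the remote sensing variables. The sketch below uses only the percentages stated in the text; the group labels and the helper `remote_sensing_share` are illustrative names.

```python
# Relative importance (%) of each variable group in Model E, as quoted in the text
shares = {
    "RF":  {"climate": 10, "topography": 13, "LULC": 16, "Sentinel-1A": 28, "Landsat-9": 33},
    "BRT": {"climate": 7, "topography": 17, "LULC": 25, "Sentinel-1A": 31, "Landsat-9": 20},
}

# Groups treated here as remote sensing data (SAR plus optical)
REMOTE_SENSING = ("Sentinel-1A", "Landsat-9")

def remote_sensing_share(groups):
    """Combined share (%) of the SAR and optical variable groups."""
    return sum(groups[name] for name in REMOTE_SENSING)

totals = {alg: sum(groups.values()) for alg, groups in shares.items()}
rs_share = {alg: remote_sensing_share(groups) for alg, groups in shares.items()}

print(totals)    # each algorithm's shares should add up to 100
print(rs_share)  # combined Sentinel-1A + Landsat-9 share per algorithm
```

With these figures the shares add up to 100% for both algorithms, and the combined remote sensing share is 61% for RF (28% + 33%) and 51% for BRT (31% + 20%).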
3.3. Prediction of spatial distribution of TSN
The predictive maps of the TSN spatial distribution are presented in Fig. 3 (Model E) and Fig. 4 (Model D). These maps were produced by the BRT, RF, and SVM algorithms. The mean predicted TSN values in Model E were 0.82 (± 0.25), 0.84 (± 0.32), and 0.85 (± 0.21) g/kg for the RF, BRT, and SVM algorithms, respectively (Table 5). The results showed that the observed TSN values were higher than those predicted by the BRT, RF, and SVM algorithms. The BRT and RF algorithms were then used to examine the differences in predicted TSN between Models E and D. These two algorithms were chosen mainly because both are tree-based and because they performed better than the SVM algorithm in monitoring TSN changes.
The BRT and RF algorithms demonstrated satisfactory performance, with average values of -0.01 and -0.03 in monitoring TSN using the D and E models, respectively. However, significant differences in predicted TSN between the D and E models were observed only in areas with bare land and vegetation.
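A comparison of two prediction maps of this kind can be sketched as a per-pixel difference, averaged overall and by land-cover class. Everything in the example is hypothetical: the pixel values, the class labels, the function name, and the sign convention (Model D minus Model E), since the text does not state how the average values were computed.

```python
from collections import defaultdict

def mean_difference_by_class(pred_d, pred_e, land_cover):
    """Mean of (Model D - Model E) predictions, overall and per land-cover class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, e, cls in zip(pred_d, pred_e, land_cover):
        sums[cls] += d - e
        counts[cls] += 1
    per_class = {cls: sums[cls] / counts[cls] for cls in sums}
    overall = sum(sums.values()) / sum(counts.values())
    return overall, per_class

# Hypothetical predicted TSN (g/kg) for six pixels, for illustration only
pred_d = [0.80, 0.90, 0.70, 0.60, 1.00, 1.10]
pred_e = [0.80, 0.90, 0.80, 0.70, 1.00, 1.05]
land_cover = ["cropland", "cropland", "bare land", "bare land", "vegetation", "vegetation"]

overall, per_class = mean_difference_by_class(pred_d, pred_e, land_cover)
print(f"Overall mean difference: {overall:.3f} g/kg")
for cls, diff in per_class.items():
    print(f"{cls}: {diff:.3f} g/kg")
```

Splitting the difference by class shows where two models disagree even when the overall mean difference is close to zero.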
Table 5
Predicted TSN values (g/kg) in Model E for the SVM, BRT, and RF algorithms.
Algorithm | Minimum | Maximum | Mean | Standard Deviation (SD) |
SVM | 0.22 | 1.41 | 0.85 | 0.21 |
BRT | 0.11 | 1.68 | 0.84 | 0.32 |
RF | 0.28 | 1.35 | 0.82 | 0.25 |
3.4. Efficiency of the algorithm
The research findings demonstrated that RF and BRT algorithms were more effective in accurately estimating TSN values. This is consistent with the results of other studies, such as Wang et al. (2018b), who found that tree-based algorithms were more accurate than SVM for mapping soil carbon stocks in grasslands in eastern Australia (S. Wang, Jin, et al., 2018). Ottoy et al. (2017) compared the accuracy of ANN, BRT, and SVM algorithms for evaluating soil organic carbon (SOC) reservoirs through digital soil mapping and concluded that BRT had higher accuracy than the other two algorithms (Ottoy et al., 2017). Tziachris et al. (2019) used RF and co-kriging to investigate SOC using DEM, and found that RF outperformed co-kriging (Tziachris et al., 2019). Peng et al. (2019) produced a TSN distribution map using remote sensing techniques and compared the accuracy of BRT, SVM, and RF algorithms. They reported that BRT and RF algorithms had higher accuracy than SVM, and combining Landsat-8 and Sentinel-2 data using tree-based algorithms resulted in a 17% increase in MAE and RMSE indices (Peng et al., 2019).
The use of Sentinel-1 data and optical images ensured high accuracy in this study, indicating that incorporating more useful parameters can improve results. Specifically, the results showed that using Sentinel-1 data as multi-temporal SAR data improved TSN mapping accuracy in the studied paddy fields. These findings align with those of previous research (Asfaw et al., 2018; Periasamy & Shanmugam, 2017; Rahmati & Hamzehpour, 2017; Robinson et al., 2017), which demonstrated that combining satellite data and indices with topographical parameters can yield accurate estimates of TSN distribution. Wang et al. (2017) also noted that incorporating various related auxiliary and environmental variables can significantly enhance the mapping of different soil characteristics (C. Wang et al., 2017).
Zhang et al. (2019) created a spatial distribution map of TSN values using multi-spectral satellite images from Sentinel-2A and compared different algorithms, including RF (Zhang et al., 2019). To estimate the spatial distribution of TSN, they used 21 auxiliary variables, which included the main spectral bands and indices as well as environmental variables. The results indicated that the RF algorithm, based on remote sensing techniques, was able to accurately capture changes in TSN. The performance of the prediction model could also be enhanced by using appropriate types of predictors. Wang et al. (2017) used RF and stepwise multiple regression algorithms to map TSN in Lushun city, northeastern Liaoning province, China. To map TSN in forest areas, they suggested that remote sensing data and environmental variables should be used as the main predictors.
3.5. Value of predictor variables
The use of SAR images is crucial for accurately predicting TSN content. Radar data can capture information beyond the soil and vegetation surface, allowing properties such as soil texture, moisture, salinity, and above-ground biomass to be identified from backscatter signals. Felsberg et al. (2021) showed that radar images can be used to assess soil properties by recording information on vegetation cover, which strongly affects soil reflectance (Felsberg et al., 2021). Guo et al. (2019) demonstrated the effectiveness of combining Sentinel-1 and Sentinel-2 digital data for land surface biomass estimation (Guo et al., 2019).
Bands and indices from satellites including Sentinel-1, Sentinel-2, and Landsat have been used to map different soil properties. The mapping of TSN content is particularly influenced by land use and land cover (LULC) changes, which affect spectral reflectance, itself closely related to biomass and plant density. Xu et al. (2018) concluded that spectral reflectance is the most effective factor for monitoring TSN changes based on their research on image pan sharpening of soil nitrogen content in India (Xu et al., 2017). Digital satellite images are the most important auxiliary data for monitoring environmental change compared to other variables.
Topographic variables are considered to have high predictive power for digital soil mapping. In addition to elevation, the topographic wetness index, aspect, and slope are critical variables in TSN distribution, as they affect nitrogen mineralization and immobilization as well as available nitrogen content. Rainfall also indirectly affects the nitrogen cycle by influencing plant nitrogen uptake and vegetation productivity (Pérez-Piqueres et al., 2017).
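Two of the derived predictors discussed here, NDVI and the topographic wetness index (TWI), have standard definitions that can be sketched as follows. The text does not describe how they were computed in this study, so the functions below are generic illustrations: NDVI is shown for Landsat-9, where Band 5 is near-infrared and Band 4 is red, and it is normally applied to reflectance rather than raw digital numbers; the minimum-slope floor in `twi` is an assumption added to avoid division by zero on flat terrain.

```python
import math

def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else 0.0

def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index: ln(a / tan(beta)).

    specific_catchment_area: upslope area per unit contour width (m^2/m)
    slope_deg: local slope in degrees
    min_slope_deg: hypothetical floor so that flat cells give a finite value
    """
    slope = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(slope))

# Hypothetical inputs, for illustration only
print(f"NDVI: {ndvi(nir=0.50, red=0.10):.2f}")
print(f"TWI:  {twi(specific_catchment_area=100.0, slope_deg=45.0):.2f}")
```

Higher TWI values indicate cells with a large contributing area and gentle slope, where water and dissolved nitrogen tend to accumulate.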