3.2 Distribution of flash flood event frequency based on their causative factors
The flash flood-influencing factors were divided into subclasses using natural breaks, and the flash flood records falling within each subclass were identified. The relative frequency was obtained by dividing the number of flash flood records in each subclass by the total number of flood records in the study region. Figure 8 presents the outcomes of the flash flood frequency analysis for each causative factor. Considering the annual mean temperature range (8 to 22°C) in Semnan province, 50% of floods occurred at temperatures between 14 and 20°C, while the lowest number of floods was recorded in the highest temperature class. Temperatures below 14°C accounted for 25% of the flash flood frequency, particularly in the northern mountain belt. Temperatures exceeding 18°C also accounted for 25% of the flash flood frequency. It is noteworthy that these warmer areas include desert regions with limited data and knowledge.
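The relative-frequency calculation described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function name `relative_frequency`, the eight temperature values, and the class boundaries are all hypothetical, and the boundaries simply stand in for the output of a natural-breaks classification.

```python
from bisect import bisect_right
from collections import Counter

def relative_frequency(values, breaks):
    """Share of flood records falling in each class.

    `breaks` are the interior class boundaries in ascending order; a value v
    is assigned to class i, where i is the number of breaks <= v.
    """
    counts = Counter(bisect_right(breaks, v) for v in values)
    total = len(values)
    return [counts.get(i, 0) / total for i in range(len(breaks) + 1)]

# Hypothetical annual mean temperatures (deg C) at eight flood records
temps = [9.5, 12.0, 14.5, 15.0, 16.2, 17.8, 19.0, 21.0]
# Hypothetical class boundaries standing in for natural-break output
breaks = [14.0, 18.0]

freq = relative_frequency(temps, breaks)
print(freq)  # [0.25, 0.5, 0.25]
```

Each entry is the count in one class divided by the total number of records, so the entries sum to 1.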
Over 75% of flash flood events occurred in areas with a total rainfall of less than 180 mm per year. Interestingly, locations with higher rainfall values (> 200 mm per year) exhibited the lowest flash flood frequency (only 25% of the total occurrences). The maximum frequency was detected at approximately 150 mm per year, while the minimum frequency was identified in the highest rainfall class (Fig. 8). These high-rainfall areas lie at high elevation and generate runoff that accumulates in regions with gentler slopes and lower altitudes. A peak in flash flood frequency (75%) was observed in areas with an NDVI below 0.25, and the recorded flash flood frequency decreased as the NDVI value increased. The maximum flash flood frequency was observed in locations with NDVI values between 0.05 and 0.15, representing over 40% of all observed events. Lower values of the STI were associated with the highest probability of flooding, with 50% of flood events occurring where STI was < 13. In contrast, only 25% of flash flood events were observed where STI values exceeded 40.
The slope class of 0–4 showed the highest incidence of flash floods, accounting for 75% of events, whereas flash floods were infrequent on steep slopes. Lower SPI values were associated with a higher frequency of flash floods than other ranges: SPI values below 355 accounted for half of the total flood frequency, while only 25% of the flash flood frequency occurred at SPI values exceeding approximately 3500.
Lower elevations were associated with a higher flash flood frequency, with 75% of the total occurrences observed at elevations below 1600 m. In contrast, flood frequency diminished in high-elevation areas. According to the TWI, areas with higher potential wetness are susceptible to a greater frequency of flash floods. TWI values ranged from 3.6 to 14, and only 25% of flash flood occurrences were recorded in regions where TWI was less than 8 (Fig. 8). The peak frequencies of flash floods were identified within the TWI range of 8 to 9.5, constituting 50% of the total frequency.
In the study region, the CN ranged from 45 to 100, and the maximum flood frequency was associated with CN values between 80 and 90. Notably, half of the flood events occurred where CN values exceeded 85. For the distance to rivers, approximately 60% of flash flood events took place within 1 km of a river, and the frequency of flash flood events decreased as the distance to rivers increased (Fig. 8). In addition, MRVBF varied from 0 to 9 in the study region. Notably, 75% of flash flood events were observed where MRVBF was less than 3.4, and the frequency of floods decreased as MRVBF increased.
3.3 Selection of flood-influencing factors based on the Boruta wrapper-based algorithm
To enhance the performance of the ML models, Boruta feature selection was applied (Table 5 and Fig. 9). The MZS was used as a threshold, and the mean importance of each feature across all iterations was compared with the MZS. The results of the Boruta algorithm showed that all 19 flash flood-influencing factors were confirmed and had effects on flash floods, with CN recognized as a tentative feature. Based on Table 5 and Fig. 9, the following features showed the highest mean importance: temperature (26.51), distance to river (23.71), elevation (16.46), TWI (16.33), MRVBF (14.16), curvature (14.04), and NDVI (12.85). In contrast, CN, plan curvature, aspect, and infiltration showed the lowest importance, with mean importance values of 4.27, 4.81, 4.81, and 4.81, respectively.
Table 5
The results of the Boruta variable importance (IMP. is the importance measure calculated through several iterations)
Variable | Mean IMP. | Median IMP. | Min. IMP. | Max. IMP. | Decision | Rank |
Aspect | 4.81 | 4.86 | 2.41 | 7.00 | Confirmed | 16 |
Catchment Area | 11.26 | 11.24 | 9.52 | 12.92 | Confirmed | 9 |
Flow Accumulation | 9.78 | 9.62 | 8.80 | 11.09 | Confirmed | 11 |
Elevation | 16.46 | 16.49 | 14.88 | 18.42 | Confirmed | 3 |
MRVBF | 14.16 | 14.14 | 12.81 | 15.33 | Confirmed | 5 |
Plan Curvature | 4.81 | 4.89 | 1.96 | 6.83 | Confirmed | 16 |
Curvature | 14.04 | 14.01 | 12.28 | 16.14 | Confirmed | 6 |
Slope | 10.33 | 10.32 | 9.19 | 11.63 | Confirmed | 10 |
TWI | 16.33 | 16.35 | 14.32 | 17.95 | Confirmed | 4 |
SPI | 8.51 | 8.49 | 7.12 | 10.06 | Confirmed | 12 |
STI | 7.25 | 7.23 | 5.57 | 9.23 | Confirmed | 14 |
Infiltration | 4.81 | 4.78 | 3.62 | 6.63 | Confirmed | 16 |
Distance to River | 23.71 | 23.64 | 21.23 | 25.84 | Confirmed | 2 |
Geology | 6.59 | 6.67 | 4.95 | 8.08 | Confirmed | 15 |
LULC | 7.32 | 7.26 | 6.17 | 8.88 | Confirmed | 13 |
CN | 4.27 | 4.14 | 2.55 | 6.03 | Confirmed | 17 |
NDVI | 12.85 | 12.75 | 11.14 | 14.33 | Confirmed | 7 |
Rainfall | 12.41 | 12.49 | 10.80 | 13.82 | Confirmed | 8 |
Temperature | 26.51 | 26.38 | 24.21 | 29.09 | Confirmed | 1 |
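As a quick consistency check, the rank column of Table 5 can be reproduced from the mean importance values alone. The sketch below is not the Boruta algorithm itself; it only assumes that ranks follow a dense ranking of mean importance (tied values share a rank and no rank is skipped), and `dense_rank` is an illustrative helper name.

```python
# Mean importance values copied from Table 5
mean_imp = {
    "Aspect": 4.81, "Catchment Area": 11.26, "Flow Accumulation": 9.78,
    "Elevation": 16.46, "MRVBF": 14.16, "Plan Curvature": 4.81,
    "Curvature": 14.04, "Slope": 10.33, "TWI": 16.33, "SPI": 8.51,
    "STI": 7.25, "Infiltration": 4.81, "Distance to River": 23.71,
    "Geology": 6.59, "LULC": 7.32, "CN": 4.27, "NDVI": 12.85,
    "Rainfall": 12.41, "Temperature": 26.51,
}

def dense_rank(scores):
    """Rank 1 = highest score; tied scores share a rank, no rank is skipped."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of = {value: i + 1 for i, value in enumerate(distinct)}
    return {name: rank_of[value] for name, value in scores.items()}

ranks = dense_rank(mean_imp)
# Print features from most to least important
for name in sorted(ranks, key=lambda n: (ranks[n], n)):
    print(f"{ranks[name]:>2}  {name:<18} {mean_imp[name]:.2f}")
```

Under this assumption, aspect, plan curvature, and infiltration share rank 16 and CN receives rank 17, so 19 variables occupy 17 ranks.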
3.4 Model construction and validation
Four standalone ML models (RF, SVR, GLMnet, and TreeBag) and one hybrid model (Ensemble) were employed to generate flash flood susceptibility maps of the study region. The flash flood inventory map (Fig. 1) was used to train and test these models. First, points were classified on a binary scale as flooded (1) or non-flooded (0). For model training and testing, the inventory was divided in a 70:30 ratio, with 70% assigned to the training dataset and the remaining 30% to the testing dataset. Because susceptibility classification is a binary problem, non-flood locations were also included when generating the dataset. In the subsequent step, optimal parameter values for each model (RF, SVR, GLMnet, TreeBag, and Ensemble) were determined. The choice of hyperparameter configuration directly affects the performance of ML models (Yang and Shami 2020), so hyperparameter tuning was conducted before the actual training of the models.
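A 70:30 split of a binary inventory can be sketched as below. The text does not state whether the split was stratified by class, how many non-flood points were used, or which random seed was applied, so the stratification, the 100/100 point counts, the seed, and the function name `stratified_split` are all assumptions made for illustration.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)

# Hypothetical inventory: 100 flood points (1) and 100 non-flood points (0)
labels = [1] * 100 + [0] * 100
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 140 60
```

Splitting each class separately keeps the flood to non-flood ratio the same in the training and testing sets.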
In the subsequent phase, we generated flash flood susceptibility maps utilizing the machine learning models (Fig. 10). The maps were created with 30 × 30-m cells. Each prediction map was classified with the natural breaks method into four susceptibility categories: low, moderate, high, and very high.
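The natural breaks classification can be illustrated with a small exact implementation that chooses class boundaries minimizing the total within-class sum of squared deviations. This is a sketch under stated assumptions: the twelve susceptibility scores are hypothetical, `natural_breaks` is an illustrative name, and GIS software may use a different (often approximate) implementation on full rasters.

```python
from bisect import bisect_right

def natural_breaks(values, n_classes):
    """Lower bounds of classes 2..n_classes that minimize the total
    within-class sum of squared deviations (exact dynamic programming)."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared deviation of any slice data[i:j] in O(1).
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting the first j values into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], start[k][j] = c, i
    # Walk back from the last class to recover where each class starts.
    starts, j = [], n
    for k in range(n_classes, 1, -1):
        j = start[k][j]
        starts.append(j)
    return [data[i] for i in reversed(starts)]

CLASSES = ["low", "moderate", "high", "very high"]
# Hypothetical susceptibility scores for twelve cells
scores = [0.05, 0.08, 0.10, 0.30, 0.33, 0.36,
          0.60, 0.62, 0.65, 0.90, 0.93, 0.95]

breaks = natural_breaks(scores, len(CLASSES))
counts = [0] * len(CLASSES)
for s in scores:
    counts[bisect_right(breaks, s)] += 1
coverage = {c: 100.0 * k / len(scores) for c, k in zip(CLASSES, counts)}
print(breaks)    # [0.3, 0.6, 0.9]
print(coverage)  # 25.0 for each class
```

Because all cells have the same size, the share of cells in a class equals the share of the mapped area in that class.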
According to Fig. 10, the southern part of the WSP exhibited the lowest susceptibility to flash flooding, while the highest susceptibility was identified in the northern half of the study region across all models. Both the SVR and TreeBag models displayed a tendency to overestimate the very high class (20 and 21% of the total area, respectively). The GLMnet model, on the other hand, underestimated the very high class and overestimated the moderate class (9 and 27% of the total area, respectively). The results of the RF and Ensemble models were nearly identical.
The percentage coverage of each susceptibility class of flash flood maps based on the ML models is illustrated in Fig. 11. According to Fig. 11, the TreeBag, SVR, RF, GLMnet, and Ensemble models indicate that about 21, 20, 12, 9, and 17% of the total area fall within the very high susceptibility class, respectively. The area with high susceptibility class covers approximately 10, 15, 16, 18, and 14% of the total area based on the TreeBag, SVR, RF, GLMnet, and Ensemble models, respectively.
The performance of each model was assessed using accuracy, precision, recall, and AUC. The results of the evaluation for the ML models are presented in Fig. 12. All considered models exhibited acceptable performance, with average values of accuracy, precision, recall, and AUC across all classification models of 0.88, 0.93, 0.84, and 0.88, respectively. Among all prediction models, the GLMnet model displayed the weakest performance, with values of 0.73, 0.87, 0.62, and 0.75 for accuracy, precision, recall, and AUC, respectively. In summary, the evaluation showed that the RF and Ensemble models exhibited superior performance in flash flood susceptibility modeling.
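For reference, the four evaluation criteria can be computed as in the sketch below. The eight labels and scores are hypothetical, the 0.5 decision threshold is an assumption (the text does not report one), and AUC is computed here as the probability that a randomly chosen flood point receives a higher score than a randomly chosen non-flood point, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels with 1 = flood."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def auc(y_true, scores):
    """Pairwise (rank-based) AUC; ties between a flood and a non-flood
    point count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test set: true labels and predicted susceptibility scores
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall, auc(y_true, scores))  # 0.75 0.75 0.75 0.875
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC is computed from the scores directly and is threshold-independent.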