As mentioned in Sect. 3, seven ML algorithms and two ensemble forecasting methods were used in this study. The results and discussion are presented in the following order. Section 4.1 lists the optimal parameters and input combinations of each model, as determined by the grid-search method, and introduces the measures used to evaluate model performance. Section 4.2 compares the performance of the algorithms in several steps. First, the MLP is compared with the relatively novel DL algorithm (DNN herein). Second, the algorithms that apply recursive techniques are analyzed to derive the pros and cons of each recurrent unit. Finally, the performances of all seven models are compared, and the best single algorithm under different forecasting conditions is presented. Section 4.3 then compares the single models with the two integration techniques, EM and SP, which merge the ensemble forecasts into a single result. Additionally, the confidence interval of the results of all models is calculated using the reliability analysis, and the results of this analysis are also presented in Sect. 4.3.
4.1 Determining models and performance measures
As previously mentioned, the optimization algorithm used in this study was the grid-search method, which evaluates all hyperparameter combinations within a reasonable range. For reservoir inflow forecasting, the input factors to be determined were the rainfall and reservoir inflow information. In this study, the best input combinations selected were the same for all models. Given the runoff concentration time of the study area, the range of reasonable inputs comprises rainfall and inflow information from the current time back to a 4-h lag, i.e., from t to t − 4.
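The input window and the exhaustive search described above can be sketched as follows. This is a minimal illustration in Python, not the implementation used in this study: the function names `make_samples` and `grid_search`, the toy series, and the scoring function `toy_score` are all hypothetical, and the score merely stands in for the validation error of a trained model.

```python
from itertools import product


def make_samples(rain, inflow, rain_lags, inflow_lags, lead):
    """Pair the inputs R(t - k) and Q(t - k) with the target Q(t + lead)."""
    start = max(max(rain_lags), max(inflow_lags))  # earliest t with all lags available
    samples = []
    for t in range(start, len(inflow) - lead):
        x = [rain[t - k] for k in rain_lags] + [inflow[t - k] for k in inflow_lags]
        samples.append((x, inflow[t + lead]))
    return samples


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and keep the lowest score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy hourly series (made-up numbers, for illustration only).
rain = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
inflow = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]

# Input combination R(t-1), R(t-2), Q(t), Q(t-1), Q(t-2) with a 1-h lead time.
samples = make_samples(rain, inflow, rain_lags=[1, 2], inflow_lags=[0, 1, 2], lead=1)


def toy_score(params):
    """Hypothetical stand-in for a model's validation error (assumption)."""
    return abs(params["gamma"] - 2) + abs(params["cost"] - 8)


best_params, best_score = grid_search({"gamma": [1, 2], "cost": [0.125, 8]}, toy_score)
```

In practice, `score_fn` would train a model on the samples built for a given input combination and return its validation error, so the same loop can cover both inputs and hyperparameters.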
4.1.1 Optimal inputs and hyperparameters
The optimal hyperparameters and input combinations are listed in Tables 2 and 3. Table 2 shows the optimal combinations for the SVM, RF, and MLP. These combinations suggest that forecasts under short lead-time conditions rely more heavily on earlier rainfall information, whereas long lead-time forecasts (e.g., the 4 and 6-h cases) focus more on runoff information. Table 3 shows the optimal combinations for the DL algorithms. Based on the number of hidden layers and the neurons in each, we hypothesized that RNN, LSTM, and GRU (networks that adopt the recursive technique) tend toward a more complex architecture when processing relatively longer lead-times. In contrast, DNN tended to use fewer hidden layers and neurons with longer lead-times. A possible rationale for these patterns is that the recursive technique included in the RNN and its evolved variants can effectively improve fitting performance under complex fitting conditions, whereas an over-complicated structure in the conventional DNN may give rise to significant biases.
Table 2 The optimal parameters for conventional MLs
| Lead-time (h) | | | | | | |
| SVM | Input | Kernel function | Gamma | Cost | Epsilon | Degree |
| 1 | R(t − 1), R(t − 2), Q(t), Q(t − 1), Q(t − 2) | rbf | 2 | 8 | 0.007813 | 0 |
| 4 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | rbf | 1 | 0.125 | 0.007813 | 0 |
| 6 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | rbf | 2 | 0.125 | 0.007813 | 0 |
| RF | Input | Number of trees | Max. features | | | |
| 1 | R(t − 1), R(t − 2), Q(t), Q(t − 1), Q(t − 2) | 200 | auto | | | |
| 4 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | 150 | auto | | | |
| 6 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | 150 | auto | | | |
| MLP | Input* | Hidden layer** | Activation function | Optimizer | Batch size | Learning rate |
| 1 | R(t − 1), R(t − 2), Q(t), Q(t − 1), Q(t − 2) | {[256]} | ReLU | Lbfgs | 200 | 0.0001 |
| 4 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | {[256]} | ReLU | Lbfgs | 200 | 0.0001 |
| 6 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | {[10]} | ReLU | Lbfgs | 200 | 0.001 |
* R(t) represents the rainfall at time t and Q(t − 1) is the inflow for time t − 1.
** {[x]} denotes a single hidden layer with x neurons.
Table 3 The optimal parameters for DLs
| Lead-time (h) | | | | | | |
| DNN | Input | Hidden layer* | Activation function | Optimizer | Batch size | Loss function |
| 1 | R(t − 1), R(t − 2), Q(t), Q(t − 1), Q(t − 2) | {[32], [64], [128]} | ReLU | Nadam | 32 | MSLE |
| 4 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | {[128], [256]} | ReLU | Rmsprop | 32 | MSLE |
| 6 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | {[16], [32]} | ReLU | Rmsprop | 64 | MSLE |
| RNN | Input | Hidden layer | Activation function | Optimizer | Batch size | Loss function |
| 1 | R(t − 1), R(t − 2), Q(t), Q(t − 1), Q(t − 2) | {[64], [128]} | ReLU | Rmsprop | 32 | MSLE |
| 4 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | {[32], [64], [128]} | ReLU | Rmsprop | 32 | MSLE |
| 6 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | {[32], [64], [128]} | ReLU | Rmsprop | 32 | MSLE |
| LSTM | Input | Hidden layer | Activation function | Optimizer | Batch size | Loss function |
| 1 | R(t − 1), R(t − 2), Q(t), Q(t − 1), Q(t − 2) | {[64], [128]} | ReLU | Rmsprop | 32 | MSLE |
| 4 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | {[128], [256]} | ReLU | Rmsprop | 32 | MSLE |
| 6 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | {[32], [64], [128]} | ReLU | Rmsprop | 32 | MSLE |
| GRU | Input | Hidden layer | Activation function | Optimizer | Batch size | Loss function |
| 1 | R(t − 1), R(t − 2), Q(t), Q(t − 1), Q(t − 2) | {[64], [128], [256]} | ReLU | Rmsprop | 64 | MSLE |
| 4 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | {[64], [128], [256]} | ReLU | Rmsprop | 32 | MSLE |
| 6 | R(t), R(t − 1), Q(t), Q(t − 1), Q(t − 2), Q(t − 3) | {[64], [128], [256]} | ReLU | Rmsprop | 32 | MSLE |
* As in Table 2, {[x], [y], [z]} denotes three hidden layers with x, y, and z neurons, respectively.
4.1.2 Performance measures
To evaluate the pros and cons of the seven algorithms and two ensemble integration techniques more objectively, six performance measures were adopted in this study: RMSE, mean absolute error (MAE), correlation coefficient (CC), coefficient of efficiency (CE), error of peak inflow (EQP), and error of time to peak (ETP).
The RMSE evaluates the difference between the forecasted and observed values. Because it sums the squared differences between observations and forecasts, the RMSE places comparatively greater weight on large errors, which typically occur around peak values. Thus, the RMSE is often used to assess peak-value performance in linear and nonlinear prediction. The MAE represents the average magnitude of the error between the predicted and observed values. Unlike the RMSE, it weights all errors equally, without emphasizing any particular segment. The CC indicates the correlation between the forecasted and observed values; the closer the CC is to 1, the better the model's performance. The CE represents the degree to which the forecasts produced by a model are more accurate than simply using the mean of the observations as the forecast. As with the CC, the closer the value is to 1, the better the model performs. In addition, EQP and ETP are used to evaluate a model's forecasting effectiveness regarding peak values; they are calculated using Equations (1) and (2). The EQP indicates how close the forecasted peak inflow is to the observed peak; the closer the value is to 0%, the better the model's performance. Finally, the ETP is the error between the timing of the peak inflow forecasted by a model and the observed timing.
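The six measures can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the code used in this study: the CE is written in the common Nash-Sutcliffe form, which matches the description above, and because Equations (1) and (2) are not reproduced in this section, the EQP is assumed to be the relative peak error in percent and the ETP the difference between forecasted and observed peak times. The function names and the sample series are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean square error; squaring emphasizes large (peak) errors."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error; every time step is weighted equally."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)


def cc(obs, sim):
    """Pearson correlation coefficient between observations and forecasts."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def ce(obs, sim):
    """Coefficient of efficiency (assumed Nash-Sutcliffe form)."""
    mo = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


def eqp(obs, sim):
    """Error of peak inflow in percent (assumed form of Eq. 1)."""
    return (max(sim) - max(obs)) / max(obs) * 100.0


def etp(obs, sim, dt_hours=1.0):
    """Error of time to peak in hours (assumed form of Eq. 2); positive means late."""
    return (sim.index(max(sim)) - obs.index(max(obs))) * dt_hours


# Made-up hourly inflows (m3/s), for illustration only.
observed = [100.0, 200.0, 400.0, 300.0, 150.0]
forecast = [110.0, 180.0, 300.0, 380.0, 160.0]
scores = {
    "RMSE": rmse(observed, forecast),
    "MAE": mae(observed, forecast),
    "CC": cc(observed, forecast),
    "CE": ce(observed, forecast),
    "EQP": eqp(observed, forecast),
    "ETP": etp(observed, forecast),
}
```

For this toy forecast, the peak is underestimated by 5% and arrives one hour late, and the RMSE (about 58.3) exceeds the MAE (44) because the two large errors near the peak dominate the squared sum.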
4.2 Model comparisons
4.2.1 Comparison of conventional machine learning and deep learning
This study first compares conventional ML and DL, represented by MLP and DNN, respectively, in 1, 4, and 6-h lead-time inflow forecasts for Shimen Reservoir. Figure 5 shows the hydrographs of inflow forecasted by MLP and DNN. Under the 1-h lead-time forecasting condition, both MLP and DNN accurately forecasted the inflow characteristics across the rising limb, the peak segment, and the falling limb, i.e., the periods in which the water level is rising, at its maximum, and falling, respectively. For the peak value, the DNN overestimated less than the MLP and more closely matched the observed inflow; the lag time generated by the DNN was also shorter than that generated by the MLP. In the 4-h lead-time forecasting case, the inflow hydrographs forecasted by MLP and DNN shared similar trends, particularly for low inflow, where predictions were accurate. For peak-value forecasting, both the MLP and DNN tended to underestimate the peak value and generated a degree of lag time. Notably, regardless of lead-time conditions, the MLP tended to show a longer lag time than the DNN. In the 6-h lead-time forecasting scenario, the DNN underestimated the peak value to a larger degree than the MLP, whereas the MLP produced unstable forecasts in the segments other than the peak. Overall, in short lead-time forecasting conditions, both the MLP and DNN obtained relatively good forecasting results. In most cases, however, the DNN obtained more stable and accurate forecasting results than the MLP for Shimen Reservoir under extreme events.
4.2.2 Comparison of recursive networks
In this segment, the RNN, LSTM, and GRU (which involve recursive techniques) are evaluated and analyzed. The results forecasted by the three algorithms are shown in Fig. 6. Figure 6(a) shows the forecasted inflow under 1-h lead-time forecasting. The hydrographs generated by all three algorithms accurately forecasted the inflow at each stage of the typhoon in question. However, for peak-value forecasting, the three algorithms yielded different degrees of overestimation; in order from the most severe to the smallest overestimation, they are RNN, LSTM, and GRU. The lag times for the peak value generated by the three models were essentially the same. In general, the three algorithms could forecast results accurately in the 1-h lead-time forecast scenario. For the 4-h lead-time forecasting scenario, as Fig. 6(b) shows, the three algorithms still simulated the trend of the inflow time series. Unlike in the 1-h lead-time scenario, however, all three algorithms tended to underestimate the peak value and generated a longer lag time. Before reaching the forecasted peak value, the inflow hydrographs forecasted by the three algorithms were approximately the same. However, the peak forecasted by the RNN occurred later than those forecasted by the two other algorithms and showed a greater error relative to the observed inflow. The results of long-term forecasts under the 6-h lead-time condition are shown in Fig. 6(c). All three forecasted time series (one for each algorithm) showed a longer lag time than the 1 and 4-h lead-time forecasts. As in the 4-h lead-time case, the three algorithms tended to underestimate the peak value. Here, the severity of underestimation, ranked from high to low, is RNN, GRU, and LSTM. The hydrographs forecasted by LSTM and GRU showed similar trends. However, the inflow hydrograph forecasted by RNN could not effectively capture the characteristics of the observed inflow. Its twin peaks and bumpy rising limb demonstrate that its forecasting ability is not as good as that of LSTM and GRU in long-term forecasting scenarios.
In summary, for short lead-time forecasting scenarios, all the networks with recursive techniques could effectively forecast the inflow and did not generate severe estimation errors or lag times. However, as the lead time became longer, LSTM and GRU showed better stability and generated more accurate forecasts than RNN. Overall, LSTM and GRU showed advantages when a single algorithm is used to forecast the inflow in Shimen Reservoir. On the other hand, according to Fig. 6(c), the hydrograph forecasted by RNN also retained advantages, such as in the rising limb in the 6-h lead-time forecasting case. Prior to reaching the first peak forecasted by RNN, the curve trend of RNN was observably more accurate than that predicted by LSTM and GRU. These advantages will subsequently be applied to the SP method in Sect. 4.3.
4.2.3 Comparison of all models
In this section, all algorithms are compared to denote their pros and cons. The inflow hydrographs forecasted by all algorithms are shown in Fig. 7.
Figure 7(a) shows the 1-h lead-time inflow hydrographs forecasted by all the models. Except for SVM (which tended to underestimate the peak value), the algorithms showed slight overestimation. The underestimation by SVM may be related to its kernel function being a radial basis function, which can fit the overall curve trend well but can lead to underestimation of the peak value when the forecasted event has the maximum flow among the selected events.
The differences between algorithms were observed primarily when the lead-time became longer than 3 h. Figure 7(b) shows the 4-h lead-time forecasts for all seven algorithms. The forecasts for the low inflow segment by all algorithms were similar to each other and closely matched the observed values. Regarding rising limbs, all seven algorithms tended to underestimate the inflow relative to the observed value. Conversely, for falling limbs, all seven algorithms tended to overestimate the inflow because of the average 1.5-h lag time, which shifted the forecasted time series to the right. The analysis of peak flow showed that, except for RF, the algorithms were inclined toward varying degrees of underestimation. According to Fig. 7(b), the order of severity of underestimation is as follows: SVM, GRU, LSTM, RNN, MLP, and DNN. The lag times generated by the algorithms, arranged from short to long, are as follows: LSTM, GRU, and SVM produced similar lag times, followed by DNN, MLP, and RNN. In contrast to the other algorithms, RF tended to overestimate the peak value and produced a lag time similar to that of DNN. The hydrographs of inflow forecasted for a 6-h lead time are shown in Fig. 7(c). The 6-h lead-time forecasts were less accurate than the 1 and 4-h lead-time forecasting results. Underestimation is more serious in this context; from severe to minor, the order is GRU, SVM, RNN, DNN, LSTM, MLP, and RF. Furthermore, from short to long, the lag time can be ranked as RNN, MLP, RF, DNN, LSTM, GRU, and SVM. In particular, although the RNN produced the shortest lag time, the curve forecasted by this algorithm had double peaks, which was observably different from the actual situation, indicating that it may be overly sensitive or insufficiently robust for use in extreme events.
The performance measures of all algorithms are shown in Table 4. The performance measures of each model are the averages of the forecasts for all typhoons. Across a total of 18 conditions (six indicators and three forecast lead times), GRU achieved the best performance in nine cases, with improvements of 7.63% and 6.4% in the RMSE and MAE, respectively, compared with the second-best algorithms. In addition, SVM achieved the best performance in four other conditions (the second-highest number after GRU).
Table 4 Performance measures of all models
| Lead-time (h) | SVM | RF | MLP | DNN | RNN | LSTM | GRU |
| RMSE (m³/s) | | | | | | | |
| t + 1 | 88.69 | 102.97 | 95.53 | 89.86 | 100.18 | 91.36 | 82.40 |
| t + 4 | 226.45 | 245.98 | 259.77 | 218.92 | 243.26 | 258.98 | 218.56 |
| t + 6 | 346.52 | 436.23 | 359.66 | 354.40 | 385.12 | 377.99 | 399.97 |
| MAE (m³/s) | | | | | | | |
| t + 1 | 52.87 | 55.89 | 53.76 | 52.94 | 54.66 | 50.87 | 47.81 |
| t + 4 | 121.80 | 126.08 | 130.15 | 108.88 | 121.45 | 136.55 | 117.97 |
| t + 6 | 177.51 | 235.22 | 221.19 | 174.80 | 185.19 | 191.46 | 195.85 |
| CE | | | | | | | |
| t + 1 | 0.908 | 0.921 | 0.914 | 0.916 | 0.899 | 0.918 | 0.926 |
| t + 4 | 0.631 | 0.578 | 0.577 | 0.634 | 0.613 | 0.540 | 0.560 |
| t + 6 | 0.176 | −0.575 | 0.083 | 0.108 | 0.143 | 0.157 | 0.204 |
| CC | | | | | | | |
| t + 1 | 0.962 | 0.962 | 0.961 | 0.965 | 0.967 | 0.966 | 0.967 |
| t + 4 | 0.900 | 0.861 | 0.874 | 0.898 | 0.887 | 0.880 | 0.896 |
| t + 6 | 0.801 | 0.671 | 0.727 | 0.752 | 0.742 | 0.751 | 0.726 |
| EVP (%) | | | | | | | |
| t + 1 | 0.03 | 0.06 | 0.05 | 0.05 | 0.10 | 0.05 | 0.03 |
| t + 4 | 0.07 | 0.18 | 0.13 | 0.06 | 0.11 | 0.08 | 0.05 |
| t + 6 | 0.06 | 0.38 | 0.19 | 0.09 | 0.19 | −0.02 | −0.03 |
| EPT (h) | | | | | | | |
| t + 1 | 0.40 | 0.60 | 0.20 | 0.00 | 0.80 | 0.60 | 0.80 |
| t + 4 | 1.20 | 1.40 | 2.40 | 1.40 | 1.60 | 1.40 | 1.40 |
| t + 6 | 3.00 | −6.60 | 2.40 | −0.40 | 2.20 | 3.80 | 3.60 |
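The reported improvements of 7.63% and 6.4% can be reproduced from the t + 1 rows of Table 4 if the gap to the second-best algorithm is expressed relative to the GRU value. This reading is an assumption, since the text states neither the formula nor the lead time; the helper name `relative_improvement` is hypothetical.

```python
def relative_improvement(best, runner_up):
    """Gap between the runner-up and the best error, in percent of the best error."""
    return (runner_up - best) / best * 100.0


# t + 1 values from Table 4 (m3/s).
rmse_gain = relative_improvement(best=82.40, runner_up=88.69)  # GRU vs. SVM
mae_gain = relative_improvement(best=47.81, runner_up=50.87)   # GRU vs. LSTM

summary = (round(rmse_gain, 2), round(mae_gain, 2))
```

Under this assumption, the second-best algorithm differs by measure: SVM for the RMSE and LSTM for the MAE.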
Although the above results indicate that GRU performs well in terms of average performance measures, the analysis in Sect. 4.2 suggests that other algorithms have advantages over GRU in particular respects (e.g., peak-value forecasting). Accordingly, in Sect. 4.3, two methods for combining various algorithms are applied, and their ability to draw on the advantages of the combined algorithms is explored.
4.3 Performance of switched prediction
As mentioned in Sect. 4.2, according to the performance measures, although GRU and SVM performed better in most cases, the remaining algorithms still show advantages in particular contexts. It would be unreasonable to select the best algorithm among the seven and ignore the advantages of the remaining six when the aim is to generate stable and reliable forecasts. Therefore, in this section, two methods (EM and SP) are employed to integrate the seven models.
As in the sections comparing algorithms, due to limited space, we selected the 1, 4, and 6-h lead-time forecasts to represent short, medium, and long-term inflow forecasts, respectively. To render the analysis representative, the hydrographs presented herein cover the typhoons with the three largest inflows during the test sessions, i.e., typhoons Soudelor, Megi, and Maria.
Table 5 shows the best parameter combinations used in SP after calibration, where N represents the length of forecasts used to evaluate the ranking of algorithms, P is the performance measure used for ranking, and M is the number of results forecasted by algorithms selected to integrate the final SP result.
The 1-h lead-time forecasted inflow hydrographs of the three typhoons are shown in Fig. 8. The red and blue lines represent the forecasts integrated by SP and EM. The gray lines and areas represent the seven algorithms presented in Sect. 4.2 and their calculated 95% confidence interval. The best parameter combination determined for the SP under the 1-h lead-time forecasting condition was N4M4, where N4 denotes that forecasts from t to t − 4 were used to calculate the performance measures and determine which algorithms performed better under such conditions, and M4 indicates that SP selects the top four algorithms and uses their average as the forecasting value. In contrast, the EM directly averages the results of all algorithms.
Table 5 The optimal parameters for SP
| Lead-time (h) | N | P | M |
| t + 1 | 4 | RMSE | 4 |
| t + 4 | 2 | RMSE | 1 |
| t + 6 | 3 | RMSE | 4 |
* Parameter N represents the length of the forecasted values used to evaluate the ranking of algorithms.
* P is a performance measure as the target function when implementing SP.
* M is the number of algorithms that will be selected to integrate the final forecast.
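The roles of N, P, and M can be illustrated with a minimal Python sketch of the two integration schemes. It is a sketch under stated assumptions, not the implementation used in this study: the ranking window is taken to be the N most recent steps before the forecast step for which observations are available, P is fixed to the RMSE, and the function names, model labels, and numbers are hypothetical.

```python
import math


def ensemble_mean(forecasts, t):
    """EM: average the forecasts of all models at step t."""
    return sum(series[t] for series in forecasts.values()) / len(forecasts)


def switched_prediction(forecasts, observed, t, n, m):
    """SP: average the m models with the lowest RMSE over the n steps before t."""
    window = range(max(0, t - n), t)
    if len(window) == 0:
        # No history yet, so fall back to the plain ensemble mean (assumption).
        return ensemble_mean(forecasts, t)

    def recent_rmse(name):
        series = forecasts[name]
        return math.sqrt(sum((series[i] - observed[i]) ** 2 for i in window) / len(window))

    top = sorted(forecasts, key=recent_rmse)[:m]
    return sum(forecasts[name][t] for name in top) / len(top)


# Made-up forecasts from three hypothetical models, aligned by target time.
forecasts = {
    "A": [10.0, 10.0, 10.0, 10.0],
    "B": [12.0, 12.0, 12.0, 12.0],
    "C": [20.0, 20.0, 20.0, 20.0],
}
observed = [10.0, 10.0, 10.0, 10.0]

sp_top1 = switched_prediction(forecasts, observed, t=3, n=2, m=1)  # best recent model only
sp_top2 = switched_prediction(forecasts, observed, t=3, n=2, m=2)  # mean of the best two
em_all = ensemble_mean(forecasts, 3)                               # mean of all models
```

With M = 1, as in the t + 4 row of Table 5, this scheme reduces to switching to the single best recent model, whereas setting M to the number of models makes it identical to the EM.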
As Fig. 8 shows, with a 1-h lead-time, both EM and SP obtained fairly accurate forecasts for the three typhoons, particularly for low inflow values. For peak forecasting, both methods showed slight errors of similar magnitude. Therefore, we speculated that the two integration methods would perform similarly in the 1-h lead-time forecasting case.
Figure 9 shows the 4-h lead-time forecasting hydrographs obtained from the EM and SP. The results integrated by SP were significantly more accurate than those integrated by EM. During the low water-level period, on the rising and falling limbs, both methods tended to overestimate the inflow. However, unlike the unstable EM, which alternated between overestimation and underestimation, the results forecasted by SP were comparatively more stable and controllable. The comparison of the forecasted peaks for the three events clearly indicated that SP could effectively improve the accuracy of peak forecasting and reduce lag time.
The results of the long-term forecasting, i.e., the 6-h lead-time forecasts, are shown in Fig. 10. Under these conditions, the combined forecasts generated by the two methods showed longer lag times than the previously mentioned 1 and 4-h lead-time forecasts, because the results of the original seven algorithms each incorporated a certain degree of lag time. For typhoons Soudelor and Megi, SP showed advantages over EM, both in peak forecasting and in the other parts of the hydrograph. In the case of Typhoon Maria, however, because most algorithms were unable to accurately forecast the inflow, neither SP nor EM could generate improved integration results.
The RMSE values and rankings of the seven algorithms, as well as of the EM and SP forecasts, are listed in Tables 6, 7, and 8 for the 1, 4, and 6-h lead-time forecasts, respectively. RMSE was chosen because it effectively reflects the forecasting accuracy of extreme values, and for the extreme rainfall events employed in this study, peak forecasting accuracy is a primary concern. As shown in the column headings, Tables 6–8 list the average training RMSE obtained from the 13 training typhoons, as well as the test RMSE for the three test events with the largest observed inflow (typhoons Soudelor, Megi, and Maria). Under 1-h lead-time forecasting conditions, DNN achieved the smallest RMSE in two test events, as well as the smallest average over the test events. However, during Typhoon Megi, DNN dropped from being the most accurate model to sixth position, which may reflect the insufficient stability of relying on a single algorithm. Conversely, the forecasts integrated by SP remained in second or third position in the three test events, and their average performance and stability were better than those of EM, which ranked third or fourth. In the 4-h lead-time scenario, no single algorithm achieved a stable and outstanding RMSE performance in all three test events. In terms of average performance, GRU ranked second among all methods but was unstable in its forecast for Typhoon Maria. In contrast, the integrated forecasts generated by SP consistently ranked among the top two of all methods and achieved the best average performance over the test events. Finally, for the long-term 6-h lead-time forecasts, although SP ranked fourth for Typhoon Maria, its RMSE differed only slightly from those of the top three methods. Moreover, SP ranked second and third for the other two test typhoons and exhibited the best performance on average.
Table 6 Comparison of all models with single, EM, and SP for 1-h lead-time forecasting (t + 1)
| Method | 13 Typhoons, training RMSE (m³/s) | Soudelor, test RMSE (m³/s) | Rank | Megi, test RMSE (m³/s) | Rank | Maria, test RMSE (m³/s) | Rank | Mean, test RMSE (m³/s) | Rank |
| SVM | 82.19 | 152.34 | 5 | 86.34 | 5 | 108.32 | 8 | 115.67 | 5 |
| RF | 45.21 | 201.55 | 9 | 126 | 9 | 105.57 | 4 | 144.37 | 9 |
| MLP | 110.16 | 175.75 | 6 | 94.26 | 7 | 119.56 | 9 | 129.86 | 7 |
| DNN | 100.23 | 124.35 | 1 | 88.07 | 6 | 90.17 | 1 | 100.86 | 1 |
| RNN | 76.37 | 186.51 | 8 | 99.34 | 8 | 106.58 | 5 | 130.81 | 8 |
| LSTM | 96.58 | 183.55 | 7 | 72.64 | 1 | 108.29 | 7 | 121.49 | 6 |
| GRU | 100.21 | 135.92 | 2 | 77 | 4 | 107.9 | 6 | 106.94 | 2 |
| EM | | 151.42 | 4 | 75.38 | 3 | 99.63 | 3 | 108.81 | 4 |
| SP | | 151.4 | 3 | 72.86 | 2 | 97.34 | 2 | 107.2 | 3 |
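The Mean and Rank columns are consistent with averaging the three test RMSE values of each method and ranking the averages in ascending order. A short Python sketch, using four methods from Table 6 and a hypothetical helper name `mean_rmse_ranking`, makes this explicit; the way the columns were produced in this study is assumed, not confirmed by the text.

```python
def mean_rmse_ranking(test_rmse):
    """Return {method: (mean RMSE, rank)} with rank 1 for the lowest mean."""
    means = {method: sum(values) / len(values) for method, values in test_rmse.items()}
    ordered = sorted(means, key=means.get)
    return {method: (means[method], ordered.index(method) + 1) for method in ordered}


# Test RMSE (m3/s) for Soudelor, Megi, and Maria at t + 1, taken from Table 6.
table6_subset = {
    "DNN": [124.35, 88.07, 90.17],
    "GRU": [135.92, 77.00, 107.90],
    "EM": [151.42, 75.38, 99.63],
    "SP": [151.40, 72.86, 97.34],
}

ranking = mean_rmse_ranking(table6_subset)
```

Because these four methods have the lowest mean RMSE in Table 6, their ranks within this subset coincide with their ranks in the full table.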
Table 7 Comparison of all models with single, EM, and SP for 4-h lead-time forecasting (t + 4)
| Method | 13 Typhoons, training RMSE (m³/s) | Soudelor, test RMSE (m³/s) | Rank | Megi, test RMSE (m³/s) | Rank | Maria, test RMSE (m³/s) | Rank | Mean, test RMSE (m³/s) | Rank |
| SVM | 281.41 | 442.03 | 3 | 247.72 | 3 | 243.39 | 7 | 311.05 | 5 |
| RF | 111.36 | 500.09 | 7 | 281.11 | 8 | 231.36 | 6 | 337.52 | 7 |
| MLP | 274.52 | 533.35 | 9 | 278.01 | 7 | 272.75 | 8 | 361.37 | 9 |
| DNN | 242.37 | 461.39 | 4 | 254.5 | 5 | 169.13 | 1 | 295 | 3 |
| RNN | 215.11 | 531.15 | 8 | 272.17 | 6 | 208.68 | 4 | 337.33 | 6 |
| LSTM | 217.94 | 499.36 | 6 | 301.39 | 9 | 274.8 | 9 | 358.52 | 8 |
| GRU | 237.92 | 412.27 | 1 | 234.94 | 2 | 212.47 | 5 | 286.56 | 2 |
| EM | | 464.16 | 5 | 249.65 | 4 | 206.24 | 3 | 306.68 | 4 |
| SP | | 440.16 | 2 | 229.15 | 1 | 175.86 | 2 | 281.72 | 1 |
Table 8 Comparison of all models with single, EM, and SP for 6-h lead-time forecasting (t + 6)
| Method | 13 Typhoons, training RMSE (m³/s) | Soudelor, test RMSE (m³/s) | Rank | Megi, test RMSE (m³/s) | Rank | Maria, test RMSE (m³/s) | Rank | Mean, test RMSE (m³/s) | Rank |
| SVM | 394.47 | 674.83 | 1 | 336.57 | 1 | 421.78 | 7 | 477.73 | 2 |
| RF | 165.01 | 821.53 | 8 | 406.55 | 5 | 550.26 | 9 | 592.78 | 9 |
| MLP | 441.05 | 730.43 | 6 | 411.12 | 6 | 345.65 | 1 | 495.73 | 5 |
| DNN | 379.94 | 717.45 | 5 | 359.58 | 2 | 372.78 | 5 | 483.27 | 3 |
| RNN | 320.3 | 703.86 | 3 | 471.8 | 8 | 455.77 | 8 | 543.81 | 7 |
| LSTM | 388.87 | 790.79 | 7 | 435.55 | 7 | 365.77 | 3 | 530.71 | 6 |
| GRU | 385.98 | 841.26 | 9 | 476.29 | 9 | 406.95 | 6 | 574.83 | 8 |
| EM | | 717.09 | 4 | 394.43 | 4 | 357.09 | 2 | 489.54 | 4 |
| SP | | 691.38 | 2 | 368.33 | 3 | 371.72 | 4 | 477.14 | 1 |
Overall, integrating multiple algorithms using SP can effectively merge the advantages of all algorithms and enable each to exert its advantages in the situations that suit it. Based on the results in the above figures and tables, SP offers a higher degree of stability and accuracy than either a single-algorithm forecast or EM integration. Hence, this study recommends that SP be used with various ML algorithms for ensemble forecasting to improve forecasting accuracy and enhance its practical use.