3.1 Construction of conceptual hydrological model
To evaluate the ability of the above four methods for identifying causal associations in hydrological systems, firstly, a conceptual hydrological model was employed to generate the synthetic time series, where the underlying causal associations have already been acquired and can be deemed as true relationships. The EXP-HYDRO model (Patil and Stieglitz, 2014), which characterizes the complexity of hydrological processes while maintaining as few variables and parameters as possible to ensure the interpretability of the constructed causal network, was chosen as our basic model framework. We simplified and split the original EXP-HYDRO model into two structures, namely the rainfall-runoff structure (Model S1, Fig,2a) and the snowmelt-runoff structure (Model S2, Fig. 2b).
Model S1 includes four variables: effective precipitation P, soil water storage S, interflow I, and runoff Q. Besides, the model contains three parameters: the maximum soil water storage Smax, and nonlinear storage-discharge parameters Ks, and γ. The model structure and corresponding causal network are shown at the top of Fig. 2a. The only forcing variable P was generated by a stochastic weather generator (WeaGETS; Chen et al., 2010), including a Markov chain for occurrence and a gamma distribution for quantity. Specifically, the occurrence of precipitation was generated with a binary variable \(\widehat {P}\) (i.e., wet or dry days) using the first-order Markov chain (Richardson, 1981):
\({\widetilde {p}_{i,j}}(t)=probability({\widehat {P}_t}=j|{\widehat {P}_{t - 1}}=i){\text{ ; }}i,j=0,{\text{ }}1{\text{ ; }}t>1\) (12)
where \({\widetilde {p}_{i,j}}\)is the transition probability from the state i to j; the states 1 and 0 represent the wet and dry days, respectively. Considering that precipitation either occurs or not on a given day, \({\widetilde {p}_{0,1}}+{\widetilde {p}_{0,0}}={\widetilde {p}_{1,0}}+{\widetilde {p}_{1,1}}=1\). In this study, we set \({\widetilde {p}_{0,0}}={\widetilde {p}_{1,1}}=0.8\), \({\widetilde {p}_{0,1}}={\widetilde {p}_{1,0}}=0.2\). The values of \({\widetilde {p}_{0,0}}\)and \({\widetilde {p}_{1,1}}\) were set relatively high in order to raise the persistence in the model, which could make some obstacles to the detection of causal associations. Next, the quantity of precipitation was generated by a Gamma distribution:\(\widehat {P}\sim Gamma(\alpha ,\beta)\); Generally, for precipitation data, the parameter α is smaller than 1 while β has a wide range of possible values (Geng et al., 1986). In this study, α and β were set as 0.6, and 6, respectively. Besides, adding some noise to the forcing variable helps to approach the real situation and test the anti-interference ability for causal inference methods. Here, \({P_t}={\widehat {P}_t}+{\eta _p}\), and \({\eta _p}\)is the white Gaussian noise (zero mean and \(\sigma _{P}^{2}\)variance). The \(\sigma _{P}^{2}\) can be calculated as the expectation value of P2 divided by the signal-to-noise ratio (SNR), as follows: \({\sigma _P}^{2}=E{\text{ }}[{P^2}]/SNR\) and the SNR can be transformed into the unit of dB:\(SNR(dB)=10{\log _{10}}SNR\).
The other variables S, I, and Q were determined by the equations of water balance and nonlinear storage-discharge relationship, as:
$$\frac{{d{S_t}}}{{dt}}={P_{t - 1}} - {I_{t - 1}}$$
13
$${I_t}={K_s} \times S_{{t - 1}}^{\gamma }$$
14
$${Q_t}=\left\{ \begin{gathered} {S_{t - 1}}+{P_{t - 1}} - {S_{\hbox{max} }}{\text{ }};{\text{ }}{S_t} \geqslant {S_{\hbox{max} }} \hfill \\ {\text{ }}0{\text{ ; }}{S_t}<{S_{\hbox{max} }} \hfill \\ \end{gathered} \right.$$
15
where Ks represents the speed of reduction in water storage, and γ is an exponent for S.
Model S2 includes four variables: temperature T, snowmelt M, soil water storage S, and interflow I, and three parameters: the nonlinear storage-discharge parameters Ks, and γ, and snow-melting parameter Kmelt. The model structure and corresponding causal network are shown at the top of Fig. 2b. The only forcing variable T was generated by a Normal distribution with a mean of zero. We also added some noise to this forcing variable: \({T_t}={\widehat {T}_t}+{\eta _T}\), where \({\eta _T}\)is the white Gaussian noise (zero mean and \(\sigma _{T}^{2}\)variance). The calculation of \(\sigma _{T}^{2}\) is similar to \(\sigma _{P}^{2}\).
The other variables S, I, and M were determined by the equations of water balance and nonlinear storage-discharge relationship, as:
$${I_t}={K_s} \times S_{{t - 1}}^{\gamma }$$
16
$$\frac{{d{S_t}}}{{dt}}={M_{t - 1}} - {I_{t - 1}}$$
17
$${M_t}=\left\{ \begin{gathered} {K_{melt}} \times {T_{t - 1}}{\text{ }};{\text{ }}{T_{t - 1}} \geqslant 0 \hfill \\ {\text{ }}0{\text{ ; }}{T_{t - 1}}<0 \hfill \\ \end{gathered} \right.$$
18
The parameters of the conceptual hydrological model are listed in Table 2 and the corresponding frequency distributions of the generated hydrological variables are shown at the bottom of Fig. 2. The frequency distributions are estimated with a sample size of 2000 and SNR(dB) of 40. Since most hydrological series follow a skewed distribution and have considerable zeros values (Gong et al., 2014), especially for the forcing variable rainfall/snow, the synthetic series match these properties and can represent the real hydrological system to some extent.
Besides, for each model structure, 100 datasets were generated to ensure the robustness of the assessment. In the sensitivity evaluation of causal methods, sample size varies from 100 to 2000; SNR(dB) from 2 to 40; data missing rate from 10–60% with two scenarios, namely synchronous and asynchronous sparsity. In the former scenario, all variables are missing at the same time indices that are generated randomly, while in the latter, the time indices vary from the variables. The missing values are complemented by the arithmetic mean of observed values of the same variable (Gao et al., 2018).
Table 2
The parameters of the conceptual hydrological model.
|
Parameter
|
Value
|
|
maximum soil water storage (Smax)
|
20
|
Model S1
|
storage-discharge parameter 1 (Ks)
|
0.01
|
|
storage-discharge parameter 2 (γ)
|
1.5
|
|
snow-melting parameter (Kmelt)
|
2
|
Model S2
|
storage-discharge parameter 1 (Ks)
|
0.5
|
|
storage-discharge parameter 2 (γ)
|
0.6
|
In this study, three metrics, namely True Positives Rate (TPR), False Positives Rate (FPR), and Accuracy Rate (AR), were adopted to evaluate the performance of causal inference methods:
$$TPR=\frac{{TP}}{{TP+FN}}$$
19
$$FPR=\frac{{FP}}{{FP+TN}}$$
20
$$AR=\frac{{TP+TN}}{{TP+FP+TN+FN}}$$
21
where TP and FN denote the true positive and false negative, respectively, which means that the causal link generated by the model is correctly and incorrectly identified, respectively; TN and FP are true negative and false positive, respectively, which means that the non-causal link is correctly and incorrectly identified, respectively. The lower values of FPR and the higher values of TPR and AR indicate the better performance of the tested causal method.
3.2 Performance in large sample tests
In this section, 100 datasets were generated with a sample length of 2000 to evaluate the performance of different causal methods. Figure 3 presents the causal structures of the hydrological model identified by four causal methods. The results represent the average behavior of the experiment, i.e., a causal link exists only if it is identified by more than half of the 100 simulations. The detection of the true causal structures in Models S1 and S2 is mainly disturbed by confounding and mediating variables, respectively, which are typical challenges for causal inference. In Model S1, all methods correctly identified the causal links P → Q, P → S, and S → I. The CCF method incorrectly identified the links Q → I and Q → S due to the influence of the confounding variables P (simultaneously affecting S and Q) and S (simultaneously affecting Q and I) respectively. Besides, the indirect link P → I results from the mediating variable S. These incorrect links in other causal methods (CCM, TE, and PCMCI+) also share the similar reasons. As the bivariate unidirectional causal methods, the CCF and TE methods miss the link I → S because the opposite direction S → I has a more significant causal effect. It should be noted that since the CCM method successfully identified all of the predefined causal links, many new false connections, i.e., Q → P, Q → S, S → P, I → P, were generated with the control of confounding and mediating variables. One reasonable explanation is that the synthetic system is strongly coupled, such that the CCM method easily mistakes unidirectional causal relationships for bidirectional relationships. In Model S2, PCMCI + perfectly restores all predefined causal links, while the other three methods (CCF, CCM, and TE) incorrectly identified many indirect causal links due to the transitivity of causal relationships. Similar to the results of Model 1, the CCF and TE methods miss the causal link I → S, and CCM also mistakes all unidirectional links for bidirectional ones.
Table 3 further lists the evaluation results of the causal tests in the synthetic case. The PCMCI + method shows the best overall performance in both model structures, with the TPR, AR, and FPR values of 0.97(1.00), 0.85(0.96), and 0.23(0.05) in Model S1(S2). The CCF and TE methods show relatively lower TPR and AR values, and relatively higher FPR values. In contrast, the CCM method shows the lowest AR (lower than 0.5) and the highest TPR and FPR (higher than 0.9). Besides, in situations where direct and indirect causality are not distinguished, that is, the indirect links identified by the methods are not considered as the wrong links (the values in parentheses of Table 3), the performances of CCF, CCM, and TE methods improve significantly, especially in Model S2, indicating that the three methods are strongly affected by mediating variables and the identified causal links are likely to contain indirect effects, which might hinder our understanding of the real mechanism in hydrological systems. It should be noted that in such situations, the TPR values are not changed because the true positives (TP, i.e., the causal links generated by the hydrological model are correctly identified) remain the same. In addition, due to the complexity of the hydrological model, the overall performance of all causal methods in Model S2 is better than that in Model S1 except for the CCM method, which shows extremely high TPR and FPR simultaneously owing to the mistakes for bidirectional links. Therefore, in the following real case studies, the identified causal links were revised to unidirectional links, that is, the direction with a significant maximum causal effect was regarded as the only causal direction, thus to avoid misidentification.
Table 3
Evaluation results of the causal tests in the synthetic case.
|
|
CCF
|
CCM
|
TE
|
PCMCI+
|
|
TPR
|
0.60
|
1.00
|
0.60
|
0.97
|
Model S1
|
AR
|
0.58 (0.60)
|
0.42 (0.50)
|
0.66 (0.69)
|
0.85 (0.87)
|
|
FPR
|
0.43 (0.40)
|
1.00 (1.00)
|
0.30 (0.22)
|
0.23 (0.22)
|
|
TPR
|
0.75
|
1.00
|
0.75
|
1.00
|
Model S2
|
AR
|
0.67 (0.89)
|
0.36 (0.49)
|
0.67 (0.89)
|
0.96 (1.00)
|
|
FPR
|
0.38 (0.00)
|
0.95 (0.93)
|
0.38 (0.00)
|
0.05 (0.00)
|
Note: The values in parentheses represent situations where direct and indirect causality are not distinguished, i.e., the indirect links identified by the methods are not considered as the wrong links. The TPR values are not changed in such situations. |
3.3 Performance in sensitivity tests
In this section, the sensitivity of each method in sample length, noise, and missing data was analyzed. Figure 4 presents the impact of sample length with the sizes of 100, 300, 500, 1000, and 2000. 100 datasets were generated to ensure the robustness of the results. As shown in Figs. 4a and b, in both Model S1 and S2, for CCF and CCM methods, the TPR is not sensitive to variations in sample length; while for TE and PCMCI + methods, the TPR increases with increasing sample length. For the CCF method, which is based on linear lag correlation, 100 samples are enough to detect all the causal links. This number is also applicable to the CCM method, which is based on the deterministic dynamical system theory. In contrast, TE is based on the probability density framework, which needs to estimate the probability density function from the histogram of the frequency distribution, as well as to implement statistical hypothesis testing for conditional independence, thus requiring sufficient sample length. Similarly, the PCMCI + method also requires sufficient length samples in conditional independence tests to keep iterating and removing initialized redundant connections. To achieve relative stability, the TE and PCMCI + methods need at least 1000 and 500 samples for Model S1 and S2, respectively. This difference results from the complexity of the model structure.
As for the Accuracy Rate (AR), the values in each causal method fluctuate or even decrease with the increasing sample length, especially for TE and CCM methods (Figs. 4c and d). This is attributed to the limitations of causal methods in dealing with the impacts of confounding and mediating variables, which become more striking with the increase of simple length and lead to a significant increase in false positive rate. Additionally, the dashed lines in Fig. 4 represent the situations where direct and indirect causality are not distinguished. The difference in trends between the solid and dashed lines can be further analyzed to determine whether variations in sample length affect the identification of indirect causal links. The CCF, CCM, and PCMCI + methods show a similar trend between the solid and dashed lines in both model structures, while for the TE method, the variation rate is slightly inconsistent or even the reverse, due to the increasing detection rate of indirect links P → I in Model S1, and T → S, T → Q, M → Q in Model S2, with the increase of sample length. Besides, in the situation where direct and indirect causality is not distinguished, the improvement in the performance of causal methods in Model S2 is more pronounced than that in Model S1, which is consistent with the setup of model structure, i.e., the former is mainly controlled by indirect causality.
Figure 5 presents the impact of noise with the SNR(dB) of 2, 3, 5, 10, 20, 30, and 40. The sample length was fixed as 2000, and 100 datasets were generated to ensure the robustness of the results. As shown in Fig. 5a, in model S1, the noise has little impact on the TPR values for the CCF and CCM methods with the increase of noise level (i.e., the decrease of SNR). In contrast, for the TE method, the TPR increases with the increasing noise level, especially when the SNR (dB) is below 20. This is attributed to the assumption of the nonlinear stochastic system for the TE method, and appropriate noise helps to identify causal relationships. For PCMCI+, the TPR decreases with the decreasing SNR(dB) and shows a slight increase when the SNR (dB) is lower than 10. However, in Model S2, the noise has little impact on the TPR values for all causal methods (Fig. 5b), owing to the insensitivity of this model structure to input noise. As for the Accuracy Rate (AR), the values in CCF and TE are relatively stable with the variations of the noise level in both model structures (Figs. 5c and d), while for the PCMCI + method, the AR decreases with the decreasing SNR in Model S1, and for the CCM method, the AR increases as the SNR(dB) drops below 10 in both model structures. Additionally, all causal methods show a similar trend between the solid and dashed lines in both model structures, indicating that the noise does not have significant impacts on the identification of indirect causal links.
Figure 6 presents the impact of missing data on the performance of each causal method with the sparsity rate increasing from 10–60%. Two scenarios of missing data, namely synchronous and asynchronous sparsity, which may occur in real hydrologic data due to equipment errors and defective storage technologies, were constructed with 100 tests. To ensure the computability of all causal methods, the missing values were complemented by the arithmetic mean of observed values of the same variable. As shown in Figs. 6a and b, with the increase of synchronous missing rates, the TPR values remain relatively stable for the CCF method and begin to decline at a critical missing value for the other three methods, especially for CCM and PCMCI+. In comparison with the synchronous sparsity, the TPR values remain relatively stable for the PCMCI method in the scenario of asynchronous sparsity, while for the CCM method, the TPR values decline significantly over the 30% point of missing rate (Figs. 6e and f). One reasonable explanation is that the CCM method requires state space reconstruction by continuous time series data, while the missing data can lose numerous effective information for the state space, especially in the asynchronous scenario, thus disturbing the detection of causal relationships.
As for the Accuracy Rate (AR), with the increasing missing rate, the values in the CCF method remain relatively stable, while in the other three methods (CCM, TE, and PCMCI+), the values show fluctuation (Figs. 6c, d, f and g). It should be noted that for the CCM method, the AR values increase substantially, especially over the critical missing value of 30%, which seems implausible. One reasonable explanation is that as the missing rate increases, effective information of variables in their state space is gradually lost, and causally linked variables in the system may no longer maintain the information signature of each other; therefore the efficiency of cross-mapping decreases. The detection rate of both correct and incorrect causal links declines simultaneously, and the latter is more pronounced. In addition, the CCF and PCMCI + methods show a similar trend between the solid and dashed lines, while for the TE method, the variation rate is slightly inconsistent or even the reverse over the 50% point of missing rate due to the decreasing detection rate of indirect links P → I in Model S1, and T → S and T → Q, M → Q in Model S2, with the increasing missing rate. For the CCM method, the variation rate is extremely inconsistent over the 30% point of the missing rate due to the declining detection rate of all indirect links (P → I, I → Q, T → S, T → Q, M → Q) simultaneously.
At the end of this section, the results of the sensitivity tests can be summarized as follows: In the impact of sample length, the CCF and CCM methods show relatively stable performance, while the TE and PCMCI + methods require at least 500 samples to achieve relative stability; In the impact of noise, the performance of causal models is affected by the model structure, and the TE and PCMCI + methods perform more unstably for Model S1; In the impact of missing data, all causal methods show relatively stable performance within 30% of the missing rate, and the performance of the CCM method deteriorates rapidly over 30%.