Global Estimation of River Bankfull Discharge Reveals Distinct Flood Recurrences Across Different Climate Zones

doi:10.21203/rs.3.rs-5185659/v1

This work is licensed under a CC BY 4.0 License

Version 1

The maximum amount of water rivers can transport before flooding is known as the bankfull discharge, an essential threshold for flood risk and biogeochemical cycles. Current Global Flood Models rely on an untested assumption of a spatially-invariant, 2-year bankfull recurrence. Here, based on observations and machine learning, we deliver the first global estimation of bankfull discharge in different climates along a new bifurcating river network at ~ 1 km spatial resolution. In contrast to the 2-year assumption, we find rivers flood more frequently in tropical and temperate regions (median return periods of 1.5 and 1.8 years; IQR 2.5 and 3.2y, respectively), and less frequently in cold and arid regions (2.8/4.3 years; IQR 4.8/6.0y). Relative to observations, the 2-year assumption overestimates bankfull discharge in the tropics (54%±78%, mean ± std) and underestimates it in arid regions (10%±51%). This new understanding will transform our ability to make accurate global flood predictions.

Earth and environmental sciences/Natural hazards

Earth and environmental sciences/Hydrology

Flooding is the most frequent and costly weather-related hazard, with several million people affected every year¹. With the changing climate and growing population, river flood risks are projected to increase significantly in many locations, threatening lives, livelihoods, agriculture and infrastructure^2,3. Global Flood Models are the primary tool for estimating flood hazard and risk at the global scale. To simulate inundation, Global Flood Models must first provide a river channel capacity to initiate flow routing⁴. This river channel capacity is often quantified by the maximum flow rate (or bankfull river discharge, Q_BF) contained within a river just before inundation occurs in the surrounding floodplain. This critical river flow is fundamental not only for estimating river bankfull geometry^5,6 and simulating flood inundation^7,8, but also for modelling sediment transport^9,10, investigating river geomorphological evolution¹¹, estimating river carbon fluxes^12,13, and understanding riverine habitats and migration¹⁴.

Due to the lack of global observations, Global Flood Models typically approximate bankfull discharge with the flow of a spatially and temporally uniform 2-year recurrence interval^8,15–18. This assumption was originally based on field survey data from a sample of 36 rivers in the USA and one river in India, carried out in the 1950s¹⁷. For most of the 37 rivers surveyed, the bankfull discharge corresponded to the flow with a recurrence interval of 2 years. Although many later studies found that the bankfull discharge recurrence interval could be highly variable^19–23, this is not yet reflected in Global Flood Models, likely leading to large biases in estimating flood risk. For example, if a river has a channel capacity smaller than that implied by the 2-year assumption, more extensive flood inundation or deeper water levels will occur in reality than in the model, and thus the models underestimate flood risk.

The two-year assumption has been widely adopted because reliable observations of bankfull discharge across different climate zones simply do not exist. A few studies have explored bankfull discharge at larger scales, primarily in the USA and Europe^23,24, but have focussed predominantly on individual river basins or been limited to areas within administrative boundaries²⁵. Since direct measurement of bankfull discharge is impractical due to its infrequent occurrence, it is typically estimated by interpolating the discharge corresponding to the bankfull water level based on the stage-discharge relationship from field surveys (Supplementary Fig. S1). Collecting such field data is costly, so they are available for only a very limited number of rivers worldwide. Consequently, relying on field measurements of bankfull discharge for all river reaches globally is infeasible, and the wide range of river characteristics across the globe makes the accurate estimation of bankfull discharge a challenging task.

Accurately estimating bankfull discharge has become more pressing with the development of Global Flood Models^15,16. In recent years, a host of novel global datasets that can be used for this purpose have emerged, including the location and connectivity of rivers^25,26, observation-based river width datasets²⁷, and novel dam and reservoir databases²⁸. Additionally, machine learning is increasingly being used to produce new geophysical datasets^29,30. Here, we harness this rich variety of emerging datasets and machine learning to estimate bankfull discharge for ~2.87 million km of global rivers wider than 30 m. We present the first global estimation of bankfull river discharge (GQBF), reveal the distinct pattern of bankfull discharge return periods across climate regions, and quantify the bias of the common 2-year return period bankfull discharge assumption relative to observations.

Global estimates of bankfull discharge

To estimate bankfull discharge for the world’s largely ungauged rivers, we first compiled an observed bankfull discharge (Q_BFobs) dataset based on 9 different sources, including literature values, cross-section measurements, and discharge-stage curves based on river gauging transects. The newly created dataset consists of 2,657 observations with a wide spatial and magnitude distribution (details in Methods, Supplementary Fig. S2, Text S1). We based our river bankfull discharge estimation on a newly developed river network, Global RIver Topology (GRIT)²⁶, using GRIT’s ~1 km river reaches as the spatial scale to represent the variation in bankfull discharge. We included all GRIT river reaches that coincided with the Global River Width from Landsat (GRWL)²⁷ river masks (with an overlap ratio ≥ 0.5). This selects river reaches with satellite-derived width measurements wider than 30 m, resulting in a total length of ~2.87 million km. For those river reaches, we then prepared 31 predictors, including river characteristics and river discharge-related climate indicators (Methods and Supplementary Table S1). Based on the bankfull discharge observations dataset and the predictors, we developed a Random Forest model, and then used the trained and tested model to estimate bankfull discharge (Q_BFest) across the full ~2.87 million km of river reaches.

Training the model on 80% of the observed dataset and testing it on the remaining 20% (excluded from training) showed consistently high performance, with both R² values exceeding 0.72 (Supplementary Figs. S3-4). The Random Forest model exhibits a slight over-estimation, with Median Percentage Errors (MdPEs) of 22% and 17% (training and testing, respectively), which is similar to the 50-year flood errors obtained from Regional Flood Frequency Analysis²⁹. The Mean Percentage Errors (MPEs) are 25% and 46% (training and testing), which is slightly greater than a previous study at a coarser resolution of 5 arc min raster grid in Europe²³. This performance is good considering the challenge of estimation at high resolution across the globe. The normalized Root Mean Square Error (nRMSE) is 1.22 for training and 1.28 for testing. Among the four Köppen climate regions (tropical, temperate, cold, and arid)³⁰, the estimation has the smallest error (nRMSE = 0.97, testing) in the temperate region, likely due to the larger training sample within this climate region. The tropical region shows the largest overestimation, but also underestimation for values greater than ~7000 m³ s⁻¹ (Supplementary Figs. S4-5). Equations of the error metrics (R², MdPE, MPE, nRMSE) can be found in Supplementary Text S2.
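The error-metric equations are given in Supplementary Text S2 and are not reproduced here, so the following is only a sketch of how such metrics are commonly computed. The definitions are assumptions: percentage error is taken as (estimate − observation) / observation × 100, R² as the coefficient of determination, and nRMSE as RMSE divided by the mean observation. The function names are hypothetical, and the definitions used in the paper may differ.

```python
import math
import statistics


def percentage_errors(obs, est):
    """Signed percentage error of each estimate relative to its observation."""
    return [100.0 * (e - o) / o for o, e in zip(obs, est)]


def error_metrics(obs, est):
    """Return R2, MdPE, MPE and nRMSE under the assumed definitions.

    Assumptions (not taken from the paper): R2 is the coefficient of
    determination, and nRMSE is RMSE normalised by the mean observation.
    """
    pe = percentage_errors(obs, est)
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    rmse = math.sqrt(ss_res / len(obs))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MdPE": statistics.median(pe),
        "MPE": statistics.fmean(pe),
        "nRMSE": rmse / mean_obs,
    }
```

Because the median is insensitive to a few very large relative errors while the mean is not, a gap such as the one between the testing MdPE (17%) and MPE (46%) is what a skewed error distribution would produce.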

The global bankfull discharge estimation is shown in Fig. 1a. The annual total bankfull discharge exhibits the highest values around 5°S, largely contributed by the Amazon River, and in the northern latitudes around 65°N, partly from the Yukon River (Fig. 1b).

Return period of bankfull river discharge varies across climate regions

Using records of daily discharge at 8,519 gauging stations, we calculated the return period for both the observed and the estimated bankfull river discharge distributed across four climate regions (Fig. 2a). The return periods of bankfull river discharge (Q_BF−RP) vary widely across the globe. Overall, almost a quarter of sites (23.2%) flood several times per year, a larger share than the sites with an estimated bankfull return period of 1–2 years (22.1%) or 2–3 years (13.3%). Sites with return periods of 3–5 years account for 15.6%, decreasing to 14.4% for 5–10 years and 11.4% for return periods of 10 years or more (Fig. 2b). The spatial distribution suggests return periods greater than 3 years are rare in the tropical sites (Fig. 2a-b).

The return periods of bankfull discharge show different density distributions across climate regions (Fig. 2c). The peak density values (highest point on the y-axis) and the median values increase from tropical to temperate, cold, and arid regions. The median return period of the observed (estimated) bankfull discharge (Q_BF−RP) increases from 1.0 years (1.5) in the tropical region, to 1.4 years (1.8) in the temperate region, 2.0 years (2.8) in the cold region, and 3.2 years (4.3) in the arid region. The variation in the return period of river bankfull discharge is also pronounced within each climate region. The observed Q_BFobs−RP interquartile range (IQR), in years, is [0.5, 2.2] (0.8, 3.3 estimated) for the tropical region, [0.8, 3.0] (0.9, 4.1) for the temperate region, [1.1, 4.3] (1.3, 6.1) for the cold region, and [1.9, 6.6] (2.4, 8.4) for the arid region (Fig. 2d).

The proportions of sites within each return period range in the individual climate regions reinforce these differences in bankfull discharge return period (Fig. 2e). Most notably, the observed (estimated) bankfull return period is shorter than 1 year at 51% (36%) of tropical sites, a much greater share than the overall proportion of sites. In contrast, 52% (66%) of sites in the arid region have return periods exceeding 3 years.

Both observed and estimated bankfull discharge show a similar distribution of return periods (Fig. 2c-e). This consistency, to some extent, validates the reliability of our bankfull discharge estimations beyond the testing sites. Given the wider climatic distribution of the estimated bankfull discharge locations relative to those of the observed bankfull discharge, greater variation in the estimated bankfull discharge return periods is expected (Fig. 2c-d).

Biases in the commonly-used river bankfull discharge assumption

By comparing the 2-year return period of discharge (Q_RP2) with the observed bankfull river discharge, we quantified the bias of the widely used 2-year river channel capacity assumption. Overall, the 2-year return period discharge aligns surprisingly well with observed bankfull discharge (Fig. 3a). Compared to the overall density distribution, bias in the 2-year return period discharge is more evenly distributed (flatter density curves) in the tropical and temperate regions than in the cold and arid climate regions (Fig. 3b). Despite the general alignment between the 2-year return period discharge and the observed bankfull discharge, the magnitude of bias in the 2-year return period discharge is substantial. It overestimates the bankfull discharge by 54% ± 78% (mean ± standard deviation) in the tropical region and by 44% ± 141% in the temperate region. The overestimation is smallest in the cold climate region, at 13% ± 59%. In contrast, it underestimates the bankfull discharge in the arid region by 10% ± 51% (Fig. 3c).

Overall, more than half of the stations show a 2-year return period discharge bias exceeding 25%, with more sites being overestimated (33.9%) than underestimated (20.8%). While a spatially invariant bankfull discharge return period of 2 years is a relatively good estimate in the cold region, it substantially overestimates river channel capacity (bias > 25%) at more than 50% of sites in the tropical region, and substantially underestimates it (bias > 25%) at 41% of sites in the arid region (Fig. 3d). Therefore, because the 2-year assumption overestimates river channel capacity, the flood risk modelled under a spatially invariant 2-year bankfull discharge is likely to be severely underestimated in the tropical region.
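The site-level comparison above can be illustrated with a short sketch. The sign convention is an assumption: bias is taken as (Q_RP2 − Q_BFobs) / Q_BFobs × 100, so that a positive value means the 2-year discharge overestimates the observed bankfull discharge. The function names and the treatment of the 25% cut-off as a strict inequality are hypothetical.

```python
from collections import Counter


def percent_bias(q_rp2, q_bf_obs):
    """Bias of the 2-year discharge relative to observed bankfull discharge (%)."""
    return 100.0 * (q_rp2 - q_bf_obs) / q_bf_obs


def classify_bias(q_rp2, q_bf_obs, tolerance=25.0):
    """Label a site by whether its bias exceeds the tolerance in either direction."""
    bias = percent_bias(q_rp2, q_bf_obs)
    if bias > tolerance:
        return "overestimated"
    if bias < -tolerance:
        return "underestimated"
    return "within tolerance"


def bias_shares(pairs, tolerance=25.0):
    """Percentage of sites in each class; pairs is a list of (q_rp2, q_bf_obs)."""
    counts = Counter(classify_bias(q, o, tolerance) for q, o in pairs)
    return {label: 100.0 * n / len(pairs) for label, n in counts.items()}
```

Applied to all stations, the "overestimated" and "underestimated" shares correspond to the kind of figures reported above (33.9% and 20.8%).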

The estimation of global river bankfull discharge has unavoidable uncertainties, arising first and foremost from the underlying bankfull discharge observations. Field measurements of the flow rate close to bankfull can have 10–20% uncertainty³¹ or more. Therefore, achieving accurate estimates of bankfull discharge is a challenging enterprise considering the uncertainty in the target variable itself. Second, the channel conveyance capacity, and thus bankfull discharge itself, can vary over time in many rivers^32,33. Here, due to the sparsity of bankfull discharge observations used to train the model for global estimation, we do not consider temporal changes in the bankfull discharge. In locations with multiple estimates of bankfull discharge, the mean of all the measurements was used. Third, the bankfull discharge is typically measured at a specific cross section in the landscape, often corresponding to a gauging station, rather than at reach scale. We assume our point-based measurements are representative of the average bankfull discharge for the GRIT river reach (≤ 1 km in length), and therefore ignore spatial variation within the river reach. Fourth, this work focuses on the bankfull discharge estimation in gauged rivers, which are often large, perennial rivers draining more human-occupied watersheds³⁴. Fifth, even though the influence of human intervention on river channels might be reflected in some of our predictors (dams and reservoirs, impervious ratio), rivers with flood defences can have much higher bankfull discharge; we therefore recommend caution if these data are used for rivers with flood defences. Finally, uncertainties also arise from the Random Forest model predictors, such as the river width predictor²⁷. Some smoothing could reduce the uncertainty at regional scales, as it is largely caused by a small number of outliers. When very small or very large rivers are excluded, especially those with bankfull discharge smaller than 20 m³ s⁻¹ or greater than 7000 m³ s⁻¹, for which we have fewer observations, our bankfull discharge estimates perform well (Supplementary Fig. S5).

Our bankfull discharge ML model could be further refined in several ways. First, increasing the number and spatial coverage of observations could improve the model, although such data are particularly challenging to obtain. Remotely-sensed information, such as the retrievals of river stage from the recently launched Surface Water and Ocean Topography (SWOT) satellite³⁵ might be helpful for bankfull discharge estimation, particularly for large rivers. Second, upgrading the existing predictors with higher spatial resolution data once they become available in the future could enhance model performance. Additional types of predictors could potentially provide useful information, such as terrestrial water storage, glaciers, topographic information derived from digital elevation models, or more detailed information about channel modification and river regulation. Finally, we believe there is scope to extend the work to smaller rivers, developing regional models to provide higher-resolution estimates of channel conveyance and other river attributes such as channel depth and cross-sectional flow velocity.

This work refutes the prevailing assumption that rivers reach bankfull every two years, revealing distinct patterns of bankfull recurrence across different climate regions of the globe. We estimate bankfull discharge at ~1 km spatial resolution for ~2.87 million km of rivers globally, using a newly-compiled dataset of bankfull discharge observations at 2,657 gauging stations and a newly-created multi-threaded global river network (GRIT), with globally-distributed predictors reflecting catchment and river characteristics, climatology, landscape, and human intervention for all river reaches worldwide. By quantifying the variation in the return period of bankfull discharge across and within climate regions globally, we highlight the systematic biases associated with the use of a spatially-invariant bankfull return period. More importantly, we reveal that the return period of estimated river bankfull discharge varies distinctly across climate regions, increasing from 1.5 [0.8, 3.3] years in the tropical region (median [25th, 75th percentile]), to 1.8 [0.9, 4.1] years in the temperate region, 2.8 [1.3, 6.1] years in the cold region, and 4.3 [2.4, 8.4] years in the arid region. Relative to observations, assuming a spatially-invariant 2-year discharge return period particularly overestimates bankfull discharge in the tropics (54%±78%, mean ± std) and underestimates it in arid regions (10%±51%). Hence, the assumption that bankfull discharge occurs consistently every two years on average should be considered obsolete, especially for large-scale applications crossing climate regions.

Our bankfull discharge estimation fills the gap in ungauged rivers globally. It improves on the assumption currently employed in Global Flood Models by representing the spatial variation of the bankfull discharge return period and thereby has the potential to improve our current understanding of global fluvial flood hazard and risk. These findings can also be a valuable source for riverine and water resources studies across multiple disciplines, including hydrology, geomorphology, and ecology.

The GRIT river network

We used the recently-developed Global RIver Topology (GRIT) river network^26,36 as the basis for developing the global bankfull discharge model. GRIT is the first global, high-resolution bifurcating river network, developed from observation-based river data²⁷ and an improved high-resolution global Digital Elevation Model (DEM): the Forest And Buildings removed DEM (FABDEM)²⁵. The centrelines were generated from the GRWL, and the OpenStreetMap water layer was used to maximize the topographic accuracy of the river network outside of the GRWL mask. GRIT pioneers the representation of bifurcating and multi-threaded rivers, which are missing from previous global river networks such as HydroSHEDS³⁷ and MERIT Hydro³⁸. GRIT represents all rivers globally with a drainage area larger than 50 km² or a width greater than 30 m, and the average length of a GRIT river reach is 1 km. We estimated the bankfull discharge for all the river reaches wider than 30 m in the GRIT network except those in Polar Köppen climate regions³⁰, where river channels are likely to be frozen for a large proportion of the year.

Bankfull discharge reference data

We assume that the point-based bankfull discharge values, typically measured at gauging stations, are representative of the nearest river reach. The observed bankfull discharge locations were matched to river reaches on the GRIT network using a snapping approach, pairing each Q_BFobs with the nearest river reach, defined by the distance from the Q_BFobs location to the river centreline. A Q_BFobs value was discarded if this distance exceeded 1 km, or if the drainage area of the Q_BFobs site deviated by more than 20% from the drainage area of the paired river reach. We obtained the climate region of the observed bankfull discharge sites using the Köppen climate zone dataset³⁰. Considering the high seasonal variation of the river water status (free-flowing or frozen), sites in the Polar region were excluded. These filtering steps left a total of 2,657 Q_BFobs values to train and test the RF model. These Q_BFobs values are spread across four climate regions (with at least 156 measurements per region), with bankfull discharge values ranging from 10 to 42,235 m³ s⁻¹ (median value of 165 m³ s⁻¹). Temporal changes in bankfull river discharge associated with river channel evolution were not considered in this paper due to the very small number of sites where such observations exist. More details of this dataset can be found in Supplementary Fig. S2 and Text S1.
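The two acceptance criteria of the snapping step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, the candidate tuple layout, and the choice to express the drainage-area deviation relative to the observation's drainage area are all assumptions, and the distances to the centreline are taken as precomputed.

```python
def snap_observation(obs_area_km2, candidates,
                     max_distance_km=1.0, max_area_deviation=0.20):
    """Return the id of the reach paired with an observation, or None.

    candidates: list of (reach_id, distance_km, reach_area_km2) tuples, where
    distance_km is the distance from the observation to the reach centreline.
    """
    if not candidates:
        return None
    # Pair the observation with the nearest reach only.
    reach_id, distance_km, reach_area_km2 = min(candidates, key=lambda c: c[1])
    if distance_km > max_distance_km:
        return None
    # Assumed definition: deviation relative to the observation's drainage area.
    deviation = abs(reach_area_km2 - obs_area_km2) / obs_area_km2
    if deviation > max_area_deviation:
        return None
    return reach_id
```

Note that the nearest reach is tested and, if it fails, the observation is dropped; the sketch does not fall back to the second-nearest reach, which matches the description above but is a design choice that could be made differently.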

Bankfull discharge predictors

A selection of hydrological, landscape and climatic indicators known to control river conveyance capacity (bankfull discharge) were identified based on the literature^33,39,40 and used as predictors for estimating the bankfull discharge. These indicators include drainage area, annual mean discharge, annual total runoff, river width at multiple inundation frequencies, river sinuosity, reservoir capacity, elevation, slope, water table depth, soil moisture and soil thickness, Leaf Area Index (LAI), Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), impervious percentage area, and a suite of climate indicators (temperature, precipitation, aridity index, snow cover fraction), detailed below.

Depending on the indicator type, different aggregation approaches were used to process these indicators as GRIT river reach predictors. These approaches include Catchment Sum (CS: total value accumulated from the upstream catchment), Catchment Average (CA: mean of all raster values within the catchment), Catchment Majority (CM: the most common value among all raster values within the catchment), Reach Median (RM: median value of all raster values within the river reach), Reach Average (RA: mean value of all raster values within the river reach). It is worth noting that at bifurcated rivers, the calculation of CS follows the partitioning approach in the GRIT network²⁶; no extra processing is done for the RM and the RA as each channel of a bifurcated river is treated as an individual river reach. Some of the predictors were available over multiple years, so time averaged values (mean) were computed. The predictors and processing details are described below (Supplementary Table S1).
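The five aggregation approaches reduce to simple summary statistics once the relevant raster cells have been collected. The sketch below assumes the cell values for a catchment or reach have already been extracted into a list; the `aggregate` function and `AGGREGATORS` table are hypothetical names, and the partitioning of CS at bifurcations is not handled.

```python
import statistics
from collections import Counter

# Each method maps a list of raster values to a single predictor value.
# CA and RA share the same statistic; they differ only in which cells are
# passed in (upstream catchment versus river reach).
AGGREGATORS = {
    "CS": sum,                                                 # Catchment Sum
    "CA": statistics.fmean,                                    # Catchment Average
    "CM": lambda values: Counter(values).most_common(1)[0][0],  # Catchment Majority
    "RM": statistics.median,                                   # Reach Median
    "RA": statistics.fmean,                                    # Reach Average
}


def aggregate(values, method):
    """Aggregate raster cell values with one of the five approaches."""
    if not values:
        raise ValueError("no raster values to aggregate")
    return AGGREGATORS[method](values)
```

For example, a categorical layer such as the Köppen class would use "CM", while reservoir capacity accumulated from upstream would use "CS".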

Drainage area has a controlling effect on river channel size and discharge^6,20,41. The upstream drainage area was estimated at each point along the river network from the river attributes of the GRIT network. The predictors GLOFAS_v4.0_discharge and GLOFAS_v4.0_runoff are re-analysis values of global annual mean discharge and annual total runoff, available at 0.05° resolution (~5 km at the equator; ° means degree of latitude and longitude throughout the manuscript unless otherwise stated) for the period 1993 to 2022⁴². From these data, we computed the time-averaged (mean) discharge and the mean annual total runoff between 1993 and 2022. Values of the time-averaged discharge and annual runoff were extracted for each GRIT reach, and we took the spatial median value (RM) at each reach as the predictor.

The GRWL dataset is an observation-based, global compilation of river planform geometry at mean annual discharge²⁷, derived from thousands of Landsat images acquired in the months during which the mean annual discharge occurs at each location. Here, the river width attribute was calculated from the GRWL river mask by taking the reach median (RM) perpendicular distance from the GRIT river centre line to the river bank²⁶. We only considered rivers which overlap with the GRWL river mask (grwl_overlap ≥ 0.5). To take account of the uncertainty of the GRIT river width in our model, we included the overlap ratio³⁶ between GRIT river and the GRWL river mask as an additional predictor.

The Wetted River Widths (WRWs) dataset was derived from the Landsat surface water occurrence data⁴³. We use the term “wetted river widths” to describe the dynamic nature of the river width from various discharge scenarios. The Landsat surface water occurrence data is a global surface water mask which indicates the ratio between the number of months in which water was detected and the total number of valid observations over a 37-year period (1984–2021). We calculated the WRWs as the wetted area connected with the reach divided by the overlapping length between that river and the water layer. The calculation is applied to each river reach directly, resulting in the WRWs predictors. We extract these values for different water occurrence values (50%, 40%, 30%, 20%, 10%, 1%), where a smaller occurrence value indicates larger discharge events. We include several frequencies of occurrence because they provide different information on the surface extent (and morphology) of the channel based on the frequency with which the water goes out of bank in each location. The RF model is known to be relatively insensitive to collinear predictors⁴⁴, supporting this decision. We show the comparison between the WRWs at 1% occurrence and the GRWL in the Supplementary Fig. S6.

River sinuosity controls channel gradient and hence impacts the flow conveyance⁵. This was calculated as the river planform distance divided by the straight-line distance⁴⁵ for each segment in the GRIT network, using the GRASS GIS line sinuosity tool (https://grass.osgeo.org/grass83/manuals/v.to.db.html).
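The sinuosity ratio itself is straightforward to compute from a centreline. The sketch below is an illustration of the definition only, not the GRASS GIS tool: it assumes the segment is given as a list of (x, y) vertices in a projected coordinate system, and the function name is hypothetical.

```python
import math


def sinuosity(points):
    """Planform (along-channel) length divided by straight-line length.

    points: ordered (x, y) vertices of one river segment, in projected units.
    """
    if len(points) < 2:
        raise ValueError("a segment needs at least two vertices")
    path_length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    straight_length = math.dist(points[0], points[-1])
    if straight_length == 0:
        raise ValueError("segment start and end coincide")
    return path_length / straight_length
```

A perfectly straight segment gives 1.0, and the value grows as the channel meanders between the same two end points.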

The Global Reservoir and Dam (GRanD) database⁴⁶ is a global compilation of existing (up until 2016) dam and reservoir data. GRanD includes dams and reservoirs with capacity greater than 0.1 km³, with a cumulative storage capacity of 6864 km³. The reservoir capacity predictor was computed by mapping the reservoir location (vector) to the nearest river reach and employing the upstream accumulated (CS) reservoir capacity as the reservoir predictor.

Catchment averaged (mean) (CA) elevation and slope were used for each river reach. They were derived from the recent Forest And Buildings removed DEM (FABDEM, ~30 m spatial resolution)²⁵. FABDEM is a version of the recently released 30 m global Copernicus DEM with artefact errors removed. The Copernicus DEM was resampled from the high-resolution (12 m) global DEM acquired by Interferometric Synthetic Aperture Radar (InSAR), which has been demonstrated to be more accurate than previous global DEMs.

Climate predictors (Temperature, Annual precipitation, Aridity Index, Snow cover fraction) were averaged (mean) within the direct contributing upstream drainage area of each GRIT reach (CA). Here, we use the MSWX temperature and precipitation data from 1979–2022 (spatial resolution of 0.1°) to compute the annual mean temperature and annual mean precipitation⁴⁷. The temperature variation predictor (predictor: MSWX_temperature_range) was calculated using the monthly maximum temperature minus the monthly minimum temperature, averaged (mean) over all months.

Water table depth and soil moisture data were obtained from SOIL-WATERGRIDS⁴⁵. They are monthly data at a spatial resolution of 0.25° over the period 1970–2014. We used the mean depth of the highest monthly water table and of the lowest monthly water table as the predictors WTDH (Water Table Depth-High) and WTDL (Water Table Depth-Low), respectively. The monthly mean soil moisture was extracted at depth ranges of 0–30 cm, 30–60 cm, and 60–100 cm in the root zone based on all months of the period 1970–2014 (predictors: SI_0_30, SI_30_60, SI_60_100). Additionally, we used the mean soil thickness over the period 1900 to 2015⁴⁸. The above predictors were computed as river catchment mean values (CA).

The aridity index⁴⁹ (version 3) has a spatial resolution of 30″ (~1 km). Time-averaged values of the aridity index from 1970–2000 were computed and then averaged over each GRIT river catchment as the predictor (CA). The Moderate Resolution Imaging Spectroradiometer (MODIS) daily snow cover fraction dataset from 2000–2019, with a spatial resolution of ~1 km⁵⁰, was used to compute the percentile values used for the snow cover fraction predictors. We computed the time-averaged value at the 5th, 50th, and 95th percentile of the snow cover fraction as the three predictors. The averaged value (mean) over each GRIT reach catchment was used as the predictor (CA). The Köppen-Geiger climate category³⁰ was also used as a predictor, where the climate region is defined by the most common Köppen value within the catchment (CM).

Vegetation in the upstream catchment controls evaporation and infiltration rates, influencing stream discharge and the resulting river channel network^51,52. Here we used the LAI and FAPAR as proxies for vegetation in the catchment. Leaf Area Index (LAI) tracks the one-sided green leaf area per unit of ground surface area, while the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) quantifies the solar radiation absorbed by plants within the photosynthetically active radiation spectral region. Gridded daily data of LAI and FAPAR from 1981–2022 were accessed from the NOAA Climate Data Record of Advanced Very High Resolution Radiometer⁵³. Both datasets have a spatial resolution of 0.05°. The time-averaged value over the full study period was computed, and its average (mean) over each GRIT reach catchment was used as the predictor (CA).

Urbanisation may also affect runoff rates, and therefore we also used the percentage of impervious area as a predictor. We obtained the urban areas from the gridded map of the land cover dataset derived from multiple satellite images acquired for the year 2010, then calculated the ratio between urban areas and the total area per grid. This ratio represents the impervious percentage per pixel at a spatial resolution of 0.25°⁵⁴. The catchment averaged (CA) value of this data is used as the predictor.

Random Forest model

Benefiting from a bootstrap aggregating technique, Random Forest (RF) models can assimilate information from a large number of predictors to estimate the target without overfitting. This is achieved by creating multiple regression models from different subsets of the training samples and combining their outputs to compute the estimation^55,56. The importance score of a predictor is defined as the total decrease in node impurity, measured by the residual sum of squares, resulting from splits on that predictor, averaged over all trees⁵⁵. The importance indicates the relative significance of each predictor for predicting the target variable (Supplementary Fig. S3). We used the scikit-learn package (version 1.1.2) in Python⁵⁷ to construct our RF model.

Eighty percent of the observations were randomly selected as the training data. We enforced a proportional selection of the training samples on top of the randomness, meaning that the magnitude and climate region distribution of the training samples is proportional to the total distribution in the observation dataset. We achieved this by using class_weight in the scikit-learn package. A five-fold cross-validation was used in the training process to optimize the model parameters (max_features, min_samples_leaf, n_estimators), selecting the combination that achieved the highest R² value in predicting the bankfull discharge. To ensure the robustness of the cross-validation, we ran the cross-validation 10 times (with different samples used in each run) and report the averaged (mean) result as the final performance (Supplementary Fig. S3). We adopted the best-performing model from the multiple cross-validation runs to estimate the Q_BF for global ungauged rivers.
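One way to picture a proportional 80/20 selection is an explicit stratified split, sketched below with the standard library only. This is a swapped-in illustration, not the class_weight-based procedure described above: the strata (climate region combined with the order of magnitude of Q_BFobs), the sample layout, and the function names are all assumptions.

```python
import math
import random
from collections import defaultdict


def stratum(sample):
    """Hypothetical stratum: (climate region, order of magnitude of Q_BF).

    sample is assumed to be a (climate_region, q_bf_m3s) tuple.
    """
    climate_region, q_bf = sample
    return climate_region, math.floor(math.log10(q_bf))


def stratified_split(samples, stratum_of=stratum, train_fraction=0.8, seed=0):
    """Randomly split samples so each stratum keeps the same train share."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in samples:
        groups[stratum_of(sample)].append(sample)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test
```

Repeating the split with different `seed` values and averaging the resulting scores mirrors the idea of reporting a mean over several randomised runs.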

Flood frequency analysis

To estimate the return period of the Q_BF, long time series of discharge records are required. We applied a three-step process to obtain the return period of river bankfull discharge and the value of the widely used bankfull discharge assumption, the 2-year return period flow (Q_RP2). We relied on a global gauging station dataset to identify the discharge records associated with the Q_BFobs and Q_BFest. We first selected gauging stations with high-quality discharge records (a minimum of 10 complete years of data, where each complete year has at least 90% of the daily discharge records). We then paired the selected gauging stations with the Q_BFobs by the gauging site number, and with the Q_BFest based on the snapping approach described in the bankfull discharge reference data section above. Our initial global gauging station dataset includes a total of 22,538 stations collected from various agencies for the period 1950–2021⁵⁸. Ultimately, we paired 1,818 sites of Q_BFobs and 9,058 sites of Q_BFest with the qualified discharge records (1,545 sites overlap). We estimated the return period of the Q_BFobs and the Q_BFest, as well as the value of the bankfull discharge assumption (Q_RP2), for these sites using flood frequency analysis.
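The record-quality criterion (at least 10 complete years, each with at least 90% of daily values) can be sketched as below. The input format is an assumption: the station record is represented as the collection of calendar dates that carry a valid daily discharge value, and calendar years are used since the text does not specify hydrological years. Function names are hypothetical.

```python
import calendar
from collections import Counter
from collections.abc import Iterable
from datetime import date


def complete_years(valid_days: Iterable[date], min_coverage: float = 0.90):
    """Calendar years in which at least min_coverage of days have a record."""
    days_per_year = Counter(d.year for d in set(valid_days))
    complete = []
    for year, n_days in sorted(days_per_year.items()):
        year_length = 366 if calendar.isleap(year) else 365
        if n_days / year_length >= min_coverage:
            complete.append(year)
    return complete


def is_high_quality(valid_days: Iterable[date], min_years: int = 10,
                    min_coverage: float = 0.90) -> bool:
    """True if the station has at least min_years complete years of data."""
    return len(complete_years(valid_days, min_coverage)) >= min_years
```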

Both the return period of Q_BF and the discharge at a given return period (Q_RP2) were estimated using Flood Frequency Analysis (FFA). FFA establishes the relationship between flood magnitude and frequency of occurrence (or return period). It is typically used to estimate the magnitude of design floods (such as the 2-year return period discharge or more extreme ones). To obtain this relationship, flood events are first selected from a long time series of discharge records, and the probability of occurrence is then estimated from the statistics of the selected events. Two approaches are commonly used to select flood events: the Annual Maximum Series (AMS) approach, which uses the maximum discharge of each year, and the Peak-Over-Threshold (POT) approach, which employs all daily discharge values above a specified threshold⁵⁹. Because the AMS only considers the maximum discharge in each year, it is limited to estimating flows with a return period of one year or greater. However, some rivers may reach bankfull more frequently than once per year on average. For this reason, we selected the POT approach, which has also been shown to perform better than the AMS approach in at-site FFA⁶⁰.
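A toy comparison shows why the AMS cannot resolve sub-annual recurrence while the POT can. The numbers and function names below are hypothetical, and the peaks are assumed to be independent events already.

```python
def annual_maxima(series):
    """series: list of (year, discharge). Returns {year: maximum discharge}."""
    ams = {}
    for year, q in series:
        ams[year] = max(q, ams.get(year, float("-inf")))
    return ams

def peaks_over_threshold(series, threshold):
    """Keep every event whose discharge exceeds the threshold."""
    return [(year, q) for year, q in series if q > threshold]

# Hypothetical flood peaks over three years (m3/s) and a bankfull discharge
series = [
    (2001, 60), (2001, 55), (2001, 20),
    (2002, 70), (2002, 52),
    (2003, 58), (2003, 51), (2003, 30),
]
q_bf = 50
n_years = 3

ams = annual_maxima(series)
pot = peaks_over_threshold(series, threshold=0.5 * q_bf)

# Count how often bankfull is exceeded in each event set
ams_exceed = sum(1 for q in ams.values() if q > q_bf)
pot_exceed = sum(1 for _, q in pot if q > q_bf)

print(n_years / ams_exceed)  # AMS: cannot drop below one year
print(n_years / pot_exceed)  # POT: sub-annual recurrence is visible
```

With one value per year, the AMS can register at most three exceedances in three years, whereas the POT set retains all six, giving a mean recurrence of half a year.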

We selected the flood events using the POT method implemented in the floodnetRfa R package (https://github.com/floodnetProject16/floodnetRfa) and used a Locally Weighted Scatterplot Smoothing (Loess) curve⁶¹ to extract the bankfull discharge return period (Q_BFobs−RP and Q_BFest−RP) from the relationship between recurrence probability and discharge. In the POT method, the user must choose an appropriate threshold and a separation rule for adjacent flood events in order to select independent events⁶². To allow the return period of bankfull discharge to be estimated, the event identification threshold must be smaller than Q_BF. Rather than identifying this threshold manually, we systematically assessed four thresholds: 10%, 25%, 50%, and 75% of the bankfull discharge value. We used the adjacent-peak separation rule recommended by the US Water Resources Council⁶³. For each threshold, we fitted a Loess curve to the selected flood events and estimated Q_BFobs−RP and Q_BFest−RP. Loess is a non-parametric method that fits a smooth curve to the data and is particularly useful for capturing non-linear relationships. We used a Loess span of 0.5, which provides a moderate degree of smoothing.
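The threshold sensitivity test can be sketched in a simplified form. Note the substitutions: the sketch below uses a fixed minimum gap between peaks (a hypothetical `min_gap_days`) instead of the published separation rule, and a plain counting estimate (years of record divided by the number of independent peaks reaching Q_BF) instead of a Loess fit. It does not reproduce the floodnetRfa API, and the daily series is synthetic.

```python
def independent_peaks(q, threshold, min_gap_days):
    """Select peaks above threshold; a value within min_gap_days of the last
    kept peak either replaces it (if larger) or is dropped."""
    peaks = []  # list of (day_index, discharge)
    for day, value in enumerate(q):
        if value <= threshold:
            continue
        if peaks and day - peaks[-1][0] < min_gap_days:
            if value > peaks[-1][1]:
                peaks[-1] = (day, value)
        else:
            peaks.append((day, value))
    return peaks

def return_period_by_counting(q, q_bf, threshold_fraction, min_gap_days=7):
    """Years of record divided by the number of independent peaks >= q_bf."""
    peaks = independent_peaks(q, threshold_fraction * q_bf, min_gap_days)
    n_years = len(q) / 365.25
    n_exceed = sum(1 for _, value in peaks if value >= q_bf)
    return n_years / n_exceed if n_exceed else float("inf")

# Synthetic 4-year daily series: baseflow of 5 m3/s with a few flood peaks
q = [5.0] * 1461
for day, value in [(50, 120.0), (52, 130.0), (400, 110.0),
                   (800, 90.0), (1200, 150.0), (1210, 105.0)]:
    q[day] = value
q_bf = 100.0

# Repeat the estimate for the four thresholds (fractions of Q_BF)
rps = {f: return_period_by_counting(q, q_bf, f) for f in (0.10, 0.25, 0.50, 0.75)}
print(rps)
```

Days 50 and 52 merge into one event, so four independent peaks reach Q_BF in four years and every threshold yields a one-year return period. In real records the estimates generally differ between thresholds.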

Comparing the four bankfull discharge return periods at each gauging station (one per event threshold) revealed that some sites where Q_BF is rarely exceeded (return periods > 10 years) could also show very large variation in the estimated return period (Supplementary Figs. S7-8). We therefore kept only those sites where the estimated Q_BF return period (Q_BF−RP) was robust to the choice of peak threshold, defined as no more than 50% variation among the four estimates. The variation was calculated as the maximum Q_BF−RP minus the minimum Q_BF−RP, divided by the mean Q_BF−RP. We chose the 50% limit to reduce uncertainty while avoiding the removal of all sites with relatively large return periods. Details of the relationship between the variation in the estimated return period and the number of events exceeding Q_BF can be found in Supplementary Figs. S7-8. We averaged the Q_BF−RP values from the four thresholds to obtain the final return period. Eventually, we obtained Q_BFest−RP for 8,519 sites, of which 1,545 sites also have Q_BFobs−RP (Fig. 2).
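The robustness screen reduces to a few lines: variation is (max − min) / mean of the four return periods, and sites above 50% are discarded. The function name and example values are hypothetical.

```python
def screen_return_periods(rps, max_variation=0.5):
    """Return (final_rp, variation); final_rp is None if the site is rejected."""
    mean_rp = sum(rps) / len(rps)
    variation = (max(rps) - min(rps)) / mean_rp
    final_rp = mean_rp if variation <= max_variation else None
    return final_rp, variation

# Hypothetical return periods (years) from the 10%, 25%, 50% and 75% thresholds
stable_site = [1.8, 2.0, 2.1, 2.1]
unstable_site = [8.0, 12.0, 20.0, 40.0]

stable_rp, stable_var = screen_return_periods(stable_site)
unstable_rp, unstable_var = screen_return_periods(unstable_site)
print(stable_rp, stable_var)
print(unstable_rp, unstable_var)
```

The first site varies by about 15% and keeps its mean of roughly 2 years; the second varies by 160% and is dropped.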

Using the flood events selected above, we also estimated the discharge at the 2-year return period, again using a Loess curve. The values from the four thresholds were averaged to give the final Q_RP2. We calculated the bias of the 2-year return period discharge by comparing Q_RP2 with Q_BFobs at 1,725 sites (Fig. 3).
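One plausible way to express this comparison is a relative bias per site, summarized as mean and standard deviation. The formula below (Q_RP2 minus Q_BFobs, divided by Q_BFobs) and the use of the sample standard deviation are assumptions for illustration, since the text does not spell out the definition; the site values are hypothetical.

```python
from statistics import mean, stdev

def relative_bias_percent(q_rp2, q_bf_obs):
    """Positive values mean the 2-year flow overestimates observed bankfull."""
    return 100.0 * (q_rp2 - q_bf_obs) / q_bf_obs

# Hypothetical sites: (Q_RP2, Q_BFobs) in m3/s
sites = [(150.0, 100.0), (80.0, 100.0), (120.0, 100.0)]

biases = [relative_bias_percent(q_rp2, q_obs) for q_rp2, q_obs in sites]
bias_mean, bias_std = mean(biases), stdev(biases)
print(biases, bias_mean, bias_std)
```

Under this convention a positive mean bias indicates that the 2-year assumption overestimates bankfull discharge, and a negative one that it underestimates it.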

Funding

This work is part of the Evolution of Global Flood Hazard and Risk (EVOFLOOD) project supported by the Natural Environment Research Council supporting YL/MW/LS (NE/S015728/1), LW/JCN (NE/S015639/1), PD/SJM (NE/S015795/2), SED/JL (NE/S015817/1), APN (NE/S015612/1), HC (NE/S015590/1), GSS (NE/S015736/1) and PJA (NE/S015655/1). LS was additionally supported by UKRI (MR/V022008/1). JY acknowledges support from the National Natural Science Foundation of China (52361145864). BA acknowledges support from SNSF Grant 200021_214907.

Data availability

The GQBF data is available at https://zenodo.org/records/13855371. The GRIT network (river vectors with associated drainage area and sinuosity attributes) can be downloaded from https://zenodo.org/records/11219313. FABDEM can be downloaded at https://data.bris.ac.uk/data/dataset/25wfy0f9ukoge2gs7a5mqpq2j7. GloFAS version 4 can be downloaded from https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=form. The GRWL river data can be downloaded from https://zenodo.org/records/1297434. The water occurrence data can be downloaded from https://global-surface-water.appspot.com/download. The temperature and precipitation data can be downloaded from https://www.gloh2o.org/mswx/. LAI and FAPAR can be downloaded from https://www.ncei.noaa.gov/products/climate-data-records/leaf-area-index-and-fapar. Urban coverage data can be downloaded from https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-land-cover?tab=overview. Snow cover fraction data can be downloaded from https://zenodo.org/records/5774954. The soil grid (water table depth, soil moisture) is obtained from https://zenodo.org/records/4997453. The soil thickness can be downloaded from https://daac.ornl.gov/SOILS/guides/Global_Soil_Regolith_Sediment.html#:~:text=6)%20average_soil_and_sedimentary%2Ddeposit_thickness.tif,upland%20hillslopes%20and%20valley%20bottoms.

Code availability

The code employed to estimate the return period of river bankfull discharge and the 2-year return period of discharge from gauging stations using the Peak-Over-Threshold is available at https://github.com/YinxueLiu/G-QBF.

Acknowledgements

The authors thank Gardner Bent at the US Geological Survey, Dave Struebel at the NWS Alaska Pacific River Forecasting Centre, and Tomohiro Tanaka at Kyoto University Graduate School of Engineering for sharing bankfull discharge observation data. We thank the Colombian Institute of Hydrology, Meteorology and Environmental Studies (IDEAM) for sharing cross-section data, and the various scientists from UKCEH, USGS and IDEAM for sharing essential data with us.

Author contributions

Conceptualization: Wortmann, Liu, Slater, Parsons, Darby

Methodology: Wortmann, Liu, Slater

Investigation: Liu, Wortmann

Data curation: Liu, Wortmann, Slater

Formal Analysis: Liu

Writing-original draft: Liu, Slater

Visualization: Liu

Validation: Liu

Writing-review & editing: Liu, Slater, Wortmann, Hawker, Boothroyd, Yin, Santos, Anderson, Nicholas, McLelland, Leyland, Sambrook-Smith, Ashworth, Cloke, Vahidi, Delorme, Darby, Parsons

Resources: Slater, Neal, Darby, Yin, Santos, Anderson, Vahidi, Boothroyd, Hawker, Griffth, Gebrechorkos, Zhang

Supervision: Slater

Project management: Slater, Darby, Parsons

Funding acquisition: Slater, Parsons, Darby, Neal, Sambrook-Smith, Cloke, Nicholas, McLelland, Leyland, Ashworth

Competing interests

The authors declare no competing interests.

References

CRED. Disaster Year In Review 2023. Centre for Research on the Epidemiology of Disasters; Available at: https://files.emdat.be/2024/04/CredCrunch74.pdf (2024).
Yin, J. et al. Large increase in global storm runoff extremes driven by climate and anthropogenic changes. Nat. Commun. 9, 4389 (2018).
Rentschler, J. et al. Global evidence of rapid urban growth in flood zones since 1985. Nature 622, 87–92 (2023).
Andreadis, K. M., Schumann, G. J. & Pavelsky, T. A simple global river bankfull width and depth database. Water Resour. Res. https://doi.org/10.1002/wrcr.20440 (2013).
Leopold, L. B., Bagnold, R. A., Wolman, M. G. & Brush, L. M. Flow resistance in sinuous or irregular channels. US Geological Survey Professional Paper 282-D (1960). Available at: https://pubs.usgs.gov/pp/0282d/report.pdf.
Leopold, L. B. & Maddock, T. The hydraulic geometry of stream channels and some physiographic implications. US Geological Survey Professional Paper 252 (U.S. Government Printing Office, 1953).
Sosa, J., Sampson, C., Smith, A., Neal, J. & Bates, P. A toolbox to quickly prepare flood inundation models for LISFLOOD-FP simulations. Environ. Model. Softw. 123, 104561 (2020).
Bates, P. D. et al. Combined modeling of US fluvial, pluvial, and coastal flood hazard under current and future climates. Water Resour. Res. 57, https://doi.org/10.1029/2020WR028673 (2021).
Chen, F., Chen, L., Zhang, W., Yuan, J. & Zhang, K. Variations in the effective and bankfull discharge for suspended sediment transport due to dam construction. Front. Earth Sci. 16, 446–464 (2022).
Cohen, S., Kettner, A. J. & Syvitski, J. P. M. Global suspended sediment and water discharge dynamics between 1960 and 2010: Continental trends and intra-basin sensitivity. Glob. Planet. Change 115, 44–58 (2014).
Woodworth, K. A. & Pasternack, G. B. Are dynamic fluvial morphological unit assemblages statistically stationary through floods of less than ten times bankfull discharge? Geomorphology 403, 108135 (2022).
Worrall, F., Burt, T. P. & Howden, N. J. K. The fluvial flux of particulate organic matter from the UK: Quantifying in-stream losses and carbon sinks. J. Hydrol. 519, 611–625 (2014).
Hastie, A., Lauerwald, R., Ciais, P. & Regnier, P. Aquatic carbon fluxes dampen the overall variation of net ecosystem productivity in the Amazon basin: An analysis of the interannual variability in the boundless carbon cycle. Glob. Change Biol. 25, 2094–2111 (2019).
Price, A. N. et al. Biogeochemical and community ecology responses to the wetting of non-perennial streams. Nat. Water 2, 815–826 (2024).
Yamazaki, D. The global hydrodynamic model CaMa-Flood (version 3.6.2); Available at https://hydro.iis.u-tokyo.ac.jp/~yamadai/CaMa-Flood_v3.6/Manual_CaMa-Flood_v362.pdf (2014).
Sampson, C. C. et al. A high-resolution global flood hazard model. Water Resour. Res. 51, 7358–7381 (2015).
Neal, J., Hawker, L., Savage, J., Durand, M., Bates, P. & Sampson, C. Estimating river channel bathymetry in large scale flood inundation models. Water Resour. Res. https://doi.org/10.1029/2020WR028301 (2021).
Zhang, H. et al. Estimating the lateral transfer of organic carbon through the European river network using a land surface model. Earth Syst. Dyn. 13, 1119–1144 (2022).
Williams, G. P. Bank-full discharge of rivers. Water Resour. Res. 14, 1141–1154 (1978).
Petit, F. & Pauquet, A. Bankfull Discharge Recurrence Interval in Gravel-bed Rivers. Earth Surf. Process. Landf. 22, 685–693 (1997).
Crowder, D. W. & Knapp, H. V. Effective discharge recurrence intervals of Illinois streams. Geomorphology 64, 167–184 (2005).
Edwards, P. J., Watson, E. A. & Wood, F. Toward a Better Understanding of Recurrence Intervals, Bankfull, and Their Importance. J. Contemp. Water Res. Educ. (2022).
Schneider, C., Flörke, M., Eisner, S. & Voss, F. Large scale modelling of bankfull flow: An example for Europe. J. Hydrol. 408, 235–245 (2011).
Bieger, K., Rathjens, H., Allen, P. M. & Arnold, J. G. Development and Evaluation of Bankfull Hydraulic Geometry Relationships for the Physiographic Regions of the United States. JAWRA J. Am. Water Resour. Assoc. 51, 842–858 (2015).
Hawker, L. et al. A 30 m global map of elevation with forests and buildings removed. Environ. Res. Lett. 17, 024016 (2022).
Wortmann, M., Slater, L. J., Hawker, L. P. et al. Global River Topology (GRIT): A bifurcating river hydrography. ESS Open Archive https://doi.org/10.22541/essoar.172108645.52746193/v1 (2024).
Allen, G. H. & Pavelsky, T. M. Global extent of rivers and streams. Science 361, 585–588 (2018).
Wang, J. et al. GeoDAR: georeferenced global dams and reservoirs dataset for bridging attributes and geolocations. Earth Syst. Sci. Data 14, 1869–1899 (2022).
Smith, A., Sampson, C. & Bates, P. Regional flood frequency analysis at the global scale. Water Resour. Res. 51, 539–553 (2015).
Beck, H. E. et al. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci. Data 5, 180214 (2018).
McMillan, H., Krueger, T. & Freer, J. Benchmarking observational uncertainties for hydrology: rainfall, river discharge and water quality. Hydrol. Process. 26, 4078–4111 (2012).
Slater, L. Hydrologic versus geomorphic drivers of trends in flood hazard. Geophys. Res. Lett. 42, 507–514 https://doi.org/10.1002/2014GL062482 (2015).
Slater, L. J., Khouakhi, A. & Wilby, R. L. River channel conveyance capacity adjusts to modes of climate variability. Sci. Rep. 9, 12619 (2019).
Krabbenhoft, C. A. et al. Assessing placement bias of the global river gauge network. Nat. Sustain. 5, 586–592 (2022).
Durand, M. A. et al. A framework for estimating global river discharge from the Surface Water and Ocean Topography satellite mission. Water Resour. Res. 59, e2021WR031614 https://doi.org/10.1029/2021WR031614 (2023).
Wortmann, M., Slater, L., Hawker, L., Liu, Y. & Neal, J. Global River Topology (GRIT) (0.6) [Data set]. Zenodo https://doi.org/10.5281/zenodo.11219313 (2024).
Lehner, B., Verdin, K. L. & Jarvis, A. New global hydrography derived from spaceborne elevation data. Eos Trans. Am. Geophys. Union 89, 93–94 (2008).
Yamazaki, D. et al. MERIT Hydro: A High-Resolution Global Hydrography Map Based on Latest Topography Dataset. Water Resour. Res. 55, 5053–5073 (2019).
Lane, S. N., Tayefi, V., Reid, S. C., Yu, D. & Hardy, R. J. Interactions between sediment delivery, channel change, climate change and flood risk in a temperate upland environment. Earth Surf. Process. Landf. 32, 429–446 (2007).
Zahar, Y., Ghorbel, A. & Albergel, J. Impacts of large dams on downstream flow conditions of rivers: Aggradation and reduction of the Medjerda channel capacity downstream of the Sidi Salem dam (Tunisia). J. Hydrol. 351, 318–330 (2008).
Wharton, G., Arnell, N. W., Gregory, K. J. & Gurnell, A. M. River discharge estimated from channel dimensions. J. Hydrol. 106, 365–376 (1989).
Zsoter, E. River discharge historical data from the Global Flood Awareness System. ECMWF https://doi.org/10.24381/CDS.A4FDD6B9 (2019).
Pekel, J.-F., Cottam, A., Gorelick, N. & Belward, A. S. High-resolution mapping of global surface water and its long-term changes. Nature 540, 418–422 (2016).
James, G., Witten, D., Hastie, T. & Tibshirani, R. An introduction to statistical learning. Available at: https://www.statlearning.com.
Dente, E., Lensky, N. G., Morin, E. & Enzel, Y. From straight to deeply incised meandering channels: Slope impact on sinuosity of confined streams. Earth Surf. Process. Landf. 46, 1041–1054 (2021).
CIESIN. Dams, v1.01: Global Reservoir and Dam (GRanD), v1. SEDAC. Available at: https://sedac.ciesin.columbia.edu/data/set/grand-v1-dams-rev01.
Beck, H. E. et al. MSWX: Global 3-hourly 0.1° bias-corrected meteorological data including near-real-time updates and forecast ensembles. Bull. Am. Meteorol. Soc. 103, https://doi.org/10.1175/BAMS-D-21-0145.1 (2022).
Pelletier, J. D. et al. A gridded global data set of soil, intact regolith, and sedimentary deposit thicknesses for regional and global land surface modeling. J. Adv. Model. Earth Syst. 8, 41–65 (2016).
Zomer, R. J., Xu, J. & Trabucco, A. Version 3 of the Global Aridity Index and Potential Evapotranspiration Database. Sci. Data 9, 409 (2022).
Hengl, T. et al. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions. PLOS ONE 10, e0125814 (2015).
Box, W., Järvelä, J. & Västilä, K. Flow resistance of floodplain vegetation mixtures for modelling river flows. J. Hydrol. 601, 126593 (2021).
Sgarabotto, A., D’Alpaos, A. & Lanzoni, S. Effects of Vegetation, Sediment Supply and Sea Level Rise on the Morphodynamic Evolution of Tidal Channels. Water Resour. Res. 57, e2020WR028577 (2021).
NOAA National Centres for Environmental Information (NCEI). NOAA Climate Data Record (CDR) of AVHRR Leaf Area Index (LAI) and Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), Version 5 (2023). Available at: https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc
Copernicus Climate Change Service (C3S). Land cover classification gridded maps from 1992 to present derived from satellite observations. ECMWF https://doi.org/10.24381/CDS.006F2C9A (2019).
Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001).
Cutler, A., Cutler, D. R. & Stevens, J. R. Random Forests. in Ensemble Machine Learning: Methods and Applications (eds. Zhang, C. & Ma, Y.) 157–175 (Springer, New York, NY, 2012). https://doi.org/10.1007/978-1-4419-9326-7_5.
Pedregosa, F. et al. scikit-learn: Machine learning in Python. Available at: https://scikit-learn.org/stable/.
Yin, J. et al. Global Increases in Lethal Compound Heat Stress: Hydrological Drought Hazards Under Climate Change. Geophys. Res. Lett. 49, e2022GL100880 (2022).
Bobée, B. & Rasmussen, P. F. Recent advances in flood frequency analysis. Rev. Geophys. 33, 1111–1116 (1995).
Bezak, N., Brilly, M. & Šraj, M. Comparison between the peaks-over-threshold method and the annual maximum method for flood frequency analysis. Hydrol. Sci. J. 59, 959–977 (2014).
R Core Team. loess function - RDocumentation. Available at: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/loess.
CSHS Hydrology Group. vignette_CSHShydRology_pot.pdf. Google Docs. Available at: https://drive.google.com/file/d/1pkOSuJauiVaXAiHh_CFC1mP2GjR_VqFv/view?usp=sharing&usp=embed_facebook.
Lang, M., Ouarda, T. B. M. J. & Bobée, B. Towards operational guidelines for over-threshold modeling. J. Hydrol. 225, 103–117 (1999).
