This study features a significantly simplified modelling framework than previous studies of this scale. Specifically, the unstructured nature of Delft3D Flexible Mesh (utilized in GTSM-ERA5) eliminates the need for various levels of model nesting to downscale global level oceanographic data to the local scale. Furthermore, the increased resolution and quality of the ERA5 reanalysis product reduces the need for dynamic downscaling of atmospheric data. Therefore, the full methods of this study reduce to a series of statistical corrections to the GTSM data and a wave setup model. These components are detailed in the following sections.
3.1 Hydrodynamics
Modelling of nearshore water levels is performed using the Global Tide and Surge Model version 3.0 (GTSMv3.0) (Muis et al., 2020; Muis et al., 2022). GTSMv3.0 is a global scale implementation of Delft3D Flexible Mesh (Kernkamp et al., 2011) developed by Deltares. The unstructured nature of the GTSM grid allows for variable mesh resolution ranging from approximately 2.5 km along the U.S. coastlines to 25 km in the deep ocean. While resolution is improved from previous global models, nearshore resolution is still often insufficient to resolve small-scale coastal features. The overall mesh is over 5 million grid cells with bathymetry set using the General Bathymetric Chart of the Ocean (GEBCO, 2014) bathymetry dataset, which provides global 30 arc-second bathymetry. GTSM is a global model so has no open ocean boundaries, only atmospheric. Therefore, tides are generated internally through tidal generating forces for over 60 harmonics (Apecechea et al., 2017). Atmospheric forcing is provided through wind and sea level pressure fields using the ERA5 climate reanalysis (Hersbach et al., 2018; Hersbach et al., 2020). The 38-year hindcast dataset (1979–2017) produced using GTSM forced with ERA5 is called GTSM-ERA5 (which is used for the rest of this paper). GTSM-ERA5 was configured with output stations at approximately 20 km resolution along the coastline (see Fig. 1). These stations were supplemented with outputs co-located at tide gauge locations as well as increased station resolution in bathymetrically complex regions. Co-located output/tide gauge locations allow for multi-decadal scale model-observation comparisons at various locations within the study region.
GTSM-ERA5 and NOAA tide gauge water levels were decomposed into a subset of contributing signals: tides, seasonality, non-tidal residual (NTR), and monthly mean sea level (MMSL). Water levels were separated into 4 components to gain insight into processes contributing to high water levels, as well as to better develop methods for process specific corrections to each signal. The decomposition of water levels was accomplished by first separating water levels into tides* and NTRs*, labeled here with a star as these signals are intermediate and further separated. For GTSM-ERA5, two different model runs were performed, one with full forcing and one with tidal only forcing. Modelled NTRs* were calculated by subtracting modelled tides from simulations with full forcing. For tide gauge stations, observed NTRs* were calculated by subtracting NOAA-predicted tide signals from observed water levels. After separating tidal* and NTR* water levels, each of these signals were split into two additional signals. NTR* was separated into a MMSL component and a NTR component (as defined as NTR* with MMSL removed). Tidal* water levels were separated into a seasonal signal and a tidal signal (as defined as tidal* with seasonality removed). The definition and calculation of the MMSL and seasonal signal is described below in section 3.1.3 and 3.1.4.
Here we define the still water level (SWL) as the linear superposition of four components (Eq. 1a): astronomic tides, astronomic seasonal cycles, non-tidal residuals, and monthly mean sea levels. Note that the SWL varies in time but includes the term ‘still’ to distinguish the elevations from the addition of wave setup or runup. The separation of SWL signals can be visually seen for both the observed and raw GTSM-ERA5 data in Fig. 2. The addition of wave setup (see section 3.2) to the SWL is termed “mean water levels” or MWLs (Eq. 1b) (FEMA, 2015).
\({S}{t}{i}{l}{l} {W}{a}{t}{e}{r} {L}{e}{v}{e}{l} \left({S}{W}{L}\right) ={T}{i}{d}{e}{s}+{S}{e}{a}{s}{o}{n}{a}{l}{i}{t}{y}+{N}{T}{R}+{M}{M}{S}{L}\) (Eq. 1a)
\({M}{e}{a}{n} {W}{a}{t}{e}{r} {L}{e}{v}{e}{l} \left({M}{W}{L}\right) ={S}{W}{L}+{w}{a}{v}{e} {s}{e}{t}{u}{p}\) (Eq. 2b)
3.1.1 SLR
Observed SWLs across the study region were found to have a statistically significant trend (ranging from 2–4 mm/year). At the scale of the considered 38-year period, this trend was considered contributable to sea level rise (SLR) rather than short term variability. Global SLR was included in GTSM-ERA5 (both in the full forcing and tide-only runs) to capture non-linear effects of SLR on hydrodynamics (Idier et al., 2019; Wahl, 2017). Modelled local trends in water levels were found to be consistent with observed trends at the tide gauges and satellite altimetry (Sweet et al., 2022). This study takes the approach of removing the trend in SWL, both from the modelled and observed timeseries. A linear regression was fit to the entire modelled and observed timeseries at each output location and then removed from the water level timeseries. SLR was removed for a variety of reasons, the first being that it allows for a stationary analysis where extremes (as well as other SWL statistics) are not a function of time. Secondly, the approach allows for validating the effectiveness of the modelling framework separate from SLR. In the future SLR becomes a source of significant uncertainty (Sweet et al., 2022), so it is valuable to assess the skill of the modelling framework separate from the SLR component.
3.1.2 Tides
The tidal component of SWL is the largest error fraction, with respect to Root-Mean-Square-Error (RMSE), in GTSM-ERA5 data at most tide gauge comparison points. Therefore, an alternative tidal correction model was developed. The periodic nature of tides represents a unique problem in that corrections in the time domain are unlikely to be effective. As an example, a small error in tidal harmonic phase (shifting the tidal timeseries) will create a cyclic over and under bias that is difficult to correct with time domain methods targeted at systematic bias. Corrections to tidal harmonics in the frequency domain are therefore more effective. This has the additional benefit of reducing the complexity of the correction since a tidal harmonic can be fully represented by only two parameters (phase and amplitude).
A variety of openly available and well validated global tide models were selected to improve modelled tidal water levels. A subset of best performing global tide models (Stammer et al., 2104; Seifi et al., 2019) were downloaded and compared to NOAA predicted tides at all tide gauges within the study region (see Table 1). Older versions of both the TPXO and FES model series were included as some studies have found that these older versions may perform better than the newest version in certain regions (Lyard et al., 2021). For our study site, the FES2014 (Lyard et al., 2006) and TPXO v9.2 (Egbert et al., 2010) were found to be the best performing models, with approximately equal skill on average across the region. This said, model performance was found to depend on location.
Table 1
Comparison of model RMSE (in centimeters) for all tide gauges (rows) and all tidal model products (columns) tested as part of this study. The final column is the model performance of the Ensemble method. Bold values represent the best performing model at a specific tide gauge station.
|
Tidal Model Performance vs. Tide Gauges (RMSE, cm)
|
Station ID
|
GTSM
|
TPXO8
|
TPXO9.2
|
FES2012
|
FES2014
|
Ensemble
|
8632200
|
10.8
|
20.2
|
3.3
|
11.9
|
9.2
|
4
|
8638901
|
11.3
|
20.4
|
4.8
|
5.8
|
3
|
2.9
|
8638610
|
10.4
|
12.3
|
14
|
11.1
|
5
|
2.9
|
8651370
|
17
|
33
|
24.6
|
29.8
|
9.1
|
17.4
|
8652587
|
14
|
4.5
|
6.7
|
14.4
|
26.7
|
10.5
|
8654467
|
20.2
|
4.7
|
10.7
|
21.5
|
18.4
|
15.9
|
8656483
|
14.3
|
31.4
|
11.6
|
10
|
12.1
|
6.7
|
8658163
|
20.8
|
32
|
3.9
|
4.5
|
3.7
|
5.2
|
8661070
|
26.8
|
50.4
|
6
|
5.2
|
4.6
|
8.2
|
8662245
|
27.3
|
46.2
|
22.4
|
30.3
|
27
|
23.6
|
8665530
|
29.6
|
42
|
7.5
|
16
|
12
|
8.5
|
8670870
|
37
|
52.8
|
12.1
|
23.5
|
12.4
|
14.4
|
8720030
|
35.5
|
33.9
|
17.8
|
29.5
|
21.2
|
17.7
|
8720218
|
20.4
|
29.7
|
14.4
|
16.3
|
14.3
|
9.9
|
8721604
|
19
|
34.4
|
2.6
|
5.8
|
3.1
|
5.7
|
8722670
|
14.4
|
23.9
|
2.1
|
5.8
|
2.2
|
4.2
|
8722956
|
12.3
|
19.1
|
2.1
|
7.4
|
2.3
|
3.6
|
8723214
|
7.1
|
8.5
|
17.4
|
4.7
|
7.8
|
2.6
|
Average
|
19.3
|
27.7
|
10.2
|
14.1
|
10.8
|
9.1
|
A blended product was created to take advantage of the best parts of each model in the ensemble. Conceptually this is like Global Climate Model ensembles where the mean of multiple models can be considered more robust than any individual model. The developed strategy predicts tides using a weighted mean of all tidal harmonics (both phase and amplitude). The weight for each model was determined by the average performance of that model across all tide gauges. More specifically, each tidal constituent amplitude / phase was assigned its own weighting of the contributing models as determined by the performance of that model. Eq. 2 details the utilized weighting scheme:
\({{W}}_{{i},{j}}=\left(\frac{1}{{M}{{A}{E}}_{{i},{j}}}\right)/ \sum _{{j}=1}^{{n}}{{M}{A}{E}}_{{i},{j}}\) (Eq. 3 )
where:
W
Individual model weighting for the specific harmonic phase or amplitude
MAE
Mean Absolute Error across all tide gauges for the specific harmonic phase/amplitude
i
index for each amplitude and phase of tidal harmonics
j
index for models contributing to the tidal ensemble
n
number of models available for the specific amplitude/phase of the tidal harmonic
The numerator of Eq. 2 penalizes the model’s weight in the weighted average for a larger MAE. The denominator normalizes the weighting so that the sum of weights is 1.
Tides were computed by extracting each tide model’s phase and amplitude for all available tidal harmonics at each GTSM-ERA5 output location. As GTSM-ERA5 data are only available as timeseries, the phase and amplitudes were determined by performing a harmonic analysis using Utide software (Codiga, 2011,2021) at each output location. While generally found to perform less well than using results from the FES and TPX models, GTSM-ERA5 was included within the ensemble to integrate nonlinear interactions between tide and surge. A new set of phase and amplitudes were generated by combining the ensemble phase and amplitudes through the weighting scheme detailed above. Tides were then recomputed using this new set of phase and amplitudes using the t_tide software package (Pawlowicz et al., 2002).
3.1.3 Astronomic Seasonality
Seasonality was defined for this project as the astronomically derived SA (Solar Annual; period of 365 days) and SSA (Solar Semi-Annual; period of 183 days) harmonics within the tidal component. This differs from other study definitions of seasonality, for example by NOAA (NOAA, 2009), who analyses the yearly average of MMSL. NOAA’s definition would therefore include seasonal variations in NTR in addition to tides. Seasonality was included as an individual component in the modelling framework for a variety of reasons. The first was that seasonality was determined to be a major source of error in GTSM-ERA5 data. Secondly, seasonality was found to be insufficiently reproduced using the tidal model ensemble method (see section 3.1.2). This was primarily because most of the selected tidal models do not include seasonal harmonics. Only FES2014 and GTSM-ERA5 contained seasonality as a tidal component, and both were found to greatly underpredict variability in the study region, necessitating the need for corrections to this parameter.
A simple linear regression-based shift of the seasonal harmonic phase and amplitude was selected as the statistical correction methodology. The regression was developed using a single explanatory variable, station latitude, which was implemented to account for along coast variations in the seasonal error (see Eq. 3).
\({C}{o}{r}{r}={{B}}_{1}{L}{a}{t}+{{B}}_{2}\)(Eq. 4)
where: Lat: latitude in degrees of the tide gauge location.
B
least squares fitted regression coefficients.
Corr
Correction applied to the phase/amplitude of the Seasonal Harmonic.
Each harmonic amplitude and phase error (the response variable in the regression) was fitted with an individual linear regression based on the error at tide gauges within the region. This can be seen visually in Fig. 3. Notably, two gauges were omitted from the linear regression fitting (station 8652587 and 8654467; Oregon Inlet, NC, and Hatteras, NC) due to poor comparison between GTSM-ERA5 data and observations. A sensitivity analysis showed that inclusion of these stations did not significantly affect regression coefficient estimates. Rather, these stations were omitted more for consistency with the handling of these outlier stations in other parts of the modelling framework.
3.1.4 Monthly Mean Sea Level
Monthly Mean Sea Level (MMSL) was derived in this study using a low-pass filter on SWLs after the removal of tides and seasonality. A filtering approach was chosen instead of the more conventional “monthly means” method as a more complete method of removing low frequency energy from the signal. The monthly means definition is somewhat complicated when removed from the overall signal, requiring interpolation between months to a finer temporal resolution to remove offsets. Furthermore, the resulting effects in the frequency domain are complicated with removal and retention of periodic signals being based on the relationship of the mean and signal’s periods (Wang et al., 2009). This study instead uses a 4th order butterworth filter with cut-off frequency of 0.3858 µHz (1/30 days) applied using a backward forward methodology to remove phase shift. The resulting MMSL timeseries was found to be a cleaner representation of low frequency energy than the classic monthly means definition.
It was decided to separate MMSL from the NTR signal for additional insight into GTSM-ERA5’s performance. MMSL is often controlled by different processes than local NTR. The U.S. East Atlantic has seen a recent surge of research on MMSL after it was noticed that tide gauge records in the region showed a significant acceleration of SLR, approximately 3–4 times greater than the global average (Sallenger et al., 2012). The exact cause of this increase in MMSL remains debated but has been tied to a variety of processes, including regional atmospheric patterns (Andres et al., 2013; Domingues et al., 2018; Zhao and Johns, 2014), Gulf Stream dynamics (Andres et al., 2013; Domingues et al., 2018; Ezer et al., 2013; Kopp, 2013; Yin and Goddard, 2013), ice sheet decline (Davis and Vinogradova, 2017), and climate patterns such as the North Atlantic Oscillation (NAO) (Ezer and Atkinson, 2014; Kenigson et la., 2018; Kopp, 2013). GTSM-ERA5 is not capable of capturing all of these potential contributors to MMSL, so the signal was separated from higher frequency NTR for further investigation and potentially for correction.
Despite limits to the physics included in GTSM-ERA5, further investigation found that MMSL variability was reasonably well represented by GTSM-ERA5. GTSM-ERA5’s MMSL signal was found to have a small bias (~ 3 cm) in the mean MMSL value in comparison to observations. Therefore, MMSL was post-processed by adding 3 cm. With GTSM-ERA5 skill found to be sufficient, no additional statistical corrections were applied to the MMSL component.
3.1.5 Non-Tidal Residual
Non-tidal residual (NTR) in this study was defined as conventional NTR* (SWL minus tides), but additionally with MMSL removed (see section 3.1.4). Therefore, NTR is herein defined as all NTR energy with a frequency higher than 0.3858 µHz. GTSM-ERA5 was found to have significant skill at resolving NTR across the study region with only a few problematic tide gauges. Tide gauges 8652587 (Oregon Inlet, NC) and 8654467 (Hatteras, NC) were flagged as outliers in comparisons to GTSM-ERA5. Further investigation showed that these gauges are located behind thin barrier islands with small inlet channels. While GTSM-ERA5 leverages unstructured meshing to have very high coastal resolution for a global model, the resolution is still on the order of 2.5 km. This resolution remains insufficient to resolve small scale coastal features (such as thin barrier islands or small channels). While sub-resolution features make model-observation comparisons at the flagged tide gauges problematic, it is less of a problem for the overall study and produced hindcast dataset. In the resulting analysis, GTSM-ERA5 data are only examined in the nearshore open-coast region. The use of only open-coast GTSM-ERA5 data restricts the study’s conclusions to the open coast region and means that GTSM-ERA5 does not need to be, nor is expected to be, skillful within back bays.
Overall, GTSM-ERA5 NTR data were found to be reasonably skillful in comparison to tide gauges (average RMSE of 6 cm). A statistical correction was performed due to the importance of NTR during flooding events. Additionally, extreme SWLs were found to be significantly underpredicted, a result in agreement with results by Muis et al. (2017). Underprediction of extremes is particularly undesirable due to the importance of these events from a coastal hazard perspective.
Bias correction was performed using a modified version of quantile matching. While quantile matching (Déqué, 2007) has seen successful application in coastal modelling settings (Parker et al., 2017), this study has a much larger region of interest where exact matching of model cumulative distribution functions (CDFs) to observations becomes problematic. Specifically, applying corrections derived at tide gauges to locations far from the tide gauge may introduce errors. While this approach is possible, most likely through interpolating corrections between tide gauges, it is likely that an exact CDF matching constraint is too strong and may result in overfitting. For example, there is no reason to assume that the GTSM-ERA5 bias measured at a tide gauge located deep within a hydraulically complex river mouth is the same as the bias at a nearby ocean coast location. This study compares each tide gauge to GTSM-ERA5 data as a single sample of the overall error in the model. We therefore correct the GTSM-ERA5 data at each quantile by the average error across all tide gauges. The same correction (based on the average error) is applied across the entire region, as shown in Fig. 4.
This figure plots the difference (observed NTR – modelled NTR) between CDFs at each quantile. Therefore, the value on the x-axis (correction) is the amount that must be added to the model NTR to exactly match the observed CDF. The dotted line is the average of all tide gauges and therefore the correction applied to the GTSM-ERA5 NTR data across the entire region. Figure 4 shows that GTSM-ERA5, in general, overpredicts low NTR and underpredicts high NTR (a bias towards too little variability). The median bias (quantile 0.5) is low (model underprediction) and this low bias increases moving towards larger NTR with the largest underpredictions found for the upper extreme NTR. Across tide gauges, the general trend, although with significant variability, is increasing bias in NTR moving from the South (Darker colors) to the North (Lighter colors).
Notably, three outlier gauges were removed from this analysis and not included in the average correction calculation. Tide gauges 8652587 (Oregon Inlet, NC), 8654467 (Hatteras, NC), and 863901 (Chesapeake Channel, VA) correction CDFs were found to deviate significantly from the general pattern shown in Fig. 4. Gauges 8652587 (Oregon Inlet, NC) and 8654467 (Hatteras, NC) were highlighted earlier in section as likely effected by a mesh resolution / bathymetry related issue. Gauge 8638901’s (Chesapeake Channel, VA) outlier behaviour is probably due to an insufficient overlap between the GTSM-ERA5 and tide gauge record (less than a year). Additionally, the gauge is located at the center of the Chesapeake Bay inlet and likely experiences complex hydrodynamics not resolved by GTSM-ERA5. Based on these arguments, it was judged valid to remove these outlier stations from the NTR statistical correction procedure.
3.2 Wave Setup
This study additionally considers waves as an important driver of coastal MWLs in the study region (Dietrich et al., 2011a, 2011b). The primary pathway through which waves influence flooding is wave setup, via an increase in mean sea level near the shoreline caused by the transfer of momentum during wave breaking. While either wave run-up or setup could have been considered in this study, wave setup was selected because sustained flooding occurs at timescales of hours to days. Wave run-up occurs at scales of seconds to minutes and so wave setup is more influential and appropriate for consideration in a flood hazard case. Waves can additionally affect MWLs through coupling with surge, tides, sea level, and other processes, but these effects are generally smaller than wave setup along the open coast (Idier et al., 2019). There are two approaches for calculating wave setup: physical modelling or empirical formulations. At the scale of this study, 1000s of kilometers, a physical model capable of accurately resolving wave setup would be computationally prohibitive. Instead, this study utilizes the empirical Stockdon formulation (Stockdon et al., 2007), following other large spatial scale approaches (e.g., Melet et al., 2018; Kirezci et al., 2020; Wahl et al., 2016; Serafin et al., 2017).
The Stockdon formulation relates setup at the shoreline to the deep-water wave height, period, and beach foreshore slope. Foreshore beach slope data were obtained from the USGS National Assessment of Coastal Change Hazards (Doran et al., 2020; Kratzmann et al., 2017). The dataset consists of foreshore slopes along pre-defined transects spaced approximately 50 m in the alongshore direction (Farris et al., 2018). Transects were resampled from 50-m spacing to 1-km spacing alongshore. At each transect, a temporally average foreshore beach slope was calculated from all available shoreline data and spatially smoothed using a Hanning filter to minimize noise (Doran et al., 2015). Wave characteristics were determined by extracting data from the full ERA5 (Hersbach et al., 2018; Hersbach et al., 2020) wave output grids along the 10 m depth contour. ERA5 wave data were then matched to the nearest transect offshore endpoint, while also preventing duplication with multiple wave data points matched to a single transect location. The 10 m depth ERA5 wave data were then reverse-shoaled to respective deep-water conditions using linear wave theory and assuming a shore-normal approach. This approach, rather than directly extracting the wave parameters from deep-water, was taken for two reasons. The first was to account for local changes to wave conditions between deep-water and the nearshore. The second was for consistency with the methodology used in developing the wave setup parameterization, e.g., Stockdon et al. (2007). The spatial resolution of the transect based wave setup data are significantly finer than the GTSM-ERA5 data. Therefore, a characteristic setup was calculated for each GTSM-ERA5 data location by averaging wave setup across all transects within 5 km of the location of interest. It is also worth noting that the utilized wave setup parameterizations predict wave setup at the beach while GTSM-ERA5 output stations are in the nearshore. There is therefore a cross-shore offset between these two MWL components. This was explored further by examining GTSM-ERA5 output stations located at multiple locations in the cross-shore to see how much predicted SWL varied. No significant difference was found, so the cross-shore offset was deemed to be similarly nonrelevant to results. This conclusion is likely since the processes responsible for changes to MWL in the nearshore (e.g., wave setup/setdown, three-dimensional circulation, etc.) are not resolved in the GTSM-ERA5 model.