Bias reduction of high return levels for extreme hazard modelling

doi:10.21203/rs.3.rs-2703361/v1


Research Article



This work is licensed under a CC BY 4.0 License

Journal Publication

published 20 Nov, 2023

Published version: Natural Hazards


Many existing extremal data span only a few decades, often resulting in large bias and uncertainty in the estimated shape parameter of the extreme hazard model. This in turn leads to unreliable predicted extreme values at high average recurrence intervals (ARI’s). This paper presents a statistical method for obtaining a hazard model that produces return levels at high ARI’s with reduced bias. The method makes use of the maximum values of extremal data recorded independently at a number of observational sites. The logarithmically transformed probability of the maximum recorded value at a site is shown to follow the Gumbel (Type I extreme-value) distribution; therefore, $m$ sites provide a sample of $m$ transformed probabilities of extreme values, one from each site. The sample can be treated as being drawn from a Gumbel distribution, irrespective of the underlying hazard-generating mechanisms or the statistical hazard models. The method is demonstrated by an analysis of extreme wind gust data collected from automatic weather stations in South Australia. The results are compared with the specifications in the Australian standard AS/NZS 1170.2:2021 and indicate that the standard may have overestimated the wind gust hazard; hence, the specified design wind speeds may fall on the conservative side for South Australia.

extremes

extremal datasets

parameter estimation

tail index

wind hazard

Estimates of extremal return levels at high average recurrence intervals (ARI’s) are strongly dependent on the shape parameter of the statistical model. Especially with only a few tens of years of observed data, high bias and uncertainty invariably exist in parameters estimated from the data of an individual observational station. For instance, with a short record of extreme wind gust data, say around 20 years, an unreliably estimated shape parameter could in some cases lead to errors of up to a few hundred percent in predicted 1000-year wind speeds (Simiu and Filliben 1975). A common approach to ameliorate this shortcoming is the ‘super-station’ (or station-year) approach (Buishand 1991; Peterka 1992; Wang, Wang, and Khoo 2013), which takes advantage of having multiple weather stations with valid records in a climatologically uniform region. This approach commingles all the data of the valid records from independent events into a single record whose length is the sum of the lengths of all the original records. A ‘super-station’ extends the record length and should reduce the uncertainty at high ARI’s, and it has been demonstrated to be a sensible approach for regions in which all the data can be commingled; however, the problem of potentially high bias remains for return levels beyond the accumulated record length of the super-station.

With regard to natural hazards, if multiple regions of different climatic conditions are considered, the hazard in each region is analysed based on the data collected in that region, so each region has its own hazard model parameters; in such cases, the estimation process would most likely give different shape parameter values for different regions. In practice, for convenience of engineering application or by consensus of expert judgment, it may be decided to use a single shape parameter value across all the regions. For instance, the Australian design standard for wind actions (Australian / New Zealand Standard 2021) uses a shape parameter of 0.1 for all four specified wind regions, and, to avoid underestimating extreme wind speeds, the ASCE 7 Standards Committee on Loads has used 0 (effectively the Type I extreme-value distribution) for extreme non-hurricane wind speeds (Lombardo, Main, and Simiu 2009; Simiu and Yeo 2019). However, these values were chosen based either on consensus of judgment or on heuristic averaging of the set of shape parameters derived from analysing wind gust data (Buishand 1991; Holmes 2002). Either way, objective criteria or a theoretical basis for deriving the best shape parameter value are lacking. A wide range of goodness-of-fit tests, such as the Anderson-Darling statistic, the Kolmogorov-Smirnov test, and graph-based tests, exist (Palutikof et al. 1999), but they are applicable only for testing the within-data fit of a dataset from one individual station (or one ‘super-station’). In addition, the insufficient record length of observational data poses a serious challenge to deriving a shape parameter value with high confidence for extrapolation to hundreds of years beyond the record length.

Van Den Brink and Können (2011) introduced a concept based on return period for modelling the probabilities of occurrence of the maximum values from each record of an ensemble of independently collected records, and found that the logarithmically transformed return period of a maximum value is approximately a standard Gumbel variate. They demonstrated that this concept can be employed to check the appropriateness of the probability distribution and its parameter values used for modelling the observed phenomena (Van Den Brink and Können 2008, 2011). Instead of return period, this paper derives the concept by means of the relationship between exceedance probability and ARI, and shows that the log-transformed ARI follows exactly the standard Gumbel distribution. The use of ARI assumes that extreme events occur as a continuous stochastic process, in contrast to return period, which assumes that events occur as a discrete process (Wang and Holmes 2020). In addition, using ARI together with exceedance probability allows a more flexible choice between the two most commonly employed extreme-value distributions: the generalised extreme-value distribution (GEV) and the generalised Pareto distribution (GPD).

Extreme natural hazard data are typically extracted by the block maxima (BM) method (usually annual extremes) or the peaks-over-threshold (POT) method, the two most widely used methods for processing extremal data. For large-scale synoptic wind gusts, for example, extracting annual extremes is straightforward with high-quality data, whereas for less frequent, non-synoptic wind events such as thunderstorms, tornadoes, downbursts, and tropical cyclones, which may not occur every year, the gusts from independent events exceeding a sufficiently high threshold may be taken for analysis. In this context, the GEV conceptually corresponds to the BM method, whereas the GPD corresponds to the POT method. Despite this conceptual distinction in data extraction, the GEV and GPD possess a duality that allows the parameters of one distribution to be converted to those of the other (Wang and Holmes 2020). As a result, either distribution can be employed for analysis, whether the dataset is processed by block maxima or peaks-over-threshold, as long as the ARI’s are estimated through the rates of exceedance rather than the probabilities of exceedance; the choice between the distributions can therefore be left to the preference of the analyst.

In the following, we first derive the theory showing that the log-transformed ARI’s of the maximum values from an ensemble of records of different hazard-generating mechanisms constitute a sample drawn from the standard Gumbel distribution, and then apply the theory to determine the best shape parameter using wind gust records in South Australia. Since the GEV requires one fewer model parameter than the GPD (i.e. the rate of threshold exceedance), it is employed for the gust hazard analysis. This study provides a theoretical basis for deriving the best shape parameter value for one or multiple regions of different climatological conditions, so that the derived hazard model produces extrapolated wind gust speeds at high ARI’s with reduced bias. The method promises to be a useful tool in cases where one shape parameter value is applied across regions of various hazards, as is the case in Australian / New Zealand Standard (2021).

The method described herein is applicable to an ensemble of extremal data records that may be produced by different mechanisms in different geographical regions, as it involves only the maximum value of each record. In addition, it applies whether the data are processed by the BM or the POT method. For the derivation, we concentrate on the POT method. If the occurrence of extremes exceeding a high threshold follows a Poisson process, the relationship between return period $R$ and average recurrence interval $A$ can be shown to be (Wang and Holmes 2020)

$$\frac{1}{R}=1-e^{-1/A} \tag{1}$$

in which $R$ is conventionally defined as the inverse of the probability of exceedance. If the distribution of the annual extreme $Y$ is $F_Y(y)=P\{Y\le y\}$, then the probability that $Y$ does not exceed $y_a$, the value of $y$ at an ARI of $a$ years, is

$$F_Y(y_a)=P\{Y\le y_a\}=e^{-1/a} \tag{2}$$

Let $Y_n$ be the maximum of $Y$ over an $n$-year time interval; then, for independent annual extremes,

$$P\{Y_n\le y_a\}=\left[P\{Y\le y_a\}\right]^{n}=e^{-n/a} \tag{3}$$

If $Y$ is continuous, each $y$ corresponds to a unique $a$, so a bijective function $g:Y\to A$ exists that maps $Y$ to $A$; therefore, $A$ is also a random variable. The bijection $g(Y)$ first maps $Y$ through the function $F_Y$ to the uniform random variable $U$ on [0, 1], and then maps $U$ to $A$ by

$$A=-\left(\ln U\right)^{-1} \tag{4}$$

If ${F}_{A}\left(a\right)$ is the distribution function of $A$, we have

$$F_A(a)=P\{A\le a\}=e^{-1/a} \tag{5}$$

which is an inverted exponential distribution (Lin, Duran, and Lewis 1989), the same as the right-hand side of Eq. 2. This is not surprising as both $Y$ and $A$ are mapped bijectively to $U$ through ${F}_{Y}\left(y\right)$ and ${F}_{A}\left(a\right)$, respectively.

Let ${A}_{n}$ be the corresponding ARI (in a reference time interval of $n$ years) of ${Y}_{n}$, and write

$$P\{A_n\le a\}=\left[P\{A\le a\}\right]^{n}=e^{-n/a}=e^{-e^{-\left(\ln a-\ln n\right)}} \tag{6}$$

If we further define

$$\varDelta L_A=\ln A-\ln n \tag{7}$$

then $\varDelta {L}_{A}$ is also a random variable with the probability distribution,

$$P\{\varDelta L_A\le \varDelta \widehat{L}_A\}=e^{-e^{-\varDelta \widehat{L}_A}} \tag{8}$$

which is the standard Gumbel distribution.

In theory, if $F_Y(y)$ is perfectly known, then for any given $y_a$ the corresponding $a$ is also known. In this case, if we choose, e.g., $a=n$, then $\varDelta \widehat{L}_A$ is zero. In practice, however, $F_Y(y)$ is not known, so an observed quantity $y_a$ corresponds to an unknown $a$; obtaining an estimate of $a$, or equivalently of $\varDelta \widehat{L}_A$, is then effectively a random sampling problem.

Note that the distribution of $\varDelta L_A$ depends on neither the underlying distribution $F_Y(y)$ nor its parameters. Suppose that there is a sample $\varDelta \widehat{L}_{A_i}$, $i=1,...,m$, of size $m$ from $m$ stations, where $\varDelta \widehat{L}_{A_i}$ is computed from the maximum of the $N_i$ values recorded at station $i$. Then, provided that the $\varDelta \widehat{L}_{A_i}$’s are independent, the extreme-value distribution functions $F_{Y_{n_i}}(y_{a_i})$ used to model the $m$ records are allowed to differ. Consequently, records from different climatological regions may be combined to gauge the conformance of the $\varDelta \widehat{L}_{A_i}$’s to the standard Gumbel distribution. In other words, this property allows different records of extremes to ‘learn’ from the experience of others by pooling the $\varDelta \widehat{L}_{A_i}$’s as if they were a sample drawn from a standard Gumbel variate.
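The distribution-free property can be checked numerically. The sketch below is a minimal Monte Carlo illustration, not part of the original analysis: it assumes the parent distribution of annual extremes is perfectly known (here an exponential and a uniform distribution, chosen arbitrarily), draws many $n$-year maxima, maps each one through Eqs. 4 and 7, and compares the sample mean and standard deviation of $\varDelta L_A$ with those of the standard Gumbel distribution (Euler's constant, about 0.5772, and $\pi/\sqrt{6}$, about 1.2825). The helper names are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329      # mean of the standard Gumbel distribution
GUMBEL_SD = math.pi / math.sqrt(6.0)  # standard deviation of the standard Gumbel


def delta_log_ari(y_max, cdf, n_years):
    """Map an n-year maximum to Delta L_A = ln A - ln n (Eqs. 4 and 7)."""
    u = cdf(y_max)                    # U = F_Y(y)
    ari = -1.0 / math.log(u)          # Eq. 4: A = -(ln U)^(-1)
    return math.log(ari) - math.log(n_years)


def simulate_delta(sampler, cdf, n_years, trials, seed=1):
    """Draw `trials` n-year maxima from a known parent and transform each one."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        y_n = max(sampler(rng) for _ in range(n_years))   # n-year maximum
        values.append(delta_log_ari(y_n, cdf, n_years))
    return values


# Two very different parents of annual extremes (hypothetical, for illustration only).
parents = {
    "exponential": (lambda r: r.expovariate(1.0), lambda y: -math.expm1(-y)),
    "uniform": (lambda r: r.random(), lambda y: y),
}

for name, (sampler, cdf) in parents.items():
    d = simulate_delta(sampler, cdf, n_years=30, trials=10000)
    # Both rows should be close to (0.577, 1.283), whatever the parent.
    print(name, round(statistics.mean(d), 3), round(statistics.stdev(d), 3))
```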

Even though $\varDelta L_A$ is independent of the underlying $F_{Y_i}(y_i)$’s, in practical extreme hazard analysis the $F_{Y_i}(y_i)$’s are typically unavailable and must be replaced by empirically determined distributions $\tilde{F}_{Y_i}(y_i)$ whose parameters are estimated from observational data, which are invariably plagued by sampling errors. That is, if the $N_i$ extreme values of $Y_i$ are arranged in ascending order, $y_{(1_i)}\le y_{(2_i)}\le ...\le y_{(N_i)}$, then $\varDelta \widehat{L}_{A_i}$ is obtained from $\tilde{F}_{Y_i}(y_{(N_i)})$. As $\varDelta \widehat{L}_{A_i}$ is determined by the largest value from station $i$, special attention should be paid to ensuring that all the $m$ largest values are contributed by independent extreme events. If one event contributes to multiple $\varDelta \widehat{L}_{A_i}$’s, the $\varDelta \widehat{L}_{A_i}$ value that represents the highest ARI among them is kept in the analysis, and all others triggered by the same event are discarded.
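A minimal sketch of this step follows, assuming each station has already been given a fitted CDF $\tilde{F}_{Y_i}$ of annual extremes and that an identifier of the storm event behind each station maximum is available (the `StationRecord` fields and event labels are hypothetical). It computes $\varDelta \widehat{L}_{A_i}$ from $y_{(N_i)}$ via Eqs. 4 and 7 and, when several station maxima share one event, keeps only the one with the highest ARI. The Gumbel CDFs in the usage lines are illustrative placeholders, not fitted results.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StationRecord:
    station: str
    n_years: float                          # record length n_i in years
    max_speed: float                        # largest recorded value y_(N_i)
    max_event: str                          # storm that produced it (hypothetical label)
    fitted_cdf: Callable[[float], float]    # fitted annual-extreme CDF F~_{Y_i}


def ari_and_delta(rec: StationRecord) -> Tuple[float, float]:
    """Return (ARI, Delta L_A-hat) of the station maximum via Eqs. 4 and 7."""
    u = rec.fitted_cdf(rec.max_speed)
    if u >= 1.0:                            # beyond the fitted upper bound
        return math.inf, math.inf
    if u <= 0.0:                            # below the fitted lower bound
        return 0.0, -math.inf
    ari = -1.0 / math.log(u)                # Eq. 4
    return ari, math.log(ari) - math.log(rec.n_years)   # Eq. 7


def pooled_sample(records: List[StationRecord]) -> List[float]:
    """One Delta L_A-hat per independent event; a shared event keeps its highest-ARI value."""
    best: Dict[str, Tuple[float, float]] = {}
    for rec in records:
        ari, delta = ari_and_delta(rec)
        if rec.max_event not in best or ari > best[rec.max_event][0]:
            best[rec.max_event] = (ari, delta)
    return sorted(delta for _, delta in best.values())


def gumbel_cdf(loc: float, scale: float) -> Callable[[float], float]:
    return lambda y: math.exp(-math.exp(-(y - loc) / scale))


records = [
    StationRecord("S1", 25, 38.0, "storm-A", gumbel_cdf(28.0, 3.0)),
    StationRecord("S2", 18, 36.5, "storm-A", gumbel_cdf(27.0, 2.5)),  # same storm as S1
    StationRecord("S3", 30, 41.0, "storm-B", gumbel_cdf(29.0, 3.5)),
]
print(pooled_sample(records))   # storm-A contributes only once
```

Comparing events by ARI rather than by $\varDelta \widehat{L}_{A_i}$ follows the text literally; the two orderings differ when the stations have different record lengths $n_i$.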

The duality of the GEV and GPD ensures that they exhibit the same tail behaviour and have the same shape parameter (Wang and Holmes 2020) when applied to the same set of data. The GEV is used in this study as it does not depend on the rate of exceedance, even though the wind gust data used in this study (described in the next section) were extracted by the POT method. Because of the duality, the outcomes and conclusions drawn for the GEV should be equally applicable to the GPD.

The GEV may be expressed as

$$P\{Y\le y_a\}=\exp\left\{-\left[1-k\left(\frac{y_a-\eta}{\sigma}\right)\right]^{1/k}\right\} \tag{9}$$

where $\eta$, $\sigma$, and $k$ are the location, scale, and shape parameters, respectively, of the distribution. ${y}_{a}$ can be related to its corresponding $a$ as follows,

$$y_a=\begin{cases}\eta+\dfrac{\sigma}{k}\left(1-a^{-k}\right), & k\ne 0,\\ \eta+\sigma\ln a, & k=0.\end{cases} \tag{10}$$
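For reference, Eq. 10 and its inverse (the ARI of a given speed, obtained by solving Eq. 9 for $a$) can be coded as below. This is a sketch; the function names and the parameter values in the usage lines are illustrative only. With this sign convention, $k>0$ gives an upper bound $\eta+\sigma/k$, which is why a larger $k$ yields lower return levels at high ARI’s.

```python
import math


def return_level(a, eta, sigma, k):
    """Eq. 10: gust speed y_a with ARI a (years) under the GEV of Eq. 9."""
    if k == 0.0:
        return eta + sigma * math.log(a)
    return eta + sigma / k * (1.0 - a ** (-k))


def ari_of(y, eta, sigma, k):
    """Invert Eq. 10: ARI a of a given speed y."""
    if k == 0.0:
        return math.exp((y - eta) / sigma)
    t = 1.0 - k * (y - eta) / sigma     # equals a^(-k) inside the support
    if t <= 0.0:
        # k > 0: y is above the upper bound eta + sigma/k, so the ARI is unbounded;
        # k < 0: y is below the lower bound, so the ARI tends to zero.
        return math.inf if k > 0 else 0.0
    return t ** (-1.0 / k)


# Hypothetical parameters, used only to show the effect of k on high return levels.
for k in (0.10, 0.16):
    print(k, [round(return_level(a, 30.0, 3.0, k), 2) for a in (500, 1000)])
```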

For extreme hazard analysis, analysts choose the type of extreme-value distribution at their own discretion. The distribution parameters are then estimated by a model-fitting method such as least-squares regression (Press et al. 2007), the method of moments (Ang and Tang 2007), probability weighted moments, the maximum likelihood method, the principle of maximum entropy, the elemental quantile method, or Bayesian approaches (de Zea Bermudez and Kotz 2010). Except for the method of moments and the maximum likelihood method, all these methods require an estimate of the empirical cumulative distribution function (ECDF) for parameter estimation. Since the data analysed herein were extracted by the POT method, as described in the next section, the rate of exceedance was used to obtain the ECDF.

For a given wind type (e.g. non-synoptic) at a station, suppose there are $N$ extreme gust speeds exceeding a specified threshold in $n$ years and the occurrence of exceedances obeys a Poisson process. Then an unbiased estimate of the rate of exceedance $\lambda_j$, $j=1,...,N$, of the $j$-th smallest gust speed may be obtained by (Ang and Tang 2007)

$$\lambda_j=\frac{N-j+1}{n} \tag{11}$$

Because the ARI ${a}_{j}=1/{\lambda }_{j}$, Eq. 11 can be used to obtain the ECDF for hazard model fitting.
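A direct transcription of Eq. 11 is shown below as a sketch (the function name and the four gust speeds are hypothetical). It sorts the $N$ threshold exceedances, assigns each the rate $\lambda_j$ and the ARI $a_j=1/\lambda_j$, and returns (speed, ARI) pairs ready for model fitting.

```python
from typing import List, Tuple


def empirical_ari(speeds: List[float], n_years: float) -> List[Tuple[float, float]]:
    """Eq. 11: (speed, ARI) pairs for N POT values, smallest speed first."""
    ordered = sorted(speeds)
    N = len(ordered)
    pairs = []
    for j, y in enumerate(ordered, start=1):
        rate = (N - j + 1) / n_years      # lambda_j, exceedances per year
        pairs.append((y, 1.0 / rate))     # a_j = 1 / lambda_j
    return pairs


# Four hypothetical gusts above the threshold in a 10-year record.
pairs = empirical_ari([26.1, 31.4, 27.9, 25.3], n_years=10)
print(pairs)   # ARIs of 2.5, 3.33..., 5.0 and 10.0 years
```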

For simplicity and without loss of generality, least-squares regression of wind gust speed on ARI, linear for cases with a fixed shape parameter and nonlinear for cases with a free shape parameter, was used in the following for model parameter estimation.
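With $k$ fixed, Eq. 10 is linear in $\eta$ and $\sigma$ once the ARI is replaced by the reduced variate $x=(1-a^{-k})/k$ (or $\ln a$ when $k=0$), so the least-squares fit reduces to ordinary linear regression of speed on $x$. The sketch below assumes (speed, ARI) pairs such as those produced by Eq. 11; the function names are hypothetical, and the nonlinear free-$k$ fit is not shown.

```python
import math
from typing import List, Tuple


def reduced_variate(a: float, k: float) -> float:
    """x such that Eq. 10 reads y = eta + sigma * x."""
    return math.log(a) if k == 0.0 else (1.0 - a ** (-k)) / k


def fit_fixed_k(pairs: List[Tuple[float, float]], k: float) -> Tuple[float, float]:
    """Ordinary least squares of speed on x for a fixed shape k; returns (eta, sigma)."""
    xs = [reduced_variate(a, k) for _, a in pairs]
    ys = [y for y, _ in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)            # needs at least two distinct ARIs
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sigma = sxy / sxx
    eta = my - sigma * mx
    return eta, sigma


# Points lying exactly on a hypothetical GEV curve (eta = 30, sigma = 3, k = 0.2).
pairs = [(30.0 + 3.0 * reduced_variate(a, 0.2), a) for a in (1.0, 2.0, 5.0, 10.0, 20.0)]
print(fit_fixed_k(pairs, 0.2))   # recovers eta = 30 and sigma = 3 up to rounding
```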

The first Dines pressure-tube/float anemometer in South Australia, managed by the Bureau of Meteorology, Australia, became operational around 1956. Three datasets of 3-second wind gust speeds recorded at a height of 10 m were acquired: half-hourly data (up to January 2015) from 64 stations, daily data (up to May 2017) from 76 stations, and one-minute data (up to May 2017) from 69 stations. After data screening, some stations were eliminated because of a high percentage of missing data, suspect recordings, or complicated topographical surroundings that made it highly doubtful that the gust speeds could be corrected to terrain category 2 (i.e. open terrain) exposure as specified by Australian / New Zealand Standard (2021). The longest record among the remaining stations is around 30 years.

Because of the short record lengths, too few events of each type of convective windstorm, such as downbursts, thunderstorms, and tornadoes, were recorded at any one station to analyse them separately; they were therefore grouped as non-synoptic wind events. Other, non-convective large-scale events were grouped as synoptic wind events. Synoptic and non-synoptic winds were considered separately, and only records with a data length of at least 10 years were kept for analysis. This leaves 13 stations for synoptic and 12 stations for non-synoptic winds, as shown in Fig. 1. Ten of the stations had both synoptic and non-synoptic wind records. Although the number of remaining stations is not large, it suffices to illustrate the use of the method introduced in Section 2 to gauge the accuracy of the probability distribution and distribution parameters for extreme wind gust modelling.

Application of probability distributions to observed data typically requires the data points to be independent. For extreme wind gust analysis, this requires that different recorded gust speeds be generated by different storm events. To reduce the inadvertent inclusion of multiple peak gust speeds from the same wind event, minimum separation intervals of 4 days for synoptic and 12 hours for non-synoptic wind gusts (Lombardo, Main, and Simiu 2009) were specified. In addition, the gust speeds were corrected as follows:

The anemometers were changed from Dines anemometers to three-cup anemometers around 1991. The wind gust speeds recorded by the two anemometer types were not fully compatible and hence required correction, as suggested by Holmes and Ginger (2012).
The recorded 3-second gust speeds were corrected for the effects of terrain, topography, and shielding by nearby plantations and buildings in the cardinal and inter-cardinal directions around each station, in accordance with Australian / New Zealand Standard (2021).
The 3-second gust speeds were then converted to 0.2-second gust speeds (Holmes and Ginger 2012).
A storm-type separation algorithm (Holmes 2019) was used to split the gust events into synoptic and non-synoptic wind types.

For the wind gust hazard modelling of both wind types in the next section, only gust speeds exceeding a threshold of 25 m/s were retained for analysis, as shown in Fig. 2.
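The independence screening and threshold selection described above can be sketched as a simple greedy declustering: gusts are visited from largest to smallest, a gust is kept only if it is at least the minimum separation interval away from every gust already kept, and only gusts above the threshold are retained. This greedy rule is an assumption for illustration; the paper does not state which declustering algorithm was used. Times are in hours, and the sample gusts are hypothetical.

```python
from typing import List, Tuple

Gust = Tuple[float, float]   # (time in hours, gust speed in m/s)

SYNOPTIC_SEP_H = 4 * 24.0    # 4 days
NON_SYNOPTIC_SEP_H = 12.0    # 12 hours
THRESHOLD = 25.0             # m/s


def decluster(gusts: List[Gust], min_sep_hours: float,
              threshold: float = THRESHOLD) -> List[Gust]:
    """Keep one peak per storm cluster, largest first, and drop gusts not above threshold."""
    kept: List[Gust] = []
    for t, v in sorted(gusts, key=lambda g: g[1], reverse=True):   # largest first
        if v <= threshold:
            break                                   # all remaining gusts are smaller
        if all(abs(t - tk) >= min_sep_hours for tk, _ in kept):
            kept.append((t, v))
    return sorted(kept)                             # back in time order


gusts = [(0.0, 30.0), (10.0, 32.0), (50.0, 27.0), (200.0, 24.0), (300.0, 26.0)]
print(decluster(gusts, SYNOPTIC_SEP_H))       # the 4-day rule merges the first three gusts
print(decluster(gusts, NON_SYNOPTIC_SEP_H))   # the 12-hour rule keeps the gust at 50 h
```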

4.1 Shape parameters for component wind hazards

Different shape parameter values of a GEV distribution fitted to a wind gust record lead to different estimates $\varDelta \widehat{L}_A$ of $\varDelta L_A$. This section illustrates the computation of the $\varDelta \widehat{L}_{A_i}$’s given $m$ data series of a wind hazard type (synoptic, non-synoptic, or combined wind hazard) and the determination of the shape parameter that gives the best fit of the $\varDelta \widehat{L}_A$’s to the standard Gumbel distribution.

The $m$ independent $\varDelta \widehat{L}_{A_i}$’s are plotted against the theoretical quantiles of the standard Gumbel variate determined by a plotting position formula (Cunnane 1978). If the plotted points fall closely along the diagonal line, the chosen hazard model and its assumed parameters are consistent with Eq. 8. However, if the $m$ points fall below (above) the diagonal line, the model underestimates (overestimates) the ARI, i.e. overestimates (underestimates) the hazard. If the points form a linear trend that crosses the diagonal line with a slope < 1, the hazard model may have too many parameters and hence may be inappropriate for predicting extreme values at ARI’s beyond the record length (Van Den Brink and Können 2008).

A range of shape parameter values $k$ was used to fit the 13 synoptic and 12 non-synoptic wind records. The root mean squared errors (RMSE’s) between the $\varDelta \widehat{L}_A$’s and their idealised counterparts from the standard Gumbel variate were computed, and the best $k$ value was chosen by minimising the RMSE. Figure 3 shows the RMSE values versus $k$ values; $k=$ 0.2 for synoptic and $k=$ 0.25 for non-synoptic winds (shown as star-shaped points) were found to be optimal. The hazard curves determined using the optimal $k$ values, together with the observed wind gusts, are plotted in Fig. 4.
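The RMSE-based selection of $k$ can be sketched end to end as follows. For each candidate $k$, each station’s (speed, ARI) pairs from Eq. 11 are fitted by least squares with $k$ fixed, $\varDelta \widehat{L}_A$ of the station maximum is computed from Eqs. 9 and 7, the sorted values are compared with Weibull plotting-position Gumbel quantiles ($c=0$ in Eq. 12), and the $k$ with the smallest RMSE is returned. The station data in the usage lines are synthetic GEV draws generated only to exercise the code; they are not the South Australian records, and all function names are hypothetical.

```python
import math
import random


def empirical_ari(speeds, n_years):
    """Eq. 11: (speed, ARI) pairs, smallest speed first."""
    ordered = sorted(speeds)
    N = len(ordered)
    return [(y, n_years / (N - j + 1)) for j, y in enumerate(ordered, start=1)]


def fit_fixed_k(pairs, k):
    """Least-squares fit of Eq. 10 with k fixed; needs at least two distinct ARIs."""
    xs = [math.log(a) if k == 0 else (1 - a ** (-k)) / k for _, a in pairs]
    ys = [y for y, _ in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sigma = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - sigma * mx, sigma


def delta_hat(y, n_years, eta, sigma, k):
    """ln(ARI) of speed y under the fitted GEV (from Eq. 9), minus ln n (Eq. 7)."""
    if k == 0:
        log_a = (y - eta) / sigma
    else:
        t = 1 - k * (y - eta) / sigma
        if t <= 0:                       # outside the fitted support
            log_a = math.inf if k > 0 else -math.inf
        else:
            log_a = -math.log(t) / k
    return log_a - math.log(n_years)


def gumbel_quantiles(m, c=0.0):
    """Eq. 12; c = 0 is the Weibull plotting position."""
    return [-math.log(-math.log((i - c) / (m + 1 - 2 * c))) for i in range(1, m + 1)]


def rmse_for_k(stations, k):
    """stations: list of (speeds, n_years); one Delta L-hat per station maximum."""
    deltas = []
    for speeds, n_years in stations:
        eta, sigma = fit_fixed_k(empirical_ari(speeds, n_years), k)
        deltas.append(delta_hat(max(speeds), n_years, eta, sigma, k))
    deltas.sort()
    q = gumbel_quantiles(len(deltas))
    return math.sqrt(sum((d - g) ** 2 for d, g in zip(deltas, q)) / len(q))


def best_k(stations, k_grid):
    return min(k_grid, key=lambda k: rmse_for_k(stations, k))


# Usage with synthetic annual maxima drawn from a GEV with k = 0.2 (illustration only).
def gev_draw(rng, eta, sigma, k):
    u = 1.0 - rng.random()               # uniform on (0, 1]
    return eta + sigma / k * (1.0 - (-math.log(u)) ** k)


rng = random.Random(42)
lengths = [12, 15, 20, 25, 30, 18, 22, 14, 16, 28, 11, 19, 24]   # 13 hypothetical stations
stations = [([gev_draw(rng, 30.0, 3.0, 0.2) for _ in range(n)], n) for n in lengths]
k_grid = [round(0.05 * i, 2) for i in range(7)]                   # 0.0, 0.05, ..., 0.3
k_best = best_k(stations, k_grid)
print(k_best, round(rmse_for_k(stations, k_best), 3))
```

With only a dozen or so stations, the selected $k$ can vary noticeably between synthetic samples, which echoes the sensitivity to the plotting position discussed below.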

The Gumbel quantile-quantile (Q-Q) plots in Fig. 5 (a) show that the $\varDelta \widehat{L}_A$’s from the GEV models with fixed $k=$ 0.2 for synoptic and $k=$ 0.25 for non-synoptic winds approximately follow the diagonal line and hence agree with the theory expressed in Eq. 8. The commingled synoptic and non-synoptic winds (red connected points), representing the sample of maximum recorded values from stations of two different generating mechanisms, also follow the standard Gumbel distribution, as asserted in Section 2.

Instead of fixing the $k$ values, if all three GEV distribution parameters are determined by nonlinear regression for each of the $m$ data records, Fig. 5 (b) shows that the lines connecting the $\varDelta \widehat{L}_A$ values cross the diagonal line: they are overestimated (above the diagonal line) in the lower-value range but underestimated (below the diagonal line) in the higher-value range, and hence do not follow the standard Gumbel distribution. This implies that the fitted models may be biased and hence inappropriate for extrapolation to high ARI levels. This may be because, with a free shape parameter, the GEV has too many parameters, so that the fitted models exhibit unacceptably high extrapolation bias at high return levels, which manifests as an underestimated standard deviation of the $\varDelta \widehat{L}_A$’s. In this regard, fixing the $k$ value, as is the case in the Australian standard (Australian / New Zealand Standard 2021), avoids such unfavourable bias and is hence a sensible decision for more reliably determining the design wind speeds at ARI’s beyond the available record lengths. If an individual $k$ value for each station is nevertheless preferred, then, in addition to the standard goodness-of-fit tests for interpolation, Eq. 8 can serve as a safeguard for extrapolation of the estimated hazard models.

The values for the abscissa in Fig. 5 represent $m$ theoretical quantile values (denoted by $\varDelta G$). They can be computed by inverting the standard Gumbel distribution and using a plotting position formula to estimate the ECDF as follows,

$$\varDelta G_i=-\ln\left[-\ln\left(\frac{i-c}{m+1-2c}\right)\right] \tag{12}$$

where $c$ depends on the plotting position used (Cunnane 1978). For small sample sizes, as in this study, $c$ needs to be chosen carefully depending on the objective of the study, since different choices may lead to substantially different results. $c=0$ (the Weibull plotting position) was used for the results shown in Fig. 5 and in the rest of the paper, as it produces comparatively conservative results. To illustrate the extent of the difference, $c=0.5$ (the Hazen plotting position) was tested, and the resulting $k$ values for synoptic and non-synoptic events were 0.217 and 0.276, respectively. These represent differences of about 8% and 10%, respectively, from those obtained with $c=0$. Among the most commonly used plotting positions ($c\le 0.5$), the Weibull and Hazen positions typically give rise to the most and the least conservative hazard modelling results, respectively (Folland and Anderson 2002).
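The sensitivity to the plotting position can be seen directly from Eq. 12. The sketch below (hypothetical function names; $m=13$ chosen to match the number of synoptic records) tabulates the theoretical quantiles for the Weibull ($c=0$) and Hazen ($c=0.5$) positions; the Hazen quantiles are more spread out at both ends. It also includes a least-squares slope of sorted $\varDelta \widehat{L}_A$ values against $\varDelta G$ as a simple summary of the slope < 1 symptom described for the Q-Q plots; using this slope as a diagnostic is an illustration, not a procedure stated in the paper.

```python
import math


def gumbel_quantiles(m, c):
    """Eq. 12: standard-Gumbel quantiles Delta G_i for i = 1..m."""
    return [-math.log(-math.log((i - c) / (m + 1 - 2 * c))) for i in range(1, m + 1)]


def qq_slope(delta_hats, c=0.0):
    """Least-squares slope of sorted Delta L-hat values on Delta G (ideal value: 1)."""
    d = sorted(delta_hats)
    g = gumbel_quantiles(len(d), c)
    mg, md = sum(g) / len(g), sum(d) / len(d)
    return (sum((x - mg) * (y - md) for x, y in zip(g, d))
            / sum((x - mg) ** 2 for x in g))


weibull = gumbel_quantiles(13, c=0.0)   # plotting position used in the paper
hazen = gumbel_quantiles(13, c=0.5)
for i, (w, h) in enumerate(zip(weibull, hazen), start=1):
    print(f"i={i:2d}  Weibull {w:6.3f}  Hazen {h:6.3f}")
```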

4.2 Shape parameter for combined wind hazard

As shown in Fig. 4, synoptic and non-synoptic wind events at a location pose threats of different extents, as they are induced by different climatic mechanisms. The hazards posed by both mechanisms need to be taken into account in the design of structures. One way of estimating the combined hazard is to commingle all extreme gust speeds from all mechanisms for wind hazard analysis. However, hazard models derived from commingled datasets tend to underestimate wind hazard at higher ARI’s (Gomes and Vickery 1978; Lombardo, Main, and Simiu 2009). An alternative is to combine the hazards of different, independent mechanisms by probability theory (Wang and Holmes 2020; Holmes and Bekele 2021). However, no closed-form probability distribution is readily available for the probabilistic combination of hazards with parent GEV distributions, and no simple expression exists for the shape parameter of the combined hazard model. As a result, Monte Carlo simulation was conducted to generate annual gust speeds for 1000 years from the best models (i.e. $k=$ 0.2 for synoptic and $k=$ 0.25 for non-synoptic winds) for each of the 10 locations where records of both wind types were available. For a given year at a specific site, the larger of the two generated gust speeds was taken as the extreme speed of the year. Figure 6 shows the generated combined annual extremes (red lines) up to 1000-year ARI and the best-fitted hazard curves of the non-synoptic (green lines) and synoptic (blue lines) winds for the 10 locations.
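The simulation step can be sketched as follows, assuming the two fitted GEV models describe annual extremes, so that Eq. 9 can be inverted to sample them: $y=\eta+\frac{\sigma}{k}\left[1-(-\ln u)^k\right]$ for a uniform $u$. The parameter values below are hypothetical placeholders with $k=0.2$ and $k=0.25$; the paper’s fitted location and scale values are not reproduced here. Because the two mechanisms are simulated independently, the combined annual maximum has the distribution $F_{\text{syn}}(y)F_{\text{non}}(y)$, which is in general not itself a GEV, hence the approximate refit that follows.

```python
import math
import random


def gev_inverse(u, eta, sigma, k):
    """Invert Eq. 9 (k != 0): annual extreme with non-exceedance probability u."""
    return eta + sigma / k * (1.0 - (-math.log(u)) ** k)


def simulate_combined(syn, non, years=1000, seed=2023):
    """Annual (synoptic, non-synoptic, combined) extremes; syn and non are (eta, sigma, k)."""
    rng = random.Random(seed)
    series = []
    for _ in range(years):
        v_syn = gev_inverse(1.0 - rng.random(), *syn)   # 1 - random() lies in (0, 1]
        v_non = gev_inverse(1.0 - rng.random(), *non)
        series.append((v_syn, v_non, max(v_syn, v_non)))  # larger of the two per year
    return series


SYN = (28.0, 2.5, 0.20)    # hypothetical (eta, sigma, k) for synoptic winds
NON = (24.0, 4.0, 0.25)    # hypothetical (eta, sigma, k) for non-synoptic winds
series = simulate_combined(SYN, NON)
combined = sorted(c for _, _, c in series)
print("largest simulated combined annual gust:", round(combined[-1], 1), "m/s")
```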

As an approximation and for comparison with the Australian Standard, the generated combined gust speeds were fitted to a GEV distribution. Similar to the model parameter estimation described in Section 4.1, we computed the RMSE between estimated $\varDelta {\widehat{L}}_{A}$ and the standard Gumbel quantiles, and plotted it over a range of $k$ values, as shown in Fig. 7. It shows that the optimal $k$ value is about 0.16 for the combined hazard. Incidentally, this $k$ value is close to $k=0.161$ obtained by Holmes and Moriarty (1999) for thunderstorm downbursts at Moree, New South Wales, Australia, which is located in wind hazard region A, the same region as the locations studied herein.

For the combined hazards, Fig. 8 shows the Gumbel Q-Q plot of $\varDelta \widehat{L}_A$ for the GEV models with $k=$ 0.16, which indicates that the simulated wind hazards agree with the theory, whereas with $k=0.10$ the hazard models overestimate the hazard (i.e. underestimate the ARI). The simulated maximum wind speed over 1,000 years among the 10 stations is $V_{\text{max}}=$ 46.2 m/s at Port Augusta Aero. For $k=$ 0.16, with $\varDelta \widehat{L}_A=$ 2.33, $V_{\text{max}}$ is predicted (by Eq. 7) to have an ARI of 10,229 years (close to the 10,000 years implied by the 10 stations, each simulated independently for 1000 years), whereas for $k=0.10$, with $\varDelta \widehat{L}_A=$ 0.22, it is predicted to have an ARI of 1,250 years. For comparison, AS/NZS 1170.2:2021 (Australian / New Zealand Standard 2021) uses $k=0.10$ for all four wind regions, and its regional wind speed at an ARI of 1000 years for Region A (where the studied area is located) is 46 m/s. The results computed with $k=0.10$ thus agree well with the Australian standard, which implies that the standard may have overestimated the wind gust hazard for South Australia. In addition, a recent study (El Rafei et al. 2023) on the wind gust hazard in New South Wales, Australia, using high-resolution Australian regional reanalysis found that using $k=0.10$ overestimates the 500-year ARI gust speeds, compared with using variable $k$ values, by approximately 4% for non-synoptic and 2.5% for synoptic events.
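Eq. 7 inverts to $A=n\,e^{\varDelta \widehat{L}_A}$, which is how the ARIs above follow from the reported $\varDelta \widehat{L}_A$ values. The sketch assumes $n=1000$ years, the simulated record length. Because the reported $\varDelta \widehat{L}_A$ values are rounded to two decimals, it returns about 10,280 and 1,246 years rather than exactly 10,229 and 1,250.

```python
import math


def ari_from_delta(delta_hat: float, n_years: float) -> float:
    """Invert Eq. 7: A = n * exp(Delta L_A)."""
    return n_years * math.exp(delta_hat)


for k, delta in [(0.16, 2.33), (0.10, 0.22)]:
    print(f"k = {k:.2f}: ARI of V_max ~ {ari_from_delta(delta, 1000):,.0f} years")
```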

Figure 9 illustrates the fitted combined hazard models with $k=0.10$ and $k=$ 0.16 along with the simulated annual extreme gust speeds (i.e. the same as the red lines in Fig. 6) for the 10 locations. Compared with $k=0.10$, the curves with $k=$ 0.16 provide a closer fit to the data points at most locations and, as expected, result in lower gust speeds at high ARI’s. On average, as shown in Fig. 10, the models with $k=0.10$ give gust speed estimates about 2.9% and 3.5% higher than those with $k=$ 0.16 for ARI’s of 500 and 1000 years, respectively. That is, the design wind speeds specified in the Australian standard for South Australia generally fall on the conservative side for the design of structures of importance levels 2 (domestic housing and structures under normal operations) and 3 (structures designed to contain a large number of people), as defined in the 2019 National Construction Code of Australia (Australian Building Codes Board 2019). Nevertheless, the higher cost of the more conservative construction may be justified if the additional benefits gained are deemed to outweigh the extra costs incurred.

Many currently available extreme data of climatic events, such as observed extreme wind gusts, span only a few tens of years. This results in high bias and uncertainty in the distribution parameters estimated from the data, and hence in unreliable predicted extreme values when extrapolated to ARI’s beyond the record length. Typical goodness-of-fit tests allow the quality of model fitting to be assessed within the record length, but cannot test the fit at higher ARI’s. The approach used in this study serves to gauge whether the fitted model is appropriate and unbiased for extrapolation, providing a mechanism to safeguard the accuracy of values fitted beyond the record length, i.e. at the high ARI’s typically needed for engineering design and reliability assessment.

The ARI is shown to follow an inverted exponential distribution, and the log-transformed ARI (i.e. $\varDelta \widehat{L}_A$) is Gumbel distributed; log-transformed ARI’s are also often used for visualising estimated hazards. Moreover, the ability of the method to pool the ARI’s of the maximum recorded values from all observational stations, even when they come from regions of different hazard-generating mechanisms, may be regarded as a generalisation of the ‘super-station’ approach, which has been used to commingle all the extreme wind gust or rainfall data from a climatologically uniform region. The method is therefore useful in cases such as the wind gust speed specification in the Australian standard, in which one shape parameter value is applied across the four wind regions. In such cases, the estimated $\varDelta \widehat{L}_A$ values for the records from all regions may be combined, and the $k$ value that best fits the $\varDelta \widehat{L}_A$’s to the standard Gumbel distribution can be chosen and applied to all regions.

In the Australian context, although non-synoptic wind gusts often dominate the extreme wind climate, particularly at larger ARI’s, both synoptic and non-synoptic winds need to be considered, as the combined wind hazard tends to have a smaller shape parameter value, which typically leads to higher wind gust values at high ARI’s than a larger one. In addition, synoptic wind gusts at some locations dominate at smaller ARI’s, which is important for temporary and secondary structures such as formwork, circus tents, and farm shelters that are intended to be in service for only a short period. The analysis of wind gust data from South Australia indicates that the shape parameter value of 0.1 used in the Australian standard AS/NZS 1170.2:2021 may be too low, leading to an overestimate of the wind hazard in South Australia and hence to design values on the conservative side.

As with typical experimental and observational studies, the accuracy of estimation by the method used in this paper clearly depends on the accuracy of measurement, the quality of data collection and processing, and the correct classification of hazard-generating mechanisms when heterogeneous mechanisms are involved. In addition, the method assists only in selecting the hazard model and its fitted parameters to reduce the bias of predictions at high ARI’s; it does not provide uncertainty estimates for those predictions. If required, uncertainty quantification may be conducted by the bootstrap (Efron and Tibshirani 1993) or by Bayesian analysis (Gelman, Hill, and Vehtari 2021). Conversely, the method can be used in conjunction with uncertainty-minimisation techniques, as well as within-data goodness-of-fit tests, to help obtain a model with minimised extrapolation bias and variance.
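A minimal sketch of the bootstrap route mentioned above, assuming the shape parameter has already been selected (here $k = 0.1$ is simply plugged in) and using a method-of-moments fit for the location and scale with $k$ held fixed. The moments estimator, the synthetic 20-year record and all function names are illustrative assumptions, not the paper's fitting procedure; the percentile interval reflects only the sampling variability of the location and scale at a single station.

```python
import math
import random
import statistics

# Hypothetical sketch: percentile bootstrap of a high-ARI return level for ONE station,
# with the shape k held fixed at a previously selected value (k >= 0; k > 0 means a
# bounded upper tail in this sign convention).
EULER_GAMMA = 0.5772156649015329


def sample_annual_max(k, u, a, rng):
    """Synthetic annual maximum from F(x) = exp(-(1 - k (x - u)/a)^(1/k))."""
    p = rng.random()
    while p == 0.0:  # avoid log(0)
        p = rng.random()
    y = -math.log(p)
    return u - a * math.log(y) if k == 0.0 else u + a * (1.0 - y ** k) / k


def return_level(ari, k, u, a):
    """Level whose annual exceedance rate is 1/ARI."""
    return u + a * math.log(ari) if k == 0.0 else u + a * (1.0 - ari ** (-k)) / k


def fit_moments_fixed_k(xs, k):
    """Method-of-moments location u and scale a with k fixed (valid for k > -0.5)."""
    m, s = statistics.fmean(xs), statistics.stdev(xs)
    if k == 0.0:
        a = s * math.sqrt(6.0) / math.pi
        return m - EULER_GAMMA * a, a
    g1, g2 = math.gamma(1.0 + k), math.gamma(1.0 + 2.0 * k)
    a = s * abs(k) / math.sqrt(g2 - g1 * g1)
    return m - a * (1.0 - g1) / k, a


def bootstrap_interval(xs, k, ari, n_boot=2000, level=0.95, seed=1):
    """Resample the annual maxima with replacement, refit u and a, take percentiles."""
    rng = random.Random(seed)
    estimates = sorted(
        return_level(ari, k, *fit_moments_fixed_k([rng.choice(xs) for _ in xs], k))
        for _ in range(n_boot)
    )
    lo = estimates[int(round((1.0 - level) / 2.0 * n_boot))]
    hi = estimates[int(round((1.0 + level) / 2.0 * n_boot)) - 1]
    return lo, hi


# Synthetic 20-year record (not real data); k = 0.1 stands in for a selected shape value.
data_rng = random.Random(7)
record = [sample_annual_max(0.1, 30.0, 4.0, data_rng) for _ in range(20)]
k_selected = 0.1
u_hat, a_hat = fit_moments_fixed_k(record, k_selected)
point = return_level(1000, k_selected, u_hat, a_hat)
lo, hi = bootstrap_interval(record, k_selected, 1000)
print(f"1000-year level {point:.1f} m/s, 95% bootstrap interval [{lo:.1f}, {hi:.1f}] m/s")
```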

Funding

This work was supported by the research project “EE_BEE_NatHERS data and communication”, funded by CSIRO, Australia.

Competing Interests

The author has no relevant financial or non-financial interests to disclose.

Author Contributions

Study conception and design, material preparation, data collection and analysis, and draft of the manuscript were performed by Chi-Hsiang Wang.

Acknowledgments

The author thanks Dr. John D. Holmes, Director of JDH Consulting, for kindly providing the correction factors for wind gust speeds related to the effects of terrain, topography, and shielding by buildings surrounding the anemometer stations, and for his insightful comments on the manuscript.

References

Ang, Alfredo H-S., and Wilson H. Tang. 2007. Probability Concepts in Engineering: Emphasis on Applications to Civil and Environmental Engineering. 2nd ed. John Wiley & Sons, Inc.
Australian/New Zealand Standard. 2021. AS/NZS 1170.2:2021 “Structural design actions, Part 2: Wind actions.”
Australian Building Codes Board. 2019. “National Construction Code Volume 1.” https://ncc.abcb.gov.au/editions-national-construction-code.
Buishand, T. A. 1991. “Extreme rainfall estimation by combining data from several sites.” Hydrological Sciences Journal 36 (4): 345–65. https://doi.org/10.1080/02626669109492519.
Cunnane, C. 1978. “Unbiased plotting positions — A review.” Journal of Hydrology 37 (3-4): 205–22. https://doi.org/10.1016/0022-1694(78)90017-3.
de Zea Bermudez, P., and Samuel Kotz. 2010. “Parameter estimation of the generalized Pareto distribution-Part I.” Journal of Statistical Planning and Inference 140 (6): 1353–73. https://doi.org/10.1016/j.jspi.2008.11.019.
Efron, Bradley, and Robert J. Tibshirani. 1993. An introduction to the bootstrap. Chapman & Hall.
El Rafei, Moutassem, Steven Sherwood, Jason P. Evans, and Fei Ji. 2023. “Analysis of extreme wind gusts using a high-resolution Australian Regional Reanalysis.” Weather and Climate Extremes 39 (November 2022): 100537. https://doi.org/10.1016/j.wace.2022.100537.
Folland, Chris, and Clive Anderson. 2002. “Estimating Changing Extremes Using Empirical Ranking Methods.” Journal of Climate 15 (20): 2954–60. https://doi.org/10.1175/1520-0442(2002)015<2954:ECEUER>2.0.CO;2.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2021. Regression and other stories. Cambridge University Press.
Gomes, L., and B. J. Vickery. 1978. “Extreme wind speeds in mixed wind climates.” Journal of Wind Engineering and Industrial Aerodynamics 2 (4): 331–44. https://doi.org/10.1016/0167-6105(78)90018-1.
Holmes, J. D. 2002. “A Re-analysis of Recorded Extreme Wind Speeds in Region A.” Australian Journal of Structural Engineering 4 (1): 29–40. https://doi.org/10.1080/13287982.2002.11464905.
———. 2019. “Extreme Wind Prediction – The Australian Experience.” Lecture Notes in Civil Engineering 27: 365–75. https://doi.org/10.1007/978-3-030-12815-9_29.
Holmes, J. D., and S. Bekele. 2021. Wind Loading of Structures. 4th ed. CRC Press, Taylor & Francis Group.
Holmes, J. D., and J. D. Ginger. 2012. “The gust wind speed duration in AS/NZS 1170.2.” Australian Journal of Structural Engineering 13 (3): 207–16. https://doi.org/10.7158/S12-017.2012.13.3.
Holmes, J. D., and W. W. Moriarty. 1999. “Application of the generalized Pareto distribution to extreme value analysis in wind engineering.” Journal of Wind Engineering and Industrial Aerodynamics 83 (1-3): 1–10. https://doi.org/10.1016/S0167-6105(99)00056-2.
Lin, C. T., B. S. Duran, and T. O. Lewis. 1989. “Inverted gamma as a life distribution.” Microelectronics Reliability 29 (4): 619–26. https://doi.org/10.1016/0026-2714(89)90352-1.
Lombardo, Franklin T., Joseph A. Main, and Emil Simiu. 2009. “Automated extraction and classification of thunderstorm and non-thunderstorm wind data for extreme-value analysis.” Journal of Wind Engineering and Industrial Aerodynamics 97 (3-4): 120–31. https://doi.org/10.1016/j.jweia.2009.03.001.
Palutikof, J. P., B. B. Brabson, D. H. Lister, and S. T. Adcock. 1999. “A review of methods to calculate extreme wind speeds.” Meteorological Applications 6 (2): 119–32. https://doi.org/10.1017/S1350482799001103.
Peterka, J. A. 1992. “Improved extreme wind prediction for the United States.” Journal of Wind Engineering and Industrial Aerodynamics 41 (1-3): 533–41. https://doi.org/10.1016/0167-6105(92)90459-N.
Press, William H., Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. 2007. Numerical Recipes: The Art of Scientific Computing. 3rd ed. Vol. 1. Cambridge University Press.
Simiu, Emil, and James Filliben. 1975. “Statistical Analysis of Extreme Winds.” National Bureau of Standards. https://www.govinfo.gov/content/pkg/GOVPUB-C13-415fc91b46e5b7d06ec2ee0b8e936ed9/pdf/GOVPUB-C13-415fc91b46e5b7d06ec2ee0b8e936ed9.pdf.
Simiu, Emil, and DongHun Yeo. 2019. Wind Effects on Structures. Chichester, UK: John Wiley & Sons, Ltd. https://doi.org/10.1002/9781119375890.
Van Den Brink, H. W., and G. P. Können. 2008. “The statistical distribution of meteorological outliers.” Geophysical Research Letters 35 (23): 1–5. https://doi.org/10.1029/2008GL035967.
———. 2011. “Estimating 10000-year return values from short time series.” International Journal of Climatology 31 (1): 115–26. https://doi.org/10.1002/joc.2047.
Wang, C.-H., and J. D. Holmes. 2020. “Exceedance rate, exceedance probability, and the duality of GEV and GPD for extreme hazard analysis.” Natural Hazards 102 (3): 1305–21. https://doi.org/10.1007/s11069-020-03968-z.
Wang, C.-H., X. Wang, and Y. B. Khoo. 2013. “Extreme wind gust hazard in Australia and its sensitivity to climate change.” Natural Hazards 67 (2). https://doi.org/10.1007/s11069-013-0582-5.

Editorial decision: Major revisions, 07 Aug, 2023

Reviewers invited by journal: 06 May, 2023

Reviewers agreed at journal: 19 Apr, 2023

Editor assigned by journal: 20 Mar, 2023

First submitted to journal: 19 Mar, 2023
