Data sources were used to gather daily information on mortality, air pollution, weather, hospital admissions for influenza, and public holidays in Hong Kong. All data were indexed daily to form a time series from 01 January 1999 to 30 November 2019. All data processing and analyses were performed using the statistical computation language R, and models were generated using the ‘mgcv’ package.
2.1. Mortality
Mortality data were obtained from the death registry supplied by the Census and Statistics Department of Hong Kong. Deaths were filtered into three cause groups: all diseases, circulatory diseases, and respiratory diseases. For data from 01 January 1999 to 31 December 2000, cause of death was classified in accordance with the International Classification of Diseases (ICD) version 9: all deaths were filtered by numeric codes 001–799, deaths from circulatory diseases by codes 390–459, and deaths from respiratory diseases by codes 460–519. Data from 01 January 2001 to 31 December 2016 were classified in accordance with the ICD-10: all deaths were filtered by codes A00–R99, deaths from circulatory diseases by codes I00–I99, and deaths from respiratory diseases by codes J00–J99 (Lin et al., 2017a). Three more cause groups, based on conditions commonly considered to be associated with air pollution, were also incorporated into the current study: mental and behavioral conditions (Dales & Cakmak, 2016; Ho et al., 2020), diseases of the nervous system and sense organs (Calderón-Garcidueñas et al., 2015; Genc et al., 2012), and diseases of the skin and subcutaneous tissue (Kim et al., 2016). ICD-10 codes F00–F99 and ICD-9 codes 290–319 were used to filter deaths associated with mental conditions. ICD-10 codes G00–G99 and ICD-9 codes 320–389 were used to filter deaths associated with diseases of the nervous system. ICD-10 codes L00–L99 and ICD-9 codes 680–709 were used to filter deaths associated with skin diseases.
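The code ranges above can be expressed as a small lookup. The following is a minimal sketch in Python rather than the R code used for the study; the function name `cause_groups`, the group labels, and the assumption that each record carries a code string and an ICD version are all hypothetical.

```python
import re

# ICD-9 numeric ranges and ICD-10 chapter letters, as listed in the text.
ICD9_GROUPS = {
    "circulatory": (390, 459),
    "respiratory": (460, 519),
    "mental": (290, 319),
    "nervous": (320, 389),
    "skin": (680, 709),
}
ICD10_GROUPS = {
    "circulatory": "I",
    "respiratory": "J",
    "mental": "F",
    "nervous": "G",
    "skin": "L",
}


def cause_groups(code, icd_version):
    """Return the set of cause groups that one death record falls into."""
    code = code.strip().upper()
    groups = set()
    if icd_version == 9:
        match = re.match(r"\d{3}", code)
        if match is None:
            return groups  # non-numeric codes (e.g. E or V codes) are outside 001-799
        number = int(match.group())
        if 1 <= number <= 799:
            groups.add("all")
        for name, (low, high) in ICD9_GROUPS.items():
            if low <= number <= high:
                groups.add(name)
    elif icd_version == 10:
        if re.match(r"[A-R]\d{2}", code):
            groups.add("all")  # A00-R99
        for name, letter in ICD10_GROUPS.items():
            if re.match(letter + r"\d{2}", code):
                groups.add(name)
    else:
        raise ValueError("icd_version must be 9 or 10")
    return groups
```

A record can belong to more than one group, because every circulatory or respiratory death also counts toward the all-disease group.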
2.2. Air pollution
Hourly air pollution data including PM2.5 levels were obtained from the Hong Kong Environmental Protection Department. Only 4 of 19 weather stations collected PM2.5 levels before 2004, but more weather stations began to monitor PM2.5 levels after that time. By the end of 2019, a total of 16 weather stations across Hong Kong were monitoring PM2.5 levels. In the present study, daily average pollution levels were calculated using all the data available for each given timepoint. In accordance with prior studies (Lin et al., 2017a; Lin et al., 2017b), daily mean and daily peak PM2.5 concentrations were calculated. Daily meteorological data such as mean temperature (degrees Celsius) and relative humidity (percentage) were also collected. Daily data from all available stations were averaged to obtain daily means.
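One way to read "all the data available for each given timepoint" is sketched below in Python. It assumes that each hour is first averaged across the stations reporting in that hour, and that the daily mean and daily peak are then taken over those hourly regional values. The order of aggregation and the function name `daily_mean_and_peak` are assumptions, not details given in the text.

```python
from collections import defaultdict
from statistics import mean


def daily_mean_and_peak(readings):
    """Aggregate hourly station readings to a daily mean and daily peak.

    readings: iterable of (date, hour, station, value); value is None when
    a station did not report for that hour.
    Returns {date: (daily_mean, daily_peak)}.
    """
    # Collect the values reported by all stations for each (date, hour).
    by_hour = defaultdict(list)
    for date, hour, _station, value in readings:
        if value is not None:
            by_hour[(date, hour)].append(value)

    # Average across the reporting stations within each hour.
    hourly_regional = defaultdict(list)
    for (date, _hour), values in by_hour.items():
        hourly_regional[date].append(mean(values))

    # Daily mean and peak of the hourly regional averages.
    return {date: (mean(v), max(v)) for date, v in hourly_regional.items()}
```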
2.3. Influenza hospital admissions
Influenza hospital admissions data were obtained from the Hong Kong Department of Health’s Centre for Health Protection. These data record weekly influenza admission totals. In accordance with Qiu et al. (2012), an “outbreak” week was defined as a week in which admissions exceeded the 75th percentile of admissions for all weeks in that year. Notably, the Centre for Health Protection has stated that “Since Feb 10, 2014, Public Health Laboratory Services Branch has adopted new genetic tests … this transition … may bring about increases in detection of and percentage positive for influenza viruses” (CHP, 2014: 2).
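The outbreak definition can be illustrated with a short Python sketch. The text does not state which percentile estimator was used; the inclusive linear-interpolation method below is an assumption, as is the function name `outbreak_weeks`.

```python
from statistics import quantiles


def outbreak_weeks(weekly_admissions):
    """Flag weeks whose admissions exceed that year's 75th percentile.

    weekly_admissions: {(year, week): admissions total}.
    Returns the set of (year, week) keys flagged as outbreak weeks.
    """
    by_year = {}
    for (year, _week), total in weekly_admissions.items():
        by_year.setdefault(year, []).append(total)

    # Third quartile per year (assumed estimator: inclusive interpolation).
    threshold = {
        year: quantiles(totals, n=4, method="inclusive")[2]
        for year, totals in by_year.items()
    }
    return {
        key for key, total in weekly_admissions.items()
        if total > threshold[key[0]]
    }
```

Every day that falls within a flagged week would then take the value 1 for the influenza indicator used in the model.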
2.4. DECH metric
As initially proposed by Lin et al. (2017c), the DECH metric is defined as “daily concentration hours > 25 µg/m3 … [where] for example, an hour with a mean concentration of 28.5 µg/m3 contributes 3.5 concentration-hours to the daily total; and hours with average concentration lower than 25 µg/m3 contribute zero … to the daily total”. The boundary of 25 µg/m3 was chosen by Lin et al. (2017c) based on guidelines published by the World Health Organization (2006). DECH values were calculated for each day on a per-station basis, and the mean DECH across all available stations was then used as the regional DECH for that day.
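The definition translates directly into a few lines of code. This is a minimal Python sketch; the function names are hypothetical, and treating a station with no readings on a given day as unavailable is an assumption.

```python
def station_dech(hourly, threshold=25.0):
    """Sum of hourly exceedances above the threshold for one station-day."""
    return sum(max(c - threshold, 0.0) for c in hourly if c is not None)


def regional_dech(hourly_by_station, threshold=25.0):
    """Mean of per-station DECH over stations with at least one reading."""
    values = [
        station_dech(hourly, threshold)
        for hourly in hourly_by_station.values()
        if any(c is not None for c in hourly)
    ]
    return sum(values) / len(values) if values else None
```

As in the quoted example, an hour at 28.5 µg/m3 contributes 3.5 concentration-hours, and an hour at or below 25 µg/m3 contributes nothing.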
2.5. Statistical model
A single model specification was built and then applied to different segments of the time series. To maximize consistency and reproducibility, a generalized additive model (GAM) with a quasi-Poisson distribution was specified in accordance with Lin et al. (2017c). The aim of this model was to relate daily circulatory mortality (a count variable) to PM2.5 concentrations. From the estimated coefficient on the DECH term, the percent change in relative mortality risk associated with a given change in DECH PM2.5 levels can be calculated.
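For a model with a log link, the conversion from the DECH coefficient to a percent change in risk is usually written as (exp(β·Δ) − 1) × 100. The Python sketch below assumes this standard form. The increment `delta` (for example, an interquartile range of DECH) and the normal-approximation interval are illustrative assumptions, not values taken from the study.

```python
import math


def excess_risk_percent(beta, se, delta, z=1.96):
    """Percent change in mortality risk per `delta` increase in DECH.

    beta, se: coefficient and standard error of the DECH term (log scale).
    Returns (estimate, lower, upper) using a normal-approximation interval.
    """
    def to_percent(b):
        return (math.exp(b * delta) - 1.0) * 100.0

    return to_percent(beta), to_percent(beta - z * se), to_percent(beta + z * se)
```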
The specific statistical model is as follows, where the time series Y is indexed by day t, so that E[Yt] is the expected daily circulatory mortality on day t:
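A form consistent with the term definitions that follow, and with the log-link GAM attributed to Lin et al. (2017c), is sketched here. The degrees of freedom df_1 to df_4 are placeholders rather than values from the source. The choice of which covariates enter through s() is an assumption, α is written as an additive constant, and the categorical terms INFL, DOW, and PH are written without explicit coefficients.

```latex
\log\bigl(E[Y_t]\bigr) = \alpha + \beta\,\mathrm{DECH}_{t-l}
  + s(t,\ df_1) + s(\mathrm{MT}_{t},\ df_2)
  + s(\mathrm{MT}_{1\text{--}3},\ df_3) + s(\mathrm{MRH}_{t},\ df_4)
  + \mathrm{INFL}_t + \mathrm{DOW}_t + \mathrm{PH}_t
```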
DECH is the mean daily measure for PM2.5 described in section 2.4, here lagged by 3 days. DECH(-l) denotes DECH lagged l days from t, as described in Lin et al. (2017c), reflecting that acute mortality occurs hours to days after initial exposure to elevated levels. MT is the mean temperature (degrees Celsius) at lag 0, and MT1-3 is the moving average of MT over lag days 1 through 3; this term was included for the same reason that DECH is lagged. MRH is mean relative humidity (%) at lag 0. INFL is a dummy variable that takes the value 1 when day t falls within a week designated as an “outbreak” week, as described in section 2.3, and 0 otherwise. DOW is day of the week, a categorical variable coded 0 to 6 for Monday through Sunday. PH is a dummy variable indicating a public holiday on day t, where 0 indicates no holiday and 1 indicates a holiday (including Sunday, as defined by the Hong Kong government). The temporal index t was included to account for the clear trend and seasonality described in section 1.1, and α is a random error term. The model incorporates smooth functions, s(), fitted as penalized regression splines. Degrees of freedom were chosen in accordance with standards described in Lin et al. (2017c) and Tian et al. (2013).
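The lagged DECH term and the MT1-3 moving average can be built from the daily series as sketched below in Python. The function names are hypothetical, days without enough history are returned as None, and missing values inside a window are not handled.

```python
def lagged(series, lag):
    """Value of the series `lag` days before each day t (None if unavailable)."""
    return [series[t - lag] if t - lag >= 0 else None for t in range(len(series))]


def moving_average_lags(series, first=1, last=3):
    """Mean of the series over lag days `first` through `last` before day t."""
    out = []
    for t in range(len(series)):
        if t - last < 0:
            out.append(None)  # not enough history for a full window
        else:
            window = series[t - last : t - first + 1]
            out.append(sum(window) / len(window))
    return out
```

With daily mean temperatures in `mt` and daily DECH in `dech`, `moving_average_lags(mt)` would give MT1-3 and `lagged(dech, 3)` would give the 3-day lagged DECH used for circulatory mortality.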
2.6. Model DECH lags
In the above model, the DECH lag l was 2 days when applied to all mortalities, 3 days when applied to circulatory system mortalities, and 2 days when applied to respiratory system mortalities. These lags were chosen to match the lags that Lin et al. (2017c) identified as significant and used. For the newly added cause groups, a 1-day lag was applied to mental condition mortalities and nervous system mortalities, and a 0-day lag was applied to skin mortalities (Ho et al., 2018).
2.7. Model objectives
The data sources and model were carefully constructed to replicate the methods described in Lin et al. (2017c). That study fitted three models, one for each of the all-cause, circulatory system, and respiratory system mortality groups, using data from 1998–2011. The data used in the current study spanned 1999–2019, facilitating testing and validation of the results over a longer period. Three additional mortality groups were also incorporated into the current study: mental and behavioral, nervous system and sense organs, and skin. Notably, 1998 data are absent because fine suspended particulate (FSP) data were not available from the Environmental Protection Department for that year. It is unclear how other reports were able to include these data.
Part A of this study aimed to directly replicate the results reported by Lin et al. (2017c) within the same time series, and Part B aimed to investigate validity beyond the fitted time series. In Part B, the 13-year model from Part A was refitted on a sliding window basis, with window start years running from 1999 through 2007 and the corresponding end years running from 2011 through 2019. This generated 9 models with which to test the significance of the model on newer, out-of-sample data (data from 2012–2019). In Part C, to test shorter-term changes in DECH, models were fitted to 4-year periods on a sliding window basis from 1999 to 2019 inclusive, yielding fitted models across mortality groups for year ranges from 1999–2002 through 2016–2019. In Part D, three additional models were incorporated for the mortality groups mental and behavioral, nervous system and sense organs, and skin, using 5-year periods on a sliding window basis from 1999 to 2019 inclusive. This resulted in fitted models across these mortality groups for year ranges from 1999–2003 through 2015–2019.
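The window layouts for Parts B, C, and D can be enumerated with a short helper. This Python sketch assumes that each window advances by one calendar year, which is consistent with the 9 models reported for Part B. The function name `sliding_windows` is hypothetical.

```python
def sliding_windows(first_year, last_year, width):
    """Inclusive (start_year, end_year) pairs, advancing one year at a time."""
    return [
        (start, start + width - 1)
        for start in range(first_year, last_year - width + 2)
    ]


part_b = sliding_windows(1999, 2019, 13)  # 1999-2011 through 2007-2019
part_c = sliding_windows(1999, 2019, 4)   # 1999-2002 through 2016-2019
part_d = sliding_windows(1999, 2019, 5)   # 1999-2003 through 2015-2019
```

Under the one-year step assumption this gives 9 windows for Part B, 18 for Part C, and 17 for Part D, with the model refitted to the data inside each window.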