Survey-experiment design
Survey flow
Informed consent was obtained from all participants before trained interviewers began each survey, and the questionnaire survey guaranteed participants' privacy rights. This study was approved by the Research Ethics Committee of China University of Geosciences in Beijing [CUGB-EC-2022001]. We confirm that all experiments were performed in accordance with relevant guidelines and regulations.
Our questionnaire data were derived from a field survey (N = 568) conducted in 64 gyms across the 16 administrative districts of Beijing from 25 July to 30 September 2021. We randomly selected 24 gyms in urban areas, 24 gyms in suburban areas and 16 gyms in exurban areas, and used convenience sampling to select 20 respondents at each gym for a questionnaire interview. The response rate was nearly 50% in urban and suburban areas but only nearly 30% in exurban areas, because fewer people exercised in gyms there. We then excluded respondents who had lived in Beijing for less than three years and those with no gym exercise habits, leaving 488 of the 568 completed questionnaires for analysis.
Questionnaire design and measurement of air risk perceptions
The questionnaire used in this investigation was purpose-designed, incorporated review and input from researchers, and was administered through a professional questionnaire app (Wenjuanxing). The survey collected the following information from every respondent: (1) gym sports experience and daily gym exercise information; (2) air risk perceptions when travelling outside; (3) potential changes in sports mode under air pollution; and (4) subjective gym sports frequencies in P1-P5 of the COVID-19 epidemic.
We measured air risk perceptions with five questions (questions 8 to 12 of the questionnaire) on a five-point Likert scale, where “1” signified the least favorable response and “5” the most favorable response. These five questions covered four dimensions: air satisfaction (the degree of satisfaction with air quality), air concern (the degree of concern about air pollution, an additive index of concern about health and concern about going out on polluted days), air check (the frequency of checking the air quality index on a phone before going out) and gym substitution (the degree to which gym sports substituted for outdoor sports on polluted days). The complete questionnaire and variable descriptions are provided in ‘Questionnaire design and basic statistics’ and Supplementary Table 1 of the Supplementary Material.
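As a minimal sketch of how the five Likert items could be collapsed into the four dimensions, the snippet below builds the additive air concern index. The item names are hypothetical placeholders; the actual mapping of questions 8 to 12 onto the items is given in Supplementary Table 1, not here.

```python
# Hypothetical item names; the real item-to-question mapping is in Supplementary Table 1.
LIKERT_ITEMS = (
    "air_satisfaction",
    "concern_health",
    "concern_going_out",
    "air_check",
    "gym_substitution",
)


def air_risk_dimensions(response):
    """Collapse five 1-5 Likert items into the four air risk perception dimensions."""
    for item in LIKERT_ITEMS:
        score = response[item]
        if not 1 <= score <= 5:
            raise ValueError(f"{item} must be on the 1-5 scale, got {score}")
    return {
        "air_satisfaction": response["air_satisfaction"],
        # Additive index: concern about health plus concern about going out.
        "air_concern": response["concern_health"] + response["concern_going_out"],
        "air_check": response["air_check"],
        "gym_substitution": response["gym_substitution"],
    }


# Illustrative respondent, not survey data.
example = {
    "air_satisfaction": 2,
    "concern_health": 4,
    "concern_going_out": 5,
    "air_check": 3,
    "gym_substitution": 4,
}
dimensions = air_risk_dimensions(example)
```

Under this construction the air concern index ranges from 2 to 10, while the other three dimensions stay on the original 1 to 5 scale.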
Actual PM2.5 concentration estimates
PM2.5 data collection
To evaluate air quality, hourly PM2.5 data in Beijing from 1 January 2019 to 31 December 2020 were obtained from the Beijing Municipal Environmental Monitoring Center39. Since 2013, Beijing has had 35 environmental monitoring stations (EMSs). The botanical garden EMS, one of the twelve urban EMSs, was excluded from this study, and the remaining 34 EMSs were used to obtain hourly PM2.5 concentrations in accordance with the Ambient Air Quality Standards (GB 3095-2012)40.
LUR model under COVID-19
Predictor variables. We built the LUR models from the average PM2.5 concentrations of P1-P5 and a set of independent variables: land use, digital elevation model (DEM), meteorological variables, road length, population density, remote sensing PM2.5 data, normalized difference vegetation index (NDVI), aerosol optical depth (AOD) and points of interest (POI). These nine driving factors were selected on the basis of their significant impact on PM2.5 concentrations41, 42.
Land use data were derived from Esri 2020 Land Cover with a spatial resolution of 10 m (https://livingatlas.arcgis.com/landcover/). Land use types were divided into cropland, woodland, grassland, water bodies, construction land and unused land. The specific indicator of the land factor was the ratio of land area to buffer area (the area of a circular buffer).
Elevation data were obtained from the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model, version 2 (ASTER GDEM v2) with a spatial resolution of 30 m (https://yceo.yale.edu/aster-gdem-global-elevation-data).
Meteorological data were extracted from the ERA-Interim reanalysis dataset of the European Centre for Medium-Range Weather Forecasts (ECMWF), with a spatial resolution of 0.125° × 0.125°. We downloaded daily means of 2 m temperature, U wind speed, V wind speed, boundary layer height, surface pressure and total precipitation, and aggregated them in Python into period means for P1-P5 for later modelling43.
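The aggregation of daily values into period means can be sketched as follows. This is only an illustration: the period boundaries and temperature values below are hypothetical placeholders, not the study's P1-P5 dates or ERA-Interim data.

```python
from datetime import date
from statistics import mean

# Hypothetical period boundaries (inclusive); the study's P1-P5 dates are defined elsewhere.
PERIODS = {
    "P1": (date(2020, 1, 1), date(2020, 1, 3)),
    "P2": (date(2020, 1, 4), date(2020, 1, 5)),
}


def period_means(daily_values, periods):
    """Average a mapping of date -> daily value over each inclusive period."""
    result = {}
    for name, (start, end) in periods.items():
        in_period = [v for d, v in daily_values.items() if start <= d <= end]
        if in_period:  # skip periods with no observations
            result[name] = mean(in_period)
    return result


# Illustrative daily 2 m temperatures in degrees Celsius, not real data.
daily_temperature = {
    date(2020, 1, 1): -2.0,
    date(2020, 1, 2): 0.0,
    date(2020, 1, 3): 2.0,
    date(2020, 1, 4): 1.0,
    date(2020, 1, 5): 3.0,
}
means = period_means(daily_temperature, PERIODS)
```

The same function would apply to each of the six meteorological variables in turn, one mapping of daily values per variable.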
Road data were obtained from OpenStreetMap (https://www.openstreetmap.org). The specific road factor indicator was the ratio of road length to buffer area.
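Both the land use and road indicators are ratios to the area of a circular buffer. A minimal sketch of that arithmetic is shown below, assuming the land area and road length inside the buffer have already been measured in a GIS; the function names and the 1000 m radius are illustrative assumptions.

```python
import math


def buffer_area(radius_m):
    """Area of a circular buffer in square metres."""
    return math.pi * radius_m ** 2


def land_use_ratio(class_area_m2, radius_m):
    """Share of the buffer covered by one land use class (dimensionless)."""
    return class_area_m2 / buffer_area(radius_m)


def road_ratio(road_length_m, radius_m):
    """Road length per unit buffer area (metres per square metre)."""
    return road_length_m / buffer_area(radius_m)


# Illustrative values: a 1000 m buffer with a quarter of its area under
# construction land and 5 km of roads inside it.
radius = 1000.0
construction_share = land_use_ratio(0.25 * buffer_area(radius), radius)
roads = road_ratio(5000.0, radius)
```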
Remote sensing PM2.5 data were provided by the MODIS/Terra + Aqua Level 3 (L3) Yearly 0.01-degree gridded ground-level PM2.5 products in Eastern China (ECHAP_PM2.5_Y1K) from 2019 to 2020 (https://zenodo.org/record/4660858#.YU2JKux0IdU)44, 45.
Population density data were based on the Gridded Population of the World (GPW) data from the Columbia University Socioeconomic Data and Applications Center (CU2020), provided as raster data with a resolution of 1 km × 1 km46.
Normalized difference vegetation index (NDVI) data were provided by the MOD13A2 Version 6 product, a MODIS/Terra Vegetation Indices 16-Day L3 Global 1 km SIN Grid with a spatial resolution of 1 km (https://lpdaac.usgs.gov/products/mod13a2v006/). The data were averaged over the corresponding periods of P1-P5.
Aerosol optical depth (AOD) data were obtained from the MCD19A2 Version 6 data product at 1 km pixel resolution (https://lpdaac.usgs.gov/products/mcd19a2v006/)47.
Point-of-interest (POI) data were derived from Amap through its API, queried by category and keyword semantics (http://lbs.amap.com/api/webservice/guide/api/search/). Different POI categories at different buffer sizes carry different information about PM2.5 emissions. In this study, we used four types of POI as pollutant emission sources: bus stations, gas stations, polluting enterprises and Chinese restaurants. To reflect the impact of local emission sources and possible regional transmission, the buffer sizes for POI were set from 100 m to 7000 m48. In total, this study considered 29 subcategories comprising 103 independent variables within the nine major independent variable categories (Supplementary Tables 7-8).
Model building. A five-step backward method was adopted for fitting the LUR model49-51. First, bivariate correlation analysis with the average PM2.5 concentrations of P1-P5 was conducted for all available potential predictor variables (Supplementary Table 9). Second, the predictor variables were sorted by adjusted R2. Third, within each subcategory, variables highly correlated (Pearson correlation coefficient R ≥ 0.7) with the highest-ranked variable were removed. Fourth, all remaining predictor variables and the PM2.5 concentration were entered into a stepwise linear regression to obtain the multiple linear regression equation. Finally, the significance level (p < 0.05) and variance inflation factor (VIF < 4) of each predictor variable were checked to confirm significance and rule out multicollinearity. Leave-one-out cross-validation (LOOCV) and 10-fold cross-validation (CV) were used to evaluate the predictive capacity of the model, with the cross-validated R2 and root-mean-squared error (RMSE) used to evaluate and compare model performance52-53 (Supplementary Table 10).
Model results. On the scale of the epidemic periods P1-P5, the predictions of the five LUR models were strongly correlated with the independent in situ PM2.5 values according to the LOOCV and 10-fold CV results (Supplementary Table 10). After the final regression models were obtained, we generated a regular 1 km × 1 km grid and calculated the predicted PM2.5 concentration at each grid point. Kriging interpolation was then conducted to generate simulated PM2.5 concentration distribution maps of Beijing for P1-P5. Finally, we extracted the PM2.5 concentration values at the 64 gyms in P1-P5 as the PM2.5 exposure concentrations during P1-P5 of the COVID-19 pandemic, for use in the follow-up analysis of the effect of air pollution on gym visits under COVID-19.
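Two of these steps, the collinearity screen at R ≥ 0.7 and the leave-one-out evaluation, can be sketched in plain Python. This is a simplified illustration rather than the study's procedure: it ranks predictors by absolute Pearson correlation instead of adjusted R2, ignores subcategories, and cross-validates a single-predictor regression rather than the stepwise multiple regression. All variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_predictors(predictors, target, threshold=0.7):
    """Rank predictors by |r| with the target, then drop any predictor whose
    |r| with an already kept, higher-ranked predictor reaches the threshold."""
    ranked = sorted(
        predictors,
        key=lambda name: abs(pearson(predictors[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(predictors[name], predictors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def loocv_simple_ols(x, y):
    """Leave-one-out CV for y = a + b*x; returns (R2, RMSE) of held-out predictions."""
    preds = []
    for i in range(len(x)):
        xt, yt = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        mx, my = sum(xt) / len(xt), sum(yt) / len(yt)
        slope = sum((a - mx) * (b - my) for a, b in zip(xt, yt)) / sum(
            (a - mx) ** 2 for a in xt
        )
        preds.append(my + slope * (x[i] - mx))  # predict the held-out point
    ss_res = sum((p - t) ** 2 for p, t in zip(preds, y))
    mean_y = sum(y) / len(y)
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


# Illustrative values for six stations, not monitoring data.
pm25 = [40.0, 45.0, 52.0, 58.0, 63.0, 70.0]
predictors = {
    "road_1000m": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "road_2000m": [1.1, 2.1, 2.9, 4.2, 5.0, 6.1],  # nearly collinear with road_1000m
    "ndvi": [0.6, 0.2, 0.5, 0.1, 0.4, 0.3],
}
kept = screen_predictors(predictors, pm25)
r2, rmse = loocv_simple_ols(predictors["road_1000m"], pm25)
```

In this toy data the two road variables are almost perfectly correlated, so only the higher-ranked one survives the screen alongside `ndvi`.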
Gym data collection
Attribute information and location data of gyms were derived, respectively, from ‘Dianping.com’, a leading Chinese local lifestyle information and trading platform, and ‘Amap’, a leading provider of digital map content, navigation and location service solutions in China.
First, we collected attribute and location information for all gyms in the 16 districts of Beijing by using a Python toolkit to crawl the results of a keyword search for “gym” on Dianping.com. This yielded records for 3627 gyms in Beijing, including gym name, star rating, comments, per capita consumption and address, which could be used as control variables in the following regression models. From these records, we used the gym comments posted on Dianping.com in P1-P5 to describe Beijing’s gym visits under COVID-19. By using the count of reviews as a proxy for the count of gym visits, we implicitly assumed that the probability of writing a review was unrelated to the air pollution level. If customers were less likely to write reviews on polluted days even though some of them still exercised in gyms, the negative impact of pollution on gym visits would be overestimated, and vice versa. Second, to obtain the latitude and longitude of each gym, we crawled records for 2869 gyms in Beijing from Amap. Third, we merged the gym data from the two platforms by name and address. As a result, we extracted the attribute and location information of the 64 selected gyms and applied them in the following quantitative models.
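Merging records from the two platforms by name and address can be sketched as an inner join on a normalized key. The field names, the normalization rule and the sample records below are hypothetical; in practice, matching is likely to need fuzzier rules than exact equality after normalization.

```python
def normalize(text):
    """Lowercase and drop all whitespace so trivially different spellings match."""
    return "".join(text.lower().split())


def merge_gyms(dianping_rows, amap_rows):
    """Inner-join two lists of dicts on the normalized (name, address) pair."""
    amap_index = {
        (normalize(r["name"]), normalize(r["address"])): r for r in amap_rows
    }
    merged = []
    for row in dianping_rows:
        key = (normalize(row["name"]), normalize(row["address"]))
        match = amap_index.get(key)
        if match is not None:
            # Attach coordinates from the map platform to the review-platform record.
            merged.append({**row, "lat": match["lat"], "lon": match["lon"]})
    return merged


# Fictitious records for illustration only.
dianping = [
    {"name": "Alpha Gym", "address": "1 Example Road", "star_rating": 4.5},
    {"name": "Beta Fitness", "address": "2 Sample Street", "star_rating": 4.0},
]
amap = [
    {"name": "alpha gym", "address": "1  Example  Road", "lat": 39.9, "lon": 116.4},
]
merged = merge_gyms(dianping, amap)
```

Records that appear on only one platform, such as the second record here, drop out of the merged set.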
Statistical analysis
Fig. 1 shows the overall theoretical framework used for assessing the effect of air pollution (air risk perceptions/PM2.5 concentration) on gym sports behavior.
Association with air risk perceptions
To study the effect of people’s air risk perceptions (air satisfaction, air concern, air check and gym substitution) on their subjective sports choice on polluted days, we used a multiple linear regression approach through Eq. (1).
\(\text{Possibility}_i = \alpha + \beta_1\,\text{Air satisfaction}_i + \beta_2\,\text{Air concern}_i + \beta_3\,\text{Air check}_i + \beta_4\,\text{Gym substitution}_i + \beta X_i + \epsilon_i\) (1)
\(\text{Possibility}_i\) (i = 1, 2) represented the possibility of choosing gym sports or outdoor sports on polluted days, measured by question 15 and question 16 of the questionnaire, respectively. \(X_i\) represented gym controls (gym star rating, per capita consumption and gym comments in 2019–2020) and individual controls (area, gender, age, educational level, income, occupation type, subjective health state, years of sports in gyms and the log of total minutes spent in the gym per visit), which are all described in Supplementary Table 1.
To examine the impact of air risk perceptions on specific changes of sports mode on polluted days, multiple logistic regression analysis was employed through Eq. (2).
\(\text{Logit}(\text{CHOICE}_i) = \sum_{i=1}^{4} \ln\left(\frac{p(\text{CHOICE} \le i)}{1 - p(\text{CHOICE} \le i)}\right) = \alpha_i + \beta_1\,\text{Air satisfaction}_i + \beta_2\,\text{Air concern}_i + \beta_3\,\text{Air check}_i + \beta_4\,\text{Gym substitution}_i + \beta X_i\) (2)
\(\text{CHOICE}_i\) represented the four options for changing sports mode: switching from outdoor sports to gym sports, switching from gym sports to outdoor sports, switching from sports to non-sports, and keeping the original sports mode. These were measured by question 17 of the questionnaire. The control variables were the same as those of Eq. (1).
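To make the cumulative form in Eq. (2) concrete, the sketch below converts cut-points and a linear predictor into probabilities for the four options. It reads Eq. (2) as a cumulative logit with three increasing cut-points (the fourth cumulative probability being 1), which is an interpretive assumption, and the numeric values are hypothetical rather than estimates from this study.

```python
import math


def sigmoid(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(cutpoints, linear_predictor):
    """P(CHOICE = k) under logit P(CHOICE <= k) = alpha_k + eta,
    following the plus-sign convention written in Eq. (2).
    `cutpoints` must be increasing so that all probabilities are non-negative."""
    cumulative = [sigmoid(a + linear_predictor) for a in cutpoints] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs


# Hypothetical cut-points (alpha_1..alpha_3) and linear predictor (beta terms summed).
probs = category_probabilities([-1.0, 0.0, 1.5], 0.3)
```

The four probabilities sum to one by construction, since each is a difference of consecutive cumulative probabilities.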
Association with the actual air pollution concentration
Specific exposure assessments in P1-P5 under COVID-19. We investigated whether gym visits in Beijing were influenced by the actual PM2.5 concentration and climate conditions during P1-P5 of the COVID-19 epidemic. We used the gym comments posted on Dianping.com in P1-P5 to describe Beijing’s gym visits in P1-P5. We then used a negative binomial model to test how the count of gym visits varies as a function of air pollution and climate conditions:
\(\text{NUM}_{it} = \alpha_0 + \alpha_1\,\text{PM2.5}_{it} + \alpha_2 W_{it} + \alpha_3 X_{it} + T_t + \gamma_i + \epsilon_{it}\) (3)
\(\text{NUM}_{it}\) and \(\text{PM2.5}_{it}\) represented the number of comments and the PM2.5 concentration of gym i in COVID-19 period t. \(W_{it}\) represented weather conditions, namely the mean temperature and precipitation at the gym in P1-P5. \(X_{it}\) denoted the control variables, which consisted of the control variables of Eq. (1) plus changes in sports awareness from 2019 to 2020. \(T_t\) and \(\gamma_i\) controlled for COVID-19 wave fixed effects and gym area fixed effects. We clustered standard errors by respondent. Coefficient \(\alpha_1\) reflected how gym visits varied with pollution concentration and was expected to be positive.
To avoid collinearity with the PM2.5 data generated by the LUR model, the average temperature and precipitation for P1-P5 used in Eq. (3) were taken from a separate meteorological dataset: hourly temperature and precipitation from 18 standard meteorological stations of the Beijing Meteorological Informational Center.
In addition, we defined the variable COVID19 to denote the number of confirmed cases in the P1-P5 waves and introduced the interaction terms COVID19*PM, COVID19*PRE and COVID19*TEM into Eq. (3) to generate Eq. (4).
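As a rough illustration of the count model, the sketch below evaluates a negative binomial likelihood for review counts. It assumes the common NB2 parameterization with a log link, so the right-hand side of Eq. (3) would enter through the exponential; Eq. (3) itself is written in linear form and does not state the link. The coefficient and feature values are hypothetical, not estimates from this study, and fixed effects are omitted.

```python
import math


def nb2_log_pmf(count, mu, alpha):
    """Log-probability of `count` under NB2 with mean mu and dispersion alpha
    (variance = mu + alpha * mu**2)."""
    r = 1.0 / alpha
    return (
        math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
        + r * math.log(r / (r + mu))
        + count * math.log(mu / (r + mu))
    )


def expected_reviews(coefs, features):
    """Assumed log link: mu = exp(sum of coefficient * feature)."""
    return math.exp(sum(coefs[name] * value for name, value in features.items()))


# Hypothetical coefficients and one gym-period observation.
coefs = {"intercept": 1.0, "pm25": 0.004, "temperature": 0.01}
features = {"intercept": 1.0, "pm25": 50.0, "temperature": 20.0}
mu = expected_reviews(coefs, features)

# Sanity check of the distribution: probabilities sum to 1 and the mean equals mu.
alpha = 0.5
pmf = [math.exp(nb2_log_pmf(k, mu, alpha)) for k in range(500)]
total = sum(pmf)
mean_count = sum(k * p for k, p in enumerate(pmf))
```

A full fit would maximize the sum of `nb2_log_pmf` over all gym-period observations with respect to the coefficients and the dispersion, which is what standard statistical packages do internally.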
\(\text{NUM}_{it} = \alpha_0 + \alpha_1\,\text{COVID19}_t + \alpha_2\,\text{PM2.5}_{it} + \alpha_3 W_{it} + \alpha_4 X_{it} + \alpha_5\,\text{PM2.5}_{it} \cdot \text{COVID19}_t + \alpha_6 W_{it} \cdot \text{COVID19}_t + \alpha_7 X_{it} \cdot \text{COVID19}_t + T_t + \gamma_i + \epsilon_{it}\) (4)
\(\text{COVID19}_t\) represented the cumulative number of confirmed cases of COVID-19 in P1-P5. Coefficient \(\alpha_2\) reflected the effect of PM2.5 concentration on gym visits in the context of the COVID-19 epidemic and was expected to be positive.
Robustness check. As a robustness check, we replaced the PM2.5 variable with PM2.5 data obtained by Kriging interpolation of the MEP PM2.5 monitoring sites and with data from the US Embassy and Consulates, to verify the association of actual PM2.5 concentration with gym visits in P1-P5 of the COVID-19 pandemic54. In addition, subgroup analyses were conducted to test the heterogeneous effect of air pollution on gym sports behavior and to verify the reliability of all experiments.