Geographical data
We collected addresses of the COVID-19 cases form two Application programming interfaces (API) from from January 18th to April 30th, 2020, and one of the Application Programming Interfaces (APIs) was based on the Tencent location-based service (https://ncov.html5.qq.com/api/getCommunity). The other was built by vuejs (JAVA SE 8, available in Github: https://github.com/hack-fang/ncov-map) from 510 Chinese Centers for Disease Control and Prevention (CDC) local website, local health commission website and WeChat public accounts. Only laboratory-confirmed cases of COVID-19 were reported in those websites or public accounts and data from the official reports of the health commission, CDC report or WeChat public accounts of 162 city-level administrative units was collected by the two APIs. We extracted the latitude and longitude coordinates, reported time and reported city of the COVID-19 cases from those two POIs using.
We collected the points-of-interest (POIs) dataset extracted from Autonavi (Gaode) by Tencent, a Chinese desktop web mapping service application that provides information on the names, locations, and types of various facilities [14]. Following previous studies and the classification of POI data itself [14], eight types of facilities were extracted from the POI dataset including restaurant (e.g. food restaurant, tea house and bakery), shopping center (e.g. supermarket, sports store, commercial street and clothing store), hotels, living facilities (e.g. travel agency, ticket office and job center), recreational facilities (e.g. sports stadium, theatre and cinema, park and square), public transit (railway station, coach station and subway station), education (e.g. museum, library and school) and health services (e.g. hospital, clinic and pharmacy) (Table S1). A total of 986,363 POIs were gathered in 162 cities. All the geocodes were coded using the geocoding GPS co-ordinates (Esri, ArcGIS 10.4)
Case-control design
We defined the case group as those communities with COVID-19 cases. We excluded Wuhan city due to its exceptionally high incidence of COVID-19 that made it difficult to find control neighborhoods and may bias the results. We also excluded neighborhoods with no or few nearby public facilities (total number across eight types < 8) because the results on public facilities may not apply to these neighborhoods. A total of 4,329 communities in 26 provinces were defined as cases. A radius of 1500-meter (approximately 1 mile) from the reported geocodes and the resulting circular areawas used to capture information on nearby public facilities. Although there is no consensus on “proximity,” this cut-off point has been widely used in research [15-17]. To determine the control neighborhood, we collected the latitude and longitude coordinates of all the neighborhood (N=127,995) in those regions first. Then, we moved the coordinates of the case communities 4.5 km (for 1.5 km buffer) to its north, south, east and west respectively and calculated the coordinates of the new points. We applied K-nearest-neighbor algorithm (Python3, Scikit-learn 0.23.2) to find the communities with nearest distance to each control coordinates but higher than 4.5 km from the case neighborhood. Those neighborhoods were selected as the control communities (Figure S1). If there is another case neighborhood in 4.5 km from the control neighborhood, we selected another control neighborhood farther than 4.5 km from the two case communities. We counted the number of each type of facilities in the 1.5 km buffers around both the case and control communities. We use Python3 to calculate the numbers of each types of facilities. For sensitivity analysis, we counted the number of the eight types of facilities in 800 m (Figure S1: R=800 m) and 1.2 km (Figure S1: R=1.2 km) buffer using the same methods as the 1.5 km buffer.
Co-variates
Population sizes, gross domestic product (GDP), unemployment rate, Government Budget Balance (GBB) for each city in 2018 were obtained from the China City Statistical Yearbook (http://olap.epsnet.com.cn/). Resident mobility (outflow and influx population flow in each city) were tracked with mobile phone data, through location-based services (LBS) employed by popular Tencent applications such as WeChat and QQ. Movement outflows from Wuhan City to other cities (i.e. records of the number of people leaving each day) by air, train and road, were obtained from the migration flows database (https://heat.qq.com/) (19). We use the average outflow and influx population flow (resident mobility) in each city from 18th January 2017 to 30th April as co-variates.
Statistical analysis
The eight types of facilities were compared according to the case-control neighborhood. Means and standard deviations (SD) were calculated for each types of facilities. Paired T-test and Mann-Whitney U test were applied to examine the difference in each types of facilities.
We fit mutivariable logistic regression models adjusted for city-level covariates including population size, Gross Domestic Product (GDP), unemployment rate, Government Budget Balance, and resident mobility recorded by Tencet applications (WeChat and QQ). Due to the high correlations between the eight types of facilities with Pearson’s correlation coefficients ranging from 0.62 to 0.92 (Table S2), we ran separate multivariable models for each of the eight types. As we had eight variables of interest, we applied the Bonferroni method to control for multiple comparisons and used the p-value of <0.006 to indicate statistical significance [18]. We also used multivariable logistic regression with penalized splines to evaluate the potentially non-linear associations of public facilities with having COVID-19 cases in the neighborhoods [19].
In subgroup analyses and interaction analyses, we dichotomized the public facilities by the median cutoffs and examined whether the associations between public facilities and having COVID-19 cases in the neighborhood differed by city population (<6 million versus ≥6 million, approximately the median value for city population size in our sample). We also conducted the sensitivity analyses using the 800 m and 1.2 km buffer in the case-control design to exam the robustness of our results.