4.1. Study region and data
We collected cumulative counts of confirmed COVID-19 cases in 487 US counties, as reported by the Johns Hopkins Coronavirus Resource Center (https://github.com/CSSEGISandData/COVID-19), from March 5, 2020 to May 1, 2020. We used three parameters to evaluate the spread of COVID-19: infection rate (IR), effective reproduction number (Rproxy), and compound growth rate (CG).
IR was calculated using the following equation:
where C(t) denotes the cumulative number of confirmed cases on day t, and N represents the total population of the county. IR during time interval T was calculated using the following equation:
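The two IR definitions can be illustrated with a minimal Python sketch. The exact form of the equations is an assumption here: IR on day t is taken as C(t)/N, and IR over an interval T as the increase in cumulative cases over that interval divided by N. Any scaling factor (for example, per 100,000 inhabitants) is not specified and is omitted. The function names and sample numbers are hypothetical.

```python
def infection_rate(cumulative_cases, population, day):
    """IR on one day, assumed to be C(t) / N."""
    return cumulative_cases[day] / population


def infection_rate_over_interval(cumulative_cases, population, start_day, interval):
    """IR over an interval T, assumed to be (C(t + T) - C(t)) / N."""
    new_cases = cumulative_cases[start_day + interval] - cumulative_cases[start_day]
    return new_cases / population


# Hypothetical county: cumulative confirmed cases on days 0..5, 100,000 inhabitants.
cases = [50, 60, 75, 90, 120, 150]
population = 100_000

ir_day0 = infection_rate(cases, population, 0)
ir_5days = infection_rate_over_interval(cases, population, 0, 5)
```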
The effective reproduction number (Rproxy) was estimated as described by Luo et al. 13. Briefly, a proxy for the reproductive number R over 5-day intervals was calculated from the cumulative incidence data for each county. This proxy, Rproxy, relates the cases reported from time t + d to time t + 2d to the cases reported from time t to time t + d, where d is the calculated serial interval (i.e., the interval between successive cases in a chain of disease transmission). For multiple time points t, values of Rproxy(t, d) were obtained using the equation below:
Taking d as 5, we estimated Rproxy over D days (where D > 10); for example, to calculate Rproxy over 17 days, we used the following formula:
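As an illustration of the verbal definition, the sketch below computes Rproxy(t, d) as the ratio of new cases in the window (t + d, t + 2d) to new cases in the window (t, t + d). This ratio form is an assumption based on the description in the text. How the individual values are combined into a single D-day estimate is not specified here, so the sketch only lists the values for every admissible t. Function names and data are hypothetical.

```python
def r_proxy(cumulative_cases, t, d=5):
    """Ratio of new cases in (t+d, t+2d] to new cases in (t, t+d] (assumed form)."""
    earlier = cumulative_cases[t + d] - cumulative_cases[t]
    later = cumulative_cases[t + 2 * d] - cumulative_cases[t + d]
    if earlier == 0:
        return None  # undefined when no cases were reported in the first window
    return later / earlier


def r_proxy_series(cumulative_cases, d=5):
    """Rproxy(t, d) for every t at which both windows fit inside the series."""
    last_start = len(cumulative_cases) - 2 * d - 1
    return [r_proxy(cumulative_cases, t, d) for t in range(last_start + 1)]


# Hypothetical series covering days 0..17 with constant growth of 10 cases per day.
cases = [50 + 10 * day for day in range(18)]
values = r_proxy_series(cases, d=5)
```

With a 17-day series and d = 5, the admissible start times are t = 0 to t = 7, which is why D must exceed 10 for at least one value to exist.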
Compound growth rate (CG) was calculated using the following equation 23:
where C1 represents the number of confirmed cases on the first day after the 50th case, C2 represents the number of confirmed cases on the last day of the investigation period, and D represents the duration (days) of the period.
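A minimal Python sketch of CG follows, assuming the standard compound growth form CG = (C2 / C1)^(1/D) - 1. This form matches the variable definitions above, but it is an assumption rather than a quotation of the equation used in the study. The function name and example values are hypothetical.

```python
def compound_growth_rate(first_count, last_count, duration_days):
    """Average per-day growth rate, assumed to be (C2 / C1) ** (1 / D) - 1."""
    if first_count <= 0 or duration_days <= 0:
        raise ValueError("first_count and duration_days must be positive")
    return (last_count / first_count) ** (1.0 / duration_days) - 1.0


# Hypothetical county: 52 cases on the first day, 832 cases 17 days later.
cg = compound_growth_rate(52, 832, 17)
```

Under this assumed form, a CG of 0.10 corresponds to case counts growing by 10% per day on average over the period.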
Meteorological data, including temperature, relative humidity, and atmospheric pressure, were collected from Reliable Prognosis (https://rp5.ru/Weather_in_the_world). We calculated absolute humidity for each county from temperature and relative humidity using the following formula, which is an approximation of the Clausius–Clapeyron equation 13:
where AH refers to absolute humidity, T is the temperature in degrees Celsius, RH is the relative humidity (%), and e is the base of the natural logarithm.
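For illustration, the sketch below uses a widely used approximation of this kind: AH = 6.112 × e^(17.67 T / (T + 243.5)) × RH × 2.1674 / (273.15 + T), which gives AH in g/m³. The coefficients are an assumption; they are consistent with the variables defined above but should be checked against the cited source. The function name is hypothetical.

```python
import math


def absolute_humidity(temp_c, rel_humidity_pct):
    """Absolute humidity in g/m^3 from temperature (deg C) and relative humidity (%).

    Uses an assumed, commonly cited approximation; coefficients are not taken
    from the text.
    """
    # Saturation vapor pressure in hPa (Magnus-type approximation).
    saturation_vapor_pressure = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    return saturation_vapor_pressure * rel_humidity_pct * 2.1674 / (273.15 + temp_c)


# Hypothetical observation: 20 deg C at 50% relative humidity.
ah = absolute_humidity(20.0, 50.0)
```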
We calculated average temperature (AT), average absolute humidity (AAH), and average atmospheric pressure (AAP) for the 17, 24, and 31 days after the 50th confirmed case in each county. County-level population data for the start of 2020 were obtained from the U.S. Census Bureau, and population density (PD, inhabitants/mi²) was calculated for each county. For statistical analysis, the population data were log-transformed.
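The windowed averages and the log transform can be sketched as follows. Several details are assumptions: the window is taken to start on the day after cumulative cases first reach 50, the logarithm is taken to be base 10, and the transform is applied to population density. All function names and values are hypothetical.

```python
import math


def first_day_reaching(cumulative_cases, threshold=50):
    """Index of the first day on which cumulative cases reach the threshold."""
    for day, total in enumerate(cumulative_cases):
        if total >= threshold:
            return day
    return None


def window_mean(daily_values, start_day, n_days):
    """Mean of the n_days observations following start_day (start_day excluded)."""
    window = daily_values[start_day + 1 : start_day + 1 + n_days]
    if len(window) < n_days:
        raise ValueError("not enough observations for this window")
    return sum(window) / n_days


def log_population_density(population, area_sq_mi):
    """Base-10 log of inhabitants per square mile (the base is an assumption)."""
    return math.log10(population / area_sq_mi)


# Hypothetical county data.
cases = [10, 30, 55, 70] + [80] * 36
temperatures = [float(day) for day in range(40)]

day_50 = first_day_reaching(cases)
averages = {n: window_mean(temperatures, day_50, n) for n in (17, 24, 31)}
log_pd = log_population_density(1000, 10)
```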
4.2. Statistical and modeling analysis
Statistical analysis was performed using the R statistical platform, v. 3.6.1 (The R Project for Statistical Computing, Vienna, Austria). First, univariate linear regression analysis was used to identify relationships between the measured environmental variables and the Rproxy, IR, and CG of COVID-19. R-squared values (R²) were calculated for each regression model to evaluate the percentage of variance in Rproxy, IR, and CG that could be explained by each environmental variable. Second, stepwise multiple linear regression (sMLR) models were developed. Model fit was assessed using R², and the Akaike information criterion (AIC) was used to determine whether to add or remove variables during the stepwise procedure 24. Before their use in sMLR, the data were standardized to their z-scores.
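The analysis itself was run in R. The following Python sketch only illustrates two of the building blocks named above: z-score standardization and the R² of a univariate least-squares fit. It assumes the sample standard deviation (n - 1 denominator), which is what R's scale() uses by default. The stepwise AIC selection is not reproduced. Function names and data are hypothetical.

```python
import math


def z_scores(values):
    """Standardize values to zero mean and unit sample standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]


def univariate_r_squared(x, y):
    """R-squared of the ordinary least-squares line y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / syy


# Hypothetical data: average temperature (deg C) and compound growth rate per county.
avg_temp = [5.0, 10.0, 15.0, 20.0, 25.0]
growth = [0.12, 0.11, 0.09, 0.08, 0.05]

r2 = univariate_r_squared(avg_temp, growth)
standardized_temp = z_scores(avg_temp)
```

For a univariate fit, R² is unchanged by standardizing either variable, so the z-score step matters mainly for comparing coefficients across predictors in the multiple regression.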