Data sources
Data on Covid-19 were collected from the GitHub repository of the Johns Hopkins Center for Systems Science and Engineering 14. Information on governmental measures (school and university closures) was acquired from the UNESCO database 15.
Climatic parameters were taken from the dataset of the NASA Langley Research Center (LaRC) POWER Project 16.
Demographic estimates (population size, land area, population density) were obtained from the United Nations population estimates and from the World Factbook of the Central Intelligence Agency (CIA) 17.
Geoclimatic categories
To evaluate the relation between Covid-19 and the geoclimatic environment, the world was divided into five geoclimatic zones according to the updated Köppen-Geiger classification: polar, cold-temperate, warm-temperate, arid, and equatorial 18.
Countries considered
For the SARS-CoV-2 analysis, all 193 UN member countries were taken into account. Among these, 16 small countries, with a population below one million and a density of less than 100 people/km2, and 18 countries with insufficient data on Covid-19 were excluded. Conversely, all 50 states of the United States of America were considered individually. Therefore, a total of 209 countries were included in the present study.
Data acquisition
A total of 134,871 data points were acquired from the sources mentioned above and entered into a Microsoft Excel spreadsheet (Supplementary table S1). Fifteen variables were considered for the analysis and organized into the three groups reported below.
1) Demographic: population size (number of people), land area (km2), population density (people/km2).
2) Climatic: climatic zone (one to five), temperature at two meters (°C), solar irradiation (MJ/m2/day), relative humidity (%), wind speed at two meters (m/s), surface pressure (kPa), precipitation (mm/day).
3) Covid-19: date of the first confirmed case, number of new weekly cases, number of active weekly cases, weekly incidence (number of new weekly cases/population size, per 100,000) and weekly prevalence (number of active weekly cases/population size, per 100,000).
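The two derived Covid-19 variables can be illustrated with a minimal Python sketch. The function name `rate_per_100k` and the numbers used are hypothetical and are not taken from the study data; the sketch only restates the definitions given in the list above.

```python
def rate_per_100k(weekly_count: int, population: int) -> float:
    """Express a weekly case count per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return weekly_count * 100_000 / population


# Hypothetical figures for one country-week (illustration only).
population = 5_000_000
new_weekly_cases = 1_250
active_weekly_cases = 4_000

# Weekly incidence uses new cases; weekly prevalence uses active cases.
weekly_incidence = rate_per_100k(new_weekly_cases, population)
weekly_prevalence = rate_per_100k(active_weekly_cases, population)

print(f"incidence:  {weekly_incidence:.1f} per 100,000")
print(f"prevalence: {weekly_prevalence:.1f} per 100,000")
```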
Data processing
To ensure that the data collected met the purposes of the study, a set of specific criteria was established for the selection of the appropriate sample, and separate studies were performed to confirm the appropriateness of these choices. In particular:
1) Data on weekly new cases and active cases of SARS-CoV-2 infection were collected for a period of 16 weeks. Since the infection did not begin simultaneously in all countries, the data collected for each country start from its first documented case.
2) To evaluate the relationship between Covid-19 and climatic factors, epidemic and climatic data had to be matched geographically. Within a country, climatic conditions can vary considerably across regions. Therefore, one to four cities, one for each of the regions most affected by Covid-19, were chosen for each country. The weekly average of each of the six climatic variables was then calculated for every city. Finally, the weekly means of all the cities were averaged to obtain the six national weekly values. The process was repeated for all the weeks considered.
3) To evaluate the relationship between Covid-19 and climatic factors, a time shift between the climatic data and the virologic data also had to be taken into consideration. The incubation period, the delay between symptom onset and testing, and the delay in communicating the result all separate the moment of exposure from the publication of the virologic data. In line with the literature 19–21, a lag time of two weeks was adopted in the present study.
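Steps 2 and 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually run for the study: the function names, the city labels, and all numbers are hypothetical, weeks are taken as consecutive blocks of seven daily readings, and the lag is read as pairing the climate of week t with the virologic data of week t + 2.

```python
from statistics import mean


def weekly_means(daily_values, days_per_week=7):
    """Average consecutive blocks of daily readings into weekly means."""
    last_start = len(daily_values) - days_per_week
    return [
        mean(daily_values[i:i + days_per_week])
        for i in range(0, last_start + 1, days_per_week)
    ]


def national_weekly_means(city_daily_series):
    """Average the weekly means of the selected cities, week by week."""
    per_city = [weekly_means(series) for series in city_daily_series.values()]
    n_weeks = min(len(weeks) for weeks in per_city)
    return [mean(city[w] for city in per_city) for w in range(n_weeks)]


def pair_with_lag(climate_weekly, epidemic_weekly, lag_weeks=2):
    """Pair the climate value of week t with the epidemic value of week t + lag."""
    n_pairs = min(len(climate_weekly), len(epidemic_weekly) - lag_weeks)
    return [
        (climate_weekly[t], epidemic_weekly[t + lag_weeks])
        for t in range(n_pairs)
    ]


# Hypothetical daily temperatures (°C) for two cities over four weeks.
city_daily = {
    "city_a": [10.0] * 7 + [12.0] * 7 + [14.0] * 7 + [16.0] * 7,
    "city_b": [20.0] * 7 + [22.0] * 7 + [24.0] * 7 + [26.0] * 7,
}
# Hypothetical weekly incidence (per 100,000) for the same country.
incidence = [1.0, 2.0, 3.0, 4.0]

national = national_weekly_means(city_daily)
pairs = pair_with_lag(national, incidence, lag_weeks=2)
print(national)
print(pairs)
```

With a lag of two weeks, the last two climate weeks have no matching virologic observation and drop out of the paired sample.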
Data analysis
Data on SARS-CoV-2 were assembled into a balanced panel dataset of 209 countries, covering the first to the sixteenth week of the outbreak in each country. Because the data were skewed (i.e., not normally distributed), a logarithmic transformation was applied to the analyzed variables. To evaluate the pairwise associations between incidence, prevalence, population density, and meteorological variables, Spearman's rank correlation coefficient was calculated, considering a lag of two weeks between climatic data collection and virologic data acquisition.
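The two operations described above can be sketched with the Python standard library. The study itself used Stata, so this is only an illustration: the function names, the `offset` added before taking logarithms (an assumption made here to handle zero counts), and the sample values are all hypothetical. The two series are assumed to be already aligned with the two-week lag.

```python
import math


def log_transform(values, offset=1.0):
    """Natural log of (value + offset); the offset is an assumption for zeros."""
    return [math.log(v + offset) for v in values]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical lag-aligned weekly values (illustration only).
temperature = [5.0, 10.0, 15.0, 20.0, 25.0]
incidence = [40.0, 30.0, 35.0, 10.0, 5.0]

rho_raw = spearman(temperature, incidence)
rho_log = spearman(temperature, log_transform(incidence))
print(round(rho_raw, 3), round(rho_log, 3))
```

Because Spearman's coefficient depends only on ranks, a monotonic transformation such as the logarithm leaves it unchanged; the transformation matters for the model fitted afterwards rather than for this coefficient.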
To analyze the relations among the above-mentioned variables simultaneously, a theoretical path diagram was hypothesized (figure 1). In this theoretical path, the meteorological factors were assumed to be correlated with each other and related to an unobserved variable, labeled climate. In addition, climate and population density were assumed to be predictors of incidence and prevalence, whose covariation is expressed by an arc.
To test this theoretical path and convert it into a set of equations, the authors applied structural equation modeling (SEM), a broad and flexible statistical technique for modeling chains of causal effects simultaneously. Using a confirmatory (hypothesis-testing) approach, this technique examines the relationships between observed variables and unobserved (latent) variables, which are in turn linked to observed variables, their indicators.
The graphical representation of an SEM is a path diagram, a kind of flow chart that uses boxes and ellipses linked by arrows. Observed variables are represented by boxes and latent variables by ellipses. Straight single-headed arrows express causal relations, and double-headed curved arrows express correlations or covariances (without a causal interpretation). The values on the straight arrows between latent and observed variables represent the standardized path coefficients, and those on the arrows between latent variables and their observed indicators represent the factor loadings. The circles with short arrows pointing to their respective measured variables denote random error and residual terms (sources of systematic variance not due to the variable).
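As an illustration of how such a diagram translates into equations, the hypothesized path could be written in the general form below. This is a sketch and not the fitted model: it assumes that the six meteorological variables act as reflective indicators of the latent climate variable, and all symbols are introduced here for illustration ($x_j$ for the meteorological indicators, $\eta$ for climate, $d$ for population density, $\lambda_j$ for factor loadings, $\beta$ and $\gamma$ for path coefficients, $\varepsilon_j$ and $\zeta$ for error and residual terms).

```latex
\begin{align}
  % Measurement part: each meteorological indicator loads on the latent climate variable
  x_j &= \lambda_j \, \eta + \varepsilon_j, \qquad j = 1, \dots, 6 \\
  % Structural part: climate and population density as predictors of the two outcomes
  y_{\mathrm{inc}}  &= \beta_1 \, \eta + \gamma_1 \, d + \zeta_1 \\
  y_{\mathrm{prev}} &= \beta_2 \, \eta + \gamma_2 \, d + \zeta_2 \\
  % Arc between incidence and prevalence: covariance of their residuals
  \operatorname{Cov}(\zeta_1, \zeta_2) &= \psi_{12}
\end{align}
```

Correlations among the meteorological factors would correspond to nonzero covariances between some of the $\varepsilon_j$ terms.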
To evaluate the adequacy of the model, the following fit indices were considered 22: a) the coefficient of determination (CD), which is similar to the R-squared value and ranges from 0 to 1, with values close to 1 indicating a good fit; b) the root mean square error of approximation (RMSEA), with RMSEA < 0.08 indicating a good fit; and c) the standardized root mean square residual (SRMR), with SRMR < 0.08 indicating an adequate fit.
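For orientation, two of these indices can be computed by hand as in the Python sketch below. The formulas are commonly used textbook forms and may differ in detail from what a given statistical package reports (for example, some use N rather than N - 1 in the RMSEA). The function names and all input values are hypothetical and are not results of the present study.

```python
import math


def rmsea(chi2, df, n_obs):
    """RMSEA from the model chi-square, its degrees of freedom, and sample size."""
    if df <= 0 or n_obs <= 1:
        raise ValueError("df must be positive and n_obs greater than 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))


def srmr(observed_corr, implied_corr):
    """Root mean square of residual correlations over the lower triangle,
    diagonal included."""
    p = len(observed_corr)
    squared = [
        (observed_corr[i][j] - implied_corr[i][j]) ** 2
        for i in range(p)
        for j in range(i + 1)
    ]
    return math.sqrt(sum(squared) / (p * (p + 1) / 2))


def meets_cutoffs(rmsea_value, srmr_value, cutoff=0.08):
    """Apply the cutoffs stated in the text: both indices below 0.08."""
    return rmsea_value < cutoff and srmr_value < cutoff


# Hypothetical inputs (illustration only).
example_rmsea = rmsea(chi2=85.0, df=25, n_obs=3001)
example_srmr = srmr(
    observed_corr=[[1.0, 0.5], [0.5, 1.0]],
    implied_corr=[[1.0, 0.4], [0.4, 1.0]],
)
print(round(example_rmsea, 4), round(example_srmr, 4))
print(meets_cutoffs(example_rmsea, example_srmr))
```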
The SEM was fitted by maximum likelihood estimation (MLE), and a p-value below 0.05 was considered statistically significant. All statistical analyses were performed using Stata 14.0 (StataCorp, College Station, TX) 23.