Data
The objective was to quantify the epidemiological, public health, economic, and governmental intervention factors associated with Covid-19 spread worldwide. New Covid-19 cases per million per country per time step were used as a proxy for disease spread. New cases per million was chosen over new cases per country because it is less biased by total population size: countries with larger populations are likely to report more new or total cases, yet those counts may be low relative to the total population. Data on new Covid-19 cases per million from the “Our World in Data” database were analyzed. The dataset was last accessed on 25/05/2020 and is available at https://github.com/owid/covid-19-data/tree/master/public/data. The data derive from the European Centre for Disease Prevention and Control (ECDC), an EU agency that aims to strengthen Europe’s defenses against infectious diseases. The ECDC collects and aggregates data from countries around the world. The most up-to-date data for any particular country are therefore typically available earlier from the national health agencies than from the ECDC. This lag is short because the ECDC publishes new data daily; it is typically a matter of hours and less than a day. The ECDC collects, compiles, and harmonizes data from around the world in a consistent way, which allows comparison between countries. The dataset covered 160 countries (spatial replicates) and spanned 31/12/2019 to 25/05/2020 inclusive (temporal replicates). The variables included:
Table 1. Description of the variables employed in the analysis and their source.
Column | Description | Source
iso_code | ISO 3166-1 alpha-3 – three-letter country codes | International Organization for Standardization
location | Geographical location (Country) | Our World in Data
date | Date of observation | Our World in Data
new_cases_per_million | New confirmed cases of COVID-19 | European Centre for Disease Prevention and Control
total_tests | Total tests for COVID-19 | National government reports
new_tests | New tests for COVID-19 | National government reports
new_tests_smoothed | New tests for COVID-19 (7-day smoothed). For countries that don't report testing data on a daily basis, we assume that testing changed equally on a daily basis over any periods in which no data was reported. This produces a complete series of daily figures, which is then averaged over a rolling 7-day window | National government reports
total_tests_per_thousand | Total tests for COVID-19 per 1,000 people | National government reports
new_tests_per_thousand | New tests for COVID-19 per 1,000 people | National government reports
new_tests_smoothed_per_thousand | New tests for COVID-19 (7-day smoothed) per 1,000 people | National government reports
tests_units | Units used by the location to report its testing data | National government reports
stringency_index | Government Response Stringency Index: composite measure based on response indicators including school closures, workplace closures, and national and international travel bans, canceling public events and exiting home rescaled to a value from 0 to 100 (100 = strictest response) | Oxford COVID-19 Government Response Tracker, Blavatnik School of Government. Reference: [25]
population | Population in 2020 | United Nations, Department of Economic and Social Affairs, Population Division, World Population Prospects: The 2019 Revision
population_density | Number of people divided by land area, measured in square kilometers, most recent year available | World Bank – World Development Indicators, sourced from Food and Agriculture Organization and World Bank estimates
median_age | Median age of the population, UN projection for 2020 | UN Population Division, World Population Prospects, 2017 Revision
aged_65_older | Share of the population that is 65 years and older, most recent year available | World Bank – World Development Indicators, based on age/sex distributions of United Nations Population Division's World Population Prospects: 2017 Revision
aged_70_older | Share of the population that is 70 years and older in 2015 | United Nations, Department of Economic and Social Affairs, Population Division (2017), World Population Prospects: The 2017 Revision
gdp_per_capita | Gross domestic product at purchasing power parity (constant 2011 international dollars), most recent year available | World Bank – World Development Indicators, source from World Bank, International Comparison Program database
extreme_poverty | Share of the population living in extreme poverty, most recent year available since 2010 | World Bank – World Development Indicators, sourced from World Bank Development Research Group
cvd_death_rate | Death rate from cardiovascular disease in 2017 | Global Burden of Disease Collaborative Network, Global Burden of Disease Study 2017 Results
diabetes_prevalence | Diabetes prevalence (% of population aged 20 to 79) in 2017 | World Bank – World Development Indicators, sourced from International Diabetes Federation, Diabetes Atlas
female_smokers | Share of women who smoke, most recent year available | World Bank – World Development Indicators, sourced from World Health Organization, Global Health Observatory Data Repository
male_smokers | Share of men who smoke, most recent year available | World Bank – World Development Indicators, sourced from World Health Organization, Global Health Observatory Data Repository
handwashing_facilities | Share of the population with basic handwashing facilities on premises, most recent year available | United Nations Statistics Division
hospital_beds_per_100k | Hospital beds per 100,000 people, most recent year available since 2010 | OECD, Eurostat, World Bank, national government records and other sources
The methodology of the governmental stringency index [25] is explained here:
https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md
and here:
https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/index_methodology.md
Of the available variables, male_smokers and female_smokers were averaged into a single ‘smokers’ variable.
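The two derived quantities used in this section, cases scaled per million inhabitants and the averaged smokers variable, can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the smokers average is assumed to be an unweighted mean of the two shares, and returning no value when either share is missing is an assumption about how gaps were handled.

```python
from typing import Optional


def cases_per_million(new_cases: float, population: float) -> float:
    """Scale a raw case count by population size (cases per 1,000,000 people)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases * 1_000_000 / population


def average_smokers(female: Optional[float], male: Optional[float]) -> Optional[float]:
    """Unweighted mean of the female and male smoking shares (percent).

    Returns None when either share is missing (assumption: such records
    would be dropped from the analysis, like other missing values).
    """
    if female is None or male is None:
        return None
    return (female + male) / 2


# Hypothetical numbers, for illustration only (not values from the dataset).
large = cases_per_million(new_cases=1_000, population=100_000_000)
small = cases_per_million(new_cases=100, population=1_000_000)
print(large, small)
print(average_smokers(20.0, 30.0))
```

In this made-up example the larger country reports ten times more raw cases but a tenfold lower rate per million, which is the bias the per-million measure is meant to remove.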
Data analytics
We employed generalised linear mixed effects models (LME; [26]) with new Covid-19 cases per million as the dependent variable. Because the dataset contained several candidate indices of testing, population, age structure, and economic status for each country, an initial analysis was conducted to select the most informative index of each.
We first sought the most parsimonious data-driven index of testing among (i) new tests, (ii) total tests, (iii) new tests per thousand, (iv) total tests per thousand, (v) new tests smoothed, and (vi) new tests smoothed per thousand. This was achieved by fitting six LMEs, each with new cases per million as the dependent variable and one of (i) to (vi) as the single fixed effect. The random effect structure of each LME included the nested variance of time within each country (Random~Country/Time). In doing so, the fitted LMEs accounted for both temporal and spatial autocorrelation in the time-replicated data deriving from different geographic locations [21, 22]. LMEs were fitted with Maximum Likelihood (ML) estimation to allow comparison between models with different fixed effects, and the LME with the lowest Akaike Information Criterion (AIC) value was selected [27, 28]. Here and throughout, the dataset contained 19,709 data points, but some variables had missing values at some time steps or for some countries. Missing values were omitted from the statistical analysis. AIC values were therefore compared between models fitted with different fixed effects and potentially different sample sizes.
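One way to write down a single-predictor model of this kind is given below. It is a sketch under stated assumptions, not the authors' exact specification: it assumes random intercepts only (the text does not say whether random slopes were included), and the symbols are introduced here for illustration. Here $y_{ct}$ denotes new cases per million in country $c$ at time step $t$, and $x_{ct}$ is the candidate index being evaluated.

```latex
y_{ct} = \beta_0 + \beta_1 x_{ct} + u_c + v_{t(c)} + \varepsilon_{ct},
\qquad
u_c \sim \mathcal{N}(0, \sigma_u^2), \quad
v_{t(c)} \sim \mathcal{N}(0, \sigma_v^2), \quad
\varepsilon_{ct} \sim \mathcal{N}(0, \sigma^2)
```

In this notation $u_c$ is the country-level random intercept and $v_{t(c)}$ the random intercept for time nested within country, corresponding to Random~Country/Time.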
Similarly, LMEs with new cases per million as the dependent variable, a fixed effect of either (i) population or (ii) population density, and the nested random effects of time within country were fitted with ML and compared by AIC to select the optimal data-driven population index.
Regarding the age structure of the population within each country, the available variables were (i) median age of the population, (ii) the percentage of the population aged 65 or older, and (iii) the percentage of the population aged 70 or older. The analysis proceeded by selecting the ML-fitted LME with the lowest AIC among the three available age structure variables. All three fitted LMEs contained the random effects of time nested within country.
Regarding the economic status of the population within each country, the available variables were (i) GDP per capita and (ii) the percentage of the population living in extreme poverty. The analysis proceeded by selecting the ML-fitted LME with the lowest AIC between the two available economic status variables. Both fitted LMEs contained the random effects of time nested within country.
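The selection rule used in each of the steps above (fit one model per candidate index with ML, keep the one with the lowest AIC) can be sketched as follows. The sketch assumes the maximised log-likelihood and parameter count of each fitted LME are already available from the fitting software. The function names and the numbers are hypothetical and are not results from the study.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def select_lowest_aic(fits: Dict[str, Tuple[float, int]]) -> str:
    """Return the candidate whose ML fit has the lowest AIC.

    `fits` maps a candidate name to
    (maximised log-likelihood, number of estimated parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)


# Hypothetical log-likelihoods, NOT values from the study.
age_fits = {
    "median_age": (-1050.0, 5),
    "aged_65_older": (-1042.5, 5),
    "aged_70_older": (-1047.0, 5),
}
best = select_lowest_aic(age_fits)
print(best)
```

Because missing values were omitted variable by variable, the candidate models may be fitted to different numbers of observations, so AIC differences of this kind are best read with that caveat in mind.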
Having selected the optimal indices of testing, population, age structure, and economic status, the analysis proceeded with the following independent variables: (1) population density, (2) new tests per thousand, (3) governmental stringency index, (4) percentage of the population aged 65 or older, (5) percentage of the population living in extreme poverty, (6) cardiovascular disease (CVD) death rate, (7) diabetes prevalence, (8) percentage of smokers, (9) percentage of the population with access to handwashing facilities, and (10) hospital beds per 100k inhabitants within each country.
Hierarchical Variance Partitioning (HVP) statistical modelling was implemented to quantify the contribution of each data-driven epidemiological, economic, public health, and governmental intervention explanatory variable to the total variance of new Covid-19 cases per million [29, 30]. HVP is a statistical framework that can handle correlated independent variables while providing a reliable ranking of the importance of each predictor [29]. Variance partitioning is calculated from the Akaike (AIC) weights of each explanatory variable and is based on the number of times that a variable was significant across all possible combinations of the explanatory variables. The HVP function produces a minor rounding error for hierarchies constructed from more than nine variables [31], and ten data-driven variables were available. To check whether this error affected inference, the analysis was repeated several times with the variables entered in a different order [31]. The results changed when the order of the variables was changed. The analysis therefore proceeded by creating a new variable that merged the disease-related variables, other diseases = (cvd_death_rate + diabetes_prevalence), which together with the remaining eight variables gave a total of nine variables. There is no known statistical bias in HVP when nine or fewer variables are used [31].
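Three ingredients named in this paragraph can be sketched in code: merging the two disease variables by summation, enumerating all possible combinations of the nine explanatory variables, and computing Akaike weights from a set of AIC values. This is not the HVP routine itself [31]. The function names, the label `other_diseases`, and the example AIC values are hypothetical, and the Akaike weight formula shown is the standard one, assumed here to be the one intended.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def merge_disease_variables(cvd_death_rate: float, diabetes_prevalence: float) -> float:
    """Sum the two disease variables into one, as written in the text."""
    return cvd_death_rate + diabetes_prevalence


def all_subsets(variables: List[str]) -> List[Tuple[str, ...]]:
    """Every non-empty combination of explanatory variables."""
    return [
        combo
        for size in range(1, len(variables) + 1)
        for combo in combinations(variables, size)
    ]


def akaike_weights(aic_by_model: Dict[str, float]) -> Dict[str, float]:
    """Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    where delta_i is the AIC of model i minus the lowest AIC."""
    best = min(aic_by_model.values())
    rel = {m: math.exp(-(a - best) / 2) for m, a in aic_by_model.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# The nine variables after merging; "other_diseases" is a hypothetical label.
NINE_VARIABLES = [
    "population_density", "new_tests_per_thousand", "stringency_index",
    "aged_65_older", "extreme_poverty", "other_diseases", "smokers",
    "handwashing_facilities", "hospital_beds_per_100k",
]

n_models = len(all_subsets(NINE_VARIABLES))
print(n_models)  # 2**9 - 1 = 511 non-empty combinations

# Hypothetical AIC values, NOT results from the study.
weights = akaike_weights({"a": 100.0, "b": 102.0, "c": 110.0})
print(weights)
```

With ten variables the number of non-empty combinations roughly doubles to 1,023.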