Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causal agent of COVID-19, is a communicable virus spread through close contact. It is known to disproportionately infect certain communities due to both biological susceptibility and inequitable exposure. In this study, we investigate the most important health, social, and environmental factors impacting both the early and later phases of COVID-19 transmission and mortality in US counties.
Methods: We aggregate county-level data on physical and mental health, environmental pollution, access to health care, demographic characteristics, vulnerable population scores, and other epidemiological measures to create a large feature set for analyzing COVID-19 outcomes. Because of the high dimensionality and multicollinearity of the data, we use ensemble machine learning and marginal prediction methods to identify the most salient factors associated with several COVID-19 outbreak measures.
Findings: Our variable importance results show that measures of ethnicity, public transportation, and preventable diseases are the strongest drivers of both incidence and mortality. Specifically, the CDC measures for minority populations, CDC measures for limited English, and proportion of Black/African-American individuals in a county were the most important features for COVID-19 cases at day 25 and to date. For mortality at day 100 and total to date, we find that public transportation use and proportion of Black/African-American individuals in a county are the strongest predictors. The methods predict that, keeping all other factors fixed, a 10% increase in public transportation use increases mortality at day 100 by 2012 (95% CI [1972, 2356]) and, likewise, a 10% increase in the proportion of Black/African-American individuals in a county increases total deaths to date by 2067 (95% CI [1189, 2654]). In terms of cases to date, ethnicity turns out to be almost twice as important as the next most important factors, which are location, disease prevalence, and transit factors.
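The marginal predictions reported above hold all other factors fixed while shifting one feature. A minimal sketch of that calculation follows, under stated assumptions: the 10% increase is treated as a relative change applied to every county, the effect is summed across counties, and `toy_predict`, `toy_counties`, and the feature names are hypothetical stand-ins for the fitted ensemble and the real county data. It is an illustration of the idea, not the study's actual procedure.

```python
from typing import Callable, Dict, List

County = Dict[str, float]


def marginal_effect(
    predict: Callable[[County], float],
    counties: List[County],
    feature: str,
    relative_increase: float = 0.10,
) -> float:
    """Sum over counties of predict(shifted row) - predict(original row).

    Only `feature` is changed; every other value in the row is held fixed.
    """
    total = 0.0
    for row in counties:
        shifted = dict(row)  # copy so the input data is not mutated
        shifted[feature] = row[feature] * (1.0 + relative_increase)
        total += predict(shifted) - predict(row)
    return total


# Hypothetical predictor standing in for a fitted ensemble model.
def toy_predict(row: County) -> float:
    return 2.0 * row["transit_use"] + 0.5 * row["other_factor"]


# Hypothetical county records; the values are made up for illustration.
toy_counties = [
    {"transit_use": 10.0, "other_factor": 3.0},
    {"transit_use": 20.0, "other_factor": 7.0},
]

effect = marginal_effect(toy_predict, toy_counties, "transit_use")
print(f"Predicted change in outcome: {effect:.2f}")
```

With a linear toy predictor the result is just the coefficient times the total shift; with a nonlinear ensemble the same loop captures interactions with the features that are held fixed.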
Interpretations: Our findings indicate that a more focused approach should be taken when managing COVID-19, by considering features of the economy most responsible for transmission and sectors of society most vulnerable to infection and mortality. In particular, our results strongly reinforce others pointing to the disproportionate impact of COVID-19 on minority populations. They also suggest that mitigation measures, including rolling out vaccinations as they become available, will be most efficacious for the US population as a whole when, beyond healthcare workers and first responders, they are focused first on the highest-risk communities.
Funding: UC Berkeley, Biomedical Big Data Training Fellowship; NSF Grant 2032264 to WMG and AH.