Data Sources
Cases and Mortalities
SARS-CoV-2 laboratory-confirmed case and mortality data were acquired from the Centers for Disease Control and Prevention (CDC) data archive21. The period from January 1, 2020 to March 30, 2021 was selected for three reasons: 1) the Spring (approximately January-May) and Fall (approximately September-December) periods capture the entirety of a typical university calendar year, providing a control period to adequately measure COVID-19 transmission in local communities; 2) the commencement of the Fall semester (September 1, 2020) is a critical point, as it falls at approximately the median time point of this analysis; and 3) the U.S. experienced three distinct COVID-19 “waves” over this period, creating a large, representative dataset for comparing counties with campuses across the country.
By March 30, 2021, a total of 22,385,335 cases and 374,130 deaths had been reported to the CDC. Each case in this dataset contained 32 elements, including demographics (e.g., age, race and ethnicity, and sex) and county/state of residence. Numerous reporting agencies provide this information to the CDC; as a result, lineage-specific COVID-19 designation, symptom onset, and/or test positive dates were often incomplete, inconsistent, and/or provisional. All cases were aggregated with no sequencing-specific lineage designation. If multiple dates were listed, a single case-positive date was selected in the following order of preference (the same order used by the CDC): symptom onset date, test positive date, and CDC report date. If no date or no county of residence was provided, the case was excluded. All personally identifiable information (names, addresses, etc.) was removed by the CDC prior to public release. Geographic locations for each case were aggregated to the county level for additional privacy protection.
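The date-selection and exclusion rules described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the field names (`symptom_onset_date`, `test_positive_date`, `cdc_report_date`, `county_fips`) are hypothetical stand-ins for the corresponding columns of the CDC file.

```python
from datetime import date
from typing import Optional

# Order of preference stated in the text. Field names are hypothetical;
# the CDC file's actual column names may differ.
DATE_PRIORITY = ("symptom_onset_date", "test_positive_date", "cdc_report_date")


def select_case_date(record: dict) -> Optional[date]:
    """Return the first available date in the stated order of preference."""
    for field in DATE_PRIORITY:
        value = record.get(field)
        if value is not None:
            return value
    return None


def keep_case(record: dict) -> bool:
    """Keep a case only if it has a usable date and a county of residence."""
    has_date = select_case_date(record) is not None
    has_county = bool(record.get("county_fips"))
    return has_date and has_county
```

For example, a record with no symptom onset date but with both a test positive date and a CDC report date would be assigned the test positive date.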
University Enrollment
University enrollment (UE) data were acquired from the Integrated Postsecondary Education Data System (IPEDS)22. Total enrollment by postsecondary institution for the Fall semester of the 2018-2019 academic year was aggregated by county and divided into four categories. Counties with total enrollment x ≥ 15000, 15000 > x > 5000, 5000 ≥ x > 0, or no enrollment were labeled large (n = 253), medium (n = 361), small (n = 792), and absent (n = 1641), respectively.
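The four enrollment categories amount to a simple threshold rule, sketched below. The handling of the exact boundary values (a county with exactly 15000 enrolled counted as large, one with exactly 5000 counted as small) is an assumption inferred from the requirement that the categories be mutually exclusive and exhaustive; the function name is illustrative only.

```python
def enrollment_category(total_enrollment: int) -> str:
    """Label a county by its total aggregated postsecondary enrollment.

    Boundary handling at exactly 15000 and 5000 is an assumption.
    """
    if total_enrollment >= 15000:
        return "large"
    if total_enrollment > 5000:
        return "medium"
    if total_enrollment > 0:
        return "small"
    return "absent"
```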
U.S. Population and Other Key Covariates
County population was acquired from the American Community Survey (ACS) 2019 5-year estimate23. These data provide estimates of population by age from 60 months of sampling between January 1, 2015 and December 31, 2019. For purposes of this study, age groups have been categorized by 10-year intervals as 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80+.
Other factors of interest in evaluating COVID-19 among populations at the county level included median household income23, unemployment24, COVID-19 community vulnerability index (CCVI)25, percentage of population that was vaccinated as of March 30, 2021 (at least 1 dose)26, and percentage vote by party candidate in the 2020 U.S. Presidential Election27.
University COVID-19 Mitigation and Containment Policies
Highly incongruous COVID-19 policies were implemented across U.S. universities during the Fall 2020 semester reopening. We therefore selected variables, aggregated by the College Crisis Initiative at Davidson College28, that could be compared widely across institutions of various sizes and locations. Only data from four-year universities were included in the analysis, and schools were organized by the aforementioned enrollment size categories. Reopening plan variables included the “Mode of Instruction” (MOI) as of September 1, 2020 and the proposed on-campus COVID-19 testing strategy (Table S1). Other educational institution factors (e.g., land grant university status and degree of campus urbanization)22 were included in the analysis because of the potential impact of funding mechanisms and population density structures. Additionally, we evaluated county-aggregate factors such as self-reported masking adherence, state-instituted mask mandates29, median household income23, and unemployment rates24.
Statistical Analysis
Cases and deaths were age-adjusted using the 2019 ACS 5-year estimates as the reference population. Each county’s COVID-19-confirmed cases and deaths were organized by age group for the time period between January 1, 2020 and March 30, 2021. These were then aggregated as a total for each outcome and age-adjusted.
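As an illustration of age adjustment, the sketch below applies direct standardization over the nine age groups defined earlier. The text does not state whether direct or indirect standardization was used, so the direct method, the per-100,000 scaling, and all function and variable names are assumptions made for this example.

```python
AGE_GROUPS = ("0-9", "10-19", "20-29", "30-39", "40-49",
              "50-59", "60-69", "70-79", "80+")


def age_adjusted_rate(events: dict, population: dict, reference: dict,
                      per: int = 100_000) -> float:
    """Directly standardized rate (assumed method; see lead-in).

    events[g]     -- county cases or deaths in age group g
    population[g] -- county population in age group g
    reference[g]  -- reference (standard) population in age group g
    """
    ref_total = sum(reference[g] for g in AGE_GROUPS)
    rate = 0.0
    for g in AGE_GROUPS:
        if population[g] == 0:
            continue  # no residents in this stratum, so it contributes nothing
        age_specific_rate = events[g] / population[g]
        weight = reference[g] / ref_total
        rate += age_specific_rate * weight
    return rate * per
```

Under this method, a county whose age structure matches the reference population has an adjusted rate equal to its crude rate.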
Age-adjusted COVID-19 case and death rates were evaluated across three time scenarios: 1) the full period, January 1, 2020 to March 30, 2021; 2) each “wave” period, determined by natural breaks between peaks among national cases (Wave 1: January 1, 2020 to June 7, 2020; Wave 2: June 8, 2020 to September 6, 2020; Wave 3: September 7, 2020 to March 30, 2021); and 3) before and during the Fall 2020 academic semester (January 1, 2020 to August 31, 2020 and September 1, 2020 to March 30, 2021)30–32.
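The wave and semester periods can be expressed as inclusive date ranges. The following sketch assigns a case date to its wave and to the before/during-semester period; the function names are illustrative and not taken from the original analysis.

```python
from datetime import date
from typing import Optional

STUDY_START = date(2020, 1, 1)
STUDY_END = date(2021, 3, 30)
SEMESTER_START = date(2020, 9, 1)

# Inclusive start and end dates for each wave, as listed in the text.
WAVES = (
    ("Wave 1", date(2020, 1, 1), date(2020, 6, 7)),
    ("Wave 2", date(2020, 6, 8), date(2020, 9, 6)),
    ("Wave 3", date(2020, 9, 7), date(2021, 3, 30)),
)


def wave_for(case_date: date) -> Optional[str]:
    """Return the wave containing case_date, or None if outside the study."""
    for name, start, end in WAVES:
        if start <= case_date <= end:
            return name
    return None


def semester_period(case_date: date) -> Optional[str]:
    """Return 'before' or 'during' relative to the Fall 2020 semester start."""
    if not (STUDY_START <= case_date <= STUDY_END):
        return None
    return "before" if case_date < SEMESTER_START else "during"
```

Note that the wave and semester boundaries do not coincide: a case dated September 3, 2020 falls in Wave 2 but in the "during" semester period.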
All statistical comparisons of groups were assessed using JMP® (Version 16.0)33. Group means were compared using Student’s t-test or Dunnett’s test (control group = counties without university enrollment). Comparisons were considered significant at p < 0.05. Age-adjusted case and mortality rates were modeled separately as dependent variables. Models were stratified by county university enrollment size, totaling eight standard least squares regression models (two outcomes across four enrollment categories). Final model selections were made using backward stepwise regression based on the Bayesian Information Criterion (BIC). The overall significance of an independent variable was assessed by its frequency of inclusion across the models for the four county university enrollment types, as well as by averaging the logWorth (calculated as -log10(p-value)) of that variable across all four county types.
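The logWorth summary can be illustrated with a short sketch. How variables dropped during stepwise selection entered the average is not specified in the text; the example below assumes the mean is taken only over the models that retained the variable, and the function names and input layout are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple


def logworth(p_value: float) -> float:
    """logWorth as defined in the text: -log10(p-value). Requires p_value > 0."""
    return -math.log10(p_value)


def summarize_variable(
    p_by_stratum: Dict[str, Optional[float]]
) -> Tuple[int, Optional[float]]:
    """Return (inclusion count, mean logWorth) for one independent variable.

    p_by_stratum maps each enrollment category to the variable's p-value in
    that category's final model, or None if the variable was not retained.
    Averaging over retained models only is an assumption.
    """
    retained = [p for p in p_by_stratum.values() if p is not None]
    if not retained:
        return 0, None
    mean_lw = sum(logworth(p) for p in retained) / len(retained)
    return len(retained), mean_lw
```

Under this definition, the significance threshold of p < 0.05 corresponds to a logWorth above roughly 1.3.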
Because the college reopening dataset contained both quantitative and qualitative variables, a factor analysis of mixed data using the FactoMineR package34 in R (version 4.1.2) was conducted to identify important variables related to university reopening plans. Once the contributions of these variables to the overall variance of the dataset were assessed, a hierarchical cluster analysis was performed (using Bartlett’s test of sphericity, p < 0.05) to identify clusters of similar school mitigation strategies associated with population-adjusted COVID-19 cases at the county level.
Finally, because of overdispersion in the case data, we fit negative binomial models using the MASS package35 to identify university COVID-19 mitigation strategies and other county-level variables that significantly predicted population-adjusted county COVID-19 cases in the Fall 2020 semester. Model fits were evaluated with the performance package36, which gives a performance score based on the model’s BIC, Nagelkerke’s R², and root-mean-square error.
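To illustrate why overdispersion motivates a negative binomial model, the sketch below computes a simple variance-to-mean ratio and a method-of-moments estimate of the dispersion parameter under the common NB2 variance function, Var(Y) = mu + alpha * mu^2. This is an intercept-only diagnostic written for illustration, not the regression models fit in R; the choice of the NB2 parameterization and the function names are assumptions.

```python
from statistics import mean, variance
from typing import Sequence


def dispersion_ratio(counts: Sequence[int]) -> float:
    """Sample variance divided by the mean.

    A Poisson variable has a ratio near 1; values well above 1
    indicate overdispersion.
    """
    return variance(counts) / mean(counts)


def nb2_alpha_moments(counts: Sequence[int]) -> float:
    """Method-of-moments estimate of alpha in Var(Y) = mu + alpha * mu**2.

    Returns 0.0 when the sample variance does not exceed the mean
    (no evidence of overdispersion).
    """
    m = mean(counts)
    v = variance(counts)
    return max(0.0, (v - m) / m ** 2)
```

An estimate of alpha near zero means the negative binomial variance collapses toward the Poisson variance, whereas a clearly positive alpha indicates extra-Poisson variation of the kind described above.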