We first describe the end products of our work, which make our algorithm available to the public: the 19andMe web application and the 19andMe API. We then illustrate how incorporating underreporting factors into the exposure risk calculation captures variability that reported case counts alone do not reflect, giving users a more accurate sense of their relative risk over time. Finally, we present validation results for comparisons with Nexoid and COVER.
Availability of algorithm to the public. We made the algorithm publicly available in April 2020 through a web-based interactive risk calculator (https://19andme.covid19.mathematica.org). Users answer questions in three sections: (1) About You, (2) Pre-existing Conditions and (3) Your Behavior (supplemental information [SI] Figure S1). In January 2021, we added a fourth section on vaccination status, but this update is outside the scope of this paper. After providing inputs, the user sees a risk gauge with their overall risk score, between 1 and 100, color-coded into three categories: 1 to 30 (low risk, green), 31 to 70 (medium risk, yellow), and 71 to 100 (high risk, red). Under the risk gauge, the app provides detailed statistics on exposure risk, susceptibility risk, and the effect of modifiable behaviors such as handwashing and wearing personal protective equipment (PPE) (SI Figure S2).
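The mapping from overall risk score to gauge category described above can be sketched as follows. This is a minimal illustration of the published thresholds only; the function name `risk_category` is hypothetical and is not taken from the 19andMe code base.

```python
from typing import Tuple


def risk_category(score: int) -> Tuple[str, str]:
    """Map an overall risk score (1 to 100) to (category, gauge color).

    Thresholds follow the text: 1-30 low/green, 31-70 medium/yellow,
    71-100 high/red. The function name is a hypothetical illustration.
    """
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    if score <= 30:
        return ("low", "green")
    if score <= 70:
        return ("medium", "yellow")
    return ("high", "red")


if __name__ == "__main__":
    for s in (1, 30, 31, 70, 71, 100):
        print(s, risk_category(s))
```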
To aid integration with other digital health tools, we developed an API to allow batch processing of calculations (https://api.covid19.mathematica.org/score). Documentation and access key instructions for the API are available at https://github.com/mathematica-mpr/covid_risk_score/wiki/19andMe-API-documentation.
Comparison of estimated COVID-19 exposure risk with reported case prevalence. The 19andMe app helps users understand their risk of community transmission by estimating the number of active cases in their county and their risk of exposure, determined from their location and the number of close contacts (defined as more than 10 minutes at a distance of less than six feet) (https://www.cdc.gov/coronavirus/2019-ncov/global-covid-19/operational-considerations-contact-tracing.html). For users who live with other people, the direct contacts of their household members become indirect contacts of the user. Because testing kits were limited in the early stage of the pandemic and asymptomatic individuals did not seek care, the actual number of SARS-CoV-2 infections is most likely greater than the number of officially reported cases. We developed a method based on delay-adjusted case fatality rates to estimate county-level underreporting factors15. Figure 1 shows the average number of daily clinically confirmed cases per 100,000 people (Fig. 1A) and the 19andMe estimated exposure risks, assuming the user has 10 direct contacts (Fig. 1B) between December 8, 2020, and December 15, 2020. The first map shows only reported cases, whereas the exposure risks shown on the second map rely on the total estimated number of cases calculated by using an estimated underreporting factor.
In Fig. 1, reported cases and exposure risk have similar ordinal ranking of states (Spearman correlation coefficient 0.65, p < 0.001). However, by incorporating the underreporting factor, the exposure map captures between-state and within-state variability not represented by case counts alone. For example, clinically confirmed case prevalence is higher in counties in Utah than in counties in Texas, but after accounting for the underreporting factors of the two states (Utah: 1.3x; Texas: 5.0x), the estimated exposure risk is lower in Utah than in Texas. Similar underreporting multipliers estimated using seroprevalence surveys by Angulo and co-authors corroborate our findings16. The exposure risk map also shows greater variance within states. For example, Bertie County and Mecklenburg County in North Carolina both have a case count of 59 cases per 100,000 people. However, their underreporting factors differ; taking the underreporting factor into account, the estimated exposure risk is 4.5% in Bertie County and 1.1% in Mecklenburg County.
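For readers unfamiliar with the rank-based comparison used here, the following sketch shows one way a Spearman correlation can be computed: convert each series to ranks (averaging ranks for ties) and take the Pearson correlation of the ranks. It is a generic pure-Python illustration, not the code used for the analysis; the function names and the sample values are hypothetical, and no p-value is computed.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical values, for illustration only (not data from the paper).
    reported = [12.0, 40.0, 25.0, 59.0, 33.0]
    exposure = [0.8, 2.1, 2.5, 4.5, 1.1]
    print(round(spearman(reported, exposure), 3))
```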
Underreporting factors increase the accuracy of exposure risk over time. We estimate the underreporting factor as the ratio between the delay-adjusted 90-day case fatality rate (CFR) and the infection fatality rate (IFR), using deaths as a more reliable indicator of prevalence than case counts. Figure 2A shows the CFR nationally (dashed lines) and for the five states with the highest cumulative case counts. These rates are compared to the IFR reported by Russell and co-authors17 in March 2020 and the Institute for Health Metrics and Evaluation (IHME) in November 2020 (http://www.healthdata.org/sites/default/files/files/Projects/COVID/briefing_US_20201112.pdf). The CFR and the estimated underreporting factor decreased substantially from May 2020 to February 2021, as the proportion of infected individuals who were tested, diagnosed, and reported changed16. As of February 18, 2021, the national underreporting factor using the IHME IFR was 2.3. The CFR varies substantially by state. For example, in May 2020, the CFR was higher in New York and Illinois than in Florida and Texas; however, by October 2020, this trend had reversed18.
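The ratio described above, and its use in scaling reported cases, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the delay adjustment of the CFR is assumed to have been done upstream, and the reported prevalences in the usage lines are invented for illustration. Only the state multipliers (1.3 and 5.0) come from the text.

```python
def underreporting_factor(cfr: float, ifr: float) -> float:
    """Ratio of the delay-adjusted case fatality rate to the infection fatality rate.

    Both rates are fractions (for example 0.02 for 2%). The delay adjustment
    is assumed to have been applied to `cfr` before this call.
    """
    if ifr <= 0:
        raise ValueError("ifr must be positive")
    if cfr < 0:
        raise ValueError("cfr must be non-negative")
    return cfr / ifr


def estimated_cases(reported_cases: float, factor: float) -> float:
    """Scale reported cases by the underreporting factor."""
    return reported_cases * factor


# Hypothetical reported prevalences per 100,000 people (not from the paper).
# The multipliers 1.3 and 5.0 are the Utah and Texas factors quoted in the text.
utah_estimate = estimated_cases(300.0, 1.3)
texas_estimate = estimated_cases(100.0, 5.0)

if __name__ == "__main__":
    # A higher reported prevalence can still yield a lower estimated total.
    print(utah_estimate < texas_estimate)
```

Under these illustrative inputs, the state with the higher reported prevalence ends up with the lower estimated total, which is the kind of reordering the exposure map captures.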
The correction for underreporting provides users with a more accurate sense of their relative risk over time than does case count alone (Fig. 2B). For example, by reported case count (black line), the peak in cases during the winter 2020 national surge was many times larger than the peak during the initial spring 2020 period. However, after adjusting for underreporting using the IFR from Russell and co-authors for the spring 2020 period (red) and the updated IHME IFR for the winter 2020 surge (blue), the magnitude of the surges is more comparable. The app would show a similar exposure risk during the spring 2020 and winter 2020 peaks for a user in an average county with similar behavior.
Validation analyses using Nexoid. Mortality risk estimates were consistent between 19andMe and Nexoid (Spearman correlation 0.91, p < 0.001). We obtained 51,799 publicly available Nexoid user records and calculated 19andMe mortality and exposure risks for comparison. For 98.5% of these records (n = 51,024), results from the two apps were within 10% of each other; for 0.03% (n = 18), 19andMe estimates were at least 10% lower than Nexoid; and for 1.5% (n = 757), 19andMe estimates were at least 10% higher than those from Nexoid (Fig. 3A). All users in the higher and lower bands were over age 60. We found evidence that racial distributions differ across bands (p < 0.001). For the lower band, all 18 users were Black (Fig. 3B). After adjusting for race by matching the 18 users with White users who have otherwise equivalent demographics and behaviors, the estimates fell in the within-10% or higher band, in line with other estimates for users over age 60. For the higher-band users over age 60, we found evidence of different mean numbers of pre-existing conditions (p < 0.001) (Fig. 3B). We also found evidence of differences in proportions for the incidence of diabetes, heart disease, hypertension, immune disease, kidney disease, lung disease, obesity, and smoking in the within-10% versus the higher band (all p < 0.001, adjusted for multiple comparisons). We did not find evidence of differences in types of employment between the two bands (p = 0.080).
Exposure risk estimates were also correlated between 19andMe and Nexoid (Spearman correlation 0.48, p < 0.001). Of the 51,799 users, 78.4% (n = 40,600) had results from two apps within 1% of each other, 13.6% (n = 7,043) had 19andMe estimates at least 1% lower than Nexoid, and 8.0% (n = 4,156) had 19andMe estimates at least 1% higher than Nexoid (Fig. 4A). For users in the higher band, the mean number of direct and indirect contacts differed (p < 0.001 for both), with the number of direct and indirect contacts larger in the higher band (Fig. 4B). Handwashing and mask wearing were less prevalent in the higher band (p < 0.001 for both). For users in the lower band, several factors absent from 19andMe were associated with higher Nexoid risk estimates: the presence of pre-existing conditions (diabetes or kidney, liver, or lung disease, p < 0.001), employment in the healthcare sector (p < 0.001), use of public transit (p < 0.001), and working outside the home (p < 0.001).
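The assignment of users to agreement bands in the two comparisons above can be sketched as follows. This is an illustration under an explicit assumption: "within 10%" and "within 1%" are read here as absolute differences in percentage points between the two risk estimates, which the text does not state outright. The function name and the sample values are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def agreement_band(ours: float, other: float, tolerance: float) -> str:
    """Classify a pair of risk estimates (both in percent).

    ASSUMPTION: the tolerance is an absolute difference in percentage
    points. Returns "higher" when our estimate exceeds the other by at
    least `tolerance`, "lower" when it falls short by at least
    `tolerance`, and "within" otherwise.
    """
    diff = ours - other
    if diff >= tolerance:
        return "higher"
    if diff <= -tolerance:
        return "lower"
    return "within"


def band_counts(pairs: Iterable[Tuple[float, float]], tolerance: float) -> Counter:
    """Count how many (ours, other) pairs fall into each band."""
    return Counter(agreement_band(a, b, tolerance) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical estimate pairs in percent (not data from the paper).
    pairs = [(2.0, 1.5), (35.0, 20.0), (4.0, 16.0), (0.4, 0.2)]
    print(band_counts(pairs, tolerance=10.0))  # mortality-style comparison
    print(band_counts(pairs, tolerance=1.0))   # exposure-style comparison
```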
Validation analyses using COVER. The risk estimates for hospitalization, intensive care unit (ICU) admission, and mortality were consistent between 19andMe and COVER, with Spearman correlations of 0.89, 0.87, and 0.93, respectively. Of the 51,678 users, 68% (n = 35,103) had estimates from the two calculators within 10% of each other for hospitalization, 91% (n = 46,951) for ICU admission, and 99% (n = 50,972) for mortality. For nearly all of the remaining users, 19andMe estimates were at least 10% higher than those from COVER (Figure S3), with only n = 4, n = 0, and n = 73 users having COVER estimates more than 10% higher than 19andMe for hospitalization, ICU, and mortality, respectively. For all three outcomes, users under age 60 were very likely to be in the within-10% band (82%, 97%, and 100% for hospitalization, ICU, and mortality, respectively), and users age 60 and older were less likely to be in the within-10% band, especially for hospitalization (16%, 69%, and 94%, respectively). For the hospitalization and ICU outcomes, much of this discrepancy between the calculators for older users can be attributed to 19andMe’s higher baseline risks (risks before adjusting for underlying conditions) relative to COVER (Fig. 5). For both older and younger users, users in the higher band had more pre-existing conditions than those in the within-10% band (Fig. 6, p < 0.001 for all outcomes and both age groups). Among users under age 60 for all three outcomes, we also found evidence of differences in proportions for the incidence of diabetes, heart disease, hypertension, immune disease, kidney disease, lung disease, obesity, and smoking in the within-10% versus the higher band (all p < 0.001, adjusted for multiple comparisons).
For users age 60 and older, we found evidence of differences in proportions for all eight conditions for ICU risk (all p < 0.001, adjusted for multiple comparisons) and mortality (all p < 0.001 except renal disease, for which p = 0.02; adjusted for multiple comparisons), and for all conditions except renal disease and lung disease for hospitalization (all p < 0.01, adjusted for multiple comparisons).