This paper illustrates the use of a web-based system for collecting salient signs and symptoms of COVID-19 in a community. The crowd-sourced information can be used to estimate local prevalence and incidence without the heavy expense of testing biological samples. The website questionnaire can be completed in 30 seconds. To protect the anonymity of the user, the system relies on self-reporting rather than location tracking and requests only the zip code. Repeated completion of the questionnaire can give accurate daily updates. Responses were rapid, with the majority arriving within 24 hours of a request. No serologic or molecular testing was required for the prevalence estimates.
To illustrate the effectiveness of a web-based model such as ours, one can compare our results with other surveillance studies based on community testing. Our system reported 21% prevalence in NYC. After the study had accrued over 3000 participants, including over 1300 in New York City, Governor Cuomo reported antibody testing results indicating that about 1/5 of New Yorkers had been infected (19.9%).17,18 Our survey estimate for the Greater Boston area is 5.24% (2.687–9.109%, Table 1). Three weeks later, on May 18, 2020, the City of Boston reported a seroprevalence of 9.9% (6.3–13.3%) that varied by zip code.19 Tests in other geographical areas of the USA show a magnitude of community prevalence similar to the 7% we found in Georgia. In Los Angeles County, antibody testing estimated prevalence at 4.06% (2.84–5.6%).20 Antibody seroprevalence in Santa Clara County was 2.8% (1.3–4.7%).21 A recent report of prevalence by large-scale RT-PCR testing in the Baltimore-Washington, DC region shows 16.3% (16.0–16.7%) from March 11 to May 25.22 The remarkable similarity in prevalence estimates between our survey-based study and the aforementioned testing studies highlights the ability of self-reporting to yield a reasonable determination of COVID-19 prevalence.
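As an illustration of how a survey prevalence and its interval can be computed from response counts, the following is a minimal sketch. It uses the Wilson score interval, which is an assumption on our part: the interval method behind the values quoted above is not stated in this section. The function name and the example counts are hypothetical and are not data from the study.

```python
import math


def wilson_interval(positives, n, z=1.96):
    """Return (point estimate, lower, upper) for a binomial proportion.

    Uses the Wilson score interval; z=1.96 corresponds to roughly 95%
    coverage. This is an illustrative choice, not the study's method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= positives <= n:
        raise ValueError("positives must lie between 0 and n")
    p = positives / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half_width, centre + half_width


# Hypothetical counts: 10 presumed cases among 200 respondents in one region.
estimate, lower, upper = wilson_interval(10, 200)
print(f"prevalence {estimate:.2%} ({lower:.2%} to {upper:.2%})")
```

The interval is asymmetric around the point estimate when the proportion is small, which is the typical situation for regional prevalence estimates from a few hundred responses.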
The prevalence values for community COVID-19 are much greater than the confirmed case counts. Given the preferential testing of moderate to severe cases that present in the hospital setting, it is likely that the number of confirmed cases greatly underestimates the overall prevalence and incidence of the disease. When the CountCOVID results showed 40x the confirmed cases in Georgia, we were initially concerned that this ratio was improbably high. Since then, two antibody surveillance studies in California gave ratios of 43.5x (28–55) for Los Angeles County20 and 54x (25–91) for Santa Clara County21. Thus, despite the striking discrepancy with confirmed cases, our estimate of 40x for Georgia appears to be supported by independent serologic data.
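The ratio discussed above is simple arithmetic: the estimated number of infected people (prevalence times population) divided by the number of confirmed cases. A minimal sketch follows. The function name and all input numbers are hypothetical, chosen only so that the arithmetic is easy to follow; they are not the Georgia figures used in the study.

```python
def estimated_to_confirmed_ratio(prevalence, population, confirmed_cases):
    """Ratio of survey-estimated cases to laboratory-confirmed cases.

    prevalence is a fraction (0.07 means 7%). All inputs are supplied by
    the caller; nothing here is taken from the study's data.
    """
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must be a fraction between 0 and 1")
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    estimated_cases = prevalence * population
    return estimated_cases / confirmed_cases


# Hypothetical region: 1,000,000 residents, 7% estimated prevalence,
# 1,750 confirmed cases.
ratio = estimated_to_confirmed_ratio(0.07, 1_000_000, 1_750)
print(f"estimated cases are about {ratio:.0f}x the confirmed count")
```

Applying the same function to the lower and upper bounds of the prevalence interval gives a range for the ratio, which is how intervals such as those quoted for the California studies can be read.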
The ratio of prevalence values to confirmed counts may fall as the amount of testing (php) increases. For example, the ratio for NYC (20x) is about 1/2 of that for Georgia (40x), although NYC has 5x more confirmed cases php. NYC had the largest number of tests in the country in May 2020. Although presumed cases and confirmed cases may approach each other with increasing testing, urgency and financial considerations are of utmost importance.
Incidence tracks the number of new cases and may provide information on basal levels and outbreaks. For the month of May, we estimate the incidence in Georgia to be between 0.56% and 1.79% as the state reopened. In comparison, the COVID-19 website estimates symptomatic COVID incidence as between 0.2% and 0.4% for a slightly different time frame.23 The order of magnitude is similar, and all of the values may reflect sampling bias and the definition of COVID symptoms. Nonetheless, the similarity in values provides confidence that this method of self-reporting is scientifically reasonable.
Criteria for presumed COVID may need to be refined as we learn more about this disease. We did not ask about diarrhea or “COVID toes” in the initial survey. Each question is subject to false positives and false negatives. For instance, cough was quite common in the community, especially given the temporal association with an uptick in hay fever symptoms during the spring. Thus, cough by itself, at 17%, was not specific and would produce many false positives. Fever was reported by 9%, but can be caused by many illnesses. However, the intersection of fever and cough yielded about 5%. We also included in the definition of Presumed COVID all individuals who tested positive and those with loss of smell plus at least one other symptom. These additional categories contributed a small subset of the total cases in the final estimate of Presumed COVID prevalence of 7%. This algorithm of signs and symptoms mirrors current clinical judgement. We do note that the survey by Imperial College, with users predominantly from the UK, yielded a much greater percentage reporting loss of smell.24 It is not known whether this reflects a difference in patient population or in survey technique. The selection of other criteria can be made post hoc, but our current criteria yielded results very similar to those of serotesting. Given time, the analysis can be back-calibrated with selective surveillance testing to correct for errors or biases. However, the application of natural intelligence instead of artificial intelligence may be good enough.
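The classification rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CountCOVID implementation: the field names are hypothetical, and "at least one other symptom" is interpreted here as fever or cough because those are the only other symptoms named in this section.

```python
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    """One questionnaire response; field names are hypothetical."""
    fever: bool = False
    cough: bool = False
    loss_of_smell: bool = False
    tested_positive: bool = False


def is_presumed_covid(r):
    """Apply the three criteria described in the text."""
    if r.tested_positive:
        return True
    if r.fever and r.cough:
        return True
    # Assumption: "one other symptom" means fever or cough in this sketch.
    if r.loss_of_smell and (r.fever or r.cough):
        return True
    return False


def presumed_prevalence(responses):
    """Fraction of responses meeting the Presumed COVID definition."""
    responses = list(responses)
    if not responses:
        raise ValueError("no responses")
    return sum(is_presumed_covid(r) for r in responses) / len(responses)


# Hypothetical responses: only the second meets the definition.
sample = [
    SurveyResponse(cough=True),              # cough alone is not specific
    SurveyResponse(fever=True, cough=True),  # fever and cough together
    SurveyResponse(loss_of_smell=True),      # loss of smell alone
    SurveyResponse(),                        # no symptoms
]
print(presumed_prevalence(sample))
```

Because the rule is a plain function of the stored answers, alternative criteria can be applied post hoc to the same responses without collecting new data.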
Prevalence is an important parameter for assessing the effectiveness of social distancing, testing, and herd immunity. The proposed method of web sampling is rapid and inexpensive. Given the financial crisis that has resulted from this pandemic, web sampling can minimize the cost of quantifying disease burden. In contrast, serologic or PCR testing is far more expensive. PCR testing of 1000 people would cost approximately $1 million. The web-based survey of 1000 people is estimated to cost approximately $100. Because one can sample quickly and often, a sudden increase in symptoms on CountCOVID.org may provide advance warning of an outbreak.
This survey, like all sampled studies, has bias. Bias is possible if the sample size is small or skewed by the population completing the survey. It would be essential to widely encourage the population to participate in a web-based survey such as ours. The current results are likely biased toward employed adult faculty and staff from Georgia Tech rather than the general population. Unfortunately, the comparison group of confirmed cases is also subject to large bias, since testing is mostly directed at severe cases with access to high-quality health care and is not a sampling of the wider community. Conversely, sampling may be purposely restricted to quantitatively assess the baseline and trends for selected populations, such as the elderly or particular neighborhoods. There are other electronic systems that sample based on signs and symptoms.13–14 We applaud all of these systems and encourage them to report their findings for academic comparison and collaboration.