The National Centre for Disease Control (NCDC) India conducted a serological survey for IgG antibodies to SARS-CoV-2 virus in the state of Delhi, India – a city-state and the capital of the country – between June 26 and July 10, 2020(1), median date July 3. A second serological survey was conducted by the Government of Delhi on the same population, from August 1-7 2020(2), median date August 3. Table 1 below provides details of these surveys and results, considering Delhi’s population as 19.1 million(3). The median time to develop IgG antibodies is 13 days from symptom onset(4), so these serosurveys correspond to those infected by the virus latest by June 20th and July 21st, respectively.
| Survey No. | Survey Dates | Median Date | Sample Size | Positivity Detected (antibody +ve) | Implies, infected… | Corresponding “Infected” Date | Total virus-test +ve by date | Multiple of Antibody +ve to Virus +ve |
|---|---|---|---|---|---|---|---|---|
| 1 | June 26 - July 10 | July 3 | 21,387 | 23.48% | 4.48 million | By June 20 | 56,746 | 79 times |
| 2 | Aug 1 - 7 | Aug 3 | 15,311 | 29.1% | 5.56 million | By July 21 | 125,096 | 44 times |

Table 1: Details of Two Serological Surveys Conducted at Delhi
Table 1 shows that while there were 56,746 recorded infected cases (2,971 per million), all confirmed by RT-PCR virus testing with negligible RAT testing, there were 4.48 million (234,800 per million) seropositive cases, i.e., cases positive for the presence of SARS-CoV-2 antibodies. By conventional wisdom, 4.48 million residents had contracted Covid-19 and had developed antibodies thereafter. Antibody-positive cases were thus 79 times higher than virus-tested cases. These were “invisible” cases: not only were they sufficiently asymptomatic not to require medical attention, but they had also not been tested as primary contacts of infected cases. Similarly, for the second survey, there were 6,550 virus-positive cases per million, while there were 291,000 per million seropositive, a multiple of 44x.
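The arithmetic behind the multiples in Table 1 can be sketched as follows. This is an illustrative sketch rather than the authors' own code: the function name `antibody_to_virus_multiple` is hypothetical, and the inputs are the seroprevalence, case counts and population figure (19.1 million) quoted above.

```python
DELHI_POPULATION = 19_100_000  # population figure used in the text


def antibody_to_virus_multiple(seroprevalence, confirmed_cases,
                               population=DELHI_POPULATION):
    """Return (implied antibody-positive count, multiple over confirmed cases)."""
    implied_positive = seroprevalence * population
    return implied_positive, implied_positive / confirmed_cases


# Seroprevalence and cumulative confirmed cases as listed in Table 1.
for label, sero, cases in [("Survey 1", 0.2348, 56_746),
                           ("Survey 2", 0.291, 125_096)]:
    implied, multiple = antibody_to_virus_multiple(sero, cases)
    print(f"{label}: {implied / 1e6:.2f} million implied, "
          f"{multiple:.0f}x confirmed cases")
```

Run as written, the two rows come out at roughly 79x and 44x, matching the last column of Table 1.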
The Research Question
Although it is surprising that seropositive cases should be much higher than those recorded infected by virus testing, this result is consistent with other studies across the world (see following section). However, it is intriguing that the gap between serosurvey positives and virus-test positives should fall so much – 79 times down to 44 times – in the course of 1 month.
The objective of this study is not to establish that antibody-positive cases are higher in Delhi than recorded-infected cases or to examine why the multiples are as high as 79x, though the study will provide some answers to these questions. The objective is to scrutinize why the multiple of antibody-positive cases over virus-tested positive cases fell from 79x in June-July to 44x one month later.
This phenomenon is counterintuitive. If there are “invisible” asymptomatic untested cases who have recovered and developed antibodies, as high as 79 times the recorded numbers at any point of time, this multiple should remain similar at second and subsequent surveys. If the natural phenomenon is that invisible cases infect others because they are without quarantine or isolation, they should continue doing so at all times.
One possibility is that testing volumes were low until June 20th and that, consequently, a number of symptomatic cases went untested in that period but soon fully recovered with active antibodies that showed up in the first survey. A look at the data leads to the intuitive conclusion that virus-tested positives would not have gone up to comparable levels even if testing volumes had been higher before the first serosurvey. We establish this intuitive conclusion through statistical adjustments, using testing volumes and spot positivity, to “equate” the two surveys on this parameter. [We define the term Spot Positivity to mean the Current or Fresh Positivity Rate of virus tests; Spot Positivity on Day D is defined as the (Total of Fresh Cases on Days D, D+1, D+2) divided by (Total of Fresh Tests conducted on Days D-1, D and D+1).]
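The Spot Positivity definition in the bracketed note can be sketched in a few lines. This is a minimal illustration, assuming daily fresh-case and fresh-test counts are held in two lists indexed by day; the function name `spot_positivity` and the sample numbers are hypothetical and are not data from the surveys.

```python
def spot_positivity(fresh_cases, fresh_tests, d):
    """Spot positivity on day index d.

    Numerator: fresh cases on days D, D+1, D+2.
    Denominator: fresh tests on days D-1, D, D+1.
    """
    if d < 1 or d + 2 >= len(fresh_cases) or d + 1 >= len(fresh_tests):
        raise ValueError("day d needs data for days d-1 through d+2")
    cases = sum(fresh_cases[d:d + 3])        # days D, D+1, D+2
    tests = sum(fresh_tests[d - 1:d + 2])    # days D-1, D, D+1
    return cases / tests


# Hypothetical daily series, for illustration only.
daily_cases = [100, 120, 130, 110, 90]
daily_tests = [1000, 1100, 1200, 1000, 900]
print(f"{spot_positivity(daily_cases, daily_tests, 1):.2%}")
```

The case window is shifted one day later than the test window, exactly as the definition states.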
Background: Antibody-tested Positives are Always Higher than Virus-tested Positives
There is no debate that the total number of those who will go on later to develop antibodies will always be higher than the recorded positive cases. Many infected cases, mostly asymptomatic, are untested. Testing capacity and protocols are not designed to hunt for asymptomatic cases. Practical testing logistics limits testing to symptomatic cases and their primary contacts and excludes asymptomatic cases and secondary contacts. For these and other reasons, there will always be a number of “invisible” Covid-19 cases in the community.
This has been established time and again. All serological surveys conducted so far have indicated an antibody-positive number in excess of virus-tested positive cases, with the multiple ranging between 6x and 80x. To cite a few, a study in Gangelt, Germany in March 2020(5) revealed a 7-fold higher seroprevalence than confirmed infected cases. A widely cited study in The Lancet(6), conducted in Geneva during April-May, concluded that antibody-positive cases were 11.6x higher than virus-tested positive cases. A study conducted at 10 diverse sites in the USA between March and May 2020(7) showed an average gap of 38x between seroprevalent cases and recorded-infected cases (counted 7 days prior to antibody testing), with the multiple varying widely across the 10 sites. A study in Spain involving 61075 samples conducted in April-May(8) showed seroprevalence between 3.7% and 6.2% and an antibody-positive figure that is at least 19x the virus-positive cases (after extrapolating the math in the paper). Several other studies(8) report seroprevalence data without comparing with corresponding recorded-infected cases; if computed, these would also reveal significantly higher multiples of antibody-positive cases. It can be inferred that, by an unstated informal consensus, seroprevalent cases 10x-15x higher than recorded infected cases are not unusual.
Interestingly, two of the studies cited above have reported a drop between two serial serostudies – in Geneva and some sites of USA – but these have been considered non-typical aberrations in only one subsequent round of testing. These have been seen in light of the possibility that antibodies may decrease over time. This is an unresolved question, with other research upholding both sides of the argument, and we will not factor-in the possibility of antibody decrease in our study.
Background: A Basic View of Viral Dynamics and Antibody Generation
With a large number of asymptomatic cases, the classical picture of exposed → incubation → onset → mild/moderate/severe disease → resolution is now inadequate. A detailed treatment of viral dynamics is beyond the scope of this study, but the relevant context is presented in Fig 1 below. This is a simplified schematic that shows the time relationships between the disease (detection and later), infectivity and antibody generation. The schematic is simplified by the use of median values when each element is actually a probability distribution. Exceptions arising from some recent research (e.g., no antibody generation) are omitted. Viral shedding is detected by RT-PCR testing; however, this oversensitive test will also detect viruses that are not alive (cannot be cultured) and hence do not contribute to active disease in the patient. The patient remains infective as long as the virus is live, and there is generally a phase-out of infectivity simultaneously with a phase-in of seroconversion (antibody generation). Fig 1 below is self-explanatory.
Statistical Adjustments Prior to Analysis
Raw data regarding total infected cases and seroprevalence for the two serosurveys are given below:
Situation, originally (per million population, corresponding seroprevalence data approx. 15 days later):
Case 1: As of 20th June, the total Covid-infected (virus test) was 2971, and the antibody-test positive was 234,800 (79x).
Case 2: As of 21st July, the total Covid-infected (virus-test) was 6550, and the antibody-test positive was 291,000 (44x).
Sampling Error. Extrapolating readings from a sample for the population may result in errors. The sample sizes for the two surveys were 25387 and 15311, respectively, with results 23.48% and 29.1% positive. The Adjusted Wald approach(12) adjusts for the population extrapolation by computing a confidence interval with a 95% confidence level of this assertion. The confidence intervals were [22.96% - 24.01%] for the first survey and [28.51% - 29.95%] for the second survey. We take the lower bound in both cases to avoid inflating the sampling bias. The resultant revision is below.
Situation, after removing the sampling error (per million population, corresponding seroprevalence data approx. 15 days later):
Case 1: As of 20th June, the total Covid-infected (virus test) was 2971, and antibody-test positive was 229,600 (77x).
Case 2: As of 21st July, the total Covid-infected (virus-test) was 6550, and antibody-test positive was 285,100 (44x).
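For readers who want to see the mechanics of the interval, a minimal sketch of an Adjusted Wald (Agresti-Coull style) 95% confidence interval follows. It is an illustration under stated assumptions, not the authors' computation: the function name `adjusted_wald_interval` is hypothetical, the sample size is the one quoted in the Sampling Error paragraph, and the positive count is back-calculated from the reported percentage because the raw count is not given here.

```python
import math


def adjusted_wald_interval(positives, n, z=1.96):
    """Adjusted Wald interval: add z^2/2 successes and z^2/2 failures."""
    n_adj = n + z ** 2
    p_adj = (positives + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half_width, p_adj + half_width


# Survey 1 as quoted in the paragraph: 23.48% positive in a sample of 25387.
# The count of positives is an assumption reconstructed from the percentage.
n_1 = 25_387
positives_1 = round(0.2348 * n_1)
low, high = adjusted_wald_interval(positives_1, n_1)
print(f"Survey 1 interval: {low:.2%} to {high:.2%}")
```

With these inputs the interval lands close to the first-survey bounds quoted above; the lower bound is the value carried forward as the per-million figure.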
Error of Sensitivity/Specificity of Diagnostic Kits. Both surveys used the KOVID KAVACH ELISA test, indigenously developed by the National Institute of Virology, Pune. This test quantifies IgG antibodies against the spike glycoprotein of the SARS-CoV-2 virus. The developers report(13) a sensitivity of 92.37% and a specificity of 97.9%. [These figures were unnecessarily mystified by a hasty Press Release by the Indian Council of Medical Research (ICMR), reporting far higher Specificity/Sensitivity scores(14), subsequently amended.] We adjust for false positives and false negatives by the standard formula(15):
Actual Prevalence Rate = (Seroprevalence × Sensitivity) + (1 - Specificity) × (1 - Seroprevalence)
The actual prevalence rate computes as 22.83% for Survey 1 and 27.84% for Survey 2.
After removing Sampling and Diagnostic Kit Errors (per million population, corresponding seroprevalence data approx. 15 days later):
Case 1: As of 20th June, the total Covid-infected (virus test) was 2971, and antibody-test positive was 228,300 (77x).
Case 2: As of 21st July, the total Covid-infected (virus-test) was 6550, and the antibody-test positive was 278,400 (43x).
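The kit adjustment can be reproduced with a short sketch. It applies the expression exactly as written in the text to the lower-bound seroprevalence values from the sampling step; the function name `kit_adjusted_rate` is hypothetical, and the sketch takes the formula as given rather than assessing alternative corrections.

```python
SENSITIVITY = 0.9237   # as reported by the kit developers
SPECIFICITY = 0.979


def kit_adjusted_rate(seroprevalence, sensitivity=SENSITIVITY,
                      specificity=SPECIFICITY):
    """Apply the adjustment formula as stated in the text."""
    return (seroprevalence * sensitivity
            + (1 - specificity) * (1 - seroprevalence))


# Lower confidence bounds carried over from the sampling-error step,
# with the confirmed cases per million for the matching dates.
for label, sero, cases_per_mn in [("Survey 1", 0.2296, 2971),
                                  ("Survey 2", 0.2851, 6550)]:
    rate = kit_adjusted_rate(sero)
    per_million = rate * 1_000_000
    print(f"{label}: {rate:.2%}, {per_million:,.0f} per million, "
          f"{per_million / cases_per_mn:.1f}x")
```

The two rates come out at 22.83% and 27.84%, the values used in Cases 1 and 2 above.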
Adjusting for Testing Volumes. On 20th June, when there were 2,971 cumulative infected cases per million, a cumulative total of 351,909 tests had been conducted. Over subsequent days, fresh tests added to the cumulative total until 13th July (by when 789,853 tests had been recorded), when the Government of Delhi notified that “reconciliation with ICMR figures” had led to a reduction of 97,008 tests cumulatively(16). We have prorated this reduction across all days up to 12th July. Table 2 below provides the data for some key dates, including 20th June and 21st July, the two dates for virus-tested positive cases that correspond to the two serosurveys. We also provide in the table the Cumulative Covid Positivity Rate (Total Infections divided by Total Tests), as well as the Spot Positivity Rate on these dates. Spot Positivity Rate, as defined earlier, provides the best estimate of Fresh (Current) Positivity on a given date.
| Date (cumulative by / as on) | Total Infections till Date | Total Tests till Date | Adjusted Tests till Date | Total Infections per mn | Total Tests per mn | Cumulative Covid Positivity Rate | Fresh Infections, Days D, D+1, D+2 | Fresh Tests, Days D-1, D, D+1 | Spot Positivity on Day D |
|---|---|---|---|---|---|---|---|---|---|
| 20-Jun | 56746 | 351909 | 308688 | 2971 | 16162 | 18.38% | 9539 | 42729 | 22.32% |
| 21-Jul | 125096 | 851311 | 851311 | 6550 | 44571 | 14.70% | 3617 | 52432 | 6.90% |
| 20-Aug | 157354 | 1375193 | 1375193 | 8238 | 72000 | 11.44% | 3877 | 55554 | 6.98% |
| 6-Sep | 191449 | 1780512 | 1780512 | 10024 | 93221 | 10.75% | 8942 | 97895 | 9.13% |

Table 2: Cumulative and Spot Covid Positivity Rates on Selected Dates
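The 20th June row of Table 2 can be retraced with the sketch below. It assumes the reconciliation cut was prorated proportionally, i.e., every pre-reconciliation cumulative count is scaled by the same factor; that assumption reproduces the adjusted figure in the table but is not spelled out in the text. The function name `prorate_tests` is hypothetical.

```python
POPULATION_MILLIONS = 19.1
RECONCILIATION_CUT = 97_008         # tests removed on reconciliation
TESTS_AT_RECONCILIATION = 789_853   # cumulative tests reported by 13 July


def prorate_tests(reported_tests):
    """Scale a pre-reconciliation cumulative test count proportionally."""
    factor = 1 - RECONCILIATION_CUT / TESTS_AT_RECONCILIATION
    return reported_tests * factor


# 20 June: 351,909 reported cumulative tests, 56,746 cumulative infections.
adjusted_tests = prorate_tests(351_909)
tests_per_million = adjusted_tests / POPULATION_MILLIONS
cumulative_positivity = 56_746 / adjusted_tests

print(f"Adjusted tests: {adjusted_tests:,.0f}")
print(f"Tests per million: {tests_per_million:,.0f}")
print(f"Cumulative positivity: {cumulative_positivity:.2%}")
```

Rows dated after the reconciliation need no scaling, which is why their reported and adjusted test totals coincide in the table.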
We wish to adjust the Infected Cases to account for Testing Volumes and Positivity, i.e., we want to estimate how many additional infected cases would have been detected with an increased volume of tests. Two different forces are at play. In a short time frame of a day or two, additional tests up to a point would detect Covid-positive patients at the same rate as the Spot Positivity Rate; tests beyond that point would begin to detect more negative cases, reducing Spot Positivity. Over a longer time frame, fresh infections would emerge at a rate increasing or decreasing depending upon the disease trajectory in the community. In both the short run and the long run, it is difficult to forecast the outcome in terms of additional infected cases detected.
In perhaps the only study of its kind, Favero(17) identifies a statistical basis to adjust case counts with respect to testing volume by adjusting for current positivity rate. By Favero’s rule, the Total Outbreak number, or Adjusted Infected Cases, depends on the Spot Positivity at the time and is expressed as:
Adjusted Infected Cases = Actual Infected Cases × [1 + (Positivity Rate × 100 × Constant)], where Constant lies between 0.01 and 0.02
For example, assuming the constant to be 0.02, for 2971 cases per million at a Spot Covid Positivity Ratio of 22.32%, the Adjusted Confirmed Cases works out to 4297 per million on June 20. Obviously, the Adjusted Infected Cases or Total Outbreak Number will match the testing strategy adopted. If the strategy is to test only 18-year-olds, the total outbreak number will only be with respect to 18-year-olds. Alternatively, if the strategy is to test high-incidence areas, the total outbreak number will reflect only those high-incidence areas. The Total Outbreak Number or Adjusted Infected Cases is not a miracle formula for the total confirmed cases in the world!
We assume a constant = 0.02 for our exercise. Given positivity 22.32% on 20th June and 6.9% on 21st July, actual infected cases will rise to 4297 and 7454 per million, instead of the initial scores of 2971 and 6550 per million, respectively.
Removing Sampling and Diagnostic Kit Errors & Adjusting for Testing Volumes:
(Per million population, corresponding seroprevalence data approx. 15 days later)
Case 1: As of June 20, the number of adjusted infected cases (virus test) was 4297, and the number of antibody-test-positive cases was 228,300 (53x).
Case 2: As of July 21, the number of adjusted infected cases (virus test) was 7454, and the number of antibody test-positive cases was 278,400 (37x).
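The testing-volume adjustment and the resulting multiples can be checked with a short sketch of the rule as it is stated above. The function name `favero_adjusted_cases` is hypothetical, the constant 0.02 is the value assumed in the text, and the antibody-positive figures are the kit-adjusted values per million.

```python
def favero_adjusted_cases(cases_per_million, spot_positivity, constant=0.02):
    """Scale recorded cases by 1 + (positivity in percent x constant)."""
    return cases_per_million * (1 + spot_positivity * 100 * constant)


# (recorded cases per million, spot positivity, antibody-positive per million)
surveys = {
    "20 June": (2971, 0.2232, 228_300),
    "21 July": (6550, 0.0690, 278_400),
}

for date, (cases, positivity, antibody_positive) in surveys.items():
    adjusted = favero_adjusted_cases(cases, positivity)
    print(f"{date}: {adjusted:,.0f} adjusted cases per million, "
          f"multiple {antibody_positive / adjusted:.0f}x")
```

Because the adjustment grows with spot positivity, the June figure is scaled up far more than the July figure, which is what narrows the gap between the two multiples.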
Analysis of Differences in Seroprevalence Multiples Over Two Studies
On 20th June, there were 53 times more antibody-positive cases than recorded virus positives, and by 21st July, this multiple had gone down to 37x. Between the two dates, virus-tested cases increased by 3,157 per million, but antibody-positive cases increased by only 16 times that number (50,100 per million), not by 53 times as would be expected. This resembles the linear equation y = mx + c, where the antibody positives (y) equal a linear term mx (x being the virus positives, and m being 16 as above) plus a constant c.
This phenomenon is explained if there is a proportion of the population that has pre-existing SARS-CoV-2 antibodies without having gone through the disease. If say, 150,000 per million have pre-existing antibodies (15%), then those developing antibodies after undergoing disease will roughly be a constant multiple of virus-positive cases. Our research question – why is there a drop in the multiples between two studies – would be answered by the existence of a population with pre-existing antibodies, and the multiple would then not change between studies.
We develop this model analytically and then solve for the values:
- Let X be the number per million within the population with pre-existing SARS-CoV-2 antibodies, or equivalent, such that they test positive to Covid 19 antibody tests without having undergone the disease. Equivalently, X divided by 10,000 is a percentage that is the non-susceptible population, and (100 – this %) is the percentage of the population susceptible.
- Seroprevalence (expressed as the number found antibody-positive per million), less X, is the number of seroprevalent individuals who developed antibodies after contracting the disease. This follows from the first point above, where X (per million) seropositives were pre-existing. Therefore, [Seroprevalence - X] are those (per million) who acquired antibodies and can be numerically compared with virus-tested positive cases.
- Let F be the Amplification Factor of virus-tested positive cases to account for the known phenomenon that the actual number of silent, asymptomatic, invisible Covid-19 cases and/or untested cases is invariably higher than recorded virus-tested positives. In other words, virus-positive (as a number of people tested positive per million) multiplied by F is the actual number of cases with Covid-19.
- [Virus-Positive × F] must equal [Seroprevalence - X]. In other words, those virus-tested positive multiplied by the Amplification Factor (F) of asymptomatic invisible cases must equal the number of acquired-seroprevalent cases, i.e., seroprevalent minus those pre-existing seroprevalent.
We fit the data from the two surveys after all statistical adjustments:
[Virus-tested-positive] × F = [Seroprevalence - X]
4297 × F = 228300 - X … from Survey 1, where figures are per million population, and F is unit-less
and, 7454 × F = 278400 - X … from Survey 2
Solving the set of simultaneous equations leads to X = 160150, and F = 15.86
A total of 160150 per million, or 16.02% of the population, have pre-existing SARS-CoV-2 antibodies without having undergone the disease. Every virus-tested positive case represents 15.9 Covid-infected people, implying that 14.9 infected people per recorded case went uncounted, possibly as asymptomatic cases.
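The two-equation fit can be solved directly, as the sketch below illustrates. It is not the authors' code: the function name `solve_preexisting_and_amplification` is hypothetical, and each survey is passed as a pair of (virus-positive per million, seroprevalent per million). Subtracting one equation from the other gives F as a slope, and X follows by back-substitution.

```python
def solve_preexisting_and_amplification(survey_1, survey_2):
    """Solve V1*F = S1 - X and V2*F = S2 - X for (X, F)."""
    (v1, s1), (v2, s2) = survey_1, survey_2
    if v1 == v2:
        raise ValueError("virus-positive counts must differ between surveys")
    f = (s2 - s1) / (v2 - v1)   # slope: extra seropositives per extra case
    x = s1 - v1 * f             # intercept: pre-existing seropositives
    return x, f


# Figures after all statistical adjustments, per million population.
x, f = solve_preexisting_and_amplification((4297, 228_300), (7454, 278_400))
print(f"X = {x:,.0f} per million ({x / 10_000:.2f}%), F = {f:.2f}")
```

The unrounded solution is close to, but not identical with, the rounded values quoted in the text, since X is sensitive to how F is rounded. The same function can be re-run with the counts before the testing-volume adjustment to gauge how much that adjustment matters.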
The discussion so far presumes the presence of pre-existing SARS-CoV-2 antibodies. However, these could equally be other antibodies cross-reactive with the SARS-CoV-2 antigen. This has been frequently reported in recent literature. Both Van der Heide(18) and Ma et al(19) in research published in June 2020 and August 2020 report the cross-reactivity of endemic human coronavirus (HCoV) antibodies against SARS-CoV-2, in one case as high as 10% among individuals not exposed to SARS-CoV-2. Pre-existing cross-reactive antibodies mean that antibodies generated after some other infection are effective against SARS-CoV-2. This is more likely than individuals who acquired precise SARS-CoV-2 antibodies without going through the disease.
To understand the range of variations in X (% of population with pre-existing antibodies) and F (Amplification Factor for any virus-tested positive case), we repeat the calculations with a different set of data. We use the figures prior to adjustment for testing volumes, which also helps us understand the impact of Test Volume adjustment. Fitting the data:
2971 × F = 228300 - X … from Survey 1, where figures are per million population, and F is unit-less
and, 6550 × F = 278400 - X … from Survey 2
implies, X = 186706 (18.67%), and F = 14.0
The results vary within a small range, with or without adjustments for testing volume. The pre-existing antibody coverage varies between 16% and 18.7%, while the amplification factor varies between 15.9 and 14.