Received specimens
We received self-collected saliva from 121 unique participants: 40 participants (33%) who tested positive for SARS-CoV-2 infection by RT-PCR and 81 participants (67%) who tested negative. As shown in Table 1, the participants were diverse in age, gender, ethnicity, and socioeconomic status (SES). Participants who tested positive had similar demographics to participants who tested negative. The groups were well balanced by gender and by SES, as indicated by the area disadvantage index (ADI)36. Participants whose PCR test did not detect SARS-CoV-2 (PCR- participants) tended to be slightly older and were more likely to be insured through Medicare than participants whose PCR test detected SARS-CoV-2 (PCR+ participants). This trend may result from a lower positivity rate among older asymptomatic patients undergoing pre-surgical clearance for outpatient surgery than among patients tested because of symptoms of COVID-19. Additionally, PCR+ participants were less likely to identify as non-Hispanic White, reflecting the rates of COVID-19 infection among different ethnic groups37.
Table 1
Demographics of study participants; p-values were computed using the chi-square test. Fisher’s exact tests were also run in consideration of small cell sizes. In all cases, the p-values for the chi-square or Fisher’s tests were >0.05.
| Characteristic | Category | Number of Participants (N=121) | Percentage of All Participants | Number of PCR+ Participants (Detected, N=40) | Percentage of All PCR+ Participants | Number of PCR- Participants (Not Detected, N=81) | Percentage of All PCR- Participants | P-value for difference between SARS-CoV-2 PCR Test Result (Detected vs. Not Detected) |
| Age | ≥ 75 years | 7 | 5.79 | 1 | 2.50 | 6 | 7.41 | 0.31 |
| | 65-74 years | 23 | 19.01 | 5 | 12.50 | 18 | 22.22 | |
| | 50-64 years | 42 | 34.71 | 15 | 37.50 | 27 | 33.33 | |
| | 18-49 years | 49 | 40.50 | 19 | 47.50 | 30 | 37.04 | |
| Gender | Female | 63 | 52.07 | 21 | 52.50 | 42 | 51.85 | 0.95 |
| | Male | 58 | 47.93 | 19 | 47.50 | 39 | 48.15 | |
| Race / Ethnicity | Hispanic | 3 | 2.48 | 1 | 2.50 | 2 | 2.47 | 0.07 |
| | Non-Hispanic Black | 40 | 33.06 | 16 | 40.00 | 24 | 29.63 | |
| | Non-Hispanic White | 73 | 60.33 | 19 | 47.50 | 54 | 66.67 | |
| | Other / Unknown | 5 | 4.13 | 4 | 10.00 | 1 | 1.23 | |
| Health Plan | Standard HMO | 71 | 58.68 | 29 | 72.50 | 42 | 51.85 | 0.07 |
| | High deductible | 8 | 6.61 | 2 | 5.00 | 6 | 7.41 | |
| | Medicare | 31 | 25.62 | 6 | 15.00 | 25 | 30.86 | |
| | Medicaid | 4 | 3.31 | 0 | 0.00 | 4 | 4.94 | |
| | Other / Unknown | 7 | 5.79 | 3 | 7.50 | 4 | 4.94 | |
| Area Disadvantage Index | Highest SES quartile | 47 | 38.84 | 14 | 35.00 | 33 | 40.74 | 0.85 |
| | Upper-middle quartile | 42 | 34.71 | 15 | 37.50 | 27 | 33.33 | |
| | Lower-middle quartile | 15 | 12.40 | 6 | 15.00 | 9 | 11.11 | |
| | Lowest SES quartile | 17 | 14.05 | 5 | 12.50 | 12 | 14.81 | |
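As an illustration of the chi-square comparison named in the Table 1 caption, the following is a minimal sketch of a Pearson chi-square test on a contingency table of counts. The function names are hypothetical, no continuity correction is applied, and the software and settings actually used for Table 1 are not stated here, so the result need not match the published p-values exactly.

```python
from math import erfc, exp, gamma, sqrt


def chi_square_stat(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi_square_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
        term, total = 1.0, 0.0
        for k in range(df // 2):
            total += term
            term *= half / (k + 1)
        return exp(-half) * total
    # Odd df = 2m + 1: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{m} (x/2)^(k-1/2) / Gamma(k+1/2)
    m = (df - 1) // 2
    total = sum(half ** (k - 0.5) / gamma(k + 0.5) for k in range(1, m + 1))
    return erfc(sqrt(half)) + exp(-half) * total


# Gender rows of Table 1: [PCR+ count, PCR- count] for Female and Male.
gender = [[21, 42], [19, 39]]
stat, df = chi_square_stat(gender)
print(f"chi-square = {stat:.4f}, df = {df}, p = {chi_square_sf(stat, df):.2f}")
```

For tables with small expected cell counts, such as the Medicaid row, an exact test is generally preferable, which is why the caption also mentions Fisher's exact test.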
Participants were instructed to return specimens immediately upon receiving an enrollment kit and then again at 10 and 30 days after their PCR test. Among participants who provided a specimen, the majority provided three specimens as instructed; however, the timing of collection and mailing in some cases varied considerably from the specified times. Samples were mailed by participants as early as 1 day and as late as 102 days after their RT-PCR test. The mode and median of days between PCR test and mailing the first sample were 9 and 11 days, respectively. To create groups with roughly equal numbers of samples, samples were divided into three time categories depending on whether they were mailed by participants <2 weeks, 2-4 weeks, or 4-8 weeks after the PCR test (Table 2).
Table 2
Overview of Received Specimens. Table shows number of participants who returned a specimen within the indicated time after a PCR test for SARS-CoV-2. Additionally, numbers of participants returning the indicated number of specimens are listed.
| Time between PCR test and when sample mailed | Number of PCR- Participants (Not Detected, N=81) | Percentage of All PCR- Participants | Number of PCR+ Participants (Detected, N=40) | Percentage of All PCR+ Participants |
| ≤2 Weeks | 47 | 58.0% | 28 | 70.0% |
| 2-4 Weeks | 47 | 58.0% | 25 | 62.5% |
| 4-8 Weeks | 63 | 77.8% | 27 | 67.5% |
| >8 Weeks | 15 | 18.5% | 2 | 5.0% |
| Number of Specimens Returned by Participant | | | | |
| 1 Specimen | 14 | 17.3% | 7 | 17.5% |
| 2 Specimens | 10 | 12.3% | 9 | 22.5% |
| 3 Specimens | 57 | 70.4% | 24 | 60.0% |
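The grouping of samples into the time categories of Table 2 can be sketched as a simple binning function. The exact handling of boundary days (for example, whether day 14 belongs to the first or second category) is not specified in the text, so the cut points below are an assumption, and the function name and example values are hypothetical.

```python
from collections import Counter


def time_category(days_after_pcr):
    """Assign a sample to a Table 2 time category from days between PCR test and mailing.

    Boundary handling (inclusive upper edges at 14, 28, and 56 days) is an assumption.
    """
    if days_after_pcr < 0:
        raise ValueError("sample cannot be mailed before the PCR test")
    if days_after_pcr <= 14:
        return "<=2 weeks"
    if days_after_pcr <= 28:
        return "2-4 weeks"
    if days_after_pcr <= 56:
        return "4-8 weeks"
    return ">8 weeks"


# Hypothetical mailing delays in days, for illustration only.
example_days = [1, 9, 11, 15, 30, 45, 102]
print(Counter(time_category(d) for d in example_days))
```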
The transit time of specimens in the mail from participants to the laboratory ranged from 1 to 31 days (1st quartile = 1.4 days, median = 1.9 days, 3rd quartile = 3.5 days). In total, 91% of specimens arrived in less than 5 days, which was our target window based on prior testing showing stability for 5 days35. Two specimens that were in the mail for more than 20 days were excluded from the analysis of test performance but were included in an analysis investigating potential indicators of sample degradation.
IgG reactivity to SARS-CoV-2 antigens
Concentrations of salivary IgG reactive to coronavirus antigens are shown in Figure 1. Antibody positivity for SARS-CoV-2 antigens was determined based on pre-established thresholds set at the 98th percentile for saliva self-collected from presumed naive participants (no PCR confirmed diagnosis, no household exposure, and no symptoms of COVID-19) in a previous study35. Pre-established IgG thresholds for Spike, RBD, and N were 0.963, 0.244, and 3.18 AU/mL, respectively.
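A minimal sketch of applying the pre-established thresholds follows. It treats a level strictly above the threshold as positive and a level at or below it as negative, consistent with how sensitivity and specificity are defined in this section. The dictionary layout, function name, and example values are hypothetical.

```python
# Pre-established IgG thresholds in AU/mL, as stated in the text.
THRESHOLDS_AU_PER_ML = {"Spike": 0.963, "RBD": 0.244, "N": 3.18}


def classify_sample(igg_levels):
    """Return antibody positivity per antigen for one saliva sample.

    igg_levels maps antigen name to measured IgG concentration in AU/mL.
    A level strictly above the threshold counts as positive.
    """
    return {
        antigen: igg_levels[antigen] > cutoff
        for antigen, cutoff in THRESHOLDS_AU_PER_ML.items()
    }


# Hypothetical measurements for illustration only.
print(classify_sample({"Spike": 2.1, "RBD": 0.20, "N": 3.18}))
```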
Clinical performance of the serology assays was determined relative to the COVID-19 PCR test result at enrollment. Sensitivity and specificity were calculated, respectively, as (i) the proportion of saliva specimens from PCR-confirmed cases with antibody levels above the pre-established thresholds and (ii) the proportion of saliva specimens from PCR-negative cases with antibody levels at or below the pre-established thresholds. Measured sensitivity and specificity values are provided in Table 3. The SARS-CoV-2 Spike IgG assay provided the best overall accuracy. Its sensitivity was only 40.7% within two weeks of PCR testing, but increased to 95.8% at 2-4 weeks and 92.6% at 4-8 weeks after PCR testing. Its specificity was 92.4%. By comparison, when the same assay was evaluated with serum samples in an independent study, the sensitivity and specificity were reported as 90.8% and 97.4%, respectively.38 The SARS-CoV-2 N IgG assay performed similarly to the SARS-CoV-2 Spike assay, with point estimates for sensitivity and specificity that were not statistically different. The SARS-CoV-2 RBD IgG assay exhibited similar sensitivity; however, its specificity was significantly poorer (Table 3), which may indicate that the pre-set assay threshold was not optimal (see discussion of threshold verification below).
Table 3
Sensitivity and specificity of salivary IgG for detection of prior SARS-CoV-2 infection. Point estimates and 95% confidence intervals (indicated in parentheses) were computed at pre-established thresholds.
Antigen | Isotype | Threshold (AU/mL) | Sensitivity at <2 weeks | Sensitivity at 2-4 weeks | Sensitivity at 4-8 weeks | Specificity |
SARS-CoV-2 Spike | IgG | 0.963 | 40.7% (22.4%-61.2%) | 95.8% (78.9%-99.9%) | 92.6% (75.7%-99.1%) | 92.4% (87.4%-95.9%) |
SARS-CoV-2 N | IgG | 3.18 | 48.1% (28.7%-68.1%) | 79.2% (57.8%-92.9%) | 92.6% (75.7%-99.1%) | 90.7% (85.3%-94.6%) |
SARS-CoV-2 S1 RBD | IgG | 0.244 | 66.7% (46.0%-83.5%) | 91.7% (73.0%-99.0%) | 92.6% (75.7%-99.1%) | 64.5% (56.9%-71.7%) |
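The point estimates and intervals in Table 3 are binomial proportions with 95% confidence intervals. The sketch below computes an exact (Clopper-Pearson) interval by bisection using only the standard library. The interval method is an assumption, since the text does not name it, and the counts in the usage example (23 of 24) are back-calculated from the reported 95.8% rather than taken from the source.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(func, target, iterations=100):
    """Bisection on [0, 1] for a function that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Lower bound: P(X >= successes | p) = alpha / 2
    lower = 0.0 if successes == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(successes - 1, n, p), alpha / 2)
    # Upper bound: P(X <= successes | p) = alpha / 2, i.e. P(X > successes | p) = 1 - alpha / 2
    upper = 1.0 if successes == n else _solve_increasing(
        lambda p: 1 - binom_cdf(successes, n, p), 1 - alpha / 2)
    return lower, upper


# Assumed counts: 23 of 24 PCR+ samples above threshold (back-calculated, not from the source).
positives, total = 23, 24
low, high = clopper_pearson(positives, total)
print(f"sensitivity = {positives / total:.1%} ({low:.1%}-{high:.1%})")
```

Under these assumed counts, the output should land close to the Spike entry at 2-4 weeks in Table 3. Specificity is computed the same way, with PCR-negative samples at or below the threshold counted as successes.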
IgG reactivity to the SARS-CoV-2 Spike protein was highly correlated with IgG reactivity to the N protein and to the RBD of the Spike protein, especially for samples from PCR-positive cases (Figure 2). The correlation of the reactivities to RBD and Spike (Figure 2a) shows that the concentrations of anti-RBD IgG antibodies tended to be about 3-fold lower than those for full-length Spike. As RBD is a fragment of the Spike protein, the difference in antibody activity is likely due to the reduced number of antigenic epitopes displayed by RBD relative to Spike. For PCR-negative individuals, measured reactivities of IgG to SARS-CoV-2 N tended to span a larger range than those of IgG to SARS-CoV-2 Spike, which were heavily skewed toward the bottom of the assay range. This result may be a consequence of cross-reactive host antibodies from previous infections with other circulating coronaviruses, since the N protein has greater conservation across human coronaviruses than the Spike protein. However, the effect of any cross-reactivity on assay performance was small, with the N assay showing only a small and statistically non-significant decrease in specificity relative to the Spike assay.
We looked for evidence that elevated antibody levels in PCR-negative participants may have been due to undiagnosed infections. Of the 67 PCR-negative participants who provided at least two samples, 6 had at least one sample above the threshold for the SARS-CoV-2 Spike IgG assay. Of these 6 participants, 2 had salivary IgG levels above the SARS-CoV-2 Spike threshold for all three of their samples. These 2 participants also had salivary IgG levels above the threshold for SARS-CoV-2 N protein, which suggests an undiagnosed infection prior to enrollment. Another 2 of the 6 participants had an initial negative sample and then showed delayed seroconversion after 30 days. The seroconversion was also observed using the SARS-CoV-2 N IgG assay, which suggests that these participants may have been infected at enrollment and received a false negative PCR test,39 or that they became infected after enrollment.
Verification of pre-established thresholds
Receiver operating characteristic (ROC) curves were generated (Figure 3), and the area under the curve (AUC) values for the ROC curves were calculated (Supplemental Table 1) to compare the diagnostic performance of the serology assays at different times after nasal PCR testing and to confirm that the pre-determined thresholds were optimal for identifying infections. For all three SARS-CoV-2 antigens, the AUC was significantly greater for samples collected more than 2 weeks after PCR testing than for samples collected within 2 weeks of testing, largely reflecting the higher sensitivity that was observed for the later samples (Table 3). The ROC curves and associated AUC values were not significantly different between samples collected 2-4 weeks and >4 weeks after PCR testing, indicating that the assays achieved optimal diagnostic performance by 2 weeks after PCR testing. The ROC curves for the SARS-CoV-2 Spike and N IgG assays were similar, with AUC values of 0.926 and 0.916, respectively, for samples collected 4-8 weeks after PCR testing. The SARS-CoV-2 RBD assay provided poorer classification, with an AUC value of 0.883 for the same samples.
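For readers unfamiliar with the metric, the AUC of a ROC curve equals the probability that a randomly chosen sample from a PCR-positive participant has a higher antibody level than a randomly chosen sample from a PCR-negative participant, with ties counted as one half. The sketch below computes it directly from that definition. The function name and data are hypothetical, and this is not necessarily how the reported AUC values were calculated.

```python
def roc_auc(pos_values, neg_values):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not pos_values or not neg_values:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_values:
        for n in neg_values:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_values) * len(neg_values))


# Hypothetical IgG levels (AU/mL) for illustration only.
pcr_positive = [0.5, 1.2, 2.0, 3.5, 4.1]
pcr_negative = [0.1, 0.2, 0.3, 0.6, 1.0]
print(roc_auc(pcr_positive, pcr_negative))
```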
To assess the validity of our pre-established thresholds, we computed the optimal thresholds using data only from this study by identifying the thresholds that maximize the sum of sensitivity and specificity. The pre-established thresholds and the optimal thresholds for this study are compared graphically in Figure 3. For the Spike and N IgG assays, the pre-established thresholds were close to optimal, and no significant improvement in sensitivity or specificity could be achieved by adjusting the threshold. In contrast, the pre-determined threshold for the RBD IgG assay was lower than optimal. For samples collected 4 weeks or later after PCR testing, increasing the threshold from 0.244 AU/mL to 0.684 AU/mL greatly improved the specificity, from 64% to 87%, while causing a much smaller loss in sensitivity, from 93% to 89%.
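Choosing the threshold that maximizes the sum of sensitivity and specificity is equivalent to maximizing Youden's J statistic. A minimal sketch follows, assuming every observed value is a candidate threshold and that a level strictly above the threshold counts as positive. The function name and data are hypothetical.

```python
def optimal_threshold(pos_values, neg_values):
    """Return (threshold, sensitivity, specificity) maximizing sensitivity + specificity.

    Candidate thresholds are the observed values; the lowest maximizing threshold is kept.
    """
    best = None
    for t in sorted(set(pos_values) | set(neg_values)):
        sensitivity = sum(v > t for v in pos_values) / len(pos_values)
        specificity = sum(v <= t for v in neg_values) / len(neg_values)
        score = sensitivity + specificity
        if best is None or score > best[0]:
            best = (score, t, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical IgG levels (AU/mL) for illustration only.
pcr_positive = [0.5, 1.2, 2.0, 3.5, 4.1]
pcr_negative = [0.1, 0.2, 0.3, 0.6, 1.0]
threshold, sens, spec = optimal_threshold(pcr_positive, pcr_negative)
print(threshold, sens, spec)
```

A threshold tuned on the same data used to evaluate it tends to give optimistic performance estimates, which is one reason to compare against thresholds fixed in advance.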
Exploration of Retest Criteria
Although saliva collection is simple and intuitive, the potential for poor specimen quality should be addressed when specimens are self-collected without supervision and transported under uncontrolled conditions. We explored options for identifying specimens at high risk of providing inaccurate results, including (i) the measurement of salivary antibodies that are expected to be universally abundant due to vaccinations or common natural infections, (ii) the measurement of total salivary immunoglobulin levels, and (iii) the measurement of background assay signals in the absence of an antigen target.
Prior infection with endemic coronaviruses is common,40,41 so we expected that all donors would have high levels of antibodies to at least one of the four pre-COVID-19 endemic coronaviruses.42 The multiplexed antigen panel used to measure antibodies to SARS-CoV-2 antigens also measured antibodies against the spike antigens of the four pre-COVID-19 endemic coronaviruses HKU1, NL63, OC43, and 229E (Figure 4a). Nearly all specimens had readily detectable levels of antibodies to the spike protein of at least one endemic coronavirus. As an aggregate metric of reactivity to endemic coronaviruses, we computed the geometric mean of salivary IgG for HKU1, NL63, OC43, and 229E. We flagged eight outlier samples with geometric means below 0.17 AU/mL, which is the geometric mean of the 5th percentiles of the salivary IgG for these four antigens measured in a prior study35. These outliers appear to result from an issue with sample collection or sample deterioration, rather than from a lack of immunity to endemic coronaviruses due to the absence of previous exposure or from general immunosuppression. In all cases where donors provided at least one other sample, normal levels of antibodies to endemic coronaviruses were measured at another time point. Sample deterioration due to delayed transit in the mail could explain some of the flagged outliers, but not all of them. Two of the flagged samples were the samples with the longest transit times (>20 days, due to a general slowdown in mail during a period of this study), but the other 6 flagged samples were received within the target window of 5 days. For samples received within the target window, there was no clear dependence of measured antibody levels on transit time (Figure 4b). We note that of the 8 flagged samples, 6 were true negatives, so excluding these samples did not significantly impact the reported sensitivity or specificity.
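The flagging rule described above can be sketched as follows: compute the geometric mean of the four endemic coronavirus spike IgG concentrations and flag the sample if it falls below 0.17 AU/mL. The function names, data layout, and example values are hypothetical, and all concentrations are assumed to be positive.

```python
from statistics import geometric_mean

ENDEMIC_ANTIGENS = ("HKU1", "NL63", "OC43", "229E")
FLAG_CUTOFF_AU_PER_ML = 0.17  # cutoff stated in the text


def endemic_geomean(igg_levels):
    """Geometric mean of spike IgG (AU/mL) across the four endemic coronaviruses."""
    return geometric_mean([igg_levels[a] for a in ENDEMIC_ANTIGENS])


def is_flagged(igg_levels):
    """True when the sample's endemic-coronavirus geometric mean is below the cutoff."""
    return endemic_geomean(igg_levels) < FLAG_CUTOFF_AU_PER_ML


# Hypothetical samples for illustration only.
typical = {"HKU1": 1.0, "NL63": 2.0, "OC43": 4.0, "229E": 8.0}
suspect = {"HKU1": 0.1, "NL63": 0.1, "OC43": 0.2, "229E": 0.2}
print(is_flagged(typical), is_flagged(suspect))
```

The geometric mean is a natural aggregate here because antibody concentrations span orders of magnitude, so a single high value does not mask uniformly low reactivity as easily as it would with an arithmetic mean.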
We also measured the total concentration of salivary IgG, IgM and IgA using a separate assay panel run at a different dilution (Figure 4c). Median concentrations of total IgG, IgA and IgM were 3.3 µg/mL, ≥ 200 µg/mL (the top of the assay dynamic range at the selected sample dilution) and 3.4 µg/mL, respectively, which are comparable to the values we measured previously43 (1.8 µg/mL for IgG, 124 µg/mL for IgA and 3.7 µg/mL for IgM). Moreover, the IgG, IgA and IgM concentrations align with published ranges measured using a different assay and collection method (IgG range = 0.4-93 µg/mL44; IgA = 50.2 ± 19.1 µg/mL45; IgM = 0.5-13.0 µg/mL46). Low observed levels for the endemic coronaviruses were generally associated with low levels of total immunoglobulin. Of the eight samples that were flagged for low antibody levels against the four endemic coronaviruses, six samples had undetectable total IgG levels at the sample dilution used for the total immunoglobulin measurement, and four samples provided the lowest measured levels of total IgA (Figure 4c).
Non-specific binding is another potential source of measurement error. Bovine serum albumin (BSA) was included as an antigen in the multiplex as a negative control. Specific binding of anti-BSA antibodies in samples should not occur because of the high concentration of BSA present in the assay diluents; therefore, binding to the BSA element in the antigen array should be indicative of antibodies that are able to bind non-specifically to the array surface. Non-specific binding, as assessed by the signal for the control spot coated with BSA, was generally low (average of 191 counts). Two specimens from the same PCR-negative donor exceeded 5,000 counts on the BSA-coated spot, whereas a third, intermediate sample from the same donor showed low non-specific binding.