To start, we introduce some notation. Disease status, \(D\), takes the value 1 if a subject has the disease in question (or, in the case of seroprevalence, has antibodies against it), and 0 otherwise. Similarly, the result of the diagnostic test, \(Y\), takes the value 1 if the subject tests positive for the disease, and 0 otherwise. FP is often used to refer to false positive test results, and similarly FN for false negatives, TN for true negatives and TP for true positives.
Prevalence is the probability of having the disease of interest, \(P=Pr\left(D=1\right)\). Prevalence studies often consider this probability at a specific point in time, giving so-called point prevalence (14). Seroprevalence, a related concept, is the proportion of individuals in the population who have antibodies against a specific pathogen, for example, SARS-CoV-2 (15). Sensitivity, denoted \(Se\) and sometimes also called the true positive fraction (TPF), is the probability of a positive test result given that the subject has the disease, \(Pr\left(Y=1\mid D=1\right)\) (2). Specificity, \(Sp\), is the probability of a negative test result given that the subject does not have the disease, \(Pr\left(Y=0\mid D=0\right)\); its complement, \(1-Sp\), is sometimes reported instead and is referred to as the false positive fraction, or FPF (16). When true disease status is known from another method, sometimes referred to as the “gold standard”, \(Se\) can be computed as \(TP/\left(TP+FN\right)\), where TP is the number of true positives and FN is the number of false negatives. Similarly, \(Sp\) can be computed as \(1-FP/\left(FP+TN\right)\), which equals \(TN/\left(TN+FP\right)\).
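As a small illustration of these two formulas, the following Python sketch computes \(Se\) and \(Sp\) from validation counts against a gold standard. The function names and the counts are hypothetical, chosen only for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Se = TP / (TP + FN): fraction of diseased subjects who test positive."""
    return tp / (tp + fn)


def specificity(fp: int, tn: int) -> float:
    """Sp = 1 - FP / (FP + TN), which is the same as TN / (TN + FP)."""
    return 1 - fp / (fp + tn)


# Hypothetical validation counts (test result vs. gold standard); illustrative only.
tp, fn, fp, tn = 95, 5, 2, 98
print(f"Se = {sensitivity(tp, fn):.3f}, Sp = {specificity(fp, tn):.3f}")
```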
The proportion of positive tests can be expressed as
$$Pr\left(Y=1\right)=\left(FP+TP\right)/\left(FP+TP+TN+FN\right),$$
while the disease prevalence in the sample can be expressed as
$$Pr\left(D=1\right)=\left(FN+TP\right)/\left(FP+TP+TN+FN\right).$$
The difference between these two quantities is simply \(\left(FP-FN\right)/\left(FP+TP+TN+FN\right)\), that is, the proportion of false positives minus the proportion of false negatives.
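The cancellation of TP in this difference can be checked numerically. The sketch below uses a hypothetical 2×2 table of counts (not data from any study) and confirms that the proportion of positive tests minus the sample prevalence equals \(\left(FP-FN\right)/N\).

```python
# Hypothetical 2x2 counts (test result Y vs. gold-standard status D); illustrative only.
TP, FP, TN, FN = 40, 30, 920, 10
N = TP + FP + TN + FN

p_test_pos = (FP + TP) / N   # Pr(Y = 1): proportion of positive tests
p_disease = (FN + TP) / N    # Pr(D = 1): prevalence in the sample
difference = p_test_pos - p_disease

# TP appears in both numerators and cancels, leaving (FP - FN) / N
assert abs(difference - (FP - FN) / N) < 1e-12
print(f"Pr(Y=1) = {p_test_pos:.3f}, Pr(D=1) = {p_disease:.3f}, difference = {difference:.3f}")
```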
By the definition of conditional probability, \(Pr\left(A,B\right)=Pr\left(A\mid B\right)Pr\left(B\right)\), so the proportion of false positives can be written as
$$Pr\left(Y=1,D=0\right)=Pr\left(Y=1\mid D=0\right)Pr\left(D=0\right),$$
which, since \(Pr\left(Y=1\mid D=0\right)=1-Sp\) and \(Pr\left(D=0\right)=1-P\), simplifies to \(\left(1-P\right)\left(1-Sp\right)\). In a similar fashion, the proportion of false negatives can be written as
$$Pr\left(Y=0,D=1\right)=Pr\left(Y=0\mid D=1\right)Pr\left(D=1\right),$$
which simplifies to \(P\left(1-Se\right)\). The bias when using the proportion of positive tests, \(Pr\left(Y=1\right)\), to estimate the proportion with disease, \(Pr\left(D=1\right)\), is therefore \(\left(1-P\right)\left(1-Sp\right)-P\left(1-Se\right)\), or equivalently \(1-Sp+P\left(Sp+Se-2\right)\).
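The two forms of the bias can be checked against each other, and against \(Pr\left(Y=1\right)-P\) computed via the law of total probability, with a short Python sketch. The function names and the example values \(Se=0.90\), \(Sp=0.95\), \(P=0.10\) are hypothetical.

```python
import math


def bias(p: float, se: float, sp: float) -> float:
    """Bias of Pr(Y=1) as an estimate of P = Pr(D=1)."""
    false_pos = (1 - p) * (1 - sp)  # Pr(Y=1, D=0) = Pr(Y=1 | D=0) Pr(D=0)
    false_neg = p * (1 - se)        # Pr(Y=0, D=1) = Pr(Y=0 | D=1) Pr(D=1)
    return false_pos - false_neg


def bias_expanded(p: float, se: float, sp: float) -> float:
    """Equivalent expanded form: 1 - Sp + P(Sp + Se - 2)."""
    return 1 - sp + p * (sp + se - 2)


def prob_test_positive(p: float, se: float, sp: float) -> float:
    """Pr(Y=1) = Se * P + (1 - Sp)(1 - P) by the law of total probability."""
    return se * p + (1 - sp) * (1 - p)


# Both forms agree, and equal Pr(Y=1) - P, on a grid of (P, Se, Sp) values.
grid = [i / 20 for i in range(21)]
for p in grid:
    for se in grid:
        for sp in grid:
            assert math.isclose(bias(p, se, sp), bias_expanded(p, se, sp), abs_tol=1e-12)
            assert math.isclose(prob_test_positive(p, se, sp) - p, bias(p, se, sp), abs_tol=1e-12)

# Hypothetical test with Se = 0.90, Sp = 0.95 at true prevalence P = 0.10
print(round(bias(0.10, 0.90, 0.95), 4))
```

In this hypothetical example the proportion of positive tests is 0.135, overstating a true prevalence of 0.10 by 3.5 percentage points.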
Suppose we want to guarantee that the bias is no larger than, say, \(\delta =0.02\) in absolute value, that is, ±2 percentage points in either direction. We can solve
$$-\delta \le 1-Sp+P\left(Sp+Se-2\right)\le \delta$$
for \(P\), to get:
$$\max\left(\frac{\delta +Sp-1}{Sp+Se-2},0\right)\le P\le \min\left(\frac{-\delta +Sp-1}{Sp+Se-2},1\right).$$
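The step from the double inequality to these bounds uses the fact that \(Sp+Se-2<0\) whenever the test is not perfect (\(Se+Sp<2\)), so dividing by it reverses the direction of each inequality. Written out under that assumption:
$$\begin{aligned}
1-Sp+P\left(Sp+Se-2\right)\le \delta
&\iff P\left(Sp+Se-2\right)\le \delta +Sp-1
\iff P\ge \frac{\delta +Sp-1}{Sp+Se-2},\\
-\delta \le 1-Sp+P\left(Sp+Se-2\right)
&\iff P\left(Sp+Se-2\right)\ge -\delta +Sp-1
\iff P\le \frac{-\delta +Sp-1}{Sp+Se-2}.
\end{aligned}$$
Intersecting these two conditions with \(0\le P\le 1\) gives the \(\max\) and \(\min\) in the bounds.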
The lower bound is 0 if \(\delta \ge 1-Sp\), and the upper bound is 1 if \(\delta \ge 1-Se\). Therefore, if both \(Se\) and \(Sp\) are very high, say 99% or higher, the proportion of positive tests is a good estimate of the true prevalence whatever that prevalence is. If only \(Se\) is that high, this holds only when the true prevalence is quite high; conversely, if only \(Sp\) is very high, it holds only when the true prevalence is quite low. When neither \(Se\) nor \(Sp\) is high, the proportion of positive tests is within \(\delta\) of the true prevalence only for prevalences in an interval of width at most \(2\delta /\left(2-Se-Sp\right)\), so whether it is a good estimate depends on where the true prevalence happens to fall.
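The bounds can be evaluated directly. The Python sketch below, with a hypothetical helper `prevalence_range_within_bias` (not from any library) and hypothetical \(\left(Se,Sp\right)\) pairs, returns the range of true prevalence for which the bias stays within \(\delta\), assuming an imperfect test so that \(Sp+Se-2<0\).

```python
def prevalence_range_within_bias(se: float, sp: float, delta: float) -> tuple[float, float]:
    """Interval of true prevalence P for which |Pr(Y=1) - P| <= delta.

    Assumes an imperfect test (Se + Sp < 2), so that Sp + Se - 2 < 0 and
    dividing by it reverses the inequalities.
    """
    k = sp + se - 2
    if k >= 0:
        raise ValueError("requires Se + Sp < 2")
    lower = max((delta + sp - 1) / k, 0.0)   # lower bound, clipped at 0
    upper = min((-delta + sp - 1) / k, 1.0)  # upper bound, clipped at 1
    return lower, upper


# Hypothetical (Se, Sp) pairs with delta = 0.02
for se, sp in [(0.99, 0.99), (0.99, 0.90), (0.90, 0.99), (0.80, 0.90)]:
    lo, hi = prevalence_range_within_bias(se, sp, 0.02)
    print(f"Se={se:.2f}, Sp={sp:.2f}: {lo:.3f} <= P <= {hi:.3f}")
```

With these inputs the first pair covers the whole range from 0 to 1, the second (high \(Se\) only) requires a prevalence of roughly 0.73 or more, the third (high \(Sp\) only) roughly 0.27 or less, and the last gives a band of width about 0.13, in line with the pattern described above.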
One simple way to reduce this bias, if no dependence on covariates is assumed, is to use the Rogan-Gladen correction (4). Assuming an observed fraction \({P}_{obs}\) of positive test results, the corrected prevalence is
$${P}_{RG}=\frac{{P}_{obs}+Sp-1}{Se+Sp-1}.$$
In a small number of cases, primarily when both the sample size and the prevalence are small (17, 18), the Rogan-Gladen correction yields values less than 0 or greater than 1, in which case the estimate is clipped to the interval \(\left[0,1\right]\). Although this “clipped” version has some bias, its variance is smaller.
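A minimal Python sketch of the Rogan-Gladen correction with optional clipping to \(\left[0,1\right]\) follows. The function name `rogan_gladen` and the `clip` flag are our own illustrative choices rather than part of any library, and the input values are hypothetical. The formula requires \(Se+Sp>1\), otherwise the denominator is zero or negative.

```python
def rogan_gladen(p_obs: float, se: float, sp: float, clip: bool = True) -> float:
    """Rogan-Gladen corrected prevalence (P_obs + Sp - 1) / (Se + Sp - 1).

    If clip is True, the estimate is truncated to [0, 1].
    """
    if se + sp <= 1:
        raise ValueError("Rogan-Gladen correction requires Se + Sp > 1")
    p_rg = (p_obs + sp - 1) / (se + sp - 1)
    if clip:
        p_rg = min(max(p_rg, 0.0), 1.0)
    return p_rg


# Hypothetical test with Se = 0.90, Sp = 0.95
print(rogan_gladen(0.135, 0.90, 0.95))              # observed 13.5% positive
print(rogan_gladen(0.01, 0.90, 0.95, clip=False))   # unclipped value is negative
print(rogan_gladen(0.01, 0.90, 0.95))               # clipped to 0
```

With an observed proportion of 0.135, the correction recovers a prevalence of about 0.10; with only 1% positive tests, fewer than the \(1-Sp=5\%\) expected from false positives alone, the unclipped estimate falls below 0.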
The systematic review of recent seroprevalence studies started with a PubMed (https://pubmed.ncbi.nlm.nih.gov/) search for “covid-19 seroprevalence”, which yielded 637 publications from 2022. Publications were included in the systematic review if they assessed COVID-19 seroprevalence in humans and were published in 2022 in English or German. Exclusion criteria included: 1) studies comparing seroprevalence in different subgroups, 2) studies examining risk factors for seropositivity, 3) studies in animals, 4) reviews, 5) methodological papers, 6) studies with a possible conflict of interest, 7) unavailable full text, and 8) research letters. The following information was extracted: 1) whether the aim of the study was to assess COVID-19 seroprevalence in humans, 2) the sensitivity and 3) the specificity of the diagnostic test, 4) the reported seroprevalence estimate (the first reported value; where an unadjusted estimate was reported before adjusted ones, we extracted the most adjusted version of that first reported seroprevalence), and 5) the statistical methods used to calculate seroprevalence. A protocol for the systematic review was developed using the PRISMA-P checklist (https://osf.io/b59x2/). Two independent reviewers (SRH and DK) screened the publications using the rayyan.ai web-based tool and performed data extraction in parallel using a structured spreadsheet. Discrepancies were resolved by discussion. Summary statistics were computed for the methods used (n (%)), reported sensitivity and specificity (median [range]), and estimated bias (median [range]).
To provide a concrete example of this problem, we use the Ciao Corona study (19), a school-based longitudinal study of seroprevalence in Swiss school children with 5 rounds of SARS-CoV-2 antibody testing between June 2020 and June 2022, covering a range of seroprevalences in the population (Trial Registration: ClinicalTrials.gov NCT04448717). The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Canton of Zurich, Switzerland (2020-01336). All participants provided written informed consent before being enrolled in the study.
Patient and public involvement
It was not appropriate or possible to involve patients or the public in the design, or conduct, or reporting, or dissemination plans of our research.