Simulation methods
A thorough summary of the simulation methods and a full list of microsimulation parameter sets and results are included in the supplementary materials. We created populations of 100,000 people aged 18–49 years without protection against SARS-CoV-2 infection at the beginning of the COVID-19 pandemic (the Morbidity and Mortality Weekly Report [MMWR] week of January 19, 2020).19 Each week until the MMWR week of May 7, 2023, we updated each person’s vaccine- and infection-induced protection against SARS-CoV-2 infection based on their most recent infection and vaccine dose since people could accumulate multiple infections and doses over time (Fig. 1).
Weekly infection probabilities were derived from aggregated case count data from 60 U.S. jurisdictions19 divided by 2020 population estimates20 (Figure S1). These proportions were increased by a multiplier (Figure S2) to account for underreporting of infections21 and to reach approximately 95%-98% of the population acquiring a prior infection by the end of the study period (Figure S3).
Individual, weekly probabilities of vaccination receipt utilized U.S. vaccination data, the number of prior doses and prior infection status. Data on vaccination distributions by vaccination dosage and week for U.S. people aged 18–49 years22 (Figure S4) were fit to probability distributions (Figure S5). For naïve people, we set the probability of obtaining two vaccination doses at 0.70, the probability of a third dose conditional on having two doses at 0.30, and the probability of a fourth dose conditional on having a third dose at 0.10.22 People with a prior SARS-CoV-2 infection have been less likely to initiate or subsequently receive an additional vaccination dose.23–28 We assumed people with a prior infection were less likely to receive an additional dose with an odds ratio of 0.525.23–28
A person’s protection was based on the most recent week of vaccination and infection. Waning curves were based on trajectories in published literature.29–34 A week after vaccination we assumed 90% vaccine-induced protection against infection (VP) prior to Omicron predominance and 70% protection thereafter. VP waned linearly to zero (1) at 48 weeks post-vaccination prior to Omicron predominance and 24 weeks thereafter or (2) at 24 weeks post-vaccination prior to Omicron predominance and 12 weeks thereafter (Figure S6) with variability by person (Figure S7). A week after infection, infection-induced protection (IP) had 90% protection against infection that waned to zero at 96 or 72 weeks (Figure S8) again with variability by person (Figure S9).
Hybrid immunity or protection (HP) definitions were taken from meta-analyses of protective effectiveness:33,35 (1) the greater of VP or IP was boosted by 30% of the other (Figure S10); or (2) VP was boosted by 30% of IP or IP was boosted by 10% of VP, whichever was greater (Figure S11). Both HP definitions were truncated at 99%. In these simulations, we considered 8 different protection calculations since we simulated each combination of the two VP, two IP, and two HP definitions.
Infections were generated from a person’s weekly protection with the function
$$\:\text{P}\text{r}\left({I}_{j,k}\right)=\text{P}\text{r}\left({c}_{k}\right)\text{*}\left(1-{\psi\:}_{j,k-1}\right),$$
where the probability of infection for each person (\(\:j\)) and week (\(\:k\)), \(\:\text{P}\text{r}\left({I}_{j,k}\right)\), depended on \(\:\text{P}\text{r}\left({c}_{k}\right)\), the case probability in week \(\:k\) and person \(\:j\)’s protection calculated from the previous week (\(\:{\psi\:}_{j,k-1}\)). An infection for person \(\:j\) in week \(\:k\) was generated from a Bernoulli distribution with probability \(\:\text{P}\text{r}\left({I}_{j,k}\right)\).
A total of 200 populations were generated for each of the 8 protection definition combinations. An example of protection trajectories is included in the supplementary materials (Figure S12).
The analytic period consisted of a hypothetical 32-week period beginning immediately after the historical period. Infections, vaccination doses, and protection were generated similarly to the historical period. Parameters were the 8 protection definition combinations, case distribution, vaccination rollout timing, total vaccination coverage, TND timing, and type of outcome (infection or severe disease).
Four infection distributions were utilized during the analytic period (Figure S13): weekly 2%; weekly 4%; weekly 2% increasing to a peak of 4% at weeks 16 and 17 before returning to 2%; and weekly 2% increasing to a peak of 6% at weeks 16 and 17 before returning to 2%. The vaccination rollout happened in weeks 1–12 (before the case peak), weeks 11–22 (during the case peak), or weeks 21–32 (after the case peak) and followed a lognormal distribution with a mean of 1.5 and a standard deviation of 0.5 (Figure S14). Other weeks had a vaccination probability of 0.005. Total vaccination coverage in the analytic period was 10% based on fourth dose vaccination coverage22 or 25% (Figure S15). The TND for symptomatic infections was implemented in weeks 1–12, weeks 11–22, or weeks 21–32. Since we implemented all possible combinations of vaccination rollout and TND timing, some scenarios involve assessing VE via the TND before the vaccination rollout. This is the equivalent of assessing VE long after vaccination has been given. All 32 weeks were used for the TND for severe disease.
COVID-19 symptoms were expected in 80% of infected people (Figure S16) and were present only in the week of infection. An uninfected person in week \(\:k\) was expected to have COVID-like symptoms with a probability of 0.20 divided by the number of weeks in the TND (Figure S17). For estimating VE against symptomatic infection, all symptomatic people were included in the TND. Diagnostic testing was assumed to have perfect specificity, but sensitivity was 90% during the week of infection and declined thereafter36 (Table S1).
For estimating VE against severe disease, VP was 90% the week after vaccination and waned to zero after 48 months.31,32,34,37 IP against severe disease started at 95% protection the week after infection and waned to zero after 96 months33 (Figure S18). For people with a SARS-CoV-2 infection, the probability of severe disease was
$$\:\text{P}\text{r}\left({S}_{j,k}|{I}_{j,k}=1\right)=\frac{\left(1-{{\psi\:}^{s}}_{j,k-1}\right)}{\left(1-{\psi\:}_{j,k-1}\right)},$$
where \(\:{S}_{j,k}\) is a severe disease event for person \(\:j\) in week \(\:k\), and \(\:{{\psi\:}^{s}}_{j,k-1}\) is the protection against severe disease for person \(\:j\) in week \(\:k\). All people with severe disease were included in the TND with perfect detection.
A total of 1000 simulations were run for each parameter set. Each of the 200 populations was utilized five times in each parameter set.
Statistical methods
Exposures analyzed were vaccination at any time during the analytic period, vaccination in the previous 2 months, vaccination in the previous 3 months, vaccination in the previous 4 months, vaccination in the previous 5 months, vaccination in the previous 6 months, the number of doses (unvaccinated as the reference group, 2-dose, 3-dose, 4-dose, or 5-dose), and the time since vaccination (unvaccinated as the reference group, 0–2 months, 3–4 months, 5–11 months, and 12 or more months).
Two logistic regression models were fit to each exposure definition. The first model included only the exposure variable (henceforth, the unadjusted model), whereas the second model added categorical time since the last infection (categories were monthly from 1 to 11 months and 12 or more months) and the number of prior infections as a continuous variable (the adjusted model). Odds ratios (OR) from logistic regressions were converted to VE in percentage points by the formula \(\:\text{VE}=\left(1-\text{OR}\right)*100\).
Our primary measurement is the difference between the VE estimate from the unadjusted model and the VE estimate from the adjusted model, which we refer to as bias. Bias is defined not in the traditional sense as the deviation from truth, but as the percentage point difference in VE from the unadjusted model and VE from the adjusted model. Bias less than zero indicated VE was underestimated without accounting for prior infection. A small percentage of simulations resulted in small sample sizes and unstable estimates. Details on bias definition and handling of unstable estimates are in the supplementary methods.
Results were aggregated by parameter set and exposure and plotted by exposure with ridgeline plots (Figure S20). Simple, random effects meta regression was used to estimate the expected VE and bias and, for infection outcomes, the percentage of simulations with a negative VE estimate. Separate meta regressions were run for the unadjusted and adjusted VE estimates. Multivariable meta regression models were run with simulation parameters to determine the mean VE, bias, and negative VE associated with each parameter level and the 95% confidence intervals. The overall mean and 95% confidence interval from simple meta regressions are used in Figs. 2 and 3 and values from multivariable meta regressions in Figs. 4 and 5 including the overall mean as a dashed line and the 95% CI as a shaded background.
All simulations were performed in R version 4.0.4 and analyses in R version 4.2.4.