All participants that were eligible for this study were enrolled; no a-priori sample size analyses were conducted to guide enrollment. All analyses were observational, and investigators were aware of participant exposure and outcome status.
Setting
Participants were selected from the United States Veterans Health Administration (VHA) electronic health databases. The VHA delivers healthcare to discharged Veterans of the US armed forces in a network of nationally integrated healthcare systems including more than 1,415 healthcare facilities. Veterans enrolled for care in the VHA have access to extensive medical benefits such as inpatient and outpatient services, preventative, primary and specialty care, mental health services, geriatric care, long term and home healthcare, medications, and medical equipment and prosthetics. VHA electronic health databases are updated daily.
Cohorts
A flowchart of cohort construction is provided in Supplementary Fig. 3. We first identified users of the VHA with at least one positive SARS-CoV-2 test between March 1, 2020 and September 4, 2021 (N=310,223), enrolling these participants at the date of their first positive test (set as T0). Use of the VHA was defined as having a record of use of outpatient or inpatient service, receipt of medication, or use of laboratory service within the VHA health care system in the two years prior to enrollment. We excluded those who died during the first 30 days after the first positive SARS-CoV-2 test, leaving 296,353 participants. We then further selected participants who experienced reinfection, defined as a positive SARS-CoV-2 test more than 30 days after the first infection13,14. There were 38,926 participants who had a reinfection within the 6 months following 30 days after T0, while 257,427 participants had only the first infection.
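The cohort counts reported above can be cross-checked with simple arithmetic. The sketch below uses only the numbers given in this section and reads 296,353 as the number of participants remaining after the 30-day mortality exclusion, a reading that agrees with the sum of the two infection groups. Variable names are illustrative.

```python
# Counts reported in the text (variable names are illustrative).
n_first_positive = 310_223   # VHA users with at least one positive test
n_alive_30d = 296_353        # assumed: remaining after excluding deaths within 30 days of T0
n_reinfection = 38_926       # reinfection in the 6 months following T0 + 30 days
n_first_only = 257_427       # first infection only

# The two infection groups should partition the retained cohort.
assert n_reinfection + n_first_only == n_alive_30d

# Implied number excluded because of death in the first 30 days.
n_excluded_deaths = n_first_positive - n_alive_30d
print(n_excluded_deaths)  # 13870
```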
We then constructed a non-infected control group. We first identified 5,714,736 VHA users between March 1, 2020 and September 4, 2021 with no record of a positive SARS-CoV-2 test. We then randomly assigned a T0 to each participant in the group on the basis of the distribution of the T0 dates in those with at least one positive SARS-CoV-2 test between March 1, 2020 and September 4, 2021. We excluded those who died in the first 30 days after their T0, yielding a control cohort of 5,396,855. All cohort participants were followed until April 4th, 2022.
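As an illustration of the T0 assignment for the control group, the following minimal sketch draws each control's T0 from the empirical distribution of T0 dates in the infected group. The text does not state how the draw was implemented; sampling with replacement, the function name, the seed, and the example dates are all assumptions made here for illustration.

```python
import random
from datetime import date

def assign_control_t0(infected_t0_dates, n_controls, seed=0):
    """Draw one T0 per control by sampling, with replacement, from the
    observed T0 dates of the infected group (their empirical distribution)."""
    rng = random.Random(seed)
    return rng.choices(infected_t0_dates, k=n_controls)

# Hypothetical T0 dates for four infected participants.
infected_t0 = [date(2020, 3, 15), date(2020, 12, 1),
               date(2020, 12, 1), date(2021, 8, 20)]
control_t0 = assign_control_t0(infected_t0, n_controls=10)
```

Because dates that occur more often among the infected are drawn more often, the controls' T0 distribution approximates that of the infected group as the number of controls grows.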
Data sources
Participant data were obtained from the VHA Corporate Data Warehouse (CDW). The SPatient and Patient domains provided data on demographic characteristics. The Outpatient and Inpatient Encounters domains provided health characteristic information including details on date and place of encounter with the healthcare system, as well as diagnostic and procedural information. The Pharmacy and Bar Code Medication Administration domains provided medication records, while the Laboratory Results domain provided results for laboratory tests conducted in both inpatient and outpatient settings7,15. Information about SARS-CoV-2 tests and vaccinations was obtained from the COVID-19 Shared Data Resource (CSDR). The 2019 Area Deprivation Index (ADI) at each cohort participant's residential address was used as a contextual measure of socioeconomic disadvantage16. Information from the US Centers for Disease Control and Prevention (CDC) provided the proportion of each SARS-CoV-2 variant by week in each Health and Human Services region.
Outcomes
Outcomes were pre-specified on the basis of prior evidence1-9,15,17-22. Outcomes included all-cause mortality, hospitalization, having at least one sequela, as well as organ system disorders including cardiovascular disorders, coagulation and hematologic disorders, diabetes, fatigue, gastrointestinal disorders, kidney disorders, mental health disorders, musculoskeletal disorders, neurologic disorders, and pulmonary disorders. Organ system disorders were defined at date of first incident sequela in that system during follow-up. A list of individual sequelae by organ system is provided in Supplementary Table 14. The outcome of “at least one sequela” was defined at the time of occurrence of first incident sequela among all individual sequelae. For a given participant and outcome, each individual sequela was included in the assessed outcome only when there was no record of that health condition in the two years prior to T0. Participants were excluded from the analysis of an outcome if they had a prior history of all the individual sequelae that contributed to the outcome being examined. Hospitalization was defined as first inpatient admission during follow-up. In analyses of kidney disorders, participants with a prior history of end stage kidney disease (ESKD) were excluded, and follow-up was censored at time of ESKD (Supplementary Table 14).
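The incident-outcome rule described above can be sketched as follows. This is a simplified illustration, not the study code: the two-year lookback is approximated as 730 days, the sequela names are hypothetical, and diagnoses recorded between T0 and the start of follow-up are ignored because the text does not say how they were handled.

```python
from datetime import date, timedelta

LOOKBACK = timedelta(days=730)  # "two years prior to T0", approximated as 730 days

def first_incident_date(t0, follow_up_start, diagnoses, outcome_sequelae):
    """Return (eligible, event_date) for a composite outcome.

    diagnoses: list of (sequela_name, diagnosis_date) tuples.
    outcome_sequelae: set of sequela names that make up the outcome.
    """
    # Sequelae already on record in the lookback window are not incident.
    prevalent = {name for name, d in diagnoses
                 if name in outcome_sequelae and t0 - LOOKBACK <= d < t0}
    at_risk = outcome_sequelae - prevalent
    if not at_risk:
        # Prior history of every contributing sequela: excluded from this outcome.
        return (False, None)
    events = [d for name, d in diagnoses
              if name in at_risk and d >= follow_up_start]
    return (True, min(events) if events else None)

t0 = date(2021, 1, 10)
start = t0 + timedelta(days=30)
cardio = {"heart failure", "myocarditis"}  # hypothetical sequela names
dx = [("heart failure", date(2020, 6, 1)),   # prevalent, so not counted
      ("heart failure", date(2021, 3, 1)),
      ("myocarditis", date(2021, 4, 15))]    # first incident sequela
result = first_incident_date(t0, start, dx, cardio)
print(result)  # (True, datetime.date(2021, 4, 15))
```

In this example the heart failure record after T0 does not count as an event because the condition was already on record before T0; the outcome date is therefore the myocarditis diagnosis.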
Covariates
Covariates included a set of variables that were predefined based on prior knowledge4-7,15,17,19-21,23-29 and a set of variables that were algorithmically selected. Predefined covariates included demographic information (age, race, and sex), contextual information (ADI), and measures of healthcare utilization in the two years prior to T0, which included the number of outpatient visits, inpatient visits, unique medication prescriptions, routine laboratory blood panels, and utilization of Medicare services, as well as a prior history of receiving an influenza vaccination. Smoking status was also included as a covariate. Characteristics of the participants' health history included record of anxiety, cancer, cardiovascular disease, cerebrovascular disease, chronic kidney disease, chronic obstructive pulmonary disease, dementia, depression, type 2 diabetes mellitus, estimated glomerular filtration rate, immunocompromised status, peripheral artery disease, as well as systolic and diastolic blood pressure and body mass index (BMI). Immunocompromised status was defined according to CDC definitions by a history of organ transplantation, advanced kidney disease (an estimated glomerular filtration rate less than 15 ml/min/1.73m2 or end stage renal disease), cancer, HIV, or conditions with prescriptions of more than 30 days use of corticosteroids or immunosuppressants including systemic lupus erythematosus and rheumatoid arthritis.
We also included a set of covariates related to the acute phase of the first infection: severity of the acute phase of the disease, defined in mutually exclusive groups of non-hospitalized, hospitalized, and admitted to the ICU during the acute phase, and whether the participant received a SARS-CoV-2 treatment (antivirals, antibodies, and steroids). We also included, as measures of spatiotemporal differences, the calendar week of enrollment and geographic region of receipt of care defined by Veterans Integrated Services Networks. We also adjusted for vaccination status, defined as having received 0, 1, or 2 or more shots of the Janssen [Johnson & Johnson] (Ad26.COV2.S), Pfizer-BioNTech (BNT162b2), or Moderna (mRNA-1273) vaccines. In consideration of the dynamicity of the pandemic, additional covariates included hospital system capacity (the total number of inpatient hospital beds), and inpatient bed occupancy rates (the percentage of hospital beds that were occupied), as well as a measure of the proportions of SARS-CoV-2 variants by Health and Human Services region29. These measures were ascertained for each participant in the week of cohort enrollment at the location of the health care system where they received care.
In addition to the predefined covariates, we leveraged the high dimensionality of VA electronic health records by employing a high dimensional variable selection algorithm to identify additional covariates that may confound the examined associations30. We used the diagnostic classification system from the Clinical Classifications Software Refined (CCSR) version 2021.1, available from the Healthcare Cost and Utilization Project sponsored by the Agency for Healthcare Research and Quality, to classify more than 70,000 ICD-10 diagnosis codes in the two years prior to T0 for each participant into 540 diagnostic categories31-33. Using the VA national drug classification system, we also classified 3,425 different medications into 543 medication classes34,35. Finally, on the basis of Logical Observation Identifiers Names and Codes, we classified laboratory results from 38 different laboratory measurements into 62 laboratory test abnormalities, defined by being above or below the corresponding reference ranges. Of the high dimensional variables that occurred at least 100 times in participants in each group, we selected the top 100 variables with the highest relative risk for differences in group membership in first infection or reinfection.
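The final selection step (a prevalence filter followed by ranking on relative risk) can be illustrated with the minimal sketch below. It is not the authors' implementation: the function name, the variable names, and the counts are hypothetical, and ranking by the distance of the relative risk from 1 in either direction is an assumption, since the text only says "highest relative risk".

```python
def select_high_dimensional_covariates(counts_a, counts_b, n_a, n_b,
                                       min_count=100, top_k=100):
    """Pick candidate covariates that differ most between two groups.

    counts_a / counts_b: dict mapping variable name -> number of participants
    in group A / group B who have that variable. n_a / n_b: group sizes.
    """
    scored = []
    for var in counts_a.keys() & counts_b.keys():
        a, b = counts_a[var], counts_b[var]
        if a < min_count or b < min_count:
            continue  # must occur at least `min_count` times in each group
        rr = (a * n_b) / (b * n_a)   # prevalence in A divided by prevalence in B
        strength = max(rr, 1 / rr)   # assumption: treat RR and 1/RR alike
        scored.append((strength, var))
    scored.sort(key=lambda t: (-t[0], t[1]))  # strongest first, ties by name
    return [var for _, var in scored[:top_k]]

# Hypothetical counts for 1,000 reinfected and 10,000 first-infection participants.
counts_reinf = {"dx_A": 300, "dx_B": 150, "rx_C": 90, "lab_D": 500}
counts_first = {"dx_A": 1000, "dx_B": 1500, "rx_C": 2000, "lab_D": 5000}
selected = select_high_dimensional_covariates(
    counts_reinf, counts_first, n_a=1000, n_b=10000, top_k=2)
print(selected)  # ['dx_A', 'dx_B']
```

Here `rx_C` is dropped by the prevalence filter (90 occurrences in one group), and `dx_A` ranks first because its prevalence is three times higher among the reinfected.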
Statistical Analysis
Mean (standard deviation) and frequency (percentage) of characteristics are reported for those with first SARS-CoV-2 infection, SARS-CoV-2 reinfection, and the non-infected control group, where appropriate. We provide information on the distribution of frequency of reinfections and time between infections.
All associations were estimated based on weighting approaches combined with survival analyses. We conducted a primary analysis to evaluate the risk and burden of reinfection in comparison to first infection (Supplementary Fig. 3), where reinfection was compared to those with first infection at the same time since the first infection. To achieve this, we constructed six sets of sub-cohorts by 30-day time periods starting from 30 days after T0, where within each period participants were assigned to the reinfection or first infection group depending on whether they had a reinfection during that period. Those with a reinfection prior to the period were excluded. Participants with multiple reinfections were not censored at the time of a third or subsequent infection. Participants, by period, were followed from time of reinfection (T1) up to death, six months, or administrative censoring. To enhance comparisons, within each sub-cohort the distribution of time from the initial infection for the first infection group was randomly assigned on the basis of the distribution in those with reinfection.
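The period-based sub-cohort assignment can be sketched as below. This is a simplified illustration under stated assumptions: periods are taken as the intervals (30, 60], (60, 90], ..., (180, 210] days after T0, the boundary handling is a guess, and death or censoring before a period begins is not modeled. The function name and group labels are hypothetical.

```python
def assign_period_groups(days_to_reinfection, n_periods=6,
                         period_days=30, start_day=30):
    """Map each 30-day period to the participant's group in that period.

    days_to_reinfection: days from T0 to reinfection, or None if never reinfected.
    Periods in which the participant is excluded (reinfected earlier) are omitted.
    """
    groups = {}
    for k in range(1, n_periods + 1):
        lo = start_day + (k - 1) * period_days   # period k covers (lo, hi]
        hi = lo + period_days
        d = days_to_reinfection
        if d is not None and d <= lo:
            continue                             # reinfected before this period
        if d is not None and lo < d <= hi:
            groups[k] = "reinfection"
        else:
            groups[k] = "first_infection"
    return groups

print(assign_period_groups(75))
# {1: 'first_infection', 2: 'reinfection'}
```

A participant reinfected on day 75 therefore contributes to the first infection group in period 1, to the reinfection group in period 2, and is excluded from periods 3 through 6.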
For each sub-cohort, logistic regressions were constructed to estimate the propensity score of group membership. A reference cohort of the overall infected cohort at T0 was used as the target population. Inverse probability weighting was then used to balance covariates. Differences in duration of follow-up were adjusted for. Cohorts across periods were pooled to estimate the average risk difference between those with and without a repeat infection using a weighted Cox survival model conditional on period. Standard errors were estimated using the robust sandwich variance estimator. Covariate balance among all predefined and high dimensional variables was assessed for each group/period pair through the standardized mean difference (SMD), where a difference <0.1 was taken as evidence of balance. We estimated the incidence rate difference (referred to as excess burden) between groups per 1,000 participants at 6 months after the start of follow-up based on the difference in survival probability between the groups. These analyses were repeated by subgroups on the basis of the number of vaccination shots received (0, 1, or 2+) before reinfection. To test whether the risk on the multiplicative scale differed between the periods, a model with a linear interaction term between reinfection status and period was constructed, and the corresponding p-value is reported for the outcomes of all-cause mortality, hospitalization, and having at least one sequela.
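Two of the quantities in this paragraph, the weighted SMD used to check balance and the excess burden per 1,000 participants, can be illustrated with the sketch below. The text does not give the exact SMD formula; the version here uses the common convention of dividing the absolute difference in weighted means by the square root of the average of the two weighted variances, which is an assumption. All numbers in the example are made up.

```python
import math

def weighted_mean_var(values, weights):
    """Weighted mean and (population-style) weighted variance."""
    total = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total
    return mean, var

def standardized_mean_difference(x_a, w_a, x_b, w_b):
    """Absolute SMD between two weighted groups (pooled-SD convention)."""
    mean_a, var_a = weighted_mean_var(x_a, w_a)
    mean_b, var_b = weighted_mean_var(x_b, w_b)
    pooled_sd = math.sqrt((var_a + var_b) / 2)
    return abs(mean_a - mean_b) / pooled_sd if pooled_sd > 0 else 0.0

def excess_burden_per_1000(surv_reference, surv_exposed):
    """Difference in event probability at a fixed time, per 1,000 participants."""
    return 1000 * (surv_reference - surv_exposed)

# Hypothetical ages with inverse probability weights in two groups.
smd = standardized_mean_difference([60, 70, 80], [1, 1, 1],
                                   [60, 70, 80], [2, 1, 1])
balanced = smd < 0.1   # threshold stated in the text

# Hypothetical 6-month survival probabilities: 0.98 (reference) vs 0.96 (exposed).
burden = excess_burden_per_1000(0.98, 0.96)   # about 20 per 1,000
```

In this toy example the weights in the second group pull its mean age down, giving an SMD of roughly 0.3, which would not meet the <0.1 balance criterion.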
To examine whether risks associated with a reinfection were present in the acute and post-acute phase of the reinfection, we conducted analyses to examine risks in 30-day time intervals starting at time of reinfection up to 180 days after reinfection. Hazard ratios and 30-day burdens were estimated independently for each 30-day time interval. During each 30-day interval outcomes were defined at time of first occurrence within this interval in those that did not have that outcome in the two years prior to the first infection.
We then examined the risk and cumulative burden of sequelae associated with one, two, and three or more infections versus a non-infected control (Supplementary Fig. 4). A third or subsequent infection was defined as a positive test at least 30 days after the second infection. Number of infections and outcomes were assessed in the 180 days following T0 + 30 days. Because participants with three or more infections must have survived long enough during follow-up to have that third (or subsequent) infection, we did not examine the outcome of all-cause mortality, to avoid immortal time bias.
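Counting infections from a participant's positive test dates can be sketched as follows. This is an illustration only: the function names and dates are hypothetical, the 30-day gap is measured from the start of the preceding infection, and the comparison is strict ("more than 30 days"), whereas the text uses "more than 30 days" for reinfection and "at least 30 days" for a third infection. The 180-day assessment window is not applied here.

```python
from datetime import date

def count_infections(positive_test_dates, min_gap_days=30):
    """Count distinct infections: a positive test starts a new infection only
    if it falls more than `min_gap_days` after the start of the previous one."""
    count = 0
    last_start = None
    for d in sorted(positive_test_dates):
        if last_start is None or (d - last_start).days > min_gap_days:
            count += 1
            last_start = d
    return count

def infection_group(n_infections):
    """Collapse the count into the categories used for comparison."""
    return "3+" if n_infections >= 3 else str(n_infections)

tests = [date(2020, 11, 1), date(2020, 11, 10),   # same infection
         date(2021, 2, 1), date(2021, 2, 20),     # second infection
         date(2021, 6, 15)]                       # third infection
n = count_infections(tests)
print(n, infection_group(n))  # 3 3+
```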
Positive and negative controls
We examined, as positive outcome controls, the risk of fatigue in those with a SARS-CoV-2 infection compared to the non-infected control as a means of testing whether our approach would reproduce established knowledge4,5,19-21.
The application of negative outcome controls may help detect both suspected and unsuspected sources of spurious biases. We, therefore, examined the difference in risks of atopic dermatitis and neoplasms between those with reinfection and those with first infection, where no prior knowledge suggests an association should be expected. The testing of positive and negative outcome controls may lessen, though not eliminate, concerns about biases related to study design, covariate selection, analytic approach, outcome ascertainment, unmeasured confounding, and other potential sources of latent biases36,37.
All analyses were two-sided. In all analyses, a 95% confidence interval that excluded unity was considered evidence of statistical significance. All analyses were conducted in SAS Enterprise Guide 8.2, and all figures were generated in R 4.0.4. This study was approved by the VA St. Louis Health Care System Institutional Review Board (protocol number 1606333).
Data availability: The data that support the findings of this study are available from the US Department of Veterans Affairs. VA data are made freely available to researchers behind the VA firewall with an approved VA study protocol. For more information, please visit https://www.virec.research.va.gov or contact the VA Information Resource Center (VIReC).
Code availability: The analytic code is available at https://github.com/BcBowe3
References
1. Al-Aly, Z., Xie, Y. & Bowe, B. High-dimensional characterization of post-acute sequelae of COVID-19. Nature 594, 259-264 (2021).
2. Cohen, K., et al. Risk of persistent and new clinical sequelae among adults aged 65 years and older during the post-acute phase of SARS-CoV-2 infection: retrospective cohort study. BMJ 376, e068414 (2022).
3. Bull-Otterson, L., et al. Post-COVID Conditions Among Adult COVID-19 Survivors Aged 18–64 and ≥65 Years - United States, March 2020–November 2021. MMWR Morb Mortal Wkly Rep 71, 713-717 (2022). DOI: http://dx.doi.org/10.15585/mmwr.mm7121e1
4. Daugherty, S.E., et al. Risk of clinical sequelae after the acute phase of SARS-CoV-2 infection: retrospective cohort study. BMJ 373, n1098 (2021).
5. Ayoubkhani, D., et al. Post-covid syndrome in individuals admitted to hospital with covid-19: retrospective cohort study. BMJ 372, n693 (2021).
6. Carfi, A., Bernabei, R., Landi, F. & Gemelli Against COVID-19 Post-Acute Care Study Group. Persistent Symptoms in Patients After Acute COVID-19. JAMA 324, 603-605 (2020).
7. Xie, Y., Bowe, B. & Al-Aly, Z. Burdens of post-acute sequelae of COVID-19 by severity of acute infection, demographics and health status. Nat Commun 12, 6571 (2021).
8. Al-Aly, Z., Bowe, B. & Xie, Y. Long COVID after breakthrough SARS-CoV-2 infection. Nature Medicine (2022).
9. Taquet, M., et al. Incidence, co-occurrence, and evolution of long-COVID features: A 6-month retrospective cohort study of 273,618 survivors of COVID-19. PLoS Med 18, e1003773 (2021).
13. Adrielle Dos Santos, L., et al. Recurrent COVID-19 including evidence of reinfection and enhanced severity in thirty Brazilian healthcare workers. J Infect 82, 399-406 (2021).
14. Michlmayr, D., et al. SARS-CoV-2 Reinfections in Denmark Confirmed by Whole Genome Sequencing. Available at SSRN: https://ssrn.com/abstract=4054457 or http://dx.doi.org/10.2139/ssrn.4054457 (2022).
15. Bowe, B., Xie, Y., Xu, E. & Al-Aly, Z. Kidney Outcomes in Long COVID. Journal of the American Society of Nephrology, ASN.2021060734 (2021).
16. Kind, A.J.H. & Buckingham, W.R. Making Neighborhood-Disadvantage Metrics Accessible - The Neighborhood Atlas. N Engl J Med 378, 2456-2458 (2018).
17. Xie, Y., Xu, E., Bowe, B. & Al-Aly, Z. Long-term Cardiovascular Outcomes of COVID-19. Nature Medicine (2022).
18. Xie, Y., Xu, E. & Al-Aly, Z. Risks of Mental Health Outcomes in People with Covid-19: cohort study. BMJ (2022).
19. Taquet, M., et al. Incidence, co-occurrence, and evolution of long-COVID features: A 6-month retrospective cohort study of 273,618 survivors of COVID-19. PLOS Medicine 18, e1003773 (2021).
20. Davis, H.E., et al. Characterizing long COVID in an international cohort: 7 months of symptoms and their impact. EClinicalMedicine 38, 101019 (2021).
21. Taquet, M., Geddes, J.R., Husain, M., Luciano, S. & Harrison, P.J. 6-month neurological and psychiatric outcomes in 236 379 survivors of COVID-19: a retrospective cohort study using electronic health records. Lancet Psychiatry 8, 416-427 (2021).
22. Xie, Y. & Al-Aly, Z. Risks and burdens of incident diabetes in long COVID: a cohort study. The Lancet Diabetes & Endocrinology 10, 311-321 (2022).
23. Xie, Y., Xu, E. & Al-Aly, Z. Risks of mental health outcomes in people with covid-19: cohort study. BMJ 376, e068993 (2022).
24. Xie, Y. & Al-Aly, Z. Risks and burdens of incident diabetes in long COVID-19: a cohort study. Lancet Diabetes Endocrinol (2022).
25. Spudich, S. & Nath, A. Nervous system consequences of COVID-19. Science 375, 267-269 (2022).
26. Cai, M., Bowe, B., Xie, Y. & Al-Aly, Z. Temporal trends of COVID-19 mortality and hospitalisation rates: an observational cohort study from the US Department of Veterans Affairs. BMJ Open 11, e047369 (2021).
27. Nalbandian, A., et al. Post-acute COVID-19 syndrome. Nature Medicine 27, 601-615 (2021).
28. Daugherty, S.E., et al. Risk of clinical sequelae after the acute phase of SARS-CoV-2 infection: retrospective cohort study. BMJ 373, n1098 (2021).
29. Sharma, A., Oda, G. & Holodniy, M. COVID-19 Vaccine Breakthrough Infections in Veterans Health Administration. medRxiv 2021.09.23.21263864 (2021).
30. Schneeweiss, S., et al. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 20, 512-522 (2009).
31. Wei, Y., et al. Short term exposure to fine particulate matter and hospital admission risks and costs in the Medicare population: time stratified, case crossover study. BMJ 367, l6258 (2019).
32. Aubert, C.E., et al. Best Definitions of Multimorbidity to Identify Patients With High Health Care Resource Utilization. Mayo Clin Proc Innov Qual Outcomes 4, 40-49 (2020).
33. HCUP CCSR. Healthcare cost and utilization project (HCUP). Agency for Healthcare Research and Quality, Rockville, MD. Vol. 2021.
34. Olvey, E.L., Clauschee, S. & Malone, D.C. Comparison of critical drug-drug interaction listings: the Department of Veterans Affairs medical system and standard reference compendia. Clin Pharmacol Ther 87, 48-51 (2010).
35. Greene, M., Steinman, M.A., McNicholl, I.R. & Valcour, V. Polypharmacy, drug-drug interactions, and potentially inappropriate medications in older adults with human immunodeficiency virus infection. J Am Geriatr Soc 62, 447-453 (2014).
36. Lipsitch, M., Tchetgen Tchetgen, E. & Cohen, T. Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology 21, 383-388 (2010).
37. Shi, X., Miao, W. & Tchetgen, E.T. A Selective Review of Negative Control Methods in Epidemiology. Current Epidemiology Reports 7, 190-202 (2020).