Data sources, study design, and study period
We conducted a population-based descriptive cohort study using primary care electronic health records from England, UK and Catalonia, Spain.
Primary care electronic health records in England were obtained from the Clinical Practice Research Datalink (CPRD) AURUM, which covers 20% of the population in the UK.20,22 Spanish data were obtained from the Information System for Research in Primary Care (SIDIAP; www.sidiap.org) database, which captures more than 75% of the population living in Catalonia, a region in the northeast of Spain.19 SIDIAP was linked to hospital discharge records from public and private hospitals in Catalonia (Conjunt Mínim Bàsic de Dades d’Alta Hospitalària, CMBD-AH).23 Both databases include information on demographics, clinical diagnoses, and laboratory tests, including SARS-CoV-2 reverse transcription polymerase chain reaction (RT-PCR) tests. SIDIAP also captures SARS-CoV-2 antigen tests performed at public healthcare facilities. Although SARS-CoV-2 antigen test results may appear in CPRD, the counts are expected to be incomplete.
The databases were standardised to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM),24 allowing the same analytical code to be applied without sharing individual data.
The study period ran from 1 September 2020 to the end of data availability, i.e. June 2021 for CPRD and March 2022 for SIDIAP, where data were censored to avoid misclassification due to changes in COVID-19 testing policies.
Study population
We defined three non-mutually-exclusive COVID-19 cohorts ‒ (1) all COVID-19 cases, (2) first SARS-CoV-2 infections, and (3) SARS-CoV-2 reinfections ‒ and two negative-test comparator cohorts ‒ (1) first/earliest SARS-CoV-2 negative tests and (2) all SARS-CoV-2 negative tests.
COVID-19 cases were identified from positive SARS-CoV-2 antigen and RT-PCR tests, with the test date as the index date. A COVID-19 case was defined as an infection with no record of SARS-CoV-2 infection in the previous 42 days. First infections were defined as SARS-CoV-2 infections without any prior history of COVID-19. Reinfections were defined as SARS-CoV-2 infections that were not identified as first infections.
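The case definitions above can be sketched as follows. This is a minimal illustration, not the study's code: the function name is hypothetical, and it assumes the 42-day window is measured from the index date of the most recent counted infection, which the text does not specify.

```python
from datetime import date

WASHOUT_DAYS = 42  # from the text: no recorded infection in the previous 42 days


def classify_infections(positive_test_dates):
    """Label one person's positive tests as a first infection or reinfections.

    Assumption (not stated in the text): the washout is measured from the
    index date of the most recent counted infection, so positive tests
    inside the window are treated as part of the same infection.
    """
    episodes = []
    last_index = None
    for test_date in sorted(set(positive_test_dates)):
        if last_index is not None and (test_date - last_index).days <= WASHOUT_DAYS:
            continue  # within 42 days of the last index date: same infection
        label = "first infection" if not episodes else "reinfection"
        episodes.append((test_date, label))
        last_index = test_date
    return episodes


tests = [date(2020, 10, 1), date(2020, 10, 20), date(2021, 1, 15)]
for index_date, label in classify_infections(tests):
    print(index_date.isoformat(), label)
```

Here the test on 20 October 2020 falls inside the washout of the first infection, so only two index dates are returned.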
The two negative-test comparator cohorts were identified from negative SARS-CoV-2 antigen and RT-PCR tests, with the test date as the index date. Individuals included in these cohorts were required to have a negative SARS-CoV-2 test result and no clinical COVID-19 diagnosis or positive SARS-CoV-2 test result before the index date or in the 120 days after it. First SARS-CoV-2 negative tests were defined as negative tests without any prior history of a negative test. All SARS-CoV-2 negative tests were defined as records of a negative test with no record of a negative test in the 42 days before the index date. Concept lists used to define the COVID-19 and test-negative cohorts are available from https://github.com/oxford-pharmacoepi/LongCOVIDWP1A.
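The eligibility rule for negative tests can be illustrated with a short sketch. The function and parameter names are hypothetical, and the sketch assumes that a COVID-19 record on the index date itself, or exactly 120 days after it, also excludes the test.

```python
from datetime import date, timedelta


def is_eligible_negative_test(index_date, covid_record_dates, clean_days_after=120):
    """Return True if a negative test can enter the comparator cohorts.

    covid_record_dates holds the dates of any clinical COVID-19 diagnosis or
    positive SARS-CoV-2 test for the same person. Assumption: records on the
    index date and on day 120 itself count as exclusions.
    """
    cutoff = index_date + timedelta(days=clean_days_after)
    # Any record before the index date or within 120 days after it excludes the test.
    return not any(record <= cutoff for record in covid_record_dates)


index_date = date(2021, 1, 1)
print(is_eligible_negative_test(index_date, [date(2021, 6, 1)]))  # record after day 120
print(is_eligible_negative_test(index_date, [date(2021, 3, 1)]))  # record within 120 days
```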
All cohorts included individuals aged ≥18 years with ≥180 days of data visibility before the index date. Individuals with a clinical diagnosis of influenza or a positive influenza test result in the 42 days before or on the index date were excluded. To ensure sufficient follow-up to develop long-COVID-related symptoms, we only included individuals with ≥120 days of follow-up, i.e., with an index date ≥120 days before the end of data availability. All cohorts were followed until the earliest of the following: the event of interest, death, a new COVID-19 infection or a record of a COVID-19 clinical diagnosis, influenza infection (positive test result or clinical diagnosis), one year of follow-up, or end of data availability. In SIDIAP, cohorts were also censored on 28 March 2022, as national guidelines no longer recommended testing all suspected COVID-19 cases after that date.
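As an illustration of the follow-up rules above, the end of follow-up is the earliest of several candidate dates. The sketch below uses hypothetical function and argument names and assumes that one year of follow-up means 365 days.

```python
from datetime import date, timedelta


def has_sufficient_follow_up(index_date, end_of_data, min_days=120):
    """Inclusion check: index date at least 120 days before end of data availability."""
    return (end_of_data - index_date).days >= min_days


def follow_up_end(index_date, end_of_data, censoring_dates=()):
    """Return the earliest of all follow-up end points.

    censoring_dates may hold the dates of the event of interest, death, a new
    COVID-19 infection or diagnosis, an influenza infection, or an
    administrative cut-off such as 28 March 2022 in SIDIAP; None entries
    (events that never occurred) are ignored.
    Assumption: one year of follow-up is taken as 365 days.
    """
    candidates = [index_date + timedelta(days=365), end_of_data]
    candidates.extend(d for d in censoring_dates if d is not None)
    return min(candidates)


index_date = date(2021, 1, 1)
end_of_data = date(2022, 3, 28)
print(follow_up_end(index_date, end_of_data))                        # one-year cap applies
print(follow_up_end(index_date, end_of_data, [date(2021, 6, 1)]))    # earlier censoring event
```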
Long COVID symptoms
We identified long COVID symptoms, as defined by the WHO clinical case definition of “post COVID-19 syndrome”7, using SNOMED codes in the respective OMOP CDM-mapped datasets. Twenty-five symptoms were included: abdominal pain, allergy, altered smell and/or taste, anxiety, blurred vision, chest pain, cognitive dysfunction, cough, depression, dizziness, dyspnoea, fatigue or malaise, gastrointestinal issues (acid reflux, constipation, or diarrhoea), headache, intermittent fever, joint pain, memory issues, menstrual problems, muscle spasms or pain, neuralgia, pins and needles sensation, post-exertional fatigue, sleep disorder, tachycardia, and tinnitus and hearing problems. Separate code lists were developed for each symptom and reviewed independently by three clinicians (https://github.com/oxford-pharmacoepi/LongCOVIDWP1A). Quality checks were conducted with the CohortDiagnostics R package to systematically identify missing codes.25
The WHO definition was then operationalised to identify long COVID in primary care data. Long COVID was defined as having at least one record of any of the pre-defined symptoms between 90 and 365 days after the date of COVID-19 infection and no record of that symptom in the 180 days before the index date. Figure S1 illustrates this definition.
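The operational definition can be expressed as a per-symptom check. The sketch below is illustrative only: the function names and the dictionary layout are hypothetical, and it assumes that both window boundaries are inclusive and that the index date itself is not part of the 180-day look-back.

```python
from datetime import date


def symptom_qualifies(index_date, symptom_dates, window=(90, 365), lookback_days=180):
    """Check one symptom against the long COVID definition.

    Assumptions: window boundaries are inclusive, and the look-back covers
    days -180 to -1 relative to the index date.
    """
    offsets = [(d - index_date).days for d in symptom_dates]
    recorded_before = any(-lookback_days <= o < 0 for o in offsets)
    recorded_in_window = any(window[0] <= o <= window[1] for o in offsets)
    return recorded_in_window and not recorded_before


def has_long_covid(index_date, records_by_symptom):
    """Long COVID: at least one symptom qualifies (records_by_symptom maps name -> dates)."""
    return any(symptom_qualifies(index_date, dates) for dates in records_by_symptom.values())


index_date = date(2021, 1, 1)
records = {
    "fatigue": [date(2021, 5, 1)],                  # day 120, no prior record
    "cough": [date(2020, 11, 1), date(2021, 6, 1)], # recorded 61 days before index
}
print(has_long_covid(index_date, records))
```

In this example fatigue qualifies while cough does not, because cough was already recorded within the look-back period.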
For the negative-test cohorts, we anchored the algorithm at the date of the negative test to compare the proportion of people with symptoms. In sensitivity analyses, we also reported “ongoing symptomatic COVID-19”26, defined as having at least one record of one of the symptoms ≥28 days after the index date.
Statistical analyses
We developed common analytical code, which was run locally on the OMOP CDM-mapped CPRD AURUM and SIDIAP databases. All results are reported separately by database. We described and compared baseline characteristics (age groups [≤34, 35-49, 50-64, 65-79 and ≥80], sex, calendar time [trimester], COVID-19 vaccine status [unvaccinated and number of vaccine doses received], and co-morbidities) for the COVID-19 infection and negative-test comparator cohorts. We compared the proportion of people with long and ongoing COVID-19 symptoms (≥90 and ≥28 days) across the five cohorts. We calculated monthly incidence rates per 100,000 person-years for COVID-19 and long COVID in the general population and among people with COVID-19.
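For reference, an incidence rate per 100,000 person-years is the number of events divided by the person-time at risk, rescaled. The sketch below is a generic illustration with a hypothetical function name; it assumes person-time is accumulated in days and converted using 365.25 days per year.

```python
def incidence_rate_per_100k(events, person_days):
    """Incidence rate per 100,000 person-years.

    Assumption: person-time is supplied in days and one year is 365.25 days.
    """
    if person_days <= 0:
        raise ValueError("person_days must be positive")
    person_years = person_days / 365.25
    return events / person_years * 100_000


# Illustrative numbers only: 50 events over 10,000 person-years of follow-up.
rate = incidence_rate_per_100k(events=50, person_days=10_000 * 365.25)
print(round(rate, 1))
```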
To understand which of the pre-specified symptoms best differentiated long COVID, we matched people with COVID-19 infections to test-negative controls (first negative tests, and any negative test, respectively) in a 1:3 ratio by 5-year age group, SARS-CoV-2 test type (antigen or PCR), and index week. Rate ratios with 95% confidence intervals for each symptom are presented in forest plots. We similarly matched people with first infections and reinfections (ratio 1:3) and compared rate ratios for long and ongoing COVID symptoms.
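The text does not state how the confidence intervals were computed. As one common option, a rate ratio with a Wald-type 95% interval on the log scale can be sketched as follows; the function name and the input figures are hypothetical, and this should not be read as the study's method.

```python
import math


def rate_ratio_ci(events_exposed, py_exposed, events_comparator, py_comparator, z=1.96):
    """Rate ratio with a Wald confidence interval on the log scale.

    Assumption: event counts are treated as Poisson, so the standard error of
    log(RR) is sqrt(1/a + 1/b). Both event counts must be greater than zero.
    """
    if events_exposed <= 0 or events_comparator <= 0:
        raise ValueError("both event counts must be positive")
    rate_ratio = (events_exposed / py_exposed) / (events_comparator / py_comparator)
    se_log = math.sqrt(1 / events_exposed + 1 / events_comparator)
    lower = math.exp(math.log(rate_ratio) - z * se_log)
    upper = math.exp(math.log(rate_ratio) + z * se_log)
    return rate_ratio, lower, upper


# Illustrative numbers only: 30 events in 1,000 person-years among people with
# COVID-19 versus 60 events in 3,000 person-years among matched comparators.
rr, lower, upper = rate_ratio_ci(30, 1000, 60, 3000)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```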
Analyses were conducted in R (version 4.3.1). All analytical code is available at https://github.com/oxford-pharmacoepi/LongCOVIDWP1A.
Patient and Public Involvement
A patient and public representative was involved in planning the overarching project and helped to contextualise the study results using their patient perspective.