Study design and population
This study used longitudinal data from the COVIMPACT cohort study conducted in Belgium [9]. People aged 18 years and older, living in Belgium, with a recent SARS-CoV-2 infection confirmed by a molecular or an antigen test were contacted for tracing purposes by contact tracing call centers [15]. Participants completed two online questionnaires: a baseline questionnaire assessing their initial health status and their status during the acute phase of the illness, and a follow-up questionnaire sent every three months after they entered the cohort [15, 16]. In total, 9,199 individuals completed both questionnaires between April 1st, 2021, and September 22nd, 2022. However, to limit misclassification of SARS-CoV-2 variants (see Exposure assessment), only 8,238 participants were included in this study. The published study protocol [9] showed that 5% of all Belgian adults infected during the study period completed the baseline questionnaire and that the follow-up participation rate was 79%.
Measurement
Outcome assessment
The primary outcome of this study was the PCC status of participants, i.e. whether or not they self-reported at least one symptom of PCC three months after the infection (binary variable). PCC was assessed in the three-month follow-up questionnaire, based on the definition from the National Institute for Health and Care Excellence (NICE): “signs and symptoms that develop during or after an infection consistent with COVID-19, continue for more than twelve weeks and are not explained by an alternative diagnosis” [17]. The question asked was: “Within the last seven days have you had any of these symptoms, that you did not experience before onset of your COVID-19 illness?”. Thirty potential PCC symptoms were listed, based on published guidelines of the WHO and the NICE [1, 17].
The secondary outcome of this study was the PCC symptom category. Based on the classification proposed by Fernandez de las Peñas et al., the thirty potential PCC symptoms collected in this study were classified as follows [18]: (1) neurocognitive post-COVID (sleeping problems, headache, memory problems, dizziness, confusion, problems speaking, incontinence, seizures), (2) autonomic post-COVID (chest pain, palpitations), (3) gastrointestinal post-COVID (constipation/diarrhea, stomach pain, nausea/vomiting), (4) respiratory post-COVID (general fatigue, dyspnea, persistent cough), (5) musculoskeletal post-COVID (muscle pain, joint pain, swelling-oedema), (6) anosmia and/or dysgeusia post-COVID (loss of smell, loss of taste), (7) other manifestations (tingling feeling, loss of appetite, problem seeing, ringing in ears, general malaise, weight loss, skin rashes, problem swallowing, other symptom(s)). Each symptom category was coded as a binary variable (whether or not the participant reported symptoms in that category). As people with PCC tend to report numerous and heterogeneous symptoms, a participant could belong to more than one category.
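The coding of the two outcomes can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run in R): the symptom labels, the category keys and the `categorize` function are hypothetical names, and the rule that a category is positive when at least one of its symptoms is reported is an assumption, since the text does not state the exact rule.

```python
# Illustrative mapping of the thirty listed symptoms to the seven categories.
# Labels and keys are hypothetical, not the questionnaire's actual item codes.
SYMPTOM_CATEGORIES = {
    "neurocognitive": [
        "sleeping problems", "headache", "memory problems", "dizziness",
        "confusion", "problems speaking", "incontinence", "seizures",
    ],
    "autonomic": ["chest pain", "palpitations"],
    "gastrointestinal": ["constipation/diarrhea", "stomach pain", "nausea/vomiting"],
    "respiratory": ["general fatigue", "dyspnea", "persistent cough"],
    "musculoskeletal": ["muscle pain", "joint pain", "swelling-oedema"],
    "anosmia_dysgeusia": ["loss of smell", "loss of taste"],
    "other": [
        "tingling feeling", "loss of appetite", "problem seeing",
        "ringing in ears", "general malaise", "weight loss", "skin rashes",
        "problem swallowing", "other symptom(s)",
    ],
}


def categorize(reported_symptoms):
    """Return binary flags for PCC status and for each symptom category.

    Assumption: a category is coded 1 when at least one of its symptoms
    is reported; PCC is coded 1 when at least one listed symptom is reported.
    """
    reported = set(reported_symptoms)
    flags = {
        category: int(any(symptom in reported for symptom in symptoms))
        for category, symptoms in SYMPTOM_CATEGORIES.items()
    }
    # A participant can be positive in several categories at once.
    flags["pcc"] = int(any(flags.values()))
    return flags
```

For instance, `categorize(["headache", "loss of smell"])` flags both the neurocognitive and the anosmia/dysgeusia categories, which reflects the overlap between categories described above.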
Exposure assessment
The exposure variable in this study was the SARS-CoV-2 variant or subvariant, a population-level indicator extracted from the SARS-CoV-2 surveillance system of Sciensano, the Belgian institute for health [19]. In this surveillance system, at least 5% of all positive RT-PCR samples in Belgium were randomly selected for sequencing [20], which met the recommendation of the European Centre for Disease Prevention and Control (ECDC) [21]. From 5 April 2021 to 17 September 2022, the Belgium COVID-19 Epidemiological Situation Dashboard of Sciensano reported daily percentages of eight SARS-CoV-2 (sub)variants: Alpha, Beta, Gamma, Delta, Omicron, Omicron BA.2, Omicron BA.4 and Omicron BA.5 [19].
For each study day, the dominant variant (i.e. the variant responsible for more than 80% of daily infections) was identified and assigned to participants who reported testing positive for SARS-CoV-2 on that day. The 80% threshold was chosen to limit misclassification of the exposure variable during periods when several variants co-circulated [22]. Consequently, days on which no variant exceeded the 80% threshold were excluded from the analysis (e.g. a day when the Alpha variant was responsible for 50% of infections and the Delta variant for the other 50%). The analyses were therefore carried out on 407 of the 448 days of the study period (91%) and on 8,238 of the 9,199 participants (90%). Sensitivity analyses were conducted with thresholds of 70% and 90% (supplementary table 1). Additional sensitivity analyses were performed to compare the profiles of excluded and included cases (supplementary table 2).
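The assignment rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the data layout (a mapping from calendar day to variant shares) and the example values are hypothetical.

```python
from datetime import date


def dominant_variant(shares, threshold=0.80):
    """Return the variant whose daily share exceeds the threshold, else None.

    `shares` maps variant name to its proportion of sequenced samples that day.
    """
    variant, share = max(shares.items(), key=lambda item: item[1])
    return variant if share > threshold else None


def assign_exposure(daily_shares, participants, threshold=0.80):
    """Assign each participant the dominant variant on their test date.

    Participants tested on a day without a dominant variant (or without
    surveillance data) are left out, mirroring the exclusion rule in the text.
    """
    assigned = {}
    for participant_id, test_date in participants:
        shares = daily_shares.get(test_date)
        variant = dominant_variant(shares, threshold) if shares else None
        if variant is not None:
            assigned[participant_id] = variant
    return assigned


# Hypothetical example values, not taken from the surveillance data.
example_shares = {
    date(2021, 5, 1): {"Alpha": 0.90, "Delta": 0.10},
    date(2021, 6, 20): {"Alpha": 0.50, "Delta": 0.50},
}
example_participants = [("p1", date(2021, 5, 1)), ("p2", date(2021, 6, 20))]
example_assignment = assign_exposure(example_shares, example_participants)
```

Here only `p1` receives an exposure (Alpha); `p2` is excluded because neither variant exceeds 80% on the test date. Passing `threshold=0.70` or `threshold=0.90` reproduces the logic of the sensitivity analyses.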
Other covariates
Covariates including demographic factors, the presence of comorbidities, vaccination status before COVID-19 (completed one, two, or three doses), and the number of COVID-19 acute symptoms were collected in the baseline questionnaire as they may be associated with the probability of having PCC [23].
Statistical analysis
All analyses were carried out using the statistical package R version 4.1.0 [24]. The number of missing values for each variable is reported in Table 2. Descriptive statistics were used to show the distribution of variables. Categorical variables were summarized by the frequencies and percentages of their levels, while numeric variables were summarized by their mean and standard deviation or by their median and interquartile range (IQR), depending on their distribution. Chi-square tests were used to compare the distribution of PCC status (% with or without PCC) across the levels of each explanatory variable.
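For readers less familiar with the test, the Pearson chi-square statistic for a contingency table of PCC status against an explanatory variable can be computed as in the following Python sketch. The function name and the table layout are assumptions made for illustration; the study itself used R, and the p-value would normally be obtained from a statistics library by comparing the statistic with a chi-square distribution with the returned degrees of freedom.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    `table` is a list of rows, e.g. one row per level of an explanatory
    variable and one column per PCC status (no PCC, PCC).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom
```

The sketch assumes that every row and column total is positive, so that all expected counts are non-zero.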
Multivariable logistic regression was used to assess the association between the different SARS-CoV-2 variants and PCC. The final model was developed in three steps. First, a multivariable logistic regression model including all covariates as fixed effects was fitted. As all covariates were significantly associated with PCC, they were retained for the next step. Second, because SARS-CoV-2 variants are associated with the severity of the acute COVID-19 infection [12], interactions between SARS-CoV-2 variants and covariates related to the acute infection (i.e. number of COVID-19 acute symptoms and hospitalization status) were tested, but none was significant. Finally, we performed the Hosmer-Lemeshow goodness-of-fit test to determine whether the fitted model adequately described the outcome in the data (p = 0.27). As the p-value was above alpha = 0.05, the null hypothesis of adequate fit could not be rejected, i.e. there was no evidence of a lack of fit. Sensitivity analyses with thresholds of 70% and 90% (see supplementary material) confirmed the results of the multivariable logistic regression model.
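The Hosmer-Lemeshow test compares observed and expected numbers of events within groups of participants ordered by predicted probability. The Python sketch below shows one common form of the statistic; it is an illustration, not the authors' R code. The function names are hypothetical, and the use of ten equal-sized groups with `groups - 2` degrees of freedom is the usual convention and an assumption here, since the text does not report the grouping used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range((df - 1) // 2):
        total += term
        term *= x / (2 * k + 3)
    return total


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (statistic, df, p_value) for binary outcomes and fitted probabilities.

    Assumes groups >= 3, at least `groups` observations, and fitted
    probabilities strictly between 0 and 1.
    """
    if groups < 3 or len(y_true) < groups:
        raise ValueError("need groups >= 3 and at least one observation per group")
    # Order observations by fitted probability, then cut into equal-sized groups.
    pairs = sorted(zip(y_prob, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    df = groups - 2
    return statistic, df, chi2_sf(statistic, df)
```

A large p-value from this test indicates only that no lack of fit was detected in the grouped data; it does not by itself quantify how well the model discriminates between participants with and without PCC.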
To identify a group of predictors that could be used to predict PCC accurately, a decision tree was fitted using the rpart package and a chi-squared automatic interaction detector, with 80% of participants as the training dataset and the remaining 20% as the test dataset. Table 2 shows the variable assignments used in both the logistic regression analysis and the decision tree model.
Differences in the distribution of the seven groups of PCC symptoms between the SARS-CoV-2 variants were described using frequencies and percentages and tested using Pearson's Chi-squared test. Seven multivariable logistic regression models were fitted to assess the association between the SARS-CoV-2 variants and each PCC symptom category, following the same approach as for the model on PCC. The Hosmer-Lemeshow goodness-of-fit test was applied to all seven models and none of the p-values was significant, indicating no evidence of a lack of fit.