Clinical characterization
The primary cohort included healthy convalescent individuals (controls, n = 70) and patients with long COVID (cases, n = 70) recruited from University Hospital Llandough (UK) (Table 1). All participants had a clearly defined episode of acute COVID-19 confirmed by direct molecular evidence of infection with SARS-CoV-2. Intergroup comparisons revealed largely equivalent distributions for age (cases, median = 45 years; controls, median = 43 years), body mass index (BMI; cases, median = 29.8; controls, median = 28.5), ethnicity (cases, white = 88.6%; controls, white = 82.9%), sex (cases, female = 74.3%; controls, female = 77.2%), and vaccination against SARS-CoV-2 (cases, median number of vaccinations = 3; controls, median number of vaccinations = 3), whereas the median time since initial reported infection was longer for cases (416 days) than for controls (268 days) (Fig. 1A,B and Table 1). All baseline medical evaluations were normal in patients with long COVID. Symptom scores are depicted in Fig. 1C. Breathlessness was further assessed using the Dyspnea-12 Questionnaire, scored out of 36, and the Nijmegen Questionnaire, scored out of 64, which provides a metric for hyperventilation (Fig. 1D). In patients with long COVID, pain was most commonly localized to the chest (31%), joints (26%), and muscles (16%) (Fig. 1E). The secondary cohort included healthy convalescent individuals (controls, n = 20) and patients with long COVID (cases, n = 56) recruited from Karolinska University Hospital (Sweden).
Suboptimal neutralizing antibody titers are a feature of long COVID
To evaluate the humoral immune response, we measured total SARS-CoV-2 spike-specific immunoglobulin (Ig) titers, neutralization activity, and antibody-dependent natural killer (NK) cell activation (ADNKA) in plasma samples obtained from donors in the UK. Healthy convalescent individuals exhibited significantly higher neutralization activity in standard plaque reduction assays than patients with long COVID (Fig. 1F), despite equivalent overall titers of antibodies targeting the SARS-CoV-2 spike protein (Fig. 1G). In contrast, no intergroup differences were apparent for ADNKA (Fig. 1H), measured as a cumulative metric against all expressed viral target proteins using healthy donor cell preparations and CD57 as a surrogate marker of potential cytotoxicity 19.
Collectively, these findings identify a qualitative deficit in the humoral immune response against SARS-CoV-2, specifically impacting neutralization activity in patients with long COVID.
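Plaque reduction neutralization is commonly summarized as the reciprocal plasma dilution that halves the plaque count relative to virus-only control wells (PRNT50). How titers were derived is not stated in this section, so the following is only a minimal sketch under stated assumptions: a hypothetical helper `prnt50`, a dilution series in ascending order, and linear interpolation on the log2 dilution scale between the two dilutions that bracket 50% reduction.

```python
import math

def prnt50(dilutions, plaques, virus_only_plaques):
    """Estimate the reciprocal dilution that gives 50% plaque reduction.

    Hypothetical helper. dilutions: reciprocal dilution factors in ascending
    order (e.g. 20, 40, 80, ...). plaques: mean plaque count at each dilution.
    virus_only_plaques: mean plaque count in wells without plasma.
    Linear interpolation on log2(dilution) is an assumption.
    """
    # Fractional plaque reduction at each dilution (1.0 = complete neutralization).
    reduction = [1 - p / virus_only_plaques for p in plaques]
    if reduction[0] < 0.5:
        return None  # titer is below the lowest dilution tested
    for i in range(1, len(dilutions)):
        prev_r, curr_r = reduction[i - 1], reduction[i]
        if prev_r >= 0.5 > curr_r:
            prev_x = math.log2(dilutions[i - 1])
            curr_x = math.log2(dilutions[i])
            # Fraction of the way from the previous to the current dilution
            # at which the reduction curve crosses 50%.
            frac = (prev_r - 0.5) / (prev_r - curr_r)
            return 2 ** (prev_x + frac * (curr_x - prev_x))
    return dilutions[-1]  # still >= 50% at the highest dilution tested

# Hypothetical example: 100 plaques in virus-only wells.
titer = prnt50([20, 40, 80, 160], [10, 30, 60, 90], 100)
```

With these hypothetical counts the reductions are 90%, 70%, 40%, and 10%, so the interpolated titer falls between 1:40 and 1:80 (about 1:63).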
Immune cell perturbations are limited in patients with long COVID
To evaluate the cellular immune system, we first conducted a multidimensional flow cytometric analysis of the major lineages typically present among peripheral blood mononuclear cells (PBMCs) obtained from donors in the UK (Fig. 2A). Using dimensionality reduction and Gaussian mixture models, we identified clusters corresponding to the major lineages of monocytes, B cells, NK cells, and T cells (Fig. 2B and Supplemental Figure S1). However, these analyses did not reliably distinguish some other immune cell subsets, such as basophils and plasmacytoid dendritic cells (pDCs) (Fig. 2B), nor did they reliably distinguish healthy convalescent individuals from patients with long COVID (Fig. 2C). We therefore analyzed the data conventionally using a manual flow cytometric gating strategy (Supplemental Figure S2A).
In the adaptive lymphocyte compartment, healthy convalescent individuals and patients with long COVID had similar proportions of naive B cells, total B cells, naive T cells, total T cells, naive CD4+ T cells, total CD4+ T cells, naive CD8+ T cells, and total CD8+ T cells (Fig. 2D). Likewise, in the innate lymphocyte compartment, the two groups had similar proportions of immature NK cells (CD16−CD56bright), mature NK cells (CD16+CD56dim), total NK cells (including CD16−CD56dim), and total innate lymphoid cells (ILCs, CD127+) (Fig. 2E). In the myeloid lineage, a comparable pattern was observed for classical monocytes (CD14+), intermediate monocytes (CD14+CD16+), and conventional DCs (cDCs, CD11c+CD123−), whereas the proportions of nonclassical monocytes (CD16+) were relatively increased in healthy convalescent individuals (Fig. 2F), and the proportions of basophils (CD123+HLA-DR−) and pDCs (CD123+HLA-DR+) were relatively increased in patients with long COVID (Fig. 2G). Hierarchical clustering confirmed these differences within an otherwise largely uniform immune cell landscape (Fig. 2H,I). In contrast, these perturbations were not apparent in the secondary cohort of donors recruited from Sweden, although the proportions of classical monocytes were relatively decreased and the proportions of intermediate monocytes were relatively increased in patients with long COVID (Supplemental Figure S3A–D).
Collectively, these data indicate that immune cell perturbations in patients with long COVID are inconsistent across cohorts, subtle, and generally confined to the myeloid compartment.
T cell immunity is largely unaltered in patients with long COVID
To extend these findings, we quantified memory CD4+ and CD8+ T cell responses against SARS-CoV-2 and the persistent herpesviruses CMV and EBV, exposure to which has been differentially linked with the development of long COVID 10, 16, 20, 21. We used activation-induced marker (AIM) assays for this purpose, enumerating functional antigen-specific CD4+ T cells via the upregulation of CD69 and CD40L (CD154) and functional antigen-specific CD8+ T cells via the upregulation of CD69 and 4-1BB (CD137) directly ex vivo 22, 23 after stimulation with peptide pools spanning the major immunogenic proteins from SARS-CoV-2 (spike, nucleocapsid, combined membrane and envelope, ORF1a, ORF1b, and ORF3–10) and selected immunogenic proteins from CMV (IE-1, IE-2, and pp65) and EBV, the latter segregated according to lytic (BRLF1, BZLF1, BMLF1, and BARF1) and latent phases (EBNA1, EBNA2, EBNA3A, EBNA3B, EBNA3C, and LMP2) of the viral life cycle (Supplemental Figure S2B). The frequencies of antiviral CD4+ and CD8+ T cells were statistically indistinguishable across all of these specificities in healthy convalescent individuals and patients with long COVID recruited from the UK (Fig. 3A). In contrast, the frequencies of CD4+ T cells targeting the SARS-CoV-2 nucleocapsid protein and the EBV latent proteins and the frequencies of CD8+ T cells targeting the SARS-CoV-2 spike protein, the combined CMV proteins, and the EBV lytic proteins were higher in patients with long COVID versus healthy convalescent individuals recruited from Sweden (Supplemental Figure S3E).
In further experiments, we measured the expression of immunophenotypic markers related to activation, memory, effector function, and exhaustion among CD4+ and CD8+ T cells targeting defined proteins from SARS-CoV-2, CMV, or EBV. In the primary cohort, no significant intergroup differences in expression intensity were observed for CD28, CD39, CD71, CD95, CX3CR1, or PD-1, but some markers of activation (CD38 and HLA-DR), exhaustion (TIGIT), and stemness (CD127) were variably downregulated among some antiviral CD4+ and CD8+ T cell populations in the context of long COVID (Fig. 3B). More profound differences were apparent in the secondary cohort, potentially reflecting the limited number of healthy convalescent individuals relative to the number of patients with long COVID (Supplemental Figure S3F). Lineage analysis further revealed comparable expression of CD38, CD69, HLA-DR, and PD-1 among global effector and effector memory CD4+ and CD8+ T cells in healthy convalescent individuals and patients with long COVID recruited from the UK (Fig. 3C).
Collectively, these results demonstrate that experimental findings are not necessarily transferable across geographically distinct cohorts of patients with long COVID, likely reflecting differences in clinical characterization and sample size, but nonetheless align overall with the notion that circulating antiviral CD4+ and CD8+ T cell populations are largely equivalent in healthy convalescent individuals and patients with long COVID 24.
SARS-CoV-2-specific CD8+ T cells differ phenotypically as a function of specificity
To refine our phenotypic analyses, which were potentially confounded by antigen-induced changes in surface marker expression, we used peptide-HLA class I tetramers directly ex vivo to identify and characterize unperturbed CD8+ T cells targeting specific epitopes from SARS-CoV-2, CMV, EBV, or influenza A virus (IAV) 25, 26. For this purpose, we selected healthy convalescent individuals (n = 17) and patients with long COVID (n = 15) from the primary cohort based on the expression of HLA-A*02:01 and/or HLA-B*07:02. As a means to calibrate our findings against CD8+ T cells with known features of exhaustion 27, we also performed similar analyses using samples from untreated patients infected with human immunodeficiency virus type 1 (HIV-1), extending the range of specificities to include epitopes restricted by HLA-A*24:02, HLA-B*08:01, and HLA-B*57:01 (Fig. 4A).
CD8+ T cells targeting spike epitopes from SARS-CoV-2 expressed CD38 and HLA-DR more commonly than CD8+ T cells targeting nonspike epitopes from SARS-CoV-2 (Fig. 4B and Supplemental Figure S4A), likely as a consequence of repeated subunit vaccination. Similarly, CD8+ T cells targeting viral epitopes associated with persistent (CMV, EBV, and HIV-1) or recurrent antigen exposure (IAV) expressed CD38 and HLA-DR more commonly than CD8+ T cells targeting nonspike epitopes from SARS-CoV-2, and a comparable pattern was observed for expression of the cytotoxic serine protease granzyme B (Fig. 4B). Coinhibitory receptor expression also varied as a function of viral specificity, typically paralleling the likely frequency of antigen exposure (Fig. 4C). Of particular note, we found that CD8+ T cells targeting spike epitopes from SARS-CoV-2 expressed coinhibitory receptors more intensely than CD8+ T cells targeting nonspike epitopes from SARS-CoV-2, based on a combined score for PD-1, TIM-3, LAG-3, and TIGIT (Fig. 4D). No such differences were observed for the transcription factors TCF-1, T-BET, or EOMES (Fig. 4E). However, CD8+ T cells targeting epitopes from CMV or HIV-1 expressed T-BET more intensely than CD8+ T cells targeting nonspike epitopes from SARS-CoV-2, and CD8+ T cells targeting epitopes from CMV, EBV, or HIV-1 expressed EOMES more intensely than CD8+ T cells targeting nonspike epitopes from SARS-CoV-2 (Fig. 4E).
Collectively, these observations support the premise that antigen exposure drives the expression of activation markers and coinhibitory receptors as a function of viral specificity and further suggest that such encounters are not sufficiently frequent in the convalescent phase to induce exhaustion among CD8+ T cells targeting nonspike epitopes from SARS-CoV-2, irrespective of progression to long COVID.
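The combined coinhibitory score for PD-1, TIM-3, LAG-3, and TIGIT is not defined in this section. One common way to build such a composite, shown here purely as an assumption, is to z-score each marker across samples and average the four z-scores; the function name `coinhibitory_scores` and the input layout are hypothetical.

```python
from statistics import mean, pstdev

# Markers named in the text; the scoring rule below is an assumption.
MARKERS = ("PD-1", "TIM-3", "LAG-3", "TIGIT")

def coinhibitory_scores(samples):
    """Return one combined coinhibitory score per sample (hypothetical helper).

    samples: list of dicts mapping each marker name to an expression
    intensity, e.g. median fluorescence intensity on tetramer+ CD8+ T cells.
    Each marker is z-scored across samples, then the z-scores are averaged.
    """
    z = {}
    for marker in MARKERS:
        values = [s[marker] for s in samples]
        mu, sd = mean(values), pstdev(values)
        # A marker with no variation across samples contributes zero.
        z[marker] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return [mean(z[m][i] for m in MARKERS) for i in range(len(samples))]

# Hypothetical intensities for two tetramer+ populations.
low = {"PD-1": 100, "TIM-3": 20, "LAG-3": 15, "TIGIT": 300}
high = {"PD-1": 300, "TIM-3": 60, "LAG-3": 45, "TIGIT": 900}
scores = coinhibitory_scores([low, high])
```

Averaging z-scores keeps a single receptor with a wide intensity range from dominating the composite; a rank-based or summed score would be an equally plausible alternative.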
SARS-CoV-2-specific CD8+ T cells overexpress coinhibitory receptors in patients with long COVID
To determine whether any of these phenotypic attributes segregated with disease, we visualized our flow cytometry data using the dimensionality reduction technique Uniform Manifold Approximation and Projection (UMAP), focusing on CD8+ T cells targeting nonspike epitopes from SARS-CoV-2. The distributions for healthy convalescent individuals and patients with long COVID largely overlapped (Fig. 4F). PhenoGraph analysis further revealed seven clusters, most of which were evenly represented across the two groups (Fig. 4G). However, clusters 3 and 7 were more prominently represented among healthy convalescent individuals, and cluster 5 was more prominently represented among patients with long COVID (Fig. 4G). Of note, cluster 5 exhibited the highest expression intensities of coinhibitory receptors, including PD-1 (Fig. 4G and Supplemental Figure S4B). In line with this observation, CD8+ T cells targeting nonspike epitopes from SARS-CoV-2 expressed coinhibitory receptors more intensely in patients with long COVID than in healthy convalescent individuals, reaching significance for TIM-3 (Fig. 4H). No such differences were observed for CD8+ T cells targeting spike epitopes from SARS-CoV-2 (Fig. 4H). Moreover, CD8+ T cells targeting nonspike epitopes from SARS-CoV-2 displayed higher coinhibitory scores in patients with long COVID than in healthy convalescent individuals, suggesting a link between antigen exposure and disease (Fig. 4I). Coinhibitory scores also varied across specificities within the nonspike repertoire (Fig. 4J).
In further analyses, we found that CD8+ T cells targeting nonspike or spike epitopes from SARS-CoV-2 expressed TCF-1, a key determinant of memory formation, more intensely in healthy convalescent individuals versus patients with long COVID (Fig. 4K). Moreover, CD8+ T cells targeting lytic epitopes from EBV expressed CXCR3 more commonly, granzyme B less commonly, and TCF-1 more intensely in healthy convalescent individuals versus patients with long COVID (Fig. 4L and Supplemental Figure S4C). No such differences were observed for CD8+ T cells targeting epitopes from CMV or CD8+ T cells targeting latent epitopes from EBV (Supplemental Figure S4D,E).
Collectively, these findings suggest a possible role for cumulative viral antigen exposure in the pathogenesis of long COVID, potentially accompanied by suboptimal immune control of EBV.
Dysregulation of the plasma proteome is associated with breathlessness in patients with long COVID
To explore disease pathogenesis at a systemic level, we used a data-driven approach to select healthy convalescent individuals (n = 51) and patients with long COVID (n = 51) from the primary cohort for plasma proteome characterization using a targeted affinity platform (Olink Explore 3072). Briefly, immune cell subset proportions were summarized via principal component analysis (PCA), and the samples with the greatest deviation from the origin along PC1 to PC4 were excluded as outliers. Targets were grouped into eight panels, two each under the broad themes cardiometabolic, inflammation, neurology, and oncology. PCA revealed that donors could not be separated by disease status but could be separated to some extent by BMI (Fig. 5A). A differential expression analysis showed that many proteins were upregulated in the context of long COVID, albeit mostly below the threshold for significance (Fig. 5B and Supplemental Figure S5A). In addition, a gene set enrichment analysis (GSEA) showed that several pathways, including those related to ceramide, platelet-derived growth factor receptor β (PDGFRB), and HIV-1 Nef, were upregulated in the context of long COVID (Supplemental Figure S5B).
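The outlier step can be sketched as ranking donors by their distance from the origin in the space of the first four principal components and keeping the most central ones. This is a minimal sketch assuming the PC scores have already been computed and that deviation is measured as Euclidean distance; the helper `select_central_samples` is hypothetical, and the exact exclusion rule used in the study may differ.

```python
import math

def select_central_samples(pc_scores, n_keep, n_pcs=4):
    """Keep the n_keep samples closest to the origin in PC1..PC{n_pcs}.

    Hypothetical helper. pc_scores maps sample ID -> sequence of principal
    component scores (PC1, PC2, ...) computed beforehand from immune cell
    subset proportions. Euclidean distance is an assumed deviation measure.
    """
    def distance(sample_id):
        # Only the first n_pcs components contribute to the distance.
        return math.sqrt(sum(x * x for x in pc_scores[sample_id][:n_pcs]))

    # Sort from most central to most deviant and drop the tail.
    return sorted(pc_scores, key=distance)[:n_keep]
```

Applied separately to each group, a rule of this kind would retain a fixed number of donors per group, but whether selection was stratified in that way is an assumption.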
To extend our analyses beyond a simple binary classification, we stratified donors into three severity groups for each clinical symptom, irrespective of their initial assignment as healthy convalescent individuals or patients with long COVID (Fig. 5C). Donors with severe breathlessness (score, 6–10) segregated from donors with no (score, 0) or mild breathlessness (score, 1–5) via PCA (Fig. 5D), and breathlessness scores correlated only weakly with BMI (Fig. 5E), which is known to affect the plasma proteome 28. The severity of other symptoms also correlated poorly with the potential confounders age and BMI (Supplemental Figure S5C). A differential expression analysis comparing donors with no breathlessness versus donors with severe breathlessness further revealed clear proteomic signatures across all eight panels after appropriate statistical correction (Fig. 5F). In contrast, no significant differences were observed for other symptoms grouped by severity (Supplemental Figure S5D), and the overlap in differentially expressed proteins with the binary comparison (Fig. 5B) corroborated the importance of breathlessness as a key driver of the initially observed divergence between groups.
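The severity strata used for breathlessness can be written as a simple mapping. The sketch assumes symptom scores are integers on a 0–10 scale and that the same cut points (0, 1–5, 6–10) apply to every symptom, which the text states explicitly only for breathlessness; `severity_group` is a hypothetical name.

```python
from collections import Counter

def severity_group(score):
    """Map a 0-10 symptom score onto the three strata (hypothetical helper)."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score == 0:
        return "none"
    if score <= 5:
        return "mild"
    return "severe"  # scores 6-10

# Hypothetical donor scores tallied by stratum.
counts = Counter(severity_group(s) for s in [0, 3, 7, 10, 0, 5])
```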
To identify proteins with outsized roles in the breathlessness signatures linked with inflammation, we performed network analyses using Cytoscape. TNF receptor-associated factor 2 (TRAF2) emerged as a central hub protein in a module that also contained caspases (CASP2 and CASP9), kinases (MAP2K6 and IKBKG), and CD40 (Fig. 5G). These proteins exhibited stepwise increases in abundance in relation to the level of reported breathlessness and were enriched in plasma samples from patients with long COVID (Fig. 5H). Using the ranked list of differentially expressed proteins, we then performed a symptom-targeted GSEA. Severe breathlessness was associated most significantly with enrichment of the ceramide and thromboxane A2 (TXA2) pathways, whereas fatigue and mobility were associated most significantly with enrichment of the cell cycle-related E2F and androgen receptor (AR) transcription factor pathways, respectively (Fig. 5I).
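The enrichment step can be illustrated with the weighted running-sum statistic of standard GSEA: walking down the ranked list, the sum steps up at each pathway member in proportion to its ranking statistic and steps down by a constant amount otherwise, and the enrichment score is the maximum deviation from zero. This is an illustrative sketch only; `enrichment_score` is a hypothetical helper, the weight exponent p = 1 is the conventional default rather than a setting reported here, and permutation-based significance and normalization are omitted.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score in the style of GSEA.

    Hypothetical helper. ranked: (protein, statistic) pairs sorted from the
    most positive to the most negative statistic. gene_set: protein names in
    one pathway. Returns the signed maximum deviation of the running sum from
    zero; permutation testing and normalization (NES) are not included.
    """
    hits = [name in gene_set for name, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    norm = sum(abs(stat) ** p for (_, stat), hit in zip(ranked, hits) if hit)
    if n_miss == 0 or norm == 0:
        raise ValueError("pathway needs members with nonzero statistics and non-members")
    running, best = 0.0, 0.0
    for (_, stat), hit in zip(ranked, hits):
        # Step up in proportion to |statistic|**p at a pathway member;
        # step down by a constant amount at every non-member.
        running += abs(stat) ** p / norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score indicates pathway members concentrated at the top of the list (upregulated with worse symptoms), and a negative score indicates concentration at the bottom.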
Collectively, these data show that severe breathlessness is associated with marked dysregulation of the plasma proteome, highlighting the potential for symptom-related biomarker discovery in the context of long COVID.
Proteomic signatures of breathlessness yield biomarkers of long COVID
To extend these findings, we linked breathlessness scores directly with plasma protein abundances, calculating pairwise correlation coefficients across all markers in the analyzed proteome (Fig. 6A). A general skewing toward positive correlations was observed for many proteins, reinforcing the broad upregulation pattern we had observed previously, whereas a largely symmetric distribution of negative and positive correlations was observed in relation to age (Supplemental Figure S6A). Using this ranked list of correlated proteins, we identified enriched pathways associated with phenotypes related to breathlessness, such as atelectasis and tachypnea, and phenotypes not obviously related to breathlessness, such as elevated hepatic transaminases and lymphedema (Fig. 6B). The three most positively correlated proteins were ribosomal protein S10 (RPS10), isopentenyl-diphosphate δ-isomerase 2 (IDI2), and small proline-rich protein 3 (SPRR3) (Fig. 6C), whereas the three most negatively correlated proteins were ectonucleotide pyrophosphatase/phosphodiesterase family member 5 (ENPP5), olfactory marker protein (OMP), and myocilin (MYOC) (Fig. 6D). GSEA further revealed significant enrichment of five pathways from the Hallmark Collection and six pathways from the Pathway Interaction Database (PID), which overlapped with those identified using symptom categorization (Fig. 5I). The most significantly enriched hit was the transforming growth factor-β receptor (TGFBR) pathway (Fig. 6E). In addition, enrichment for terms such as oxidative phosphorylation, fatty acid metabolism, and mitotic spindle suggested that a plasma proteome reflecting an activated cellular state could be a common feature of long COVID (Fig. 6E).
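Ranking proteins by their correlation with a symptom score can be sketched as follows. The type of correlation coefficient is not stated in this section; the sketch assumes Spearman's rank correlation, with tied values (common for ordinal symptom scores) assigned their average rank. The helpers `average_ranks`, `pearson`, `spearman`, and `rank_proteins` are hypothetical and use only the Python standard library.

```python
import math
from statistics import mean

def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def rank_proteins(symptom_scores, abundances):
    """Sort proteins from most positively to most negatively correlated.

    Hypothetical helper. abundances maps protein name -> values aligned
    donor-by-donor with symptom_scores.
    """
    rho = {name: spearman(symptom_scores, vals) for name, vals in abundances.items()}
    return sorted(rho.items(), key=lambda item: item[1], reverse=True)
```

The ordered output is the kind of ranked list a GSEA-style analysis consumes; a Pearson coefficient on raw abundances would be a reasonable alternative if the relationships were approximately linear.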
It is known that changes associated with ageing are reflected in the plasma proteome 29, 30, and accelerated ageing has also been linked with COVID-19 31, 32. In our dataset, we validated some of these age-associated biomarkers 29, observing correlation coefficients with age as high as 0.81 for elastin (ELN) and as low as −0.6 for immunoglobulin superfamily DCC subclass member 4 (IGDCC4) and podocalyxin-like protein 2 (PODXL2) (Supplemental Figure S6B). However, the strongest correlation trends observed among healthy convalescent individuals showed no obvious deviation in either direction as a function of disease, suggesting that the systemic dysregulation of the plasma proteome associated with normal ageing is not a prominent feature of long COVID.
Collectively, these results identify dysregulated proteins that could serve as biomarkers for breathlessness after infection with SARS-CoV-2, potentially facilitating the diagnosis of long COVID.