Demographic characteristics
The duration of virus shedding was defined as the interval from illness onset until successive negative detection of SARS-CoV-2 RNA, consistent with other studies of COVID-196,7. As of April 30, 2020, a total of 12 non-critical COVID-19 in-patients exhibited LDs of viral shedding durations (viral shedding duration>45 days). Given that the median SARS-CoV-2 viral shedding duration is approximately 3 weeks8, we also examined 38 age- and gender-matched non-critical COVID-19 in-patients whose viral shedding durations were less than 21 days (short durations, SDs), for comparison (Supplemental Table 1). All the patients were identified as laboratory-confirmed SARS-CoV-2 infected patients at Tongji Hospital, Wuhan, China. The median viral shedding duration was 57 days (range: 45–100 days) and 16 days (range: 3–21 days) in LDs and SDs, respectively (log-rank P<0.0001) (Supplemental Table 1) . The basic demographic information of these patients and the clinical parameters are detailed in Supplemental Table 1. Notably, there were no significant differences in comorbidities, complete blood counts (white blood counts, lymphocyte counts, neutrophil counts, platelet counts, and hemoglobin), blood biochemistry (alanine/aspartate aminotransferase and lactate dehydrogenase), and coagulation function (prothrombin, activated partial thromboplastin time, and D-dimer) between LDs and SDs. Moreover, inflammatory markers, such as procalcitonin (PCT), erythrocyte sedimentation rate (ESR), and C-reactive protein (CRP), which have been well reported as high-risk factors of the development of severe COVID-199-11, were comparable in LDs and SDs. Therefore, there is a need for further investigations to identify new biomarkers for viral shedding duration and the underlying mechanism of persistent viral shedding.
Cytokines in SDs, LDs, and healthy donors (HDs)
Cytokines are central to the pathophysiology of COVID-19 and a “cytokine storm” has been described as a feature of COVID-19 severity, which is associated with adverse outcomes12,13. To further elucidate the immune response associated with the viral shedding duration, we checked the serum cytokine/chemokine levels in SDs, LDs, and 22 HDs. Intriguingly, from among the 48 cytokines/chemokines detected, 21 inflammatory cytokines/chemokines had the lowest levels in LDs when compared to SDs or HDs (Fig.1c). Of these, platelet-derived growth factor (PDGF-BB) (P = 0.000065), C-C motif ligand 5 (CCL5) (P = 0.00011), and macrophage migration inhibitory factor (MIF) (P = 0.00015) showed the most significant changes (Fig.1c and Extended Data Fig.1). Additionally, IL-1β, IL-2, IL-2R, IL-9, IL-18, TNF-α, and TNF-β, the upregulation of which contributed to lung injury, multiorgan failure, and ultimately death14-16, were present at lower levels in LDs (Fig.1c). Collectively, LDs of viral shedding were associated with a weaker inflammatory response characterized by low circulating concentrations of cytokines and chemokines.
Differences in cell compositions detected using single-cell transcriptomes of human peripheral blood mononuclear cells (PBMCs)
To characterize the immunological features of LDs and SDs compared to HDs, we performed 10X Genomics scRNA-seq to study the transcriptomic profiles of PBMCs from HDs (n = 3) and from patients exhibiting LDs (n = 5) and SDs (n = 4) of viral shedding (Supplementary Table 2).The demographics, clinical features, and laboratory findings of these patients are listed in Supplementary Table 2. After the unified single-cell analysis pipeline (see Methods), in total, 124,606 cells across all subjects, with an average of 10,384 cells per sample, were integrated into an unbatched and comparable dataset (Supplementary Table 2). In accordance with previous reports17, we did not detect SARS-CoV-2 RNA expression in the PBMCs of these patients (Extended Data Fig.2).
Using unsupervised clustering of uniform manifold approximation and projection (UMAP), we identified 20 cell populations based on the expression of canonical cell-type gene markers (Fig.2a-b, Extended Data Fig.3). To reveal the differences in cell composition across LDs and SDs and to compare them with that of HDs, we investigated the relative proportions of immune cells among the three groups (Fig.2c-d). While there were limited differences between SDs and HDs, significant differences were observed between LDs and HDs. Compared to that in the HDs, the proportion of natural killer (NK) cells and CD14+ monocytes were significantly decreased in LDs (Fig.2d). The massive decreases in NK cells and CD14+ monocytes in LDs were in accordance with the observed decrease in inflammatory cytokines in LDs. Notably, inflammatory monocytes, induced by T cells, have been reported to incite the cytokine storm in COVID-1918. In contrast, regulatory T cells (Tregs) were particularly elevated in LDs (Fig.2d). Given the importance of Tregs in secreting immunosuppressive cytokines and inhibiting the activation of both innate and adaptive immune cells11,19,20, elevated levels of Tregs could be one cause of the suppressed immune response observed in LDs. Taken together, these results demonstrated that the decreased NK cells and CD14+ monocytes, as well as increased Tregs in LDs may prevent the immune system from overreacting but contribute to the persistence of the virus.
Transcriptional signatures associated with LDs
Next, we performed hierarchical clustering based on relative gene expression changes with respect to the HDs to evaluate the molecular difference of each cell type in LDs and SDs. Unexpectedly, all cell types among the PBMCs clustered together according to the disease groups rather than by cell-types, with the exception of hematopoietic stem cells (HSCs), plasma B cells, and megakaryocytes (Extended Data Fig.4). This indicates that the molecular features of PBMCs in LDs and SDs are markedly different, regardless of the cell type. Therefore, we sought to identify variations in the relevant biological functions in individual cell types through differentially expressed gene (DEG) and Gene Oncology (GO) analyses. Most importantly, we found that protein targeting to the membrane, endoplasmic reticulum (ER) related pathways, translation related pathways, and ribosome small subunit assembly pathway was consistently downregulated in all cell types in LDs, with the exception of gamma delta (γδ) T cells, mucosal associated invariant T (MAIT) cells, and megakaryocytes (Fig.3a). In agreement with the GO results, we found that many genes encoding ribosomal proteins and the protein synthesis related proteins were specifically downregulated in LDs (Fig.3b-c). Interestingly, RPL41, RPS29, RPL36A, RPS27, RPS21, RPS10, RPL38, RPL39, and RPS28 localize to the ER and participate in protein synthesis, folding, and assembly, as detailed in the information provided on https://www.proteinatlas.org/. TMA721, TAF1022, and PTOV123 were also specifically downregulated in LDs. These genes have previously been reported to be associated with ribosomes, and their overexpression promotes global protein synthesis. Given that antibodies24 and cytokines25 are synthesized, folded, modified, and assembled by the rough ER and attached ribosomes, these findings suggest that immune cells of LDs tend to have reduced cytokine synthesis, folding, and assembly functions, which is consistent with the lower levels of inflammatory cytokines observed in LDs (Fig.1c). CEBPB and MAP3K8, which are in involved in the production of pro-inflammatory cytokines, and ZFP3626 and PDE4D27, which are involved in IL-2 production, were selectively reduced in LDs (Fig.3b). Additionally, genes involved in T cell activation (PCBP1, ARPC2), migration (FMNL1), cytotoxic function (GNLY, SRM), transcription factors (LYN), and downstream signal transduction (COTL1) were all reduced in LDs (Fig.3b-c). Given that cytokines are produced by several immune cells, including adaptive T cells 28, the reduced cytokine levels in LDs are at least partially explained by these findings. Meanwhile, IFITM2, an interferon (IFN)-stimulated gene (ISG), vital for viral clearance29, was downregulated in LDs and may contribute to the longer viral persistence in LDs (Fig.3b-c).
Extended viral shedding time leads to higher transmission probability and requires stricter infection control, the use of additional hospital resources, and result in more economic costs. Therefore, there is increased interest in early patient stratification to identify those who are more likely to recover rapidly in order to make hospital resources available for those at higher risk30. We assessed whether the RP levels could be used for stratification of the viral shedding duration by integrating bulk-RNA-seq data from the 103 independent non-critical COVID-19 patients whose viral shedding durations were available. Remarkably, we found that lower expression of RPs was associated with longer viral shedding durations, including the following RPs identified in our scRNA-seq data: RPL38, RPL41, and RPS10 (Fig.3d-e). In summary, lower RP levels were associated with persistent viral shedding, and specific RPs could be applied as indicators of longer viral shedding.
Molecular features of T and NK cells in LDs and SDs
We next performed sub-clustering analysis on T and NK lymphocytes considering their crucial anti-viral effects31,32. UMAP embedding of T and NK cells from all the samples identified substantial differences in the cellular phenotypes of CD4+ T, CD8+ T, NKT, and NK cells (Fig.4a-b). Although the proportion of T cells was comparable between LDs and SDs (Fig.2d), the correlation matrix revealed that the molecular features differed between the two groups (Fig.4c), such as memory CD8+ T cells and NK cells.
For example, in memory CD8+ T cells, DEGs involved in T-cell activation (CD74, SELENOK, FYN, CCL5, and RNF125), positive regulation of cytokine production (IRF1, SELENTOK, and HMGB2), proinflammatory mediators of secretion, and interferon-γ (IFN-γ) pathways (IRF1, HLA-DPB1, HLA-DPA1, HLA-DRB1, CCL5, and CCL4) were specifically downregulated in LDs, while they were upregulated in SDs compared to HDs (Fig.4d-e). In addition, in NK cells, DEGs associated with the positive regulation of T-cell activation pathways (ZFP36L2, SELENOK, SLA2, ZBTB1, MAP3K8, IRF1, CEBPB, RUNX3, and PIK3R1) were profoundly enriched in SDs but not in LDs (Fig.4f-g), suggesting consistent immune cell dysfunction in LDs that might be associated with the mild secretion of proinflammatory cytokines and contribute to viral persistence.
Next, we reconstructed T cell antigen receptor (TCR) sequences from the TCR sequencing data. Briefly, more than 70% of the cells in all the subsets had matched TCR information, with the exception of the γδT and NKT subsets (Fig.4h). Compared to HDs, clonal expansion was obvious in patients with COVID-19, especially in those with SDs of viral shedding (Fig.4i). Meanwhile, the proportion of large clonal expansions (clonal size>30), primarily in cytotoxic cells, was higher in SDs (Fig.4i), indicating that SDs have more efficient clonal expansion of effector T cells than LDs to promote viral clearance.
To explore the preferential V and J combinations in SDs and LDs, we first analyzed and listed the V and J combinations most frequently observed in the TCRs in all samples (Fig.4j). Among these combinations, relatively frequent pairings of the TCR in HDs were TRBV29::TRVJ2-7 and TRAV29/DV5::TRAJ20, while TRAV29/DV5::TRAJ49 and TRBV9::TRBJ1-3 were frequent in LD patients, and TRAV12-3::TRAJ54 and TRAV1-2::TRAJ49 were frequent in SD patients (Fig.4j). The selective usage of V(D)J genes suggests that different immunodominant epitopes may drive the molecular composition of T-cell responses and may be associated with LD and SD infection.
Features of B cell subsets in LDs
Some T cells and cytokines prime B cells for maturation, which go on to become plasma cells and produce pathogen neutralizing antibodies33. We subclustered B cells into three subsets according to the expression and distribution of canonical B-cell markers (Extended data Fig.5a-b). Compared with HDs, plasma B cells were not significantly increased in SDs, which may be due to sampling during the convalescent period34 (Extended Data Fig.5a, Fig.2d). In LDs, despite viral persistence, the proportion of plasma cells was also extremely low, which may indicate that LDs fail to produce sufficient neutralizing antibodies (Extended Data Fig.5a, Fig.2d). Previous studies5,35 have suggested that antibodies produced by plasma cells in response to SARS-CoV-2 during initial exposure disappeared within a few weeks, but memory B cells persisted for much longer. Therefore, we compared the expression profiles of memory B cells in the three groups. Interestingly, the pathways involved in T-cell differentiation (CD83, ZFP36L2, and GPR183) and cell growth and activation (CD83, ZFP36L2, GPR183, and PELI1) were selectively enriched in SDs but not LDs, indicating that B and T cells in LDs failed to synergize in order to clear the virus (Extended Data Fig.4d-e). Moreover, HMGB2 and PDE4B, which positively regulate the production of cytokines such as IL-2, and the pathways involved in leukocyte chemotaxis (LYN, DUSP1, and RAC1) were exclusively enriched in SDs (Extended data Fig. 4c-d).