Overview of the enrollment
A total of 326 patients who tested positive for SARS-CoV-2 RNA and were admitted to the Shanghai Public Health Clinical Center, the designated hospital receiving all COVID-19 cases in Shanghai, from Jan 20th to Feb 25th were included in this study. Their basic clinical and epidemiological information is shown in Extended Data Table 1. The median age of the patients was 51 years (range 15–88), with a male:female sex ratio of 1.10. Among them, 90 were residents of Hubei province who had come to Shanghai, 80 were Shanghai residents who had recently been to Hubei province, 52 had been in contact with people from Hubei and 104 had unidentifiable exposures. Four categories of infected cases were defined. Five individuals had no obvious fever, respiratory symptoms or radiological manifestations (asymptomatic cases). The majority, 292 patients, had mild disease with fever and radiological manifestations of pneumonia (mild cases). Twelve patients had severe symptoms of dyspnea (respiratory rate >30/min) and signs of expanding ground-glass opacity in the lung within 24–48 hours (severe cases). Another 17 patients deteriorated into acute respiratory distress syndrome (ARDS), required mechanical ventilation or extracorporeal membrane oxygenation (ECMO), and were thus categorized as critical cases (Extended Data Table 1). As of March 13, 308 patients had been discharged and three had died (fatality rate 1.14%). The most common comorbidities were hypertension (76 cases), diabetes (24 cases), coronary heart disease (13 cases), chronic hepatitis B (10 cases), chronic obstructive pulmonary disease (2 cases), chronic renal disease (2 cases) and cancer (3 cases).
Nucleotide variation in viral genomes
We sequenced the viral genome using sputum and oropharyngeal swab samples from enrolled patients. Sequencing data from a total of 112 samples passed quality control and were used for nucleotide variation calling (Extended Data Fig. 1). Compared with the genome of the first isolated virus (Wuhan-Hu-1), a total of 66 synonymous and 103 nonsynonymous variants were identified in nine protein-coding regions (Extended Data Fig. 2a-b). The variation rates of ORF1ab, S, ORF3a, E, M and ORF7a were similar (~0.5%), while those of ORF8 (1.37%) and N (1.51%) were much higher (Extended Data Fig. 2a-b). The recurrence of variations in the viral genome was similar between the Shanghai samples and the GISAID datasets (Extended Data Fig. 2c).
Genomic phylogeny analysis
We next used 94 samples with over 90% genome coverage, together with 221 SARS-CoV-2 sequences from GISAID, for the phylogenetic analysis. Two major clades were identified (Fig. 1a), both of which included some of the earliest reported cases1,2. Several subclades within clade I, such as those characterized by ORF3a: p.251G>V (subclade V) or S: p.614D>G (subclade G), were also observed (Fig. 1a). Clade II is distinguished from clade I by two linked variations, ORF8: p.84L>S (28144T>C) and ORF1ab: p.2839S (8782C>T). The sequences of the Shanghai cohort were found throughout the two major clades and all of their subclades, suggesting multiple origins of transmission into Shanghai. No significant expansion of any clade or subclade was observed in Shanghai. Although this is consistent with the stability of the viral genome observed since the first isolate, it may also be attributable to limited local transmission in Shanghai owing to effective control through early detection, reporting and isolation.
Additionally, we found that six cases with a clear history of contact with the HSWM1,2, the suspected early outbreak site, all clustered into clade I, whereas three cases without a history of contact with the HSWM12,13 all clustered into clade II (Fig. 1a), implying that the virus might not have originated exclusively from the HSWM. The sequences around nt8,782 and nt28,144 of SARS-CoV-2 were analyzed in HSWM-related and non-HSWM-related samples and in the bat coronavirus Bat-SARS-CoV-RaTG13 (Fig. 1b). The non-HSWM sequences were identical to Bat-SARS-CoV-RaTG13 at these two sites, suggesting that clade II might be an evolutionarily ancestral form.
We compared the clinical manifestations of patients infected with viruses of clade I and clade II. We found no statistically significant difference in disease severity (p = 1.00, Fisher’s exact test), lymphocyte count (p = 0.79), CD3 T cell count (p = 0.21), C-reactive protein (p = 0.83), D-dimer (p = 0.19) or duration of virus shedding after onset (p = 0.79, Mann-Whitney U test) (Extended Data Table 2). Thus, the two clades exhibited similar pathogenic effects despite their genome sequence diversity.
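As an illustration of the kind of comparison reported above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of clade against disease severity. It is not the authors' analysis code: the function name `fisher_exact_two_sided` and the counts are hypothetical and chosen only to show the calculation.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts, NOT taken from the study:
# rows = clade I / clade II, columns = asymptomatic-or-mild / severe-or-critical.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p_value:.4f}")  # p = 0.4857
```

A p-value close to 1, as reported for disease severity, means the observed table is among the most probable ones under the hypothesis that clade and severity are independent.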
Host factors associated with disease severity
A notable feature of this SARS-CoV-2 infection cohort was that some infected individuals (5 cases, 1.53% of our cohort) did not develop obvious symptoms although significant virus shedding could be detected. As shown in Fig. 2A, no obvious lung lesions were found in an asymptomatic patient from admission until five days afterward. In comparison, unilateral and bilateral opacity lesions were observed in a mild case (Fig. 2B) and in a critical case, the latter of which deteriorated quickly within just two days (Fig. 2C).
We further analyzed the immunological and biochemical parameters of the four categories of patients (Extended Data Table 3). A prominent feature of SARS-CoV-2 infection was progressive lymphocytopenia, particularly in the severe and critical categories (p = 8.3E-5, Kruskal-Wallis test). Detailed analysis of the lymphocyte subsets revealed that CD3+ T cells were most significantly affected (p = 3E-6, Kruskal-Wallis test), with CD4+ and CD8+ T cells sharing similar trends (CD4+ T cells, p = 4E-6; CD8+ T cells, p = 4E-4; Kruskal-Wallis test). For CD19+ B cells, although a significant decline was found in critical cases (p = 6E-5, Kruskal-Wallis test), no obvious changes were observed among asymptomatic, mild and severe cases (p = 0.47). This was in contrast to T lymphocytes, for which the changes in cell number were statistically significant not only in critical cases, as mentioned above, but also among the other three categories (asymptomatic, mild and severe) for both CD3+ T cells (p = 0.013) and CD8+ T cells (p = 0.004). We subsequently combined the longitudinal cell count data in each group and plotted their patterns according to the time point post onset. CD3+ T lymphocytes (including both CD4+ and CD8+ cells) clearly exhibited a graded decline as the disease became more severe (Fig. 3A-C), but no such trend was found for B cells or NK cells (Fig. 3D-E). Indeed, univariate logistic regression analysis indicated that age (p<0.0001), lymphocyte count upon admission (p<0.0001), comorbidities (p = 0.006) and gender (p = 0.006) were the major factors associated with disease severity (Extended Data Table 4). Multivariate analysis showed that age (p = 0.004) and lymphocytopenia (p = 0.02) were the two major independent factors, whereas comorbidities did not reach statistical significance.
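The group comparisons above rely on the Kruskal-Wallis test. A minimal sketch of how its H statistic and p-value can be computed is given below, assuming the standard tie-corrected formula and a chi-square approximation with (number of groups - 1) degrees of freedom. The helper names (`average_ranks`, `chi2_sf`, `kruskal_wallis`) and the cell counts are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed form, integer df)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    terms = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def kruskal_wallis(*groups):
    """Return (H, p) for two or more independent samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical lymphocyte counts (arbitrary units), NOT data from the study.
asymptomatic = [1.9, 2.1, 1.7]
mild = [1.4, 1.2, 1.6, 1.3]
severe = [0.9, 1.0, 0.8]
critical = [0.5, 0.6, 0.4]
h_stat, p_value = kruskal_wallis(asymptomatic, mild, severe, critical)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

Because the test works on ranks rather than raw values, it makes no normality assumption, which suits skewed measurements such as cell counts.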
The levels of eleven cytokines (IFN-α, IFN-γ, IL-1β, IL-2, IL-4, IL-5, IL-6, IL-8, IL-10, IL-12 and IL-17) in serum were measured upon admission and during treatment. Among them, IL-6 and IL-8 showed the most significant changes. Remarkably, these two cytokines exhibited an inverse correlation with lymphocyte count (Fig. 4A-B, Extended Data Table 5), suggesting that lymphocytopenia could be mechanistically linked to inflammatory cytokine release. Furthermore, we combined the longitudinal cytokine data in each group and plotted their fluctuation patterns according to the time point post onset. We observed that the level of IL-6 in the critical group fluctuated during treatment but remained well above that of the other groups (Fig. 4C). A similar trend was found in the kinetics of IL-8 (Fig. 4D). These data indicated a critical role of inflammatory cytokines in the pathogenesis of SARS-CoV-2 infection.
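One way to quantify an inverse relationship such as that between cytokine level and lymphocyte count is a rank correlation coefficient. The sketch below computes Spearman's rho. This is an assumption for illustration only, since the text does not state which correlation measure was used. The function names and the paired values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements, NOT data from the study:
# serum IL-6 level and lymphocyte count for six patients.
il6 = [5.0, 12.0, 30.0, 45.0, 80.0, 150.0]
lymphocytes = [1.8, 1.5, 1.6, 0.9, 0.7, 0.4]
print(f"rho = {spearman_rho(il6, lymphocytes):.3f}")
```

A value of rho near -1 indicates that higher cytokine levels consistently accompany lower lymphocyte counts, without assuming the relationship is linear.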