Patient characteristics and univariate analysis
A total of 304 hospitalized adult COVID-19 patients diagnosed by RT-PCR at Wuhan Jinyintan Hospital from January 16, 2020 to March 30, 2020 were included in this retrospective study. After the exclusion criteria were applied, two patients with a follow-up period of less than 60 days and two patients whose admission CEA was unknown were excluded from further analysis.
The baseline characteristics of the remaining 300 COVID-19 patients are described in Figure 1A. The cohort comprised 170 males and 130 females, with a median age of 63.0 (range, 21.0 - 90.0) years. Of the 28 laboratory indicators, 17 had missing values in more than 20% of the sample and were removed. The initial Kaplan-Meier survival analysis (Figure 1C-D) and parametric or non-parametric tests (Figure 1E) then revealed that only five indicators (serum CEA, lymphocytes, neutrophils, CRP and albumin) were significantly associated with both the imaging score and the prognosis of COVID-19 patients (Figure 1B).
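The missing-value filter described above can be illustrated with a minimal sketch. It assumes the laboratory data are held as a mapping from indicator name to one value per patient, with None marking a missing measurement; the function name, the data layout and the toy values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional


def drop_sparse_indicators(
    table: Dict[str, List[Optional[float]]],
    max_missing_fraction: float = 0.20,
) -> Dict[str, List[Optional[float]]]:
    """Keep only indicators whose share of missing values is at most the threshold."""
    kept = {}
    for name, values in table.items():
        if not values:
            continue  # an indicator with no rows carries no information
        missing = sum(1 for v in values if v is None)
        # Indicators with MORE than 20% missing values are removed.
        if missing / len(values) <= max_missing_fraction:
            kept[name] = values
    return kept


# Toy data (hypothetical): five patients, two indicators.
toy_table = {
    "cea": [2.1, None, 7.9, 3.3, 5.0],    # 1 of 5 missing (20%): kept
    "il6": [None, None, 11.0, 8.5, 9.2],  # 2 of 5 missing (40%): dropped
}
filtered = drop_sparse_indicators(toy_table)
```

In this sketch an indicator sitting exactly at the 20% threshold is retained, which matches the "more than 20%" wording of the exclusion rule.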
Cox proportional hazards model and nomogram
CEA was the only laboratory indicator that reached significance in all univariate and multivariate analyses. To identify the optimal cutoff point of CEA, the cyclic log-rank test was performed. The results revealed that CEA = 7.3 ng/ml was the optimal cutoff point, giving the most significant P value in the log-rank test (Figure 2A-B). Then, 12 potential indicators showing prognostic value in the Kaplan-Meier analysis were incorporated into the initial Cox proportional hazards models, along with two demographic variables (age and gender). The final multivariable models were constructed to confirm the effects of the significant covariates from the initial models on the OS of COVID-19 patients (Figure 2C). Patients with lower CEA had better OS (HR, 0.57; 95% CI, 0.35 to 0.92; P = 0.021) in the multivariable model, suggesting that CEA is an independent prognostic indicator for COVID-19 patients.
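The cutoff search above can be sketched as a scan over candidate thresholds, where each threshold splits the cohort into a low and a high group and a two-group log-rank test is computed. The sketch below is a pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the minimum group size and the toy data are hypothetical, and the P value uses the chi-square distribution with one degree of freedom.

```python
import math
from typing import List, Optional, Sequence, Tuple


def logrank_chi2(times: Sequence[float], events: Sequence[int],
                 in_group: Sequence[bool]) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)                                  # all patients at risk at t
        n1 = sum(1 for i in at_risk if in_group[i])       # at risk in the group
        died = [i for i in at_risk if times[i] == t and events[i]]
        d = len(died)                                     # deaths at t, both groups
        d1 = sum(1 for i in died if in_group[i])          # deaths at t in the group
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def chi2_1df_pvalue(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def scan_cutoffs(marker: Sequence[float], times: Sequence[float],
                 events: Sequence[int],
                 min_group_size: int = 2) -> Optional[Tuple[float, float]]:
    """Return (cutoff, p) for the threshold with the smallest log-rank P value."""
    best: Optional[Tuple[float, float]] = None
    for cutoff in sorted(set(marker))[:-1]:
        high: List[bool] = [m > cutoff for m in marker]
        if min(sum(high), len(high) - sum(high)) < min_group_size:
            continue  # skip splits that leave a very small group
        p = chi2_1df_pvalue(logrank_chi2(times, events, high))
        if best is None or p < best[1]:
            best = (cutoff, p)
    return best


# Toy data (hypothetical): high marker values die early, low values are censored late.
toy_marker = [1, 2, 3, 4, 6, 7, 8, 9]
toy_times = [50, 55, 60, 58, 5, 6, 7, 8]
toy_events = [0, 0, 0, 0, 1, 1, 1, 1]
best_cutoff, best_p = scan_cutoffs(toy_marker, toy_times, toy_events)
```

Because the cutoff is chosen to minimize the P value, the selected P value is optimistic when read as a single pre-specified test; the independent value of the dichotomized marker is therefore better judged from the multivariable model.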
The prognostic nomogram was constructed from the multivariate Cox model including CEA to predict the 3-week and 5-week overall survival probability of COVID-19 patients (Figure 3A). The calibration curve and ROC curve (AUC = 0.776) suggested acceptable calibration and discrimination of the nomogram, respectively (Figure 3B; Figure S1A-B). In addition, the risk score (RS) was calculated using the formula generated by the multivariate Cox model. The scatter plot (Figure S1C) and risk curve (Figure S1D) of the model showed the distribution of the RS across patients. The Kaplan-Meier curve supported the prognostic value of the RS (Figure 3C, P < 0.001). The residual distribution of the multivariate model was assessed with the residual plot (Figure S1E). Finally, the RS was shown to be an independent prognostic indicator for COVID-19 patients in both the univariate (HR = 34.215, 95%CI (17.827−65.687), P < 0.001, Figure 3D) and the multivariate (HR = 1.281, 95%CI (1.214−1.353), P < 0.001, Figure 3E) Cox regression models corrected for demographics.
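For readers unfamiliar with how a risk score follows from a Cox model, the usual form is the linear predictor: the sum of each covariate multiplied by its fitted log hazard ratio. The sketch below assumes that form. Only the hazard ratio of 0.57 for lower CEA comes from the text; the age coefficient, the covariate names and the example patient are hypothetical placeholders, since the fitted formula itself is not reproduced here.

```python
import math
from typing import Dict


def risk_score(coefficients: Dict[str, float], covariates: Dict[str, float]) -> float:
    """Linear predictor of a Cox model: sum of coefficient * covariate value."""
    return sum(beta * covariates[name] for name, beta in coefficients.items())


def relative_hazard(score: float) -> float:
    """Hazard relative to a patient whose covariates are all zero."""
    return math.exp(score)


# Coefficients are log hazard ratios.
example_coefficients = {
    "low_cea": math.log(0.57),  # HR 0.57 for lower CEA, as reported in the text
    "age_years": 0.03,          # hypothetical value, for illustration only
}

# Hypothetical patient: CEA below the cutoff, age entered as years above a reference age.
example_patient = {"low_cea": 1.0, "age_years": 10.0}
example_rs = risk_score(example_coefficients, example_patient)
example_hazard = relative_hazard(example_rs)
```

A higher score corresponds to a higher predicted hazard, which is why patients can be ranked or grouped by RS for a Kaplan-Meier comparison.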
Identification of the potential mechanism of CEA in COVID-19
The scRNA-seq data of bronchoalveolar lavage fluid (BALF) from three patients with moderate COVID-19 (C141, C142, C144), six patients with severe or critical infection (C143, C145, C146, C148, C149, C152) and three healthy controls (C51, C52, C100) [13] were downloaded from the GEO database. A UMAP analysis was performed on 63,010 cells in BALF and clearly identified 20 clusters and 11 cell types: B cell, CD4+ T cell, CD8+ T cell, Dendritic cell, Macrophage, Monocyte, Natural killer cell, Neutrophil, T cell: gamma-delta, Type I pneumocyte and Type II pneumocyte (Figure 4A-B, Figure S2A-B). The expression levels and expression percentages of the marker genes in each cell type are displayed in Figure S2C and S2D, respectively. Except for macrophages and type I and type II pneumocytes, all other immune cells (B cell, CD4+ T cell, CD8+ T cell, Dendritic cell, Monocyte, Natural killer cell, Neutrophil and T cell: gamma-delta) were dominantly differentiated and chemotactic in the BALF of COVID-19 patients compared to healthy volunteers (Figure 4C). Furthermore, in terms of the expression and distribution of CRGs, CEACAM1, CEACAM3, CEACAM5, CEACAM6, CEACAM7, CEACAM8 and CEACAM21 were differentially expressed among moderate COVID-19 patients, severe/critical COVID-19 patients and healthy controls, while CEACAM5 and CEACAM6 were significantly localized in the type II pneumocytes of COVID-19 patients (Figure 4D-E). Notably, Figure 4F summarizes the absolute quantification of 50 hallmark gene sets calculated by GSVA in type I and type II pneumocytes, suggesting that the interferon response and cell proliferation signaling pathways were significantly activated in the type II pneumocytes of COVID-19 patients that highly expressed CRGs. In addition, cell cycle analysis suggested that COVID-19 patients were more likely to have cells in the G2M and S stages (Figure S3A-B).
Furthermore, CellPhoneDB analysis illustrated that the pneumocytes of COVID-19 patients communicated extensively with immune cells through CRGs (Figure S3C).
Similarly, the scRNA-seq data of 94,448 PBMCs from six patients with moderate COVID-19 and six healthy volunteers were also downloaded [14]. The UMAP analysis identified 18 clusters and 10 cell types: B cell, B cell Naïve, CD4+ T cell, CD8+ T cell, Macrophage-Monocyte, Myelocyte, Natural killer cell, Neutrophil, Plasma cell and Platelets (Figure 5A-B; Figure S4A-B). All types of immune cells were significantly differentiated and chemotactic in COVID-19 patients’ PBMCs compared to healthy controls (Figure 5C). CEACAM1, CEACAM4, CEACAM6 and CEACAM8 were differentially expressed between the PBMCs of COVID-19 patients and healthy controls, while CEACAM1, CEACAM6 and CEACAM8 were significantly localized in a novel cell subtype annotated as ‘developing neutrophils’, which was significantly differentiated and chemotactic only in COVID-19 patients with ARDS, as reported by Wilk et al. (Figure 5D-E) [14]. Additionally, dot plots summarized the results of the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. Based on the GO analysis, the DEGs were associated with neutrophil activation and degranulation (Figure 5F). According to the KEGG analysis, the DEGs were related to protein processing in endoplasmic reticulum, phagosome, Epstein-Barr virus infection and tuberculosis (Figure 5F). In addition, cell cycle analysis suggested that the developing neutrophils in COVID-19 patients’ PBMCs were all engaged in the G2M and S stages (Figure S4C-D).
The specific expressions of CRGs in COVID-19 patients
Given the close correlation between CEA and ALI/IPF, we initially speculated that the poor prognosis of COVID-19 patients mediated by CEA might be pathophysiologically related to ALI and IPF. To test this hypothesis, scRNA-seq data of ALI mouse lungs (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE134383) and IPF mouse lungs (https://www.ebi.ac.uk/gxa/sc/experiments/E-HCAD-14/results/tsne) were also downloaded to evaluate the distribution and expression of CRGs, key receptor-ligand pairs of cellular communication and potential downstream pathways [24-28]. The UMAP analysis identified 18 clusters and 6 cell types in ALI mouse lungs, with no abnormal expression of CRGs (Figure 6A-C). The interferon response and cell proliferation signaling pathways were not significantly activated in type II pneumocytes of ALI mouse lungs (Figure 6D). Similarly, no abnormal expression of CRGs was detected in the 31 clusters and 10 cell types of IPF mouse lungs (Figure 6E-G). The GSVA heatmap also showed that the interferon response and cell proliferation signaling pathways were not activated in type II pneumocytes of IPF mouse lungs (Figure 6H). Thus, the abnormal expression of CRGs in COVID-19 patients appeared to be COVID-19-specific and not related to the involvement of CEA in ALI and IPF.
Protein-protein interaction (PPI) network of CRGs
The STRING database [29] was used to construct the PPI network of CRGs, illustrating that several CRGs had direct PPIs with a variety of immune cell surface markers (Figure 7A-C). In addition, the protein expression levels of CRGs in normal lung samples from The Human Protein Atlas were checked [30], showing that only CEACAM21 was stained moderately in pneumocytes, while the CEACAM5, CEACAM6 and CEACAM8 proteins were not detected in normal lung samples (Figure 7D). To sum up, we propose that CEA can serve as a predictor of poor prognosis in COVID-19 patients. In COVID-19, the developing neutrophils/neutrophil progenitors (highly expressing CEACAM8, ELANE and LYZ) may cross-talk with type II pneumocytes (highly expressing CEACAM5 and CEACAM6) via CEACAM8-CEACAM6. This process may not only promote the differentiation of developing neutrophils and subsequently induce ARDS, but also regulate the proliferation of type II pneumocytes, which are the target cells of SARS-CoV-2.