Identification of differentially expressed genes
Gene expression profiles related to SARS-CoV-2 infection were obtained from the Gene Expression Omnibus (GEO). Data from GSE148729 were separated into three groups: Caco2, Calu3 S1, and Calu3 S2. Using |log2FC| > 1 and P < 0.05 as cut-off values, each group was screened separately, yielding 108 DEGs in Caco2, 791 DEGs in Calu3 S1, and 342 DEGs in Calu3 S2 (Supplementary Table S1). The Caco2 group had 51 down-regulated and 57 up-regulated DEGs, the Calu3 S1 group had 73 down-regulated and 718 up-regulated DEGs, and the Calu3 S2 group had 68 down-regulated and 274 up-regulated DEGs (Supplementary Table S1). Sixteen DEGs were up-regulated in all three groups (Figure 1A). In addition, volcano plots (Figure 1B, 1D, 1F) and heat maps (Figure 1C, 1E, 1G), generated independently for each group in R, showed marked differences in expression distribution within each group.
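The screening step can be sketched as follows. This is a minimal illustration, assuming each group is available as a list of (gene, log2FC, P-value) records; the gene names and values are hypothetical placeholders, not data from GSE148729, and the function name `screen_degs` is introduced here for illustration only.

```python
from typing import Dict, List, Set, Tuple

# One record per gene: (gene symbol, log2 fold change, P-value).
Row = Tuple[str, float, float]


def screen_degs(rows: List[Row], lfc_cutoff: float = 1.0,
                p_cutoff: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Return (up-regulated, down-regulated) genes passing both cut-offs."""
    up: Set[str] = set()
    down: Set[str] = set()
    for gene, log2fc, p in rows:
        # Keep a gene only if |log2FC| > cutoff AND P < cutoff.
        if p < p_cutoff and abs(log2fc) > lfc_cutoff:
            (up if log2fc > 0 else down).add(gene)
    return up, down


# Hypothetical toy input, for illustration only.
groups: Dict[str, List[Row]] = {
    "Caco2": [("GENE_A", 2.1, 0.001), ("GENE_B", -1.4, 0.03), ("GENE_C", 0.6, 0.01)],
    "Calu3 S1": [("GENE_A", 3.0, 0.0004), ("GENE_B", 1.2, 0.2), ("GENE_D", 1.8, 0.02)],
    "Calu3 S2": [("GENE_A", 1.5, 0.01), ("GENE_D", 2.2, 0.004), ("GENE_E", -2.0, 0.04)],
}

# Screen each group separately, then intersect the up-regulated sets.
screened = {name: screen_degs(rows) for name, rows in groups.items()}
shared_up = set.intersection(*(up for up, _ in screened.values()))
```

Screening each group separately before intersecting mirrors the order described above: a gene counts as shared only if it passes both cut-offs in every group.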
Gene Ontology Enrichment Analysis in SARS-CoV-2
GO enrichment results for the DEGs in the Caco2, Calu3 S1, and Calu3 S2 groups are shown in Figure 2A-C. Gene Ontology (GO) is subdivided into three non-overlapping ontologies: biological process (BP), cellular component (CC), and molecular function (MF). The top 10 BP terms for each group are listed in Supplementary Table S2.
Enrichment analysis groups genes that perform the same function, so when too few genes pass screening, few of them share any given function and meaningful results are hard to obtain. The Caco2 group contained relatively few DEGs. Because all of them had already met the P < 0.05 screening criterion, non-significant genes can be ruled out as the cause, yet significant terms remained difficult to obtain for this group. We therefore present in Figure 2A several terms closely related to the infection process: ‘ethanol metabolic process’, ‘hormone metabolic process’, ‘antibiotic metabolic process’, ‘gap junction assembly’, and ‘ethanol oxidation’. Here, the infection process refers to the binding of the spike (S) glycoprotein to its receptor on the target cell, chiefly angiotensin-converting enzyme 2 (ACE2), followed by entry of SARS-CoV-2 into the cell through fusion of the viral envelope with the endosomal membrane after endocytosis of the virus particle and/or through fusion of the viral envelope with the plasma membrane.
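The effect of list size on enrichment can be illustrated numerically. The sketch below assumes a one-sided hypergeometric over-representation test, a common choice for GO enrichment; the test actually used in this analysis is not stated here, and the background size, term size, and hit counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from a background of N genes,
    K of which are annotated to the term of interest."""
    upper = min(n, K)
    total = comb(N, n)
    # Sum the upper tail of the hypergeometric distribution exactly.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical background: 20000 genes, 200 annotated to one GO term (1%).
N_BACKGROUND, TERM_SIZE = 20000, 200

# Same observed proportion (3% of the list hits the term), different list sizes.
p_small = hypergeom_enrichment_p(k=3, n=100, K=TERM_SIZE, N=N_BACKGROUND)
p_large = hypergeom_enrichment_p(k=24, n=800, K=TERM_SIZE, N=N_BACKGROUND)
```

With these assumed numbers, the short list does not reach P < 0.05 while the long list does, even though both show the same threefold excess over the background rate. This is one way to see why a group with about a hundred DEGs may yield few significant terms.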
In the Calu3 S1 group, the three main BP terms were ‘response to virus’, ‘defense response to virus’, and ‘response to molecule of bacterial origin’ (Figure 2B), whereas in the Calu3 S2 group they were ‘response to virus’, ‘defense response to virus’, and ‘response to interferon-gamma’ (Figure 2C). The CC and MF results for Calu3 S1 and Calu3 S2 are presented in Figures 2B and 2C, respectively.
KEGG Analysis in SARS-CoV-2
KEGG enrichment analysis of all DEGs yielded a P-value and an adjusted P-value for each pathway. Supplementary Table S3 and Figure 3A-C show the KEGG enrichment results for each group.
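For readers unfamiliar with adjusted P-values, the sketch below shows one widely used correction, the Benjamini-Hochberg procedure. This is an assumption for illustration: the adjustment method used for the KEGG results is not specified here, and the input values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values for four pathways.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
```

Each adjusted value is at least as large as its raw P-value, so a pathway that is nominally significant may no longer be so after correction.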
As in the GO analysis, the relatively small number of DEGs in the Caco2 group made it difficult to obtain significant enrichment results for this group. The relationships in the data are shown as a circular plot in Figure 3A. Five pathways were associated with SARS-CoV-2 infection: ‘Viral myocarditis’, ‘Glycolysis / Gluconeogenesis’, ‘Fc epsilon RI signaling pathway’, ‘Drug metabolism - cytochrome P450’, and ‘Metabolism of xenobiotics by cytochrome P450’.
KEGG pathway analysis revealed that the Calu3 S1 DEGs were mainly enriched in ‘TNF signaling pathway’, ‘Cytokine-cytokine receptor interaction’, and ‘Influenza A’ (Figure 3B), while the Calu3 S2 DEGs were mainly enriched in ‘Influenza A’, ‘TNF signaling pathway’, and ‘Measles’ (Figure 3C).
PPI network analysis
The PPI network comprised 37 nodes and 45 edges in the Caco2 group (Figure 4A), 178 nodes and 241 edges in the Calu3 S1 group (Figure 4B), and 56 nodes and 60 edges in the Calu3 S2 group (Figure 4C). The top ten genes are listed in Figure 4D.
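Node and edge counts, and a ranking of candidate hub genes, can be derived from an interaction list as sketched below. Ranking by degree (number of interaction partners) is an assumption made for illustration; the criterion used to select the top ten genes is not stated here, and the edge list is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def network_summary(edges: Iterable[Tuple[str, str]]) -> Tuple[int, int, Counter]:
    """Return (node count, edge count, degree per node) for an undirected network.

    Duplicate edges and self-loops are ignored; proteins without any
    interaction do not appear in an edge list and are therefore not counted.
    """
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    degree: Counter = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return len(degree), len(unique_edges), degree


def top_hubs(degree: Counter, n: int = 10) -> List[str]:
    """Rank genes by degree (ties broken alphabetically) and keep the top n."""
    return sorted(degree, key=lambda g: (-degree[g], g))[:n]


# Hypothetical interaction pairs, for illustration only.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("B", "A")]
n_nodes, n_edges, degree = network_summary(toy_edges)
hubs = top_hubs(degree, n=3)
```

Other centrality measures could be substituted for degree without changing the structure of the sketch.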
Validation of DEGs for SARS-CoV-2 infection
The 10 hub genes were validated using the GSE150728 dataset, with P < 0.05 considered statistically significant. The expression of CXCL2, ETV7, and HIST1H2BG differed significantly (Figure 5).