1. Differential Expression Analysis
According to the screening criteria and method, a total of 2,457 differentially expressed mRNAs (dif-mRNAs) in GPLs compared with EGC (P versus T) were obtained, along with 2,857 dif-mRNAs in normal tissues compared with EGC (N versus T), of which 2,548 were upregulated and 2,766 were downregulated (Figure 1A-1B). The overlapping dif-mRNAs between P versus T and N versus T were filtered out using Venn diagrams (Figure 1C). To further integrate epigenetic elements into the results of dif-mRNAs, we obtained high-quality reads from the ATAC-seq with alignment of the reference genome, obtained the position information of the reads on the genome and selected an intersection for the dif-mRNAs and the aligned genes by ATAC-seq. Finally, the DEGs were formed. Among a total of 2,205 DEGs in P versus T and 2,578 DEGs in N versus T, the overlapping DEGs between the two pairwise groups were presented by Venn diagrams and hierarchical cluster heatmaps (Figure 1E-1H), and 1,705 DEGs overlapped in total. In particular, 11 DEGs were upregulated in NvsT but downregulated in PvsT, including CPS1, CYP2D6, APOB, CHST5, ANPEP, CLCA1, SLC6A19, MME, CDHR5, SLC5A1, and MALL. Most of these mRNAs maybe play an essential role in the occurrence and development of tumours, which is worthy of further exploration to determine whether these are the target mechanisms of EGC.
A total of 352 differentially expressed miRNAs (dif-miRNAs), 3,715 differentially expressed mRNAs (dif-lncRNAs), and 117 differentially expressed circRNAs (dif-circRNAs) were obtained in P versus T, and 455 dif-miRNAs, 4,404 dif-lncRNAs, and 208 dif-circRNAs were obtained in N versus T (Figures 2A–2F). The overlapping dif-miRNAs (207), dif-lncRNAs (2,235), and dif-circRNAs (59) between P versus T and N versus T were filtered out using Venn diagrams (see Figure 2G-2I). The overlapping dif-miRNAs, dif-lncRNAs, and dif-circRNAs are shown in hierarchical cluster heatmaps (Figure 2J-2L), from which we found significant separation between the three groups, indicating that the results of the differential expression analysis were available. The primary point here is that hsa-miR-215–5p levels were elevated in normal versus EGC tissues, whereas levels were decreased in both GPLs versus EGC samples which might be a potential target microRNA.
2. Functional Enrichment Analysis of DEGs
Functional enrichment analysis of the overlapping DEGs was performed to explore the biological functions. Upregulated genes and downregulated genes were enriched in the top 30 GO terms and top 20 KEGG pathways, respectively (Figure 1I-1L). According to the GO analysis enriched by the upregulated genes, we found that many pathways of gastric carcinogenesis involve abnormal physiological function of the extracellular matrix, which indicates that extracellular matrix structural constituents and collagen manifest the critical processes of transition from normal to GPLs and further to EGC. Among the top 20 KEGG pathways based on the upregulated genes, the PI3K-Akt signalling pathway, proteoglycans in cancer, cell cycle, p53 signalling pathway, etc., were related to gastric carcinogenesis. Similarly, the KEGG pathways of downregulated genes suggest that the calcium signalling pathway, gastric acid secretion and chemical carcinogenesis-DNA adducts may be involved in the occurrence of EGC.
3. Protein–Protein Interaction (PPI) Analysis of DEGs and dif-ncRNAs
To identify important hub genes among the overlapping DEGs, dif-miRNAs, dif-lncRNAs, and dif-circRNAs, we performed protein–protein interaction (PPI) analysis of all the differentially expressed RNAs. A total of 1,555 DEGs were generated, the degree of each node was calculated using CytoNCA and visualized by the R packages, and the nodes with high topological scores can be regarded as vital nodes of the network, with the hub-gene determined according to the number of adjacent nodes. A bar chart including the top 30 hub genes (e.g., VEGFA, FN1, CDK1, MYC, etc.) is shown in (Figure 3B). Meanwhile, PPI networks of the target genes in DEGs corresponding to dif-miRNAs, dif-lncRNAs, and dif-circRNAs were also established (Figure 4A-4C). The PPI network based on dif-miRNAs consisted of 1,390 genes and a total of 999 genes based on dif-lncRNAs, while only 13 DEGs corresponded to dif-circRNAs. We generated bar charts of the top 20 target genes (Figure 4D-4F). The target genes (e.g., FN1, MYC, VEGFA, CDK1, etc.) corresponding to the dif-miRNAs and dif-lncRNAs with a significantly higher degree (e.g., DNMT3B, HIST1H2BJ, HIST1H2N, etc.) were found to be associated with dif-circRNAs. We then extracted five clustered modules from the PPI network and formed five subnetworks, the details of the representative genes ranked by degree were shown in Table 1. The five modules contained 96 (Module 1), 82 (Module 2), 77 (Module 3), 48 (Module 4), and 49 (Module 5) nodes, among which the upregulated genes were most often included. Among the 17 genes in Module 1 with degree=95 (e.g., PBK, KIF23, TOP2A, CDK1, CDC45, etc.), FN1 with the highest degree in Module 2 (degree=48), ICAM1 with degree=23 in Module 3 with the same circumstance, THBS1 with twice the degree of the second gene in Module 4 (degree=30), VEGFA is the highest in Module 5 (degree=21).
Moreover, genes in the five modules were subjected to GO and KEGG enrichment analysis (Figure 3H-3I). According to the KEGG analysis, four main pathways may be involved in the occurrence and development of gastric carcinogenesis: the cell cycle, DNA replication, PI3K-AKT signalling pathway, and ECM-receptor interaction. We believed that changes in the extracellular matrix and extracellular structure organization were probably involved in the pathological processes of GPLs and EGC.
4. ceRNA Network Construction
First, we constructed a circRNA-related and lncRNA-related ceRNA network. Based on the theory of ceRNA and the regulatory relationships of dif-circRNA-dif-miRNA, dif-lncRNA-dif-miRNA, and dif-mRNA-dif-miRNA, the differentially expressed circRNA, lncRNA, and mRNA regulated by the same miRNA were screened. By using circRNA and lncRNA as decoys, circRNA–miRNA–mRNA and lncRNA-miRNA-mRNA networks were formed (Figure 4G-4H). The circRNA–miRNA–mRNA ceRNA network contained 17 mRNAs, two circRNAs, and 11 miRNAs (Figure 4G). In this network, the core circRNA is an unknown gene expressed as gene loci information, and nodes and edges represent genes and interactions, respectively. The lncRNA–miRNA–mRNA ceRNA network contained 5,916 edges, 366 mRNAs, 312 lncRNAs, and 55 miRNAs (Figure 4H). Since most of the lncRNAs (approximately 283 lncRNA transcripts) were expressed as gene loci information, we created a core ceRNA network that filtered out the unknown lncRNAs, which contained 2,762 edges, 366 mRNAs, 29 lncRNAs, and 55 miRNAs (Figure 4I).
According to the two subnetworks and the regulatory relationship of these differentially expressed ncRNAs, we constructed a circRNA-lncRNA-miRNA-mRNA ceRNA network containing 1,013 edges, 17 mRNAs, 11 miRNAs, 242 lncRNAs, and 2 circRNAs (Figure 5A) and then filtered out 223 unknown lncRNAs to form a core network containing 17 mRNAs, 11 miRNAs, 11 lncRNAs, and two circRNAs (Figure 5B), the representative RNAs in the three big ceRNA networks were shown in Table 2.
In all of the above networks, we found that in both the whole network and the core network, hsa-miR-1226–5p and hsa-miR-6720–5p were the most important miRNAs. Furthermore, we found two mRNAs of GPR161 and ARID3A: lncRNA H19–202 had the highest degree (degree=12) in the core ceRNA network. We performed GO and KEGG enrichment analysis of all the genes in the core ceRNA network. The results showed that the smoothed signalling pathway was significantly enriched in GO-BP terms, and cytokine receptor, immune receptor, G protein-coupled peptide receptor, and peptide receptor activity were significant in GO-MF terms (Figure 6A). The results of KEGG enrichment showed that the classical MAPK signalling pathway and cytokine-cytokine receptor interaction pathway were significant (Figure 6B).
5. Validation of The Cancer Genome Atlas (TCGA) Database
To validate our findings in the core ceRNA network, the RNA expression levels were compared with the TCGA database. Because the circRNAs represented gene locus information that could not be verified, we were more concerned with mRNA, miRNA, and lncRNA validation. Because H19–201, H19–202, and H19–203 were the isomers of lncRNA-H19, the validation of only nine lncRNAs was performed. The validation results are shown in Figure 6C-6E. Most of the ceRNA expression patterns in the network were consistent with the TCGA data. There were ten mRNAs (Figure 6C), eight lncRNAs (Figure 6D), and four miRNAs (Figure 6E) that were significantly different, indicating participation in the development of EGC.
6. Survivorship Curve Analysis and ROC Curve Analysis
To further determine the roles of these genes in the ceRNA network with respect to survival and prognosis, we performed a survivorship curve analysis of mRNAs, miRNAs, and lncRNAs. The results showed that four mRNAs and one lncRNA were significant (Figure 7A-7E). For example, CCR10 (Figure 7A), GSDME (Figure 7C), LRRN1 (Figure 7D), and AC136475.3 (Figure 7E) had higher expression, and GPC3 (Figure 7B) had lower expression, the overall survival rate and prognosis were better. Literature retrieval revealed that most of these RNAs play a role in cancer progression, suggesting potential targets for the above RNAs in GPLs and EGC. Therefore, we performed ROC curve analysis to evaluate the diagnostic value of the candidate ceRNAs in GPL and EGC. The results showed that the AUC values of nine mRNAs (e.g., UBAP2, NOL6, LGR5, ASCL2, IL1RAP, etc.) and five lncRNAs (e.g., DLGAP1-AS2, CYTOR, AC136475.3, AGAP2-AS1, and AP003774.1.) were more significant than 0.7 (Figure 6F-6G). These results demonstrated that the above mRNAs and lncRNAs have good diagnostic value in GPLs and EGC, and thus serve as early biomarkers to predict GPLs and EGC.
7. Real Time-PCR Validation
Based on the results of the ceRNA network, most miRNAs except hsa-miR-3180–3p and hsa-miR-3180–5p existed in the TCGA database, and we found that hsa-miR-1226-5p, hsa-miR-6720-5p, lncRNA H19, ARID3A and GPR161 were in critical core positions, but hsa-miR-1226–5p was not significantly different in the TCGA database. The three mRNAs of GSDME, LRRN1, and GPC3 had certain significance with respect to the overall survival rate and prognosis, but they were not significant in TCGA. We further validated the three miRNAs and the three mRNAs by real-time PCR in normal, GPLs and EGC tissues to prove our data meaningful. In the results depicted in (Figure 7F-7K), there are significant differences in the expression of GPC3 (Figure 7F), GSDEM (Figure 7G), LRRN1 (Figure 7H), miR-3180–3p (Figure 7I), miR-3180–5p (Figure 7J) and miR-1226–5p (Figure 7K) between the three groups of normal gastric mucosal tissues, GPL tissues, EGC tissues, which confirms that the RNA sequencing and real-time PCR results were consistent.