Basic characteristics of the microarray data
The total RNA extracted from pMMR colonic tissue samples, consisting of five NC, five LGIN, five HGIN, and four pMMR CC samples, were hybridized to a transcriptome microarray. The quality control results for RNA/microarray are summarized in Supplementary Tables 1 and 2. Cluster analysis of normalized data indicated that the samples from each group were well-separated when all DEGs were considered (Fig.1), suggesting the microarray data were reliable for further bioinformatics analysis.
Dysregulated lncRNAs and mRNAs in the colonic adenoma-carcinoma sequence
A total of 5783 lncRNAs and 4483 mRNAs were differentially expressed among the NC, LGIN, HGIN, and pMMR CC samples. In order to generate dynamic and significant trending models of DEGs related to colon mucosa malignant transformation, STC analysis was used to further identify the clusters of DEGs with similar expression patterns. As shown in Fig. 2A, the differential expression profile consisted of 26 differentially expressed lncRNA clusters, among which 15 clusters (profile No. 0, 1, 2, 3, 4, 5, 9, 12, 17, 20, 21, 22, 23, 24, and 25) showed significant expression trends (P < 0.05). Moreover, the lncRNA expression levels in clusters 21, 22, 24, and 25 were stable or gradually elevated, while the lncRNA expression in clusters 0, 1, 3, 4, 9, and 12 showed a decreasing trend. Cluster 23 contained 207 lncRNAs that gradually increased from NC to adenoma and were suppressed in CC (Fig. 2 B). However, lncRNA expression in cluster 3 was opposite that in cluster 23. As shown in Fig. 2A, a total of nine mRNA expression clusters, specifically clusters 2, 4, 5, 13, 20, 21, 22, 24, and 25, showed significant expression trends (P < 0.05), among which clusters 21, 22, 24, and 25 showed a stable or gradual elevation. The results suggested that adenoma is an intermediate step from normal tissue to CC, and certain lncRNAs may play important roles during the dynamic process in colonic mucosal protruding lesions.
Functional and pathway enrichment analysis
DEGs from significant expression trends were subjected to GO and KEGG pathway analyses to identify their potential functions and mechanisms in the dynamic process for colonic mucosal protruding lesions (Fig. 3).
. The top 20 GO terms included multicellular organismal development, cell division, cell adhesion, cell cycle, cell differentiation, mitosis, proteolysis, and angiogenesis. KEGG enrichment analysis revealed the DEGs were mainly involved in several signaling pathways, including metabolism, cancer, cell cycle, PI3K-Akt signaling, and transcriptional misregulation in cancer (Fig. 3B).
Construction of the lncRNA-mRNA co-expression networks
To discover the significant molecular mechanisms for the lncRNAs associated with tumorigenesis in colonic ACS, lncRNAs from elevated clusters (21, 22, 24, 25) or reduced clusters (0, 1, 3, 4, 9, 12) and mRNAs with significant GO terms and KEGG enrichment pathways were selected to construct elevated-lncRNA-mRNA co-expression networks (Co-expression network E) and decreased-lncRNAs-mRNA co-expression networks (Co-expression network D), respectively. The top 10 lncRNA /mRNA in terms by degree are listed in Tables 1 and 3. Co-expression network E contained 714 nodes and 1711 edges (Fig. 4A), among which the hub nodes with the highest degrees were NOTCH4 (mRNA, Degree=29), INHBA (mRNA, Degree=26), TSTA3 (mRNA, Degree=29), lnc-NPRL3-1:1 (lncRNA, Degree=22), NR_024431 (lncRNA, Degree=21), and ENST00000564984 (lncRNA, Degree=17). Co-expression network D contained 598 nodes and 1908 edges (Fig. 4B) and the DEGs with the highest degrees were AQP8 (mRNA, Degree=47), STRADB (mRNA, Degree=47), ENST00000435912 (lncRNA, Degree=15), NR_024431 (lncRNA, Degree=21), and ENST00000564984 (lncRNA, Degree=17) (Tables 1 and 3).
Angiogenesis is a crucial step in tumor growth and progression. The angiogenic switch has been observed at the adenoma stage in ACS [20]. We constructed angiogenesis-related lncRNA-mRNA co-expression networks (A-co-expression networks E and D) with DEGs from significantly elevated/decreased clusters as well as GO terms and KEGG pathways related to angiogenesis. A-co-expression network E consisted of 649 nodes and 2299 edges (Fig. 4C) and the top hub genes were NOTCH4 (mRNA, Degree=35), MAD2L1 (mRNA, Degree=34), E2F7 (mRNA, Degree=34), lnc-ZBTB20-2:1 (lncRNA, Degree=22), lnc-CENPH-2:1 (lncRNA, Degree=20), and lnc-PAQR4-2:1 (lncRNA, Degree=19) (Tables 2 and 3). The top hub genes in A-co-expression network D (Fig. 4D) were CAMK2B (mRNA, Degree=61), STRADB (mRNA, Degree=60), LTK (mRNA, Degree=57), ENST00000513255 (lncRNA, Degree=12), NR_024605 (lncRNA, Degree=11), and NONHSAT057082 (lncRNA, Degree=11) (Tables 2 and 3). All the DEGs primarily participated in angiogenesis as the key genes in each network.
Construction of TF regulatory networks
To investigate the mechanism of gene regulation at the transcriptional level, we constructed two interaction networks (TF-DEG network and TF-angiogenesis-DEG network) between TFs and DEGs that originated from the co-expression network or A-co-expression network. The TF-DEG network contained 99 regulation models and 402 nodes involving 85 TFs (Fig. 4B), among which three TFs, namely LUN-1 (TF, Degree=218), Tel-2 (TF, Degree=22), and 1-Oct (TF, Degree=20), regulated over 20 nodes (Table 3). LUN-1 had the highest degree and might play more important roles in the TF regulatory networks. The lncRNAs regulated by the most TFs were NR_103548, lnc-ARRDC3-1:16, ENST00000513626, and ENST00000516496 (regulated by six TFs). In the TF-angiogenesis-DEG network, due to the important role of LUN-1 (TF, Degree=73) and Tel-2 (TF, Degree=18) in the progression of CC, a subnetwork was constructed showing their effects on angiogenesis functions (Fig. 3, Table 3).
qRT-PCR verification for the candidate genes
According to the probe signal value, type, DEG fold change, degree value, and protein coding ability, five LncRNAs were selected from the networks above for validation in the expanded clinical samples (70 NC, 58 LIGN, 27 HIGN, 62 pMMR CC, eight dMMRCC samples). LncRNAs ENST00000545920, ENST00000521815, ENST00000609220, and ENST00000603052 presented a sequentially increasing trend in expression with tumor progression (Fig. 6 A-D), whereas NR_026543 showed a sequentially decreasing trend (Fig. 6 E). No statistically significant differences were observed in the expression of NR_026543 and ENST00000521815 among LGIN, HGIN, and pMMR CC. However, the expression levels for each group were significantly elevated or decreased compared with that of the NC samples, suggesting that these genes could be used to distinguish normal from lesion tissue. Furthermore, the expression of ENST00000603052, ENST00000521815, and ENST00000609220 was significantly different in pMMR CC and dMMR CC tissue samples (Fig 6. F-J), indicating these three genes could distinguish pMMR CC from dMMR CC.
Diagnostic efficacy of selected lncRNAs
The diagnostic efficacy of the above selected five lncRNAs was evaluated as potential biomarkers for pMMR CC diagnosis compared to carcinoembryonic antigen (CEA), a traditional serum tumor biomarker. The specificities of ENST00000521815, ENST00000603052, ENST00000609220, NR_026543, ENST00000545920, and CEA were 0.981, 1, 0.962, 0.932, 0.981, and 0.904, and the sensitivities were 0.931, 0.864, 0.746, 0.846, 0.576, and 0.593, respectively. ROC analysis showed significantly higher AUC for lncRNAs ENST00000521815, ENST00000603052, ENST00000609220, and NR_026543 compared with CEA (0.785) (P=0.0001, 0.0002, 0.0028, and 0.0011, respectively) (Fig 7. A-D). No statistically significant difference in the AUC was observed between ENST00000545920 (0.794) and CEA (P=0.8734) (Fig 7. E).