Identification of DEGs, DEMs and DELs in LUAD patients based on TCGA data
The LUAD transcriptome profiling data and corresponding clinical information were downloaded from the TCGA database. By using the “edgeR” package in R with the cutoff value of |log2FC| > 1 and FDR < 0.05, a total of 5595 cDEGs were considered in LUAD samples compared with those in normal samples, including 3750 (67.02%) upregulated and 1845 (32.98%) downregulated genes (Fig. 1a). A total of 3942 cDELs were then identified, including 3257 (82.62%) upregulated and 685 (17.38%) downregulated lncRNAs (Fig. 1b). Meanwhile, a total of 265 cDEMs were further screened, including 159 (77.56%) upregulated and 46 (22.44%) downregulated miRNAs (Fig. 1c). Additionally, the expression levels of these above genes are visualized in the form of heatmaps in Fig. 2.
Intersecting lncRNAs and mRNAs and construction of the ceRNA network in LUAD
A flowchart of the ceRNA network construction is shown in Fig. 3. As demonstrated in this flowchart, 34 potential miRNAs that interacted with 340 lncRNAs were first predicted on the basis of the miRcode database (Fig. 3). Then, these 34 miRNAs were applied to the further selection of targeted mRNAs. After using all three databases, including miRDB, miRTarBase, and TargetScan, 5 miRNAs were removed (namely, hsa-mir-508, hsa-mir-489, hsa-mir-301b, hsa-mir-184, and hsa-mir-187), whereas 943 targeted mRNAs remained (Fig. 3). Subsequently, at the intersection of these 943 targeted mRNAs and 5377 cDEGs (only those genes identified by R), 218 overlapping DEGs were identified (Fig. 4). Then, 340 DELs, 29 DEMs, and 218 DEGs were incorporated into a final LUAD ceRNA network (Fig. 5). The respective genes involved in the ceRNA network are presented in Table S1.
Functional enrichment analysis and PPI network construction on DEGs involved in ceRNA
The online tool DAVID database was applied to perform GO and KEGG pathway enrichment analyses. GO analysis of the above 218 DEGs grouped DEGs into three functional groups, namely, BP, CC, and MF. The top 5 significant terms from the GO enrichment analysis showed that in the BP category, the DEGs were involved in transcription, DNA-templated (18.81%, P < 0.001), positive regulation of transcription from RNA polymerase II promoter (13.30%, P < 0.001), regulation of transcription, DNA-templated (12.84%, P = 0.029), signal transduction (10.55%, P = 0.028), and negative regulation of transcription from RNA polymerase II promoter (9.63%, P = 0.001) (Fig. 6a). For the CC category, the DEGs were correlated with nucleus (40.83%, P < 0.001), cytoplasm (34.86%, P = 0.014), nucleoplasm (23.85%, P < 0.001), transcription factor complex (5.50%, P < 0.001), and proteinaceous extracellular matrix (3.67%, P = 0.035) (Fig. 6b). For the MF group, the DEGs were enriched for protein binding (61.00%, P < 0.001), metal ion binding (18.81%, P = 0.002), DNA binding (15.60%, P = 0.003), ATP binding (12.84%, P = 0.021), transcription factor activity, and sequence-specific DNA binding (11.93%, P < 0.001) (Fig. 6c). For KEGG pathway enrichment, the DEGs were associated with hsa05206: microRNAs in cancer (8.72%, P < 0.001), hsa05200: pathways in cancer (8.26%, P < 0.001), hsa04110: cell cycle (5.96%, P < 0.001), hsa05166: HTLV-1 infection (5.96%, P < 0.001), and hsa04151: PI3K-Akt signaling pathway (5.05%, P = 0.041) (Fig. 6d). Furthermore, the specific information on the DEGs identified in each category in the functional enrichment analysis is presented in Table S2.
The PPI network of the DEGs involved in the ceRNA network was constructed on the basis of the information from the STRING database. A total of 218 DEGs were mapped to the PPI network (Fig. S1). 218 nodes and 455 edges were included in this PPI network, and its PPI enrichment P value was lower than 1.0e – 16 (Fig. S1). In addition, the average local clustering coefficient was 0.38 (Fig. S1).
Survival analysis of ceRNA network-associated genes
To validate whether the potential DEGs, DELs, and DEMs were significantly associated with the progression of patients with LUAD, K-M survival analyses and log-rank tests were conducted for each gene involved in the above ceRNA network. After validation, 29 DEGs, 24 DELs, and 4 DEMs were considered oncogenes because all of them had statistical significance in the survival analysis of LUAD patients (P < 0.05; Table 1). K-M survival curves of the partial DEGs, DELs, and DEMs (DEMs: PRKCE, LATS2; DELs: NAV2-AS2; DEMs: hsa-mir-375) are shown in Fig. 7. The survival analyses of PRKCE and LATS2 were performed through K-M plotter database (Fig. 7a, b), whereas the survival results of NAV2-AS2 and hsa-mir-375 were based on TCGA database (Fig. 7c, d). The other DEGs, DELs, and DEMs are presented in Figs. S2-S3. Specifically, except hsa-mir-375, the other DEMs including hsa-mir-200a, hsa-mir-21. and hsa-mir-31 were performed on the OncomiR dataset (Table S3). In addition, we further constructed a ceRNA network based on the DEGs, DELs, and DEMs that were significantly related to the progression of LUAD patients (Fig. 8).
Genetic information and drug-gene interactions
Subsequently, cBioPortal was used to determine the genetic alterations of the 29 DEGs (Fig. 9). As presented in Fig. 9a, these queried genes were altered in 300 (59%) queried patients or samples. PTPRD was mutated most often (24%) (Fig. 9a). These mutations included missense mutations, truncating mutations, fusions, amplifications, and deep deletions (Fig. 9a). Among the different kinds of alterations, multiple alterations accounted for the highest percentages, followed by mutation, deep deletion, and amplification (Fig. 9b).
Furthermore, a total of 15 potential drugs approved by the FDA for treating LUAD patients were screened when drug-gene interactions were conducted (Table 2). In this study, the DEGs, including UBASH3B, ZEB1, SELE, PRKCE, and TBXA2R, were selected as the potential targets of the 15 drugs on the basis of the significant results of the above survival analysis (Table 2). Most potential drugs might interact with TBXA2R (10/15) as agonists (ABACAVIR, ILOPROST, DINOPROSTONE, and DINOPROST), antagonists (FELBINAC and ACETAMINOPHEN), or in some unknown manners (CYCLOSPORINE, MORPHINE, FUROSEMIDE, and VINBLASTINE) (Table 2). In addition, protein kinase C epsilon (PRKCE) is always considered an activator that interacts with INGENOL MEBUTATE and MEPROBAMATE (Table 2).
Construction of the mRNA-associated risk score system
Lasso-penalized Cox regression and multivariate Cox regression analyses were conducted to screen the potential prognosis-related mRNAs based on the 29 DEGs that were significantly associated with OS in the above survival analysis. We also used relative coefficients to weigh the contributions of these DEGs (Figs. 10a, b). Then, according to the minimum of the mean cross-validated error, only 6 DEGs were left for the follow-up analysis, namely, PRKCE, DLC1, LATS2, RALGPS2, ZNF367, and DPY19L1 (Figs. 10a, b). Next, these 6 DEGs were further incorporated into the multivariate Cox regression model using “both” directions, and the mRNA-associated risk score system was constructed as follows: PI = (-0.32648 × expression level of PRKCE) + (-0.12901 × expression level of DLC1) + (0.43411 × expression level of LATS2) + (0.20825 × expression level of DPY19L1). Among these 4 DEGs, PRKCE and DLC1 had negative coefficients in the multivariate Cox regression analysis, whereas LATS2 and DPY19L1 had positive coefficients. Specifically, the median risk score in LUAD patients was 0.97315. After we classified the LUAD samples into two groups (namely, low- and high-risk groups) with the median PI as the cutoff value, the survival curve and log-rank test were performed (Fig. 10c). As presented in Fig. 10c, for LUAD patients in the low-risk group, the 3-year survival rate was 66.5% (95% CI: 57.94% - 76.20%), whereas the 3-year survival rate of the high-risk group was 46.83% (95% CI: 38.27% - 57.30%). Furthermore, the 5-year survival rates of the low- and high-risk groups were 45.2% (95% CI: 34.81% - 58.60%) and 22.05% (95% CI: 14.11% - 34.40%), respectively (Fig. 10c). Additionally, the area under the curve (AUC) values of the time-dependent ROC curves at 3 years and 5 years were both higher than 0.5 (3 years: AUC = 0.64; 5 years: AUC = 0.664; Figs. 10d & S4), which means that the PI constructed by the 4 mRNAs had a good prognostic ability. Finally, the DEG expression levels and the risk scores of LUAD samples with survival time were visualized through a heatmap (Fig. 10e).
Univariate and multivariate analyses
We subsequently used univariate and multivariate Cox regression analyses to select the potential factors related to OS from 316 LUAD patients with clinical information based on TCGA database. The univariate results indicated that the pathological stage, tumour (T), and lymph node (N) classification were significantly associated with the prognosis of LUAD, similar to the risk score results (Fig. 11). For the multivariate results, the tumour and lymph node classifications were no longer related to the progression of LUAD, whereas the risk score system and pathological stage were still significantly correlated with the survival time of LUAD patients (Fig. 11). Compared to the low-risk patients, the high-risk patients had a hazard ratio (HR) of 1.74 (95% CI 1.17 – 2.59, P = 0.006; Fig. 11).
Correlations between DEGs and DELs
After the construction of the risk score system, 4 significant DEGs were identified, namely, PRKCE, LATS2, DPY19L1, and DLC1. DEGs are always positively regulated by DELs by directly interacting with DEMs. To verify this hypothesis in LUAD patients, a correlation analysis between these 4 significant DEGs and their corresponding DELs involved in the ceRNA network (Fig. 8) was performed. As shown in Fig. 12, we identified 2 DEL-DEG pairs, namely, NAV2-AS2 – PRKCE (r = 0.430, P < 0.001; Fig. 12a), and NAV2-AS2 – LATS2 (r = 0.338, P < 0.001; Fig. 12b). In addition, the correlations between the above 2 DEL and DEG pairs were further verified by the online database GEPIA2 (Fig. S5). These results were consistent with Fig. 12. Meanwhile, the IHC results of LATS2 (Fig. 13) and PRKCE (Fig. S6). As shown in Fig. 13, it is easily to see that most malignant cells presented moderate cytoplasmic and membranous positivity of LATS2 in LUAD. However, PRKCE was not detected in both LUAD tissues and normal lung tissues (Fig. S6).
Considering the ceRNA network shown in Fig. 8, hsa-mir-31 was selected as the key gene. Finally, in this study, two ceRNA networks were selected, namely, NAV2-AS2 – mir-31 – PRKCE and NAV2-SA2 – mir-31 – LATS2.