Identification of ARlncRNAs in TCGA-LIHC samples
The TCGA-LIHC gene expression matrix containing 374 tumor samples and 50 para-cancer samples obtained from the GDC database was distinguished into two biological types, mRNA and lncRNA. By comparing 363 ARGs downloaded from the GeneCards database with genes in the mRNA expression matrix, we got 346 ARGs expressed in TCGA-LIHC samples. And through co-expression analysis, we obtained 2902 ARlncRNAs with a regulatory relationship with 346 ARGs (correlation coefficient > 0.4 and p < 0.001). Figure 1A displayed the regulatory relationships between 331 ARGs and 2902 lncRNAs. Among them, 2143 ARlncRNAs were up-regulated and 20 were down-regulated (∣log2FC∣>1, FDR < 0.05) in the TCGA-LIHC cohort (Fig. 1B). The heatmap suggested that differentially expressed ARlncRNAs could well distinguish normal and tumor samples in the TCGA-LIHC cohort (Fig. 1C).
Figure.1 Identification of ARlncRNAs in LIHC. (A) ARGs-ARlncRNAs regulatory network. (B) Volcano plot of ARlncRNAs (∣log2FC∣>1, FDR < 0.05). (C) Heatmap of the expression of the 50 most significantly up- and down-regulated ARlncRNAs in LIHC.
Identification of prognosis-related ARlncRNAs
After removing normal samples from the TCGA-LIHC cohort, we screened for tumor samples with complete survival data and their differentially expressed lncRNA matrix files. The batch survival univariate Cox method (p < 0.05) was performed on 2163 ARlncRNAs, and we screened 27 ARlncRNAs that had an impact on prognosis (p < 0.001) (Fig. 2A). Sankey diagram demonstrated the regulatory relationships between ARGs and the 27 prognosis-related ARlncRNAs (Fig. 2E). The results showed negative regulatory relationships between ARGs-ARlncRNAs in only 14 pairs (MAPK3 to AC115619.1; RHOA to AC115619.1; HRAS to TMEM220-AS1; AKT1 to RAB11B-AS1; PAK2 to RAB11B-AS1; EEF2K to RAB11B-AS1; PTPN11 to ZSCAN16-AS1; ANXA5 to AC026803.3; ATF4 to AC026803.3; YWHAZ to AC026803.3; PLG to MAPKAPK5-AS1; PLG to CYTOR; PLG to AC011468.1; PLG to AC024060.2). The LASSO regression analysis further identified 6 ARlncRNAs that were used to construct the prognosis model of LIHC, including AC100812.1, AL365295.1, AC073352.1, ELFN1-AS1, LINC00513, and MIR4435-2HG (Fig. 2B-C). The details of ARlncRNAs were demonstrated in Table 1. The heat map suggested that the ARlncRNA signature was significantly down-regulated in the control group, while up-regulated in the tumor group (Fig. 2D).
Table 1
Details of the ARlncRNA signature.
LncRNA | Coefficient | HR | 95%HR | p Value |
---|
AC100812.1 | 0.749 | 3.949 | 1.896–8.226 | < 0.001 |
AL365295.1 | 1.454 | 6.348 | 2.115–19.051 | < 0.001 |
AC073352.1 | 0.585 | 2.916 | 1.770–4.803 | 2.65E-05 |
ELFN1-AS1 | 0.174 | 1.356 | 1.145–1.606 | < 0.001 |
LINC00513 | 0.573 | 1.987 | 1.427–2.765 | 4.69E-05 |
MIR4435-2HG | 0.564 | 2.978 | 1.701–5.216 | < 0.001 |
Construction and internal validation of the LIHC-related risk model associated with ARlncRNAs
After determining the ARlncRNA signature, we calculated risk scores for all samples in 3 cohorts and distinguished them into high- or low-risk groups with reference to the median value of the model risk score (RS).
RS = AC100812.1* 0.749256106692893 + AL365295.1* 1.45427488505416 + AC073352.1* 0.585423143197508 + ELFN1-AS1*0.174483348394598 + LINC00513* 0.572529045824369 + MIR4435-2HG*0.564436019020331.
The results indicated that in all the cohorts, the mortality rate of LIHC patients was higher in the high-risk group (Fig. 3A-B). And the expression of ARlncRNA was significantly increased in the high-risk group (Fig. 3C). It is hinted that the up-regulation of ARlncRNA expression is directly proportional to the poor prognosis of LIHC. K-M plots showed that overall survival (OS) was always significantly lower for patients in the high-risk group than for those in the low-risk group in train, test, and entire sets, implying that factors associated with the high-risk score also contributed to the poor prognosis of LIHC patients (Fig. 3D). The area under the curves (AUCs) can be used to assess the predictive value of the model. In the train set, the AUCs were 0.788 (1-year), 0.808 (3-years), and 0.810 (5-years); the AUCs were 0.684 (1-year), 0.541 (3-years), and 0.596 (5 years) in the test set; as for the entire set, the AUCs were 0.732 (1-year), 0.675 (3-years), and 0.700 (5-years), respectively (Fig. 3E-G). The results indicated that the risk model constructed based on the ARlncRNA signature had a reliable prediction performance in terms of the prognosis of all sets. Figure 3H confirmed that the risk score has a greater prognostic value for patients with LIHC compared to the other clinical features (AUC = 0.732).
Prognostic analysis of risk model in different clinical subgroups
We explored the prognostic value of the risk model across different subtypes of clinical characteristics (gender, age, T, N, M, and stage). The results showed that the survival prognosis of patients in the low-risk group was always better than that of the high-risk group in all clinical characteristics subgroups (Fig. 4). Notably, the prognostic value of our constructed risk model was greater for patients with early-stage LIHC (p < 0.001).
Assessment of the risk model
The results of univariate Cox regression analysis suggested that the prognosis of TCGA-LIHC patients was associated with the stage (p < 0.001; HR = 1.680; 95% CI = 1.369–2.062) and riskScore (p < 0.001; HR = 1.069; 95%CI = 1.053–1.132) (Fig. 5A). The results of multivariate Cox regression analysis provided evidence that riskScore (p = 0.003; HR = 1.064; 95% CI = 1.021–1.109) was an independent prognostic factor for LIHC (Fig. 5B). The curves of the c-index indicated that the riskScore-based model is more accurate in assessing patient prognosis than other clinical characteristics (Fig. 5E). We constructed a nomogram to predict the survival time of LIHC patients at 1-, 3-, and 5-years by scoring age, gender, T, N, M, stage, grade, and risk of the total TCGA-LIHC sample. The results showed that the survival time of LIHC patients at 1-, 3-, and 5-years could reach 0.899, 0.801, and 0.728 respectively (Fig. 5C). The calibration and DCA curves confirmed the value of the nomogram in predicting patient survival time and suggested that its predictive performance is superior to other clinical traits (Fig. 5D and Fig. 5F). In addition, we also confirmed that our identified anoikis-related lncRNA signature had good prognostic performance in predicting the prognosis of LIHC patients compared with the anoikis-related gene signature identified by Yutong Chen et al. and the pyroptosis-related gene signature identified by Min Deng et al. The c-index of our identified anoikis-associated lncRNA signature in predicting the prognosis of LIHC patients was 0.671 (95%CI: 0.559–0.767). The model of Yutong Chen et al., based on the signature of the anoikis-related gene, has a c-index of 0.687 (95% CI: 0.574–0.780); and the c-index of the model constructed by Min Deng et al. based on the signature of pyroptosis-related genes is 0.658(95%CI: 0.545–0.756) (Deng et al. 2022; Chen et al. 2023).
GSEA enrichment analysis
Using GSEA software, we first screened all the Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathways involved in ARlncRNAs and ranked their relevance. The results suggested that the KEGG signaling pathways associated with the high-risk group mainly include OOCYTE_MEIOSIS, PYRIMIDINE_METABOLISM, RNA_DEGRADATION, SPLICEOSOME, and CELL_CYCLE (Fig. 6A), while the KEGG signaling pathways involved in the low-risk group mainly include PRIMARY_BILE_ACID_BIOSYNTHESIS, COMPLEMENT_AND_COAGULATION_CASCADES, FATTY_ACID_METABOLISM, DRUG_METABOLISM_CYTOCHROME_P450, and RETINOL _METABOLISM (Fig. 6B).
Immune infiltration landscape analysis and clinical treatment exploration of different ARlncRNAs scored groups
For TCGA-LIHC tumor samples we performed immune cell correlation analysis based on different platform sources (Fig. 7A). The results showed that XCELL platform-derived T cell CD4 + Th2; TIMER platform-derived T cell CD4+, Neutrophil, Macrophage, Myeloid dendritic cells; QUANTISEQ platform-derived Monocyte; EPIC platform-derived uncharacterized cells; CIBERSORT-ABS platform-derived T cell regulatory (Tregs), Macrophage M0 and M2; CIBERSORT platform-derived T cell regulatory (Tregs) and Macrophage M0 all showed a positive regulatory relationship with risk scores. As for the immune cells that showed a negative regulatory relationship with a risk score, the main ones were Endothelial cell, Hematopoietic stem cell, stroma score from XCELL platform; Macrophage from EPIC platform. Evidence showed that immune cells were an important influence on risk scores and tumor progression in patients with LIHC. By ssGSEA analysis, we calculated and compared the immune cell content between different risk groups and found that the expression of Macrophages (p < 0.001) and iDCs (p < 0.01) was significantly upregulated in the high-risk group compared to the low-risk group, while the expression of Mast_cells (p < 0.01) and NK_cells (p < 0.01) was significantly downregulated. Box plots of immune function indicated that Cytolytic_activity and Type_II_IFN_Reponse were significantly suppressed (p < 0.001) in the high-risk group of patients (Fig. 7B).
By scoring TME in samples from different risk groups, we found that only matrix scores differed and were higher in samples from low-risk groups (p < 0.05) (Fig. 7C). This intimated that improving the matrix score had potential benefits for patients with LIHC.
We collected 47 immune checkpoint-associated genes and performed immune checkpoint analysis on tumor samples in TCGA-LIHC. The results showed that these genes were significantly activated and upregulated (p < 0.001) in the high-risk group (Fig. 7D). Given the poorer survival outcomes in high-risk populations, immunosuppressive therapy may have a potential survival benefit for the high-risk population. We subsequently performed drug sensitivity analysis on tumor samples from the high-risk group. And the results suggested that the high-risk group was more sensitive to 32 drugs including Bortezomib, Doxorubicin, Etoposide, Embelin, and Epothilone.B (Fig. 7E). This provided a valuable reference to better guide the clinical use of medications in LIHC patients with high-risk scores.
Identification of LIHC subtypes
Applying the ConsensusClusterPlus package, we distinguished 370 tumor samples in the TCGA-LIHC cohort into 3 clusters (C1, C2, and C3), at which the clustering variable k = 3 (Fig. 8A-C). K-M plots suggested a prognostic difference between the 3 subtypes of LIHC (p < 0.001), with cluster C2 having a significantly higher OS than the other clusters (Fig. 8D). The PCA and T-SNE plots indicated that the sample distribution of different LIHC subtypes and different risk groups identified based on the whole TCGA-LIHC cohort is regionally significant (Fig. 8E-F). Sankey diagram revealed that the cluster 3 samples mostly belong to the low-risk group, while clusters 1 and 2 samples mostly belong to the high-risk group (Fig. 8G).
Immunological characterization and drug sensitivity analysis of LIHC subtypes
TME analysis suggested that the 3 clusters of LIHC do not differ in ImmuneScore, StromalScore, and ESTIMATEScore (Fig. 8H). Immune-related heat map demonstrated the degree of infiltration of immune cells from different platform sources in the 3 clusters (Fig. 8I). The results showed significant differences (p < 0.001) in the expression of some immune cells in the 3 clusters, including TIMER platform-derived Macrophage, QUANTISEQ platform-derived Monocyte, XCELL platform-derived Hematopoietic stem cell and T cell CD4 + Th2. The boxplot suggested that although most immune checkpoints had different expressions in the three clusters, the lowest activation frequently occurred in cluster C2 (Fig. 8J). It is noteworthy that samples in cluster C2 were the main source of samples from the high- and low-risk groups, so it was necessary to explore drugs that were sensitive to cluster C2. The results of drug sensitivity analysis showed that cluster C2 patients had high sensitivity to 14 drugs including AICAR, Nilotinib, Metformin, Temsirolimus, and Vorinostat (Fig. 8K).