Data Source
Gene expression data with clinical information from LIHC project (371cases, Workflow Type: HTSeq-FPKM) were collected from the Cancer Genome Atlas (TCGA). Then, level 3 HTSeq-FPKM (Fragments Per Kilobase per Million) data were transformed into TPM (transcript per million reads) for the further analyses. Unavailable or unknown clinical features were regarded as missing values. This study meets publication guidelines.
Clinical And Prognostic Statistical Analysis
The RNAseq data from TCGA and GTEx in TPM format was downloaded from UCSC XENA, which has been uniformly processed by the Toil process [11]. We compared the expression of PDRG1 in normal samples from GTEx combined with TCGA and HCC samples from TCGA by Wilcoxon rank sum test. Moreover, we also compared the expression of PDRG1 in HCC tumor issues compared with 50 pairs non-cancerous adjacent tissues using Wilcoxon signed rank test. Then, we analyzed the the relations between clinical pathologic characteristics and the expression level of PDRG1 using Wilcoxon rank sum test, Kruskal-Wallis rank sum test and logistic regression. Clinicopathological characteristics associated with Overall Survival from TCGA HCC patients using Cox regression and Kaplan-Meier method. The variables with p < 0.1 in Cox regression will be included in multivariate Cox regression to compare the influence of PDRG1 expression on Overall Survival of HCC patients. The cut-off value of PDRG1 expression was determined by its median value. P-values less than 0.05 were considered significant in all tests. All statistical analyses were conducted using R (version 4.0.2).
Identification Of Degs
The differentially expressed genes were analyzed by HTSeq-Counts data with DESeq2 package, considering |logFC|༞2 and p. adj < 0.05 as the threshold of differential genes (DEGs). The results of difference expression analysis are shown by volcano plot (Fig. 3A) and heatmap (Fig. 3B). A total of 436 differentially expressed genes were identified, of which 321 were up-regulated and 115 were down regulated. The top20 of DEGs including PCSK1, SPHK1, WNT7B, RP11-863K10.7, TMEM132A, ITIH5, PEBP4, PLEKHB1, TTYH1, UCHL1, SPIB, COL9A2, ZG16B, CTH, C19orf33, LAMP5, KIAA1549L, CXCL5, SIX2, PROM1.
Functional Enrichment Analysis (Go)
To predict the function of enrichment information of interactive genes of PDRG1, GO enrichment analyses were performed within Metascape. PDRG1 associated genes engaged in several BP, CCs, and MFs. We found that pattern specification process, skeletal system development, multicellular organismal homeostasis, reproductive structure development, epithelial cell differentiation, sensory organ development, metanephros development, gonadal mesoderm development, mesenchyme development, and negative regulation of cell differentiation had significant regulation by these genes. Moreover, digestive tract development, digestion, ossification, neuron fate specification, sphingolipid biosynthetic process, activation of adenylate cyclase activity, memory, stem cell proliferation, antimicrobial humoral response, regulation of blood pressure also involved in the regulation of PDRG1 interactive genes (Fig. 4A).
To further capture the relationships between the terms, a subset of enriched terms had been selected and rendered as a network plot, where terms with a similarity > 0.3 were connected by edges. We selected the terms with the best p-values from each of the 20 clusters, with the constraint that there are no more than 15 terms per cluster and no more than 250 terms in total. The network is visualized using Cytoscape, where each node represents an enriched term and is colored by its p-value (Fig. 4B).
Gene Set Enrichment Analysis (Gsea)
As many pathways contribute to tumor formation, the poor survival associated with high PDRG1 expression may be related to a number of signaling pathways activated in HCC. GSEA of differences between low and high PDRG1 expression data sets was performed to identify the key signaling pathways associated with PDRG1. We selected C2.cp.v7.0.symbols.gmt [Curated] in MSigDB Collections as the reference gene set, each gene set contains at least 10 genes and no more than 500 genes, calculation times was set as 10000. P Value less than 0.05 and False discovery rate(FDR)less than 0.25 were identified as significant differences. Statistical analysis and graphical plotting were conducted using R package ClusterProfiler (3.8.0). Among them, there are 342 datasets satisfying. We selected visual datasets were KEGG_CELL_CYCLE (NES = 2.034, p. adj = 0.004, FDR = 0.003), KEGG_DNA_REPLICATION (NES = 1.914, p. adj = 0.005, FDR = 0.004), REACTOME_DNA_REPAIR(NES = 1.655, p. adj = 0.004, FDR = 0.003) (shown in Fig. 5). The results suggest that the dataset is significantly enriched in the left red (high PDRG1 expression group).
Ppi Network Analysis
Following elimination of the DEGs with different expression tendency and isolated nodes in the network, a total of 177 nodes and 367 edges were included in the PPI network (Fig. 6). The CytoNCA plugin was used to analyze the centrality of nodes. The top 10 proteins, which were also defined as crucial proteins based on four different centrality parameters, were presented in Table 3, and they included SST, CALCA, GLP1R, AFP, FOXG1, GAGE2A, DLX2, CDX2, SIX3, FOXG1, NTS.
Table 3
The top 10 proteins ranked based on the node centrality of the PPI network
rank | Degree centrality | Betweenness centrality | Closeness centrality | Eigenvector centrality |
Gene symbol | Expression in HCC | Gene symbol | Expression in HCC | Gene symbol | Expression in HCC | Gene symbol | Expression in HCC |
1 | SST | up-regulated | FOXG1 | up-regulated | SST | up-regulated | SST | up-regulated |
2 | CHGA | up-regulated | UNC5D | down-regulated | ASCL1 | down-regulated | CALCA | down-regulated |
3 | CALCA | down-regulated | EPHA6 | up-regulated | LHX8 | up-regulated | CHGA | up-regulated |
4 | GAD2 | up-regulated | EFNA5 | up-regulated | DLX2 | up-regulated | TAC1 | up-regulated |
5 | CT45A1 | up-regulated | ASCL1 | down-regulated | CHGA | up-regulated | DRD1 | down-regulated |
6 | TAC1 | up-regulated | PAGE1 | up-regulated | GAD2 | up-regulated | NTS | up-regulated |
7 | DRD1 | down-regulated | GAGE2A | up-regulated | CDX2 | up-regulated | GAST | up-regulated |
8 | GAST | up-regulated | GAD2 | up-regulated | SIX3 | up-regulated | CRHR1 | up-regulated |
9 | GLP1R | up-regulated | SST | up-regulated | FOXG1 | up-regulated | ADCY8 | down-regulated |
10 | AFP | up-regulated | DLX2 | up-regulated | DRD1 | down-regulated | GLP1R | up-regulated |
Analysis of immune infiltration and its correlation with PDRG1 expression
Spearman correlation was employed to show the association between the expression level (TPM) of PDRG1 and immune cell infiltration level quantified by ssGSEA in the HCC tumor microenvironment. As shown in Fig. 7A, we found that PDRG1 expression was positively correlated with NK, CD56 bright cells, Tfh, Th2 cells, and negatively correlated with neutrophils, Th17 cells, Tgd, DCs, Tcm, eosinophils, cytotoxic cells. Then we studied the relation between the expression level of PDRG1 and the infiltration levels of Th2 cells by Spearman correlation analysis (Fig. 7B). The results suggested that the expression level of PDRG1 was significantly correlated with the infiltration levels of Th2 cells. The correlation coefficient R was 0.349 (P < 0.001). Moreover, we analyzed the difference of Th2 cells infiltration level between PDRG1 high and low expression groups by Wilcoxon rank sum test statistical method. The results were shown in Fig. 7C. The Th2 cells infiltration level in high expression group of PDRG1 were much more than the ones in low expression group, the result was statistically significant (p <0.001).
Development of prognostic model based on PDRG1 and clinicopathological factors
We analyzed the diagnostic efficacy of PDRG1 in HCC patients by ROC analysis. The area under the curve (AUC) of PDRG1 (Fig. 8) was 0.962 (CI: 0.943–0.982), representing a very efficient ability to identified HCC patients from healthy people. Then, we used Kaplan-Meier Plotter to evaluate the prognostic value of PDRG1 in HCC with OS by Survminer package. High expression of PDRG1 was associated with worse OS (HR = 1.98 (1.39–2.83), p < 0.001) (Fig. 9). Moreover, we used univariate Cox regression analysis on OS in HCC patients with clinicopathological characteristics and PDRG1 expression. The variables with p value < 0.1 in univariate Cox regression will be included in further analyzation by multivariate Cox regression. The results were showed in Table 4. The variables that meet this threshold were T stage (p < 0.001), M stage (paired 0.018), Pathologic stage (p < 0.001), Tumor status (p < 0.001), TP53 status (p = 0.069) and PDRG1 (p < 0.001). Furthermore, multivariate Cox regression showed that Tumor status (P = 0.002) and PDRG1 (P = 0.024) were independent prognostic factors in OS of HCC patients. Furthermore, we compared the correlation of PDRG1 expression and OS of different clinicopathological subgroups in HCC patients (Fig. 10–11). It showed that high expression of PDRG1 was correlated with worse OS in T1&T2 stage (HR=1.872, P༝0.008), T3 stage (HR༝2.541, P༝0.004), N0 stage (HR༝1.778, P༝0.009), M0 stage (HR༝2.093, P༝0.001). These results suggested that PDRG1 expression level can impact the prognosis in HCC patients with different pathological stages.
To provide clinicians with a quantitative approach to predict the prognosis of HCC patients, a nomogram (Fig. 12) was constructed to integrate PDRG1 and another independent prognostic variable tumor status. In this nomogram based on multivariate Cox analysis, a point scale was used to assign points to these variables. A straight line was drawn upward to determine the points to these variables, and the sum of the points assigned for each variable was rescaled to a range from 0 to 100. The range of available values of PDRG1 was from 0 to 78, it indicated that PDRG1 played an important role in the contribution to the OS of HCC patients. The points of the variables were accumulated and recorded as the total points. A worse prognosis was represented by a higher total number of points on the nomogram. The probability of HCC patient survival at 1, 3 and 5 years was determined by drawing a vertical line from the total point axis straight downward to the outcome axis. OS rate at 1, 3 and 5 years of HCC patients with tumor-free status in high expression of PDRG1 group were 0.86, 0.69, 0.54. The bias-corrected line in the calibration plot was found to be close to the ideal curve (the 45-degree line), which indicated good agreement between the prediction and observation (Fig. 13).
Figure 11. Kaplan-Meier Plotter showed that high expression of PDRG1 was associated with worse Overall Survival of HCC patients in different subgroups of clinicopathological factors. A. T1&T2 stage (HR=1.872, P༝0.008); B. T3 stage (HR༝2.541, P༝0.004); C. N0 stage (HR༝1.778, P༝0.009); D. M0 stage (HR༝2.093, P༝0.001).
Figure 13: Calibration plot of the nomogram for predicting the probability of OS at 1, 3, and 5 years. red line: 1-year OS; blue line: 3-year OS; green line: 5-year OS; grey line: ideal line.
Table 4
univariate / multivariate Cox regression analysis on Overall Survival in HCC patients
Characteristics | Total(n) | Univariate analysis | Multivariate analysis |
HR (95% CI) | P value | HR (95% CI) | P value |
T stage (T3&T4 vs. T1&T2) | 367 | 2.540(1.785–3.613) | < 0.001 | 1.668(0.222–12.539) | 0.619 |
N stage (N1 vs. N0) | 256 | 2.004(0.491–8.181) | 0.333 | | |
M stage (M1 vs. M0) | 270 | 4.032(1.267–12.831) | 0.018 | 1.310(0.308–5.577) | 0.715 |
Pathologic stage (Stage III &Stage IV vs. Stage I &Stage II) | 346 | 2.449(1.689–3.549) | < 0.001 | 1.450(0.194–10.808) | 0.717 |
Residual tumor (R1&R2 vs. R0) | 341 | 1.571(0.795–3.104) | 0.194 | | |
Histologic grade (G3&G4 vs. G1&G2) | 365 | 1.120(0.781–1.606) | 0.539 | | |
Gender (Male vs. Female) | 370 | 0.816(0.573–1.163) | 0.260 | | |
Age (> 60 vs. <=60) | 370 | 1.248(0.880–1.768) | 0.214 | | |
Race (White vs. Asian &Black or African American) | 358 | 1.245(0.867–1.789) | 0.235 | | |
Weight (> 70 vs. <=70) | 343 | 0.916(0.640–1.312) | 0.634 | | |
Height ( > = 170 vs. < 170) | 338 | 1.208(0.833–1.753) | 0.319 | | |
Adjacent hepatic tissue inflammation (Mild &Severe vs. None) | 233 | 1.228(0.755–1.997) | 0.409 | | |
AFP (ng/ml) (> 400 vs. <=400) | 277 | 1.056(0.646–1.727) | 0.827 | | |
Albumin(g/dl) (< 3.5 vs. >=3.5) | 296 | 1.085(0.665–1.771) | 0.743 | | |
Prothrombin time ( < = 4 vs. >4) | 293 | 0.752(0.496–1.140) | 0.179 | | |
Child-Pugh grade (B&C vs. A) | 238 | 1.616(0.797–3.275) | 0.183 | | |
Fibrosis ishak score (3/4&5/6 vs. 1/2) | 137 | 0.806(0.375–1.737) | 0.583 | | |
Vascular invasion (Yes vs. No) | 314 | 1.348(0.890–2.042) | 0.159 | | |
Tumor status | 351 | 2.361(1.620–3.441) | < 0.001 | 2.189(1.328–3.608) | 0.002 |
TP53 status (Mut vs. WT) | 357 | 1.434(0.972–2.115) | 0.069 | 1.460(0.870–2.451) | 0.152 |
PDRG1 (High vs. Low) | 370 | 1.984(1.391–2.828) | < 0.001 | 1.747(1.077–2.835) | 0.024 |
Construction And Evaluation Of The Nomogram
we compared PDRG1 expression in patients with HCC and healthy people, and evaluated the discrimination ability of PDRG1 in HCC by receiver operating characteristic (ROC) analysis using pROC package[23]. The computed area under the curve (AUC) value ranging from 0.5 to 1.0 indicates the discrimination ability from 50 to 100%. All statistical tests were two tailed with a statistical significance level set at 0.05 in this study.
To individualize the predicted survival probability for 1 year, 3 years and 5 years, a nomogram was constructed based on the results of the multivariate analysis. The rms R package (version 6.0–1) was used to generate a nomogram that included significant clinical characteristics and calibration plots. Calibration and discrimination are the most commonly used methods for evaluating the performance of models. In this study, the calibration curves were graphically assessed by mapping the nomogram predicted probabilities against the observed rates, and the 45° line represented the best predictive values.
Statistical analysis
All the data were presented as mean ± SD. OS was calculated by the Kaplan Meier method and analyzed with the log-rank test. The univariate and multivariate analyses were calculated based on the Cox regression model. The statistical analysis was performed as appropriate by chi-square test and t-test. The P value of the test was 2-tailed with a level of significance (α) = 0.05. The P-value < 0.05 were considered statistically significant.