Construction of risk score systems
Based on the univariate Cox regression analysis, 97 DE-mRNAs and 10 DE-lncRNAs significantly correlated with overall survival were selected. Furthermore, 10 DE-mRNAs (including aldehyde dehydrogenase 1 family member A1, ALDH1A1; carbonic anhydrase 9, CA9; GDP-mannose 4,6-dehydratase, GMDS; leucine carboxyl methyltransferase 2, LCMT2; leucine rich repeat containing 75A, LRRC75A; methyltransferase like 1, METTL1; RAB29, member RAS oncogene family, RAB29; transcriptional adaptor 2B, TADA2B; tudor domain containing 7, TDRD7; and tigger transposable element derived 2, TIGD2) and eight DE-lncRNAs (including COL18A1 antisense RNA 1, COL18A1-AS1; long intergenic non-protein coding RNA 92, LINC00092; long intergenic non-protein coding RNA 298, LINC00298; long intergenic non-protein coding RNA 636, LINC00636; long intergenic non-protein coding RNA 885, LINC00885; long intergenic non-protein coding RNA 957, LINC00957; long intergenic non-protein coding RNA 1124, LINC01124; and ZEB1 antisense RNA 1, ZEB1-AS1) correlated with independent prognosis were screened using the multivariate Cox regression analysis (Table 1).
Table 1
The mRNAs and long non-coding RNAs (lncRNAs) correlated with independent prognosis.
Type | RNA | Coef | P-value | Hazard Ratio | 95% CI | Cutoff |
lncRNA | COL18A1-AS1 | -1.16269 | 2.02E-02 | 0.313 | 0.052–0.866 | 0.01 |
| LINC00092 | -0.02708 | 3.85E-02 | 0.973 | 0.916–0.995 | 0.3 |
| LINC00298 | 0.08726 | 4.77E-02 | 1.091 | 1.008–1.388 | 0.04 |
| LINC00636 | -0.27707 | 1.15E-02 | 0.758 | 0.537–0.970 | 0.12 |
| LINC00885 | -0.27569 | 4.04E-02 | 0.759 | 0.397–0.950 | 0.06 |
| LINC00957 | -0.01955 | 3.63E-02 | 0.981 | 0.940–1.023 | 0.51 |
| LINC01124 | -0.05308 | 1.32E-02 | 0.948 | 0.885–1.016 | 0.02 |
| ZEB1-AS1 | 0.03671 | 1.05E-02 | 1.037 | 0.992–1.084 | 0.21 |
mRNA | ALDH1A1 | -0.043957 | 5.90E-03 | 0.957 | 0.928–0.987 | 1.7 |
| CA9 | 0.030203 | 3.85E-03 | 1.031 | 1.010–1.052 | 1.07 |
| GMDS | 0.076649 | 1.39E-02 | 1.080 | 1.016–1.148 | 0.96 |
| LCMT2 | -0.185329 | 3.72E-3 | 0.831 | 0.733–0.942 | 1.83 |
| LRRC75A | -0.068367 | 3.13E-02 | 0.934 | 0.878–0.994 | 0.92 |
| METTL1 | 0.072283 | 4.35E-02 | 1.075 | 1.002–1.153 | 0.52 |
| RAB29 | -0.095801 | 1.26E-02 | 0.909 | 0.843–0.980 | 0.55 |
| TADA2B | -0.105184 | 2.66E-02 | 0.900 | 0.820–0.988 | 0.67 |
| TDRD7 | 0.075387 | 3.58E-02 | 1.078 | 1.005–1.157 | 1.93 |
| TIGD2 | 0.114616 | 3.53E-02 | 1.121 | 1.008–1.248 | 1.9 |
Note: CI, confidence interval. |
The cutoff value of the expression level of each independent prognosis-associated DE-lncRNA and DE-mRNA was obtained, based on which the status of each sample in the expression of the RNA was defined. Then, the expression status-based risk score systems were constructed combined with the prognostic coefficients and expression status of the independent prognosis-associated DE-lncRNAs and DE-mRNAs, and the corresponding formula was:
mRNA status RS= (-0.043957)*StatusALDH1A1 + (0.030203)*StatusCA9 + (0.076649)*StatusGMDS + (-0.185329) *StatusLCMT2+ (-0.068367)*StatusLRRC75A+ (0.072283)*StatusETTL1+ (-0.095801)*StatusRAB29+ (-0.105184) *StatusTADA2B + (0.075387)*StatusTDRD7+ (0.114616)*StatusTIGD2
lncRNA status RS =(-1.16269)*StatusCOL18A1−AS1+ (-0.02708)*StatusLINC00092 + (0.08726)*StatusLINC00298+ (-0.27707)*StatusLINC00636+ (-0.27569)*StatusLINC00885+ (-0.01955)*StatusLINC00957+ (-0.05308)*StatusLINC01124+ (0.03671)*StatusZEB1−AS1
For the independent prognosis-associated DE-lncRNAs and DE-mRNAs, the expression level-based risk score systems were constructed and the formula was as follows:
mRNA exprs RS= (-0.043957)*ExprsALDH1A1 + (0.030203)* ExprsCA9 + (0.076649)* ExprsGMDS + (-0.185329) *ExprsLCMT2+ (-0.068367)*ExprsLRRC75A+ (0.072283)*ExprsETTL1+ (-0.095801)*ExprsRAB29+ (-0.105184) *ExprsTADA2B + (0.075387)* ExprsTDRD7+ (0.114616)* ExprsTIGD2
lncRNA exprs RS =(-1.16269)*ExprsCOL18A1−AS1+ (-0.02708)*ExprsLINC00092 + (0.08726)* ExprsLINC00298+ (-0.27707)*ExprsLINC00636+ (-0.27569)*ExprsLINC00885+ (-0.01955)*ExprsLINC00957+ (-0.05308)*ExprsLINC01124+ (0.03671)* ExprsZEB1−AS1
According to the median of the RSs, the OS samples in the training set and the validation set separately were divided into two groups (high risk group and low risk group). Based on KM method, correlation analysis for the risk grouping and actual survival prognosis information was performed. For both the training set and the validation set, the risk groups divided by the lncRNA expression status-based risk score system (training set: log-rank p-value = 2.196e-03, C-index = 0.703, Brier score = 0.0652; validation set: log-rank p-value = 3.537e-03, C-index = 0.779, Brier score = 0.0897) and mRNA expression status-based risk score system (training set: log-rank p-value = 6.264e-11, C-index = 0.878, Brier score = 0.0402; validation set: log-rank p-value = 8.905e-03, C-index = 0.822, Brier score = 0.0647) had significant correlations with the actual survival prognosis information (Fig. 2). Meanwhile, the risk groups divided by the lncRNA expression level-based risk score system (training set: log-rank p-value = 5.071e-04, C-index = 0.730, Brier score = 0.0603; validation set: log-rank p-value = 6.793e-03, C-index = 0.887, Brier score = 0.101) and mRNA expression level-based risk score system (training set: log-rank p-value = 1.927e-08, C-index = 0.869, Brier score = 0.0397; validation set: log-rank p-value = 1.132e-01, C-index = 0.733, Brier score = 0.147) were significantly correlated with the actual prognosis information (Fig. 3). Through comparing the log-rank p-values, C-indexes, and Brier scores, the mRNA expression status-based risk score system was selected as the optimal risk score system.