Construction of a recurrence risk model based on differentially coexpressed genes
We applied a Cox proportional hazard regression model to analyse the 182 differentially coexpressed genes(9) and TTR data in the training cohorts in TCGA, and then we submitted these genes that were significantly associated with the TTR of PTC (log rank P < 0.05) to Lasso Cox regression analysis and 10-fold cross-validation for model construction and identified 5 optimal genes when lambda = 0.020 (Supplementary Figure 1A-B). By applying Kaplan-Meier analysis to these five genes, we confirmed that LRP2, KCNJ13 and TMC3 were significantly associated with TTR (P < 0.05), PDZK1IP1 had a marginally significant positive correlation with TTR (P = 0.062), and PDE10A could not predict TTR in the training cohort (P = 0.108, Supplementary Figure 1C-G, Supplementary Table S2). Thus, we developed a new 4-gene signature calculated using multivariate Cox survival analysis, with the formula being as follows: RiskScore = 0.07 × expressionPDZK1IP1 - 0.129 × expressionLRP2 - 0.468 × expressionKCNJ13 - 0.595 × expressionTMC3. The risk score distribution of the sample showed that patients in the high-risk score group had worse recurrence than those in the low-risk score group; PDZK1IP1 expression was positively associated with recurrence risk, whereas the expression levels of LRP2, KCNJ13, and TMC3 were negatively associated with recurrence risk (Figure 1A). These data indicated that PDZK1IP1 was a risk factor and that LRP2, KCNJ13, and TMC3 were protective factors. Furthermore, we analysed the prediction efficiency of the 4-gene signature on one-, two-, and three-year TTR of PTC patients and found that the area under the ROC curve (AUROC) of the 4-gene signature was greater than 0.70 (Figure 1B). With this risk-score formula, patients in the training set were divided into high-risk or low-risk score groups, with the z-score (0) serving as the cut-off. Kaplan-Meier curves showed that patients in the high-risk group (n = 130) had a shorter TTR than those in the low-risk group (n = 118, P < 0.0001; Figure 1C).
TCGA and GEO datasets test and validation
We calculated the risk score of each sample in the other 50% of TCGA PTC samples (TCGA validation dataset) according to the expression level of the sample, which also showed that patients in the high-risk-score group had worse recurrence than those in the low-risk-score group (Figure 2A). Furthermore, we analysed the prediction efficiency of the 4-gene signature on one-, two-, and three-year TTR in the TCGA validation dataset and found that the AUROC of the 4-gene signature on one-, two-, three-year TTR was 0.84, 0.64 and 0.51, respectively (Figure 2B). Patients in the TCGA validation dataset were classified into high-risk and low-risk groups with the same cut-off as that used in the training set. As expected, patients in this cohort with high-risk scores (n = 143) had a shorter TTR than those with low-risk scores (n = 106, P = 0.044; Figure 2C).
We further assessed the prognostic value of these 4-gene signatures in the whole TCGA PTC dataset with the same algorithm as that used in the training set. The spectrum clustering analysis also successfully divided all patients into two groups (Figure 3A). In the ROC analysis, the AUROC for the one-, two-, three-year TTR was 0.73, 0.71 and 0.60, respectively (Figure 3B). With the same risk-score formula, patients in the whole TCGA PTC cohort with high-risk scores (n = 273) had a shorter TTR than those with low-risk scores (n = 224, P < 0.001, Figure 3C). We further investigated the prognostic significance of the 4-gene signature in patients stratified by clinicopathological parameters. Kaplan-Meier analysis showed that patients with high-risk scores had shorter overall survival than those with low-risk scores in all age groups, among females, in stages I-II, and in all T stages (Supplementary Figure 2). These data demonstrated that the 4-gene signature could accurately predict the recurrence risk in most patients.
The robustness of the 4-gene signature was also assessed in the GSE33630 dataset. However, TMC3 was absent from this database, and we could not further analyse the predictive capacity of the 4-gene signature. Nevertheless, we also observed high expression levels of PDZK1IP1 and low expression levels of LRP2 and KCNJ13 in PTC samples compared to normal tissue samples (P < 0.001, Figure 3D). These data indicate that high expression of PDZK1IP1 and low expression of LRP2 and KCNJ13 are associated with the development of PTC.
linicopathological associations of the risk score in the TCGA cohort (Supplementary Table S3,Figure 4). There was no significant difference in age or sex between the low-risk subgroup and the high-risk subgroup. The high-risk-score subgroup was positively correlated with progressive linicopathological features, including extrathyroidal extension (ETE), T3/T4 stage and lymphatic metastasis (LNM). The low-risk-score subgroup was negatively correlated with these linicopathological features. Next, we analysed the associations between the four genes and clinicopathological variables (Supplementary Table S4). PDZK1IP1 was determined to be positively correlated with advanced T stage, lymphatic metastasis, advanced TNM stage and ETE. LRP2 was observed to be negatively correlated with advanced T stage, lymphatic metastasis, advanced TNM stage and ETE. KCNJ13 was observed to be negatively correlated with advanced T stage, lymphatic metastasis, advanced TNM stage and ETE. TMC3 was determined to be negatively correlated with advanced T stage, advanced TNM stage and ETE.
IHC analysis of protein expression of 4-gene signature
Four-gene signature expression at the protein level is presented in Figure 5. PDZK1IP1 was significantly higher in PTC than in normal thyroid tissue and was higher in locally advanced PTCs than in T1/T2 PTCs. For LRP2, although there was no significant difference between PTC and NT, T1/T2 PTC and LA-PTC, we still observed that the expression in PTC was slightly lower than that in NT, and that in LA-PTC was slightly lower than that in T1/T21 PTC. The protein expression of KCNJ13 was notably downregulated in LA-PTC in comparison with T1/T2 PTC and NT, while there was no significant distinction between T1/T2 PTC and NT. The IHC staining of KCNJ13 and TMC3 was almost completely negative in NT. In PTC, the expression of TMC3 in T1/T2 PTC notably exceeds that in LA-PTC. The MODs of the 4 genes in the recurrent group and nonrecurrent group are compared in SupplementaryTable S5. Recurrent patients were compared with an equal number of patients without recurrence, who were randomly selected as the control group. In both T1/T2 PTC and LA-PTC, PDZK1IP1 was significantly upregulated in the recurrent group, and LRP2 and KCNJ13 were significantly downregulated in the recurrent group.
Univariate and multivariate Cox regression analyses of the 4-gene signature
To identify the independence of the 4-gene signature in clinical application, we analysed its prognostic value in the whole TCGA PTC cohort by using univariate and multivariate Cox regression analysis. Clinicopathological parameters, including age, sex, pathological T stage, N stage, M stage and TNM stage, were included in the analysis. In the univariate analysis, age, T stage (T3/T4 vs. T1/T2), tumour stage (stage III/IV vs. stage I/II), and the 4-gene signature were significantly related to the TTR in PTC (Figure 6A). However, the multivariate analysis indicated that only the 4-gene signature was significantly related to the recurrence risk in PTC (HR = 3.606, 95% CI = 1.659-7.839, P = 0.001, Figure 6B). To demonstrate the influence degree of different variables and different values of each variable on the outcome by the length of the line, a nomogram was constructed based on the clinicopathological parameters and 4-gene signature, and the 4-gene signature showed the greatest prediction value on the TTR (Figure 6C). The calibration plots for predicting 2-year, 3-year and 5-year RFS were modified to visualize the nomogram (Figure 6D).
Potential relative pathways
Single-sample GSEA (ssGSEA) was applied to identify the relationship between the risk scores and biological functions in PTC. Seventy pathways with a coefficient greater than 0.5 are presented in Figure 7A, and most of them were positively correlated with the risk score. Among these pathways, the ssGSEA scores of 13 metabolic pathways, including the intestinal immune network and p53 signalling pathway, increased with increasing risk score, while the ssGSEA scores of the other four pathways, such as the Hedgehog signalling pathway, decreased with increasing risk score (Figure 7B). LRP2, KCNJ13, TMC3 and PDZK1IP1 may be involved in these signalling pathways, and the dysfunction of these pathways is closely related to the progression and recurrence of PTC.