Associations of SNPs in NER pathway genes with OS of HBV-HCC patients
Baseline characteristics of the 866 Chinese HBV-HCC patients are described in Supplementary Table 2. Briefly, most patients were male (760, 87.36%) rather than female (106, 12.20%), and the median age at diagnosis was 47 years. As shown in Fig. 1, we first assessed the associations of all 8,243 available SNPs in the NER pathway genes with OS of HBV-HCC patients in the discovery dataset and found that 409 SNPs were significantly associated with OS in an additive model, of which 193 remained significant after multiple-testing correction by BFDP (cutoff value: 0.75). We then repeated the association analyses in the replication dataset and found that 26 SNPs in four genes (ELL, INO80D, USP45 and PRPF19) remained significant (Supplementary Table 3).
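For readers unfamiliar with BFDP, the following is a minimal sketch of how a Bayesian false-discovery probability can be computed for a single SNP from its estimated log hazard ratio and standard error, using the approximate Bayes factor formulation. The prior probability of the null (0.90), the prior standard deviation of the log hazard ratio (0.21) and the example inputs are illustrative assumptions, not values reported in this study.

```python
import math

def bfdp(beta_hat, se, prior_null=0.90, prior_sd=0.21):
    """Approximate Bayesian false-discovery probability for one SNP.

    prior_null and prior_sd are assumed values chosen for illustration only.
    """
    v = se ** 2            # sampling variance of the estimated log hazard ratio
    w = prior_sd ** 2      # prior variance of the log hazard ratio (assumed)
    z = beta_hat / se      # Wald statistic
    # Approximate Bayes factor in favour of the null hypothesis
    abf = math.sqrt((v + w) / v) * math.exp(-(z * z / 2.0) * w / (v + w))
    prior_odds_null = prior_null / (1.0 - prior_null)
    return abf * prior_odds_null / (abf * prior_odds_null + 1.0)

# Hypothetical SNP: log(HR) = 0.27 with standard error 0.077
print(round(bfdp(0.27, 0.077), 3))
```

Under this sketch, a SNP would be retained when its BFDP falls below the chosen cutoff (0.75 in the text above).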
Selection of independent functional SNPs
We further screened these 26 identified SNPs in the GTEx database. As shown in Fig. 2 and Supplementary Fig. 1, 15 of the 26 SNPs were significantly correlated with the mRNA expression levels of their corresponding genes (P < 0.050) in normal liver tissues and were selected as candidate SNPs. LD analyses then showed that the four candidate SNPs in USP45 and the eleven candidate SNPs in PRPF19 were each in high LD (r² > 0.80; Supplementary Fig. 2). Finally, USP45 rs4840048 and PRPF19 rs7116665, the SNPs with the highest RegulomeDB prediction scores in their respective genes (the probability score ranges from 0 to 1, with 1 indicating the highest likelihood of being a regulatory variant), were selected for further analyses (Supplementary Table 4).
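As an illustration of the LD threshold used above, the pairwise r² between two biallelic SNPs can be computed from allele and haplotype frequencies. This is a minimal sketch with made-up frequencies; it is not the software or the data used in this study.

```python
def ld_r2(p_ab, p_a, p_b):
    """Pairwise LD r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B, respectively
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))

# Hypothetical frequencies for two SNPs in the same gene
r2 = ld_r2(p_ab=0.28, p_a=0.30, p_b=0.30)
print(r2 > 0.80)  # True would mean the pair meets the high-LD threshold
```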
To determine whether these two SNPs were independently associated with OS of HBV-HCC patients, we performed stepwise multivariate Cox regression analyses with adjustment for age, sex, smoking status, drinking status, AFP level, cirrhosis status, embolus and BCLC stage in the combined dataset. Both USP45 rs4840048 T > C and PRPF19 rs7116665 C > A remained significant in the model as independent predictors of OS of HBV-HCC patients (HR = 0.64, 95% CI = 0.48–0.86, P = 0.003 and HR = 1.31, 95% CI = 1.13–1.53, P < 0.001, respectively) (Table 1). The two independent SNPs are summarized in Manhattan plots (Supplementary Fig. 3), and regional association plots for each of the two SNPs are shown in Supplementary Fig. 4.
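The reported hazard ratios, confidence intervals and P values can be cross-checked against one another. The sketch below assumes a Wald-type interval on the log hazard ratio scale (an assumption; the text does not state how the intervals were derived), recovers the standard error from the interval, and recomputes the two-sided P value.

```python
import math

def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the Wald z statistic and two-sided P value from an HR and its 95% CI."""
    # Standard error of log(HR), assuming a symmetric interval on the log scale
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p

# Values reported above for the two independent SNPs
for name, hr, lo, hi in [("USP45 rs4840048", 0.64, 0.48, 0.86),
                         ("PRPF19 rs7116665", 1.31, 1.13, 1.53)]:
    z, p = wald_p_from_ci(hr, lo, hi)
    print(f"{name}: z = {z:.2f}, P = {p:.4f}")
```

Because the published estimates are rounded, small discrepancies between the recomputed and reported P values are expected.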
Associations between independent SNPs and OS of HBV-HCC patients in the combined dataset
As shown in Table 2, the USP45 rs4840048 C allele and the PRPF19 rs7116665 A allele were significantly associated with OS of HBV-HCC patients, and an allele dose-effect relationship was observed in the combined dataset (USP45 rs4840048 C allele: Ptrend = 0.003; PRPF19 rs7116665 A allele: Ptrend < 0.001). In the dominant genetic model, compared with the USP45 rs4840048 TT genotype, the TC + CC genotype was associated with more favorable OS of HBV-HCC patients (HR = 0.64, 95% CI = 0.47–0.87, P = 0.005), whereas the PRPF19 rs7116665 CA + AA genotype was associated with poorer OS (CA + AA vs CC: HR = 1.27, 95% CI = 1.04–1.54, P = 0.018).
To evaluate the cumulative effect of these two independent SNPs on OS of HBV-HCC patients, we used the unfavorable genotypes (USP45 rs4840048 TT and PRPF19 rs7116665 CA + AA) to construct a risk score and divided all HBV-HCC patients into three groups (risk scores of 0, 1 and 2) according to the number of unfavorable genotypes (NUGs). As shown in Table 2, an increasing NUG count was associated with worse OS in the combined dataset after adjusting for covariables (Ptrend < 0.001). Furthermore, we dichotomized all patients into a low-risk group (0–1 NUGs) and a high-risk group (2 NUGs). Compared with the low-risk group, the high-risk group had significantly worse OS in the combined dataset (HR = 1.39, 95% CI = 1.15–1.69, P < 0.001; Table 2). In addition, we plotted KM survival curves to show the relationship between NUGs and OS of HBV-HCC patients (Fig. 3).
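The risk-score construction described above can be sketched as follows: each patient's number of unfavorable genotypes is counted across the two SNPs, and patients with two unfavorable genotypes form the high-risk group. The dictionary keys, function names and example patient are hypothetical and serve only to illustrate the rule.

```python
# Unfavorable genotypes as defined in the text, stored with alleles sorted
# alphabetically so that "CA" and "AC" are treated as the same genotype.
UNFAVORABLE = {
    "USP45_rs4840048": {"TT"},
    "PRPF19_rs7116665": {"AC", "AA"},
}

def normalize(genotype):
    """Upper-case a genotype string and sort its alleles, e.g. 'ca' -> 'AC'."""
    return "".join(sorted(genotype.upper()))

def count_nug(genotypes):
    """Number of unfavorable genotypes (0, 1 or 2) carried by one patient."""
    return sum(normalize(genotypes[snp]) in bad for snp, bad in UNFAVORABLE.items())

def risk_group(nug):
    """Dichotomize: 2 unfavorable genotypes = high risk, 0 or 1 = low risk."""
    return "high" if nug == 2 else "low"

# Hypothetical patient carrying both unfavorable genotypes
patient = {"USP45_rs4840048": "TT", "PRPF19_rs7116665": "CA"}
nug = count_nug(patient)
print(nug, risk_group(nug))
```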
Stratified analyses and interaction analyses for the effect of NUGs on OS of HBV-HCC patients
Subsequently, we performed stratified analyses in the combined dataset to assess whether the association of NUGs with OS of HBV-HCC patients was modified by the covariables, including age, sex, smoking status, drinking status, AFP level, cirrhosis status, embolus and BCLC stage. Compared with patients with 0–1 NUGs, those with 2 NUGs had significantly worse OS, particularly in the subgroups of patients aged ≤ 47 years, males, non-smokers, non-drinkers, and patients with AFP > 400 ng/ml, without cirrhosis, with embolus and with BCLC stage B/C. We also analyzed the interactions between the covariables and NUGs but found no significant interaction (Supplementary Table 5).
ROC curves and time-dependent AUC of the two independent SNPs
To further assess the predictive value of these two independent SNPs for OS of HBV-HCC patients, we performed time-dependent AUC and ROC curve analyses in the presence of covariables. In the combined dataset, compared with the model containing the covariables alone, adding the unfavorable genotypes significantly improved the 5-year prediction performance, with the AUC increasing from 72.07% to 74.17% (P = 0.006; Fig. 3C and D). No significant improvement was observed at 1 year (AUC from 71.07% to 71.75%, P = 0.419) or at 3 years (AUC from 72.72% to 73.40%, P = 0.299; Supplementary Fig. 5).
Expression quantitative trait loci analyses
To further explore the potential functions of these two independent SNPs, we investigated the relationship between SNP genotypes and the mRNA expression of the corresponding genes by performing eQTL analyses. As shown in Fig. 2A, the rs4840048 T allele was significantly correlated with higher USP45 mRNA expression levels in normal liver tissues in the GTEx database (P = 0.010) but not in the 1000G dataset (P = 0.607; Supplementary Fig. 6A). The rs7116665 A allele was significantly correlated with lower PRPF19 mRNA expression levels in normal liver tissues (P = 0.003; Fig. 2B) but not in the 1000G dataset (P = 0.053; Supplementary Fig. 6B). Moreover, as shown in Fig. 4A and B and Supplementary Table 4, USP45 rs4840048 T > C is located in several histone signaling regions, such as H3K36me3, H3K4me1 and H3K9ac, and affects motifs, whereas PRPF19 rs7116665 C > A lies close to H3K4me3, H3K4me1 and H3K9me3 signaling regions and may influence enhancer histone marks, DNase sites and motifs. These findings suggest that the two independent SNPs may affect gene expression through transcriptional regulation.
Differential mRNA expression analyses
Subsequently, we investigated the mRNA expression levels of USP45 and PRPF19 in non-paired HCC tumor and normal tissue samples from the TCGA database (http://ualcan.path.uab.edu/). We also evaluated the association between these mRNA expression levels and survival of HCC patients in the same database. As shown in Fig. 4C and D, USP45 mRNA expression levels were higher in HCC tumor tissues than in normal tissues (P < 0.001), and high expression was significantly correlated with poorer survival in HCC patients (P = 0.026). Similarly, PRPF19 mRNA expression levels were significantly higher in tumor tissues than in normal tissues (P < 0.001), and high expression was also correlated with poorer survival in HCC patients (P < 0.001) (Fig. 4E and F).
Gene mutation analyses
Somatic gene mutations in tumor tissues may also play a role in tumor progression [36, 37]. Therefore, we analyzed mutations of USP45 and PRPF19 in HCC tissues using the cBioPortal for Cancer Genomics database. USP45 had an extremely low somatic mutation rate across HCC datasets (0.82% in INSERM, 0.82% in MERiC/Basel and 0.54% in TCGA). Similarly, PRPF19 displayed a low somatic mutation rate across HCC datasets (0.82% in MERiC/Basel, 0.43% in AMC, 0.41% in INSERM and 0.27% in TCGA) (Supplementary Fig. 7). These low mutation frequencies suggest that somatic mutations are unlikely to have a substantial impact, if any, on the mRNA expression levels of USP45 and PRPF19 in HCC.