Differential expression analysis
The gene expression heatmaps of DElncRNAs, DEmRNAs and DEmiRNAs were shown in Figure 1B, Figure 1C and Figure 1D. Based on differential analysis, we obtained 1562 DElncRNAs (1415 up-regulated and 147 down-regulated), 3013 DEmRNAs (2533 up-regulated and 480 down-regulated) and 183 DEmiRNAs (175 up-regulated and 8 down-regulated).
The screening of 22 candidate SNPs on HCC related ceRNAs
As shown in the flow chart (Figure 1A) and the Method section of this paper, eight HCC related lncRNA-mediated ceRNAs were selected after screening. The lncRNASNP2 database was used to annotate the above eight lncRNAs, and 2699 SNPs affecting the binding of lncRNA-miRNA were extracted from the database. These 2699 SNPs affected the binding of 2288 miRNAs, and 153 miRNAs were retained through DEmiRNAs filtering, of which 505 SNPs were involved. Finally, 22 SNPs with MAF > 0.05 in Asian populations were screened from the 1000 Genome. Genomic visualization of the eight lncRNAs and 22 candidate SNPs were shown in Figure 1E.
Identification of 15 SNPs for genotyping
As described in the Method section, the feasibility of genotyping for the 22 candidate SNPs was evaluated. First, in the 48-Plex SNPscan™ typing system, the 22 candidate SNPs were classified into four levels: first, second, third and fail (Supplementary Figure 1A). Subsequent genotyping was then abandoned for fail grade SNPs (rs7336379, rs9986879 and rs10259737). Haploview 4.2 software (https://sourceforge.net/projects/haploview/) was used to determine the linkage disequilibrium (LD) by the standardized D’ and r2 values for 19 other SNPs. The results showed strong LD between some SNPs (Supplementary Figure 1B). Specifically, rs1050171 on EGFR-AS1 can represent rs7795743, rs16846478 on GAS5 can represent rs58994962 and rs2235095, and rs3931282 on PVT1 can represent rs3931281 for subsequent genotyping. Finally, the remaining 15 SNPs were genotyped, and their network diagram with corresponding lncRNAs and miRNAs was shown in Figure 1F.
Characteristics of the study population
The demographic characteristics such as age, gender, smoking and alcohol consumption of 1601 samples enrolled in this study, as well as the clinical information and clinical test indexes of 800 HCC patients are shown in Table 1. No significant differences were observed for age, gender and cigarette smoking between cases and controls (all P > 0.05). However, alcohol drinking was significantly different between the two groups (P = 0.010). Whether the difference was significant or not, these demographic characteristics were included in the statistical model as potential confounding factors for genetic association analysis.
The Association of 14 candidate SNPs in lncRNA-mediated ceRNAs with HCC risk
The genotype distribution of the 15 SNPs is shown in Figure 2A, among which only rs2278176 does not conform to HWE and will be eliminated in subsequent studies. The results of χ2 test showed that the distribution of GG, GC, GG + GC genotype and CC genotype of rs7958904 was significantly different between the case group and the control group (GG vs CC, P = 0.028; GC vs CC, P = 0.008; GG + GC vs CC, P = 0.014). Then binary logistic regression was used to explore the association between rs7958904 and HCC risk. Since age, gender, cigarette smoking and alcohol drinking were identified as risk factors for HCC 31, they were included as covariates to adjust for confounders (Figure 2B). Compared with CC genotype, a protective effect of rs7958904 GG, GC and GG + GC genotypes was found for HCC risk (P = 0.042, OR = 0.65, 95% CI = 0.43-0.98; P = 0.013, OR = 0.59, 95% CI = 0.38-0.89; and P = 0.023, OR = 0.63, 95% CI = 0.42-0.94, respectively). Furthermore, the association still stood after an FDR correction (corrected P = 0.042; P = 0.035 and P = 0.035, respectively). For other SNPs, there were no significant correlations between different genotypes and HCC risk.
Association analysis between 14 SNPs and clinical characteristics of HCC patients
The association between 14 SNPs and the clinical characteristics of HCC patients, including liver cirrhosis, Child-Pugh grade, tumor size, tumor number, vascular invasion, lymphatic metastasis, distant metastasis and TNM stage, was analyzed by binary logistic regression. And age, gender, cigarette smoking, alcohol drinking, family history, HBsAg and anti-HCV were used as covariates to adjust for confounding factors. FDR correction was performed for all P values. The HCC patients were staged according to the AJCC-TNM classification 32. Early-stage patients included patients with stage I and II. Advanced stage patients included patients with stage III and IV. As shown in Figure 3A, the distribution of rs3931282 genotypes showed a significant difference between early stage and advanced stage (P = 0.003). In detail, the frequency of AA+GA genotypes was higher in early stage than that in advanced stage and showed a decreased risk for HCC progression (corrected P = 0.042, OR = 0.58, 95% CI = 0.40-0.83). In addition, the results of the correlation analysis of the remaining clinical characteristics are shown in Supplementary Figure 2.
Association analysis between 14 SNPs and clinical test indexes of HCC patients
In this study, the levels of AFP, AST, ALT and AST/ALT ratio in HCC patients did not conform to the normal distribution. Mann-Whitney U test was used to explore the correlation between the distribution of 14 SNP genotypes and these indicators. FDR correction was performed for all P values at the same time. As shown in Figure 3B, the AST and ALT levels in patients with rs1134492 CC + TC genotype were significantly higher than those in patients with TT genotype (corrected P = 0.042 and 0.014, respectively). Meanwhile, AST/ALT ratio was significantly higher in patients with rs10589312 TCTTGC/TCTTGC + TCTTGC/T genotype than in patients with TT genotype (corrected P = 0.021), and AST/ALT ratio were significantly lower in patients with rs84557 CC + CT genotype than in patients with TT genotype (corrected P = 0.006). In addition, the results of the correlation analysis of the clinical test indexes are shown in Supplementary Figure 3.
Acquisition of mRNAs targeted by miRNAs
In this association study, five SNPs (rs7958904, rs3931282, rs1134492, rs10589312 and rs84557) were found to be associated with the occurrence and prognosis of HCC. These five loci located on lncRNA coding genes would affect the binding of six microRNAs (miR-615-3p, miR-205-5p, miR-34b-5p, miR-183-3p, miR-31-5p and miR-33b-5p) to lncRNAs (Figure 1E). The target genes of these six miRNAs were obtained by selecting the miRWalk and miRTarBase and screening the mRNAs shared by the two databases. For miR-615-3p, highly expressed DEmRNAs in tumors were also compared with candidate target mRNAs to obtain cross-genes between them. Finally, 57 target genes of miR-615-3p, 163 target genes of miR-205-5p, 92 target genes of miR-34b-5p, 108 target genes of miR-183-3p, 171 target genes of miR-31-5p and 87 target genes of miR-33b-5p were extracted (Figure 4A, Figure 4D, Figure 4G, Figure 4J, Figure 4M and Figure 4P, respectively).
The modular analyses of PPI network of miRNA-target mRNAs
Firstly, PPI networks of six miRNAs targets were constructed based on String, and the Cytoscape plug-in MCODE was used for module analysis. Then, the PPI networks of miR-615-3p, miR-205-5p, miR-34b-5p, miR-183-3p, miR-31-5p and miR-33b-5p target genes were constructed into two, five, three, three, six and three functional modules, respectively. (Figure 4B, Figure 4E, Figure 4H, Figure 4K, Figure 4N and Figure 4Q, respectively). Furthermore, KEGG or Go-BP enrichment of the most important modules was performed using Cytoscape plug-ins ClueGO and CluePedia, which was the first cluster in the PPI network module analysis of the six miRNAs target genes. The results of KEGG or Go-BP enrichment analysis were shown in Figure 4C, Figure 4F, Figure 4I, Figure 4L, Figure 4O and Figure 4R. Therefore, the targets of the above miRNAs may be involved in the occurrence and development of HCC by participating in the pathways or biological processes shown in the figure.