Clinical characteristics of the patients
We included 504 BC cases and 505 healthy controls in the analysis and collected their basic information, including age at disease onset, age at menarche, menopausal status, age at menopause, reproductive history, number of abortions, breastfeeding history, family history and hormone receptor status (Table 1). Logistic regression analysis showed that age at menarche differed between the BC patients and the healthy controls (P=0.030). Multiple pregnancies (OR=1.964, 95% CI: 1.355-2.796) and a family history of breast cancer (OR=1.869, 95% CI: 1.116-3.130) may be associated with an increased risk of breast cancer, whereas a history of breastfeeding (OR=0.724, 95% CI: 0.535-0.980) may be associated with a reduced risk.
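For readers who want to see how an odds ratio and its 95% confidence interval relate to the underlying counts, the following is a minimal sketch of the unadjusted calculation from a 2x2 table, using the Wald interval on the log scale. It is an illustration only: the function name and the counts are hypothetical, and the estimates reported above come from logistic regression, which this sketch does not reproduce.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a Wald 95% CI from a 2x2 table.

    a: exposed cases,    b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen for illustration (not taken from Table 1).
odds_ratio, lower, upper = odds_ratio_wald_ci(a=60, b=444, c=34, d=471)
print(f"OR = {odds_ratio:.3f}, 95% CI: {lower:.3f}-{upper:.3f}")
```

An interval that excludes 1 corresponds to a nominally significant association at the 0.05 level.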
Susceptibility analysis
The association between PCAT1 SNP genotypes and breast cancer susceptibility is presented in Table 2. The analysis was performed under four genetic models (codominant, dominant, recessive and overdominant). In the adjusted logistic regression analysis, SNP rs4473999 was a risk factor for breast cancer under the overdominant model (OR=1.360, 95% CI: 1.009-1.832). To assess the representativeness of the control group, the Hardy-Weinberg equilibrium test was applied. The genotype distributions of all SNPs in the control group were consistent with Hardy-Weinberg equilibrium (P>0.05).
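The genetic models and the Hardy-Weinberg check can be illustrated with a short sketch. It assumes the conventional definitions (dominant: carriers of the minor allele versus major homozygotes; recessive: minor homozygotes versus the rest; overdominant: heterozygotes versus both homozygotes) and a one-degree-of-freedom chi-square goodness-of-fit test. The function names and genotype counts are hypothetical and are not taken from Table 2; the codominant model keeps the three genotypes as separate categories and so is not reduced to a binary code here.

```python
import math


def code_genotype(minor_allele_count, model):
    """Binary exposure for a genotype carrying 0, 1 or 2 minor alleles."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "dominant":      # heterozygote + minor homozygote vs major homozygote
        return int(minor_allele_count >= 1)
    if model == "recessive":     # minor homozygote vs the other two genotypes
        return int(minor_allele_count == 2)
    if model == "overdominant":  # heterozygote vs both homozygotes
        return int(minor_allele_count == 1)
    raise ValueError(f"unknown model: {model}")


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium in controls."""
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical control genotype counts (illustration only).
stat, p_value = hwe_chi_square(250, 205, 50)
print(f"chi2 = {stat:.3f}, P = {p_value:.3f}")
print([code_genotype(g, "overdominant") for g in (0, 1, 2)])
```

A P value above 0.05 in the controls gives no evidence of departure from equilibrium, which is the criterion applied in the text.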
Stratified analysis
The stratified analysis comprised three parts. First, the patients' clinical information was stratified in the model, including age, age at menarche, menopausal status, age at menopause, number of pregnancies, number of abortions, breastfeeding history and family history (Table 3). In the dominant model, SNP rs117117537 was a risk factor for breast cancer in women with menopausal age >50 years (OR=2.413, 95% CI: 1.057-5.508); rs4473999 was a risk factor for breast cancer in women with menopausal age >50 years (OR=2.137, 95% CI: 1.065-4.286) and in women with fewer than two abortions (OR=1.510, 95% CI: 1.045-2.181). Second, the breast cancer patients were stratified by hormone receptor status. As shown in Table S4, only the TT genotype of SNP rs785003 (OR=0.158, 95% CI: 0.029-0.864) was associated with HER-2 receptor status. Finally, the data were stratified by the molecular subtype of breast cancer; the GA+AA genotype of SNP rs1551514 (OR=0.671, 95% CI: 0.452-0.997) was associated with luminal-type breast cancer (Table S5).
Haplotype analysis and gene-reproductive interaction
Haplotype analysis was used to determine the joint effect of the lncRNA SNPs; haplotypes with frequencies below 3% are not presented. For each haplotype, subjects were divided into a haplotype group and a non-haplotype group, with the non-haplotype group as the reference. As shown in Table 4, the Grs1551514Trs1551513Crs447399Crs9656964Trs17762938Crs7823297Trs785003Trs117117537 haplotype of PCAT1 was associated with an increased risk of breast cancer (OR=1.614, 95% CI: 1.116-2.333). Table 5 shows the results of the interaction between genetic and reproductive factors analyzed using MDR software. Among the first- to third-order interaction models fitted, the third-order model was optimal: the average training accuracy was 0.6137, the average testing accuracy was 0.5837 and the ten-fold cross-validation consistency was 10/10. This model included three factors, rs4473999, number of pregnancies and breastfeeding history, indicating an interaction between genetic and reproductive factors.
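To make the MDR quantities above concrete (training accuracy, testing accuracy and cross-validation consistency), the following is a simplified sketch of the multifactor dimensionality reduction idea. It is not the MDR software used in the study. The factor names, the synthetic data and the deterministic fold assignment are all assumptions made for illustration. Each combination of factor levels is labelled high risk when its case:control ratio exceeds the overall ratio, and the best model of a given order is selected within each training fold.

```python
from collections import Counter
from itertools import combinations, product


def fit_high_risk_cells(rows, labels, factors):
    """Return the factor-level combinations (cells) labelled high risk."""
    cases, controls = Counter(), Counter()
    for row, is_case in zip(rows, labels):
        cell = tuple(row[f] for f in factors)
        (cases if is_case else controls)[cell] += 1
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    # High risk: cell case:control ratio above the overall ratio
    # (cross-multiplied so that empty cells need no division).
    return {cell for cell in set(cases) | set(controls)
            if cases[cell] * n_controls > controls[cell] * n_cases}


def balanced_accuracy(rows, labels, factors, high_risk):
    """Mean of sensitivity and specificity; assumes both classes are present."""
    tp = tn = n_cases = n_controls = 0
    for row, is_case in zip(rows, labels):
        predicted_case = tuple(row[f] for f in factors) in high_risk
        if is_case:
            n_cases += 1
            tp += predicted_case
        else:
            n_controls += 1
            tn += not predicted_case
    return (tp / n_cases + tn / n_controls) / 2


def cross_validate(rows, labels, order, n_folds=10):
    """Pick the best model of the given order in each fold and summarise."""
    names = sorted(rows[0])
    chosen, train_acc, test_acc = Counter(), [], []
    for fold in range(n_folds):
        train = [i for i in range(len(rows)) if i % n_folds != fold]
        test = [i for i in range(len(rows)) if i % n_folds == fold]
        tr_rows, tr_y = [rows[i] for i in train], [labels[i] for i in train]
        te_rows, te_y = [rows[i] for i in test], [labels[i] for i in test]
        best = None
        for factors in combinations(names, order):
            cells = fit_high_risk_cells(tr_rows, tr_y, factors)
            acc = balanced_accuracy(tr_rows, tr_y, factors, cells)
            if best is None or acc > best[0]:
                best = (acc, factors, cells)
        acc, factors, cells = best
        chosen[factors] += 1
        train_acc.append(acc)
        test_acc.append(balanced_accuracy(te_rows, te_y, factors, cells))
    model, consistency = chosen.most_common(1)[0]
    return model, sum(train_acc) / n_folds, sum(test_acc) / n_folds, consistency


# Synthetic data: case status depends jointly on factor_a and factor_b only.
rows, labels = [], []
for _ in range(20):
    for a, b, c in product((0, 1), repeat=3):
        rows.append({"factor_a": a, "factor_b": b, "factor_c": c})
        labels.append(a ^ b)

model, mean_train, mean_test, consistency = cross_validate(rows, labels, order=2)
print(model, mean_train, mean_test, f"{consistency}/10")
```

In this sketch the cross-validation consistency is the number of folds in which the same factor combination was selected, so 10/10 means the model was chosen in every fold.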
False positive report probability (FPRP)
In this study, FPRP analysis [20] was used to evaluate the reliability of the positive associations between PCAT1 SNPs and breast cancer susceptibility. The critical value of FPRP was set at 0.5. As shown in Table S6, at a prior probability of 0.25, the FPRP values for the positive results of rs4473999, rs1551514 and rs117117537 were all below the critical value, suggesting that these three SNPs may be truly associated with breast cancer susceptibility.
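The FPRP combines the observed significance level, the statistical power and the prior probability as FPRP = α(1−π) / [α(1−π) + (1−β)π]. The sketch below shows one way this can be computed from a reported OR and its 95% CI under a normal approximation. The alternative OR of 1.5 used for the power calculation and the function name are assumptions, since they are not stated in this section, so the output need not match Table S6.

```python
import math
from statistics import NormalDist


def fprp(or_obs, ci_low, ci_high, prior, or_alt=1.5):
    """False positive report probability under a normal approximation.

    or_alt is an ASSUMED alternative odds ratio used to compute power.
    """
    norm = NormalDist()
    # Standard error of ln(OR) recovered from the 95% CI width.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959964)
    z_obs = abs(math.log(or_obs)) / se
    # The observed two-sided P value is used as the significance level.
    alpha = 2 * (1 - norm.cdf(z_obs))
    # Two-sided power to reach |z| >= z_obs if the true OR were or_alt.
    shift = abs(math.log(or_alt)) / se
    power = norm.cdf(shift - z_obs) + norm.cdf(-shift - z_obs)
    return alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)


# Input: the overdominant-model estimate for rs4473999 reported in the text.
value = fprp(or_obs=1.360, ci_low=1.009, ci_high=1.832, prior=0.25)
print(f"FPRP at prior 0.25: {value:.3f}")
```

Lower prior probabilities yield higher FPRP values for the same finding, which is why the result is usually reported over a range of priors.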
Real-time fluorescent quantitative PCR (qPCR)
The qPCR results are shown in Figure 1. For rs4473999, 41, 21 and 14 samples of the CC, CT and TT genotypes, respectively, were randomly selected for qPCR; the relative expression of PCAT1 in the three genotypes was 1.50±0.70, 1.07±0.83 and 0.75±0.64, respectively. Pairwise comparisons showed that the differences between CC and CT (P=0.038) and between CC and TT (P=0.001) were statistically significant, with PCAT1 expression lower in the CT and TT groups than in the CC group. For rs1551514, the relative expression of PCAT1 was 1.63±0.97 in the 25 GG samples, 1.10±0.61 in the 33 GA samples and 0.94±0.79 in the 13 AA samples; the differences between GG and GA (P=0.021) and between GG and AA (P=0.033) were both statistically significant.
Dual-luciferase reporter assay
Figure 2 shows the results of the dual-luciferase reporter assay. The luciferase activity of the NC group was significantly higher than that of the miR-149-5p group (P=0.001). In addition, the luciferase activity of the mutant-type (MUT) plus miR-149-5p group was significantly higher than that of the wild-type (WT) plus miR-149-5p group (P<0.001). These results indicate that miR-149-5p binds to rs4473999-WT, whereas there was no evidence of binding between rs4473999-MUT and miR-149-5p, which is consistent with the previous prediction.
Cytological experiment
Verification of the stable knockdown and overexpression of miR-149-5p, the miRNA that binds the PCAT1 SNP, showed that miR-149-5p was knocked down by about 50% and overexpressed about 2-fold in MDA-MB-231 and MCF-7 cells (Figure S1).
The CCK8 assay showed that, compared with the negative control (NC) group, MDA-MB-231 cells in the miR-149-5p low-expression group had lower OD values at 450 nm at 24 h (P=0.002), 48 h (P=0.002), 72 h (P=0.002) and 96 h (P<0.001), whereas MDA-MB-231 cells in the miR-149-5p high-expression group had higher OD values at 24 h (P=0.020), 48 h (P=0.016), 72 h (P=0.035) and 96 h (P=0.016) (Figure 3A). Figure 3B shows the CCK8 results for MCF-7 cells. Compared with the NC group, MCF-7 cells in the miR-149-5p low-expression group had lower OD values at 450 nm at 24 h (P<0.001), 48 h (P=0.016), 72 h (P<0.001) and 96 h (P<0.001), and MCF-7 cells in the miR-149-5p high-expression group had higher OD values at 24 h (P=0.013), 48 h (P=0.014), 72 h (P<0.001) and 96 h (P=0.002).
The scratch healing results at 0 h and 48 h showed that, compared with the NC group, the healing rate of MDA-MB-231 cells was lower in the miR-149-5p low-expression group (P=0.023) and higher in the miR-149-5p high-expression group (P=0.043) (Figure 4A). Similarly, the healing rate of MCF-7 cells was lower in the miR-149-5p low-expression group (P=0.021) and higher in the miR-149-5p high-expression group (P=0.014) (Figure 4B).
The cell invasion experiments showed that, in both MDA-MB-231 and MCF-7 cells, the number of invading cells was lower in the miR-149-5p low-expression group and higher in the miR-149-5p high-expression group than in the NC group (Figure 5).