The study was approved by the ethics committee of Zhengzhou University. All participants were informed about the study and signed written informed consent. The design and implementation flow chart of this study is shown in Figure 1.
Meta-analysis of risk factors for gastric cancer
To assess the credibility and strength of the evidence linking non-genetic factors and genetic variants to gastric cancer risk, we performed a field synopsis and meta-analysis of gastric cancer risk factors in the Chinese population. A total of 22 SNPs in 16 genes were identified as associated with gastric cancer risk. Details have been published in Aging-US [24].
Genetic variant selection for PRS
Bioinformatics methods were used to screen for lncRNAs that were differentially expressed in gastric cancer and had potential binding sites for microRNAs (miRNAs), together with their corresponding functional SNPs.
Gastric cancer-related microarray datasets from Chinese populations (GSE50710, GSE53137 and GSE58828) were retrieved and downloaded from the Gene Expression Omnibus (GEO) database. The GEO chip data were analyzed with Bioconductor in R (version 3.6.2 for Windows), and probes were mapped to their targets using the annotation database of each chip according to the probe IDs. The intersection of the results from the three chips was obtained using SAS 9.2 (SAS Institute Inc., Cary, North Carolina, USA). lncRNAs with a fold change > 2.0 and P < 0.05 were considered differentially expressed.
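The screening rule (fold change > 2.0, P < 0.05, intersection across the three GEO series) can be sketched in Python as follows. This is an illustrative sketch, not the R/SAS pipeline used in the study: it assumes each dataset has already been reduced to a mapping from lncRNA ID to a positive linear fold change (tumor versus normal) and a P-value, and it assumes that a change of more than 2-fold in either direction qualifies, so down-regulation corresponds to a fold change below 0.5. The lncRNA IDs and values are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-dataset result: lncRNA ID -> (linear fold change, P-value)
DEResult = Dict[str, Tuple[float, float]]


def is_differentially_expressed(fold_change: float, p_value: float,
                                fc_cut: float = 2.0, p_cut: float = 0.05) -> bool:
    """Apply the fold-change and P-value thresholds to one lncRNA."""
    # Assumption: FC is a positive ratio, so a 2-fold decrease means FC < 1/2.
    changed = fold_change > fc_cut or fold_change < 1.0 / fc_cut
    return changed and p_value < p_cut


def intersect_de(datasets: List[DEResult]) -> Set[str]:
    """Return the lncRNAs that pass the thresholds in every dataset."""
    per_dataset = [
        {lnc for lnc, (fc, p) in result.items() if is_differentially_expressed(fc, p)}
        for result in datasets
    ]
    return set.intersection(*per_dataset) if per_dataset else set()


# Placeholder values standing in for the three GEO series
gse_a = {"LNC_A": (3.1, 0.001), "LNC_B": (0.30, 0.010), "LNC_C": (1.5, 0.001)}
gse_b = {"LNC_A": (2.5, 0.020), "LNC_B": (0.40, 0.030), "LNC_C": (2.2, 0.200)}
gse_c = {"LNC_A": (4.0, 0.0001), "LNC_B": (0.45, 0.040)}
print(sorted(intersect_de([gse_a, gse_b, gse_c])))  # ['LNC_A', 'LNC_B']
```

Because the intersection is taken, an lncRNA absent from any one series (such as the placeholder LNC_C) cannot be selected.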
Using the lncRNASNP2 database (http://bioinfo.life.hust.edu.cn/lncRNASNP#!/) and the online RNAfold server (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi), we performed a preliminary prediction of the biological functions of SNPs located in the differentially expressed lncRNAs and identified the SNPs predicted to alter lncRNA secondary structure or affect miRNA binding. Because r² reflects the degree of linkage disequilibrium (LD) between SNP sites, the LD between SNP sites on the same gene was also taken into account (r² < 0.8 and LD < 1.0), and 21 lncRNA SNPs were finally selected (Supplementary Table 1).
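The r² < 0.8 criterion can be illustrated with a greedy pruning sketch. Several assumptions apply: genotypes are coded as allele dosages (0, 1, 2) for the same individuals; r² is approximated by the squared Pearson correlation of dosages, a common approximation when haplotype phase is unknown and not necessarily how LD was computed in the study; the function would be applied separately to the SNPs of each gene; and the additional LD < 1.0 criterion is not modeled. The SNP names and dosages are hypothetical.

```python
from typing import Dict, List, Sequence


def r_squared(g1: Sequence[int], g2: Sequence[int]) -> float:
    """Squared Pearson correlation of genotype dosages (0/1/2) across individuals."""
    n = len(g1)
    if n == 0 or n != len(g2):
        raise ValueError("genotype vectors must be non-empty and equal in length")
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as no LD
    return cov * cov / (v1 * v2)


def prune_by_r2(genotypes: Dict[str, List[int]], threshold: float = 0.8) -> List[str]:
    """Greedy pruning in input order: keep a SNP only if its r^2 with every
    SNP already kept is below the threshold."""
    kept: List[str] = []
    for snp, dosages in genotypes.items():
        if all(r_squared(dosages, genotypes[k]) < threshold for k in kept):
            kept.append(snp)
    return kept


# Hypothetical dosages for three SNPs on one gene in six individuals
geno = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],  # identical to snp1, so r^2 = 1
    "snp3": [2, 0, 1, 1, 0, 2],  # uncorrelated with snp1, so r^2 = 0
}
print(prune_by_r2(geno))  # ['snp1', 'snp3']
```

Greedy pruning depends on input order; ordering SNPs by predicted functional importance first would keep the more relevant member of each correlated pair.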
Following the principles of evidence-based medicine (EBM), we applied a three-step approach. First, we performed a meta-analysis to screen for associations between genetic variants and gastric cancer. Second, SNPs in strong LD with other polymorphisms were excluded. Third, the remaining SNPs were combined with SNPs reported in published field synopses or systematic reviews as significantly associated with gastric cancer in the Chinese population (OR ≥ 1.20 or OR < 0.80; minor allele frequency ≥ 0.1 in Chinese Han in Beijing). In total, 20 SNPs in 18 genes were selected (Supplementary Table 2).
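The effect-size and allele-frequency thresholds of the third step can be expressed as a simple filter. This is a minimal sketch: the `Candidate` record, the SNP and gene labels and their values are hypothetical, `maf_chb` stands for the minor allele frequency in Chinese Han in Beijing (CHB), and the significance and LD screening of the earlier steps are assumed to have been applied already.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    snp: str           # hypothetical SNP label
    gene: str          # hypothetical gene label
    odds_ratio: float  # pooled OR from the meta-analysis or field synopsis
    maf_chb: float     # minor allele frequency in Chinese Han in Beijing


def passes_filters(c: Candidate, or_high: float = 1.20, or_low: float = 0.80,
                   maf_min: float = 0.10) -> bool:
    """Keep SNPs with OR >= 1.20 or OR < 0.80 and MAF >= 0.1."""
    strong_effect = c.odds_ratio >= or_high or c.odds_ratio < or_low
    return strong_effect and c.maf_chb >= maf_min


def select_snps(candidates: List[Candidate]) -> List[Candidate]:
    return [c for c in candidates if passes_filters(c)]


candidates = [
    Candidate("SNP_A", "GENE1", 1.35, 0.25),  # kept: OR >= 1.20
    Candidate("SNP_B", "GENE2", 1.10, 0.30),  # dropped: effect too small
    Candidate("SNP_C", "GENE3", 0.75, 0.12),  # kept: protective, OR < 0.80
    Candidate("SNP_D", "GENE4", 1.50, 0.05),  # dropped: MAF < 0.1
]
selected = select_snps(candidates)
print([c.snp for c in selected], len({c.gene for c in selected}))  # ['SNP_A', 'SNP_C'] 2
```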
Study population
All gastric cancer patients were newly diagnosed cases recruited from the First Affiliated Hospital of Zhengzhou University and the Affiliated Cancer Hospital of Zhengzhou University between January 2012 and December 2015. Patients had not received any anti-tumor treatment before recruitment and had no history of other malignant tumors.
Controls were recruited from an epidemiological survey of cardiovascular disease conducted in Henan Province during the same period. Individuals with malignant tumors or digestive system diseases, and individuals who were blood relatives of the cases, were excluded.
Using a frequency-matched case-control design in which subjects were matched by sex and age (± 2 years), we collected blood samples from 660 patients with pathologically confirmed gastric cancer and 660 normal community controls. All participants met the requirements of the institutional review board and gave informed consent.
Genotyping and quality control
Polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP), created restriction site PCR-RFLP (CRS-PCR-RFLP) and the improved multiplex ligation detection reaction (iMLDR™) were used to genotype the lncRNA SNPs and the SNPs selected by EBM. For iMLDR™, sequencing was performed on a 3130XL sequencer (Applied Biosystems, USA), and genotypes were called with GeneMapper 4.0.
For PCR-RFLP genotyping, 10% of the samples were randomly selected for sequencing, and the sequencing results were compared with the genotyping results. When the agarose gel electrophoresis pattern could not determine the genotype unambiguously, the experiment was repeated or the sample was sequenced directly.
For iMLDR genotyping, each sample was checked by agarose gel electrophoresis before genotyping, and 3% of the samples were included as double-blind duplicates for quality control, together with negative controls. For the quality control samples, both the call rate and the accuracy rate were required to exceed 98%.
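The duplicate-based quality control can be illustrated by computing the call rate and the concordance between the original and the blinded duplicate genotype calls. This is a sketch under stated assumptions: calls are stored in hypothetical dictionaries keyed by sample ID, failed calls are `None`, concordance is used as the measure of accuracy, and the same > 98% threshold is applied to both metrics.

```python
from typing import Dict, Optional

# Hypothetical layout: sample ID -> genotype string such as "AG", or None for a failed call
Calls = Dict[str, Optional[str]]


def _normalize(genotype: str) -> str:
    # Treat "AG" and "GA" as the same unordered genotype
    return "".join(sorted(genotype))


def call_rate(calls: Calls) -> float:
    """Fraction of samples with a successful genotype call."""
    if not calls:
        raise ValueError("no samples")
    return sum(g is not None for g in calls.values()) / len(calls)


def concordance(first: Calls, second: Calls) -> float:
    """Fraction of samples called in both runs whose genotypes agree."""
    shared = [s for s in first
              if first[s] is not None and second.get(s) is not None]
    if not shared:
        raise ValueError("no sample was called in both runs")
    agree = sum(_normalize(first[s]) == _normalize(second[s]) for s in shared)
    return agree / len(shared)


def passes_qc(first: Calls, second: Calls, threshold: float = 0.98) -> bool:
    """Both the call rate and the concordance must exceed the threshold."""
    return call_rate(first) > threshold and concordance(first, second) > threshold


run1 = {f"S{i}": "AG" for i in range(100)}
run2 = dict(run1)  # blinded duplicate run with identical calls
print(passes_qc(run1, run2))  # True
```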
Weighted genetic risk scores
The population average risk (genetic score, W) of each SNP was calculated from the genotype frequencies of the variant and the OR obtained from the meta-analysis in the Chinese population:
$$W = (1-p)^2 + 2p(1-p)\,\mathrm{OR} + p^2\,\mathrm{OR}^2$$
where p is the frequency of the risk allele.
Assume that a SNP has genotypes AA, AB and BB, where B is the risk allele and A is the non-risk allele, with corresponding risk values of 1, OR and OR². The weighted genetic risk score (wGRS) is then estimated as follows:
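$$\mathrm{AA} = \frac{1}{W},\qquad \mathrm{AB} = \frac{\mathrm{OR}}{W},\qquad \mathrm{BB} = \frac{\mathrm{OR}^2}{W}$$
$$\mathrm{wGRS} = \mathrm{SNP}_1 \times \mathrm{SNP}_2 \times \mathrm{SNP}_3 \times \cdots \times \mathrm{SNP}_n$$
where $\mathrm{SNP}_k$ is the genotype score of the k-th SNP; missing genotypes are assigned a value of 1.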
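The wGRS calculation can be sketched as follows. This is a minimal illustration, assuming genotypes are coded as the number of risk alleles (0 for AA, 1 for AB, 2 for BB, `None` for missing) and that p and OR are supplied per SNP; the example values are hypothetical.

```python
import math
from typing import Optional, Sequence


def population_average_risk(p: float, odds_ratio: float) -> float:
    """W = (1 - p)^2 + 2p(1 - p)OR + p^2 OR^2, with p the risk-allele frequency."""
    return (1 - p) ** 2 + 2 * p * (1 - p) * odds_ratio + p ** 2 * odds_ratio ** 2


def genotype_score(risk_alleles: Optional[int], p: float, odds_ratio: float) -> float:
    """AA = 1/W, AB = OR/W, BB = OR^2/W; a missing genotype (None) scores 1."""
    if risk_alleles is None:
        return 1.0
    if risk_alleles not in (0, 1, 2):
        raise ValueError("risk allele count must be 0, 1 or 2")
    return odds_ratio ** risk_alleles / population_average_risk(p, odds_ratio)


def wgrs(genotypes: Sequence[Optional[int]], freqs: Sequence[float],
         odds_ratios: Sequence[float]) -> float:
    """Multiply the per-SNP genotype scores across all SNPs."""
    if not (len(genotypes) == len(freqs) == len(odds_ratios)):
        raise ValueError("inputs must have one entry per SNP")
    return math.prod(genotype_score(g, p, o)
                     for g, p, o in zip(genotypes, freqs, odds_ratios))


# Hypothetical example: SNP1 is BB (p = 0.5, OR = 2.0); SNP2 is missing
print(wgrs([2, None], [0.5, 0.3], [2.0, 1.4]))  # 4 / 2.25, about 1.778
```

Dividing by W makes each SNP's expected score equal to 1 in a population at Hardy-Weinberg equilibrium with risk-allele frequency p. If the SNPs are also independent, the wGRS therefore has a population mean of 1, so values above 1 indicate higher-than-average genetic risk.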
Polygenic risk score
We derived a PRS specific to the Chinese population from all SNPs verified to be associated with gastric cancer risk at the genome-wide significance level. For both cases and controls, the PRS was constructed by summing the risk allele counts (0, 1 or 2 per variant) of the associated variants, each weighted by its effect size, namely the natural logarithm of the odds ratio (OR) estimated from a multivariable logistic regression model. For each participant, the sum of the weighted risk allele counts was divided by the total number of loci to obtain a mean weighted score, which was used as the reference.
$$\mathrm{PRS}_j = \sum_{i=1}^{j} n_{ij}\,\ln(\mathrm{OR}_i)$$
where j is the number of SNPs included in the model, $n_{ij}$ is the number of copies (0, 1 or 2) of the risk allele of the i-th SNP, and $\mathrm{OR}_i$ is the OR for the association between the risk allele of the i-th SNP and gastric cancer.
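The weighted sum and the per-locus mean can be sketched as follows. This is a minimal illustration, not the PRSice-2 implementation: it assumes complete genotypes coded as risk-allele counts and takes the ORs as given inputs. The example participant and ORs are hypothetical.

```python
import math
from typing import Sequence


def prs(risk_allele_counts: Sequence[int], odds_ratios: Sequence[float]) -> float:
    """PRS = sum over SNPs of n_i * ln(OR_i)."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one OR is needed per SNP")
    total = 0.0
    for n, odds_ratio in zip(risk_allele_counts, odds_ratios):
        if n not in (0, 1, 2):
            raise ValueError("risk allele count must be 0, 1 or 2")
        total += n * math.log(odds_ratio)
    return total


def mean_weighted_score(risk_allele_counts: Sequence[int],
                        odds_ratios: Sequence[float]) -> float:
    """Weighted sum divided by the number of loci."""
    if not risk_allele_counts:
        raise ValueError("at least one SNP is required")
    return prs(risk_allele_counts, odds_ratios) / len(risk_allele_counts)


# Hypothetical participant genotyped at three SNPs
counts = [2, 1, 0]
ors = [1.5, 1.3, 2.0]
print(prs(counts, ors))                  # 2*ln(1.5) + ln(1.3), about 1.073
print(mean_weighted_score(counts, ors))  # about 0.358
```

Using ln(OR) as the weight makes the PRS additive on the log-odds scale, which matches how the ORs are estimated by logistic regression.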
Statistical analysis
Hardy-Weinberg equilibrium (HWE) of the genotype distributions in the controls was tested with a chi-square goodness-of-fit test. Unconditional logistic regression was used to assess the associations between the targeted SNPs and gastric cancer risk.
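The HWE goodness-of-fit test compares the observed genotype counts in controls with the counts expected from the estimated allele frequency. The following is a standard-library sketch that assumes a biallelic SNP and applies no continuity correction. With three genotype classes and one estimated allele frequency, the test has 1 degree of freedom, so the upper-tail P-value equals erfc(sqrt(χ²/2)).

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test for HWE at one biallelic SNP.

    Returns (chi-square statistic, P-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped controls")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip empty expected classes (monomorphic SNP) to avoid division by zero
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(hwe_chi_square(25, 50, 25))  # (0.0, 1.0): counts exactly as expected under HWE
```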
PLINK 1.9 (NIH-NIDDK's Laboratory of Biological Modeling, Harvard University) was used for SNP quality control, allele-based association analysis, and generation of the base and target datasets for PRSice-2 (Gavin Band, New York, USA). Gastric cancer risk prediction models based on wGRS and PRS were constructed using the SNPs screened by EBM and verified in the association analysis. The lncRNA SNPs were entered into the prediction models as an independent set of risk factors, and empirical P-values computed over 10,000 model fittings were used to optimize the model parameters and build the optimal model.
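The idea behind an empirical P-value can be illustrated with a generic label-permutation test. This is a simplified sketch, not the internal procedure of PRSice-2. The test statistic (the absolute difference in mean score between cases and controls), the random seed and the data are assumptions chosen for illustration.

```python
import random
from typing import Sequence


def mean_difference(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean score in cases (label 1) minus mean score in controls (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def empirical_p(scores: Sequence[float], labels: Sequence[int],
                n_perm: int = 10_000, seed: int = 1) -> float:
    """Two-sided permutation P-value; adding 1 to the numerator and the
    denominator counts the observed labelling and keeps P above 0."""
    rng = random.Random(seed)
    observed = abs(mean_difference(scores, labels))
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between score and case status
        if abs(mean_difference(scores, shuffled)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: cases have clearly higher scores than controls
scores = [float(i) for i in range(20)]
labels = [0] * 10 + [1] * 10
print(empirical_p(scores, labels, n_perm=1000))
```

With n permutations, the smallest attainable empirical P-value is 1/(n + 1), which is one reason a large number of permutations such as 10,000 is used.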
Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to evaluate how well the different models discriminated gastric cancer cases from controls. Net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were used to compare the predictive ability of the wGRS and PRS models, and the Akaike information criterion (AIC) and Bayesian information criterion (BIC) were used to assess model fit.
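The discrimination, reclassification and fit measures can be computed from each model's predicted probabilities and log-likelihood. The sketch assumes that the two logistic models have already been fitted elsewhere, for example a reference model (`p_old`) and a new model (`p_new`). It uses the Mann-Whitney form of the AUC and the category-free (continuous) NRI, which is one common NRI variant, together with AIC = 2k - 2lnL and BIC = k ln(n) - 2lnL. The probabilities are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _split(values: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    return cases, controls


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """P(case score > control score), counting ties as 1/2 (Mann-Whitney form)."""
    cases, controls = _split(probs, labels)
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0
               for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def idi(p_old: Sequence[float], p_new: Sequence[float], labels: Sequence[int]) -> float:
    """Gain in mean predicted risk among cases minus the gain among controls."""
    old_cases, old_controls = _split(p_old, labels)
    new_cases, new_controls = _split(p_new, labels)
    return ((_mean(new_cases) - _mean(old_cases))
            - (_mean(new_controls) - _mean(old_controls)))


def continuous_nri(p_old: Sequence[float], p_new: Sequence[float],
                   labels: Sequence[int]) -> float:
    """Net share of cases moved up plus net share of controls moved down."""
    diffs = [new - old for old, new in zip(p_old, p_new)]
    case_d, control_d = _split(diffs, labels)
    up_cases = sum(d > 0 for d in case_d) / len(case_d)
    down_cases = sum(d < 0 for d in case_d) / len(case_d)
    up_controls = sum(d > 0 for d in control_d) / len(control_d)
    down_controls = sum(d < 0 for d in control_d) / len(control_d)
    return (up_cases - down_cases) + (down_controls - up_controls)


def aic_bic(log_likelihood: float, n_params: int, n_obs: int) -> Tuple[float, float]:
    """AIC = 2k - 2lnL and BIC = k ln(n) - 2lnL; lower values indicate better fit."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, bic


labels = [1, 1, 0, 0]
p_old = [0.6, 0.45, 0.5, 0.4]
p_new = [0.8, 0.7, 0.3, 0.2]
print(auc(p_old, labels), auc(p_new, labels))  # 0.75 1.0
print(idi(p_old, p_new, labels), continuous_nri(p_old, p_new, labels))
```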
All statistical analyses were performed with R software (version 3.6.1; The R Foundation for Statistical Computing, Vienna, Austria) and Stata version 13.1 MP (StataCorp, College Station, TX, USA). A two-sided P < 0.05 was considered statistically significant.