2.1 Select m6A modification related regulatory factors
A total of 15 genes are widely reported as m6A regulators19-21. RNAs undergo methyl group modification by "writers" such as METTL3, METTL16, WTAP, RBM15, RBM15B, KIAA1429, and ZC3H1322. Then, "readers" including YTHDF1, YTHDF3, YTHDC2, HNRNPC, HNRNPA2B1 and EIF3A recognize those m6A-modified RNAs and function in RNA processing, nuclear export, translation and decay23. By relying on "erasers" (FTO, ALKBH5), m6A is restored to adenosine and achieves demethylation modification24. RNA sequencing data of 15 genes and the clinicopathological parameters of TCGA-LUAD were obtained from the TCGA database (http://cancergenome.nih.gov/). The following methods were used for data screening: (1) samples without clinical data and (2) samples without survival information. Finally, a total of 455 LUAD patient samples were obtained and used for follow-up bioinformatics analysis.
2.2 DNA methylation data collection
The LUAD-DNA CpG site information was obtained from the Illumina Human Methylation 450 BeadChip (Illumina, San Diego, CA, United States) and the TCGA-biolinks package in R language was used to analyze clinical data25. β values were used to visualize DNA methylation levels using the formula: β = M/(U+M+100), where M represents methylation of the target CpG site and U represents unmethylation. A total of 485,577 methylation sites were obtained. The methylation data was matched with screened samples to evaluate the correlation between DNA methylation levels and OS.
2.3 Establishment and verification of the prediction model
The 455 samples were divided into two parts: two-thirds of the clinical samples were divided into the training group to identify and build a prognostic risk model. The remaining one-third of samples were placed into the validation group to confirm the predictive prognostic value of the biomarker. Significant methylation sites related to the prognosis of LUAD were identified using LASSO regression analysis. Meanwhile, to prevent overfitting, LASSO 1000 iterations were conducted using the “glmnet algorithm” package in R. We established a risk model based on the following formula: Risk Score = ∑ (Methylation level of each independent prognostic site× regression coefficient).
Patients in the training set were divided into high and low risk groups according to the risk score. Kaplan-Meier was used to draw the survival curve of high and low risk groups. A bilateral logarithmic rank test was used to test the survival difference between the high risk and low risk groups. A ROC curve was used to explore the survival of patients at 1, 5 and 10 years. In addition, the same method was used to verify the prediction model in the test set.
2.4 Establishing a predictive nomogram
According to the 'rms' package from R software, a nomogram was constructed to evaluate the reliability of the prediction tool. Other pathological parameters (gender, staging, N staging, T staging) and methylation risk score were evaluated using COX proportional risk assessment (including multivariate and univariate). Concordance index (C index), calibration chart and ROC curves were used together to evaluate the prediction ability of the nomogram.
2.5 Immune cell infiltration and biological pathways.
Immune and stromal cells are two types of cells that infiltrate the tumor immune microenvironment. We used "ESTIMATE" in the R package to evaluate the immune score, matrix score and tumor purity26. Quantitative analysis of the immune cell infiltration level was observed using GSVA and Single-sample GSEA (ssGSEA) in the R language GSVA software package27,28. Finally, we evaluated the 24 types of immune cells, including natural immune cells and acquired immune cells. To explore the relationship between the predictive model and immune infiltration, we analyzed the different 24 immune cells and their infiltration levels in the high risk and low risk groups (p<0.05). Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the DEGs were performed using Metascape (http://metascape.org).
2.5 Statistical analyses
Univariate analysis was performed in the training set to screen methylation sites significantly associated (P<0.01) with OS in LUAD patients. Then, Lasso Cox regression analysis was used to further establish prognostic methylation sites and construct a Cox multiple regression model. A survival curve was drawn using Kaplan Meier method and was used to evaluate OS between the two groups. In this study, p < 0.05 was considered as statistically significant.