1.1 Data collection
The data for this study were obtained from TCGA and GEO, namely the TCGA-BRCA, GSE96058 and GSE25066 datasets. The TCGA-BRCA transcriptome dataset comprised 1,217 samples, including 1,104 sequenced tumor tissue samples. The GSE96058 dataset contains 3,409 sequenced samples, including 136 biological replicates, and the GSE25066 dataset contains 508 samples. Batch effects were removed with ComBat (sva package, v3.36.0) and RUVSeq (v1.22.0) [18]; two different RUVSeq normalization methods, RUVr and RUVg, were applied (Figure S1).
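ComBat adjusts each gene for batch-specific shifts in location (mean) and scale (variance). The Python sketch below shows only that location/scale idea for a single gene. It is a simplified illustration and not ComBat itself: there is no empirical Bayes shrinkage and no protection of biological covariates. The function name and toy data are hypothetical.

```python
from statistics import mean, pstdev

def location_scale_adjust(values, batches):
    """Standardize one gene within each batch, then rescale to the pooled
    mean and SD. Simplified illustration only: unlike ComBat, it uses no
    empirical Bayes shrinkage and does not model biological covariates."""
    pooled_mean = mean(values)
    pooled_sd = pstdev(values)
    adjusted = [0.0] * len(values)
    for batch in set(batches):
        idx = [i for i, b in enumerate(batches) if b == batch]
        batch_vals = [values[i] for i in idx]
        m = mean(batch_vals)
        s = pstdev(batch_vals) or 1.0  # guard against a zero-variance batch
        for i in idx:
            adjusted[i] = (values[i] - m) / s * pooled_sd + pooled_mean
    return adjusted

# Toy example: batch "b" is shifted by +10 relative to batch "a"
print(location_scale_adjust([1, 2, 3, 11, 12, 13], ["a", "a", "a", "b", "b", "b"]))
```

After adjustment, the two batches share the same mean and spread. If batch membership is confounded with biology (for example, one batch contains mostly one subtype), this kind of correction also removes real signal. That risk is the reason ComBat accepts a model matrix of biological covariates.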
1.2 m6A gene grouping and m6A subtype classification of breast cancer
The m6A gene set contains 8 writer genes (METTL3, METTL14, RBM15, RBM15B, WTAP, KIAA1429, CBLL1, ZC3H13), 2 eraser genes (ALKBH5, FTO) and 11 reader genes (YTHDC1, YTHDC2, YTHDF1, YTHDF2, YTHDF3, IGF2BP1, HNRNPA2B1, HNRNPC, FMR1, LRPPRC, ELAVL1). Based on biological function, the m6A-related genes were divided into two groups: Set1 (writers + readers) and Set2 (erasers). Samples were then divided into four subtypes according to the median Z-score of Set1 and Set2 gene expression in each sample: Quiescent (Set1 ≤ 0, Set2 ≤ 0), m6A methylation (Set1 > 0, Set2 ≤ 0), Protein binding (Set1 ≤ 0, Set2 > 0) and Mixed (Set1 > 0, Set2 > 0).
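The Python sketch below illustrates the four-way classification. It makes two assumptions that the text does not spell out. First, each gene is Z-scored across samples. Second, the "median Z-score" of a set is the median over that set's genes within one sample. The gene lists are copied from the paragraph above. The expression-matrix layout (gene → list of per-sample values) is hypothetical.

```python
from statistics import mean, median, pstdev

# Gene lists as given in section 1.2
WRITERS = ["METTL3", "METTL14", "RBM15", "RBM15B", "WTAP", "KIAA1429", "CBLL1", "ZC3H13"]
ERASERS = ["ALKBH5", "FTO"]
READERS = ["YTHDC1", "YTHDC2", "YTHDF1", "YTHDF2", "YTHDF3", "IGF2BP1",
           "HNRNPA2B1", "HNRNPC", "FMR1", "LRPPRC", "ELAVL1"]
SET1 = WRITERS + READERS
SET2 = ERASERS

def zscore_by_gene(expr):
    """Z-score each gene across samples (assumed); expr maps gene -> values."""
    z = {}
    for gene, vals in expr.items():
        m, s = mean(vals), pstdev(vals)
        z[gene] = [(v - m) / s if s > 0 else 0.0 for v in vals]
    return z

def classify_m6a(expr, set1=SET1, set2=SET2):
    """Assign each sample a subtype from the per-sample median Z-score of
    the Set1 genes and of the Set2 genes present in expr (assumption)."""
    z = zscore_by_gene(expr)
    genes1 = [g for g in set1 if g in z]
    genes2 = [g for g in set2 if g in z]
    n_samples = len(next(iter(expr.values())))
    labels = []
    for j in range(n_samples):
        s1 = median(z[g][j] for g in genes1)
        s2 = median(z[g][j] for g in genes2)
        if s1 <= 0 and s2 <= 0:
            labels.append("Quiescent")
        elif s1 > 0 and s2 <= 0:
            labels.append("m6A methylation")
        elif s1 <= 0 and s2 > 0:
            labels.append("Protein binding")
        else:
            labels.append("Mixed")
    return labels

# Toy matrix with four samples covering all four quadrants
toy = {"METTL3": [1, 3, 1, 3], "YTHDF1": [1, 3, 1, 3],
       "FTO": [1, 1, 3, 3], "ALKBH5": [1, 1, 3, 3]}
print(classify_m6a(toy))
```

On the toy matrix, the four samples fall into Quiescent, m6A methylation, Protein binding and Mixed, in that order. Because the Z-scores are centered within the cohort, a sample's subtype depends on the other samples in the same dataset.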
1.3 Analysis of the prognosis and clinicopathological indicators of the four m6A subtypes of breast cancer
The R package survival was used to analyze the prognosis of the four m6A subtypes of breast cancer, and a circos plot was used to display the correlation between the four subtypes and different clinicopathological indicators.
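Prognosis by subtype is typically summarized with Kaplan–Meier curves. In the R survival package these come from `survfit(Surv(time, status) ~ subtype)`. The text does not say which test was used to compare the curves; a log-rank test (`survdiff`) is common, but that is an assumption. The Python sketch below shows the product-limit estimator for one subtype's samples. The function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.
    events: 1 = death observed, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all samples that share this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censored samples both leave the risk set
    return curve

print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Here the censored sample at t = 2 does not lower the curve, but it does shrink the risk set. That is why the second death halves the survival estimate instead of reducing it by a third.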
1.4 Analysis of alternative splicing, mutation and copy number variation in the four breast cancer m6A subtypes
Alternative splicing data for the TCGA dataset were downloaded from TCGA SpliceSeq. The database provides seven types of alternative splicing events: Exon Skip (ES), Alternate Promoter (AP), Mutually Exclusive Exons (ME), Alternate Terminator (AT), Retained Intron (RI), Alternate Donor site (AD) and Alternate Acceptor site (AA). The download parameters were set to Percentage of Samples with PSI Value = 0, Minimum PSI Range (delta across samples) = 0 and Minimum PSI Standard Deviation = 0. The PSI data were then filtered with Percentage of Samples with PSI Value = 0.75 [19]. Genes annotated with the GO term GO:0006281 (DNA repair) were downloaded from the MSigDB database (v7.1). The R package maftools was used to extract the 10 most frequently mutated genes in the TCGA-BRCA dataset and the 10 most frequently mutated GO:0006281 genes, together with the mutation types and copy number variation types of these 20 genes. Heat maps were used to show the distribution of mutations and copy number variations of the top 10 tumor driver genes and the top 10 DNA damage repair genes across the four m6A subtypes.
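The Python sketch below illustrates two filtering steps from this paragraph. The first is the PSI filter: it assumes "Percentage of Samples with PSI Value = 0.75" keeps events that have a non-missing PSI in at least 75% of samples. The second ranks genes by the number of distinct mutated samples; ranking by mutated-sample count is an assumption about how maftools' top-10 lists were built. The data layouts and function names are hypothetical.

```python
from collections import Counter

def filter_psi_events(psi, min_fraction=0.75):
    """Keep events whose PSI is non-missing (not None) in at least
    min_fraction of samples; psi maps event id -> per-sample PSI list."""
    kept = {}
    for event, values in psi.items():
        present = sum(v is not None for v in values)
        if values and present / len(values) >= min_fraction:
            kept[event] = values
    return kept

def top_mutated_genes(mutations, n=10, gene_subset=None):
    """Rank genes by number of distinct mutated samples.
    mutations: iterable of (sample_id, gene); gene_subset: optional set,
    e.g. a GO:0006281 gene list, to restrict the ranking."""
    # Count each gene once per sample, so repeated hits in one sample count once
    pairs = {(sample, gene) for sample, gene in mutations
             if gene_subset is None or gene in gene_subset}
    counts = Counter(gene for _, gene in pairs)
    # Sort by descending count, then gene name so ties are deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:n]]

psi = {"e1": [0.1, 0.2, None, 0.3], "e2": [None, None, 0.5, 0.4]}
print(list(filter_psi_events(psi)))
muts = [("s1", "TP53"), ("s2", "TP53"), ("s1", "TP53"), ("s1", "PIK3CA"), ("s3", "BRCA1")]
print(top_mutated_genes(muts, n=2))
```

The sketch assumes the input already contains only the variant classes to be counted. In maftools, that choice is made when the MAF file is read.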
1.5 Analysis of immune cell infiltration and immunotherapy response in the four breast cancer m6A subtypes
CIBERSORT (v1.01) was used to estimate the proportions of immune cells in tumor samples of the four subtypes [20]. Univariate Cox analysis of the proportions of the 22 immune cell types was performed with the coxph function of the R package survival (v3.2-3) to identify immune cell types whose infiltration is significantly associated with prognosis in the four m6A subtypes of breast cancer. In parallel, the proportions of the 22 immune cell types were compared among the four m6A subtypes by ANOVA to characterize their immune infiltration patterns. Multivariate Cox analysis was then performed on the CIBERSORT-estimated infiltration proportions, and each sample's risk score was calculated as the sum of each immune cell type's regression coefficient multiplied by its infiltration proportion. TIDE was used to compare the predicted immunotherapy response of the four m6A subtypes in the TCGA-BRCA and GSE96058 datasets, with default parameters (https://github.com/liulab-dfci/TIDEpy) [21].
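The risk score described above is the linear predictor of the multivariate Cox model:

risk = Σ (coefficient of cell type i × CIBERSORT proportion of cell type i)

The Python sketch below computes it. The cell-type names follow CIBERSORT LM22 naming. The coefficient values are hypothetical placeholders, not fitted results from this study.

```python
def risk_scores(proportions, coefficients):
    """Per-sample risk score = sum(coef_i * proportion_i) over the cell
    types in the multivariate Cox model.
    proportions: list of dicts (cell type -> CIBERSORT fraction), one per sample.
    coefficients: dict (cell type -> Cox regression coefficient)."""
    return [sum(coef * sample[cell] for cell, coef in coefficients.items())
            for sample in proportions]

# Hypothetical coefficients for illustration only
coefs = {"T cells CD8": -1.5, "Macrophages M2": 2.0}
samples = [{"T cells CD8": 0.2, "Macrophages M2": 0.1},
           {"T cells CD8": 0.0, "Macrophages M2": 0.5}]
print(risk_scores(samples, coefs))
```

A negative coefficient, here for CD8 T cells, lowers the score and thus the predicted hazard. Exponentiating the score gives the hazard relative to a sample with all included proportions equal to zero.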
1.6 Analysis of differences in anthracycline sensitivity among the four m6A subtypes
Following the method described in 1.2, breast cancer samples from GSE25066 were divided into four m6A subtypes. Twelve m6A-related genes were used for sample typing: Set1 (FTO, WTAP, METTL3, RBM15B, ZC3H13) and Set2 (HNRNPA2B1, HNRNPC, ELAVL1, YTHDC1, LRPPRC, FMR1, YTHDC2). In addition, anthracycline-sensitive and non-sensitive samples were identified from the GSE25066 clinical data, and the numbers of sensitive and non-sensitive samples were compared among the four m6A subtypes.
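The text does not name the test used to compare sensitive and non-sensitive counts across subtypes. One standard option is a Pearson chi-square test on the subtype × response contingency table; Fisher's exact test is an alternative when expected counts are small. The Python sketch below builds the table and computes the chi-square statistic. The response labels and function names are hypothetical.

```python
from collections import Counter

def contingency_table(subtypes, responses, levels=("sensitive", "non-sensitive")):
    """Count samples in each (subtype, response) cell; rows sorted by subtype."""
    counts = Counter(zip(subtypes, responses))
    rows = sorted(set(subtypes))
    return rows, [[counts[(r, lv)] for lv in levels] for r in rows]

def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

rows, table = contingency_table(["A", "A", "B", "B"],
                                ["sensitive", "sensitive", "non-sensitive", "non-sensitive"])
print(rows, table, chi_square_statistic(table))
```

With four subtypes and two response classes, the statistic has (4 − 1) × (2 − 1) = 3 degrees of freedom. `scipy.stats.chi2_contingency` returns the statistic and p-value together.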
1.7 Analysis of the stemness index of the four m6A subtypes of breast cancer
Stemness index (mRNAsi) values for TCGA-BRCA samples were obtained from the study PMID 29625051 [22], and t-tests were used to compare mRNAsi among the four m6A subtypes.
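A t-test compares two groups at a time, so comparing four subtypes implies pairwise tests: 6 pairs for 4 subtypes. The text does not say whether Student's or Welch's t-test was used, or whether p-values were corrected for multiple comparisons. The Python sketch below assumes Welch's test and computes the statistic and Welch–Satterthwaite degrees of freedom for every pair. The function names and toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def pairwise_t(groups):
    """Welch t for every pair of subtypes; groups maps name -> mRNAsi values."""
    return {(g1, g2): welch_t(groups[g1], groups[g2])
            for g1, g2 in combinations(sorted(groups), 2)}

print(pairwise_t({"A": [1, 2, 3], "B": [4, 5, 6], "C": [2, 4, 9]}))
```

For p-values, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.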
1.8 Statistical analysis
R (v4.0) was used for statistical tests. ANOVA or t-tests were used to analyze differences among the m6A subtypes, and P < 0.05 was considered statistically significant.
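The one-way ANOVA used to compare the subtypes partitions variation into between-group and within-group sums of squares. It then tests their ratio:

F = (SS_between / (k − 1)) / (SS_within / (n − k))

The Python sketch below computes F and its degrees of freedom. It is an illustration rather than the authors' code, and the function name is hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom.
    groups: list of lists of observations, one list per subtype."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

In R the equivalent is `summary(aov(value ~ subtype))`, and in Python `scipy.stats.f_oneway(*groups)` also returns the p-value. A significant F only shows that at least one subtype differs. Identifying which pairs differ requires post hoc comparisons.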