Establishment and identification of differentially expressed miRNAs
The clinicopathological characteristics of patients in the training and test sets were compared, and no statistically significant differences were found between the two sets (Table 1). The high-TMB and low-TMB groups contained 36 and 158 samples, respectively. The screening criteria were |log2FC| > 0.263 and a false discovery rate (FDR)-adjusted P value < 0.01. Under these criteria, 48 miRNAs were differentially expressed between samples with high and low TMB levels: 22 were upregulated and 26 were downregulated in samples with high TMB levels. The expression of these miRNAs in the high- and low-TMB groups is visualized in Figure 1. The heatmap shows that the differentially expressed miRNAs largely separate samples with high and low TMB levels.
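The screen above combines a fold-change cutoff with an FDR cutoff. A |log2FC| of 0.263 corresponds to roughly a 1.2-fold change, since 2^0.263 ≈ 1.20. The sketch below shows how such a screen could be applied. It assumes that the log2 fold changes and raw P values were already produced by some upstream differential expression test, which the text does not name. It also assumes that the FDR adjustment is the Benjamini-Hochberg procedure. The miRNA names in the usage example are hypothetical placeholders.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR-adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_de_mirnas(results, lfc_cutoff=0.263, fdr_cutoff=0.01):
    """Split miRNAs into up- and downregulated lists (high vs low TMB).

    `results` maps a miRNA name to (log2 fold change, raw P value); both
    values are assumed to come from an upstream differential expression test.
    """
    names = list(results)
    fdr = bh_adjust([results[name][1] for name in names])
    up, down = [], []
    for name, q in zip(names, fdr):
        lfc = results[name][0]
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(name)
    return up, down


# Hypothetical input: name -> (log2FC high vs low TMB, raw P value)
example = {
    "miR-a": (1.0, 1e-6),   # passes both cutoffs, upregulated
    "miR-b": (-0.5, 1e-5),  # passes both cutoffs, downregulated
    "miR-c": (0.1, 1e-8),   # significant but fold change too small
    "miR-d": (2.0, 0.5),    # large fold change but not significant
}
up, down = screen_de_mirnas(example)
print(up, down)
```

Both criteria must hold at once. A miRNA with a tiny P value but a negligible fold change is therefore excluded, and so is a miRNA with a large but noisy fold change.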
Signature acquisition and identification through LASSO
The expression of the 48 differentially expressed miRNAs in the training set was analyzed by LASSO logistic regression. The aim was to build a miRNA-based model that discriminates TMB level. The LASSO analysis identified 18 miRNAs, corresponding to the largest AUC, as a signature for discriminating TMB level (Figure 2a). These 18 miRNAs are hsa-miR-296-5p, hsa-miR-155-5p, hsa-miR-6761-5p, hsa-miR-582-5p, hsa-miR-452-3p, hsa-miR-330-5p, hsa-miR-3127-5p, hsa-miR-146b-5p, hsa-miR-99a-5p, hsa-miR-874-3p, hsa-miR-132-3p, hsa-miR-625-3p, hsa-miR-552-5p, hsa-miR-195-3p, hsa-miR-452-5p, hsa-miR-224-5p, hsa-miR-582-3p and hsa-miR-592. Figures 2b and 2c show the PCA results for the 48 differentially expressed miRNAs and for the 18 miRNAs in the model, respectively. The PCA confirms that the 18 miRNAs selected by LASSO clearly separate samples with high and low TMB levels.
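The text does not state which software fitted the LASSO model, and it does not describe how the penalty was tuned beyond the largest-AUC criterion. R's glmnet (`cv.glmnet` with `family = "binomial"` and `type.measure = "auc"`) is a common choice for this, but that is an assumption. The minimal pure-Python sketch below instead illustrates the underlying technique: L1-penalized logistic regression fitted by proximal gradient descent (ISTA). It shows how the L1 penalty drives the coefficients of uninformative features to exactly zero, which is how LASSO reduces a candidate list to a smaller signature. The cross-validation over the penalty `lam` is omitted, and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    # Proximal operator of t*|w|: shrinks v toward 0 and zeroes small values.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Minimise mean logistic loss + lam * sum(|w_j|) by ISTA.

    The intercept b is not penalised. Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals sigmoid(b + x.w) - y give the gradient of the mean loss.
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n
                  for j in range(p)]
        grad_b = sum(resid) / n
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[-2.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [2.0, -0.1]]
y = [0, 0, 1, 1]
w, b = lasso_logistic(X, y, lam=0.2)
print(w, b)
```

With `lam=0.2`, the informative feature keeps a positive weight and the noise feature is set to exactly 0.0. With a very large `lam`, every weight is zeroed. Sweeping `lam` and scoring each fit by cross-validated AUC would reproduce the kind of selection described above.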
Validation of the miRNA-based model
Based on the LASSO regression analysis, the index is calculated as follows:

$$
\begin{aligned}
\text{index} = {} & -0.0218449096244075\,\mathrm{EXP}_{\text{hsa-miR-296-5p}} + 0.140980802896763\,\mathrm{EXP}_{\text{hsa-miR-155-5p}} \\
& - 0.0618130806762854\,\mathrm{EXP}_{\text{hsa-miR-6761-5p}} + 0.792005574142893\,\mathrm{EXP}_{\text{hsa-miR-582-5p}} \\
& + 0.0400501457312491\,\mathrm{EXP}_{\text{hsa-miR-452-3p}} + 0.351535745471987\,\mathrm{EXP}_{\text{hsa-miR-330-5p}} \\
& + 0.321466850321636\,\mathrm{EXP}_{\text{hsa-miR-3127-5p}} + 0.306550028173435\,\mathrm{EXP}_{\text{hsa-miR-146b-5p}} \\
& - 0.0129214249555584\,\mathrm{EXP}_{\text{hsa-miR-99a-5p}} - 0.291557428450146\,\mathrm{EXP}_{\text{hsa-miR-874-3p}} \\
& + 0.312801874996257\,\mathrm{EXP}_{\text{hsa-miR-132-3p}} + 0.71353375237544\,\mathrm{EXP}_{\text{hsa-miR-625-3p}} \\
& - 0.339018514344295\,\mathrm{EXP}_{\text{hsa-miR-552-5p}} - 0.506312988185146\,\mathrm{EXP}_{\text{hsa-miR-195-3p}} \\
& - 0.203390691480266\,\mathrm{EXP}_{\text{hsa-miR-452-5p}} - 0.038165302905481\,\mathrm{EXP}_{\text{hsa-miR-224-5p}} \\
& + 0.0188966224605559\,\mathrm{EXP}_{\text{hsa-miR-582-3p}} - 0.303904239862044\,\mathrm{EXP}_{\text{hsa-miR-592}}
\end{aligned}
$$

where each EXP term is the expression level of the subscripted miRNA. The performance of the model is summarized in Table 2. The accuracy is 0.9753 in the training set, 0.964 in the test set and 0.9598 in the total set, indicating that the model is highly reliable. ROC curve analysis gives AUCs of 0.998, 0.958 and 0.982 in the training, test and total sets, respectively (Figures 3a and 3b). The training and test AUCs are close, which suggests that the model generalizes to unseen samples and supports its accuracy.
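The coefficients below are transcribed from the index formula. The sketch computes a sample's index as the weighted sum of its 18 miRNA expression values, with no intercept, as in the formula. It also computes a rank-based ROC AUC: the probability that a randomly chosen high-TMB sample scores above a randomly chosen low-TMB sample, with ties counted as one half. Two points are assumptions. First, the expression values must be on the same normalized scale used to fit the model, which the text does not specify. Second, the labels use 1 for high TMB and 0 for low TMB. The text also does not report the cutoff used to turn the index into the high/low calls behind the accuracy figures, so no cutoff is applied here.

```python
# Coefficients transcribed from the LASSO index formula.
COEFFICIENTS = {
    "hsa-miR-296-5p": -0.0218449096244075,
    "hsa-miR-155-5p": 0.140980802896763,
    "hsa-miR-6761-5p": -0.0618130806762854,
    "hsa-miR-582-5p": 0.792005574142893,
    "hsa-miR-452-3p": 0.0400501457312491,
    "hsa-miR-330-5p": 0.351535745471987,
    "hsa-miR-3127-5p": 0.321466850321636,
    "hsa-miR-146b-5p": 0.306550028173435,
    "hsa-miR-99a-5p": -0.0129214249555584,
    "hsa-miR-874-3p": -0.291557428450146,
    "hsa-miR-132-3p": 0.312801874996257,
    "hsa-miR-625-3p": 0.71353375237544,
    "hsa-miR-552-5p": -0.339018514344295,
    "hsa-miR-195-3p": -0.506312988185146,
    "hsa-miR-452-5p": -0.203390691480266,
    "hsa-miR-224-5p": -0.038165302905481,
    "hsa-miR-582-3p": 0.0188966224605559,
    "hsa-miR-592": -0.303904239862044,
}


def signature_index(expression):
    """Weighted sum of the 18 miRNA expression values (no intercept).

    `expression` maps miRNA name -> expression value; a missing miRNA
    raises KeyError rather than being silently treated as zero.
    """
    return sum(coef * expression[mirna] for mirna, coef in COEFFICIENTS.items())


def roc_auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

The pairwise loop is O(n_pos × n_neg), which is adequate for a few hundred samples. Library routines compute the same quantity from sorted ranks.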
Correlation analysis between the signature, TMB and immune checkpoints
To assess the correlation of the model with TMB and with immune checkpoints, the index of each sample in the total set was calculated. The index was merged with the integrated TMB and miRNA data and, in parallel, with the pre-processed transcriptome profiling data. As shown in Figures 4a-d, the miRNA-based index is positively correlated with TMB (Pearson R = 0.47, P < 2.2e-16, Figure 4a). It is more weakly positively correlated with CTLA4 (Pearson R = 0.34, P = 4.2e-10, Figure 4b) and CD274 (Pearson R = 0.39, P = 7.1e-13, Figure 4c). In contrast, there is no significant correlation between the model and SNCA (Pearson R = -0.034, P = 0.54, Figure 4d).
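The Pearson R values above measure linear association between each sample's index and TMB or checkpoint expression. The sketch below shows the computation of R itself. It also shows the t statistic from which the usual two-sided P value is derived, using a Student t distribution with n - 2 degrees of freedom. The text does not say which software produced the reported values, and the input lists in the usage lines are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0; follows Student t with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-sample values (index vs. TMB), for illustration only.
index_values = [1.2, 0.4, 2.5, 1.9, 0.1]
tmb_values = [3.0, 1.1, 6.2, 4.0, 0.9]
r = pearson_r(index_values, tmb_values)
print(r, t_statistic(r, len(index_values)))
```

In practice, `scipy.stats.pearsonr(x, y)` returns both R and the two-sided P value. The sketch only makes explicit that, for a fixed R, the P value shrinks as the number of samples grows.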
Enrichment analysis of the predicted target genes of the miRNAs in the model
The target genes of the miRNAs in the model were predicted using three databases. GO annotation covers three aspects of gene function: the biological processes in which genes are involved, the cellular components in which their products are located, and their molecular functions. GO analysis showed that the target genes were enriched in terms such as “DNA-binding transcription activator activity, RNA polymerase II-specific”, “transforming growth factor beta receptor, cytoplasmic mediator activity” and “phosphatase binding”. KEGG is a resource for understanding the high-level functions of biological systems from molecular-level information. KEGG analysis showed that the target genes of the 18 miRNAs were mainly enriched in cancer and cancer-related signaling pathways, for example “Colorectal cancer” and “MAPK signaling pathway”.
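The text does not name the enrichment tool or the background gene set. Over-representation analysis of GO terms and KEGG pathways is, however, commonly based on a one-sided hypergeometric test. The minimal sketch below assumes that test. N is the background size, K is the number of background genes annotated to a term, n is the number of predicted target genes, and k is the number of targets annotated to that term. All function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    This is the probability of seeing at least k term-annotated genes
    among n targets drawn at random from a background of N genes,
    K of which carry the annotation.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    upper = min(K, n)
    # Exact integer arithmetic for the tail, divided once at the end.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers, for illustration only.
print(hypergeom_enrichment_p(k=5, n=5, K=5, N=10))  # all 5 targets annotated
```

A term is reported as enriched when this P value, after multiple-testing correction across all tested terms, falls below the chosen threshold.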