Selection of Prognostic Microbial Community in CRC
The expression matrix of 1406 microbial genera from the TCGA-CRC microbiome dataset was merged with the corresponding survival data. Univariate Cox regression analysis identified 102 prognostic microbial genera associated with survival outcomes (Fig. S1, Table S1). Of these, 91 genera had a beneficial effect on CRC prognosis (HR < 1; p < 0.05), while 11 genera were detrimental to prognosis (HR > 1; p < 0.05). The detrimental genera were Cytomegalovirus, Luteimonas, Molluscipoxvirus, Enterovirus, Simplexvirus, Xylella, Candidatus_Stoquefichus, Frankia, Lawsonia, Gemmata, and Rickettsia (Fig. 1A).
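The screening rule described above (keep genera with p < 0.05, then split them by whether HR is below or above 1) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the class name, function name, hypothetical genus labels, and every HR and p value in the toy list are placeholders introduced here.

```python
from typing import NamedTuple


class CoxResult(NamedTuple):
    genus: str
    hazard_ratio: float
    p_value: float


def split_by_prognosis(results, alpha=0.05):
    """Split significant genera into protective (HR < 1) and detrimental (HR > 1)."""
    protective, detrimental = [], []
    for r in results:
        if r.p_value >= alpha:
            continue  # not significantly associated with survival
        if r.hazard_ratio < 1:
            protective.append(r.genus)
        elif r.hazard_ratio > 1:
            detrimental.append(r.genus)
    return protective, detrimental


# Placeholder values for illustration only; these are NOT results from the study.
toy_results = [
    CoxResult("Frankia", 1.40, 0.010),
    CoxResult("Rickettsia", 1.10, 0.030),
    CoxResult("HypotheticalGenusA", 0.80, 0.020),
    CoxResult("HypotheticalGenusB", 1.20, 0.400),
]
protective, detrimental = split_by_prognosis(toy_results)
print("protective:", protective)
print("detrimental:", detrimental)
```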
Construction of the MAPR Model
The 11 detrimental microbial genera were subjected to LASSO regression to select the most prognostically valuable genera among them (Fig. 1B, Fig. 1C). Subsequent multivariate Cox analysis yielded a final set of 10 microbial genera (the 11 listed above, excluding Xylella). The risk score was calculated as follows: risk score = Cytomegalovirus × (1.12983684531639) + Luteimonas × (1.16977972620878) + Molluscipoxvirus × (1.11448749792434) + Enterovirus × (1.08597536162487) + Simplexvirus × (1.02550888354682) + Candidatus_Stoquefichus × (1.21314521259411) + Frankia × (1.37338663289743) + Lawsonia × (1.06165609822309) + Gemmata × (1.24826562320323) + Rickettsia × (1.09165364398354). This formula yields a risk score for every CRC patient, and patients were categorized into high-risk and low-risk groups using the median risk score as the cutoff. We also examined the correlations among the 10 microbial genera in the MAPR model and found that the majority were positively correlated (Fig. 1D), with correlation coefficients greater than 0 (Fig. 1E).
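As a worked illustration of the formula above, the following sketch computes the weighted sum and applies a median split. The weights are copied from the text; the patient identifiers and abundance values are hypothetical, and two choices are assumptions made here rather than details given in the text: a genus missing from a patient's record is treated as zero, and a score exactly equal to the median is assigned to the low-risk group.

```python
from statistics import median

# Weights copied from the risk-score formula in the text.
MAPR_WEIGHTS = {
    "Cytomegalovirus": 1.12983684531639,
    "Luteimonas": 1.16977972620878,
    "Molluscipoxvirus": 1.11448749792434,
    "Enterovirus": 1.08597536162487,
    "Simplexvirus": 1.02550888354682,
    "Candidatus_Stoquefichus": 1.21314521259411,
    "Frankia": 1.37338663289743,
    "Lawsonia": 1.06165609822309,
    "Gemmata": 1.24826562320323,
    "Rickettsia": 1.09165364398354,
}


def mapr_risk_score(abundance):
    """Weighted sum over the 10 genera; a missing genus counts as 0 (assumption)."""
    return sum(w * abundance.get(genus, 0.0) for genus, w in MAPR_WEIGHTS.items())


def stratify_by_median(scores):
    """Label each patient 'high' if above the median score, else 'low' (assumption for ties)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical patients and abundances, for illustration only.
toy_patients = {
    "P1": {"Frankia": 1.0},
    "P2": {"Frankia": 2.0, "Gemmata": 1.0},
    "P3": {},
    "P4": {"Rickettsia": 0.5},
}
scores = {pid: mapr_risk_score(a) for pid, a in toy_patients.items()}
groups = stratify_by_median(scores)
print(groups)
```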
Clinical Prognosis of CRC Using the MAPR Model
To further investigate the prognostic value of the MAPR model in CRC, we applied the model to four survival periods in CRC (OS, DSS, PFS, and DFS) and examined the association between the MAPR risk score and survival outcome for each period separately. For OS, 559 CRC patients were stratified into high-risk and low-risk groups according to the MAPR risk score. As the risk score increased (Fig. 2A), the proportion of patient deaths rose (Fig. 2B). For DSS, the MAPR model was used to categorize 538 CRC patients into high-risk and low-risk groups, and the proportion of patient deaths (Fig. 2E) was positively correlated with the risk score (Fig. 2D). For PFS, we likewise observed a higher incidence of patient mortality as the risk score increased (Fig. 2G, Fig. 2H). Lastly, for DFS, the MAPR model was used to stratify 226 CRC patients with available DFS information into high-risk and low-risk groups, with consistent results: the proportion of patient deaths (Fig. 2K) increased with the risk score (Fig. 2J). We also visualized the relationship between the expression of the 10 microbial taxa and the high-risk/low-risk stratification derived from the MAPR model (Fig. 2C, Fig. 2F, Fig. 2I, Fig. 2L). The trend was consistent across these taxa: expression levels were higher in patients assigned to the high-risk MAPR group.
Using the MAPR risk scores, we then assessed their prognostic significance in CRC. Irrespective of the survival period examined, the K-M survival curves showed that CRC patients in the high-risk MAPR group had a poorer prognosis. This association was observed for OS (P < 0.001, Fig. 3A), DSS (P = 0.006, Fig. 3B), PFS (P = 0.002, Fig. 3C), and DFS (P = 0.005, Fig. 3D).
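For readers unfamiliar with K-M curves, the product-limit estimator behind them can be sketched in a few lines. This is a generic textbook illustration with made-up follow-up times, not the study's data or code; the function name and the toy inputs are assumptions introduced here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  : follow-up durations
    events : 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Made-up follow-up data for illustration only.
toy_curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
for t, s in toy_curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Computing one such curve per risk group and comparing them (for example with a log-rank test) gives the kind of comparison reported in Fig. 3.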
Clinical Prognostic Value of the MAPR Model in CRC
To investigate the association between the MAPR model and clinical characteristics in CRC, we integrated the MAPR model with the clinical information of CRC patients and visualized the relationship between MAPR risk scores and various clinical features (such as age, gender, stage, T, M, and N) (Fig. 4A). We then dichotomized the clinical features: age (≤ 60 vs. > 60), stage (early, Stage I-II, vs. late, Stage III-IV), T stage (early, T1-2, vs. late, T3-4), M stage (M ≤ 1 vs. M > 1), and N stage (N ≤ 1 vs. N > 1). The results are presented for each of the four survival periods. For OS (Fig. 4B), within subgroups defined by certain clinical features (age, gender, stage, and N), patients with high-risk MAPR scores had shorter OS. For DSS, patients with high-risk MAPR scores also had significantly reduced DSS (Fig. 4C). Similarly, for PFS (Fig. 4D) and DFS (Fig. 4E), higher MAPR risk scores were associated with shorter expected PFS and DFS.
For some clinical subgroups, the difference in survival between the high-risk and low-risk MAPR groups did not reach statistical significance (Fig. S2). This does not negate the predictive ability of the MAPR model; the lack of statistical significance may be attributable to the small sample sizes that result from dividing patients into clinical subgroups.
The Independent Clinical Value of the MAPR Model in CRC
We next evaluated the MAPR model as a prognostic indicator in CRC across the four survival periods. For OS, the area under the receiver operating characteristic curve (AUC) was 0.675, 0.755, and 0.792 for 3-year, 5-year, and 10-year predictions, respectively (Fig. 5A). For DSS, the AUC values for 3-year, 5-year, and 10-year predictions were 0.655, 0.697, and 0.721, respectively (Fig. 5B). For PFS, the corresponding AUC values were 0.626, 0.667, and 0.742, respectively (Fig. 5C). For DFS, the AUC values were 0.669, 0.817, and 0.925, respectively (Fig. 5D). These results highlight the predictive capability of the MAPR model for various survival periods in CRC patients.
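To make the idea of an AUC at a fixed time horizon concrete, the sketch below uses a deliberately simplified definition: patients with an observed event by the horizon are cases, patients still under follow-up after the horizon are controls, and patients censored before the horizon are dropped. Proper time-dependent ROC methods weight for censoring instead of dropping patients, so this is an approximation for illustration only; the function name and all toy values are assumptions, and the text does not state which method was used.

```python
def auc_at_horizon(scores, times, events, horizon):
    """Naive AUC for predicting an event by `horizon`.

    Cases    : event observed at or before the horizon.
    Controls : follow-up extends beyond the horizon.
    Patients censored before the horizon are dropped (a simplification).
    """
    cases = [s for s, t, e in zip(scores, times, events) if e == 1 and t <= horizon]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Fraction of case-control pairs in which the case has the higher score;
    # ties count as half.
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Made-up risk scores and follow-up (in years), for illustration only.
toy_scores = [3.0, 0.8, 1.0, 0.5, 2.0]
toy_years = [1.0, 2.0, 5.0, 6.0, 1.5]
toy_events = [1, 1, 0, 0, 0]
auc_3yr = auc_at_horizon(toy_scores, toy_years, toy_events, horizon=3.0)
print(auc_3yr)
```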
Furthermore, we assessed whether the MAPR risk score was prognostic independently of other clinical factors. In separate analyses for each of the four survival periods, the risk score was significant in both univariate and multivariate Cox analyses for OS, DSS, and PFS; for DFS it was significant in the univariate analysis (p = 0.014) but not in the multivariate analysis (p = 0.117) (Table 1, Fig. S3). The MAPR model could therefore essentially serve as an independent prognostic indicator, which aligns with our previous findings. To compare the predictive abilities of MAPR and other clinical indicators in the four survival models, we assessed the AUC values for MAPR risk scores, age, gender, and stage. For OS, the AUC values were 0.755, 0.674, 0.528, and 0.702, respectively (Fig. 5E). For DSS, the AUC values were 0.777, 0.534, 0.598, and 0.794, respectively (Fig. 5F). For PFS, the AUC values were 0.742, 0.448, 0.604, and 0.683, respectively (Fig. 5G). Lastly, for DFS, the AUC values were 0.817, 0.523, 0.730, and 0.617, respectively (Fig. 5H). Thus, the MAPR risk score had the highest AUC for OS, PFS, and DFS, and was second only to stage for DSS (0.777 vs. 0.794).
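The comparison of predictors can be checked directly from the AUC values quoted above. The short sketch below tabulates those values (copied from the text) and reports the top-ranked predictor for each survival period; the variable names are introduced here for illustration.

```python
# AUC values as quoted in the text (Fig. 5E-H).
AUC_BY_ENDPOINT = {
    "OS":  {"MAPR": 0.755, "age": 0.674, "gender": 0.528, "stage": 0.702},
    "DSS": {"MAPR": 0.777, "age": 0.534, "gender": 0.598, "stage": 0.794},
    "PFS": {"MAPR": 0.742, "age": 0.448, "gender": 0.604, "stage": 0.683},
    "DFS": {"MAPR": 0.817, "age": 0.523, "gender": 0.730, "stage": 0.617},
}

# For each endpoint, pick the predictor with the largest AUC.
best_predictor = {
    endpoint: max(aucs, key=aucs.get) for endpoint, aucs in AUC_BY_ENDPOINT.items()
}
for endpoint, name in best_predictor.items():
    print(f"{endpoint}: {name} ({AUC_BY_ENDPOINT[endpoint][name]:.3f})")
```

With these figures, the MAPR risk score ranks first for OS, PFS, and DFS, while stage is marginally higher for DSS.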
Table 1
Univariate and multivariate analysis of the risk scores for four survival times and their clinical correlations
| Survival period | Factors | Univariate Cox HR | HR.95L | HR.95H | pvalue | Multivariate Cox HR | HR.95L | HR.95H | pvalue |
|---|---|---|---|---|---|---|---|---|---|
| OS | Age | 1.03113 | 1.014639 | 1.047889 | 0.000194 | 1.042644 | 1.025067 | 1.060521 | 1.48E-06 |
| OS | Gender | 1.128967 | 0.774224 | 1.646251 | 0.528494 | 0.986494 | 0.672815 | 1.446417 | 0.944478 |
| OS | Stage | 2.136732 | 1.721354 | 2.652343 | 5.81E-12 | 2.495151 | 1.978428 | 3.146831 | 1.14E-14 |
| OS | riskScore | 1.669663 | 1.412887 | 1.973106 | 1.78E-09 | 1.618034 | 1.374889 | 1.90418 | 6.96E-09 |
| DSS | Age | 1.010144 | 0.990601 | 1.030073 | 0.311254 | 1.025596 | 1.004762 | 1.046862 | 0.015795 |
| DSS | Gender | 1.181606 | 0.725283 | 1.925032 | 0.502776 | 0.943285 | 0.573352 | 1.551904 | 0.818208 |
| DSS | Stage | 3.52615 | 2.583086 | 4.813519 | 2.08E-15 | 3.924133 | 2.841796 | 5.418693 | 1.01E-16 |
| DSS | riskScore | 1.787133 | 1.328408 | 2.404265 | 0.000125 | 1.842491 | 1.374962 | 2.468993 | 4.27E-05 |
| PFS | Age | 0.998122 | 0.984754 | 1.011671 | 0.784646 | 1.00303 | 0.989057 | 1.0172 | 0.672544 |
| PFS | Gender | 1.34466 | 0.957778 | 1.887819 | 0.087126 | 1.317661 | 0.936326 | 1.8543 | 0.113527 |
| PFS | Stage | 2.388702 | 1.959765 | 2.911521 | 6.54E-18 | 2.408147 | 1.970505 | 2.942987 | 8.83E-18 |
| PFS | riskScore | 1.338264 | 1.117481 | 1.602668 | 0.001538 | 1.319103 | 1.109816 | 1.567856 | 0.001678 |
| DFS | Age | 1.008882 | 0.977476 | 1.041298 | 0.583649 | 1.00823 | 0.973718 | 1.043965 | 0.644628 |
| DFS | Gender | 2.919626 | 1.237231 | 6.889758 | 0.014448 | 2.598009 | 1.054298 | 6.40203 | 0.037998 |
| DFS | Stage | 1.394753 | 0.805425 | 2.41529 | 0.234991 | 1.486763 | 0.829062 | 2.666224 | 0.183223 |
| DFS | riskScore | 1.731249 | 1.119493 | 2.677302 | 0.013609 | 1.438544 | 0.912539 | 2.267747 | 0.117384 |
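The HR, confidence limits, and p-value in each row of Table 1 are linked: under the usual Wald approximation on the log-hazard scale, the p-value can be recovered from the HR and its 95% confidence interval. The sketch below illustrates this with two riskScore rows copied from Table 1. That the table's p-values are Wald p-values is an assumption made here, and the function names are hypothetical.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided Wald p-value from an HR and its 95% CI."""
    # Standard error of log(HR), recovered from the width of the CI.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def ci_excludes_one(lower, upper):
    """True if the confidence interval does not contain HR = 1."""
    return lower > 1 or upper < 1


# riskScore rows from Table 1 (multivariate Cox).
p_os = wald_p_from_ci(1.618034, 1.374889, 1.90418)
p_dfs = wald_p_from_ci(1.438544, 0.912539, 2.267747)
print(f"OS : p = {p_os:.2e}, CI excludes 1: {ci_excludes_one(1.374889, 1.90418)}")
print(f"DFS: p = {p_dfs:.3f}, CI excludes 1: {ci_excludes_one(0.912539, 2.267747)}")
```

For the DFS row the interval spans 1 and the recovered p-value is close to the tabulated 0.117384, consistent with the risk score not reaching significance in the multivariate DFS model.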
Construction of the Nomogram for the MAPR Model
Nomograms are a visual, quantitative tool for assessing multifactorial diseases and are often used to estimate the survival probabilities of tumor patients. In this study, we incorporated multiple clinical factors and the MAPR model into nomograms for predicting the probabilities of 3-year, 5-year, and 10-year survival in CRC patients. We first constructed a nomogram excluding the MAPR model (Fig. 6A). We then developed four nomograms incorporating the MAPR model, one for each of the four survival periods: OS-nomogram (Fig. 6B), DSS-nomogram (Fig. 6C), PFS-nomogram (Fig. 6D), and DFS-nomogram (Fig. 6E). Compared with the nomogram without the MAPR model, inclusion of the MAPR model improved predictive ability for CRC prognosis, and the results were consistent across the four survival periods. Additionally, we generated calibration curves for the four MAPR nomograms at 3, 5, and 10 years (Figs. 6F-I). The calibration curves showed excellent agreement with the nomogram predictions.
Functional Enrichment Analysis and Drug Sensitivity Analysis of the MAPR Model
Differential gene expression analysis was performed in R to identify genes differentially expressed between the high-risk and low-risk groups defined by the MAPR model. These differentially expressed genes were then subjected to functional enrichment analysis. The GO analysis revealed a significant association between the MAPR model and cellular energy metabolism, including oxidoreductase activity acting on NAD(P)H, NAD(P)H dehydrogenase (quinone) activity, mitochondrial protein-containing complex, and cellular respiration (Fig. 7A, Fig. 7C). Additionally, the KEGG analysis indicated that the MAPR model was closely related to oxidative phosphorylation, chemical carcinogenesis - reactive oxygen species, and RNA polymerase (Fig. 7B, Fig. 7D).
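Enrichment analyses of this kind are commonly based on an over-representation test, which asks whether a term's genes appear among the differentially expressed genes more often than chance would predict. The text does not state which test was used, so the hypergeometric sketch below is a generic illustration only; the function name and the gene counts are made up.

```python
from math import comb


def enrichment_p_value(universe, in_term, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe : number of background genes
    in_term  : background genes annotated to the term
    selected : differentially expressed genes
    overlap  : differentially expressed genes annotated to the term
    """
    total = comb(universe, selected)
    upper = min(in_term, selected)
    tail = sum(
        comb(in_term, i) * comb(universe - in_term, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Made-up counts for illustration: 20 background genes, 5 in the term,
# 5 differentially expressed, 3 of which belong to the term.
toy_p = enrichment_p_value(universe=20, in_term=5, selected=5, overlap=3)
print(f"{toy_p:.4f}")
```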
To provide further guidance for clinical applications, we performed drug sensitivity analysis on the high-risk and low-risk groups defined by the MAPR model (Fig. S4). Low-risk patients were more sensitive to 5-fluorouracil and imatinib, whereas high-risk patients were more sensitive to the compound MG-132. These findings offer new perspectives for clinical treatment strategies.