This study analyzed 711 stage 2 and 3 rectal cancer patients who received chemoradiation post-biopsy from four cohorts: 20 from University of Michigan (UM), and 691 from the ACOSOG (n = 25), Timing (n = 68) and Memorial Sloan Kettering (MSK, n = 598) groups. We excluded 57 patients who only received chemotherapy and 15 with stages I or IV tumors. Subsequently, 336 patient tumors with available DNA or RNA sequencing data were used for our genomic analyses (Fig. 1A). Those showing pathological or clinical complete response were categorized as complete responders (CRs), while the rest were classified as incomplete responders (ICRs). Significant clinical differences were identified between CR (n = 124) and ICR groups (n = 267) (Supplementary Fig. 1). Notably, ICR patients exhibited larger tumor sizes (Mann Whitney U test, p-value < 0.05, Fig. 1B), higher recurrence rates (Chi-square test, p-value < 0.05, Fig. 1C-D) and deceased overall survival (Chi-square test, p-value < 0.05, Fig. 1E-F). Furthermore, 79.4% of ICR patients were lymph nodes positive (N1, N2, N+), compared to 70.2% of CRs (Fig. 1G-H). These results align with earlier research linking complete response and reduced recurrence [40]. Notably, treatment approach was not significantly associated with the type of response; comparison between CR and ICR showed no statistically significant difference in response to chemoradiation therapy whether it was administered as a consolidation or as the only form of treatment (Chi-square test, p-value = 0.373) (Supplementary Table 1). In addition, our analysis indicated that microsatellite instability (MSI) status did not significantly distinguish between CR and ICR outcomes (Chi-squared test, p-value 0.233) (Supplementary Table 1).
Genomic characteristics of response to nCRT
Prior study has indicated that complete response to nCRT is associated with a high tumor mutation burden (TMB) in rectal cancer [41] but potential confounding factors were not accounted for such as BMI, gender, and age. To do so, we applied the propensity score weighting algorithm in the TMB comparison (Supplementary Fig. 2A). Even after this adjustment, CR tumors exhibited significantly higher mutation rates (Fig. 2A). Importantly, this increased mutation rate in CR tumors was not influenced by the presence of microsatellite instability (MSI) in tumors (Supplementary Fig. 2B). This finding implies that TMB may be an important factor for predicting the response to neoadjuvant therapy.
In addition to TMB, we also aimed to identify the relevant genetic features associated with response. We first focused on the frequently mutated genes and compared their mutation frequencies between CR and ICR tumors. The commonly mutated colorectal cancer-related genes, such as TTN, APC, TP53, and KRAS were found in a high percentage of rectal tumors (Fig. 2B, Supplementary Fig. 2C-D). However, genes like APC, TP53, KRAS, PIK3CA and FBXW7 displayed higher mutation frequencies in ICR tumors (Supplementary Fig. 2D). Interestingly, genes with higher mutation frequencies in CR tumors were enriched in several DNA repair pathways (Fig. 2C), including Mismatch repair (MMR), Base excision repair (BER), Homologous recombination (HR), and Nucleotide excision repair (NER). This observation suggests a potential association between complete response to nCRT and DNA repair deficiency.
Next, we used MutSig2CV [23] to identify significantly mutated genes (SMGs) for CR and ICR tumors, respectively. MutSig2CV identifies genes that are mutated more frequently than expected by chance, considering three factors: background mutation rate, localized hotspots, and vertebrate conservation. We discovered 36 SMGs in ICR tumors and 17 in CR tumors (Fig. 2D, Supplementary Table 2). Several well-known cancer driver genes, including TP53, APC, KRAS, NRAS, ARID1A, and PIK3CA, were present in both groups. Notably, consistent with the frequently mutated genes in CR tumors (Fig. 2D), the CR-specific SMGs were enriched in DNA repair pathways, while the ICR-specific SMGs were predominantly associated with oncogenic signaling pathways (Fig. 2E).
Furthermore, we specifically assessed for mutations in genes associated with various mechanisms of DNA repair. After balancing confounding factors, we observed higher mutation frequencies in MSH3, MLH1, PMS1, BRCA1, BARD1, POLD1, and POLE in CR than ICR tumors (Fig. 3A). Overall, a significant higher proportion of CR patients (52% vs. 34%, p-value < 0.05) had DNA repair-related mutations compared to ICR (Fig. 3B-3C). These DNA repair alterations were not related to MSI status as they were rarely found in MSI tumors and MSI-H status was very similar and rare between CR and ICR tumors (2/104 CR and 2/260 ICR tumors).
In cancer, certain mutations exhibit evolutionary dependencies [42], reflecting interactions between genes. Alteration in one gene may influence the mutation pattern of another, thereby creating a complex network of mutually exclusive or co-occurring mutation pairs [43]. To explore whether specific genetic interactions are associated with the response to nCRT, we detected co-occurring and mutually exclusive mutation pairs within CR and ICR tumor groups. Interestingly, CR tumors demonstrated more significantly co-occurring mutations in cancer-related genes, with 70 co-occurring pairs in CR tumors compared to 38 pairs in ICR tumors (Fig. 3D and 3E, Supplementary Data 1, Fisher’s exact test, p-value < 0.05). In CR tumors, genes with co-occurring mutations were primarily involved in DNA damage response, double-strand DNA break (DSB) response, and nucleotide excision repair (NER) (Fig. 3F). In contrast, the gene network with co-occurring mutations in ICR tumors was predominantly associated with tumorigenic pathways, such as RAS signaling, autophagy, apoptosis, and anti-apoptosis (Fig. 3G).
Additionally, we identified 7 significantly mutually exclusive mutation pairs in ICR tumors and 3 in CR tumors (Fisher’s exact test, p-value < 0.05) (Supplementary Data 1). Alterations in KMT2D and TP53 were observed in both CR and ICR groups. The mutually exclusive alterations in APC and RNF43 has been reported in microsatellite-unstable colorectal adenocarcinomas [44]. Mutations in PTEN and TP53 were also exclusive in a subgroup of colorectal cancer [45] and breast cancer [46]. Similar exclusivity between TP53 and ARID1A has been observed in other cancer types [47].
Transcriptomic differences and tumor immune cell infiltrate in response to nCRT
To compare the transcriptomic differences between CR and ICR tumors, we merged the RNA sequencing data from 114 patients from the MSK cohort and 7 patients from the UM cohort. We identified 678 downregulated and 1018 upregulated genes in CR tumors (p-value < 0.05 and log2FC > 0) (Fig. 4A). Consistent with the study from Chatila et.al [16], IGF2 and L1CAM were distinctly overexpressed in ICR tumors. Gene Set Enrichment Analysis (GSEA) revealed that genes with relatively low expression in CR tumors were significantly enriched in several DNA repair-associated pathways, including non-homologous end joining (NHEJ), NER, MMR, homologous recombination (HR), and BER pathways (Fig. 4B). This is consistent with our mutation data analysis where DNA repair genes were predominantly mutated, leading to DNA repair deficiency. CR tumors also demonstrated overexpression of numerous genes that were significantly enriched in immune-related pathways (Fig. 4B, Supplementary Fig. 3). In addition to the association of tumorigenic pathways with mutation data, gene expression analysis suggests that increased immune function in CR tumors may be critical to nCRT as well.
Emerging evidence shows that tumor-infiltrating immune cells (TICs) are associated with sensitivity to nCRT [48, 49]. To investigate the differences in TICs between CR and ICR tumors, we employed TIMER [32] and subsequently assessed the differential fractions/enrichment scores. We observed that CR tumors manifested a significantly reduced infiltration of T cells, particularly CD8 + T cells (Mann Whitney U test, p-value < 0.05) (Fig. 4C-D). In a contrasting observation, although not reaching statistical significance (Mann Whitney U test, p-value > 0.05), ICR tumors exhibited lower fractions/enrichment scores across a spectrum of immune cells, including B cells, Neutrophil, M2 macrophages, dendritic cells, neutral killer (NK) cells, Neutrophils, and monocytes (Supplementary Data 2).
Predictors of complete response to neoadjuvant therapy in rectal cancer
To elucidate both clinical and genomic predictors of complete response (CR), we selected patients with clinical stage II and III who received neoadjuvant therapy and for whom targeted or whole exome DNA sequencing data was available. This resulted in 233 ICR and 101 CR patients for analysis. The bivariate analysis revealed that the only clinical variable that differed between CR and ICR patients was median tumor size (CR 4.2 cm (Q1-Q3) 3.2–5.2) vs ICR (4.7 cm (3.8-6.0)) (Supplementary Table 1). Treatment group did not predict whether a patient went on to be a CR or ICR (Supplementary Table 1). We next assessed whether presence of particular non-synonymous deleterious mutations differed between CR and ICR and identified 10 alterations that were more common in CR patients with p-values of 0.1 or less (Supplementary Table 3). Network analysis indicated interactions between all 10 genes (Supplementary Table 4). These interactions, such as co-expression, were identified through curated databases (Supplementary Fig. 4). Because CR patients had more co-occurring mutations when the genomic analysis was performed (Fig. 3D-E), we did bivariate comparisons between all identified co-occurring mutations in CR and ICR patients and identified a subset of 95 that were more common in CR patients with p-values < 0.1 (Supplementary Data 3). Next, we fit a logistic regression model for CR using tumor size (cm), clinical stage, a mutation in any of the 10 network genes, and number of co-occurring mutations grouped as 0, 1–2, or 3 or more co-occurring mutations (Supplementary Data 4). After selecting the model with the highest R2 Tjur and lowest AIC, we observed that predictors of CR included smaller tumor size (OR: 0.79, 95%CI: 0.66–0.94), mutation of any of the network genes (OR: 2.57, 95%CI: 1.37–4.77), and presence of more co-occurring mutations (likelihood ratio test p-value < 0.001). Compared to patients with zero co-occurring mutations, those with 1 to 2 mutations exhibited a 1.57 (95%CI: 0.67–3.51) higher odds of CR and those with 3 or more co-occurring mutations showed an even greater 8.47 (95%CI: 2.74–32.26) times higher odds of CR (Table 1).
Table 1
Logistic regression model coefficients for prediction of complete response (CR) to treatment
Predictors | Odds Ratios | 95% CI | p value |
Tumor size (cm) | 0.79 | 0.66–0.94 | < 0.01 |
AJCC classification | | | |
2 | Ref. | | |
3 | 0.71 | 0.38–1.35 | 0.28 |
Number of selected co-occurring mutations | | | |
0 | Ref. | | |
1–2 | 1.57 | 0.67–3.51 | 0.29 |
3 or more | 8.47 | 2.74–32.26 | < 0.001 |
Mutations in selected genes | | | |
No | Ref. | | |
Yes | 2.57 | 1.37–4.77 | < 0.01 |
Observations | 314 | | |
R2 Tjur | 0.14 | | |
AIC | 349.72 | | |
CI: Confidence interval; AIC: Akaike Information Criterion |
Genomic and molecular determinants of incomplete response to nCRT with subsequent recurrence
Given that approximately one-third of ICR tumors subsequently recurred, we explored the molecular features associated with recurrence in these pre-treatment biopsy samples, aiming to enhance our understanding and explore potential strategies specifically to target these recurrent ICR tumors. Although recurrent ICR patients had significantly (the log rank test, p-value < 0.0001) poorer survival (Supplementary Fig. 5A), there was no significant difference in their tumor mutation burden, altered genome fractions, or MATH scores [50] (Supplementary Fig. 5B-D). However, these tumors that later reoccurred exhibited a higher frequency of mutations in cancer driver genes, including APC, TP53, KRAS, and TTN (Fig. 5A). Enrichment analysis revealed these tumors had more mutations in pathways related to TGF-ß signaling, ECM-receptor interaction, and cell-cell interaction (Fig. 5B). In addition, gene expression profiles indicated a downregulation of cell-cell interaction genes in recurrent ICR tumors (Fig. 5C). Moreover, more Cancer Associated Fibroblasts (CAFs), known to influence colorectal cancer (CRC) metastasis [51], were detected in the recurrent ICR tumors (Fig. 5D). Consistent with enriched mutations and decreased expression of cell-cell interaction genes, targeting CAFs may be a potential therapeutic strategy to prevent metastasis in CRC [52, 53]. Additionally, recurrent tumors demonstrated reduced infiltration of NK cells, dendritic cells, and Regulatory T Cells (Tregs) (Fig. 5E-G).
Characteristics of subgroups of the incomplete responders
Given the evidence [54] that gene expression has a higher predictive capacity for drug response than other factors such as mutation signature, copy number variation, and methylation, we aimed to identify ICR patient subgroups based on gene expression patterns. Our objective was to understand the molecular characteristics of these subgroups, potentially guiding alternative therapeutic approaches. Through unsupervised hierarchical clustering of 76 ICR patients with whole exome and transcriptome sequencing and outcome data (Supplementary Data 5), we identified four groups with distinct gene expression profiles (Fig. 6A, Supplementary Fig. 6A).
Cluster 1 tumors (n = 36) exhibited characteristics similar to CR tumors, marked by elevated mutation rates, enhanced cell-cell interaction, and diminished expression of DNA repair genes (Supplementary Fig. 6B). Cluster 2 tumors (n = 25) showed efficient DNA repair capabilities, albeit with reduced expression of pathways associated with drug metabolism (Supplementary Fig. 6C). Cluster 3 (n = 5) exhibited a significant upregulation of cytokine and chemokine signaling pathways (Supplementary Fig. 6D), suggesting potential opportunities for cytokine-based therapeutic interventions [55]. Interestingly, all tumors in Cluster 4 (n = 10) were CMS4 subtype (Fig. 6B), which is associated with poorer survival [56, 57]. This tumor subset exhibited a higher proportion of metastatic patients (25% in Cluster 1 compared to 40% in Cluster 4, Supplementary Fig. 6E). Additionally, these tumors displayed distinct genomic alternations, including decreased mutation rates and lower fraction of altered genomes (Fig. 6C-D), along with lower expression of several cell-cell interaction pathways (Supplementary Fig. 6F). Among the top 50 DEGs between tumors in Cluster 4 and other clusters, we identified 4 genes (DUOXA2, DUOX2, SSH1, FRMD6) that were significantly associated with worse survival (Supplementary Fig. 6G-J). Previous studies have reported that DUOX2 and SSH1 exhibited significantly higher expression and promoted the progression and metastasis of CRC [58, 59]. Additionally, FRMD6, which is an upstream regulator of Hippo signaling, serves as a marker of the CMS4 subgroup in CRC [60, 61].
Furthermore, the immune microenvironment of Cluster 4 tumors was characterized by increased infiltration of various immune cells, including neutrophils, monocytes, macrophages, activated myeloid dendritic cells, naive CD4 + T cell, as well as Tregs cells (Supplementary Fig. 6K-P). The immune-rich microenvironment of Cluster 4 may indicate that there is an opportunity for immunotherapy for this specific subset of tumors. Of note, we observed significant overexpression of PDCD1 (PD-1) in Cluster 4 (Fig. 6E), highlighting its potential as a therapeutic target [62, 63].
To validate these findings, we applied the differential expression signature between Cluster 4 and other clusters to 42 rectum adenocarcinoma (READ) samples in TCGA. We identified that a subgroup, termed Cluster4-like, demonstrated a similar expression profile to Cluster 4 (Supplementary Fig. 6Q-R). Interestingly, the tumors in the Cluster4-like group (n = 10) also had poor outcomes, with 80% being CMS4 subtype (Supplementary Fig. 6S). Immune checkpoint genes, including PD-1, CD274 (PD-L1), CTLA4, HAVCR2 (TIM3), and LAG3, were overexpressed in Cluster4-like tumors (Fig. 6F). These tumors also displayed increased immune cell infiltration (Fig. 6G). These differences mirrored Cluster 4 characteristics. Collectively, our findings suggest that greater attention should be given to the subsets of patients with high immune cell infiltration and overexpression of immune checkpoint genes, as these patients might benefit from immune checkpoint inhibitors (ICIs).