Distinct TNM stages present with different distributions of molecular subtypes
We analyzed the association between CMS subtypes and tumor stage in a meta-cohort comprising 1,040 patients (Table 1). The prevalence of the poor-prognosis mesenchymal subtype (CMS4) increased with advancing stage of disease (stage I: 12 [9.8%]; stage II: 89 [22.9%]; stage III: 94 [29.4%]; stage IV: 45 [38.5%]; p<0.001) (Fig. 1 and Additional file 1: Table S1). The same increase was observed in each of the individual cohorts (Additional file 1: Table S1 and Additional file 1: Fig. S1).
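As an illustration of how a stage-wise increase in prevalence can be tested, the following minimal sketch applies a Cochran-Armitage test for trend. The text does not state which test produced the reported p-value, so the choice of test is an assumption. The CMS4 counts per stage are taken from the paragraph above; the per-stage denominators are not reported here and are back-calculated approximations from the published percentages (the exact values are in Additional file 1: Table S1).

```python
import math

def cochran_armitage_trend(cases, totals, scores):
    """Two-sided Cochran-Armitage test for a linear trend in proportions."""
    n_all = sum(totals)
    p_all = sum(cases) / n_all
    # Score-weighted difference between observed and expected case counts
    t_stat = sum(s * (r - n * p_all) for s, r, n in zip(scores, cases, totals))
    sum_nt = sum(n * s for n, s in zip(totals, scores))
    sum_nt2 = sum(n * s * s for n, s in zip(totals, scores))
    variance = p_all * (1 - p_all) * (sum_nt2 - sum_nt ** 2 / n_all)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value

# CMS4 counts per stage I-IV, as reported in the text
cms4_cases = [12, 89, 94, 45]
# ASSUMPTION: approximate denominators back-calculated from the reported
# percentages; they are not stated in the text.
stage_totals = [122, 389, 320, 117]
stage_scores = [1, 2, 3, 4]

z, p_value = cochran_armitage_trend(cms4_cases, stage_totals, stage_scores)
print(f"z = {z:.2f}, p = {p_value:.2e}")
```

With these approximate inputs the trend is positive and the p-value falls below 0.001, in line with the reported result, but the figures should be recomputed from the exact counts before being quoted.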
Table 1. Basic characteristics of the aggregated cohort (n=1,040)
| Characteristic | Category | Total (n=1040) | GSE39582 (n=511) | TCGA (n=529) |
|---|---|---|---|---|
| Gender | Female | 476 (45.8%) | 227 (44.4%) | 249 (47.1%) |
| | Male | 564 (54.2%) | 284 (55.6%) | 280 (52.9%) |
| Age | Median (IQR) | 68 (59-77) | 69 (59-76) | 68 (59-77) |
| TNM | I | 133 (12.8%) | 38 (7.4%) | 95 (18.0%) |
| | II | 417 (40.1%) | 216 (42.3%) | 201 (38.0%) |
| | III | 355 (34.1%) | 200 (39.1%) | 155 (29.3%) |
| | IV | 135 (13.0%) | 57 (11.2%) | 78 (14.7%) |
| MSI | MSS | 887 (85.3%) | 436 (85.3%) | 451 (85.3%) |
| | MSI | 153 (14.7%) | 75 (14.7%) | 78 (14.7%) |
| CMS | 1 | 153 (14.7%) | 79 (15.5%) | 74 (14.0%) |
| | 2 | 420 (40.4%) | 214 (41.9%) | 206 (38.9%) |
| | 3 | 133 (12.8%) | 66 (12.9%) | 67 (12.7%) |
| | 4 | 240 (23.1%) | 112 (21.9%) | 128 (24.2%) |
| | Indeterminate | 94 (9.0%) | 40 (7.8%) | 54 (10.2%) |

IQR = interquartile range
Tumor stage reflects tumor biology
We tested the hypothesis that tumor stage, as defined by TNM, not only represents disease progression but also reflects different biological entities. Analysis of the number of differentially expressed genes revealed considerable gene expression differences between TNM stages. These differences decreased significantly when the analysis was stratified for CMS2 and CMS4, the most common CMSs (Fig. 2A). This was confirmed when stratifying for all subtypes (CMS1-4) (Additional file 1: Fig. S2). Furthermore, visualization of the genes that displayed significant differences between tumor stages (ANOVA p<0.05, n=2764) showed a clear separation of the immune (CMS1), epithelial (CMS2/3) and mesenchymal (CMS4) subtypes in both a t-SNE plot and a gene expression heatmap (Fig. 2B and Additional file 1: Fig. S3).
CMS4 correlates with more advanced stages and has a higher progression rate
To specifically investigate the association between CMS4 and more advanced tumor stages, we built two gene signatures: one to discriminate disseminated disease (stage III-IV) from local disease (stage I-II), and one to separate CMS4 cancers from CMS1/2/3 tumors (see Methods). Remarkably, the two scores were highly correlated (r=0.77, p<0.001) (Fig. 2C), although the signatures shared only a few genes (13/200), suggesting that overrepresentation of CMS4 cancers among stage III-IV cancers is responsible for the gene expression differences between early and advanced malignancies.
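The reasoning above, that two signatures sharing few genes can still yield strongly correlated scores when they track the same underlying biology, can be illustrated with a small simulation. This is a sketch on synthetic data only: the gene names, the mean-expression scoring rule and the single shared latent factor are all assumptions for illustration and do not reproduce the signatures or the scoring procedure described in the Methods.

```python
import math
import random

def signature_score(expression, genes):
    """Score one sample as the mean expression over the signature genes."""
    return sum(expression[g] for g in genes) / len(genes)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

random.seed(0)

# HYPOTHETICAL signatures of 200 genes each, sharing 13 genes (g187..g199)
stage_signature = [f"g{i}" for i in range(0, 200)]
cms4_signature = [f"g{i}" for i in range(187, 387)]
overlap = set(stage_signature) & set(cms4_signature)
all_genes = sorted(set(stage_signature) | set(cms4_signature))

# SYNTHETIC samples: every gene follows one shared latent factor plus noise
stage_scores, cms4_scores = [], []
for _ in range(50):
    latent = random.gauss(0, 1)
    expression = {g: latent + random.gauss(0, 1) for g in all_genes}
    stage_scores.append(signature_score(expression, stage_signature))
    cms4_scores.append(signature_score(expression, cms4_signature))

r = pearson_r(stage_scores, cms4_scores)
print(f"shared genes: {len(overlap)}/200, r = {r:.2f}")
```

In this toy setting the correlation is driven by the shared latent factor rather than by the 13 shared genes, which mirrors the interpretation given in the text.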
Subsequently, we assessed the rate of progression from early (stage I-II) to advanced (stage III-IV) tumor stage for each of the subtypes by calculating risk ratios (RR). This showed a markedly increased progression rate towards more advanced stages for CMS4 cancers as compared to CMS1 (RR 1.64, 95% CI: 1.29-2.09), CMS2 (RR 1.25, 95% CI: 1.08-1.46) and CMS3 cancers (RR 1.57, 95% CI: 1.23-2.01) (Fig. 2D).
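For readers who want to see how such a risk ratio and its confidence interval are obtained, the sketch below uses the standard log-transformation (Katz) interval. The text does not specify which interval method was used, so this choice is an assumption. The CMS4 counts are derived from the stage-wise numbers reported earlier (139 of 240 CMS4 tumors in stage III-IV); the comparator counts are hypothetical placeholders and are not taken from the paper.

```python
import math

def risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A versus group B with a log-method (Katz) CI."""
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of ln(RR)
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# CMS4: stage III-IV cases and all staged CMS4 cases, from the reported counts
cms4_advanced = 94 + 45
cms4_total = 12 + 89 + 94 + 45

# HYPOTHETICAL comparator subtype (illustrative values, not from the paper)
comparator_advanced = 50
comparator_total = 150

rr, lower, upper = risk_ratio(cms4_advanced, cms4_total,
                              comparator_advanced, comparator_total)
print(f"RR {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that excludes 1 indicates that the proportion of advanced-stage tumors differs between the two groups at the 5% level.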
CMS4 holds prognostic value in high-risk stage II colon cancer
In an effort to validate our findings and provide clinical utility to the insight obtained, we evaluated chemotherapy-naive high-risk stage II colon cancers (Table 2). Based on the association between CMSs and tumor stage, we hypothesized that CMS4 cancers are overrepresented in high-risk stage II cancers. Indeed, in the combined stage II cohorts, MATCH and GSE33113 (n=197), CMS4 cancers were more prevalent in high-risk stage II patients (21.7% vs 7.7%, p=0.02) (Table 2, Fig. 3A and Additional file 1: Table S2). Disease-free survival (DFS) for these patients confirmed the poor disease outcome of CMS4 cancers (Fig. 3B). This effect was explained by the poor outcome for patients with a CMS4 cancer in the subgroup with high-risk tumors (5-year DFS 41.7% versus 100.0%, p=0.008) (Fig. 3C and Additional file 1: Fig. S4). These findings were substantiated by a multivariate analysis, which showed a significant association of CMS with DFS in the subgroup with high-risk tumors but not in the total stage II cohort (Additional file 1: Table S3). The extended GSE33113 cohort, comprising both stage II and stage III tumors, revealed possible under-staging of high-risk stage II patients: with a rising number of assessed lymph nodes, the percentage of stage III colon cancers increased (Fig. 3D and Additional file 1: Table S4).
Table 2. Characteristics of the MATCH and GSE33113 cohorts
| Characteristic | Category | Total (n=197) | MATCH cohort (n=112) | GSE33113 (n=85) |
|---|---|---|---|---|
| Gender | Female | 101 (51.3%) | 57 (50.9%) | 44 (51.8%) |
| | Male | 96 (48.7%) | 55 (49.1%) | 41 (48.2%) |
| Age | Median (IQR) | 71.0 (63.0-77.0) | 70.0 (63.0-76.0) | 74.6 (61.9-80.2) |
| T | 3 | 184 (93.4%) | 107 (95.5%) | 77 (90.6%) |
| | 4 | 13 (6.6%) | 5 (4.5%) | 8 (9.4%) |
| N | Median (range) | 14 (1-46) | 14 (5-28) | 12 (1-46) |
| N | < 10 lymph nodes assessed | 45 (22.8%) | 14 (12.5%) | 31 (36.5%) |
| | ≥ 10 lymph nodes assessed | 142 (72.1%) | 98 (87.5%) | 44 (51.8%) |
| | Missing | 10 (5.1%) | 0 (0.0%) | 10 (11.8%) |
| MSI | MSS | 140 (71.1%) | 79 (70.5%) | 61 (71.8%) |
| | MSI | 52 (26.4%) | 28 (25.0%) | 24 (28.2%) |
| | Missing | 5 (2.5%) | 5 (4.5%) | 0 (0.0%) |
| CMS | 1 | 49 (24.9%) | 29 (25.9%) | 20 (23.5%) |
| | 2 | 83 (42.1%) | 52 (46.4%) | 31 (36.5%) |
| | 3 | 19 (9.6%) | 11 (9.8%) | 8 (9.4%) |
| | 4 | 20 (10.2%) | 5 (4.5%) | 15 (17.6%) |
| | Indeterminate | 26 (13.2%) | 15 (13.4%) | 11 (12.9%) |

IQR = interquartile range
Table 3. Multivariate analysis of relevant parameters and disease-free survival for high-risk stage II patients
| Variable | HR | 95% CI limits |
|---|---|---|
| CMS 1 | * | |
| CMS 2 | 0.225 | 0.053-0.957 |
| CMS 3 | 0.599 | 0.062-5.781 |
| CMS 4 | Reference | |
| Gender | 2.725 | 0.488-15.225 |
| Age | 0.986 | 0.952-1.022 |
| Location | 3.45 | 0.799-14.85 |
| T | 2.006 | 0.360-11.173 |
| MSI | ** | |

CMS, consensus molecular subtype; MSI, microsatellite instability
*Not estimable due to no events
**Not estimable due to no MSI patients