Cohort Description
For the discovery phase of this study, we used colon adenocarcinoma (COAD) samples from The Cancer Genome Atlas (TCGA) (44). The replication phase was a meta-analysis incorporating data from studies of colon cancer (GSE131013 (45) and GSE42752 (46)), rectal cancer (TCGA-READ and GSE39958 (47)), and colorectal cancer (CRC) (GSE101764 (48) and GSE77954 (49)). We categorized participants by age at diagnosis: those diagnosed before age 50 were classified as early-onset CRC (EOCRC) cases, and those diagnosed at age 70 or older as later-onset CRC (LOCRC) cases. The population characteristics of all datasets are presented in Table 1.
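The age-based grouping above can be sketched as a small helper. This is a minimal illustration, assuming age at diagnosis is available in years; the function name `onset_group` is hypothetical, and patients aged 50 to 69 (the middle-onset group discussed later) are simply left unassigned because they are not part of the early- vs later-onset comparison.

```python
from typing import Optional


def onset_group(age_at_diagnosis: float) -> Optional[str]:
    """Assign the onset category used for the early- vs later-onset comparison.

    Hypothetical helper: <50 years -> early-onset (EOCRC),
    >=70 years -> later-onset (LOCRC), otherwise None (not compared).
    """
    if age_at_diagnosis < 50:
        return "early-onset"
    if age_at_diagnosis >= 70:
        return "later-onset"
    return None  # 50-69 years: excluded from the two-group comparison


ages = [43, 50, 69, 70, 78]
print([onset_group(a) for a in ages])
# ['early-onset', None, None, 'later-onset', 'later-onset']
```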
Table 1: Study population characteristics

| Study | N total | Females, N (%) | Age, years, mean (SD) | Cancer type |
|---|---|---|---|---|
| Discovery dataset | | | | |
| TCGA-COAD | | | | Colon |
| Early-onset | 31 | 18 (58) | 43 (5) | |
| Later-onset | 100 | 43 (43) | 78 (5) | |
| Replication datasets | | | | |
| TCGA-READ | | | | Rectal |
| Early-onset | 14 | 6 (43) | 44 (6) | |
| Later-onset | 30 | 21 (70) | 76 (6) | |
| GSE39958 | | | | Rectal |
| Early-onset | 12 | 3 (25) | 45 (7) | |
| Later-onset | 9 | 2 (22) | 74 (5) | |
| GSE42752 | | | | Colon |
| Early-onset | 4 | 3 (75) | 46 (3) | |
| Later-onset | 7 | 5 (71) | 76 (4) | |
| GSE77954* | | | | Colorectal |
| Early-onset | 3 | 3 (100) | 48 (1) | |
| Later-onset | 10 | 3 (30) | 77 (6) | |
| GSE101764 | | | | Colorectal |
| Early-onset | 13 | 3 (23) | 38 (8) | |
| Later-onset | 37 | 13 (35) | 76 (5) | |
| GSE131013* | | | | Colon |
| Early-onset | 2 | 0 (0) | 46 (4) | |
| Later-onset | 58 | 13 (22) | 76 (5) | |
Early-onset cases are participants diagnosed before age 50 years; later-onset cases are participants diagnosed at age 70 years or older. Dataset GSE77954 comprises primary (n = 7) and metastatic (n = 6) samples. *The early-onset group in GSE131013 includes only male participants, whereas the early-onset group in GSE77954 includes only female participants; consequently, adjustment for sex was not feasible in these two datasets.
Exposome-related DNA methylation marker sets
To explore the influence of the exposome on early-onset versus later-onset colon and rectal cancers, we focused on a curated list of 29 lifestyle and environmental factors (28–43). The analyzed traits comprised 10 lifestyle factors: the Alternative Healthy Eating Index (AHEI) (28), alcohol consumption (29), birth weight (30), BMI (continuous, in kg/m²) (31), coffee consumption (32), education level (33), Mediterranean Diet Score (MDS) (28), obesity (defined as BMI ≥30 kg/m²) (34), smoking habits (35), and a smoking inference model (smoking-Maas) (36). Furthermore, we examined 5 air pollutants: nitrogen dioxide (NO₂) (37–39), polychlorinated biphenyls (PCBs) (42), and particulate matter (PM) with a diameter <10 micrometers (µm) (PM10) (38–40), <2.5 µm (PM2.5) (37–41), and between 2.5 and 10 µm (PM2.5-10) (40). In addition, we included 14 pesticides: 2,4-dichlorophenoxyacetic acid (2,4-D), atrazine, acetochlor, chlordane, dicamba, malathion, dichlorodiphenyltrichloroethane (DDT), heptachlor, lindane, glyphosate, mesotrione, metolachlor, picloram, and toxaphene (43). For marker selection, we identified, for each trait, the CpG sites significantly associated with it in large epigenome-wide association studies (EWAS), using five significance thresholds: P < 1.2×10⁻⁷, P < 1.0×10⁻⁵, and false discovery rates (FDR) < 0.01, < 0.05, and < 0.1.
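The five-threshold marker selection can be illustrated as follows. This is a minimal sketch, assuming per-CpG EWAS p-values are available as a dictionary and that FDR is computed with the Benjamini-Hochberg procedure; the text does not state which FDR method the source EWAS used, and published EWAS often report their own FDR values, in which case the cutoffs would be applied to those directly. The names `bh_qvalues`, `select_cpgs`, and the toy CpG IDs are hypothetical.

```python
from typing import Dict, List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# The five marker selection thresholds described in the text.
THRESHOLDS = {
    "P<1.2e-7": ("p", 1.2e-7),
    "P<1.0e-5": ("p", 1.0e-5),
    "FDR<0.01": ("q", 0.01),
    "FDR<0.05": ("q", 0.05),
    "FDR<0.1": ("q", 0.1),
}


def select_cpgs(ewas_p: Dict[str, float]) -> Dict[str, List[str]]:
    """Return, for each threshold, the CpG IDs passing it (hypothetical input format)."""
    ids = list(ewas_p)
    p = [ewas_p[c] for c in ids]
    q = bh_qvalues(p)
    selected = {}
    for name, (kind, cutoff) in THRESHOLDS.items():
        values = p if kind == "p" else q
        selected[name] = [c for c, v in zip(ids, values) if v < cutoff]
    return selected


# Toy EWAS results for one trait (invented values, for illustration only)
toy_ewas = {"cg1": 1e-8, "cg2": 5e-6, "cg3": 0.004, "cg4": 0.03, "cg5": 0.8}
selection = select_cpgs(toy_ewas)
for name, cpgs in selection.items():
    print(name, cpgs)
```

Looser thresholds yield nested, larger CpG sets, which is why one trait can contribute up to five MRSs.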
Exposome-related methylation risk scores
Using the available EWAS summary statistics for each trait at the five marker selection thresholds, we identified 73 exposome CpG sets across the 29 exposome traits. These sets were used to compute 73 weighted methylation risk scores (MRSs) from DNA methylation beta-values adjusted for epigenetic age estimated with the Horvath clock (50). The number of CpG sites associated with each trait at each significance threshold is given in Supplementary Table S1, and the corresponding weights in Supplementary Table S2. To elucidate the impact of the exposome on early-onset colon and rectal cancer, we compared the 73 MRSs between early-onset and later-onset (reference group) patients using multivariable logistic regression models. Because CRC incidence differs by sex (51), we adjusted the regressions for sex where possible. In the discovery dataset, positive associations were observed for the MRSs related to PCBs, PM10, the smoking-Maas model, heptachlor, metolachlor, picloram, and toxaphene, whereas negative associations were found for the MRSs corresponding to BMI, education level, MDS, obesity, atrazine, malathion, and mesotrione (Fig. 1 and Supplementary Table S3).
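A weighted MRS is, in its simplest form, a weighted sum of beta-values over the CpGs in a set. The sketch below assumes the beta-values have already been adjusted for epigenetic age and that each weight carries the sign of the EWAS association; skipping CpGs absent from a sample is an assumption of this sketch, not a documented choice. The function `weighted_mrs`, the CpG IDs, and all numbers are hypothetical.

```python
from typing import Dict


def weighted_mrs(betas: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted methylation risk score for one sample.

    betas:   CpG ID -> beta-value (assumed already adjusted for epigenetic age)
    weights: CpG ID -> EWAS-derived weight (sign follows the EWAS direction)
    CpGs missing from the sample are skipped (an assumption of this sketch).
    """
    shared = [cpg for cpg in weights if cpg in betas]
    if not shared:
        raise ValueError("no overlapping CpG sites between sample and score")
    return sum(weights[cpg] * betas[cpg] for cpg in shared)


weights = {"cg1": 0.5, "cg2": -1.0}  # hypothetical EWAS weights
samples = {
    # s1: high beta where the weight is positive, low where it is negative
    "s1": {"cg1": 0.8, "cg2": 0.2, "cg3": 0.9},
    # s2: the opposite pattern
    "s2": {"cg1": 0.2, "cg2": 0.8},
}
scores = {sid: weighted_mrs(b, weights) for sid, b in samples.items()}
print(scores)
```

Sample s1 scores higher than s2 because it is more methylated at the positively weighted CpG and less methylated at the negatively weighted one, which is the pattern the Fig. 2 heatmaps are used to check. The resulting scores would then enter the logistic regression of onset group on MRS (plus sex, where possible).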
We highlight the data for four lifestyle factors previously linked to colon and rectal cancers: the MDS (52) (Fig. 2a) and education level (53) (Fig. 2b), which are considered protective factors, and smoking habits (54) (Fig. 2c) and obesity status (55) (Fig. 2d), which are recognized risk factors. To clarify the directionality of our findings, the heatmaps in the left panels of Fig. 2 show the methylation levels of the CpGs included in each of the four MRSs, annotated with each CpG's direction of association in the original EWAS, with samples sorted by MRS. The heatmaps show that a higher MRS corresponds to higher beta-values at CpGs positively associated with the trait in the EWAS and lower beta-values at CpGs negatively associated with it (see Supplementary Fig. S1 for a more detailed explanation). Thus, an elevated MRS reflects a higher exposure level as defined in the original EWAS. Accordingly, compared with later-onset patients, patients with early-onset colon cancer showed MRS patterns suggesting lower adherence to the MDS (Padj. = 0.037) (Fig. 2a), lower education levels (Padj. = 0.025) (Fig. 2b), greater smoking exposure (Padj. = 0.010) (Fig. 2c), and lower obesity rates (Padj. = 0.011) (Fig. 2d), as illustrated in the middle panels of Fig. 2. We examined the lower obesity rate in early-onset cases using clinical measurements from TCGA-COAD. Colon cancer patients with a clinically measured BMI above 30 kg/m² were categorized as obese, yielding 4 of 24 early-onset and 18 of 72 later-onset patients classified as obese. This corresponds to a relative risk (RR) of 0.67 (95% CI: 0.26-1.76) for obesity among early-onset colon cancer patients in the TCGA-COAD cohort (Supplementary Fig. S2). Although the confidence interval includes 1, the direction of this estimate is consistent with the results obtained with the obesity MRS and supports our methodology.
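The relative risk above follows directly from the reported counts. The sketch below uses the common Wald interval on the log scale, which is an assumption: it gives RR ≈ 0.67 with an interval of roughly 0.25 to 1.78, close to but not identical to the reported 0.26-1.76, so the published interval may come from a different method. The function name `relative_risk` is hypothetical.

```python
import math
from typing import Tuple


def relative_risk(a: int, n1: int, c: int, n0: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Relative risk of a/n1 versus c/n0 with a Wald 95% CI on the log scale."""
    rr = (a / n1) / (c / n0)
    # Standard error of log(RR) for two independent binomial proportions
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# 4 of 24 early-onset vs 18 of 72 later-onset TCGA-COAD patients classified as obese
rr, lower, upper = relative_risk(4, 24, 18, 72)
print(f"RR = {rr:.2f} (95% CI: {lower:.2f}-{upper:.2f})")  # RR = 0.67 (95% CI: 0.25-1.78)
```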
Extending our investigation, we conducted a meta-analysis of the six replication datasets (Fig. 1, right panels of Fig. 2, and Supplementary Table S4). This analysis corroborated the initial findings, notably the associations of EOCRC with non-adherence to the MDS (P = 0.011, Padj. = 0.051), lower educational levels (P = 0.0039, Padj. = 0.025), and higher smoking exposure (P = 0.0025, Padj. = 0.024) (Fig. 2, right panels). Moreover, we conducted separate meta-analyses of the datasets comprising only rectal cancer samples (TCGA-READ and GSE39958) and only colon cancer samples (GSE131013 and GSE42752). The results for TCGA-READ alone and for these meta-analyses are presented in Supplementary Fig. S3 and Supplementary Table S5.
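The text does not specify the meta-analytic model. As one common option, the sketch below assumes an inverse-variance fixed-effect pooling of per-dataset log odds ratios; a random-effects model (for example DerSimonian-Laird) would be a reasonable alternative given the heterogeneous datasets. The function name and the input numbers are hypothetical.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(log_ors: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect pooling of per-dataset log odds ratios.

    Returns the pooled log OR, its standard error, and a two-sided P-value.
    """
    weights = [1.0 / se ** 2 for se in ses]  # more precise datasets weigh more
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return pooled, pooled_se, p


# Hypothetical per-dataset estimates, for illustration only
pooled, pooled_se, p = fixed_effect_meta([0.5, 0.5], [0.2, 0.2])
print(f"pooled OR = {math.exp(pooled):.2f}, SE(log OR) = {pooled_se:.3f}, P = {p:.2g}")
```

Pooling two equally precise datasets with the same estimate leaves the estimate unchanged but shrinks its standard error by a factor of √2, which is the gain in power the meta-analysis is meant to provide.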
Picloram-related methylation risk scores
Our results reveal a novel association between the picloram MRSs and early-onset colon and rectal cancer, relative to later-onset cases, in both the discovery dataset and the replication meta-analysis (Fig. 1). We explored the directionality of these results further, focusing on the MRS constructed with the genome-wide marker selection threshold (MRS-GW). A higher exposure level, as indicated by the direction of association in the original EWAS, corresponded to an elevated MRS (Fig. 3a). This suggests greater picloram exposure among patients with early-onset colon cancer (Padj. = 0.00049) (Fig. 3b), a finding supported by our meta-analysis (P = 0.021; Padj. = 0.081; OR: 1.6 [95% CI: 1.07-2.38]) (Fig. 3c).
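As a quick consistency check on the reported meta-analysis estimate, a two-sided P-value can be approximately recovered from an odds ratio and its 95% CI by assuming a Wald interval on the log scale. This is a reader-side sketch, not the authors' procedure; the helper name `p_from_or_ci` is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate two-sided P-value from an OR and its 95% CI (Wald, log scale)."""
    # The CI width on the log scale spans 2 * z_crit standard errors
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se_log
    return math.erfc(abs(z) / math.sqrt(2.0))


# Meta-analysis estimate for the picloram MRS-GW reported above
p = p_from_or_ci(1.6, 1.07, 2.38)
print(round(p, 3))  # 0.021
```

The recovered value matches the reported P = 0.021, suggesting the OR, CI, and P-value are mutually consistent under this approximation.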
To assess the robustness of our findings, we performed two permutation tests to evaluate whether the observed associations could be artifacts of a particular CpG selection or patient classification. First, in the CpG site permutation, the CpG sites of the picloram MRS-GW ranked 13th in significance among 10,000 permutations (Fig. 3d). Second, in the patient classification permutation, the age-based classification ranked second in significance for the picloram MRS-GW among 1,000 permutations (Fig. 3d). The CpG site permutation results for MDS, education level, the smoking-Maas model, and obesity are detailed in Supplementary Fig. S4a, and the onset classification permutation results are shown in Supplementary Fig. S4b.
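The idea behind the classification permutation can be sketched with a generic label-permutation test: shuffle the onset labels, recompute the statistic, and ask how often a random labeling is at least as extreme as the real one. This is an illustrative sketch, not the authors' exact procedure (which ranks the observed result among permutations); it uses the absolute difference in mean MRS as a stand-in statistic, and the CpG-set version is analogous, drawing random CpG sets of the same size instead of shuffling labels. All names and values are hypothetical.

```python
import random
from typing import List, Tuple


def mean_difference(scores: List[float], labels: List[int]) -> float:
    """Mean score in group 1 (early-onset) minus mean score in group 0 (later-onset)."""
    g1 = [s for s, lab in zip(scores, labels) if lab == 1]
    g0 = [s for s, lab in zip(scores, labels) if lab == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)


def label_permutation_test(scores: List[float], labels: List[int],
                           n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation P-value for the mean MRS difference between groups."""
    rng = random.Random(seed)
    observed = abs(mean_difference(scores, labels))
    shuffled = list(labels)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign onset labels at random, keeping group sizes
        if abs(mean_difference(scores, shuffled)) >= observed:
            n_extreme += 1
    # Add-one correction so the empirical P-value is never exactly zero
    return observed, (n_extreme + 1) / (n_perm + 1)


mrs = [10, 11, 12, 13, 14, 0, 1, 2, 3, 4]  # hypothetical MRS values
onset = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]     # 1 = early-onset, 0 = later-onset
obs, p_perm = label_permutation_test(mrs, onset)
print(obs, p_perm)
```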
Young tumors associated with picloram exposure
Currently, CRC patients are classified as early-onset or later-onset according to age at diagnosis. However, this approach is imperfect: the interval between tumor initiation and diagnosis varies considerably among patients, making age at diagnosis an unreliable indicator of the tumor's actual age. To address this, we assessed whether the single-base substitution signature 1 (SBS1) score, an indicator of the number of mitotic divisions a cell has undergone (56,57), could instead serve as a measure of tumor age. For this purpose, we selected TCGA-COAD patients with both DNA methylation and mutational signature data available. We excluded patients with microsatellite instability (MSI), because MSI arises from defective DNA mismatch repair and induces distinct mutational patterns that may drive tumorigenesis through mechanisms different from those in microsatellite-stable (MSS) tumors (Supplementary Fig. S5) (58,59). The distribution of SBS1 mutations across the early-onset, middle-onset (aged 50 to 69 years), and later-onset groups among the 173 included patients is shown in Fig. 3e. Comparing the picloram MRS-GW between early-onset (N = 25) and later-onset (N = 72) cases, we observed a statistically significant difference (OR: 2.99 [95% CI: 1.70-5.85]; P = 4.27×10⁻⁴). Next, we applied an SBS1 score threshold of 60, classifying 72 tumors as young (SBS1 < 60) and 101 as old (SBS1 ≥ 60). This new categorization was also significantly associated with the picloram MRS-GW (OR: 1.84 [95% CI: 1.31-2.66]; P = 6.57×10⁻⁴) (Fig. 3f). The chronological age distribution of these SBS1-defined young and old tumors is provided in Fig. 3g.
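The tumor-age classification can be sketched in a few lines: MSI tumors are dropped first, and the remaining MSS tumors are labeled young or old at the SBS1 cutoff of 60 given in the text. The `Tumor` record, its `msi_status` values, and the sample data are hypothetical; the real inputs would come from the TCGA-COAD mutational signature and MSI annotations.

```python
from dataclasses import dataclass
from typing import Dict, List

SBS1_CUTOFF = 60  # threshold from the text: young if SBS1 < 60, old if SBS1 >= 60


@dataclass
class Tumor:
    sample_id: str
    sbs1: float       # SBS1 score (e.g., number of SBS1-attributed mutations)
    msi_status: str   # hypothetical field: "MSS" or "MSI-H"


def tumor_age_classes(tumors: List[Tumor]) -> Dict[str, str]:
    """Drop MSI tumors, then label the remaining MSS tumors young/old by SBS1."""
    return {
        t.sample_id: "young" if t.sbs1 < SBS1_CUTOFF else "old"
        for t in tumors
        if t.msi_status == "MSS"
    }


cohort = [
    Tumor("a", 12.0, "MSS"),
    Tumor("b", 60.0, "MSS"),
    Tumor("c", 150.0, "MSI-H"),
    Tumor("d", 59.5, "MSS"),
]
print(tumor_age_classes(cohort))
# {'a': 'young', 'b': 'old', 'd': 'young'}
```

Unlike the age-based grouping, this split uses every MSS tumor, including the middle-onset patients, which is how all 173 patients enter the young/old comparison.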
Pesticide use and EOCRC incidence in population data
Using MRSs as a proxy for pesticide exposure, we identified significant associations with several pesticides. Next, we sought to validate these results using pesticide use intensity and EOCRC incidence for overlapping counties with available data in California, Connecticut, Georgia, Iowa, New Mexico, Utah, and Washington in the United States. Based on data availability, we extracted pesticide use data for acetochlor, 2,4-D, atrazine, dicamba, glyphosate, mesotrione, and picloram from the Pesticide National Synthesis Project. EOCRC incidence rates were extracted from the Surveillance, Epidemiology, and End Results (SEER) Program, comprising rates measured in 8 registries from 1975 to 2020 (SEER8) and in 12 registries from 1992 to 2020 (SEER12). The total numbers of included observations (number of measured years multiplied by number of overlapping counties) for acetochlor (SEER8: N = 1,111; SEER12: N = 1,196), 2,4-D (N = 1,983; N = 2,059), atrazine (N = 1,871; N = 1,890), dicamba (N = 1,909; N = 1,964), glyphosate (N = 2,002; N = 2,097), mesotrione (N = 636; N = 636), and picloram (N = 1,531; N = 1,548) are depicted in Fig. 4a. As an example, Fig. 4b shows the average (log) picloram use intensity and the average EOCRC incidence rate between 1992 and 2012 in the state of Iowa. To assess the relationship between pesticide use intensity (exposure) and age-adjusted EOCRC incidence rates, we used linear mixed models adjusting for the year of data collection, with a random effect to account for county-level variation. We also tested for interactions between pesticide use intensity and year of data collection, which were non-significant for all pesticides studied (data not shown).
Significant associations between pesticide use intensity and EOCRC incidence were found in both SEER8 and SEER12 for multiple pesticides, including glyphosate (SEER8: P = 1.18×10⁻⁵; SEER12: P = 2.02×10⁻⁴), atrazine (P = 1.81×10⁻⁴; P = 4.21×10⁻³), picloram (P = 2.87×10⁻³; P = 1.82×10⁻²), 2,4-D (P = 4.16×10⁻³; P = 1.8×10⁻³), and dicamba (P = 4.94×10⁻³; P = 4.10×10⁻²) (Fig. 4c).
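The linear mixed model described for the county-level analysis can be sketched with statsmodels' `mixedlm`: fixed effects for log pesticide use and year, and a random intercept per county, followed by a second fit with an exposure-by-year interaction. This is a minimal sketch on simulated data (the numbers are invented, not SEER or Pesticide National Synthesis Project data); treating year as a linear numeric covariate and modeling the county effect as a random intercept only are assumptions, since the text does not give the exact specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated county-year panel (hypothetical numbers, not SEER or USGS data)
rng = np.random.default_rng(42)
n_counties, n_years = 20, 20
county_shift = rng.normal(0.0, 1.0, n_counties)  # true county-level deviations
rows = []
for c in range(n_counties):
    for year in range(n_years):
        log_use = rng.normal(0.0, 1.0)  # log pesticide use intensity
        rate = 5.0 + county_shift[c] + 0.05 * year + 0.3 * log_use + rng.normal(0.0, 0.5)
        rows.append({"county": f"county{c}", "year": year, "log_use": log_use, "rate": rate})
panel = pd.DataFrame(rows)

# Fixed effects for exposure and year; random intercept for each county
main_fit = smf.mixedlm("rate ~ log_use + year", data=panel, groups=panel["county"]).fit()
print(main_fit.params["log_use"], main_fit.pvalues["log_use"])

# Same model with an exposure-by-year interaction term
inter_fit = smf.mixedlm("rate ~ log_use * year", data=panel, groups=panel["county"]).fit()
print(inter_fit.pvalues["log_use:year"])
```

The random intercept absorbs stable differences between counties (for example, baseline incidence), so the `log_use` coefficient reflects how incidence tracks pesticide use within the panel after accounting for the time trend.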