Multi-omic analysis of FAP samples
We collected and analyzed 135 samples from the colons of six FAP patients (whose characteristics are shown in Supplemental Table 1), isolated from multiple regions (ascending, transverse, descending including sigmoid, and rectum) (Fig 1A). Samples spanned a range of sizes and degrees of dysplasia as determined using histological staining: 29 polyp-adjacent normal mucosa (M, hereafter referred to in the text as "mucosa"), 35 benign polyps (usually small) (B), 57 dysplastic polyps (small to large) (D), eight samples from three adenocarcinomas (AdCa), and six patient-matched blood samples. The percent of tumor and degree of dysplasia in the dysplastic polyps and adenocarcinoma was scored by a trained pathologist (see Methods).
Samples were subject to a battery of multi-omic assays including whole-genome and/or exome sequencing (WGS/WES), bulk tissue RNA-seq, proteomics (TMT labeling), untargeted metabolomics, and targeted lipidomics (Fig 1C). Additionally, eight samples were profiled via whole genome bisulfite sequencing (methylomics). High-resolution “single-cell” assays, including single nuclei (sn) RNAseq, snATACseq) are described in an accompanying manuscript13. Importantly, all donors in this study consented to an “open consent”, facilitating broad sharing of this multi-sample, multi-omic dataset.
The landscape of somatic alterations in familial adenomatous polyposis
Whole genome sequencing (WGS) was performed on germline and somatic DNA isolated from 108 samples (i.e. buffy coat (germline), mucosa, benign and dysplastic polyps, and adenocarcinomas) from 6 FAP patients. The tissue and germline samples were sequenced to either 30X or 60X average depth coverage and the adenocarcinoma samples were sequenced to 100X coverage (Supplemental Figure 1). Somatic single nucleotide variants (SNVs) and small insertion/deletions (Indels) were detected using Mutect2 in multi-sample mode (DNA isolated from blood was used as a germline reference) to increase sensitivity of detection of identical mutations occuring in different tissue samples within the same patient. Copy number variants (CNVs) as well as purity and ploidy estimates for each of the mucosa, polyp, and adenocarcinomas (AdCa) samples were detected using Titan 14 . We found 2203 ± 1726, 4863 ± 1707, 4083 ± 1985, 172,453 ± 203,555 (mean ± sd) SNVs and 1497 ± 1182, 3089 ± 825, 2036 ± 715, 49032 ± 51384 (mean ± sd) indels in the mucosa, benign, dysplastic and AdCa samples, respectively. Interestingly, stage accounts for most of the variance associated with the number of SNVs detected when compared with size of the lesion and dysplasia status (linear mixed effect model, p = 0.0005, Supplemental Figure 1B).
Across the 106 samples, a total of 302 non-synonymous and deleterious (CADD scores > 20) SNVs, indels and CNVs were found in CRC-specific driver genes (Fig 2A). Although most mutations were found in the AdCa samples (from patients A001, F and G), there were numerous CRC driver mutations which appeared in the early benign polyps as well. We identified patient-specific germline APC mutations in each of their tissue samples and manually recovered them in 2 samples, 1 sample in patient F and G each, which had low variant sequencing depth in the affected APC region. An additional patient, A014, did not have an APC mutation detectable by WGS or clinical diagnostic genetic testing; consistent with the observation that 5% of FAP patients lack detectable germline APC mutations 15. Somatic APC mutation status is associated with higher tumor stage (Chi-squared test; X = 21.01, p-value 1e-04), and somatic KRAS mutation exhibits a similar relationship (Chi-squared test; X = 16.01, p-value 0.00113) (Fig 2B). Although secondary APC and KRAS mutations may not be sufficient for polyp development, they are common across polyps with somatic APC, KRAS and TP53 mutations are present in 43%, 28%, and 11% samples, respectively. This frequency is similar to the rates observed in other analyses of FAP samples 16.
In total, 63 genes across all samples were inferred to be under “positive selection” as determined using the ratio of nonsynonymous relative to synonymous mutations (dN/dS ratio) (Supplemental Figure 2K, FDR < 0.05) 17. Many of these genes are involved in the immunoregulatory gene set (R-HSA-198933) from Reactome 18, including FCGR3A, HLA-A, HLA-C, KIR2DL3, KIR2DL4, LILRA2, KRAS, APC, GNAS, and OTOP1 (Log(q-value) = -2.273, hypergeometric distribution) (Supplemental Table 3, Supplemental Figure 2K). Furthermore, immune-related mutations appear to be under positive selection early in polyp development as LILRA2 is positively selected (dN/dS >1) in the benign and dysplastic samples (q=4.11e-6, q=2.5e-6, respectively), whereas FCGR3A (q = 1.04e−02) and HLA-A (q = 8.61e−04) exhibited evidence of positively selected in the dysplastic samples (Supplemental Figure 3K). Pairwise comparisons of positively selected genes (dN/dS > 1) by clinical stage indicates enrichment of LILRA2 mutations in the benign stage and that KRAS and APC mutations are enriched in dysplastic polyps and adenocarcinomas (pairwise Fisher's exact test, FDR < 0.05, Supplemental Figure 2L). This is consistent with immune dysregulation occurring early in polyp development and persisting in established AdCa, similar to the patterns seen at the single cell level in the accompanying manuscript (Becker et al, submitted).
CNV analysis indicates that alterations in chromosomes 14, 7 and 20, which are commonly gained in CRC, can be acquired in the benign and dysplasia stages of polyp development (Figure 2C) 19. A positive association between the size of the lesion and the number of CNVs was observed (R2 = 0.51, p = 2e-16) (Supplemental Figure 2A,B,C). Whereas mucosa samples are usually unaffected by amplifications and losses, dysplastic polyps have significantly more genomic changes when compared to benign polyps (p = 8.2e-6) (Supplemental Figure 2D,E,F). Genomic losses were prominent in adenocarcinomas, possibly indicating that this is an important step in transition to invasion (Figure 2C, Supplemental Figures 2D-G). Moreover, linear discriminant analysis predicted dysplasia and polyp stage from the fraction of the genome affected by CNVs, (Supplemental Figure 2J; 78% accuracy).
Polyp development and clonal evolution
Analysis of somatic mutations from multiple samples of the same patient provides insights into evolution of hyperplastic polyps located at spatially distinct regions in the colon. To characterize these developmental relationships, we analyzed patterns of somatic mutations across tissues from the same patient. We sought to investigate these patterns using multiple approaches. First, we examined the genetic relationship amongst samples from individual patients, by constructing phylogenetic trees based on mutations and small insertion/deletions (SNVs and InDels) based on the maximum parsimony method (Figure 3A, Supplemental Figures 3B,E,H). Deleterious cancer driver genes (CADD>=20) and pathogenic (CLINSIG) mutations are indicated on the phylogenetic tree (Figure 3B, Supplemental Figures 3A,F,I,L,O) 20–22. Importantly, tree shape reflects the evolutionary histories of multiple lesions from a given patient, as illustrated for Patient A001. Here, the majority of samples harbor many common variants accrued over relatively short branch lengths, suggesting a shared developmental phase, prior to an extended independent growth phase, resulting in thousands of private mutations unique to branches corresponding to individual lesions. Indeed, identical mutations, including canonical drivers such as KRAS p.G12V mutations, were found in benign polyps along the transverse and ascending colon and in the adenocarcinoma in the descending colon, reflecting dramatic physical separation throughout the colon. The somatic origin of these shared events was confirmed by manually inspecting the supporting reads (Supplemental Figures 3R,S,T,U,V,W,X).
To further investigate the phylogenetic relationship between samples harboring identical mutations, we performed ancestral sequence reconstruction (ASR) to determine the likely locations of the mutations on the phylogenetic trees: private mutations on terminal branches and shared mutations on internal branches. We estimated the cancer cell fraction (CCF) for each mutation, thus accounting for differences in sample purity and ploidy23. Mutations present in mucosal tissues as well as many benign and dysplastic samples tended to have low CCF values, reflective of multiple subclonal populations that had not fixed in the population (Figure 3D, Supplemental Figure 3C,G,J,M,P). Most private mutations were subclonal, reflecting abundant clonal heterogeneity, consistent with other studies in adenomas 24. Notably, the distribution of CCF values correlates with tissue stage (linear discriminant analysis), indicating that lower heterogeneity is associated with advanced stage (Supplemental Figure 2J), consistent with the notion that selectively advantageous mutations fix in the population during progression to increasingly malignant states, contributing to reduced heterogeneity. We performed the same analyses on the five other FAP patients with similar results for each case; tissue lesions, including polyps, from different regions of the colon shared many mutations, but also acquired numerous private mutations (Supplemental Figure 1A, Supplemental Figures 3A,C,F,G,I,J,L,M,O,P), indicative of an early shared lineage followed by a long period of independent growth. As a control, we compared mutations across polyps from different individuals and observed few shared mutations linking different patients, as expected (Supplemental Figure 4M).
To determine the relative timing of deleterious SNV/Indel mutation events (colon cancer driver mutations with CADD>=20 and “pathogenic” CLINSIG annotations), we used the phylogenetic relationships between the mutations (shared or private) as well as clonal status to place the mutations from all 6 of the patients on a relative timeline of events (Figure 3E). The rationale behind the relative timeline of events is as follows: 1) because “shared” mutations are detected in more samples, they likely arise before private mutations and 2) because clonal mutations are more abundant than subclonal ones, they likely arise earlier. Using a Bradley Terry model, we find that benign and dysplastic polyps often haror KRAS mutations followed by FBXW7 then second-hit APC mutations. This mirrors prior reports in sporadic CRCs 9 where APC mutations were inferred to occur early, followed by KRAS and FBXW7 mutations and subsequent chr 5q deletion (spanning the APC gene). On a per patient level, we find evidence for mutations in HLA-B in the benign, dysplastic and adenocarcinoma samples (Patient F), potentially reflective of immune evasion as a result of disabled antigen presentation (Figure 3F). Additionally, we find KMT2C mutations in mucosa, dysplastic and adenocarcinoma samples (Patients A001 and G, Figures 3B,C,D,E, Supplementary Figures 3P,Q), suggesting a potential early role for epigenetic dysregulation in colon cancer development.
Quantification of subclonal sharing
WGS facilitated the assessment of subclonal sharing across lesions representing different disease stages and physical locations within the colon. In particular, we compared CCF values for every sample in a pairwise manner, revealing, an abundance of shared subclonal mutations between samples despite the comparatively high number of unique mutations (Figure 4A shows a representative example for A001, Supplemental Figures 4B,D,F,H,J). We quantified this relationship by calculating the Jaccard similarity index (JSI), in which the number of identical subclonal mutations is divided by the sum of private clonal mutations and shared subclonal mutations, as previously described25 . High JSI values occur when two samples share a large number of subclonal mutations with few private clonal mutations, whereas low JSI values occur when two samples have fewer shared subclonal mutations and more private clonal mutations. Pairwise comparisons indicate extensive subclonal sharing across all samples but this varies with disease stage (Supplemental Figure 4L). For example, JSI values were extremely high (0.78 ± 0.02) when comparing mucosa samples across all regions of the colon (Figure 4B,C and Supplemental Figures 4A,C,E,G,I,K). Evaluating pairwise comparisons across stages of malignant progression reveals a reduction in JSI values at later stages (Figure 4B,C and Supplemental Figures 4A,C,E,G,I,K). This is consistent with the fixation of selectively advantageous mutations in malignant lesions and is reflected in the correlation between number of total private mutations per sample and stage (Supplemental Figure 1B, p<0.0001, F-value = 17.2, linear mixed effects model). As might be expected, we find that as the number of lesions a subclonal mutation is detected in increases, its median cellular abundance (CCF) increases (Supplemental Figure 4N).
Additionally, for four patients, A001, A002, A014 and A015, we recorded detailed tissue sample location within the colon, enabling comparisons of the physical distance measured in centimeters with genetic similarity based on the JSI. Irrespective of stage, physical tissue lesion distance did not correlate with genetic similarity, again suggesting that shared subclonal mutations are developmentally early events (Figure 4B, C).
We further sought to investigate whether the patterns of genetic variation across polyps were compatible with a model of monoclonal or polyclonal origin. Briefly, we simulated polyp growth assuming a stochastic birth-death process beginning from a single embryonic progenitor cell in the early colonic epithelium, whereby polyps are initiated from either a single (monoclonal) or group of progenitor cells (n=20 or 40) within this field and grow under defined parameters (Methods). The observed patterns of genetic variation across polyps were remarkably consistent with those simulated under a model of polyclonal origin (either 20-40 cells) (Supplemental Figure 4O-V). These results are in line with an early case study by Thirwell et al. 26 who demonstrated the polyclonal nature of adenomas via clonal marking.
Collectively, the phylogenetics and ancestral sharing analyses as well as the patterns of subclonal mutation sharing support a model of hereditary cancer formation and clonal spreading in which mutations give rise to distinct clones early in development, possibly within the endoderm during gastrulation (Figure 4D). These clones disperse throughout the maturing colon as a result of mechanical forces and tissue expansion and continuously accrue mutations during cell division, resulting in polyclonal polyps.
Extensive molecular changes during polyp formation and progression
In order to better understand the molecular steps and pathways activated during the early stages of polyp formation and cancer progression, we performed deep multi-omic profiling of histologically normal mucosa (M), benign polyps (B), dysplastic polyps (D) and a limited number of adenocarcinomas (A) (Figure 1D). RNA-Sequencing, TMT-based proteomics, untargeted metabolomics and targeted lipidomics were performed on 114, 89, 77 and 77 samples, respectively. Because of limited material, particularly for the small polyps, not all assays could be performed for all samples. The relative levels of transcripts (n = 35,081), proteins (n = 12,389), metabolites (n = 1,157) and lipids (n = 514) were obtained after data curation and normalization (see methods).
PCA analysis of each omic layer revealed a continuous transition from mucosa to benign polyps to dysplastic polyps and for RNA-Seq, to adenocarcinoma, suggesting a gradual progression of malignancy (Fig 5A). Analysis of differences relative to normal mucosa revealed thousands of molecular changes. The largest number of changes were found for transcripts and the least for metabolites and lipids (Fig 5B, Supplemental Data 3, 4, 5 and 6 ). At the gene expression level (Supplemental Data 3), most changes between cancer samples were evident in transitions to adenocarcinoma, with many of these same changes occurring at the early stages (i.e. from mucosa to benign and to dysplasia) (Fig. 5C). Similarly, differential analysis of proteomics, metabolomics and lipidomics (Supplemental Data 4, 5 and 6) revealed that the most significantly changed proteins, metabolites and lipids occur at the early stages (Fig. 5C; see methods & Supplemental Figure 5).
For the three early stage normal mucosa to benign polyp (M-B), normal mucosa to dysplasia (M-D), and benign polyp to dysplasia (B-D) and the late transitions to AdCa (M-A, B-A, D-A), transcripts and proteins evincing significant changes in abundance were subject to pathway enrichment via Qiagen Ingenuity Pathway Analysis (IPA). Similar analyses were performed for metabolite and lipid species using Metabolite Set Enrichment Analysis (MSEA) 27, ConsensusPathDB’s 28over-representation analysis and topology based pathway analysis 29 (see methods). These analyses revealed a total of 842 unique pathways that were altered (Summarized in Fig 6A and Fig 6B; molecules and pathways are summarized in Supplemental Data 7 and 8 respectively). We discuss the molecular features associated with these transitions in the following sections.
Transition from mucosa to benign polyp
To identify the earliest events involved in polyp formation, we examined the molecules and pathways altered in benign polyps relative to FAP mucosa and found 4,017 molecules and over one hundred pathways (Fig. 5B, Fig. 5C and Fig. 6; Supplemental Data 8; corrected p-value < 0.05). A number of connected pathways are shown in Fig. 7A. Notably, the arachidonic acid pathway was upregulated in benign polyps relative to FAP mucosa as indicated by key metabolites and proteins (metabolomics; FDR=1E-03 /lipidomics;FDR=7E-03) which also had alterations in the levels of key proteins (e.g. the GPX family (GPX 2,3,4; FDR=4.6E-05), PTGES3 (FDR=2.9E-02), PTGDS1 (FDR=3.1E-03), PTGIS (FDR=2.8E-03), TBXAS1(FDR=4.46E-03), DPEP1(FDR=1.72E-03), LTA4H (FDR= 2.1E-02) (Fig. 7B). The arachidonic acid pathway is particularly interesting since it is involved in inflammation and can be suppressed by aspirin/NSAIDs, a standard treatment for FAP patients to slow polyp progression and reduce cancer formation 30. Thus, our results provide a molecular explanation for the therapeutic response of FAP-treated patients.
We also observed an extensive alteration of immune response pathways at the transcript and protein levels. Among 82 dysregulated immune response pathways, 26 were supported by both RNA-seq and proteomics data (e.g. complement system, IL-12 signaling and production in macrophages, production of nitric oxide and reactive oxygen species in macrophages, etc), and the remaining pathways (n = 56) were only significant at the transcript level (e.g. agranulocyte adhesion and diapedesis, Th1 and Th2 activation pathway, phagosome formation, T cell exhaustion signaling pathway, etc). The complete list of immune response pathways and their changes in each transition are shown in Supplemental Figure 6. We also detected several immune response-related pathways at metabolite and lipid levels (e.g. Ca-dependent event enrichment -log10 FDR = 5.0E-04). Altogether, these observations may indicate a predominant role of the immune system in the early progression of colon cancer at many different molecular and cellular levels (summarized in Fig 7C).
Additional pathways involved in cell proliferation and cell cycle (Transcriptome, Figure. 6A) and amino acid and lipid metabolism (Metabolome/lipidome Figure 6B) were altered in benign polyps. In particular, we identified a depletion of TAG stores in the benign polyps (Supplemental Figure 7), which was supported by alterations in transcripts levels of several key TAG biosynthesis genes, such as GPAT3, AGPAT family, PLPP family, DGAT1/2 (Fig. 7A). Lower abundance of TAGs may suggest utilization of this lipid class as a key source of energy supply for polyp growth. Significant depletion of TAGs has been reported previously in cancerous tissue 31,32; our results suggest that this is an early event during polyp formation.
Although in many cases pathway enrichment analysis using the different assays (e.g. transcriptome and proteome) resulted in similar results, some changes were only evident with a single assay. For instance, the cell cycle/proliferation and nuclear receptor signaling pathways were strikingly different at the transcriptome relative to the proteome levels (Fig. 6A). These include the canonical pathways for a) FXR/RXR and LXR/RXR Activation, which were modestly enriched bidirectionally without clear patterning in the transcriptomic data, but strongly unidirectionally decreasing in the proteomic data for M-B and M-D, and b) Cell Cycle Control of Chromosomal Replication and G2/M DNA Damage Checkpoint Regulation, for which the proteomic and transcriptomic data were similarly enriched but varied in significance according to stage. These results demonstrate multiple levels of post-transcriptional regulation and highlight the value of multi-omic analyses to establish a holistic understanding of biological changes during early hyperplasia.
Transition to dysplasia and adenocarcinoma
Large numbers of molecular (n = 5,380) and pathway (n = 631) changes were also observed during dysplasia onset (from mucosa to benign and to dysplasia, corrected p-value < 0.05 for molecules; nominal for pathways). In most cases (ie. transcriptome, metabolome and lipidome) these changes could be observed between benign and dysplastic polyps. For proteomics the limited number of benign samples (due to limited sample size) resulted in the identification of fewer novel unique proteins. Many of the pathways evident during benign polyp formation were evident in the dysplasia polyps as well (cell cycle/proliferation). However, several pathways including nuclear receptor signaling and tumor microenvironment pathways became specifically activated during dysplasia formation (Figure 6A). At the compound level, we captured significant alterations of several key lipid signaling pathways (e.g. triglyceride catabolism and metabolism and triacylglycerol degradation) (Figure 6B). In addition, we observed dysregulation of some amino acids such as cysteine metabolism (FDR= 6E-04), a larger number of alterations of nucleotide metabolism (pyrimidine metabolism; FDR= 2.9E-07) and mitochondrial signaling (FDR=9.4E-05) as well as significantly higher regulation of inflammatory pathways (arachidonic acid; FDR=4E-04/ immune system; FDR=7.6E-07) at mucosa to dysplastic polyp transition compared to mucosa to benign polyp. Of note, EP300, a histone acetyltransferase that regulates gene expression 33, is decreased in benign polyps but increased in polyps with dysplasia suggesting a significant transcription activity shift during dysplasia formation (Fig 6A). Overall, these results indicate alterations of key molecules and pathways during dysplasia.
Although our focus was on early cancer formation, we further investigated paired adenocarcinomas from three patients. Two were preserved using formaldehyde fixation and thus were not compatible with proteomic, metabolomic or lipidomics assays. Transcriptome analysis of adenocarcinomas compared with dysplastic polyps revealed significantly enriched pathways involved in cancer signaling and formation. Similarly cellular stress and signalling pathways were highly altered, including DNA double-strand break repair, GP6 signaling pathway, ATM signaling, etc. These pathways were similarly enriched in sporadic colorectal cancers profiled within TCGA.
Early steps of cancer progression
The pairwise comparisons above revealed substantial changes between benign polyps and normal tissue, as well as alterations in dysplastic polyps and adenocarcinoma. To systematically identify molecular and pathway-level changes that occur across stages of malignancy, we examined seven distinct patterns of dynamic change across stages of malignant progression and report their representative trajectories, number of significant molecules, and key pathways (Fig 5D, Supplemental Data 7). For example, Pattern 1 shows a continuous increase/decrease across the entire malignancy spectrum. The Wnt2 transcript shares this pattern which exhibited a progessive increase in expression in the transition from mucosa through adenocarcinoma, whereas C2orf88 and CFD decreased. Wnt2 encodes secreted signaling proteins involved in the Wnt signaling pathway, which we detected as significantly enriched in both the transcriptomic and proteomic data. C2orf88 (chromosome 2 open reading frame 88), which exhibited continuously decreasing expression, was previously reported as the one of the most significantly differentially downregulated genes in CRCs34. Pattern 2 includes molecules that increase during benign and dysplastic polyp formation, but subsequently plateaued, including NKD1, ACSL and FOSL1. NKD1 is a negative regulator of the Wnt signaling pathway 35–37 that has previously been reported to be up-regulated in colorectal adenomas 38. ACSL6 has also been found to be overexpressed in colon cancer 39 and has been investigated as a long chain acyl-CoA synthetase target for cancer therapy 40. Pattern 3 includes pathways associated with tumorigenesis and cancer cell mobility that were altered in benign polyps and in adenocarcinomas but were not significant when comparing benign and dysplastic polyps. Finally, several molecules/pathways were associated with either an increase or a decrease in the dysplastic polyp through adenocarcinoma phase (Pattern 4, “dysplastic associated molecules''). Genes altered in pattern 4 include the taurine transporter, SLC6A6, the tumor suppressor LGALS2, BTNL8 an immune response-related gene, and metallothionein (MT) protein family members such as MT1G. Intriguingly, MT1G was previously reported to be associated with CRC prognosis41. Other patterns of change (Patterns: 5, 6 and 7) reveal molecules and pathways that change at the normal mucosa to benign polyp transition, benign polyp to dysplasia and dysplasia to adenocarcinoma. Collectively, these results indicate specific pathways are variably altered across normal, dysplastic and malignant tissues.
DNA methylation changes during early tumorigenesis
Finally, we examined DNA methylation in eight samples profiled via whole genome bisulfite sequencing (WGBS). Global hypomethylation increased across all transitions, from colorectal mucosa to benign and dysplastic polyps to adenocarcinoma (Figure 5E); in particular, the fraction of partial methylation domains (PMD) increased up to 100% (Figure 5E). At the gene level, promoter hypomethylation was observed in the transition from mucosa to benign polyp for WNT2, TGIF1 and SOX9 (Figure 5F), which are known to be involved in Wnt signaling 42,43 , TGF-beta signaling 44 and CRC metastases 45 respectively. For these three genes, hypomethylation increased with each transition, as did RNA expression. In contrast, PDGFD, a gene involved in RAS signaling 46, exhibited increased hypermethylation and decreased expression as the tissue stage advanced toward adenocarcinoma. This is supported by the finding that PDGFRB protein levels decreased significantly (FDR < 0.01) in benign and dysplastic polyps relative to mucosa and by changes in ceramides (a key component of sphingolipid metabolism), which were significantly downregulated (FDR < 0.05) in both polyps relative to mucosa and associated with the PDGF family (Figure 7A). These results are consistent with WGBS data in advanced cancers from TCGA (Figure 5G)47 and indicate that DNA methylation changes occur early during CRC formation, prior to observable dysplasia.