Tissues
Tissues were prospectively collected at the University Hospital of Zurich (Switzerland) with institutional research ethics committee approval. Donors provided written consent to tissue collection, testing, and data publication. Samples were numerically coded to protect donors’ rights to confidentiality.
Immediately after resection, biopsies of normal colon mucosa, primary CRCs, and CRC liver metastases were frozen in liquid nitrogen and stored at -80°C. Samples included six primary tumors, three of which were accompanied by patient-matched samples of normal mucosa from the same gut segment (>2 cm from the tumor), and twelve liver metastases (none of which were from the primary CRC donors) (Table 1). All of the primary and secondary lesions were microsatellite-stable and CpG island methylator phenotype-negative.
Laser tissue microdissection
Laser tissue microdissection was done with a CellCut system (Molecular Machines & Industries, Glattbrugg, Switzerland). Briefly, 10 µm-thick sections were cut with a Hyrax C60 cryostat (Zeiss, Feldbach, Switzerland) from frozen tissues embedded in Tissue-Tek O.C.T. (i.e., optimum cutting temperature formulation; Sakura, Alphen aan den Rijn, The Netherlands). The sections were placed on membrane slides (Molecular Machines & Industries), fixed in propanol for 45 seconds, and covered with one drop of Mayer's hematoxylin (MHS128, Sigma-Aldrich, Buchs, Switzerland) for 45 seconds. After vigorous washing, the sections were sequentially immersed in 0.1% ammonia (10 seconds), propanol (45 seconds), and xylene (45 seconds) and dried for 5 minutes. Stained tissues were then subjected to laser microdissection using special tubes with caps to which the dissected sections adhered (Molecular Machines & Industries, Glattbrugg, Switzerland). Epithelial cells from normal or neoplastic crypts were selectively collected on the cap, with care taken to minimize stromal-cell contamination. Between 10 and 25 x 106 µm2 of dissected epithelium (= ≈100,000 to 250,000 epithelial cells) was collected from each sample. Immediately after dissection, DNA was extracted with the QIAmp DNA Micro kit (Qiagen, Hombrechtikon, Switzerland) and quantified with a Qubit fluorometer and dsDNA HS Assay kit (Thermo Fisher Scientific, Reinach, Switzerland) (total yield: 100–500 ng DNA).
MBD-peptide-based capture of DNA for deep sequencing
Methylated DNA for sequencing was isolated using an MBD-capture protocol [20]. Reaction volumes were scaled down to successfully process the small volumes of genomic DNA obtained with laser tissue microdissection. Briefly, 100 ng of input DNA was fragmented to an average length of 200 bp in a Covaris (Brighton, UK) S2 ultrasonicator. Recombinant MBD2 protein-mediated enrichment (MBDE) for methylated DNA was carried out with the MethylMiner kit (ThermoFisher, Waltham, MA, USA) using a single elution step with 2000 mM NaCl elution buffer. Multiplex Illumina libraries were prepared with the NEBNext Ultra DNA Library Prep Kit (New England Biolabs, Ipswich, MA, USA), and their sizes and concentrations were evaluated with the Agilent (Santa Clara, CA, USA) 2200 TapeStation system. Libraries were sequenced on an Illumina 2500 system (Illumina, San Diego, CA, USA) (on average 50-bp single-end reads, 40 million reads per sample).
Raw methylome data are deposited in ArrayExpress (accession number: E-MTAB-8232).
Detection of differential methylation
MBD-sequencing reads were aligned to the GRCh37/hg19 human reference genome using bwa (version 0.7.12-r1039) and the BWA-MEM algorithm [21] and de-duplicated with Picard (Picard Toolkit, version 2.13.2; 2018. Broad Institute, GitHub Repository: http://broadinstitute.github.io/picard/).
The R-package csaw (version 1.14.1) [22] was used to identify regions that were differentially methylated in neoplastic tissues (primary and/or metastatic) vs. normal mucosa or in metastatic vs. primary cancers. The number of reads per sample was counted in consecutive overlapping windows (length: 200-bp, overlap: 190 bp). Windows with minimum count sums of 30 across samples were kept. Additional filtering was used to exclude windows with average log counts per million that were below -1. Reads aligned to the X or Y chromosome were excluded from analysis. The csaw package uses methods from edgeR (version 3.22.3) to identify differential binding [23] (i.e., NB GLM model to fit read counts). Each filtered window was tested for differential methylation (i.e., differential binding of the methyl-binding protein) associated with disease status (CRC, primary or metastatic, vs. normal mucosa) or disease stage (primary vs. metastatic CRC). After testing, windows separated by no more than 50 bp were merged to form regions, and P-values were calculated for each region using the Simes method, as described in the package. Differentially methylated regions (DMRs) were classified as hypermethylated or hypomethylated based on the direction of the methylation change in the majority of windows included in the region.
Regions were annotated using the annotatr R-package [24], which overlays regions of interest with predefined annotations from external sources. Annotations were grouped to create three “intragenic” categories: 1) regulatory: genomic regions 5 kb upstream from the TSS, and 5’UTRs; 2) gene body: exons and introns (from the end of the 5’UTR to the beginning of the 3’UTR); and 3) the 3’UTRs. Everything outside these regions was considered “extragenic” (Figure 1A). Regions were also classified based on their CpG density: CpG islands and the CpG shores and shelves flanking them were grouped together and referred to as an sSISs (shelf–shore-island-shore-shelf) region, and everything outside these regions was classified as extra-sSISs (Figure 1D). These regions were overlaid onto all windows tested for differential methylation.
Visualization was performed using the ggplot2 (version 3.0.0) [25] and UpSet (version 1.3.3) [26] R-packages. Analyses were performed with R versions 3.6.0.
Estimated CpG density
Each CpG site (CpGs) (Team TBD 2014; BSgenome.Hsapiens.UCSC.hg19; R package version 1.4.0.) was centered within a 200-bp window, and the ratio of the observed to expected numbers of CpGs (O:E CpG ratio) [27] was calculated for each window, as a measure of its CpG density (low - O:E CpG ratios < 0.3; intermediate >= 0.3 but < 0.6; high >= 0.6 – high density).
Comparison of MBDE- and targeted bisulfite-sequencing in colorectal cancer
In a previous study [13], we used targeted bisulfite sequencing (SeqCapEpi CpGiant protocol, Roche, Basel, Switzerland) to assess DNA methylation in three CRCs (all microsatellite-stable and CpG island methylator phenotype-negative) and matched samples of normal mucosa (ArrayExpress accession number: E-MTAB-6949). The results of those experiments were compared with those obtained in the present study with MBD-capture sequencing, to identify potential strengths and weaknesses of the two enrichment methods for CRC methylome characterization.
For this analysis, we considered CpGs covered by the targeted enrichment (TE) procedure, those covered by MBD enrichment (MBDE), and those covered by both enrichment methods. Because nearby CpGs are frequently co-methylated [28, 29], the log-fold change (logFC) and P-value for each CpG site in a given MBD-captured region were assumed to be identical to those calculated for the region as a whole. For CpGs covered by the TE procedure, differences in methylation proportions and P-values were calculated with the BiSeq R package (Hebestreit K, Klein H, 2018. BiSeq: Processing and analyzing bisulfite sequencing data. R package version 1.22.0), by modeling the methylation level within a beta regression and estimating the group effect, in this case primary CRC vs. normal
CpGs covered by both methods were classified according to the CpG density of the region in which they were located (as specified above) and the direction of the differential methylation identified at the site in primary CRCs (vs. normal mucosa): hypermethylated (TE: difference in methylation proportions > 0; MBDE: logFC > 0 and P-value < 0.05); hypomethylated (TE: methylation proportion difference < 0; MBDE: logFC < 0 and P-value < 0.05); isomethylated (TE: methylation proportion difference = 0; MBDE: logFC = 0 and P-value => 0.05). The strength of the linear association between the two methods was calculated with the Pearson correlation coefficient.
DMRs identified from TE data with the BiSeq R package were overlaid with DMRs detected from MBDE reads, and regions were classified as overlapping if they shared a sequence of at least 1 bp.