Plasmid construction
pYADE4 yeast plasmids encoding full length DNMT1 and DNMT3a with modified sequences around the translation start sites were kindly provided by Dr Jan Fronck, pYES3/CT encoding DNMT3b was provided by Dr Shen Li 36, 55. DNMT3L cloned into pYES3/CT to produce a Nterminal FLAG tagged DNMT3L was provided by Dr Jia-Lei Hu 37.
pCM188 (marker cgURA3) and pCM185 (marker cgHIS or cgLEU), centromeric vectors 39 which differ for the number of Tet operators (respectively 2 and 7) were kindly provided by Dr Jessie Colin.
SmaI restriction fragment (from pYADE4-DNMT1) containing full length DNMT1 cDNA was inserted at PmeI site of pCM185 (LEU) to give pCM185(LEU)-DNMT1.
BamHI-MluI restriction fragment (from pYADE4-DNMT3a) containing full length cDNA from DNMT3a was ligated to pCM185 (HIS) linearized with BamHI and MluI to give pCM185(HIS)-DNMT3a.
BamHI-NotI restriction fragment from pYES3/CT-DNMT3b containing full length DNMT3b was ligated to pCM188 (URA) linearized with BamHI and NotI to obtain pCM188 (URA)-DNMT3b.
Yeast strains and culture conditions
Strain YPH499 (Mata ura3–52 lys2–801 ade2–101 trp1-Δ63 his3-Δ200 leu2-Δ1) was transformed with 2, 3 or 4 expression plasmids by the standard lithium acetate procedure. Transformants were selected on plates of appropriate selective medium with 2% Raffinose and 10µg/ml doxycycline to repress any expression.
Selected transformants (2 to 4 transformants per combination of plasmids) were grown on selective liquid medium with 2% Raffinose and 10µg/ml doxycycline up to OD600 = 0.5. Then, yeast cells were spun 10min at 1000x g, washed twice with sterilized water, and resuspended into selective media with 1% Raffinose and 2% Galactose without doxycycline to allow expression of all four DNMTs. For experiments on synchronized cells, cells were treated with alpha-factor (3µM final) for 4 hours to synchronise cells in G1 or with Nocodazole to synchronize cells in G2.
After different times of induction, cells were collected and treated for subsequent experiments: protein extraction for western blotting, gDNA extraction for whole genome bisulfite sequencing, RNA extraction for RNA-sequencing or Semi-intact cell preparation for Mnase digestion and nucleosome mapping.
Flow cytometry analysis
0.5 ml of culture (OD600 = 0.6–0.8) were collected and centrifuged for 5 mn at 1000 g at RT. Pellets were washed twice with 1x ice-cold PBS and resuspended in 50 µl of 1x ice- cold PBS. 20 µl of cells were fixed with 1 ml of 70 % EtOH overnight at 4 °C. Samples were washed with 1x Saline Sodium Citrate buffer (SSC; 150 mM NaCl, 15 mM Na citrate, pH 7.8 for 20x SSC). The pellet was resuspended in 0.5 ml of 1x SSC, treated with 0.5 mg/ml RNase A (Roche) for 1.5 h and then with 0.5 mg/ml Proteinase K (Roche) for another 1.5 h at 50 °C. After incubation, cells were briefly sonicated for 10 mn, medium potency, by using the Bioruptor system (intervals of 10 s on–20 s off). 250 µl of the cells were added in 0,5 ml of 1x SSC containing 1 µM Sytox Green (Sigma) and were incubated 10–20mn in the dark (room temperature) before analyzing the DNA content using a Beckam Coulter GalliosTM flow cytometer.
HPLC/MS
HPLC/MS/MS analysis was based on the protocol described 56 54. A Kinetez 2.6 μm HILIC 100A column (150 mm × 4.6 mm) (Phenomenex) and a Acquity UPLC system (Waters Corp., Milford, MA, USA) coupled to a mass spectrometer API 3000™ (AB Sciex, Foster City, CA, USA) triple quadrupole working in MRM(multiple reaction monitoring) method in positive mode. Two eluents were used: eluent A2 (Acetonitrile) and eluent B1 (0.1 M ammonium formiate adjusted at pH 3.2) with a isocratic gradient 8 min of total running time at 90 % A and 10 % B for the nucleosides elution. The separation was performed in a flow of 1400 μl min−1, with 10 μl injection volume and two replicates each, totaling two biological replicates and two technical replicates of each sample. The standard nucleosides cytosine and methyl-cytosine (Sigma) were diluted in HCl 0.01N and stored at −20 °C. The m/z transitions from 112 to 95 (cytosine) and from 126 to 81 (methyl cytosine) were chosen for MRM experiments. The peak area obtained was analyzed by Analyst 1.4.2 (AB Sciex). Quantification (%) was performed according to 5mdC concentration divided by 5mdC concentration plus dC concentration multiplied by 100.
Western Blot
Proteins were extracted by resuspending the pellet of cells from a 20ml cultures at OD600 = 1 in 400µl of RIPA buffer (50mM Tris pH7.5, 150mM NaCl, 1% NP40, 0.5% NaDeoxycholate, 0.1% SDS) containing 1mM PMSF and protease inhibitors (cOmplete ULTRA Tablets, Mini, EASYpack, Roche). 400µl of glass beads were added and samples were processed using FastPrep (MP) for 3 times for 20sec pulses @4.5m/s. After centrifugation 5min at 5000rpm, supernatant were recovered and quantified by bradford. 20 µg of protein were loaded on 6 or 8 % acrylamide gel and subjected to PAGE, proteins were then transferred onto an immobilon membrane (millipore) for subsequent hybridization with anti-DNMT1 (ref ab87654, Abcam), anti-DNMT3a (ab2850, Abcam), anti-DNMT3b (ab122932, Abcam) or anti-Flag (F7425, Sigma) antibody overnight followed by secondary antibody anti Rabbit (Goat)-HRP conjugated (65–6120, Invitrogen). The signal was revealed using ECLTM prime WB detection reagent (Amercham, GE Heathlcare).
DNA METHYLATION PATTERN
Whole-genome bisulfite sequencing (WGBS)
WGBS was performed following the procedure outlined in 9. Briefly, genomic DNA (1- 2μg) was spiked with unmethylated λ DNA (5 ng of λ DNA per μg of genomic DNA) (Promega). The DNA was sheared by sonication to 50–500 bp using a Covaris E220 and fragments of size 150–300 bp were selected using AMPure XP beads (Agencourt Bioscience Corp.). Genomic DNA libraries were constructed using the Illumina TruSeq Sample Preparation kit (Illumina Inc.) following the lllumina standard protocol: end repair was performed on the DNA fragments, an adenine was added to the 3’ extremities of the fragments and Illumina TruSeq adapters were ligated at each extremity. After adaptor ligation, the DNA was treated with sodium bisulfite using the EpiTexy Bisulfite kit (Qiagen) following the manufacturer’s instructions for formalin-fixed and paraffin-embedded (FFPE) tissue samples. Two rounds of bisulfite conversion were performed to assure a conversion rate of over 99%. Enrichment for adaptor- ligated DNA was carried out through 7 PCR cycles using the PfuTurboCx Hotstart DNA polymerase (Stratagene). Library quality was monitored using the Agilent 2100 BioAnalyzer (Agilent), and the concentration of viable sequencing fragments (molecules carrying adaptors at both extremities) estimated using quantitative PCR with the library quantification kit from KAPA Biosystem. Paired-end DNA sequencing (2x100bp) was then performed using the Illumina Hi-Seq 2000.
Read mapping and estimation of cytosine methylation levels
The WGBS reads were processed using the gemBS pipeline v3.0 57 using as reference S. cerevisiae S288c. Reads with MAPQ scores < 20 and read pairs mapping to the same start and end points on the genome were filtered out after the alignment step. The first 5 bases from each read were trimmed before the variant and methylation calling step to avoid artifacts due to end repair. For each sample, CpG sites were selected where both bases were called with a Phred score of at least 20, corresponding to an estimated genotype error level of < = 1%. Sites with >500x coverage depth were excluded to avoid centromeric/telomeric repetitive regions. CpGs were considered methylated when the number of mapped reads was larger than 10 and the estimated methylation percentage was above 0.1.
DNMT specificity analysis
We extracted two bases downstream and upstream from each CpG (having at least ten WGBS reads mapped) and trained a logistic regression model (using R) for the number of converted and non-converted Cs, using the extracted motifs as predictors for each WGBS sample (samples removing one of the DNMTs, T859, T860, T861 and T869; and two samples with the four DNMTs, T862 and T863). We computed for each sample the effect of each motif and its standard deviation, and used it to determine those with a significant effect on methylation level (estimated effect above two standard deviations). We found motifs specific for each sample lacking one of the DNMTs (motifs with significant effect in the sample removing one DNMT but not significant in the sample with all DNMTs) and compared their relative frequencies in all samples.
Nanopore sequencing
Suspensions of spheroplasts from methylated and control S. cerevisiae strains were loaded on Sage Science gel cassettes to perform lysis under electrophoretic conditions. DNA content in each sample was estimated by the cell count. A number of spheroplasts equivalent to 10µg of genomic DNA were resuspended in 70 µl of HLS Suspension buffer (Sage Science, Mammalian white Blood cell suspension kit, #CEL-MWB1) and loaded on the gel cassettes (Sage Science, SageHLS HMW DNA extraction kit #HEX–0012).
The custom Sage HLS (Sage Science) protocol used (Extraction Collection DC55V 1h15m) was accommodated for the yeast small chromosome sizes. This custom protocol did not include a DNA fragmentation step. In brief, during the extraction step, the High Molecular Weight (HMW) yeast gDNA was bound in agarose while the solubilised and degraded proteins and other contaminants were kept in solution. The Sage Science Buffer A was used as a lysis buffer for this step. In the last step of the protocol, the HMW DNA was retrieved from the gel through an automated elution process that was optimized to elute all the yeast chromosomes in the elution module number 2 of the cassette.
Elution modules 1, 2, 3 & 4 were selected for the library preparation of the control and methylated S. cerevisiae samples. For each condition, the selected elution modules were pooled, purified with 1-fold excess of Agencourt AMPure XP beads (Beckman Coulter, A63882) and eluted in water. Two barcoded libraries containing both type of samples were prepared using the Oxford Nanopore Ligation sequencing kit (ONT, SQK-LSK109) combined with the Oxford Nanopore Native Barcoding Expansion kit (EXP-NBD103 1D) following manufacturer’s instructions.
After connecting the flows cells to the MinION Mk1b device, the MinKNOW interface QC (Oxford Nanopore Technologies) was run in order to assess the flow cell quality. Once the priming of the flow cell was finished, from 200ng to 600ng of the final barcoded library was loaded into R9.4.1 FLO-MIN106 or FLO-MIN106D flow cells and the sequencing data were collected during 48 hours. The quality parameters of the sequencing runs were further monitored by the MinKNOW platform in real time. The MinKNOW versions used was 1.15.4. The basecalling was performed using Guppy 2.3.7.
Reads were mapped using minimap2 2.9-r720, and CpG methylation was called using nanopolish 0.11.0.
NUCLEOSOME MAPPING
Semi-intact Yeast cell preparation
Semi-intact cells were prepared as previously described 58. Briefly, cells were grown at 30°C in 300 ml YPD to = 1 x 107 cells/ml. For each 250 ml of cells (107cells/ml), semi- intact cells were prepared as follows. Cells were collected by centrifugation (700 g, 7 min, RT), resuspended in 25 ml 100 mM Pipes, pH 9.4, 10 mM DTT, incubated with gentle agitation at 30°C for 10 min, and collected by centrifugation (1,000 g, 5 min, RT). Cells were resuspended in 6 ml YP, 0.2% glucose, 50 mM KPO4, pH 7.5, 0.6 M sorbitol. 10u zymolase was added, and the suspension was incubated with gentle shaking 30°C for 30 min. Spheroplasting was monitored by light microscopy. Great care was taken not to overdigest cells to avoid lysis. Spheroplasts were collected by centrifugation at 1,000 g for 5 min at RT, re- suspended with a plastic pipette in 40 ml YE 1% glucose, 0.7 M sorbitol, and incubated with gentle shaking at 30°C for 20 min. Spheroplasts were collected by centrifugation (1,000 g, 5 min, RT) and washed twice at 4°C with cold permeabilization buffer (20 mM Pipes-KOH, pH 6.8, 150 mM K-Acetate, 2 mM Mg- Acetate, 0.4 M sorbitol. The final pellet was resuspended in 1ml cold permeabilization buffer containing 10%(v/v)DMSO. 100µl aliquots were placed in 1.5 ml microfuge tubes and frozen slowly above liquid N2 and stored at –80°C.
MNase-seq
0.4 x 109 semi-intact cells were digested with micrococcal Nuclease (MNase), 1.5 unit at 37ºC for 30min with 3mM CaCl2. The reactions were stopped by addition of EDTA to a final concentration of 0.02 M and subsequently incubated with RNase A (0.1 mg) for 4h at 37ºC and further treated with Proteinase K at 37ºC o/n. DNA was purified using phenol–chloroform extraction and concentrated by ethanol precipitation.
The percentage of mononucleosomal DNA fragments was examined by 2% agarose gels. Furthermore, the integrity and size distribution of digested fragments were determined using the microfluidics-based platform Bioanalyzer (Agilent) prior to library preparations following Illumina standard protocol. The short-insert paired-end libraries for MNase sequencing were prepared with PCR free protocol using KAPA Library Preparation kit (Roche). In short, 2.0 micrograms of Micrococcal nuclease (MNase) digested genomic DNA from S. cerevisiae was end-repaired, adenylated and ligated to Illumina platform compatible adaptors with dual indexes (Integrated DNA Technologies). The adaptor-modified end library was size selected and purified with AMPure XP beads (Agencourt, Beckman Coulter). The final libraries were quantified by Kapa Library Quantification Kit for Illumina platforms (Roche).
The libraries were sequenced using TruSeq SBS Kit v4-HS (Illumina), in paired-end mode with a read length of 2x76bp following the manufacturer’s protocol. Images analysis, base calling and quality scoring of the run were processed using the manufacturer’s software Real Time Analysis (1.18.66.3).
Nucleosome calling
MNase-seq paired-end reads were mapped to yeast genome (sacCer3, Apr. 2011) using Bowtie 59 aligner, allowing a maximum of 2 mismatches and maximum insert size of 500 bp. Output BAM files were imported in R 60 58 and quality control was performed with htSeqTools package to remove PCR artifacts 61. Filtered reads were processed with nucleR package 62 as follows: mapped fragments were trimmed to 50bp maintaining the original center and transformed to reads per million. Then, noise was filtered through Fast Fourier Transform, keeping 2% of the principal components, and peak calling was performed using the parameters: peak width 147 bp, peak detection threshold 35%, maximum overlap of 80 bp, dyad length 50 bp. Nucleosome calls were considered well- positioned when nucleR peak width score and height score were higher than 0.6 and 0.4, respectively, and fuzzy otherwise.
Nucleosome Dynamics
NucDyn R package 44 was used to find changes in nucleosome organization between control and methylation induced samples. P-values quantifying the nucleosome change were obtained running NucDyn with the following parameters: maximum difference of 70, maximum length of 140, minimum number of reads to report a shift of 3, shifts threshold of 0.1, indels minimum number of reads to report evictions and inclusions (indels) of 3, indels threshold of 0.05.
GENOMIC ANNOTATION
Data was annotated from the UCSC gene track that contains 6692 genes. We discarded genes that are described as “Putative” or “Dubious” and genes located in the mitochondrial chromosome. We used gene lengths to normalize methylation proportions, nucleosome coverages and CpG density partitioning each gene in 137 bins (each bin has on average 10 bp since the mean length of yeast genes is 1369 bp).
GENE EXPRESSION
mRNA library preparation and sequencing
The RNASeq libraries were prepared from total RNA (extracted by the standard hot phenol protocol) as follows. Total RNA quality and quantity were assessed using Qubit® RNA HS Assay (Life Technologies) and RNA 6000 Nano Assay on a Bioanalyzer 2100 (Agilent). The RNASeq libraries were prepared following KAPA Stranded mRNA-Seq Illumina® Platforms Kit (Roche) following the manufacturer´s recommendations. Total RNA (500ng) was enriched for the polyA mRNA fraction and fragmented by divalent metal cations at high temperature. In order to achieve the directionality, the second strand cDNA synthesis was performed in the presence of dUTP. The blunt-ended double stranded cDNA was 3´adenylated and Illumina platform compatible adaptors with unique dual indexes and unique molecular identifiers (Integrated DNA Technologies) were ligated. The ligation product was amplified with 15 PCR cycles and the final library was validated on an Agilent 2100 Bioanalyzer with the DNA 7500 assay (Agilent).
The libraries were sequenced on HiSeq2500 (Illumina) using TruSeq SBS Kit v4-HS (Illumina), in paired-end mode with a read length of 2x76bp following the manufacturer’s protocol. Images analysis, base calling and quality scoring of the run were processed using the manufacturer’s software Real Time Analysis (1.18.66.3). We generated over 20 million paired-end reads for each sample in a fraction of a sequencing lane.
RNA-seq data processing and analysis
RNA-seq reads were mapped against the yeast reference genome (Sacharomyces_cerevisiae.R64–1–1+plasmid) using STAR version 2.5.2a 63 with ENCODE parameters. Quantification of annotated genes (ensembl release 87) was done using RSEM version 1.2.28 64 with default options. Heatmaps with the top differentially expressed genes was perform with the pheatmap R package. Differential expression between conditions was performed with DESeq2 version 1.18 65 with default parameters.
3D GENOME STRUCTURE
The protocol was performed as previously described 66 with a few modifications. 100 ml of yeast culture were crosslinked with 3% formaldehyde during 20 min and quenched with Glycine 125mM during 5 min at RT. Cells were crushed during 30 min in liquid nitrogen and the chromatin was digested with HindIII. The DNA overhangs were filled- in with dNTP including Biotin–14-dATP, and the resulting blunt end were ligated. After ligation, samples were purified with phenol:chloroform and DNA was precipitated with ethanol.
The paired-end Hi-C-sequencing libraries were prepared with KAPA Library Preparation kit (Roche) with some modifications. The biotin marked and de-crosslinked DNA was sheared to a size of 300–500bp on Covaris™ LE220 (Covaris) focused-ultrasonicator. The fragmented DNA was end-repaired, adenylated and the biotin-tagged DNA was pulled down using the Dynabeads™ MyOne™ Streptavidin C1 beads (Thermo Fisher Scientific). The biotinylated fragments were ligated to Illumina platform compatible adaptors with unique dual indexes and unique molecular identifiers (Integrated DNA Technologies) and enriched by 12 PCR cycles by KAPA HiFi PCR Kit (Roche).
The libraries were sequenced on HiSeq2500 (Illumina) using TruSeq SBS Kit v4-HS (Illumina), in paired-end mode with a read length of 2x76bp following the manufacturer’s protocol. Images analysis, base calling and quality scoring of the run were processed using the manufacturer’s software Real Time Analysis (1.18.66.3).
Hi-C data processing and normalization
We processed Hi-C data using TADbit 67 (https://github.com/3DGenomes/tadbit) for quality control, mapping and filtering. First, quality control was performed with the FastQC protocol implementation in TADbit. Then, reads were mapped to the reference yeast genome (sacCer3, Apr. 2011) with a fragment-based strategy. Afterwards, non- informative contacts (self-circle, dangling-end, error, duplicated and random breaks) identified by TADbit were filtered-out, obtaining 32–37 million valid interactions per experiment. Off-target contacts (neither end of the read mapped to one of the capture regions) were also discarded (full details of the number of excluded reads are given in Supplementary Table S5). Finally, contact matrices were created from valid reads at 5 kb resolution with the corresponding TADbit module, and low frequency bins were removed.
Contact matrices were transformed to .hic format for visualization in Juicebox 68 using the pre command, and normalized with the Balanced method 69.
Differential Hi-C analysis was performed using the R/Bioconductor package diffHic 70. The mapped Hi-C data were filtered and the differential interaction analysis between the control and methylated samples (using the two replicates for each treatment) was performed using the procedure recommended in the diffHic manual.
Hi-C-based chromatin 3D structure
High resolution Hi-C data at 5 kb was used to obtain the 3D structure, conformation and dynamics of entire yeast chromosomes and their context inside the nucleus. The Hi-C technique provides interaction contacts between DNA fragments. The interaction counts or frequencies between two loci i and j (fij) can be converted to spatial 3D distances between those loci (dij) by an inverse relationship (equation 1),
dij = 𝛾 / fij𝛼 (1)
where 𝛾 represents the scale of the structure and is usually taken to match experimental distances between selected genomic regions, and the precise value of 𝛼 depends on the organism under study, the genomic distance, and the resolution of the Hi-C map and needs to be fitted 71, 72, 73. In the present work, 𝛾 was taken for the model to match the size of the cell nucleus measured by confocal microscopy and 𝛼 was fitted to maximize the correlation between experimental and modeled contact maps.
Since Hi-C interaction counts are known to present several biases, such as mappability of fragments, GC content, and fragment length, they were normalized using iterative correction and eigenvector decomposition 74. Finally, the output of the conversion procedure was a matrix containing equilibrium distances (r0) for the different interacting loci. To remove the background noise, a cutoff of two times the median of all trans contacts (i.e., between different chromosomes) was applied to the HiC contact map to define interacting regions.
The chromosome model was built as a chain of beads, each bead representing a genomic region that corresponds to a bin from the Hi-C map. Spatial equilibrium distances were obtained from equation 1 as explained above. The distances between interacting beads
(r) were restrained near their equilibrium length during the simulations by penalizing with a harmonic potential (equation 2) when approaching at shorter distances or moving away at longer distances than the equilibrium. A tolerance of one bead radius was applied, thus resulting in a flat-welled parabola potential (equation 2),
E = k(r r’)2 (2)
where r’ = r0 rbead for r < r0 rbead, r’ = r when r lies within r0 +/- rbead and r’ = r0 + rbead
when r > r0 + rbead.
To ensure proper connectivity of the fiber, consecutive beads were also bound by a harmonic potential but with a force constant five orders of magnitude stronger than that applied to interacting non-consecutive beads. An excluded volume was defined for each bead by a standard Lennard-Jones potential with equilibrium distance equal to one bead radius and a soft energy well. Additional repulsive restraints were added for non interacting beads, forced to remain at a distance longer than the maximum equilibrium distance obtained from equation 1. The initial structure of the chromosome fiber was varied between an extended conformation and a random localization of initially unbound beads in different replicas. The system was allowed to sample the conformational space using pmemd simulation engine for GPU from Amber 18 package. Different conformations of the fibers were determined by attraction and repulsion forces arising from the distance restraints between beads.
In the end, an ensemble of structures was obtained by selecting the minimum number of snapshots minimizing the number of experimental restraint violations (equilibrium distances input). This method yields a population of structures with different conformations, which in average, but not individually, reproduce experimental Hi-C maps derived from population of cells with variable chromatin structure 75.
The ensemble was built in the following way. First, sampled structures with more restraint violations than the mean restraint violations were discarded. Then, the structure with less restraint violations was selected. Considering only the restraints violated by the selected structure, the structure fulfilling more of these restraints was kept. The procedure was repeated iteratively, always considering the restraints that were not fulfilled by any of the previously selected structures. Iterations were stopped when there was no structure left fulfilling new restraints.
Chromatin coarse-grained model at the nucleosome level
The starting point for the 3D chromatin model at the nucleosome level is the coverage of the MNase-seq signal obtained using NucleR software 62. Different families bearing nucleosomes in locations compatible with the MNase-seq experiment are derived by deconvolution of the coverage signal by using a composite Gaussian approximation. For each of the resulting families (compatible with the Mnase-seq signal and DNA/histone stoichiometry) an ideal 3D chromatin structure is prepared and further simulated by a coarse-grained Monte Carlo sampling approach with flexible linkers and rigid nucleosomes. Linker DNA is represented at the base pair level by a pseudo-harmonic potential expressed in helical parameters (rise, slide, shift, twist, roll, tilt) 76. Debye Huckel electrostatics and excluded volume potentials were added to avoid overlaps (exact details of the simulation procedure will be described elsewhere). The results of the different simulations are clustered to select the minimum number of nucleosome
structural families that makes physical sense and that together reproduce MNase-seq experiments.