Fly work and generation of mutant flies by CRISPR/Cas9 genome engineering
All flies were raised on standard corn meal yeast extract medium at 25°C. CRISPR/Cas9 mutant fly lines Double and ΔPRE1 are described in [12].
Sequences of gRNAs used to create fly lines PRE1_UP and PRE2_UP are described in Table S1. Sense and antisense oligonucleotides were annealed and phosphorylated by the T4 polynucleotide kinase (NEB#M0201S) before being inserted inside a pCFD3 plasmid (Addgene #49410) previously digested by BbsI (NEB#R0539S). To create the pHD-dsRED donor plasmid (Addgene) containing a removable (floxed) 3XP3-dsRED construct flanked by loxP sites and DNA fragments having homology to the target regions (homology arms) serving as template for homology-directed repair, 1.5 kb genomic DNA fragments were amplified by PCR (Table S1) and inserted into the pHD-dsRED plasmid using the GIBSON assemble (kit NEBuilder NEB#E2621S).
The PRE1 and PRE2 sequences were PCR amplified and introduced into the donor plasmid cut by SpeI and BglII using GIBSON cloning (Table S1). To generate mutant fly lines, gRNA-containing pCFD3 and pHD-dsRED donor plasmids were injected into flies expressing Cas9 in the germline (vas-Cas9(X) RFP-; Bloomington stock #55821). Injections and dsRED screening was performed by BestGene (https://www.thebestgene.com/). To remove the dsRED reporter construct, mutant flies were crossed with a fly line expressing CRE recombinase (Bloomington stock #34516).
To generate the PRE1_UP and PRE2_UP mutant line gRNAs targeting the PRE upstream region and corresponding donor plasmid were injected into ΔPRE1 mutant lines previously generated and expressing Cas9 (vas-Cas9(III)). Genotypes of mutant fly lines were confirmed by PCR genotyping and sequencing analysis of the mutated region.
CUT&RUN
CUT&RUN experiments were performed as described by Kami Ahmad in protocilas.io (https://dx.doi.org/10.17504/protocols.io.umfeu3n) with minor modifications. 50 leg discs were dissected in Schneider medium, centrifuged for 3 min at 700g and washed twice with wash+ buffer before addition of Concanavalin A-coated beads. Embryos were collected and homogenized using a glass douncer in Nuclear extraction buffer (20mM HEPES, 10mM KCl, 0.1% TritonX, 20% Glycerol), then washed once with wash+ buffer before addition of Concanavalin A-coated beads.
MNase digestion (pAG-MNase Enzyme from Cell Signaling) was performed for 30 min on ice. After ProteinaseK digestion, DNA was recovered using SPRIselect beads and eluted in 50ul TE. DNA libraries for sequencing were prepared using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina. Sequencing (paired-end sequencing 150bp, approx. 2Gb/sample) was performed by Novogene (https://en.novogene.com/).
CUT&RUN Analysis
Fastq files were aligned using Bowtie 2 (v 2.4.2) [19] with the following parameters: --local ---very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700. The samples for WT and ΔPRE1 were aligned to the D. melanogaster reference genome dm6 in https://s3.amazonaws.com/igenomes.illumina.com/ and samples for the PRE1_UP condition were aligned to a modified genome. First, the 1.091 bp region chr2L:16,422,435-16,423,525 including the PRE1 was removed from the dm6 reference genome and replaced with a string of 1091 n’s (undetermined nucleotides). Next, the removed 1,091bp-long sequence was inserted at position chr2L:16,560,363 bp to create the modified genome. SAM files were compressed into BAM files using samtools (v 1.16.1) [20] and reads with low mapping quality (Phred score <30) were discarded. Duplicate reads were removed using sambamba markdup (v 1.0.0) [21] with the following parameters: -r --hash-table-size 500000 --overflow-list-size 500000. For visualization, replicates were merged using samtools merge with default parameters and reads per kilo base per million mapped reads (RPKM)-normalized bigWig binary files were generated using the bamCoverage (v 3.5.5) function from deepTools2 [22] with the following parameters: --normalizeUsing RPKM --ignoreDuplicates -e 0 -bs 10. Two biological replicates were used per condition and the IgG condition was used as a control for the differential enrichment analysis in the 131 Drosophila Polycomb domains [15] performing the DESeq2 method from the “DiffBind” R package (v 3.12.0) (Table S2).
RNA-seq experiments
Third instar larval leg discs were dissected in Schneider medium on ice. Total RNA was extracted using TRIzol reagent. RNA purification was performed using the RNA Clean & Concentrator kit (Zymo Research, #R1015). Finally, poly-A RNA selection, library preparation and Illumina sequencing (20M paired-end reads, 150nt) were performed by Novogene (https://en.novogene.com/). All experiments were performed in triplicate.
RNA-seq processing and differential analysis
RNA-seq data quality was assessed using FastQC (v0.12.1). Stranded RNA-seq data were mapped to the Drosophila melanogaster dm6 genome (FlyBase dmel_r6.34) using Subread-align (Subread v2.0.6) with default parameters. Samtools merge was used to merge bam files from the same sample sequenced in different lanes. (Samtools v1.16.1). Aligned sequencing reads mapped to gene transcripts were counted using featureCounts (Subread v 2.0.6) with -s 2 (reverse stranded) and default parameters. The gene transcript annotation file was obtained from FlyBase (release 6.34). Prior to statistical analysis, genes with fewer than 10 reads (cumulating all samples analyzed) were removed. Differentially expressed genes (DEGs) were identified using the DESeq2 R package (Table S3). Genes with adjusted p-value<0.01 (using the Benjamini-Hochberg FDR method) and |log2FC|>1 were considered differentially expressed. Volcano plots were generated using the “EnhancedVolcano” R package (DOI: 10.18129/B9.bioc.EnhancedVolcano).
Hi-C
Hi-C experiments were performed using the EpiTect Hi-C Kit (Quiagene#59971). All Hi-C experiment were performed in two or three independent experiments using 50 3rd instar imaginal leg discs. Briefly, discs were homogenized and fixed in activated Buffer T and 2% Formaldehyde using Tissue Masher tubes (Biomasher II (EOG-sterilized) 320103 Funakoshi). Tissue was digested by adding 25ul Collagenase I and II (40 mg/ml) for 1h at 37°C. Samples were centrifuged and supernatant was carefully aspirated, leaving ~250 μl of solution in the tube. Then 250ul QIAseq Beads equilibrated to room temperature were added to bind nuclei to the beads and all subsequent reactions were performed on the beads according to the manufactures protocol. Libraries were sequenced at BGI (https://www.bgi.com/) PE 150 (approx. 400 million reads per replicate).
Hi-C analysis
Raw data from Hi-C sequencing were processed by using the "shHiC2" pipeline. Sequencing statistics are summarized in Table S4. Valid interactions were stored in a database using the “misha” R package (https://github.com/msauria/misha-package). Extracting the valid interactions from the misha database, the "shaman" R package [https://bitbucket.org/tanaylab/shaman] has been used for computing the Hi-C expected models, Hi-C scores with parameters k=250 and k_exp=500 (Fig. 1c and 2c), and differential Hi-C interaction scores with parameters k=250 and k_exp=250 and per each comparison down-sampling the compared datasets to have the same number of valid-pairs in chr2L (Fig. 1f and 2g). Specifically, Hi-C scores quantify the contact enrichment (positive values) or depletion (negative values) of each bin of the map with respect to a statistical model used to evaluate the expected number of counts. To generate this expected model, we (randomized) shuffled the observed Hi-C contacts using a Markov Chain Monte Carlo-like approach per chromosome [23]. Shuffling is done such that the marginal coverage and decay of the number of observed contacts with the genomic distance are preserved, but any features of genome organization (e.g, TADs or loops) are not. These expected maps were generated for each biological replicate separately and contain twice the number of observed cis-contacts. Next, the score for each contact in the observed contact matrix was calculated using k nearest neighbors (kNN) strategy [23]. In brief, the distributions of two-dimensional Euclidean distances between the observed contact and its nearest k_exp neighbors in the pooled observed and pooled expected (per cell type) data are compared, using Kolmogorov–Smirnov D statistics to visualize positive (higher density in observed data) and negative (lower density in observed data) enrichments. These D-scores are then used for visualization (-100 to +100 scale) and are referred to as Hi-C scores in the text. Accordingly, the color-scale of the Hi-C scores comprises both positive and negative values. When computing the Differential Hi-C scores maps the reference dataset (WT in Fig. 1f and ΔPRE1 in Fig. 2g) were used as the expected model.
For each condition, the Hi-C interaction quantifications at the dac PRE loop (Fig. 1g, 2f) were performed by considering the Hi-C scores between two regions of 6 kb: in Fig. 1g. and Fig 2f (left) we considered the locations of the two endogenous PREs (PRE1 in chr2L:16,419,514-16,425,515bp and PRE2 in chr2L:16,482,929-16,488,930bp) in Fig 2f (right) we considered the locations of the endogenous PRE2 and the insertion site of the relocated PRE1 (PRE2 in chr2L:16,482,929-16,488,930bp and PRE1_UP in chr2L:16,557,363-16,563,364bp). Each of the comparisons of the Hi-C interaction quantifications at the dac PRE loop was performed between the WT (control line) and the indicated mutant line. An unpaired two-sided Wilcoxon statistical test (H0: true median shift is equal to 0. The two variables are not normally distributed) was used to estimate the reported p-values. The annotation of the Polycomb-associated TADs in chr2L in [15] was used to compute the number of Hi-C interactions intra-PcG-TAD, which were then divided by the number of Hi-C interactions intra-PcG-TAD in the WT (control line). The distributions of Log2 of these ratios are shown in the violin- and boxplots of Fig. 1h. The boxplots show: central line, median; box limits, 75th and 25th percentiles; whiskers, 1.5×interquartile range. An unpaired two-sided Wilcoxon statistical test (H0: true median shift is equal to 0. The two variables are not normally distributed) was used to estimate the reported p-values.
The insulation scores [24] were computed on the observed Hi-C datasets binned at 2kb resolution with windows of 100, 150, 200, 250, and 300kb resulting in five values per bin and were stored in the misha database using an in-house R script. The mean and standard deviation per each of the 2kb-bins were computed were used for the plots in Fig. 1d and 2d. The quantification of the insulation scores (IS) at the TAD borders was performed applying the pair-wise statistical comparison of the five IS quantifications per 2kb-bins. The p-values in Fig. 1d and 2d resulted from a Welch t-test (H0: true difference in means is equal to 0. The variances of the samples are thought not to be equal) between the WT condition and each of the mutant condition at the corresponding locus. All plots of Hi-C maps, Hi-C interaction scores comparisons, insulation score (IS) profiles with window 100kb, p-values of IS comparisons were obtained with in-house R scripts provided in Github (see Availability of data and materials).
RNA-FISH
RNA FISH was performed as described in Denaud et al, in press 10.1038/s41594-024-01375-7)