Target selection and transient transfection of HEK293
HEK293 cells were chosen as the model cell line for allele-specific DNA methylation experiments because of their easy handling and good transfection yields. Information on SNPs in the genome was extracted from a public database (21). An initial estimate of DNA methylation levels was taken from MBD2-pulldown sequencing data from previous work (22). An in silico search was performed to identify SNPs in unmethylated CpG islands (CGIs) located in gene promoter regions. The search results were then filtered for SNPs located in a potential PAM site or next to the PAM site, in the seed region of a potential sgRNA. For 14 target genes containing suitable SNPs, up to 3 sgRNAs were designed to target only one allele, as shown in Table 1. The detailed targeting strategy in each case is shown in Additional File 1. Based on the position of the SNP in the sgRNA, the 24 individual experiments could be divided into three categories, with the SNP either in the sgRNA seed region (12 experiments), at position 2 of the NGG PAM sequence (5 experiments), or at PAM position 3 (7 experiments) (Fig. 1, Table 1). SNPs in the PAM region leading to G > Y changes were preferred (where Y stands for C or T), as these changes were demonstrated to allow the best discrimination of dCas9 binding (23, 24). Similarly, the position of the mismatch in the sgRNA has been shown to be very important for binding specificity, and single mismatches at the 3' end, within the so-called “seed” sequence, were used because these were shown to reduce Cas9 DNA binding most strongly (25).
Table 1
Compilation of ASM experiments conducted in this study. “ASM (Δ%)” and “ASM (ratio)” refer to the difference and ratio of the increases in DNA methylation at the on- and off-target alleles. The detailed targeting strategy in each case including genome coordinates is shown in Additional File 1.
No. | Gene | SNP location | Experiment | SNP | Target allele | ASM (Δ%) | ASM (ratio) |
1 | GPD1L | PAM3 | GPD1L - PAM3 | G to T | G | 50 | 7.6 |
2 | GSPT1 | PAM3 | GSPT1 - PAM3 | T to G | G | 47 | 7.1 |
3 | GSPT1 | Seed | GSPT1 - Seed1 | T to G | G | 18 | 1.7 |
4 | GSPT1 | Seed | GSPT1 - Seed2 | T to G | T | 28 | 2.1 |
5 | TTC41P | PAM3 | TTC41P - PAM3 | G to T | G | 6 | 2.3 |
6 | ISG15 | PAM2 | ISG15 - PAM2 | C to G | G | 63 | 8.5 |
7 | ISG15 | Seed | ISG15 - Seed2 | C to G | C | 34 | 1.8 |
8 | ISG15 | Seed | ISG15 - Seed1 | C to G | G | 36 | 2.3 |
9 | MAPK1 | Seed | MAPK1 - Seed1 | T to G | T | 19 | 1.4 |
10 | MRPL52 | PAM3 | MRPL52 - PAM3 | G to T | G | 52 | 3.7 |
11 | MSH6 | PAM3 | MSH6 - PAM3 | T to G | G | 46 | 10 |
12 | MYH10 | Seed | MYH10 - Seed1 | G to A | A | 16 | 2.6 |
13 | NARF | PAM2 | NARF - PAM2 | G to T | G | 45 | 5.6 |
14 | NARF | Seed | NARF - Seed1 | G to T | G | 43 | 3.0 |
15 | NARF | Seed | NARF - Seed2 | G to T | T | 31 | 1.7 |
16 | PDE8A | PAM2 | PDE8A - PAM2 | G to T | G | 19 | 3.1 |
17 | PDE8A | Seed | PDE8A - Seed1 | G to T | G | 26 | 1.9 |
18 | PDE8A | Seed | PDE8A - Seed2 | G to T | T | 45 | 4.4 |
19 | RAF1 | PAM3 | RAF1 - PAM3 | G to A | G | 4 | 1.7 |
20 | RALB | PAM2 | RALB - PAM2 | G to C | G | 8 | 2.6 |
21 | TYK2 | PAM2 | TYK2 - PAM2 | G to C | G | 15 | 2.7 |
22 | DAP3 | PAM3 | DAP3 - PAM3 | T to G | G | 44 | 4.4 |
23 | DAP3 | Seed | DAP3 - Seed1 | T to G | G | 15 | 1.4 |
24 | DAP3 | Seed | DAP3 - Seed2 | T to G | T | 27 | 5.6 |
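The two summary columns of Table 1 can be illustrated with a minimal sketch. It assumes that the "increase" at each allele is the average methylation of the transfected sample minus that of the untransfected sample; the function name and the example numbers are hypothetical and are not taken from the study.

```python
def asm_metrics(on_treated, on_untreated, off_treated, off_untreated):
    """Return (delta, ratio) of the methylation increases, inputs in percent.

    Assumption: the increase at an allele is its methylation after
    transfection minus its methylation in the untransfected sample.
    """
    on_gain = on_treated - on_untreated
    off_gain = off_treated - off_untreated
    delta = on_gain - off_gain
    # Guard against division by zero when the off-target allele gains nothing.
    ratio = on_gain / off_gain if off_gain > 0 else float("inf")
    return delta, ratio


# Hypothetical example values (percent methylation), for illustration only.
delta, ratio = asm_metrics(on_treated=60, on_untreated=2,
                           off_treated=12, off_untreated=2)
print(delta, ratio)
```

A large difference combined with a large ratio indicates both strong on-target methylation and good allelic discrimination; the ratio alone can be high even when very little methylation was delivered.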
To achieve targeted ASM, transient transfection of the dCas9-10x SunTag-BFP, scFv-DNMT3A/3L-sfGFP, and sgRNA-DsRed plasmids was performed in HEK293 cells. Control experiments were conducted with a scrambled sgRNA that does not have a binding site in the human genome (26). Initial studies showed that cells positive for all three plasmids exhibited the highest fluorescence of the corresponding reporter proteins on day 3 post-transfection. Hence, fluorescence-activated cell sorting (FACS) of triple-positive cells was conducted at this time point. Genomic DNA was isolated from the FACS-sorted cells on day 3 after transfection and subjected to bisulfite treatment. Library preparation was performed using the bisulfite-converted samples, followed by NGS and data analysis (Fig. 2). Most DNA methylation experiments were conducted in three independent biological replicates.
Targeting ASM with a SNP in the seed region of the sgRNA
A total of 12 experiments used individual allele-specific sgRNAs with the SNP positioned in the seed region of the sgRNA (i.e. position 1–5 of the sgRNA). In each case, the targeted allele is referred to as the “on-target” allele and the second allele as the “off-target” allele. Six of these experiments showed strong ASM as described in the next paragraphs (Fig. 3). In all cases, both alleles of the untransfected samples showed almost no DNA methylation, as expected after the pre-selection for unmethylated CGIs.
In the ISG15-Seed2 experiment, the transfected sample showed high DNA methylation at many CpG sites of the on-target allele. At CpG sites 1, 2, 10, 11, 13 and 14, DNA methylation above 60% was delivered, and the highest DNA methylation deposition of 80% was observed at CpG 13. The ISG15-Seed2 sgRNA binds at CpG sites 16, 17, and 18 of the analyzed region, and the DNA methylation data showed that binding of the dCas9 complex protected the sgRNA binding site from methylation deposition (Fig. 3), as observed in previous studies (13, 14). The off-target allele in this experiment also showed some DNA methylation deposition, but the methylation levels always remained below 25%. A scrambled sgRNA was used to further investigate the effect of the global untargeted DNA methylation activity of the EpiEditor. In this case, very low and comparable DNA methylation levels were observed at both alleles. Interestingly, the methylation levels of the scrambled control samples were lower than those of the off-target allele of the sample transfected with the allele-specific sgRNA, suggesting that the off-target allele DNA methylation observed with the ISG15-Seed2 sgRNA was caused, at least in part, by undesired binding of the sgRNA/dCas9 complex to the off-target allele (Fig. 3).
To summarize the data and compare the results of different experiments, DNA methylation levels of individual CpG sites were averaged. For this, CpG sites with DNA methylation levels of at least 50% of the highest methylation percentage delivered in the target allele were included (grey-shaded region in the DNA methylation profile graph of each target in Fig. 3). For uniformity in the analysis of the DNA methylation levels across different targets, this procedure was applied in each case. The same CpG sites were used to calculate the average DNA methylation levels of the on-target and off-target alleles in every sample. As the maximum DNA methylation at an individual CpG site in the case of ISG15 was 80%, CpG sites displaying more than 40% increase in methylation in the transfected on-target allele were included in the average DNA methylation calculation (corresponding to CpG sites 1–3, 5, 7, and 10–14). Based on this, the average DNA methylation of the selected CpG sites in the ISG15 target region was 64%, while in the off-target allele only 27% average DNA methylation was observed at the same CpG sites.
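The site-selection and averaging procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-CpG values are hypothetical, the function names are invented for this example, and the threshold is applied as "at least 50% of the on-target maximum".

```python
def select_sites(on_target, threshold_fraction=0.5):
    """Return CpG sites whose on-target methylation reaches the cutoff.

    on_target maps CpG site number -> methylation (percent) of the
    transfected on-target allele. The cutoff is a fraction of the peak.
    """
    cutoff = threshold_fraction * max(on_target.values())
    return sorted(site for site, level in on_target.items() if level >= cutoff)


def mean_at(levels, sites):
    """Average methylation (percent) over the given CpG sites."""
    return sum(levels[s] for s in sites) / len(sites)


# Hypothetical per-CpG methylation levels (percent), for illustration only.
on_allele = {1: 70, 2: 65, 3: 45, 4: 30, 5: 80}
off_allele = {1: 20, 2: 25, 3: 15, 4: 10, 5: 30}

sites = select_sites(on_allele)       # peak is 80, so the cutoff is 40
on_mean = mean_at(on_allele, sites)   # on-target average
off_mean = mean_at(off_allele, sites) # same sites, off-target allele
print(sites, on_mean, off_mean)
```

Selecting the sites once on the on-target allele and reusing them for the off-target allele and the controls keeps the averages comparable across samples of the same experiment.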
In addition to ISG15-Seed2, PDE8A-Seed2, DAP3-Seed2, NARF-Seed1 and MYH10-Seed1 were successfully targeted allele-specifically with a discriminating SNP in the sgRNA seed region. In these cases as well, the on-target alleles of the transfected samples displayed higher DNA methylation than the respective off-target alleles (Fig. 3). The DNA methylation profiles clearly showed the footprint of the sgRNA binding region in each case (MYH10-Seed1 CpG 6, NARF-Seed1 CpG 8–10, PDE8A-Seed2 CpG 9–10, DAP3-Seed2 CpG 3–5) (Fig. 3). The average DNA methylation levels in the on-target regions of NARF-Seed1, PDE8A-Seed2, DAP3-Seed2 and MYH10-Seed1 were 68%, 59%, 34% and 26%, respectively (Fig. 3). In the cases of PDE8A-Seed2 and DAP3-Seed2, the DNA methylation levels and profiles of the off-target alleles from the transfected samples were comparable to those of the samples treated with scrambled sgRNA, indicating very high allele specificity of sgRNA binding. This observation suggests that the off-target allele methylation in these experiments was due to untargeted methylation activity of the DNMT3A/3L construct, a problem that has been observed in previous work (14, 27, 28). In the case of NARF-Seed1 and MYH10-Seed1, a weaker allelic discrimination of the sgRNA/dCas9 complex was observed that was comparable to ISG15-Seed2. The DNA methylation profiles of the additional experiments using a SNP in the sgRNA guide region for allele discrimination, which led to weaker ASM, are shown in Additional File 2, Supplementary Fig. 1.
Targeting ASM with a SNP positioned at the second position of the PAM
In the cases of PDE8A, NARF, ISG15, TYK2, and RALB, allele-specific targeting could be designed by using an sgRNA that places the SNP in the second position of the PAM site. The on-target alleles of these genes contain the NGG PAM. The off-target alleles of PDE8A-PAM2 and NARF-PAM2 contain an NTG sequence, and the off-target alleles of RALB-PAM2, TYK2-PAM2, and ISG15-PAM2 contain an NCG sequence. ASM was successful in 3 cases, NARF-PAM2, PDE8A-PAM2 and ISG15-PAM2 (Fig. 4A, DNA methylation profiles are shown in Additional File 2, Supplementary Fig. 2). The average DNA methylation levels of the NARF-PAM2, PDE8A-PAM2, and ISG15-PAM2 on-target alleles were 58%, 28%, and 72%, respectively. In all three cases, the sgRNA/dCas9 complex showed a clear discrimination between the on-target and the off-target alleles, and off-target DNA methylation levels were comparable to the scrambled sgRNA controls. In the case of NARF-PAM2, the maximum DNA methylation deposited was 75% at CpG 13. The dCas9 footprint on CpG 7–9 was observed in the DNA methylation profile of the on-target allele, while no footprint was observed in the off-target allele of the transfected sample or in either allele of the scrambled sample. In the case of ISG15-PAM2, high specificity and high on-target DNA methylation levels were observed (Fig. 4A). Multiple CpG sites (1, 2, 4, 11, 13, and 14) on the on-target allele showed over 70% DNA methylation deposition in the targeted sample. In the case of PDE8A-PAM2, DNA methylation was only observed on the on-target allele, although it was weaker, with a maximum methylation of about 34% at CpG 5. The DNA methylation profiles of the additional experiments using a SNP at the second PAM position for allele discrimination, which led to weaker ASM, are shown in Additional File 2, Supplementary Fig. 3.
Targeting ASM with a SNP positioned at the third position of the PAM
In the case of the MSH6, GPD1L, MRPL52, DAP3, GSPT1, RAF1 and TTC41P gene loci, sgRNAs could be designed that place the SNP at the third position of the PAM site. ASM was successfully implemented in the case of MSH6-PAM3, GPD1L-PAM3, MRPL52-PAM3, DAP3-PAM3, and GSPT1-PAM3 (Fig. 4B, DNA methylation profiles are shown in Additional File 2, Supplementary Fig. 4). In each case, high on-target DNA methylation and good ASM specificity were observed. The average DNA methylation delivered in the MSH6-PAM3, GPD1L-PAM3, MRPL52-PAM3, DAP3-PAM3, and GSPT1-PAM3 target alleles was 52%, 62%, 72%, 57% and 56%, respectively. The DNA methylation profiles of the additional experiments using a SNP at the third PAM position for allele discrimination, which led to weaker ASM, are shown in Additional File 2, Supplementary Fig. 5.
Comparison of ASM achieved at one genomic locus by different targeting methods
For the 5 loci ISG15, NARF, PDE8A, DAP3 and GSPT1, multiple sgRNAs could be designed that place the SNP for allele discrimination either in the PAM or in the sgRNA seed region. Of note, if the SNP is present in the seed region, two distinct sgRNAs can be designed to address both alleles of the target gene. This is not possible if the SNP is in the PAM region, where the non-G allele cannot be targeted. Hence, in all cases, 3 different targeting constructs were used. The results of these experiments are presented in Fig. 5. Strikingly, in two examples, PDE8A and DAP3, efficient and opposite ASM was observed on both alleles of the target locus. In general, these data illustrate that ASM was most efficient when the SNP was present in the PAM region.
Stability of ASM in HEK293 cells
The stability of the introduced targeted DNA methylation is a critical issue, and previous work has provided examples of high and low stability (22, 29–32). Systematic studies showed that this depends, among other factors, on the genomic locus (22, 33). To investigate the stability of the introduced ASM in our system, the DNA methylation of both alleles in the target regions in the ISG15-Seed2, MYH10-Seed1, PDE8A-PAM2, ISG15-PAM2, MRPL52-PAM3, MSH6-PAM3, and GPD1L-PAM3 experiments was studied at regular intervals after transfection. The day of transfection is noted as day 0, followed by the sorting of triple-positive cells on day 3. The sorted cells were seeded and the culture was maintained until day 11. A fraction of cells was collected on day 3, day 5, day 8, and day 11 and the DNA methylation was analyzed. Every target region showed a maximum of ASM on day 3 and a gradual decrease of DNA methylation until day 11, although the rate of DNA methylation loss differed between targets (Fig. 6, Additional File 2, Supplementary Fig. 6). The experiments GPD1L-PAM3 and ISG15-Seed2 delivered ASM with high stability of the DNA methylation, and on day 11 DNA methylation levels corresponding to 93% and 84%, respectively, of the methylation observed on day 3 were still present. In the experiments ISG15-PAM2, MRPL52-PAM3, and MYH10-Seed1, 69%, 60% and 57%, respectively, of the DNA methylation deposited on day 3 was retained. Other experiments, such as PDE8A-PAM2 and MSH6-PAM3, showed low stability of ASM and retained only about 30% of the DNA methylation observed on day 3 until day 11.
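The retention values quoted above express the day 11 methylation as a percentage of the day 3 methylation. A minimal sketch of that calculation is shown here; the time-course numbers are hypothetical and the function name is chosen only for this illustration.

```python
def retained_percent(time_course, reference_day=3, final_day=11):
    """Methylation on final_day as a percentage of that on reference_day.

    time_course maps day -> average on-target methylation (percent).
    """
    return 100 * time_course[final_day] / time_course[reference_day]


# Hypothetical average on-target methylation per sampling day (percent).
example_course = {3: 60, 5: 55, 8: 50, 11: 45}
print(retained_percent(example_course))  # 75.0
```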
Modulation of allele-specific gene expression ratios by ASM
Next, we were interested in determining whether the targeted ASM has the capacity to alter gene expression in an allele-specific manner. The genes ISG15, MSH6, MYH10, MRPL52, NARF and GPD1L contain SNPs in an exon, allowing the expression of the two alleles to be discriminated. Therefore, the allelic expression ratios of these genes were analyzed with and without introduction of ASM. RNA was extracted from the cells sorted on day 3 post-transfection and from untreated cells, and cDNA was generated. A pair of primers was designed to amplify the corresponding exonic region containing the SNP, followed by library generation and NGS. For each gene, the number of reads per allele was extracted from the sequencing data. The ratio of the reads of each allele was calculated for the untransfected and transfected samples. A change in the ratios between the untransfected and the transfected samples indicates an alteration in the expression levels of the alleles of this gene. Among the genes tested, RNA of the transfected samples ISG15-Seed2 and MRPL52-PAM3 displayed a clear shift in the ratio of alleles after ASM when compared to the parental allele ratio (Fig. 7). The ratio of allele reads from ISG15-Seed2 in untransfected samples was 35:65, while after ASM establishment a ratio of 66:34 was observed, corresponding to a 3.6-fold change of the expression ratio (p-value 4.8 × 10⁻⁴, based on a two-sided t-test assuming equal variance). The allele read ratio of MRPL52 before and after transfection was 66:34 and 49:51, respectively, corresponding to a 2-fold change (p-value 6.2 × 10⁻⁴, based on a two-sided t-test assuming equal variance). The other studied genes did not show changes in the allelic read ratios (Fig. 7).
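The reported fold changes are consistent with dividing the allelic read ratio after ASM by the ratio before ASM. The sketch below is one reading of that calculation, not necessarily the exact procedure used in the study; the function name is hypothetical, and the fold change is reported as a magnitude of at least 1 regardless of the direction of the shift.

```python
def ratio_fold_change(before, after):
    """Fold change of the allelic read ratio between two conditions.

    before and after are (allele_1, allele_2) read shares or read counts.
    The result is inverted when below 1 so that it reports magnitude only.
    """
    ratio_before = before[0] / before[1]
    ratio_after = after[0] / after[1]
    fold = ratio_after / ratio_before
    return fold if fold >= 1 else 1 / fold


# Allelic read ratios quoted in the text above.
isg15 = ratio_fold_change(before=(35, 65), after=(66, 34))
mrpl52 = ratio_fold_change(before=(66, 34), after=(49, 51))
print(round(isg15, 1), round(mrpl52, 1))  # 3.6 2.0
```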