Amplicon sequencing (AmpSeq) of ABCs shows increased indels in human B cells
Of the known ABCs associated with B cell malignancies,3 the major focus here is on the ABC upstream of the CRLF2 gene.13 CRLF2 is located on a pseudoautosomal region (PAR) shared by both the X and Y chromosomes.33 We also include data on the ABC in the 3’ UTR of the BCL2 gene, also called the BCL2 Major Breakpoint Region (MBR) (Fig. 1A), to demonstrate that more than one ABC site can be targeted even though only a DSB at one leads to an oncogenic translocation with IGH.10
According to the mapping of chromosomal translocation junctions in human patients, a DSB that results in pathogenic rearrangements involving CRLF2 can occur anywhere in the 27 kb region upstream of the gene, yet two specific DSB clusters are of note.13 Approximately 4.5 kb upstream of CRLF2 is a cluster that shows the CAC/CACA motif characteristic of a cryptic recombination signal sequence (RSS) recognized by the RAG complex. Off-target RAG cutting at this site leads to an intra-chromosomal deletion that puts the CRLF2 gene under control of the P2RY8 promoter, leading to its overexpression.13,34 The other site is 16 kb upstream of CRLF2 and contains no cryptic RSS sites, but does have DSBs at or near a CpG, which is a hallmark of ABCs since AID has an affinity to deaminate cytosines within RCG sequences (WGCW >>> WRC >> RCG).35–37 For rearrangements involving ABCs, it has been shown that a translocation is 30% more likely to occur directly at a CpG dinucleotide and 70% more likely to occur within 8 bp of a CpG dinucleotide.10
AID requires single-stranded DNA (ssDNA) as a substrate for deamination.35 While AID expression in the human pre-B cell lines Nalm6 and Reh is below the limit of detection,38 the ssDNA generated at these sites would still be vulnerable to damage. Thus, we were curious if ABCs show an inherent propensity for damage. This was tested using amplicon sequencing (AmpSeq). The detailed sequence for the CRLF2 region with the ABC is shown in Supplemental Fig. 1A. Strikingly, as with the reported patient data,10,13 there is a peak of deletions near the CpGs in the CRLF2 ABC in both Nalm6 and Reh cell lines (Fig. 1B). These deletions likely represent information scars indicating that these are sites of DSB formation and NHEJ repair, which is prone to creating indels. In Nalm6 cells, indels appear very focal and are highest at the first two CpG sites. While indels are also highly clustered at this site in Reh cells, we found that the indels appear more spread throughout the ABC. In both cell lines the downstream CpG does not appear to accumulate indels.
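The per-position deletion profile described above can be illustrated with a minimal Python sketch. It assumes aligned reads are available as (reference start, CIGAR string) pairs, as in SAM records; the function name and the toy reads are hypothetical, and this is not necessarily the pipeline used to generate Fig. 1B.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def deletion_coverage(alignments):
    """Count, for each reference position, how many reads delete it.

    alignments: iterable of (reference_start, cigar_string) pairs.
    Returns a Counter mapping reference position -> number of reads
    whose alignment contains a deletion covering that position.
    """
    coverage = Counter()
    for start, cigar in alignments:
        pos = start
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op == "D":
                for p in range(pos, pos + n):
                    coverage[p] += 1
            if op in REF_CONSUMING:
                pos += n
    return coverage


# Hypothetical reads: two deletions that overlap at the same site.
toy_reads = [(100, "10M3D10M"), (105, "5M2D8M")]
profile = deletion_coverage(toy_reads)
for position in sorted(profile):
    print(position, profile[position])
```

Plotting such counts against the coordinates of the CpG sites would give a profile of the kind described for the CRLF2 ABC; normalizing by read depth per position is left out here for brevity.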
Similarly, using AmpSeq at the BCL2 ABC (Supplemental Fig. 1B), we also see indels, yet observe a notably different pattern for each cell line (Fig. 1C). It has been reported that DSBs cluster at the three regions carrying CpG sites in patients.10 In Nalm6 cells, we measure a major peak centered on the middle CpG site, but no indels at the flanking CpG sites. In contrast, while there is still a peak in Reh cells, it is downstream of the three CpG sites. This may indicate either that this region is less accessible to the enzymes that drive formation of these indels or that DSB ends are processed or resected differently once formed. Importantly, indels accumulate in the vicinity of CpGs at both CRLF2 and BCL2, correlating with the regions where DSBs involved in chromosomal translocations have been mapped in human patients.10,13 This shows that these regions have an innate instability, consistent with results from studies of their cytosine content.39
CRISPR/Cas9 demonstrates ability of digital PCR (dPCR) to detect indels at ABCs
While AmpSeq can provide an analysis of sequence changes occurring at ABCs, we sought a more rapid and quantitative method to detect mutations within ABCs and developed a novel dPCR-based assay40 for the CRLF2 and BCL2 ABCs (Supplemental Fig. 1A and 1B). For CRLF2, three TaqMan probes (FAM, TAMRA, Cy5) each bind directly at one of the three CpG sites. These are drop-off probes, as DSB formation and repair would alter the sequence, preventing probe binding (i.e., the probe “drops off”). The SUN reference probe binds where no DSBs have been mapped and acts as a detector for the amplicon and a baseline for the total fluorescent signal produced. Genomic DNA (gDNA) from cells or patient blood can then be collected and subjected to dPCR analysis to determine genome stability at specific loci (Fig. 2A).
We tested the sensitivity of the dPCR assay using a CRISPR/Cas9 system to induce DSBs. Two single-guide RNAs (sgRNAs), sgCRLF2-1 and sgCRLF2-3, were designed to target Cas9 to sites bound by the TAMRA drop-off probe (Supplemental Fig. 1A) and were transfected into Nalm6-Cas9 cells (Supplemental Fig. 2). dPCR was performed on gDNA from untransfected Nalm6-Cas9 cells or cells transfected with sgCRLF2-1. 2D plots of the dPCR results are shown in Figs. 2B and 2C. No significant accumulation of the drop-off product is measured in untransfected cells, as nearly all amplicons are bound by both TAMRA and SUN probes (Fig. 2B). Upon transfection of sgCRLF2-1, there is a dramatic increase in the level of drop-off product (i.e., an amplicon where the SUN probe binds but the TAMRA probe does not) (Fig. 2C). Comparing the copies/µL of drop-off product across multiple replicates demonstrates the high degree of consistency between experiments and shows that transfection of sgCRLF2-1 results in approximately 10% of cells showing drop-off of the TAMRA probe, a 1000-fold increase over the baseline levels, leading to over 10 copies/µL of drop-off product (Fig. 2D). Interestingly, if we compare this to sgCRLF2-3, which recognizes a site on the opposite DNA strand and was predicted to cut as efficiently as sgCRLF2-1, we measure 5-fold less drop-off.
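Digital PCR concentrations are conventionally obtained from the fraction of positive partitions by Poisson correction, and the drop-off fraction is then the ratio of drop-off product to reference signal. The Python sketch below illustrates that arithmetic only. The partition volume, the partition counts, and the function names are assumptions chosen for illustration, not the instrument settings or data behind Fig. 2D, and treating SUN-positive/TAMRA-negative partitions as pure drop-off product is a simplification.

```python
import math


def copies_per_ul(positive, total, partition_volume_ul):
    """Poisson-corrected concentration (copies/µL) from partition counts."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    # Mean copies per partition: lambda = -ln(fraction of negative partitions).
    mean_copies_per_partition = -math.log(1.0 - positive / total)
    return mean_copies_per_partition / partition_volume_ul


def dropoff_fraction(dropoff_conc, reference_conc):
    """Share of reference-positive amplicons that lost the drop-off probe."""
    return dropoff_conc / reference_conc


# Hypothetical values for illustration only.
PARTITION_VOLUME_UL = 0.001   # assumed; depends on the dPCR platform
TOTAL_PARTITIONS = 20000      # assumed
sun_positive = 2000           # partitions with any SUN (reference) signal
sun_only = 210                # SUN-positive, TAMRA-negative partitions

reference = copies_per_ul(sun_positive, TOTAL_PARTITIONS, PARTITION_VOLUME_UL)
dropoff = copies_per_ul(sun_only, TOTAL_PARTITIONS, PARTITION_VOLUME_UL)
print(f"reference: {reference:.1f} copies/µL")
print(f"drop-off:  {dropoff:.1f} copies/µL")
print(f"drop-off fraction: {dropoff_fraction(dropoff, reference):.3f}")
```

Under this arithmetic, a drop-off fraction of about 10% that represents a 1000-fold increase implies a baseline on the order of 0.01%.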
To determine if Nalm6-Cas9 cells harbor ongoing genome instability at ABC sites or if the sgRNAs used display off-target activity, we used High-Throughput rejoin and Genome-Wide Translocation Sequencing (HTGTS-JoinT-seq).41 An sgRNA that recognizes a region of the IGH gene that is unrearranged in Nalm6 cells (sgIGH-6, Supplemental Table 1) was used to create the bait DSB to map breakpoints genome-wide. Compared to no-bait sgRNA controls, which showed very low numbers of random breakpoint junctions (283 and 161 total breakpoint junctions, Supplemental Fig. 3), sgIGH-6 transfection resulted in over a 1000-fold increase in junctions (Fig. 2E), but no ABC hotspot translocation events. Recurrent t(X;14) translocation events were only detected after co-transfection with sgCRLF2-1 (Fig. 2E). Similar results were obtained with an sgRNA that targets Cas9 to BCL2 (Fig. 2F). These results indicate that Nalm6 cells are not undergoing genome-destabilizing events at the two ABC sites at levels detectable by joins to the IGH-induced DSB.
dPCR detects increased indels in response to increased AID expression
To directly correlate AID activity with indel formation at ABCs, we integrated a doxycycline (dox) inducible AID expression cassette via lentiviral transduction to generate Nalm6-AID cells. While no protein was detectable with 0 ng/mL of dox, increases in both AID expression (Fig. 3A) and protein levels (Fig. 3B) were evident as dox levels increased. gDNA was prepared from Nalm6-AID cells grown in increasing levels of dox for dPCR analysis of the CRLF2 ABC (Fig. 3C). With no dox, the drop-off products indicating mutations at the FAM- and TAMRA-bound sites are elevated while the Cy5 site shows little drop-off. This is consistent with the AmpSeq data of this region and again demonstrates that the region bound by FAM and TAMRA is inherently unstable while the WGCW at Cy5 is not. Strikingly, with just 100 ng/mL of dox to increase AID levels, there is a sharp increase in drop-off of the Cy5 probe that continues to rise as more dox is added, showing a clear dose response. Drop-off at the FAM and TAMRA sites also rises as AID levels increase with 250 and 500 ng/mL dox treatment, showing that, on top of the mutations already present, AID can further increase mutations at the CRLF2 ABC.
A similar dose response correlating increased AID expression with increased drop-off was also measured for the BCL2 ABC (Fig. 3D). The Cy5-bound site, which showed the most instability via AmpSeq in Nalm6, shows a sharp increase in response to initial dox treatment, while mutations at the FAM site continue to increase in response to increasing dox levels. This demonstrates that both the CRLF2 and BCL2 ABCs are accessible to and targeted by AID.
Importantly, we also tested non-ABC loci that harbor CpG sites, AMIGO1 (chr1: 109,507,297, 5’-CACAATGGGCGTATCA-3’) and PLEKHA5 (chr12: 19,358,238, 5’-TGACTGTGGAAGAGCA-3’), by dPCR. Neither site shows any increased drop-off as AID levels increase in the cells, showing that the RCG and WGCW sites at these control regions do not accumulate mutations (Fig. 3E). While this list is not exhaustive, and we cannot rule out that some non-ABC sites are targeted, the stark difference between these control loci and the ABC sites shows that the latter are much more heavily targeted by AID activity. These data support that ABCs at CRLF2 and BCL2 are unique features prone to AID damage, likely related to these sequences being dynamic and able to form transient ssDNA structures.39,42
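The AID target motifs named in this section (WGCW, WRC, RCG) use IUPAC codes, where W is A or T and R is A or G. As a minimal sketch, the Python snippet below scans the two control sequences quoted above for these motifs on the strand as written. The helper name is hypothetical, and only the given strand is searched (no reverse complement).

```python
import re

# IUPAC codes needed for the motifs discussed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}


def motif_positions(sequence, motif):
    """Return 0-based start positions of all (overlapping) motif matches."""
    pattern = "".join(IUPAC[base] for base in motif)
    # A lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


controls = {
    "AMIGO1": "CACAATGGGCGTATCA",
    "PLEKHA5": "TGACTGTGGAAGAGCA",
}

for name, seq in controls.items():
    for motif in ("CG", "RCG", "WRC", "WGCW"):
        print(name, motif, motif_positions(seq, motif))
```

On these two sequences the scan finds an RCG site in the AMIGO1 sequence and a WGCW site in the PLEKHA5 sequence, matching the motif types the text attributes to the control regions.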
Amplicon sequencing of patient samples shows increased indels
To apply our approach to monitor genome instability at ABC sites in LA populations that are susceptible to Ph-like ALL, we leveraged a set of primary samples from the UC Irvine Hematological Malignancies Biorepository, which included two LA patients with Ph-like ALL (17–062 and 19–021) and two LA patients with Ph+ ALL (15–010 and 17–061) (Table 1) that contained ample cells to provide gDNA for AmpSeq. While the number of samples is small due to the relative rarity of these cases, it is adequate for the exploratory nature of this study as we continue to build our biorepository. Strikingly, for the CRLF2 ABC, there are remarkable similarities in terms of indels with each LA patient presenting a major peak centered over two CpGs that are proximal to each other (23 bp apart), yet not at a more distal CpG 150 bp away (Figs. 4A-4C, and Supplemental Fig. 4A), regardless of being Ph-like versus Ph+.
The BCL2 ABC showed a more diverse pattern of indel formation in patients (Figs. 5A-5D), similar to what we found in the Nalm6 and Reh cell lines. While translocations involving this region are typically associated with follicular lymphoma, which does not present as a LA health disparity,14 the mechanism of DSB formation instigated by AID is thought to be similar,3 thus making it important to monitor other ABC sites in LA patients for genome instability. For LA Ph-like patients 17–062 and 19–021 and LA Ph+ patient 15–010, similar patterns are observed where indels accumulate and peak just after the CpG sites. For patient 17–061, however, we see that the indels form two distinct peaks that line up with the 2nd and 3rd cluster of CpGs (Fig. 5D). As we describe further below, patient 17–061 has other features that suggest a higher level of genome instability. Overall, this indicates that LA patients presenting with increased instability at ABC sites are at higher risk for B cell cancers, making this type of analysis a useful diagnostic tool for cancer risk and severity.
dPCR of LA patient samples shows increased indels at ABCs
As dPCR uses only a fraction of the DNA used for high-throughput sequencing and provides absolute quantification, there would be a clear advantage to using this method to detect instability at ABC sites. Direct comparison of AmpSeq and dPCR data from gDNA isolated from LA patients with either Ph-like ALL (17–062 and 19–021) or Ph+ ALL (15–010 and 17–061) also reveals that dPCR has a greater sensitivity for detecting indels. Patient 17–062, for example, shows elevated drop-off at the FAM and TAMRA sites, as shown in the AmpSeq data, yet the dPCR also detects indels at the more distal CpG covered by the Cy5 probe (Fig. 4D). In contrast, patient 19–021 shows no significant accumulation of indels at the Cy5 CpG site, only at the FAM- and TAMRA-bound CpG sites, despite AmpSeq showing a similar number of reads as 17–062 (Fig. 4E). The dPCR profile for the Ph+ ALL patient 15–010 (Fig. 4F) also shows elevated drop-off signal at all three sites, including the Cy5 CpG site.
The LA Ph+ ALL patient 17–061 is the most distinct and provides a direct example of how AmpSeq can miss crucial sequence information. When attempting to calculate the quantity of drop-off product, which relies on having nearly all amplicons bound by the SUN reference probe, we noted a significant decrease in the total SUN signal that was verified by calculating the total copies/µL based on the amplicon binding of all 4 probes (Supplemental Figs. 4B and 4C). Indeed, when we look at the calculation of copies/µL for each probe, we see that the SUN signal is exactly half that of the FAM, TAMRA, and Cy5 probes in each case (Supplemental Fig. 5B). This strongly suggests copy number variation (CNV), where this particular PAR sequence is only intact on either the X or Y chromosome in this patient. Relying on AmpSeq data alone would have led us to miss this instability, likely either because an amplicon was not able to be generated due to the sequence anomaly or the variant calling software was unable to align it with the reference and removed it as a low-quality read.
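The copy-number reasoning above comes down to comparing the reference-probe concentration with the drop-off-probe concentrations. A minimal Python sketch of that comparison follows. The concentrations are hypothetical placeholders rather than values from the Supplemental Figures, the function names are illustrative, and the tolerance is an arbitrary assumption.

```python
from statistics import mean


def reference_ratio(reference_conc, target_concs):
    """Ratio of the reference-probe signal to the mean of the other probes."""
    return reference_conc / mean(target_concs)


def consistent_with_half_signal(ratio, tolerance=0.1):
    """True if the ratio is close to 0.5, i.e. the reference signal is
    about half of the other probes. The tolerance is an arbitrary choice."""
    return abs(ratio - 0.5) <= tolerance


# Hypothetical copies/µL for SUN (reference) and for FAM, TAMRA, Cy5.
sun = 50.0
fam_tamra_cy5 = [100.0, 100.0, 100.0]

ratio = reference_ratio(sun, fam_tamra_cy5)
print(f"SUN / mean(other probes) = {ratio:.2f}")
print("about half:", consistent_with_half_signal(ratio))
```

A ratio near 1 would be expected when all four probes bind the same intact amplicons; a ratio near 0.5 is the pattern the text interprets as the sequence being intact on only one of the two PAR copies.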
We can also compare the BCL2 ABC AmpSeq data to what was obtained from dPCR using the drop-off assay designed for this region. For the LA Ph-like patients, 17–062 and 19–021, most indels accumulate immediately after two CpG sites covered by a FAM probe (Figs. 5A and 5B) and we see that FAM drop-off is much higher in patient 17–062 along with increased drop-off of the Cy5 probe (Fig. 5E). Patient 19–021 has very little drop-off activity at the BCL2 ABC with no detectable drop-off of Cy5 and only slightly increased drop-off of FAM (Fig. 5F), again something that is not easily discernible from the AmpSeq data alone. Similar results were also obtained for the LA Ph+ patient 15–010 (Fig. 5G). The other LA Ph+ patient, 17–061, was the only one with a substantial increase in the number of indels at the Cy5-bound CpG site, having higher and more consistent drop-off product at both sites (Fig. 5H). Also, 17–061 does not display evidence of CNV at the BCL2 locus on chromosome 18. Overall, we see that dPCR can outperform AmpSeq for quantification of indels at ABC sites and can be done in a fraction of the time using 10–20 ng of gDNA.
While all four of these samples are LA, they are also all cancer patients with some form of ALL. To determine if we detect drop-off from the dPCR assay in healthy individuals, we obtained gDNA from three LA and three White, non-LA donors. Regardless of race, we were unable to detect any significant drop-off product at the CRLF2 locus from either LA (Fig. 4G) or White, non-LA (Fig. 4H) donors. This was also true when we examined the BCL2 locus (Figs. 5I and 5J). According to Table 1, blood and bone marrow were collected from ALL patients when their blast count was very high, meaning the B-ALL cells are overrepresented in these samples and the increased drop-off products measured derive from the cancer cells. In healthy individuals, cells in a pre-disease state that are obtained from blood and bone marrow are likely very few. Monitoring of LA individuals at high risk for Ph-like ALL through dPCR could provide an early indicator of disease where there is an increase in B cells with genome instability at ABC sites.
Differential expression patterns in Ph-like vs Ph+ ALL and between Ph-like ALL cohorts
High CRLF2 expression is a hallmark of Ph-like ALL, but given the high genome instability associated with ALL and that CRLF2 rearrangements can co-occur in Ph+ ALL cases,43 we wanted to determine CRLF2 expression levels in all the ALL samples to which we had access. In addition to LA patients, we also included two non-LA Asian Ph+ samples (16–022 and 17–020, Table 1) to provide a baseline of expression in a background with very low CRLF2-linked Ph-like ALL risk.19 All three LA Ph− patients, including the two that were Ph-like, showed high relative expression of CRLF2 (Fig. 6A). Strikingly, the Ph+ patient 17–061, which showed the highest level of genome instability via dPCR, also had a significantly elevated CRLF2 expression level. This further emphasizes that confirmation of the Philadelphia chromosome should still be followed up by testing for CRLF2 overexpression or CRLF2 rearrangements, since standard therapy for Ph+ ALL may be less effective in this situation.
Next, to compare global gene expression changes between Ph− and Ph+ ALL patients, we performed RNA-seq comparing Ph-like patients 17–062 and 19–021 and Ph− ALL patient 19–035 to Ph+ patients 17–061, 17–004, and 15–010, all of whom are LA (Table 1). Analysis of differentially expressed genes (DEGs) shows a number of differences in the expression pattern (Fig. 6B). Notably, CRLF2 expression is significantly higher in all the Ph− samples. As CRLF2 overexpression is a clinical feature of nearly 65% of all Ph-like ALL cases,20 and Ph-like ALL cases are predominantly in LA individuals, it is possible that further testing of patient 19–035 would confirm a Ph-like ALL diagnosis. A complete list of all genes that showed differential expression has been deposited online (SRA, SPR# accession pending).
We further highlight the differences between these two cohorts by performing Gene Set Enrichment Analysis (GSEA) on the Ph− and Ph+ datasets to determine pathway enrichment of DEGs using the Gene Ontology Biological Process, Reactome, and Drug Signatures Database (DSigDB) databases. The analysis reveals very little overlap between enrichment in the Ph− (Fig. 6C) and Ph+ (Fig. 6D) patients. That several of the most significant pathways affected in Ph− patients include those upregulating the MAP kinase pathway, regulation of B cell proliferation, and negative regulation of cell cycle likely indicates why Ph-like ALL (at least 2/3 of patients) is a much more aggressive disease than Ph+ ALL.
While rates of Ph-like ALL among LAs are increasing, it is still a relatively rare disease, making it difficult to collect a large cohort. To further explore the complex heterogeneity of this disease, we compared our RNA-seq data to a set previously published by Roberts, et al.44 We intersected significant (P < 0.01) DEGs between the two studies, which allowed us to determine overlap between the 15 Ph-like ALL patients from Roberts, et al. and our Ph-like ALL cases as well as the ALL cases with no CRLF2 rearrangements. Importantly, we are only including the LA cases from this study in our analysis. As strongly suggested by Fig. 6B, even within our own cohort there is 0% overlap between the Ph-like ALL patients and the others (Fig. 6E). Strikingly, only 44 (2%) DEGs overlap between our study’s patients with CRLF2 rearrangements and the ones from Roberts, et al. Primarily, this demonstrates the complex gene expression patterns in these ALL patients that change significantly depending on age and ethnicity, the uniqueness of different patient cohorts, and the challenge ahead for defining set characteristics among populations with significant genetic admixture that define Ph-like ALL risk.
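The intersection of significant DEGs between studies can be sketched as a set operation. The Python example below is illustrative only: the gene labels and P values are made up, the function names are hypothetical, and expressing the overlap as a percentage of the union of both gene sets is an assumption, since the denominator behind the reported percentages is not stated here.

```python
def significant_genes(p_values, alpha=0.01):
    """Genes whose P value is below the significance threshold."""
    return {gene for gene, p in p_values.items() if p < alpha}


def overlap_summary(genes_a, genes_b):
    """Shared genes and their share of the union, as a percentage."""
    shared = genes_a & genes_b
    union = genes_a | genes_b
    percent = 100.0 * len(shared) / len(union) if union else 0.0
    return shared, percent


# Made-up P values for placeholder gene labels in two studies.
study_a = {"GENE1": 0.001, "GENE2": 0.2, "GENE3": 0.005}
study_b = {"GENE1": 0.002, "GENE3": 0.5, "GENE4": 0.0001}

shared, percent = overlap_summary(
    significant_genes(study_a), significant_genes(study_b)
)
print(sorted(shared), f"{percent:.1f}% of the union")
```

Other denominators (for example, the smaller of the two gene sets) would give different percentages for the same shared genes, so the choice should be stated alongside any reported overlap.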