As previously documented, the burden of SNVs and indels was low (median 0.6 Mb⁻¹, range 0.04–5.31) compared to the majority of solid cancers. Mutation burdens differed significantly across subtypes (Kruskal-Wallis P = 2.2 × 10⁻¹⁶), with iAMP21 and KMT2A (MLL1) positive tumours having the highest and lowest mutational burdens respectively (Fig. 1A). The most common chromosome-arm level aberrations were loss of 9p (containing CDKN2A/CDKN2B) and gain of 21q (containing RUNX1), both occurring in 8% of cases (Supplementary Fig. 3). 9p loss preferentially occurred in TCF3-PBX1 translocated tumours (Fisher P = 0.043), and 21q gain in hypodiploid tumours (Fisher P = 0.012) (Supplementary Fig. 4A and 4B).
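The subtype associations above rest on Fisher's exact test applied to a 2x2 table (aberration present/absent against subtype/other). The following is a minimal sketch of a two-sided test using only the Python standard library; the function name and the counts in the usage line are hypothetical and are not taken from the study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: 9p loss in 5 of 20 tumours of one subtype
# versus 20 of 300 tumours of all other subtypes.
p_value = fisher_exact_two_sided(5, 15, 20, 280)
```

Summing the probabilities of all tables no more likely than the observed one is one common definition of the two-sided P value; other definitions exist and can give slightly different results.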
The median number of SVs was eight per tumour (2 × 10⁻³ Mb⁻¹), with iAMP21 tumours possessing the highest number (Fig. 1B). The rate of SVs on chromosome 21 (0.027 Mb⁻¹) was 10-fold higher than on other chromosomes, largely accounted for by iAMP21 tumours (Fig. 1C). Since iAMP21 tumours are defined by RUNX1 copy number, we examined the distribution of SVs on chromosome 21 and found no evidence of clustering (Supplementary Fig. 5). Chromothripsis did not account for the elevated SV rate in iAMP21 tumours, as no events were observed on chromosome 21.
Identification of driver genes
We searched for drivers of ALL by first considering the following classes of somatic coding alterations: single nucleotide variants (SNVs) and indels, copy number variants (CNVs), structural variants (SVs) and loss of heterozygosity (LOH). In addition to established drivers, we identified a number of novel ALL drivers, including HLA-DRB5, the histone gene cluster 1, ZEB2, CTCF and MAP1B.
Consistent with previous reports [1,44], the most frequently altered genes included CDKN2A/B, PAX5, ETV6, ERG, RUNX1, NRAS, KRAS and IKZF1 (Fig. 2). By combining CNV and SV data we identified two novel regions of recurrent alteration. Firstly, a 120 kb region of HLA (6p21; 32,442,465–32,554,750 bp) was deleted in 17% of tumours (Fig. 3A). Within this region only HLA-DRB5 was expressed, and deletion was associated with significantly reduced gene expression (P = 3.7 × 10⁻⁴). We further evaluated read depth data in tumours with an HLA SV but no CNV using an additional copy number segmentation algorithm [45], finding evidence of a corresponding copy number change within 2,000 bp of an SV breakpoint in every tumour (Supplementary Fig. 6). Secondly, a 117 kb region of 6p22.2 overlapping histone gene cluster 1 (26,122,685–26,239,852 bp) was deleted in 10% of tumours (Fig. 3B), within which deletion was associated with reduced expression of HIST1H4E (P = 0.034) and HIST1H2AE (P = 0.023). The cancer cell fraction (CCF) of SVs in the region suggested that the majority of these variants are clonal.
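The concordance check described above, in which a copy number change is sought within 2,000 bp of an SV breakpoint, can be sketched as a simple proximity lookup. This is an illustration only: the function names and coordinates are hypothetical, and the segmentation algorithm itself is not reproduced here.

```python
from bisect import bisect_left


def has_nearby_change_point(breakpoint, change_points, window=2000):
    """True if any copy number change point lies within `window` bp.

    `change_points` must be sorted positions on the same chromosome.
    """
    # First change point at or beyond the left edge of the window.
    i = bisect_left(change_points, breakpoint - window)
    return i < len(change_points) and change_points[i] <= breakpoint + window


def fraction_supported(breakpoints, change_points, window=2000):
    """Fraction of SV breakpoints with a change point inside the window."""
    if not breakpoints:
        return 0.0
    hits = sum(has_nearby_change_point(b, change_points, window)
               for b in breakpoints)
    return hits / len(breakpoints)


# Hypothetical coordinates inside the deleted HLA interval.
segment_boundaries = sorted([32_448_500, 32_551_900])
sv_breakpoints = [32_450_000, 32_552_300]
support = fraction_supported(sv_breakpoints, segment_boundaries)
```

Keeping the change points sorted lets each breakpoint be checked with a binary search rather than a scan of every segment boundary.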
Non-silent SNVs or indels in ZEB2, CTCF and MAP1B were seen in 2.2%, 1.7% and 1.4% of tumours respectively (Supplementary Table 4 and Supplementary Fig. 7). ZEB2 missense mutations were clustered at three base positions, consistent with oncogenic activation (Supplementary Fig. 8). A further eight tumours had focal ZEB2 amplifications. Besides the tumours with truncating and damaging mutations in CTCF, an additional 20 tumours had CTCF deletions, consistent with gene inactivation. None of the MAP1B mutations were recurrent and all were predicted to be damaging.
Next, we sought to identify non-coding driver mutations. We observed a significant excess of promoter mutations for BTLA (4.2%, Q = 0.002) and CHID1 (2.2%, Q = 0.049). BTLA promoter mutations were clustered within a 27 bp region and were associated with 5-fold reduced BTLA expression (Mann-Whitney P = 0.056); the small number of tumours with corresponding expression data presumably prevented this relationship from attaining significance (Fig. 4A). Mutations were predicted to alter the affinity of various transcription factors (TFs) with evidence of binding from ChIP-seq. Each BTLA promoter mutant possessed a variant predicted to disrupt the binding of an interacting TF, most frequently RUNX1/3, GATA3 and MYB (Supplementary Table 5). Of 14 CHID1 promoter mutations, 12 clustered within a 12 bp region 1 kb upstream of the transcription start site, within an AGO1 binding site; corresponding RNA-seq data were consistent with mutation conferring reduced CHID1 expression (Fig. 4B).
To search for significantly mutated cis-regulatory elements (CREs) we restricted our analysis to sequences interacting with promoters through chromatin looping. A CRE interacting with the USP22 promoter was mutated in 9.1% of tumours, and these mutations were associated with reduced USP22 expression (Mann-Whitney Q = 0.009) (Fig. 4C). Mutations were not uniformly distributed and occupied a number of TF binding sites. A CRE interacting with XRCC2 was mutated in 4% of tumours; these mutations were associated with elevated XRCC2 expression (Mann-Whitney Q = 0.046) (Fig. 4D).
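The Q values quoted above are presumably P values adjusted for multiple testing; the adjustment procedure is not stated in this section. Assuming the widely used Benjamini-Hochberg method, a minimal sketch of the adjustment is given below. The function name and the example P values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (Q), in input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P to the smallest so that each Q is the
    # minimum of its own scaled P and every Q at a higher rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Illustrative P values for four tested elements.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

An element is then reported as significant when its Q value falls below the chosen false discovery rate, for example 0.05.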
We found no evidence of recurrent mutations within UTRs or non-coding RNAs when imposing a threshold of at least five affected tumours.
Mutated pathways
In addition to the documented enrichment of NRAS and KRAS mutations in hyperdiploid ALL and TP53 mutations in hypodiploid/near haploid ALL, we identified a number of additional associations (Supplementary Table 6). Notably, TBL1XR1 and ZEB2 mutations were enriched in ERG-deleted ALL (present in 21% and 14% of tumours respectively). iAMP21 tumours were characterised by an excess of RB1 deletions (40%) and IL7R mutations (20%). NF1 mutations were largely confined to near haploid tumours, occurring in 45%. ETV6-RUNX1 positive tumours were associated with enrichment for the deletion of TBL1XR1 and RAG1/RAG2. Finally, undefined tumours (included in "other") showed an excess of IKZF1 deletions.
Given the identification of alterations in both CTCF and the histone gene cluster 1, we explored their transcriptional impacts by performing differential expression analysis. We identified five differentially expressed genes in both sets of mutated tumours (binomial P = 1.5 × 10⁻⁸), including CLIC5 and IGF2BP1 (Supplementary Table 7 and Supplementary Fig. 9). Whilst CLIC5 and IGF2BP1 are markers of hyperdiploid ALL [46], none of the tumours harbouring these mutations were of this subtype. In total, 60 tumours (17%) harboured alterations (deletions or mutations) in either CTCF or the histone gene cluster 1.
To produce a composite picture of somatic events we clustered drivers by biological pathways (Fig. 5). The most frequently altered pathway featured B-cell developmental genes, altered in 70% of tumours. This analysis confirmed the importance of RAS/RTK alterations in hyperdiploid biology and highlighted a number of other key pathways, including secondary alterations affecting cytokine signalling in iAMP21, where 37% of tumours possessed a secondary hit in either IL7R, JAK2 or CRLF2 (including 3/5 cases of P2RY8-CRLF2 translocation). BCR-ABL1 tumours were characterised by recurrent alteration of genes regulating the cell cycle, whilst hypodiploid tumours were typified by disruption of transcriptional (gene) regulation. As well as disruption of B-cell development genes, driven by loss of the second ETV6 allele, 56% of ETV6-RUNX1 tumours were affected by alterations in chromatin and histone modification genes.
We assessed the clonality of driver gene mutations, finding that most occur both clonally and subclonally (Fig. 6A and Supplementary Fig. 10). Exceptions included ZEB2 mutations, which were always clonal; more generally, mutations of B-cell development and haematopoiesis genes (IKZF1, PAX5 and ZEB2) tended to be clonal. Conversely, the majority of RAS/RTK gene mutations were subclonal (65%; Fisher P = 0.001). This was especially true of ERG-deleted tumours, where 44% possessed a subclonal RAS/RTK variant (accounting for 89% of these mutations in the subtype) compared to only 8% with a clonal variant. In contrast, RAS/RTK mutations in hyperdiploid tumours were usually clonal (60%), occurring in 44% of tumours compared to 20% with only a subclonal variant.
To assess the molecular mechanisms promoting tumorigenesis we used non-negative matrix factorization (NMF) to extract COSMIC single base substitution (SBS) signatures. Ten signatures were seen contributing at least 1% of mutations (Supplementary Fig. 11). SBS5 (aetiology unknown but clock-like) accounted for the most mutations (41%) and was seen in all tumours (Supplementary Figs. 12 and 15). SBS2 and SBS13 (AID/APOBEC) were almost exclusively confined to ETV6-RUNX1 tumours (Mann-Whitney Q = 2.3 × 10⁻³³ and Q = 1.1 × 10⁻³⁶ respectively), whilst SBS7a (UV exposure) was highly enriched in iAMP21 tumours (Mann-Whitney Q = 5.3 × 10⁻¹²) (Supplementary Figs. 13 and 15). SBS7a was associated with the highest mutation rate, 10-fold higher than SBS1 (Supplementary Fig. 16), and was largely responsible for the increased mutation rate in iAMP21 tumours (Supplementary Fig. 17).
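For readers unfamiliar with NMF, the decomposition above factorises a matrix of mutation counts V (mutation contexts by tumours) into non-negative signatures W and exposures H such that V ≈ WH. The following is a minimal sketch using the classic multiplicative update rules in pure Python. It is not the software used in this study; the function names, toy matrix and iteration count are assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorise V (n x m) into W (n x rank) and H (rank x m), all >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy count matrix: 4 mutation contexts x 3 tumours (made-up values).
toy_counts = [[3.0, 1.0, 0.0], [1.5, 1.5, 2.0], [0.0, 2.0, 4.0], [0.6, 1.8, 3.2]]
signatures, exposures = nmf(toy_counts, rank=2)
```

In practice SBS analyses use 96 trinucleotide contexts and compare the extracted columns of W against the COSMIC reference catalogue; the toy matrix here only demonstrates the mechanics of the updates.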
It has been reported that SVs in ETV6-RUNX1 positive tumours bear the hallmarks of RAG1 and RAG2 activity [47]. We searched for recurrent DNA motifs at SV breakpoints, firstly agnostically using motif enrichment analysis with HOMER, and secondly by assessing the similarity of discovered motifs to those of candidate mutagenic drivers (Supplementary Table 2). Overall, the most enriched motifs were the RAG heptamer (P < 1 × 10⁻²⁰⁰), RAG nonamer (P < 1 × 10⁻²⁰⁰) and PRDM9 (P = 1 × 10⁻¹²¹), found at 8.8%, 7.2% and 1.5% of breakpoints respectively. With the exception of ETV6-RUNX1 positive tumours, the most frequently enriched motifs were the RAG heptamer and RAG nonamer; in ETV6-RUNX1 tumours, however, the most common motif was PRDM9, contained in 28% of breakpoints (P = 1 × 10⁻¹⁶²) (Fig. 6B). Overall, RAG heptamers were observed at both breakpoints of 3% of SVs.
We also sought evidence of activation-induced deaminase (AID) activity at SV breakpoints. Due to the degenerate nature of AID motifs, we used the number of repeats of core AID recognition sequences (Supplementary Table 2) as a proxy for activity. After comparing SVs in immunoglobulin regions, we established a cut-off of >10 repeats as suggestive of AID activity (Supplementary Fig. 18). AID signatures were detected in the breakpoints of 2% of all SVs, but in 17% of SVs in TCF3-PBX1 positive tumours (Fisher P = 8 × 10⁻⁹) (Supplementary Fig. 18).
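The repeat-counting proxy described above can be sketched as follows. The core recognition sequences used in the study are listed in Supplementary Table 2 and are not reproduced in this section, so the sketch assumes the commonly cited AID hotspot WRCY (W = A/T, R = A/G, Y = C/T) and its reverse complement RGYW. The function names and the notion of a flanking sequence passed in by the caller are likewise illustrative assumptions.

```python
import re

# ASSUMPTION: WRCY hotspot and its reverse complement RGYW stand in for
# the core AID recognition sequences of Supplementary Table 2.
# The lookahead makes the search zero-width, so overlapping motifs count.
AID_PATTERN = re.compile(r"(?=([AT][AG]C[CT]|[AG]G[CT][AT]))")


def count_aid_motifs(sequence):
    """Count start positions at which an AID hotspot motif occurs."""
    return len(AID_PATTERN.findall(sequence.upper()))


def suggests_aid_activity(sequence, cutoff=10):
    """Apply the >10 repeat cut-off to a breakpoint-flanking sequence."""
    return count_aid_motifs(sequence) > cutoff


# Made-up flanking sequence around a breakpoint.
flank = "TTAGCTGGAGCTAAGCTCCAGCTT"
n_motifs = count_aid_motifs(flank)
```

Each start position is counted once even when it matches both orientations, which avoids double counting palindromic motifs such as AGCT.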
Clonal architecture
The presence of subclonal populations in tumours was almost universal (observed in 98% of tumours; Fig. 7A). Most commonly tumours possessed two subclones; however, ERG-deleted tumours tended to have a higher number of subclones (Mann-Whitney Q = 0.008) and KMT2A-translocated tumours a lower number (Mann-Whitney Q = 0.038) (Supplementary Fig. 20). The distribution of subclone CCF was similar across subtypes, with the exception of hyperdiploid tumours, whose subclones tended to have higher CCFs (Mann-Whitney Q = 0.004): 50% had a subclone with a CCF between 0.7 and 0.8, compared to 9% of other tumours (Supplementary Fig. 21).
The diversity of cell populations (i.e. heterogeneity) varied across subtypes. Hypodiploid and ERG-deleted tumours were the most heterogeneous (median Simpson index = 0.61 and 0.62; Mann-Whitney Q = 1.7 × 10⁻³ and 1.13 × 10⁻³ respectively), while hyperdiploid tumours exhibited lower heterogeneity (Simpson index = 0.45; Mann-Whitney Q = 2.8 × 10⁻⁷).
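As an illustration of the heterogeneity measure above, the sketch below computes a Simpson index from the relative sizes of the cell populations in one tumour. It assumes the Gini-Simpson form, 1 − Σ pᵢ², in which higher values indicate greater diversity, consistent with the direction of the results reported here; how population sizes were derived from the clonal reconstruction is not specified in this section, and the example sizes are made up.

```python
def simpson_index(population_sizes):
    """Gini-Simpson diversity: 1 - sum(p_i^2) over population fractions."""
    total = sum(population_sizes)
    if total <= 0:
        raise ValueError("population sizes must sum to a positive value")
    return 1.0 - sum((size / total) ** 2 for size in population_sizes)


# Made-up tumour with a dominant clone and two subclones.
diversity = simpson_index([50, 30, 20])
```

A tumour consisting of a single population scores 0, and the index approaches 1 as the number of equally sized populations grows.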
Accounting for mutational frequency, we found that subclones were enriched for driver mutations (binomial P = 1.8 × 10⁻⁵) relative to clonal populations. To examine the processes influencing tumour evolution we compared the frequency of driver gene alteration in subclones. ERG-deleted subclones were the most likely to possess a mutation in an ALL driver gene (35%; binomial Q = 0.0016), whereas BCR-ABL1 positive tumour subclones contained the lowest frequency of driver alterations (4%; binomial Q = 0.052) (Supplementary Fig. 22).
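One way to frame the enrichment test above is as a one-sided binomial test: if driver mutations were distributed in proportion to all mutations, the number of subclonal drivers among n drivers would follow a binomial distribution with success probability equal to the overall subclonal fraction of mutations. The sketch below implements that upper-tail probability. The exact test design used in the study is not described in this section, so the framing, the function name and the counts are assumptions.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Hypothetical counts: 60 of 150 driver mutations are subclonal, while
# 25% of all mutations are subclonal (the background expectation).
p_enrichment = binomial_upper_tail(60, 150, 0.25)
```

A small upper-tail probability indicates more subclonal drivers than the background mutation frequency alone would predict.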
To explore the possible contribution of neutral evolution to tumour heterogeneity we used MOBSTER [43], which models variant distribution under neutrality. MOBSTER called neutral tails in the majority of tumours, fitting a median of 12% (SNVs) and 16% (SNVs and indels) of variants (Supplementary Fig. 23). Evidence of positive selection was sought using dNdScv, revealing that tail compartments were enriched in NRAS (Q = 3.4 × 10⁻⁸) and KRAS (Q = 1.9 × 10⁻³) mutations. Additionally, rates of non-synonymous substitution in NRAS, KRAS, FLT3 and NSD2 were higher in tail compartments than in clonal groups (Supplementary Fig. 24).