The genomic basis of childhood T-lineage acute lymphoblastic leukemia

doi:10.21203/rs.3.rs-3488430/v1

Biological Sciences - Article

This work is licensed under a CC BY 4.0 License

Version 1

T-lineage acute lymphoblastic leukemia (T-ALL) is a high-risk tumor that has eluded comprehensive genomic characterization, in part due to the high frequency of non-coding genomic alterations resulting in oncogene deregulation. Here we report integrated genome and transcriptome sequencing of T-ALL tumor and remission samples obtained from over 1300 uniformly treated children with T-ALL, coupled with epigenomic and single cell analysis of malignant and normal T cell precursors. Integrated analysis identified 15 subtypes with distinct genomic drivers, gene expression, developmental state and outcome. Integration with chromatin topology analyses enabled elucidation of multiple mechanisms of enhancer deregulation that involve enhancers and genes in a subtype-specific fashion, demonstrating a widespread involvement of the non-coding genome that has not been systematically interrogated in prior studies. We show that the immunophenotypically described, high-risk entity of early T-cell precursor ALL is superseded by a broader category of “ETP-like” leukemia with variable immunophenotype and diverse genomic alterations of a core set of genes encoding regulators of hematopoietic stem cell development. Numerous genetic alterations and disease subtypes emerged as independent predictors of survival and treatment failure in univariable and multivariable outcome models. These findings provide a roadmap for the classification, risk stratification and mechanistic understanding of this disease.

Biological sciences/Cancer/Cancer genetics

Biological sciences/Cancer/Cancer genomics

Biological sciences/Computational biology and bioinformatics/Data integration

Biological sciences/Cancer/Haematological cancer/Leukaemia/Acute lymphocytic leukaemia

The prognosis for patients with relapsed or refractory (r/r) T-cell acute lymphoblastic leukemia (T-ALL) is dismal¹. A critical need is the identification of patients at risk for recurrence, allowing intervention with novel therapies. Genomic characterization of B cell progenitor ALL (B-ALL) and integration of genomic data into treatment approaches have been transformative^2-4. In contrast, prior attempts to identify genetic aberrations that are reproducibly prognostic independent of treatment response in T-ALL have failed for multiple reasons. First, several small studies have suggested that a significant percentage of biologically relevant alterations in T-ALL occur in non-coding regions of the genome⁵, yet most published studies included very few cases sequenced at the whole genome level⁶. The lack of whole genome sequencing (WGS) not only limited identification of key genomic drivers, but also prevented mechanistic understanding of T-ALL leukemogenesis, which is only possible through integration of WGS with whole transcriptome sequencing (WTS). Secondly, because the prognosis for newly diagnosed patients has improved and the biology of T-ALL is heterogeneous and complex, no study has been large enough to power the identification of prognostic genetic alterations independent of treatment response. Thirdly, the only widely used treatment stratification factor other than response to therapy is early T-cell precursor (ETP) ALL, a subset of high-risk leukemia defined by immunophenotype rather than by biological or genomic features^7,8. Of note, the impact of ETP-ALL on outcome remains poorly defined^7,9. Finally, most genomic profiling efforts in T-ALL excluded patients with refractory disease⁶, because bone marrow (BM) or blood samples obtained at remission are often used as the source of matched non-tumor DNA. As refractory disease is a common cause of inferior survival, many studies excluded a large percentage of high-risk patients who never obtained remission.
Collectively, these issues have prevented accurate patient risk stratification and a comprehensive understanding of the biologic basis of T-ALL.

To overcome these limitations, we embarked on comprehensive WGS, whole exome sequencing (WES) and WTS of tumor and germline samples from more than 1300 T-ALL cases treated on the Children’s Oncology Group (COG) AALL0434 trial (NCT00408005) (Supplementary Table 1-2)¹⁰. This multidimensional approach aimed to uncover both coding and non-coding alterations, provide a comprehensive genomic classification of T-ALL, and ascertain predictors of relapse and treatment failure (Extended Data Fig.1a).

Integrated genomic analysis identifies 15 subtypes of T-ALL

Using the uniform manifold approximation and projection (UMAP) technique¹¹, we projected RNA-seq gene expression data in two dimensions and employed the Leiden¹² algorithm for data clustering (Fig.1a, Extended Data Fig.1b-c, Supplementary Table 3-7). To identify genomic drivers of each cluster, we examined associations between subtypes and genomic alterations identified from analysis of sequence and structural DNA variants (SV). Putative drivers were identified in 95.1% of cases (1245 of 1309), 59% (777 of 1309) of which were in non-coding regions of the genome (Fig.1b-c). WGS was required to identify drivers for 29% (378) of cases, particularly for detecting inversions, translocations, and enhancer single nucleotide variants or small insertions/deletions (SNV/indels) in non-coding regions. Subtype-defining drivers were not identified in 4.9% of cases, of which 39% had low tumor purity.
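The percentages in this paragraph follow directly from the reported case counts. As a quick arithmetic check, the sketch below recomputes them against the cohort of 1309 cases; the dictionary labels and the `percent` helper are illustrative names, not part of the study's pipeline.

```python
# Counts reported in the text; labels and helper names are illustrative.
TOTAL_CASES = 1309
counts = {
    "putative driver identified": 1245,
    "driver in non-coding region": 777,
    "driver required WGS": 378,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


fractions = {label: percent(n, TOTAL_CASES) for label, n in counts.items()}

# Cases without a subtype-defining driver are the remainder of the cohort.
no_driver_cases = TOTAL_CASES - counts["putative driver identified"]

for label, pct in fractions.items():
    print(f"{label}: {pct}%")
print(f"no subtype-defining driver: {no_driver_cases} cases "
      f"({percent(no_driver_cases, TOTAL_CASES)}%)")
```

The recomputed values (95.1%, 59.4%, 28.9% and 4.9%) agree with the rounded figures quoted above.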

Previous studies have identified 9 distinct transcriptional subtypes of T-ALL^3,6. Integrated analysis of WGS and RNA-seq data delineated 15 subtypes, each with specific drivers and patterns of oncogene expression (Fig.1d, Extended Data Fig.1b-g). Several of these subtypes were previously recognized, including those with deregulation of the TAL1/TAL2/LMO1/LMO2/LYL1 transcriptional regulators, or deregulation of TLX1, TLX3, NKX2-1 or HOXA9. This analysis enabled the identification of additional genomic alterations deregulating these drivers, as well as subdivision of several of these subtypes into subgroups based on genomic and clinical features. For example, deregulation of TLX3 in the TLX3 subtype is commonly due to hijacking of the BCL11B (ThymoD) enhancer¹³, but we identified multiple additional enhancers that were rearranged (R) and deregulated TLX3, including the T-cell receptor (TCR) β locus, CDK6, the NOTCH1-driven MYC enhancer (N-Me)^14,15, and SATB1 enhancers. TLX1 activation commonly arises from rearrangement to the TCRβ or δ locus, but we also identified a diverse range of intergenic losses, translocations, and inversions that resulted in TLX1 deregulation in the TLX1 subtype. Similarly, NKX2-1 activation was achieved by TCRδ locus rearrangements, along with recurrent chromosome 14 chromothripsis (Supplementary Fig.1), intergenic losses, rare enhancer hijacking events, and enhancer gains, all contributing to elevated expression in the NKX2-1 subtype. The NKX2-5 subtype harbored rearrangements of NKX2-5 to TCRβ/δ loci or BCL11B enhancer hijacking¹⁶. A subtype of cases with HOXA9 deregulation was characterized by HOXA9 hijacking the TCRβ locus enhancers¹⁷. SPI1 fusions with TCF7/STMN1¹⁸ and YWHAE were hallmarks of the SPI1 subtype. The BCL11B subtype harbored rearrangements juxtaposing hematopoietic stem cell (HSC) enhancers to BCL11B, previously identified in lineage ambiguous leukemia and ETP-ALL^19,20.
STAG2/LMO2 T-ALL most commonly harbors an LMO2::STAG2²¹ rearrangement that activates LMO2 and inactivates the cohesin gene STAG2.

Cases with deregulation of the TAL1/TAL2/LMO1/LMO2/LYL1 core transcriptional circuitry could be divided into two subtypes previously termed TAL1-RA and -RB¹⁷. Considering our tumor and normal progenitor genomic data, we term these TAL1 αβ-like and TAL1 double positive (DP)-like subtypes. The TAL1 DP-like subtype exhibits higher expression of RAG1/2, CD4 and CD8, whereas the TAL1 αβ-like subtype is characterized by higher expression of the TCR alpha constant (TRAC) gene and TCRα/β rearrangements (Fig.1d, Extended Data Fig.1f). Known drivers in these subtypes are STIL::TAL1, rearrangements of TAL1, LMO1, LMO2 and LYL1 to TCRδ/β enhancers, and non-coding sequence mutations creating a TAL1 neoenhancer⁵. Here we identified multiple additional mechanisms of oncogene deregulation, including copy number variation (CNV) gains and SNV/indel mutations generating neoenhancers for TAL1, LMO2, and LYL1, and intergenic inversions and deletions that result in enhancer hijacking-mediated deregulation of TAL1 and LMO2.

Two additional subtypes were defined: the ETP-like subtype, which was enriched for cases of ETP immunophenotype and had diverse genomic driver alterations, and the LMO2 gamma delta (γδ)-like subtype, which has diverse alterations, including LMO2 activation from BCL11B enhancer hijacking, FOXP1 rearrangement, rearrangement of MYC to TCRδ, and enhancer SNV/indels activating LMO2.

We analyzed the differentiation stage of each T-ALL subtype by projecting gene expression signatures onto single cell RNA (scRNA) and assay for transposase-accessible chromatin (ATAC) sequencing data of normal thymi (n=3) and BM samples (n=5; Fig.1e). The 15 T-ALL subtypes spanned a continuum of normal T-cell maturation, ranging from immature HSC or progenitor cells (HSPC), common lymphoid progenitors (CLP), lympho-myeloid primed progenitors (LMPP) or ETP for the BCL11B and ETP-like subtypes; pro/pre-T for the KMT2A, TLX3 and MLLT10 subtypes; cycling double positive (DP) for HOXA9 TCR, TLX1, NKX2-1 and TAL1 DP-like subtypes; TCRA expressing single positive αβ/mature αβ for TAL1 αβ-like; and γδ/effector T for LMO2 γδ-like that was also associated with TCRγδ rearrangements. Notably, enrichment for myeloid signatures was also observed, including dendritic cells (DC) for the SPI1 subtype, granulocyte/monocyte progenitors (GMP) for NKX2-5, and megakaryocytic/erythroid precursor (MEP) for the STAG2/LMO2 subtype.

We observed associations between subtypes and clinical features. Patients within STAG2/LMO2, NKX2-5 and SPI1 T-ALL were younger at diagnosis and the BCL11B and HOXA9 TCR subtypes included a higher proportion of older patients (Extended Data Fig.1j-k). Sex distribution varied across subtypes, with a higher female proportion in the STAG2/LMO2, ETP-like, HOXA9 TCR, LMO2 γδ-like and NKX2-5 subtypes, and a higher male proportion in the TAL1 subtypes (Extended Data Fig.1l).

Detection of significantly altered coding and non-coding alterations

We used dNdScv, GISTIC2^22,23 and the genomic random interval (GRIN2) method to identify significantly recurrent coding and non-coding alterations²⁴. Altered genes were systematically assigned to 17 distinct pathways. Across the entire cohort, we identified 164 recurrent genes (q-value<0.05) and 46 broad CNVs (Extended Data Fig.2a, Supplementary Table 8-22). Many of these recurrent alterations showed subtype specificity (Extended Data Fig.2b). Twenty-one genes had recurrent alterations outside of coding regions that were only detected by GRIN2 regulatory feature analysis. CDKN2A (71% of cases) and NOTCH1 (69%) were the most recurrently altered genes. PHF6, PTEN, LEF1, MYB, MYC, and RUNX1 each had at least 9 genomic mechanisms of deregulation, many of which required WGS for identification (Extended Data Fig.2a). We identified recurrent sequence mutations for 16 previously unreported coding genes, including putative loss of function stop/frameshift mutations for CUL1, PSIP1, NOL4L, KMT2E, KAT6A²⁵, DDX39B and MYL1 and mutation hotspots for CD99, E2F1, LCK and RBMX.
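The recurrence threshold above is expressed as a q-value, that is, a p-value adjusted for testing many genes at once. As a minimal sketch of how such an adjustment works, the block below implements the Benjamini-Hochberg procedure, a common way to obtain q-values. Whether the tools named above use exactly this procedure is an assumption here, and the gene labels and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        q_values[idx] = running_min
    return q_values


# Hypothetical per-gene p-values, for illustration only.
example_p = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.03, "GENE_D": 0.6}
q = dict(zip(example_p, benjamini_hochberg(list(example_p.values()))))

# Genes passing the q-value < 0.05 threshold used in the text.
significant = [gene for gene, value in q.items() if value < 0.05]
print(significant)
```

In this toy input, three of the four hypothetical genes pass the q-value<0.05 threshold, while the gene with p=0.6 does not.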

Diverse oncogene-activating non-coding alterations

Enhancer-mediated oncogene activation was observed in 70.5% of cases and 43.9% of cases had enhancer SNV/Indel, translocations or intrachromosomal inversions that required WGS for identification. The underlying mechanisms were highly diverse, including translocations or inversions, as well as chromothripsis or intergenic copy number (CN) losses juxtaposing 12 enhancers to 62 oncogenes observed in 50.8% of cases, and sequence alterations resulting in the generation of putative neoenhancers for 8 genes in 34.8% of cases (Fig.2a).

We used ATAC-seq and histone H3 lysine 27 acetylation (H3K27ac) HiChIP in 19 T-ALL samples encompassing 19 different alterations in 6 subtypes, with reference to healthy cord blood-derived CD34+ HSPC and thymic DP normal cell controls, to examine open chromatin state, enhancer formation, and interactions of putative enhancers with oncogenes. Fifty genes, 23 of which were recurrent (q-value<0.05), hijacked TCR enhancers, and of these, events deregulating HOXA13, HOXA9 and LMO3 were verified by HiChIP (Extended Data Fig.3a-c, Supplementary Table 19). Strikingly, the BCL11B enhancer was hijacked by 16 genes (6 recurrent; q-value<0.05); LMO2, HOXB13, NKX2-1 and HOXA13 were validated by HiChIP (Extended Data Fig.3d-g). Several other enhancers were hijacked by specific genes, such as intergenic 11p deletions resulting in LMO2 deregulation driven by hijacking of RAG enhancers (Fig.2b), CAPRIN1 and CELF1 enhancers (Extended Data Fig.3h), TAL1 inversion to the DHX9 enhancer (Extended Data Fig.3i), TLX1 hijacking of the LINC00592 enhancer via intergenic loss, LMO2 hijacking of the MIR2117HG enhancer (Extended Data Fig.3j), NKX2-1 hijacking of the NFKBIA enhancer (Extended Data Fig.3k), and ARID1B enhancer hijacking by BCL11B²⁰. By contrast, some enhancers were hijacked by many different oncogenes: IGH (5 oncogenes), MIR181A1HG (4, including MIR181A1HG::HOXA13 and MIR181A1HG::LMO2, Extended Data Fig.3l,m), SATB1, N-Me¹⁴ and CDK6 (3 each; Fig.2a).

Non-coding alterations also deregulated 15 genes that were not initiating/classifying drivers in 14.6% of cases. These included MYC deregulation from SV of the N-Me enhancer^14,15 (8.3%), enhancer SNV/indel mutations resulting in IL7R deregulation (2.2%), ZNF219-HNRNPC intergenic deletion (1.2%), and deletions of the RUNX1 (1.2%) and IRX3 regulatory regions (1.2%; Supplementary Fig.2a-b). Specifically, a recurrent deletion within the FTO locus was associated with increased expression of IRX3 and IRX5 located 240 kb and 890 kb downstream, respectively, and the deleted region contains a putative regulatory region for these genes in CD34+ cells (Extended Data Fig.4a). In a second example, ZNF219, positioned 520 kb downstream of the TCRδ locus, exhibited recurrent intergenic losses between ZNF219 and the HNRNPC promoter region in conjunction with TCRδ locus deletions, resulting in elevated monoallelic expression of ZNF219 (Extended Data Fig.4b,c, Supplementary Fig.2c).

Several enhancer hijacking events were associated with developmental stage. Intergenic deletions between LMO2 and RAG2 were observed in the TAL1 DP-like subtype, and the RAG2 enhancer that drives LMO2 deregulation was highly active in normal DP and not in CD34+ cells (Fig.2b, Supplementary Fig.3a). Similarly, inversion of TAL1 to the CD1 locus resulted in hijacking of CD1E enhancers, also active in normal DP cells and thymic CD34+CD1a+ cells, but not BM CD34+ cells (Fig.2c, Supplementary Fig.3b). Conversely, a HOXA13-deregulated case in the ETP-like subtype hijacked SOX4 enhancers located within the CASC15 locus that are preferentially active in normal BM/thymic CD34+ cells as compared to DP cells (Fig.2d, Supplementary Fig.3c).

Mutational generation of recurrent neoenhancers was frequent in TAL1 DP/αβ-like ALL (24.7% of cases; TAL1, 4 regions; LMO2, 4 regions; LMO1, 1 region) and also observed in NKX2-1 (N=2) and LYL1 (N=5). We detected recurrent TAL1 enhancer gains (median size 133 bp) 28 kb downstream of TAL1, enhancer SNV/indels 20 kb downstream of TAL1, SNV/indels in the first intron of TAL1 and the enhancer indel upstream of TAL1⁵, all of which were active enhancers in CD34+ and not DP cells (Fig.2f). We validated TAL1 enhancer gains using ATAC-seq, HiChIP and isoform sequencing (isoseq; Fig.2g, Extended Data Fig.4d, Supplementary Fig.2d). Small intronic gains (median size 77 bp) and previously reported SNV/indels²⁶ of LMO2 were associated with neomorphic promoter generation and non-canonical LMO2 isoform expression (Extended Data Fig.4e, Supplementary Fig.2e). Another mutation hotspot was found 1.8 kb upstream of LMO2. ATAC-seq and HiChIP revealed that a 6 bp deletion at this enhancer resulted in increased H3K27 acetylation compared to other LMO2 alterations, open chromatin, and heightened expression, consistent with the generation of a neoenhancer (Extended Data Fig.4e, Supplementary Fig.2e).

Oncogene intragenic SV and intronic SNVs

In addition to known coding sequence mutations that result in altered function or SV that impact gene expression, we observed non-coding events and intragenic SV of unknown functional consequence in 50 genes. We detected recurrent NOTCH1 intronic SNVs in 1.6% of cases that triggered alternative splicing of exon 28, which we validated using isoseq, RT-PCR and Sanger sequencing (Fig.2e, Supplementary Fig.4,5). The mutation resulted in increased NOTCH1 activation when compared to wild-type or heterodimerization domain (HD) or sequence rich in proline, glutamic acid, serine, and threonine (PEST) domain mutations, as demonstrated by luciferase assay results (Fig.2f). AlphaFold²⁷ predicted an extension of the disordered region between the transmembrane domain and the HD domain without disruption to the HD domain (Supplementary Fig.6). This extension is likely to disrupt signaling or alter the cleavage of the extracellular domain, as described for NOTCH1 tandem duplication²⁸ (Extended Data Fig.4f). In addition, we identified NOTCH1 intragenic losses, culminating in recurrent exon 3-27 and 16-27 deletions affecting the NOTCH1 extracellular domains, as well as alternative splicing potentially resulting in the constitutive activation of the NOTCH1 intracellular domain (Extended Data Fig.4g-h). IL7R transcription start site (TSS) loss was observed in 0.6% of cases and was associated with a loss of the mutated allele and also IL7R enhancer hijacking by PRLR²⁹ (Extended Data Fig.4i). In contrast, CCND3 harbored TSS losses in 2% of samples, where the deleted site affected only the long isoform of the gene but preserved expression of both the mutant and wild-type short isoform, suggesting a tumor suppressive function of the long isoform (Extended Data Fig.4j).

TLX3 and NKX2-1 can be subdivided based on genomic profiles

We identified distinct gene expression clusters within the TLX3 (TLX3-immature and TLX3 DP-like) and NKX2-1 groups (NKX2-1 TCR and NKX2-1 other; Extended Data Fig.1c). While activation mechanisms were similar, TLX3-immature exhibited a higher incidence of WT1 alterations, NUP214::ABL1 fusions, 16q22.1 losses with CTCF, FLT3 internal tandem duplication (ITD) mutations, and JAK pathway alterations. In contrast, TLX3 DP-like exhibited distinct features such as 14q gains and LEF1 and MYB alterations (Extended Data Fig.5a-d, Supplementary Table 23). These groups also differed in TCR rearrangements and ETP status, as confirmed by scRNA signature analysis (Extended Data Fig.5e-g).

NKX2-1 TCR exhibited TCR hijacking events and RPL10 mutations, while NKX2-1 other was characterized by chromothripsis leading to BCL11B enhancer hijacking, NFKBIA enhancer hijacking, and NKX2-1 locus CNV alterations, accompanied by MYB TCR rearrangements and gains (Extended Data Fig.5h-j and Extended Data Fig.3f,k, Supplementary Table 24).

Refined classification of TAL1/TAL2/LYL1 and LMO1/LMO2 deregulated T-ALL

We performed comparative analysis of driver mechanisms and co-lesions in TAL1 subtypes. TAL1 αβ-like exhibited a higher frequency of STIL::TAL1 fusions and LMO1 enhancer SNVs, while TAL1 DP-like had a higher frequency of LMO2 TCR rearrangements and RAG2 enhancer hijacking events (Fig.3a, Extended Data Fig.6a, Supplementary Table 25). TAL1 deregulation showed significant co-occurrence with LMO2 or LMO1 activation, whereas LMO2 and LMO1 alterations were mutually exclusive. Similarly, alterations involving TAL1, TAL2, and LYL1 exhibited mutual exclusivity (Extended Data Fig.6b). Although driver genes were often shared between these two subtypes, differences in hijacked enhancers and oncogene activation mechanisms suggest that the maturational arrest state is a major determinant of the phenotype. Flow cytometry-based immunophenotype analysis confirmed distinct immunophenotypes of these subtypes (Fig.3c-d, Extended Data Fig.6c-f, Supplementary Table 7).
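Co-occurrence and mutual exclusivity between two alterations are typically assessed on a 2x2 table of case counts. The sketch below shows one common approach, a two-sided Fisher exact test plus the direction of association. It is not stated here that the study used this exact test, and the counts in the usage example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: both alterations present, b: only the first, c: only the second,
    d: neither alteration present.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x cases in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as observed.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


def association_direction(a, b, c, d):
    """Label the direction of association from the cross-product of counts."""
    if a * d > b * c:
        return "co-occurrence"
    if a * d < b * c:
        return "mutual exclusivity"
    return "no association"


# Hypothetical counts: the two alterations never occur in the same case.
table = (0, 10, 10, 10)
p_value = fisher_exact_two_sided(*table)
direction = association_direction(*table)
print(direction, round(p_value, 4))
```

With these hypothetical counts the two alterations never coincide, so the direction is mutual exclusivity and the p-value falls below 0.05.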

We detected co-occurring and mutually exclusive genetic co-lesions within TAL1 subtypes, facilitating a refined classification into genetic subtypes (Fig.3b,e, Extended Data Fig.6g-i). TAL1 DP-like was classified into subgroups characterized by: RPL10 mutations, frequent DDX3X and MYB alterations; JAK alteration with frequent IL7R and STAT5B mutations; LEF1 SV/Del or LYL1-altered genetic subgroups; and, a diverse "Other" subgroup featuring an increased frequency of FBXW7, CCND3, TAL2 alterations, and TCRD::MYC. TAL1 αβ-like was subdivided into NOTCH1 wild type with frequent PTEN deletions and PI3K pathway alterations; a group marked by 6q loss; and, an "Other" category with NOTCH1 mutations but lacking 6q loss.

Gene expression profiling unveiled associations between genetic subtypes and oncogenes. For LEF1/LYL1, TAL2 altered, LMO2 γδ-like, or STAG2/LMO2 subtypes, TAL1 expression was not distinctive; however, TAL1 αβ-like demonstrated nearly exclusively high TAL1 expression (Fig.3e). Similarly, MYC expression was frequently elevated in TAL1 DP-like, possibly linked to a higher incidence of FBXW7 alterations within this group, which stabilize MYC³⁰. In contrast, TAL1 αβ-like displayed heightened MYB and MYCN expression, whereas LMO2 γδ-like and STAG2/LMO2 exclusively expressed MYCN and MYC, respectively. MYCN mutations were enriched in LMO2 γδ-like and TAL1 αβ-like, but not within the TAL1 DP-like subtype.

Characterization of early T-cell precursor-like ALL

One of the few diagnostic features used to stratify risk in T-ALL is ETP immunophenotype (cytoplasmic CD3+, CD1a-, CD8-, CD5 dim/-, with expression of stem cell or myeloid antigens)⁷. “Near-ETP” T-ALL cases have a similar immunophenotype except for positivity for CD5. Prior genomic studies of ETP ALL have identified recurrent alterations of genes encoding regulators of hematopoietic development, kinase signaling and chromatin modification^6,8, but have failed to identify unifying genomic alterations that distinguish such cases. Here we identify four subtypes that exhibit enrichment of ETP and near-ETP ALL. The BCL11B-activated subtype was exclusively of ETP immunophenotype^19,20 and harbored 13.6% of ETP cases in the cohort. Most strikingly, we observe a group of cases that includes 70.9% of ETP and 41% of near-ETP cases in the cohort, but that conversely comprises comparable proportions of each immunophenotypic group: 38.2% of cases were ETP, 33.8% near-ETP and 27.9% non-ETP. This “ETP-like” subtype had multiple recurrent driver alterations of genes with known or putative roles in HSC development: activating rearrangements of HOXA13 (18.7%) to TCR, BCL11B, MIR181A1HG, SATB1, CDK6 enhancers; cases with HOXA9/10/11 deregulation driven by rearrangements of MLLT10 (18.3%), KMT2A (11%), NUP214 (5.1%), NUP98 (3.4%); loss-of-function mutations of MED12 (14.4%); ZFP36L2 (8.9%) rearrangements or alterations of ETV6 (7.2%) (Fig.4a). Notably, KMT2A and MLLT10 rearrangements also define distinct subtypes of T-ALL not enriched for ETP ALL, supporting the notion that both cell of origin and oncogenic driver determine gene expression signatures. KMT2A fusions within the ETP-like subtype harbored mostly KMT2A::AFDN and other fusion partners, whereas the non-ETP KMT2A subtype exclusively had KMT2A::MLLT1 fusions (Extended Data Fig.7a). Similarly, a subset of NUP98- and NUP214-rearranged cases clustered apart from ETP-like NUP98/NUP214-R fusion cases.
Each ETP-like driver had distinct patterns of concomitant alterations: 2q alterations in the ETV6 subgroup; RUNX1, JAK, SUZ12, ASXL1 mutations in the ZFP36L2 subgroup (Extended Data Fig.7b, Supplementary Table 26); ETV6, TP53, SATB1, SH2B3 alterations in the HOXA13 subgroup; PSIP1 mutations in cases with MLLT10 fusions; KAT6A and MBNL1 mutations in the KMT2A subgroup, and gains of chromosomes 8, 10 and 19, loss of 5q and mutation of IKZF1 and GATA3 in the MED12 subgroup. ETP-like cases with MLLT10/KMT2A/NUP98/NUP214 driver alterations had a higher frequency of CNVs and alterations of ETV6, GATA3, IKZF1 and RAS signaling, and fewer NOTCH pathway alterations than non-ETP-like cases with these drivers, indicating likely roles of cell of origin, fusion partner and co-lesion in driving gene expression fate. By contrast, near-ETP cases were more dispersed, and enriched in the ETP-like, TLX3 Immature and TAL1 αβ-like subtypes (Extended Data Fig.1h-i).

The MED12 alterations observed in ETP-like cases were observed across the coding region of MED12, suggesting loss of function (Supplementary Fig.7). To test this, we inactivated MED12 using genome editing in the LOUCY (SET::NUP214) cell line (Supplementary Fig.8) that has immunophenotypic similarity to ETP ALL, and observed upregulation of histone deacetylase pathway gene expression (Extended Data Fig.7c). Intersection of these data with the gene expression profile of MED12 ETP-like cases showed common reduced expression of the T-cell differentiation markers CD5 and CD28, and increased expression of the stem cell markers LMO2 and HHEX (Fig.4b, Extended Data Fig.7d-f, Supplementary Table 27-29), indicating loss of MED12 function directly contributes to the immaturity characteristic of ETP-like ALL.

Integrated genomic analysis also elucidated mechanisms driving differential deregulation of specific HOXA genes in ETP-like ALL. Specifically, enhancer hijacking alterations driving HOXA9, but not HOXA13 deregulation such as TCRB::HOXA9 showed that rearrangement breakpoints were always located between two CTCF peaks that demarcate a topologically associating domain (TAD) boundary between the HOXA9 and HOXA13 loci in CD34+ HSPC cells (Fig.4c, Extended Data Fig.7g). By contrast, all breakpoints of rearrangements deregulating HOXA13 were confined to the HOXA13 TAD, thus constraining activation of HOXA9.

Although 27.9% of cases in the ETP-like subtype did not fulfil the immunophenotypic criteria for ETP/near-ETP ALL, they exhibited immunophenotypic trends (lower expression of T-cell markers and expression of myeloid/stem cell markers) and commonalities including absence of TCR rearrangements, similar maturational stage, genomic drivers, and outcome (Fig.4d-i, Extended Data Fig.7h-j, Supplementary Results). Thus, ETP-like ALL is a subtype of ALL with distinct, heterogeneous drivers, a likely HSPC origin but variable diagnostic immunophenotype; genomic classification should replace immunophenotypic classification.

Outcome analysis reveals genomic risk factors associated with refractory disease, relapse and secondary malignancies

We examined genomic features associated with clinical outcome (Supplementary Table 30-39). Positivity for residual disease (RD) (MRD ≥0.01%⁹) was particularly common in the ETP-like and LMO2 γδ-like subtypes (Fig.5a, Extended Data Fig.8a). We tested subtypes, genetic drivers, co-lesions, broad CNV changes, and altered pathways for association with RD risk (Extended Data Fig.8b). Notably, ETP-like drivers and co-lesions (such as SH2B3, ETV6, NRAS, WT1) were associated with higher RD risk, while TAL1 subtype-related features (LEF1, USP7, PI3K, CCND3) were associated with lower RD risk. Additionally, pathways like JAK and RAS were associated with higher RD risk, whereas NOTCH, ribosome, and PI3K were associated with lower RD risk.

Examining event-free (EFS), disease-free (DFS) and overall survival (OS), the SPI1 and LMO2 γδ-like subtypes had dismal outcomes, and the NKX2-5 subtype, as well as the ETP-like KMT2A, MLLT10 and HOXA13 genetic subtypes, had adverse outcomes. The non-ETP-like KMT2A subtype had higher MRD but a very favorable prognosis, unlike KMT2A cases within the ETP-like subtype (Fig.5b). Similarly, non-ETP-like MLLT10 cases had a better prognosis compared to ETP-like MLLT10 cases. Analogous patterns emerged for TLX3, where TLX3 DP-like had a favorable prognosis and TLX3 Immature exhibited worse outcomes. The ZFP36L2 subgroup had increased rates of high MRD, yet a favorable outcome, highlighting that early poor disease response alone should not be the sole factor driving decisions such as HSC transplant. Notably, heterogeneity was also evident within TAL1 genetic subgroups, as TAL1 DP-like subgroups (‘LEF1/LYL1’ and ‘Other’) were associated with inferior EFS and DFS, and TAL1 αβ-like subgroups (‘Notch wt’ and ‘Other’) were associated with inferior OS, whereas the ‘RPL10’ subgroup had an excellent outcome (Extended Data Fig.8c).

Next, we examined associations between genetic variants and outcome (Fig.5c, full variant list in Extended Data Fig.9a). Most NOTCH1 variants, which are traditionally perceived as markers of a favorable prognosis, were associated with favorable outcome regardless of MRD response (Extended Data Fig.9b). Unexpectedly, NOTCH1 intronic SNV and NOTCH1 intragenic losses were associated with worse OS and EFS, respectively (Fig.5c). Further, MYC TCR rearrangements had inferior DFS, whereas MYC enhancer gains had favorable DFS. PTEN alterations emerged as another poor prognosis feature, as cases with PTEN deletions had markedly worse outcomes compared to other PI3K pathway alterations. Within TAL1 subtype features, LYL1 TCR and LMO2 intergenic losses leading to RAG/CAPRIN1 enhancer hijacking were associated with worse outcomes, in contrast to favorable prognostic markers like 6q loss and RPL10 mutations. Collectively, these results demonstrate that risk stratification must account for the type of variant for a given gene and not only the gene that is altered.

Through competing risk (CR) models, we identified risk factors such as LMO2 intergenic loss, MYC TCR, PTEN deletions and NOTCH1 intragenic deletions (Extended Data Fig.9c-g) associated with relapse. We found an association of TAL1 upstream indel with a higher relapse risk compared to other TAL1 mechanisms and overall differential relapse risks across TAL1 subtypes (Extended Data Fig.9h-i). In a recent study, TAL1 upstream Indels and TCR::LMO1 were associated with induction failure; however, we found no association in our cohort (Extended Data Fig.9j).
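In a competing-risk setting, the probability of relapse is summarized by a cumulative incidence function that accounts for other events (for example, death without relapse) removing patients from risk. The following is a minimal sketch of the standard nonparametric (Aalen-Johansen type) estimator. The function name, the integer coding of causes and the toy data are assumptions for illustration, and the models actually fitted in the study are described in its Methods, not here.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause under competing risks.

    causes: 0 = censored, positive integers = distinct event types.
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type so far
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [c for (tt, c) in data if tt == t]
        d_any = sum(1 for c in tied if c != 0)
        d_k = sum(1 for c in tied if c == cause_of_interest)
        if d_any:
            # Increment: P(event-free just before t) * hazard of this cause.
            cif += event_free * d_k / n_at_risk
            event_free *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= len(tied)
        i += len(tied)
    return curve


# Toy data: cause 1 = relapse, cause 2 = competing event, 0 = censored.
toy_times = [1, 2, 3, 4, 5]
toy_causes = [1, 2, 0, 1, 0]
relapse_curve = cumulative_incidence(toy_times, toy_causes, cause_of_interest=1)
print(relapse_curve)
```

In the toy data the relapse incidence rises to 0.2 at the first event and to 0.5 at the last relapse, while the competing event at time 2 lowers the event-free probability without adding to the relapse curve.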

Notably, CR analysis of secondary malignancies showed that 4 out of 11 SPI1 fusion cases developed histiocytosis and myeloid sarcoma within a year of diagnosis (Fig.5d). The T-ALL samples exhibited elevated expression of markers also expressed by dendritic cells (HLA-D, CD1a, CD38, CD45, CD7, CD5, and sCD3) and the SPI1 signature showed high enrichment in thymic dendritic cells and gene expression markers aligned with immunophenotype, suggesting the cell of origin that acquires SPI1 fusion has T and dendritic cell characteristics (Fig.5e-f).

Multivariable genomic models accurately predict patients at risk in T-ALL

We developed multivariable models incorporating clinical variables, treatment response and genetic subtypes and alterations to predict outcome and risk stratify patients (Methods). Random Survival Forest (RSF) and Penalized Cox Regression (pCox) had the highest accuracy when each model was fitted using numeric MRD, clinical variables (sex, WBC count at diagnosis >2×10⁵ cells/μl and central nervous system (CNS) status), subtype/variant level genomic features for the pCox model, and genetic subtype for the RSF model (Extended Data Fig.10a, Supplementary Results, Supplementary Table 40). These approaches are distinguished by their precision and potential clinical utility (Supplementary Results). The pCox model was designed to comprehensively identify combinations of genetic and clinical features that were independently prognostic. The second model, a four-node Survival Tree (ST), was designed with a specific focus on the practical application of the RSF model in a clinical setting. In this model, patients were stratified into prognostic groups using only their genetic subtypes and MRD status.

The pCox model achieved a concordance of 0.767 and incorporated three clinical features, five subtypes and 18 genomic alterations to stratify patients into four equally sized groups with 5-year EFS ranging from 65% to 97% (Fig.6a,b, Supplementary Table 41). SPI1 subtype, ETP-like subtypes, PTEN deletions/loss, PIK3CD SNV/indels, and LMO2 intergenic loss were associated with worse outcomes, while KMT2A subtype, 6q loss, RPL10 and NOTCH1 SNV/indels were associated with favorable outcomes in both univariable and pCox models, highlighting their value as independent prognostic biomarkers (Fig.6a, Fig.5b-c).

The four-node ST model achieved a concordance of 0.712 and was able to risk stratify patients into eight groups with 5-year EFS ranging from 45% to 98% (Fig.6c, Extended Data Fig.10b). Several features, including the ETP-like drivers KMT2A, MLLT10, NUP98 and rare drivers, and the SPI1, LMO2 γδ-like and NKX2-5 subgroups, had poor outcomes (5-year EFS <60%) regardless of MRD response. These patients should be considered for HSC transplant or novel immunotherapies, as outcomes are poor despite intensive multi-agent chemotherapy. In contrast, several other features, including ETP-like with ZFP36L2 alterations, TLX3 DP-like, TAL1 DP-like RPL10, NKX2-1, TLX1, KMT2A, HOXA9 TCR, TAL1 αβ-like Loss 6q, and TAL1 αβ-like Notch wt, had very favorable outcomes (5-year EFS >98%) if day 29 MRD was <0.01%. This large group of patients (n = 260; ~20% of cohort) may benefit from a reduction in intensity of chemotherapy.

ETP immunophenotype (IP) was not prognostic in the ETP-like subtype (Extended Data Fig.10f). In contrast, both the pCox model and ST proved effective in accurately predicting outcomes for individuals within both the ETP-like and ETP-IP groups (Fig.6c, Extended Data Fig.10c-j). These findings underscore the necessity of employing genomics-based multivariable prognostic classification.

We classified T-ALL into 15 subtypes with distinct gene expression and genomic drivers, including previously undefined ETP-like and LMO2 γδ-like subtypes. We also refined the classification of known subtypes such as TAL1 and TLX3, unveiling their genetic and transcriptomic heterogeneity and providing insights into their underlying biology and developmental states. Notably, approximately 60% of identified leukemia drivers involved non-coding regions, requiring WGS in 28% of cases. This highlights a substantial shift in our understanding of T-ALL biology, with a predominant genomic landscape driven by non-coding alterations. Enhancer hijacking emerged as a prevalent mechanism of oncogene activation, impacting 53.5% of cases with 53 distinct oncogenes. In total, we identified recurrent pathogenic or likely pathogenic genetic alterations in 164 genes and uncovered numerous recurrently altered genes and variants for known genes not previously documented in T-ALL. Many potentially targetable genes with biologic pathway relevance were altered and enriched in various subtypes. For instance, FLT3 alterations were enriched in BCL11B and TLX3 subtypes, while PI3K pathway alterations were frequent in TAL1 DP-like and NKX2-5 subtypes, enhancing our understanding of T-ALL biology and directly informing targeted therapies.

Leveraging a large dataset of uniformly treated patients, we were able to craft two robust multivariable outcome models and determine the prognostic significance of various subtypes, altered genes, and dysregulated pathways. T-ALL is typically immunophenotypically classified; however, immunophenotypic classification has not been informative in risk stratification. We showed that the ETP-like subtype exhibits poor early MRD response and inferior EFS, and is notably enriched with genetic drivers such as KMT2A-R and MLLT10-R. ETP-like and non-ETP-like KMT2A-R or MLLT10-R patients revealed a stark divergence in outcomes, underscoring the importance of separating these subgroups. We identified genetic alterations such as the BCL11B and ZFP36L2 ETP-like subtypes that were associated with poor MRD response but good outcome. Conversely, other alterations such as SPI1-R subtype and PI3K alterations displayed favorable MRD response but poorer survival, highlighting that MRD alone is insufficient to risk stratify patients. Finally, we observed a significant link between the type of gene alterations and outcomes in T-ALL. While mutations in the NOTCH pathway often imply a favorable prognosis, we found that patients with intronic SNVs or intragenic deletions in NOTCH1 experienced inferior survival. Similar patterns emerged for other genes, including MYC and PTEN. In summary, through the largest comprehensive sequencing effort in T-ALL performed to date, our data elucidate insights into T-ALL disease biology and underscore the need for comprehensive risk stratification that incorporates genomic subtypes, variant type, and both coding and non-coding alterations.

Patient cohort

AALL0434 Clinical Trial and Samples Used for Genomic Analyses

AALL0434 (NCT00408005) is a Children’s Oncology Group (COG) phase 3 international clinical trial for patients with newly diagnosed T-cell acute lymphoblastic leukemia (T-ALL) and lymphoblastic lymphoma (T-LL) aged 1-30 years. Subjects were enrolled from 01/22/07 until 07/25/14 at 214 centers in the United States, Canada, Australia, New Zealand, and Switzerland. All subjects with T-ALL were required to enroll on a companion classification study that was used for sample banking and risk stratification, AALL03B1 (NCT00482352) from 01/22/07 until 08/08/10 or AALL08B1 (NCT01142427) from 08/09/10 until 07/25/14. AALL0434, AALL03B1, and AALL08B1 were approved by the Pediatric Central Institutional Review Board (IRB), local IRBs at all participating centers, and the NCI Cancer Therapy Evaluation Program (CTEP). Written informed consent/assent was obtained from all study participants and/or their legally authorized representative in accordance with the Declaration of Helsinki. Genomic studies performed for this work were approved by COG, CTEP, and the local IRBs at the Children’s Hospital of Philadelphia and St Jude Children’s Research Hospital. Samples were decoded and assigned a unique study identifier (USI). Samples were banked at the COG biorepository at Nationwide Children’s Hospital in Columbus, OH.

Details on the chemotherapy treatment backbone, inclusion and exclusion criteria for the clinical trial, and results of the AALL0434 clinical trial have been previously published^31,32. A total of 1562 eligible and evaluable subjects with T-ALL were enrolled on AALL0434. Of these, 1409 subjects consented to correlative research and had samples banked for genomic analyses. Diagnostic specimens were used for somatic/tumor DNA and RNA, and Day 29/end of induction/remission samples were used for matched normal control DNA. The isolation of normal control DNA for subjects who did not attain remission at Day 29/end of induction is described in Supplementary Methods.

A total of 1309 subjects had successful complete sequencing, defined as WGS (whole genome sequencing), WES (whole exome sequencing), and WTS (whole transcriptome sequencing) of tumor and WGS of normal matched control. An additional 53 subjects had successful WTS without either WGS/WES of tumor or normal matched control and were included in transcriptome-only analyses. A comparison of important clinical and demographic features as well as outcomes between the eligible and evaluable T-ALL patients and the sequenced cohort is provided in Supplementary Table 2, demonstrating that the sequenced cohort was representative of the overall trial cohort.

Subjects with newly diagnosed T-ALL were eligible based on either >25% leukemic blasts on bone marrow aspirate or by a complete blood count (CBC) documenting the presence of at least 1,000/μl circulating peripheral blasts. A bone marrow examination was required unless there was a medical contraindication to having the test. Bone marrow and/or peripheral blood samples were collected and banked at diagnosis and bone marrow and/or peripheral blood were also banked at the end of induction (after one month of chemotherapy). Bone marrow samples collected at diagnosis were prioritized for use for tumor genomic studies; however, peripheral blood samples were used if bone marrow samples were not available.

Central determination of immunophenotype and minimal residual disease (MRD)

Immunophenotyping and MRD assessment of patient samples were performed centrally on AALL0434 by 8-9 color flow cytometry. Comprehensive immunophenotyping, including determination of ETP, near-ETP and not-ETP status, was performed on diagnostic bone marrow or peripheral blood samples, as previously published⁷.

ETP was defined by T-lymphoblasts that were CD8 negative and CD1a negative (<5% positive), weakly expressed CD5 (either <75% positive or median intensity more than 1 log less than mature T cells) and expressed one or more myeloid or stem cell markers (>25% positive) including CD13, CD33, CD34, CD117 or HLA-DR⁹. Near-ETP was defined by meeting the ETP-IP criteria but having stronger CD5 expression. The remaining cases were defined as Non-ETP. The panel of antibodies is included in Supplementary Table 6. Of note, the panel changed in 2008. ETP status was determined on 82.3% of subjects treated on AALL0434 (1256 of 1526) and 87.2% of subjects (1141 of 1309) who had complete sequencing performed. ETP was described as an entity in 2009 and therefore subjects enrolled prior to its description were uncharacterized for ETP status by immunophenotype^7,9. MRD was assessed by flow cytometry at Day 29/end of induction in all patients who remained on protocol therapy at that time point.
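The immunophenotype rule above can be written as a small decision function. The following Python sketch is illustrative only and is not the flow cytometry laboratory's actual gating logic: the `Immunophenotype` container, the field names and the reading that the <5% threshold applies to both CD8 and CD1a are assumptions made for this example.

```python
from dataclasses import dataclass

# Myeloid / stem cell markers named in the ETP definition.
MYELOID_STEM_MARKERS = ("CD13", "CD33", "CD34", "CD117", "HLA-DR")


@dataclass
class Immunophenotype:
    """Hypothetical container for one diagnostic sample."""
    pct_positive: dict            # marker name -> percent of blasts positive
    cd5_log_deficit: float = 0.0  # log10 drop in CD5 median intensity vs mature T cells


def classify_etp(ip: Immunophenotype) -> str:
    """Return 'ETP', 'Near-ETP' or 'Non-ETP' following the stated criteria."""
    pct = ip.pct_positive
    # Assumption: "<5% positive" is applied to both CD8 and CD1a.
    cd8_cd1a_negative = pct["CD8"] < 5 and pct["CD1a"] < 5
    # At least one myeloid or stem cell marker on >25% of blasts.
    has_myeloid_stem = any(pct.get(m, 0) > 25 for m in MYELOID_STEM_MARKERS)
    if not (cd8_cd1a_negative and has_myeloid_stem):
        return "Non-ETP"
    # Weak CD5: <75% positive, or median intensity more than 1 log below mature T cells.
    weak_cd5 = pct["CD5"] < 75 or ip.cd5_log_deficit > 1
    return "ETP" if weak_cd5 else "Near-ETP"
```

For example, a sample with CD8 1%, CD1a 2%, CD5 40% and CD34 60% would be labelled ETP, while the same sample with CD5 at 95% and normal CD5 intensity would be labelled Near-ETP.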

DNA/RNA isolation

Matched normal control cell DNA isolation for all samples except those that underwent flow-sorting (n = 24) was performed at Nationwide Children’s Hospital, using the Qiagen QIAamp DNA kit (Mini kit or Maxi kit depending on number of cells in the sample). Matched normal control DNA isolation for those that underwent flow sorting was performed at CHOP, using the Qiagen QIAamp DNA kit (Mini kit or Micro kit depending on number of cells in the sample). DNA and RNA from Ficoll-enriched viably preserved tumor samples were extracted at the Fred Hutchinson Cancer Center using the Qiagen AllPrep Extraction Kit.

Sequencing

Ribosomal RNA (rRNA) reduction RNA-seq library preparation and sequencing:

The concentration and integrity of the total RNA were estimated by Ribogreen assay (Invitrogen) and Fragment Analyzer (Agilent), respectively. Approximately 500 ng of total RNA from each sample was taken into library prep using the Illumina Stranded Total RNA Prep with Ribo-Zero Plus kit (Illumina) as per the manufacturer's recommended protocol. Final library concentration was measured by Picogreen assay (Invitrogen), and the library size was estimated using a DNA High Sense chip on a LabChip GX (PerkinElmer). Accurate quantification of the final libraries for sequencing applications was determined using the qPCR-based KAPA Biosystems Library Quantification kit (Roche). 2x100 PE sequencing was performed on an Illumina NovaSeq 6000 instrument (Illumina).

Whole Exome library preparation:

Approximately 500 ng of DNA from each sample was sheared on a Covaris focused-ultrasonicator (Covaris Inc, USA) with a target fragment size of 200 bp. The fragmented DNA was then taken into a standard library preparation protocol using KAPA HyperPrep Kits (Roche, USA) with slight modifications. Post-ligated material was individually barcoded with unique in-house primers and amplified by PCR using KAPA HiFi HotStart Ready Mix (Roche, USA). The concentration of the libraries was then measured by Picogreen assay (Thermo, USA), and the average fragment size of the libraries was estimated using a DNA High Sense chip on a LabChip GX Touch Nucleic acid analyzer (PerkinElmer, USA). KAPA qPCR assay (Roche, USA) was then performed to assess the nanomolar amounts of ligated libraries.

After library preparation, approximately 600 ng of library per sample was hybridized using the xGen Exome Hyb Panel v2 (Integrated DNA Technologies, USA) as per the manufacturer’s recommendation. This probe set consists of 415,115 probes that span a 34 Mb target region (19,433 genes) of the human genome and 39 Mb of probe space. Post-hybridization libraries were amplified by PCR, the concentration of the libraries was measured by Picogreen assay, and the average fragment size of the libraries was estimated using a DNA High Sense chip on a LabChip GX Touch Nucleic acid analyzer. KAPA qPCR assay was then performed to assess the nanomolar amounts of ligated libraries. Final libraries were pooled and then sequenced as 2x100 bp paired-end reads on the NovaSeq 6000 instrument using an S4 200 cycle flow cell.

Whole Genome library preparation and sequencing:

Approximately 500 ng of DNA from each sample was sheared on a Covaris focused-ultrasonicator (Woburn, MA, USA) with a target fragment size of 500 bp. The fragmented DNA was then taken into a standard library preparation protocol using the KAPA HyperPrep kit as per the manufacturer’s recommendation. The concentration of the libraries was assessed by Picogreen, and the average fragment size of the libraries was estimated using a LabChip® GXII Touch (Caliper). Accurate quantification for sequencing applications was determined using the qPCR-based KAPA Biosystems Library Quantification kit (Roche). Paired-end (PE) 150 bp sequencing was performed on an Illumina NovaSeq 6000 (Illumina).

Other sequencing:

HiChIP, ATACseq, and long-read RNA sequencing are described in the Supplementary Methods.

Data analysis

See Supplementary Methods for data analysis and variant calling approaches.

Subtyping

UMAP

A series of filtering steps were applied to the features as follows: (1) Genes were required to exhibit expression of over 1 transcript per million (TPM) in ≥ 5 samples. (2) Only genes located on chromosomes 1 to 22 and X were included. (3) Genes were filtered based on gene biotype, with the requirement that they be classified as protein_coding, lncRNA, TR_C_gene, or TR_V_gene. (4) The 300 most variable genes, ranked by standard deviation, were selected using only diagnostic samples with a blast percentage exceeding 70%. Subsequently, UMAP¹¹ (Uniform Manifold Approximation and Projection) dimensionality reduction analysis was conducted on the entire cohort (N=1362) using the 'uwot' R package (v0.1.14), with parameters set to n_neighbors = 15 and min_dist = 0.1, with seed.use = 1. The robustness of the resulting projection was evaluated by performing the UMAP projection across varying variable feature counts, ranging from 100 to 2000 genes.
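The four feature-selection steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: it reads step (1) as a per-gene expression filter, and the input layout, the function name `select_variable_genes`, the chromosome labels without a "chr" prefix, and the use of log2(TPM + 1) before computing the standard deviation are all assumptions (the text does not state the expression scale used for ranking).

```python
import math
import statistics

# Step (2): autosomes and X only (assumes labels such as "1", "22", "X").
ALLOWED_CHROMS = {str(i) for i in range(1, 23)} | {"X"}
# Step (3): permitted gene biotypes.
ALLOWED_BIOTYPES = {"protein_coding", "lncRNA", "TR_C_gene", "TR_V_gene"}


def select_variable_genes(genes, samples, n_top=300, min_tpm=1.0,
                          min_samples=5, min_blast=70.0):
    """Return the names of the n_top most variable genes after filtering.

    genes:   list of dicts with keys "name", "chrom", "biotype" and "tpm"
             (a list holding one TPM value per sample).
    samples: list of dicts with keys "diagnostic" (bool) and "blast_pct".
    """
    # Step (4) reference set: diagnostic samples with blast percentage > 70.
    # At least two such samples are needed to compute a standard deviation.
    ref = [i for i, s in enumerate(samples)
           if s["diagnostic"] and s["blast_pct"] > min_blast]
    scored = []
    for g in genes:
        # Step (1): expressed above min_tpm in at least min_samples samples.
        if sum(1 for v in g["tpm"] if v > min_tpm) < min_samples:
            continue
        if g["chrom"] not in ALLOWED_CHROMS or g["biotype"] not in ALLOWED_BIOTYPES:
            continue
        # Assumption: variability is measured on log2(TPM + 1).
        logged = [math.log2(g["tpm"][i] + 1.0) for i in ref]
        scored.append((statistics.stdev(logged), g["name"]))
    scored.sort(key=lambda pair: -pair[0])  # highest SD first
    return [name for _, name in scored[:n_top]]
```

The selected gene list would then be passed to the UMAP step; repeating the call with `n_top` between 100 and 2000 mirrors the robustness check described above.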

Clustering

We applied community detection-based clustering using the same set of 300 most variable genes (Supplementary Table 5) employed in the UMAP analysis. The Leiden¹² algorithm, implemented within the 'igraph' R package (v.1.3.5), was employed for this purpose. Initially, SNN (Shared Nearest Neighbor) graphs were constructed using the 'scran' v.1.26.2 'buildSNNGraph' function, with a parameter of k=7 and 'set.seed(1)' for reproducibility. We conducted clustering at two different resolutions, namely 0.1 and 0.5, to delineate both primary subtypes and finer subgroups. Additionally, we extended our clustering approach to include frequently altered oncogene expressions, specifically TAL1, TAL2, LMO1, LMO2, NKX2-1, NKX2-5, TLX1, TLX3, HOXA9, HOXA13, and LYL1. For this, we constructed the SNN graph with k=20 and set the resolution to 0.8.
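To make the graph construction step concrete, the sketch below builds a shared nearest neighbor graph in plain Python. It is a simplified, Jaccard-weighted variant written for illustration and is not necessarily the weighting scheme used by 'buildSNNGraph'; the function names are hypothetical, and community detection (Leiden) on the resulting graph is not shown.

```python
import math


def knn(points, k):
    """For each point, the set of indices of its k nearest neighbours
    (Euclidean distance), excluding the point itself."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def snn_graph(points, k):
    """Return {(i, j): weight} for i < j, where weight is the Jaccard overlap
    of the two neighbour sets (each set also contains the point itself).
    Pairs sharing no neighbours get no edge."""
    nbrs = [s | {i} for i, s in enumerate(knn(points, k))]
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nbrs[i] & nbrs[j])
            if shared:
                edges[(i, j)] = shared / len(nbrs[i] | nbrs[j])
    return edges
```

In this sketch `points` would be the samples described by the 300 selected genes, and a smaller `k` (such as 7) yields a sparser graph that can resolve finer groups than a larger `k` (such as 20).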

Defining subtypes using integrative analysis

The main subtypes were defined by Leiden clusters at a resolution of 0.1, also having high concordance with the oncogene-based expression analysis (Extended Data Fig.1b). However, certain smaller subgroups, such as SPI1, LMO2 γδ-like, NKX2-5, and STAG2/LMO2, did not exhibit distinct boundaries within the low-resolution clusters. To address this, a combined approach involving clustering at a resolution of 0.5 and driver classification was used to delineate these subtypes. This approach was validated by the identification of highly distinct gene expression patterns for each subtype. In addition, we observed that the MLLT10, HOXA9, and KMT2A subtypes also included rare instances of NUP98 and NUP214 cases. Importantly, these cases did not cluster with the ETP-like subtype NUP98/NUP214 cases, warranting their separate analysis. Meanwhile, the TME-enriched subtype, lacking a unifying driver, exhibited a connection with low blast percentages (Extended data Fig.1m). Further analysis of gene expression signatures revealed an elevated presence of monocytes and other components of the tumor microenvironment (TME), accompanied by an absence of T-cell gene expression. This evidence suggests that signals originating from the tumor microenvironment pose challenges for driver discovery.

The Leiden algorithm at a resolution of 0.5 was employed to identify subclusters within the main subtypes. Specifically, the ETP-like, TAL1, TLX3, NKX2-1 groups were subjected to subgrouping based on genetic alterations in alignment with these subclusters. ETP-like exhibited different drivers per subcluster, whereas the TAL1, NKX2-1 and TLX3 groups showed notable disparities in gene expression subclusters and underlying driver variants or co-lesions, providing a basis for their refined classification.

Assembly of final genetic alteration dataset

The final dataset employed for subsequent statistical analysis encompassed harmonized data from 1309 patients with WGS and RNA. This comprehensive dataset integrated significant alterations (see GRIN2, GISTIC2, dNdScv and data harmonization in Supplementary Methods), along with manually reviewed classifying driver alterations for each patient sample, so that rare variants that were putative drivers were retained. Additionally, manually reviewed broad copy number variations (CNVs) were integrated for each sample. Each variant was classified as either Coding or Non-coding, with three levels of annotation, ranging from simple to more granular (Supplementary Tables 8-11). The data were then condensed at the gene, variant, or pathway level as binary features. Pathway curation and PubMed IDs (PMIDs) are shown in Supplementary Table 12.

Assembly of Healthy normal thymus and bone marrow reference set

See Supplementary Methods.

Gene expression analysis

See Supplementary Methods.

Outcome Analysis

Overall survival (OS) was defined as the time from study enrollment or postinduction randomization to death or date of last contact. Event-free survival (EFS) was defined as the time from study enrollment to first event (induction failure, induction death, relapse, second malignant neoplasm, or remission death) or date of last contact. Disease-free survival was defined as the time from postinduction randomization to first event (relapse, second malignant neoplasm, or remission death) or date of last contact³¹. Minimal residual disease (MRD) was treated as a numeric proportion or as a binary variable with class “Negative” if MRD <0.01% and “Positive” if MRD ≥0.01%. We also created an ordinal MRD variable that further divided the “Positive” group into “Low Positive” if MRD <1% and “High Positive” if MRD ≥1%.
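The MRD encodings can be expressed as two small functions. This Python sketch assumes MRD is supplied as a percentage and that boundary values (exactly 0.01% and exactly 1%) fall into the higher class; the function names are illustrative.

```python
def mrd_binary(mrd_pct: float) -> str:
    """Binary MRD class from a percentage value (e.g. 0.005 means 0.005%)."""
    return "Negative" if mrd_pct < 0.01 else "Positive"


def mrd_ordinal(mrd_pct: float) -> str:
    """Three-level MRD class: the Positive group is split at 1%."""
    if mrd_pct < 0.01:
        return "Negative"
    if mrd_pct < 1.0:
        return "Low Positive"
    return "High Positive"
```

Under these assumptions, an end-of-induction MRD of 0.5% is "Positive" in the binary variable and "Low Positive" in the ordinal variable.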

Univariable Screening and Competing Risks

See Supplementary methods.

Multivariable Analysis

We explored multiple statistical methodologies to build prognostic models with each of the survival endpoints and multiple candidate predictors. To minimize overfitting, we split the data 100 times into 70% training and 30% test datasets. For each method and each split, we built a predictive model using the training dataset and calculated the concordance of this model in the test dataset³³. The 100 replicates allowed us to capture the uncertainty associated with the model concordance, which was summarized by the mean and 95% interval across replicates. The prognostic model methods considered were Penalized Cox Models, Random Survival Forests, and Survival Trees (see Supplementary Methods).
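The repeated-split evaluation can be illustrated with a short, self-contained Python sketch. It is not the authors' implementation: Harrell's concordance index is used here as the concordance measure, the 95% interval is taken as the empirical 2.5th to 97.5th percentile range of the replicates, and the names `concordance`, `repeated_split_concordance` and the `fit` callback (which stands in for any of the model types) are assumptions made for this example.

```python
import random
import statistics


def concordance(times, events, risks):
    """Harrell's C: among comparable pairs (the earlier time is an observed
    event), the fraction where the earlier failure has the higher risk score.
    Ties in risk count as 0.5."""
    num = den = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                den += 1
                if risks[i] > risks[j]:
                    num += 1
                elif risks[i] == risks[j]:
                    num += 0.5
    return num / den if den else float("nan")


def repeated_split_concordance(data, fit, n_rep=100, train_frac=0.7, seed=1):
    """data: list of (time, event, features) tuples.
    fit(train) must return a function mapping features to a risk score.
    Returns (mean concordance, (lower, upper)) over the test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rep):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))
        train = [data[i] for i in idx[:cut]]
        test = [data[i] for i in idx[cut:]]
        risk = fit(train)  # the model only ever sees the training split
        c = concordance([t for t, _, _ in test],
                        [e for _, e, _ in test],
                        [risk(x) for _, _, x in test])
        if c == c:  # skip NaN (no comparable pairs in this test split)
            scores.append(c)
    scores.sort()
    lower = scores[int(0.025 * (len(scores) - 1))]
    upper = scores[int(round(0.975 * (len(scores) - 1)))]
    return statistics.mean(scores), (lower, upper)
```

Because the model is refitted on each training split and scored only on the held-out 30%, the spread of the resulting concordance values reflects how stable a method's performance is across resampled cohorts.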

In Vitro Experiments

Engineering MED12 knockout cells

LOUCY-MED12KO (MED12 knockout) cells were generated using CRISPR/Cas9 technology, employing Cas9-gRNA ribonucleoprotein (RNP) delivery. A mixture consisting of 3 µl of Cas9 protein (20 µM) and sgRNAs (60 µM) was incubated for 15 minutes at room temperature and subsequently combined with LOUCY cells at a concentration of 1 × 10⁵ cells/ml in Buffer T (Invitrogen, #MPK1025K). Electroporation was executed using the Neon Transfection System (Thermo Fisher Scientific) under the conditions of 1600V, 10ms, and three pulses. Following a 72-hour incubation, the electroporated cells underwent single-cell sorting for precise clonal isolation. The genetic modifications were assessed through Sanger Sequencing, and the loss of MED12 protein expression was confirmed by Western blotting. Furthermore, whole-genome sequencing (WGS) was conducted to validate the gene edits and to scrutinize potential off-target effects.

Western blotting

Engineered LOUCY-MED12KO cells were lysed in radioimmunoprecipitation assay buffer supplemented with protease and phosphatase inhibitors (Thermo Fisher Scientific, #1861281). 20 µg of protein of the cell lysate was electrophoresed through 3–8% NuPage Tris-acetate gels (Life Technologies) at 110 V for 120 minutes. Blots were probed with anti-STAG2 (Cell signaling technology, #14360) and anti-Lamin B (Abcam, #133741) antibodies. For imaging and quantitation, Odyssey DLx (LI-COR) and Image Studio (LI-COR) were used.

Sequencing validation of NOTCH1 exon 27-28 intronic SNV

We analyzed two patient samples from the St Jude Children’s Research Hospital (Memphis, USA) Tissue Bank. DNA/RNA extraction was performed with the Quick-DNA/RNA Microprep™ Plus Kit (#d7005, Zymo Research). Primers were designed upstream and downstream of the SNV and used to amplify DNA by PCR using VeriFi™ Hot Start Mix Red (#PB10.47, PCR Biosystems). PCR products were purified with the Wizard® SV Gel and PCR Clean-Up System (#A9282, Promega) and sequences were verified by Sanger sequencing using a 3730 DNA analyzer (Applied Biosystems) and the BigDye™ Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems). To validate the extended transcript due to the intronic SNV, reverse transcription of RNA from the two patient samples and two cell lines not harboring the intronic SNV, PEER (#ACC6, DSMZ) and LOUCY (#CRL-2629, ATCC), was performed using the SuperScript™ III First-Strand Synthesis System (#18080051, Invitrogen). Amplification of the region of interest was performed by PCR using VeriFi™ Hot Start Mix Red (#PB10.47, PCR Biosystems) and primers designed upstream and downstream of the retained intronic region (Supplementary Table 43). PCR products were purified with the Wizard® SV Gel and PCR Clean-Up System (#A9282, Promega) after separation by molecular size through 0.8% agarose gel electrophoresis. Sequences were verified by Sanger sequencing as described above. The results were analyzed with CLC Genomics Workbench v. 22.0 (Qiagen).

Luciferase reporter assays

pcDNA6.2 NOTCH1 expression constructs were transiently transfected in HEK293T cells using FuGENE HD Transfection Reagent (#E2311, Promega) together with RBPJ/CSL luciferase reporter construct (pGa981-6) (Oswald, F., Mol. Cell. Biol. 21 (22): 7761-7774 (2001)) and Renilla luciferase expressing vector (pRL-CMV) (#E2261, Promega). Luciferase assays were performed using Dual Luciferase Assay System (#E1910, Promega) 24 hours after transfection on Biotek Synergy HTX Multimode Reader (Agilent) according to manufacturer’s protocols. Assays were performed in triplicate and repeated at least three times with consistent results.

Data Availability

Primary sample whole-genome, exome and transcriptome data are available under database of Genotypes and Phenotypes (dbGaP) accession number phs002276.v2.p1 (phs000218, phs000464 for T-ALL TARGET samples) and the Kids First data portal (https://portal.kidsfirstdrc.org/dashboard). Clinical data, processed genomic data and statistical analysis results can be found in the Supplementary Tables Excel file. HiChIP, Isoseq and ATACseq data are available in the European Genome-phenome Archive (EGA) data portal, accession number EGAS50000000016. Somatic alterations, recurrent mutations and HiChIP/ATACseq tracks can also be explored interactively using ProteinPaint³⁴ and GenomePaint³⁵ on St. Jude Cloud at https://viz.stjude.cloud/mullighan-lab/collection/the-genomic-basis-of-childhood-t-lineage-acute-lymphoblastic-leukemia~29.

Code Availability

This study did not involve the development of software. Code to reproduce key parts of the analysis is available on GitHub: https://github.com/ppolonen/genomic_basis_TALL.

Acknowledgements

We would like to thank the Gabriella Miller Kids First Pediatric Data Research Program and Data Resource Center, including Marcia Fournier, PhD, James Coulombe, PhD, Emily Boja, PhD, Jamie Guidry Auvil, PhD, Valerie Cotton, BSc, and David Higgen, PhD; the Biopathology Center at Nationwide Children’s Hospital, including Alexis Cameron, BS, Yvonne Moyer, MBA, and Tyler Jones, BS; Children’s Oncology Group (COG) operations including Sarah Vargas, PhD, Mary Beth Sullivan, MPH, Michael Thomas, BA, and Chelsee Sauni, MPH; COG leadership including Douglas Hawkins, MD, Lia Gore, MD, and Peter Adamson, MD; the Cancer Therapy Evaluation Program (CTEP) including Malcolm Smith, MD, PhD; Hudson Alpha Genomics Project Management, including Salina Kuhafa-Hall, MSc; and, the Flow Cytometry Core at the Children’s Hospital of Philadelphia, including Florin Tunic, MD, PhD, and Jennifer Murray, BS; Computational Biology Training in Hematology (CBTH) program mentors. This investigator initiated trial was supported by Novartis Inc.

Author Contributions

M.L., E.R., S.P.H., J.J.Y., M.D., H.I., S.W., K.D., W.L.C, N.C.R, K.B., C.D., L.U., D.F., S.M., R.R., A.L.,T.L.V., provided patient samples and collected data. P.P., Z.C., J.M, Y.H., Y.F., D.H., E.R, preprocessed data. P.P., T.C.C, G.W., H.N., R.S., analyzed genomic data. P.P., A.E.S., A.E, S.B.P. performed statistical analysis. P.P., J.X., C.C., E.L., J.S., A.L., K.T analyzed single cell RNA data. D.D.G., F.B., S.K., Y.C., L.E.M., I.I., conducted experiments. B.L.W analyzed flow cytometry data. P.P., C.G.M., D.T.T. designed the study and wrote the manuscript.

Competing interests statement

D.T.T. received research funding from BEAM Therapeutics and NeoImmune Tech, and serves on advisory boards for BEAM Therapeutics, Janssen, Servier, Sobi, and Jazz. D.T.T. has multiple patents pending on CAR-T. C.G.M. serves on the scientific advisory board of and has received honoraria from Illumina, and has received research funding from Pfizer, equity from Amgen and royalties from Cyrus.

Funding

Gabriella Miller Kids First X01HD100702 (D.T.T., C.G.M., P.P., M.L.L., S.P.H., S.W., E.A.R., B.L.W., M.D., S.P.B., K.P.D., J.J.Y.), R03CA256550 (D.T.T., C.G.M., P.P., M.L.L., S.P.H., S.W., E.A.R., B.L.W., M.D., S.P.B., K.P.D., J.J.Y.), Alex’s Lemonade Stand Foundation (D.T.T., K.T., S.P.H.), the Leukemia and Lymphoma Society (D.T.T.), Hyundai Hope On Wheels (D.T.T., K.T., R.S.), R01CA193776 (D.T.T., B.W., K.T., C.G.M., S.P.H., J.J.Y., R.S., M.D.), U10CA180886 (D.T.T., M.L.L.), R01CA264837 (D.T.T., J.J.Y., C.G.M., K.T., B.W., R.S.), U10CA18099 (M.D.), U24CA114766 (D.T.T., M.L.L.), U24CA196173 (D.T.T.), R01GM115634 (R.W.K.), 1U54CA243124-01 (R.W.K.). American Lebanese and Syrian Associated Charities of St. Jude Children’s Research Hospital, The St Jude Chromatin Collaborative, P30CA021765 (C.G.M.), R35CA197695 (C.G.M.), U54 CA243124, T32CA236748 (C.G.M), St Jude Children’s Hospital Hematological Malignancies Program Garwood Fellowship (S.K.), 5F32CA254140 (L.E.M), P30CA021765 (G.W.). This research was supported in part by National Cancer Institute grants. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Additional Information

Supplementary Information is available for this paper.

Reprints and permissions information is available at www.nature.com/reprints.

Summers, R. J. & Teachey, D. T. SOHO State of the Art Updates and Next Questions | Novel Approaches to Pediatric T-cell ALL and T-Lymphoblastic Lymphoma. Clinical lymphoma, myeloma & leukemia 22, 718-725 (2022). https://doi.org/10.1016/j.clml.2022.07.010
Roberts, K. G. et al. Targetable kinase-activating lesions in Ph-like acute lymphoblastic leukemia. N Engl J Med 371, 1005-1015 (2014). https://doi.org/10.1056/NEJMoa1403088
Brady, S. W. et al. The genomic landscape of pediatric acute lymphoblastic leukemia. Nat Genet 54, 1376-1389 (2022). https://doi.org/10.1038/s41588-022-01159-z
Holmfeldt, L. et al. The genomic landscape of hypodiploid acute lymphoblastic leukemia. Nat Genet 45, 242-252 (2013). https://doi.org/10.1038/ng.2532
Mansour, M. R. et al. Oncogene regulation. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373-1377 (2014). https://doi.org/10.1126/science.1259037
Liu, Y. et al. The genomic landscape of pediatric and young adult T-lineage acute lymphoblastic leukemia. Nat Genet 49, 1211-1218 (2017). https://doi.org/10.1038/ng.3909
Coustan-Smith, E. et al. Early T-cell precursor leukaemia: a subtype of very high-risk acute lymphoblastic leukaemia. Lancet Oncol 10, 147-156 (2009). https://doi.org/10.1016/s1470-2045(08)70314-0
Zhang, J. et al. The genetic basis of early T-cell precursor acute lymphoblastic leukaemia. Nature 481, 157-163 (2012). https://doi.org/10.1038/nature10725
Wood, B. et al. Prognostic Significance of ETP Phenotype and Minimal Residual Disease in T-ALL: A Children's Oncology Group Study. Blood (2023). https://doi.org/10.1182/blood.2023020678
Dunsmore, K. P. et al. Children's Oncology Group AALL0434: A Phase III Randomized Clinical Trial Testing Nelarabine in Newly Diagnosed T-Cell Acute Lymphoblastic Leukemia. J Clin Oncol 38, 3282-3293 (2020). https://doi.org/10.1200/JCO.20.00256
Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37, 38-44 (2019). https://doi.org/10.1038/nbt.4314
Traag, V. A., Waltman, L. & Van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports 9 (2019). https://doi.org/10.1038/s41598-019-41695-z
Isoda, T. et al. Non-coding Transcription Instructs Chromatin Folding and Compartmentalization to Dictate Enhancer-Promoter Communication and T Cell Fate. Cell 171, 103-119.e118 (2017). https://doi.org/10.1016/j.cell.2017.09.001
Herranz, D. et al. A NOTCH1-driven MYC enhancer promotes T cell development, transformation and acute lymphoblastic leukemia. Nat Med 20, 1130-1137 (2014). https://doi.org/10.1038/nm.3665
Yashiro-Ohtani, Y. et al. Long-range enhancer activity determines Myc sensitivity to Notch inhibitors in T cell leukemia. Proceedings of the National Academy of Sciences 111, E4946-E4953 (2014). https://doi.org/10.1073/pnas.1407079111
Nagel, S., Kaufmann, M., Drexler, H. G. & MacLeod, R. A. The cardiac homeobox gene NKX2-5 is deregulated by juxtaposition with BCL11B in pediatric T-ALL cell lines via a novel t(5;14)(q35.1;q32.2). Cancer Res 63, 5329-5334 (2003).
Soulier, J. et al. HOXA genes are included in genetic and biologic networks defining human acute T-cell leukemia (T-ALL). Blood 106, 274-286 (2005). https://doi.org/10.1182/blood-2004-10-3900
Seki, M. et al. Recurrent SPI1 (PU.1) fusions in high-risk pediatric T cell acute lymphoblastic leukemia. Nat Genet 49, 1274-1281 (2017). https://doi.org/10.1038/ng.3900
Di Giacomo, D. et al. 14q32 rearrangements deregulating BCL11B mark a distinct subgroup of T-lymphoid and myeloid immature acute leukemia. Blood 138, 773-784 (2021). https://doi.org/10.1182/blood.2020010510
Montefiori, L. E. et al. Enhancer Hijacking Drives Oncogenic BCL11B Expression in Lineage-Ambiguous Stem Cell Leukemia. Cancer discovery 11, 2846-2867 (2021). https://doi.org/10.1158/2159-8290.CD-21-0145
Chen, S. et al. Novel non-TCR chromosome translocations t(3;11)(q25;p13) and t(X;11)(q25;p13) activating LMO2 by juxtaposition with MBNL1 and STAG2. Leukemia 25, 1632-1635 (2011). https://doi.org/10.1038/leu.2011.119
Martincorena, I. et al. Universal Patterns of Selection in Cancer and Somatic Tissues. Cell 171, 1029-1041.e1021 (2017). https://doi.org/10.1016/j.cell.2017.09.042
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 12, R41 (2011). https://doi.org/10.1186/gb-2011-12-4-r41
Cao, X., Elsayed, A. H. & Pounds, S. B. Statistical Methods Inspired by Challenges in Pediatric Cancer Multi-omics. Methods Mol Biol 2629, 349-373 (2023). https://doi.org/10.1007/978-1-0716-2986-4_16
O'Connor, D. et al. The Clinicogenomic Landscape of Induction Failure in Childhood and Young Adult T-Cell Acute Lymphoblastic Leukemia. J Clin Oncol 41, 3545-3556 (2023). https://doi.org/10.1200/jco.22.02734
Rahman, S. et al. Activation of the LMO2 oncogene through a somatically acquired neomorphic promoter in T-cell acute lymphoblastic leukemia. Blood 129, 3221-3226 (2017). https://doi.org/10.1182/blood-2016-09-742148
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589 (2021). https://doi.org/10.1038/s41586-021-03819-2
Sulis, M. L. et al. NOTCH1 extracellular juxtamembrane expansion mutations in T-ALL. Blood 112, 733-740 (2008). https://doi.org/10.1182/blood-2007-12-130096
Liu, Y. et al. Discovery of regulatory noncoding variants in individual cancer genomes by using cis-X. Nat Genet 52, 811-818 (2020). https://doi.org/10.1038/s41588-020-0659-5
Yada, M. et al. Phosphorylation-dependent degradation of c-Myc is mediated by the F-box protein Fbw7. EMBO J 23, 2116-2125 (2004). https://doi.org/10.1038/sj.emboj.7600217
Dunsmore, K. P. et al. Children’s Oncology Group AALL0434: A Phase III Randomized Clinical Trial Testing Nelarabine in Newly Diagnosed T-Cell Acute Lymphoblastic Leukemia. Journal of Clinical Oncology 38, 3282-3293 (2020). https://doi.org/10.1200/jco.20.00256
Winter, S. S. et al. Improved Survival for Children and Young Adults With T-Lineage Acute Lymphoblastic Leukemia: Results From the Children’s Oncology Group AALL0434 Methotrexate Randomization. Journal of Clinical Oncology 36, 2926-2934 (2018). https://doi.org:10.1200/jco.2018.77.7250
Harrell, F. E., Jr., Lee, K. L. & Mark, D. B. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15, 361-387 (1996). https://doi.org:10.1002/(sici)1097-0258(19960229)15:4<361::Aid-sim168>3.0.Co;2-4
Zhou, X. et al. Exploring genomic alteration in pediatric cancer using ProteinPaint. Nat Genet 48, 4-6 (2016). https://doi.org:10.1038/ng.3466
Zhou, X. et al. Exploration of Coding and Non-coding Variants in Cancer Using GenomePaint. Cancer Cell 39, 83-95.e84 (2021). https://doi.org:https://doi.org/10.1016/j.ccell.2020.12.011

There are potential competing interests. D.T.T. has received research funding from BEAM Therapeutics and NeoImmune Tech, and serves on advisory boards for BEAM Therapeutics, Janssen, Servier, Sobi, and Jazz. D.T.T. has multiple patents pending on CAR-T. C.G.M. serves on a scientific advisory board for, and has received honoraria from, Illumina, and has received research funding from Pfizer, equity from Amgen and royalties from Cyrus.

TALLX01SupplementaryInformation.docx
Supplementary Information
ExtendedDataFigures.docx

