GENERATION OF AN ISOGENIC DSTAG2 CELL MODEL
Given that the most common mutations within STAG2 result in the introduction of a premature stop codon early in the STAG2 coding region (including a mixture of frame shifting indels and stop gains)(3), we utilized a CRISPR/Cas9 system to target the STAG2 coding region within the OCI-AML3 cell line. We selected the OCI-AML3 line, as it was derived from a normal karyotype (NK) patient with NPM1c+ (Type A) and DNTM3A mutations, which have been shown to co-occur with STAG2 mutations in patient samples (13, 14). We targeted the commonly mutated exon 20 of STAG2 (ensuring all possible isoforms were affected), resulting in a hemizygous (STAG2 is encoded within the X-chromosome and is thus hemizygous in this male cell line) 11 bp deletion (c.1838_1848del) and a single base insertion at the same site (c.1838-1848insG) resulting in a frameshift mutation and the introduction of a premature stop codon (p. L613CfsX5) 11 bp downstream of the mutation site (Figure 1A).
Analysis of STAG2 mRNA levels in the edited cell line (DSTAG2), highlighted a ~7 fold reduction in expression compared to the STAG2-WT counterpart, indicating that the mutant mRNA is degraded through non-sense mediated decay (Figure 1B). When examining the mRNA expression of STAG1 it was interesting to observe an increase of ~1.5 fold relative to the STAG2-WT counterpart (Figure 1C). The other members of the cohesin complex (RAD21, SMC1A & SMC3) exhibited very similar patterns of expression at the mRNA level in both the STAG2-WT and DSTAG2 cell lines (Figure 1C). In keeping with this, STAG2 protein expression was completely abolished in this model (Figure 1D). Additionally, when examining the expression of other members of the cohesin complex, including STAG1, similar protein levels were observed in both the DSTAG2 and STAG2-WT cells (Figure 1D).
STAG1 COMPENSATES FOR LOSS OF STAG2 CHROMATIN BINDING
Chromatin immunoprecipitation for STAG1, STAG2 and CTCF followed by deep sequencing (ChIP-Seq) was carried out to confirm the loss of STAG2 chromatin binding and to assess the chromatin binding of other core members of the cohesin complex in the absence of STAG2 (Figure 2A). Binding peaks were called using MACS2 and Deeptools used to generate summary binding density plots to visualise binding across the genome (Figure 2B). The genomic distribution of STAG1 and STAG2 in STAG2-WT cells was similar to that previously described in other cell lines (15) (Figure S1 and Table S1).
As expected, STAG2 binding was completely abrogated in the DSTAG2 cells, whilst STAG1 appeared to have a slightly stronger binding profile in the DSTAG2 cells compared to STAG2-WT cells (Figure 2B). Conversely, the CTCF binding profile was slightly weaker in the DSTAG2 cells (Figure 2B).
To assess this at an individual binding region level, genome-wide signal density plots for STAG2, STAG1 and CTCF were generated using peak summit locations called in the STAG2-WT cells for each protein (Figure 2C). A total of 34,670 STAG2 binding peaks were identified in STAG2-WT cells, with none identified in the DSTAG2 cells, again confirming the complete loss of functional STAG2 in these cells. Analysis of STAG1 binding revealed a ~35% increase in the number of STAG1 binding peaks (from 15,164 to 22,955) in the STAG2-WT and DSTAG2 cells respectively. Many of these additional STAG1 binding sites corresponded to canonical STAG2 binding sites suggesting a degree of compensation by STAG1 in the absence of STAG2. To confirm this, STAG1 peak calls from the STAG2-WT cells were intersected with STAG1 binding sites from the DSTAG2 cells. This showed that ~95% (14,375/15,164) of the canonical STAG1 binding peaks were maintained in the DSTAG2 cells but also identified an additional 8,580 new STAG1 binding sites, only present in the DSTAG2 cells (Figure 2D). To test if these new STAG1 binding peaks represented regions previously occupied by STAG2, we intersected the locations of these binding peaks with the STAG2 binding peaks called in STAG2-WT cells. Of the 8,580 new STAG1 sites, ~86% (7,392) overlapped with identified sites in STAG2-WT cells; these also coincided with 8046 (94%) and 7947 (92%) CTCF sites in the STAG2-WT and ∆STAG2 cells respectively.
This suggests that the increased STAG1 binding observed in the DSTAG2 cells has partially, but not completely, compensated for loss of STAG2 and was predominantly associated with CTCF binding sites.
CHRONIC LOSS OF STAG2 LEADS TO ALTERED 3D GENOME ARCHITECTURE
3D chromatin structure is organised into loops and domains, which are anchored at convergent CTCF binding sites brought together through loop extrusion facilitated by cohesion (16). Therefore, to examine 3D architecture, HiChIP analysis (11) targeting CTCF to identify the CTCF/cohesin bound chromatin interaction sites present in both cell populations was undertaken.
Topologically associated domains (TAD’s) were identified using the Juicer Tools Arrowhead algorithm at both 10 kb and 25 kb resolutions, retaining unique domain calls from both resolutions to generate a consensus domain list (12). This identified 5,014 and 622 domains in STAG2-WT and DSTAG2 cells respectively. However, despite the reduction in domains called in the DSTAG2 cells, the characteristic square appearance of a significant number of TAD’s was apparent when visually assessing contact maps generated from the DSTAG2 cells, but with significantly reduced signal (Figure 3A). This indicates that some signal was present and visually observable, but was below the call detection threshold of the Juicer Arrowhead detection software. Nonetheless, far fewer TADs were detected in the DSTAG2 cells, suggesting that this may occur due to fewer contacts between CTCF anchor points.
The Juicer Arrowhead software (12) generates scores that serve as a measure of likelihood that the boundaries of the domains called are actual domain corners, with interactions likely to occur within the domain. We found that domains called in the STAG2-WT cells scored on average 1.14 (range 0.11 – 1.88) whilst in the DSTAG2 cells the score was reduced to an average of 0.96 (range 0.19 – 1.69) (Figure 3B). The decrease in average score reflected the decreased frequency of domain forming interactions observed in the DSTAG2 cells.
Additionally, TADs in the STAG2-WT cells, ranged in size from 120 kb to 5.8 Mb (median 425 kb) with 89% of interactions less than 1 Mb, whilst in DSTAG2 cells, the domains ranged from 160 kb to 6.4 Mb (median 575 kb), with 60% of them less than 1 Mb in the DSTAG2 cells (Figure 3C). This represents a shift in the domain size, with decreased numbers of smaller domains (<250 kb), maintenance of intermediate sized domains (250-800 kb) and formation of larger domains (>800 kb) associated with STAG2 loss, which is also visible when comparing the contact maps for STAG2-WT versus DSTAG2 HiChIP data (Figure 3A & S2A&C).
Using virtual 4C (v4C)(17) plotting enabled changes in chromatin interactions with defined anchor regions between the STAG2-WT and DSTAG2 cells to be visualised. We plotted regions where we visually observed altered domain structure between the STAG2-WT and DSTAG2 cells in interaction maps and observed large differences in interaction frequency, with the DSTAG2 cells displaying greatly reduced interaction frequencies on normalised plots (Figure 3D and S2B&D). The loss of STAG2 not only results in a reduced number of domains present, but the domains that are able to form with only STAG1 present predominantly result in the loss of smaller domains and larger domain formation.
DNA LOOP SIGNAL INTENSITY IS REDUCED WITH CHRONIC STAG2 LOSS
The altered TAD formation observed in DSTAG2 cells suggested that loop formation might be affected in these cells. Therefore, the presence/formation of loops was determined using the Hi-C Computational Unbiased Peak Search (HiCCUPS) tool (18). The appearance of focal peaks in proximity ligation data is indicative of a “loop” between two DNA locations. The analysis was run at 5 kb, 10 kb and 25 kb resolutions, with the final output being a merged consensus loop list from all 3 resolutions for each replicate library. The final loop call list contained 2,859 loops in the STAG2-WT cells compared to 994 in the DSTAG2 cells. Surprisingly, the vast majority of the loops were of similar size range (Figure 3E), albeit with some loss of smaller loops (<100 kb) in the DSTAG2 cells. Although significantly fewer loops were called in the DSTAG2 cells, this is not caused by the significant loss of loops of any specific size. Importantly, although much weaker loop peak signals were present in the DSTAG2 cells, the formation of new loops was not seen. Aggregate Peak Analysis (APA) of the consensus loop lists showed that despite the low loop peak signal within the DSTAG2 cells, clear peak foci were visible, albeit with a reduced pixel signal intensity, when compared to STAG2-WT cells (Figure 3F).
GENOME COMPARTMENTALISATION ANALYSIS HIGHLIGHTS COMPARTMENT SWITCHING ASSOCIATED WITH CHRONIC LOSS OF STAG2
Transcriptionally active and repressed compartments within the genome are observable within the HiChIP data using Pearson’s correlation matrices of observed over expected read densities creating a plaid pattern at a resolution of 50 kb (19). These patterns can be loosely interpreted as compartments defined as A or B, which correlate with euchromatin and heterochromatin respectively. Eigenvector calculation at the same resolution was used to determine if any differences in compartmental shape and edges occur in the DSTAG2 cells compared to their WT counterparts (Figure 4A). Positive eigenvector values were assigned by performing H3K27ac and H3K27me3 ChIP-Seq in both the WT and DSTAG2 cells and aligning with the eigenvector output.
Eigenvector scoring of the DSTAG2 HiChIP data showed that ~10% of compartments scored as type B in the WT cells had switched to type A compartments, whilst ~8% of compartments scored as A were type B compartments in the DSTAG2. This “switch” in compartment status is greater than would have been expected by chance (~5%) (Figure 4B).
Furthermore, when comparing the compartmental plaid patterns, DSTAG2 compartments appeared to “blend” at edges by switching from A-B or B-A at earlier or later points than seen in the WT counterparts, suggesting a degree of “slippage” at compartment edges (Figure 4C).
CHRONIC LOSS OF STAG2 ALTERS TRANSCRIPTIONAL PROGRAMMES
Previous studies assessing the impact of acute or short-term depletion of members of the cohesin complex showed a modest impact on transcription (20). To examine the impact of chronic loss of STAG2, a clinically relevant scenario, in a relevant genetic background, RNA-seq was performed on the matched isogenic cell pair and the resultant gene expression profiles compared (Figure 5A). The chronic loss of STAG2 had a relatively large effect on overall gene expression with 24.5% of the active transcriptome (3149 genes) exhibiting some degree of significant change in expression; 1667 (53%) genes decreased and 1482 (47%) genes increased in expression (Figure 5B & Table S2). To identify genes whose expression was more definitively deregulated, a >2 fold-change cut-off identified 225 (32.6%) upregulated and 466 (67.4%) genes downregulated under chronic loss of STAG2 (Table S2). Increasing the cut-off level to >5 fold resulted in the identification of 13 (12.9%) up-regulated and 88 (87.1%) down-regulated genes (Table S1).
To identify specific biological processes affected by chronic STAG2 loss, Ingenuity Pathway Analysis (IPA) of the 691 differentially expressed genes (>2 fold-change) identified key networks that were relevant to hematopoietic development and function “cell death” and “cell morphology” (Figure S4). This suggests that whilst the majority of genes and pathways have not become highly deregulated, the chronic loss of STAG2 function has resulted in the transcriptional deregulation of key leukaemogenic pathways, which may contribute to disease development/phenotype.
The disruption in transcription in DSTAG2 was unexpected, as previously published data had suggested that STAG1 was predominantly responsible for cohesin binding near gene promoter regions and thereby regulating transcription, whilst STAG2 may be more involved in the maintenance of global 3D chromatin structure (21). To assess this, the ChIP-Seq data was used to re-analyse STAG1 and STAG2 binding surrounding transcription start sites (TSSs). Average signal density plots confirmed that, on a genome wide level in the STAG2-WT cells, STAG1 was predominantly bound at TSSs in comparison to STAG2 (Figure 4C). Intriguingly, STAG1 binding at TSSs was nearly identical in the DSTAG2 cells when compared to the STAG2-WT, suggesting that STAG1 based cohesin complexes contribute to the vast majority of transcription-associated interactions in the genome.
Intriguingly, in wild type cells, STAG2 was predominantly bound near the TSSs or promoter regions of the 691 differentially expressed genes than STAG1 (Figure 4C). Moreover, STAG1 binding at the TSS did not change in the setting of chronic STAG2 loss, indicating that at these sites, STAG1 was unable to compensate for STAG2. The presence of STAG2, and the lack of STAG1 compensation, at/near the TSSs of these genes further suggest that STAG2 may play a role in linking transcription regulation regions with promoters and enhancers and/or protecting certain promoters from interactions with enhancers in 3-dimensional space.
GENE EXPRESSION CHANGES MEDIATED BY LOSS OF STAG2 ARE LINKED WITH ALTERED DOMAIN STRUCTURE
To examine the relationship between STAG2 mediated changes in genome structure and altered transcription, we specifically focused on the HOXA cluster, as multiple HOXA genes were de-regulated upon chronic STAG2 loss, and deregulation of these genes is highly associated with leukaemogenesis (22). Interestingly, the region containing the HOXA cluster features extensive structural control, with many CTCF/cohesin sites positioned throughout the locus.
From the ChIP-Seq data, seven strong CTCF sites were identified that were associated with the HOXA locus (Figure 5A). These sites linked the 3’ region of a TAD (300 kb), to an upstream anchor within the SKAP2 gene. The major boundary edge of this TAD (termed C7/9 boundary (C=CTCF)) was located between the HOXA7 and HOXA9 sites with HOXA9, HOXA10, HOXA11, HOXA13 and HOTTIP lying downstream in a region of high H3K27ac (Figure 5A). The expression of these genes remained relatively unchanged in the DSTAG2 cells compared to WT (Figure 5B). In contrast, the genes lying upstream of the C7/9 boundary (HOXA2, HOXA3, HOXA4, HOXA5, HOXA6 and HOXA7) were all significantly up-regulated in the DSTAG2 cells (2.2 to 3.41-fold higher than the STAG2-WT cells) (Figure 5B). The upregulation of the genes upstream of the C7/9 boundary was also associated with increased H3K27ac and decreased H3K27me3, suggesting an association with altered compartmentalisation. However, the resolution of the Pearson’s correlation and eigenvector calculations used to visualise genomic compartmentalisation is not sufficient to allow us to examine this finite region of the genome. Nonetheless, changes in compartmentalisation are generally driven by altered local chromatin structure, therefore v4C was used to visualise the domain region spanning the 300 kb TAD region using the major CTCF boundary site within SKAP2 as the anchor point (Figure 5C).
Significantly reduced interaction frequency and novel points of interaction between the SKAP2 anchor point and with different CTCF sites within the HOXA cluster were identified. The major interaction in STAG2-WT cells occurred between the SKAP2 anchor point and the C7/9 boundary; the frequency of this interaction was significantly reduced in the DSTAG2 cells, with a high-frequency interaction now occurring downstream at the C10/11 boundary, resulting in a larger sub-TAD domain (Figure 5C).
The change in local structure within this TAD may contribute to altered compartmentalisation in this region, as indicated by the altered H3K27ac and H3K27me3 signals across the C7/9 boundary. This would lead to transcriptional activation of the genes within the TAD. Specifically the genes upstream of the C7/9 boundary, which would include the early HOX genes (HOXA2-A7), would now lie within the transcriptionally active TAD in which the more highly expressed late HOX genes (HOXA9-13) were normally located.
To assess whether the up-regulation of early HOX genes occurs in patients with STAG2 mutant disease, HOXA cluster gene expression was examined in two different public datasets held within the Gene Expression Omnibus (TGCA AML dataset GSE68833 and CD34+ cells from MDS patients, GSE58831) (Figure S6-S7) (23). Although STAG2 mutant disease is relatively rare in these datasets and the fraction of STAG2 mutant cells in either dataset is somewhat diluted due to the presence of non-leukemic lymphoid cells in each sample (the average blast cell count in AML samples is <62%, whilst the average pre-isolated blast cell count is <19%). However, both datasets showed significant up-regulation of many of the early HOXA genes; A5 and A7 in AML patients and A3, A4 and A7 in MDS patients.
The formation of a larger domain spanning the HOXA cluster led to upregulation of genes normally held within a repressive domain. However, the transcriptional consequences of the loss of smaller domain structures associated with chronic loss of STAG2 were not clear. Interestingly, when examining biologically important pathways from our IPA analysis, a large number of downregulated genes were associated with the MAPK signalling pathway (Figure 6A-B). These genes mapped to genomic locations that were associated with loss of small domain structures, which may function to link promoter and enhancer regions (Figure 6C-D & S5A-B). One gene of interest was DUSP4, which has been linked with sensitivity to MEK inhibition (24). Analysis of DUSP4 expression in the same publically available datasets described earlier showed that DUSP4 expression was lower, but not significantly, in STAG2 mutant samples compared to STAG2 wild-type samples (Figure S8). This was likely due to the limited number of STAG2 mutated samples, the reduced blast cell fraction in these samples and the more subtle changes in expression of these genes observed in our ∆STAG2 model.
To determine if the downregulation of MAPK signalling genes associated with the loss of smaller domain architecture in DSTAG2 cells, the sensitivity of the isogenic cell lines to MEK inhibition was assessed. The ∆STAG2 cells were significantly more sensitive to treatment with the dual MEK1/2 inhibitors Selumetanib and Trametinib, than the STAG2-WT cells (Figure 6E). Consistent with this, ∆STAG2 cells exhibited lower dose inhibition of MEK1/2 (assessed by ERK1/2 phosphorylation) and extremely low dose induction of apoptosis (assessed by PARP and Caspase 3 cleavage) in comparison to STAG2-WT cells (Figure 6F & S5C-E).