Genome-wide mapping of G4s in Mtb: from ChIP-seq to CUT&Tag
To map the location of G4s in Mtb genome-wide, we initially applied a chromatin immunoprecipitation protocol (ChIP) with BG4, the most used anti-G4 antibody17, followed by either qPCR or sequencing. Experiments were conducted under standard laboratory growth conditions and under oxidative stress conditions, to compare the genomic distribution of G4s in Mtb cultures in the environments to which the bacterium is exposed during infection of the host.
ChIP is a multi-step antibody-based technique which allows the identification of specific regions within chromatin18. Initially, bacterial cells are harvested, cross-linked with formaldehyde to preserve biological interactions, and lysed before chromatin shearing. This latter step is critical for the good outcome of the protocol and requires fragments’ distribution to be comprised between 100–500 bp19. Subsequently, immunoprecipitation with the antibody of interest is performed and the enrichment of the genomic sequences is assessed through qPCR or next generation sequencing. In our set-up, each ChIP experiment consisted of three samples: IP (immunoprecipitated DNA), MOCK and input. The IP was treated with the antibody of interest, whereas the MOCK sample underwent immunoprecipitation in the absence of the antibody, thus serving as control for non-specific background noise. The input sample, consisting of unprocessed sheared chromatin, allows qPCR data normalization and target enrichment evaluation18. qPCR was thus performed to test the effectiveness of the applied protocol. To this end, three genomic regions were selected as positive controls: two of them included G4 sequences previously shown to fold in vitro, namely zwf1 and mosR12, and a third one, lpqS, as a further G4-forming region. We performed CD to confirm the G4 folding of the selected positive regions (Fig. S1). As negative control, we identified a genomic segment in the Rv1507A gene that, based on the sequence, could not fold into G4 in vitro. ChIP-qPCR analysis showed remarkable enrichment of IP positive samples with respect to the MOCKs, both in standard growth conditions and upon treatment with H2O2, indicating high specificity of BG4. Replicates were highly consistent and positive regions were at least five-fold enriched over the negative control, a value in line with successful ChIP-qPCR data reported in the literature (Fig. 1A-B)20. Encouraged by these results, we proceeded with ChIP-seq analysis. Here, however, the identification of a clear signal above the background proved to be challenging. Despite the presence of a general enrichment in putative G4 positive regions (i.e. zwf1) and no enrichment in the negative region Rv1507A with respect to the input sample (Fig. 1C, S2), broad enriched domains were observed and commonly used peak calling tools struggled to identify true enriched regions above the background. To overcome this issue, the bioinformatic tool EPIC221, specifically designed to identify broad domains, was employed but the identification of few peaks combined with the strong similarity between IP and input samples, made these data unreliable.
RNA Polymerase antibody profiling validate the CUT&Tag protocol in the Mtb genome
To proceed with our goal of identifying the G4 landscape in Mtb, we explored the possibility of applying the CUT&Tag technique, which had never been applied to a bacterial genome before.
In the CUT&Tag protocol cells are first bound to magnetic beads and then mildly permeabilized to allow the antibody of interest and the Tn5 enzyme to reach the cell genome22. We initially optimized the binding of Mtb to concanavalin A (ConA)-coated beads by measuring the optical density upon different incubation times (10 and 20 minutes). Good bead saturation was achieved at 20 minutes of incubation (Fig. S3A). To assess the optimal number of bacteria to be processed, we performed CUT&Tag on three different sample sizes, i.e. 1, 10 and 20 million cells, quantified the bacterial DNA before and after library preparation (Fig. S3B-C) and found that the highest amount of DNA was collected using 10 million cells.
To validate the novel CUT&Tag protocol in Mtb, we initially used the antibody targeting the RNA polymerase subunit β, whose genome-wide data obtained by ChIP-seq were previously reported23. Experiments were conducted under both standard growth and oxidative stress conditions. Briefly, bacterial cultures were mildly fixed with formaldehyde and the permeabilized cells were incubated with the antibody, tagmented with Tn5 transposase and PCR-amplified fragments were subjected to next-generation sequencing. Each CUT&Tag experiment consisted of three biological replicates that presented high Pearson’s correlation coefficients, indicating robust reproducibility (Fig. 2A), and were clustered according to the growth conditions (standard or oxidative stress growth), as shown in the PCA analysis plot (Fig. 2B). Proper sequencing depth was reached in all samples from both conditions : FRiP values largely exceeded 0.01, a threshold commonly associated with successful experiments (Fig. 2C)24. Peak size distribution showed that the median peak size across all samples was ∼200 bp, as expected20 (Fig. 2D). The sequencing profile of the RNA polymerase subunit β showed peaks across the entire Mtb genome, identified through the MACS2 tool with IgG subtraction. High-confidence peaks, i.e. peaks present in at least two out of three biological replicates, were selected for downstream analysis. Under standard growth conditions, an average of 1101 peaks (replicate 1 = 1059, replicate 2 = 997, replicate 3 = 1373) were identified, while under oxidative stress, an average of 986 peaks (replicate 1 = 1023, replicate 2 = 992, replicate 3 = 1029) were observed. The RNA polymerase peaks predominantly localized around gene start sites (36.7% and 36.1% in standard and oxidative stress condition, respectively), within genes (23.9% and 23.7%, respectively), and upstream of genes (21.8% and 21.4%, respectively) (Fig. 2E-F). As the β subunit constitutes an essential component of the holoenzymatic complex, the predominant localization of peaks near gene start sites indicates that the enzyme is preparing to initiate transcription. The significant abundance of peaks residing within gene bodies or overlapping gene ends indicates that the RNA polymerase complex is either initiating or finishing the transcription process. This observation aligns with the pivotal role of the β subunit, which serves as a fundamental component of the enzyme involved in the entire transcription process, from initiation to termination25.
To further validate our CUT&Tag results, we compared them with the previously published ChIP-seq data for RNA polymerase subunit β from Uplekar et al.23: 54% of the peaks (595/1101) were obtained with both experimental approaches (Fig. S4). All these data prove the consistency and reliability of our results and support the application of the CUT&Tag protocol in Mtb.
RNA Polymerase peaks are associated with highly expressed genes
Integration of the CUT&Tag datasets with RNA-seq was performed to strengthen the validity of the CUT&Tag analysis within the Mtb genome. The raw read counts were converted into transcripts per million (TPM) for individual genes and associated to their corresponding genetic locus. Gene expression levels were next correlated with the presence or absence of CUT&Tag RNA polymerase peaks. A statistically significant increase in gene expression was observed in regions exhibiting RNA polymerase peaks versus those lacking them, both in standard and oxidative stress conditions (Fig. 3A-B). By dividing genes into three categories based on gene expression interquartile (IQR) values (low, medium, and high), we observed a substantial increase in the expression levels of all RNA polymerase peaks in all categories (Fig. 3C-E and S5). Hence, the presence of RNA polymerase peaks was consistently associated with genes exhibiting increased transcriptional activity.
We next performed differential gene expression analysis (DEG) between standard and oxidative stress conditions. It is known that Mtb treatment with 5 mM H2O2 causes a substantial alteration of the bacterial expression profile26. Genes were considered as differentially expressed when meeting criteria of log2-fold change exceeding 1 or falling below − 1 and with adjusted p-value below 0.05. Under these criteria, 732 and 650 genes were found to be upregulated and downregulated, respectively. Within these genes, we identified 211 upregulated and 82 downregulated genes associated with an RNA polymerase peak, constituting 28.8% and 12.6%, respectively, of the differently expressed genes in response to oxidative stress (Fig. 3F). In response to oxidation, the induction of scavenging enzymes becomes crucial for Mtb survival27,28. For instance, katG, which efficiently reduces reactive oxygen species (ROS) concentrations, was strongly upregulated, along with cysN, responsible for sulfate activation in cysteine biosynthesis, and with enzymes involved in the DNA damage response pathway, all confirming the status of oxidative stress within the bacteria. Other genes that were upregulated upon oxidative stress conditions, included iron-related genes such as the Mtb gene cluster encoding mycobactin, an iron-chelating siderophore26, irtA and ideR involved in iron import and regulation, respectively, DNA repair enzymes, such as recA, radA, and dnaE226,27, and protein repair or degradation systems, such as clpC1 and clpP2.
All these results corroborate the efficiency of the CUT&Tag protocol in the Mtb genome, both in exponential growth and oxidative stress conditions.
G4 CUT&Tag reveals a stress-responsive G4 landscape characterized by two guanine tract motifs
We next applied the CUT&Tag protocol to define the G4 landscape in Mtb, using the BG4 antibody. We performed the analysis in standard and oxidative stress growth conditions to assess G4 distribution and possible role in both environments. Analogously to RNA-Pol-CUT&Tag, replicates obtained with BG4 were highly consistent, as shown by Pearson’s correlation (Fig. S6A), with the expected fragment distribution29 (Fig. S6B). Library complexity and sequencing depth exceeded the accepted threshold, confirming the efficiency of the G4-CUT&Tag (Fig. S6C). Visual inspection of G4-CUT&Tag sequencing profiles showed remarkable improvement in signal-to-noise ratio (Fig. S6D) with respect to G4-ChIP data (Fig. 1C), allowing the identification of sharp peaks distributed across the entire Mtb genome. Remarkably, the number of retrieved G4 peaks was higher in the oxidative stress condition than in standard growth (Fig. 4A). In the latter condition, an average of 745 high-confidence peaks (replicate 1 = 836, replicate 2 = 416, replicate 3 = 1580) were identified, while under oxidative stress, an average of 1080 high-confidence peaks (replicate 1 = 1138, replicate 2 = 1061, replicate 3 = 1335) was observed. Annotation of G4 peaks showed an unprecedented distribution of G4s, with almost 60% of peaks in both conditions located in the gene body. The remaining peaks mostly overlapped with gene start and end sites (Fig. 4B). These results strongly differ from the typical eukaryotic distribution, where G4s accumulate at promoter regions30,31, thus suggesting new regulatory mechanisms. We next performed G4 motif prediction on the sequences included in the high confidence peaks. We screened for motif length up to 30 bp with the following characteristics: i) G-tracts of at least 2 guanines each; ii) loop length up to 7 (short) or 12 (long) nucleotides. Surprisingly, considering guanine abundance in the Mtb genome, ~ 99% of putative G4-forming sequences were characterized by two-guanines tracts and short loops (Fig. 4C), which characteristics are generally associated to weaker G4s.
We selected the twenty top high-confidence peaks and searched for a consensus motif by STREME analysis. Guanine-rich patterns were retrieved in both growth conditions, thus confirming the robustness of our results (Fig. 4D). We used CD spectroscopy to confirm G4 folding of six selected retrieved sequences per condition (Fig. 4E-F). Most sequences showed the typical CD signature of a G4 with antiparallel topology, characterized by two positive peaks at λ ~ 240 and 290 nm and a negative one at λ ~ 260 (STD2, STD3, STD4, STD6, OX1, OX6). The remaining sequences showed a mixed G4 topology, with positive peaks at λ ~ 260 and 290 nm, with different intensities. Overall, all tested sequences were confirmed to fold into G4s.
G4-bearing genes under oxidative stress are associated with lower transcript levels than in standard growth conditions
To elucidate the physiological role played by G4s within the Mtb genome, we performed RNA-seq analysis in both growth conditions and integrated transcripts values with G4-CUT&Tag results. Upon standard growth condition, no statistical difference was observed in terms of transcripts levels between genes bearing or not a G4 peak (Fig. 5A). In the oxidative stress conditions, we observed an overall reduction of gene expression likely induced by the treatment26,27 and G4-bearing genes showed a statistically significant reduction of transcripts when compared to the non-G4 genes (Fig. 5B). To deepen this aspect, we further subdivided transcript values into high, medium and low categories according to the interquartile range method. Genes harboring a G4 motif and belonging to the high and medium category exhibited a significant reduction in gene expression levels compared to their non-G4 counterparts, while the low category did not show significant differences (Fig. 5C-E). These data suggest a direct correlation between G4 folding and transcription downregulation in oxidative stress conditions. To further corroborate this scenario, we identified the G4 peaks unique for the oxidative condition, i.e. those peaks in which the G4 is not present in standard condition whereas the folding is induced upon H2O2 treatment. We associated each G4-bearing gene with the relative transcript level in standard and oxidative growth conditions, and normalized the data on the sigA housekeeping gene, the expression of which is not affected by the oxidative treatment32. By comparing the transcript levels of these genes in both conditions, we observed a significant downregulation upon G4 folding (Fig. 5F), therefore confirming that in the Mtb genome G4 formation in oxidative stress is strictly related to downregulation of gene expression.
We thus performed GO enrichment analysis on the oxidative stress-unique G4 peaks to identify the pathways in which such genes may be involved. G4-bearing genes induced upon oxidative treatment were mostly associated with metabolic and biosynthetic processes and cell wall biogenesis (Fig. 5G). We applied the same enrichment analysis to the unique peaks regrouped based on their expression levels: we divided the dataset into upregulated, downregulated and non-differentially expressed G4-bearing genes, according to the RNA-seq values. We observed that the downregulated group mainly included genes involved in cell wall organization and biogenesis, while the non-differentially expressed genes were mainly associated with biosynthetic pathways (Fig. S7). The up-regulated group included a number of genes that were too small to perform the analysis. These data suggest that Mtb G4s may act as defensive agents that protect the bacterium from the ROS damage, by slowing down the biogenesis of new wall components, hence slowing bacterial replication.
Overall, our findings indicate that the G4-bearing genes recovered during Mtb stress response predominantly engage in various metabolic processes, as well as catalytic and binding functions. Interestingly, the identified enriched pathways are typical of Mtb stress-induced response, both at transcriptomic and proteomic levels33, suggesting that G4s might be constitutive elements for Mtb response to stress.