Cell culture and generation of AID cell lines
HCT116 cell lines were cultured in DMEM (Thermo Fisher Scientific) supplemented with 10% (v/v) FBS and 2 mM L-glutamine at 37°C. To generate HCT116-AID cell lines, HCT116-OsTIR1 cells in a 10 cm dish were transfected with 4.8 µg of guide RNA plasmid (based on pX330-U6-Chimeric_BB-CBh-hSpCas9 (Addgene #42230)) and 3.6 µg of donor plasmids (pMK289 (Addgene #72827) and pMK290 (Addgene #72828)) using the Calcium Phosphate Cell Transfection Kit (Beyotime, C0508). Eight hours after transfection, the medium was replaced with fresh medium and cells were selected with 500 µg/mL neomycin (Biofroxx, 1150GR005) and 100 µg/mL hygromycin (Sino Biological Inc, 50708-mccH). After 10–15 days of selection, individual clones were isolated and screened by genomic DNA PCR with the corresponding primer sets (details in Supplementary Table 1). Correct clones were further confirmed by western blot with the corresponding antibodies. Before the experiments, indole-3-acetic acid (IAA; Phytotech, I364) was dissolved in DMSO to prepare a 500 mM stock solution.
To build the inducible AID system, HEK293T cells were transfected with the pSW-2XFlag-TIR1 (F74A) BLA-TET-ON plasmid together with two helper plasmids (psPAX2 and pMD2.G) using the standard Lipofectamine 2000 transfection protocol (Invitrogen, 11668019). HCT116 cells were infected twice and selected with 6 µg/mL blasticidin for 14 days. The HCT116-OsTIR1(F74A) clones were confirmed by western blot against OsTIR1 (Anti-OsTIR1 pAb, MBL PD048). HCT116-AID2 cell lines were generated similarly, except that 10 µM 5-Ph-IAA (MCE, HY-134653) was applied after 24 hours of induction with 1 µg/mL doxycycline.
Chromatin RNA sequencing (ChrRNA-seq) and data processing
ChrRNA-seq was performed as previously described53. Briefly, 5–10 million cells were suspended in cold cytoplasmic lysis buffer (0.15% NP-40, 10 mM Tris pH 7.5, 150 mM NaCl) and incubated on ice for 10 min. The cell lysate was carefully layered onto a cold sucrose buffer (10 mM Tris pH 7.5, 150 mM NaCl, 24% sucrose w/v) and the cytoplasmic fraction (supernatant) was removed after centrifugation. The nuclear pellet was gently resuspended in cold glycerol buffer (20 mM Tris pH 7.9, 75 mM NaCl, 0.5 mM EDTA, 50% glycerol, 0.85 mM DTT) and lysed with nuclear lysis buffer (20 mM HEPES pH 7.6, 7.5 mM MgCl2, 0.2 mM EDTA, 0.3 M NaCl, 1 M urea, 1% NP-40, 1 mM DTT). Following a quick centrifugation, the chromatin was isolated from the pellet fraction.
The chromatin-associated RNA was isolated with the standard Trizol protocol (Invitrogen, cat. no. 15596018). Genomic DNA was removed following the DNase I treatment protocol (Thermo, EN0521). Chromatin RNA-seq (ribo-depleted) libraries were prepared with the VAHTS Universal V8 RNA-seq Library Prep kit for MGI (Vazyme, NRM605-02). ChrRNA-seq libraries were quantified with a Bio-Fragment Analyzer (Bioptic, c100001) and sequenced on the MGI 2000 platform (MGI-SEQ, BGI).
Raw reads of chromatin RNA-seq were processed with Trim Galore (v0.6.6) (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) to remove adaptors and low-quality reads with the parameter “-q 30”. Reads mapping to ribosomal RNA were removed using Bowtie (v1.3.0)66 with “--un”, which writes out the reads that fail to align. The remaining reads were aligned to the human genome (UCSC hg38) with STAR (v2.7.9a)67, and the alignments were converted to BAM files and sorted with SAMtools (v1.7)68. Gene and PROMPT count matrices were generated with the featureCounts tool from Subread (v2.0.1)69. Differential expression analysis and normalization of the gene and PROMPT count matrices were performed using the DESeq2 R package (v1.34.0)70. BAM files of biological replicates were merged with SAMtools (v1.7)68. Bigwig files were generated from the merged BAM files and normalized using bamCoverage from deepTools (v3.5.1)71 with the parameter “--normalizeUsing RPKM” (RPKM = Reads Per Kilobase per Million mapped reads), and visualized in the Integrative Genomics Viewer (IGV) (v2.9.4)72.
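For reference, the RPKM definition given above can be written out as a short calculation. This is a minimal sketch of the formula only, not the deepTools implementation; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count: float, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase per Million mapped reads for one bin or feature."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and library size must be positive")
    length_kb = region_length_bp / 1_000                # region length in kilobases
    library_millions = total_mapped_reads / 1_000_000   # library size in millions
    return read_count / (length_kb * library_millions)


# Hypothetical example: 50 reads in a 1 kb region, 10 million mapped reads.
print(rpkm(read_count=50, region_length_bp=1_000, total_mapped_reads=10_000_000))
```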
Chromatin immunoprecipitation sequencing (ChIP-seq) and data processing
About 1 × 10^7 cells were cross-linked with 1% formaldehyde for 10 min and quenched with 125 mM glycine for 5 min at room temperature. After washing twice with ice-cold PBS, the cells were resuspended in cold ChIP buffer (150 mM NaCl, 1% Triton X-100, 0.7% SDS, 500 mM dithiothreitol, 10 mM Tris-HCl and 5 mM EDTA with fresh protease inhibitors) and kept on ice for 20 min. Chromatin shearing was performed using a Covaris ME220 ultrasonic generator. After clearing the sonicated chromatin by centrifugation at 14,000 rpm for 10 min, chromatin fragments were immunoprecipitated overnight at 4°C with 2–4 µg of the appropriate antibodies and 30 µL of Dyna Protein A or G beads (Invitrogen, 11204D or 11202D). The next day, beads were washed twice with cold Mixed Micelle Buffer (150 mM NaCl, 1% Triton X-100, 0.2% SDS, 20 mM Tris-HCl, 5 mM EDTA, and 65% sucrose), twice with cold Buffer 500 (500 mM NaCl, 1% Triton X-100, 0.1% Na deoxycholate, 25 mM HEPES, 10 mM Tris-HCl, and 1 mM EDTA), twice with cold LiCl/detergent Buffer (250 mM LiCl, 0.5% Na deoxycholate, 0.5% NP-40, 10 mM Tris-HCl, and 1 mM EDTA) and once with cold 1× TE buffer. The immunoprecipitated chromatin fragments were eluted with 1× TE buffer containing 1% SDS and incubated overnight at 65°C to reverse crosslinks. The chromatin fragments were then treated with 0.5 mg/mL proteinase K for 3 h. DNA was purified by phenol/chloroform extraction and precipitated with isopropanol in the presence of Glyco-Blue (Invitrogen, AM9516). DNA libraries were constructed with the VAHTS Universal DNA Library Prep Kit for MGI (Vazyme, NDM607-02) and sequenced on an MGI 2000 instrument (MGI-SEQ).
Raw reads were first processed as described above and mapped to the human genome (UCSC hg38) using Bowtie2 (v2.3.4.1)66 with default parameters. Duplicate reads were removed with Picard (v2.25.7) (https://broadinstitute.github.io/picard/), and Bigwig files were generated with deepTools71 and normalized with the parameters “--normalizeUsing RPGC” and “--effectiveGenomeSize 2913022398”. Peak calling was performed with MACS2 (v2.2.7.1)73 with default parameters.
4sU-seq, TT-seq and data processing
4sU-seq and TT-seq were performed following previously published procedures74,75. For 4sU-seq, cells were treated with 4-thiouridine (4sU) at a final concentration of 2 mM for 15 min, and total RNA was extracted with Trizol (Invitrogen, cat. no. 15596018). A mixture of 80 µg of total RNA and 2 µg of 4sU-labeled RNA from Drosophila S2 cells (as spike-in) was biotinylated in 4sU-seq biotinylation mix at room temperature for 2 h. Subsequently, RNA was purified by two rounds of chloroform extraction and precipitated with isopropanol. After denaturation for 10 min at 65°C, RNA was incubated with 50 µL of pre-washed streptavidin magnetic C1 beads (Invitrogen, cat. no. 65002) with gentle rotation for 15 min at room temperature. Beads were washed five times with bead wash buffer (1 M NaCl, 5 mM Tris-HCl pH 7.5, 0.5 mM EDTA, 0.05% Tween 20). The 4sU-labeled RNA was eluted with 100 mM DTT and purified with a kit (ZYMO RESEARCH, Cat R1016). The RNA library was constructed with the VAHTS Universal V8 RNA-seq Library Prep kit for MGI (Vazyme, NRM605-02). Library quality was monitored with a Bio-Fragment Analyzer (Bioptic, c100001), and the library was sequenced on an MGI 2000 instrument (MGI-SEQ, BGI). For TT-seq, cells were treated with 2 mM 4sU for 7 min and quenched by direct RNA extraction with Trizol reagent (Invitrogen, cat. no. 15596018). A mixture of 80 µg of total RNA and 2 µg of 4sU-labeled RNA from Drosophila S2 cells (as spike-in) was fragmented using 20 µL of 1 M NaOH for 23 min on ice and neutralized with 80 µL of 1 M Tris-HCl (pH 6.8). RNA was biotinylated in TT-seq biotinylation mix for 30 min at room temperature.
Raw reads were processed as described above and mapped to the human genome and the Drosophila genome (UCSC dm6) using STAR67 with the parameter “--outFilterMultimapNmax 1” to remove multi-mapped reads. Reads with low mapping quality (MAPQ lower than 30) and duplicate reads were further removed from the BAM files with SAMtools68. The number of spike-in dm6 reads counted by SAMtools68 was used to calculate the normalization factor alpha = 1e6/dm6_count. Bigwig files were generated from the merged BAM files and normalized by deepTools71 using the spike-in scaling factors. Gene expression was quantified with featureCounts69. Read counts were normalized by both the spike-in scaling factors and gene length.
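The spike-in normalization described above can be sketched as follows. The factor alpha = 1e6/dm6_count is taken from the text; expressing gene length in kilobases is an assumption made here for illustration, and the function names and example numbers are hypothetical.

```python
def spike_in_scale_factor(dm6_count: int) -> float:
    """alpha = 1e6 / dm6_count, where dm6_count is the number of spike-in reads."""
    if dm6_count <= 0:
        raise ValueError("spike-in read count must be positive")
    return 1e6 / dm6_count


def normalize_counts(raw_counts: dict, lengths_bp: dict, alpha: float) -> dict:
    """Scale raw gene counts by the spike-in factor and by gene length.

    Assumption: length is expressed in kb; the text only states that counts
    were normalized by spike-in scaling factor and gene length.
    """
    return {
        gene: count * alpha / (lengths_bp[gene] / 1_000)
        for gene, count in raw_counts.items()
    }


# Hypothetical example: 2 million dm6 reads, one 2 kb gene with 100 reads.
alpha = spike_in_scale_factor(2_000_000)
print(alpha, normalize_counts({"geneA": 100}, {"geneA": 2_000}, alpha))
```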
Lentiviral transduction
Lentiviral expression plasmids encoding INTS11 or INTS11 (E203Q) were transfected together with two helper plasmids (psPAX2 and pMD2.G) into HEK293T cells using Lipofectamine 2000 (Invitrogen, 11668019). The culture medium was replaced with fresh medium, and viral supernatants were collected at 24 hours and 48 hours after transfection. The HCT116-AID cells were infected with virus for 70 hours and harvested 60 hours after IAA treatment. Protein expression was measured by western blot with the appropriate antibodies, and transcription products were measured by quantitative RT-PCR. All antibodies and PCR primer sequences are listed in Supplementary Table 1.
Antisense oligonucleotide transfection
The HCT116-AID cell line was cultured in 5% CO2 at 37°C. When the cells reached 80–90% confluence, 100 nM gapmer ASO was transfected into the cells by the calcium phosphate transfection method. At 6–8 hours after transfection, the medium was replaced with fresh medium and IAA was added at the same time. At 18 hours after transfection, total RNA was extracted with Trizol (Invitrogen, cat. no. 15596018) for RT-qPCR. The ASOs used in this study to cleave PROMPTs of MYC, SRRT and RBM14 are 20 nucleotides long and have a standard sandwich structure (10 unmodified deoxynucleotides flanked on each side by 5 MOE-modified ribonucleotides, with a phosphorothioate backbone)76. ASOs were solubilized in DNase-/RNase-free water.
The sequence of the MYC ASO is 5′-TACTGCTACGGAGGAGCAGC-3′
The sequence of the RBM14 ASO is 5′-AATTAATGGCACGAGGGCTT-3′
The sequence of the SRRT ASO is 5′-TGTGCCTGGCCCTAAATATT-3′
The bold letters represent MOE-modified bases.
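Because the 20-nucleotide ASOs are described as 10 unmodified deoxynucleotides flanked by 5 MOE-modified ribonucleotides, the modified positions can be recovered from the sequence alone. The sketch below assumes the conventional 5-10-5 layout with the MOE wings at the two ends; the function name is hypothetical.

```python
def split_gapmer(sequence: str, wing: int = 5, gap: int = 10) -> tuple:
    """Split a gapmer into (5' wing, central DNA gap, 3' wing).

    Assumption: a 5-10-5 layout, i.e. MOE-modified wings at both ends.
    """
    if len(sequence) != 2 * wing + gap:
        raise ValueError("sequence length does not match the wing-gap-wing layout")
    return sequence[:wing], sequence[wing:wing + gap], sequence[wing + gap:]


asos = {
    "MYC": "TACTGCTACGGAGGAGCAGC",
    "RBM14": "AATTAATGGCACGAGGGCTT",
    "SRRT": "TGTGCCTGGCCCTAAATATT",
}
for name, seq in asos.items():
    five_prime, dna_gap, three_prime = split_gapmer(seq)
    # Wings (MOE-modified under the assumption above) are shown in brackets.
    print(f"{name}: [{five_prime}]{dna_gap}[{three_prime}]")
```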
RNA immunoprecipitation (RIP)
About 1 × 10^6 cells were transfected with pCMV2-INTS1-FLAG and pCMV2-CPSF73-FLAG for 36 h and lysed with lysis buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.1% Triton, 1 mM EDTA). The RIP experiment did not include any crosslinking steps. After centrifugation at 14,000 rpm for 10 min, the supernatant was incubated with 50 µL of Anti-FLAG® M2 Magnetic Beads (Sigma-Aldrich) for 2 h at 4°C. The beads were washed three times with ice-cold lysis buffer and twice with ice-cold PBS. Immunoprecipitated RNA was extracted with Trizol reagent (Invitrogen, cat. no. 15596018) and used for qPCR.
Identification of active promoters and enhancers
Active promoters were defined as the 500 bp regions immediately upstream of the TSS (transcription start site) that overlap peaks called from RNAP II ChIP-seq. To identify active enhancers, we first selected genomic regions that contain RNAP II peaks and are at least 10 kb away from any annotated gene. Next, we used ROSE (v0.1)77,78 to identify active enhancers and super-enhancers from those regions. Most BED files were processed using BEDtools79.
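The active-promoter definition can be illustrated with a small interval-overlap sketch. It assumes 0-based, half-open coordinates on a single chromosome and is not the BEDtools-based workflow used in the study; all names and coordinates are hypothetical.

```python
def promoter_window(tss: int, strand: str, size: int = 500) -> tuple:
    """Half-open interval covering `size` bp immediately upstream of the TSS."""
    if strand == "+":
        return max(0, tss - size), tss
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher coordinates.
        return tss + 1, tss + 1 + size
    raise ValueError("strand must be '+' or '-'")


def overlaps(a: tuple, b: tuple) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def active_promoters(genes: dict, peaks: list) -> list:
    """Genes whose upstream 500 bp window overlaps at least one RNAP II peak."""
    return [
        name
        for name, (tss, strand) in genes.items()
        if any(overlaps(promoter_window(tss, strand), peak) for peak in peaks)
    ]


# Hypothetical example on one chromosome.
genes = {"geneA": (1_000, "+"), "geneB": (5_000, "-"), "geneC": (9_000, "+")}
peaks = [(800, 1_200), (5_100, 5_300)]
print(active_promoters(genes, peaks))
```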
PROMPT-Finder
First, we defined the background as intergenic regions located at least 20 kb upstream or 10 kb downstream of annotated genes (UCSC hg38). To generate the empirical distribution of chrRNA-seq background signals, we randomly selected 10,000 windows (200 bp) from the background areas of each chromosome and calculated the chrRNA-seq density of each window, resulting in an empirical distribution function for each chromosome. Next, we used a sliding window (200 bp in length, 10 bp steps) to scan across the genome. The chrRNA-seq signal of each sliding window was evaluated against the empirical distribution function of its own chromosome (e.g., that of chromosome 1 for windows on chromosome 1). The resulting probabilities were adjusted by false discovery rate (FDR), and windows with FDR > 0.05 were removed. The remaining windows in the upstream antisense regions of active promoters (50 kb upstream and 2 kb downstream of the TSS) were merged if the gap between them was less than 400 bp. For a given PROMPT region, any portion overlapping a downstream active gene in the same direction was truncated. PROMPT regions shorter than 1 kb were eliminated. We next estimated the differential expression of these PROMPT regions between treatment and control with featureCounts (v2.0.1)69 and DESeq2 (v1.34.0)70. PROMPT regions were defined as those with FDR < 0.05 and FC > 2.
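The core steps of this procedure (empirical P value, FDR adjustment, window merging and length filtering) can be sketched as follows. This is a simplified illustration and not the PROMPT-Finder code. It assumes a one-sided upper-tail P value and the Benjamini-Hochberg procedure for FDR, neither of which is specified in the text, and all function names are hypothetical.

```python
from bisect import bisect_left


def empirical_pvalue(sorted_background: list, value: float) -> float:
    """Fraction of background windows with density >= value (upper tail, assumed)."""
    n = len(sorted_background)
    first_ge = bisect_left(sorted_background, value)  # first index with bg >= value
    return (n - first_ge) / n


def benjamini_hochberg(pvalues: list) -> list:
    """Benjamini-Hochberg adjusted P values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def merge_windows(windows: list, max_gap: int = 400) -> list:
    """Merge (start, end) windows separated by a gap of less than max_gap bp."""
    merged = []
    for start, end in sorted(windows):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(w) for w in merged]


def filter_by_length(regions: list, min_length: int = 1_000) -> list:
    """Drop regions shorter than min_length bp."""
    return [(s, e) for s, e in regions if e - s >= min_length]


# Hypothetical example: windows that passed the FDR filter on one strand.
kept = [(0, 200), (500, 700), (900, 1_100), (2_000, 2_200)]
print(filter_by_length(merge_windows(kept)))
```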
Transcript activity and RNAP II loading balance analysis
Reads in the identified promoter regions and genes were quantified using featureCounts69. ChrRNA-seq read counts were normalized by library size and region (gene or PROMPT) length. TT-seq read counts were normalized by scaling factors and region length. As chrRNA-seq and TT-seq are strand-specific, the total read counts were calculated as the sum of reads in the PROMPT and gene regions. The total read counts in ChIP-seq were calculated from the end of the PROMPT region to the end of the gene. The +IAA and CTRL samples were then compared to identify changes in RNAP II loading and transcription activity in PROMPTs, genes (pre-mRNA), and the total.
U1 site prediction
Prediction of U1 snRNA recognition sites was performed as described80. The 5′ splice site motif was derived from the known intron 5′ sites (3 nt in the exon and 6 nt in the intron) of the human genome (UCSC hg38). This motif was used with FIMO81 to search for significant matches (P < 0.01). Matches were then scored by the maximum entropy model82. Maximum entropy scores were also calculated for all annotated 5′ splice sites and used to classify the predicted sites. Sites with scores larger than the median of the annotated 5′ splice sites were classified as strong. Sites with scores lower than the median but higher than the first quartile were classified as medium.
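The strength classification can be sketched as below. The thresholds (median and first quartile of the annotated 5′ splice site scores) follow the text. The quantile interpolation method and the "other" label for sites below the first quartile are assumptions, as the text does not name that class.

```python
from statistics import quantiles


def classify_u1_sites(predicted_scores: list, annotated_scores: list) -> list:
    """Label predicted sites relative to the annotated 5' splice site scores."""
    # quantiles(..., n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, median, _q3 = quantiles(annotated_scores, n=4)
    labels = []
    for score in predicted_scores:
        if score > median:
            labels.append("strong")
        elif score > q1:
            labels.append("medium")
        else:
            labels.append("other")  # hypothetical label, not defined in the text
    return labels


# Hypothetical maximum entropy scores for illustration.
annotated = [1, 2, 3, 4, 5, 6, 7]
print(classify_u1_sites([5, 3, 1], annotated))
```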
Classification of genes by predicted U1 site and U1 score calculation
Active genes were removed if their upstream 5 kb region overlapped with the putative promoter of an annotated gene. For the remaining genes, we only took into account predicted U1 sites located within 2 kb upstream in the antisense direction and calculated the distance between the first U1 site and the TSS. To assess U1 distance in the antisense direction, genes were classified as “0-0.5kb”, “0.5-1kb” and “1-2kb” by the distance of the first U1 site. To assess U1 abundance in the antisense direction, we only counted the number of U1 sites within 1 kb upstream of the TSS. Genes were classified as “0”, “1”, and “2+” by the number of U1 sites in the upstream antisense region. To estimate the PROMPT level, we counted reads in the 5 kb region in the upstream antisense direction with featureCounts69. Read counts were normalized by both feature length and scaling factors (spike-in based for TT-seq and 4sU-seq) generated by deepTools71. U1 scores were calculated using the following equation:
Visualization of data through heatmaps, average line plots, boxplots, and violin plots
Heatmaps of Bigwig files were generated using computeMatrix and plotHeatmap from deepTools71, average line plots were created with computeMatrix and plotProfile, and log2-scale plots were produced using computeMatrix and customized Python scripts. For chrRNA-seq boxplots, read counting and normalization were performed with featureCounts69 and DESeq270, and R scripts were used for plotting. For ChIP-seq boxplots, reads were counted with featureCounts69 and normalized by the library size counted with SAMtools68, and Python scripts were used for plotting. For violin plots, reads were counted with featureCounts69, normalized with scaling factors (or library size for ChIP-seq and chrRNA-seq), and plotted with customized Python scripts.
Statistical analysis
We used the Mann-Whitney test for all statistical comparisons in the high-throughput sequencing analyses.