Production of Sloppymerase
To purify Sloppymerase, an overnight culture was prepared from BL21 E. coli (Thermo Fisher Scientific) that had been transformed by heat shock with the Sloppymerase vector (VectorBuilder). The bacteria were inoculated into 4 ml lysogeny broth (LB) containing 100 μg/ml ampicillin (Merck) and grown at 37°C with shaking at 225 rpm. The next day, the overnight culture was transferred to a 1 L flask with 200 ml LB containing 100 μg/ml ampicillin. The culture was grown at 37°C and 225 rpm to an OD600 of 0.5. Expression of Sloppymerase was induced with 0.2% L-arabinose (Merck), and the culture was grown overnight at 16°C and 225 rpm. The next day, the cells were harvested in a Sorvall centrifuge for 15 min at 6000 x g and 4°C in a GSA rotor. After centrifugation, the cells were lysed with 20 ml binding buffer (50 mM sodium phosphate, 500 mM NaCl, pH 7.4, 1 mM MgCl2, 0.25% Triton-X, 1x cOmplete protease inhibitor, EDTA-free (Merck), 0.2 mg/ml lysozyme (Thermo Fisher Scientific)) and incubated for 30 min at 4°C. To clear the lysate, the samples were centrifuged for 15 min at 13000 x g and 4°C and passed through a 0.45 μm Filtropur S syringe filter (Sarstedt). The His GraviTrap™ TALON® columns (Cytiva) were equilibrated according to the manufacturer’s manual before loading the samples. The columns were washed with 3 x 10 ml washing buffer (50 mM sodium phosphate, 500 mM NaCl, 5 mM imidazole, pH 7.4), and the samples were then eluted with elution buffer (50 mM sodium phosphate, 500 mM NaCl, 50 mM imidazole, pH 7.4). To exchange the buffer to the storage buffer (25 mM Tris-HCl, 1 mM DL-dithiothreitol (DTT), 0.2 mM EDTA (Merck), pH 7.4), the enzyme was concentrated with Amicon® Ultra 10 kDa centrifugal filters (Merck) and reconstituted to 2x the sample volume with 2x storage buffer. This was repeated twice before the concentration was measured with a NanoDrop and glycerol (Merck) was added to a final concentration of 50%.
To check the purity of the enzyme, samples from the purification were run on a NuPAGE Novex 4-12% Bis-Tris gel and thereafter stained with Coomassie Brilliant Blue G-250 Dye (Thermo Fisher Scientific) for 30 min. After destaining with water, the gel was scanned at 700 nm with an Odyssey® Fc imaging system (Li-Cor).
LC-MS Analysis
To confirm the amino acid sequence of the generated enzyme, a sample was prepared for LC-MS analysis. For filter-aided sample preparation, 20 µg of protein lysate was placed on the filter unit (Microcon-30 kDa; Merck, Darmstadt, Germany) and washed with a buffer containing 8 M urea and 100 mM Tris (pH 8.5). First, 8 mM DTT was added, followed by incubation at 56 °C for 15 min, and then 50 mM iodoacetamide (IAA). After 20 min of incubation at room temperature, excess IAA was removed with 8 mM DTT (incubation at 56 °C for 15 min). After each incubation, the sample was washed twice with Tris buffer. Finally, the sample was washed with NH4HCO3, trypsin was added [enzyme–protein ratio 1:50 (w/w)] and the samples were placed in a wet chamber at 37 °C. After overnight incubation, the resulting peptides were washed from the filter by adding 50 mM NH4HCO3 and centrifuging twice at 14,000 x g for 10 min. Trifluoroacetic acid [final concentration of 1% (v/v)] was added, and the samples were dried and reconstituted in a solution containing 3% acetonitrile and 0.1% formic acid in water to a final concentration of 150 ng protein/µL.
For tryptic peptide analysis, a nanoAcquity UPLC system equipped with a C18, 5 μm, 180 μm × 20 mm trap column and an HSS-T3 C18, 1.8 μm, 75 μm × 100 mm analytical column (Waters Corporation, Manchester, UK) was coupled to a Synapt G2 Si HDMS mass spectrometer with an electrospray ionization source (Waters Corporation, Manchester, UK). Mobile phase A contained 0.1% formic acid and 3% dimethyl sulfoxide in water, and mobile phase B contained 0.1% formic acid and 3% dimethyl sulfoxide in acetonitrile. A total of 300 ng of protein was injected in trapping mode. The peptides were separated at 40 °C with a gradient from 3 to 40% (v/v) mobile phase B at a flow rate of 0.3 μl/min over 120 min. Via the reference channel, a lock mass solution composed of [Glu1]-fibrinopeptide B (0.1 μM) and leu-enkephalin (1 μM) was introduced every 60 s. Peptide analysis was performed in positive ionization mode using the ultra-definition MSE (UDMSE) approach. The reproducibility and stability of the method were controlled with a commercially available HeLa digest (Thermo Scientific, Waltham, MA).
ProteinLynx Global Server (PLGS) (version 3.0.3, Waters Corporation, Milford, MA) was used for data processing. The samples were searched with a false discovery rate (FDR) of 0.01 against a randomized UniProt human database (UniProtKB version 14/01/2020) to which the sequence of the engineered enzyme had been added. The search parameters were carbamidomethyl cysteine as a fixed modification; acetyl lysine, C-terminal amidation, asparagine deamidation, glutamine deamidation, and methionine oxidation as variable modifications; and trypsin as the digest reagent. One missed cleavage was allowed. The minimum number of peptide matches per protein was 2, and the minimum numbers of ion matches per peptide and per protein were 1 and 3, respectively.
Sloppymerase activity
To test the activity of the purified enzyme, an incomplete hairpin was used. Two oligonucleotides (hairpin A1 and A2; Integrated DNA Technologies) were ligated by mixing 20 μM of each oligonucleotide and incubating for 48 h at 4°C with end-over-end rotation in 1x T4 DNA ligation buffer (50 mM Tris-HCl (pH 7.6), 10 mM MgCl2, 1 mM ATP, 1 mM DTT, 5% (w/v) polyethylene glycol-8000) with 0.1 U/μl T4 DNA ligase (Thermo Fisher Scientific). The reaction was heat-inactivated at 65°C for 20 min. The incomplete hairpin was hybridized to a complementary oligonucleotide (hairpin A3) to create a nicked hairpin. The hairpin (0.02 μM) was then treated with Sloppymerase in the presence of either all four dNTPs (0.1 mM dATP, 0.1 mM dCTP, 0.1 mM dGTP, 0.05 mM dTTP (Thermo Fisher Scientific) and 0.05 mM Biotin-11-dUTP (Jena Bioscience)) or the same mixture omitting either dATP or dCTP, in 1x NEB 2.1 buffer (New England Biolabs) with 0.1 mM MnCl2 (Merck) and 0.035 μg/μl Sloppymerase. The samples were incubated at 37°C for 60 min unless stated otherwise and then heat-inactivated at 75°C for 20 min. The samples were mixed with Novex™ TBE-Urea Sample Buffer (2X) (Thermo Fisher Scientific) to a final concentration of 1x and heated to 95°C for 5 min before being run on 10% Novex™ TBE-Urea gels (Thermo Fisher Scientific) under denaturing conditions to evaluate DNA polymerase and 5’-3’ exonuclease activity. The gel was stained with SYBR™ Gold nucleic acid gel stain (Thermo Fisher Scientific) and scanned at 600 nm with an Odyssey® Fc imaging system (Li-Cor). To evaluate whether Sloppymerase can incorporate biotin-dUTP, the gel was additionally stained with IRDye® 800CW Streptavidin (926-32230, Li-Cor) at a final concentration of 0.2 μg/ml before being scanned at 800 nm.
DNA oligonucleotides:
Hairpin A1 | CCCAAACCCAATTAATGTACTGCAGAATTCAGCTCGAAGCTTGGCCGGATCCGTGAGCTGTCGTC
Hairpin A2 | /5Phos/TCAGATCGGATACGGCGACCACCGAGATCTACACCCTGCGGGACACTCTTTCCCTACACGACGCTCTTCCGATCTGAGACGACAGCTCAC
Hairpin A3 | CCGGCCAAGCTTCGAGCTGAATTCTGCAGTACATTAATTGGGTTTGGG
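The hybridization step described above relies on hairpin A3 being complementary to part of hairpin A1. As a quick sanity check on the listed sequences, the following minimal sketch computes the reverse complement of hairpin A3 and compares it with the 5' end of hairpin A1. The function and variable names are illustrative and are not taken from the authors' scripts.

```python
# Translation table for complementing unmodified DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences as listed in the oligonucleotide table (internal spaces removed).
HAIRPIN_A1 = "CCCAAACCCAATTAATGTACTGCAGAATTCAGCTCGAAGCTTGGCCGGATCCGTGAGCTGTCGTC"
HAIRPIN_A3 = "CCGGCCAAGCTTCGAGCTGAATTCTGCAGTACATTAATTGGGTTTGGG"

# A3 anneals to A1 if its reverse complement matches the 5' end of A1.
duplex_length = len(HAIRPIN_A3)
a3_anneals_to_a1 = HAIRPIN_A1.startswith(reverse_complement(HAIRPIN_A3))
print(duplex_length, a3_anneals_to_a1)
```

With the sequences as listed, the full length of hairpin A3 pairs with the 5' portion of hairpin A1.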
SSB labelling in fixed cells
The SSB labelling was conducted as described by Bivehed et al. 1. Briefly, HaCaT cells were seeded in chamber slides, cultivated for 24 hours and thereafter fixed with ice-cold ethanol (70%) for 30 min, followed by ethanol washes (96%-99.5%). To induce SSBs, the slides were treated with 125 mU/μl Nt.BsmAI in 1x CutSmart buffer for 1 hour at 37°C, followed by 2 x 5 min washes with TBS. For SSB detection, the slides were incubated with 0.315 μg/μl Sloppymerase diluted in 1x NEB 2.1 buffer, 0.1 mM MnCl2, 0.1 mM dATP/dGTP/dCTP, 0.08 mM dTTP and 0.02 mM Aminoallyl-dUTP-XX-AF555 for 60 min at 37 °C in a moisture chamber. Thereafter, the slides were subjected to a series of washes: three quick rinses with H2O, 2 x 5 min with TBS-Tween, 5 min with 10 mg/ml Hoechst 33342 diluted in TBS-Tween, and finally 2 x 5 min with TBS. The slides were mounted with ProLong Glass antifade (Thermo Fisher Scientific) and cured overnight before being imaged. All experiments were performed independently three times, and at least 3 images were acquired per replicate.
Images were acquired with a Zeiss imager M2 microscope equipped with a Plan-Apochromat 63X/1.4 Oil objective, HXP 120 V light source, Hamamatsu C11440 camera and Zen 2 software (blue edition). Cube filter sets 43HE and 49 were used (all from Zeiss). Signal strength was equally enhanced for visualization purposes.
Sanger sequencing of Sloppymerase-treated Hairpin
For the Sanger sequencing, 20 μM each of hairpin S1, S2 and S3 were mixed to create a nicked hairpin. The oligonucleotides were ligated overnight as described above. The hairpin (0.02 μM) was then treated with Sloppymerase in the presence of either all four dNTPs (0.1 mM dATP, 0.1 mM dCTP, 0.1 mM dGTP and 0.1 mM dTTP) or the same mixture omitting either dATP or dCTP, in 1x NEB 2.1 buffer with 0.1 mM MnCl2 and 0.035 μg/μl Sloppymerase. The samples were incubated at 37°C for 60 min and thereafter heat-inactivated for 20 min at 75°C. To prepare the sequences for PCR, adapter S1 was ligated overnight at 4°C with 1 mM ATP, 0.1 U/μl T4 DNA Ligase and 0.3 U/μl T4 Polynucleotide Kinase (PNK) (New England Biolabs).
For the PCR 0.04 μM of treated hairpin was added to 1x Phusion Green HF buffer (Thermo Fisher Scientific), 0.2 mM dNTPs (Thermo Fisher Scientific), 0.5 μM of forward and reverse primers S1 with 0.02 U/μl of Phusion U Hot Start DNA Polymerase (Thermo Fisher Scientific). The hairpin was amplified with a thermocycler for 7 seconds at 98°C, 20 seconds at 60°C and 20 seconds at 72°C for 20 cycles. The PCR product was cloned with Zero Blunt™ TOPO™ PCR Cloning Kit (Thermo Fisher Scientific) for sequencing according to the manufacturer’s manual. Another PCR was performed with the cloning products by dipping a pipette tip in a single colony and then in a tube with master mix consisting of 1x Platinum II HS buffer, 0.2 mM dNTPs, 0.04 U/μl Platinum II HS polymerase (Thermo Fisher Scientific) and 0.2 μM forward and reverse primers S2. The reaction was amplified with a thermocycler for 15 seconds at 94°C, 15 seconds at 60°C and 15 seconds at 68°C for 30 cycles.
The PCR products were cleaned up with 0.8 U/μl exonuclease I and 0.03 U/μl shrimp alkaline phosphatase (New England Biolabs) for 30 mins at 37°C and heat inactivated at 80°C for 20 min. The samples were then prepared for sequencing according to the platform’s instructions for TubeSeq Service (Eurofins Genomics). The samples were sequenced on ABI 3730XL sequencing machines using cycle sequencing technology (dideoxy chain termination/ cycle sequencing).
DNA oligonucleotides:
Hairpin S1 | CCCAAACCCAATTAATGTACTGCAGAATTCAGCTCGAAGCTTGGCCGGATCCAGCGTGGGACTGAGTC
Hairpin S2 | /5Phos/GTCTCGTGTCTGTAAAAACGTACGTAGATGCCATTTCTAAAAAAACAGACACGAGACGACTCAGTCCCACGCT
Hairpin S3 | CCGGCCAAGCTTCGAGCTGAATTCTGCAGTACATTAATTGGGTTTGGG
Adapter S1 | /5Phos/CGCACTGAGACTGATATGTGAAAAATTAGATTGGATAACTGCGCAGAAAAACACATATCAGTCTCAGTGCG
Fwd primer S1 | CTGCGCAGTTATCCAATCTAA
Rev primer S1 | CGTACGTAGATGCCATTTCTA
Fwd primer S2 | GTAAAACGACGGCCAG
Rev primer S2 | CAGGAAACAGCTATGAC
Illumina sequencing of Sloppymerase-treated Hairpin
For the experiment with Illumina sequencing, the hairpin was generated by ligating hairpin I1, I2 and I3.
The ligation and Sloppymerase treatment of the hairpin were performed as described above. For the PCR, a different set of adapters was used, with each adapter carrying a unique barcode for its sample. Adapter I1 was ligated to the Sloppymerase-treated hairpin from the sample without dATP, and adapter I2 was used for the sample with all four dNTPs.
To open up the hairpins, the samples were treated with 1 unit of Uracil-DNA Glycosylase (UNG) and 2 units of endonuclease IV (Endo IV) (Thermo Fisher Scientific).
The PCR was again performed as before, but with only 10 cycles followed by a cleanup and buffer exchange to Tris-EDTA (TE) buffer pH 8.0 (Thermo Fisher Scientific) with Amicon Ultra-0.5 Centrifugal Filter Unit (Merck) according to the manufacturer’s protocol. The samples were then diluted and prepared according to the sequencing platform’s requirements and were sequenced with Illumina. SNP&SEQ Technology Platform performed 75 cycles paired-end sequencing in a MiSeq Nano v2 flowcell.
DNA oligonucleotides:
Hairpin I1 | CCCAAACCCAATTAATGTACTGCAGAATTCAGCTCGAAGCTTGGCCGGATCCGTGAGCTGTCGTC
Hairpin I2 | /5Phos/TCAGAUCGGAATGATACGGCGACCACCGAGATCTACACCCTGCGGGACACTCTTTCCCTACACGACGCTCTTCCGATCTGAGACGACAGCTCAC
Hairpin I3 | CCGGCCAAGCTTCGAGCTGAATTCTGCAGTACATTAATTGGGTTTGGG
Adapter I1 | AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGACCAGCATCTCGTATGCCGTCTTCTGCTTG
Adapter I2 | AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAAATACAGATCTCGTATGCCGTCTTCTGCTTG
STEEL-seq using Illumina sequencing
For Illumina sequencing, DNA was extracted from TK6 lymphoblast cells that had either been left untreated (WT) or irradiated (IR) to introduce DNA damage. Irradiation was performed by exposing 22.5 × 10^6 TK6 cells, in a 15 ml Falcon tube on ice, to 10 Gy of 225 kV X-rays (X-RAD iR225, Precision X-Ray Inc., North Branford, CT, USA) at a dose rate of 1.5 Gy/min using an inherent Ba filter (0.8 mm) and an external Cu filter (0.3 mm). The DNA was extracted with the DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer’s manual. To introduce nicks, 100 ng of the extracted WT DNA was additionally incubated with 20 U of Nt.BsmAI (New England Biolabs) in 1x CutSmart buffer (New England Biolabs). After a buffer exchange to ddH2O with Zeba™ Spin Desalting Columns, 7K MWCO, 0.5 ml (Thermo Fisher Scientific), the samples were incubated with 1x NEB 2.1 buffer, a mixture of dNTPs (0.1 mM dCTP, 0.2 mM dGTP, 0.1 mM dTTP), 0.1 mM MnCl2 (Merck) and 0.035 μg/μl Sloppymerase for 120 min at 37°C. Thereafter, the samples were heat-inactivated for 20 min at 75°C. For tagmentation, 10 ng of purified DNA per sample was added to a transposase mixture with 12.5 µL 2x TD buffer, 1.25 µL Tn5 transposase (2 µM; produced and assembled according to Picelli et al. 2) and ddH2O to bring the reaction volume up to 50 µL. The Tn5 adapters used were I3, I4 and I5.
The transposase mixture was incubated at 55 ℃ for 7 minutes, purified using the QIAquick PCR cleanup kit (Qiagen) following the manufacturer’s protocol, and eluted in 20 µL ddH2O. The purified DNA was added to 25 µL NEBNext High-Fidelity 2x PCR Master Mix (New England Biolabs), 2.5 µL of forward primer I1 (25 nM) and 2.5 µL of reverse primer I1, I2 or I3 (25 nM). It was amplified in a thermocycler with an initial 5 minutes at 72 ℃ and 30 seconds at 98 ℃, followed by 9 cycles of 98 ℃ for 10 seconds, 63 ℃ for 30 seconds and 72 ℃ for 1 minute. After amplification, the samples were purified using SPRIselect beads (Beckman) at a 1:1 ratio, following the manufacturer’s protocol for the Left workflow. The purified samples were analyzed with the 2200 TapeStation System (Agilent) and kept at -20 ℃ until sequencing. The samples were then sequenced on an Illumina NovaSeq 6000. The SNP&SEQ Technology Platform performed 150-cycle paired-end sequencing in one lane of an SP flowcell.
DNA oligonucleotides:
Adapter I3 | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
Adapter I4 | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
Adapter I5 | CTGTCTCTTATACACATCT
Fwd primer I1 | AATGATACGGCGACCACCGAGATCTACACCCTGCGGGTCGTCGGCAGCGTCAGATGTGTAT
Rev primer I1 | CAAGCAGAAGACGGCATACGAGATCTGTATTTGTCTCGTGGGCTCGGAGATGTG
Rev primer I2 | CAAGCAGAAGACGGCATACGAGATGACCCAAGGTCTCGTGGGCTCGGAGATGTG
Rev primer I3 | CAAGCAGAAGACGGCATACGAGATTTTAACGCGTCTCGTGGGCTCGGAGATGTG
STEEL-seq using PacBio sequencing
To test the functionality of the STEEL method with PacBio sequencing, extracted and purified DNA from TK6 lymphoblast cells was treated with Sloppymerase and sequenced. The DNA was extracted with the DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer’s manual. Before treatment, 8.7 μg of purified DNA from untreated wild-type TK6 cells was incubated with 25 units of the nickase Nt.BsmAI for 45 min at 37°C to introduce nicks and then heat-inactivated for 20 min at 65°C. The reaction was prepared in 1.5 ml Eppendorf tubes with 1x NEB 2.1 buffer (New England Biolabs), a mixture of dNTPs (0.1 mM dCTP, 0.2 mM dGTP, 0.05 mM dTTP), 0.05 mM Biotin-11-dUTP, 0.1 mM MnCl2 and 0.035 μg/μl Sloppymerase. The DNA was added to the reaction at a concentration of 130 ng/μl. The sample was then incubated at 37°C for 90 min, followed by heat inactivation for 20 min at 75°C. To fill any potential gaps, the sample was treated with 10 U Klenow fragment (3’-5’ exo-) (New England Biolabs) after adding 0.2 mM dATP, 0.1 mM dCTP and 0.15 mM dTTP to account for the different concentrations of dNTPs. The sample was incubated for 30 min at 37°C, followed by heat inactivation for 20 min at 75°C.
The sample was digested with a combination of EcoRI-HF, BamHI-HF, NcoI-HF and HindIII-HF (New England Biolabs) (10 U per enzyme) for 60 min at 37°C, followed by a buffer exchange to ligation buffer with an Amicon Ultra-0.5 Centrifugal Filter Unit according to the manufacturer’s protocol. In the next step, a mixture of adapters (P1, P2, P3, P4, P5 and P6) with unique overhangs matching the different restriction sites was added together with 30 U of T4 DNA ligase. The sample was then incubated overnight at 4°C with end-over-end rotation.
To extract the manipulated DNA fragments containing biotinylated dUTPs, a Dynabeads™ kilobaseBINDER™ Kit (Thermo Fisher Scientific) was used with a 3x higher concentration of beads than the manufacturer recommends to ensure a high yield. Otherwise, the beads were prepared and washed according to the original protocol. The solution with biotinylated DNA was incubated with the beads overnight at 4°C with end-over-end rotation and for an additional 2 h at room temperature the next morning. The beads were washed according to the manufacturer’s protocol before the DNA was eluted in ddH2O at 75°C for 5 min. The concentration of the recovered DNA was measured with a NanoDrop before PCR amplification. The PCR reaction was mixed according to the manufacturer’s protocol with 175 ng of DNA, 0.2 mM dNTPs, 0.5 μM forward primer P1 and reverse primer P2, 1.5 mM MgCl2 and 1.25 U Taq polymerase.
The DNA was amplified in a thermocycler for 20 cycles of 45 seconds at 94°C, 30 seconds at 51°C and 2.5 min at 72°C. The samples were run on a 10% Novex™ TBE gel (Thermo Fisher Scientific) for quality control, followed by PCR cleanup with the MinElute PCR Purification Kit (Qiagen) according to the manufacturer’s protocol and elution in nuclease-free water. After purification, the samples were run on an agarose gel and the concentration was measured with a NanoDrop before they were sent for PacBio sequencing. The sequencing was performed by the SNP&SEQ Technology Platform using a PacBio Revio system to generate 1 million HiFi reads.
DNA oligonucleotides:
Adapter P1 | /5Phos/AATTGTTCCCTACACGGACTGAATACTCTGGCCGTCGTTTTAC
Adapter P2 | /5Phos/GATCGTTCCCTACACGGACTGAATACTCTGGCCGTCGTTTTAC
Adapter P3 | /5Phos/CATGCTTCCCTACACGGACTGAATACTCTGGCCGTCGTTTTAC
Adapter P4 | /5Phos/AGCTGTTCCCTACACGGACTGAATACTCTGGCCGTCGTTTTAC
Adapter P5 | /5Phos/CTTCCCTACACGGACTGAATACTCTGGCCGTCGTTTTAC
Adapter P6 | CAGGAAACAGCTATGACAGTATTCAGTCCGTGTAGGGAAC
Fwd primer P1 | GTAAAACGACGGCCAG
Rev primer P1 | CAGGAAACAGCTATGAC
STEEL-seq using Nanopore sequencing
DNA extracted from HaCaT keratinocyte cells was either left untreated or nicked with 10 U of Nt.BsmAI for 60 min at 37°C and thereafter heat-inactivated for 20 min at 65°C. The DNA was added to 1x NEB 2.1 buffer, a mixture of dNTPs (0.2 mM dCTP, 0.2 mM dGTP, 0.15 mM dTTP) (Thermo Fisher Scientific), 0.05 mM Biotin-11-dUTP (Jena Bioscience), 0.1 mM MnCl2 (Merck) and 0.035 μg/μl Sloppymerase, incubated for 60 min at 37°C and heat-inactivated for 20 min at 75°C. To fill any remaining gaps, the samples were then treated with 10 U Klenow Fragment (3'→5' exo-), 60 U T4 DNA ligase and 0.1 mM dATP for 60 min at 37°C, followed by heat inactivation for 20 min at 75°C. The DNA was fragmented with 20 U of PmeI and 40 U of PmlI (New England Biolabs) for 60 min at 37°C with 1 volume of 1x CutSmart buffer and heat-inactivated for 20 min at 65°C. The samples were then sequenced with Oxford Nanopore technology. The sequencing was performed by the SNP&SEQ Technology Platform using a PromethION system and two flow cells.
Data analysis
Sequence Analysis of Sloppymerase treated Hairpin
Sample sequences were compared to the reference using a custom Python script. Reads were filtered by matching to a primer sequence and by the number of errors relative to the template. Sequence alignment was performed with the Needleman-Wunsch global alignment algorithm 3.
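For illustration, a minimal Needleman-Wunsch global alignment can be sketched as follows. This is not the authors' script: the function name and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen only to make the example concrete.

```python
def needleman_wunsch(ref, read, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences.

    Returns (aligned_ref, aligned_read, score). The scoring scheme is an
    illustrative assumption, not the one used in the original analysis.
    """
    n, m = len(ref), len(read)
    # score[i][j] = best score for aligning ref[:i] with read[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if ref[i - 1] == read[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or substitution
                score[i - 1][j] + gap,            # gap in the read
                score[i][j - 1] + gap,            # gap in the reference
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_ref, aligned_read = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_ref.append(ref[i - 1])
            aligned_read.append(read[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_read.append("-")
            i -= 1
        else:
            aligned_ref.append("-")
            aligned_read.append(read[j - 1])
            j -= 1
    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_read)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))
```

The gapped strings returned here have equal length, so mismatches and deletions relative to the template can be counted column by column.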
Alignment of STEEL-seq reads
GRCh38 was used as the reference sequence. Illumina reads were aligned with strobealign 4 using default settings. Nanopore and PacBio HiFi reads were aligned with minimap2 5 using -ax map-ont and -ax map-hifi, respectively.
Break detection
We developed a Python script that finds potential SSB sites in aligned reads from STEEL-seq libraries prepared without dATP in the reaction mix. The main signature of Sloppymerase activity in this setting is the substitution of adenine with a different nucleotide, typically guanine, although we do not impose this as a restriction. The script thus starts by searching each aligned read for regions in which all adenines appear mutated (substituted or deleted). Because the reverse strand may also have been sequenced, it additionally searches for regions in which all thymines appear mutated. We call these regions events below. Supplementary alignments are ignored in the current version of the script. The software libraries used include pysam (https://github.com/pysam-developers/pysam) and pyfaidx (https://dx.doi.org/10.7287/peerj.preprints.970v1).
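The core search can be illustrated with a simplified sketch that works on a pair of gapped alignment strings instead of pysam records. The function name, the input representation and the `ref_start` parameter are assumptions made for this example and do not reflect the actual script.

```python
def find_events(ref, read, base="A", ref_start=0):
    """Find runs of consecutive mutated reference bases of one type.

    ref, read: equal-length gapped alignment strings ("-" marks a gap).
    base: "A" for forward-strand events, "T" for reverse-strand events.
    Returns a list of events; each event is the list of reference
    positions at which `base` is substituted or deleted in the read.
    """
    events = []
    current = []          # mutated positions of the event being built
    pos = ref_start - 1   # reference coordinate of the current column
    for ref_base, read_base in zip(ref, read):
        if ref_base == "-":
            continue      # insertion in the read: no reference position consumed
        pos += 1
        if ref_base != base:
            continue      # only the target base decides where events start and end
        if read_base != base:
            current.append(pos)        # substituted or deleted target base
        else:
            if current:
                events.append(current)  # an unmutated target base closes the event
            current = []
    if current:
        events.append(current)
    return events


# Reference adenines at positions 1, 4, 7 and 9; position 7 is unmutated.
print(find_events("CAGTACGATAC", "CGGTGCGAT-C"))
```

In this sketch even a single mutated adenine forms an event; events that are too short would be discarded by the filters described in the next section.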
Filtering
Since sequencing errors can easily conspire to look like Sloppymerase activity, we apply the following filters to ensure we remove most false positives.
1) Aligned reads with an overall mutation rate (counting substitutions, insertions and deletions) above 10% are removed. Since both sequencing errors and mutations caused by Sloppymerase are counted, this threshold is set to a value well above the typical sequencing error rate.
2) An event is only kept if the number of mutated adenines (or thymines when on the reverse strand) is at least five for PacBio HiFi and Nanopore reads, three for Illumina reads.
3) The average base quality of substituted bases must be at least 10.
4) For Nanopore reads, at least three of the mutations in an event must be substitutions (i.e., not deletions).
Choice of which filters to implement and the values for filtering thresholds were informed by comparing the script output to a list of manually annotated ground truth events on a 1 Mbp region of a nickase-treated sample.
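The four filters can be summarized in a short sketch. The thresholds are those listed above; the `Event` class, the function names, the use of the aligned length as the denominator of the mutation rate, and the handling of events without any substituted base are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical container for the per-event values the filters need."""
    n_mutated: int                 # mutated adenines (or thymines) in the event
    n_substitutions: int           # how many of those mutations are substitutions
    substituted_quals: list = field(default_factory=list)  # Phred scores of substituted bases


# Filter 2: minimum number of mutated bases per platform.
MIN_MUTATED = {"illumina": 3, "pacbio_hifi": 5, "nanopore": 5}


def read_passes(n_mutations, aligned_length, max_rate=0.10):
    """Filter 1: drop reads whose overall mutation rate exceeds 10%.

    Assumption: the rate is computed relative to the aligned length.
    """
    return aligned_length > 0 and n_mutations / aligned_length <= max_rate


def event_passes(event, platform, min_mean_qual=10, min_nanopore_subs=3):
    """Filters 2-4, applied to a single event."""
    if event.n_mutated < MIN_MUTATED[platform]:
        return False
    # Filter 3: mean base quality of substituted bases.
    # Assumption: events without substituted bases skip this check.
    quals = event.substituted_quals
    if quals and sum(quals) / len(quals) < min_mean_qual:
        return False
    # Filter 4: Nanopore events need at least three substitutions.
    if platform == "nanopore" and event.n_substitutions < min_nanopore_subs:
        return False
    return True


print(event_passes(Event(5, 4, [20, 25, 30, 18]), "nanopore"))
```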
Output
The script computes the regions within which the SSB site must be located. For mutated adenines, this region lies between the last unmutated adenine upstream of the mutated region and the first mutated nucleotide of the region. For mutated thymines, it lies between the last mutated nucleotide of the region and the first unmutated thymine downstream. The output is a BED track with extra annotations in a format that IGV 6 understands and displays (including read name, mapping quality, base qualities). In a postprocessing step, the BED file is sorted using BEDTools 7 and duplicate entries are removed. An additional output is a BAM file with the subset of the input reads on which events were detected.
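The interval computation can be sketched as follows. The coordinate convention (0-based, half-open, as in BED), the fallback to the alignment boundaries when no unmutated base is found, and all names are assumptions for this example. The real script also writes extra annotation columns that are omitted here.

```python
def ssb_interval(mutated, unmutated, read_start, read_end, base="A"):
    """Return the half-open interval (start, end) that must contain the SSB.

    mutated: reference positions of the mutated target bases in one event.
    unmutated: reference positions of unmutated target bases on the same read.
    read_start, read_end: half-open reference span of the alignment, used as
    a fallback when no flanking unmutated base exists (an assumption).
    """
    first, last = min(mutated), max(mutated)
    if base == "A":
        # From just after the last unmutated A upstream to the first mutated base.
        upstream = [p for p in unmutated if p < first]
        start = max(upstream) + 1 if upstream else read_start
        return start, first + 1
    # base == "T": from the last mutated base to the first unmutated T downstream.
    downstream = [p for p in unmutated if p > last]
    end = min(downstream) if downstream else read_end
    return last, end


def bed_line(chrom, start, end, name):
    """Format a minimal four-column BED record."""
    return f"{chrom}\t{start}\t{end}\t{name}"


start, end = ssb_interval([105, 110, 118], [90, 98, 130], 80, 200, base="A")
print(bed_line("chr1", start, end, "read1"))
```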
Single Strand Breaks Annotations
SSB hits from the Illumina sequencing were deduplicated based on position. PacBio and Nanopore data were not deduplicated, as they correspond to unique break events.
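Position-based deduplication can be sketched as below. It is assumed here that two hits count as duplicates when chromosome, start and end are identical; the exact key used in the original analysis is not stated, and the function name is illustrative.

```python
def deduplicate_by_position(hits):
    """Keep the first hit seen at each (chrom, start, end) position.

    hits: iterable of tuples whose first three fields are chrom, start, end.
    """
    seen = set()
    unique = []
    for hit in hits:
        key = tuple(hit[:3])
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique


hits = [
    ("chr1", 99, 106, "read1"),
    ("chr1", 99, 106, "read2"),   # same position as read1: treated as duplicate
    ("chr2", 500, 510, "read3"),
]
print(deduplicate_by_position(hits))
```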
Non-canonical chromosomes were filtered out before annotation. For Illumina data, data from non-irradiated and non-treated cells were used as references, and Sloppymerase-treated samples were filtered against these references. ChIPseeker, an R/Bioconductor package, was used to annotate genomic regions (promoter, 5’UTR, 3’UTR, exon, intergenic regions) and to visualize forward and reverse strand breaks 8. Notebooks for data exploration, preprocessing and annotation are available on GitHub (https://github.com/barslmn/sloppymerase-annotations).
References (Method section)
1. Bivehed, E. et al. Visualizing DNA single- and double-strand breaks in the Flash comet assay by DNA polymerase-assisted end-labelling. Nucleic Acids Res 52, e22 (2024).
2. Picelli, S. et al. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res 24, 2033-2040 (2014).
3. Needleman, S.B. & Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48, 443-453 (1970).
4. Sahlin, K. Strobealign: flexible seed size enables ultra-fast and accurate read alignment. Genome Biol 23, 260 (2022).
5. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094-3100 (2018).
6. Robinson, J.T. et al. Integrative genomics viewer. Nat Biotechnol 29, 24-26 (2011).
7. Quinlan, A.R. & Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842 (2010).
8. Yu, G., Wang, L.G. & He, Q.Y. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31, 2382-2383 (2015).