Generation of sgRNA-encoding plasmids
Target sequences were designed using Cas-Designer56 to avoid off-target sites with up to 2 mismatches. The oligonucleotides for each target sequence are listed in Supplementary Table 1. The pRG2 GG expression vector was digested with the BsaI restriction enzyme. sgRNA oligos with overhangs complementary to the digested vector were ordered from Macrogen (Korea) and Cosmogenetech (Korea). The upper- and lower-strand oligos were annealed to produce double-stranded oligodeoxynucleotides (dsODNs). The annealed dsODNs were ligated into the digested expression vector using T4 DNA ligase (Enzynomics) for 1 h at room temperature. The ligation mixture was transformed into DH5α competent cells by heat shock, and the cells were cultured overnight at 37°C. Individual colonies were picked and grown in LB medium for 16 hours at 37°C in a shaking incubator. Plasmids were isolated using the Exprep™ Plasmid SV kit (GeneAll).
Cell culture and transfection for cancer cell lines and fibroblasts
HeLa (ATCC®, CCL-2™), HEK293T (ATCC®, CRL-3216™), and U2OS (ATCC HTB-96) cells were maintained in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% fetal bovine serum (FBS), 100 units/mL penicillin, and 100 units/mL streptomycin. K562 (ATCC CCL-243) cells were maintained in Roswell Park Memorial Institute (RPMI) 1640 medium supplemented with 10% FBS, 100 units/mL penicillin, and 100 units/mL streptomycin. Normal fibroblasts [ThermoFisher, Human Dermal Fibroblasts (C0045C)] were maintained in DMEM supplemented with 20% FBS, 100 units/mL penicillin, and 100 units/mL streptomycin.
HeLa and HEK293T cells were transfected with Lipofectamine 2000 (Invitrogen). Before transfection, 1 × 10⁵ cells per well were seeded in 24-well plates. SpCas9 expression plasmid (750 ng) and sgRNA expression plasmid (250 ng) were mixed with 100 µl of Opti-MEM medium and 2 µl of Lipofectamine 2000 and incubated for 20 min at room temperature. The mixture was then added to the seeded wells. After 24 hours, the culture medium was replaced with fresh medium. U2OS and K562 cells were transfected with the Neon transfection system (Invitrogen). Cells (2.5 × 10⁵) were transfected with Cas9 expression plasmid (750 ng) and sgRNA expression plasmid (250 ng) using the following parameters: 1050 V, 30 ms, 2 pulses for U2OS and 1350 V, 10 ms, 4 pulses for K562. Normal fibroblasts were transfected with the Amaxa P3 Primary Cell 4D-Nucleofector Kit using program DS-137. All cells were analyzed 3 days after transfection.
Cell culture and transfection for H9 cells
H9 human embryonic stem cells were maintained in Essential 8 (E8) medium (Gibco A1517001) on iMatrix-511 (Matrixome, 892 021). For subculturing, H9 cells were dissociated into clusters using ReLeSR (Stemcell Tech., 05873) and replated in E8 medium supplemented with the p160-Rho-associated coiled-coil kinase (ROCK) inhibitor Y-27632. Before electroporation, TrypLE (Gibco, 12604013) was used to generate a single-cell suspension. Cells (1 × 10⁵) were electroporated with 250 ng of sgRNA-encoding plasmid and 750 ng of Cas9 expression plasmid using the Neon system (ThermoFisher) at 1050 V for 30 ms (two pulses). Cells were then seeded in 48-well plates in E8 supplemented with Y-27632 (10 µM) for 24 hours. After three days of culture, gDNA was isolated.
Generation of Cas9 ribonucleoprotein (RNP) complexes for CRISPRi screening
The sgRNA targeting the puromycin resistance gene was synthesized by in vitro transcription using T7 RNA polymerase (NEB) and template oligos, and was purified using the RNeasy Mini Kit (Qiagen). Streptococcus pyogenes Cas9 (SpCas9) protein was purchased from Enzynomics. To form Cas9 RNP complexes, SpCas9 and sgRNA were mixed at a 1:3 ratio and incubated at room temperature for 30 min. The Cas9 RNP complexes were delivered into the CRISPRi-stable HeLa cell line after lentiviral library transduction and puromycin selection for CRISPRi screening.
Isolation, culture and editing of human primary T cells
Whole blood samples from healthy donors were collected under a protocol approved by the committee of Asan Medical Center. Peripheral blood mononuclear cells (PBMCs) were isolated from the whole blood samples using SepMate PBMC isolation tubes (STEMCELL Technologies). Human primary T cells were then isolated from the PBMCs using a MACS-based Pan T cell isolation kit (Miltenyi Biotec). Human primary T cells were cultured with IL-2 (300 IU/ml, BMI KOREA) in RPMI-1640 (Gibco) supplemented with fetal bovine serum (10%, Gibco), GlutaMAX (2 mM, Gibco), sodium pyruvate (1 mM, Gibco), non-essential amino acids (0.1 mM, Gibco), beta-mercaptoethanol (55 µM, Gibco), HEPES (10 mM, Sigma), and penicillin-streptomycin (1%, Gibco).
For gene editing, human primary T cells were stimulated with Dynabeads Human T-Activator CD3/CD28 (Thermo Fisher Scientific) at a cell-to-bead ratio of 1:1 for 48 h. After the beads were removed with a magnet, the stimulated T cells were electroporated with the Neon transfection system (Thermo Fisher). Briefly, 5 µg of recombinant Cas9 (Enzynomics) and 5 µg of in vitro-transcribed sgRNA were incubated at 37°C for 10 min immediately before electroporation to form the Cas9 RNP complex. The assembled RNPs were added to 0.5 million activated human T cells resuspended in T buffer and electroporated with the Neon device (1400 V, 10 ms, 3 pulses). Electroporated cells were transferred into culture vessels containing antibiotic-free culture medium. One day after electroporation, the medium was replaced with fresh medium containing antibiotics, and cells were maintained at approximately 1 million cells per ml.
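The RNP recipe above specifies equal masses (5 µg each) of Cas9 and sgRNA. Because proteins and RNAs differ greatly in molecular weight, it can help to convert these masses to molar amounts. The sketch below does that under stated assumptions that are not in the text: SpCas9 taken as roughly 160 kDa, the sgRNA taken as about 100 nt, and the common approximation ssRNA MW ≈ 320.5 Da × nt + 159 Da. Under these assumptions the mix works out to roughly a 1:5 Cas9:sgRNA molar ratio.

```python
# Assumed values (NOT stated in the methods text); adjust to the actual reagents.
CAS9_DA = 160_000.0      # approximate SpCas9 mass incl. NLS/tags (assumption)
SGRNA_NT = 100           # typical SpCas9 sgRNA length in nucleotides (assumption)
RNA_DA_PER_NT = 320.5    # average ssRNA residue mass (approximation)
RNA_DA_END = 159.0       # end-group correction often added for ssRNA (approximation)


def pmol_from_ug(mass_ug: float, mw_da: float) -> float:
    """Convert micrograms to picomoles: moles = grams / (grams per mole)."""
    return mass_ug * 1e-6 / mw_da * 1e12


sgrna_da = SGRNA_NT * RNA_DA_PER_NT + RNA_DA_END
cas9_pmol = pmol_from_ug(5.0, CAS9_DA)      # 5 ug Cas9 from the text
sgrna_pmol = pmol_from_ug(5.0, sgrna_da)    # 5 ug sgRNA from the text
print(f"Cas9 {cas9_pmol:.1f} pmol, sgRNA {sgrna_pmol:.1f} pmol, "
      f"sgRNA:Cas9 = {sgrna_pmol / cas9_pmol:.2f}")
```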
Long-range amplicon sequencing
PCR primers were designed using Primer3Plus57 (https://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) or Primer-BLAST58 (https://www.ncbi.nlm.nih.gov/tools/primer-blast). The targeted region (~8 to 15 kb) of gDNA was amplified using KOD multi & epi DNA polymerase (TOYOBO) according to the manufacturer’s protocol. For regions that were difficult to amplify, alternative primers were used or the gDNA was re-extracted. Amplified products (1 µg) were purified using AMPure XP beads (Beckman Coulter) at a 0.95× bead-to-sample ratio. The purified DNA was fragmented to ~300 bp with an M220 Focused-ultrasonicator (Covaris) according to the manufacturer’s protocol. The fragmented samples were purified with the Expin™ PCR SV kit (GeneAll) and prepared as NGS libraries with the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® (NEB). The libraries were sequenced on a MiniSeq using the MiniSeq High Output Reagent Kit (300 cycles) to obtain approximately 400,000 to 500,000 reads. FASTQ files from CRISPR-treated and non-treated cells were analyzed using our k-mer alignment program.
Development of the k-mer alignment program
We developed a k-mer-based algorithm for detecting CRISPR-induced DNA alterations that is efficient enough to run on a personal computer with limited computational resources, without requiring a supercomputer. For the cases investigated here, we used a personal computer with 16 GB of memory and an 8-core 3.4-GHz CPU. The program takes as input a 10-kbp reference sequence that includes the cleavage site, together with paired-end sequencing data in FASTQ format from both CRISPR-treated and non-treated samples. It outputs read alignment results and quantifies CRISPR-induced large deletions and small indels. The program consists of three steps: (i) short-read alignment, (ii) short-read classification, and (iii) removal of false-positive large deletions (Supplementary Fig. 4).
The first step, short-read alignment, uses a k-mer hash table constructed from the reference sequence. For a read of length \(l\), the hash table returns the reference positions of each of its \(l-k+1\) k-mers. To determine the alignment of a read, the program identifies the longest run of consecutive, overlapping k-mers that map in order along the reference sequence, using the Longest Increasing Subsequence (LIS) algorithm.
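The alignment step can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, only k-mers that occur once in the reference are used (an assumption made for simplicity), and the LIS is the standard O(n log n) patience-sorting variant. A jump in the diagonal (reference position minus read offset) between consecutive k-mers in the chain marks a deletion in the read.

```python
import random
from bisect import bisect_left


def build_kmer_index(reference, k):
    """Hash table mapping each k-mer to its 0-based reference positions."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def longest_increasing_subsequence(values):
    """Indices of one strictly increasing subsequence of maximum length."""
    tails, tail_idx = [], []      # smallest tail value / its index per length
    prev = [-1] * len(values)     # back-pointers for reconstruction
    for i, v in enumerate(values):
        j = bisect_left(tails, v)
        if j == len(tails):
            tails.append(v)
            tail_idx.append(i)
        else:
            tails[j] = v
            tail_idx[j] = i
        prev[i] = tail_idx[j - 1] if j > 0 else -1
    chain, i = [], (tail_idx[-1] if tail_idx else -1)
    while i != -1:
        chain.append(i)
        i = prev[i]
    return chain[::-1]


def align_read(read, index, k):
    """(read_offset, ref_pos) pairs of the LIS chain over the read's l-k+1 k-mers."""
    hits = []
    for off in range(len(read) - k + 1):
        positions = index.get(read[off:off + k])
        if positions is not None and len(positions) == 1:  # unique k-mers only (assumption)
            hits.append((off, positions[0]))
    chain = longest_increasing_subsequence([ref for _, ref in hits])
    return [hits[i] for i in chain]


def deletions_from_chain(chain):
    """(ref_start, length) wherever the diagonal ref_pos - read_offset jumps upward."""
    deletions = []
    for (o1, r1), (o2, r2) in zip(chain, chain[1:]):
        jump = (r2 - o2) - (r1 - o1)
        if jump > 0:
            deletions.append((r1 + (o2 - o1), jump))
    return deletions


# Usage on a synthetic 1-kb reference (illustrative data only)
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(1000))
K = 20
index = build_kmer_index(reference, K)
deleted_read = reference[100:150] + reference[400:450]  # removes reference[150:400]
chain = align_read(deleted_read, index, K)
print(deletions_from_chain(chain))
```

If the two breakpoints share a few identical bases (microhomology), the reported start can shift left by that many bases, while the deletion length stays the same.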
The second step is short-read classification. Based on the alignment results, the program assigns each read pair to one of three classes: (i) both reads lie to one side of the cleavage site; (ii) both reads map without splitting and the cleavage site falls between the two reads; and (iii) one of the reads is split and spans the cleavage site. Read pairs in class (i) are considered wild type. Read pairs in classes (ii) and (iii) are considered potential CRISPR-derived variants and are compiled into a candidate list.
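A minimal sketch of the three classes, based on a literal reading of the text. Each read is represented as a sorted list of aligned reference segments `(start, end)` (a hypothetical format; one segment means an unsplit read), and the cleavage site is a single reference coordinate. An unsplit read that spans the cut falls into class (i) here, which is an interpretive assumption.

```python
WILDTYPE = "i_wildtype"
CUT_BETWEEN_READS = "ii_cut_between_reads"
SPLIT_READ_SPANS_CUT = "iii_split_read_spans_cut"


def read_span(segments):
    """Leftmost start and rightmost end of a read's aligned segments."""
    return segments[0][0], segments[-1][1]


def classify_pair(read1, read2, cut):
    """Assign a read pair to class (i), (ii) or (iii) relative to the cut site."""
    # (iii): a split read whose aligned span passes through the cut site
    for segments in (read1, read2):
        start, end = read_span(segments)
        if len(segments) > 1 and start < cut < end:
            return SPLIT_READ_SPANS_CUT
    # (ii): both reads unsplit and the cut site lies in the gap between them
    if len(read1) == 1 and len(read2) == 1:
        left, right = sorted((read_span(read1), read_span(read2)))
        if left[1] <= cut <= right[0]:
            return CUT_BETWEEN_READS
    return WILDTYPE


def candidate_pairs(pairs, cut):
    """Classes (ii) and (iii) form the variant-candidate list."""
    return [pair for pair in pairs if classify_pair(*pair, cut) != WILDTYPE]
```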
The third step eliminates false-positive large deletions. Variant-candidate read pairs can also appear in non-treated samples, primarily because of biases inherent to PCR amplification. These artifacts are not specific to CRISPR treatment; they arise from features of the reference sequence and therefore appear in both CRISPR-treated and non-treated datasets. To reduce their influence, the program represents each candidate read pair by its left and right mapping positions on the reference sequence and groups these candidates by k-means clustering. The number of clusters (k) is chosen automatically from the data using the Bayesian information criterion. Clusters found in both the CRISPR-treated and non-treated datasets are attributed to PCR bias rather than CRISPR-induced variation and are excluded from the candidate list. The remaining read pairs in the candidate list are identified as CRISPR-derived variants.
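The filtering step can be sketched in pure Python as follows. The text specifies k-means with BIC-based selection of k but not the exact BIC formula or how clusters are judged "common". This sketch therefore assumes plain Lloyd's k-means, a spherical-Gaussian BIC (up to additive constants, with the variance floored at 1 bp² so that identical breakpoints do not give log(0)), and a hypothetical rule that a treated cluster is shared when its centroid lies within `tol` bp of a non-treated centroid.

```python
import math
import random


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def _assign(points, centroids):
    return [min(range(len(centroids)), key=lambda c: _sqdist(p, centroids[c])) for p in points]


def kmeans(points, k, seed=0, iters=100):
    """Lloyd's k-means with initial centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [tuple(float(x) for x in p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = _assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(_mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, _assign(points, centroids)


def bic(points, centroids, labels):
    """Spherical-Gaussian BIC up to constants; lower is better (assumed form)."""
    n, d = len(points), len(points[0])
    rss = sum(_sqdist(p, centroids[lab]) for p, lab in zip(points, labels))
    variance = max(rss / (n * d), 1.0)  # floor at 1 bp^2
    return n * d * math.log(variance) + len(centroids) * d * math.log(n)


def best_kmeans(points, k_max=8):
    """Run k-means for k = 1..k_max and keep the lowest-BIC clustering."""
    best = None
    for k in range(1, min(k_max, len(points)) + 1):
        centroids, labels = kmeans(points, k)
        score = bic(points, centroids, labels)
        if best is None or score < best[0]:
            best = (score, centroids, labels)
    return best[1], best[2]


def remove_shared_clusters(treated, control, k_max=8, tol=20.0):
    """Drop treated candidates whose cluster also appears in the non-treated sample."""
    centroids, labels = best_kmeans(treated, k_max)
    control_centroids = best_kmeans(control, k_max)[0] if control else []
    shared = {c for c, cen in enumerate(centroids)
              if any(math.sqrt(_sqdist(cen, other)) <= tol for other in control_centroids)}
    return [p for p, lab in zip(treated, labels) if lab not in shared]
```

Here each candidate is a `(left_position, right_position)` tuple. Clustering the two samples separately and then matching centroids is one simple way to find co-occurring clusters; pooling both samples before clustering would be an equally plausible design.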
Among the curated candidate variants, read pairs with a deletion of 50 bp or more are classified as large deletions. Read pairs with insertions or deletions within ±13 bp of the cleavage site are classified as small indels. For quantification, the program counts normal mappings, small indels, and large deletions among read pairs passing through the cleavage site (±13 bp), and then calculates the relative frequencies of large deletions, small insertions, and small deletions. Let \(N_{T}\) be the total number of reads passing the cut site (±13 bp), including large deletions; \(N_{L}\) the number of large deletions; and \(N_{S}\) and \(N_{D}\) the numbers of reads with small insertions and small deletions, respectively. The relative frequency of large deletions is \(N_{L}/N_{T}\), and that of small indels is \((N_{S}+N_{D})/N_{T}\). Deletion analysis was performed with a local version of the k-mer alignment program equipped with a web front-end.
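The quantification rules can be written out as a short sketch. The 50-bp and ±13-bp thresholds and the two ratios come from the text. The per-read event format `(kind, start, length)` and the choice to count indels outside the window as normal mappings are assumptions made for illustration.

```python
from collections import Counter

LARGE_DELETION_MIN_BP = 50  # from the text: deletions >= 50 bp are large deletions
CUT_WINDOW_BP = 13          # from the text: +/- 13 bp around the cleavage site


def classify_read(event, cut_site):
    """event: None (no indel) or (kind, start, length), kind in {'ins', 'del'} (hypothetical)."""
    if event is None:
        return "normal"
    kind, start, length = event
    if kind == "del" and length >= LARGE_DELETION_MIN_BP:
        return "large_deletion"
    end = start + length if kind == "del" else start
    if start <= cut_site + CUT_WINDOW_BP and end >= cut_site - CUT_WINDOW_BP:
        return "small_insertion" if kind == "ins" else "small_deletion"
    return "normal"  # indel outside the window (assumed to count as a normal mapping)


def relative_frequencies(events, cut_site):
    """N_L / N_T and (N_S + N_D) / N_T over reads passing the cut-site window."""
    counts = Counter(classify_read(e, cut_site) for e in events)
    n_t = sum(counts.values())  # N_T includes large deletions
    return {
        "large_deletion": counts["large_deletion"] / n_t,
        "small_indel": (counts["small_insertion"] + counts["small_deletion"]) / n_t,
    }


events = [None] * 6 + [("del", 990, 300)] * 2 + [("ins", 1002, 1), ("del", 995, 8)]
print(relative_frequencies(events, cut_site=1000))
```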
CRISPRi library construction
Our custom CRISPRi library was constructed in pAX198 (Addgene #173042), which contains pU6-sgRNA-EF1a-Puro-T2A-BFP. Based on the Repair-seq dataset and the DNA repair-associated genes listed on the Human Protein Atlas website, we selected 794 DNA repair genes and grouped them by repair process or pathway. For each gene, three CRISPRi gRNA sequences were taken from the hCRISPRi-v2.1 library developed by the Weissman group (Supplementary Table 2), and 60 non-targeting gRNAs from Repair-seq were added to the library. The oligonucleotide library was synthesized by GenScript (USA) and amplified using Phusion® High-Fidelity DNA Polymerase (NEB). The amplified product and the pAX198 plasmid were digested with FastDigest Bpu1102I and FastDigest BstXI (ThermoFisher). The desired pAX198 backbone fragment was separated on a 1% agarose gel and purified using the Expin™ Gel SV kit (GeneAll). The digested oligo library was resolved on a 10% polyacrylamide gel, and the fragment of the expected size was excised and purified by isopropanol precipitation. The digested oligo library and plasmid backbone were ligated using T4 DNA ligase, and the ligation mixture was purified with AMPure XP beads. The ligated product was transformed into MegaX DH10B T1R Electrocomp™ Cells (ThermoFisher) using a MicroPulser Electroporator (BioRad). After more than 90,000 colonies were confirmed, the plasmid library was isolated using the NucleoBond Xtra Midi EF kit (Macherey-Nagel). The oligo library was verified by nested PCR and Illumina sequencing.
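The library size and the transformation coverage implied by the numbers above can be checked with simple arithmetic. The gene, guide, control, and colony counts are from the text. Treating coverage as colonies per gRNA assumes all gRNAs are distinct and evenly represented, which is an idealization.

```python
N_GENES = 794           # DNA repair genes (from the text)
GUIDES_PER_GENE = 3     # hCRISPRi-v2.1 gRNAs per gene (from the text)
N_NON_TARGETING = 60    # non-targeting controls from Repair-seq (from the text)
MIN_COLONIES = 90_000   # lower bound on colonies after transformation (from the text)


def library_size(n_genes: int, guides_per_gene: int, n_controls: int) -> int:
    """Total number of gRNAs in the library."""
    return n_genes * guides_per_gene + n_controls


def colony_coverage(colonies: int, n_guides: int) -> float:
    """Mean independent transformants per gRNA (assumes uniform representation)."""
    return colonies / n_guides


n_guides = library_size(N_GENES, GUIDES_PER_GENE, N_NON_TARGETING)
coverage = colony_coverage(MIN_COLONIES, n_guides)
print(n_guides, round(coverage, 1))  # 2442 36.9
```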
Lentivirus preparation
Lentivirus was produced in the Lenti-X 293T cell line (Takara Bio). Cells were transfected with psPAX2 (Addgene #12260), pMD2.G (Addgene #12259), and library plasmids using polyethyleneimine (Sigma-Aldrich). The culture medium was replaced one day after transfection. Two days later, the lentivirus-containing medium was harvested and filtered through a 0.45-µm syringe filter. The lentivirus was concentrated using the Lenti-X Concentrator (Takara Bio). The viral titer was determined by transducing cells with varying amounts of virus in a 48-well plate format. After titration, the lentiviral library was aliquoted and stored at −80°C.
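The text does not state how the titer was calculated from the dilution series. A common approach, assumed here and not confirmed by the source, treats integrations as Poisson-distributed. The fraction of transduced cells f (for example, BFP-positive or puromycin-resistant cells; the readout is an assumption) then gives MOI = −ln(1 − f). Multiplying by the number of cells and dividing by the virus volume gives transducing units per mL.

```python
import math


def moi_from_fraction(fraction_transduced: float) -> float:
    """Poisson model: P(>=1 integration) = 1 - exp(-MOI), so MOI = -ln(1 - f)."""
    if not 0.0 <= fraction_transduced < 1.0:
        raise ValueError("fraction_transduced must be in [0, 1)")
    return -math.log(1.0 - fraction_transduced)


def fraction_multiply_infected(moi: float) -> float:
    """Fraction of all cells receiving two or more integrations under the same model."""
    return 1.0 - math.exp(-moi) - moi * math.exp(-moi)


def functional_titer(cells_at_transduction: float, fraction_transduced: float,
                     virus_volume_ml: float) -> float:
    """Transducing units per mL = cells x MOI / volume of virus added."""
    return cells_at_transduction * moi_from_fraction(fraction_transduced) / virus_volume_ml
```

Dilutions that give a low transduced fraction are usually the most informative, because the Poisson correction is small there and single integrations dominate.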
CRISPRi screening with Nanopore sequencing
HeLa CRISPRi cells were generated by lentiviral transduction (MOI of ~3 to 5) with the dCas9-KRAB-blast plasmid (Addgene #89567), followed by single-cell isolation. Before library transduction, HeLa CRISPRi cells were seeded at 5 × 10⁵ cells per 100-mm dish. The following day, the sgRNA library lentivirus was added to the cells in the presence of 8 µg/mL polybrene. After 24 hours, the medium was replaced with fresh medium. After another 24 hours, selection with 2 µg/mL puromycin was started and continued for 2 to 3 days. After selection, the medium was replaced with fresh medium and the cells were cultured for 6 to 8 days to allow gene repression. The Cas9 RNP complex targeting the transduced puromycin resistance gene was then transfected into the library-transduced CRISPRi stable cell line using the Neon transfection system (Fig. 3a). Three days after transfection, gDNA was extracted using the NucleoSpin Blood XL, Maxi kit (Macherey-Nagel). Throughout the culture period, cells were passaged whenever they reached ~90% confluence.
Half of the extracted gDNA was amplified with KOD multi & epi DNA polymerase to generate fragments of ~5 to 6 kb, which were then purified with AMPure XP beads. The purified samples were sequenced on a MinION (Oxford Nanopore) using Ligation Sequencing Kit V14 and an R10.4.1 flow cell (Oxford Nanopore) according to the manufacturer’s protocol. Sequencing was run at a translocation speed of 260 bases per second, and basecalling was performed with Guppy (Oxford Nanopore) in super-accuracy mode.
Analysis for CRISPRi screening with Nanopore sequencing
FASTQ files were aligned to the reference genome using the Guppy aligner with default settings. gRNA sequences within the sequencing data were identified with BWA-MEM. A gRNA reference FASTA file was constructed by adding 10 bp of flanking reference sequence to both ends of each gRNA sequence, and this file was indexed with BWA. The gRNA region of each read was extracted and aligned with BWA-MEM using the parameters “-k10 -A4 -B2 -O2”, and the reads were written to separate files according to their gRNA. To ensure data accuracy, we discarded reads in which the sequences downstream of the gRNA and of the blue fluorescent protein (BFP) gene did not align. A deletion was classified as a large deletion if it was longer than 100 bp and spanned a region within 100 bp of the cleavage site. Because Nanopore sequencing is biased by DNA fragment length (Supplementary Fig. 1), each read was weighted by the ratio of its fragment length to the reference length instead of being counted once, so that a read’s contribution decreases as its deletion becomes longer. The large-deletion Z-score of each gRNA was calculated relative to the results of the 60 non-targeting gRNAs.
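The large-deletion call, length weighting, and Z-score can be sketched as below. The 100-bp thresholds, the fragment-to-reference length ratio, and the use of non-targeting gRNAs as the reference distribution come from the text. The following are assumptions: fragment length approximated as reference length minus deletion length, "within 100 bp of the cleavage site" read as overlap with cut ± 100 bp, normalization by the summed weight of all reads, and sample standard deviation for the Z-score. Function names and the read format are hypothetical.

```python
from statistics import mean, stdev

LARGE_DEL_MIN_BP = 100  # from the text: deletions longer than 100 bp
CUT_PROXIMITY_BP = 100  # from the text: must span a region within 100 bp of the cut


def is_large_deletion(del_start, del_len, cut):
    """Deletion > 100 bp whose interval overlaps cut +/- 100 bp (interpretation)."""
    del_end = del_start + del_len
    return (del_len > LARGE_DEL_MIN_BP
            and del_start <= cut + CUT_PROXIMITY_BP
            and del_end >= cut - CUT_PROXIMITY_BP)


def weighted_large_deletion_fraction(reads, ref_len, cut):
    """reads: (del_start, del_len) tuples, or None for reads without a deletion."""
    total = large = 0.0
    for read in reads:
        del_len = read[1] if read else 0
        weight = (ref_len - del_len) / ref_len  # fragment length / reference length
        total += weight
        if read and is_large_deletion(read[0], read[1], cut):
            large += weight
    return large / total


def z_scores(fractions_by_grna, non_targeting_fractions):
    """Z-score of each gRNA against the non-targeting gRNA distribution."""
    mu, sd = mean(non_targeting_fractions), stdev(non_targeting_fractions)
    return {grna: (f - mu) / sd for grna, f in fractions_by_grna.items()}
```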
Knock-out cell line generation
To generate HeLa knockout cell lines for three genes (LIG4, POLQ, and RAD52), 750 ng of Cas9 expression plasmid and 250 ng of sgRNA expression plasmid targeting an upstream exon of each gene were transfected into HeLa cells with 2 µl of Lipofectamine 2000 reagent (Invitrogen). After 72 hours, CRISPR-treated HeLa cells were seeded as single cells into 96-well plates. The clones were cultured for two weeks, and each genotype was confirmed by sequencing on an Illumina MiniSeq. The MiniSeq results were analyzed using Cas-Analyzer (http://www.rgenome.net/cas-analyzer/)59. For complete knockout, cell lines harboring a frameshift mutation in both alleles were selected.
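The clone-selection criterion reduces to a simple rule: an indel causes a frameshift when its net length is not a multiple of 3, and a clone qualifies only if every allele carries one. The sketch below encodes that rule. The per-clone genotype format (net indel length per allele, insertions positive and deletions negative) is hypothetical and is not Cas-Analyzer's output format.

```python
def is_frameshift(net_indel_len: int) -> bool:
    """Net indel length not divisible by 3 shifts the reading frame."""
    return net_indel_len % 3 != 0


def is_complete_knockout(allele_indels: list[int]) -> bool:
    """At least two alleles observed and every allele carries a frameshift."""
    return len(allele_indels) >= 2 and all(is_frameshift(n) for n in allele_indels)


# Hypothetical clone genotypes: net indel length per allele
clones = {"clone_A": [-1, 1], "clone_B": [-3, 1], "clone_C": [-7, -2]}
selected = [name for name, alleles in clones.items() if is_complete_knockout(alleles)]
print(selected)  # ['clone_A', 'clone_C']
```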
M4344 toxicity and effect on CRISPR-induced large deletion events
To inhibit ATR in HeLa cells, we used M4344 (Selleckchem, S9639) according to the manufacturer’s protocol. HeLa cells (1 × 10⁵) were seeded in 24-well plates. After 24 hours, the cells were exposed to M4344 (1, 5, 10, 25, or 50 nM) for 1 hour, after which 750 ng of Cas9 and 250 ng of sgRNA expression plasmids were transfected into the cells with 2 µl of Lipofectamine 2000 reagent (Invitrogen). The cells were maintained in M4344-containing medium throughout and were detached for genomic DNA extraction 72 hours after transfection.
Microhomology-dependent deletion events
Alignment information for deletion reads was extracted from the SAM files produced by the k-mer alignment program. The deletion positions and their flanking sequences were determined from this alignment information. Homology length was calculated by comparing the two flanking sequences base by base, extending in 1-bp steps from the start or end position of the deletion. Reads with a homology length of 2 to 16 bp were categorized as microhomology-dependent.
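A minimal sketch of the homology-length calculation for a deletion that removes `ref[del_start:del_end]`. It extends in 1-bp steps forward from the deletion start (against the sequence just after the deletion) and backward from the base before the deletion (against the last deleted bases), then sums the two. Summing the directions is an assumption chosen so that the result is the same for every equivalent placement of an ambiguous deletion. The 2 to 16 bp cutoff is from the text.

```python
MIN_MH_BP, MAX_MH_BP = 2, 16  # from the text: 2-16 bp homology => microhomology-dependent


def microhomology_length(ref: str, del_start: int, del_end: int) -> int:
    """Length of sequence repeated at both junctions of the deletion ref[del_start:del_end]."""
    del_len = del_end - del_start
    forward = 0  # deleted bases vs. bases immediately after the deletion
    while (forward < del_len and del_end + forward < len(ref)
           and ref[del_start + forward] == ref[del_end + forward]):
        forward += 1
    backward = 0  # bases before the deletion vs. last deleted bases
    while (backward < del_len and del_start - 1 - backward >= 0
           and ref[del_start - 1 - backward] == ref[del_end - 1 - backward]):
        backward += 1
    return min(forward + backward, del_len)


def is_microhomology_dependent(mh_len: int) -> bool:
    return MIN_MH_BP <= mh_len <= MAX_MH_BP


# Synthetic example: the 4-bp repeat "GCTA" flanks a 14-bp deletion.
ref = "CCCCC" + "GCTA" + "T" * 10 + "GCTA" + "GGGGG"
print(microhomology_length(ref, 5, 19))  # 4
```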
Transfection for base editing and prime editing
HEK293T cells (1 × 10⁵ cells per well) were cultured in a 24-well plate for 24 hours. A mixture of 0.5 µl jetOPTIMUS reagent (Polyplus, 101000006) and either 500 ng of plasmid DNA (375 ng BE expression plasmid and 125 ng sgRNA expression plasmid) or 543 ng of plasmid DNA (365 ng PE expression plasmid, 125 ng pegRNA expression plasmid, and 43 ng ngRNA expression plasmid) was added to the cells. After 72 hours, gDNA was isolated.