Sequencing by Avidity Enables High Accuracy With Low Reagent Consumption

doi:10.21203/rs.3.rs-1965701/v1


This work is licensed under a CC BY 4.0 License

Journal Publication

published 25 May, 2023

Read the published version in Nature Biotechnology →

Version 1


We present a novel sequencing chemistry implemented as part of the AVITI system. Relying on the proximal DNA binding sites created through DNA amplification on a solid support, avidity sequencing uses multivalent nucleotide ligands on dye-labeled cores to simultaneously form polymerase-polymer nucleotide complexes bound to clonal copies of DNA targets. These polymer-nucleotide substrates, termed avidites, decrease the required concentration of reporting nucleotides by 100x and yield a negligible dissociation rate. We demonstrate the use of avidites within a novel sequencing technology that surpasses Q40 accuracy and enables a diversity of applications that include single cell RNA-seq and whole human genome sequencing.


Over the past 15 years, next generation sequencing (NGS) methods have become popular across a broad set of applications [1–8]. A number of sequencing chemistries have been introduced during this time, each having various strengths and limitations [9]. The chemistries vary across multiple dimensions including accuracy, read length, run time, and cost. The most widely used methods utilize highly parallel and accurate short read sequencing such as described in Bentley et al. and termed sequencing by synthesis (SBS) [10]. We present a novel sequencing chemistry that addresses some of the limitations of SBS and enables increased accuracy while decreasing key reagent concentrations.

The SBS methodology sequences DNA by controlled (i.e., one at a time) incorporation of a labeled nucleotide that can be read before removal of the label moiety and the subsequent round of sequencing [11]. To ensure that only a single incorporation occurs, a structural modification (“blocking group”) of the labeled nucleotides is required. The blocking group and label must be removable under reaction conditions that do not interfere with the integrity of the DNA being sequenced. The sequencing cycle is repeated with the incorporation of the next blocked and labeled nucleotide. To be of practical use, the entire process should consist of high-yielding, highly specific chemical and enzymatic steps to facilitate multiple cycles of sequencing. Typically, micromolar nucleotide concentrations are needed to reach the maximum rate of incorporation and drive each incorporation step to completion [12–16].

The concept of avidity can be described as multivalent ligands tethered in close proximity that can simultaneously bind to their targets. Coincident binding increases affinity and residence time when multivalent ligands bind to their target sites [17]. As an example of the dramatic impact avidity can have on both affinity and decreased dissociation rate, Zhang et al. demonstrated that by changing a monomeric nanobody to a pentameric nanobody, it is possible to achieve affinity gains and decrease dissociation rates by 3–4 orders of magnitude [18]. Element’s novel approach is to leverage avidity for nucleotide detection within sequencing chemistry.

With avidity sequencing technology, the detection step is separated from the controlled incorporation step. Prior to highly parallel sequencing, DNA fragments of interest are captured on the surface of a flowcell and multiple copies of the DNA fragment in close proximity are created through DNA amplification, forming a polony [19]. Relying on the proximal DNA binding sites created through DNA amplification on a solid support, avidity sequencing uses multivalent nucleotide ligands on dye-labeled cores to simultaneously form polymerase-polymer nucleotide complexes bound to clonal copies of DNA targets. A polymerase and a mixture of four avidites, each corresponding to a particular nucleotide, are used for base discrimination. The avidite is not incorporated, and its removal leaves no modifications in the synthesized strand. The avidites decrease the required concentration of reporting nucleotides by 100x and yield a negligible dissociation rate. The subsequent synthesis step proceeds using blocked but unlabeled nucleotides. We demonstrate the use of avidites within a novel sequencing technology that surpasses Q40 accuracy and enables a diversity of applications that include single cell RNA-seq and whole human genome sequencing.

Avidity sequencing utilizes multiple cycles of reagent delivery to generate sequencing reads from DNA of interest. This approach has two distinct phases: interrogation of an unknown base through avidity binding, and incorporation of a single nucleotide to step to the next base after the base identity has been determined (Fig. 1A). Separating signal generation from catalysis allows for more efficient use of substrate. Using an engineered sequencing polymerase with monovalent dye-labeled nucleotides, we observed a specificity constant (k_cat/K_m) of 0.54 ± 0.22 µM⁻¹ s⁻¹, resulting from a maximum rate of incorporation (k_pol) of 0.86 ± 0.14 s⁻¹ and an apparent Kd (K_d,app) of 1.6 ± 0.6 µM (Fig. 2A). This apparent Kd reflects the Km of a kinetic system not in equilibrium rather than the true Kd of the nucleotide substrate [20]. To achieve complete product turnover, this high apparent Kd can be overcome by using increased concentrations of fluorescent nucleotide substrate or by allowing a longer incorporation time for the reaction to complete. Both paths have undesirable consequences. Raising the substrate concentration to drive the incorporation reaction to completion increases reagent costs. Allowing longer incorporation times instead results in longer cycle times, which have an additive effect over 300 cycles of stepwise sequencing.
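To make the concentration versus time trade-off concrete, the following minimal Python sketch evaluates the hyperbolic rate law k_obs = k_pol[S]/(K_d,app + [S]) with the central values of k_pol and K_d,app reported above. The function names, the example concentrations, and the 99% completion target are illustrative assumptions rather than part of the reported analysis, and the single-exponential time course is a simplification.

```python
import math

K_POL = 0.86   # s^-1, maximum incorporation rate reported in the text
KD_APP = 1.6   # uM, apparent Kd reported in the text


def specificity_constant(k_pol=K_POL, kd_app=KD_APP):
    """Specificity constant k_pol / K_d,app in uM^-1 s^-1."""
    return k_pol / kd_app


def observed_rate(conc_um, k_pol=K_POL, kd_app=KD_APP):
    """Observed incorporation rate (s^-1) at a nucleotide concentration in uM."""
    return k_pol * conc_um / (kd_app + conc_um)


def time_to_completion(conc_um, fraction=0.99):
    """Seconds for a single-exponential reaction to reach `fraction` completion."""
    return -math.log(1.0 - fraction) / observed_rate(conc_um)


if __name__ == "__main__":
    print(f"specificity constant: {specificity_constant():.2f} uM^-1 s^-1")
    # Example concentrations (uM) are hypothetical, chosen to span K_d,app.
    for conc in (0.1, 1.6, 16.0):
        print(f"{conc:5.1f} uM: k_obs = {observed_rate(conc):.3f} s^-1, "
              f"t99 = {time_to_completion(conc):.1f} s")
```

At a concentration equal to K_d,app the observed rate is half of k_pol, so lowering the substrate concentration below that point lengthens every cycle roughly in proportion.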

Avidity sequencing of DNA relies on the 4-color fluorescent detection of unblocked bases within a polony, in which two or more copies of primed target nucleic acid sequence and two or more polymerases bind to the nucleic acid with a polymer-nucleotide conjugate under conditions sufficient to allow a multivalent binding complex. The result is a system that generates signal through binding rather than catalysis, establishing a binding equilibrium that reaches saturation, based on substrate concentration, in less than 30 seconds. The binding kinetics of this interaction were monitored using real-time data collection to observe avidites binding to polonies with an on-rate (k_on,avidite) of 271 ± 82 nM⁻¹ s⁻¹ (Fig. 2B). This on-rate is within error of that measured for a single fluorescently labeled monovalent nucleotide (Fig. 2C). Major differences were observed in the off-rate kinetics of avidite substrates versus monovalent nucleotides. Avidite substrates bound to the DNA polonies tightly, with no measurable off-rate over the >1 minute timescale needed for imaging and basecalling (Fig. 2D). This is in sharp contrast to the observed off-rates of fluorescently labeled monovalent nucleotides, which dissociate rapidly during the wash step following binding and then continue to dissociate during imaging (Fig. 2E). The negligible off-rate results in a decrease in Kd of more than two orders of magnitude for avidites compared to monovalent nucleotides. With negligible avidite off-rates, a persistent signal can be achieved without the presence of free avidites in bulk solution, eliminating background. Without avidity, dissociation kinetics with monovalent nucleotides show a 4x signal decrease at the beginning of imaging due to fast dissociation as a result of the disruption of the binding equilibrium during reagent exchange (Fig. 4E).
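The link between off-rate and Kd follows from the relation Kd = k_off/k_on for a simple one-step binding model: when on-rates are comparable, a 100-fold lower off-rate gives a 100-fold lower Kd. The sketch below illustrates that scaling and the signal retained after free ligand is washed away. The off-rate values are hypothetical placeholders (the avidite off-rate was too slow to measure), and treating a multivalent complex as a one-step equilibrium is a simplifying assumption.

```python
import math


def dissociation_constant(k_off, k_on):
    """Equilibrium Kd = k_off / k_on for a one-step binding model."""
    return k_off / k_on


def fraction_bound(k_off, seconds):
    """Fraction of complexes still bound `seconds` after free ligand is removed."""
    return math.exp(-k_off * seconds)


# Hypothetical off-rates (s^-1), chosen only to illustrate the scaling.
K_OFF_MONOVALENT = 0.05
K_OFF_AVIDITE = K_OFF_MONOVALENT / 100.0
K_ON = 1.0  # arbitrary shared on-rate; it cancels in the Kd ratio

kd_ratio = (dissociation_constant(K_OFF_MONOVALENT, K_ON)
            / dissociation_constant(K_OFF_AVIDITE, K_ON))

if __name__ == "__main__":
    print(f"Kd ratio (monovalent / avidite): {kd_ratio:.0f}")
    for label, k_off in (("monovalent", K_OFF_MONOVALENT),
                         ("avidite", K_OFF_AVIDITE)):
        print(f"{label}: fraction bound after 60 s = {fraction_bound(k_off, 60):.3f}")
```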

Following binding and imaging, the avidites are removed. The nascent DNA strand must be extended one base before avidity detection can be repeated to determine the next DNA base of the polony. This is achieved by removing the blocking group and incorporating the next unlabeled, blocked nucleotide. Completing the avidity sequencing cycle (Fig. 1A) results in a basecall for each polony and extends the sequencing primer one base, allowing the cycle to be repeated to determine the identity of the next unknown base. Fluorescent signals corresponding to base calls are observed for 150 cycles (Fig. 1C).

Accuracy of Avidity Sequencing

To evaluate the accuracy of avidity sequencing, 20 sequencing runs were performed using a well characterized human genome. The sequencing data was used to train quality tables according to the methods of Ewing et al. [21], but with modified predictors. The quality tables were then applied to independent sequencing runs. Figure 3 shows the data quality that was obtained in a representative run. The quality scores are well calibrated across the entire range, meaning that predicted quality matches observed quality as determined by alignment to a known reference. Virtually all the data is above Q30 (an average of one error per 1000 bp) and the majority of data is above Q40 (an average of one error per 10,000 bp). The sequence of base call assignments and quality scores across the cycles constitute the output of the run. This data is represented in standard FASTQ format for compatibility with downstream tools and applications.
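The Q30 and Q40 thresholds quoted above follow from the standard Phred definition Q = -10 log10(p), where p is the probability that a base call is wrong. A minimal sketch of the conversion in both directions is shown here; the function names are illustrative.

```python
import math


def error_probability(q):
    """Probability that a base call is wrong, given its Phred quality score."""
    return 10.0 ** (-q / 10.0)


def phred_quality(p):
    """Phred quality score corresponding to an error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)


if __name__ == "__main__":
    for q in (30, 40):
        p = error_probability(q)
        print(f"Q{q}: error probability {p:g}, about one error per {round(1 / p)} bases")
```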

Single Cell RNA Sequencing

To demonstrate sequencing performance across common applications, single cell RNA expression libraries were prepared and sequenced. Two libraries from a reference standard consisting of human peripheral blood mononuclear cells (PBMCs) were generated using the 10x Chromium instrument. The two libraries contain RNA from roughly 10,000 and 1,000 cells, respectively. After application of the Adept compatibility kit, the libraries were loaded onto an AVITI sequencer and sequenced to generate a paired-end data set. The analysis was done using CellRanger (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/installation). This reference standard is used by 10x to evaluate sequencing performance, so a set of metrics and guidelines to assess sequencing results is provided along with the biological material. Table 1 shows each metric, the guideline values from 10x, and the performance of each library on AVITI. All metrics are within the guided ranges, and the metrics pertaining to sequencing quality far exceed the thresholds provided.

Whole Human Genome Sequencing

Another common application is human whole genome sequencing. This application challenges sequencer accuracy to a greater extent than measuring gene expression because the latter requires only accurate alignment while the former depends on nucleotide accuracy to resolve variant calls. To demonstrate performance for this application, the well characterized human sample HG002 was prepared for sequencing using a Covaris shearing and PCR-free library preparation method and sequenced with 2x150bp reads.

A FASTQ file with the basecalls and quality scores was down-sampled to 35X coverage and used as an input into the DNAScope analysis pipeline from Sentieon [22]. Following alignment and variant calling, the variant calls were compared to the NIST Genome in a Bottle truth set v4.2.1 via the hap.py comparison framework [23]. Both SNP and indel calls were highly accurate, with F1 scores of 0.996 and 0.995, respectively. Table 2 shows variant calling performance for SNPs and small indels on the GIAB-HC regions. Sensitivity, precision, and F1-score are shown. SNP and indel performance are comparable.
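As a rough guide to what down-sampling to 35X involves, coverage can be approximated as total sequenced bases divided by genome size. The sketch below works through that arithmetic for 2x150 bp reads. The 3.0 Gb genome size and the 50X starting coverage are round-number assumptions for illustration only, not values from this study, and the function names are hypothetical.

```python
import math


def read_pairs_for_coverage(target_coverage, genome_size_bp, read_length_bp,
                            reads_per_pair=2):
    """Number of read pairs needed to reach a target mean coverage."""
    bases_needed = target_coverage * genome_size_bp
    return math.ceil(bases_needed / (read_length_bp * reads_per_pair))


def downsample_fraction(current_coverage, target_coverage):
    """Fraction of read pairs to keep when down-sampling to the target coverage."""
    return min(1.0, target_coverage / current_coverage)


if __name__ == "__main__":
    GENOME_SIZE_BP = 3_000_000_000  # assumed round-number human genome size
    pairs = read_pairs_for_coverage(35, GENOME_SIZE_BP, 150)
    print(f"read pairs for 35X at 2x150 bp: {pairs:,}")
    # A 50X starting coverage is a hypothetical example.
    print(f"fraction kept from 50X: {downsample_fraction(50, 35):.2f}")
```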

Although other chemistries have proposed to separate incorporation and signal generation [24], the avidite concept is new and comes with several advantages stemming from the fact that the multiple nucleotides on the avidite bind multiple copies of the DNA template within a polony. This binding-only approach interrogates the DNA without incorporation of a dye-labeled nucleotide to make a base call. These substrates decrease the required concentration of reporting nucleotides by 100x and yield a negligible dissociation rate. Furthermore, the avidite construct is modular. The core can be swapped for a different substrate. The number and type of dye molecules is configurable, and many types of linkers can be used. The changes are straightforward to implement and do not require modification to the polymerase responsible for binding the nucleotides attached to the linkers. The modular design speeds technology improvement as each component can be optimized in parallel for increased signal, decreased cycle time, lower reagent concentration, or any other potential axis of improvement.

The avidite chemistry described above has been implemented as part of the AVITI benchtop sequencing solution. The accuracy of the AVITI sequencer has been demonstrated by training a quality model on human sequencing data and showing that the majority of bases in an independent human whole genome sequencing run exceed Q40, or less than 1 error in 10,000 base pairs. The sequencer can be used on a wide range of applications, as exemplified by providing results for a single cell RNA-seq experiment and for whole human genome variant calling. In both cases, reference standards were sequenced so that the quality of the results could be assessed. The single cell data exceeded the quality metric guidelines provided by 10x. The human genome variant calling results showed high sensitivity and precision for both SNPs and small indels. The two applications were selected due to the availability of well-characterized samples and because they represent very different use cases. However, these are only examples, and many other applications have already been demonstrated internally and by our commercial partners. Notably, the current implementation of the avidity-based chemistry is relatively new. Although it already achieves high accuracy and broad applicability, there are many improvement directions being explored.

Table 1: Single cell expression: CellRanger metric values for 10K cell and 1K cell libraries from the PBMC reference

CellRanger v7.0 Metric	Performance expectation	AVITI 10K cells	AVITI 1K cells
Valid barcodes	>90%	97.5%	97.5%
Reads mapped confidently to exonic regions	>50%	53.0%	53.8%
Reads mapped confidently to transcriptome	>40%	74.7%	77.8%
Fraction reads in cells	>80%	95.5%	92.6%
Q30 bases in barcode	>85%	99.5%	99.5%
Q30 bases in RNA read	>75%	98.6%	98.8%
Mean reads per cell	>50,000	61,326	68,766
Median genes per cell	>1700	2,910	2,951
Estimated number of cells	+/-20%	8,513	922
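The last row of Table 1 is judged against a ±20% window rather than a fixed threshold. The short sketch below computes the relative deviation of each estimated cell count, assuming the nominal targets are the approximate library sizes of 10,000 and 1,000 cells given in the text. Those targets are an interpretation of "roughly 10,000 and 1,000 cells", not values stated in the table.

```python
def relative_deviation(observed, expected):
    """Signed fractional deviation of an observed count from its expected value."""
    return (observed - expected) / expected


# (observed estimate from Table 1, assumed nominal target)
LIBRARIES = {
    "10K cells": (8513, 10_000),
    "1K cells": (922, 1_000),
}

TOLERANCE = 0.20  # the +/-20% guideline from Table 1

for name, (observed, expected) in LIBRARIES.items():
    deviation = relative_deviation(observed, expected)
    status = "within" if abs(deviation) <= TOLERANCE else "outside"
    print(f"{name}: {deviation:+.1%} ({status} the guideline)")
```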

Table 2: Variant calling performance for HG002 on GIAB-HC regions

	Sensitivity	Precision	F1-Score
SNP	0.9939	0.9977	0.9958
Small indel	0.9928	0.9980	0.9954
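The F1-scores in Table 2 are the harmonic mean of sensitivity and precision, so they can be recomputed directly from the other two columns. The following sketch does so; the function and variable names are illustrative.

```python
def f1_score(sensitivity, precision):
    """Harmonic mean of sensitivity (recall) and precision."""
    return 2.0 * sensitivity * precision / (sensitivity + precision)


# (sensitivity, precision) as listed in Table 2
TABLE_2 = {
    "SNP": (0.9939, 0.9977),
    "Small indel": (0.9928, 0.9980),
}

for variant_type, (sensitivity, precision) in TABLE_2.items():
    print(f"{variant_type}: F1 = {f1_score(sensitivity, precision):.4f}")
```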

Solution measurements of nucleotide incorporation

Solution measurements of nucleotide kinetics were performed using dye-labeled nucleotides. DNA substrates for solution kinetic assays were prepared by annealing a 5’ FAM-labeled primer oligo, purchased from IDT and HPLC purified, with a template oligo. Annealing was performed with 10 percent excess template oligo in annealing buffer using a PCR machine to heat oligos to 95°C followed by slow cooling to room temperature over 60 minutes. Solution kinetics were performed by mixing a preformed enzyme-DNA complex with fluorescent nucleotide and MgSO₄ using an RQF3 Rapid Quench Flow (KinTek Corp.). Extension products were separated from unextended primer oligos by capillary electrophoresis using a 3500 Series Genetic Analyzer (ThermoFisher) to achieve single base resolution. Products were quantified and fit to a single exponential equation. The observed rates as a function of nucleotide concentration were then fit to a hyperbolic equation to derive an apparent K_d (K_d,app) and a rate of polymerization (k_pol).
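The second fitting step can be illustrated with a small example. The fitting software and procedure used in this work are not specified, so the sketch below substitutes a Hanes-Woolf linearization with ordinary least squares for a nonlinear hyperbolic fit, which keeps it dependency-free. The concentrations and rates are synthetic, noise-free values generated from the reported parameters and are not experimental data.

```python
def fit_hyperbola_hanes_woolf(concs, rates):
    """Estimate (k_pol, kd_app) from k_obs = k_pol*[S] / (kd_app + [S]).

    Uses the Hanes-Woolf form [S]/k_obs = [S]/k_pol + kd_app/k_pol and fits
    an ordinary least-squares line through the points ([S], [S]/k_obs).
    """
    xs = list(concs)
    ys = [s / k for s, k in zip(concs, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / k_pol
    intercept = mean_y - slope * mean_x  # equals kd_app / k_pol
    k_pol = 1.0 / slope
    kd_app = intercept * k_pol
    return k_pol, kd_app


# Synthetic, noise-free observed rates (s^-1) at hypothetical concentrations (uM).
TRUE_K_POL, TRUE_KD_APP = 0.86, 1.6
concs_um = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
rates = [TRUE_K_POL * s / (TRUE_KD_APP + s) for s in concs_um]

k_pol_fit, kd_app_fit = fit_hyperbola_hanes_woolf(concs_um, rates)

if __name__ == "__main__":
    print(f"k_pol = {k_pol_fit:.3f} s^-1, K_d,app = {kd_app_fit:.3f} uM")
```

With real, noisy data a weighted nonlinear fit is generally preferable, because linearized forms distort the error structure of the measurements.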

Real-time measurements of avidite association and dissociation

Real-time measurements of avidite binding kinetics were performed using an Olympus Axiovert IX81 microscope equipped with custom automated fluidics and control software. Excitation lines were centered from AVITI flowcells were used to capture surface based avidite kinetic measurements. Data collection (4fps) was triggered by flow of the avidity mix and collected for 55 seconds. Polonies in the field of view were localized by custom analysis software. Background corrected intensities were extracted and plotted versus time. Experiments were performed at 0.5pM, 1nM, 7.5nM, and 10nM avidites. Higher concentration data collection was limited by the ability to detect polony intensity from free avidite intensity at elevated concentrations. Off-rate measurements were performed by saturating polonies with avidites followed by washing with imaging mix and collecting data. Experiments using monovalent dye-labeled nucleotides were conducted using identical methods.

Genomic DNA and NGS library preparation

Human DNA from cell line sample HG002 was obtained from Coriell Institute. Linear NGS library construction was performed using a KAPA HyperPrep library kit (Roche) according to published protocols. Finished linear libraries were circularized using Element Adept Compatibility kit (catalog #830-00003). Final circular libraries were quantified by qPCR with the standard and primer set provided in the kit. Circular library DNA was denatured using sodium hydroxide and neutralized with excess Tris pH 7.0 prior to dilution. Denatured libraries were diluted to 8pM in hybridization buffer before loading onto the sequencing cartridge.

Single cell 3’ gene expression library circularization

Single cell RNA-Seq libraries were prepared from two lots of PBMC cell suspensions (10,000 cells and 1,000 cells) using the Chromium Next GEM Single Cell 3’ Kit v3.1 (Part #1000268). Each library was quantified and individually processed for sequencing on the AVITI System using the Element Adept Library Compatibility Kit (Part #830-00003). The processed libraries were pooled and sequenced on the AVITI System with 28 cycles for Read 1, 90 cycles for Read 2, and index reads.
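The 28 cycles of Read 1 match the usual layout of 10x Single Cell 3’ v3.1 libraries, in which Read 1 carries a 16-base cell barcode followed by a 12-base UMI. Assuming that layout applies here, a minimal sketch of splitting Read 1 into its two parts is shown below; the function name and example sequence are hypothetical.

```python
BARCODE_LEN = 16  # assumed 10x cell barcode length (v3.1 layout)
UMI_LEN = 12      # assumed UMI length (v3.1 layout)


def split_read1(read1):
    """Split a Read 1 sequence into (cell_barcode, umi)."""
    if len(read1) < BARCODE_LEN + UMI_LEN:
        raise ValueError("Read 1 is shorter than barcode plus UMI")
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


if __name__ == "__main__":
    example = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC"  # made-up 28-base Read 1
    print(split_read1(example))
```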

Sequencing instrument and workflow

Element’s AVITI commercial system (Part #88-00001) was used for all sequencing data. AVITI 2x150 kits (Part #86-00001) were loaded on the instrument. Primary analysis was performed onboard the AVITI sequencing instrument and FASTQ files were subsequently analyzed using a secondary analysis pipeline from Sentieon.

Sequencing primary analysis

Four images were generated during each sequencing cycle for each portion of the flowcell, corresponding to the dyes used to label each avidite. An analysis pipeline was developed that uses the images as input to identify the polonies present on the flowcell and to assign to each polony a base call and a quality score for each cycle. The analysis approach has similar steps to those described in Whiteford et al. [25]. Briefly, intensity is extracted for each polony in each color channel. The intensities are corrected for color cross talk and phasing. The intensities are then normalized to allow cross-channel comparisons. The highest normalized intensity value for each polony in each cycle determines the base call. In addition to the base call, a quality score corresponding to the call confidence is assigned. The standard Q score definition is used: \(Q = -10 \log_{10} p\), where p is the probability that the base call is an error. The Q score generation follows the approach of Ewing et al., with modified predictors [21].
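The final two steps, normalization and selection of the brightest channel, can be sketched as follows. This is a simplified illustration rather than the onboard pipeline: the intensities are assumed to be already corrected for cross talk and phasing, normalization is reduced to a per-channel scale factor, and the channel-to-base order and all numeric values are hypothetical.

```python
BASES = ("A", "C", "G", "T")  # hypothetical channel-to-base order


def call_base(intensities, channel_scale):
    """Return (base, normalized_intensities) for one polony in one cycle.

    `intensities` holds one corrected intensity per channel and
    `channel_scale` holds one positive normalization factor per channel.
    """
    if len(intensities) != len(BASES) or len(channel_scale) != len(BASES):
        raise ValueError("expected one value per channel")
    normalized = [i / s for i, s in zip(intensities, channel_scale)]
    best_channel = max(range(len(BASES)), key=lambda idx: normalized[idx])
    return BASES[best_channel], normalized


if __name__ == "__main__":
    # Made-up intensities: the second channel is brightest before normalization.
    raw = [120.0, 900.0, 80.0, 150.0]
    scale = [1.0, 1.5, 1.0, 0.1]
    base, normalized = call_base(raw, scale)
    print(base, normalized)
```

The example is chosen so that the raw and normalized maxima fall in different channels, which is why normalization has to precede the comparison across channels.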

The sequence of base call assignments and quality scores across the cycles constitute the output of the run. This data is represented in standard FASTQ format for compatibility with downstream tools.
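For readers unfamiliar with the format, a FASTQ record has four lines: an identifier, the bases, a separator, and one quality character per base. The sketch below builds a record assuming the common Phred+33 encoding, in which each quality score is stored as the ASCII character with code Q + 33. The read name and scores are made up for illustration.

```python
def quality_string(q_scores, offset=33):
    """Encode integer Phred scores as ASCII characters (Phred+33 by default)."""
    return "".join(chr(q + offset) for q in q_scores)


def fastq_record(read_id, bases, q_scores):
    """Format one four-line FASTQ record."""
    if len(bases) != len(q_scores):
        raise ValueError("need exactly one quality score per base")
    return f"@{read_id}\n{bases}\n+\n{quality_string(q_scores)}\n"


if __name__ == "__main__":
    # Hypothetical read name and quality values.
    print(fastq_record("example_read_1", "ACGT", [40, 40, 30, 20]), end="")
```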

To assess the accuracy of quality scores shown in Fig. 3, the FASTQ files were aligned with BWA to generate BAM files. GATK BaseRecalibrator was then applied to the BAM, specifying publicly available known sites files to exclude human variant positions.
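Conceptually, this calibration check groups aligned bases by their predicted quality score and compares each score with the quality implied by the observed mismatch rate at non-variant positions. The sketch below shows that comparison on synthetic data. It is not the GATK implementation; the input format and the add-one smoothing are illustrative assumptions.

```python
import math
from collections import defaultdict


def empirical_quality(observations):
    """Map each predicted Q score to the quality implied by observed mismatches.

    `observations` is an iterable of (predicted_q, is_mismatch) pairs, one per
    aligned base outside known variant sites.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for predicted_q, is_mismatch in observations:
        totals[predicted_q] += 1
        if is_mismatch:
            errors[predicted_q] += 1
    result = {}
    for predicted_q in sorted(totals):
        # Add-one smoothing keeps bins with zero mismatches at a finite score.
        rate = (errors[predicted_q] + 1) / (totals[predicted_q] + 2)
        result[predicted_q] = -10.0 * math.log10(rate)
    return result


if __name__ == "__main__":
    # Synthetic example: 9 mismatches among 9,998 bases predicted at Q30.
    data = [(30, True)] * 9 + [(30, False)] * 9989
    for predicted_q, observed_q in empirical_quality(data).items():
        print(f"predicted Q{predicted_q}: observed Q{observed_q:.1f}")
```

A well-calibrated quality model yields observed values close to the predicted ones in every bin.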

Single cell gene expression data analysis

Following sequencing, the Bases2Fastq Software was used to generate FASTQ files for compatible upload into 10x Cloud and subsequent analysis with the 10x Genomics Cell Ranger analysis package. Data visualization of single cell gene expression profiling was generated using 10x Genomics Loupe Browser.

1. Levy, S.E. and R.M. Myers, Advancements in Next-Generation Sequencing. Annu Rev Genomics Hum Genet, 2016. 17: p. 95–115.
2. van Dijk, E.L., et al., Ten years of next-generation sequencing technology. Trends Genet, 2014. 30(9): p. 418–26.
3. Yohe, S. and B. Thyagarajan, Review of Clinical Next-Generation Sequencing. Arch Pathol Lab Med, 2017. 141(11): p. 1544–1557.
4. Zhang, Y., et al., Single-cell RNA sequencing in cancer research. J Exp Clin Cancer Res, 2021. 40(1): p. 81.
5. Ekblom, R. and J. Galindo, Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity (Edinb), 2011. 107(1): p. 1–15.
6. Morozova, O. and M.A. Marra, Applications of next-generation sequencing technologies in functional genomics. Genomics, 2008. 92(5): p. 255–64.
7. Schuster, S.C., Next-generation sequencing transforms today's biology. Nat Methods, 2008. 5(1): p. 16–8.
8. Metzker, M.L., Sequencing technologies - the next generation. Nat Rev Genet, 2010. 11(1): p. 31–46.
9. Hu, T., et al., Next-generation sequencing technologies: An overview. Hum Immunol, 2021. 82(11): p. 801–811.
10. Bentley, D.R., et al., Accurate whole human genome sequencing using reversible terminator chemistry. Nature, 2008. 456(7218): p. 53–9.
11. Chen, F., et al., The history and advances of reversible terminators used in new generations of sequencing technology. Genomics Proteomics Bioinformatics, 2013. 11(1): p. 34–40.
12. Joyce, C.M., et al., Fingers-closing and other rapid conformational changes in DNA polymerase I (Klenow fragment) and their role in nucleotide selectivity. Biochemistry, 2008. 47(23): p. 6103–16.
13. Kati, W.M., et al., Mechanism and fidelity of HIV reverse transcriptase. J Biol Chem, 1992. 267(36): p. 25988–97.
14. Kuchta, R.D., et al., Kinetic mechanism of DNA polymerase I (Klenow). Biochemistry, 1987. 26(25): p. 8410–7.
15. Xia, S. and W.H. Konigsberg, RB69 DNA polymerase structure, kinetics, and fidelity. Biochemistry, 2014. 53(17): p. 2752–67.
16. Yang, G., et al., Steady-state kinetic characterization of RB69 DNA polymerase mutants that affect dNTP incorporation. Biochemistry, 1999. 38(25): p. 8094–101.
17. Vauquelin, G. and S.J. Charlton, Exploring avidity: understanding the potential gains in functional affinity and target residence time of bivalent and heterobivalent ligands. Br J Pharmacol, 2013. 168(8): p. 1771–85.
18. Zhang, J., et al., Pentamerization of single-domain antibodies from phage libraries: a novel strategy for the rapid generation of high-avidity antibody reagents. J Mol Biol, 2004. 335(1): p. 49–56.
19. Shendure, J., et al., Accurate multiplex polony sequencing of an evolved bacterial genome. Science, 2005. 309(5741): p. 1728–32.
20. Tsai, Y.C. and K.A. Johnson, A new paradigm for DNA polymerase specificity. Biochemistry, 2006. 45(32): p. 9675–87.
21. Ewing, B., et al., Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res, 1998. 8(3): p. 175–85.
22. Freed, D., et al., The Sentieon Genomics Tools - A fast and accurate solution to variant calling from next-generation sequence data. bioRxiv, 2017: p. 115717.
23. Krusche, P., et al., Author Correction: Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol, 2019. 37(5): p. 567.
24. Drmanac, S., et al., CoolMPS™: Advanced massively parallel sequencing using antibodies specific to each natural nucleobase. bioRxiv, 2020: p. 2020.02.19.953307.
25. Whiteford, N., et al., Swift: primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics, 2009. 25(17): p. 2194–9.

Yes there is potential Competing Interest. I am an employee and co-founder of Element Biosciences. Positive feedback of this manuscript and ultimate publication could impact the commercial uptake of our commercial sequencing system. As a result, this could impact the financial well-being of Element and myself as a shareholder in the company.
