Avidity sequencing uses repeated cycles of reagent delivery to generate sequencing reads from DNA of interest. Each cycle has two distinct phases: interrogation of an unknown base through avidity binding, and incorporation of a single nucleotide to step to the next base once the base identity has been determined (Fig. 1A). Separating signal generation from catalysis allows for more efficient use of substrate. A specificity constant (kcat/Km) of 0.54 ± 0.22 µM⁻¹ s⁻¹ was observed for monovalent dye-labeled nucleotides with an engineered sequencing polymerase, resulting from a maximum rate of incorporation (kpol) of 0.86 ± 0.14 s⁻¹ and an apparent Kd (Kd,app) of 1.6 ± 0.6 µM (Fig. 2A). This apparent Kd reflects the Km of a kinetic system not in equilibrium rather than the true Kd of the nucleotide substrate [20]. To achieve complete product turnover, this high apparent Kd can be overcome either by increasing the concentration of fluorescent nucleotide substrate or by allowing a longer incorporation time for the reaction to complete. Both options have undesirable consequences. Raising the substrate concentration to drive the incorporation reaction to completion increases reagent costs. Allowing longer incorporation times to drive the blocked nucleotide incorporation reaction to completion lengthens each cycle, and the added time accumulates over 300 cycles of stepwise sequencing.
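As a rough check on the numbers above, the specificity constant can be recomputed as kpol / Kd,app (0.86 / 1.6 ≈ 0.54 µM⁻¹ s⁻¹), and the trade-off between substrate concentration and incorporation time can be illustrated with a simple model. The sketch below assumes a hyperbolic dependence of the observed rate on substrate concentration and single-exponential product formation. Those modeling choices, the function names, the example concentrations, and the 99% completion target are illustrative assumptions and are not taken from the text.

```python
import math

# Values reported in the text (central estimates only; errors ignored here).
K_POL = 0.86   # s^-1, maximum rate of incorporation
KD_APP = 1.6   # µM, apparent Kd

def specificity_constant(k_pol, kd_app):
    """Specificity constant kcat/Km, approximated as kpol / Kd,app (µM^-1 s^-1)."""
    return k_pol / kd_app

def observed_rate(conc_um, k_pol=K_POL, kd_app=KD_APP):
    """ASSUMED hyperbolic model: k_obs = kpol * [S] / (Kd,app + [S]), in s^-1."""
    return k_pol * conc_um / (kd_app + conc_um)

def time_to_completion(conc_um, fraction=0.99):
    """Seconds needed to reach `fraction` turnover, ASSUMING single-exponential kinetics."""
    return -math.log(1.0 - fraction) / observed_rate(conc_um)

if __name__ == "__main__":
    print(f"kcat/Km = {specificity_constant(K_POL, KD_APP):.2f} µM^-1 s^-1")
    # Hypothetical substrate concentrations, chosen only to show the trend.
    for conc in (0.5, 1.6, 16.0):
        print(f"[S] = {conc:5.1f} µM -> k_obs = {observed_rate(conc):.2f} s^-1, "
              f"t(99%) = {time_to_completion(conc):.1f} s")
```

Under these assumptions, lowering the substrate concentration below Kd,app lengthens the time to near-complete turnover, which is the cost-versus-cycle-time trade-off described above.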
Avidity sequencing of DNA relies on the 4-color fluorescent detection of unblocked bases within a polony, in which two or more copies of primed target nucleic acid sequence and two or more polymerases bind to the nucleic acid with a polymer-nucleotide conjugate under conditions sufficient to allow a multivalent binding complex to form. The result is a system in which signal is generated by a binding equilibrium, which reaches a substrate-concentration-dependent saturation in less than 30 seconds, rather than by catalysis. The binding kinetics of this interaction were monitored using real-time data collection to observe avidites binding to polonies with an on-rate (kon,avidite) of 271 ± 82 nM⁻¹ s⁻¹ (Fig. 2B). This on-rate is within error of that of a single fluorescently labeled monovalent nucleotide (Fig. 2C). Major differences were observed in the off-rate kinetics of avidite substrates versus monovalent nucleotides. Avidite substrates bound to the DNA polonies tightly, with no measurable off-rate over the >1 minute timescale needed for imaging and basecalling (Fig. 2D). This is in sharp contrast to the observed off-rates of fluorescently labeled monovalent nucleotides, which dissociate rapidly during the wash step following binding and then continue to dissociate during imaging (Fig. 2E). The negligible off-rate results in a Kd for avidites that is more than two orders of magnitude lower than that of monovalent nucleotides. With negligible avidite off-rates, a persistent signal can be achieved without free avidites in bulk solution, eliminating background. Without avidity, monovalent nucleotides show a 4x signal decrease at the beginning of imaging, because reagent exchange disrupts the binding equilibrium and dissociation is fast (Fig. 4E).
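The link between off-rate and Kd described above follows from the standard one-step binding relation Kd = koff / kon: if two substrates share the same on-rate, the ratio of their Kd values equals the ratio of their off-rates. The sketch below is illustrative only. The off-rate values are hypothetical placeholders chosen to give a 100-fold ratio (the text reports no measurable avidite off-rate, so no real value exists to plug in), the shared on-rate is an arbitrary number that cancels in the ratio, and single-exponential dissociation after removal of free substrate is an assumed model.

```python
import math

def dissociation_constant(k_off, k_on):
    """Kd = koff / kon for a simple one-step binding model."""
    return k_off / k_on

def fraction_remaining(k_off, seconds):
    """Fraction of bound signal left after `seconds`, ASSUMING exponential
    dissociation with no rebinding (no free substrate in solution)."""
    return math.exp(-k_off * seconds)

def time_to_fold_decrease(k_off, fold):
    """Seconds until the signal has dropped `fold`-fold under the same model."""
    return math.log(fold) / k_off

# HYPOTHETICAL values, for illustration only (not measurements from the text).
K_ON_SHARED = 1.0          # arbitrary units; cancels in the Kd ratio
K_OFF_MONOVALENT = 5e-2    # s^-1, placeholder
K_OFF_AVIDITE = 5e-4       # s^-1, placeholder

kd_ratio = (dissociation_constant(K_OFF_MONOVALENT, K_ON_SHARED)
            / dissociation_constant(K_OFF_AVIDITE, K_ON_SHARED))

if __name__ == "__main__":
    print(f"Kd(monovalent) / Kd(avidite) = {kd_ratio:.0f}")
    for name, k_off in (("monovalent", K_OFF_MONOVALENT), ("avidite", K_OFF_AVIDITE)):
        print(f"{name}: {fraction_remaining(k_off, 60):.2f} of signal left after 60 s, "
              f"4x drop after {time_to_fold_decrease(k_off, 4):.0f} s")
```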
Following binding and imaging, the avidites are removed. The nascent DNA strand must be extended one base before avidity detection can be repeated to determine the next DNA base of the polony. This is achieved by removing the blocking group and incorporating the next unlabeled, blocked nucleotide. Completing the avidity sequencing cycle (Fig. 1A) results in a basecall for each polony and extends the sequencing primer one base, allowing the cycle to be repeated to determine the identity of the next unknown base. Fluorescent signals corresponding to basecalls are observed for 150 cycles (Fig. 1C).
Accuracy of Avidity Sequencing
To evaluate the accuracy of avidity sequencing, 20 sequencing runs were performed using a well-characterized human genome. The sequencing data were used to train quality tables according to the methods of Ewing et al. [21], but with modified predictors. The quality tables were then applied to independent sequencing runs. Figure 3 shows the data quality obtained in a representative run. The quality scores are well calibrated across the entire range, meaning that predicted quality matches observed quality as determined by alignment to a known reference. Virtually all the data are above Q30 (an average of one error per 1,000 bp) and the majority of the data are above Q40 (an average of one error per 10,000 bp). The sequence of basecall assignments and quality scores across the cycles constitutes the output of the run. These data are represented in standard FASTQ format for compatibility with downstream tools and applications.
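The Q30 and Q40 thresholds above follow from the Phred definition Q = -10 log10(p), where p is the estimated probability that a basecall is wrong. A minimal sketch of the conversion, and of decoding a FASTQ quality string, is given below. The function names and the example quality string are made up for illustration, and the ASCII offset of 33 (the common Phred+33 encoding) is an assumption, since the text says only that standard FASTQ format is used.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score Q."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score implied by an error probability p."""
    return -10 * math.log10(p)

def decode_fastq_quality(qual_string, offset=33):
    """Decode a FASTQ quality line into integer Phred scores.
    ASSUMES Phred+33 ASCII encoding (offset=33)."""
    return [ord(ch) - offset for ch in qual_string]

def fraction_at_or_above(scores, threshold):
    """Fraction of basecalls with quality >= threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)

if __name__ == "__main__":
    for q in (30, 40):
        print(f"Q{q}: about one error per {1 / phred_to_error_prob(q):,.0f} bases")
    scores = decode_fastq_quality("IIII?I5I")  # hypothetical quality string
    print(scores, fraction_at_or_above(scores, 30))
```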
Single Cell RNA Sequencing
To demonstrate sequencing performance across common applications, single-cell RNA expression libraries were prepared and sequenced. Two libraries from a reference standard consisting of human peripheral blood mononuclear cells (PBMCs) were generated using the 10x Chromium instrument. The two libraries contain RNA from roughly 10,000 and 1,000 cells, respectively. After application of the Adept compatibility kit, the libraries were loaded onto an AVITI sequencer and sequenced to generate a paired-end read data set. The analysis was done using CellRanger (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/installation). This reference standard is used by 10x to evaluate sequencing performance, so a set of metrics and guidelines for assessing sequencing results is provided along with the biological material. Table 1 shows each metric, the guideline values from 10x, and the performance of each library on AVITI. All metrics are within the guideline ranges, and the metrics pertaining to sequencing quality far exceed the thresholds provided.
Whole Human Genome Sequencing
Another common application is human whole genome sequencing. This application challenges sequencer accuracy to a greater extent than measuring gene expression: gene expression requires only accurate alignment, whereas whole genome sequencing depends on nucleotide accuracy to resolve variant calls. To demonstrate performance for this application, the well-characterized human sample HG002 was prepared for sequencing using a Covaris shearing and PCR-free library preparation method and sequenced with 2x150 bp reads.
A FASTQ file with the basecalls and quality scores was down-sampled to 35X coverage and used as input to the DNAScope analysis pipeline from Sentieon [22]. Following alignment and variant calling, the variant calls were compared to the NIST Genome in a Bottle truth set v4.2.1 via the hap.py comparison framework [23]. Both SNP and indel calls were highly accurate, with F1 scores of 0.995 and 0.996, respectively. Table 5 shows sensitivity, precision, and F1 score for SNPs and small indels on the GIAB-HC regions. SNP and indel performance are comparable.
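For reference, the three metrics reported above are related by standard definitions: sensitivity (recall) is TP / (TP + FN), precision is TP / (TP + FP), and F1 is the harmonic mean of the two. The sketch below applies those definitions to made-up counts. The counts are hypothetical, chosen only so the arithmetic is easy to follow, and are not the values behind the reported F1 scores. The function names are likewise illustrative and are not part of hap.py.

```python
def sensitivity(true_pos, false_neg):
    """Recall: fraction of truth-set variants that were called."""
    return true_pos / (true_pos + false_neg)

def precision(true_pos, false_pos):
    """Fraction of called variants that are in the truth set."""
    return true_pos / (true_pos + false_pos)

def f1_score(sens, prec):
    """Harmonic mean of sensitivity and precision."""
    return 2 * sens * prec / (sens + prec)

if __name__ == "__main__":
    # HYPOTHETICAL counts for illustration only.
    tp, fp, fn = 995, 5, 5
    sens = sensitivity(tp, fn)
    prec = precision(tp, fp)
    print(f"sensitivity={sens:.3f} precision={prec:.3f} F1={f1_score(sens, prec):.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, so a high F1 requires both few missed variants and few false calls.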