Laboratory validation of a clinical metagenomic next-generation sequencing assay for respiratory virus detection and discovery

doi:10.21203/rs.3.rs-4492202/v1

This work is licensed under a CC BY 4.0 License

Version 1

Tools for rapid identification of novel and/or emerging viruses are urgently needed for clinical diagnosis of unexplained infections and for pandemic preparedness. Here we developed and clinically validated a largely automated metagenomic next-generation sequencing (mNGS) assay for agnostic detection of respiratory viral pathogens from upper respiratory swab and bronchoalveolar lavage samples in <24 hours. The mNGS assay achieved a mean limit of detection of 543 copies/mL, viral load quantification with 100% linearity, and 93.6% sensitivity, 93.8% specificity, and 93.7% accuracy compared with gold-standard clinical multiplex RT-PCR. Performance increased to 97.9% overall predictive agreement after discrepancy testing and clinical adjudication, which was superior to that of RT-PCR (95.0% overall agreement). To enable discovery of novel, sequence-divergent human viruses with pandemic potential, de novo assembly and translated nucleotide alignment algorithms were incorporated into the automated SURPI+ computational pipeline used by the mNGS assay for pathogen detection. Using in silico analysis, we showed that after removal of all human viral sequences from the reference database, 70 (100%) of 70 representative human viral pathogens could still be identified based on homology to related animal or plant viruses. Our assay, which was granted breakthrough device designation by the US Food and Drug Administration (FDA) in August 2023, demonstrates the feasibility of routine mNGS testing in clinical and public health laboratories, thus enabling a robust and rapid response to the next viral respiratory pandemic.

Health sciences/Medical research/Translational research

Health sciences/Diseases/Infectious diseases/Viral infection

Biological sciences/Microbiology/Virology/Metagenomics

Biological sciences/Microbiology/Clinical microbiology

Biological sciences/Microbiology/Infectious-disease diagnostics

metagenomic next-generation sequencing

assay development

agnostic detection

respiratory virus detection

pandemic preparedness

SARS-CoV-2

viral diagnostics

SURPI+ computational pipeline for pathogen detection

viral load quantification

diagnostic assay performance

viral multiplex RT-PCR

Respiratory infections are among the most common infections globally and are associated with significant morbidity and mortality^1-3. Despite their importance, half of adult patients hospitalized in the United States with community-acquired pneumonia, which is most commonly caused by respiratory viruses, have no causative pathogen identified^2-5. Respiratory infections caused by viruses can be especially challenging to diagnose because of the diversity of potential agents^6-8. In particular, emerging pandemic viruses represent an unpredictable threat that traditional diagnostic tools such as nucleic acid amplification tests have not been designed to detect⁹. The importance of unbiased assays for rapid identification of viral pathogens, especially those with sequence-divergent genomes, became evident during the discovery of SARS-CoV-2^10,11.

Metagenomic next-generation sequencing (mNGS) has emerged as an attractive diagnostic method for identifying causative agents in unexplained infections as it provides a comprehensive and agnostic approach by which all potential pathogens can be identified in a single assay without the need for specific primers and probes^12,13. mNGS has been used for broadly diagnosing infections, whether viral, bacterial, fungal, or parasitic, from multiple specimen types^14-16, and its clinical utility has been demonstrated for neurological and bloodstream infections^16-18.

However, despite the favorable performance of mNGS testing shown in multiple studies, general adoption of mNGS technologies in clinical microbiology laboratories has been hindered by high costs, complex protocols, lack of automation, insufficient standardization of bioinformatic pipelines, prolonged turnaround times (24-72 hours), lack of regulatory guidelines for clinical validation, and overall lower sensitivity for detection of common pathogens relative to targeted approaches such as polymerase chain reaction (PCR) assays¹⁹.

Here we describe the development, optimization, and clinical validation of a streamlined and largely automated mNGS laboratory-developed test (LDT) with a sample-to-result turnaround time of less than 24 hours for identification of common as well as unexpected and/or novel viral respiratory pathogens. The computational SURPI+ pipeline used by the mNGS assay was modified to provide enhanced analysis capabilities, including viral load quantification, incorporation of curated reference genome databases such as FDA dAtabase for Reference Grade micrObial Sequences (FDA-ARGOS), and sensitive identification of novel, sequence-divergent viruses by de novo assembly and translated nucleotide alignment. We comprehensively evaluated assay performance metrics, including limits of detection, linearity, precision, inclusivity and exclusivity, contamination, interference, matrix effect, stability, accuracy, and capacity to detect novel viruses.

Development and Optimization of an mNGS Assay for Detection of Viral Respiratory Pathogens

We developed an mNGS assay for the detection of viral pathogens from respiratory secretions, including upper respiratory swab and bronchoalveolar lavage (BAL) fluid samples (Figure 1). Building on our 7 years of experience running clinical mNGS assays for pathogen detection from cerebrospinal fluid²⁰, we optimized the sample preparation and bioinformatics analysis protocols to maximize sensitivity and decrease sample-to-result turnaround time. We tested different combinations of centrifugation, heat, and addition of a DNA/RNA stabilization medium prior to total nucleic acid extraction and found that centrifugation alone produced the highest yield of detected viral reads. To decrease turnaround time, we used a 15-minute protocol for depletion of human ribosomal RNA (rRNA) and reduced incubation times for the reverse transcription and second-strand cDNA synthesis steps to 15 and 9 minutes, respectively. The final assay used a sample input volume of 450 μL and consisted of the following steps: (1) centrifugation (~15 min) and total nucleic acid extraction with DNase treatment for isolation of total RNA (~1 hr), (2) cDNA synthesis with rRNA depletion (~1 hr), (3) barcoded adapter ligation, library PCR amplification, and purification on an automated instrument (~6.5 hr), (4) library pooling (~5 min), (5) Illumina (San Diego, CA) sequencing (5 or 13 hr, depending on whether a MiniSeq or NextSeq sequencer is used), and (6) bioinformatics analysis for viral detection and quantification using the SURPI+ pipeline (~1 hr). Overall sample-to-result turnaround time was 14-24 hours. We used MS2 phage and External RNA Controls Consortium (ERCC) RNA Spike-In Mix (Invitrogen, Waltham, MA), added to each sample, as internal qualitative and quantitative controls, respectively.
The MS2 phage and ERCC sequencing results were also used to evaluate and interpret the background level in the sample, which generally originates from the human host (Supplementary Tables 1 and 2). A commercial reference panel (Accuplex Panel, SeraCare, Milford, MA) consisting of quantified SARS-CoV-2, influenza A, influenza B, and respiratory syncytial virus (RSV) was spiked into pooled virus-negative nasopharyngeal swab matrix (see Methods for details) as an external positive control (PC) for the assay, and pooled virus-negative nasopharyngeal swabs from healthy, uninfected donors served as the negative matrix and as an external negative control (NC).
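As a rough consistency check, the approximate step durations listed above can be summed for each sequencer. The short Python sketch below does this; the dictionary keys are our own illustrative labels, and the sum ignores any queueing or hand-off time between steps, so it is a lower-bound estimate rather than a scheduling model.

```python
# Approximate duration of each step in hours, taken from the step list in the
# text. Keys are illustrative labels, not names used in the assay protocol.
STEP_HOURS = {
    "centrifugation": 15 / 60,
    "extraction_and_dnase": 1.0,
    "cdna_synthesis_and_rrna_depletion": 1.0,
    "automated_library_prep": 6.5,
    "library_pooling": 5 / 60,
    "surpi_plus_analysis": 1.0,
}

# The sequencing run time depends on the instrument used.
SEQUENCING_HOURS = {"MiniSeq": 5.0, "NextSeq": 13.0}


def turnaround_hours(instrument: str) -> float:
    """Sum the fixed step durations plus the instrument-specific run time."""
    return sum(STEP_HOURS.values()) + SEQUENCING_HOURS[instrument]


for name in SEQUENCING_HOURS:
    print(f"{name}: ~{turnaround_hours(name):.1f} h")
```

With these inputs the sum is about 14.8 hours on the MiniSeq and 22.8 hours on the NextSeq, which falls within the reported 14-24 hour range.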

The SURPI+ computational pipeline, run as a container on either a server or the cloud, was used to identify viral respiratory pathogens from mNGS data^21,22. Three enhancements were made (Figure 2A). First, we added the capability for viral load quantification using the PC and a standard curve generated for each sample from the ERCC reads. Second, “tagging” of GenBank accession numbers in the SURPI+ database was incorporated to allow inclusion of curated viral reference genomes, such as those deposited in the FDA-ARGOS database²³, for virus identification by alignment and results reporting. Third, a custom algorithm consisting of de novo assembly of metagenomic reads and translated nucleotide, or amino acid, alignment of the reads to a viral protein database was developed to enable detection of novel, sequence-divergent viruses²³.
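The text does not specify how the per-sample ERCC standard curve is fitted or how reads are converted to copies/mL. One plausible approach, shown below as a minimal sketch, fits log10(reads) against log10(input copies) across the ERCC transcripts by ordinary least squares and then inverts that line for the viral read count. The function names, the assumption that viral and ERCC reads share a single reads-per-copy relationship (no correction for genome length or extraction efficiency), and the sample-volume normalization are all our assumptions, not the SURPI+ implementation.

```python
import math


def fit_standard_curve(ercc_copies, ercc_reads):
    """OLS fit of log10(reads) = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in ercc_copies]
    ys = [math.log10(r) for r in ercc_reads]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_per_ml(viral_reads, slope, intercept, sample_volume_ml):
    """Invert the curve for the viral read count, then scale to input volume.

    Assumes viral reads follow the same reads-per-copy line as the ERCCs.
    """
    copies = 10 ** ((math.log10(viral_reads) - intercept) / slope)
    return copies / sample_volume_ml


# Hypothetical ERCC inputs (copies per sample) with reads lying on an exact line.
ercc_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
ercc_reads = [10 ** (0.9 * math.log10(c) - 1.0) for c in ercc_copies]

slope, intercept = fit_standard_curve(ercc_copies, ercc_reads)
load = copies_per_ml(10 ** 2.6, slope, intercept, sample_volume_ml=0.45)
print(f"estimated viral load: {load:.3g} copies/mL")
```

In practice, the ERCC points would scatter around the line, and the quality of the fit (for example, its R²) could itself serve as a per-sample QC signal.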

Following review of clinical charts, we investigated the correlation between viral load, quantified in copies per milliliter (cp/mL), and infection severity, which was categorized as asymptomatic, mild, moderate, or severe (Figure 2B). We observed significant differences in median viral loads between patients with asymptomatic/mild and moderate/severe infections (P < 0.001) (Supplemental Fig. 5a). Further stratification of patients into asymptomatic, mild, moderate, and severe infections revealed an increasing trend in viral load. Pairwise comparisons showed significant differences between asymptomatic and moderate (P < 0.01) and between mild and moderate (P < 0.01) infections. Overall, differences in median viral loads across all severity levels were significant (P < 0.001) (Supplemental Fig. 5b).

Quality control (QC) metrics were based on those previously established for a validated cerebrospinal fluid mNGS assay²¹ and included a minimum of 5 million preprocessed reads per sample, >75% of data with a quality score >30 (Q>30), and successful detection of the internal spiked MS2 phage control and of all four respiratory viruses in the PC. Detection of ≥3 non-overlapping viral reads or contigs aligning to the target viral genome was considered a positive result. Overall, 93% (156 of 167) of positive (n=111) and negative (n=56) nasopharyngeal swab samples met QC metrics; those that did not were excluded from the analysis.
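The per-sample QC gate and the positive-detection rule can be expressed compactly. In the sketch below, "non-overlapping" is interpreted as the largest set of read alignments on the target genome that share no positions, found greedily by alignment end coordinate. This interpretation, the half-open coordinate convention, and all function names are assumptions rather than the SURPI+ code. The run-level PC check (detection of all four viruses) is omitted.

```python
from typing import List, Tuple

MIN_PREPROCESSED_READS = 5_000_000
MIN_FRACTION_Q30 = 0.75
MIN_NONOVERLAPPING_HITS = 3


def passes_sample_qc(preprocessed_reads: int, fraction_q30: float, ms2_detected: bool) -> bool:
    """Per-sample QC: read depth, base quality, and the internal MS2 control."""
    return (
        preprocessed_reads >= MIN_PREPROCESSED_READS
        and fraction_q30 > MIN_FRACTION_Q30
        and ms2_detected
    )


def count_nonoverlapping(alignments: List[Tuple[int, int]]) -> int:
    """Largest number of pairwise non-overlapping [start, end) alignments.

    Greedy selection by end coordinate (classic interval scheduling).
    """
    count = 0
    last_end = float("-inf")
    for start, end in sorted(alignments, key=lambda aln: aln[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count


def is_positive(alignments: List[Tuple[int, int]]) -> bool:
    """Apply the >=3 non-overlapping reads/contigs detection threshold."""
    return count_nonoverlapping(alignments) >= MIN_NONOVERLAPPING_HITS


# Hypothetical read alignments on a viral genome (0-based, half-open).
reads = [(0, 100), (50, 150), (200, 300), (250, 350), (400, 500)]
print(passes_sample_qc(6_200_000, 0.91, True), count_nonoverlapping(reads), is_positive(reads))
```

Requiring non-overlapping support means that PCR duplicates or many reads piled onto one short region cannot, by themselves, produce a positive call.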

Analytical Sensitivity

We adopted Clinical and Laboratory Standards Institute (CLSI) guidelines for NGS-based infectious disease testing (MM24)²⁴ and for validation of multiplex nucleic acid assays (MM17)²⁵ to conduct a comprehensive evaluation of assay performance metrics (Table 1). To determine limits of detection (LoD), negative nasopharyngeal swab matrix was spiked with the Accuplex Verification Panel and diluted to concentrations ranging from 5,000 to 100 copies/mL, with 10 to 40 replicates at each concentration. The LoD for each of the four representative organisms in the panel (SARS-CoV-2, influenza A, influenza B, and RSV) was determined by 95% probit analysis. LoDs ranged from 439 to 706 copies/mL for the four respiratory viruses in the positive control (Figure 3). The average LoD of 550 copies/mL was within one log₁₀ unit of reported LoDs for virus-specific reverse transcription-polymerase chain reaction (RT-PCR) assays for detection of respiratory viral pathogens²⁶.
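Probit-based LoD estimation models the probability of detection as a cumulative normal function of log10 concentration and solves for the concentration at which detection reaches 95%. The Python sketch below fits the probit model by Fisher scoring (iteratively reweighted least squares) using only the standard library. The hit counts are made up for illustration and are not the study's data. A validated statistics package would normally be used instead.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def fit_probit(concentrations, hits, replicates, max_iter=100, tol=1e-10):
    """Fit P(detect) = Phi(a + b * log10(conc)) by Fisher scoring (IRLS)."""
    xs = [math.log10(c) for c in concentrations]
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, k, n in zip(xs, hits, replicates):
            eta = a + b * x
            p = min(max(STD_NORMAL.cdf(eta), 1e-12), 1 - 1e-12)
            dens = max(STD_NORMAL.pdf(eta), 1e-300)
            w = n * dens ** 2 / (p * (1 - p))  # IRLS weight
            z = eta + (k / n - p) / dens       # working response
            s_w += w
            s_wx += w * x
            s_wxx += w * x * x
            s_wz += w * z
            s_wxz += w * x * z
        # Solve the 2x2 weighted least-squares normal equations.
        det = s_w * s_wxx - s_wx ** 2
        new_a = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        if converged:
            break
    return a, b


def lod_at(a, b, probability=0.95):
    """Concentration (copies/mL) at which the fitted detection probability is reached."""
    return 10 ** ((STD_NORMAL.inv_cdf(probability) - a) / b)


# Hypothetical dilution series: copies/mL, detections, and replicates per level.
conc = [100, 250, 500, 1000, 5000]
detected = [3, 9, 17, 20, 20]
reps = [20, 20, 20, 20, 20]

a, b = fit_probit(conc, detected, reps)
print(f"LoD95 ~ {lod_at(a, b):.0f} copies/mL")
```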

Linearity

To evaluate the assay’s ability to accurately quantify viral load for detected viruses, a linearity panel was generated using a five-point, 10-fold dilution series of a quantified high-titer SARS-CoV-2-positive nasal swab sample and compared with a commercially available AccuSpan™ HCV RNA Linearity Panel. For both panels, the calculated linearity was 100% after running duplicate or triplicate replicates across a minimum of four 10-fold dilutions (Supplementary Figure 1). The absolute deviation of calculated from expected viral loads was <0.52 log₁₀, which compares favorably with the interquartile ranges for virus-specific qPCR assays between different laboratories²⁷.
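The text reports linearity and a maximum absolute log₁₀ deviation but not the exact computation. As an illustration only, the sketch below regresses observed on expected log₁₀ viral loads and reports the slope, R², and the largest absolute log₁₀ deviation. The acceptance criteria used to call linearity "100%" are not specified in the text and are not modeled here. The dilution values and offsets are hypothetical.

```python
import math


def linearity_summary(expected, observed):
    """Slope, R^2, and max |log10 deviation| of observed vs expected viral loads."""
    xs = [math.log10(v) for v in expected]
    ys = [math.log10(v) for v in observed]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    max_abs_dev = max(abs(y - x) for x, y in zip(xs, ys))
    return slope, r_squared, max_abs_dev


# Hypothetical five-point, 10-fold dilution series (copies/mL).
expected = [1e3, 1e4, 1e5, 1e6, 1e7]
offsets = [0.2, -0.1, 0.3, -0.2, 0.1]  # made-up log10 measurement errors
observed = [e * 10 ** d for e, d in zip(expected, offsets)]

slope, r2, max_dev = linearity_summary(expected, observed)
print(f"slope={slope:.2f}  R^2={r2:.3f}  max |dev|={max_dev:.2f} log10")
```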

Precision

We measured intra-assay precision by testing two PC and two NC samples within the same run using different barcodes across 20 runs and inter-assay precision by testing 20 PC and 20 NC samples using different barcodes across 20 separate runs. Essential agreement (EA) was 100% and intra- and inter-assay precision were within our a priori established limits of <10% and <30% (log-transformed coefficients of variation in reads per million), respectively (Table 1).
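The precision limits are expressed as coefficients of variation (CVs) of log-transformed reads per million (RPM). Assuming this means the sample standard deviation of log₁₀(RPM) divided by its mean across replicates (our interpretation, not stated explicitly in the text), a minimal sketch with hypothetical replicate values is:

```python
import math
from statistics import mean, stdev


def reads_per_million(target_reads: int, total_reads: int) -> float:
    """Normalize a target's read count by sequencing depth."""
    return target_reads * 1_000_000 / total_reads


def log_cv(rpm_values):
    """CV of log10-transformed RPM values (sample standard deviation / mean).

    Only meaningful when all RPM values are > 1, so that log10(RPM) > 0.
    """
    logs = [math.log10(v) for v in rpm_values]
    return stdev(logs) / mean(logs)


# Hypothetical replicate RPM values for one PC virus.
intra_run = [1200, 1350, 1100, 1280]
inter_run = [900, 1500, 1150, 2100, 1300]

print(f"intra-assay log-CV: {log_cv(intra_run):.3f} (limit 0.10)")
print(f"inter-assay log-CV: {log_cv(inter_run):.3f} (limit 0.30)")
```

Log-transforming before computing the CV compresses the multiplicative read-count variation typical of sequencing, so a <10% log-CV corresponds to a much wider spread on the raw RPM scale.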

Inclusivity and Exclusivity

To evaluate the ability of the mNGS assay to detect a wide range of targets (inclusivity), we obtained commercially available culture supernatants from 17 respiratory viruses representing different sublineages and subspecies. Viruses were spiked into negative control matrix at a 1:10 ratio, at concentrations ranging from 1.3 × 10³ to 1.2 × 10⁷ 50% tissue culture infective doses (TCID50) per mL (Table 2). All 17 (100%) of 17 viruses in these contrived samples were correctly identified by the mNGS assay at the sublineage or subspecies level. Additionally, we identified subtypes of rhinovirus and enterovirus from PCR-positive clinical samples that were not differentiated by multiplex RT-PCR (Supplementary Figure 2A). We also evaluated the ability of the mNGS assay to identify uncommon or rare viral pathogens associated with respiratory infections (n=8 virus-positive tracheal aspirate samples) or central nervous system (CNS) infections (n=4 cerebrospinal fluid samples) in severely ill hospitalized patients (Table 2, Supplementary Figure 2B). The assay detected 11 (100%) of 11 viruses in these samples. To assess the exclusivity of the mNGS assay, we spiked two mixtures of microorganisms, a previously reported positive control mNGS panel consisting of 7 representative pathogens²¹ and a commercial reference panel consisting of 10 bacterial and fungal species, into negative nasopharyngeal swab matrix and analyzed multiple aliquots (Table 1 and Supplementary Table 3). Reads detected from non-viral pathogenic organisms did not result in any false-positive detections of viral pathogens.

Contamination, Interference, Matrix Effect, and Stability

We evaluated potential cross-contamination between nearby sample wells and carryover contamination across successive runs using 10 high-titer SARS-CoV-2 clinical samples and 24 controls (cycle threshold [Ct] = 16-20) loaded in a modified checkerboard pattern (with at least one empty well between samples) on a 96-well plate to mimic a single run on the Illumina NextSeq instrument. Only one possible cross-contamination event was observed: a single SARS-CoV-2 read detected in one of the negative control wells, below the reporting threshold. We also evaluated the effects of interference from human RNA, bacterial DNA, and potential interfering substances on mNGS assay performance. Hemolysis, lipids, bilirubin, and human genomic RNA spiked into PC matrix at concentrations of 0.1–100 µg/mL did not interfere with respiratory virus detection, but background DNA/RNA spiked into PC matrix at concentrations ≥1 × 10⁷ cells/mL resulted in failure to detect viruses due to high background. To evaluate the potential matrix effect from samples with high host background, we analyzed 14 PCR-positive, highly mucoid BAL samples obtained from lung transplant or cystic fibrosis patients undergoing surveillance bronchoscopy (Supplementary Table 4). All 14 samples had high host background, and 13 (92.9%) of 14 had very high host background. As a result, 6 (42.9%) of 14 samples had detection of neither the internal spiked MS2 phage control nor a respiratory virus and were excluded from further analysis because they did not pass sequencing quality control criteria (Supplementary Table 1). The respiratory viral pathogen was detected in all (100%) of the remaining 8 samples. We concluded that highly mucoid samples can inhibit the assay due to high host background. Finally, we evaluated mNGS assay stability; qualitative detection was not affected by storing samples for up to 7 days at 4°C or subjecting them to 3 freeze/thaw cycles.
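A checkerboard layout in which no two loaded wells share an edge can be generated as follows. This is an illustrative sketch of the idea, not the authors' actual plate map: placing a high-titer sample at every third loaded well and the well labels are our assumptions, and diagonal neighbors still touch in a simple checkerboard.

```python
ROWS = "ABCDEFGH"    # 96-well plate rows
COLS = range(1, 13)  # 96-well plate columns 1-12


def checkerboard_wells():
    """Wells where (row index + column) is even, so no two share an edge."""
    return [f"{r}{c}" for i, r in enumerate(ROWS) for c in COLS if (i + c) % 2 == 0]


def assign_layout(n_positive=10, n_negative=24):
    """Place high-titer samples at every third checkerboard slot, controls elsewhere."""
    wells = checkerboard_wells()
    total = n_positive + n_negative
    if total > len(wells) or 3 * (n_positive - 1) >= total:
        raise ValueError("layout does not fit this spacing scheme")
    layout = {}
    for slot, well in enumerate(wells[:total]):
        is_positive = slot % 3 == 0 and slot // 3 < n_positive
        layout[well] = "high-titer SARS-CoV-2" if is_positive else "negative control"
    return layout


layout = assign_layout()
print(len(checkerboard_wells()), "checkerboard wells;", len(layout), "loaded")
```

Interleaving negative controls between high-titer wells means that any splash or aerosol transfer would most likely appear in an adjacent control well, where it is easy to detect.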

Accuracy

To evaluate accuracy, 191 residual samples remaining after routine clinical testing were obtained from the UCSF Clinical Microbiology Laboratory, including 110 virus-positive samples (104 upper respiratory swab samples and 6 BAL fluids) from patients with acute respiratory infection (Supplementary Dataset 1) and 81 virus-negative samples (52 upper respiratory swab samples and 29 BAL fluids) (Figure 4). Because more than one target may be positive by mNGS and by respiratory viral multiplex panel (RVP) testing using FDA-approved in vitro diagnostic (IVD) assays, sensitivity and specificity analyses were performed by assessing each result independently to assign true/false-positive/negative calls (see Methods for details). Compared with RVP RT-PCR results, the mNGS assay exhibited 93.6% (103 of 110) sensitivity, 93.8% (76 of 81) specificity, and 93.7% (179 of 191) accuracy.

Discrepancy testing and clinical adjudication (DTCA) of 14 mNGS-positive/RVP-negative samples, using blinded chart review by two board-certified infectious diseases physicians (PB and CYC) and orthogonal assays run by the California Department of Public Health Viral and Rickettsial Disease Laboratory, confirmed the presence of 9 respiratory viruses missed by RVP, which were reclassified as true positives (Supplementary Table 5). Viruses detected by mNGS but not targeted by RVP were not considered false-positive results. In one case, although the original RVP and orthogonal PCR testing returned negative results, mNGS identified rhinovirus C with high confidence. Review of the viral sequences revealed 12 non-overlapping reads across the human rhinovirus C genome (Supplementary Figure 3). Cross-contamination was ruled out, as no other sample in the sequencing batch tested positive for rhinovirus. A nucleotide BLAST (blastn) search confirmed that the sequences had high homology (95-98% identity) to known rhinovirus C strains (Supplementary Data 1). Although the exact primer binding sites for the clinical RT-PCR assays used in the current study are unknown, for the rhinovirus C sample we identified mismatches in the primer and probe regions of previously reported RT-PCR assays targeting the 5’-untranslated region (UTR)^28,29 (Supplementary Figure 3C), which likely explains the detection by mNGS despite negative RT-PCR results.
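Checking a primer or probe against a recovered viral sequence amounts to finding its best ungapped placement and counting mismatches. The sketch below does this on the forward strand only, treats IUPAC degenerate bases as literal letters, does not check reverse complements (as reverse primers would require), and uses toy sequences that are not the published rhinovirus primers or the study's reads.

```python
def best_primer_match(primer: str, target: str):
    """Return (start, mismatches) of the ungapped window with fewest mismatches.

    Forward strand only; ties keep the leftmost window.
    """
    primer, target = primer.upper(), target.upper()
    if len(primer) > len(target):
        raise ValueError("primer is longer than the target sequence")
    best = None
    for start in range(len(target) - len(primer) + 1):
        window = target[start:start + len(primer)]
        mismatches = sum(p != t for p, t in zip(primer, window))
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best


# Toy sequences for illustration only.
print(best_primer_match("ACGTACGT", "TTTTACGAACGTTTTT"))
```

Mismatches near a primer's 3′ end are generally the most disruptive to amplification, so a fuller check would also report the mismatch positions, not just their count.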

Similarly, DTCA was performed on the 7 mNGS-negative/RVP-positive samples, along with repeat RVP testing (on a different instrument when possible). This reassessment resulted in 5.5 samples being reclassified as true negatives (1 sample harbored two organisms, adjudicated as one true negative and one false negative) (Supplementary Table 6). Compared with a composite standard incorporating discrepancy testing and clinical adjudication, the positive, negative, and overall predictive agreements of the mNGS assay were 97.8% (110.5 of 113), 98.1% (76.5 of 78), and 97.9% (187 of 191), respectively.
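The agreement statistics in this and the preceding paragraph are simple ratios of per-target calls, with half-counts arising from the sample adjudicated as one true negative and one false negative. The sketch below recomputes them from the counts given in the text; the function name is ours.

```python
def agreement(tp: float, fn: float, tn: float, fp: float):
    """Return (positive, negative, overall) agreement in percent, one decimal."""
    positive = tp / (tp + fn)
    negative = tn / (tn + fp)
    overall = (tp + tn) / (tp + fn + tn + fp)
    return tuple(round(100 * v, 1) for v in (positive, negative, overall))


# Comparison with RVP RT-PCR alone (sensitivity, specificity, accuracy).
initial = agreement(tp=103, fn=7, tn=76, fp=5)
# Composite standard after discrepancy testing and clinical adjudication.
composite = agreement(tp=110.5, fn=113 - 110.5, tn=76.5, fp=78 - 76.5)
print(initial, composite)
```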

Detection of divergent viruses

To benchmark the capability of the modified SURPI+ pipeline to detect novel, highly divergent viruses in silico, we created a simulated sequencing output file containing 70 known human viral pathogens of clinical and public health significance, including those with pandemic potential (Figure 5, left). We then removed from the SURPI+ 2019 reference database all viral reference sequences of the same type (for example, all human polyomaviruses, coronaviruses, or parainfluenza viruses) or corresponding to the same genus or species (Figure 5, middle). Next, we used the SURPI+ pipeline to analyze the simulated sequencing file against both the original and “filtered” reference databases. In this analysis, 98.6% (69 of 70) of human viruses were detected at a sequencing depth of 100 reads per million (RPM), and 100% (70 of 70) at 1,000 RPM, based on homology to known animal or plant viruses (Figure 5, right). Of note, bunyaviruses pathogenic to humans, which are among the most divergent viruses, were still identified by translated nucleotide (amino acid) alignment to plant viruses; similarly, the alphavirus Venezuelan equine encephalitis virus was detected based on homology to vanilla latent virus (Figure 3).
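The benchmarking design hinges on removing every reference sequence that shares a genus or species with the simulated human viruses before classification. A minimal sketch of that filtering step and of the detection-rate arithmetic follows. The `RefEntry` record, the placeholder accessions, and the genus and species labels are hypothetical, and the actual SURPI+ database filtering is not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefEntry:
    accession: str  # placeholder identifiers, not real GenBank accessions
    genus: str
    species: str


def filter_reference(db, excluded_genera, excluded_species):
    """Drop every entry sharing a genus or species with the held-out viruses."""
    return [
        e for e in db
        if e.genus not in excluded_genera and e.species not in excluded_species
    ]


def detection_rate(n_detected: int, n_total: int) -> float:
    """Percentage of simulated viruses detected, rounded to one decimal."""
    return round(100 * n_detected / n_total, 1)


toy_db = [
    RefEntry("REF001", "GenusA", "Species A1"),
    RefEntry("REF002", "GenusA", "Species A2"),
    RefEntry("REF003", "GenusB", "Species B1"),
]
filtered = filter_reference(toy_db, {"GenusA"}, set())
print([e.accession for e in filtered])
print(detection_rate(69, 70), detection_rate(70, 70))
```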

We validated a clinical mNGS assay in a CLIA laboratory as an LDT for agnostic detection of viral respiratory pathogens, intended to aid in patient diagnosis and public health surveillance. Our main goal was to develop, optimize, and streamline a protocol for respiratory viral mNGS testing that could be deployed and run routinely in clinical or public health laboratories. The mNGS assay developed here has favorable performance characteristics compared with clinical RVP testing, including a limit of detection of ~500 copies/mL, viral load quantification with 100% linearity, and sensitivity, specificity, and accuracy of 93.6–93.8%. Moreover, in contrast to targeted assays such as RVP, the mNGS assay is capable, in principle, of detecting all known as well as novel viral pathogens in respiratory samples. In addition, after discrepancy testing and clinical adjudication, mNGS assay performance was superior to that of RVP (97.9% versus 95.0% overall agreement). The correlations we observed between viral load and disease severity highlight the potential for complementary quantitative viral load measurements to help distinguish between asymptomatic infection and/or colonization and overt and/or severe respiratory disease, thereby informing clinical management and treatment, as has been previously demonstrated for certain non-respiratory viruses such as cytomegalovirus (CMV)³⁰. Following completion of the validation, our assay received breakthrough device designation from the US FDA in August 2023. Widespread implementation of highly accurate, rapid mNGS assays such as this, with enhanced capacity to detect novel viruses, will support robust preparation for and rapid response to the next viral pandemic.

Speed is critical for the diagnosis of respiratory infections, especially in critically ill patients with lower respiratory involvement and in outbreak investigations of novel or emerging viruses with pandemic potential. We also aimed to develop an assay that could be widely deployed in clinical and public health laboratories. Thus, we optimized many steps of the mNGS assay and moved the key RNA/cDNA library preparation step to an automated platform, the MagicPrep NGS system (Tecan Genomics, Inc., Männedorf, Switzerland). We further demonstrated that sequencing can be performed on the Illumina MiniSeq using the Rapid Reagent Kit for a faster 5-hour run time or on the Illumina NextSeq 550Dx using the Mid-Output Reagent Kit for a 13-hour run time, depending on laboratory needs and priorities. Altogether, these modifications resulted in an assay with a turnaround time of 14-24 hours and ~2 hours of hands-on technician time.

Orthogonal testing and clinical adjudication performed on discordant results demonstrated that the RVP assay is an imperfect gold standard by which to judge mNGS performance. The mNGS assay was able not only to detect uncommon infections from viruses not covered by existing RVP panels but also, in multiple cases, to detect viruses that would in principle be detectable by RVP but tested negative. Unlike RVP, mNGS does not rely on specific primers or probes and is thus less susceptible to primer failure. Primer and probe mismatches arising from viral mutation, an inevitable feature of SARS-CoV-2 and many other RNA viruses³¹, can decrease assay sensitivity or cause false-negative results, as illustrated by the mNGS-positive, RVP-negative rhinovirus case presented here. Notably, a previous study evaluating the usefulness of published PCR primers for detecting rhinovirus infection reported that none of the published rhinovirus-specific PCR primer pairs could detect all human rhinoviruses in 101 genotyped clinical specimens³². In addition, the broader sampling of the viral genome by mNGS may increase sensitivity of virus detection compared with RVP because of greater robustness to variability in the relative levels of viral gene expression by infected cells³³. Most of the false-negative mNGS samples were confirmed as true negatives after chart review and repeat RVP testing. These most likely represented either false-positive results in the original RVP run, given the high cycle thresholds (>36) suggestive of low viral titers, or samples that had degraded over time and/or after multiple freeze-thaw cycles.

In this study, we used several approaches to demonstrate the capacity of the mNGS assay to identify novel and/or emerging viruses with divergent genomes. The assay successfully detected uncommon and unusual viral pathogens associated with both severe respiratory infections (bronchoalveolar lavage fluid) and central nervous system infections (CSF spiked into respiratory sample matrix). mNGS testing also enabled subtyping of specific viral strains with increased virulence, such as enterovirus D68, which has been linked to acute flaccid myelitis in children^34,35, and rhinovirus C, which has been associated with invasive pulmonary and bloodstream infection in immunocompromised patients^36,37. Importantly, the mNGS assay was also able to detect DNA viruses, such as adenovirus and bocavirus, in both clinical and contrived samples, despite the incorporation of DNase treatment in the protocol. Detection of DNA viruses is presumably based on detection of viral mRNA transcribed in infected cells, although it may also be enabled by incomplete DNA digestion by the DNase enzyme.

To evaluate the capacity of mNGS testing with a modified SURPI+ computational pipeline to identify novel viruses, we performed an in silico analysis in which a contrived metagenomic dataset, consisting of reads from the genomes of human viruses with pandemic potential spiked into a background dataset, was analyzed against a reference database depleted of all known human viral sequences. This analysis simulated whether “novel” human viruses with pandemic potential could be identified based on homology to known plant and animal viruses. All 70 of the human viral pathogens tested were successfully identified, including those with only remote homology to other viruses. Indeed, chikungunya virus, in the Alphavirus genus of the Togaviridae family, was identified (after removal of all alphaviruses) only because of distant homology to vanilla latent virus in the family Alphaflexiviridae. Notably, alphaflexiviruses contain a distinct lineage of alphavirus-like replication proteins that lack a recognized protease domain³⁸. Here we show in silico that the pipeline is able to detect highly diverse viruses from families that are known to include potential human pathogens and to emerge from animal reservoirs (for example, Bunyaviridae, Flaviviridae, and Adenoviridae). If a novel, highly divergent virus from an uncharacterized family, with little to no homology to known viruses, were detected, much more work would be needed to ascertain its clinical significance, or even whether it is capable of infecting humans, including formal assessment of Koch’s postulates as modified by Rivers for causality³⁹.

Our validation study has limitations. First, we tested very few bronchoalveolar lavage fluid samples from patients with acute respiratory infection (n=6) and very few clinical samples harboring rare or unusual respiratory viruses (n=7), and further validation of assay performance with these kinds of samples is needed. Second, mNGS testing was performed exclusively on samples from US patients, so viral pathogen diversity may not represent all populations globally. Third, we did not formally prove that the mNGS assay would be able to detect a novel, sequence-divergent virus, but instead demonstrated the ability of the test to detect such a virus using an in silico analysis, an approach which nonetheless has been used in previous studies to benchmark mNGS bioinformatic pipelines for viral pathogen discovery^40,41. Finally, we did not address the utility of the mNGS assay for routine diagnosis in patients with unexplained infections, or for outbreak surveillance in public health, which will likely require future prospective clinical and/or epidemiologic investigation.

Although the respiratory mNGS assay described here demonstrated high sensitivity and specificity for the detection of viral pathogens, it is currently unlikely to replace multiplex respiratory panels as a first-line test, since these panels are inexpensive and have more rapid turnaround times than mNGS. The projected cost of ~$300 (USD) per sample (Supplementary Table 7) makes the respiratory mNGS assay more expensive than standard RVP tests, which cost $77 to $149 in our clinical laboratory. However, the greatly expanded scope of detection, capability to identify novel emerging viruses, and comparable performance likely outweigh the cost in certain clinical and public health scenarios. The test could be particularly useful in public health laboratories, which are more likely to receive and test samples from patients infected with unusual or novel viruses that are not part of standard RVP testing. Of note, a modified protocol based on the assay was used to identify adeno-associated virus 2 in co-infections with adenoviruses and herpesviruses in cases of acute severe hepatitis in children as part of a nationwide US outbreak⁴². The mNGS assay could also be implemented as a second-line test in clinical laboratories for patients with presumed viral bronchiolitis or pneumonia when RVP testing is negative. This strategy would be useful for diagnosis of rare and/or unexpected infections in immunocompromised patients or returning travelers, for whom there is a wider differential diagnosis.

Resource availability

Lead Contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Charles Chiu.

Materials Availability

This study did not generate any new reagents.

Data and Code Availability

Human-subtracted raw sequence data were submitted to the Sequence Read Archive (SRA) database (BioProject accession number PRJNA1084017 and umbrella BioProject accession number PRJNA171119). Sequence metadata, custom scripts, and code for data analyses and visualization are available in a Zenodo data repository (https://doi.org/10.5281/zenodo.10553379).

Human Sample Collection

Residual laboratory-confirmed virus-positive upper respiratory swab or BAL samples from clinical patient testing were retrieved from the UCSF Clinical Microbiology Laboratory and stored according to protocols approved by the UCSF Institutional Review Board (protocol no. 11-05519). Acceptable upper respiratory swab samples included (1) bilateral nasopharyngeal swabs, (2) bilateral anterior nares swabs, (3) oropharyngeal swabs, (4) combined nasopharyngeal and oropharyngeal swabs, and (5) combined oropharyngeal/mid-turbinate nasal swabs. All samples were required to meet minimal sample handling, storage, and volume requirements for inclusion in our study. Samples were stored at 4°C for <24 hr before being de-identified, aliquoted, and stored at -80°C prior to mNGS processing, thus undergoing one freeze-thaw cycle.

Inclusion and Ethics

All residual samples meeting minimal requirements were included in the study. Samples were de-identified prior to processing.

External control preparation

External positive control (PC) was prepared by spiking a pooled negative nasal swab matrix with a commercially available reference material, the Accuplex Verification Panel (SeraCare, Milford, MA). This panel consists of a mixture of non-infectious SARS-CoV-2, influenza A, influenza B, and RSV genomes encapsidated in a synthetic protein coat to mimic the structure of a viral capsid. This PC material was “spiked in” at a titer of approximately 10⁴ copies/mL for each virus control, which is 1–2 logs higher than the estimated limit of detection of the assay (~500 copies/mL). The negative matrix was prepared by pooling nasopharyngeal swab samples from asymptomatic individuals and was used as an external negative control (NC).

Nucleic acid extraction

A 500-µL aliquot of upper respiratory swab or BAL fluid was centrifuged at 16,000 × g for 10 minutes. Total nucleic acid was extracted using the MagMAX™ Viral/Pathogen II (MVP II) Nucleic Acid Isolation Kit (Thermo Fisher Scientific, Waltham, MA) on the KingFisher™ Flex Purification System with a 96 deep-well head (Thermo Fisher Scientific, Waltham, MA). The protocol was modified to include DNase treatment during extraction as a host depletion step. Bacteriophage MS2 (Zeptometrix, Buffalo, NY) was added to all samples, including the negative control, as an internal qualitative control.

Library preparation and sequencing

Reverse transcription of purified RNA spiked with ERCC RNA controls (Invitrogen, Waltham, MA) and ribosomal RNA (rRNA) depletion were carried out simultaneously using the NEBNext® Ultra™ II RNA First Strand Synthesis Module (New England Biolabs, Ipswich, MA) and the QIAseq FastSelect-rRNA HMR Kit (Qiagen, Germantown, MD), respectively. This was followed by second-strand complementary DNA (cDNA) synthesis using Sequenase™ Version 2.0 DNA Polymerase (Thermo Fisher Scientific, Waltham, MA). The cDNA was purified using AMPure XP beads (Beckman Coulter, Brea, CA) and loaded onto the MagicPrep NGS instrument (Tecan Genomics, Inc., Männedorf, Switzerland) for end repair, adapter ligation and barcoding, amplification (25 cycles), and purification. Libraries were quantified and normalized using the Qubit dsDNA HS Assay on the Qubit Flex (Thermo Fisher Scientific, Waltham, MA). Final pooled libraries were sequenced as single-end reads on either the Illumina (San Diego, CA) MiniSeq using the Rapid Reagent Kit (100 cycles) or the Illumina NextSeq 550 using the Mid-Output or High-Output Kit (150 cycles).
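ERCC spike-ins are commonly used as an internal standard curve for quantification. The sketch below assumes a log-log linear relationship between ERCC input copies and mapped reads, which this section does not confirm. It fits that curve by least squares, reports R² (the quantity summarized under linearity), and inverts the curve to estimate input copies from a viral read count. The ERCC inputs and read counts are hypothetical. The sketch also ignores differences in genome length and input volume that a real copies/mL estimate would need to account for.

```python
import math

def fit_log_log(copies, reads):
    """Least-squares fit of log10(reads) = intercept + slope * log10(copies); returns (intercept, slope, r_squared)."""
    xs = [math.log10(c) for c in copies]
    ys = [math.log10(r) for r in reads]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

def estimate_copies(viral_reads, intercept, slope):
    """Invert the standard curve to convert a read count into an estimated input copy number."""
    return 10 ** ((math.log10(viral_reads) - intercept) / slope)

# Hypothetical ERCC inputs (copies per reaction) and the reads mapped to each control.
ercc_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
ercc_reads = [3, 28, 310, 2950, 30500]
a, b, r2 = fit_log_log(ercc_copies, ercc_reads)
print(f"slope={b:.3f} R^2={r2:.4f} estimated copies for 500 reads={estimate_copies(500, a, b):.0f}")
```

A slope near 1 means reads scale proportionally with input, which is what makes inverting the curve reasonable.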

Bioinformatics

The SURPI+ computational pipeline, run as a container (v1.0.0) on either a secure server or cloud infrastructure, was used to identify respiratory viral pathogens from mNGS data. Reads were preprocessed by adapter trimming and removal of low-complexity and low-quality sequences, followed by computational subtraction of human reads. The Scalable Nucleotide Alignment Program (SNAP)⁴³ nucleotide aligner was run with an edit distance of 16 against the National Center for Biotechnology Information (NCBI) nucleotide (NT) database (March 2019, with the SARS-CoV-2 Wuhan-Hu-1 genome, accession number NC_045512, added), with alignments then filtered to retain only viral reads. The pipeline was modified to include "tagging", or annotation, of entries belonging to reference sequence collections that form subsets of the NCBI NT database, such as FDA-ARGOS²³. Note that the FDA-ARGOS database, while quality controlled and regulated, contains only 1,428 microbial strains, most of which are bacterial, and it had not been updated with recent viruses such as SARS-CoV-2. As a result, no reads in this study matched viral genomes in FDA-ARGOS. The pipeline can also accommodate additional reference databases as needed, such as GISAID⁴⁴. The pipeline was further modified to include optional de novo assembly of reads into contiguous sequences (contigs) using SPAdes⁴⁵ and translated nucleotide alignment of both reads and contigs using DIAMOND⁴⁶. Viral reads are identified by DIAMOND at an e-value cutoff of 10⁻⁵. Coverage maps were automatically generated by mapping reads classified by SURPI as viral to the most likely reference genome.
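The DIAMOND e-value cutoff can be illustrated with a small filter over DIAMOND's tabular output. This sketch assumes the default 12-column `--outfmt 6` layout and keeps the highest-bitscore hit per read among hits with e-value ≤ 10⁻⁵. It does not show how protein subject IDs are mapped to viral taxa, which SURPI+ must do. The read and subject IDs are hypothetical.

```python
import csv
import io

# Default column order of DIAMOND's BLAST-style tabular output (--outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(tsv_text, evalue_cutoff=1e-5):
    """Return {read_id: (subject_id, evalue, bitscore)} for each read's best hit passing the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(FIELDS, row))
        evalue, bitscore = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > evalue_cutoff:
            continue  # alignment too weak to support a call
        current = best.get(rec["qseqid"])
        if current is None or bitscore > current[2]:
            best[rec["qseqid"]] = (rec["sseqid"], evalue, bitscore)
    return best

hits = "\n".join([
    "read1\tprotA\t85.0\t33\t5\t0\t1\t99\t10\t42\t2e-12\t60.1",
    "read1\tprotB\t70.0\t33\t10\t0\t1\t99\t10\t42\t1e-08\t45.0",
    "read2\tprotC\t40.0\t20\t12\t0\t1\t60\t5\t25\t1e-03\t22.0",
])
print(best_hits(hits))  # {'read1': ('protA', 2e-12, 60.1)}
```

Here read2 is dropped because its only hit (e-value 10⁻³) fails the 10⁻⁵ cutoff.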

Quality control metrics for the assay were based on those previously established for cerebrospinal fluid²¹. They include a minimum of 5 million preprocessed reads per sample, >75% of data with a quality score >30 (Q>30), and successful detection of both the 4 respiratory viruses in the PC and the internal spiked MS2 phage control. A detection was considered positive when ≥3 non-overlapping viral reads or contigs aligned to the target viral genome.
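The QC thresholds and the positive-call rule can be expressed as two small checks. This is a sketch, not the SURPI+ implementation. It assumes that "non-overlapping" means alignments whose [start, end] coordinates on the target genome share no positions, and it counts the largest such set with a greedy earliest-end scan. The `RunQC` fields, and the grouping of run-level (PC) and sample-level (MS2) checks into one record, are hypothetical simplifications.

```python
from dataclasses import dataclass

# Viruses carried by the external positive control (PC).
PC_TARGETS = {"SARS-CoV-2", "Influenza A", "Influenza B", "RSV"}

@dataclass
class RunQC:
    preprocessed_reads: int    # reads remaining after preprocessing
    fraction_q30: float        # fraction of data with Q>30, as 0-1
    pc_viruses_detected: set   # viruses called in the external PC on the same run
    ms2_detected: bool         # internal MS2 phage control detected in the sample

def passes_qc(qc: RunQC) -> bool:
    """Apply the minimum read count, Q30, PC and MS2 criteria."""
    return (qc.preprocessed_reads >= 5_000_000
            and qc.fraction_q30 > 0.75
            and PC_TARGETS <= qc.pc_viruses_detected
            and qc.ms2_detected)

def count_non_overlapping(intervals):
    """Size of the largest set of pairwise non-overlapping [start, end] alignments (greedy by end)."""
    count, last_end = 0, float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start > last_end:  # shares no position with the previously kept alignment
            count += 1
            last_end = end
    return count

def is_positive(intervals, min_hits=3):
    """Call a virus positive if at least min_hits non-overlapping alignments hit its genome."""
    return count_non_overlapping(intervals) >= min_hits

run = RunQC(6_200_000, 0.91, {"SARS-CoV-2", "Influenza A", "Influenza B", "RSV"}, True)
alignments = [(100, 250), (200, 350), (400, 550), (900, 1050)]  # hypothetical coordinates
print(passes_qc(run), is_positive(alignments))  # True True
```

Sorting by end coordinate and always keeping the earliest-ending compatible alignment yields the maximum non-overlapping count. In this example the two overlapping reads at 100–350 therefore count only once.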

Evaluation of mNGS analytical performance characteristics

The automated standard operating procedures and sequencing runs for these clinical validation studies were performed by a state-licensed clinical laboratory scientist. The limit of detection (LoD) was determined for each of the four representative organisms in the PC by probit analysis, using a series of dilutions ranging from 100 to 5,000 copies/mL with 10 to 40 replicates at each concentration. Linearity was demonstrated by plotting the standard curve. To validate quantification using the ERCC controls and the PC, we serially diluted HCV-positive plasma to known concentrations ranging from 4 × 10⁶ to 4 × 10³ copies/mL, tested each dilution in triplicate, and compared the mNGS-derived viral loads with the known concentrations. Precision was determined by repeat analysis of two PC and two NC samples within the same run, across 20 runs (intra-assay reproducibility), and by testing 20 PC and 20 NC samples across 20 separate runs (inter-assay reproducibility). To assess inclusivity, commercially available cultured supernatants of 17 respiratory viruses, with titers ranging from 1.3 × 10⁴ to 1.2 × 10⁸ TCID50/mL, were spiked into the negative control matrix at a 1:10 dilution. These viruses represented known sublineages and subspecies, and we evaluated whether the assay correctly identified each one. To evaluate detection of unusual viruses, we also tested confirmed virus-positive BAL (n=7) and CSF (n=4) samples spiked into negative matrix. To assess exclusivity, we tested a previously established mixture of seven representative pathogenic organisms to verify that it did not yield false-positive viral detections. We evaluated cross-contamination between adjacent sample wells and carryover contamination across successive runs from samples with high viral loads. Interference was determined using PC spiked with known amounts of hemolyzed blood, lipids, bilirubin, human RNA, and bacterial DNA/RNA. The effect of mucus was also assessed using overtly mucoid virus-positive BAL fluid.
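Probit-based LoD estimation fits the probability of detection against log10 concentration and then solves for the concentration at which that probability reaches a chosen hit rate. The authors used statsmodels. This dependency-free sketch instead fits the same probit model by Fisher scoring with step halving, using only the Python standard library. The dilution series, replicate counts, and detection counts are hypothetical. The 95% hit rate is an assumed convention, not a value stated in this section.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (cdf, pdf, inv_cdf)
EPS = 1e-12

def _prob(eta):
    # Clamp so log() and the weights stay finite at extreme linear predictors.
    return min(max(STD_NORMAL.cdf(eta), EPS), 1 - EPS)

def _loglik(b0, b1, data):
    """Binomial log-likelihood of the probit model (constant terms omitted)."""
    total = 0.0
    for x, n, y in data:
        p = _prob(b0 + b1 * x)
        total += y * math.log(p) + (n - y) * math.log(1 - p)
    return total

def fit_probit(concentrations, replicates, detected, max_iter=100, tol=1e-10):
    """Fit P(detect) = Phi(b0 + b1 * log10(conc)) by Fisher scoring with step halving."""
    data = [(math.log10(c), n, y) for c, n, y in zip(concentrations, replicates, detected)]
    b0 = b1 = 0.0
    ll = _loglik(b0, b1, data)
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, n, y in data:
            eta = b0 + b1 * x
            p, phi = _prob(eta), STD_NORMAL.pdf(eta)
            w = phi / (p * (1 - p))
            score = (y - n * p) * w   # contribution to the score (gradient)
            info = n * phi * w        # contribution to the expected information
            u0 += score
            u1 += score * x
            i00 += info
            i01 += info * x
            i11 += info * x * x
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det   # direction = inverse(information) @ score
        d1 = (i00 * u1 - i01 * u0) / det
        step = 1.0
        # Halve the step until the log-likelihood does not decrease.
        while step > 1e-8 and _loglik(b0 + step * d0, b1 + step * d1, data) < ll:
            step /= 2
        b0, b1 = b0 + step * d0, b1 + step * d1
        ll = _loglik(b0, b1, data)
        if max(abs(step * d0), abs(step * d1)) < tol:
            break
    return b0, b1

def lod(b0, b1, hit_rate=0.95):
    """Concentration at which the fitted detection probability equals hit_rate."""
    return 10 ** ((STD_NORMAL.inv_cdf(hit_rate) - b0) / b1)

# Hypothetical dilution series (copies/mL), replicates and detections; illustrative only.
conc = [100, 250, 500, 1000, 2500, 5000]
reps = [20, 20, 20, 20, 20, 20]
hits = [6, 12, 17, 19, 20, 20]
b0, b1 = fit_probit(conc, reps, hits)
print(f"LoD95 ~ {lod(b0, b1):.0f} copies/mL")
```

Working on the log10 scale is the usual choice because detection probability tends to rise sigmoidally with the logarithm of concentration.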
Stability was determined by storing samples for up to 7 days at 4°C or subjecting them to 3 freeze-thaw cycles. Accuracy was determined using 191 clinical samples from patients at the University of California, San Francisco (UCSF). These comprised 110 virus-positive samples (103 upper respiratory swab samples and 7 BAL fluids) from patients with acute respiratory infection and 81 virus-negative samples (52 upper respiratory swab samples and 29 BAL fluids). The RT-PCR comparator assays included the GenMark ePlex (Carlsbad, CA), Luminex NxTAG (Austin, TX), and/or Luminex Verigene RP Flex respiratory pathogen panels. mNGS results were compared first with the original clinical testing and then with a composite reference standard that incorporated discrepancy testing and clinical adjudication. In the second comparison, discordant results were resolved by orthogonal testing on a different instrument or at an independent CLIA laboratory (the California Department of Public Health), together with clinical adjudication, and mNGS results were reclassified accordingly. Because selective discrepancy testing can bias estimates of sensitivity and specificity, the second comparison is reported as positive percent agreement (PPA) and negative percent agreement (NPA).

Orthogonal discrepancy testing at the California Department of Public Health

Specimens were tested by real-time PCR based on CDC protocols using a viral respiratory panel, an unpublished CDPH laboratory-developed test (LDT). Viruses that can be detected by this panel include human metapneumovirus, respiratory syncytial virus, adenovirus, parainfluenza virus (types 1, 2, 3, and 4), enterovirus/rhinovirus, and human coronaviruses 229E, OC43, NL63, and HKU1.

In silico analysis for identification of novel and/or divergent viruses using the SURPI+ pipeline

To evaluate the ability of the pipeline to detect novel and/or divergent viruses, we performed an in silico analysis. Representative reference genomes of outbreak viruses of clinical and public health significance with pandemic potential were retrieved from the NCBI GenBank database and partitioned into non-overlapping segments. These segments were then randomly sampled and spiked in silico into a negative nasal swab matrix sequencing library. We then selected higher-level taxonomic identifiers (species, genus, and/or family) corresponding to these viruses and removed all entries with these identifiers from the SURPI+ reference database. Finally, we used the SURPI+ pipeline to analyze the simulated sequencing file against both the original and the "restricted reference" databases, and we evaluated how well the pipeline detected the "simulated" novel and/or divergent viruses that lacked a reference sequence.
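The in silico simulation can be sketched in three steps. First, cut each genome into consecutive non-overlapping segments. Second, randomly sample segments and spike them into a background library. Third, drop every reference entry whose lineage contains an excluded taxon. The function names, the fixed segment length, the shuffling step, and the use of plain taxon labels in place of NCBI taxonomy IDs are all illustrative assumptions.

```python
import random

def partition(genome, segment_len):
    """Split a genome into consecutive, non-overlapping segments (a shorter tail is dropped)."""
    return [genome[i:i + segment_len]
            for i in range(0, len(genome) - segment_len + 1, segment_len)]

def simulate_reads(genome, segment_len, n_reads, seed=0):
    """Randomly sample up to n_reads distinct segments to act as simulated reads."""
    segments = partition(genome, segment_len)
    return random.Random(seed).sample(segments, min(n_reads, len(segments)))

def spike_in(background_reads, simulated_reads, seed=0):
    """Mix simulated reads into a background library in random order."""
    mixed = list(background_reads) + list(simulated_reads)
    random.Random(seed).shuffle(mixed)
    return mixed

def restrict_reference(reference, lineages, excluded_taxa):
    """Drop every reference entry whose lineage contains an excluded species/genus/family."""
    excluded = set(excluded_taxa)
    return {acc: seq for acc, seq in reference.items() if not set(lineages[acc]) & excluded}

rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(10_000))  # random stand-in genome
reads = simulate_reads(genome, segment_len=150, n_reads=20, seed=1)
library = spike_in(["background_read"] * 100, reads, seed=2)

reference = {"acc1": "ACGT", "acc2": "GGCC", "acc3": "TTAA"}  # hypothetical entries
lineages = {"acc1": ["Viruses", "Coronaviridae", "Betacoronavirus"],
            "acc2": ["Viruses", "Orthomyxoviridae", "Alphainfluenzavirus"],
            "acc3": ["Viruses", "Picornaviridae", "Enterovirus"]}
restricted = restrict_reference(reference, lineages, {"Betacoronavirus"})
print(sorted(restricted))  # ['acc2', 'acc3']
```

Excluding at the genus or family level removes all close relatives of a virus at once. Any remaining detection must then come from more distant homologs, which is the condition the restricted-reference comparison is meant to test.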

Statistical analyses

Sensitivity and specificity were calculated as follows. Because more than one target may be positive by mNGS and by the respiratory viral panel (RVP) comparator, each result was assessed independently in every sample and classified as a true or false positive or negative. However, the total number of observations was kept constant (one sample = one observation = 1). For instance, if a test detected two organisms, namely the true causative pathogen and a contaminant, the former was assigned 0.5 true positive (TP) and the latter 0.5 false positive (FP), so that their sum always equaled 1. In addition, because the RVP comparator includes a limited number of targets, mNGS-positive, RVP-negative results for viruses that are not RVP targets were not considered false positives.
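The fractional scoring rule can be sketched as follows. The text does not fully specify how the scored results were defined or how true negatives were counted, so this sketch makes three assumptions. First, each sample's scored results are all RVP-positive targets plus those mNGS-positive targets that are on the RVP panel. Second, each scored result carries a weight of 1/k. Third, a sample with no scored results counts as one true negative. The sample data and panel contents are hypothetical.

```python
from collections import Counter

def score_sample(mngs, rvp, panel_targets):
    """Split one sample's single observation (weight 1) across its scored results.

    mNGS-only detections of viruses outside the RVP panel are not scored, so they
    are never counted as false positives (assumed reading of the rule).
    """
    mngs, rvp = set(mngs), set(rvp)
    scored = rvp | (mngs & set(panel_targets))
    if not scored:
        return Counter({"TN": 1.0})  # no panel target positive by either assay
    weight = 1.0 / len(scored)
    tally = Counter()
    for virus in scored:
        if virus in mngs and virus in rvp:
            tally["TP"] += weight
        elif virus in mngs:
            tally["FP"] += weight
        else:
            tally["FN"] += weight
    return tally

def summarize(samples, panel_targets):
    """Aggregate fractional counts over (mngs_calls, rvp_calls) pairs into summary metrics."""
    totals = Counter()
    for mngs, rvp in samples:
        totals.update(score_sample(mngs, rvp, panel_targets))
    tp, fp, fn, tn = (totals[k] for k in ("TP", "FP", "FN", "TN"))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

panel = {"RSV", "Influenza A", "Rhinovirus"}  # hypothetical panel
samples = [({"RSV"}, {"RSV"}),                               # 1 TP
           ({"Influenza A", "Rhinovirus"}, {"Influenza A"}),  # 0.5 TP + 0.5 FP
           (set(), set()),                                    # 1 TN
           ({"HHV-6"}, set()),                                # off-panel, scored as 1 TN
           (set(), {"Rhinovirus"})]                           # 1 FN
print(summarize(samples, panel))  # {'sensitivity': 0.6, 'specificity': 0.8, 'accuracy': 0.7}
```

The second sample reproduces the example in the text: one correct call and one extra call split the sample's single observation into 0.5 TP and 0.5 FP.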

Statistical analyses were performed using scipy (version 1.5.3) and rstatix (version 0.7.0) packages as implemented in Python (version 3.7.12) and R (version 4.0.3), respectively. Probit regression analyses were done using scipy (version 1.5.3), numpy (version 1.19.1) and statsmodels (version 0.12.2) as implemented in Python software (version 3.7.12).

Acknowledgments

We thank the staff at the UCSF Clinical Microbiology Laboratory for help in collecting nasopharyngeal swab and bronchoalveolar lavage fluid samples. This work was financially supported in part by BARDA EZ-BAA award 75A50122C00022 (C.Y.C.), US CDC grants 75D30122C15360 and 75D30121C12641 (C.Y.C.), Abbott Laboratories (C.Y.C.), and the Chan-Zuckerberg Biohub (C.Y.C.). The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review or approval of the manuscript; and decision to submit the manuscript for publication.

Disclaimer

The content of this paper is solely the responsibility of the authors and does not represent the official views or opinions of the National Institutes of Health, Kaiser Permanente, California Department of Public Health or the California Health and Human Services Agency. Use of trade names and commercial sources is for identification only and does not imply endorsement by the California Department of Public Health or the California Health and Human Services Agency.

Competing interests

C.Y.C. is a founder of Delve Bio and on the scientific advisory board for Delve Bio, Flightpath Biosciences, Biomeme, Mammoth Biosciences, BiomeSense and Poppy Health. He is also an inventor on US patent 11380421, “Pathogen detection using next generation sequencing”, under which algorithms for taxonomic classification, filtering and pathogen detection are used by SURPI+ software. C.Y.C. receives research support from Delve Bio and Abbott Laboratories, Inc. The other authors declare no competing interests.

Author contributions

J. Tan, V.S., D.S., and C.Y.C. conceived and designed the study. J. Tan, V.S., D.S., N.S., A.F., H.J.H., J.N., M.O., N.B., J. Tang, D.I., B.F., H.R., M.H., C.M., D.A.W., and C.Y.C. coordinated the sequencing efforts and laboratory studies. J. Tan, A.C., H.C., and S.Y. processed samples. J. Tan, V.S., D.S., E.K., A.C., H.C., S.Y., M.D.L., P.B., and C.Y.C. analyzed data. J. Tan, N.S., A.F., J.N., M.O., P.M.M., and C.L. collected samples. J. Tan, V.S., E.K., P.B., M.D.L., and C.Y.C. wrote the manuscript. J. Tan, V.S., E.K., P.B., and C.Y.C. prepared the figures. J. Tan, V.S., D.S., E.K., N.S., A.F., H.J.H., J.N., M.O., N.B., J. Tang, D.I., B.F., H.R., M.H., D.A.W., P.M.M., C.R.L., M.D.L., P.B., and C.Y.C. edited the manuscript. J. Tan, V.S., E.K., M.D.L., P.B., and C.Y.C. revised the manuscript. All authors read the manuscript and agree to its contents.

GBD 2013 DALYs and HALE Collaborators. Global, regional, and national disability-adjusted life years (DALYs) for 306 diseases and injuries and healthy life expectancy (HALE) for 188 countries, 1990-2013: quantifying the epidemiological transition. Lancet 386, 2145-2191, doi: 10.1016/S0140-6736(15)61340-X (2015).
Jain, S., et al. Community-Acquired Pneumonia Requiring Hospitalization among U.S. Adults. N Engl J Med 373, 415-427, doi: 10.1056/NEJMoa1500245 (2015).
Jain, S., et al. Community-acquired pneumonia requiring hospitalization among U.S. children. N Engl J Med 372, 835-845, doi: 10.1056/NEJMoa1405870 (2015).
Musher, D.M. & Thorner, A.R. Community-acquired pneumonia. N Engl J Med 371, 1619-1628, doi: 10.1056/NEJMra1312885 (2014).
Charlton, C.L., et al. Practical Guidance for Clinical Microbiology Laboratories: Viruses Causing Acute Respiratory Tract Infections. Clin Microbiol Rev 32, doi: 10.1128/CMR.00042-18 (2019).
Evans, S.E., et al. Nucleic Acid-based Testing for Noninfluenza Viral Pathogens in Adults with Suspected Community-acquired Pneumonia. An Official American Thoracic Society Clinical Practice Guideline. Am J Respir Crit Care Med 203, 1070-1087, doi: 10.1164/rccm.202102-0498ST (2021).
Jain, S. Epidemiology of Viral Pneumonia. Clin Chest Med 38, 1-9, doi: 10.1016/j.ccm.2016.11.012 (2017).
Schlaberg, R., et al. Viral Pathogen Detection by Metagenomics and Pan-Viral Group Polymerase Chain Reaction in Children With Pneumonia Lacking Identifiable Etiology. J Infect Dis 215, 1407-1415, doi: 10.1093/infdis/jix148 (2017).
Jones, K.E., et al. Global trends in emerging infectious diseases. Nature 451, 990-993, doi: 10.1038/nature06536 (2008).
Zhou, P., et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270-273, doi: 10.1038/s41586-020-2012-7 (2020).
Lu, R., et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet 395, 565-574, doi: 10.1016/S0140-6736(20)30251-8 (2020).
Chiu, C.Y. & Miller, S.A. Clinical metagenomics. Nat Rev Genet 20, 341-355, doi: 10.1038/s41576-019-0113-7 (2019).
Simner, P.J., Miller, S. & Carroll, K.C. Understanding the Promises and Hurdles of Metagenomic Next-Generation Sequencing as a Diagnostic Tool for Infectious Diseases. Clin Infect Dis 66, 778-788, doi: 10.1093/cid/cix881 (2018).
Blauwkamp, T.A., et al. Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease. Nat Microbiol 4, 663-674, doi: 10.1038/s41564-018-0349-6 (2019).
Gaston, D.C., et al. Evaluation of Metagenomic and Targeted Next-Generation Sequencing Workflows for Detection of Respiratory Pathogens from Bronchoalveolar Lavage Fluid Specimens. J Clin Microbiol 60, e0052622, doi: 10.1128/jcm.00526-22 (2022).
Wilson, M.R., et al. Clinical Metagenomic Sequencing for Diagnosis of Meningitis and Encephalitis. N Engl J Med 380, 2327-2340, doi: 10.1056/NEJMoa1803396 (2019).
Lee, R.A., Al Dhaheri, F., Pollock, N.R. & Sharma, T.S. Assessment of the Clinical Utility of Plasma Metagenomic Next-Generation Sequencing in a Pediatric Hospital Population. J Clin Microbiol 58, doi: 10.1128/JCM.00419-20 (2020).
Han, D., et al. The Real-World Clinical Impact of Plasma mNGS Testing: an Observational Study. Microbiol Spectr 11, e0398322, doi: 10.1128/spectrum.03983-22 (2023).
Miller, S. & Chiu, C. The Role of Metagenomics and Next-Generation Sequencing in Infectious Disease Diagnosis. Clin Chem 68, 115-124, doi: 10.1093/clinchem/hvab173 (2021).
Benoit, P., et al. Metagenomic next-generation sequencing of cerebrospinal fluid for diagnosis of central nervous system infections: 7-year performance of a clinically validated test. medRxiv, doi: (2024).
Miller, S., et al. Laboratory validation of a clinical metagenomic sequencing assay for pathogen detection in cerebrospinal fluid. Genome Res 29, 831-842, doi: 10.1101/gr.238170.118 (2019).
Naccache, S.N., et al. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res 24, 1180-1192, doi: 10.1101/gr.171934.113 (2014).
Sichtig, H., et al. FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science. Nat Commun 10, 3313, doi: 10.1038/s41467-019-11306-6 (2019).
Clinical and Laboratory Standards Institute. Molecular Methods for Genotyping and Strain Typing of Infectious Organisms, 1st Edition. Vol. 24 (Clinical and Laboratory Standards Institute, Wayne, Pennsylvania, 2021).
Clinical and Laboratory Standards Institute. Validation and Verification of Multiplex Nucleic Acid Assays, 2nd Edition. Vol. 9 (Clinical and Laboratory Standards Institute, Wayne, Pennsylvania, 2018).
Espy, M.J., et al. Real-time PCR in clinical microbiology: applications for routine laboratory testing. Clin Microbiol Rev 19, 165-256, doi: 10.1128/CMR.19.1.165-256.2006 (2006).
Hayden, R.T., et al. Progress in Quantitative Viral Load Testing: Variability and Impact of the WHO Quantitative International Standards. J Clin Microbiol 55, 423-430, doi: 10.1128/JCM.02044-16 (2017).
Andeweg, A.C., Bestebroer, T.M., Huybreghs, M., Kimman, T.G. & de Jong, J.C. Improved detection of rhinoviruses in clinical samples by using a newly developed nested reverse transcription-PCR assay. J Clin Microbiol 37, 524-530, doi: 10.1128/JCM.37.3.524-530.1999 (1999).
Lu, X., et al. Real-time reverse transcription-PCR assay for comprehensive detection of human rhinoviruses. J Clin Microbiol 46, 533-539, doi: 10.1128/JCM.01739-07 (2008).
Razonable, R.R. & Hayden, R.T. Clinical utility of viral load in management of cytomegalovirus infection after solid organ transplantation. Clin Microbiol Rev 26, 703-727, doi: 10.1128/CMR.00015-13 (2013).
Clark, C., Schrecker, J., Hardison, M. & Taitel, M.S. Validation of reduced S-gene target performance and failure for rapid surveillance of SARS-CoV-2 variants. PLoS One 17, e0275150, doi: 10.1371/journal.pone.0275150 (2022).
Faux, C.E., et al. Usefulness of published PCR primers in detecting human rhinovirus infection. Emerg Infect Dis 17, 296-298, doi: 10.3201/eid1702.101123 (2011).
Russell, A.B., Trapnell, C. & Bloom, J.D. Extreme heterogeneity of influenza virus infection in single cells. Elife 7, doi: 10.7554/eLife.32303 (2018).
Greninger, A.L., et al. A novel outbreak enterovirus D68 strain associated with acute flaccid myelitis cases in the USA (2012-14): a retrospective cohort study. Lancet Infect Dis 15, 671-682, doi: 10.1016/S1473-3099(15)70093-9 (2015).
Messacar, K., et al. Enterovirus D68 and acute flaccid myelitis-evaluating the evidence for causality. Lancet Infect Dis 18, e239-e247, doi: 10.1016/S1473-3099(18)30094-X (2018).
Lupo, J., et al. Disseminated rhinovirus C8 infection with infectious virus in blood and fatal outcome in a child with repeated episodes of bronchiolitis. J Clin Microbiol 53, 1775-1777, doi: 10.1128/JCM.03484-14 (2015).
Sayama, A., et al. Comparison of Rhinovirus A-, B-, and C-Associated Respiratory Tract Illness Severity Based on the 5'-Untranslated Region Among Children Younger Than 5 Years. Open Forum Infect Dis 9, ofac387, doi: 10.1093/ofid/ofac387 (2022).
Kreuze, J.F., et al. ICTV Virus Taxonomy Profile: Alphaflexiviridae. J Gen Virol 101, 699-700, doi: 10.1099/jgv.0.001436 (2020).
Guo, C. & Wu, J.Y. Pathogen Discovery in the Post-COVID Era. Pathogens 13, doi: 10.3390/pathogens13010051 (2024).
Wood, D.E. & Salzberg, S.L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15, R46, doi: 10.1186/gb-2014-15-3-r46 (2014).
Flygare, S., et al. Taxonomer: an interactive metagenomics analysis portal for universal pathogen detection and host mRNA expression profiling. Genome Biol 17, 111, doi: 10.1186/s13059-016-0969-1 (2016).
Servellita, V., et al. Adeno-associated virus type 2 in US children with acute severe hepatitis. Nature 617, 574-580, doi: 10.1038/s41586-023-05949-1 (2023).
Zaharia, M., et al. Alignment in a SNAP: Cancer Diagnosis in the Genomic Age. Laboratory Investigation 92, 458a-458a, doi: (2012).
Shu, Y. & McCauley, J. GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill 22, doi: 10.2807/1560-7917.ES.2017.22.13.30494 (2017).
Bankevich, A., et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19, 455-477, doi: 10.1089/cmb.2012.0021 (2012).
Buchfink, B., Xie, C. & Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59-60, doi: 10.1038/nmeth.3176 (2015).

Table 1. Performance characteristics of the UCSF viral respiratory mNGS assay

Metric	Method	Expected target	Result
Limit of detection (LoD)	Detection of PC dilutions by probit analysis	<1,000 copies/mL	SARS-CoV-2, 439 copies/mL; influenza A, 706 copies/mL; influenza B, 493 copies/mL; RSV, 563 copies/mL
Linearity	Correlation of PC with assay quantification	R² > 90%	R² = 100%
Precision	Intra-assay: PC and NC within the same run, across 20 runs	Concordance 100% EA; log-transformed CV <10%	Concordance 100% EA; log-transformed CV <10%
	Inter-assay: PC and NC across 20 separate runs	Concordance 100% EA; log-transformed CV <30%	Concordance 100% EA; log-transformed CV <30%
Inclusivity	Detection of viruses from diluted culture supernatant	100% detection	100% detection (17/17)
	Detection of viruses in diluted virus-positive BAL/CSF samples	100% detection	100% detection (11/11)
Exclusivity	Detection of viruses in known organism mixtures^a	No false positives	No false positives
Contamination	Detection of cross-contamination between sample wells	No carryover contamination	Cross-contamination of 0.1% between adjacent wells; no carryover contamination
Interference	Detection of PC spiked with hemolyzed blood	Detection at all concentrations	Detection at all concentrations
	Detection of PC spiked with human RNA	Detection at all concentrations	Detection at all concentrations
	Detection of PC spiked with bacterial DNA/RNA	Detection at concentrations ≤10⁷ cells/mL	Detection at concentrations ≤10⁷ cells/mL
	Detection of virus-positive, overtly mucoid BAL samples	Detection in all BAL samples	Target detected in 13/14 (92.9%) valid sample runs
Stability	Detection of targets in samples held at 4°C for 7 days or after 3 freeze-thaw cycles	100% concordance	100% concordance
Accuracy	Detection in virus-positive and virus-negative samples (n = 191)	Sensitivity >90%; specificity >90%; accuracy >90%; PPA >90%; NPA >90%	Original testing: sensitivity 93.6%, specificity 93.8%, accuracy 93.7%. After discrepancy testing and clinical adjudication: PPA 98.7%, NPA 98.1%, overall 97.9%
Detection of divergent viruses	Detection of divergent viruses by in silico analysis (n = 70)	Sensitivity >95%; specificity >95%	Sensitivity 98.6%; specificity 100%

(PC) Positive control consisting of 4 respiratory viruses spiked into pooled nasopharyngeal swab matrix; (IC) spiked internal control consisting of an RNA MS2 phage; (NC) negative control; (EA) essential agreement; (CV) coefficient of variation; (PPA) positive percent agreement; (NPA) negative percent agreement.

^aTwo mixtures were assessed. The first mixture included detectable concentrations of CMV, HIV, Klebsiella pneumoniae, Streptococcus agalactiae, Aspergillus niger, Cryptococcus neoformans, and Toxoplasma gondii, and corresponds to positive control material from a previously validated CSF assay²¹. The second mixture was a commercial reference panel, the ZymoBIOMICS Microbial Community Standard (Zymo Research, Tustin, CA), consisting of 10 bacterial and fungal organisms at varying concentrations (Listeria monocytogenes, 12%; Pseudomonas aeruginosa, 12%; Bacillus subtilis, 12%; Escherichia coli, 12%; Salmonella enterica, 12%; Lactobacillus fermentum, 12%; Enterococcus faecalis, 12%; Staphylococcus aureus, 12%; Saccharomyces cerevisiae, 2%; and Cryptococcus neoformans, 2%), which was spiked into negative nasopharyngeal swab matrix.

Table 2. Detection of a broad range of viruses in contrived samples

Contrived sample type	Virus correctly identified by mNGS assay
Positive cerebrospinal fluid (CSF) spiked in negative matrix	Lymphocytic Choriomeningitis Virus (LCMV)
	Herpes simplex virus 2 (HSV-2)
	Varicella-zoster virus (VZV)
	Herpes simplex virus 1 (HSV-1) and Epstein-Barr Virus (EBV)
Positive bronchoalveolar lavage (BAL) spiked in negative matrix	Parainfluenza Virus Type 4
	Parechovirus A
	Influenza C Virus
	Human Bocavirus
	Primate Bocaparvovirus 1
	Coronavirus 229E
	Coronavirus NL63
Viral culture fluid spiked in negative control matrix (1:10)	Adenovirus Type 1
	Coronavirus 229E
	Coronavirus NL63
	Coxsackie Virus Type A1
	Echovirus
	Human Metapneumovirus 16
	Influenza B Virus
	Measles Virus
	Mumps Virus
	Parainfluenza Virus Type 2
	Parainfluenza Virus Type 3
	Parainfluenza Virus Type 4A
	Parechovirus Type 1
	Rhinovirus A16
	Rhinovirus B14
	Rubella Virus
	Influenza B Virus


SupplementaryDataset1.xlsx
Supplementary Dataset 1. Clinical diagnosis and disease severity for patients whose respiratory samples were analyzed as part of the mNGS accuracy evaluation. Abbreviations: CAR-T, chimeric antigen receptor T-cell; COVID-19, coronavirus disease 2019; CMV, cytomegalovirus; CXR, chest x-ray; Flu, influenza; ICU, intensive care unit; PCR, polymerase chain reaction; RSV, respiratory syncytial virus; SOB, shortness of breath.
SupplementaryMaterialver5.docx
Supplementary Material
