Flow virometry for water-quality assessment: Protocol optimization for a model virus and automation of data analysis

doi:10.21203/rs.3.rs-1674358/v1

This work is licensed under a CC BY 4.0 License

Journal publication: published 04 Apr, 2023 in npj Clean Water. This text is the latest preprint version.

Flow virometry (FVM) can support advanced water treatment and reuse by delivering near real-time information about viral water quality. But realizing the full potential of FVM in applications relevant to water treatment and reuse requires consistent, optimized protocols to facilitate data validation and interlaboratory comparison, as well as approaches to protocol design that can extend the suite of viruses that FVM can feasibly and efficiently monitor. We address these needs herein. First, we optimize a sample-preparation protocol for a model virus using a fractional factorial experimental design. The final protocol for FVM-based detection of T4, an environmentally relevant viral surrogate, blends and improves on existing protocols developed using a traditional “pipeline”-style optimization approach. Second, we test whether density-based clustering can aid and improve analysis of viral surrogates in complex matrices relative to manual gating. We compare manual gating with results obtained through algorithmic clustering: specifically, by coupling the OPTICS (Ordering Points To Identify the Clustering Structure) ordering algorithm with either manual or automated extraction of clusters from the OPTICS-ordered data. We demonstrate that OPTICS-assisted clustering can in some cases work as well as or better than manual gating of FVM data, and is far faster and less labor-intensive. OPTICS-assisted clustering can also point to features in FVM data that are difficult to detect through manual gating alone. We demonstrate our combined sample-preparation and automated data analysis pipeline on tertiary-treated wastewater samples collected from a water-recycling facility and augmented with T4 and virus-sized fluorescent particles. As use of FVM for viral water-quality assessment expands, we recommend that this protocol be used to validate instrument performance prior to and alongside application of FVM on environmental samples. 
Widespread adoption of a consistent, optimized analytical approach that (i) centers on a widely available, easy-to-use viral target, and (ii) includes automated data analysis will bolster confidence in FVM as a reliable approach for microbial water-quality monitoring.

Water reuse is becoming essential to meeting water demand. Strategies for nonpotable and indirect potable reuse are well established [1,2]. Direct potable reuse (DPR)—i.e., reuse of water for potable purposes without an environmental buffer—represents the final frontier. While DPR offers multiple advantages [3], it also engenders concerns about technical feasibility, cost, safety, and societal acceptance. The California State Water Resources Control Board concluded that improved methods of monitoring waterborne microorganisms “would enhance the understanding and acceptability of DPR” by reducing threats to human health [4].

Flow cytometry (FCM) is emerging as one such method. FCM characterizes particles (including biological and non-biological targets) based on how they scatter light and/or fluoresce when passing through one or more laser beams. Improvements in instrumentation and techniques have recently enabled applications of FCM for water-quality assessment [5], but key knowledge gaps persist. One major need is improved protocols for FCM-based detection and enumeration of viruses (a subset of FCM known as flow virometry, or FVM) in environmental water samples [6]. Researchers often struggle to differentiate actual viruses from similarly sized virus-like particles (VLPs) such as exosomes and microvesicles (Reyes and Aguilar, 2018), and there are no widely accepted methods for performing this differentiation [8]. In a 2018 review, Lippé observed that absolute quantification of targets is a “persistent issue” for flow virometry given “lack of proper standards.” Lippé suggested that “well-characterized, fixed, and uniform viruses, particularly the more rigid nonenveloped ones, could eventually be employed as biological [flow virometric] standards.” Consistent, optimized protocols for FVM analysis of such viruses would facilitate comparison of results and data exchange across labs [9]. Moreover, Dlusskaya et al. (2021) demonstrated that flow virometry is currently “neither sensitive nor accurate enough to quantify most natural viral populations” [7]. Advances in FVM hardware and dyes will be needed for flow virometry to detect smaller viral classes in wastewater, such as enteric viruses. But better protocols could extend the suite of viruses that can feasibly be monitored through FVM in the interim.

Brussaard and colleagues demonstrated [10] and refined [11] a staining protocol for FVM-based detection of a variety of viruses through general nucleic-acid staining. This protocol has been used and adapted by many others. However, Huang et al. (2015) reported that the Brussaard protocol did not enable clear separation of virus signal from noise in samples from a water-reclamation plant, and recommended several variations [11]. Brussaard et al. and Huang et al., like most researchers developing flow virometric protocols, used “pipeline”-type strategies [12] for protocol optimization. A problem with the pipeline approach is that it overlooks potential interaction effects between factors of interest. Varying factors sequentially is also an inefficient way to identify factors with the most substantial main effects.

In addition, FVM data is typically presented by plotting the intensity and frequency of electronic signals recorded by a cytometer’s detectors. Researchers manually analyze the data by drawing “gates” around clusters of points and relating the gated populations to treatments and/or outcomes of interest. The success of this workflow is problematically dependent on researcher expertise. Bashashati and Brinkman (2009) found that when identical samples were analyzed via FVM by 15 experienced laboratories, the mean interlaboratory coefficient of variation ranged from 17–44%...with most of the variation attributed to gating differences [13]. Reliance on time-consuming manual gating also impedes use of FVM for real-time microbial water-quality assessment.

This study addresses both challenges. First, we leverage the Brussaard and Huang protocols to optimize the detection of T4 bacteriophage—an environmentally relevant viral surrogate—using a fractional factorial approach. While a fractional factorial design is a well-recognized way to rigorously identify “the most important factors or process/design parameters that influence critical quality characteristics” [14], the method has been surprisingly underutilized in environmental applications of FVM. Second, we test whether density-based clustering can aid and improve analysis of viral surrogates in complex matrices relative to manual gating. We demonstrate the combined protocols on tertiary treated wastewater samples collected from a water recycling facility and augmented with T4, and explain how the combined protocols could help improve the quality of results obtained from FVM in contexts relevant to water treatment and reuse.

Optimizing staining through fractional factorial experimental design

Our first objective was to test whether a fractional factorial experimental design can yield an improved protocol for FVM-based detection of viruses treated with a general nucleic-acid stain. We used the protocols developed by Brussaard et al. and Huang et al. as scaffolding for our design. Brussaard et al. recommended fixing the sample with glutaraldehyde at a final concentration of 0.5%, flash-freezing in liquid nitrogen, diluting in Tris-EDTA (TE) buffer, staining with SYBR Green I at a final dilution of 5 × 10^-5 the commercial stock, and incubating the sample with the stain for 10 min in the dark at 80°C. Huang et al. concluded that better results for reclaimed-water samples could be obtained by using a 0.2% glutaraldehyde concentration, omitting flash-freezing, staining at room temperature for 15 minutes, using SYBR Gold instead of SYBR Green I, and staining at a final dilution of 1 × 10^-4. We combined treatment steps from the two protocols into a 2^(6-2) fractional factorial experimental design of resolution IV (replicated 4x) to assess main and interaction effects of six two-level factors on nucleic-acid staining of T4 for FVM analysis: (1) stain concentration, (2) staining temperature, (3) staining time, (4) additive, (5) diluent, and (6) stain type.
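To make the structure of such a design concrete, the following is a minimal sketch of how a 16-run 2^(6-2) design can be enumerated. The generators (E = ABC, F = BCD) are a common textbook choice that gives resolution IV; they are an assumption here, since the text does not state which generators were used. The factor names are illustrative labels for the six factors listed above.

```python
from itertools import product

# Illustrative labels for the six two-level factors named in the text.
FACTORS = ["stain_conc", "stain_temp", "stain_time",
           "additive", "diluent", "stain_type"]

def fractional_factorial_2_6_2():
    """Return the 16 runs of a 2^(6-2) design as tuples of -1/+1 levels."""
    runs = []
    # Full factorial in the four base factors A, B, C, D.
    for a, b, c, d in product((-1, 1), repeat=4):
        e = a * b * c   # assumed generator E = ABC
        f = b * c * d   # assumed generator F = BCD
        runs.append((a, b, c, d, e, f))
    return runs

design = fractional_factorial_2_6_2()
print(len(design), "runs per replicate")
print(dict(zip(FACTORS, design[0])))
```

With these generators every factor is at its high level in exactly eight of the 16 runs, and no main effect is aliased with a two-factor interaction. Replicating the design four times would give 64 runs, versus 256 for a four-times-replicated full 2^6 factorial.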

A representative suite of results plots is displayed in Figure 1. Collective results from the T4 optimization are also summarized graphically in Figure S1. A distinct target population was only visible for the eight glutaraldehyde-treated runs. Indeed, glutaraldehyde addition had a highly significant (p < 0.001) effect on total event count, mean fluorescence intensity (MFI; a measure of brightness achieved through nucleic-acid staining), and the fluorescence coefficient of variation (CV; a measure of the tightness of the target population). Adding glutaraldehyde increased the total sample event count by 65,402 events, increased MFI by 360 units, and decreased fluorescence CV by 9 percentage points.
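The three response variables can be computed directly from the per-event fluorescence values of a gated population. The sketch below assumes MFI is the arithmetic mean and CV is the population standard deviation divided by the mean; cytometry software may use a different convention (for example a robust CV), and the input values are made up for illustration.

```python
from statistics import mean, pstdev

def summarize_population(fitc_values):
    """Return (event count, MFI, fluorescence CV in percent) for one population."""
    count = len(fitc_values)
    mfi = mean(fitc_values)                        # mean fluorescence intensity
    cv_percent = 100.0 * pstdev(fitc_values) / mfi  # lower CV = tighter population
    return count, mfi, cv_percent

# Hypothetical FITC intensities for a handful of events.
count, mfi, cv = summarize_population([900, 1000, 1100, 1000])
print(f"count={count}, MFI={mfi:.0f}, CV={cv:.1f}%")
```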

There are three explanations for the glutaraldehyde-induced increase in sample event count:

(1) Glutaraldehyde increases the presence of fluorescent phantom events (e.g., colloidal particles [15]).
(2) Glutaraldehyde raises the fluorescence of non-target events (e.g., bacterial debris) above the fluorescence threshold.
(3) Glutaraldehyde raises the fluorescence of target events (here, T4) above the fluorescence threshold.

To test (1) and (2), we used FVM to compare untreated and glutaraldehyde-treated 0.2-µm filtered phosphate buffered saline (PBS) after staining with SYBR Gold. We also compared FVM data collected on untreated and glutaraldehyde-treated samples of the negative control (bacterial host propagated and purified without virus infection) stained with SYBR Gold. In neither case did FVM reveal a distinct target population, nor a substantial increase in event count, after glutaraldehyde addition. These results suggest that glutaraldehyde addition not only helps visibly separate the target signal from non-target events, but also increases the absolute number of target events detected through FVM. The target event count for the eight runs that incorporated glutaraldehyde was approximately 10⁹–10¹⁰ events/mL: about an order of magnitude greater than the qPCR-based titer (10⁸–10⁹ gc/mL) and about two orders greater than the culture-based titer (10⁷–10⁸ PFU/mL). These discrepancies may be attributed to factors such as non-specific staining of particles (e.g., cellular debris) in FVM, losses during DNA extraction in PCR, and aforementioned challenges with plate-based culturing.
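The comparison between titers comes down to converting a raw event count into a stock concentration and expressing the gap on a log scale. The sketch below shows that arithmetic; the event count, analyzed volume, dilution factor, and qPCR titer are hypothetical numbers, not values from this study.

```python
import math

def events_per_ml(event_count, volume_analyzed_ml, dilution_factor=1.0):
    """Back-calculate the stock concentration from a diluted, partially analyzed sample."""
    return event_count / volume_analyzed_ml * dilution_factor

def log10_gap(a, b):
    """Orders of magnitude by which a exceeds b."""
    return math.log10(a / b)

# Hypothetical run: 50,000 target events in 0.01 mL of a 1,000x diluted stock.
fvm_titer = events_per_ml(50_000, 0.01, dilution_factor=1_000)
qpcr_titer = 5e8  # hypothetical, in gene copies per mL
print(f"{fvm_titer:.1e} events/mL, {log10_gap(fvm_titer, qpcr_titer):.1f} orders above qPCR")
```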

The fractional factorial design enabled quantification of main and two-way interaction effects for each factor tested. Results are shown in Figure 2 and Table S1. We performed this quantification first on all events within analysis bounds. Though the quantification analysis suggested the presence of numerous significant main effects as well as several significant two-way interaction effects between glutaraldehyde and other experimental factors, results were compromised by the fact that the analysis did not distinguish between target and non-target events. Because a distinct target population was only visible for glutaraldehyde-treated runs, and because our goal was to develop a staining protocol that most successfully separates the target population from background, we also performed the quantification using only data from target events identified in glutaraldehyde-treated runs.
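For readers less familiar with factorial analysis, a main effect is the mean response at a factor's high level minus the mean response at its low level, and a two-way interaction effect is the same contrast applied to the product of two factor columns. The sketch below illustrates this on a small synthetic 2^2 example; the response values are invented, and significance testing (which the study also performed) is not shown.

```python
def main_effect(levels, response):
    """Mean response at level +1 minus mean response at level -1."""
    high = [y for x, y in zip(levels, response) if x == 1]
    low = [y for x, y in zip(levels, response) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)

def interaction_effect(levels_a, levels_b, response):
    """Two-way interaction: the main-effect contrast of the product column."""
    product_column = [a * b for a, b in zip(levels_a, levels_b)]
    return main_effect(product_column, response)

# Synthetic 2^2 example generated from y = 10 + 2a - b + 0.5ab.
a = [-1, 1, -1, 1]
b = [-1, -1, 1, 1]
y = [9.5, 12.5, 6.5, 11.5]
print(main_effect(a, y), main_effect(b, y), interaction_effect(a, b, y))
```

Because each effect is a difference between two half-sample means, its magnitude is twice the corresponding regression coefficient in the coded (-1/+1) model.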

No statistically significant two-way interaction effects were observed in the target-only analysis. However, including glutaraldehyde as a variable in the experimental design meant that only a small subset of two-way interaction effects between non-glutaraldehyde factors were analyzed. Future work could explore other possible two-way interaction effects. The target-only quantification analysis also did not identify any statistically significant main effects on MFI. Diluent was the only variable that had a significant main effect on event count: the main effect of using TE buffer instead of MQ water was -7,807 events with a p-value of 0.023. This result may be explained by the increased tendency of free stain to form colloids in low-ionic-strength water [16].

Stain temperature and diluent had highly significant (p < 0.001) main effects on CV. Staining at 50°C decreased CV by 2.7 percentage points; using TE buffer decreased CV by 4.4 percentage points. Stain concentration had a strongly significant (0.001 < p < 0.01) effect on CV: staining at 1 x 10^-4 times the sample volume increased CV by 1.8 percentage points. Stain time and stain type both had significant (0.01 < p < 0.05) main effects on CV. Staining for 15 minutes decreased CV by 1.2 percentage points; staining with SYBR Gold rather than SYBR Green I increased CV by 1.5 percentage points. We conclude that stain temperature and diluent are the most important sample-preparation factors besides glutaraldehyde addition. In other words, dilution in TE buffer and staining at 50°C meaningfully increase the “tightness” of the T4 fluorescence signal, thereby aiding discrimination of T4 from background.

We also conclude that using SYBR Green I (instead of SYBR Gold) and staining for 15 minutes (instead of 1 minute) could improve target discrimination of T4 slightly further. But these small potential gains must be weighed against drawbacks. Staining for one minute is more conducive to near-real-time FVM analysis than staining for 15. Moreover, SYBR Green I exhibits a large fluorescence enhancement upon binding to DNA but not RNA. A protocol using SYBR Green I will be less effective than SYBR Gold at detecting a wide variety of viruses, since the latter exhibits a large fluorescence enhancement upon binding to DNA and RNA. Future work could explore these tradeoffs for environmental samples.

Overall, our results suggest that a protocol for reliably identifying and quantifying T4 bacteriophage through FVM involves a combination of treatments recommended by Brussaard et al. and Huang et al. We recommend diluting the sample in TE buffer to achieve an FVM analysis rate of about 10²–10³ events/second, adding glutaraldehyde at a final concentration of 0.5%, and staining with either SYBR Green I or SYBR Gold (depending on whether the species of interest include both DNA and RNA viruses) at 5 x 10^-5 times the sample volume at 50°C for at least 1 minute.
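The reagent volumes implied by these final concentrations follow from C1·V1 = C2·V2. The sketch below works through that arithmetic under stated assumptions: the 25% glutaraldehyde stock, the 1:200 stain working stock, and the 1,000 µL final volume are all hypothetical and would need to be replaced with the values used in a given lab.

```python
def stock_volume(final_conc, final_volume, stock_conc):
    """Volume of stock needed to reach final_conc in final_volume (C1*V1 = C2*V2)."""
    return final_conc * final_volume / stock_conc

FINAL_VOLUME_UL = 1000.0  # hypothetical final sample volume

# Glutaraldehyde: 0.5% final from a hypothetical 25% stock.
glutaraldehyde_ul = stock_volume(0.5, FINAL_VOLUME_UL, 25.0)

# Stain: concentrations expressed as fractions of the commercial stock.
# Hypothetical working stock = commercial stock diluted 1:200 (5e-3).
stain_ul = stock_volume(5e-5, FINAL_VOLUME_UL, 5e-3)

print(f"glutaraldehyde: {glutaraldehyde_ul:.1f} uL, stain working stock: {stain_ul:.1f} uL")
```

Adding the commercial stain directly would require only 0.05 µL per 1,000 µL of sample in this example, which illustrates why an intermediate working stock is practical.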

Automating data analysis through density-based clustering

Clustering approach

Our second objective was to explore cluster analysis as an objective, automated alternative to manual gating. Specifically, we tested whether density-based clustering can aid and improve analysis of viral surrogates in complex matrices. The OPTICS algorithm developed by Ankerst et al. (1999) underlies the most widely used density-based clustering strategies [17]. OPTICS outputs all points in a dataset ordered by a characteristic “reachability distance”, and generates a reachability plot that can be used to identify clusters by looking for “valleys” of low reachability distance separated by “peaks” of noise. The most straightforward way to extract clusters from the reachability plot is to set a single global reachability threshold. Unfortunately, this approach fails when, as is often the case in real-world environmental samples, the number of targets and the spatial density of FVM data generated by those targets is variable (Figure S2).
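The ordering and the single-threshold extraction described above can be sketched in a few lines of plain Python. This is a simplified O(n²) illustration with an unbounded neighborhood radius, not the implementation used in this study; the function names, the min_pts value, and the toy points are assumptions chosen for readability.

```python
import heapq
import math

def optics_order(points, min_pts=3):
    """Return (ordering, reachability, core); the last two are indexed by point."""
    n = len(points)
    # Core distance: distance to the min_pts-th nearest point, counting the point itself.
    core = [sorted(math.dist(p, q) for q in points)[min_pts - 1] for p in points]
    reach = [math.inf] * n
    done = [False] * n
    ordering = []
    for start in range(n):
        if done[start]:
            continue
        heap = [(math.inf, start)]
        while heap:
            _, i = heapq.heappop(heap)
            if done[i]:
                continue  # stale entry superseded by a smaller reachability
            done[i] = True
            ordering.append(i)
            for j in range(n):
                if not done[j]:
                    r = max(core[i], math.dist(points[i], points[j]))
                    if r < reach[j]:
                        reach[j] = r
                        heapq.heappush(heap, (r, j))
    return ordering, reach, core

def extract_by_threshold(ordering, reach, core, threshold):
    """Cut the reachability plot at one global threshold; -1 marks noise."""
    labels = [-1] * len(ordering)
    cluster = -1
    for i in ordering:
        if reach[i] > threshold:
            if core[i] <= threshold:   # dense enough to start a new cluster
                cluster += 1
                labels[i] = cluster
        else:
            labels[i] = cluster        # still inside the current valley
    return labels

# Toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 30)]
ordering, reach, core = optics_order(points)
labels = extract_by_threshold(ordering, reach, core, threshold=2.0)
print(labels)
```

Both toy groups have the same density, so one threshold separates them cleanly. When clusters differ in density, any single threshold is either too loose for the dense cluster or too strict for the sparse one, which is the failure mode noted in the text.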

Alternative options are (i) extracting clusters via manual selection of peaks and valleys on the reachability plot, or (ii) using an algorithm to perform the selection automatically (Figure S3). Ankerst et al. suggested extracting clusters automatically by identifying “steep up” and “steep down” areas on the reachability plot characterized by the ξ steepness parameter, but ξ must be laboriously tuned based on trial and error. The opticskxi package available in R [18] provides a variant cluster-extraction algorithm that “iteratively investigates the largest differences” in steepness until either a given number of clusters are defined or the maximum number of iterations is reached [19]. We compared results obtained through manual gating to results obtained through OPTICS combined with either manual or opticskxi-based cluster extraction[1] for two datasets, as described below.
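To illustrate the general idea of extracting a fixed number of clusters from a reachability plot, the sketch below splits the plot at its k − 1 tallest peaks. This is a deliberately crude stand-in and is not the opticskxi algorithm, which works on steepness differences; the function name and the reachability values are hypothetical.

```python
import math

def split_at_largest_peaks(reachability, k):
    """Label points in OPTICS order by cutting at the k-1 tallest local peaks."""
    last = len(reachability) - 1
    # Interior local maxima; position 0 is skipped because it is always infinite.
    peaks = [i for i in range(1, last + 1)
             if reachability[i] > reachability[i - 1]
             and (i == last or reachability[i] >= reachability[i + 1])]
    tallest = sorted(peaks, key=lambda i: reachability[i], reverse=True)[:k - 1]
    cuts = set(tallest)
    labels, cluster = [], 0
    for i in range(len(reachability)):
        if i in cuts:
            cluster += 1   # a peak marks the first point of the next cluster
        labels.append(cluster)
    return labels

# Hypothetical reachability plot with two valleys and a trailing outlier.
plot = [math.inf, 1.0, 1.0, 1.0, 12.7, 1.0, 1.0, 1.0, 19.6]
print(split_at_largest_peaks(plot, k=3))
```

Note that this kind of extraction assigns every point to some cluster, including points sitting on the peaks themselves, so it labels fewer points as noise than a cut placed lower on the curve.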

Mixed-target experiment

A variety of microbiological targets may be present and of interest in a real-world setting such as a water-treatment plant. To test whether density-based clustering can accurately detect and quantify waterborne viruses alongside other specimens, we prepared a solution containing known concentrations of viral and non-viral targets in the submicron size range. The targets were φ6 and T4 bacteriophages as well as fluorescent polystyrene beads of 0.2, 0.5, and 0.8 µm in diameter. T4 was included as an environmentally relevant viral surrogate that generates a clear FVM signal; φ6 was included to represent viral classes that are not detectable through FVM as distinct populations but may still generate an indeterminate “virus-like particle (VLP)” signal [6]; and beads were included because they are highly uniform and similar in size to many viral and bacterial classes. Combining biological and engineered targets enabled us to test the performance of density-based clustering on a mixed-density dataset. We collected FVM data on 10 replicates of each of five dilutions of the mixed-target solution. The 0.8 µm bead component was kept undiluted as a control/reference.

Figures 3, 4, and 5 respectively illustrate results from manual gating, OPTICS ordering + manual extraction, and OPTICS ordering + opticskxi-based extraction of the mixed-target data. We note several features of the results. First, manual extraction labeled far more points as noise than did opticskxi. This is because for manual extraction, we separated valleys from peaks by setting cutpoints at the apparent “knees” of the reachability plot curves. Opticskxi, by contrast, tends to set cutpoints at or near the curve peaks.

Second, the different data-analysis strategies yielded somewhat different clusters. In manual gating we drew six gates: one for each of the three bead sizes, T4, φ6 and other virus-like particles (VLPs), and an additional apparent cluster corresponding to 0.5 µm bead doublets.[2] Neither manual extraction nor opticskxi identified a cluster matching the manual gates drawn for φ6/VLPs and for the 0.5 µm doublet. Manual extraction tended to identify events falling within these gates as noise, while opticskxi tended to assign events falling into the φ6/VLP gate to the T4 cluster and events falling into the 0.5 µm doublet gate to the 0.5 µm bead cluster. Both OPTICS-based approaches frequently detected two separate clusters within the side scatter (SSC) vs. fluorescence (FITC) region designated by manual gating as corresponding to 0.2 µm beads. Inspecting the data revealed that some of the events exhibiting the same SSC and FITC signal intensity ranges exhibited meaningfully different forward scatter (FSC) signal intensities.

To numerically compare results across the different data-analysis approaches, we established four “buckets” corresponding to (1) viruses (including T4, φ6, and other VLPs), (2) 0.2 µm beads, (3) 0.5 µm beads (including 0.5 µm doublets), and (4) 0.8 µm beads. Tables S2 and S3 show expected and average detected event counts across the three approaches for each bucket; Figure 6 plots these data. There were clear differences between the theoretical and detected event counts for each bucket. Event counts were higher than expected for the 0.2 and 0.5 µm bead buckets, slightly lower than expected for the 0.8 µm bead bucket, and much lower than expected for the virus bucket. The bead-bucket discrepancies can be explained by the fact that manufacturer-provided concentrations of the bead solutions were only approximate within an order of magnitude. Discrepancies for the virus bucket can be explained by the fact that φ6, a small and difficult-to-stain enveloped virus, emits only a faint FITC signal. A majority of the φ6 particles spiked into the mixed-target solution were likely not stained brightly enough to rise above the FITC limit of detection [6].

Focusing on detected event counts, results were generally consistent across all three data-analysis approaches for the bead buckets. For the virus bucket, event counts from manual gating and opticskxi were similar to each other but generally higher than event counts from manual extraction. This is because while engineered particles generate tightly grouped data of fairly uniform density, viral targets tend to generate more unevenly dispersed FVM data. Consider how each of our three data-analysis approaches handles the clusters associated with T4 and φ6. For manual gating, we established relatively large T4 and φ6 gates. Any point falling within these gates was hence categorized as part of the virus bucket. For the OPTICS-based methods, there was not a clear shift in reachability distance marking the transition from T4 to φ6/VLPs: reachability distance increased gradually towards the border of the T4 cluster, then increased at roughly the same rate as the T4 cluster border bled into the φ6/VLP region. This resulted in manual extraction and opticskxi delivering divergent results. Opticskxi tended to assign high-reachability-distance points included in a given reachability curve (points near the peak) to the same cluster as low-reachability-distance points (points near the valley). Since OPTICS placed many points corresponding to the T4 and φ6/VLP regions on the same reachability-plot curve, opticskxi assigned those points to the T4 cluster. By contrast, manually set cutpoints assigned points near the valley of the T4/φ6/VLP curve to the T4 cluster and points near the peak to noise.

Environmental-spike experiment

We performed a modified version of the mixed-target experiment to assess whether automated clustering can accurately detect and quantify waterborne viruses in a challenging environmental matrix, where the presence of an increased background signal could confound FVM analysis and/or alter the target signal.[3] Specifically, we spiked a T4/bead solution into tertiary-treated, 0.2 µm-filtered wastewater effluent diluted 10x. The T4/bead solution was the same as the one used in the mixed-target experiment, but with φ6 and 0.5 µm beads omitted. We also prepared an identical but unspiked solution for comparison. We again collected FVM data on 10 replicates each of the spiked and unspiked solutions, analyzing data by both manual gating and density-based clustering.

Figures 7, 8, and 9 illustrate results from manual gating, OPTICS ordering + manual extraction, and OPTICS ordering + opticskxi-based extraction of environmental-spike data. Manual gating identified three targets: a 0.8 µm bead cluster, a 0.2 µm bead cluster, and, for the T4-spiked sample, a T4 cluster partially obscured by signal from the wastewater matrix but still clearly within the previously established T4 gate. Expected event counts were roughly in line with detected event counts obtained through manual gating, exhibiting the same discrepancies observed in the mixed-target experiment (Table S4). We also observed a low-SSC, high-FITC cluster in most of the replicate runs for both the T4-spiked and unspiked samples. The identity of particles in this cluster is unknown.

The two OPTICS-based clustering approaches yielded quite different results. As was also true for the mixed-target experiments, manual cluster extraction successfully detected the 0.8 µm bead cluster, the 0.2 µm bead cluster, and often a sub-cluster in the 0.2 µm bead zone corresponding to particles exhibiting similar FITC and SSC intensities but different FSC intensities. Manual extraction also detected one or more clusters in the low-FITC, low-SSC region corresponding to φ6/VLPs in the mixed-target experiments, and hence to background (including natural virus particles) in the wastewater matrix. Manual extraction did not typically clearly distinguish the T4 cluster, nor did it detect the low-SSC, high-FITC foreign cluster.

For opticskxi-based cluster extraction, the constraining k parameter meant that opticskxi did not yield as many clusters as manual extraction. Rather, opticskxi consistently detected a 0.8 µm bead cluster, a cluster that included the 0.2 µm beads but also many apparent noise points, and a cluster that included the T4/VLP/background region. The latter sometimes spilled over to include much of the 0.2 µm bead region. Opticskxi occasionally detected the higher-FSC sub-cluster in the 0.2 µm bead region, occasionally detected the low-SSC, high-FITC foreign cluster, and never detected a clearly distinct T4 cluster.

Because (i) the reachability plots from the environmental-spike data were so complex, (ii) we set manual gates exclusively based on the SSC vs. FITC pseudocolor density plot (i.e., without considering FSC), and (iii) we were concerned (as discussed further below) that OPTICS might over-weight FSC signal intensities for virus data, we also generated OPTICS orderings of the environmental-spike data using only the SSC vs. FITC dimensions. Figures S4 and S5 contain representative plots illustrating manual extraction and opticskxi results, respectively, using these reduced-dimension orderings. The reachability plots of these orderings were simpler but did not yield significantly better results, especially for detecting T4.

As advanced water treatment and reuse becomes more common, the need for improved methods of monitoring waterborne microorganisms is becoming more acute. Flow cytometry (FCM)—and its virus-focused specialty, flow virometry (FVM)—is emerging as one such method. But realizing the full potential of FVM in applications relevant to water treatment and reuse requires consistent, optimized protocols to facilitate data validation and interlaboratory comparison—as well as approaches to protocol design that can extend the suite of viruses that FVM can feasibly and efficiently monitor.

In this study, we proposed using fractional factorial experimental designs for optimizing FVM sample-preparation protocols, and density-based clustering for analyzing complex FVM data. Both approaches can be considered for any FVM application but were here demonstrated using the bacteriophage T4 in the context of water treatment and reuse. Specifically, we used a fractional factorial experimental design to efficiently identify multiple factors with statistically significant main effects on T4 count, MFI, and CV. Our results suggest an optimized protocol for FVM-based T4 detection that blends and improves on protocols developed using a traditional optimization approach. While we did not observe any statistically significant interaction effects among factors tested, employing a fractional factorial design enabled us to confirm the absence of such effects. Per Lippé (2018), we recommend that other researchers working on FVM for microbial water-quality monitoring consider adopting our protocol[1] as a biological (flow virometric) standard. T4 is widely accepted as an environmentally relevant viral surrogate, and is easy to cultivate, use, and detect using a variety of commercially available flow cytometers. Incorporating analysis of T4 (suspended in clean matrix) as a routine part of any FVM experiment will help researchers determine instrument-specific limits of detection and expected ranges of viral signal intensities. We further recommend that researchers couple this analysis with analysis of standard but non-biological viral surrogates (e.g., submicron-scale fluorescent polystyrene beads) to generate multiple comparison points for data across different labs. Finally, we recommend that to carry out these analyses as objectively as possible, researchers consider employing computational techniques such as clustering.

In this study, we showed that density-based clustering can sometimes work as well as or better than manual gating of FVM data, and is certainly far faster and less labor-intensive. In particular, we found that OPTICS ordering coupled with either manual or opticskxi-based cluster extraction delivered excellent results for the dense, well-defined clusters generated by engineered beads. There was almost uniformly good agreement between results obtained through manual gating and density-based clustering for bead targets. In addition, both OPTICS-assisted approaches revealed a feature of the data not detected through manual gating: that some points within the apparent 0.2 µm bead region in fact emitted meaningfully different FSC signals.

Results from applying the different data-analysis methods to virus targets were more mixed (strengthening the case for using both biological and non-biological standards to facilitate validation and comparison of FVM data across labs). Manual gating reliably identified separate T4 and φ6/VLP populations, but these manual gates impose unnaturally sharp boundaries on irregularly shaped clusters and are incapable of adjusting to variability in biological data (e.g., due to inconsistent staining). However, neither manual cluster extraction nor opticskxi reliably identified separate T4 and φ6/VLP populations in this study. In the mixed-target experiment, manual extraction reliably identified the T4 cluster while labeling points in the φ6/VLP region as noise. This is arguably acceptable since the apparent φ6/VLP events constitute a vague cloud of points more than a clearly defined cluster. In the environmental-spike experiment, though, manual extraction failed to identify spiked T4. Opticskxi grouped apparent T4 points with points in the φ6/VLP/background region in both the mixed-target and the environmental-spike experiments. This is also arguably acceptable if the ultimate intended application is monitoring total virus counts. On the other hand, opticskxi applied to data from the environmental-spike experiment sometimes unacceptably grouped T4 with 0.2 µm beads.

Further work is clearly needed to improve performance of density-based clustering on challenging FVM data from targets like viruses in challenging matrices like wastewater. Future work could focus, for instance, on selecting the best OPTICS parameters based on information available about the dataset in question, improving automatic cluster extraction, or more strategically weighting different dimensions of FVM data in OPTICS.[2] OPTICS could also be useful as a tool to assist manual gating in complex samples. A researcher could theoretically apply density-based clustering on a target in a clean sample, and then use the identified cluster boundary as a gate for complex samples where density-based clustering fails.
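The last idea, reusing a cluster found in a clean sample as a gate for a complex sample, can be sketched as follows. The rectangular (per-channel minimum/maximum) gate, the optional margin, and the event values are all assumptions made for illustration; a convex hull or density contour would be an equally plausible choice of boundary.

```python
def bounding_gate(cluster_points, margin=0.0):
    """Axis-aligned gate (per-channel min and max) around a cluster's points."""
    dims = range(len(cluster_points[0]))
    lows = [min(p[d] for p in cluster_points) - margin for d in dims]
    highs = [max(p[d] for p in cluster_points) + margin for d in dims]
    return lows, highs

def apply_gate(events, gate):
    """Return the events that fall inside the gate on every channel."""
    lows, highs = gate
    return [e for e in events
            if all(lo <= x <= hi for x, lo, hi in zip(e, lows, highs))]

# Hypothetical (SSC, FITC) values for a target cluster found in a clean sample.
clean_cluster = [(2.0, 3.1), (2.2, 3.4), (2.1, 3.0), (2.4, 3.3)]
gate = bounding_gate(clean_cluster, margin=0.1)

# Hypothetical events from a complex sample.
complex_events = [(2.3, 3.2), (1.0, 1.2), (2.45, 3.45), (4.0, 3.2)]
print(apply_gate(complex_events, gate))
```

In practice the channels would usually be log-transformed before the gate is derived, so that the margin means the same thing across the intensity range.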

Such additional work is merited given the advantages that density-based clustering could offer for microbial water-quality assessment. First, density-based clustering could improve result consistency. We previously found that “data from identical samples can produce electronic signals of considerably different intensities depending on the instrument used for analysis” [20]. Labs using different instruments cannot readily adopt a shared set of analytical gates, but they can adopt a shared clustering algorithm. Second, the method could improve the speed at which results are delivered. By minimizing human involvement in data processing, density-based clustering analysis could support real-time validation of microorganism removal in advanced water treatment [4]. Third, the method could improve result accuracy, which is essential to positioning FVM as a viable quality-check mechanism in water-reuse applications with public-health implications. Fourth and finally, the approach could improve result quality by uncovering features in FVM data difficult to detect through manual gating alone.

Phage stock preparation

The bacteriophage T4 (ATCC 11303-B4) and its host Escherichia coli (Migula) Castellani and Chalmers (ATCC 11303) were ordered from the American Type Culture Collection (ATCC) and propagated from freeze-dried specimens. φ6 bacteriophage (strain HB104) and its host Pseudomonas syringae were provided as stock solutions by Samuel Díaz-Muñoz (UC Davis). Host aliquots (containing 25% glycerol by volume) and phage aliquots were stored at -80°C until use. The purified, high-titer phage stocks used in this study were prepared using protocols based on Bonilla et al. (2016) [21]; these protocols are explicated in the Supplementary Information. Negative control stocks were prepared using the same protocols minus the phage spike. One group of positive and negative stock aliquots was prepared by 100x dilution in Milli-Q (MQ) water; a second was prepared by 100x dilution in Tris-EDTA (TE) buffer. Subsets of each group were fixed with glutaraldehyde (0.5% final concentration, 15 min at 4°C). Final stock aliquots were stored at -80°C until use.

Phage stock quantification

We assessed the titers of the purified stocks via both plate-based culturing and quantitative polymerase chain reaction (qPCR)/reverse-transcription qPCR (RT-qPCR); again, protocols are explicated in the Supplementary Information. Approximate phage stock titers are reported in Table S5. The Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) checklist for this study is included in the Supplementary Information.

Flow cytometric analysis

Working stocks of SYBR Green I and SYBR Gold stains (both obtained from Thermo Fisher Scientific as 10,000X concentrates in dimethyl sulfoxide (DMSO)) were prepared in advance by dilution in TE buffer and stored in aliquots at -20°C until use. FVM analysis was carried out using the 488 nm (blue) solid-state laser on a NovoCyte 2070V Flow Cytometer coupled with a NovoSampler Pro autosampler (Agilent). Green fluorescence (FITC) intensity was collected at 530 ± 30 nm; forward and side scatter (FSC and SSC) intensities were collected as well. In all experiments, a 10-µL volume of each sample was measured using the lowest instrument flowrate (5 µL/min) and a FITC = 800 threshold. For the optimization experiments, 10 µL of an unstained control was run after each sample. The instrument was flushed between each sample and control by running 150 µL of 1x NovoClean solution (Agilent) followed by 150 µL of MQ water through the sample injection probe (SIP) at the highest instrument flowrate (120 µL/min). Instrument performance was verified by running the instrument’s built-in quality control (QC) test at least monthly.
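The acquisition settings above imply a minimum instrument time per sample, which matters when judging how close FVM can come to real-time monitoring. The short calculation below is only a sketch: it uses the volumes and flowrates stated in the paragraph (in whatever volume unit is used, provided volume and flowrate share it) and ignores autosampler movement, mixing, and other overhead, so the totals are lower bounds. The function name is hypothetical.

```python
def run_minutes(volume, flow_rate):
    """Minutes needed to pass `volume` through the instrument at `flow_rate` per minute."""
    if flow_rate <= 0:
        raise ValueError("flow rate must be positive")
    return volume / flow_rate

# Sample acquisition: 10 volume units at the lowest flowrate (5 units/min)
sample_min = run_minutes(10, 5)

# Flush: 150 units of cleaning solution, then 150 units of water, at 120 units/min
flush_min = 2 * run_minutes(150, 120)

# Optimization experiments: sample, flush, unstained control, flush
cycle_min = sample_min + flush_min + sample_min + flush_min

print(sample_min, flush_min, cycle_min)  # 2.0 2.5 9.0
```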

Optimization design and protocols

We created a resolution IV 2^(6-2) fractional factorial design (16 runs per round) to assess main and interaction effects of six two-level factors on nucleic-acid staining of T4 for FVM analysis. Table 1 summarizes the factors and factor levels tested in the optimization experiment, with corresponding rationales. For these experiments, previously prepared T4 stock aliquots (see above) were thawed immediately before each round of testing and diluted an additional 10x in the appropriate medium prior to staining. Samples were incubated in the dark following stain addition; incubation at higher temperatures was performed by immersion in a water bath.

Table S5 presents the matrix of experiments included in the design. Factors were strategically assigned to avoid confounding main effects with interaction effects thought likely to prove significant. Corresponding estimation structures are provided in Table S6, with main effects and two-way interaction effects in bold. Four complete rounds of the experimental design were performed; run order was randomized within each round.
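For readers unfamiliar with this kind of design, the sketch below builds one possible 16-run, six-factor, resolution IV design and randomizes run order within four rounds. The generators E = ABC and F = BCD are a common textbook choice and are an assumption here; the factor-to-column assignment and generators actually used in this study are those in the supplementary tables, which this sketch does not reproduce. Function names and the random seed are hypothetical.

```python
import itertools
import random

def fractional_factorial_6_2():
    """Return 16 runs of a two-level design in six factors (A to F), coded -1/+1.

    A to D form a full factorial; E and F are set by the assumed generators
    E = ABC and F = BCD, giving the defining relation I = ABCE = BCDF = ADEF.
    Every word has length 4, so the design is resolution IV.
    """
    runs = []
    for a, b, c, d in itertools.product([-1, 1], repeat=4):
        e = a * b * c
        f = b * c * d
        runs.append((a, b, c, d, e, f))
    return runs

def randomized_rounds(design, n_rounds=4, seed=0):
    """Repeat the full design n_rounds times, shuffling run order within each round."""
    rng = random.Random(seed)
    rounds = []
    for _ in range(n_rounds):
        order = list(range(len(design)))
        rng.shuffle(order)
        rounds.append([design[i] for i in order])
    return rounds

design = fractional_factorial_6_2()
rounds = randomized_rounds(design)
for run_number, run in enumerate(rounds[0], start=1):
    print(run_number, run)
```

Because no word in the defining relation has fewer than four letters, main effects are not aliased with two-way interactions, although two-way interactions are aliased with one another.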

Optimization data analysis

The number of events in each control (unstained) sample was subtracted from the number of events in the corresponding stained sample. Data were bounded at 0 ≤ SSC ≤ 1,000 and 800 (threshold level) ≤ FITC ≤ 10,000 and visualized using FlowJo™ 10 software (Becton, Dickinson & Company) as pseudocolor density plots to assess whether a distinct target population was visible. The software’s “Create Gates on Peaks” function was used to set the bounds of the target population on FITC for these runs, after which the number, MFI, and CV of all target particles were calculated. The FrF2 package[1] was used in RStudio Desktop (version 2021.09.01) to quantify main and two-way interaction effects of each factor tested in the optimization. The FrF2 analysis was performed first on all events from all runs, and second on target events from glutaraldehyde-treated runs.
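The quantities described above can be illustrated with a small sketch. It shows control subtraction, the SSC/FITC bounds, and the textbook contrast estimate of a main or two-way interaction effect (mean response at the high level minus mean response at the low level). It is not the FrF2 interface, and the two-factor design and event counts are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def in_bounds(ssc, fitc, ssc_max=1000, fitc_min=800, fitc_max=10000):
    """True if an event lies inside the analysis bounds stated in the text."""
    return 0 <= ssc <= ssc_max and fitc_min <= fitc <= fitc_max

def main_effect(levels, responses):
    """Mean response at level +1 minus mean response at level -1."""
    high = [r for l, r in zip(levels, responses) if l == 1]
    low = [r for l, r in zip(levels, responses) if l == -1]
    return sum(high) / len(high) - sum(low) / len(low)

def interaction_effect(levels_a, levels_b, responses):
    """Two-way interaction: the main-effect contrast on the product column."""
    product = [a * b for a, b in zip(levels_a, levels_b)]
    return main_effect(product, responses)

# Hypothetical two-factor example (coded levels and event counts are made up)
factor_a = [-1, 1, -1, 1]
factor_b = [-1, -1, 1, 1]
stained = [1200, 3200, 1400, 5400]
control = [200, 200, 400, 400]

# Control subtraction, run by run
net = [s - c for s, c in zip(stained, control)]

effect_a = main_effect(factor_a, net)
effect_b = main_effect(factor_b, net)
effect_ab = interaction_effect(factor_a, factor_b, net)
print(net, effect_a, effect_b, effect_ab)
```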

Mixed-target and environmental-spike data generation

Previously prepared stock phage (T4 and φ6) solutions were treated using the optimized protocol described in the main text. 20 mL of T4 stock (10^-3 dilution) and 20 mL of φ6 stock (10^-3 dilution) were added to 1 mL of a 0.2-µm-diameter fluorescent polystyrene bead suspension, 2 mL of a 0.5-µm-diameter bead suspension, and 15 mL of PBS. 2x, 4x, 8x, and 16x dilutions of this mixed-target solution were also generated. 4 mL of a 0.8-µm-diameter bead suspension was added to each dilution as a constant-concentration reference.

Separately, tertiary-treated effluent from the UC Davis Wastewater Treatment Plant (collected as a grab sample, transported to the lab on ice, stored at 4°C, and used within 24 hours of collection) was syringe-filtered at 0.2 µm, diluted 10x in Milli-Q water, and spiked with the same mixed-target solution described above, minus the φ6 and 0.5 µm beads. Ten replicates of each solution dilution were analyzed via FVM as described above. Table S2 provides expected concentrations of each target in the mixed-target and environmental-spike solutions per effective volume (10 µL) analyzed.

Mixed-target and environmental-spike data analysis

Manual gating was performed on data from the 1x mixed-target dilution, plotted as SSC vs. FITC pseudocolor density plots on log-log axes. The resulting gates were then applied to the remaining mixed-target and environmental-spike data. Density-based clustering was performed as follows. We applied a log transformation to the FSC, SSC, and FITC data collected from each replicate, then standardized the features by centering and rescaling to a standard deviation of 1. We used RStudio (version 2021.9.1.372) to apply the OPTICS implementation available in the dbscan package (Hahsler et al. 2019). Distance between points was measured using Euclidean distance. Based on Sander et al. (1998), we set the OPTICS neighborhood parameter k equal to 2*[dimensionality of the dataset], or 6 in this case. We set ε equal to 0.1 to bound the algorithm and reduce computational time. We used MATLAB® software (version R2021a; MathWorks) to inspect reachability plots of the OPTICS-ordered data for manual extraction. We applied the opticskxi package available in R (Charlon 2019) for automated extraction, using a maximum iteration number of 1,000 and a maximum cluster number (also denoted k) of six for the mixed-target data and four for the environmental-spike data. For the mixed-target data, the minimum-points-per-cluster (MinPts) parameter started at 8,000 for the 1x dilution and was halved for each subsequent dilution. For the environmental-spike data, MinPts was set at 8,000. The maximum cluster number for the mixed-target data was selected based on the number of clusters identified through manual gating; for the environmental-spike data, it was selected based on the three clusters identified through manual gating, plus a fourth to give the algorithm room to identify a cluster corresponding to background in the wastewater matrix. The MinPts parameters were selected based on the lowest expected target event count. Analysis scripts are available at https://github.com/hsafford/FCMClustering2022.
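The preprocessing and ordering steps described above can be sketched compactly. The code below is a plain-Python illustration of the general technique, not the dbscan or opticskxi implementations and not the study's scripts: it log-transforms and standardizes three features, computes an OPTICS ordering with a brute-force neighbor search, and then cuts the reachability values at a fixed threshold, which mimics reading clusters off a reachability plot by eye. The toy events, the extraction threshold, the `min_size` rule, and the convention that a point counts among its own neighbors are all assumptions.

```python
import math

def standardize_log(events):
    """log10-transform each feature, then center and scale to unit sample SD."""
    logged = [[math.log10(v) for v in e] for e in events]
    n, dims = len(logged), len(logged[0])
    cols = []
    for d in range(dims):
        col = [row[d] for row in logged]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        cols.append([(v - mean) / sd for v in col])
    return [tuple(c[i] for c in cols) for i in range(n)]

def optics(points, eps, min_pts):
    """Return (ordering, reachability) using Euclidean distance, O(n^2) search."""
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def neighbors(i):
        found = []
        for j in range(n):
            dist = math.dist(points[i], points[j])
            if dist <= eps:
                found.append((dist, j))
        return sorted(found)  # includes the point itself at distance 0

    def update(nbrs, seeds):
        if len(nbrs) < min_pts:
            return  # not a core point, so it cannot reach anything
        core = nbrs[min_pts - 1][0]  # distance to the min_pts-th neighbor
        for dist, j in nbrs:
            if not processed[j] and max(core, dist) < reach[j]:
                reach[j] = max(core, dist)
                seeds.add(j)

    for start in range(n):
        if processed[start]:
            continue
        seeds = {start}
        while seeds:
            i = min(seeds, key=lambda j: (reach[j], j))
            seeds.remove(i)
            processed[i] = True
            order.append(i)
            update(neighbors(i), seeds)
    return order, reach

def extract_by_threshold(order, reach, threshold, min_size=2):
    """Start a new group wherever reachability exceeds the threshold; small groups are noise (-1)."""
    groups = []
    for i in order:
        if reach[i] > threshold or not groups:
            groups.append([])
        groups[-1].append(i)
    labels = [-1] * len(order)
    cluster_id = 0
    for group in groups:
        if len(group) >= min_size:
            for i in group:
                labels[i] = cluster_id
            cluster_id += 1
    return labels

def blob(center, n=8):
    """Hypothetical tight population: small deterministic jitter around a center."""
    return [tuple(c * (1 + 0.005 * ((k * (d + 2)) % 7)) for d, c in enumerate(center))
            for k in range(n)]

# Toy (FSC, SSC, FITC) events: two populations plus one stray event
events = blob((100, 100, 1000)) + blob((1000, 1000, 10000)) + [(10, 10000, 100)]
points = standardize_log(events)
order, reach = optics(points, eps=0.1, min_pts=6)
labels = extract_by_threshold(order, reach, threshold=0.05)
print(labels)
```

The values eps = 0.1 and min_pts = 6 mirror the ε and k reported above, but with only 17 toy events the example says nothing about how those settings behave on real data containing thousands of events per replicate.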

Acknowledgements

This research was supported by the U.S. Bureau of Reclamation under Award R18AC00106. We are grateful to the following UC Davis core resource centers for support on various aspects of the work: the Real-Time PCR Research & Diagnostics Core Facility, the Flow Cytometry Shared Resource, and the DataLab. We are also grateful to the following individuals: Yutong Zhang and Erica Koopman-Glass for assistance with laboratory tasks, Bridget McLaughlin and Jonathan Van Dyke for flow cytometry training, David Rocke for advising on fractional factorial experimental design, Edlin Escobar and Minji Kim for helping develop the qPCR protocols, Jonathan Herman for early assistance with cluster analysis, and Nick Ulle and Pamela Reynolds for later help on cluster analysis.

Author contributions

H.S. drafted the proposal for the USBR grant, helped guide the experimental design, performed the bulk of laboratory work, oversaw project components related to cluster analysis, and drafted the manuscript. M.J. performed supplemental laboratory work, provided insight on cluster analysis, and helped revise the manuscript. H.B. served as PI for the USBR grant and for the project as a whole and helped revise the manuscript.

Competing interests

The authors declare no competing interests.

References

[1] National Research Council. Water Reuse: Potential for Expanding the Nation’s Water Supply through Reuse of Municipal Wastewater (National Academies Press, Washington, DC, 2012).
[2] Olivieri, A. et al. Evaluation of the Feasibility of Developing Uniform Water Recycling Criteria for Direct Potable Reuse. Available at https://www.waterboards.ca.gov/drinking_water/certlic/drinkingwater/documents/rw_dpr_criteria/app_a_ep_rpt.pdf (2016).
[3] Arnold, R. G. et al. Direct potable reuse of reclaimed wastewater: It is time for a rational discussion. Rev. Environ. Health 27, 197–206 (2012).
[4] California State Water Resources Control Board. Investigation on the Feasibility of Developing Uniform Water Recycling Criteria for Direct Potable Reuse. Available at waterboards.ca.gov/drinking_water/certlic/drinkingwater/documents/rw_dpr_criteria/final_report.pdf (2016).
[5] Safford, H. R. & Bischel, H. N. Flow cytometry applications in water treatment, distribution, and reuse: A review. Water Res. 151, 110–133 (2019).
[6] Dlusskaya, E., Dey, R., Pollard, P. C. & Ashbolt, N. J. Outer Limits of Flow Cytometry to Quantify Viruses in Water. Environ. Sci. Tech. Water 1, 1127–1135 (2021).
[7] Brussaard, C. P. D., Marie, D. & Bratbak, G. Flow cytometric detection of viruses. J. Virol. Methods 85, 175–182 (2000).
[8] Reyes, J. L. Z. & Aguilar, H. C. Flow Virometry as a Tool to Study Viruses. Methods 134–135, 87–97 (2018).
[9] Lippé, R. Flow Virometry: a Powerful Tool To Functionally Characterize Viruses. J. Virol. 92 (2018).
[10] Brussaard, C. P. D. Optimization of Procedures for Counting Viruses by Flow Cytometry. Appl. Env. Microbiol. 70 (2004).
[11] Huang, X. et al. Evaluation of methods for reverse osmosis membrane integrity monitoring for wastewater reuse. J. Water Process Eng. 7, 161–168 (2015).
[12] Nescerecka, A., Hammes, F. & Juhna, T. A pipeline for developing and testing staining protocols for flow cytometry, demonstrated with SYBR Green I and propidium iodide viability staining. J. Microbiol. Methods 131, 172–180 (2016).
[13] Bashashati, A. & Brinkman, R. R. A Survey of Flow Cytometry Data Analysis Methods. Adv. Bioinform. (2009).
[14] Antony, J. 7 – Fractional Factorial Designs. Design of Experiment for Engineers and Scientists (Elsevier, 2016).
[15] Dlusskaya, E. A., Atrazhev, A. M. & Ashbolt, N. J. Colloid chemistry pitfall for flow cytometric enumeration of viruses in water. Water Res. X 2 (2019).
[16] Zhang, Y. et al. The influence of ionic strength and mixing ratio on the colloidal stability of PDAC/PSS polyelectrolyte complexes. Soft Matter 11 (2015).
[17] Ankerst, M., Breunig, M. M., Kriegel, H.-P. & Sander, J. OPTICS: ordering points to identify the clustering structure. ACM SIGMOD Record 28, 49–60 (1999).
[18] Charlon, T. opticskxi: OPTICS K-Xi Density-Based Clustering. R package version 0.1 (2019).
[19] Charlon, T. opticskxi: OPTICS K-Xi Density-Based Clustering. Available at https://cran.r-project.org/web/packages/opticskxi/vignettes/opticskxi.pdf (n.d.).
[20] Safford, H. R. & Bischel, H. N. Performance comparison of four commercially available cytometers using fluorescent, polystyrene, submicron-scale beads. Data in brief 24, 103872 (2019).
[21] Bonilla, N. et al. Phage on tap–a quick and efficient protocol for the preparation of bacteriophage laboratory stocks. PeerJ 4 (2016).

Table 1. Factors and levels included in the fractional factorial experimental design for staining optimization.
Factor	Level 1	Level 2	Rationale
Stain type (which nucleic-acid stain was used?)	SYBR Green I	SYBR Gold	Both stains are widely used for applications of FVM to microorganisms. Huang et al. (2015) deemed SYBR Gold more effective for FVM-based analysis of waterborne viruses, while Brussaard (2004) reported better results with SYBR Green.
Diluent (what was the sample diluted in?)	Milli-Q (MQ) water	Tris-EDTA (TE) buffer	Both SYBR Green I and SYBR Gold are pH-sensitive, so using a buffer instead of MQ water as a diluent may improve results.
Dye concentration (what was the concentration of dye in the final sample?)	5 x 10^-5 times sample volume	1 x 10^-4 times sample volume	Level 1 concentration used by Brussaard (2004); Level 2 concentration used by Huang et al. (2015).
Staining temperature (what temperature was the sample stained at?)	25°C	50°C	Huang et al. (2015) stained at room temperature (~25°C) while Brussaard (2004) stained at 80°C. Multiple studies have found that an elevated temperature can promote the staining reaction, but an 80°C staining temperature may be unrealistic for applied water-treatment and -reuse scenarios. An intermediate temperature (50°C) was selected as the “high” staining temperature for comparison with room-temperature staining.
Staining time (how long was the sample stained for?)	1 min	15 min	Huang et al. (2015) stained for 15 minutes while Brussaard (2004) stained for 10 minutes. Our preliminary results (not reported) suggested that a prolonged staining time may not be necessary to achieve good results. A workable short staining time would increase the potential of FVM as a real-time technique for water-quality monitoring.
Glutaraldehyde (was the sample treated with glutaraldehyde prior to staining?)	No	Yes, glutaraldehyde added at a final concentration of 0.5%	Both Huang et al. (2015) and Brussaard (2004) found that adding glutaraldehyde significantly improved the detectability of waterborne viruses by FVM. However, glutaraldehyde addition also closes off certain pathways for validating FVM results (e.g., using a flow cytometric cell sorter to separate target populations and then using culture-based methods to verify the identity of the target). This factor was assessed to determine whether glutaraldehyde addition is essential for our samples.

Editorial decision: revise
16 Jun, 2022
Review #3 received at journal
10 Jun, 2022
Review #2 received at journal
10 Jun, 2022
Reviewer #3 agreed at journal
31 May, 2022
Reviewer #2 agreed at journal
31 May, 2022
Reviewer #1 agreed at journal
28 May, 2022
Reviewers invited by journal
27 May, 2022
Submission checks completed at journal
20 May, 2022
First submitted to journal
19 May, 2022
Editor assigned by journal
19 May, 2022