Automated environmental metagenomics using Oxford Nanopore sequencing

doi:10.21203/rs.3.rs-4745570/v1

Research Article

This work is licensed under a CC BY 4.0 License

Abstract

Long-read sequencing has revolutionised metagenomics through improved metagenome assembly, taxonomic classification and functional characterisation. Automation can enhance the throughput, reproducibility, and accuracy of library preparation. However, automated library preparation protocols have not yet been validated for metagenomic workflows, which are particularly sensitive to methodological perturbation. Here, we compare long-read metagenomic sequencing of environmental samples through parallel manual and automated protocols. Despite minor variation in read lengths and classification rate, minimal differences in microbial community structure were identified between manual and automated libraries. These findings demonstrate the utility of automation for high-throughput long-read metagenomics, with broad applicability to automated long-read sequencing.

Keywords: Automation, long-read sequencing, metagenomics, Oxford Nanopore, library preparation, soil

Background

Long-read sequencing has transformed our understanding of microbiomes through improved genome assembly, functional characterisation, and the accuracy and precision of taxonomic classification (1–3), leading to its rapid expansion in metagenomic research (4). The enhanced capacity for multiplexing samples on Oxford Nanopore Technologies (ONT) platforms, together with increased potential yield and reduced costs of ONT sequencing (4), has improved the potential throughput of long-read metagenomics. However, multiplexed ONT library preparation protocols involve many pipetting steps, requiring considerable hands-on time and introducing potential for human error and inter-sample variation.

Automation using liquid handling robotics therefore has the potential to enhance the throughput, reproducibility, and accuracy of sequencing library preparation (5, 6). However, validation of liquid handling automation for ONT protocols remains scarce in the literature, with studies restricted to high-throughput amplicon sequencing of SARS-CoV-2 (7). As in clinical applications, validation of sample preparation processes is particularly important in metagenomic workflows because these analyses are sensitive to perturbation from methodological bias (8, 9), which can affect the interpretation of study results (10, 11).

Here, we compared long-read metagenomic sequencing of environmental samples using either manual or automated ONT library preparation. We utilised the Bravo Automated Liquid Handling Platform (Agilent Technologies, UK), which has a 96-channel pipetting head for simultaneous execution of liquid handling steps across a 96-well plate. ONT sequencing libraries were prepared in parallel manually and on the Bravo using 24 DNA samples, extracted from soils with a range of habitat and geochemical traits. Analysis of metagenomic data revealed that while there were differences in read length, classification rate and alpha diversity between manual and automated libraries, there was minimal impact on the observed microbial community composition. Considering the benefits of reduced hands-on time and improved reproducibility and reliability, automated library preparation using the Bravo should be considered for increasing throughput of long-read sequencing.

Results and Discussion

Sequencing read metrics were compared between automated and manual library preparations (Fig. 1). No significant difference in sequencing depth was identified between paired libraries (Fig. 1a). However, reads were significantly longer in manually prepared libraries, with mean differences in N50 and mean read length of 785 bp and 756 bp, respectively (Fig. 1b-c). Conversely, when reads were taxonomically assigned using Kraken2, a small but significantly higher percentage of reads was classified from automated libraries (Fig. 1d), with a mean difference in classification rate of only 0.5% (excluding the outlying sample from the pasture soil).
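
For readers unfamiliar with the two read-length metrics compared above, the following is a minimal sketch of how mean read length and N50 can be computed from a list of read lengths. The function name and the example values are hypothetical and are not taken from the study; the authors' own tooling for these metrics is not stated.

```python
def read_length_stats(lengths):
    """Return (mean length, N50) for a list of read lengths in bp."""
    if not lengths:
        raise ValueError("no reads supplied")
    total = sum(lengths)
    mean_length = total / len(lengths)
    # N50: walking from the longest read downwards, the read length at which
    # the cumulative number of bases first reaches half of all sequenced bases.
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return mean_length, n50


# Hypothetical read lengths (bp) for one paired sample; not data from the study.
manual_lengths = [500, 1000, 2000, 4000, 8000]
automated_lengths = [400, 900, 1800, 3500, 7000]

manual_mean, manual_n50 = read_length_stats(manual_lengths)
auto_mean, auto_n50 = read_length_stats(automated_lengths)
print("difference in mean length:", manual_mean - auto_mean)
print("difference in N50:", manual_n50 - auto_n50)
```

Because N50 weights reads by the bases they contribute, it responds more strongly than the mean to the loss of the longest fragments, which is why both metrics are usually reported together.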

Differences in read length are likely caused by variation in bead purification steps between manual and automated protocols. While shaking to elute DNA from magnetic beads was carried out at 37 °C in the manual protocol to improve elution of long fragments, as recommended in the ONT protocol, simultaneous temperature control and shaking were not possible on the Bravo. This may have reduced the efficiency of long DNA fragment elution for the automated libraries. Meanwhile, the taxonomic classification rate may have been slightly improved in automated libraries through increased efficacy of DNA purification, leading to reduction in PCR artefacts.

Figure 1. Sequencing read metrics.

Boxplots comparing (a) sequencing read depth, (b) read length N50, (c) mean read length and (d) percentage of reads classified by Kraken2 from manual or automated library preparation. Grey lines indicate paired samples prepared in parallel and results of Wilcoxon signed-rank tests are displayed.
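
The paired comparisons in Figure 1 rely on the Wilcoxon signed-rank test. The sketch below shows one way to compute an exact two-sided version of that test using only the Python standard library. It is an illustration under stated assumptions: the source does not say which variant (exact or normal approximation) or which software was used, the function name is hypothetical, and the paired values are invented.

```python
def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (statistic, p_value) with statistic = min(W+, W-).
    Zero differences are dropped; tied |differences| get average ranks.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # Work with doubled ranks so that averaged (half-integer) ranks stay integers.
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # Average of 1-based ranks i+1 .. j+1 is (i + j + 2) / 2.
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)
    w_min2 = min(w_plus2, total2 - w_plus2)
    # counts[s] = number of sign assignments whose doubled positive-rank sum is s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    tail = sum(counts[: w_min2 + 1]) / 2 ** n
    return w_min2 / 2, min(1.0, 2 * tail)


# Hypothetical N50 values (kb) for six paired libraries; not data from the study.
manual = [5.1, 4.8, 6.0, 5.5, 4.9, 6.2]
automated = [4.3, 4.1, 5.4, 4.6, 4.5, 5.1]
statistic, p_value = wilcoxon_signed_rank(manual, automated)
print(statistic, p_value)
```

The test uses only the sign and rank of each within-pair difference, so it suits the paired design here, where each DNA sample was prepared by both methods.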

Ecological analyses were performed on the results of taxonomic classification at the family level (Fig. 2). A significant increase in alpha diversity, measured as both the Shannon-Weaver index and family richness, was observed in libraries prepared on the Bravo (Fig. 2a-b), which was mostly the result of the presence of rare taxa (Fig. 2d). This may be explained by improved efficacy of automated DNA purification leading to better amplification of rare DNA fragments. Detection of rare microorganisms in complex samples is an important objective of many metagenomic studies, owing to their importance to ecosystem functions and community dynamics (12, 13), and the increased diversity of automated libraries observed here could therefore provide a benefit.
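
To illustrate how the detection of a few rare families can raise both alpha diversity metrics, the sketch below computes family richness and the Shannon index (natural logarithm) from a vector of read counts. The function names and the count vectors are hypothetical; the study itself calculated these statistics with the vegan R package.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


# Hypothetical family-level read counts for one soil sample; not study data.
without_rare = [500, 300, 200, 0, 0]
with_rare = [500, 300, 200, 5, 3]  # same sample with two rare families detected

print(richness(without_rare), shannon_index(without_rare))
print(richness(with_rare), shannon_index(with_rare))
```

In this toy case the two rare families add two units of richness but change the Shannon index only slightly, because the index weights each family by its relative abundance.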

Variation in microbial community structure was investigated through calculation of Bray-Curtis distances with rarefaction (Fig. 2c). Unsurprisingly, soil type was found to explain the vast majority of variation in community composition between the samples (PERMANOVA, R² = 0.92, p < 0.001), while library preparation method or the interaction between these variables showed no significant effect (Fig. 2c). To support this, analysis within each soil type found no significant effect of library preparation method on microbial community composition at any taxonomic rank (PERMANOVA, p > 0.05). This indicates that minimal differences in microbial community composition were observed between manual and automated libraries, with no pattern to this variation within each soil type. Such consistency is crucial if the results from manual and automated library preparations are to be compared, considering the importance of reproducibility for interpretation of metagenomic data within and between studies.
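
The R² reported by PERMANOVA is the fraction of the total sum of squared distances attributable to the grouping factor. The sketch below implements a simplified one-way version with a permutation p-value, assuming a precomputed symmetric distance matrix. It is not the two-factor model with interaction described above, and the function name, grouping labels and distances are hypothetical.

```python
import random


def permanova_one_way(dist, groups, permutations=999, seed=0):
    """One-way PERMANOVA on a symmetric distance matrix (list of lists).

    Returns (r_squared, pseudo_f, p_value).
    """
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    if a < 2 or n <= a:
        raise ValueError("need at least two groups and more samples than groups")

    # Total sum of squares from all pairwise distances.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n

    def pseudo_f(assign):
        ss_within = 0.0
        for g in labels:
            members = [i for i in range(n) if assign[i] == g]
            ss_within += sum(
                dist[i][j] ** 2
                for k, i in enumerate(members)
                for j in members[k + 1:]
            ) / len(members)
        ss_among = ss_total - ss_within
        if ss_within == 0:
            return float("inf"), ss_among / ss_total
        return (ss_among / (a - 1)) / (ss_within / (n - a)), ss_among / ss_total

    f_obs, r_squared = pseudo_f(groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # permute group labels across samples
        f_perm, _r2 = pseudo_f(shuffled)
        if f_perm >= f_obs:
            hits += 1
    return r_squared, f_obs, (hits + 1) / (permutations + 1)


# Hypothetical one-dimensional "communities" from two soil types; not study data.
values = [0, 1, 2, 10, 11, 12]
soil = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(u - v) for v in values] for u in values]
print(permanova_one_way(dist, soil))
```

A high R² for soil type together with a non-significant term for preparation method, as reported above, means that almost all of the between-sample distance lies between soils rather than between manual and automated libraries.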

Figure 2. Family level microbial community analysis.

Boxplots comparing alpha diversity metrics calculated at the Family taxonomic rank, including (a) Shannon-Weaver index and (b) family richness, with grey lines indicating paired samples prepared in parallel and Wilcoxon signed-rank test results displayed. (c) Non-metric Multidimensional Scaling (nMDS) plot based on Bray-Curtis distances, showing variation between the observed microbial community structure of manual and automated libraries from the four soil types. (d) Stacked bar chart showing the relative abundance of microbial families across the four soil types. Legend shows colours corresponding to the top 20 families.

Demonstrating reproducibility is especially important for analysis of environmental samples, such as soil, that are particularly vulnerable to perturbation by methodological variation (8, 10). The soil matrix exhibits high spatial heterogeneity of microorganism distribution (8) and contains an abundance of inhibitors that pose a challenge to molecular genetic analysis. Furthermore, microbial ecologists wish to characterise soil communities from a field to a continental scale (14, 15), while most soil nucleic acid extraction methods require comparatively minuscule input quantities (250 µg–2 g). Considering these factors, and the statistical analysis required for deciphering differential abundance, sufficient sample sizes and replication are crucial to uncover patterns in microbial community composition and function between sites and experimental treatments (8, 16, 17). Automation has the potential to meet this need for increased throughput while maintaining reproducibility.

Conclusion

Despite the identification of minor differences in sequencing metrics and detection of rare taxa between manual and automated protocols, automated library preparation had minimal impact on the microbial community characterised from parallel metagenomic analysis of soil DNA samples. Considering the benefits of reduced hands-on time and improved reproducibility and reliability, automated library preparation should be considered suitable for improving throughput of ONT long-read sequencing.

Methods

Soil samples from four habitats were collected and characterised as previously described (REF). DNA was extracted using the DNeasy® PowerSoil® Pro Kit (Qiagen, UK), with 4–8 extractions from each soil type. DNA input into library preparations was normalised to 1 µg. Libraries were prepared using the Ligation Sequencing Kit (SQK-LSK114; ONT, UK) and PCR Barcoding Expansion 96 (EXP-PBC096; ONT, UK), with parallel preparations carried out on the same samples both manually, following the manufacturer’s protocol (Additional file 1), and automated on the Bravo (detailed in Additional file 2). Between 15 and 45 ng of DNA was input into PCR barcoding reactions. Parallel preparations concluded with normalised pools of barcoded libraries, which were subsequently pooled together and sequenced on the same R10.4.1 PromethION flowcell.

Reads were basecalled and demultiplexed using guppy v7.1.4 and adapters were trimmed using dorado v0.6.0. Taxonomic classification was carried out using Kraken2 v2.1.2 against the NCBI nr database (downloaded on 09/03/24), using a confidence score threshold of 0.05 to reduce the occurrence of false positives. Count tables were compiled using MEGAN Ultimate Edition v6.25.6 and filtered to remove taxa occurring at an abundance of < 0.1% across all samples. Ecological statistics were calculated using the vegan v2.6-4 R package. Bray-Curtis distances were calculated using the avgdist function with subsampling to the minimum classified read count across samples.
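
The filtering and rarefied-distance steps above can be sketched in Python as follows. This is an illustration only, not the vegan implementation. It assumes that the abundance filter drops a family when it stays below 0.1% of reads in every sample (the text leaves the exact rule open), and that the averaging step repeatedly subsamples each sample without replacement to the smallest library size before computing Bray-Curtis dissimilarity. All function names, the iteration count and the count table are hypothetical.

```python
import random


def filter_rare_taxa(table, min_fraction=0.001):
    """Keep taxa (columns) that reach min_fraction of reads in at least one sample.

    Assumption: this is one reading of "< 0.1% across all samples".
    """
    totals = [sum(row) for row in table]
    keep = [
        t for t in range(len(table[0]))
        if any(total and row[t] / total >= min_fraction
               for row, total in zip(table, totals))
    ]
    return [[row[t] for t in keep] for row in table]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    denominator = sum(u) + sum(v)
    if denominator == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from a vector of taxon counts."""
    # Expanding counts into one entry per read is simple but memory-hungry;
    # acceptable for a sketch, not for real read depths.
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for taxon in rng.sample(pool, depth):
        out[taxon] += 1
    return out


def average_bray_curtis(table, iterations=100, seed=0):
    """Mean Bray-Curtis matrix over repeated subsampling to the minimum depth."""
    rng = random.Random(seed)
    depth = min(sum(row) for row in table)
    n = len(table)
    avg = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        rarefied = [subsample(row, depth, rng) for row in table]
        for i in range(n):
            for j in range(i + 1, n):
                d = bray_curtis(rarefied[i], rarefied[j]) / iterations
                avg[i][j] += d
                avg[j][i] += d
    return avg


# Hypothetical family-level count table (rows = samples); not study data.
counts = [[9990, 10, 0], [5000, 4999, 1], [3000, 2000, 0]]
filtered = filter_rare_taxa(counts)
print(filtered)
print(average_bray_curtis(filtered, iterations=20))
```

Averaging over many random subsamples reduces the noise that a single rarefaction draw would introduce, while still comparing every sample at the same sequencing depth.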

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and materials

The datasets generated and analysed during the current study are available in the NCBI Sequence Read Archive repository, BioProject accession PRJNA1112790.

Competing interests

KR and QH are employees of Agilent Technologies. All other authors declare that they have no competing interests.

Funding

This research was supported by Shell Research Ltd (CW648947-PT34767).

Authors' contributions

HTC and RKT conceptualised the study. LW, GRJ, KR and QH optimised protocols and performed laboratory experimentation. HTC processed and analysed the sequencing data, prepared the figures and wrote the manuscript. RKT edited the manuscript. All authors read and approved the final manuscript.

Acknowledgements

The authors wish to thank Dr Sam Bridgewater and the Clinton Devon Estate for access to soils across different habitats and Dr Tomasz Dobrzycki for advice on Oxford Nanopore Technologies protocols.

References

1. Sereika M, Kirkegaard RH, Karst SM, Michaelsen TY, Sørensen EA, Wollenberg RD, et al. Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. Nat Methods. 2022 Jul;19(7):823–6.
2. Portik DM, Brown CT, Pierce-Ward NT. Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets. BMC Bioinformatics. 2022 Dec 13;23(1):541.
3. Chen Y, Nie F, Xie SQ, Zheng YF, Dai Q, Bray T, et al. Efficient assembly of nanopore reads via highly accurate and intact error correction. Nat Commun. 2021 Jan 4;12(1):60.
4. Agustinho DP, Fu Y, Menon VK, Metcalf GA, Treangen TJ, Sedlazeck FJ. Unveiling microbial diversity: harnessing long-read sequencing technology. Nat Methods. 2024 Apr 30;1–13.
5. Socea JN, Stone VN, Qian X, Gibbs PL, Levinson KJ. Implementing laboratory automation for next-generation sequencing: benefits and challenges for library preparation. Front Public Health. 2023 Jul 13;11.
6. Hess JF, Kohl TA, Kotrová M, Rönsch K, Paprotka T, Mohr V, et al. Library preparation for next generation sequencing: A review of automation strategies. Biotechnol Adv. 2020 Jul 1;41:107537.
7. Coope RJN, Matic N, Pandoh PK, Corbett RD, Smailus DE, Pleasance S, et al. Automated Library Construction and Analysis for High-Throughput Nanopore Sequencing of SARS-CoV-2. J Appl Lab Med. 2022 Sep 1;7(5):1025–36.
8. Lombard N, Prestat E, van Elsas JD, Simonet P. Soil-specific limitations for access and analysis of soil microbial communities by metagenomics. FEMS Microbiol Ecol. 2011 Oct 1;78(1):31–49.
9. Nearing JT, Comeau AM, Langille MGI. Identifying biases and their potential solutions in human microbiome studies. Microbiome. 2021 May 18;9(1):113.
10. Changey F, Blaud A, Pando A, Herrmann AM, Lerch TZ. Monitoring soil microbial communities using molecular tools: DNA extraction methods may offset long‐term management effects. Eur J Soil Sci. 2021 Mar;72(2):1026–41.
11. Schloss PD. Identifying and Overcoming Threats to Reproducibility, Replicability, Robustness, and Generalizability in Microbiome Research. mBio. 2018 Jun 5;9(3):10.1128/mbio.00525-18.
12. Shade A, Jones SE, Caporaso JG, Handelsman J, Knight R, Fierer N, et al. Conditionally Rare Taxa Disproportionately Contribute to Temporal Changes in Microbial Diversity. mBio. 2014 Jul 15;5(4):10.1128/mbio.01371-14.
13. Xiong C, He JZ, Singh BK, Zhu YG, Wang JT, Li PP, et al. Rare taxa maintain the stability of crop mycobiomes and ecosystem functions. Environ Microbiol. 2021;23(4):1907–24.
14. Leff JW, Jones SE, Prober SM, Barberán A, Borer ET, Firn JL, et al. Consistent responses of soil microbial communities to elevated nutrient inputs in grasslands across the globe. Proc Natl Acad Sci. 2015 Sep;112(35):10967–72.
15. Gravuer K, Eskelinen A, Winbourne JB, Harrison SP. Vulnerability and resistance in the spatial heterogeneity of soil microbial communities under resource additions. Proc Natl Acad Sci. 2020 Mar 31;117(13):7263–70.
16. Prosser JI. Replicate or lie. Environ Microbiol. 2010;12(7):1806–10.
17. Baker KL, Langenheder S, Nicol GW, Ricketts D, Killham K, Campbell CD, et al. Environmental and spatial characterisation of bacterial community composition in soil to inform sampling strategies. Soil Biol Biochem. 2009 Nov 1;41(11):2292–8.

Supplementary Files

Additional File 1 (AdditionalFile1.pdf). Protocol checklist for ligation sequencing V14 with PCR barcoding (SQK-LSK114 with EXP-PBC001 or EXP-PBC096) on PromethION.
Additional File 2 (AdditionalFile2.pdf). ONT General Ligation Agilent Bravo Option B Automated User Guide.

Reviewers invited by journal: 24 Jul, 2024
Editor invited by journal: 16 Jul, 2024
Editor assigned by journal: 16 Jul, 2024
Submission checks completed at journal: 16 Jul, 2024
First submitted to journal: 15 Jul, 2024
