Sequencing read metrics were compared between automated and manual library preparations (Fig. 1). No significant difference in sequencing depth was identified between paired libraries (Fig. 1a). However, reads from manually prepared libraries were significantly longer, with mean differences in average length and N50 of 756 bp and 785 bp, respectively (Fig. 1b-c). In contrast, when reads were taxonomically assigned using Kraken2, a small but significantly higher percentage of reads was classified from automated libraries (Fig. 1d), with a mean difference in classification rate of only 0.5% (excluding the outlying sample from the pasture soil).
Differences in read length are likely caused by variation in bead purification steps between the manual and automated protocols. In the manual protocol, shaking to elute DNA from magnetic beads was carried out at 37 °C to improve elution of long fragments, as recommended in the ONT protocol, whereas simultaneous temperature control and shaking were not possible on the Bravo. This may have reduced the efficiency of long DNA fragment elution for the automated libraries. Meanwhile, the taxonomic classification rate may have been slightly improved in automated libraries through increased efficacy of DNA purification, leading to a reduction in PCR artefacts.
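For reference, the two read length metrics compared above can be computed from a list of read lengths as in the following minimal sketch. The function name and the example values are hypothetical and are not taken from this study; N50 is taken here as the read length at which reads of that length or longer contain at least half of all sequenced bases.

```python
def read_length_stats(lengths):
    """Return (mean read length, N50) for a list of read lengths in bp."""
    if not lengths:
        raise ValueError("no reads supplied")
    total_bases = sum(lengths)
    mean_length = total_bases / len(lengths)
    # Walk from the longest read down, accumulating bases until
    # at least half of the total has been covered.
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total_bases:
            return mean_length, length


# Hypothetical read lengths (bp), for illustration only.
example_lengths = [2000, 3000, 4000, 5000, 6000]
mean_len, n50 = read_length_stats(example_lengths)
print(f"mean = {mean_len:.0f} bp, N50 = {n50} bp")
```

Because N50 weights reads by the bases they contribute, it is more sensitive than the mean to the loss of long fragments.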
Figure 1. Sequencing read metrics.
Boxplots comparing (a) sequencing read depth, (b) read length N50, (c) mean read length and (d) percentage of reads classified by Kraken2 for manual and automated library preparations. Grey lines indicate paired samples prepared in parallel, and results of Wilcoxon signed-rank tests are displayed.
Ecological analyses were performed on the results of taxonomic classification at the family level (Fig. 2). A significant increase in alpha diversity, measured as both Shannon-Weaver index and family richness, was observed in libraries prepared on the Bravo (Fig. 2a-b), which was mostly the result of the presence of rare taxa (Fig. 2d). This may be explained by improved efficacy of automated DNA purification leading to better amplification of rare DNA fragments. Detection of rare microorganisms in complex samples is an important objective of many metagenomic studies, due to their importance to ecosystem functions and community dynamics (12, 13), and the increased diversity observed here in automated libraries could therefore be a benefit.
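The two alpha diversity metrics reported above can be illustrated with the following minimal sketch, which assumes a simple list of per-family read counts for one library. The function names and counts are hypothetical; the Shannon-Weaver index is computed with the natural logarithm, which is a common convention but is an assumption here.

```python
import math


def shannon_index(counts):
    """Shannon-Weaver index H' = -sum(p_i * ln(p_i)) over observed taxa."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def richness(counts):
    """Number of taxa with at least one assigned read."""
    return sum(1 for c in counts if c > 0)


# Hypothetical per-family read counts for two libraries of equal depth.
library_a = [700, 300, 0, 0]
library_b = [690, 290, 12, 8]  # same dominant families plus two rare ones

for name, counts in (("A", library_a), ("B", library_b)):
    print(name, richness(counts), round(shannon_index(counts), 3))
```

In this toy example, the two rare families in library B raise both richness and the Shannon-Weaver index while barely changing the relative abundance of the dominant families, which is the kind of pattern described above.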
Variation in microbial community structure was investigated through calculation of Bray-Curtis distances with rarefaction (Fig. 2c). Unsurprisingly, soil type was found to explain the vast majority of variation in community composition between the samples (PERMANOVA, R² = 0.92, p < 0.001), while neither library preparation method nor the interaction between these variables had a significant effect (Fig. 2c). To support this, analysis within each soil type found no significant effect of library preparation method on microbial community composition at any taxonomic rank (PERMANOVA, p > 0.05). This indicates that differences in microbial community composition between manual and automated libraries were minimal, with no consistent pattern to this variation within each soil type. Such consistency is crucial if the results from manual and automated library preparations are to be compared, considering the importance of reproducibility for interpretation of metagenomic data within and between studies.
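As an illustration of the distance calculation underlying this analysis, the sketch below rarefies two count vectors to a common depth and computes their Bray-Curtis dissimilarity. It is a simplified example under stated assumptions and not the pipeline used in this study: the function names, the fixed random seed and the counts are all hypothetical, and the PERMANOVA step is not shown.

```python
import random


def rarefy(counts, depth, seed=0):
    """Randomly subsample a count vector, without replacement, to `depth` reads."""
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds library size")
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    rarefied = [0] * len(counts)
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] += 1
    return rarefied


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("count vectors must have the same length")
    denominator = sum(u) + sum(v)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator


# Hypothetical family-level counts for a manual and an automated library.
manual_counts = [620, 250, 90, 40, 0]
automated_counts = [900, 380, 150, 60, 10]

depth = min(sum(manual_counts), sum(automated_counts))
distance = bray_curtis(rarefy(manual_counts, depth), rarefy(automated_counts, depth))
print(f"Bray-Curtis dissimilarity at depth {depth}: {distance:.3f}")
```

Rarefying to a common depth before computing distances prevents differences in sequencing depth from being mistaken for differences in community composition.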
Figure 2. Family level microbial community analysis.
Boxplots comparing alpha diversity metrics calculated at the Family taxonomic rank, including (a) Shannon-Weaver index and (b) family richness, with grey lines indicating paired samples prepared in parallel and Wilcoxon signed-rank test results displayed. (c) Non-metric Multidimensional Scaling (nMDS) plot based on Bray-Curtis distances, showing variation between the observed microbial community structure of manual and automated libraries from the four soil types. (d) Stacked bar chart showing the relative abundance of microbial families across the four soil types. Legend shows colours corresponding to the top 20 families.
Demonstrating reproducibility is especially important for analysis of environmental samples, such as soil, that are particularly vulnerable to perturbation by methodological variation (8, 10). The soil matrix exhibits high spatial heterogeneity of microorganism distribution (8) and contains an abundance of inhibitors that pose a challenge to molecular genetic analysis. Furthermore, microbial ecologists wish to characterise soil communities from a field to a continental scale (14, 15), while most soil nucleic acid extraction methods require comparatively minuscule input quantities (250 µg-2 g). Considering these factors, and the statistical analysis required for deciphering differential abundance, sufficient sample sizes and replication are crucial to uncover patterns in microbial community composition and function between sites and experimental treatments (8, 16, 17). Automation has the potential to address these challenges by increasing throughput while maintaining reproducibility.