DNA yield and quality of extracted DNA
In our assessment, the same starting cell input (~1.2 × 10⁹ CFU mL⁻¹) was used for all eight pathogenic bacteria (Table S1). DNA was extracted from each bacterial sample with four different DNA isolation kits (Table 1). The quantity and quality of the purified DNA are shown in Fig. 1. Overall, the ZymoBIOMICS™ DNA Miniprep Kit (ZM), which includes a bead-beating step, gave significantly higher DNA yields for most of the tested pathogenic bacterial strains, ranging from 20.6 to 235.3 ng µL⁻¹ (Table S2). However, for Enterococcus faecium (Efa) and Streptococcus suis (Ssu), the Nanobind CBB Big DNA Kit (NB) yielded more DNA (63.5 ± 10.5 and 116.8 ± 17.3 ng µL⁻¹, respectively) than ZM, the Fire Monkey HMW-DNA Extraction Kit (FM) and the Roche MagNaPure 96 system (RO). Notably, DNA concentrations from Streptococcus agalactiae (Sag) did not differ significantly among the DNA isolation kits (Fig. 1, Table S2).
Table 1
Summary of DNA isolation kit features used in this study
Key features | ZymoBIOMICS™ DNA Miniprep Kit | Nanobind CBB Big DNA Kit | Fire Monkey HMW-DNA Extraction Kit | Roche MagNaPure 96 |
Manufacturer | Zymo Research | PacBio | Revolugen | Roche |
Abbreviation | ZM | NB | FM | RO |
Cat. number/Lot | D4300 | NB-900-001-01 | 0030A | – |
No. of reactions | 50 | 20 | 10 | 96 |
Principle | Spin column | Magnetic disk | Spin column | Magnetic glass particles |
Bead beating | Yes | No | No | No |
Additional reagents required | No | Yes | Yes | No |
Estimated time per extraction (hours)ᵃ | 4 | 6 | 5 | 1 |
Estimated cost per sample (USD)ᵇ | 8.42 | 60.50 | 61.04 | 6.34 |
Comprehensive time and costᶜ | 0.09 | 0.99 | 0.83 | 0.02 |
ᵃ Completion time based on processing 24 samples, including the time taken to pre-treat samples by enzymatic digestion.
ᵇ Estimated cost per sample, converted from Thai baht (THB) to US dollars (USD).
ᶜ Comprehensive time and cost = (estimated cost per extraction of a given method / maximum estimated cost among the four methods) × (estimated time per extraction of a given method / maximum estimated time among the four methods).
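The comprehensive time and cost values in Table 1 can be checked with a short Python sketch of the footnote formula. The kit values are copied from Table 1; the dictionary and function names are illustrative only and are not part of any published tool.

```python
# Hypothetical helper: reproduces the "comprehensive time and cost" row of Table 1.
KITS = {
    # kit: (estimated cost per sample in USD, estimated time per extraction in hours)
    "ZM": (8.42, 4),
    "NB": (60.50, 6),
    "FM": (61.04, 5),
    "RO": (6.34, 1),
}


def comprehensive_index(kits: dict[str, tuple[float, float]]) -> dict[str, float]:
    """(cost / max cost) x (time / max time) for each kit; lower means cheaper and faster."""
    max_cost = max(cost for cost, _ in kits.values())
    max_time = max(hours for _, hours in kits.values())
    return {
        name: (cost / max_cost) * (hours / max_time)
        for name, (cost, hours) in kits.items()
    }


if __name__ == "__main__":
    for name, score in comprehensive_index(KITS).items():
        print(f"{name}: {score:.2f}")  # ZM: 0.09, NB: 0.99, FM: 0.83, RO: 0.02
```

Because both factors are normalised to the most expensive and the slowest kit, the index only ranks kits within this comparison; adding another kit could change every value.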
Absorbance ratios were then examined to assess the purity of the DNA samples and the presence of contaminants. Pure DNA typically has an A260/A280 ratio of 1.8–2.2 and an A260/A230 ratio ≥ 2.0. For all gram-positive pathogenic bacteria, every extraction kit gave A260/A280 ratios below this range (1.2–1.7). For gram-negative bacteria, ZM, NB and RO gave acceptable A260/A280 ratios for all strains, whereas FM gave ratios ≤ 1.8, suggesting that contaminants may be present in these DNA samples. Only ZM achieved an acceptable A260/A230 ratio (≥ 2.0) for both gram-positive and gram-negative pathogenic bacteria (Fig. 1, Table S3).
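The acceptance criteria above amount to two threshold checks per sample. The sketch below encodes them as stated in the text (1.8–2.2 for A260/A280, ≥ 2.0 for A260/A230, boundaries treated as passing); the class and function names and the example readings are hypothetical and are not values from Table S3.

```python
from dataclasses import dataclass

# Purity thresholds as stated in the text (hypothetical constants, not a standard API).
A260_280_LOW, A260_280_HIGH = 1.8, 2.2
A260_230_MIN = 2.0


@dataclass
class PurityReading:
    """One spectrophotometer reading for a DNA sample."""
    a260_280: float
    a260_230: float


def purity_flags(reading: PurityReading) -> list[str]:
    """Return the ratio checks the reading fails; an empty list means it passes both."""
    flags = []
    if reading.a260_280 < A260_280_LOW:
        flags.append("A260/A280 below 1.8")
    elif reading.a260_280 > A260_280_HIGH:
        flags.append("A260/A280 above 2.2")
    if reading.a260_230 < A260_230_MIN:
        flags.append("A260/A230 below 2.0")
    return flags


if __name__ == "__main__":
    # Hypothetical readings for illustration only.
    for r in (PurityReading(1.9, 2.1), PurityReading(1.5, 2.1), PurityReading(1.9, 1.4)):
        print(r, purity_flags(r) or "pass")
```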
Following DNA extraction, the size distribution of the DNA fragments was examined by automated gel electrophoresis on a TapeStation. In general, all DNA extraction kits produced a single DNA band corresponding to the 48.5 kb reference gDNA ladder. DNA extracted with FM had a fragment size similar to that obtained with RO, whereas ZM produced the smallest size distribution, with a faint smear. Remarkably, NB yielded DNA larger than the reference gDNA ladder for both gram-positive and gram-negative pathogenic bacteria, except for the Sag and Staphylococcus aureus (Sau) strains, for which ZM gave a single DNA band (Fig. S1).
Sequencing statistics and evaluation of assembled genomes
To evaluate the influence of commercial DNA isolation kits on bacterial genome assembly, twenty-four samples per kit, comprising three independent replicates of each of the eight pathogenic bacteria, were pooled and sequenced together on a single R9.4.1 flow cell using a GridION sequencer. This yielded twelve assemblies per strain (four DNA extraction kits × three replicates). The sequencing runs generated 10.89, 7.71, 13.96 and 14.01 GB of sequence data from DNA extracted with ZM, NB, FM and RO, respectively. Among the four gram-positive pathogenic bacteria, NB produced read N50 values above 6,000 bp for both Efa (8,036 bp) and Ssu (6,304 bp), whereas FM, ZM and RO gave read N50 values of 5,978, 3,528 and 4,146 bp, respectively, for Efa. Among the gram-negative bacteria, NB and FM yielded the longest raw reads for Aba and Pae, whereas for Kpn and Sgd the highest read N50 values were obtained from FM-extracted (8,751 bp) and RO-extracted (9,405 bp) DNA, respectively (Table S4).
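Read N50 is the length L such that reads of length ≥ L together contain at least half of all sequenced bases. A minimal sketch, assuming uncompressed or gzipped FASTQ files in the usual four-line layout (no wrapped sequences); the helper names are hypothetical and this is not necessarily how the values in Table S4 were computed.

```python
import gzip
from typing import Iterable, Iterator


def fastq_read_lengths(path: str) -> Iterator[int]:
    """Yield read lengths from a FASTQ file (plain or .gz).

    Assumes the common 4-line-per-record layout without wrapped sequences.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                yield len(line.rstrip("\n"))


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)  # materialise once so generators work
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
```

The same `n50` function gives the contig N50 of an assembly when it is fed contig lengths instead of read lengths.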
Genomes assembled with Flye from all sequencing reads were evaluated based on total length, number of contigs and contig N50. The number of contigs varied with both the extraction kit and the bacterial strain. Remarkably, RO-extracted DNA did not produce sufficient sequencing reads for successful genome assembly of the Sag strain, and NB-extracted DNA had the same outcome for the Sau strain. RO and FM produced contig numbers matching the reference genomes for most pathogenic bacteria (6/8 strains, 75%), followed by ZM (5/8 strains, 62.5%) (Table 2, Fig. S2). NB, on the other hand, was the least successful in terms of genome assembly (3/8 strains, 37.5%) (Table 2, Tables S5 and S6). Although ZM and NB showed lower overall assembly performance, NB had a clear advantage in specific cases: for Efa and Kpn, NB-extracted DNA achieved considerably higher contig N50 values (2,255,486 and 3,672,441 bp) than ZM-extracted DNA (1,731,151 and 258,365 bp, respectively) (Table 2).
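The success criterion is not spelled out in the text. One reading that reproduces the reported fractions from the Table 2 totals is that a kit succeeds for a strain when its total contig count (T) equals the number of replicons in the reference genome (chromosome plus plasmids). The sketch below encodes that assumption; the names are hypothetical.

```python
# Reference replicons (chromosome + plasmids) per strain, from the strain rows of Table 2.
REFERENCE = {"Efa": 6, "Sag": 1, "Ssu": 1, "Sau": 3, "Aba": 3, "Kpn": 4, "Pae": 1, "Sgd": 2}

# Total contigs (T) of the best replicate per kit, from Table 2.
TOTAL_CONTIGS = {
    "ZM": {"Efa": 19, "Sag": 1, "Ssu": 9, "Sau": 3, "Aba": 3, "Kpn": 61, "Pae": 1, "Sgd": 2},
    "NB": {"Efa": 8, "Sag": 152, "Ssu": 1, "Sau": 1, "Aba": 3, "Kpn": 9, "Pae": 1, "Sgd": 7},
    "FM": {"Efa": 10, "Sag": 1, "Ssu": 1, "Sau": 83, "Aba": 3, "Kpn": 4, "Pae": 1, "Sgd": 2},
    "RO": {"Efa": 14, "Sag": 12, "Ssu": 1, "Sau": 3, "Aba": 3, "Kpn": 4, "Pae": 1, "Sgd": 2},
}


def matching_strains(kit_counts: dict[str, int], reference: dict[str, int]) -> list[str]:
    """Strains whose total contig count equals the reference replicon count (assumed criterion)."""
    return sorted(strain for strain, n in kit_counts.items() if n == reference[strain])


if __name__ == "__main__":
    for kit, counts in TOTAL_CONTIGS.items():
        hits = matching_strains(counts, REFERENCE)
        share = 100 * len(hits) / len(REFERENCE)
        print(f"{kit}: {len(hits)}/{len(REFERENCE)} ({share:.1f}%)")
```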
Table 2
Genome assembly statistics of eight pathogenic bacteria, assembled from all sequencing reads of DNA extracted with four benchmark DNA extraction kits: ZymoBIOMICS™ DNA Miniprep Kit (ZM), Nanobind CBB Big DNA Kit (NB), Fire Monkey High Molecular Weight (HMW) DNA Extraction Kit (FM) and Roche MagNaPure 96 system (RO). The best assembly statistics from three independent replicates are shown. T, total contigs; C, chromosome contigs; P, plasmid contigs.
Genome features | ZM | NB | FM | RO |
Enterococcus faecium (Efa): 1 chromosome, 5 plasmids |
Total length (bp) | 3,221,642 | 3,306,012 | 3,277,684 | 3,282,502 |
Contig no. | T (19), C (8), P (11) | T (8), C (3), P (5) | T (10), C (5), P (5) | T (14), C (8), P (6) |
Contig N50 (bp) | 1,731,151 | 2,255,486 | 977,103 | 1,231,442 |
Streptococcus agalactiae (Sag): 1 chromosome |
Total length (bp) | 2,115,530 | 2,016,690 | 2,115,566 | 48,164 |
Contig no. | T (1), C (1) | T (152), C (152) | T (1), C (1) | T (12), C (12) |
Contig N50 (bp) | 2,115,530 | 22,323 | 2,115,566 | 4,152 |
Streptococcus suis (Ssu): 1 chromosome |
Total length (bp) | 2,074,028 | 2,073,360 | 2,073,399 | 2,073,375 |
Contig no. | T (9), C (9) | T (1), C (1) | T (1), C (1) | T (1), C (1) |
Contig N50 (bp) | 500,472 | 2,073,360 | 2,073,399 | 2,073,375 |
Staphylococcus aureus (Sau): 1 chromosome, 2 plasmids |
Total length (bp) | 2,900,568 | 2,839 | 2,842,536 | 2,893,797 |
Contig no. | T (3), C (1), P (2) | T (1), C (1), P (0) | T (83), C (81), P (2) | T (3), C (1), P (2) |
Contig N50 (bp) | 2,836,691 | 2,839 | 61,800 | 2,836,954 |
Acinetobacter baumannii (Aba): 1 chromosome, 2 plasmids |
Total length (bp) | 3,759,537 | 3,753,359 | 3,750,497 | 3,759,646 |
Contig no. | T (3), C (1), P (2) | T (3), C (1), P (2) | T (3), C (1), P (2) | T (3), C (1), P (2) |
Contig N50 (bp) | 3,728,865 | 3,732,302 | 3,728,867 | 3,728,890 |
Klebsiella pneumoniae (Kpn): 1 chromosome, 3 plasmids |
Total length (bp) | 5,605,261 | 5,653,853 | 5,654,499 | 5,640,460 |
Contig no. | T (61), C (54), P (7) | T (9), C (6), P (3) | T (4), C (1), P (3) | T (4), C (1), P (3) |
Contig N50 (bp) | 258,365 | 3,672,441 | 5,383,369 | 5,383,367 |
Pseudomonas aeruginosa (Pae): 1 chromosome |
Total length (bp) | 6,411,819 | 6,395,284 | 6,411,840 | 6,411,817 |
Contig no. | T (1), C (1) | T (1), C (1) | T (1), C (1) | T (1), C (1) |
Contig N50 (bp) | 6,411,819 | 6,395,284 | 6,411,840 | 6,411,817 |
Salmonella spp. group D (Sgd): 1 chromosome, 1 plasmid |
Total length (bp) | 5,000,588 | 5,041,401 | 5,000,585 | 5,000,584 |
Contig no. | T (2), C (1), P (1) | T (7), C (5), P (2) | T (2), C (1), P (1) | T (2), C (1), P (1) |
Contig N50 (bp) | 4,783,456 | 2,789,139 | 4,783,456 | 4,783,456 |
In terms of plasmid recovery, the number of plasmid contigs assembled for gram-positive bacteria depended on the extraction kit. Notably, Efa yielded the same number of plasmids as the reference genome (5 contigs) with either NB or FM, whereas RO (6 contigs) and ZM (11 contigs) yielded more plasmid contigs owing to fragmentation or duplication of plasmid contigs by the Flye assembler (Table 2, Table S6). For gram-negative bacteria, FM and RO recovered the expected number of plasmids; for Aba and Kpn, for example, both kits yielded 2 and 3 plasmid contigs, respectively, the same numbers obtained with NB. However, ZM and NB yielded more plasmid contigs for Kpn (7) and Sgd (2), respectively (Table 2, Table S6). All assembled genomes were taxonomically classified against the Genome Taxonomy Database (GTDB) using GTDB-Tk, with a 95% average nucleotide identity (ANI) cutoff for assigning genomes to the same species. All genomes showed ANI values > 97%, indicating high similarity to the described strains (Table S7).
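The plasmid comparison can be summarised as observed minus expected plasmid contigs for each plasmid-bearing strain, using the P counts and reference plasmid numbers in Table 2. Positive values point to fragmented or duplicated plasmid contigs and negative values to missing plasmids. This is an illustrative sketch with hypothetical names.

```python
# Reference plasmid numbers for plasmid-bearing strains (strain rows of Table 2).
REFERENCE_PLASMIDS = {"Efa": 5, "Sau": 2, "Aba": 2, "Kpn": 3, "Sgd": 1}

# Plasmid contigs (P) per kit, from Table 2 (no P entry is read as 0).
PLASMID_CONTIGS = {
    "ZM": {"Efa": 11, "Sau": 2, "Aba": 2, "Kpn": 7, "Sgd": 1},
    "NB": {"Efa": 5, "Sau": 0, "Aba": 2, "Kpn": 3, "Sgd": 2},
    "FM": {"Efa": 5, "Sau": 2, "Aba": 2, "Kpn": 3, "Sgd": 1},
    "RO": {"Efa": 6, "Sau": 2, "Aba": 2, "Kpn": 3, "Sgd": 1},
}


def plasmid_deviation(observed: dict[str, int], reference: dict[str, int]) -> dict[str, int]:
    """Observed minus expected plasmid contigs per strain (>0 excess, <0 missing)."""
    return {strain: observed[strain] - expected for strain, expected in reference.items()}


if __name__ == "__main__":
    for kit, counts in PLASMID_CONTIGS.items():
        print(kit, plasmid_deviation(counts, REFERENCE_PLASMIDS))
```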
Effect of long-read coverage on bacterial genome assembly statistics
Flye assemblies of reads subsampled to 20×, 30×, 50×, 80× and 100× coverage were evaluated in terms of raw read N50, number of contigs, genome completeness and contiguity. Contiguity was assessed with the contig N50, the length of the shortest contig among the largest contigs that together cover 50% of the assembly. Overall, reads from NB- and FM-extracted DNA were longer than those from ZM and RO, particularly for the Aba and Pae strains (NB, 12 and 15 kb; FM, 10 and 11 kb; ZM, 4.3 and 5.9 kb; RO, 6.4 and 7.5 kb, respectively; Fig. 3, Table S8). As read coverage approached 30–50×, the number of contigs decreased in all assembled genomes, and higher coverage improved genome completeness to more than 97%. The extent of contig reduction depended on the extraction kit: with FM and RO, for instance, the chromosomes of Ssu, Aba, Kpn and Pae were each assembled into a single contig even at 20× coverage (Figs. 2 and 3).
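This section does not state how reads were subsampled. A common approach, sketched here under that assumption, draws reads at random without replacement until their summed length reaches the target depth multiplied by the genome size. The function name and seed handling are hypothetical, and dedicated subsampling tools may select reads differently.

```python
import random


def subsample_to_coverage(read_lengths: list[int], genome_size: int,
                          coverage: float, seed: int = 0) -> list[int]:
    """Return indices of randomly chosen reads whose total length reaches
    coverage * genome_size (sampling without replacement, reproducible via seed)."""
    target = coverage * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)
    chosen, total = [], 0
    for idx in order:
        if total >= target:
            break
        chosen.append(idx)
        total += read_lengths[idx]
    if total < target:
        raise ValueError(
            f"only {total / genome_size:.1f}x available, {coverage}x requested")
    return chosen
```

For Ssu (assembled length about 2.07 Mb in Table 2), 20× corresponds to roughly 41.5 Mb of reads.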
For gram-positive bacteria, assembly contiguity improved substantially once read coverage exceeded 30×, except with ZM; for Ssu, for example, ZM-extracted DNA still produced 31 contigs, compared with a single contig each for DNA extracted with NB, FM and RO. Notably, NB increased the Ssu contig N50 from 1.43 Mb at 20× to 2.07 Mb at 30×, whereas FM and RO already reached a contig N50 of 2.07 Mb and 99–100% genome coverage at 20×. RO-extracted DNA yielded reads unsuitable for assembling the Sag genome (~25% genome completeness), and NB-extracted DNA gave 0% genome completeness for the Sau strain (Figs. 2b, 2d and Table S8). In contrast, increasing the read coverage improved genome completeness for the other gram-positive genomes across all extraction kits, to approximately 98–100%. However, even when coverage was increased to 80×, ZM gave unusually low contig N50 values for the Efa, Sag and Ssu genomes compared with the Sau strain (Fig. 2, Table S8).
For gram-negative bacteria, ZM and NB produced more contigs than the other extraction kits for Kpn and Sgd, respectively, even when read coverage was increased to 80× (43 and 3 contigs, respectively). In contrast, at 30× coverage FM and RO gave higher genome completeness (99.4–100%) and contig N50 values, and fewer contigs, than ZM and NB for all four gram-negative pathogenic bacteria. Unexpectedly, RO-extracted DNA yielded high-quality genomes for all gram-negative bacteria even at a read coverage as low as 20× (Fig. 3, Table S8).
Effect of long-read coverage on the number of recovered plasmids
Overall, a read coverage of at least 50× was sufficient for accurate plasmid recovery in most subsampled assemblies obtained with the four extraction kits, although NB failed to recover the Sau plasmids. The numbers of recovered plasmids were generally similar across extraction kits, with the exception of Efa and Kpn, for which ZM-extracted DNA yielded more plasmid contigs; this was not the case for Aba and Sgd. Interestingly, increasing read coverage reduced the number of reconstructed plasmid contigs, most notably for ZM-extracted Kpn, for which the number of plasmid contigs decreased from 11 at 30× to 5 at 80× coverage (Fig. 4, Table S9).