The Ion Torrent S5 and Illumina MiSeq offer alternative approaches for studying SARS-CoV-2 at the genomic level (30). This study compared the performance of the two WGS platforms and the data they generated. We hypothesized that the sequencing methodologies used for genomic surveillance of SARS-CoV-2 in a high-throughput laboratory setting would generate data that differed when analysed with the same analysis pipeline. Based on genomic coverage, clade assignments, and mutation counts, we concluded that the platforms had similar sequencing capabilities but produced different sequencing outcomes. Our findings indicate that the Ion Torrent S5 produced sequences with higher genomic coverage across a broader range of viral loads, and in a shorter time, than the Illumina MiSeq. These findings agree with previous comparison studies (25, 26, 30, 32).
Both the Ion Torrent S5 and the Illumina MiSeq followed a streamlined workflow, demonstrating the adaptability of each platform for WGS of SARS-CoV-2. For a set of 96 samples, the sequencing runtime was 36 hours on the Illumina MiSeq, compared with seven hours on the Ion Torrent S5 using two Ion 540 sequencing chips. This difference allows more samples to be sequenced on the Ion Torrent S5 than on the Illumina MiSeq within a 36-hour period. Although sequencing itself is much shorter on the Ion Torrent S5, the upstream processes, such as amplification and library preparation, take less time for the Illumina MiSeq: automated templating of the manually prepared libraries onto the Ion Torrent sequencing chip takes approximately 15.5 hours, whereas the amplification and tagmentation steps for the Illumina MiSeq take less than 10 hours. These findings confirm previous observations of similar processing times for each respective workflow (30). Full automation with the Ion Chef minimizes hands-on time and allows faster turnaround, but limits the number of samples that can be processed at once on the Ion Torrent S5 (eight libraries per 7.5 hours on the Ion Chef). The Ion Torrent's main selling point, full automation, thus becomes its major drawback in a high-throughput laboratory setting. However, when library preparation is performed manually, the Ion Torrent S5 is comparable to the Illumina MiSeq in handling larger sample numbers. The Illumina MiSeq workflow can also be automated with an external liquid handler to further reduce hands-on time and overall processing duration.
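The throughput argument above can be checked with simple arithmetic. The following Python sketch counts how many samples fit into a 36-hour window when only complete, back-to-back runs are counted; it deliberately ignores templating, amplification, library preparation, and operator shifts, and the helper name `samples_per_window` is hypothetical. The runtimes and batch sizes are those reported in the text.

```python
import math


def samples_per_window(window_h: float, run_h: float, samples_per_run: int) -> int:
    """Samples processed when only complete runs that fit in the window are counted.

    Hypothetical helper: assumes runs are started back to back with no downtime
    and ignores every upstream step (templating, amplification, library prep).
    """
    complete_runs = math.floor(window_h / run_h)
    return complete_runs * samples_per_run


WINDOW_H = 36.0  # the 36-hour comparison period used in the text

# Sequencing step only, 96 samples per run (values reported in the text).
miseq = samples_per_window(WINDOW_H, run_h=36.0, samples_per_run=96)
s5 = samples_per_window(WINDOW_H, run_h=7.0, samples_per_run=96)  # two Ion 540 chips

# Fully automated library preparation on the Ion Chef: 8 libraries per 7.5 hours.
ion_chef_libraries = samples_per_window(WINDOW_H, run_h=7.5, samples_per_run=8)

print(f"MiSeq: {miseq}, S5: {s5}, Ion Chef libraries: {ion_chef_libraries}")
```

Under these assumptions the script prints `MiSeq: 96, S5: 480, Ion Chef libraries: 32`, which illustrates why the fully automated Ion Chef route, rather than the sequencing run itself, becomes the bottleneck at high sample numbers.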
In addition, the reagents used in the upstream preparation steps for the Illumina MiSeq can be further optimized and validated to accommodate greater sample numbers with reduced reagent volumes by miniaturising the workflow. In contrast, full automation of the Ion Torrent S5 workflow with the Ion Chef restricts processing to small sample batches, which increases the overall turnaround time in a laboratory that operates only during the day shift. Incorporating manual library preparation for such platforms is therefore a feasible way to minimize turnaround times and increase the number of samples processed in a high-throughput laboratory setting, as illustrated in this study.
The same set of 183 remnant samples was sequenced on both platforms to limit variability between samples and to ensure that the data generated by the two platforms were comparable. The sequencing process directly affected sequencing outcomes in terms of genomic coverage and sequence quality metrics. Sequence quality was assessed against in-house quality control specifications established at KRISP for GISAID submissions, which required greater than 80% genomic coverage and fewer than 100 mutations. Although both the Ion Torrent S5 and the Illumina MiSeq are capable of producing complete SARS-CoV-2 genomes, sequences generated on the Ion Torrent S5 had an overall higher mean genomic coverage than those generated on the Illumina MiSeq. Various factors contribute to the genomic coverage obtained on each platform. Ct values are semi-quantitative measures that broadly indicate the concentration of viral RNA in a sample following qPCR testing, and an inverse correlation was observed between viral load and Ct values. Low Ct values are associated with high viral loads, which have been found to influence sample quality and, therefore, overall sequence quality (33, 34). Echoing previous findings, we observed an association between viral load and genomic coverage for all sequences generated. Samples with moderate to low viral loads sequenced on the Ion Torrent S5 still achieved good overall mean genomic coverage (> 60%), resulting in higher success rates and increased test eligibility. These findings suggest that Ion Torrent S5 sequencing is less affected by sample Ct values and could therefore be used to sequence samples from early stages of infection, when viral loads are lower. However, further investigation using a larger sample cohort across multiple laboratories may be required to confirm this.
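The KRISP acceptance criteria described above reduce to a simple rule. The sketch below is a minimal Python illustration, not the in-house KRISP implementation: it assumes genomic coverage is the percentage of the 29,903-nt Wuhan-Hu-1 reference (NC_045512.2) called as an unambiguous base, and it includes the common approximation that each Ct unit corresponds to a two-fold difference in template, which holds only at 100% PCR efficiency. All function names are hypothetical.

```python
REFERENCE_LENGTH = 29_903  # length (nt) of the Wuhan-Hu-1 reference, NC_045512.2
UNAMBIGUOUS = set("ACGT")


def genome_coverage(consensus: str, reference_length: int = REFERENCE_LENGTH) -> float:
    """Percentage of the reference length called as an unambiguous base (A, C, G or T).

    Assumption: coverage = unambiguous positions in a reference-aligned consensus
    divided by the reference length; assembly tools may define it differently.
    """
    called = sum(1 for base in consensus.upper() if base in UNAMBIGUOUS)
    return 100.0 * called / reference_length


def passes_qc(coverage_pct: float, n_mutations: int,
              min_coverage_pct: float = 80.0, max_mutations: int = 100) -> bool:
    """Thresholds from the text: > 80% genomic coverage and < 100 mutations."""
    return coverage_pct > min_coverage_pct and n_mutations < max_mutations


def fold_difference_in_viral_load(ct_low: float, ct_high: float) -> float:
    """Approximate fold difference in template between two Ct values.

    Assumes 100% PCR efficiency (template doubles every cycle), so each Ct unit
    corresponds to a two-fold difference; real assays deviate from this.
    """
    return 2.0 ** (ct_high - ct_low)
```

For example, a consensus with 26,000 called bases has about 86.9% coverage and would pass with 40 mutations, while `fold_difference_in_viral_load(20.0, 25.0)` returns 32.0, that is, a sample at Ct 20 carries roughly 32 times more template than one at Ct 25 under the efficiency assumption.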
In contrast, the Illumina MiSeq required samples with higher viral loads to achieve good genome coverage, as observed in other studies (20–22).
Additionally, the higher genomic coverage obtained on the Ion Torrent S5 may be attributed to the greater number of reads per sample obtained by using two Ion 540 chips for a sequencing run of 96 samples (26). As highlighted in Table 1, the AmpliSeq Research Panel on the Ion Torrent S5 produces more than twice as many reads as the Illumina MiSeq, while the fragments it sequences are half the size (200 bp versus 400 bp) (35). In essence, because each fragment is half as long, the Ion Torrent S5 must generate at least twice as many reads per sample to obtain coverage equal to or higher than that of the Illumina MiSeq. The greater number of reads obtained per sample may also have contributed to the higher coverage and more reliable clade assignments observed for sequences generated on the Ion Torrent S5.
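The read-count reasoning above follows from the number of bases sequenced per sample. If N reads of fragment length L are spread over a genome of length G, the idealised (Lander-Waterman) mean depth is N·L/G, so matching the Illumina MiSeq with fragments half as long requires N_S5 × 200 ≥ N_MiSeq × 400, that is, N_S5 ≥ 2 N_MiSeq. The Python sketch below encodes this under the simplifying assumptions that coverage is uniform and that each read (or read pair) spans one whole fragment; real amplicon data are less even. The function names are hypothetical.

```python
GENOME_LENGTH = 29_903  # SARS-CoV-2 reference length in nt (NC_045512.2)


def mean_depth(reads_per_sample: float, fragment_length_bp: float,
               genome_length: int = GENOME_LENGTH) -> float:
    """Idealised Lander-Waterman mean depth: total sequenced bases / genome length.

    Assumes every read covers one full fragment and coverage is uniform.
    """
    return reads_per_sample * fragment_length_bp / genome_length


def reads_needed_for_equal_depth(reads_ref: float, length_ref_bp: float,
                                 length_new_bp: float) -> float:
    """Reads a platform with fragments of length_new_bp needs to sequence as many
    bases as a reference platform producing reads_ref fragments of length_ref_bp."""
    return reads_ref * length_ref_bp / length_new_bp


# 200 bp Ion Torrent fragments versus 400 bp Illumina fragments (from the text):
# twice as many reads are needed for the same number of bases.
print(reads_needed_for_equal_depth(100_000, 400, 200))
```

The read count of 100,000 in the example is arbitrary; only the 2:1 ratio carries over from the text.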
Furthermore, the Ion Torrent S5 detected a significantly greater number of total mutations (insertions, substitutions, and deletions) than the Illumina MiSeq. Previous studies have reported that, unlike the Illumina MiSeq, semiconductor sequencing platforms such as the Ion Torrent S5 predominantly produce homopolymer-associated base-call errors in the form of insertions and deletions (indels) (36–38). Interestingly, these errors were often deletions rather than insertions, consistent with the findings of this study, and may have contributed to the larger number of total mutations in Ion Torrent S5 sequences. According to Marine et al. (2020), although such indels can be adjusted and corrected for in well-characterised viruses, this may not be the case when characterising novel viruses (26). This further underscores the need for in-depth quality control parameters during the analysis of such sequences.
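One way to apply the in-depth quality control mentioned above is to flag indels that fall within reference homopolymer runs, where Ion Torrent base-call errors concentrate. The following Python sketch is an illustrative check, not part of the Genome Detective or Nextclade analyses used in this study; indel positions are assumed to be 0-based reference coordinates of the affected base, and the default run-length threshold of 4 is a hypothetical value that should be set from validation data.

```python
from typing import Iterable, List


def homopolymer_run_length(seq: str, pos: int) -> int:
    """Length of the run of identical bases in seq that contains 0-based position pos."""
    base = seq[pos]
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    end = pos
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    return end - start + 1


def flag_homopolymer_indels(reference: str, indel_positions: Iterable[int],
                            min_run: int = 4) -> List[int]:
    """Return the indel positions that lie inside a reference homopolymer run.

    min_run is a hypothetical threshold; flagged indels are candidates for manual
    review, not confirmed sequencing errors.
    """
    return [pos for pos in indel_positions
            if homopolymer_run_length(reference, pos) >= min_run]


reference = "ACGTTTTTAC"  # toy sequence with a five-base T run
print(flag_homopolymer_indels(reference, [1, 5]))
```

Here only position 5 is flagged, because it sits inside the TTTTT run while position 1 is an isolated C.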
By using Genome Detective as the sole assembly method to generate consensus genomes for both platforms, we eliminated variability arising from different assembly methods. Additionally, Nextclade allows the user to assess differences in the quality of the consensus sequences generated, to assign clades accordingly, and to establish similarity in the evolutionary changes identified in sequences from each platform (39). In contrast to the higher genomic coverage obtained for sequences generated on the Ion Torrent S5, Nextclade classified the majority of these sequences as lower quality than those generated on the Illumina MiSeq (data not shown). This may be attributed to sequencing errors or miscalled bases generated by Ion Torrent systems, as previously observed (7, 26, 38, 40). Our findings also indicate that sequences from both the Ion Torrent S5 and the Illumina MiSeq readily differentiate between the Beta and Delta VOCs based on mutation calling and the respective clade assignments. Nextclade assigned the same clade on both platforms to 147/183 (80.3%) samples collected during the early phase of Delta replacing Beta in the Eastern Cape, South Africa. Clade assignments differed for 17/183 (9.3%) samples successfully sequenced on both platforms, and failure rates were low, at 4.4% on the Ion Torrent S5 and 6.0% on the Illumina MiSeq. Expanding clade classification and sequencing to a larger cohort of known VOCs on both platforms would help to assess the reliability and accuracy of clade assignment by Nextclade. The Pangolin lineage assignment tool offers an alternative for lineage classification; however, it was not included in this study (41).
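The concordance figures above can be reproduced from per-sample clade calls. The following Python sketch tallies agreement between two platforms, treating a missing call (None) as a sequencing or QC failure on that platform; the function names and the example clade labels are hypothetical, and Nextclade output would first need to be parsed into two lists in the same sample order.

```python
from collections import Counter
from typing import Optional, Sequence


def clade_concordance(calls_a: Sequence[Optional[str]],
                      calls_b: Sequence[Optional[str]]) -> Counter:
    """Tally per-sample clade agreement between two platforms.

    Each sequence holds one clade call per sample, in the same sample order;
    None marks a sample that failed sequencing or QC on that platform.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("both platforms must report the same samples")
    tally = Counter()
    for a, b in zip(calls_a, calls_b):
        if a is None and b is None:
            tally["failed_both"] += 1
        elif a is None:
            tally["failed_a"] += 1
        elif b is None:
            tally["failed_b"] += 1
        elif a == b:
            tally["concordant"] += 1
        else:
            tally["discordant"] += 1
    return tally


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Toy example with hypothetical calls for four samples (platform A, platform B).
print(clade_concordance(["21J", "20H", None, "21J"], ["21J", "21J", "20H", None]))
```

Applied to the totals reported above, `percent(147, 183)` returns 80.3 and `percent(17, 183)` returns 9.3.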
A major limitation and consideration for genomic surveillance laboratories is the use of up-to-date primer sets. Unlike the ARTIC V3 primers used with the Illumina MiSeq, which were found to be problematic for novel variants such as Delta, the Ion Torrent primers (AmpliSeq primers) covered 99% of the SARS-CoV-2 genome, including all serotypes, contributing to higher genomic and S-gene coverage (42–44). Because SARS-CoV-2 evolves continuously, mutations accumulated in regions targeted by some ARTIC V3 primers, and amplicons located in regions carrying key Delta mutations were poorly covered (43). Since the initial primers were designed based on the SARS-CoV-2 reference genome sequence, difficulty in identifying large structural variants was expected. As a result, systematic limitations were observed in the presence of high levels of genomic variation. In addition, several S-gene target failures (SGTF) were observed during diagnostic qPCR testing for COVID-19 (45–47). Decreased coverage of specific regions of the SARS-CoV-2 genome has also been observed with the emergence of novel variants since the beginning of the pandemic (46). Low-coverage sequences generated on the Illumina MiSeq may have contributed to the discrepancies in clade assignment during the initial surge of the Delta variant. It was previously reported that the G142D amino acid substitution was substantially underrepresented among early Delta genomes (43). Furthermore, Kuchinski et al. (2022) reported that mutations emerging in novel variants disrupted genomic sequencing of SARS-CoV-2 (44). As the ARTIC primer set is one of the most widely used for SARS-CoV-2 sequencing, the V3 primers were updated to address the amplicon drop-off observed with the Delta VOC, and version 4 (V4) was released in June 2021.
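Amplicon drop-off of the kind described above typically appears as a stretch of near-zero depth spanning a single amplicon. The Python sketch below flags amplicons whose mean depth falls below a threshold; it assumes per-position depth is already available (for example, parsed from `samtools depth -a` output) and that amplicon coordinates are taken from the primer scheme's BED file. The function name, the toy coordinates, and the default threshold of 20 are hypothetical, and overlapping regions of tiled amplicons are not separated.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Amplicon = Tuple[str, int, int]  # (name, start, end): 0-based, end-exclusive


def find_amplicon_dropouts(depth: Sequence[int], amplicons: Sequence[Amplicon],
                           min_mean_depth: float = 20.0) -> List[str]:
    """Names of amplicons whose mean read depth falls below min_mean_depth.

    depth[i] is the read depth at 0-based reference position i; an amplicon
    with no depth values in range (e.g. beyond the array) is also flagged.
    """
    flagged = []
    for name, start, end in amplicons:
        window = depth[start:end]
        if len(window) == 0 or mean(window) < min_mean_depth:
            flagged.append(name)
    return flagged


# Toy example: the second amplicon has almost no reads.
toy_depth = [100] * 50 + [3] * 50
toy_amplicons = [("amp1", 0, 50), ("amp2", 50, 100)]
print(find_amplicon_dropouts(toy_depth, toy_amplicons))
```

Recurrent flags for the same amplicon across many samples of one lineage would point to a primer-binding problem rather than to sample quality.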
Unfortunately, V4 primers were not used in this study because they had not been procured. Lambisia et al. (2022) subsequently assessed the impact of the updated ARTIC V4 primers on genome recovery using Oxford Nanopore Technologies (ONT) sequencing and reported a substantial improvement in the recovery of Delta, among other variants (48). The Ion Torrent sequencing panel was also updated to address amplicon drop-off in novel variants. The updated panel, the Ion AmpliSeq SARS-CoV-2 Insight Research Assay, was designed to improve the coverage and uniformity of the previous Ion AmpliSeq SARS-CoV-2 Research Panel used in this study. Continuous improvement of primers, irrespective of kit specifications, is therefore an essential requirement of an effective genomic surveillance regime.
Plitnick et al. (2021) directly compared the performance of the SARS-CoV-2 AmpliSeq Research Panel with results obtained using the Illumina MiSeq-based ARTIC Nextflow analysis pipeline (30). Bioinformatic analysis of data from such studies showed that both methods produced similar levels of coverage (> 98%) across a broad range of viral loads (Ct values of 15.56 to 32.54; median, 22.18) and that both approaches sequenced SARS-CoV-2 effectively (30, 32). Although our bioinformatic analysis pipelines differ from those used by Plitnick et al. (2021), our findings are similar to theirs. Because assembly software affects the overall genomic coverage of sequences obtained from different platforms, standardising the analysis allows data from different NGS technologies to be compared without bias from independent assembly and analysis tools. Quality control processes are also needed to improve the overall quality of sequences made publicly available, as recommended by Jacot et al. (2021) for diagnostic purposes (49); in some instances, such processes have included the removal of frameshifts and unexpected stop codons. In this study, sequences generated on the Illumina MiSeq were simpler to process, and quality control was easily implemented across them, yielding sequences of better quality according to Nextclade analysis. Although sequencing capabilities were similar on both platforms, sequences generated on the Ion Torrent S5 had higher overall genomic coverage; however, Nextclade classified the majority of these sequences as lower quality. Standardising assembly and analysis software therefore allows for improved comparison of data generated on different platforms and of results obtained with different analysis software.
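The frameshift and stop-codon checks mentioned above can be expressed compactly. The following Python sketch inspects a single coding sequence extracted from a consensus genome; it assumes the sequence begins at the annotated start codon and ends at the expected stop codon, uses the standard genetic code, and does not flag codons containing N. It is an illustration only, not the procedure recommended by Jacot et al. or implemented in Nextclade, and the function name is hypothetical.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def cds_qc(cds: str) -> List[str]:
    """Flag a possible frameshift and premature stop codons in one coding sequence.

    Assumes cds starts at the annotated start codon and ends at the expected stop.
    Codons containing ambiguous bases (e.g. N) never match a stop codon.
    """
    cds = cds.upper()
    issues = []
    if len(cds) % 3 != 0:
        issues.append("length not a multiple of 3 (possible frameshift)")
    n_codons = len(cds) // 3
    # Check every complete codon except the last, which should be the genuine stop.
    for i in range(n_codons - 1):
        codon = cds[3 * i:3 * i + 3]
        if codon in STOP_CODONS:
            issues.append(f"premature stop codon at codon {i + 1}")
    return issues


print(cds_qc("ATGTAAAAATAA"))
```

In the toy sequence the TAA at codon 2 is reported, while the terminal TAA is accepted as the expected stop.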
Nonetheless, this study complements previous research investigating the efficacy of Ion Torrent and Illumina platforms for sequencing viral pathogens (23, 24, 50).
This study also provides fundamental insight into the requirements and challenges of the different methodologies for their intended purpose of genomic surveillance in a high-throughput research laboratory setting. As a precaution, potential users of publicly available sequences should take into account the sequencing technologies used, the assembly and analysis pipelines implemented, and the quality of sequences generated from samples of differing integrity, particularly when comparing or combining data generated on different platforms. It is also advisable to keep the primers used in the early stages of the sequencing workflow up to date in a genomic surveillance laboratory setting.