Comparing SARS-CoV-2 sequencing methodologies during early phase detection of the Delta variant in South Africa

doi:10.21203/rs.3.rs-2310293/v1

Research Article

This work is licensed under a CC BY 4.0 License

Journal Publication

published 29 May, 2024

Read the published version in OBM Genetics →

Background: Genomic surveillance, with the aid of next-generation sequencing (NGS) technologies, revolutionized the response to the SARS-CoV-2 pandemic. Coupled with high-performance analysis software, platforms such as the Ion Torrent S5 and Illumina MiSeq dramatically improved the genomic surveillance capacity within South Africa during the height of the pandemic. Using de-identified remnant samples collected from the Eastern Cape and the analysis software Genome Detective and NextClade, we compared the sequencing process, genomic coverage, quantification of mutations, and clade classification from sequence data generated by these two common “benchtop” NGS platforms.

Results: Sequence data analysis revealed success rates of 175/183 (96%) and 172/183 (94%) on the Ion Torrent S5 and Illumina MiSeq, respectively. Internal quality metrics were assessed in terms of genomic coverage (>80%) and the number of mutations identified (<100). A greater proportion of sequences with >80% genomic coverage was generated on the Ion Torrent S5 (99%) than on the Illumina MiSeq (80%), and all sequences from both platforms had <100 mutations. The Ion Torrent S5 generated high-coverage sequences from samples with a broader range of viral loads (VL) than the Illumina MiSeq, which was less successful in sequencing samples with lower viral loads. Clade assignments were comparable across platforms, both of which accurately differentiated between Beta (<45%) and Delta (≤30%) VOCs. A disparity in clade assignment was observed in <10% of sequences due to poor coverage obtained on the Illumina MiSeq, and the failure rate was ≤6% across the two platforms. With manual library preparation, both methods were similar in terms of sample processing, handling of larger sample quantities, and clade assignment for SARS-CoV-2. Variability between the Ion Torrent S5 and Illumina MiSeq was observed in sequencing run duration (3,5 hrs per chip vs 36 hrs), sequencing process (semi-automation vs manual), proportion of sequences with >80% genomic coverage (99% vs 80%), and viral load requirements (broad range vs high VL).

Conclusion: The Illumina MiSeq and Ion Torrent S5 are both reliable platforms capable of performing WGS with the use of amplicons and providing specific, accurate, and high throughput analysis of the SARS-CoV-2 whole viral genomes. Both sequencing platforms are feasible platforms for the genomic surveillance of SARS-CoV-2, each with its specific advantages and trade-offs.

Next-generation sequencing (NGS)

SARS-CoV-2

Illumina MiSeq

Ion Torrent S5

Genomic surveillance

viral load

bioinformatics analysis

At the end of 2019, incidence rates of SARS-CoV-2 increased substantially within a short period, followed by rapid global spread and evolution of the virus, resulting in five novel variants of concern (VOCs), each driving new infection waves (1, 2). Globally, more than 626 million confirmed cases of COVID-19 and over 6,5 million deaths have been reported to date (3, 4).

Pathogen genome sequencing is a fundamental surveillance tool used to support the understanding of the molecular epidemiology of disease outbreaks (5). Recent advances in sequencing technologies have shown their applicability for research use in outbreak situations, such as those observed with Ebola, Zika virus, SARS, MERS, and most recently, the novel SARS-CoV-2 (6–11). Tracking signature genetic mutations of the virus allows researchers to estimate the influence of early outbreaks and ensures more accurate detection and characterisation of variants, possible drug resistance mutations, vaccine escape variants, virulence, and pathogenicity factors (12–16). Therefore, genomic surveillance, complemented by real-time monitoring and data sharing networks, is a valuable tool to improve the understanding of SARS-CoV-2 transmission and epidemic dynamics. Sequencing centres across South Africa initiated genomic surveillance programs for the whole-genome sequencing (WGS) of SARS-CoV-2 as part of the Network for Genomic Surveillance in South Africa (NGS-SA) (17, 18). Sequencing technologies used included Illumina, Ion Torrent and Oxford Nanopore technologies. Together, the network kept abreast of the latest SARS-CoV-2 variants circulating within South Africa’s infection waves and produced over 46,000 genomes, which have been made publicly available since May 2020 (19–22).

Although there are distinct differences between the two systems, the Ion Torrent S5 and Illumina MiSeq platforms are well known for their mid- to high-throughput sequencing, variant calling, and overall good-quality short-read sequences (23–26). Illumina, a benchmark in sequencing technology, uses a fluorescence-based approach for determining nucleotide sequence in which all of the enzymatic processes and imaging steps take place in a flow cell. Ion Torrent, an alternative sequencing technology, reads nucleotide sequences by measuring the pH change caused by proton release and makes use of a semiconductor sequencing chip and ion spheres bound to DNA (25, 27, 28). Additionally, there are differences in the type of data generated by each platform. The sequence reads generated on Illumina in a single run have the same length and are paired-end, whereas Ion Torrent reads vary in length and are single-ended (23, 29). A comparative study involving SARS-CoV-2 further highlights the ease of use and operation of the automated Ion Torrent S5 workflow when coupled with the Ion Chef (30). The Ion Chef automates library preparation for small sample numbers and is an essential component in the templating of prepared libraries onto the Ion Torrent sequencing chip. Costs are similar across these platforms provided that a high level of sample multiplexing is maintained on the Ion Torrent S5 (26). Operating the Illumina MiSeq at maximum capacity ensures an overall low cost due to the high efficiency of the platform (26, 30).

While SARS-CoV-2 has been broadly studied over the past two years, there are still concerns about emerging variants and mutations; therefore, continued surveillance of SARS-CoV-2 variants remains critical for the identification of new emerging variants. For this reason, it is necessary to ensure that sequencing data generated by various WGS platforms is comparable in terms of performance and sequencing output. A recent study performed a benchmarking comparison on several different SARS-CoV-2 genome-sequencing protocols and reported performance variation across WGS technologies (31). Another study assessed the WGS of SARS-CoV-2 using both the Ion Torrent and Illumina technologies with their respective protocols in considerable detail (30). Although the study’s findings demonstrate that genomic coverage was high with faster turnaround times on the Ion Torrent platform, it is challenging to compare the data generated without using the same analysis pipeline to avoid discrepancies in assembly and quality control processes.

Therefore, in this comparative study, we used remnant SARS-CoV-2 positive samples collected from the Eastern Cape for a direct comparison of data generated by the Ion Torrent S5 and Illumina MiSeq platforms. The same analysis pipeline was applied to data from both platforms to assess whether genomic coverage, quantification of mutations (deletions, insertions, and substitutions), and clade assignment were comparable across the two platforms.

Process Flow

The Illumina MiSeq and Ion Torrent S5 sequencing platforms followed specific workflows for WGS of SARS-CoV-2 samples. Figure 1 provides a detailed overview of the processes involved from sample receipt to data analysis. After nucleic acid extraction, the Illumina MiSeq and Ion Torrent S5 workflows diverge in sample processing.

Sequencer-specific attributes

Samples from the independent runs were subjected to WGS on the Illumina MiSeq and Ion Torrent S5, and the resulting sequencing data are listed in Tables S3 and S4, respectively. These data were obtained from each platform upon run completion and were tabulated separately because the reported parameters are not directly comparable. Each platform was independently assessed based on kit- and platform-specific outputs, as listed in Table 1. The platforms are similar in their use of 2-pool primer sets, capacity to process a broad range of sample quantities, semi-automation of the process, high sequence success rates, and genomic coverage. They differ in platform-specific kit requirements and primer sets, in read length and fragment size, and in sequencing duration per run.

Table 1

Comparison of (a) Illumina MiSeq and (b) Ion Torrent S5 sequencing methodologies used for WGS of SARS-CoV-2.
Attributes	Illumina MiSeq	Ion Torrent S5 and Chef
Primer details	Artic Primers (version 3) – 2 pools	AmpliSeq Primers – 2 pools
Sequencer performance and kit details	Nextera Flex V2 500 cycle kit V2; 24–30 million reads; Throughput: 12 Gb; Sequencing run time: ~36 hours	AmpliSeq Research Panel using the Ion 540 Chip; 60–80 million reads; Throughput: 10–15 Gb; Sequencing run time: ~3,5 hours per chip
Sample quantity	Processing of low to high sample numbers	Processing of low to high sample numbers
Sample processing (automated vs manual)	Can be automated by adding an automated liquid handler to reduce hands-on time and error rates; Automation allows for handling of large sample numbers; High throughput with manual library preparation and indexing of 96 libraries at a time; Increased hands-on time and increased error rates	Automation of library preparation on the Ion Chef allows for small sample number processing (8 samples per 7,5 hours); Automation limits hands-on time and error rates; Higher throughput is achievable by preparing libraries manually with the use of more IonCode Barcode Adaptors; Manual library preparation increases hands-on time and error rates
Read lengths	Assembled paired-end reads; 400 bp fragments	Single-end reads; 125–275 bp fragments

Sequencing Platform Performance Comparison

Run A and Run B, constituting 183 libraries in total, were sequenced on both platforms, generating 86 consensus sequences in each of the two Illumina MiSeq runs and 92 and 83 consensus sequences in the two Ion Torrent S5 runs (Table 2). The sequence success rate was 93,9% on the Illumina MiSeq and 95,6% on the Ion Torrent S5. The sequencing runtime per run on the Illumina MiSeq was 36 hours, whereas the Ion Torrent S5 platform had a total sequencing runtime of seven hours per 96 samples (3,5 hours per Ion 540 chip). Of the 183 samples sequenced, 164 (89,6%) had paired consensus sequences available from both platforms.

Table 2

Summary of comparable consensus sequence data generated from same sample set on Illumina MiSeq and Ion Torrent S5 Platforms
Properties	Illumina MiSeq	Ion Torrent S5 and Chef
Sequencing Run Time per run (hrs)	36	7 (3,5 hrs per 540 chip)
Total number of samples processed per platform	183
Sequence Success Rate (n/N (%))	172/183 (93,9%)	175/183 (95,6%)
Sequence Failure Rates (n/N (%))	11/183 (6,0%)	8/183 (4,4%)
Greater than 80% genomic coverage (n/N (%))	138/172 (80,2%)	174/175 (99,4%)
Sequences with less than 100 mutations (%)	100%	100%
Mean Genomic Coverage (%)	83,4%	98,9%
Mean S-gene coverage (%)	78,7%	99,0%
Total mutations per platform	7036	8116
Total Substitutions per platform	4671	5039
Total Insertions per platform	17	46
Total Deletions per platform	2348	3031
Paired consensus genomes for both platforms (n/N (%))	164/183 (89,6%)
Matched clade assignments between platforms (n/N (%))	147/183 (80,3%)
Mismatched clade assignments between platforms (n/N (%))	17/183 (9,3%)
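
The counts in Table 2 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the study's pipeline: the counts are transcribed from Table 2, while the helper `pct` and the concordance computed against paired genomes only are additions made here for illustration.

```python
# Counts transcribed from Table 2 (decimal commas in the text become points here).
TOTAL_SAMPLES = 183
table2 = {
    "Illumina MiSeq": {"success": 172, "failure": 11},
    "Ion Torrent S5": {"success": 175, "failure": 8},
}
PAIRED, MATCHED, MISMATCHED = 164, 147, 17


def pct(numerator, denominator):
    """Return numerator/denominator as a percentage (hypothetical helper)."""
    return 100.0 * numerator / denominator


# Every sample is either a success or a failure on each platform.
for platform, counts in table2.items():
    assert counts["success"] + counts["failure"] == TOTAL_SAMPLES, platform

# Paired genomes split into matched and mismatched clade calls.
assert MATCHED + MISMATCHED == PAIRED

# Concordance against both denominators.
concordance_all = pct(MATCHED, TOTAL_SAMPLES)  # denominator used in Table 2
concordance_paired = pct(MATCHED, PAIRED)      # derived here, not reported in the source

print(f"matched / all samples:    {concordance_all:.1f}%")
print(f"matched / paired genomes: {concordance_paired:.1f}%")
```

Expressing concordance against paired genomes only is one possible design choice; it separates clade disagreement from sequencing failure, which Table 2 reports together against all 183 samples.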

Genome Sequence Quality Metrics

The consensus sequences generated on Genome Detective were evaluated to determine genome coverage and the number of assigned mutations. In total, 99,4% of the genomes generated on the Ion Torrent S5 passed the genomic coverage quality metric (>80%) for GISAID submissions based on KRISP’s internal specification, compared to 80,2% on the Illumina MiSeq (Table 2). All consensus sequences generated on both platforms had fewer than 100 mutations each.
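
The two quality criteria described above (more than 80% genomic coverage and fewer than 100 mutations) amount to a simple filter. The sketch below assumes a hypothetical record type `Consensus` and made-up example records; it is not the laboratory's actual implementation.

```python
from dataclasses import dataclass

MIN_COVERAGE_PCT = 80.0  # a genome must exceed this genomic coverage
MAX_MUTATIONS = 100      # a genome must have fewer mutations than this


@dataclass
class Consensus:
    """Hypothetical container for one consensus genome's QC inputs."""
    sample_id: str
    coverage_pct: float
    n_mutations: int


def passes_qc(seq: Consensus) -> bool:
    """Apply both thresholds; a genome must satisfy each to pass."""
    return seq.coverage_pct > MIN_COVERAGE_PCT and seq.n_mutations < MAX_MUTATIONS


# Made-up records for illustration only (not study data).
examples = [
    Consensus("sample_A", 99.8, 35),   # passes both criteria
    Consensus("sample_B", 78.1, 21),   # fails on coverage
    Consensus("sample_C", 95.0, 120),  # fails on mutation count
]
passed = [s.sample_id for s in examples if passes_qc(s)]
print(passed)
```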

Genomic and S-gene coverage of SARS-CoV-2

Whole-genome assemblies generated on the Ion Torrent S5 showed a generally higher mean genomic coverage compared to those generated on the Illumina MiSeq (Fig. 2). Mean genomic coverages of 99,0% and 83,7% were observed on the Ion Torrent S5 and Illumina MiSeq, respectively. A highly statistically significant difference (Wilcoxon, p < 0.0001) was observed in genomic coverage between the two platforms. Genomes generated on the Ion Torrent S5 ranged from 63,0% to 100,0% coverage, whereas genomes generated on the Illumina MiSeq ranged from 1,9% to 99,5% coverage. Furthermore, the S-gene coverage ranged from 64,2% to 100,0% on the Ion Torrent S5 and from 0,9% to 100,0% on the Illumina MiSeq, with average S-gene coverages of 99,0% and 78,7%, respectively. A highly significant difference (Wilcoxon, p < 0.0001) in S-gene coverage was observed between the platforms (Fig. 3). Overall, the genomic and S-gene coverages were consistently higher on the Ion Torrent S5 platform than on the Illumina MiSeq (Figs. 2 and 3).

Effect of Viral load on genomic and S-gene coverage of SARS-CoV-2

A Spearman’s rank correlation test was performed to determine the effect of increasing viral load on genomic and S-gene coverage for sequences generated on each sequencing platform (Figs. 4 and 5). Of the 183 samples sequenced on each platform, 180 had corresponding Ct scores available. Estimated viral loads were assigned qualitatively from the mean Ct values provided with the available metadata and grouped as per Table 3. Sequences generated from samples with mean Ct values ≥ 15 and ≤ 25 (high viral loads) accounted for 28,4% of the total consensus genomes, followed by 29,5% with mean Ct ≥ 26 and ≤ 30 (moderate viral loads) and 40,4% with mean Ct values > 30 (low viral load). The correlation was highly significant for sequences generated on both the Illumina MiSeq (p < 0.001) and the Ion Torrent S5 (p < 0.001), indicating that estimated viral load directly influences genomic coverage. Mean genomic coverages for each estimated viral load group are tabulated in Table 3. Although consensus sequences generated on the Ion Torrent S5 had higher overall mean genomic coverages than those generated on the Illumina MiSeq, genomic coverage declined with decreasing viral load on both platforms. A similar trend was also observed for S-gene coverage in association with viral load. In addition, several samples sequenced on the Illumina MiSeq had no coverage in the S-gene region (Table S5). These sequences came from samples with very low viral load and little template material, and therefore had low genomic coverage.
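
To make the correlation test above concrete, the following is a minimal pure-Python sketch of Spearman's rank correlation coefficient, computed as the Pearson correlation of average ranks. The function names and the toy Ct and coverage values are hypothetical and are not taken from the study data; the sketch returns only the coefficient, not a p-value, for which a statistics library would normally be used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only: coverage falls as Ct rises.
ct_values = [18.0, 22.5, 27.0, 31.0, 35.0]
coverage_pct = [99.9, 99.5, 98.0, 90.0, 60.0]
rho = spearman_rho(ct_values, coverage_pct)
print(f"Spearman rho = {rho:.2f}")
```

Because higher Ct means lower viral load, a negative rho between Ct and coverage corresponds to coverage increasing with viral load.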

Table 3

Mean Ct score range for Viral load estimation
Ct score (mean)	Sample No. (n (%))	Estimated Viral Load (Qualitative)	Mean Genomic Coverage (Ion Torrent S5 / Illumina MiSeq)
15 ≤ Ct ≤ 25	52 (28,4%)	High	99,9 / 93,7
26 ≤ Ct ≤ 30	54 (29,5%)	Moderate	99,7 / 93,0
> 30	74 (40,4%)	Low	97,7 / 71,4
No Ct score available	3 (1,6%)	Unknown	99,3 / 39,0
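
The qualitative viral load grouping can be written as a small lookup. The sketch below is an interpretation rather than the authors' stated rule: it assumes mean Ct values below 26 are "High", 26 to 30 are "Moderate", and above 30 are "Low", which agrees with the ranges given in the text and with the labels in Table 5 (for example, 25,7 is labelled High there). The function name is hypothetical, and values below 15 are treated as "High" by assumption.

```python
def viral_load_category(mean_ct):
    """Map a mean Ct value to a qualitative viral load label.

    Boundaries are an assumption inferred from the text and Table 5:
    Ct < 26 -> High, 26 <= Ct <= 30 -> Moderate, Ct > 30 -> Low.
    """
    if mean_ct is None:
        return "Unknown"  # no Ct score available
    if mean_ct < 26:
        return "High"
    if mean_ct <= 30:
        return "Moderate"
    return "Low"


# Mean Ct values and labels transcribed from Table 5
# (decimal commas written as points).
table5_examples = [
    (26.8, "Moderate"),
    (22.4, "High"),
    (33.0, "Low"),
    (28.2, "Moderate"),
    (25.7, "High"),
    (30.8, "Low"),
    (15.3, "High"),
]
mismatches = [(ct, label) for ct, label in table5_examples
              if viral_load_category(ct) != label]
print(mismatches)
```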

SARS-CoV-2 Clade assignments

The consensus sequences were uploaded onto NextClade, and the clade assignments were determined and compared between sequences generated on the two platforms. Clade assignments obtained from sequences run on the Ion Torrent S5 and Illumina MiSeq were comparable across the majority of the clades identified. As shown in Table 4, Beta and Delta variants were identified in 41,5% and 30,0% of samples sequenced on the Ion Torrent S5 and in 44,8% and 29,5% of samples sequenced on the Illumina MiSeq, respectively. Samples that were unsuccessfully sequenced (eight on the Ion Torrent S5 and 11 on the Illumina MiSeq) were not assigned a clade and are detailed in Table 5. Of the 183 samples compared, 17 were classified as different clades between the NGS platforms (Table 6). For each of these, Table 6 lists the mismatched clades assigned per NGS platform, the genomic and S-gene coverages obtained, and the respective amino acid mutations identified. It is evident from these data that genomic coverage influenced the clade classification of the variants sequenced by each platform. Table 6 illustrates that the high coverage obtained for sequences on the Ion Torrent S5 contributed to more reliable clade assignment than was obtained from the Illumina MiSeq data. Key mutations for each of the mismatched clade assignments were also examined, with a focus on the spike region. We observed that not all key mutations were identified for each respective clade assigned. We also observed inconsistency in the mutations identified between the platforms; that is, a different set of mutations was called for the same sample on each platform. This discrepancy may be attributed to several sequencing factors such as the primers used, the number of reads obtained, and platform-specific processing.

Table 4

Clade assignment summary between sequencing platforms
Clade	Ion Torrent S5 (n (%))	Illumina MiSeq (n (%))
19A	14 (7,7%)	7 (3,8%)
19B	-	2 (1,1%)
20A	8 (4,4%)	4 (2,2%)
20B	13 (7,1%)	13 (7,1%)
20C	7 (3,8%)	8 (4,4%)
20H (Beta,V2)	76 (41,5%)	82 (44,8%)
20I (Alpha,V1)	2 (1,1%)	2 (1,1%)
21A (Delta)	1 (0,5%)	1 (0,5%)
21J (Delta)	54 (29,5%)	53 (29,0%)
Blanks	8 (4,4%)	11 (6,0%)

Table 5

Samples unsuccessfully sequenced on the Ion Torrent S5 and Illumina MiSeq platforms
		Ion Torrent S5		Illumina MiSeq
Accession Identifiers	Mean Ct (VL)	Genomic Coverage (%)	Clade Assignment	Genomic Coverage (%)	Clade Assignment
EPI_ISL_3275349	26,8 (Moderate)	99,8	20H (Beta, V2)	-	-
EPI_ISL_3275351	22,4 (High)	99,8	20B	-	-
EPI_ISL_3275352	24,4 (High)	99,8	21J (Delta)	-	-
EPI_ISL_3275353	33,0 (Low)	99,8	21J (Delta)	-	-
EPI_ISL_3275354	28,2 (Moderate)	99,7	21J (Delta)	-	-
-	33,1 (Low)	99,7	21A (Delta)	-	-
EPI_ISL_3275374	35,7 (Low)	100,0	20H (Beta, V2)	-	-
EPI_ISL_3275376	34,1 (Low)	99,9	20C	-	-
EPI_ISL_3275378	19,2 (High)	100,0	20C	-	-
EPI_ISL_2727188	18,1 (High)	-	-	91,7	20A
EPI_ISL_3275379	33,1 (Low)	100,0	20H (Beta, V2)	-	-
EPI_ISL_2727191	33,7 (Low)	97,1	19A	-	-
EPI_ISL_2727197	22,6 (High)	-	-	95,8	20H (Beta, V2)
EPI_ISL_2727205	35,8 (Low)	-	-	95,4	20H (Beta, V2)
EPI_ISL_2727214	32,3 (Low)	-	-	86,4	20H (Beta, V2)
EPI_ISL_2727215	25,7 (High)	-	-	94,9	20H (Beta, V2)
EPI_ISL_2727222	30,8 (Low)	-	-	95,0	20H (Beta, V2)
-	36,5 (Low)	-	-	14,2	20H (Beta, V2)
EPI_ISL_2727233	15,3 (High)	-	-	97,3	21J (Delta)
Unsuccessful sequences		8		11
VL = Viral Load

Table 6

Clade calling discrepancy between the Illumina MiSeq and Ion Torrent S5 platforms with their corresponding mutations identified (mutations in bold signify key S-gene mutations specific to the VOC identified)
Sample ID	Run no.	Illumina MiSeq Genomic Coverage (%)	Illumina MiSeq S-gene Coverage (%)	Ion Torrent S5 Genomic Coverage (%)	Ion Torrent S5 S-gene Coverage (%)	Clade (Illumina MiSeq)	Clade (Ion Torrent S5)	Key Mutations (Illumina MiSeq)	Key Mutations (Ion Torrent S5)
K016691	A	18,5	9,7	99,8	100,0	19A	20H (Beta, V2)	ORF7b:T40I	E:P71L, N:P13S, N:T205I, ORF1a:T265I, ORF1a:K1655N, ORF1a:K3353R, ORF1b:P314L, ORF1b:A1057S, ORF3a:Q57H, ORF3a:S171L, S:L18F, S:D80A, S:D138Y, S:D215G, S:R246G, S:K417N, S:E484K, S:N501Y, S:D614G, S:A701V
K016709	A	78,1	71,7	99,7	100,0	20H (Beta, V2)	20A	E:P71L, N:T205I, ORF1a:T265I, ORF1a:G507R, ORF1a:P2046L, ORF1a:N2596S, ORF1a:M2796C, ORF1a:K3353R, ORF1b:P314L, ORF1b:I1074V, ORF3a:S171L, S:L18F, S:D80A, S:D215G, S:K417N, S:E484K, S:N501Y, S:D614G, S:A701V, S:S940F, S:A1087S	ORF1b:P314L, S:D80A, S:N501Y, S:D614G
K016715	A	11,0	4,8	99,7	100,0	20C	20A	ORF1a:T265I, ORF1a:K3353R, ORF1a:D4335Y, ORF1b:P314L	S:N501Y, S:D614G
K016754	A	36,3	46,2	99,0	97,7	21J (Delta)	19A	M:I82T, N:D63G, ORF1b:E513, ORF7b:T40I, ORF9b:T60A, S:T19R, S:L452R, S:T478K*, S:D614G, S:D950N	M:I82T, N:D63G, N:D377Y, ORF1a:T3255I, ORF1b:G662S, ORF1b:A1918V, ORF7a:V82A, ORF7b:T40I, ORF9b:T60A, S:L452R, S:D614G, S:P681R
K016760	A	28,2	10,8	99,7	100,0	19A	20A	ORF1a:K3353R	M:I82T, N:D63G, N:G215C, ORF1a:A1306S, ORF1a:P2046L, ORF1a:P2287S, ORF1a:T3255I, ORF1b:P314L, ORF1b:G662S, ORF1b:P1000L, ORF1b:A1918V, ORF3a:S26L, ORF7a:V82A, ORF9b:T60A, S:T19R, S:T478K, S:N501Y, S:D614G
K016870	B	11,5	25,4	97,5	98,3	20H (Beta, V2)	19A	ORF1a:K1197N, ORF1a:T1638I, ORF1a:P1640S, ORF1a:K1655N, S:K417N	ORF3a:S26L, ORF3a:Q57H, ORF3a:S171L, S:D614G, S:A701V
K016899	B	13,5	0,0	98,1	98,6	20C	19A	ORF1b:R1315C	E:P71L, S:A701V
K016902	B	42,7	19,1	98,4	98,6	20H (Beta, V2)	19A	N:G60F, N:K61P, N:E62S, N:D63S, N:K65P, N:R68L, N:G69L, N:I74L, N:D81Y, N:D82Y, ORF1a:T265I, ORF1a:T2154I, ORF1a:N2767H, ORF1a:K3353R, ORF1b:V22I, ORF1b:C44S, ORF1b:L271I, ORF9b:E65S, ORF9b:D66Y, ORF9b:Q70H, ORF9b:Q77H, ORF9b:M78I, S:A701V	ORF1a:T265I, S:D614G
K016908	B	30,8	33,7	95,4	97,6	20B	20A	N:A152S, N:R203K, N:G204R, N:N213I, ORF1a:T395I, ORF9b:R32L, S:D614G	ORF1b:S1779I, S:N450K, S:D614G, S:P681R
K016917	B	77,3	73,7	97,7	98,8	20H (Beta, V2)	19A	E:P71L, N:D128Y, N:T205I, N:Y268N, ORF1a:K1655N, ORF1a:K3353R, ORF1b:S1182L, ORF3a:W131L, ORF3a:S171L, ORF8:D63N, S:L18F, S:D215G, S:K417N, S:E484K, S:N501Y, S:D614G	N:D128Y, ORF1a:K3353R, ORF3a:Q57H, S:L18F, S:K417N, S:D614G, S:A701V
K016920	B	33,1	8,3	99,7	97,6	20C	19A	ORF1a:T265I, ORF1a:T2087S, ORF1a:K3353R, S:D614G	N:T205I, S:D614G
K016923	B	5,9	10,6	86,8	79,7	20A	20H (Beta, V2)	S:K417N	E:P71L, M:I82T, N:T135I, N:T205I, ORF1a:T265I, ORF1a:I547F, ORF1a:K1655N, ORF1b:P314L, ORF1b:L1698F, ORF3a:Q57H, ORF3a:G100S, ORF3a:S171L, S:T19A, S:L24F, S:P25T, S:K182N, S:D215G
K016926	B	3,9	10,2	98,9	99,0	19B	20A	S:D614G	M:I82T, ORF1a:P2046L, ORF1a:S2048F, ORF1a:T3255I, ORF1b:G662S, ORF1b:P1000L, ORF1b:A1918V, S:D614G
K016931	B	9,3	0,0	99,7	100,0	21A (Delta)	20H (Beta, V2)	M:I82T, N:D63G, ORF9b:T60A	E:P71L, N:T205I, ORF1a:T265I, ORF1a:K1655N, ORF1a:K3353R, ORF3a:Q57H, ORF3a:S171L, S:K417N, S:E484K, S:N501Y, S:D614G, S:A701V
K016940	B	4,4	6,6	98,5	97,6	20H (Beta, V2)	19A		ORF3a:Q57H, ORF3a:S171L, S:D614G, S:A701V
K016943	B	5,2	0,0	97,7	93,1	19B	20A	ORF1a:A540T, ORF1a:K1655N	S:D614G
K016945	B	79,9	59,6	99,9	100,0	20H (Beta, V2)	19A	E:P71L, N:T205I, N:T271I, ORF1a:K1655N, ORF1a:K3353R, ORF1b:P314L, ORF1b:A941S, ORF1b:G1129V, ORF3a:G100S, ORF3a:S171L, S:D215G, S:K417N, S:D614G, S:A701V	S:D614G
Clade Summary per Platform (total = 17)								20H Beta V2 (n = 6), 20A (n = 1), 20B (n = 1), 20C (n = 3), 19A (n = 2), 19B (n = 2), 21A/21J Delta (n = 2)	20H Beta V2 (n = 3), 20A (n = 6), 19A (n = 8)
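
The per-platform clade tally in the summary row of Table 6 can be reproduced from the rows above it. The sketch below transcribes the sample identifiers, genomic coverages, and clade calls from Table 6 (decimal commas written as points, clade labels shortened to their NextClade codes) and additionally checks each genome against the 80% coverage threshold. Variable names are illustrative only.

```python
from collections import Counter

# (sample, Illumina genomic coverage %, Ion Torrent genomic coverage %,
#  Illumina clade, Ion Torrent clade), transcribed from Table 6.
rows = [
    ("K016691", 18.5, 99.8, "19A", "20H"),
    ("K016709", 78.1, 99.7, "20H", "20A"),
    ("K016715", 11.0, 99.7, "20C", "20A"),
    ("K016754", 36.3, 99.0, "21J", "19A"),
    ("K016760", 28.2, 99.7, "19A", "20A"),
    ("K016870", 11.5, 97.5, "20H", "19A"),
    ("K016899", 13.5, 98.1, "20C", "19A"),
    ("K016902", 42.7, 98.4, "20H", "19A"),
    ("K016908", 30.8, 95.4, "20B", "20A"),
    ("K016917", 77.3, 97.7, "20H", "19A"),
    ("K016920", 33.1, 99.7, "20C", "19A"),
    ("K016923", 5.9, 86.8, "20A", "20H"),
    ("K016926", 3.9, 98.9, "19B", "20A"),
    ("K016931", 9.3, 99.7, "21A", "20H"),
    ("K016940", 4.4, 98.5, "20H", "19A"),
    ("K016943", 5.2, 97.7, "19B", "20A"),
    ("K016945", 79.9, 99.9, "20H", "19A"),
]

# Tally clade calls per platform among the mismatched samples.
illumina_counts = Counter(r[3] for r in rows)
ion_counts = Counter(r[4] for r in rows)

# Compare each genome with the 80% genomic coverage threshold.
COVERAGE_THRESHOLD = 80.0
illumina_all_below = all(r[1] < COVERAGE_THRESHOLD for r in rows)
ion_all_above = all(r[2] > COVERAGE_THRESHOLD for r in rows)

print("Illumina MiSeq:", dict(illumina_counts))
print("Ion Torrent S5:", dict(ion_counts))
print("All Illumina genomes below threshold:", illumina_all_below)
print("All Ion Torrent genomes above threshold:", ion_all_above)
```

In these transcribed rows, every mismatched sample falls below 80% genomic coverage on the Illumina MiSeq and above it on the Ion Torrent S5, which is in line with the interpretation that low coverage drove the discrepant clade calls.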

Quantification of mutations (insertions, deletions, and substitutions) for sequences generated on the Illumina MiSeq and Ion Torrent S5

The number of mutations detected for each sample was compared between platforms to establish whether there was a significant difference in the assignment of mutations. These analyses included the total substitutions, insertions, and deletions detected by the Ion Torrent S5 and Illumina MiSeq platforms (Fig. 6). In total, we analysed 347 consensus sequences. There was a highly significant difference in total mutations between the Ion Torrent S5 and Illumina MiSeq (Wilcoxon, p < 0.0001), with a greater number of mutations detected by the Ion Torrent S5 (6–94 mutations per sequence, total: 8116) than the Illumina MiSeq (1–92 mutations per sequence, total: 7036). A significant difference was also noted between platforms for the number of substitutions (Wilcoxon, p < 0.05) and deletions (Wilcoxon, p < 0.0001); however, no significant difference was observed for insertions (Wilcoxon, p = 0,25, ns). The number of mutations detected on each sequencing platform is listed in Table 2.
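
As a quick arithmetic check on the mutation totals, the sketch below verifies that substitutions, insertions, and deletions add up to the totals reported in Table 2, and derives two values that are not reported in the source: the mean number of mutations per consensus sequence and the share of deletions. These derived figures are illustrative only.

```python
# Totals transcribed from Table 2; "sequences" is the number of
# successful consensus sequences per platform.
mutations = {
    "Illumina MiSeq": {"total": 7036, "substitutions": 4671,
                       "insertions": 17, "deletions": 2348, "sequences": 172},
    "Ion Torrent S5": {"total": 8116, "substitutions": 5039,
                       "insertions": 46, "deletions": 3031, "sequences": 175},
}

summary = {}
for platform, m in mutations.items():
    # The three mutation classes should account for the reported total.
    components = m["substitutions"] + m["insertions"] + m["deletions"]
    assert components == m["total"], platform
    summary[platform] = {
        # Derived here for illustration; not reported in the source.
        "mean_per_sequence": m["total"] / m["sequences"],
        "deletion_share_pct": 100.0 * m["deletions"] / m["total"],
    }

total_sequences = sum(m["sequences"] for m in mutations.values())

for platform, s in summary.items():
    print(f"{platform}: {s['mean_per_sequence']:.1f} mutations per sequence, "
          f"{s['deletion_share_pct']:.1f}% deletions")
```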

The Ion Torrent S5 and Illumina MiSeq provide alternative methods for researchers to study SARS-CoV-2 at a genomic level (30). This study compared the performance of, and data generated by, the two WGS platforms. We hypothesized that the sequencing methodologies used for the genomic surveillance of SARS-CoV-2 in a high-throughput laboratory setting generate sequencing data that differ when analysed with the same analysis pipeline. From the data generated for genomic coverage, clade assignments, and quantification of mutations, we concluded that the platforms were similar in sequencing capabilities but differed in sequencing data outcomes. Our findings indicate that the Ion Torrent S5 produced sequences with higher genomic coverage over a broader range of viral loads and in a shorter time than the Illumina MiSeq. These findings are in agreement with previous comparison studies (25, 26, 30, 32).

In terms of the sequencing process, the Ion Torrent S5 and Illumina MiSeq each followed a streamlined process flow, which demonstrated the adaptability of both platforms to the WGS of SARS-CoV-2. The sequencing runtime for a sample set of 96 on the Illumina MiSeq was 36 hours, whereas on the Ion Torrent S5 it was seven hours using two Ion 540 sequencing chips. This difference in sequencing time allows a greater number of samples to be sequenced on the Ion Torrent S5 in a 36-hour period than on the Illumina MiSeq. Although the sequencing duration is much shorter on the Ion Torrent S5, it is important to note that the remaining processes, such as amplification and library preparation, take less time on the Illumina MiSeq. The automated process for templating the manually prepared libraries onto the Ion Torrent sequencing chip takes approximately 15,5 hours, whereas the amplification and tagmentation steps of the Illumina MiSeq workflow take less than 10 hours. These findings confirm previous observations of similar processing times across the respective workflows (30). With limited hands-on time, full automation with the Ion Chef allows for faster turnaround times but limits the number of samples that can be processed at once on the Ion Torrent S5 (eight libraries per seven and a half hours on the Ion Chef). This makes full automation, the Ion Torrent's selling point, its major drawback in a high-throughput laboratory setting. However, when libraries are prepared manually, the Ion Torrent S5 is similar to the Illumina MiSeq in handling larger sample numbers. The Illumina MiSeq workflow can also be automated by including an external liquid handler to further reduce hands-on time and the overall duration of processing.
In addition, reagents used in the upstream preparation processes on the Illumina MiSeq can be further optimized and validated to accommodate greater sample numbers with reduced reagent volumes by miniaturisation of the workflow. In contrast, full automation on the Ion Torrent S5 coupled with the Ion Chef handles only small sample numbers at a time, which increases the overall turnaround time in a day-shift laboratory. It is therefore feasible to incorporate manual library preparation for such platforms to minimize turnaround times and increase the number of samples processed in a high-throughput laboratory setting, as illustrated in this study.

The same remnant sample set of 183 was used on both platforms to limit variability between samples and to ensure that the data generated were comparable. The sequencing process had a direct impact on the sequencing outcomes in relation to genomic coverage and sequence quality metrics. Sequence quality was based on in-house quality control specifications established at KRISP for GISAID submissions, which required greater than 80% genomic coverage and fewer than 100 mutations. Although the Ion Torrent S5 and Illumina MiSeq are both capable of producing complete SARS-CoV-2 genomes, sequences generated on the Ion Torrent S5 maintained an overall higher mean genomic coverage than sequences generated on the Illumina MiSeq. Various factors contribute to the genomic coverage obtained from both platforms. Ct values are semi-quantitative numbers that generally categorise the concentration of viral RNA in a given sample following qPCR testing. Ct values are inversely related to viral load: low Ct scores are associated with high viral loads and were found to influence sample quality and, therefore, overall sequence quality (33, 34). Echoing previous findings, we observed an association between viral load and genomic coverage for all sequences generated. Moderate to low viral load samples sequenced on the Ion Torrent S5 resulted in an overall good mean genomic coverage (> 60%), resulting in higher success rates and increased test eligibility. These findings imply that the Ion Torrent S5 sequencing capabilities are less likely to be affected by sample Ct values and, therefore, can be employed in sequencing of samples during early stages of infection when viral load is lower. However, further investigation may be required to assess this using a larger sample cohort across various laboratories.
In contrast, the Illumina MiSeq relied on samples with higher viral load for better coverage of genomes as observed in other studies (20–22).

Additionally, the increase in genomic coverage obtained from the Ion Torrent S5 may be attributed to the greater number of reads obtained per sample with the use of two Ion 540 chips in a sequencing run of 96 samples (26). As highlighted in Table 1, the AmpliSeq Research Panel on the Ion Torrent S5 produces over twice the number of reads of the Illumina MiSeq, with fragments half the size of those sequenced on the Illumina MiSeq (200 bp versus 400 bp) (35). In essence, because its fragments are half as long, the Ion Torrent S5 needs at least double the number of reads to obtain coverage equal to or higher than that of the Illumina MiSeq. It is possible that the greater number of reads obtained per sample also contributed to the greater coverage and reliability in clade assignments observed in sequences generated on the Ion Torrent S5.
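
The relationship between read count, fragment length, and coverage discussed above can be illustrated with a back-of-the-envelope estimate of nominal sequencing depth (reads per sample × read length ÷ genome length). The sketch below rests on several assumptions that are not stated in the source: reads are shared equally between samples, a MiSeq run carries 96 samples and each Ion 540 chip carries 48, each read covers 400 nt (MiSeq, assembled pairs) or 200 nt (S5), and the lower-bound read counts from Table 1 apply. Real amplicon depth is uneven, so these figures are illustrative only.

```python
GENOME_LENGTH_NT = 29_903  # length of the SARS-CoV-2 reference genome (Wuhan-Hu-1)


def nominal_depth(total_reads, n_samples, read_length_nt):
    """Average fold-coverage per sample if reads were shared equally (a simplification)."""
    reads_per_sample = total_reads / n_samples
    return reads_per_sample * read_length_nt / GENOME_LENGTH_NT


# Lower-bound read counts from Table 1; sample counts and read lengths are assumptions.
miseq_depth = nominal_depth(total_reads=24e6, n_samples=96, read_length_nt=400)
s5_depth = nominal_depth(total_reads=60e6, n_samples=48, read_length_nt=200)
depth_ratio = s5_depth / miseq_depth

print(f"Illumina MiSeq nominal depth: ~{miseq_depth:,.0f}x")
print(f"Ion Torrent S5 nominal depth: ~{s5_depth:,.0f}x")
print(f"Ratio (S5 / MiSeq): {depth_ratio:.1f}")
```

Under these assumptions, the shorter Ion Torrent fragments are more than offset by the larger number of reads available per sample, which is consistent with the explanation offered in the paragraph above.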

Furthermore, the Ion Torrent S5 detected a significantly greater number of total mutations (insertions, substitutions, and deletions) than the Illumina MiSeq. Previous studies have reported that, unlike the Illumina MiSeq, semiconductor sequencing platforms like the Ion Torrent S5 predominantly produce homopolymer-associated base-call errors in the form of indels (36–38). Interestingly, these were often deletions rather than insertions, similar to the findings of this study, which may have contributed to the larger number of total mutations in Ion Torrent S5 sequences. According to Marine et al., 2020, while such indels may be adjusted and corrected for in well-characterised viruses, this may not be the case when characterizing novel viruses (26). This further supports the need for in-depth quality control parameters during analysis of such sequences.

By using Genome Detective as the sole assembly method for generating consensus genomes for both platforms, we eliminated the variability that arises between assembly methods. Additionally, NextClade allows the user to determine the difference in quality of the consensus sequences generated, to classify clades accordingly, and to establish similarity in the identification of evolutionary changes between sequences from each platform (39). In contrast to our findings above highlighting higher genomic coverage for sequences generated on the Ion Torrent S5, we found the majority of these sequences to be graded as lower quality on NextClade than those generated on the Illumina MiSeq (data not shown). This, however, can be attributed to sequencing errors or miscalled bases generated on the Ion Torrent systems, as previously observed (7, 26, 38, 40). Furthermore, other findings of this study indicate that the Ion Torrent S5 and Illumina MiSeq sequences can readily differentiate between the Beta and Delta VOCs based on mutation calling and respective clade assignment. NextClade assigned the same clades for 147/183 (80,3%) samples during the early Delta-replacing-Beta phase observed in the Eastern Cape, South Africa. A mismatch in clade assignment was observed in 17/183 (9,3%) samples successfully sequenced on both platforms, and failure rates were low at 4,4% and 6,0% on the Ion Torrent S5 and Illumina MiSeq, respectively. It would be interesting to expand the clade classification and sequencing across a larger cohort of known VOCs using both platforms to assess the reliability and accuracy of NextClade clade assignment. The Pangolin lineage assignment tool is an alternative software for lineage classification; however, it was not included in this study (41).

A major limitation and consideration for genomic surveillance laboratories is the use of up-to-date primer sets. It is important to note that, unlike the ARTIC V3 primers used for the Illumina MiSeq, which were found to be problematic with novel variants such as Delta, the Ion Torrent primers (AmpliSeq primers) covered 99% of the SARS-CoV-2 genome, including all serotypes, thereby contributing to higher genomic and S-gene coverage (42–44). Owing to the continuous evolution of SARS-CoV-2, mutations arose in regions targeted by some ARTIC V3 primers, including regions carrying key Delta mutations, resulting in poor coverage of the affected amplicons (43). Since the initial primers were designed against the reference SARS-CoV-2 genome sequence, difficulty in identifying large structural variants was to be expected. As a result, systematic limitations were observed in the presence of high levels of genomic variation. Subsequently, several S-gene target failures (SGTF) were observed during diagnostic qPCR testing for COVID-19 (45–47). Decreased coverage of specific regions within the SARS-CoV-2 genome was also observed with the emergence of novel variants since the beginning of the pandemic (46). Low-coverage sequences generated on the Illumina MiSeq may have contributed to the discrepancy in clade assignment during the initial surge of the Delta variant. It was previously reported that the G142D amino acid substitution was substantially underrepresented among early Delta variant genomes (43). Furthermore, Kuchinski et al., 2022, reported that mutations identified in emerging variants disrupted genomic sequencing of SARS-CoV-2 (44). Since the ARTIC primer set is one of the most widely used SARS-CoV-2 sequencing primer sets, the V3 primers were updated to address the amplicon drop-off observed in the Delta variant of concern, resulting in the release of version 4 in June 2021. Unfortunately, the V4 primers were not used in this study as they had not been procured. Lambisia et al., 2022, subsequently assessed the impact of the updated ARTIC V4 primers on genome recovery using ONT sequencing and reported a marked improvement in the recovery of the Delta variant, among others (48). The Ion Torrent sequencing panel was also updated to address amplicon drop-off in novel variants. The updated panel, the Ion AmpliSeq SARS-CoV-2 Insight Research Assay, was designed to improve on the coverage and uniformity of the Ion AmpliSeq SARS-CoV-2 Research Panel used in this study. Continuous improvement of primers, irrespective of kit specifications, is therefore an essential requirement of an effective genomic surveillance regime.
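One way a surveillance laboratory might monitor primer sets for this kind of drop-off is to check whether newly reported mutations fall inside primer-binding sites. The sketch below illustrates the idea only. The primer names, coordinates, and mutation positions are hypothetical toy values, not the ARTIC or AmpliSeq designs, and the function name is our own.

```python
from typing import Dict, List, Tuple


def mutations_in_primer_sites(
    primer_sites: Dict[str, Tuple[int, int]],
    mutation_positions: List[int],
) -> Dict[str, List[int]]:
    """Return, per primer, the mutation positions inside its binding site.

    Coordinates are 1-based and inclusive, relative to the reference genome.
    Primers without any overlapping mutation are omitted from the result.
    """
    hits: Dict[str, List[int]] = {}
    for name, (start, end) in primer_sites.items():
        inside = sorted(p for p in mutation_positions if start <= p <= end)
        if inside:
            hits[name] = inside
    return hits


# Hypothetical toy scheme (assumption): coordinates are NOT real primer positions.
toy_primers = {
    "toy_1_LEFT": (10, 32),
    "toy_1_RIGHT": (380, 402),
    "toy_2_LEFT": (330, 352),
}
toy_mutations = [25, 200, 395]  # hypothetical variant-defining positions

flagged = mutations_in_primer_sites(toy_primers, toy_mutations)
for primer, positions in flagged.items():
    print(f"{primer}: mutation(s) at {positions} may impair primer binding")
```

In practice the primer coordinates would come from the scheme's published primer file and the mutation positions from variant-defining mutation lists; a flagged primer is a candidate for redesign, not proof of amplicon failure.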

Plitnick et al., 2021, directly compared the performance of the SARS-CoV-2 AmpliSeq Research Panel with the results obtained using the Illumina MiSeq-based ARTIC Nextflow analysis pipeline (30). Post-bioinformatic analysis of data from such studies showed that both methods produced similar levels of coverage (> 98%) across a broad range of viral loads (Ct values of 15.56 to 32.54 [median, 22.18]) and that both approaches sequenced SARS-CoV-2 effectively (30, 32). Although the bioinformatic analysis pipelines used in this study differ, our findings are similar to those documented by Plitnick et al., 2021. Standardising analysis regimes allows data from different NGS technologies to be compared without bias from independent assembly and analysis tools, since assembly software affects the overall genomic coverage of sequences obtained from various platforms. There is an additional need for quality control processes to improve the overall quality of sequences made publicly available, as recommended by Jacot et al., 2021, for diagnostic purposes (49). Such processes have included the removal of frameshifts and unknown stop codons in some instances. In this study, we found sequences generated on the Illumina MiSeq simpler to process, with quality control easily implemented, yielding sequences of better quality as per NextClade analysis. Although sequencing capabilities are similar on both platforms, overall higher genomic coverage was obtained for sequences generated on the Ion Torrent S5. However, the majority of these sequences were found to be of lower quality as per NextClade analysis. It is therefore important to note that standardising assembly and analysis software improves the comparison of data generated on different platforms and analysed with different software. Nonetheless, this study complements previous research investigating the efficacy of Ion Torrent and Illumina platforms for sequencing viral pathogens (23, 24, 50).

This study also provides fundamental insight into the requirements and challenges of the different methodologies for their intended purpose of genomic surveillance in a high-throughput research laboratory setting. As a precaution, potential users of publicly available sequences need to take into account the technologies used, the assembly and analysis pipelines implemented, and the quality of sequences generated from samples of differing integrity, particularly when comparing or combining data generated on different platforms. It is also advisable to ensure that the primers used in the early stages of sequencing are kept up to date in a genomic surveillance laboratory setting.

Conclusion

Genomic surveillance is vital in combating the spread of SARS-CoV-2 and has significantly contributed to the swift identification of novel mutations within the Delta VOC (15, 16). Both platforms accurately differentiated between the Beta and Delta VOCs. Additionally, the Ion Torrent S5 had an advantage for samples with lower viral loads (higher Ct) in comparison to the Illumina MiSeq; however, further assessment is required. In terms of sequencer performance, genomic coverage, sequence quality, and data generated, the Illumina MiSeq and Ion Torrent S5 are both well suited to WGS of SARS-CoV-2. The findings of this study therefore show that the Ion Torrent S5 and Illumina MiSeq, when coupled with Genome Detective and NextClade, have individual advantages and disadvantages, but are both reliable platforms for the genomic surveillance of SARS-CoV-2.

Materials And Methods

Sample population, collection, and processing

As part of the NGS-SA initiative, we used remnant routine genomic surveillance samples collected from the Eastern Cape between March and June 2021 to assess the sequencing data generated by two NGS platforms (17). The National Health Laboratory Service (NHLS) in Port Elizabeth, Eastern Cape, collected nasopharyngeal and oropharyngeal swabs from inpatients and outpatients in clinics and hospitals. The NHLS team also determined SARS-CoV-2 positivity using quantitative polymerase chain reaction (qPCR) assays: the Cobas® SARS-CoV-2 (Roche Molecular, Pleasanton, CA, USA), the Xpert Xpress SARS-CoV-2 (Cepheid, CA, USA), or the Seegene Allplex™ 2019-nCoV Assay (Seegene, Inqaba Biotec, SA) on the CFX96 DX™ (Bio-Rad). Sample Ct values were provided as part of the metadata files accompanying all samples. The remnant nasopharyngeal and oropharyngeal swabs from 183 patients were used for this study (Table S1), irrespective of their Ct values. The RNA was extracted at the KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP) based in Durban, KwaZulu-Natal, South Africa. The extracted RNA was used for independent library preparation at the respective sequencing sites, followed by sequencing on the Illumina MiSeq based at KRISP and on the Ion Torrent S5 based at the Central Analytical Facilities (CAF) in Stellenbosch, Western Cape. Samples were sequenced over two runs on each platform for a direct comparison of the sequence data generated.

Nucleic acid extraction

All 183 samples were extracted as per the manufacturer’s instructions using the CMG-1049 kit on the Chemagic 360 instrument (Perkin Elmer, Hamburg, Germany). Total nucleic acid (TNA) extraction was performed using 200 µL of each sample added to 450 µL lysis buffer and 14 µL Poly A RNA / proteinase-K reaction mixture. The TNA was eluted in 100 µL elution buffer and split into two aliquots, which were stored at -20°C until further use.

Complementary DNA (cDNA) synthesis

cDNA synthesis of samples sequenced on the Illumina MiSeq was performed with the SuperScript IV reverse transcriptase using random hexamers (Life Technologies) while cDNA synthesis on samples sequenced on the Ion Torrent S5 was performed with the SuperScript Vilo cDNA synthesis kit (Life Technologies).

Library preparation and next-generation sequencing strategies

Sequence libraries for Illumina MiSeq and Ion Torrent S5 sequencing were manually prepared using the Nextera DNA Flex Library Prep kit and the Ion AmpliSeq Library Kit Plus, respectively. Templating of the prepared libraries onto the sequencing chip for the Ion Torrent S5 was automated using the Ion Chef and the Ion AmpliSeq SARS-CoV-2 Research Panel and Ion AmpliSeq kit for Chef DL8.

Multiplex Tiling PCR, Illumina MiSeq Library Preparation, and Sequencing

Samples sequenced on the Illumina MiSeq were amplified using a multiplex PCR as previously published (21). ARTIC primers were designed on Primal Scheme (http://primal.zibraproject.org/) to generate 400 base pair (bp) amplicons with 70 bp overlaps (51). The primers (V3 at that time) were used to amplify the SARS-CoV-2 whole genome (30 kb). Amplicons were purified using AMPure XP purification beads (Beckman Coulter, High Wycombe, UK) at a 1:1 ratio. All purified amplicons were quantified on the Qubit 4.0 instrument using the Qubit double-stranded DNA (dsDNA) High Sensitivity assay kit (Life Technologies). Purified amplicons were stored at 4°C prior to further use. Indexed paired-end libraries were prepared using the Nextera DNA Flex Library Prep kit and the Nextera DNA CD indexes (Illumina, San Diego, USA) in compliance with the manufacturer’s instructions. Libraries were purified and normalized to 4 nM prior to pooling. The pooled library was denatured using 0.2 N sodium hydroxide, followed by dilution to a final concentration of 8 pM. A minimum of two controls were included in each sequencing run, with 96 samples processed in total. The library was spiked with 1% PhiX Control v3 (an adapter-ligated library used as a control) and sequenced using a 500-cycle v2 MiSeq Reagent Kit on the Illumina MiSeq instrument (Illumina, San Diego, CA, USA) (21).
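The dilution from the 4 nM normalized pool to the 8 pM loading concentration follows the usual C1·V1 = C2·V2 relationship. The sketch below works through that arithmetic only. The 600 µL final volume is a hypothetical value chosen for illustration, and the calculation ignores the intermediate denaturation step, so it should not be read as the bench protocol followed in this study.

```python
def stock_volume_for_dilution(
    stock_conc_pm: float, target_conc_pm: float, final_volume_ul: float
) -> float:
    """Volume of stock (µL) needed so that C1 * V1 = C2 * V2.

    Concentrations are in pM; the result is in the same unit as final_volume_ul.
    """
    if stock_conc_pm <= 0 or target_conc_pm <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_conc_pm > stock_conc_pm:
        raise ValueError("dilution cannot increase the concentration")
    return target_conc_pm * final_volume_ul / stock_conc_pm


stock_pm = 4 * 1000      # 4 nM normalized pool expressed in pM
target_pm = 8            # 8 pM final loading concentration
final_volume_ul = 600.0  # hypothetical final volume (assumption)

fold_dilution = stock_pm / target_pm
stock_ul = stock_volume_for_dilution(stock_pm, target_pm, final_volume_ul)
diluent_ul = final_volume_ul - stock_ul

print(f"Overall dilution: {fold_dilution:.0f}-fold")
print(f"Stock: {stock_ul:.2f} µL, diluent: {diluent_ul:.2f} µL")
```

Because a 500-fold dilution in a single step requires pipetting a very small stock volume, such dilutions are typically carried out in two or more serial steps.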

Ion Torrent S5 Library preparation and sequencing

The manual library preparation workflow was performed using the overlapping amplicon strategy and a 2-pool primer panel with the Ion AmpliSeq Library Kit Plus. Primer pool 1 consisted of 125 primer pairs and primer pool 2 consisted of 122 primer pairs, generating amplicons ranging from 125 bp to 275 bp in length. The panel design allows for the tiling of ~ 237 amplicons across the SARS-CoV-2 genome (~ 30 kb), resulting in a sequencing coverage of 99.0%, spanning positions 43 to 29,842 (positions relative to the SARS-CoV-2 reference, GenBank accession number NC_045512). An additional five primer pairs targeting human expression controls were included in the panel. SARS-CoV-2 targets were amplified using 16 cycles of target amplification for samples with a broad range of viral loads. Amplified targets from the two primer pools were combined and ligated to barcode adapters using the IonCode Barcode Adapters 1–96 kit. Libraries at 70 pM were templated and loaded onto two high-output Ion 540 Chips per sequencing run using the Ion Chef, followed by sequencing on the Ion Torrent S5 as per the manufacturer’s protocol (Ion 540™ Kit – Chef User Guide Pub. No. MAN0010851). A minimum of two controls were included in each run, with 96 samples processed per sequencing run using two 540 chips. All runs were pre-planned and set up using the Ion Torrent Suite software (v5.16.0). All information on the Ion AmpliSeq SARS-CoV-2 Research Panel is available at https://ampliseq.com.

Sequence Data Analysis

Data analysis was performed by one bioinformatician at KRISP; owing to routine sequencing schedules, the sequences generated on the Illumina MiSeq were analysed before those generated on the Ion Torrent S5. The raw paired-end reads from Illumina MiSeq sequencing and the raw single-end reads from Ion Torrent sequencing (FASTQ files) were assembled using the web-based application Genome Detective, version 1.126 (https://www.genomeDetective.com/) (52). Genome Detective is a web-based assembly tool that uses de novo and reference-based mapping algorithms to assemble whole viral genomes. The initial assemblies obtained from Genome Detective were refined by aligning mapped reads to the reference and generating consensus sequences for each of the comparison runs on both sequencing platforms. Consensus sequences for both platforms were assessed using NextClade (https://clades.nextstrain.org/, version 1.7.4) for clade assignment, identification and quantification of mutations, and sequence quality analysis. NextClade is a classification tool that uses Nextstrain nomenclature and the differences between a given sequence and a reference sequence to identify clades and VOCs (39). Additional data on S-gene coverage were obtained by uploading consensus sequences to Genome Detective. Sequences (FASTA files) that passed quality control, with greater than 80% genomic coverage and fewer than 100 mutations, were deposited in the Global Initiative on Sharing Avian Influenza Data (GISAID) database (https://www.gisaid.org/) (19). The GISAID accession identifiers are included in Supplementary Table S2. The majority of sequences uploaded to GISAID were obtained from data generated on the Illumina MiSeq platform, as these were sequenced and analysed first. The remaining sequences that passed quality control were obtained from sequencing data generated on the Ion Torrent S5 and were uploaded to GISAID thereafter.
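The quality control thresholds described above (greater than 80% genomic coverage and fewer than 100 mutations) can be expressed as a small filter. This is a minimal sketch under stated assumptions rather than the pipeline used in this study: coverage is approximated here as the share of reference positions with an unambiguous A, C, G, or T call in a consensus that is already aligned to the reference without insertions, whereas Genome Detective and NextClade compute their own metrics. The function names are illustrative; 29,903 is the length of the NC_045512.2 reference.

```python
REFERENCE_LENGTH = 29_903  # length of the SARS-CoV-2 reference NC_045512.2


def genome_coverage(consensus: str, reference_length: int = REFERENCE_LENGTH) -> float:
    """Percentage of reference positions with an unambiguous base call.

    Assumes the consensus is aligned to the reference and contains no
    insertions, so N, gaps, and ambiguity codes count as uncovered positions.
    """
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return 100.0 * called / reference_length


def passes_qc(
    coverage_pct: float,
    n_mutations: int,
    min_coverage: float = 80.0,
    max_mutations: int = 100,
) -> bool:
    """Apply the thresholds from the text: coverage > 80% and mutations < 100."""
    return coverage_pct > min_coverage and n_mutations < max_mutations


# Toy example with a 10-base "reference" (assumption, for illustration only).
toy_good = genome_coverage("ACGTNACGTA", reference_length=10)    # 9 of 10 called
toy_border = genome_coverage("ACGTNNACGT", reference_length=10)  # 8 of 10 called

print(toy_good, passes_qc(toy_good, n_mutations=35))
print(toy_border, passes_qc(toy_border, n_mutations=35))
```

Both comparisons are strict, so a sequence with exactly 80% coverage or exactly 100 mutations does not pass.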

Statistical Evaluation and Considerations

Data visualization and statistical analysis were performed in R v4.2 using the ggplot2 package (v3.3.6).

Spearman’s rank correlation test was performed to determine the relationship between viral load and coverage (genomic and S-gene) obtained from sequences generated on each platform. The Wilcoxon test was used to assess the difference in the range of genomic coverage obtained between platforms and the difference in the number of mutations (total mutations, insertions, deletions, and substitutions) detected between the Ion Torrent S5 and the Illumina MiSeq platforms.
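For readers unfamiliar with the correlation measure, Spearman’s coefficient is the Pearson correlation computed on ranks, with tied values given their average rank. The following pure-Python sketch illustrates the calculation. It is not the R code used in this study, the Ct and coverage values are synthetic, and Ct is used here as an inverse proxy for viral load.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Synthetic values (assumption): higher Ct (lower viral load), lower coverage.
ct_values = [16.0, 20.5, 24.0, 28.5, 33.0]
coverage_pct = [99.5, 99.0, 97.0, 85.0, 60.0]
rho = spearman_rho(ct_values, coverage_pct)
print(f"Spearman rho = {rho:.2f}")
```

A strongly negative rho between Ct and coverage would correspond to coverage falling as viral load decreases, which is the pattern the comparison between platforms is intended to quantify.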

Abbreviations

NGS: Next-generation Sequencing

WGS: Whole-genome Sequencing

SARS-CoV-2: Severe Acute Respiratory Syndrome Coronavirus 2

GISAID: Global Initiative on Sharing Avian Influenza Data

WHO: World Health Organisation

KRISP: KwaZulu-Natal Research Innovation and Sequencing Platform

S-gene: Spike Gene

RNA: Ribonucleic Acid

VL: Viral Load

ONT: Oxford Nanopore Technologies

COVID-19: Coronavirus Disease 2019

Ct: Cycle threshold

PCR: Polymerase Chain Reaction

Ns: Non-significant

QC: Quality Control

Declarations

Ethics Approval and consent to participate

The University of KwaZulu-Natal Biomedical Research Ethics Committee (BREC) previously approved a study (protocol reference no. BREC/00001510/2020; project title: Spatial and genomic monitoring of COVID-19 cases in South Africa), which served as umbrella ethics approval for this study. We have also applied for an additional linked ethics approval to cover the extended analysis and interpretation of genomic surveillance data for this study (protocol reference no.: BREC/00004745/2022; project title: “Benchmarking South African genomic surveillance strategies: A look at genomic surveillance and epidemiological evaluation of SARS-CoV-2 transmission within Southern Africa”). All methods were performed in accordance with the relevant guidelines and regulations. We used de-identified remnant nasopharyngeal and oropharyngeal swab samples from qPCR-confirmed COVID-19-positive patients obtained from the National Health Laboratory Service (NHLS) in South Africa. Informed consent from study participants was not applicable in this study and was waived by BREC because de-identified (anonymous) remnant samples, which would otherwise have been discarded, were used.

Consent for publication

Not applicable

Availability of data and materials

All of the SARS-CoV-2 whole genome sequences that passed QC and were analysed in the present study are publicly available in the GISAID data repository (https://www.gisaid.org/, EPI_SET ID: EPI_SET_230203ym, https://doi.org/10.55876/gis8.230203ym). A list of accession identifiers can be found in the supplementary section (Table S2: GISAID Accession Identifiers).

Competing interests

TdO received fees from Illumina as a member of the Infectious Diseases Testing Advisory Board. All other authors have no competing interests to declare.

Funding

This study was funded by the UKZN and KRISP.

Authors’ contributions

Conceptualization: UR, EW, RL, JG; Methodology: UR, CvH, SP, JG, OL-A; Formal Analysis: UR, DT, SvW; Resources: UR, JG, SP; Data Curation: UR, YR, AM, DT, EJS; Writing – original draft: UR; Writing – review & editing: UR, JG, RL, SvW, CvH, CB, OL-A, EJS; Visualization: UR, YR, DT; Supervision: JG, RL, TdO; Project administration: TdO; Funding acquisition: TdO; Final approval of manuscript: All authors.

Acknowledgements

The Strategic Health Innovation Partnerships Unit of the South African Medical Research Council supported the research reported in this publication, with funds received from the South African Department of Science and Innovation (DSI), Department of Technology and Innovation, as part of the Network for Genomic Surveillance in South Africa (NGS-SA). Genomic surveillance in South Africa was supported in part through National Institutes of Health USA grant U01 AI151698 for the United World Antiviral Research Network (UWARN) and by the Rockefeller Foundation (Prof. Tulio de Oliveira and Dr. Eduan Wilkinson). KRISP has received donations from the Chan Soon-Shiong Family Foundation (CSSFF) and Illumina. Support was also received from the Sub-Saharan African Network for TB/HIV Research Excellence (SANTHE), a DELTAS Africa Initiative [grant # DEL-15-006]. The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences (AAS)’s Alliance for Accelerating Excellence in Science in Africa (AESA) and is supported by the New Partnership for Africa’s Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust [grant # 107752/Z/15/Z] and the government of the United Kingdom (UK). The views expressed in this publication are those of the author(s) and not necessarily those of AAS, NEPAD Agency, Wellcome Trust or the UK government.

References

Konings F, Perkins MD, Kuhn JH, Pallen MJ, Alm EJ, Archer BN et al. SARS-CoV-2 Variants of Interest and Concern naming scheme conducive for global discourse. Nat Microbiol [Internet]. 2021;6(July):821–3. Available from: http://dx.doi.org/10.1038/s41564-021-00932-w
Tay JH, Porter AF, Wirth W, Duchene S. The Emergence of SARS-CoV-2 Variants of Concern Is Driven by Acceleration of the Substitution Rate. Mol Biol Evol [Internet]. 2022 Feb 3 [cited 2022 Feb 15];39(2). Available from: http://www.ncbi.nlm.nih.gov/pubmed/35038741
Boehm E, Kronig I, Neher RA, Eckerle I, Vetter P, Kaiser L. Novel SARS-CoV-2 variants: the pandemics within the pandemic. Clin Microbiol Infect. 2021 Aug 1;27(8):1109–17.
WHO Coronavirus (COVID-19) Dashboard | WHO Coronavirus (COVID-19) Dashboard With Vaccination Data [Internet]. [cited 2022 May 1]. Available from: https://covid19.who.int/
Babb de Villiers C, Blackburn L, Cook S, Janus J, Johnson E, Kroese M. Next Generation Sequencing for SARS-CoV-2. Found Innov New Diagnostics. 2021.
Faria NR, Azevedo RSS, Kraemer MUG, Souza R, Cunha MS, Hill SC et al. Zika virus in the Americas: Early epidemiological and genetic findings. Science [Internet]. 2016 Apr 15 [cited 2021 Nov 30];352(6283):345. Available from: /pmc/articles/PMC4918795/
Quick J, Loman NJ, Duraffour S, Simpson JT, Severi E, Cowley L, et al. Real-time, portable genome sequencing for Ebola surveillance. Nature. 2016;530(7589):228–32.
Butera Y, Mukantwari E, Artesi M, Umuringa J, d’arc, O’Toole ÁN, Hill V et al. Genomic sequencing of SARS-CoV-2 in Rwanda reveals the importance of incoming travelers on lineage diversity. Nat Commun [Internet]. 2021;12(1):1–12. Available from: http://dx.doi.org/10.1038/s41467-021-25985-7
Tegally H, Wilkinson E, Lessells RJ, Giandhari J, Pillay S, Msomi N et al. Sixteen novel lineages of SARS-CoV-2 in South Africa. Nat Med [Internet]. 2021;27(3):440–6. Available from: http://dx.doi.org/10.1038/s41591-021-01255-3
Tegally H, Wilkinson E, Giovanetti M, Iranzadeh A, Fonseca V, Giandhari J et al. Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. medRxiv. 2020;2.
Yin Y, Wunderink RG. MERS, SARS and other coronaviruses as causes of pneumonia. Respirology. 2018;23(2):130–7.
Giandhari J, Pillay S, Wilkinson E, Tegally H, Sinayskiy I, Schuld M et al. Early transmission of SARS-CoV-2 in South Africa: An epidemiological and phylogenetic report. Int J Infect Dis [Internet]. 2021 Feb;103:234–41. Available from: http://www.ncbi.nlm.nih.gov/pubmed/32511505 and http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC7273273
Khan K, Karim F, Cele S, Reedoy K, San JE, Lustig G et al. Omicron infection enhances Delta antibody immunity in vaccinated persons. Nature [Internet]. 2022 May 6; Available from: https://www.nature.com/articles/s41586-022-04830-x
Engelbrecht S, Delaney K, Kleinhans B, Wilkinson E, Tegally H, Stander T, et al. Multiple early introductions of sars-cov-2 to cape town, South Africa. Viruses. 2021;13(3):1–11.
Tegally H, Wilkinson E, Giovanetti M, Iranzadeh A, Fonseca V, Giandhari J et al. Detection of a SARS-CoV-2 variant of concern in South Africa. Nature [Internet]. 2021;592(7854):438–43. Available from: http://dx.doi.org/10.1038/s41586-021-03402-9
Wilkinson E, Giovanetti M, Tegally H, San JE, Lessells R, Cuadros D et al. A year of genomic surveillance reveals how the SARS-CoV-2 pandemic unfolded in Africa. Science (80-) [Internet]. 2021 Oct 22;374(6566):423–31. Available from: https://www.science.org/doi/10.1126/science.abj4336
Msomi N, Mlisana K, de Oliveira T, Willianson C, Bhiman JN, Goedhals D, et al. A genomics network established to respond rapidly to public health threats in South Africa. The Lancet Microbe. 2020;1(6):e229–30.
Msomi N, Govender K, Laguda-akingba O. The implementation of SARS-CoV-2 genomic surveillance in South Africa. 2021;(May).
GISAID - Initiative [Internet]. [cited 2022 May 3]. Available from: https://www.gisaid.org/
Charre C, Ginevra C, Sabatier M, Regue H, Destras G, Brun S, et al. Evaluation of NGS-based approaches for SARS-CoV- 2 whole genome characterisation. Virus Evol. 2020;6(2):1–8.
Pillay S, Giandhari J, Tegally H, Wilkinson E, Chimukangara B, Lessells R et al. Whole Genome Sequencing of SARS-CoV-2: Adapting Illumina Protocols for Quick and Accurate Outbreak Investigation during a Pandemic. Genes (Basel) [Internet]. 2020 Aug 17;11(8):949. Available from: https://www.medrxiv.org/content/10.1101/2020.04.17.20064691v1
Tshiabuila D, Giandhari J, Pillay S, Ramphal U, Ramphal Y, Maharaj A et al. Comparison of SARS – CoV – 2 sequencing using the ONT GridION and the Illumina MiSeq. BMC Genomics [Internet]. 2022;1–17. Available from: https://doi.org/10.1186/s12864-022-08541-5
Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, et al. A tale of three NGS sequencing platforms. BMC Genomics. 2012;13(341):13.
Salipante SJ, Kawashima T, Rosenthal C, Hoogestraat DR, Cummings LA, Sengupta DJ, et al. Performance comparison of Illumina and Ion Torrent next-generation sequencing platforms for 16S rRNA-based bacterial community profiling. Appl Environ Microbiol. 2014;80(24):7583–91.
Lahens NF, Ricciotti E, Smirnova O, Toorens E, Kim EJ, Baruzzo G, et al. A comparison of Illumina and Ion Torrent sequencing platforms in the context of differential gene expression. BMC Genomics. 2017;18(1):602.
Marine RL, Magaña LC, Castro CJ, Zhao K, Montmayeur AM, Schmidt A et al. Comparison of Illumina MiSeq and the Ion Torrent PGM and S5 platforms for whole-genome sequencing of picornaviruses and caliciviruses. J Virol Methods [Internet]. 2020 Jun;280(February):113865. Available from: https://doi.org/10.1016/j.jviromet.2020.113865
Merriman B, Torrent I, Rothberg JM. Progress in Ion Torrent semiconductor chip based sequencing. Electrophoresis. 2012;33(23):3397–417.
Ion GeneStudio™ S5 System [Internet]. [cited 2022 May 5]. Available from: https://www.thermofisher.com/order/catalog/product/A38194
Liu L, Li Y, Li S, Hu NH, He Y, Pong R et al. Comparison of next-generation sequencing systems. Adv Biofuel Prod Algae Aquat Plants. 2016;(February):279–303.
Plitnick J, Griesemer S, Lasek-Nesselquist E, Singh N, Lamson DM, George KS. Whole-genome sequencing of sars-cov-2: Assessment of the ion torrent ampliseq panel and comparison with the illumina miseq artic protocol. J Clin Microbiol. 2021;59(12):1–8.
Liu J, Chen X, Liu Y, Lin J, Shen J, Zhang H et al. Characterization of SARS-CoV-2 worldwide transmission based on evolutionary dynamics and specific viral mutations in the spike protein. Infect Dis Poverty [Internet]. 2021;10(1):1–15. Available from: https://doi.org/10.1186/s40249-021-00895-4
Rachiglio AM, De Sabato L, Roma C, Cennamo M, Fiorenza M, Terracciano D et al. SARS-CoV-2 complete genome sequencing from the Italian Campania region using a highly automated next generation sequencing system. J Transl Med [Internet]. 2021;19(1):1–10. Available from: https://doi.org/10.1186/s12967-021-02912-4
Rabaan AA, Tirupathi R, Sule AA, Aldali J, Al Mutair A, Alhumaid S et al. Viral dynamics and real-time rt-pcr ct values correlation with disease severity in covid-19. Diagnostics. 2021;11(6).
Zuckerman NS, Bucris E, Erster O, Mandelboim M, Adler A, Burstein S et al. Prolonged detection of complete viral genomes demonstrated by SARS-CoV-2 sequencing of serial respiratory specimens. PLoS One [Internet]. 2021;16(8 August):1–7. Available from: http://dx.doi.org/10.1371/journal.pone.0255691
Target Selection for Ion Torrent Next-Generation Sequencing - ZA. [cited 2021 Dec 8]; Available from:
Marine RL, Magaña LC, Castro CJ, Zhao K, Montmayeur AM, Schmidt A et al. Comparison of Illumina MiSeq and the Ion Torrent PGM and S5 platforms for whole-genome sequencing of picornaviruses and caliciviruses. J Virol Methods. 2020;280(April).
Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, et al. Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol. 2012;30(5):434–9.
Laehnemann D, Borkhardt A, McHardy AC. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction. Brief Bioinform. 2016;17(1):154–79.
Aksamentov I, Roemer C, Hodcroft EB, Neher RA. Nextclade: clade assignment, mutation calling and quality control for viral genomes. Zenodo [Internet]. 2021; Available from: https://doi.org/10.5281/zenodo.5607694
Bragg LM, Stone G, Butler MK, Hugenholtz P, Tyson GW. Shining a Light on Dark Sequencing: Characterising Errors in Ion Torrent PGM Data. PLoS Comput Biol. 2013;9(4).
Rambaut A, Holmes EC, O’Toole Á, Hill V, McCrone JT, Ruis C et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol [Internet]. 2020;5(11):1403–7. Available from: http://dx.doi.org/10.1038/s41564-020-0770-5
Alessandrini F, Caucci S, Onofri V, Melchionda F, Tagliabracci A, Bagnarelli P, et al. Evaluation of the Ion AmpliSeq SARS-CoV-2 Research Panel by Massive Parallel Sequencing. Genes (Basel). 2020;11(December 2019):1–12.
Davis JJ, Long SW, Christensen PA, Olsen RJ, Olson R, Shukla M, et al. Analysis of the ARTIC Version 3 and Version 4 SARS-CoV-2 Primers and Their Impact on the Detection of the G142D Amino Acid Substitution in the Spike Protein. Microbiol Spectr. 2021;9(3):2–6.
Kuchinski KS, Nguyen J, Lee TD, Hickman R, Jassem AN, Hoang LMN et al. Mutations in emerging variant of concern lineages disrupt genomic sequencing of SARS-CoV-2 clinical specimens. Int J Infect Dis [Internet]. 2022;114:51–4. Available from: https://doi.org/10.1016/j.ijid.2021.10.050
Wolter N, Jassat W, Walaza S, Welch R, Moultrie H, Groome M et al. Early assessment of the clinical severity of the SARS-CoV-2 omicron variant in South Africa: a data linkage study. Lancet [Internet]. 2022;399(10323):437–46. Available from: http://dx.doi.org/10.1016/S0140-6736(22)00017-4
Vogels CBF, Breban MI, Ott IM, Alpert T, Petrone ME, Watkins AE, et al. Multiplex qPCR discriminates variants of concern to enhance global surveillance of SARS-CoV-2. PLoS Biol. 2021;19(5 May):1–12.
Challen R, Dyson L, Overton CE, Guzman-Rincon LM, Hill EM, Stage HB et al. Early epidemiological signatures of novel SARS-CoV-2 variants: establishment of B.1.617.2 in England. medRxiv [Internet]. 2021;2021.06.05.21258365. Available from: https://www.medrxiv.org/content/10.1101/2021.06.05.21258365v1
Lambisia AW, Mohammed KS, Makori TO, Ndwiga L, Mburu MW, Morobe JM, et al. Optimization of the SARS-CoV-2 ARTIC Network V4 Primers and Whole Genome Sequencing Protocol. Front Med. 2022;9(February):1–8.
Jacot D, Pillonel T, Greub G, Bertelli C. Assessment of SARS-CoV-2 Genome Sequencing: Quality Criteria and Low-Frequency Variants. Mellmann A, editor. J Clin Microbiol [Internet]. 2021 Sep 20;59(10):1–10. Available from: https://journals.asm.org/doi/10.1128/JCM.00944-21
Szargut M, Cytacka S, Serwin K, Urbańska A, Gastineau R, Parczewski M et al. SARS-CoV-2 Whole-Genome Sequencing by Ion S5 Technology—Challenges, Protocol Optimization and Success Rates for Different Strains. Viruses. 2022;14(6).
Quick J, Grubaugh ND, Pullan ST, Claro IM, Smith AD, Gangavarapu K, et al. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat Protoc. 2017;12(6):1261–6.
Genome Detective Virus Tool [Internet]. [cited 2021 Dec 7]. Available from: https://www.genomedetective.com/app/typingtool/virus/

Competing interest reported. Tulio de Oliveira received fees from Illumina as a member of the Infectious Diseases Testing Advisory Board. All other authors have no competing interests to declare.

SupplementaryTablesEdited2023v2.docx
