SARS-CoV-2 has caused a global health crisis because it is highly infectious and can acquire mutations that may result in more lethal variants (1, 25). A major factor in curbing the spread of the virus and decreasing the infection rate is rapid sequencing of the virus to detect new strains and identify transmission chains (7). The sequencing runtime for Run116 was 36 hours on the MiSeq and 21 hours on the GridION. This 15-hour decrease in sequencing time allows 480 samples to be sequenced each day on the GridION, compared with the 96 that can be sequenced on the MiSeq every 36 hours. The GridION runtime agrees with reports that nanopore sequencing takes approximately 20 hours, as a rapid library prep kit supplied by ONT can be used (26, 27). The lack of an image analysis step during nanopore sequencing facilitates real-time base-calling, which allows for the rapid detection of DNA for pathogen screening from clinical samples (28).
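To compare the two throughput figures on the same footing, the MiSeq rate can be normalized to a 24-hour day. The short sketch below does this arithmetic using only the run time and sample counts quoted above. The function name is hypothetical, and the calculation assumes runs are started back to back with no turnaround time, which is an idealization.

```python
def samples_per_day(samples_per_run: int, run_hours: float) -> float:
    """Idealized daily throughput, assuming back-to-back runs (assumption)."""
    return samples_per_run * 24.0 / run_hours


# Figures quoted in the text: 96 samples per 36-hour MiSeq run,
# and 480 samples per day on the GridION.
miseq_daily = samples_per_day(96, 36)
gridion_daily = 480
fold = gridion_daily / miseq_daily

print(f"MiSeq: {miseq_daily:.0f} samples/day; GridION/MiSeq ratio: {fold:.1f}")
# prints: MiSeq: 64 samples/day; GridION/MiSeq ratio: 7.5
```

Under these assumptions the GridION figure corresponds to roughly a 7.5-fold higher daily throughput, before any sequences are removed by quality control.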
Studies have shown that Illumina sequencing may still be the most accurate way to sequence viruses (29). The majority of errors noted between Nanopore and Illumina consensus genomes have been attributed to Nanopore sequencing errors (30). Run116 samples were sequenced on both platforms to determine whether there was a significant difference in the sequencing coverage regardless of the sample. Sequencing coverage was significantly greater with the MiSeq than with the GridION, and this result was also observed when comparing all sequence runs. Sequence coverage can be affected by sequencing time, and thus GridION coverage may have increased had the run been left to sequence for longer. We also note statistically significantly higher sequencing coverage for the S-gene and ORF1ab-gene with the MiSeq than with the GridION. Nanopore technology has been shown to provide lower per-read sequencing coverage when compared to short-read sequencing (31). Coverage biases seen with ONT’s sequencing protocol can be a result of truncated reads caused by pore blocking or fragmentation during library prep, as transcripts are sequenced from the 3’ to 5’ end (32). ONT has made error correction tools such as Nanopolish available to reduce the error rate observed with Nanopore sequencing (33). In this study, variant calling was achieved using Nanopolish, but we still note significantly lower sequence quality from the GridION than from the MiSeq. These low-quality sequences cannot be used to confidently acquire information on the infecting viral strain and are generally removed through a series of quality control checks (34). Although more sequences can be produced using the GridION than the MiSeq, removal of the low-quality sequences would eliminate the advantage of producing a large number of consensus genomes.
Higher sequencing coverage for the Illumina MiSeq has been associated with lower Ct scores (21). The Ct score is the number of cycles required to amplify viral RNA to a detectable level. There is therefore an inverse relationship between Ct score and viral load (35). In this investigation, we also noted an inverse relationship between Ct score and sequence coverage for both GridION and MiSeq sequencing. There is, however, a significantly stronger negative correlation with the GridION than with the MiSeq, which may imply that the MiSeq’s sequencing capabilities are less affected by sample Ct score and that, as a result, it can be used to sequence samples from the early stages of infection, when viral load is still low. This comparison was, however, limited by the lack of matching runs for the GridION and the MiSeq. Further analysis is required, as the number of samples analyzed for each run was low and inconsistent because Ct scores were not always provided with the sample metadata. Additional analyses should be conducted to understand characteristics such as coverage bias, sequence biases, and reproducibility for the GridION sequencing platform (31). Sample quality may also affect sequencing, and thus it is very important to maintain a cold chain during storage of swabs and RNA.
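The relationship between Ct score and coverage discussed above can be quantified with a rank correlation. The sketch below computes a Spearman coefficient using only the standard library. Spearman's method is an assumption made here for illustration, since this section does not state which correlation test was used, and the Ct and coverage values are hypothetical toy numbers, not data from this study.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy values, NOT data from this study.
ct_scores = [18.0, 22.5, 25.0, 29.0, 33.5]
coverage_pct = [99.5, 99.0, 97.0, 90.0, 70.0]

rho = spearman(ct_scores, coverage_pct)
print(f"Spearman rho = {rho:.3f}")  # strongly negative for this toy data
```

A coefficient closer to -1 for one platform than for the other would correspond to the stronger negative correlation described for the GridION, although a formal comparison of two correlation coefficients requires an additional test that is not shown here.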
Identifying sequence mutations involves aligning a sequence to a reference genome and identifying changes within the sequence. This is important, as it allows us to identify gene variants that may play a major role in the diagnosis of diseases (36). It has been shown that long-read sequencing platforms have a high error rate, which is mostly indels that are assumed to be randomly distributed within each read (37, 38). Prediction and interpretation of protein sequences may, therefore, be critically affected due to frameshifts and premature stop codons that may be introduced by the indels (39).
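As a minimal illustration of this step, the sketch below compares a sample sequence with a reference that has already been aligned to it (gaps written as `-`) and lists substitutions, insertions, and deletions, flagging indels whose length is not a multiple of three as potential frameshifts. The function names and the toy sequences are hypothetical. The sketch assumes that no alignment column contains a gap in both sequences, and it does not represent the variant-calling pipeline used in this study.

```python
def call_variants(ref_aln: str, qry_aln: str):
    """List differences between two pre-aligned sequences.

    Returns tuples of (kind, reference position, sequence). Positions are
    1-based reference coordinates; an insertion is reported at the
    reference base it follows.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_pos = 0
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-":
            # Gap in the reference: bases inserted in the query.
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            variants.append(("ins", ref_pos, qry_aln[i:j]))
            i = j
            continue
        ref_pos += 1
        if q == "-":
            # Gap in the query: reference bases deleted.
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append(("del", ref_pos, ref_aln[i:j]))
            ref_pos += j - i - 1
            i = j
            continue
        if r != q:
            variants.append(("sub", ref_pos, f"{r}>{q}"))
        i += 1
    return variants


def is_frameshift(variant) -> bool:
    """An indel shifts the reading frame if its length is not a multiple of 3."""
    kind, _, seq = variant
    return kind in ("ins", "del") and len(seq) % 3 != 0


# Hypothetical toy alignment, not real SARS-CoV-2 sequence.
ref = "ATGC-GTACGT"
qry = "ATGAAGT--GT"
for v in call_variants(ref, qry):
    print(v, "frameshift" if is_frameshift(v) else "")
```

For the toy alignment this reports one substitution (C>A at position 4), a one-base insertion after position 4, and a two-base deletion starting at position 7. Both indels would shift the reading frame, which is why indel errors are particularly damaging to downstream protein prediction.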
There was a significantly greater number of mutations detected by the MiSeq than by the GridION for identical samples sequenced on both platforms. Although Nanopore platforms have been shown to make a large number of indel errors, in this study the MiSeq had a significantly higher number of insertions than the GridION. Paired-end sequencing, utilized by the Illumina MiSeq, produces twice the number of reads as single-end sequencing for the same sample and library preparation effort. This allows for more accurate read alignment and detection of indel variants (40). Short read lengths have been shown to hinder the assignment of reads to complex parts of the genome, the phasing of variants, and the resolution of repeated regions, and to introduce gaps and ambiguous regions in de novo assemblies. Longer reads can be used for sequencing of extended repetitive regions, allowing for the identification of mutations that are generally associated with disease (41). The difference in indel counts between the two platforms highlights that genomic surveillance using Nanopore sequencing should be conducted cautiously, as incorrect information on a viral strain can be obtained.
The rapid increase in COVID-19 cases has been linked to different SARS-CoV-2 viral lineages (42). Viral lineages are separated based on the number and type of mutations they contain that differ from the parent strain (43). Of the 93 sequences analyzed from both platforms, 27 were classified within different clades. These sequences had unique mutations, and the clade differences noted between the two platforms were 20A – 20C and 20C – 20H (Beta, V2). As the numbers of indels and substitutions produced by the MiSeq and the GridION were significantly different, we can expect differences in clade classification, as viral clades are assigned on the basis of clade-defining mutations (25). Table 3 shows that the GridION sequences have lower coverage than the MiSeq sequences. This may be one of the factors causing a difference in clade assignment, as errors arising from the amplification and sequencing process may result in incomplete genome coverage, which affects phylogenetic inference (44). Rambaut et al. (2020) suggest that new lineages should only be proposed if the genome coverage exceeds 95% of the coding region. Degradation of RNA can result in the introduction of mutations, which may cause a variant change (45). The GridION library for Run116 was prepared simultaneously with that of the MiSeq, and the amount of RNA used was also the same. Therefore, we can eliminate RNA degradation and RNA input amount as factors that may have caused a difference in the variants called by each instrument. Lineages identified by the GridION need to be analyzed further to determine whether the mutations are valid or are a result of sequencing errors. Accurate identification of lineages can assist in identifying transmission chains and allow for the development of diagnostic methods and treatments (42).
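A coverage criterion of the kind discussed above can be checked directly on a consensus genome. The sketch below computes the fraction of unambiguous bases within a coding region and compares it with a caller-supplied threshold. The function names, the toy sequence, and the region coordinates are hypothetical, and the sketch assumes the consensus is in reference coordinates, so that insertions and deletions do not shift positions.

```python
def coding_coverage(consensus: str, start: int, end: int) -> float:
    """Fraction of unambiguous bases (A, C, G, T) in a region.

    start and end are 1-based inclusive coordinates, assumed to match
    the coordinate system of the consensus sequence.
    """
    region = consensus[start - 1:end].upper()
    if not region:
        raise ValueError("empty region")
    called = sum(base in "ACGT" for base in region)
    return called / len(region)


def passes_threshold(consensus: str, start: int, end: int,
                     min_fraction: float) -> bool:
    """True if coverage of the region meets the supplied minimum fraction."""
    return coding_coverage(consensus, start, end) >= min_fraction


# Hypothetical toy consensus: masked ends plus a 4-base dropout ("NNNN").
toy = "NN" + "ACGT" * 4 + "NNNN" + "AC"

print(coding_coverage(toy, 3, 22))          # 16 of 20 bases called -> 0.8
print(passes_threshold(toy, 3, 22, 0.95))   # False
```

Passing the threshold in as a parameter, rather than hard-coding it, keeps the check usable with whichever coverage criterion a given nomenclature or quality control scheme specifies.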