In principle, a combination of phylogeny- and distance-based criteria in the analysis of genomic sequences is required for genotyping a virus (19, 20, 22). To define a subtype of enterovirus, we proposed a distance criterion of over 15% between subtypes at the genomic level and/or at least at the near-complete VP1 gene level (when genomic sequences are not available) together with a well-supported clade (over 75% bootstrap value support). If two or more well-supported clades are formed within the same subtypes, they can be further divided into sub-subtypes based on a cut-off of over 12% genetic distance. To reclassify the subtypes of four dominant HFMD-related enteroviruses EVA71, CAV16, CAV6 and CVA10, we preformed phylogenetic analyses of both full-length genomic sequences and near-complete VP1 gene sequences.
Phylogenetic classification of EVA71
All 922 near full-length genomic sequences of EVA71 available in GenBank were downloaded (on November, 2019). Of them, 913 with a length of more than 7000 nt were subject to sequence alignment. After removing highly homologous sequences with more than 97% sequence identity, 131 representative genomic sequences were used to construct maximum likelihood (ML) tree. The ML tree of the genomic sequences showed that the vast majority of EVA71 strains were clustered within three well-supported large clades (with 100% bootstrap value), and the others formed small clusters or independent phylogenetic branches (Fig. 1A). Together with the genetic distance data (Table 1), EVA71 was classified into seven subtypes, and designated as genotypes A to G (Fig. 1A), in keeping with previous nomenclature order of EVA71 (Fig. 1A). The three large clades correspond to subtypes B, C and G. Among the seven subtypes, subtypes E had only one available full-length genomic sequence. Interestingly, one strain (DL71/CHN/2012) that was clustered between subtypes C and B, was not defined as an independent subtype because it had about 13.4% genetic distances with subtypes C, and was highly suspected to be an inter-subtype recombinant.
Table 1
Mean genetic distance and standard error among genotypes of EVA71.
Subtype | A | B | C | D | E | F | G | Unclassified 1 | Unclassified 2 | Unclassified 3 | Unclassified 4 |
A | NA/0.01 ± 0.00 | 0.215 ± 0.014 | 0.207 ± 0.013 | NA | 0.234 ± 0.014 | NA | 0.208 ± 0.012 | 0.326 ± 0.018 | 0.218 ± 0.013 | 0.231 ± 0.014 | 0.222 ± 0.012 |
B | 0.235 ± 0.006 | 0.09 ± 0.00/0.09 ± 0.00 | 0.185 ± 0.010 | NA | 0.221 ± 0.011 | NA | 0.206 ± 0.010 | 0.317 ± 0.016 | 0.194 ± 0.011 | 0.207 ± 0.012 | 0.215 ± 0.011 |
C | 0.239 ± 0.006 | 0.215 ± 0.005 | 0.07 ± 0.00/0.06 ± 0.00 | NA | 0.197 ± 0.011 | NA | 0.135 ± 0.007 | 0.311 ± 0.017 | 0.200 ± 0.011 | 0.197 ± 0.012 | 0.204 ± 0.011 |
D | 0.226 ± 0.006 | 0.220 ± 0.004 | 0.203 ± 0.004 | 0.06 ± 0.00/NA | NA | NA | NA | NA | NA | NA | NA |
E | 0.225 ± 0.006 | 0.222 ± 0.005 | 0.238 ± 0.005 | 0.191 ± 0.005 | NA/0.09 ± 0.01 | NA | 0.200 ± 0.010 | 0.320 ± 0.017 | 0.208 ± 0.011 | 0.195 ± 0.011 | 0.188 ± 0.010 |
F | 0.224 ± 0.007 | 0.226 ± 0.005 | 0.194 ± 0.005 | 0.192 ± 0.005 | 0.213 ± 0.005 | NA/NA | NA | NA | NA | NA | NA |
G | 0.242 ± 0.005 | 0.235 ± 0.005 | 0.205 ± 0.004 | 0.180 ± 0.004 | 0.235 ± 0.005 | 0.197 ± 0.004 | 0.11 ± 0.00/0.10 ± 0.01 | 0.303 ± 0.016 | 0.202 ± 0.011 | 0.192 ± 0.010 | 0.208 ± 0.010 |
Unclassified 1 | NA | NA | NA | NA | NA | NA | NA | NA/NA | 0.332 ± 0.019 | 0.324 ± 0.017 | 0.324 ± 0.017 |
Unclassified 2 | NA | NA | NA | NA | NA | NA | NA | NA | NA/0.09 ± 0.01 | 0.200 ± 0.011 | 0.205 ± 0.011 |
Unclassified 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA/0.11 ± 0.01 | 0.172 ± 0.009 |
Unclassified 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA/0.11 ± 0.01 |
The data obtained from full-length and near-complete VP1 sequences are shown in lower left and top right quarters, respectively. NA: not available. It means that there are only one, no sequence, or more than two completely identical sequences. |
Whether complete VP1 sequences generated consistent phylogeny of EVA71 with the full-length genomic sequences and met the distance criterion of subtyping is critical to determine the accuracy of VP1 sequence-based classification, albeit it was previously widely used. To assess the performance of VP1 sequence-based classification, complete VP1 sequences from 131 full-length genomic sequences were subjected to further phylogenetic analysis, together with 31 additional near-complete VP1 sequences retrieved from GenBank. According to phylogeny-based and genetic distance-based criteria, five EVA71 subtypes A-D and G could also be classified based on complete VP1 sequences (Fig. 1B and Table 1). Apart from five identified subtypes, three well-supported small clusters, as well as a single branch at the root of the tree were observed in the VP1 tree (Fig. 1B), and met the genetic distance criterion of being new subtypes (Table 1). Because of the lack of available full-length genomic sequences, they were defined as unclassified subtypes 1–4. Their nomenclatures need to be determined in the future.
It is worth noting that the strains of subtypes D and F did not form independent subtype clades in the VP1 tree, but were clustered within the clade of subtype G (Fig. 1B). The inconsistency of phylogeny between full-length genomic and near-complete VP1 sequences suggests that recombination event might have occurred during the evolution of EVA71 subtypes D and F. Bootscan analyses confirmed that subtype D was involved in recombination between subtypes G and C, and subtype F was involved in recombination among subtypes C, D and G (supplementary Fig. S1A and B). In particular, the proportion of permuted trees appeared to be relative low (less than 50%) in the majority of the genomic regions of both subtypes D and F, implying that the recombination events occurred in the distant past. The suspected recombinant DL71/CHN/2012 was found to belong to subtype G in the VP1 region. Bootscan analyses illuminated that DL71/CHN/2012 originated recombination between subtypes G and C, and had a mosaic genome structure of G-C-G-C-G-C-G (Fig. 2A). The recombination pattern was further confirmed by separate phylogenetic analyses (Supplementary Fig. S2). Furthermore, a subtype G strain VR1432/China/2009 was found to be clustered between subtypes C and G in the VP1 tree, suggesting the presence of recombination between subtypes C and G at least in VP1 region (Fig. 1). Bootscan and separate phylogenetic analyses confirmed that VR1432/China/2009 was a recombinant between subtypes C and G with a mosaic genome structure of C-G-C-G-C-G (Fig. 2B and Supplementary Fig. S3). The recombinant strains DL71/CHN/2012 and VR1432/China/2009 were defined as EVA71 recombinant form RF01_CG and RF02_CG, respectively.
On the basis of full-length genomic and/or near-complete VP1 sequences, EVA71 was classified into seven subtypes from A to G, as well as two recombinant forms RF01_CG and RF02_CG. Furthermore, four unclassified subtypes 1–4 were identified based on near-complete VP1 sequences. The mean inter-subtype genetic distances ranged from 18–24.2% at the genome level and 13.5–33.2% at VP1 gene level, and the mean within-subtype distance ranged from 6–11% at the genome level and 1–11% at VP1 gene level (Table 1).
Phylogenetic Classification Of Cva16
All 142 available near full-length genomic sequences of CVA16 were retrieved from GenBank (on November, 2019). Of them, 138 had a genomic sequence of more than 7000 nt, and were used in sequence alignment. After the removal of 36 identical sequences, 102 representative genomic sequences were used in the phylogenetic analysis. Based on the phylogenetic relationship and genetic distance criterion (Fig. 3A and Table 2), CAV16 was classified into A, B, and C subtypes. Although subtype C was defined, it had only one available genomic sequence each, and was called as single genomic sequence subtype. To find more support for the classification of single genomic sequence subtype, and to identify more potential subtypes, a phylogenetic analysis using 102 VP1 sequences of the representative CVA16 strains and 19 additional VP1 sequences was performed. The additional VP1 sequences were selected if they closely clustered with any single genomic sequence subtype to form a well-supported clade, or they were unable to cluster with other cluster within any defined subtypes (A-C) in a preliminary phylogenetic analysis with all available near-complete VP1 sequences (data not shown). In the VP1 tree, six additional VP1 sequences were found to form a well-supported clade together with the representative strain of subtype C with a 100% bootstrap value (Fig. 3B). The subtype C clade completely met the genetic distance criterion of subtyping (Table 2), supporting the classification of subtype C. Furthermore, another five additional sequences formed an independent clade with 87% bootstrap value between the clades of subtypes B and C (Fig. 3B). The genetic distance analysis showed that the new clade had a mean distance of 12.2% to the subtype B strains (Table 2). Because of the lack of genomic sequences, the new clade was marked as unclassified (Fig. 3). As a result, three subtypes A, B, and C were identified for CVA16. In addition, a potential subtype was identified based on the near-complete VP1 analysis. The subtype B had mean genetic distances of 29.4% and 16.3% to subtypes A and C, and the unclassified subtype had mean genetic distances of 29.8%, 0.122 and 16.4% to subtypes A, B and C, respectively (Table 2). According to the classification of CVA16, the vast majority of the circulating strains belonged to subtype B.
Table 2
Mean genetic distance and standard error among genotypes of CVA16.
Subtype | A | B | C | Unclassified |
A | NA/NA | 0.294 ± 0.015 | 0.313 ± 0.018 | 0.298 ± 0.016 |
B | 0.257 ± 0.006 | 0.08 ± 0.00/0.07 ± 0.00 | 0.163 ± 0.010 | 0.122 ± 0.008 |
C | 0.266 ± 0.005 | 0.208 ± 0.004 | NA/0.05 ± 0.00 | 0.164 ± 0.011 |
Unclassified | NA | NA | NA | NA/0.07 ± 0.01 |
The data obtained from full-length and near-complete VP1 sequences are shown in lower left and top right quarters, respectively. NA: not available. It means that there are only one, no sequence, or more than two completely identical sequences. |
Phylogenetic Classification Of Cva6
A total of 230 near full-length genomic sequences of CVA6 were available in GenBank (on November, 2019), and 229 of them had a genomic sequence of more than 7000 nt. After the removal of 160 identical sequences, 69 full-length genomic sequences were used in the phylogenetic analysis. Simultaneously, the VP1 sequences of the 69 representative strains were also subjected to the phylogenetic analysis together with 17 additional VP1 sequences that might support the subtype with single genomic sequence or form potential new subtypes. Based on the phylogenetic relationship and genetic distance criterion (Fig. 4A and Supplementary Table 3), CAV6 was classified into three subtypes (A, B and C) at genomic sequence level. Of note, two independent small clusters formed by 5 strains with more than 99% bootstrap value support, and an independent branch by a strain (NIV43883/India/2004) were observed in the VP1 tree (Fig. 4B). These new clusters and branch were dispatched to unclassified subtypes 1, 2 and 3 (Fig. 4B and Table 3). Therefore, CVA6 was classified into three subtypes A-C. The mean inter-subtype genetic distances were all over 15% regardless of whether being analyzed at genomic or VP1 sequence levels (Table 3).
Table 3
Mean genetic distance and standard error among genotypes of CVA6.
Subtype | A | B | C | Unclassified 1 | Unclassified 2 | Unclassified 3 |
A | NA/0.00 ± 0.00 | 0.183 ± 0.015 | 0.206 ± 0.015 | 0.319 ± 0.020 | 0.185 ± 0.014 | 0.197 ± 0.015 |
B | 0.228 ± 0.007 | NA/NA | 0.186 ± 0.014 | 0.307 ± 0.020 | 0.193 ± 0.014 | 0.167 ± 0.013 |
C | 0.232 ± 0.006 | 0.185 ± 0.005 | 0.09 ± 0.00/0.07 ± 0.00 | 0.324 ± 0.019 | 0.191 ± 0.013 | 0.177 ± 0.012 |
Unclassified 1 | NA | NA | NA | NA/NA | 0.322 ± 0.019 | 0.338 ± 0.020 |
Unclassified 2 | NA | NA | NA | NA | NA/0.07 ± 0.01 | 0.195 ± 0.014 |
Unclassified 3 | NA | NA | NA | NA | NA | NA/0.10 ± 0.01 |
The data obtained from full-length and near-complete VP1 sequences are shown in lower left and top right quarters, respectively. NA: not available. It means that there are only one, no sequence, or more than two completely identical sequences. |
Phylogenetic Classification Of Cva10
A total of 127 near full-length genomic sequences of CAV10 were downloaded from GenBank (on November, 2019), and 124 that had a genomic sequence of more than 6800 nt were used. After the removal of 22 identical sequences, 102 representative genomic sequences were used in the phylogenetic analysis. Nine subtypes/sub-subtypes (A-G and H1 and H2) were identified for CVA10 based on the phylogenetic and genetic distance analyses of genomic sequences (Fig. 5A and Table 4). There was only one representative genomic sequence available for subtypes A-B, and D-G. To confirm the classification of CVA10, phylogenetic analyses of VP1 sequences were performed using those from the representative strains and 19 additional VP1 sequences retrieved from GenBank. The classification of CVA10 was well supported by the VP1 analyses regardless of using either phylogeny or genetic distances (Fig. 5B and Table 4). In particular, additional representative VP1 sequences were found to support the definitions of single genomic sequence subtypes A, B, G and F. According to the classification of CVA10, the vast majority of the circulating strains belonged to subtypes C and H (H1 and H2).
Table 4
Mean genetic distance and standard error among genotypes of CVA10.
Subtype | A | B | C | D | E | F | G | H1 | H2 |
A | NA/0.00 ± 0.00 | 0.292 ± 0.021 | 0.283 ± 0.020 | 0.269 ± 0.020 | 0.307 ± 0.023 | 0.289 ± 0.020 | 0.270 ± 0.019 | 0.299 ± 0.021 | 0.272 ± 0.019 |
B | 0.240 ± 0.007 | NA/0.05 ± 0.00 | 0.235 ± 0.018 | 0.236 ± 0.019 | 0.242 ± 0.019 | 0.230 ± 0.017 | 0.236 ± 0.017 | 0.212 ± 0.015 | 0.216 ± 0.015 |
C | 0.250 ± 0.006 | 0.232 ± 0.007 | 0.04 ± 0.00/0.04 ± 0.00 | 0.142 ± 0.013 | 0.176 ± 0.014 | 0.206 ± 0.015 | 0.196 ± 0.014 | 0.206 ± 0.016 | 0.206 ± 0.014 |
D | 0.243 ± 0.005 | 0.227 ± 0.007 | 0.170 ± 0.005 | NA/NA | 0.164 ± 0.015 | 0.186 ± 0.015 | 0.179 ± 0.015 | 0.203 ± 0.016 | 0.212 ± 0.016 |
E | 0.249 ± 0.007 | 0.228 ± 0.006 | 0.177 ± 0.006 | 0.158 ± 0.006 | NA/NA | 0.193 ± 0.015 | 0.196 ± 0.016 | 0.210 ± 0.017 | 0.217 ± 0.016 |
F | 0.270 ± 0.006 | 0.244 ± 0.006 | 0.226 ± 0.007 | 0.227 ± 0.006 | 0.232 ± 0.007 | NA/0.07 ± 0.010 | 0.188 ± 0.014 | 0.207 ± 0.016 | 0.214 ± 0.016 |
G | 0.259 ± 0.006 | 0.246 ± 0.008 | 0.224 ± 0.005 | 0.226 ± 0.006 | 0.219 ± 0.007 | 0.197 ± 0.006 | NA/0.11 ± 0.010 | 0.188 ± 0.014 | 0.191 ± 0.014 |
H1 | 0.270 ± 0.006 | 0.234 ± 0.006 | 0.234 ± 0.005 | 0.235 ± 0.005 | 0.231 ± 0.006 | 0.198 ± 0.006 | 0.195 ± 0.006 | 0.08 ± 0.00/0.05 ± 0.00 | 0.145 ± 0.011 |
H2 | 0.265 ± 0.006 | 0.235 ± 0.005 | 0.231 ± 0.005 | 0.233 ± 0.005 | 0.230 ± 0.005 | 0.198 ± 0.005 | 0.191 ± 0.005 | 0.154 ± 0.004 | 0.12 ± 0.00/0.09 ± 0.010 |
The data obtained from full-length and near-complete VP1 sequences are shown in lower left and top right quarters, respectively. NA: not available. It means that there are only one, no sequence, or more than two completely identical sequences. |
Interestingly, we found an inconsistent location of the clade of subtype B between the ML trees of genomic sequence and VP1 sequence (Fig. 5). The subtype B clade was located at the root of the ML tree of genomic sequences, but it was located at middle, and closely clustered with the clade of subtype H (including H1 and H2) in the VP1 sequence tree. This result implies that recombination events between subtype B and other subtypes have occurred during the evolution of subtype B strains. To confirm this hypothesis, we performed bootscan and phylogenetic analyses. The bootscan analysis showed that several genomic segments from subtypes H and F split the backbone of subtype A into seven segments (supplementary Fig. S1C), which was further confirmed by separate phylogenetic analyses (data not shown). Because subtype B was genetically closely related to subtype A, the subtype A strain used in the bootscan analysis can reflect the early strain of subtype B. Therefore, the bootscan analysis indicated that several subtypes H, F and C-related genomic segmented were inserted into the genomic backbone of subtype B by recombination during the evolution of subtype B.
Comparison Of The New Nomenclature With The Old Ones
According to the new classification, EVA71, CVA16, CVA6 and CVA10 were divided into 7, 3, 3 and 9 subtypes/sub-subtypes in alphabetical order, as well as 4, 1, 3 and 0 unclassified potential subtypes, respectively (Table 5). The major difference between the new and old nomenclatures of EVA71 was that previous C1-C3 and C5 were redefined as subtypes F and G, and C2 to subtype D. The dominant sub-subtype C4 was redefined as subtype C. Previous subtypes D, F and G were defined as unclassified subtypes because of the lack of genomic sequence. For CVA16, the changes were that previous C and B1 strains were merged into new sub-subtype B, and previous subtype D was redefined as subtype C. Previous nomenclature of CVA10 and CVA6 based on partial VP1 sequences seemed to be relatively confusing, and contained some misclassifications. The corresponding relationship of new and old nomenclatures of both enteroviruses are detailed in Table 5. For the ease of use of the new nomenclature system, GenBank accession numbers of reference strains and sequence alignment for each enterovirus are provided in Table 5 and supplementary file, respectively.
Table 5
Comparison of new and old nomenclatures of enteroviruses.
Enteroviruses | New nomenclature | Previous nomenclature | Reference strains |
EVA71 | A | A | U22521, GU434678, AB204853 |
B | B1-B5 | DQ341354, HQ189392, JF738001 |
C | C4 | HQ423142, AF302996, MG207963 |
D | C2 | MG013988, MG672479 |
E | E | MG672478 |
F | C1 | MK800119 |
G | C1-C3,C5 | HQ647173, JN835312, DQ341359 |
RF01_CG | C2 | KF982854 |
RF02_CG | C4 | KC954664 |
Unclassified 1 | NA | KY115200* |
Unclassified 2 | F | HG421068*, HG421069* |
Unclassified 3 | G | KF906417*, KF906416* |
Unclassified 4 | D | KF906421*, KF906419*, KF906425* |
CVA16 | A | A | U05876, JQ746659, EU812514 |
B | C/B1a,b/B1a,b,c | JQ746677, JF738003, JQ746678 |
C | D | MG957117, LT577761*,LT617115* |
Unclassified | B2 | AB465370*, AM292455*, AM292461* |
CVA6 | A | A/B | AY421764, AF081297* |
B | NA | LR027552 |
C | E/F/G | MH716144, JQ946054, LC126146 |
Unclassified 1 | A | KF412903* |
Unclassified 2 | A/B/C | JQ364886*, KP143075*, LC421656* |
Unclassified 3 | C/D/E | JN203517*, JQ364887* |
CVA10 | A | A/F | AY421767, AF081300* |
B | B/C/D | MH118041, KC879535*, HE572987* |
C | C/D/G | HQ728262, KP289406, KU578131 |
D | NA | MF678312 |
E | NA | MF422532 |
F | C/B/F | MF422531, GQ214177*, GQ214175* |
G | C | MH118054, KC879491*, KC879488 |
H1 | C/D | MH144599, MH118066, MH118057 |
H2 | MH118069, MH118036, MH144590 |
* complete VP1 sequence. |