Genome annotation and characterization
Consensus sequences for all 56 samples were successfully obtained by assembly with Geneious Assembler (Geneious Prime version 2023.1.1), using the full genome sequence of a CHIKV isolate from Thailand (Accession number: MN974211.1) as the reference sequence. The total consensus sequence size for each sample ranged from 11 526 bp to 11 810 bp, with varied coverage of the flanking 5’-end and 3’-end untranslated regions. The full-length coding sequence (CDS) of 11 238 nucleotides, which includes two open reading frames (ORFs) and an intergenic region, was successfully sequenced for all samples. The first ORF, of 7 425 nucleotides, encodes the non-structural proteins (nsP1, nsP2, nsP3 and nsP4), and the second ORF, of 3 747 nucleotides, encodes the structural proteins (C, E3, E2, 6K and E1). The intergenic region of 66 nucleotides contains a regulatory sub-genomic promoter for transcription of the structural protein ORF. Six selected genomic sequences were submitted to the NCBI GenBank database (Accession numbers: PP236105.1-PP236110.1). Genomic features were annotated using Geneious Prime (version 2023.1.1) and Clone Manager 9, and an example of an annotated genome (IMR24-21) is shown in Figure 2.
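The segment lengths reported above can be cross-checked with a short calculation. The following is a minimal sketch, not part of the original analysis pipeline: the coordinates are expressed relative to the first nucleotide of the CDS (an assumption made here because the 5’-UTR coverage varies between samples), and the names `SEGMENTS` and `layout` are hypothetical.

```python
# Segment lengths (nucleotides) as reported in the text.
SEGMENTS = [
    ("ORF1 (nsP1-nsP4)", 7425),
    ("intergenic region", 66),
    ("ORF2 (C-E3-E2-6K-E1)", 3747),
]


def layout(segments, start=1):
    """Return (name, start, end) for contiguous segments, 1-based inclusive.

    Coordinates are relative to the first CDS nucleotide (assumption),
    not to the full genome, because UTR lengths differ between samples.
    """
    coords = []
    pos = start
    for name, length in segments:
        coords.append((name, pos, pos + length - 1))
        pos += length
    return coords


coords = layout(SEGMENTS)
total = sum(length for _, length in SEGMENTS)

# Both ORFs should be a whole number of codons (stop codon included).
orf_in_frame = {
    name: length % 3 == 0 for name, length in SEGMENTS if name.startswith("ORF")
}

for name, start, end in coords:
    print(f"{name}: {start}-{end}")
print("CDS length:", total)
```

Under these assumptions the three segments sum to the reported CDS length of 11 238 nucleotides, and both ORF lengths are multiples of three.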
Phylogenetic and mutation analysis
A maximum likelihood (ML) tree constructed with 156 reference sequences revealed that the investigated viruses belong to the ECSA-IOL sub-lineage (Figure 3). Additionally, a Maximum Clade Credibility (MCC) tree was constructed using 147 reference sequences in order to analyse the evolution of the expanding IOL sub-lineage (Figure 4).
Mutation analysis with reference to strain S27 (African prototype) showed the absence of the IOL sub-lineage-initiating E1-A226V mutation, as well as the secondary adaptive mutation E2-K252Q, in contrast to the isolate from the previous major outbreak in Tangkak, Johor in 2008 (Accession number: KT324226.1) (Table 2). The presence of the E1-K211E, E2-V264A and E1-I317V mutations defines the investigated viruses as part of the new E1-K211E/E2-V264A IOL sub-lineage of CHIKV (Figure 4), which comprises wildtype or reverted E1-226A strains [20]. This indicates that the investigated viruses originated from the dominant wildtype or reverted IOL strain detected in India from 2007 to 2010 [24-27], which expanded to the northern Indian subcontinent during the 2016-2017 outbreaks [28-30] before spreading from Bangladesh to Thailand in 2017 [31]. The characterized viruses also carry the nsp2-E145D and nsp4-S55N mutations, which define the Indian subcontinent/Southeast Asia clade (IS clade). In addition, the presence of the geographically distributed nsP2-N495S and C-K73R mutations among the investigated CHIKVs classifies these viruses into the Southeast Asia subclade of the IS clade [20]. Similar to the Thailand 2018-2019 outbreak viruses, both nsp3-D372E and E2-G205S were also detected in the investigated viruses, which indicates that the 2021 CHIKV E1-226A strain detected in Peninsular Malaysia originated from neighboring Thailand [20].
The current circulation of the E1-226A strain has been attributed to the presence of the epistatic E1-K211E:E2-V264A mutations, which are reported to confer higher fitness in Ae. aegypti through increased viral infectivity, dissemination and transmission [32]. Thus, in contrast to the E1-A226V strains detected in the 2008 outbreak, which carry the Ae. albopictus-adaptive mutation E2-K252Q, the 2021 CHIKV E1-226A strain may have adapted to Ae. aegypti [33, 34]. This suggests a potential vector shift that may have been integral to the recent re-emergence of CHIKV in Malaysia, where urban areas were most affected owing to the high population of Ae. aegypti [35, 36]. Therefore, it is crucial to investigate Ae. aegypti and Ae. albopictus in Malaysian CHIKV hotspots in order to determine whether a CHIKV vector shift is contributing to increased local transmission.
Table 2 Comparison of amino acid differences between the outbreak strain and an isolate from the 2008-2009 outbreak (KT324226.1), identified using GISAID, EpiArbo™, ChikSurver with strain S27 (African prototype) as the reference. Mutations detected in the current outbreak strains but not in KT324226.1 are bolded and underlined, while KT324226.1 mutations that are absent from the current outbreak strains are bolded and marked with asterisks.
| Protein | Regions | Malaysia 2021 outbreak strain | KT324226.1 Chikungunya virus isolate MY/09/5668 |
| --- | --- | --- | --- |
| Non-structural | nsp1 | T128K, L172V, E234K, T376M, M383L, I384L, T481I, Q488R, L507R | T128K, L172V, E234K, T376M, M383L, I384L, T481I, Q488R, L507R |
| Non-structural | nsp2 | S54N, H130Y, E145D, H374Y, N495S, C642Y, S643N | S54N, H374Y, *L539S, C642Y, S643N, *A793V |
| Non-structural | nsp3 | V175I, Y217H, P326S, V331A, T337I, K352E, D372E, I376T, A382T, T441A/V, L461P, S462N, P471S, R524stop | V175I, Y217H, P326S, V331A, T337I, *L340P, K352E, I376T, A382T, L461P, S462N, P471S, R524stop |
| Non-structural | nsp4 | S55N, T75A, R85G, T254A, Q500L, I514T, V555I, V604I | T75A, R82S, T254A, Q500L, I514T, V555I, V604I |
| Structural | C | P23S, V27I, K63R, K73R | P23S, V27I, K63R |
| Structural | E3 | I23T | I23T |
| Structural | E2 | G57K, I74M, G79E, N160T, A164T, L181M, S194G, G205S, I211T, V264A, M267R, S299N, T312M, A344T, S375T, V386A | G57K, I74M, G79E, N160T, A164T, L181M, S194G, I211T, *N218S, *K252Q, M267R, S299N, T312M, A344T, S375T, V386A |
| Structural | 6K | V8I, I54V | V8I, I54V |
| Structural | E1 | K211E, M269V, D284E, I317V, V322A | *A226V, M269V, D284E, V322A |
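The comparison summarised in Table 2 amounts to a set difference between two mutation lists. The sketch below illustrates this for the E1 and E2 rows only, using the entries exactly as listed in the table; the helper names `parse_mutations` and `compare` are hypothetical and this is not the tool used in the study.

```python
def parse_mutations(cell):
    """Split a comma-separated table cell into a set of mutation labels.

    A leading '*' (the table's flag for mutations absent from the
    current outbreak strains) is dropped before comparison.
    """
    return {m.strip().lstrip("*") for m in cell.split(",") if m.strip()}


def compare(outbreak_cell, earlier_cell):
    """Return shared and strain-specific mutations for one protein."""
    outbreak = parse_mutations(outbreak_cell)
    earlier = parse_mutations(earlier_cell)
    return {
        "shared": outbreak & earlier,
        "outbreak_only": outbreak - earlier,
        "earlier_only": earlier - outbreak,
    }


# E1 and E2 rows copied from Table 2 (2021 outbreak strain vs KT324226.1).
e1 = compare(
    "K211E, M269V, D284E, I317V, V322A",
    "*A226V, M269V, D284E, V322A",
)
e2 = compare(
    "G57K, I74M, G79E, N160T, A164T, L181M, S194G, G205S, I211T, V264A, "
    "M267R, S299N, T312M, A344T, S375T, V386A",
    "G57K, I74M, G79E, N160T, A164T, L181M, S194G, I211T, *N218S, *K252Q, "
    "M267R, S299N, T312M, A344T, S375T, V386A",
)

print("E1 outbreak only:", sorted(e1["outbreak_only"]))
print("E1 2008-2009 only:", sorted(e1["earlier_only"]))
print("E2 outbreak only:", sorted(e2["outbreak_only"]))
print("E2 2008-2009 only:", sorted(e2["earlier_only"]))
```

For these two rows, the differences recovered are E1-K211E, E1-I317V, E2-G205S and E2-V264A in the 2021 strain, and E1-A226V, E2-N218S and E2-K252Q in KT324226.1, matching the mutations discussed in the text.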
A new mutation was detected in the outbreak virus: a threonine-to-valine substitution at position 441 of the nsp3 protein (nsp3-T441V), which may help define the current Malaysian IS subclade, with an estimated divergence time between 2018.17 and 2019.61 (Figure 4). Interestingly, the 54 samples received from the southern state of Johor (53 from Tangkak; 1 from Segamat) carry the nsp3-T441V mutation, while the 2 samples received from the central state of Selangor harbor the nsp3-T441A and nsp3-T441A/V mutations, respectively. Alignment of the CHIKV nsp3 431-449 amino acid region among 49 reference sequences from all genotypes revealed that nsp3-441T is conserved in 48 sequences, while a recently sequenced CHIKV isolate from a 2020 case study in Malaysia carries the nsp3-T441A mutation, similar to that detected in the current sample received from Selangor (Table 3) [37]. The CHIKV nsp3 is a multidomain protein consisting of the N-terminal macro domain (1-161 a.a), the Alphavirus-Unique domain (AUD) (162-320 a.a) and the C-terminal hypervariable domain (HVD) (321-550 a.a) [38]. Both the macro domain and the AUD play a crucial role in virus genome replication and transcription. The C-terminal HVD is involved in binding to various host proteins, which contributes to viral pathogenesis [39, 40]. The nsp3-T441A/V mutation detected in the hypervariable domain may therefore affect the interaction of nsp3 with cellular factors, which could in turn influence CHIKV immunomodulation and pathogenicity [38]. Since the alphavirus nsp3 protein is a phosphoprotein, nsp3-T441V/A could potentially remove a threonine phosphorylation site. Whilst the exact phosphorylation sites of the CHIKV nsp3 protein are unknown, it was reported that, unlike in other alphaviruses, the lack of phosphorylation sites significantly diminished virus replication, resulting in attenuated CHIKV mutants [41].
The nsp3-T441A/V mutation also lies within the region of the HVD (411-519 a.a) that serves as the binding site for Four and a half LIM domains protein 1 (FHL1), a host protein that promotes CHIKV infection. Replacement of all phosphorylation sites in this region with alanine was shown to significantly reduce the binding of FHL1, which could potentially diminish virus infection and pathogenesis [42]. Although nsp3-T441A/V may possibly contribute to an avirulent CHIKV strain, further study is needed, since the 2020 Malaysian CHIKV (Accession number: MW557661), which harbors nsp3-T441A, was isolated from an infant suffering from severe Chikungunya disease with encephalopathy and pneumonia [37]. In addition, the nsp3-T441A/V mutation may result in novel interactions between nsp3 and cellular proteins, the study of which may help elucidate nsp3 function and its potential immunomodulatory properties in humans and mosquitoes.
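The placement of position 441 within nsp3 can be checked against the domain boundaries quoted in the text. This is a minimal illustrative sketch that uses only those boundaries (macro domain 1-161, AUD 162-320, HVD 321-550, FHL1-binding region 411-519); the function and constant names are hypothetical.

```python
# nsp3 domain boundaries (amino acid positions, inclusive) as quoted in the text.
NSP3_DOMAINS = [
    ("macro domain", 1, 161),
    ("AUD", 162, 320),
    ("HVD", 321, 550),
]

# Region of the HVD reported in the text as the FHL1 binding site.
FHL1_BINDING_REGION = (411, 519)


def domain_of(position):
    """Return the nsp3 domain containing the position, or None if outside."""
    for name, start, end in NSP3_DOMAINS:
        if start <= position <= end:
            return name
    return None


def in_fhl1_region(position):
    """True if the position lies inside the reported FHL1-binding region."""
    start, end = FHL1_BINDING_REGION
    return start <= position <= end


print(domain_of(441), in_fhl1_region(441))
```

With these boundaries, position 441 falls in the HVD and inside the FHL1-binding region, which is the basis for the functional hypotheses discussed above.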
Table 3 Alignment of CHIKV nsp3 amino acid (431-449) sequences
| Virus-Accession no.-country-year | nsP3 sequence (positions 431 to 449, with position 441 marked) |
| --- | --- |
| CHIKV-PP236105-Malaysia-2021 | RAELCPVVQEVAETRDTA |
| CHIKV-PP236106-Malaysia-2021 | RAELCPVVQEVAETRDTA |
| CHIKV-PP236107-Malaysia-2021 | RAELCPVVQEVAETRDTA |
| CHIKV-PP236110-Malaysia-2021 | RAELCPVVQEVAETRDTA |
| CHIKV-PP236108-Malaysia-2021 | RAELCPVVQEA/VAETRDTA |
| CHIKV-PP236109-Malaysia-2021 | RAELCPVVQEAAETRDTA |
| CHIKV-MW557661-Malaysia-2020 | RAELCPVVQEAAETRDTA |
| CHIKV-KM923920-Malaysia-2015 | RAELCPVVQETAETRDTA |
| CHIKV-KT324226-Malaysia-2009 | RAELCPVVQETAETRDTA |
| CHIKV-KX262997-Malaysia-2009 | RAELCPVVQETAETRDTA |
| CHIKV-MF773568-Malaysia-2008 | RAELCPVVQETAETRDTA |
| CHIKV-KX168429-Malaysia-2009 | RAEQCPAVQETAETRDTA |
| CHIKV-EU703762-Malaysia-2006 | RAEQCPAVQETAETRDTA |
| CHIKV-MN974211-Thailand-2018 | RAELCPVVQETAETRDTA |
| CHIKV-KX619422-India-2014 | RAELCPVVQETAETRDTA |
| CHIKV-MT640256-Thailand-2021 | RAELCPVVQETAETRDTA |
| CHIKV-DQ443544-La Reunion-2006 | RAELCPVVQETAETRDTA |
| CHIKV-FJ807898-Taiwan-2009 | RAELCPVVQETAETRDTA |
| CHIKV-GQ428210-India-2006 | RAELCPVVQETAETRDTA |
| CHIKV-EU244823-Italy-2007 | RAELCPVVQETAETRDTA |
| CHIKV-EF012359-Mauritius-2007 | RAELCPVVQETAETRDTA |
| CHIKV-KC862329-Indonesia-2010 | RAELCPVVQETAETRDTA |
| CHIKV-MF076568-Laos-2013 | RAELCPVVQETAETRDTA |
| CHIKV-HM045810-Thailand-1958 | RAELCPAVQETAETRDTA |
| CHIKV-EF027140-India-1963 | RAELCPAVQETAETRDTA |
| CHIKV-EF452493-Thailand-1962 | RAELCPAVQETAETRDTA |
| CHIKV-HM045813-India-1963 | RAELCPAVQETAETRDTA |
| CHIKV-HM045803-India-1963 | RAELCPAVQETAETRDTA |
| CHIKV-EF027141-India-1973 | RAELCPAVQETAETRDTA |
| CHIKV-HM045788-India-1973 | RAELCPAVQETAETRDTA |
| CHIKV-HM045791-Thailand-1983 | RAELCPAVQETAETRDTA |
| CHIKV-HM045797-Indonesia-1985 | RAELCPAVQETAETRDTA |
| CHIKV-HM045814-Thailand-1975 | RAELCPAVQETAETRDTA |
| CHIKV-HM045800-Philippines-1985 | RAEMCPAVQETAETRDTA |
| CHIKV-HM045790-Philippines-1985 | RAEMCPAVQETAETRDTA |
| CHIKV-HM045796-Thailand-1995 | RAELCPAVQETAETRDTA |
| CHIKV-HM045802-Thailand-1995 | RAELCPAVQETAETRDTA |
| CHIKV-HM045787-Thailand-1995 | RAELCPAVQETAETRDTA |
| CHIKV-HM045789-Thailand-1988 | RAELCPAVQETAETRDTA |
| CHIKV-AB455493-India-2006 | RAELCPVVQETAETRDTA |
| CHIKV-AF369024-Tanzania-1953 | RAELCPVVQETAETRDTA |
| CHIKV-FJ807897-Indonesia-2007 | RAEQCPAVQETAETRDTA |
| CHIKV-KF318729-China-2012 | RAEQCPTVQETAETRDTA |
| CHIKV-FN295483-Malaysia-2006 | RAEQCPAVQETAETRDTA |
| CHIKV-KJ451623-Micronesia-2013 | RAEQCPTVQETAETRDTA |
| CHIKV-HE806461-New Caledonia-2011 | RAEQCPAVQETAETRDTA |
| CHIKV-KF872195-Russia-2013 | RAEQCPAVQETAETRDTA |
| CHIKV-FJ000068-India-2006 | RAELCPVVQETAETRDTA |
| CHIKV-AB860301-Philippines-2013 | RAEQCPTVQETAETRDTA |
| CHIKV-KT327165-Mexico-2014 | RAEQCPTVQETAETRDTA |
| CHIKV-KT327167-Mexico-2014 | RAEQCPTVQETAETRDTA |
| CHIKV-KT327164-Mexico-2014 | RAEQCPTVQETAETRDTA |
| CHIKV-KJ451624-British Virgin Islands-2014 | RAEQCPTVQETAETRDTA |
| CHIKV-KX262991-Saint Martin-2013 | RAEQCPTVQETAETRDTA |
| CHIKV-MH124579-India-2010 | RAELCPVVQETAETRDTA |
| CHIKV-MH124583-India-2016 | RAELCPVVQETAETRDTA |
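The residue at position 441 can be read from each aligned sequence in Table 3 by indexing from the start of the window. The sketch below uses a handful of rows copied from the table and assumes that the first character of each sequence corresponds to position 431; under that assumption the reference sequences show threonine at 441, consistent with the text. The mixed call "A/V" is treated as a single position. The names `tokenize` and `residue_at` are hypothetical.

```python
import re
from collections import Counter

WINDOW_START = 431  # assumed position of the first character in each row


def tokenize(seq):
    """Split an aligned sequence into one token per position.

    A mixed call such as 'A/V' is kept as a single token so that
    downstream positions are not shifted.
    """
    return re.findall(r"[A-Z](?:/[A-Z])*", seq)


def residue_at(seq, position, window_start=WINDOW_START):
    """Return the residue (or mixed call) at an absolute nsp3 position."""
    return tokenize(seq)[position - window_start]


# A subset of rows copied from Table 3 (accession -> aligned sequence).
rows = {
    "PP236105": "RAELCPVVQEVAETRDTA",
    "PP236108": "RAELCPVVQEA/VAETRDTA",
    "PP236109": "RAELCPVVQEAAETRDTA",
    "MW557661": "RAELCPVVQEAAETRDTA",
    "MN974211": "RAELCPVVQETAETRDTA",
}

calls = {acc: residue_at(seq, 441) for acc, seq in rows.items()}
tally = Counter(calls.values())

for acc, call in calls.items():
    print(acc, call)
```

Applied to the full table, the same tally would separate the 2021 Johor sequences (V), the Selangor and 2020 sequences (A or A/V) and the remaining reference sequences (T).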