3.1. Phylogenetic tree analysis
The maximum likelihood phylogenetic tree in Figure 1 shows that two of the SARS-CoV-2 isolates from Sri Lanka (GISAID accession IDs: EPI_ISL_428671 and EPI_ISL_428672), collected on 10th March 2020 and 19th March 2020, respectively, clustered with the isolates from Italy, Germany, France and Mexico that were collected before 10th March 2020. The Sri Lankan isolate EPI_ISL_428673, collected on 31st March 2020, clustered with an isolate obtained on 9th February 2020 from England, while the Sri Lankan isolate EPI_ISL_428670, collected on 16th March 2020, showed the highest evolutionary distance to the SARS-CoV-2 reference sequence originating from Wuhan, China (GenBank accession number: NC_045512).
Figure 1. Phylogenetic analysis of the four SARS-CoV-2 complete genome sequences from Sri Lanka retrieved in this study, together with selected complete genome sequences available from different countries (n=50 genome sequences). Strain names are written as the name followed by the country of origin, GISAID accession number, and sample collection date. GISAID: Global Initiative on Sharing All Influenza Data; HKY: Hasegawa, Kishino, and Yano; MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2. Sequence data were obtained from GISAID. All Sri Lankan SARS-CoV-2 isolates are indicated by red triangles (▲). The main clusters are highlighted in different colors. The Wuhan reference genome is shown in a larger font (GenBank accession number: NC_045512.2). The filled circles represent the main supporting clusters, and bootstrap support values are indicated at the nodes. The tree was built with the best-fitting substitution model (HKY) using MEGA X software [7].
3.2. SNP analysis
Fifteen SARS-CoV-2 genome sequences that mainly clustered with the four Sri Lankan strains were compared with the Wuhan reference to identify viral genome mutations and amino acid variations. The SNPs found along the whole genome are listed in Table 2 (positions are given with respect to the reference sequence; GenBank accession number: NC_045512). The genome sequence of EPI_ISL_428671, from the first local patient, differed from the reference genome at six nt positions, while the other three Sri Lankan sequences, EPI_ISL_428670, EPI_ISL_428672, and EPI_ISL_428673, showed variations at six, five, and four nt positions, respectively (Table 2). The EPI_ISL_428671 and EPI_ISL_428672 strains, which clustered with the European isolates in the main group of the phylogenetic tree, shared three SNPs at positions bps3037, bps14408, and bps23403 (Table 2).
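The comparison against the reference can be illustrated with a minimal Python sketch. It is not the procedure used in this study; it only assumes that the query and reference genomes have already been aligned to equal length, and it reports substitutions with 1-based positions counted on the ungapped reference. The function name `call_snps` and the toy sequences are hypothetical.

```python
def call_snps(ref_aln, query_aln):
    """List substitutions between two rows of the same alignment.

    Returns (reference_position, reference_base, query_base) tuples.
    Positions are 1-based and counted on the ungapped reference, so gap
    columns in the reference (insertions in the query) are skipped.
    Gaps and ambiguous bases (e.g. N) in either row are not reported.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")

    snps = []
    ref_pos = 0
    for ref_base, query_base in zip(ref_aln.upper(), query_aln.upper()):
        if ref_base == "-":
            continue  # no reference coordinate for this column
        ref_pos += 1
        both_called = ref_base in "ACGT" and query_base in "ACGT"
        if both_called and ref_base != query_base:
            snps.append((ref_pos, ref_base, query_base))
    return snps


# Toy alignment (hypothetical data, not real SARS-CoV-2 sequence).
toy_ref = "ATGC-TTACG"
toy_query = "ATGTATTNCA"
print(call_snps(toy_ref, toy_query))  # [(4, 'C', 'T'), (9, 'G', 'A')]
```

Counting positions on the ungapped reference keeps the reported coordinates comparable with those given relative to NC_045512.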
3.3. Amino acid variations
Table 3 shows the corresponding changes in the amino acid positions of the derived proteins (positions are given with respect to the reference sequence; GenBank accession number: NC_045512). Only the SNPs in the Open Reading Frame (ORF) 1ab gene, S gene, ORF 3a gene, M gene, and N gene of the four Sri Lankan whole-genome sequences resulted in amino acid changes at the corresponding positions of the translated proteins, while the remaining SNPs did not change the amino acid sequence (Table 3).
Apart from the first Sri Lankan isolate, collected on 10th March (EPI_ISL_428671), the three other Sri Lankan isolates presented a total of six mutations in the ORF 1ab protein with respect to the reference (Table 3). A mutation was observed in the S protein at the same position, AA614 (bps23403), in both Sri Lankan strains EPI_ISL_428671 and EPI_ISL_428672. A single mutation was observed in the ORF 3a protein of strain EPI_ISL_428673 at position AA251 (bps26144). In the EPI_ISL_428670 strain, the amino acid sequence of the N protein showed one mutation at position AA398 (bps29465), while the EPI_ISL_428671 strain had mutations at positions AA203 (bps28882) and AA204 (bps28883) compared to the reference strain.
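The correspondence between the genome positions (bps) and amino acid positions (AA) quoted above can be checked with a short Python sketch. It assumes the coding-region start coordinates of the S, ORF 3a, and N genes as annotated in NC_045512.2 (21563, 25393, and 28274); these values and the helper name `aa_position` are not taken from this study and should be verified against the GenBank record. ORF 1ab is left out because its ribosomal frameshift makes a single-offset calculation unreliable.

```python
# Assumed 1-based start coordinates of the coding regions in NC_045512.2.
# Verify against the GenBank annotation before relying on them.
GENE_START = {"S": 21563, "ORF3a": 25393, "N": 28274}


def aa_position(gene, nt_position):
    """Convert a 1-based genome position to a 1-based codon (AA) position."""
    offset = nt_position - GENE_START[gene]
    if offset < 0:
        raise ValueError("position lies upstream of the gene start")
    return offset // 3 + 1


# Positions reported in the text for the Sri Lankan isolates.
for gene, nt in [("S", 23403), ("ORF3a", 26144),
                 ("N", 28882), ("N", 28883), ("N", 29465)]:
    print(gene, nt, "-> AA", aa_position(gene, nt))
# S 23403 -> AA 614
# ORF3a 26144 -> AA 251
# N 28882 -> AA 203
# N 28883 -> AA 204
# N 29465 -> AA 398
```

Under these assumed coordinates, the computed codon positions agree with the AA positions reported in the text.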