Geographic distribution of Bangladeshi isolates
A total of 306 SARS-CoV-2 full genomes were submitted from patients representing 20 different districts of Bangladesh (Figure 1). The highest number (n=66) was from the capital city Dhaka, while 20 samples were not specified at the district level.
The evolutionary rate for Bangladeshi isolates
We analysed four structural genes (S, E, M and N) and four major open reading frames (ORF1a, ORF6, ORF7a and ORF8) along with the full genome (Table-1). The mutation rate of the full Bangladeshi genome was 0.49E-3 (95% HPD 0.30E-3 to 0.68E-3) nucleotide substitutions/site/year. Among the ORFs, the highest mutation rate was found for ORF7a (31.8E-3 nucleotide substitutions/site/year) and the lowest for ORF1a (0.59E-3 nucleotide substitutions/site/year), and all of these segments showed negative or stabilizing selection (dN/dS, 0.47 to 0.95). Among the structural genes, the highest mutation rate was 4.59E-3 nucleotide substitutions/site/year for N, and it decreased progressively for E, S and M (3.35E-3, 1.47E-3 and 1.14E-3 nucleotide substitutions/site/year, respectively). Only the envelope protein gene (E) showed positive selection (dN/dS, 1.43), while the other structural protein genes showed negative selection (dN/dS, 0.20 to 0.57).
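The interpretation of the dN/dS ratios reported above can be illustrated with a minimal Python sketch. The function name `selection_regime` and its `tolerance` parameter are hypothetical and are not part of the analysis pipeline used in this study. The sketch only applies the conventional thresholds to a point estimate and does not test whether a ratio differs significantly from 1.

```python
def selection_regime(dn_ds, tolerance=0.0):
    """Classify a dN/dS point estimate using the conventional thresholds.

    dn_ds     : ratio of non-synonymous to synonymous substitution rates
    tolerance : hypothetical half-width around 1 that is treated as neutral
    """
    if dn_ds < 0:
        raise ValueError("dN/dS cannot be negative")
    if dn_ds > 1 + tolerance:
        return "positive"   # excess of non-synonymous change
    if dn_ds < 1 - tolerance:
        return "negative"   # purifying (stabilizing) selection
    return "neutral"


# Values taken from the text: E gene (1.43) and the reported ORF range.
for label, value in [("E", 1.43), ("ORF range, low", 0.47), ("ORF range, high", 0.95)]:
    print(label, selection_regime(value))
```

With the default tolerance of zero, only a ratio of exactly 1 is labelled neutral; a non-zero tolerance is one simple way to avoid over-interpreting estimates that lie very close to 1.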
Mutation in S gene of Bangladeshi isolates
A total of 107 nucleotide changes were found in the Bangladeshi S gene, of which 53 were non-synonymous. Of the 107 substitutions, 98 were present in <1% of isolates and six were present in <2% of isolates (Supplementary Table 4). The nucleotide changes C13T, C882T and A1841G were present in 2.6%, 6.3% and 97.7% of isolates, respectively. Of these, C882T was synonymous, while C13T and A1841G resulted in the amino acid substitutions L5F and D614G, respectively. The non-synonymous change at amino acid position five (L5F) occurred within the same group of amino acids (non-polar leucine to non-polar phenylalanine). At position 614, in contrast, a hydrophilic amino acid, D (aspartate), was substituted by an amphoteric amino acid, G (glycine).
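The correspondence between the nucleotide positions and the amino acid positions given above follows from simple codon arithmetic, sketched below in Python. The sketch assumes that nucleotide positions are 1-based and counted from the first base of the S gene start codon, which is consistent with the pairs reported here (13 and 5, 1841 and 614). The function names are illustrative only.

```python
def codon_number(nt_pos):
    """Return the 1-based codon (amino acid) number for a 1-based gene position."""
    if nt_pos < 1:
        raise ValueError("nucleotide position must be 1 or greater")
    return (nt_pos - 1) // 3 + 1


def position_in_codon(nt_pos):
    """Return 1, 2 or 3: the position of the nucleotide within its codon."""
    if nt_pos < 1:
        raise ValueError("nucleotide position must be 1 or greater")
    return (nt_pos - 1) % 3 + 1


# The three most frequent S gene changes described in the text.
for change, nt_pos in [("C13T", 13), ("C882T", 882), ("A1841G", 1841)]:
    print(change, codon_number(nt_pos), position_in_codon(nt_pos))
```

Under this assumption, C13T falls at the first position of codon 5 and A1841G at the second position of codon 614, whereas C882T falls at the third position of codon 294, where changes are most often synonymous.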
The evolutionary rate of S gene in South-Asia
The evolutionary rate of the S gene varied across regions between 1.34E-3 and 1.84E-3 nucleotide substitutions/site/year, except for Maharashtra, where it was 3.50E-3 (Table-2). The degree of natural selection (dN/dS) acting on the S genes ranged from 0.41 to 0.76 (stabilizing selection), except for Gujarat at 1.0 (neutral selection). The major substitution found in the Bangladeshi S gene (D614G) was also analysed in Indian isolates. Only 15.7% of isolates from New Delhi contained this substitution, and its highest frequency was identified in Gujarat isolates (93.8%), which is comparable with that in Bangladeshi isolates (97.7%).
Mean p-distance of genes from the reference strain Wuhan-Hu-1
Finally, from ‘Dataset-2’, three major genes (ORF1a, S and M) were subjected to genetic distance analysis over time using the p-distance method. For Bangladeshi isolates, the mean p-distance from the reference sequence (Wuhan-Hu-1) increased during the first month and then plateaued for ORF1a, while it increased slightly for the S gene (Figure-2). For the M gene, the p-distance followed a curve, starting with high similarity to the Wuhan-Hu-1 strain, then increasing and finally declining again. A pattern similar to that of Bangladesh was observed for Maharashtra and West Bengal for the S and M genes. All Indian regions showed a slightly upward trend for ORF1a. Only Delhi and Gujarat showed an upward trend for the S and M genes. Notably, for the last two time-points of these two regions, the numbers of isolates were fewer than ten (Supplementary Table-3).
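For readers unfamiliar with the measure, the p-distance is the proportion of aligned sites at which two sequences differ. The following Python sketch shows how a mean p-distance from a reference could be computed for one time-point. It is an illustration under stated assumptions, not the software used for this analysis: sequences are assumed to be pre-aligned and of equal length, sites containing gaps or ambiguous bases are skipped (pairwise deletion, an assumption not stated in the text), and the function names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are ignored
    (pairwise deletion; this handling is an assumption of the sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_p_distance(reference, isolates):
    """Mean p-distance of a group of isolates from one reference sequence."""
    if not isolates:
        raise ValueError("at least one isolate is required")
    return sum(p_distance(reference, s) for s in isolates) / len(isolates)


# Toy example with made-up 8-base sequences (not real SARS-CoV-2 data).
reference = "ACGTACGT"
isolates = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
print(mean_p_distance(reference, isolates))
```

Because the mean is taken over the isolates available at each time-point, estimates based on very few isolates can shift substantially with a single sequence, which is relevant to the time-points with fewer than ten isolates noted above.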