Geographic distribution of SARS-CoV-2 clades
As of July 7, 2020, WHO had reported a total of 11,669,259 confirmed COVID-19 cases and 539,906 deaths [19]. The calculated world case fatality rate (CFR) was 4.63%.
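The CFR quoted above follows directly from the two WHO totals. A minimal sketch of the arithmetic, assuming the CFR is defined here as deaths divided by confirmed cases (the variable names are illustrative):

```python
# WHO totals as of July 7, 2020, taken from the text above.
confirmed_cases = 11_669_259
deaths = 539_906

# Case fatality rate as a percentage: deaths / confirmed cases * 100.
cfr_percent = 100 * deaths / confirmed_cases

print(f"World CFR: {cfr_percent:.2f}%")
```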
The median number of cases reported by the countries that submitted SARS-CoV-2 genomes to the database was 16,799, the median number of deaths was 274, and the median CFR was 2.2%. For each of these epidemiological parameters, contributing countries were divided into two groups according to whether the national value was above or below the median. Of all continents, North America reported the highest number of COVID-19 cases and also had the highest median CFR (6.26%). Most deaths, on the other hand, were reported from Europe.
GR was the most common clade (29.4%), followed by G (23.4%) and GH (21.5%). The less common clades L, S, and V were identified in 6.1%, 7.0% and 6.7% of the submitted genomes, respectively. About 6.0% of the genomes did not cluster into any of the major clades.
Analysis of the distribution of viral clades by continent (Fig 1) showed that clade G was the most common in Africa (52.6%) and Oceania (26.4%). Clade GH was the most common in North America (58.5%). GR was the most common clade in Europe (40.0%) and South America (52.9%), while genomes outside the major clades (clade O) predominated in Asia (34.7%).
The number of coexisting clades was compared between countries with respect to the disease epidemiology parameters: the number of cases, the total number of deaths and the CFR. Viral strains belonging to all known clades coexisted in 27 countries (27.0%). Most of these countries also reported above-median values for the studied parameters: 70.3%, 66.6% and 62.9% of them reported an above-median number of cases, number of deaths and CFR, respectively.
The Mann-Whitney test showed a significant difference in the distribution of the number of coexisting clades between the above-median and below-median groups for the total number of cases (P-value = 0.008), total deaths (P-value = 0.038) and CFR (P-value = 0.039). For all three parameters, the median number of coexisting clades was higher in the group of countries with above-median values.
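The group comparison described above can be sketched as follows. This is a minimal pure-Python illustration, assuming a two-sided Mann-Whitney U test with a normal approximation and tie correction (no continuity correction). The per-country clade counts are hypothetical placeholders, not the study data, and the function names are illustrative; the software used for the original analysis is not stated in this section.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test; returns (U for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p_value

# Hypothetical number of coexisting clades per country (NOT the study data).
above_median_cases = [7, 7, 6, 7, 5, 7, 6, 7]
below_median_cases = [3, 4, 2, 5, 3, 4, 6, 2]

u_stat, p_value = mann_whitney_u(above_median_cases, below_median_cases)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

The normal approximation is only reasonable for moderately large groups; for small samples an exact test would be preferable.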
The impact of the distribution of individual clades on the disease epidemiology was also analyzed. Distribution bias of some clades was noted, as shown in Table 1. This was statistically significant for all clades with all disease epidemiology parameters.
Table 1: Geographical distribution of SARS-CoV-2 clades with respect to disease epidemiology parameters
| Geographic Region | G% | GH% | GR% | L% | O% | S% | V% |
|---|---|---|---|---|---|---|---|
| Countries showing above median number of cases | 22.9 | 20.9 | 31.3 | 6.4 | 4.9 | 6.8 | 6.8 |
| Countries showing below median number of cases | 26.8 | 26.9 | 13.7 | 3.2 | 15.8 | 7.9 | 5.6 |
| Countries showing above median number of deaths | 23.1 | 21.7 | 31.3 | 6.3 | 4.1 | 6.7 | 6.8 |
| Countries showing below median number of deaths | 25.4 | 19.6 | 13.6 | 4.0 | 22.5 | 8.9 | 5.9 |
| Countries showing above median CFRs | 23.1 | 21.8 | 31.0 | 6.4 | 4.1 | 6.8 | 6.9 |
| Countries showing below median CFRs | 25.0 | 19.5 | 17.0 | 3.7 | 21.2 | 8.4 | 5.2 |
The distribution bias of different clades among the groups of countries showing above median and below median values for all disease epidemiology parameters was statistically significant (P-value <0.05).
Among all studied cases, the patient's clinical status was specified for only 1,331. Based on the provided data, these cases were grouped into mild/recovered cases (n=1153) and severe/deceased cases (n=178). Only clade GR was identified significantly more frequently among viral genomes isolated from severe/deceased cases (Pearson Chi-Square, P-value < 0.001). In contrast, all other clades showed higher prevalence in mild/recovered cases than in severe/deceased ones. Among them, the difference was statistically significant only for clade S (Pearson Chi-Square, P-value = 0.003) (Table 2).
Table 2: Distribution of SARS-CoV-2 clades with respect to patient’s clinical status
| Patient Status | G% | GH% | GR% | L% | O% | S% | V% |
|---|---|---|---|---|---|---|---|
| Severe/deceased | 21.9% | 24.7% | 31.5% | 5.6% | 12.4% | 2.8% | 1.1% |
| Mild/recovered | 23.0% | 25.2% | 15.1% | 7.4% | 18.0% | 9.5% | 1.9% |
| P-value | 0.751 | 0.901 | <0.001* | 0.398 | 0.062 | 0.003* | 0.761 |
*P-values < 0.05 are statistically significant.
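The per-clade comparisons in Table 2 can be illustrated with a Pearson chi-square test on a 2x2 table (clade versus all other clades, by clinical status). The sketch below is an approximation under stated assumptions: the genome counts are reconstructed by rounding the reported percentages against the group sizes (n=178 and n=1153), so they may differ slightly from the original counts, and the function name is illustrative. No continuity correction is applied.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; returns (chi2, p)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

n_severe, n_mild = 178, 1153  # group sizes reported in the text

def clade_test(pct_severe, pct_mild):
    """Rebuild approximate counts from rounded percentages (an assumption)."""
    severe_in = round(pct_severe / 100 * n_severe)
    mild_in = round(pct_mild / 100 * n_mild)
    return chi_square_2x2(severe_in, n_severe - severe_in,
                          mild_in, n_mild - mild_in)

chi2_gr, p_gr = clade_test(31.5, 15.1)  # clade GR
chi2_s, p_s = clade_test(2.8, 9.5)      # clade S

print(f"GR: chi2 = {chi2_gr:.2f}, p = {p_gr:.2e}")
print(f"S:  chi2 = {chi2_s:.2f}, p = {p_s:.4f}")
```

With these reconstructed counts, both clades fall below the 0.05 threshold, in line with the asterisked entries in Table 2.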
Analysis of the chronological distribution of SARS-CoV-2 clades was done for 59,425 cases for which the date of collection was available. The analysis showed a gradual regression in the number of genomes that belonged to some clades including L, S, and V as well as those not clustered into any of the major clades (clade O). This was accompanied by an expansion in the number of genomes that belonged to others such as G, GR and GH. A slight recent regression was also noted for clade GH compared to G and GR. The global chronological distribution of SARS-CoV-2 clades is shown in Fig 2.
Gender distribution of SARS-CoV-2 clades
The severity of cases was compared between genders in 1,265 cases for which both gender and the patient's clinical status were known. Although severe or deceased cases were more prevalent among male patients than among females, the gender difference was not statistically significant (15.0% versus 12.7%, P-value = 0.248).
Analysis of 20,939 cases (males = 11,292, females = 9,647) for which patient gender was specified showed gender distribution bias for some clades (Table 3). Some clades, such as L and O, were more frequently isolated from males than from females, while others, such as GR and V, were more prevalent in females. This was statistically significant for clades GR and O (P-value < 0.001 for both); the P-value reported for clade L (0.002) also falls below the 0.05 threshold. Clades G, GH and S were nearly equally distributed between the two groups. Further analysis of the patients' clinical status showed that, among female patients, clade GR was significantly more prevalent in severe or deceased cases than in mild or recovered cases (31.8% versus 17.2%, P-value = 0.005). Genomes of the other clade (O) recovered from male patients were more prevalent in mild or recovered cases than in severe or deceased cases, but this difference was not statistically significant (13.4% versus 11.6%, P-value = 0.603).
Table 3: Gender Distribution of SARS-CoV-2 clades
| Gender | G% | GH% | GR% | L% | O% | S% | V% |
|---|---|---|---|---|---|---|---|
| Male | 22.1% | 21.5% | 21.2% | 7.0% | 17.7% | 7.7% | 8.7% |
| Female | 22.8% | 21.9% | 23.9% | 6.0% | 8.1% | 7.8% | 9.5% |
| P-value | 0.217 | 0.515 | <0.001* | 0.002* | <0.001* | 0.990 | 0.055 |
*P-values < 0.05 are statistically significant.
Age distribution of SARS-CoV-2 clades
A significant association was found between age group and the patient's clinical status. The analysis included 1,194 cases for which both patient age and clinical status were known. Age groups were defined as children (up to 18 years), adults (18-64 years) and elderly (65 years or more). Severe/deceased cases were significantly more prevalent in the elderly than in adults (38.1% vs 7.9%, Pearson Chi-Square P-value < 0.001) or in children (38.1% vs 3.0%, Pearson Chi-Square P-value < 0.001). Although severe/deceased cases were more frequently reported among adults than children, this difference was not statistically significant (7.9% vs 3.0%, Fisher's Exact test P-value = 0.158).
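The age grouping described above can be expressed as a simple binning rule. The sketch below is illustrative only: the text assigns 18 years to both the children and the adult range, so the code assumes that children are younger than 18 and that adults are 18 to 64, which may not match the original analysis. The ages in the example are hypothetical.

```python
from collections import Counter

def age_group(age_years):
    """Map an age in years to one of the three groups used in the text.

    Assumption: 18-year-olds are counted as adults, because the stated
    ranges overlap at 18.
    """
    if age_years < 18:
        return "children"
    if age_years < 65:
        return "adults"
    return "elderly"

# Hypothetical patient ages (NOT the study data).
example_ages = [4, 17, 18, 33, 52, 64, 65, 79, 88]
group_counts = Counter(age_group(a) for a in example_ages)

for group in ("children", "adults", "elderly"):
    share = 100 * group_counts[group] / len(example_ages)
    print(f"{group}: {group_counts[group]} ({share:.1f}%)")
```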
The distribution of genomes belonging to different clades across age groups was analyzed among 20,871 cases for which the patient age was specified. These comprised 68.0% adults, 28.7% elderly and 3.4% children. The distribution of genomes belonging to clade G was nearly the same across the groups. Viral isolates whose genomes belonged to clade GR were more frequently isolated from children (27.2% versus 22.0% in adults and 22.8% in the elderly). Genomes belonging to clades GH and O showed higher prevalence in adult patients. The prevalence of GH was 22.1% in genomes from adults, 20.6% in children and 18.6% in the elderly. The other clade (O) was identified in 12.4%, 10.9% and 6.4% of viral genomes recovered from adults, children and elderly patients, respectively. Genomes belonging to clades L, S, and V were isolated at higher percentages from elderly patients (7.5%, 8.6% and 13.7%, respectively). The distribution of SARS-CoV-2 clades in different age groups with respect to patients' clinical status is shown in Fig 3.