General features of the genome sequence of ASFV Uvira B53 strain
The ASFV Uvira B53 strain is a p72 genotype X virus isolated from the spleen of an ASFV-positive, symptomatic domestic pig in South Kivu province, eastern DRC, during outbreaks reported in the Uvira district in 2019 [11]. The genotype X assignment of ASFV Uvira B53 was confirmed by BLAST, which showed the highest percentage identity with existing GenBank ASFV sequences of genotype X (Kenya 1950, GenBank Acc. No. AY261360.1, and Ken05/Tk1, GenBank Acc. No. KM111294.1). The draft assembly of the Uvira B53 genome comprises 180,916 bp assembled in 2 contigs with an N50 of 112,709 bp. Terminal inverted repeats were missing at both ends of the genome sequence, probably because of the limited amount of ASFV material (about 0.1% of total reads) in the sequenced spleen DNA sample. The genome sequence has a GC content of 38.5%, which is comparable to that of other genotype X strains, i.e. Ken05/Tk1 (38.3%) and Kenya 1950 (38.4%). Sequence annotation using GATU software [14] revealed a total of 168 protein-coding genes (Table 1). In total, 46 multigene family (MGF) genes were identified: MGF 100 (3 members), MGF 110 (13 members), MGF 300 (3 members), MGF 360 (18 members) and MGF 505 (9 members). However, three MGF 360 members were deleted: MGF 360-2L, and MGF 360-21R and MGF 360-1L in the right variable region. Another MGF 360 member in the left variable region (MGF 360-1L) was truncated, retaining only 95 bp from the 5’ end (compared with the 1071 bp of the complete ORF), and is likely non-functional. In addition, four MGF 110 members (MGF 110-4L, MGF 110-7L, MGF 110-8L and MGF 110-9L) and one MGF 100 member (MGF 100-1R) were missing from the Uvira B53 genome.
Table 1 Summary of the Uvira B53 ASFV genomic sequencing data

| Genome assembly | Number of contigs | Largest contig (bp) | Total length (bp) | % GC content | N50 (bp) | L50 | ORFs |
|---|---|---|---|---|---|---|---|
| ASFV Uvira B53 | 9 | 112,709 | 180,916 | 38.5 | 112,709 | 1 | 168 |
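The assembly statistics above follow from simple definitions, which the following minimal sketch illustrates. It is not the pipeline used in the study. The function names are hypothetical, and the contig list assumes the two-contig assembly described in the text, with the second contig length inferred as the reported total minus the largest contig.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        # N50 is the length of the contig at which the running sum
        # first reaches half of the assembly; L50 is its rank.
        if running >= half_total:
            return length, rank
    raise ValueError("contig_lengths must not be empty")


def gc_percent(sequence):
    """Return GC content as a percentage of called (A/C/G/T) bases."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")  # ignore N and gaps
    if called == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100 * (seq.count("G") + seq.count("C")) / called


# ASSUMPTION: two contigs. Only the total (180,916 bp) and the largest
# contig (112,709 bp) come from the text; the second length is inferred.
contigs = [112_709, 180_916 - 112_709]
n50, l50 = n50_l50(contigs)
print(f"N50 = {n50:,} bp, L50 = {l50}")

# Toy sequence, for illustration only.
print(f"GC = {gc_percent('ATGCGATTAACC'):.1f}%")
```

Under that assumption the largest contig alone covers more than half of the assembly, which is why N50 equals the largest contig length and L50 is 1.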
Initial comparative analysis of the Uvira B53 strain genome with other ASFV strains
A total of seventeen complete ASFV genome sequences, representing strains reported in several ASF-endemic countries, were retrieved from GenBank and used in this study for comparison and phylogeny. The GenBank accession number, country and year of isolation, virus genotype, host, global alignment percentage identity to the genome of ASFV strain Uvira B53, and reference for each genome are shown in Table 2. Pairwise alignment between the Uvira B53 strain and the other ASFV genomes showed the highest percentage identity with strains of genotype X, specifically Kenya 1950 (98.85%) and Ken05/Tk1 (95.52%) (Table 2).
Table 2 Comparison of complete genome sequences of Uvira B53 ASFV with selected genomes from the GenBank

| Strain name | GenBank Acc. No. | Country | Year | p72 genotype | Host | % identity to Uvira B53 | Reference |
|---|---|---|---|---|---|---|---|
| Tengani 62 | AY261364 | Malawi | 1962 | V/I | Pig | 86.6 | [17] |
| Georgia 2007/1 | FR682468 | Georgia | 2007 | II | Pig | 85.4 | [18] |
| Kenya 1950 | AY261360 | Kenya | 1950 | X | Pig | 98.8 | [19] |
| Kenya05/Tk1 | NC044945 | Kenya | 2005 | X | Tick | 95.3 | [20] |
| Benin 97/1 | NC044956 | Benin | 1997 | I | Pig | 86.2 | [21] |
| Ken06.Bus | KM111295 | Kenya | 2006 | IX | Pig | 92.2 | [20] |
| R35 | MH025920 | Uganda | 2015 | IX | Pig | 91.9 | Unpublished |
| N10 | MH025919 | Uganda | 2015 | IX | Pig | 91.9 | Unpublished |
| Pretorisuskop/96/4 | AY261363 | South Africa | 1996 | XX/I | Tick | 86.1 | [19] |
| Warthog | AY261366 | Namibia | 1980 | IV | Warthog | 86.5 | [19] |
| R8 | MH025916 | Uganda | 2015 | IX | Pig | 91.9 | Unpublished |
| Mkuzi 1979 | NC044953 | South Africa | 1979 | I/VII | Tick | 85.7 | [19] |
| Belgium 2018/1 | LR536725 | Belgium | 2018 | II | Wild boar | 85.4 | [22] |
| RSA_2_2004 | MN641877 | South Africa | 2004 | XX | Wild boar | 84.6 | Unpublished |
| Zaire | MN630494 | Zaire | 2020 | XX | Pig | 85.4 | Unpublished |
| Pig/China/Cas19-01/2019 | MN172368 | China | 2019 | II | Pig | 85.4 | [23] |
| 85/Ca/1985 | MN270973 | Italy | 1985 | I | Pig | 86.3 | [24] |
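To make the percentage identity values in Table 2 concrete, the sketch below shows one common way to score a pairwise alignment. It is an illustration under stated assumptions, not the method used in the study: the function name is hypothetical, the sequences are toy data, and gap columns are counted as mismatches, whereas alignment tools differ in how they treat gaps and which denominator they use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns.

    Columns that are gaps in both sequences are skipped; a gap opposite
    a residue counts as a mismatch (an assumption, see the lead-in).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100 * matches / len(columns)


# Toy alignment: 10 columns, 8 identical, 2 gap-versus-base columns.
print(percent_identity("ATGC-TAGGA", "ATGCCTA-GA"))
```

Because the result depends on the gap convention, values from different tools are only comparable when the same definition is used throughout.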
Genome comparison of ASFV Uvira B53 with reported ASFV genotype X strains
At 180,916 bp, the newly determined Uvira B53 genome is about 10-13 kbp shorter than the two reference genotype X genomes in GenBank, i.e. Kenya 1950 (193,886 bp) and Ken05/Tk1 (191,058 bp). Nevertheless, Uvira B53 was genetically closer to Kenya 1950, a pig-derived strain (98.8% DNA identity), than to Ken05/Tk1, a tick-derived strain (95.3% identity) (Table 2). A visual representation of the whole-genome alignment of homologous genes among the ASFV genotype X strains, generated using the Viral Orthologous Cluster V.2.0 [14], is shown in Figure 1. Sequence alignment showed that the length difference between these genomes is due mainly to the absence of some genes in Uvira B53, particularly members of the five multigene families (MGF 100, MGF 110, MGF 300, MGF 360, and MGF 505). In summary, 10 genes were not present in the Uvira B53 genome: 8 MGF members (MGF 360-2L, MGF 110-4L, MGF 110-7L, MGF 110-8L, MGF 100-1R, MGF 110-9L, MGF 360-21R and MGF 360-1L), DP96R, which encodes the UK protein [25], and p285L, of unknown function (Table 3). More specifically, MGF 360-1L and MGF 360-21R were absent from the right terminal region of the Uvira B53 genome but present in the other genotype X strains. Likewise, MGF 110-4L, MGF 110-7L, MGF 110-8L, MGF 110-9L and MGF 100-1R were absent in the Uvira B53 strain but present in the two reference genotype X strains. In contrast, MGF 110-5L was absent in the Kenya 1950 isolate but present in Uvira B53 and Ken05/Tk1 (Table 3).
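The "10-13 kbp shorter" figure can be checked directly from the three genome lengths quoted above. The short sketch below does only that arithmetic; the variable names are illustrative.

```python
# Genome lengths in bp, as reported in the text.
genome_lengths_bp = {
    "Uvira B53": 180_916,
    "Kenya 1950": 193_886,
    "Ken05/Tk1": 191_058,
}

query = "Uvira B53"
shortfall_bp = {
    strain: length - genome_lengths_bp[query]
    for strain, length in genome_lengths_bp.items()
    if strain != query
}

for strain, diff in shortfall_bp.items():
    print(f"{query} is {diff:,} bp ({diff / 1000:.1f} kbp) shorter than {strain}")
```

The differences are 12,970 bp relative to Kenya 1950 and 10,142 bp relative to Ken05/Tk1.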
Table 3 ORFs present in Kenya 1950 and Ken05/Tk1 but absent in Uvira B53

| ORF name | Uvira B53 | Kenya 1950 | Ken05/Tk1 |
|---|---|---|---|
| DP96R | - | + | + |
| MGF 100-1R | - | + | + |
| MGF 110-4L | - | + | + |
| MGF 110-7L | - | + | + |
| MGF 110-8L | - | + | + |
| MGF 110-9L | - | + | + |
| MGF 360-1L (right) | - | + | + |
| MGF 360-21R (right) | - | + | + |
| MGF 360-2L | - | + | + |
| p285L | - | + | + |
| MGF 110-5L^a | + | - | + |

Abbreviations: (+), ORF present; (-), ORF not present. ^a MGF 110-5L is present in Uvira B53 but absent in Kenya 1950.
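The presence/absence pattern in Table 3 can be treated as a small lookup table, as in the sketch below. The data are transcribed from Table 3; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
# ORF -> (Uvira B53, Kenya 1950, Ken05/Tk1); True means the ORF is present.
presence = {
    "DP96R": (False, True, True),
    "MGF 100-1R": (False, True, True),
    "MGF 110-4L": (False, True, True),
    "MGF 110-7L": (False, True, True),
    "MGF 110-8L": (False, True, True),
    "MGF 110-9L": (False, True, True),
    "MGF 360-1L (right)": (False, True, True),
    "MGF 360-21R (right)": (False, True, True),
    "MGF 360-2L": (False, True, True),
    "p285L": (False, True, True),
    "MGF 110-5L": (True, False, True),
}

# ORFs found in both reference strains but not in Uvira B53.
missing_in_uvira = [
    orf for orf, (uvira, kenya1950, ken05) in presence.items()
    if not uvira and kenya1950 and ken05
]
mgf_missing = [orf for orf in missing_in_uvira if orf.startswith("MGF")]

# ORFs found in Uvira B53 but not in Kenya 1950.
missing_in_kenya1950 = [
    orf for orf, (uvira, kenya1950, _ken05) in presence.items()
    if uvira and not kenya1950
]

print(len(missing_in_uvira), "ORFs absent in Uvira B53;", len(mgf_missing), "are MGF members")
print("Absent only in Kenya 1950:", missing_in_kenya1950)
```

This reproduces the counts given in the text: 10 ORFs absent in Uvira B53, of which 8 are MGF members, and a single ORF (MGF 110-5L) absent only in Kenya 1950.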
Overall, of the 168 ORFs identified in the Uvira B53 genome, 134 shared 100% identity with homologs in Ken05/Tk1, whereas the other 34 were polymorphic (50.6% to 99.6% identity). Uvira B53 and Kenya 1950 shared 167 ORFs, including 136 with 100% sequence identity and 31 that were divergent (74.4% to 99.7% sequence identity); the remaining Uvira B53 ORF (MGF 110-5L) was absent in Kenya 1950. Altogether, the 168 ORFs in the Uvira B53 genome could be clustered into two main groups: 131 conserved and 37 non-conserved ORFs (Table 4).
Conserved ORFs
The conserved category included 131 Uvira B53 ORFs whose proteins showed 100% amino acid identity with those of the two reference ASFV genotype X strains analyzed. Some of them encode structural proteins; transcription, replication and processing factors; enzymes; and proteins involved in nucleotide metabolism and DNA repair. Several other ORFs were classified as coding for membrane proteins, 16 of which are MGF members. Also in the conserved category were the protein-coding genes A238L (an IkB-like protein), H339R (a viral protein involved in host-virus interaction), E301R (proliferating cell nuclear antigen), B263R (the TATA box binding protein), A179L (the Bcl-2-like apoptosis-regulating protein) and E120R (the DNA-binding structural protein). Furthermore, the conserved category included several uncharacterized ORFs, such as F317L and H171R (data not shown).
Non-conserved, variable ORFs
Sequence comparison revealed that 37 Uvira B53 ORFs were polymorphic relative to Ken05/Tk1, Kenya 1950, or both strains, with 50.6% to 99.6% sequence identity (Table 4). In comparison with Ken05/Tk1, the non-conserved ORFs included seven proteins with a putative signal peptide or transmembrane region (B169L, C84L, CP123L, E146L, I177L, I196L, and X69R), one belonging to helicase superfamily II (A859L), the structural protein p54 (E183L), the lectin-like protein (EP153R), the CD2 homolog (EP402R), an ERCC4 predicted nuclease with a potential death domain (EP364R), a NifS-like PLP-dependent transferase (QP383R), twelve MGF members, comprising five of MGF 110 (2L, 5L, 6L, 11L, and 13L), two of MGF 300 (2R and 4L), four of MGF 360 (6L, 8L, 13L and 18R) and one of MGF 505 (1R), and eight ORFs of unknown function.
With respect to the Kenya 1950 strain, we identified 31 ORFs with 74.4% to 99.7% sequence identity (Table 4). One ORF (MGF 110-5L) was missing in Kenya 1950 but present in the other strains. Similarly, two ORFs (I10L and MGF 505-4R) were absent in Ken05/Tk1 but present in the two other genotype X strains.
Table 4 Uvira B53 polymorphic ORFs with respect to Ken05/Tk1 and Kenya 1950

| Gene | Function | % identity to Ken05/Tk1 | % identity to Kenya 1950 |
|---|---|---|---|
| A859L | Helicase superfamily II | 99.4 | 100 |
| B117L | Transmembrane region containing protein | 90 | 77.1 |
| B169L | Putative signal peptide | 91.5 | 98.8 |
| B407L | Unknown | 97.3 | 99.5 |
| B475L | Unknown | 99.6 | 95.8 |
| C84L | Putative signal peptide | 98.5 | 97 |
| CP123L | Putative signal peptide | 100 | 99.2 |
| D129L | Unknown | 86 | 92.7 |
| E146L (j16L) | Putative signal peptide | 98.6 | 100 |
| E183L (p54, j13L) | Structural protein p54 | 98.4 | 91.4 |
| EP153R | Lectin-like protein | 84.3 | 79.5 |
| EP364R | Predicted nuclease and potential DEATH domain | 95.5 | 97.1 |
| EP402R (CD2v) | CD2-like protein | 76 | 96.2 |
| I10L | Unknown | 100 | 97.6 |
| I10L_2 | Unknown | 61.8 | 96 |
| I12R | Unknown | 88.3 | 78.7 |
| I177L (k14L) | Putative signaling peptide | 90.1 | 100 |
| I196L (k15L) | Putative signaling peptide | 79.4 | 94.9 |
| I8L | Unknown | 62.1 | 100 |
| I9R | Unknown | 97.9 | 99 |
| L60L | Unknown | 52.1 | 74.4 |
| O61R (p12) | Structural protein p12 | 98.4 | 96.8 |
| QP383R (j11R) | NifS-like PLP-dependent transferase | 90.3 | 99.7 |
| X69R | Putative signal peptide | 95.8 | 84.9 |
| MGF 110-11L | 110 multigene | 76.1 | 92.1 |
| MGF 110-13L-14L | 110 multigene | 96.7 | 95.3 |
| MGF 110-2L | 110 multigene | 83.7 | 95.6 |
| MGF 110-5L | 110 multigene | 98.4 | (-) |
| MGF 110-6L | 110 multigene | 65.3 | 95.1 |
| MGF 300-2R | 300 multigene | 50.6 | 100 |
| MGF 300-4L | 300 multigene | 54.4 | 96.4 |
| MGF 360-13L | 360 multigene | 74.8 | 99.2 |
| MGF 360-18R | 360 multigene | 82.5 | 97.7 |
| MGF 360-6L | 360 multigene | 85.5 | 94.9 |
| MGF 360-8L | 360 multigene | 78.7 | 91.1 |
| MGF 505-1R | 505 multigene | 72.5 | 98.1 |
| MGF 505-4R | 505 multigene | 100 | 99.2 |

(-), ORF not present.
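The counts of identical and polymorphic ORFs can be tallied from Table 4 itself. The sketch below transcribes the two identity columns of Table 4 and summarizes each one; the list layout and the `summarize` helper are illustrative and were not part of the original analysis.

```python
# (ORF, % identity to Ken05/Tk1, % identity to Kenya 1950); None = ORF not present.
TABLE4 = [
    ("A859L", 99.4, 100), ("B117L", 90, 77.1), ("B169L", 91.5, 98.8),
    ("B407L", 97.3, 99.5), ("B475L", 99.6, 95.8), ("C84L", 98.5, 97),
    ("CP123L", 100, 99.2), ("D129L", 86, 92.7), ("E146L", 98.6, 100),
    ("E183L", 98.4, 91.4), ("EP153R", 84.3, 79.5), ("EP364R", 95.5, 97.1),
    ("EP402R", 76, 96.2), ("I10L", 100, 97.6), ("I10L_2", 61.8, 96),
    ("I12R", 88.3, 78.7), ("I177L", 90.1, 100), ("I196L", 79.4, 94.9),
    ("I8L", 62.1, 100), ("I9R", 97.9, 99), ("L60L", 52.1, 74.4),
    ("O61R", 98.4, 96.8), ("QP383R", 90.3, 99.7), ("X69R", 95.8, 84.9),
    ("MGF 110-11L", 76.1, 92.1), ("MGF 110-13L-14L", 96.7, 95.3),
    ("MGF 110-2L", 83.7, 95.6), ("MGF 110-5L", 98.4, None),
    ("MGF 110-6L", 65.3, 95.1), ("MGF 300-2R", 50.6, 100),
    ("MGF 300-4L", 54.4, 96.4), ("MGF 360-13L", 74.8, 99.2),
    ("MGF 360-18R", 82.5, 97.7), ("MGF 360-6L", 85.5, 94.9),
    ("MGF 360-8L", 78.7, 91.1), ("MGF 505-1R", 72.5, 98.1),
    ("MGF 505-4R", 100, 99.2),
]


def summarize(values):
    """Count absent, identical (100%) and polymorphic (<100%) entries."""
    present = [v for v in values if v is not None]
    polymorphic = [v for v in present if v < 100]
    return {
        "absent": len(values) - len(present),
        "identical": len(present) - len(polymorphic),
        "polymorphic": len(polymorphic),
        "min": min(polymorphic),
        "max": max(polymorphic),
    }


ken05_summary = summarize([row[1] for row in TABLE4])
kenya1950_summary = summarize([row[2] for row in TABLE4])
print("Ken05/Tk1:", ken05_summary)
print("Kenya 1950:", kenya1950_summary)
```

For the 37 ORFs listed, this gives 34 polymorphic ORFs against Ken05/Tk1 (50.6% to 99.6%) and 31 against Kenya 1950 (74.4% to 99.7%), with one ORF absent in Kenya 1950.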
Comparison of the region between I73R and I329L genes in the Uvira B53 and other genotype X strains
Previous studies have demonstrated that the ASFV genotype X strains reported to date are closely related and widespread in Kenya [20]. Small length variations among the genomes of these strains are mostly due to differences in the number of tandem repeat sequences (TRS), either within genes or within intergenic regions. The intergenic region between the I73R and I329L genes is particularly useful for discriminating between closely related ASFV strains. We therefore assessed the tandem repeat sequence in the intergenic region between those two genes (region 173,611–173,760) in Uvira B53 and the two Kenyan genotype X strains. The multiple sequence alignment revealed marked size variation due to indels. With respect to Ken05/Tk1, Uvira B53 showed a 69 bp deletion, whereas Kenya 1950 featured a 36 bp deletion (Figure 2).
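Deletions of the kind described above can be read off an alignment as runs of gap characters opposite reference bases. The following is a minimal sketch on toy sequences; the function name is hypothetical and the sequences are not the real I73R-I329L intergenic region.

```python
def deletions_vs_reference(aligned_ref, aligned_query):
    """List (ref_position, length) for each run of query gaps opposite
    reference bases. Positions are 0-based in the ungapped reference."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    ref_pos = 0        # position in the ungapped reference
    run_start = None
    run_len = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-" and q == "-":
            if run_len == 0:
                run_start = ref_pos
            run_len += 1
        elif run_len:
            # Any other column type closes the current deletion run.
            runs.append((run_start, run_len))
            run_len = 0
        if r != "-":
            ref_pos += 1
    if run_len:
        runs.append((run_start, run_len))
    return runs


# Toy alignment, for illustration only.
reference = "ACGTACGTACGT"
query = "ACG---GTA-GT"
runs = deletions_vs_reference(reference, query)
print(runs, "total deleted:", sum(length for _, length in runs))
```

Applied to a real alignment of the intergenic region with Ken05/Tk1 as the reference, the summed run lengths would correspond to the deletion sizes reported for each strain.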
Phylogenetic analysis of the complete genomes of ASFV strains and the polymorphic genes
The genetic relationship between the ASFV strains was assessed through multiple sequence alignment of the complete genome sequences of 17 representative ASFV strains retrieved from GenBank. As expected, phylogenetic analysis grouped the viruses into clusters corresponding to their genotypes. Thus, Uvira B53 clustered with the two other ASFV p72 genotype X strains, Kenya 1950 and Ken05/Tk1 (Figure 3). The closest but distinct cluster to this genotype X group was composed of the genotype IX ASFV strains (Ken06.Bus, R8, R35 and N10), whereas the clusters most distantly related to Uvira B53 included genotype XX, with virus strains from DRC (Zaire) and South Africa (Pretorisuskop/96/4 and RSA_2_2004), as well as genotype IV, containing the Namibian warthog strain.
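As a rough cross-check of the clustering, the whole-genome identities in Table 2 can be averaged per p72 genotype. The sketch below does this with values transcribed from Table 2. It is only a crude similarity ranking under that assumption, not a phylogenetic analysis, and it says nothing about tree topology or branch support.

```python
from collections import defaultdict
from statistics import mean

# (strain, p72 genotype, % identity to Uvira B53), transcribed from Table 2.
TABLE2 = [
    ("Tengani 62", "V/I", 86.6), ("Georgia 2007/1", "II", 85.4),
    ("Kenya 1950", "X", 98.8), ("Ken05/Tk1", "X", 95.3),
    ("Benin 97/1", "I", 86.2), ("Ken06.Bus", "IX", 92.2),
    ("R35", "IX", 91.9), ("N10", "IX", 91.9),
    ("Pretorisuskop/96/4", "XX/I", 86.1), ("Warthog", "IV", 86.5),
    ("R8", "IX", 91.9), ("Mkuzi 1979", "I/VII", 85.7),
    ("Belgium 2018/1", "II", 85.4), ("RSA_2_2004", "XX", 84.6),
    ("Zaire", "XX", 85.4), ("Pig/China/Cas19-01/2019", "II", 85.4),
    ("85/Ca/1985", "I", 86.3),
]

by_genotype = defaultdict(list)
for _strain, genotype, identity in TABLE2:
    by_genotype[genotype].append(identity)

# Genotypes ordered from most to least similar to Uvira B53 on average.
ranking = sorted(
    ((genotype, mean(values)) for genotype, values in by_genotype.items()),
    key=lambda item: item[1],
    reverse=True,
)

for genotype, avg in ranking:
    print(f"genotype {genotype:6s} mean identity {avg:.2f}%")
```

Genotype X ranks first and genotype IX second, which is consistent with the order of the two closest clusters described above.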
Furthermore, we examined polymorphic genes among all 18 ASFV strains and carried out a phylogenetic analysis of the four most divergent of these genes: I196L, KP177L, EP153R and I177L. The Uvira B53 protein variants for the I196L and I177L genes clustered with those of the Kenyan strains Kenya 1950 and Ken05/Tk1, apart from the 15 other strains analyzed (Figure 4A and D). In contrast, the protein variants encoded by the two other genes, KP177L and EP153R, showed hypervariable regions among the strains and separated the three ASFV genotype X strains into different clusters (Figure 4B and C). The KP177L gene product clustered Uvira B53 (genotype X) together with Benin 97/1 and 85/Ca/1985 (genotype I). The KP177L gene was absent in the Ugandan genotype IX strains (R8, R35 and N10). The EP153R protein grouped Uvira B53 with the Ugandan strains (R8, R35, N10) and the Kenyan strain Ken06.Bus, although with low bootstrap support (38%) at the node (Figure 4C).
Amino acid sequence comparison of EP402R (CD2v) and serotyping
To determine the hemadsorption inhibition (HAI) and serogroup characteristics, the protein sequence of the Uvira B53 EP402R gene was compared with those of 13 other ASFV strains retrieved from GenBank and representing the 8 serogroups known to date. The results revealed high sequence variation in the CD2v protein among all the strains. The Uganda (KM609361) strain of serogroup 7 was the most closely related to Uvira B53, displaying 99% amino acid identity (the Uvira B53 CD2v is 373 amino acids long, three residues longer than its Uganda counterpart; data not shown). This suggests that Uvira B53 belongs to serogroup 7, making it the second ASFV serogroup 7 strain reported to date.
The C-terminal end of CD2v is characterized by a tandem repeat sequence (TRS) of six amino acids, PPPKPC. Comparative analysis of the partial TRS region showed sequence diversity due to amino acid substitutions and deletions (Figure 5). However, the O-77 and STP-1 strains, both of serogroup 4, contained no indels in this region and thus displayed the longest TRS. For the Uvira B53 and Uganda (KM609361) strains, this partial TRS was identical.
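Counting back-to-back copies of the PPPKPC unit is one simple way to compare TRS lengths between strains. The sketch below is illustrative only: the function name is hypothetical, the sequence is a made-up toy rather than a real CD2v C-terminus, and only exact copies of the unit are counted, so repeats carrying substitutions would break a run.

```python
import re


def longest_tandem_run(protein, unit="PPPKPC"):
    """Return the largest number of consecutive exact copies of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", protein)
    return max((len(run) // len(unit) for run in runs), default=0)


# Toy sequence: three exact copies, one substituted copy (SPPKPC),
# then a single exact copy.
toy = "MKT" + "PPPKPC" * 3 + "SPPKPC" + "PPPKPC" + "LL"
print(longest_tandem_run(toy))
```

In the toy sequence the substituted copy splits the repeat array, so the longest exact run is three units; a comparison that tolerates substitutions would need an alignment-based approach instead.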