RhCMV BACs and ORF nomenclature
Following isolation from the urine of a rhesus macaque in 1968, a parental virus (RhCMV strain 68-1; RhCMV68−1) was subjected to extensive and largely undocumented passage in cultured fibroblasts of human or rhesus macaque origin16. A stock of the resulting virus was used to construct a primary BAC (RhCMV68−1 BAC)17, from which all available RhCMV BACs are derived. A succession of studies has shown that RhCMV68−1 and RhCMV68−1 BAC are highly mutated18–23, with the detail having been revealed progressively by the genome sequences of RhCMV68−1, RhCMV68−1 BAC, derivatives of RhCMV68−1 BAC, viruses generated from RhCMV68−1-based BACs, and other RhCMV strains (Table 1). As a result, RhCMV68−1 and RhCMV68−1 BAC lack the functions of many genes required for cellular tropism and fitness in vivo. It was necessary to repair these mutations in order to create a candidate for testing as a transmissible vaccine. Achieving this involved making multiple small- and large-scale repairs to RhCMV68−1/EBOV BAC and carrying out Illumina-based complete genome sequencing at each stage to monitor fidelity.
Table 1
GenBank accession no. | Parental strain | Source of sequence | Reference
AY186194.1 | 68-1 | Isolated virus | Hansen et al., 200324
JQ795930.1 | 68-1 | BAC | Malouli et al., 201220
MF468139.1 | 68-1 | BAC-derived virus | Hansen et al., 20188
MF468140.1 | 68-1 | BAC | Hansen et al., 2018
MF468141.1 | 68-1 | BAC-derived virus | Hansen et al., 2018
MF468142.1 | 68-1 | BAC-derived virus | Hansen et al., 2018
MF468143.1 | 68-1 | BAC-derived virus | Hansen et al., 2018
MF468144.1 | 68-1 | BAC-derived virus | Hansen et al., 2018
MF468145.1 | 68-1 | BAC-derived virus | Hansen et al., 2018
MF468146.1 | 68-1 | BAC-derived virus | Hansen et al., 2018
MF468147.1 | 68-1 | BAC | Hansen et al., 2018
MK937070.1 | 68-1 | BAC | Marshall et al., 201942
MN437483.1 | 68-1 | BAC | Hansen et al., 20094
MT157325.1 | 68-1 | BAC | Taher et al., 202023
MT157326.1 | 68-1 | BAC | Taher et al., 2020
MT157327.1 | 68-1 | BAC | Taher et al., 2020
MZ517252.1b | 68-1 | BAC | Present study
MZ517253.1c | 68-1 | BAC | Present study
DQ120516.1 | 180.92 | Isolated virus | Rivailler et al., 200618
KX689267.1 | 19262 | Isolated virus | Burwitz et al., 201621
KX689268.1 | 19936 | Isolated virus | Burwitz et al., 2016
KX689269.1 | 24514 | Isolated virus | Burwitz et al., 2016
MT157328.1 | 34844 | Isolated virus | Taher et al., 2020
MT157329.1 | KF03 | Isolated virus | Taher et al., 2020
MT157330.1 | UCD52 | Isolated virus | Taher et al., 2020
MT157331.1 | UCD59 | Isolated virus | Taher et al., 2020
MZ517254.1d | 180.92 | Isolated virus | Present study
aSequences are listed in order of parental strain and then GenBank accession no.
bParental RhCMV68−1 BAC used in the present study.
cRhCMV68−1/EBOV/RL11G+ BAC generated in the present study.
dFull-length sequence generated in the present study; DQ120516.1 has a large deletion.
The original nomenclature for RhCMV68−1 ORFs was established in 2003 and consisted of the prefix rh followed by a number (GenBank accession no. AY186194.1)24. This nomenclature was modified and extended in 2012 by comparison with the sequence of RhCMV68−1 BAC (GenBank accession no. JQ795930.1)20. As this nomenclature related only to RhCMV and not other CMVs, a comparative analysis in 2006 of the sequence of RhCMV strain 180.92 (GenBank accession no. DQ120516.1)18 was used to develop a partially inclusive system in which RhCMV ORFs conserved in human CMV (HCMV) were given names corresponding to those in HCMV. A fully inclusive system applying across sequenced primate CMVs was developed in 2011 (GenBank accession no. FJ483968.2)22, when the RhCMV genome annotation was improved further and orthologous ORFs in different CMVs were denoted by the same name. The principal names were those of HCMV ORFs, supplemented by those of ORFs specific to Old World monkey CMVs, which are prefixed by the letter O. This nomenclature is used below and in the genetic map of the final product of the HHi-FiVe pipeline (Figure 1). In addition, when available, the alternative names are provided in Table 2, and the 2012 names are specified below in parentheses after first use of an inclusive name. Nucleotide descriptions are given in relation to the genome sequence regardless of ORF orientation.
Table 2
Steps in repairing inactivated ORFs in RhCMV68−1/EBOV BAC.
Step | ORF | 2003 ORFa | 2006 ORFb | 2012 ORFc | Mutationd | Repaird | Locatione
1 | UL36 | rh61&rh60 | rhUL36 | Rh61/Rh60 | Frameshifted in T8 | Replaced by T7 | 48669-48675
2 | UL146C | NP | NP | NP | Wholly deleted | Replaced | 167033-171891
2 | UL146D | NP | NP | NP | Wholly deleted | Replaced | 167033-171891
2 | UL146F | NP | NP | NP | Wholly deleted | Replaced | 167033-171891
2 | UL146H | rh161 | NP | Rh161 | Partially deleted | Replaced | 167033-171891
3 | RL11D | rh08 | rh8 | Rh08 | Frameshifted in C11 | Replaced by CAC8 | 6341-6350
4 | UL119 | rh152&rh151f | rhUL119 | Rh152/Rh151 | Terminated by stop codon (TCA) | Replaced by CCA | 154761-154763
5 | US12E | rh197 | rh197 | Rh197 | Terminated by stop codon (CTA) | Replaced by CCA | 208625-208627
6 | RL11E | NA | rh8.1 | Rh08.1 | Frameshifted in CCAC10 | Replaced by C12 | 6991-7002
7 | RL11B | rh06 | rh6 | Rh06 | Frameshifted in C11 | Replaced by TCCACCTCC | 5197-5208
8 | UL128 | NP | rhUL128 | NP | Wholly deleted | Replaced | 161705-167032
8 | UL130 | NA | rhUL130 | Rh157.4 | Partially deleted | Replaced | 161705-167032
9 | RL11G | rh14 | rh13.1&rh14 | Rh13.1 | Frameshifted due to insertion (CT) | Deleted | Between 12847 and 12848
9 | RL11G | rh14 | rh13.1&rh14 | Rh13.1 | Frameshifted in A8 | Replaced by A7 | 13059-13065
aName of ORF in RhCMV68−1, GenBank accession no. AY186194.1 (Hansen et al., 2003)24; NA, not annotated; NP, not present; &, separate ORFs; some ORFs are partial because of lack of recognition of errors, mutations or splicing.
bName of ORF in RhCMV strain 180.92, GenBank accession no. DQ120516.1 (Rivailler et al., 2006)18; NP, not present; &, separate ORFs; some ORFs are partial because of lack of recognition of mutations.
cName of ORF in RhCMV68−1 BAC, GenBank accession no. JQ795930.1 (Malouli et al., 2012)20; NP, not present; /, spliced ORFs; some ORFs are partial because of lack of recognition of mutations.
dSequences correspond to the genome sequence regardless of ORF orientation.
eIn RhCMV68−1/EBOV/RL11G+ BAC, GenBank accession no. MZ517253.1 (present study).
fApparently intended to be named rh151 but annotated rh141.
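Because the sequences in Table 2 are quoted on the genome strand regardless of ORF orientation, a triplet such as TCA need not look like a stop codon as written. The Python sketch below illustrates how the Step 4 and Step 5 entries might be read. It assumes, which the table does not state explicitly, that UL119 and US12E are encoded on the complementary strand; the function names are illustrative only.

```python
# Sketch only. Assumption: UL119 and US12E are encoded on the complementary
# strand, so the genome-strand triplets in Table 2 are read after reverse
# complementation. Function names are illustrative.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_stop(genome_triplet: str, orf_on_complement: bool) -> bool:
    """Report whether a genome-strand triplet reads as a stop codon in the ORF."""
    codon = reverse_complement(genome_triplet) if orf_on_complement else genome_triplet
    return codon in STOP_CODONS


# Genome-strand triplets quoted in Table 2: (mutated, repaired).
table2_entries = {"UL119": ("TCA", "CCA"), "US12E": ("CTA", "CCA")}

for orf, (mutated, repaired) in table2_entries.items():
    print(
        orf,
        reverse_complement(mutated), is_stop(mutated, True),
        reverse_complement(repaired), is_stop(repaired, True),
    )
```

Under this assumption, TCA and CTA read as the stop codons TGA and TAG, and the repaired CCA reads as TGG, which encodes tryptophan in the standard genetic code.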
Identification of mutated ORFs in RhCMV68−1/EBOV BAC
As noted previously, one of the complications with identifying mutations in RhCMV68−1 and RhCMV68−1 BAC is that the original RhCMV68−1 sequence appears to contain numerous errors20. Thus, some of the differences between RhCMV68−1 and RhCMV68−1 BAC are due to these errors rather than to mutations generated during the construction of RhCMV68−1 BAC. We estimated the number of such errors at 20. They include substitutions and insertions or deletions (indels) in noncoding regions; substitutions in ORFs [RL1 (Rh01), RL11A (Rh05), RL11C (Rh07), RL11D (Rh08) and UL55 (Rh89); and RL11G (Rh13.1), introducing an in-frame stop codon, although this may have been due to a subpopulation of mutants in RhCMV68−1 rather than an error]; and frameshifts in ORFs [RL11D, COX2 (Rh10), UL34 (Rh57), UL71 (Rh100.1), US18 (Rh199) and US27D (Rh216)].
To ensure that the RhCMV component of the repaired RhCMV68−1/EBOV BAC was as close in sequence as possible to the original RhCMV68−1 genome as perceived to have existed prior to isolation and serial passage in cell culture25, it was necessary to identify mutations in RhCMV68−1 BAC (and hence in RhCMV68−1/EBOV BAC) that have resulted in inactivated ORFs. This involved detailed examination of an alignment of all available RhCMV genome sequences, which at the time did not include several reported since by Taher et al. (2020)23; these recent sequences were incorporated at the end of the study and identified no additional mutations. This comparative exercise revealed a total of 13 putatively inactivated ORFs (Table 2). They fell into two categories: (i) those terminated by in-frame stop codons due to substitutions and those truncated or extended by frameshifts due to small indels (most located within or associated with homopolynucleotide tracts), and (ii) those partly or wholly missing due to large deletions or rearrangements. Seven ORFs [RL11B (Rh06), RL11D, RL11E (Rh08.1), RL11G, UL36 (Rh61/Rh60), UL119 (Rh152/Rh151) and US12E (Rh197)] distributed across the genome were in category (i) and required small-scale repair. Six ORFs [UL128, UL130 (Rh157.4), UL146C, UL146D, UL146F and UL146H] located within a region of the genome called UL/b’, which contains ORFs involved in cellular tropism and immunomodulation19,26, were in category (ii) and required large-scale repair. These ORFs were supplemented by six other ORFs in UL/b’ [UL131A (Rh157.6), UL132 (Rh160), UL148 (Rh159), UL147A, UL147 (Rh158) and UL146B (Rh158.1)] that, although intact and therefore probably not inactivated, were inverted as a block. These 19 ORFs were targeted for repair, replacement, or restoration in RhCMV68−1/EBOV BAC.
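Most of the small indels in category (i) lie in homopolynucleotide tracts, where a change in tract length that is not a multiple of three shifts the reading frame. The following Python sketch shows one way such tracts could be located and such length changes classified. The toy sequence, the minimum tract length of 7 and the function names are assumptions made for illustration and are not taken from the study.

```python
import re


def find_homopolymer_tracts(seq: str, min_len: int = 7) -> list:
    """List (start, end, base, length) for runs of a single base.

    Coordinates are 1-based and inclusive. min_len is an arbitrary
    threshold chosen for this sketch.
    """
    tracts = []
    for match in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        length = match.end() - match.start()
        if length >= min_len:
            tracts.append((match.start() + 1, match.end(), match.group()[0], length))
    return tracts


def changes_reading_frame(observed_len: int, expected_len: int) -> bool:
    """A length difference that is not a multiple of 3 shifts the frame."""
    return (observed_len - expected_len) % 3 != 0


# Hypothetical sequence containing a T8 tract and a C11 tract.
toy = "ATG" + "T" * 8 + "GCA" + "C" * 11 + "TAA"
for start, end, base, length in find_homopolymer_tracts(toy):
    print(f"{base}{length} at {start}-{end}")

# T8 versus an intended T7 (as for UL36 in Table 2) shifts the frame.
print(changes_reading_frame(8, 7))
```

A tract that gains or loses exactly three residues would leave the frame intact, so a check of this kind flags candidates for inspection rather than proving inactivation.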
Examination of the sequence alignment also indicated a few additional differences in six RhCMV68−1 BAC ORFs [O3, UL41A (Rh67.1), UL45 (Rh72), UL74A, UL141 (Rh164) and US12B (Rh194)] that are not represented in RhCMV68−1 or other RhCMV strains but caused predicted amino acid substitutions. Given the error-prone nature of the RhCMV68−1 sequence, the reality of these differences was not certain, and they were not targeted for repair.
Pipeline for repairing mutated ORFs
Targeted genetic manipulation of herpesvirus genomes is achieved by BAC-based recombineering followed by reconstitution of virus by transfection of BACs into permissive cells27. Off-site mutations are a concern when manipulating such large DNA constructs and reconstituting viruses. In the past, this problem has been addressed by creating viruses from revertant BACs in order to demonstrate that the intended manipulations are genetically and phenotypically reversible. However, this approach is regarded as inadequate because it does not control for off-site mutations that arise during reconstitution of virus; in our experience, this is often when such mutations occur. It is also not practical for vaccine development because of the labor-intensiveness and limited scope of phenotypic assays. To cope with this inherent vulnerability, we coupled BAC-based recombineering with responsive Illumina-based whole genome sequencing to create the HHi-FiVe pipeline for generating and validating BACs and reconstituted viruses (Figure 2).
We set out to use this pipeline to repair the mutations in RhCMV68−1/EBOV BAC using BAC-based recombineering4,28,29. Recombinant BACs were examined initially by restriction fragment length polymorphism (RFLP) analysis to check for the expected changes in fragment mobility (Supplementary Figure 1). This was followed by whole genome sequencing of recombinant BACs at each stage. Overall, the complete process was accomplished in nine steps (Table 2).
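The logic of an RFLP screen can be illustrated with an in silico digest: a recombinant that gains or loses a restriction site yields a different set of fragment lengths from its parent. The Python sketch below is a minimal illustration on a hypothetical circular molecule. The enzyme (EcoRI, which cuts G^AATTC) and the toy sequences are assumptions; the text does not name the enzymes used.

```python
def circular_digest(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths (largest first) after cutting a circular molecule.

    site is the recognition sequence; cut_offset is the number of bases from
    the start of the site to the cut position on the top strand.
    """
    seq = seq.upper()
    n = len(seq)
    # Append the first len(site) - 1 bases so that sites spanning the
    # origin of the circle are also found.
    wrapped = seq + seq[: len(site) - 1]
    cuts = sorted({(i + cut_offset) % n for i in range(n) if wrapped.startswith(site, i)})
    if not cuts:
        return [n]  # uncut circle
    fragments = []
    for k, start in enumerate(cuts):
        end = cuts[(k + 1) % len(cuts)]
        # A single cut linearises the circle into one fragment of length n.
        fragments.append((end - start) % n or n)
    return sorted(fragments, reverse=True)


# Hypothetical 300 bp circles: the parent has two EcoRI sites, the
# recombinant has lost the second one.
parent = "GAATTC" + "A" * 94 + "GAATTC" + "C" * 194
recombinant = "GAATTC" + "A" * 94 + "GAGTTC" + "C" * 194

print(circular_digest(parent, "GAATTC", 1))
print(circular_digest(recombinant, "GAATTC", 1))
```

In this toy case the parent gives fragments of 200 and 100 bp and the recombinant a single 300 bp fragment. Such a screen only reports changes that affect restriction sites or fragment sizes, which is why sequencing follows it in the pipeline.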
Small-scale repairs
Six inactivated ORFs (RL11B, RL11D, RL11E, UL36, UL119 and US12E) required small-scale repair (Steps 1 and 3–7). Most mutations were addressed by restoring the perceived original sequence to reinstate the integrity of the ORF. However, an initial attempt at repairing RL11B at Step 7, which consisted of removing two C residues in a C11 homopolynucleotide tract to restore a C9 tract, resulted consistently in a C10 tract. Therefore, an alternative strategy was used that involved introducing synonymous substitutions within the tract. Repair of RL11G was also small-scale (see below).
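The alternative RL11B repair can be checked against the standard genetic code. The Python sketch below compares the intended C9 tract with the replacement listed in Table 2 (TCCACCTCC). It assumes, which the text does not state, that RL11B is encoded on the complementary strand and that the tract is aligned with codon boundaries; the helper names are illustrative.

```python
# Sketch only. Assumptions: RL11B lies on the complementary strand and the
# 9-nucleotide tract starts at a codon boundary. Helper names are illustrative.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq (one-letter code)."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


intended = "C" * 9          # C9 tract, genome strand
replacement = "TCCACCTCC"   # sequence listed for Step 7 in Table 2

for label, tract in (("C9", intended), ("replacement", replacement)):
    coding = reverse_complement(tract)
    print(label, coding, translate(coding))
```

Under these assumptions both tracts encode three glycine residues (GGG GGG GGG versus GGA GGT GGA), which is consistent with the substitutions being synonymous while breaking up the homopolynucleotide run.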
Large-scale repairs
A total of 12 ORFs in UL/b’ had undergone extensive deletion or rearrangement during passage of RhCMV68−1, and the six ORFs that were completely or partially missing as a result were not amenable to small-scale repair. Instead, the whole region was replaced by a wild type version based on RhCMV strain 19936 (Table 1), using three synthetic DNA segments that together encompassed this region (Steps 2 and 8). The product of Step 8, which still contained two frameshift mutations in RL11G (see below), was denoted RhCMV68−1/EBOV/RL11G− BAC.
Repair of RL11G
RL11G contained two separate mutations: a CT insertion in a (CT)2 tract, and further downstream, an A insertion in an A7 tract. Each mutation resulted in a frameshift, the first removing the transmembrane domain of the encoded protein and the second restoring the correct reading frame near the end of the ORF. The first mutation was predicted to have been sufficient to inactivate RL11G on its own. RL11G is an orthologue of HCMV RL1320, which has been shown to mutate during viral growth in culture in all cell types tested30,31. Therefore, its repair was reserved for the final step (Step 9). This strategy was vindicated by the recent demonstration that a repaired version of RL11G in a BAC-derived version of RhCMV68−1 mutates in rhesus fibroblast cell culture23. Our approach was to delete RL11G completely and insert a full-length synthetic version in which the two frameshift mutations were repaired and two substitutions unique to RhCMV68−1 and RhCMV68−1-based BACs were replaced. The final product was denoted RhCMV68−1/EBOV/RL11G+ BAC and was repaired in all the genes inactivated in RhCMV68−1 BAC and RhCMV68−1/EBOV BAC by premature termination, frameshifting, deletion, or rearrangement. As well as the intended manipulations and repairs, both RhCMV68−1/EBOV/RL11G− and RhCMV68−1/EBOV/RL11G+ BAC had one inconsequential difference from RhCMV68−1/EBOV: an additional G residue in a G7-tract in one copy of the terminal direct repeat of the viral genome.
Stability of RhCMV68−1/EBOV/RL11G− and RhCMV68−1/EBOV/RL11G+
Viruses were reconstituted by transfecting RhCMV68−1/EBOV/RL11G− BAC or RhCMV68−1/EBOV/RL11G+ BAC into rhesus fibroblast (Telo-RF) or human epithelial (hTERT RPE-1) cells and passaging further. When the cultures exhibited full cytopathic effect, DNA was extracted from infected cells or infected cell supernatant and sequenced. For each dataset, mutations were identified by visual inspection of an alignment of sequence reads to the anticipated viral genome sequence, and their abundance was calculated by counting the proportion of reads containing the mutation. This approach allowed mutations representing major subpopulations to be identified and quantified, and permitted the prevalence of these mutations to be examined in other samples in the same passage series even if present in minor subpopulations. However, minor subpopulations that did not reach sufficient representation in any sample in the series might not have been detected. Three clones (clones 1–3) of RhCMV68−1/EBOV/RL11G− BAC and one clone (clone 1) of RhCMV68−1/EBOV/RL11G+ BAC were transfected, the latter having been derived from one of the former RhCMV68−1/EBOV/RL11G− BAC clones (clone 2). The sequences of the RhCMV68−1/EBOV/RL11G− BAC clones were identical to each other. The scheme for reconstituting and passaging viruses is summarised in Figure 3 and the 16 samples sequenced (Samples A–P) are indicated by red font.
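The abundance calculation described above amounts to dividing the number of reads carrying a mutation by the number of reads covering the site. The Python sketch below illustrates the arithmetic only. The per-read allele labels are hypothetical, and the study identified mutations by visual inspection of read alignments rather than by code of this kind.

```python
from collections import Counter


def allele_percentages(alleles_per_read: list) -> dict:
    """Percentage of reads showing each allele at one site.

    alleles_per_read holds one label per read covering the site, for
    example "reference" or "deletion". The labels are hypothetical.
    """
    counts = Counter(alleles_per_read)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads cover this site")
    return {allele: 100.0 * n / total for allele, n in counts.items()}


# Hypothetical site covered by 100 reads, 85 of which carry a deletion.
observed = ["deletion"] * 85 + ["reference"] * 15
print(allele_percentages(observed))
```

The precision of such an estimate depends on read depth at the site, so a variant present in a small fraction of genomes may go uncounted in a shallowly covered region.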
Reconstitution of RhCMV68−1/EBOV/RL11G− BAC clone 1 in Telo-RF cells generated a 1 bp frameshifting deletion in UL128 and a 193 bp frameshifting deletion in UL116 (Rh148). At passages 1, 4 and 8 in Telo-RF cells (Samples A–C), the proportions of the UL128 mutation were 85, 99 and 100 %, respectively, and the proportions of the UL116 mutation were 4, 0 and 0 %, respectively. Virus at passage 1 in Telo-RF cells was also transferred to hTERT RPE-1 cells. At passages 2 and 5 in these cells (Samples D–E), the proportions of the UL128 mutation were 40 and 30 %, respectively, and the proportions of the UL116 mutation were 38 and 66 %, respectively. Thus, both mutations were present in passage 1 in Telo-RF cells, and the UL128 mutation was selected for in Telo-RF cells but selected against in hTERT RPE-1 cells. In contrast, whereas the UL116 mutation was selected against in Telo-RF cells, it was selected for in hTERT RPE-1 cells. Reconstitution of RhCMV68−1/EBOV/RL11G− BAC clone 2 in Telo-RF cells generated a 1022 bp deletion truncating UL128 and UL130 and nine linked C to T substitutions in US12 (Rh190). Among the substitutions, four were synonymous, three were nonsynonymous, and two introduced in-frame stop codons. At passage 1 in Telo-RF cells (Sample F), the percentages of the UL128 and US12 mutations were both 77 %, implying that they were present in the same genome. In contrast, reconstitution of RhCMV68−1/EBOV/RL11G− BAC clone 3 in Telo-RF cells generated no major mutations (Sample G). Selection of mutations in one or more of the three adjacent genes UL128, UL130 and UL131A is a recognised feature of RhCMV and HCMV when passaged in fibroblast cells18,30−32. Additional mutations may be carried fortuitously with these mutations when present in the same genome, or they may be selected independently. 
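The inferences about selection for clone 1 follow from comparing the reported proportions along each passage series. The Python sketch below tabulates the percentages given above and labels each series by a simple first-versus-last comparison. That comparison rule is a simplification introduced here for illustration, not the authors' method, and the hTERT RPE-1 series are taken to start from the Telo-RF passage 1 values because that stock was used to infect those cells.

```python
def trend(series: list) -> str:
    """Label a series of percentages by comparing its first and last values."""
    first, last = series[0], series[-1]
    if last > first:
        return "selected for"
    if last < first:
        return "selected against"
    return "no change"


# Percentages reported in the text for clone 1 (Samples A-E).
reported = {
    ("UL128 1 bp deletion", "Telo-RF"): [85, 99, 100],      # passages 1, 4, 8
    ("UL128 1 bp deletion", "hTERT RPE-1"): [85, 40, 30],   # Telo-RF p1, then passages 2, 5
    ("UL116 193 bp deletion", "Telo-RF"): [4, 0, 0],        # passages 1, 4, 8
    ("UL116 193 bp deletion", "hTERT RPE-1"): [4, 38, 66],  # Telo-RF p1, then passages 2, 5
}

for (mutation, cells), series in reported.items():
    print(f"{mutation} in {cells}: {series} -> {trend(series)}")
```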
In contrast, UL128, UL130 and UL131A are essential for growth of HCMV and RhCMV in non-fibroblast cells because they encode a glycoprotein complex that is required for viral entry into these cells32,33. Consistent with this, no major mutations were generated by reconstitution of RhCMV68−1/EBOV/RL11G− BAC clone 1 in hTERT RPE-1 cells at passages 1, 5 and 10 (Samples H–J); in a mixture of stocks from passages 3–9 in this series grown in hTERT RPE-1 cells (Sample K); in an independent stock from passage 3 in this series grown in hTERT RPE-1 cells, in a mixture of hTERT RPE-1 and Telo-RF cells, or in Telo-RF cells alone (Samples L–N); or in an independent stock from passage 1 in this series grown in hTERT RPE-1 cells (Sample O).
In contrast to the results obtained with the RhCMV68−1/EBOV/RL11G− clones, reconstitution of RhCMV68−1/EBOV/RL11G+ BAC clone 1 in hTERT RPE-1 cells generated a major mutation at passage 1 in these cells, in which a 12,778 bp sequence extending from within RL1 to close downstream from RL11H had been replaced by a 1,786 bp bacterial sequence (Sample P). The proportion of genomes in which RL11G had not been inactivated by this indel was close to 0 %. We conclude that virus reconstituted from the RhCMV68−1/EBOV/RL11G− BACs was genetically unstable when passaged in Telo-RF cells, accumulating mutations not only in UL128, UL130 and UL131A but also in other parts of the genome. In contrast, the genome was stable when virus was passaged in hTERT RPE-1 cells. Virus reconstituted from RhCMV68−1/EBOV/RL11G+ BAC was unstable in hTERT RPE-1 cells, in which RL11G was inactivated.
Cellular tropism of RhCMV68−1/EBOV/RL11G−
The purpose of repairing RhCMV68−1/EBOV BAC is eventually to examine its potential as a model transmissible vaccine platform for providing protective immunity against EBOV following animal-to-animal dissemination of the vaccine. The extent to which RL11G is required for dissemination remains to be determined, but the use of virus reconstituted from RhCMV68−1/EBOV/RL11G+ BAC was precluded because of the instability of RL11G in various cell types tested following reconstitution and passage (Figure 3; data not shown), which is consistent with previous findings for RhCMV23 and for the RL11G orthologue (RL13) in HCMV30,31. In contrast, the genome integrity of RhCMV68−1/EBOV/RL11G− BAC was maintained over multiple passages in hTERT RPE-1 cells. To assess cellular tropism, RhCMV68−1/EBOV was reconstituted from RhCMV68−1/EBOV BAC in Telo-RF cells, and RhCMV68−1/EBOV/RL11G− was reconstituted in hTERT RPE-1 cells. Viral growth was measured in infected Telo-RF cells and hTERT RPE-1 cells. Only RhCMV68−1/EBOV/RL11G− was able to replicate in both cell lines (Figure 4).
EBOV-GP expression by RhCMV68−1/EBOV/RL11G−
To examine transcription of EBOV-GP, hTERT RPE-1 cells were infected with RhCMV68−1/EBOV/RL11G−, total infected cell RNA was harvested at 21 d p.i., stranded RNAseq data were generated from the polyadenylated RNA fraction, and the relative proportions of sense and antisense RNAs produced from individual coding regions were calculated (Supplementary Table 1). Sense transcripts predominated (93.61 % of all sense and antisense transcripts combined), and EBOV-GP was the sixth most highly expressed (2.31 %) sense RNA of the 185 coding regions assessed. Transcripts from the RL11 family were notable by their generally low level of expression.
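The two summary figures quoted above, the sense fraction of all reads and the share of sense reads contributed by each coding region, can be derived from per-region strand counts. The Python sketch below shows the arithmetic on hypothetical counts. The region names and numbers are invented, and it assumes that each region's share is expressed relative to total sense reads without length normalisation, which the text does not specify.

```python
def strand_summary(counts: dict) -> tuple:
    """Summarise stranded read counts per coding region.

    counts maps region name -> (sense_reads, antisense_reads).
    Returns the overall sense percentage and a ranking of regions by
    their share of all sense reads (highest first).
    """
    total_sense = sum(sense for sense, _ in counts.values())
    total_antisense = sum(antisense for _, antisense in counts.values())
    sense_pct = 100.0 * total_sense / (total_sense + total_antisense)
    ranking = sorted(
        ((region, 100.0 * sense / total_sense) for region, (sense, _) in counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return sense_pct, ranking


# Hypothetical counts for three coding regions.
toy_counts = {"regionA": (900, 50), "regionB": (60, 10), "EBOV-GP": (40, 0)}
sense_pct, ranking = strand_summary(toy_counts)
print(round(sense_pct, 2))
for rank, (region, share) in enumerate(ranking, start=1):
    print(rank, region, round(share, 2))
```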
Finally, to examine translation of EBOV-GP, Telo-RF cells were infected with RhCMV68−1 or RhCMV68−1/EBOV/RL11G−, and hTERT RPE-1 cells were infected with RhCMV68−1/EBOV/RL11G−. Immunoblotting was carried out on infected cell proteins using an EBOV-GP-specific monoclonal antibody (mAb) to detect EBOV-GP, a RhCMV UL44 protein-specific antibody to confirm viral infection, and an anti-glyceraldehyde-3-phosphate dehydrogenase (GAPDH) mAb to monitor cellular protein expression. EBOV-GP was expressed in increasing amounts by RhCMV68−1/EBOV/RL11G− in Telo-RF cells at least until 7 d p.i. (Figure 5A) and in hTERT RPE-1 cells at least until 15 d p.i. (Figure 5B).