Enumerating all possible ORFs with a minimum length of ten amino acids in the first genome draft of the emerging MPX21 virus, across all three reading frames of both the forward and the reverse strand, yielded 10,043 distinct ORFs. Subsequent BLAST searches against the non-redundant protein database22 and the PDB reduced this set to 925 and 123 putative protein sequences, respectively. Structures of the protein sequences that aligned to proteins in the PDB were predicted by homology modeling (Table 1, Fig. 1). A comprehensive dataset on the process parameters and properties of all potential ORFs within the genome draft of MPX21, as well as the protein models generated within this study, is available at https://doi.org/10.6084/m9.figshare.19877842.v1.
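The six-frame ORF enumeration described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes that an ORF is any stretch between two stop codons (or a sequence end), that the standard genetic code applies, and that codons containing an unresolved base ('N') translate to 'X'. The names `find_orfs`, `translate`, and `reverse_complement` are hypothetical helpers introduced for illustration only.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; any codon not in the table (e.g. with N) becomes X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(genome, min_len=10):
    """List stop-to-stop ORFs of at least min_len residues in all six frames.

    Assumption: ORFs are delimited by stop codons or sequence ends; a start
    codon is not required. The source does not state which definition was used.
    """
    genome = genome.upper()
    orfs = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            aa_start = 0  # residue offset within this frame's translation
            for segment in protein.split("*"):
                if len(segment) >= min_len:
                    orfs.append({
                        "strand": strand,
                        "frame": frame + 1,
                        "aa_start": aa_start,
                        "protein": segment,
                        "qx": segment.count("X"),  # unresolved residues (QX)
                    })
                aa_start += len(segment) + 1  # +1 skips the stop codon
    return orfs
```

Collecting the `protein` values in a set would give the distinct sequences. Requiring a start codon instead of the stop-to-stop definition would yield a smaller candidate list.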
Table 1 | Modeling parameters of the putative structural proteome.
Structures of 123 putative ORFs matching proteins in the PDB were predicted by homology modeling using the Catalophore™ DrugSolver Platform employing Yasara23. The table summarizes the position of each ORF within the genome sequence, its putative function based on a BLASTP search24,25 of the non-redundant database, and modeling parameters such as sequence identity and similarity. QX: number of ‘X’ in the query sequence, resulting from unresolved sections in the genome sequence; S: strand on which the putative ORF is located [forward (+) or reverse (-)]; RF: reading frame; HM: homology modeling. Color coding: 0% (red) to 100% (green).
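As an illustration of how per-ORF BLASTP results could be condensed into one row per query, the following sketch keeps the best-scoring hit for each query ID. It assumes that the BLAST output was written in the default 12-column tabular format (`-outfmt 6`) and applies an arbitrary E-value cutoff of 1e-3. Neither the format nor the cutoff is taken from this study, and `best_hits` is a hypothetical helper.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, max_evalue=1e-3):
    """Return the highest-bitscore hit per query from tabular BLAST output.

    Assumption: max_evalue is an illustrative cutoff, not the value used
    in the study.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        if hit["evalue"] > max_evalue:
            continue  # discard weak matches
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

Calling `best_hits(open("orfs_vs_pdb.tsv"))`, with a hypothetical file name, would return a dictionary keyed by query ID. Its `pident` values correspond to the sequence identity of the aligned region only.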
Figure 1 | Genomic map of the putative structural proteome.
Potential ORFs that matched proteins in the non-redundant protein database are depicted along the genome sequence draft of MPX21. Putative protein sequences on the forward and reverse strands, in three reading frames each, are shown above and below the genome, respectively, and are labeled by their query ID. Protein structures were modeled from the orange-colored ORFs. Yellow-colored sections in the genome mark low-quality regions, indicated by ‘N’ in the genome sequence. The figure was created using the Python package Matplotlib26 and Blender 3.1.2, available at www.blender.org.
The early-stage structural models presented in this work are intended to serve as an initial collection of putative proteins of the currently spreading MPX, a body of information that could support timely drug discovery, mutational analyses, and vaccine development. The list of models in Table 1 most probably does not represent the complete structural proteome, since 190 ORFs have previously been described in the MPX genome5. Notably, the 925 distinct ORFs that showed sequence similarity to entries in the non-redundant database will certainly include additional protein sequences of the MPX proteome, which we expect to comprise close to 190 ORFs. The remaining potential ORFs of this set may contain fragments of protein sequences related to the evolutionary origin of this virus, and may also include as-yet-unidentified physiological proteins of the MPX. Going forward, a further (dynamic) refinement of the initial putative structural proteome presented here should be considered, especially for drug targets such as F13L27,28, which corresponds to query sequence 9984.