Our goal was to develop an assay capable of distinguishing between sera generated after mpox infection, sera generated after MVA immunization, and pre-immune sera without a known history of either infection or MVA immunization. To this end, we created a panel of recombinant poxviral proteins previously described as immunogenic and/or as targets of neutralizing antibodies (see Table 1) 34–46. Among these proteins, we included five pairs of homologous recombinant proteins derived from either VACV or MPXV: A27L/A29L, L1R/M1R, D8L/E8L, A33R/A35R, and B5R/B6R. Despite the high sequence similarity between these homologues (ranging between 93.6% and 98.8% sequence homology based on the comparison of the amino acid sequences of MPXV Zaire and VACV; see supporting figures S1 to S8 for pairwise sequence alignments of all proteins from representative orthopoxvirus strains and supporting tables S1 to S8 for sequence homology based on the amino acid sequences), it has been demonstrated that polyclonal antibodies produced post-infection or post-immunization exhibit differential binding to some of these antigens, namely A27L and L1R, an observation that has been utilized for serological differentiation by ELISA 17. As another potential marker for differentiation, we included the recombinant ATI protein, previously used to distinguish between Dryvax vaccination (employed during the smallpox eradication campaign) and MVA immunization 47. Additionally, H3L (VACV) and A5L (CPXV) were included as important immunodominant antigens 39,48. Finally, we included a complex antigen mixture from VACV-infected cells (and the corresponding uninfected cells) to encompass the entire complexity of the antibody immune response, which is characterized by high redundancy and plasticity 20,27.
Initially, we evaluated our novel multiplex assay using serum panels previously characterised by well-established reference methods, including an in-house ELISA, IFA, or virus NT 19,20. This was done to ensure that the results from our new method agreed with those from established assays. Additionally, our goal was to verify the immunogenicity of the included antigens, and to assess the contribution of antibodies targeting individual recombinant antigens to the overall antibody immune response that is detected in the reference assays. We observed a high degree of agreement in both IgG and IgM measurements between our multiplex assay and the tested reference assays (Fig. 1). Notably, we found strong correlations between the binding to the complex VACV lysate antigen, normalized against the uninfected cell lysate antigen (resulting in the “Delta” response), and the results from the reference assays for both IgG and IgM (Fig. 1a). This indicates that complex antigen mixtures can be effectively adapted to bead-based multiplex assays. The high level of agreement with IFA and NT results suggests that the multiplex assay yields results consistent with reference assays, enabling reliable sero-diagnostics with higher throughput and automation. We determined cut-off values for each of the orthopoxvirus-specific antigens using IFA results as ground truth (Titer ≤ 1:80 negative, above positive) in a ROC analysis for both IgG and IgM results (supporting figure S9, supporting tables S9 and S10). Here, the highest level of agreement was found between the complex Delta antigen and the IFA results. From the recombinant proteins, D8L/E8L, H3L, A33R/A35R, and B5R/B6R agreed best with the IFA results (performance parameters shown in supporting tables S11 and S12). Next, we compared the binding to recombinant antigens in the multiplex assay with binding to the complex Delta antigen using the multiplex assay or ELISA. 
Here, we found that the highest antibody levels were directed against D8L/E8L, H3L, A33R/A35R, and B5R/B6R for both IgG and IgM detection (Fig. 1b and supporting figure S10). This also held true for the comparison of the IgG responses against the IFA or NT results (Fig. 1c and d; for IgM results see supporting figure S11). These observations were strongly supported by data analysis using Pearson's or Spearman's correlation coefficients (supporting figure S12 and supporting tables S13 to S15), where those antigens showed the closest correlation to the Delta antigen as well as to the ELISA, IFA, and NT results. The other antigens tested (A27L/A29L, L1R/M1R, ATI-C, ATI-N, and A5L) also showed significant correlations with the IFA and NT results, albeit to a lesser extent. These findings indicate that all tested orthopoxvirus-specific antigens are immunogenic, with certain antigens contributing more to the overall immune response than others.
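The cut-off determination against IFA results described above can be illustrated with a minimal sketch. It makes several assumptions that are not stated in the text: the "Delta" response is taken to be a simple difference between the infected-lysate and uninfected-lysate signals, the cut-off is chosen by maximizing Youden's J (the text only states that a ROC analysis was used), and all function names and toy values are hypothetical.

```python
from typing import List, Tuple


def delta_signal(infected: float, uninfected: float) -> float:
    # Assumption: "Delta" = infected-lysate signal minus uninfected-lysate signal.
    return infected - uninfected


def ifa_positive(titer_reciprocal: int) -> bool:
    # From the text: IFA titers <= 1:80 count as negative, higher titers as positive.
    return titer_reciprocal > 80


def youden_cutoff(signals: List[float], positives: List[bool]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A sample is called positive when its signal is >= cutoff.
    Choosing Youden's J as the criterion is an assumption of this sketch.
    """
    n_pos = sum(positives)
    n_neg = len(positives) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative reference samples")
    best = (float("-inf"), 0.0, 0.0, 0.0)  # (J, cutoff, sensitivity, specificity)
    for cutoff in sorted(set(signals)):
        tp = sum(1 for s, p in zip(signals, positives) if p and s >= cutoff)
        tn = sum(1 for s, p in zip(signals, positives) if not p and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if j > best[0]:  # keep the lowest cut-off among ties
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical toy data: reciprocal IFA titers and normalized bead signals.
titers = [20, 40, 80, 80, 160, 320, 640, 1280]
infected = [0.30, 0.35, 0.40, 0.55, 0.70, 0.90, 1.20, 1.50]
uninfected = [0.20, 0.25, 0.20, 0.25, 0.20, 0.20, 0.25, 0.30]

signals = [delta_signal(i, u) for i, u in zip(infected, uninfected)]
labels = [ifa_positive(t) for t in titers]
cutoff, sens, spec = youden_cutoff(signals, labels)
```

In this toy example the two classes separate perfectly, so the selected cut-off is the lowest signal among the IFA-positive sera; real data would typically trade sensitivity against specificity.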
Differential immune response after mpox infection and MVA immunisation revealed by the novel multiplex assay
After demonstrating that our multiplex assay closely mirrors the results of other serological assays, the next step was to determine whether our assay could differentiate between the antibody immune responses induced by mpox infection and MVA immunization by utilizing supervised machine learning algorithms. To achieve this, we analysed both IgG and IgM immune responses in two distinct serum panels, which both comprised sera collected post-mpox infection, post-MVA immunization, and prior to immunisation or infection, but which differed with regard to their timepoint of collection (Fig. 2).
The first panel, labelled “acute”, included sera from PCR-confirmed mpox cases collected during the peak of Germany's mpox outbreak from May to August 2022, in the acute or early convalescent phase of the infection (n = 307). It also comprised sera collected two weeks after prime or boost immunization with the Imvanex vaccine (n = 48). The pre-immune sera in this panel were obtained prior to MVA immunization (n = 16) or before the mpox outbreak for diagnostics of measles, mumps, or rubella virus infections (n = 176). The second panel, termed “epi” (short for epidemiological), consisted of 1,120 sera collected from April to June 2023, after the mpox outbreak had largely subsided in Germany. Those samples were collected from MSM visiting MSM-friendly practices for general STI screening, and thus cover a study population eligible for mpox vaccination and at risk for mpox infection. Metadata on self-reported mpox infection (n = 59), Imvanex immunization (n = 476), or neither mpox infection nor Imvanex immunization (n = 324) was available for a total of 859 sera. Sera in this panel were gathered to conduct a sero-epidemiological study on the prevalence of antibodies post-mpox infection and/or MVA immunization following the outbreak and vaccination campaign. We deliberately chose these distinct panels to replicate scenarios commonly encountered in serological assay development during acute outbreak phases and in subsequent epidemiological studies. We evaluated the performance of various machine learning algorithms, namely Random Forest (RF), Fuzzy Rule-based Classification (FRBC), Gradient Boosting Classifier (GBC), Linear Discriminant Analysis (LDA), and combinations of LDA and RF or LDA and FRBC, which were trained and tested on the same panel, or on both panels combined. Additionally, algorithms trained on the “acute” panel were applied to predict outcomes in the “epi” panel, and vice versa.
Before training the different ML algorithms, we analysed the IgG and IgM immune responses targeting the different antigens, stratified by panel (“acute” or “epi”) and infection/immunization status (pre, MVA, mpox) (Fig. 3a and b). Additionally, we analysed differences with regard to possible childhood immunization against smallpox by using an age-based cut-off to classify sera as having a high probability of previous vaccination (age ≥ 60 years as of 2023), a low probability (age below 40 years), or an ambiguous status (between). When we compared the IgG and IgM immune responses over all analytes, we found that the IgG response was higher in patients with presumed childhood vaccination against smallpox (linear mixed model: normalized data ~ childhood immunisation (“Yes” > “No”); intercept = 0.49, slope = -0.05; Wilcoxon p < 2.2e-16), whereas the IgM response was higher in patients without presumed childhood vaccination (intercept = 0.39, slope = 0.02; Wilcoxon p < 2.2e-16, supporting figure S13), irrespective of the serostatus (Pre, MVA, mpox) or serum panel tested (“acute”, “epi”). To quantify the impact of the three categorical variables childhood immunisation, serostatus, and analyte on the antibody immune profile, we performed four three-way ANOVAs, stratified by antibody isotype (IgG or IgM) and serum panel (“acute” or “epi”) (supporting tables S16 to S19), with subsequent determination of the simple main effects (supporting tables S20 to S23) and pairwise comparisons (supporting tables S24 to S27). When we analysed the simple main effects, we found significant contributions from almost all antigens to the IgG response (supporting tables S20, S22). Interestingly, the antigens contributed differently depending on the serum panel.
In the “acute” panel A27L, ATI-C, D8L, ATI-N, and A29L contributed most in patients with assumed childhood vaccination, while in patients without assumed childhood vaccination, A29L, A35R, B6R, A27L, and E8L contributed most. In the “epi” panel, E8L, D8L, Delta, B5R/B6R, A33R/A35R contributed most in patients irrespective of assumed childhood vaccination. As the IgM response was weaker in patients with previous childhood immunization, fewer antigens contributed significantly to differences between the three serostatuses, with D8L/E8L being the most prominent. Besides that, B6R and A33R/A35R also contributed significantly in both smallpox-vaccinated and non-vaccinated patients. As expected, the overall contribution of the IgM response was much more pronounced in the “acute” panel as compared to the “epi” panel (supporting tables S21, S23). When we performed pairwise comparisons, we found statistically significant differences between pre-immune sera and sera after either MVA immunisation or mpox infections for most antigens in the “acute” panel, when the panels were stratified by presumed childhood immunisation (Fig. 3c and supporting table S24). However, differentiation between MVA immunisation and mpox infection was not possible based on binding to most antigens when analysed in isolation. Additionally, as the immune response in younger patients without previous childhood immunisation against smallpox was generally lower, the immune response after MVA vaccination in those patients was similar to pre-immune sera of older patients with a high likelihood of a previous childhood immunization against smallpox. Remarkably, the immune response directed at the N-terminal fragment of the A-type inclusion protein (ATI-N) was significantly elevated only after mpox infection, in comparison to both pre-immune sera and sera after MVA vaccination. This indicated that reactivity against ATI-N could be a valuable marker to discern acute infection from previous vaccination. 
As expected, the IgM response in the “acute” panel was elevated significantly mostly in younger patients without previous childhood vaccination (Fig. 3d and supporting table S25). Interestingly, those differences were more pronounced in the MPXV derived proteins as compared to the VACV derived proteins. When we analysed the IgG immune profile in the “epi” panel, we found the highest immune response after mpox infection, followed by MVA vaccination (Fig. 3e and supporting table S26). Due to the later timepoints of sampling after infection, the IgM response was lower with fewer significant differences (Fig. 3f and supporting table S27) while the IgG response especially against the immune dominant antigens D8L/E8L, A33R/A35R, and B5R/B6R was fully matured with fewer low positive signals in the mpox infected patients as compared to the acute phase sera. Conversely, the difference in the IgG response against ATI-N after mpox infection as compared to the pre-immune sera or after MVA vaccination was still highly significant, yet less pronounced than in the acute panel, indicating that ATI-N might perform best in the early phase of mpox infection as a marker for differentiation.
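The age-based stratification used throughout this analysis can be written down as a small helper. This is only an illustrative sketch: the function name and group labels are hypothetical, and only the thresholds (age ≥ 60 years as of 2023, age below 40 years, and the range in between) come from the text.

```python
from collections import Counter
from typing import List


def childhood_vaccination_group(age_in_2023: int) -> str:
    """Classify a serum donor by the presumed probability of childhood
    smallpox vaccination, using the age thresholds given in the text."""
    if age_in_2023 >= 60:
        return "high"       # high probability of childhood vaccination
    if age_in_2023 < 40:
        return "low"        # low probability of childhood vaccination
    return "ambiguous"      # 40 to 59 years: status unclear


def group_counts(ages: List[int]) -> Counter:
    # Count how many donors fall into each presumed-vaccination group.
    return Counter(childhood_vaccination_group(a) for a in ages)


# Hypothetical ages, for illustration only.
example_ages = [25, 39, 40, 59, 60, 72]
counts = group_counts(example_ages)
```

Note that the boundary cases follow the wording of the text: a donor aged exactly 60 is assigned to the high-probability group, and a donor aged exactly 40 falls into the ambiguous group.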
Lastly, we examined whether binding to homologue antigens derived from MPXV or VACV differed among the three immune statuses (pre-immune, MVA, mpox) stratified by a presumed childhood vaccination against smallpox (supporting Figure S14). We compared the IgG and IgM binding responses to MPXV-derived proteins versus VACV-derived proteins across all five tested antigen pairs. We observed stronger binding to MPXV proteins post-mpox infection and to VACV proteins post-MVA immunization for the antigen pairs A33R/A35R, B5R/B6R, D8L/E8L, and to a lesser extent, A27L/A29L. However, no difference in binding to MPXV vs. VACV derived proteins was observed for the L1R/M1R antigen pair. The effect was more pronounced in younger patients without assumed childhood immunization against smallpox for both IgG and IgM binding.
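One simple way to express the comparison between homologous antigens is a per-serum difference between the MPXV-derived and the VACV-derived signal for each pair. The sketch below is an assumption about how such a comparison could be summarized, not the analysis actually used here; the function name and all values are hypothetical, while the pairing of proteins (VACV first, MPXV second) follows the text.

```python
from statistics import median
from typing import Dict, List

# (VACV protein, MPXV protein) pairs as listed in the text.
HOMOLOGUE_PAIRS = [
    ("A27L", "A29L"),
    ("L1R", "M1R"),
    ("D8L", "E8L"),
    ("A33R", "A35R"),
    ("B5R", "B6R"),
]


def median_mpxv_minus_vacv(sera: List[Dict[str, float]]) -> Dict[str, float]:
    """Median per-serum difference (MPXV signal - VACV signal) for each pair.

    Positive values indicate stronger binding to the MPXV-derived protein,
    negative values stronger binding to the VACV-derived protein.
    """
    result = {}
    for vacv, mpxv in HOMOLOGUE_PAIRS:
        diffs = [serum[mpxv] - serum[vacv] for serum in sera]
        result[f"{vacv}/{mpxv}"] = median(diffs)
    return result


# Hypothetical normalized signals for three post-mpox sera.
post_mpox_sera = [
    {"A27L": 0.50, "A29L": 0.60, "L1R": 0.30, "M1R": 0.30, "D8L": 0.70,
     "E8L": 0.90, "A33R": 0.60, "A35R": 0.80, "B5R": 0.50, "B6R": 0.75},
    {"A27L": 0.40, "A29L": 0.45, "L1R": 0.25, "M1R": 0.25, "D8L": 0.60,
     "E8L": 0.85, "A33R": 0.55, "A35R": 0.70, "B5R": 0.45, "B6R": 0.65},
    {"A27L": 0.55, "A29L": 0.60, "L1R": 0.35, "M1R": 0.35, "D8L": 0.65,
     "E8L": 0.80, "A33R": 0.50, "A35R": 0.75, "B5R": 0.40, "B6R": 0.70},
]
summary = median_mpxv_minus_vacv(post_mpox_sera)
```

Computing the same summary separately for each serostatus and age group would mirror the stratification described above.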
Discrimination of sera post-MVA immunization and post-mpox infection using the novel multiplex assay and machine learning-guided analysis
As we observed notable differences between pre-immune sera and those collected post-mpox infection or post-MVA vaccination, particularly with regard to certain antigens, we trained supervised ML algorithms to distinguish between the three different serostatuses: pre-immunization, post-MVA vaccination, and post-mpox infection (Fig. 2b). Additionally, we tested combinations of LDA with RF or FRBC, where LDA was employed to identify the most relevant features of the collected data and thus reduce the number of dimensions, while RF or FRBC were trained on the transformed data. Regarding orthopoxvirus-specific antigens, we included all except L1R and M1R. These were excluded due to batch-to-batch variations observed during the antigen-coupling process for L1R (supporting Figure S15), their relatively weaker correlation with the reference assays compared to other tested antigens (Fig. 1), and their weaker contribution to the mpox-specific immune response (Fig. 3). In addition to comparing different ML algorithms, we investigated the influence of different training datasets, hyperparameters, and features on the prediction performance. Firstly, we used data from the “acute” panel, the “epi” panel, or a combination of both for training to assess how the algorithms perform across different datasets. As the “epi” panel comprised sera from an at-risk population of MSM, we excluded sera from younger patients (< 40 years at the time of sampling) from the training dataset if they were either labelled as pre-immune based on the self-reported exposure history (MVA immunisation or mpox infection) yet showed clear indicators of exposure to orthopoxviruses, or labelled as MVA yet showed elevated ATI-N binding, which hints at recent MPXV exposure (n = 132 sera, supporting Figures S16 and S17).
Secondly, we trained the models using either IgG data alone or a combination of IgG and IgM data to determine whether including IgM results improved model performance or whether IgG data alone was sufficient for discrimination. Although we observed that the childhood vaccination status has an impact on the immune response, we did not further stratify our training datasets by this variable. This allowed us to include a more complete dataset, as we did not have sufficient data on childhood vaccination status for all samples.
We then applied these models to predict serostatus outcomes in various panels. This included not only testing models on the same serum panels used for training, but also applying models trained on one panel (“acute” or “epi”) to make predictions on the other. Furthermore, all models were trained on a combined dataset containing results from both panels (“all”). To validate the models trained and tested on the same serum panels, we employed a 5-fold cross-validation approach, repeated thrice. For assessing the transferability of the models between different panels, we used the complete “acute” dataset for training when testing on the “epi” dataset, and vice versa.
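The validation scheme described above (5-fold cross-validation, repeated three times) can be sketched in plain Python as follows. Whether the folds were stratified by serostatus is not stated in the text; stratification is an assumption of this sketch, as are the function name, the seed, and the toy labels.

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def repeated_stratified_kfold(
    labels: Sequence[str],
    n_splits: int = 5,
    n_repeats: int = 3,
    seed: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for repeated stratified k-fold CV.

    Each repeat reshuffles the samples within every class and deals them
    round-robin into n_splits folds, so class proportions stay balanced.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for _ in range(n_repeats):
        folds: List[List[int]] = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, test


# Hypothetical toy labels: ten sera per serostatus.
toy_labels = ["pre"] * 10 + ["MVA"] * 10 + ["mpox"] * 10
splits = list(repeated_stratified_kfold(toy_labels))
```

With five folds and three repeats, every serum is used for testing exactly three times, once per repeat. For the cross-panel transfer experiments no such splitting is needed, since one complete panel serves as the training set and the other as the test set.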
When we compared accuracy, precision, recall, and F1 scores for the test datasets of the 5-fold cross-validation approach, we found marked differences between the tested panels and ML algorithms (Fig. 4a and supporting table S28). With regard to the tested ML algorithms, GBC showed the best performance, followed by LDA and LDA RF, which performed approximately equally well, while RF and FRBC both performed slightly worse and LDA FRBC showed the worst overall performance. The inferior performance of the latter is caused by the dimensionality reduction by LDA, which leads to a loss of information that affects FRBC more than RF. When we compared the different panels used for training and testing of the ML algorithms, we found good and largely comparable performance when the same panel was used for training and testing (“acute acute”, “epi epi”, and “all all”). Of those, the performance was slightly better in the “acute acute” panel than in the “epi epi” and “all all” panels. Additionally, the inclusion of IgM data improved the performance, an effect that was more pronounced in the “acute acute” panel. Conversely, the overall performance decreased dramatically when the “acute” panel was used for training and the models were tested on the “epi” panel, and vice versa, highlighting the observed differences in the immune signature between the two panels. The performance was better when the “epi” panel was used as the training dataset and testing was done on the “acute” panel, and LDA-based algorithms seemed to be more robust; however, the performance was still not satisfactory in either case (mean F1 scores of ~ 0.6 as compared to ~ 0.8 when the same panel was used for training and testing).
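For reference, the four performance parameters compared above can be computed for a three-class problem as in the following sketch. The text does not state how per-class values were averaged; macro-averaging (the unweighted mean over the three serostatuses) is an assumption here, and the function name and example predictions are hypothetical.

```python
from typing import Dict, Sequence


def macro_scores(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    classes: Sequence[str] = ("pre", "MVA", "mpox"),
) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical example: one MVA serum is misclassified as mpox.
truth = ["pre", "pre", "MVA", "MVA", "mpox", "mpox"]
predicted = ["pre", "pre", "MVA", "mpox", "mpox", "mpox"]
scores = macro_scores(truth, predicted)
```

In this example a single MVA serum classified as mpox lowers the recall for MVA and the precision for mpox, which is the type of confusion that matters most for the differentiation task described here.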
Next, we analysed the feature importance for those algorithms, where this information was available (LDA, GBC, RF) on the different panels and antibody isotypes used for training. When the algorithms were trained on IgG data from the “acute” panel, ATI-N, A27L/A29L, and ATI-C contributed most, i.e. the algorithms relied on those antigens more often for differentiation as compared to the other antigens for GBC and RF. For LDA, ATI-N and A29L contributed most to differentiation in most cases of the 5-fold cross validation, followed by D8L, B5R, and Delta. When the IgM response was included, the IgM response of E8L and A33R contributed most for the LDA, followed by the ATI-N IgM response. Similar results were obtained for GBC and RF, but with the IgM response against A33R contributing most, followed by E8L and ATI-N. When the algorithms were trained on the “epi” panel, E8L and D8L were contributing most with ATI-N still contributing, but to a lesser degree as compared to the “acute” panel, while IgM contribution was also much less important for the differentiation between the different serostatuses as compared to the “acute” panel (supporting figures S18 to S20).
To assess the specificity of the different ML algorithms trained and tested on the combined dataset (“all all”) as a function of presumed childhood vaccination against smallpox, we compared the predictions on a panel of 88 negative sera, which had been sampled before the mpox outbreak, stratified by age group (Fig. 4b). Here we found that, depending on the algorithm and antibody isotypes used for the predictions, RF and GBC tended to classify sera from older subjects as “mpox”, indicating that those algorithms have a lower specificity in subjects with presumed childhood vaccination. Conversely, LDA was more robust, leading to fewer misclassifications in older subjects. Classifications as “MVA” also occurred with higher frequencies, indicating that childhood vaccination might also be confused with MVA vaccination to a degree. In agreement with the overall performance, FRBC-based algorithms also showed higher rates of misclassification than the other algorithms tested.
Finally, as we observed a higher rate of misclassification in older patients, indicating that differentiation is more challenging in patients with previous childhood vaccination, we compared the predictions of the GBC and LDA algorithms on the “all all” dataset with the ground truth, either on all data or, where available, on a dataset comprising patients with a high likelihood of childhood vaccination (age 60+), on a dataset of patients without childhood vaccination (age < 40), or on a dataset of patients with a more ambiguous vaccination status (age between) (Fig. 4c; results for RF and FRBC in supporting figure S20, full results in supporting table S29). Here we found that classifications of pre-immune or MVA sera as mpox sera were increased in patients with a high likelihood of childhood smallpox vaccination. This effect was more pronounced with the GBC algorithm than with the LDA algorithm (see supporting table S30 for assay performance of the LDA algorithm stratified by childhood vaccination). Conversely, a much smaller fraction of pre-immune sera from patients without a history of childhood vaccination was classified as mpox, whereas more MVA and mpox sera were misclassified as pre-immune sera, especially when LDA was used for classification. This indicated that LDA was more specific, but less sensitive, than the GBC algorithm. Lastly, another advantage of the LDA algorithm is that the classification probability, which the algorithm uses to predict the different classes, can be used to further increase the stringency of the predictions (Fig. 5). By excluding low-confidence predictions, which also have a higher rate of misclassification (Fig. 5a), the accuracy of predictions for the remaining samples could be increased. The cut-off values were determined by ROC analysis (Fig. 5b, supporting table S31) to optimize the separation between correctly and incorrectly classified samples while simultaneously minimizing the number of excluded samples.
When we applied this approach to the different datasets used for training and testing, we were able to improve the performance parameters for the samples above the cut-off (Fig. 5c), while the performance parameters decreased for the samples below the cut-off value (Fig. 5d). For example, the overall accuracy of predictions based on the IgG data above the cut-off value applied to the “all all” seropanel improved to 0.89 (95% CI: 0.87 to 0.91) while including 70% of all serum samples. In contrast, the accuracy was only 0.63 (95% CI: 0.59 to 0.66) for the predictions that fell below the cut-off value, which involved 52% of all serum samples (see full results in supporting table S30). For samples close to the cut-off values, classification scores fell both above and below the cut-off, depending on the run and repetition of the 5-fold cross-validation approach, which leads to sums above 100% for individual sera.
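The confidence-based filtering described above can be illustrated with a short sketch that splits predictions by their highest class probability and reports the accuracy on each side of the cut-off. The function name, the cut-off of 0.6, and the probabilities are hypothetical; in the analysis described here, the class probabilities came from the LDA models and the cut-offs from the ROC analysis.

```python
from typing import Dict, List, Sequence


def evaluate_with_cutoff(
    probabilities: Sequence[Dict[str, float]],
    y_true: Sequence[str],
    cutoff: float,
) -> Dict[str, float]:
    """Accuracy above and below a confidence cut-off, plus the retained fraction.

    The predicted class is the one with the highest probability; that
    probability is used as the confidence of the prediction.
    """
    above: List[bool] = []
    below: List[bool] = []
    for probs, truth in zip(probabilities, y_true):
        predicted = max(probs, key=probs.get)
        bucket = above if probs[predicted] >= cutoff else below
        bucket.append(predicted == truth)

    def accuracy(hits: List[bool]) -> float:
        return sum(hits) / len(hits) if hits else float("nan")

    return {
        "accuracy_above": accuracy(above),
        "accuracy_below": accuracy(below),
        "fraction_retained": len(above) / len(probabilities),
    }


# Hypothetical class probabilities for five sera.
toy_probabilities = [
    {"pre": 0.90, "MVA": 0.05, "mpox": 0.05},
    {"pre": 0.10, "MVA": 0.80, "mpox": 0.10},
    {"pre": 0.05, "MVA": 0.15, "mpox": 0.80},
    {"pre": 0.40, "MVA": 0.35, "mpox": 0.25},
    {"pre": 0.30, "MVA": 0.25, "mpox": 0.45},
]
toy_truth = ["pre", "MVA", "mpox", "MVA", "mpox"]
report = evaluate_with_cutoff(toy_probabilities, toy_truth, cutoff=0.6)
```

The trade-off is visible even in this toy case: raising the cut-off increases the accuracy among the retained predictions but reduces the fraction of sera that receive a classification.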