The concept of a digital twin (DT) was introduced as an ideal framework for product life cycle management by Grieves in 2002 [1], comprising three components: (1) a physical system, (2) a corresponding virtual model, and (3) twinning, which is a bidirectional data flow that provides physical-to-virtual (P2V) and virtual-to-physical (V2P) connections [2, 3]. The definition was broadened by the American Institute of Aeronautics and Astronautics: “A digital twin is a set of virtual information constructs that mimics the structure, context, and behavior of a natural, engineered, or social system (or system-of-systems), is dynamically updated with data from its physical twin, has a predictive capability, and informs decisions that realize value. The bidirectional interaction between the virtual and the physical is central to the digital twin.” [4, 5]. The Committee on Foundational Research Gaps and Future Directions for Digital Twins of the National Academies highlighted two central elements of the definition: “the phrase predictive capability to emphasize that a digital twin must be able to issue predictions beyond the available data to drive decisions that realize value,” and “the bidirectional interaction, which comprises feedback flows of information from the physical system to the virtual representation and from the virtual back to the physical system to enable decision making, either automatic or with humans in the loop.” [6].
Since their inception, DTs have found numerous applications in fields where forecasts and predictions are crucial, including atmospheric and climate sciences, business, engineering, finance, and health care [6–13]. Designed to provide timely and actionable information tailored to decision-making [4], DTs simulate real-world conditions, respond to changes, improve operations, and add value. According to McKinsey & Company, investments in DTs will surpass $48 billion by 2026 [14].
In healthcare, DTs have found applications under the umbrella of precision medicine [2]. Proponents of precision medicine suggest that DTs can help identify individuals at risk for disease, providing an opportunity for early intervention to prevent poor outcomes. Predicting treatment outcomes can also enable the development of personalized interventions optimized for each patient [15–20]. In addition, both the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) have issued guidelines for using DTs in randomized controlled trials (TwinRCTs) to mitigate risks in treatment development and streamline the evaluation of new technology. Specifically, DTs of patients in a treatment arm can serve as virtual patients in a control arm, increasing the effective number of control patients when combined with real patients and enabling faster, smaller trials with greater statistical power [21]. DTs have also been used for safety assessment in over 500 FDA submissions [22].
The scale and modeling methods of a DT depend on the nature of the physical system and the desired level of detail, fidelity, and functionality [3, 11, 13, 23]. In healthcare applications, DTs can span a broad spectrum of biological scales, encompassing molecular, subcellular, and cellular levels, as well as entire systems (e.g., the digestive system), functions (e.g., vision), individual humans, populations, and the biosphere [2, 19, 22–27]. They can also represent medical devices, such as an optical coherence tomography (OCT) machine, or healthcare organizations.
Various DT modeling methods have been developed to create sufficiently representative virtual replicas of physical entities, processes, or objects, including geometric modeling, physics-based modeling, data-driven modeling, physics-informed machine-learning modeling, and systems modeling [28]. For example, Subramanian [29] created a DT of the liver in homeostasis that showed improved phenotypes comparable to those reported in clinical trials. Fisher et al. [30] used existing longitudinal data from cognitive exams and laboratory tests of patients with Alzheimer’s disease to create DTs and generated synthetic patient data at different time points to simulate natural disease progression; the data generated by the DTs were statistically indistinguishable from the real data.
In this proof-of-concept study, a data-driven, generative-model approach was taken to develop DTs of the contrast sensitivity function (CSF), based on a population of human observers tested in different luminance conditions (Fig. 1a). The trial-by-trial responses of existing observers in quick CSF (qCSF) testing (N = 112) were used to train a three-level hierarchical Bayesian model (HBM; Fig. 1b) [31] to derive the joint posterior probability distribution of CSF hyperparameters and parameters at the population, condition, subject, and test levels (Fig. 1c). Using a generative model, the DTs combine the joint posterior distribution with newly acquired data to predict CSFs for new observers or for existing observers in unmeasured conditions (Fig. 1d); these predictions can also serve as informative priors for subsequent testing in those conditions (Fig. 1e).
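To make the hierarchy concrete, the sketch below forward-simulates one possible three-level structure: population-level hyperparameters generate subject-by-condition CSF parameters, which in turn generate test-level parameters (trial-by-trial qCSF responses are omitted). It is purely illustrative; the actual levels, priors, and covariance structure of the HBM are specified in [31] and Supplementary Materials A, and all numerical values below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_subjects, n_conditions, n_params = 112, 3, 3   # 3 luminance conditions, 3 CSF parameters

# Population level (assumed values): mean CSF parameters and a between-subject
# covariance defined jointly over parameters and conditions, so that
# dependencies across conditions can be captured.
pop_mean = np.tile(np.array([2.0, 0.3, -0.2]), n_conditions)   # length 9
pop_cov = 0.05 * np.eye(n_params * n_conditions)

# Subject level: each subject draws one parameter vector covering all conditions.
subject_params = rng.multivariate_normal(pop_mean, pop_cov, size=n_subjects)
subject_params = subject_params.reshape(n_subjects, n_conditions, n_params)

# Test level: repeated tests scatter around the subject-by-condition parameters
# (within-subject variability).
test_params = subject_params + rng.normal(0.0, 0.05, size=subject_params.shape)
```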
The CSF characterizes contrast sensitivity (the reciprocal of contrast threshold) as a function of spatial frequency. As a fundamental assay of spatial vision, it is closely related to daily visual activities in both normal and impaired vision [32–39] and has emerged as an important endpoint for staging eye diseases and assessing treatment efficacy [40–67]. Importantly, the CSF varies not only with stimulus conditions such as retinal luminance [68], temporal frequency [69, 70], and eccentricity [71, 72], but also with disease progression and treatment [41–46]. Predicting CSFs for new observers, or for existing observers in not-yet-measured conditions, could help forecast human performance in new conditions, identify potential risks and benefits of interventions, and enable personalized treatment for each patient. The predictions could also serve as informative priors to reduce the test burden in new measurements. Additionally, at the group level, the DTs of clinical study patients in an active arm have potential value as controls in TwinRCTs.
Previously, we developed a three-level HBM to comprehensively model an entire CSF dataset within a single-factor (luminance), multi-condition (three luminance conditions), within-subject experimental design [31]. This model utilized trial-by-trial data and employed a log-parabola CSF functional form with three parameters as the generative model at the test level, in addition to hyperparameters at the subject and population levels, incorporating between- and within-subject covariances as well as conditional dependencies across levels. The performance of the HBM was evaluated using an existing dataset [73] of 112 subjects tested with qCSF [74] across three luminance conditions. By leveraging information across subjects and conditions to constrain the estimates, the HBM generated more precise estimates of the CSF parameters than the Bayesian Inference Procedure, which treated data from each subject and experimental condition separately. This increased precision improved signal detection (increased \(d'\)) for comparisons of the area under the log CSF (AULCSF) and CSF parameters between experimental conditions at the test level for each subject, along with larger statistical differences across subjects. Importantly, the HBM also captured strong covariances within and between subjects and luminance conditions.
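To make the test-level generative model concrete, the sketch below implements a three-parameter log-parabola CSF and an AULCSF summary. This is a minimal illustration, not the exact parameterization of the qCSF or the HBM in [31, 74]: the parameter names, the bandwidth convention (half-width at half-height in log10 spatial-frequency units), and the AULCSF integration range are assumptions.

```python
import numpy as np

def log_parabola_csf(freq, log_gain, log_peak_freq, log_half_width):
    """Log-parabola CSF: log10 contrast sensitivity at spatial frequency freq (cyc/deg).

    Illustrative three-parameter form:
      log_gain       -- log10 of peak sensitivity
      log_peak_freq  -- log10 of the spatial frequency where sensitivity peaks
      log_half_width -- half-width at half-height of the parabola (log10 frequency units)
    """
    return log_gain - np.log10(2.0) * ((np.log10(freq) - log_peak_freq) / log_half_width) ** 2

def aulcsf(log_gain, log_peak_freq, log_half_width, f_lo=1.0, f_hi=32.0, n=256):
    """Area under the log CSF: integrate log10 sensitivity (clipped at 0, i.e., keeping
    only sensitivity >= 1) over log10 spatial frequency on an assumed 1-32 cyc/deg range."""
    log_f = np.linspace(np.log10(f_lo), np.log10(f_hi), n)
    log_s = np.clip(log_parabola_csf(10.0 ** log_f, log_gain, log_peak_freq, log_half_width),
                    0.0, None)
    return float(np.sum((log_s[1:] + log_s[:-1]) / 2.0 * np.diff(log_f)))  # trapezoidal rule

# Hypothetical observer: peak sensitivity of ~100 at ~2 cyc/deg
print(aulcsf(log_gain=2.0, log_peak_freq=np.log10(2.0), log_half_width=0.6))
```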
Here, we expanded the application of the HBM to generate DTs for a population of CSF observers (Fig. 1). Our hypothesis is that the DTs created using the HBM can accurately and precisely predict CSFs for new or existing observers in conditions where data are not available. To test this hypothesis, we conducted and assessed 12 prediction tasks (Table 1) using an existing CSF dataset of 112 subjects tested in three luminance conditions [73]. We divided the subjects into two groups. Group I’s data in all three luminance conditions served as historical data, while we aimed to predict CSFs for Group II subjects in the 12 prediction tasks (Table 1). In tasks 1 to 3, we utilized the DTs to predict CSFs for Group II subjects across all three conditions without incorporating any new data, simulating scenarios for new observers who have not been previously tested. In tasks 4 to 9, we integrated Group II subjects’ CSF data from a single luminance condition into the DTs to forecast their CSFs in the other two conditions. Tasks 10 to 12 involved predicting CSFs for Group II subjects in one of the three conditions by incorporating their data from the other two conditions. These predictions were then compared against the observed data from Group II subjects in the corresponding conditions to evaluate the accuracy and reliability of the DTs.
Table 1. The 12 prediction tasks. L, M, and H denote the low, medium, and high luminance conditions.
Task | Training data from Group I | Training data from Group II | Group II predictions |
1 | L, M, H | none | L |
2 | L, M, H | none | M |
3 | L, M, H | none | H |
4 | L, M, H | L | M |
5 | L, M, H | L | H |
6 | L, M, H | M | L |
7 | L, M, H | M | H |
8 | L, M, H | H | L |
9 | L, M, H | H | M |
10 | L, M, H | L, M | H |
11 | L, M, H | L, H | M |
12 | L, M, H | M, H | L |
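For clarity, the task design in Table 1 can also be written out programmatically. The sketch below enumerates the same 12 configurations; the data-structure layout and field names are illustrative.

```python
from itertools import combinations

CONDITIONS = ("L", "M", "H")  # low, medium, high luminance

# Group I always contributes data from all three conditions. Group II
# contributes data from 0, 1, or 2 conditions, and each remaining
# condition defines one prediction task.
tasks = []
for k in (0, 1, 2):
    for given in combinations(CONDITIONS, k):
        for target in CONDITIONS:
            if target not in given:
                tasks.append({
                    "group_I_training": CONDITIONS,
                    "group_II_training": given,
                    "group_II_prediction": target,
                })

assert len(tasks) == 12
for i, task in enumerate(tasks, start=1):
    print(i, task)  # reproduces the rows of Table 1 in order
```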
In each prediction task, the dataset included the historical data from Group I, any new data available from Group II, and “missing data” in the unmeasured conditions. The three-level HBM [31] was used to compute the joint distribution of the population-, subject-, and test-level CSF hyperparameters and parameters from all the data in each prediction task (details in Supplementary Materials A). The posterior distributions of the parameters in the unmeasured conditions were then used as input to the CSF generative model to generate predicted CSFs. The validation of the DTs involved assessing the accuracy and precision of their predictions against the observed data. Additionally, the advantages of employing predictions from the DTs as informative priors in the qCSF test were evaluated.
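As a minimal sketch of this last step, assume posterior samples of the test-level CSF parameters for a subject in an unmeasured condition are available as an array of draws exported from the fitted HBM. The function below pushes those draws through the log-parabola generative model from the earlier sketch to obtain a posterior predictive CSF with a credible interval; the array layout, function names, and the RMSE summary are illustrative assumptions, not the exact validation metrics of the study.

```python
import numpy as np

def predict_csf(posterior_samples, freqs):
    """Posterior predictive CSF for one subject in one unmeasured condition.

    posterior_samples -- array of shape (n_draws, 3); each row holds the three
                         log-parabola parameters from one posterior draw
    freqs             -- spatial frequencies (cyc/deg) at which to predict

    Returns the mean and 95% credible interval of log10 contrast sensitivity.
    Uses log_parabola_csf from the earlier sketch as the generative model.
    """
    curves = np.array([log_parabola_csf(freqs, *draw) for draw in posterior_samples])
    mean = curves.mean(axis=0)                              # shape: (n_freqs,)
    lo, hi = np.percentile(curves, [2.5, 97.5], axis=0)     # 95% credible interval
    return mean, lo, hi

def rmse(predicted, observed):
    """One simple accuracy summary: root-mean-square error in log10 sensitivity."""
    return float(np.sqrt(np.mean((np.asarray(predicted) - np.asarray(observed)) ** 2)))
```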