While we decided on Pathling specifically to execute the data extraction, the separation of concerns allows us to consider the frontend and backend components separately. We will present the user interface first, followed by the exchange format used with the backend, and then return to the integration of Pathling.
User Interface
The FDPG's primary target user group is clinical researchers. For them, the user interface serves as an abstraction layer between the cohort definition and the underlying technical solution. To define the cohort, the user is presented with the same interface used to create a feasibility query [13]. The only difference is the returned result: a count for a feasibility query and a list of patient IDs for a cohort definition. Notably, the cohort only defines whose data shall be extracted. Unless additional filters, independent of the feasibility query, are applied, the entire medical record of each patient in the cohort would be extracted. A repeatable dynamic form element, whose content depends on the selected health record item (representing a FHIR resource type, e.g., Medical Conditions), presents the user with filters and selectable attributes. Each attribute is shown under a single descriptive name that abstracts the underlying FHIRPath expression, and a more detailed description is displayed on hover.
In this initial implementation, filter options are limited to code and time restrictions, which are the most requested options [21]. The filters must be re-applied independently of the cohort definition, as the cohort definition specifies whose data to extract, not which data. Without further filtering, the cohort records contain all medical conditions, not only those searched for in the feasibility query. Since this foreseeably misaligns with the user's mental model, the final product should provide additional guidance and a means to construct the data selection from the feasibility query. This will allow users to extract the criteria of the feasibility query for verification purposes and to gain more detailed insight into the individual data elements.
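Because the cohort definition only determines whose records are extracted, the code and time restrictions have to be expressed again as filters on the selected health record item. The following minimal Python sketch shows one way such UI filters could be turned into FHIRPath filter expressions for a Condition item. The class and function names, the use of `onset.ofType(dateTime)` as the time element, and the SNOMED CT codes (44054006, type 2 diabetes mellitus; 46635009, type 1 diabetes mellitus) are illustrative assumptions, not the FDPG implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CodeFilter:
    """Hypothetical code restriction: keep resources carrying any listed code."""
    system: str
    codes: list[str]


@dataclass
class TimeRestriction:
    """Hypothetical time restriction on a date element, bounds inclusive."""
    path: str                      # FHIRPath of the date element, e.g. the onset
    start: Optional[date] = None
    end: Optional[date] = None


def code_filter_to_fhirpath(f: CodeFilter) -> str:
    # Build a FHIRPath union of the codes and test membership per Coding.
    codes = " | ".join(f"'{c}'" for c in f.codes)
    return f"code.coding.exists(system = '{f.system}' and code in ({codes}))"


def time_restriction_to_fhirpath(t: TimeRestriction) -> list[str]:
    # Each bound becomes a separate comparison with a FHIRPath date literal.
    exprs = []
    if t.start is not None:
        exprs.append(f"{t.path} >= @{t.start.isoformat()}")
    if t.end is not None:
        exprs.append(f"{t.path} <= @{t.end.isoformat()}")
    return exprs


def filters_to_fhirpath(code: Optional[CodeFilter],
                        time: Optional[TimeRestriction]) -> list[str]:
    """Collect the filter expressions of one health record item (all must hold)."""
    exprs = []
    if code is not None:
        exprs.append(code_filter_to_fhirpath(code))
    if time is not None:
        exprs.extend(time_restriction_to_fhirpath(time))
    return exprs


if __name__ == "__main__":
    diabetes = CodeFilter("http://snomed.info/sct", ["44054006", "46635009"])
    since_2020 = TimeRestriction("onset.ofType(dateTime)", start=date(2020, 1, 1))
    for expr in filters_to_fhirpath(diabetes, since_2020):
        print(expr)
```

Keeping each bound as its own expression means every filter can later be placed in a separate where entry, where all entries must be satisfied. The sketch ignores FHIRPath precision edge cases when comparing dateTime values with date literals.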
Figure 2 displays a simple data selection form that extracts the patient’s gender, birth date, and city. Additionally, all diabetes mellitus conditions are extracted, including the code, verification status, clinical status, and onset date.
The plus symbol allows the user to add further health record elements. Once the data selection is finalized, the user can inspect a final preview before submitting the query (Figure 3). The final preview displays only the selected attributes and applied filters, providing a more concise overview and allowing a DIC data steward to perform an additional final verification. While pre-validated allowlists of FHIRPath expressions may enable automated extraction in the future, we deem this manual verification indispensable during early adoption. An expert view toggle displays the underlying FHIRPath expressions, allowing a quick screening of the extraction performed when the query is executed.
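The preview logic can be illustrated with a small sketch: unselected attributes are omitted, and the expert view appends each entry's FHIRPath expression to its descriptive name. The `FormEntry` and `HealthRecordItem` classes and the text layout are hypothetical and only mirror the behavior described above; they are not the FDPG's Angular components.

```python
from dataclasses import dataclass, field


@dataclass
class FormEntry:
    """One attribute or filter of the dynamic form (hypothetical model)."""
    label: str             # descriptive name shown to the researcher
    fhirpath: str          # underlying FHIRPath expression
    active: bool = True    # attribute selected / filter applied


@dataclass
class HealthRecordItem:
    resource_type: str
    attributes: list[FormEntry] = field(default_factory=list)
    filters: list[FormEntry] = field(default_factory=list)


def render_preview(item: HealthRecordItem, expert_view: bool = False) -> list[str]:
    """List only the selected attributes and applied filters of one item.

    With the expert view enabled, each line also shows its FHIRPath expression.
    """
    def line(kind: str, entry: FormEntry) -> str:
        text = f"  {kind}: {entry.label}"
        return f"{text} [{entry.fhirpath}]" if expert_view else text

    lines = [item.resource_type]
    lines += [line("attribute", a) for a in item.attributes if a.active]
    lines += [line("filter", f) for f in item.filters if f.active]
    return lines


if __name__ == "__main__":
    conditions = HealthRecordItem(
        "Condition",
        attributes=[
            FormEntry("Diagnosis code", "code.coding.code"),
            FormEntry("Onset date", "onset.ofType(dateTime)"),
            FormEntry("Note", "note.text", active=False),  # omitted from preview
        ],
        filters=[FormEntry("Diabetes mellitus",
                           "code.coding.exists(code = '44054006')")],
    )
    print("\n".join(render_preview(conditions, expert_view=True)))
```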
The UI was developed by extending the existing FDPG feasibility UI with an additional module for data extraction, as previously presented. For the UI representation, we used the Angular dynamic form component and created services to obtain the codings for the filter components from a terminology server and to translate the UI representation into the exchange format used to communicate with the backend.
ViewDefinition
The choice of the data extraction exchange format is pivotal for separating concerns between the frontend and backend. For our solution, we chose the interoperable FHIR ViewDefinition. In its simplest form, each directly extractable FHIRPath expression of a health record element (associated with a single FHIR resource) is expressed as a single column of the select clause, and the element's filters are described in the where clause. Basing our data extraction on the ViewDefinition will allow later adaptation and reuse of the developed technology stack once more FHIR servers implement execution engines based on the SQL on FHIR standard [22]. Beyond Pathling, three additional implementations already support executing a ViewDefinition as defined here [23].
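As an illustration of this mapping, the sketch below assembles a ViewDefinition for the diabetes mellitus conditions of Figure 2, following the `resource`/`select`/`column`/`where` structure of the SQL on FHIR specification. The helper name, column names, FHIRPath expressions, and the two SNOMED CT codes standing in for a full is-a expansion of diabetes mellitus are illustrative assumptions; the ViewDefinition actually used is the one shown in Figure 4.

```python
import json


def build_view_definition(name: str, resource: str,
                          columns: dict[str, str],
                          filters: list[str]) -> dict:
    """Map one health record element onto a ViewDefinition (hypothetical helper).

    Every FHIRPath expression becomes one column of a single select clause,
    and every filter becomes one entry of the where clause; SQL on FHIR
    includes a resource only if all where entries evaluate to true.
    """
    view = {
        "resourceType": "ViewDefinition",
        "name": name,
        "status": "draft",
        "resource": resource,
        "select": [{
            "column": [{"name": col, "path": path}
                       for col, path in columns.items()],
        }],
    }
    if filters:
        view["where"] = [{"path": expr} for expr in filters]
    return view


if __name__ == "__main__":
    snomed = "http://snomed.info/sct"
    view = build_view_definition(
        "diabetes_conditions",
        "Condition",
        {
            "patient": "subject.reference",
            "code": f"code.coding.where(system = '{snomed}').code.first()",
            "verification_status": "verificationStatus.coding.code.first()",
            "clinical_status": "clinicalStatus.coding.code.first()",
            "onset": "onset.ofType(dateTime)",
        },
        # Exact-code stand-in for an is-a expansion of diabetes mellitus.
        [f"code.coding.exists(system = '{snomed}' "
         "and code in ('44054006' | '46635009'))"],
    )
    print(json.dumps(view, indent=2))
```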
Figure 4 depicts the ViewDefinition used to extract the demographic information (gender, birth date, address type, and city) and diagnoses of diabetes mellitus (the specific diagnosis code in an is-a relationship to diabetes mellitus, the verification status, the clinical status, and the onset time) of a previously defined cohort. The path information provides the necessary computable information. To obtain this representation from the user interface, we translate the Angular form components of the UI elements. In addition to the information relevant for display, each form component holds the underlying FHIRPath expression it represents. Therefore, during translation, the information within the form and its components can be restructured and reduced to what is relevant for the ViewDefinition. Appendix 1 provides a JSON representation of the dynamic form translated into the displayed ViewDefinition.
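The reduction step can be sketched as follows, assuming a simplified, hypothetical form value in which every attribute carries a key, a display label, a hover text, its FHIRPath expression, and a selection flag. This shape is an assumption for illustration and does not reproduce the form shown in Appendix 1.

```python
from typing import Any


def form_to_view_definition(form: dict[str, Any]) -> dict[str, Any]:
    """Reduce a dynamic-form value to a ViewDefinition (hypothetical shape).

    Display-only fields such as labels and hover texts are dropped; only the
    FHIRPath expressions of selected attributes and of filters are kept.
    """
    columns = [{"name": a["key"], "path": a["fhirpath"]}
               for a in form["attributes"] if a.get("selected")]
    view: dict[str, Any] = {
        "resourceType": "ViewDefinition",
        "status": "draft",
        "resource": form["resourceType"],
        "select": [{"column": columns}],
    }
    where = [{"path": f["fhirpath"]} for f in form.get("filters", [])]
    if where:
        view["where"] = where
    return view


if __name__ == "__main__":
    demographics_form = {
        "resourceType": "Patient",
        "title": "Demographics",  # display only, dropped in translation
        "attributes": [
            {"key": "id", "label": "Patient ID", "hover": "Logical id",
             "fhirpath": "id", "selected": True},
            {"key": "gender", "label": "Gender",
             "hover": "Administrative gender", "fhirpath": "gender",
             "selected": True},
            {"key": "birth_date", "label": "Birth date",
             "hover": "Date of birth", "fhirpath": "birthDate",
             "selected": True},
            {"key": "city", "label": "City",
             "hover": "City of the first address",
             "fhirpath": "address.city.first()", "selected": True},
            {"key": "marital_status", "label": "Marital status",
             "hover": "Marital status as text",
             "fhirpath": "maritalStatus.text", "selected": False},
        ],
    }
    print(form_to_view_definition(demographics_form))
```

Because each form component already carries its FHIRPath expression, the translation is a pure projection: nothing has to be looked up, and display texts can change without affecting the extraction.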
Integrating Pathling into the Existing Architecture
Initially, we intended to base our solution solely on the Pathling server. Given the FDPG's architectural choice to work with an intermediate query format, which requires only a translation component and Pathling’s API for aggregation, we hoped that Pathling could replace Blaze for our use case. Figure 5 shows how Pathling could support all three required capabilities:
- Feasibility queries, by translating the existing structured representation into filter expressions and calling Pathling's $aggregate endpoint.
- Cohort definition, by reusing the feasibility query's structured representation as the cohort definition, differing only in execution, which uses the $extract endpoint to obtain the patient IDs.
- Data extraction, by translating a structured representation of the feature set to extract and its filters into the column and filter parameters of the $extract endpoint, and running the query for each previously identified patient ID.
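The three capabilities of this initially intended design can be sketched as request URLs against Pathling's $aggregate (parameters `aggregation`, `filter`) and $extract (parameters `column`, `filter`) operations. The base URL, the filter expressions, and the per-patient filter on `subject.reference` are assumptions for illustration; the sketch only builds URLs and sends no requests.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080/fhir"  # hypothetical Pathling endpoint


def _url(resource: str, operation: str, params: list[tuple[str, str]]) -> str:
    # Repeated parameters (several columns/filters) are encoded as separate pairs.
    return f"{BASE_URL}/{resource}/${operation}?{urlencode(params, quote_via=quote)}"


def feasibility_url(filters: list[str]) -> str:
    """Feasibility query: count patients matching all filters via $aggregate."""
    params = [("aggregation", "count()")] + [("filter", f) for f in filters]
    return _url("Patient", "aggregate", params)


def cohort_url(filters: list[str]) -> str:
    """Cohort definition: same filters, but return patient IDs via $extract."""
    params = [("column", "id")] + [("filter", f) for f in filters]
    return _url("Patient", "extract", params)


def data_extraction_urls(resource: str, columns: list[str], filters: list[str],
                         patient_ids: list[str]) -> list[str]:
    """Data extraction: one $extract request per previously identified patient."""
    urls = []
    for pid in patient_ids:
        # Assumption: non-Patient resources point to the patient via subject.
        if resource == "Patient":
            patient_filter = f"id = '{pid}'"
        else:
            patient_filter = f"subject.reference = 'Patient/{pid}'"
        params = ([("column", c) for c in columns]
                  + [("filter", f) for f in filters + [patient_filter]])
        urls.append(_url(resource, "extract", params))
    return urls


if __name__ == "__main__":
    female = ["gender = 'female'"]
    print(feasibility_url(female))
    print(cohort_url(female))
    for url in data_extraction_urls("Condition", ["code.coding.code"], [], ["p1"]):
        print(url)
```

The sketch makes the design's appeal visible: the same filter list serves both the feasibility count and the cohort ID extraction, and only the operation and its output parameters change.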
Before we integrated Pathling, Behrend et al. compared the performance of Pathling and Blaze and found that, for feasibility queries, Blaze outperformed Pathling by a factor of 10 [24]. Consequently, we adjusted the architecture to use Pathling as a staging layer. Figure 6 shows the resulting architecture and its implementation, available on GitHub [25].
The process is initiated by the user, who defines a cohort and the features to be extracted. The frontend sends the CCDL and the ViewDefinition to the backend, where the cohort is first identified and extracted via the FLARE component (an execution engine that uses FHIR Search to answer cohort queries defined in the CCDL, as displayed in Figure 6) and staged on the Pathling server. Afterward, the ViewDefinitions are converted into parameters of Pathling's $extract operation, and the resulting CSVs are joined into a single CSV that the user can download via the UI. Figure 7 provides a detailed overview of the process.
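A minimal sketch of the two backend steps after staging is shown below: a flat ViewDefinition is mapped onto $extract parameters (column paths to `column`, where paths to `filter`), and the resulting CSVs are combined. The left join on a shared patient key column (hypothetically named `patient_id`) is an assumed strategy for illustration; the text above does not specify how the CSVs are joined, and the sketch does not handle nested selects.

```python
import csv
import io


def view_to_extract_params(view: dict) -> tuple[str, list[tuple[str, str]]]:
    """Convert a flat ViewDefinition into a resource type and $extract parameters.

    Column paths become 'column' parameters and where paths become 'filter'
    parameters; nested selects (forEach, unionAll) are not handled here.
    """
    params = [("column", col["path"])
              for select in view.get("select", [])
              for col in select.get("column", [])]
    params += [("filter", w["path"]) for w in view.get("where", [])]
    return view["resource"], params


def join_csvs(tables: list[str], key: str) -> str:
    """Left-join CSV tables on a shared key column into a single CSV.

    The first table drives the join; rows without a match are kept with empty
    cells, and several matches produce one output row each.
    """
    header: list[str] = []
    parsed = []
    for text in tables:
        reader = csv.DictReader(io.StringIO(text))
        rows = list(reader)
        header += [name for name in reader.fieldnames or [] if name not in header]
        parsed.append(rows)

    joined = parsed[0]
    for right in parsed[1:]:
        index: dict[str, list[dict]] = {}
        for row in right:
            index.setdefault(row[key], []).append(row)
        # [{}] as default keeps left rows that have no matching right row.
        joined = [{**left, **match}
                  for left in joined
                  for match in index.get(left[key], [{}])]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(joined)
    return out.getvalue()
```

With demographics as the first table, each patient appears at least once in the output, and patients with several diabetes diagnoses appear once per diagnosis.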
Verification and Validation
To verify the technical correctness of our implementation, we created a synthetic test data set and leveraged our knowledge of its content. This allowed us to manually verify the correct extraction and filtering for all FHIRPath expressions. We were further able to combine filter criteria and evaluate whether the extracted data were correct with respect to them. For example, if we search for a cohort of female patients, obtain a cohort of all 250 female patients, and then extract the criterion data element for the same cohort, we expect the gender attribute to match the concept female 250 times, with no deviations. While trivial, this approach allows us to verify the correctness of our tooling, as the actual outcome aligns with the outcome expected from our knowledge of the data.
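The female-cohort example can be expressed as a small check over the extracted CSV: the expected value must occur exactly as often as the cohort size, and no other value may occur. The function name, the CSV layout, and the message format are assumptions made for this sketch; the verification described above was performed manually.

```python
import csv
import io
from collections import Counter


def check_attribute(csv_text: str, column: str, expected_value: str,
                    expected_count: int) -> list[str]:
    """Compare one extracted column against the outcome expected from the data.

    Returns discrepancy messages; an empty list means the extraction matches.
    """
    counts = Counter(row[column] for row in csv.DictReader(io.StringIO(csv_text)))
    problems = []
    if counts[expected_value] != expected_count:
        problems.append(f"expected {expected_count} x {expected_value!r}, "
                        f"found {counts[expected_value]}")
    for value, n in counts.items():
        if value != expected_value:
            problems.append(f"unexpected value {value!r} in {n} row(s)")
    return problems


if __name__ == "__main__":
    extracted = "id,gender\n" + "".join(f"p{i},female\n" for i in range(250))
    print(check_attribute(extracted, "gender", "female", 250))  # prints []
```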
Automated testing is still outstanding; however, full automation without UI integration would be of limited value, as it would only test an already verified product, Pathling.
Despite the detailed work on the user interface and our usability considerations, validation of the tooling is beyond the scope of the current prototype. The developed tooling was presented to several medical informatics experts, from whom first insights into the future development of the final product could already be derived (see Discussion).