From Feasibility to Insight: Piloting Feature Extraction from FHIR Cohorts to Advance Clinical Research

doi:10.21203/rs.3.rs-4977169/v1

Research Article

This work is licensed under a CC BY 4.0 License

Background

Interoperability between healthcare institutions and the standardized sharing of health data pose ongoing challenges. The Medical Informatics Initiative (MII) and the German Portal for Medical Research Data (FDPG) leverage the Fast Healthcare Interoperability Resources (FHIR) standard to address these issues. However, they do not yet support standardized and automated data extraction.

Objective

This research explores extending the FDPG's functionality beyond its current scope of distributed feasibility studies (e.g., cohort size estimation) within the existing MII framework. The focus is on extracting a subset of the FHIR-represented data for specific cohorts, with the aim of uncovering actionable insights from the health data repositories and thus extending the portal's utility beyond feasibility analyses.

Methods

We developed a prototype comprising a user interface and a local data extraction process. Based on a detailed comparison of existing data extraction tools, we chose the Pathling server because its capabilities align with both data extraction and feasibility queries and because of its potential as an all-in-one server solution for the FDPG architecture.

Results

We implemented a prototype that demonstrates how the FDPG's feature set can be expanded to support local data extraction at clinical sites. Using a synthetic data set, we further showed that it can provide researchers with CSV-formatted reports on specified cohorts.

Conclusion

While a range of considerations remain before the FDPG can support data extraction in a federated network, our work provides valuable insights, namely the value of an abstraction layer for researchers that implicitly translates their selections into FHIRPath expressions, and the benefit of local CSV extraction. Because of performance constraints, the Pathling-based approach requires staging project-specific data. This poses privacy risks and should therefore be revisited. By presenting an early prototype, we hope to gather additional feedback from different stakeholders in the MII, including but not limited to clinical researchers, data stewards, and data privacy specialists.

Technical innovation requires more than mere technical capability. Factors like timing, existing infrastructure, market readiness, user acceptance, governmental structures, and regulatory considerations are just as crucial as technical proficiency [1].

One such innovation is making patients’ healthcare data accessible to researchers. Beyond feasibility queries for identifying cohorts and estimating their size, data extraction is critical for making routine healthcare data available. While its technical feasibility has been demonstrated previously [2–5], we believe that the environment and infrastructure now established within the German network of university hospitals and other healthcare providers united under the Medical Informatics Initiative (MII) [6] and the Network University Medicine (NUM) [7] present a unique opportunity. Given the existing frameworks, infrastructure, and interoperable healthcare databases, underpinned by regulatory and governmental structures, we consider now the right time to introduce data extraction. Enhancing these capabilities is expected to unlock the potential of one of Germany’s largest federated real-world healthcare datasets, providing clinical researchers with unprecedented access and advancing the scope and quality of healthcare research. This should facilitate more comprehensive studies and allow for the rapid development of tailored, evidence-based treatments that could significantly improve patient outcomes [8].

Before data extraction is introduced in the federated network of university hospitals, planned for rollout in winter 2024, we limit this pilot study to a single hospital. We emphasize integrating existing software solutions and concepts of the FDPG, a single access point for researchers that currently supports feasibility queries as well as requesting and managing data access [9]. Our goal is therefore to gain early insights through a prototype for data extraction. We aim to continuously refine our solution based on feedback and user experience from this initial implementation. This process will help us identify and address emerging challenges early and reveal opportunities for further enhancements as we prepare for a broader rollout by the end of 2024.

The German Medical Informatics Initiative Research Network Infrastructure

The MII, funded by the German Federal Ministry of Education and Research (BMBF), represents a national effort to integrate and utilize clinical data to advance medical research and improve patient care. Central to these objectives is a federated research infrastructure that enables seamless and secure data exchange across the national network. Currently, 29 university hospital partner sites within the funding scheme and associated partner institutes act as nodes through their data integration centers (DICs). These centers are tasked with creating the technical and organizational frameworks necessary to make their routine data findable and accessible within the network [10]. Consequently, the DICs must ensure the interoperability of their data.

Within the MII, the FHIR standard, specifically adapted in the so-called core data set (CDS), defines a common structural and semantic representation of the otherwise highly heterogeneous routine data to enable interoperability across sites. The CDS covers a patient's most common and relevant healthcare data, including demographic, diagnostic, laboratory, and medication information in the basic profiles, as well as additional use-case-driven elements for specific applications (e.g., oncology). The CDS is iteratively extended, nationally coordinated, and balloted [11,12].

Each university hospital site maintains complete control over its data, which poses challenges for researchers conducting multi-site studies. Researchers often must query or negotiate with each site individually without knowing whether the sites have sufficient data for their studies. The FDPG provides clinical researchers with a central access point to the federated network of the underlying DICs and their interoperable data. It streamlines data access by supporting researchers in performing federated feasibility queries and managing data use proposals [9]. Two additional FDPG modules, for identifying cohorts and extracting features, are to be added in 2024 and are discussed in this study.

FDPG feature expansion for data extraction

For the scope of this paper on extending the FDPG’s capabilities for data extraction, we focus on the technical aspects and set aside the complexities of regulatory and governmental requirements and of the iterative interactions between the clinical sites, the clinical researcher, and the supporting staff of the FDPG. Instead, we assume the ideal scenario in which the clinical researcher has already been granted sufficient access rights to fully identify the cohort, extract the data, and receive the resulting data package.

From a user perspective, the functionality required to identify cohorts and extract their features is independent of whether the use case is local or federated. In a two-step process, the cohort must first be identified based on eligibility criteria. Once it is identified, the researcher shall be able to express the feature set they intend to extract based on the individual attributes of the health record items.
Once expressed, both components shall be automatically processable at the DIC, which returns the feature set in a structured, accessible format. Since the current implementation of the FDPG already supports feasibility queries based on eligibility criteria, which determine the cohort size available at the clinical sites, we can reuse the existing software stack for cohort selection. Instead of aggregating the identified patients into a count, we only need to further process each patient’s health record to extract the desired feature set.
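
The two-step process can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the FDPG implementation: patient records are plain dictionaries, eligibility criteria are Python predicates, and the names `select_cohort`, `feasibility`, and `extract_features` are invented for this example. It shows that a feasibility query and a cohort definition share the same criteria and differ only in what is returned.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a DIC's data: patient ID -> flattened health record.
Record = Dict[str, Optional[str]]
Criterion = Callable[[Record], bool]


def select_cohort(records: Dict[str, Record], criteria: List[Criterion]) -> List[str]:
    """Step 1: IDs of all patients that satisfy every eligibility criterion."""
    return sorted(pid for pid, rec in records.items() if all(c(rec) for c in criteria))


def feasibility(records: Dict[str, Record], criteria: List[Criterion]) -> int:
    """Feasibility query: same criteria, but only the count leaves the site."""
    return len(select_cohort(records, criteria))


def extract_features(records: Dict[str, Record], cohort: List[str],
                     features: List[str]) -> List[Record]:
    """Step 2: project each cohort member's record onto the requested features."""
    return [{"patient_id": pid, **{f: records[pid].get(f) for f in features}}
            for pid in cohort]


records: Dict[str, Record] = {
    "p1": {"gender": "female", "birthDate": "1970-02-01", "city": "Erlangen"},
    "p2": {"gender": "male", "birthDate": "1985-06-12", "city": "Bonn"},
    "p3": {"gender": "female", "birthDate": "1990-11-30", "city": "Kiel"},
}


def is_female(rec: Record) -> bool:
    return rec.get("gender") == "female"


print(feasibility(records, [is_female]))      # 2
cohort = select_cohort(records, [is_female])   # ['p1', 'p3']
print(extract_features(records, cohort, ["birthDate", "city"]))
```

Because the cohort step returns IDs rather than a count, the extraction step can operate on exactly the patients the feasibility query would have counted.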

Requirement Analysis

Based on the broader user story of expanding the FDPG’s capabilities for data extraction, we derive the following functional requirements for our prototype:

Capabilities for feasibility queries shall be expanded to obtain the cohort's health records.
A user interface shall be created to allow the user to express the feature set to extract from the cohort's health record items.
Different granularities in the data extraction for each health record item shall be supported by establishing filters (code, time restriction) to enable different scenarios, for example:
- A feature set of all laboratory values of a patient shall be obtainable.
- A feature set of all laboratory values of a patient in a specific time frame shall be obtainable.
- A feature set of a subset of laboratory values of a patient shall be obtainable.
- A feature set of a subset of laboratory values of a patient in a specific time frame shall be obtainable.
- ...
Feature selection on the attribute level for each health record item shall be supported.
The feature set shall be provided in a user-accessible format.
Redundant data storage should be avoided.
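
As a rough sketch of the filter requirement, assuming laboratory values have already been flattened into simple records, a single filter with an optional code set and an optional time window covers the four listed scenarios. The `LabValue` and `ItemFilter` types are hypothetical, and the LOINC codes are used for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class LabValue:
    patient_id: str
    code: str        # LOINC code of the laboratory test
    value: float
    effective: date  # date the value was recorded


@dataclass(frozen=True)
class ItemFilter:
    """Filter for one health record item; None means 'no restriction'."""
    codes: Optional[FrozenSet[str]] = None
    start: Optional[date] = None
    end: Optional[date] = None

    def matches(self, lab: LabValue) -> bool:
        if self.codes is not None and lab.code not in self.codes:
            return False
        if self.start is not None and lab.effective < self.start:
            return False
        if self.end is not None and lab.effective > self.end:
            return False
        return True


def apply_filter(labs: List[LabValue], flt: ItemFilter) -> List[LabValue]:
    return [lab for lab in labs if flt.matches(lab)]


GLUCOSE, HBA1C = "2345-7", "4548-4"  # illustrative LOINC codes
labs = [
    LabValue("p1", GLUCOSE, 98.0, date(2023, 1, 10)),
    LabValue("p1", HBA1C, 6.1, date(2023, 3, 5)),
    LabValue("p1", GLUCOSE, 110.0, date(2024, 2, 1)),
]
year_2023 = dict(start=date(2023, 1, 1), end=date(2023, 12, 31))

print(len(apply_filter(labs, ItemFilter())))                                         # all values: 3
print(len(apply_filter(labs, ItemFilter(**year_2023))))                              # time frame: 2
print(len(apply_filter(labs, ItemFilter(codes=frozenset({GLUCOSE})))))               # subset: 2
print(len(apply_filter(labs, ItemFilter(codes=frozenset({GLUCOSE}), **year_2023))))  # both: 1
```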

FDPG expansion

Our solution for data extraction can be based on the FDPG's architecture [13]. Figure 1 displays the current architecture for the FDPG's on-site use in a university hospital in a simplified manner.

For the feasibility process, despite its limitations [14], we can use FHIR Search by performing individual FHIR Search requests and logically combining the returned patient IDs before counting them. Standard FHIR servers support basic CRUD (Create, Read, Update, Delete) operations but lack advanced query capabilities, which limits their use in complex analytics [15].
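
The logical combination of FHIR Search results can be illustrated with plain set operations. The sketch assumes, as a simplification, that inclusion criteria are combined as an AND of OR-groups and exclusion criteria as an OR of AND-groups; the ID sets stand in for the patient IDs returned by individual FHIR Search requests, and `combine` is a hypothetical helper.

```python
from functools import reduce
from typing import List, Set

IdSet = Set[str]


def combine(inclusion: List[List[IdSet]], exclusion: List[List[IdSet]]) -> IdSet:
    """Combine per-criterion patient-ID sets into the cohort.

    inclusion: AND across groups, OR within a group (assumed structure)
    exclusion: OR across groups, AND within a group (assumed structure)
    At least one inclusion group is required.
    """
    included = reduce(set.intersection, [set().union(*group) for group in inclusion])
    excluded: IdSet = set()
    for group in exclusion:
        excluded |= reduce(set.intersection, group)
    return included - excluded


# Illustrative results of individual FHIR Search requests
female = {"p1", "p3", "p4"}      # e.g. Patient?gender=female
type1_diabetes = {"p1", "p2"}    # e.g. a Condition search by code
type2_diabetes = {"p3"}
deceased = {"p3"}

cohort = combine([[female], [type1_diabetes, type2_diabetes]], [[deceased]])
print(sorted(cohort), len(cohort))  # ['p1'] 1
```

Only the final `len(cohort)` would be reported for a feasibility query; the set itself is what a cohort definition needs.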

The Blaze FHIR server [16], commonly employed in the MII, overcomes some of these shortcomings by supporting the Clinical Quality Language (CQL) [17]. Among other things, CQL allows complex feasibility queries to be expressed directly, so that the FHIR Search execution engine shown in Figure 1 is not required and the patient count can be requested directly from the FHIR data store. Unfortunately, this approach does not currently support the direct extraction of features based on individual attributes.

FHIRPath

FHIRPath can be used to overcome these limitations. FHIRPath [18], a graph-traversal language created by HL7, allows data handling within the FHIR standard. It is a core technology across various tools that manipulate and query FHIR data. FHIRPath allows precise navigation of tree-structured data models, independent of their representation format. Inspired by the fluent interface pattern [19], its syntax combines paths, literals, operators, and function invocations, enabling complex operations such as data transformation, validation, and querying in clinical systems.
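
To make the navigation semantics concrete, the following toy evaluator handles only the simplest part of FHIRPath, dotted member navigation. It is not a FHIRPath engine (no functions, operators, or type handling); `navigate` is a hypothetical helper that merely mimics the collection semantics in which each step yields a collection, lists are flattened, and missing elements produce an empty result.

```python
from typing import Any, Dict, List


def navigate(resource: Dict[str, Any], path: str) -> List[Any]:
    """Evaluate a dotted member path such as 'Patient.address.city' (toy subset)."""
    steps = path.split(".")
    if steps and steps[0] == resource.get("resourceType"):
        steps = steps[1:]  # a leading type name refers to the resource itself
    collection: List[Any] = [resource]
    for step in steps:
        next_collection: List[Any] = []
        for item in collection:
            if isinstance(item, dict) and step in item:
                value = item[step]
                # repeating elements are flattened into one collection
                next_collection.extend(value if isinstance(value, list) else [value])
        collection = next_collection
    return collection


patient = {
    "resourceType": "Patient",
    "gender": "female",
    "birthDate": "1970-02-01",
    "address": [
        {"use": "home", "city": "Erlangen", "line": ["Main Street 1"]},
        {"use": "work", "city": "Munich"},
    ],
}

print(navigate(patient, "Patient.address.city"))   # ['Erlangen', 'Munich']
print(navigate(patient, "gender"))                 # ['female']
print(navigate(patient, "Patient.telecom.value"))  # []
```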

Pathling

CSIRO’s Pathling [20] is a FHIR server for analytics with extended API capabilities. These not only serve our particular use case of data extraction well but would also allow Pathling to be integrated into the overall architecture as a single solution for feasibility queries, cohort selection, and data extraction. Its API facilitates:

Complex Data Queries: Utilizing FHIRPath, Pathling enables more complex queries that span multiple FHIR resources, improving the comprehensiveness and depth of data analysis beyond the limitations of FHIR Search [14].
Attribute-Level Data Extraction: Pathling allows for the extraction of data based on specific attributes, meeting the needs of detailed analytical requirements without the necessity for extensive data manipulation.
Integration with Terminology Services: By incorporating terminology services, Pathling supports enhanced query capabilities that utilize standardized codes and controlled vocabularies, ensuring consistency and interoperability across different data systems.

While we decided on Pathling specifically to execute the data extraction, the separation of concerns allows us to consider the frontend and backend components separately. We will present the user interface first, followed by the exchange format used with the backend, and then return to the integration of Pathling.

User Interface

The FDPG's primary target user group is clinical researchers. For them, the user interface is an abstraction layer between the cohort definition and the underlying technical solution. To define the cohort, the user is presented with the same interface used to create a feasibility query [13]. The only difference is the returned result (a count for a feasibility query, a list of patient IDs for a cohort definition). Notably, the cohort only defines whose data shall be extracted. Without additional filtering, applied independently of the feasibility query, the entire medical record of each patient in the cohort would be extracted. A repeatable dynamic form element, whose content depends on the selected health record item (representing a FHIR resource type, e.g., medical conditions), presents the user with filters and selectable attributes. Each attribute is shown under a single descriptive name that abstracts its underlying FHIRPath expression, and a more detailed description is displayed on hover.

In this initial implementation, filter options are limited to code and time restrictions, which are the most frequently requested options [21]. The filters must be applied again, independently of the cohort definition, because the cohort definition determines whose data to extract, not what. Without further filtering, the cohort records contain all medical conditions, not only those searched for in the feasibility query. Since this foreseeably conflicts with the user's mental model, the final product should offer additional information to the user as well as means to construct the data selection from the feasibility query. This will allow users to extract the criteria of the feasibility query for verification purposes and to gain more detailed insight into the individual data elements.

Figure 2 displays a simple data selection form to extract the patient’s gender, birth date, and city. Additionally, all diabetes mellitus conditions are extracted, including the code, verification status, clinical status, and onset date.

The plus symbol allows the addition of further health record elements. Once the data selection is finalized, the user can inspect a final preview before submitting the query (Figure 3). The final preview displays only the selected attributes and applied filters, providing a more concise overview and allowing a DIC data steward to perform an additional final verification. While pre-validated allowlists of FHIRPath expressions may enable automated extraction in the future, we consider this manual verification indispensable during early adoption. An expert view toggle displays the underlying FHIRPath expressions, allowing a quick screening of the extraction performed when the query is executed.
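
The abstraction between descriptive names and FHIRPath expressions, together with the preview and its expert toggle, might be modeled as follows. The attribute catalog, its FHIRPath expressions, and `render_preview` are hypothetical illustrations rather than the FDPG's actual form model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Attribute:
    label: str        # descriptive name shown in the form
    description: str  # longer text shown on hover
    fhirpath: str     # underlying expression, shown only in the expert view


# Hypothetical attribute catalog for the health record item "Medical Conditions"
CONDITION: Dict[str, Attribute] = {
    "code": Attribute("Diagnosis code", "Coded diagnosis of the condition",
                      "code.coding.code"),
    "verification": Attribute("Verification status", "How certain the diagnosis is",
                              "verificationStatus.coding.code"),
    "clinical": Attribute("Clinical status", "Whether the condition is active",
                          "clinicalStatus.coding.code"),
    "onset": Attribute("Onset date", "When the condition began",
                       "onset.ofType(dateTime)"),
}


def render_preview(item: str, selected: List[str], filters: Dict[str, str],
                   expert: bool = False) -> List[str]:
    """Final preview: only selected attributes and applied filters are listed."""
    lines = [item]
    for key in selected:
        attr = CONDITION[key]
        lines.append(f"  {attr.label}" + (f"  [{attr.fhirpath}]" if expert else ""))
    for name, value in filters.items():
        lines.append(f"  filter {name}: {value}")
    return lines


selection = ["code", "onset"]
filters = {"code": "diabetes mellitus"}
print("\n".join(render_preview("Medical Conditions", selection, filters)))
print("\n".join(render_preview("Medical Conditions", selection, filters, expert=True)))
```

Keeping the FHIRPath expression on the same object as the label means the preview, the expert view, and any later translation all draw from a single source.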

The UI was developed by extending the existing FDPG feasibility UI with an additional module for data extraction, as presented above. For the UI representation, we used Angular's dynamic form components and created services to obtain the codings for the filter components from a terminology server and to translate the form representation into the exchange format used to communicate with the backend.

ViewDefinition

The choice of the data extraction exchange format is pivotal for separating concerns between the frontend and backend. For our solution, we chose the interoperable FHIR ViewDefinition. In its simplest form, each health record element is associated with a single FHIR resource; each directly extractable FHIRPath expression is expressed as a single column in the select clause, and the filters are described in the where clause. Basing our data extraction on the ViewDefinition will allow later adaptation and reuse of the developed technology stack once more FHIR servers implement execution engines based on the SQL on FHIR standard [22]. Three implementations beyond Pathling already support the execution of a ViewDefinition as defined here [23].
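
A minimal Python sketch of this simplest form, assuming SQL on FHIR v2 ViewDefinition syntax: `simple_view` is a hypothetical helper, `getReferenceKey` is a function defined by the SQL on FHIR specification, and the ICD-10-GM filter (type 2 diabetes only) is purely illustrative rather than the is-a filter used in the prototype. `.first()` is used because ViewDefinition columns are expected to hold a single value unless declared as collections.

```python
import json
from typing import Dict, List


def simple_view(resource: str, columns: Dict[str, str], filters: List[str]) -> dict:
    """Simplest-form ViewDefinition: one column per FHIRPath, filters in 'where'."""
    return {
        "resourceType": "ViewDefinition",
        "name": f"{resource.lower()}_extract",
        "status": "draft",
        "resource": resource,
        "select": [{"column": [{"name": n, "path": p} for n, p in columns.items()]}],
        "where": [{"path": f} for f in filters],
    }


view = simple_view(
    "Condition",
    {
        "patient_id": "subject.getReferenceKey(Patient)",  # SQL on FHIR v2 function
        "code": "code.coding.code.first()",
        "clinical_status": "clinicalStatus.coding.code.first()",
        "onset": "onset.ofType(dateTime)",
    },
    # Illustrative filter only: ICD-10-GM codes starting with E11
    ["code.coding.where(system = 'http://fhir.de/CodeSystem/bfarm/icd-10-gm'"
     " and code.startsWith('E11')).exists()"],
)
print(json.dumps(view, indent=2))
```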

Figure 4 depicts the ViewDefinition used to extract the demographic information (gender, birth date, address type, and city) and diagnoses of diabetes mellitus (the specific diagnosis code that is-a diabetes mellitus, the verification status, the clinical status, and the onset time) of a previously defined cohort. The path information provides the necessary computable information. To obtain this representation from the user interface, we translate the Angular form components of the UI elements. In addition to the information relevant for display, each form component holds the underlying FHIRPath expression it represents. Therefore, during translation, the information within the form and its components can be restructured and reduced to what is relevant for the ViewDefinition. Appendix 1 provides a JSON representation of the dynamic form translated into the displayed ViewDefinition.
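
The translation step can be sketched as a function that drops display-only information (hover texts, unselected attributes) and keeps the FHIRPath expressions. The shape of the `form` dictionary and the names `column_name` and `form_to_view` are assumptions for illustration; the actual JSON representation is given in Appendix 1.

```python
from typing import Any, Dict

# Hypothetical shape of one dynamic form component as held by the UI
form: Dict[str, Any] = {
    "item": {"label": "Medical Conditions", "resourceType": "Condition"},
    "attributes": [
        {"label": "Diagnosis code", "hover": "Coded diagnosis",
         "fhirpath": "code.coding.code.first()", "selected": True},
        {"label": "Verification status", "hover": "Certainty of the diagnosis",
         "fhirpath": "verificationStatus.coding.code.first()", "selected": True},
        {"label": "Recorder", "hover": "Who recorded the condition",
         "fhirpath": "recorder.reference", "selected": False},
    ],
    "filters": [{"label": "Code",
                 "fhirpath": "code.coding.where(code.startsWith('E11')).exists()"}],
}


def column_name(label: str) -> str:
    """Turn a display label into a database-friendly column name."""
    return "_".join(label.lower().split())


def form_to_view(component: Dict[str, Any]) -> dict:
    """Keep only what the ViewDefinition needs: selected paths and filters."""
    columns = [{"name": column_name(a["label"]), "path": a["fhirpath"]}
               for a in component["attributes"] if a["selected"]]
    return {
        "resourceType": "ViewDefinition",
        "status": "draft",
        "resource": component["item"]["resourceType"],
        "select": [{"column": columns}],
        "where": [{"path": f["fhirpath"]} for f in component["filters"]],
    }


print(form_to_view(form))
```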

Integrating Pathling into the Existing Architecture

Initially, we intended to base our solution solely on the Pathling server. Given the FDPG's architectural choice to work with an intermediate query format, which only requires a translation component and Pathling’s API for aggregation, we hoped that Pathling could replace Blaze for our use case. Figure 5 shows how Pathling could support all three required capabilities:

Feasibility queries, by translating the existing structured representation into filter expressions and calling Pathling's $aggregate endpoint.
Cohort definition, by repurposing the structured representation of the feasibility query as the cohort definition, differing only in execution by using the $extract endpoint to obtain the patient IDs.
Data extraction, by transferring a structured representation of the feature set to extract and its filters to the column and filter parameters of the $extract endpoint and running the query for each previously identified patient ID.
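
The mapping of the three capabilities onto Pathling's endpoints could look roughly as follows. The sketch only builds request URLs and sends nothing; the base URL is hypothetical, the `aggregation`, `column`, and `filter` parameter names follow Pathling's documented `$aggregate` and `$extract` operations, and the per-patient filter expression is an illustrative assumption. `view_to_extract_url` handles only plain columns, not `forEach` clauses.

```python
from typing import List, Sequence
from urllib.parse import urlencode

BASE = "http://localhost:8080/fhir"  # hypothetical Pathling endpoint


def aggregate_url(resource: str, filters: List[str]) -> str:
    """Feasibility query: count matching resources via $aggregate."""
    params = [("aggregation", "count()")] + [("filter", f) for f in filters]
    return f"{BASE}/{resource}/$aggregate?{urlencode(params)}"


def extract_url(resource: str, columns: List[str], filters: List[str]) -> str:
    """Cohort definition or data extraction via $extract (CSV result)."""
    params = [("column", c) for c in columns] + [("filter", f) for f in filters]
    return f"{BASE}/{resource}/$extract?{urlencode(params)}"


def view_to_extract_url(view: dict, extra_filters: Sequence[str] = ()) -> str:
    """Map a simple ViewDefinition (plain columns only) onto $extract parameters."""
    columns = [c["path"] for sel in view["select"] for c in sel.get("column", [])]
    filters = [w["path"] for w in view.get("where", [])] + list(extra_filters)
    return extract_url(view["resource"], columns, filters)


female = "gender = 'female'"
print(aggregate_url("Patient", [female]))       # feasibility: count
print(extract_url("Patient", ["id"], [female]))  # cohort: patient IDs

view = {"resource": "Condition",
        "select": [{"column": [{"name": "code", "path": "code.coding.code"}]}]}
for pid in ["p1", "p3"]:                         # data extraction per patient
    print(view_to_extract_url(view, [f"subject.reference = 'Patient/{pid}'"]))
```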

Before we integrated Pathling, Behrend et al. compared the performance of Pathling and Blaze and found that Blaze outperformed Pathling by a factor of 10 for feasibility queries [24]. Consequently, we adjusted the architecture to use Pathling as a staging layer. Figure 6 shows the resulting architecture, whose implementation is available on GitHub [25].

The process is initiated by the user defining a cohort and the features to be extracted. The frontend sends the cohort definition, expressed in the Clinical Cohort Definition Language (CCDL), and the ViewDefinitions to the backend. There, the cohort is first identified and extracted via the FLARE component, shown as an execution engine that uses FHIR Search to answer cohort queries defined in the CCDL, and staged on the Pathling server. Afterward, the ViewDefinitions are converted to parameters of Pathling's $extract operation, and the resulting CSVs are joined into a single CSV that the user can download via the UI. For a detailed overview of the process, refer to Figure 7.
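
The final step, joining the per-ViewDefinition CSVs into one file, is sketched below with Python's `csv` module. The join strategy is an assumption made for illustration (a left join on a hypothetical `patient_id` column, producing one row per matching pair), not necessarily the strategy used in the implementation.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, str]


def read_csv(text: str) -> List[Row]:
    return list(csv.DictReader(io.StringIO(text)))


def left_join(base: List[Row], other: List[Row], key: str = "patient_id") -> List[Row]:
    """One output row per (base row, matching other row); unmatched base rows are kept."""
    other_cols = [c for c in (other[0] if other else {}) if c != key]
    by_key: Dict[str, List[Row]] = {}
    for row in other:
        by_key.setdefault(row[key], []).append(row)
    joined: List[Row] = []
    for row in base:
        for match in by_key.get(row[key], [{}]):  # [{}] yields empty fields
            joined.append({**row, **{c: match.get(c, "") for c in other_cols}})
    return joined


def write_csv(rows: List[Row]) -> str:
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


demographics = read_csv("patient_id,gender\np1,female\np2,male\n")
conditions = read_csv("patient_id,code,onset\np1,E11.9,2020-05-01\n")
print(write_csv(left_join(demographics, conditions)), end="")
```

A left join keeps patients without matching conditions in the output, so the row set still reflects the whole cohort; whether that is desirable depends on the research question.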

Verification and Validation

To verify the technical correctness of our implementation, we created a synthetic test data set and leveraged our knowledge of its content. This allowed us to manually verify the correct extraction and filtering for all FHIRPath expressions. We were further able to combine these checks and evaluate whether the extracted data were correct with respect to the filter criteria. For example, if we search for a cohort of female patients, obtain a cohort of 250 individuals, and then extract the criterion data element from the same cohort, we expect the gender attribute to match the concept female 250 times, with no deviations. While trivial, this approach allows us to verify the correctness of our tooling, as the actual outcome aligns with the outcome expected from our knowledge of the data.
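
The knowledge-based check described above can be expressed as a small helper. `verify_extraction` is hypothetical; it compares the extracted rows with the cohort size and the criterion value expected from the synthetic data, and the 250-patient data below is generated for illustration.

```python
from collections import Counter
from typing import Dict, List


def verify_extraction(rows: List[Dict[str, str]], column: str,
                      expected_value: str, expected_count: int) -> List[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems: List[str] = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(rows)}")
    deviations = Counter(r.get(column) for r in rows if r.get(column) != expected_value)
    for value, n in sorted(deviations.items(), key=lambda kv: str(kv[0])):
        problems.append(f"{n} row(s) with {column}={value!r}")
    return problems


# Cohort "female patients" on synthetic data with 250 known matches
extracted = [{"patient_id": f"p{i}", "gender": "female"} for i in range(250)]
print(verify_extraction(extracted, "gender", "female", 250))  # []

broken = extracted + [{"patient_id": "x1", "gender": "male"}]
print(verify_extraction(broken, "gender", "female", 250))
```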

Automated testing remains outstanding, but full automation without UI integration would be of limited value, as it would only test an already verified product, Pathling.

Despite the detailed work on the user interface and our usability considerations, validating the tooling is beyond the scope of the current prototype. The developed tooling was presented to several medical informatics experts, from whom first insights into the future development of the final product could already be derived (see Discussion).

In this study, we focused on developing a prototype to inform the further development of federated and automated data selection and extraction on FHIR data. Given the aim of this study, uncovering additional constraints, complexities, and shortcomings of the software used, and identifying future work, is as important as demonstrating the approach's viability.

The Role of FHIR Profiles

A central point in the discussion of the presented prototype is the role of FHIR profiles in creating UI elements for data extraction. In prior work on feasibility queries, we used FHIR profiles to generate the search ontology, predominantly relying on the semantic information within the profiles [26]. In that context, highly specific profiles are also valuable for deriving and representing semantic interdependencies between criteria, a topic we previously explored in depth [27]. For data extraction, by contrast, the focus shifts. Deriving the FHIRPath expressions that the user interface abstracts from FHIR profiles inherently shifts the emphasis from semantic to syntactic aspects. The semantic information remains relevant only for defining the filter elements.

Given that each FHIR profile derives from a FHIR base resource and adheres to FHIR's 80/20 rule, less specific profiles can effectively capture the most needed data elements. We therefore propose a dual approach: less specific profiles efficiently capture a broad spectrum of data elements, while more specific profiles are used where necessary to ensure precise targeting. This allows extensive data extraction without requiring individual extraction entries for each profile, while retaining the option for such specificity. Selecting the attributes of a resource as a clinical entity aims to enhance usability: by grouping related data elements under a single UI component, the interaction required from users is minimized, reducing cognitive load and decision fatigue [28].

To illustrate the idea, consider two approaches to extracting vital parameters. One could select the UI element vital signs, where common signs are preselected or selectable as filters, so that the attributes to extract, such as value, coding, and recording date, need to be specified only once. In contrast, treating each vital sign individually would require repeated attribute selection, increasing the user's task load. However, this method should remain available to accommodate more precise attribute selection needs.
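
The difference in task load can be quantified with a short sketch that builds one grouped Observation ViewDefinition versus one per vital sign. The LOINC codes, FHIRPath expressions, and helper names are illustrative assumptions; the point is only that the grouped approach requires each attribute to be selected once, whereas the individual approach multiplies the selections by the number of vital signs.

```python
from typing import Dict, List

# Illustrative LOINC codes for three vital signs
VITAL_SIGNS: Dict[str, str] = {
    "8867-4": "heart_rate",
    "8310-5": "body_temperature",
    "9279-1": "respiratory_rate",
}
# Attributes the researcher selects: column name -> FHIRPath
ATTRIBUTES: Dict[str, str] = {
    "value": "value.ofType(Quantity).value",
    "unit": "value.ofType(Quantity).unit",
    "recorded": "effective.ofType(dateTime)",
}


def code_filter(codes: List[str]) -> str:
    """FHIRPath filter matching any of the given LOINC codes."""
    alternatives = " | ".join(f"'{c}'" for c in codes)
    return f"code.coding.where(system = 'http://loinc.org' and code in ({alternatives})).exists()"


def observation_view(codes: List[str], attributes: Dict[str, str]) -> dict:
    columns = [{"name": "code", "path": "code.coding.code.first()"}]
    columns += [{"name": n, "path": p} for n, p in attributes.items()]
    return {"resourceType": "ViewDefinition", "status": "draft", "resource": "Observation",
            "select": [{"column": columns}], "where": [{"path": code_filter(codes)}]}


# Grouped: one "vital signs" element, attributes selected once
grouped = [observation_view(list(VITAL_SIGNS), ATTRIBUTES)]
# Individual: one element per vital sign, attributes selected repeatedly
individual = [observation_view([code], ATTRIBUTES) for code in VITAL_SIGNS]


def attribute_selections(views: List[dict]) -> int:
    """Attribute choices the user had to make (excluding the fixed code column)."""
    return sum(len(v["select"][0]["column"]) - 1 for v in views)


print(attribute_selections(grouped), attribute_selections(individual))  # 3 9
```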

The prototype developed here already indicates the validity of this approach: the data extraction elements, developed on Synthea-based data, can be used to query instance data based on the CDS, because both rely on common FHIR attributes of resources such as Observation.

Alternative Tooling to Pathling

Besides Pathling, multiple tools exist to perform data extraction:

The Matchbox FHIR server supports the FHIR Mapping Language (FML) [29], enabling implementers to express transformation rules between two directed acyclic graphs. Within the context of FHIR, it can be, and is, used to map between different versions or profiles of the same resource type. It is further used to transform existing structures such as HL7 CDA documents into FHIR [30]. Within FML, FHIRPath expressions are used to navigate the source and target structures.
FHIR Extinguisher [4] provides a façade for FHIR servers that filters resource data using FHIR Search and projects the tree structure onto flat tables based on FHIRPath expressions.
FHIR GraphQL and its associated tools allow data to be retrieved from FHIR resources using the GraphQL specification. The output of a FHIR GraphQL request is a JSON representation of the extracted attributes. Although it resembles the structure of a FHIR resource, it cannot be validated by a FHIR validator because it does not comply with the FHIR standard.
FHIR-PYrate [31] is a Python package that provides an API based on the FHIRPath specification to extract DataFrames (tabular representations) from connected FHIR servers. It uses a Python implementation of FHIRPath to execute the expressions [32].
FHIR Cracker [33] is an R package that, unlike the other solutions, uses XPath [34] but provides the same capabilities to download and flatten FHIR resources.
Firely Query Language (FQL) is a proprietary query language developed by Firely and included in its Firely Terminal tooling [35] specifically for querying FHIR resources. It merges elements of SQL, JSON, and FHIRPath, enabling users to retrieve and project FHIR data efficiently. It targets dynamic implementation guides, though its underlying principles can be applied to any FHIR data.

Our choice of Pathling was motivated by its fully integrated support for data aggregation and data extraction, which would have avoided redundant data storage and thus inherently reduced privacy risks. Arguably, given the deviation from the ideal architecture (Fig. 5), many of the presented tools are valid alternatives. Nevertheless, we kept Pathling because of its direct support for the CSV format and its integration with the terminology server, both of which significantly lower implementation effort.

CSV as Extraction Format

For this prototype, the CSV format was chosen given clinical researchers' familiarity with tabular data [36]. Despite its accessibility, CSV lacks the robust data validation capabilities that the FHIR standard can provide. In federated networks such as the Medical Informatics Initiative (MII), the absence of such validation impedes scalability and compromises data integrity, as the quality of extracted data can only be assured with proper validation [37]. This poses significant challenges for data joining and linking processes.

A FHIR-based data exchange within a federated network, as intended by the MII, could overcome this challenge. Instead of using the FHIRPath expressions to extract and convert data, they would be used to redact the FHIR resources. This would enable a FHIR validation step to verify the data for correctness and uniformity, thereby ensuring more reliable data joining and linking [37].
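
Redaction, as opposed to conversion, can be sketched as keeping only the selected elements while the result remains a FHIR resource that could be passed to a validator. This toy version, a hypothetical `redact` helper, works on top-level elements only; a real implementation would have to handle nested paths and must retain elements that the base resource or profile marks as mandatory, or validation would fail.

```python
import copy
from typing import Any, Dict, Iterable

# Elements kept regardless of the selection so the result stays an identifiable resource
ALWAYS_KEEP = ("resourceType", "id", "meta")


def redact(resource: Dict[str, Any], keep: Iterable[str]) -> Dict[str, Any]:
    """Return a deep copy of a FHIR resource restricted to the selected top-level elements."""
    allowed = set(ALWAYS_KEEP) | set(keep)
    return {k: copy.deepcopy(v) for k, v in resource.items() if k in allowed}


condition = {
    "resourceType": "Condition",
    "id": "c1",
    "subject": {"reference": "Patient/p1"},
    "code": {"coding": [{"system": "http://fhir.de/CodeSystem/bfarm/icd-10-gm",
                         "code": "E11.9"}]},
    "onsetDateTime": "2020-05-01",
    "recorder": {"reference": "Practitioner/x"},
    "note": [{"text": "free-text remark"}],
}

# Condition.subject is mandatory (1..1), so it must survive redaction
redacted = redact(condition, ["subject", "code", "onsetDateTime"])
print(sorted(redacted))  # ['code', 'id', 'onsetDateTime', 'resourceType', 'subject']
```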

However, this integration will likely require further governmental and infrastructure measures, e.g., commitment to a FHIR validator, the establishment of a central repository for FHIR profiles and terminology acting as a single source of truth, and making their content available in the DICs for validation. In return, integrating FHIR and its validation mechanisms into the existing framework would significantly improve data interoperability and reliability, which is crucial for the long-term sustainability of healthcare data exchange across the federated network.

Despite FHIR's advantages, many researchers still prefer CSV for its simplicity. To address this, an abstraction layer that supports both formats would be highly beneficial. This layer would enable the redaction of FHIR instance data to include only the attributes relevant to a specific research question, allowing a seamless transition to CSV after successful data validation, merging, and linking. Such a solution would combine the simplicity of CSV with the structural robustness of FHIR, potentially increasing its adoption among researchers. The approach presented here demonstrates a method in which CSV extraction is independent of the redaction process while maintaining consistency in the attributes transferred to CSV.

This study demonstrates the potential of integrating Pathling with the existing frameworks of the Medical Informatics Initiative (MII) and the German Portal for Medical Research Data (FDPG) to enhance the extraction of data features from FHIR cohorts. Our findings show that a prototype can effectively expand the feature set for local data extraction at clinical sites, providing a robust basis for actionable insights. This initial prototype will support stakeholder engagement in the development process. By integrating feedback mechanisms early on, we aim to address diverse concerns such as data privacy, system scalability, usability, and data quality.

Availability of data and material

The implementation of the feasibility platform is available at: https://github.com/geloro94/feasibility-deploy/tree/pathling_extraction_deploy

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Funding

The project was funded by the German Federal Ministry of Education and Research (BMBF) under the FDPG-PLUS Project, with grant numbers 01ZZ2309A, 01ZZ2309C, 01ZZ2309D, 01ZZ2309E, 01ZZ2309F, and DSF Community Project, with grant number 01ZZ2307A.

Authors’ Contributions

LR wrote the initial version of the manuscript and was primarily responsible for the conceptual and implementation work. JG oversaw the project, provided valuable feedback, and contributed to all aspects of the manuscript. PB offered significant insights into Pathling's performance compared to Blaze, influencing the resulting architecture. LT participated in discussions regarding data extraction and its implications for future adaptations of this work. MK contributed to conceptualizing this work within the overarching MII process and discussed future implications, particularly concerning the role of FHIR profiles. RM provided insights into data extraction from a clinician's perspective and gave substantial structural feedback on the initial draft. HP served in an advisory capacity and provided substantial structural feedback on the initial draft. JI served in an advisory capacity throughout the research process. All authors actively reviewed, provided feedback, and approved the final version of this paper.

Acknowledgments

Not applicable.

Teece DJ. Profiting from innovation in the digital economy: Enabling technologies, standards, and licensing models in the wireless world. Research Policy 2018 Oct;47(8):1367–1387. doi: 10.1016/j.respol.2017.01.015
ATLAS. GitHub. Available from: https://github.com/OHDSI/Atlas/wiki/Home [accessed Jan 2, 2024]
Murphy SN, Weber G, Mendis M, Gainer V, Chueh HC, Churchill S, Kohane I. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). Journal of the American Medical Informatics Association 2010 Mar 1;17(2):124–130. doi: 10.1136/jamia.2009.000893
Oehm J, Storck M, Fechner M, Brix TJ, Yildirim K, Dugas M. FhirExtinguisher: A FHIR Resource Flattening Tool Using FHIRPath. In: Mantas J, Stoicu-Tivadar L, Chronaki C, Hasman A, Weber P, Gallos P, Crişan-Vida M, Zoulias E, Chirila OS, editors. Studies in Health Technology and Informatics. IOS Press; 2021. doi: 10.3233/SHTI210369
Weber GM, Murphy SN, McMurry AJ, MacFadden D, Nigrin DJ, Churchill S, Kohane IS. The Shared Health Research Information Network (SHRINE): A Prototype Federated Query Tool for Clinical Data Repositories. Journal of the American Medical Informatics Association 2009 Sep 1;16(5):624–630. doi: 10.1197/jamia.M3191
Semler S, Wissing F, Heyder R. German Medical Informatics Initiative: A National Approach to Integrating Health Data from Patient Care and Medical Research. Methods Inf Med 2018 Jul;57(S 01):e50–e56. doi: 10.3414/ME18-03-0003
Heyder R, NUM Coordination Office, Kroemer HK, Wiedmann S, Pley C, Heyer C, Heuschmann P, Vehreschild JJ, Krefting D, Illig T, Nauck M, Schaller J, Kraus M, Hoffmann W, Stahl D, Hanß S, Anton G, Schäfer C, Reese J-P, Hopff SM, Lorbeer R, Lorenz-Depiereux B, Prokosch H-U, Zenker S, Eils R, Bucher A, Kleesiek J, Vogl T, Hamm B, Penzkofer T, Schirrmeister W, Röhrig R, Walcher F, Majeed R, Erdmann B, Scheithauer S, Grundmann H, Dilthey A, Bludau A, NUKLEUS Study Group, NUM-RDP Coordination, RACOON Coordination, AKTIN Coordination, GenSurv Study Group. Das Netzwerk Universitätsmedizin: Technisch-organisatorische Ansätze für Forschungsdatenplattformen. Bundesgesundheitsbl 2023 Feb;66(2):114–125. doi: 10.1007/s00103-022-03649-1
Howie L, Hirsch B, Locklear T, Abernethy AP. Assessing The Value Of Patient-Generated Data To Comparative Effectiveness Research. Health Affairs 2014 Jul;33(7):1220–1228. doi: 10.1377/hlthaff.2014.0225
Prokosch H-U, Gebhardt M, Gruendner J, Kleinert P, Buckow K, Rosenau L, Semler SC. Towards a National Portal for Medical Research Data (FDPG): Vision, Status, and Lessons Learned. Studies in Health Technology and Informatics. IOS Press; 2023. doi: 10.3233/SHTI230124
Albashiti F, Thasler R, Wendt T, Bathelt F, Reinecke I, Schreiweis B. Die Datenintegrationszentren – Von der Konzeption in der Medizininformatik-Initiative zur lokalen Umsetzung in einem Netzwerk Universitätsmedizin. Bundesgesundheitsbl 2024 Jun;67(6):629–636. doi: 10.1007/s00103-024-03879-5
Ganslandt T, Boeker M, Löbe M, Prasser F, Schepers J, Semler S, Thun S, Sax U. Der Kerndatensatz der Medizininformatik-Initiative: Ein Schritt zur Sekundärnutzung von Versorgungsdaten auf nationaler Ebene. Forum der Medizin-Dokumentation und Medizin-Informatik 2018;20(1):17–21.
Ammon D, Kurscheidt M, Buckow K, Kirsten T, Löbe M, Meineke F, Prasser F, Saß J, Sax U, Stäubert S, Thun S, Wettstein R, Wiedekopf JP, Wodke JAH, Boeker M, Ganslandt T. Arbeitsgruppe Interoperabilität: Kerndatensatz und Informationssysteme für Integration und Austausch von Daten in der Medizininformatik-Initiative. Bundesgesundheitsbl 2024 Jun;67(6):656–667. doi: 10.1007/s00103-024-03888-4
Gruendner J, Deppenwiese N, Folz M, Köhler T, Kroll B, Prokosch H-U, Rosenau L, Rühle M, Scheidl M-A, Schüttler C, Sedlmayr B, Twrdik A, Kiel A, Majeed RW. The Architecture of a Feasibility Query Portal for Distributed COVID-19 Fast Healthcare Interoperability Resources (FHIR) Patient Data Repositories: Design and Implementation Study. JMIR Med Inform 2022 May 25;10(5):e36709. doi: 10.2196/36709
Gulden C, Mate S, Prokosch H-U, Kraus S. Investigating the Capabilities of FHIR Search for Clinical Trial Phenotyping. German Medical Data Sciences: A Learning Healthcare System. IOS Press; 2018;3–7. doi: 10.3233/978-1-61499-896-9-3
Lehne M, Sass J, Essenwanger A, Schepers J, Thun S. Why digital medicine depends on interoperability. npj Digit Med 2019 Dec;2(1):79. doi: 10.1038/s41746-019-0158-1
Blaze. 2023. Available from: https://github.com/samply/blaze [accessed Jun 12, 2023]
Clinical Quality Language (CQL). Available from: https://cql.hl7.org/ [accessed May 10, 2024]
Fhirpath - FHIR v5.0.0. Available from: https://www.hl7.org/fhir/fhirpath.html [accessed May 10, 2024]
bliki: Fluent Interface. martinfowler.com. Available from: https://martinfowler.com/bliki/FluentInterface.html [accessed Apr 21, 2024]
Grimes J, Szul P, Metke-Jimenez A, Lawley M, Loi K. Pathling: analytics on FHIR. J Biomed Semant 2022 Sep 8;13(1):23. doi: 10.1186/s13326-022-00277-1
Huang S. Tradeoffs between leveraging FHIR REST APIs vs. GraphQL APIs. DevDays; 2023. Available from: https://www.youtube.com/watch?v=bSvlihRU2oA
Home - SQL on FHIR v0.0.1-pre. Available from: https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/index.html [accessed Apr 21, 2024]
SOF tests. Available from: https://fhir.github.io/sql-on-fhir-v2/#impls [accessed Apr 21, 2024]
Behrend P. Performance Evaluation of FHIR Servers regarding Feasibility Queries for Clinical Trials [Master’s thesis]. [Lübeck, Germany]: University of Lübeck; 2024.
geloro94/feasibility-deploy at pathling_extraction_deploy. Available from: https://github.com/geloro94/feasibility-deploy/tree/pathling_extraction_deploy [accessed May 15, 2024]
Rosenau L, Majeed RW, Ingenerf J, Kiel A, Kroll B, Köhler T, Prokosch H-U, Gruendner J. Generation of a Fast Healthcare Interoperability Resources (FHIR)-based Ontology for Federated Feasibility Queries in the Context of COVID-19: Feasibility Study. JMIR Med Inform 2022 Apr 27;10(4):e35789. doi: 10.2196/35789
Rosenau L, Behrend P, Wiedekopf J, Gruendner J, Ingenerf J. Uncovering Harmonization Potential in Health Care Data Through Iterative Refinement of Fast Healthcare Interoperability Resources Profiles Based on Retrospective Discrepancy Analysis: Case Study. JMIR Med Inform 2024 Jul 23;12:e57005. doi: 10.2196/57005
Sweller J. Cognitive load theory, learning difficulty, and instructional design. Learning and Instruction 1994 Jan;4(4):295–312. doi: 10.1016/0959-4752(94)90003-5
Mapping-language - FHIR v5.0.0. Available from: https://www.hl7.org/fhir/mapping-language.html [accessed Apr 21, 2024]
Dimitrov A, Duftschmid G. Generation of FHIR-Based International Patient Summaries from ELGA Data. In: Schreier G, Pfeifer B, Baumgartner M, Hayn D, editors. Studies in Health Technology and Informatics. IOS Press; 2022. doi: 10.3233/SHTI220339
Hosch R, Baldini G, Parmar V, Borys K, Koitka S, Engelke M, Arzideh K, Ulrich M, Nensa F. FHIR-PYrate: a data science friendly Python package to query FHIR servers. BMC Health Serv Res 2023 Jul 6;23(1):734. doi: 10.1186/s12913-023-09498-1
beda-software/fhirpath-py: FHIRPath implementation in Python. Available from: https://github.com/beda-software/fhirpath-py [accessed May 11, 2024]
Palm J, Meineke FA, Przybilla J, Peschel T. “fhircrackr”: An R Package Unlocking Fast Healthcare Interoperability Resources for Statistical Analysis. Appl Clin Inform 2023 Jan;14(01):054–064. doi: 10.1055/s-0042-1760436
XML Path Language (XPath) 3.1. Available from: https://www.w3.org/TR/xpath-31/ [accessed May 10, 2024]
Firely Terminal | Automate FHIR Validation, File and Release Management. Firely. Available from: https://fire.ly/products/firely-terminal/ [accessed May 11, 2024]
Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics 2009 Apr;42(2):377–381. doi: 10.1016/j.jbi.2008.08.010
Hund H, Wettstein R, Hampf C, Bialke M, Kurscheidt M, Schweizer ST, Zilske C, Mödinger S, Fegeler C. No Transfer Without Validation: A Data Sharing Framework Use Case. In: Hägglund M, Blusi M, Bonacina S, Nilsson L, Cort Madsen I, Pelayo S, Moen A, Benis A, Lindsköld L, Gallos P, editors. Studies in Health Technology and Informatics. IOS Press; 2023. doi: 10.3233/SHTI230066

Additional Declarations

No competing interests reported.

Supplementary Files

Appendix1.json

Reviews received at journal: 10 Nov, 2024
Reviewers agreed at journal: 27 Sep, 2024
Reviewers agreed at journal: 26 Sep, 2024
Reviewers agreed at journal: 25 Sep, 2024
Reviewers invited by journal: 24 Sep, 2024
Editor assigned by journal: 02 Sep, 2024
Submission checks completed at journal: 30 Aug, 2024
First submitted to journal: 26 Aug, 2024

Status:

Version 1