Identifying Key Predictive Variables in Medical Records Using a Large Language Model (LLM)

doi:10.21203/rs.3.rs-4957517/v1

Perspective

This work is licensed under a CC BY 4.0 License

Version 1

EHR systems are widely used, but leveraging their unstructured clinical notes for insights has been challenging. Large Language Models (LLMs) can offer scalable, precise extraction of pertinent information from clinical notes. This paper presents a novel framework for using LLMs to derive medical insights from EHRs, demonstrated through an assessment on female infertility within the Veterans Health Administration (VHA), combining unstructured and structured data for enhanced analysis.

Health sciences/Health care/Public health

Physical sciences/Materials science

Physical sciences/Mathematics and computing

Artificial Intelligence

Large Language Models

Healthcare Analytics

Data Science in Medicine

In the United States (US), the transition from paper to electronic health records (EHR) was notably accelerated by initiatives such as the Meaningful Use program under the HITECH Act¹ from the Centers for Medicare & Medicaid Services. Beginning in 2009, this program was intended to enhance data capture and improve patient outcomes.² Before this, the Veterans Health Administration (VHA), the largest integrated healthcare system in the US, had already independently developed and deployed the Veterans Health Information Systems and Technology Architecture (VistA), which captures medical records for millions of Veterans daily.^3,4 Despite the widespread adoption of EHR systems in the US, the translation of EHR data into meaningful insights has been limited.^5,6

Traditionally, EHR research and assessment have focused on structured data such as diagnostic codes, demographics, and lab results, leading to numerous insights in epidemiology, as well as advancements in clinical disease prediction and the identification of drug repurposing targets.^7–11 Despite these notable achievements, structured data are often limited in providing a comprehensive understanding of a patient’s clinical presentation.^6,12 Additional information within the medical record, often found in unstructured clinical text, can provide crucial details that might otherwise be unavailable. Prior estimates have even suggested that approximately 80% of medical data remains unstructured and unutilized following documentation.¹³

Previously, traditional Natural Language Processing (tNLP) techniques (e.g., rule-based approaches) have been used to extract insights from unstructured data.¹⁴ However, tNLP, while effective, requires extensive, time-intensive preprocessing and lacks contextual understanding.^15,16 Since the 1960s,¹⁷ and more recently with the development of Google’s transformer model in 2017¹⁸ and OpenAI’s release of ChatGPT in 2022,¹⁹ the emergence of foundational Large Language Models (LLMs) has enabled more scalable and flexible analysis of unstructured text. This scalability is particularly crucial within the VHA, which operates across more than 1,255 healthcare facilities, including 170 medical centers and over 1,074 outpatient sites.²⁰ The diversity in language and documentation practices across these numerous sites necessitates an information extraction approach that can adapt to varying terminologies and nuances, making LLMs invaluable for extracting meaningful insights from the vast and varied unstructured clinical text within the VHA.

LLMs, which are trained on vast amounts of textual data and designed to understand context, are able to understand and generate text based on specific requirements. As such, these models are well suited to efficiently process and interpret natural language, adapt to medical terminologies, and analyze unstructured data with greater flexibility and efficiency than tNLP methods.^21–23 The robustness of LLMs, based on their design and training data, enables more nuanced information extraction from medical records.²⁴ By enhancing the ability to analyze unstructured text, these models hold the potential to significantly improve clinical decision-making, patient outcomes, and overall healthcare delivery.

Objective:

We introduce a comprehensive and scalable framework for leveraging LLMs to extract population-level medical insights from EHRs with a focus on unstructured data. Using female infertility as an illustrative example, we outline a methodology that includes (1) data collection and preparation, (2) theme discovery, (3) data integration, and (4) next steps for data analyses. We discuss the integration of LLM outputs with structured data to build comprehensive multimodal datasets that can be used to enhance more traditional predictive models. Theme discovery is particularly crucial for our framework, as the risk factors for female infertility are still not fully understood. Our approach leverages LLMs to uncover hidden patterns and themes within unstructured notes, providing new insights into potential risk factors that might otherwise remain undetected. This secure and scalable framework also facilitates the analysis of vast amounts of medical records within a reasonable timeframe. Given the significant push to prioritize women’s health research within VHA,²⁵ this work addresses a critical gap where lack of attention and dedicated research efforts have left many questions unanswered. A visual outline of our proposed methods can be found in Fig. 1.

An underlying goal is to demonstrate how LLMs can be effectively integrated into the VHA’s existing infrastructure to unlock deeper insights from unstructured EHR data. By refining our understanding of conditions such as female infertility and improving data integration methods, this approach aims to enhance the precision of healthcare delivery and inform more targeted strategies within the VHA, ultimately improving patient outcomes across the Veteran population.

[Insert Fig. 1 Approximately here]

LLM Framework:

Data Collection and Preparation:

Defining Outcome

For our outcome of interest, we explore the medical condition of female infertility, defined by the inability to conceive after 12 months of unprotected sexual intercourse.²⁶ Research on predicting female infertility has drawn the interest of several multidisciplinary groups in recent years.²⁷ Causes for female infertility have been shown to be influenced by various factors such as deficient ovulation, physical disorders, ovarian diseases, endometriosis, cervical trauma, and defective implantation, among others.²⁸ Risk prediction in female infertility has traditionally relied on structured EHR data (i.e., laboratory results, diagnostic codes, etc.). However, relying solely on structured data may overlook critical nuances and contextual information embedded within clinical narratives, which often capture subtleties in patient history, symptom progression, and physician observations. By incorporating unstructured data from clinical notes, our approach aims to reveal these hidden details, offering a more comprehensive understanding of the factors contributing to female infertility. This integration is essential for identifying less obvious risk factors and ensuring that the models developed are both thorough and reflective of the complexities inherent in clinical care. This paper details steps and strategies for using LLMs to create a multimodal dataset aimed at improving the detection of early risk factors for female infertility, which could, in turn, inform strategies for risk mitigation and improving patient outcomes.

Population Selection

For this example, we utilized a retrospective observational analysis framework using data from the Veterans Health Administration (VHA) EHR database, accessed through an Azure Databricks v13.3 data lake.²⁹ EHR data between January 18th, 2006, and January 18th, 2024, were assessed for this project.

Female patients aged 18 to 35 with at least one primary care encounter within the last three years of the observational period were included. We excluded those over the age of 35, as this is considered “advanced maternal age,” a known risk for fertility complications.²⁶ Using the diagnostic code for female infertility (ICD-9 code: 628.9 or ICD-10 code: N97), we identified patients with a documented infertility diagnosis within their medical records. Female patients of the same age range with no diagnosis of infertility were included in this project as our controls, thus establishing our case-control framework. As a proof of concept, a random sample of 100 infertility cases and 100 controls was identified and examined.
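The selection logic above can be sketched in a few lines. This is a minimal illustration rather than the project's actual query: the `Patient` record layout, the `had_recent_primary_care` flag, prefix matching on `N97`, and the five demo patients are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field

# Diagnostic codes named in the text. Matching ICD-10 by the "N97" prefix
# is an assumption of this sketch.
INFERTILITY_ICD9 = {"628.9"}
INFERTILITY_ICD10_PREFIX = "N97"


@dataclass
class Patient:
    """Hypothetical flattened patient record."""
    patient_id: int
    sex: str
    age: int
    had_recent_primary_care: bool  # encounter in the last three years
    dx_codes: set = field(default_factory=set)


def is_eligible(p):
    """Female, aged 18 to 35, with a recent primary care encounter."""
    return p.sex == "F" and 18 <= p.age <= 35 and p.had_recent_primary_care


def is_case(p):
    """True when any diagnosis code indicates female infertility."""
    return any(
        code in INFERTILITY_ICD9 or code.startswith(INFERTILITY_ICD10_PREFIX)
        for code in p.dx_codes
    )


def sample_cohorts(patients, n_per_group=100, seed=0):
    """Return (cases, controls), each a random sample of eligible patients."""
    eligible = [p for p in patients if is_eligible(p)]
    cases = [p for p in eligible if is_case(p)]
    controls = [p for p in eligible if not is_case(p)]
    rng = random.Random(seed)
    return (
        rng.sample(cases, min(n_per_group, len(cases))),
        rng.sample(controls, min(n_per_group, len(controls))),
    )


# Illustrative, made-up patients.
demo = [
    Patient(1, "F", 29, True, {"N97.0"}),
    Patient(2, "F", 41, True, {"N97.9"}),   # excluded: over 35
    Patient(3, "F", 24, True, {"J30.9"}),
    Patient(4, "F", 33, False, {"628.9"}),  # excluded: no recent primary care
    Patient(5, "F", 31, True, {"628.9"}),
]
cases, controls = sample_cohorts(demo, n_per_group=2)
```

Fixing the random seed keeps the sampled cohorts reproducible across runs, which is useful when prompts are refined iteratively against the same notes.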

Data Collection

Among our sample population, we extracted structured data from the VHA’s EHR, including patient demographics, diagnostic codes, medications, and laboratory results. In total, over 450 structured variables were extracted (list of structured variables can be found in Supplement Table 1). Unstructured data collection focused on clinical notes from primary care and women’s health clinic encounters. For infertility cases, we randomly extracted one progress note per patient within 5 years prior to their first infertility diagnosis; for controls, we randomly extracted one progress note within 5 years before their latest primary care or women’s health clinic encounter.
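The note-sampling rule can be sketched as follows. The tuple layout for notes, the approximation of five years as 5 × 365 days, and the choice to exclude notes dated on the index date itself are assumptions of this sketch. For cases the index date would be the first infertility diagnosis, and for controls the latest primary care or women’s health clinic encounter.

```python
import random
from datetime import date, timedelta

# "5 years" approximated in days (an assumption; leap days are ignored).
LOOKBACK = timedelta(days=5 * 365)


def pick_note(notes, index_date, rng):
    """Pick one random note dated within the lookback window before index_date.

    notes: list of (note_date, note_text) tuples.
    Returns the chosen tuple, or None when no note falls in the window.
    """
    window = [n for n in notes if index_date - LOOKBACK <= n[0] < index_date]
    return rng.choice(window) if window else None


# Illustrative, made-up notes for a single patient.
demo_notes = [
    (date(2015, 1, 10), "too old"),
    (date(2021, 3, 2), "note a"),
    (date(2022, 11, 20), "note b"),
    (date(2023, 7, 1), "after index"),
]
picked = pick_note(demo_notes, date(2023, 6, 1), random.Random(0))
```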

Scalable Analytical Tools

For this proof-of-concept methodology, we focused on analyzing unstructured data with Meta’s Llama-3-8b,³⁰ an open-source LLM, using the foundation model loaded through HuggingFace’s Transformers³¹ with no additional fine-tuning. Models were served and run for inference on the ND A100 v4-series virtual machine.³² To scale analyses, the Ray open-source unified compute framework was used to distribute workloads among Databricks GPU clusters.³³ This configuration optimized the analysis and improved processing time. All analyses operated within the Databricks Azure environment on the VA Enterprise Cloud.

Ethics Statement

This quality assessment project received a determination of non-research from the Stanford Institutional Review Board (Stanford University, Stanford, CA, USA), Protocol #74380.

Theme Discovery:

Theme discovery is a pivotal step in achieving the primary objective of this work – creating a comprehensive multimodal dataset that integrates both structured and unstructured data to enhance predictive models. By identifying and understanding the underlying themes within unstructured clinical notes, we can extract valuable insights that may not be captured through structured data alone. This section details the initial process for identifying themes within the unstructured clinical notes. By comparing the prevalence of themes among our infertility and control sample populations, we identified key differentiating themes between groups.

Prompt Engineering

We employed iterative prompt engineering to develop and refine our prompts. Prompt engineering is the process of designing and refining the questions or instructions provided to a language model to guide its output.³⁴ This process not only uncovers new themes, but also refines previously identified ones, ensuring a robust and comprehensive thematic exploration.

To extract high-level themes from the text, we developed initial prompts which instructed the model to identify and extract themes present within the clinical note. We used zero-, one-, and few-shot prompting. Zero-shot prompting obtained initial results from the model without additional influence, serving as a baseline for theme identification. One- and few-shot prompting enabled in-context learning by providing one or a few sample notes with identified themes, which helped the model better understand the context and nuances of the task. Figure 2a details an example of a zero-shot prompt utilized to extract these themes from each clinical note, and Figure 2b illustrates an example of a one-shot prompt we tested. Based on sampled results, we iteratively refined our prompts, enabling the nuanced extraction of themes.
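The difference between zero-, one-, and few-shot prompts can be made concrete with a small prompt builder. The instruction wording, the `build_prompt` helper, and the example notes below are hypothetical and are not the prompts shown in Figure 2; the sketch only illustrates how worked examples are prepended for in-context learning.

```python
# Hypothetical instruction text; not the wording used in Figure 2.
INSTRUCTION = (
    "Read the clinical note below and list the high-level themes it "
    "contains, one per line."
)


def build_prompt(note, examples=()):
    """Build a theme-extraction prompt.

    examples: sequence of (example_note, [themes]) pairs.
    An empty sequence gives a zero-shot prompt, one pair a one-shot prompt,
    and several pairs a few-shot prompt.
    """
    parts = [INSTRUCTION]
    for example_note, example_themes in examples:
        themes = "\n".join(f"- {t}" for t in example_themes)
        parts.append(f"Example note:\n{example_note}\nThemes:\n{themes}")
    # The target note always comes last, ending where the model should continue.
    parts.append(f"Note:\n{note}\nThemes:")
    return "\n\n".join(parts)


# Made-up example notes for illustration only.
shots = [
    ("Patient reports irregular cycles.", ["Ovulation disorders"]),
    ("Discussed plans to conceive next year.", ["Pregnancy planning and counseling"]),
]
zero_shot = build_prompt("Follow-up visit for medication refill.")
one_shot = build_prompt("Follow-up visit for medication refill.", shots[:1])
few_shot = build_prompt("Follow-up visit for medication refill.", shots)
```

Keeping the instruction fixed and varying only the examples makes it easier to attribute changes in extracted themes to the in-context examples rather than to rewording.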

This iterative refinement of prompts was guided by subject matter experts (SMEs), including medical doctors (MDs) and data scientists, ensuring that the extracted themes were both clinically relevant and aligned with the goals of our assessment. The MDs were all board-certified, with extensive knowledge in reproductive medicine. They played a crucial role in reviewing the model’s outputs by providing expert feedback on the clinical validity of the identified themes, thereby helping to ensure that the themes were reflective of real-world clinical scenarios. The data scientists provided technical explanations of the model’s behavior, ensuring that the refinements were both clinically sound and technically feasible. During this process, SMEs engaged in discussions to resolve any differing opinions. In cases where discrepancies arose, a consensus was reached through collaborative discussion, with each expert providing insights based on their knowledge and clinical practice.

As we refined the prompts, the few-shot prompt approach yielded more precise and meaningful theme extraction. In this context, “meaningful” themes are those that can offer valuable clinical insights, contributing to a deeper understanding of female infertility by highlighting potential risk factors or patterns that might otherwise be overlooked.

[Figure 2 Approximately here]

Class-Specific Themes Analysis

Once our few-shot prompt approach identified and extracted relevant themes, we conducted a comparison analysis to determine which themes were prevalent among each sample group. By calculating the absolute difference in the prevalence of themes among the case and control cohorts, we identified ‘infertility concerns,’ ‘pregnancy planning and counseling,’ and ‘ovulation disorders’ to be most divergent between case and control groups (Table 1). These findings were further validated alongside SMEs, ensuring they accurately align with established clinical understandings of female infertility. If more granularity was needed, for instance if one of the identified themes was related to ‘ovulation disorders,’ we would delve into the specific keywords associated with this theme, such as ‘polycystic ovary syndrome (PCOS),’ ‘irregular menstrual cycles,’ or ‘anovulation’ for further adjudication.

Table 1

Top 20 Themes Based upon Prevalence Difference Among Groups
High Level Theme	Infertility Group, n	Control Group, n	Difference
Infertility concerns	17	3	14
Pregnancy planning and counseling	12	2	10
Ovulation disorders	15	5	10
Hormonal concerns	9	0	9
HPV* results and screening	5	14	9
Screening recommendations	20	25	5
Medication review	10	15	5
Allergy and sinus conditions	0	5	5
Appointment management	12	8	4
Mental health concerns	10	14	4
Abnormal pap smear test results**	8	4	4
Patient education and counseling	1	5	4
Pain in upper extremities	0	4	4
Headaches and migraines	4	1	3
Bacterial vaginosis	4	7	3
Painful intercourse	1	4	3
Toxic exposure concerns	0	3	3
Treatment plan	0	3	3
Iron deficiency	3	1	2
Hematological abnormalities	2	0	2
*HPV = human papillomavirus. **Pap smear = Papanicolaou test, a method of cervical screening used to detect potentially precancerous and cancerous processes in the cervix.

[Table 1 Approximately here]
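The ranking in Table 1 follows from a simple absolute-difference calculation. The sketch below reproduces it for the first six rows of the table; the `rank_by_difference` helper is a hypothetical name introduced for illustration.

```python
# (infertility group n, control group n) for the first six rows of Table 1.
counts = {
    "Infertility concerns": (17, 3),
    "Pregnancy planning and counseling": (12, 2),
    "Ovulation disorders": (15, 5),
    "Hormonal concerns": (9, 0),
    "HPV results and screening": (5, 14),
    "Screening recommendations": (20, 25),
}


def rank_by_difference(theme_counts):
    """Return (theme, case_n, control_n, abs_diff) rows, largest difference first.

    Python's sort is stable, so ties keep their input order.
    """
    rows = [
        (theme, case_n, control_n, abs(case_n - control_n))
        for theme, (case_n, control_n) in theme_counts.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


ranked = rank_by_difference(counts)
```

Because the difference is absolute, themes that are more common among controls (such as HPV results and screening) rank alongside themes that are more common among cases, so the direction of each difference still needs to be read from the group counts.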

We identified five themes that most effectively distinguished between the infertility cases and controls: (1) Infertility concerns, (2) Pregnancy planning and counseling, (3) Ovulation disorders, (4) Hormone imbalances, and (5) Abnormal pap smear. We selected a cutoff of five themes for further analysis to strike a balance between comprehensiveness and manageability. This decision was influenced by the upcoming formal thematic analysis, where it was essential to focus on a set number of themes that could be effectively processed by the model. Limiting the number of themes to five allowed us to maintain analytical depth without overwhelming the model, ensuring that each theme could be thoroughly explored and integrated into future analyses.

Formal Thematic Analysis:

Once we identified the key themes, we developed additional prompts to determine if these themes were present within the clinical notes. Employing a similar approach to prompt engineering as described earlier, we utilized zero-, one-, and few-shot prompting techniques. For instance, Fig. 3 illustrates the instructions for a zero-shot prompt. Subsequent iterations and consultations with SMEs refined our few-shot prompting method, enhancing the model’s ability to identify and confidently assert the presence of specific themes within the clinical text. Figure 4 showcases a sample output from the model, using fictitious patient data for illustrative purposes, detailing both the prevalence and the confidence levels of the identified themes.

[Figure 3 Approximately here]

[Figure 4 Approximately here]

By configuring the model to estimate the likelihood of a theme’s presence within the text, rather than simply categorizing presence as a binary variable, we generated numerical probabilities. This methodology facilitates a more nuanced analysis and can contribute significantly to the development of predictive models that are both more sophisticated and informative.
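Turning model responses into numerical probabilities requires a parsing step. The sketch below assumes, purely for illustration, that the model is asked to answer with a JSON object mapping each theme to a number; the actual output format shown in Figure 4 may differ, and `parse_theme_probabilities` is a hypothetical helper.

```python
import json

# The five themes selected for formal thematic analysis.
THEMES = [
    "infertility concerns",
    "pregnancy planning and counseling",
    "ovulation disorders",
    "hormone imbalances",
    "abnormal pap smear",
]


def parse_theme_probabilities(raw_output):
    """Parse a model response into {theme: probability or None}.

    Assumes (hypothetically) that the response is a JSON object keyed by theme.
    Values are clamped to [0, 1]; missing or unparseable values become None
    so that they are not mistaken for a genuine probability of zero.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    probabilities = {}
    for theme in THEMES:
        try:
            value = float(data.get(theme))
        except (TypeError, ValueError):
            probabilities[theme] = None
            continue
        probabilities[theme] = min(1.0, max(0.0, value))
    return probabilities


# Made-up model response for illustration.
parsed = parse_theme_probabilities(
    '{"infertility concerns": 0.95, "ovulation disorders": 1.3}'
)
```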

Data Integration:

The final step in our analysis preparation involved transforming the data into a tabular format, where each row corresponded to a patient record and the columns represented the numerical probabilities of whether or not the extracted themes were present within the unstructured text. This data configuration allowed us to merge the newly created probability-encoded variables with our curated structured dataset, which included diagnostic codes, laboratory results, and demographic information for each patient. The resultant multimodal dataset allows for the simultaneous analysis of variables derived from both structured and unstructured data sources. Table 2 details an example of the multi-modal dataset we were able to develop.

Table 2

Example of multi-modal dataset with extracted theme variables.
	Structured Data							Unstructured Data
Patient ID	Race	Ethnicity	BMI	Infertility dx	Pap Smear* Date	Pap Result	….	Theme: infertility concerns	Theme: ovulation disorders	Theme: abnormal pap	….
1	African American	Hispanic	39.8	1	06/17/2023	Normal	….	0.95	0.20	0.0	….
2	White	Non-Hispanic	26.2	0	08/26/2021	Normal	….	0.0	0.10	0.0	….
3	White	Hispanic	40.1	1	02/02/2019	Abnormal	….	0.81	0.62	0.75	….
4	Asian	Non-Hispanic	35.0	1	04/07/2024	Normal	….	0.91	0.76	0.0	….
5	Pacific Islander	Non-Hispanic	29.8	0	03/15/2018	Abnormal	….	0.05	0.11	0.80	….
6	White	Hispanic	31.5	0	06/18/2022	Abnormal	….	0.06	0.08	0.70	….
7	African American	Non-Hispanic	27.8	0	10/29/2023	Normal	….	0.01	0.05	0.05	….
8	White	Non-Hispanic	24.6	1	11/05/2021	Normal	….	0.96	0.80	0.10	….
9	Native American	Hispanic	41.8	1	05/12/2023	Normal	….	0.80	0.67	0.05	….
10	Asian	Non-Hispanic	29.1	0	07/10/2021	Abnormal	….	0.03	0.06	0.85	….
….	….	….	….	….	….	….	….	….	….	….	….
*Pap smear = Papanicolaou test, a method of cervical screening used to detect potentially precancerous and cancerous processes in the cervix.

[Table 2 Approximately here]
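The merge that produces a table like Table 2 amounts to a left join of theme probabilities onto the structured records by patient identifier. The sketch below uses plain dictionaries to stay dependency-free; in practice a dataframe library would likely be used. Patients 1 and 2 reuse values from Table 2, while patient 99, the column names, and the `merge_modalities` helper are hypothetical.

```python
# Structured variables keyed by patient ID (values for 1 and 2 from Table 2).
structured = {
    1: {"bmi": 39.8, "infertility_dx": 1},
    2: {"bmi": 26.2, "infertility_dx": 0},
    99: {"bmi": 30.0, "infertility_dx": 0},  # hypothetical patient with no scored note
}

# Theme probabilities derived from unstructured notes, keyed by patient ID.
theme_probs = {
    1: {"theme_infertility_concerns": 0.95, "theme_ovulation_disorders": 0.20},
    2: {"theme_infertility_concerns": 0.0, "theme_ovulation_disorders": 0.10},
}

THEME_COLUMNS = ["theme_infertility_concerns", "theme_ovulation_disorders"]


def merge_modalities(structured_rows, theme_rows, theme_columns):
    """Left-join theme probabilities onto structured records.

    Returns one dict per patient; theme columns are None when the patient
    has no scored note, rather than being silently set to zero.
    """
    merged = []
    for patient_id, fields in sorted(structured_rows.items()):
        row = {"patient_id": patient_id, **fields}
        probs = theme_rows.get(patient_id, {})
        for column in theme_columns:
            row[column] = probs.get(column)
        merged.append(row)
    return merged


rows = merge_modalities(structured, theme_probs, THEME_COLUMNS)
```

Keeping missing theme values as `None` preserves the distinction between "theme judged absent" and "no note available", which matters once the table feeds a predictive model.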

This approach, from class-specific theme identification through data structuring for pattern analysis, emphasizes the potential of unsupervised learning techniques in extracting meaningful insights from complex, unstructured medical datasets. Through careful application of these methods, we can advance our understanding of specific medical conditions, like female infertility, and contribute valuable knowledge that can be used to improve patient outcomes.

Next Steps:

Once we have established the multimodal dataset, both advanced and traditional statistical analyses can be employed. For instance, we may choose to conduct exploratory data analysis (EDA), further identifying key associations between our extracted features and the diagnosis of infertility. Applying statistical tests, such as Pearson’s r or chi-square tests, we can compare the prevalence of specific features between our case-control cohorts.
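As an illustration of such a comparison, the sketch below applies a Pearson chi-square test to a single theme. It assumes the Table 1 counts for ‘infertility concerns’ (17 cases, 3 controls) are out of 100 notes per group, consistent with one note per sampled patient. No continuity correction is applied, and the closed-form p-value is valid only for a 2 × 2 table (one degree of freedom).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Rows are groups (cases, controls); columns are theme present / absent.
    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# 'Infertility concerns': 17 of 100 case notes vs 3 of 100 control notes
# (denominators of 100 per group are an assumption of this sketch).
stat, p_value = chi_square_2x2(17, 83, 3, 97)
```

With many themes tested at once, some adjustment for multiple comparisons would be advisable before interpreting individual p-values.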

We can additionally leverage machine learning techniques to enhance our understanding of female infertility, identify those at highest risk, or potentially even identify previously unknown modifiable risk factors. These techniques could include traditional approaches such as multivariable logistic regression, or more complex algorithms such as random forests and gradient boosting machines, known for their predictive accuracy and robustness against overfitting.³⁵ For instance, Lasso logistic regression could identify the most predictive features of female infertility by utilizing the algorithm's inherent feature selection capabilities. Additionally, models capable of processing sequential data, such as time-series models³⁶ or Long Short-Term Memory (LSTM) networks,^37,38 could analyze temporal patterns in clinical notes, potentially uncovering longitudinal risk factors for infertility.
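To show how the Lasso penalty performs feature selection, the following is a minimal, dependency-free sketch of L1-penalized logistic regression fitted by proximal gradient descent (ISTA). The solver choice, the hyperparameters, and the tiny centered dataset are assumptions made for illustration; a real analysis would more likely rely on an established library implementation.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; small values become exactly zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.

    X: list of feature rows, y: list of 0/1 labels.
    The intercept is not penalized. Returns (weights, intercept).
    """
    n, d = len(X), len(X[0])
    weights, intercept = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = intercept + sum(w * x for w, x in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - label
            for j in range(d):
                grad_w[j] += error * row[j] / n
            grad_b += error / n
        # Gradient step on the log-loss, then soft-thresholding for the L1 term.
        weights = [
            soft_threshold(w - lr * g, lr * lam) for w, g in zip(weights, grad_w)
        ]
        intercept -= lr * grad_b
    return weights, intercept


# Made-up, centered features: column 0 separates the groups, column 1 is
# only weakly related to the label.
cases = [(0.40, 0.10), (0.30, -0.10), (0.45, 0.10), (0.35, -0.10)]
controls = [(-a, -b) for a, b in cases]
X = [list(row) for row in cases + controls]
y = [1] * len(cases) + [0] * len(controls)

weights, intercept = fit_l1_logistic(X, y)
```

In this toy example the penalty keeps the weakly related feature's weight at zero while the informative feature receives a positive weight, which is the selection behavior described above.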

This pragmatic framework provides an outline of how LLMs can be leveraged at VHA to extract and interpret complex medical data to enhance the care of our Veterans. This framework demonstrates the potential to rapidly advance clinical care, leveraging the large volume of otherwise underutilized information embedded within unstructured EHR notes. The integration of LLMs with traditional structured data will allow for more robust analyses and the development of more impactful predictive models.

The use of LLMs to analyze EHR data for female infertility patients showcases their potential to uncover patterns and associations linked to this condition. Detailed descriptions of symptoms, patient histories, even lifestyle behaviors and environmental exposures, could be quantitatively analyzed and correlated with fertility outcomes. This dataset allows for a more comprehensive analysis, enhancing our understanding of female infertility beyond what structured data analysis alone can provide. The methodologies for validating the LLM outputs are crucial to ensure reliability and accuracy. Cross-validation with different datasets, comparison with traditional methods, and expert reviews are essential steps to establish the validity of findings.

The potential applications of LLMs extend beyond female infertility to other complex and nuanced conditions. For instance, the subtle, often subjective descriptions of symptom progression in ALS or the detailed patient interactions noted in the case of Alzheimer’s disease have previously been systematically analyzed, offering new insights into these elusive conditions.^35,36 These broad applications suggest that LLMs could play a crucial role in transforming medical research across various domains. However, this also introduces new challenges and considerations, particularly in terms of the rigorous validation of model outputs and the management of ethical issues such as data privacy and the potential for bias. These considerations underscore the need for a careful, regulated approach to the use of AI in healthcare, ensuring that these powerful tools are used responsibly and effectively.³⁹

Although this represents a proof of concept for the use of LLMs at VHA, several considerations must be acknowledged. The findings are based on a deliberately limited cohort at VHA and are intended solely to illustrate our methodology within our unique healthcare system and patient population. In addition, the results are derived from a constrained sample size within VHA, may not be representative of the broader Veteran population, and are not expected to be generalizable to other populations. Furthermore, the methods employed in this paper for creating a multimodal dataset from structured and unstructured text are not exhaustive. Numerous alternative approaches could potentially be explored to enhance data integration and analysis. Additionally, the methods and results of this assessment are ongoing, and they should not be considered finalized or actionable at this early stage. Therefore, the content of this manuscript should not be utilized to guide clinical decisions or diagnoses. Further evaluation with a more extensive and varied cohort, along with rigorous clinical validation, is essential to establish the reliability and applicability of this approach.

In conclusion, leveraging LLMs to extract additional knowledge from EHR data represents a promising avenue for enhancing our understanding through a more comprehensive analysis of both structured and unstructured data. As we continue to refine these models, the potential for significant advancements in predictive healthcare and personalized medicine, both to improve our patients’ outcomes and to advance the healthcare system, is immense. The collaborative, iterative approach highlighted in our analysis, involving continuous engagement with different domain experts, is crucial.

Acknowledgements: We would like to express our sincere gratitude to the medical providers at the Palo Alto VA Healthcare System for serving as our subject matter experts for this project. Their clinical expertise and thoughtful feedback were instrumental in guiding the prompt engineering process and ensuring the clinical relevance of the themes extracted. Their dedication and insights have significantly enhanced the quality and impact of this research.

Author contribution statement: All authors contributed equally to the conceptualization and to the writing (review and editing) of the manuscript. ZPV: visualization, methodology, formal analysis (lead), writing - original draft (lead). ADW: data curation, formal analysis (supporting), visualization, writing - original draft (supporting). TLB: data curation, visualization, writing - original draft (supporting). PJH: formal analysis (supporting), visualization, writing - original draft (supporting). MP: formal analysis (supporting), writing - original draft (supporting). LY: formal analysis (supporting), writing - original draft (supporting). TFO: writing - original draft (supporting), supervision.

Conflict of Interest Statement: The authors have no conflicts of interest or disclosures to report.

Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the U.S. Department of Veterans Affairs or the United States Government.

Data Availability Statement: Due to US Department of Veterans Affairs (VA) regulations and our ethics agreements, the analytic data sets used for this study are not permitted to leave the VA firewall without a Data Use Agreement. This limitation is consistent with other studies based on VA data. However, VA data are made freely available to researchers with an approved VA study protocol. For more information, please visit https://www.virec.research.va.gov or contact the VA Information Resource Center at [email protected]

Code Availability Statement: The full code used for the analyses presented in this paper is available upon request. Due to US Department of Veterans Affairs (VA) regulations and our ethics agreements, the full codebase cannot be shared publicly. Interested researchers may contact the corresponding author to discuss access under appropriate agreements. The code used for the initial theme exploration and for the formal thematic analysis has been made available on GitHub: annadware/LLM_Theme_Extraction: Identifying Key Predictive Variables in Medical Records within the VHA using a LLM (github.com). Any additional code other than that available on GitHub will be provided for academic and research purposes only and will be subject to relevant data use agreements.

Blumenthal D, Tavenner M. The “Meaningful Use” Regulation for Electronic Health Records. New England Journal of Medicine. 2010;363(6):501-504. doi:10.1056/NEJMp1006114
Krishnaraj A, Siddiqui A, Goldszal A. Meaningful use: Participating in the federal incentive program. Journal of the American College of Radiology. 2014;11(12):1205-1211. doi:10.1016/j.jacr.2014.09.012
Noël PH, Copeland LA, Perrin RA, et al. VHA Corporate Data Warehouse height and weight data: opportunities and challenges for health services research. J Rehabil Res Dev. 2010;47(8):739-750. doi:10.1682/JRRD.2009.08.0110
Corporate Data Warehouse (CDW). Accessed May 20, 2024. https://www.hsrd.research.va.gov/for_researchers/cdw.cfm
Holmes JH, Beinlich J, Boland MR, et al. Why Is the Electronic Health Record So Challenging for Research and Clinical Care? Methods Inf Med. 2021;60(1-02):32. doi:10.1055/S-0041-1731784
Agency for Healthcare Research and Quality. Registries for Evaluating Patient Outcomes: A User’s Guide. Addendum 2: Tools and Technologies for Registry Interoperability. doi:10.23970/AHRQEPCREGISTRIES3ADDENDUM2
Zong N, Wen A, Moon S, et al. Computational drug repurposing based on electronic health records: a scoping review. NPJ Digit Med. 2022;5(1). doi:10.1038/S41746-022-00617-6
Xu H, Li J, Jiang X, Chen Q. Electronic Health Records for Drug Repurposing: Current Status, Challenges, and Future Directions. Clin Pharmacol Ther. 2020;107(4):712. doi:10.1002/CPT.1769
Coley RY, Boggs JM, Beck A, Simon GE. Predicting outcomes of psychotherapy for depression with electronic health record data. J Affect Disord Rep. 2021;6. doi:10.1016/J.JADR.2021.100198
Soerensen SJC, Thomas IC, Schmidt B, et al. Using an Automated Electronic Health Record Score To Estimate Life Expectancy In Men Diagnosed With Prostate Cancer In The Veterans Health Administration. Urology. 2021;155:70-76. doi:10.1016/J.UROLOGY.2021.05.056
Hasan O, Barkat R, Rabbani A, Rabbani U, Mahmood F, Noordin S. Charlson comorbidity index predicts postoperative complications in surgically treated hip fracture patients in a tertiary care hospital: Retrospective cohort of 1045 patients. International Journal of Surgery. 2020;82:116-120. doi:10.1016/j.ijsu.2020.08.017
Schiltz NK, Foradori MA, Reimer AP, Plow M, Dolansky MA. Availability of information on functional limitations in structured electronic health records data. J Am Geriatr Soc. 2022;70(7):2161. doi:10.1111/JGS.17776
Kong HJ. Managing Unstructured Big Data in Healthcare System. Healthc Inform Res. 2019;25(1):1. doi:10.4258/HIR.2019.25.1.1
Singh S. Natural Language Processing for Information Extraction. Published online July 6, 2018. Accessed May 20, 2024. https://arxiv.org/abs/1807.02383v1
Chapman WW, Nadkarni PM, Hirschman L, D’Avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc. 2011;18(5):540. doi:10.1136/AMIAJNL-2011-000465
Adnan K, Akbar R. Limitations of information extraction methods and techniques for heterogeneous unstructured big data. International Journal of Engineering Business Management. 2019;11. doi:10.1177/1847979019890771
Raiaan MAK, Mukta MSH, Fatema K, et al. A Review on Large Language Models: Architectures, Applications, Taxonomies, Open Issues and Challenges. IEEE Access. 2024;12:26839-26874. doi:10.1109/ACCESS.2024.3365742
Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need. Adv Neural Inf Process Syst. 2017;2017-December:5999-6009. Accessed May 20, 2024. https://arxiv.org/abs/1706.03762v7
Roumeliotis KI, Tselikas ND. ChatGPT and Open-AI Models: A Preliminary Review. Future Internet 2023, Vol 15, Page 192. 2023;15(6):192. doi:10.3390/FI15060192
About VHA - Veterans Health Administration. Accessed August 18, 2024. https://www.va.gov/health/aboutvha.asp
Belyaeva A, Cosentino J, Hormozdiari F, et al. Multimodal LLMs for health grounded in individual-specific data. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2023;14315 LNCS:86-102. doi:10.1007/978-3-031-47679-2_7
Alqahtani T, Badreldin HA, Alrashed M, et al. The emergent role of artificial intelligence, natural learning processing, and large language models in higher education and research. Research in Social and Administrative Pharmacy. 2023;19(8):1236-1242. doi:10.1016/J.SAPHARM.2023.05.016
Stade EC, Stirman SW, Ungar LH, et al. Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation. NPJ Mental Health Research. 2024;3(1). doi:10.1038/S44184-024-00056-Z
Telenti A, Auli M, Hie BL, Maher C, Saria S, Ioannidis JPA. Large language models for science and medicine. Eur J Clin Invest. 2024;54(6). doi:10.1111/ECI.14183
FACT SHEET: President Biden Issues Executive Order and Announces New Actions to Advance Women’s Health Research and Innovation | The White House. Accessed August 18, 2024. https://www.whitehouse.gov/briefing-room/statements-releases/2024/03/18/fact-sheet-president-biden-issues-executive-order-and-announces-new-actions-to-advance-womens-health-research-and-innovation/
Vander Borght M, Wyns C. Fertility and infertility: Definition and epidemiology. Clin Biochem. 2018;62:2-10. doi:10.1016/J.CLINBIOCHEM.2018.03.012
Tadepalli SK, Lakshmi PV. A Comprehensive and Systematic Literature Review of Computational Intelligence Algorithms to Diagnose and Predict Female Infertility. Ann Rom Soc Cell Biol. 25:5926-5943. Accessed May 20, 2024. https://www.researchgate.net/publication/351286144_A_Comprehensive_and_Systematic_Literature_Review_of_Computational_Intelligence_Algorithms_to_Diagnose_and_Predict_Female_Infertility
Roupa Z, Polikandrioti M, Sotiropoulou P, et al. Causes of infertility in women at reproductive age. Health Science Journal. 2009;3(2):80-87. Accessed May 20, 2024. https://pure.unic.ac.cy/en/publications/causes-of-infertility-in-women-at-reproductive-age
Data Lakehouse Architecture | Databricks. Accessed April 10, 2024. https://www.databricks.com/product/data-lakehouse
Touvron H, Lavril T, Izacard G, et al. LLaMA: Open and Efficient Foundation Language Models. Published online February 27, 2023. Accessed June 11, 2024. https://arxiv.org/abs/2302.13971v1
Wolf T, Debut L, Sanh V, et al. HuggingFace’s Transformers: State-of-the-art Natural Language Processing. Published online October 9, 2019. Accessed June 11, 2024. https://arxiv.org/abs/1910.03771v5
ND A100 v4-series - Azure Virtual Machines | Microsoft Learn. Accessed June 11, 2024. https://learn.microsoft.com/en-us/azure/virtual-machines/nda100-v4-series
Ray Clusters Overview — Ray 2.24.0. Accessed June 11, 2024. https://docs.ray.io/en/latest/cluster/getting-started.html
Meskó B. Prompt Engineering as an Important Emerging Skill for Medical Professionals: Tutorial. J Med Internet Res. 2023;25(1). doi:10.2196/50638
Freeman EA, Moisen GG, Coulston JW, Wilson BT. Random forests and stochastic gradient boosting for predicting tree canopy cover: comparing tuning processes and model performance. doi:10.1139/cjfr-2014-0562
Che Z, Purushotham S, Cho K, Sontag D, Liu Y. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep. 2018;8(1):1-12. doi:10.1038/s41598-018-24271-9
Graves A. Generating Sequences With Recurrent Neural Networks. Published online August 4, 2013. Accessed May 20, 2024. https://arxiv.org/abs/1308.0850v5
Hochreiter S, Schmidhuber J. Long Short-Term Memory. Neural Comput. 1997;9(8):1735-1780. doi:10.1162/NECO.1997.9.8.1735
Trustworthy AI - VA Artificial Intelligence. Accessed August 18, 2024. https://department.va.gov/ai/trustworthy-ai/

No competing interests reported.
