This section provides an overview of the findings from our scoping review, organized into the subsections that emerged during our analysis. We begin with the characteristics of the studies and the types of data they used. Next, we analyze the studies from a technical perspective, including data preprocessing techniques, SSRL model types, models for downstream tasks, evaluation metrics, and interpretability techniques. Table 1 summarizes the key technical features of the studies, and Table 2 provides essential information from the medical perspective.
Study characteristics
As illustrated in Fig. 2, most of the research (n = 33, 72%) was conducted by interdisciplinary teams of medical experts and data scientists. The United States led in the number of published studies (n = 21, 46%), followed by China (n = 9, 20%) and the United Kingdom (n = 4, 9%). Despite this geographic diversity, only a few studies (n = 11, 24%) involved international collaborations. For details on the authors and research teams, refer to Table S2.
Type of model and trend
Five main model types have been identified for representing EHR categorical data: Transformer-based models (n = 20, 43%), Autoencoder (AE) based models (n = 13, 28%), Graph Neural Network (GNN) based models (n = 8, 17%), Word-embedding models (n = 3, 7%), and Recurrent Neural Network (RNN) based models (n = 3, 7%). Studies that combine two or more model types are counted once for each corresponding model type. To assess their impact on research, we analyzed the number of citations for each model type.
Figure 3 shows the papers published from January 2019 to December 2023, their citation counts by July 2024, and their corresponding model types. Based on the number of citations, Transformers, RNN, and GNN models are the most impactful, with Transformer models showing particularly high citation counts for papers published from 2020 to 2023.
Type of data
Studies utilize various data types to represent patients and medical knowledge. Typically, patient representation is derived from EHRs, incorporating both categorical and non-categorical data. Additionally, external medical knowledge can be integrated into models through data collected beyond EHRs. For detailed information on the modalities used across studies, see Table S3.
EHR
Among the categorical data types in EHRs, diagnosis codes are the most frequently used (n = 45, 98%), including ICD-9, ICD-10-CM, and SNOMED-CT. Medication codes (n = 32, 70%), such as ATC and SNOMED-CT, and procedure codes (n = 20, 43%), such as CPT and ICD-10-PCS, are also widely used. To enhance patient representation, non-categorical data may also be included. The most common non-categorical data types are patient age (n = 19, 41%), clinical measurement values (n = 15, 33%) such as BMI, heart rate, and systolic blood pressure, and clinical narratives from physicians and practitioners (n = 7, 15%).
The integration of external data sources can further enrich patient profiles. Medical knowledge graphs and ontologies provide rich hierarchical information, while medical text corpora contain expert medical knowledge. These external sources offer a comprehensive understanding of clinical concept interactions. Among external data sources, ontologies are the most used (n = 7, 15%); they are employed to obtain medical concept embeddings22–28 and for SSRL training tasks23. Other notable external data sources include medical knowledge graphs25,29 and medical text corpora30.
Table 1
A summary of the published year, data type, patient number, model type, and task type of the studies.
Columns are grouped as follows: the Dataset group covers categorical data (procedure, diagnosis, medication, lab. tests), numerical data (age, measurement), patient number (unlabeled, labeled), and dataset type (private, public); the remaining groups are model type (AE, Transformer, GNN, RNN, Word embedding, Others) and task type (classification, clustering, regression).
| model name | year | procedure | diagnosis | medication | lab. tests | age | measurement | unlabeled | labeled | private | public | AE | Transformer | GNN | RNN | Word embedding | Others | classification | clustering | regression |
Liang et al.31 | 2019 | | x | x | x | | | < 1k | < 1k | x | | | | | | | a | x | | |
de Lusignan et al.32 | 2019 | | x | | | x | | 11k | | x | | x | | | | | | | x | |
G-BERT22 | 2019 | | x | x | | | | 83k | 6k | | 1 | | x | x | | | | x | | |
Ruan et al.33 | 2019 | | x | x | x | x | x | 5k | 5k | x | | x | | | | | | x | x | |
BEHRT34 | 2020 | | x | | | x | | 1.6M | 700k | x | | | x | | | | | x | x | |
ConvAE35 | 2020 | x | x | x | x | | | 1.6M | | x | | x | | | | | | | x | |
Enhanced Reg36 | 2020 | | x | | x | x | x | 104.4k | 73k | x | | x | | | | | | x | | |
CLMBR14 | 2021 | x | x | x | x | x | | 3.4M | 131k | x | | | | | x | | | x | | |
EDisease30 | 2021 | | x | | x | x | | 1M | 816k | x | | | x | | | | | x | x | |
ME2Vec37 | 2021 | x | x | x | | | | 111k | 11k | x | 2 | | | x | | | | x | x | |
PLGMNN38 | 2021 | | | x | x | | x | < 1k | < 1k | x | 1 | | | | | | b | x | | |
Med-BERT39 | 2021 | | x | | | | | 28.4M | 43k | x | | | x | | | | | x | | |
Huang et al.40 | 2021 | x | x | x | x | x | x | 105k | | x | | | | | | x | | | x | |
BRLTM41 | 2021 | x | x | x | | x | | 44k | 10k | x | | | x | | | | | x | | |
Phe2vec42 | 2021 | x | x | x | x | | | 300k | | x | | | | | | x | | | x | |
CEHR-BERT43 | 2021 | x | x | x | | | | 2.4M | 591k | x | | | x | | | | | x | x | |
DICE44 | 2021 | | x | x | x | x | x | 1k | 1k | x | | x | | | | | | x | x | |
Chushig-Muzo et al.45 | 2021 | | x | x | | | | 6.5k | | x | | x | | | | | | | x | |
Poulain et al.46 | 2021 | | x | x | | | x | 7k | | | 3 | | x | | | | | | | x |
Kumar et al.29 | 2022 | x | x | x | x | | | 29k | 29k | | 1 | x | | | | | | x | | |
Shao et al.47 | 2022 | | x | x | x | | | 30k | | x | | x | | | | | | | x | |
Claim-PT23 | 2022 | x | x | x | | x | | 1.9M | 1k | x | | | x | | | | | x | | |
Navaz et al.48 | 2022 | | x | | | x | x | 5k | | | 4,5 | x | | | | | | x | x | |
CEHR-GAN-BERT49 | 2022 | x | x | x | | x | | 55k | < 1k | | 3,6 | | x | | | | | x | | |
CEF-CL50 | 2022 | x | x | | | | | 48k | 48k | x | 3 | | | | | | c | x | | |
ADADIAG51 | 2022 | | x | | x | | | 28k | 6k | x | 6 | | x | | | | | x | x | |
Manzini et al.52 | 2022 | | x | x | | x | x | 11k | | x | | x | | | | | | | x | |
Herp et al.53 | 2023 | x | x | x | | | | 19k | 19k | x | | x | | | | | | x | x | |
MMMGCL28 | 2023 | x | x | | x | | | 14k | 4k | | 1,2 | | | x | | | | x | | |
MedM-PLM26 | 2023 | | x | x | | | | 40k | 5k | | 1 | | x | x | | | | x | | |
Ta et al.54 | 2023 | x | x | x | x | | x | 11k | | x | | | | | | x | | | x | |
Hi-BEHRT55 | 2023 | x | x | x | x | x | x | 2.8M | 406k | x | | | x | | | | | x | | |
CLMBR-256 | 2023 | x | x | x | x | | x | 1.8M | 157k | x | | | x | | x | | | x | | |
Sherbet24 | 2023 | | x | | | | | 46k | 7k | | 1,2 | | | x | | | | x | | |
Ru et al.57 | 2023 | x | x | x | x | | | 299k | 31k | x | | | x | | | | | x | | |
SeqCare25 | 2023 | x | x | x | x | | | 14k | 2k | x | 1 | | | x | | | | x | | |
Liu et al.27 | 2023 | | x | | | | | 2k | | | 1 | x | | | | | | | x | |
IPDM58 | 2023 | | x | | | | | 119k | 24k | | 1,7 | | x | | | | | x | x | |
ExMed-BERT13 | 2023 | | x | x | x | x | x | 3.5M | 80k | x | | | x | | | | | x | | |
Pellegrini et al.59 | 2023 | | x | x | x | x | x | 22k | 22k | | 1,8,9 | | x | x | | | | x | x | |
Jones et al.60 | 2023 | | x | x | | | | 27k | 11k | x | | x | | | | | | x | | |
TransformEHR61 | 2023 | | x | | | x | | 6.5M | 10k | x | | | x | | | | | x | | |
CLMBR-362 | 2023 | x | x | x | | | x | 242k | 18k | x | | | | | x | | | x | | |
Profile model63 | 2024 | | x | | | x | | 1M | 53k | x | | | x | | | | | x | x | |
Seki et al.64 | 2024 | | x | x | x | | x | 32k | 15k | x | | | | x | | | | x | x | |
Foresight65 | 2024 | x | x | x | | x | | 710k | 37k | x | | | x | | | | | x | | |
Other model type: a: Deep belief network, b: local-global memory neural network, c: contrastive learning
Public dataset: 1: MIMIC-III, 2: eICU, 3: All of Us Program, 4: epidemiological COVID-19 data, 5: Framingham offspring heart study, 6: MIMIC-IV, 7: Alzheimer’s Disease Neuroimaging Initiative (ADNI), 8: TADPOLE, 9: Sepsis Prediction Dataset
Technical aspects
Most models treat each data element as a distinct unit or token (n = 44, 95%). The identified data preprocessing techniques address various aspects such as numerical data, categorical data, data cleaning, and data shuffling. Some studies (n = 7, 15%) performed categorization by converting exact ages into intervals and clinical measurements into categories like high, normal, and low, based on clinical evaluation standards33,36,40,46,55,56,62. When maintaining the numerical nature of data, missing value imputation30,52,59 and value normalization33,38,44,59 have also been employed.
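As an illustration, the categorization step above might look like the following sketch. The interval width and the reference bounds are hypothetical, not taken from any reviewed study; real pipelines use clinically validated thresholds.

```python
def bin_age(age: int, width: int = 10) -> str:
    """Map an exact age to a decade-interval token, e.g. 42 -> 'age_40_49'."""
    lo = (age // width) * width
    return f"age_{lo}_{lo + width - 1}"

def categorize_measurement(value: float, low: float, high: float) -> str:
    """Map a clinical measurement to 'low'/'normal'/'high' given reference bounds."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

# Hypothetical systolic blood pressure reference range (mmHg).
tokens = [bin_age(42), categorize_measurement(150.0, low=90.0, high=140.0)]
# tokens -> ['age_40_49', 'high']
```

Converting continuous values into such tokens lets the same embedding machinery handle categorical codes and numerical measurements uniformly.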
Some studies standardize data elements by mapping them to known ontologies23,35,51,56. A common approach to reducing dimensionality and data sparsity is to use only the first digits of codes, effectively replacing each code with its parent node in the hierarchical ontology (n = 15, 33%).
In terms of data cleaning, typical practices include the removal of rare medical terms14,35,36,62,63,65 and the elimination of duplicated terms within a specific time range22,35,42,54. Additionally, shuffling the order of medical concepts within a time window40,54 was shown to help the model generalize better by mitigating the impact of arbitrary sequencing and emphasizing co-occurrence over specific order. This method can also be considered a form of data augmentation. Detailed information on data preprocessing across studies can be found in Table S4.
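A minimal sketch of code truncation, within-window deduplication, and within-window shuffling might look as follows. The ICD-9-style codes and the three-digit truncation are illustrative; real pipelines map codes through the full ontology hierarchy.

```python
import random

def truncate_code(code: str, digits: int = 3) -> str:
    """Keep only the first digits of a code, replacing it with its parent node."""
    return code[:digits]

def dedup_window(codes: list[str]) -> list[str]:
    """Drop duplicated codes within one time window, preserving first occurrence."""
    seen, out = set(), []
    for c in codes:
        if c not in seen:
            seen.add(c)
            out.append(c)
    return out

def shuffle_window(codes: list[str], rng: random.Random) -> list[str]:
    """Shuffle codes within a window to de-emphasize arbitrary ordering."""
    out = codes[:]
    rng.shuffle(out)
    return out

window = ["4280", "25000", "4280", "25002"]
window = dedup_window([truncate_code(c) for c in window])  # -> ['428', '250']
augmented = shuffle_window(window, random.Random(0))
```

Shuffling produces a new, equally valid ordering of the same window, which is why it doubles as a data-augmentation strategy.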
Self-supervised learning models
There are two primary self-supervised learning training strategies: generative and contrastive. Generative tasks involve models predicting parts of the data from other parts, which may be incomplete, transformed, masked, or corrupted. These tasks, such as autoregressive prediction and masked modeling, help the model learn to recover whole or partial features of its original input17,66. Contrastive tasks, on the other hand, focus on distinguishing between similar and dissimilar data points, helping the model capture discriminative features that are essential for understanding different types of data66. Both task types are crucial for training models to generate rich, generalized representations from unlabeled data66,67, and they are applied across various model architectures. The objective of these models is to capture essential patterns and features in the data and output a learned representation, typically a fixed-length, high-dimensional vector that condenses large amounts of information. Five major architecture types have been identified in the studies, each trained on unlabeled data with different training tasks. Details of the SSRL models used and the temporality monitored in each study are provided in Table S5.
Transformer-based models are among the most impactful model types in the studies. In the medical domain, most transformer-based models treat patients as documents, visits as sentences, and medical concepts as tokens, capturing detailed patient histories. BERT68 is a transformer encoder-only model that effectively learns data representations by processing and contextualizing complex sequences of information. BERT models can be trained using various techniques: training with only the Masked Language Model (MLM) objective, predicting randomly masked medical concepts in each EHR sequence34,41,46,51,57,63, enhances contextual understanding, while training with both MLM and auxiliary tasks13,22,39,43,49,59 further refines the model's representations by guiding it with specific medical insights. Additionally, self-contrastive learning techniques help improve BERT's robustness and accuracy in capturing meaningful patterns in medical data30,55. Other transformer-based training tasks include next visit code prediction23,56,61,65, medical code category prediction23, medication-diagnosis cross prediction26, and token replacement detection (ELECTRA)58.
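The MLM-style input construction can be sketched as follows. This is a simplified illustration with a fixed mask rate (0.5 here for visibility; 15% is typical), and it omits BERT's keep/replace refinements for selected tokens.

```python
import random

MASK = "[MASK]"

def mask_sequence(tokens, mask_prob=0.15, rng=None):
    """Randomly mask medical-concept tokens; labels keep only masked positions."""
    rng = rng or random.Random(0)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)       # the model must recover this token
        else:
            inputs.append(tok)
            labels.append(None)      # position ignored by the loss
    return inputs, labels

# Toy EHR sequence of medical-concept tokens (illustrative identifiers).
ehr = ["ICD9_428", "ICD9_250", "ATC_C09", "CPT_93000"]
inputs, labels = mask_sequence(ehr, mask_prob=0.5, rng=random.Random(1))
```

The model is then trained to predict each masked concept from the surrounding visit context, which is what forces it to learn clinically meaningful co-occurrence patterns.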
AE-based models are encoder-decoder models that aim to reconstruct the input, enabling the learning of data representations in a compressed, lower-dimensional space. AEs are designed to learn the most salient features of the data, which can be particularly useful for capturing the underlying structure of categorical EHR data. Several variants of the AE were applied in the studies: stacked autoencoders32,36, denoising autoencoders45, and autoencoders with RNN units such as GRU33 and LSTM44,48,52,53,60. Additionally, AEs can be combined with other models such as collective matrix factorization29, CNNs, and clustering algorithms27,47.
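For the denoising variant, training pairs can be built by corrupting a multi-hot visit vector and asking the decoder to reconstruct the clean one. A schematic sketch, with an illustrative vocabulary and dropout rate:

```python
import random

def multi_hot(codes, vocab):
    """Encode one visit as a multi-hot vector over the code vocabulary."""
    idx = {c: i for i, c in enumerate(vocab)}
    vec = [0] * len(vocab)
    for c in codes:
        vec[idx[c]] = 1
    return vec

def corrupt(vec, drop_prob, rng):
    """Randomly zero out present codes; the clean vector is the target."""
    return [0 if (v == 1 and rng.random() < drop_prob) else v for v in vec]

vocab = ["ICD9_428", "ICD9_250", "ATC_C09", "CPT_93000"]
target = multi_hot(["ICD9_428", "ATC_C09"], vocab)            # clean input
noisy = corrupt(target, drop_prob=0.3, rng=random.Random(0))  # corrupted input
```

The (noisy, target) pair is one training example; reconstructing dropped codes pushes the bottleneck representation to capture which codes tend to occur together.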
GNN-based models use graph learning to represent medical ontologies, hospital visits, and disease co-occurrence. Nodes represent the medical concepts and personal entities, linked by edges indicating their relationships. Graph attention models were used to learn the medical concept embeddings within medical ontologies22,26, with these embeddings frequently serving as initializations for further model training. Random walk technique is used to embed doctors according to their specialty37. Graph contrastive learning25,28 generates multiple views of augmented hospital visit graphs by modifying the original graph with node or edge perturbations, allowing the model to learn robust representations by contrasting positive pairs against negative pairs. These approaches ensure that the learned embeddings accurately reflect the complex relationships inherent in medical data67.
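Edge-perturbation augmentation can be sketched as follows: two randomly edge-dropped views of the same visit graph form a positive pair, while views of other visits serve as negatives. The toy graph and drop rate are illustrative.

```python
import random

def drop_edges(edges, drop_prob, rng):
    """Create an augmented view by randomly removing edges from the visit graph."""
    return [e for e in edges if rng.random() >= drop_prob]

# Toy visit graph: edges link a visit node to the medical concepts it contains.
edges = [("visit_1", "ICD9_428"), ("visit_1", "ATC_C09"), ("visit_1", "CPT_93000")]
view_a = drop_edges(edges, 0.3, random.Random(0))
view_b = drop_edges(edges, 0.3, random.Random(1))
# (view_a, view_b) is a positive pair for the contrastive objective.
```

Because both views come from the same underlying visit, pulling their embeddings together while pushing other visits away yields representations robust to missing or noisy codes.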
Word-embedding-based models convert words into numerical vectors, allowing computers to understand their meanings and relationships from their context in a sequence of words. The model learns to map each word or concept to a dense vector representation, capturing semantic similarities based on co-occurrence patterns. Patient EHR data, composed of a sequence of medical concepts ordered by time, are used to train the representation model to predict medical concepts based on their surrounding context, helping the model understand relationships between concepts. Various algorithms were identified, such as GloVe42, Word2vec40,42,54, and FastText42.
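The context-prediction setup can be sketched as skip-gram pair generation over a time-ordered code sequence. The window size and codes are illustrative.

```python
def skipgram_pairs(codes, window=2):
    """Generate (center, context) training pairs from a time-ordered code sequence."""
    pairs = []
    for i, center in enumerate(codes):
        lo, hi = max(0, i - window), min(len(codes), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, codes[j]))
    return pairs

seq = ["ICD9_250", "ATC_A10", "ICD9_428"]
pairs = skipgram_pairs(seq, window=1)
# -> [('ICD9_250', 'ATC_A10'), ('ATC_A10', 'ICD9_250'),
#     ('ATC_A10', 'ICD9_428'), ('ICD9_428', 'ATC_A10')]
```

Each pair becomes one prediction task, so codes that repeatedly appear in the same temporal neighborhood (for example a diagnosis and its usual medication) end up with nearby embedding vectors.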
RNN-based models are designed to capture temporal dependencies in sequential data, making them well-suited for tasks involving time-series EHR data. These models are trained with the objective of predicting future medical events based on a patient's historical data. Studies14,56,62 use a specific type of RNN, the GRU. The models were trained to predict the set of medical codes on day t from the medical codes of previous days. To better capture temporality, these studies also included time-gap information in the input.
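This objective can be sketched as building (history, time gap, next-day code set) examples from a patient timeline. The record format below is illustrative, not taken from any specific study.

```python
def next_day_examples(timeline):
    """From [(day, codes), ...] sorted by day, build autoregressive examples:
    predict the code set of day t from all earlier codes plus the time gap."""
    examples, history, prev_day = [], [], None
    for day, codes in timeline:
        if history:
            examples.append({
                "history": list(history),
                "gap_days": day - prev_day,   # temporal signal fed to the GRU
                "target": set(codes),
            })
        history.extend(codes)
        prev_day = day
    return examples

timeline = [(0, ["ICD9_250"]), (30, ["ICD9_428", "ATC_C09"]), (45, ["CPT_93000"])]
examples = next_day_examples(timeline)
```

A GRU would consume the history (with the gaps encoded alongside each step) and be trained with a multi-label loss over the target code set.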
Predictive models for classification use the trained SSRL model as a backbone, to which a task-specific classification head is added. These predictive models require labeled data for training on specific tasks. Among the articles that describe the predictive models used for classification, different model types were identified, predominantly simple architectures that are easy to train. Some studies employ a single linear layer23,59,61,63, logistic regression (LR) (n = 8, 17%), or support vector machines (SVM)31,33. Models that can capture more complex data patterns, such as feedforward neural networks (n = 12, 26%) and RNNs13,37–39,43,53 (n = 6, 13%), are also applied.
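A logistic-regression head on top of frozen patient embeddings can be sketched as follows. The 2-d toy vectors stand in for the SSRL model's output; real studies use the learned high-dimensional representations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(X, y, lr=0.5, epochs=200):
    """Fit a logistic-regression head on fixed embedding vectors via SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi                      # gradient of the log loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

# Toy frozen "embeddings": class 1 sits to the upper right of class 0.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]
w, b = train_head(X, y)
preds = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0.5) for xi in X]
```

Only the head's weights are updated here; the backbone embeddings stay fixed, which is what keeps these downstream models cheap to train on limited labeled data.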
Clustering and visualization models are used with the data representation vector as input. We identified several techniques employed across the literature. T-distributed Stochastic Neighbor Embedding (t-SNE) emerged as the most frequently used model for data representation visualization and cluster interpretation (n = 12, 26%). In terms of clustering techniques, K-means40,52–54 was found to be the most common method. These clustering models take the embedding vectors generated by trained representation learning models as input.
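A minimal K-means over embedding vectors can be sketched as follows, with 2-d toy vectors standing in for the learned patient representations and a deterministic initialization for reproducibility.

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    recompute centroids as cluster means (spread init over input order, k >= 2)."""
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: dist2(p, centroids[c]))].append(p)
        centroids = [[sum(d) / len(cl) for d in zip(*cl)] if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]

# Two well-separated groups of toy "patient embedding" vectors.
points = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
labels = kmeans(points, k=2)
```

In practice the input `points` are the fixed-length vectors produced by the trained representation model, and the resulting cluster labels feed the phenotyping analyses described below.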
Classification evaluation Since most of the classification tasks were binary, the most frequently used classification metrics were AUROC (n = 21, 46%), followed by AUPRC (n = 14, 30%), accuracy (n = 10, 22%), and F1 (n = 9, 20%); other metrics, such as precision (n = 6, 13%) and sensitivity (n = 5, 11%), were used less frequently. A few studies evaluated multi-class classification tasks, reporting metrics such as average precision31,34, precision at k63,65, macro-F129,37, and weighted F124,29.
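AUROC can be computed directly from the rank statistic (the Mann-Whitney U formulation); a dependency-free sketch on toy scores:

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties counted as 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 1]
# Two of the three positives outrank the single negative -> AUROC = 2/3
```

This rank-based view also explains AUROC's popularity for imbalanced clinical outcomes: it is insensitive to the choice of decision threshold.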
Clustering evaluation Despite the abundance of clustering studies, only a few employed dedicated clustering metrics. Silhouette analysis (n = 4, 9%) was the most frequently used, followed by the Davies-Bouldin index40,44 (n = 2, 4%) and the purity score35,47 (n = 2, 4%).
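Purity can be sketched in a few lines: each cluster is credited with its majority class, and the score is the fraction of correctly assigned points. The phenotype labels below are illustrative.

```python
from collections import Counter

def purity(cluster_ids, class_labels):
    """Fraction of points belonging to the majority class of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, class_labels):
        by_cluster.setdefault(c, []).append(y)
    majority_total = sum(Counter(ys).most_common(1)[0][1]
                         for ys in by_cluster.values())
    return majority_total / len(class_labels)

clusters = [0, 0, 0, 1, 1]
phenos = ["T2D", "T2D", "HF", "HF", "HF"]
# Cluster 0 majority is T2D (2/3), cluster 1 majority is HF (2/2) -> purity 0.8
```

Purity requires ground-truth class labels, which partly explains why so few clustering studies report it.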
Interpretability in machine learning is defined as the extraction of relevant knowledge from a machine-learning model concerning relationships either contained in data or learned by the model69. Attention weight analysis was used in several studies (n = 6, 13%). Statistical analysis of the clusters was employed in some papers (n = 3, 6%). For post-hoc interpretability, methods such as Integrated Gradients13 and gradient-based saliency65 were utilized. Most of the papers interpreted their results using visualizations computed by t-SNE (n = 12, 26%) and Uniform Manifold Approximation and Projection (UMAP) (n = 3, 6%). Ten papers involved medical expert interpretation. Overall, only two papers applied post-hoc interpretability methods to trained models. Refer to Table S7 for detailed information on the interpretability methods used in the studies.
Clinical aspects
Table 2
A summary of medical domain, interpretability, and evaluation tasks of the selected studies.
Columns are grouped as follows: medical domain (Cardiology through other domains), interpretability (attention weight analysis through medical expert interpretation), and evaluation task (disease prediction through other tasks).
| model name | Cardiology | General & multiple diseases | Neurology & Psychiatry | Primary Care | Oncology | other domains | attention weight analysis | statistical analysis on cluster | post-hoc interpretability | embedding visualization | medical expert interpretation | disease prediction | mortality prediction | readmission prediction | length of stay prediction | patient similarity | hospitalization | other tasks |
Liang et al.31 | x | | | | | a | | | | | | m | | | | | | |
de Lusignan et al.32 | | | | x | | | | | | | x | | | | | | | 8 |
G-BERT22 | | | | x | | | | | | | | | | | | | | 1 |
Ruan et al.33 | x | | | | | | | | | x | | m | x | | | | | |
BEHRT34 | | x | | | | | x | | | x | x | m | | | | | | 8 |
ConvAE35 | | | x | | x | d | | x | | x | x | | | | | | | 9 |
Enhanced Reg36 | x | | | | | c | | | | | | x | | x | | | | |
CLMBR14 | | x | | | | | | | | | | | x | x | x | | | 2 |
EDisease30 | | | | x | | | | | | x | x | | x | | | x | x | |
ME2Vec37 | | | | x | x | | | | | x | | m | | x | | | | 4 |
PLGMNN38 | x | | | | x | b,d | | | | | | | | | | | | 1, 10 |
Med-BERT39 | x | | | | x | | x | | | | x | x | | | | | | |
Huang et al.40 | | | x | | | | | | | x | | | | | | x | | |
BRLTM41 | | | x | | | | x | | | | | x | | | | | | |
Phe2vec42 | | x | | | | | | | | x | x | | | | | x | | |
CEHR-BERT43 | x | x | | | | | | | | | | x | x | x | | | x | |
DICE44 | x | | | | | | | x | | x | x | m | | | | x | | |
Chushig-Muzo et al.45 | x | | | | | a | | | | | | | | | | | | 5 |
Poulain et al.46 | x | | | | | | | | | | | | | | | | | 11 |
Kumar et al.29 | | | | x | | | | | | | | m | x | | | | | |
Shao et al.47 | | | | | | c | | x | | x | | | | | | x | | |
Claim-PT23 | | | x | | | c | | | | | | x | | | | | | |
Navaz et al.48 | x | | | | | b | | | | | | | x | | | x | | |
CEHR-GAN-BERT49 | x | | | x | | | | | | | | x | x | | | | | |
CEF-CL50 | | x | | | | | | | | | | x | | | | | | |
ADADIAG51 | x | | | | | | x | | | x | | x | | | | | | |
Manzini et al.52 | | | | | | a | | | | | x | | | | | | | 5 |
Herp et al.53 | | | | | x | | | | | | | m | | | | | | 10 |
MMMGCL28 | | | | x | | | | | | | | | x | x | x | | | |
MedM-PLM26 | | | | x | | | x | | | | | | | x | | | | 1, 3 |
Ta et al.54 | | | | | | b | | | | | x | | | | | | | 5 |
Hi-BEHRT55 | x | | x | | | a,e | | | | | | x | | | | | | |
CLMBR-256 | | x | | | | | | | | | | | x | x | x | | x | |
Sherbet24 | x | | | x | | | | | | | | m | | | | | x | |
Ru et al.57 | x | | | | | | | | | | | | | x | | | | |
SeqCare25 | | x | | | | | | | | | | m | | | | | | |
Liu et al.27 | | | | | | e | | | | x | | | | | | | | 8 |
IPDM58 | | | x | | | | | | | x | | m | x | | | | | 10 |
ExMed-BERT13 | | | | | | c | | | x | | | x | | | | | | |
Pellegrini et al.59 | | | x | | | b | x | | | x | | m | x | | x | | | |
Jones et al.60 | | | x | | | | | | | | | x | | | | | x | 6, 7 |
TransformEHR61 | | | x | | x | | | | | | | x | | | | | | |
CLMBR-362 | | x | | | | | | | | | | x | | | | | | |
Profile model63 | | x | | | | | | | | x | | m | | | | | | |
Seki et al.64 | | x | | | | | | | | x | | | | x | | | | |
Foresight65 | | x | | | | | | | x | | x | m | | | | | | |
Other medical domain: a: Endocrinology, b: Infectious Diseases, c: Respiratory, d: Gastroenterology, e: Nephrology.
Other evaluation tasks: 1: medication recommendation, 2: ICU transfers, 3: ICD coding, 4: doctor recommendation, 5: patient subtyping, 6: emergency department visit, 7: high medical resource utilization, 8: characterization of clusters, 9: patient stratification, 10: prognosis analysis, 11: multiregression, m: multilabel
Our scoping review identified various tasks across the articles. These tasks were distributed across clinical domains, with Cardiology24,31,33,36,38,39,43–46,48,49,51,55,57 (n = 15, 33%), General & multiple diseases (n = 11, 24%), and Neurology & Psychiatry and Primary Care (n = 9, 20% each) being the most frequently studied areas. Oncology (n = 6, 13%) followed, while Infectious Diseases38,48,54,59, Endocrinology35,45,52,55, and Respiratory13,23,36,47 each had four downstream tasks (n = 4, 9%). Gastroenterology35,38 and Nephrology27,55 had the lowest number of downstream tasks (n = 2, 4%). A detailed overview of the clinical events and their corresponding clinical domain mapping can be found in Table S1.
Upon training, the deep learning models have developed an intrinsic representation of the data, which can be general (multiple tasks) or task-specific (a single task or a few similar tasks). Representation quality is evaluated on various clinical tasks, including predictive tasks and patient phenotyping. For detailed information on the evaluation tasks in the studies, see Table S7.
Predictive tasks Among the 73 predictive tasks, the primary focus was on disease prediction (n = 27, 59%), followed by mortality prediction (n = 11, 24%), readmission prediction14,26,28,36,37,43,56,57,64 (n = 9, 20%), hospitalization (n = 5, 11%), and length of stay prediction (n = 4, 9%).
Additional tasks included medication recommendation22,26,38 (n = 3, 7%), ICD coding49, doctor recommendation37, ICU transfers14, emergency department visits60, and high medical resource utilization60.
Patient Phenotyping Of the 33 patient phenotyping tasks, clustering was primarily used for visualization (n = 15, 33%), patient similarity assessment (n = 8, 24%), characterization of clusters (n = 3, 9%), patient subtyping (n = 2, 6%), and patient stratification (n = 1, 3%).
Medical expert involvement
Medical experts were involved across different stages of the studies, with varying degrees of participation. Among the reviewed publications, expert involvement was most prominent in study design (n = 14, 30%) and result interpretation (n = 13, 28%). Feature selection also saw substantial expert input (n = 10, 22%), while dataset extraction had more limited expert participation (n = 4, 9%).