Clustering of patient comorbidities within electronic medical records enables high-precision COVID-19 mortality prediction

doi:10.21203/rs.3.rs-374482/v1

Article

https://doi.org/10.21203/rs.3.rs-374482/v1

This work is licensed under a CC BY 4.0 License

Version 1

We present an explainable AI framework that predicts mortality after a positive COVID-19 diagnosis using only data routinely collected in electronic healthcare records (EHRs) before diagnosis. Our analysis is based on the UK Biobank cohort of roughly half a million people and linked NHS COVID-19 records. We developed a method to capture the complexity and large variety of clinical codes present in EHRs, and we show that these codes have a larger impact on risk than any other patient data except age. We apply a form of clustering from natural language processing to the clinical codes, specifically topic modelling by Latent Dirichlet Allocation (LDA), to generate a succinct digital fingerprint of a patient’s full secondary care clinical history, i.e. their comorbidities and past interventions. These digital comorbidity fingerprints offer immediately interpretable, clinically meaningful descriptions: some group cardiovascular disorders with their common risk factors, while others are novel groupings that are not obvious. The comorbidity fingerprints differ in both breadth and depth from existing observational disease associations in the COVID-19 literature. This data-driven approach lets us avoid human-induction bias and confirmation bias when selecting potential predictors of COVID-19 mortality. Alongside age, these digital fingerprints are the most important factors in our predictor. This could improve individual risk profiling for clinical decisions and the identification of groups for public health interventions such as vaccine programmes. Combining our digital precondition fingerprints with demographic characteristics allows us to match or exceed the performance of existing state-of-the-art COVID-19 mortality predictors (EHCF) that were developed through expert consensus. Our precondition fingerprinting and entire mortality prediction analytics pipeline are designed to be rapidly redeployable, e.g. for COVID-19 variants or other pre-existing diseases.
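To make the fingerprinting idea concrete, the following is a minimal sketch of LDA topic modelling over clinical codes, treating each patient's code history as a "document". It is not the authors' pipeline: the collapsed Gibbs sampler, the function name `lda_fingerprints`, the hyperparameters, and the toy patients are all assumptions chosen for illustration, and the use of ICD-10 style codes is likewise assumed rather than stated in the abstract.

```python
import random

def lda_fingerprints(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not the paper's code).

    docs: list of patients, each a list of clinical-code strings.
    Returns one topic mixture per patient: the "comorbidity fingerprint".
    """
    rng = random.Random(seed)
    vocab = sorted({code for doc in docs for code in doc})
    code_id = {code: i for i, code in enumerate(vocab)}
    n_codes = len(vocab)

    # Count tables used by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_code = [[0] * n_codes for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every code occurrence.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for code in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_code[z][code_id[code]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, code in enumerate(doc):
                w = code_id[code]
                z = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_code[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this occurrence.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_code[k][w] + beta)
                    / (topic_total[k] + n_codes * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_code[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-patient topic proportions; each row sums to 1.
    fingerprints = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        fingerprints.append(
            [(doc_topic[d][k] + alpha) / denom for k in range(n_topics)]
        )
    return fingerprints

# Hypothetical toy patients described by ICD-10 style codes (assumed data).
patients = [
    ["I10", "I21", "I25", "E11"],   # cardiovascular / metabolic history
    ["I10", "I25", "E11", "I21"],
    ["J44", "J45", "J18"],          # respiratory history
    ["J44", "J18", "J45", "J44"],
    ["I10", "J44", "E11"],          # mixed history
]

fingerprints = lda_fingerprints(patients, n_topics=2)
for codes, fp in zip(patients, fingerprints):
    print(codes, [round(p, 2) for p in fp])
```

Each fingerprint is a short vector of topic proportions, so it could be concatenated with demographic features such as age and passed to any standard classifier. In practice a tested library implementation of LDA would be preferable to this hand-written sampler.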

Bioinformatics

Biotechnology and Bioengineering

COVID-19

mortality prediction

comorbidities
