In evidence-based medicine, well-structured and well-formulated materials provide clinicians with more valuable evidence and improve the efficiency of retrieving appropriate resources for medical treatment[1, 2]. As a prominent type of medical literature, the Case Report records in detail the first observations of patients with rare diseases, unusual symptoms, atypical clinical prognoses or novel treatments, allowing clinical experience and knowledge to be shared with clinicians around the world. Case Reports integrate the best available research evidence with clinical expertise and patient values, which makes them a significant pedagogical resource. However, because of their overwhelming volume and rapid growth, it is time-consuming for clinicians facing abnormal conditions in diagnosis to read Case Reports in full to inform treatment. To address this limitation, it is highly necessary to construct a comprehensive information system that provides high-quality, structured multimodal summaries of Case Reports. Moreover, medical ontologies can be applied in such a system to organize and search Case Reports consistently and efficiently.
Since ancient Egyptian times, clinicians have written down medical knowledge acquired from practical experience on papyrus as clinical case notes[3]. By the end of the 18th century, case notes had evolved into standardized case descriptions organized into sections mainly covering general patient information, past history, examination details, treatment and the subsequent course of the condition[4, 5]. PubMed Central (PMC) collects such case essays, dating back to 1893, as formal medical literature under the name Case Report. Owing to the limitations of literature preservation at the time, only scanned copies of the original print versions of Case Reports are available for the period before 1978, when editable full texts appeared; the availability of editable full text has greatly facilitated structured extraction from Case Reports. Today, authors around the world follow a conventional Case Report template[6, 7], which comprises five main sections[8] (Fig. 1): (i) signs and symptoms of the patient; (ii) diagnostic procedures performed on the patient; (iii) treatment strategy; (iv) treatment outcome; (v) clinical follow-up.
In recent years, studies on clinical knowledge have mainly focused on electronic medical records (EMRs), in which case descriptions are written in a fixed pattern to improve clinical and healthcare efficiency from an operational standpoint. Researchers have developed methods to extract features from EMRs and have built clinical support systems[9, 10] for more personalized clinical care. Accordingly, a series of computational methods, such as TEPAPA[11], Deepr[12] and DeepCare[13], have been proposed for patient risk prediction and clinical description extraction, alongside approaches based on deep learning models[14, 15]. Most researchers have focused on extracting clinical information from EMR texts, such as patients' histories, current symptoms, conditions and biomarkers, which can assist diagnosis and lead to the development of more targeted clinical interventions[16–18]. The establishment of a freely accessible EMR database[19], which integrates a decade of detailed information on individual patient care, has promoted the development of methods for EMR analysis and prediction[20], for instance, de-identification to protect patient privacy[21], prediction of clinical events[22] and interactive semantic search over clinical records[23]. Although considerable progress has been made in extracting valuable information from EMRs to assist clinicians, EMRs have obvious shortcomings: their accessibility is limited by hospitals' ethical and privacy authorization requirements, and because they collect all patient cases without any screening, they are less representative and of lower reference value. In contrast, Case Reports are published as medical research literature and offer the advantages of large quantity, high availability and strong educational value, advantages that have not yet been sufficiently exploited.
With the ever-increasing number of open access biomedical papers, literature retrieval is no longer restricted to titles and abstracts but also draws on full-text content, for example by using cited statements[24], linking clinical entities extracted from electronic records to the biomedical literature[25] and retrieving biomedical figures[26]. Although research on literature retrieval has achieved strong performance, few efforts have focused on Case Reports. Luo et al. built a machine learning-based model[27, 28] that automatically extracts main findings from abstracts and full texts on the basis of a manually annotated corpus[29] of main-finding sentences. Although the main-finding sentences of a Case Report include a brief description of the final conclusion or diagnosis, a substantial amount of clinical information (e.g. patient symptoms, laboratory tests and medical imaging figures) is scattered across other sections of the full text. Moreover, when facing patients with unusual phenotypes, clinicians may be uncertain of the specific disease name and need to review Case Reports by matching similar clinical manifestations. Some clinicians also prefer to assess patient conditions through medical images because images convey findings directly. Open-i provides retrieval of Case Report figures by matching textual keywords against captions and abstracts; however, this presupposes that clinicians already know suitable query terms, since the accuracy of the search results depends on how closely the query terms match the topics. Most existing literature retrieval methods likewise support only keywords related to titles or topics, which is unworkable for unknown diseases, where clinicians usually resort to matching similar clinical manifestations in Case Reports.
There is therefore an urgent need for a comprehensive literature retrieval framework that allows clinicians both to search by patient symptoms or signs and to browse medical imaging figures in structured form, supporting their diagnosis especially when some rare diseases are difficult to distinguish.
Open knowledge sharing and data structuring are crucial for helping clinicians identify and diagnose rare diseases. In this work, we propose a novel method that extracts multimodal summaries (i.e. medical imaging figures and clinical and biological entities) from Case Reports and applies disease, symptom and body system ontologies to reconstruct case summaries, from which we build the CRFinder multimodal information system. We also design a user-friendly interface that lets clinicians retrieve and analyze relevant Case Reports using medical ontology filters and by browsing medical imaging figures, which can serve as hints. In particular, the main target users of CRFinder are clinicians with less clinical experience. When junior clinicians encounter patients with rare diseases beyond their clinical knowledge, they may be uncertain of the exact disease name or treatment and know only the symptoms and the body parts with lesions. CRFinder is intended to help such clinicians efficiently retrieve the information they need. The contributions of this study can be summarized as follows:
- To summarize key information from Case Reports, we develop, for the first time, CRFinder, a comprehensive structured multimodal information system for open access medical Case Reports, which comprises structured case summaries of clinical and biological entities and medical images of different modalities.
- Disease, symptom and body system ontologies are applied to reconstruct the multimodal summaries. We design a novel Case Report retrieval pattern based on browsing extracted medical figures, which provides clinicians with medical ontology-based hints and improves the identification of rare diseases and of unexpected associations between diseases or symptoms.
- CRFinder also provides a user-friendly web-based retrieval platform that enables clinicians to conveniently and effectively visualize and analyze the different modalities of essential information extracted from Case Reports. Its two built-in retrieval functions, medical figure browsing and keyword searching, cater to clinicians' retrieval preferences when investigating rare diseases.