In our study, we used a comprehensive inpatient business view constructed on a big data platform, encompassing the major aspects of a patient's hospitalization, including diagnoses, surgeries, medications, and examinations across different business domains. Its flexibility enables multidimensional data analysis and presentation, allowing analyses tailored to specific needs; this not only helps healthcare professionals better understand patients' conditions during their hospital stay, but also meets management requirements for hospital operations and healthcare service quality. By applying the QLoRA algorithm to the ChatGLM2-6b and Llama2-6b models and fine-tuning them on a local SQL dataset, we improved the models' performance on simple and moderate-difficulty SQL queries. Notably, ChatGLM2-6b outperformed Llama2-6b, possibly because of its stronger performance on Chinese-language data. Our results also confirmed the effectiveness of QLoRA on simple queries under limited computational resources, suggesting that fine-tuning LLMs on local datasets can yield further gains.
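As a concrete illustration, a QLoRA setup of the kind described above can be configured with the `peft` and `bitsandbytes` libraries. This is a configuration sketch only; the hyperparameter values and target module names below are illustrative assumptions, not the exact settings used in our experiments.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA);
# gradients flow only through the small LoRA adapter weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,     # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters injected into the attention projections.
# target_modules depends on the architecture, e.g. "query_key_value"
# for ChatGLM2-6b versus "q_proj"/"v_proj" for Llama-family models.
lora_config = LoraConfig(
    r=8,                                # adapter rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
```

The quantized base model would then be wrapped with `peft.get_peft_model(model, lora_config)` and trained on the local question–SQL pairs.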
Additionally, leveraging ChatGPT-3.5 with zero-shot and few-shot learning, we adapted the model to scenarios with little or no labeled data, without any parameter updates. In the zero-shot setting, the model showed remarkable performance with no additional input samples, relying only on its prior knowledge and well-designed prompts, and in particular outperformed the fine-tuned 6b models on simple queries. Few-shot learning further demonstrated the model's adaptability to a small number of input samples, with results comparable to those of professional database engineers, especially on medium and hard queries. Compared to traditional supervised learning, these methods offer clear advantages in resource-constrained environments by reducing dependence on extensive labeled data, providing an effective solution for practical applications facing data scarcity and limited computational resources.
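A minimal sketch of the few-shot prompting scheme: the prompt concatenates the table schema, a handful of question–SQL demonstrations, and the new question. The schema and example pairs below are hypothetical placeholders, not records from our hospital dataset.

```python
def build_nl2sql_prompt(schema, examples, question):
    """Assemble a few-shot NL2SQL prompt for a chat-style LLM.

    schema    -- textual description of the relevant tables
    examples  -- list of (natural-language question, SQL) demonstration pairs
    question  -- the new question to translate into SQL
    """
    parts = ["Given the database schema:", schema, ""]
    for nl, sql in examples:
        parts.append(f"Question: {nl}")
        parts.append(f"SQL: {sql}")
        parts.append("")
    parts.append(f"Question: {question}")
    parts.append("SQL:")
    return "\n".join(parts)


# Hypothetical schema and demonstration pair, for illustration only.
schema = "inpatient_view(patient_id, diagnosis, surgery, medication, admit_date)"
examples = [
    ("How many inpatients were admitted in 2023?",
     "SELECT COUNT(*) FROM inpatient_view WHERE admit_date LIKE '2023%';"),
]
prompt = build_nl2sql_prompt(schema, examples,
                             "List the medications given to patient 42.")
```

With an empty `examples` list, the same function degenerates to a zero-shot prompt containing only the schema and the question.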
Overall, by integrating a big data platform's inpatient business view with the QLoRA algorithm and the ChatGLM2-6b and Llama2-6b models, and fine-tuning on a local SQL dataset, we obtained robust empirical support for NL2SQL generation in the medical domain. Compared with manually crafting SQL statements, LLM-based NL2SQL generation translates natural language into SQL rapidly, saving significant time and human resources. With a sufficiently capable LLM, performance approaching that of professional database engineers can be achieved with appropriate prompts and a small number of samples, enabling non-professionals to conduct data analysis with ease.
By integrating this approach with the hospital's big data platform, we can establish diverse database views covering outpatient, inpatient, medical insurance, financial settlement, and other domains. Careful selection of foundational LLMs and the use of NL2SQL generation enable a wide range of medical data analysis tasks. Furthermore, integrating the approach into daily workflows facilitates the development of a comprehensive data-query knowledge base, enhancing the precision of the LLMs through the incorporation of extensive external knowledge. Using Python toolkits such as LangChain [30] and configuring distinct prompts provides tailored interfaces for various hospital management applications, including medical retrieval for patients [31], medical quality management [32], dialogue response generation [33], and BI systems in medicine [34]. This approach lays the groundwork for future applications that follow the outlined patterns, offering a promising avenue for enhanced efficiency and functionality.
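To illustrate how distinct prompts could serve different applications, the following pure-Python sketch maps each scenario to its own prompt template. In practice this role would be filled by LangChain's prompt-template utilities; the application keys and template wording here are hypothetical examples, not our production configuration.

```python
# One prompt template per hospital-management application; the keys and
# wording are illustrative assumptions only.
PROMPT_TEMPLATES = {
    "medical_retrieval": (
        "You answer patients' questions about their own records.\n"
        "Schema: {schema}\nQuestion: {question}\nSQL:"
    ),
    "quality_management": (
        "You generate SQL for hospital quality-management indicators.\n"
        "Schema: {schema}\nIndicator request: {question}\nSQL:"
    ),
    "bi_reporting": (
        "You generate SQL for management BI reports.\n"
        "Schema: {schema}\nReport request: {question}\nSQL:"
    ),
}

def render_prompt(application, schema, question):
    """Select and fill the template for the requested application."""
    template = PROMPT_TEMPLATES[application]  # KeyError for unknown apps
    return template.format(schema=schema, question=question)

prompt = render_prompt(
    "bi_reporting",
    schema="inpatient_view(patient_id, dept, cost, discharge_date)",
    question="Total inpatient cost per department in 2023",
)
```

Routing on an application key keeps each interface's instructions isolated, so a prompt tuned for patient-facing retrieval cannot leak into management reporting.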