Study Design
This comparative study evaluated the knowledge and diagnostic capabilities of GPT-3.5, GPT-4, and MAC for 150 rare diseases using real-world clinical case reports from the MEDLINE database. Each case was structured into two scenarios, an initial presentation and a complete presentation, representing different stages of patient care. Figure 1 shows the flowchart of this study.
Data Collection
Selection of diseases to be studied
This study involved 150 rare diseases selected from a pool of over 7,000 across 33 types in the Orphanet Database, a comprehensive resource co-funded by the European Commission13.
Owing to the uneven distribution of rare diseases across types, a normalized weighted random sampling method was used for selection to ensure balanced representation. The sampling weights were based on the disease count in each type and moderated by a natural logarithm transformation14, 15.
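For illustration, one plausible reading of this sampling scheme is sketched below in Python; the type names, counts, and seed are hypothetical, not the study's data.

```python
import numpy as np

# Hypothetical disease counts per Orphanet type (illustrative values only).
type_counts = {"neurological": 1200, "metabolic": 900, "dermatological": 150}

# Log-transform the counts to dampen the influence of large types,
# then normalize so the weights sum to 1.
names = list(type_counts)
weights = np.log([type_counts[n] for n in names])
weights = weights / weights.sum()

rng = np.random.default_rng(seed=42)
# Draw a type in proportion to its normalized log-weight; a disease is then
# chosen at random within the sampled type.
sampled_type = rng.choice(names, p=weights)
```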
Search for clinical case reports
After the diseases were selected for investigation, clinical case reports published after January 2022 were identified from the MEDLINE database. The search was conducted by one investigator and reviewed by a second investigator.
Inclusion and exclusion criteria
Clinical case reports were included if they 1) presented a complete clinical picture of a real patient diagnosed with a rare disease, including demographics, symptoms, medical history, and the diagnostic tests performed, and 2) were published in English. Case reports were excluded if they 1) lacked the information required to make a diagnosis, 2) were not published in English, 3) were animal studies, 4) contained factual errors that would influence the diagnosis, or 5) reported diseases other than those targeted by the literature search.
Manual screening
Two blinded investigators independently screened the search results against the predefined criteria. The first investigator selected case reports for testing, and the second investigator then re-screened the selection. Any disagreements were resolved through group discussion.
For each disease, the search results were screened until an eligible case report was identified. If no suitable report was found, a new random sample was drawn within the same disease category to select a different disease.
Data Preparation
Data Extraction
One investigator manually extracted data from each clinical case report, and the extraction was subsequently reviewed by a specialist doctor. The extracted information included patient demographics, clinical presentation, medical history, physical examination findings, and the results of diagnostic tests (e.g., genetic tests, biopsies, radiographic examinations), along with the final diagnosis.
Data Curation
Final and possible differential diagnoses from the original texts were removed from the inputs sent to the LLMs. Patient information was presented in two scenarios: initial and complete presentations, each representing a different stage in the diagnostic process.
Initial Presentation: This scenario simulates the first clinical encounter, focusing on the LLM's ability to suggest probable diagnoses and further tests from initial information such as demographics, clinical presentation, physical examination, medical history, and routine test results.

Complete Presentation: This scenario simulates a fully informed diagnostic work-up, evaluating the LLM's capacity to reach a final diagnosis from comprehensive data, including all initial information plus the results of additional diagnostic tests. Supplementary File 1 provides an example of patient information.
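For illustration, a single case under this scheme could be organized as two nested records; the field names and values below are hypothetical and do not reproduce Supplementary File 1.

```python
# Hypothetical representation of one case; all field names and values are
# illustrative, not taken from Supplementary File 1.
case = {
    "initial_presentation": {
        "demographics": "34-year-old woman",
        "clinical_presentation": "progressive proximal muscle weakness",
        "medical_history": "unremarkable",
        "physical_examination": "reduced deep tendon reflexes",
        "routine_tests": "elevated creatine kinase",
    },
    "complete_presentation": {
        # All initial information, plus results of additional diagnostic tests.
        "additional_tests": "muscle biopsy, genetic panel, MRI findings",
    },
}
```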
Model testing
Model selection
GPT-3.5-turbo and GPT-4, which are widely tested for medical applications, were selected as the base models for testing.
Multi-agent conversation
The MAC framework, designed to diagnose rare diseases and generate disease-specific knowledge (Fig. 2), was built on the AutoGen framework using GPT-4. AutoGen is a novel framework that facilitates multi-agent collaboration using LLMs9. This setup simulated a medical team consultation with three doctor agents and one supervising agent. The doctor agents collected information on the patient's condition, engaged in medical reasoning, and shared opinions in joint discussions. The supervising agent oversaw these conversations, challenged the doctors' findings and perspectives, and facilitated a consensus. The final output was derived through multiple rounds of collaborative discussion.
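As an illustration of how such a team could be wired together, the following is a minimal sketch using the open-source pyautogen API; the agent names, system messages, and round limit are our assumptions, not the study's exact configuration.

```python
import autogen  # requires the pyautogen package

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "..."}]}

# Three doctor agents plus one supervising agent; system messages are
# illustrative, not the prompts used in the study.
doctors = [
    autogen.AssistantAgent(
        name=f"doctor_{i}",
        system_message="You are a physician. Reason about the case and share a diagnosis.",
        llm_config=llm_config,
    )
    for i in range(1, 4)
]
supervisor = autogen.AssistantAgent(
    name="supervisor",
    system_message="Challenge the doctors' reasoning and drive the team to a consensus diagnosis.",
    llm_config=llm_config,
)

# A group chat lets the agents hold multiple rounds of joint discussion.
group_chat = autogen.GroupChat(agents=doctors + [supervisor], messages=[], max_round=12)
manager = autogen.GroupChatManager(groupchat=group_chat, llm_config=llm_config)

# The user proxy feeds in the patient information and collects the output.
user = autogen.UserProxyAgent(
    name="user", human_input_mode="NEVER", code_execution_config=False
)
user.initiate_chat(manager, message="Patient information: ...")
```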
Prompt engineering
While the input-output (IO) prompt technique was adopted for testing the base models GPT-3.5 and GPT-4, three other prompt techniques were also tested to evaluate whether prompt engineering enhanced diagnostic performance: zero-shot chain of thought (COT), tree of thoughts (TOT), and reflection of thoughts (ROT). ROT, developed in our previous study, enables models to retrospectively adjust their initial outputs, thereby potentially enhancing the overall quality and accuracy of the output16.
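For reference, the IO and zero-shot COT conditions typically differ only in the instruction wrapped around the case text; the templates below are a hypothetical sketch (the study's exact prompts are not reproduced here), with ROT shown as a simplified reflect-and-revise second pass.

```python
# Illustrative prompt templates; wording is ours, not the study's.
IO_PROMPT = (
    "Patient information:\n{case}\n\n"
    "Provide the most likely diagnosis and several possible diagnoses."
)

# Zero-shot COT appends an explicit reasoning trigger to the IO prompt.
COT_PROMPT = IO_PROMPT + "\n\nLet's think step by step."

# ROT (reference 16) adds a reflection pass over the model's first answer;
# this single revise step is our simplified reading of the technique.
ROT_REFLECT = (
    "Here is your previous answer:\n{draft}\n\n"
    "Review it critically and revise the diagnoses if needed."
)
```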
Generating Disease Specific Knowledge
GPT-3.5, GPT-4, and MAC were assessed for their knowledge of each rare disease covered in the study, including disease definition, epidemiology, clinical description, etiology, diagnostic methods, differential diagnosis, antenatal diagnosis, genetic counseling, management and treatment, and prognosis.
Generating Diagnosis and Recommended Tests
For the initial presentation, the LLMs were tasked with generating one most likely diagnosis, several possible diagnoses, and further diagnostic tests. For the complete presentation, the LLMs were tasked with generating one most likely diagnosis and several possible diagnoses.
Performance Evaluation
Performance was evaluated through panel discussions among three doctors who were blinded to the models and reviewed the outputs in randomized order.
Disease knowledge evaluation
The knowledge generated by the LLMs was evaluated using a Likert scale. As described by a previous study, the evaluation metrics included inaccurate or inappropriate content, omissions, potentially harmful content, and bias17.
Diagnostic ability evaluation
The most likely diagnosis was considered accurate if it matched the exact diagnosis. The possible diagnoses were considered accurate if they included the exact diagnosis. Recommended tests were rated as helpful or unhelpful in reaching the correct diagnosis.
The most likely diagnosis and possible diagnoses were also rated using the scale described by Bond et al.18: 5 for the exact diagnosis, 4 for a very close diagnosis, 3 for a closely related and potentially helpful diagnosis, 2 for a related diagnosis unlikely to help, and 0 for an unrelated diagnosis. Further diagnostic tests were assessed on their helpfulness using a five-point Likert scale ranging from 1 (strongly agree) to 5 (strongly disagree).
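For score aggregation, the Bond et al. scale can be encoded as a simple lookup; this is a sketch, and the judgment labels are ours (note that the scale as described assigns no score of 1).

```python
# Mapping from a panel judgment label to the Bond et al. score.
BOND_SCALE = {
    "exact": 5,
    "very_close": 4,
    "closely_related_helpful": 3,
    "related_unlikely_to_help": 2,
    "unrelated": 0,
}

def score_diagnosis(judgment: str) -> int:
    """Return the Bond score for a panel judgment label."""
    return BOND_SCALE[judgment]
```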
Statistical Analysis
Statistical analyses were performed using SPSS version 25 (IBM, Armonk, NY, USA) and GraphPad Prism version 8 (GraphPad Software, San Diego, CA, USA). Continuous variables are presented as means and standard deviations, and the Shapiro–Wilk test was used to check for normality. Depending on the distribution, a one-way ANOVA or a Kruskal–Wallis test was applied. Categorical data were expressed as counts and rates and compared using the chi-square test.
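Although the analyses were run in SPSS and Prism, the decision logic is equivalent to the following SciPy sketch; the group sizes and values are illustrative, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Illustrative per-model scores for three groups (e.g., GPT-3.5, GPT-4, MAC).
groups = [rng.normal(loc=m, scale=1.0, size=150) for m in (2.5, 3.2, 3.8)]

# Shapiro-Wilk normality check on each group.
normal = all(stats.shapiro(g).pvalue > 0.05 for g in groups)

# One-way ANOVA if all groups look normal, otherwise Kruskal-Wallis.
if normal:
    stat, p = stats.f_oneway(*groups)
else:
    stat, p = stats.kruskal(*groups)

# Chi-square test on a hypothetical 2x3 contingency table of
# correct/incorrect diagnosis counts per model.
table = np.array([[90, 110, 125], [60, 40, 25]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)
```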