Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that raises major concerns in healthcare due to its irreversibility and high prevalence among older adults1. Despite decades of research, treatment options for AD remain limited, leaving patients and families with little hope. Drug repurposing, which identifies novel therapeutic applications for existing drugs, is an attractive complement to the costly and time-consuming process of new drug development, particularly for serious, widespread conditions such as AD that continue to have few effective treatments2. In addition to accelerated timelines and lower costs throughout the discovery-to-market process, the approach offers well-established drug safety profiles and expedited clinical translation with enhanced patient accessibility. Nevertheless, the success of drug repurposing hinges on the prompt and accurate identification of promising candidates among a large collection of drugs.
The search for drug repurposing candidates typically relies on a comprehensive review of the scientific literature, focusing on studies that offer evidence of efficacy for certain drugs or their constituent ingredients. Mechanistic insights, preclinical experiments, clinical reports, large-scale observational studies, and drug repurposing databases collectively form the space within which searches are conducted. However, this review process is labor- and time-intensive, requiring researchers to incorporate interdisciplinary expertise in disease mechanisms, molecular biology, pharmacology, clinical research, and bioinformatics. As such, approaches that streamline this process offer an advantage in repurposing efforts.
Recent advancements in generative artificial intelligence (GAI), exemplified by OpenAI’s ChatGPT3, have showcased the remarkable capability of AI to understand and respond to diverse inquiries. The comprehension and response capabilities of GAI derive from extensive exposure to a vast corpus from the Internet, nuanced encoding of knowledge, and subsequent optimization of responses that display reasoning processes4,5. Beyond answering general questions, GAI has demonstrated effectiveness in specialized medical contexts6, including U.S. Medical Licensing Examination queries7, clinical decision-making consultations8,9, and medical research assessments10,11. Notably, ChatGPT is already being leveraged by biotechnology companies to suggest novel pathways for drug targets12. However, given its nascent stage and concerns regarding fabrication of information13,14, responsible deployment of this tool in the medical setting necessitates comprehensive verification of its functional utility and reliability with clinical data in the real world.
We hypothesized that ChatGPT can function as an AI-driven screening tool to generate drug repurposing candidates for AD. To assess this hypothesis, we provided ChatGPT (model GPT-4) with two sequential prompts. First, we prompted ChatGPT to provide the twenty most promising drug repurposing candidates for AD. Next, we prompted ChatGPT to confirm its previous output and return a final list of drugs (Fig. 1a). To account for the probabilistic nature of ChatGPT’s responses, we repeated this process ten times, resulting in a total of 59 unique drug candidates (Supplementary Table 1). We confirmed that each candidate appeared in at least one publication discussing its potential use in AD. We then identified the ten most frequently appearing drugs for subsequent testing with clinical data (minimum frequency N = 7, maximum frequency N = 10).
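The frequency-based prioritization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_candidates` is hypothetical, the name normalization (lowercasing and trimming) is our assumption about how duplicates would be merged, and the three short lists are toy placeholders rather than actual ChatGPT outputs.

```python
from collections import Counter


def rank_candidates(runs, top_n=10):
    """Count in how many runs each drug appears and return the top_n.

    runs  : list of lists of drug names, one inner list per repeated query
    top_n : number of most frequent candidates to keep
    """
    counts = Counter()
    for run in runs:
        # Normalize names (assumption) and count each drug once per run.
        counts.update({name.strip().lower() for name in run})
    # Sort by frequency (descending), then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy placeholder runs; NOT the real model responses.
toy_runs = [
    ["Metformin", "Losartan", "Lithium"],
    ["metformin", "simvastatin", "losartan"],
    ["Metformin ", "Minocycline"],
]
top_two = rank_candidates(toy_runs, top_n=2)
```

With ten real runs of twenty drugs each, the same tally would yield per-drug frequencies between 1 and 10, from which the ten most frequent candidates are taken.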
For each generated candidate, we composed two cohorts using de-identified electronic health record (EHR) data from large clinical datasets: 1) Vanderbilt University Medical Center (VUMC), and 2) the National Institutes of Health (NIH) All of Us Research Program15 (Fig. 1b). We employed Cox proportional hazards regression to compare the risk of developing AD between individuals with prior drug exposure and individuals never exposed to the drug. We used age 65 as time zero; prior drug exposure was defined by medication use ≤ 65 years of age. Each drug-exposed cohort was matched to an unexposed group based on propensity score (PS), using sex, race, EHR length after age 65, and drug-specific comorbidities at age 65 (i.e., at the time of cohort entry) as covariates. Drug-specific comorbidities were selected based on primary clinical indication. Given that the cohort size for a particular drug might not be sufficiently large in the independent datasets, we also performed a meta-analysis to derive a statistically robust estimate of each drug’s hazard ratio.
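As a rough illustration of propensity score matching, the sketch below pairs each exposed individual with the closest unexposed individual by score. The text does not state the matching algorithm, matching ratio, or caliper, so greedy 1:1 nearest-neighbor matching without replacement and the `caliper=0.05` default are assumptions made purely for illustration. The scores are taken as given; in practice they would come from a model of exposure on the covariates listed above.

```python
def match_nearest(exposed, unexposed, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    exposed, unexposed : dicts mapping person id -> propensity score
    caliper            : maximum allowed score difference (assumed value)
    Returns a list of (exposed_id, unexposed_id) pairs.
    """
    available = dict(unexposed)  # unexposed people not yet used
    pairs = []
    # Process exposed people in score order so the result is deterministic.
    for eid, score in sorted(exposed.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining unexposed person; ties broken by id.
        best = min(available, key=lambda uid: (abs(available[uid] - score), uid))
        if abs(available[best] - score) <= caliper:
            pairs.append((eid, best))
            del available[best]  # matching without replacement
    return pairs


# Hypothetical scores for illustration only.
toy_exposed = {"e1": 0.30, "e2": 0.62, "e3": 0.95}
toy_unexposed = {"u1": 0.28, "u2": 0.33, "u3": 0.60, "u4": 0.10}
toy_pairs = match_nearest(toy_exposed, toy_unexposed)
```

In this toy example the exposed person with score 0.95 has no unexposed neighbor within the caliper and is left unmatched, which is one way a matched cohort can end up smaller than the exposed group.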
We observed that three of the top ten ChatGPT recommendations were associated with a significantly reduced risk of AD after ten years of follow-up using VUMC data: the antidiabetic medication metformin (hazard ratio (HR) = 0.67, 95% confidence interval (CI): 0.54–0.82, p < 1.5 × 10⁻⁴), the antihypertensive agent losartan (HR = 0.73, 95% CI: 0.57–0.92, p = 0.009), and the antibiotic minocycline (HR = 0.34, 95% CI: 0.13–0.89, p = 0.028) (Fig. 2). Though our studies with All of Us were limited by smaller sample sizes, metformin showed treatment effects in the expected direction (i.e., HR < 1). While not statistically significant at p < 0.05, the lipid-lowering medication simvastatin and the antidiabetic medication pioglitazone also exhibited beneficial treatment effects in both the VUMC and All of Us data.
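Hazard ratios of this kind are estimated with Cox proportional hazards regression. The following is a minimal, dependency-free sketch for a single binary covariate (exposed versus unexposed), fitted by Newton-Raphson on the partial likelihood with Breslow handling of ties. It is not the study's analysis code: the function name `cox_binary` is hypothetical, the toy follow-up data are invented for illustration, and a real analysis would use an established survival package.

```python
import math


def cox_binary(times, events, exposed, max_iter=50, tol=1e-10):
    """Cox model with one binary covariate (Breslow ties).

    times   : follow-up in years after time zero (age 65 in the text)
    events  : 1 if AD was diagnosed at that time, 0 if censored
    exposed : 1 if the drug was used before time zero, else 0
    Returns (hazard ratio, lower 95% bound, upper 95% bound).
    """
    data = list(zip(times, events, exposed))
    beta = 0.0
    for _ in range(max_iter):
        grad, info = 0.0, 0.0
        for t_i, d_i, x_i in data:
            if not d_i:
                continue  # censored rows contribute only through risk sets
            s0 = s1 = 0.0
            for t_j, _d_j, x_j in data:
                if t_j >= t_i:  # still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
            frac = s1 / s0  # weighted share of exposed in the risk set
            grad += x_i - frac
            info += frac * (1.0 - frac)  # variance of a binary covariate
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    z = 1.959964  # 97.5th percentile of the standard normal
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Invented toy data: 10 unexposed (5 events), 10 exposed (2 events),
# everyone else censored at 10 years of follow-up.
toy_times = [1, 2, 3, 4, 5] + [10] * 5 + [6, 8] + [10] * 8
toy_events = [1] * 5 + [0] * 5 + [1, 1] + [0] * 8
toy_exposed = [0] * 10 + [1] * 10
toy_hr, toy_lo, toy_hi = cox_binary(toy_times, toy_events, toy_exposed)
```

In the toy data the exposed group has fewer and later events, so the estimated hazard ratio falls below 1, while the wide interval reflects the very small sample.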
In the meta-analysis, we confirmed the protective effect of metformin (HR = 0.67, 95% CI: 0.55–0.81, p = 6.4 × 10⁻⁵). The meta-analysis also revealed a statistically significant protective treatment effect for simvastatin (HR = 0.84, 95% CI: 0.73–0.98, p = 0.024) that had not been identified in either the VUMC or All of Us data in isolation. Losartan was found to have a significant protective treatment effect in meta-analysis as well (HR = 0.76, 95% CI: 0.60–0.95, p = 0.017); however, the effect estimates from VUMC and All of Us were opposing in their directionality.
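To make the pooling step concrete, the sketch below combines per-cohort hazard ratios with inverse-variance weights on the log scale. The text does not say whether a fixed-effect or random-effects model was used, so the fixed-effect form here is an assumption. Standard errors are recovered from the reported 95% confidence intervals. The first input is the VUMC metformin estimate quoted above; the second is a hypothetical placeholder, not an All of Us result.

```python
import math

Z95 = 1.959964  # 97.5th percentile of the standard normal


def log_hr_and_se(hr, lo, hi):
    """Return log(HR) and its standard error derived from a 95% CI."""
    return math.log(hr), (math.log(hi) - math.log(lo)) / (2 * Z95)


def pool_fixed(studies):
    """Fixed-effect inverse-variance pooling (assumed model).

    studies : list of (hr, ci_low, ci_high) tuples, one per cohort
    Returns (pooled hr, ci_low, ci_high, two-sided p-value).
    """
    num = den = 0.0
    for hr, lo, hi in studies:
        b, se = log_hr_and_se(hr, lo, hi)
        w = 1.0 / se ** 2  # more precise cohorts get more weight
        num += w * b
        den += w
    b = num / den
    se = math.sqrt(1.0 / den)
    p = math.erfc(abs(b / se) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(b), math.exp(b - Z95 * se), math.exp(b + Z95 * se), p


vumc_metformin = (0.67, 0.54, 0.82)  # from the text
placeholder = (0.70, 0.40, 1.22)     # hypothetical, for illustration only
pooled = pool_fixed([vumc_metformin, placeholder])
```

Because weights are proportional to precision, a large cohort with a narrow interval dominates the pooled estimate, which is how a meta-analytic result can follow one dataset even when a smaller dataset points the other way.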
Inadequate AD case counts (N < 5) prevented the evaluation of bexarotene and nilotinib in both VUMC and All of Us. The effects of minocycline, candesartan, rapamycin, and lithium could not be tested in All of Us for the same reason.
We found that ChatGPT’s utility as a drug repurposing tool resides in its ability to follow instructions pertaining to drug repurposing and rapidly synthesize information from relevant literature. ChatGPT did not propose any FDA-approved drugs for AD, suggesting that it accurately interprets the premise of drug repurposing. In this study, the drugs suggested with the highest frequency by ChatGPT were not novel repurposing candidates for AD, but rather drugs frequently mentioned together with AD in the literature. Antidiabetic drugs such as metformin and pioglitazone have received considerable attention as potential therapeutic candidates for AD, driven by increasing evidence implicating insulin resistance in the pathogenesis of AD16–18. Similarly, reported associations between AD and cardiovascular disease have sparked numerous investigations into the repurposing of cardiovascular drugs for AD, including statins and antihypertensive agents such as losartan and candesartan19–21. Rapamycin, nilotinib, lithium, and bexarotene have also been heavily explored in AD drug repurposing studies22–24.
We observed protective effects against AD for three of the ten drugs most frequently suggested by ChatGPT (metformin, simvastatin, and losartan) in a meta-analysis combining data from two large-scale EHRs. Use of metformin, which produced the strongest signal in our meta-analysis, was associated with a 33% decreased risk of incident AD after age 65. Simvastatin and losartan produced more modest effects. In meta-analysis, simvastatin was associated with a 16% decreased risk of AD, while losartan was associated with a 24% decreased risk of AD. Whereas metformin and simvastatin were found to have consistent treatment effects (HR < 1) in both VUMC and All of Us, losartan had conflicting treatment effects (statistically significant HR < 1 using VUMC data, non-significant HR > 1 using All of Us data). This suggests that losartan's protective treatment effect in meta-analysis may have been driven by the larger sample size from VUMC. Despite supporting findings for these three drugs in previous studies, much remains unknown about the mechanisms by which these drugs affect AD pathophysiology and pathology, and population-based studies have not provided conclusive results25–27. Further investigation in preclinical and clinical studies will be needed to ascertain the viability of these drugs in decreasing risk of AD.
Our findings suggest that ChatGPT can generate quality hypotheses for drug repurposing. ChatGPT expedites the process of extensive literature review, which has become infeasible for humans to perform alone. With minimal costs, ChatGPT has the capacity and scalability to substantially accelerate the review process, allowing researchers to focus on testing and validating the hypotheses. Moreover, the anticipated regular updates of ChatGPT (which provide access to new Internet content) and its search engine plugins allow for consistently up-to-date and uninterrupted drug repurposing research. Furthermore, combining ChatGPT-powered hypotheses with robust verification using real-world clinical datasets provides a cost-effective pipeline to investigate preliminary signals before allocating additional resources to extensive research and clinical trials. This validation process serves as a critical balancing force to disprove invalid hypotheses, assuaging concerns about adverse consequences of AI hallucinations, a major criticism of ChatGPT use. Despite these advantages, any pipelines incorporating ChatGPT must account for the possibility of overlooked, but promising, repurposing candidates, which can occur when candidates appear rarely in the literature or require complex reasoning from indirect evidence that surpasses ChatGPT's capabilities.
Our study has several limitations of note. First, we relied upon frequency to prioritize drug candidates; however, the number of times a repurposing candidate appears in ChatGPT queries may not be directly related to its promise in treating disease. Second, EHRs can contain missing or incomplete data28, and discontinuities in medication adherence may not be reported with perfect fidelity, creating possibilities for misclassification of outcome or exposure. Third, despite the use of two large EHRs, we still did not have adequate statistical power for hypothesis testing of less common drugs (e.g., nilotinib). Fourth, while our study evaluated drug exposure broadly as any-time, any-dose exposure ≤ 65 years of age, there exist many opportunities for deeper phenotyping in characterizing drug exposure. Fifth, we sought to control for a single primary indication for each drug using MEDI; however, we were unable to establish a clear primary indication for several drugs (i.e., nilotinib, bexarotene, minocycline, and rapamycin). Furthermore, a fully balanced covariate distribution was not achieved for metformin and simvastatin (standardized mean difference > 0.1 for EHR length after 65 and drug-specific comorbidities), suggesting there may be some residual confounding (although likely to bias towards the null). Sixth, this study cannot establish causal effects or mechanisms as might be the case in a clinical trial. Lastly, although ChatGPT exhibits exceptional response quality for general queries, further research is required to benchmark a range of GAI models and their fine-tuned variants for greatest effectiveness and reliability in supporting biomedical tasks, particularly drug repurposing.
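The covariate balance criterion mentioned above (standardized mean difference > 0.1) can be illustrated with a short sketch. The formula below, the difference in group means divided by the square root of the average of the two group variances, is a common definition for continuous covariates; the text does not specify which variant was used, so treat it as an assumption. The values are toy numbers, not study data.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """SMD for a continuous covariate (assumed pooled-variance form)."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0  # no variation in either group
    return (mean(treated) - mean(control)) / pooled_sd


# Toy values, e.g. years of EHR follow-up after age 65 (hypothetical).
toy_treated = [5, 6, 7, 8]
toy_control = [4, 5, 6, 7]
toy_smd = standardized_mean_difference(toy_treated, toy_control)
imbalanced = abs(toy_smd) > 0.1  # threshold quoted in the text
```

An absolute SMD above 0.1 after matching flags a covariate whose distribution still differs between groups, which is the basis for the residual confounding caveat.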
Still, this proof-of-concept study showcases the feasibility of employing ChatGPT as an AI-driven hypothesis generator for drug repurposing, enabling the prompt generation of a promising list of drugs for subsequent testing in EHRs, using AD as a case study. Our findings suggest that ChatGPT is able to encode valuable insights concerning novel potential therapeutic utilities for existing drugs by comprehensively synthesizing literature, and can subsequently decode this knowledge when responding to queries. Pipelines that leverage the capabilities of ChatGPT offer a streamlined new framework for drug repurposing that can be applied to numerous diseases.