The proposed voice AI model discriminates between CD and CN individuals by analyzing “one-minute conversations”. The high discrimination accuracy of 0.950 attained by this simple method demonstrates the feasibility of using short conversational voice as a practical screening tool for assessing cognitive function and alerting individuals or their families to possible cognitive decline, thereby enabling early treatment.
Extensive research on AI-based dementia assessment, particularly for AD, has been conducted worldwide. Practical digital biomarkers for diagnosing dementia can reduce the burden on clinical practice. However, this is difficult to realize using only simple evaluation methods, such as short conversations. The diagnosis of dementia is complicated and should not be based solely on neuropsychological test scores or on blood, cerebrospinal fluid, or imaging biomarkers. The DSM-5 criteria for diagnosing dementia require significant cognitive decline from a previous level of performance, impairment in activities of daily living, and the exclusion of psychiatric disorders. Because detailed interviews with family members and caregivers, an understanding of actual living conditions, and the exclusion of psychiatric disorders are essential for diagnosis, we assumed that simple digital biomarkers alone could not cover all these criteria.[34] In recent years, significant progress has been made in developing AI systems that use observational methods to assess how daily life affects the well-being of older adults.[35] Although privacy concerns remain, digital biomarkers capable of diagnosing cognitive decline and dementia by integrating multiple assessments are expected in the future.
To clinically diagnose dementia based on pathological findings, cerebrospinal fluid, blood, and neuroimaging tests must be performed to identify abnormal protein accumulation in the brain. For example, under the ATN classification, confirming amyloid-beta and tau protein accumulation is essential for diagnosing AD and determining eligibility for DMT.[36] Pathological diagnosis of DLB and other synucleinopathies using cerebrospinal fluid and neuroimaging is also approaching clinical practice.[37,38] In an era in which DMTs are used effectively, it is crucial to diagnose neurodegenerative dementias pathologically. Moreover, it is essential to confirm the results of imaging modalities, such as positron emission tomography (PET) and magnetic resonance imaging (MRI), as well as cerebrospinal fluid or blood biomarkers. Because diagnosing dementia using AI without clinical tests is challenging, the focus has shifted from developing digital biomarkers for diagnosing dementia to designing strategies that facilitate early medical intervention for patients with cognitive decline, that is, at an early stage of dementia. This shift allows the effective use of forthcoming therapeutic agents. In this regard, our proposed approach, which requires no specialized environment or equipment, represents a highly significant milestone.
Conversation and language abilities become impaired in the early stages of most dementias.[5] Recent studies have focused on AI-based assessments that employ speech and language. Typical testing procedures involve extracting pertinent features and then inputting them into machine- or deep-learning classifiers to identify patterns consistent with dementia. Two primary feature types, acoustic and linguistic, can be extracted and analyzed from human conversational voice.[10] Acoustic features describe how individuals articulate speech, whereas linguistic features describe content aspects, such as vocabulary, grammar, and syntax. According to a recent review, analyzing linguistic features yields better accuracy (0.925) than using acoustic features alone (0.786), and combining linguistic and acoustic features (0.939) outperforms either in isolation.[10] The voice analysis AI developed in this study predominantly analyzed acoustic features and achieved a higher accuracy (0.950) than previous studies employing acoustic features alone. Using acoustic features alone may offer the following advantages over using linguistic features: because conversational voice need not be converted to text, transcription errors cannot affect the results; only a short conversation sample is needed; and the influence of Japanese dialectal characteristics is reduced.
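To make the general pipeline concrete, the following minimal sketch extracts a few generic acoustic features and applies a simple classifier. It is purely illustrative: the feature choices (energy, zero-crossing rate, spectral centroid) and the nearest-centroid decision rule are our assumptions for exposition, not the feature set or classifier of the voice AI model described here.

```python
import numpy as np

def acoustic_features(waveform, sr=16000):
    """Extract a few generic acoustic features from a mono waveform."""
    energy = float(np.mean(waveform ** 2))                        # loudness proxy
    zcr = float(np.mean(np.abs(np.diff(np.sign(waveform)))) / 2)  # zero-crossing rate
    spectrum = np.abs(np.fft.rfft(waveform))
    freqs = np.fft.rfftfreq(waveform.size, d=1.0 / sr)
    centroid = float((freqs * spectrum).sum() / (spectrum.sum() + 1e-12))  # spectral centroid (Hz)
    return np.array([energy, zcr, centroid])

def nearest_centroid_predict(features, class_means):
    """Assign a sample to the class whose mean feature vector is closest."""
    return int(np.argmin([np.linalg.norm(features - m) for m in class_means]))
```

In practice, such feature vectors would be computed for each labeled recording and fed to a trained machine- or deep-learning classifier rather than this toy decision rule.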
Recently, two types of tests have mainly been developed to analyze conversational voice: a picture description test (participants describe a picture while their voice is recorded) and a conversation generated by an interview. In our study, although voice data for ML were obtained from both picture descriptions and interviews, the discrimination test was based only on “one-minute conversations”. The one-minute conversation was not provoked by a special task, as in the picture description test, but was a spontaneous conversation based on the individual’s episodic memory, resembling an interview task. In interview-based diagnosis, subjects answer multiple questions posed by humans or avatars, and their acoustic and linguistic features are analyzed to identify patients with cognitive decline and dementia.[20,39-41] However, answering multiple questions makes the test time-consuming and may give subjects the impression that their cognitive function is being tested. In contrast, the ability to discriminate between CD and CN with high accuracy using short conversational voice data elicited by only one question makes this method simple and widely applicable in clinical settings. Another advantage is that, unlike tests with definite correct answers, freeform conversations have no fixed answers; this reduces the learning effect and makes repeated administration easier. The proposed voice AI model identified CN with 100% accuracy. The absence of false positives (that is, a CN individual classified as CD) indicates its usefulness as a screening test in real clinical situations, preventing unnecessary worry or anxiety in healthy individuals and avoiding the medical burden and cost of unnecessary additional testing.
However, our voice AI model cannot detect MCI-level cognitive decline, equivalent to an MMSE score of 24–27, because our cutoff score was 23/24. Identifying patients with cognitive impairment before progression to dementia, specifically at the SCD and MCI levels, is crucial, as early intervention provides more opportunities for prevention, care, and effective use of treatments such as DMT for AD. To identify cognitive impairment at the MCI level, it will be necessary to adjust the cutoff MMSE score used for data labeling and to perform ML that incorporates CDR scores of 0.5 and the results of more detailed neuropsychological tests. In the future, additional ML using voice data from individuals with milder cognitive decline should be performed to explore the potential for detecting such decline with higher accuracy. Another limitation of this study is that voice data were collected only once per individual, making it impossible to evaluate repeated recordings and confirm that the voice AI model's decision is consistent within the same individual. In other words, the possibility that discrimination results for the same individual may vary with their condition on a given day (e.g., lack of sleep, alcohol consumption, or accidental forgetfulness) must be considered to avoid incorrect decisions.
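The cutoff-based labeling described above can be sketched as follows. The binary 23/24 rule reflects the cutoff stated in the text; the extension that combines an MMSE of 24–27 with a CDR of 0.5 to flag MCI is a hypothetical illustration of the proposed adjustment, not our actual preprocessing code.

```python
def label_participant(mmse, cdr=None):
    """Illustrative data-labeling rule for training.

    Binary rule from the text: MMSE <= 23 -> "CD", otherwise "CN"
    (the 23/24 cutoff). As a hypothetical extension, MMSE 24-27
    together with CDR 0.5 is labeled "MCI" to target milder decline.
    """
    if mmse <= 23:
        return "CD"
    if cdr == 0.5 and 24 <= mmse <= 27:
        return "MCI"
    return "CN"
```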
The proposed voice AI model is novel in its ability to accurately detect cognitive decline based solely on minute-long conversations. To date, no free-conversation-based AI application has received pharmaceutical approval for dementia detection. Building on this technology, we aim to develop AI medical software that detects cognitive decline from minute-long conversations and is accessible via mobile devices such as smartphones. Patients with mild dementia are often unaware of their cognitive decline, resulting in fewer proactive visits to healthcare facilities for cognitive assessment. Even when family or friends raise suspicion, patients often resist cognitive assessment. In addition, geographical barriers make it difficult to seek medical evaluation, particularly in rural areas. Therefore, developing AI medical software that is universally accessible, respects personal dignity and privacy, and imposes minimal mental, physical, and financial burdens to support dementia diagnosis would improve diagnostic accuracy and promote widespread adoption. Moreover, digital biomarkers based on language and conversation could detect changes in cognitive function before conventional medical examinations, offering potential applications for the early diagnosis and detection of mental disorders such as depression.[42] Although uncertainty remains about how to connect individuals identified as having cognitive decline to medical institutions, we believe that an AI-assisted simple cognitive function screening tool using short conversational voice can be valuable in an era in which dementia is on the rise.