Detecting Cognitive Impairment from Language and Speech for Early Screening of Alzheimer's Disease Dementia with Interpretable Transformer-Based Language Models


Published Date

2024-05

Type

Thesis or Dissertation

Abstract

Alzheimer’s disease (AD) is a neurodegenerative disorder that affects the use of speech and language and is difficult to diagnose in its early stages. Neural language models (NLMs) have delivered impressive performance on the task of discriminating between language produced by cognitively healthy individuals and language produced by individuals with AD. As artificial neural networks (ANNs) grow in complexity, understanding their inner workings becomes increasingly challenging, a concern that is particularly pressing in healthcare applications. Intrinsic evaluation metrics of autoregressive NLMs (i.e., models trained to predict the next token given its context), such as perplexity (PPL), which reflects a model’s “surprise” at novel input, have been widely used to understand the behavior of NLMs.

As an alternative to fitting model parameters directly, this thesis proposes a novel method in which a pre-trained transformer-based NLM, GPT-2, is paired with an artificially degraded version of itself, GPT-D, to compute the ratio between the two models’ PPLs on language from cognitively healthy and impaired individuals. This technique approaches state-of-the-art (SOTA) performance on text data from a widely used “Cookie Theft” picture description task and, unlike established alternatives, also generalizes well to spontaneous conversations. Moreover, the degraded models generate text with characteristics known to be associated with AD, demonstrating the induction of dementia-related linguistic anomalies. The novel attention head ablation method employed in this thesis exhibits properties attributed to the concepts of cognitive and brain reserve in human brain studies, which postulate that people with more neurons in the brain and more efficient processing are more resilient to neurodegeneration. The results show that larger GPT-2 models require a disproportionately larger share of attention heads to be masked/ablated to display degradation of a magnitude similar to that produced by masking in smaller models.

To realize their benefits for assessment of mental status, transformer-based NLMs require verbatim transcriptions of patients’ speech. While such models have shown promise in detecting cognitive impairment from language samples, the feasibility of deploying such automated tools in large-scale clinical settings depends on the ability to reliably capture and transcribe the speech input. Currently available automatic speech recognition (ASR) solutions have improved dramatically over the last few years but are still not perfect and can have high error rates on challenging speech, such as speech recorded under sub-optimal conditions. A key question for successfully applying ASR technology in clinical applications is whether the imperfect transcripts generated by ASR provide sufficient information for downstream tasks to operate at an acceptable level of accuracy. This thesis examines the relationship between the errors produced by several transformer-based ASR systems and their impact on downstream dementia classification. One of the key findings is that ASR errors may provide important features for this downstream classification task, resulting in better performance compared to using manual transcripts.
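To make the ASR-to-classifier pipeline concrete, the following is a minimal sketch assuming the Hugging Face transformers library. The Wav2Vec 2.0 checkpoint is a common public model chosen purely for illustration and is not necessarily one of the systems evaluated in the thesis; the audio filename and the "./dementia-classifier" path are hypothetical placeholders.

```python
from transformers import pipeline

# Transformer-based ASR system; this public Wav2Vec 2.0 checkpoint is used
# only as an illustrative stand-in for the ASR systems studied in the thesis.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# Hypothetical fine-tuned transcript classifier (e.g., a BERT-style model
# trained to label transcripts as dementia vs. control); the path is a placeholder.
classifier = pipeline("text-classification", model="./dementia-classifier")

# An imperfect ASR hypothesis may still carry (or even add) useful signal
# for the downstream dementia/control decision.
hypothesis = asr("cookie_theft_recording.wav")["text"]
print(hypothesis)
print(classifier(hypothesis))  # e.g. [{"label": "DEMENTIA", "score": 0.87}]
```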
In summary, this thesis is a step toward a better understanding of the relationships between the inner workings of generative NLMs, the language that they produce, and the deleterious effects of dementia on human speech and language characteristics. The probing methods also suggest that the attention mechanism in transformer models may present an analogue to the notions of cognitive and brain reserve and could potentially be used to model certain aspects of the progression of neurodegenerative disorders and aging. Additionally, the results presented in this thesis suggest that the ASR models and the downstream classification models react to acoustic and linguistic manifestations of dementia in systematic and mutually synergistic ways, which has significant implications for the use of ASR technology. This line of research enables the automated analysis of speech collected from patients, at least in dementia screening settings, and it has the potential to expand to a variety of other clinical applications in which both language and speech characteristics are affected.
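As a concrete illustration of the paired-model probing described above, the sketch below pairs GPT-2 with a head-masked copy of itself and compares the two models’ perplexities on a transcript, again assuming the Hugging Face transformers library. The choice of which heads to mask, the masking proportion, and the direction of the ratio are arbitrary placeholders; the dissertation’s actual GPT-D construction and evaluation protocol are described in the thesis itself.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def perplexity(model, tokenizer, text, head_mask=None):
    """Perplexity of `text` under `model`; `head_mask` optionally zeroes attention heads."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids, head_mask=head_mask)
    return math.exp(out.loss.item())  # loss is the mean token-level cross-entropy

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

n_layers, n_heads = model.config.n_layer, model.config.n_head  # 12 x 12 for base GPT-2

# "GPT-D": the same network with a subset of attention heads ablated.
# Zeroing half of the heads in every layer is an arbitrary illustration;
# the thesis selects and degrades heads according to its own procedure.
degraded_mask = torch.ones(n_layers, n_heads)
degraded_mask[:, : n_heads // 2] = 0.0

transcript = "the boy is on a stool reaching into the cookie jar ..."  # picture-description sample

ppl_intact = perplexity(model, tokenizer, transcript)
ppl_degraded = perplexity(model, tokenizer, transcript, head_mask=degraded_mask)

# The ratio of the two perplexities serves as the screening feature: it tends to
# differ systematically between transcripts from cognitively healthy speakers
# and transcripts from speakers with dementia.
print(f"PPL ratio (intact / degraded): {ppl_intact / ppl_degraded:.3f}")
```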

Description

University of Minnesota Ph.D. dissertation. May 2024. Major: Health Informatics. Advisor: Serguei Pakhomov. 1 computer file (PDF); xvi, 120 pages.

Suggested citation

Li, Changye. (2024). Detecting Cognitive Impairment from Language and Speech for Early Screening of Alzheimer's Disease Dementia with Interpretable Transformer-Based Language Models. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/264324.
