Li, Changye
2024-07-24
2024-07-24
2024-05
https://hdl.handle.net/11299/264324
University of Minnesota Ph.D. dissertation. May 2024. Major: Health Informatics. Advisor: Serguei Pakhomov. 1 computer file (PDF); xvi, 120 pages.

Alzheimer's disease (AD) is a neurodegenerative disorder that affects the use of speech and language and is difficult to diagnose in its early stages. Neural language models (NLMs) have delivered impressive performance on the task of discriminating between language produced by cognitively healthy individuals and those with AD. As artificial neural networks (ANNs) grow in complexity, understanding their inner workings becomes increasingly challenging, which is particularly important in healthcare applications. Intrinsic evaluation metrics of autoregressive NLMs (i.e., models that predict the next token given the context), such as perplexity (PPL), which reflects a model's "surprise" at novel input, have been widely used to understand the behavior of NLMs. As an alternative to fitting model parameters directly, this thesis proposes a novel method by which a pre-trained transformer-based NLM, GPT-2, is paired with an artificially degraded version of itself, GPT-D, to compute the ratio between these two models' PPLs on language from cognitively healthy and impaired individuals. This technique approaches state-of-the-art (SOTA) performance on text data from a widely used "Cookie Theft" picture description task and, unlike established alternatives, also generalizes well to spontaneous conversations. Furthermore, the degraded models generate text with characteristics known to be associated with AD, demonstrating the induction of dementia-related linguistic anomalies. The novel attention head ablation method employed in this thesis exhibits properties attributed to the concepts of cognitive and brain reserve in human brain studies, which postulate that people with more neurons in the brain and more efficient processing are more resilient to neurodegeneration. The results show that larger GPT-2 models require a disproportionately larger share of attention heads to be masked/ablated to display degradation of a magnitude similar to that produced by masking in smaller models. To realize their benefits for assessment of mental status, transformer-based NLMs require verbatim transcriptions of speech from patients. While such models have shown promise in detecting cognitive impairment from language samples, the feasibility of deploying such automated tools in large-scale clinical settings depends on the ability to reliably capture and transcribe the speech input. Currently available automatic speech recognition (ASR) solutions have improved dramatically over the last few years but are still not perfect and can have high error rates on challenging speech, such as speech from audio data with sub-optimal recording quality. One of the key questions for successfully applying ASR technology to clinical applications is whether imperfect transcripts generated by ASR provide sufficient information for downstream tasks to operate at an acceptable level of accuracy. This thesis examines the relationship between the errors produced by several transformer-based ASR systems and their impact on downstream dementia classification. One of the key findings is that ASR errors may provide important features for this downstream classification task, resulting in better performance compared to using manual transcripts.
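To make the paired-perplexity approach concrete, the following is a minimal sketch, not the thesis's exact implementation: an intact GPT-2 is paired with a degraded copy whose attention heads are masked, and the ratio of their perplexities on a transcript serves as a feature. It assumes the Hugging Face transformers library and PyTorch; the head-masking pattern and the example transcript are illustrative only.

```python
# Minimal sketch of the paired-perplexity idea (illustrative, not the thesis's exact code).
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text, head_mask=None):
    """Compute PPL of `text`; `head_mask` entries of 0 ablate the corresponding attention heads."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids, head_mask=head_mask)
    return math.exp(out.loss.item())

# Head mask for the degraded "GPT-D" pass: shape (n_layer, n_head), 1 = keep, 0 = ablate.
cfg = model.config
head_mask = torch.ones(cfg.n_layer, cfg.n_head)
head_mask[: cfg.n_layer // 2, : cfg.n_head // 2] = 0.0  # illustrative ablation pattern

transcript = "the boy is on the stool reaching for the cookie jar"  # hypothetical picture description
ppl_intact = perplexity(transcript)
ppl_degraded = perplexity(transcript, head_mask=head_mask)
ppl_ratio = ppl_intact / ppl_degraded  # candidate feature for separating healthy vs. impaired speech
print(f"PPL intact={ppl_intact:.1f}, degraded={ppl_degraded:.1f}, ratio={ppl_ratio:.3f}")
```

In this sketch, degradation is implemented by zeroing a subset of attention heads through the `head_mask` argument to the forward pass; the thesis's specific masking scheme, model sizes, and classification thresholds may differ.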
In summary, this thesis is a step toward a better understanding of the relationships between the inner workings of generative NLMs, the language that they produce, and the deleterious effects of dementia on human speech and language characteristics. The probing methods also suggest that the attention mechanism in transformer models may present an analogue to the notions of cognitive and brain reserve and could potentially be used to model certain aspects of the progression of neurodegenerative disorders and aging. Additionally, the results presented in this thesis suggest that the ASR models and the downstream classification models react to acoustic and linguistic dementia manifestations in systematic and mutually synergistic ways, which has significant implications for the use of ASR technology. This line of research enables automated analysis of speech collected from patients, at least in dementia screening settings, and it has the potential to expand to a variety of other clinical applications in which both language and speech characteristics are affected.

en
Alzheimer's Disease
Automatic Speech Recognition
Natural Language Processing
Detecting Cognitive Impairment from Language and Speech for Early Screening of Alzheimer's Disease Dementia with Interpretable Transformer-Based Language Models
Thesis or Dissertation