Browsing by Subject "natural language processing"
Now showing 1 - 6 of 6
Item: Clausal Complementation in Nepal Bhasa (2021-11) Zhang, Borui
This dissertation examines the syntax and lexical semantics of finite verbal dependent clauses in Nepal Bhasa through fieldwork and by creating a shallow parsing model and a corpus-based search to test descriptive generalizations. Nepal Bhasa deploys two main syntactic complementation strategies: head-final pre-verbal CPs, which I argue are true complements, and head-initial post-verbal CPs, which I argue are instances of parataxis. Complementation additionally introduces certain syntactic and morphological constraints. Inchoative and perfective morphemes appear in free alternation in some mono-clausal environments, whereas in embedding structures, an embedding predicate with the inchoative suffix is restricted. By annotating a small dataset from open-source Nepal Bhasa data, I train a chunking model using transfer learning, fine-tuning the pre-trained mBERT language model. The preliminary test results show the potential usefulness of NLP tools for effectively building a corpus for research in low-resource languages. In particular, this method corroborates my descriptive generalization that the inchoative is restricted on embedding predicates in Nepal Bhasa. Additional search over the structural treebank corpora of typologically related languages adds evidence to a cross-linguistic generalization on embedding verb restrictions.

Item: Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods (University of Minnesota Supercomputing Institute, 2010-10) Pedersen, Ted
Measuring the similarity of short written contexts is a fundamental problem in Natural Language Processing. This article provides a unifying framework by which short context problems can be categorized both by their intended application and proposed solution.
The goal is to show that various problems and methodologies that appear quite different on the surface are in fact very closely related. The axes by which these categorizations are made include the format of the contexts (headed versus headless), the way in which the contexts are to be measured (first-order versus second-order similarity), and the information used to represent the features in the contexts (micro versus macro views). The unifying thread that binds together many short context applications and methods is the fact that similarity decisions must be made between contexts that share few (if any) words in common.

Item: Data for Evaluation of Subcategorization Frames (dataset self-published online, 2006) Chesley, Paula; Salmon-Alt, Suzanne
Data for evaluation of subcategorization frames as detailed in the paper "Automatic extraction of subcategorization frames for French", available at http://pages.cs.brandeis.edu/~marc/misc/proceedings/lrec-2006/pdf/101_pdf.pdf

Item: GeoComputational Approaches to Evaluate the Impacts of Communication on Decision-making in Agriculture (2018-12) Runck, Bryan
This dissertation proposes a new geocomputational approach to evaluate how communication-based interventions impact outcomes in agriculture. The decisions that people make in agriculture over the next ten to fifteen years will have long-term global consequences because agriculture is going to need to broadly change in order to meet the needs of the future. Many of the technical requirements and economic demands needed to enhance agriculture’s sustainability have been articulated with relative clarity. What remains opaque are the details of who should change, when, where, and how. A growing number of organizations are turning to communication-based interventions to answer these questions with people who will be impacted by changes. Evaluating these interventions is difficult because they are qualitative, affective, meaning-oriented, and discursive.
This dissertation builds on existing trends in geocomputation around qualitative geographic information systems and incorporates new methods from machine learning into spatial agent-based modeling. Doing so allows for largely automated creation of agents from natural language text. The dissertation expands on these new tools in each chapter and applies them to the challenge of evaluating communication-based interventions focused on Midwest agriculture. Results suggest that novel insights can be gained into the inner workings of communication-based interventions for improving decision-making using the approaches described in this dissertation.

Item: Learning Healthcare System enabled by Real-time Knowledge Extraction from Text data (2019-07) Kaggal, Vinod
We have a critical void in the clinical informatics ecosystems in enabling information captured in the Electronic Health Record (EHR) to be transformed into actionable knowledge. Incorporating knowledge into clinical practice by leveraging informatics-based analytical tools is critical in delivering optimal clinical care and in leading us toward an effective Learning Healthcare System (LHS). A robust infrastructure plays a critical role in enabling such clinical informatics ecosystems. This infrastructure must guarantee the ability to manage data volume, velocity, variety, and veracity.
This thesis work accomplishes the following: (i) proposal of a data model to support building a robust analytics framework to automatically compute the knowledge within the EHR; (ii) infrastructure to scale up analytics and knowledge delivery; and (iii) clinical and research projects that utilize this infrastructure for near real-time analysis of text data to derive intuitive clinical inferences from patients’ multi-dimensional data.

Item: Semantic Relatedness and Similarity Reference Standards for Medical Terms (2018-05-03) Pakhomov, Serguei; pakh0002@umn.edu; Natural Language Processing / Information Extraction (NLP/IE) Program (Institute for Health Informatics)
This is a collection of reference standards created to test and validate computerized approaches to quantifying the degree of semantic relatedness and similarity between medical terms. Each dataset consists of a list of term pairs that have been evaluated by various healthcare professionals (e.g., medical coders, residents, clinicians) to determine the degree of semantic relatedness and similarity. The details pertaining to each dataset are provided in the referenced publications.