Browsing by Subject "Natural language processing"
Automated methods to extract patient new information from clinical notes in electronic health record systems (2013-11)
Zhang, Rui

The widespread adoption of Electronic Health Record (EHR) systems has led to a rapid proliferation of text in clinical care. Clinicians' use of copy-and-paste functions in EHR systems compounds this problem by creating a large amount of redundant information in clinical documents. A mixture of redundant information (especially outdated and incorrect information) and new information in a single clinical note increases clinicians' cognitive burden and makes decision-making more difficult. Moreover, replicated erroneous information can pose risks to patient safety. However, automated methods to identify redundant or relevant new information in clinical texts have not been extensively investigated. The overarching goal of this research is to develop and evaluate automated methods to identify new and clinically relevant information in clinical notes using expert-derived reference standards. Modified global alignment methods were adapted to investigate patterns of redundancy in individual patients' longitudinal clinical notes as well as in a larger group of patients' clinical notes. Statistical language models were also developed to identify new and clinically relevant information in clinical notes. Relevant new information identified by the automated methods will be highlighted in clinical notes to provide visual cues to clinicians. New information proportion (NIP) was used to indicate the quantity of new information in each note and to help clinicians navigate to notes with more new information. Classifying the semantic types of new information further allows clinicians to find the specific types of new information they are interested in.
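The abstract does not describe how the global alignment was modified or exactly how NIP was computed. As a rough illustration only, the following sketch assumes a standard Needleman-Wunsch global alignment over lowercased whitespace tokens, treats every token of the newer note that is not matched to an identical token of the older note as new, and defines NIP as the fraction of such tokens. The function names, the token-level granularity, and the scoring parameters are hypothetical and are not taken from the thesis.

```python
def global_align(old, new, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two token lists.

    Returns a list of (old_token, new_token) pairs; None marks a gap.
    The scoring parameters are illustrative assumptions.
    """
    n, m = len(old), len(new)

    def sub(a, b):
        return match if a == b else mismatch

    # score[i][j] = best score for aligning old[:i] with new[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(old[i - 1], new[j - 1]),
                score[i - 1][j] + gap,  # token dropped from the old note
                score[i][j - 1] + gap,  # token added in the new note
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(old[i - 1], new[j - 1]):
            pairs.append((old[i - 1], new[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((old[i - 1], None))
            i -= 1
        else:
            pairs.append((None, new[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def new_information(old_note, new_note):
    """Return (NIP, new_tokens) for new_note relative to old_note.

    Hypothetical definition: a token is new if the alignment pairs it
    with a gap or with a different token from the older note.
    """
    old_tokens = old_note.lower().split()
    new_tokens = new_note.lower().split()
    if not new_tokens:
        return 0.0, []
    novel = [b for a, b in global_align(old_tokens, new_tokens) if b is not None and a != b]
    return len(novel) / len(new_tokens), novel


nip, novel = new_information(
    "Patient denies chest pain",
    "Patient denies chest pain reports mild cough",
)
print(round(nip, 3), novel)  # prints: 0.429 ['reports', 'mild', 'cough']
```

The tokens collected in `novel` are the ones a viewer could highlight, and the NIP value is what would be used to rank or navigate notes; a production system would need clinically meaningful units and a relevance filter, which this sketch omits.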
The techniques developed in this research can be incorporated into production EHR systems, where they could help clinicians find and synthesize new information in a note more purposefully and ultimately improve the efficiency of healthcare delivery.

The impact of data fragmentation on high-throughput clinical phenotyping (2012-02)
Wei, Weiqi

Subject selection is essential and has become the rate-limiting step in harvesting knowledge to advance healthcare through clinical research. Current manual approaches prevent researchers from conducting deep and broad studies and from drawing confident conclusions. High-throughput clinical phenotyping (HTCP), a recently proposed approach, leverages the machine-processable content of electronic medical records (EMRs) to make this otherwise inefficient subject-selection process scalable and practical. However, the ability to capture a patient's medical data is often limited by data fragmentation problems that are common in current EMR systems: different data types (structured vs. unstructured), heterogeneous data sources (a single medical center vs. multiple healthcare centers), and varying time frames (short vs. long). The effect of data fragmentation on HTCP remains unknown. In this dissertation, by taking advantage of the REP patient-record-linkage system and the richness of EMR data at Mayo Clinic, I provide a thorough, multidimensional demonstration of how data fragmentation affects HTCP. The predominant message this dissertation delivers to the health informatics field is that EMR data fragmentation has a remarkable influence on HTCP.
Clinical researchers should carefully consider and mitigate this risk in the secondary and meaningful use of EMRs, especially when developing or executing an HTCP algorithm for subject selection.

A non-factoid question answering system for Tweet contextualization (2013-08)
Nawale, Swapnil Atmaram

Information Retrieval (IR) is a field that deals with the storage and retrieval of information from a large collection of documents. A document consists primarily of text, for example a webpage or a news article. IR attempts to satisfy the user's information need. Traditionally, the user enters a natural language query, and the system returns documents containing information about that query. In many cases, however, the user may want specific, concise pieces of information rather than an entire document. One such scenario arises in Question Answering (QA), where the user enters a natural language question and the QA system produces a concise answer. The question can be factoid or non-factoid. Factoid questions have simple facts as answers, and these facts are retrieved from a single document, whereas the answers to non-factoid questions are typically longer passages of readable text that may come from one or more documents. This thesis describes a non-factoid QA system developed for a retrieval task known as Tweet Contextualization. Our QA system for this task takes tweets from a microblogging website as input and answers the question "What is this tweet about?"; that is, it provides the context for the tweet. This context takes the form of a summary of at most 500 words, extracted from a recent, cleaned Wikipedia dump. We use Indri as the primary retrieval tool for our QA system. We also describe our approach to generating context summaries by considering n-gram overlap between tweets and sentences from the Wikipedia corpus.
The top-ranked results achieved by our QA system in the INEX 2012 and 2013 Tweet Contextualization tracks are also included.
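As a similarly hedged illustration of the summary-generation step in the Tweet Contextualization abstract, the sketch below assumes that candidate Wikipedia sentences have already been retrieved (for example by an Indri query, which is not shown), scores each sentence by the number of distinct word unigrams and bigrams it shares with the tweet, and greedily adds the highest-scoring sentences until the 500-word limit is reached. The tokenizer, the scoring function, and the greedy selection are assumptions for illustration, not the system described in the thesis.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Set of distinct word n-grams of length n."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_score(tweet, sentence, max_n=2):
    """Count distinct n-grams (n = 1..max_n) shared by the tweet and the sentence."""
    t, s = tokenize(tweet), tokenize(sentence)
    return sum(len(ngrams(t, n) & ngrams(s, n)) for n in range(1, max_n + 1))


def build_context(tweet, sentences, word_limit=500):
    """Greedily assemble a summary of at most word_limit words.

    `sentences` are assumed to be candidates already retrieved from the
    Wikipedia dump; retrieval itself is not shown here.
    """
    scored = [(overlap_score(tweet, s), idx, s) for idx, s in enumerate(sentences)]
    # Highest overlap first; ties keep the original candidate order
    scored.sort(key=lambda item: (-item[0], item[1]))
    summary, used = [], 0
    for score, _, sentence in scored:
        if score == 0:
            break  # remaining sentences share nothing with the tweet
        length = len(sentence.split())
        if used + length > word_limit:
            continue  # skip sentences that would exceed the word budget
        summary.append(sentence)
        used += length
    return " ".join(summary)


candidates = [
    "The Curiosity rover landed on Mars in August 2012.",
    "Paris is the capital of France.",
    "Mars is the fourth planet from the Sun.",
]
print(build_context("Curiosity rover lands on Mars", candidates))
# prints: The Curiosity rover landed on Mars in August 2012. Mars is the fourth planet from the Sun.
```

Counting shared bigrams as well as unigrams rewards sentences that repeat the tweet's phrasing, not just its vocabulary; this sketch does not check for redundancy between the selected sentences.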