Browsing by Subject "Information Retrieval"

Improving Search via Named Entity Recognition in Morphologically Rich Languages – A Case Study in Urdu
(2018-02) Riaz, Kashif
Search is not a solved problem, even in a world of state-of-the-art engines such as Google and Bing. Google and similar search engines are keyword based. Keyword-based searching suffers from the vocabulary mismatch problem: the terms in a document and in the user's information request do not overlap, as with "cars" and "automobiles." This phenomenon is called synonymy. Similarly, the user's term may be polysemous: a user inquires about a river's bank, but documents about financial institutions are matched. Vocabulary mismatch is exacerbated when the search occurs in a Morphologically Rich Language (MRL). Concept search techniques like dimensionality reduction do not improve search in MRLs. Names occur frequently in news text and determine the "what," "where," "when," and "who" in the news text. Named Entity Recognition (NER) attempts to recognize names automatically in text, but these techniques are far from mature in MRLs, especially in Arabic-script languages. Urdu is one of the focus MRLs of this dissertation, alongside Arabic, Farsi, Hindi, and Russian, but it lacks the enabling technologies for NER and search. A corpus, a stop word generation algorithm, a light stemmer, a baseline, and an NER algorithm are created so that NER-aware search can be accomplished for Urdu. This dissertation demonstrates that NER-aware search on Arabic, Russian, Urdu, and English shows significant improvement over the baseline. Furthermore, this dissertation highlights the challenges of research in low-resource MRLs.
Oral history interview with Stuart Card
(Charles Babbage Institute, 2020-02-17) Card, Stuart
This interview is part of a series on Human Computer Interaction (HCI) conducted by the Charles Babbage Institute for ACM SIGCHI (Association for Computing Machinery Special Interest Group for Computer Human Interaction). HCI pioneer Stuart Card discusses his early education, attending Oberlin College, and helping lead its computer center, before the bulk of the interview focuses on his graduate education at Carnegie Mellon University working under Allen Newell, and his long and influential tenure at Xerox PARC. This includes his long and impactful collaboration with Newell and fellow Newell doctoral student Tom Moran. Newell, Card, and Moran were fundamentally important to theorizing early Human Computer Interaction, and the three co-wrote the widely used and deeply insightful textbook, The Psychology of Human Computer Interaction. Card provides an overview of his decades of work at Xerox PARC and various aspects of his research contributions to HCI models, information visualization, and information access (especially foraging theory). He moved into managing research and also describes some of his leadership roles at PARC and on important outside committees, such as for the National Academy of Sciences. He briefly expresses his ideas on the early institutional history of SIGCHI and its evolution. Regarding his work at PARC, Card discusses his influential research on computer mice at greater length. Card became an adjunct professor at Stanford University. He is an ACM Fellow and was awarded SIGCHI’s Lifetime Research Achievement Award.
Parsing the Wiki collection and snippet generation
(2013-04) Chittilla, Sai Subramanyam
Information Retrieval (IR) is a field which deals with retrieving useful information from large sets of data in response to a query. Much information in this digital age is stored in XML format, which associates a structure with a document. Though IR systems have been used for years to access documents, the field has greatly expanded with the emergence of the World Wide Web, which emphasizes the structure of the data. The amount of data makes the identification of various portions of a document difficult; document structure helps in this task. This thesis describes a retrieval task known as snippet retrieval. A snippet is the smallest meaningful body of text which can be used to establish the relevance of the document without actually looking at the document. The work on snippet retrieval is extended from past work in focused retrieval, wherein a ranked list of focused elements is retrieved in response to the user query. The Vector Space Model provides the framework for retrieval; we use Smart for basic retrieval functions. Our system for dynamic element retrieval, Flex, enables us to identify and rank the individual elements of each hypertext document with respect to the query. We include a discussion of focusing strategies and the use of focused elements for snippet generation. Results of our top-ranked 2011 and 2012 Snippet Retrieval track runs are included.
Understanding clinician information demands and synthesis of clinical documents in electronic health record systems.
(2012-06) Farri, Oladimeji Feyisetan
Large quantities of redundant clinical data are usually transferred from one clinical document to another, making the review of such documents cognitively burdensome and potentially error-prone. Inadequate designs of electronic health record (EHR) clinical document user interfaces probably contribute to the difficulties clinicians experience while processing patient-specific information during time-constrained patient encounters. Furthermore, the continuous need for clinicians to review multiple EHR clinical documents during the typical outpatient visit increases the likelihood of overloading their working memory in the short time available for complex cognitive activities related to patient care. In a collection of three studies incorporating fundamental principles in clinical informatics, cognitive psychology, and human-computer interaction, the think-aloud protocol, combined with other qualitative and quantitative methodologies, was used to investigate the cognitive processes associated with clinicians' synthesis of EHR clinical documents, the impact of time restrictions on these processes, and the implementation of a novel visualization tool to enhance processing of these documents during patient care. These studies fill fundamental gaps in our understanding of how clinicians interact with EHR systems when using clinical documents and can inform future EHR user interface design for clinical documentation, with the ultimate goal of improving patient care and clinician satisfaction with these systems.

University Digital Conservancy
