Information Retrieval (IR) is a field concerned with storing information and retrieving it from large collections of documents. A document consists primarily of text, for example, a webpage or a news article. IR aims to satisfy the user's information need. Traditionally, the user enters a natural language query, and the system returns documents containing information relevant to that query. In many cases, however, the user may be interested in specific, concise pieces of information rather than an entire document. One such scenario is Question Answering (QA), in which the user enters a natural language question and the QA system produces a concise answer to it. Questions can be factoid or non-factoid. Factoid questions have simple facts as answers, and these facts are retrieved from a single document. The answers to non-factoid questions, by contrast, are typically longer pieces of readable information that may come from one or more documents. This thesis describes a non-factoid QA system developed for a retrieval task known as Tweet Contextualization. For this task, our QA system takes tweets from a microblogging website as input and answers the question: <italic>What is this tweet about?</italic>, i.e., it provides the context for the tweet. This context takes the form of a summary of at most 500 words, extracted from a recent, cleaned Wikipedia dump. We use Indri as the primary retrieval tool for our QA system. We also describe our approach for generating context summaries based on n-gram overlap between tweets and sentences from the Wikipedia corpus. Finally, we report the top-ranked results our QA system achieved in the INEX 2012 and 2013 Tweet Contextualization tracks.
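The abstract does not give the exact overlap scoring used to build context summaries. The following is only a minimal, hypothetical sketch of the general idea. It assumes that candidate Wikipedia sentences have already been retrieved (Indri's role in the thesis) and uses simple lowercase alphanumeric tokenization. Each sentence is scored by summing the number of clipped n-gram matches for n = 1..3, weighting each match by its length n. Sentences are then picked greedily, best score first, while the total stays under the 500-word limit. The function names (`tokenize`, `overlap_score`, `build_summary`) and the weighting scheme are illustrative assumptions, not the thesis's implementation.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens; the thesis's tokenizer may differ.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams (empty if the text is shorter than n).
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_score(tweet: str, sentence: str, max_n: int = 3) -> int:
    # Hypothetical score: for each n, count n-grams shared by tweet and sentence
    # (Counter '&' keeps the minimum count), weighting longer n-grams by n.
    t_tok, s_tok = tokenize(tweet), tokenize(sentence)
    score = 0
    for n in range(1, max_n + 1):
        shared = ngrams(t_tok, n) & ngrams(s_tok, n)
        score += n * sum(shared.values())
    return score


def build_summary(tweet: str, sentences: list[str],
                  word_limit: int = 500, max_n: int = 3) -> list[str]:
    # Rank candidate sentences by overlap, then add them greedily while the
    # summary stays within the word limit; skip sentences with no overlap.
    scored = sorted(((overlap_score(tweet, s, max_n), s) for s in sentences),
                    key=lambda pair: pair[0], reverse=True)
    summary, used = [], 0
    for score, sentence in scored:
        if score == 0:
            break  # list is sorted, so every remaining sentence also scores 0
        words = len(sentence.split())
        if used + words > word_limit:
            continue  # too long to fit; a shorter sentence may still fit
        summary.append(sentence)
        used += words
    return summary
```

For example, given the tweet "Solar eclipse visible over North America", the sentence "The total solar eclipse was visible over North America in 2017." shares several unigrams, bigrams, and trigrams with it and ranks first, while an unrelated sentence scores 0 and is left out. Weighting longer n-grams more heavily favors sentences that repeat the tweet's phrases, not just its individual words.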
University of Minnesota M.S. thesis. August 2013. Major: Computer science. Advisors: Dr. David Schimf, Dr. Donal B. Crouch. 1 computer file (PDF); vi, 36 pages.
Nawale, Swapnil Atmaram.
A non-factoid question answering system for Tweet contextualization.
Retrieved from the University of Minnesota Digital Conservancy,
Content distributed via the University of Minnesota's Digital Conservancy may be subject to additional license and use restrictions applied by the depositor.