A non-factoid question answering system for Tweet contextualization

Information Retrieval (IR) is a field that deals with the storage and retrieval of information from a large collection of documents. A document consists primarily of text, for example, a webpage or a news article. IR attempts to satisfy the information need of the user. Traditionally, the user enters a natural language query, and the system returns documents containing information about that query. But in many cases, the user may be interested in specific and concise pieces of information rather than an entire document. One such scenario occurs in the field of Question Answering (QA). In QA, the user enters a natural language question and the QA system returns a concise answer to it. The question can be factoid or non-factoid. Factoid questions have simple facts as answers, and these facts are retrieved from a single document, whereas non-factoid questions typically have longer pieces of readable information as answers, which may come from one or more documents. This thesis describes a non-factoid QA system developed for a retrieval task known as Tweet Contextualization. Our QA system takes tweets from a microblogging website as input and answers the question "What is this tweet about?", i.e., it provides the context for the tweet. This context takes the form of a summary of at most 500 words, extracted from a recent, cleaned Wikipedia dump. We use Indri as the primary retrieval tool for our QA system. We also describe our approach for generating context summaries by considering n-gram overlap between tweets and sentences from the Wikipedia corpus. The top-ranked results achieved by our QA system for the INEX 2012 and 2013 Tweet Contextualization tracks are also included.
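To make the n-gram overlap idea concrete, the following is a minimal sketch, not the thesis's actual implementation. It assumes candidate Wikipedia sentences have already been retrieved (the Indri retrieval step is omitted), scores each sentence by the number of word unigrams and bigrams it shares with the tweet, and greedily fills a summary up to the 500-word limit. The function names, the tokenizer, and the choice of unigrams plus bigrams are all illustrative assumptions; this record does not specify the scoring used in the thesis.

```python
import re
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text (hypothetical tokenizer)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_score(tweet: str, sentence: str, max_n: int = 2) -> int:
    """Count distinct n-grams (n = 1..max_n) shared by the tweet and the sentence."""
    return sum(
        len(ngrams(tweet, n) & ngrams(sentence, n)) for n in range(1, max_n + 1)
    )


def build_context(tweet: str, sentences: List[str], word_limit: int = 500) -> List[str]:
    """Greedily pick the highest-overlap sentences that fit within word_limit."""
    # Assumption: sentences with zero overlap are never useful context.
    ranked = sorted(sentences, key=lambda s: overlap_score(tweet, s), reverse=True)
    summary: List[str] = []
    used = 0
    for sentence in ranked:
        length = len(sentence.split())
        if overlap_score(tweet, sentence) == 0 or used + length > word_limit:
            continue
        summary.append(sentence)
        used += length
    return summary
```

For example, given the tweet "Curiosity rover lands on Mars" and candidate sentences about the rover, Paris, and the planet Mars, `build_context` keeps the rover sentence first, then the planet sentence, and drops the unrelated one.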

Keywords

Information retrieval

Natural language processing

N-gram overlap

Non-factoid question answering

Tweet contextualization

XML retrieval

Description

University of Minnesota M.S. thesis. August 2013. Major: Computer science. Advisors: Dr. David Schimf, Dr. Donald B. Crouch. 1 computer file (PDF); vi, 36 pages.

Collections

Master's Theses (Plan A and Professional Engineering Design Projects)

Suggested Citation

Nawale, Swapnil Atmaram. (2013). A non-factoid question answering system for Tweet contextualization. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/160250.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.
