Parsing the Wiki collection and snippet generation

Information Retrieval (IR) is a feld which deals with retrieving useful information from large sets of data in response to a query. Much information in this digital age is stored in XML format, which associates a structure with a document. Though IR systems have been used for years to access documents, the field has greatly expanded with the emergence of the world wide web, which emphasizes the structure of the data. The amount of data makes the identification of various portion(s) of a document difficult; document structure helps in this task. This thesis describes a retrieval task known as snippet retrieval. A snippet is the smallest meaningful body of text which can be used to establish the relevance of the document without actually looking at the document. The work on snippet retrieval is extended from past work in focused retrieval, wherein a ranked list of focused elements is retrieved in response to the user query. The Vector Space Model provides the framework for retrieval; we use Smart for basic retrieval functions. Our system for dynamic element retrieval, Flex, enables us to identify and rank the individual elements of each hypertext document with respect to the query. We include a discussion of focusing strategies and the use of focused elements for snippet generation. Results of our top-ranked 2011 and 2012 Snippet Retrieval track runs are included.

Keywords

INEX

Information Retrieval

Smart

Snippet

Description

University of Minnesota M.S. thesis. April 2013. Major: Computer science. Advisor: Dr Donald Crouch. 1 computer file (PDF); vi, 31 pages.

Collections

Master's Theses (Plan A and Professional Engineering Design Projects)

Suggested Citation

Chittilla, Sai Subramanyam. (2013). Parsing the Wiki collection and snippet generation. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/152253.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.

Parsing the Wiki collection and snippet generation

View/Download File

Persistent link to this item

Statistics

Title

Authors

Published Date

Publisher

Type

Abstract

Keywords

Description

Related to

item.page.replaces

License

Collections

Series/Report Number

Funding Information

item.page.isbn

DOI identifier

Previously Published Citation

Other identifiers

Suggested Citation

University of Minnesota Twin Cities

Parsing the Wiki collection and snippet generation

View/Download File

Persistent link to this item

Statistics

Title

Authors

Published Date

Publisher

Type

Abstract

Keywords

Description

Related to

item.page.replaces

License

Collections

Series/Report Number

Funding Information

item.page.isbn

DOI identifier

Previously Published Citation

Other identifiers

Suggested Citation