Parsing the Wiki collection and snippet generation
2013-04
Type
Thesis or Dissertation
Abstract
Information Retrieval (IR) is a field that deals with retrieving useful information from large sets of data in response to a query. Much information in this digital age is stored in XML format, which associates a structure with a document. Though IR systems have been used for years to access documents, the field has greatly expanded with the emergence of the World Wide Web, which emphasizes the structure of the data. The amount of data makes it difficult to identify the various portion(s) of a document; document structure helps in this task.
This thesis describes a retrieval task known as snippet retrieval. A snippet is the smallest meaningful body of text which can be used to establish the relevance of the document without actually looking at the document. The work on snippet retrieval is extended from past work in focused retrieval, wherein a ranked list of focused elements is retrieved in response to the user query. The Vector Space Model provides the framework for retrieval; we use Smart for basic retrieval functions. Our system for dynamic element retrieval, Flex, enables us to identify and rank the individual elements of each hypertext document with respect to the query. We include a discussion of focusing strategies and the use of focused elements for snippet generation. Results of our top-ranked 2011 and 2012 Snippet Retrieval track runs are included.
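As a rough illustration of the idea in the abstract (rank the individual elements of an XML document against a query in a vector space, then use the best element as the snippet), here is a minimal sketch. It is not the Smart or Flex code described in the thesis: the function names `rank_elements` and `make_snippet`, the smoothed TF-IDF weighting, and the 300-character cutoff are all assumptions chosen for this example.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_elements(xml_text, query):
    """Return (score, tag, text) for every element, best match first.

    Assumption: each element of the document (including the root) is
    treated as its own small "document" in the vector space.
    """
    root = ET.fromstring(xml_text)
    elements = []
    for el in root.iter():
        # Collapse whitespace so nested text reads as one string.
        text = " ".join("".join(el.itertext()).split())
        tf = Counter(tokenize(text))
        if tf:
            elements.append((el.tag, text, tf))

    # Document frequency of each term over the candidate elements.
    n = len(elements)
    df = Counter()
    for _, _, tf in elements:
        df.update(tf.keys())

    def weigh(tf):
        # Smoothed TF-IDF (an assumed weighting, not the thesis's).
        return {t: f * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, f in tf.items()}

    query_vec = weigh(Counter(tokenize(query)))
    ranked = [(cosine(query_vec, weigh(tf)), tag, text)
              for tag, text, tf in elements]
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


def make_snippet(xml_text, query, max_chars=300):
    """Use the text of the top-ranked element, truncated, as the snippet."""
    ranked = rank_elements(xml_text, query)
    if not ranked or ranked[0][0] == 0.0:
        return ""
    return ranked[0][2][:max_chars]
```

Because cosine similarity normalizes by vector length, a short paragraph that matches the query tends to outrank the enclosing article element, which contains the same matching terms diluted by everything else. That is the intuition behind retrieving a focused element rather than the whole document.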
Description
University of Minnesota M.S. thesis. April 2013. Major: Computer science. Advisor: Dr Donald Crouch. 1 computer file (PDF); vi, 31 pages.
Suggested citation
Chittilla, Sai Subramanyam. (2013). Parsing the Wiki collection and snippet generation. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/152253.