Browsing by Subject "Snippet"
From focused elements to snippets
(2013-04) Nagalla, Supraja

Information Retrieval is a field of computing that traditionally deals with searching a large collection of documents and retrieving the documents most similar to a query. INEX [10] provides a platform (e.g., a document collection, queries, and uniform evaluation metrics) for the development and evaluation of retrieval algorithms for XML documents. The focus of INEX is to reduce the granularity of search results from the entire document to the element level. In 2011, INEX introduced a new track, called the Snippet Retrieval Track, and in 2012 it revised the track to make assessment easier. The track's goal is to determine how best to generate informative snippets for search results. Such snippets should provide enough information for the user to judge the relevance of each document without viewing the document itself. The Snippet Retrieval track uses the 50.7 GB INEX Wikipedia collection of about 2.7 million articles. We use the Smart [15] experimental retrieval system, based on the Vector Space Model [16], for indexing and retrieval. This thesis describes the approaches UMD used to generate runs for the INEX 2011 and 2012 Snippet Retrieval tracks. We use our method of dynamic element retrieval [7] to generate the element vectors of the XML document tree at run time, producing a rank-ordered list of elements from each document highly correlated with the query. These elements are then processed further to generate snippets. The methods used, experimental results, and conclusions are described herein.

Parsing the Wiki collection and snippet generation
(2013-04) Chittilla, Sai Subramanyam

Information Retrieval (IR) is a field that deals with retrieving useful information from large sets of data in response to a query. Much of the information in the digital age is stored in XML format, which associates a structure with a document.
Although IR systems have long been used to access documents, the field has expanded greatly with the emergence of the World Wide Web, which emphasizes the structure of data. The sheer volume of data makes it difficult to identify specific portions of a document; document structure helps with this task. This thesis describes a retrieval task known as snippet retrieval. A snippet is the smallest meaningful body of text that can be used to establish a document's relevance without viewing the document itself. This work on snippet retrieval extends past work in focused retrieval, in which a ranked list of focused elements is retrieved in response to a user query. The Vector Space Model provides the framework for retrieval; we use Smart for basic retrieval functions. Our system for dynamic element retrieval, Flex, enables us to identify and rank the individual elements of each hypertext document with respect to the query. We also discuss focusing strategies and the use of focused elements in snippet generation. Results of our top-ranked 2011 and 2012 Snippet Retrieval track runs are included.
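Both abstracts rest on the same idea: under the Vector Space Model, each XML element gets a term vector built at run time, and elements are ranked by their similarity to the query, so the best elements become snippet candidates. The sketch below illustrates that idea only; it is not Smart's or Flex's implementation. It assumes raw term-frequency weights (Smart uses its own weighting schemes), no stemming or stop-word removal, and that an element's vector is the sum of its own text and its descendants' vectors. The `Element` class, `tokenize`, `cosine`, and `rank_elements` names are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical, minimal XML element: a tag, its direct text, and children."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def tokenize(text):
    # Lowercase alphanumeric tokens; assumption: no stemming, no stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_elements(root, query):
    """Build every element's vector bottom-up at query time and rank all
    elements by cosine similarity to the query (highest first)."""
    qvec = Counter(tokenize(query))
    scored = []

    def walk(elem, path):
        vec = Counter(tokenize(elem.text))
        seen = Counter()  # per-tag sibling counter for XPath-style indices
        for child in elem.children:
            seen[child.tag] += 1
            vec.update(walk(child, f"{path}/{child.tag}[{seen[child.tag]}]"))
        scored.append((cosine(vec, qvec), path))
        return vec

    walk(root, f"/{root.tag}[1]")
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


doc = Element("article", children=[
    Element("title", "Snippet retrieval"),
    Element("sec", children=[
        Element("p", "XML retrieval ranks elements for a query."),
        Element("p", "Unrelated text about cooking pasta."),
    ]),
])
ranked = rank_elements(doc, "XML element retrieval")
for score, path in ranked:
    print(f"{score:.3f}  {path}")
# top-ranked: /article[1]/sec[1]/p[1]
```

In this toy document the first paragraph narrowly outranks the whole article, which is the kind of focused element a snippet generator would then draw text from. Computing each vector once in a single post-order walk keeps the cost linear in the size of the tree.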