Browsing by Subject "INEX"
From focused elements to snippets (2013-04)
Nagalla, Supraja

Information Retrieval is a field of computing that traditionally deals with searching a large collection of documents and retrieving documents based on their similarity to the query. INEX [10] provides a platform (e.g., a document collection, queries, and uniform evaluation metrics) for the development and evaluation of retrieval algorithms for XML documents. The focus of INEX is to reduce the granularity of search results from the entire document to the element level. In 2011, INEX introduced a new track called the Snippet Retrieval track, and in 2012 it improved this track to make assessment easier. The track's goal is to determine how best to generate informative snippets for search results. Such snippets should provide enough information for the user to determine the relevance of each document without viewing the document itself. The Snippet Retrieval track uses the 50.7 GB INEX Wikipedia collection of about 2.7 million articles. We use the Smart [15] experimental retrieval system, based on the Vector Space Model [16], for indexing and retrieval. This thesis describes the approaches UMD took to generate runs for the INEX 2011 and 2012 Snippet Retrieval track. We use our method of dynamic element retrieval [7] to generate the element vectors of the XML document tree at run time, producing a rank-ordered list of elements from each highly correlated document. These elements are then further processed by our methods to generate snippets. The methods used, experimental results, and conclusions are described herein.

Improving results for the INEX thorough task (2010-08)
Mahule, Abhijeet P.

Information retrieval strategies of earlier years were developed around the idea of retrieving entire documents in response to a user's request.
With the widespread use of the web and of markup languages such as XML (Extensible Markup Language) for representing documents, the idea of retrieval at a finer (element) level of granularity emerged. As XML repositories continue to grow, the focus has shifted to developing effective element retrieval strategies. Our element retrieval strategy, Flex (flexible retrieval), is designed to work with semistructured collections of XML documents, such as the INEX 2009 collection. In this thesis, we focus on improving results for the INEX Thorough task by combining article retrieval with Flex for element generation and retrieval. Experiments to determine the best values of slope and pivot are necessary before Flex can use Lnu-ltu weighting. The experiments we conducted to improve results for the INEX 2009 Thorough task, and their outcomes, are described and reported.

Parsing the Wiki collection and snippet generation (2013-04)
Chittilla, Sai Subramanyam

Information Retrieval (IR) is a field that deals with retrieving useful information from large sets of data in response to a query. Much information in this digital age is stored in XML format, which associates a structure with a document. Although IR systems have long been used to access documents, the field has expanded greatly with the emergence of the World Wide Web, which emphasizes the structure of data. The sheer amount of data makes it difficult to identify specific portions of a document, and document structure helps in this task. This thesis describes a retrieval task known as snippet retrieval. A snippet is the smallest meaningful body of text that can be used to establish the relevance of a document without looking at the document itself. Our work on snippet retrieval extends past work in focused retrieval, in which a ranked list of focused elements is retrieved in response to the user's query.
The Vector Space Model provides the framework for retrieval, and we use Smart for basic retrieval functions. Our system for dynamic element retrieval, Flex, enables us to identify and rank the individual elements of each hypertext document with respect to the query. We also discuss focusing strategies and the use of focused elements for snippet generation. Results of our top-ranked 2011 and 2012 Snippet Retrieval track runs are included.

Personalized Book Retrieval System Using Amazon-LibraryThing Collection (2014-08)
Ravva, Venkataravikiran

Information retrieval is the science of retrieving documents or information from a corpus based on a user's information need. Selecting books from a collection solely by their topical relevance to the query may not yield the "best" book (or all of the "best" books). Including social data, such as popularity, reviews, and ratings, may improve the results, so we combine social data with book metadata. The main goal of this research is to provide a book retrieval system for the Social Book Search (SBS) track of the INEX forum. For the SBS track, participants are provided with an XML collection of data from Amazon and the LibraryThing (LT) forum; a set of topics from the LT forum enriched with user catalogue data (i.e., the books the topic creator has in his or her personal LibraryThing catalogue); and anonymous user profiles. Participants must devise a system that returns the ISBNs/work IDs of the books relevant to the topic creator. To this end, we designed a recommender system that provides personalized search results.

Producing improved results for the INEX focused and relevant in context tasks (2010-08)
Vadlamudi, Sandeep

Information retrieval (IR) is the science of retrieving information associated with a given query that is judged relevant by the user.
With XML, a mechanism became available for identifying the structure of a document, enabling the retrieval of elements from within documents at different levels of granularity. We use flexible retrieval to retrieve elements from a document [3]. The goal of this thesis is to improve results for the INEX Focused and Relevant in Context (RIC) tasks. In the Focused task, the system must produce a rank-ordered list of non-overlapping elements. In the RIC task, it must retrieve relevant focused elements from relevant articles. We discuss the methods we developed to improve our results for the Focused and RIC tasks and detail experiments demonstrating their efficacy.
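The Focused task described in the last abstract requires a rank-ordered list of non-overlapping elements. One common way to enforce this is to walk the ranked list from the highest score down. An element is kept only if no already-kept element in the same document is its ancestor or descendant. The Python sketch below illustrates that greedy strategy. It is an assumption for illustration, not necessarily the method developed in the thesis. The `Element` type, the XPath-style paths, and the function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Element(NamedTuple):
    """A retrieved XML element (hypothetical representation)."""
    doc_id: str
    path: str    # XPath-style location, e.g. "/article[1]/sec[2]/p[1]"
    score: float


def is_ancestor_or_self(a: str, b: str) -> bool:
    """True if path `a` equals path `b` or is an ancestor of it.

    Paths are compared step by step, so "/article[1]/sec[1]" is not
    mistaken for an ancestor of "/article[1]/sec[10]".
    """
    a_steps = a.strip("/").split("/")
    b_steps = b.strip("/").split("/")
    return b_steps[:len(a_steps)] == a_steps


def overlaps(x: Element, y: Element) -> bool:
    """Two elements overlap if they share a document and one contains the other."""
    return x.doc_id == y.doc_id and (
        is_ancestor_or_self(x.path, y.path) or is_ancestor_or_self(y.path, x.path)
    )


def remove_overlap(ranked: Iterable[Element]) -> List[Element]:
    """Greedily keep the best-scoring element of each overlapping group."""
    kept: List[Element] = []
    for el in sorted(ranked, key=lambda e: e.score, reverse=True):
        if not any(overlaps(el, k) for k in kept):
            kept.append(el)
    return kept


if __name__ == "__main__":
    ranked = [
        Element("d1", "/article[1]", 0.5),
        Element("d1", "/article[1]/sec[2]", 0.9),
        Element("d1", "/article[1]/sec[2]/p[1]", 0.7),
        Element("d1", "/article[1]/sec[3]", 0.6),
        Element("d2", "/article[1]", 0.4),
    ]
    for e in remove_overlap(ranked):
        print(e.doc_id, e.path, e.score)
```

In this example, the section scoring 0.9 suppresses both its own paragraph and the enclosing article, while its sibling section and the element from the other document survive. A different policy, such as preferring the parent when children score similarly, would only change the keep/skip test in `remove_overlap`.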
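The thesis "Improving results for the INEX thorough task" tunes slope and pivot so that Flex can use Lnu-ltu weighting. The Python sketch below shows where those two parameters enter. It assumes the usual SMART conventions. Document terms get L (1 + log tf, divided by 1 + log of the document's average tf), n (no idf), and u (pivoted unique normalization, dividing by (1 − slope) × pivot + slope × the number of unique terms). Query terms get l (1 + log tf), t (idf, log(N/df)), and the same u normalization. The function names, the toy collection, and the choice of pivot as the average number of unique terms per document are illustrative assumptions, not the configuration used with Flex.

```python
import math
from collections import Counter
from typing import Dict, List


def pivoted_unique_norm(n_unique: int, slope: float, pivot: float) -> float:
    """u: (1 - slope) * pivot + slope * (number of unique terms)."""
    return (1.0 - slope) * pivot + slope * n_unique


def lnu_weights(doc_terms: List[str], slope: float, pivot: float) -> Dict[str, float]:
    """Lnu document weights (assumed SMART definition); doc_terms must be non-empty."""
    tf = Counter(doc_terms)
    avg_tf = sum(tf.values()) / len(tf)
    norm = pivoted_unique_norm(len(tf), slope, pivot)
    return {t: (1 + math.log(f)) / (1 + math.log(avg_tf)) / norm for t, f in tf.items()}


def ltu_weights(query_terms: List[str], df: Counter, n_docs: int,
                slope: float, pivot: float) -> Dict[str, float]:
    """ltu query weights; query terms absent from the collection are skipped."""
    tf = Counter(query_terms)
    norm = pivoted_unique_norm(len(tf), slope, pivot)
    return {t: (1 + math.log(f)) * math.log(n_docs / df[t]) / norm
            for t, f in tf.items() if df.get(t)}


def score(doc_w: Dict[str, float], query_w: Dict[str, float]) -> float:
    """Inner product of query and document vectors, as in the Vector Space Model."""
    return sum(w * doc_w.get(t, 0.0) for t, w in query_w.items())


if __name__ == "__main__":
    # Hypothetical toy collection, already tokenized.
    docs = [["xml", "retrieval", "element", "xml"],
            ["book", "search", "social"],
            ["snippet", "xml", "wiki"]]
    df = Counter(t for d in docs for t in set(d))
    pivot = sum(len(set(d)) for d in docs) / len(docs)  # assumed pivot choice
    slope = 0.2                                         # value to be tuned
    q = ltu_weights(["xml", "element"], df, len(docs), slope, pivot)
    for i, d in enumerate(docs):
        print(i, round(score(lnu_weights(d, slope, pivot), q), 4))
```

Because the query's u factor is the same for every document, it does not change the ranking. Slope and pivot matter only through the document normalization, which is why they need to be tuned experimentally for a given collection.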