Browsing by Author "Tagarelli, Andrea"

Item
A Segment-based Approach To Clustering Multi-Topic Documents (2008-01-31)
Tagarelli, Andrea; Karypis, George
Document clustering has been recognized as a central problem in text data management, and it becomes particularly challenging when documents have multiple topics. In this paper we address the problem of multi-topic document clustering by leveraging the natural composition of documents into text segments, each of which bears one or more topics of its own. We propose a segment-based document clustering framework, which is designed to induce a classification of documents starting from the identification of cohesive groups of segment-based portions of the original documents. We give empirical evidence of the significance of our approach on several large collections of multi-topic documents.

Item
Document Clustering: The Next Frontier (2013-02-04)
Anastasiu, David C.; Tagarelli, Andrea; Karypis, George
The proliferation of documents, both on the Web and in private systems, makes knowledge discovery in document collections arduous. Clustering has long been recognized as a useful tool for the task. It groups like items together, maximizing intra-cluster similarity and inter-cluster distance. Clustering can provide insight into the make-up of a document collection and is often used as the initial step in data analysis. While most document clustering research to date has focused on moderate-length, single-topic documents, real-life collections are often made up of very short or very long documents. Short documents do not contain enough text to accurately compute similarities. Long documents often span multiple topics that general document similarity measures do not take into account.
In this paper we will first give an overview of general-purpose document clustering, and then focus on recent advancements in the next frontier in document clustering: long and short documents.

Item
Understanding Computer Usage Evolution (2014-10-10)
Anastasiu, David C.; Rashid, Al M.; Tagarelli, Andrea; Karypis, George
The proliferation of computing devices in recent years has dramatically changed the way people work, play, communicate, and access information. The personal computer (PC) now has to compete with smartphones, tablets, and other devices for tasks for which it used to be the default device. Understanding how PC usage evolves over time can help provide the best overall user experience for current customers, can help determine when they need brand new systems vs. upgraded components, and can inform future product design to better anticipate user needs. In this paper, we introduce a method for the analysis of users' computer usage evolution. Our algorithm, Orion, segments the application-level usage of different users into a sequence of prototypical usage patterns shared among users, referred to as protos. Using an iterative process, protos are automatically derived from the segmentation, and an optimal segmentation is determined from the protos by a dynamic programming algorithm. To ensure that the segmentation is robust, constraints on the length and the number of segments are imposed. We show the validity of our method by analyzing a dataset consisting of over 28K users whose PC usage covers approximately 1M weeks. Our results show that different groups of users exhibit different usage patterns, that the usage patterns of nearly 50% of the users change over time, and that more than 20% of the users undergo multiple changes. Moreover, many of the differences in the usage patterns and their changes appear to correlate with various user-specific information, such as their geographic location and/or the type of computer system that they have.
To show the versatility of Orion, we present additional results from an analysis of 57K grocery store orders of nearly 1000 users.