Browsing by Author "Mane, Sandeep"
Now showing 1 - 7 of 7
Item
Concept-Aware Ranking: Teaching an Old Graph New Moves (2006-03-20)
DeLong, Colin; Mane, Sandeep; Srivastava, Jaideep
Ranking algorithms for web graphs, such as PageRank and HITS, are typically utilized for graphs where a node represents a unique URL (Webpage) and an edge is an explicitly-defined link between two such nodes. In addition to the link structure itself, additional information about the relationship between nodes may be available. For example, anchor text in a Web graph is likely to provide information about the underlying concepts connecting URLs. In this paper, we propose an extension to the Web graph model to take into account conceptual information encoded by links in order to construct a new graph which is sensitive to the conceptual links between nodes. By extracting keywords and recurring phrases from the anchor tag data, a set of concepts is defined. A new definition of a node (one which encodes both a URL and a concept) is then leveraged to create an entirely new Web graph, with edges representing both explicit and implicit conceptual links between nodes. In doing so, inter-concept relationships can be modeled and utilized when using graph ranking algorithms. This improves result accuracy by not only retrieving links which are more authoritative given a user's context, but also by utilizing a larger pool of web pages that are limited by concept-space, rather than keyword-space. This is illustrated using webpages from the University of Minnesota's College of Liberal Arts websites.

Item
From Clicks to Bricks: CRM Lessons from E-commerce (2005-10-12)
Mane, Sandeep; Desikan, Prasanna; Srivastava, Jaideep
E-commerce allows a level of closeness in customer-to-store interaction that is far greater than imaginable in the physical world, leading to unprecedented data collection, especially about the 'process of shopping'.
The desire to understand individual customers' behavior and psychology at a deeper level by mining this data has led to significant advances in on-line customer relationship management (e-CRM). Services like real-time recommendations, faster checkouts, and price/feature comparisons of products across different e-stores or brands have increased the general awareness of customers and made them more demanding. Web mining is the software technology that has made this possible by providing the means to automatically build sophisticated customer models from Web data collected at on-line stores. e-CRM has shown significant concrete benefits in customer experience and loyalty, leading to improved sales and profits. Physical stores have taken note of these benefits of e-CRM, and are interested in exploring similar possibilities. A key barrier to applying e-CRM techniques to the physical world (p-CRM) has been the inability to collect detailed customer data in the p-CRM world at the same granularity and in the same real-time manner as in the e-CRM world. With new technologies like radio frequency identification (RFID) and handheld devices like personal digital assistants (PDA) becoming affordable, these technologies are now being used in major stores for inventory management and/or anti-theft purposes. Based on the confluence of these factors, we posit that "given that such detailed knowledge of an individual customer's habits provides insight into his/her preferences and psychology, which can be used to develop a much higher level of trust in the customer-vendor relationship, the time is ripe for revisiting p-CRM to see what lessons learned from e-CRM are applicable."
In this paper, we present a concrete proposal on how this can be done, and identify directions for future research.

Item
Identifying Clusters in Marked Spatial Point Processes: A Summary of Results (2006-03-20)
Mane, Sandeep; Kang, James; Shekhar, Shashi; Srivastava, Jaideep; Murray, Carson; Pusey, Anne
Clustering of marked spatial point processes is an important problem in many application domains (e.g. Behavioral Ecology). Classical clustering approaches handle homogeneous spatial points and hence cannot cluster marked spatial point processes. In this paper, we propose a novel intuitive approach, Merge Algorithm, to hierarchically cluster marked spatial point processes. This approach treats all spatial point processes in a dendrogram's sub-tree as a single spatial point process while clustering. The resulting dendrogram for a marked spatial point process needs to be analyzed by a domain expert to identify clusters. To remove the subjective nature of the clusters identified, we propose a novel statistical method, Cluster Identification Algorithm, to partition a dendrogram into clusters. This approach identifies (cuts) a dendrogram's sub-tree as a cluster if that subtree's intra-subtree similarity is significantly higher than its inter-subtree similarity. Experiments with the Jane Goodall Institute's chimpanzee ecological dataset from Gombe National Park, Tanzania, show that our proposed methods identified clusters which were compatible with those identified by domain experts.

Item
Mining Valid-Time Indeterminate Events (2006-03-27)
Mane, Sandeep; Srivastava, Jaideep; Sinha, Abhinaya
In many temporally oriented applications, it is known that events have occurred but the exact time when an event has occurred is not known. For example, a blood test of a diabetic patient may yield information that the patient's blood glucose level is above the safe threshold but may not tell exactly when that happened.
Such temporal events are said to have valid-time indeterminacy, where the exact time of occurrence of an event is not known. Extensions to SQL for supporting valid-time indeterminacy in temporal databases have been studied. However, no prior research has been done on applying mining techniques for finding interesting patterns from valid-time indeterminate events. Thus, in this paper, we first provide a background on temporal valid-time indeterminacy. We then propose a measure, "ordering probability", for computing the probability of occurrence of an episode (ordered list of items) in the given temporal sequence of indeterminate events. The bounds for this measure are shown and then the anti-monotonic and asymmetric properties of this measure are proved. Mining of frequent patterns from indeterminate events will require computation of this measure for different sequences, hence an efficient algorithm for computing the ordering probability measure for a given episode in a sequence is proposed. Finally, the use of this measure in two temporal data mining frameworks, namely (i) sequence mining and (ii) sequential pattern mining, is explained. The extensions of the frequency of an episode in sequence mining, and support for an episode in sequential pattern mining, are shown. The research in this paper thus generalizes the research in temporal data mining to allow valid-time indeterminacy.

Item
Network Size Estimation In A Peer-to-Peer Network (2005-09-15)
Mane, Sandeep; Mopuru, Sandeep; Mehra, Kriti; Srivastava, Jaideep
The emergence of peer-to-peer networks over the last decade has changed users' perspective about information available on the Web. But with thousands of nodes joining and leaving a peer-to-peer network within a short span of time, it has become practically impossible for a node (or peer) to keep track of the complete network. Often, however, a node needs to have at least an estimate of the number of nodes in such a network.
For example, in determining time-to-live for a search query packet, a node must have a good estimate of network size. Previous deterministic approaches require a complete walk on the network, since such networks usually lack a central authority. Such approaches hence do not scale well to large networks. A few approaches, which collect partial information about the network, have been proposed as an alternative to address the scalability issues. This paper presents a novel approach for size estimation of a peer-to-peer network. The basic idea is to sample nodes in the network and then use this partial information about the network to obtain an estimate of the network size with the capture-recapture method. The capture-recapture method is a statistical method which has been widely used for estimating the size of a closed population in oceanography and epidemiology. For a better estimate, the capture-recapture method requires two or more random (independent) samplings (sets of detected nodes) of the network. In our case, for independent sampling, we use random walks on the peer-to-peer network, since a random walk can achieve the same statistical properties as independent samplings for a peer-to-peer network (see Gkantsidis et al. [1]). Experimental results show that the proposed random walk based capture-recapture approach gives a good estimate of network size. In addition, results of using the proposed method as well as three other size estimation methods on scale-free and random networks show that the proposed method gives a better estimate (lower error) with a slight computational overhead. This research motivates further study of estimation techniques for open networks (i.e.
networks whose size changes during the estimation process).

Item
Spatial Clustering Of Chimpanzee Locations For Neighborhood Identification (2005-09-15)
Mane, Sandeep; Murray, Carson; Shekhar, Shashi; Srivastava, Jaideep; Pusey, Anne
Since 1960, the chimpanzees (Pan troglodytes) of Gombe National Park, Tanzania, have been studied by behavioral ecologists, including Jane Goodall. Data has been collected for the last 40 years and it is now being further analyzed by researchers in order to increase our understanding of the social structure of chimpanzees. In this paper, we consider the following question of interest to behavioral ecologists: Does clustering exist among female chimpanzees in terms of the spatial locations visited by them? The analysis of this question will help behavioral ecologists to learn about the space use and the social interactions between female chimpanzees. The data collected for this analysis are marked spatial point patterns over the park. Current spatial clustering methods lack the ability to handle such marked point patterns directly. This paper presents a novel application of spatial point pattern analysis and data mining techniques to the ecological problem of clustering female chimpanzees. We studied various spatial analysis techniques and found that Ripley's K-function provides a powerful tool for evaluating clustering behavior among spatial point patterns. We then proposed two clustering approaches for marked point patterns based on this widely-used statistical K-function. Experimental results using the proposed clustering methods provide significant insight into the dynamics of female chimpanzee space use and into the overall social structure of the species. In addition, the methods proposed here can be extended to also include temporal information.

Item
Who Thinks Who Knows Who? Socio-cognitive Analysis of Email Networks (2006-07-21)
Pathak, Nishith; Mane, Sandeep; Srivastava, Jaideep
Interpersonal interaction plays an important role in organizational dynamics, and understanding these interaction networks is a key issue for any organization, since these can be tapped to facilitate various organizational processes. However, the approaches of collecting data about them using surveys/interviews are fraught with problems of scalability, logistics and reporting biases, especially since such surveys may be perceived to be intrusive. Widespread use of computer networks for organizational communication provides a unique opportunity to overcome these difficulties and automatically map the organizational networks with a high degree of detail and accuracy. This paper describes an effective and scalable approach for modeling organizational networks by tapping into an organization's email communication. The approach models communication between actors as non-stationary Bernoulli trials, and Bayesian inference is used for estimating model parameters over time. This approach is useful for socio-cognitive analysis (who knows who knows who) of organizational communication networks. Using this approach, novel measures for analysis of (i) closeness between actors' perceptions about such organizational networks (agreement) and (ii) divergence of an actor's perceptions about the organizational network from reality (misperception) are explained. Using the Enron email data, we show that these techniques provide sociologists with a new tool to understand organizational networks.
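
As a rough illustration of the idea in the last abstract (communication between two actors treated as Bernoulli trials whose parameter is estimated by Bayesian inference over time), the sketch below keeps a Beta belief per sender-receiver pair and updates it once per time period. This is not the authors' model: the Beta prior, the `discount` forgetting factor used to cope with non-stationarity, and the names `BetaBelief` and `track` are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about the chance that one actor emails another
    in a given time period. The uniform Beta(1, 1) prior is an assumption."""
    alpha: float = 1.0  # pseudo-count of periods with communication
    beta: float = 1.0   # pseudo-count of periods without communication

    def update(self, communicated: bool, discount: float = 0.9) -> None:
        # Assumed handling of non-stationarity: shrink old evidence by
        # `discount` before adding the new Bernoulli observation.
        # discount = 1.0 gives the standard (stationary) Beta-Bernoulli update.
        self.alpha = discount * self.alpha + (1.0 if communicated else 0.0)
        self.beta = discount * self.beta + (0.0 if communicated else 1.0)

    @property
    def mean(self) -> float:
        # Posterior mean of the communication probability.
        return self.alpha / (self.alpha + self.beta)


def track(observations, discount=0.9):
    """Return the posterior mean after each observed period."""
    belief = BetaBelief()
    means = []
    for communicated in observations:
        belief.update(communicated, discount)
        means.append(belief.mean)
    return means


if __name__ == "__main__":
    # Hypothetical history: an email was exchanged in periods 1 and 2, not in 3.
    for period, mean in enumerate(track([True, True, False]), start=1):
        print(f"period {period}: estimated probability {mean:.3f}")
```

A smaller `discount` makes the estimate follow recent behavior more closely at the cost of noisier estimates. Comparing such per-pair estimates across actors is one conceivable starting point for agreement or misperception measures, but the abstract does not specify how the paper defines them.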