Browsing by Author "Mithal, Varun"
Item: A Comparative Evaluation of Anomaly Detection Techniques for Sequence Data (2008-07-07)
Chandola, Varun; Mithal, Varun; Kumar, Vipin
Anomaly detection has traditionally dealt with record or transaction type data sets. But in many real domains, data naturally occurs as sequences, which motivates the study of anomaly detection techniques for sequential data sets. The problem of detecting anomalies in sequence data sets is related to but different from the traditional anomaly detection problem, because the nature of the data and of the anomalies differs from that found in record data sets. While there are many surveys and comparative evaluations for traditional anomaly detection, similar studies have not been done for sequence anomaly detection. We investigate a broad spectrum of anomaly detection techniques for symbolic sequences, proposed in diverse application domains. Our hypothesis is that symbolic sequences from different domains have distinct characteristics in terms of the nature of sequences as well as the nature of anomalies, which makes it important to investigate how different techniques behave for different types of sequence data. Such a study is critical to understand the relative strengths and weaknesses of different techniques. Our paper is one such attempt where we have comparatively evaluated 7 anomaly detection techniques on 10 public data sets, collected from three diverse application domains. To gain further understanding of the performance of the techniques, we present a novel way to generate sequence data with desired characteristics.
The results on the artificially generated data sets help us in experimentally verifying our hypothesis regarding different techniques.

Item: A Data Mining Framework for Forest Fire Mapping (2012-03-29)
Karpatne, Anuj; Chen, Xi; Chamber, Yashu; Mithal, Varun; Lau, Michael; Steinhaeuser, Karsten; Boriah, Shyam; Steinbach, Michael; Kumar, Vipin
Forests are an important natural resource that support economic activity and play a significant role in regulating the climate and the carbon cycle, yet forest ecosystems are increasingly threatened by fires caused by a range of natural and anthropogenic factors. Mapping these fires, which can range in size from less than an acre to hundreds of thousands of acres, is an important task for supporting climate and carbon cycle studies as well as informing forest management. There are two primary approaches to fire mapping: field and aerial-based surveys, which are costly and limited in their extent; and remote sensing-based approaches, which are more cost-effective but pose several interesting methodological and algorithmic challenges. In this paper, we introduce a new framework for mapping forest fires based on satellite observations. Specifically, we develop spatio-temporal data mining methods for Moderate Resolution Imaging Spectroradiometer (MODIS) data to generate a history of forest fires. A systematic comparison with alternate approaches across diverse geographic regions demonstrates that our algorithmic paradigm is able to overcome some of the limitations in both data and methods employed by other prior efforts.

Item: A Study of Time Series Noise Reduction Techniques in the Context of Land Cover Change Detection (2011-08-12)
Chen, Xi; Mithal, Varun; VangalaReddy, Sruthi; Brugere, Ivan; Boriah, Shyam; Kumar, Vipin
Remote sensing data sets frequently suffer from noise due to atmospheric conditions and instrument issues.
This noise negatively affects the usability of these data sets and therefore noise reduction techniques are frequently used to reduce the impact of noise. A well-known remote sensing data set, MODIS Enhanced Vegetation Index (EVI), measures the amount of vegetation (based on surface reflectance) observed from satellite. This data set has been used for land cover change detection, in both regional-scale and global-scale studies. Many noise reduction techniques have been proposed in the remote sensing literature but comparative studies to understand the relative performance of these techniques are scarce. Furthermore, the existing comparative studies typically evaluate a small number of techniques on a specific geographical region. Therefore, little is known about the global applicability of these techniques. In addition, time series based land cover change detection algorithms are known to be negatively impacted by the presence of noise. This paper investigates the interrelations of regional noise characteristics, change detection algorithms, and noise reduction methods. The methods for noise reduction are applied in three different geographic regions and through comparison we outline the noise characteristics relevant to the performance of land cover change detection.

Item: Classifying multivariate time series by learning sequence-level discriminative patterns (2018-01-23)
Nayak, Guruprasad; Mithal, Varun; Jia, Xiaowei; Kumar, Vipin
Time series classification algorithms designed to use local context do not work on landcover classification problems where the instances of the two classes may often exhibit similar feature values due to the large natural variations in other land covers across the year and unrelated phenomena that they undergo. In this paper, we propose to learn discriminative patterns from the entire length of the time series, and use them as predictive features to identify the class of interest.
We propose a novel neural network algorithm to learn the key signature of the class of interest as a function of the feature values together with the discriminative pattern made from that signature through the entire time series in a joint framework. We demonstrate the utility of this technique on the landcover classification application of burned area mapping that is of considerable societal importance.

Item: Computational Techniques to Identify Rare Events in Spatio-temporal Data (2018-05)
Mithal, Varun
Recent attention on the potential impacts of land cover changes to the environment as well as long-term climate change has increased the focus on automated tools for global-scale land surface monitoring. Advancements in remote sensing and data collection technologies have produced large earth science data sets that can now be used to build such tools. However, new data mining methods are needed to address the unique characteristics of earth science data and problems. In this dissertation, we explore two of these interesting problems, which are (1) building predictive models to identify rare classes when high quality annotated training samples are not available, and (2) classification enhancement of existing imperfect classification maps using physics-guided constraints. We study the problem of identifying land cover changes such as forest fires as a supervised binary classification task with the following characteristics: (i) instead of true labels, only imperfect labels are available for training samples; these imperfect labels can be quite poor approximations of the true labels and thus may have little utility in practice; (ii) the imperfect labels are available for all instances (not just the training samples); (iii) the target class is a very small fraction of the total number of samples (traditionally referred to as the rare class problem).
In our approach, we focus on leveraging imperfect labels and show how they, in conjunction with attributes associated with instances, open up exciting opportunities for performing rare class prediction. We applied this approach to identify burned areas using data from earth observing satellites, and have produced a database which is more reliable and comprehensive (three times more burned area in tropical forests) compared to the state-of-the-art NASA product. We explore approaches to reduce errors in remote sensing based classification products, which are common due to poor data quality (e.g., instrument failure, atmospheric interference) as well as limitations of the classification models. We present classification enhancement approaches, which aim to improve the input (imperfect) classification by using some implicit physics-based constraints related to the phenomena under consideration. Specifically, our approach can be applied in domains where (i) physical properties can be used to correct the imperfections in the initial classification products, and (ii) if clean labels are available, they can be used to construct the physical properties.

Item: Mapping Burned Areas in Tropical forests using MODIS data (2016-09-02)
Mithal, Varun; Nayak, Guruprasad; Khandelwal, Ankush; Kumar, Vipin; Nemani, Ramakrishna; Oza, Nikunj C.
This paper presents a new burned area product for the tropical forests in South America and South-east Asia. The product is derived from Moderate Resolution Imaging Spectroradiometer (MODIS) multispectral surface reflectance data and Active Fire hotspots using a novel rare class detection framework that builds data-adaptive classification models for different spatial regions and land cover classes. Burned areas are reported for 9 MODIS tiles at a spatial resolution of 500 m in the study period from 2001 to 2014.
The total burned area detected in the tropical forests of South America and South-east Asia during these years is 2,286,385 MODIS pixels (approximately 571 K sq. km.), which is more than three times the estimate of the state-of-the-art MODIS MCD64A1 product (742,886 MODIS pixels). We also present validation of this burned area product using (i) manual inspection of Landsat false color composites before and after burn date, (ii) manual inspection of synchronized changes in vegetation index time series around the burn date, and (iii) comprehensive quantitative validation using MODIS-derived differenced Normalized Burn Ratio (dNBR). Our validation results indicate that the events reported in our product are indeed true burn events that are missed by the state-of-the-art burned area products.

Item: Monitoring Global Forest Cover Using Data Mining (2010-07-14)
Mithal, Varun; Boriah, Shyam; Garg, Ashish; Steinbach, Michael; Kumar, Vipin; Potter, Christopher; Klooster, Steven; Castilla-Rubio, Juan Carlos
Forests are a critical component of the planet's ecosystem. Unfortunately, there has been significant degradation in forest cover over recent decades as a result of logging, conversion to crop, plantation, and pasture land, or disasters (natural or man-made) such as forest fires, floods, and hurricanes. As a result, significant attention is being given to the sustainable use of forests. A key to effective forest management is quantifiable knowledge about changes in forest cover. This requires identification and characterization of changes and the discovery of the relationship between these changes and natural and anthropogenic variables. In this paper, we present our preliminary efforts and achievements in addressing some of these tasks along with the challenges and opportunities that need to be addressed in the future.
At a higher level, our goal is to provide an overview of the exciting opportunities and challenges in developing and applying data mining approaches to provide critical information for forest and land use management.

Item: Multiple Instance Learning for bags with Ordered instances (2017-06-07)
Nayak, Guruprasad; Mithal, Varun; Kumar, Vipin
Multiple Instance Learning (MIL) algorithms are designed for problems where labels are available for groups of instances, commonly referred to as bags. In this paper, we consider a new MIL problem setting where instances in a bag are not exchangeable, and a bijection exists between every pair of bags. We propose a neural network based MIL algorithm (MILOrd) that leverages the existence of such a bijection when learning to discriminate bags. MILOrd has an input node for each instance in the bag, an output node that captures the bag level prediction, and a hidden layer that captures the output from an instance level classifier for each instance in the bag. The bag level prediction is obtained by combining these hidden layer values using a function that models the importance of each instance, unlike the traditional schemes where each instance is considered equal. We demonstrate the utility of the proposed algorithm on the problem of burned area mapping using yearly bags composed of multispectral reflectance data for different time steps in the year. Our experiments show that MILOrd outperforms traditional MIL schemes that don’t account for the presence of a bijection.

Item: Supplement for "Change Detection from Temporal Sequences of Class Labels: Application to Land Cover Change Mapping" (2013-01-25)
Mithal, Varun; Khandelwal, Ankush; Boriah, Shyam; Steinhaeuser, Karsten; Kumar, Vipin
This is a supplement for the paper titled "Change Detection from Temporal Sequences of Class Labels: Application to Land Cover Change Mapping", which is included in the proceedings of the SIAM International Conference on Data Mining, 2013.
This supplement section has enlarged figures mentioned in the main paper and additional experiments on synthetic data.

Item: Understanding Anomaly Detection Techniques for Symbolic Sequences (2009-01-05)
Chandola, Varun; Mithal, Varun; Kumar, Vipin
We present a comparative evaluation of a large number of anomaly detection techniques on a variety of publicly available as well as artificially generated data sets. Many of these are existing techniques while some are slight variants and/or adaptations of traditional anomaly detection techniques to sequence data. The specific contributions of this paper are as follows: (i) This evaluation facilitates understanding of the relative strengths and weaknesses of different techniques. Through careful experimentation, we illustrate that the performance of different techniques is dependent on the nature of sequences, and the nature of anomalies in the sequences. No one technique outperforms all others. For most techniques we also identify some data sets on which they perform very well, and some on which they perform poorly. (ii) We investigate variants that have not been tried before. For example, we evaluate a k-nearest neighbor based technique that performs better than a clustering based technique that was proposed for sequences. Also, we propose FSA-z, a variant of an existing Finite State Automaton (FSA) based technique, which consistently outperforms the original FSA based technique. (iii) We propose a novel way of generating artificial sequence data sets to evaluate anomaly detection techniques. (iv) We characterize the nature of normal and anomalous test sequences, and associate the performance of each technique with one or more of such characteristics.
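
The last abstract above mentions a k-nearest neighbor based technique for symbolic sequences without giving details. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. It assumes a similarity based on the normalized longest common subsequence; the listing does not state which measure the paper uses. The function names (`lcs_length`, `nlcs_similarity`, `knn_anomaly_score`) and the default `k` are hypothetical.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlcs_similarity(a, b):
    """LCS length normalized by the geometric mean of the two lengths.

    Assumed similarity measure, chosen for illustration only.
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))


def knn_anomaly_score(test_seq, training_seqs, k=2):
    """Distance from test_seq to its k-th nearest training sequence.

    Larger scores mean the test sequence is far from the (assumed mostly
    normal) training sequences and is therefore more anomalous.
    """
    if not training_seqs:
        raise ValueError("training_seqs must not be empty")
    distances = sorted(1.0 - nlcs_similarity(test_seq, s) for s in training_seqs)
    # Fall back to the farthest neighbor if fewer than k sequences exist.
    return distances[min(k, len(distances)) - 1]


if __name__ == "__main__":
    training = ["abcabc", "abcabd", "abcabc"]
    print(knn_anomaly_score("abcabc", training))  # close to the training data
    print(knn_anomaly_score("zzzzzz", training))  # shares no symbols with it
```

Scoring by the distance to the k-th neighbor rather than the nearest one makes the score less sensitive to a single unusual sequence in the training set; the right value of `k` depends on the data.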