Browsing by Author "Boriah, Shyam"
Item A Data Mining Framework for Forest Fire Mapping (2012-03-29)
Karpatne, Anuj; Chen, Xi; Chamber, Yashu; Mithal, Varun; Lau, Michael; Steinhaeuser, Karsten; Boriah, Shyam; Steinbach, Michael; Kumar, Vipin
Forests are an important natural resource that support economic activity and play a significant role in regulating the climate and the carbon cycle, yet forest ecosystems are increasingly threatened by fires caused by a range of natural and anthropogenic factors. Mapping these fires, which can range in size from less than an acre to hundreds of thousands of acres, is an important task for supporting climate and carbon cycle studies as well as informing forest management. There are two primary approaches to fire mapping: field and aerial-based surveys, which are costly and limited in their extent; and remote sensing-based approaches, which are more cost-effective but pose several interesting methodological and algorithmic challenges. In this paper, we introduce a new framework for mapping forest fires based on satellite observations. Specifically, we develop spatio-temporal data mining methods for Moderate Resolution Imaging Spectroradiometer (MODIS) data to generate a history of forest fires. A systematic comparison with alternate approaches across diverse geographic regions demonstrates that our algorithmic paradigm is able to overcome some of the limitations in both data and methods employed by other prior efforts.

Item A Study of Time Series Noise Reduction Techniques in the Context of Land Cover Change Detection (2011-08-12)
Chen, Xi; Mithal, Varun; VangalaReddy, Sruthi; Brugere, Ivan; Boriah, Shyam; Kumar, Vipin
Remote sensing data sets frequently suffer from noise due to atmospheric conditions and instrument issues. This noise negatively affects the usability of these data sets and therefore noise reduction techniques are frequently used to reduce the impact of noise.
A well-known remote sensing data set, MODIS Enhanced Vegetation Index (EVI), measures the amount of vegetation (based on surface reflectance) observed from satellite. This data set has been used for land cover change detection, in both regional-scale and global-scale studies. Many noise reduction techniques have been proposed in the remote sensing literature but comparative studies to understand relative performance of these techniques are scarce. Furthermore, the existing comparative studies typically evaluate a small number of techniques on a specific geographical region. Therefore, little is known about the global applicability of these techniques. In addition, time series based land cover change detection algorithms are known to be negatively impacted by the presence of noise. This paper investigates the interrelations of regional noise characteristics, change detection algorithms, and noise reduction methods. The methods for noise reduction are applied in three different geographic regions and through comparison we outline the noise characteristics relevant to the performance of land cover change detection.

Item Contextual Time Series Change Detection (2012-07-23)
Chen, Xi; Steinhaeuser, Karsten; Boriah, Shyam; Chatterjee, Snigdhansu; Kumar, Vipin
Time series are commonly used in a variety of fields, ranging from economics to manufacturing. As a result, time series analysis and modeling has become an active research area in statistics and data mining. In this paper, we focus on a type of change we call contextual time series change (CTC) and propose a novel two-stage algorithm to address it. In contrast to traditional change detection methods, which consider each time series separately, CTC is defined as a change relative to the behavior of a group of related time series. As a result, our proposed method is able to identify novel types of changes not found by other algorithms.
We demonstrate the unique capabilities of our approach with several case studies on real-world datasets from the financial and Earth science domains.

Item Land Cover Change Detection using Data Mining Techniques (2008-03-14)
Boriah, Shyam; Kumar, Vipin; Steinbach, Michael; Potter, Christopher; Klooster, Steven
The study of land cover change is an important problem in the Earth science domain because of its impacts on local climate, radiation balance, biogeochemistry, hydrology, and the diversity and abundance of terrestrial species. Data mining and knowledge discovery techniques can aid this effort by efficiently discovering patterns that capture complex interactions between ocean temperature, air pressure, surface meteorology, and terrestrial carbon flux. Most well-known change detection techniques from statistics, signal processing and control theory are not well-suited for the massive high-dimensional spatio-temporal data sets from Earth Science due to limitations such as high computational complexity and the inability to take advantage of seasonality and spatio-temporal autocorrelation inherent in Earth Science data. In our work, we seek to address these challenges with new change detection techniques that are based on data mining approaches. Specifically, in this paper we have performed a case study for a new change detection technique for the land cover change detection problem. We study land cover change in the state of California, focusing on the San Francisco Bay Area, and perform an extended study on the entire state. We also perform a comparative evaluation on forests in the entire state.
These results demonstrate the utility of data mining techniques for the land cover change detection problem.

Item Language and Library Support for Climate Data Applications (2009)
Van Wyk, Eric; Kumar, Vipin; Steinbach, Michael; Boriah, Shyam; Choudhary, Alok

Item Monitoring Global Forest Cover Using Data Mining (2010-07-14)
Mithal, Varun; Boriah, Shyam; Garg, Ashish; Steinbach, Michael; Kumar, Vipin; Potter, Christopher; Klooster, Steven; Castilla-Rubio, Juan Carlos
Forests are a critical component of the planet's ecosystem. Unfortunately, there has been significant degradation in forest cover over recent decades as a result of logging, conversion to crop, plantation, and pasture land, or disasters (natural or man-made) such as forest fires, floods, and hurricanes. As a result, significant attention is being given to the sustainable use of forests. A key to effective forest management is quantifiable knowledge about changes in forest cover. This requires identification and characterization of changes and the discovery of the relationship between these changes and natural and anthropogenic variables. In this paper, we present our preliminary efforts and achievements in addressing some of these tasks along with the challenges and opportunities that need to be addressed in the future.
At a higher level, our goal is to provide an overview of the exciting opportunities and challenges in developing and applying data mining approaches to provide critical information for forest and land use management.

Item Pre-processing of the validation data used in the paper titled "Model-Free Time Series Segmentation Approach for Land Cover Change Detection" (2011-08-17)
Garg, Ashish; Manikonda, Lydia; Kumar, Shashank; Krishna, Vikrant; Boriah, Shyam; Steinbach, Michael; Kumar, Vipin; Toshniwal, Durga; Potter, Christopher; Klooster, Steven
This report describes the detailed steps of pre-processing the validation data which is used for comparative evaluation of the algorithms proposed in the paper titled "A Model-Free Segmentation Approach for Land Cover Change Detection".

Item Similarity Measures for Categorical Data--A Comparative Study (2007-10-15)
Chandola, Varun; Boriah, Shyam; Kumar, Vipin
Measuring similarity or distance between two entities is a key step for several data mining and knowledge discovery tasks. The notion of similarity for continuous data is relatively well-understood, but for categorical data, the similarity computation is not straightforward. Several data-driven similarity measures have been proposed in the literature to compute the similarity between two categorical data instances but their relative performance has not been evaluated. In this paper we study the performance of a variety of similarity measures in the context of a specific data mining task: outlier detection.
Results on a variety of data sets show that while no one measure dominates others for all types of problems, some measures are able to have consistently high performance.

Item Supplement for "Change Detection from Temporal Sequences of Class Labels: Application to Land Cover Change Mapping" (2013-01-25)
Mithal, Varun; Khandelwal, Ankush; Boriah, Shyam; Steinhaeuser, Karsten; Kumar, Vipin
This is a supplement for the paper titled "Change Detection from Temporal Sequences of Class Labels: Application to Land Cover Change Mapping", which is included in the proceedings of the SIAM International Conference on Data Mining, 2013. This supplement section has enlarged figures mentioned in the main paper and additional experiments on synthetic data.

Item Supplement for "Contextual Time Series Change Detection" (2013-01-25)
Chen, Xi; Steinhaeuser, Karsten; Boriah, Shyam; Chatterjee, Snigdhansu; Kumar, Vipin
Time series data are common in a variety of fields ranging from economics to medicine and manufacturing. As a result, time series analysis and modeling has become an active research area in statistics and data mining. In this paper, we focus on a type of change we call contextual time series change (CTC) and propose a novel two-stage algorithm to address it. In contrast to traditional change detection methods, which consider each time series separately, CTC is defined as a change relative to the behavior of a group of related time series. As a result, our proposed method is able to identify novel types of changes not found by other algorithms. We demonstrate the unique capabilities of our approach with several case studies on real-world datasets from the financial and Earth science domains.

Item Time series change detection: algorithms for land cover change (2010-04)
Boriah, Shyam
The climate and earth sciences have recently undergone a rapid transformation from a data-poor to a data-rich environment.
In particular, climate and ecosystem related observations from remote sensors on satellites, as well as outputs of climate or earth system models from large-scale computational platforms, provide terabytes of temporal, spatial and spatio-temporal data. These massive and information-rich datasets offer huge potential for advancing the science of land cover change, climate change and anthropogenic impacts. One important area where remote sensing data can play a key role is in the study of land cover change. Specifically, the conversion of natural land cover into human-dominated cover types continues to be a change of global proportions with many unknown environmental consequences. In addition, being able to assess the carbon risk of changes in forest cover is of critical importance for both economic and scientific reasons. In fact, changes in forests account for as much as 20% of the greenhouse gas emissions in the atmosphere, an amount second only to fossil fuel emissions. Thus, there is a need in the earth science domain to systematically study land cover change in order to understand its impact on local climate, radiation balance, biogeochemistry, hydrology, and the diversity and abundance of terrestrial species. Land cover conversions include tree harvests in forested regions, urbanization, and agricultural intensification in former woodland and natural grassland areas. These types of conversions also have significant public policy implications due to issues such as water supply management and atmospheric CO2 output. In spite of the importance of this problem and the considerable advances made over the last few years in high-resolution satellite data, data mining, and online mapping tools and services, end users still lack practical tools to help them manage and transform this data into actionable knowledge of changes in forest ecosystems that can be used for decision making and policy planning purposes. 
In particular, previous change detection studies have primarily relied on examining differences between two or more satellite images acquired on different dates. Thus, a technological solution that detects global land cover change using high temporal resolution time series data will represent a paradigm shift in the field of land cover change studies. To realize these ambitious goals, a number of computational challenges in spatio-temporal data mining need to be addressed. Specifically, analysis and discovery approaches need to be cognizant of climate and ecosystem data characteristics such as seasonality, non-stationarity/inter-region variability, multi-scale nature, spatio-temporal autocorrelation, high-dimensionality and massive data size. This dissertation, a step in that direction, translates earth science challenges to computer science problems, and provides computational solutions to address these problems. In particular, three key technical capabilities are developed: (1) Algorithms for time series change detection that are effective and can scale up to handle the large size of earth science data; (2) Change detection algorithms that can handle large numbers of missing and noisy values present in satellite data sets; and (3) Spatio-temporal analysis techniques to identify the scale and scope of disturbance events.

Item Understanding Categorical Similarity Measures for Outlier Detection (2008-03-04)
Chandola, Varun; Boriah, Shyam; Kumar, Vipin
Categorical attributes are present in many data sets that are analyzed using KDD techniques. A recent empirical study of 14 different data-driven categorical similarity measures in the context of outlier detection showed that these measures have widely different performances when applied to several different publicly available data sets.
As a next step in understanding the relation between the performance of a similarity measure and the nature of the data, we present an analysis framework to help give insights as to which similarity measure is better suited for what type of data. In this paper we present a framework for modeling categorical data sets with a desired set of characteristics. We also propose a set of separability statistics for a categorical data set that can be used to understand the performance of a similarity measure for outlier detection. In addition, we present three techniques to estimate the proposed separability statistics from a given categorical data set. We experimentally evaluate the different similarity measures in the context of outlier detection, and show how the performance of a similarity measure is related to the various data characteristics.