Browsing by Author "Karpatne, Anuj"
Now showing 1 - 7 of 7
Item: A Data Mining Framework for Forest Fire Mapping (2012-03-29)
Karpatne, Anuj; Chen, Xi; Chamber, Yashu; Mithal, Varun; Lau, Michael; Steinhaeuser, Karsten; Boriah, Shyam; Steinbach, Michael; Kumar, Vipin
Forests are an important natural resource that support economic activity and play a significant role in regulating the climate and the carbon cycle, yet forest ecosystems are increasingly threatened by fires caused by a range of natural and anthropogenic factors. Mapping these fires, which can range in size from less than an acre to hundreds of thousands of acres, is an important task for supporting climate and carbon cycle studies as well as informing forest management. There are two primary approaches to fire mapping: field and aerial-based surveys, which are costly and limited in their extent; and remote sensing-based approaches, which are more cost-effective but pose several interesting methodological and algorithmic challenges. In this paper, we introduce a new framework for mapping forest fires based on satellite observations. Specifically, we develop spatio-temporal data mining methods for Moderate Resolution Imaging Spectroradiometer (MODIS) data to generate a history of forest fires. A systematic comparison with alternate approaches across diverse geographic regions demonstrates that our algorithmic paradigm is able to overcome some of the limitations in both data and methods employed by other prior efforts.

Item: An approach for global monitoring of surface water extent variations using MODIS data (2016-08-29)
Khandelwal, Ankush; Karpatne, Anuj; Marlier, Miriam E.; Kim, Jongyoun; Lettenmaier, Dennis P.; Kumar, Vipin
Freshwater resources are among the most basic requirements of human society. Nonetheless, global information about the space-time variations of the area of freshwater bodies, and the water stored in them, is surprisingly limited.
We introduce a MODIS-based algorithm to map the global areal extent of surface water bodies at 500 m spatial resolution at nominal eight-day intervals from 2000 to 2015. We demonstrate the algorithm construction and performance for five reservoirs on four continents with different shapes. The algorithm performs well compared to satellite radar altimetry and in situ height measurements, and in comparison with surface area estimates based on higher resolution Landsat data. We further present a summary of our global scale results over 69 reservoirs for which altimetry measurements are available, and show that our surface area estimates match well with relative height variations and show significant improvements over previous estimates. One of the main reasons for these improvements is a novel post-processing technique that uses imperfect labels produced by supervised classification approaches on multiple dates to estimate the elevation structure of locations, which is then used to enhance the quality and completeness of the imperfect labels. However, the approach is still challenged in regions with frequent cloud cover, snow and ice coverage, or complicated geometries that require finer spatial resolution remote sensing data. The surface area estimates we describe here are publicly available.

Item: GLADD-R: A new Global Lake Dynamics Database for Reservoirs created using machine learning and satellite data (2019-04-01)
Khandelwal, Ankush; Karpatne, Anuj; Wei, Zhihao; Kuang, Huangying; Ghosh, Rahul; Dugan, Hilary; Hanson, Paul; Kumar, Vipin
Reservoirs play a crucial role in human sustenance as they provide freshwater for agriculture, power generation, human consumption, and recreation. A global database of reservoirs that provides their location and dynamics can be of great importance to the ecological community as it enables the study of the impact of human actions and climate change on fresh water availability.
Here we present a new database, GLADD-R (Global Lake Dynamics Database-Reservoirs), that provides such information for 1882 reservoirs between 1 and 100 square kilometers in size that were created after 1985. The visualization of these reservoirs and their surface area time series is available at http://umnlcc.cs.umn.edu/GlobalReservoirDatabase/.

Item: Global Lake Monitoring using Group-specific Local Learning (2014-10-09)
Karpatne, Anuj; Khandelwal, Ankush; Kumar, Vipin
Global lake monitoring is crucial for the effective management of water resources as well as for conducting studies of the impact of lake dynamics on climate change. Remote sensing datasets offer an opportunity for global lake monitoring by providing discriminatory features that can help distinguish land and water bodies at a global scale and in a timely fashion. A major challenge in global lake monitoring using remote sensing datasets is the rich variety of land and water bodies at a global scale, motivating the need for local learning algorithms that can take into account the heterogeneity in the data. We propose a novel group-specific local learning scheme that uses information about the local neighborhood of a group of test instances for estimating the relevant context for classification. By comparing the performance of the proposed scheme with baseline approaches over 180 lakes from diverse regions of the world, we are able to demonstrate that the proposed scheme provides significant improvements in the classification performance.

Item: Physics Guided RNNs for Modeling Dynamical Systems: A Case Study in Simulating Lake Temperature Profiles (2019-01-31)
Jia, Xiaowei; Willard, Jared; Karpatne, Anuj; Read, Jordan; Zwart, Jacob; Steinbach, Michael; Kumar, Vipin
This paper proposes a physics-guided recurrent neural network model (PGRNN) that combines RNNs and physics-based models to leverage their complementary strengths and improve the modeling of physical processes.
Specifically, we show that a PGRNN can improve prediction accuracy over that of physical models, while generating outputs consistent with physical laws, and achieving good generalizability. Standard RNNs, even when producing superior prediction accuracy, often produce physically inconsistent results and lack generalizability. We further enhance this approach by using a pre-training method that leverages the simulated data from a physics-based model to address the scarcity of observed data. Although we present and evaluate this methodology in the context of modeling the dynamics of temperature in lakes, it is applicable more widely to a range of scientific and engineering disciplines where mechanistic (also known as process-based) models are used, e.g., power engineering, climate science, materials science, computational chemistry, and biomedicine.

Item: Predictive Learning with Heterogeneity in Populations (2017-10)
Karpatne, Anuj
Predictive learning forms the backbone of several data-driven systems powering scientific as well as commercial applications, e.g., filtering spam messages, detecting faces in images, forecasting health risks, and mapping ecological resources. However, one of the major challenges in applying standard predictive learning methods in real-world applications is the heterogeneity in populations of data instances, i.e., different groups (or populations) of data instances exhibit different predictive relationships. For example, different populations of human subjects may show different risks for a disease even if they have similar diagnosis reports, depending on their ethnic profiles, medical history, and lifestyle choices. In the presence of population heterogeneity, a central challenge is that the training data comprises instances belonging to multiple populations, and the instances in the test set may be from a different population than that of the training instances.
This limits the effectiveness of standard predictive learning frameworks that are based on the assumption that the instances are independent and identically distributed (i.i.d.), which holds only in simplistic settings. This thesis introduces several ways of learning predictive models with heterogeneity in populations, by incorporating information about the context of every data instance, which is available in varying types and formats in different application settings. It introduces a novel multi-task learning framework for problems where we have access to some ancillary variables that can be grouped to produce homogeneous partitions of data instances, thus addressing the heterogeneity in populations. This thesis also introduces a novel strategy for constructing mode-specific ensembles in binary classification settings, where each class shows a multi-modal distribution due to the heterogeneity in its populations. When the context of data instances is implicitly defined such that the test data is known to comprise contextually similar groups, this thesis presents a novel framework for adapting classification decisions using the group-level properties of test instances. This thesis also builds the foundations of a novel paradigm of scientific discovery, termed theory-guided data science, that seeks to explore the full potential of data science methods without ignoring the treasure of knowledge contained in scientific theories and principles.

Item: ReaLSAT: A new Reservoir and Lake Surface Area Timeseries Dataset created using machine learning and satellite imagery (2020-08-04)
Khandelwal, Ankush; Ghosh, Rahul; Wei, Zhihao; Kuang, Huangying; Dugan, Hilary; Hanson, Paul; Karpatne, Anuj; Kumar, Vipin
Lakes and reservoirs, as most humans experience and use them, are dynamic three-dimensional bodies of water, with surface levels that rise and fall with seasonal precipitation patterns, long-term changes in climate, and human management decisions.
A global dataset that provides the location and dynamics of water bodies can be of great importance to the ecological community as it enables the study of the impact of human actions and climate change on fresh water availability. This paper presents a new database, ReaLSAT (Reservoir and Lake Surface Area Timeseries), that has been created by analyzing spectral data from Earth Observation (EO) satellites using novel machine learning (ML) techniques. These ML techniques can construct highly accurate surface area extents of water bodies at regular intervals despite the challenges arising from heterogeneity and missing or poor-quality spectral data. The ReaLSAT dataset provides information for 669107 lakes and reservoirs between 0.1 and 100 square kilometers in size. The visualization of these water bodies and their surface area time series is also available online. The aim of this paper is to provide an overview of the dataset and a summary of some of the key insights that can be derived from it.