Computer Science & Engineering (CS&E) Technical Reports

Persistent link for this collectionhttps://hdl.handle.net/11299/214969

Located in this collection are publications by the department, faculty, students, and visiting members.

Want to submit a Technical Report? See the Technical Report Submission Instructions.

Searching for a technical report

The search bar below supports general keyword search, but can also be used to locate a report by its title or report number. Use quotation marks around a phrase or title to return reports that use that exact phrase. To search for a specific report number, use the following search string: "Technical Report; [Report Number]" (e.g. "Technical Report; 19-001"). You must include the quotations when searching for a report number.

Search within Computer Science & Engineering (CS&E) Technical Reports

Browse

Recent Submissions

Now showing 1 - 20 of 749
  • Item
    Understanding COVID-19 Effects on Mobility: A Community-Engaged Approach
    (2022) Sharma, Arun; Farhadloo, Majid; Li, Yan; Kulkarni, Aditya; Gupta, Jayant; Shekhar, Shashi
    Given aggregated mobile device data, the goal is to understand the impact of COVID-19 policy interventions on mobility. This problem is vital due to important societal use cases, such as safely reopening the economy. Challenges include understanding and interpreting questions of interest to policymakers, cross-jurisdictional variability in choice and time of interventions, the large data volume, and unknown sampling bias. The related work has explored the COVID-19 impact on travel distance, time spent at home, and the number of visitors at different points of interest. However, many policymakers are interested in long-duration visits to high-risk business categories and understanding the spatial selection bias to interpret summary reports. We provide an Entity Relationship diagram, system architecture, and implementation to support queries on long-duration visits in addition to fine resolution device count maps to understand spatial bias. We closely collaborated with policymakers to derive the system requirements and evaluate the system components, the summary reports, and visualizations.
  • Item
    Source Aware Modulation for leveraging limited data from heterogeneous sources
    (2021) Li, Xiang; Khandelwal, Ankush; Ghosh, Rahul; Renganathan, Arvind; Willard, Jared; Xu, Shaoming; Jia, Xiaowei; Shu, Lele; Teng, Victor; Steinbach, Michael; Nieber, John; Duffy, Christopher; Kumar, Vipin
    In many personalized prediction applications, sharing information between entities/tasks/sources is critical to address data scarcity. Furthermore, inherent characteristics of sources distinguish relationships between input drivers and response variables across entities. For example, for the same amount of rainfall (input driver), two different basins will have very different streamflow (response variable) values depending on the basin characteristics (e.g., soil porosity, slope, …). Given such heterogeneity, a trivial merging of data without source characteristics would lead to poor personalized predictions. In recent years, meta-learning has become a very popular framework to learn generalized global models that can be easily adapted (fine-tuned) for individual sources. In this talk, we present an exhaustive analysis of the source-aware modulation based meta-learning approach. Source-aware modulation adjusts the shared hidden features based on source characteristics. The adjusted hidden features are then used to calculate the response variable for individual sources. Although this strategy shows promising prediction improvement, its applicability is limited in certain applications where source characteristics might not be available (especially due to privacy concerns). In this work, we show that robust personalized predictions can be achieved even in the absence of explicit source characteristics. We investigated the performance of different modulation strategies under various data sparsity settings on two datasets. We demonstrate that source-aware modulation is a very viable solution (with or without known characteristics) compared to traditional meta-learning methods such as model agnostic meta-learning.
  • Item
    ReaLSAT: A new Reservoir and Lake Surface Area Timeseries Dataset created using machine learning and satellite imagery
    (2020-08-04) Khandelwal, Ankush; Ghosh, Rahul; Wei, Zhihao; Kuang, Huangying; Dugan, Hilary; Hanson, Paul; Karpatne, Anuj; Kumar, Vipin
    Lakes and reservoirs, as most humans experience and use them, are dynamic three-dimensional bodies of water, with surface levels that rise and fall with seasonal precipitation patterns, long-term changes in climate, and human management decisions. A global dataset that provides the location and dynamics of water bodies can be of great importance to the ecological community as it enables the study of the impact of human actions and climate change on fresh water availability. This paper presents a new database, ReaLSAT (Reservoir and Lake Surface Area Timeseries) that has been created by analyzing spectral data from Earth Observation (EO) Satellites using novel machine learning (ML) techniques. These ML techniques can construct highly accurate surface area extents of water bodies at regular intervals despite the challenges arising from heterogeneity and missing or poor quality spectral data. The ReaLSAT dataset provides information for 669107 lakes and reservoirs between 0.1 and 100 square kilometers in size. The visualization of these water bodies and their surface area time series is also available online. The aim of this paper is to provide an overview of the dataset and a summary of some of the key insights that can be derived from the dataset.
  • Item
    Physics-Guided Anomalous Trajectory Detection: Technical Report
    (2020) Shrinivasa Nairy, Divya; Adila, Dyah; Li, Yan; Shekhar, Shashi
    Given ship trajectory data for a region, this paper proposes a physics-guided approach to detect anomalous trajectories. This problem is important for detection of illegal fishing or cargo transfer, which cause environmental and societal damage. This problem is challenging due to the presence of gaps in trajectories. Current state-of-the-art approaches either ignore the gaps or fill them using simple linear interpolation, which underestimates the ship’s possible locations during the gap. This paper proposes a novel physics-guided gap-aware anomaly detection test that incorporates physical constraints using a space-time prism. The proposed approach is evaluated with a case study using Marine Cadastre data of ships traversing in the Aleutian Islands region of Alaska in October 2017. A trajectory that could have traversed a marine protected area is correctly flagged by the proposed approach for investigation.
  • Item
    Vehicle Emissions Prediction with Physics-Aware AI Models: Technical Report
    (2020) Panneer Selvam, Harish; Li, Yan; Wang, Pengyue; Northrop, William F; Shekhar, Shashi
    Given an on-board diagnostics (OBD) dataset and a physics-based emissions prediction model, this paper aims to develop an accurate and computational-efficient AI (Artificial Intelligence) method that predicts vehicle emissions values. The problem is of societal importance because vehicular emissions lead to climate change and impact human health. This problem is challenging because the OBD data does not contain enough parameters needed by high-order physics models. Conversely, related work has shown that low-order physics models have poor predictive accuracy when using available OBD data. This paper uses a divergent window co-occurrence pattern detection method to develop a spatiotemporal variability-aware AI model for predicting emission values from the OBD datasets. We conducted a case-study using real-world OBD data from a local public transportation agency. Results show that the proposed AI method has approximately 65% improved predictive accuracy than a non-AI low-order physics model and is approximately 35% more accurate than a baseline model.
  • Item
    Constellation Plan B Report
    (2020-01-03) Ramkumar, Aravind Alagiri; Sindhu, Rohit; Weissman, Jon
    Data explosion has been exponential in the last decade. A new and major addition to this phenomenon is the Internet of Things (IoT). With the technology becoming cheap and affordable, there has been a boom in the number of smart devices which generates huge amounts of data. All of these devices may have varying connectivity and are made of diparte designs. IoT is one platform which can help us make use of this huge amount of data being generated by thesis heterogeneous devices. Since, these devices are very different in design and architecture, we propose a system which provides a unified view to this heterogeneous world. We propose a dynamic P2P and edge based system named Constellation. This work presents the work pertaining to creating this network, the inherent dynamism involved in such a system and ways to utilize the system to provide useful analysis by executing global and local tasks.
  • Item
    Linear Hotspot Discovery on All Simple Paths: A Summary of Results
    (2019-09-10) Tang, Xun; Gupta, Jayant; Shekhar, Shashi
    Spatial hotspot discovery aims at discovering regions with statistically significant concentration of activities. It has shown great value in many important societal applications such as transportation engineering, public health, and public safety. This paper formulates the problem of Linear Hotspot Detection on All Simple Paths (LHDA) which identifies hotspots from the complete set of simple paths enumerated from a given spatial network. LHDA overcomes the limitations of existing methods which miss hotspots that naturally occur along linear simple paths on a road network. The problem is #p-hard due to the exponential number of simple paths. To address the computational challenges, we propose a novel algorithm named bi-directional fragment-multi-graph traversal (ASP_FMGT) and two path reduction approaches ASP_NR and ASP_HD. Extensive theoretical and experimental analyses show that ASP_FMGT has substantially improved performance over a baseline approach using depth-first-search with backtracking (ASP_Base) while keeping the solution complete and correct. Moreover, case studies on real-world datasets showed that ASP_FMGT outperforms existing approaches, including by discovering new hotspots unknown before and achieving higher accuracy for locating known hotspots.
  • Item
    UAV Landing at an Unknown Location Marked by a Radio Beacon
    (2019-07-29) Stefas, Nikolaos; Bayram, Haluk; Isler, Volkan
    We consider the problem of minimizing the time to approach and land near a target radio beacon at an unknown location with an Unmanned Aerial Vehicle (UAV). We show that a cone-like region exists above the target inside of which bearing measurements of a directional antenna lose directionality: signal recordings in all directions yield similar signal strength. We present a geometric model of this region based on antenna simulations and data collected with a real system. Our main contribution is a strategy that takes advantage of a UAV's ability to change altitude and exploits a special structure occurring when approaching the target beacon from above to reduce the flight time required to land near the beacon. We analyze the performance of our strategy and demonstrate through simulations that by exploiting this structure we can achieve shorter flight times than our previous work.
  • Item
    An Approximation Algorithm for Online Coverage Path Planning with the Energy Constraint
    (2019-06-17) Wei, Ming; Isler, Volkan
    In online coverage path planning, the objective is to plan paths for a robot to visit every point in an unknown environment as quickly as possible. We consider online coverage with an energy constraint which is modeled as the maximum distance B the robot can travel after a full recharge. There is a single charging station in the environment which needs to be revisited before the robot runs out of energy. Under this model, it has been proven that any algorithm for online coverage path planning with the energy constraint must have an approximation rate of ?(log B/4r) [1], where r is the robot diameter. In this paper, we present an optimal online algorithm whose worst-case performance matches this lower bound asymptotically. We present simulations and field experiments which demonstrate the applicability of our algorithm in real applications.
  • Item
    Open Science: GitHub Site Publication of Hotspot Algorithms
    (2019-05-02) Jing, Wen
    Open Science creates a resource for science community. It promotes reproducibility, so the result can be re-achieve. Democratizing Science allows public help check the correctness of the lab result and accelerate the lab developing process. This project contributes on both open science and democratizing science using GitHub by modularizing the lab code and filing all the explanation files to eliminate the gap between lab achievement and ordinary users.
  • Item
    Turning a Corner with a Dubins Car
    (2019-04-11) Koval, Alan; Isler, Volkan
    We study the problem of computing shortest collision-free Dubins paths when turning a corner. We present a sufficient condition for a closed-form solution. Specifically, consider S as the set consisting of paths of the form RSRSR, RSRSL, LSRSR and LSRSL that pass through the interior corner, where sub-paths RSR, RSL, and LSR are elementary Dubins paths composed of segments which are either straight (S) or turning left (L) or right (R). We find the closed-form optimal path around a corner when S is nonempty. Our solution can be used in an efficient path planner, for example, when navigating corridors. It can also be used as a subroutine for planners such as RRTs.
  • Item
    GLADD-R: A new Global Lake Dynamics Database for Reservoirs created using machine learning and satellite data
    (2019-04-01) Khandelwal, Ankush; Karpatne, Anuj; Wei, Zhihao; Kuang, Huangying; Ghosh, Rahul; Dugan, Hilary; Hanson, Paul; Kumar, Vipin
    Reservoirs play a crucial role for human sustenance as they provide freshwater for agriculture, power generation, human consumption, and recreation. A global database of reservoirs that provides their location and dynamics can be of great importance to the ecological community as it enables the study of the impact of human actions and climate change on fresh water availability. Here we present a new database, GLADD-R (Global Lake Dynamics Database-Reservoirs) that provides such information for 1882 reservoirs between 1 and 100 square kilometers in size that were created after 1985. The visualization of these reservoirs and their surface area time series is available at http://umnlcc.cs.umn.edu/GlobalReservoirDatabase/.
  • Item
    MESH: A Flexible Distributed Hypergraph Processing System
    (2019-03-31) Heintz, Benjamin; Hong, Rankyung; Singh, Shivangi; Khandelwal, Guarav; Tesdahl, Corey; Chandra, Abhishek
    With the rapid growth of large online social networks, the ability to analyze large-scale social structure and behavior has become critically important, and this has led to the development of several scalable graph processing systems. In reality, however, social interaction takes place not only between pairs of individuals as in the graph model, but rather in the context of multi-user groups. Research has shown that such group dynamics can be better modeled through a more general hypergraph model, resulting in the need to build scalable hypergraph processing systems. In this paper, we present MESH, a flexible distributed framework for scalable hypergraph processing. MESH provides an easy-to-use and expressive application programming interface that naturally extends the “think like a vertex” model common to many popular graph processing systems. Our framework provides a flexible implementation based on an underlying graph processing system, and enables different design choices for the key implementation issues of partitioning a hypergraph representation. We implement MESH on top of the popular GraphX graph processing framework in Apache Spark. Using a variety of real datasets and experiments conducted on a local 8-node cluster as well as a 65-node Amazon AWS testbed, we demonstrate that MESH provides flexibility based on data and application characteristics, as well as scalability with cluster size. We further show that it is competitive in performance to HyperX, another hypergraph processing system based on Spark, while providing a much simpler implementation (requiring about 5X fewer lines of code), thus showing that simplicity and flexibility need not come at the cost of performance.
  • Item
    Physics Guided RNNs for Modeling Dynamical Systems: A Case Study in Simulating Lake Temperature Profiles
    (2019-01-31) Jia, Xiaowei; Willard, Jared; Karpatne, Anuj; Read, Jordan; Zwart, Jacob; Steinbach, Michael; Kumar, Vipin
    This paper proposes a physics-guided recurrent neural network model (PGRNN) that combines RNNs and physics-based models to leverage their complementary strengths and improve the modeling of physical processes. Specifically, we show that a PGRNN can improve prediction accuracy over that of physical models, while generating outputs consistent with physical laws, and achieving good generalizability. Standard RNNs, even when producing superior prediction accuracy, often produce physically inconsistent results and lack generalizability.  We further enhance this approach by using a pre-training method that leverages the simulated data from a physics-based model to address the scarcity of observed data. Although we present and evaluate this methodology in the context of modeling the dynamics of temperature in lakes, it is applicable more widely to a range of scientific and engineering disciplines where mechanistic (also known as process-based) models are used, e.g., power engineering, climate science, materials science, computational chemistry, and biomedicine.
  • Item
    An Investigation of Composable Language Extensions for Parallel Programming
    (2019-01-17) Carlson, Travis; Coomey, Ciaradh; Councilman, Aaron; Stephen, Patrick; Van Wyk, Eric
    This paper demonstrates how parallel programming language features can be specified as composable language extensions to a general-purpose host programming language. To illustrate the expressiveness of language abstractions defined in this way we have re-implemented, as language extensions, various abstractions previously described in the literature that were implemented as part of traditional monolithic programming languages. These include abstractions for spawning parallel tasks, abstractions implementing bounded buffered channels and lattice-based variables to simplify the communication between parallel tasks, and domain-specific abstractions for simple specification of tensor computations backed by a code-generation technique for the efficient storage and processing of tensor computations.      The general-purpose host language is C, as implemented in the AbleC extensible C compiler framework. This system provides certain guarantees of extension composability that ensure that independentlydeveloped language extensions can be automatically composed by programmers that are not experts in language design or implementation. Thus programmers can freely select the abstractions that match their programming problem, their preferred programming style and level of expertise, and their desired performance requirements. This approach also provides benefits to researchers designing and developing new abstractions for parallel programming. It allows them to focus their efforts on the implementation of their new abstractions and re-use the host language implementation of general purpose features such as arithmetic expressions, control-flow statements, type checking, and other basic compiler infrastructure.
  • Item
    Adaptive View Planning for Aerial 3D Reconstruction
    (2018-09-17) Peng, Cheng; Isler, Volkan
    With the proliferation of small aerial vehicles, acquiring close up imagery for high-quality reconstruction is gaining importance. We present an adaptive view planning method to collect such images in an automated fashion. We first start by sampling a small set of views tobuild a coarse proxy to the scene. We then present (i)~a method that builds a set of adaptive viewing planes for efficient view selectionand (ii)~an algorithm to plan a trajectory that guarantees high reconstruction quality which does not deviates too much from the optimal one.The vehicle then follows the trajectory to cover the scene, and the procedure is repeated until reconstruction quality converges or a desired level of quality is achieved.The set of viewing planes provides an effective compromise between using the entire 3D free space and using a single view hemisphere to select the views.We compare our algorithm to existing methods in three challenging scenes. Our algorithm generates views which produce the least reconstruction error comparing to three different baseline approaches.
  • Item
    An Introduction to Spatial Data Mining
    (2018-08-08) Golmohammadi, Jamal; Xie, Yiqun; Gupta, Jayant; Li, Yan; Cai, Jiannan; Detor, Samantha; Roh, Abigail; Shekhar, Shashi
    The goal of spatial data mining is to discover potentially useful, interesting, and non-trivial patterns from spatial datasets. Spatial data mining is important for societal applications in public health, public safety, agriculture, environmental science, climate etc. For example,in epidemiology, spatial data mining helps to find areas with a high concentrations of disease incidents to manage disease outbreaks. Computerized methods are needed to discover spatial patterns since the volume and velocity of spatial data exceeds the number of human experts available to analyze it. In addition, spatial data has unique characteristics like spatial autocorrelation and spatial heterogeneity which violate the i.i.d (Independent and Identically Distributed data samples) assumption of traditional statistics and data mining methods. So, using traditional methods may miss patterns or may yield spurious patterns which are costly (e.g., stigmatization) in spatial applications. Also, there are other intrinsic challenges such as MAUP (Modifiable Areal Unit Problem) as illustrated by a current court case debating gerrymandering in elections. Spatial data mining considers the unique characteristics, and challenges of spatial data and domain knowledge of the target application to discover more accurate and interesting patterns.In this article, we discuss tools and computational methods of spatial data mining, focusing on the primary spatial pattern families: hotspot detection, colocation detection, spatial prediction and spatial outlier detection. Hotspot detection methods use domain information to model accurately more active and high density areas. Colocation detection methods find objects whose instances are in proximity of each other in a location. Spatial prediction approaches explicitly model neighborhood relationship of locations to predict target variables from input features. The goal of spatial outlier detection methods is to find data that are different from their neighbors.
  • Item
    Turning a Corner with a Dubins Car
    (2018-09-14) Koval, Alan; Isler, Volkan
    We study the problem of computing shortest collision-free Dubins paths when turning a corner. We present a sufficient condition for a closed-form solution. Specifically, consider S as the set consisting of paths of the form RSRSR, RSRSL, LSRSR and LSRSL that pass through the interior corner, where sub-paths RSR, RSL, and LSR are elementary Dubins paths composed of segments which are either straight (S) or turning left (L) or right (R). We find the closed-form optimal path around a corner when S is nonempty. Our solution can be used in an efficient path planner, for example, when navigating corridors. It can also be used as a subroutine for planners such as RRTs. An implementation and online demo is provided at: sites.google.com/umn.edu/rsnlabdemos/home/dubins-2018
  • Item
    Asynchronous Network Formation in Unknown Unbounded Environments
    (2018-09-17) Engin, Selim; Isler, Volkan
    In this paper, we study the Online Network Formation Problem (ONFP) for a mobile multi-robot system. Consider a group of robots with a bounded communication range operating in a large open area. One of the robots has a piece of information which has to be propagated to all other robots. What strategy should the robots pursue to disseminate the information to the rest of the robots as quickly as possible? The initial locations of the robots are unknown to each other, therefore the problem must be solved in an online fashion. For this problem, we present an algorithm whose competitive ratio is $O(H cdot max{M,sqrt{M H}})$ for arbitrary robot deployments, where $M$ is the largest edge length in the Euclidean minimum spanning tree on the initial robot configuration and $H$ is the height of the tree. We also study the case when the robot initial positions are chosen uniformly at random and improve the ratio to $O(M)$. Finally, we present simulation results to validate the performance in larger scales and demonstrate our algorithm using three robots in a field experiment.
  • Item
    Computer Aided Diagnosis of Skin Lesions from Morphological Features
    (2018-08-24) Heller, Nicholas; Bussmann, Erika; Shah, Aneri; Dean, Joshua; Papanikolopoulos, Nikolaos
    Skin cancer is the most common cancer, accounting for over 40% of all cancer cases. The morphological features of skin lesions are an integral component of skin cancer detection and diagnosis. With the rapid progress in the field of image classification, increasing attention has been put towards the Computer Aided Diagnosis of skin lesions based on  their  morphological  features.  The  International  Skin  Imaging  Collaboration (ISIC) Archive is the largest publicly available collection of dermoscopic images of skin lesions, and in 2018 ISIC hosted an image recognition challenge for dermoscopic images. This short paper describes the CAD system that we submitted to this challenge.