Browsing by Subject "MapReduce"
Item SpatialHadoop: A MapReduce Framework for Big Spatial Data
(2016-06) Eldawy, Ahmed
There has been a recent explosion in the amount of spatial data produced by devices such as smartphones, satellites, space telescopes, and medical devices. The variety of this data makes it widely used across important applications such as brain simulations, identifying cancer clusters, tracking infectious disease, drug addiction, simulating climate change, and event detection and analysis. While several distributed systems are designed to handle Big Data in general, e.g., Hadoop, Hive, Spark, and Impala, they all fall short in supporting spatial data efficiently. As a result, there are great research efforts in either extending these systems or building new systems to efficiently support Big Spatial Data. In this thesis, we describe SpatialHadoop, a full-fledged system for spatial data that extends Hadoop at its core to efficiently support spatial data. SpatialHadoop is available as open-source software and has already been downloaded around 80,000 times. SpatialHadoop consists of four main layers: language, indexing, query processing, and visualization. In the language layer, SpatialHadoop provides a high-level language, termed Pigeon, which provides standard spatial data types and query processing for easy access by non-technical users. The indexing layer provides efficient spatial indexes, such as grid, R-tree, R+-tree, and quadtree, which organize the data in the distributed file system. The indexes follow a two-level design of one global index that partitions the data across machines, and multiple local indexes that organize records in each machine. The query processing layer encapsulates a set of spatial operations that ship with SpatialHadoop, including basic spatial operations, join operations, and computational geometry operations.
The visualization layer allows users to explore big spatial data by generating images that provide a bird’s-eye view of the data. SpatialHadoop is already used as a backbone in several real systems, including SHAHED, a web-based application for interactive exploration of satellite data.

Item ST-Hadoop: A MapReduce Framework for Big Spatio-temporal Data Management
(2019-05) Alarabi, Louai
Apache Hadoop, employing the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been genuinely exploited for processing large-scale spatio-temporal data, especially given the emergence and popularity of applications that create such data at large scale. The huge volumes of spatio-temporal data come from applications such as taxi fleets in urban computing, asteroids in astronomy research, animal movements in habitat studies, neuron analysis in neuroscience research, and the contents of social networks (e.g., Twitter or Facebook). Space and time are two fundamental characteristics of this data, and the need to manage them has raised the demand for processing the spatio-temporal data created by these applications. Besides the massive size of the data, the complexity of the shapes and formats associated with it raises many challenges in managing spatio-temporal data. The goal of the dissertation is to establish a full-fledged big spatio-temporal data management system that serves the needs of a wide range of spatio-temporal applications. This involves indexing, querying, and analyzing spatio-temporal data. We propose ST-Hadoop, the first full-fledged open-source system with native support for big spatio-temporal data, available for download at http://st-hadoop.cs.umn.edu/. ST-Hadoop injects spatio-temporal data awareness into the highly popular Hadoop system, which is considered state of the art for off-line analysis of big data.
Considering a distributed environment, we focus on the following: (1) indexing spatio-temporal data; (2) supporting various fundamental spatio-temporal operations, such as range, kNN, and join; and (3) supporting indexing and querying of trajectories, a special class of spatio-temporal data that requires special handling. Throughout this dissertation, we cover the background and related work, motivate the proposed system, and highlight our contributions.
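
The two-level index design and the range operation mentioned in the abstracts above can be illustrated with a small, single-machine sketch. It is not the SpatialHadoop or ST-Hadoop API: the names `CELL`, `build_index`, and `range_query` are hypothetical, the "global index" is assumed to be a uniform grid, and the "local index" is assumed to be a per-cell list sorted by timestamp. The real systems distribute partitions across machines and support richer index types.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

CELL = 10.0  # hypothetical grid cell width, in the same units as x and y


def build_index(records):
    """Global level: assign each (x, y, t) record to a grid cell.
    Local level: keep each cell's records sorted by timestamp."""
    cells = defaultdict(list)
    for x, y, t in records:
        cells[(int(x // CELL), int(y // CELL))].append((t, x, y))
    for bucket in cells.values():
        bucket.sort()
    return dict(cells)


def range_query(index, x1, y1, x2, y2, t1, t2):
    """Return records inside the box [x1, x2] x [y1, y2] with t in [t1, t2]."""
    hits = []
    # Global level: visit only the grid cells that overlap the query box.
    for cx in range(int(x1 // CELL), int(x2 // CELL) + 1):
        for cy in range(int(y1 // CELL), int(y2 // CELL) + 1):
            bucket = index.get((cx, cy))
            if bucket is None:
                continue  # no records were assigned to this cell
            # Local level: binary search on the time-sorted records.
            lo = bisect_left(bucket, (t1,))
            hi = bisect_right(bucket, (t2, float("inf"), float("inf")))
            for t, x, y in bucket[lo:hi]:
                if x1 <= x <= x2 and y1 <= y <= y2:
                    hits.append((x, y, t))
    return sorted(hits)
```

Calling `build_index` on a list of `(x, y, t)` tuples and then `range_query` with a bounding box and a time window returns the matching records. The point of the two levels is that the global step discards whole partitions before any record is inspected, while the local step narrows the scan within each surviving partition.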