Author: Eldawy, Ahmed
Date accessioned/available: 2016-09-19
Date issued: 2016-06
URI: https://hdl.handle.net/11299/182261
Description: University of Minnesota Ph.D. dissertation. June 2016. Major: Computer Science. Advisor: Mohamed Mokbel. 1 computer file (PDF); viii, 136 pages.

Abstract: There has been a recent explosion in the amount of spatial data produced by devices such as smartphones, satellites, space telescopes, and medical devices. This variety makes spatial data valuable across many important applications, including brain simulation, identifying cancer clusters, tracking infectious diseases and drug addiction, simulating climate change, and event detection and analysis. Although several distributed systems are designed to handle Big Data in general, e.g., Hadoop, Hive, Spark, and Impala, they all fall short of supporting spatial data efficiently. As a result, there have been substantial research efforts to either extend these systems or build new ones that efficiently support Big Spatial Data. In this thesis, we describe SpatialHadoop, a full-fledged system that extends the core of Hadoop to support spatial data efficiently. SpatialHadoop is available as open-source software and has already been downloaded around 80,000 times. SpatialHadoop consists of four main layers, namely, language, indexing, query processing, and visualization. In the language layer, SpatialHadoop provides a high-level language, termed Pigeon, which offers standard spatial data types and query processing functions so that non-technical users can access the system easily. The indexing layer provides efficient spatial indexes, such as grid, R-tree, R+-tree, and Quad tree, which organize the data in the distributed file system. The indexes follow a two-level design: one global index that partitions the data across machines, and multiple local indexes that organize the records within each machine.
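As a rough illustration of the two-level index design described above, the Python sketch below uses a uniform grid as the global index and a list of points sorted by x as each partition's local index, then answers a rectangular range query by first pruning whole partitions and then searching only the surviving ones. This is not SpatialHadoop's implementation (which is Java on Hadoop); the class names, the `cells_per_side` parameter, the in-memory partitions standing in for machines, and the sorted-list local index are all assumptions made for brevity.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two closed rectangles intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Partition:
    """One partition (standing in for one machine's data) with its local index."""
    points: list[Point] = field(default_factory=list)
    mbr: Rect = (0.0, 0.0, 0.0, 0.0)
    xs: list[float] = field(default_factory=list)

    def build_local_index(self) -> None:
        # Hypothetical local index: points sorted by x, searched by binary search.
        self.points.sort()
        self.xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        # Tight bounding rectangle of the records actually stored here.
        self.mbr = (self.xs[0], min(ys), self.xs[-1], max(ys))

    def range_query(self, q: Rect) -> list[Point]:
        lo, hi = bisect_left(self.xs, q[0]), bisect_right(self.xs, q[2])
        return [p for p in self.points[lo:hi] if q[1] <= p[1] <= q[3]]


class TwoLevelGridIndex:
    """Hypothetical global uniform-grid index over partitions with local indexes."""

    def __init__(self, points: list[Point], bounds: Rect, cells_per_side: int):
        self.bounds, self.n = bounds, cells_per_side
        self.cw = (bounds[2] - bounds[0]) / cells_per_side
        self.ch = (bounds[3] - bounds[1]) / cells_per_side
        self.partitions: dict[tuple[int, int], Partition] = {}
        for p in points:  # partitioning step: route each record to one grid cell
            self.partitions.setdefault(self._cell(p), Partition()).points.append(p)
        for part in self.partitions.values():
            part.build_local_index()

    def _cell(self, p: Point) -> tuple[int, int]:
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= p[0] <= x1 and y0 <= p[1] <= y1):
            raise ValueError(f"point {p} lies outside the index bounds")
        i = min(int((p[0] - x0) / self.cw), self.n - 1)
        j = min(int((p[1] - y0) / self.ch), self.n - 1)
        return i, j

    def candidate_partitions(self, q: Rect) -> list[tuple[int, int]]:
        # Global index: skip partitions whose MBR cannot contain an answer.
        return [k for k, part in self.partitions.items() if overlaps(part.mbr, q)]

    def range_query(self, q: Rect) -> list[Point]:
        # Local indexes: search only the partitions that survived pruning.
        out: list[Point] = []
        for k in self.candidate_partitions(q):
            out.extend(self.partitions[k].range_query(q))
        return out


pts = [(1.0, 1.0), (2.5, 7.0), (6.0, 3.0), (9.0, 9.0), (4.9, 5.1), (5.0, 5.0)]
idx = TwoLevelGridIndex(pts, bounds=(0.0, 0.0, 10.0, 10.0), cells_per_side=2)
print(sorted(idx.range_query((4.0, 4.0, 6.0, 6.0))))  # [(4.9, 5.1), (5.0, 5.0)]
```

In this toy run only two of the four partitions overlap the query rectangle, so the other two are never opened; in a distributed setting that pruning is what saves reading whole file blocks. Swapping the grid for an R-tree or Quad tree partitioner would change only how records are routed to partitions, not the two-step query pattern.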
The query processing layer encapsulates a set of spatial operations that ship with SpatialHadoop, including basic spatial operations, join operations, and computational geometry operations. The visualization layer allows users to explore big spatial data by generating images that provide a bird’s-eye view of the data. SpatialHadoop is already used as the backbone of several real systems, including SHAHED, a web-based application for interactive exploration of satellite data.

Language: en
Keywords: Big Data; Hadoop; MapReduce; Spatial; SpatialHadoop
Title: SpatialHadoop: A MapReduce Framework for Big Spatial Data
Type: Thesis or Dissertation