Hadoop GIS: a high performance spatial data warehousing system over mapreduce

doi:10.14778/2536222.2536227

Ablimit Aji¹, Fusheng Wang¹, Hoang Vo¹, Rubao Lee², Qiaoling Liu¹, Xiaodong Zhang², Joel H. Saltz¹

Emory University¹, Ohio State University²

01 Aug 2013, Vol. 6, Iss. 11, pp. 1009-1020

TL;DR: The paper presents Hadoop-GIS, a scalable, high-performance spatial data warehousing system that runs large-scale spatial queries on Hadoop and is integrated into Hive to support declarative spatial queries.

Abstract: Support for high performance queries on large volumes of spatial data is becoming increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges in managing and querying massive spatial data: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS, a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, a customizable spatial query engine (RESQUE), implicit parallel spatial query execution on MapReduce, and effective methods for amending query results by handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS in query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and that it outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive.
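
The abstract names spatial partitioning and the handling of boundary objects but gives no detail. The Python sketch below illustrates one common way these two ideas fit together for a bounding-box intersection join. It is not the Hadoop-GIS or RESQUE implementation. The fixed grid, the replicate-then-deduplicate treatment of boundary objects, and all names (tiles_for, intersects, partitioned_join, tile_size) are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import product


def tiles_for(box, tile_size):
    """Return every grid tile (col, row) overlapped by box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    cols = range(int(xmin // tile_size), int(xmax // tile_size) + 1)
    rows = range(int(ymin // tile_size), int(ymax // tile_size) + 1)
    return list(product(cols, rows))


def intersects(a, b):
    """True if two axis-aligned boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partitioned_join(left, right, tile_size):
    """Intersection join of two {id: box} datasets, one grid tile at a time.

    Assumption (not taken from the paper): an object that crosses a tile
    boundary is copied into every tile it overlaps, and duplicate result
    pairs are removed afterwards.
    """
    # "Map" step: route each object to all tiles it overlaps.
    buckets = defaultdict(lambda: ([], []))
    for side, dataset in enumerate((left, right)):
        for obj_id, box in dataset.items():
            for tile in tiles_for(box, tile_size):
                buckets[tile][side].append((obj_id, box))

    # "Reduce" step: each tile is joined independently of the others.
    pairs = set()  # the set removes pairs reported by more than one tile
    for lefts, rights in buckets.values():
        for lid, lbox in lefts:
            for rid, rbox in rights:
                if intersects(lbox, rbox):
                    pairs.add((lid, rid))
    return sorted(pairs)


if __name__ == "__main__":
    left = {"a": (0, 0, 3, 3), "b": (8, 8, 9, 9), "c": (9, 9, 11, 11)}
    right = {"x": (2, 2, 12, 12), "y": (20, 20, 21, 21)}
    print(partitioned_join(left, right, tile_size=10))
    # [('a', 'x'), ('b', 'x'), ('c', 'x')]
```

In this toy input, "c" and "x" both straddle four tiles, so the pair is found four times and reported once. A real system would also need a strategy for skewed data and for exact geometries rather than bounding boxes, neither of which the sketch attempts.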
