Journal Article

Hadoop-GIS: a high performance spatial data warehousing system over MapReduce

TLDR
The paper presents Hadoop-GIS, a scalable, high-performance spatial data warehousing system that runs large-scale spatial queries on Hadoop and is integrated into Hive to support declarative spatial queries.
Abstract
Support for high-performance queries on large volumes of spatial data is becoming increasingly important in many application domains, including geospatial problems in numerous fields, location-based services, and emerging scientific applications that are increasingly data- and compute-intensive. Massive-scale spatial data has emerged from the proliferation of cost-effective, ubiquitous positioning technologies, the development of high-resolution imaging technologies, and contributions from a large number of community users. Managing and querying massive spatial data poses two major challenges: the explosion of spatial data and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS, a scalable and high-performance spatial data warehousing system for running large-scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, a customizable spatial query engine (RESQUE), implicit parallel spatial query execution on MapReduce, and effective methods for amending query results by handling boundary objects. Hadoop-GIS uses global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries within a single architecture. Our experiments demonstrate the high efficiency of Hadoop-GIS in query response and its high scalability on commodity clusters. Our comparative experiments show that the performance of Hadoop-GIS is on par with parallel SDBMS and that it outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a library for processing spatial queries and as an integrated software package in Hive.
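The partition-then-join idea in the abstract can be illustrated with a small, single-process sketch. This is not Hadoop-GIS or RESQUE code: the fixed grid, the function names (`tiles_for`, `partition`, `spatial_join`), and the toy data are all assumptions made for illustration. A brute-force per-tile comparison stands in for a local spatial index, and a result set stands in for whatever boundary-object amendment the system actually applies.

```python
from collections import defaultdict
from itertools import product

# Toy data model (an assumption): each object is an axis-aligned
# bounding box given as (xmin, ymin, xmax, ymax).

def intersects(a, b):
    """Return True when two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def tiles_for(box, tile_size):
    """Return the (col, row) ids of every fixed-grid tile the box touches."""
    c0, r0 = int(box[0] // tile_size), int(box[1] // tile_size)
    c1, r1 = int(box[2] // tile_size), int(box[3] // tile_size)
    return product(range(c0, c1 + 1), range(r0, r1 + 1))

def partition(objects, tile_size):
    """'Map' step: copy each object into every tile it touches.

    Objects that cross a tile boundary are replicated, so no
    intersecting pair can be missed by a per-tile join.
    """
    tiles = defaultdict(list)
    for oid, box in objects.items():
        for tile in tiles_for(box, tile_size):
            tiles[tile].append((oid, box))
    return tiles

def spatial_join(left, right, tile_size):
    """'Reduce' step: join each tile independently, then deduplicate."""
    left_tiles = partition(left, tile_size)
    right_tiles = partition(right, tile_size)
    results = set()  # a set drops pairs reported by more than one tile
    for tile in left_tiles.keys() & right_tiles.keys():
        # Brute-force comparison; a real engine would build a local index.
        for (lid, lbox), (rid, rbox) in product(left_tiles[tile], right_tiles[tile]):
            if intersects(lbox, rbox):
                results.add((lid, rid))
    return results

left = {"a": (1, 1, 4, 4), "b": (9, 9, 12, 12)}
right = {"x": (3, 3, 11, 11), "y": (20, 20, 21, 21)}
pairs = spatial_join(left, right, tile_size=10)
```

In this toy input, box `b` and box `x` both straddle the corner shared by four tiles, so the pair is found once per tile; collecting results in a set keeps a single copy. Each tile is processed independently of the others, which is the property that would let the per-tile work run as parallel tasks.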

