Journal Article
Hadoop-GIS: a high performance spatial data warehousing system over MapReduce
Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel H. Saltz
Vol. 6, Iss. 11, pp. 1009-1020
TLDR
Hadoop-GIS, a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop, is presented; it is integrated into Hive to support declarative spatial queries with an integrated architecture.
Abstract:
Support for high performance queries on large volumes of spatial data is becoming increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost-effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges in managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS, a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results by handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS in query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and that it outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive.
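The partition-then-amend idea in the abstract (split space into tiles, process tiles in parallel, then correct results for objects that cross tile boundaries) can be illustrated with a minimal, single-process sketch of a partition-based spatial join. This is not the Hadoop-GIS implementation. It assumes a fixed uniform grid (Hadoop-GIS has its own partitioning), compares only minimum bounding rectangles (MBRs) rather than exact geometries, and uses the reference-point technique, a standard way to suppress duplicate pairs produced when boundary objects are assigned to several tiles, which may differ from the paper's own boundary-handling method. The names `tiles_for`, `intersects` and `partitioned_join` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Minimum bounding rectangle: (min_x, min_y, max_x, max_y).
MBR = tuple[float, float, float, float]


def tiles_for(box: MBR, tile_size: float) -> list[tuple[int, int]]:
    """Every cell of a uniform grid (an assumed partitioning) that the box touches.
    A box crossing cell borders is assigned to several cells (multiple assignment)."""
    x0, y0, x1, y1 = box
    cols = range(int(x0 // tile_size), int(x1 // tile_size) + 1)
    rows = range(int(y0 // tile_size), int(y1 // tile_size) + 1)
    return list(product(cols, rows))


def intersects(a: MBR, b: MBR) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partitioned_join(left: dict[str, MBR], right: dict[str, MBR],
                     tile_size: float) -> list[tuple[str, str]]:
    """MBR-intersection join of two datasets, processed tile by tile."""
    # "Map" step: route each object to every tile it overlaps.
    tiles = defaultdict(lambda: ([], []))
    for side, data in enumerate((left, right)):
        for oid, box in data.items():
            for tile in tiles_for(box, tile_size):
                tiles[tile][side].append((oid, box))

    # "Reduce" step: join the contents of each tile independently.
    result = []
    for tile, (ls, rs) in tiles.items():
        for lid, lbox in ls:
            for rid, rbox in rs:
                if not intersects(lbox, rbox):
                    continue
                # Reference point: lower-left corner of the two boxes' intersection.
                # It lies in exactly one tile, so only that tile reports the pair.
                ref = (max(lbox[0], rbox[0]), max(lbox[1], rbox[1]))
                if (int(ref[0] // tile_size), int(ref[1] // tile_size)) == tile:
                    result.append((lid, rid))
    return result
```

Because the reference point lies inside both boxes, both objects are always routed to its tile, so every intersecting pair is reported exactly once and the per-tile loops can run independently (as separate reducers would) without a global deduplication pass.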
Citations
Proceedings Article
A GPU-friendly Geometric Data Model and Algebra for Spatial Queries
Harish Doraiswamy, Juliana Freire
TL;DR: A new model is proposed that represents spatial data as geometric objects and defines an algebra of GPU-friendly, composable operators over these objects; experiments show that it is orders of magnitude faster than a CPU-based implementation and outperforms custom GPU-based approaches.
Journal Article
Optimizing and accelerating space–time Ripley's K function based on Apache Spark for distributed spatiotemporal point pattern analysis
TL;DR: A distributed computing method to accelerate space–time Ripley’s K function upon state-of-the-art distributed computing framework Apache Spark is presented, and four strategies are adopted to simplify calculation procedures and accelerate distributed computing respectively.
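To show what such work accelerates, here is a naive, serial sketch of a space-time K function estimate. It assumes one common uncorrected estimator, area × duration × (number of ordered pairs within distance r and time lag τ) / (n(n − 1)); the literature uses several normalizations and edge corrections that are omitted here, and this is not the cited paper's Spark method. The function name `space_time_k` is hypothetical.

```python
import math


def space_time_k(points: list[tuple[float, float, float]], r: float, tau: float,
                 area: float, duration: float) -> float:
    """Naive space-time K estimate for events (x, y, t), with no edge correction (assumption)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two events")
    count = 0
    # Count ordered pairs (i, j), i != j, that are close in both space and time.
    for i, (xi, yi, ti) in enumerate(points):
        for j, (xj, yj, tj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r and abs(ti - tj) <= tau:
                count += 1
    return area * duration * count / (n * (n - 1))
```

The doubly nested loop costs O(n²) for every (r, τ) pair evaluated, which is the computation that distributed approaches such as the one summarized above aim to reduce.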
Proceedings Article
On Spatial Joins in MapReduce
Ibrahim Sabek, Mohamed F. Mokbel
TL;DR: This paper provides the first attempt at a full-fledged query optimizer for MapReduce-based spatial join algorithms, developing its own taxonomy that covers almost all possible ways of performing a spatial join for any two input datasets.
Proceedings Article
High performance integrated spatial big data analytics
TL;DR: A scalable, spatial-query-based data integration engine built on MapReduce is provided; integrated spatial data analytics that consolidates multiple data sources offers significant potential for improving data quality in terms of completeness and accuracy, and for deriving much greater value from the data.
Journal Article
High Performance Processing and Analysis of Geospatial Data Using CUDA on GPU
TL;DR: In this article, high-performance processing of massive geospatial data on many-core GPU (Graphics Processing Unit) is presented using CUDA (Compute Unified Device Architecture) pro...
References
Proceedings Article
The R*-tree: an efficient and robust access method for points and rectangles
TL;DR: The R*-tree is designed to incorporate a combined optimization of the area, margin and overlap of each enclosing rectangle in the directory, and it clearly outperforms the existing R-tree variants.
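The three criteria can be made concrete with a simplified sketch of two R*-tree decisions on 2-D rectangles: choosing the subtree for a new entry (least overlap enlargement when the children are leaves, otherwise least area enlargement, with ties broken by area enlargement and then by area) and choosing the split axis with the smallest total margin over candidate distributions. It is a reduced reading of the published heuristics that omits forced reinsertion and the choice of split index along the chosen axis; the function names are hypothetical.

```python
import math

# A 2-D rectangle (min_x, min_y, max_x, max_y).
Box = tuple[float, float, float, float]


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def margin(b: Box) -> float:
    # Perimeter of the rectangle (the "margin" in R*-tree terminology).
    return 2.0 * ((b[2] - b[0]) + (b[3] - b[1]))


def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlap(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def mbr_of(boxes: list[Box]) -> Box:
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def choose_subtree(children: list[Box], new: Box, children_are_leaves: bool) -> int:
    """Index of the child rectangle that should absorb `new`."""
    def overlap_with_siblings(idx: int, box: Box) -> float:
        return sum(overlap(box, c) for k, c in enumerate(children) if k != idx)

    def cost(idx: int) -> tuple:
        old = children[idx]
        grown = union(old, new)
        area_inc = area(grown) - area(old)
        if children_are_leaves:
            # Leaf level: minimize overlap enlargement, then area enlargement, then area.
            overlap_inc = overlap_with_siblings(idx, grown) - overlap_with_siblings(idx, old)
            return (overlap_inc, area_inc, area(old))
        # Directory level: minimize area enlargement, then area.
        return (area_inc, area(old))

    return min(range(len(children)), key=cost)


def choose_split_axis(entries: list[Box], m: int) -> int:
    """0 for x, 1 for y: the axis whose candidate distributions have the smallest margin sum.
    Assumes len(entries) >= 2 * m (an overflowing node holds M + 1 entries)."""
    best_axis, best_sum = 0, math.inf
    for axis in (0, 1):
        total = 0.0
        # Sort by lower bound and, separately, by upper bound along this axis.
        for key in (lambda b: (b[axis], b[axis + 2]), lambda b: (b[axis + 2], b[axis])):
            ordered = sorted(entries, key=key)
            # The first group holds between m and len(entries) - m entries.
            for k in range(m, len(entries) - m + 1):
                total += margin(mbr_of(ordered[:k])) + margin(mbr_of(ordered[k:]))
        if total < best_sum:
            best_axis, best_sum = axis, total
    return best_axis
```

Minimizing margin favors square-like groups, while minimizing overlap keeps sibling rectangles apart so that fewer paths must be searched at query time.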
Proceedings Article
Pig latin: a not-so-foreign language for data processing
TL;DR: A new language called Pig Latin is described, designed to fit in a sweet spot between the declarative style of SQL, and the low-level, procedural style of map-reduce, which is an open-source, Apache-incubator project, and available for general use.
Journal Article
Hive: a warehousing solution over a map-reduce framework
Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, Raghotham Murthy
TL;DR: Hadoop is a popular open-source map-reduce implementation which is being used as an alternative to store and process extremely large data sets on commodity hardware.
Journal Article
MapReduce: a flexible data processing tool
Jeffrey Dean, Sanjay Ghemawat
TL;DR: MapReduce advantages over parallel databases include storage-system independence and fine-grain fault tolerance for large jobs.
Proceedings Article
A comparison of approaches to large-scale data analysis
Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker
TL;DR: A benchmark consisting of a collection of tasks that are run on an open source version of MR as well as on two parallel DBMSs shows a dramatic performance difference between the two paradigms.