Journal Article
Hadoop-GIS: a high performance spatial data warehousing system over MapReduce
Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel H. Saltz
Vol. 6, Iss. 11, pp. 1009-1020
TL;DR: This paper presents Hadoop-GIS, a scalable, high performance spatial data warehousing system that runs large scale spatial queries on Hadoop and is integrated into Hive to support declarative spatial queries.
Abstract:
Support of high performance queries on large volumes of spatial data is becoming increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges in managing and querying massive spatial data: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS, a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, a customizable spatial query engine (RESQUE), implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and that it outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a library for processing spatial queries, and as an integrated software package in Hive.
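The partition, join, and amend steps named in the abstract can be illustrated with a small single-process sketch. It is not the Hadoop-GIS or RESQUE implementation: the fixed grid, the brute-force per-tile join, and the names `tiles_for`, `intersects`, and `partitioned_join` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

# Illustrative sketch only; names and the fixed-grid scheme are assumptions,
# not the Hadoop-GIS or RESQUE API.
# A box is a minimum bounding rectangle: (xmin, ymin, xmax, ymax).


def tiles_for(box, tile_size):
    """Return the ids of every fixed-grid tile that the box overlaps."""
    xmin, ymin, xmax, ymax = box
    cols = range(int(xmin // tile_size), int(xmax // tile_size) + 1)
    rows = range(int(ymin // tile_size), int(ymax // tile_size) + 1)
    return list(product(cols, rows))


def intersects(a, b):
    """True if two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partitioned_join(left, right, tile_size):
    """Intersection join of two {id: box} datasets via grid partitioning."""
    # "Map" step: assign each object to its tiles. An object that crosses
    # a tile boundary is replicated into every tile it overlaps.
    buckets = defaultdict(lambda: ([], []))
    for oid, box in left.items():
        for tile in tiles_for(box, tile_size):
            buckets[tile][0].append(oid)
    for oid, box in right.items():
        for tile in tiles_for(box, tile_size):
            buckets[tile][1].append(oid)

    # "Reduce" step: join each tile independently (brute force here;
    # a local spatial index could replace the nested loop).
    pairs = []
    for left_ids, right_ids in buckets.values():
        for l in left_ids:
            for r in right_ids:
                if intersects(left[l], right[r]):
                    pairs.append((l, r))

    # Amend step: replicated boundary objects can yield the same pair in
    # several tiles, so duplicates are removed before returning.
    return sorted(set(pairs))


if __name__ == "__main__":
    left = {"a": (3, 3, 7, 7), "b": (8, 8, 9, 9)}
    right = {"x": (4, 4, 6, 6), "y": (20, 20, 21, 21)}
    print(partitioned_join(left, right, tile_size=5))
```

Replicating an object into every tile it overlaps keeps each tile's join self-contained, at the cost of the final deduplication pass.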
Citations
Journal Article
Big Data and cloud computing: innovation opportunities and challenges
TL;DR: This review introduces future innovations and a research agenda for cloud computing supporting the transformation of the volume, velocity, variety and veracity into values of Big Data for local to global digital earth science and applications.
Proceedings Article
SpatialHadoop: A MapReduce framework for spatial data
Ahmed Eldawy, Mohamed F. Mokbel
TL;DR: SpatialHadoop is a comprehensive extension to Hadoop that injects spatial data awareness in each Hadoop layer, namely, the language, storage, MapReduce, and operations layers, with orders of magnitude better performance than Hadoop for spatial data processing.
Journal Article
Remote sensing big data computing
TL;DR: A brief overview of Big Data and data-intensive problems is given, including the analysis of RS Big Data, Big Data challenges, and current techniques and works for processing RS Big Data.
Proceedings Article
GeoSpark: a cluster computing framework for processing large-scale spatial data
Jia Yu, Jinxuan Wu, Mohamed Sarwat
TL;DR: This paper introduces GeoSpark, an in-memory cluster computing framework for processing large-scale spatial data that achieves better run time performance than its Hadoop-based counterparts (e.g., SpatialHadoop).
Proceedings Article
Simba: Efficient In-Memory Spatial Analytics
TL;DR: Simba is a scalable and efficient in-memory spatial query processing and analytics system for big spatial data that extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API.
References
Journal Article
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
TL;DR: This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
Journal Article
SCOPE: easy and efficient parallel processing of massive data sets
Ronnie Chaiken, Bob Jenkins, Per-Ake Larson, Bill Ramsey, Darren A. Shakib, Simon Weaver, Jingren Zhou
TL;DR: This paper presents SCOPE (Structured Computations Optimized for Parallel Execution), a new declarative and extensible scripting language targeted at massive data analysis and designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters.
Proceedings Article
Hilbert R-tree: An Improved R-tree using Fractals
Ibrahim Kamel, Christos Faloutsos
TL;DR: The Hilbert R-tree structure is proposed to facilitate deferred splitting in R-trees by imposing an ordering on the R-tree nodes; the ordering should group similar data rectangles together, to minimize the area and perimeter of the resulting minimum bounding rectangles.
Journal Article
MapReduce and parallel DBMSs: friends or foes?
Michael Stonebraker, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Erik S. Paulson, Andrew Pavlo, Alexander Rasin
TL;DR: MapReduce complements DBMSs since databases are not designed for extract-transform-load tasks, a MapReduce specialty.
Proceedings Article
A comparison of join algorithms for log processing in MapReduce
TL;DR: Key implementation details of a number of well-known join strategies in MapReduce are described and a comprehensive experimental comparison of these join techniques on a 100-node Hadoop cluster is presented.