Hadoop GIS: a high performance spatial data warehousing system over MapReduce
Citations
545 citations
Cites methods from "Hadoop GIS: a high performance spat..."
...Search, query, indexing and data model design. Performance is critical in Big Data era, and accurately and quickly locating data requires a new generation of search engines and query systems (Miyano and Uehara 2012; Aji et al. 2013)....
[...]
475 citations
Cites result from "Hadoop GIS: a high performance spat..."
...Similar to Hadoop, a SpatialHadoop cluster contains one master node that breaks a map-reduce job into smaller tasks, carried out by slave nodes....
[...]
460 citations
Cites methods from "Hadoop GIS: a high performance spat..."
...In addition, the Hadoop–GIS [57] system for large-scale spatial data processing, search and accessing is also built upon the Hadoop system....
[...]
332 citations
228 citations
Cites background from "Hadoop GIS: a high performance spat..."
...Hadoop GIS [11] is a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop....
[...]
...For join operations (using 3 million records in each table), as shown in Figure 11, Simba runs distance join 1.5x faster than SpatialSpark, 25x faster than Hadoop GIS, and 26x faster than DBMS X. Note that distance join over point objects is not natively supported in SpatialHadoop....
[...]
...To make the matter worse, if we want to retrieve (or do analyses over) the intersection of results from multiple kNN queries, more complex expressions such as nested sub-queries will be involved....
[...]
...Note that other spatial analytics systems (GeoSpark, SpatialSpark, SpatialHadoop, and Hadoop GIS) do not support more than two dimensions....
[...]
...For example, Simba builds its index (which uses R-tree for both local indexes and the global index) over 1 billion records (60GB in file size) in around 25 minutes, which is 2.5x faster than SpatialHadoop, 3x faster than SpatialSpark, 12x faster than Hadoop GIS, and 15x faster than Geomesa....
[...]
References
4,686 citations
"Hadoop GIS: a high performance spat..." refers methods in this paper
...The spatial filtering component performs MBR based spatial join filtering with the two R*-Trees, and refinement on the spatial join condition is further performed on the polygon pairs through geometric computations....
[...]
...Bulk spatial index building is performed on each dataset to generate index files – here we use R*-Trees [12]....
[...]
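The two excerpts above describe a filter-and-refine spatial join: a cheap MBR (minimum bounding rectangle) test selects candidate polygon pairs, and exact geometric computation then confirms or rejects each candidate. The following is a minimal Python sketch of that idea, not the paper's implementation. It assumes simple polygons given as lists of (x, y) vertices, and it replaces the R*-Tree lookup with a brute-force scan over all MBR pairs to stay self-contained. All function names are hypothetical.

```python
def mbr(poly):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def mbrs_overlap(a, b):
    """True if two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _orient(p, q, r):
    """Cross product of (q - p) and (r - p); its sign gives the turn direction."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _on_segment(p, q, r):
    """For r collinear with segment pq: True if r lies within the segment."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, p3, p4):
    """True if segment p1p2 and segment p3p4 share at least one point."""
    d1, d2 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    d3, d4 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and _on_segment(p3, p4, p1))
            or (d2 == 0 and _on_segment(p3, p4, p2))
            or (d3 == 0 and _on_segment(p1, p2, p3))
            or (d4 == 0 and _on_segment(p1, p2, p4)))


def point_in_polygon(pt, poly):
    """Ray-casting test; points exactly on the boundary are not handled specially."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def polygons_intersect(a, b):
    """Refinement step: edges cross, or one polygon contains the other."""
    n, m = len(a), len(b)
    for i in range(n):
        for j in range(m):
            if segments_intersect(a[i], a[(i + 1) % n], b[j], b[(j + 1) % m]):
                return True
    return point_in_polygon(a[0], b) or point_in_polygon(b[0], a)


def mbr_filter(left, right):
    """Filter step: index pairs whose MBRs overlap (brute force, no R*-Tree)."""
    left_boxes = [mbr(p) for p in left]
    right_boxes = [mbr(p) for p in right]
    return [(i, j)
            for i, lb in enumerate(left_boxes)
            for j, rb in enumerate(right_boxes)
            if mbrs_overlap(lb, rb)]


def spatial_join(left, right):
    """Filter by MBR, then refine each candidate with exact geometry."""
    return [(i, j) for i, j in mbr_filter(left, right)
            if polygons_intersect(left[i], right[j])]


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    shifted = [(1, 1), (3, 1), (3, 3), (1, 3)]
    triangle = [(1.5, 4), (4, 4), (4, 1.5)]  # MBR overlaps square, shape does not
    print(mbr_filter([square], [shifted, triangle]))
    print(spatial_join([square], [shifted, triangle]))
```

In this sketch the triangle passes the MBR filter against the square but is rejected during refinement, which is the case the second step exists to catch. A real system would replace `mbr_filter` with an index probe so that most non-overlapping pairs are never compared at all.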
2,058 citations
"Hadoop GIS: a high performance spat..." refers background in this paper
...MapReduce systems with high-level declarative languages include Pig Latin/Pig [25, 19], SCOPE [17], and HiveQL/Hive [29]....
[...]
1,785 citations
"Hadoop GIS: a high performance spat..." refers background in this paper
...MapReduce systems with high-level declarative languages include Pig Latin/Pig [25, 19], SCOPE [17], and HiveQL/Hive [29]....
[...]
...Hive [29] is an open source MapReduce based query system that...
[...]
...Declarative query interfaces such as Hive [29], Pig [19], and Scope [17] have brought the large scale data analysis one step closer to the common users by providing high level, easy to use programming abstractions to MapReduce....
[...]
1,293 citations
"Hadoop GIS: a high performance spat..." refers background in this paper
...Comparisons of MapReduce and parallel databases for structured data are discussed in [29, 20, 30]....
[...]
1,188 citations
"Hadoop GIS: a high performance spat..." refers background or methods in this paper
...Data loading speed is a major bottleneck for SDBMS based solutions [26], especially for...
[...]
...The high data loading overhead is another major bottleneck for SDBMS based solutions [26]....
[...]
...However, this approach is highly expensive on software licensing and dedicated hardware, and requires sophisticated tuning and maintenance efforts [26]....
[...]
...We have previously developed a parallel SDBMS based approach PAIS [30, 31, 7] based on DB2 DPF with reasonable scalability, but the approach is highly expensive on software license and hardware requirement [26], and requires sophisticated tuning and maintenance....
[...]