Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform
TLDR
The proposed Spark-based parallel DBSCAN design achieves more stable speedup as the scale of the spatial data increases, and it addresses problems that arise when processing large-scale data.
Abstract:
Density-based spatial clustering of applications with noise (DBSCAN) is a density-based clustering algorithm that can discover clusters of arbitrary shape, effectively distinguishes noise points, and naturally supports spatial databases. It has been widely used in spatial data mining. This paper studies the design and implementation of a parallel DBSCAN algorithm on the Spark platform and addresses the following problems that arise when processing large-scale data: the heavy computational load of the single-node algorithm; the low resource utilization of the multi-node algorithm; the long running time; and the lack of real-time response. The experimental results indicate that the proposed parallel design achieves more stable speedup as the scale of the spatial data increases.
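The properties the abstract attributes to DBSCAN (clusters of arbitrary shape, explicit noise points) can be illustrated with a minimal single-node sketch. This is not the paper's Spark-based parallel design, which the abstract does not detail; it is the textbook sequential procedure with a brute-force neighbor search. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative choices, and the sample points are made up for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The brute-force `region_query` makes this sketch quadratic in the number of points, which hints at why a single-node run becomes expensive on large spatial data sets and why partitioning the work across a cluster is attractive.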
Citations
Journal Article
Field-Based High-Throughput Phenotyping for Maize Plant Using 3D LiDAR Point Cloud Generated With a "Phenomobile".
TL;DR: This paper mounts a LiDAR (Velodyne HDL64-S3) on a mobile robot, making the robot a “phenomobile,” and develops software for data collection and analysis under the Robot Operating System (ROS) using open source components and algorithm libraries.
Journal Article
Air quality predictions with a semi-supervised bidirectional LSTM neural network
TL;DR: A semi-supervised, LSTM-based model is proposed for multi-step PM2.5 prediction; it explained at least 70% of the variance in this study, demonstrating the feasibility of the model.
Journal Article
Sensor Reliability in Cyber-Physical Systems Using Internet-of-Things Data: A Review and Case Study
Fernando Castaño, Stanisław Strzelczak, Alberto Villalonga, Rodolfo E. Haber, Joanna Kossakowska
TL;DR: The results demonstrate the effectiveness of the proposed method for increasing sensor reliability in cyber-physical systems using Internet-of-Things data.
Journal Article
DENCAST: distributed density-based clustering for multi-target regression
TL;DR: The DENCAST system is proposed, a novel distributed algorithm implemented in Apache Spark, which performs density-based clustering and exploits the identified clusters to solve both single- and multi-target regression tasks (and thus, solves complex tasks such as time series prediction).
Proceedings Article
Theoretically-Efficient and Practical Parallel DBSCAN
Yiqiu Wang, Yan Gu, Julian Shun
TL;DR: This paper presents new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth).