Open Access Journal Article

Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform

TLDR
The experimental results indicate that the proposed parallel DBSCAN design on Spark achieves more stable speedup as the scale of the spatial data increases, and that it addresses problems that arise when computing on large-scale data.
Abstract
Density-based spatial clustering of applications with noise (DBSCAN) is a density-based clustering algorithm that can discover clusters of arbitrary shape, effectively distinguishes noise points, and naturally supports spatial databases. DBSCAN has been widely used in spatial data mining. This paper studies the design and implementation of a parallel DBSCAN algorithm on the Spark platform, and addresses the following problems that arise when processing large-scale data: the heavy computational load of the single-node algorithm; the low resource utilization of the multi-node algorithm; the long running time; and the lack of real-time responsiveness. The experimental results indicate that the proposed parallel algorithm achieves more stable speedup as the scale of the spatial data increases.
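To make the properties named in the abstract concrete, the following is a minimal sketch of the textbook sequential DBSCAN procedure in plain Python. It is not the paper's Spark-based parallel design, which the abstract does not describe in detail. The names `eps`, `min_pts`, `region_query`, and the sample points are illustrative assumptions, and the brute-force neighbor search is chosen for brevity rather than efficiency.

```python
from math import dist

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

In this sketch each `region_query` call scans every point, so the total cost grows quadratically with the data size. That single-node cost is the kind of computational load the abstract cites as motivation for distributing the work across a cluster.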


Citations
Journal Article

Field-Based High-Throughput Phenotyping for Maize Plant Using 3D LiDAR Point Cloud Generated With a "Phenomobile".

TL;DR: This paper mounts a LiDAR (Velodyne HDL64-S3) on a mobile robot, making the robot a “phenomobile,” and develops software for data collection and analysis under the Robot Operating System (ROS) using open source components and algorithm libraries.
Journal Article

Air quality predictions with a semi-supervised bidirectional LSTM neural network

TL;DR: A semi-supervised, LSTM-based model is proposed for multi-step PM2.5 prediction; it captured at least 70% of the explained variance in this study, demonstrating the feasibility of the model.
Journal Article

Sensor Reliability in Cyber-Physical Systems Using Internet-of-Things Data: A Review and Case Study

TL;DR: The results demonstrate the effectiveness of the proposed method for increasing sensor reliability in cyber-physical systems using Internet-of-Things data.
Journal Article

DENCAST: distributed density-based clustering for multi-target regression

TL;DR: The DENCAST system is proposed, a novel distributed algorithm implemented in Apache Spark, which performs density-based clustering and exploits the identified clusters to solve both single- and multi-target regression tasks (and thus, solves complex tasks such as time series prediction).
Proceedings Article

Theoretically-Efficient and Practical Parallel DBSCAN

TL;DR: This paper presents new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth).
References
Journal Article

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal Article

MapReduce: simplified data processing on large clusters

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
Journal Article

OPTICS: ordering points to identify the clustering structure

TL;DR: A new algorithm is introduced for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure.
Journal Article

A Survey of General-Purpose Computation on Graphics Hardware

TL;DR: This report describes, summarizes, and analyzes the latest research in mapping general‐purpose computation to graphics hardware.