Topic

# DBSCAN

About: DBSCAN is a research topic. Over the lifetime, 4111 publications have been published within this topic receiving 132420 citations. The topic is also known as: Density-based spatial clustering of applications with noise.

##### Papers published on a yearly basis

##### Papers

More filters

•

02 Aug 1996TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLAR-ANS.

Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLAR-ANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

17,056 citations

•

01 Jan 1996TL;DR: DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.

Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

14,297 citations

••

TL;DR: In this article, the authors present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches, and discuss the advantages and disadvantages of these algorithms.

Abstract: In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. On the first glance spectral clustering appears slightly mysterious, and it is not obvious to see why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.

9,141 citations

•

01 Jan 2013

TL;DR: This book discusses data mining through the lens of cluster analysis, which examines the relationships between data, clusters, and algorithms, and some of the techniques used to solve these problems.

Abstract: 1 Introduction 1.1 What is Data Mining? 1.2 Motivating Challenges 1.3 The Origins of Data Mining 1.4 Data Mining Tasks 1.5 Scope and Organization of the Book 1.6 Bibliographic Notes 1.7 Exercises 2 Data 2.1 Types of Data 2.2 Data Quality 2.3 Data Preprocessing 2.4 Measures of Similarity and Dissimilarity 2.5 Bibliographic Notes 2.6 Exercises 3 Exploring Data 3.1 The Iris Data Set 3.2 Summary Statistics 3.3 Visualization 3.4 OLAP and Multidimensional Data Analysis 3.5 Bibliographic Notes 3.6 Exercises 4 Classification: Basic Concepts, Decision Trees, and Model Evaluation 4.1 Preliminaries 4.2 General Approach to Solving a Classification Problem 4.3 Decision Tree Induction 4.4 Model Overfitting 4.5 Evaluating the Performance of a Classifier 4.6 Methods for Comparing Classifiers 4.7 Bibliographic Notes 4.8 Exercises 5 Classification: Alternative Techniques 5.1 Rule-Based Classifier 5.2 Nearest-Neighbor Classifiers 5.3 Bayesian Classifiers 5.4 Artificial Neural Network (ANN) 5.5 Support Vector Machine (SVM) 5.6 Ensemble Methods 5.7 Class Imbalance Problem 5.8 Multiclass Problem 5.9 Bibliographic Notes 5.10 Exercises 6 Association Analysis: Basic Concepts and Algorithms 6.1 Problem Definition 6.2 Frequent Itemset Generation 6.3 Rule Generation 6.4 Compact Representation of Frequent Itemsets 6.5 Alternative Methods for Generating Frequent Itemsets 6.6 FP-Growth Algorithm 6.7 Evaluation of Association Patterns 6.8 Effect of Skewed Support Distribution 6.9 Bibliographic Notes 6.10 Exercises 7 Association Analysis: Advanced Concepts 7.1 Handling Categorical Attributes 7.2 Handling Continuous Attributes 7.3 Handling a Concept Hierarchy 7.4 Sequential Patterns 7.5 Subgraph Patterns 7.6 Infrequent Patterns 7.7 Bibliographic Notes 7.8 Exercises 8 Cluster Analysis: Basic Concepts and Algorithms 8.1 Overview 8.2 K-means 8.3 Agglomerative Hierarchical Clustering 8.4 DBSCAN 8.5 Cluster Evaluation 8.6 Bibliographic Notes 8.7 Exercises 9 Cluster Analysis: Additional Issues and Algorithms 9.1 Characteristics of Data, Clusters, and Clustering Algorithms 9.2 Prototype-Based Clustering 9.3 Density-Based Clustering 9.4 Graph-Based Clustering 9.5 Scalable Clustering Algorithms 9.6 Which Clustering Algorithm? 9.7 Bibliographic Notes 9.8 Exercises 10 Anomaly Detection 10.1 Preliminaries 10.2 Statistical Approaches 10.3 Proximity-Based Outlier Detection 10.4 Density-Based Outlier Detection 10.5 Clustering-Based Techniques 10.6 Bibliographic Notes 10.7 Exercises Appendix A Linear Algebra Appendix B Dimensionality Reduction Appendix C Probability and Statistics Appendix D Regression Appendix E Optimization Author Index Subject Index

7,356 citations

••

01 Jun 2010TL;DR: A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.

Abstract: Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty in designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.

6,601 citations