scispace - formally typeset


Canopy clustering algorithm

About: Canopy clustering algorithm is a(n) research topic. Over the lifetime, 12087 publication(s) have been published within this topic receiving 339413 citation(s).
More filters

Journal ArticleDOI
J. A. Hartigan1, M. A. Wong1Institutions (1)

9,656 citations

Anil K. Jain1, Richard C. Dubes1Institutions (1)
01 Jan 1988

8,580 citations

Proceedings Article
Andrew Y. Ng1, Michael I. Jordan1, Yair Weiss2Institutions (2)
03 Jan 2001
TL;DR: A simple spectral clustering algorithm that can be implemented using a few lines of Matlab is presented, and tools from matrix perturbation theory are used to analyze the algorithm, and give conditions under which it can be expected to do well.
Abstract: Despite many empirical successes of spectral clustering methods— algorithms that cluster points using eigenvectors of matrices derived from the data—there are several unresolved issues. First. there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems.

8,315 citations

Journal ArticleDOI
Ulrike von Luxburg1Institutions (1)
Abstract: In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. On the first glance spectral clustering appears slightly mysterious, and it is not obvious to see why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.

8,197 citations

Journal ArticleDOI
Anil K. Jain1Institutions (1)
01 Jun 2010
TL;DR: A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.
Abstract: Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty in designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.

5,786 citations

Network Information
Related Topics (5)
Cluster analysis

146.5K papers, 2.9M citations

88% related
Feature extraction

111.8K papers, 2.1M citations

88% related
Fuzzy logic

151.2K papers, 2.3M citations

87% related
Feature (computer vision)

128.2K papers, 1.7M citations

87% related
Artificial neural network

207K papers, 4.5M citations

86% related
No. of papers in the topic in previous years