
Showing papers by Ulrike von Luxburg published in 2008


Journal Article (DOI)
TL;DR: In this paper, the authors investigate the consistency of the popular family of spectral clustering algorithms, which cluster data using the eigenvectors of graph Laplacian matrices.
Abstract: Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that, for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result, we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.
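To make the algorithm family under study concrete, here is a minimal Python sketch of normalized spectral clustering using the symmetrically normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2} (the Ng-Jordan-Weiss variant, one standard member of the normalized class). It assumes a precomputed symmetric, nonnegative similarity matrix W with strictly positive degrees; it illustrates the general recipe, not the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def normalized_spectral_clustering(W, k):
    """Minimal sketch: cluster n points given an (n, n) symmetric,
    nonnegative similarity matrix W with strictly positive row sums."""
    d = W.sum(axis=1)                                     # degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt  # normalized Laplacian
    # Eigenvectors for the k smallest eigenvalues (eigh sorts ascending)
    _, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :k]
    # Row-normalize the spectral embedding, then cluster it with k-means
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```

The unnormalized variant would instead use the eigenvectors of L = D - W; the paper's result is that the normalized eigenvectors converge to those of a limit operator under much weaker conditions as the sample size grows.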

204 citations


Proceedings Article
08 Dec 2008
TL;DR: This paper studies the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity, and finds that the limit expressions differ across graph types, for example the r-neighborhood graph versus the k-nearest neighbor graph.
Abstract: Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data is often not given in the form of a graph but in terms of similarity (or distance) values between points. In this case, a neighborhood graph is first constructed from the similarities between the points, and a graph clustering algorithm is then applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different from Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples showing that these differences can already be observed on toy and real data for rather small sample sizes.
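The contrast can be reproduced in a few lines. The Python sketch below (the helper ncut and the parameter choices k = 10 and r = 0.5 are illustrative, not taken from the paper) builds a symmetrized kNN graph and an r-neighborhood graph on the same sample and evaluates Ncut for the same fixed partition on both; the two values generally differ.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

def ncut(W, labels):
    """Normalized cut of a partition given a dense adjacency matrix W.
    Assumes every cluster has nonzero volume."""
    total = 0.0
    for c in np.unique(labels):
        in_c = labels == c
        cut = W[in_c][:, ~in_c].sum()   # edge weight leaving cluster c
        vol = W[in_c].sum()             # volume (total degree) of cluster c
        total += cut / vol
    return total

X = np.random.RandomState(0).randn(200, 2)   # toy sample
labels = (X[:, 0] > 0).astype(int)           # one fixed candidate partition

# Two graphs on the same points: kNN (symmetrized) vs. r-neighborhood.
W_knn = kneighbors_graph(X, n_neighbors=10, mode='connectivity').toarray()
W_knn = np.maximum(W_knn, W_knn.T)
W_r = radius_neighbors_graph(X, radius=0.5, mode='connectivity').toarray()

print("Ncut on kNN graph:           ", ncut(W_knn, labels))
print("Ncut on r-neighborhood graph:", ncut(W_r, labels))
```

Intuitively, the degrees of a kNN graph adapt to the local density while those of an r-neighborhood graph do not, so the two constructions weight regions of different density differently; this is the source of the differing limit expressions.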

173 citations


Posted Content
TL;DR: This article provides a gentle, non-technical overview of the key ideas and insights of statistical learning theory, aimed at a broad audience.
Abstract: Statistical learning theory provides the theoretical basis for many of today's machine learning algorithms. In this article we attempt to give a gentle, non-technical overview of the key ideas and insights of statistical learning theory. We target a broad audience, not necessarily machine learning researchers. This paper can serve as a starting point for people who want to get an overview of the field before diving into technical details.

7 citations