scispace - formally typeset
Search or ask a question
Topic

Correlation clustering

About: Correlation clustering is a research topic. Over the lifetime, 19362 publications have been published within this topic receiving 602579 citations.


Papers
More filters
Journal ArticleDOI
02 Dec 2001
TL;DR: The fundamental concepts of clustering are introduced while it surveys the widely known clustering algorithms in a comparative way and the issues that are under-addressed by the recent algorithms are illustrated.
Abstract: Cluster analysis aims at identifying groups of similar objects and, therefore helps to discover distribution of patterns and interesting correlations in large data sets. It has been subject of wide research since it arises in many application domains in engineering, business and social sciences. Especially, in the last years the availability of huge transactional and experimental data sets and the arising requirements for data mining created needs for clustering algorithms that scale and can be applied in diverse domains. This paper introduces the fundamental concepts of clustering while it surveys the widely known clustering algorithms in a comparative way. Moreover, it addresses an important issue of clustering process regarding the quality assessment of the clustering results. This is also related to the inherent features of the data set under concern. A review of clustering validity measures and approaches available in the literature is presented. Furthermore, the paper illustrates the issues that are under-addressed by the recent algorithms and gives the trends in clustering process.

2,643 citations

Proceedings Article
28 Jun 2001
TL;DR: This paper demonstrates how the popular k-means clustering algorithm can be protably modied to make use of information about the problem domain that is available in addition to the data instances themselves.
Abstract: Clustering is traditionally viewed as an unsupervised method for data analysis. However, in some cases information about the problem domain is available in addition to the data instances themselves. In this paper, we demonstrate how the popular k-means clustering algorithm can be protably modied to make use of this information. In experiments with articial constraints on six data sets, we observe improvements in clustering accuracy. We also apply this method to the real-world problem of automatically detecting road lanes from GPS data and observe dramatic increases in performance.

2,641 citations

Journal ArticleDOI
TL;DR: The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model, and the EM result provides a measure of uncertainty about the associated classification of each data point.
Abstract: We consider the problem of determining the structure of clustered data, without prior knowledge of the number of clusters or any other information about their composition. Data are represented by a mixture model in which each component corresponds to a different cluster. Models with varying geometric properties are obtained through Gaussian components with different parametrizations and cross-cluster constraints. Noise and outliers can be modelled by adding a Poisson process component. Partitions are determined by the expectation-maximization (EM) algorithm for maximum likelihood, with initial values from agglomerative hierarchical clustering. Models are compared using an approximation to the Bayes factor based on the Bayesian information criterion (BIC); unlike significance tests, this allows comparison of more than two models at the same time, and removes the restriction that the models compared be nested. The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model. Moreover, the EM result provides a measure of uncertainty about the associated classification of each data point. Examples are given, showing that this approach can give performance that is much better than standard procedures, which often fail to identify groups that are either overlapping or of varying sizes and shapes.

2,576 citations

Journal ArticleDOI
TL;DR: The global k-means algorithm is presented which is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N executions of the k-Means algorithm from suitable initial positions.

2,544 citations

Journal ArticleDOI
TL;DR: The authors proposed a variance estimator for the OLS estimator as well as for nonlinear estimators such as logit, probit, and GMM that enables cluster-robust inference when there is two-way or multiway clustering that is nonnested.
Abstract: In this article we propose a variance estimator for the OLS estimator as well as for nonlinear estimators such as logit, probit, and GMM. This variance estimator enables cluster-robust inference when there is two-way or multiway clustering that is nonnested. The variance estimator extends the standard cluster-robust variance estimator or sandwich estimator for one-way clustering (e.g., Liang and Zeger 1986; Arellano 1987) and relies on similar relatively weak distributional assumptions. Our method is easily implemented in statistical packages, such as Stata and SAS, that already offer cluster-robust standard errors when there is one-way clustering. The method is demonstrated by a Monte Carlo analysis for a two-way random effects model; a Monte Carlo analysis of a placebo law that extends the state–year effects example of Bertrand, Duflo, and Mullainathan (2004) to two dimensions; and by application to studies in the empirical literature where two-way clustering is present.

2,542 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
92% related
Fuzzy logic
151.2K papers, 2.3M citations
89% related
Feature extraction
111.8K papers, 2.1M citations
88% related
Feature (computer vision)
128.2K papers, 1.7M citations
86% related
Artificial neural network
207K papers, 4.5M citations
85% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023135
2022410
202148
202040
201981
2018180