Topic

Correlation clustering

About: Correlation clustering is a research topic. Over its lifetime, 19,362 publications have been published within this topic, receiving 602,579 citations.

[Chart: papers published on a yearly basis, 1969 to 2023]

Papers

Journal Article

On Clustering Validation Techniques

Maria Halkidi¹, Yannis Batistakis¹, Michalis Vazirgiannis¹

Athens University of Economics and Business¹

02 Dec 2001

TL;DR: Introduces the fundamental concepts of clustering, surveys the widely known clustering algorithms in a comparative way, and illustrates the issues that recent algorithms under-address.

Abstract: Cluster analysis aims at identifying groups of similar objects and therefore helps to discover the distribution of patterns and interesting correlations in large data sets. It has been the subject of wide research since it arises in many application domains in engineering, business and social sciences. In recent years especially, the availability of huge transactional and experimental data sets and the arising requirements for data mining have created a need for clustering algorithms that scale and can be applied in diverse domains. This paper introduces the fundamental concepts of clustering and surveys the widely known clustering algorithms in a comparative way. Moreover, it addresses an important issue of the clustering process: the quality assessment of the clustering results. This is also related to the inherent features of the data set under concern. A review of clustering validity measures and approaches available in the literature is presented. Furthermore, the paper illustrates the issues that are under-addressed by recent algorithms and gives the trends in the clustering process.

2,643 citations

Proceedings Article

Constrained K-means Clustering with Background Knowledge

Kiri L. Wagstaff, Claire Cardie, Seth Rogers, Stefan Schrödl

28 Jun 2001

TL;DR: This paper demonstrates how the popular k-means clustering algorithm can be profitably modified to make use of information about the problem domain that is available in addition to the data instances themselves.

Abstract: Clustering is traditionally viewed as an unsupervised method for data analysis. However, in some cases information about the problem domain is available in addition to the data instances themselves. In this paper, we demonstrate how the popular k-means clustering algorithm can be profitably modified to make use of this information. In experiments with artificial constraints on six data sets, we observe improvements in clustering accuracy. We also apply this method to the real-world problem of automatically detecting road lanes from GPS data and observe dramatic increases in performance.

2,641 citations
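
The idea in the abstract above, k-means that respects background knowledge, can be sketched as follows. This is a minimal illustration, assuming the background knowledge takes the form of pairwise must-link and cannot-link constraints between point indices; the function names (`cop_kmeans`, `violates`), the seeding from the first k points, and the toy data are all hypothetical choices made for this sketch, not the authors' code.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violates(i: int, c: int, assign: List[Optional[int]],
             must: Sequence[Pair], cannot: Sequence[Pair]) -> bool:
    """True if placing point i in cluster c breaks a constraint against
    a point that has already been assigned in the current pass."""
    for a, b in must:
        other = b if a == i else a if b == i else None
        if other is not None and assign[other] is not None and assign[other] != c:
            return True
    for a, b in cannot:
        other = b if a == i else a if b == i else None
        if other is not None and assign[other] == c:
            return True
    return False


def cop_kmeans(points: Sequence[Point], k: int,
               must: Sequence[Pair] = (), cannot: Sequence[Pair] = (),
               iters: int = 50) -> Optional[List[int]]:
    """Constrained k-means sketch; returns labels, or None if a point
    has no cluster that satisfies its constraints."""
    centers = [points[j] for j in range(k)]  # assumption: first k points as seeds
    previous = None
    assign: List[Optional[int]] = []
    for _ in range(iters):
        assign = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest, take the first feasible one.
            ranked = sorted(range(k), key=lambda c: sq_dist(p, centers[c]))
            choice = next((c for c in ranked
                           if not violates(i, c, assign, must, cannot)), None)
            if choice is None:
                return None
            assign[i] = choice
        if assign == previous:
            break  # assignments stable
        previous = assign
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assign


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(cop_kmeans(points, 2))                    # unconstrained
print(cop_kmeans(points, 2, cannot=[(0, 1)]))   # points 0 and 1 must be separated
```

Because feasibility is checked greedily in point order, the result can depend on that order, and a `None` return only means this particular pass found no feasible assignment.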

Journal Article

How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis

Chris Fraley¹, Adrian E. Raftery¹

University of Washington¹

01 Jan 1998-The Computer Journal

TL;DR: The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model, and the EM result provides a measure of uncertainty about the associated classification of each data point.

Abstract: We consider the problem of determining the structure of clustered data, without prior knowledge of the number of clusters or any other information about their composition. Data are represented by a mixture model in which each component corresponds to a different cluster. Models with varying geometric properties are obtained through Gaussian components with different parametrizations and cross-cluster constraints. Noise and outliers can be modelled by adding a Poisson process component. Partitions are determined by the expectation-maximization (EM) algorithm for maximum likelihood, with initial values from agglomerative hierarchical clustering. Models are compared using an approximation to the Bayes factor based on the Bayesian information criterion (BIC); unlike significance tests, this allows comparison of more than two models at the same time, and removes the restriction that the models compared be nested. The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model. Moreover, the EM result provides a measure of uncertainty about the associated classification of each data point. Examples are given, showing that this approach can give performance that is much better than standard procedures, which often fail to identify groups that are either overlapping or of varying sizes and shapes.

2,576 citations
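
The model-selection step described in the abstract above (fit mixtures by EM, compare them by BIC, pick the best) can be illustrated with a deliberately small sketch. It assumes one-dimensional data, unconstrained per-component variances, and a quantile-based initialisation in place of the agglomerative hierarchical initialisation the abstract mentions; it omits the Poisson noise component. BIC is written here as 2·loglik − (number of parameters)·log n, so larger is better; other sources use the negated form. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def gmm_loglik_1d(xs: Sequence[float], k: int,
                  iters: int = 200, var_floor: float = 1e-3) -> float:
    """Fit a 1-D Gaussian mixture with k components by EM and
    return the log-likelihood at the last E-step."""
    n = len(xs)
    ordered = sorted(xs)
    # Assumption: deterministic start with means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    spread = max(sum((x - grand_mean) ** 2 for x in xs) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k
    loglik = -math.inf
    for _ in range(iters):
        # E-step: responsibilities of each component for each point.
        resp: List[List[float]] = []
        new_loglik = 0.0
        for x in xs:
            dens = [w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)  # guard against underflow
            new_loglik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor)  # floor keeps a component from collapsing on one point
        converged = abs(new_loglik - loglik) < 1e-9
        loglik = new_loglik
        if converged:
            break
    return loglik


def bic(xs: Sequence[float], k: int) -> float:
    """BIC with the larger-is-better sign convention."""
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return 2 * gmm_loglik_1d(xs, k) - n_params * math.log(len(xs))


def choose_k(xs: Sequence[float], candidates: Sequence[int]) -> int:
    """Pick the candidate number of components with the highest BIC."""
    return max(candidates, key=lambda k: bic(xs, k))


group_a = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
data = group_a + [x + 10.0 for x in group_a]
print(choose_k(data, [1, 2]))
```

The responsibilities computed in the E-step are the per-point classification uncertainty the abstract refers to: a point with responsibilities near 0.5/0.5 is ambiguous between two clusters.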

Journal Article

The global k-means clustering algorithm

Aristidis Likas, Nikos Vlassis, Jakob Verbeek

01 Feb 2003-Pattern Recognition

TL;DR: Presents the global k-means algorithm, an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N executions of the k-means algorithm from suitable initial positions.

2,544 citations
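
The incremental search summarised in the TL;DR above can be sketched as below. This is a minimal illustration under stated assumptions: N is taken to be the number of data points, every data point is tried as the starting position of the newly added center, and the best run by sum of squared errors is kept. The function names and toy data are hypothetical, and the sketch ignores any speed-ups.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(pts: Sequence[Point]) -> Point:
    return tuple(sum(v) / len(pts) for v in zip(*pts))


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           iters: int = 100) -> Tuple[List[Point], float]:
    """Plain Lloyd iterations from the given centers; returns (centers, SSE)."""
    centers = list(centers)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [centroid(g) if g else centers[j] for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


def global_kmeans(points: Sequence[Point], k: int) -> Tuple[List[Point], float]:
    """Grow the solution one center at a time, from 1 up to k centers."""
    centers, sse = kmeans(points, [centroid(points)])  # 1-cluster solution
    for _ in range(2, k + 1):
        best = None
        for x in points:  # N runs: each data point seeds the new center once
            cand, cand_sse = kmeans(points, centers + [x])
            if best is None or cand_sse < best[1]:
                best = (cand, cand_sse)
        centers, sse = best
    return centers, sse


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
centers, sse = global_kmeans(points, 3)
print(sorted(centers), sse)
```

The procedure is deterministic because it uses no random initialisation; the cost is N k-means runs for every added center.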

Journal Article

Robust Inference with Multi-way Clustering

A. Colin Cameron, Jonah B. Gelbach, Douglas L. Miller¹

University of California, Davis¹

01 Apr 2011-Journal of Business & Economic Statistics

TL;DR: The authors proposed a variance estimator for the OLS estimator as well as for nonlinear estimators such as logit, probit, and GMM that enables cluster-robust inference when there is two-way or multiway clustering that is nonnested.

Abstract: In this article we propose a variance estimator for the OLS estimator as well as for nonlinear estimators such as logit, probit, and GMM. This variance estimator enables cluster-robust inference when there is two-way or multiway clustering that is nonnested. The variance estimator extends the standard cluster-robust variance estimator or sandwich estimator for one-way clustering (e.g., Liang and Zeger 1986; Arellano 1987) and relies on similar relatively weak distributional assumptions. Our method is easily implemented in statistical packages, such as Stata and SAS, that already offer cluster-robust standard errors when there is one-way clustering. The method is demonstrated by a Monte Carlo analysis for a two-way random effects model; a Monte Carlo analysis of a placebo law that extends the state–year effects example of Bertrand, Duflo, and Mullainathan (2004) to two dimensions; and by application to studies in the empirical literature where two-way clustering is present.

2,542 citations
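
For the two-way case, the way the estimator described above builds on the one-way sandwich estimator can be written compactly. The notation is an assumption made for this sketch and does not come from the listing: G and H are the two clustering dimensions, G ∩ H is their intersection, X_c and û_c are the regressors and OLS residuals for cluster c, and finite-sample corrections are omitted.

```latex
% One-way cluster-robust (sandwich) variance for a clustering C:
\hat{V}_{C} = (X'X)^{-1}
  \Big( \sum_{c \in C} X_c' \hat{u}_c \hat{u}_c' X_c \Big)
  (X'X)^{-1}

% Two-way version: add the two one-way estimates and subtract the
% estimate for the intersection, which would otherwise be counted twice:
\hat{V}_{\text{two-way}} = \hat{V}_{G} + \hat{V}_{H} - \hat{V}_{G \cap H}
```

This is why the abstract notes that the method is easy to implement in packages that already offer one-way cluster-robust standard errors: only three one-way computations are needed.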

Network Information

Performance

Metrics

Papers: 19,926
Citations: 667,566

No. of papers in the topic in previous years
Year	Papers
2023	135
2022	410
2021	48
2020	40
2019	81
2018	180