Topic

CURE data clustering algorithm

About: CURE data clustering algorithm is a research topic. Over its lifetime, 13,766 publications have been published within this topic, receiving 461,296 citations.


Papers
Journal Article
02 Dec 2001
TL;DR: This paper introduces the fundamental concepts of clustering, surveys the widely known clustering algorithms in a comparative way, and illustrates the issues that are under-addressed by recent algorithms.
Abstract: Cluster analysis aims at identifying groups of similar objects and, therefore helps to discover distribution of patterns and interesting correlations in large data sets. It has been subject of wide research since it arises in many application domains in engineering, business and social sciences. Especially, in the last years the availability of huge transactional and experimental data sets and the arising requirements for data mining created needs for clustering algorithms that scale and can be applied in diverse domains. This paper introduces the fundamental concepts of clustering while it surveys the widely known clustering algorithms in a comparative way. Moreover, it addresses an important issue of clustering process regarding the quality assessment of the clustering results. This is also related to the inherent features of the data set under concern. A review of clustering validity measures and approaches available in the literature is presented. Furthermore, the paper illustrates the issues that are under-addressed by the recent algorithms and gives the trends in clustering process.

2,643 citations

Proceedings Article
28 Jun 2001
TL;DR: This paper demonstrates how the popular k-means clustering algorithm can be profitably modified to make use of information about the problem domain that is available in addition to the data instances themselves.
Abstract: Clustering is traditionally viewed as an unsupervised method for data analysis. However, in some cases information about the problem domain is available in addition to the data instances themselves. In this paper, we demonstrate how the popular k-means clustering algorithm can be profitably modified to make use of this information. In experiments with artificial constraints on six data sets, we observe improvements in clustering accuracy. We also apply this method to the real-world problem of automatically detecting road lanes from GPS data and observe dramatic increases in performance.

2,641 citations

Journal Article
TL;DR: The global k-means algorithm is presented, an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N executions of the k-means algorithm from suitable initial positions.

2,544 citations
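
The incremental procedure described in the global k-means entry above can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the usual reading of "suitable initial positions": the k-1 centers found so far are kept, and each of the N data points is tried in turn as the starting position of the new center. All function names (sq_dist, mean, sse, kmeans, global_kmeans) are hypothetical and chosen for this example only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(data: List[Point], centers: List[Point]) -> float:
    """Clustering error: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in data)


def kmeans(data: List[Point], centers: List[Point], max_iter: int = 100) -> List[Point]:
    """Plain Lloyd iterations started from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in data:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def global_kmeans(data: List[Point], k: int) -> List[Point]:
    """Add one center at a time, running k-means once per data point (assumed scheme)."""
    centers = [mean(data)]  # the 1-cluster solution is the data centroid
    for _ in range(2, k + 1):
        best, best_err = None, float("inf")
        for x in data:  # N executions of k-means for this cluster count
            candidate = kmeans(data, centers + [x])
            err = sse(data, candidate)
            if err < best_err:
                best, best_err = candidate, err
        centers = best
    return centers
```

Calling global_kmeans(data, k) with data given as a list of equal-length tuples returns k centers. Because no random initialization is involved, repeated runs on the same data give the same result, at the cost of N k-means runs for every added center.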

Journal Article
TL;DR: The authors proposed a variance estimator for the OLS estimator as well as for nonlinear estimators such as logit, probit, and GMM that enables cluster-robust inference when there is two-way or multiway clustering that is nonnested.
Abstract: In this article we propose a variance estimator for the OLS estimator as well as for nonlinear estimators such as logit, probit, and GMM. This variance estimator enables cluster-robust inference when there is two-way or multiway clustering that is nonnested. The variance estimator extends the standard cluster-robust variance estimator or sandwich estimator for one-way clustering (e.g., Liang and Zeger 1986; Arellano 1987) and relies on similar relatively weak distributional assumptions. Our method is easily implemented in statistical packages, such as Stata and SAS, that already offer cluster-robust standard errors when there is one-way clustering. The method is demonstrated by a Monte Carlo analysis for a two-way random effects model; a Monte Carlo analysis of a placebo law that extends the state–year effects example of Bertrand, Duflo, and Mullainathan (2004) to two dimensions; and by application to studies in the empirical literature where two-way clustering is present.

2,542 citations

Journal Article
TL;DR: This paper surveys and summarizes previous works that investigated the clustering of time series data in various application domains, including general-purpose clustering algorithms commonly used in time series clustering studies.

2,336 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
90% related
Support vector machine
73.6K papers, 1.7M citations
88% related
Fuzzy logic
151.2K papers, 2.3M citations
86% related
Feature extraction
111.8K papers, 2.1M citations
86% related
Artificial neural network
207K papers, 4.5M citations
84% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    93
2022    323
2021    7
2020    5
2019    19
2018    113