Journal ArticleDOI
Automated variable weighting in k-means type clustering
TL;DR: A new step is introduced into the k-means clustering process to iteratively update variable weights based on the current partition of the data, a formula for weight calculation is proposed, and a convergence theorem for the new clustering process is given.
Abstract: This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced into the k-means clustering process to iteratively update variable weights based on the current partition of the data, and a formula for weight calculation is proposed. A convergence theorem for the new clustering process is given. The variable weights produced by the algorithm measure the importance of variables in clustering and can be used for variable selection in data mining applications, where large and complex real data sets are often involved. Experimental results on both synthetic and real data show that the new algorithm outperforms standard k-means type algorithms in recovering clusters in data.
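The abstract's key idea, updating variable weights from the within-cluster dispersion of each variable under the current partition, can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes squared Euclidean distance, a user-chosen exponent `beta > 1` controlling weight sharpness, and illustrative names (`update_weights`, `labels`, `centers`). The inverse-dispersion form (variables that separate the clusters well receive larger weights) follows the general scheme the abstract describes.

```python
import numpy as np

def update_weights(X, labels, centers, beta=2.0):
    """One weight-update step of a weighted k-means style iteration.

    D[j] is the within-cluster dispersion of variable j under the
    current partition; each weight is inversely related to its
    variable's dispersion, so variables along which clusters are
    compact and well separated are weighted more heavily.
    """
    n_vars = X.shape[1]
    D = np.zeros(n_vars)
    for j in range(n_vars):
        # dispersion of variable j: squared distance of each point
        # to its assigned cluster center, summed over all points
        D[j] = np.sum((X[:, j] - centers[labels, j]) ** 2)
    D = np.maximum(D, 1e-12)              # guard against zero dispersion
    exponent = 1.0 / (beta - 1.0)
    ratios = (D[:, None] / D[None, :]) ** exponent  # ratios D_j / D_t
    return 1.0 / ratios.sum(axis=1)       # weights sum to 1
```

In a full clustering loop, this step would alternate with the usual assignment and center-update steps of k-means, with the weights entering the distance computation used for assignment.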
Citations
Journal ArticleDOI
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering
TL;DR: This survey tries to clarify the different problem definitions related to subspace clustering in general; the specific difficulties encountered in this field of research; the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and how several prominent solutions tackle different problems.
Journal ArticleDOI
Subspace clustering
TL;DR: The problems motivating subspace clustering are sketched, different definitions and usages of subspaces for clustering are described, and exemplary algorithmic solutions are discussed.
Journal ArticleDOI
A k-mean clustering algorithm for mixed numeric and categorical data
Amir Ahmad, Lipika Dey, et al.
TL;DR: A clustering algorithm based on the k-means paradigm that works well for data with mixed numeric and categorical features is presented, and a new cost function and distance measure based on the co-occurrence of values are proposed.
Journal ArticleDOI
An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data
TL;DR: This paper presents a new k-means type algorithm for clustering high-dimensional objects in sub-spaces that can generate better clustering results than other subspace clustering algorithms and is also scalable to large data sets.
Book ChapterDOI
Feature Selection for Clustering: A Review
TL;DR: Feature selection, one of the techniques practitioners use most to reduce dimensionality, is broadly categorized into four models: the filter model, wrapper model, embedded model, and hybrid model.
References
Some methods for classification and analysis of multivariate observations
TL;DR: The k-means algorithm partitions an N-dimensional population into k sets on the basis of a sample; the process, which generalizes the ordinary sample mean, is shown to give partitions that are reasonably efficient in the sense of within-class variance.
Proceedings ArticleDOI
Automatic subspace clustering of high dimensional data for data mining applications
TL;DR: CLIQUE, a clustering algorithm that satisfies the key requirements of data mining applications, is presented: the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, no presumption of any canonical data distribution, and insensitivity to the order of input records.