Journal ArticleDOI
Automated variable weighting in k-means type clustering
TL;DR: A new step is introduced into the k-means clustering process to iteratively update variable weights based on the current partition of the data, a formula for weight calculation is proposed, and a convergence theorem for the new clustering process is given.
Abstract: This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced into the k-means clustering process to iteratively update variable weights based on the current partition of the data, and a formula for weight calculation is proposed. A convergence theorem for the new clustering process is given. The variable weights produced by the algorithm measure the importance of variables in clustering and can be used for variable selection in data mining applications, where large and complex real data sets are often involved. Experimental results on both synthetic and real data show that the new algorithm outperforms standard k-means type algorithms in recovering clusters in data.
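The abstract's key idea, updating variable weights from the within-cluster dispersion of each variable under the current partition, can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes squared Euclidean distance, a user-chosen exponent `beta > 1` controlling weight sharpness, and illustrative names (`update_weights`, `labels`, `centers`). The inverse-dispersion form (variables that separate the clusters well receive larger weights) follows the general scheme the abstract describes.

```python
import numpy as np

def update_weights(X, labels, centers, beta=2.0):
    """One weight-update step of a weighted k-means style iteration.

    D[j] is the within-cluster dispersion of variable j under the
    current partition; each weight is inversely related to its
    variable's dispersion, so variables along which clusters are
    compact and well separated are weighted more heavily.
    """
    n_vars = X.shape[1]
    D = np.zeros(n_vars)
    for j in range(n_vars):
        # dispersion of variable j: squared distance of each point
        # to its assigned cluster center, summed over all points
        D[j] = np.sum((X[:, j] - centers[labels, j]) ** 2)
    D = np.maximum(D, 1e-12)              # guard against zero dispersion
    exponent = 1.0 / (beta - 1.0)
    ratios = (D[:, None] / D[None, :]) ** exponent  # ratios D_j / D_t
    return 1.0 / ratios.sum(axis=1)       # weights sum to 1
```

In a full clustering loop, this step would alternate with the usual assignment and center-update steps of k-means, with the weights entering the distance computation used for assignment.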
Citations
Journal ArticleDOI
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering
TL;DR: This survey tries to clarify the different problem definitions related to subspace clustering in general; the specific difficulties encountered in this field of research; the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and how several prominent solutions tackle different problems.
Journal ArticleDOI
Subspace clustering
TL;DR: The problems motivating subspace clustering are sketched, different definitions and usages of subspaces for clustering are described, and exemplary algorithmic solutions are discussed.
Journal ArticleDOI
A k-mean clustering algorithm for mixed numeric and categorical data
Amir Ahmad, Lipika Dey, et al.
TL;DR: A clustering algorithm based on the k-means paradigm that works well for data with mixed numeric and categorical features is presented, and a new cost function and distance measure based on the co-occurrence of values are proposed.
Journal ArticleDOI
An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data
TL;DR: This paper presents a new k-means type algorithm for clustering high-dimensional objects in sub-spaces that can generate better clustering results than other subspace clustering algorithms and is also scalable to large data sets.
Book ChapterDOI
Feature Selection for Clustering: A Review
TL;DR: Feature selection, one of the techniques practitioners use most to reduce dimensionality, is broadly categorized into four models: the filter model, wrapper model, embedded model, and hybrid model.
References
Some methods for classification and analysis of multivariate observations
TL;DR: The k-means algorithm partitions an N-dimensional population into k sets on the basis of a sample; the process, which generalizes the ordinary sample mean, is shown to give partitions that are reasonably efficient in the sense of within-class variance.
Proceedings ArticleDOI
Automatic subspace clustering of high dimensional data for data mining applications
TL;DR: CLIQUE, a clustering algorithm that satisfies the key requirements of data mining applications, is presented: the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, no presumption of any canonical data distribution, and insensitivity to the order of input records.