Journal ArticleDOI
Data clustering: 50 years beyond K-means
Anil K. Jain
- Vol. 31, Iss: 8, pp 651-666
TLDR
A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.Abstract:
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty in designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.read more
Citations
More filters
Pattern Recognition and Machine Learning
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Journal ArticleDOI
Variational Mode Decomposition
TL;DR: This work proposes an entirely non-recursive variational mode decomposition model, where the modes are extracted concurrently and is a generalization of the classic Wiener filter into multiple, adaptive bands.
Journal ArticleDOI
Clustering by fast search and find of density peaks
Alex Rodriguez,Alessandro Laio +1 more
TL;DR: A method in which the cluster centers are recognized as local density maxima that are far away from any points of higher density, and the algorithm depends only on the relative densities rather than their absolute values.
Journal ArticleDOI
A Comprehensive Survey of Clustering Algorithms
Dongkuan Xu,Yingjie Tian +1 more
TL;DR: This review paper begins at the definition of clustering, takes the basic elements involved in the clustering process, such as the distance or similarity measurement and evaluation indicators, into consideration, and analyzes the clustered algorithms from two perspectives, the traditional ones and the modern ones.
Journal ArticleDOI
Living on the Edge: The Role of Proactive Caching in 5G Wireless Networks
TL;DR: In this article, a proactive caching mechanism is proposed to reduce peak traffic demands by proactively serving predictable user demands via caching at base stations and users' devices, and the results show that important gains can be obtained for each case study, with backhaul savings and a higher ratio of satisfied users.
References
More filters
Book
Using multivariate statistics
TL;DR: In this Section: 1. Multivariate Statistics: Why? and 2. A Guide to Statistical Techniques: Using the Book Research Questions and Associated Techniques.
Journal ArticleDOI
Maximum likelihood from incomplete data via the EM algorithm
Journal ArticleDOI
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Journal ArticleDOI
Latent dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI).