Topic

Fuzzy clustering

About: Fuzzy clustering is a research topic. Over its lifetime, 23,230 publications have been published within this topic, receiving 601,269 citations.

Papers

Journal Article

Clustering of the self-organizing map

Juha Vesanto¹, Esa Alhoniemi¹

Helsinki University of Technology¹

01 May 2000, IEEE Transactions on Neural Networks

TL;DR: The two-stage procedure (first using SOM to produce the prototypes, which are then clustered in the second stage) is found to perform well when compared with direct clustering of the data and to reduce the computation time.

Abstract: The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects the input space onto prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, similar units need to be grouped, i.e., clustered, to facilitate quantitative analysis of the map and the data. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and of partitive clustering using K-means is investigated. The two-stage procedure (first using SOM to produce the prototypes, which are then clustered in the second stage) is found to perform well when compared with direct clustering of the data and to reduce the computation time.
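
The two-stage procedure described above can be sketched as follows, assuming NumPy. A small rectangular SOM is trained with online updates, and its prototypes are then partitioned with K-means, one of the two second-stage options the abstract names. The grid size, the linearly decaying learning rate, the Gaussian neighbourhood, and the helper names `train_som`, `kmeans`, and `two_stage_cluster` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def train_som(data, grid_h, grid_w, n_iter=2000, lr0=0.5, seed=0):
    """Online SOM training (illustrative schedule); returns one prototype per map unit."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Grid position of every map unit, used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)], dtype=float)
    # Initialise prototypes from randomly chosen data points.
    protos = data[rng.choice(n, grid_h * grid_w)].astype(float)
    sigma0 = max(grid_h, grid_w) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
        x = data[rng.integers(n)]
        bmu = np.argmin(((protos - x) ** 2).sum(axis=1))   # best-matching unit
        grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))          # Gaussian neighbourhood
        protos += lr * h[:, None] * (x - protos)
    return protos

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-means; returns a cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def two_stage_cluster(data, k, grid=(6, 6)):
    # Stage 1: SOM prototypes. Stage 2: cluster the prototypes, not the raw data.
    protos = train_som(data, *grid)
    proto_labels = kmeans(protos, k)
    # Each data point inherits the cluster of its best-matching prototype.
    bmu = ((data[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return proto_labels[bmu]
```

Because K-means runs on the 36 prototypes rather than on every data point, the second stage handles far fewer vectors than direct clustering would, which is where a computation-time saving like the one the abstract reports would come from.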

2,387 citations

Journal Article

Clustering of time series data - a survey

T. Warren Liao¹

Louisiana State University¹

01 Nov 2005, Pattern Recognition

TL;DR: This paper surveys and summarizes previous works that investigated the clustering of time series data in various application domains, including general-purpose clustering algorithms commonly used in time series clustering studies.

2,336 citations

Journal Article

Sparse Subspace Clustering: Algorithm, Theory, and Applications

Ehsan Elhamifar¹, René Vidal²

University of California, Berkeley¹, Johns Hopkins University²

01 Nov 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: In this article, a sparse subspace clustering algorithm is proposed to cluster high-dimensional data points that lie in a union of low-dimensional subspaces, where a sparse representation corresponds to selecting a few points from the same subspace.

Abstract: Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
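
The pipeline in this abstract (a sparse self-expressive representation of each point, followed by spectral clustering on the resulting affinity) can be sketched in NumPy as below. Assumptions: the equality constraint is relaxed into a Lasso penalty with weight `lam` and solved by plain ISTA, points are normalised to unit length, and the spectral step uses the symmetric normalised Laplacian followed by k-means with farthest-first seeding. These are illustrative choices rather than the paper's solver, the helper names are hypothetical, and the noise, outlier, and missing-entry extensions the abstract mentions are omitted.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=2000):
    # ISTA for min_c 0.5 * ||y - A c||^2 + lam * ||c||_1.
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        c = soft_threshold(c - step * (A.T @ (A @ c - y)), step * lam)
    return c

def self_expressive_coefficients(X, lam):
    # Column i of C writes x_i as a sparse combination of the other columns; C[i, i] = 0.
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.array([j for j in range(n) if j != i])
        C[others, i] = lasso_ista(X[:, others], X[:, i], lam)
    return C

def kmeans_farthest_first(Y, k, n_iter=100):
    # Deterministic farthest-first seeding followed by Lloyd iterations.
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return labels

def sparse_subspace_clustering(X, k, lam=0.01):
    # X holds one data point per column.
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    C = self_expressive_coefficients(X, lam)
    W = np.abs(C) + np.abs(C).T                      # symmetric affinity matrix
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # normalised Laplacian
    _, vecs = np.linalg.eigh(L)                      # eigenvalues in ascending order
    Y = vecs[:, :k]                                  # k smallest eigenvectors
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans_farthest_first(Y, k)
```

On noise-free points drawn from two independent 2-D subspaces, the coefficients linking points from different subspaces should vanish at the Lasso optimum, so the affinity graph splits into one block per subspace and the spectral step recovers the grouping.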

2,298 citations

Journal Article

Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values

Zhexue Huang¹

Commonwealth Scientific and Industrial Research Organisation¹

01 Sep 1998, Data Mining and Knowledge Discovery

TL;DR: Two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values are presented and are shown to be efficient when clustering large data sets, which is critical to data mining applications.

Abstract: The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. We use the well known soybean disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on two real world data sets with half a million objects each show that the two algorithms are efficient when clustering large data sets, which is critical to data mining applications.
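
The k-modes steps named in this abstract (simple matching dissimilarity, modes in place of means, frequency-based mode updates) can be sketched in plain Python. This is a batch, Lloyd-style variant that draws the initial modes at random from the distinct objects; that initialisation and update order are simplifying assumptions and not necessarily those of the paper, and the function names are hypothetical.

```python
import random
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(objects, k, max_iter=100, seed=0):
    """Batch k-modes sketch; objects are tuples of categorical values."""
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(objects))  # avoid starting with two identical modes
    modes = [list(o) for o in rng.sample(distinct, k)]
    labels = [None] * len(objects)
    for _ in range(max_iter):
        # Assign each object to the mode with the fewest mismatches.
        new_labels = [min(range(k), key=lambda c: matching_dissimilarity(o, modes[c]))
                      for o in objects]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [o for o, lab in zip(objects, labels) if lab == c]
            if members:
                # Frequency-based update: most frequent category of each attribute.
                modes[c] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    # Clustering cost: total mismatches between objects and their cluster modes.
    cost = sum(matching_dissimilarity(o, modes[lab]) for o, lab in zip(objects, labels))
    return labels, modes, cost
```

For k-prototypes, the combined dissimilarity in Huang's formulation adds the squared Euclidean distance on the numeric attributes to a weight γ times the matching dissimilarity on the categorical attributes; the same assign-and-update loop then applies, with means for the numeric part and modes for the categorical part.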

2,289 citations

Journal Article

Clumpak: a program for identifying clustering modes and packaging population structure inferences across K

Naama M. Kopelman¹, Jonathan Mayzel¹, Mattias Jakobsson², Noah A. Rosenberg³, Itay Mayrose¹

Tel Aviv University¹, Uppsala University², Stanford University³

01 Sep 2015, Molecular Ecology Resources

TL;DR: Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology by automating the postprocessing of results of model‐based population structure analyses.

Abstract: The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present CLUMPAK (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, CLUMPAK identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software CLUMPP. Next, CLUMPAK identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in CLUMPP and simplifying the comparison of clustering results across different K values. CLUMPAK incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. CLUMPAK, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology.
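
The abstract says CLUMPAK separates replicate runs into modes with a Markov clustering algorithm applied to a similarity matrix between runs. A minimal NumPy sketch of generic Markov clustering (MCL: alternating expansion and inflation of a column-stochastic matrix until it stops changing) is shown below. The expansion power 2, inflation 2.0, added self-loops, the 1e-6 cut-off used to read off clusters, and the function name are assumptions; this is not CLUMPAK's code, and CLUMPAK's own processing of the CLUMPP similarity matrix is not reproduced.

```python
import numpy as np

def markov_clustering(similarity, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Generic MCL on a symmetric, non-negative similarity matrix (e.g. between runs)."""
    M = np.asarray(similarity, dtype=float) + np.eye(len(similarity))  # add self-loops
    M = M / M.sum(axis=0, keepdims=True)             # make columns stochastic
    for _ in range(max_iter):
        prev = M
        M = np.linalg.matrix_power(M, expansion)     # expansion: spread random-walk flow
        M = M ** inflation                           # inflation: boost strong flows
        M = M / M.sum(axis=0, keepdims=True)         # renormalise columns
        if np.abs(M - prev).max() < tol:
            break
    # Rows that keep non-negligible mass are attractors; each lists one cluster's members.
    clusters = []
    for row in M:
        members = tuple(int(j) for j in np.flatnonzero(row > 1e-6))
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Inflation is the parameter that controls granularity: larger values sharpen the flow faster and tend to split the runs into more, smaller modes.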

2,252 citations

Metrics

23,840 papers, 668,841 citations

No. of papers in the topic in previous years
Year	Papers
2023	186
2022	433
2021	456
2020	463
2019	587
2018	569
