Survey of clustering algorithms
Frequently Asked Questions
Q2. What term is used to describe the problems accompanying high-dimensional spaces?
The term “curse of dimensionality” was first used by Bellman to describe the exponential growth of complexity in multivariate function estimation under high dimensionality [28]. It is now generally used to describe the problems that accompany high-dimensional spaces [34], [132].
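One concrete symptom of the curse of dimensionality is distance concentration: as the dimension grows, the nearest and farthest points from a query end up at nearly the same distance. The sketch below assumes points drawn uniformly from the unit hypercube and Euclidean distance (illustrative choices, not taken from the survey) and measures the relative contrast (max - min) / min of the distances from one query point.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """(max - min) / min of Euclidean distances from one random query to n random points.

    Points are drawn uniformly from the unit hypercube [0, 1]^dim (an illustrative assumption).
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```

In low dimensions the contrast is large, so "nearest neighbor" is meaningful; in high dimensions it shrinks well below 1, which is one reason distance-based clustering degrades there.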
Q3. How are highly connected subgraphs (HCSs) found in graph-based clustering?
A minimum cut (mincut) procedure, which separates a graph by removing the minimum number of edges, is applied recursively to find the HCSs.
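A minimal sketch of recursive HCS clustering, assuming an unweighted, undirected graph stored as a dict of neighbor sets. It uses the usual definition that a graph with n vertices is highly connected when its edge connectivity exceeds n/2, and it picks the Stoer-Wagner algorithm as the mincut routine. That routine, the helper names (`min_cut`, `induced`, `hcs`), and the decision to drop singletons are assumptions of this example, not details from the text.

```python
from itertools import combinations


def min_cut(graph):
    """Stoer-Wagner global minimum cut of an unweighted, undirected graph.

    graph: dict mapping node -> set of neighbors (symmetric).
    Returns (cut_size, side), where side is the node set on one side of the cut.
    """
    w = {u: {v: 1 for v in nbrs} for u, nbrs in graph.items()}  # weights between super-nodes
    groups = {u: {u} for u in graph}  # original nodes merged into each super-node
    best = (float("inf"), set())
    while len(w) > 1:
        # Maximum-adjacency ordering: repeatedly add the most tightly connected node.
        start = next(iter(w))
        order = [start]
        conn = {v: w[start].get(v, 0) for v in w if v != start}
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for v, wt in w[nxt].items():
                if v in conn:
                    conn[v] += wt
        s, t = order[-2], order[-1]
        if cut_of_phase < best[0]:
            best = (cut_of_phase, set(groups[t]))
        # Merge the last-added node t into s.
        for v, wt in w[t].items():
            if v != s:
                w[s][v] = w[s].get(v, 0) + wt
                w[v][s] = w[v].get(s, 0) + wt
            del w[v][t]
        del w[t]
        groups[s] |= groups.pop(t)
    return best


def induced(graph, nodes):
    """Subgraph induced by the given node set."""
    return {u: graph[u] & nodes for u in nodes}


def hcs(graph):
    """Recursively split the graph by mincuts until every part is highly connected."""
    n = len(graph)
    if n < 2:
        return []  # singletons are not reported as clusters (assumption)
    cut, side = min_cut(graph)
    if cut > n / 2:
        return [set(graph)]  # edge connectivity exceeds n/2: an HCS
    rest = set(graph) - side
    return hcs(induced(graph, side)) + hcs(induced(graph, rest))


# Two 4-cliques joined by a single bridge edge (3, 4).
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
print(sorted(sorted(c) for c in hcs(graph)))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```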
Q4. What software package was used in this analysis?
The software package GeneCluster, developed by the Whitehead Institute/MIT Center for Genome Research (WICGR), was used in this analysis.
Q5. How can the computational complexity of genetic algorithm (GA)-based clustering be reduced?
One approach uses a nearest-neighbor algorithm to divide the data into small subsets before GA-based clustering is applied, which reduces the computational complexity.
Q6. How can the distance between two sequences be defined?
If comparing two sequences is regarded as transforming one into the other through a series of substitution, insertion, and deletion operations, the distance between them can be defined as the minimum number of operations required.
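The minimum number of substitutions, insertions, and deletions is the Levenshtein (edit) distance, which the standard dynamic program below computes. The sketch assumes a unit cost for every operation; practical biological sequence comparison usually replaces these unit costs with substitution scoring matrices and gap penalties.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # turning "" into b[:j] needs j insertions
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into "" needs i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the symbols match)
            ))
        prev = cur
    return prev[-1]


print(edit_distance("GATTACA", "GCATGCT"))
```

Keeping only the previous row makes the memory cost O(len(b)) while the time cost stays O(len(a) * len(b)).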
Q7. How is the posterior probability of assigning a data point to a cluster calculated?
Once the parameter vector has been determined, the posterior probability of assigning a data point to a cluster can be calculated directly with Bayes’s theorem.
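As an illustration, assume a one-dimensional Gaussian mixture whose parameter vector (mixing weights π_k, means, and variances) is already known. Bayes’s theorem then gives the posterior for cluster k as the weighted component density divided by the total mixture density, P(C_k | x) = π_k p(x | θ_k) / Σ_j π_j p(x | θ_j). The Gaussian form and the function names are assumptions of this sketch.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posteriors(x, weights, means, variances):
    """P(cluster k | x) via Bayes's theorem for a 1-D Gaussian mixture with known parameters."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]  # prior * likelihood
    evidence = sum(joint)  # p(x), the mixture density
    return [j / evidence for j in joint]


print(posteriors(0.5, [0.3, 0.7], [0.0, 2.0], [1.0, 0.5]))
```

Assigning x to the cluster with the largest posterior gives a hard partition; keeping the full vector gives a soft one.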
Q8. Why might fuzzy clustering help expose the relations between genes?
Since many genes display more than one function, fuzzy clustering, which allows a gene to belong to several clusters with different degrees of membership, may be more effective in exposing these relations [73].
Q9. What is the definition of a generalized projected cluster?
ORCLUS (arbitrarily ORiented projected CLUster generation) [2] defines a generalized projected cluster as a densely distributed subset of data objects in a subspace, along with a subset of vectors that represent the subspace.
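ORCLUS associates each cluster with the subspace in which its points are most tightly spread, namely the span of the eigenvectors of the cluster’s covariance matrix that have the smallest eigenvalues. The NumPy sketch below covers only that subspace-selection step for a single cluster, not the full ORCLUS procedure; the function name and the dimensionality `l` are illustrative assumptions. It also reports the projected energy, the mean squared distance of the points to their centroid within the chosen subspace.

```python
import numpy as np


def projected_subspace(points, l):
    """Return (basis, projected_energy) for one cluster.

    basis: columns are the l eigenvectors of the cluster covariance with the
    smallest eigenvalues, i.e. the directions in which the points spread least.
    """
    X = np.asarray(points, dtype=float)
    centered = X - X.mean(axis=0)
    cov = centered.T @ centered / len(X)
    _, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    basis = eigvecs[:, :l]
    # Projected energy: mean squared distance to the centroid inside the subspace.
    energy = float(np.mean(np.sum((centered @ basis) ** 2, axis=1)))
    return basis, energy


# Points near the line y = x: the least-spread direction is close to (1, -1) / sqrt(2).
cluster = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
basis, energy = projected_subspace(cluster, l=1)
print(basis.ravel(), energy)
```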
Q10. How can kernel-based methods avoid explicitly describing the nonlinear mapping?
By designing and calculating an inner-product kernel, one can avoid the time-consuming, and sometimes even infeasible, process of explicitly describing the nonlinear mapping and computing the corresponding points in the transformed space.
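A small check of the kernel idea: for 2-D inputs, the homogeneous polynomial kernel k(x, y) = (x · y)^2 equals the inner product of the explicit feature vectors φ(x) = (x1², √2·x1·x2, x2²). The kernel needs only one dot product in the input space, whereas the explicit route first builds the transformed points; for higher degrees and input dimensions the feature space grows quickly, which is where computing the kernel directly pays off. This particular kernel is chosen for illustration only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (no bias term)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2, computed in the input space."""
    return dot(x, y) ** 2


x, y = (2.0, 3.0), (1.0, 4.0)
print(poly_kernel(x, y), dot(phi(x), phi(y)))
```

The two printed values agree up to floating-point rounding, yet `poly_kernel` never constructs φ(x).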
Q11. Where are other constructive clustering algorithms described?
Several other constructive clustering algorithms, including FACS and plastic neural gas, are described in [223] and [232], respectively.
Q12. What does each entry of the gene expression matrix represent?
After the fluorescence intensities are normalized, the gene expression profiles are represented as a matrix $E = (e_{ij})$, where $e_{ij}$ is the expression level of the $i$th gene in the $j$th condition, tissue, or experimental stage.
Q13. What are some other neural network architectures used for clustering?
Most such architectures use prototype vectors to represent clusters; examples include the cluster detection and labeling network (CDL) [82], HEC [194], and SPLL [296].
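A generic winner-take-all competitive learning rule shows how prototype vectors can represent clusters: each input moves only its nearest prototype a small step toward itself, and a point's cluster is the index of its nearest prototype. This is a simplified sketch, not the update rule of CDL, HEC, or SPLL; the learning rate, epoch count, and initial prototypes are assumptions.

```python
import math


def assign(x, protos):
    """Cluster label of x = index of its nearest prototype."""
    return min(range(len(protos)), key=lambda j: math.dist(x, protos[j]))


def competitive_learning(data, prototypes, lr=0.1, epochs=20):
    """Winner-take-all update: only the prototype nearest to each input moves toward it."""
    protos = [list(p) for p in prototypes]  # copy so the caller's prototypes are untouched
    for _ in range(epochs):
        for x in data:
            winner = assign(x, protos)
            protos[winner] = [p + lr * (xi - p) for p, xi in zip(protos[winner], x)]
    return protos


data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (10.0, 10.0), (10.2, 10.0), (10.0, 10.2)]
protos = competitive_learning(data, [(1.0, 1.0), (9.0, 9.0)])
print(protos, assign((0.1, 0.1), protos))
```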
Q14. Which data set is popular for examining the performance of novel methods in pattern recognition and machine learning?
The iris data set [92] is one of the most popular data sets to examine the performance of novel methods in pattern recognition and machine learning.