Book Chapter

A cluster validity index for hard clustering

29 Apr 2012, pp. 168-174
TL;DR: The new cluster validity index is applied to the complete-link hierarchical clustering algorithm to determine the optimal number of clusters in data sets, and the obtained results confirm the very good performance of the proposed approach.
Abstract: This paper describes a new cluster validity index for well-separable clusters in data sets. Validity indices are necessary for many clustering algorithms to identify the naturally existing clusters correctly. In the presented method, the new cluster validity index is used to determine the optimal number of clusters in data sets. It has been applied to the complete-link hierarchical clustering algorithm. The new index is defined by finding large increments of the intercluster and intracluster distances while the clustering algorithm runs. The maximum value of the index simultaneously determines the optimal number of clusters in the given set. The obtained results confirm the very good performance of the proposed approach.
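The abstract does not state the formula of the proposed index, so the following is not the authors' method. It is a minimal sketch of the general idea under stated assumptions: run complete-link agglomerative clustering, record the distance at which each merge happens, and keep the partition that exists just before the largest increment in merge distance. The function names `complete_link_merges` and `pick_k_by_largest_jump`, and the largest-jump rule itself, are hypothetical stand-ins.

```python
import math


def complete_link_merges(points):
    """Agglomerative clustering with complete linkage (naive, for small inputs).

    Returns a list of (merge_distance, clusters_after_merge), one per merge.
    """
    clusters = [[i] for i in range(len(points))]
    history = []

    def linkage(a, b):
        # Complete link: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append((d, [sorted(c) for c in clusters]))
    return history


def pick_k_by_largest_jump(points):
    """Stand-in rule (an assumption, not the paper's index).

    The partition into k clusters is created by merge n-k-1 and destroyed
    by merge n-k; choose the k whose partition survives the largest
    increment in merge distance. Needs at least 3 points.
    """
    n = len(points)
    history = complete_link_merges(points)
    dists = [d for d, _ in history]
    best_k, best_gap = None, -1.0
    for k in range(2, n):
        gap = dists[n - k] - dists[n - k - 1]
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k, history[n - best_k - 1][1]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    k, partition = pick_k_by_largest_jump(data)
    print(k, partition)
```

The naive pairwise search is cubic or worse in the number of points, which is acceptable for illustration only; a library routine would be the usual choice for real data.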
Citations
Book Chapter
09 Jun 2013
TL;DR: A new method for determining the optimal number of well-separable clusters in data sets, which uses agglomerative hierarchical clustering and applies the modified RS cluster validity index.
Abstract: This paper describes a new method for determining the optimal number of well-separable clusters in data sets. Determining this parameter is necessary for many clustering algorithms to define the naturally existing clusters correctly. The presented method uses the idea of agglomerative hierarchical clustering and applies the modified RS cluster validity index. In the first phase of the method, clusters are created according to the idea of hierarchical clustering. Then the k-means algorithm is run for the optimal number of clusters. The method has been used for multidimensional data, and the obtained results confirm the very good performance of the proposed method.
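The second phase described in this abstract, running k-means once the number of clusters is fixed, can be sketched as follows. This is a plain Lloyd-style k-means, not the chapter's implementation; seeding it with the centroids of the hierarchical clusters is an assumption, and the function name `kmeans` and its parameters are hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means starting from the given initial centroids.

    Returns (labels, centroids). A cluster that loses all its points
    keeps its previous centroid.
    """
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    # Initial centroids would come from the hierarchical phase (assumed here).
    print(kmeans(data, [(0, 0), (10, 10)]))
```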

4 citations

Book Chapter
14 Jun 2015
TL;DR: The orthogonal series approach is presented, together with the conditions of convergence of the estimation algorithm when the variance of the noise grows as the number of observations tends to infinity.
Abstract: The article concerns the problem of estimating regression functions when the output is contaminated by additive nonstationary noise. We investigate the model \(y_i = R(\mathbf{x}_i) + Z_i,\ i = 1, 2, \ldots, n\), where the \(\mathbf{x}_i\) are assumed to be deterministic inputs (d-dimensional vectors), the \(y_i\) are scalar, probabilistic outputs, and \(Z_i\) is measurement noise with zero mean and variance depending on n. \(R(\cdot)\) is a completely unknown function. The problem of finding \(R(\cdot)\) may be solved by applying nonparametric methodology, for instance algorithms based on the Parzen kernel or algorithms derived from orthogonal series. In this work we present the orthogonal series approach. The analysis has been made for some class of nonstationarity. We present the conditions of convergence of the estimation algorithm when the variance of the noise grows as the number of observations tends to infinity. The results of numerical simulations are presented.
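The abstract does not give the estimator itself, so the sketch below only shows one common textbook form of an orthogonal series regression estimate, under assumptions that are not taken from the chapter: a one-dimensional input on [0, 1], the cosine basis, and equispaced midpoint design points. The names `basis`, `fit_coefficients`, and `estimate` are hypothetical.

```python
import math


def basis(j, x):
    """Cosine system on [0, 1]: g_0 = 1, g_j = sqrt(2) * cos(pi * j * x)."""
    return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(math.pi * j * x)


def fit_coefficients(xs, ys, num_terms):
    """Empirical Fourier coefficients a_j = (1/n) * sum_i y_i * g_j(x_i).

    Assumes the x_i are the midpoints (i + 0.5) / n, for which the basis
    is orthonormal under the discrete average when num_terms <= n.
    """
    n = len(xs)
    return [
        sum(y * basis(j, x) for x, y in zip(xs, ys)) / n
        for j in range(num_terms)
    ]


def estimate(coeffs, x):
    """Truncated series estimate of R(x): sum_j a_j * g_j(x)."""
    return sum(a * basis(j, x) for j, a in enumerate(coeffs))


if __name__ == "__main__":
    n = 50
    xs = [(i + 0.5) / n for i in range(n)]
    # Noise-free observations of R(x) = 2 + 3 cos(pi x), for illustration.
    ys = [2.0 + 3.0 * math.cos(math.pi * x) for x in xs]
    coeffs = fit_coefficients(xs, ys, num_terms=4)
    print(estimate(coeffs, 0.25))
```

With noisy outputs, the number of retained terms trades bias against variance; how it should grow with n under nonstationary noise is the kind of condition the chapter analyses, and it is not reproduced here.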

3 citations


Additional excerpts

  • ...[1], [2], [3], [4], [5], [15], [19], [20], [21], [32], [33], [34], [39], [40], [44], [49], [52], [54])....


References
Book
01 Jan 1973

20,541 citations


"A cluster validity index for hard c..." refers background in this paper

  • ...They make it possible to determine two major properties of clusters, their separability and compactness [2,5]....


Journal Article
TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.
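The comparison of tightness and separation behind a silhouette can be sketched with the usual per-point definition s(i) = (b - a) / max(a, b), where a is the mean distance from point i to the other members of its own cluster and b is the smallest mean distance to any other cluster. The function name `silhouette_values` is hypothetical, and the code assumes Euclidean distance and at least two clusters.

```python
import math


def silhouette_values(points, labels):
    """Per-point silhouette values; points in singleton clusters get 0."""
    values = []
    for i, p in enumerate(points):
        sums, counts = {}, {}
        for j, q in enumerate(points):
            if j == i:
                continue
            sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:
            values.append(0.0)  # singleton cluster, by convention
            continue
        a = sums[own] / counts[own]  # tightness
        b = min(sums[c] / counts[c] for c in counts if c != own)  # separation
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(silhouette_values(data, [0, 0, 0, 1, 1, 1]))
```

Values close to 1 indicate points that sit well inside their cluster, and negative values indicate points that are on average closer to another cluster.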

14,144 citations


"A cluster validity index for hard c..." refers background in this paper

  • ..., Dunn’s index [3], Davies–Bouldin index [1], PBM index [11], RS index [5], SIL index [13]....


Journal Article
TL;DR: A measure is presented which indicates the similarity of clusters which are assumed to have a data density which is a decreasing function of distance from a vector characteristic of the cluster which can be used to infer the appropriateness of data partitions.
Abstract: A measure is presented which indicates the similarity of clusters which are assumed to have a data density which is a decreasing function of distance from a vector characteristic of the cluster. The measure can be used to infer the appropriateness of data partitions and can therefore be used to compare the relative appropriateness of various divisions of the data. The measure depends on neither the number of clusters analyzed nor the method of partitioning the data, and it can be used to guide a cluster-seeking algorithm.
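A minimal sketch of the commonly used form of this measure (the Davies-Bouldin index) follows, assuming Euclidean distance, cluster centroids as the characteristic vectors, and mean distance to the centroid as the within-cluster scatter. The abstract itself does not fix these choices, and the function name `davies_bouldin` is hypothetical.

```python
import math


def davies_bouldin(points, labels):
    """Average over clusters of the worst-case similarity ratio.

    For clusters i and j: R_ij = (S_i + S_j) / M_ij, where S is the mean
    distance of members to their centroid and M_ij is the distance between
    centroids. Lower values indicate a better partition.
    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {
        lab: tuple(sum(x) / len(members) for x in zip(*members))
        for lab, members in clusters.items()
    }
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }
    keys = list(clusters)
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(davies_bouldin(data, [0, 0, 0, 1, 1, 1]))
```

Because lower is better, a search over the number of clusters would pick the partition that minimizes this value.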

6,757 citations


"A cluster validity index for hard c..." refers background in this paper

  • ..., Dunn’s index [3], Davies–Bouldin index [1], PBM index [11], RS index [5], SIL index [13]....


Journal Article
01 Jan 1973
TL;DR: Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space; in both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function.
Abstract: Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space. In both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function. In the first case, the range of T consists largely of ordinary (i.e. non-fuzzy) partitions of X and the associated iteration scheme is essentially the well known ISODATA process of Ball and Hall. However, in the second case, the range of T consists mainly of fuzzy partitions and the associated algorithm is new; when X consists of k compact well separated (CWS) clusters, Xi , this algorithm generates a limiting partition with membership functions which closely approximate the characteristic functions of the clusters Xi . However, when X is not the union of k CWS clusters, the limi...

5,787 citations


"A cluster validity index for hard c..." refers background in this paper

  • ..., Dunn’s index [3], Davies–Bouldin index [1], PBM index [11], RS index [5], SIL index [13]....
