scispace - formally typeset
Topic

Dunn index

About: Dunn index is a research topic. Over the lifetime, 150 publications have been published within this topic receiving 24021 citations.


Papers
Book Chapter
01 Jan 2016
TL;DR: Experiments compare the behavior of these new indexes with usual cluster quality indexes based on Euclidean distance on different kinds of test datasets for which ground truth is available, clearly highlighting the superior accuracy and stability of the new method.
Abstract: This paper presents new cluster quality indexes which can be efficiently applied across a low-to-high dimensional range of data and which are tolerant to noise. These indexes rely on feature maximization, an alternative to the usual distributional measures based on entropy or the Chi-square metric, and to vector-based measures such as Euclidean distance or correlation distance. Experiments compare the behavior of these new indexes with usual cluster quality indexes based on Euclidean distance on different kinds of test datasets for which ground truth is available. This comparison clearly highlights the superior accuracy and stability of the new method.
05 Dec 2012
TL;DR: An evolutionary clustering technique is proposed that selects cluster centers directly from the data set. It speeds up fitness evaluation by precomputing a table of pairwise distances in advance, and by using a binary rather than a string representation to encode a variable number of cluster centers.
Abstract: An evolutionary clustering technique (ECT) is proposed that selects cluster centers directly from the data set. It speeds up fitness evaluation by precomputing a table that stores the distances between all pairs of data points, and by using a binary rather than a string representation to encode a variable number of cluster centers. The ECT is capable of properly clustering different data sets. The experimental results show that the ECT provides more stable clustering performance, both in the number of clusters found and in the clustering results, and requires less computational time than other GA-based clustering algorithms.
Key Words: Clustering Technique, Evolutionary Algorithms, Reproduction, Crossover, Mutation, Fitness, Cluster Validity
1. INTRODUCTION
Cluster analysis, also known as unsupervised learning, is one of the most useful methods for discovering groups in data. Clustering aims to organize a collection of data items into clusters such that objects within the same cluster have a high degree of similarity, while objects belonging to different clusters have a high degree of dissimilarity. Cluster analysis makes it possible to look at the properties of whole clusters instead of individual objects, a simplification that is useful when handling large amounts of data [1]. Some algorithms require certain parameters for clustering, such as the number of clusters and cluster shapes, as previous literature has stated [2]. Several non-GA-based clustering algorithms have been widely used, such as K-means, Fuzzy c-means, EM, etc. However, the number of clusters in a data set is not known in most real-life situations. None of these non-GA-based clustering algorithms can efficiently and automatically form natural groups from all the input patterns, especially when the number of clusters in the data set is large.
This is often due to a bad choice of initial cluster centers. Difficult problems such as these are referred to as unsupervised or non-parametric clustering, and are often dealt with by employing an evolutionary approach. Genetic algorithms (GAs) are the best-known evolutionary techniques [3], and a number of research articles have dealt with this method [4]. Among the GA-based clustering algorithms in the current literature, the GCUK (Genetic Clustering for Unknown K) method [5] is the most effective. However, its computational cost is very high because it uses a string representation (real-number encoding) for clusters, which requires a great deal of floating-point computation. In our work, the cluster centers are selected from the data set, and a binary representation is used to encode a variable number of cluster centers. In conventional GA-based clustering methods, the cluster mean is used as the center of a cluster, so the distance from every data point to its cluster center must be recomputed each time the fitness of a chromosome is evaluated; this repeated distance computation makes fitness evaluation quite time-consuming. Since our method selects cluster centers directly from the data set, it can construct a look-up table that stores the distances between all pairs of data points in advance. With this look-up table, the distances between all pairs of data points need to be evaluated only once throughout the entire evolution process. The question generally asked, in relation to the cluster validity problem, is whether the underlying assumptions of the clustering technique (cluster shapes, number of clusters, initial conditions, etc.) are satisfied for all of the input data sets.
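The look-up-table idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the data, the chromosome layout, and the sum-of-nearest-center-distances fitness are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(0, 1, (50, 2))  # hypothetical data set of 50 points

# Look-up table: all pairwise distances, computed once before evolution starts.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def fitness(chromosome):
    """Chromosome is a binary mask over data points: 1 means the point is a
    cluster center. Fitness here is the negated total distance of every point
    to its nearest chosen center, read straight from the look-up table, so no
    distances are recomputed during evolution."""
    centers = np.flatnonzero(chromosome)
    if len(centers) == 0:
        return -np.inf  # invalid chromosome: no centers selected
    return -D[:, centers].min(axis=1).sum()

# A chromosome selecting points 0 and 10 as the two cluster centers.
chrom = np.zeros(50, dtype=int)
chrom[[0, 10]] = 1
print(fitness(chrom))
```

Because centers are always actual data points, every distance the fitness needs is already a cell of `D`, which is the speed-up the abstract describes.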
In order to address this problem, several cluster validity measures such as the Dunn index, the XB index (Xie-Beni index), the BM index [6] and the DB index [7] have been proposed [8,9,10,11]. It is impossible to answer every question without prior knowledge of the data. However, we can look for measures that provide reasonable clustering results in terms of homogeneity within clusters and heterogeneity between clusters, as discussed above. Our experiments show that the Dunn index slows down the overall process, although it provides good results for strip-shaped clusters; the XB index performs poorly when the number of clusters is large; and the BM index tends to form two clusters for most of the data sets. The DB index, defined as a function of the ratio of the sum of the within-cluster scatter to the between-cluster separation, is shown to provide the most reasonable measure among all indices mentioned above. Therefore, we adopt the DB index to measure cluster validity in our experiments. The superiority of the proposed algorithm over other genetic clustering algorithms is demonstrated in the experimental results.
This paper is organized as follows: Section 2 describes how to implement a genetic algorithm. In Section 3, our proposed algorithm is introduced. Section 4 provides experimental results and comparisons with the GCUK method. Conclusions and directions for future research are given in Section 5.
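For reference, the Dunn index compared above is the ratio of the smallest inter-cluster distance to the largest intra-cluster diameter; higher values indicate compact, well-separated clusters. A minimal NumPy sketch (not taken from any of the papers on this page; the toy data is an assumption, and single-point clusters with zero diameter are not handled):

```python
import numpy as np

def dunn_index(X, labels):
    """Dunn index: min distance between points of different clusters,
    divided by the max pairwise distance within any single cluster."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    # Maximum diameter: largest pairwise distance inside any one cluster.
    diam = max(
        np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1).max()
        for c in clusters
    )
    # Minimum separation: smallest distance between points of distinct clusters.
    sep = min(
        np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return sep / diam

# Two well-separated 2-D clusters: each has diameter 1, separation 9.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(dunn_index(X, labels))  # 9.0
```

The pairwise-distance computation also makes the observation above concrete: evaluating the Dunn index costs O(n²) distances, which is why it slows down the overall process.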
Posted Content
TL;DR: In this article, a study of VMware ESXi 5.1 server has been carried out to find the optimal set of parameters which suggest usage of different resources of the server using feature selection algorithms.
Abstract: A study of a VMware ESXi 5.1 server has been carried out to find the optimal set of parameters that characterizes the usage of the server's different resources. Feature selection algorithms have been used to extract the optimum set of parameters from the data obtained from the VMware ESXi 5.1 server using the esxtop command. Multiple virtual machines (VMs) are running on the server. The K-means algorithm is used to cluster the VMs, and the goodness of each cluster is determined by the Davies-Bouldin index and the Dunn index. The best cluster is then identified using these indices, and its features are taken as the set of optimal parameters.
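The clustering-and-scoring step in this study can be sketched with scikit-learn, which ships a Davies-Bouldin implementation. The data below is a synthetic stand-in for the esxtop-derived VM metrics (rows are VMs, columns are resource-usage parameters); the real feature set is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

# Synthetic stand-in: two groups of VMs with distinct resource-usage profiles.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 4)), rng.normal(5, 0.5, (20, 4))])

# Cluster the VMs; a lower Davies-Bouldin score means more compact,
# better-separated clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
db = davies_bouldin_score(X, km.labels_)
print(f"Davies-Bouldin index: {db:.3f}")
```

In the paper's workflow the same scoring would be repeated per candidate partition, and the best-scoring cluster's features kept as the optimal parameter set.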
Proceedings Article
11 Apr 2013
TL;DR: The presented method parses the set of training data, consisting of normal and anomaly data, and separates it into two clusters, each represented by its centroid: one for the normal observations and the other for the anomalies.
Abstract: In the present paper a 2-means clustering-based anomaly detection technique is proposed. The presented method parses the set of training data, consisting of normal and anomaly data, and separates it into two clusters, each represented by its centroid: one for the normal observations and the other for the anomalies. The paper also provides appropriate methods for clustering, training and detection of attacks. The performance of the presented methodology is evaluated using recall, precision and F1-measure, and cluster quality is measured with the Dunn index and the Davies-Bouldin index.
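The 2-means detection scheme can be sketched as follows. The data is a synthetic stand-in (a dense "normal" group plus a small, distant anomalous group), and labeling the smaller cluster as anomalous is an assumption for the example, not necessarily the paper's rule for assigning cluster roles.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import precision_score, recall_score, f1_score

rng = np.random.default_rng(1)
# Synthetic training data: 95 normal points, 5 far-away anomalies.
normal = rng.normal(0, 1, (95, 3))
anomalies = rng.normal(8, 1, (5, 3))
X = np.vstack([normal, anomalies])
y_true = np.array([0] * 95 + [1] * 5)  # 0 = normal, 1 = anomaly

# 2-means: one centroid settles on the normal mass, the other on the anomalies.
km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
# Treat the smaller cluster as the anomalous one.
anom_cluster = np.argmin(np.bincount(km.labels_))
y_pred = (km.labels_ == anom_cluster).astype(int)

prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"precision={prec:.2f} recall={rec:.2f} F1={f1:.2f}")
```

These are exactly the three evaluation measures the abstract lists; the Dunn and Davies-Bouldin indices would additionally score the quality of the two-cluster partition itself.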
Proceedings Article
17 Mar 2009
TL;DR: The paper presents a software-radio-based reconfigurable receiver architecture capable of identifying key parameters of the incoming signal, such as symbol rate and constellation shape, and then using these parameters to adjust the functional blocks of the receiver and correctly demodulate the incoming signal.
Abstract: The paper presents a software-radio-based reconfigurable receiver architecture for identification of linear, bi-dimensional modulation techniques. The proposed architecture is capable of identifying key parameters of the incoming signal, such as symbol rate and constellation shape, and then using these parameters to adjust the functional blocks of the receiver and correctly demodulate the incoming signal. The K-means clustering algorithm, together with the Dunn index cluster validity measure, is used to identify the constellation shape. This receiver architecture can be applied to any linear, bi-dimensional modulation technique.
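The K-means-plus-Dunn-index idea for constellation identification can be sketched as follows: cluster the received I/Q samples for several candidate constellation sizes and keep the size whose partition scores best. The noisy-QPSK data, the candidate sizes, and the compact Dunn implementation are assumptions for this illustration, not details from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def dunn(X, labels):
    # Dunn index: min inter-cluster distance / max intra-cluster diameter.
    cs = [X[labels == k] for k in np.unique(labels)]
    diam = max(np.linalg.norm(c[:, None] - c[None, :], axis=-1).max() for c in cs)
    sep = min(np.linalg.norm(a[:, None] - b[None, :], axis=-1).min()
              for i, a in enumerate(cs) for b in cs[i + 1:])
    return sep / diam

rng = np.random.default_rng(2)
# Noisy QPSK symbols as I/Q pairs: four ideal points at (+-1, +-1).
ideal = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
X = ideal[rng.integers(0, 4, 400)] + rng.normal(0, 0.1, (400, 2))

# Cluster for each candidate constellation size; the true size (4) should
# give tight, well-separated clusters and so the highest Dunn index.
scores = {k: dunn(X, KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).labels_)
          for k in (2, 4, 8)}
print(max(scores, key=scores.get))
```

Too few clusters merge distinct constellation points (large diameters), and too many split a single point (tiny separations), so the Dunn index peaks at the true constellation size.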

Network Information
Related Topics (5)
Feature selection: 41.4K papers, 1M citations (70% related)
Support vector machine: 73.6K papers, 1.7M citations (69% related)
Genetic algorithm: 67.5K papers, 1.2M citations (68% related)
Cluster analysis: 146.5K papers, 2.9M citations (68% related)
Web service: 57.6K papers, 989K citations (66% related)
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2021  20
2020  28
2019  17
2018  13
2017  10
2016  11