
Showing papers on "Dunn index" published in 2013


Book ChapterDOI
10 Dec 2013
TL;DR: A kernel-based rough-fuzzy C-Means (KRFCM) algorithm is proposed and compared with RFCM using DB and D index values, computed with modified versions of the performance indexes in which the distance function is replaced by a kernel function.
Abstract: Data clustering has found its usefulness in various fields. Algorithms are mostly developed using Euclidean distance, but it has several drawbacks, which may be rectified by using a kernel distance formula. In this paper, we propose a kernel-based rough-fuzzy C-Means (KRFCM) algorithm and use modified versions of the performance indexes (DB and D) obtained by replacing the distance function with a kernel function. We provide a comparative analysis of RFCM with KRFCM by computing their DB and D index values. The analysis is based on both numerical and image datasets. The results establish that the proposed algorithm outperforms the existing one.
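
As a rough illustration of what "replacing the distance function with a kernel function" can mean, the sketch below computes the distance induced by a kernel in feature space, d(x, y) = sqrt(K(x, x) - 2K(x, y) + K(y, y)). The abstract does not say which kernel the authors use; the Gaussian kernel, the `sigma` bandwidth and the function names here are assumptions made only for this example.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel. The kernel choice and sigma are assumptions."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / (2.0 * sigma ** 2))

def kernel_distance(x, y, kernel=gaussian_kernel):
    """Distance induced by a kernel: sqrt(K(x,x) - 2K(x,y) + K(y,y))."""
    squared = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))
```

With a Gaussian kernel, K(x, x) = 1, so the induced distance is bounded above by sqrt(2) however far apart the points are in the input space. This bounded behaviour is one commonly cited reason for preferring it to the Euclidean distance when outliers are present.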

11 citations


Proceedings ArticleDOI
19 Jun 2013
TL;DR: Experimental results indicate that TriWClustering can find significant triclusters and provides a useful tool for cross-species gene regulation analysis.
Abstract: Many different biological data mining methods have been used in gene expression data analysis. A common method is two-way clustering, also called biclustering, which is used to identify the gene groups that behave similarly under a subset of experimental conditions. This paper introduces a novel approach called three-way clustering (TriWClustering) for cross-species gene regulation analysis, to mine coherent clusters named triclusters in three-dimensional (gene-condition-organism) gene expression datasets. The developed method has been applied to three different gene expression datasets obtained from NCBI's GEO data collection. Biological and statistical significance of the results are evaluated using Gene Ontology term enrichment analysis and the Dunn index (DI) metric, respectively. The experimental results indicate that TriWClustering can find significant triclusters and provides a useful tool for cross-species gene regulation analysis.

5 citations


Dissertation
01 Jul 2013
TL;DR: This research proposes a new pattern extraction scheme, TKC, which integrates a triangular kernel function with a local average density technique to improve the KNN-density-based clustering algorithm; the scheme was benchmarked against other well-known clustering methods.
Abstract: Multidimensional data refers to data that contains at least three attributes or dimensions. The availability of huge amounts of multidimensional data collected over the years has greatly challenged the ability to digest the data and to gain useful knowledge that would otherwise be lost. Clustering techniques have enabled the manipulation of this knowledge to gain interesting pattern analyses that could benefit the relevant parties. In this study, three crucial challenges in extracting patterns from multidimensional data are highlighted: the need for an efficient exploration method for pattern extraction given the dimension of huge multidimensional data, the need for better mechanisms to test and validate clustering results, and the need for more informative visualization to interpret the “best” clusters. Density-based clustering algorithms such as density-based spatial clustering of applications with noise (DBSCAN), density clustering (DENCLUE) and kernel fuzzy C-means (KFCM), which use a probabilistic similarity function, have been introduced in previous works to determine the number of clusters automatically. However, they have difficulties in dealing with clusters of different densities, shapes and sizes. In addition, they require many input parameters that are difficult to determine. Kernel-nearest-neighbor (KNN)-density-based clustering, including kernel-nearest-neighbor-based clustering (KNNClust), has been proposed to solve the problem of determining smoothing parameters for multidimensional data and to discover clusters with arbitrary shapes and densities. However, KNNClust has problems clustering data of different sizes. Therefore, this research proposes a new pattern extraction scheme, called TKC, that integrates a triangular kernel function and a local average density technique to improve the KNN-density-based clustering algorithm.
The improved scheme has been validated experimentally in two scenarios: using real multidimensional spatio-temporal data and using various classification datasets. Four different measurements were used to validate the clustering results: the Dunn and Silhouette indexes to assess quality, F-measure to evaluate the accuracy of the approach, an ANOVA test to analyze the cluster distribution, and processing time to measure efficiency. The proposed scheme was benchmarked against other well-known clustering methods including KNNClust, Iterative Local Gaussian Clustering (ILGC), basic k-means, KFCM, DBSCAN and DENCLUE. The results on the classification datasets demonstrated that TKC produced clusters with higher accuracy and more efficiently than the other clustering methods. In addition, the analysis of the results showed that the proposed TKC scheme is capable of handling multidimensional data, as validated by Silhouette and Dunn index values close to one, indicating reliable results.

3 citations


Journal ArticleDOI
19 Nov 2013
TL;DR: This work proposes an unsupervised image segmentation method using Rough-Fuzzy C-Means, a hybrid model that segments RGB images by reducing cluster centers using rough sets and the Fuzzy C-Means method, and compares the effectiveness of clustering methods using cluster validity indexes such as the DB Index, XB Index and Dunn Index.
Abstract: Image segmentation is the process of subdividing an image into its constituent parts and extracting the parts of interest, which are the objects. Colour image segmentation is emerging as a new area of research. It can solve many contemporary problems in medical imaging, mining and mineral imaging, bioinformatics, and material sciences. Naturally, colour image segmentation demands well-defined borders between the different objects in an image, so there is a fundamental demand for accuracy. The segmented regions or components should not be further away from the true object than one or a few pixels. There is therefore a need for an improved image segmentation technique that can segment different components precisely. Image data may have corrupted values due to the usual limitations or artifacts of imaging devices. Noisy data, data sparsity, and high dimensionality create difficulties in image pixel clustering. As a result, image pixel clustering becomes a harder problem than clustering other forms of data. Taking all the above considerations into account, we propose an unsupervised image segmentation method using Rough-Fuzzy C-Means, a hybrid model that segments RGB images by reducing cluster centers using rough sets and the Fuzzy C-Means method. We also compare the effectiveness of clustering methods such as Hard C Means (HCM), Fuzzy C Means (FCM), Fuzzy K Means (FKM) and Rough C Means (RCM) using cluster validity indexes such as the DB Index, XB Index and Dunn Index. A good clustering procedure should make the DB Index as low as possible, the Dunn Index high, and the XB Index low.
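
To make the "higher is better" reading of the Dunn Index concrete, here is a minimal sketch of one common formulation: the smallest distance between points in different clusters divided by the largest within-cluster diameter. Several variants of the index exist and the abstract does not say which one the authors use, so the formulation, the Euclidean distance and the name `dunn_index` are assumptions for illustration only.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of point tuples.

    Assumes at least two non-empty clusters and Euclidean distance.
    """
    # Largest distance between two points of the same cluster (diameter).
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point or duplicates
    return min_separation / max_diameter

well_separated = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
touching = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
print(dunn_index(well_separated), dunn_index(touching))
```

Compact, well-separated clusters give a larger value. Note that this ratio has no upper bound of 1 in general.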

3 citations


Journal ArticleDOI
TL;DR: The LTKC approach was found to be able to discover responsible clusters within fatal accident data, as shown by silhouette and Dunn index values close to 1.
Abstract: Accidents are an important concern of today's governments and societies, due to the high cost in human and economic resources involved. Data mining has been proven able to significantly help in improving traffic safety. Among the data mining tasks, clustering is the one most often applied to spatio-temporal data, especially traffic data. A number of traffic-related works have proposed different clustering techniques for mining spatio-temporal traffic accident data. However, some difficulties appear when analyzing these datasets, such as the size of the data, the lack of statistical evaluation methods, and the interpretation of valuable patterns. To address these problems, this paper proposes a clustering approach for mining spatio-temporal fatal accident data using the local triangular kernel clustering (LTKC) algorithm. LTKC is a kernel-density-based clustering algorithm that can determine the number of clusters automatically. We also propose three visualization techniques to interpret and present the optimal clustering result in an easily understood form. In the experimental results, the LTKC approach was found to be able to discover responsible clusters within fatal accident data, as shown by silhouette and Dunn index values close to 1. In addition, using the visual techniques, we can state that the resulting clusters were well separated and compact.

2 citations


Journal ArticleDOI
TL;DR: The experimental results, based on internal criteria for the quality of the clustered partitions, reveal that the kernel fuzzy c means clustering algorithm performs better than the fuzzy c means and k-means clustering methods.
Abstract: Segmentation of digital images plays a major role in computer visualization. It is used to extract meaningful objects that exist in the images. Region-based clustering is done to extract objects based on the colors present in the satellite images. The principle of clustering is to identify similar domains in a huge data set to produce an accurate representation of the image. In this paper, k-means, fuzzy c means and kernel fuzzy c means clustering algorithms are used to partition an image data set into a number of clusters. The images are clustered into four and six categories, for which the qualities of the images are compared through the internal criterion techniques Davies–Bouldin index and Dunn index. For this paper, experiments were carried out with more than 100 satellite images. Finally, the PASCO Satellite Ortho (PSO) satellite image was selected, which covers the areas around Mt Kaimondake in Kagoshima, Japan. The experimental results, based on the internal criteria for the quality of the clustered partitions, reveal that the kernel fuzzy c means clustering algorithm performs better than the fuzzy c means and k-means clustering methods.
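
For readers unfamiliar with the Davies–Bouldin index, the sketch below shows its standard form: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation over all other clusters, then average across clusters. Lower values indicate better partitions. The helper names and the use of Euclidean distance are assumptions for this illustration, not details taken from the paper.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of point tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def davies_bouldin_index(clusters):
    """Davies-Bouldin index for at least two clusters with distinct centroids."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k

print(davies_bouldin_index([[(0, 0), (0, 2)], [(4, 0), (4, 2)]]))
```

Comparing partitions with four and six clusters, as the abstract describes, would amount to calling such a function once per partition and preferring the lower value.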

Proceedings ArticleDOI
11 Apr 2013
TL;DR: The presented method parses the set of training data, consisting of normal and anomaly data, and separates the data into two clusters, each represented by its centroid: one for the normal observations and the other for the anomalies.
Abstract: In the present paper a 2-means clustering-based anomaly detection technique is proposed. The presented method parses the set of training data, consisting of normal and anomaly data, and separates the data into two clusters. Each cluster is represented by its centroid: one for the normal observations and the other for the anomalies. The paper also provides appropriate methods for clustering, training and detection of attacks. The performance of the presented methodology is evaluated with Recall, Precision and F1-measure. Performance is also measured with the Dunn index and the Davies-Bouldin index.
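
The detection step and the evaluation metrics mentioned above can be sketched as follows, assuming the two centroids have already been learned. The centroid values, the label convention (1 for anomaly, 0 for normal) and the function names are hypothetical and not taken from the paper.

```python
import math

def detect(point, normal_centroid, anomaly_centroid):
    """Label a point 1 (anomaly) if it is closer to the anomaly centroid."""
    to_normal = math.dist(point, normal_centroid)
    to_anomaly = math.dist(point, anomaly_centroid)
    return 1 if to_anomaly < to_normal else 0

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (anomaly) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical centroids and observations, for illustration only.
normal_c, anomaly_c = (0.0, 0.0), (10.0, 10.0)
points = [(1, 1), (9, 9), (8, 10), (2, 0), (6, 6)]
truth = [0, 1, 1, 0, 0]
predicted = [detect(p, normal_c, anomaly_c) for p in points]
print(precision_recall_f1(truth, predicted))
```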