Author

R. Jothi

Bio: R. Jothi is an academic researcher from Pandit Deendayal Petroleum University. The author has contributed to research in topics: Cluster analysis & Minimum spanning tree. The author has an h-index of 4 and has co-authored 11 publications receiving 98 citations. Previous affiliations of R. Jothi include Indian Institute of Information Technology, Design and Manufacturing, Jabalpur & VIT University.

Papers
Journal ArticleDOI
TL;DR: A deterministic initialization algorithm for K-means (DK-means) is proposed by exploring a set of probable centers through a constrained bi-partitioning approach; it achieves improved results in terms of faster, more stable convergence and better cluster quality compared to other algorithms.
Abstract: Clustering has been widely applied in interpreting the underlying patterns in microarray gene expression profiles, and many clustering algorithms have been devised for this purpose. K-means is one of the most popular algorithms for gene data clustering due to its simplicity and computational efficiency. However, the K-means algorithm is highly sensitive to the choice of initial cluster centers; it easily gets trapped in a local optimum if the initial centers are chosen randomly. This paper proposes a deterministic initialization algorithm for K-means (DK-means) that explores a set of probable centers through a constrained bi-partitioning approach. The proposed algorithm is compared with classical K-means with random initialization, with improved K-means variants such as the K-means++ and MinMax algorithms, and with three deterministic initialization methods. Experimental analysis on gene expression datasets demonstrates that DK-means achieves faster, more stable convergence and better cluster quality than the other algorithms.
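
The initialization sensitivity described above is easy to reproduce. Here is a minimal sketch, using scikit-learn's built-in initializers rather than the paper's DK-means (which is not publicly packaged), that contrasts a single random seeding with k-means++ seeding; a deterministic initializer would make repeated runs identical by construction.

```python
# Minimal sketch: how seeding affects K-means convergence and quality.
# Uses scikit-learn's built-in initializers, not the paper's DK-means.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

for init in ("random", "k-means++"):
    # n_init=1 exposes the run-to-run variability of a single seeding;
    # production code would use a larger n_init or a deterministic start.
    km = KMeans(n_clusters=5, init=init, n_init=1).fit(X)
    print(init, "iterations:", km.n_iter_, "inertia:", round(km.inertia_, 2))
```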

45 citations

Journal ArticleDOI
TL;DR: This paper proposes an algorithm, MST-based clustering on a partition-based nearest neighbor graph, that reduces the computational overhead by using a centroid-based nearest neighbor rule, and proves that both the size of and the time to construct the graph (LNG) are O(n^(3/2)), an O(√n) factor improvement over the traditional algorithms.

35 citations

Journal ArticleDOI
TL;DR: A novel clustering algorithm using eigenanalysis on a Minimum Spanning Tree based neighborhood graph (E-MST), built from a similarity graph obtained from k′ rounds of MST (the k′-MST neighborhood graph), achieves improved clustering results.
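
One common way to build such a multi-round MST neighborhood graph is to compute an MST, remove its edges, and repeat, taking the union of the rounds. The sketch below follows that construction; the paper's exact E-MST procedure and its eigenanalysis step may differ.

```python
# Hedged sketch of a k'-round MST neighborhood graph: union of k' MSTs,
# each computed after removing the edges chosen in earlier rounds.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def k_mst_graph(X, k=2):
    d = squareform(pdist(X))           # dense pairwise distances
    union = np.zeros_like(d)
    for _ in range(k):
        mst = minimum_spanning_tree(d).toarray()
        union += mst + mst.T           # accumulate this round's tree edges
        d[mst > 0] = 0                 # zero entries = absent edges in scipy
        d[mst.T > 0] = 0
    return union                       # weighted adjacency of the k'-MST graph
```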

20 citations

Book ChapterDOI
13 May 2015
TL;DR: The proposed algorithms make use of a centroid-based nearest neighbor rule to generate a partition-based Local Neighborhood Graph (LNG), and it is proved that both the size of and the computational time to construct the graph (LNG) are O(n^(3/2)), an O(√n) factor improvement over the traditional algorithms.
Abstract: Minimum spanning tree (MST) based clustering algorithms have been employed successfully to detect clusters of heterogeneous nature. Given a dataset of n random points, most of the MST-based clustering algorithms first generate a complete graph G of the dataset and then construct the MST from G. The first step of the algorithm is the major bottleneck, taking O(n^2) time. This paper proposes two algorithms, MST-based clustering on K-means Graph and MST-based clustering on Bi-means Graph, for reducing the computational overhead. The proposed algorithms make use of a centroid-based nearest neighbor rule to generate a partition-based Local Neighborhood Graph (LNG). We prove that both the size of and the computational time to construct the graph (LNG) are O(n^(3/2)), which is an O(√n) factor improvement over the traditional algorithms. The approximate MST is constructed from the LNG in O(n^(3/2) lg n) time, which is asymptotically faster than O(n^2). The advantage of the proposed algorithms is that they do not require any parameter setting, which is a major issue in many of the nearest neighbor finding algorithms. Experimental results demonstrate that the computational time is reduced significantly while maintaining the quality of the clusters obtained from the MST.
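
As a rough illustration of the scheme the abstract outlines: partition the data, keep only local edges, and run an exact MST on the resulting sparse graph. The sketch below uses a plain K-means partition with one bridge per neighboring partition; the paper's actual LNG construction and its guarantees are more involved.

```python
# Simplified sketch of an approximate MST via a partition-based local
# neighborhood graph. Not the paper's exact LNG construction.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def approximate_mst(X):
    n = len(X)
    k = max(2, int(np.sqrt(n)))                 # ~sqrt(n) partitions
    km = KMeans(n_clusters=k, n_init=3).fit(X)
    parts = [np.where(km.labels_ == c)[0] for c in range(k)]
    g = lil_matrix((n, n))
    for idx in parts:                           # complete subgraph per partition
        d = cdist(X[idx], X[idx])
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                g[idx[a], idx[b]] = d[a, b]
    for c in range(k):                          # one bridge to the nearest partition
        order = np.argsort(
            np.linalg.norm(km.cluster_centers_ - km.cluster_centers_[c], axis=1))
        nb = order[1]
        d = cdist(X[parts[c]], X[parts[nb]])
        a, b = np.unravel_index(d.argmin(), d.shape)
        g[parts[c][a], parts[nb][b]] = d[a, b]
    # Returns a spanning forest if the bridges leave the graph disconnected.
    return minimum_spanning_tree(g.tocsr())
```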

12 citations

Proceedings ArticleDOI
14 Jun 2018
TL;DR: A comparative study on software fault prediction using the K-means clustering algorithm and its variants; results indicate that proper initial seed selection enables the K-means algorithm to effectively group the faulty modules.
Abstract: Software fault prediction is an important task in the software development process, enabling software practitioners to easily detect and rectify errors in modules or classes. Various fault prediction techniques have been studied in the past, and unsupervised learning methods such as clustering have drawn much attention in recent years. K-means is a well-known clustering algorithm that has been applied to various exploratory analyses, including software fault prediction. This paper provides a comparative study on software fault prediction using the K-means clustering algorithm and its variants. We use five software fault prediction datasets taken from the PROMISE repository to evaluate the prediction accuracy of the clustering algorithms. Experimental results indicate that proper initial seed selection enables the K-means algorithm to effectively group the faulty modules.
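
A typical unsupervised setup behind such studies: cluster module metrics into two groups and flag the cluster with larger metric values as fault-prone. The sketch below uses synthetic data standing in for PROMISE metrics (LOC, complexity, and the like); the labeling heuristic is a common convention, not necessarily the paper's.

```python
# Hedged sketch of unsupervised fault prediction with K-means on
# synthetic module metrics (stand-in for PROMISE datasets).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(180, 4))    # most modules: low metric values
faulty = rng.normal(3.0, 1.0, size=(20, 4))    # a few modules: high metric values
X = StandardScaler().fit_transform(np.vstack([clean, faulty]))

km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(X)
# Convention: the cluster whose centroid has larger mean metric values
# is treated as the fault-prone group.
fault_cluster = int(np.argmax(km.cluster_centers_.mean(axis=1)))
print("modules flagged fault-prone:", int((km.labels_ == fault_cluster).sum()))
```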

5 citations


Cited by
Journal ArticleDOI
TL;DR: A comparative analysis based on the modified Dunn Index and the silhouette validity ratio shows that the proposed initialization algorithm performs better than the other initialization algorithms.
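
The silhouette ratio mentioned here is straightforward to compute; a minimal sketch comparing two scikit-learn initializations follows (the modified Dunn Index is specific to the paper and not reproduced).

```python
# Minimal sketch: scoring K-means initializations with the silhouette index.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400, centers=4, random_state=1)
for init in ("random", "k-means++"):
    labels = KMeans(n_clusters=4, init=init, n_init=1, random_state=1).fit_predict(X)
    print(init, "silhouette:", round(silhouette_score(X, labels), 3))
```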

37 citations

Journal ArticleDOI
TL;DR: Two different versions of a new internal index for clustering validation use graphs that capture the structural characteristics of each cluster; the index shows a superior capacity to deal with datasets that present different configurations of variances, densities, geometries and levels of noise.
Abstract: This paper presents two different versions of a new internal index for clustering validation using graphs. These graphs capture the structural characteristics of each cluster. In this way, the new index overcomes the limitations of traditional indices based on statistical measurements, and it is effective on clusters of different shapes and sizes. These graphs are generated through an iterative process based on principal component analysis, which partitions the clusters into a configurable number of "sub-clusters". Then, a minimum spanning tree based on the centroids of each of these sub-clusters is built and used to estimate both the quality of the clusters and the distances between them. In particular, the quality of a cluster is defined in this paper as the level of "cohesion" among its sub-clusters. The difference between the two versions of the proposed index is how this level of "cohesion" is measured. Finally, a comparison of the performance of these two versions of the proposed index with a selected group of well-known internal indices is carried out. In these tests, the two versions of the index show a superior capacity to deal with datasets that present different configurations of variances, densities, geometries and levels of noise.
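
The building blocks the abstract describes can be sketched directly: split a cluster into sub-clusters along its first principal component, then measure cohesion as the total weight of an MST over the sub-cluster centroids. This is a rough reading of the construction, not the paper's index; the splitting policy and cohesion formula here are illustrative assumptions.

```python
# Rough sketch: PCA-based sub-cluster splitting plus an MST over the
# sub-cluster centroids as a cohesion proxy. Illustrative, not the paper's index.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA

def pca_split(points, n_subclusters=4):
    """Repeatedly bisect the largest piece at the median of its first PC."""
    pieces = [points]
    while len(pieces) < n_subclusters:
        i = max(range(len(pieces)), key=lambda j: len(pieces[j]))
        big = pieces.pop(i)
        if len(big) < 2:               # nothing left worth splitting
            pieces.append(big)
            break
        proj = PCA(n_components=1).fit_transform(big).ravel()
        med = np.median(proj)
        pieces += [big[proj <= med], big[proj > med]]
    return [p for p in pieces if len(p) > 0]

def cohesion(points):
    centroids = np.array([p.mean(axis=0) for p in pca_split(points)])
    mst = minimum_spanning_tree(squareform(pdist(centroids)))
    return mst.sum()                   # lower total edge weight = tighter cluster
```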

36 citations

Proceedings ArticleDOI
01 Dec 2016
TL;DR: This paper develops two variants of the bag of visual words (BOW and HOG-BOW), examines the use of gray and color information as well as different spatial pooling approaches, and modifies existing deep CNN architectures, AlexNet and GoogleNet.
Abstract: Most research in image classification has focused on applications such as face, object, scene and character recognition. This paper presents a comparative study between deep convolutional neural networks (CNNs) and bag of visual words (BOW) variants for recognizing animals. We developed two variants of the bag of visual words (BOW and HOG-BOW) and examined the use of gray and color information as well as different spatial pooling approaches. We combined the final feature vectors extracted from these BOW variants with a regularized L2 support vector machine (L2-SVM) to distinguish between classes within our datasets. We modified existing deep CNN architectures, AlexNet and GoogleNet, by reducing the number of neurons in each fully connected layer and in the last inception layer, for both scratch and pre-trained versions. Finally, we compared the existing CNN methods, our modified CNN architectures and the proposed BOW variants on our novel wild-animal dataset (Wild-Anim). The results show that the CNN methods significantly outperform the BOW techniques.
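
The CNN modification described, shrinking the fully connected layers of a pre-trained network before fine-tuning, looks roughly like the following torchvision sketch; the layer widths and class count are illustrative, not the paper's exact settings.

```python
# Hedged sketch: load a pre-trained AlexNet and replace its 4096-unit
# fully connected layers with narrower ones before fine-tuning.
import torch.nn as nn
from torchvision import models

num_classes = 5                        # e.g. five wild-animal categories
net = models.alexnet(weights="IMAGENET1K_V1")
net.classifier = nn.Sequential(        # AlexNet's conv features output 256*6*6
    nn.Dropout(),
    nn.Linear(256 * 6 * 6, 1024),      # 1024 instead of the original 4096
    nn.ReLU(inplace=True),
    nn.Dropout(),
    nn.Linear(1024, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, num_classes),
)
```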

36 citations
