Author

Malay K. Pakhira

Bio: Malay K. Pakhira is an academic researcher from Kalyani Government Engineering College. The author has contributed to research in the topics of cluster analysis and fuzzy clustering, has an h-index of 7, and has co-authored 20 publications receiving 988 citations.

Papers
Journal ArticleDOI
TL;DR: A cluster validity index and its fuzzified version are described, each providing a measure of the goodness of clustering for different partitions of a data set, and results demonstrating the superiority of the PBM-index in determining the appropriate number of clusters are provided.
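For reference, the crisp PBM index for a K-partition is usually written as PBM(K) = ((1/K) · (E_1/E_K) · D_K)², where E_1 is the total scatter of the data around its global centroid, E_K is the sum of point-to-assigned-center distances, and D_K is the largest inter-center separation; the K that maximizes the index is taken as the number of clusters. A minimal NumPy sketch under that standard formulation (the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def pbm_index(X, labels, centers):
    """PBM cluster validity index; larger is better."""
    K = len(centers)
    # E1: scatter of the whole data set around its centroid (the K = 1 case)
    e1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()
    # EK: within-cluster scatter of the K-partition
    ek = sum(np.linalg.norm(X[labels == k] - centers[k], axis=1).sum()
             for k in range(K))
    # DK: maximum separation between any two cluster centers
    dk = max(np.linalg.norm(centers[i] - centers[j])
             for i in range(K) for j in range(i + 1, K))
    return ((e1 / (K * ek)) * dk) ** 2
```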

710 citations

Journal ArticleDOI
TL;DR: The effectiveness of the PBMF index as the optimization criterion, along with a genetic fuzzy partitioning technique, is demonstrated on a number of artificial and real data sets, including a remote-sensing image of the city of Kolkata.

199 citations

Journal ArticleDOI
TL;DR: An efficient partitional clustering technique, called SAKM-clustering, that integrates the power of simulated annealing for obtaining a minimum-energy configuration with the searching capability of the K-means algorithm is proposed in this article.
Abstract: An efficient partitional clustering technique, called SAKM-clustering, that integrates the power of simulated annealing for obtaining a minimum-energy configuration with the searching capability of the K-means algorithm is proposed in this article. The clustering methodology searches for appropriate clusters in multidimensional feature space such that a similarity metric of the resulting clusters is optimized. Data points are redistributed among the clusters probabilistically, so that points farther away from a cluster center have higher probabilities of migrating to other clusters than points closer to it. The superiority of the SAKM-clustering algorithm over the widely used K-means algorithm is extensively demonstrated for artificial and real-life data sets.
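The abstract does not give the exact update rule, but one plausible reading of the probabilistic redistribution is a Boltzmann-style soft assignment annealed toward hard K-means. A hypothetical NumPy sketch along those lines (the names, cooling schedule, and assignment rule are assumptions, not the authors' exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def sakm_sketch(X, k, t0=1.0, cooling=0.95, iters=100):
    """Hypothetical SA-flavored K-means loop (an illustrative reading
    of the abstract, not the published SAKM algorithm)."""
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    t = t0
    for _ in range(iters):
        # Distance of every point to every center: shape (n, k)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Boltzmann-style soft assignment: nearer centers are likelier,
        # and a high temperature t keeps the probabilities flat, so
        # far-away points migrate more readily than nearby ones.
        # Subtracting the row minimum avoids underflow at low t.
        p = np.exp(-(d - d.min(axis=1, keepdims=True)) / t)
        p /= p.sum(axis=1, keepdims=True)
        labels = np.array([rng.choice(k, p=row) for row in p])
        # Recompute the center of every non-empty cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
        t *= cooling  # cool toward near-deterministic K-means-like updates
    return labels, centers
```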

78 citations

Journal ArticleDOI
TL;DR: It is shown how VAT-based algorithms may be used for very efficient automatic determination of the number of clusters.

23 citations

Proceedings ArticleDOI
25 Jun 2005
TL;DR: A pipelined version of the commonly used genetic algorithm (PLGA) and a corresponding hardware platform are described; comparisons suggest that PLGA may be even more effective than parallel GAs, and a detailed description of a general function-evaluation unit is presented.
Abstract: Genetic algorithms (GAs) are very commonly used as function optimizers, primarily due to their search capability. A number of different serial and parallel versions of GA exist. In this paper, a pipelined version of the commonly used genetic algorithm and a corresponding hardware platform are described. The key to achieving pipelined execution of the different GA operations is a stochastic selection function that works with the fitness value of the candidate chromosome only. The modified algorithm is termed PLGA (Pipelined Genetic Algorithm). When executed in a CGA (Classical Genetic Algorithm) framework, the stochastic selection gives performance comparable to roulette-wheel selection. In the pipelined hardware environment, PLGA is much faster than the CGA: on similar hardware platforms, PLGA may attain a maximum speedup of four over CGA, and the speedup over a CGA executed on a uniprocessor system is much greater. A comparison of PLGA against PGAs (Parallel Genetic Algorithms) shows that PLGA may be even more effective than PGAs. A scheme for realizing the hardware pipeline is also presented, and since a general function-evaluation unit is essential, a detailed description of one such unit is given.
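The pipelining hinge is that the selection decision depends only on the candidate's own fitness, so no whole-population statistic (which roulette-wheel selection requires) stalls the pipeline stages. One illustrative way such a selection function could look; the running-maximum acceptance test here is an assumption, not the paper's exact rule:

```python
import random

class StochasticSelector:
    """Sketch of a selection rule that needs only the candidate's own
    fitness, so it can occupy one pipeline stage without waiting for
    population-wide statistics (hypothetical, for illustration)."""

    def __init__(self):
        self.f_max = 1e-9  # running estimate of the best fitness seen

    def select(self, fitness):
        self.f_max = max(self.f_max, fitness)
        # Accept with probability proportional to fitness relative to
        # the best seen so far; rejected chromosomes are simply retried
        return random.random() < fitness / self.f_max
```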

16 citations


Cited by
Journal ArticleDOI
TL;DR: This article evaluates the performance of three clustering algorithms, hard K-means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely the Davies-Bouldin index, Dunn's index, the Calinski-Harabasz index, and a recently developed index I.
Abstract: In this article, we evaluate the performance of three clustering algorithms, hard K-means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely the Davies-Bouldin index, Dunn's index, the Calinski-Harabasz index, and a recently developed index I. Based on a relation between index I and Dunn's index, a lower bound on the value of the former is theoretically estimated in order to obtain a unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets, with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used to properly partition the data into that number of clusters.
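Two of the four indices compared here ship with scikit-learn, which makes the basic experiment of scoring candidate partitions over a range of K easy to reproduce; Dunn's index and index I (closely related to the PBM index sketched above) are not in scikit-learn and would need custom code. A small example, assuming scikit-learn is available:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

# Score K-means partitions for K = 2..10 on a toy data set with two
# of the four validity indices compared in the paper
X, _ = make_blobs(n_samples=500, centers=4, random_state=1)
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    print(k,
          davies_bouldin_score(X, labels),      # lower is better
          calinski_harabasz_score(X, labels))   # higher is better
```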

1,247 citations

Journal ArticleDOI
TL;DR: A simulated annealing based multiobjective optimization algorithm that incorporates the concept of an archive in order to provide a set of trade-off solutions for the problem under consideration; it is found to be significantly superior for many-objective test problems.
Abstract: This paper describes a simulated annealing based multiobjective optimization algorithm that incorporates the concept of an archive in order to provide a set of trade-off solutions for the problem under consideration. To determine the acceptance probability of a new solution vis-a-vis the current solution, an elaborate procedure is followed that takes into account the domination status of the new solution with respect to the current solution as well as those in the archive. A measure of the amount of domination between two solutions is also used for this purpose. A complexity analysis of the proposed algorithm is provided. An extensive comparative study of the proposed algorithm with two other existing and well-known multiobjective evolutionary algorithms (MOEAs) demonstrates the effectiveness of the former with respect to five existing performance measures and several test problems of varying degrees of difficulty. In particular, the proposed algorithm is found to be significantly superior for many-objective test problems (e.g., problems with 4, 5, 10, and 15 objectives), while recent studies have indicated that Pareto ranking-based MOEAs perform poorly on such problems. In part of the investigation, the real-coded version of the proposed algorithm is compared with a very recent multiobjective simulated annealing algorithm, where the performance of the former is found to be generally superior to that of the latter.
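The acceptance machinery can be pictured as standard simulated annealing with the scalar energy gap replaced by an "amount of domination" between objective vectors. A rough sketch, where both the product-of-normalized-gaps measure and the sigmoidal acceptance form are modeled on published archived multiobjective SA formulations rather than quoted from this paper:

```python
import math

def amount_of_domination(f_a, f_b, ranges):
    """Product of normalized per-objective gaps over the objectives
    where the two solutions differ (an assumed form, for illustration)."""
    dom = 1.0
    for a, b, r in zip(f_a, f_b, ranges):
        if a != b:
            dom *= abs(a - b) / r
    return dom

def accept_dominated(delta_dom, temp):
    # Sigmoidal acceptance probability for a new solution dominated by
    # the current one: shrinks as the amount of domination grows or the
    # temperature falls (clamped to avoid math.exp overflow)
    return 1.0 / (1.0 + math.exp(min(delta_dom / temp, 700.0)))
```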

764 citations

Journal ArticleDOI
01 Jan 2008
TL;DR: Differential evolution has emerged as a fast, robust, and efficient global search heuristic of current interest and is applied here to the automatic clustering of large unlabeled data sets.
Abstract: Differential evolution (DE) has emerged as one of the fast, robust, and efficient global search heuristics of current interest. This paper describes an application of DE to the automatic clustering of large unlabeled data sets. In contrast to most existing clustering techniques, the proposed algorithm requires no prior knowledge of the data to be classified; rather, it determines the optimal number of partitions of the data "on the run." The superiority of the new method is demonstrated by comparing it with two recently developed partitional clustering techniques and one popular hierarchical clustering algorithm. The partitional clustering algorithms are based on two powerful, well-known optimization algorithms, namely the genetic algorithm and particle swarm optimization. An interesting real-world application of the proposed method to automatic segmentation of images is also reported.
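Underneath the automatic cluster-number machinery sits the classic DE/rand/1/bin loop: differential mutation, binomial crossover, and greedy one-to-one selection. A self-contained sketch of that base step (the paper's chromosome encoding for a variable number of clusters is omitted here; `fitness` is any cost function to be minimized):

```python
import numpy as np

rng = np.random.default_rng(2)

def de_step(pop, fitness, f=0.8, cr=0.9):
    """One generation of DE/rand/1/bin over a population of real vectors."""
    n, d = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        # Three distinct donors, none equal to the target index i
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], 3,
                                replace=False)
        mutant = pop[r1] + f * (pop[r2] - pop[r3])  # differential mutation
        # Binomial crossover; force at least one gene from the mutant
        cross = rng.random(d) < cr
        cross[rng.integers(d)] = True
        trial = np.where(cross, mutant, pop[i])
        # Greedy selection keeps the better (lower-cost) of trial/target
        if fitness(trial) <= fitness(pop[i]):
            new_pop[i] = trial
    return new_pop
```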

700 citations

Journal ArticleDOI
01 Mar 2009
TL;DR: An up-to-date overview fully devoted to evolutionary algorithms for clustering, not limited to any particular kind of evolutionary approach, and comprising advanced topics such as multiobjective and ensemble-based evolutionary clustering.
Abstract: This paper presents a survey of evolutionary algorithms designed for clustering tasks. It tries to reflect the profile of this area by focusing more on those subjects that have been given more importance in the literature. In this context, most of the paper is devoted to partitional algorithms that look for hard clusterings of data, though overlapping (i.e., soft and fuzzy) approaches are also covered. The paper is original in two main respects. First, it provides an up-to-date overview that is fully devoted to evolutionary algorithms for clustering, is not limited to any particular kind of evolutionary approach, and comprises advanced topics like multiobjective and ensemble-based evolutionary clustering. Second, it provides a taxonomy that highlights some very important aspects in the context of evolutionary data clustering, namely: fixed or variable number of clusters; cluster-oriented or non-oriented operators; context-sensitive or context-insensitive operators; guided or unguided operators; binary, integer, or real encodings; and centroid-based, medoid-based, label-based, tree-based, or graph-based representations, among others. A number of references are provided that describe applications of evolutionary algorithms for clustering in different domains, such as image processing, computer security, and bioinformatics. The paper ends by addressing some important issues and open questions that can be the subject of future research.

690 citations

Proceedings ArticleDOI
05 Jun 2002
TL;DR: This work considers the question of whether there exists a simple and practical approximation algorithm for k-means clustering, and presents a local improvement heuristic based on swapping centers in and out that yields a (9+ε)-approximation algorithm.
Abstract: In k-means clustering we are given a set of n data points in d-dimensional space ℜ^d and an integer k, and the problem is to determine a set of k points in ℜ^d, called centers, that minimizes the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the extremely high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance. We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9+ε)-approximation algorithm. We show that the approximation factor is almost tight, by giving an example for which the algorithm achieves an approximation factor of (9−ε). To establish the practical value of the heuristic, we present an empirical study showing that, when combined with Lloyd's algorithm, this heuristic performs quite well in practice.
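The heuristic itself is simple to state: repeatedly try swapping one current center for one candidate point and keep any swap that lowers the k-means cost. A compact first-improvement sketch, assuming candidate centers are simply drawn from the data points (the paper's analysis uses a more careful candidate set and also considers multi-swaps):

```python
import numpy as np

def kmeans_cost(X, centers):
    # Mean squared distance from each point to its nearest center
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def single_swap(X, centers, candidates):
    """Swap one center for one candidate whenever doing so lowers the
    cost; repeat until no improving swap exists (a local optimum)."""
    centers = centers.astype(float).copy()
    improved = True
    while improved:
        improved = False
        for i in range(len(centers)):
            for c in candidates:
                trial = centers.copy()
                trial[i] = c
                if kmeans_cost(X, trial) < kmeans_cost(X, centers):
                    centers, improved = trial, True
    return centers
```

Each accepted swap strictly decreases the cost over a finite candidate set, so the loop terminates; the paper proves the resulting local optimum is within a factor 9+ε of optimal.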

639 citations