Bio: Sanghamitra Bandyopadhyay is an academic researcher at the Indian Statistical Institute. The author has contributed to research on cluster analysis and fuzzy clustering. The author has an h-index of 50 and has co-authored 360 publications, which have received 13,375 citations. Previous affiliations of Sanghamitra Bandyopadhyay include University of Maryland, Baltimore County and Tsinghua University.
Topics: Cluster analysis, Fuzzy clustering, Correlation clustering, CURE data clustering algorithm, k-medians clustering
01 Sep 2000-Pattern Recognition
TL;DR: The superiority of the GA-clustering algorithm over the commonly used K-means algorithm is extensively demonstrated for four artificial and three real-life data sets.
Abstract: A genetic algorithm-based clustering technique, called GA-clustering, is proposed in this article. The searching capability of genetic algorithms is exploited in order to search for appropriate cluster centres in the feature space such that a similarity metric of the resulting clusters is optimized. The chromosomes, which are represented as strings of real numbers, encode the centres of a fixed number of clusters. The superiority of the GA-clustering algorithm over the commonly used K-means algorithm is extensively demonstrated for four artificial and three real-life data sets.
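The chromosome encoding described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `decode` and `clustering_metric` are hypothetical, and the metric used here (sum of Euclidean distances from each point to its nearest encoded centre) is an assumption, since the abstract only says that a similarity metric is optimized.

```python
import math

def decode(chromosome, k, dim):
    """Split a flat list of k*dim real numbers into k centre vectors."""
    if len(chromosome) != k * dim:
        raise ValueError("chromosome length must equal k * dim")
    return [chromosome[i * dim:(i + 1) * dim] for i in range(k)]

def clustering_metric(chromosome, points, k, dim):
    """Assumed metric: total distance from each point to its nearest centre.

    A genetic algorithm would try to minimise this value (or maximise
    its reciprocal) over the population of chromosomes.
    """
    centres = decode(chromosome, k, dim)
    return sum(min(math.dist(p, c) for c in centres) for p in points)

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
# Two 2-D centres, (0, 1) and (10, 1), encoded as one string of reals.
print(clustering_metric([0.0, 1.0, 10.0, 1.0], points, k=2, dim=2))
```

Because the number of clusters is fixed, every chromosome has the same length, which keeps crossover and mutation on the real-valued string straightforward.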
TL;DR: This article evaluates the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn's index, Calinski-Harabasz index, and a recently developed index I.
Abstract: In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn's index, Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and the Dunn's index, a lower bound of the value of the former is theoretically estimated in order to get unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into the said number of clusters.
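As an illustration of how one of the validity indices named above scores a partition, here is a small sketch of Dunn's index in its usual textbook form: the smallest distance between points in different clusters divided by the largest cluster diameter. The function name `dunn_index` and the choice of Euclidean distance are assumptions made for this example; the article itself may use other variants.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Dunn's index for a list of clusters (each a non-empty list of points).

    Requires at least two clusters. Larger values indicate compact,
    well-separated clusters.
    """
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter if max_diameter > 0 else math.inf

print(dunn_index([[(0, 0), (0, 1)], [(5, 0), (5, 1)]]))
```

To choose the number of clusters, one would typically compute the index for partitions obtained with each candidate K and keep the K that gives the best value.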
01 Jun 2008-IEEE Transactions on Evolutionary Computation
TL;DR: A simulated annealing based multiobjective optimization algorithm is described that incorporates the concept of an archive in order to provide a set of tradeoff solutions for the problem under consideration; it is found to be significantly superior for many-objective test problems.
Abstract: This paper describes a simulated annealing based multiobjective optimization algorithm that incorporates the concept of archive in order to provide a set of tradeoff solutions for the problem under consideration. To determine the acceptance probability of a new solution vis-a-vis the current solution, an elaborate procedure is followed that takes into account the domination status of the new solution with the current solution, as well as those in the archive. A measure of the amount of domination between two solutions is also used for this purpose. A complexity analysis of the proposed algorithm is provided. An extensive comparative study of the proposed algorithm with two other existing and well-known multiobjective evolutionary algorithms (MOEAs) demonstrates the effectiveness of the former with respect to five existing performance measures, and several test problems of varying degrees of difficulty. In particular, the proposed algorithm is found to be significantly superior for many-objective test problems (e.g., 4, 5, 10, and 15 objective problems), while recent studies have indicated that the Pareto ranking-based MOEAs perform poorly for such problems. In a part of the investigation, comparison of the real-coded version of the proposed algorithm is conducted with a very recent multiobjective simulated annealing algorithm, where the performance of the former is found to be generally superior to that of the latter.
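The domination status and archive mentioned in the abstract above can be illustrated with a short sketch. It shows only standard Pareto dominance (for minimization) and a simple archive of non-dominated objective vectors; the paper's acceptance probability and its amount-of-domination measure are not reproduced here. The function names `dominates` and `update_archive` are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def update_archive(archive, candidate):
    """Offer a candidate to the archive and return the resulting archive.

    The candidate is rejected if an archived vector dominates or equals it;
    otherwise it is added and every vector it dominates is dropped.
    """
    if any(dominates(m, candidate) or m == candidate for m in archive):
        return archive
    return [m for m in archive if not dominates(candidate, m)] + [candidate]

archive = []
for objectives in [(2, 2), (1, 3), (3, 3), (1, 1)]:
    archive = update_archive(archive, objectives)
print(archive)
```

Keeping only mutually non-dominated vectors is what lets the archive serve as the set of tradeoff solutions returned at the end of a run.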
01 Mar 2004-Pattern Recognition
TL;DR: A cluster validity index and its fuzzification are described, which can provide a measure of goodness of clustering on different partitions of a data set, and results demonstrating the superiority of the PBM-index in appropriately determining the number of clusters are provided.
Abstract: In this article, a cluster validity index and its fuzzification are described, which can provide a measure of goodness of clustering on different partitions of a data set. The maximum value of this index, called the PBM-index, across the hierarchy provides the best partitioning. The index is defined as a product of three factors, maximization of which ensures the formation of a small number of compact clusters with large separation between at least two clusters. We have used both the k-means and the expectation maximization algorithms as underlying crisp clustering techniques. For fuzzy clustering, we have utilized the well-known fuzzy c-means algorithm. Results demonstrating the superiority of the PBM-index in appropriately determining the number of clusters, as compared to three other well-known measures, the Davies–Bouldin index, Dunn's index and the Xie–Beni index, are provided for several artificial and real-life data sets.
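The abstract above describes the index only as a product of three factors. A sketch of the crisp version follows, assuming the form in which the PBM-index is commonly stated: PBM(K) = ((1/K) * (E_1/E_K) * D_K)^2, where E_K is the total distance of points to their own cluster centres, E_1 is the same quantity with all data in a single cluster, and D_K is the largest distance between two cluster centres. This formula and the helper names are assumptions, not taken from the text above.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def pbm_index(clusters):
    """Crisp PBM-index (assumed form) for at least two clusters.

    Assumes at least one point differs from its cluster centre,
    so that the within-cluster total E_K is non-zero.
    """
    k = len(clusters)
    all_points = [p for c in clusters for p in c]
    grand_centre = centroid(all_points)
    centres = [centroid(c) for c in clusters]
    e_1 = sum(math.dist(p, grand_centre) for p in all_points)
    e_k = sum(math.dist(p, z) for c, z in zip(clusters, centres) for p in c)
    d_k = max(math.dist(a, b) for a, b in combinations(centres, 2))
    return ((1.0 / k) * (e_1 / e_k) * d_k) ** 2

good = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
poor = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
print(pbm_index(good), pbm_index(poor))
```

On this toy data the partition that groups nearby points scores far higher than the one that mixes them, which is the behaviour the index relies on when it is maximized over candidate partitions.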
15 Nov 2007-Information Sciences
TL;DR: A new diversity parameter has been used to ensure sufficient diversity amongst the solutions of the non-dominated fronts, while retaining at the same time the convergence to the Pareto-optimal front.
Abstract: In this article we describe a novel Particle Swarm Optimization (PSO) approach to multi-objective optimization (MOO), called Time Variant Multi-Objective Particle Swarm Optimization (TV-MOPSO). TV-MOPSO is made adaptive in nature by allowing its vital parameters (viz., inertia weight and acceleration coefficients) to change with iterations. This adaptiveness helps the algorithm to explore the search space more efficiently. A new diversity parameter has been used to ensure sufficient diversity amongst the solutions of the non-dominated fronts, while retaining at the same time the convergence to the Pareto-optimal front. TV-MOPSO has been compared with some recently developed multi-objective PSO techniques and evolutionary algorithms for 11 function optimization problems, using different performance measures.
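The idea of letting the inertia weight and acceleration coefficients change with iterations can be sketched as below. The linear schedule and the start and end values used in the example are assumptions chosen for illustration; the abstract above does not state the schedule used by TV-MOPSO. The velocity rule is the standard PSO update, and the function names are hypothetical.

```python
def linear_schedule(start, end, t, max_t):
    """Linearly interpolate a parameter from start (t = 0) to end (t = max_t)."""
    return start + (end - start) * t / max_t

def velocity_update(v, x, pbest, gbest, w, c1, c2, r1, r2):
    """Standard PSO velocity update for one particle.

    w is the inertia weight, c1 and c2 the acceleration coefficients,
    and r1, r2 random numbers in [0, 1] supplied by the caller.
    """
    return [
        w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
        for vi, xi, pb, gb in zip(v, x, pbest, gbest)
    ]

max_iterations = 100
for t in (0, 50, 100):
    # Hypothetical ranges: inertia shrinks while the social pull grows.
    w = linear_schedule(0.7, 0.4, t, max_iterations)
    c2 = linear_schedule(0.5, 2.5, t, max_iterations)
    print(t, round(w, 3), round(c2, 3))
```

A large inertia weight early on favours exploration, while a smaller one later favours refinement around good solutions.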
Thomas G. Dietterich
01 Dec 1996-ACM Computing Surveys
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
01 Jan 2014-Journal of management science
Alan Frieze
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, along with newer models, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.
01 May 2005-IEEE Transactions on Neural Networks
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts, are illustrated.
Abstract: Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.
01 Jun 2009-Information Sciences
TL;DR: A new optimization algorithm based on the law of gravity and mass interactions is introduced and the obtained results confirm the high performance of the proposed method in solving various nonlinear functions.
Abstract: In recent years, various heuristic optimization methods have been developed. Many of these methods are inspired by swarm behaviors in nature. In this paper, a new optimization algorithm based on the law of gravity and mass interactions is introduced. In the proposed algorithm, the searcher agents are a collection of masses which interact with each other based on the Newtonian gravity and the laws of motion. The proposed method has been compared with some well-known heuristic search methods. The obtained results confirm the high performance of the proposed method in solving various nonlinear functions.
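The interaction between masses described above can be sketched in simplified form. This example assumes minimization, derives normalized masses from fitness so that better agents are heavier, and computes each agent's acceleration from a Newtonian-style attraction toward the others. The random weighting, the decay of the gravitational constant, and other details of the published algorithm are deliberately omitted, and all names are hypothetical.

```python
import math

def normalized_masses(fitness):
    """Map fitness values (lower is better) to masses that sum to 1."""
    best, worst = min(fitness), max(fitness)
    if best == worst:
        return [1.0 / len(fitness)] * len(fitness)
    raw = [(worst - f) / (worst - best) for f in fitness]
    total = sum(raw)
    return [m / total for m in raw]

def accelerations(positions, masses, g=1.0, eps=1e-9):
    """Acceleration of each agent due to the attraction of all other agents."""
    result = []
    for i, xi in enumerate(positions):
        a = [0.0] * len(xi)
        for j, xj in enumerate(positions):
            if i == j:
                continue
            r = math.dist(xi, xj)
            for d in range(len(xi)):
                # Pull toward agent j, scaled by its mass and the distance.
                a[d] += g * masses[j] * (xj[d] - xi[d]) / (r + eps)
        result.append(a)
    return result

masses = normalized_masses([1.0, 3.0])
print(masses, accelerations([(0.0,), (2.0,)], masses))
```

In this two-agent example the worse agent is pulled toward the better one, while the better agent, attracted only by a zero-mass neighbour, stays put.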