Author

# Sanghamitra Bandyopadhyay

Other affiliations: University of Maryland, Baltimore County, Tsinghua University

Bio: Sanghamitra Bandyopadhyay is an academic researcher from Indian Statistical Institute. The author has contributed to research in topics: Cluster analysis & Fuzzy clustering. The author has an hindex of 50, co-authored 360 publications receiving 13375 citations. Previous affiliations of Sanghamitra Bandyopadhyay include University of Maryland, Baltimore County & Tsinghua University.

##### Papers published on a yearly basis

##### Papers

More filters

••

TL;DR: The superiority of the GA-clustering algorithm over the commonly used K-means algorithm is extensively demonstrated for four artificial and three real-life data sets.

1,337 citations

••

TL;DR: This article evaluates the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn's index, Calinski-Harabasz index, andA recently developed index I.

Abstract: In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn's index, Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and the Dunn's index, a lower bound of the value of the former is theoretically estimated in order to get unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into the said number of clusters.

1,247 citations

••

TL;DR: A simulated annealing based multiobjective optimization algorithm that incorporates the concept of archive in order to provide a set of tradeoff solutions for the problem under consideration that is found to be significantly superior for many objective test problems.

Abstract: This paper describes a simulated annealing based multiobjective optimization algorithm that incorporates the concept of archive in order to provide a set of tradeoff solutions for the problem under consideration. To determine the acceptance probability of a new solution vis-a-vis the current solution, an elaborate procedure is followed that takes into account the domination status of the new solution with the current solution, as well as those in the archive. A measure of the amount of domination between two solutions is also used for this purpose. A complexity analysis of the proposed algorithm is provided. An extensive comparative study of the proposed algorithm with two other existing and well-known multiobjective evolutionary algorithms (MOEAs) demonstrate the effectiveness of the former with respect to five existing performance measures, and several test problems of varying degrees of difficulty. In particular, the proposed algorithm is found to be significantly superior for many objective test problems (e.g., 4, 5, 10, and 15 objective problems), while recent studies have indicated that the Pareto ranking-based MOEAs perform poorly for such problems. In a part of the investigation, comparison of the real-coded version of the proposed algorithm is conducted with a very recent multiobjective simulated annealing algorithm, where the performance of the former is found to be generally superior to that of the latter.

764 citations

••

TL;DR: A cluster validity index and its fuzzification is described, which can provide a measure of goodness of clustering on different partitions of a data set, and results demonstrating the superiority of the PBM-index in appropriately determining the number of clusters are provided.

710 citations

••

TL;DR: A new diversity parameter has been used to ensure sufficient diversity amongst the solutions of the non-dominated fronts, while retaining at the same time the convergence to the Pareto-optimal front.

482 citations

##### Cited by

More filters

••

[...]

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

•

9,185 citations

••

[...]

TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.

Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

••

TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts are illustrated.

Abstract: Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.

5,744 citations

••

TL;DR: A new optimization algorithm based on the law of gravity and mass interactions is introduced and the obtained results confirm the high performance of the proposed method in solving various nonlinear functions.

5,501 citations