Author

Sanghamitra Bandyopadhyay

Bio: Sanghamitra Bandyopadhyay is an academic researcher at the Indian Statistical Institute. The author has contributed to research on the topics of cluster analysis and fuzzy clustering. The author has an h-index of 50 and has co-authored 360 publications receiving 13,375 citations. Previous affiliations of Sanghamitra Bandyopadhyay include the University of Maryland, Baltimore County, and Tsinghua University.


Papers
Journal ArticleDOI
01 Oct 2004
TL;DR: Multiobjective optimization (MOO) is integrated with variable-length chromosomes to develop a nonparametric genetic classifier that overcomes problems faced by single-objective classifiers, such as overfitting/overlearning and ignoring smaller classes.
Abstract: The concept of multiobjective optimization (MOO) has been integrated with variable-length chromosomes to develop a nonparametric genetic classifier that can overcome problems faced by single-objective classifiers, such as overfitting/overlearning and ignoring smaller classes. The classifier can efficiently approximate any kind of linear and/or nonlinear class boundaries of a data set using an appropriate number of hyperplanes. In designing the classifier, the aim is to simultaneously minimize the number of misclassified training points and the number of hyperplanes, and to maximize the product of class-wise recognition scores. The concepts of a validation set (in addition to training and test sets) and a validation functional are introduced in the multiobjective classifier for selecting a solution from the set of nondominated solutions provided by the MOO algorithm. This genetic classifier incorporates elitism and some domain-specific constraints in the search process, and is called the CEMOGA-Classifier (constrained elitist multiobjective genetic algorithm based classifier). Two new quantitative indices, namely purity and minimal spacing, are developed for evaluating the performance of different MOO techniques. These are used, along with classification accuracy, the required number of hyperplanes, and the computation time, to compare the CEMOGA-Classifier with other related classifiers.

186 citations
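
The abstract above describes three objectives (minimize misclassified training points, minimize the number of hyperplanes, maximize the product of class-wise recognition scores) and a set of nondominated solutions. As a minimal sketch of what "nondominated" means for those objectives, the following Python filters a list of candidate classifiers by Pareto dominance. The tuple layout, the function names `dominates` and `nondominated`, and the sample values are illustrative assumptions; this is not the CEMOGA-Classifier itself, and it omits the genetic search, elitism, constraints, and validation functional.

```python
from typing import List, Tuple

# Hypothetical layout for one candidate classifier:
# (misclassified training points, number of hyperplanes, product of class-wise scores)
Candidate = Tuple[int, int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one.

    The first two objectives are minimized; the third is maximized,
    so it is negated to turn everything into minimization.
    """
    av = (a[0], a[1], -a[2])
    bv = (b[0], b[1], -b[2])
    no_worse = all(x <= y for x, y in zip(av, bv))
    strictly_better = any(x < y for x, y in zip(av, bv))
    return no_worse and strictly_better


def nondominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up sample values, purely for illustration.
candidates = [(5, 3, 0.60), (4, 4, 0.70), (6, 3, 0.55), (4, 5, 0.70)]
front = nondominated(candidates)
print(front)
```

In this sample, the third candidate is dominated by the first and the fourth by the second, so two candidates remain on the front. Choosing one solution from that front is the role the abstract assigns to the validation set and validation functional.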

Journal ArticleDOI
TL;DR: The miRNAs involved in regulating the main protein cascades of the insulin signaling pathway that affect insulin resistance are discussed; the insights provide a better understanding of the impact of miRNAs on the insulin signaling pathway and on insulin resistance-associated diagnostics and therapeutics.
Abstract: The prevalence of type-2 diabetes (T2D) has increased significantly across the globe over the last decade. This heterogeneous and multifactorial disease, also known as insulin resistance, is caused by the disruption of the insulin signaling pathway. In this review, we discuss the existence of various miRNAs involved in regulating the main protein cascades in the insulin signaling pathway that affect insulin resistance. The influence of miRNAs (miR-7, miR-124a, miR-9, miR-96, miR-15a/b, miR-34a, miR-195, miR-376, miR-103, miR-107, and miR-146) in insulin secretion and beta (β) cell development has been well discussed. Here, we highlight the role of miRNAs in different significant protein cascades within the insulin signaling pathway, such as miR-320, miR-383, and miR-181b with IGF-1 and its receptor (IGF1R); miR-128a, miR-96, and miR-126 with insulin receptor substrate (IRS) proteins; miR-29, miR-384-5p, and miR-1 with phosphatidylinositol 3-kinase (PI3K); miR-143, miR-145, miR-29, miR-383, miR-33a/b, and miR-21 with AKT/protein kinase B (PKB); and miR-133a/b, miR-223, and miR-143 with glucose transporter 4 (GLUT4). Insulin resistance, obesity, and hyperlipidemia (high lipid levels in the blood) have a strong connection with T2D, and several miRNAs influence these clinical outcomes, such as miR-143, miR-103, miR-107, miR-29a, and miR-27b. We also corroborate from previous evidence how these interactions are related to insulin resistance and T2D. The insights highlighted in this review will provide a better understanding of the impact of miRNAs on the insulin signaling pathway and on insulin resistance-associated diagnostics and therapeutics for T2D.

184 citations

Journal ArticleDOI
TL;DR: Experimental results show that, compared with transmitting all the data to a central location and applying a conventional clustering algorithm, the proposed approach has a significantly smaller communication cost, while the accuracy of the obtained centroids is high and the number of incorrectly labeled samples is small.

179 citations

Journal ArticleDOI
TL;DR: The proposed GA with point symmetry (GAPS) distance based clustering algorithm is able to detect any type of clusters, irrespective of their geometrical shape and overlapping nature, as long as they possess the characteristic of symmetry.

164 citations

Journal ArticleDOI
TL;DR: A new symmetry-based genetic clustering algorithm is proposed which automatically evolves the number of clusters as well as the proper partitioning from a data set using a newly proposed PS-based cluster validity index, sym-index, as a measure of the validity of the corresponding partitioning.
Abstract: In this paper, a new symmetry-based genetic clustering algorithm is proposed which automatically evolves the number of clusters as well as the proper partitioning from a data set. Strings comprise both real numbers and the don't care symbol in order to encode a variable number of clusters. Here, assignment of points to different clusters is done based on a point symmetry (PS)-based distance rather than the Euclidean distance. A newly proposed PS-based cluster validity index, sym-index, is used as a measure of the validity of the corresponding partitioning. The algorithm is, therefore, able to detect both convex and nonconvex clusters irrespective of their sizes and shapes as long as they possess the symmetry property. Kd-tree-based nearest neighbor search is used to reduce the complexity of computing the PS-based distance. A proof of the convergence property of the variable string length genetic algorithm with PS-distance-based clustering (VGAPS-clustering) technique is also provided. The effectiveness of VGAPS-clustering compared to the variable string length genetic K-means algorithm (GCUK-clustering) and one recently developed weighted sum validity function-based hybrid niching genetic algorithm (HNGA-clustering) is demonstrated for nine artificial and five real-life data sets.

145 citations
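
The abstract above assigns points to clusters using a point symmetry (PS)-based distance but does not give its formula. The sketch below assumes one formulation: reflect the point through the candidate center, average the distances from that reflected point to its nearest data points, and multiply by the Euclidean distance to the center. That formula, the `knear=2` default, and the names `euclidean` and `ps_distance` are assumptions for illustration, not details taken from the abstract. The sketch also uses a brute-force neighbor search rather than the Kd-tree search the abstract mentions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(p: Point, q: Point) -> float:
    """Plain Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def ps_distance(x: Point, center: Point, data: Sequence[Point], knear: int = 2) -> float:
    """Assumed point-symmetry-based distance of x from a candidate center.

    1. Reflect x through the center: x* = 2 * center - x.
    2. Find the knear data points closest to x* (brute force here).
    3. Average those distances and scale by the Euclidean distance from x to the center.

    Assumes data is non-empty.
    """
    reflected = tuple(2 * c - xi for xi, c in zip(x, center))
    nearest = sorted(euclidean(reflected, p) for p in data)[:knear]
    symmetry_term = sum(nearest) / len(nearest)
    return symmetry_term * euclidean(x, center)


# Four points placed symmetrically around the origin (made-up data).
data = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
x = (1.0, 0.0)

d_true_center = ps_distance(x, (0.0, 0.0), data)
d_wrong_center = ps_distance(x, (2.0, 0.0), data)
print(d_true_center < d_wrong_center)
```

Under this assumed formulation, the distance is small when the reflection of a point lands near other data points, which is why a symmetric cluster of any shape can score well around its true center.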


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Journal ArticleDOI
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts, are illustrated.
Abstract: Data analysis plays an indispensable role in understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, such as proximity measures and cluster validation, are also discussed.

5,744 citations

Journal ArticleDOI
TL;DR: A new optimization algorithm based on the law of gravity and mass interactions is introduced and the obtained results confirm the high performance of the proposed method in solving various nonlinear functions.

5,501 citations