Author

Sanghamitra Bandyopadhyay

Bio: Sanghamitra Bandyopadhyay is an academic researcher from the Indian Statistical Institute. The author has contributed to research in topics: Cluster analysis & Fuzzy clustering. The author has an h-index of 50 and has co-authored 360 publications receiving 13375 citations. Previous affiliations of Sanghamitra Bandyopadhyay include University of Maryland, Baltimore County & Tsinghua University.


Papers
Journal Article
TL;DR: The problem of classifying an image into different homogeneous regions is viewed as the task of clustering the pixels in the intensity space and real-coded variable string length genetic fuzzy clustering with automatic evolution of clusters is used.
Abstract: The problem of classifying an image into different homogeneous regions is viewed as the task of clustering the pixels in the intensity space. Real-coded variable string length genetic fuzzy clustering with automatic evolution of clusters is used for this purpose. The cluster centers are encoded in the chromosomes, and the Xie-Beni index is used as a measure of the validity of the corresponding partition. The effectiveness of the proposed technique is demonstrated for classifying different landcover regions in remote sensing imagery. Results are compared with those obtained using the well-known fuzzy C-means algorithm.
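Note: the Xie-Beni index named in the abstract above can be illustrated with a short sketch. This is not the paper's implementation; it assumes the commonly used generalized form of the index (fuzzy within-cluster compactness divided by the number of points times the minimum squared distance between cluster centers) and standard fuzzy C-means memberships. The function names and the fuzzifier default `m=2.0` are illustrative assumptions. Lower values indicate a more compact, better separated partition.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy C-means memberships u[i][k] of point k in cluster i."""
    u = [[0.0] * len(points) for _ in centers]
    for k, p in enumerate(points):
        d = [sq_dist(p, c) for c in centers]
        zeros = [i for i, dist in enumerate(d) if dist == 0.0]
        if zeros:
            # The point sits exactly on one or more centers: share full membership.
            for i in zeros:
                u[i][k] = 1.0 / len(zeros)
            continue
        for i in range(len(centers)):
            # Distances are already squared, so the exponent is 1 / (m - 1).
            u[i][k] = 1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1.0))
                                for j in range(len(centers)))
    return u


def xie_beni(points, centers, m=2.0):
    """Xie-Beni validity index (assumed generalized form with exponent m)."""
    if len(centers) < 2:
        raise ValueError("the index needs at least two cluster centers")
    u = fcm_memberships(points, centers, m)
    compactness = sum(u[i][k] ** m * sq_dist(p, c)
                      for i, c in enumerate(centers)
                      for k, p in enumerate(points))
    separation = min(sq_dist(a, b) for a, b in combinations(centers, 2))
    return compactness / (len(points) * separation)


# Toy one-dimensional "pixel intensities" (made-up values, not from the paper).
pixels = [[10], [12], [11], [200], [202], [201]]
two_clusters = [[11], [201]]
three_clusters = [[11], [12], [201]]
print(xie_beni(pixels, two_clusters), xie_beni(pixels, three_clusters))
```

In a genetic clustering setting such as the one described, a chromosome would hold a candidate list of centers like `two_clusters`, and a function like `xie_beni` would score it; how the paper maps the index to fitness is not stated in the abstract.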

254 citations

Journal Article
02 Feb 2010-Silence
TL;DR: A cancer-miRNA network is developed by mining the literature of experimentally verified cancer-miRNA relationships, and it is found that 67% of the cancer types have at least two neighboring miRNAs showing downregulation, which is statistically significant (P < 10^-7, randomization test).
Abstract: MicroRNAs are a class of small noncoding RNAs that are abnormally expressed in different cancer cells. The molecular signature of miRNAs in different malignancies suggests that these are not only actively involved in the pathogenesis of human cancer but also have a significant role in patient survival. The differential expression patterns of specific miRNAs in a specific cancer tissue type have been reported in hundreds of research articles. However, limited attempts have been made to collate this multitude of information and obtain a global perspective of miRNA dysregulation in multiple cancer types. In this article a cancer-miRNA network is developed by mining the literature of experimentally verified cancer-miRNA relationships. This network throws up several new and interesting biological insights which were not evident in individual experiments, but become evident when studied in the global perspective. From the network a number of cancer-miRNA modules have been identified based on a computational approach to mine associations between cancer types and miRNAs. The modules that are generated based on these associations are found to have a number of common predicted target onco/tumor suppressor genes. This suggests a combinatorial effect of the module-associated miRNAs on target gene regulation in selective cancer tissues or cell lines. Moreover, neighboring miRNAs (groups of miRNAs that are located within 50 kb of genomic location) of these modules show similar dysregulation patterns, suggesting a common regulatory pathway. Besides this, neighboring miRNAs may also show similar dysregulation patterns (differentially coexpressed) in the cancer tissues. In this study, we found that 67% of the cancer types have at least two neighboring miRNAs showing downregulation, which is statistically significant (P < 10^-7, randomization test). A similar result is obtained for the neighboring miRNAs showing upregulation in a specific cancer type.
These results suggest that the neighboring miRNAs might be differentially coexpressed in cancer tissues compared with normal tissue types. Additionally, the cancer-miRNA network efficiently detects hub miRNAs dysregulated in many cancer types and identifies cancer-specific miRNAs. Depending on the expression patterns, it is possible to identify those hubs that have strong oncogenic or tumor suppressor characteristics. Limited work has been done towards revealing that a number of miRNAs can control commonly altered regulatory pathways. However, this becomes immediately evident from the analysis of cancer-miRNA relationships in the proposed network model. These raise many unaddressed issues in miRNA research that have not been reported previously. These observations are expected to have strong implications for cancer and may be useful for further research.

224 citations

Journal Article
TL;DR: This article has identified approximately 300 tissue-specific negative examples using a novel approach that involves expression profiling of both miRNAs and mRNAs, miRNA-mRNA structural interactions and seed-site conservation, which clearly establishes the effectiveness of the proposed approach of selecting the negative examples systematically.
Abstract: Motivation: Prediction of microRNA (miRNA) target mRNAs using machine learning approaches is an important area of research. However, most of the methods suffer from either high false positive or false negative rates. One reason for this is the marked deficiency of negative examples or miRNA-non-target pairs. Systematic identification of non-target mRNAs is still not addressed properly, and therefore, current machine learning approaches are compelled to rely on artificially generated negative examples for training. Results: In this paper we have identified ∼300 tissue-specific negative examples using a novel approach that involves expression profiling of both miRNAs and mRNAs, miRNA-mRNA structural interactions and seed-site conservation. The newly generated negative examples are validated with a pSILAC data set (Selbach et al., 2008), which confirms that the identified non-targets are indeed non-targets. These high-throughput tissue-specific negative examples and a set of experimentally verified positive examples are then used to build a system called TargetMiner, a support vector machine (SVM) based classifier. In addition to assessing the prediction accuracy on cross-validation experiments, TargetMiner has been validated with a completely independent experimental test data set. Our method outperforms 10 existing target prediction algorithms and provides a good balance between sensitivity and specificity that is not reflected in the existing methods. We achieve a significantly higher sensitivity and specificity of 69% and 67.8% based on a pool of 90 features, and 76.5% and 66.1% using a set of 30 selected features, on the completely independent test data set. In order to establish the effectiveness of the systematically generated negative examples, the SVM is trained using a different set of negative data generated using the method of Yousef et al. (2007).
A significantly higher false positive rate (70.6%) is observed when tested on the independent set, while all other factors are kept the same. Again, when an existing method (NBmiRTar) is executed with our proposed negative data, we observe an improvement in its performance. These clearly establish the effectiveness of the proposed approach of selecting the negative examples systematically. Availability: TargetMiner is now available as an online tool at www.isical.ac.in/~bioinfo_miu. Supplementary materials: Supplementary materials are available at www.isical.ac.in/~bioinfo_miu/Download.html
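Note: the sensitivity, specificity and false positive rate quoted in the abstract above are presumably the usual confusion-matrix ratios. A minimal sketch under that assumption follows; the function names and the counts are hypothetical and are not taken from the paper.

```python
def sensitivity(tp, fn):
    """Fraction of true targets that are predicted as targets: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true non-targets that are predicted as non-targets: TN / (TN + FP)."""
    return tn / (tn + fp)


def false_positive_rate(tn, fp):
    """Fraction of true non-targets wrongly predicted as targets: 1 - specificity."""
    return 1.0 - specificity(tn, fp)


# Hypothetical counts for illustration only; not results from the paper.
tp, fn, tn, fp = 69, 31, 68, 32
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("false positive rate:", false_positive_rate(tn, fp))
```

Under these definitions a classifier trained on poorly chosen negative examples can keep a high sensitivity while its false positive rate rises, which is the kind of imbalance the abstract reports for the alternative negative set.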

204 citations

Journal Article
TL;DR: The effectiveness of the PBMF index as the optimization criterion along with a genetic fuzzy partitioning technique is demonstrated on a number of artificial and real data sets including a remote sensing image of the city of Kolkata.

199 citations

Journal Article
TL;DR: The significant superiority of the proposed two-stage clustering algorithm as compared to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), widely used methods for clustering gene expression data, is established.
Abstract: Motivation: Recent advancements in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, typical properties of which are its inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm, which employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm, is proposed. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means is also utilized for clustering. Results: The significant superiority of the proposed two-stage clustering algorithm as compared to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), widely used methods for clustering gene expression data, is established on a variety of artificial and publicly available real-life data sets. The biological relevance of the clustering solutions is also analyzed. Contact: anirbanbuba@yahoo.com Supplementary information: The processed and normalized data sets, supplementary figures, tables and other related materials are available at http://d.1asphost.com/anirbanmukhopadhyay/simmts.html

196 citations


Cited by
Journal Article
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Proceedings Article
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Journal Article
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts are illustrated.
Abstract: Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.

5,744 citations

Journal Article
TL;DR: A new optimization algorithm based on the law of gravity and mass interactions is introduced and the obtained results confirm the high performance of the proposed method in solving various nonlinear functions.

5,501 citations