Showing papers by "Sanghamitra Bandyopadhyay" published in 2015

Open Access

Journal Article•DOI•

A Survey of Multiobjective Evolutionary Clustering

Anirban Mukhopadhyay¹, Ujjwal Maulik², Sanghamitra Bandyopadhyay³•Institutions (3)

Kalyani Government Engineering College¹, Jadavpur University², Indian Statistical Institute³

26 May 2015-ACM Computing Surveys

TL;DR: A comprehensive and critical survey of the multitude of multiobjective evolutionary clustering techniques existing in the literature, classified according to the encoding strategies adopted, objective functions, evolutionary operators, strategy for maintaining nondominated solutions, and the method of selection of the final solution.

Abstract: Data clustering is a popular unsupervised data mining tool that is used for partitioning a given dataset into homogeneous groups based on some similarity/dissimilarity metric. Traditional clustering algorithms often make prior assumptions about the cluster structure and adopt a corresponding suitable objective function that is optimized either through classical techniques or metaheuristic approaches. These algorithms are known to perform poorly when the cluster assumptions do not hold in the data. Multiobjective clustering, in which multiple objective functions are simultaneously optimized, has emerged as an attractive and robust alternative in such situations. In particular, application of multiobjective evolutionary algorithms for clustering has become popular in the past decade because of their population-based nature. Here, we provide a comprehensive and critical survey of the multitude of multiobjective evolutionary clustering techniques existing in the literature. The techniques are classified according to the encoding strategies adopted, objective functions, evolutionary operators, strategy for maintaining nondominated solutions, and the method of selection of the final solution. The pros and cons of the different approaches are mentioned. Finally, we have discussed some real-life applications of multiobjective clustering in the domains of image segmentation, bioinformatics, web mining, and so forth.
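The idea of maintaining nondominated solutions, which the survey uses as one of its classification axes, can be illustrated with a minimal sketch. It assumes two hypothetical objectives that are both minimized (for example a compactness score and an inverted separation score); the function names and the sample scores are illustrative and are not taken from the survey.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (objective 1, objective 2) scores of five candidate partitionings.
candidates = [(0.2, 0.9), (0.4, 0.5), (0.6, 0.6), (0.8, 0.1), (0.9, 0.9)]
front = nondominated(candidates)
print(front)
```

Each point left in `front` represents a different trade-off between the two objectives, which is why such methods still need a separate step for selecting one final solution.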

132 citations

Journal Article•DOI•

An Algorithm for Many-Objective Optimization With Reduced Objective Computations: A Study in Differential Evolution

Sanghamitra Bandyopadhyay¹, Arpan Mukherjee¹•Institutions (1)

Indian Statistical Institute¹

01 Jun 2015-IEEE Transactions on Evolutionary Computation

TL;DR: An algorithm for many-objective optimization problems that works more quickly than existing ones while offering competitive performance is proposed, along with a new form of elitism that restricts the number of higher ranked solutions selected for the next population.

Abstract: In this paper we have developed an algorithm for many-objective optimization problems, which works more quickly than existing ones while offering competitive performance. The algorithm periodically reorders the objectives based on their conflict status and selects a subset of conflicting objectives for further processing. We have taken differential evolution multiobjective optimization (DEMO) as the underlying metaheuristic evolutionary algorithm, and implemented the technique of selecting a subset of conflicting objectives using a correlation-based ordering of objectives. The resultant method is called $\alpha$-DEMO, where $\alpha$ is a parameter determining the number of conflicting objectives to be selected. We have also proposed a new form of elitism so as to restrict the number of higher ranked solutions that are selected in the next population. The $\alpha$-DEMO with the revised elitism is referred to as $\alpha$-DEMO-revised. Extensive results on the five DTLZ functions show that the number of objective computations required by the proposed algorithm is much lower than that of the existing algorithms, while the convergence measures are competitive or often better. Statistical significance testing is also performed. A real-life application to the structural optimization of a factory shed truss is demonstrated.
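A correlation-based ordering of objectives can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure of the paper: the conflict score (the sum of negated Pearson correlations with the other objectives), the function names, and the sample values are all hypothetical.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long samples.
    Assumes neither sample is constant (otherwise this divides by zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_conflicting(objs: List[List[float]], alpha: int) -> List[int]:
    """Return the indices of the alpha objectives with the highest conflict score.
    objs[i] holds the values of objective i over the current population.
    Assumed score: sum of negated correlations with all other objectives."""
    m = len(objs)
    conflict = [
        sum(-pearson(objs[i], objs[j]) for j in range(m) if j != i)
        for i in range(m)
    ]
    order = sorted(range(m), key=lambda i: conflict[i], reverse=True)
    return sorted(order[:alpha])


# Hypothetical objective values for a population of four solutions.
objectives = [
    [1.0, 2.0, 3.0, 4.0],  # f0
    [1.0, 3.0, 2.0, 4.0],  # f1, positively correlated with f0
    [4.0, 3.0, 2.0, 1.0],  # f2, negatively correlated with f0 and f1
]
chosen = select_conflicting(objectives, alpha=2)
print(chosen)
```

Only the selected objectives would then be evaluated until the next reordering, which is where a saving in objective computations could come from.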

89 citations

Journal Article•DOI•

MBSTAR: multiple instance learning for predicting specific functional binding sites in microRNA targets

Sanghamitra Bandyopadhyay¹, Dip Ghosh¹, Ramkrishna Mitra², Zhongming Zhao•Institutions (2)

Indian Statistical Institute¹, Vanderbilt University²

23 Jan 2015-Scientific Reports

TL;DR: This work presents a novel machine learning based approach, MBSTAR (Multiple instance learning of Binding Sites of miRNA TARgets), for accurate prediction of true or functional miRNA binding sites and predicts target mRNAs with highest accuracy.

Abstract: MicroRNA (miRNA) regulates gene expression by binding to specific sites in the 3′ untranslated regions of its target genes. Machine learning based miRNA target prediction algorithms first extract a set of features from potential binding sites (PBSs) in the mRNA and then train a classifier to distinguish targets from non-targets. However, they do not consider whether the PBSs are functional or not, and consequently result in high false positive rates. This substantially affects the follow-up functional validation by experiments. We present a novel machine learning based approach, MBSTAR (Multiple instance learning of Binding Sites of miRNA TARgets), for accurate prediction of true or functional miRNA binding sites. A multiple instance learning framework is adopted to handle the lack of information about the actual binding sites in the target mRNAs. A total of 9531 interacting and 973 non-interacting biologically validated miRNA-mRNA pairs are identified from Tarbase 6.0 and confirmed with a PAR-CLIP dataset. It is found that MBSTAR achieves the highest number of binding sites overlapping with PAR-CLIP, with a maximum F-Score of 0.337. Compared to the other methods, MBSTAR also predicts target mRNAs with the highest accuracy. The tool and genome-wide predictions are available at http://www.isical.ac.in/~bioinfo_miu/MBStar30.htm.

65 citations

Journal Article•DOI•

Unsupervised feature selection using an improved version of Differential Evolution

Tapas Bhadra¹, Sanghamitra Bandyopadhyay¹•Institutions (1)

Indian Statistical Institute¹

15 May 2015-Expert Systems With Applications

TL;DR: The experimental results confirm the superiority of the proposed algorithm over the other state-of-the-art unsupervised feature selection algorithms for eight different kinds of datasets with the number of points ranging from 80 to 6238 and the number of dimensions ranging from 30 to 649.

Abstract: In this article, an unsupervised feature selection algorithm is proposed using an improved version of a recently developed Differential Evolution technique called MoDE. The proposed algorithm produces an optimal feature subset while optimizing three criteria, namely, the average standard deviation of the selected feature subset, the average dissimilarity of the selected features, and the average similarity of non-selected features with respect to their first nearest neighbor selected features. Normalized mutual information score is employed for computing both the similarity as well as the dissimilarity measures. The experimental results confirm the superiority of the proposed algorithm over the other state-of-the-art unsupervised feature selection algorithms for eight different kinds of datasets with the number of points ranging from 80 to 6238 and the number of dimensions ranging from 30 to 649.
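Normalized mutual information between two discrete features can be computed as in the sketch below. Several normalizations exist and the abstract does not say which one is used, so dividing by the geometric mean of the two entropies is an assumption here, as are the function names and the sample features.

```python
from collections import Counter
from math import log2, sqrt
from typing import Hashable, List


def entropy(values: List[Hashable]) -> float:
    """Shannon entropy (in bits) of a discrete sample."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def nmi(xs: List[Hashable], ys: List[Hashable]) -> float:
    """Mutual information divided by sqrt(H(X) * H(Y)).
    Assumed normalization; returns 0.0 if either feature is constant."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0
    mutual_information = hx + hy - entropy(list(zip(xs, ys)))
    return mutual_information / sqrt(hx * hy)


# Hypothetical discretized features measured on four points.
feature_a = [0, 0, 1, 1]
feature_b = [1, 1, 0, 0]  # a relabeling of feature_a, fully redundant
feature_c = [0, 1, 0, 1]  # independent of feature_a in this sample

similarity_ab = nmi(feature_a, feature_b)
similarity_ac = nmi(feature_a, feature_c)
print(similarity_ab, similarity_ac)
```

A score near 1 marks two features as redundant and a score near 0 marks them as dissimilar, so one measure can serve both the similarity and the dissimilarity criteria.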

50 citations

Journal Article•DOI•

Transcriptomic Analysis of mRNAs in Human Monocytic Cells Expressing the HIV-1 Nef Protein and Their Exosomes

Madeeha Aqil¹, Saurav Mallik², Sanghamitra Bandyopadhyay², Ujjwal Maulik³, Shahid Jameel¹•Institutions (3)

International Centre for Genetic Engineering and Biotechnology¹, Indian Statistical Institute², Jadavpur University³

15 Apr 2015-BioMed Research International

TL;DR: This study identifies selectively expressed mRNAs in Nef-expressing U937 cells and their exosomes and supports a new mode of intercellular regulation by the HIV-1 Nef protein.

Abstract: The Nef protein of human immunodeficiency virus (HIV) promotes viral replication and progression to AIDS. Besides its well-studied effects on intracellular signaling, Nef also functions through its secretion in exosomes, which are nanovesicles containing proteins, microRNAs, and mRNAs and are important for intercellular communication. Nef expression enhances exosome secretion, and these exosomes can enter uninfected CD4 T cells, leading to apoptotic death. We have recently reported the first miRNome analysis of exosomes secreted from Nef-expressing U937 monocytic cells. Here we show genome-wide transcriptome analysis of Nef-expressing U937 cells and their exosomes. We identified four key mRNAs preferentially retained in Nef-expressing cells; these code for MECP2, HMOX1, AARSD1, and ATF2 and are important for chromatin modification and gene expression. Interestingly, their target miRNAs are exported out in exosomes. We also identified three key mRNAs selectively secreted in exosomes from Nef-expressing U937 cells, with their corresponding miRNAs being preferentially retained in cells. These are AATK, SLC27A1, and CDKAL and are important in apoptosis and fatty acid transport. Thus, our study identifies selectively expressed mRNAs in Nef-expressing U937 cells and their exosomes and supports a new mode of intercellular regulation by the HIV-1 Nef protein.

36 citations

Journal Article•DOI•

MicroRNA signatures highlight new breast cancer subtypes.

Malay Bhattacharyya¹, Joyshree Nath², Sanghamitra Bandyopadhyay²•Institutions (2)

Indian Institute of Engineering Science and Technology, Shibpur¹, Indian Statistical Institute²

10 Feb 2015-Gene

TL;DR: The experimental results demonstrate that miRNAs carry a unique signature that distinguishes cancer subtypes and reveals new cancer subtypes, and additional survival analyses based on clinical data also strengthen this claim.

35 citations

Journal Article•DOI•

A review of in silico approaches for analysis and prediction of HIV-1-human protein–protein interactions

Sanghamitra Bandyopadhyay¹, Sumanta Ray, Anirban Mukhopadhyay, Ujjwal Maulik•Institutions (1)

Indian Statistical Institute¹

01 Sep 2015-Briefings in Bioinformatics

TL;DR: A comparative assessment of these studies and some methodologies for discussing the implication of their results are presented, and different computational techniques for predicting HIV-1-human PPIs are reviewed and a comparative study of their applicability is provided.

Abstract: The computational or in silico approaches for analysing the HIV-1-human protein-protein interaction (PPI) network, predicting different host cellular factors and PPIs and discovering several pathways are gaining popularity in the field of HIV research. Although there exist quite a few studies in this regard, no previous effort has been made to review these works in a comprehensive manner. Here we review the computational approaches that are devoted to the analysis and prediction of HIV-1-human PPIs. We have broadly categorized these studies into two fields: computational analysis of HIV-1-human PPI network and prediction of novel PPIs. We have also presented a comparative assessment of these studies and proposed some methodologies for discussing the implication of their results. We have also reviewed different computational techniques for predicting HIV-1-human PPIs and provided a comparative study of their applicability. We believe that our effort will provide helpful insights to the HIV research community.

35 citations

Journal Article•DOI•

Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

Ujjwal Maulik¹, Saurav Mallik², Anirban Mukhopadhyay³, Sanghamitra Bandyopadhyay²•Institutions (3)

Jadavpur University¹, Indian Statistical Institute², Kalyani Government Engineering College³

01 Apr 2015-PLOS ONE

TL;DR: A computational rule mining framework, StatBicRM, to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets, which performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets.

Abstract: Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers from biological datasets using integrated approaches of statistical and binary inclusion-maximal biclustering techniques. At first, a novel statistical strategy is utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on large datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy and other related factors with other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining the epigenetic effect (viz., the effect of methylation) on gene expression level.

31 citations

Journal Article•DOI•

FOCS: Fast Overlapped Community Search

Sanghamitra Bandyopadhyay¹, Garisha Chowdhary¹, Debarka Sengupta²•Institutions (2)

Indian Statistical Institute¹, Genome Institute of Singapore²

01 Nov 2015-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Fast Overlapped Community Search (FOCS), an algorithm that accounts for local connectedness in order to identify overlapped communities, is proposed and is shown to be linear in number of edges and nodes.

Abstract: Discovery of natural groups of similarly functioning individuals is a key task in the analysis of real world networks. Also, overlap between community pairs is commonplace, particularly in large social and biological graphs. In fact, overlaps between communities are known to be denser than the non-overlapped regions of the communities. However, most of the existing algorithms that detect overlapping communities assume that the communities are denser than their surrounding regions, and falsely identify overlaps as communities. Further, many of these algorithms are computationally demanding and thus do not scale reasonably with varying network sizes. In this article, we propose Fast Overlapped Community Search (FOCS), an algorithm that accounts for local connectedness in order to identify overlapped communities. FOCS is shown to be linear in the number of edges and nodes. It additionally gains in speed via simultaneous selection of multiple near-best communities, rather than merely the best, at each iteration. FOCS outperforms some popular overlapped community finding algorithms in terms of computational time without compromising quality.

30 citations

Journal Article•DOI•

Concordant dysregulation of miR-5p and miR-3p arms of the same precursor microRNA may be a mechanism in inducing cell proliferation and tumorigenesis: a lung cancer study

Ramkrishna Mitra¹, Chen Ching Lin¹, Christine M. Eischen¹, Sanghamitra Bandyopadhyay², Zhongming Zhao¹•Institutions (2)

Vanderbilt University¹, Indian Statistical Institute²

07 Apr 2015-RNA

TL;DR: It is revealed that the reduced expression of miR-145-5p/-3p pair potentially contributes to elevated expression of genes in the "FOXM1 transcription factor network" pathway, which may consequently lead to uncontrolled cell proliferation.

Abstract: A precursor microRNA (miRNA) has two arms: miR-5p and miR-3p (miR-5p/-3p). Depending on the tissue or cell types, both arms can become functional. However, little is known about their coregulatory mechanisms during the tumorigenic process. Here, by using the large-scale miRNA expression profiles of five cancer types, we revealed that several miR-5p/-3p arms were concordantly dysregulated in each cancer. To explore possible coregulatory mechanisms of concordantly dysregulated miR-5p/-3p pairs, we developed a robust computational framework and applied it to lung cancer data. The framework deciphers miR-5p/-3p coregulated protein interaction networks critical to lung cancer development. As a novel part of the method, we applied the second-order partial correlation to minimize false-positive regulations. Using 279 matched miRNA and mRNA expression profiles extracted from tumor and normal lung tissue samples, we identified 17 aberrantly expressed miR-5p/-3p pairs that potentially modulate the gene expression of 35 protein complexes. Functional analyses revealed that these complexes are associated with cancer-related biological processes, suggesting the oncogenic potential of the reported miR-5p/-3p pairs. Specifically, we revealed that the reduced expression of the miR-145-5p/-3p pair potentially contributes to elevated expression of genes in the "FOXM1 transcription factor network" pathway, which may consequently lead to uncontrolled cell proliferation. Subsequently, the regulation of miR-145-5p/-3p in the FOXM1 signaling pathway was validated by a cohort of 104 matched miRNA and protein (reverse-phase protein array) expression profiles in lung cancer. In summary, our computational framework provides a novel tool to study miR-5p/-3p coregulatory mechanisms in cancer and other diseases.
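A second-order partial correlation measures the association between two variables after controlling for two others. The sketch below uses the standard recursive formula built from Pearson correlations; which variables the paper conditions on is not stated in the abstract, so the names `x`, `y`, `z`, `w` and the sample values are purely illustrative.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long, non-constant samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def partial1(x, y, z) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial2(x, y, z, w) -> float:
    """Second-order partial correlation of x and y, controlling for z and w.
    Applies the first-order formula again to correlations that already control for z."""
    rxy_z = partial1(x, y, z)
    rxw_z = partial1(x, w, z)
    ryw_z = partial1(y, w, z)
    return (rxy_z - rxw_z * ryw_z) / sqrt((1 - rxw_z ** 2) * (1 - ryw_z ** 2))


# Hypothetical expression-like profiles over six samples.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
z = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
w = [6.0, 4.0, 5.0, 1.0, 3.0, 2.0]

r_xy_zw = partial2(x, y, z, w)
print(r_xy_zw)
```

A pair that stays strongly correlated after both controls are removed is less likely to be an indirect association, which matches the stated goal of reducing false-positive regulations.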

28 citations

Journal Article•DOI•

Simultaneous feature selection and symmetry based clustering using multiobjective framework

Sriparna Saha¹, Rachamadugu Spandana¹, Asif Ekbal¹, Sanghamitra Bandyopadhyay²•Institutions (2)

Indian Institute of Technology Patna¹, Indian Statistical Institute²

01 Apr 2015

TL;DR: A new framework based on multiobjective optimization (MOO), namely FeaClusMOO, is proposed which is capable of identifying the correct partitioning as well as the most relevant set of features from a data set.

Abstract: In this paper a new framework based on multiobjective optimization (MOO), namely FeaClusMOO, is proposed which is capable of identifying the correct partitioning as well as the most relevant set of features from a data set. A newly developed multiobjective simulated annealing based optimization technique, namely archived multiobjective simulated annealing (AMOSA), is used as the background strategy for optimization. Here features and cluster centers are encoded in the form of a string. As the objective functions, two internal cluster validity indices measuring the goodness of the obtained partitioning using Euclidean distance and point symmetry based distance, respectively, and a count of the number of features are utilized. These three objectives are optimized simultaneously using AMOSA in order to detect the appropriate subset of features, the appropriate number of clusters, and the appropriate partitioning. Points are allocated to different clusters using a point symmetry based distance. Mutation changes the feature combination as well as the set of cluster centers. Since AMOSA, like any other MOO technique, provides a set of solutions on the final Pareto front, a technique based on the concept of semi-supervised classification is developed to select a solution from the given set. The effectiveness of the proposed FeaClusMOO is shown for seven higher dimensional real-life data sets in comparison with other clustering techniques: its Euclidean distance based version, where Euclidean distance is used for cluster assignment; a genetic algorithm based automatic clustering technique (VGAPS-clustering) using point symmetry based distance with all the features; and the K-means clustering technique with all features.

Occupancy estimation using non intrusive sensors in energy efficient buildings

Abhay Arora, Manar Amayri, Venkataramana Badarla, Sanghamitra Bandyopadhyay

01 Jan 2015

TL;DR: A general approach is proposed to estimate the number of occupants in a zone using different kinds of measurements such as motion detection, power consumption or CO2 concentration using a C4.5 learning algorithm that yields human readable decision trees.

Abstract: A general approach is proposed to estimate the number of occupants in a zone using different kinds of measurements such as motion detection, power consumption or CO2 concentration. The proposed approach is inspired by machine learning. It starts by determining, among different measurements, those that are the most useful by calculating the information gains. Then, an estimation algorithm is proposed. It relies on a C4.5 learning algorithm that yields human readable decision trees using measurements to estimate the number of occupants. It has been applied to an office setting.
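Ranking measurements by information gain can be sketched as below. The sensor names, the discretized readings, and the occupant counts are hypothetical and only illustrate the calculation; they are not data from the paper.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List


def entropy(labels: List[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature: List[Hashable], labels: List[Hashable]) -> float:
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical discretized sensor readings and occupant counts for six time slots.
occupants = [0, 0, 1, 1, 2, 2]
measurements = {
    "motion": ["low", "low", "high", "high", "high", "high"],
    "co2": ["a", "b", "a", "b", "a", "b"],
}
gains = {name: information_gain(values, occupants) for name, values in measurements.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
print(ranked, gains)
```

In this toy data the motion reading separates empty slots from occupied ones, so it ranks first, while the CO2 bins carry no information about occupancy.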

Journal Article•DOI•

A multiobjective approach for identifying protein complexes and studying their association in multiple disorders

Sanghamitra Bandyopadhyay¹, Sumanta Ray², Anirban Mukhopadhyay³, Ujjwal Maulik⁴•Institutions (4)

Indian Statistical Institute¹, Aliah University², Kalyani Government Engineering College³, Jadavpur University⁴

09 Aug 2015-Algorithms for Molecular Biology

TL;DR: The task of identifying protein complexes as a multiobjective optimization problem is presented and identified protein complexes are found to be associated with several disorder classes like ‘Cancer’, ‘Endocrine’ and ‘Multiple’.

Abstract: Detecting protein complexes within protein–protein interaction (PPI) networks is a major step toward the analysis of biological processes and pathways. Identification and characterization of protein complexes in a PPI network is an ongoing challenge. Several high-throughput experimental techniques provide a substantial number of PPIs, which are widely utilized for compiling the PPI network of a species. Here we focus on detecting human protein complexes by developing a multiobjective framework. For this, a large human PPI network is partitioned into modules, which serve as protein complexes. For building the objective functions we have utilized topological properties of the PPI network and biological properties based on Gene Ontology semantic similarity. The proposed method is compared with some state-of-the-art algorithms in terms of different performance metrics. For the purpose of biological validation of our predicted complexes we have also employed a Gene Ontology and pathway based analysis. Additionally, we have performed an analysis to associate the resulting protein complexes with 22 key disease classes. Two bipartite networks are created to clearly visualize the association of identified protein complexes with the disorder classes. Here, we present the task of identifying protein complexes as a multiobjective optimization problem. Identified protein complexes are found to be associated with several disorder classes like ‘Cancer’, ‘Endocrine’ and ‘Multiple’. This analysis uncovers some new relationships between disorders and predicted complexes that may play a potential role in the prediction of multi-target drugs.

Proceedings Article•DOI•

Estimating Occupancy in an Office Setting

Manar Amayri, Stéphane Ploix, Sanghamitra Bandyopadhyay

24 Sep 2015

Journal Article•DOI•

Structural insight into Mycobacterium tuberculosis maltosyl transferase inhibitors: pharmacophore-based virtual screening, docking, and molecular dynamics simulations

Soumi Sengupta¹, Debjani Roy², Sanghamitra Bandyopadhyay³•Institutions (3)

Norwegian University of Science and Technology¹, Bose Institute², Indian Statistical Institute³

11 Feb 2015-Journal of Biomolecular Structure & Dynamics

TL;DR: Pharmacophore-based virtual screening, subsequent docking, and molecular dynamics simulations have been done to identify potential inhibitors of maltosyl transferase of Mycobacterium tuberculosis (mtb GlgE) and have confirmed stable protein ligand binding.

Abstract: Pharmacophore-based virtual screening, subsequent docking, and molecular dynamics (MD) simulations have been done to identify potential inhibitors of maltosyl transferase of Mycobacterium tuberculo...

Journal Article•DOI•

Priority based ε dominance

Sanghamitra Bandyopadhyay¹, Rudrasis Chakraborty¹, Ujjwal Maulik²•Institutions (2)

Indian Statistical Institute¹, Jadavpur University²

01 Jun 2015-Information Sciences

TL;DR: PBE based AMOSA is found to comprehensively outperform AMOSA, MOEA/D-DE, the conventional ε...

Proceedings Article•

Variable Weighted Maximal Relevance Minimal Redundancy Criterion for Feature Selection Using Normalized Mutual Information.

Sanghamitra Bandyopadhyay¹, Tapas Bhadra¹, Ujjwal Maulik²•Institutions (2)

Indian Statistical Institute¹, Jadavpur University²

01 Jan 2015

Proceedings Article•DOI•

A fuzzy citation-kNN algorithm for multiple instance learning

Dip Ghosh¹, Sanghamitra Bandyopadhyay¹•Institutions (1)

Indian Statistical Institute¹

01 Aug 2015

TL;DR: Experiments on drug discovery and image datasets show that the performance of the proposed algorithm (MI-FCKNN) is better than the traditional citation-kNN and competitive with most state-of-the-art algorithms.

Abstract: In the multiple instance learning (MIL) setting, instances are grouped together in different labeled bags and the classifier tries to learn the label of unknown bags or instances. This is significantly different from traditional supervised learning techniques where the instances themselves are labeled. In this work, a fuzzy based citation-kNN technique, which uses a modified Hausdorff distance between bags, is introduced. Introduction of a fuzzy distance measure helps to solve the problem of overlapping bags. The effect of false positive instances in a positive bag is also reduced by calculating a fuzzy class membership for the training bags. Experiments on drug discovery and image datasets show that the performance of the proposed algorithm (MI-FCKNN) is better than the traditional citation-kNN and competitive with most state-of-the-art algorithms.
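Bag-level distances of the Hausdorff family can be sketched as follows. The abstract does not specify which modification is used or how the fuzzy measure is defined, so the two variants below (the classical maximal form and a minimal form that takes the closest pair of instances) are assumptions shown only to illustrate how a distance between bags is built from distances between instances; the bags are hypothetical.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, ...]


def max_hausdorff(bag_a: List[Point], bag_b: List[Point]) -> float:
    """Classical Hausdorff distance: the larger of the two directed distances,
    each being the worst-case distance from a point to the other bag."""

    def directed(p: List[Point], q: List[Point]) -> float:
        return max(min(dist(a, b) for b in q) for a in p)

    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))


def min_hausdorff(bag_a: List[Point], bag_b: List[Point]) -> float:
    """Minimal variant: distance between the closest pair of instances,
    which is less sensitive to outlying instances in a bag."""
    return min(dist(a, b) for a in bag_a for b in bag_b)


# Two hypothetical bags of 2-D instances.
bag_1 = [(0.0, 0.0), (1.0, 0.0)]
bag_2 = [(0.0, 3.0), (5.0, 0.0)]

d_max = max_hausdorff(bag_1, bag_2)
d_min = min_hausdorff(bag_1, bag_2)
print(d_max, d_min)
```

A citation-kNN style classifier would then use such a bag distance to find both the nearest neighbors of a query bag and the bags that count the query among their own nearest neighbors.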

Journal Article•DOI•

Exploring the Genomic Roadmap and Molecular Phylogenetics Associated with MODY Cascades Using Computational Biology

Chiranjib Chakraborty¹, Sanghamitra Bandyopadhyay², C. George Priya Doss³, Govindasamy Agoramoorthy⁴•Institutions (4)

Galgotias University¹, Indian Statistical Institute², VIT University³, Tajen University⁴

01 Apr 2015-Cell Biochemistry and Biophysics

TL;DR: The prediction of sequence conservation, molecular phylogenetics, protein–protein network and the association between the MODY cascades enhances opportunities to get more insights into the less-known MODY disease.

Abstract: Maturity onset diabetes of the young (MODY) is a metabolic and genetic disorder. It differs from type 1 and type 2 diabetes and has a low occurrence (1–2 %) among all diabetes cases. This disorder is a consequence of β-cell dysfunction. To date, 11 subtypes of MODY have been identified, and all of them can cause gene mutations. However, very little is known about the gene mapping, molecular phylogenetics, and co-expression among MODY genes and networking between cascades. This study has used the latest servers and software such as VarioWatch, ClustalW, MUSCLE, G Blocks, Phylogeny.fr, iTOL, WebLogo, STRING, and KEGG PATHWAY to perform comprehensive analyses of gene mapping, multiple sequence alignment, molecular phylogenetics, protein–protein network design, co-expression analysis of MODY genes, and pathway development. The MODY genes are located in chromosomes 2, 7, 8, 9, 11, 12, 13, 17, and 20. The highly aligned block shows that Pro, Gly, Leu, Arg, and Pro residues are highly aligned at positions 296, 386, 437, 455, 456 and 598, respectively. Alignment scores show that HNF1A and HNF1B proteins have high sequence similarity among MODY proteins. Protein–protein network design shows that HNF1A, HNF1B, HNF4A, NEUROD1, PDX1, PAX4, INS, and GCK are strongly connected, and the co-expression analyses between MODY genes also show a distinct association between HNF1A and HNF4A genes. This study has used the latest bioinformatics tools to develop a rapid method to assess the evolutionary relationship, the network development, and the associations among eleven MODY genes and cascades. The prediction of sequence conservation, molecular phylogenetics, protein–protein network and the association between the MODY cascades enhances opportunities to get more insights into the less-known MODY disease.

Journal Article•DOI•

Finding quasi core with simulated stacked neural networks

Malay Bhattacharyya¹, Sanghamitra Bandyopadhyay²•Institutions (2)

Indian Institute of Engineering Science and Technology, Shibpur¹, Indian Statistical Institute²

10 Feb 2015-Information Sciences

TL;DR: This paper proposes a stacked neural network model for finding out the largest quasi-complete module (core) in weighted graphs and shows the effectiveness of the proposed approach on DIMACS graphs.

Proceedings Article•DOI•

Ties that matter

Garisha Chowdhary¹, Sanghamitra Bandyopadhyay¹•Institutions (1)

Indian Statistical Institute¹

29 Oct 2015

TL;DR: The method discussed in this article identifies unique and high number of mutual connections through weighted self-information and has highly reduced number of edges, still conserving the centrality distributions as far as possible.

Abstract: On-line social networks mostly allow individuals to extend friend requests to all forms of possible connections including those related to official purposes, interests, family relations, friendships, and acquaintances. One needs to mine relevant connections in order to make reliable and meaningful interpretations following network analysis. Most networks lack weight assignments that mark the strength of a connection. Thus, methods are required that can effectively identify essential edges from only the topological information available. The method discussed in this article identifies a unique and high number of mutual connections through weighted self-information. The extracted skeleton network has a highly reduced number of edges, while still conserving the centrality distributions as far as possible. The method is applied locally to each node, to extract the connections relevant to every node. Results are demonstrated on five datasets, which show that the proposed method is able to eliminate a large number of irrelevant edges. The method is also found to scale well to large datasets.

Proceedings Article•DOI•

Event extraction from cancer genetics literature

Debajyoti Sinha¹, Utpal Garain¹, Sanghamitra Bandyopadhyay¹•Institutions (1)

Indian Statistical Institute¹

02 Mar 2015

TL;DR: Two syntactic patterns, namely phrase structure and dependency structure, are explored to produce improved results on the Cancer Genetics data provided in the BioNLP'13 Shared Task.

Abstract: This paper attempts to employ a learning-based pattern classification technique to extract events from biological literature. Although various approaches to event extraction have been explored, none is suitable for designing a practical event extraction system. Extracting events more precisely is still an open problem. In this paper, new features that seem to be relevant to the task are investigated. Two syntactic patterns, namely phrase structure and dependency structure, are explored to produce improved results on the Cancer Genetics data provided in the BioNLP'13 Shared Task. Conditional probability scores from a stacked model are also considered as features. The patterns and the probability scores, along with some other linguistic features, are fed to SVMs to train them for the task of bio-event extraction from natural language articles. The results are compared with the performance of the best extraction system in the Cancer Genetics Task.

Proceedings Article•

Pattern Recognition and Machine Intelligence: 6th International Conference, PReMI 2015, Warsaw, Poland, June 30 - July 3, 2015, Proceedings

Marzena Kryszkiewicz, Sanghamitra Bandyopadhyay¹, Henryk Rybiński, Sankar K. Pal¹•Institutions (1)

Indian Statistical Institute¹

01 Jan 2015

TL;DR: This book constitutes the proceedings of the 6th International Conference on Pattern Recognition and Machine Intelligence, PReMI 2015, held in Warsaw, Poland, in June/July 2015.

Abstract: This book constitutes the proceedings of the 6th International Conference on Pattern Recognition and Machine Intelligence, PReMI 2015, held in Warsaw, Poland, in June/July 2015. A total of 53 full papers and 1 short paper presented in this volume were carefully reviewed and selected from 90 submissions. They are organized in topical sections named: foundations of machine learning; image processing; image retrieval; image tracking; pattern recognition; data mining techniques for large scale data; fuzzy computing; rough sets; bioinformatics; and applications of artificial intelligence.

Journal Article•DOI•

Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data

Debarka Sengupta¹, Indranil Aich, Sanghamitra Bandyopadhyay²•Institutions (2)

Genome Institute of Singapore¹, Indian Statistical Institute²

28 Sep 2015-Journal of Biosciences

TL;DR: An unsupervised feature selection technique is proposed, using maximum information compression index as the dissimilarity measure and the well-known density-based cluster identification technique DBSCAN for identifying the largest natural group of dissimilar features.

Abstract: Reduction of dimensionality has emerged as a routine process in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. In the present article an unsupervised feature selection technique is proposed, using maximum information compression index as the dissimilarity measure and the well-known density-based cluster identification technique DBSCAN for identifying the largest natural group of dissimilar features. The algorithm is fast and less sensitive to the user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We used the proposed method for reducing dimensionality of a number of benchmark data sets of varying sizes. Its performance was also extensively compared with some other well-known feature selection methods.
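
The dissimilarity measure named in the abstract can be illustrated with a short sketch. It assumes the commonly used definition of the information compression index for a feature pair, namely the smallest eigenvalue of their 2x2 covariance matrix; the paper's exact formulation may differ. The function names `mici` and `dissimilarity_matrix` are hypothetical and are not taken from the article.

```python
import math
from statistics import fmean


def mici(x, y):
    """Smallest eigenvalue of the 2x2 covariance matrix of features x and y.

    Assumed definition: the value is 0 when the two features are perfectly
    linearly related and grows as they become less linearly dependent.
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    vx = sum((a - mx) ** 2 for a in x) / n               # population variance of x
    vy = sum((b - my) ** 2 for b in y) / n               # population variance of y
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n  # covariance
    trace = vx + vy
    det = vx * vy - cxy ** 2
    # Clamp at zero to guard against tiny negative values from rounding.
    disc = max(trace * trace - 4.0 * det, 0.0)
    return 0.5 * (trace - math.sqrt(disc))


def dissimilarity_matrix(features):
    """Pairwise dissimilarity between features (each a list of samples)."""
    k = len(features)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = mici(features[i], features[j])
    return d


if __name__ == "__main__":
    feats = [
        [1.0, 2.0, 3.0, 4.0],    # f0
        [2.0, 4.0, 6.0, 8.0],    # f1 = 2 * f0, so fully redundant with f0
        [1.0, -1.0, -1.0, 1.0],  # f2, uncorrelated with f0
    ]
    for row in dissimilarity_matrix(feats):
        print(row)
```

A matrix of this kind could then be handed to a density-based clusterer that accepts precomputed distances (for example, scikit-learn's `DBSCAN` with `metric="precomputed"`) to look for the largest group of mutually dissimilar features; that step is omitted here because the abstract does not give the parameter choices.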
