Showing papers by "Sanghamitra Bandyopadhyay" published in 2018

Open Access

Journal Article

dropClust: efficient clustering of ultra-large scRNA-seq data.

Debajyoti Sinha¹,², Akhilesh Kumar³, Himanshu Kumar³, Sanghamitra Bandyopadhyay², Debarka Sengupta⁴•Institutions (4)

University of Calcutta¹, Indian Statistical Institute², Indian Institute of Science Education and Research, Bhopal³, Indraprastha Institute of Information Technology⁴

06 Apr 2018-Nucleic Acids Research

TL;DR: Locality Sensitive Hashing, an approximate nearest neighbour search technique, is exploited to develop a de novo clustering algorithm for large-scale single cell data that outperformed the existing best practice methods in terms of execution time, clustering accuracy and detectability of minor cell sub-types.

Abstract: Droplet-based single cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale for such high dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest neighbour search technique, to develop a de novo clustering algorithm for large-scale single cell data. On a number of real datasets, dropClust outperformed the existing best practice methods in terms of execution time, clustering accuracy and detectability of minor cell sub-types.
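
The abstract above does not say which hash family dropClust uses, so the following is only a minimal sketch of the general idea behind Locality Sensitive Hashing for approximate nearest neighbour search, assuming random-hyperplane (sign) hashes. All function names here are hypothetical and are not taken from the dropClust software.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_planes, dim, seed=0):
    """Draw n_planes random Gaussian hyperplanes in dim dimensions (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(vectors, planes):
    """Group vector indices into buckets keyed by their bit signature."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[signature(vec, planes)].append(idx)
    return buckets


def candidates(query, planes, buckets):
    """Return indices sharing the query's bucket: the approximate neighbour candidates."""
    return buckets.get(signature(query, planes), [])
```

Vectors pointing in similar directions tend to share a signature, so only the members of one bucket need to be compared exactly. A practical index would use several independent hash tables to reduce missed neighbours.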

90 citations

Journal Article

DECOR: Differential Evolution using Clustering based Objective Reduction for many-objective optimization

Monalisa Pal¹, Sriparna Saha², Sanghamitra Bandyopadhyay¹•Institutions (2)

Indian Statistical Institute¹, Indian Institute of Technology Patna²

01 Jan 2018-Information Sciences

TL;DR: This work proposes a novel many-objective optimization algorithm, viz. Differential Evolution using Clustering based Objective Reduction (DECOR).

36 citations

Journal Article

Structure-Aware Principal Component Analysis for Single-Cell RNA-seq Data.

Snehalika Lall¹, Debajyoti Sinha¹, Sanghamitra Bandyopadhyay¹, Debarka Sengupta²•Institutions (2)

Indian Statistical Institute¹, Indraprastha Institute of Information Technology²

22 Aug 2018-Journal of Computational Biology

TL;DR: With droplet-based technologies, it has become possible to profile transcriptomes of several thousands of cells in a day, although such a large single-cell cohort may not be suitable for large-scale data collection.

Abstract: With the emergence of droplet-based technologies, it has now become possible to profile transcriptomes of several thousands of cells in a day. Although such a large single-cell cohort may ...

33 citations

Journal Article

Integrating Multiple Data Sources for Combinatorial Marker Discovery: A Study in Tumorigenesis

Sanghamitra Bandyopadhyay¹, Saurav Mallik¹•Institutions (1)

Indian Statistical Institute¹

01 Mar 2018-IEEE/ACM Transactions on Computational Biology and Bioinformatics

TL;DR: This work proposes a novel computational framework for identifying significant combinatorial markers using both gene expression and methylation data and expects that such a combination of markers will produce fewer false positives than individual markers.

Abstract: Identification of combinatorial markers from multiple data sources is a challenging task in bioinformatics. Here, we propose a novel computational framework for identifying significant combinatorial markers ($SCM$s) using both gene expression and methylation data. The gene expression and methylation data are integrated into a single continuous dataset as well as a (post-discretized) boolean dataset based on their intrinsic (i.e., inverse) relationship. A novel combined score of methylation and expression data (viz., $CoMEx$) is introduced, which is computed on the integrated continuous data to identify an initial non-redundant set of genes. Thereafter, (maximal) frequent closed homogeneous genesets are identified using a well-known biclustering algorithm applied to the integrated boolean data of the determined non-redundant set of genes. A novel sample-based weighted support ($WS$) is then proposed, which is subsequently calculated on the same integrated boolean data in order to identify the non-redundant significant genesets. The top few resulting genesets are identified as potential $SCM$s. Since our proposed method generates a smaller number of significant non-redundant genesets than other popular methods, it is much faster than the others. Application of the proposed technique to expression and methylation data for Uterine tumor or Prostate Carcinoma produces a set of significant combinations of markers. We expect that such a combination of markers will produce fewer false positives than individual markers.

25 citations

Journal Article

Detecting TF-miRNA-gene network based modules for 5hmC and 5mC brain samples: a intra- and inter-species case-study between human and rhesus

Ujjwal Maulik¹, Sagnik Sen¹, Saurav Mallik¹, Sanghamitra Bandyopadhyay²•Institutions (2)

Jadavpur University¹, Indian Statistical Institute²

22 Jan 2018-BMC Genetics

TL;DR: This article provides a new study on brain tissue between human and rhesus on two methylation cytosine variants based data-profiles through TF-miRNA-gene network based module detection and identifies co-methylated and multi-stage co-methylated gene modules.

Abstract: Study of epigenetics is currently a high-impact research topic. Multi-stage methylation is also an area of high-dimensional prospect. In this article, we provide a new study (intra- and inter-species study) on brain tissue between human and rhesus on two methylation cytosine variants based data-profiles (viz., 5-hydroxymethylcytosine (5hmC) and 5-methylcytosine (5mC) samples) through TF-miRNA-gene network based module detection. First of all, we determine differentially 5hmC methylated genes for human as well as rhesus for intra-species analysis, and differentially multi-stage methylated genes for inter-species analysis. Thereafter, we apply the weighted topological overlap matrix (TOM) measure followed by average linkage clustering on these genesets for the intra- and inter-species study. We identify co-methylated and multi-stage co-methylated gene modules by using dynamic tree cut, for intra- and inter-species cases, respectively. Each module is represented by an individual color in the dendrogram. Gene Ontology and KEGG pathway based analyses are then performed to identify biological functionalities of the identified modules. Finally, the top ten regulator TFs and targeter miRNAs that are associated with the maximum number of gene modules are determined for both intra- and inter-species analysis. The novel TFs and miRNAs obtained from the analysis are: MYST3 and ZNF771 as TFs (for human intra-species analysis), BAZ2B, RCOR3 and ATF1 as TFs (for rhesus intra-species analysis), and mml-miR-768-3p and mml-miR-561 as miRs (for rhesus intra-species analysis); and MYST3 and ZNF771 as miRs (for inter-species study). Furthermore, the genes/TFs/miRNAs that are already found to be liable for several severe brain-related diseases as well as rare neglected diseases (e.g., Wolf-Hirschhorn syndrome, Joubert syndrome, Huntington’s disease, Simian Immunodeficiency Virus (SIV) mediated encephalitis, Parkinson’s disease, bipolar disorder and schizophrenia) are mentioned.
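
The abstract names the weighted topological overlap matrix (TOM) measure without giving its formula. As a hedged illustration, the sketch below assumes the commonly used unsigned formulation, TOM_ij = (sum over u of a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), where a is a symmetric adjacency matrix with entries in [0, 1] and k_i is the connectivity of node i. The function name is hypothetical, and the paper's exact variant may differ.

```python
def topological_overlap(adj):
    """Weighted topological overlap for a symmetric adjacency matrix with entries in [0, 1].

    Assumes the unsigned formulation:
    TOM_ij = (sum_{u != i,j} a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij).
    """
    n = len(adj)
    # Connectivity of each node: sum of its edge weights, excluding the diagonal.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0  # a node fully overlaps with itself
                continue
            # Weight shared through common neighbours u of i and j.
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j
            )
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom
```

A dissimilarity such as 1 - TOM_ij can then be passed to average linkage clustering, which is the order of steps the abstract describes.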

23 citations

Proceedings Article

Ultrafast Genomic Database Search using Layered Locality Sensitive Hashing

Angana Chakraborty¹, Sanghamitra Bandyopadhyay¹•Institutions (1)

Indian Statistical Institute¹

01 Jan 2018

TL;DR: This article demonstrates layered locality sensitive hashing in genomic sequence comparison, which reduces the search time by 93.6%, while producing results almost as good as the exact ones.

Abstract: In this article, we demonstrate Layered Locality Sensitive Hashing in genomic sequence comparison. Locality Sensitive Hashing based algorithms have already proved successful for approximate nearest neighbor search in high dimensional data. Genomic database search is the primary task for homology detection and motif identification. However, the huge genome size and unknown repetitive regions make the task even more difficult. To tackle this problem, we have introduced layered locality sensitive hashing for large scale genomic comparisons. As it turns out, the proposed method reduces the search time by 93.6%, while producing results almost as good as the exact ones.
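
The abstract does not describe how the layers are built, so the sketch below is not the authors' layered scheme. It only illustrates, under that caveat, one well-known way to hash sequences for approximate comparison: MinHash signatures over k-mer sets, whose agreement rate estimates Jaccard similarity. The function names, the k-mer length and the number of hash functions are all illustrative assumptions.

```python
import hashlib


def kmers(seq, k=4):
    """Set of all length-k substrings of seq (seq must be at least k long)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer, varied by an integer seed."""
    digest = hashlib.sha256(f"{seed}:{kmer}".encode("ascii")).digest()
    return int.from_bytes(digest[:8], "little")


def minhash(seq, n_hashes=32, k=4):
    """MinHash signature: the minimum hash value of the k-mer set under each seed."""
    ks = kmers(seq, k)
    return [min(_hash(km, seed) for km in ks) for seed in range(n_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Comparing short signatures instead of full sequences is what lets such schemes trade a small loss in accuracy for a large reduction in search time.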

5 citations

Proceedings Article

Analysis on Preservation Characteristics of Modular Structure during HIV-1 Progression using Weighted and Normalized Graphlet Frequency Distribution

Sourav Biswas¹, Sumanta Ray², Sanghamitra Bandyopadhyay¹•Institutions (2)

Indian Statistical Institute¹, Aliah University²

01 Oct 2018

TL;DR: A computational framework to measure the preservation characteristics of modular structures between two biological networks is proposed, and results confirm higher similarity of topological properties between co-expressed modules of acute and chronic stages than between acute and nonprogressor stages.

Abstract: In this paper, we have proposed a computational framework to measure the preservation characteristics of modular structures between two biological networks. The preservation characteristics of co-expressed gene modules are identified by comparing the frequencies of a few predefined small substructures called graphlets in the coexpression networks of three HIV-1 infection stages: acute, nonprogressor, and chronic. A novel similarity measure has been proposed based on the frequencies and significances of those graphlets occurring in the networks. A widely used tool, GtrieScanner, is utilized to find the frequencies and significance of those graphlets in networks. Results confirm higher similarity of topological properties between co-expressed modules of acute and chronic stages than between acute and nonprogressor stages. Our method contributes to an important understanding of preservation characteristics of the modular organization in two different biological networks.

3 citations

Book Chapter

Multi-Objective Optimization Approaches in Biological Learning System on Microarray Data

Saurav Mallik¹, Tapas Bhadra², Soumita Seth², Sanghamitra Bandyopadhyay¹, Jianjiao Chen³•Institutions (3)

Indian Statistical Institute¹, Aliah University², University of Miami³

01 Jan 2018

TL;DR: This book chapter provides a comprehensive review of various multi-objective optimization techniques used in biological learning systems dealing with microarray or RNA-Seq data and points to a new direction for bioinspired learning systems related to multi-objective optimization.

Abstract: Multi-objective optimization is a well-known and efficient method in computer science. In various real-life problems, multiple conflicting objective functions need to be optimized simultaneously to attain the desired goal of the underlying pattern recognition task. The approaches of multi-objective optimization have a great impact in designing sophisticated learning systems, especially in building robust biological learning systems. With this in mind, in this book chapter, we provide a comprehensive review of various multi-objective optimization techniques used in biological learning systems dealing with microarray or RNA-Seq data. In this regard, the task of designing a multi-class cancer classification system employing a multi-objective optimization technique is first addressed. Next, how a gene regulatory network can be built from a perspective of multi-objective optimization is discussed. The next application deals with fuzzy clustering of categorical attributes using a multi-objective genetic algorithm. After this, how microarray data can be automatically clustered using multi-objective differential evolution is addressed. Then, the applicability of multi-objective particle swarm optimization techniques in identifying gene markers is explored. The next application concentrates on feature selection for microarray data using a multi-objective binary particle swarm optimization technique. Thereafter, a multi-objective optimization approach is addressed for producing differentially coexpressed modules during the progression of the HIV disease. In addition, we present a comparative study based on the literature, highlighting the advantages and limitations of the methods. Finally, our study points to a new direction for bioinspired learning systems related to multi-objective optimization.
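
When several conflicting objectives are optimized at once, as the abstract describes, solutions are usually compared by Pareto dominance rather than by a single score. The sketch below is a generic illustration of that concept, assuming every objective is to be minimized. It is not taken from the chapter, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Assumes all objectives are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among the objective vectors (1, 5), (2, 2), (3, 3) and (5, 1), only (3, 3) is dominated (by (2, 2)), so the other three form the non-dominated set.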

2 citations

Book Chapter

Exploration of Many-Objective Feature Selection for Recognition of Motor Imagery Tasks

Monalisa Pal¹, Sanghamitra Bandyopadhyay¹•Institutions (1)

Indian Statistical Institute¹

01 Jan 2018

TL;DR: This work analyses the classification performance of left/right motor imagery signals under many-objective feature selection by varying the feature dimension and the number of objectives, modeling feature selection as an optimization problem with six objectives.

Abstract: Brain–Computer Interfacing helps create a communication pathway between the brain and an external device such that the biological modality of performing the task can be bypassed. This necessitates fast and reliable decoding of brain signals, in which feature selection plays a crucial role. The literature reports improved performance of left/right motor imagery signal classification with many-objective feature selection, where several classification performance metrics have been maximized to obtain a good quality feature set. This work analyses the classification performance by varying the feature dimension and number of objectives. A recent many-objective optimization algorithm coupled with objective reduction, viz. $\alpha$-DEMO, has been used for modeling the feature selection as an optimization problem with six objectives. The results obtained in this work have been statistically validated by the Friedman test.

1 citation

Proceedings Article

Consensus of Subjective Preferences of Multiple Occupants for Building Energy Management

Monalisa Pal¹, Sanghamitra Bandyopadhyay¹•Institutions (1)

Indian Statistical Institute¹

01 Nov 2018

TL;DR: Inspired by generalized Nash bargaining solutions, the proposed approach obtains a fair consensus solution with respect to the preferences of multiple occupants and can benefit from more physically significant utility functions for each decision-maker which, in turn, can aid the energy policymakers in the long run.

Abstract: In the field of building energy management, comfort levels are often targeted with respect to a single occupant. In studies with multiple occupants, either a hierarchical ordering of occupants with their respective preferences is considered or aggregated results with respect to individual preferences are considered. Studies that consider the subjective preferences of multiple occupants with equal priority are scarce. Inspired by generalized Nash bargaining solutions, the proposed approach obtains a fair consensus solution with respect to the preferences of multiple occupants. This approach is implemented on a real-life scenario of zero cost human-based energy retrofit planning for building energy management, using several possible approaches to integrate the fair consensus search with the search for Pareto-optimality pertaining to the multiple objectives characterizing occupants’ comfort. In future, this work can benefit from more physically significant utility functions for each decision-maker which, in turn, can aid the energy policymakers in the long run.
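
The abstract says the approach is inspired by generalized Nash bargaining solutions but does not give the formulation used. As a rough sketch under that caveat, the textbook generalized Nash bargaining solution picks the option that maximizes the weighted product of each party's utility gain over a disagreement point. The code below applies this to a finite list of candidate solutions. The function name, the utilities and the weights are illustrative assumptions, not values from the paper.

```python
import math


def nash_bargaining_choice(utilities, disagreement, weights=None):
    """Index of the candidate maximizing prod_i (u_i - d_i) ** w_i, or None.

    utilities: one utility vector per candidate solution (one entry per occupant).
    disagreement: utility each occupant receives if no agreement is reached.
    weights: bargaining powers; equal weights (the default) treat occupants alike.
    """
    n = len(disagreement)
    if weights is None:
        weights = [1.0 / n] * n
    best_idx, best_score = None, -math.inf
    for idx, u in enumerate(utilities):
        gains = [u_i - d_i for u_i, d_i in zip(u, disagreement)]
        if any(g <= 0 for g in gains):
            continue  # candidate leaves some occupant no better off; not admissible
        # Maximize the log of the weighted product for numerical stability.
        score = sum(w * math.log(g) for w, g in zip(weights, gains))
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx
```

With equal weights the product rewards balanced gains, so a candidate with utilities (0.5, 0.5) is preferred over (0.9, 0.1), which matches the equal-priority setting the abstract emphasizes.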

1 citation

Book Chapter

Involvement of MicroRNAs in Alzheimer’s Disease

Malay Bhattacharyya, Sanghamitra Bandyopadhyay

19 Feb 2018