scispace - formally typeset

Showing papers by "Sanghamitra Bandyopadhyay" published in 2018


Journal Article
TL;DR: Locality Sensitive Hashing (LSH), an approximate nearest neighbour search technique, is exploited to develop dropClust, a de novo clustering algorithm for large-scale single-cell data that outperformed existing best-practice methods in execution time, clustering accuracy and detectability of minor cell sub-types.
Abstract: Droplet-based single-cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale to such high-dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest neighbour search technique, to develop a de novo clustering algorithm for large-scale single-cell data. On a number of real datasets, dropClust outperformed the existing best-practice methods in terms of execution time, clustering accuracy and detectability of minor cell sub-types.
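To make LSH-based clustering concrete, here is a minimal sketch assuming cosine similarity between expression profiles and random-hyperplane (SimHash) LSH. It is not dropClust's implementation, which the abstract does not describe; the function names, the connected-components step and the parameters `n_tables`, `n_planes` and `min_sim` are illustrative assumptions. Only cells that land in the same hash bucket are compared exactly, which is what lets LSH avoid the all-pairs comparison.

```python
import math
import random
from collections import defaultdict


def random_hyperplanes(dim, n_planes, seed):
    """Draw n_planes random Gaussian normal vectors in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes)


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def lsh_cluster(cells, n_tables=4, n_planes=6, min_sim=0.9, seed=0):
    """Label cells by connected components of an approximate similarity graph
    (hypothetical clustering rule used only for illustration)."""
    parent = list(range(len(cells)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dim = len(cells[0])
    for t in range(n_tables):
        planes = random_hyperplanes(dim, n_planes, seed + t)
        buckets = defaultdict(list)
        for idx, cell in enumerate(cells):
            buckets[signature(cell, planes)].append(idx)
        # Exact similarity is computed only within each bucket.
        for members in buckets.values():
            for a in range(len(members)):
                for b in range(a + 1, len(members)):
                    i, j = members[a], members[b]
                    if cosine(cells[i], cells[j]) >= min_sim:
                        parent[find(i)] = find(j)
    return [find(i) for i in range(len(cells))]


# Toy "cells x genes" matrix with two obvious groups
cells = [[5, 0, 1], [5, 0, 1], [0, 4, 1], [0, 4, 1]]
labels = lsh_cluster(cells)
```

In practice, the number of tables and planes trades recall (similar cells sharing at least one bucket) against the number of exact comparisons.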

90 citations


Journal Article
TL;DR: This work proposes a novel many-objective optimization algorithm, viz.

36 citations


Journal Article
TL;DR: Droplet-based technologies have made it possible to profile the transcriptomes of several thousand cells in a day.
Abstract: With the emergence of droplet-based technologies, it has now become possible to profile transcriptomes of several thousand cells in a day. Although such a large single-cell cohort may ...

33 citations


Journal Article
TL;DR: This work proposes a novel computational framework for identifying significant combinatorial markers using both gene expression and methylation data, and expects that such combinations of markers will produce fewer false positives than individual markers.
Abstract: Identification of combinatorial markers from multiple data sources is a challenging task in bioinformatics. Here, we propose a novel computational framework for identifying significant combinatorial markers ($SCM$s) using both gene expression and methylation data. The gene expression and methylation data are integrated into a single continuous dataset as well as a (post-discretized) Boolean dataset based on their intrinsic (i.e., inverse) relationship. A novel combined score of methylation and expression data (viz., $CoMEx$) is introduced; it is computed on the integrated continuous data to identify an initial non-redundant set of genes. Thereafter, (maximal) frequent closed homogeneous genesets are identified by applying a well-known biclustering algorithm to the integrated Boolean data of this non-redundant set of genes. A novel sample-based weighted support ($WS$) is then proposed and calculated on the same integrated Boolean data to identify the non-redundant significant genesets. The top few resulting genesets are identified as potential $SCM$s. Since the proposed method generates fewer significant non-redundant genesets than other popular methods, it is much faster than the others. Applying the proposed technique to expression and methylation data for uterine tumor or prostate carcinoma produces a set of significant marker combinations. We expect that such combinations of markers will produce fewer false positives than individual markers.
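The abstract does not define $CoMEx$ or $WS$, so the following is only a sketch of the general idea under loudly stated assumptions: a hypothetical median-based rule that turns the inverse expression/methylation relationship into Boolean values, and a generic sample-weighted support of a geneset. Neither function is the paper's definition; `integrate_boolean`, `weighted_support` and the sample weights are illustrative.

```python
from statistics import median


def integrate_boolean(expr, meth):
    """HYPOTHETICAL integration rule: a gene is 'on' (1) in a sample when its
    expression is above the gene's median AND its methylation is below the
    gene's median, i.e. the two signals move in inverse directions."""
    boolean = {}
    for gene in expr:
        e_med, m_med = median(expr[gene]), median(meth[gene])
        boolean[gene] = [int(e > e_med and m < m_med)
                         for e, m in zip(expr[gene], meth[gene])]
    return boolean


def weighted_support(boolean, geneset, weights):
    """Generic weighted support: fraction of the total sample weight carried by
    samples in which every gene of the geneset is 'on'. With equal weights this
    reduces to ordinary itemset support."""
    covered = sum(w for s, w in enumerate(weights)
                  if all(boolean[g][s] for g in geneset))
    return covered / sum(weights)


# Toy data: 2 genes x 4 samples (values are illustrative, not from the paper)
expr = {"g1": [5, 1, 6, 2], "g2": [7, 2, 8, 1]}
meth = {"g1": [0.1, 0.9, 0.2, 0.8], "g2": [0.2, 0.8, 0.1, 0.9]}
boolean = integrate_boolean(expr, meth)
ws = weighted_support(boolean, ["g1", "g2"], weights=[1, 1, 2, 1])
```

Here both genes are "on" in samples 0 and 2, so the weighted support is (1 + 2) / 5 = 0.6; giving some samples more weight is what distinguishes it from plain support.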

25 citations


Journal Article
TL;DR: This article presents a new study of brain tissue in human and rhesus, based on data profiles of two methylated cytosine variants, using TF-miRNA-gene network-based module detection, and identifies co-methylated and multi-stage co-methylated gene modules.
Abstract: The study of epigenetics is currently a high-impact research topic, and multi-stage methylation is a promising area for high-dimensional analysis. In this article, we provide a new study (intra- and inter-species) of brain tissue in human and rhesus based on data profiles of two methylated cytosine variants (viz., 5-hydroxymethylcytosine (5hmC) and 5-methylcytosine (5mC) samples) through TF-miRNA-gene network-based module detection. First, we determine differentially 5hmC-methylated genes in human as well as rhesus for intra-species analysis, and differentially multi-stage methylated genes for inter-species analysis. Thereafter, we apply the weighted topological overlap matrix (TOM) measure and average linkage clustering in turn to these genesets for the intra- and inter-species studies. We identify co-methylated and multi-stage co-methylated gene modules using dynamic tree cut for the intra- and inter-species cases, respectively. Each module is represented by an individual color in the dendrogram. Gene Ontology and KEGG pathway-based analyses are then performed to identify the biological functions of the identified modules. Finally, the top ten regulator TFs and targeter miRNAs associated with the maximum number of gene modules are determined for both intra- and inter-species analysis. The novel TFs and miRNAs obtained from the analysis are: MYST3 and ZNF771 as TFs (for human intra-species analysis); BAZ2B, RCOR3 and ATF1 as TFs (for rhesus intra-species analysis); mml-miR-768-3p and mml-miR-561 as miRs (for rhesus intra-species analysis); and MYST3 and ZNF771 as miRs (for the inter-species study). Furthermore, we note the genes/TFs/miRNAs that have already been implicated in several serious brain-related diseases as well as rare neglected diseases (e.g., Wolf-Hirschhorn syndrome, Joubert syndrome, Huntington's disease, Simian Immunodeficiency Virus (SIV)-mediated encephalitis, Parkinson's disease, bipolar disorder and schizophrenia).
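The weighted TOM step mentioned above can be sketched with the standard WGCNA-style formula $\mathrm{TOM}_{ij} = (l_{ij} + a_{ij}) / (\min(k_i, k_j) + 1 - a_{ij})$, where $l_{ij} = \sum_u a_{iu} a_{uj}$ and $k_i = \sum_u a_{iu}$. The abstract does not state how the adjacency was built; the soft-threshold adjacency $a_{ij} = |\mathrm{cor}(i, j)|^{\beta}$ with $\beta = 6$ used below is an assumption, and the toy profiles are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, beta=6):
    """ASSUMED adjacency: a_ij = |cor(i, j)|^beta for i != j, zero diagonal."""
    n = len(profiles)
    return [[0.0 if i == j else abs(pearson(profiles[i], profiles[j])) ** beta
             for j in range(n)] for i in range(n)]


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij sums
    the products of shared-neighbour weights and k_i is node connectivity."""
    n = len(adj)
    k = [sum(row) for row in adj]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Toy methylation profiles: 4 genes x 4 samples
profiles = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 3, 2, 1], [1, 3, 2, 4]]
tom = topological_overlap(soft_threshold_adjacency(profiles))
# 1 - TOM is the dissimilarity that average-linkage clustering would consume
dissimilarity = [[1.0 - v for v in row] for row in tom]
```

The subsequent average-linkage clustering and dynamic tree cut described in the abstract are omitted here; `dissimilarity` is the matrix those steps would operate on.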

23 citations


Proceedings Article
01 Jan 2018
TL;DR: This article demonstrates layered locality sensitive hashing in genomic sequence comparison, which reduces the search time by 93.6%, while producing results almost as good as the exact ones.
Abstract: In this article, we demonstrate Layered Locality Sensitive Hashing in genomic sequence comparison. Locality Sensitive Hashing based algorithms have already proved successful for approximate nearest neighbor search in high-dimensional data. Genomic database search is the primary task in homology detection and motif identification. However, the huge size of genomes and their unknown repetitive regions make this task especially difficult. To tackle this problem, we introduce layered locality sensitive hashing for large-scale genomic comparisons. The proposed method reduces the search time by 93.6% while producing results almost as good as the exact ones.
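The abstract does not explain how the layered scheme works, so the following is only a sketch of standard MinHash LSH with banding over k-mer sets. It illustrates the general principle: hashing shortlists candidate database sequences so that only those are passed to an exact comparison. The k-mer size, band and row counts, and the CRC32-based hash family are assumptions, and this is not the paper's layered LSH.

```python
import zlib
from collections import defaultdict


def kmers(seq, k):
    """Set of all length-k substrings; assumes len(seq) >= k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash(kmer_set, n_hashes):
    """MinHash signature using CRC32 salted with the hash index as a cheap,
    deterministic family of hash functions."""
    return [min(zlib.crc32(f"{salt}:{km}".encode()) for km in kmer_set)
            for salt in range(n_hashes)]


def band_keys(sig, bands, rows):
    """Split a signature into `bands` bands of `rows` values each."""
    return [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]


def build_index(database, k=4, bands=8, rows=2):
    index = defaultdict(set)
    for name, seq in database.items():
        for key in band_keys(minhash(kmers(seq, k), bands * rows), bands, rows):
            index[key].add(name)
    return index


def candidates(index, query, k=4, bands=8, rows=2):
    """Database entries sharing at least one band with the query; only these
    would go on to an exact (e.g. alignment-based) comparison."""
    hits = set()
    for key in band_keys(minhash(kmers(query, k), bands * rows), bands, rows):
        hits |= index.get(key, set())
    return hits


database = {"seqA": "ACGTACGTTGCAACGT", "seqB": "TTTTGGGGCCCCAAAA"}
index = build_index(database)
found = candidates(index, "ACGTACGTTGCAACGT")
```

More rows per band make a band match stricter (fewer false candidates), while more bands raise the chance that a truly similar sequence matches at least once.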

5 citations


Proceedings Article
01 Oct 2018
TL;DR: A computational framework to measure the preservation characteristics of modular structures between two biological networks is proposed; results confirm higher similarity of topological properties between co-expressed modules of the acute and chronic stages than between those of the acute and nonprogressor stages.
Abstract: In this paper, we propose a computational framework to measure the preservation characteristics of modular structures between two biological networks. The preservation characteristics of co-expressed gene modules are identified by comparing the frequencies of a few predefined small substructures, called graphlets, in the co-expression networks of three HIV-1 infection stages: acute, nonprogressor, and chronic. A novel similarity measure is proposed based on the frequencies and significance of the graphlets occurring in the networks. The widely used tool GtrieScanner is used to find the frequencies and significance of these graphlets in the networks. Results confirm higher similarity of topological properties between co-expressed modules of the acute and chronic stages than between those of the acute and nonprogressor stages. Our method contributes to understanding the preservation characteristics of the modular organization in two different biological networks.
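As a rough illustration of graphlet-based comparison, here is a sketch that counts the two connected 3-node graphlets (open paths and triangles) in each module and compares the frequency vectors with cosine similarity. This is a stand-in: the paper's own similarity measure, which also uses graphlet significance from GtrieScanner, is not specified in the abstract, and the function names and toy modules are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def three_node_graphlets(edges):
    """Return [open paths, triangles] over all node triples (O(n^3), toy scale)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    paths = triangles = 0
    for a, b, c in combinations(sorted(adj), 3):
        n_edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if n_edges == 3:
            triangles += 1
        elif n_edges == 2:
            paths += 1
    return [paths, triangles]


def cosine_similarity(u, v):
    """Cosine similarity of two frequency vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 0.0 if norm == 0 else sum(x * y for x, y in zip(u, v)) / norm


# Two hypothetical co-expression modules with the same shape
module_1 = [(1, 2), (2, 3), (1, 3), (3, 4)]
module_2 = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
sim = cosine_similarity(three_node_graphlets(module_1), three_node_graphlets(module_2))
```

Larger graphlets (4 or 5 nodes) give finer-grained profiles, which is why dedicated tools such as GtrieScanner are used rather than brute-force enumeration.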

3 citations


Book Chapter
01 Jan 2018
TL;DR: This book chapter provides a comprehensive review of multi-objective optimization techniques used in biological learning systems that deal with microarray or RNA-Seq data, and points to a new direction for bioinspired learning systems based on multi-objective optimization.
Abstract: Multi-objective optimization is a well-known and efficient approach in computer science. In many real-life problems, multiple conflicting objective functions need to be optimized simultaneously to attain the desired goal of the underlying pattern recognition task. Multi-objective optimization approaches have had a great impact on the design of sophisticated learning systems, especially robust biological learning systems. With this in mind, in this book chapter we provide a comprehensive review of multi-objective optimization techniques used in biological learning systems that deal with microarray or RNA-Seq data. First, we address the design of a multi-class cancer classification system employing a multi-objective optimization technique. Next, we discuss how a gene regulatory network can be built from a multi-objective optimization perspective. The next application deals with fuzzy clustering of categorical attributes using a multi-objective genetic algorithm. After this, we address how microarray data can be automatically clustered using multi-objective differential evolution. Then, we explore the applicability of multi-objective particle swarm optimization techniques for identifying gene markers. The next application concentrates on feature selection for microarray data using a multi-objective binary particle swarm optimization technique. Thereafter, we address a multi-objective optimization approach for producing differentially coexpressed modules during the progression of HIV disease. In addition, we present a comparative study based on the literature, highlighting the advantages and limitations of the methods. Finally, our study points to a new direction for bioinspired learning systems based on multi-objective optimization.
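The common thread of the techniques reviewed above is Pareto dominance: with conflicting objectives, the output is a set of non-dominated trade-off solutions rather than a single optimum. A minimal sketch, assuming all objectives are maximized (a minimized objective such as gene count is negated); the candidate values are illustrative only.

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (all objectives
    maximized): a is at least as good everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Non-dominated subset of `points` (simple O(n^2) filter).
    A point never dominates itself, so no self-exclusion is needed."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (accuracy, -number_of_genes) pairs for candidate gene subsets
candidates = [(0.80, -5), (0.85, -8), (0.90, -12), (0.84, -12), (0.70, -20)]
front = pareto_front(candidates)
```

Here `(0.84, -12)` is dropped because `(0.85, -8)` is both more accurate and uses fewer genes, while the three remaining subsets represent genuine trade-offs between accuracy and compactness.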

2 citations


Book Chapter
01 Jan 2018
TL;DR: This work analyses how the performance of left/right motor imagery signal classification with many-objective feature selection varies with the feature dimension and the number of objectives, modeling feature selection as an optimization problem with six objectives.
Abstract: Brain–Computer Interfacing helps create a communication pathway between the brain and an external device, so that the biological modality of performing a task can be bypassed. This requires fast and reliable decoding of brain signals, in which feature selection plays a crucial role. The literature reports improved performance of left/right motor imagery signal classification with many-objective feature selection, where several classification performance metrics are maximized to obtain a good-quality feature set. This work analyses classification performance while varying the feature dimension and the number of objectives. A recent many-objective optimization algorithm coupled with objective reduction, viz. α-DEMO, is used to model feature selection as an optimization problem with six objectives. The results obtained in this work are statistically validated with the Friedman test.
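The Friedman test used for validation ranks the compared settings within each block (e.g. each subject or fold) and checks whether the mean ranks differ more than chance would allow, using $\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ with $N$ blocks, $k$ settings and mean ranks $R_j$. A minimal stdlib sketch follows; the accuracy table is made up for illustration, and no tie correction is applied.

```python
def average_ranks(row):
    """Rank values in ascending order (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores[b][m]: result of setting m on block b. Returns chi2_F, to be
    compared against a chi-square distribution with k - 1 degrees of freedom."""
    n, k = len(scores), len(scores[0])
    ranked = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[j] for r in ranked) / n for j in range(k)]
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)


# Hypothetical accuracies of three feature-selection settings on four subjects
scores = [[0.70, 0.75, 0.80], [0.65, 0.72, 0.78], [0.71, 0.74, 0.79], [0.60, 0.70, 0.76]]
chi2 = friedman_statistic(scores)
```

Because every subject ranks the settings identically here, the statistic reaches its maximum $N(k-1) = 8$. For real analyses, `scipy.stats.friedmanchisquare` computes the statistic and p-value and also applies a tie correction.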

1 citation


Proceedings Article
01 Nov 2018
TL;DR: Inspired by generalized Nash bargaining solutions, the proposed approach obtains a fair consensus solution with respect to the preferences of multiple occupants and can benefit from more physically significant utility functions for each decision-maker which, in turn, can aid the energy policymakers in the long run.
Abstract: In the field of building energy management, comfort levels are often targeted with respect to a single occupant. Studies with multiple occupants either consider a hierarchical ordering of occupants with their respective preferences or consider results aggregated over individual preferences. Few studies consider the subjective preferences of multiple occupants with equal priority. Inspired by generalized Nash bargaining solutions, the proposed approach obtains a fair consensus solution with respect to the preferences of multiple occupants. The approach is implemented on a real-life scenario of zero-cost, human-based energy retrofit planning for building energy management, using several possible approaches to integrate the search for a fair consensus with the search for Pareto-optimality with respect to the multiple objectives characterizing the occupants' comfort. In the future, this work can benefit from more physically significant utility functions for each decision-maker, which in turn can aid energy policymakers in the long run.
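A generalized Nash bargaining solution selects the alternative that maximizes $\prod_i (u_i - d_i)^{w_i}$, where $u_i$ is occupant $i$'s utility, $d_i$ the disagreement (no-deal) utility and $w_i$ a bargaining weight; equal weights encode equal priority. The abstract does not give the utility functions or how the search is coupled with Pareto optimization, so this sketch assumes a finite set of candidate plans (for example, taken from a Pareto front) with known per-occupant utilities. All names and numbers are hypothetical.

```python
import math


def nash_bargaining_choice(candidates, disagreement, weights=None):
    """Pick the candidate maximizing the generalized Nash product
    prod_i (u_i - d_i)^{w_i}, evaluated in log space as
    sum_i w_i * log(u_i - d_i) for numerical stability. Candidates that leave
    any occupant no better off than the disagreement point are excluded."""
    n = len(disagreement)
    weights = weights or [1.0 / n] * n  # equal priority by default
    best, best_score = None, -math.inf
    for name, utils in candidates.items():
        gains = [u - d for u, d in zip(utils, disagreement)]
        if min(gains) <= 0:
            continue
        score = sum(w * math.log(g) for w, g in zip(weights, gains))
        if score > best_score:
            best, best_score = name, score
    return best


# Hypothetical comfort utilities of two occupants for three retrofit plans
plans = {"plan_A": (9.0, 1.0), "plan_B": (4.0, 4.0), "plan_C": (1.0, 9.0)}
choice = nash_bargaining_choice(plans, disagreement=(0.0, 0.0))
```

With equal weights the balanced `plan_B` wins (Nash product 16 versus 9 for the lopsided plans), even though every plan has the same utilitarian sum of 10; skewing the weights toward one occupant shifts the choice toward that occupant's preferred plan.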

1 citation