Journal ArticleDOI

A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

TL;DR: The expanded CMap is reported, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that is shown to be highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts.
About: This article was published in Cell on 2017-11-30 and is currently open access. It has received 1943 citations to date.
Citations
Journal ArticleDOI
TL;DR: This work proposes two models that complement the treatment of breast cancer patients and uses the Pearson correlation coefficient and accuracy as the performance measures for the regression and classification models, respectively.
Abstract: Multiomics data of cancer patients and cell lines, in synergy with deep learning techniques, have aided in unravelling predictive problems related to cancer research and treatment. However, there is still room for improvement in the performance of the existing models based on the aforementioned combination. In this work, we propose two models that complement the treatment of breast cancer patients. First, we discuss our deep learning-based model for breast cancer subtype classification. Second, we propose DCNN-DR, a deep convolution neural network-drug response method for predicting the effectiveness of drugs on in vitro and in vivo breast cancer datasets. Finally, we applied DCNN-DR for predicting effective drugs for the basal-like breast cancer subtype and validated the results with the information available in the literature. The models proposed use late integration methods and have fairly better predictive performance compared to the existing methods. We use the Pearson correlation coefficient and accuracy as the performance measures for the regression and classification models, respectively.

4 citations

Journal ArticleDOI
TL;DR: In this paper, a network-based method for drug repurposing against different stages of Alzheimer's disease (AD) severity is proposed, which ranks the candidate repurposed drugs based on a weighted sum of connections in a network resembling the structural similarity with failed, approved or currently ongoing drugs.
Abstract: Alzheimer's disease (AD) is a progressive neurodegenerative disease and the most common type of dementia. With no disease-curing drugs available and an ever-growing AD-related healthcare burden, novel approaches for identifying therapies are needed. In this work, we propose stage-specific candidate repurposed drugs against AD by using a novel network-based method for drug repurposing against different stages of AD severity. For each AD stage, this approach a) ranks the candidate repurposed drugs based on a novel network-based score emerging from the weighted sum of connections in a network resembling the structural similarity with failed, approved or currently ongoing drugs b) re-ranks the candidate drugs based on functional, structural and a priori information according to a recently developed method by our group and c) checks and re-ranks for permeability through the Blood Brain Barrier (BBB). Overall, we propose for further experimental validation 10 candidate repurposed drugs for each AD stage comprising a set of 26 elite candidate repurposed drugs due to overlaps between the three AD stages. We applied our methodology in a retrospective way on the known clinical trial drugs till 2016 and we show that we were able to highly rank a drug that did enter clinical trials in the following year. We expect that our proposed network-based drug-repurposing methodology will serve as a paradigm for application for ranking candidate repurposed drugs in other brain diseases beyond AD.

4 citations

Posted ContentDOI
10 Aug 2020-bioRxiv
TL;DR: It is concluded that even for cells in which drug responses have not been fully characterized, it is possible to identify unassayed drugs that reverse in those cells the expression signatures observed in disease.
Abstract: Motivation: Drug re-positioning allows expedited discovery of new applications for existing compounds, but re-screening vast compound libraries is often prohibitively expensive. “Connectivity mapping” is a process that links drugs to diseases by identifying drugs whose impact on expression in a collection of cells most closely reverses the disease’s impact on expression in disease-relevant tissues. The high throughput LINCS project has expanded the universe of compounds, cellular perturbations, and cell types for which data are available, but even with this effort, many potentially clinically useful combinations are missing. To evaluate the possibility of finding disease-relevant drug connectivity despite missing data, we compared methods using cross-validation on a complete subset of the LINCS data. Results: Modified recommender systems with either neighborhood-based or SVD imputation methods were compared to autoencoders and two naive methods. All were evaluated for accuracy in prediction of both expression signatures and connectivity query responses. We demonstrate that cellular context is important, and that it is possible to predict cell-specific drug responses with improved accuracy over naive approaches. Neighborhood-based collaborative filtering was the most successful, improving prediction accuracy in all tested cells. We conclude that even for cells in which drug responses have not been fully characterized, it is possible to identify drugs that reverse the expression signatures observed in disease. Contact: donna.slonim@tufts.edu. Supplementary information: bcb.cs.tufts.edu/cmap

4 citations
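
As a rough illustration of the connectivity-mapping idea in the abstract above (ranking drugs by how closely their expression impact reverses the disease's impact), the sketch below scores each drug by the Pearson correlation between its signature and the disease signature, so the most negative score marks the strongest candidate reverser. The correlation score, the function names, and the toy signatures are assumptions made for illustration; they are not the scoring used by CMap, LINCS, or the cited preprint.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_by_reversal(disease, drug_signatures):
    """Sort drugs from most negative (reversing) to most positive (mimicking)."""
    scores = {name: pearson(disease, sig) for name, sig in drug_signatures.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical per-gene expression changes (same gene order in every list).
disease = [2.0, -1.0, 0.5, -2.0]
drugs = {
    "drug_a": [-2.0, 1.0, -0.5, 2.0],   # exact opposite of the disease signature
    "drug_b": [2.0, -1.0, 0.5, -2.0],   # identical to the disease signature
    "drug_c": [0.5, 0.5, -1.0, 0.0],    # uncorrelated with the disease signature
}
ranking = rank_by_reversal(disease, drugs)
```

Here `ranking` lists `drug_a` first, since its signature is the exact negative of the disease signature. A real analysis would work with thousands of genes and a rank-based enrichment score rather than a plain correlation.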

Posted ContentDOI
14 May 2020-bioRxiv
TL;DR: DComboNet, a network-based method for prioritizing candidate drug combinations, achieves better performance than previous methods, and its network model may facilitate the understanding of combination mechanisms.
Abstract: Anti-cancer drug combination is an effective solution to improve treatment efficacy and overcome resistance. Here we propose a network-based method (DComboNet) to prioritize the candidate drug combinations. The level one model is to predict generalized anti-cancer drug combination effectiveness and level two model is to predict personalized drug combinations. By integrating drugs, genes, pathways and their associations, DComboNet achieves better performance than previous methods, with high AUC value of around 0.8. The level two model performs better than level one model by introducing cancer sample specific transcriptome data into network construction. DComboNet is further applied on finding combinable drugs for sorafenib in hepatocellular cancer, and the results are verified with literatures and cell line experiments. More importantly, three potential mechanism modes of combinations were inferred based on network analysis. In summary, DComboNet is valuable for prioritizing drug combination and the network model may facilitate the understanding of the combination mechanisms.

4 citations

DOI
01 Dec 2021
TL;DR: In this article, the residual fully-connected neural network (RFCN) is proposed for modeling genomic profiling data and is implemented in AutoGenome, an end-to-end, automated deep learning framework that supports both supervised and unsupervised learning tasks.
Abstract: Deep learning has achieved great successes in traditional fields like computer vision (CV), natural language processing (NLP), speech processing, and more. These advancements have greatly inspired researchers in genomics and made deep learning in genomics an exciting and popular topic. The convolutional neural network (CNN) and recurrent neural network (RNN) are frequently used to solve genomic sequencing and prediction problems, and the multilayer perceptron (MLP) and auto-encoders (AE) are frequently used for genomic profiling data like RNA expression data and gene mutation data. Here, we introduce a new neural network architecture, the residual fully-connected neural network (RFCN), and describe its advantage in modeling genomic profiling data. We also incorporate AutoML algorithms and implement AutoGenome, an end-to-end, automated deep learning framework for genomic studies. By utilizing the proposed RFCN architecture, automatic hyper-parameter search, and neural architecture search algorithms, AutoGenome can automatically train high-performance deep learning models for various kinds of genomic profiling data. To help researchers better understand the trained models, AutoGenome can assess the importance of different features and export the most critical features for supervised learning tasks and the representative latent vectors for unsupervised learning tasks. We expect AutoGenome will become a popular tool in genomic studies.

4 citations

References
Journal ArticleDOI
TL;DR: The Gene Set Enrichment Analysis (GSEA) method interprets gene expression data by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations

Journal Article
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.

30,124 citations

Journal ArticleDOI
TL;DR: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data and provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments.
Abstract: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in house gene expression databases that benefit from coherent data sets, and which are constructed to facilitate a particular analytic method, but rather complement these by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, and were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.

10,968 citations

Journal ArticleDOI
TL;DR: BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences; the paper describes how it was optimized.
Abstract: Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.

8,326 citations
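
The first stage described in the BLAT abstract above (an index of nonoverlapping K-mers in the genome, then a lookup to find regions likely to match the query) can be sketched roughly as follows. This is a toy illustration under simplifying assumptions, not BLAT's implementation: the function names, the exact-match-only lookup, and the tiny sequences are all hypothetical, and the later alignment and stitching stages are omitted.

```python
from collections import defaultdict


def build_index(genome, k):
    """Index the nonoverlapping k-mers of the genome: start positions 0, k, 2k, ..."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return dict(index)


def find_hits(index, query, k):
    """Slide over every overlapping k-mer of the query and look it up in the index.

    Returns (query_position, genome_position) pairs for exact k-mer matches.
    """
    hits = []
    for q_pos in range(len(query) - k + 1):
        for g_pos in index.get(query[q_pos:q_pos + k], []):
            hits.append((q_pos, g_pos))
    return hits


# Toy example: the query occurs in the genome starting at position 5.
genome = "ACGTACGTTTGACCA"
index = build_index(genome, 4)
hits = find_hits(index, "CGTTTGAC", 4)
```

In this example the only hit pairs query position 3 with genome position 8, which places the start of the query at genome position 8 - 3 = 5. Because only every k-th genome position is indexed, the index stays small, while scanning all overlapping K-mers of the query still lets it find such a match.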

Journal ArticleDOI
TL;DR: This paper proposes parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples.
Abstract: Summary: Non-biological experimental variation or “batch effects” are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparable to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.

6,319 citations
