Journal ArticleDOI

A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

TL;DR: The expanded CMap is reported, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that is shown to be highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts.
About: This article was published in Cell on 2017-11-30 and is currently open access. It has received 1,943 citations to date.
Citations
Journal ArticleDOI
TL;DR: It is suggested that combinations of SRC or MEK inhibitors with gemcitabine possess synergistic effects on the squamous subtype of PDAC cells and warrant further investigation.
Abstract: Pancreatic adenocarcinoma (PDAC) is a highly aggressive cancer with a high chance of recurrence, limited treatment options, and poor prognosis. A recent study has classified pancreatic cancers into four molecular subtypes: (1) squamous, (2) immunogenic, (3) pancreatic progenitor and (4) aberrantly differentiated endocrine exocrine. Among all the subtypes, the squamous subtype has the worst prognosis. This study aims to utilize large-scale genomic datasets and computational systems biology to identify potential drugs targeting the squamous subtype of PDAC through combination therapy. Using the transcriptomic data available from the International Cancer Genome Consortium, Cancer Cell Line Encyclopedia and Connectivity Map, we identified 26 small molecules that could target the squamous subtype of PDAC. These include inhibitors targeting the SRC proto-oncogene (SRC) and the mitogen-activated protein kinase kinase 1/2 (MEK1/2). Further analyses demonstrated that the SRC inhibitors (dasatinib and PP2) and the MEK1/2 inhibitor (pimasertib) synergistically enhanced gemcitabine sensitivity specifically in the squamous subtype of PDAC cells (SW1990 and BxPC3), but not in the PDAC progenitor cells (AsPC1). Further analysis revealed that the synergistic effects are dependent on SRC or MEK1/2 activities, as overexpression of SRC or MEK1/2 completely abrogated the synergistic effects of the SRC inhibitors (dasatinib and PP2) and the MEK1/2 inhibitor (pimasertib). In contrast, no significant toxicity was observed in the MRC5 human lung fibroblast and ARPE-19 human retinal pigment epithelial cells. Together, our findings suggest that combinations of SRC or MEK inhibitors with gemcitabine possess synergistic effects on the squamous subtype of PDAC cells and warrant further investigation.

35 citations

Journal ArticleDOI
TL;DR: This study integrated the next-generation L1000-based CMap and an analytic Web tool, the L1000FWD, for systematic analyses of polypharmacology and drug repurposing, and identified KM-00927 and BRD-K75081836 as novel HDAC inhibitors and mitomycin C as a topoisomerase IIB inhibitor.
Abstract: Drug repurposing aims to find novel indications of clinically used or experimental drugs. Because drug data already exist, drug repurposing may save time and cost, and bypass safety concerns. Polypharmacology, one drug with multiple targets, serves as a basis for drug repurposing. Large-scale databases have been accumulated in recent years, and utilization and integration of these databases would be highly helpful for polypharmacology and drug repurposing. The Connectivity Map (CMap) is a database collecting gene-expression profiles of drug-treated human cancer cells, which has been widely used for investigation of polypharmacology and drug repurposing. In this study, we integrated the next-generation L1000-based CMap and an analytic Web tool, the L1000FWD, for systematic analyses of polypharmacology and drug repurposing. Two different types of anti-cancer drugs were used as proof-of-concept examples, including histone deacetylase (HDAC) inhibitors and topoisomerase inhibitors. We identified KM-00927 and BRD-K75081836 as novel HDAC inhibitors and mitomycin C as a topoisomerase IIB inhibitor. Our study provides a prime example of utilization and integration of the freely available public resources for systematic polypharmacology analysis and drug repurposing.

35 citations

Journal ArticleDOI
TL;DR: A proof-of-concept trial using the TCGA breast cancer dataset demonstrates the application of Dr Insight for a comprehensive analysis, from redirection of drug therapies, to a systematic construction of disease-specific drug-target networks.
Abstract: Motivation: Transcriptome-based computational drug repurposing has attracted considerable interest by bringing about faster and more cost-effective drug discovery. Nevertheless, key limitations of the current drug connectivity-mapping paradigm have long been overlooked, including the lack of effective means to determine optimal query gene signatures. Results: The novel approach Dr Insight implements a frame-breaking statistical model for the 'hand-shake' between disease and drug data. The genome-wide screening of concordantly expressed genes (CEGs) eliminates the need for subjective selection of query signatures, while also yielding a better proxy for potential disease-specific drug targets. Extensive comparisons on simulated and real cancer datasets have validated the superior performance of Dr Insight over several popular drug-repurposing methods to detect known cancer drugs and drug-target interactions. A proof-of-concept trial using the TCGA breast cancer dataset demonstrates the application of Dr Insight for a comprehensive analysis, from redirection of drug therapies, to a systematic construction of disease-specific drug-target networks. Availability and implementation: The Dr Insight R package is available at https://cran.r-project.org/web/packages/DrInsight/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.

35 citations

Journal ArticleDOI
TL;DR: This review explores recent synthetic strategies for the production of bioactive small molecules and concludes with a presentation of current methods that enable the assessment of the biological performance diversity of small-molecule libraries.
Abstract: Diversity-oriented synthesis has historically focused on the generation of small-molecule collections with considerable scaffold, stereochemical, and appendage diversity. Recently, this focus has begun to shift to the production of small-molecule libraries with diverse biological activities. It is currently not clear which properties and structural features of molecules are predictive of diverse performance in biological assays, and a better understanding of this relationship is critical for the development of performance-diverse small-molecule libraries for the discovery of novel probes for challenging targets. This review explores recent synthetic strategies for the production of bioactive small molecules and concludes with a presentation of current methods that enable the assessment of the biological performance diversity of small-molecule libraries.

35 citations

Posted ContentDOI
08 Jun 2017-bioRxiv
TL;DR: This work shows that mitigating off-target effects is feasible in these datasets via computational methodologies to produce a Consensus Gene Signature (CGS), and compares RNAi technology to clustered regularly interspaced short palindromic repeat (CRISPR)-based knockout by analysis of 373 sgRNAs in 6 cell lines, showing that the on-target efficacies are comparable.
Abstract: The application of RNA interference (RNAi) to mammalian cells has provided the means to perform phenotypic screens to determine the functions of genes. Although RNAi has revolutionized loss-of-function genetic experiments, it has been difficult to systematically assess the prevalence and consequences of off-target effects. The Connectivity Map (CMAP) represents an unprecedented resource to study the gene expression consequences of expressing short hairpin RNAs (shRNAs). Analysis of signatures for over 13,000 shRNAs applied in 9 cell lines revealed that miRNA-like off-target effects of RNAi are far stronger and more pervasive than generally appreciated. We show that mitigating off-target effects is feasible in these datasets via computational methodologies to produce a Consensus Gene Signature (CGS). In addition, we compared RNAi technology to clustered regularly interspaced short palindromic repeat (CRISPR)-based knockout by analysis of 373 sgRNAs in 6 cell lines, and show that the on-target efficacies are comparable, but CRISPR technology is far less susceptible to systematic off-target effects. These results will help guide the proper use and analysis of loss-of-function reagents for the determination of gene function.

35 citations


Cites background from "A Next Generation Connectivity Map:..."

  • ...Conceptually, the pattern of mRNA changes serves as a signature of the perturbation, and correlations between these signatures allows insight into connections between genes, drugs, and disease states [1-5]....

    [...]
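
The idea in the excerpt above, that correlations between perturbation signatures point to connections, can be illustrated with a minimal sketch. It assumes each signature is a list of per-gene expression changes in the same gene order and uses a plain Pearson correlation; the function names and the toy library are hypothetical, and this is not the scoring method used by the Connectivity Map itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length signatures."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_connections(query, library):
    """Sort library signatures from most similar to most opposite to the query."""
    scored = [(name, pearson(query, sig)) for name, sig in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): five genes, three reference perturbations.
query_signature = [2.0, 1.0, 0.0, -1.0, -2.0]
reference_library = {
    "mimic": [4.0, 2.0, 0.0, -2.0, -4.0],
    "reverser": [-1.0, -0.5, 0.0, 0.5, 1.0],
    "unrelated": [1.0, -2.0, 0.0, -2.0, 1.0],
}
ranking = rank_connections(query_signature, reference_library)
```

Signatures at the top of the ranking resemble the query, while those at the bottom tend to reverse it; both ends can be informative when looking for connections.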

References
Journal ArticleDOI
TL;DR: Gene Set Enrichment Analysis (GSEA) interprets gene expression data by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
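
The abstract does not spell out the statistic behind GSEA. As a rough illustration, the sketch below follows the commonly described running-sum enrichment score: walk down a ranked gene list, step up at genes in the set and down otherwise, and report the largest deviation from zero. The function name, the weighting exponent `p`, and the toy data are assumptions made for illustration; significance testing by permutation is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score of gene_set over a ranked gene list.

    ranked_genes and scores must be in the same (ranked) order.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = hits.count(False)
    hit_total = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")

    running, best = 0.0, 0.0
    for score, is_hit in zip(scores, hits):
        # Step up (weighted by the score) on a hit, step down uniformly on a miss.
        running += abs(score) ** p / hit_total if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list (hypothetical): genes sorted by decreasing score.
genes = ["a", "b", "c", "d", "e", "f"]
gene_scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
top_set_es = enrichment_score(genes, gene_scores, {"a", "b"})
bottom_set_es = enrichment_score(genes, gene_scores, {"e", "f"})
```

A set concentrated at the top of the ranking gets a positive score and a set concentrated at the bottom gets a negative one, which is why gene sets rather than single genes drive the interpretation.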

34,830 citations

Journal Article
TL;DR: t-SNE is a technique that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.
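
The abstract gives no formulas. As a hedged illustration of one ingredient of the method, the sketch below computes pairwise similarities between map points with a heavy-tailed Student t kernel (one degree of freedom), normalized over all pairs, which is the form usually given for t-SNE's low-dimensional similarities. The function name and toy points are assumptions; the high-dimensional similarities and the gradient-based optimization are not shown.

```python
def student_t_similarities(points):
    """Pairwise similarities q[i][j] between low-dimensional map points.

    Uses the kernel 1 / (1 + squared distance), normalized so that all
    off-diagonal entries sum to 1. Diagonal entries are 0.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = 1.0 / (1.0 + sq_dist)
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


# Toy 2-D map (hypothetical): two nearby points and one distant point.
map_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
q = student_t_similarities(map_points)
```

Because the kernel decays slowly with distance, moderately dissimilar points can sit far apart in the map without a large penalty, which is one way to read the abstract's remark about reduced crowding in the center.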

30,124 citations

Journal ArticleDOI
TL;DR: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data and provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments.
Abstract: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in-house gene expression databases that benefit from coherent data sets, and which are constructed to facilitate a particular analytic method, but rather complement these by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, and were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
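
The three entities described in the abstract can be pictured as a small data model. The sketch below is illustrative only: the class names mirror the abstract, but the field names, the validation rule, and the accession strings are assumptions and do not reflect GEO's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """A list of probes defining which molecules may be detected."""
    accession: str
    probes: list[str]


@dataclass
class Sample:
    """Abundance measurements that reference exactly one platform."""
    accession: str
    platform: Platform
    abundance: dict[str, float]

    def __post_init__(self):
        # Assumed rule: a sample may only report probes its platform defines.
        unknown = set(self.abundance) - set(self.platform.probes)
        if unknown:
            raise ValueError(f"probes not on platform: {sorted(unknown)}")


@dataclass
class Series:
    """A group of samples that together make up an experiment."""
    accession: str
    title: str
    samples: list[Sample] = field(default_factory=list)


# Hypothetical records for illustration.
demo_platform = Platform("PLATFORM_demo", ["probe_1", "probe_2"])
demo_sample = Sample("SAMPLE_demo", demo_platform, {"probe_1": 7.5})
demo_series = Series("SERIES_demo", "toy experiment", [demo_sample])
```

Keeping the platform as a separate object that samples reference, rather than copying probe lists into each sample, matches the abstract's description of a sample pointing to a single platform.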

10,968 citations

Journal ArticleDOI
TL;DR: This paper describes how BLAT was optimized; BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.
Abstract: Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.
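
The first stage described in the abstract, an index of non-overlapping K-mers used to find candidate regions, can be sketched in a few lines. This is a toy illustration, not BLAT's implementation: the function names are hypothetical, only exact K-mer matches are considered, and scanning every overlapping K-mer of the query is an assumption about how the lookup is done.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):  # step by k: non-overlapping
        index[genome[pos:pos + k]].append(pos)
    return index


def find_hits(index, query, k):
    """Return (query_pos, genome_pos) pairs where a query k-mer is in the index."""
    hits = []
    for q_pos in range(len(query) - k + 1):  # step by 1: overlapping
        for g_pos in index.get(query[q_pos:q_pos + k], []):
            hits.append((q_pos, g_pos))
    return hits


# Toy sequences (hypothetical).
toy_genome = "ACGTACGGTTCA"
toy_index = build_index(toy_genome, 4)
toy_hits = find_hits(toy_index, "GACGGT", 4)
```

Indexing only non-overlapping K-mers keeps the index roughly K times smaller than indexing every position, which fits the abstract's point that it can sit in the RAM of an inexpensive computer; the later alignment and stitching stages are not shown.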

8,326 citations

Journal ArticleDOI
TL;DR: This paper proposes parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples.
Abstract: Non-biological experimental variation or “batch effects” are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.
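
To give a feel for why an empirical Bayes approach helps with small batches, the sketch below shrinks per-gene batch shifts toward their average across genes using a normal-normal posterior mean. It is a deliberately simplified stand-in, not the authors' method: it handles only additive shifts, assumes a single known noise variance, and estimates the prior with crude plug-in moments. The function and parameter names are hypothetical.

```python
def shrink_batch_shifts(gene_shifts, n_samples, noise_var):
    """Shrink per-gene batch shift estimates toward their across-gene mean.

    gene_shifts: raw estimated shift of each gene within one batch.
    n_samples:   number of samples in that batch.
    noise_var:   assumed within-batch noise variance (same for all genes).
    """
    n_genes = len(gene_shifts)
    prior_mean = sum(gene_shifts) / n_genes
    # Crude plug-in estimate of the prior variance from the raw shifts.
    prior_var = sum((s - prior_mean) ** 2 for s in gene_shifts) / (n_genes - 1)
    denom = n_samples * prior_var + noise_var
    return [
        (n_samples * prior_var * s + noise_var * prior_mean) / denom
        for s in gene_shifts
    ]


# Toy example (hypothetical): three genes in a batch of two samples.
raw_shifts = [1.0, 2.0, 3.0]
shrunk_small_batch = shrink_batch_shifts(raw_shifts, n_samples=2, noise_var=2.0)
shrunk_large_batch = shrink_batch_shifts(raw_shifts, n_samples=2000, noise_var=2.0)
```

With few samples the estimates are pulled noticeably toward the shared mean, borrowing strength across genes; with many samples they stay close to the raw values, consistent with the abstract's claim of comparable behavior on large data sets.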

6,319 citations

Related Papers (5)