Journal ArticleDOI

A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

TL;DR: The expanded CMap is reported, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that is shown to be highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts.
About: This article was published in Cell on 2017-11-30 and is currently open access. It has received 1943 citations to date.
Citations
Journal ArticleDOI
TL;DR: An algorithm is introduced that predicts how targeted interventions affect gene expression signatures associated with high patient risk or disease processes, and that can identify and validate promising drug targets for paediatric and adult cancers.
Abstract: Despite advances in the molecular exploration of paediatric cancers, approximately 50% of children with high-risk neuroblastoma lack effective treatment. To identify therapeutic options for this group of high-risk patients, we combine predictive data mining with experimental evaluation in patient-derived xenograft cells. Our proposed algorithm, TargetTranslator, integrates data from tumour biobanks, pharmacological databases, and cellular networks to predict how targeted interventions affect mRNA signatures associated with high patient risk or disease processes. We find more than 80 targets to be associated with neuroblastoma risk and differentiation signatures. Selected targets are evaluated in cell lines derived from high-risk patients to demonstrate reversal of risk signatures and malignant phenotypes. Using neuroblastoma xenograft models, we establish CNR2 and MAPK8 as promising candidates for the treatment of high-risk neuroblastoma. We expect that our method, available as a public tool (targettranslator.org), will enhance and expedite the discovery of risk-associated targets for paediatric and adult cancers.

38 citations

Journal ArticleDOI
TL;DR: More effective bridging from NP research to CT was the goal of a September, 2018 transdisciplinary workshop, and participants emphasized that replicability and likelihood of successful translation depend on rigor in experimental design, interpretation, and reporting across the continuum of NP research.
Abstract: While great interest in health effects of natural products (NPs), including dietary supplements and foods, persists, promising preclinical NP research is not consistently translating into actionable clinical trial (CT) outcomes. Generally considered the gold standard for assessing safety and efficacy, CTs, especially phase III CTs, are costly and require rigorous planning to optimize the value of the information obtained. More effective bridging from NP research to CT was the goal of a September, 2018 transdisciplinary workshop. Participants emphasized that replicability and likelihood of successful translation depend on rigor in experimental design, interpretation, and reporting across the continuum of NP research. Discussions spanned good practices for NP characterization and quality control; use and interpretation of models (computational through in vivo) with strong clinical predictive validity; controls for experimental artefacts, especially for in vitro interrogation of bioactivity and mechanisms of action; rigorous assessment and interpretation of prior research; transparency in all reporting; and prioritization of research questions. Natural product clinical trials prioritized based on rigorous, convergent supporting data and current public health needs are most likely to be informative and ultimately affect public health. Thoughtful, coordinated implementation of these practices should enhance the knowledge gained from future NP research.

38 citations

Journal ArticleDOI
TL;DR: The cell viability–signature relationship was used to predict viability from transcriptomics signatures, and compounds that induce cell death in tumor cell lines were identified and validated.
Abstract: Transcriptional perturbation signatures are valuable data sources for functional genomics. Linking perturbation signatures to screenings opens the possibility to model cellular phenotypes from expression data and to identify efficacious drugs. We linked perturbation transcriptomics data from the LINCS-L1000 project with cell viability information upon genetic (Achilles project) and chemical (CTRP screen) perturbations yielding more than 90 000 signature-viability pairs. An integrated analysis showed that the cell viability signature is a major factor underlying perturbation signatures. The signature is linked to transcription factors regulating cell death, proliferation and division time. We used the cell viability-signature relationship to predict viability from transcriptomics signatures, and identified and validated compounds that induce cell death in tumor cell lines. We showed that cellular toxicity can lead to unexpected similarity of signatures, confounding mechanism of action discovery. Consensus compound signatures predicted cell-specific drug sensitivity, even if the signature is not measured in the same cell line, and outperformed conventional drug-specific features. Our results can help in understanding mechanisms behind cell death and removing confounding factors of transcriptomic perturbation screens. To interactively browse our results and predict cell viability in new gene expression samples, we developed CEVIChE (CEll VIability Calculator from gene Expression; https://saezlab.shinyapps.io/ceviche/).

38 citations

Journal ArticleDOI
TL;DR: Terreic acid and pergolide robustly reduced alcohol intake and BALs in HDID-1 mice, providing the first evidence for transcriptome-based drug discovery to target an addiction trait.

38 citations

Journal ArticleDOI
TL;DR: Expression levels of JAG2, AURKA, PGK1, and HPRT1 have the potential to be used independently as diagnostic, prognostic, or treatment biomarkers in endometrial cancer.
Abstract: The incidence of endometrial cancer is rising both in the United States and worldwide. As endometrial cancer becomes more prominent, the need to develop and characterize biomarkers for early-stage diagnosis and the treatment of endometrial cancer has become an important priority. Several biomarkers currently used to diagnose endometrial cancer are directly related to obesity. Although epigenetic and mutational biomarkers have been identified and have resulted in treatment options for patients with specific aberrations, many tumors do not harbor those specific aberrations. A promising alternative is to determine biomarkers based on differential gene expression, which can be used to estimate prognosis. We evaluated 589 patients to determine differential expression between normal and malignant patient samples. We then supplemented these evaluations with immunohistochemistry staining of endometrial tumors and normal tissues. Additionally, we used the Library of Integrated Network-based Cellular Signatures to evaluate the effects of 1826 chemotherapy drugs on 26 cell lines to determine the effects of each drug on HPRT1 and AURKA expression. Expression of HPRT1, JAG2, AURKA, and PGK1 was elevated when compared to normal samples, and HPRT1 and PGK1 showed a stepwise elevation in expression that was significantly related to cancer grade. To determine the prognostic potential of these genes, we evaluated patient outcome and found that levels of both HPRT1 and AURKA were significantly correlated with overall patient survival. When evaluating drugs that had the most significant effect on lowering the expression of HPRT1 and AURKA, we found that Topo I and MEK inhibitors were most effective at reducing HPRT1 expression. Meanwhile, drugs that were effective at reducing AURKA expression were more diverse (MEK, Topo I, MELK, HDAC, etc.). The effects of these drugs on the expression of HPRT1 and AURKA provide insight into their role within cellular maintenance.
Collectively, these data show that JAG2, AURKA, PGK1, and HPRT1 have the potential to be used independently as diagnostic, prognostic, or treatment biomarkers in endometrial cancer. Expression levels of these genes may provide physicians with insight into tumor aggressiveness and chemotherapy drugs that are well suited to individual patients.

38 citations


Cites methods from "A Next Generation Connectivity Map:..."

  • ...We used the Level 5 data, which were generated using the L1000 platform [45], normalized using a z-score methodology within each plate, and averaged across replicates....

    [...]
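
The normalization mentioned in the excerpt above (z-scoring within each plate, then averaging across replicates) can be illustrated with a minimal sketch. This is not the L1000 pipeline itself: it assumes a plain mean and population standard deviation per plate and an unweighted mean across replicates, whereas the published processing may use robust statistics and weighted averaging. The function names and the (plate, perturbation, value) record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_plate(values):
    """Z-score a list of expression values measured on one plate.

    Assumption: plain mean / population standard deviation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant plate carries no relative signal; map every well to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def plate_normalized_replicate_means(records):
    """Average per-plate z-scores across replicates of each perturbation.

    records: iterable of (plate_id, perturbation_id, expression) tuples
    for a single gene (hypothetical layout, for illustration only).
    """
    # Group wells by plate so each plate is normalized on its own.
    by_plate = defaultdict(list)
    for plate, pert, value in records:
        by_plate[plate].append((pert, value))

    # Collect the z-scores of every replicate of each perturbation.
    by_pert = defaultdict(list)
    for wells in by_plate.values():
        z = zscore_within_plate([value for _, value in wells])
        for (pert, _), z_value in zip(wells, z):
            by_pert[pert].append(z_value)

    # Unweighted mean across replicates (an assumption of this sketch).
    return {pert: mean(zs) for pert, zs in by_pert.items()}
```

Normalizing within each plate before averaging means that plates with different overall intensity scales contribute comparable values to the replicate average.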

References
Journal ArticleDOI
TL;DR: The Gene Set Enrichment Analysis (GSEA) method interprets gene expression data by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations

Journal Article
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.

30,124 citations

Journal ArticleDOI
TL;DR: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data and provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments.
Abstract: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in house gene expression databases that benefit from coherent data sets, and which are constructed to facilitate a particular analytic method, but rather complement these by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, and were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.

10,968 citations

Journal ArticleDOI
TL;DR: This paper describes how BLAT was optimized; BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.
Abstract: Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.

8,326 citations

Journal ArticleDOI
TL;DR: This paper proposed parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparable to existing methods for large samples.
Abstract: SUMMARY Non-biological experimental variation or “batch effects” are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparable to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.

6,319 citations
