Journal ArticleDOI

A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

TL;DR: The expanded CMap is reported, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that is shown to be highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts.
About: This article is published in Cell. The article was published on 2017-11-30 and is currently open access. It has received 1943 citations to date.
Citations
Journal ArticleDOI
TL;DR: A four-MDG-based prognostic signature, comprising GPRC5A, SOWAHC, S100A14, and ARNTL2, was established for PACA patients and it is envisaged that this signature will help in evaluation of intratumoral immune texture and enable identification of novel stratification biomarkers for precision therapies.
Abstract: Pancreatic cancer (PACA), which is characterized by an immunosuppressive nature, remains one of the deadliest malignancies worldwide. Aberrant DNA methylation (DNAm) reportedly influences tumor immune microenvironment. Here, we evaluated the role of DNA methylation driven genes (MDGs) in PACA through integrative analyses of epigenomic, transcriptomic, genomic and clinicopathological data obtained from TCGA, ICGC, ArrayExpress and GEO databases. Thereafter, we established a four-MDG signature, comprising GPRC5A, SOWAHC, S100A14, and ARNTL2. High signature risk-scores were associated with poor histologic grades and late TNM stages. Survival analyses showed the signature had a significant predictive effect on OS. WGCNA revealed that the signature may be associated with immune system, while high risk-scores might reflect immune dysregulation. Furthermore, GSEA and GSVA revealed significant enrichment of p53 pathway and mismatch repair pathways in high risk-score subgroups. Immune infiltration analysis showed that CD8+ T cells were more abundant in low score subgroups, while M0 macrophages exhibited an opposite trend. Moreover, negative regulatory genes of cancer-immunity cycle (CIC) illustrated that immunosuppressors TGFB1, VEGFA, and CD274 (PDL1) were all positively correlated with risk-scores. Furthermore, the four signature genes were negatively correlated with CD8+ lymphocytes, but positively associated with myeloid derived suppressor cells (MDSC). Conversely, specimens with high risk-scores exhibited heavier tumor mutation burdens (TMB) and might show better responses to some chemotherapy and targeted drugs, which would benefit stratification of PACA patients. 
On the other hand, we investigated the corresponding proteins of the four MDGs using paraffin-embedded PACA samples collected from patients who underwent radical surgery in our center and found that all four proteins were elevated in cancerous tissues and might serve as prognostic markers for PACA patients, with high expression levels indicating poor prognosis. In conclusion, we successfully established a four-MDG-based prognostic signature for PACA patients. We envisage that this signature will help in evaluation of intratumoral immune texture and enable identification of novel stratification biomarkers for precision therapies.

7 citations

Posted ContentDOI
12 Apr 2021-bioRxiv
TL;DR: A set of guidelines for different aspects of training gene expression-based predictors using cell line datasets is introduced, which provide extensive analysis of the generalization of drug sensitivity predictors, and challenge many current practices in the community including the choice of training dataset and measure of drug sensitivity.
Abstract: The goal of precision oncology is to tailor treatment for patients individually using the genomic profile of their tumors. Pharmacogenomics datasets such as cancer cell lines are among the most valuable resources for drug sensitivity prediction, a crucial task of precision oncology. Machine learning methods have been employed to predict drug sensitivity based on the multiple omics data available for large panels of cancer cell lines. However, there are no comprehensive guidelines on how to properly train and validate such machine learning models for drug sensitivity prediction. In this paper, we introduce a set of guidelines for different aspects of training gene expression-based predictors using cell line datasets. These guidelines provide extensive analysis of the generalization of drug sensitivity predictors, and challenge many current practices in the community including the choice of training dataset and measure of drug sensitivity. Application of these guidelines in future studies will enable the development of more robust preclinical biomarkers.

7 citations


Cites methods from "A Next Generation Connectivity Map:..."

  • ...We also utilized feature selection to reduce the input dimensionality (number of genes) and tried focusing only on the L1000 landmark genes [54], or focusing on top genes selected by the Minimum Redundancy--Maximum Relevance (mRMR) method [55]....

    [...]

Journal ArticleDOI
Abstract: While advancements in genome sequencing have identified millions of somatic mutations in cancer, their functional impact is poorly understood. We previously developed the expression-based variant impact phenotyping (eVIP) method to use gene expression data to characterize the function of gene variants. The eVIP method uses a decision tree-based algorithm to predict the functional impact of somatic variants by comparing gene expression signatures induced by introduction of wild-type (WT) versus mutant cDNAs in cell lines. The method distinguishes between variants that are gain-of-function, loss-of-function, change-of-function, or neutral. We present eVIP2, software that allows for pathway analysis (eVIP Pathways) and usage with RNA-seq data. To demonstrate the eVIP2 software and approach, we characterized two recurrent frameshift variants in RNF43, a negative regulator of Wnt signaling, frequently mutated in colorectal, gastric, and endometrial cancer. RNF43 WT, RNF43 R117fs, RNF43 G659fs, or GFP control cDNA were overexpressed in HEK293T cells. Analysis with eVIP2 predicted that the frameshift at position 117 was a loss-of-function mutation, as expected. The second frameshift at position 659 has been previously described as a passenger mutation that maintains the RNF43 WT function as a negative regulator of Wnt. Surprisingly, eVIP2 predicted G659fs to be a change-of-function mutation. Additional eVIP Pathways analysis of RNF43 G659fs predicted 10 pathways to be significantly altered, including TNF-α via NFκB signaling, KRAS signaling, and hypoxia, highlighting the benefit of a more comprehensive approach when determining the impact of gene variant function. To validate these predictions, we performed reporter assays and found that each pathway activated by expression of RNF43 G659fs, but not expression of RNF43 WT, was identified as impacted by eVIP2, supporting that RNF43 G659fs is a change-of-function mutation and its effect on the identified pathways. 
Pathway activation was further validated by Western blot analysis. Lastly, we show primary colon adenocarcinoma patient samples with R117fs and G659fs variants have transcriptional profiles similar to BRAF missense mutations with activated RAS/MAPK signaling, consistent with KRAS signaling pathways being GOF in both variants. The eVIP2 method is an important step towards overcoming the current challenge of variant interpretation in the implementation of precision medicine. eVIP2 is available at https://github.com/BrooksLabUCSC/eVIP2.

7 citations

Journal ArticleDOI
TL;DR: It is identified that MHC-II signature is an independent and favorable predictor of immune response and the prognosis of bladder cancer treated with immune checkpoint inhibitors (ICIs), one that may be superior to tumor mutation burden.
Abstract: A large proportion of anti-tumor immunity research is focused on major histocompatibility complex class I (MHC-I) molecules and CD8+ T cells. Although mounting evidence has shown that CD4+ T cells play a major role in anti-tumor immunity, the role of the MHC-II molecules in tumor immunotherapy has not been thoroughly researched and reported. In this study, we defined an MHC-II signature for the first time by calculating the enrichment score of MHC-II protein binding pathway with a single sample gene set enrichment analysis (ssGSEA) algorithm. To evaluate and validate the predictive value of the MHC class II (MHC-II) signature, we collected the transcriptome, mutation data and matched clinical data of bladder cancer patients from IMvigor210, The Cancer Genome Atlas (TCGA) databases and Gene Expression Omnibus (GEO) databases. Comprehensive analyses of immunome, transcriptome, metabolome, genome and drugome were performed in order to determine the association of MHC-II signature and tumor immunotherapy. We identified that MHC-II signature is an independent and favorable predictor of immune response and the prognosis of bladder cancer treated with immune checkpoint inhibitors (ICIs), one that may be superior to tumor mutation burden. MHC-II signature was significantly associated with increased immune cell infiltration and levels of immune-related gene expression signatures. Additionally, transcriptomic analysis showed immune activation in the high-MHC-II signature subgroup, whereas it showed fatty acid metabolism and glucuronidation in the low-MHC-II signature subgroup. Moreover, exploration of corresponding genomic profiles highlighted the significance of tumor protein p53 (TP53) and fibroblast growth factor receptor 3 (FGFR3) alterations. Our results also allowed for the identification of candidate compounds for combined immunotherapy treatment that may be beneficial for patients with bladder cancer and a high MHC-II signature.
In conclusion, this study provides a new perspective on MHC-II signature, as an independent and favorable predictor of immune response and prognosis of bladder cancer treated with ICIs.

7 citations

Journal ArticleDOI
TL;DR: In this paper, the effect of esomeprazole on pro-inflammatory and profibrotic molecules through nuclear translocation of the transcription factor nuclear factor-like 2 (Nrf2) and induction of the cytoprotective molecule heme oxygenase 1 (HO1) was investigated.
Abstract: Idiopathic pulmonary fibrosis (IPF) is an orphan disease characterized by progressive loss of lung function resulting in shortness of breath and often death within 3–4 years of diagnosis. Repetitive lung injury in susceptible individuals is believed to promote chronic oxidative stress, inflammation, and uncontrolled collagen deposition. Several preclinical and retrospective clinical studies in IPF have reported beneficial outcomes associated with the use of proton pump inhibitors (PPIs) such as esomeprazole. Accordingly, we sought to investigate molecular mechanism(s) by which PPIs favorably regulate the disease process. We stimulated oxidative stress, pro-inflammatory and profibrotic phenotypes in primary human lung epithelial cells and fibroblasts upon treatment with bleomycin or transforming growth factor β (TGFβ) and assessed the effect of a prototype PPI, esomeprazole, in regulating these processes. Our study shows that esomeprazole controls pro-inflammatory and profibrotic molecules through nuclear translocation of the transcription factor nuclear factor-like 2 (Nrf2) and induction of the cytoprotective molecule heme oxygenase 1 (HO1). Genetic deletion of Nrf2 or pharmacological inhibition of HO1 impaired esomeprazole-mediated regulation of proinflammatory and profibrotic molecules. Additional studies indicate that activation of Mitogen Activated Protein Kinase (MAPK) pathway is involved in the process. Our experimental data was corroborated by bioinformatics studies of an NIH chemical library which hosts gene expression profiles of IPF lung fibroblasts treated with over 20,000 compounds including esomeprazole. Intriguingly, we found 45 genes that are upregulated in IPF but downregulated by esomeprazole. Pathway analysis showed that these genes are enriched for profibrotic processes. Unbiased high throughput RNA-seq study supported antifibrotic effect of esomeprazole and revealed several novel targets. 
Taken together, PPIs may play an antifibrotic role in IPF through direct regulation of the MAPK/Nrf2/HO1 pathway to favorably influence the disease process.

7 citations

References
Journal ArticleDOI
TL;DR: The Gene Set Enrichment Analysis (GSEA) method as discussed by the authors focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations

Journal Article
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.

30,124 citations

Journal ArticleDOI
TL;DR: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data and provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments.
Abstract: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in house gene expression databases that benefit from coherent data sets, and which are constructed to facilitate a particular analytic method, but rather complement these by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, and were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.

10,968 citations
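
The three GEO data entities described in the abstract above (platforms, samples and series) can be pictured as a small data model. The following is only a sketch under stated assumptions: the class names, field names and accession strings are hypothetical and are not taken from GEO's actual schema or any GEO client library.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """A list of probes that defines which molecules may be detected."""
    accession: str          # hypothetical identifier, not a real GEO accession
    probes: list[str]


@dataclass
class Sample:
    """Molecular abundance data for one sample, tied to a single platform."""
    accession: str
    platform: Platform
    abundance: dict[str, float]   # probe name -> measured value

    def __post_init__(self) -> None:
        # A sample may only report values for probes its platform defines.
        unknown = set(self.abundance) - set(self.platform.probes)
        if unknown:
            raise ValueError(f"probes not on platform: {sorted(unknown)}")


@dataclass
class Series:
    """Groups samples into the data set that makes up one experiment."""
    accession: str
    samples: list[Sample] = field(default_factory=list)


if __name__ == "__main__":
    platform = Platform("PLATFORM_A", ["probe_1", "probe_2"])
    sample = Sample("SAMPLE_A", platform, {"probe_1": 2.5})
    series = Series("SERIES_A", [sample])
    print(len(series.samples), series.samples[0].platform.accession)
```

The validation in `Sample` is one possible way to express the statement that a sample "references a single platform"; a real repository would carry far more metadata.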

Journal ArticleDOI
TL;DR: This paper describes how BLAT was optimized; BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.
Abstract: Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.

8,326 citations
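
The first BLAT stage described in the abstract above, an index of all nonoverlapping K-mers used to find candidate homologous regions, can be illustrated with a toy sketch. This is not BLAT's implementation: the function names are hypothetical, only exact K-mer matches are considered, and scanning every overlapping K-mer of the query is an assumption made here for illustration.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map each nonoverlapping k-mer of the genome to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    # Step by k so that indexed k-mers do not overlap.
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return dict(index)


def find_candidate_hits(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (query_pos, genome_pos) pairs where a query k-mer is in the index."""
    hits = []
    # Assumption: look up every overlapping k-mer of the query.
    for q_pos in range(len(query) - k + 1):
        for g_pos in index.get(query[q_pos:q_pos + k], []):
            hits.append((q_pos, g_pos))
    return hits


if __name__ == "__main__":
    genome = "ACGTACGTTTGA"
    index = build_kmer_index(genome, k=4)
    print(index)
    print(find_candidate_hits("GTTTGA", index, k=4))
```

Because the index is built once per genome and holds only every k-th K-mer, it stays small, which is the property the abstract credits for fitting in the RAM of inexpensive computers. The later stages (alignment, stitching, exon recovery) are not sketched here.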

Journal ArticleDOI
TL;DR: This paper proposed parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples.
Abstract: SUMMARY Non-biological experimental variation or “batch effects” are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparable to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.

6,319 citations
