Author

Denis Torre

Other affiliations: University of Miami
Bio: Denis Torre is an academic researcher from Icahn School of Medicine at Mount Sinai. The author has contributed to research in topics: Medicine & Biology. The author has an h-index of 13 and has co-authored 26 publications receiving 1066 citations. Previous affiliations of Denis Torre include University of Miami.

Papers
Journal Article
TL;DR: A high-throughput processing infrastructure and search database (ARCHS4) that provides processed RNA-seq data for 187,946 publicly available mouse and human samples to support exploration and reuse is developed.
Abstract: RNA sequencing (RNA-seq) is the leading technology for genome-wide transcript quantification. However, publicly available RNA-seq data is currently provided mostly in raw form, a significant barrier for global and integrative retrospective analyses. ARCHS4 is a web resource that makes the majority of published RNA-seq data from human and mouse available at the gene and transcript levels. For developing ARCHS4, available FASTQ files from RNA-seq experiments from the Gene Expression Omnibus (GEO) were aligned using a cloud-based infrastructure. In total 187,946 samples are accessible through ARCHS4 with 103,083 mouse and 84,863 human. Additionally, the ARCHS4 web interface provides intuitive exploration of the processed data through querying tools, interactive visualization, and gene pages that provide average expression across cell lines and tissues, top co-expressed genes for each gene, and predicted biological functions and protein–protein interactions for each gene based on prior knowledge combined with co-expression.

428 citations

Journal Article
TL;DR: The ChEA3 background database contains a collection of gene set libraries generated from multiple sources including TF–gene co-expression from RNA-seq studies, TF–target associations from ChIP-seq experiments, and TF–gene co-occurrence computed from crowd-submitted gene lists; integrating these libraries illuminates general transcription factor properties such as whether the TF behaves as an activator or a repressor.
Abstract: Identifying the transcription factors (TFs) responsible for observed changes in gene expression is an important step in understanding gene regulatory networks. ChIP-X Enrichment Analysis 3 (ChEA3) is a transcription factor enrichment analysis tool that ranks TFs associated with user-submitted gene sets. The ChEA3 background database contains a collection of gene set libraries generated from multiple sources including TF-gene co-expression from RNA-seq studies, TF-target associations from ChIP-seq experiments, and TF-gene co-occurrence computed from crowd-submitted gene lists. Enrichment results from these distinct sources are integrated to generate a composite rank that improves the prediction of the correct upstream TF compared to ranks produced by individual libraries. We compare ChEA3 with existing TF prediction tools and show that ChEA3 performs better. By integrating the ChEA3 libraries, we illuminate general transcription factor properties such as whether the TF behaves as an activator or a repressor. The ChEA3 web-server is available from https://amp.pharm.mssm.edu/ChEA3.

379 citations
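
The ChEA3 abstract above says enrichment results from distinct libraries are integrated into a composite rank, but it does not specify how. A minimal sketch of one plausible approach, assuming mean-rank aggregation, is given below. The function name `composite_rank`, the library names, and the TF labels are hypothetical and are not taken from ChEA3 itself.

```python
from statistics import mean


def composite_rank(library_ranks):
    """Order TFs by their mean rank across libraries (1 = best).

    library_ranks maps a library name to a dict of TF -> rank.
    A TF missing from a library is simply skipped for that library.
    This is an illustrative aggregation, not ChEA3's documented method.
    """
    tfs = set()
    for ranks in library_ranks.values():
        tfs.update(ranks)

    mean_ranks = {}
    for tf in tfs:
        observed = [ranks[tf] for ranks in library_ranks.values() if tf in ranks]
        mean_ranks[tf] = mean(observed)

    # Ties are broken alphabetically so the output is deterministic.
    return sorted(mean_ranks, key=lambda tf: (mean_ranks[tf], tf))


# Hypothetical ranks from three hypothetical libraries.
example = {
    "coexpression": {"TF_A": 1, "TF_B": 2, "TF_C": 3},
    "chipseq": {"TF_A": 2, "TF_B": 1, "TF_C": 3},
    "cooccurrence": {"TF_A": 1, "TF_C": 2},
}
print(composite_rank(example))
```

Averaging ranks rather than raw scores avoids having to put scores from different libraries on a common scale, which is why rank-based aggregation is a common choice when sources are heterogeneous.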

Journal Article
Alexandra B. Keenan, Sherry L. Jenkins, Kathleen M. Jagodnik, Simon Koplev, Edward He, Denis Torre, Zichen Wang, Anders B. Dohlman, Moshe C. Silverstein, Alexander Lachmann, Maxim V. Kuleshov, Avi Ma'ayan, Vasileios Stathias, Raymond Terryn, Daniel J. Cooper, Michele Forlin, Amar Koleti, Dusica Vidovic, Caty Chung, Stephan C. Schürer, Jouzas Vasiliauskas, Marcin Pilarczyk, Behrouz Shamsaei, Mehdi Fazel, Yan Ren, Wen Niu, Nicholas A. Clark, Shana White, Naim Al Mahi, Lixia Zhang, Michal Kouril, John F. Reichard, Siva Sivaganesan, Mario Medvedovic, Jaroslaw Meller, Rick J. Koch, Marc R. Birtwistle, Ravi Iyengar, Eric A. Sobie, Evren U. Azeloglu, Julia A. Kaye, Jeannette Osterloh, Kelly Haston, Jaslin Kalra, Steve Finkbiener, Jonathan Z. Li, Pamela Milani, Miriam Adam, Renan Escalante-Chong, Karen Sachs, Alexander LeNail, Divya Ramamoorthy, Ernest Fraenkel, Gavin Daigle, Uzma Hussain, Alyssa Coye, Jeffrey D. Rothstein, Dhruv Sareen, Loren Ornelas, Maria G. Banuelos, Berhan Mandefro, Ritchie Ho, Clive N. Svendsen, Ryan G. Lim, Jennifer Stocksdale, Malcolm Casale, Terri G. Thompson, Jie Wu, Leslie M. Thompson, Victoria Dardov, Vidya Venkatraman, Andrea Matlock, Jennifer E. Van Eyk, Jacob D. Jaffe, Malvina Papanastasiou, Aravind Subramanian, Todd R. Golub, Sean D. Erickson, Mohammad Fallahi-Sichani, Marc Hafner, Nathanael S. Gray, Jia-Ren Lin, Caitlin E. Mills, Jeremy L. Muhlich, Mario Niepel, Caroline E. Shamu, Elizabeth H. Williams, David Wrobel, Peter K. Sorger, Laura M. Heiser, Joe W. Gray, James E. Korkola, Gordon B. Mills, Mark A. LaBarge, Heidi S. Feiler, Mark A. Dane, Elmar Bucher, Michel Nederlof, Damir Sudar, Sean M. Gross, David Kilburn, Rebecca Smith, Kaylyn Devlin, Ron Margolis, Leslie Derr, Albert Lee, Ajay Pillai
TL;DR: The LINCS program focuses on cellular physiology shared among tissues and cell types relevant to an array of diseases, including cancer, heart disease, and neurodegenerative disorders.
Abstract: The Library of Integrated Network-Based Cellular Signatures (LINCS) is an NIH Common Fund program that catalogs how human cells globally respond to chemical, genetic, and disease perturbations. Resources generated by LINCS include experimental and computational methods, visualization tools, molecular and imaging data, and signatures. By assembling an integrated picture of the range of responses of human cells exposed to many perturbations, the LINCS program aims to better understand human disease and to advance the development of new therapies. Perturbations under study include drugs, genetic perturbations, tissue micro-environments, antibodies, and disease-causing mutations. Responses to perturbations are measured by transcript profiling, mass spectrometry, cell imaging, and biochemical methods, among other assays. The LINCS program focuses on cellular physiology shared among tissues and cell types relevant to an array of diseases, including cancer, heart disease, and neurodegenerative disorders. This Perspective describes LINCS technologies, datasets, tools, and approaches to data accessibility and reusability.

300 citations

Journal Article
TL;DR: By providing an intuitive user interface for notebook generation for RNA-seq data analysis, starting from the raw reads all the way to a complete interactive and reproducible report, BioJupies is a useful resource for experimental and computational biologists.
Abstract: BioJupies is a web application that enables the automated creation, storage, and deployment of Jupyter Notebooks containing RNA-seq data analyses. Through an intuitive interface, novice users can rapidly generate tailored reports to analyze and visualize their own raw sequencing files, gene expression tables, or fetch data from >9,000 published studies containing >300,000 preprocessed RNA-seq samples. Generated notebooks have the executable code of the entire pipeline, rich narrative text, interactive data visualizations, differential expression, and enrichment analyses. The notebooks are permanently stored in the cloud and made available online through a persistent URL. The notebooks are downloadable, customizable, and can run within a Docker container. By providing an intuitive user interface for notebook generation for RNA-seq data analysis, starting from the raw reads all the way to a complete interactive and reproducible report, BioJupies is a useful resource for experimental and computational biologists. BioJupies is freely available as a web-based application from http://biojupies.cloud.

183 citations

Posted Content
14 Sep 2017, bioRxiv
TL;DR: Co-expression data created from ARCHS4, a web resource that makes the majority of previously published RNA-seq data from human and mouse freely available at the gene count level, outperforms co-expression data created from other major gene expression data repositories such as GTEx and CCLE.
Abstract: RNA-sequencing (RNA-seq) is currently the leading technology for genome-wide transcript quantification. While the volume of RNA-seq data is rapidly increasing, the currently publicly available RNA-seq data is provided mostly in raw form, with small portions processed non-uniformly. This is mainly because the computational demand, particularly for the alignment step, is a significant barrier for global and integrative retrospective analyses. To address this challenge, we developed all RNA-seq and ChIP-seq sample and signature search (ARCHS4), a web resource that makes the majority of previously published RNA-seq data from human and mouse freely available at the gene count level. Such uniformly processed data enables easy integration for downstream analyses. For developing the ARCHS4 resource, all available FASTQ files from RNA-seq experiments were retrieved from the Gene Expression Omnibus (GEO) and aligned using a cloud-based infrastructure. In total 137,792 samples are accessible through ARCHS4 with 72,363 mouse and 65,429 human samples. Through efficient use of cloud resources and dockerized deployment of the sequencing pipeline, the alignment cost per sample is reduced to less than one cent. ARCHS4 is updated automatically by adding newly published samples to the database as they become available. Additionally, the ARCHS4 web interface provides intuitive exploration of the processed data through querying tools, interactive visualization, and gene landing pages that provide average expression across cell lines and tissues, top co-expressed genes, and predicted biological functions and protein-protein interactions for each gene based on prior knowledge combined with co-expression. Benchmarking the quality of these predictions, co-expression correlation data created from ARCHS4 outperforms co-expression data created from other major gene expression data repositories such as GTEx and CCLE. ARCHS4 is freely accessible from: http://amp.pharm.mssm.edu/archs4.

146 citations


Cited by
Journal Article
TL;DR: The latest version of STRING more than doubles the number of organisms it covers, and offers an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input.
Abstract: Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.

10,584 citations

Journal Article
TL;DR: Using scRNA-seq analysis, Bhattacharya and colleagues identify a subset of profibrotic lung macrophages that have a gene expression signature intermediate between those of monocytes and alveolar macrophages.
Abstract: Tissue fibrosis is a major cause of mortality that results from the deposition of matrix proteins by an activated mesenchyme. Macrophages accumulate in fibrosis, but the role of specific subgroups in supporting fibrogenesis has not been investigated in vivo. Here, we used single-cell RNA sequencing (scRNA-seq) to characterize the heterogeneity of macrophages in bleomycin-induced lung fibrosis in mice. A novel computational framework for the annotation of scRNA-seq by reference to bulk transcriptomes (SingleR) enabled the subclustering of macrophages and revealed a disease-associated subgroup with a transitional gene expression profile intermediate between monocyte-derived and alveolar macrophages. These CX3CR1+SiglecF+ transitional macrophages localized to the fibrotic niche and had a profibrotic effect in vivo. Human orthologs of genes expressed by the transitional macrophages were upregulated in samples from patients with idiopathic pulmonary fibrosis. Thus, we have identified a pathological subgroup of transitional macrophages that are required for the fibrotic response to injury.

1,790 citations

Journal Article
TL;DR: Pathway analysis implicates immunity, lipid metabolism, tau binding proteins, and amyloid precursor protein (APP) metabolism, showing that genetic variants affecting APP and Aβ processing are associated not only with early-onset autosomal dominant Alzheimer’s disease but also with LOAD.
Abstract: Risk for late-onset Alzheimer’s disease (LOAD), the most prevalent dementia, is partially driven by genetics. To identify LOAD risk loci, we performed a large genome-wide association meta-analysis of clinically diagnosed LOAD (94,437 individuals). We confirm 20 previous LOAD risk loci and identify five new genome-wide loci (IQCK, ACE, ADAM10, ADAMTS1, and WWOX), two of which (ADAM10, ACE) were identified in a recent genome-wide association (GWAS)-by-familial-proxy of Alzheimer’s or dementia. Fine-mapping of the human leukocyte antigen (HLA) region confirms the neurological and immune-mediated disease haplotype HLA-DR15 as a risk factor for LOAD. Pathway analysis implicates immunity, lipid metabolism, tau binding proteins, and amyloid precursor protein (APP) metabolism, showing that genetic variants affecting APP and Aβ processing are associated not only with early-onset autosomal dominant Alzheimer’s disease but also with LOAD. Analyses of risk genes and pathways show enrichment for rare variants (P = 1.32 × 10⁻⁷), indicating that additional rare variants remain to be identified. We also identify important genetic correlations between LOAD and traits such as family history of dementia and education.

1,641 citations

Journal Article
TL;DR: A new dedicated aspect of BioGRID annotates genome-wide CRISPR/Cas9-based screens that report gene–phenotype and gene–gene relationships, and captures chemical interaction data, including chemical–protein interactions for human drug targets drawn from the DrugBank database and manually curated bioactive compounds reported in the literature.
Abstract: The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the curation and archival storage of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2018 (build 3.4.164), BioGRID contains records for 1 598 688 biological interactions manually annotated from 55 809 publications for 71 species, as classified by an updated set of controlled vocabularies for experimental detection methods. BioGRID also houses records for >700 000 post-translational modification sites. BioGRID now captures chemical interaction data, including chemical-protein interactions for human drug targets drawn from the DrugBank database and manually curated bioactive compounds reported in the literature. A new dedicated aspect of BioGRID annotates genome-wide CRISPR/Cas9-based screens that report gene-phenotype and gene-gene relationships. An extension of the BioGRID resource called the Open Repository for CRISPR Screens (ORCS) database (https://orcs.thebiogrid.org) currently contains over 500 genome-wide screens carried out in human or mouse cell lines. All data in BioGRID is made freely available without restriction, is directly downloadable in standard formats and can be readily incorporated into existing applications via our web service platforms. BioGRID data are also freely distributed through partner model organism databases and meta-databases.

1,046 citations

Journal Article
01 Mar 2021
TL;DR: Enrichr is a gene set search engine that enables the querying of hundreds of thousands of annotated gene sets. Enrichr uniquely integrates knowledge from many high-profile projects to provide synthesized information about mammalian genes and gene sets.
Abstract: Profiling samples from patients, tissues, and cells with genomics, transcriptomics, epigenomics, proteomics, and metabolomics ultimately produces lists of genes and proteins that need to be further analyzed and integrated in the context of known biology. Enrichr (Chen et al., 2013; Kuleshov et al., 2016) is a gene set search engine that enables the querying of hundreds of thousands of annotated gene sets. Enrichr uniquely integrates knowledge from many high-profile projects to provide synthesized information about mammalian genes and gene sets. The platform provides various methods to compute gene set enrichment, and the results are visualized in several interactive ways. This protocol provides a summary of the key features of Enrichr, which include using Enrichr programmatically and embedding an Enrichr button on any website. © 2021 Wiley Periodicals LLC. Basic Protocol 1: Analyzing lists of differentially expressed genes from transcriptomics, proteomics and phosphoproteomics, GWAS studies, or other experimental studies. Basic Protocol 2: Searching Enrichr by a single gene or key search term. Basic Protocol 3: Preparing raw or processed RNA-seq data through BioJupies in preparation for Enrichr analysis. Basic Protocol 4: Analyzing gene sets for model organisms using modEnrichr. Basic Protocol 5: Using Enrichr in Geneshot. Basic Protocol 6: Using Enrichr in ARCHS4. Basic Protocol 7: Using the enrichment analysis visualization Appyter to visualize Enrichr results. Basic Protocol 8: Using the Enrichr API. Basic Protocol 9: Adding an Enrichr button to a website.

884 citations
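
The Enrichr abstract above mentions "various methods to compute gene set enrichment" without naming them. As an illustration only, the sketch below shows one standard approach to gene set enrichment, a one-sided hypergeometric (Fisher-type) over-representation test. The function name and the example numbers are hypothetical, and this is not presented as Enrichr's own implementation or API.

```python
from math import comb


def hypergeom_enrichment_p(overlap, query_size, set_size, background_size):
    """P(X >= overlap) when drawing query_size genes without replacement
    from a background of background_size genes, of which set_size belong
    to the annotated gene set.
    """
    max_overlap = min(query_size, set_size)
    total = comb(background_size, query_size)
    # Sum the hypergeometric probabilities for every overlap at least
    # as large as the one observed (the upper tail).
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 8 of 50 query genes fall in a 200-gene set,
# against a background of 20,000 genes.
p_value = hypergeom_enrichment_p(8, 50, 200, 20000)
print(p_value)
```

A small p-value means the observed overlap is larger than expected by chance for a random query of the same size. In practice the p-values for many gene sets would also need multiple-testing correction.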