Journal Article

deepTools2: a next generation web server for deep-sequencing data analysis

TL;DR: An update to the Galaxy-based web server deepTools, which allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches, is presented.
Abstract: We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available.

Citations
Journal Article
13 Jun 2019-Cell
TL;DR: A strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities.

7,892 citations


Cites methods from "deepTools2: a next generation web s..."

  • ...We created normalized read coverage tracks (bigwig format) for each BAM file using the program bamCoverage in the deepTools package [Ramírez et al., 2016] with the binSize parameter set to 1 and using the reads per kilobase per million mapped reads (RPKM) normalization option....

    [...]

  • ...1.2 Ramírez et al., 2016 https://github.com/deeptools/deepTools GOstats v2....

    [...]
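
The excerpts above describe coverage tracks built with a bin size of 1 and RPKM normalization. As a rough illustration of what that scaling does, here is a minimal Python sketch using the standard RPKM definition (reads in a bin, divided by the bin length in kilobases and by the total mapped reads in millions). It is not deepTools' own code: the function name `rpkm_per_bin` and its inputs are hypothetical, and the read filtering that real tools apply is ignored.

```python
from typing import List, Sequence


def rpkm_per_bin(bin_counts: Sequence[int],
                 bin_size_bp: int,
                 total_mapped_reads: int) -> List[float]:
    """Scale raw per-bin read counts to RPKM (hypothetical helper).

    RPKM = count / ((bin_size_bp / 1e3) * (total_mapped_reads / 1e6))
         = count * 1e9 / (bin_size_bp * total_mapped_reads)
    """
    if bin_size_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("bin size and total mapped reads must be positive")
    # One scale factor applies to every bin because all bins share a size.
    scale = 1e9 / (bin_size_bp * total_mapped_reads)
    return [count * scale for count in bin_counts]


# Example: three 1 bp bins from a library with one million mapped reads.
print(rpkm_per_bin([0, 5, 10], bin_size_bp=1, total_mapped_reads=1_000_000))
# -> [0.0, 5000.0, 10000.0]
```

With a bin size of 1, every bin is a single base, so the scale factor reduces to 1e9 divided by the total number of mapped reads.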

01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Journal Article
TL;DR: Improvements to Galaxy's core framework, user interface, tools, and training materials enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed.
Abstract: Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.

2,601 citations

Posted Content
02 Nov 2018-bioRxiv
TL;DR: This work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets, and demonstrates how anchoring can harmonize in-situ gene expression and scRNA-seq datasets.
Abstract: Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to "anchor" diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets. Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat

2,037 citations


Cites methods from "deepTools2: a next generation web s..."

  • ...We created normalized read coverage tracks (bigwig format) for each BAM file using the program bamCoverage in the deepTools package [Ramírez et al., 2016] with the binSize parameter set to 1 and using the reads per kilobase per million mapped reads (RPKM) normalization option....

    [...]

Journal Article
05 May 2017-Science
TL;DR: This work systematically analyzed binding specificities of full-length transcription factors and extended DNA binding domains to unmethylated and CpG-methylated DNA by using methylation-sensitive SELEX (systematic evolution of ligands by exponential enrichment).
Abstract: INTRODUCTION: Nearly all cells in the human body share the same primary genome sequence consisting of four nucleotide bases. One of the bases, cytosine, is commonly modified by methylation of its 5 position in CpG dinucleotides (mCpG). Most CpG dinucleotides in the human genome are methylated, but the level of CpG methylation varies with genetic location (promoter versus gene body), whether genes are active versus silenced, and cell type. Research has shown that the maintenance of a particular cellular state after cell division is dependent on faithful transmission of methylated CpGs, as well as inheritance of the mother cells’ repertoire of transcription factors by the daughter cells. These two mechanisms of epigenetic inheritance are linked to each other; the binding of transcription factors can be affected by cytosine methylation, and cytosine methylation can, in turn, be added or removed by proteins that associate with transcription factors. RATIONALE: The genetic and epigenetic language, which imparts when and where genes are expressed, is understood at a conceptual level. However, a more detailed understanding is needed of the genomic regulatory mechanism by which methylated cytosines affect transcription factor binding. Because cytosine methylation changes DNA structure, it has the potential to affect binding of all transcription factors. However, a systematic analysis of binding of a large collection of transcription factors to all possible DNA sequences has not previously been conducted. RESULTS: To globally characterize the effect of cytosine methylation on transcription factor binding, we systematically analyzed binding specificities of full-length transcription factors and extended DNA binding domains to unmethylated and CpG-methylated DNA by using methylation-sensitive SELEX (systematic evolution of ligands by exponential enrichment). We evaluated binding of 542 transcription factors and identified a large number of previously uncharacterized transcription factor recognition motifs. Binding of most major classes of transcription factors, including bHLH, bZIP, and ETS, was inhibited by mCpG. In contrast, transcription factors such as homeodomain, POU, and NFAT proteins preferred to bind methylated DNA. This class of binding was enriched in factors with central roles in embryonic and organismal development. The observed binding preferences were validated using several orthogonal methods, including bisulfite-SELEX and protein-binding microarrays. In addition, the preference of the pluripotency factor OCT4 to bind to a mCpG-containing motif was confirmed by chromatin immunoprecipitation analysis in mouse embryonic stem cells with low or high levels of CpG methylation (due to deficiency in all enzymes that methylate cytosines or contribute to their removal, respectively). Crystal structure analysis of the homeodomain proteins HOXB13, CDX1, CDX2, and LHX4 revealed three key residues that contribute to the preference of this developmentally important family of transcription factors for mCpG. The preference for binding to mCpG was due to direct hydrophobic interactions with the 5-methyl group of methylcytosine. In contrast, inhibition of binding of other transcription factors to methylated sequences was found to be caused by steric hindrance. CONCLUSION: Our work constitutes a global analysis of the effect of cytosine methylation on DNA binding specificities of human transcription factors. CpG methylation can influence binding of most transcription factors to DNA, in some cases negatively and in others positively. Our finding that many developmentally important transcription factors prefer to bind to mCpG sites can inform future analyses of the role of DNA methylation on cell differentiation, chromatin reprogramming, and transcriptional regulation.

846 citations

References
Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: It is demonstrated in macrophages and B cells that collaborative interactions of the common factor PU.1 with small sets of macrophage- or B cell lineage-determining transcription factors establish cell-specific binding sites that are associated with the majority of promoter-distal H3K4me1-marked genomic regions.

9,620 citations


Additional excerpts

  • ...The rapidly increasing diversity of experimental assays using high-throughput sequencing has led to a concomitant increase in the number of analysis packages that allow for insightful visualization and downstream analyses (e.g. ChAsE (1), the ChIP-seq web server (http://ccg.vital-it.ch/chipseq), Genomation (2), Homer (3), ngs.plot (4))....

    [...]

Journal Article
01 Jan 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

8,106 citations

Journal Article
Anshul Kundaje, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Michael J. Ziller, Viren Amin, John W. Whitaker, Matthew D. Schultz, Lucas D. Ward, Abhishek Sarkar, Gerald Quon, Richard Sandstrom, Matthew L. Eaton, Yi-Chieh Wu, Andreas R. Pfenning, Xinchen Wang, Melina Claussnitzer, Yaping Liu, Cristian Coarfa, R. Alan Harris, Noam Shoresh, Charles B. Epstein, Elizabeta Gjoneska, Danny Leung, Wei Xie, R. David Hawkins, Ryan Lister, Chibo Hong, Philippe Gascard, Andrew J. Mungall, Richard A. Moore, Eric Chuah, Angela Tam, Theresa K. Canfield, R. Scott Hansen, Rajinder Kaul, Peter J. Sabo, Mukul S. Bansal, Annaick Carles, Jesse R. Dixon, Kai How Farh, Soheil Feizi, Rosa Karlic, Ah Ram Kim, Ashwinikumar Kulkarni, Daofeng Li, Rebecca F. Lowdon, Ginell Elliott, Tim R. Mercer, Shane Neph, Vitor Onuchic, Paz Polak, Nisha Rajagopal, Pradipta R. Ray, Richard C Sallari, Kyle Siebenthall, Nicholas A Sinnott-Armstrong, Michael Stevens, Robert E. Thurman, Jie Wu, Bo Zhang, Xin Zhou, Arthur E. Beaudet, Laurie A. Boyer, Philip L. De Jager, Peggy J. Farnham, Susan J. Fisher, David Haussler, Steven J.M. Jones, Wei Li, Marco A. Marra, Michael T. McManus, Shamil R. Sunyaev, James A. Thomson, Thea D. Tlsty, Li-Huei Tsai, Wei Wang, Robert A. Waterland, Michael Q. Zhang, Lisa Helbling Chadwick, Bradley E. Bernstein, Joseph F. Costello, Joseph R. Ecker, Martin Hirst, Alexander Meissner, Aleksandar Milosavljevic, Bing Ren, John A. Stamatoyannopoulos, Ting Wang, Manolis Kellis
19 Feb 2015-Nature
TL;DR: It is shown that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

5,037 citations


"deepTools2: a next generation web s..." refers to background in this paper

  • ...Since deepTools employ a high level of parallelization for the computationally most expensive tasks, they are well suited to work with a large number of samples emerging from large-scale data production centers (12,13) or single-cell sequencing (14)....

    [...]