deepTools2: a next generation web server for deep-sequencing data analysis

doi:10.1093/NAR/GKW257


Journal Article•DOI•


Fidel Ramírez¹, Devon Ryan¹, Björn Grüning², Vivek Bhardwaj¹, Fabian Kilpert¹, Andreas S. Richter¹, Steffen Heyne¹, Friederike Dündar³, Thomas Manke¹

Max Planck Society¹, University of Freiburg², Cornell University³

08 Jul 2016-Nucleic Acids Research (Oxford University Press)-Vol. 44

TL;DR: An update to the Galaxy-based web server deepTools, which allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches, is presented.

Abstract: We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available.

Citations

Journal Article•DOI•

Comprehensive Integration of Single-Cell Data.

Tim Stuart, Andrew Butler¹, Paul J. Hoffman, Christoph Hafemeister, Efthymia Papalexi¹, William M. Mauck¹, Yuhan Hao¹, Marlon Stoeckius², Peter Smibert², Rahul Satija¹

New York University¹, Harvard University²

13 Jun 2019-Cell

TL;DR: A strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities.

7,892 citations

Cites methods from "deepTools2: a next generation web s..."

...We created normalized read coverage tracks (bigwig format) for each BAM file using the program bamCoverage in the deepTools package [Ramírez et al., 2016] with the binSize parameter set to 1 and using the reads per kilobase per million mapped reads (RPKM) normalization option....
[...]
...1.2 Ramírez et al., 2016 https://github.com/deeptools/deepTools GOstats v2....
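
The RPKM normalization named in the excerpt above can be illustrated with a small sketch. This is not the deepTools implementation; the function name `rpkm_per_bin` and its arguments are hypothetical, and the code only applies the standard definition (reads per kilobase of bin length per million mapped reads) to a list of raw per-bin read counts.

```python
def rpkm_per_bin(bin_counts, bin_size_bp, total_mapped_reads):
    """Scale raw per-bin read counts to RPKM.

    Hypothetical helper for illustration only.
    RPKM = count / ((bin_size_bp / 1e3) * (total_mapped_reads / 1e6))
         = count * 1e9 / (bin_size_bp * total_mapped_reads)
    """
    if bin_size_bp <= 0:
        raise ValueError("bin_size_bp must be positive")
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    # Multiply before dividing so integer inputs stay exact until the final division.
    denominator = bin_size_bp * total_mapped_reads
    return [count * 1_000_000_000 / denominator for count in bin_counts]


if __name__ == "__main__":
    # Three 1 bp bins from a library with 2 million mapped reads (made-up numbers).
    print(rpkm_per_bin([0, 5, 10], bin_size_bp=1, total_mapped_reads=2_000_000))
```

With a bin size of 1 bp, as in the excerpt, each raw count is multiplied by a large constant factor, so the resulting track has single-base resolution and values that are comparable across libraries of different sequencing depth.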

Integrative analysis of 111 reference human epigenomes

Anshul Kundaje, Wouter Meuleman, Jason Ernst, Angela Yen, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Lucas D. Ward, Abhishek Sarkar, Gerald Quon, Matthew L. Eaton, Yi-Chieh Wu, Andreas R. Pfenning, Xinchen Wang, Melina Claussnitzer, Yaping Liu, Mukul S. Bansal, Soheil Feizi-Khankandi, Ah Ram Kim, Richard C Sallari, Nicholas A Sinnott-Armstrong, Laurie A. Boyer, Elizabeta Gjoneska, Li-Huei Tsai, Manolis Kellis

01 Feb 2015

TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.

Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Journal Article•DOI•

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update

Enis Afgan¹, Dannon Baker¹, Bérénice Batut², Marius van den Beek³, Dave Bouvier⁴, Martin Čech⁴, John Chilton⁴, Dave Clements¹, Nate Coraor⁴, Björn Grüning², Aysam Guerler¹, Jennifer Hillman-Jackson⁴, Saskia Hiltemann⁵, Vahid Jalili⁶, Helena Rasche², Nicola Soranzo⁷, Jeremy Goecks⁶, James Taylor¹, Anton Nekrutenko⁴, Daniel Blankenberg⁸

Johns Hopkins University¹, University of Freiburg², PSL Research University³, Pennsylvania State University⁴, Erasmus University Rotterdam⁵, Oregon Health & Science University⁶, Norwich Research Park⁷, Cleveland Clinic Lerner Research Institute⁸

02 Jul 2018-Nucleic Acids Research

TL;DR: Improvements to Galaxy's core framework, user interface, tools, and training materials enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed.

Abstract: Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.

2,601 citations

Posted Content•DOI•

Comprehensive integration of single cell data

Tim Stuart, Andrew Butler¹, Paul J. Hoffman, Christoph Hafemeister, Efthymia Papalexi¹, William M. Mauck¹, Marlon Stoeckius², Peter Smibert², Rahul Satija¹

New York University¹, Harvard University²

02 Nov 2018-bioRxiv

TL;DR: This work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets, and demonstrates how anchoring can harmonize in-situ gene expression and scRNA-seq datasets.

Abstract: Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to "anchor" diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets. Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat

2,037 citations

Cites methods from "deepTools2: a next generation web s..."

...We created normalized read coverage tracks (bigwig format) for each BAM file using the program bamCoverage in the deepTools package [Ramírez et al., 2016] with the binSize parameter set to 1 and using the reads per kilobase per million mapped reads (RPKM) normalization option....

Journal Article•DOI•

Impact of cytosine methylation on DNA binding specificities of human transcription factors.

Yimeng Yin¹, Ekaterina Morgunova¹, Arttu Jolma¹, Eevi Kaasinen¹, Biswajyoti Sahu², Syed Khund-Sayeed³, Pratyush Kumar Das², Teemu Kivioja², Kashyap Dave¹, Fan Zhong¹, Kazuhiro R. Nitta¹, Minna Taipale¹, Alexander Popov⁴, Paul A. Ginno⁵, Silvia Domcke⁶, Silvia Domcke⁵, Jian Yan¹, Dirk Schübeler⁵, Dirk Schübeler⁶, Charles Vinson³, Jussi Taipale¹, Jussi Taipale²

Karolinska Institutet¹, University of Helsinki², National Institutes of Health³, European Synchrotron Radiation Facility⁴, Friedrich Miescher Institute for Biomedical Research⁵, University of Basel⁶

05 May 2017-Science

TL;DR: This work systematically analyzed binding specificities of full-length transcription factors and extended DNA binding domains to unmethylated and CpG-methylated DNA by using methylation-sensitive SELEX (systematic evolution of ligands by exponential enrichment).

Abstract:

INTRODUCTION: Nearly all cells in the human body share the same primary genome sequence consisting of four nucleotide bases. One of the bases, cytosine, is commonly modified by methylation of its 5 position in CpG dinucleotides (mCpG). Most CpG dinucleotides in the human genome are methylated, but the level of CpG methylation varies with genetic location (promoter versus gene body), whether genes are active versus silenced, and cell type. Research has shown that the maintenance of a particular cellular state after cell division is dependent on faithful transmission of methylated CpGs, as well as inheritance of the mother cells’ repertoire of transcription factors by the daughter cells. These two mechanisms of epigenetic inheritance are linked to each other; the binding of transcription factors can be affected by cytosine methylation, and cytosine methylation can, in turn, be added or removed by proteins that associate with transcription factors.

RATIONALE: The genetic and epigenetic language, which imparts when and where genes are expressed, is understood at a conceptual level. However, a more detailed understanding is needed of the genomic regulatory mechanism by which methylated cytosines affect transcription factor binding. Because cytosine methylation changes DNA structure, it has the potential to affect binding of all transcription factors. However, a systematic analysis of binding of a large collection of transcription factors to all possible DNA sequences has not previously been conducted.

RESULTS: To globally characterize the effect of cytosine methylation on transcription factor binding, we systematically analyzed binding specificities of full-length transcription factors and extended DNA binding domains to unmethylated and CpG-methylated DNA by using methylation-sensitive SELEX (systematic evolution of ligands by exponential enrichment). We evaluated binding of 542 transcription factors and identified a large number of previously uncharacterized transcription factor recognition motifs. Binding of most major classes of transcription factors, including bHLH, bZIP, and ETS, was inhibited by mCpG. In contrast, transcription factors such as homeodomain, POU, and NFAT proteins preferred to bind methylated DNA. This class of binding was enriched in factors with central roles in embryonic and organismal development. The observed binding preferences were validated using several orthogonal methods, including bisulfite-SELEX and protein-binding microarrays. In addition, the preference of the pluripotency factor OCT4 to bind to a mCpG-containing motif was confirmed by chromatin immunoprecipitation analysis in mouse embryonic stem cells with low or high levels of CpG methylation (due to deficiency in all enzymes that methylate cytosines or contribute to their removal, respectively). Crystal structure analysis of the homeodomain proteins HOXB13, CDX1, CDX2, and LHX4 revealed three key residues that contribute to the preference of this developmentally important family of transcription factors for mCpG. The preference for binding to mCpG was due to direct hydrophobic interactions with the 5-methyl group of methylcytosine. In contrast, inhibition of binding of other transcription factors to methylated sequences was found to be caused by steric hindrance.

CONCLUSION: Our work constitutes a global analysis of the effect of cytosine methylation on DNA binding specificities of human transcription factors. CpG methylation can influence binding of most transcription factors to DNA—in some cases negatively and in others positively. Our finding that many developmentally important transcription factors prefer to bind to mCpG sites can inform future analyses of the role of DNA methylation on cell differentiation, chromatin reprogramming, and transcriptional regulation.

846 citations

References

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities

Sven Heinz¹, Christopher Benner¹, Nathanael J. Spann¹, Eric Bertolino², Yin C. Lin¹, Peter Laslo³, Jason X. Cheng², Cornelis Murre¹, Harinder Singh⁴, Harinder Singh², Christopher K. Glass¹

University of California, San Diego¹, University of Chicago², University of Leeds³, Genentech⁴

28 May 2010-Molecular Cell

TL;DR: It is demonstrated in macrophages and B cells that collaborative interactions of the common factor PU.1 with small sets of macrophage- or B cell lineage-determining transcription factors establish cell-specific binding sites that are associated with the majority of promoter-distal H3K4me1-marked genomic regions.

9,620 citations

Additional excerpts

...The rapidly increasing diversity of experimental assays using high-throughput sequencing has led to a concomitant increase in the number of analysis packages that allow for insightful visualization and downstream analyses (e.g. ChAsE (1), the ChIP-seq web server (http://ccg.vital-it.ch/chipseq), Genomation (2), Homer (3), ngs.plot (4))....

Journal Article•

An integrated encyclopedia of DNA elements in the human genome.

ENCODE Consortium

01 Jan 2012-Nature

8,106 citations

Journal Article•DOI•

Integrative analysis of 111 reference human epigenomes

Anshul Kundaje¹, Wouter Meuleman², Wouter Meuleman¹, Jason Ernst³, Misha Bilenky⁴, Angela Yen², Angela Yen¹, Alireza Heravi-Moussavi⁴, Pouya Kheradpour¹, Pouya Kheradpour², Zhizhuo Zhang¹, Zhizhuo Zhang², Jianrong Wang¹, Jianrong Wang², Michael J. Ziller², Viren Amin⁵, John W. Whitaker, Matthew D. Schultz⁶, Lucas D. Ward², Lucas D. Ward¹, Abhishek Sarkar¹, Abhishek Sarkar², Gerald Quon¹, Gerald Quon², Richard Sandstrom⁷, Matthew L. Eaton², Matthew L. Eaton¹, Yi-Chieh Wu², Yi-Chieh Wu¹, Andreas R. Pfenning², Andreas R. Pfenning¹, Xinchen Wang¹, Xinchen Wang², Melina Claussnitzer¹, Melina Claussnitzer², Yaping Liu², Yaping Liu¹, Cristian Coarfa⁵, R. Alan Harris⁵, Noam Shoresh², Charles B. Epstein², Elizabeta Gjoneska¹, Elizabeta Gjoneska², Danny Leung⁸, Wei Xie⁸, R. David Hawkins⁸, Ryan Lister⁶, Chibo Hong⁹, Philippe Gascard⁹, Andrew J. Mungall⁴, Richard A. Moore⁴, Eric Chuah⁴, Angela Tam⁴, Theresa K. Canfield⁷, R. Scott Hansen⁷, Rajinder Kaul⁷, Peter J. Sabo⁷, Mukul S. Bansal², Mukul S. Bansal¹, Mukul S. Bansal¹⁰, Annaick Carles⁴, Jesse R. Dixon⁸, Kai How Farh², Soheil Feizi², Soheil Feizi¹, Rosa Karlic¹¹, Ah Ram Kim², Ah Ram Kim¹, Ashwinikumar Kulkarni¹², Daofeng Li¹³, Rebecca F. Lowdon¹³, Ginell Elliott¹³, Tim R. Mercer¹⁴, Shane Neph⁷, Vitor Onuchic⁵, Paz Polak¹⁵, Paz Polak², Nisha Rajagopal⁸, Pradipta R. Ray¹², Richard C Sallari², Richard C Sallari¹, Kyle Siebenthall⁷, Nicholas A Sinnott-Armstrong¹, Nicholas A Sinnott-Armstrong², Michael Stevens¹³, Robert E. Thurman⁷, Jie Wu¹⁶, Bo Zhang¹³, Xin Zhou¹³, Arthur E. Beaudet⁵, Laurie A. Boyer¹, Philip L. De Jager², Philip L. De Jager¹⁵, Peggy J. Farnham¹⁷, Susan J. Fisher⁹, David Haussler¹⁸, Steven J.M. Jones⁴, Steven J.M. Jones¹⁹, Wei Li⁵, Marco A. Marra⁴, Michael T. McManus⁹, Shamil R. Sunyaev², Shamil R. Sunyaev¹⁵, James A. Thomson²⁰, Thea D. Tlsty⁹, Li-Huei Tsai¹, Li-Huei Tsai², Wei Wang, Robert A. Waterland⁵, Michael Q. Zhang²¹, Lisa Helbling Chadwick²², Bradley E. Bernstein², Bradley E. Bernstein¹⁵, Bradley E. 
Bernstein⁶, Joseph F. Costello⁹, Joseph R. Ecker¹¹, Martin Hirst⁴, Alexander Meissner², Aleksandar Milosavljevic⁵, Bing Ren⁸, John A. Stamatoyannopoulos⁷, Ting Wang¹³, Manolis Kellis², Manolis Kellis¹

Massachusetts Institute of Technology¹, Broad Institute², University of California, Los Angeles³, University of British Columbia⁴, Baylor College of Medicine⁵, Howard Hughes Medical Institute⁶, University of Washington⁷, Ludwig Institute for Cancer Research⁸, University of California, San Francisco⁹, University of Connecticut¹⁰, University of Zagreb¹¹, University of Texas at Austin¹², Washington University in St. Louis¹³, University of Queensland¹⁴, Harvard University¹⁵, Cold Spring Harbor Laboratory¹⁶, University of Southern California¹⁷, University of California, Santa Cruz¹⁸, Simon Fraser University¹⁹, Morgridge Institute for Research²⁰, University of Texas at Dallas²¹, National Institutes of Health²²

19 Feb 2015-Nature

TL;DR: It is shown that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease.

5,037 citations

"deepTools2: a next generation web s..." refers background in this paper

...Since deepTools employ a high level of parallelization for the computationally most expensive tasks, they are well suited to work with a large number of samples emerging from large-scale data production centers (12,13) or single-cell sequencing (14)....