Author

Daniel Blankenberg

Cleveland Clinic Lerner Research Institute

Other affiliations: Watson School of Biological Sciences, University of Minnesota, Cleveland Clinic Lerner College of Medicine, ...

Bio: Daniel Blankenberg is an academic researcher at the Cleveland Clinic Lerner Research Institute. The author has contributed to research on the topics Medicine and Computer science, has an h-index of 22, and has co-authored 45 publications receiving 9,193 citations. Previous affiliations include the Watson School of Biological Sciences and the University of Minnesota.

Topics: Medicine, Computer science, Workflow, Software, Biology, ...

Papers published on a yearly basis (chart omitted; years shown: 2005, 2007, 2010-2012, 2014-2023)

Papers

Journal Article

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update

Enis Afgan¹, Dannon Baker¹, Bérénice Batut², Marius van den Beek³, Dave Bouvier⁴, Martin Čech⁴, John Chilton⁴, Dave Clements¹, Nate Coraor⁴, Björn Grüning², Aysam Guerler¹, Jennifer Hillman-Jackson⁴, Saskia Hiltemann⁵, Vahid Jalili⁶, Helena Rasche², Nicola Soranzo⁷, Jeremy Goecks⁶, James Taylor¹, Anton Nekrutenko⁴, Daniel Blankenberg⁸

Institutions: Johns Hopkins University¹, University of Freiburg², PSL Research University³, Pennsylvania State University⁴, Erasmus University Rotterdam⁵, Oregon Health & Science University⁶, Norwich Research Park⁷, Cleveland Clinic Lerner Research Institute⁸

02 Jul 2018-Nucleic Acids Research

TL;DR: Improvements to Galaxy's core framework, user interface, tools, and training materials enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed.

Abstract: Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.

2,601 citations

Journal Article

Galaxy: A platform for interactive large-scale genome analysis

Belinda Giardine¹, Cathy Riemer¹, Ross C. Hardison, Richard Burhans¹, Laura Elnitski², Prachi Shah¹,², Yi Zhang¹, Daniel Blankenberg, Istvan Albert, James Taylor¹, Webb Miller¹, W. James Kent³, Anton Nekrutenko

Institutions: Pennsylvania State University¹, National Institutes of Health², University of California, Santa Cruz³

01 Oct 2005-Genome Research

TL;DR: An interactive system, Galaxy, that combines the power of existing genome annotation databases with a simple Web portal to enable users to search remote resources, combine data from independent queries, and visualize the results.

Abstract: Accessing and analyzing the exponentially expanding genomic sequence and functional data pose a challenge for biomedical researchers. Here we describe an interactive system, Galaxy, that combines the power of existing genome annotation databases with a simple Web portal to enable users to search remote resources, combine data from independent queries, and visualize the results. The heart of Galaxy is a flexible history system that stores the queries from each user; performs operations such as intersections, unions, and subtractions; and links to other computational tools. Galaxy can be accessed at http://g2.bx.psu.edu.

2,071 citations

Journal Article

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update

Enis Afgan¹, Dannon Baker¹, Marius van den Beek², Daniel Blankenberg³, Dave Bouvier³, Martin Čech³, John Chilton³, Dave Clements¹, Nate Coraor³, Carl Eberhard¹, Björn Grüning⁴, Aysam Guerler¹, Jennifer Hillman-Jackson³, Gregory Von Kuster³, Eric Rasche⁵, Nicola Soranzo⁶, Nitesh Turaga¹, James Taylor¹, Anton Nekrutenko³, Jeremy Goecks⁷

Institutions: Johns Hopkins University¹, Pierre-and-Marie-Curie University², Pennsylvania State University³, University of Freiburg⁴, Texas A&M University⁵, Norwich University⁶, George Washington University⁷

08 Jul 2016-Nucleic Acids Research

TL;DR: Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse.

Abstract: High-throughput data production technologies, particularly 'next-generation' DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.

1,774 citations

Journal Article

Galaxy: A Web-Based Genome Analysis Tool for Experimentalists

Daniel Blankenberg¹, Gregory Von Kuster¹, Nathaniel Coraor¹, Guruprasad Ananda¹, Ross Lazarus¹,², Mary E. Mangan, Anton Nekrutenko¹, James Taylor¹,³

Institutions: Pennsylvania State University¹, Harvard University², Emory University³

01 Jan 2010-Current Protocols in Molecular Biology

TL;DR: Galaxy is a software system that provides informatics support through a framework that gives experimentalists simple interfaces to powerful tools, while automatically managing the computational details.

Abstract: High-throughput data production has revolutionized molecular biology. However, massive increases in data generation capacity require analysis approaches that are more sophisticated, and often very computationally intensive. Thus, making sense of high-throughput data requires informatics support. Galaxy (http://galaxyproject.org) is a software system that provides this support through a framework that gives experimentalists simple interfaces to powerful tools, while automatically managing the computational details. Galaxy is distributed both as a publicly available Web service, which provides tools for the analysis of genomic, comparative genomic, and functional genomic data, or a downloadable package that can be deployed in individual laboratories. Either way, it allows experimentalists without informatics or programming expertise to perform complex large-scale analysis with just a Web browser.

1,501 citations

Journal Article

Manipulation of FASTQ data with Galaxy

Daniel Blankenberg¹, Assaf Gordon¹, Gregory Von Kuster¹, Nathan Coraor¹, James Taylor¹, Anton Nekrutenko¹

Institutions: Watson School of Biological Sciences¹

01 Jul 2010-Bioinformatics

TL;DR: A tool suite that functions on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next generation sequencing data taken from a sequencing machine all the way through the quality filtering steps is described.

Abstract: Summary: Here, we describe a tool suite that functions on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next generation sequencing data taken from a sequencing machine all the way through the quality filtering steps. Availability and Implementation: This open-source toolset was implemented in Python and has been integrated into the online data analysis platform Galaxy (public web access: http://usegalaxy.org; download: http://getgalaxy.org). Two short movies that highlight the functionality of tools described in this manuscript as well as results from testing components of this tool suite against a set of previously published files are available at http://usegalaxy.org/u/dan/p/fastq. Contact: james.taylor@emory.edu; anton@bx.psu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

630 citations

Cited by

Journal Article

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹

Institutions: University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

Journal Article

Metagenomic biomarker discovery and explanation

Nicola Segata¹, Jacques Izard¹,², Levi Waldron¹, Dirk Gevers³, Larisa Miropolsky¹, Wendy S. Garrett¹, Curtis Huttenhower¹

Institutions: Harvard University¹, The Forsyth Institute², Broad Institute³

24 Jun 2011-Genome Biology

TL;DR: A new method for metagenomic biomarker discovery is described and validated by way of class comparison, tests of biological consistency and effect size estimation, addressing the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities.

Abstract: This study describes and validates a new method for metagenomic biomarker discovery by way of class comparison, tests of biological consistency and effect size estimation. This addresses the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities, which is a central problem to the study of metagenomics. We extensively validate our method on several microbiomes and a convenient online interface for the method is provided at http://huttenhower.sph.harvard.edu/lefse/.

9,057 citations

Journal Article

A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3

Pablo Cingolani¹, Adrian E. Platts², Le Lily Wang¹, M. Coon¹, Tung T. Nguyen¹, Luan Wang¹, Susan Land¹, Xiangyi Lu¹, Douglas M. Ruden¹

Institutions: Wayne State University¹, McGill University²

01 Apr 2012-Fly

TL;DR: It appears that the 5′ and 3′ UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus.

Abstract: We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w1118; iso-2; iso-3 strain and the reference y1; cn1 bw1 sp1 strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in...

8,017 citations

Journal Article

Most mammalian mRNAs are conserved targets of microRNAs

Robin C. Friedman¹, Kyle Kai-How Farh, Christopher B. Burge, David P. Bartel

Institutions: Massachusetts Institute of Technology¹

01 Jan 2009-Genome Research

TL;DR: This work overhauled its tool for finding preferential conservation of sequence motifs and applied it to the analysis of human 3'UTRs, increasing by nearly threefold the detected number of preferentially conserved miRNA target sites.

Abstract: MicroRNAs (miRNAs) are small endogenous RNAs that pair to sites in mRNAs to direct post-transcriptional repression. Many sites that match the miRNA seed (nucleotides 2–7), particularly those in 3′ untranslated regions (3′UTRs), are preferentially conserved. Here, we overhauled our tool for finding preferential conservation of sequence motifs and applied it to the analysis of human 3′UTRs, increasing by nearly threefold the detected number of preferentially conserved miRNA target sites. The new tool more efficiently incorporates new genomes and more completely controls for background conservation by accounting for mutational biases, dinucleotide conservation rates, and the conservation rates of individual UTRs. The improved background model enabled preferential conservation of a new site type, the “offset 6mer,” to be detected. In total, >45,000 miRNA target sites within human 3′UTRs are conserved above background levels, and >60% of human protein-coding genes have been under selective pressure to maintain pairing to miRNAs. Mammalian-specific miRNAs have far fewer conserved targets than do the more broadly conserved miRNAs, even when considering only more recently emerged targets. Although pairing to the 3′ end of miRNAs can compensate for seed mismatches, this class of sites constitutes less than 2% of all preferentially conserved sites detected. The new tool enables statistically powerful analysis of individual miRNA target sites, with the probability of preferentially conserved targeting (PCT) correlating with experimental measurements of repression. Our expanded set of target predictions (including conserved 3′-compensatory sites) are available at the TargetScan website, which displays the PCT for each site and each predicted target.

7,744 citations
