Author

Darren Marvin

Bio: Darren Marvin is an academic researcher from the University of Southampton. The author has contributed to research in topics: e-Science & Workflow. The author has an h-index of 10 and has co-authored 15 publications receiving 2,905 citations.

Papers
Journal ArticleDOI
TL;DR: The Taverna project has developed a tool for the composition and enactment of bioinformatics workflows for the life sciences community; workflows are written in a new language called Scufl, whereby each step within a workflow represents one atomic task.
Abstract: Motivation:In silico experiments in bioinformatics involve the co-ordinated use of computational tools and information repositories. A growing number of these resources are being made available with programmatic access in the form of Web services. Bioinformatics scientists will need to orchestrate these Web services in workflows as part of their analyses. Results: The Taverna project has developed a tool for the composition and enactment of bioinformatics workflows for the life sciences community. The tool includes a workbench application which provides a graphical user interface for the composition of workflows. These workflows are written in a new language called the simple conceptual unified flow language (Scufl), where by each step within a workflow represents one atomic task. Two examples are used to illustrate the ease by which in silico experiments can be represented as Scufl workflows using the workbench application. Availability: The Taverna workflow system is available as open source and can be downloaded with example Scufl workflows from http://taverna.sourceforge.net
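The Scufl idea described in the abstract — a workflow as a chain of atomic tasks wired output-to-input — can be sketched in plain Python. This is an illustrative stand-in, not Scufl syntax; the task names, the stubbed sequence data, and the accession identifier are all made up:

```python
from typing import Any, Callable

# Hypothetical atomic tasks standing in for remote bioinformatics services.
def fetch_sequence(accession: str) -> str:
    # A real Taverna step would invoke a Web service; here it is stubbed.
    return f">{accession}\nATGGCCATTGTAATGGGCCGC"

def reverse_complement(fasta: str) -> str:
    header, seq = fasta.split("\n", 1)
    table = str.maketrans("ACGT", "TGCA")
    return header + "\n" + seq.translate(table)[::-1]

# A workflow is an ordered list of atomic steps; enactment threads the
# output of each step into the input of the next.
def enact(workflow: list[Callable[[Any], Any]], data: Any) -> Any:
    for step in workflow:
        data = step(data)
    return data

result = enact([fetch_sequence, reverse_complement], "EXAMPLE_1")
```

Each function plays the role of one Scufl processor; the real system adds a graphical workbench, remote service invocation, and data links rather than a simple linear list.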

1,709 citations

Journal ArticleDOI
TL;DR: The Taverna Workbench, developed by the myGrid project, provides a Grid environment for the composition and execution of workflows for the life sciences community; this experience paper describes lessons learnt during its development.
Abstract: Life sciences research is based on individuals, often with diverse skills, assembled into research groups. These groups use their specialist expertise to address scientific problems. The in silico experiments undertaken by these research groups can be represented as workflows involving the co-ordinated use of analysis programs and information repositories that may be globally distributed. With regards to Grid computing, the requirements relate to the sharing of analysis and information resources rather than sharing computational power. The myGrid project has developed the Taverna Workbench for the composition and execution of workflows for the life sciences community. This experience paper describes lessons learnt during the development of Taverna. A common theme is the importance of understanding how workflows fit into the scientists' experimental context. The lessons reflect an evolving understanding of life scientists' requirements on a workflow environment, which is relevant to other areas of data intensive and exploratory science.

729 citations

01 Jan 2003
TL;DR: An overview of initial work on the provenance of bioinformatics e-Science experiments within myGrid uses two kinds of provenance: the derivation path of information and annotation and explores how the resulting Webs of experimental data holdings can be mined for useful information and presentations for the e-Scientist.
Abstract: Like experiments performed at a laboratory bench, the data associated with an e-Science experiment are of reduced value if other scientists are not able to identify the origin, or provenance, of those data. Provenance information is essential if experiments are to be validated and verified by others, or even by those who originally performed them. In this article, we give an overview of our initial work on the provenance of bioinformatics e-Science experiments within myGrid. We use two kinds of provenance: the derivation path of information and annotation. We show how this kind of provenance can be delivered within the myGrid demonstrator WorkBench and we explore how the resulting Webs of experimental data holdings can be mined for useful information and presentations for the e-Scientist.

126 citations

Journal IssueDOI
TL;DR: The Taverna Workbench, developed by the myGrid project, provides a Grid environment for the composition and execution of workflows for the life sciences community; this experience paper describes lessons learnt during its development.
Abstract: Life sciences research is based on individuals, often with diverse skills, assembled into research groups. These groups use their specialist expertise to address scientific problems. The in silico experiments undertaken by these research groups can be represented as workflows involving the co-ordinated use of analysis programs and information repositories that may be globally distributed. With regards to Grid computing, the requirements relate to the sharing of analysis and information resources rather than sharing computational power. The myGrid project has developed the Taverna Workbench for the composition and execution of workflows for the life sciences community. This experience paper describes lessons learnt during the development of Taverna. A common theme is the importance of understanding how workflows fit into the scientists' experimental context. The lessons reflect an evolving understanding of life scientists' requirements on a workflow environment, which is relevant to other areas of data intensive and exploratory science. Copyright © 2005 John Wiley & Sons, Ltd.

115 citations

01 Jan 2003
TL;DR: The EPSRC-funded Grid project has developed a graphical toolset and workflow enactor which uses its own high-level representation of a process flow, including specification of processing units, data transfers and execution constraints.
Abstract: Workflow techniques form an important part of in-silico experimentation within the bioinformatics domain and potentially allow the eScientist to describe and enact their experimental processes in a structured, repeatable and verifiable way. Bioinformaticians routinely use Web-based resources within their in-silico experiments. However, the use of current web service orchestration techniques is problematic, and represents a significant barrier to take-up by the bioinformatics community, due to the rapidly evolving and competing standards, a lack of freely available tools, limited support for interaction with stateful services, and inappropriate levels of abstraction for the bioinformatics domain. As a result, the EPSRC-funded Grid [11] project has, in collaboration with the European Bioinformatics Institute and the Human Genome Mapping Project, developed a graphical toolset and workflow enactor which uses its own high-level representation of a process flow, including specification of processing units, data transfers and execution constraints.

58 citations


Cited by
Journal ArticleDOI
TL;DR: Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis and provide support for capturing the context and intent of computational methods.
Abstract: Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy http://usegalaxy.org, an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.

3,576 citations

Journal ArticleDOI
TL;DR: The most important new developments in STRING 8 over previous releases include a URL-based programming interface, improved interaction prediction via genomic neighborhood in prokaryotes, and the inclusion of protein structures.
Abstract: Functional partnerships between proteins are at the core of complex cellular phenotypes, and the networks formed by interacting proteins provide researchers with crucial scaffolds for modeling, data reduction and annotation. STRING is a database and web resource dedicated to protein–protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a metadatabase that maps all interaction evidence onto a common set of genomes and proteins. The most important new developments in STRING 8 over previous releases include a URL-based programming interface, which can be used to query STRING from other resources, improved interaction prediction via genomic neighborhood in prokaryotes, and the inclusion of protein structures. Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms, providing the most comprehensive view on protein–protein interactions currently available. STRING can be reached at http://string-db.org/.
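The URL-based programming interface mentioned above can be exercised with a short query-construction sketch. The `api/<format>/<method>` path layout and the `identifiers`/`species` parameter names follow STRING's documented interface, but treat them as illustrative and consult string-db.org for the current API version:

```python
from urllib.parse import urlencode

# Base of STRING's URL-based programming interface (assumed current host).
BASE = "https://string-db.org/api"

def network_url(identifiers: list[str], species: int, fmt: str = "tsv") -> str:
    """Build a URL querying the interaction network for a set of proteins."""
    params = urlencode({
        # Protein names are carriage-return separated per STRING's convention;
        # urlencode turns "\r" into %0D.
        "identifiers": "\r".join(identifiers),
        # NCBI taxon id, e.g. 9606 = human.
        "species": species,
    })
    return f"{BASE}/{fmt}/network?{params}"

url = network_url(["TP53", "MDM2"], species=9606)
```

Fetching the resulting URL (e.g. with `urllib.request`) would return the network as TSV; only URL construction is shown here to keep the sketch offline.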

2,394 citations

Journal ArticleDOI
TL;DR: Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow.
Abstract: Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow. It is the first system to support the use of automatically inferred multiple named wildcards (or variables) in input and output filenames.
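The named-wildcard mechanism the abstract highlights can be illustrated with a small Python sketch. This is not Snakemake's implementation, just a toy showing how a wildcard inferred from an output filename can be substituted back into an input pattern:

```python
import re

def compile_pattern(pattern: str) -> re.Pattern:
    # Escape regex metacharacters, then turn each "{name}" placeholder
    # into a named capture group.
    escaped = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>[^/]+)", re.escape(pattern))
    return re.compile(escaped + "$")

def infer_wildcards(pattern: str, target: str) -> "dict[str, str] | None":
    """Match a requested output file against a rule's output pattern."""
    m = compile_pattern(pattern).match(target)
    return m.groupdict() if m else None

def expand(pattern: str, wildcards: dict) -> str:
    """Substitute inferred wildcard values into an input pattern."""
    return pattern.format(**wildcards)

# Requesting "liver.sorted.bam" binds {sample} = "liver" ...
wc = infer_wildcards("{sample}.sorted.bam", "liver.sorted.bam")
# ... which determines the rule's input filename.
inputs = expand("{sample}.bam", wc)
```

In Snakemake proper, this inference drives the dependency resolution between rules; the sketch only shows the filename-matching core.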

1,932 citations

Journal ArticleDOI
TL;DR: Kepler is a scientific workflow system under development across a number of scientific data management projects; it is a community-driven, open-source project that welcomes related projects and new contributors.
Abstract: Many scientific disciplines are now data and information driven, and new scientific knowledge is often gained by scientists putting together data analysis and knowledge discovery “pipelines”. A related trend is that more and more scientific communities realize the benefits of sharing their data and computational services, and are thus contributing to a distributed data and computational community infrastructure (a.k.a. “the Grid”). However, this infrastructure is only a means to an end and scientists ideally should be bothered little with its existence. The goal is for scientists to focus on development and use of what we call scientific workflows. These are networks of analytical steps that may involve, e.g., database access and querying steps, data analysis and mining steps, and many other steps including computationally intensive jobs on high performance cluster computers. In this paper we describe characteristics of and requirements for scientific workflows as identified in a number of our application projects. We then elaborate on Kepler, a particular scientific workflow system, currently under development across a number of scientific data management projects. We describe some key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research. Kepler is a community-driven, open source project, and we always welcome related projects and new contributors to join.

1,926 citations

Journal ArticleDOI
TL;DR: Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse.
Abstract: High-throughput data production technologies, particularly 'next-generation' DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.

1,774 citations