
Showing papers by "Enis Afgan" published in 2012


Journal ArticleDOI
TL;DR: The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions.
Abstract: Background: Cloud computing provides an infrastructure that facilitates large-scale computational analysis in a scalable, democratized fashion. However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. Results: CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. Conclusions: With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions.
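
As a concrete illustration of the deployment step described above, the sketch below (Python, using the boto EC2 library) launches an instance from a CloudMan-enabled machine image and passes the cluster configuration as instance user data, which is how CloudMan ties an instance to its persistent, shareable cluster state. The AMI ID, key pair, security group, and exact user-data keys are placeholders and assumptions rather than values taken from the paper.

```python
# Minimal sketch: launch a CloudMan cluster on EC2 with boto.
# All identifiers below (AMI, key pair, security group) are placeholders.
import boto.ec2

conn = boto.ec2.connect_to_region(
    "us-east-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# CloudMan reads its configuration from instance user data; the cluster
# name links the instance to persistent data volumes, so relaunching with
# the same name (or a shared derivative) recreates the environment.
user_data = "\n".join([
    "cluster_name: my-analysis-cluster",
    "password: choose-a-password",
    "access_key: YOUR_ACCESS_KEY",
    "secret_key: YOUR_SECRET_KEY",
])

reservation = conn.run_instances(
    "ami-xxxxxxxx",             # placeholder CloudMan/CloudBioLinux image
    instance_type="m1.large",
    key_name="my-keypair",      # placeholder EC2 key pair
    security_groups=["CloudMan"],
    user_data=user_data,
)
print("Launched instance:", reservation.instances[0].id)
```

Once the instance boots, the CloudMan web interface served from it is used to configure services, size the cluster, and share the resulting environment with collaborators.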

48 citations


Journal ArticleDOI
TL;DR: This unit demonstrates how to utilize cloud computing resources to perform open‐ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure, using three projects, CloudBioLinux, CloudMan, and Galaxy.
Abstract: Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.
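
To make the command-line route concrete, the following hypothetical sketch submits a job to the cluster's batch scheduler (CloudMan clusters of this generation used SGE) from the head node. The tool invocation, file paths, and resource requests are illustrative assumptions, not part of the published protocol.

```python
# Hypothetical sketch: run an alignment on the cluster via the SGE
# scheduler. Paths, tool choice, and resource requests are illustrative.
import subprocess
import textwrap

job_script = textwrap.dedent("""\
    #!/bin/bash
    #$ -cwd
    #$ -N bwa_align
    #$ -pe smp 4
    # Reference indices and reads live on volumes attached to the cluster;
    # the locations below are placeholders.
    bwa aln -t 4 /mnt/galaxyIndices/hg19.fa /mnt/data/reads.fastq > reads.sai
    """)

with open("bwa_align.sh", "w") as handle:
    handle.write(job_script)

# Submit the script to SGE; qsub prints the assigned job identifier.
subprocess.check_call(["qsub", "bwa_align.sh"])
```

The same analysis could instead be driven through the graphical desktop or the Web-based Galaxy interface mentioned in the abstract.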

25 citations


Proceedings Article
21 May 2012
TL;DR: This paper demonstrates how CloudMan (http://usecloudman.org) can be used to provide complete and complex tool execution environments for making cloud resources functional for a desired domain.
Abstract: Cloud computing has revolutionized how availability of and access to computing and storage resources are realized; it has made it possible to provision a large computational infrastructure in a matter of minutes, all through a web browser. What it has not yet solved is the accessibility of a tool execution environment in which tools and data can easily be added and used in non-trivial scenarios. In this paper, we demonstrate how CloudMan (http://usecloudman.org) can be used to provide complete and complex tool execution environments for making cloud resources functional for a desired domain.
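
The following sketch is a purely illustrative take on what "making cloud resources functional for a domain" can look like in practice: tools installed under a shared (e.g., NFS-exported) directory become visible to every node in the cluster. The directory layout, helper function, and example URL are assumptions, not CloudMan's actual mechanism.

```python
# Illustrative only: extend a cluster-wide tool environment by unpacking a
# tool under a shared directory that all worker nodes mount. The paths and
# the example URL are hypothetical, not CloudMan's real layout.
import os
import subprocess

SHARED_TOOLS_DIR = "/mnt/shared/tools"   # hypothetical NFS-shared location

def install_tool(source_url, name):
    """Download and unpack a tool tarball into the shared directory."""
    dest = os.path.join(SHARED_TOOLS_DIR, name)
    os.makedirs(dest, exist_ok=True)
    archive = os.path.join(dest, os.path.basename(source_url))
    subprocess.check_call(["wget", "-q", "-O", archive, source_url])
    subprocess.check_call(
        ["tar", "xzf", archive, "-C", dest, "--strip-components=1"])
    return dest

tool_dir = install_tool("http://example.org/mytool-1.0.tar.gz", "mytool")
print("Add to PATH on all nodes:", os.path.join(tool_dir, "bin"))
```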

24 citations


Journal ArticleDOI
TL;DR: A reference model is described for deploying applications into virtualized environments; it composes otherwise dispersed low-level components into a coherent unit and imposes minimal overhead on management of the infrastructure required to run the application.
Abstract: Modern scientific research has been revolutionized by the availability of powerful and flexible computational infrastructure. Virtualization has made it possible to acquire computational resources on demand. Establishing and enabling the use of these environments is essential, but their widespread adoption will only succeed if they are transparently usable. Requiring changes to the applications being deployed, or requiring users to change how they utilize those applications, represents a barrier to infrastructure acceptance. The problem lies in the process of deploying applications so that they can take advantage of the elasticity of the environment and deliver it transparently to users. Here, we describe a reference model for deploying applications into virtualized environments. The model is rooted in the low-level components common to a range of virtualized environments, and it describes how to compose those otherwise dispersed components into a coherent unit. Use of the model enables applications to be deployed into the new environment without any modifications, imposes minimal overhead on management of the infrastructure required to run the application, and yields a set of higher-level services as a byproduct of the component organization and the underlying infrastructure. We provide a fully functional sample application deployment and implement a framework for managing the overall application deployment. Copyright © 2011 John Wiley & Sons, Ltd.
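
A minimal sketch of the kind of composition the model describes is given below (Python with boto, EC2 assumed): a compute resource is acquired from a generic image, persistent storage is attached so application state outlives the virtual machine, and a configuration step takes over to start the unmodified application. The identifiers are placeholders and the sequence is an illustration, not the paper's reference implementation.

```python
# Illustration (not the paper's framework): compose a generic machine
# image, a persistent data volume, and a configuration step into a single
# deployment, so the application itself needs no modification.
import time
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")   # credentials via env/config

# 1. Acquire a compute resource from a generic machine image.
instance = conn.run_instances(
    "ami-xxxxxxxx",              # placeholder image ID
    instance_type="m1.medium",
    key_name="my-keypair",       # placeholder key pair
).instances[0]
while instance.update() != "running":
    time.sleep(10)

# 2. Attach persistent storage that outlives the instance, so state
#    survives redeployment onto a new virtual machine.
conn.attach_volume("vol-xxxxxxxx", instance.id, "/dev/sdf")

# 3. A contextualization step on the instance (e.g., a boot-time script)
#    would now mount the volume and start the unmodified application.
print("Instance %s ready at %s" % (instance.id, instance.public_dns_name))
```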

11 citations


Journal ArticleDOI
TL;DR: A methodology and a tool are presented that automatically manipulate job submission parameters to realize a range of job execution alternatives across a distributed compute infrastructure; the alternatives are shown to the user at submission time as tradeoffs between two conflicting objectives, job cost and runtime.
Abstract: Growth in the availability of data collection devices has allowed individual researchers to gain access to large quantities of data that need to be analyzed. As a result, many labs and departments have acquired considerable compute resources. However, effective and efficient utilization of those resources remains a barrier for individual researchers because distributed computing environments are difficult to understand and control. We introduce a methodology and a tool that automatically manipulates and understands job submission parameters to realize a range of job execution alternatives across a distributed compute infrastructure. Generated alternatives are presented to the user at the time of job submission in the form of tradeoffs mapped onto two conflicting objectives, namely job cost and runtime. Such presentation of job execution alternatives allows a user to immediately and quantitatively observe viable options regarding their job execution, and thus allows the user to interact with the environment at a true service level. Generated job execution alternatives have been tested through simulation and on real-world resources; in both cases, the runtime of a generated alternative is, on average, within 5% of the observed runtime.
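
The tradeoff presentation can be pictured as a simple dominance filter over candidate execution alternatives: an option is shown only if no other option is both cheaper and faster. The sketch below illustrates that idea with made-up numbers; it is not the paper's estimation methodology.

```python
# Illustrative only: filter candidate job execution alternatives down to
# the cost/runtime tradeoff front shown to the user. The candidates and
# their numbers are invented for the example.

def pareto_front(alternatives):
    """Keep alternatives for which no other option is at least as good on
    both cost and runtime and strictly better on one of them."""
    front = []
    for a in alternatives:
        dominated = any(
            b["cost"] <= a["cost"] and b["runtime"] <= a["runtime"]
            and (b["cost"] < a["cost"] or b["runtime"] < a["runtime"])
            for b in alternatives
        )
        if not dominated:
            front.append(a)
    return sorted(front, key=lambda alt: alt["cost"])

candidates = [
    {"resource": "local cluster, 8 cores", "cost": 0.0, "runtime": 180},
    {"resource": "cloud, 2 nodes",         "cost": 1.2, "runtime": 95},
    {"resource": "cloud, 4 nodes",         "cost": 2.4, "runtime": 60},
    # Dominated: more expensive and, due to overhead, no faster.
    {"resource": "cloud, 8 nodes",         "cost": 4.8, "runtime": 70},
]

for alt in pareto_front(candidates):
    print("%-24s  $%.2f  %d min" % (alt["resource"], alt["cost"], alt["runtime"]))
```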

9 citations


Proceedings ArticleDOI
11 Nov 2012
TL;DR: How bioinformatics training on Bio-Linux is helping to bridge the data production and analysis gap is discussed.
Abstract: Because of the ever-increasing application of next-generation sequencing (NGS) in research, and the expectation of faster experiment turnaround, it is becoming infeasible and unscalable for analysis to be done exclusively by existing trained bioinformaticians. Instead, researchers and bench biologists are performing at least parts of most analyses. In order for this to be realized, two conditions must be satisfied: (1) well-designed and accessible tools need to be made available, and (2) researchers and biologists need to be trained to use such tools in order to confidently handle high volumes of NGS data. Bio-Linux is a fully featured, powerful, configurable, and easy-to-maintain bioinformatics workstation that helps on both counts by offering well over one hundred bioinformatics tools packaged into a single distribution, easily accessible and readily usable. Bio-Linux is also available as virtual images or on the cloud, providing researchers with immediate access to the scalable compute infrastructure required to run their analyses. Furthermore, this paper discusses how bioinformatics training on Bio-Linux is helping to bridge the data production and analysis gap.

4 citations

