
Showing papers by "Enis Afgan" published in 2010


Journal Article
TL;DR: A cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon’s EC2 cloud infrastructure without any informatics requirements, and provides an automated method for building custom deployments of cloud resources.
Abstract: Background: Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is “cloud computing”, which, in principle, offers on-demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate “as is” use by experimental biologists.

189 citations


Journal Article
TL;DR: With Galaxy CloudMan, an individual researcher can, without any informatics support, gain access to a complete NGS data analysis solution in a matter of minutes and release it once the analysis has completed, thus eliminating the need for infrastructure maintenance.
Abstract: As experimental biologists become increasingly reliant on high-throughput data production, the scale and sophistication of computational infrastructure needed to support data storage and analysis has grown dramatically. In addition, the computational infrastructure needs to be coupled with the appropriate data analysis tools. Such an environment requires informatics support to set up, configure, and maintain the infrastructure. Moreover, once set up, the complete environment needs to be maintained during periods of inactivity or low usage. For experimentalists, such requirements represent a barrier to realizing the next step in science. Cloud computing has recently emerged as a model that is well suited to the periodic computational requirements of experimental biologists. However, cloud computing resources are not yet suitable for immediate use by experimentalists because they still need to be configured and managed. To help enable seamless next-generation sequencing (NGS) analyses on the cloud, we have developed Galaxy CloudMan. Galaxy CloudMan is a comprehensive manager for running and managing cloud computing resources. Cloud resources managed by Galaxy CloudMan are preconfigured with the tools necessary for NGS analyses. Access to and interaction with the preconfigured NGS tools are handled through Galaxy, an open-source, web-based system that provides an integrated analysis environment where domain scientists can, without informatics expertise, interactively construct multi-step analyses, with outputs from one step feeding seamlessly into the next. Separate from the Galaxy analysis interface, CloudMan offers a simple web-based interface that allows anyone to acquire a desired number of computational and storage resources on a cloud infrastructure and access the familiar Galaxy interface and associated tools.
CloudMan automatically handles all aspects of resource acquisition, configuration, and data persistence, thus entirely insulating a user from the low-level computational details. With Galaxy CloudMan, an individual researcher can, without any informatics support, gain access to a complete NGS data analysis solution in a matter of minutes and release it once the analysis has completed, thus eliminating the need for infrastructure maintenance.

24 citations


24 May 2010
TL;DR: This paper presents a tool that addresses the application-resource relationship issue by providing insight into past application performance across various resources; the tool is a standalone, dedicated application that stores all aspects of a job being executed for later retrieval, comparison, and analysis.
Abstract: The performance of any one application is more often than not closely tied to the hardware and software characteristics of the resource the application is executed on, as well as to the application parameters used during job instantiation. As a result, execution of applications and associated user jobs in heterogeneous grid environments exhibits heterogeneous performance. Users perceive this heterogeneity through inconsistent job execution times and cost. In order to understand, and possibly eliminate, such inconsistencies, grid schedulers and/or users must be aware of the existing application and resource relationships. This paper presents a tool that addresses this application-resource relationship issue by providing insight into past application performance across various resources. The tool, named Application Performance Database (AppDB), is a standalone, dedicated application that stores all aspects of a job being executed for later retrieval, comparison, and analysis. Access to AppDB is provided through a web-service-based API or a web interface.

3 citations


Journal Article
TL;DR: This paper analyzes performance characteristics of NCBI BLAST on several resources and captures the influence of resource characteristics and job parameters on BLAST job runtime across those resources; results show runtime savings of up to 50% and a resource utilization improvement of approximately 40%.
Abstract: Sequence analysis has become essential to the study of genomes and to biological research in general. Basic Local Alignment Search Tool (BLAST) leads the way as the most accepted method for performing the necessary query searches and analysis of discovered genes. To combat growing data sizes and speed up job runtimes, scientists are resorting to grid computing technologies. However, grid environments are characterized by a dynamic, heterogeneous, and transient state of available resources, which is a major hindrance to users trying to realize user-desired levels of service. This paper analyzes performance characteristics of NCBI BLAST on several resources and captures the influence of resource characteristics and job parameters on BLAST job runtime across those resources. The obtained results are summarized as a set of principles characterizing the performance of NCBI BLAST across homogeneous and heterogeneous environments. These principles are then applied and verified through the creation of a grid-enabled BLAST wrapper application called Dynamic BLAST. Results show runtime savings of up to 50% and a resource utilization improvement of approximately 40%.

2 citations