Author

Paul Dave

Other affiliations: Argonne National Laboratory
Bio: Paul Dave is an academic researcher from the University of Chicago. The author has contributed to research in topics: Workflow & Cloud computing. The author has an h-index of 3 and has co-authored 4 publications receiving 31 citations. Previous affiliations of Paul Dave include Argonne National Laboratory.

Papers
Proceedings ArticleDOI
22 Jul 2013
TL;DR: The Globus Genomics system allows biomedical researchers to perform rapid analysis of large NGS datasets using just a web browser in a fully automated manner, without software installation.
Abstract: We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system is notable for its high degree of end-to-end automation, which encompasses every stage of the data analysis pipeline: initial data access (from a remote sequencing center or database, via the Globus Online file transfer system); on-demand resource acquisition (on Amazon EC2, via the Globus Provision cloud manager); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); and efficient scheduling of these pipelines over many processors (via the Condor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets using just a web browser, in a fully automated manner and without software installation.
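The automation described above is exposed through Galaxy's REST API; the following is a minimal sketch of driving such a hosted pipeline with the bioblend Python client. The endpoint URL, API key, workflow name, and input file are placeholders, and bioblend itself is an assumption for illustration, since the paper describes browser-based use.

```python
# Hypothetical sketch of driving a hosted Galaxy endpoint with bioblend.
# URL, API key, workflow name, and input file are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")

history = gi.histories.create_history(name="NGS run 42")
upload = gi.tools.upload_file("sample_reads.fastq", history["id"])
dataset_id = upload["outputs"][0]["id"]

# Look up a previously defined multi-step pipeline by name and invoke it.
workflow = gi.workflows.get_workflows(name="exome-alignment-pipeline")[0]
invocation = gi.workflows.invoke_workflow(
    workflow["id"],
    inputs={"0": {"src": "hda", "id": dataset_id}},
    history_id=history["id"],
)
print("invocation:", invocation["id"])
```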

20 citations

Journal ArticleDOI
15 Dec 2014-PLOS ONE
TL;DR: The study demonstrates that the seamless integration of bioinformatics resources enables fast and efficient prioritization and characterization of genomic factors and molecular networks contributing to the phenotypes of interest.
Abstract: An essential step in the discovery of molecular mechanisms contributing to disease phenotypes, and in efficient experimental planning, is the development of weighted hypotheses that estimate the functional effects of sequence variants discovered by high-throughput genomics. With the increasing specialization of bioinformatics resources, creating analytical workflows that seamlessly integrate data and bioinformatics tools developed by multiple groups becomes inevitable. Here we present a case study of the use of a distributed analytical environment integrating four complementary specialized resources, namely the Lynx platform, VISTA RViewer, the Developmental Brain Disorders Database (DBDB), and the RaptorX server, for the identification of high-confidence candidate genes contributing to the pathogenesis of spina bifida. The analysis resulted in the prediction and validation of deleterious mutations in the SLC19A placental transporter in mothers of the affected children; these mutations cause narrowing of the outlet channel and therefore lead to a reduced folate permeation rate. The described approach also enabled correct identification of several genes previously shown to contribute to the pathogenesis of spina bifida, and the suggestion of additional genes for experimental validation. The study demonstrates that the seamless integration of bioinformatics resources enables fast and efficient prioritization and characterization of genomic factors and molecular networks contributing to the phenotypes of interest.
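The integration pattern here is evidence aggregation: each resource contributes an independent score for a candidate gene, and a combined score ranks them. A purely illustrative sketch of that shape follows; the weights, scores, and scoring interfaces are invented, not the study's actual Lynx/DBDB/RaptorX scoring.

```python
# Purely illustrative: invented weights and scores showing how independent
# evidence channels can be combined into one gene ranking.
from dataclasses import dataclass

@dataclass
class GeneEvidence:
    gene: str
    deleteriousness: float      # e.g. a normalized variant-effect score, 0..1
    phenotype_relevance: float  # e.g. a DBDB/Lynx-style association score, 0..1
    structural_impact: float    # e.g. a RaptorX-style model impact score, 0..1

def priority(e: GeneEvidence, weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted linear combination of the three evidence channels."""
    w_del, w_phe, w_str = weights
    return (w_del * e.deleteriousness
            + w_phe * e.phenotype_relevance
            + w_str * e.structural_impact)

# Hypothetical inputs; the real study derived these from the four resources.
candidates = [
    GeneEvidence("SLC19A1", 0.91, 0.85, 0.78),
    GeneEvidence("GENE_X", 0.40, 0.30, 0.55),
]
for e in sorted(candidates, key=priority, reverse=True):
    print(f"{e.gene}\t{priority(e):.2f}")
```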

7 citations

Journal ArticleDOI
01 Jan 2012-BMJ Open
TL;DR: A retrospective investigation into tumour response is performed for malignant pleural mesothelioma patients who were treated at the University of Chicago Medical Center with either of two analogous chemotherapy regimens and consented to at least one of two UCMC IRB protocols.
Abstract: Objective An area of need in cancer informatics is the ability to store images in a comprehensive database as part of translational cancer research. To meet this need, we have implemented a novel tandem database infrastructure that facilitates image storage and utilisation. Background We had previously implemented the Thoracic Oncology Program Database Project (TOPDP) database for our translational cancer research needs. While useful for many research endeavours, it is unable to store images, hence our need for an imaging database that could communicate easily with the TOPDP database. Methods The Thoracic Oncology Research Program (TORP) imaging database was designed using the Research Electronic Data Capture (REDCap) platform, which was developed by Vanderbilt University. To demonstrate proof of principle and evaluate utility, we performed a retrospective investigation into tumour response for malignant pleural mesothelioma (MPM) patients who were treated at the University of Chicago Medical Center with either of two analogous chemotherapy regimens and consented to at least one of two UCMC IRB protocols, 9571 and 13473A. Results A cohort of 22 MPM patients was identified using clinical data in the TOPDP database. After measurements were acquired, two representative CT images and 0–35 histological images per patient were successfully stored in the TORP database, along with clinical and demographic data. Discussion We implemented the TORP imaging database to be used in conjunction with our comprehensive TOPDP database. While using two databases requires additional effort, our database infrastructure facilitates more comprehensive translational research. Conclusions The investigation described herein demonstrates the successful implementation of this novel tandem imaging database infrastructure, as well as the potential utility of investigations enabled by it. The data model presented here can be utilised as the basis for further development of other larger, more streamlined databases in the future.
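REDCap exposes a simple HTTP API for record and file import, which is the mechanism a tandem design like this typically builds on. A minimal sketch using standard REDCap API conventions follows; the URL, token, record values, and field name are placeholders, and the TORP instrument layout is assumed.

```python
# Minimal sketch of the REDCap HTTP API pattern the TORP database builds on.
# The URL, token, record values, and field name are placeholders; the actual
# TORP instrument layout is not described in enough detail to reproduce.
import json
import requests

REDCAP_URL = "https://redcap.example.edu/api/"  # placeholder
TOKEN = "PROJECT_API_TOKEN"                     # placeholder

# 1) Import clinical/demographic data as a flat JSON record.
record = [{"record_id": "MPM-001", "regimen": "A", "ct_response": "partial"}]
resp = requests.post(REDCAP_URL, data={
    "token": TOKEN, "content": "record", "action": "import",
    "format": "json", "type": "flat", "data": json.dumps(record),
})
resp.raise_for_status()

# 2) Attach a representative CT image to a file-upload field on that record.
with open("ct_baseline.dcm", "rb") as fh:
    resp = requests.post(
        REDCAP_URL,
        data={"token": TOKEN, "content": "file", "action": "import",
              "record": "MPM-001", "field": "ct_image_1"},  # hypothetical field
        files={"file": fh},
    )
resp.raise_for_status()
```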

4 citations

Proceedings ArticleDOI
17 Nov 2013
TL;DR: This work presents the challenges associated with managing multiple Galaxy instances on the cloud for various research groups using Globus Genomics, a cloud-based platform-as-a-service (PaaS) that provides the Galaxy workflow system as a hosted service along with data management capabilities using Globus Online.
Abstract: Workflow systems play an important role in the analysis of the fast-growing genomics data produced by low-cost next-generation sequencing (NGS) technologies. Many biomedical research groups lack the expertise to assemble and run the sophisticated computational pipelines required for high-throughput analysis of such data. There is an urgent need for services that allow researchers to run their analytical workflows and define their own research methodologies by selecting the tools of their interest. We present the challenges associated with managing multiple Galaxy instances on the cloud for various research groups using Globus Genomics, a cloud-based platform-as-a-service (PaaS) that provides the Galaxy workflow system as a hosted service along with data management capabilities using Globus Online. We describe these challenges, our strategy for addressing them, and a tool for automatically deploying and managing hundreds of analytical tools across the multiple Galaxy instances hosted with Globus Genomics: tools from the public Galaxy Tool Shed, new tools wrapped by our group, and tools wrapped by end users.
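One way to automate tool deployment across many Galaxy instances is bioblend's Tool Shed client, which installs a repository revision through Galaxy's admin API. A sketch under that assumption follows; the instance URLs, admin keys, and changeset revision are placeholders, and the authors' own deployment tool is not public here.

```python
# Sketch of cross-instance tool deployment with bioblend's Tool Shed client.
# Instance URLs, admin keys, and the changeset revision are placeholders.
from bioblend.galaxy import GalaxyInstance

INSTANCES = [  # one (url, admin_key) pair per research group's hosted Galaxy
    ("https://groupA.globusgenomics.example.org", "ADMIN_KEY_A"),
    ("https://groupB.globusgenomics.example.org", "ADMIN_KEY_B"),
]
REPOS = [  # (name, owner, changeset_revision) in the public Galaxy Tool Shed
    ("bwa", "devteam", "0123456789ab"),  # hypothetical changeset id
]

for url, key in INSTANCES:
    gi = GalaxyInstance(url=url, key=key)
    for name, owner, changeset in REPOS:
        gi.toolshed.install_repository_revision(
            tool_shed_url="https://toolshed.g2.bx.psu.edu/",
            name=name,
            owner=owner,
            changeset_revision=changeset,
            install_tool_dependencies=True,
            new_tool_panel_section_label="NGS Tools",
        )
```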

1 citation


Cited by
Journal ArticleDOI
TL;DR: This research work will help researchers find the important characteristics of autonomic resource management, select the most suitable technique for autonomic resource management in a specific application, and identify significant future research directions.
Abstract: As computing infrastructure expands, resource management in a large, heterogeneous, and distributed environment becomes a challenging task. In a cloud environment, with uncertainty and dispersion of resources, one encounters problems of resource allocation caused by factors such as heterogeneity, dynamism, and failures. Unfortunately, existing resource management techniques, frameworks, and mechanisms are insufficient to handle these environments, applications, and resource behaviors. To provide efficient performance of workloads and applications, these characteristics must be addressed effectively. This research presents a broad systematic literature analysis of autonomic resource management in the cloud in general, and of QoS (Quality of Service)-aware autonomic resource management specifically. The current status of autonomic resource management in cloud computing is organized into various categories, and the techniques developed by various industry and academic groups are analyzed. Further, a taxonomy of autonomic resource management in the cloud is presented. This research work will help researchers find the important characteristics of autonomic resource management, select the most suitable technique for a specific application, and identify significant future research directions.
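Most of the autonomic techniques surveyed follow IBM's MAPE-K pattern (Monitor, Analyze, Plan, Execute over a shared Knowledge base). A skeletal sketch of that control loop for QoS-aware scaling follows; the `cloud` interface, thresholds, and the 200 ms latency SLO are hypothetical.

```python
# Skeletal MAPE-K control loop for QoS-aware autoscaling. The `cloud` object,
# thresholds, and the latency SLO are hypothetical.
import time

def monitor(cloud):
    """Monitor: collect current utilization and QoS metrics."""
    return {"cpu": cloud.avg_cpu(), "latency_ms": cloud.p95_latency()}

def analyze(metrics, slo_latency_ms=200):
    """Analyze: compare observations against the QoS target (knowledge)."""
    if metrics["latency_ms"] > slo_latency_ms or metrics["cpu"] > 0.85:
        return "scale_out"
    if metrics["latency_ms"] < slo_latency_ms / 2 and metrics["cpu"] < 0.30:
        return "scale_in"
    return "steady"

def plan(decision):
    """Plan: turn the analysis decision into a concrete resource delta."""
    return {"scale_out": +1, "scale_in": -1, "steady": 0}[decision]

def execute(cloud, delta):
    """Execute: apply the plan through the (hypothetical) cloud API."""
    if delta:
        cloud.resize(cloud.instance_count() + delta)

def autonomic_loop(cloud, period_s=60):
    while True:
        execute(cloud, plan(analyze(monitor(cloud))))
        time.sleep(period_s)
```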

177 citations

Proceedings ArticleDOI
16 Nov 2014
TL;DR: Skyport, an extension to the AWE/Shock data analysis platform that provides scalable workflow execution environments for scientific data in the cloud, greatly reduces the complexity associated with providing the environment necessary to execute complex workflows.
Abstract: Recently, Linux container technology has been gaining attention as it promises to transform the way software is developed and deployed. The portability and ease of deployment makes Linux containers an ideal technology to be used in scientific workflow platforms. Skyport utilizes Docker Linux containers to solve software deployment problems and resource utilization inefficiencies inherent to all existing scientific workflow platforms. As an extension to AWE/Shock, our data analysis platform that provides scalable workflow execution environments for scientific data in the cloud, Skyport greatly reduces the complexity associated with providing the environment necessary to execute complex workflows.
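The core mechanism is one container per workflow step, so each tool's software stack ships with the task instead of being pre-installed on workers. A minimal sketch of that pattern with the docker-py client follows; the image tag, command, and paths are placeholders, and Skyport itself drives containers through AWE/Shock rather than this client.

```python
# Minimal sketch of the one-container-per-step pattern with docker-py.
# Image tag, command, and paths are placeholders.
import docker

client = docker.from_env()

def run_step(image, command, workdir):
    """Execute one pipeline step in an isolated container, mounting a shared
    data directory so inputs and outputs persist on the host."""
    return client.containers.run(
        image,
        command,
        volumes={workdir: {"bind": "/data", "mode": "rw"}},
        working_dir="/data",
        remove=True,  # discard the container; only /data outputs remain
    )

# e.g. index a reference with a containerized aligner (hypothetical image tag):
logs = run_step("biocontainers/bwa:v0.7.17", "bwa index ref.fa", "/scratch/job42")
```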

97 citations

Journal ArticleDOI
TL;DR: A framework that can prioritize disease genes by quantitatively unifying a new deleteriousness measure called BayesDel, an improved assessment of the biological relevance of genes to the disease, a modified linkage analysis, a novel rare‐variant association test, and a converted variant call quality score is described.
Abstract: To interpret genetic variants discovered from next-generation sequencing, integration of heterogeneous information is vital for success. This article describes a framework named PERCH (Polymorphism Evaluation, Ranking, and Classification for a Heritable trait), available at http://BJFengLab.org/. It can prioritize disease genes by quantitatively unifying a new deleteriousness measure called BayesDel, an improved assessment of the biological relevance of genes to the disease, a modified linkage analysis, a novel rare-variant association test, and a converted variant call quality score. It supports data that contain various combinations of extended pedigrees, trios, and case-controls, and allows for a reduced penetrance, an elevated phenocopy rate, liability classes, and covariates. BayesDel is more accurate than PolyPhen2, SIFT, FATHMM, LRT, Mutation Taster, Mutation Assessor, PhyloP, GERP++, SiPhy, CADD, MetaLR, and MetaSVM. The overall approach is faster and more powerful than the existing quantitative method pVAAST, as shown by the simulations of challenging situations in finding the missing heritability of a complex disease. This framework can also classify variants of unknown significance (variants of uncertain significance) by quantitatively integrating allele frequencies, deleteriousness, association, and co-segregation. PERCH is a versatile tool for gene prioritization in gene discovery research and variant classification in clinical genetic testing.
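The unifying idea is combining independent evidence channels on a common quantitative scale. A purely illustrative sketch of that shape, summing per-channel log Bayes factors, follows; all numbers are invented, and PERCH's actual statistics are defined in the paper.

```python
# Purely illustrative numbers: combining independent evidence channels on a
# log scale, the general shape of quantitative unification in PERCH-like tools.
def combined_log_bf(evidence):
    """Sum per-channel log10 Bayes factors (channels assumed independent)."""
    return sum(evidence.values())

variant_evidence = {           # hypothetical per-channel log10 Bayes factors
    "deleteriousness": 1.2,    # e.g. from a BayesDel-like score
    "association": 0.8,        # rare-variant association test
    "linkage": 0.5,            # modified linkage analysis
    "cosegregation": 0.9,      # family co-segregation
    "call_quality": 0.1,       # converted variant call quality
}
score = combined_log_bf(variant_evidence)
print(f"log10 BF = {score:.1f}, posterior odds = {10 ** score:.0f}:1 (prior odds 1:1)")
```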

93 citations

Journal ArticleDOI
TL;DR: The Globus Genomics system allows biomedical researchers to perform rapid analysis of large next‐generation sequencing datasets in a fully automated manner, without software installation or a need for any local computing infrastructure.
Abstract: We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage via the Globus file transfer system; specification, configuration, and reuse of multistep processing pipelines via the Galaxy workflow system; creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner on Amazon EC2; and efficient scheduling of these pipelines over many processors via the HTCondor scheduler. The system allows biomedical researchers to perform rapid analysis of large next-generation sequencing datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.
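The ingest stage described here maps naturally onto Globus's transfer API. A sketch using the modern globus-sdk Python client follows; the SDK postdates the paper, and the access token and endpoint UUIDs are placeholders.

```python
# Sketch of the ingest step with the globus-sdk Python client, which postdates
# the paper. The access token and endpoint UUIDs are placeholders.
import globus_sdk

TOKEN = "..."                         # Globus transfer access token (placeholder)
SEQ_CENTER_EP = "aaaaaaaa-0000-...."  # source endpoint UUID (placeholder)
ANALYSIS_EP = "bbbbbbbb-0000-...."    # destination endpoint UUID (placeholder)

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TOKEN))

tdata = globus_sdk.TransferData(tc, SEQ_CENTER_EP, ANALYSIS_EP,
                                label="NGS run ingest",
                                sync_level="checksum")  # skip already-copied files
tdata.add_item("/runs/flowcell_A/", "/galaxy/inputs/flowcell_A/", recursive=True)

task = tc.submit_transfer(tdata)
print("transfer task:", task["task_id"])
```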

65 citations

Journal ArticleDOI
TL;DR: The Globus Galaxies platform, a domain-independent, cloud-based science gateway platform, overcomes the gap between the specialized needs of science applications and the capabilities of cloud infrastructures by providing a set of hosted services that directly address the needs of science gateway developers.
Abstract: The use of public cloud computers to host sophisticated scientific data and software is transforming scientific practice by enabling broad access to capabilities previously available only to the few. The primary obstacle to more widespread use of public clouds to host scientific software (‘cloud-based science gateways’) has thus far been the considerable gap between the specialized needs of science applications and the capabilities provided by cloud infrastructures. We describe here a domain-independent, cloud-based science gateway platform, the Globus Galaxies platform, which overcomes this gap by providing a set of hosted services that directly address the needs of science gateway developers. The design and implementation of this platform leverages our several years of experience with Globus Genomics, a cloud-based science gateway that has served more than 200 genomics researchers across 30 institutions. Building on that foundation, we have implemented a platform that leverages the popular Galaxy system for application hosting and workflow execution; Globus services for data transfer, user and group management, and authentication; and a cost-aware elastic provisioning model specialized for public cloud resources. We describe here the capabilities and architecture of this platform, present six scientific domains in which we have successfully applied it, report on user experiences, and analyze the economics of our deployments.
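A cost-aware elastic provisioner of the kind described must weigh current prices against job requirements. An illustrative sketch with boto3 follows, picking the cheapest EC2 spot instance type that satisfies a core count; the candidate types, core table, and min-price policy are assumptions, not the platform's actual logic.

```python
# Illustrative cost-aware choice of an EC2 spot instance type with boto3.
# The candidate table, core counts, and min-price policy are assumptions.
from datetime import datetime, timezone
import boto3

CANDIDATES = {"c4.2xlarge": 8, "c4.4xlarge": 16, "r3.4xlarge": 16}  # type -> vCPUs

def cheapest_instance(min_cores, region="us-east-1"):
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_spot_price_history(
        InstanceTypes=list(CANDIDATES),
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc),
    )
    # Keep only the newest quote per instance type.
    latest = {}
    for q in sorted(resp["SpotPriceHistory"],
                    key=lambda q: q["Timestamp"], reverse=True):
        latest.setdefault(q["InstanceType"], float(q["SpotPrice"]))
    eligible = {t: p for t, p in latest.items() if CANDIDATES[t] >= min_cores}
    return min(eligible.items(), key=lambda tp: tp[1])  # (type, $/hour)

print(cheapest_instance(min_cores=16))
```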

45 citations