Author

Dinanath Sulakhe

Bio: Dinanath Sulakhe is an academic researcher from the University of Chicago. The author has contributed to research in the topics of Medicine and Workflow. The author has an h-index of 12 and has co-authored 31 publications receiving 625 citations. Previous affiliations of Dinanath Sulakhe include the Toyota Technological Institute at Chicago and Argonne National Laboratory.

Papers
Proceedings ArticleDOI
04 Jun 2004
TL;DR: The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory that has sustained for several months the production-level services required by physics experiments of the Large Hadron Collider at CERN, the Sloan Digital Sky Survey project, the gravitational wave search experiment LIGO, the BTeV experiment at Fermilab, as well as applications in molecular structure analysis and genome analysis, and computer science research projects in such areas as job and data scheduling.
Abstract: The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory ("Grid3") that has sustained for several months the production-level services required by physics experiments of the Large Hadron Collider at CERN (ATLAS and CMS), the Sloan Digital Sky Survey project, the gravitational wave search experiment LIGO, the BTeV experiment at Fermilab, as well as applications in molecular structure analysis and genome analysis, and computer science research projects in such areas as job and data scheduling. The deployed infrastructure has been operating since November 2003 with 27 sites, a peak of 2800 processors, workloads from 10 different applications exceeding 1300 simultaneous jobs, and data transfers among sites of greater than 2 TB/day. We describe the principles that have guided the development of this unique infrastructure and the practical experiences that have resulted from its creation and use. We discuss application requirements for grid services deployment and configuration, monitoring infrastructure, application performance, metrics, and operational experiences. We also summarize lessons learned.

138 citations

Journal ArticleDOI
TL;DR: The PUMA2 system is an interactive, integrated bioinformatics environment for high-throughput genetic sequence analysis and metabolic reconstructions from sequence data that provides a framework for comparative and evolutionary analysis of genomic data and metabolic networks in the context of taxonomic and phenotypic information.
Abstract: The PUMA2 system (available at http://compbio.mcs.anl.gov/puma2) is an interactive, integrated bioinformatics environment for high-throughput genetic sequence analysis and metabolic reconstructions from sequence data. PUMA2 provides a framework for comparative and evolutionary analysis of genomic data and metabolic networks in the context of taxonomic and phenotypic information. Grid infrastructure is used to perform computationally intensive tasks. PUMA2 currently contains precomputed analysis of 213 prokaryotic, 22 eukaryotic, 650 mitochondrial and 1493 viral genomes and automated metabolic reconstructions for >200 organisms. Genomic data is annotated with information integrated from >20 sequence, structural and metabolic databases and ontologies. PUMA2 supports both automated and interactive expert-driven annotation of genomes, using a variety of publicly available bioinformatics tools. It also contains a suite of unique PUMA2 tools for automated assignment of gene function, evolutionary analysis of protein families and comparative analysis of metabolic pathways. PUMA2 allows users to submit batch sequence data for automated functional analysis and construction of metabolic models. The results of these analyses are made available to the users in the PUMA2 environment for further interactive sequence analysis and annotation.

72 citations

Journal ArticleDOI
TL;DR: The Globus Genomics system allows biomedical researchers to perform rapid analysis of large next‐generation sequencing datasets in a fully automated manner, without software installation or a need for any local computing infrastructure.
Abstract: We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage via the Globus file transfer system; specification, configuration, and reuse of multistep processing pipelines via the Galaxy workflow system; creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner on Amazon EC2; and efficient scheduling of these pipelines over many processors via the HTCondor scheduler. The system allows biomedical researchers to perform rapid analysis of large next-generation sequencing datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.

65 citations
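The data-retrieval stage described above relies on the Globus file transfer service. As a rough illustration of what driving such a transfer programmatically looks like today, here is a minimal sketch using the globus-sdk Python package, which postdates this paper; the token and endpoint UUIDs are placeholders, not values from the Globus Genomics deployment.

```python
import globus_sdk

# Placeholders: a real script would obtain an OAuth2 transfer token via a
# Globus Auth login flow and use the UUIDs of actual Globus endpoints.
TRANSFER_TOKEN = "REPLACE_WITH_TOKEN"          # hypothetical
SRC_ENDPOINT = "REPLACE_WITH_SOURCE_UUID"      # hypothetical sequencing center
DST_ENDPOINT = "REPLACE_WITH_DEST_UUID"        # hypothetical analysis cluster

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
)

# Describe an asynchronous, managed transfer task.
task = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT,
                               label="fastq ingest")
task.add_item("/reads/sample1.fastq.gz", "/galaxy/inputs/sample1.fastq.gz")

result = tc.submit_transfer(task)
print("submitted transfer, task id:", result["task_id"])
```

The design point worth noting is that the transfer is a managed, asynchronous task: a pipeline submits it and polls for completion rather than holding a connection open for the duration of a multi-gigabyte copy.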

Journal ArticleDOI
TL;DR: The caGrid Workflow Toolkit is designed and implemented to ease building and running caGrid workflows and provides users with support for various phases in using workflows: service discovery, composition and orchestration, data access, and secure service invocation.
Abstract: Background: In the biological and medical domains, the use of web services has made data and computational functionality accessible in a unified manner, helping to automate data pipelines that were previously assembled manually. Workflow technology is widely used in the orchestration of multiple services to facilitate in-silico research. The Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer-research-related resources, and caGrid is its underlying service-based computation infrastructure. caBIG requires that services be composed and orchestrated in a given sequence to realize data pipelines, often called scientific workflows.

53 citations
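To make the notion of "services composed and orchestrated in a given sequence" concrete, here is a toy sketch of sequential orchestration in Python. The two step functions are hypothetical stand-ins; real caGrid workflows invoke secured, remote web services rather than local functions.

```python
from typing import Callable, List

# Hypothetical stand-ins for two remote analytical services.
def fetch_gene_annotations(gene_id: str) -> dict:
    return {"gene": gene_id, "terms": ["GO:0006915", "GO:0008283"]}

def extract_go_terms(annotations: dict) -> List[str]:
    return [t for t in annotations["terms"] if t.startswith("GO:")]

def run_pipeline(steps: List[Callable], seed):
    """Orchestrate steps in sequence, piping each output into the next input."""
    data = seed
    for step in steps:
        data = step(data)
    return data

print(run_pipeline([fetch_gene_annotations, extract_go_terms], "TP53"))
# -> ['GO:0006915', 'GO:0008283']
```

A workflow engine such as the caGrid Workflow Toolkit adds what this sketch omits: service discovery, data staging, fault handling, and secure invocation of each step.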

Journal ArticleDOI
TL;DR: The Globus Galaxies platform is a domain-independent, cloud-based science gateway platform that closes the gap between the specialized needs of science applications and the capabilities of cloud infrastructures by providing a set of hosted services that directly address the needs of science gateway developers.
Abstract: The use of public cloud computers to host sophisticated scientific data and software is transforming scientific practice by enabling broad access to capabilities previously available only to the few. The primary obstacle to more widespread use of public clouds to host scientific software (‘cloud-based science gateways’) has thus far been the considerable gap between the specialized needs of science applications and the capabilities provided by cloud infrastructures. We describe here a domain-independent, cloud-based science gateway platform, the Globus Galaxies platform, which overcomes this gap by providing a set of hosted services that directly address the needs of science gateway developers. The design and implementation of this platform leverages our several years of experience with Globus Genomics, a cloud-based science gateway that has served more than 200 genomics researchers across 30 institutions. Building on that foundation, we have implemented a platform that leverages the popular Galaxy system for application hosting and workflow execution; Globus services for data transfer, user and group management, and authentication; and a cost-aware elastic provisioning model specialized for public cloud resources. We describe here the capabilities and architecture of this platform, present six scientific domains in which we have successfully applied it, report on user experiences, and analyze the economics of our deployments.

45 citations
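The "cost-aware elastic provisioning model" can be illustrated with a small, self-contained sketch: given a job's resource request, pick the cheapest instance type that satisfies it. The catalog below is hypothetical and the prices are illustrative, not the platform's actual data or algorithm.

```python
# Hypothetical catalog of (instance_type, vCPUs, memory_GiB, hourly_usd).
CATALOG = [
    ("c4.2xlarge", 8, 15.0, 0.398),
    ("r4.xlarge", 4, 30.5, 0.266),
    ("m4.4xlarge", 16, 64.0, 0.800),
]

def cheapest_fit(cpus_needed: int, mem_needed_gib: float):
    """Return the lowest-cost instance type satisfying a job's resource request."""
    candidates = [row for row in CATALOG
                  if row[1] >= cpus_needed and row[2] >= mem_needed_gib]
    if not candidates:
        raise ValueError("no instance type satisfies the request")
    return min(candidates, key=lambda row: row[3])

print(cheapest_fit(4, 16.0))  # -> ('r4.xlarge', 4, 30.5, 0.266)
```

A production provisioner layers on what this omits: spot-price volatility, bidding strategy, and scaling decisions driven by queue depth rather than a single job.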


Cited by
Book ChapterDOI
30 Nov 2005
TL;DR: The principal characteristics of the latest release, the Web services-based GT4, which provides significant improvements over previous releases in terms of robustness, performance, usability, documentation, standards compliance, and functionality are summarized.
Abstract: The Globus Toolkit (GT) has been developed since the late 1990s to support the development of service-oriented distributed computing applications and infrastructures. Core GT components address, within a common framework, basic issues relating to security, resource access, resource management, data movement, resource discovery, and so forth. These components enable a broader “Globus ecosystem” of tools and components that build on, or interoperate with, core GT functionality to provide a wide range of useful application-level functions. These tools have in turn been used to develop a wide range of both “Grid” infrastructures and distributed applications. I summarize here the principal characteristics of the latest release, the Web services-based GT4, which provides significant improvements over previous releases in terms of robustness, performance, usability, documentation, standards compliance, and functionality.

1,509 citations

Journal Article
TL;DR: Why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease are detailed.
Abstract: Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease.

1,323 citations
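Two of the "global properties" such reviews examine, degree distribution and connectivity, are easy to compute once a network is in hand. A minimal sketch using the networkx Python library follows; the random graph is a stand-in for a real interactome, whose degree distribution would be heavy-tailed rather than Poisson.

```python
import networkx as nx

# Stand-in network: 1000 nodes, edge probability chosen for mean degree ~4.
G = nx.erdos_renyi_graph(n=1000, p=0.004, seed=42)

# Two global properties commonly examined in network biology.
avg_degree = sum(d for _, d in G.degree()) / G.number_of_nodes()
components = nx.number_connected_components(G)
print(f"mean degree: {avg_degree:.2f}, connected components: {components}")
```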

Proceedings Article
01 Jan 2003

1,212 citations

Journal ArticleDOI
TL;DR: An update to the Taverna tool suite is provided, highlighting new features and developments in the workbench and the Taverna Server.
Abstract: The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.

724 citations
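The reusability claim rests on workflows being data rather than code: a saved workflow names its steps, and an engine executes them. The toy sketch below conveys that idea only; it does not reflect Taverna's actual workflow language or engine.

```python
import json

# A registry mapping step names to tools (here, trivial string operations).
REGISTRY = {
    "uppercase": str.upper,
    "tokenize": str.split,
}

# The workflow itself is a shareable document, not executable code.
workflow_doc = json.dumps({"steps": ["uppercase", "tokenize"]})

def execute(doc: str, data):
    """Look up each named step in the registry and apply it in order."""
    for name in json.loads(doc)["steps"]:
        data = REGISTRY[name](data)
    return data

print(execute(workflow_doc, "taverna workflows are reusable"))
# -> ['TAVERNA', 'WORKFLOWS', 'ARE', 'REUSABLE']
```

Because the workflow document is plain JSON, it can be shared through a repository such as myExperiment and re-run by anyone holding the same tool registry, which is the essence of a reusable, executable protocol.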