Author

Alex Rodriguez

Other affiliations: Argonne National Laboratory
Bio: Alex Rodriguez is an academic researcher from the University of Chicago. The author has contributed to research in the topics of Workflow and Cloud computing, has an h-index of 12, and has co-authored 22 publications receiving 597 citations. Previous affiliations of Alex Rodriguez include Argonne National Laboratory.

Papers
Journal ArticleDOI
TL;DR: This work presents FIGfams, a new collection of over 100 000 protein families that are the product of manual curation and close strain comparison; associated with each FIGfam is a two-tiered, rapid, accurate decision procedure that determines family membership for new proteins.
Abstract: We present FIGfams, a new collection of over 100 000 protein families that are the product of manual curation and close strain comparison. The manual curation is carried out using the Subsystem approach, ensuring a previously unattained degree of throughput and consistency. FIGfams are based on over 950 000 manually annotated proteins from many hundreds of Bacteria and Archaea. Associated with each FIGfam is a two-tiered, rapid, accurate decision procedure to determine family membership for new proteins. FIGfams are freely available under an open source license and can be downloaded at ftp://ftp.theseed.org/FIGfams/. The web site for FIGfams is http://www.theseed.org/wiki/FIGfams/.

148 citations
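The two-tiered decision procedure is described above only at a high level. The sketch below shows one plausible shape for such a test: a cheap k-mer prefilter (tier 1) that screens out obvious non-members, followed by a costlier similarity check (tier 2). Every function name, threshold, and sequence here is an illustrative assumption, not the published FIGfams procedure.

```python
# Hypothetical two-tier family-membership test (NOT the FIGfams algorithm).
# Tier 1 is a fast k-mer prefilter; tier 2 is a more careful similarity check.

def kmers(seq: str, k: int = 8) -> set[str]:
    """Set of length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def tier1_prefilter(query: str, family_kmers: set[str], min_hits: int = 5) -> bool:
    """Tier 1: pass to the expensive check only if enough k-mers match."""
    return len(kmers(query) & family_kmers) >= min_hits

def tier2_identity(query: str, representative: str) -> float:
    """Tier 2: crude positional identity (a stand-in for a real aligner)."""
    n = min(len(query), len(representative))
    return sum(a == b for a, b in zip(query, representative)) / n if n else 0.0

def is_family_member(query: str, family_kmers: set[str], representative: str,
                     min_identity: float = 0.7) -> bool:
    return (tier1_prefilter(query, family_kmers)
            and tier2_identity(query, representative) >= min_identity)

rep = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"               # invented sequence
print(is_family_member(rep[:-1] + "A", kmers(rep), rep))  # True: near-identical
```

The point of the tiering is that the k-mer set test is cheap enough to run against every family, so the expensive comparison only runs for a handful of candidates.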

Proceedings ArticleDOI
04 Jun 2004
TL;DR: The Grid2003 Project has deployed a multi-virtual-organization, application-driven grid laboratory that has sustained for several months the production-level services required by physics experiments of the Large Hadron Collider at CERN, the Sloan Digital Sky Survey project, the gravitational-wave search experiment LIGO, and the BTeV experiment at Fermilab, as well as by applications in molecular structure analysis and genome analysis and by computer science research projects in areas such as job and data scheduling.
Abstract: The Grid2003 Project has deployed a multi-virtual-organization, application-driven grid laboratory ("Grid3") that has sustained for several months the production-level services required by physics experiments of the Large Hadron Collider at CERN (ATLAS and CMS), the Sloan Digital Sky Survey project, the gravitational-wave search experiment LIGO, and the BTeV experiment at Fermilab, as well as applications in molecular structure analysis and genome analysis, and computer science research projects in such areas as job and data scheduling. The deployed infrastructure has been operating since November 2003 with 27 sites, a peak of 2800 processors, workloads from 10 different applications exceeding 1300 simultaneous jobs, and data transfers among sites of greater than 2 TB/day. We describe the principles that have guided the development of this unique infrastructure and the practical experiences that have resulted from its creation and use. We discuss application requirements for grid services deployment and configuration, monitoring infrastructure, application performance, metrics, and operational experiences. We also summarize lessons learned.

138 citations
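Grid3-era job submission typically flowed through Condor-G to Globus GRAM gatekeepers. As a rough illustration of that pattern only, here is a grid-universe submission via the HTCondor Python bindings; the gatekeeper host is a placeholder, the "gt2 ..." resource string reflects the GRAM syntax of that era, the exact binding API varies across HTCondor versions, and none of this is taken from the Grid2003 codebase.

```python
# Sketch of a Condor-G-style grid job submission using the HTCondor Python
# bindings. Host, script, and counts are hypothetical.
import htcondor

job = htcondor.Submit({
    "universe": "grid",
    "grid_resource": "gt2 gatekeeper.example.edu/jobmanager-condor",  # placeholder
    "executable": "simulate.sh",          # hypothetical application script
    "arguments": "--events 1000",
    "output": "sim.$(Cluster).$(Process).out",
    "error":  "sim.$(Cluster).$(Process).err",
    "log":    "sim.log",
})

schedd = htcondor.Schedd()              # local submit point
result = schedd.submit(job, count=100)  # fan out 100 identical jobs
print("submitted cluster", result.cluster())
```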

Journal ArticleDOI
TL;DR: The Globus Genomics system allows biomedical researchers to perform rapid analysis of large next‐generation sequencing datasets in a fully automated manner, without software installation or a need for any local computing infrastructure.
Abstract: We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage via the Globus file transfer system; specification, configuration, and reuse of multistep processing pipelines via the Galaxy workflow system; creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner on Amazon EC2; and efficient scheduling of these pipelines over many processors via the HTCondor scheduler. The system allows biomedical researchers to perform rapid analysis of large next-generation sequencing datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.

65 citations
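The end-to-end automation described above chains several real services. A minimal sketch of the two outer stages, assuming placeholder endpoint IDs, tokens, and workflow/dataset IDs: a Globus transfer of raw reads via the globus-sdk, then invocation of a pre-built Galaxy workflow via bioblend. This is illustrative glue code, not the Globus Genomics implementation itself.

```python
# Stage 1: pull raw reads from a remote endpoint; stage 2: run a Galaxy
# workflow over them. All IDs, URLs, and credentials are placeholders.
import globus_sdk
from bioblend.galaxy import GalaxyInstance

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("TRANSFER_TOKEN"))
tdata = globus_sdk.TransferData(tc, "SRC-ENDPOINT-UUID", "DST-ENDPOINT-UUID")
tdata.add_item("/reads/sample1.fastq", "/galaxy/inputs/sample1.fastq")
task_id = tc.submit_transfer(tdata)["task_id"]
tc.task_wait(task_id, timeout=3600)   # block until the transfer completes

gi = GalaxyInstance(url="https://galaxy.example.org", key="GALAXY_API_KEY")
history = gi.histories.create_history(name="sample1-analysis")
gi.workflows.invoke_workflow(
    "WORKFLOW_ID",                                     # placeholder workflow
    inputs={"0": {"src": "hda", "id": "DATASET_ID"}},  # placeholder dataset
    history_id=history["id"],
)
```

In the system described above, the provisioning and HTCondor scheduling layers sit between these two stages, acquiring EC2 capacity on demand before the workflow's tools execute.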

Journal ArticleDOI
TL;DR: This work presents a domain-independent, cloud-based science gateway platform, the Globus Galaxies platform, which bridges the gap between the specialized needs of science applications and the capabilities of cloud infrastructures by providing a set of hosted services aimed directly at science gateway developers.
Abstract: The use of public cloud computers to host sophisticated scientific data and software is transforming scientific practice by enabling broad access to capabilities previously available only to the few. The primary obstacle to more widespread use of public clouds to host scientific software ("cloud-based science gateways") has thus far been the considerable gap between the specialized needs of science applications and the capabilities provided by cloud infrastructures. We describe here a domain-independent, cloud-based science gateway platform, the Globus Galaxies platform, which overcomes this gap by providing a set of hosted services that directly address the needs of science gateway developers. The design and implementation of this platform leverages our several years of experience with Globus Genomics, a cloud-based science gateway that has served more than 200 genomics researchers across 30 institutions. Building on that foundation, we have implemented a platform that leverages the popular Galaxy system for application hosting and workflow execution; Globus services for data transfer, user and group management, and authentication; and a cost-aware elastic provisioning model specialized for public cloud resources. We describe here the capabilities and architecture of this platform, present six scientific domains in which we have successfully applied it, report on user experiences, and analyze the economics of our deployments.

45 citations
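The "cost-aware elastic provisioning model" is described above only at the architecture level. The toy sketch below captures the basic decision such a provisioner must make, scaling workers to queue depth while capping hourly spend; all rates, thresholds, and the pool abstraction are invented for illustration.

```python
# Toy cost-aware scaling decision; numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class PoolState:
    queued_jobs: int
    running_workers: int

HOURLY_RATE = 0.10      # assumed per-worker-hour price (e.g., a spot bid)
BUDGET_PER_HOUR = 2.00  # assumed spending cap
JOBS_PER_WORKER = 4     # assumed healthy queue depth per worker

def provisioning_decision(state: PoolState) -> int:
    """Worker delta to apply (positive = launch, negative = terminate),
    clamped so the pool never costs more than the hourly budget."""
    wanted = -(-state.queued_jobs // JOBS_PER_WORKER)   # ceiling division
    affordable = int(BUDGET_PER_HOUR / HOURLY_RATE)
    target = min(wanted, affordable)
    return target - state.running_workers

# 30 queued jobs want ceil(30/4) = 8 workers; 8 is under the 20 we can
# afford, and 3 are already running, so launch 5 more.
print(provisioning_decision(PoolState(queued_jobs=30, running_workers=3)))  # 5
```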

Journal ArticleDOI
TL;DR: An analytical workflow is developed to identify L1 polymorphic insertions with next-generation sequencing (NGS) using data from a family in which schizophrenia (SZ) segregates, showing the utility of NGS to uncover a neglected type of genetic variant with the potential to influence the risk of schizophrenia, as SNVs and CNVs do.
Abstract: Recent studies show that human-specific LINE1s (L1HS) play a key role in the development of the central nervous system (CNS) and its disorders, and that their transpositions within the human genome are more common than previously thought. Many polymorphic L1HS, that is, those present or absent across individuals, are not annotated in the current release of the genome and are customarily termed "non-reference L1s." We developed an analytical workflow to identify L1 polymorphic insertions with next-generation sequencing (NGS), using data from a family in which schizophrenia (SZ) segregates. Our workflow exploits two independent algorithms to detect non-reference L1 insertions, performs local de novo alignment of the regions harboring predicted L1 insertions, and resolves the L1 subfamily designation from the de novo assembled sequence. We found 110 non-reference L1 polymorphic loci exhibiting Mendelian inheritance, the vast majority of which are already reported in dbRIP and/or euL1db, thus confirming their status as non-reference L1 polymorphic insertions. Four previously undetected L1 polymorphic loci were confirmed by PCR amplification and direct sequencing of the insert. A large fraction of our non-reference L1s is located within the open reading frames of protein-coding genes that belong to pathways already implicated in the pathogenesis of schizophrenia. The finding of these polymorphic variants among SZ offspring is intriguing and suggestive of a putative pathogenic role. Our data show the utility of NGS to uncover L1 polymorphic insertions, a neglected type of genetic variant with the potential to influence the risk of developing schizophrenia, as SNVs and CNVs do.

33 citations
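One concrete step in the workflow above is checking that candidate insertions segregate in a Mendelian fashion. A minimal sketch of such a filter on presence/absence calls, assuming the simplest rule (a carrier child needs at least one carrier parent) and ignoring de novo insertions and genotyping error, both of which a real pipeline must handle; sample names and the call format are hypothetical.

```python
# Keep a candidate non-reference L1 locus only if its presence/absence
# pattern is consistent with Mendelian transmission in every trio.

def mendelian_consistent(calls: dict[str, bool],
                         trios: list[tuple[str, str, str]]) -> bool:
    """calls maps sample -> insertion detected; each trio is (child, mother, father)."""
    for child, mother, father in trios:
        if calls[child] and not (calls[mother] or calls[father]):
            return False   # carrier child with two non-carrier parents
    return True

# Hypothetical calls at one candidate L1 locus.
locus_calls = {"mother": True, "father": False, "child1": True, "child2": False}
trios = [("child1", "mother", "father"), ("child2", "mother", "father")]
print(mendelian_consistent(locus_calls, trios))  # True: child1's copy traces to mother
```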


Cited by
Journal ArticleDOI
TL;DR: This work describes the interconnectedness of the SEED database and RAST, the RAST annotation pipeline, and updates to both resources.
Abstract: In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.

3,415 citations
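The annotation cycle described above (call genes, annotate by comparison to the FIGfam collection, feed public genomes back into the families) can be summarized schematically. Everything below, the toy gene caller and the k-mer classifier included, is a stand-in for the real RAST components, not their actual behavior.

```python
# Schematic of the annotation cycle with toy stand-ins for real components.
# Sequences, family contents, and thresholds are invented.

def call_genes(genome: str) -> list[str]:
    """Toy gene caller: '*'-separated segments stand in for called proteins."""
    return [orf for orf in genome.split("*") if len(orf) >= 8]

def best_figfam(protein: str, figfams: dict[str, set[str]], k: int = 8) -> str | None:
    """Toy classifier: pick the family sharing the most k-mers with the protein."""
    query = {protein[i:i + k] for i in range(len(protein) - k + 1)}
    hits = {fam: len(query & kms) for fam, kms in figfams.items()}
    best = max(hits, key=hits.get, default=None)
    return best if best and hits[best] > 0 else None

def annotate(genome: str, figfams: dict[str, set[str]], is_public: bool) -> dict[str, str]:
    annotations = {}
    for protein in call_genes(genome):
        family = best_figfam(protein, figfams)
        annotations[protein] = family or "hypothetical protein"
        if is_public and family:   # public genomes grow the family collection
            figfams[family] |= {protein[i:i + 8] for i in range(len(protein) - 7)}
    return annotations

figfams = {"FIG00001": {"MKTAYIAK", "KTAYIAKQ", "TAYIAKQR"}}
print(annotate("MKTAYIAKQR*SHORT*MKTAYIAKZZ", figfams, is_public=True))
```

The feedback in the last step of `annotate` is the schematic analogue of the cycle the abstract describes: each public genome enlarges the families, which in turn improves annotation of the next submission.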

Journal ArticleDOI
TL;DR: The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes that is stable, extensible, and freely available to all researchers.
Abstract: Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers. A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats. The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis – the availability of high-performance computing for annotating the data. http://metagenomics.nmpdr.org

3,322 citations
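The pipeline above assigns functions by comparing reads against both protein and nucleotide databases. A toy sketch of that dual-database, best-hit step, with databases, sequences, and scores invented purely for illustration; the real service uses full similarity searches, not dictionary lookups.

```python
# Toy functional assignment: compare each read against two databases
# (stubbed as dicts) and keep the stronger hit, then summarize.
from collections import Counter

PROTEIN_DB = {"ATGCCG": ("DNA polymerase", 1e-30)}   # seq -> (function, e-value)
NUCLEOTIDE_DB = {"ATGCCG": ("16S rRNA", 1e-10)}

def assign_function(read: str) -> str | None:
    hits = [db[read] for db in (PROTEIN_DB, NUCLEOTIDE_DB) if read in db]
    if not hits:
        return None
    return min(hits, key=lambda h: h[1])[0]   # lower e-value = stronger hit

reads = ["ATGCCG", "ATGCCG", "TTTTTT"]
summary = Counter(f for f in map(assign_function, reads) if f)
print(summary)   # Counter({'DNA polymerase': 2})
```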

Journal ArticleDOI
TL;DR: This work presents the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines, offering a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job.
Abstract: The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

1,666 citations
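The modular, batch-oriented design described above lends itself to a pipeline-as-a-list-of-stages view. The sketch below illustrates that idea in the abstract; the stage names and genome record are hypothetical, not actual RASTtk commands or data structures.

```python
# A pipeline is an ordered list of stages, each transforming a genome
# record; batches map the same pipeline over many genomes.
from typing import Callable

Genome = dict                       # minimal record: {"id": ..., "features": [...]}
Stage = Callable[[Genome], Genome]

def call_rnas(g: Genome) -> Genome:
    g["features"].append("rna-genes")      # placeholder for a real RNA caller
    return g

def call_cds(g: Genome) -> Genome:
    g["features"].append("protein-genes")  # placeholder for a CDS caller
    return g

def run_pipeline(genome: Genome, stages: list[Stage]) -> Genome:
    for stage in stages:
        genome = stage(genome)
    return genome

pipeline: list[Stage] = [call_rnas, call_cds]          # customize per project
batch = [{"id": "g1", "features": []}, {"id": "g2", "features": []}]
results = [run_pipeline(g, pipeline) for g in batch]   # batch submission
print(results[0]["features"])                          # ['rna-genes', 'protein-genes']
```

Customization then amounts to reordering, dropping, or inserting stages in the list, which is the spirit of the modularity the abstract describes.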

Book ChapterDOI
30 Nov 2005
TL;DR: This chapter summarizes the principal characteristics of the latest release, the Web services-based GT4, which provides significant improvements over previous releases in terms of robustness, performance, usability, documentation, standards compliance, and functionality.
Abstract: The Globus Toolkit (GT) has been developed since the late 1990s to support the development of service-oriented distributed computing applications and infrastructures. Core GT components address, within a common framework, basic issues relating to security, resource access, resource management, data movement, resource discovery, and so forth. These components enable a broader “Globus ecosystem” of tools and components that build on, or interoperate with, core GT functionality to provide a wide range of useful application-level functions. These tools have in turn been used to develop a wide range of both “Grid” infrastructures and distributed applications. I summarize here the principal characteristics of the latest release, the Web services-based GT4, which provides significant improvements over previous releases in terms of robustness, performance, usability, documentation, standards compliance, and functionality.

1,509 citations