Author

Philip F. LoCascio

Other affiliations: University of Tennessee

Bio: Philip F. LoCascio is an academic researcher from Oak Ridge National Laboratory. The author has contributed to research in topics: Protein structure prediction & Massively parallel. The author has an h-index of 8 and has co-authored 13 publications receiving 9,466 citations. Previous affiliations of Philip F. LoCascio include University of Tennessee.

Papers

Journal Article

Prodigal: prokaryotic gene recognition and translation initiation site identification

Doug Hyatt¹,², Gwo Liang Chen², Philip F. LoCascio², Miriam Land², Frank W. Larimer¹,², Loren Hauser²

University of Tennessee¹, Oak Ridge National Laboratory²

08 Mar 2010-BMC Bioinformatics

TL;DR: This work developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm), which achieved good results compared to existing methods, and it is believed it will be a valuable asset to automated microbial annotation pipelines.

Abstract: The quality of automated gene prediction in microbial organisms has improved steadily over the past decade, but there is still room for improvement. Increasing the number of correct identifications, both of genes and of the translation initiation sites for each gene, and reducing the overall number of false positives, are all desirable goals. With our years of experience in manually curating genomes for the Joint Genome Institute, we developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm). With Prodigal, we focused specifically on the three goals of improved gene structure prediction, improved translation initiation site recognition, and reduced false positives. We compared the results of Prodigal to existing gene-finding methods to demonstrate that it met each of these objectives. We built a fast, lightweight, open source gene prediction program called Prodigal (http://compbio.ornl.gov/prodigal/). Prodigal achieved good results compared to existing methods, and we believe it will be a valuable asset to automated microbial annotation pipelines.

7,157 citations

Journal Article

The genome of black cottonwood, Populus trichocarpa (Torr. & Gray)

Gerald A. Tuskan¹,², Stephen P. DiFazio¹,³, Stefan Jansson⁴, Joerg Bohlmann⁵, Igor V. Grigoriev⁶, Uffe Hellsten⁶, Nicholas H. Putnam⁶, Steven G. Ralph⁵, Stephane Rombauts⁷, Asaf Salamov⁶, Jacquie Schein, Lieven Sterck⁷, Andrea Aerts⁶, Rishikeshi Bhalerao⁴, Rishikesh P. Bhalerao⁸, Damien Blaudez⁹, Wout Boerjan⁷, Annick Brun⁹, Amy M. Brunner¹⁰, Victor Busov¹¹, Malcolm M. Campbell¹², John E. Carlson¹³, Michel Chalot⁹, Jarrod Chapman⁶, G.-L. Chen¹, Dawn Cooper⁵, Pedro M. Coutinho¹⁴, Jérémy Couturier⁹, Sarah F. Covert¹⁵, Quentin C. B. Cronk⁵, R. Cunningham¹, John M. Davis¹⁶, Sven Degroeve⁷, Annabelle Déjardin⁹, Claude W. dePamphilis¹³, John C. Detter⁶, Bill Dirks¹⁷, Inna Dubchak⁶,¹⁸, Sébastien Duplessis⁹, Jürgen Ehlting⁵, Brian E. Ellis⁵, Karla C Gendler¹⁹, David Goodstein⁶, Michael Gribskov²⁰, Jane Grimwood²¹, Andrew Groover²², Lee E. Gunter¹, Björn Hamberger⁵, Berthold Heinze, Yrjö Helariutta⁸,²³,²⁴, Bernard Henrissat¹⁴, D. Holligan¹⁵, Robert A. Holt, Wenyu Huang⁶, N. Islam-Faridi²², Steven J.M. Jones, M. Jones-Rhoades²⁵, Richard A. Jorgensen¹⁹, Chandrashekhar P. Joshi¹¹, Jaakko Kangasjärvi²⁴, Jan Karlsson⁴, Colin T. Kelleher⁵, Robert Kirkpatrick, Matias Kirst¹⁶, Annegret Kohler⁹, Udaya C. Kalluri¹, Frank W. Larimer¹, Jim Leebens-Mack¹⁵, Jean-Charles Leplé⁹, Philip F. LoCascio¹, Y. Lou⁶, Susan Lucas⁶, Francis Martin⁹, Barbara Montanini⁹, Carolyn A. Napoli¹⁹, David R. Nelson²⁶, C D Nelson²², Kaisa Nieminen²⁴, Ove Nilsson⁸, V. Pereda⁹, Gary F. Peter¹⁶, Ryan N. Philippe⁵, Gilles Pilate⁹, Alexander Poliakov¹⁸, J. Razumovskaya¹, Paul G. Richardson⁶, Cécile Rinaldi⁹, Kermit Ritland⁵, Pierre Rouzé⁷, D. Ryaboy¹⁸, Jeremy Schmutz²¹, J. Schrader²⁷, Bo Segerman⁴, H. Shin, Asim Siddiqui, Fredrik Sterky, Astrid Terry⁶, Chung-Jui Tsai¹¹, Edward C. Uberbacher¹, Per Unneberg, Jorma Vahala²⁴, Kerr Wall¹³, Susan R. Wessler¹⁵, Guojun Yang¹⁵, T. Yin¹, Carl J. Douglas⁵, Marco A. Marra, Göran Sandberg⁸, Y. Van de Peer⁷, Daniel S. Rokhsar⁶,¹⁷

Oak Ridge National Laboratory¹, University of Tennessee², West Virginia University³, Umeå University⁴, University of British Columbia⁵, United States Department of Energy⁶, Ghent University⁷, Swedish University of Agricultural Sciences⁸, Institut national de la recherche agronomique⁹, Virginia Tech¹⁰, Michigan Technological University¹¹, University of Toronto¹², Pennsylvania State University¹³, University of Provence¹⁴, University of Georgia¹⁵, University of Florida¹⁶, University of California, Berkeley¹⁷, Lawrence Berkeley National Laboratory¹⁸, University of Arizona¹⁹, Purdue University²⁰, Stanford University²¹, United States Department of Agriculture²², University of Turku²³, University of Helsinki²⁴, Massachusetts Institute of Technology²⁵, University of Tennessee Health Science Center²⁶, University of Tübingen²⁷

15 Sep 2006-Science

TL;DR: The draft genome of the black cottonwood tree, Populus trichocarpa, has been reported in this paper, with more than 45,000 putative protein-coding genes identified.

Abstract: We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.

4,025 citations

Journal Article

Gene and translation initiation site prediction in metagenomic sequences

Doug Hyatt¹, Philip F. LoCascio¹, Loren Hauser¹, Edward C. Uberbacher¹

University of Tennessee¹

01 Sep 2012-Bioinformatics

TL;DR: MetaProdigal, a metagenomic version of the gene prediction program Prodigal, is presented; it identifies genes in short, anonymous coding sequences with a high degree of accuracy, recognizes sequences that use alternate genetic codes, and provides confidence values for each gene call.

Abstract: Motivation: Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. Results: We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements. Availability: The Prodigal software is freely available under the General Public License from http://code.google.com/p/prodigal/. Contact: [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.

450 citations

The genome of black cottonwood, Populus trichocarpa (Torr. & Gray) - eScholarship

Gerald A. Tuskan, Stephen P. DiFazio, Stefan Jansson, Joerg Bohlmann, Igor V. Grigoriev, Uffe Hellsten, Nicholas H. Putnam, Steven G. Ralph, Stephane Rombauts, Asaf Salamov, Jacquie Schein, Lieven Sterck, Andrea Aerts, Rishikeshi Bhalerao, Rishikesh P. Bhalerao, Damien Blaudez, Wout Boerjan, Annick Brun, Amy M. Brunner, Victor Busov, Malcolm M. Campbell, John E. Carlson, Michel Chalot, Jarrod Chapman, G.-L. Chen, Dawn Cooper, Pedro M. Coutinho, Jérémy Couturier, Sarah F. Covert, Quentin C. B. Cronk, R. Cunningham, J. Davis, Sven Degroeve, Annabelle Déjardin, C. dePamphilis, John C. Detter, Bill Dirks, Inna Dubchak, Sébastien Duplessis, J. Ehlting, Brian E. Ellis, Karla C Gendler, David Goodstein, Michael Gribskov, Jane Grimwood, Andrew Groover, Lee E. Gunter, Björn Hamberger, Berthold Heinze, Yrjö Helariutta, Bernard Henrissat, D. Holligan, Robert A. Holt, Wenyu Huang, N. Islam-Faridi, Steven J.M. Jones, M. Jones-Rhoades, Richard A. Jorgensen, Chandrashekhar P. Joshi, Jaakko Kangasjärvi, Jan Karlsson, Colin T. Kelleher, Robert Kirkpatrick, Matias Kirst, Annegret Kohler, Udaya C. Kalluri, Frank W. Larimer, Jim Leebens-Mack, Jean-Charles Leplé, Philip F. LoCascio, Y. Lou, Susan Lucas, Francis Martin, Barbara Montanini, Carolyn A. Napoli, David R. Nelson, D. Nelson, Kaisa Nieminen, Ove Nilsson, Gary F. Peter, Ryan N. Philippe, Gilles Pilate, Alexander Poliakov, J. Razumovskaya, Paul G. Richardson, and 81 more

01 Sep 2006

TL;DR: Analyzing the draft genome of the black cottonwood tree, Populus trichocarpa, revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome.

355 citations

Journal Article

A computational pipeline for protein structure prediction and analysis at genome scale

Manesh Shah¹, Sergei Passovets¹,², Dongsup Kim¹, Kyle Ellrott², Li Wang¹,², Inna Vokler¹,², Philip F. LoCascio¹, Dong Xu¹,², Ying Xu¹,²

Oak Ridge National Laboratory¹, University of Tennessee²

12 Oct 2003-Bioinformatics

TL;DR: An automated pipeline for protein structure prediction is presented, centered on the threading-based prediction system PROSPECT. The pipeline consists of a dozen tools for identification of protein domains and signal peptides, protein triage to determine the protein type (membrane or globular), protein fold recognition, generation of atomic structural models, and validation of prediction results.

Abstract: Motivation: Experimental techniques alone cannot keep up with the production rate of protein sequences, while computational techniques for protein structure predictions have matured to such a level to provide reliable structural characterization of proteins at large scale. Integration of multiple computational tools for protein structure prediction can complement experimental techniques. Results: We present an automated pipeline for protein structure prediction. The centerpiece of the pipeline is our threading-based protein structure prediction system PROSPECT. The pipeline consists of a dozen tools for identification of protein domains and signal peptide, protein triage to determine the protein type (membrane or globular), protein fold recognition, generation of atomic structural models, prediction result validation, etc. Different processing and prediction branches are determined automatically by a prediction pipeline manager based on identified characteristics of the protein. The pipeline has been implemented to run in a heterogeneous computational environment as a client/server system with a web interface. Genome-scale applications on Caenorhabditis elegans, Pyrococcus furiosus and three cyanobacterial genomes are presented. Availability: The pipeline is available at http://compbio.ornl.gov/proteinpipeline/

23 citations

Cited by

Fast parallel algorithms for short-range molecular dynamics

Steven J. Plimpton¹

Sandia National Laboratories¹

01 May 1993

TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.

Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.

29,323 citations

Journal Article

Prokka: Rapid Prokaryotic Genome Annotation

Torsten Seemann¹

Victorian Life Sciences Computation Initiative¹

15 Jul 2014-Bioinformatics

TL;DR: Prokka is introduced, a command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer, and produces standards-compliant output files for further analysis or viewing in genome browsers.

Abstract: The multiplex capability and high yield of current day DNA-sequencing instruments has made bacterial whole genome sequencing a routine affair. The subsequent de novo assembly of reads into contigs has been well addressed. The final step of annotating all relevant genomic features on those contigs can be achieved slowly using existing web- and email-based systems, but these are not applicable for sensitive data or integrating into computational pipelines. Here we introduce Prokka, a command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer. It produces standards-compliant output files for further analysis or viewing in genome browsers. Availability and implementation: Prokka is implemented in Perl and is freely available under an open source GPLv2 license from http://vicbioinformatics.com/.

10,432 citations

Journal Article

The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics

Brandi L. Cantarel¹, Pedro M. Coutinho², Corinne Rancurel², Thomas Bernard², Vincent Lombard², Bernard Henrissat²

University of Provence¹, Aix-Marseille University²

01 Jan 2009-Nucleic Acids Research

TL;DR: The Carbohydrate-Active Enzyme (CAZy) database is a knowledge-based resource specialized in the enzymes that build and break down complex carbohydrates and glycoconjugates and has been used to improve the quality of functional predictions of a number of genome projects by providing expert annotation.

Abstract: The Carbohydrate-Active Enzyme (CAZy) database is a knowledge-based resource specialized in the enzymes that build and break down complex carbohydrates and glycoconjugates. As of September 2008, the database describes the present knowledge on 113 glycoside hydrolase, 91 glycosyltransferase, 19 polysaccharide lyase, 15 carbohydrate esterase and 52 carbohydrate-binding module families. These families are created based on experimentally characterized proteins and are populated by sequences from public databases with significant similarity. Protein biochemical information is continuously curated based on the available literature and structural information. Over 6400 proteins have assigned EC numbers and 700 proteins have a PDB structure. The classification (i) reflects the structural features of these enzymes better than their sole substrate specificity, (ii) helps to reveal the evolutionary relationships between these enzymes and (iii) provides a convenient framework to understand mechanistic properties. This resource has been available for over 10 years to the scientific community, contributing to information dissemination and providing a transversal nomenclature to glycobiologists. More recently, this resource has been used to improve the quality of functional predictions of a number of genome projects by providing expert annotation. The CAZy resource resides at URL: http://www.cazy.org/.

6,028 citations

Journal Article

CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Donovan H. Parks¹, Michael Imelfort¹, Connor T. Skennerton¹, Philip Hugenholtz¹, Gene W. Tyson¹

University of Queensland¹

01 Jul 2015-Genome Research

TL;DR: CheckM, an automated method for assessing genome quality, is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. An objective measure of genome quality is also proposed that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.

Abstract: Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.

5,788 citations

Journal Article

Genome sequence of the palaeopolyploid soybean

Jeremy Schmutz, Steven B. Cannon¹, Jessica A. Schlueter²,³, Jianxin Ma³, Therese Mitros⁴, William Nelson⁵, David L. Hyten¹, Qijian Song¹,⁶, Jay J. Thelen⁷, Jianlin Cheng⁷, Dong Xu⁷, Uffe Hellsten⁸, Gregory D. May⁹, Yeisoo Yu⁵, Tetsuya Sakurai, Taishi Umezawa, Madan K. Bhattacharyya¹⁰, Devinder Sandhu¹¹, Babu Valliyodan⁷, Erika Lindquist⁸, Myron Peto¹, David Grant¹, Shengqiang Shu⁸, David Goodstein⁸, Kerrie Barry⁸, Montona Futrell-Griggs³, Brian Abernathy³, Jianchang Du³, Zhixi Tian³, Liucun Zhu³, Navdeep Gill³, Trupti Joshi⁷, Marc Libault⁷, Ananad Sethuraman, Xue-Cheng Zhang⁷, Kazuo Shinozaki, Henry T. Nguyen⁷, Rod A. Wing⁵, Perry B. Cregan¹, James E. Specht¹², Jane Grimwood⁸, Daniel S. Rokhsar⁸, Gary Stacey⁷, Randy C. Shoemaker¹, Scott A. Jackson³

Agricultural Research Service¹, University of North Carolina at Charlotte², Purdue University³, University of California, Berkeley⁴, University of Arizona⁵, University of Maryland, College Park⁶, University of Missouri⁷, Joint Genome Institute⁸, National Center for Genome Resources⁹, Iowa State University¹⁰, University of Wisconsin–Stevens Point¹¹, University of Nebraska–Lincoln¹²

14 Jan 2010-Nature

TL;DR: An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

Abstract: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

3,743 citations
