Prokka: Rapid Prokaryotic Genome Annotation
TLDR
Prokka is a command line software tool that fully annotates a draft bacterial genome in about 10 min on a typical desktop computer and produces standards-compliant output files for further analysis or viewing in genome browsers.
Abstract:
The multiplex capability and high yield of current-day DNA-sequencing instruments have made bacterial whole genome sequencing a routine affair. The subsequent de novo assembly of reads into contigs has been well addressed. The final step of annotating all relevant genomic features on those contigs can be achieved slowly using existing web- and email-based systems, but these are not suitable for sensitive data or for integration into computational pipelines. Here we introduce Prokka, a command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer. It produces standards-compliant output files for further analysis or viewing in genome browsers.
Availability and implementation: Prokka is implemented in Perl and is freely available under an open source GPLv2 license from http://vicbioinformatics.com/.
Citations
Journal Article
NCBI prokaryotic genome annotation pipeline
Tatiana Tatusova, Michael DiCuccio, Azat Badretdin, Vyacheslav Chetvernin, Eric P. Nawrocki, Leonid Zaslavsky, Alexandre Lomsadze, Kim D. Pruitt, Mark Borodovsky, James Ostell
TL;DR: NCBI's new Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, and more on statistical predictions in the absence of external evidence.
Journal Article
Roary: Rapid large-scale prokaryote pan genome analysis
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill
TL;DR: Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and accessory genes, is introduced, making construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results.
Journal Article
RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes
Thomas Brettin, James J. Davis, Terry Disz, Robert Edwards, Svetlana Gerdes, Gary J. Olsen, Robert Olson, Ross Overbeek, Bruce Parrello, Gordon D. Pusch, Maulik Shukla, James Thomason, Rick Stevens, Veronika Vonstein, Alice R. Wattam, Fangfang Xia
TL;DR: The RAST tool kit (RASTtk) is a modular version of RAST that enables researchers to build custom annotation pipelines. It offers a choice of software for identifying and annotating genomic features, as well as the ability to add custom features to an annotation job.
Journal Article
A taxonomic note on the genus Lactobacillus: Description of 23 novel genera, emended description of the genus Lactobacillus Beijerinck 1901, and union of Lactobacillaceae and Leuconostocaceae.
Jinshui Zheng, Stijn Wittouck, Elisa Salvetti, Charles M. A. P. Franz, Hugh M. B. Harris, Paola Mattarelli, Paul W. O'Toole, Bruno Pot, Peter Vandamme, Jens Walter, Koichi Watanabe, Sander Wuyts, Giovanna E. Felis, Michael G. Gänzle, Sarah Lebeer
TL;DR: This study evaluated the taxonomy of Lactobacillaceae and Leuconostocaceae on the basis of whole genome sequences. The proposed reclassification reflects the phylogenetic position of the micro-organisms and groups lactobacilli into robust clades with shared ecological and metabolic properties.
Journal Article
Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications.
TL;DR: Developments in the BIGSdb software made from publication to June 2018 are described and it is shown how the platform realises microbial population genomics for a wide range of applications.
References
Journal Article
The Pfam protein families database
Marco Punta, Penny Coggill, Ruth Y. Eberhardt, Jaina Mistry, John Tate, Chris Boursnell, Ningze Pang, Kristoffer Forslund, Goran Ceric, Jody Clements, Andreas Heger, Liisa Holm, Erik L. L. Sonnhammer, Sean R. Eddy, Alex Bateman, Robert D. Finn
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Journal Article
BLAST+: architecture and applications.
Christiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason S. Papadopoulos, Kevin Bealer, Thomas L. Madden
TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.
Journal Article
Pfam: the protein families database.
Robert D. Finn, Alex Bateman, Jody Clements, Penelope Coggill, Ruth Y. Eberhardt, Sean R. Eddy, Andreas Heger, Kirstie Hetherington, Liisa Holm, Jaina Mistry, Erik L. L. Sonnhammer, John Tate, Marco Punta
TL;DR: Pfam is a widely used database of protein families that contains 14 831 manually curated entries in its current version, 27.0, and has been updated several times since 2012.
Journal Article
The RAST Server: Rapid Annotations using Subsystems Technology
Ramy K. Aziz, Daniela Bartels, Aaron A. Best, Matthew DeJongh, Terrence Disz, Robert Edwards, Kevin Formsma, Svetlana Gerdes, Elizabeth M. Glass, Michael Kubal, Folker Meyer, Gary J. Olsen, Robert Olson, Andrei L. Osterman, Ross Overbeek, Leslie Klis McNeil, Daniel Paarmann, Tobias Paczian, Bruce Parrello, Gordon D. Pusch, Claudia I. Reich, Rick Stevens, Olga Vassieva, Veronika Vonstein, Andreas Wilke, Olga Zagnitko
TL;DR: The RAST Server is a fully automated service for annotating bacterial and archaeal genomes. It identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network, and makes the output easily downloadable for the user.
Journal Article
SignalP 4.0: discriminating signal peptides from transmembrane regions
Thomas Nordahl Petersen, Søren Brunak, Gunnar von Heijne, Henrik Nielsen
TL;DR: SignalP 4.0 was the best signal-peptide predictor for all three organism types but was not in all cases as good as SignalP 3.0 according to cleavage-site sensitivity or signal-peptide correlation when there are no transmembrane proteins present.