Journal Article
GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations
TLDR
The GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs, strictly follow the annotation graph approach, offering a unified graph-based representation.
Abstract:
Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This makes it possible to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables developers to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools not only ensures a lightweight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of scripting languages (such as Python and Ruby) sharing the same interface.
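The pull-based sequential processing mentioned in the abstract can be illustrated with a minimal Python sketch. It does not use the GenomeTools API or its bindings; the names `Feature`, `gff3_source`, `type_filter`, `total_length` and `exon_length` are hypothetical, and the sketch handles only flat GFF3 data lines rather than full annotation graphs. Each stage requests the next feature from the stage before it, so only one feature needs to be held in memory at a time.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Feature(NamedTuple):
    """A simplified feature record (hypothetical, not a GenomeTools type)."""
    seqid: str
    ftype: str
    start: int
    end: int
    strand: str


def gff3_source(handle: TextIO) -> Iterator[Feature]:
    """Lazily yield one Feature per GFF3 data line; nothing is read ahead."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines in this sketch
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6])


def type_filter(stream: Iterator[Feature], wanted: str) -> Iterator[Feature]:
    """Pass through only features of the wanted type, pulling on demand."""
    for feature in stream:
        if feature.ftype == wanted:
            yield feature


def total_length(stream: Iterator[Feature]) -> int:
    """Consume the stream and sum feature lengths (GFF3 ranges are 1-based, inclusive)."""
    return sum(f.end - f.start + 1 for f in stream)


EXAMPLE = (
    "##gff-version 3\n"
    "chr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=gene1\n"
    "chr1\tdemo\texon\t100\t200\t.\t+\t.\tParent=gene1\n"
    "chr1\tdemo\texon\t300\t500\t.\t+\t.\tParent=gene1\n"
)


def exon_length(text: str) -> int:
    """Chain source -> filter -> consumer; the consumer drives the pipeline."""
    return total_length(type_filter(gff3_source(io.StringIO(text)), "exon"))


if __name__ == "__main__":
    print(exon_length(EXAMPLE))
```

Because the final consumer pulls features through the chain, replacing `io.StringIO` with an open file handle would process an arbitrarily large annotation file without loading it whole.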
Citations
Journal Article
A chromosome conformation capture ordered sequence of the barley genome
Martin Mascher, Heidrun Gundlach, Axel Himmelbach, Sebastian Beier, Sven Twardziok, Thomas Wicker, Volodymyr Radchuk, Christoph Dockter, Pete E. Hedley, Joanne Russell, Micha Bayer, Luke Ramsay, Hui Liu, Georg Haberer, Xiao-Qi Zhang, Qisen Zhang, Roberto A. Barrero, Lin Li, Stefan Taudien, Marco Groth, Marius Felder, Alex Hastie, Hana Šimková, Helena Staňková, Jan Vrána, Saki Chan, María Muñoz-Amatriaín, Rachid Ounit, Steve Wanamaker, Dan Bolser, Christian Colmsee, Thomas Schmutzer, Lala Aliyeva-Schnorr, Stefano Grasso, Jaakko Tanskanen, Anna Chailyan, Dharanya Sampath, Darren Heavens, Leah Clissold, Sujie Cao, Brett Chapman, Fei Dai, Yong Han, Hua Li, Xuan Li, Chongyun Lin, John K. McCooke, Cong Tan, Penghao Wang, Songbo Wang, Shuya Yin, Gaofeng Zhou, Jesse Poland, Matthew I. Bellgard, Ljudmilla Borisjuk, Andreas Houben, Jaroslav Doležel, Sarah Ayling, Stefano Lonardi, Paul J. Kersey, Peter Langridge, Gary J. Muehlbauer, Matthew D. Clark, Mario Caccamo, Alan H. Schulman, Klaus F. X. Mayer, Matthias Platzer, Timothy J. Close, Uwe Scholz, Mats Hansson, Guoping Zhang, Ilka Braumann, Manuel Spannagl, Chengdao Li, Robbie Waugh, Nils Stein
TL;DR: The importance of the barley reference sequence for breeding is demonstrated by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.
Journal Article
LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons.
Shujun Ou, Ning Jiang
TL;DR: LTR_retriever is an accurate and sensitive program that identifies LTR retrotransposons and generates nonredundant exemplars from DNA sequences for whole-genome annotation and evolutionary studies and demonstrated significant improvements by achieving high levels of sensitivity, specificity, accuracy, and precision in rice.
Journal Article
The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution
Hélène Badouin, Jérôme Gouzy, Christopher J. Grassa, Florent Murat, S. Evan Staton, Ludovic Cottret, Christine Lelandais-Brière, Gregory L. Owens, Sébastien Carrère, Baptiste Mayjonade, Ludovic Legrand, Navdeep Gill, Nolan C. Kane, John E. Bowers, Sariel Hübner, Arnaud Bellec, Aurélie Bérard, Hélène Bergès, Nicolas Blanchet, Marie Claude Boniface, Dominique Brunel, Olivier Catrice, Nadia Chaidir, Clotilde Claudel, Cécile Donnadieu, Thomas Faraut, Ghislain Fievet, Nicolas Helmstetter, Matthew G. King, Steven J. Knapp, Zhao Lai, Marie-Christine Le Paslier, Yannick Lippi, Lolita Lorenzon, Jennifer R. Mandel, Gwenola Marage, Gwenaëlle Marchand, Elodie Marquand, Emmanuelle Bret-Mestries, Evan Morien, Savithri U. Nambeesan, Thuy Tien Nguyen, Prune Pegot-Espagnet, Nicolas Pouilly, Frances Raftis, Erika Sallet, Thomas Schiex, Justine Thomas, Céline Vandecasteele, D. Varès, Felicity Vear, Sonia Vautrin, Martin Crespi, Brigitte Mangin, John M. Burke, Jérôme Salse, Stéphane Muños, Patrick Vincourt, Loren H. Rieseberg, Nicolas B. Langlade
TL;DR: It is found that the genomic architecture of flowering time has been shaped by the most recent whole-genome duplication, which suggests that ancient paralogues can remain in the same regulatory networks for dozens of millions of years.
Journal Article
A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples
Samia N. Naccache, Scot Federman, Narayanan Veeraraghavan, Matei Zaharia, Deanna Lee, Erik Samayoa, Jerome Bouquet, Alexander L. Greninger, Ka Cheung Luk, Barryett Enge, Debra A. Wadford, Sharon Messenger, Gillian Genrich, Kristen Pellegrino, Gilda Grard, Eric M. Leroy, Bradley S. Schneider, Joseph N. Fair, Miguel Ángel Martínez, Pavel Isa, John A. Crump, Joseph L. DeRisi, Taylor Sittler, John Hackett, Steve Miller, Charles Y. Chiu
TL;DR: SURPI is described, a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and use of the pipeline is demonstrated in the analysis of 237 clinical samples comprising more than 1.1 billion sequences.
Journal Article
Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
Shujun Ou, Weija Su, Yi Liao, Kapeel Chougule, Jireh Agda, Adam J. Hellinga, Carlos Santiago Blanco Lugo, Tyler A. Elliott, Doreen Ware, Thomas Peterson, Ning Jiang, Candice N. Hirsch, Matthew B. Hufford
TL;DR: A comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) is created that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements and will greatly facilitate TE annotation in eukaryotic genomes.
References
Journal Article
MagicMatch—cross-referencing sequence identifiers across databases
TL;DR: A rapid and efficient method to map sequence identifiers across databases that uses the MD5 checksum algorithm for message integrity to generate sequence fingerprints and uses these fingerprints as hash strings to map sequences across databases.
Journal Article
FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context
TL;DR: FISH Oracle offers a fast and easy to use visualization tool for array CGH and SNP array data that will be instrumental to identify candidate onco and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes.
Journal Article
A New Efficient Data Structure for Storage and Retrieval of Multiple Biosequences
Sascha Steinbiss, Stefan Kurtz
TL;DR: A novel, space-efficient data structure for storing multiple biological sequences of variable alphabet size, with customizable character transformations, wildcard support, and an assortment of internal representations optimized for different distributions of wildcards and sequence lengths is presented.
Journal Article
LTRsift: a graphical user interface for semi-automatic classification and postprocessing of de novo detected LTR retrotransposons.
TL;DR: LTRsift is a useful and convenient tool for semi-automated classification of newly detected LTR retrotransposons based on their internal features, helpful in postprocessing and refining the output of software for predicting LTR retrotransposons up to the stage of preparing full-length reference sequence libraries.
Journal Article
CASSys: an integrated software-system for the interactive analysis of ChIP-seq data.
TL;DR: A software system spanning all steps of ChIP-seq data analysis, which supersedes the laborious application of several single command line tools and provides functionality ranging from quality assessment and quality control of short reads, through the mapping of reads against a reference genome (read mapping) and the detection of enriched regions (peak detection), to various follow-up analyses.
Related Papers (5)
BEDTools: a flexible suite of utilities for comparing genomic features
Aaron R. Quinlan, Ira M. Hall