GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations

doi:10.1109/TCBB.2013.68

Journal Article

Gordon Gremme, +2 more

- 01 May 2013 -

IEEE/ACM Transactions on Computational B...

- Vol. 10, Iss: 3, pp 645-656

TLDR

GenomeTools is a convenient and efficient software library, with associated tools, for developing bioinformatics software that creates, processes, or converts annotation graphs. It strictly follows the annotation graph approach and offers a unified graph-based representation.

Abstract:

Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This makes it possible to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables developers to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools not only ensures a lightweight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of scripting languages (like Python and Ruby) sharing the same interface.
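The pull-based sequential processing described in the abstract can be illustrated with a minimal sketch. This is not the GenomeTools API: `FeatureNode`, `gff3_in_stream`, `select_type`, `count_nodes` and `demo` are hypothetical names chosen for illustration, and the parser assumes simplified GFF3 input in which parents precede their children, each feature has at most one parent, and `###` lines mark points where all earlier features are complete. Each stage is a Python generator, so a feature graph is only built when the consumer at the end of the pipeline asks for the next one.

```python
import io
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class FeatureNode:
    """One node of an annotation graph (hypothetical, illustration only)."""
    seqid: str
    type: str
    start: int
    end: int
    id: Optional[str] = None
    children: List["FeatureNode"] = field(default_factory=list)


def _parse_attributes(column: str) -> Dict[str, str]:
    """Split a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    pairs = (item.split("=", 1) for item in column.split(";") if "=" in item)
    return {key: value for key, value in pairs}


def gff3_in_stream(lines: Iterable[str]) -> Iterator[FeatureNode]:
    """Yield top-level feature graphs one batch at a time.

    Assumption: parents appear before children and '###' separates
    independent groups, so only the current group is held in memory.
    """
    by_id: Dict[str, FeatureNode] = {}
    roots: List[FeatureNode] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "###":
            # Everything seen so far is complete: hand it downstream.
            yield from roots
            by_id, roots = {}, []
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line in this simplified parser
        attrs = _parse_attributes(cols[8])
        node = FeatureNode(cols[0], cols[2], int(cols[3]), int(cols[4]),
                           attrs.get("ID"))
        if node.id is not None:
            by_id[node.id] = node
        parent = attrs.get("Parent")
        if parent in by_id:
            by_id[parent].children.append(node)
        else:
            roots.append(node)
    yield from roots  # flush whatever remains at end of input


def select_type(stream: Iterable[FeatureNode], wanted: str) -> Iterator[FeatureNode]:
    """Pipeline stage: pull nodes from upstream, pass on matching roots."""
    for node in stream:
        if node.type == wanted:
            yield node


def count_nodes(node: FeatureNode) -> int:
    """Depth-first traversal counting the node and all its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children)


GFF3 = "\n".join([
    "##gff-version 3",
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tdemo\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=mrna1",
    "chr1\tdemo\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=mrna1",
    "###",
    "chr1\tdemo\trepeat_region\t1000\t1200\t.\t+\t.\tID=rep1",
    "###",
    "chr1\tdemo\tgene\t2000\t2500\t.\t-\t.\tID=gene2",
])


def demo() -> list:
    """Chain the stages and pull gene graphs through the pipeline."""
    pipeline = select_type(gff3_in_stream(io.StringIO(GFF3)), "gene")
    return [(node.id, count_nodes(node)) for node in pipeline]


if __name__ == "__main__":
    print(demo())
```

Because the consumer drives the pipeline, memory use in this sketch is bounded by the largest group between two `###` separators rather than by the size of the whole file, which is the property the abstract attributes to pull-based processing.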

Citations

Journal Article

A chromosome conformation capture ordered sequence of the barley genome

Martin Mascher, +81 more

- 27 Apr 2017 -

Nature

TL;DR: The importance of the barley reference sequence for breeding is demonstrated by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.

Journal Article

LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons.

Shujun Ou, +1 more

- 12 Dec 2017 -

Plant Physiology

TL;DR: LTR_retriever is an accurate and sensitive program that identifies LTR retrotransposons and generates nonredundant exemplars from DNA sequences for whole-genome annotation and evolutionary studies. It demonstrated significant improvements, achieving high levels of sensitivity, specificity, accuracy, and precision in rice.

Journal Article

The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution

Hélène Badouin, +66 more

- 22 May 2017 -

Nature

TL;DR: It is found that the genomic architecture of flowering time has been shaped by the most recent whole-genome duplication, which suggests that ancient paralogues can remain in the same regulatory networks for dozens of millions of years.

Journal Article

A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples

Samia N. Naccache, +25 more

- 01 Jul 2014 -

Genome Research

TL;DR: SURPI is described, a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and use of the pipeline is demonstrated in the analysis of 237 clinical samples comprising more than 1.1 billion sequences.

Journal Article

Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

Shujun Ou, +13 more

- 16 Dec 2019 -

Genome Biology

TL;DR: A comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) is created that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements and will greatly facilitate TE annotation in eukaryotic genomes.

References

Journal Article

MagicMatch—cross-referencing sequence identifiers across databases

Mike L. Smith, +4 more

- 15 Aug 2005 -

Bioinformatics

TL;DR: A rapid and efficient method to map sequence identifiers across databases that uses the MD5 checksum algorithm for message integrity to generate sequence fingerprints and uses these fingerprints as hash strings to map sequences across databases.

Journal Article

FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context

Malte Mader, +3 more

- 28 Jul 2011 -

Journal of Clinical Bioinformatics

TL;DR: FISH Oracle offers a fast and easy-to-use visualization tool for array CGH and SNP array data that will be instrumental in identifying candidate oncogenes and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes.

Journal Article

A New Efficient Data Structure for Storage and Retrieval of Multiple Biosequences

Sascha Steinbiss, +1 more

- 01 Mar 2012 -

IEEE/ACM Transactions on Computational B...

TL;DR: A novel, space-efficient data structure for storing multiple biological sequences of variable alphabet size, with customizable character transformations, wildcard support, and an assortment of internal representations optimized for different distributions of wildcards and sequence lengths is presented.

Journal Article

LTRsift: a graphical user interface for semi-automatic classification and postprocessing of de novo detected LTR retrotransposons.

Sascha Steinbiss, +2 more

- 07 Nov 2012 -

Mobile DNA

TL;DR: LTRsift is a useful and convenient tool for semi-automated classification of newly detected LTR retrotransposons based on their internal features, helpful in postprocessing and refining the output of software for predicting LTR retrotransposons up to the stage of preparing full-length reference sequence libraries.

Journal Article

CASSys: an integrated software-system for the interactive analysis of ChIP-seq data.

Malik Alawi, +2 more

- 21 Jun 2011 -

Journal of Integrative Bioinformatics

TL;DR: A software system spanning all steps of ChIP-seq data analysis, which supersedes the laborious application of several single command line tools and provides functionality ranging from quality assessment and quality control of short reads, through the mapping of reads against a reference genome (read mapping) and the detection of enriched regions (peak detection), to various follow-up analyses.
