Journal Article

GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations

TL;DR
The GenomeTools are a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. They strictly follow the annotation graph approach, offering a unified graph-based representation.
Abstract
Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This makes it possible to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables developers to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools not only ensures a lightweight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of scripting languages (such as Python and Ruby) sharing the same interface.
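
The pull-based approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the GenomeTools API: the names `Feature`, `parse_gff3` and `select_type` are hypothetical and exist only for this example, and the input is a tiny made-up GFF3 fragment. The sketch assumes that each stage is a Python generator, so a feature is parsed only when the consumer at the end of the chain asks for the next one, and the complete annotation set is never held in memory at once.

```python
import io
from typing import Iterable, Iterator, NamedTuple, Optional


class Feature(NamedTuple):
    """One annotation line; parent_id is the implicit edge of the annotation graph."""
    seqid: str
    type: str
    start: int
    end: int
    feature_id: Optional[str]
    parent_id: Optional[str]


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Source stage (hypothetical): lazily yield one feature per GFF3 data line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.split("\t")
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]),
                      attrs.get("ID"), attrs.get("Parent"))


def select_type(features: Iterable[Feature], wanted: str) -> Iterator[Feature]:
    """Filter stage (hypothetical): pull from upstream, pass on matching features."""
    for feature in features:
        if feature.type == wanted:
            yield feature


# Made-up example data: a gene with one transcript and two exons.
GFF3 = (
    "##gff-version 3\n"
    "chr1\t.\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "chr1\t.\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n"
    "chr1\t.\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=mrna1\n"
    "chr1\t.\texon\t500\t900\t.\t+\t.\tID=exon2;Parent=mrna1\n"
)

# The consumer drives the chain: each step pulls one feature through both stages.
exons = list(select_type(parse_gff3(io.StringIO(GFF3)), "exon"))
total_exon_length = sum(f.end - f.start + 1 for f in exons)
```

In a real workflow the consumer would typically iterate over the chain instead of collecting it into a list, so that memory use stays independent of the size of the annotation file.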


Citations

A chromosome conformation capture ordered sequence of the barley genome

Martin Mascher, +81 more - 27 Apr 2017
TL;DR: The importance of the barley reference sequence for breeding is demonstrated by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.

LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons.

TL;DR: LTR_retriever is an accurate and sensitive program that identifies LTR retrotransposons and generates nonredundant exemplars from DNA sequences for whole-genome annotation and evolutionary studies. In rice, it demonstrated significant improvements, achieving high levels of sensitivity, specificity, accuracy, and precision.

The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution

TL;DR: It is found that the genomic architecture of flowering time has been shaped by the most recent whole-genome duplication, which suggests that ancient paralogues can remain in the same regulatory networks for dozens of millions of years.

Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

TL;DR: A comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) is created that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements and will greatly facilitate TE annotation in eukaryotic genomes.
References

MagicMatch—cross-referencing sequence identifiers across databases

TL;DR: A rapid and efficient method to map sequence identifiers across databases that uses the MD5 checksum algorithm for message integrity to generate sequence fingerprints and uses these fingerprints as hash strings to map sequences across databases.

FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context

TL;DR: FISH Oracle offers a fast, easy-to-use visualization tool for array CGH and SNP array data that will be instrumental in identifying candidate oncogenes and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes.

A New Efficient Data Structure for Storage and Retrieval of Multiple Biosequences

TL;DR: A novel, space-efficient data structure for storing multiple biological sequences of variable alphabet size, with customizable character transformations, wildcard support, and an assortment of internal representations optimized for different distributions of wildcards and sequence lengths is presented.

LTRsift: a graphical user interface for semi-automatic classification and postprocessing of de novo detected LTR retrotransposons.

TL;DR: LTRsift is a useful and convenient tool for semi-automated classification of newly detected LTR retrotransposons based on their internal features, helpful in postprocessing and refining the output of software for predicting LTR retrotransposons up to the stage of preparing full-length reference sequence libraries.

CASSys: an integrated software-system for the interactive analysis of ChIP-seq data.

TL;DR: A software system spanning all steps of ChIP-seq data analysis, which supersedes the laborious application of several single command line tools and provides functionality ranging from quality assessment and control of short reads, through the mapping of reads against a reference genome (read mapping) and the detection of enriched regions (peak detection), to various follow-up analyses.