Journal Article

RATT: Rapid Annotation Transfer Tool

01 May 2011 - Nucleic Acids Research (Oxford University Press) - Vol. 39, Iss. 9
TL;DR: A method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny.
Abstract: Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net.
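
The idea of transferring annotation "on the basis of conserved synteny" can be illustrated with a small sketch. This is not RATT's implementation. It assumes that synteny blocks have already been computed (for example by a whole-genome aligner) and that each block is ungapped and on the same strand, so a feature lying entirely inside a block can be moved by a constant offset. The names `SyntenyBlock` and `transfer_feature` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SyntenyBlock:
    """Hypothetical ungapped, same-strand block shared by reference and query."""
    ref_start: int  # first reference position in the block (inclusive)
    ref_end: int    # last reference position in the block (inclusive)
    qry_start: int  # query position aligned to ref_start


def transfer_feature(
    start: int, end: int, blocks: List[SyntenyBlock]
) -> Optional[Tuple[int, int]]:
    """Map a reference feature (start, end) onto the query genome.

    Returns the shifted coordinates if one block fully contains the
    feature, otherwise None (the feature is left untransferred).
    """
    for block in blocks:
        if block.ref_start <= start and end <= block.ref_end:
            offset = block.qry_start - block.ref_start
            return (start + offset, end + offset)
    return None


# Two illustrative blocks; the coordinates are made up for the example.
blocks = [
    SyntenyBlock(ref_start=100, ref_end=500, qry_start=1100),
    SyntenyBlock(ref_start=600, ref_end=900, qry_start=1500),
]

inside_first = transfer_feature(150, 300, blocks)
inside_second = transfer_feature(650, 700, blocks)
spans_gap = transfer_feature(450, 650, blocks)
```

A feature that straddles two blocks is reported as untransferred here. A real tool would also need to handle insertions and deletions within blocks, inverted blocks and partially covered features.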

Citations
Journal Article
11 Apr 2018 - Nature
TL;DR: Whole-genome sequencing and phenotyping of 1,011 natural isolates of the yeast Saccharomyces cerevisiae reveal its evolutionary history, including a single out-of-China origin and multiple domestication events, and provide a framework for genotype–phenotype studies in this model organism.
Abstract: Large-scale population genomic surveys are essential to explore the phenotypic diversity of natural populations. Here we report the whole-genome sequencing and phenotyping of 1,011 Saccharomyces cerevisiae isolates, which together provide an accurate evolutionary picture of the genomic variants that shape the species-wide phenotypic landscape of this yeast. Genomic analyses support a single ‘out-of-China’ origin for this species, followed by several independent domestication events. Although domesticated isolates exhibit high variation in ploidy, aneuploidy and genome content, genome evolution in wild isolates is mainly driven by the accumulation of single nucleotide polymorphisms. A common feature is the extensive loss of heterozygosity, which represents an essential source of inter-individual variation in this mainly asexual species. Most of the single nucleotide polymorphisms, including experimentally identified functional polymorphisms, are present at very low frequencies. The largest numbers of variants identified by genome-wide association are copy-number changes, which have a greater phenotypic effect than do single nucleotide polymorphisms. This resource will guide future population genomics and genotype–phenotype studies in this classic model system.

727 citations

Journal Article
TL;DR: WormBase ParaSite (http://parasite.wormbase.org) is a portal for the analysis of helminth genomic data, covering both nematodes and platyhelminths.

455 citations

Journal Article
TL;DR: It is demonstrated that there is little variation in unique gene content across Leishmania species, but large-scale genetic heterogeneity can result through gene amplification on disomic chromosomes and variation in chromosome number.
Abstract: Leishmania parasites cause a spectrum of clinical pathology in humans ranging from disfiguring cutaneous lesions to fatal visceral leishmaniasis. We have generated a reference genome for Leishmania mexicana and refined the reference genomes for Leishmania major, Leishmania infantum, and Leishmania braziliensis. This has allowed the identification of a remarkably low number of genes or paralog groups (2, 14, 19, and 67, respectively) unique to one species. These were found to be conserved in additional isolates of the same species. We have predicted allelic variation and find that in these isolates, L. major and L. infantum have a surprisingly low number of predicted heterozygous SNPs compared with L. braziliensis and L. mexicana. We used short read coverage to infer ploidy and gene copy numbers, identifying large copy number variations between species, with 200 tandem gene arrays in L. major and 132 in L. mexicana. Chromosome copy number also varied significantly between species, with nine supernumerary chromosomes in L. infantum, four in L. mexicana, two in L. braziliensis, and one in L. major. A significant bias against gene arrays on supernumerary chromosomes was shown to exist, indicating that duplication events occur more frequently on disomic chromosomes. Taken together, our data demonstrate that there is little variation in unique gene content across Leishmania species, but large-scale genetic heterogeneity can result through gene amplification on disomic chromosomes and variation in chromosome number. Increased gene copy number due to chromosome amplification may contribute to alterations in gene expression in response to environmental conditions in the host, providing a genetic basis for disease tropism.

397 citations


Cites methods from "RATT: Rapid Annotation Transfer Too..."

  • ...The gene models and functional annotation, based on predicted orthology, were transferred from L. major to the L. mexicana U1103 genome using the Rapid Annotation Transfer Tool (Otto et al. 2011), and thereafter manually annotated using codon bias and BLAST searches against the NCBI nr database as a guide for gene prediction....

Journal Article
TL;DR: The upgraded genome of Schistosoma mansoni will form a fundamental dataset to underpin further advances in schistosome research and is consolidated into a searchable format within the GeneDB and SchistoDB databases.
Abstract: Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.

393 citations


Cites methods from "RATT: Rapid Annotation Transfer Too..."

  • ...To transfer the existing annotation to the latest reference we used RATT [22] (with the old assembly split into four parts and using options -q and -r) to define regions with synteny between...

  • ...Gene models were migrated from previous version using RATT [22]....

  • ...Figure S4: Plot showing (A) transcript length and (B) number of exons for the three different categories of gene models transferred using the Rapid Annotation Transfer Tool (RATT)....

Journal Article
TL;DR: It is shown that whole-genome sequence data reveals genetic structure within these lines not shown by multilocus typing, and suggests that drug resistance has emerged multiple times in this closely related set of lines, providing additional power to track the drug resistance and epidemiology of an important human pathogen.
Abstract: Visceral leishmaniasis is a potentially fatal disease endemic to large parts of Asia and Africa, primarily caused by the protozoan parasite Leishmania donovani. Here, we report a high-quality reference genome sequence for a strain of L. donovani from Nepal, and use this sequence to study variation in a set of 16 related clinical lines, isolated from visceral leishmaniasis patients from the same region, which also differ in their response to in vitro drug susceptibility. We show that whole-genome sequence data reveals genetic structure within these lines not shown by multilocus typing, and suggests that drug resistance has emerged multiple times in this closely related set of lines. Sequence comparisons with other Leishmania species and analysis of single-nucleotide diversity within our sample showed evidence of selection acting in a range of surface- and transport-related genes, including genes associated with drug resistance. Against a background of relative genetic homogeneity, we found extensive variation in chromosome copy number between our lines. Other forms of structural variation were significantly associated with drug resistance, notably including gene dosage and the copy number of an experimentally verified circular episome present in all lines and described here for the first time. This study provides a basis for more powerful molecular profiling of visceral leishmaniasis, providing additional power to track the drug resistance and epidemiology of an important human pathogen.

387 citations


Cites methods from "RATT: Rapid Annotation Transfer Too..."

  • ...L. infantum gene models (June 2010) were transferred to L. donovani based on sequence conservation and synteny with RATT (Rapid Annotation Transfer Tool) (Otto et al. 2011)....

  • ...Of the 8395 genes in the L. infantum genome (Peacock et al. 2007), 8252 were transferred to L. donovani using RATT (Otto et al. 2011)....

  • ...RATT: rapid annotation transfer tool....

References
Journal Article
TL;DR: The Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

43,862 citations


"RATT: Rapid Annotation Transfer Too..." refers background in this paper

  • ...Major advances have been made in data processing, particularly with the development of numerous algorithms for assembly (3,4) and alignment of short reads against a reference sequence (5,6), known as mapping....

Journal Article
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations


"RATT: Rapid Annotation Transfer Too..." refers methods in this paper

  • ...Functions are ascribed at levels of granularity that reflect the strength of sequence-similarity based evidence and are recorded as free-text descriptions or by using controlled vocabulary terms chosen from an ontology such as the Gene Ontology (9)....

Journal Article
TL;DR: A fully automated service for annotating bacterial and archaeal genomes that identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user.
Abstract: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

9,397 citations


"RATT: Rapid Annotation Transfer Too..." refers background in this paper

  • ...For instance, the RAST Server (14) or the integrated microbial genome system (15) but these are currently restricted to prokaryotes....

Journal Article
TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies; contig lengths obtained from real data sets without read pairs were in close agreement with simulated results.
Abstract: We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.

9,389 citations


"RATT: Rapid Annotation Transfer Too..." refers background in this paper

  • ...Major advances have been made in data processing, particularly with the development of numerous algorithms for assembly (3,4) and alignment of short reads against a reference sequence (5,6), known as mapping....

Journal Article
TL;DR: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes.
Abstract: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at http://www.tigr.org/software/mummer.

4,886 citations


"RATT: Rapid Annotation Transfer Too..." refers methods in this paper

  • ...First, two sequences are compared using ‘nucmer’ from the MUMmer package (17) to define sequence regions that share synteny....
