Identification of protein coding regions in RNA transcripts
Shiyuyun Tang, Alexandre Lomsadze, Mark Borodovsky
- pp 588-588
TLDR
It is demonstrated that GeneMarkS-T self-training is robust to errors in assembled transcripts, and that the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting gene starts compares favorably with other existing methods.
Abstract:
Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) is a powerful method for generating critically important data for discovering the structure and function of eukaryotic genes. The transcripts may or may not carry protein-coding regions. If a protein-coding region is present, it should be a continuous (spliced) open reading frame. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions, complete or incomplete, in RNA transcripts assembled from RNA-Seq reads. An important feature of GeneMarkS-T is unsupervised estimation of the algorithm's parameters, which makes unnecessary several conventional steps of gene prediction protocols, most importantly the manual curation of training sets. We demonstrate that (i) GeneMarkS-T self-training is robust to errors in assembled transcripts and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting gene starts compares favorably with other existing methods.
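To make the notion of a continuous open reading frame in a transcript concrete, here is a minimal Python sketch that scans the three forward frames for ATG-to-stop ORFs. It is not the GeneMarkS-T algorithm: it uses no statistical model and no self-training, it reports only complete ORFs (not the incomplete coding regions the abstract mentions), and it ignores the reverse strand. The function names and the `min_codons` threshold are hypothetical, chosen only for illustration, and the standard genetic code is assumed.

```python
# Illustrative only: a naive ORF scan, not the GeneMarkS-T method.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
START_CODON = "ATG"


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for complete ATG..stop ORFs.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    Only the three forward frames are scanned.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume the search after the stop codon
    return orfs


def longest_orf(seq, min_codons=3):
    """Return the longest ORF found by find_orfs, or None if there is none."""
    orfs = find_orfs(seq, min_codons)
    return max(orfs, key=lambda o: o[1] - o[0], default=None)
```

For example, `longest_orf("CCATGAAATTTGGGTAACC")` picks out the span covering `ATGAAATTTGGGTAA`. Choosing the longest ORF is a common baseline heuristic; a statistical gene finder would instead score candidate regions with a trained model, which is where choices such as gene start prediction can differ.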
Citations
Journal Article
SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification.
Manuel Tardaguila, Lorena de la Fuente, Cristina Martí, Cécile Pereira, Francisco Jose Pardo-Palacios, Hector del Risco, Marc Ferrell, Maravillas Mellado, Marissa Macchietto, Kenneth Verheggen, Mariola J. Edelmann, Iakes Ezkurdia, Jesús Vázquez, Michael L. Tress, Ali Mortazavi, Lennart Martens, Susana Rodríguez-Navarro, Victoria Moreno-Manzano, Ana Conesa
TL;DR: SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes and shows that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms.
Journal Article
Diversity and evolution of the emerging Pandoraviridae family
Matthieu Legendre, Elisabeth Fabre, Olivier Poirot, Sandra Jeudy, Audrey Lartigue, Jean-Marie Alempic, Laure Beucher, Nadège Philippe, Lionel Bertaux, Eugene Christo-Foroux, Karine Labadie, Yohann Couté, Chantal Abergel, Jean-Michel Claverie
TL;DR: It is suggested that de novo gene creation could contribute to the evolution of the giant pandoravirus genomes because most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions.
Journal Article
EnTAP: Bringing faster and smarter functional annotation to non-model eukaryotic transcriptomes
Alexander Hart, Samuel Ginzburg, Muyang Sam Xu, Cera R Fisher, Nasim Rahmatpour, Jeffry B. Mitton, Robin Paul, Jill L. Wegrzyn
TL;DR: EnTAP (Eukaryotic Non‐Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non‐model eukaryotes.
Journal Article
Plant genome and transcriptome annotations: from misconceptions to simple solutions
TL;DR: This review presents the typical ontologies used in the plant sciences, useful databases and resources for functional annotation, what to expect from an annotated plant genome, and a recipe and reference chart outlining the typical steps for annotating plant genomes/transcriptomes with publicly available resources.
Journal Article
The transcriptome, extracellular proteome and active secretome of agroinfiltrated Nicotiana benthamiana uncover a large, diverse protease repertoire.
Friederike M. Grosse-Holz, Steven L. Kelly, Svenja Blaskowski, Farnusch Kaschani, Markus Kaiser, Renier A. L. van der Hoorn
TL;DR: This data set increases the understanding of the plant response to agroinfiltration and indicates ways to improve a key expression platform for both plant science and molecular farming.
References
Journal Article
RNA-Seq: a revolutionary tool for transcriptomics
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Journal Article
Velvet: Algorithms for de novo short read assembly using de Bruijn graphs
Daniel R. Zerbino, Ewan Birney
TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies and is in close agreement with simulated results without read-pair information.
Journal Article
Prodigal: prokaryotic gene recognition and translation initiation site identification
Doug Hyatt, Gwo Liang Chen, Philip F. LoCascio, Miriam Land, Frank W. Larimer, Loren Hauser
TL;DR: This work developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm), which achieved good results compared to existing methods, and it is believed it will be a valuable asset to automated microbial annotation pipelines.
Journal Article
De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis
Brian J. Haas, Alexie Papanicolaou, Moran Yassour, Manfred Grabherr, Philip D. Blood, Joshua C. Bowden, M. B. Couger, David Eccles, Bo Li, Matthias Lieber, Matthew D. MacManes, Michael Ott, Joshua Orvis, Nathalie Pochet, Francesco Strozzi, Nathan T. Weeks, Rick Westerman, Thomas William, Colin N. Dewey, Robert Henschel, Richard D. LeDuc, Nir Friedman, Aviv Regev
TL;DR: This protocol provides a workflow for genome-independent transcriptome analysis leveraging the Trinity platform and presents Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes.
Journal Article
An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs
TL;DR: 5'-Noncoding sequences have been compiled from 699 vertebrate mRNAs, and GCC(A/G)CCATGG emerges as the consensus sequence for initiation of translation in vertebrates.