Identification of protein coding regions in RNA transcripts

doi:10.1145/2649387.2660783

Open Access Proceedings Article

- pp 588-588

TLDR

It is demonstrated that GeneMarkS-T self-training is robust to errors in assembled transcripts, and that the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting gene starts compares favorably with other existing methods.

Abstract:

Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) is a powerful method of generating critically important data for the discovery of the structure and function of eukaryotic genes. The transcripts may or may not carry protein-coding regions. If a protein-coding region is present, it should be a continuous (spliced) open reading frame. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions, complete or incomplete, in RNA transcripts assembled from RNA-Seq reads. An important feature of GeneMarkS-T is unsupervised estimation of the algorithm's parameters, which makes several conventional steps of gene prediction protocols unnecessary, most importantly the manually curated preparation of training sets. We demonstrate that (i) GeneMarkS-T self-training is robust to errors in assembled transcripts and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting gene starts compares favorably with other existing methods.
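To make the notion of a continuous open reading frame concrete, the following is a minimal sketch of a naive forward-strand ORF scan in Python. It is not the GeneMarkS-T algorithm, which relies on self-trained statistical models. The function names `find_orfs` and `longest_orf`, the `min_codons` parameter, and the restriction to complete ORFs (ATG through an in-frame stop codon of the standard genetic code) are assumptions made for illustration only.

```python
# Illustrative sketch only: a naive ORF scan, not the GeneMarkS-T method.
# Assumes the standard genetic code and scans the forward strand only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) tuples for complete forward-strand ORFs.

    start is the index of the A in ATG; end is the exclusive index just
    past the stop codon. min_codons counts codons before the stop,
    including the start codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def longest_orf(seq):
    """Return the longest complete ORF as (start, end, frame), or None."""
    return max(find_orfs(seq), key=lambda o: o[1] - o[0], default=None)
```

A scan like this only enumerates candidate reading frames. Deciding which candidate is truly protein-coding, handling incomplete ORFs that run off the transcript ends, and choosing among alternative start codons are the parts the abstract attributes to the statistical model.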

Citations

SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification.

Manuel Tardaguila, +18 more

- 09 Feb 2018 -

Genome Research

TL;DR: SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes and shows that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms.

Diversity and evolution of the emerging Pandoraviridae family

Matthieu Legendre, +13 more

- 11 Jun 2018 -

Nature Communications

TL;DR: It is suggested that de novo gene creation could contribute to the evolution of the giant pandoravirus genomes because most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions.

EnTAP: Bringing faster and smarter functional annotation to non-model eukaryotic transcriptomes

Alexander Hart, +7 more

- 01 Mar 2020 -

Molecular Ecology Resources

TL;DR: EnTAP (Eukaryotic Non‐Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non‐model eukaryotes.

Plant genome and transcriptome annotations: from misconceptions to simple solutions

Marie E. Bolger, +4 more

- 05 Jan 2017 -

Briefings in Bioinformatics

TL;DR: A comprehensive review of typical ontologies to be used in the plant sciences, useful databases and resources used for functional annotation, what to expect from an annotated plant genome and a recipe and reference chart outlining typical steps used to annotate plant genomes/transcriptomes using publicly available resources are presented.

The transcriptome, extracellular proteome and active secretome of agroinfiltrated Nicotiana benthamiana uncover a large, diverse protease repertoire.

Friederike M. Grosse-Holz, +5 more

- 01 May 2018 -

Plant Biotechnology Journal

TL;DR: This data set increases the understanding of the plant response to agroinfiltration and indicates ways to improve a key expression platform for both plant science and molecular farming.

References

RNA-Seq: a revolutionary tool for transcriptomics

Zhong Wang, +2 more

- 01 Jan 2009 -

Nature Reviews Genetics

TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

Daniel R. Zerbino, +1 more

- 01 May 2008 -

Genome Research

TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies and is in close agreement with simulated results without read-pair information.

Prodigal: prokaryotic gene recognition and translation initiation site identification

Doug Hyatt, +7 more

- 08 Mar 2010 -

BMC Bioinformatics

TL;DR: This work developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm), which achieved good results compared to existing methods, and it is believed it will be a valuable asset to automated microbial annotation pipelines.

De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis

Brian J. Haas, +24 more

- 01 Aug 2013 -

Nature Protocols

TL;DR: This protocol provides a workflow for genome-independent transcriptome analysis leveraging the Trinity platform and presents Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes.

An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs

Marilyn Kozak

- 26 Oct 1987 -

Nucleic Acids Research

TL;DR: 5'-Noncoding sequences have been compiled from 699 vertebrate mRNAs and GCCA/GCCATGG emerges as the consensus sequence for initiation of translation in vertebrates.

Related Papers (5)

Probabilistic error correction for RNA sequencing

Hai-Son Le, +4 more

- 01 May 2013 -

Nucleic Acids Research

BlockClust: efficient clustering and classification of non-coding RNAs from short read RNA-seq profiles

Pavankumar Videm, +3 more

- 15 Jun 2014 -

Bioinformatics

LncRNApred: Classification of Long Non-Coding RNAs and Protein-Coding Transcripts by the Ensemble Algorithm with a New Hybrid Feature.

Cong Pian, +6 more

- 26 May 2016 -

PLOS ONE

ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms.

Phuong Dao, +8 more

- 01 Mar 2014 -

Bioinformatics

DeepCPP: a deep neural network based on nucleotide bias information and minimum distribution similarity feature selection for RNA coding potential prediction

Yu Zhang, +3 more

- 22 Mar 2021 -

Briefings in Bioinformatics
