Open Access Journal Article

Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts

Abstract
Classifying transcripts as protein-coding or non-coding is challenging, especially for transcripts reconstructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, the Coding-Non-Coding Index (CNCI), which profiles adjoining nucleotide triplets to distinguish protein-coding from non-coding sequences independently of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. Applied across species, CNCI classified transcripts assembled from whole-transcriptome sequencing data with high accuracy, demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci.
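
To make the idea of "profiling adjoining nucleotide triplets" concrete, the following is a minimal sketch of how pairs of adjacent triplets could be tabulated for one reading frame of a transcript. It is not the CNCI implementation: the abstract does not describe how CNCI scores or classifies sequences, so the function names (`adjoining_triplet_counts`, `adjoining_triplet_frequencies`), the handling of ambiguous bases, and the use of non-overlapping triplets are all assumptions made for illustration.

```python
from collections import Counter


def adjoining_triplet_counts(seq, frame=0):
    """Count pairs of adjacent, non-overlapping triplets in one reading frame.

    Hypothetical helper for illustration only; not taken from the CNCI software.
    """
    # Normalize case and treat RNA input (U) as DNA (T).
    seq = seq.upper().replace("U", "T")
    # Split the sequence into complete triplets starting at the chosen frame offset.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    valid = set("ACGT")
    counts = Counter()
    for left, right in zip(triplets, triplets[1:]):
        # Assumption: skip any pair that contains an ambiguous base such as N.
        if set(left + right) <= valid:
            counts[(left, right)] += 1
    return counts


def adjoining_triplet_frequencies(seq, frame=0):
    """Convert the pair counts into relative frequencies (empty dict if no pairs)."""
    counts = adjoining_triplet_counts(seq, frame)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


if __name__ == "__main__":
    example = "ATGGCCATGGCC"
    for frame in range(3):
        print(frame, dict(adjoining_triplet_counts(example, frame)))
```

A profile of this kind, computed on known coding and non-coding sequences, could serve as input to a downstream classifier; how CNCI itself turns such profiles into a coding/non-coding call is not stated in the text above.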
