Open Access Journal Article

A novel representation of protein sequences for prediction of subcellular location using support vector machines

TLDR
A novel representation of protein sequences that involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids is proposed.
Abstract
As the number of complete genomes rapidly increases, accurate methods to automatically predict the subcellular location of proteins are increasingly useful to help their functional annotation. In order to improve the predictive accuracy of the many prediction methods developed to date, a novel representation of protein sequences is proposed. This representation involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids. For calculating the local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to consider ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. Through fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. It is concluded that considering the respective features in the N-terminal, middle, and C-terminal parts is helpful to predict the subcellular location.
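The representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the segment lengths (`n_len`, `c_len`), the equal non-overlapping split of the N-terminal part, the membership of the basic and hydrophobic classes, the distance cap `MAX_DIST`, and the reading of "twin amino acids" as two identical adjacent residues are all assumptions made for illustration, as are the function names. The resulting vector is the kind of input that could then be passed to a support vector machine.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed class membership: the abstract names the classes, not their members.
BASIC = set("KRH")
HYDROPHOBIC = set("AILMFVWC")
CLASSES = ("basic", "hydrophobic", "other")
MAX_DIST = 5  # hypothetical cap; longer gaps fall into the last bin


def split_sequence(seq, n_len=40, c_len=20, n_regions=4):
    """Return four N-terminal regions, the middle part, and the C-terminal part.

    n_len, c_len and the equal-width regions are illustrative assumptions.
    """
    c_start = max(len(seq) - c_len, n_len)  # keep the parts from overlapping
    n_part = seq[:n_len]
    bounds = [n_len * k // n_regions for k in range(n_regions + 1)]
    regions = [n_part[a:b] for a, b in zip(bounds, bounds[1:])]
    return regions + [seq[n_len:c_start], seq[c_start:]]


def composition(segment):
    """Fraction of each of the 20 standard amino acids in the segment."""
    n = len(segment) or 1
    counts = Counter(segment)
    return [counts[a] / n for a in AMINO_ACIDS]


def twin_composition(segment):
    """Fraction of adjacent pairs made of the same residue twice (e.g. 'LL')."""
    n = max(len(segment) - 1, 1)
    counts = Counter(a for a, b in zip(segment, segment[1:]) if a == b)
    return [counts[a] / n for a in AMINO_ACIDS]


def residue_class(residue):
    if residue in BASIC:
        return "basic"
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    return "other"  # also catches non-standard symbols such as 'X'


def distance_frequencies(segment):
    """Per class, the relative frequency of each gap between successive residues."""
    features = []
    for cls in CLASSES:
        positions = [i for i, a in enumerate(segment) if residue_class(a) == cls]
        gaps = [min(q - p, MAX_DIST) for p, q in zip(positions, positions[1:])]
        counts = Counter(gaps)
        n = len(gaps) or 1
        features.extend(counts[d] / n for d in range(1, MAX_DIST + 1))
    return features


def encode_protein(seq):
    """Concatenate the local features of all six segments (6 x 55 values)."""
    features = []
    for segment in split_sequence(seq.upper()):
        features += composition(segment)
        features += twin_composition(segment)
        features += distance_frequencies(segment)
    return features
```

Under these assumptions every protein maps to a fixed-length vector regardless of its sequence length, which is what a kernel classifier needs; segments that are empty for very short sequences simply contribute zeros.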


Citations
Journal Article

PSORTb 3.0

TL;DR: This work developed PSORTb version 3.0 with improved recall, higher proteome-scale prediction coverage, and new refined localization subcategories, and evaluated the most accurate SCL predictors using 5-fold cross validation plus an independent proteomics analysis.
Journal Article

Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms.

TL;DR: This protocol is a step-by-step guide on how to use the Web-server predictors in the Cell-PLoc package, a package of Web servers developed recently by hybridizing the 'higher level' approach with the ab initio approach.
Journal Article

Recent progress in protein subcellular location prediction

TL;DR: The cell is deemed to be the most basic structural and functional unit of all living organisms and often is called a "building block of life" and playing a critical role in generating energy in the eukaryotic cell.
Journal Article

Plant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization

TL;DR: A new predictor called “Plant-mPLoc” is developed by integrating the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition that has the capacity to deal with multiple-location proteins beyond the reach of any existing predictors specialized for identifying plant protein subcellular localization.
Journal Article

REVIEW: Recent advances in developing web-servers for predicting protein attributes

Kuo-Chen Chou, +1 more
- 28 Sep 2009 - 
TL;DR: In this minireview, a systematic introduction is presented to highlight the development of these web-servers by this group during the last three years.