A novel representation of protein sequences for prediction of subcellular location using support vector machines

doi:10.1110/PS.051597405

Open Access, Journal Article

Setsuro Matsuda, +5 more

- 01 Nov 2005 -

Protein Science

- Vol. 14, Iss: 11, pp 2804-2813

TL;DR

A novel representation of protein sequences that involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids is proposed.
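The distance feature named above can be illustrated with a minimal sketch. It assumes that "distance" means the gap in sequence positions between one residue of a class and the next residue of the same class, and that long gaps are pooled into a final bin. The class memberships (`BASIC`, `HYDROPHOBIC`), the cap `max_distance`, and the function names are assumptions made for illustration; the source names the three classes but does not list their members or any cap.

```python
from collections import Counter

# Assumed class memberships (not given in the source text).
BASIC = set("KRH")
HYDROPHOBIC = set("AILMFVW")


def residue_class(aa):
    """Map a one-letter amino acid code to 'basic', 'hydrophobic', or 'other'."""
    if aa in BASIC:
        return "basic"
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    return "other"


def distance_frequencies(segment, cls, max_distance=5):
    """Relative frequency of each gap between successive residues of class `cls`.

    Gaps larger than `max_distance` (a hypothetical cap) are counted in the
    last bin. Returns a list of length `max_distance`; all zeros when the
    segment holds fewer than two residues of the class.
    """
    positions = [i for i, aa in enumerate(segment) if residue_class(aa) == cls]
    gaps = [min(b - a, max_distance) for a, b in zip(positions, positions[1:])]
    counts = Counter(gaps)
    total = len(gaps)
    return [counts[d] / total if total else 0.0 for d in range(1, max_distance + 1)]
```

Calling `distance_frequencies(segment, "basic")` on one part of a sequence gives a fixed-length vector, so the vectors for different classes and parts can be concatenated into a single input for a classifier.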

Abstract:

As the number of complete genomes rapidly increases, accurate methods to automatically predict the subcellular location of proteins are increasingly useful to help their functional annotation. In order to improve the predictive accuracy of the many prediction methods developed to date, a novel representation of protein sequences is proposed. This representation involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids. For calculating the local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to consider ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. Through fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. It is concluded that considering the respective features in the N-terminal, middle, and C-terminal parts is helpful to predict the subcellular location.
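The splitting scheme in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the part lengths `n_len` and `c_len`, the choice of four equal, non-overlapping N-terminal regions, and the reading of "twin amino acids" as two identical adjacent residues are all hypothetical, since the abstract does not specify them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_sequence(seq, n_len=40, c_len=20, n_regions=4):
    """Split a sequence into N-terminal regions, a middle part, and a C-terminal part.

    `n_len`, `c_len`, and the equal-width regions are assumed values.
    """
    n_len = min(n_len, len(seq))
    c_len = min(c_len, len(seq) - n_len)
    n_term = seq[:n_len]
    middle = seq[n_len:len(seq) - c_len]
    c_term = seq[len(seq) - c_len:]
    # Integer boundaries that cut the N-terminal part into n_regions pieces.
    bounds = [i * len(n_term) // n_regions for i in range(n_regions + 1)]
    regions = [n_term[bounds[i]:bounds[i + 1]] for i in range(n_regions)]
    return regions, middle, c_term


def composition(segment):
    """Fraction of each of the 20 amino acids in the segment."""
    n = len(segment)
    return [segment.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def twin_composition(segment):
    """Fraction of adjacent positions holding two identical residues, per amino acid."""
    pairs = len(segment) - 1
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for a, b in zip(segment, segment[1:]):
        if a == b and a in counts:
            counts[a] += 1
    return [counts[aa] / pairs if pairs > 0 else 0.0 for aa in AMINO_ACIDS]


def local_features(seq):
    """Concatenate composition and twin composition over all six parts."""
    regions, middle, c_term = split_sequence(seq)
    features = []
    for part in regions + [middle, c_term]:
        features += composition(part) + twin_composition(part)
    return features
```

With these assumed settings every sequence maps to a vector of 6 parts × 40 values = 240 numbers, regardless of its length, which is the fixed-length form a support vector machine needs as input.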

Citations

Journal Article

PSORTb 3.0

Nancy Yiu-Lin Yu, +10 more

- 01 Jul 2010 -

Bioinformatics

TL;DR: This work developed PSORTb version 3.0 with improved recall, higher proteome-scale prediction coverage, and new refined localization subcategories, and evaluated the most accurate SCL predictors using 5-fold cross validation plus an independent proteomics analysis.

Journal Article

Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms.

Kuo-Chen Chou, +1 more

- 01 Jan 2008 -

Nature Protocols

TL;DR: This protocol is a step-by-step guide on how to use the Web-server predictors in the Cell-PLoc package, a package of Web servers developed recently by hybridizing the 'higher level' approach with the ab initio approach.

Journal Article

Recent progress in protein subcellular location prediction

Kuo-Chen Chou, +1 more

- 01 Nov 2007 -

Analytical Biochemistry

TL;DR: The cell is deemed to be the most basic structural and functional unit of all living organisms and often is called a "building block of life" and playing a critical role in generating energy in the eukaryotic cell.

Journal Article

Plant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization

Kuo-Chen Chou, +1 more

- 28 Jun 2010 -

PLOS ONE

TL;DR: A new predictor called “Plant-mPLoc” is developed by integrating the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition that has the capacity to deal with multiple-location proteins beyond the reach of any existing predictors specialized for identifying plant protein subcellular localization.

Journal Article

REVIEW : Recent advances in developing web-servers for predicting protein attributes

Kuo-Chen Chou, +1 more

- 28 Sep 2009 -

Natural Science

TL;DR: In this minireview, a systematic introduction is presented to highlight the development of these web-servers by this group during the last three years.

References

Journal Article

Improved tools for biological sequence comparison.

William R. Pearson, +1 more

- 01 Apr 1988 -

Proceedings of the National Academy of S...

TL;DR: Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.

Journal Article

A general method applicable to the search for similarities in the amino acid sequence of two proteins

Saul B. Needleman, +1 more

- 28 Mar 1970 -

Journal of Molecular Biology

TL;DR: A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed and it is possible to determine whether significant homology exists between the proteins to trace their possible evolutionary development.

Journal Article

Identification of common molecular subsequences.

Temple F. Smith, +1 more

- 25 Mar 1981 -

Journal of Molecular Biology

TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

Book

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

Bernhard Schölkopf, +1 more

TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods that provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.

Journal Article

Improved Prediction of Signal Peptides: SignalP 3.0

Jannick Dyrløv Bendtsen, +3 more

- 16 Jul 2004 -

Journal of Molecular Biology

TL;DR: This work describes improvements to SignalP, the currently most popular method for prediction of classically secreted proteins. SignalP consists of two different predictors based on neural network and hidden Markov model algorithms, and both components have been updated.
