Topic

Pseudo amino acid composition

About: Pseudo amino acid composition is a research topic. Over its lifetime, 460 publications have appeared within this topic, receiving 38,733 citations.


Papers
Journal Article
15 May 2001-Proteins
TL;DR: A remarkable improvement in prediction quality has been observed by using the pseudo-amino acid composition; its mathematical framework and biochemical implication may also have a notable impact on improving the prediction quality of other protein features.
Abstract: The cellular attributes of a protein, such as which compartment of a cell it belongs to and how it is associated with the lipid bilayer of an organelle, are closely correlated with its biological functions. The success of human genome project and the rapid increase in the number of protein sequences entering into data bank have stimulated a challenging frontier: How to develop a fast and accurate method to predict the cellular attributes of a protein based on its amino acid sequence? The existing algorithms for predicting these attributes were all based on the amino acid composition in which no sequence order effect was taken into account. To improve the prediction quality, it is necessary to incorporate such an effect. However, the number of possible patterns for protein sequences is extremely large, which has posed a formidable difficulty for realizing this goal. To deal with such a difficulty, the pseudo-amino acid composition is introduced. It is a combination of a set of discrete sequence correlation factors and the 20 components of the conventional amino acid composition. A remarkable improvement in prediction quality has been observed by using the pseudo-amino acid composition. The success rates of prediction thus obtained are so far the highest for the same classification schemes and same data sets. It has not escaped from our notice that the concept of pseudo-amino acid composition as well as its mathematical framework and biochemical implication may also have a notable impact on improving the prediction quality of other protein features.

1,731 citations
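
The abstract above describes the pseudo-amino acid composition as the 20 conventional composition components combined with a set of discrete sequence correlation factors. The following is a minimal sketch of that idea, not the paper's exact implementation. It assumes a single physicochemical scale (the Kyte-Doolittle hydropathy index, swapped in here for illustration), a weight of 0.05 for the correlation terms, and hypothetical names `pse_aac`, `lam`, and `weight`.

```python
# Sketch of a pseudo-amino acid composition (PseAAC) vector.
# Assumptions (not from the source text): one property scale
# (Kyte-Doolittle hydropathy), weight = 0.05, and these function names.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize(scale):
    """Standardize a property scale to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}


def pse_aac(sequence, lam=3, weight=0.05, scales=(KYTE_DOOLITTLE,)):
    """Return a vector of 20 + lam components that sums to 1."""
    seq = sequence.upper()
    if any(ch not in KYTE_DOOLITTLE for ch in seq):
        raise ValueError("sequence contains a non-standard residue")
    n = len(seq)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")

    norm = [normalize(s) for s in scales]

    # Conventional amino acid composition: 20 occurrence frequencies.
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]

    # Sequence correlation factors: the k-th factor averages the squared
    # property difference between residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(n - k):
            total += sum((s[seq[i]] - s[seq[i + k]]) ** 2 for s in norm) / len(norm)
        thetas.append(total / (n - k))

    # Joint normalization so the composition and correlation parts sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    vector = pse_aac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
    print(len(vector), round(sum(vector), 6))
```

With `lam=0` the function reduces to the plain 20-component amino acid composition, which carries no sequence-order information. Larger `lam` values add correlation terms between residues further apart along the chain.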

Journal Article
TL;DR: This review discusses each of five procedures as well as the introduction of pseudo amino acid composition (PseAAC), its different modes and applications, and its recent development, particularly how to use the general formulation of PseAAC to reflect the core and essential features that are deeply hidden in complicated protein sequences.

1,163 citations

Journal Article
TL;DR: The very high success rate for both the training-set proteins and the testing-set proteins, which has been further validated by a simulated analysis and a jackknife analysis, indicates that it is possible to predict the structural class of a protein according to its amino acid composition if an ideal and complete database can be established.
Abstract: A protein is usually classified into one of the following five structural classes: α, β, α+β, α/β, and ζ (irregular). The structural class of a protein is correlated with its amino acid composition. However, given the amino acid composition of a protein, how may one predict its structural class? Various efforts have been made in addressing this problem. This review addresses the progress in this field, with the focus on the state of the art, which is featured by a novel prediction algorithm and a recently developed database. The novel algorithm is characterized by a covariance matrix that takes into account the coupling effect among different amino acid components of a protein. The new database was established based on the requirement that the classes should have (1) as many nonhomologous structures as possible, (2) good quality structure, and (3) typical or distinguishable features for each of the structural classes concerned. The very high success rate for both the training-set proteins and the testing-set proteins, which has been further validated by a simulated analysis and a jackknife analysis, indicates that it is possible to predict the structural class of a protein according to its amino acid composition if an ideal and complete database can be established. It also suggests that the overall fold of a protein is basically determined by its amino acid composition.

1,055 citations
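
The abstract above mentions an algorithm built on a covariance matrix that captures coupling among amino acid components. One common way to use a covariance matrix in a classifier is the Mahalanobis distance to each class mean. The sketch below assumes that form and is not taken from the review itself. The two-dimensional training vectors and the class labels are hypothetical toy data, and all function names are illustrative. With real 20-component compositions the fractions sum to 1, so the covariance matrix is singular and one component would have to be dropped first.

```python
# Sketch of a covariance-aware nearest-class rule (Mahalanobis distance).
# Assumptions: toy 2-D feature vectors and hypothetical class labels.


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance_matrix(rows, mean):
    """Sample covariance matrix (divides by n - 1)."""
    dim = len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += diff[i] * diff[j]
    return [[value / (len(rows) - 1) for value in line] for line in cov]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - partial) / aug[i][i]
    return x


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    diff = [a - b for a, b in zip(x, mean)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


def classify(x, training):
    """Return the label whose class mean is closest in Mahalanobis distance."""
    distances = {}
    for label, rows in training.items():
        mean = mean_vector(rows)
        distances[label] = mahalanobis_sq(x, mean, covariance_matrix(rows, mean))
    return min(distances, key=distances.get)


# Hypothetical toy training set: two features per protein, two classes.
TRAINING = {
    "alpha": [(0.30, 0.10), (0.34, 0.12), (0.28, 0.08), (0.32, 0.14)],
    "beta": [(0.10, 0.30), (0.12, 0.33), (0.08, 0.29), (0.14, 0.34)],
}

if __name__ == "__main__":
    print(classify((0.29, 0.11), TRAINING))
```

Unlike a plain Euclidean distance, this rule rescales each direction by how strongly the components vary together within a class, which is one way to account for the coupling effect the abstract refers to.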

Journal Article
TL;DR: Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions and can be a complementary method to other existing methods based on sorting signals.
Abstract: Motivation: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. Results: In this paper, Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. Availability: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.

871 citations

Journal Article
TL;DR: This protocol is a step-by-step guide on how to use the Web-server predictors in the Cell-PLoc package, a package of Web servers developed recently by hybridizing the 'higher level' approach with the ab initio approach.
Abstract: Information on subcellular localization of proteins is important to molecular cell biology, proteomics, system biology and drug discovery. To provide the vast majority of experimental scientists with a user-friendly tool in these areas, we present a package of Web servers developed recently by hybridizing the 'higher level' approach with the ab initio approach. The package is called Cell-PLoc and contains the following six predictors: Euk-mPLoc, Hum-mPLoc, Plant-PLoc, Gpos-PLoc, Gneg-PLoc and Virus-PLoc, specialized for eukaryotic, human, plant, Gram-positive bacterial, Gram-negative bacterial and viral proteins, respectively. Using these Web servers, one can easily get the desired prediction results with a high expected accuracy, as demonstrated by a series of cross-validation tests on the benchmark data sets that covered up to 22 subcellular location sites and in which none of the proteins included had ≥25% sequence identity to any other protein in the same subcellular-location subset. Some of these Web servers can be particularly used to deal with multiplex proteins as well, which may simultaneously exist at, or move between, two or more different subcellular locations. Proteins with multiple locations or dynamic features of this kind are particularly interesting, because they may have some special biological functions intriguing to investigators in both basic research and drug discovery. This protocol is a step-by-step guide on how to use the Web-server predictors in the Cell-PLoc package. The computational time for each prediction is less than 5 s in most cases. The Cell-PLoc package is freely accessible at http://chou.med.harvard.edu/bioinf/Cell-PLoc.

855 citations


Network Information
Related Topics (5)
Protein structure: 42.3K papers, 3M citations, 77% related
Binding site: 48.1K papers, 2.5M citations, 73% related
Peptide sequence: 84.1K papers, 4.3M citations, 72% related
Genome: 74.2K papers, 3.8M citations, 71% related
Protein subunit: 33.2K papers, 1.7M citations, 70% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    10
2022    21
2021    17
2020    23
2019    13
2018    22