Showing papers on "Pseudo amino acid composition" published in 2006


Journal ArticleDOI
TL;DR: A novel hybridization classifier was developed by fusing many basic individual classifiers through a voting system, and it is anticipated that the powerful fusion classifier may also become a very useful high-throughput tool for characterizing other attributes of proteins according to their sequences, such as enzyme class, membrane protein type, and nuclear receptor subfamily.
Abstract: Facing the explosion of newly generated protein sequences in the post-genomic era, we are challenged to develop an automated method for quickly and reliably annotating their subcellular locations. Kno...

251 citations


Journal ArticleDOI
TL;DR: An incisive and compelling analysis was given to elucidate that the overwhelmingly high success rate obtained by the new predictor is by no means due to a trivial utilization of the GO annotations.

239 citations


Journal ArticleDOI
Xuan Xiao1, Shi-Huang Shao1, Yongsheng Ding1, Zheng-De Huang1, Kuo-Chen Chou1 
TL;DR: Many important features, which are originally hidden in the long amino acid sequences, can be clearly displayed through their cellular automata images, and many image recognition tools can be straightforwardly applied to the target aimed here.
Abstract: The avalanche of newly found protein sequences in the post-genomic era has motivated and challenged us to develop an automated method that can rapidly and accurately predict the localization of an uncharacterized protein in cells because the knowledge thus obtained can greatly speed up the process in finding its biological functions. However, it is very difficult to establish such a desired predictor by acquiring the key statistical information buried in a pile of extremely complicated and highly variable sequences. In this paper, based on the concept of the pseudo amino acid composition (Chou, K. C. PROTEINS: Structure, Function, and Genetics, 2001, 43: 246-255), the approach of cellular automata image is introduced to cope with this problem. Many important features, which are originally hidden in the long amino acid sequences, can be clearly displayed through their cellular automata images. One of the remarkable merits by doing so is that many image recognition tools can be straightforwardly applied to the target aimed here. High success rates were observed through the self-consistency, jackknife, and independent dataset tests, respectively.
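
Several entries on this page build on the pseudo amino acid composition cited above (Chou, 2001). As a rough orientation, the following is a minimal sketch of that descriptor: the 20 amino acid frequencies are extended by a few sequence-order correlation factors. It is a sketch under stated assumptions, not the implementation used in any listed paper. The function names, the default `lam` and `weight` values, and the use of a single property scale (the Kyte-Doolittle hydropathy index, swapped in here for the several physicochemical properties used in the original formulation) are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. ASSUMPTION: used as a single stand-in
# property scale; the original formulation combines several properties.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize(scale):
    """Rescale a property table to zero mean and unit standard deviation
    over the 20 standard amino acids."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(seq, lam=3, weight=0.05, scale=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    lam    -- number of sequence-order correlation tiers (hypothetical default)
    weight -- weight of the correlation terms (hypothetical default)
    """
    h = normalize(scale)
    if any(ch not in h for ch in seq):
        raise ValueError("sequence contains a non-standard residue")
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")

    # Conventional amino acid composition (sums to 1).
    counts = Counter(seq)
    freqs = [counts[a] / n for a in AMINO_ACIDS]

    # k-th tier correlation factor: mean squared property difference
    # between residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        total = sum((h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(n - k))
        thetas.append(total / (n - k))

    # Joint normalization so that all 20 + lam components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

For example, `pseaac("MKVLAAGIVGLLLAQWERTY", lam=3)` returns a 23-component vector whose entries sum to 1; the first 20 reflect composition and the last 3 reflect sequence order.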

210 citations


Journal ArticleDOI
TL;DR: The advantage of incorporating the complexity measure factor into the pseudo amino acid composition as one of its components is that it can catch the essence of the overall sequence pattern of a protein and hence more effectively reflect its sequence-order effects.
Abstract: The structural class is an important feature widely used to characterize the overall folding type of a protein. How to improve the prediction quality for protein structural classification by effectively incorporating the sequence-order effects is an important and challenging problem. Based on the concept of the pseudo amino acid composition [Chou, K. C. Proteins Struct Funct Genet 2001, 43, 246; Erratum: Proteins Struct Funct Genet 2001, 44, 60], a novel approach for measuring the complexity of a protein sequence was introduced. The advantage of incorporating the complexity measure factor into the pseudo amino acid composition as one of its components is that it can catch the essence of the overall sequence pattern of a protein and hence more effectively reflect its sequence-order effects. It was demonstrated through the jackknife cross-validation test that the overall success rate of the new approach was significantly higher than those of the others. It has not escaped our notice that the introduction of the complexity measure factor can also be used to improve the prediction quality for, among many other protein attributes, subcellular localization, enzyme family class, membrane protein type, and G-protein coupled receptor type.
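
The abstract above does not say which complexity measure was used. Purely as an illustration of what a sequence complexity factor can look like, the sketch below counts phrases in an LZ78-style parsing of the sequence: repetitive sequences split into few distinct phrases, diverse ones into many. The choice of LZ78 parsing and the function name are assumptions, not details taken from the paper.

```python
def lz78_phrase_count(seq):
    """Count phrases in an LZ78-style parsing of seq.

    The sequence is read left to right and cut whenever the current
    phrase has not been seen before. ASSUMPTION: this is one possible
    complexity measure, chosen only for illustration.
    """
    phrases = set()
    current = ""
    for ch in seq:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    # A trailing, already-seen phrase still counts as one phrase.
    return len(phrases) + (1 if current else 0)
```

A value like this could be appended to a composition vector as one extra component, which is the general idea the abstract describes.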

180 citations


Journal ArticleDOI
TL;DR: In this paper, based on the concept of representing protein samples in terms of their pseudo-amino acid composition, the fuzzy K-nearest neighbors (KNN) algorithm has been introduced to predict membrane protein types, and high success rates were observed.
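
As a hedged illustration of the fuzzy K-nearest neighbors idea mentioned above, the sketch below assigns a query vector a degree of membership in each class, with closer neighbors contributing more. It assumes crisp training labels, Euclidean distance, and a fuzzifier `m`; the function name, defaults, and data layout are hypothetical and are not taken from the paper.

```python
import math


def fuzzy_knn(train, query, k=3, m=2.0):
    """Return {label: membership} for query using fuzzy k-nearest neighbors.

    train -- list of (feature_vector, label) pairs with crisp labels
    k     -- number of neighbors (hypothetical default)
    m     -- fuzzifier > 1 controlling how fast weights decay with distance
    """
    # Keep the k training samples closest to the query.
    neighbors = sorted(
        ((math.dist(vec, query), label) for vec, label in train),
        key=lambda pair: pair[0],
    )[:k]

    # Inverse-distance weights; the small floor avoids division by zero
    # when the query coincides with a training sample.
    exponent = 2.0 / (m - 1.0)
    weighted = [(1.0 / max(d, 1e-12) ** exponent, label) for d, label in neighbors]
    total = sum(w for w, _ in weighted)

    memberships = {}
    for w, label in weighted:
        memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships
```

The predicted class is the label with the largest membership, for example `max(result, key=result.get)`, while the membership values themselves indicate how confident the assignment is.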

167 citations


Journal ArticleDOI
Chao Chen1, Yuan-Xin Tian1, Xiaoyong Zou1, Peixiang Cai1, Jinyuan Mo1 
TL;DR: A novel predictor is developed for predicting protein structural class by employing a support vector machine learning system and using a different pseudo-amino acid composition (PseAA), indicating that the current predictor featured with the PseAA may play an important complementary role to the elegant covariant discriminant predictor and other existing algorithms.

166 citations


Journal ArticleDOI
TL;DR: A novel approach called "stacked generalization" or "stacking" has been introduced, which can combine several different types of classifiers through a meta-classifier to maximize the generalization accuracy.

161 citations


Journal ArticleDOI
Chao Chen1, Xi-Bin Zhou1, Yuan-Xin Tian1, Xiaoyong Zou1, Peixiang Cai1 
TL;DR: A dual-layer support vector machine (SVM) fusion network featuring a different pseudo-amino acid composition (PseAA) is described; a significant enhancement in success rates was observed, indicating that the current approach may serve as a powerful complementary tool to other existing methods in this area.

154 citations


Journal ArticleDOI
TL;DR: The GO-PseAA predictor is very promising for predicting protein-protein interactions from protein sequences, and might become a useful vehicle for studying the network biology in the postgenomic era.
Abstract: To understand the networks in living cells, it is indispensably important to identify protein-protein interactions on a genomic scale. Unfortunately, it is both time-consuming and expensive to do so solely based on experiments due to the nature of the problem whose complexity is obviously overwhelming, just like the fact that “life is complicated”. Therefore, developing computational techniques for predicting protein-protein interactions would be of significant value in this regard. By fusing the approach based on the gene ontology and the approach of pseudo-amino acid composition, a predictor called “GO-PseAA” predictor was established to deal with this problem. As a showcase, prediction was performed on 6323 protein pairs from yeast. To avoid redundancy and homology bias, none of the protein pairs investigated has ≥40% sequence identity with any other. The overall success rate obtained by jackknife cross-validation was 81.6%, indicating the GO-PseAA predictor is very promising for predicting protein-pro

151 citations


Journal ArticleDOI
Pufeng Du1, Yanda Li1
TL;DR: A method which is based on an extended version of pseudo-amino acid composition to predict the protein localization within mitochondria and the membrane protein type for mitochondrial inner membrane proteins is developed.
Abstract: Background: Knowing the submitochondrial localization of a mitochondrial protein is an important step toward understanding its function. We develop a method based on an extended version of pseudo-amino acid composition to predict protein localization within mitochondria. This work goes one step further than predicting protein subcellular location. We also try to predict the membrane protein type for mitochondrial inner membrane proteins.

150 citations


Journal ArticleDOI
TL;DR: The improved feature extraction method is effective and feasible, and the feature vectors based on this method may contain more protein quaternary structure information and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches that are buried in the interfaces of associated subunits.
Abstract: The interaction of non-covalently bound monomeric protein subunits forms oligomers. The oligomeric proteins are superior to the monomers within the scope of functional evolution of biomacromolecules. Such complexes are involved in various biological processes and play an important role. It is highly desirable to predict oligomer types automatically from their sequences. Here, based on the concept of pseudo amino acid composition, an improved feature extraction method using a weighted auto-correlation function of amino acid residue indices, together with a Naive Bayes multi-feature fusion algorithm, is proposed and applied to predict protein homo-oligomer types. We used support vector machines (SVMs) as base classifiers in order to obtain better results. For example, the total accuracies of the A, B, C, D and E sets based on this improved feature extraction method are 77.63, 77.16, 76.46, 76.70 and 75.06% respectively in the jackknife test, which are 6.39, 5.92, 5.22, 5.46 and 3.82% higher than that of the G set based on the conventional amino acid composition method with the same SVM. Compared with Chou’s feature extraction method of incorporating the quasi-sequence-order effect, our method can increase the total accuracy at a level of 3.51 to 1.01%. The total accuracy improves from 79.66 to 80.83% by using the Naive Bayes Feature Fusion algorithm. These results show: 1) The improved feature extraction method is effective and feasible, and the feature vectors based on this method may contain more protein quaternary structure information and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches that are buried in the interfaces of associated subunits; 2) The Naive Bayes Feature Fusion algorithm and SVM can be regarded as a powerful computational tool for predicting protein homo-oligomer types.
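
To make the auto-correlation features mentioned above more concrete, here is a minimal sketch of a plain (unweighted) auto-correlation function over a sequence that has already been mapped to numeric residue index values. The abstract does not describe the weighting scheme, so none is attempted here; the function name and signature are assumptions.

```python
def autocorrelation(values, max_lag):
    """Return [r_1, ..., r_max_lag], where r_k is the mean product of
    values that are k positions apart.

    values  -- residue index values along the sequence (e.g. hydrophobicity)
    max_lag -- largest separation k to consider (hypothetical parameter)
    """
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    return [
        sum(values[i] * values[i + k] for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]
```

For instance, `autocorrelation([1, 2, 3, 4], 2)` averages the products of adjacent values for lag 1 and of values two apart for lag 2; the resulting list can be appended to the amino acid composition as extra features.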

Journal ArticleDOI
TL;DR: The concept of pseudo-amino acid composition was incorporated to represent a peptide in a mathematical framework that includes the sequence-order effect along with conventional amino acid composition, in order to formulate an in silico approach for the classification of conotoxins into superfamilies.

Journal ArticleDOI
TL;DR: It is anticipated that the novel ensemble classifier may also become a very useful vehicle in classifying other attributes of proteins according to their sequences, such as membrane protein type, enzyme family/sub‐family, G‐protein coupled receptor (GPCR) type, and structural class, among many others.
Abstract: One of the fundamental goals in cell biology and proteomics is to identify the functions of proteins in the context of compartments that organize them in the cellular environment. Knowledge of subcellular locations of proteins can provide key hints for revealing their functions and understanding how they interact with each other in cellular networking. Unfortunately, it is both time-consuming and expensive to determine the localization of an uncharacterized protein in a living cell purely based on experiments. With the avalanche of newly found protein sequences emerging in the post-genomic era, we are facing a critical challenge, that is, how to develop an automated method to quickly and reliably identify their subcellular locations so that they can be used in a timely manner for basic research and drug discovery. In view of this, an ensemble classifier was developed by the approach of fusing many basic individual classifiers through a voting system. Each of these basic classifiers was trained in a different dimension of the amphiphilic pseudo amino acid composition (Chou [2005] Bioinformatics 21: 10-19). As a demonstration, predictions were performed with the fusion classifier for proteins among the following 14 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cytoplasm, (5) cytoskeleton, (6) endoplasmic reticulum, (7) extracellular, (8) Golgi apparatus, (9) lysosome, (10) mitochondria, (11) nucleus, (12) peroxisome, (13) plasma membrane, and (14) vacuole. The overall success rates thus obtained via the resubstitution test, jackknife test, and independent dataset test were all significantly higher than those by the existing classifiers. It is anticipated that the novel ensemble classifier may also become a very useful vehicle in classifying other attributes of proteins according to their sequences, such as membrane protein type, enzyme family/sub-family, G-protein coupled receptor (GPCR) type, and structural class, among many others.
The fusion ensemble classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.
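
The fusion step described in this abstract, combining the outputs of many basic classifiers through a voting system, can be sketched roughly as follows. This is only an illustration: the function name, the optional per-classifier weights, and the tie-breaking behaviour are assumptions, and the abstract does not specify how votes were weighted.

```python
from collections import Counter


def fuse_by_vote(predictions, weights=None):
    """Return the label with the largest (optionally weighted) vote.

    predictions -- one predicted label per basic classifier
    weights     -- optional per-classifier weights (hypothetical; equal
                   weights are used when omitted)
    Ties go to the tied label that was voted for first.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if weights is None:
        weights = [1.0] * len(predictions)
    tally = Counter()
    for label, w in zip(predictions, weights):
        tally[label] += w
    return max(tally, key=tally.get)
```

For example, if three basic classifiers trained on different feature dimensions predict `"nucleus"`, `"cytoplasm"`, and `"nucleus"`, the fused prediction is `"nucleus"`.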

Journal ArticleDOI
TL;DR: The overall success rates obtained are much higher than those obtained by the other methods on the same stringent data set, indicating that the FunD-PseAA predictor may become a useful high throughput tool in bioinformatics and proteomics.

Journal ArticleDOI
15 May 2006-Proteins
TL;DR: The high jackknife success rates yielded for such a stringent dataset indicate the GO‐PseAA predictor is very powerful and might become a useful tool in bioinformatics and proteomics.
Abstract: Proteases play a vitally important role in regulating most physiological processes. Different types of proteases perform different functions with different biological processes. Therefore, it is highly desirable to develop a fast and reliable means to identify the types of proteases according to their sequences, or even just identify whether they are proteases or nonproteases. The avalanche of protein sequences generated in the postgenomic era has made this challenge even more critical and urgent. By hybridizing the gene ontology approach and pseudo amino acid composition approach, a powerful predictor called GO-PseAA predictor was introduced to address the problems. To avoid redundancy and bias, demonstrations were performed on a dataset where none of the proteins has ≥ 25% sequence identity to any other. The overall success rate thus obtained by the jackknife cross-validation test in identifying protease and nonprotease was 91.82%, and that in identifying the protease type was 85.49% among the following five types: (1) aspartic, (2) cysteine, (3) metallo, (4) serine, and (5) threonine. The high jackknife success rates yielded for such a stringent dataset indicate the GO-PseAA predictor is very powerful and might become a useful tool in bioinformatics and proteomics. Proteins 2006. © 2006 Wiley-Liss, Inc.
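
The jackknife test cited in this and several other abstracts is a leave-one-out evaluation: each sample is held out in turn, the predictor is built from the remaining samples, and the held-out sample is then predicted. A minimal sketch follows, assuming a simple nearest-neighbor rule as a stand-in predictor; the function names and data layout are hypothetical and do not reproduce the GO-PseAA predictor.

```python
import math


def nearest_neighbor_predict(train, query):
    """Stand-in predictor (ASSUMPTION): label of the closest training vector."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]


def jackknife_success_rate(samples, predict=nearest_neighbor_predict):
    """Leave-one-out success rate over (feature_vector, label) samples."""
    correct = 0
    for i, (vec, label) in enumerate(samples):
        # Build the training set from every sample except the i-th one.
        rest = samples[:i] + samples[i + 1:]
        if predict(rest, vec) == label:
            correct += 1
    return correct / len(samples)
```

Any predictor with the same `(train, query)` signature can be passed in as `predict`, so the same loop can score different classifiers on one dataset.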

Journal ArticleDOI
TL;DR: High success rates are obtained by both the self-consistency test and the jackknife test, and the method indicates that a protein's subcellular location can be predicted from the surface physicochemical characteristics of its fold.

Book ChapterDOI
16 Aug 2006
TL;DR: A new method to predict protein subcellular location based on pseudo amino acid composition and an immune genetic algorithm, which indicates that the hydrophobic patterns of a protein sequence influence its subcellular location.
Abstract: Predicting protein subcellular location with computational methods is still a hot spot in bioinformatics. In this paper, we present a new method to predict protein subcellular location, which is based on pseudo amino acid composition and an immune genetic algorithm. Hydrophobic patterns of amino acid couples and approximate entropy are introduced to construct the pseudo amino acid composition. The immune genetic algorithm (IGA) is applied to find the fittest weight factors for the pseudo amino acid composition, which are crucial in this method. As such, high success rates are obtained by both the self-consistency test and the jackknife test. More than 80% predictive accuracy is achieved in the independent dataset test. The result demonstrates that this new method is practical. The method also indicates that the hydrophobic patterns of a protein sequence influence its subcellular location.
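
Approximate entropy, one of the ingredients named in this abstract, measures how unpredictable a numeric series is: regular series score near zero, irregular ones score higher. The sketch below follows the standard definition for a numeric series; how the authors mapped protein sequences to numbers and which parameters they chose is not stated, so the defaults `m` and `r` and the function name are assumptions.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy of a numeric series.

    m -- embedding dimension (hypothetical default)
    r -- tolerance for two windows to count as similar (hypothetical default)
    """
    n = len(series)
    if n < m + 1:
        raise ValueError("series must contain at least m + 1 values")

    def phi(dim):
        # All overlapping windows of length dim.
        windows = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in windows:
            # Count windows within tolerance r (max-norm); a window always
            # matches itself, so the logarithm is well defined.
            matches = sum(
                1 for b in windows
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)
```

A constant series gives a value of zero, and a strictly alternating series gives a value close to zero, which matches the intuition that both are highly regular.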