Showing papers on "Pseudo amino acid composition" published in 2013

Open Access

Journal Article

propy: a tool to generate various modes of Chou’s PseAAC

Dong-Sheng Cao¹, Qing-Song Xu¹, Yi-Zeng Liang¹

01 Apr 2013-Bioinformatics

TL;DR: A freely available, open-source Python package called protein in python (propy) for calculating the widely used structural and physicochemical features of proteins and peptides from amino acid sequence; it can also easily compute these descriptors based on user-defined properties, which are automatically available from the AAindex database.

Abstract: Summary: Sequence-derived structural and physicochemical features have been frequently used for analysing and predicting structural, functional, expression and interaction profiles of proteins and peptides. To facilitate extensive studies of proteins and peptides, we developed a freely available, open source python package called protein in python (propy) for calculating the widely used structural and physicochemical features of proteins and peptides from amino acid sequence. It computes five feature groups composed of 13 features, including amino acid composition, dipeptide composition, tripeptide composition, normalized Moreau–Broto autocorrelation, Moran autocorrelation, Geary autocorrelation, sequence-order-coupling number, quasi-sequence-order descriptors, composition, transition and distribution of various structural and physicochemical properties and two types of pseudo amino acid composition (PseAAC) descriptors. These features could generally be regarded as different modes of Chou’s PseAAC. In addition, it can easily compute these descriptors based on user-defined properties, which are automatically available from the AAindex database. Availability: The python package, propy, is freely available via http://code.google.com/p/protpy/downloads/list, and it runs on Linux and MS-Windows.
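To make the first two feature groups concrete, here is a minimal Python sketch of amino acid composition (20 features) and dipeptide composition (400 features), written from the textbook definitions. It is not propy's own API, and the function names are hypothetical. As an assumption, residues outside the 20 standard amino acids still count toward the sequence length but get no feature of their own.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> dict[str, float]:
    """Percentage of each standard amino acid in seq (20 features)."""
    seq = seq.upper()
    return {aa: 100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> dict[str, float]:
    """Percentage of each ordered residue pair among adjacent pairs (400 features)."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence must have at least 2 residues")
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues get no feature
            counts[pair] += 1
    return {pair: 100.0 * c / n_pairs for pair, c in counts.items()}
```

For example, `aa_composition("ACDA")["A"]` is 50.0, and each of the three pairs AC, CD and DA in the same sequence receives one third of the dipeptide percentage.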

388 citations

Journal Article

iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition.

Yan Xu¹, Jun Ding¹, Ling-Yun Wu², Kuo-Chen Chou

University of Science and Technology Beijing¹, Chinese Academy of Sciences²

07 Feb 2013-PLOS ONE

TL;DR: It was observed that the overall cross-validation success rate achieved by iSNO-PseAAC in identifying nitrosylated proteins on an independent dataset was over 90%, indicating that the new predictor is quite promising.

Abstract: Posttranslational modifications (PTMs) of proteins are responsible for sensing and transducing signals to regulate various cellular functions and signaling events. S-nitrosylation (SNO) is one of the most important and universal PTMs. With the avalanche of protein sequences generated in the post-genomic age, it is highly desired to develop computational methods for timely identifying the exact SNO sites in proteins because this kind of information is very useful for both basic research and drug development. Here, a new predictor, called iSNO-PseAAC, was developed for identifying the SNO sites in proteins by incorporating the position-specific amino acid propensity (PSAAP) into the general form of pseudo amino acid composition (PseAAC). The predictor was implemented using the conditional random field (CRF) algorithm. As a demonstration, a benchmark dataset was constructed that contains 731 SNO sites and 810 non-SNO sites. To reduce the homology bias, none of these sites were derived from the proteins that had pairwise sequence identity to any other. It was observed that the overall cross-validation success rate achieved by iSNO-PseAAC in identifying nitrosylated proteins on an independent dataset was over 90%, indicating that the new predictor is quite promising. Furthermore, a user-friendly web-server for iSNO-PseAAC was established at http://app.aporc.org/iSNO-PseAAC/, by which users can easily obtain the desired results without the need to follow the mathematical equations involved during the process of developing the prediction method. It is anticipated that iSNO-PseAAC may become a useful high throughput tool for identifying the SNO sites, or at the very least play a complementary role to the existing methods in this area.
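The idea of a position-specific amino acid propensity can be illustrated with a short Python sketch. It assumes that every sample is a fixed-length window of residues around a candidate cysteine, and it takes the propensity of a residue at a given window position to be its frequency among positive (SNO) windows minus its frequency among negative windows. This is a simplified reading of PSAAP for illustration only: the paper's exact formulation, window length and CRF model are not reproduced, and all names are hypothetical.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(windows: list[str]) -> list[dict[str, float]]:
    """Frequency of each amino acid at each window position."""
    length = len(windows[0])
    counts = [{aa: 0 for aa in AMINO_ACIDS} for _ in range(length)]
    for window in windows:
        if len(window) != length:
            raise ValueError("all windows must have the same length")
        for j, aa in enumerate(window):
            if aa in counts[j]:  # non-standard residues are skipped
                counts[j][aa] += 1
    n = len(windows)
    return [{aa: c / n for aa, c in column.items()} for column in counts]


def psaap_matrix(pos_windows: list[str], neg_windows: list[str]) -> list[dict[str, float]]:
    """Propensity = positive-set frequency minus negative-set frequency, per position."""
    fp = position_frequencies(pos_windows)
    fn = position_frequencies(neg_windows)
    return [{aa: fp[j][aa] - fn[j][aa] for aa in AMINO_ACIDS} for j in range(len(fp))]


def encode_window(window: str, matrix: list[dict[str, float]]) -> list[float]:
    """Replace each residue by its position-specific propensity value."""
    return [matrix[j].get(aa, 0.0) for j, aa in enumerate(window)]
```

Residues enriched at a position in positive windows map to positive values and depleted ones to negative values, so the encoded window can serve as the sequence-order part of a general PseAAC vector.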

358 citations

Journal Article

iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition

Pengmian Feng, Wei Chen, Hao Lin¹, Kuo-Chen Chou²

University of Electronic Science and Technology of China¹, King Abdulaziz University²

01 Nov 2013-Analytical Biochemistry

TL;DR: It was observed that the overall success rate achieved by iHSP-PseRAAAC in identifying the functional types of HSPs among the aforementioned six types was more than 87%, which was derived by the jackknife test on a stringent benchmark dataset.
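The jackknife test cited here is leave-one-out cross-validation: each of the N samples is held out in turn, predicted by a model trained on the remaining N − 1, and the success rate is the fraction predicted correctly. A minimal Python sketch of the protocol follows; the 1-nearest-neighbour rule and Euclidean distance are hypothetical stand-ins for the paper's actual classifier and PseRAAAC feature vectors.

```python
import math


def euclidean(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_1nn(train_x: list[list[float]], train_y: list[str], x: list[float]) -> str:
    """Hypothetical stand-in classifier: label of the nearest training vector."""
    nearest = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], x))
    return train_y[nearest]


def jackknife_accuracy(features: list[list[float]], labels: list[str]) -> float:
    """Leave-one-out: predict each sample from a model built on all the others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict_1nn(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)
```

With a real predictor, the 1-NN call would be replaced by retraining the actual model on each reduced set; because every split is fixed, the jackknife gives the same result on every run.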

270 citations

Journal Article

iLoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins

Wei-Zhong Lin¹, Jian-An Fang¹, Xuan Xiao, Kuo-Chen Chou

Donghua University¹

05 Mar 2013-Molecular BioSystems

TL;DR: A new predictor, called iLoc-Animal, has been developed that can be used to deal with the systems containing both single- and multi-label animal (metazoan except human) proteins and the outcomes achieved were quite encouraging, indicating that the predictor may become a useful tool in this area.

Abstract: Predicting protein subcellular localization is a challenging problem, particularly when query proteins have multi-label features, meaning that they may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing methods can only be used to deal with the single-label proteins. Actually, multi-label proteins should not be ignored because they usually bear some special function worthy of in-depth studies. By introducing the “multi-label learning” approach, a new predictor, called iLoc-Animal, has been developed that can be used to deal with the systems containing both single- and multi-label animal (metazoan except human) proteins. Meanwhile, to measure the prediction quality of a multi-label system in a rigorous way, five indices were introduced; they are “Absolute-True”, “Absolute-False” (or “Hamming-Loss”), “Accuracy”, “Precision”, and “Recall”. As a demonstration, the jackknife cross-validation was performed with iLoc-Animal on a benchmark dataset of animal proteins classified into the following 20 location sites: (1) acrosome, (2) cell membrane, (3) centriole, (4) centrosome, (5) cell cortex, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracellular, (11) Golgi apparatus, (12) lysosome, (13) mitochondrion, (14) melanosome, (15) microsome, (16) nucleus, (17) peroxisome, (18) plasma membrane, (19) spindle, and (20) synapse, where many proteins belong to two or more locations. For such a complicated system, the outcomes achieved by iLoc-Animal for all the aforementioned five indices were quite encouraging, indicating that the predictor may become a useful tool in this area. It has not escaped our notice that the multi-label approach and the rigorous measurement metrics can also be used to investigate many other multi-label problems in molecular biology.
As a user-friendly web-server, iLoc-Animal is freely accessible to the public at the web-site http://www.jci-bioinfo.cn/iLoc-Animal.
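The five multi-label indices can be computed per protein from its set of true locations and its set of predicted locations, then averaged over all proteins. The Python sketch below uses the usual example-based definitions: Absolute-True is the exact-match rate, Absolute-False is the Hamming loss over all M location labels, Accuracy is overlap over union, and Precision and Recall divide the overlap by the predicted and true set sizes. These are assumed to match the paper's indices in spirit; its exact formulas should be checked in the paper. Scoring an empty prediction as zero precision is an additional assumption.

```python
def multilabel_metrics(
    true_sets: list[set[str]], pred_sets: list[set[str]], n_labels: int
) -> dict[str, float]:
    """Example-based multi-label indices averaged over all proteins."""
    if len(true_sets) != len(pred_sets):
        raise ValueError("need one predicted set per protein")
    totals = dict.fromkeys(
        ["absolute_true", "absolute_false", "accuracy", "precision", "recall"], 0.0
    )
    for true, pred in zip(true_sets, pred_sets):
        overlap = len(true & pred)
        union = len(true | pred)
        totals["absolute_true"] += 1.0 if true == pred else 0.0   # exact match
        totals["absolute_false"] += (union - overlap) / n_labels  # Hamming loss
        totals["accuracy"] += overlap / union if union else 1.0
        totals["precision"] += overlap / len(pred) if pred else 0.0
        totals["recall"] += overlap / len(true) if true else 0.0
    n = len(true_sets)
    return {name: value / n for name, value in totals.items()}
```

For two proteins out of M = 20 locations, one predicted exactly and one where "cytoplasm" is missed from {"cytoplasm", "nucleus"}, the sketch gives Absolute-True 0.5, Absolute-False 0.025, Accuracy 0.75, Precision 1.0 and Recall 0.75.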

232 citations

Journal Article

Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou’s pseudo amino acid composition

Yen Kuang Chen¹, Kuo Bin Li¹

National Yang-Ming University¹

07 Feb 2013-Journal of Theoretical Biology

TL;DR: A novel computational classifier predicts membrane protein types from protein sequences using sequence attributes including cationic patch sizes and the orientation and topology of transmembrane segments; most of the sequence attributes implemented in the proposed classifier are supported by evidence in the literature.

123 citations

Journal Article

GOASVM: a subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou's pseudo-amino acid composition.

Shibiao Wan¹, Man-Wai Mak¹, Sun-Yuan Kung²

Hong Kong Polytechnic University¹, Princeton University²

21 Apr 2013-Journal of Theoretical Biology

TL;DR: An efficient GO method called GOASVM is proposed that exploits the information from the GO term frequencies and distant homologs to represent a protein in the general form of Chou's pseudo-amino acid composition.

104 citations

Journal Article

Protein Remote Homology Detection by Combining Chou’s Pseudo Amino Acid Composition and Profile‐Based Protein Representation

Bin Liu¹ ², Xiaolong Wang², Quan Zou³, Qiwen Dong¹, Qingcai Chen²

Fudan University¹, Harbin Institute of Technology Shenzhen Graduate School², Xiamen University³

01 Oct 2013-Molecular Informatics

TL;DR: Motivated by the success of the pseudo amino acid composition (PseAAC) proposed by Chou, the authors applied it to protein remote homology detection; the approach achieves superior or comparable performance to current state‐of‐the‐art methods.

Abstract: Protein remote homology detection is a key problem in bioinformatics. Currently, discriminative methods such as the Support Vector Machine (SVM) achieve the best performance. The most efficient way to improve the performance of SVM-based methods is to find a general protein representation method that converts proteins of different lengths into fixed-length vectors and captures the different properties of the proteins for discrimination. The bottleneck in designing such a representation is that native proteins have different lengths. Motivated by the success of the pseudo amino acid composition (PseAAC) proposed by Chou, we applied this approach to protein remote homology detection. Some new indices derived from the amino acid index (AAIndex) database are incorporated into the PseAAC to improve the generalization ability of this method. Finally, the performance is further improved by combining the modified PseAAC with a profile-based protein representation containing the evolutionary information extracted from the frequency profiles. Our experiments on a well-known benchmark show that this method achieves superior or comparable performance to current state-of-the-art methods.
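Chou's original (type-1) PseAAC, which the approach above extends with extra AAindex properties, appends λ sequence-order correlation factors θ₁…θ_λ to the 20 composition values and normalizes everything by Σf + wΣθ. A minimal Python sketch follows. It assumes each property is given as a mapping over the 20 standard amino acids and is standardized to zero mean and unit variance before use; the defaults w = 0.05 and λ = 3 and the function names are illustrative assumptions, and the paper's new indices are not included.

```python
import math

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop: dict[str, float]) -> dict[str, float]:
    """Rescale a property to zero mean and unit (population) variance over the 20 residues."""
    values = [prop[aa] for aa in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (prop[aa] - mean) / std for aa in AMINO_ACIDS}


def pseaac_type1(
    seq: str, properties: list[dict[str, float]], lam: int = 3, w: float = 0.05
) -> list[float]:
    """20 + lam features; seq is assumed to contain only the 20 standard residues."""
    seq = seq.upper()
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in properties]

    def correlation(a: str, b: str) -> float:
        # Mean squared difference of the standardized properties of residues a and b.
        return sum((p[b] - p[a]) ** 2 for p in props) / len(props)

    # theta_k: average correlation between residues k positions apart.
    thetas = [
        sum(correlation(seq[i], seq[i + k]) for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Because the same denominator scales both parts, the 20 + λ values always sum to 1, and w controls how much weight sequence order receives relative to plain composition.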

98 citations

Journal Article

Using radial basis function on the general form of Chou's pseudo amino acid composition and PSSM to predict subcellular locations of proteins with both single and multiple sites

Chao Huang¹, Jingqi Yuan¹

Shanghai Jiao Tong University¹

01 Jul 2013-BioSystems

TL;DR: Two different feature extraction methods and two different neural network models were applied to three benchmark datasets of different kinds of proteins, i.e. datasets constructed specially for Gram-positive bacterial proteins, plant proteins and virus proteins; the results show that the RBF neural network clearly outperforms the BP neural network on these datasets regardless of which type of feature extraction is chosen.

Abstract: Prediction of protein subcellular location is a meaningful task that has attracted much attention in recent years. Many protein subcellular location predictors have been developed, but they can only deal with single-location proteins. However, some proteins may belong to two or even more subcellular locations. It is important to develop predictors that can deal with such multiplex proteins, because these proteins have extremely useful implications for both basic biological research and drug discovery. Because the number of methods dealing with multiplex proteins is limited, it is meaningful to explore new methods that can predict the subcellular locations of proteins with both single and multiple sites. Applying different feature extraction methods and different prediction algorithms to different benchmark datasets may yield more general results. In this paper, two different feature extraction methods and two different neural network models were applied to three benchmark datasets of different kinds of proteins, i.e. datasets constructed specially for Gram-positive bacterial proteins, plant proteins and virus proteins. These benchmark datasets have different numbers of location sites. The results show that the RBF neural network clearly outperforms the BP neural network on these datasets regardless of which feature extraction method is chosen.

70 citations

Journal Article

A Multilabel Model Based on Chou’s Pseudo–Amino Acid Composition for Identifying Membrane Proteins with Both Single and Multiple Functional Types

Chao Huang¹ ², Jing-Qi Yuan¹ ²

Shanghai Jiao Tong University¹, Chinese Ministry of Education²

02 Apr 2013-The Journal of Membrane Biology

TL;DR: Pseudo–amino acid composition, which has proven to be a very efficient tool for representing protein sequences, and a multilabel KNN algorithm are used to compose this prediction engine.

Abstract: Predicting membrane protein type is a meaningful task because this kind of information is very useful for explaining the function of membrane proteins. Given the explosion of newly discovered protein sequences, it is highly desirable to develop efficient computational tools for quickly and accurately predicting the membrane type of a given protein sequence. Although several membrane predictors have been developed, they can only deal with membrane proteins that belong to a single membrane type. In fact, some membrane proteins belong to two or more types. To solve this problem, a system for predicting membrane protein sequences with single or multiple types is proposed. Pseudo–amino acid composition, which has proven to be a very efficient tool for representing protein sequences, and a multilabel KNN algorithm are used to compose this prediction engine. The results of this initial study are encouraging.

61 citations

Journal Article

Predicting protein subchloroplast locations with both single and multiple sites via three different modes of Chou's pseudo amino acid compositions.

Chao Huang¹, Jing-Qi Yuan¹

Shanghai Jiao Tong University¹

21 Oct 2013-Journal of Theoretical Biology

TL;DR: The performance of the prediction models indicate that the proposed methods might be applied as a useful and efficient assistant tool for the prediction of sub-subcellular localizations.

50 citations

Journal Article

A Short Survey on Genetic Sequences, Chou’s Pseudo Amino Acid Composition and its Combination with Fuzzy Set Theory

D. N. Georgiou, Theodoros E. Karakasidis, A.C. Megaritis

13 Dec 2013-The Open Bioinformatics Journal

TL;DR: This survey presents results concerning genetic sequences and Chou's pseudo amino acid composition as well as methodologies developed based on this concept along with elements of fuzzy set theory, and emphasizes fuzzy clustering and its application in the analysis of genetic sequences.

Abstract: The study of genetic sequences is of great importance in biology and medicine. Sequence analysis and taxonomy are two major fields of application of bioinformatics. In this survey, we present results concerning genetic sequences and Chou's pseudo amino acid composition as well as methodologies developed based on this concept along with elements of fuzzy set theory, and emphasize fuzzy clustering and its application in the analysis of genetic sequences.

Journal Article

An alignment-free method to find similarity among protein sequences via the general form of Chou’s pseudo amino acid composition

Manoj Kumar Gupta¹, Rajdeep Niyogi¹, Manoj Misra¹

Indian Institute of Technology Roorkee¹

16 Jul 2013-Sar and Qsar in Environmental Research

TL;DR: This paper proposes a method to create the 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition and compares it with six other recently proposed alignment-free methods to show that the proposed method gives a more consistent biological relationship than the others.

Abstract: In this paper, we propose a method to create the 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition. The construction of the feature vector is based on the contents of amino acids, the total distance of each amino acid from the first amino acid in the protein sequence and the distribution of the 20 amino acids. The obtained cosine distance metric (also called the similarity matrix) is used to construct the phylogenetic tree by the neighbour joining method. In order to show the applicability of our approach, we tested it on three protein datasets: 1) ND5 protein sequences from nine species, 2) ND6 protein sequences from eight species, and 3) 50 coronavirus spike proteins. The results are in agreement with known history and with the output of the widely used multiple sequence alignment program ClustalW. We have also compared our phylogenetic results with six other recently proposed alignment-free methods. These comparisons show that our proposed method gives a more consistent biological relationship than the others. In addition, its time complexity is linear, and it requires less space than other alignment-free methods that use graphical representation. It should be noted that the multiple sequence alignment method has exponential time complexity.
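The distance step of this alignment-free pipeline can be sketched in Python. The feature vector below is a hypothetical 40-dimensional simplification that keeps two of the three ingredients named in the abstract (the count of each amino acid and the total distance of its occurrences from the first residue, measured as 0-based positions) and omits the distribution component, so it is not the paper's 60-dimensional vector. The resulting cosine-distance matrix is the kind of input a neighbour-joining implementation takes.

```python
import math

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def partial_feature_vector(seq: str) -> list[int]:
    """Hypothetical 40-dim vector: residue counts, then summed distances from residue 0."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    distances = [sum(i for i, r in enumerate(seq) if r == aa) for aa in AMINO_ACIDS]
    return counts + distances


def cosine_distance(u: list[int], v: list[int]) -> float:
    """1 - cosine similarity: 0 for vectors pointing the same way, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("feature vector is all zeros")
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Pairwise cosine distances between the sequences' feature vectors."""
    vectors = [partial_feature_vector(s) for s in seqs]
    return [[cosine_distance(a, b) for b in vectors] for a in vectors]
```

Building every vector costs one pass per amino acid over the sequence, which is consistent with the linear time complexity claimed for the full method.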

Journal Article

Discriminating bioluminescent proteins by incorporating average chemical shift and evolutionary information into the general form of Chou's pseudo amino acid composition

Guo-Liang Fan¹, Qian-Zhong Li¹

Inner Mongolia University¹

07 Oct 2013-Journal of Theoretical Biology

TL;DR: A novel method called auto covariance of averaged chemical shift (acACS) is proposed for extracting structural features from a sequence; combining dipeptide composition, reduced amino acid composition, evolutionary information, and acACS outperformed other feature extraction methods.

Journal Article

Predicting subchloroplast locations of proteins based on the general form of chou's pseudo amino acid composition: approached from optimal tripeptide composition

Hao Lin¹, Chen Ding¹, Lu-Feng Yuan¹, Wei Chen, Hui Ding¹, Zi-Qiang Li², Feng-Biao Guo¹, Jian Huang¹, Nini Rao¹

University of Electronic Science and Technology of China¹, Sichuan Agricultural University²

04 Apr 2013-International Journal of Biomathematics

TL;DR: A novel method predicts the subchloroplast locations of proteins from tripeptide compositions, using the binomial distribution to optimize the feature sets; the resulting predictor, ChloPred, will provide important information for theoretical and experimental research on chloroplast proteins.

Abstract: Chloroplasts are organelles found in plant cells that conduct photosynthesis. The subchloroplast locations of proteins are correlated with their functions. With a great number of protein sequences now available, it is highly desirable to develop a computational method to predict the subchloroplast locations of chloroplast proteins. In this study, we proposed a novel method to predict subchloroplast locations of proteins using tripeptide compositions. It first used the binomial distribution to optimize the feature sets. Then a support vector machine was used to predict the subchloroplast locations of proteins. The proposed method was tested on a reliable and rigorous dataset of 259 chloroplast proteins with sequence identity ≤ 25%. In the jackknife cross-validation, 92.21% of envelope proteins, 93.20% of thylakoid membrane proteins, 52.63% of thylakoid lumen proteins and 85.00% of stroma proteins were correctly identified. The overall accuracy of 88.03% is higher than that of other models. Based on this method, a predictor called ChloPred has been built and is freely available from http://cobi.uestc.edu.cn/people/hlin/tools/ChloPred/. The predictor will provide important information for theoretical and experimental research on chloroplast proteins.
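Tripeptide composition turns a sequence into 20³ = 8000 frequencies of overlapping residue triplets, which is why a feature-selection step such as the binomial-distribution filter described above is needed before training. A minimal Python sketch of the raw feature vector follows; the feature selection and SVM are not shown, the function name is hypothetical, and triplets containing non-standard residues are counted in the denominator but given no feature (an assumption).

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 8000 ordered triplets in a fixed order, so every sequence maps to the same layout.
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]


def tripeptide_composition(seq: str) -> list[float]:
    """Frequency of each of the 8000 tripeptides among overlapping triplets of seq."""
    seq = seq.upper()
    n_triplets = len(seq) - 2
    if n_triplets < 1:
        raise ValueError("sequence must have at least 3 residues")
    counts = Counter(seq[i:i + 3] for i in range(n_triplets))
    return [counts[t] / n_triplets for t in TRIPEPTIDES]
```

For a typical protein of a few hundred residues, almost all of the 8000 entries are zero, which is one reason to prune the feature set before classification.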

Journal Article

Virus-ECC-mPLoc: a multi-label predictor for predicting the subcellular localization of virus proteins with both single and multiple sites based on a general form of Chou's pseudo amino acid composition.

Xiao Wang¹, Guo-Zheng Li, Wencong Lu

Tongji University¹

28 Feb 2013-Protein and Peptide Letters

TL;DR: A new predictor, called Virus-ECC-mPLoc, has been developed that can be used to deal with the systems containing both singleplex and multiplex proteins by introducing a powerful multi-label learning approach which exploits correlations between subcellular locations and by hybridizing the gene ontology information with the dipeptide composition information.

Abstract: Protein subcellular localization aims at predicting the location of a protein within a cell using computational methods. Knowledge of subcellular localization of viral proteins in a host cell or virus-infected cell is important because it is closely related to their destructive tendencies and consequences. Prediction of viral protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular localization methods specialized for viral proteins can only deal with single-location proteins. To better reflect the characteristics of multiplex proteins, a new predictor, called Virus-ECC-mPLoc, has been developed that can be used to deal with the systems containing both singleplex and multiplex proteins by introducing a powerful multi-label learning approach which exploits correlations between subcellular locations and by hybridizing the gene ontology information with the dipeptide composition information. It can be utilized to identify viral proteins among the following six locations: (1) viral capsid, (2) host cell membrane, (3) host endoplasmic reticulum, (4) host cytoplasm, (5) host nucleus, and (6) secreted. Experimental results show that the overall success rates thus obtained by Virus-ECC-mPLoc are 86.9% for the jackknife test and 87.2% for the independent data set test, which are significantly higher than those of any of the existing predictors. As a user-friendly web-server, Virus-ECC-mPLoc is freely accessible to the public at the web-site http://levis.tongji.edu.cn:8080/bioinfo/Virus-ECC-mPLoc/.

Journal Article

Prediction of essential proteins in prokaryotes by incorporating various physico-chemical features into the general form of Chou's pseudo amino acid composition.

Aditya Narayan Sarangi¹, Mohtashim Lohani, Rakesh Aggarwal

Sanjay Gandhi Post Graduate Institute of Medical Sciences¹

30 Jun 2013-Protein and Peptide Letters

TL;DR: The result indicates that this classifier model can be used for identification of novel prokaryotic essential proteins.

Abstract: Prediction of the essential proteins of a pathogenic organism is key to potential drug target identification, because inhibition of these proteins would be fatal for the pathogen. Identification of these proteins requires complex experimental techniques that are quite expensive and time-consuming. We implemented a Support Vector Machine algorithm to develop a classifier model for in silico prediction of prokaryotic essential proteins based on the physico-chemical properties of the amino acid sequences. This classifier was designed based on a set of 10 physico-chemical descriptor vectors (DVs) and 4 hybrid DVs calculated from amino acid sequences using the PROFEAT and PseAAC servers. The classifier was trained using data sets consisting of 500 known essential and 500 non-essential proteins (n=1,000) and evaluated using an external validation set consisting of 3,462 essential proteins and 5,538 non-essential proteins (n=9,000). The performances of individual DV sets were evaluated. DV set 13, which is the combination of the composition, transition and distribution descriptor set and the hybrid autocorrelation descriptor set, provided an accuracy of 91.2% in 10-fold cross-validation of the training set and an accuracy of 89.7% on the external validation set, and of 91.8% and 88.1% using a different yeast protein dataset. Our results indicate that this classification model can be used for identification of novel prokaryotic essential proteins.
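Composition, transition and distribution (CTD) descriptors, one half of the best-performing DV set 13, can be sketched in Python for a single property. The sketch assumes the commonly used three-class hydrophobicity split (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW) and one particular rounding rule for the distribution quantiles; PROFEAT repeats this over several properties, and its exact conventions may differ. The hybrid autocorrelation descriptors are not shown, and the function name is hypothetical.

```python
import math

# Assumed three-class hydrophobicity grouping: 1 = polar, 2 = neutral, 3 = hydrophobic.
GROUPS = {"1": "RKEDQN", "2": "GASTPHY", "3": "CLVIMFW"}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def ctd(seq: str) -> dict[str, float]:
    """21 CTD features for one property; seq must use only the 20 standard residues."""
    s = "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper())
    n = len(s)
    features: dict[str, float] = {}

    # Composition: fraction of residues falling in each group.
    for g in "123":
        features[f"C{g}"] = s.count(g) / n

    # Transition: fraction of adjacent pairs that switch between two groups, in either order.
    pairs = [s[i:i + 2] for i in range(n - 1)]
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        switches = sum(p in (a + b, b + a) for p in pairs)
        features[f"T{a}{b}"] = switches / max(n - 1, 1)

    # Distribution: position (as % of length) of the first, 25%, 50%, 75% and last
    # residue of each group; this quantile rounding is one convention among several.
    for g in "123":
        positions = [i + 1 for i, c in enumerate(s) if c == g]
        for q in (0.0, 0.25, 0.5, 0.75, 1.0):
            key = f"D{g}_{int(q * 100)}"
            if not positions:
                features[key] = 0.0
            else:
                k = max(math.ceil(q * len(positions)), 1)
                features[key] = 100.0 * positions[k - 1] / n
    return features
```

Each property contributes 3 + 3 + 15 = 21 numbers, so a descriptor set built over several properties grows linearly with the number of properties used.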

Journal Article

Using the concept of Chou's pseudo amino acid composition to predict protein solubility: an approach with entropies in information theory.

Niu Xiaohui¹, Li Nana¹, Xia Jingbo¹, Chen Dingyan², Peng Yuehua², Xiao Yang², Wei Weiquan², Wang Dongming², Wang Zeng-zhen²

Huazhong Agricultural University¹, Huazhong University of Science and Technology²

07 Sep 2013-Journal of Theoretical Biology

TL;DR: In this study, entropy from information theory was introduced as an additional predictive factor in the model, which significantly improved the performance of the predictive method.
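As an illustration of an information-theoretic sequence feature, the Shannon entropy of a sequence's amino acid composition, H = −Σ pₐ log₂ pₐ, can be computed as below. Which entropies the paper actually uses, and how they enter its model, is not stated in this summary, so this is a generic sketch with a hypothetical function name.

```python
import math
from collections import Counter


def composition_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the residue frequency distribution of seq."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    # Each residue type contributes -p * log2(p); absent residues contribute nothing.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())
```

A homopolymer such as "AAAA" has entropy 0, while a sequence using all 20 residues equally often reaches the maximum of log₂ 20 ≈ 4.32 bits.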

Journal Article

Swfoldrate: Predicting protein folding rates from amino acid sequence with sliding window method

Xiang Cheng, Xuan Xiao, Zhi-cheng Wu, Pu Wang, Wei-zhong Lin

01 Jan 2013-Proteins

TL;DR: Long‐range and short‐range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method, which can predict protein folding rates from the amino acid sequence alone, without the aid of any structural class information.

Abstract: Protein folding is the process by which a protein proceeds from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, the long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. The optimal feature set was selected by combining forward feature selection with sequential backward selection. The method was evaluated on a large dataset using the jackknife cross-validation test. The predictor, built on numerous physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, obtained excellent agreement between predicted and experimentally observed folding rates, with a correlation coefficient of 0.9313 and a standard error of 2.2692. The prediction server is freely available at http://www.jci-bioinfo.cn/swfrate/input.jsp.

Journal Article

Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning

Rakesh Kaundal¹, Sitanshu Sekhar Sahu¹, Ruchi Verma¹, Tyler Weirick¹

Oklahoma State University–Stillwater¹

09 Oct 2013-BMC Bioinformatics

TL;DR: This work compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity-search and machine learning.

Abstract: Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color etc. They are essential organelles of the plant cell, also present in algae. Recent advances in genomic technology and sequencing efforts are generating a huge amount of DNA sequence data every day. The predicted proteome of these genomes needs annotation at a faster pace. In view of this, one such annotation need is to develop an automated system that can distinguish between plastid and non-plastid proteins accurately, and further classify plastid-types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity-search and machine learning. In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing the plastid vs. non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features (amino acid composition, dipeptide composition, the pseudo amino acid composition, Nterminal-Center-Cterminal composition and the protein physicochemical properties) are used to develop SVM models. Overall, the dipeptide composition-based module shows the best performance with an accuracy of 86.80% and Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and 78.60% with a MCC of 0.44 in phase-II. On independent test data, this model also performs better with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively.
The similarity-based PSI-BLAST module shows very low performance with about 50% prediction accuracy for distinguishing plastid vs. non-plastids and only 20% in classifying various plastid-types, indicating the need and importance of machine learning algorithms. The current work is a first attempt to develop a methodology for classifying various plastid-type proteins. The prediction modules have also been made available as a web tool, PLpred available at http://bioinfo.okstate.edu/PLpred/ for real time identification/characterization. We believe this tool will be very useful in the functional annotation of various genomes.

Journal Article

Locating apoptosis proteins by incorporating the signal peptide cleavage sites into the general form of Chou's Pseudo amino acid composition

Yufang Qin¹, Li Zheng², Jifeng Huang²

College of Information Technology¹, Shanghai Normal University²

05 Jun 2013-International Journal of Quantum Chemistry

TL;DR: A new apoptosis proteins localization algorithm, named PSSP, is proposed based on the predicted cleavage sites of primary protein sequences, which demonstrates that the total accuracies by this approach are comparable to existing methods.

Abstract: Apoptosis proteins play an essential role in the development and homeostasis of an organism. Accurate prediction of the subcellular location of apoptosis proteins is helpful for understanding the mechanism of programmed cell death and their biological functions. In this article, a new apoptosis protein localization algorithm, named PSSP, is proposed based on the predicted cleavage sites of primary protein sequences. First, protein chains are divided into N-terminal signal parts and mature protein parts according to their cleavage sites as predicted by SignalP. Then, the amino acid composition (ACC) of each subsequence, together with the pseudo-ACC and stereochemical properties of the whole chain, is extracted to represent a given protein sequence. Jackknife tests with a support vector machine on three widely used apoptosis protein datasets (ZD98, ZW225, and CL317) showed total accuracies of 93.9, 87.6, and 91.5%, respectively. In addition, an independent nonapoptosis benchmark dataset (NNPSL) was used to evaluate the performance of this method, and the predictive accuracies for eukaryotic and prokaryotic proteins are comparable to those of existing methods. © 2013 Wiley Periodicals, Inc.
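
The PSSP splitting step can be sketched as below, assuming the cleavage position has already been obtained (for example, from SignalP output) and is given as the number of residues in the N-terminal signal part. Only the per-segment amino acid composition is shown; the pseudo-ACC and stereochemical features of the whole chain, and the SVM itself, are omitted. `split_composition` is a hypothetical name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(segment):
    """Fraction of each standard residue in a segment (all zeros for an empty segment)."""
    counts = [segment.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def split_composition(seq, cleavage_site):
    """40-dim vector: AAC of the signal part followed by AAC of the mature part.

    cleavage_site is the number of residues before the predicted cleavage
    (0 when no signal peptide is predicted); obtaining it is outside this sketch.
    """
    seq = seq.upper()
    if not 0 <= cleavage_site <= len(seq):
        raise ValueError("cleavage_site must lie within the sequence")
    signal_part, mature_part = seq[:cleavage_site], seq[cleavage_site:]
    return composition(signal_part) + composition(mature_part)
```

Keeping the two compositions in separate halves of the vector lets a classifier weight signal-peptide residues differently from mature-chain residues.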

Book Chapter•DOI•

Predicting G-protein-coupled receptors families using different physiochemical properties and pseudo amino acid composition.

Zia ur Rehman¹, Muhammad Tayyeb Mirza¹, Asifullah Khan¹, Henri Xhaard²•Institutions (2)

Pakistan Institute of Engineering and Applied Sciences¹, University of Helsinki²

01 Jan 2013-Methods in Enzymology

TL;DR: A hybrid feature extraction strategy is shown to be suitable for representing GPCRs and to exploit the discrimination capability of GPCR amino acid sequences in the spatial as well as the transform domain.

Abstract: G-protein-coupled receptors (GPCRs) initiate signaling pathways via trimeric guanine nucleotide-binding proteins. GPCRs are classified based on their ligand-binding properties and molecular phylogenetic analyses. Nonetheless, the latter analyses are in most cases dependent on multiple sequence alignments, which themselves depend on human intervention and expertise. Alignment-free classifications of GPCR sequences, in addition to being unbiased, have many applications, from uncovering hidden physicochemical parameters shared among specific groups of receptors to use in automated workflows for large-scale molecular modeling. Current alignment-free classification methods, however, do not reach full accuracy. This chapter discusses how GPCR amino acid sequences can be classified using pseudo amino acid composition and a multiscale energy representation of different physicochemical properties of amino acids. A hybrid feature extraction strategy is shown to be suitable for representing GPCRs and to exploit the discrimination capability of GPCR amino acid sequences in the spatial as well as the transform domain. Classification strategies such as the support vector machine and the probabilistic neural network are then discussed with regard to GPCR classification. The GPCR-Hybrid web predictor is also discussed.
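
One common way to build a multiscale energy representation is to map the sequence to a numeric property signal and apply a discrete wavelet transform. The sketch below assumes the Kyte-Doolittle hydropathy scale and an orthonormal Haar wavelet, and takes the mean squared coefficient at each level as that level's energy; the chapter's actual property set, wavelet, and energy definition may differ.

```python
import math

# Kyte-Doolittle hydropathy (an assumed stand-in for the chapter's property set)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def haar_step(signal):
    """One level of the orthonormal Haar DWT; odd-length input is padded by repeating its last value."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    r = 1 / math.sqrt(2)
    approx = [r * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [r * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def multiscale_energy(seq, levels=4, scale=KD):
    """Mean squared detail coefficient per level, plus that of the final approximation."""
    signal = [scale[a] for a in seq.upper() if a in scale]
    if not signal:
        raise ValueError("no residues with a property value")
    energies = []
    for _ in range(levels):
        if len(signal) < 2:
            break  # too short to decompose further
        signal, detail = haar_step(signal)
        energies.append(sum(d * d for d in detail) / len(detail))
    energies.append(sum(a * a for a in signal) / len(signal))
    return energies
```

Detail energies capture property fluctuations at successively coarser scales (the transform domain), which can be concatenated with PseAAC features (the spatial domain) to form a hybrid vector.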

Journal Article•DOI•

MitProt-Pred

Muhammad Tayyeb Mirza¹, Asifullah Khan¹, Muhammad Atif Tahir¹, Yeon Soo Lee²•Institutions (2)

Pakistan Institute of Engineering and Applied Sciences¹, Catholic University of Daegu²

01 Oct 2013-Computers in Biology and Medicine

TL;DR: MitProt-Pred is developed that utilizes Bi-profile Bayes, Pseudo Average Chemical Shift, Split Amino Acid Composition, and Pseudo Amino Acid Composition based features of the protein sequences to achieve significantly improved prediction performance on two standard datasets.
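
Split amino acid composition (and the similar N-terminal-Center-C-terminal composition used by PLpred) computes separate compositions for sequence segments. In this sketch the default segment lengths of 25 residues at each terminus are assumptions chosen for illustration, not values taken from MitProt-Pred, and `split_aac` is a hypothetical name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(segment):
    """Fraction of each standard residue in a segment (all zeros for an empty segment)."""
    counts = [segment.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def split_aac(seq, n_len=25, c_len=25):
    """60-dim vector: composition of the N-terminal, middle and C-terminal segments."""
    seq = seq.upper()
    if len(seq) < n_len + c_len:
        raise ValueError("sequence shorter than n_len + c_len")
    n_term = seq[:n_len]
    middle = seq[n_len:len(seq) - c_len]
    c_term = seq[len(seq) - c_len:]  # avoids seq[-0:] returning the whole sequence
    return composition(n_term) + composition(middle) + composition(c_term)
```

Because targeting signals for mitochondria sit mostly near the N-terminus, giving that segment its own composition keeps its signal from being diluted by the rest of the chain.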

Journal Article•DOI•

Protein structural classification based on pseudo amino acid composition using SVM classifier

Zbigniew Krajewski¹, Ewaryst Tkacz¹•Institutions (1)

Silesian University of Technology¹

01 Jan 2013-Biocybernetics and Biomedical Engineering

TL;DR: Structural classification with a support vector machine (SVM) classifier on amino acid composition and pseudo amino acid composition features was applied in different variants to avoid redundancy and to make maximal use of the available data.
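
Chou's type-I PseAAC appends λ sequence-order correlation factors to the 20 normalized amino acid frequencies, and the whole vector is scaled to sum to 1. The sketch below follows that general form but, as a simplifying assumption, uses a single standardized property (Kyte-Doolittle hydropathy) in place of Chou's original set of properties; the weight `w = 0.05` and `lam = 3` are illustrative defaults.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy; an assumed stand-in for Chou's original property set
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def standardize(scale):
    """Rescale a property to zero mean and unit population SD over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}

def pseaac(seq, lam=3, w=0.05, scale=KD):
    """Return a (20 + lam)-dimensional type-I PseAAC vector."""
    h = standardize(scale)
    residues = [a for a in seq.upper() if a in h]  # drop non-standard letters
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k positions apart
    thetas = [
        sum((h[residues[i + k]] - h[residues[i]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

The resulting vectors are the kind of input an SVM would receive; a homopolymer such as `"AAAAA"` yields all correlation factors equal to 0, so its vector reduces to plain composition.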

Patent•

Membrane protein sub-cell positioning method based on complex space multi-view feature fusion

Yu Dongjun, Hu Jun, Yang Jian, Xiaowei Wu, Hong-Bin Shen, Yong Qi, Tang Zhenmin, Yang Jingyu

25 Sep 2013

TL;DR: In this invention, a membrane protein sub-cell positioning method based on complex space multi-view feature fusion is proposed: pseudo amino acid composition features of a protein sequence and position-specific scoring matrix features based on an autocorrelation transform are extracted, and the two kinds of features are combined in parallel into a feature vector in a complex space.

Abstract: The invention discloses a membrane protein sub-cell positioning (subcellular localization) method based on complex space multi-view feature fusion. First, pseudo amino acid composition features of a protein sequence and position-specific scoring matrix features based on an autocorrelation transform are extracted; second, the two kinds of features are combined in parallel into a feature vector in a complex space; third, dimension reduction is performed on the combined complex features using the general principal component analysis method to remove noise; finally, the fused features are classified by a K-nearest neighbor classifier based on optimized evidence theory, and the sub-cellular position is determined. The method has two advantages: the complex space multi-view feature fusion technique extracts the discriminative features of the protein sequence effectively, and the evidence-theory-based K-nearest neighbor classifier improves the accuracy of membrane protein sub-cell positioning.
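
The PSSM autocorrelation step and the parallel complex-space combination can be sketched as follows. Assumptions: the PSSM is a list of L rows of 20 scores, the autocorrelation uses mean-centred columns (one of several variants in the literature), and each view is scaled to unit length before being placed in the real and imaginary parts, with the shorter vector zero-padded. The PCA step and the evidence-theoretic KNN classifier are not shown, and all function names are hypothetical.

```python
import math

def pssm_autocorrelation(pssm, max_lag=2):
    """Autocorrelation of each mean-centred PSSM column for lags 1..max_lag."""
    n = len(pssm)
    if n <= max_lag:
        raise ValueError("PSSM must have more rows than max_lag")
    cols = len(pssm[0])
    means = [sum(row[j] for row in pssm) / n for j in range(cols)]
    feats = []
    for lag in range(1, max_lag + 1):
        for j in range(cols):
            s = sum((pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                    for i in range(n - lag))
            feats.append(s / (n - lag))
    return feats

def unit_length(v):
    """Scale a vector to unit Euclidean norm (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def complex_fusion(real_view, imag_view):
    """Parallel fusion: z_k = a_k + i*b_k, zero-padding the shorter view."""
    a, b = unit_length(real_view), unit_length(imag_view)
    size = max(len(a), len(b))
    a += [0.0] * (size - len(a))
    b += [0.0] * (size - len(b))
    return [complex(x, y) for x, y in zip(a, b)]
```

Here the PseAAC vector would be passed as `real_view` and the PSSM autocorrelation vector as `imag_view`; normalizing first keeps one view from dominating simply because its values are larger.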

Journal Article•DOI•

Analyzes of the similarities of protein sequences based on the pseudo amino acid composition

Yan-ping Zhang¹, Ji-shuo Ruan¹, Ping-an He²•Institutions (2)

Nankai University¹, Zhejiang Sci-Tech University²

18 Dec 2013-Chemical Physics Letters

TL;DR: In this paper, the occurrence frequencies of the 20 amino acids and a new numerical characteristic of a 2D graphical representation based on three physicochemical property indices were used as pseudo amino acid components.
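
Composition-based similarity analysis reduces each sequence to a fixed-length vector and compares the vectors by a distance. The sketch below assumes plain amino acid composition and Euclidean distance; the paper's additional 2D-graphical characteristics are not reproduced, so it only illustrates the general idea that smaller distances indicate more similar sequences.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each standard residue in the sequence."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def distance_matrix(seqs):
    """Pairwise Euclidean distances between composition vectors (0 = identical composition)."""
    vecs = [composition(s) for s in seqs]
    return [[math.dist(u, v) for v in vecs] for u in vecs]
```

Composition alone ignores residue order (`"AC"` and `"CA"` are at distance 0), which is precisely the gap that pseudo amino acid components are meant to fill.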

Journal Article•DOI•

Prediction of Membrane Protein Types Using Pseudo-Amino Acid Composition and Ensemble Classification

Maqsood Hayat, Asifullah Khan

01 Jan 2013-International Journal of Computer and Electrical Engineering

TL;DR: An ensemble classification approach is developed using K-nearest neighbor and Probabilistic Neural Network as the basic learning mechanisms; the success rates obtained on the self-consistency, jackknife, and independent dataset tests are quite promising, indicating that the ensemble classifier may become a useful, high-performance tool for identifying membrane proteins and their types.

Abstract:  Abstract—Predicting membrane protein types is an important and challenging research in current molecular and cellular biology. The knowledge of membrane proteins types often provides crucial hints for determining the function of uncharacterized membrane proteins. It is thus highly desirable to develop an automated method that can serve as a high throughput tool in identifying the types of newly found membrane proteins by their primary sequence information only. In this paper, features are extracted from membrane protein sequences using pseudo-amino acid (PseAA) composition. An ensemble classification approach is developed using K-nearest neighbor and Probabilistic Neural Network as the basic learning mechanisms. Each basic classifier is trained using PseAA composition with different tiers. The success rate has been obtained by the ensemble classifier on all the tests such as self-consistency, jackknife, and independent dataset test is quite promising and indicating that the ensemble classifier may become a useful and high performance tool in identifying membrane proteins and their types.

Proceedings Article•DOI•

Predicting the Subcellular Localization of Proteins with Multiple Sites Based on N-Terminal Signals

Xumi Qu¹, Yuehui Chen¹, Shanping Qiao¹•Institutions (1)

University of Jinan¹

07 Dec 2013

TL;DR: This study divides a protein sequence into two parts according to its N-terminal sorting signals, extracts the pseudo amino acid composition features of each part, and uses the multi-label KNN (ML-KNN) to handle proteins that have two, three, or even more locations.

Abstract: Subcellular localization is an important attribute of proteins in bioinformatics, closely related to their functions, signal transduction, and biological processes. Great progress has been made in this field in recent years. However, some shortcomings remain in the prediction methods: for example, the extracted feature information is not complete enough to achieve higher prediction accuracy, and some important protein information and the correlations along the amino acid sequence are usually ignored. Moreover, some proteins have two, three, or even more locations but have been treated as having only one. In this study, we divide a protein sequence into two parts according to its N-terminal sorting signals and extract the pseudo amino acid composition features of each part. We then use the multi-label KNN (ML-KNN) to handle proteins that have two, three, or even more locations. The jackknife test results are satisfactory.
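
ML-KNN (Zhang and Zhou's multi-label k-nearest neighbour method) decides each label separately with a maximum a posteriori rule based on how many of a sample's k neighbours carry that label. The sketch below is a compact, unoptimized Python version, not the authors' code; it assumes Euclidean distance, binary label vectors (one entry per location) and a smoothing constant `s = 1`.

```python
import math

def nearest(X, x, k, exclude=None):
    """Indices of the k vectors in X closest to x, optionally excluding one index."""
    candidates = [i for i in range(len(X)) if i != exclude]
    candidates.sort(key=lambda i: math.dist(X[i], x))
    return candidates[:k]

class MLKNN:
    def __init__(self, k=3, s=1.0):
        self.k, self.s = k, s  # s: smoothing constant

    def fit(self, X, Y):
        """X: feature vectors; Y: binary label vectors (one entry per location)."""
        m, q, k, s = len(X), len(Y[0]), self.k, self.s
        self.X, self.Y = X, Y
        # prior probability that a sample carries label l
        self.prior = [(s + sum(y[l] for y in Y)) / (2 * s + m) for l in range(q)]
        # c1[l][j] / c0[l][j]: samples with / without label l whose neighbours include j with label l
        c1 = [[0] * (k + 1) for _ in range(q)]
        c0 = [[0] * (k + 1) for _ in range(q)]
        for i in range(m):
            neighbours = nearest(X, X[i], k, exclude=i)
            for l in range(q):
                j = sum(Y[n][l] for n in neighbours)
                (c1 if Y[i][l] else c0)[l][j] += 1

        def likelihood(c):
            return [[(s + c[l][j]) / (s * (k + 1) + sum(c[l])) for j in range(k + 1)]
                    for l in range(q)]

        self.like1, self.like0 = likelihood(c1), likelihood(c0)
        return self

    def predict(self, x):
        """0/1 vector: label l is assigned when its posterior beats that of its absence."""
        neighbours = nearest(self.X, x, self.k)
        labels = []
        for l, prior in enumerate(self.prior):
            j = sum(self.Y[n][l] for n in neighbours)
            with_label = prior * self.like1[l][j]
            without_label = (1 - prior) * self.like0[l][j]
            labels.append(1 if with_label > without_label else 0)
        return labels
```

Since each location is decided independently, a query can receive several 1s, which is how proteins with two or more locations are handled.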

Book Chapter•DOI•

Prediction of Protein Structural Class by Functional Link Artificial Neural Network Using Hybrid Feature Extraction Method

Bishnupriya Panda¹, Ambika Prasad Mishra¹, Babita Majhi², Minakhi Rout¹•Institutions (2)

Siksha O Anusandhan University¹, Central University, India²

19 Dec 2013

TL;DR: Chou's pseudo amino acid composition along with the amphiphilic correlation factor and the spectral characteristics of the protein has been used to represent protein data, creating a statistical framework for structural class prediction.

Abstract: Over the last few decades, accurate prediction of protein structural class has been a challenging problem, and an efficient, meaningful representation of the protein molecule plays a significant role in it. In this paper, Chou's pseudo amino acid composition along with the amphiphilic correlation factor and the spectral characteristics of the protein has been used to represent protein data. A protein sample is thus represented by a set of discrete components that incorporate both the sequence-order and the sequence-length effects. On the basis of this statistical framework, a simple functional link artificial neural network has been used for structural class prediction.
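
Chou's amphiphilic PseAAC uses a pair of correlation factors per tier, one from hydrophobicity and one from hydrophilicity, giving a (20 + 2λ)-dimensional vector. In this sketch the Kyte-Doolittle and Hopp-Woods scales stand in for the hydrophobicity and hydrophilicity values, and `w = 0.05` is an assumed default; the spectral features and the neural network are not shown.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy (assumed hydrophobicity scale)
HYDROPHOBICITY = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
                  "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
                  "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
# Hopp-Woods hydrophilicity (assumed hydrophilicity scale)
HYDROPHILICITY = {"A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0, "Q": 0.2, "E": 3.0,
                  "G": 0.0, "H": -0.5, "I": -1.8, "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5,
                  "P": 0.0, "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5}

def standardize(scale):
    """Rescale a property to zero mean and unit population SD over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}

def am_pseaac(seq, lam=2, w=0.05):
    """Return a (20 + 2*lam)-dimensional amphiphilic PseAAC vector."""
    h1, h2 = standardize(HYDROPHOBICITY), standardize(HYDROPHILICITY)
    residues = [a for a in seq.upper() if a in h1]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]
    taus = []
    for k in range(1, lam + 1):
        for h in (h1, h2):  # odd index: hydrophobicity, even index: hydrophilicity
            taus.append(sum(h[residues[i]] * h[residues[i + k]] for i in range(n - k)) / (n - k))
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]
```

Unlike the type-I factors, these products can be negative, so the weight should stay small enough that the denominator remains positive.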

Journal Article•DOI•

A Novel Pseudo Amino Acid Composition for Predicting Subcellular Location of Proteins

Wangren Qiu, Xuan Xiao, Lidong Wang, Dianxuan Gong

03 Jan 2013-Journal of Computers

TL;DR: A 20-dimensional CGR-walk mode is proposed for representing protein samples, and the comparison results indicate that the present method may at least serve as an alternative to the existing predictors in this field.

Abstract: Information on the subcellular localization of proteins plays a vitally important role in molecular cell biology, proteomics and drug discovery. In this field, finding the most suitable representation for a protein sample is one of the most crucial steps. Inspired by the modes of pseudo amino acid composition (PAA), the cellular automaton image (CAI) for proteins and the chaos game representation (CGR) for DNA sequences, a 20-dimensional CGR-walk mode for representing protein samples is proposed. In the proposed model, the sequence-order effect is discussed and manifested as a point in the 20-dimensional space. The trajectory of the protein sample is then projected onto each of the twenty amino acids; in other words, a protein sample is expressed as a 20-dimensional vector. The proposed mode is then applied to four protein datasets. The comparison results indicate that the present method may at least serve as an alternative to the existing predictors in this field.
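
A chaos-game walk in 20 dimensions can be sketched by treating each amino acid as a vertex of the simplex and moving halfway toward the current residue's vertex at every step. The details here are assumptions, not the paper's exact definition: vertices are unit basis vectors, the walk starts at the simplex centre, and "projecting the track" is taken to mean averaging each coordinate over the trajectory. `cgr_walk_vector` is a hypothetical name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def cgr_walk_vector(seq):
    """Average position of a 20-dimensional chaos-game walk over a protein sequence."""
    dim = len(AMINO_ACIDS)
    index = {a: i for i, a in enumerate(AMINO_ACIDS)}
    point = [1.0 / dim] * dim  # start at the centre of the simplex
    total = [0.0] * dim
    steps = 0
    for residue in seq.upper():
        if residue not in index:
            continue  # skip non-standard letters
        vertex = index[residue]
        # move halfway towards this residue's vertex (a unit basis vector)
        point = [(p + (1.0 if i == vertex else 0.0)) / 2 for i, p in enumerate(point)]
        total = [t + p for t, p in zip(total, point)]
        steps += 1
    if steps == 0:
        raise ValueError("no standard residues in sequence")
    return [t / steps for t in total]
```

Because each position depends on all earlier residues, `"AC"` and `"CA"` map to different vectors even though their compositions are identical, which is the sequence-order effect the mode aims to capture.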

Development and performance evaluation of FLANN based model for protein structural class prediction

Bishnupriya Panda, Ambika Prasad Mishra, Babita Majhi, Minakhi Rout

01 Jan 2013

TL;DR: Chou's pseudo amino acid composition along with the amphiphilic correlation factor has been used to represent protein data, and a simple functional link artificial neural network has been used for structural class prediction.
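
A functional link artificial neural network (FLANN) replaces hidden layers with a fixed nonlinear expansion of each input, followed by a single trainable layer. This sketch assumes a trigonometric expansion, a softmax output trained by stochastic gradient descent on cross-entropy, and zero-initialized weights; the papers' expansion order, activation, and training rule may differ, and the class and function names are hypothetical.

```python
import math

def functional_expansion(x, order=2):
    """Map each x_i to [x_i, sin(pi x_i), cos(pi x_i), ..., sin(order pi x_i), cos(order pi x_i)]."""
    out = []
    for xi in x:
        out.append(xi)
        for n in range(1, order + 1):
            out.append(math.sin(n * math.pi * xi))
            out.append(math.cos(n * math.pi * xi))
    return out

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

class FLANN:
    def __init__(self, n_inputs, n_classes, order=2, lr=0.1):
        self.order, self.lr = order, lr
        dim = n_inputs * (2 * order + 1) + 1  # expanded inputs plus a bias term
        self.W = [[0.0] * dim for _ in range(n_classes)]

    def _features(self, x):
        return functional_expansion(x, self.order) + [1.0]

    def predict_proba(self, x):
        phi = self._features(x)
        return softmax([sum(w * f for w, f in zip(row, phi)) for row in self.W])

    def train_step(self, x, label):
        """One stochastic gradient step on the cross-entropy loss."""
        phi = self._features(x)
        p = self.predict_proba(x)
        for c, row in enumerate(self.W):
            err = (1.0 if c == label else 0.0) - p[c]
            for j, f in enumerate(phi):
                row[j] += self.lr * err * f
```

In practice `x` would be a (scaled) PseAAC vector and the classes the structural classes; with no hidden layer, training reduces to a single-layer update, which keeps the model simple.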