Journal Article

iPhos‐PseEvo: Identifying Human Phosphorylated Proteins by Incorporating Evolutionary Information into General PseAAC via Grey System Theory

TLDR
A predictor called iPhos‐PseEvo is proposed that incorporates protein sequence evolutionary information into the general pseudo amino acid composition (PseAAC) via grey system theory and fuses an array of individual random forest classifiers into an ensemble predictor through a voting system.
Abstract
Protein phosphorylation plays a critical role in the human body by altering the structural conformation of a protein, causing it to become activated or deactivated, or by modifying its function. Given an uncharacterized protein sequence, can we predict whether or not it may be phosphorylated? This is no doubt a very meaningful problem for both basic research and drug development. Unfortunately, to the best of our knowledge, no high-throughput bioinformatics tool has so far been developed to address this basic but important problem, owing to its extreme complexity and the lack of sufficient training data. Here we propose a predictor called iPhos-PseEvo by (1) incorporating the protein sequence evolutionary information into the general pseudo amino acid composition (PseAAC) via the grey system theory, (2) balancing out the skewed training datasets by the asymmetric bootstrap approach, and (3) constructing an ensemble predictor by fusing an array of individual random forest classifiers through a voting system. Rigorous jackknife tests indicate that very promising success rates are achieved by iPhos-PseEvo even for such a difficult problem. A user-friendly web server for iPhos-PseEvo has been established at http://www.jci-bioinfo.cn/iPhos-PseEvo, by which users can easily obtain their desired results without needing to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can be used to analyze many other problems in protein science as well.
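Steps (2) and (3) of the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not spell out the asymmetric bootstrap, so the code assumes one plausible reading (each class is resampled with replacement down to the size of the smallest class, giving several balanced training subsets). A toy nearest-centroid classifier stands in for the random forests so the example needs only the standard library, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def balanced_subsets(samples, n_subsets, rng):
    """Assumed reading of the bootstrap step: build n_subsets training sets,
    each drawing (with replacement) the same number of samples per class."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append((x, y))
    smallest = min(len(group) for group in by_label.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for group in by_label.values():
            subset.extend(rng.choices(group, k=smallest))
        subsets.append(subset)
    return subsets

def train_centroid(subset):
    """Stand-in base learner (NOT a random forest): one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in subset:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_centroid(model, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist2(model[y]))

def ensemble_predict(models, x):
    """Fuse the individual predictions by simple majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy skewed data set: 3 positives (label 1) versus 30 negatives (label 0).
rng = random.Random(0)
positives = [([1.0 + 0.01 * i, 1.0], 1) for i in range(3)]
negatives = [([-1.0 - 0.01 * i, -1.0], 0) for i in range(30)]
subsets = balanced_subsets(positives + negatives, n_subsets=5, rng=rng)
models = [train_centroid(s) for s in subsets]
print(ensemble_predict(models, [0.9, 1.1]))
```

Using an odd number of subsets avoids tied votes in the two-class case. In practice each stand-in model would be replaced by a random forest trained on the PseAAC feature vectors.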

