Author

Guangya Zhang

Bio: Guangya Zhang is an academic researcher from Huaqiao University. The author has contributed to research in topics: Chemistry & Medicine. The author has an h-index of 10 and has co-authored 37 publications receiving 419 citations.


Papers
Journal Article
TL;DR: This paper presents an amino acid composition distribution method for extracting useful features from the primary sequence; with k-nearest neighbor as the classifier, the overall prediction accuracy reached 90.74%.

109 citations
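
The feature-plus-classifier pipeline summarized in the entry above can be illustrated with a minimal sketch. It assumes plain 20-dimensional amino acid composition (not the paper's composition distribution method, whose details are not given here) and a simple Euclidean k-nearest-neighbor vote. The function names, sequences, and class labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(train, query, k=1):
    """Majority vote among the k training vectors closest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the sequences and labels are made up.
train = [
    (aa_composition("AAAAKKKK"), "class_a"),
    (aa_composition("AAKKAKAK"), "class_a"),
    (aa_composition("GGGGSSSS"), "class_b"),
    (aa_composition("GSGSGSGG"), "class_b"),
]
prediction = knn_predict(train, aa_composition("AAAKKKKA"), k=1)
```

Composition discards residue order, which is why the query above lands exactly on the first training vector. Order-aware features such as dipeptide or pseudo amino acid composition are the usual next step.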

Journal Article
TL;DR: A powerful predictor based on k-nearest neighbor was introduced to identify the types of lipases according to their sequences, indicating that the improved Chou's pseudo amino acid composition might be a useful tool for extracting the features of protein sequences, or at least can play a complementary role to many of the other existing approaches.
Abstract: By proposing an improved Chou's pseudo amino acid composition approach to extract the features of the sequences, a powerful predictor based on k-nearest neighbor was introduced to identify the types of lipases according to their sequences. To avoid redundancy and bias, demonstrations were performed on a dataset where none of the proteins has ≥ 25% sequence identity to any other. The overall success rate thus obtained by the 10-fold cross-validation test was over 90%, indicating that the improved Chou's pseudo amino acid composition might be a useful tool for extracting the features of protein sequences, or at least can play a complementary role to many of the other existing approaches.

88 citations

Journal Article
TL;DR: LogitBoost was shown to outperform AdaBoost and to perform comparably with an RBF neural network and a support vector machine; the influence of protein size on discrimination was also addressed.

47 citations

Journal Article
TL;DR: Based on the information of dipeptide composition, a statistical method for discriminating thermophilic and mesophilic proteins is developed and the accuracy of the method for the training dataset was 86.3%.

32 citations
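
Dipeptide composition, the feature named in the entry above, can be sketched as the normalized frequency of each of the 400 ordered residue pairs. This is a minimal illustration of the feature only. The statistical discrimination step is not described in this listing and is not reproduced, and the function name and example sequence are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(seq):
    """Map each of the 400 dipeptides to its fraction among overlapping pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 along the sequence, one residue at a time.
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence has no valid dipeptides")
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical example: "AKAK" contains the pairs AK, KA, AK.
comp = dipeptide_composition("AKAK")
```

Here `comp["AK"]` is 2/3 and `comp["KA"]` is 1/3, with the remaining 398 entries equal to zero.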

Journal Article
TL;DR: In this paper, four pattern recognition methods, namely, principal component analysis (PCA), stepwise regression (SR), partial least-square regression (PLSR), and backpropagation neural network, were used to discriminate thermophilic and mesophilic proteins.

29 citations


Cited by
Journal Article
TL;DR: This volume is keyed to high resolution electron microscopy, a sophisticated form of structural analysis (really morphology in a modern guise), and the physical and mechanical background of the instrument and its ancillary tools is simply and well presented.
Abstract: I read this book the same weekend that the Packers took on the Rams, and the experience of the latter event, obviously, colored my judgment. Although I abhor anything that smacks of being a handbook (like, "How to Earn a Merit Badge in Neurosurgery") because too many volumes in biomedical science already evince a boyscout-like approach, I must confess that parts of this volume are fast, scholarly, and significant, with certain reservations. I like parts of this well-illustrated book because Dr. Sjöstrand, without so stating, develops certain subjects on technique in relation to the acquisition of judgment and sophistication. And this is important! So, given that the author (like all of us) is somewhat deficient in some areas, and biased in others, the book is still valuable if the uninitiated reader swallows it in a general fashion, realizing full well that what will be required from the reader is a modulation to fit his vision, proprioception, adaptation and response, and the kind of problem he is undertaking. A major deficiency of this book is revealed by comparison of its use of physics and of chemistry to provide understanding and background for the application of high resolution electron microscopy to problems in biology. Since the volume is keyed to high resolution electron microscopy, which is a sophisticated form of structural analysis, but really morphology in a modern guise, the physical and mechanical background of the instrument and its ancillary tools are simply and well presented. The potential use of chemical or cytochemical information as it relates to biological fine structure, however, is quite deficient. I wonder when even sophisticated morphologists will consider fixation a reaction and not a technique; only then will the fundamentals become self-evident and predictable and this sine qua non will become less mystical.
Staining reactions (the most inadequate chapter) ought to be something more than a technique to selectively enhance contrast of morphological elements; it ought to give the structural addresses of some of the chemical residents of cell components. Is it pertinent that autoradiography gets singled out for more complete coverage than other significant aspects of cytochemistry by a high resolution microscopist, when it has a built-in minimal error of 1,000 Å in standard practice? I don't mean to blind-side (in strict football terminology) Dr. Sjöstrand's efforts for what is "routinely used in our laboratory"; what is done is usually well done. It's just that …

3,197 citations

Journal Article
TL;DR: This review discusses each of the five procedures, along with the introduction of pseudo amino acid composition (PseAAC), its different modes and applications, and its recent development, particularly how to use the general formulation of PseAAC to reflect the core and essential features that are deeply hidden in complicated protein sequences.

1,163 citations
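
For orientation, the classical (type 1) form of PseAAC as it is commonly stated can be written as below. This is a sketch of the standard formulation only, not of the general formulation the review discusses nor of the improved variant used in the lipase paper listed earlier. The symbols are assumptions of this note: f_i is the normalized occurrence frequency of the i-th amino acid, θ_j is the j-th tier sequence-order correlation factor, w is a weight factor, L is the sequence length, and Θ is a coupling function built from physicochemical properties of the two residues.

```latex
% Classical (type 1) PseAAC for a protein R_1 R_2 ... R_L; notation is assumed.
\[
\mathbf{X} = \left[x_1, \ldots, x_{20}, x_{20+1}, \ldots, x_{20+\lambda}\right]^{\mathsf T},
\qquad
x_u =
\begin{cases}
\dfrac{f_u}{\sum_{i=1}^{20} f_i + w \sum_{j=1}^{\lambda} \theta_j}, & 1 \le u \le 20,\\[2ex]
\dfrac{w\,\theta_{u-20}}{\sum_{i=1}^{20} f_i + w \sum_{j=1}^{\lambda} \theta_j}, & 21 \le u \le 20+\lambda,
\end{cases}
\]
\[
\theta_j = \frac{1}{L-j} \sum_{i=1}^{L-j} \Theta\!\left(R_i, R_{i+j}\right),
\qquad j = 1, \ldots, \lambda, \quad \lambda < L .
\]
```

The first 20 components carry ordinary amino acid composition, while the last λ components add sequence-order information that plain composition discards.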

Journal Article
28 Jun 2010 - PLOS ONE
TL;DR: A new predictor called “Plant-mPLoc” is developed by integrating the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition that has the capacity to deal with multiple-location proteins beyond the reach of any existing predictors specialized for identifying plant protein subcellular localization.
Abstract: One of the fundamental goals in proteomics and cell biology is to identify the functions of proteins in various cellular organelles and pathways. Information on the subcellular locations of proteins can provide useful insights for revealing their functions and understanding how they interact with each other in cellular network systems. Most of the existing methods for predicting plant protein subcellular localization can only cover three or four location sites, and none of them can be used to deal with multiplex plant proteins that can simultaneously exist at, or move between, two or more different location sites. Actually, such multiplex proteins might have special biological functions worthy of particular notice. The present study was devoted to improving the existing plant protein subcellular location predictors in the aforementioned two aspects. A new predictor called “Plant-mPLoc” is developed by integrating the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify plant proteins among the following 12 location sites: (1) cell membrane, (2) cell wall, (3) chloroplast, (4) cytoplasm, (5) endoplasmic reticulum, (6) extracellular, (7) Golgi apparatus, (8) mitochondrion, (9) nucleus, (10) peroxisome, (11) plastid, and (12) vacuole. Compared with the existing methods for predicting plant protein subcellular localization, the new predictor is much more powerful and flexible. In particular, it also has the capacity to deal with multiple-location proteins, which is beyond the reach of any existing predictors specialized for identifying plant protein subcellular localization. As a user-friendly web-server, Plant-mPLoc is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/.
Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results. It is anticipated that the Plant-mPLoc predictor as presented in this paper will become a very useful tool in plant science as well as all the relevant areas.

669 citations

Journal Article
TL;DR: Random forests has become a popular technique for classification, prediction, studying variable importance, variable selection, and outlier detection; results of new tests regarding variable rankings based on RF variable importance measures are presented.

622 citations

Journal Article
TL;DR: This minireview summarizes progress by focusing on six aspects, the first being the use of pseudo amino acid composition (PseAAC) to predict various attributes of protein/peptide sequences that are useful for drug development.
Abstract: Facing the explosive growth of biological sequence data, such as those of protein/peptide and DNA/RNA, generated in the post-genomic age, many bioinformatical and mathematical approaches as well as physicochemical concepts have been introduced to derive useful information from these biological sequences in a timely manner, in order to stimulate the development of medical science and drug design. Meanwhile, because of the rapid penetration from these disciplines, medicinal chemistry is currently undergoing an unprecedented revolution. In this minireview, we summarize the progress by focusing on the following six aspects. (1) Use the pseudo amino acid composition or PseAAC to predict various attributes of protein/peptide sequences that are useful for drug development. (2) Use pseudo oligonucleotide composition or PseKNC to do the same for DNA/RNA sequences. (3) Introduce the multi-label approach to study those systems where the constituent elements bear multiple characters and functions. (4) Utilize the graphical rules and "wenxiang" diagrams to analyze complicated biomedical systems. (5) Recent development in identifying the interactions of drugs with their various types of target proteins in cellular networking. (6) Distorted key theory and its application in developing peptide drugs.

487 citations