Author

Ao Li

Other affiliations: Yale University
Bio: Ao Li is an academic researcher from the University of Science and Technology of China. The author has contributed to research in topics: Medicine & Computer science. The author has an h-index of 28 and has co-authored 111 publications receiving 2483 citations. Previous affiliations of Ao Li include Yale University.


Papers
Journal Article
TL;DR: PPSP is proposed as a potentially powerful tool for experimentalists identifying PK-specific phosphorylation sites in substrates, and the BDT strategy could also be a general approach for other PTMs, such as sumoylation and ubiquitination.
Abstract: As a reversible and dynamic post-translational modification (PTM) of proteins, phosphorylation plays essential regulatory roles in a broad spectrum of biological processes. Although many studies have addressed the molecular mechanism of phosphorylation dynamics, the intrinsic features of substrate specificity are still elusive and remain to be delineated. In this work, we present a novel, versatile and comprehensive program, PPSP (Prediction of PK-specific Phosphorylation site), built on Bayesian decision theory (BDT). PPSP can accurately predict potential phosphorylation sites for ~70 PK (protein kinase) groups. Compared with four existing tools, Scansite, NetPhosK, KinasePhos and GPS, PPSP is more accurate and powerful. Moreover, PPSP also provides predictions for many novel PKs, such as TRK, mTOR, SyK and MET/RON. The accuracy for these novel PKs is also satisfactory. Taken together, we propose that PPSP could be a potentially powerful tool for experimentalists who focus on phosphorylation substrates and the identification of their PK-specific sites. Moreover, the BDT strategy could also be a general approach for other PTMs, such as sumoylation and ubiquitination.

207 citations
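
The PPSP abstract above names Bayesian decision theory but gives no implementation details. The following is only a minimal sketch of the general Bayes decision rule applied to peptide windows, not the authors' method: it assumes fixed-length windows, independent positions, and add-one smoothing. The function names and the toy peptide sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_log_probs(windows, pseudocount=1.0):
    """Estimate smoothed per-position log-probabilities from equal-length windows."""
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    tables = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        tables.append(
            {aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS}
        )
    return tables


def log_likelihood(window, tables):
    """Sum position-wise log-probabilities (assumes independent positions)."""
    return sum(table[aa] for aa, table in zip(window, tables))


def is_predicted_site(window, pos_tables, neg_tables, prior_pos=0.5):
    """Bayes decision rule: choose the class with the larger posterior."""
    log_post_pos = math.log(prior_pos) + log_likelihood(window, pos_tables)
    log_post_neg = math.log(1.0 - prior_pos) + log_likelihood(window, neg_tables)
    return log_post_pos > log_post_neg


# Hypothetical toy windows centred on a candidate residue (not real training data).
positives = ["RRASV", "RKASL", "KRSSI"]
negatives = ["GLSPE", "ADSGT", "PESLQ"]
pos_tables = position_log_probs(positives)
neg_tables = position_log_probs(negatives)

print(is_predicted_site("RRASL", pos_tables, neg_tables))
print(is_predicted_site("GLSPE", pos_tables, neg_tables))
```

With equal priors the rule reduces to a likelihood comparison; changing `prior_pos` shifts the decision threshold, which is one way a kinase-specific predictor could trade sensitivity against specificity.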

Journal Article
TL;DR: This study proposes a Multimodal Deep Neural Network by integrating Multi-dimensional Data (MDNNMD) for the prognosis prediction of breast cancer and shows that the proposed method achieves a better performance than the prediction methods with single-dimensional data and other existing approaches.
Abstract: Breast cancer is a highly aggressive type of cancer with very low median survival. Accurate prognosis prediction of breast cancer can spare a significant number of patients from receiving unnecessary adjuvant systemic treatment and its related expensive medical costs. Previous work relies mostly on selected gene expression data to create a predictive model. The emergence of deep learning methods and multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of breast cancer and therefore can improve diagnosis, treatment, and prevention. In this study, we propose a Multimodal Deep Neural Network by integrating Multi-dimensional Data (MDNNMD) for the prognosis prediction of breast cancer. The novelty of the method lies in the design of its architecture and the fusion of multi-dimensional data. The comprehensive performance evaluation results show that the proposed method achieves a better performance than the prediction methods with single-dimensional data and other existing approaches. The source code, implemented with the TensorFlow 1.0 deep learning library, can be downloaded from GitHub: https://github.com/USTC-HIlab/MDNNMD.

174 citations

Journal Article
TL;DR: An online web server based on this method has been developed and is freely available to both academic and commercial users, and the results indicate that LOCSVMPSI is a powerful tool for the prediction of eukaryotic protein subcellular localization.
Abstract: The subcellular location of a protein is one of its key functional characteristics, as proteins must be localized correctly at the subcellular level to have normal biological function. In this paper, a novel method named LOCSVMPSI is introduced, which is based on the support vector machine (SVM) and the position-specific scoring matrix generated from profiles of PSI-BLAST. With a jackknife test on the RH2427 data set, LOCSVMPSI achieved a high overall prediction accuracy of 90.2%, which is higher than the prediction results by SubLoc and ESLpred on this data set. In addition, the prediction performance of LOCSVMPSI was evaluated with a 5-fold cross-validation test on the PK7579 data set, and the prediction results were consistently better than those of the previous method based on several SVMs using the composition of both amino acids and amino acid pairs. A further test on the SWISSPROT new-unique data set showed that LOCSVMPSI also performed better than some widely used prediction methods, such as PSORTII, TargetP and LOCnet. All these results indicate that LOCSVMPSI is a powerful tool for the prediction of eukaryotic protein subcellular localization. An online web server (current version is 1.3) based on this method has been developed and is freely available to both academic and commercial users; it can be accessed at http://Bioinformatics.ustc.edu.cn/LOCSVMPSI/LOCSVMPSI.php.

163 citations
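
The LOCSVMPSI abstract above reports accuracy from a jackknife test, which holds out each sample once and trains on the rest. A minimal sketch of that evaluation loop follows. To stay dependency-free it swaps in a nearest-centroid classifier in place of the SVM, and the feature vectors and location labels are hypothetical toy values rather than PSSM-derived features.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Predict the label whose class centroid is closest to the query vector."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1

    def squared_distance(label):
        centroid = (s / counts[label] for s in sums[label])
        return sum((c - q) ** 2 for c, q in zip(centroid, query))

    return min(sums, key=squared_distance)


def jackknife_accuracy(xs, ys, predict=nearest_centroid_predict):
    """Leave-one-out evaluation: each sample is tested on a model built without it."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-feature vectors standing in for real sequence-derived features.
features = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
            [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
labels = ["nuclear"] * 3 + ["cytoplasmic"] * 3

print(jackknife_accuracy(features, labels))
```

Any classifier with the same `(train_x, train_y, query)` signature can be passed as `predict`, so the loop itself does not depend on the model being evaluated.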

Journal Article
TL;DR: Wang et al. propose an imputation approach based on Support Vector Regression with an orthogonal coding input scheme, which makes use of multi-missing values in one row of a certain gene expression profile and imputes the missing value into a much higher dimensional space, to obtain better performance.
Abstract: Background: Gene expression profiling has become a useful biological resource in recent years, and it plays an important role in a broad range of areas in biology. The raw gene expression data, usually in the form of a large matrix, may contain missing values. Downstream analysis methods that require complete matrix input are thus not applicable. Several methods have been developed to solve this problem, such as the K nearest neighbor imputation method and the Bayesian principal components analysis imputation method. In this paper, we introduce a novel imputation approach based on the Support Vector Regression (SVR) method. The proposed approach utilizes an orthogonal coding input scheme, which makes use of multi-missing values in one row of a certain gene expression profile and imputes the missing value into a much higher dimensional space, to obtain better performance.

148 citations
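
The imputation abstract above mentions K nearest neighbor imputation as an existing baseline. The sketch below illustrates that baseline only, not the SVR approach with orthogonal coding that the paper proposes. It assumes missing values are stored as `None`, rows are genes, and distance is the root-mean-square difference over columns observed in both rows. The function name and the toy matrix are hypothetical.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of that column in the k most similar rows."""
    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                # A neighbor must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            neighbors = candidates[:k]
            if neighbors:
                imputed[i][j] = sum(v for _, v in neighbors) / len(neighbors)
    return imputed


# Hypothetical expression matrix: rows are genes, columns are samples.
expression = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [1.0, 2.0, 3.2],
]

filled = knn_impute(expression, k=2)
print(filled[1])
```

Distances are computed on the original matrix rather than the partially filled one, so the result does not depend on the order in which missing entries are visited.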

Journal Article
TL;DR: It is found that cloning efficiency increases over the differentiation hierarchy, and terminally differentiated postmitotic granulocytes yield cloned pups with the greatest cloning efficiency.
Abstract: Since the creation of Dolly via somatic cell nuclear transfer (SCNT), more than a dozen species of mammals have been cloned using this technology. One hypothesis for the limited success of cloning via SCNT (1%-5%) is that the clones are likely to be derived from adult stem cells. Support for this hypothesis comes from the findings that the reproductive cloning efficiency for embryonic stem cells is five to ten times higher than that for somatic cells as donors and that cloned pups cannot be produced directly from cloned embryos derived from differentiated B and T cells or neuronal cells. The question remains as to whether SCNT-derived animal clones can be derived from truly differentiated somatic cells. We tested this hypothesis with mouse hematopoietic cells at different differentiation stages: hematopoietic stem cells, progenitor cells and granulocytes. We found that cloning efficiency increases over the differentiation hierarchy, and terminally differentiated postmitotic granulocytes yield cloned pups with the greatest cloning efficiency.

126 citations


Cited by
Journal Article
TL;DR: The properties of three well-known N-terminal sequence motifs directing proteins to the secretory pathway, mitochondria and chloroplasts are described, and a brief history of methods to predict subcellular localization based on these sorting signals and other sequence properties is sketched.
Abstract: Determining the subcellular localization of a protein is an important first step toward understanding its function. Here, we describe the properties of three well-known N-terminal sequence motifs directing proteins to the secretory pathway, mitochondria and chloroplasts, and sketch a brief history of methods to predict subcellular localization based on these sorting signals and other sequence properties. We then outline how to use a number of internet-accessible tools to arrive at a reliable subcellular localization prediction for eukaryotic and prokaryotic proteins. In particular, we provide detailed step-by-step instructions for the coupled use of the amino-acid sequence-based predictors TargetP, SignalP, ChloroP and TMHMM, which are all hosted at the Center for Biological Sequence Analysis, Technical University of Denmark. In addition, we describe and provide web references to other useful subcellular localization predictors. Finally, we discuss predictive performance measures in general and the performance of TargetP and SignalP in particular.

3,235 citations