Author

Chunyan Ao

Other affiliations: Xidian University
Bio: Chunyan Ao is an academic researcher at the University of Electronic Science and Technology of China. The author has contributed to research on feature selection and random forests. The author has an h-index of 1 and has co-authored 1 publication, which has received 1 citation. Previous affiliations of Chunyan Ao include Xidian University.

Papers
Journal Article
24 May 2021-Methods
TL;DR: Based on hybrid features and a random forest, a novel predictor, RFhy-m2G, was developed to identify N2-methylguanosine (m2G) modification sites in three species.

18 citations


Cited by
Journal Article
TL;DR: In the proposed model, two kinds of feature descriptors, namely binary encoding and k-mer composition, were used to encode the DNA sequences of Geobacter pickeringii.
Abstract: 4mC is a type of DNA modification that can regulate multiple biological processes, for example DNA replication, gene expression, and transcriptional regulation. Accurate prediction of 4mC sites can provide precise information about their hereditary functions. The purpose of this study was to establish a robust deep learning model to recognize 4mC sites in Geobacter pickeringii. In the proposed model, two kinds of feature descriptors, namely binary encoding and k-mer composition, were used to encode the DNA sequences of Geobacter pickeringii. The features obtained from their fusion were optimized using a correlation-based and a gradient-boosting decision tree (GBDT)-based algorithm together with the incremental feature selection (IFS) method. These optimized features were then fed into a 1D convolutional neural network (CNN) to distinguish 4mC sites from non-4mC sites in Geobacter pickeringii. On independent data, the proposed model achieved an accuracy of 0.868, which was 4.2% higher than that of the existing model.
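The fused binary and k-mer encoding described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the nucleotide order `ACGT`, the k values 1 to 3, and the function names `binary_encode`, `kmer_composition`, and `fused_features` are hypothetical choices, since the abstract does not give them. The correlation and GBDT-based feature selection steps are not shown.

```python
from itertools import product

# Assumed alphabet order; the paper's ordering is not stated in the abstract.
NUCLEOTIDES = "ACGT"


def binary_encode(seq: str) -> list[int]:
    """One-hot (binary) encoding: 4 bits per position.

    Any symbol outside ACGT (e.g. 'N') is encoded as four zeros.
    """
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized frequency of each of the 4**k possible k-mers."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows of length k
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows containing non-ACGT symbols are skipped
            counts[window] += 1
    return [counts[km] / total if total > 0 else 0.0 for km in kmers]


def fused_features(seq: str, ks=(1, 2, 3)) -> list[float]:
    """Concatenate the binary and k-mer features into one vector (hypothetical fusion)."""
    features = [float(b) for b in binary_encode(seq)]
    for k in ks:
        features.extend(kmer_composition(seq, k))
    return features


if __name__ == "__main__":
    window = "ACGTCAGTA"  # hypothetical 9-nt window, not from the paper's data
    print(len(fused_features(window)))
```

For the 9-nt example window, the fused vector holds 36 binary values plus 4 + 16 + 64 k-mer frequencies, so the script prints 120. A feature selection step such as IFS would then choose a subset of these columns before classification.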

17 citations

Journal Article
01 Jan 2022-Research
TL;DR: This review focuses on the machine-learning-based classification of the functions and modifications of biological sequences, a basic task for understanding the biological functions of DNA, RNA, proteins, and peptides.
Abstract: With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the application of machine learning to biological sequences to construct predictive models for mining biological sequence information. There are many branches of biological sequence classification research. In this review, we mainly focus on the function and modification classification of biological sequences based on machine learning. Sequence-based prediction and analysis are basic tasks for understanding the biological functions of DNA, RNA, proteins, and peptides. However, hundreds of classification models have been developed for biological sequences, and the wide variety of specific methods can seem bewildering at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html), which provides readers with detailed information on the classification methods and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, a brief introduction to single-cell sequencing data analysis methods and their applications in biology is also included. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.

13 citations

Journal Article
TL;DR: In this paper, a random forest (RF)-based model, called Bitter-RF, was developed for identifying bitter peptides; the RF method had not previously been used to build a prediction model for bitter peptides.
Abstract:
Introduction: Bitter peptides are short peptides with potential medical applications. The considerable potential behind their bitter taste remains to be tapped. To better exploit the value of bitter peptides in practice, a more effective classification method for identifying bitter peptides is needed.
Methods: In this study, we developed a random forest (RF)-based model, called Bitter-RF, using sequence information of bitter peptides. Bitter-RF captures more comprehensive information by integrating 10 features extracted from bitter peptides, and it achieves better results on the independent validation set than the latest previous model.
Results: The proposed model improves the accuracy of bitter peptide classification (AUROC = 0.98 on the independent test set) and extends the practical application of the RF method in protein classification tasks; RF had not previously been used to build a prediction model for bitter peptides.
Discussion: We hope that Bitter-RF will make bitter peptide research more convenient for other researchers.
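The AUROC reported for Bitter-RF (0.98) is the area under the ROC curve, which equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name `auroc` and the example scores are hypothetical and not taken from the paper, and the O(n·m) double loop favors clarity over speed.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    labels: 1 for a positive example (e.g. a bitter peptide), 0 for a negative one.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores, not results from the paper
    scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(auroc(scores, labels))
```

With the example scores, 8 of the 9 positive/negative pairs are ranked correctly (only 0.35 vs. 0.6 is not), so the script prints roughly 0.889. A perfect ranking gives 1.0 and random scoring gives about 0.5.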

10 citations

Journal Article
TL;DR: Wang et al. investigated the relationship between four main factors (environment, habits, parental vision, and demographics) and myopia status by analyzing questionnaire data, and found that an XGBoost model using the four most influential features could achieve a competitive AUC of 0.764.

9 citations

Journal Article
TL;DR: In this article, a new predictor, DrugHybrid_BS, is developed based on hybrid features and Bagging-SVM to identify potentially druggable proteins; the predictor combines three features: monoDiKGap (k=2), cross-covariance, and grouped amino acid composition.
Abstract: Drug targets are biological macromolecules or biomolecular structures that can bind specifically to a particular drug to produce a therapeutic effect, or that regulate physiological functions. Because of the important value and role of drug targets, the prediction of potential drug targets has become a research hotspot in recent years. The key to the research and development of modern new drugs is first to identify potential drug targets. In this paper, a new predictor, DrugHybrid_BS, is developed based on hybrid features and Bagging-SVM to identify potentially druggable proteins. This method combines three features: monoDiKGap (k=2), cross-covariance, and grouped amino acid composition. Redundant features are removed, and key features analysed, using MRMD and MRMD2.0. The cross-validation results show that 96.9944% of the potentially druggable proteins can be accurately identified, and the accuracy on the independent test set reaches 96.5665%. These results indicate that DrugHybrid_BS has the potential to become a useful predictive tool for druggable proteins. In addition, the hybrid key features combined with Bagging-SVM can identify 80.0343% of the potentially druggable proteins, which indicates the significance of this subset of features for research.
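Grouped amino acid composition, one of the three feature types combined in DrugHybrid_BS, can be sketched as below. This sketch assumes the five-group scheme used for GAAC in the iFeature toolkit (aliphatic, aromatic, positively charged, negatively charged, uncharged) and normalizes by the number of standard residues; the abstract does not state which grouping the authors used, and the function name `gaac` is hypothetical.

```python
# Five physicochemical groups covering the 20 standard amino acids.
# Assumption: this follows iFeature's GAAC grouping; DrugHybrid_BS may differ.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive_charged": "KRH",
    "negative_charged": "DE",
    "uncharged": "STCPNQ",
}


def gaac(protein: str) -> dict[str, float]:
    """Fraction of residues falling in each group.

    Non-standard residues (e.g. 'X') are ignored, so fractions are taken
    over standard residues only (a choice made for this sketch).
    """
    lookup = {aa: group for group, members in GROUPS.items() for aa in members}
    counts = dict.fromkeys(GROUPS, 0)
    for aa in protein.upper():
        group = lookup.get(aa)
        if group is not None:
            counts[group] += 1
    total = sum(counts.values())
    return {g: (c / total if total else 0.0) for g, c in counts.items()}


if __name__ == "__main__":
    print(gaac("MKDE"))  # hypothetical toy sequence
```

For example, `gaac("MKDE")` assigns M to the aliphatic group, K to the positively charged group, and D and E to the negatively charged group, giving fractions of 0.25, 0.25, and 0.5 for those groups and 0.0 for the other two. The result is a fixed-length five-value vector regardless of protein length, which is what makes it easy to concatenate with other descriptors.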

8 citations