iTTCA-RF: a random forest predictor for tumor T cell antigens.
TLDR
This paper used four types of feature encoding methods to build an efficient predictor, including amino acid composition, global protein sequence descriptors, and grouped amino acid and peptide composition, and employed a two-step feature selection technique to search for the optimal feature subset.
Abstract:
Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and its lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification with existing methodology is still challenging. In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods have been studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors, and grouped amino acid and peptide composition. To improve the feature representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm. Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server was freely accessible at http://lab.malab.cn/~acy/iTTCA. We have proven that the proposed predictor iTTCA-RF is superior to the other latest models, and it will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
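As a rough illustration of two ideas in the abstract, the sketch below computes an amino acid composition vector (the simplest of the four encodings named) and checks that the reported balanced accuracy equals the mean of sensitivity and specificity. This is not the authors' code: the function names, the residue ordering, and the example peptide are assumptions made for illustration, and the peptide is an arbitrary string rather than a sample from the paper's dataset.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order
# (the ordering is an assumption; any fixed order works).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide: str) -> list[float]:
    """Return the fraction of each standard amino acid in the peptide.

    Non-standard characters count toward the length but get no slot
    in the 20-dimensional output vector.
    """
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy is the arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Arbitrary example peptide (hypothetical input, not from the paper).
features = amino_acid_composition("ACDKACDE")
print(len(features))

# Values reported in the abstract, in percent.
print(f"{balanced_accuracy(88.69, 78.73):.2f}")  # tenfold cross-validation
print(f"{balanced_accuracy(83.61, 62.67):.2f}")  # independent test
```

Both reported balanced accuracies (83.71% and 73.14%) are consistent with this definition. In the full method described above, vectors like this one would be concatenated with the other three encodings, reduced by feature selection, and passed to a random forest classifier.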