iTTCA-RF: a random forest predictor for tumor T cell antigens.

doi:10.1186/S12967-021-03084-X

Open Access, Journal Article

Shihu Jiao, +3 more

27 Oct 2021

Journal of Translational Medicine

Vol. 19, Iss. 1, p. 449

TLDR

The authors used four types of feature encoding methods to build an efficient predictor, including amino acid composition, global protein sequence descriptors, and grouped amino acid and peptide composition, and employed a two-step feature selection technique to search for the optimal feature subset.
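Of the encodings named above, amino acid composition is the simplest to illustrate: each peptide is mapped to the relative frequency of the 20 standard residues. The following is a minimal sketch of that general idea, not the authors' implementation. The function name `amino_acid_composition` and the example peptide are hypothetical, and counting non-standard residues toward the length is an assumption.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return a 20-dimensional vector of residue frequencies.

    Assumption: non-standard characters count toward the peptide length
    but have no feature of their own.
    """
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


# Hypothetical four-residue peptide, used only for illustration.
features = amino_acid_composition("ACDA")
print(len(features))  # one feature per standard amino acid
```

Vectors of this kind can be concatenated with other descriptors to form a hybrid feature set before feature selection.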

Abstract:

Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification with existing methods remains challenging. In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods were studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors, and grouped amino acid and peptide composition. To improve the feature representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm. Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation and 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server was freely accessible at http://lab.malab.cn/~acy/iTTCA . We have shown that the proposed predictor iTTCA-RF is superior to the other latest models, and it will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
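The reported metrics can be cross-checked with simple arithmetic. The sketch below assumes the standard definition of balanced accuracy as the mean of sensitivity and specificity; the abstract does not state the formula, so this is an assumption, and the function name is hypothetical.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (standard definition, assumed here)."""
    return (sensitivity + specificity) / 2.0


# Percentages as reported in the abstract.
cv_bacc = balanced_accuracy(sensitivity=88.69, specificity=78.73)
test_bacc = balanced_accuracy(sensitivity=83.61, specificity=62.67)

print(f"tenfold cross-validation: {cv_bacc:.2f}%")
print(f"independent test: {test_bacc:.2f}%")
```

Under this definition both results agree with the balanced accuracy values quoted in the abstract (83.71% and 73.14%).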
