Open Access · Journal Article

iTTCA-RF: a random forest predictor for tumor T cell antigens.

TLDR
The authors used four types of feature encoding methods, including amino acid composition, global protein sequence descriptors, and grouped amino acid and peptide composition, to build an efficient predictor, and employed a two-step feature selection technique to search for the optimal feature subset.
Abstract
Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy because of its high efficacy, high selectivity, and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification with existing methods remains challenging. In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods were studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors, and grouped amino acid and peptide composition. To improve the representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm: the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity, and sensitivity values of 83.71%, 78.73%, and 88.69% over tenfold cross-validation and 73.14%, 62.67%, and 83.61% on independent tests, respectively. The online prediction server is freely accessible at http://lab.malab.cn/~acy/iTTCA . We have shown that the proposed predictor iTTCA-RF is superior to other recent models, and we hope it will become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
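
As a rough illustration of two ideas in the abstract, the sketch below computes an amino acid composition vector for a peptide and checks that the reported balanced accuracy equals the mean of sensitivity and specificity. It is not the authors' implementation: the function names, the residue ordering, and the handling of non-standard residues are assumptions made for this example, and the other encodings, the feature selection, and the random forest itself are not shown.

```python
from collections import Counter

# The 20 standard amino acids; this alphabetical ordering is an assumption
# made for the example, not something specified in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return a 20-element list with the fraction of each standard residue.

    Hypothetical helper: non-standard characters are not counted but still
    contribute to the peptide length used as the denominator.
    """
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must be non-empty")
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy as the arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Toy peptide, chosen only for illustration.
    print(amino_acid_composition("AAC")[:2])
    # Figures reported in the abstract (percent).
    print(round(balanced_accuracy(88.69, 78.73), 2))  # tenfold cross-validation
    print(round(balanced_accuracy(83.61, 62.67), 2))  # independent test
```

With the values from the abstract, the mean of 88.69 and 78.73 is 83.71, and the mean of 83.61 and 62.67 is 73.14, which matches the balanced accuracies reported for cross-validation and the independent test.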

