iTerm-PseKNC: a sequence-based tool for predicting bacterial transcriptional terminators.

doi:10.1093/BIOINFORMATICS/BTY827


Chao-Qin Feng, +7 more

01 May 2019

Bioinformatics

Vol. 35, Iss. 9, pp. 1469-1477


TLDR

A new support vector machine (SVM) predictor that identifies transcription terminators from pseudo k-tuple nucleotide composition (PseKNC) features and could become a powerful tool for bacterial terminator recognition.
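The PseKNC features named in the summary can be illustrated with a minimal Python sketch. It assumes the commonly described type-I formulation: normalized k-tuple frequencies followed by λ sequence-order correlation factors computed from standardized dinucleotide physicochemical properties, balanced by a weight w. The function names, the parameter values (k, λ, w) and the property table are hypothetical placeholders, not the settings or property values used by iTerm-PseKNC.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Normalized k-tuple frequencies over all 4**k k-mers (ACGT lexicographic order)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    for i in range(total):
        counts[seq[i:i + k]] += 1  # assumes an uppercase sequence over A/C/G/T only
    return [counts[m] / total for m in kmers]


def standardize(props):
    """Z-score each property column across the 16 dinucleotides."""
    cols = list(zip(*props.values()))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]
    return {d: tuple((x - m) / s for x, m, s in zip(v, means, sds)) for d, v in props.items()}


def correlation_factors(seq, props, lam):
    """Type-I correlation factors theta_1..theta_lam between dinucleotides j steps apart."""
    if lam > len(seq) - 2:
        raise ValueError("lam must be at most len(seq) - 2")
    n_props = len(next(iter(props.values())))
    thetas = []
    for j in range(1, lam + 1):
        n_pairs = len(seq) - j - 1
        total = 0.0
        for i in range(n_pairs):
            a, b = props[seq[i:i + 2]], props[seq[i + j:i + j + 2]]
            # Mean squared difference over all properties for this dinucleotide pair.
            total += sum((x - y) ** 2 for x, y in zip(a, b)) / n_props
        thetas.append(total / n_pairs)
    return thetas


def pseknc(seq, k, lam, w, props):
    """Concatenate 4**k composition terms and lam weighted correlation terms (sums to 1)."""
    freqs = kmer_frequencies(seq, k)
    thetas = correlation_factors(seq, props, lam)
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# HYPOTHETICAL property values for illustration only (not real physicochemical data).
HYPOTHETICAL_PROPS = standardize(
    {a + b: (float(i), float(i % 4)) for i, (a, b) in enumerate(product("ACGT", repeat=2))}
)
vec = pseknc("ATGCGCGATTTAGCCGCTAAAAGCGG", k=2, lam=3, w=0.5, props=HYPOTHETICAL_PROPS)
print(len(vec))  # 19 = 4**2 composition terms + 3 correlation terms
```

In practice the property table would be replaced with published dinucleotide property values, and k, λ and w would be tuned on the training data.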

Abstract:

Motivation: Transcription termination is an important regulatory step in gene expression. If a gene lacks a terminator, transcription cannot stop, which results in abnormal gene expression. Detecting terminators helps determine operon structure in bacteria and improves genome annotation. Accurate identification of transcriptional terminators is therefore essential for research on transcriptional regulation.

Results: In this study, we developed a new predictor, 'iTerm-PseKNC', based on a support vector machine to identify transcription terminators. A binomial distribution approach was used to select the optimal feature subset derived from pseudo k-tuple nucleotide composition (PseKNC). In 5-fold cross-validation, the proposed method achieved an accuracy of 95%. To further evaluate the generalization ability of 'iTerm-PseKNC', the model was tested on independent datasets of experimentally confirmed Rho-independent terminators in the Escherichia coli and Bacillus subtilis genomes. All terminators in E. coli and 87.5% of the terminators in B. subtilis were correctly identified, suggesting that the proposed model could become a powerful tool for bacterial terminator recognition.

Availability and implementation: For the convenience of experimental researchers, a web server for 'iTerm-PseKNC' has been established at http://lin-group.cn/server/iTerm-PseKNC/, through which users can obtain predictions without working through the underlying mathematical equations.
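The binomial distribution feature selection step can be sketched in Python, assuming the formulation commonly used for this approach: for feature i and class j, the confidence level is CL_ij = 1 - P(X >= n_ij) with X ~ Binomial(N_i, q_j), where n_ij is the number of occurrences of feature i in class j, N_i is its total count over all classes, and q_j is the fraction of all feature occurrences that fall in class j. Each feature is scored by its highest CL across classes, and features are ranked in descending order. The function names and toy counts are hypothetical. Deciding how many top-ranked features to keep (for example, by evaluating growing subsets with the SVM under cross-validation) is not shown.

```python
from math import comb


def upper_tail(n, N, q):
    """P(X >= n) for X ~ Binomial(N, q)."""
    return sum(comb(N, m) * q**m * (1 - q) ** (N - m) for m in range(n, N + 1))


def rank_features(counts):
    """counts[i][j] = occurrences of feature i in class j.

    Returns (feature_index, confidence_level) pairs sorted by descending confidence.
    """
    n_classes = len(counts[0])
    class_totals = [sum(row[j] for row in counts) for j in range(n_classes)]
    grand_total = sum(class_totals)
    priors = [t / grand_total for t in class_totals]  # q_j
    scores = []
    for i, row in enumerate(counts):
        N_i = sum(row)
        # A feature that is over-represented in any one class gets a CL close to 1.
        cl = max(1 - upper_tail(row[j], N_i, priors[j]) for j in range(n_classes))
        scores.append((i, cl))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# HYPOTHETICAL counts: rows are features, columns are (terminator, non-terminator).
ranking = rank_features([[30, 10], [20, 20], [5, 25]])
print([i for i, _ in ranking])  # [2, 0, 1]
```

Feature 1 occurs equally often in both classes, so it is ranked last; features 2 and 0 are strongly skewed toward one class and rank first.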
