Journal Article

iTerm-PseKNC: a sequence-based tool for predicting bacterial transcriptional terminators.

TLDR
A new support vector machine predictor that identifies transcription terminators from pseudo k-tuple nucleotide composition (PseKNC) features and could become a powerful tool for bacterial terminator recognition.
Abstract
Motivation: Transcription termination is an important regulatory step of gene expression. If a gene has no terminator, transcription cannot stop, which results in abnormal gene expression. Detecting terminators helps determine operon structure in bacteria and improves genome annotation. Accurate identification of transcriptional terminators is therefore essential to research on transcriptional regulation.

Results: In this study, we developed a new predictor called 'iTerm-PseKNC', based on a support vector machine, to identify transcription terminators. The binomial distribution approach was used to select the optimal feature subset derived from pseudo k-tuple nucleotide composition (PseKNC). In the 5-fold cross-validation test, the proposed method achieved an accuracy of 95%. To further evaluate the generalization ability of 'iTerm-PseKNC', the model was tested on independent datasets of experimentally confirmed Rho-independent terminators in the Escherichia coli and Bacillus subtilis genomes. All the terminators in E. coli and 87.5% of the terminators in B. subtilis were correctly identified, suggesting that the proposed model could become a powerful tool for bacterial terminator recognition.

Availability and implementation: For the convenience of most wet-experimental researchers, a web server for 'iTerm-PseKNC' was established at http://lin-group.cn/server/iTerm-PseKNC/, with which users can obtain their desired results without working through the detailed mathematical equations involved.
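The feature-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It uses plain k-tuple nucleotide composition rather than full PseKNC (the pseudo components are omitted), and it assumes one common formulation of binomial-distribution ranking, in which a k-mer's confidence level in a class is one minus the probability of seeing at least as many occurrences by chance. The function names and the toy sequences are hypothetical, and the SVM training and 5-fold cross-validation steps are not shown.

```python
from itertools import product
from math import comb


def kmer_counts(seq, k):
    """Count overlapping k-mers (A/C/G/T only) in one DNA sequence."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing ambiguous bases
            counts[word] += 1
    return counts


def kmer_composition(seq, k):
    """Normalize k-mer counts to frequencies (plain k-tuple composition)."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {w: (c / total if total else 0.0) for w, c in counts.items()}


def binomial_confidence(n_in_class, n_total, q):
    """Assumed formulation: CL = 1 - P(X >= n_in_class), X ~ Binomial(n_total, q)."""
    tail = sum(
        comb(n_total, m) * q**m * (1 - q) ** (n_total - m)
        for m in range(n_in_class, n_total + 1)
    )
    return 1.0 - tail


def rank_kmers(positives, negatives, k):
    """Rank k-mers by their highest confidence level across the two classes."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    pos = dict.fromkeys(words, 0)
    neg = dict.fromkeys(words, 0)
    for seq in positives:
        for w, c in kmer_counts(seq, k).items():
            pos[w] += c
    for seq in negatives:
        for w, c in kmer_counts(seq, k).items():
            neg[w] += c
    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    if total_pos + total_neg == 0:
        raise ValueError("no k-mers found; sequences may be shorter than k")
    # q is the prior share of all k-mer occurrences that fall in the positive class
    q_pos = total_pos / (total_pos + total_neg)
    scores = {}
    for w in words:
        n_total = pos[w] + neg[w]
        cl_pos = binomial_confidence(pos[w], n_total, q_pos)
        cl_neg = binomial_confidence(neg[w], n_total, 1 - q_pos)
        scores[w] = max(cl_pos, cl_neg)
    return sorted(words, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only
    toy_pos = ["GGGGCCCC", "GGGCCC"]
    toy_neg = ["ATATATAT", "TATATA"]
    print(rank_kmers(toy_pos, toy_neg, k=2)[:5])
```

In a pipeline of the kind the abstract describes, the top-ranked k-mers would form the feature subset passed to a classifier, with the subset size chosen by cross-validated accuracy. That step is left out here to keep the sketch dependency-free.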


Citations
Journal Article

i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome

TL;DR: A computational method called i6mA-Pred was developed to identify 6mA sites in the rice genome, in which the optimal nucleotide chemical properties obtained by using a feature selection technique were used to encode the DNA sequences.
Journal Article

Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique

TL;DR: A predictor called iORI-PseKNC2.0 was developed to identify ORIs in the Saccharomyces cerevisiae genome based on sequence information, and a user-friendly web server was established for the convenience of most wet-experimental scholars.
Journal Article

Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation

TL;DR: Meta-4mCpred, the first meta-predictor for 4mC site prediction, is proposed; it achieved an overall average accuracy of 86% in independent dataset evaluation, over 4% higher than that of the state-of-the-art predictors.
Journal Article

Machine intelligence in peptide therapeutics: A next-generation tool for rapid disease screening

TL;DR: Overall, it is shown that using ML models in peptide research can streamline the development of targeted peptide therapies and avoid the common pitfalls and challenges of using ML approaches for peptide therapeutics.
Journal Article

Identification of hormone binding proteins based on machine learning methods

TL;DR: A machine learning-based method was proposed to identify HBP, in which the samples were encoded by using the optimal tripeptide composition obtained based on the binomial distribution method.
References
Journal Article

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal Article

A Tutorial on Support Vector Machines for Pattern Recognition

TL;DR: Several arguments that support the observed high accuracy of SVMs are reviewed, and numerous examples and proofs of most of the key theorems are given.
Journal Article

Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy

TL;DR: This article studies how to select good features according to the maximal statistical dependency criterion based on mutual information. Because the maximal dependency condition is difficult to implement directly, an equivalent minimal-redundancy-maximal-relevance (mRMR) criterion is derived.

Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy

TL;DR: This work derives an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection, and presents a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers).
Journal Article

Cd-hit

TL;DR: A new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets to reduce sequence redundancy and improve the performance of other sequence analyses is developed.
Related Papers (5)