Journal Article

Protein Folds Prediction with Hierarchical Structured SVM

31 May 2016-Current Proteomics-Vol. 13, Iss: 2, pp 79-85
About: This article is published in Current Proteomics. The article was published on 2016-05-31. It has received 112 citations to date. The article focuses on the topic: Structured support vector machine.
Citations
Journal Article
TL;DR: This review selected several popular clustering tools, briefly explained their key computing principles, analyzed their characteristics and compared them using two independent benchmark datasets, to help bioinformatics users employ suitable clustering tools effectively when analyzing big sequencing data.
Abstract: Sequence clustering is a basic bioinformatics task that is attracting renewed attention with the development of metagenomics and microbiomics. The latest sequencing techniques have decreased costs and as a result, massive amounts of DNA/RNA sequences are being produced. The challenge is to cluster the sequence data using stable, quick and accurate methods. For microbiome sequencing data, 16S ribosomal RNA operational taxonomic units are typically used. However, there is often a gap between algorithm developers and bioinformatics users. Different software tools can produce diverse results and users can find them difficult to analyze. Understanding the different clustering mechanisms is crucial to understanding the results that they produce. In this review, we selected several popular clustering tools, briefly explained the key computing principles, analyzed their characteristics and compared them using two independent benchmark datasets. Our aim is to assist bioinformatics users in employing suitable clustering tools effectively to analyze big sequencing data. Related data, codes and software tools were accessible at the link http://lab.malab.cn/~lg/clustering/.

170 citations

Journal Article
TL;DR: A new predictor based on support vector machine to identify transcription terminators based on pseudo k-tuple nucleotide composition (PseKNC) that could become a powerful tool for bacterial terminator recognition.
Abstract: Motivation: Transcription termination is an important regulatory step of gene expression. Without a terminator in a gene, transcription cannot stop, which results in abnormal gene expression. Detecting such terminators can determine the operon structure in bacterial organisms and improve genome annotation. Thus, accurate identification of transcriptional terminators is essential in the research of transcription regulation. Results: In this study, we developed a new predictor called 'iTerm-PseKNC' based on support vector machine to identify transcription terminators. The binomial distribution approach was used to pick out the optimal feature subset derived from pseudo k-tuple nucleotide composition (PseKNC). The 5-fold cross-validation test results showed that our proposed method achieved an accuracy of 95%. To further evaluate the generalization ability of 'iTerm-PseKNC', the model was examined on independent datasets of experimentally confirmed Rho-independent terminators in the Escherichia coli and Bacillus subtilis genomes. As a result, all the terminators in E. coli and 87.5% of the terminators in B. subtilis were correctly identified, suggesting that the proposed model could become a powerful tool for bacterial terminator recognition. Availability and implementation: For the convenience of wet-lab researchers, the web-server for 'iTerm-PseKNC' was established at http://lin-group.cn/server/iTerm-PseKNC/, by which users can easily obtain their desired result without the need to go through the detailed mathematical equations involved.

165 citations

Journal Article
Ruolan Chen, Xiangrong Liu, Shuting Jin, Jiawei Lin, Juan Liu
TL;DR: A hierarchical classification scheme is adopted and several representative methods of each category of drug-target interaction prediction are introduced, especially the recent state-of-the-art methods.
Abstract: Identifying drug-target interactions will greatly narrow down the scope of search of candidate medications, and thus can serve as the vital first step in drug discovery. Considering that in vitro experiments are extremely costly and time-consuming, high efficiency computational prediction methods could serve as promising strategies for drug-target interaction (DTI) prediction. In this review, our goal is to focus on machine learning approaches and provide a comprehensive overview. First, we summarize a brief list of databases frequently used in drug discovery. Next, we adopt a hierarchical classification scheme and introduce several representative methods of each category, especially the recent state-of-the-art methods. In addition, we compare the advantages and limitations of methods in each category. Lastly, we discuss the remaining challenges and future outlook of machine learning in DTI prediction. This article may provide a reference and tutorial insights on machine learning-based DTI prediction for future researchers.

162 citations

Journal Article
TL;DR: This study proposes a machine learning based predictor, namely 4mcPred‐SVM, for the genome‐wide detection of DNA 4mC sites, and presents a new feature representation algorithm that sufficiently exploits sequence‐based information.
Abstract: Motivation: As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) has recently been shown to play crucial roles in restriction-modification systems. For a better understanding of its functional mechanisms, it is fundamentally important to identify 4mC modification. Machine learning methods have recently emerged as an effective and efficient approach for the high-throughput identification of 4mC sites, although high predictive error rates are still challenging for existing methods. Therefore, it is highly desirable to develop a computational method to more accurately identify 4mC sites. Results: In this study, we propose a machine learning based predictor, namely 4mcPred-SVM, for the genome-wide detection of DNA 4mC sites. In this predictor, we present a new feature representation algorithm that sufficiently exploits sequence-based information. To improve the feature representation ability, we use a two-step feature optimization strategy, thereby obtaining the most representative features. Using the resulting features and Support Vector Machine (SVM), we adaptively train the optimal models for different species. Comparative results on benchmark datasets from six species indicate that our predictor is able to achieve generally better performance in predicting 4mC sites as compared to the state-of-the-art predictors. Importantly, the sequence-based features can reliably and robustly predict 4mC sites, facilitating the discovery of potentially important sequence characteristics for the prediction of 4mC sites. Availability and implementation: The user-friendly webserver that implements the proposed 4mcPred-SVM is well established, and is freely accessible at http://server.malab.cn/4mcPred-SVM. Supplementary information: Supplementary data are available at Bioinformatics online.

139 citations

Journal Article
TL;DR: This study proposed a support vector machine-based model to predict 2'-O-methylation sites in H. sapiens, and the RNA sequences were encoded with the optimal features obtained from feature selection.
Abstract: 2'-O-methylation plays an important biological role in gene expression. Owing to the explosive increase in genomic sequencing data, it is necessary to develop a method for quickly and efficiently identifying whether a sequence contains a 2'-O-methylation site. As a complement to experimental techniques, a computational method may help to identify 2'-O-methylation sites. In this study, based on the experimental 2'-O-methylation data of Homo sapiens, we proposed a support vector machine-based model to predict 2'-O-methylation sites in H. sapiens. In this model, the RNA sequences were encoded with the optimal features obtained from feature selection. In the fivefold cross-validation test, the accuracy reached 97.95%.

124 citations