scispace - formally typeset
Open Access Journal ArticleDOI

Learning interpretable SVMs for biological sequence classification.

TLDR
Novel and efficient algorithms are proposed for solving the so-called Support Vector Multiple Kernel Learning problem; they can be used to understand the obtained support vector decision function and to extract biologically relevant knowledge about the sequence analysis problem at hand.
Abstract
Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy, they lack interpretability. In many applications it does not suffice that an algorithm merely detects a biological signal in the sequence; it should also provide the means to interpret its solution in order to gain biological insight. We propose novel and efficient algorithms for solving the so-called Support Vector Multiple Kernel Learning problem. The developed techniques can be used to understand the obtained support vector decision function and to extract biologically relevant knowledge about the sequence analysis problem at hand. We apply the proposed methods to the task of acceptor splice site prediction and to the problem of recognizing alternatively spliced exons. Our algorithms compute sparse weightings of substring locations, highlighting which parts of the sequence are important for discrimination. The proposed method can deal with thousands of examples while combining hundreds of kernels within reasonable time, and reliably identifies a few statistically significant positions.
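The sparse weighting of substring locations described in the abstract can be sketched as a weighted sum of per-position sub-kernels. The helper names (`position_kernel`, `combined_kernel`), the toy sequences, and the fixed weights below are illustrative assumptions, not the authors' implementation: in the paper the sub-kernels are weighted-degree string kernels and the weights are learned by solving the MKL problem, whereas here they are given.

```python
import numpy as np

def position_kernel(seqs_a, seqs_b, pos, k=3):
    """Hypothetical sub-kernel: 1 if the k-mer starting at `pos` matches, else 0."""
    K = np.zeros((len(seqs_a), len(seqs_b)))
    for i, a in enumerate(seqs_a):
        for j, b in enumerate(seqs_b):
            K[i, j] = float(a[pos:pos + k] == b[pos:pos + k])
    return K

def combined_kernel(seqs_a, seqs_b, betas, k=3):
    """Weighted sum of per-position sub-kernels; a sparse `betas`
    highlights which sequence positions matter for discrimination."""
    return sum(beta * position_kernel(seqs_a, seqs_b, p, k)
               for p, beta in enumerate(betas) if beta > 0)

seqs = ["GATTACA", "GATTCCA", "CATTACA"]
# Illustrative sparse weights over 5 starting positions (in MKL these are learned).
betas = np.array([0.7, 0.0, 0.3, 0.0, 0.0])
K = combined_kernel(seqs, seqs, betas)
```

A combined kernel matrix like `K` can be handed to any kernel SVM (e.g. via a precomputed-kernel interface); the nonzero entries of `betas` then play the role of the statistically significant positions the paper reports.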



Citations
Journal Article

Large Scale Multiple Kernel Learning

TL;DR: It is shown that the proposed multiple kernel learning algorithm can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling standard SVM implementations; the formulation and the method are generalized to a larger class of problems, including regression and one-class classification.
Journal ArticleDOI

On the interpretation of weight vectors of linear models in multivariate neuroimaging.

TL;DR: It is demonstrated that the parameters of forward models are neurophysiologically interpretable in the sense that significant nonzero weights are only observed at channels the activity of which is related to the brain process under study, in contrast to the interpretation of backward model parameters.
Journal ArticleDOI

lp-Norm Multiple Kernel Learning

TL;DR: Empirical applications of lp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state-of-the-art, and two efficient interleaved optimization strategies for arbitrary norms are developed.
Proceedings ArticleDOI

How far can you get with a modern face recognition test set using only simple features?

TL;DR: It is shown that even modest optimization of the simple model introduced by Pinto et al. using modern multiple kernel learning (MKL) techniques once again yields “state-of-the-art” performance levels on a standard face recognition set (“labeled faces in the wild”).
Journal ArticleDOI

Generic eukaryotic core promoter prediction using structural features of DNA

TL;DR: A novel approach for predicting promoters in whole-genome sequences by using large-scale structural properties of DNA that requires no training, is applicable to many eukaryotic genomes, and performs extremely well in comparison with the best available promoter prediction programs.
References
Book

The Nature of Statistical Learning Theory

TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; and what is important in learning theory.
Journal ArticleDOI

Support-Vector Networks

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Journal ArticleDOI

BLAT—The BLAST-Like Alignment Tool

TL;DR: How BLAT was optimized is described, which is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.
Book

Testing Statistical Hypotheses

TL;DR: The General Decision Problem; the Probability Background; Uniformly Most Powerful Tests; Unbiasedness: Theory and First Applications; Unbiasedness: Applications to Normal Distributions; Invariance; and Linear Hypotheses, as discussed by the authors.

Book Chapter

Fast training of support vector machines using sequential minimal optimization (in Advances in Kernel Methods)

J. C. Platt
TL;DR: SMO breaks the large quadratic programming (QP) problem that arises in SVM training into a series of smallest-possible QP problems, which are solved analytically; this avoids a time-consuming numerical QP optimization as an inner loop, making SMO fastest for linear SVMs and sparse data sets.
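The "smallest possible QP" in SMO involves just two Lagrange multipliers, which can be optimized analytically. The sketch below shows that two-variable clipped update in isolation; `smo_pair_update` and the numeric inputs are hypothetical illustrations of the standard update rule, not Platt's full algorithm (which also chooses the pair by heuristics and updates the threshold), and it assumes positive curvature `eta > 0`.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, k11, k12, k22, C):
    """One analytic SMO step on a pair of multipliers (hypothetical helper).

    a1, a2: current multipliers; y1, y2: labels in {-1, +1};
    E1, E2: prediction errors; k11, k12, k22: kernel entries; C: box bound.
    """
    # Feasible interval for a2 keeping 0 <= a <= C and the equality constraint.
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = k11 + k22 - 2 * k12            # curvature along the constraint line (assumed > 0)
    a2_new = a2 + y2 * (E1 - E2) / eta   # unconstrained optimum
    a2_new = min(H, max(L, a2_new))      # clip to the box
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # keep sum_i y_i * a_i constant
    return a1_new, a2_new

a1, a2 = smo_pair_update(0.2, 0.5, 1, -1, 0.3, -0.1, 1.0, 0.2, 1.0, C=1.0)
```

Because each step touches only two variables and needs no QP library in the inner loop, iterating such updates over heuristically chosen pairs is what makes SMO fast on large, sparse problems.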