scispace - formally typeset
Open Access Journal ArticleDOI

Learning interpretable SVMs for biological sequence classification.

TLDR
Novel and efficient algorithms are proposed for solving the so-called Support Vector Multiple Kernel Learning problem; they can be used to understand the obtained support vector decision function and to extract biologically relevant knowledge about the sequence analysis problem at hand.
Abstract
Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy, they lack interpretability. In many applications it does not suffice that an algorithm merely detects a biological signal in the sequence; it should also provide the means to interpret its solution in order to gain biological insight. We propose novel and efficient algorithms for solving the so-called Support Vector Multiple Kernel Learning problem. The developed techniques can be used to understand the obtained support vector decision function and to extract biologically relevant knowledge about the sequence analysis problem at hand. We apply the proposed methods to the task of acceptor splice site prediction and to the problem of recognizing alternatively spliced exons. Our algorithms compute sparse weightings of substring locations, highlighting which parts of the sequence are important for discrimination. The proposed method can deal with thousands of examples while combining hundreds of kernels within reasonable time, and reliably identifies a few statistically significant positions.
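The sparse weighting of substring locations described in the abstract can be sketched as a weighted sum of per-position sub-kernels. The helper names (`position_kernel`, `combined_kernel`), the toy sequences, and the fixed weights below are illustrative assumptions, not the authors' implementation: in the paper the sub-kernels are weighted-degree string kernels and the weights are learned by solving the MKL problem, whereas here they are given.

```python
import numpy as np

def position_kernel(seqs_a, seqs_b, pos, k=3):
    """Hypothetical sub-kernel: 1 if the k-mer starting at `pos` matches, else 0."""
    K = np.zeros((len(seqs_a), len(seqs_b)))
    for i, a in enumerate(seqs_a):
        for j, b in enumerate(seqs_b):
            K[i, j] = float(a[pos:pos + k] == b[pos:pos + k])
    return K

def combined_kernel(seqs_a, seqs_b, betas, k=3):
    """Weighted sum of per-position sub-kernels; a sparse `betas`
    highlights which sequence positions matter for discrimination."""
    return sum(beta * position_kernel(seqs_a, seqs_b, p, k)
               for p, beta in enumerate(betas) if beta > 0)

seqs = ["GATTACA", "GATTCCA", "CATTACA"]
# Illustrative sparse weights over 5 starting positions (in MKL these are learned).
betas = np.array([0.7, 0.0, 0.3, 0.0, 0.0])
K = combined_kernel(seqs, seqs, betas)
```

A combined kernel matrix like `K` can be handed to any kernel SVM (e.g. via a precomputed-kernel interface); the nonzero entries of `betas` then play the role of the statistically significant positions the paper reports.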



Citations
Journal Article

Large Scale Multiple Kernel Learning

TL;DR: It is shown that the proposed multiple kernel learning algorithm can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling standard SVM implementations; the formulation and the method are generalized to a larger class of problems, including regression and one-class classification.
Journal ArticleDOI

On the interpretation of weight vectors of linear models in multivariate neuroimaging.

TL;DR: It is demonstrated that the parameters of forward models are neurophysiologically interpretable in the sense that significant nonzero weights are only observed at channels the activity of which is related to the brain process under study, in contrast to the interpretation of backward model parameters.
Journal ArticleDOI

lp-Norm Multiple Kernel Learning

TL;DR: Empirical applications of lp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state-of-the-art, and two efficient interleaved optimization strategies for arbitrary norms are developed.
Proceedings ArticleDOI

How far can you get with a modern face recognition test set using only simple features?

TL;DR: It is shown that even modest optimization of the simple model introduced by Pinto et al. using modern multiple kernel learning (MKL) techniques once again yields “state-of-the-art” performance levels on a standard face recognition set (“labeled faces in the wild”).
Journal ArticleDOI

Generic eukaryotic core promoter prediction using structural features of DNA

TL;DR: A novel approach for predicting promoters in whole-genome sequences by using large-scale structural properties of DNA that requires no training, is applicable to many eukaryotic genomes, and performs extremely well in comparison with the best available promoter prediction programs.
References
Book

The Nature of Statistical Learning Theory

TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; and what is important in learning theory.
Journal ArticleDOI

Support-Vector Networks

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Journal ArticleDOI

BLAT—The BLAST-Like Alignment Tool

TL;DR: How BLAT was optimized is described, which is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.
Book

Testing Statistical Hypotheses

TL;DR: The General Decision Problem; the Probability Background; Uniformly Most Powerful Tests; Unbiasedness: Theory and First Applications; Unbiasedness: Applications to Normal Distributions; Invariance; and Linear Hypotheses, as discussed by the authors.

Book Chapter

Fast training of support vector machines using sequential minimal optimization (in Advances in Kernel Methods)

J. C. Platt
TL;DR: SMO breaks the large quadratic programming (QP) problem that arises in SVM training into a series of smallest-possible QP problems, which are solved analytically; this avoids a time-consuming numerical QP optimization as an inner loop, making SMO fastest for linear SVMs and sparse data sets.
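The "smallest possible QP" in SMO involves just two Lagrange multipliers, which can be optimized analytically. The sketch below shows that two-variable clipped update in isolation; `smo_pair_update` and the numeric inputs are hypothetical illustrations of the standard update rule, not Platt's full algorithm (which also chooses the pair by heuristics and updates the threshold), and it assumes positive curvature `eta > 0`.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, k11, k12, k22, C):
    """One analytic SMO step on a pair of multipliers (hypothetical helper).

    a1, a2: current multipliers; y1, y2: labels in {-1, +1};
    E1, E2: prediction errors; k11, k12, k22: kernel entries; C: box bound.
    """
    # Feasible interval for a2 keeping 0 <= a <= C and the equality constraint.
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = k11 + k22 - 2 * k12            # curvature along the constraint line (assumed > 0)
    a2_new = a2 + y2 * (E1 - E2) / eta   # unconstrained optimum
    a2_new = min(H, max(L, a2_new))      # clip to the box
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # keep sum_i y_i * a_i constant
    return a1_new, a2_new

a1, a2 = smo_pair_update(0.2, 0.5, 1, -1, 0.3, -0.1, 1.0, 0.2, 1.0, C=1.0)
```

Because each step touches only two variables and needs no QP library in the inner loop, iterating such updates over heuristically chosen pairs is what makes SMO fast on large, sparse problems.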