
Showing papers on "Relevance vector machine" published in 2004


Journal ArticleDOI
TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.
Abstract: In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.

10,696 citations
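
The ε-insensitive regression setting the tutorial covers is available off the shelf; below is a minimal sketch using scikit-learn's SVR (the data and parameter values are illustrative assumptions, not drawn from the tutorial).

```python
# Minimal epsilon-insensitive SV regression sketch (scikit-learn; data and
# parameter values are illustrative, not from the tutorial).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(80)

# epsilon sets the width of the insensitive tube; C trades flatness vs. fit.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("support vectors:", len(model.support_vectors_), "of", len(X))
```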


Proceedings ArticleDOI
04 Jul 2004
TL;DR: This paper proposes to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs, and demonstrates the versatility and effectiveness of the method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.
Abstract: Learning general functional dependencies is one of the main goals in machine learning. Recent progress in kernel-based methods has focused on designing flexible and powerful input representations. This paper addresses the complementary issue of problems involving complex outputs such as multiple dependent output variables and structured output spaces. We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. The resulting optimization problem is solved efficiently by a cutting plane algorithm that exploits the sparseness and structural decomposition of the problem. We demonstrate the versatility and effectiveness of our method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.

1,446 citations
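
The joint input-output feature idea can be illustrated in a few lines for the multiclass special case: Ψ(x, y) places x in the block of a stacked weight vector belonging to class y, and prediction is an argmax over classes. The sketch below assumes the weights are already learned; the cutting-plane training described in the paper is omitted.

```python
# Toy sketch of a joint input-output feature map for the multiclass case.
# The cutting-plane training loop is omitted; w is assumed already learned.
import numpy as np

def joint_features(x, y, n_classes):
    """Psi(x, y): x placed in the block belonging to class y, zeros elsewhere."""
    psi = np.zeros(n_classes * x.size)
    psi[y * x.size:(y + 1) * x.size] = x
    return psi

def predict(w, x, n_classes):
    scores = [w @ joint_features(x, y, n_classes) for y in range(n_classes)]
    return int(np.argmax(scores))

w = np.array([1., 0., 0., 1.])    # hypothetical weights: 2 classes x 2 features
print(predict(w, np.array([0.2, 0.9]), n_classes=2))   # -> 1
```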


Journal ArticleDOI
TL;DR: This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning, and draws precise connections to other "kernel machines" popular in the community.
Abstract: Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations [13, 78, 31]. The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to draw precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.

752 citations
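
The GP regression equations the tutorial presents reduce to a few lines of linear algebra; the following numpy sketch computes the posterior mean and variance for an RBF kernel (kernel, noise level, and data are illustrative assumptions).

```python
# Minimal GP regression posterior (numpy), in the spirit of the tutorial.
import numpy as np

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X).ravel()
Xs = np.linspace(-3, 3, 100)[:, None]
noise = 1e-2

K = rbf(X, X) + noise * np.eye(len(X))
Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
mean = Ks.T @ alpha                                  # posterior mean
v = np.linalg.solve(L, Ks)
var = np.diag(Kss) - (v * v).sum(0)                  # posterior variance
```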


Proceedings ArticleDOI
13 Nov 2004
TL;DR: A novel hierarchical classification method that generalizes Support Vector Machine learning and that is based on discriminant functions that are structured in a way that mirrors the class hierarchy is proposed.
Abstract: Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques like Support Vector Machines and related large margin methods have been successfully applied for this task, despite the fact that they ignore the inter-class relationships. In this paper, we propose a novel hierarchical classification method that generalizes Support Vector Machine learning and that is based on discriminant functions that are structured in a way that mirrors the class hierarchy. Our method can work with arbitrary, not necessarily singly connected taxonomies and can deal with task-specific loss functions. All parameters are learned jointly by optimizing a common objective function corresponding to a regularized upper bound on the empirical loss. We present experimental results on the WIPO-alpha patent collection to show the competitiveness of our approach.

431 citations
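
The structural idea, discriminant functions that mirror the class hierarchy, can be caricatured as scoring each class by summing per-node weight vectors along its path in the taxonomy. The toy sketch below uses hypothetical paths and weights and omits the paper's joint training objective.

```python
# Toy sketch: hierarchical discriminant as a sum of per-node weights along
# the path from root to class (weights and taxonomy are hypothetical; the
# joint training objective is omitted).
import numpy as np

paths = {"cat": ["animal", "cat"], "dog": ["animal", "dog"],
         "car": ["vehicle", "car"]}
w = {n: np.ones(3) * i
     for i, n in enumerate(["animal", "vehicle", "cat", "dog", "car"])}

def score(x, label):
    return sum(w[node] @ x for node in paths[label])

x = np.array([0.1, 0.5, 0.2])
print(max(paths, key=lambda c: score(x, c)))
```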


Proceedings ArticleDOI
27 Jun 2004
TL;DR: This work describes a learning based method for recovering 3D human body pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes, and results are a factor of 3 better than the current state of the art for the much simpler upper body problem.
Abstract: We describe a learning based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. For the main regression, we evaluate both regularized least squares and relevance vector machine (RVM) regressors over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. For realism and good generalization with respect to viewpoints, we train the regressors on images resynthesized from real human motion capture data, and test them both quantitatively on similar independent test data and qualitatively on a real image sequence. Mean angular errors of 6-7 degrees are obtained - a factor of 3 better than the current state of the art for the much simpler upper body problem.

393 citations
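
For readers unfamiliar with the RVM regressors used here, the following is a bare-bones sparse Bayesian regression loop in the style of Tipping's evidence-maximization updates; a sketch on toy data, not the paper's pose-regression pipeline, with the basis choice and pruning threshold as assumptions.

```python
# Bare-bones RVM-style regression via evidence maximization (Tipping-style
# updates); a toy sketch, not the paper's pose-regression pipeline.
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(-4, 4, 60)[:, None]
t = np.sinc(X).ravel() + 0.05 * rng.standard_normal(60)

Phi = np.exp(-0.5 * (X - X.T) ** 2)     # RBF design matrix, one basis per point
alpha = np.ones(Phi.shape[1])           # per-weight precision hyperparameters
beta = 100.0                            # noise precision
for _ in range(200):
    Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
    mu = beta * Sigma @ Phi.T @ t
    gamma = 1.0 - alpha * np.diag(Sigma)
    alpha = np.clip(gamma / (mu ** 2 + 1e-12), 0, 1e12)
    beta = (len(t) - gamma.sum()) / np.sum((t - Phi @ mu) ** 2)

relevant = alpha < 1e4                  # bases surviving pruning (threshold assumed)
print("relevance vectors:", relevant.sum(), "of", len(alpha))
```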


Proceedings ArticleDOI
Peng Zhang, Jing Peng
23 Aug 2004
TL;DR: This paper applies support vector machines and regularized least squares to a collection of data sets and presents results demonstrating virtually identical performance by the two methods.
Abstract: Support vector machines (SVMs) and regularized least squares (RLS) are two recent promising techniques for classification. SVMs implement the structural risk minimization principle, use the kernel trick to extend it to the nonlinear case, and in general yield a sparse representation of solutions. RLS, by contrast, minimizes a regularized functional directly in a reproducing kernel Hilbert space defined by a kernel; while both have a sound mathematical foundation, RLS is strikingly simple. In addition, the performance of SVMs has been well documented, but little can be said of RLS. This paper applies these two techniques to a collection of data sets and presents results demonstrating virtually identical performance by the two methods.

358 citations
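
Both methods are a few lines in scikit-learn. Below is a hedged comparison sketch that treats RLS as kernel ridge regression on ±1 targets, evaluated on synthetic data rather than the paper's collection.

```python
# Hedged sketch: RLS as kernel ridge regression on +/-1 targets vs. an SVM,
# compared on a synthetic problem (not the paper's data sets).
from sklearn.datasets import make_classification
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf").fit(Xtr, ytr)
rls = KernelRidge(kernel="rbf", alpha=1.0).fit(Xtr, 2 * ytr - 1)  # regress on +/-1

print("SVM accuracy:", svm.score(Xte, yte))
print("RLS accuracy:", ((rls.predict(Xte) > 0).astype(int) == yte).mean())
```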


Journal ArticleDOI
TL;DR: The results show that the generalization error of FSVMs is comparable to that of other methods on benchmark datasets, and the proposed approach for automatically setting fuzzy memberships makes the FSVMs more applicable in reducing the effects of noise or outliers.

251 citations
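
Since only the TL;DR is shown, here is a rough analogue of the fuzzy-membership idea: per-sample weights passed to scikit-learn's SVC, derived here from distance to the class mean. This particular membership function is an assumption, not necessarily the paper's.

```python
# Rough FSVM analogue: fuzzy memberships emulated via per-sample weights in
# SVC; the distance-to-class-mean membership function is an assumption.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)
weights = np.empty(len(X))
for c in np.unique(y):
    d = np.linalg.norm(X[y == c] - X[y == c].mean(axis=0), axis=1)
    weights[y == c] = 1.0 - d / (d.max() + 1e-12)   # far points -> low membership

clf = SVC(kernel="rbf").fit(X, y, sample_weight=weights)
```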


Journal ArticleDOI
TL;DR: Results show that the SVM performs better than maximum likelihood, univariate decision tree and backpropagation neural network classifiers, even with small training data sets, and is almost unaffected by the Hughes phenomenon.

216 citations


Proceedings ArticleDOI
04 Jul 2004
TL;DR: This work casts linear classifiers into a probabilistic framework and develops a co-EM version of the Support Vector Machine, which conducts experiments on text classification problems and compares the family of semi-supervised support vector algorithms under different conditions, including violations of the assumptions underlying multi-view learning.
Abstract: Multi-view algorithms, such as co-training and co-EM, utilize unlabeled data when the available attributes can be split into independent and compatible subsets. Co-EM outperforms co-training for many problems, but it requires the underlying learner to estimate class probabilities, and to learn from probabilistically labeled data. Therefore, co-EM has so far only been studied with naive Bayesian learners. We cast linear classifiers into a probabilistic framework and develop a co-EM version of the Support Vector Machine. We conduct experiments on text classification problems and compare the family of semi-supervised support vector algorithms under different conditions, including violations of the assumptions underlying multi-view learning. For some problems, such as course web page classification, we observe the most accurate results reported so far.

202 citations
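
A simplified multi-view loop in the spirit of the paper is sketched below: two SVMs, one per attribute split, label the unlabeled pool for each other. Hard labels are used for brevity; true co-EM would weight the pseudo-labeled examples by the predicted class probabilities.

```python
# Simplified multi-view loop in the spirit of co-EM: SVMs trained on two
# attribute splits label the unlabeled pool for each other. Hard labels are
# used here for brevity; true co-EM would weight by predict_proba.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
views = [X[:, :10], X[:, 10:]]        # assumed independent, compatible splits
L, U = np.arange(30), np.arange(30, 300)

clf = SVC(probability=True).fit(views[0][L], y[L])   # bootstrap on view 0
for it in range(4):
    v = 1 - (it % 2)                                 # view trained this round
    pseudo = clf.predict(views[1 - v][U])            # labels from the other view
    clf = SVC(probability=True).fit(
        np.vstack([views[v][L], views[v][U]]),
        np.concatenate([y[L], pseudo]))
print("pool accuracy:", (clf.predict(views[0][U]) == y[U]).mean())
```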


Journal ArticleDOI
TL;DR: A meta-learning methodology that can select settings with low error while providing significant savings in time is proposed and applied to set the width of the Gaussian kernel.
Abstract: The Support Vector Machine algorithm is sensitive to the choice of parameter settings. If these are not set correctly, the algorithm may have a substandard performance. Suggesting a good setting is thus an important problem. We propose a meta-learning methodology for this purpose and exploit information about the past performance of different settings. The methodology is applied to set the width of the Gaussian kernel. We carry out an extensive empirical evaluation, including comparisons with other methods (fixed default ranking; selection based on cross-validation and a heuristic method commonly used to set the width of the SVM kernel). We show that our methodology can select settings with low error while providing significant savings in time. Further work should be carried out to see how the methodology could be adapted to different parameter setting tasks.

178 citations
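
The paper's meta-learning method is not reproduced here, but the kind of baseline it competes with, a heuristic setting of the Gaussian kernel width checked by cross-validation, is easy to sketch (the median-distance heuristic below is a common choice and may differ from the heuristic used in the paper).

```python
# A common width heuristic (not the paper's meta-learning method): set the
# RBF kernel width from the median pairwise distance, then check it by CV.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
sigma = np.median(pairwise_distances(X))
gamma = 1.0 / (2 * sigma ** 2)
print("CV accuracy:", cross_val_score(SVC(gamma=gamma), X, y, cv=5).mean())
```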


Proceedings ArticleDOI
22 Aug 2004
TL;DR: This paper proposes a simple generalization of SVM, the Weighted Margin SVM (WMSVM), that permits the incorporation of prior knowledge, and shows that Sequential Minimal Optimization can be used in training WMSVM.
Abstract: Like many purely data-driven machine learning methods, Support Vector Machine (SVM) classifiers are learned exclusively from the evidence presented in the training dataset; thus a larger training dataset is required for better performance. In some applications, there might be human knowledge available that, in principle, could compensate for the lack of data. In this paper, we propose a simple generalization of SVM: the Weighted Margin SVM (WMSVM), which permits the incorporation of prior knowledge. We show that Sequential Minimal Optimization can be used in training WMSVM. We discuss the issues of incorporating prior knowledge using this rather general formulation. The experimental results show that the proposed method of incorporating prior knowledge is effective.

Journal ArticleDOI
TL;DR: Experimental results on simulated and real-world data sets indicate that the approach works well even on large data sets, and has the advantages of Bayesian methods for model adaptation and error bars of its predictions.
Abstract: In this paper, we use a unified loss function, called the soft insensitive loss function, for Bayesian support vector regression. We follow standard Gaussian processes for regression to set up the Bayesian framework, in which the unified loss function is used in the likelihood evaluation. Under this framework, the maximum a posteriori estimate of the function values corresponds to the solution of an extended support vector regression problem. The overall approach has the merits of support vector regression such as convex quadratic programming and sparsity in solution representation. It also has the advantages of Bayesian methods for model adaptation and error bars of its predictions. Experimental results on simulated and real-world data sets indicate that the approach works well even on large data sets.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: An algorithm called ClusterSVM is proposed that accelerates the training process by exploiting the distributional properties of the training data, that is, the natural clustering of the training data and the overall layout of these clusters relative to the decision boundary of support vector machines.
Abstract: Training support vector machines involves a huge optimization problem and many specially designed algorithms have been proposed. In this paper, we propose an algorithm called ClusterSVM that accelerates the training process by exploiting the distributional properties of the training data, that is, the natural clustering of the training data and the overall layout of these clusters relative to the decision boundary of support vector machines. The proposed algorithm first partitions the training data into several pair-wise disjoint clusters. Then, the representatives of these clusters are used to train an initial support vector machine, based on which we can approximately identify the support vectors and non-support vectors. After replacing each cluster containing only non-support vectors with its representative, the number of training samples can be significantly reduced, thereby speeding up the training process. The proposed ClusterSVM has been tested against the popular training algorithm SMO on both artificial data and real data, and a significant speedup was observed. The complexity of ClusterSVM scales with the square of the number of support vectors and, after a further improvement, it is expected to scale with the square of the number of non-boundary support vectors.
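
A hedged sketch of the idea follows: k-means representatives train an initial SVM, and only clusters near the resulting boundary keep their full membership before retraining. The cluster count and margin threshold below are assumptions.

```python
# Sketch of the ClusterSVM idea: train an initial SVM on k-means centers,
# keep full clusters only where they may touch the decision boundary.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)

# 1) Cluster each class and train an initial SVM on the representatives.
centers, labels = [], []
for c in (0, 1):
    km = KMeans(n_clusters=20, n_init=5, random_state=0).fit(X[y == c])
    centers.append(km.cluster_centers_)
    labels += [c] * 20
init = SVC(kernel="linear").fit(np.vstack(centers), labels)

# 2) Keep full data only near the boundary; elsewhere the representatives
#    stand in for their clusters, shrinking the final training set.
margin = np.abs(init.decision_function(X)) < 1.5    # threshold is an assumption
Xr = np.vstack([X[margin], np.vstack(centers)])
yr = np.concatenate([y[margin], labels])
final = SVC(kernel="linear").fit(Xr, yr)
print("reduced set:", len(Xr), "of", len(X))
```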

01 Jan 2004
TL;DR: This work proposes the use of clustering techniques such as k-means to find initial clusters that are further altered to identify non-relevant samples in deciding the decision boundary for SVM, reducing the number of training samples for SVM without degrading the classification result.
Abstract: Support Vector Machines (SVMs) have gained wide acceptance because of their high generalization ability for a wide range of classification applications. Although SVMs have shown potential and promising performance in classification, they have been limited by speed, particularly when the training data set is large. The hyperplane constructed by an SVM depends on only a portion of the training samples called support vectors that lie close to the decision boundary (hyperplane). Thus, removing any training samples that are not relevant to support vectors might have no effect on building the proper decision function. We propose the use of clustering techniques such as k-means to find initial clusters that are further altered to identify non-relevant samples in deciding the decision boundary for SVM. This helps reduce the number of training samples for SVM without degrading the classification result.

Proceedings ArticleDOI
04 Jul 2004
TL;DR: A sparse Bayesian regression method for recovering 3D human body motion directly from silhouettes extracted from monocular video sequences, and demonstrates the method on a 54-parameter full body pose model.
Abstract: We describe a sparse Bayesian regression method for recovering 3D human body motion directly from silhouettes extracted from monocular video sequences. No detailed body shape model is needed, and realism is ensured by training on real human motion capture data. The tracker estimates 3D body pose by using Relevance Vector Machine regression to combine a learned autoregressive dynamical model with robust shape descriptors extracted automatically from image silhouettes. We studied several different combination methods, the most effective being to learn a nonlinear observation-update correction based on joint regression with respect to the predicted state and the observations. We demonstrate the method on a 54-parameter full body pose model, both quantitatively using motion capture based test sequences, and qualitatively on a test video sequence.

01 Jan 2004
TL;DR: A novel method involving classifying translations as machine- or human-produced rather than directly predicting numerical human judgments eliminates the need for labor-intensive user studies as a source of training data and is shown to significantly improve upon current automatic metrics.
Abstract: The problem of evaluating machine translation (MT) systems is more challenging than it may first appear, as diverse translations can often be considered equally correct. The task is even more difficult when practical circumstances require that evaluation be done automatically over short texts, for instance, during incremental system development and error analysis. While several automatic metrics, such as BLEU, have been proposed and adopted for large-scale MT system discrimination, they all fail to achieve satisfactory levels of correlation with human judgments at the sentence level. Here, a new class of metrics based on machine learning is introduced. A novel method involving classifying translations as machine- or human-produced rather than directly predicting numerical human judgments eliminates the need for labor-intensive user studies as a source of training data. The resulting metric, based on support vector machines, is shown to significantly improve upon current automatic metrics, increasing correlation with human judgments at the sentence level halfway toward that achieved by an independent human evaluator.

Proceedings ArticleDOI
26 Oct 2004
TL;DR: In the one-class scenario the distance-based methods are superior, while in the two-class scenario the SVM-based method outperforms the other methods.
Abstract: Learning strategies and classification methods for verification of signatures from scanned documents are proposed and evaluated. The learning strategies considered are writer-independent (those that learn from a set of signature samples, including forgeries, prior to enrollment of a writer) and writer-dependent (those that learn only from a newly enrolled individual). Classification methods considered include two distance-based methods (one based on a threshold, which is the standard method of signature verification and biometrics, and the other based on a distance probability distribution), a Naive Bayes (NB) classifier based on pairs of feature bit values, and a support vector machine (SVM). Two scenarios are considered for the writer-dependent case: (i) without forgeries (a one-class problem) and (ii) with forgery samples available (a two-class problem). The features used to characterize a signature capture local geometry, stroke and topology information in the form of a binary vector. In the one-class scenario the distance-based methods are superior, while in the two-class scenario the SVM-based method outperforms the other methods.
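
The two settings can be sketched on synthetic binary feature vectors: a Hamming-distance threshold for the one-class case next to an SVM for the two-class case. The flip rates and the threshold calibration below are assumptions, not the paper's protocol.

```python
# Sketch of the two settings on synthetic binary feature vectors: a Hamming
# distance threshold for the one-class case, an SVM when forgeries exist.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
template = rng.integers(0, 2, 512)
genuine = template ^ (rng.random((40, 512)) < 0.05)   # ~5% bit flips
forged = template ^ (rng.random((40, 512)) < 0.30)    # ~30% bit flips

# One-class: accept if Hamming distance to the template is under a threshold
# calibrated on enrollment samples (the calibration rule is an assumption).
dist = (genuine[:20] != template).sum(axis=1)
thresh = dist.mean() + 3 * dist.std()
accept = (genuine[20:] != template).sum(axis=1) < thresh

# Two-class: an SVM over the raw bit vectors when forgeries are available.
Xtr = np.vstack([genuine[:20], forged[:20]]).astype(float)
clf = SVC(kernel="rbf").fit(Xtr, [1] * 20 + [0] * 20)
Xte = np.vstack([genuine[20:], forged[20:]]).astype(float)
print(accept.mean(), clf.score(Xte, [1] * 20 + [0] * 20))
```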

01 Jan 2004
TL;DR: In this paper, an approach for learning-based rule-extraction from support vector machines is outlined, including an evaluation of the quality of the extracted rules in terms of fidelity, accuracy, consistency and comprehensibility.
Abstract: Over the last decade, rule-extraction from neural networks (ANN) techniques have been developed to explain how classification and regression are realised by the ANN. Yet, this is not the case for support vector machines (SVMs) which also demonstrate an inability to explain the process by which a learning result was reached and why a decision is being made. Rule-extraction from SVMs is important, especially for applications such as medical diagnosis. In this paper, an approach for learning-based rule-extraction from support vector machines is outlined, including an evaluation of the quality of the extracted rules in terms of fidelity, accuracy, consistency and comprehensibility. In addition, the rules are verified by use of knowledge from the problem domains as well as other classification techniques to assure correctness and validity.

Journal ArticleDOI
TL;DR: The proposed MSVM in addition provides a unifying framework when there are either equal or unequal misclassification costs, and when there is a possibly nonrepresentative training set.
Abstract: Two-category support vector machines (SVMs) have become very popular in the machine learning community for classification problems and have recently been shown to have good optimality properties for classification purposes. Treating multicategory problems as a series of binary problems is common in the SVM paradigm. However, this approach may fail under a variety of circumstances. The multicategory support vector machine (MSVM), which extends the binary SVM to the multicategory case in a symmetric way, and has good theoretical properties, has recently been proposed. The proposed MSVM in addition provides a unifying framework when there are either equal or unequal misclassification costs, and when there is a possibly nonrepresentative training set. Illustrated herein is the potential of the MSVM as an efficient cloud detection and classification algorithm for use in Earth Observing System models, which require knowledge of whether or not a radiance profile is cloud free. If the profile is not cloud free...

Proceedings ArticleDOI
25 Jul 2004
TL;DR: This work proposes a novel technique to formulate the relevance feedback based on a modified SVM called biased support vector machine (Biased SVM or BSVM) for solving the unbalanced dataset problem.
Abstract: Recently, support vector machines (SVMs) have been applied to relevance feedback tasks in content-based image retrieval. Typical SVM approaches treat the relevance feedback as a strict binary classification problem. However, these approaches do not consider an important issue of relevance feedback, i.e. the unbalanced dataset problem, in which the negative instances largely outnumber the positive instances. To solve this problem, we propose a novel technique to formulate the relevance feedback based on a modified SVM called the biased support vector machine (Biased SVM or BSVM). Mathematical formulation and explanations are provided to show its advantages. Experiments are conducted to evaluate the performance of our algorithms, and the promising results demonstrate the effectiveness of our techniques.
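
A rough stand-in for handling such skewed feedback sets is class weighting in a standard SVC; this is not the paper's BSVM formulation, only a sketch of the imbalance problem it targets.

```python
# Rough stand-in for the unbalanced feedback problem: class weighting in a
# standard SVC (not the paper's Biased SVM formulation).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# ~5% positive (relevant) vs. ~95% negative feedback, as in relevance feedback.
X, y = make_classification(n_samples=600, weights=[0.95], flip_y=0,
                           random_state=0)
for name, clf in [("plain", SVC(kernel="rbf")),
                  ("balanced", SVC(kernel="rbf", class_weight="balanced"))]:
    print(name, cross_val_score(clf, X, y, cv=5, scoring="f1").mean())
```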

Proceedings ArticleDOI
07 Oct 2004
TL;DR: This work introduces novel methods for feature selection (FS) based on support vector machines (SVM) that combine feature subsets produced by a variant of SVM-RFE, a popular feature ranking/selection algorithm based on SVM.
Abstract: This work introduces novel methods for feature selection (FS) based on support vector machines (SVM). The methods combine feature subsets produced by a variant of SVM-RFE, a popular feature ranking/selection algorithm based on SVM. Two combination strategies are proposed: union of features occurring frequently, and ensemble of classifiers built on single feature subsets. The resulting methods are applied to pattern proteomic data for tumor diagnostics. Results of experiments on three proteomic pattern datasets indicate that combining feature subsets affects positively the prediction accuracy of both SVM and SVM-RFE. A discussion about the biological interpretation of selected features is provided.
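
The frequency-union strategy can be sketched as follows: run a linear-SVM RFE on bootstrap resamples and keep features selected in at least half the runs. The resample count, subset size, and cutoff are assumptions; the ensemble-of-classifiers variant is omitted.

```python
# Sketch of the frequency-union strategy: linear-SVM RFE on bootstrap
# resamples, keeping features selected often (parameters are assumptions).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)
rng = np.random.default_rng(0)
counts = np.zeros(X.shape[1])
for _ in range(20):
    idx = rng.integers(0, len(X), len(X))                 # bootstrap resample
    rfe = RFE(SVC(kernel="linear"),
              n_features_to_select=10).fit(X[idx], y[idx])
    counts += rfe.support_
frequent = np.where(counts >= 10)[0]                      # in >= half the runs
print("selected features:", frequent)
```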

Book ChapterDOI
26 Sep 2004
TL;DR: It is shown that the Gaussian kernel function combined with an optimal choice of parameters can produce high classification accuracy in a Support Vector Machines system.
Abstract: The classification of normal and malignant colon tissue cells is crucial to the diagnosis of colon cancer in humans. Given the right set of feature vectors, Support Vector Machines (SVMs) have been shown to perform reasonably well for the classification [4,13]. In this paper, we address the following question: how does the choice of a kernel function and its parameters affect the SVM classification performance in such a system? We show that the Gaussian kernel function combined with an optimal choice of parameters can produce high classification accuracy.
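
Kernel-parameter sensitivity of this kind is usually probed with a plain grid search; a generic sketch follows, using a public dataset as a stand-in for the tissue data and an arbitrary parameter grid.

```python
# Probing Gaussian-kernel parameter sensitivity with a plain grid search
# (a generic sketch, not the paper's exact protocol or data).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # stand-in for the tissue data
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100],
                     "gamma": [1e-4, 1e-3, 1e-2, 1e-1]},
                    cv=5).fit(X, y)
print(grid.best_params_, grid.best_score_)
```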

Journal ArticleDOI
TL;DR: An efficient method for computing the leave-one-out (LOO) error for support vector machines (SVMs) with Gaussian kernels quite accurately and has good promise for use in hyperparameter tuning and model comparison.
Abstract: In this paper, we give an efficient method for computing the leave-one-out (LOO) error for support vector machines (SVMs) with Gaussian kernels quite accurately. It is particularly suitable for iterative decomposition methods of solving SVMs. The importance of various steps of the method is illustrated in detail by showing the performance on six benchmark datasets. The new method often leads to speedups of 10-50 times compared to standard LOO error computation. It has good promise for use in hyperparameter tuning and model comparison.
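
For context, the standard leave-one-out computation that the paper's method speeds up fits one SVM per held-out point; a generic sketch:

```python
# The standard (expensive) leave-one-out computation the paper accelerates:
# one SVM fit per held-out point.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)
loo_error = 1 - cross_val_score(SVC(kernel="rbf"), X, y, cv=LeaveOneOut()).mean()
print("LOO error:", loo_error)
```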

Proceedings ArticleDOI
07 Oct 2004
TL;DR: This work provides an interesting new mechanism to address complex classification problems, which are common in medical or biological information processing applications, by building a sequence of information granules and then building a support vector machine in each information granule.
Abstract: We propose a new learning model called granular support vector machines for data classification problems. Granular support vector machines systematically and formally combine principles from statistical learning theory and granular computing theory. The model works by building a sequence of information granules and then building a support vector machine in each information granule. In this paper, we also give a simple but efficient implementation method for modeling a granular support vector machine by building just two information granules in a top-down way (that is, halving the whole feature space). The hyperplane used to halve the feature space is selected by extending the statistical margin maximization principle. The experimental results on three medical binary classification problems show that finding the splitting hyperplane is not a trivial task. For some datasets and some kernel functions, granular support vector machines with two information granules achieve some improvement in testing accuracy, but for some other datasets, building one single support vector machine in the whole feature space gives slightly better performance. How to get the optimal information granules is still an open problem. The important point is that the granular support vector machines proposed in this work provide an interesting new mechanism for addressing complex classification problems, which are common in medical or biological information processing applications.
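
A toy version of the two-granule case is sketched below: halve the feature space with a simple hyperplane and train one SVM per half. The median split used here is an assumption; the paper selects the splitting hyperplane by extending the margin maximization principle.

```python
# Toy two-granule version: halve the feature space with a simple hyperplane
# (a median split on one feature, an assumption), one RBF SVM per half.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

cut = np.median(Xtr[:, 0])                    # the splitting hyperplane
left = Xtr[:, 0] < cut
svms = {False: SVC().fit(Xtr[~left], ytr[~left]),
        True: SVC().fit(Xtr[left], ytr[left])}

te_left = Xte[:, 0] < cut                     # route test points by the split
pred = np.where(te_left, svms[True].predict(Xte), svms[False].predict(Xte))
print("accuracy:", (pred == yte).mean())
```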

01 Jan 2004
TL;DR: In this paper, a rule extraction from support vector machines (SVM) is presented, where the output rule sets are verified against available knowledge for the domain problem (e.g. a medical expert), and other classification techniques, to assure correctness and validity of rules.
Abstract: In recent years, support vector machines (SVMs) have shown good performance in a number of application areas, including text classification. However, the success of SVMs comes at a cost - an inability to explain the process by which a learning result was reached and why a decision is being made. Rule-extraction from SVMs is important for the acceptance of this machine learning technology, especially for applications such as medical diagnosis. It is crucial for the users to understand how the system makes a decision. In this paper, a novel approach for rule-extraction from support vector machines is presented. This approach handles rule-extraction as a learning task, which proceeds in two steps. The first is to use the labeled patterns from a data set to train an SVM. The second step is to use the generated model to predict the label (class) for an extended data set or different, unlabeled data set. The resulting patterns are then used to train a decision tree learning system and to extract the corresponding rule sets. The output rule sets are verified against available knowledge for the domain problem (e.g. a medical expert), and other classification techniques, to assure correctness and validity of rules.
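
The two-step procedure maps naturally onto scikit-learn: label data with the trained SVM, fit a decision tree to the SVM's predictions, and read off the rules. A minimal sketch, using a public dataset in place of the paper's text data:

```python
# The two-step rule-extraction procedure sketched with scikit-learn: label
# data with the trained SVM, fit a decision tree to those predictions, and
# read off the rule set.
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
svm = SVC(kernel="rbf").fit(X, y)        # step 1: train the SVM
pseudo = svm.predict(X)                  # step 2: relabel (extended set = X here)
tree = DecisionTreeClassifier(max_depth=3).fit(X, pseudo)
print(export_text(tree))                 # the extracted rule set
```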

Book ChapterDOI
13 Dec 2004
TL;DR: This paper presents a time series classification system based on boosting very simple classifiers, each formed by a single literal over temporal intervals.
Abstract: In previous works, a time series classification system based on boosting very simple classifiers was presented. Each base classifier is formed by a single literal, and the literals used are based on temporal intervals.
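
A hedged analogue of boosting one-literal classifiers: each "literal" becomes a depth-1 stump over an interval-average feature of the series. The interval scheme and data below are assumptions, not the paper's predicates.

```python
# Hedged analogue of boosting one-literal classifiers: each "literal" is a
# depth-1 stump over an interval-average feature of the time series.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
series = rng.standard_normal((200, 64))
series[100:, 20:30] += 1.0                  # class 1 bumps in one interval
y = np.repeat([0, 1], 100)

# Interval-mean features: one column per dyadic interval of the series.
intervals = [(a, a + w) for w in (8, 16, 32) for a in range(0, 64, w)]
F = np.stack([series[:, a:b].mean(axis=1) for a, b in intervals], axis=1)

clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)
print("train accuracy:", clf.fit(F, y).score(F, y))
```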

Proceedings ArticleDOI
27 Jun 2004
TL;DR: A direct Bayesian based support vector machine is developed by combining the Bayesian analysis with the SVM, including the one-versus-all method, the hierarchical agglomerative clustering based method, and the adaptive clustering method.
Abstract: In this paper, we first develop a direct Bayesian based support vector machine by combining Bayesian analysis with the SVM. Unlike traditional SVM-based face recognition methods that need to train a large number of SVMs, the direct Bayesian SVM needs only one SVM, trained to classify the face difference between intra-personal variation and extra-personal variation. However, the added simplicity means that the method has to separate two complex subspaces by a single hyperplane, which affects the recognition accuracy. In order to improve the recognition performance, we develop three more Bayesian based SVMs, including the one-versus-all method, the hierarchical agglomerative clustering based method, and the adaptive clustering method. We show the improvement of the new algorithms over traditional subspace methods through experiments on two face databases, the FERET database and the XM2VTS database.

Book ChapterDOI
19 Aug 2004
TL;DR: In this article, the authors prove the universal approximation property of the SVM with an RBF kernel for arbitrary functions on a compact set and extend it to the approximation of discrete functions.
Abstract: The SVM has been used successfully for nonlinear function mapping, but its universal approximation property has never been proved in theory. This paper proves the universal approximation property of the SVM with an RBF kernel for arbitrary functions on a compact set and extends it to the approximation of discrete functions. Simulations show that the RBF kernel based LS-SVM is effective in nonlinear function estimation, is robust to noise, and has high generalization ability.
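
LS-SVM regression with an RBF kernel reduces to a single linear system, which makes the function-estimation setting easy to sketch in numpy; the kernel width and regularization value below are arbitrary choices, not the paper's.

```python
# LS-SVM regression with an RBF kernel reduces to one linear system (a numpy
# sketch; gamma and the kernel width are arbitrary choices here).
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 50)[:, None]
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(50)

def rbf(A, B, sigma=0.5):
    return np.exp(-((A - B.T) ** 2) / (2 * sigma**2))

gamma = 100.0
n = len(X)
K = rbf(X, X)
# Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
              [np.ones((n, 1)), K + np.eye(n) / gamma]])
sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
b, alpha = sol[0], sol[1:]

Xs = np.linspace(0, 2 * np.pi, 200)[:, None]
pred = rbf(Xs, X) @ alpha + b          # smooth estimate of the noisy sine
```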

Dissertation
01 Jan 2004
TL;DR: This thesis modifies this method for training generalized linear models by automatically adapting the width of the basis functions to the optimum for the data at hand, and tries the Adaptive RVM for prediction on the chaotic Mackey-Glass time series.
Abstract: This thesis is concerned with Gaussian Processes (GPs) and Relevance Vector Machines (RVMs), both of which are particular instances of probabilistic linear models. We look at both models from a Bayesian perspective, and are forced to adopt an approximate Bayesian treatment to learning for two reasons. The first reason is the analytical intractability of the full Bayesian treatment and the fact that we in principle do not want to resort to sampling methods. The second reason, which incidentally justifies our not wanting to sample, is that we are interested in computationally efficient models. Computational efficiency is obtained through sparseness: sparse linear models have a significant number of their weights set to zero. For the RVM, which we treat in Chap. 2, we show that it is precisely the particular choice of Bayesian approximation that enforces sparseness. Probabilistic models have the important property of producing predictive distributions instead of point predictions. We also show that the resulting sparse probabilistic model implies counterintuitive priors over functions, and ultimately inappropriate predictive variances; the model is more certain about its predictions the further away they are from the training data. We propose the RVM*, a modified RVM that provides significantly better predictive uncertainties. RVMs happen to be a particular case of GPs, the latter having superior performance and being non-sparse non-parametric models. For completeness, in Chap. 3 we study a particular family of approximations to Gaussian Processes, Reduced Rank Gaussian Processes (RRGPs), which take the form of finite extended linear models; we show that GPs are in general equivalent to infinite extended linear models. We also show that RRGPs result in degenerate GPs, which suffer, like RVMs, from inappropriate predictive variances. We solve this problem by proposing a modification of the classic RRGP approach, in the same guise as the RVM*. In the last part of this thesis we move on to the problem of uncertainty in the inputs. Indeed, these were until now considered deterministic, as is common use. We derive the equations for predicting at an uncertain input with GPs and RVMs, and use this to propagate the uncertainty in recursive multi-step ahead time-series predictions. This allows us to obtain sensible predictive uncertainties when recursively predicting k steps ahead, while standard approaches that ignore the accumulated uncertainty are way overconfident. Finally we explore a much harder problem: that of training with uncertain inputs. We explore approximating the full Bayesian treatment, which implies an analytically intractable integral. We propose two preliminary approaches. The first one tries to "guess" the unknown "true" inputs, and requires careful optimisation to avoid over-fitting. It also requires prior knowledge of the output noise, which is limiting. The second approach consists in sampling from the inputs posterior, and optimising the hyperparameters. Sampling has the effect of severely increasing the computational cost, which again is limiting. However, the success in toy experiments is exciting, and should motivate future research.

Journal ArticleDOI
01 Dec 2004
TL;DR: The problem of feature selection is a difficult combinatorial task in Machine Learning and of high practical relevance, e.g. in bioinformatics, and Genetic Algorithms offer a natural way to solve it.
Abstract: The problem of feature selection is a difficult combinatorial task in Machine Learning and of high practical relevance, e.g. in bioinformatics. Genetic Algorithms (GAs) offer a natural way to solve it.
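
A compact GA over feature bit-masks with cross-validated accuracy as fitness is easy to sketch; the population size, selection scheme, and mutation rate below are arbitrary choices, not the article's operators.

```python
# Compact GA over feature bit-masks with CV accuracy as fitness (a generic
# sketch; population size, rates, and operators are arbitrary choices).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=30, n_informative=4,
                           random_state=0)
rng = np.random.default_rng(0)
pop = rng.random((20, X.shape[1])) < 0.5                 # 20 random masks

def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

for _ in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]              # truncation selection
    cut = rng.integers(1, X.shape[1], size=10)
    kids = np.array([np.concatenate([parents[i][:c], parents[(i + 1) % 10][c:]])
                     for i, c in enumerate(cut)])        # one-point crossover
    kids ^= rng.random(kids.shape) < 0.02                # bit-flip mutation
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected", best.sum(), "features")
```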