Proceedings ArticleDOI

Greedy Gaussian Process Regression Applied to Object Categorization and Regression

TL;DR: This work proposes an approximation of Gaussian Processes and applies it to classification and regression tasks, using a greedy approach to subset selection and inducing-input choice to approximate the kernel matrix, resulting in faster retrieval times.
Abstract: In this work we propose an approximation of Gaussian Processes and apply it to classification and regression tasks. We primarily target the problem of visual object categorization using a greedy variant of Gaussian Processes. To deal with the prohibitive training and inference cost of GPs, we devise a greedy approach to subset selection and inducing-input choice to approximate the kernel matrix, resulting in faster retrieval times. A localized combination of kernel functions is designed and used in a framework of sparse approximations to Gaussian Processes for visual object categorization and generic regression tasks. Through exhaustive experimentation and empirical results we demonstrate the effectiveness of the proposed approach compared with other kernel-based methods.
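The listing gives no implementation details for the greedy selection it mentions; the sketch below is one plausible reading, choosing inducing inputs one at a time by the largest unexplained kernel diagonal under the current low-rank approximation. The RBF kernel, the selection criterion, and all names are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def greedy_inducing_subset(X, m, jitter=1e-6):
    """Greedily pick m inducing inputs from the rows of X.

    At each step the point whose kernel diagonal is worst explained by the
    current low-rank approximation (largest K_ii - Q_ii) is added.  This is
    an assumed criterion, not necessarily the one used in the paper.
    """
    n = X.shape[0]
    chosen = []
    residual = np.ones(n)                 # K_ii - Q_ii; K_ii = 1 for the RBF kernel
    for _ in range(m):
        i = int(np.argmax(residual))
        chosen.append(i)
        Xu = X[chosen]
        Kuu = rbf_kernel(Xu, Xu) + jitter * np.eye(len(chosen))
        Kfu = rbf_kernel(X, Xu)
        Q_diag = np.sum(Kfu @ np.linalg.inv(Kuu) * Kfu, axis=1)
        residual = 1.0 - Q_diag
        residual[chosen] = -np.inf        # never pick the same point twice
    return np.array(chosen)
```

A full method would still need the classification likelihood and the localized kernel combination the abstract mentions; this only illustrates the subset-selection step.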
Citations
Journal ArticleDOI
TL;DR: A new regularized kernel least squares algorithm based on a fixed-budget approximation of the kernel matrix allows the computational burden of the identification algorithm to be regulated while keeping the approximation error small.
Abstract: The paper considers the identification of nonlinear dynamic processes using kernel algorithms. Kernel algorithms rely on a nonlinear transformation of the input data points into a high-dimensional space, which allows nonlinear problems to be solved by constructing kernelized counterparts of linear methods, replacing inner products with kernels. A key difficulty of kernel algorithms is the high cost of inverting the kernel matrix. There are currently two approaches to this problem. The first uses a reduced training sample instead of the full one; for kernel methods this can cause model misspecification, since kernel methods are built directly on the training data. The second relies on reduced-rank approximations of the kernel matrix; its major limitation is that the rank of the approximation is either unknown until the approximation is computed or must be predefined by the user, neither of which is efficient. In this paper, we propose a new regularized kernel least squares algorithm based on a fixed-budget approximation of the kernel matrix. The proposed algorithm allows the computational burden of the identification algorithm to be regulated while keeping the approximation error small. We present simulation results illustrating the efficiency of the proposed algorithm compared to other algorithms. The application of the proposed algorithm is demonstrated on the problem of identifying the input and output pressure of a pump station.
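The abstract above does not give formulas, so the following is only a hedged sketch of the general idea of a fixed-budget, reduced-rank kernel least squares regressor: an m-point Nyström feature map followed by ridge regression, so that memory and training cost are capped by the budget m. The RBF kernel, the uniform random choice of landmark points, and all names are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def fixed_budget_krr(X, y, m=50, lam=1e-2, jitter=1e-8, seed=0):
    """Kernel ridge regression with an m-point Nystrom approximation.

    The budget m caps the O(n m^2) training cost and keeps per-test-point
    prediction cost dependent only on m, not on the training set size n.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
    Xu = X[idx]                                      # landmark ("budget") points
    Kuu = rbf_kernel(Xu, Xu) + jitter * np.eye(len(idx))
    Knu = rbf_kernel(X, Xu)                          # n x m cross-kernel
    L = np.linalg.cholesky(Kuu)
    Phi = np.linalg.solve(L, Knu.T).T                # Nystrom features, n x m
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])     # regularized normal equations
    w = np.linalg.solve(A, Phi.T @ y)

    def predict(Xs):
        Phis = np.linalg.solve(L, rbf_kernel(Xs, Xu).T).T
        return Phis @ w

    return predict

# toy usage on a synthetic nonlinear identification problem
X = np.random.randn(500, 2)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(500)
predict = fixed_budget_krr(X, y, m=40)
```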
References
Journal ArticleDOI
TL;DR: The incremental algorithm is compared experimentally to an earlier batch Bayesian algorithm as well as to one based on maximum likelihood; the methods have comparable classification performance on small training sets, but incremental learning is significantly faster, making real-time learning feasible.

2,597 citations


"Greedy Gaussian Process Regression ..." refers methods in this paper

  • ...We conduct supervised learning experiments on the Caltech 101 dataset ([6]), which consists of 101 different classes of images, with 31 to 800 samples per class....

    [...]

  • ...Despite using only two kernels, we have matched the performance of other multiple kernel-based methods [3], [4] and [6]....

    [...]

Journal ArticleDOI
TL;DR: A new unifying view encompassing all existing proper probabilistic sparse approximations for Gaussian process regression relies on expressing the effective prior each method uses, and highlights the relationships between existing methods.
Abstract: We provide a new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression. Our approach relies on expressing the effective prior which the methods are using. This allows new insights to be gained, and highlights the relationship between existing methods. It also allows for a clear theoretically justified ranking of the closeness of the known approximations to the corresponding full GPs. Finally we point directly to designs of new better sparse approximations, combining the best of the existing strategies, within attractive computational constraints.

1,881 citations


"Greedy Gaussian Process Regression ..." refers background or methods in this paper

  • ...FITC ([18]) imposes a further independence assumption and approximates the training conditional distribution as: $q_{\mathrm{FITC}}(f \mid U) = \prod_{i=1}^{n} p(f_i \mid U) = \mathcal{N}\left(K_{f,U} K_{U,U}^{-1}\,U,\ \mathrm{diag}[K_{f,f} - Q_{f,f}]\right)$....

    [...]

  • ...(7) The effective prior implied by FITC is given by $q_{\mathrm{FITC}}(f, f_*) = \mathcal{N}\!\left(0,\ \begin{pmatrix} Q_{f,f} - \mathrm{diag}[Q_{f,f} - K_{f,f}] & Q_{f,*} \\ Q_{*,f} & K_{*,*} \end{pmatrix}\right)$....

    [...]

  • ...FITC ([18]) imposes a further independence assumption and approximates the training conditional distribution as:...

    [...]

  • ...The corresponding predictive distribution as given by [18] is:...

    [...]

  • ...So even though the sparse formulation with m inducing inputs brings down the complexity to O(nm²), the presence of the n term makes it computationally prohibitive ([18])....

    [...]
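Putting the FITC expressions quoted above into code: with Q_{a,b} = K_{a,U} K_{U,U}^{-1} K_{U,b}, the FITC effective prior keeps the low-rank Q_{f,f} but corrects its diagonal back to the exact K_{f,f}. The sketch below builds that joint prior covariance over training and test inputs; the RBF kernel and all names are illustrative assumptions. Only the m x m matrix K_{U,U} is inverted, which is where the O(nm²) scaling mentioned in the last excerpt comes from.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def fitc_prior_cov(X, Xs, Xu, jitter=1e-8):
    """Joint FITC prior covariance over training inputs X and test inputs Xs,
    given inducing inputs Xu:

        q(f, f*) = N(0, [[Q_ff - diag(Q_ff - K_ff), Q_f*],
                         [Q_*f,                     K_**]])
    """
    Kuu = rbf_kernel(Xu, Xu) + jitter * np.eye(len(Xu))
    Kuu_inv = np.linalg.inv(Kuu)
    Kfu, Ksu = rbf_kernel(X, Xu), rbf_kernel(Xs, Xu)
    Kff, Kss = rbf_kernel(X, X), rbf_kernel(Xs, Xs)

    Qff = Kfu @ Kuu_inv @ Kfu.T
    Qfs = Kfu @ Kuu_inv @ Ksu.T
    # FITC keeps the low-rank Qff but restores the exact diagonal of Kff
    top_left = Qff - np.diag(np.diag(Qff - Kff))
    return np.block([[top_left, Qfs],
                     [Qfs.T,    Kss]])
```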

Proceedings Article
05 Dec 2005
TL;DR: It is shown that this new Gaussian process (GP) regression model can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.
Abstract: We present a new Gaussian process (GP) regression model whose covariance is parameterized by the locations of M pseudo-input points, which we learn by a gradient based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O(M²N) training cost and O(M²) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.

1,708 citations


"Greedy Gaussian Process Regression ..." refers background or methods in this paper

  • ...There also exist sparsification methods ([8, 10, 22, 24]) that do not constrain the inducing inputs to be part of the input dataset....

    [...]

  • ...The Fully Independent Training Conditional (FITC) approximation, which was first proposed by Snelson and Ghahramani ([22]) as the Sparse Pseudo-input GP (SPGP), uses an approximate training conditional (7) but an exact test conditional (6)....

    [...]

  • ...Also, when the dimensionality of the data is large, the authors of [22] have pointed out that this optimization fails....

    [...]

  • ...Under the Bayesian framework, the non-parametric GP model provides a flexible and elegant method for non-linear regression [22, 28]....

    [...]

Proceedings ArticleDOI
17 Oct 2005
TL;DR: A new fast kernel function is presented which maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in this space; it is shown to be positive-definite, making it valid for use in learning algorithms whose optimal solutions are guaranteed only for Mercer kernels.
Abstract: Discriminative learning is challenging when examples are sets of features, and the sets vary in cardinality and lack any sort of meaningful ordering. Kernel-based classification methods can learn complex decision boundaries, but a kernel over unordered set inputs must somehow solve for correspondences, generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function which maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in this space. This "pyramid match" computation is linear in the number of features, and it implicitly finds correspondences based on the finest resolution histogram cell where a matched pair first appears. Since the kernel does not penalize the presence of extra features, it is robust to clutter. We show the kernel function is positive-definite, making it valid for use in learning algorithms whose optimal solutions are guaranteed only for Mercer kernels. We demonstrate our algorithm on object recognition tasks and show it to be accurate and dramatically faster than current approaches.
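As a rough illustration of the idea in the abstract, the sketch below implements a simplified pyramid match over one-dimensional feature sets: multi-resolution histograms, histogram intersection at each level, and a weight of 1/2^l on matches first appearing at level l. The bin layout and feature representation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def histogram(features, n_bins):
    """Hard-assign 1-D features in [0, 1) to n_bins equal-width bins."""
    idx = np.clip((features * n_bins).astype(int), 0, n_bins - 1)
    return np.bincount(idx, minlength=n_bins)

def pyramid_match_kernel(x, y, levels=4):
    """Simplified pyramid match between two unordered sets of 1-D features.

    Level l uses 2**(levels - l) bins (finest at l = 0); matches that first
    appear at level l contribute with weight 1 / 2**l.
    """
    score, prev_matches = 0.0, 0.0
    for l in range(levels):
        hx = histogram(x, 2 ** (levels - l))
        hy = histogram(y, 2 ** (levels - l))
        matches = np.minimum(hx, hy).sum()        # histogram intersection
        score += (matches - prev_matches) / (2 ** l)
        prev_matches = matches
    return score

# toy usage: two "images" described by unordered sets of scalar features
a, b = np.random.rand(40), np.random.rand(55)
print(pyramid_match_kernel(a, b))
```

The cost is linear in the number of features per set, which is the property the abstract highlights.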

1,669 citations


"Greedy Gaussian Process Regression ..." refers methods in this paper

  • ...Recently, promising discriminative methods like support vector machines and nearest-neighbour methods have been presented in [9, 26, 27, 32]....

    [...]

  • ...Method (author) / kernels / accuracy: SVM [9], 1 kernel, 43*; GP-PMK [12], 1 kernel, 53*; GS-MKL [31], 6 kernels, 65…

    [...]

Proceedings ArticleDOI
09 Jul 2007
TL;DR: This work introduces a descriptor that represents local image shape and its spatial layout, together with a spatial pyramid kernel that is designed so that the shape correspondence between two images can be measured by the distance between their descriptors using the kernel.
Abstract: The objective of this paper is classifying images by the object categories they contain, for example motorbikes or dolphins. There are three areas of novelty. First, we introduce a descriptor that represents local image shape and its spatial layout, together with a spatial pyramid kernel. These are designed so that the shape correspondence between two images can be measured by the distance between their descriptors using the kernel. Second, we generalize the spatial pyramid kernel, and learn its level weighting parameters (on a validation set). This significantly improves classification performance. Third, we show that shape and appearance kernels may be combined (again by learning parameters on a validation set). Results are reported for classification on Caltech-101 and retrieval on the TRECVID 2006 data sets. For Caltech-101 it is shown that the class-specific optimization that we introduce exceeds the state of the art performance by more than 10%.

1,496 citations


"Greedy Gaussian Process Regression ..." refers background or methods in this paper

  • ...(11) Here, kernel weights $w_1, w_2$ are set to constants, while $K_{ph}$ and $K_{gb}$ refer to kernels of our choice....

    [...]

  • ...Our similarity metric is given by $\mathrm{Sim}(i, j) = w_1 K_{ph}(i, j) + w_2 K_{gb}(i, j)$....

    [...]

  • ...Multiple kernels are usually more effective than any particular choice of a simple baseline [3], so we propose here a composite kernel function $K_C$ which is a linear combination of the PHOW kernel $K_{ph}$ [2] and a Geometric Blur kernel $K_{gb}$ [1, 32]....

    [...]

  • ...We achieve considerable computational speed-ups due to the sparse approximations incorporated, and are able to reproduce the results obtained through exact GP methods [2] and [4]....

    [...]
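The excerpts above describe a composite similarity Sim(i, j) = w1 K_ph(i, j) + w2 K_gb(i, j) with constant weights. The sketch below shows that generic pattern over precomputed Gram matrices; the PHOW and Geometric Blur feature extraction behind K_ph and K_gb is outside these excerpts, so the inputs here are placeholders.

```python
import numpy as np

def combine_kernels(kernel_matrices, weights):
    """Fixed-weight linear combination of precomputed Gram matrices.

    A non-negative combination of Mercer kernels is again a valid Mercer
    kernel, which is what lets a composite such as
        Sim(i, j) = w1 * K_ph(i, j) + w2 * K_gb(i, j)
    be used inside a GP or an SVM.
    """
    K = np.zeros_like(kernel_matrices[0], dtype=float)
    for w, Km in zip(weights, kernel_matrices):
        assert w >= 0, "weights must be non-negative to keep the kernel valid"
        K += w * Km
    return K

# hypothetical usage with two precomputed Gram matrices over the same images,
# e.g. a PHOW kernel and a Geometric Blur kernel:
# K_composite = combine_kernels([K_ph, K_gb], weights=[0.6, 0.4])
```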