Manifold Gaussian Processes for Regression
Frequently Asked Questions (15)
Q2. What is the common approximation to the full Bayesian framework?
A common approximation to the full Bayesian framework is to introduce a deterministic feature space H, and to find the mappings M and G in two consecutive steps.
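A minimal sketch of this two-step variant, assuming scikit-learn and using PCA as a stand-in for the deterministic mapping M (the data arrays are placeholders, not from the paper):

```python
# Two-step baseline: first fit a deterministic feature map M (here PCA as a
# stand-in), then fit a standard GP G on the transformed inputs H = M(X).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X_train = np.random.rand(50, 5)          # placeholder inputs, N x D
y_train = np.sin(3.0 * X_train[:, 0])    # placeholder targets

# Step 1: learn the mapping M into a Q-dimensional feature space H.
M = PCA(n_components=2).fit(X_train)
H_train = M.transform(X_train)

# Step 2: learn the GP regression G on the features, as in a standard GP.
G = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(H_train, y_train)

mean, std = G.predict(M.transform(np.random.rand(10, 5)), return_std=True)
```

Because M is fixed before G is trained, the feature space is not informed by the regression objective; the mGP removes this limitation by training both jointly.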
Q3. What is the effect of the mapping M?
Increasing the number of parameters of the mapping M intuitively leads to an increased flexibility in the learned covariance function.
Q4. What is the way to model a variety of functions?
Although the squared exponential covariance function can be applied to a wide range of problems, such generic covariance functions may be inadequate for modeling functions where the common smoothness assumptions are violated, such as at ground contacts in robot locomotion.
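As an illustration (not from the paper), the issue can be reproduced by fitting a standard squared-exponential GP to a step function, assuming scikit-learn:

```python
# Illustrative only: a standard squared-exponential (RBF) GP near a jump.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y = np.where(X.ravel() < 0.0, 0.0, 1.0)   # step function: smoothness violated at 0

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-4).fit(X, y)
mean = gp.predict(np.linspace(-1.0, 1.0, 200).reshape(-1, 1))
# The posterior mean smooths over the jump and can overshoot near x = 0,
# because the SE kernel assumes infinitely differentiable functions.
```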
Q5. what are the benefits of the mGP?
Applications that profit from the enhanced modeling capabilities of the mGP include robot modeling (e.g., contact and stiction modeling), reinforcement learning, and Bayesian optimization.
Q6. What is the main challenge of training mGPs using neural networks as mapping M?
One of the main challenges of training mGPs using neural networks as mapping M is the unwieldy joint optimization of the parameters θmGP.
Q7. What is the covariance function of a GP?
The covariance function of a GP implicitly encodes high-level assumptions about the underlying function to be modeled, e.g., smoothness or periodicity.
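For concreteness, here is a sketch of the textbook forms of two such covariance functions, assuming NumPy; k_se encodes smoothness and k_periodic encodes periodicity:

```python
import numpy as np

def k_se(x, xp, sigma_f=1.0, ell=1.0):
    # Squared-exponential kernel: encodes smooth (infinitely differentiable) functions.
    return sigma_f**2 * np.exp(-0.5 * (x - xp)**2 / ell**2)

def k_periodic(x, xp, sigma_f=1.0, ell=1.0, period=1.0):
    # Periodic kernel: encodes functions that repeat with the given period.
    return sigma_f**2 * np.exp(-2.0 * np.sin(np.pi * np.abs(x - xp) / period)**2 / ell**2)
```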
Q8. Why is the locomotion data set difficult?
The locomotion data set is highly challenging due to ground contacts, which cause the regression function to violate standard smoothness assumptions.
Q9. What is the effect of different frequencies in the feature space?
The presence of different frequencies is problematic for covariance functions such as the SE-ARD (squared exponential with automatic relevance determination), which assume a single frequency.
Q10. What is the main argument of MacKay?
MacKay (1998) argued that, unlike neural networks, which have been successfully used to extract complex features, GPs are unsuited for feature learning.
Q11. What is the way to replace the deterministic mapping?
The authors replace this deterministic mapping with a probabilistic one, which would describe the uncertainty about the location of the discontinuity.
Q12. What is the effect of learning the discontinuity in the feature space?
It is easier for the GP to learn the mapping G. Learning the discontinuity in the feature space is a direct result of jointly training M and G, as feature learning is embedded in the overall regression F.
Q13. What is the gradient of the kernel matrix?
The analytic gradients ∂NLML/∂θG of the objective with respect to the parameters θG are computed as in the standard GP, i.e.,

∂NLML(θmGP)/∂θG = ∂NLML(θmGP)/∂KθmGP · ∂KθmGP/∂θG.  (14)

The gradients with respect to the parameters θM of the feature mapping are computed by applying the chain rule,

∂NLML(θmGP)/∂θM = ∂NLML(θmGP)/∂KθmGP · ∂KθmGP/∂H · ∂H/∂θM,  (15)

where only ∂H/∂θM depends on the chosen input transformation M, while ∂KθmGP/∂H is the gradient of the kernel matrix with respect to the Q-dimensional GP training inputs H = M(X).
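In practice these chain-rule gradients can be obtained by automatic differentiation. Below is a minimal sketch assuming JAX, with a hypothetical one-layer tanh network as the mapping M and a squared-exponential kernel as G; it illustrates Equations (14) and (15), not the authors' implementation:

```python
import jax
import jax.numpy as jnp
from jax.scipy.linalg import cho_solve

def feature_map(theta_M, X):
    # Hypothetical one-layer network M: H = tanh(X W + b), giving N x Q features.
    W, b = theta_M
    return jnp.tanh(X @ W + b)

def kernel_matrix(theta_G, H, jitter=1e-6):
    # Squared-exponential kernel on the features, plus noise variance.
    log_sf, log_ell, log_sn = theta_G
    sqdist = jnp.sum((H[:, None, :] - H[None, :, :]) ** 2, axis=-1)
    K = jnp.exp(2.0 * log_sf) * jnp.exp(-0.5 * sqdist / jnp.exp(2.0 * log_ell))
    return K + (jnp.exp(2.0 * log_sn) + jitter) * jnp.eye(H.shape[0])

def nlml(theta, X, y):
    # Negative log marginal likelihood of the composite model G(M(X)).
    theta_M, theta_G = theta
    K = kernel_matrix(theta_G, feature_map(theta_M, X))
    L = jnp.linalg.cholesky(K)
    alpha = cho_solve((L, True), y)
    return (0.5 * y @ alpha
            + jnp.sum(jnp.log(jnp.diag(L)))
            + 0.5 * y.shape[0] * jnp.log(2.0 * jnp.pi))

# Autodiff applies exactly the chain rule of Equations (14) and (15):
# gradients flow through K into theta_G directly and into theta_M via H = M(X).
grad_nlml = jax.grad(nlml)
```

A single call to grad_nlml then yields gradients for both parameter groups, so θM and θG can be updated jointly by any gradient-based optimizer.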
Q14. What is the purpose of the supervised learning of the input and aGP?
Snelson and Ghahramani (2006) proposed supervised dimensionality reduction by jointly learning a linear transformation of the input and a GP.
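In the autodiff sketch above, this setup would correspond to replacing the nonlinear feature map with a purely linear one (illustrative only):

```python
def feature_map_linear(W, X):
    # Linear supervised dimensionality reduction: H = X W, with W learned
    # jointly with the GP hyperparameters under the same NLML objective.
    return X @ W
```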
Q15. What is the novel approach to learning a regression model?
The authors introduce Manifold Gaussian Processes (mGPs), their novel approach to jointly learning a regression model and a suitable feature representation of the data.