Book

Kernel Mean Embedding of Distributions: A Review and Beyond

TL;DR: The kernel mean embedding (KME) as discussed by the authors maps probability distributions into a reproducing kernel Hilbert space, and can be viewed as a generalization of the original feature map used in support vector machines (SVMs) and other kernel methods.
Abstract: A Hilbert space embedding of a distribution—in short, a kernel mean embedding—has recently emerged as a powerful tool for machine learning and statistical inference. The basic idea behind this framework is to map distributions into a reproducing kernel Hilbert space (RKHS) in which the whole arsenal of kernel methods can be extended to probability measures. It can be viewed as a generalization of the original “feature map” common to support vector machines (SVMs) and other kernel methods. In addition to the classical applications of kernel methods, the kernel mean embedding has found novel applications in fields ranging from probabilistic modeling to statistical inference, causal discovery, and deep learning. Kernel Mean Embedding of Distributions: A Review and Beyond provides a comprehensive review of existing work and recent advances in this research area, and discusses some of the most challenging issues and open problems that could potentially lead to new research directions. The targeted audience includes graduate students and researchers in machine learning and statistics who are interested in the theory and applications of kernel mean embeddings.
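As a concrete illustration of the embedding idea (a sketch, not code from the book): the empirical kernel mean embedding of a sample is the average of its kernel feature maps, and the RKHS distance between two embeddings yields the maximum mean discrepancy (MMD) between the underlying distributions. A minimal numpy version with a Gaussian RBF kernel; all names and parameter values are illustrative.

```python
# Minimal sketch of empirical kernel mean embeddings and the resulting MMD.
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian RBF kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq_dists / (2 * bandwidth**2))

def mmd2(X, Y, bandwidth=1.0):
    """Biased estimate of squared MMD, i.e. ||mu_P - mu_Q||^2 in the RKHS,
    where mu_P and mu_Q are the empirical kernel mean embeddings of X and Y."""
    Kxx = rbf_kernel(X, X, bandwidth)
    Kyy = rbf_kernel(Y, Y, bandwidth)
    Kxy = rbf_kernel(X, Y, bandwidth)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))   # sample from P
Y = rng.normal(0.5, 1.0, size=(200, 2))   # sample from Q
print(mmd2(X, Y))                          # larger values indicate the samples differ more
```

The biased estimator used here is simply the mean of the two within-sample kernel matrices minus twice the mean of the cross kernel matrix; an unbiased variant excludes the diagonal terms.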


Citations
Journal Article
TL;DR: This article reviews and critically discusses more than 24 quantitative and 5 qualitative measures for evaluating generative models, with a particular emphasis on GAN-derived models, and provides a set of 7 desiderata together with an evaluation of whether a given measure or family of measures is compatible with them.

505 citations

Posted Content
TL;DR: This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.
Abstract: The goal of domain generalization algorithms is to predict well on distributions different from those seen during training. While a myriad of domain generalization algorithms exist, inconsistencies in experimental conditions -- datasets, architectures, and model selection criteria -- render fair and realistic comparisons difficult. In this paper, we are interested in understanding how useful domain generalization algorithms are in realistic settings. As a first step, we realize that model selection is non-trivial for domain generalization tasks. Contrary to prior work, we argue that domain generalization algorithms without a model selection strategy should be regarded as incomplete. Next, we implement DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria. We conduct extensive experiments using DomainBed and find that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets. Looking forward, we hope that the release of DomainBed, along with contributions from fellow researchers, will streamline reproducible and rigorous research in domain generalization.
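The ERM baseline highlighted above simply pools all training domains and minimizes the average loss, ignoring domain identity. A minimal sketch of that baseline (assuming scikit-learn; the synthetic domains and labels are hypothetical, and this is not the DomainBed implementation):

```python
# Sketch of the ERM baseline for domain generalization: pool all training
# domains and fit a single classifier, then apply it to an unseen domain.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
shifts = (0.0, 0.3, 0.6)                                   # three hypothetical training domains
domains = [rng.normal(loc=mu, size=(100, 5)) for mu in shifts]
labels = [(d[:, 0] > mu).astype(int) for d, mu in zip(domains, shifts)]

X_pool = np.vstack(domains)            # ERM ignores which domain each point came from...
y_pool = np.concatenate(labels)        # ...and minimizes the average loss over the pooled data
clf = LogisticRegression(max_iter=1000).fit(X_pool, y_pool)

X_test = rng.normal(loc=1.0, size=(100, 5))                # an unseen test domain
print(clf.predict(X_test)[:10])
```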

492 citations

Posted Content
TL;DR: MMD GAN as discussed by the authors improves both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques as a replacement for the fixed Gaussian kernel in the original GMMN.
Abstract: Generative moment matching network (GMMN) is a deep generative model that differs from the Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on the kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques as a replacement for the fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD GAN. The new distance measure in MMD GAN is a meaningful loss that enjoys the advantage of weak topology and can be optimized via gradient descent with relatively small batch sizes. In our evaluation on multiple benchmark datasets, including MNIST, CIFAR-10, CelebA and LSUN, the performance of MMD GAN significantly outperforms GMMN, and is competitive with other representative GAN works.
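For intuition, MMD with a fixed Gaussian kernel is already a differentiable two-sample loss that a generator can minimize directly; that is the GMMN setting which MMD GAN extends by learning the kernel adversarially. A short PyTorch sketch of the fixed-kernel case (the architecture, bandwidth, and data are illustrative, not the paper's setup):

```python
# Sketch: MMD with a fixed Gaussian kernel used as a differentiable generator loss.
import torch
import torch.nn as nn

def gaussian_mmd2(x, y, bandwidth=1.0):
    """Biased squared-MMD estimate between two batches of samples."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

real = torch.randn(256, 2) * 0.5 + 1.0        # stand-in for the data distribution
for step in range(100):
    noise = torch.randn(256, 8)
    fake = generator(noise)
    loss = gaussian_mmd2(fake, real)          # generator minimizes the two-sample discrepancy
    opt.zero_grad()
    loss.backward()
    opt.step()
```

MMD GAN replaces the fixed bandwidth here with a kernel composed with a learned feature map that is trained adversarially to maximize the discrepancy.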

362 citations

Posted Content
TL;DR: More than 24 quantitative and 5 qualitative measures for evaluating generative models with a particular emphasis on GAN-derived models have been discussed in this paper, with a set of 7 desiderata followed by an evaluation of whether a given measure or a family of measures is compatible with them.
Abstract: Generative models, in particular generative adversarial networks (GANs), have received significant attention recently. A number of GAN variants have been proposed and have been utilized in many applications. Despite large strides in terms of theoretical progress, evaluating and comparing GANs remains a daunting task. While several measures have been introduced, as of yet, there is no consensus as to which measure best captures strengths and limitations of models and should be used for fair model comparison. As in other areas of computer vision and machine learning, it is critical to settle on one or few good measures to steer the progress in this field. In this paper, I review and critically discuss more than 24 quantitative and 5 qualitative measures for evaluating generative models with a particular emphasis on GAN-derived models. I also provide a set of 7 desiderata followed by an evaluation of whether a given measure or a family of measures is compatible with them.

262 citations

Posted Content
TL;DR: This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other.
Abstract: This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.
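The kernel ridge regression / GP posterior mean identity mentioned above can be checked numerically. A small scikit-learn sketch (not code from the paper), in which the KRR regularization parameter plays the role of the GP noise variance and both models use the same RBF kernel:

```python
# Numerical check: kernel ridge regression predictions coincide with the
# Gaussian process posterior mean when kernel and noise/ridge level match.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
X_test = np.linspace(-3, 3, 100)[:, None]

noise_var = 0.01
# Same RBF kernel: exp(-||x - y||^2 / 2) means gamma = 0.5 and length_scale = 1.
krr = KernelRidge(kernel="rbf", gamma=0.5, alpha=noise_var).fit(X, y)
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                               alpha=noise_var, optimizer=None).fit(X, y)

print(np.allclose(krr.predict(X_test), gpr.predict(X_test)))  # True
```

The agreement holds because both predictors reduce to k(x, X)(K + sigma^2 I)^{-1} y; the two frameworks differ in how they interpret the remaining quantities, which is what the paper discusses.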

224 citations

References
Book
Vladimir Vapnik
01 Jan 1995
TL;DR: The book covers the setting of the learning problem, consistency of learning processes, bounds on the rate of convergence of learning processes, controlling the generalization ability of learning processes, constructing learning algorithms, and what is important in learning theory.
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations

Journal Article
08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
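A minimal PyTorch sketch of the minimax procedure described above (the MLP sizes, data, and hyperparameters are illustrative, not the paper's experiments):

```python
# Sketch of the adversarial training loop: alternate discriminator and generator updates.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))   # generator
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))    # discriminator (logits)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(128, 2) * 0.5 + 2.0          # stand-in for training data
    noise = torch.randn(128, 16)

    # Discriminator step: distinguish real samples from generated ones.
    fake = G(noise).detach()
    d_loss = bce(D(real), torch.ones(128, 1)) + bce(D(fake), torch.zeros(128, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: update G so that D is more likely to be fooled.
    fake = G(noise)
    g_loss = bce(D(fake), torch.ones(128, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

The generator update shown is the non-saturating form, i.e. maximizing the probability that D mistakes generated samples for real ones, which matches the training objective described in the abstract.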

38,211 citations

Journal Article
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
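For reference, the two ingredients described here, a non-linear kernel in place of an explicit high-dimensional feature map and a soft margin for non-separable data, amount to a few lines with a modern library. A scikit-learn sketch on toy data (the data and hyperparameters are illustrative):

```python
# Sketch: soft-margin SVM with a polynomial kernel on non-separable toy data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)    # non-linearly separable classes
y[rng.choice(300, 15, replace=False)] ^= 1             # label noise: data no longer separable

clf = SVC(kernel="poly", degree=2, C=1.0).fit(X, y)    # soft margin via C, implicit feature map
print(clf.score(X, y))
```

The parameter C controls the soft margin (the tolerance for training errors), while the polynomial kernel plays the role of the high-dimensional feature map constructed implicitly.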

37,861 citations

Journal Article
TL;DR: Pattern Recognition and Machine Learning by Christopher M. Bishop is a graduate-level textbook covering probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, and sampling methods.
Abstract: (2007). Pattern Recognition and Machine Learning. Technometrics: Vol. 49, No. 3, p. 366.

18,802 citations