
Showing papers by "Arnaud Doucet published in 2020"


Proceedings Article
12 Jul 2020
TL;DR: Continuously indexed flows (CIFs) as discussed by the authors replace the single bijection used by normalising flows with a continuously indexed family of bijections, which can intuitively clean up mass that would otherwise be misplaced by a single bijection.
Abstract: We show that normalising flows become pathological when used to model targets whose supports have complicated topologies. In this scenario, we prove that a flow must become arbitrarily numerically noninvertible in order to approximate the target closely. This result has implications for all flow-based models, and especially Residual Flows (ResFlows), which explicitly control the Lipschitz constant of the bijection used. To address this, we propose Continuously Indexed Flows (CIFs), which replace the single bijection used by normalising flows with a continuously indexed family of bijections, and which can intuitively "clean up" mass that would otherwise be misplaced by a single bijection. We show theoretically that CIFs are not subject to the same topological limitations as normalising flows, and obtain better empirical performance on a variety of models and benchmarks.
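
A minimal sketch of the generative direction of a CIF may help fix ideas. The Gaussian index distribution, the affine bijection, and the toy conditioner below are illustrative assumptions, not the architecture used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def conditioner(u):
    """Toy map from the continuous index u to the (shift, log-scale) of an
    affine bijection; a real CIF would use a neural network here."""
    return np.tanh(u), 0.1 * u

def sample_cif(n, dim=2):
    """Generative direction of a continuously indexed flow:
    z ~ p(z), u ~ p(u | z), x = f(z; u). Marginalising over the index u mixes a
    continuum of bijections, which lets the model place mass on targets whose
    support a single Lipschitz-controlled bijection cannot match."""
    z = rng.standard_normal((n, dim))        # base sample
    u = z + rng.standard_normal((n, dim))    # continuous index, here p(u | z) = N(z, I)
    shift, log_scale = conditioner(u)
    return np.exp(log_scale) * z + shift     # index-dependent affine bijection f(z; u)

print(sample_cif(5).shape)  # (5, 2)
```

Because the index u must be marginalised, the density of x is no longer available in closed form, which is why CIFs are trained with a variational lower bound rather than by exact maximum likelihood.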

40 citations


Journal ArticleDOI
TL;DR: This method builds upon a number of existing algorithms in econometrics, physics, and statistics for inference in state space models, and generalizes these methods so as to accommodate complex static models.
Abstract: Sequential Monte Carlo methods, also known as particle methods, are a popular set of techniques for approximating high-dimensional probability distributions and their normalizing constants. These methods have found numerous applications in statistics and related fields; for example, for inference in nonlinear non-Gaussian state space models, and in complex static models. Like many Monte Carlo sampling schemes, they rely on proposal distributions which crucially impact their performance. We introduce here a class of controlled sequential Monte Carlo algorithms, where the proposal distributions are determined by approximating the solution to an associated optimal control problem using an iterative scheme. This method builds upon a number of existing algorithms in econometrics, physics and statistics for inference in state space models, and generalizes these methods so as to accommodate complex static models. We provide a theoretical analysis concerning the fluctuation and stability of this methodology that also provides insight into the properties of related algorithms. We demonstrate significant gains over state-of-the-art methods at a fixed computational complexity on a variety of applications.
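
The abstract presupposes a standard SMC algorithm; the sketch below is that baseline, a bootstrap particle filter with multinomial resampling, rather than the controlled algorithm itself, which replaces the blind prior proposal used here with proposals obtained by iteratively approximating an optimal control (twisting) problem. The toy random-walk model and all sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_filter(y, n_particles, init, transition, log_obs_density):
    """Plain bootstrap particle filter: propose from the prior dynamics,
    reweight by the observation density, resample. Returns the estimate of
    the log normalizing constant (log marginal likelihood)."""
    x = init(n_particles)
    log_z = 0.0
    for t, y_t in enumerate(y):
        if t > 0:
            x = transition(x)                              # propose from the model dynamics
        log_w = log_obs_density(y_t, x)                    # weight by the likelihood of y_t
        log_z += np.logaddexp.reduce(log_w) - np.log(n_particles)
        w = np.exp(log_w - log_w.max())
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        x = x[idx]                                         # multinomial resampling
    return log_z

# Toy random-walk state space model with Gaussian observation noise, purely for illustration.
T, sigma = 50, 1.0
states = np.cumsum(rng.standard_normal(T))
y = states + sigma * rng.standard_normal(T)
log_z = bootstrap_filter(
    y, n_particles=500,
    init=lambda n: rng.standard_normal(n),
    transition=lambda x: x + rng.standard_normal(x.shape[0]),
    log_obs_density=lambda y_t, x: -0.5 * ((y_t - x) / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma ** 2),
)
print(log_z)
```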

31 citations


Posted Content
TL;DR: Deep feature instrumental variable regression (DFIV) is proposed to address the case where relations between instruments, treatments, and outcomes may be nonlinear; it outperforms recent state-of-the-art methods on challenging IV benchmarks, including settings involving high-dimensional image data.
Abstract: Instrumental variable (IV) regression is a standard strategy for learning causal relationships between confounded treatment and outcome variables from observational data by utilizing an instrumental variable, which affects the outcome only through the treatment. In classical IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment; and stage 2 performs linear regression from the treatment to the outcome, conditioned on the instrument. We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear. In this case, deep neural nets are trained to define informative nonlinear features on the instruments and treatments. We propose an alternating training regime for these features to ensure good end-to-end performance when composing stages 1 and 2, thus obtaining highly flexible feature maps in a computationally efficient manner. DFIV outperforms recent state-of-the-art methods on challenging IV benchmarks, including settings involving high dimensional image data. DFIV also exhibits competitive performance in off-policy policy evaluation for reinforcement learning, which can be understood as an IV regression task.
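
For concreteness, here is the classical two-stage procedure described in the abstract, which DFIV generalizes by replacing the linear regressions with learned neural-network features; the synthetic confounded data below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic confounded data: e confounds treatment x and outcome y,
# while the instrument z affects y only through x.
n = 5000
e = rng.standard_normal(n)                                # unobserved confounder
z = rng.standard_normal((n, 1))                           # instrument
x = 2.0 * z[:, 0] + e + 0.1 * rng.standard_normal(n)      # treatment
y = 1.5 * x - 2.0 * e + 0.1 * rng.standard_normal(n)      # outcome; true causal effect is 1.5

def ols(a, b):
    """Least-squares coefficients of b regressed on a (with intercept)."""
    a1 = np.column_stack([np.ones(len(a)), a])
    return np.linalg.lstsq(a1, b, rcond=None)[0]

# Stage 1: regress the treatment on the instrument and form fitted treatments.
beta1 = ols(z, x)
x_hat = np.column_stack([np.ones(n), z]) @ beta1

# Stage 2: regress the outcome on the fitted treatments.
beta2 = ols(x_hat.reshape(-1, 1), y)

print("naive OLS effect:", ols(x.reshape(-1, 1), y)[1])   # biased by the confounder
print("2SLS effect:", beta2[1])                           # close to the true 1.5
```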

28 citations


Journal ArticleDOI
TL;DR: In this article, the authors show how to use coupling techniques to generate unbiased estimators in finite time, building on recent advances for generic MCMC algorithms, and extend existing results to cover the case of polynomially ergodic Markov chains.
Abstract: Performing numerical integration when the integrand itself cannot be evaluated point-wise is a challenging task that arises in statistical analysis, notably in Bayesian inference for models with intractable likelihood functions. Markov chain Monte Carlo (MCMC) algorithms have been proposed for this setting, such as the pseudo-marginal method for latent variable models and the exchange algorithm for a class of undirected graphical models. As with any MCMC algorithm, the resulting estimators are justified asymptotically in the limit of the number of iterations, but exhibit a bias for any fixed number of iterations due to the Markov chains starting outside of stationarity. This “burn-in” bias is known to complicate the use of parallel processors for MCMC computations. We show how to use coupling techniques to generate unbiased estimators in finite time, building on recent advances for generic MCMC algorithms. We establish the theoretical validity of some of these procedures, by extending existing results to cover the case of polynomially ergodic Markov chains. The efficiency of the proposed estimators is compared with that of standard MCMC estimators, with theoretical arguments and numerical experiments including state space models and Ising models.
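
The construction can be sketched generically: run two coupled chains that marginally follow the same MH kernel, one lagging the other by one step, until they meet exactly, and correct a standard estimate by the telescoping differences accumulated before the meeting time. The toy standard-normal target, random-walk proposal, and choice of k below are illustrative assumptions; the paper's pseudo-marginal and exchange-algorithm settings require couplings of those specific kernels:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Toy target: a standard normal (any point-wise evaluable log-density works)."""
    return -0.5 * x ** 2

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def coupled_proposals(mu_x, mu_y, sigma):
    """Maximal coupling of N(mu_x, sigma^2) and N(mu_y, sigma^2): the pair is
    equal with the highest possible probability while keeping both marginals."""
    px = mu_x + sigma * rng.standard_normal()
    if np.log(rng.uniform()) + normal_logpdf(px, mu_x, sigma) <= normal_logpdf(px, mu_y, sigma):
        return px, px
    while True:
        py = mu_y + sigma * rng.standard_normal()
        if np.log(rng.uniform()) + normal_logpdf(py, mu_y, sigma) > normal_logpdf(py, mu_x, sigma):
            return px, py

def coupled_mh_step(x, y, sigma=1.0):
    """One coupled Metropolis step: maximally coupled proposals and a shared
    acceptance uniform, so the two chains eventually meet and then stay equal."""
    px, py = coupled_proposals(x, y, sigma)
    log_u = np.log(rng.uniform())
    x_new = px if log_u < log_target(px) - log_target(x) else x
    y_new = py if log_u < log_target(py) - log_target(y) else y
    return x_new, y_new

def unbiased_estimate(f, k=10):
    """Unbiased estimator H_k = f(X_k) + sum_{t > k} (f(X_t) - f(Y_{t-1})); the
    sum runs until the chains meet, after which every term is exactly zero."""
    x0, y = rng.standard_normal(), rng.standard_normal()   # X_0 and Y_0
    x, _ = coupled_mh_step(x0, x0)                         # X_1: X leads Y by one step
    xs, t, correction = [x0, x], 1, 0.0                    # xs[t] stores X_t
    while x != y or t < k:
        x, y = coupled_mh_step(x, y)                       # (X_{t+1}, Y_t) given (X_t, Y_{t-1})
        t += 1
        xs.append(x)
        if t > k:
            correction += f(x) - f(y)
    return f(xs[k]) + correction

estimates = [unbiased_estimate(lambda x: x ** 2) for _ in range(200)]
print(np.mean(estimates))   # close to E[X^2] = 1 under the standard normal target
```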

19 citations


Journal ArticleDOI
TL;DR: An $\mathbb{L}_r$-inequality (which implies a strong law of large numbers) and a central limit theorem for sequential MCMC methods are established, and conditions under which errors can be controlled uniformly in time are provided.
Abstract: Both sequential Monte Carlo (SMC) methods (a.k.a. ‘particle filters’) and sequential Markov chain Monte Carlo (sequential MCMC) methods constitute classes of algorithms which can be used to approximate expectations with respect to (a sequence of) probability distributions and their normalising constants. While SMC methods sample particles conditionally independently at each time step, sequential MCMC methods sample particles according to a Markov chain Monte Carlo (MCMC) kernel. Introduced over twenty years ago in [6], sequential MCMC methods have attracted renewed interest recently as they empirically outperform SMC methods in some applications. We establish an $\mathbb{L}_r$-inequality (which implies a strong law of large numbers) and a central limit theorem for sequential MCMC methods and provide conditions under which errors can be controlled uniformly in time. In the context of state-space models, we also provide conditions under which sequential MCMC methods can indeed outperform standard SMC methods in terms of asymptotic variance of the corresponding Monte Carlo estimators.
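
For orientation, the central limit theorem takes the usual particle-approximation form (stated here generically, not as the paper's precise result): for a suitable test function $\varphi$ and $N$ particles,

$$\sqrt{N}\,\bigl(\pi_{n}^{N}(\varphi)-\pi_{n}(\varphi)\bigr)\;\xrightarrow{d}\;\mathcal{N}\bigl(0,\sigma_{n}^{2}(\varphi)\bigr),$$

where $\pi_{n}^{N}$ denotes the empirical measure of the particles at time $n$ and $\sigma_{n}^{2}(\varphi)$ is an asymptotic variance whose expression differs between SMC and sequential MCMC; the comparison between the two classes of methods is made in terms of these asymptotic variances.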

18 citations


Proceedings Article
01 Jan 2020
TL;DR: This article proposed a meta-learning approach that obviates the need for sub-optimal hand-selection by automatically discovering and learning both task-specific and general reusable modules, and showed that this approach outperforms existing meta-learning approaches in domains like few-shot text-to-speech that have little task data and long adaptation horizons.
Abstract: Many real-world problems, including multi-speaker text-to-speech synthesis, can greatly benefit from the ability to meta-learn large models with only a few task-specific components. Updating only these task-specific modules then allows the model to be adapted to low-data tasks for as many steps as necessary without risking overfitting. Unfortunately, existing meta-learning methods either do not scale to long adaptation or else rely on handcrafted task-specific architectures. Here, we propose a meta-learning approach that obviates the need for this often sub-optimal hand-selection. In particular, we develop general techniques based on Bayesian shrinkage to automatically discover and learn both task-specific and general reusable modules. Empirically, we demonstrate that our method discovers a small set of meaningful task-specific modules and outperforms existing meta-learning approaches in domains like few-shot text-to-speech that have little task data and long adaptation horizons. We also show that existing meta-learning methods including MAML, iMAML, and Reptile emerge as special cases of our method.
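
A rough numerical sketch of the shrinkage idea: each module of the task-adapted parameters is tied to the shared meta-parameters through a Gaussian prior whose per-module scale is learned in the outer loop, so modules whose scale shrinks towards zero are effectively frozen (general and reusable) while the others remain task-specific. The linear task model, module sizes, scales, and closed-form adaptation below are illustrative assumptions, not the paper's training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network" split into two 4-dimensional modules; a real model would be a
# neural network and adaptation would be iterative rather than closed form.
phi = rng.standard_normal(8)                                 # shared meta-parameters (two modules)
sigma = np.concatenate([np.full(4, 0.05), np.full(4, 2.0)])  # per-module shrinkage scales, learned in
                                                             # the outer loop (small scale => shared module)

def adapt(x, y):
    """MAP adaptation of a linear task model under the shrinkage prior
    N(theta | phi, diag(sigma^2)): minimise (1/2n)||x theta - y||^2
    + sum_m ||theta_m - phi_m||^2 / (2 sigma_m^2). Tightly shrunk modules stay
    pinned to phi; loosely shrunk ones adapt freely, however long we optimise."""
    n = len(y)
    precision = np.diag(1.0 / sigma ** 2)
    lhs = x.T @ x / n + precision
    rhs = x.T @ y / n + precision @ phi
    return np.linalg.solve(lhs, rhs)

x = rng.standard_normal((32, 8))
y = x @ rng.standard_normal(8)
theta = adapt(x, y)
print("module 1 moved by", np.linalg.norm(theta[:4] - phi[:4]))   # ~0: effectively shared
print("module 2 moved by", np.linalg.norm(theta[4:] - phi[4:]))   # large: task-specific
```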

12 citations


Posted Content
TL;DR: A maximum likelihood training scheme for VAEs is developed by introducing unbiased estimators of the log-likelihood gradient, and it is demonstrated experimentally that VAEs fitted with these unbiased estimators exhibit better predictive performance on three image datasets.
Abstract: The variational auto-encoder (VAE) is a deep latent variable model that has two neural networks in an autoencoder-like architecture; one of them parameterizes the model's likelihood. Fitting its parameters via maximum likelihood (ML) is challenging since the computation of the marginal likelihood involves an intractable integral over the latent space; thus the VAE is trained instead by maximizing a variational lower bound. Here, we develop an ML training scheme for VAEs by introducing unbiased estimators of the log-likelihood gradient. We obtain the estimators by augmenting the latent space with a set of importance samples, similarly to the importance weighted auto-encoder (IWAE), and then constructing a Markov chain Monte Carlo coupling procedure on this augmented space. We provide the conditions under which the estimators can be computed in finite time and with finite variance. We show experimentally that VAEs fitted with unbiased estimators exhibit better predictive performance.
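
The starting point the abstract builds on is the importance-weighted (IWAE) bound; a minimal numpy sketch of that bound for a toy Gaussian model is given below. The linear "encoder" and "decoder" are placeholders for neural networks, and the paper's actual contribution, the MCMC coupling on the augmented space that removes the remaining bias of this bound's gradient, is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mu, log_var):
    return -0.5 * (np.log(2 * np.pi) + log_var + (x - mu) ** 2 / np.exp(log_var)).sum(-1)

def iwae_bound(x, encode, decode, n_samples=16):
    """Importance-weighted lower bound on log p(x): log of the average of K
    importance weights p(x, z_k) / q(z_k | x), with z_k drawn from the encoder
    q(z | x). Its gradient remains a biased estimator of the log-likelihood
    gradient, which is what the coupling construction corrects."""
    mu_q, log_var_q = encode(x)
    z = mu_q + np.exp(0.5 * log_var_q) * rng.standard_normal((n_samples, mu_q.size))
    log_prior = log_normal(z, 0.0, np.zeros_like(z))        # p(z) = N(0, I)
    mu_x = decode(z)
    log_lik = log_normal(x, mu_x, np.zeros_like(mu_x))      # p(x | z) = N(mu_x, I)
    log_q = log_normal(z, mu_q, log_var_q)
    log_w = log_prior + log_lik - log_q                     # log importance weights
    return np.logaddexp.reduce(log_w) - np.log(n_samples)

# Toy linear "encoder" and "decoder" standing in for neural networks.
W = 0.3 * rng.standard_normal((2, 4))
x_obs = rng.standard_normal(4)
print(iwae_bound(
    x_obs,
    encode=lambda x: (W @ x, np.full(2, -1.0)),
    decode=lambda z: z @ W,
))
```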

8 citations


Posted Content
TL;DR: In this paper, the authors propose a particle method that can be thought of as a Monte Carlo approximation of the expectation maximization smoothing (EMS) scheme, which not only performs an adaptive stochastic discretization of the domain but also results in smooth approximate solutions.
Abstract: Fredholm integral equations of the first kind are the prototypical example of ill-posed linear inverse problems. They model, among other things, reconstruction of distorted noisy observations and indirect density estimation and also appear in instrumental variable regression. However, their numerical solution remains a challenging problem. Many techniques currently available require a preliminary discretization of the domain of the solution and make strong assumptions about its regularity. For example, the popular expectation maximization smoothing (EMS) scheme requires the assumption of piecewise constant solutions which is inappropriate for most applications. We propose here a novel particle method that circumvents these two issues. This algorithm can be thought of as a Monte Carlo approximation of the EMS scheme which not only performs an adaptive stochastic discretization of the domain but also results in smooth approximate solutions. We analyze the theoretical properties of the EMS iteration and of the corresponding particle algorithm. Compared to standard EMS, we show experimentally that our novel particle method provides state-of-the-art performance for realistic systems, including motion deblurring and reconstruction of cross-section images of the brain from positron emission tomography.
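
As a point of reference, the grid-based EMS iteration that the particle method approximates (and whose fixed discretization and piecewise-constant solutions motivate the paper) can be sketched as follows; the toy deblurring setup, grid, and kernel widths are illustrative assumptions:

```python
import numpy as np

def ems(h, K, n_iter=50, smooth_width=2):
    """Grid-based expectation maximization smoothing (EMS) for the discretised
    Fredholm equation h = K f: an EM (Richardson-Lucy) update followed by a
    local smoothing pass, iterated to convergence."""
    f = np.full(K.shape[1], 1.0 / K.shape[1])              # uniform initialisation
    smoother = np.ones(2 * smooth_width + 1)
    smoother /= smoother.sum()
    for _ in range(n_iter):
        denom = np.maximum(K @ f, 1e-12)                   # current fit to the data density
        f = f * (K.T @ (h / denom))                        # EM update
        f = np.convolve(f, smoother, mode="same")          # smoothing step
        f /= f.sum()
    return f

# Toy deblurring example: the observed density h is the true density convolved
# with a Gaussian kernel of known width.
grid = np.linspace(-3, 3, 121)
true_f = np.exp(-0.5 * ((grid - 1.0) / 0.4) ** 2) + np.exp(-0.5 * ((grid + 1.0) / 0.4) ** 2)
true_f /= true_f.sum()
K = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.5) ** 2)
K /= K.sum(axis=0, keepdims=True)                          # each column is a discretised density
h = K @ true_f
f_hat = ems(h, K)
print("L1 error of the EMS reconstruction:", np.abs(f_hat - true_f).sum())
```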

6 citations


Posted Content
TL;DR: Ensemble Rejection Sampling, a scheme for exact simulation from the posterior distribution of the latent states of a class of non-linear non-Gaussian state-space models, is introduced, and an application to rare event simulation is presented.
Abstract: We introduce Ensemble Rejection Sampling, a scheme for exact simulation from the posterior distribution of the latent states of a class of non-linear non-Gaussian state-space models. Ensemble Rejection Sampling relies on a proposal for the high-dimensional state sequence built using ensembles of state samples. Although this algorithm can be interpreted as a rejection sampling scheme acting on an extended space, we show under regularity conditions that the expected computational cost to obtain an exact sample increases cubically with the length of the state sequence instead of exponentially for standard rejection sampling. We demonstrate this methodology by sampling exactly state sequences according to the posterior distribution of a stochastic volatility model and a non-linear autoregressive process. We also present an application to rare event simulation.
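
For contrast, the standard rejection sampler whose expected cost grows exponentially with the length of the state sequence in this setting is sketched below on a toy one-dimensional target; the target, proposal, and bound constant are illustrative assumptions (the ensemble proposal that gives the cubic scaling is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(log_target, sample_proposal, log_proposal, log_bound):
    """Standard rejection sampling: draw from the proposal and accept with
    probability target / (bound * proposal). Returns an exact draw together
    with the number of attempts, whose expectation is exp(log_bound)."""
    attempts = 0
    while True:
        attempts += 1
        x = sample_proposal()
        if np.log(rng.uniform()) < log_target(x) - log_bound - log_proposal(x):
            return x, attempts

# Toy example: target N(1, 0.5^2), proposal N(0, 1); the bound M must satisfy
# target(x) <= M * proposal(x) for all x, here M = 2 exp(2/3).
log_target = lambda x: -0.5 * ((x - 1.0) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2 * np.pi))
log_proposal = lambda x: -0.5 * x ** 2 - np.log(np.sqrt(2 * np.pi))
log_bound = np.log(2.0) + 2.0 / 3.0

samples = [rejection_sample(log_target, lambda: rng.standard_normal(), log_proposal, log_bound)[0]
           for _ in range(1000)]
print(np.mean(samples), np.std(samples))   # roughly 1.0 and 0.5
```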

6 citations


Journal ArticleDOI
TL;DR: In this article, the existence and stability of higher-order derivatives of the optimal filter are studied under regularity conditions, and these derivatives are shown to be geometrically ergodic.

5 citations


Posted Content
TL;DR: In this paper, the authors explore a simple and generic approach to improve convergence to equilibrium of existing MCMC algorithms which rely on the Metropolis-Hastings (MH) update, the main building block of MCMC.
Abstract: Markov chain Monte Carlo (MCMC) methods to sample from a probability distribution $\pi$ defined on a space $(\Theta,\mathcal{T})$ consist of the simulation of realisations of Markov chains $\{\theta_{n},n\geq1\}$ of invariant distribution $\pi$ and such that the distribution of $\theta_{n}$ converges to $\pi$ as $n\rightarrow\infty$. In practice one is typically interested in the computation of expectations of functions, say $f$, with respect to $\pi$ and it is also required that averages $M^{-1}\sum_{n=1}^{M}f(\theta_{n})$ converge to the expectation of interest. The iterative nature of MCMC makes it difficult to develop generic methods to take advantage of parallel computing environments when interested in reducing time to convergence. While numerous approaches have been proposed to reduce the variance of ergodic averages, including averaging over independent realisations of $\{\theta_{n},n\geq1\}$ simulated on several computers, techniques to reduce the "burn-in" of MCMC are scarce. In this paper we explore a simple and generic approach to improve convergence to equilibrium of existing algorithms which rely on the Metropolis-Hastings (MH) update, the main building block of MCMC. The main idea is to use averages of the acceptance ratio w.r.t. multiple realisations of the random variables involved, while preserving $\pi$ as invariant distribution. The methodology requires limited change to existing code, is naturally suited to parallel computing, and is shown on our examples to provide substantial performance improvements both in terms of convergence to equilibrium and variance of ergodic averages. In some scenarios gains are observed even on a serial machine.
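
For reference, the Metropolis-Hastings building block the abstract refers to is sketched below with a toy target and a Gaussian random-walk proposal (both placeholder choices); the approach described above replaces the single acceptance ratio computed at each step here with an average over several realisations of the auxiliary random variables, constructed so that $\pi$ remains invariant:

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_chain(log_pi, x0, n_steps, step_size=1.0):
    """Plain random-walk Metropolis-Hastings: propose x' = x + noise and accept
    with probability min(1, pi(x') / pi(x)). Each iteration uses a single
    realisation of the proposal noise and of the acceptance uniform."""
    chain = np.empty(n_steps)
    x, log_px = x0, log_pi(x0)
    for n in range(n_steps):
        prop = x + step_size * rng.standard_normal()
        log_pprop = log_pi(prop)
        if np.log(rng.uniform()) < log_pprop - log_px:     # single acceptance ratio
            x, log_px = prop, log_pprop
        chain[n] = x
    return chain

chain = mh_chain(lambda x: -0.5 * x ** 2, x0=5.0, n_steps=10_000)
print(chain.mean(), chain.std())   # roughly 0 and 1 for the standard normal target
```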

Posted Content
TL;DR: In this article, the authors provide a comprehensive theoretical analysis of magnitude and gradient based pruning at initialization and training of sparse architectures, and propose novel principled approaches which they validate experimentally on a variety of NN architectures.
Abstract: Overparameterized Neural Networks (NN) display state-of-the-art performance. However, there is a growing need for smaller, energy-efficient neural networks to be able to use machine learning applications on devices with limited computational resources. A popular approach consists of using pruning techniques. While these techniques have traditionally focused on pruning pre-trained NNs (LeCun et al., 1990; Hassibi et al., 1993), recent work by Lee et al. (2018) has shown promising results when pruning at initialization. However, for Deep NNs, such procedures remain unsatisfactory as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned. In this paper, we provide a comprehensive theoretical analysis of magnitude- and gradient-based pruning at initialization and training of sparse architectures. This allows us to propose novel principled approaches which we validate experimentally on a variety of NN architectures.
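
To make "pruning at initialization" concrete, here is a minimal numpy sketch of the magnitude-based and gradient-based (SNIP-style) criteria for a single linear layer at initialization; the layer sizes, toy regression batch, and keep ratio are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# One randomly initialised linear layer and a toy regression batch.
w = rng.standard_normal((64, 32)) * np.sqrt(2.0 / 32)   # He-style initialisation
x = rng.standard_normal((128, 32))
y = rng.standard_normal((128, 64))

# Gradient of the mean-squared-error loss with respect to the weights at initialisation.
resid = x @ w.T - y
grad = 2.0 * resid.T @ x / len(x)

def prune_mask(scores, keep_ratio=0.2):
    """Keep the weights with the largest scores and zero out the rest."""
    k = int(keep_ratio * scores.size)
    threshold = np.sort(scores.ravel())[-k]
    return scores >= threshold

magnitude_mask = prune_mask(np.abs(w))           # magnitude-based pruning
gradient_mask = prune_mask(np.abs(w * grad))     # gradient-based (SNIP-like) saliency

print("fraction kept by magnitude:", magnitude_mask.mean())
print("fraction kept by gradient :", gradient_mask.mean())
print("overlap between the masks :", (magnitude_mask & gradient_mask).mean())
```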