Author

Arnaud Guyader

Bio: Arnaud Guyader is an academic researcher from the University of Paris. The author has contributed to research on topics including estimators and Monte Carlo methods. He has an h-index of 19 and has co-authored 52 publications receiving 1,746 citations. Previous affiliations of Arnaud Guyader include Pierre-and-Marie-Curie University and École des ponts ParisTech.


Papers
Journal ArticleDOI
TL;DR: This article proposes an adaptive algorithm for estimating rare event probabilities that is asymptotically consistent, costs only a little more than classical multilevel splitting, and has the same efficiency in terms of asymptotic variance.
Abstract: The estimation of rare event probability is a crucial issue in areas such as reliability, telecommunications, and aircraft management. In complex systems, analytical study is out of the question and one has to use Monte Carlo methods. When rare is really rare, meaning a probability less than 10⁻⁹, naive Monte Carlo becomes unreasonable. A widespread technique is multilevel splitting, but this method requires enough knowledge about the system to decide in advance where to place the levels. This, unfortunately, is not always possible. In this article, we propose an adaptive algorithm to cope with this problem: the estimation is asymptotically consistent, costs just a little more than classical multilevel splitting, and has the same efficiency in terms of asymptotic variance. In the one-dimensional case, we rigorously prove the a.s. convergence and the asymptotic normality of our estimator, with the same variance as with other algorithms that use fixed crossing levels. In our proofs we mainly use...
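A minimal sketch of the adaptive multilevel splitting idea described above, on a toy one-dimensional problem (estimating P(X > q) for a standard Gaussian X). This is not the authors' implementation: the MCMC kernel, the keep-fraction p0, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcmc_move(x, level, rho=0.8, n_steps=10):
    """A few reversible Gaussian moves, rejected when they fall below `level`.

    The autoregressive proposal x' = rho*x + sqrt(1-rho^2)*xi leaves N(0,1)
    invariant; rejecting proposals below the level targets the conditional law."""
    for _ in range(n_steps):
        prop = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(x.shape)
        x = np.where(prop > level, prop, x)
    return x

def adaptive_multilevel_splitting(q=4.0, n_particles=1000, p0=0.75):
    """Estimate P(X > q) for X ~ N(0,1) with adaptively chosen levels."""
    x = rng.standard_normal(n_particles)
    prob = 1.0
    while True:
        level = np.quantile(x, 1.0 - p0)       # level placed adaptively from the sample
        if level >= q:                          # final level reached
            return prob * np.mean(x > q)
        survivors = x[x > level]
        prob *= survivors.size / n_particles    # observed conditional probability (~ p0)
        # clone survivors to refill the population, then move them above the level
        idx = rng.integers(0, survivors.size, n_particles)
        x = mcmc_move(survivors[idx], level)

print(adaptive_multilevel_splitting())          # exact value: 1 - Phi(4) ~ 3.17e-5
```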

291 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel strategy for simulating rare events and an associated Monte Carlo estimator of tail probabilities; the method uses a system of interacting particles and exploits a Feynman-Kac representation of that system to analyze its fluctuations.
Abstract: This paper discusses a novel strategy for simulating rare events and an associated Monte Carlo estimation of tail probabilities. Our method uses a system of interacting particles and exploits a Feynman-Kac representation of that system to analyze its fluctuations. Our precise analysis of the variance of a standard multilevel splitting algorithm reveals an opportunity for improvement. This leads to a novel method that relies on adaptive levels and produces, in the limit of an idealized version of the algorithm, estimates with optimal variance. The motivation for this theoretical work comes from problems occurring in watermarking and fingerprinting of digital content, which represent a new field of application of rare event simulation techniques. Some numerical results show performance close to the idealized version of our technique for these practical applications.
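For orientation, the following back-of-the-envelope formula (standard multilevel-splitting folklore, not quoted from the paper) indicates why adaptively equalized levels are desirable: with conditional level probabilities p_1, ..., p_n and N particles per stage, treated as independent,

```latex
% Idealized splitting with conditional level probabilities p_1,...,p_n and
% N particles per stage (stages treated as independent):
\frac{\operatorname{Var}(\hat{p})}{p^{2}}
  \;\approx\; \frac{1}{N}\sum_{i=1}^{n} \frac{1-p_{i}}{p_{i}},
  \qquad p = \prod_{i=1}^{n} p_{i}.
% For fixed p and n, this is minimized when all p_i are equal -- the
% configuration that adaptive level placement aims to reproduce automatically.
```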

216 citations

Journal ArticleDOI
TL;DR: The proposed decoding scheme can be viewed as a turbo algorithm that alternately uses the intersymbol correlation due to the Markov source and the redundancy introduced by the channel code, with the source coder model acting as a translator of soft information from the bit clock to the symbol clock.
Abstract: We analyze the dependencies between the variables involved in the source and channel coding chain. This analysis is carried out in the framework of Bayesian networks, which provide both an intuitive representation for the global model of the coding chain and a way of deriving joint (soft) decoding algorithms. Three sources of dependencies are involved in the chain: (1) the source model, a Markov chain of symbols; (2) the source coder model, based on a variable length code (VLC), for example a Huffman code; and (3) the channel coder, based on a convolutional error correcting code. Joint decoding relying on the hidden Markov model (HMM) of the global coding chain is intractable, except in trivial cases. We advocate instead an iterative procedure inspired by serial turbo codes, in which the three models of the coding chain are used alternately. This idea of using each factor of a large product model separately inside an iterative procedure usually requires an interleaver between successive components. We show that only one interleaver is necessary here, placed between the source coder and the channel coder. The decoding scheme we propose can be viewed as a turbo algorithm using alternately the intersymbol correlation due to the Markov source and the redundancy introduced by the channel code. The intermediary element, the source coder model, is used as a translator of soft information from the bit clock to the symbol clock.

117 citations

Journal ArticleDOI
TL;DR: In this article, a nonasymptotic theorem for interacting particle approximations of unnormalized Feynman-Kac models is presented, in which the L²-relative error of these weighted particle measures grows linearly with respect to the time horizon.
Abstract: We present a nonasymptotic theorem for interacting particle approximations of unnormalized Feynman-Kac models. We provide an original stochastic analysis based on Feynman-Kac semigroup techniques combined with recently developed coalescent tree-based functional representations of particle block distributions. We present some regularity conditions under which the L²-relative error of these weighted particle measures grows linearly with respect to the time horizon, yielding what seems to be the first results of this type for this class of unnormalized models. We also illustrate these results in the context of particle absorption models, with a special interest in rare event analysis.
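A schematic rendering of the type of bound described in the abstract (constants and norms are indicative only, not quoted from the paper): writing γ_n for the unnormalized Feynman-Kac measure at time n and γ_n^N for its N-particle approximation,

```latex
% Linear-in-time control of the relative L^2 error (schematic form):
\sup_{\|f\|_{\infty}\le 1}\;
\mathbb{E}\!\left[\left(
  \frac{\gamma_{n}^{N}(f)-\gamma_{n}(f)}{\gamma_{n}(1)}
\right)^{2}\right]^{1/2}
\;\le\; \frac{c\,(n+1)}{\sqrt{N}},
% i.e. the L^2-relative error grows at most linearly in the time horizon n,
% rather than exponentially, under the regularity conditions of the paper.
```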

114 citations

Journal ArticleDOI
TL;DR: In this article, the authors analyze approximate Bayesian computations from the point of view of k-nearest neighbor theory and explore the statistical properties of its outputs, in particular some asymptotic features of the genuine conditional density estimate associated with ABC, which is an interesting hybrid between a kNN and a kernel method.
Abstract: Approximate Bayesian Computation (ABC for short) is a family of computational techniques which offer an almost automated solution in situations where evaluation of the posterior likelihood is computationally prohibitive, or whenever suitable likelihoods are not available. In the present paper, we analyze the procedure from the point of view of k-nearest neighbor theory and explore the statistical properties of its outputs. We discuss in particular some asymptotic features of the genuine conditional density estimate associated with ABC, which is an interesting hybrid between a k-nearest neighbor and a kernel method.
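A minimal sketch of the k-nearest-neighbor reading of ABC described above, on a toy Gaussian-mean example; the model, summary statistic, and all names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_knn(y_obs, n_sim=100_000, k=500):
    """Basic rejection ABC viewed as a k-nearest-neighbor estimator.

    Toy model: theta ~ N(0, 10), data y | theta ~ N(theta, 1) i.i.d.,
    summary statistic s(y) = mean(y)."""
    theta = rng.normal(0.0, np.sqrt(10.0), n_sim)              # draws from the prior
    y_sim = rng.normal(theta[:, None], 1.0, (n_sim, y_obs.size))
    s_sim = y_sim.mean(axis=1)                                  # simulated summaries
    dist = np.abs(s_sim - y_obs.mean())                         # distance in summary space
    keep = np.argsort(dist)[:k]                                 # the k nearest neighbors
    return theta[keep]                                          # approximate posterior sample

y_obs = rng.normal(2.0, 1.0, 20)                                # "observed" data
posterior_draws = abc_knn(y_obs)
print(posterior_draws.mean(), posterior_draws.std())
```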

113 citations


Cited by
Journal ArticleDOI
TL;DR: It is shown how to build efficient high-dimensional proposal distributions by using sequential Monte Carlo methods, which makes it possible not only to improve over standard Markov chain Monte Carlo schemes but also to make Bayesian inference feasible for a large class of statistical models where this was not previously the case.
Abstract: Markov chain Monte Carlo and sequential Monte Carlo methods have emerged as the two main tools to sample from high dimensional probability distributions. Although asymptotic convergence of Markov chain Monte Carlo algorithms is ensured under weak assumptions, the performance of these algorithms is unreliable when the proposal distributions that are used to explore the space are poorly chosen and/or if highly correlated variables are updated independently. We show here how it is possible to build efficient high dimensional proposal distributions by using sequential Monte Carlo methods. This allows us not only to improve over standard Markov chain Monte Carlo schemes but also to make Bayesian inference feasible for a large class of statistical models where this was not previously so. We demonstrate these algorithms on a non-linear state space model and a Lévy-driven stochastic volatility model.
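One concrete instance of the "SMC inside MCMC" idea described above is particle marginal Metropolis-Hastings: a bootstrap particle filter supplies an unbiased likelihood estimate that is plugged into a standard accept/reject step. The sketch below is only a hedged illustration on an invented linear-Gaussian toy model; it is not the paper's algorithm or examples, and all names and settings are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def particle_loglik(y, phi, sigma=1.0, tau=1.0, n_particles=200):
    """Bootstrap particle filter: unbiased estimate of p(y_{1:T} | phi)."""
    x = rng.normal(0.0, sigma / np.sqrt(1 - phi**2), n_particles)   # stationary init
    loglik = 0.0
    for y_t in y:
        x = phi * x + sigma * rng.standard_normal(n_particles)      # propagate particles
        logw = -0.5 * ((y_t - x) / tau) ** 2 - np.log(tau * np.sqrt(2 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())                               # running log-likelihood
        x = rng.choice(x, size=n_particles, p=w / w.sum())           # multinomial resampling
    return loglik

def pmmh(y, n_iter=2000, step=0.05):
    """Particle marginal Metropolis-Hastings for phi, flat prior on (-1, 1)."""
    phi, ll = 0.5, particle_loglik(y, 0.5)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = phi + step * rng.standard_normal()                    # random-walk proposal
        if abs(prop) < 1.0:
            ll_prop = particle_loglik(y, prop)
            if np.log(rng.uniform()) < ll_prop - ll:                 # MH acceptance
                phi, ll = prop, ll_prop
        chain[i] = phi
    return chain

# Simulate data with phi = 0.8, then try to recover it.
T, phi_true = 200, 0.8
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi_true * x[t - 1] + rng.standard_normal()
y = x + rng.standard_normal(T)
chain = pmmh(y)
print(chain[500:].mean())   # posterior mean of phi, expected near 0.8
```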

1,869 citations

Journal ArticleDOI
19 Apr 2016 - Test
TL;DR: The present article reviews the most recent theoretical and methodological developments for random forests, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures.
Abstract: The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
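To connect the review to practice, here is a minimal scikit-learn example touching the two ingredients the review emphasizes: parameter selection (number of trees, features tried per split) and variable importance. The dataset and settings are illustrative only, not drawn from the article.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Friedman #1 benchmark: 5 informative features, the remaining 5 are pure noise.
X, y = make_friedman1(n_samples=1000, n_features=10, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(
    n_estimators=500,    # number of randomized trees to aggregate
    max_features=1/3,    # fraction of features drawn at random at each split
    random_state=0,
)
forest.fit(X_train, y_train)

print("R^2 on held-out data:", forest.score(X_test, y_test))
# Impurity-based variable importance: informative features should dominate.
for j, imp in enumerate(forest.feature_importances_):
    print(f"feature {j}: {imp:.3f}")
```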

1,279 citations

Journal ArticleDOI
TL;DR: The authors develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm, and show that causal forests are pointwise consistent for the true treatment effect and have an asymptotically Gaussian and centered sampling distribution.
Abstract: Many scientific and engineering challenges—ranging from personalized medicine to customized marketing recommendations—require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference.
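The causal forest itself relies on honest splitting criteria not shown here. As a rough illustration of the estimand only (the conditional average treatment effect under unconfoundedness), the sketch below fits a plain T-learner with two random forests, a simpler baseline than the paper's method; the data-generating process and all names are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Synthetic data with heterogeneous effect tau(x) = x_0 and randomized treatment.
n = 5000
X = rng.uniform(-1, 1, (n, 5))
W = rng.integers(0, 2, n)                        # binary treatment, randomized
tau = X[:, 0]                                    # true conditional treatment effect
Y = X[:, 1] + W * tau + 0.5 * rng.standard_normal(n)

# T-learner baseline: one outcome forest per arm, CATE = difference of predictions.
m1 = RandomForestRegressor(n_estimators=300, random_state=0).fit(X[W == 1], Y[W == 1])
m0 = RandomForestRegressor(n_estimators=300, random_state=0).fit(X[W == 0], Y[W == 0])
tau_hat = m1.predict(X) - m0.predict(X)

print("RMSE of CATE estimate:", np.sqrt(np.mean((tau_hat - tau) ** 2)))
```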

1,156 citations

Posted Content
TL;DR: A review of the most recent theoretical and methodological developments for random forests can be found in this article, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures.
Abstract: The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad-hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.

1,119 citations

Journal Article
TL;DR: This paper offers an in-depth analysis of a random forests model suggested by Breiman (2004), very close to the original algorithm, and shows in particular that the procedure is consistent and adapts to sparsity, in the sense that its rate of convergence depends only on the number of strong features and not on how many noise variables are present.
Abstract: Random forests are a scheme proposed by Leo Breiman in the 2000s for building a predictor ensemble with a set of decision trees that grow in randomly selected subspaces of data. Despite growing interest and practical use, there has been little exploration of the statistical properties of random forests, and little is known about the mathematical forces driving the algorithm. In this paper, we offer an in-depth analysis of a random forests model suggested by Breiman (2004), which is very close to the original algorithm. We show in particular that the procedure is consistent and adapts to sparsity, in the sense that its rate of convergence depends only on the number of strong features and not on how many noise variables are present.

950 citations