Author

Luc Devroye

Bio: Luc Devroye is an academic researcher from McGill University. The author has contributed to research in topics such as random variate generation and random variables. The author has an h-index of 61, has co-authored 353 publications, and has received 22,719 citations. Previous affiliations of Luc Devroye include the University of New South Wales and the University of California, Davis.


Papers
Book
01 Jan 1996
TL;DR: A probabilistic theory of pattern recognition is developed, covering the Bayes error, Vapnik-Chervonenkis theory, nearest neighbor and kernel rules, the maximum likelihood principle, parametric classification, error estimation, and lower bounds for empirical classifier selection.
Abstract: Preface * Introduction * The Bayes Error * Inequalities and alternate distance measures * Linear discrimination * Nearest neighbor rules * Consistency * Slow rates of convergence * Error estimation * The regular histogram rule * Kernel rules * Consistency of the k-nearest neighbor rule * Vapnik-Chervonenkis theory * Combinatorial aspects of Vapnik-Chervonenkis theory * Lower bounds for empirical classifier selection * The maximum likelihood principle * Parametric classification * Generalized linear discrimination * Complexity regularization * Condensed and edited nearest neighbor rules * Tree classifiers * Data-dependent partitioning * Splitting the data * The resubstitution estimate * Deleted estimates of the error probability * Automatic kernel rules * Automatic nearest neighbor rules * Hypercubes and discrete spaces * Epsilon entropy and totally bounded sets * Uniform laws of large numbers * Neural networks * Other error estimates * Feature extraction * Appendix * Notation * References * Index
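To make the flavor of the nearest neighbor and error estimation chapters concrete, here is a minimal sketch, not taken from the book, of the k-nearest neighbor rule with a holdout estimate of its error probability; the Gaussian sample, the choice k = 5, and the NumPy-based implementation are illustrative assumptions.

```python
import numpy as np

def knn_classify(X_train, y_train, x, k=5):
    """k-nearest neighbor rule: majority vote among the k closest training points."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances from x to the training points
    nearest = np.argsort(d)[:k]               # indices of the k nearest neighbors
    votes = np.bincount(y_train[nearest])     # count class labels among the neighbors
    return int(np.argmax(votes))              # predicted class

def holdout_error(X_train, y_train, X_test, y_test, k=5):
    """Empirical error probability on held-out data (an estimate of the true risk)."""
    errors = sum(knn_classify(X_train, y_train, x, k) != y for x, y in zip(X_test, y_test))
    return errors / len(y_test)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two Gaussian classes in the plane (illustrative data, not from the book).
    n = 500
    X = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(1.5, 1.0, (n, 2))])
    y = np.array([0] * n + [1] * n)
    idx = rng.permutation(2 * n)
    train, test = idx[:n], idx[n:]
    print("holdout error estimate:",
          holdout_error(X[train], y[train], X[test], y[test], k=5))
```

Letting k grow slowly with the sample size (k tending to infinity while k/n tends to zero) is exactly the kind of design question the consistency chapters address.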

3,598 citations

Journal ArticleDOI
TL;DR: This chapter reviews the main methods for generating random variables, vectors and processes in non-uniform random variate generation, and provides information on the expected time complexity of various algorithms before addressing modern topics such as indirectly specified distributions, random processes, and Markov chain methods.

3,304 citations

Book
16 Apr 1986
TL;DR: A survey of the main methods in non-uniform random variate generation can be found in this article, where the authors provide information on the expected time complexity of various algorithms, before addressing modern topics such as indirectly specified distributions, random processes and Markov chain methods.
Abstract: This is a survey of the main methods in non-uniform random variate generation, and highlights recent research on the subject. Classical paradigms such as inversion, rejection, guide tables, and transformations are reviewed. We provide information on the expected time complexity of various algorithms, before addressing modern topics such as indirectly specified distributions, random processes, and Markov chain methods.

1. The main paradigms. The purpose of this chapter is to review the main methods for generating random variables, vectors and processes. Classical workhorses such as the inversion method, the rejection method and table methods are reviewed in section 1. In section 2, we discuss the expected time complexity of various algorithms, and give a few examples of the design of generators that are uniformly fast over entire families of distributions. In section 3, we develop a few universal generators, such as generators for all log-concave distributions on the real line. Section 4 deals with random variate generation when distributions are indirectly specified, e.g., via Fourier coefficients, characteristic functions, the moments, the moment generating function, distributional identities, infinite series or Kolmogorov measures. Random processes are briefly touched upon in section 5. Finally, the latest developments in Markov chain methods are discussed in section 6. Some of this work grew from Devroye (1986a), and we are carefully documenting work that was done since 1986. More recent references can be found in the book by Hörmann, Leydold and Derflinger (2004).

Non-uniform random variate generation is concerned with the generation of random variables with certain distributions. Such random variables are often discrete, taking values in a countable set, or absolutely continuous, and thus described by a density. The methods used for generating them depend upon the computational model one is working with, and upon the demands on the part of the output. For example, in a RAM (random access memory) model, one accepts that real numbers can be stored and operated upon (compared, added, multiplied, and so forth) in one time unit. Furthermore, this model assumes that a source capable of producing an i.i.d. (independent identically distributed) sequence of uniform [0, 1] random variables is available. This model is of course unrealistic, but designing random variate generators based on it has several advantages: first of all, it allows one to disconnect the theory of non-uniform random variate generation from that of uniform random variate generation, and secondly, it permits one to plan for the future, as more powerful computers will be developed that permit ever better approximations of the model. Algorithms designed under finite approximation limitations will have to be redesigned when the next generation of computers arrives.

For the generation of discrete or integer-valued random variables, which includes the vast area of the generation of random combinatorial structures, one can adhere to a clean model, the pure bit model, in which each bit operation takes one time unit, and storage can be reported in terms of bits. Typically, one now assumes that an i.i.d. sequence of independent perfect bits is available. In this model, an elegant information-theoretic theory can be derived.
For example, Knuth and Yao (1976) showed that to generate a random integer X described by the probability distribution P{X = n} = p_n, n ≥ 1, any method must use an expected number of bits at least equal to the binary entropy of the distribution, ∑_n p_n log_2(1/p_n).
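As a concrete, hedged illustration of the classical paradigms named above, the sketch below implements the inversion method for the exponential distribution, the rejection method for the half-normal density with an exponential proposal, and the binary entropy lower bound for discrete generation; the specific targets, the proposal, and the NumPy usage are assumptions for illustration, not code from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def exponential_by_inversion(lam, size):
    """Inversion method: X = F^{-1}(U) with U uniform on [0, 1].
    For the exponential distribution, F^{-1}(u) = -log(1 - u) / lam."""
    u = rng.uniform(size=size)
    return -np.log1p(-u) / lam

def halfnormal_by_rejection(size):
    """Rejection method: target f(x) = sqrt(2/pi) * exp(-x^2/2) on x >= 0 (half-normal),
    proposal g(x) = exp(-x) (exponential), with f <= c * g for c = sqrt(2e/pi)."""
    out = []
    c = np.sqrt(2.0 * np.e / np.pi)
    while len(out) < size:
        x = exponential_by_inversion(1.0, 1)[0]          # proposal draw
        f = np.sqrt(2.0 / np.pi) * np.exp(-x * x / 2.0)  # target density at x
        g = np.exp(-x)                                   # proposal density at x
        if rng.uniform() * c * g <= f:                   # accept with probability f / (c * g)
            out.append(x)
    return np.array(out)

def binary_entropy(p):
    """Knuth-Yao style lower bound: generating X with P{X = n} = p_n requires an
    expected number of perfect random bits of at least sum_n p_n * log2(1 / p_n)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

print(exponential_by_inversion(2.0, 3))
print(halfnormal_by_rejection(3))
print(binary_entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
```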

3,217 citations

Journal ArticleDOI
TL;DR: Nonparametric density estimation is studied from the L1 point of view, covering consistency, lower bounds and rates of convergence in L1, the automatic kernel estimate, estimates related to the kernel and histogram estimates, simulation and random variate generation, the transformed kernel estimate, and applications in discrimination.
Abstract: Differentiation of Integrals * Consistency * Lower Bounds for Rates of Convergence * Rates of Convergence in L1 * The Automatic Kernel Estimate: L1 and Pointwise Convergence * Estimates Related to the Kernel Estimate and the Histogram Estimate * Simulation, Inequalities, and Random Variate Generation * The Transformed Kernel Estimate * Applications in Discrimination * Operations on Density Estimates * Estimators Based on Orthogonal Series * Index.
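Since the kernel estimate and its L1 error are the book's central objects, here is a minimal sketch, assuming a Gaussian kernel, a grid-based Riemann approximation of the L1 distance, and an illustrative standard normal sample; it is not code from the book.

```python
import numpy as np

def kernel_density_estimate(data, x, h):
    """Kernel estimate f_n(x) = (1 / (n * h)) * sum_i K((x - X_i) / h), Gaussian kernel K."""
    u = (x[:, None] - data[None, :]) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi), axis=1) / h

def l1_error(f_hat, f_true, dx):
    """Approximate the L1 distance integral |f_n - f| dx by a Riemann sum on the grid."""
    return float(np.sum(np.abs(f_hat - f_true)) * dx)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 400)                      # illustrative sample from N(0, 1)
grid = np.linspace(-5, 5, 2001)
dx = grid[1] - grid[0]
f_true = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)  # true standard normal density

for h in (0.1, 0.3, 1.0):                             # a few bandwidths
    f_hat = kernel_density_estimate(data, grid, h)
    print(f"h = {h:.1f}   approximate L1 error = {l1_error(f_hat, f_true, dx):.3f}")
```

The loop over bandwidths shows how strongly the L1 error depends on the smoothing parameter, which is the theme of the automatic (data-driven) kernel estimate.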

852 citations

Book
03 Nov 2011
TL;DR: Combinatorial tools such as concentration inequalities, Vapnik-Chervonenkis theory, shatter coefficients, and covering numbers are used to choose density estimates, with applications to kernel, wavelet, and transformed kernel estimates, bandwidth selection, and minimax theory.
Abstract: 1. Introduction.- 1.1. References.- 2. Concentration Inequalities.- 2.1. Hoeffding's Inequality.- 2.2. An Inequality for the Expected Maximal Deviation.- 2.3. The Bounded Difference Inequality.- 2.4. Examples.- 2.5. Bibliographic Remarks.- 2.6. Exercises.- 2.7. References.- 3. Uniform Deviation Inequalities.- 3.1. The Vapnik-Chervonenkis Inequality.- 3.2. Covering Numbers and Chaining.- 3.3. Example: The Dvoretzky-Kiefer-Wolfowitz Theorem.- 3.4. Bibliographic Remarks.- 3.5. Exercises.- 3.6. References.- 4. Combinatorial Tools.- 4.1. Shatter Coefficients.- 4.2. Vapnik-Chervonenkis Dimension and Shatter Coefficients.- 4.3. Vapnik-Chervonenkis Dimension and Covering Numbers.- 4.4. Examples.- 4.5. Bibliographic Remarks.- 4.6. Exercises.- 4.7. References.- 5. Total Variation.- 5.1. Density Estimation.- 5.2. The Total Variation.- 5.3. Invariance.- 5.4. Mappings.- 5.5. Convolutions.- 5.6. Normalization.- 5.7. The Lebesgue Density Theorem.- 5.8. LeCam's Inequality.- 5.9. Bibliographic Remarks.- 5.10. Exercises.- 5.11. References.- 6. Choosing a Density Estimate.- 6.1. Choosing Between Two Densities.- 6.2. Examples.- 6.3. Is the Factor of Three Necessary?.- 6.4. Maximum Likelihood Does not Work.- 6.5. L2 Distances Are To Be Avoided.- 6.6. Selection from k Densities.- 6.7. Examples Continued.- 6.8. Selection from an Infinite Class.- 6.9. Bibliographic Remarks.- 6.10. Exercises.- 6.11. References.- 7. Skeleton Estimates.- 7.1. Kolmogorov Entropy.- 7.2. Skeleton Estimates.- 7.3. Robustness.- 7.4. Finite Mixtures.- 7.5. Monotone Densities on the Hypercube.- 7.6. How To Make Gigantic Totally Bounded Classes.- 7.7. Bibliographic Remarks.- 7.8. Exercises.- 7.9. References.- 8. The Minimum Distance Estimate: Examples.- 8.1. Problem Formulation.- 8.2. Series Estimates.- 8.3. Parametric Estimates: Exponential Families.- 8.4. Neural Network Estimates.- 8.5. Mixture Classes, Radial Basis Function Networks.- 8.6. Bibliographic Remarks.- 8.7. Exercises.- 8.8. References.- 9. The Kernel Density Estimate.- 9.1. Approximating Functions by Convolutions.- 9.2. Definition of the Kernel Estimate.- 9.3. Consistency of the Kernel Estimate.- 9.4. Concentration.- 9.5. Choosing the Bandwidth.- 9.6. Choosing the Kernel.- 9.7. Rates of Convergence.- 9.8. Uniform Rate of Convergence.- 9.9. Shrinkage, and the Combination of Density Estimates.- 9.10. Bibliographic Remarks.- 9.11. Exercises.- 9.12. References.- 10. Additive Estimates and Data Splitting.- 10.1. Data Splitting.- 10.2. Additive Estimates.- 10.3. Histogram Estimates.- 10A. Bibliographic Remarks.- 10.5. Exercises.- 10.6. References.- 11. Bandwidth Selection for Kernel Estimates.- 11.1. The Kernel Estimate with Riemann Kernel.- 11.2. General Kernels, Kernel Complexity.- 11.3. Kernel Complexity: Univariate Examples.- 11.4. Kernel Complexity: Multivariate Kernels.- 11.5. Asymptotic Optimality.- 11.6. Bibliographic Remarks.- 11.7. Exercises.- 11.8. References.- 12. Multiparameter Kernel Estimates.- 12.1. Multivariate Kernel Estimates-Product Kernels.- 12.2. Multivariate Kernel Estimates-Ellipsoidal Kernels.- 12.3. Variable Kernel Estimates.- 12.4. Tree-Structured Partitions.- 12.5. Changepoints and Bump Hunting.- 12.6. Bibliographic Remarks.- 12.7. Exercises.- 12.8. References.- 13. Wavelet Estimates.- 13.1. Definitions.- 13.2. Smoothing.- 13.3. Thresholding.- 13.4. Soft Thresholding.- 13.5. Bibliographic Remarks.- 13.6. Exercises.- 13.7. References.- 14. The Transformed Kernel Estimate.- 14.1. The Transformed Kernel Estimate.- 14.2. 
Box-Cox Transformations.- 14.3. Piecewise Linear Transformations.- 14.4. Bibliographic Remarks.- 14.5. Exercises.- 14.6. References.- 15. Minimax Theory.- 15.1. Estimating a Density from One Data Point.- 15.2. The General Minimax Problem.- 15.3. Rich Classes.- 15.4. Assouad's Lemma.- 15.5. Example: The Class of Convex Densities.- 15.6. Additional Examples.- 15.7. Tuning the Parameters of Variable Kernel Estimates.- 15.8. Sufficient Statistics.- 15.9. Bibliographic Remarks.- 15.10. Exercises.- 15.11. References.- 16. Choosing the Kernel Order.- 16.1. Introduction.- 16.2. Standard Kernel Estimate: Riemann Kernels.- 16.3. Standard Kernel Estimates: General Kernels.- 16.4. An Infinite Family of Kernels.- 16.5. Bibliographic Remarks.- 16.6. Exercises.- 16.7. References.- 17. Bandwidth Choice with Superkernels.- 17.1. Superkernels.- 17.2. The Trapezoidal Kernel.- 17.3. Bandwidth Selection.- 17.4. Bibliographic Remarks.- 17.5. Exercises.- 17.6. References.- Author Index.
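A recurring device in this table of contents is selecting a density estimate by comparing candidates on Scheffé sets (the "Choosing a Density Estimate" and minimum distance chapters). The sketch below illustrates that idea for two one-dimensional candidates; it is a simplified reading of the method, and the candidate densities, grid-based integration, and sample are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Two illustrative candidate densities: f1 = N(0, 1), f2 = N(0.5, 1.5).
f1 = lambda x: normal_pdf(x, 0.0, 1.0)
f2 = lambda x: normal_pdf(x, 0.5, 1.5)

# Data actually drawn from N(0, 1), so f1 should typically be selected.
data = rng.normal(0.0, 1.0, 300)

# Scheffe set A = {x : f1(x) > f2(x)}; compare each candidate's mass on A
# with the empirical measure mu_n(A) of the sample.
grid = np.linspace(-10, 10, 8001)
dx = grid[1] - grid[0]
A_grid = f1(grid) > f2(grid)

mu_n_A = np.mean(f1(data) > f2(data))                   # fraction of sample points in A
delta_1 = abs(np.sum(f1(grid)[A_grid]) * dx - mu_n_A)   # |integral_A f1 - mu_n(A)|
delta_2 = abs(np.sum(f2(grid)[A_grid]) * dx - mu_n_A)   # |integral_A f2 - mu_n(A)|

print("selected density:", "f1" if delta_1 <= delta_2 else "f2")
```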

828 citations


Cited by
Book
Vladimir Vapnik
01 Jan 1995
TL;DR: This book covers the setting of the learning problem, consistency of learning processes, bounds on the rate of convergence of learning processes, controlling the generalization ability of learning processes, constructing learning algorithms, and what is important in learning theory.
Abstract: Setting of the learning problem * Consistency of learning processes * Bounds on the rate of convergence of learning processes * Controlling the generalization ability of learning processes * Constructing learning algorithms * What is important in learning theory?

40,147 citations

Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Proceedings Article
01 Jan 2014
TL;DR: This paper introduces a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.
Abstract: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
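As a hedged illustration of the reparameterization idea described in the abstract (not the paper's full variational autoencoder), the sketch below estimates gradients of an expectation over a Gaussian latent variable by writing z = mu + sigma * eps with eps drawn from N(0, 1), so that ordinary Monte Carlo averages give stochastic gradients with respect to mu and log sigma; the quadratic objective and all constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterized_gradients(mu, log_sigma, df, n_samples=100_000):
    """Monte Carlo gradients of E_{z ~ N(mu, sigma^2)}[f(z)] with respect to mu and
    log_sigma, using the reparameterization z = mu + sigma * eps with eps ~ N(0, 1),
    so that dz/dmu = 1 and dz/dlog_sigma = sigma * eps; df is the derivative f'."""
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps
    grad_mu = np.mean(df(z))                       # average of f'(z) * dz/dmu
    grad_log_sigma = np.mean(df(z) * sigma * eps)  # average of f'(z) * dz/dlog_sigma
    return grad_mu, grad_log_sigma

# Illustrative objective f(z) = z^2: E[f(z)] = mu^2 + sigma^2, so the exact gradients
# are d/dmu = 2 * mu and d/dlog_sigma = 2 * sigma^2.
df = lambda z: 2.0 * z
mu, log_sigma = 0.7, np.log(0.5)
g_mu, g_ls = reparameterized_gradients(mu, log_sigma, df)
print(f"estimated d/dmu        = {g_mu:.3f} (exact {2 * mu:.3f})")
print(f"estimated d/dlog_sigma = {g_ls:.3f} (exact {2 * np.exp(log_sigma) ** 2:.3f})")
```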

20,769 citations

Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
* Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
* Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
* Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization

20,196 citations

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations