Author

Luc Devroye

Bio: Luc Devroye is an academic researcher from McGill University. The author has contributed to research in topics such as random variate generation and random variables. The author has an h-index of 61, has co-authored 353 publications, and has received 22,719 citations. Previous affiliations of Luc Devroye include the University of New South Wales and the University of California, Davis.


Papers
Book
01 Jan 1996
TL;DR: A probabilistic theory of pattern recognition is developed, covering the Bayes error, Vapnik-Chervonenkis theory, nearest neighbor and kernel rules, the maximum likelihood principle, parametric classification, error estimation, and lower bounds for empirical classifier selection.
Abstract: Preface * Introduction * The Bayes Error * Inequalities and alternate distance measures * Linear discrimination * Nearest neighbor rules * Consistency * Slow rates of convergence * Error estimation * The regular histogram rule * Kernel rules * Consistency of the k-nearest neighbor rule * Vapnik-Chervonenkis theory * Combinatorial aspects of Vapnik-Chervonenkis theory * Lower bounds for empirical classifier selection * The maximum likelihood principle * Parametric classification * Generalized linear discrimination * Complexity regularization * Condensed and edited nearest neighbor rules * Tree classifiers * Data-dependent partitioning * Splitting the data * The resubstitution estimate * Deleted estimates of the error probability * Automatic kernel rules * Automatic nearest neighbor rules * Hypercubes and discrete spaces * Epsilon entropy and totally bounded sets * Uniform laws of large numbers * Neural networks * Other error estimates * Feature extraction * Appendix * Notation * References * Index
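To make the flavor of the nearest neighbor and error estimation chapters concrete, here is a minimal sketch, not taken from the book, of the k-nearest neighbor rule with a holdout estimate of its error probability; the Gaussian sample, the choice k = 5, and the NumPy-based implementation are illustrative assumptions.

```python
import numpy as np

def knn_classify(X_train, y_train, x, k=5):
    """k-nearest neighbor rule: majority vote among the k closest training points."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances from x to the training points
    nearest = np.argsort(d)[:k]               # indices of the k nearest neighbors
    votes = np.bincount(y_train[nearest])     # count class labels among the neighbors
    return int(np.argmax(votes))              # predicted class

def holdout_error(X_train, y_train, X_test, y_test, k=5):
    """Empirical error probability on held-out data (an estimate of the true risk)."""
    errors = sum(knn_classify(X_train, y_train, x, k) != y for x, y in zip(X_test, y_test))
    return errors / len(y_test)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two Gaussian classes in the plane (illustrative data, not from the book).
    n = 500
    X = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(1.5, 1.0, (n, 2))])
    y = np.array([0] * n + [1] * n)
    idx = rng.permutation(2 * n)
    train, test = idx[:n], idx[n:]
    print("holdout error estimate:",
          holdout_error(X[train], y[train], X[test], y[test], k=5))
```

Letting k grow slowly with the sample size (k tending to infinity while k/n tends to zero) is exactly the kind of design question the consistency chapters address.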

3,598 citations

Journal ArticleDOI
TL;DR: This chapter reviews the main methods for generating random variables, vectors and processes in non-uniform random variate generation, and provides information on the expected time complexity of various algorithms before addressing modern topics such as indirectly specified distributions, random processes, and Markov chain methods.

3,304 citations

Book
16 Apr 1986
TL;DR: A survey of the main methods in non-uniform random variate generation can be found in this article, where the authors provide information on the expected time complexity of various algorithms, before addressing modern topics such as indirectly specified distributions, random processes and Markov chain methods.
Abstract: This is a survey of the main methods in non-uniform random variate generation, and highlights recent research on the subject. Classical paradigms such as inversion, rejection, guide tables, and transformations are reviewed. We provide information on the expected time complexity of various algorithms, before addressing modern topics such as indirectly specified distributions, random processes, and Markov chain methods.

1. The main paradigms. The purpose of this chapter is to review the main methods for generating random variables, vectors and processes. Classical workhorses such as the inversion method, the rejection method and table methods are reviewed in section 1. In section 2, we discuss the expected time complexity of various algorithms, and give a few examples of the design of generators that are uniformly fast over entire families of distributions. In section 3, we develop a few universal generators, such as generators for all log-concave distributions on the real line. Section 4 deals with random variate generation when distributions are indirectly specified, e.g., via Fourier coefficients, characteristic functions, the moments, the moment generating function, distributional identities, infinite series or Kolmogorov measures. Random processes are briefly touched upon in section 5. Finally, the latest developments in Markov chain methods are discussed in section 6. Some of this work grew from Devroye (1986a), and we are carefully documenting work that was done since 1986. More recent references can be found in the book by Hörmann, Leydold and Derflinger (2004).

Non-uniform random variate generation is concerned with the generation of random variables with certain distributions. Such random variables are often discrete, taking values in a countable set, or absolutely continuous, and thus described by a density. The methods used for generating them depend upon the computational model one is working with, and upon the demands on the part of the output. For example, in a RAM (random access memory) model, one accepts that real numbers can be stored and operated upon (compared, added, multiplied, and so forth) in one time unit. Furthermore, this model assumes that a source capable of producing an i.i.d. (independent identically distributed) sequence of uniform [0, 1] random variables is available. This model is of course unrealistic, but designing random variate generators based on it has several advantages: first of all, it allows one to disconnect the theory of non-uniform random variate generation from that of uniform random variate generation, and secondly, it permits one to plan for the future, as more powerful computers will be developed that permit ever better approximations of the model. Algorithms designed under finite approximation limitations will have to be redesigned when the next generation of computers arrives.

For the generation of discrete or integer-valued random variables, which includes the vast area of the generation of random combinatorial structures, one can adhere to a clean model, the pure bit model, in which each bit operation takes one time unit, and storage can be reported in terms of bits. Typically, one now assumes that an i.i.d. sequence of independent perfect bits is available. In this model, an elegant information-theoretic theory can be derived.
For example, Knuth and Yao (1976) showed that to generate a random integer X described by the probability distribution P{X = n} = p_n, n ≥ 1, any method must use an expected number of bits at least equal to the binary entropy of the distribution, ∑_n p_n log_2(1/p_n).
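As a concrete, hedged illustration of the classical paradigms named above, the sketch below implements the inversion method for the exponential distribution, the rejection method for the half-normal density with an exponential proposal, and the binary entropy lower bound for discrete generation; the specific targets, the proposal, and the NumPy usage are assumptions for illustration, not code from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def exponential_by_inversion(lam, size):
    """Inversion method: X = F^{-1}(U) with U uniform on [0, 1].
    For the exponential distribution, F^{-1}(u) = -log(1 - u) / lam."""
    u = rng.uniform(size=size)
    return -np.log1p(-u) / lam

def halfnormal_by_rejection(size):
    """Rejection method: target f(x) = sqrt(2/pi) * exp(-x^2/2) on x >= 0 (half-normal),
    proposal g(x) = exp(-x) (exponential), with f <= c * g for c = sqrt(2e/pi)."""
    out = []
    c = np.sqrt(2.0 * np.e / np.pi)
    while len(out) < size:
        x = exponential_by_inversion(1.0, 1)[0]          # proposal draw
        f = np.sqrt(2.0 / np.pi) * np.exp(-x * x / 2.0)  # target density at x
        g = np.exp(-x)                                   # proposal density at x
        if rng.uniform() * c * g <= f:                   # accept with probability f / (c * g)
            out.append(x)
    return np.array(out)

def binary_entropy(p):
    """Knuth-Yao style lower bound: generating X with P{X = n} = p_n requires an
    expected number of perfect random bits of at least sum_n p_n * log2(1 / p_n)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

print(exponential_by_inversion(2.0, 3))
print(halfnormal_by_rejection(3))
print(binary_entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
```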

3,217 citations

Journal ArticleDOI
TL;DR: Nonparametric density estimation is studied from the L1 point of view, covering consistency, lower bounds and rates of convergence in L1, the automatic kernel estimate, estimates related to the kernel and histogram estimates, simulation and random variate generation, the transformed kernel estimate, and applications in discrimination.
Abstract: Differentiation of Integrals * Consistency * Lower Bounds for Rates of Convergence * Rates of Convergence in L1 * The Automatic Kernel Estimate: L1 and Pointwise Convergence * Estimates Related to the Kernel Estimate and the Histogram Estimate * Simulation, Inequalities, and Random Variate Generation * The Transformed Kernel Estimate * Applications in Discrimination * Operations on Density Estimates * Estimators Based on Orthogonal Series * Index.
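Since the kernel estimate and its L1 error are the book's central objects, here is a minimal sketch, assuming a Gaussian kernel, a grid-based Riemann approximation of the L1 distance, and an illustrative standard normal sample; it is not code from the book.

```python
import numpy as np

def kernel_density_estimate(data, x, h):
    """Kernel estimate f_n(x) = (1 / (n * h)) * sum_i K((x - X_i) / h), Gaussian kernel K."""
    u = (x[:, None] - data[None, :]) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi), axis=1) / h

def l1_error(f_hat, f_true, dx):
    """Approximate the L1 distance integral |f_n - f| dx by a Riemann sum on the grid."""
    return float(np.sum(np.abs(f_hat - f_true)) * dx)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 400)                      # illustrative sample from N(0, 1)
grid = np.linspace(-5, 5, 2001)
dx = grid[1] - grid[0]
f_true = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)  # true standard normal density

for h in (0.1, 0.3, 1.0):                             # a few bandwidths
    f_hat = kernel_density_estimate(data, grid, h)
    print(f"h = {h:.1f}   approximate L1 error = {l1_error(f_hat, f_true, dx):.3f}")
```

The loop over bandwidths shows how strongly the L1 error depends on the smoothing parameter, which is the theme of the automatic (data-driven) kernel estimate.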

852 citations

Book
03 Nov 2011
TL;DR: Combinatorial tools such as concentration inequalities, Vapnik-Chervonenkis theory, shatter coefficients, and covering numbers are used to choose density estimates, with applications to kernel, wavelet, and transformed kernel estimates, bandwidth selection, and minimax theory.
Abstract: 1. Introduction.- 1.1. References.- 2. Concentration Inequalities.- 2.1. Hoeffding's Inequality.- 2.2. An Inequality for the Expected Maximal Deviation.- 2.3. The Bounded Difference Inequality.- 2.4. Examples.- 2.5. Bibliographic Remarks.- 2.6. Exercises.- 2.7. References.- 3. Uniform Deviation Inequalities.- 3.1. The Vapnik-Chervonenkis Inequality.- 3.2. Covering Numbers and Chaining.- 3.3. Example: The Dvoretzky-Kiefer-Wolfowitz Theorem.- 3.4. Bibliographic Remarks.- 3.5. Exercises.- 3.6. References.- 4. Combinatorial Tools.- 4.1. Shatter Coefficients.- 4.2. Vapnik-Chervonenkis Dimension and Shatter Coefficients.- 4.3. Vapnik-Chervonenkis Dimension and Covering Numbers.- 4.4. Examples.- 4.5. Bibliographic Remarks.- 4.6. Exercises.- 4.7. References.- 5. Total Variation.- 5.1. Density Estimation.- 5.2. The Total Variation.- 5.3. Invariance.- 5.4. Mappings.- 5.5. Convolutions.- 5.6. Normalization.- 5.7. The Lebesgue Density Theorem.- 5.8. LeCam's Inequality.- 5.9. Bibliographic Remarks.- 5.10. Exercises.- 5.11. References.- 6. Choosing a Density Estimate.- 6.1. Choosing Between Two Densities.- 6.2. Examples.- 6.3. Is the Factor of Three Necessary?.- 6.4. Maximum Likelihood Does not Work.- 6.5. L2 Distances Are To Be Avoided.- 6.6. Selection from k Densities.- 6.7. Examples Continued.- 6.8. Selection from an Infinite Class.- 6.9. Bibliographic Remarks.- 6.10. Exercises.- 6.11. References.- 7. Skeleton Estimates.- 7.1. Kolmogorov Entropy.- 7.2. Skeleton Estimates.- 7.3. Robustness.- 7.4. Finite Mixtures.- 7.5. Monotone Densities on the Hypercube.- 7.6. How To Make Gigantic Totally Bounded Classes.- 7.7. Bibliographic Remarks.- 7.8. Exercises.- 7.9. References.- 8. The Minimum Distance Estimate: Examples.- 8.1. Problem Formulation.- 8.2. Series Estimates.- 8.3. Parametric Estimates: Exponential Families.- 8.4. Neural Network Estimates.- 8.5. Mixture Classes, Radial Basis Function Networks.- 8.6. Bibliographic Remarks.- 8.7. Exercises.- 8.8. References.- 9. The Kernel Density Estimate.- 9.1. Approximating Functions by Convolutions.- 9.2. Definition of the Kernel Estimate.- 9.3. Consistency of the Kernel Estimate.- 9.4. Concentration.- 9.5. Choosing the Bandwidth.- 9.6. Choosing the Kernel.- 9.7. Rates of Convergence.- 9.8. Uniform Rate of Convergence.- 9.9. Shrinkage, and the Combination of Density Estimates.- 9.10. Bibliographic Remarks.- 9.11. Exercises.- 9.12. References.- 10. Additive Estimates and Data Splitting.- 10.1. Data Splitting.- 10.2. Additive Estimates.- 10.3. Histogram Estimates.- 10A. Bibliographic Remarks.- 10.5. Exercises.- 10.6. References.- 11. Bandwidth Selection for Kernel Estimates.- 11.1. The Kernel Estimate with Riemann Kernel.- 11.2. General Kernels, Kernel Complexity.- 11.3. Kernel Complexity: Univariate Examples.- 11.4. Kernel Complexity: Multivariate Kernels.- 11.5. Asymptotic Optimality.- 11.6. Bibliographic Remarks.- 11.7. Exercises.- 11.8. References.- 12. Multiparameter Kernel Estimates.- 12.1. Multivariate Kernel Estimates-Product Kernels.- 12.2. Multivariate Kernel Estimates-Ellipsoidal Kernels.- 12.3. Variable Kernel Estimates.- 12.4. Tree-Structured Partitions.- 12.5. Changepoints and Bump Hunting.- 12.6. Bibliographic Remarks.- 12.7. Exercises.- 12.8. References.- 13. Wavelet Estimates.- 13.1. Definitions.- 13.2. Smoothing.- 13.3. Thresholding.- 13.4. Soft Thresholding.- 13.5. Bibliographic Remarks.- 13.6. Exercises.- 13.7. References.- 14. The Transformed Kernel Estimate.- 14.1. The Transformed Kernel Estimate.- 14.2. 
Box-Cox Transformations.- 14.3. Piecewise Linear Transformations.- 14.4. Bibliographic Remarks.- 14.5. Exercises.- 14.6. References.- 15. Minimax Theory.- 15.1. Estimating a Density from One Data Point.- 15.2. The General Minimax Problem.- 15.3. Rich Classes.- 15.4. Assouad's Lemma.- 15.5. Example: The Class of Convex Densities.- 15.6. Additional Examples.- 15.7. Tuning the Parameters of Variable Kernel Estimates.- 15.8. Sufficient Statistics.- 15.9. Bibliographic Remarks.- 15.10. Exercises.- 15.11. References.- 16. Choosing the Kernel Order.- 16.1. Introduction.- 16.2. Standard Kernel Estimate: Riemann Kernels.- 16.3. Standard Kernel Estimates: General Kernels.- 16.4. An Infinite Family of Kernels.- 16.5. Bibliographic Remarks.- 16.6. Exercises.- 16.7. References.- 17. Bandwidth Choice with Superkernels.- 17.1. Superkernels.- 17.2. The Trapezoidal Kernel.- 17.3. Bandwidth Selection.- 17.4. Bibliographic Remarks.- 17.5. Exercises.- 17.6. References.- Author Index.
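A recurring device in this table of contents is selecting a density estimate by comparing candidates on Scheffé sets (the "Choosing a Density Estimate" and minimum distance chapters). The sketch below illustrates that idea for two one-dimensional candidates; it is a simplified reading of the method, and the candidate densities, grid-based integration, and sample are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Two illustrative candidate densities: f1 = N(0, 1), f2 = N(0.5, 1.5).
f1 = lambda x: normal_pdf(x, 0.0, 1.0)
f2 = lambda x: normal_pdf(x, 0.5, 1.5)

# Data actually drawn from N(0, 1), so f1 should typically be selected.
data = rng.normal(0.0, 1.0, 300)

# Scheffe set A = {x : f1(x) > f2(x)}; compare each candidate's mass on A
# with the empirical measure mu_n(A) of the sample.
grid = np.linspace(-10, 10, 8001)
dx = grid[1] - grid[0]
A_grid = f1(grid) > f2(grid)

mu_n_A = np.mean(f1(data) > f2(data))                   # fraction of sample points in A
delta_1 = abs(np.sum(f1(grid)[A_grid]) * dx - mu_n_A)   # |integral_A f1 - mu_n(A)|
delta_2 = abs(np.sum(f2(grid)[A_grid]) * dx - mu_n_A)   # |integral_A f2 - mu_n(A)|

print("selected density:", "f1" if delta_1 <= delta_2 else "f2")
```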

828 citations


Cited by
Book
Vladimir Vapnik
01 Jan 1995
TL;DR: This book covers the setting of the learning problem, consistency of learning processes, bounds on the rate of convergence of learning processes, controlling the generalization ability of learning processes, constructing learning algorithms, and what is important in learning theory.
Abstract: Setting of the learning problem * Consistency of learning processes * Bounds on the rate of convergence of learning processes * Controlling the generalization ability of learning processes * Constructing learning algorithms * What is important in learning theory?

40,147 citations

Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Proceedings Article
01 Jan 2014
TL;DR: This paper introduces a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.
Abstract: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
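As a hedged illustration of the reparameterization idea described in the abstract (not the paper's full variational autoencoder), the sketch below estimates gradients of an expectation over a Gaussian latent variable by writing z = mu + sigma * eps with eps drawn from N(0, 1), so that ordinary Monte Carlo averages give stochastic gradients with respect to mu and log sigma; the quadratic objective and all constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterized_gradients(mu, log_sigma, df, n_samples=100_000):
    """Monte Carlo gradients of E_{z ~ N(mu, sigma^2)}[f(z)] with respect to mu and
    log_sigma, using the reparameterization z = mu + sigma * eps with eps ~ N(0, 1),
    so that dz/dmu = 1 and dz/dlog_sigma = sigma * eps; df is the derivative f'."""
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps
    grad_mu = np.mean(df(z))                       # average of f'(z) * dz/dmu
    grad_log_sigma = np.mean(df(z) * sigma * eps)  # average of f'(z) * dz/dlog_sigma
    return grad_mu, grad_log_sigma

# Illustrative objective f(z) = z^2: E[f(z)] = mu^2 + sigma^2, so the exact gradients
# are d/dmu = 2 * mu and d/dlog_sigma = 2 * sigma^2.
df = lambda z: 2.0 * z
mu, log_sigma = 0.7, np.log(0.5)
g_mu, g_ls = reparameterized_gradients(mu, log_sigma, df)
print(f"estimated d/dmu        = {g_mu:.3f} (exact {2 * mu:.3f})")
print(f"estimated d/dlog_sigma = {g_ls:.3f} (exact {2 * np.exp(log_sigma) ** 2:.3f})")
```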

20,769 citations

Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
* Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
* Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
* Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization

20,196 citations

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations