
Showing papers by "David L. Donoho published in 1997"


Journal ArticleDOI
TL;DR: The basis empirically selected by dyadic CART is shown to be nearly as good as a basis ideally adapted to the underlying f, and the risk of estimation in an ideally adapted anisotropic Haar basis is shown to be comparable to the minimax risk over anisotropic smoothness classes.
Abstract: We study what we call "dyadic CART", a method of nonparametric regression which constructs a recursive partition by optimizing a complexity-penalized sum of squares, where the optimization is over all recursive partitions arising from midpoint splits. We show that the method is adaptive to unknown degrees of anisotropic smoothness. Specifically, consider the anisotropic smoothness classes of Nikol'skii, consisting of bivariate functions \( f(x_1, x_2) \) whose finite difference of distance h in direction i is bounded in \( L^p \) norm by \( C h^{\delta_i} \), i = 1, 2. We show that dyadic CART, with an appropriate complexity penalty parameter \( \lambda \sim \sigma^2 \cdot \mathrm{Const} \cdot \log(n) \), is within logarithmic terms of minimax over every anisotropic smoothness class \( 0 < C < \infty \), \( 0 < \delta_1, \delta_2 < 1 \). The proof shows that dyadic CART is identical to a certain adaptive best-ortho-basis algorithm based on the library of all anisotropic Haar bases. Then it applies empirical basis selection ideas of Donoho and Johnstone. The basis empirically selected by dyadic CART is shown to be nearly as good as a basis ideally adapted to the underlying f. The risk of estimation in an ideally adapted anisotropic Haar basis is shown to be comparable to the minimax risk over anisotropic smoothness classes. Underlying the success of this argument is harmonic analysis of anisotropic smoothness classes. We show that, for each anisotropic smoothness class, there is an anisotropic Haar basis which is a best orthogonal basis for representing that smoothness class; the basis is optimal not just within the library of anisotropic Haar bases, but among all orthogonal bases of \( L^2[0,1]^2 \).
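The heart of the method, optimizing a complexity-penalized sum of squares over all recursive dyadic partitions, is exactly solvable by dynamic programming over dyadic rectangles. A minimal sketch of that idea (illustrative only; the function name, constant fits on leaves, and per-leaf penalty bookkeeping are our assumptions, not the paper's code):

    import numpy as np

    def dyadic_cart(y, lam):
        """Complexity-penalized dyadic partitioning (sketch of the idea).

        y   : 2-D array of noisy samples on a dyadic grid (sides are powers of 2)
        lam : penalty per leaf; the paper takes lambda ~ sigma^2 * Const * log(n)

        Returns (cost, leaves): cost = sum of squares + lam * #leaves, and
        leaves is a list of (row_slice, col_slice, fitted_mean) rectangles.
        """
        memo = {}

        def best(r0, r1, c0, c1):
            key = (r0, r1, c0, c1)
            if key in memo:
                return memo[key]
            block = y[r0:r1, c0:c1]
            mu = block.mean()
            # Option 1: keep this rectangle as one leaf with a constant fit.
            cost = ((block - mu) ** 2).sum() + lam
            leaves = [(slice(r0, r1), slice(c0, c1), mu)]
            # Options 2 and 3: midpoint split in either coordinate direction.
            if r1 - r0 > 1:
                m = (r0 + r1) // 2
                ca, la = best(r0, m, c0, c1)
                cb, lb = best(m, r1, c0, c1)
                if ca + cb < cost:
                    cost, leaves = ca + cb, la + lb
            if c1 - c0 > 1:
                m = (c0 + c1) // 2
                ca, la = best(r0, r1, c0, m)
                cb, lb = best(r0, r1, m, c1)
                if ca + cb < cost:
                    cost, leaves = ca + cb, la + lb
            memo[key] = (cost, leaves)
            return memo[key]

        return best(0, y.shape[0], 0, y.shape[1])

Because each dyadic rectangle is solved once and memoized, the optimum over all recursive dyadic partitions is found exactly, with no greedy approximation.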

230 citations


01 Jan 1997
TL;DR: Based on a presentation at the International Conference on Wavelets and Applications, Toulouse, France, June 1992; supported by NSF grant DMS 92-09130.
Abstract: Based on presentation at the International Conference on Wavelets and Applications, Toulouse, France, June 1992. Supported by NSF DMS 92-09130. With appreciation to S. Roques for patience and Y. Meyer for encouragement. It is a pleasure to thank Iain Johnstone, with whom many of these theoretical results have been derived, and Carl Taswell, with whom Johnstone and I have developed the software used here.

97 citations


Book
01 Jan 1997
TL;DR: This paper considers the application of FDR thresholding to a non-Gaussian setting, in hopes of learning whether the good asymptotic properties of FDR thresholding as an estimation tool hold more broadly than just at the standard Gaussian model.
Abstract: Control of the False Discovery Rate (FDR) is an important development in multiple hypothesis testing, allowing the user to limit the fraction of rejected null hypotheses which correspond to false rejections (i.e. false discoveries). The FDR principle also can be used in multiparameter estimation problems to set thresholds for separating signal from noise when the signal is sparse. Success has been proven when the noise is Gaussian; see [3]. In this paper, we consider the application of FDR thresholding to a non-Gaussian setting, in hopes of learning whether the good asymptotic properties of FDR thresholding as an estimation tool hold more broadly than just at the standard Gaussian model. We consider a vector \( X_i \), i = 1, ..., n, whose coordinates are independent exponential with individual means \( \mu_i \). The vector \( \mu \) is thought to be sparse, with most coordinates equal to 1 and a small fraction significantly larger than 1. This models a situation where most coordinates are simply 'noise', but a small fraction of the coordinates contain 'signal'. We develop an estimation theory working with \( \log(\mu_i) \) as the estimand, and use the per-coordinate mean-squared error in recovering \( \log(\mu_i) \) to measure risk. We consider minimax estimation over parameter spaces defined by constraints on the per-coordinate \( \ell_p \) norm of \( \log(\mu_i) \): \( \left( \frac{1}{n} \sum_{i=1}^{n} \log^{p}(\mu_i) \right)^{1/p} \le \eta \). Members of such spaces are vectors \( (\mu_i) \) which are sparsely heterogeneous. We find that, for large n and small \( \eta \), FDR thresholding can be nearly minimax, increasingly so as \( \eta \) decreases. Fixing the FDR control parameter \( q > 1/2 \), however, prevents near minimaxity. These conclusions mirror those found in the Gaussian case in [3]. The techniques developed here seem applicable to a wide range of other distributional assumptions, other loss measures, and non-i.i.d. dependency structures.
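To make the thresholding step concrete, here is a minimal sketch of Benjamini-Hochberg-style FDR thresholding for the exponential model with null mean 1 (the function name, the default q, and the naive log(x_i) estimate on rejected coordinates are our assumptions, not the paper's procedure):

    import numpy as np

    def fdr_threshold_estimate(x, q=0.25):
        """Estimate log(mu_i) from X_i ~ Exponential(mu_i), null mu_i = 1 (sketch)."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        p = np.exp(-x)                    # p-values under the null: P(X > x) = e^{-x}
        order = np.argsort(p)             # indices in order of ascending p-value
        # Step-up rule: find the largest k with p_(k) <= q * k / n, reject those k.
        hits = np.nonzero(p[order] <= q * np.arange(1, n + 1) / n)[0]
        est = np.zeros(n)                 # estimate log(mu_i) = 0 where null is kept
        if hits.size:
            rejected = order[: hits.max() + 1]
            est[rejected] = np.log(x[rejected])
        return est

The step-up rule lets the data choose the threshold height: as the signal becomes sparser, fewer coordinates pass and the effective threshold rises.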

59 citations


Book ChapterDOI
01 Jan 1997
TL;DR: A method for curve estimation based on n noisy data is discussed, in which one translates the empirical wavelet coefficients towards the origin; the method is nearly minimax for a wide variety of loss functions, a broader near-optimality than anything previously proposed in the minimax literature.
Abstract: We discuss a method for curve estimation based on n noisy data; one translates the empirical wavelet coefficients towards the origin by an amount \( \sqrt{2\log(n)} \cdot \sigma/\sqrt{n} \). The method is nearly minimax for a wide variety of loss functions (e.g. pointwise error, global error measured in \( L^p \) norms, and pointwise and global error in estimation of derivatives) and for a wide range of smoothness classes, including standard Hölder classes, Sobolev classes, and Bounded Variation. This is a broader near-optimality than anything previously proposed in the minimax literature. The theory underlying the method exploits a correspondence between statistical questions and questions of optimal recovery and information-based complexity. This paper contains a detailed proof of the result announced in Donoho, Johnstone, Kerkyacharian & Picard (1995).
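In the usual discrete normalization, translating each coefficient toward the origin by \( \sqrt{2\log(n)} \cdot \sigma/\sqrt{n} \) corresponds to soft thresholding at \( \sigma\sqrt{2\log n} \). A minimal sketch using the PyWavelets package (the wavelet choice, the MAD estimate of sigma, and the function name are our assumptions):

    import numpy as np
    import pywt

    def visu_soft_denoise(y, wavelet="db4"):
        """Soft-threshold wavelet denoising at the universal threshold (sketch)."""
        n = len(y)
        coeffs = pywt.wavedec(y, wavelet)
        # Estimate sigma from the finest-scale coefficients (median absolute deviation).
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        t = sigma * np.sqrt(2 * np.log(n))
        shrunk = [coeffs[0]] + [pywt.threshold(c, t, mode="soft") for c in coeffs[1:]]
        return pywt.waverec(shrunk, wavelet)[:n]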

34 citations


Proceedings ArticleDOI
16 Apr 1997
TL;DR: Modulation classification by an approach based on Hellinger distance methods is studied; a hierarchy of candidate modulation types can be automatically constructed, from which a hierarchical recognition scheme is derived.
Abstract: Automatic modulation recognition has become important in wireless communications for both civilian and military purposes. Assuming a 5 dB signal-to-noise ratio (SNR), we studied modulation classification by an approach based on Hellinger distance (HD) methods. The advantages of this approach compared to either the likelihood method or the "key features" extraction method are robustness and simplicity. Also, a hierarchy of candidate modulation types can be automatically constructed, from which a hierarchical recognition scheme is derived; the hierarchy of modulation clusters can be visualized simply. A computational study of 15 modulation types is given.
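The underlying decision rule is simple: assign a signal to the modulation type whose reference distribution is closest in Hellinger distance. A minimal sketch with assumed details (the feature extraction, histogram binning, and names below are ours, not the paper's):

    import numpy as np

    def hellinger(p, q):
        """Hellinger distance between two histograms (as discrete distributions)."""
        p = np.asarray(p, float) / max(p.sum(), 1)
        q = np.asarray(q, float) / max(q.sum(), 1)
        return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

    def classify(features, templates, bins=64):
        """Nearest-template classification of one signal's feature samples.

        features  : 1-D array of feature samples from the unknown signal
        templates : dict mapping modulation name -> 1-D array of feature samples
        """
        lo = min(features.min(), *(t.min() for t in templates.values()))
        hi = max(features.max(), *(t.max() for t in templates.values()))
        edges = np.linspace(lo, hi, bins + 1)
        h_sig, _ = np.histogram(features, bins=edges)
        dists = {name: hellinger(h_sig, np.histogram(t, bins=edges)[0])
                 for name, t in templates.items()}
        return min(dists, key=dists.get)

Pairwise Hellinger distances between the templates also give a dissimilarity matrix from which a hierarchy of modulation types can be clustered.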

24 citations


Proceedings ArticleDOI
02 Nov 1997
TL;DR: Analytic and computational results are presented to show that the nonlinear pyramid performs very differently from traditional wavelets when coping with non-Gaussian data.
Abstract: It is well known that wavelet transforms can be derived from stationary linear refinement subdivision schemes. We discuss a special nonlinear refinement subdivision scheme: median-interpolation. It is a nonlinear cousin of Deslauriers-Dubuc (1992) interpolation and of average-interpolation. The refinement scheme is based on constructing polynomials which interpolate median functionals of the underlying object. The refinement scheme can be deployed in a multiresolution fashion to construct nonlinear pyramid schemes and associated forward and inverse transforms. We discuss the basic properties of this transform and its possible use in wavelet de-noising schemes for badly non-Gaussian data. Analytic and computational results are presented to show that the nonlinear pyramid performs very differently from traditional wavelets when coping with non-Gaussian data.
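To show just the pyramid structure (structure only: the paper predicts with median-interpolating polynomials over triadic cells, which we replace here by a crude piecewise-constant prediction; all names are our assumptions):

    import numpy as np

    def median_pyramid_forward(x, levels):
        """Nonlinear pyramid (sketch): coarsen by medians of triads, keep residuals."""
        x = np.asarray(x, dtype=float)
        details = []
        for _ in range(levels):
            coarse = np.array([np.median(x[i:i + 3]) for i in range(0, len(x), 3)])
            predict = np.repeat(coarse, 3)[: len(x)]  # zeroth-order refinement
            details.append(x - predict)
            x = coarse
        return x, details

    def median_pyramid_inverse(coarse, details):
        """Exact inverse of the forward pyramid above."""
        x = np.asarray(coarse, dtype=float)
        for d in reversed(details):
            x = np.repeat(x, 3)[: len(d)] + d
        return x

The median of each triad, unlike a local mean, is insensitive to a single wild value within it, which is the source of the pyramid's robustness to heavy-tailed noise.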

20 citations


Book ChapterDOI
01 Jan 1997
TL;DR: In this article, the optimal rate of convergence for recovering a functional T(f) of an unknown function f, known a priori to lie in a convex class F (e.g. a class of smooth functions), is defined via the minimax risk from n observations.
Abstract: Let f = f(t), t ∈ \( \mathbb{R}^d \), be an unknown "object" (real-valued function), and suppose we are interested in recovering the nonlinear functional T(f). We know a priori that f ∈ F, a certain convex class of functions (e.g. a class of smooth functions). For various types of measurements \( Y_n = (y_1, y_2, \ldots, y_n) \), problems of this form arise in statistical settings, such as nonparametric density estimation and nonparametric regression estimation; but they also arise in signal recovery and image processing. In such problems, there generally exists an "optimal rate of convergence": the minimax risk from n observations, \( R\left( n \right) = \mathop{{\inf }}\limits_{{\hat{T}}} \mathop{{\sup }}\limits_{{f \in F}} E{{\left( {\hat{T}\left( {{{Y}_{n}}} \right) - T\left( f \right)} \right)}^{2}}\), tends to zero as \( n \to \infty \), typically at a rate \( R\left( n \right) \asymp {{n}^{{ - r}}}\). There is a variety of functionals T, function classes F, and types of observations \( Y_n \); the literature is really too extensive to list here, although we mention Ibragimov & Has'minskii (1981), Sacks & Ylvisaker (1981), and Stone (1980). Lucien Le Cam (1973) has contributed directly to this literature, in his typical abstract and profound way; his ideas have stimulated the work of others in the field, e.g. Donoho & Liu (1991a).

4 citations

