Author

Charles J. Stone

Bio: Charles J. Stone is an academic researcher from University of California, Berkeley. The author has contributed to research in topics: Density estimation & Probability density function. The author has an h-index of 22, co-authored 34 publications receiving 8126 citations.

Papers
Journal ArticleDOI
TL;DR: In this article, consistency of a sequence of weight functions is defined and sufficient conditions for consistency are obtained; consistent sequences of probability weight functions defined in terms of nearest neighbors are constructed, and the results are applied to verify the consistency of the associated nonparametric estimators and the consistency in Bayes risk of the approximate Bayes rules.
Abstract: Let $(X, Y)$ be a pair of random variables such that $X$ is $\mathbb{R}^d$-valued and $Y$ is $\mathbb{R}^{d'}$-valued. Given a random sample $(X_1, Y_1), \cdots, (X_n, Y_n)$ from the distribution of $(X, Y)$, the conditional distribution $P^Y(\bullet \mid X)$ of $Y$ given $X$ can be estimated nonparametrically by $\hat{P}_n^Y(A \mid X) = \sum^n_1 W_{ni}(X)I_A(Y_i)$, where the weight function $W_n$ is of the form $W_{ni}(X) = W_{ni}(X, X_1, \cdots, X_n), 1 \leqq i \leqq n$. The weight function $W_n$ is called a probability weight function if it is nonnegative and $\sum^n_1 W_{ni}(X) = 1$. Associated with $\hat{P}_n^Y(\bullet \mid X)$ in a natural way are nonparametric estimators of conditional expectations, variances, covariances, standard deviations, correlations and quantiles and nonparametric approximate Bayes rules in prediction and multiple classification problems. Consistency of a sequence $\{W_n\}$ of weight functions is defined and sufficient conditions for consistency are obtained. When applied to sequences of probability weight functions, these conditions are both necessary and sufficient. Consistent sequences of probability weight functions defined in terms of nearest neighbors are constructed. The results are applied to verify the consistency of the estimators of the various quantities discussed above and the consistency in Bayes risk of the approximate Bayes rules.

1,754 citations
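
A minimal sketch of the uniform nearest-neighbor probability weights described above, using NumPy; the helper names (`knn_weights`, `cond_expectation`) and the toy data are illustrative assumptions, not the paper's notation. Each of the $k$ nearest sample points gets weight $1/k$, so the weights are nonnegative and sum to one, as the definition of a probability weight function requires:

```python
import numpy as np

def knn_weights(x, X, k):
    """Uniform k-nearest-neighbor probability weights W_ni(x):
    weight 1/k on each of the k sample points nearest to x, 0 elsewhere,
    so the weights are nonnegative and sum to 1."""
    dist = np.linalg.norm(X - x, axis=1)
    w = np.zeros(len(X))
    w[np.argsort(dist)[:k]] = 1.0 / k
    return w

def cond_expectation(x, X, Y, k):
    """Plug-in estimate of E[Y | X = x] as sum_i W_ni(x) * Y_i."""
    return knn_weights(x, X, k) @ Y

# Toy check: Y = sin(X) + noise, so E[Y | X = pi/2] is about 1.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(500, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
print(cond_expectation(np.array([np.pi / 2]), X, Y, k=25))
```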

Journal ArticleDOI
TL;DR: In this article, it was shown that the optimal rate of convergence for an estimator $\hat{T}_n$ of a derivative of order $m$ of a $p$-times differentiable regression function, based on a training sample of size $n$, is $n^{-r}$ with $r = (p - m)/(2p + d)$ under the $L^q$ norm for $0 < q < \infty$, while $(n^{-1} \log n)^r$ is the optimal rate if $q = \infty$, under appropriate regularity conditions.
Abstract: Consider a $p$-times differentiable unknown regression function $\theta$ of a $d$-dimensional measurement variable. Let $T(\theta)$ denote a derivative of $\theta$ of order $m$ and set $r = (p - m)/(2p + d)$. Let $\hat{T}_n$ denote an estimator of $T(\theta)$ based on a training sample of size $n$, and let $\| \hat{T}_n - T(\theta)\|_q$ be the usual $L^q$ norm of the restriction of $\hat{T}_n - T(\theta)$ to a fixed compact set. Under appropriate regularity conditions, it is shown that the optimal rate of convergence for $\| \hat{T}_n - T(\theta)\|_q$ is $n^{-r}$ if $0 < q < \infty$; while $(n^{-1} \log n)^r$ is the optimal rate if $q = \infty$.

1,513 citations
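
As a worked instance of the rate formula (with illustrative values, not taken from the paper): estimating $\theta$ itself ($m = 0$) when $\theta$ has $p = 2$ derivatives gives

```latex
\[
  r = \frac{p - m}{2p + d} = \frac{2}{4 + d}, \qquad
  \|\hat{T}_n - T(\theta)\|_q \asymp
  \begin{cases}
    n^{-2/(4+d)}, & 0 < q < \infty, \\
    (n^{-1}\log n)^{2/(4+d)}, & q = \infty,
  \end{cases}
\]
```

so in one dimension the optimal rate is $n^{-2/5}$, and it slows as the dimension $d$ grows.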

Journal ArticleDOI
TL;DR: In this article, a variety of parametric and nonparametric models for a function $f$ of the joint distribution of a pair of random variables are discussed in relation to flexibility, dimensionality, and interpretability.
Abstract: Let $(X, Y)$ be a pair of random variables such that $X = (X_1, \cdots, X_J)$ and let $f$ be a function that depends on the joint distribution of $(X, Y).$ A variety of parametric and nonparametric models for $f$ are discussed in relation to flexibility, dimensionality, and interpretability. It is then supposed that each $X_j \in \lbrack 0, 1\rbrack,$ that $Y$ is real valued with mean $\mu$ and finite variance, and that $f$ is the regression function of $Y$ on $X.$ Let $f^\ast,$ of the form $f^\ast(x_1, \cdots, x_J) = \mu + f^\ast_1(x_1) + \cdots + f^\ast_J(x_J),$ be chosen subject to the constraints $Ef^\ast_j = 0$ for $1 \leq j \leq J$ to minimize $E\lbrack(f(X) - f^\ast(X))^2\rbrack.$ Then $f^\ast$ is the closest additive approximation to $f,$ and $f^\ast = f$ if $f$ itself is additive. Spline estimates of $f^\ast_j$ and its derivatives are considered based on a random sample from the distribution of $(X, Y).$ Under a common smoothness assumption on $f^\ast_j, 1 \leq j \leq J,$ and some mild auxiliary assumptions, these estimates achieve the same (optimal) rate of convergence for general $J$ as they do for $J = 1.$

1,239 citations
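
A minimal backfitting sketch of the additive fit $\mu + f^\ast_1(x_1) + \cdots + f^\ast_J(x_J)$, with a crude running-mean smoother standing in for the paper's spline estimates; the helper names and the smoother choice are illustrative assumptions, not the paper's method:

```python
import numpy as np

def smooth(x, r, span=0.2):
    """Running-mean smoother standing in for a univariate spline fit."""
    n = len(x)
    k = max(1, int(span * n))
    order = np.argsort(x)
    fitted = np.empty(n)
    r_sorted = r[order]
    for rank, i in enumerate(order):
        lo, hi = max(0, rank - k), min(n, rank + k + 1)
        fitted[i] = r_sorted[lo:hi].mean()
    return fitted

def backfit(X, y, iters=20):
    """Fit y ~ mu + f_1(x_1) + ... + f_J(x_J) by cycling over the
    coordinates, smoothing the partial residuals, and re-centering each
    component so its sample mean is 0 (the constraint E f*_j = 0)."""
    n, J = X.shape
    mu = y.mean()
    f = np.zeros((n, J))
    for _ in range(iters):
        for j in range(J):
            partial = y - mu - f.sum(axis=1) + f[:, j]
            f[:, j] = smooth(X[:, j], partial)
            f[:, j] -= f[:, j].mean()
    return mu, f

# Usage: mu, f = backfit(X, y) with X an (n, J) design on [0, 1]^J.
```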

Journal ArticleDOI
TL;DR: In this paper, it was shown that $r = (p - m)/(2p + d)$ is the optimal (uniform) rate of convergence for estimators of the value at a point of an order-$m$ linear differential operator applied to a regression function, with an analogous result for nonparametric estimators of a density function.
Abstract: Let $d$ denote a positive integer, $\|x\| = (x^2_1 + \cdots + x^2_d)^{1/2}$ the Euclidean norm of $x = (x_1, \cdots, x_d) \in \mathbb{R}^d, k$ a nonnegative integer, $\mathscr{C}_k$ the collection of $k$ times continuously differentiable functions on $\mathbb{R}^d$, and $g_k$ the Taylor polynomial of degree $k$ about the origin corresponding to $g \in \mathscr{C}_k$. Let $M$ and $p > k$ denote positive constants and let $U$ be an open neighborhood of the origin of $\mathbb{R}^d$. Let $\mathscr{G}$ denote the collection of functions $g \in \mathscr{C}_k$ such that $|g(x) - g_k(x)| \leq M \|x\|^p$ for $x\in U$. Let $m \leq k$ be a nonnegative integer, let $\theta_0\in\mathscr{C}_m$ and set $\Theta = \{\theta_0 + g:g \in \mathscr{G}\}$. Let $L$ be a linear differential operator of order $m$ on $\mathscr{C}_m$ and set $T(\theta) = L\theta(0)$ for $\theta \in \Theta$. Let $(X, Y)$ be a pair of random variables such that $X$ is $\mathbb{R}^d$ valued and $Y$ is real valued. It is assumed that the distribution of $X$ is absolutely continuous and that its density is bounded away from zero and infinity on $U$. The conditional distribution of $Y$ given $X$ is assumed to be (say) normal, with a conditional variance which is bounded away from zero and infinity on $U$. The regression function of $Y$ on $X$ is assumed to belong to $\Theta$. It is shown that $r = (p - m)/(2p + d)$ is the optimal (uniform) rate of convergence for a sequence $\{\hat{T}_n\}$ of estimators of $T(\theta)$ such that $\hat{T}_n$ is based on a random sample of size $n$ from the distribution of $(X, Y)$. An analogous result is obtained for nonparametric estimators of a density function.

837 citations
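
To make the rate concrete (illustrative values, not from the paper): with $p = 2$ and $d = 1$, estimating $\theta(0)$ (so $m = 0$) versus $\theta'(0)$ (so $m = 1$) gives

```latex
\[
  r_{m=0} = \frac{2 - 0}{4 + 1} = \frac{2}{5}, \qquad
  r_{m=1} = \frac{2 - 1}{4 + 1} = \frac{1}{5},
\]
```

so each derivative taken in the functional $L\theta(0)$ costs a full unit of $p - m$ in the attainable rate.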

Journal ArticleDOI
01 Sep 1973

421 citations


Cited by
Journal ArticleDOI
TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.
Abstract: In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.

10,696 citations
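
A small usage sketch of $\varepsilon$-insensitive SV regression as surveyed in the tutorial, via scikit-learn's `SVR`; the toy data and parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVR

# Fit sin(x) from noisy samples with an RBF kernel; epsilon sets the
# width of the insensitive tube and C the regularization trade-off.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

model = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=1.0).fit(X, y)
print(model.predict([[np.pi / 2]]))   # close to 1.0
print(len(model.support_))            # points retained as support vectors
```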

Journal ArticleDOI
William S. Cleveland1
TL;DR: Robust locally weighted regression, as discussed by the author, is a method for smoothing a scatterplot in which the fitted value at $x_k$ is the value of a polynomial fit to the data using weighted least squares, where the weight for $(x_i, y_i)$ is large if $x_i$ is close to $x_k$ and small if it is not.
Abstract: The visual information on a scatterplot can be greatly enhanced, with little additional cost, by computing and plotting smoothed points. Robust locally weighted regression is a method for smoothing a scatterplot, $(x_i, y_i), i = 1, \cdots, n$, in which the fitted value at $x_k$ is the value of a polynomial fit to the data using weighted least squares, where the weight for $(x_i, y_i)$ is large if $x_i$ is close to $x_k$ and small if it is not. A robust fitting procedure is used that guards against deviant points distorting the smoothed points. Visual, computational, and statistical issues of robust locally weighted regression are discussed. Several examples, including data on lead intoxication, are used to illustrate the methodology.

10,225 citations
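
A minimal usage sketch via `statsmodels` (the data and parameter values are illustrative): `frac` sets the fraction of points in each local neighborhood, and the `it` robustness iterations downweight deviant points so outliers do not distort the smooth, as the abstract describes:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 300))
y = np.sin(x) + 0.2 * rng.standard_normal(300)
y[::50] += 3.0  # a few deviant points the robust fit should resist

# Returns an (n, 2) array: sorted x values and the smoothed fit.
smoothed = lowess(y, x, frac=0.25, it=3)
print(smoothed[:5])
```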

Journal ArticleDOI

9,941 citations

Journal ArticleDOI
TL;DR: The authors prove two results about this type of estimator that are unprecedented in several ways: with high probability $\hat{f}^\ast_n$ is at least as smooth as $f$, in any of a wide variety of smoothness measures.
Abstract: Donoho and Johnstone (1994) proposed a method for reconstructing an unknown function $f$ on $\lbrack 0, 1\rbrack$ from noisy data $d_i = f(t_i) + \sigma z_i$, $i = 0, \cdots, n - 1$, $t_i = i/n$, where the $z_i$ are independent and identically distributed standard Gaussian random variables. The reconstruction $\hat{f}^\ast_n$ is defined in the wavelet domain by translating all the empirical wavelet coefficients of $d$ toward 0 by an amount $\sigma \sqrt{2 \log(n)/n}$. The authors prove two results about this type of estimator. [Smooth]: with high probability $\hat{f}^\ast_n$ is at least as smooth as $f$, in any of a wide variety of smoothness measures. [Adapt]: the estimator comes nearly as close in mean square to $f$ as any measurable estimator can come, uniformly over balls in each of two broad scales of smoothness classes. These two properties are unprecedented in several ways. The present proof of these results develops new facts about abstract statistical inference and its connection with an optimal recovery model.

9,359 citations
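
A denoising sketch in the spirit of the wavelet-shrinkage estimator above, using PyWavelets soft thresholding. One assumption to flag: the threshold $\sigma\sqrt{2\log n}$ below is the usual scaling for unnormalized orthonormal-DWT coefficients, which corresponds to the abstract's $\sigma\sqrt{2\log(n)/n}$ rule for coefficients normalized by $1/\sqrt{n}$:

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
n = 1024
t = np.arange(n) / n
f = np.sin(4 * np.pi * t) * (t > 0.3)     # piecewise-smooth test signal
sigma = 0.2
d = f + sigma * rng.standard_normal(n)    # noisy data d_i = f(t_i) + sigma*z_i

# Shrink every empirical wavelet coefficient toward 0 (soft threshold).
coeffs = pywt.wavedec(d, "db4", mode="periodization")
thresh = sigma * np.sqrt(2.0 * np.log(n))
shrunk = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
f_hat = pywt.waverec(shrunk, "db4", mode="periodization")

print(np.mean((f_hat - f) ** 2) < np.mean((d - f) ** 2))  # True: noise reduced
```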

Journal ArticleDOI
TL;DR: In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.
Abstract: Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of ...

8,314 citations
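
This abstract matches the nonconcave penalized likelihood literature, and one well-known penalty with exactly the listed properties (symmetric, singular at the origin, bounded by a constant for large arguments) is the smoothly clipped absolute deviation (SCAD) penalty; assuming that identification, here is a sketch of its closed form:

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(theta): L1-like (lam * |theta|) near 0, which
    produces sparsity; a quadratic transition on (lam, a*lam]; constant
    lam^2 * (a + 1) / 2 beyond a*lam, which limits bias on large effects."""
    t = np.abs(theta)
    linear = lam * t
    quad = (2.0 * a * lam * t - t**2 - lam**2) / (2.0 * (a - 1.0))
    const = lam**2 * (a + 1.0) / 2.0
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

# The penalty grows, then flattens: large coefficients are barely shrunk.
print(scad_penalty(np.array([0.0, 0.5, 1.0, 2.0, 10.0]), lam=1.0))
```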