Bootstrap Methods: Another Look at the Jackknife

doi:10.1214/AOS/1176344552

Bradley Efron¹•Institutions (1)

Stanford University¹

01 Jan 1979-Annals of Statistics (Institute of Mathematical Statistics)-Vol. 7, Iss: 1, pp 1-26

TL;DR: This paper considers the problem of estimating the sampling distribution of a prespecified random variable R(X, F) on the basis of the observed data x, and introduces a general method for doing so, the bootstrap.

Abstract: We discuss the following problem: given a random sample \(X = (X_1, X_2, \ldots, X_n)\) from an unknown probability distribution \(F\), estimate the sampling distribution of some prespecified random variable \(R(X, F)\), on the basis of the observed data \(x\). (Standard jackknife theory gives an approximate mean and variance in the case \(R(X, F) = \theta(\hat{F}) - \theta(F)\), \(\theta\) some parameter of interest.) A general method, called the “bootstrap”, is introduced, and shown to work satisfactorily on a variety of estimation problems. The jackknife is shown to be a linear approximation method for the bootstrap. The exposition proceeds by a series of examples: variance of the sample median, error rates in a linear discriminant analysis, ratio estimation, estimating regression parameters, etc.
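To make the resampling idea concrete, here is a minimal Python sketch, not code from the paper, of one of the listed examples: estimating the standard error of the sample median. The function names `bootstrap_se` and `jackknife_se`, the data in `sample`, and the default of 2000 replicates are all illustrative assumptions. The delete-one jackknife is included for comparison on the sample mean.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Bootstrap estimate of the standard error of statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n points with replacement from the observed data,
        # i.e. sample from the empirical distribution.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)


def jackknife_se(data, statistic):
    """Delete-one jackknife estimate of the standard error of statistic(data)."""
    n = len(data)
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return ((n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)) ** 0.5


# Hypothetical sample, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 6.1, 3.3, 4.9, 2.5]
se_median_boot = bootstrap_se(sample, statistics.median)
se_mean_boot = bootstrap_se(sample, statistics.mean)
se_mean_jack = jackknife_se(sample, statistics.mean)
```

For a linear statistic such as the mean, the two estimates should nearly agree, which is in line with the abstract's description of the jackknife as a linear approximation to the bootstrap. The jackknife value for the mean equals the familiar s/√n exactly.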

Citations

Journal Article•DOI•

Regression Shrinkage and Selection via the Lasso

[...]

Robert Tibshirani

01 Jan 1996-Journal of the Royal Statistical Society, Series B (Methodological)

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.

Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
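The sparsity property described in this abstract can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the algorithm from the paper: it solves the equivalent penalized form, minimizing (1/2n)·RSS + lam·Σ|b_j|, by cyclic coordinate descent with soft thresholding. The names `soft_threshold` and `lasso_coordinate_descent`, the penalty `lam`, and the toy data are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * RSS + lam * sum(|b_j|) by cyclic coordinate descent.

    X is a list of rows. Columns of X and the response y are assumed to be
    centered already, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Average product of column j with the partial residual
            # (the residual computed without the contribution of b_j).
            rho = 0.0
            for i in range(n):
                partial = y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Hypothetical centered toy data: y depends mainly on the first column.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.5], [1.0, -1.5], [2.0, 1.0]]
y = [-6.1, -2.9, 0.1, 3.0, 5.9]
beta_lasso = lasso_coordinate_descent(X, y, lam=0.5)
beta_ols = lasso_coordinate_descent(X, y, lam=0.0)
```

With `lam=0.0` the routine reduces to least squares and both coefficients are nonzero. With `lam=0.5` the second coefficient is set exactly to zero and the first is shrunk, which is the behaviour the abstract credits for interpretable models.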

40,785 citations

Journal Article•DOI•

Confidence limits on phylogenies: an approach using the bootstrap.

[...]

Joseph Felsenstein¹•Institutions (1)

University of Washington¹

01 Jul 1985-Evolution

TL;DR: The recently developed statistical method known as the “bootstrap” can be used to place confidence intervals on phylogenies; when all characters are perfectly compatible, it shows significant evidence for a group that is defined by three or more characters.

Abstract: The recently-developed statistical method known as the "bootstrap" can be used to place confidence intervals on phylogenies. It involves resampling points from one's own data, with replacement, to create a series of bootstrap samples of the same size as the original data. Each of these is analyzed, and the variation among the resulting estimates taken to indicate the size of the error involved in making estimates from the original data. In the case of phylogenies, it is argued that the proper method of resampling is to keep all of the original species while sampling characters with replacement, under the assumption that the characters have been independently drawn by the systematist and have evolved independently. Majority-rule consensus trees can be used to construct a phylogeny showing all of the inferred monophyletic groups that occurred in a majority of the bootstrap samples. If a group shows up 95% of the time or more, the evidence for it is taken to be statistically significant. Existing computer programs can be used to analyze different bootstrap samples by using weights on the characters, the weight of a character being how many times it was drawn in bootstrap sampling. When all characters are perfectly compatible, as envisioned by Hennig, bootstrap sampling becomes unnecessary; the bootstrap method would show significant evidence for a group if it is defined by three or more characters.
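The closing claim about three or more characters can be checked with a short calculation. The Python sketch below is an illustration under a stated assumption, not the author's program: with perfectly compatible characters, a group is taken to appear in a bootstrap replicate whenever at least one of the characters defining it receives a nonzero weight. The function names, the choice of 50 characters, and the replicate count are hypothetical.

```python
import random
from collections import Counter


def bootstrap_character_weights(n_characters, rng):
    """One bootstrap replicate, expressed as a weight per character.

    The weight of a character is the number of times it was drawn when
    sampling n_characters characters with replacement.
    """
    draws = Counter(rng.randrange(n_characters) for _ in range(n_characters))
    return [draws[c] for c in range(n_characters)]


def group_support(defining, n_characters, n_boot=5000, seed=0):
    """Fraction of replicates in which some defining character has weight > 0.

    Assumes perfectly compatible characters (see the lead-in).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        weights = bootstrap_character_weights(n_characters, rng)
        if any(weights[c] > 0 for c in defining):
            hits += 1
    return hits / n_boot


def exact_support(k, n_characters):
    """Closed form: 1 - P(none of the k defining characters is drawn)."""
    return 1.0 - (1.0 - k / n_characters) ** n_characters


support_two = exact_support(2, 50)
support_three = exact_support(3, 50)
simulated_three = group_support(range(3), 50, n_boot=2000)
```

Under this assumption, the probability that none of k defining characters is drawn is (1 − k/n)^n, which is at most e^(−k). For k = 3 that is below 0.05, so support exceeds 95%; for k = 2 it does not.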

40,349 citations

Cites methods from "Bootstrap Methods: Another Look at ..."

...An important recent statistical method is the bootstrap (Efron, 1979), a relative of the jackknife....
[...]

Journal Article•DOI•

Non-parametric multivariate analyses of changes in community structure

[...]

K. R. Clarke¹•Institutions (1)

Plymouth Marine Laboratory¹

01 Mar 1993-Austral Ecology

TL;DR: Identifies which elements of an often-quoted strategy for graphical representation of multivariate (multi-species) abundance data have proved most useful in practical assessment of community change resulting from pollution impact.

Abstract: In the early 1980s, a strategy for graphical representation of multivariate (multi-species) abundance data was introduced into marine ecology by, among others, Field et al. (1982). A decade on, it is instructive to: (i) identify which elements of this often-quoted strategy have proved most useful in practical assessment of community change resulting from pollution impact; and (ii) ask to what extent evolution of techniques in the intervening years has added self-consistency and comprehensiveness to the approach. The pivotal concept has proved to be that of a biologically-relevant definition of similarity of two samples, and its utilization mainly in simple rank form, for example ‘sample A is more similar to sample B than it is to sample C’. Statistical assumptions about the data are thus minimized and the resulting non-parametric techniques will be of very general applicability. From such a starting point, a unified framework needs to encompass: (i) the display of community patterns through clustering and ordination of samples; (ii) identification of species principally responsible for determining sample groupings; (iii) statistical tests for differences in space and time (multivariate analogues of analysis of variance, based on rank similarities); and (iv) the linking of community differences to patterns in the physical and chemical environment (the latter also dictated by rank similarities between samples). Techniques are described that bring such a framework into place, and areas in which problems remain are identified. Accumulated practical experience with these methods is discussed, in particular applications to marine benthos, and it is concluded that they have much to offer practitioners of environmental impact studies on communities.

12,446 citations

Journal Article•DOI•

A concordance correlation coefficient to evaluate reproducibility.

[...]

Lawrence I-Kuei Lin¹•Institutions (1)

Baxter International¹

01 Mar 1989-Biometrics

TL;DR: A new reproducibility index is developed and studied; it is simple to use and possesses desirable properties, and the statistical properties of its estimate can be satisfactorily evaluated using an inverse hyperbolic tangent transformation.

Abstract: A new reproducibility index is developed and studied. This index is the correlation between the two readings that fall on the 45 degree line through the origin. It is simple to use and possesses desirable properties. The statistical properties of this estimate can be satisfactorily evaluated using an inverse hyperbolic tangent transformation. A Monte Carlo experiment with 5,000 runs was performed to confirm the estimate's validity. An application using actual data is given.
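The abstract does not state the formula for the index. The Python sketch below assumes the usual definition of the concordance correlation coefficient, ρ_c = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²) with 1/n moments, and applies the inverse hyperbolic tangent transformation mentioned above. The function name and the paired readings are hypothetical.

```python
import math


def concordance_correlation(x, y):
    """Sample concordance correlation coefficient of paired readings x, y.

    Uses 1/n (not 1/(n-1)) variances and covariance, which is an assumption
    of this sketch.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes a shift away from the 45 degree line.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical paired readings: the second set is the first shifted by one unit.
first = [1.0, 2.0, 3.0, 4.0, 5.0]
second = [2.0, 3.0, 4.0, 5.0, 6.0]
ccc = concordance_correlation(first, second)
z = math.atanh(ccc)  # inverse hyperbolic tangent transformation
```

In this example the two sets of readings are perfectly linearly related, so their ordinary correlation is 1, but the constant shift off the 45 degree line lowers the concordance index to 0.8.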

6,916 citations

Journal Article•DOI•

Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy

[...]

Bradley Efron, Robert Tibshirani

01 Feb 1986-Statistical Science

TL;DR: The bootstrap is extended to other measures of statistical accuracy such as bias and prediction error, and to complicated data structures such as time series, censored data, and regression models.

Abstract: This is a review of bootstrap methods, concentrating on basic ideas and applications rather than theoretical considerations. It begins with an exposition of the bootstrap estimate of standard error for one-sample situations. Several examples, some involving quite complicated statistical procedures, are given. The bootstrap is then extended to other measures of statistical accuracy such as bias and prediction error, and to complicated data structures such as time series, censored data, and regression models. Several more examples are presented illustrating these ideas. The last third of the paper deals mainly with bootstrap confidence intervals.

5,894 citations

References

Book

An Introduction to Multivariate Statistical Analysis

[...]

T. W. Anderson

14 Sep 1984

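TL;DR: A textbook on multivariate statistical analysis, covering the multivariate normal distribution, estimation of the mean vector and covariance matrix, the generalized T²-statistic, classification, tests of independence of sets of variates, principal components, canonical correlations, and factor analysis.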

Abstract: Preface to the Third Edition. Preface to the Second Edition. Preface to the First Edition. 1. Introduction. 2. The Multivariate Normal Distribution. 3. Estimation of the Mean Vector and the Covariance Matrix. 4. The Distributions and Uses of Sample Correlation Coefficients. 5. The Generalized T²-Statistic. 6. Classification of Observations. 7. The Distribution of the Sample Covariance Matrix and the Sample Generalized Variance. 8. Testing the General Linear Hypothesis: Multivariate Analysis of Variance. 9. Testing Independence of Sets of Variates. 10. Testing Hypotheses of Equality of Covariance Matrices and Equality of Mean Vectors and Covariance Matrices. 11. Principal Components. 12. Canonical Correlations and Canonical Variables. 13. The Distributions of Characteristic Roots and Vectors. 14. Factor Analysis. 15. Pattern of Dependence Graphical Models. Appendix A: Matrix Theory. Appendix B: Tables. References. Index.

9,693 citations

Journal Article•DOI•

Introduction to Multivariate Statistical Analysis.

[...]

William G. Madow, T. W. Anderson

01 May 1959-American Mathematical Monthly

7,408 citations

Journal Article•DOI•

The jackknife-a review

[...]

Rupert G. Miller¹•Institutions (1)

Stanford University¹

01 Apr 1974-Biometrika

TL;DR: In this paper, a review of the literature on the use of the jackknife technique in bias reduction and robust interval estimation is presented, and speculations and suggestions about future research are made.

Abstract: SUMMARY Research on the jackknife technique since its introduction by Quenouille and Tukey is reviewed. Both its role in bias reduction and in robust interval estimation are treated. Some speculations and suggestions about future research are made. The bibliography attempts to include all published work on jackknife methodology.

1,620 citations

Journal Article•DOI•

Estimation of Error Rates in Discriminant Analysis

[...]

Peter A. Lachenbruch¹, M. Ray Mickey¹•Institutions (1)

University of California, Los Angeles¹

01 Feb 1968-Technometrics

TL;DR: In this article, several methods of estimating error rates in discriminant analysis are evaluated by sampling methods, and two methods in most common use are found to be significantly poorer than some new methods that are proposed.

Abstract: Several methods of estimating error rates in Discriminant Analysis are evaluated by sampling methods. Multivariate normal samples are generated on a computer which have various true probabilities of misclassification for different combinations of sample sizes and different numbers of parameters. The two methods in most common use are found to be significantly poorer than some new methods that are proposed.

1,513 citations

Journal Article•DOI•

Bibliography on estimation of misclassification

[...]

Godfried T. Toussaint

01 Jul 1974-IEEE Transactions on Information Theory

TL;DR: Articles, books, and technical reports on the theoretical and experimental estimation of probability of misclassification are listed for the case of correctly labeled or preclassified training data.

Abstract: Articles, books, and technical reports on the theoretical and experimental estimation of probability of misclassification are listed for the case of correctly labeled or preclassified training data. By way of introduction, the problem of estimating the probability of misclassification is discussed in order to characterize the contributions of the literature.

325 citations