Topic

Resampling

About: Resampling is a research topic. Over its lifetime, 5,428 publications have been published on this topic, receiving 242,291 citations.


Papers
Journal Article (DOI)
TL;DR: This paper compares three multiple testing procedures in the microarray context: the original Bonferroni method, a Bonferroni-type improved single-step method and a step-down method based on nonparametric resampling, by which the null distribution can be derived with the dependency structure among gene expressions preserved and the family-wise error rate accurately controlled at the desired level.
Abstract: Microarray technology is rapidly emerging for genome-wide screening of differentially expressed genes between clinical subtypes or different conditions of human diseases. Traditional statistical testing approaches, such as the two-sample t-test or Wilcoxon test, are frequently used for evaluating statistical significance of informative expressions but require adjustment for large-scale multiplicity. Due to its simplicity, Bonferroni adjustment has been widely used to circumvent this problem. It is well known, however, that the standard Bonferroni test is often very conservative. In the present paper, we compare three multiple testing procedures in the microarray context: the original Bonferroni method, a Bonferroni-type improved single-step method and a step-down method. The latter two methods are based on nonparametric resampling, by which the null distribution can be derived with the dependency structure among gene expressions preserved and the family-wise error rate accurately controlled at the desired level. We also present a sample size calculation method for designing microarray studies. Through simulations and data analyses, we find that the proposed methods for testing and sample size calculation are computationally fast and control error and power precisely.

81 citations
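
The step-down, resampling-based adjustment described in the abstract above is close in spirit to the Westfall-Young maxT procedure: sample labels are permuted, so each pseudo-dataset keeps the observed correlation structure among genes, and successive maxima of the permuted statistics yield FWER-adjusted p-values. The following is a minimal sketch under that reading; the function name, the pooled-variance t statistic, and the two-group 0/1 label setup are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def stepdown_maxT(expr, labels, n_perm=1000, seed=0):
    """Permutation-based step-down maxT adjustment (Westfall-Young style sketch).

    expr   : (genes x samples) expression matrix
    labels : 0/1 group label per sample
    Returns FWER-adjusted p-values per gene. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)

    def tstat(a, b):
        # pooled-variance two-sample |t| statistic per gene
        n1, n2 = a.shape[1], b.shape[1]
        sp2 = ((n1 - 1) * a.var(1, ddof=1) + (n2 - 1) * b.var(1, ddof=1)) / (n1 + n2 - 2)
        return np.abs(a.mean(1) - b.mean(1)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

    t_obs = tstat(expr[:, labels == 0], expr[:, labels == 1])
    order = np.argsort(t_obs)[::-1]           # genes from most to least significant
    counts = np.zeros(expr.shape[0])
    for _ in range(n_perm):
        perm = rng.permutation(labels)        # permute labels, keep gene dependence intact
        t_perm = tstat(expr[:, perm == 0], expr[:, perm == 1])
        # successive maxima over the ordered genes (the step-down part)
        succ_max = np.maximum.accumulate(t_perm[order][::-1])[::-1]
        counts += succ_max >= t_obs[order]
    p_adj_ordered = np.maximum.accumulate(counts / n_perm)   # enforce monotonicity
    p_adj = np.empty_like(p_adj_ordered)
    p_adj[order] = p_adj_ordered
    return p_adj
```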

01 Jan 2002
TL;DR: Bootstrapping is a general approach to statistical inference based on building a sampling distribution for a statistic by resampling from the data at hand, as described in this paper; although the approach seems circular at first, it has been shown to be sound.
Abstract: Bootstrapping is a general approach to statistical inference based on building a sampling distribution for a statistic by resampling from the data at hand. The term ‘bootstrapping,’ due to Efron (1979), is an allusion to the expression ‘pulling oneself up by one’s bootstraps’ – in this case, using the sample data as a population from which repeated samples are drawn. At first blush, the approach seems circular, but has been shown to be sound. Two S libraries for bootstrapping are associated with extensive treatments of the subject: Efron and Tibshirani’s (1993) bootstrap library, and Davison and Hinkley’s (1997) boot library. Of the two, boot, programmed by A. J. Canty, is somewhat more capable, and will be used for the examples in this appendix. There are several forms of the bootstrap, and, additionally, several other resampling methods that are related to it, such as jackknifing, cross-validation, randomization tests, and permutation tests. I will stress the nonparametric bootstrap. Suppose that we draw a sample S = {X1, X2, ..., Xn} from a population P = {x1, x2, ..., xN}; imagine further, at least for the time being, that N is very much larger than n, and that S is either a simple random sample or an independent random sample from P; I will briefly consider other sampling schemes at the end of the appendix. It will also help initially to think of the elements of the population (and, hence, of the sample) as scalar values, but they could just as easily be vectors (i.e., multivariate). Now suppose that we are interested in some statistic T = t(S) as an estimate of the corresponding population parameter θ = t(P). Again, θ could be a vector of parameters and T the corresponding vector of estimates, but for simplicity assume that θ is a scalar. A traditional approach to statistical inference is to make assumptions about the structure of the population (e.g., an assumption of normality), and, along with the stipulation of random sampling, to use these assumptions to derive the sampling distribution of T, on which classical inference is based. In certain instances, the exact distribution of T may be intractable, and so we instead derive its asymptotic distribution. This familiar approach has two potentially important deficiencies:

81 citations
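
The core of the nonparametric bootstrap described above fits in a few lines: treat the observed sample as a stand-in population, redraw from it with replacement many times, recompute the statistic each time, and read intervals off the resulting distribution. Below is a minimal numpy sketch using a simple percentile interval; the function name and defaults are illustrative, and the boot and bootstrap libraries cited in the abstract provide more refined interval types.

```python
import numpy as np

def bootstrap_ci(data, stat=np.median, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap: build a sampling distribution for `stat`
    by resampling the observed data with replacement (percentile CI).
    Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.size
    boot_stats = np.array([stat(data[rng.integers(0, n, n)]) for _ in range(n_boot)])
    lo, hi = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return stat(data), (lo, hi), boot_stats

# Example: percentile CI for the median of a small skewed sample
sample = np.random.default_rng(1).exponential(scale=2.0, size=50)
est, (lo, hi), dist = bootstrap_ci(sample)
print(f"median = {est:.2f}, 95% percentile CI = ({lo:.2f}, {hi:.2f})")
```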

Journal Article (DOI)
TL;DR: In this article, the authors compare the uncertainty in the solution stemming from the data splitting with neural-network specific uncertainties (parameter initialization, choice of number of hidden units, etc.).
Abstract: Exposes problems of the commonly used technique of splitting the available data into training, validation, and test sets that are held fixed, warns about drawing too strong conclusions from such static splits, and shows potential pitfalls of ignoring variability across splits. Using a bootstrap or resampling method, we compare the uncertainty in the solution stemming from the data splitting with neural-network specific uncertainties (parameter initialization, choice of number of hidden units, etc.). We present two results on data from the New York Stock Exchange. First, the variation due to different resamplings is significantly larger than the variation due to different network conditions. This result implies that it is important to not over-interpret a model (or an ensemble of models) estimated on one specific split of the data. Second, on each split, the neural-network solution with early stopping is very close to a linear model; no significant nonlinearities are extracted.

81 citations
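
The comparison the paper reports can be mimicked on synthetic data: hold the network initialization fixed while resampling the train/test split, then hold the split fixed while varying the initialization, and compare the spread of test errors between the two experiments. The sketch below uses scikit-learn's MLPRegressor with early stopping on synthetic regression data as a stand-in for the NYSE series; all names, sizes, and settings are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic regression data standing in for the paper's NYSE series.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X @ np.array([0.5, -0.3, 0.0, 0.2, 0.1]) + 0.5 * rng.normal(size=400)

def test_mse(split_seed, init_seed):
    """Test error for one train/test split and one network initialization."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=split_seed)
    net = MLPRegressor(hidden_layer_sizes=(8,), early_stopping=True,
                       max_iter=2000, random_state=init_seed)
    net.fit(X_tr, y_tr)
    return np.mean((net.predict(X_te) - y_te) ** 2)

# Variation due to resampling the split (initialization held fixed) ...
mse_splits = [test_mse(split_seed=s, init_seed=0) for s in range(20)]
# ... versus variation due to initialization (split held fixed).
mse_inits = [test_mse(split_seed=0, init_seed=s) for s in range(20)]
print("std across splits:", np.std(mse_splits))
print("std across inits :", np.std(mse_inits))
```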

Journal Article (DOI)
TL;DR: It is shown that, provided block length or leave-out number, respectively, are chosen appropriately, both techniques produce first-order optimal bandwidths, and the block bootstrap has far better empirical properties, particularly under long-range dependence.
Abstract: We analyse methods based on the block bootstrap and leave-out cross-validation, for choosing the bandwidth in nonparametric regression when errors have an almost arbitrarily long range of dependence. A novel analytical device for modelling the dependence structure of errors is introduced. This allows a concise theoretical description of the way in which the range of dependence affects optimal bandwidth choice. It is shown that, provided block length or leave-out number, respectively, are chosen appropriately, both techniques produce first-order optimal bandwidths. Nevertheless, the block bootstrap has far better empirical properties, particularly under long-range dependence.

81 citations
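
The resampling ingredient here is the (moving) block bootstrap: rather than resampling individual observations, contiguous blocks are drawn with replacement and concatenated, so that dependence within a block is preserved in each resample. Below is a minimal sketch of generating one such resample; the bandwidth-selection criterion itself is not shown, and block_len stands in for the block length whose choice the paper analyses.

```python
import numpy as np

def moving_block_bootstrap(x, block_len, seed=0):
    """Return one moving-block bootstrap resample of the series x.
    Overlapping blocks of length `block_len` are drawn with replacement and
    concatenated, preserving short-range dependence within blocks. Sketch only."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = x.size
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, n_blocks)   # overlapping block starts
    resample = np.concatenate([x[s:s + block_len] for s in starts])
    return resample[:n]                                      # trim to original length

# Example: resample an AR(1) series with block length 10
rng = np.random.default_rng(1)
e = rng.normal(size=500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + e[t]
x_star = moving_block_bootstrap(x, block_len=10)
```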

Journal Article (DOI)
TL;DR: The method is probabilistic, based on bootstrap resampling; the results suggest it is more reliable than other available methods in recovering the true intrinsic dimensionality in metric ordination of a sample.
Abstract: A method is described to determine the number of significant dimensions in metric ordination of a sample. The method is probabilistic, based on bootstrap resampling. An iterative algorithm takes bootstrap samples with replacement from the sample. It finds in each bootstrap sample ordination coordinates and computes, after Procrustean adjustments, the correlation between observed and bootstrap ordination scores. It compares this correlation to the same parameter generated in a parallel bootstrapped ordination of randomly permuted data, which upon many iterations will generate a probability. The method is assessed in principal coordinates analysis of simulated data sets that have varying number of variables and correlation levels, uniform or patterned correlation structure. The results suggest the method is more reliable than other available methods in recovering the true intrinsic dimensionality. Examples with grassland data illustrate utility.

81 citations
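
Read as an algorithm, the abstract suggests: bootstrap the rows of the data, re-run the ordination on each bootstrap sample, Procrustes-align the bootstrap scores to the observed scores, and compare the resulting agreement with what permuted (structure-free) data produce at the same dimensionality. The sketch below is one illustrative reading of that description, using classical scaling of Euclidean distances (equivalent to PCA scores) and scipy's procrustes; it is not the authors' exact algorithm.

```python
import numpy as np
from scipy.spatial import procrustes

def ordination_scores(X, ndim):
    """Principal-coordinate scores (classical scaling of Euclidean distances = PCA scores)."""
    Xc = X - X.mean(0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :ndim] * s[:ndim]

def procrustes_agreement(A, B):
    """Agreement after Procrustes superimposition (1 - disparity, higher = closer)."""
    _, _, disparity = procrustes(A, B)
    return 1.0 - disparity

def dimensionality_pvalues(X, max_dim=5, n_boot=200, seed=0):
    """For each candidate dimension, compare Procrustes agreement of bootstrap
    ordinations with that of ordinations of column-wise permuted (noise) data."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    pvals = []
    for d in range(1, max_dim + 1):
        obs = ordination_scores(X, d)
        wins = 0
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)                       # bootstrap rows with replacement
            r_boot = procrustes_agreement(obs[idx], ordination_scores(X[idx], d))
            Xperm = np.column_stack([rng.permutation(col) for col in X.T])  # break structure
            r_null = procrustes_agreement(obs[idx], ordination_scores(Xperm[idx], d))
            wins += r_null >= r_boot
        pvals.append(wins / n_boot)
    return pvals
```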


Network Information
Related Topics (5)
Estimator: 97.3K papers, 2.6M citations, 89% related
Inference: 36.8K papers, 1.3M citations, 87% related
Sampling (statistics): 65.3K papers, 1.2M citations, 86% related
Regression analysis: 31K papers, 1.7M citations, 86% related
Markov chain: 51.9K papers, 1.3M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2025    1
2024    2
2023    377
2022    759
2021    275
2020    279