scispace - formally typeset
Topic

Resampling

About: Resampling is a research topic. Over its lifetime, 5,428 publications have appeared within this topic, receiving 242,291 citations.


Papers
Journal ArticleDOI
TL;DR: A non-parametric resampling algorithm is developed for determining the optimal proportion of samples to assign to the training set; it can be applied to any dataset, using any predictor development method.
Abstract: We consider the problem of designing a study to develop a predictive classifier from high-dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set. How does this proportion impact the mean squared error (MSE) of the prediction accuracy estimate? We develop a non-parametric algorithm for determining an optimal splitting proportion that can be applied with a specific dataset and classifier algorithm. We also perform a broad simulation study for the purpose of better understanding the factors that determine the best split proportions and to evaluate commonly used splitting strategies (1/2 training or 2/3 training) under a wide variety of conditions. These methods are based on a decomposition of the MSE into three intuitive component parts. By applying these approaches to a number of synthetic and real microarray datasets we show that for linear classifiers the optimal proportion depends on the overall number of samples available and the degree of differential expression between the classes. The optimal proportion was found to depend on the full dataset size (n) and classification accuracy, with higher accuracy and smaller n resulting in a larger proportion being assigned to the training set. The commonly used strategy of allocating two-thirds of cases for training was close to optimal for reasonably sized datasets (n ≥ 100) with strong signals (i.e. 85% or greater full dataset accuracy). In general, we recommend use of our non-parametric resampling approach for determining the optimal split. This approach can be applied to any dataset, using any predictor development method, to determine the best split.

252 citations
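
As a rough illustration of the question this abstract raises, the sketch below repeatedly splits a synthetic two-class dataset at several training proportions and reports the mean and spread of the test-set accuracy. It is not the authors' algorithm and does not reproduce their MSE decomposition; the nearest-centroid classifier, the synthetic data, and every function name are assumptions made for illustration.

```python
import random
import statistics

def make_data(n, shift, dim, rng):
    """Synthetic two-class data (assumed): class 1 is shifted by `shift` in every feature."""
    data = []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(shift * label, 1.0) for _ in range(dim)]
        data.append((x, label))
    return data

def centroid(rows):
    """Feature-wise mean of a list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

def nearest_centroid_accuracy(train, test):
    """Fit class centroids on `train` and return the accuracy on `test`."""
    c0 = centroid([x for x, y in train if y == 0])
    c1 = centroid([x for x, y in train if y == 1])

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    correct = sum((dist2(x, c1) < dist2(x, c0)) == bool(y) for x, y in test)
    return correct / len(test)

def evaluate_split(data, train_fraction, repeats, rng):
    """Mean and standard deviation of test accuracy over random splits."""
    n_train = int(round(train_fraction * len(data)))
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        # Skip degenerate splits where a class is missing from training.
        if len({y for _, y in train}) < 2 or not test:
            continue
        scores.append(nearest_centroid_accuracy(train, test))
    return statistics.mean(scores), statistics.pstdev(scores)

rng = random.Random(0)
data = make_data(n=100, shift=1.0, dim=5, rng=rng)
results = {f: evaluate_split(data, f, repeats=200, rng=rng) for f in (0.5, 2 / 3, 0.8)}
for fraction, (mean_acc, sd_acc) in results.items():
    print(f"train fraction {fraction:.2f}: mean accuracy {mean_acc:.3f}, sd {sd_acc:.3f}")
```

In a sketch like this, a larger training fraction tends to yield a better classifier but a noisier accuracy estimate, because the test set shrinks; that tension is what a split-proportion rule has to balance.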

Journal ArticleDOI
TL;DR: A systematic review of modern methods for assessing risk prediction models, comparing them with measures of predictive performance derived from ROC methodology and from probability forecasting theory.
Abstract: For medical decision making and patient information, predictions of future status variables play an important role. Risk prediction models can be derived with many different statistical approaches. To compare them, measures of predictive performance are derived from ROC methodology and from probability forecasting theory. These tools can be applied to assess single markers, multivariable regression models and complex model selection algorithms. This article provides a systematic review of the modern way of assessing risk prediction models. Particular attention is put on proper benchmarks and resampling techniques that are important for the interpretation of measured performance. All methods are illustrated with data from a clinical study in head and neck cancer patients.

249 citations
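
To make the two families of measures in this abstract concrete, the sketch below computes one representative of each: the area under the ROC curve (from ROC methodology) and the Brier score (from probability forecasting). The predicted risks and outcomes are made-up illustrative values, not data from the clinical study, and the function names are assumptions.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)

def auc(probabilities, outcomes):
    """Chance that a random case is ranked above a random non-case (ties count 1/2)."""
    cases = [p for p, y in zip(probabilities, outcomes) if y == 1]
    controls = [p for p, y in zip(probabilities, outcomes) if y == 0]
    wins = sum(
        1.0 if c > d else 0.5 if c == d else 0.0
        for c in cases
        for d in controls
    )
    return wins / (len(cases) * len(controls))

# Hypothetical predicted risks and observed outcomes (1 = event occurred).
predicted = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
observed = [1, 1, 1, 0, 0, 0]

model_auc = auc(predicted, observed)
model_brier = brier_score(predicted, observed)
# Benchmark: predict the observed prevalence for every patient.
prevalence = sum(observed) / len(observed)
benchmark_brier = brier_score([prevalence] * len(observed), observed)
print(f"AUC {model_auc:.3f}, Brier {model_brier:.3f}, benchmark Brier {benchmark_brier:.3f}")
```

Comparing the model's Brier score against the prevalence-only benchmark is one simple way to read a raw score; computing both on resampled or held-out data, rather than on the data used to fit the model, is the kind of safeguard the abstract points to.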

Journal ArticleDOI
TL;DR: Describes newly developed resampling algorithms for particle filters that are suitable for real-time implementation; they reduce the complexity of both hardware and DSP realization by decreasing the number of operations and memory accesses.
Abstract: Newly developed resampling algorithms for particle filters suitable for real-time implementation are described and their analysis is presented. The new algorithms reduce the complexity of both hardware and DSP realization through addressing common issues such as decreasing the number of operations and memory access. Moreover, the algorithms allow for use of higher sampling frequencies by overlapping in time the resampling step with the other particle filtering steps. Since resampling is not dependent on any particular application, the analysis is appropriate for all types of particle filters that use resampling. The performance of the algorithms is evaluated on particle filters applied to bearings-only tracking and joint detection and estimation in wireless communications. We have demonstrated that the proposed algorithms reduce the complexity without performance degradation.

248 citations
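
For context, the resampling step that these algorithms target can be sketched with standard systematic resampling. This is the common textbook baseline, not one of the paper's new algorithms; the function name and the example weights are illustrative assumptions.

```python
import random

def systematic_resample(weights, rng):
    """Return the indices of particles selected by systematic resampling.

    `weights` must be non-negative and sum to a positive value.
    """
    n = len(weights)
    total = sum(weights)
    # A single uniform draw positions n evenly spaced pointers on [0, 1).
    start = rng.random() / n
    indices = []
    cumulative = weights[0] / total
    i = 0
    for k in range(n):
        pointer = start + k / n
        # Advance to the particle whose cumulative weight covers the pointer.
        while pointer > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices

rng = random.Random(1)
weights = [0.1, 0.6, 0.05, 0.25]  # hypothetical normalized particle weights
indices = systematic_resample(weights, rng)
print(indices)
```

Each particle is copied roughly in proportion to its weight, so the particle with weight 0.6 appears two or three times among the four selected indices. The single pass over the cumulative weights is what makes the operation count and memory access pattern of this step a natural target for optimization.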

Book
29 Jul 1992
TL;DR: In this book, the author considers the application of the bootstrap to the estimation of smooth functionals, non-parametric curve estimation, and linear models, and investigates the conditions under which the bootstrap works satisfactorily.
Abstract: Bootstrap methods are procedures for estimating or approximating the distribution of a statistic based on ideas from resampling and simulation methods. This volume is concerned with the asymptotic behaviour of the bootstrap and investigates the conditions under which the bootstrap works satisfactorily. In particular, the author considers the application of the bootstrap to the estimation of smooth functionals, non-parametric curve estimation, and to linear models. Readers are assumed to have a working familiarity with the basics of bootstrap methods.

246 citations
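
The basic idea named in this abstract, approximating the distribution of a statistic by resampling, can be sketched with a non-parametric bootstrap for the sample mean. The data values, the number of replicates, and the helper name are assumptions for illustration; the book's asymptotic analysis is not reproduced here.

```python
import random
import statistics

def bootstrap_distribution(sample, statistic, replicates, rng):
    """Evaluate `statistic` on `replicates` resamples drawn with replacement."""
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(replicates)]

rng = random.Random(42)
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]  # hypothetical data
boot_means = bootstrap_distribution(sample, statistics.mean, 2000, rng)

# The spread of the replicates estimates the standard error of the mean.
boot_se = statistics.stdev(boot_means)
# Percentile interval: the 2.5% and 97.5% points of the sorted replicates.
ordered = sorted(boot_means)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered))]
print(f"bootstrap SE {boot_se:.3f}, 95% percentile interval ({lower:.2f}, {upper:.2f})")
```

The same helper accepts any function of the sample, such as `statistics.median`, although whether the bootstrap approximation is trustworthy for a given statistic is exactly the kind of question the volume addresses.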

Journal ArticleDOI
01 Dec 1987-Ecology
TL;DR: A distribution-free approach to the detection of density-dependence in the variation of population abundance, measured by a series of annual censuses, is reported, which shows that the randomization test is effective whether or not there is a marked trend in the observed data.
Abstract: We report a distribution-free approach to the detection of density-dependence in the variation of population abundance, measured by a series of annual censuses. The method uses the correlation coefficient between the observed population changes and population size and proposes a randomization procedure to define a rejection region for the hypothesis of density-independence. It is shown that the use of the proposed statistic under the randomization approach is equivalent to the likelihood ratio test for a particular family of time series models. The randomization test is compared with two other recently proposed tests. Using computer-generated density-independent and density-dependent data, it is shown that, unlike the other tests, the randomization test is effective whether or not there is a marked trend in the observed data. Arguments are presented showing how one of the other two tests can be further improved. Caution is urged in the use and interpretation of any test for detecting density-dependence in census data because (a) the tests depend on assumptions about population processes, (b) errors of measurement may lead to spurious detection of density-dependence.

244 citations
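
One plausible reading of the randomization procedure in this abstract is sketched below. The abstract does not spell out the details, so the following are assumptions: the census values are treated as log abundances, the observed changes are randomly permuted, each permuted series is rebuilt from the first census value, and a strongly negative correlation counts as evidence against density-independence. The simulated series and all function names are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def change_size_correlation(series):
    """Correlation between each change x[t+1] - x[t] and the size x[t]."""
    changes = [b - a for a, b in zip(series, series[1:])]
    return pearson(changes, series[:-1])

def randomization_p_value(series, permutations, rng):
    """One-sided p-value: share of permuted series at least as negative as observed."""
    observed = change_size_correlation(series)
    changes = [b - a for a, b in zip(series, series[1:])]
    count = 0
    for _ in range(permutations):
        rng.shuffle(changes)
        # Rebuild a series with the same start and the same set of changes,
        # but with any link between change and current size broken.
        rebuilt = [series[0]]
        for d in changes:
            rebuilt.append(rebuilt[-1] + d)
        if change_size_correlation(rebuilt) <= observed:
            count += 1
    return observed, (count + 1) / (permutations + 1)

# Simulated density-dependent series (assumed model): pulled back toward 5.0.
rng = random.Random(7)
series = [5.0]
for _ in range(29):
    series.append(series[-1] + 0.8 * (5.0 - series[-1]) + rng.gauss(0.0, 0.3))

observed_r, p_value = randomization_p_value(series, permutations=999, rng=rng)
print(f"observed r = {observed_r:.3f}, randomization p = {p_value:.3f}")
```

Because the reference distribution is built from the observed changes themselves, no distributional form is assumed for them, which matches the distribution-free framing of the abstract. The abstract's cautions still apply: measurement error in the censuses can produce a negative correlation on its own.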


Network Information
Related Topics (5)
Estimator
97.3K papers, 2.6M citations
89% related
Inference
36.8K papers, 1.3M citations
87% related
Sampling (statistics)
65.3K papers, 1.2M citations
86% related
Regression analysis
31K papers, 1.7M citations
86% related
Markov chain
51.9K papers, 1.3M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2025    1
2024    2
2023    377
2022    759
2021    275
2020    279