Topic: Resampling

About: Resampling is a research topic. Over its lifetime, 5,428 publications have been published on this topic, receiving 242,291 citations.


Papers
Proceedings ArticleDOI
06 Apr 2006
TL;DR: A permutation and a bootstrap resampling method are introduced for nonparametric estimation of the statistical significance of performance metrics when comparing two FROC curves.
Abstract: The ability to statistically compare the performance of two computer detection (CD) or computer-aided detection (CAD) algorithms is fundamental for the development and evaluation of medical image analysis tools. Automated detection tools for medical imaging are commonly characterized using free-response receiver operating characteristic (FROC) methods. However, few statistical tools are currently available to estimate statistical significance when comparing two FROC performance curves. In this study, we introduce a permutation and a bootstrap resampling method for the nonparametric estimation of statistical significance of performance metrics when comparing two FROC curves. We then provide an initial validation of the proposed methods using an area under the FROC performance metric and a simulation model for creating CD algorithm prompts. Validation is based on a comparison of the Type I error rate produced by two statistically identical CD algorithms. The results of 10^4 Monte Carlo trials show that both the permutation and bootstrap methods produced excellent estimates of the expected Type I error rate.

93 citations
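As an illustration of the kind of resampling the abstract describes, the sketch below implements a generic paired permutation test and a percentile bootstrap for the difference in a per-case figure of merit between two detection algorithms evaluated on the same cases. It is a minimal sketch, not the paper's FROC implementation: the figure-of-merit arrays, the function names, and the choice of the mean difference as the test statistic are assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def permutation_p_value(fom_a, fom_b, n_perm=10_000, rng=None):
    """Two-sided paired permutation test for the difference in mean
    per-case figure of merit between two detection algorithms."""
    rng = np.random.default_rng(rng)
    fom_a = np.asarray(fom_a, dtype=float)
    fom_b = np.asarray(fom_b, dtype=float)
    observed = np.mean(fom_a - fom_b)
    exceed = 0
    for _ in range(n_perm):
        # Under the null hypothesis the two algorithm labels are
        # exchangeable within each case, so swap them at random.
        swap = rng.random(fom_a.shape[0]) < 0.5
        diff = np.where(swap, fom_b - fom_a, fom_a - fom_b)
        if abs(diff.mean()) >= abs(observed):
            exceed += 1
    return exceed / n_perm

def bootstrap_ci(fom_a, fom_b, n_boot=10_000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for the same difference,
    resampling cases with replacement."""
    rng = np.random.default_rng(rng)
    diffs = np.asarray(fom_a, dtype=float) - np.asarray(fom_b, dtype=float)
    n = diffs.shape[0]
    boot = np.array([diffs[rng.integers(0, n, size=n)].mean()
                     for _ in range(n_boot)])
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])
```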

Proceedings ArticleDOI
18 Jul 2010
TL;DR: It is demonstrated that classification results comparable to those of random oversampling can be obtained with two sets of efficient resampling methods that use 50% and 75% less data, respectively, than the datasets generated by the random oversampling method.
Abstract: Random undersampling and oversampling are simple but well-known resampling methods applied to solve the problem of class imbalance. In this paper we show that the random oversampling method can produce better classification results than the random undersampling method, since oversampling can increase the minority-class recognition rate while sacrificing less of the majority-class recognition rate than undersampling does. However, random oversampling considerably increases the computational cost of SVM training due to the addition of new training examples. In this paper we present an investigation carried out to develop efficient resampling methods that can produce classification results comparable to those of random oversampling, but using less data. The main idea of the proposed methods is to first select the most informative examples, located close to the class boundary region, by using the separating hyperplane found by training an SVM model on the original imbalanced dataset, and then to use only those examples in resampling. We demonstrate that classification results comparable to those of random oversampling can be obtained with two sets of efficient resampling methods that use 50% and 75% less data, respectively, than the datasets generated by the random oversampling method.

93 citations
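The boundary-focused selection step can be sketched with scikit-learn. The function below is a hypothetical illustration, not the authors' exact procedure: it trains an SVM on the imbalanced data, ranks minority-class examples by their absolute decision-function value, keeps the fraction closest to the separating hyperplane, and oversamples only that subset. The function name, the RBF kernel, the keep_fraction parameter, and the assumption that X and y are NumPy arrays for a binary problem are all illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

def boundary_informed_oversample(X, y, minority_label, keep_fraction=0.5, rng=None):
    """Select minority-class examples closest to the SVM decision boundary
    and oversample only those, rather than the whole minority class."""
    rng = np.random.default_rng(rng)

    # Train an SVM on the original (imbalanced) data to locate the boundary.
    svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

    # Distance-like score to the separating hyperplane.
    margin = np.abs(svm.decision_function(X))

    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)

    # Keep the minority examples with the smallest margins (most informative).
    n_keep = max(1, int(keep_fraction * minority.size))
    informative = minority[np.argsort(margin[minority])[:n_keep]]

    # Oversample the informative subset until the classes are balanced.
    n_needed = max(0, majority.size - minority.size)
    extra = rng.choice(informative, size=n_needed, replace=True)

    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]
```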

Journal ArticleDOI
TL;DR: In this article, the authors propose novel resampling methods that may be applied directly to variance estimation; these methods select subsamples under a sampling scheme, composed of several sampling designs, that is completely different from the one that generated the original sample.
Abstract: In complex designs, classical bootstrap methods result in a biased variance estimator when the sampling design is not taken into account. Resampled units are usually rescaled or weighted in order to achieve unbiasedness in the linear case. In the present article, we propose novel resampling methods that may be applied directly to variance estimation. These methods consist of selecting subsamples under a sampling scheme, composed of several sampling designs, that is completely different from the one that generated the original sample. In particular, a portion of the subsampled units is selected without replacement, while another is selected with replacement, thereby adjusting for the finite population setting. We show that these bootstrap estimators directly and precisely reproduce unbiased estimators of the variance in the linear case in a time-efficient manner, and eliminate the need for classical adjustment methods such as rescaling, correction factors, or artificial populations. Moreover, we show via sim...

92 citations
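To see why design-unaware resampling is biased in a finite-population setting, which is the problem this article addresses, the short Monte Carlo sketch below compares the design-based variance of a sample mean under simple random sampling without replacement with the variance estimated by a naive with-replacement bootstrap that ignores the design. It illustrates the motivation only; it is not the authors' proposed estimator, and the population model and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite population and a single without-replacement sample from it.
N, n = 1_000, 100
population = rng.gamma(shape=2.0, scale=3.0, size=N)
sample = rng.choice(population, size=n, replace=False)

# Design-based variance of the sample mean under SRSWOR:
#   Var(ybar) = (1 - n/N) * S^2 / n, with S^2 the population variance.
true_var = (1 - n / N) * population.var(ddof=1) / n

# Naive with-replacement bootstrap that ignores the sampling design.
boot_means = np.array([
    rng.choice(sample, size=n, replace=True).mean() for _ in range(10_000)
])
naive_var = boot_means.var(ddof=1)

# The naive estimate targets S^2/n and misses the finite population
# correction (1 - n/N), so it is biased upward by roughly 1/(1 - n/N).
print("design-based variance:", round(true_var, 4))
print("naive bootstrap variance:", round(naive_var, 4))
```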

Journal ArticleDOI
TL;DR: A modification of generalized estimating equations (GEEs) methodology is proposed for hypothesis testing of high-dimensional data, with particular interest in multivariate abundance data in ecology, and it is shown via theory and simulation that this substantially improves the power of Wald statistics when cluster size is not small.
Abstract: A modification of generalized estimating equations (GEEs) methodology is proposed for hypothesis testing of high-dimensional data, with particular interest in multivariate abundance data in ecology, an important application arising in thousands of environmental science studies. Such data are typically counts characterized by high dimensionality (in the sense that cluster size exceeds the number of clusters, n > K) and over-dispersion relative to the Poisson distribution. Usual GEE methods cannot be applied in this setting, primarily because sandwich estimators become numerically unstable as n increases. We propose instead using a regularized sandwich estimator that assumes a common correlation matrix R, and shrinks the sample estimate of R toward the working correlation matrix to improve its numerical stability. It is shown via theory and simulation that this substantially improves the power of Wald statistics when cluster size is not small. We apply the proposed approach to study the effects of nutrient addition on nematode communities, and in doing so discuss important issues in implementation, such as using statistics that have good properties when parameter estimates approach the boundary, and using resampling to enable valid inference that is robust to high dimensionality and to possible model misspecification.

92 citations
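The shrinkage idea behind a regularized sandwich estimator can be illustrated with a convex combination of the sample correlation matrix and a working-independence (identity) correlation. The snippet below is a rough sketch under that assumption: the shrinkage weight, the use of the identity as the working correlation, and the function name are illustrative choices, not the estimator defined in the paper.

```python
import numpy as np

def shrunk_correlation(residuals, shrink=0.5):
    """Shrink the sample correlation matrix of residuals toward the
    identity (a working-independence correlation) to stabilise it when
    the number of variables exceeds the number of clusters.

    R_shrunk = shrink * I + (1 - shrink) * R_sample  (illustrative form)
    """
    # residuals: (n_clusters, n_variables) array of standardised residuals.
    R_sample = np.corrcoef(residuals, rowvar=False)
    p = R_sample.shape[0]
    return shrink * np.eye(p) + (1 - shrink) * R_sample

# Example: 20 clusters, 60 variables. The raw sample correlation matrix is
# singular, but the shrunk version is positive definite and invertible.
rng = np.random.default_rng(1)
resid = rng.standard_normal((20, 60))
R = shrunk_correlation(resid, shrink=0.5)
print(np.linalg.cond(R))  # finite condition number after shrinkage
```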

Journal ArticleDOI
TL;DR: The performance of prognostic models constructed using the lasso technique can also be optimistic, and the results of internal validation are sensitive to how bootstrap resampling is performed.
Abstract: Background: In prognostic studies, the lasso technique is attractive since it improves the quality of predictions by shrinking regression coefficients, compared to predictions based on a model fitted via unpenalized maximum likelihood. Since some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some optimism. Bootstrap methods have been advocated to quantify optimism and generalize model performance to new subjects. It is unclear how resampling should be performed in the presence of multiply imputed data. Method: The data were based on a cohort of Chronic Obstructive Pulmonary Disease patients. We constructed models to predict Chronic Respiratory Questionnaire dyspnea 6 months ahead. Optimism of the lasso model was investigated by comparing 4 approaches of handling multiply imputed data in the bootstrap procedure, using the study data and simulated data sets. In the first 3 approaches, data sets that had been completed via multiple imputation (MI) were resampled, while the fourth approach resampled the incomplete data set and then performed MI. Results: The discriminative model performance of the lasso was optimistic. There was suboptimal calibration due to over-shrinkage. The estimate of optimism was sensitive to the choice of handling imputed data in the bootstrap resampling procedure. Resampling the completed data sets underestimates optimism, especially if, within a bootstrap step, selected individuals differ over the imputed data sets. Incorporating the MI procedure in the validation yields estimates of optimism that are closer to the true value, albeit slightly too large. Conclusion: Performance of prognostic models constructed using the lasso technique can be optimistic as well. Results of the internal validation are sensitive to how bootstrap resampling is performed.

92 citations
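The bootstrap optimism correction referred to in the abstract can be sketched for a lasso model with scikit-learn. The function below follows the standard optimism-corrected bootstrap (fit on a bootstrap sample, evaluate on both the bootstrap sample and the original data, average the gap) and deliberately omits the multiple-imputation handling that is the focus of the paper; the function name, the use of R^2 as the performance measure, LassoCV as the fitting routine, and the assumption that X and y are NumPy arrays are all illustrative.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score

def optimism_corrected_r2(X, y, n_boot=200, rng=None):
    """Standard bootstrap optimism correction for a lasso model's R^2.
    Handling of multiply imputed data (the focus of the paper) is omitted."""
    rng = np.random.default_rng(rng)
    n = len(y)

    # Apparent performance: fit and evaluate on the full data set.
    apparent = r2_score(y, LassoCV(cv=5).fit(X, y).predict(X))

    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        model = LassoCV(cv=5).fit(X[idx], y[idx])
        perf_boot = r2_score(y[idx], model.predict(X[idx]))  # on bootstrap sample
        perf_orig = r2_score(y, model.predict(X))            # on original data
        optimism.append(perf_boot - perf_orig)

    # Optimism-corrected estimate of out-of-sample performance.
    return apparent - float(np.mean(optimism))
```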


Network Information
Related Topics (5)
Estimator: 97.3K papers, 2.6M citations (89% related)
Inference: 36.8K papers, 1.3M citations (87% related)
Sampling (statistics): 65.3K papers, 1.2M citations (86% related)
Regression analysis: 31K papers, 1.7M citations (86% related)
Markov chain: 51.9K papers, 1.3M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:
Year | Papers
2025 | 1
2024 | 2
2023 | 377
2022 | 759
2021 | 275
2020 | 279