Topic

Resampling

About: Resampling is a research topic. Over its lifetime, 5,428 publications on this topic have been published, receiving 242,291 citations.


Papers
Journal ArticleDOI
TL;DR: A diversified sensitivity-based undersampling method is proposed that iteratively clusters the majority class and samples a balanced, high-sensitivity training set, yielding good generalization capability on 14 UCI datasets.
Abstract: Undersampling is a widely adopted method to deal with imbalance pattern classification problems. Current methods mainly depend on either random resampling on the majority class or resampling at the decision boundary. Random-based undersampling fails to take into consideration informative samples in the data while resampling at the decision boundary is sensitive to class overlapping. Both techniques ignore the distribution information of the training dataset. In this paper, we propose a diversified sensitivity-based undersampling method. Samples of the majority class are clustered to capture the distribution information and enhance the diversity of the resampling. A stochastic sensitivity measure is applied to select samples from both clusters of the majority class and the minority class. By iteratively clustering and sampling, a balanced set of samples yielding high classifier sensitivity is selected. The proposed method yields a good generalization capability for 14 UCI datasets.
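A minimal sketch of the clustering-based undersampling idea described above, assuming NumPy and scikit-learn. The paper's stochastic sensitivity measure is replaced here by plain per-cluster random draws, and the function name is illustrative, so this shows the diversity-preserving clustering step rather than the full method:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_undersample(X_maj, X_min, n_clusters=10, random_state=0):
    """Undersample the majority class by drawing evenly from k-means clusters.

    Clustering keeps the majority class's distributional structure in the
    reduced set; the paper's stochastic-sensitivity sample selection is
    replaced by simple per-cluster random draws for brevity.
    """
    rng = np.random.default_rng(random_state)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(X_maj)

    target = len(X_min)                          # aim for a balanced set
    per_cluster = max(1, target // n_clusters)
    picked = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        if idx.size:
            picked.append(rng.choice(idx, size=min(per_cluster, idx.size),
                                     replace=False))
    picked = np.concatenate(picked)

    # Balanced training set: sampled majority (label 0) plus full minority (label 1)
    X_bal = np.vstack([X_maj[picked], X_min])
    y_bal = np.concatenate([np.zeros(picked.size), np.ones(len(X_min))])
    return X_bal, y_bal
```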

195 citations

Journal ArticleDOI
TL;DR: An empirical test based on the method of convolutions is proposed for assessing the statistical significance of differences between approximate empirical distributions created by resampling techniques, and it is illustrated with dichotomous choice contingent valuation data.
Abstract: Resampling or simulation techniques are now frequently used in applied economic analyses. However, significance tests for differences between empirical distributions have either invoked normality assumptions or have used nonoverlapping confidence interval criteria. We demonstrate that such methods generally will not be appropriate, and we present an empirical test, based on the method of convolutions, for assessing the statistical significance between approximate empirical distributions created by resampling techniques. The proposed convolutions approach is illustrated in a case study involving empirical distributions from dichotomous choice contingent valuation data.
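A hedged illustration of the convolution idea: treating two resampled statistics as independent, the empirical distribution of their difference is the convolution of one set of draws with the negative of the other (all pairwise differences), and the probability mass on either side of zero gives an approximate two-sided p-value. This NumPy-only sketch, with an illustrative function name, is not the paper's exact procedure:

```python
import numpy as np

def convolution_difference_test(draws_a, draws_b):
    """Approximate the distribution of the difference between two resampled
    statistics by forming all pairwise differences (the empirical convolution
    of A and -B), then report a two-sided p-value from the mass around zero."""
    draws_a = np.asarray(draws_a)
    draws_b = np.asarray(draws_b)
    diff = draws_a[:, None] - draws_b[None, :]   # all pairwise differences
    p_le = np.mean(diff <= 0.0)
    p_ge = np.mean(diff >= 0.0)
    return 2.0 * min(p_le, p_ge)

# Usage: compare bootstrap distributions of two sample means
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 200)
b = rng.normal(0.4, 1.0, 200)
boot_a = np.array([rng.choice(a, a.size).mean() for _ in range(2000)])
boot_b = np.array([rng.choice(b, b.size).mean() for _ in range(2000)])
print(convolution_difference_test(boot_a, boot_b))
```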

194 citations

Journal ArticleDOI
TL;DR: SLIDE is an accurate and efficient method for multiple testing correction in genome-wide association studies that accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution.
Abstract: With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu.
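A simplified sketch of the multivariate-normal framework that SLIDE builds on: simulate correlated null z-scores from the markers' correlation matrix and count how often the most extreme simulated score exceeds the observed per-marker threshold. It uses the full correlation matrix rather than SLIDE's sliding-window construction, so it illustrates the MVN approach, not SLIDE itself (NumPy and SciPy assumed; names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def mvn_corrected_pvalue(p_obs, corr, n_sims=10_000, seed=0):
    """Family-wise corrected p-value for the smallest per-marker p-value,
    computed by simulating correlated null z-scores under a multivariate
    normal with the given marker correlation matrix."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n_sims)
    z_obs = norm.isf(p_obs / 2.0)        # two-sided threshold on the z-scale
    return float(np.mean(np.abs(z).max(axis=1) >= z_obs))

# Usage: 100 exchangeably correlated markers (rho = 0.3), observed p = 1e-3
m, rho = 100, 0.3
corr = np.full((m, m), rho)
np.fill_diagonal(corr, 1.0)
print(mvn_corrected_pvalue(p_obs=1e-3, corr=corr))
```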

194 citations

Posted Content
TL;DR: The bootstrap estimates the distribution of an estimator or test statistic by resampling one's data, and it can substitute computation for mathematical analysis when the asymptotic distribution of a statistic or estimator is difficult to calculate.
Abstract: The bootstrap is a method for estimating the distribution of an estimator or test statistic by resampling one's data. It amounts to treating the data as if they were the population for the purpose of evaluating the distribution of interest. Under mild regularity conditions, the bootstrap yields an approximation to the distribution of an estimator or test statistic that is at least as accurate as the approximation obtained from first-order asymptotic theory. Thus, the bootstrap provides a way to substitute computation for mathematical analysis if calculating the asymptotic distribution of an estimator or statistic is difficult. The maximum score estimator of Manski (1975, 1985), the statistic developed by Härdle et al. (1991) for testing positive-definiteness of income-effect matrices, and certain functions of time-series data (Blanchard and Quah 1989, Runkle 1987, West 1990) are examples in which evaluating the asymptotic distribution is difficult and bootstrapping has been used as an alternative. In fact, the bootstrap is often more accurate in finite samples than first-order asymptotic approximations but does not entail the algebraic complexity of higher-order expansions. Thus, it can provide a practical method for improving upon first-order approximations. First-order asymptotic theory often gives a poor approximation to the distributions of test statistics with the sample sizes available in applications. As a result, the nominal levels of tests based on asymptotic critical values can be very different from the true levels. The information matrix test of White (1982) is a well-known example of a test in which large finite-sample distortions of level can occur when asymptotic critical values are used (Horowitz 1994, Kennan and Neumann 1988, Orme 1990, Taylor 1987). Other illustrations are given later in this chapter. The bootstrap often provides a tractable way to reduce or eliminate finite-sample distortions of the levels of statistical tests.
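A minimal nonparametric bootstrap in the spirit described above: resample the data with replacement, recompute the statistic on each resample, and use the resulting draws as an approximation to its sampling distribution (NumPy assumed; names are illustrative):

```python
import numpy as np

def bootstrap_distribution(data, statistic, n_boot=2000, seed=0):
    """Treat the sample as the population: resample it with replacement and
    recompute `statistic` to approximate the statistic's sampling distribution."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    return np.array([statistic(rng.choice(data, size=data.size, replace=True))
                     for _ in range(n_boot)])

# Usage: 95% percentile confidence interval for a sample median
x = np.random.default_rng(1).exponential(size=100)
boot = bootstrap_distribution(x, np.median)
print(np.percentile(boot, [2.5, 97.5]))
```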

190 citations

Journal ArticleDOI
TL;DR: A canonical large deviations criterion for optimality is considered, and inference based on the empirical likelihood ratio statistic is shown to be optimal. A new empirical likelihood bootstrap is introduced that provides a valid resampling method for moment inequality models and overcomes the implementation challenges arising from non-pivotal limit distributions.

190 citations


Network Information
Related Topics (5)
Estimator: 97.3K papers, 2.6M citations, 89% related
Inference: 36.8K papers, 1.3M citations, 87% related
Sampling (statistics): 65.3K papers, 1.2M citations, 86% related
Regression analysis: 31K papers, 1.7M citations, 86% related
Markov chain: 51.9K papers, 1.3M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2025    1
2024    2
2023    377
2022    759
2021    275
2020    279