
Resampling

About: Resampling is a research topic. Over its lifetime, 5,428 publications have been published within this topic, receiving 242,291 citations.


Papers
Journal ArticleDOI
TL;DR: In this paper, the authors investigate the asymptotic validity of bootstrap techniques for estimating the sampling distribution of mode estimates derived from kernel density estimates, and show that a straightforward application of a naive bootstrap yields invalid inferences.
Abstract: The problem of constructing bootstrap confidence intervals for the mode of a density is considered. Estimates of the mode are derived from kernel density estimates based on fixed and data-dependent bandwidths. The asymptotic validity of bootstrap techniques to estimate the sampling distribution of the estimates is investigated. In summary, the results are negative in the sense that a straightforward application of a naive bootstrap yields invalid inferences. In particular, the bootstrap fails if resampling is done from the kernel density estimate. On the other hand, if one resamples from a smoother kernel density estimate (which is necessarily different from the one which yields the original estimate of the mode), the bootstrap is consistent. The bootstrap also fails if resampling is done from the empirical distribution, unless the choice of bandwidth is suboptimal. Similar results hold when applying bootstrap techniques to other functionals of a density.
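
The sketch below illustrates the paper's positive result: a smoothed bootstrap for the mode, where resampled data points are perturbed with kernel noise at a deliberately larger bandwidth g than the estimation bandwidth h. The Gaussian kernel, both bandwidth values, and the synthetic data are assumptions for demonstration, not the authors' settings.

```python
# Minimal sketch of a smoothed bootstrap for the mode of a density.
# Bandwidths h and g and the synthetic data are illustrative assumptions.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=200)  # synthetic sample (assumption)

def kde_mode(sample, bw, grid):
    """Mode of a Gaussian kernel density estimate, located on a fixed grid."""
    kde = gaussian_kde(sample, bw_method=bw)
    return grid[np.argmax(kde(grid))]

grid = np.linspace(x.min() - 1.0, x.max() + 1.0, 1000)
h = 0.3  # estimation bandwidth (assumed value)
g = 0.6  # resampling bandwidth, chosen larger than h as the paper requires
mode_hat = kde_mode(x, h, grid)

# Smoothed bootstrap: resampling from the KDE with bandwidth g is equivalent
# to drawing data points with replacement and adding Gaussian kernel noise.
B = 500
sigma_g = g * x.std(ddof=1)  # a scalar bw_method is scaled by the data std
boot_modes = np.empty(B)
for b in range(B):
    star = rng.choice(x, size=x.size, replace=True)
    star = star + rng.normal(scale=sigma_g, size=x.size)
    boot_modes[b] = kde_mode(star, h, grid)

lo, hi = np.percentile(boot_modes, [2.5, 97.5])
print(f"mode estimate {mode_hat:.3f}, 95% percentile CI ({lo:.3f}, {hi:.3f})")
```

Resampling from the empirical distribution instead (dropping the added noise) is exactly the naive bootstrap the paper shows to be inconsistent here.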

47 citations

Journal ArticleDOI
TL;DR: In this article, a nonparametric resampling technique is used to estimate the sampling variability for the target site, as well as for every site that is a potential member of the pooling group.
Abstract: In recent years, catchment similarity measures based on flood seasonality have become popular alternatives for identifying hydrologically homogeneous pooling groups used in regional flood frequency analysis. Generally, flood seasonality pooling measures are less prone to errors and are more robust than measures based on flood magnitude data. However, they are also subject to estimation uncertainty resulting from sampling variability. Because of sampling variability, catchment similarity in flood seasonality can significantly deviate from the true similarity. Therefore sampling variability should be directly incorporated in the pooling algorithm to decrease the level of pooling uncertainty. This paper develops a new pooling approach that takes into consideration the sampling variability of flood seasonality measures used as pooling variables. A nonparametric resampling technique is used to estimate the sampling variability for the target site, as well as for every site that is a potential member of the pooling group for the target site. The variability is quantified by Mahalanobis distance ellipses. The similarity between the target site and the potential site is then assessed by finding the minimum confidence interval at which their Mahalanobis ellipses intersect. The confidence intervals can be related to regional homogeneity, which allows the target degree of regional homogeneity to be set in advance. The approach is applied to a large set of catchments from Great Britain, and its performance is compared with the performance of a previously used pooling technique based on the Euclidean distance. The results demonstrate that the proposed approach outperforms the previously used approach in terms of the overall homogeneity of delineated pooling groups in the study area.
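
A minimal sketch of the core idea, assuming the common circular flood-seasonality statistics (the mean event date mapped onto the unit circle) as pooling variables: bootstrap replicates quantify a site's sampling variability, and a Mahalanobis distance measures how far a candidate site lies from the target site's bootstrap cloud. The seasonality measure and all data are illustrative assumptions, not the study's.

```python
# Sketch: bootstrap sampling variability of circular flood seasonality
# statistics and a Mahalanobis distance between sites. Illustrative only.
import numpy as np

rng = np.random.default_rng(1)

def seasonality_stats(days_of_year):
    """(x_bar, y_bar): mean flood date mapped onto the unit circle."""
    theta = 2 * np.pi * days_of_year / 365.25
    return np.array([np.cos(theta).mean(), np.sin(theta).mean()])

def bootstrap_cloud(days_of_year, B=1000):
    """Nonparametric bootstrap replicates of the seasonality statistics."""
    n = days_of_year.size
    return np.array([
        seasonality_stats(rng.choice(days_of_year, size=n, replace=True))
        for _ in range(B)
    ])

# Synthetic flood dates (day of year) for a target and a candidate site.
target = rng.normal(150, 25, size=40) % 365.25
candidate = rng.normal(170, 30, size=35) % 365.25

cloud = bootstrap_cloud(target)
cov = np.cov(cloud, rowvar=False)         # sampling-variability ellipse
delta = seasonality_stats(candidate) - seasonality_stats(target)
d2 = delta @ np.linalg.solve(cov, delta)  # squared Mahalanobis distance
print(f"Mahalanobis distance between sites: {np.sqrt(d2):.2f}")
```

The paper goes further, intersecting the ellipses of both sites at a chosen confidence level; the sketch only shows how the bootstrap covariance feeds the distance.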

47 citations

Journal ArticleDOI
TL;DR: The cluster sample technique, presented here in the context of a logistic dose-response model, incorporates many of the advantages of quasi-likelihood methods, is valid for any underlying nested correlation structure, and is adaptable to a variety of analytical settings.
Abstract: This paper presents a model-free approach for evaluating teratology and developmental toxicity data involving clustered binary responses. In teratology studies, a major statistical problem arises from the effect of intralitter correlation, or the potential for littermates to respond similarly. Some statistical methods impose strict distributional assumptions to account for extra-binomial variation, while others rely on nonparametric resampling and empirical variance estimation techniques. Quasi-likelihood methods and generalized estimating equations (GEE), which model the marginal mean/variance relationship, also avoid strict distributional assumptions. The proposed approach, often used to analyze complex sample survey data, is based on a first-order Taylor series approximation and a between-cluster variance estimation procedure, yielding consistent variance estimates for binomial-based proportions and regression coefficients from dose-response models. The cluster sample technique, presented here in the context of a logistic dose-response model, incorporates many of the advantages of quasi-likelihood methods, is valid for any underlying nested correlation structure, and is adaptable to a variety of analytical settings. The results of a simulation study show the cluster sample technique to be a viable competitor to GEE methods currently receiving attention.
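
The sketch below conveys the flavor of the between-cluster Taylor-series idea for a litter-based proportion, using the standard survey ratio-estimator variance formula with litters as clusters; the data are made up, and the formula is assumed to match the paper's approach only in spirit, not in detail.

```python
# Minimal sketch of a between-cluster (Taylor linearization) variance
# estimator for a clustered binomial proportion. Hypothetical data.
import numpy as np

# y[i] = affected pups in litter i, n[i] = litter size (made-up numbers)
y = np.array([2, 0, 1, 3, 0, 4, 1, 2])
n = np.array([10, 8, 12, 9, 11, 10, 7, 9])

m = y.size                  # number of litters (clusters)
p_hat = y.sum() / n.sum()   # overall proportion (ratio estimator)

# Taylor-series variance: residuals of cluster totals about the fitted ratio,
# so litters that deviate as a whole drive the variance, not individual pups.
resid = y - p_hat * n
var_p = (m / (m - 1)) * (resid ** 2).sum() / n.sum() ** 2

se_naive = np.sqrt(p_hat * (1 - p_hat) / n.sum())  # ignores clustering
print(f"p_hat = {p_hat:.3f}")
print(f"cluster-robust SE = {np.sqrt(var_p):.4f}, naive SE = {se_naive:.4f}")
```

With positive intralitter correlation the cluster-robust standard error exceeds the naive binomial one, which is exactly the extra-binomial variation the paper is concerned with.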

47 citations

Journal ArticleDOI
01 Mar 2008-Robotica
TL;DR: In terms of estimation error, the compensation technique outperformed the other resampling algorithms, though its run-time was longer; the most appropriate time to instigate compensation so as to reduce the run-time was also analyzed as the number of particles diminishes.
Abstract: The state-of-the-art FastSLAM algorithm has been shown to suffer from a particle depletion problem while performing simultaneous localization and mapping for mobile robots. As a result, it produces increasingly over-confident estimates of uncertainty as time progresses. This particle depletion problem is mainly due to the resampling process in FastSLAM, which tends to eliminate particles with low weights. Therefore, the number of particles available to conduct loop-closure decreases, which degrades the performance of FastSLAM. The resampling process has not been thoroughly analyzed even though it is the main reason for the particle depletion problem. In this paper, standard resampling algorithms (systematic, residual, and partial resampling) and a rank-based resampling scheme adopting genetic algorithms are analyzed using computer simulations. Several performance measures, such as the effective sample size, the number of distinct particles, estimation errors, and complexity, are used for a thorough analysis of the resampling algorithms. Moreover, a new compensation technique is proposed instead of resampling to resolve the particle depletion problem in FastSLAM. In terms of estimation error, the compensation technique outperformed the resampling algorithms, though its run-time was longer. The most appropriate time to instigate compensation to reduce the run-time was also analyzed as the number of particles diminishes.
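
For concreteness, here is a generic sketch of systematic resampling, one of the standard schemes the paper analyzes, together with the effective sample size measure it uses; this is a plain particle-filter illustration, not the authors' FastSLAM code.

```python
# Systematic resampling of particle weights plus the effective sample size
# (N_eff) often used to decide when to resample. Generic illustration.
import numpy as np

def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2) for normalized weights."""
    return 1.0 / np.sum(weights ** 2)

def systematic_resample(weights, rng):
    """Return indices of resampled particles (systematic scheme)."""
    n = weights.size
    positions = (rng.random() + np.arange(n)) / n  # one draw, evenly spaced
    cumsum = np.cumsum(weights)
    cumsum[-1] = 1.0                               # guard against round-off
    return np.searchsorted(cumsum, positions)

rng = np.random.default_rng(2)
w = rng.random(100)
w /= w.sum()
print(f"N_eff = {effective_sample_size(w):.1f} of {w.size}")
idx = systematic_resample(w, rng)
print(f"distinct particles after resampling: {np.unique(idx).size}")
```

Systematic resampling uses a single uniform draw and evenly spaced positions, which keeps resampling variance low, but as the distinct-particle count shows, low-weight particles are still culled, which is the depletion mechanism the paper's compensation technique targets.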

47 citations

Journal ArticleDOI
TL;DR: This paper investigates and illustrates the effects of resampling methods on the inner structure of a data set by exploiting local neighborhood information, identifying the sample types in both classes, and analyzing their distribution in each resampled set; the results indicate that the resampling methods producing the highest proportion of safe samples and the lowest proportion of unsafe samples correspond to those with the highest overall performance.
Abstract: Data plays a key role in the design of expert and intelligent systems and therefore, data preprocessing appears to be a critical step to produce high-quality data and build accurate machine learning models. Over the past decades, increasing attention has been paid towards the issue of class imbalance and this is now a research hotspot in a variety of fields. Although the resampling methods, either by under-sampling the majority class or by over-sampling the minority class, stand among the most powerful techniques to face this problem, their strengths and weaknesses have typically been discussed based only on the class imbalance ratio. However, several questions remain open and need further exploration. For instance, the subtle differences in performance between the over- and under-sampling algorithms are still under-comprehended, and we hypothesize that they could be better explained by analyzing the inner structure of the data sets. Consequently, this paper attempts to investigate and illustrate the effects of the resampling methods on the inner structure of a data set by exploiting local neighborhood information, identifying the sample types in both classes and analyzing their distribution in each resampled set. Experimental results indicate that the resampling methods that produce the highest proportion of safe samples and the lowest proportion of unsafe samples correspond to those with the highest overall performance. The significance of this paper lies in the fact that our findings may contribute to a better understanding of how these techniques perform on class-imbalanced data and why over-sampling has been reported to be usually more efficient than under-sampling. The outcomes of this study may have an impact on both research and practice in the design of expert and intelligent systems, since a priori knowledge about the internal structure of imbalanced data sets could be incorporated into the learning algorithms.
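
A hedged sketch of the neighborhood-based sample typing such analyses rely on, using one common taxonomy (safe, borderline, rare, outlier, by the number of same-class neighbors among the five nearest); the thresholds, the synthetic data, and the scikit-learn usage are assumptions for illustration, not necessarily the paper's exact procedure.

```python
# Sketch: classify samples as safe/borderline/rare/outlier from the labels
# of their 5 nearest neighbors. Thresholds follow a common taxonomy.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def sample_types(X, y, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1 skips the point itself
    _, idx = nn.kneighbors(X)
    same = (y[idx[:, 1:]] == y[:, None]).sum(axis=1)  # same-class neighbors
    labels = np.empty(X.shape[0], dtype=object)
    labels[same >= 4] = "safe"
    labels[(same == 2) | (same == 3)] = "borderline"
    labels[same == 1] = "rare"
    labels[same == 0] = "outlier"
    return labels

rng = np.random.default_rng(3)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(1.5, 1.0, size=(20, 2))  # overlapping minority class
X = np.vstack([X_maj, X_min])
y = np.array([0] * 200 + [1] * 20)

types = sample_types(X, y)
for t in ("safe", "borderline", "rare", "outlier"):
    print(t, np.sum((types == t) & (y == 1)), "minority samples")
```

Running the same typing before and after a resampling method shows how it shifts minority samples between these categories, which is the kind of inner-structure analysis the paper performs.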

47 citations


Network Information
Related Topics (5)
Estimator: 97.3K papers, 2.6M citations (89% related)
Inference: 36.8K papers, 1.3M citations (87% related)
Sampling (statistics): 65.3K papers, 1.2M citations (86% related)
Regression analysis: 31K papers, 1.7M citations (86% related)
Markov chain: 51.9K papers, 1.3M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:
Year  Papers
2025  1
2024  2
2023  377
2022  759
2021  275
2020  279