Topic

Resampling

About: Resampling is a research topic. Over its lifetime, 5,428 publications have been published on this topic, receiving 242,291 citations.


Papers
Journal Article
TL;DR: Proposes a statistical framework for evaluating virtual screening (VS) studies in which bootstrap simulations give the threshold for deciding whether a ranking method is better than random ranking, and a permutation test compares two ranking methods.
Abstract: The receiver operating characteristic (ROC) curve is widely used to evaluate virtual screening (VS) studies. However, the method fails to address the "early recognition" problem specific to VS. Although many other metrics that emphasize "early recognition", such as RIE, BEDROC, and pROC, have been proposed, there are no rigorous statistical guidelines for determining thresholds and performing significance tests. Nor have these metrics been compared under a statistical framework to better understand their performance. We propose a statistical framework for evaluating VS studies in which the threshold for deciding whether a ranking method is better than random ranking is derived by bootstrap simulations, and two ranking methods are compared by a permutation test. We found that different metrics emphasize "early recognition" differently. BEDROC and RIE are two statistically equivalent metrics. Our newly proposed metric, SLR, is superior to pROC. Through extensive simulations, we observed a "seesaw effect": overemphasizing early recognition reduces the statistical power of a metric to detect true early recognition. The framework we developed and tested is applicable to any other metric as well, even if its exact distribution is unknown. Under this framework, a threshold can easily be selected according to a pre-specified type I error rate, and statistical comparisons between two ranking methods become possible. The theoretical null distribution of the SLR metric is available, so the SLR threshold can be determined exactly without resorting to bootstrap simulations, which makes it easy to use in practical virtual screening studies.

85 citations
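
A minimal sketch of the two resampling steps described in the abstract above, under stated assumptions. The metric `top_k_hits` is a hypothetical stand-in (RIE, BEDROC, pROC and SLR are not implemented here), the null threshold comes from simulating random rankings rather than from the authors' exact bootstrap scheme, and the paired permutation test swaps the two methods' scores per compound, which assumes both score sets are on a comparable scale. All function names are illustrative.

```python
import random


def top_k_hits(active_ranks, k):
    """Hypothetical early-recognition metric: number of actives ranked in the top k."""
    return sum(1 for r in active_ranks if r <= k)


def null_threshold(n_total, n_actives, k, alpha=0.05, n_sim=2000, seed=0):
    """Simulate random rankings and return the (1 - alpha) empirical quantile.

    A method whose metric exceeds the returned value would be called better
    than random at (approximately) type I error rate alpha.
    """
    rng = random.Random(seed)
    sims = sorted(
        top_k_hits(rng.sample(range(1, n_total + 1), n_actives), k)
        for _ in range(n_sim)
    )
    return sims[min(n_sim - 1, int((1 - alpha) * n_sim))]


def ranks_from_scores(scores):
    """Convert scores to ranks, with rank 1 assigned to the highest score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def metric_from_scores(scores, is_active, k):
    ranks = ranks_from_scores(scores)
    return top_k_hits([r for r, a in zip(ranks, is_active) if a], k)


def paired_permutation_test(scores_a, scores_b, is_active, k, n_perm=2000, seed=0):
    """Two-sided p-value for the difference in metric between methods A and B.

    Assumption: under the null the two methods are exchangeable, so the pair
    of scores for each compound can be swapped with probability 1/2.
    """
    rng = random.Random(seed)
    observed = (metric_from_scores(scores_a, is_active, k)
                - metric_from_scores(scores_b, is_active, k))
    count = 0
    for _ in range(n_perm):
        perm_a, perm_b = [], []
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        diff = (metric_from_scores(perm_a, is_active, k)
                - metric_from_scores(perm_b, is_active, k))
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

Because the threshold is computed from the simulated null distribution alone, the same two functions could be reused with any other metric by swapping out `top_k_hits`.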

Journal Article
TL;DR: In this paper, Beran's extension of the Kaplan-Meier estimator for the situation of right censored observations at fixed covariate values is studied and an almost sure asymptotic representation is established.
Abstract: We study Beran's extension of the Kaplan-Meier estimator for the situation of right censored observations at fixed covariate values. This estimator for the conditional distribution function at a given value of the covariate involves smoothing with Gasser-Müller weights. We establish an almost sure asymptotic representation which provides a key tool for obtaining central limit results. To avoid complicated estimation of asymptotic bias and variance parameters, we propose a resampling method which takes the covariate information into account. An asymptotic representation for the bootstrapped estimator is proved and the strong consistency of the bootstrap approximation to the conditional distribution function is obtained.

85 citations

Journal Article
TL;DR: The proposed algorithm maintains the diversity of particles, thereby avoiding sample impoverishment in particle filters, and achieves the same estimation accuracy with fewer particles.
Abstract: In this correspondence, an improvement on the resampling algorithm (also called the systematic resampling algorithm) of particle filters is presented. First, the resampling algorithm is analyzed from a new viewpoint and its defects are demonstrated. Then some refinements are introduced to overcome these defects, such as comparing the weights of particles by stages and constructing the new particles based on the quasi-Monte Carlo method, from which an exquisite resampling (ER) algorithm is derived. Compared to the resampling algorithm, the proposed algorithm maintains the diversity of particles, thereby avoiding sample impoverishment in particle filters, and achieves the same estimation accuracy with fewer particles. These advantages are finally verified by simulations of a non-stationary growth model and of re-entry ballistic object tracking.

85 citations
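
For orientation, the baseline that the abstract above sets out to improve can be sketched as follows. This is the textbook systematic resampling step, not the paper's ER algorithm, and the function name and signature are illustrative assumptions.

```python
import random


def systematic_resample(weights, rng=random):
    """Return indices of the particles kept by systematic resampling.

    A single uniform draw u0 in [0, 1/n) fixes n evenly spaced positions
    u0 + j/n, and each position selects the particle whose cumulative
    normalised weight interval contains it.
    """
    n = len(weights)
    total = sum(weights)
    u0 = rng.random() / n
    indices = []
    i = 0
    cumulative = weights[0] / total
    for j in range(n):
        u = u0 + j / n
        # Advance to the first particle whose cumulative weight reaches u.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices
```

Particles with large weights are duplicated and particles with negligible weights are dropped, which is the source of the sample impoverishment the abstract refers to.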

Posted Content
TL;DR: In this paper, the authors introduce a new statistical quantity, the energy, to test whether two samples originate from the same distribution; it is a simple logarithmic function of the distances between the observations in the variate space.
Abstract: We introduce a new statistical quantity, the energy, to test whether two samples originate from the same distribution. The energy is a simple logarithmic function of the distances between the observations in the variate space. The distribution of the test statistic is determined by a resampling method. The power of the energy test in one dimension was studied for a variety of different test samples and compared to several nonparametric tests. In two and four dimensions, a comparison was performed with the Friedman-Rafsky and nearest neighbor tests. The two-sample energy test was shown to be especially powerful in multidimensional applications.

84 citations
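
A rough sketch of how such a test could be run, under assumptions that go beyond the abstract above. The statistic is written with a logarithmic weight R(r) = -ln(r + eps), summed within each sample and subtracted across samples; the exact normalisation and the small constant `eps` (a guard against zero distances) are assumptions of this sketch, and the null distribution is obtained by permuting the pooled sample. Function names are illustrative.

```python
import math
import random


def _within(sample, eps):
    """Sum of -ln(distance) over all unordered pairs inside one sample."""
    total = 0.0
    for i in range(len(sample)):
        for j in range(i + 1, len(sample)):
            total -= math.log(math.dist(sample[i], sample[j]) + eps)
    return total


def _between(x, y, eps):
    """Sum of -ln(distance) over all pairs with one point from each sample."""
    return -sum(math.log(math.dist(p, q) + eps) for p in x for q in y)


def energy_statistic(x, y, eps=1e-9):
    """Two-sample energy with logarithmic weight (assumed form, see lead-in)."""
    n, m = len(x), len(y)
    return (_within(x, eps) / n ** 2
            + _within(y, eps) / m ** 2
            - _between(x, y, eps) / (n * m))


def energy_permutation_test(x, y, n_perm=200, seed=0):
    """One-sided p-value: large energies indicate different distributions."""
    rng = random.Random(seed)
    observed = energy_statistic(x, y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

Points are passed as equal-length tuples, so the same code applies in one, two or four dimensions; the cost per permutation is quadratic in the pooled sample size.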

Journal Article
TL;DR: A novel signed-rank test for clustered paired data is obtained using the general principle of within-cluster resampling, and it is shown that only this test maintains the correct size under a null hypothesis of marginal symmetry, compared to four existing signed-rank tests.
Abstract: We consider the problem of comparing two outcome measures when the pairs are clustered. Using the general principle of within-cluster resampling, we obtain a novel signed-rank test for clustered paired data. We show by a simple informative cluster size simulation model that only our test maintains the correct size under a null hypothesis of marginal symmetry, compared to four other existing signed-rank tests; further, our test has adequate power when cluster size is noninformative. In general, cluster size is informative if the distribution of pair-wise differences within a cluster depends on the cluster size. An application of our method to testing for a trend in radiation toxicity is presented.

84 citations
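
The resampling principle named in the abstract above can be illustrated as follows. This sketch covers only the resampling step: one paired difference is drawn from each cluster, the ordinary Wilcoxon signed-rank sum is computed on the draws, and the result is averaged over many resamples. The variance estimate needed to turn this average into the authors' test is not reproduced here, and the function names are illustrative assumptions.

```python
import random


def signed_rank_statistic(diffs):
    """Wilcoxon signed-rank sum: sum of sign(d) * rank(|d|).

    Zero differences are dropped and tied absolute values get midranks.
    """
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and abs(nonzero[order[end + 1]]) == abs(nonzero[order[pos]])):
            end += 1
        midrank = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = midrank
        pos = end + 1
    return sum(r if d > 0 else -r for d, r in zip(nonzero, ranks))


def within_cluster_resampled_statistic(clusters, n_resamples=1000, seed=0):
    """Average the signed-rank sum over resamples of one difference per cluster.

    `clusters` is a list of lists, each holding the paired differences
    observed in one cluster.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_resamples):
        picked = [rng.choice(cluster) for cluster in clusters]
        total += signed_rank_statistic(picked)
    return total / n_resamples
```

Since each resample contains exactly one observation per cluster, every cluster contributes equally regardless of its size, which is what makes the approach attractive when cluster size is informative.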


Network Information
Related Topics (5)
Estimator: 97.3K papers, 2.6M citations, 89% related
Inference: 36.8K papers, 1.3M citations, 87% related
Sampling (statistics): 65.3K papers, 1.2M citations, 86% related
Regression analysis: 31K papers, 1.7M citations, 86% related
Markov chain: 51.9K papers, 1.3M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2025    1
2024    2
2023    377
2022    759
2021    275
2020    279