Topic
Resampling
About: Resampling is a research topic. Over its lifetime, 5428 publications have appeared within this topic, receiving 242291 citations.
Papers
06 Dec 2009
TL;DR: In this paper, the authors explore the framework of permutation-based p-values for assessing the behavior of the classification error, studying two simple permutation tests: the first estimates the null distribution by permuting the labels in the data, an approach used extensively in classification problems in computational biology; the second permutes the features within classes, inspired by restricted randomization techniques traditionally used in statistics.
Abstract: We explore the framework of permutation-based p-values for assessing the behavior of the classification error. In this paper we study two simple permutation tests. The first test estimates the null distribution by permuting the labels in the data; this has been used extensively in classification problems in computational biology. The second test produces permutations of the features within classes, inspired by restricted randomization techniques traditionally used in statistics. We study the properties of these tests and present an extensive empirical evaluation on real and synthetic data. Our analysis shows that studying the classification error via permutation tests is effective; in particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data.
392 citations
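The label-permutation test described in this abstract can be illustrated with a short sketch. The nearest-centroid classifier, the synthetic Gaussian data, and all parameter values below are illustrative assumptions, not the authors' experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def error_rate(X, y):
    """Training error of a simple nearest-centroid classifier."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
    return float(np.mean(pred != y))

# Toy two-class data with a real mean shift between classes.
X = np.vstack([rng.normal(0.0, 1.0, (50, 5)), rng.normal(1.5, 1.0, (50, 5))])
y = np.repeat([0, 1], 50)

observed = error_rate(X, y)
# Null distribution: recompute the error after permuting the labels.
null = np.array([error_rate(X, rng.permutation(y)) for _ in range(999)])
# One-sided permutation p-value with the standard +1 correction.
p_value = (1 + np.sum(null <= observed)) / (1 + len(null))
```

A small p-value indicates the classifier's error is far below what chance labelings produce; the paper's restricted variant follows the same pattern but permutes feature columns within each class instead of the labels.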
TL;DR: In this paper, a broad class of rank-based monotone estimating functions is developed for the semiparametric accelerated failure time model with censored observations, which are shown to be consistent and asymptotically normal.
Abstract: A broad class of rank-based monotone estimating functions is developed for the semiparametric accelerated failure time model with censored observations. The corresponding estimators can be obtained via linear programming, and are shown to be consistent and asymptotically normal. The limiting covariance matrices can be estimated by a resampling technique, which does not involve nonparametric density estimation or numerical derivatives. The new estimators represent consistent roots of the non-monotone estimating equations based on the familiar weighted log-rank statistics. Simulation studies demonstrate that the proposed methods perform well in practical settings. Two real examples are provided.
382 citations
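The appeal of estimating a limiting covariance by resampling, rather than by nonparametric density estimation, can be illustrated on a much simpler estimator. The example below (a plain bootstrap standard error for a sample median, whose asymptotic variance involves an unknown density) is a generic illustration of that idea, not the paper's specific resampling scheme for rank-based estimating functions:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=500)

# The asymptotic variance of the median is 1 / (4 n f(m)^2), which requires
# estimating the density f at the median m; bootstrap resampling avoids this.
boot_medians = np.array([np.median(rng.choice(x, size=x.size, replace=True))
                         for _ in range(2000)])
se_boot = boot_medians.std(ddof=1)
```

For Exp(1) with n = 500 the asymptotic standard error works out to 1/sqrt(500) ≈ 0.045, and the bootstrap estimate lands close to it without ever touching a density estimate.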
20 Feb 2013
TL;DR: This Second edition is a practical guide to data analysis using the bootstrap, cross-validation, and permutation tests and is an essential resource for industrial statisticians, statistical consultants, and research professionals in science, engineering, and technology.
Abstract: The goal of this book is to introduce statistical methodology (estimation, hypothesis testing, and classification) to a wide applied audience through resampling from existing data via the bootstrap and cross-validation methods. The book provides an accessible introduction and practical guide to the power, simplicity, and versatility of the bootstrap, cross-validation, and permutation tests. Industrial statistical consultants, professionals, and researchers will find the book's methods and software immediately helpful. This second edition is a practical guide to data analysis using the bootstrap, cross-validation, and permutation tests. It is an essential resource for industrial statisticians, statistical consultants, and research professionals in science, engineering, and technology. Requiring only minimal mathematics beyond algebra, it provides a table-free introduction to data analysis utilizing numerous exercises, practical data sets, and freely available statistical shareware.
Topics and features:
* Thoroughly revised text featuring more practical examples, plus an additional chapter devoted to regression and data mining techniques and their limitations
* Uses a resampling approach to introduce statistics
* Practical presentation that covers all three sampling methods: bootstrap, density estimation, and permutation
* Includes a systematic guide to help select the correct procedure for a particular application
* Detailed coverage of all three statistical methodologies: classification, estimation, and hypothesis testing
* Suitable for classroom use and individual self-study
* Numerous practical examples using popular computer programs such as SAS, Stata, and StatXact
* Useful appendices with computer programs and code to develop one's own methods
* Downloadable freeware from the author's website: http://users.oco.net/drphilgood/resamp.htm
With its accessible style and intuitive topic development, the book is an excellent basic resource and guide to the power, simplicity, and versatility of bootstrap, cross-validation, and permutation tests. Students, professionals, and researchers will find it a particularly useful guide to modern resampling methods and their applications.
376 citations
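As a minimal example of the book's bootstrap theme, a percentile bootstrap confidence interval for a mean takes only a few lines; the data and parameter values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=10.0, scale=2.0, size=100)

# Resample the data with replacement and collect the statistic of interest.
boot_means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                       for _ in range(5000)])
# Percentile method: take a 95% interval directly from the bootstrap distribution.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```

The percentile method is only the simplest of several bootstrap interval constructions; the same resample-and-recompute loop underlies the more refined intervals the book discusses.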
TL;DR: The proposed algorithms improve the scalability of particle filter architectures affected by the resampling process; communication through the interconnection network is reduced and made deterministic, which results in a simpler network structure and an increased sampling frequency.
Abstract: In this paper, we propose novel resampling algorithms with architectures for efficient distributed implementation of particle filters. The proposed algorithms improve the scalability of the filter architectures affected by the resampling process. Problems in the particle filter implementation due to resampling are described, and appropriate modifications of the resampling algorithms are proposed so that distributed implementations are developed and studied. Distributed resampling algorithms with proportional allocation (RPA) and nonproportional allocation (RNA) of particles are considered. The components of the filter architectures are the processing elements (PEs), a central unit (CU), and an interconnection network. One of the main advantages of the new resampling algorithms is that communication through the interconnection network is reduced and made deterministic, which results in a simpler network structure and increased sampling frequency. Particle filter performances are estimated for bearings-only tracking applications. In the architectural part of the analysis, the area and speed of the particle filter implementation are estimated for different numbers of particles and different levels of parallelism with field programmable gate array (FPGA) implementation. In this paper, only sampling importance resampling (SIR) particle filters are considered, but the analysis can be extended to any particle filters with resampling.
360 citations
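The resampling step inside an SIR particle filter is commonly implemented with systematic resampling, an O(N) low-variance scheme. The sketch below shows that standard textbook algorithm, not the distributed RPA/RNA variants proposed in the paper:

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: one uniform draw, N evenly spaced positions."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumulative, positions)

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.1, 0.05, 0.05])  # normalized particle weights
indices = systematic_resample(weights, rng)      # indices of surviving particles
```

High-weight particles are duplicated and low-weight ones dropped; each particle with weight w_i receives either floor(N*w_i) or ceil(N*w_i) copies, which is what makes the scheme low-variance.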
TL;DR: In this paper, the authors compared the performance of a variety of approaches for assessing the significance of eigenvector coefficients in terms of type I error rates and power, and two novel approaches based on the broken-stick model were also evaluated.
Abstract: Principal component analysis (PCA) is one of the most commonly used tools in the analysis of ecological data. This method reduces the effective dimensionality of a multivariate data set by producing linear combinations of the original variables (i.e., components) that summarize the predominant patterns in the data. In order to provide meaningful interpretations for principal components, it is important to determine which variables are associated with particular components. Some data analysts incorrectly test the statistical significance of the correlation between original variables and multivariate scores using standard statistical tables. Others interpret eigenvector coefficients larger than an arbitrary absolute value (e.g., 0.50). Resampling, randomization techniques, and parallel analysis have been applied in a few cases. In this study, we compared the performance of a variety of approaches for assessing the significance of eigenvector coefficients in terms of type I error rates and power. Two novel approaches based on the broken-stick model were also evaluated. We used a variety of simulated scenarios to examine the influence of the number of real dimensions in the data; unique versus complex variables; the magnitude of eigenvector coefficients; and the number of variables associated with a particular dimension. Our results revealed that bootstrap confidence intervals and a modified bootstrap confidence interval for the broken-stick model proved to be the most reliable techniques.
357 citations
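The broken-stick model mentioned in this abstract has a simple closed form: under the null, the expected proportion of variance for component i out of p is b_i = (1/p) * sum_{k=i}^{p} 1/k. The sketch below applies a plain point comparison against these values; the simulated data and cutoff rule are illustrative, not the authors' evaluation design:

```python
import numpy as np

def broken_stick(p):
    """Expected eigenvalue proportions b_i = (1/p) * sum_{k=i}^{p} 1/k."""
    return np.array([sum(1.0 / k for k in range(i, p + 1)) / p
                     for i in range(1, p + 1)])

# Toy data: four of five variables share one latent factor.
rng = np.random.default_rng(3)
factor = rng.normal(size=200)
X = rng.normal(size=(200, 5))
X[:, :4] = factor[:, None] + 0.4 * X[:, :4]

eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # descending
# Retain components whose variance proportion exceeds the broken-stick value.
retained = (eigvals / eigvals.sum()) > broken_stick(5)
```

The abstract's preferred approach goes further, bootstrapping confidence intervals around broken-stick values rather than using a single point comparison like the one above.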