
Showing papers on "Resampling published in 2013"


Journal ArticleDOI
TL;DR: This work proposes an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees and offers an efficient and easy-to-use software to perform the UFBoot analysis with ML tree inference.
Abstract: Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and the Shimodaira-Hasegawa-like approximate likelihood ratio test have been introduced to speed up the bootstrap. Here, we suggest an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees. To achieve this, we combine the resampling estimated log-likelihood method with a simple but effective collection scheme of candidate trees. We also propose a stopping rule that assesses the convergence of branch support values to automatically determine when to stop collecting candidate trees. UFBoot achieves a median speed up of 3.1 (range: 0.66-33.3) to 10.2 (range: 1.32-41.4) compared with RAxML RBS for real DNA and amino acid alignments, respectively. Moreover, our extensive simulations show that UFBoot is robust against moderate model violations and the support values obtained appear to be relatively unbiased compared with the conservative standard bootstrap. This provides a more direct interpretation of the bootstrap support. We offer an efficient and easy-to-use software (available at http://www.cibiv.at/software/iqtree) to perform the UFBoot analysis with ML tree inference.
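As a rough illustration of the resampling estimated log-likelihood (RELL) idea that UFBoot builds on, the sketch below resamples per-site log-likelihoods of a fixed candidate tree set instead of re-analyzing bootstrap alignments; the real method aggregates support per clade rather than per tree, and the names used here (rell_bootstrap_support, site_loglik) are illustrative, not taken from the software.

```python
import numpy as np

def rell_bootstrap_support(site_loglik, n_boot=1000, seed=0):
    """Approximate bootstrap proportions via resampling estimated log-likelihoods.

    site_loglik : (n_trees, n_sites) array of per-site log-likelihoods for a
                  fixed set of candidate trees (assumed precomputed).
    Returns the fraction of replicates in which each candidate tree attains
    the highest resampled log-likelihood.
    """
    rng = np.random.default_rng(seed)
    n_trees, n_sites = site_loglik.shape
    wins = np.zeros(n_trees)
    for _ in range(n_boot):
        # Bootstrap site weights stand in for re-optimizing trees on a resampled alignment.
        weights = np.bincount(rng.integers(0, n_sites, size=n_sites), minlength=n_sites)
        wins[np.argmax(site_loglik @ weights)] += 1
    return wins / n_boot

# Toy usage: three candidate trees, 200 sites of made-up per-site log-likelihoods.
fake = np.random.default_rng(1).normal(-3.0, 0.5, size=(3, 200))
print(rell_bootstrap_support(fake))
```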

2,469 citations


Journal ArticleDOI
TL;DR: A simple, non-parametric method with resampling to account for the different sequencing depths is introduced, and it is found that the method discovers more consistent patterns than competing methods.
Abstract: We discuss the identification of features that are associated with an outcome in RNA-Sequencing (RNA-Seq) and other sequencing-based comparative genomic experiments. RNA-Seq data takes the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or 'sequencing depths'. Existing methods for this problem are based on Poisson or negative binomial models: they are useful but can be heavily influenced by 'outliers' in the data. We introduce a simple, non-parametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class or multiple-class outcomes. We compare our proposed method to Poisson and negative binomial-based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods.
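A minimal sketch of the kind of depth-adjusted, rank-based resampling test described above, assuming binomial thinning of each sample's counts to the smallest sequencing depth and a Wilcoxon rank-sum statistic averaged over resamples; the published method's exact scheme may differ, and all names here are hypothetical.

```python
import numpy as np
from scipy.stats import ranksums

def resampled_wilcoxon(counts, depths, group, n_resamp=20, seed=0):
    """Rank-based two-class statistic for one feature's counts, with resampling
    to account for unequal sequencing depths.

    counts : (n_samples,) integer read counts for one gene/feature.
    depths : (n_samples,) total sequencing depth per sample.
    group  : boolean array marking class membership.
    Each resample downsamples the counts to the smallest depth via binomial
    thinning, then the Wilcoxon rank-sum statistic is computed; the statistics
    are averaged over resamples.
    """
    rng = np.random.default_rng(seed)
    d_min = depths.min()
    stats = []
    for _ in range(n_resamp):
        thinned = rng.binomial(counts, d_min / depths)   # equalize depths
        stats.append(ranksums(thinned[group], thinned[~group]).statistic)
    return float(np.mean(stats))

counts = np.array([10, 15, 9, 40, 52, 47])
depths = np.array([1e6, 2e6, 1.5e6, 1e6, 2.5e6, 1.2e6])
group = np.array([True, True, True, False, False, False])
print(resampled_wilcoxon(counts, depths, group))
```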

431 citations


Book
20 Feb 2013
TL;DR: This Second edition is a practical guide to data analysis using the bootstrap, cross-validation, and permutation tests and is an essential resource for industrial statisticians, statistical consultants, and research professionals in science, engineering, and technology.
Abstract: The goal of this book is to introduce statistical methodology - estimation, hypothesis testing and classification - to a wide applied audience through resampling from existing data via the bootstrap, and estimation or cross-validation methods. The book provides an accessible introduction and practical guide to the power, simplicity and versatility of the bootstrap, cross-validation and permutation tests. Industrial statistical consultants, professionals and researchers will find the book's methods and software immediately helpful. This Second Edition is a practical guide to data analysis using the bootstrap, cross-validation, and permutation tests. It is an essential resource for industrial statisticians, statistical consultants, and research professionals in science, engineering, and technology. Requiring only minimal mathematics beyond algebra, it provides a table-free introduction to data analysis utilizing numerous exercises, practical data sets, and freely available statistical shareware. Topics and features: *Thoroughly revised text features more practical examples plus an additional chapter devoted to regression and data mining techniques and their limitations *Uses a resampling approach to introduce statistics *A practical presentation that covers all three sampling methods - bootstrap, density-estimation, and permutations *Includes a systematic guide to help one select the correct procedure for a particular application *Detailed coverage of all three statistical methodologies - classification, estimation, and hypothesis testing *Suitable for classroom use and individual, self-study purposes *Numerous practical examples using popular computer programs such as SAS, Stata, and StatXact *Useful appendices with computer programs and code to develop one's own methods *Downloadable freeware from the author's website: http://users.oco.net/drphilgood/resamp.htm With its accessible style and intuitive topic development, the book is an excellent basic resource and guide to the power, simplicity and versatility of bootstrap, cross-validation and permutation tests. Students, professionals, and researchers will find it a particularly useful guide to modern resampling methods and their applications.
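For readers new to these techniques, a small self-contained illustration of two of the book's core tools, a bootstrap percentile interval and a two-sample permutation test, is sketched below; it is not taken from the book's own software.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(1.0, 2.0, size=50)          # sample for the bootstrap example
y = rng.normal(1.5, 2.0, size=40)          # second sample for the permutation test

# Bootstrap 95% percentile interval for the mean of x.
boot_means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                       for _ in range(2000)])
ci = np.percentile(boot_means, [2.5, 97.5])

# Two-sample permutation test for the difference in means.
observed = y.mean() - x.mean()
pooled = np.concatenate([x, y])
perm_diffs = []
for _ in range(2000):
    perm = rng.permutation(pooled)
    perm_diffs.append(perm[:y.size].mean() - perm[y.size:].mean())
p_value = np.mean(np.abs(perm_diffs) >= abs(observed))

print("bootstrap CI for mean(x):", ci)
print("permutation p-value:", p_value)
```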

376 citations


BookDOI
01 Jan 2013
TL;DR: This book defines long memory, discusses its origins and generation, and covers statistical inference, forecasting and resampling for long-memory processes.
Abstract: Definition of Long Memory.- Origins and Generation of Long Memory.- Mathematical Concepts.- Limit Theorems.- Statistical Inference for Stationary Processes.- Statistical Inference for Nonlinear Processes.- Statistical Inference for Nonstationary Processes.- Forecasting.- Spatial and Space-Time Processes.- Resampling.- Function Spaces.- Regularly Varying Functions.- Vague Convergence.- Some Useful Integrals.- Notation and Abbreviations.- References.

352 citations


Journal ArticleDOI
TL;DR: Investigation of the suitability and performance of several resampling techniques when applied in conjunction with statistical and artificial intelligence prediction models over five real-world credit data sets, which have artificially been modified to derive different imbalance ratios.
Abstract: In real-life credit scoring applications, the case in which the class of defaulters is under-represented in comparison with the class of non-defaulters is a very common situation, but it ha...
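The abstract is truncated, so the sketch below only illustrates one generic rebalancing step of the kind compared in such studies, random oversampling of the minority (defaulter) class; the function name and the balancing rule are assumptions, not the paper's specific techniques.

```python
import numpy as np

def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate minority-class rows until the two classes are balanced.

    Assumes the minority class is the smaller one; this is one simple
    resampling technique among the several such studies compare.
    """
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    extra = rng.choice(minority, size=majority.size - minority.size, replace=True)
    idx = np.concatenate([majority, minority, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]

X = np.random.default_rng(1).normal(size=(100, 3))
y = (np.random.default_rng(2).random(100) < 0.1).astype(int)   # ~10% defaulters
Xb, yb = random_oversample(X, y)
print(y.mean(), yb.mean())      # class imbalance before vs. after
```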

139 citations


Journal ArticleDOI
TL;DR: This paper proposes an efficient L1 tracker, named bounded particle resampling (BPR-L1), with a minimum error bound and occlusion detection, and demonstrates an excellent performance as compared with nine state-of-the-art trackers on eleven challenging benchmark sequences.
Abstract: Recently, sparse representation has been applied to visual tracking to find the target with the minimum reconstruction error from a target template subspace. Though effective, these L1 trackers require high computational costs due to numerous calculations for l1 minimization. In addition, the inherent occlusion insensitivity of the l1 minimization has not been fully characterized. In this paper, we propose an efficient L1 tracker, named bounded particle resampling (BPR)-L1 tracker, with a minimum error bound and occlusion detection. First, the minimum error bound is calculated from a linear least squares equation and serves as a guide for particle resampling in a particle filter (PF) framework. Most of the insignificant samples are removed before solving the computationally expensive l1 minimization in a two-step testing. The first step, named τ testing, compares the sample observation likelihood to an ordered set of thresholds to remove insignificant samples without loss of resampling precision. The second step, named max testing, identifies the largest sample probability relative to the target to further remove insignificant samples without altering the tracking result of the current frame. Though sacrificing minimal precision during resampling, max testing achieves significant speed up on top of τ testing. The BPR-L1 technique can also be beneficial to other trackers that have minimum error bounds in a PF framework, especially for trackers based on sparse representations. After the error-bound calculation, BPR-L1 performs occlusion detection by investigating the trivial coefficients in the l1 minimization. These coefficients, by design, contain rich information about image corruptions, including occlusion. Detected occlusions are then used to enhance the template updating. For evaluation, we conduct experiments on three video applications: biometrics (head movement, hand holding object, singers on stage), pedestrians (urban travel, hallway monitoring), and cars in traffic (wide area motion imagery, ground-mounted perspectives). The proposed BPR-L1 method demonstrates an excellent performance as compared with nine state-of-the-art trackers on eleven challenging benchmark sequences.

128 citations


Journal ArticleDOI
TL;DR: This paper introduces a new methodology to perform feature selection in multi-label classification problems that uses the multivariate mutual information criterion combined with a problem transformation and a pruning strategy.

111 citations


Monograph
05 Aug 2013
TL;DR: A textbook introducing probability, R, random number generation, statistical simulation of linear and generalized linear models, resampling methods, and other simulation-based methods.
Abstract: 1. Introduction 2. Probability 3. Introduction to R 4. Random Number Generation 5. Statistical Simulation of the Linear Model 6. Simulating Generalized Linear Models 7. Testing Theory Using Simulation 8. Resampling Methods 9. Other Simulation-Based Methods 10. Final Thoughts

98 citations


Journal ArticleDOI
TL;DR: Generalized sampling as discussed by the authors is a recently developed linear framework for sampling and reconstruction in separable Hilbert spaces, which allows one to recover any element in any finite-dimensional subspace given finitely many of its samples with respect to an arbitrary basis or frame.
Abstract: Generalized sampling is a recently developed linear framework for sampling and reconstruction in separable Hilbert spaces. It allows one to recover any element in any finite-dimensional subspace given finitely many of its samples with respect to an arbitrary basis or frame. Unlike more common approaches for this problem, such as the consistent reconstruction technique of Eldar and others, it leads to numerical methods possessing both guaranteed stability and accuracy. The purpose of this paper is twofold. First, we give a complete and formal analysis of generalized sampling, the main result of which being the derivation of new, sharp bounds for the accuracy and stability of this approach. Such bounds improve upon those given previously and result in a necessary and sufficient condition, the stable sampling rate, which guarantees a priori a good reconstruction. Second, we address the topic of optimality. Under some assumptions, we show that generalized sampling is an optimal, stable method. Correspondingly...

90 citations


Book
17 Dec 2013
TL;DR: A textbook on the foundations and applications of statistics in biology, covering probability, estimation, interval estimation via sampling and resampling distributions, hypothesis testing, sampling and experimental design, and applied methods including correlation, regression, ANOVA and tabular analyses.
Abstract: FOUNDATIONS Philosophical and Historical Foundations Introduction Nature of Science Scientific Principles Scientific Method Scientific Hypotheses Logic Variability and Uncertainty in Investigations Science and Statistics Statistics and Biology Introduction to Probability Introduction: Models for Random Variables Classical Probability Conditional Probability Odds Combinatorial Analysis Bayes Rule Probability Density Functions Introduction Introductory Examples of pdfs Other Important Distributions Which pdf to Use? Reference Tables Parameters and Statistics Introduction Parameters Statistics OLS and ML Estimators Linear Transformations Bayesian Applications Interval Estimation: Sampling Distributions, Resampling Distributions, and Simulation Distributions Introduction Sampling Distributions Confidence Intervals Resampling Distributions Bayesian Applications: Simulation Distributions Hypothesis Testing Introduction Parametric Frequentist Null Hypothesis Testing Type I and Type II Errors Power Criticisms of Frequentist Null Hypothesis Testing Alternatives to Parametric Null Hypothesis Testing Alternatives to Null Hypothesis Testing Sampling Design and Experimental Design Introduction Some Terminology The Question Is: What Is the Question? Two Important Tenets: Randomization and Replication Sampling Design Experimental Design APPLICATIONS Correlation Introduction Pearson's Correlation Robust Correlation Comparisons of Correlation Procedures Regression Introduction Linear Regression Model General Linear Models Simple Linear Regression Multiple Regression Fitted and Predicted Values Confidence and Prediction Intervals Coefficient of Determination and Important Variants Power, Sample Size, and Effect Size Assumptions and Diagnostics for Linear Regression Transformation in the Context of Linear Models Fixing the Y-Intercept Weighted Least Squares Polynomial Regression Comparing Model Slopes Likelihood and General Linear Models Model Selection Robust Regression Model II Regression (X Not Fixed) Generalized Linear Models Nonlinear Models Smoother Approaches to Association and Regression Bayesian Approaches to Regression ANOVA Introduction One-Way ANOVA Inferences for Factor Levels ANOVA as a General Linear Model Random Effects Power, Sample Size, and Effect Size ANOVA Diagnostics and Assumptions Two-Way Factorial Design Randomized Block Design Nested Design Split-Plot Design Repeated Measures Design ANCOVA Unbalanced Designs Robust ANOVA Bayesian Approaches to ANOVA Tabular Analyses Introduction Probability Distributions for Tabular Analyses One-Way Formats Confidence Intervals for p Contingency Tables Two-Way Tables Ordinal Variables Power, Sample Size, and Effect Size Three-Way Tables Generalized Linear Models Appendix References Index A Summary and Exercises appear at the end of each chapter.

79 citations


Journal ArticleDOI
TL;DR: It is demonstrated that both the test statistic and its estimated variance are seriously biased when predictions from nested regression models are used as data inputs for the test, and the reasons for these problems are examined.
Abstract: In constructing predictive models, investigators frequently assess the incremental value of a predictive marker by comparing the ROC curve generated from the predictive model including the new marker with the ROC curve from the model excluding the new marker. Many commentators have noticed empirically that a test of the two ROC areas often produces a non-significant result when a corresponding Wald test from the underlying regression model is significant. A recent article showed using simulations that the widely used ROC area test produces exceptionally conservative test size and extremely low power. In this article, we demonstrate that both the test statistic and its estimated variance are seriously biased when predictions from nested regression models are used as data inputs for the test, and we examine in detail the reasons for these problems. Although it is possible to create a test reference distribution by resampling that removes these biases, Wald or likelihood ratio tests remain the preferred approach for testing the incremental contribution of a new marker.
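To make the mechanics concrete, the sketch below builds a bootstrap distribution of the difference in ROC areas between two fixed risk scores, resampling cases and controls separately. It only illustrates how a resampling reference distribution can be formed; it does not reproduce the article's bias analysis, and the article's own conclusion is that Wald or likelihood ratio tests remain preferable for nested models. The helper functions and data are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC; ties are ignored for brevity."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_delta_auc(score_base, score_new, labels, n_boot=2000, seed=0):
    """Bootstrap the difference in AUC between two risk scores, resampling
    cases and controls separately so every replicate contains both classes."""
    rng = np.random.default_rng(seed)
    cases, controls = np.flatnonzero(labels == 1), np.flatnonzero(labels == 0)
    deltas = []
    for _ in range(n_boot):
        idx = np.concatenate([rng.choice(cases, cases.size),
                              rng.choice(controls, controls.size)])
        deltas.append(auc(score_new[idx], labels[idx]) -
                      auc(score_base[idx], labels[idx]))
    return np.asarray(deltas)

rng = np.random.default_rng(1)
labels = (rng.random(300) < 0.3).astype(int)
score_base = labels + rng.normal(0, 1.5, 300)        # model without the new marker
score_new = labels + rng.normal(0, 1.0, 300)         # model with the new marker
d = bootstrap_delta_auc(score_base, score_new, labels)
print(d.mean(), np.percentile(d, [2.5, 97.5]))
```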

Journal ArticleDOI
Faming Liang, Yichen Cheng, Qifan Song, Jincheol Park, Ping Yang
TL;DR: This article proposes a resampling-based stochastic approximation method that leads to a general parameter estimation approach, maximum mean log-likelihood estimation, which includes the popular maximum (log)-likelihood estimator approach as a special case and is expected to play an important role in analyzing large datasets.
Abstract: The Gaussian geostatistical model has been widely used in modeling of spatial data. However, it is challenging to computationally implement this method because it requires the inversion of a large covariance matrix, particularly when there is a large number of observations. This article proposes a resampling-based stochastic approximation method to address this challenge. At each iteration of the proposed method, a small subsample is drawn from the full dataset, and then the current estimate of the parameters is updated accordingly under the framework of stochastic approximation. Since the proposed method makes use of only a small proportion of the data at each iteration, it avoids inverting large covariance matrices and thus is scalable to large datasets. The proposed method also leads to a general parameter estimation approach, maximum mean log-likelihood estimation, which includes the popular maximum (log)-likelihood estimation (MLE) approach as a special case and is expected to play an important role ...

Journal ArticleDOI
TL;DR: The main idea is to allow for lookahead in the Monte Carlo process so that future information can be utilized in weighting and generating Monte Carlo samples, or resampling from samples of the current state.
Abstract: Based on the principles of importance sampling and resampling, sequential Monte Carlo (SMC) encompasses a large set of powerful techniques dealing with complex stochastic dynamic systems. Many of these systems possess strong memory, with which future information can help sharpen the inference about the current state. By providing theoretical justification of several existing algorithms and introducing several new ones, we study systematically how to construct efficient SMC algorithms to take advantage of the “future” information without creating a substantially high computational burden. The main idea is to allow for lookahead in the Monte Carlo process so that future information can be utilized in weighting and generating Monte Carlo samples, or resampling from samples of the current state.

Journal ArticleDOI
TL;DR: For sample size adjustment, it is more theoretically rigorous and practically flexible to measure the fit of the distribution represented by weighted particles based on KLD during resampling than in sampling.
Abstract: An adaptive resampling method is provided. It determines the number of particles to resample so that the Kullback-Leibler distance (KLD) between the distribution of particles before resampling and after resampling does not exceed a pre-specified error bound. The basis of the method is the same as Fox's KLD-sampling but implemented differently. The KLD-sampling assumes that samples are coming from the true posterior distribution and ignores any mismatch between the true and the proposal distribution. In contrast, the KLD measure is incorporated into the resampling in which the distribution of interest is just the posterior distribution. That is to say, for sample size adjustment, it is more theoretically rigorous and practically flexible to measure the fit of the distribution represented by weighted particles based on KLD during resampling than in sampling. Simulations of target tracking demonstrate the efficiency of the method.
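A rough sketch of the sample-size rule involved, assuming Fox's KLD bound on the required number of samples and a simple one-dimensional binning of the particles; the paper's contribution is applying this kind of bound during resampling of the weighted posterior particles, and the names below are illustrative only.

```python
import numpy as np
from scipy.stats import norm

def kld_sample_size(k, epsilon=0.05, delta=0.01):
    """Fox's KLD bound: number of samples needed so that, with probability
    1 - delta, the KL distance between the sample-based estimate and the
    true distribution stays below epsilon; k is the number of occupied bins."""
    if k <= 1:
        return 1
    z = norm.ppf(1.0 - delta)
    a = 2.0 / (9.0 * (k - 1))
    return int(np.ceil((k - 1) / (2.0 * epsilon) * (1.0 - a + np.sqrt(a) * z) ** 3))

def adaptive_resample(particles, weights, bin_width=0.5, seed=0):
    """Resample a 1-D weighted particle set, choosing the number of draws from
    the KLD bound applied to the occupied bins of the weighted set (a rough
    sketch of measuring the fit during resampling rather than sampling)."""
    rng = np.random.default_rng(seed)
    bins = np.unique(np.floor(particles / bin_width))
    n_new = kld_sample_size(len(bins))
    idx = rng.choice(len(particles), size=n_new, p=weights / weights.sum())
    return particles[idx]

p = np.random.default_rng(1).normal(size=1000)
w = np.exp(-0.5 * (p - 1.0) ** 2)            # pretend likelihood weights
print(adaptive_resample(p, w).shape)
```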

Book
30 Jan 2013
TL;DR: In this article, the authors present a R Code for Concept Implementations for Spatial Statistics and Geostatistics, which is based on Bayesian methods for spatial data Markov Chain Monte Carlo Techniques Selected Puerto Rico Examples Designing Monte Carlo Simulation Experiments A Monte Carlo Experiment Investigating Eigenvector Selection when Constructing a Spatial Filtering: Correlation Coefficient Decomposition R code for Concept Implementation Methods For Spatial Interpolation In Two Dimensions Kriging: An Algebraic Basis The EM Algorithm Spatial Autoregression: A Spatial
Abstract: About the Authors Preface Introduction Spatial Statistics and Geostatistics R Basics Spatial Autocorrelation Indices Measuring Spatial Dependency Important Properties of MC Relationships Between MC And GR, and MC and Join Count Statistics Graphic Portrayals: The Moran Scatterplot and the Semi-variogram Plot Impacts of Spatial Autocorrelation Testing for Spatial Autocorrelation in Regression Residuals R Code for Concept Implementations Spatial Sampling Selected Spatial Sampling Designs Puerto Rico DEM Data Properties of the Selected Sampling Designs: Simulation Experiment Results Sampling Simulation Experiments On A Unit Square Landscape Sampling Simulation Experiments On A Hexagonal Landscape Structure Resampling Techniques: Reusing Sampled Data The Bootstrap The Jackknife Spatial Autocorrelation and Effective Sample Size R Code for Concept Implementations Spatial Composition and Configuration Spatial Heterogeneity: Mean and Variance ANOVA Testing for Heterogeneity Over a Plane: Regional Supra-Partitionings Establishing a Relationship to the Superpopulation A Null Hypothesis Rejection Case With Heterogeneity Testing for Heterogeneity Over a Plane: Directional Supra-Partitionings Covariates Across a Geographic Landscape Spatial Weights Matrices Weights Matrices for Geographic Distributions Weights Matrices for Geographic Flows Spatial Heterogeneity: Spatial Autocorrelation Regional Differences Directional Differences: Anisotropy R Code for Concept Implementations Spatially Adjusted Regression And Related Spatial Econometrics Linear Regression Nonlinear Regression Binomial/Logistic Regression Poisson/Negative Binomial Regression Geographic Distributions Geographic Flows: A Journey-To-Work Example R Code for Concept Implementations Local Statistics: Hot And Cold Spots Multiple Testing with Positively Correlated Data Local Indices of Spatial Association Getis-Ord Statistics Spatially Varying Coefficients R Code For Concept Implementations Analyzing Spatial Variance And Covariance With Geostatistics And Related Techniques Semi-variogram Models Co-kriging DEM Elevation as a Covariate Landsat 7 ETM+ Data as a Covariate Spatial Linear Operators Multivariate Geographic Data Eigenvector Spatial Filtering: Correlation Coefficient Decomposition R Code for Concept Implementations Methods For Spatial Interpolation In Two Dimensions Kriging: An Algebraic Basis The EM Algorithm Spatial Autoregression: A Spatial EM Algorithm Eigenvector Spatial Filtering: Another Spatial EM Algorithm R Code for Concept Implementations More Advanced Topics In Spatial Statistics Bayesian Methods for Spatial Data Markov Chain Monte Carlo Techniques Selected Puerto Rico Examples Designing Monte Carlo Simulation Experiments A Monte Carlo Experiment Investigating Eigenvector Selection when Constructing a Spatial Filter A Monte Carlo Experiment Investigating Eigenvector Selection from a Restricted Candidate Set of Vectors Spatial Error: A Contributor to Uncertainty R Code for Concept Implementations References Index

Journal ArticleDOI
TL;DR: A simple fit to the oscillations using linear least squares is available, together with a non-parametric test for detecting changes in period length which allows for period estimates with different variances, as frequently encountered in practice.
Abstract: Estimation of the period length of time-course data from cyclical biological processes, such as those driven by the circadian pacemaker, is crucial for inferring the properties of the biological clock found in many living organisms. We propose a methodology for period estimation based on spectrum resampling (SR) techniques. Simulation studies show that SR is superior and more robust to non-sinusoidal and noisy cycles than a currently used routine based on Fourier approximations. In addition, a simple fit to the oscillations using linear least squares is available, together with a non-parametric test for detecting changes in period length which allows for period estimates with different variances, as frequently encountered in practice. The proposed methods are motivated by and applied to various data examples from chronobiology.
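The abstract does not spell out the spectrum resampling algorithm, so the sketch below shows one plausible variant under stated assumptions: the period is taken as the reciprocal of the periodogram peak frequency, and its uncertainty is assessed by resampling the periodogram ordinates with unit-mean exponential multipliers (ordinates of a smooth spectrum are approximately exponential). This illustrates spectrum-resampling-style period estimation, not necessarily the authors' exact procedure.

```python
import numpy as np

def period_estimates(y, dt=1.0, n_resamp=500, seed=0):
    """Point estimate of the dominant period from the periodogram peak,
    plus a resampling-based interval from exponential multipliers."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float) - np.mean(y)
    freqs = np.fft.rfftfreq(len(y), d=dt)[1:]          # drop the zero frequency
    pgram = np.abs(np.fft.rfft(y))[1:] ** 2
    point = 1.0 / freqs[np.argmax(pgram)]
    resampled = [1.0 / freqs[np.argmax(pgram * rng.exponential(1.0, size=pgram.size))]
                 for _ in range(n_resamp)]
    return point, np.percentile(resampled, [2.5, 97.5])

t = np.arange(0, 240, 1.0)                             # e.g. hourly samples over 10 days
y = np.cos(2 * np.pi * t / 24.3) + np.random.default_rng(2).normal(0, 0.3, t.size)
print(period_estimates(y))
```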

Journal ArticleDOI
TL;DR: The combined total flow and peak flow data were an efficient alternative to the intensive 5-min flow data for reducing SWMM parameter and output uncertainties and several runoff control parameters were found to have a great effect on peak flows, including the newly introduced parameters for trees.
Abstract: This research incorporates the generalized likelihood uncertainty estimation (GLUE) methodology in a high-resolution Environmental Protection Agency Storm Water Management Model (SWMM), which we developed for a highly urbanized sewershed in Syracuse, NY, to assess SWMM modelling uncertainties and estimate parameters. We addressed two issues that have long been suggested having a great impact on the GLUE uncertainty estimation: the observations used to construct the likelihood measure and the sampling approach to obtain the posterior samples of the input parameters and prediction bounds of the model output. First, on the basis of the Bayes' theorem, we compared the prediction bounds generated from the same Gaussian distribution likelihood measure conditioned on flow observations of varying magnitude. Second, we employed two sampling techniques, the sampling importance resampling (SIR) and the threshold sampling methods, to generate posterior parameter distributions and prediction bounds, based on which the sampling efficiency was compared. In addition, for a better understanding of the hydrological responses of different pervious land covers in urban areas, we developed new parameter sets in SWMM representing the hydrological properties of trees and lawns, which were estimated through the GLUE procedure. The results showed that SIR was a more effective alternative to the conventional threshold sampling method. The combined total flow and peak flow data were an efficient alternative to the intensive 5-min flow data for reducing SWMM parameter and output uncertainties. Several runoff control parameters were found to have a great effect on peak flows, including the newly introduced parameters for trees. Copyright © 2013 John Wiley & Sons, Ltd.
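A minimal sketch of the sampling importance resampling (SIR) step used within GLUE, with a toy linear rainfall-runoff function standing in for the SWMM model and a Gaussian likelihood measure on the flows; all names and the toy model are hypothetical.

```python
import numpy as np

def sir_posterior(prior_samples, simulate, observed, sigma, n_keep=1000, seed=0):
    """Sampling importance resampling (SIR) for GLUE-style parameter estimation.

    prior_samples : (n, d) parameter sets drawn from the prior.
    simulate      : function mapping a parameter vector to simulated flows
                    (stands in for a SWMM run here).
    observed      : observed flows entering the Gaussian likelihood measure.
    Returns n_keep parameter sets resampled with probability proportional to
    their likelihood, approximating the posterior.
    """
    rng = np.random.default_rng(seed)
    sims = np.array([simulate(theta) for theta in prior_samples])
    resid = sims - observed
    loglik = -0.5 * np.sum(resid ** 2, axis=1) / sigma ** 2
    w = np.exp(loglik - loglik.max())
    w /= w.sum()
    idx = rng.choice(len(prior_samples), size=n_keep, p=w)
    return prior_samples[idx]

# Toy stand-in for a rainfall-runoff model: flows scale linearly with theta.
rain = np.array([0.0, 2.0, 5.0, 1.0, 0.0])
observed = 0.7 * rain
prior = np.random.default_rng(1).uniform(0.0, 1.5, size=(5000, 1))
post = sir_posterior(prior, lambda th: th[0] * rain, observed, sigma=0.2)
print(post.mean(), post.std())
```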

Journal ArticleDOI
TL;DR: In this paper, a new panel unit root test based on Simes' [Biometrika 1986, “An Improved Bonferroni Procedure for Multiple Tests of Significance”] classical intersection test is proposed.
Abstract: This paper proposes a new panel unit root test based on Simes' [Biometrika 1986, "An Improved Bonferroni Procedure for Multiple Tests of Significance"] classical intersection test. The test is robust to general patterns of cross-sectional dependence and yet straightforward to implement, only requiring p-values of time series unit root tests of the series in the panel, and no resampling. Monte Carlo experiments show good size and power properties relative to existing panel unit root tests. Unlike previously suggested tests, the new test allows one to identify the units in the panel for which the alternative of stationarity can be said to hold. We provide two empirical applications to panels of real gross domestic product (GDP) and real exchange rate data.
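The Simes intersection test itself is easy to state: with sorted p-values p_(1) <= ... <= p_(n), the joint unit-root null is rejected if p_(i) <= i*alpha/n for some i. A minimal sketch follows (the paper additionally uses the procedure to identify which individual units can be called stationary, which is not reproduced here).

```python
import numpy as np

def simes_test(pvalues, alpha=0.05):
    """Simes (1986) intersection test of the joint null ("all series have a
    unit root") from the individual unit-root test p-values of a panel.
    Rejects if p_(i) <= i * alpha / n for at least one i."""
    p = np.sort(np.asarray(pvalues))
    n = len(p)
    thresholds = np.arange(1, n + 1) * alpha / n
    return bool(np.any(p <= thresholds))

# Toy panel: p-values from, say, ADF-type tests on each unit's series.
pvals = [0.004, 0.20, 0.35, 0.60, 0.81]
print(simes_test(pvals))        # True: the joint unit-root null is rejected
```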

Journal ArticleDOI
TL;DR: In this article, a rigorous study of weak convergence of the wild bootstrap for nonparametric estimation of the cumulative event probability of a competing risk is presented. But the data may be subject to independent left-truncation and right-censoring.
Abstract: We give a rigorous study of weak convergence of the wild bootstrap for non-parametric estimation of the cumulative event probability of a competing risk. The data may be subject to independent left-truncation and right-censoring. Inclusion of left-truncation is motivated by a study on pregnancy outcomes. The wild bootstrap includes as one case a popular resampling technique, where the limit distribution is approximated by repeatedly generating standard normal variates, while the data are kept fixed. Simulation results and a data example are also presented.

Journal ArticleDOI
TL;DR: This paper proposes a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples, for both continuous and discrete traits.
Abstract: Family-based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, which test one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P-values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP-SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P-value GEE test for an SNP-set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.

Posted Content
01 Jan 2013
TL;DR: The SMC^2 algorithm proposed in this paper is a sequential Monte Carlo algorithm, defined in the theta-dimension, which propagates and resamples many particle filters in the x-dimension.
Abstract: We consider the generic problem of performing sequential Bayesian inference in a state-space model with observation process y, state process x and fixed parameter theta. An idealized approach would be to apply the iterated batch importance sampling (IBIS) algorithm of Chopin (2002). This is a sequential Monte Carlo algorithm in the theta-dimension, that samples values of theta, reweights iteratively these values using the likelihood increments p(y_t|y_1:t-1, theta), and rejuvenates the theta-particles through a resampling step and a MCMC update step. In state-space models these likelihood increments are intractable in most cases, but they may be unbiasedly estimated by a particle filter in the x-dimension, for any fixed theta. This motivates the SMC^2 algorithm proposed in this article: a sequential Monte Carlo algorithm, defined in the theta-dimension, which propagates and resamples many particle filters in the x-dimension. The filters in the x-dimension are an example of the random weight particle filter as in Fearnhead et al. (2010). On the other hand, the particle Markov chain Monte Carlo (PMCMC) framework developed in Andrieu et al. (2010) allows us to design appropriate MCMC rejuvenation steps. Thus, the theta-particles target the correct posterior distribution at each iteration t, despite the intractability of the likelihood increments. We explore the applicability of our algorithm in both sequential and non-sequential applications and consider various degrees of freedom, as for example increasing dynamically the number of x-particles. We contrast our approach to various competing methods, both conceptually and empirically through a detailed simulation study, included here and in a supplement, and based on particularly challenging examples.

Journal ArticleDOI
TL;DR: In this paper, a generalization of the multiplier resampling scheme proposed by Bücher and Ruppert along two directions is proposed, which makes it possible to transpose to the strongly mixing setting many of the existing multiplier tests on the unknown copula, including nonparametric tests for change-point detection.
Abstract: Two key ingredients to carry out inference on the copula of multivariate observations are the empirical copula process and an appropriate resampling scheme for the latter. Among the existing techniques used for i.i.d. observations, the multiplier bootstrap of Rémillard and Scaillet (J. Multivariate Anal. 100 (2009) 377-386) frequently appears to lead to inference procedures with the best finite-sample properties. Bücher and Ruppert (J. Multivariate Anal. 116 (2013) 208-229) recently proposed an extension of this technique to strictly stationary strongly mixing observations by adapting the dependent multiplier bootstrap of Bühlmann (The blockwise bootstrap in time series and empirical processes (1993) ETH Zürich, Section 3.3) to the empirical copula process. The main contribution of this work is a generalization of the multiplier resampling scheme proposed by Bücher and Ruppert along two directions. First, the resampling scheme is now genuinely sequential, thereby allowing one to transpose to the strongly mixing setting many of the existing multiplier tests on the unknown copula, including nonparametric tests for change-point detection. Second, the resampling scheme is now fully automatic as a data-adaptive procedure is proposed which can be used to estimate the bandwidth parameter. A simulation study is used to investigate the finite-sample performance of the resampling scheme and provides suggestions on how to choose several additional parameters. As by-products of this work, the validity of a sequential version of the dependent multiplier bootstrap for empirical processes of Bühlmann is obtained under weaker conditions on the strong mixing coefficients and the multipliers, and the weak convergence of the sequential empirical copula process is established under many serial dependence conditions.

Journal ArticleDOI
TL;DR: An overview of how resampling methods may be applied to linear models of ontogenetic trajectories of landmark-based geometric morphometric data, to extract information about ontogeny, is presented.
Abstract: Keywords: ontogeny, shape, permutation, bootstrapping, resampling, MANCOVA. Comparative studies of ontogenies play a crucial role in the understanding of the processes of morphological diversification. These studies have benefited from the appearance of new mathematical and statistical tools, including geometric morphometrics, resampling statistics and general linear models. This paper presents an overview of how resampling methods may be applied to linear models of ontogenetic trajectories of landmark-based geometric morphometric data, to extract information about ontogeny. That information can be used to test hypotheses about the changes (or differences) in rate, direction, duration and starting point of ontogenetic trajectories that led to the observed patterns of morphological diversification.
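As one concrete example of the kind of resampling test discussed, the sketch below permutes group labels to test for a difference in shape-on-size regression slopes between two groups, using a single shape variable for brevity; with landmark data the same logic is applied to multivariate trajectory attributes. The data and names are hypothetical.

```python
import numpy as np

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    x = x - x.mean()
    return float(np.dot(x, y - y.mean()) / np.dot(x, x))

def permutation_slope_test(size, shape, group, n_perm=5000, seed=0):
    """Permutation test for a difference in ontogenetic (shape-on-size)
    regression slopes between two groups: group labels are shuffled and the
    absolute slope difference is recomputed under each permutation."""
    rng = np.random.default_rng(seed)
    obs = abs(slope(size[group], shape[group]) - slope(size[~group], shape[~group]))
    count = 0
    for _ in range(n_perm):
        g = rng.permutation(group)
        diff = abs(slope(size[g], shape[g]) - slope(size[~g], shape[~g]))
        count += diff >= obs
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
size = rng.uniform(1, 10, 60)
group = np.arange(60) < 30
shape = np.where(group, 0.5, 0.8) * size + rng.normal(0, 0.5, 60)
print(permutation_slope_test(size, shape, group))
```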

Journal ArticleDOI
TL;DR: Resampling resulted in improved plan quality and in considerable optimization time reduction compared with traditional regular grid planning and was especially effective when using thin PBs.
Abstract: This study investigates whether ‘pencil beam resampling’, i.e. iterative selection and weight optimization of randomly placed pencil beams (PBs), reduces optimization time and improves plan quality for multi-criteria optimization in intensity-modulated proton therapy, compared with traditional modes in which PBs are distributed over a regular grid. Resampling consisted of repeatedly performing: (1) random selection of candidate PBs from a very fine grid, (2) inverse multi-criteria optimization, and (3) exclusion of low-weight PBs. The newly selected candidate PBs were added to the PBs in the existing solution, causing the solution to improve with each iteration. Resampling and traditional regular grid planning were implemented into our in-house developed multi-criteria treatment planning system ‘Erasmus iCycle’. The system optimizes objectives successively according to their priorities as defined in the so-called ‘wish-list’. For five head-and-neck cancer patients and two PB widths (3 and 6 mm sigma at 230 MeV), treatment plans were generated using: (1) resampling, (2) anisotropic regular grids and (3) isotropic regular grids, while using varying sample sizes (resampling) or grid spacings (regular grid). We assessed differences in optimization time (for comparable plan quality) and in plan quality parameters (for comparable optimization time). Resampling reduced optimization time by a factor of 2.8 and 5.6 on average (7.8 and 17.0 at maximum) compared with the use of anisotropic and isotropic grids, respectively. Doses to organs-at-risk were generally reduced when using resampling, with median dose reductions ranging from 0.0 to 3.0 Gy (maximum: 14.3 Gy, relative: 0%–42%) compared with anisotropic grids and from −0.3 to 2.6 Gy (maximum: 11.4 Gy, relative: −4%–19%) compared with isotropic grids. Resampling was especially effective when using thin PBs (3 mm sigma). Resampling plans contained on average fewer PBs, energy layers and protons than anisotropic grid plans and more energy layers and protons than isotropic grid plans. In conclusion, resampling resulted in improved plan quality and in considerable optimization time reduction compared with traditional regular grid planning.
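A heavily simplified sketch of the resampling loop described above: randomly drawn candidate pencil beams are added to the current set, non-negative weights are re-optimized, and low-weight beams are discarded. Non-negative least squares against a toy dose prescription stands in for the multi-criteria optimization of Erasmus iCycle, and all names here are assumptions.

```python
import numpy as np
from scipy.optimize import nnls

def resample_pencil_beams(dose_matrix, target, n_iter=10, n_new=50,
                          weight_cutoff=1e-3, seed=0):
    """Iterative pencil-beam resampling (simplified).

    dose_matrix : (n_voxels, n_candidates) dose per unit weight for every
                  candidate pencil beam on a very fine grid.
    target      : prescribed dose per voxel.
    Each iteration adds randomly sampled candidate beams to the current set,
    re-optimizes non-negative weights, and drops low-weight beams.
    """
    rng = np.random.default_rng(seed)
    selected = np.array([], dtype=int)
    for _ in range(n_iter):
        candidates = np.union1d(selected, rng.choice(dose_matrix.shape[1],
                                                     size=n_new, replace=False))
        w, _ = nnls(dose_matrix[:, candidates], target)
        keep = w > weight_cutoff
        selected, weights = candidates[keep], w[keep]
    return selected, weights

A = np.random.default_rng(1).random((40, 500))    # toy dose-influence matrix
d = np.full(40, 2.0)                              # uniform toy prescription
beams, w = resample_pencil_beams(A, d)
print(len(beams))
```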

Book
30 Jan 2013
TL;DR: This book is a comprehensive guide for learning S-PLUS; only a single chapter deals strictly with statistical analysis, though perhaps R novices would be better served by the recent book by Crawley (2005), for which the review by Ng (2006) appears elsewhere in this issue.

Journal ArticleDOI
16 Jul 2013-PLOS ONE
TL;DR: It is shown that the value of the RV coefficient depends on sample size also in real geometric morphometric datasets, and a permutation procedure to test for the difference between a priori defined groups of observations and a nearest-neighbor procedure that could be used when studying the variation of modularity in geographic space are proposed.
Abstract: Modularity has been suggested to be connected to evolvability because a higher degree of independence among parts allows them to evolve as separate units. Recently, the Escoufier RV coefficient has been proposed as a measure of the degree of integration between modules in multivariate morphometric datasets. However, it has been shown, using randomly simulated datasets, that the value of the RV coefficient depends on sample size. Also, so far there is no statistical test for the difference in the RV coefficient between a priori defined groups of observations. Here, we (1), using a rarefaction analysis, show that the value of the RV coefficient depends on sample size also in real geometric morphometric datasets; (2) propose a permutation procedure to test for the difference in the RV coefficient between a priori defined groups of observations; (3) show, through simulations, that such a permutation procedure has an appropriate Type I error; (4) suggest that a rarefaction procedure could be used to obtain sample-size-corrected values of the RV coefficient; and (5) propose a nearest-neighbor procedure that could be used when studying the variation of modularity in geographic space. The approaches outlined here, readily extendable to non-morphometric datasets, allow study of the variation in the degree of integration between a priori defined modules. A Java application – that will allow performance of the proposed test using a software with graphical user interface – has also been developed and is available at the Morphometrics at Stony Brook Web page (http://life.bio.sunysb.edu/morph/).
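A compact sketch of the two quantities involved, assuming two blocks of (already superimposed) coordinates per specimen: Escoufier's RV coefficient and a permutation test for its difference between two a priori groups. The rarefaction and nearest-neighbor procedures of the paper are not reproduced here; the Java application mentioned above is the authors' reference implementation.

```python
import numpy as np

def rv_coefficient(X, Y):
    """Escoufier's RV coefficient between two blocks of variables (e.g. two
    candidate modules measured on the same specimens)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Sxy = Xc.T @ Yc
    Sxx = Xc.T @ Xc
    Syy = Yc.T @ Yc
    return np.trace(Sxy @ Sxy.T) / np.sqrt(np.trace(Sxx @ Sxx) * np.trace(Syy @ Syy))

def rv_group_difference_test(X, Y, group, n_perm=999, seed=0):
    """Permutation test for a difference in RV between two a priori groups:
    specimens are shuffled across groups (keeping group sizes fixed, which
    matters because RV depends on sample size) and the absolute RV
    difference is recomputed."""
    rng = np.random.default_rng(seed)
    obs = abs(rv_coefficient(X[group], Y[group]) - rv_coefficient(X[~group], Y[~group]))
    exceed = 0
    for _ in range(n_perm):
        g = rng.permutation(group)
        exceed += abs(rv_coefficient(X[g], Y[g]) - rv_coefficient(X[~g], Y[~g])) >= obs
    return (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(4)
X, Y = rng.normal(size=(80, 6)), rng.normal(size=(80, 4))
group = np.arange(80) < 40
print(rv_coefficient(X, Y), rv_group_difference_test(X, Y, group))
```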

Journal ArticleDOI
TL;DR: It is suggested that resampling of hyperspectral data should account for the spectral dependence information to improve overall classification accuracy as well as reducing the problem of multicollinearity.

Book ChapterDOI
01 Jan 2013
TL;DR: The Lanczos resampling method is proposed as a good method from a qualitative and quantitative point of view when compared with two other resampling methods, and proves to be an optimal method for image resampling in the arena of remote sensing when compared with the other methods used.
Abstract: This paper presents the theory and practical application of a relatively unknown and rarely used image resampling technique called Lanczos resampling, applied here to satellite remote sensing images. Image resampling is the mathematical technique used to create a new version of an image with a different width and/or height in pixels. Interpolation is the process of determining the values of a function at positions lying between its samples; sampling the interpolated image is equivalent to interpolating the image with a sampled interpolating function. Image registration is the process of overlaying two or more images of the same scene taken at different times, from different viewpoints, and/or by different sensors; it geometrically aligns two images, the reference and sensed images. In the interaction between the interpolation and sampling processes, aliasing occurs on some occasions. The majority of registration methods consist of steps such as feature detection, feature matching, transform model estimation, and image resampling and transformation. The proprietary software packages commercially available for image processing that are capable of image registration do not provide performance metrics for assessing the resampling methods used, and the Lanczos resampling method has not been used in the digital processing of remotely sensed satellite images by any of the open-source or proprietary software packages available until now. In this paper, we apply performance metrics (on satellite images) to analyze the performance of the Lanczos resampling method. A comparison of the Lanczos resampling method with other resampling methods, such as nearest-neighbor resampling and sinc resampling, is done based on metrics pertaining to entropy, mean relative error, and time. We propose that the Lanczos resampling method is a good method from a qualitative and quantitative point of view when compared with the other two resampling methods, and that it is an optimal method for image resampling in the arena of remote sensing when compared with the other methods used. This, we hope, will enhance the understanding of the classified images' characteristics in a quantitative manner.
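For reference, a minimal one-dimensional sketch of the Lanczos kernel and its use for resampling a signal to a new length; the two-dimensional image case applies the same kernel separably along rows and columns. The window size a and the edge handling are illustrative choices, not taken from the paper.

```python
import numpy as np

def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x/a) for |x| < a, zero elsewhere
    (np.sinc is the normalized sinc, sin(pi x) / (pi x))."""
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def lanczos_resample_1d(signal, new_length, a=3):
    """Resample a 1-D signal (one image row, for instance) to new_length
    samples using Lanczos interpolation."""
    old_length = len(signal)
    out = np.empty(new_length)
    for j in range(new_length):
        x = j * (old_length - 1) / (new_length - 1)     # position on the old grid
        taps = np.arange(int(np.floor(x)) - a + 1, int(np.floor(x)) + a + 1)
        idx = np.clip(taps, 0, old_length - 1)          # clamp taps at the edges
        w = lanczos_kernel(x - taps, a)
        out[j] = np.sum(w * signal[idx]) / np.sum(w)    # normalize the weights
    return out

row = np.sin(np.linspace(0, 2 * np.pi, 32))
print(lanczos_resample_1d(row, 48).shape)
```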

Journal ArticleDOI
TL;DR: An easy-to-implement framework for monitoring nonparametric profiles in both Phase I and Phase II of a control chart scheme that can appropriately accommodate the dependence structure of the within-profile observations is proposed.

Journal ArticleDOI
TL;DR: It is concluded that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis, despite the problems introduced by data-dependent model building.
Abstract: In deriving a regression model, analysts often have to use variable selection, despite the problems introduced by data-dependent model building. Resampling approaches have been proposed to handle some of the critical issues. In order to assess and compare several strategies, we conduct a simulation study with 15 predictors and a complex correlation structure in the linear regression model. Using sample sizes of 100 and 400 and estimates of the residual variance corresponding to R2 of 0.50 and 0.71, we consider 4 scenarios with varying amounts of information. We also consider two examples with 24 and 13 predictors, respectively. We discuss the value of cross-validation, shrinkage and backward elimination (BE) with varying significance levels. We assess whether 2-step approaches using global or parameterwise shrinkage (PWSF) can improve selected models and compare the results to models derived with the LASSO procedure. Besides MSE, we use model sparsity and further criteria for model assessment. The amount of information in the data has an influence on the selected models and the comparison of the procedures. None of the approaches was best in all scenarios. The performance of backward elimination with a suitably chosen significance level was not worse than that of the LASSO, and the models selected by BE were much sparser, an important advantage for interpretation and transportability. Compared to global shrinkage, PWSF had better performance. Provided that the amount of information is not too small, we conclude that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis.
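A minimal sketch of the backward elimination (BE) step discussed above, assuming ordinary least squares and a p-value-based drop rule; the post-selection parameterwise shrinkage (PWSF) and cross-validation studied in the paper are not implemented here, and the helper name and data are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, alpha=0.157):
    """Backward elimination for linear regression: repeatedly drop the
    predictor with the largest p-value above alpha (alpha = 0.157 roughly
    corresponds to AIC-based selection for a single parameter). Shrinkage of
    the selected coefficients would be applied afterwards."""
    cols = list(range(X.shape[1]))
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = fit.pvalues[1:]                      # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            return cols, fit                         # all remaining predictors pass
        cols.pop(worst)                              # drop the weakest predictor
    return cols, None

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 15))
y = X[:, 0] * 2 + X[:, 3] - X[:, 7] + rng.normal(0, 2, 100)
kept, fit = backward_elimination(X, y)
print(kept)
```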