
Showing papers on "Resampling published in 2000"


Journal ArticleDOI
TL;DR: This article reviews the common algorithms for resampling and methods for constructing bootstrap confidence intervals, together with some less well known ones, highlighting their strengths and weaknesses.
Abstract: Since the early 1980s, a bewildering array of methods for constructing bootstrap confidence intervals have been proposed. In this article, we address the following questions. First, when should bootstrap confidence intervals be used? Secondly, which method should be chosen, and thirdly, how should it be implemented? In order to do this, we review the common algorithms for resampling and methods for constructing bootstrap confidence intervals, together with some less well known ones, highlighting their strengths and weaknesses. We then present a simulation study, a flow chart for choosing an appropriate method and a survival analysis example.
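As a concrete illustration of the resampling step behind such intervals, here is a minimal sketch (Python, not the authors' code) of the percentile and basic bootstrap intervals for a generic statistic; the function name and defaults are illustrative assumptions.

```python
import numpy as np

def bootstrap_ci(x, stat=np.mean, n_boot=9999, alpha=0.05, seed=0):
    """Percentile and basic bootstrap confidence intervals for stat(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    theta_hat = stat(x)
    # Resample the data with replacement and recompute the statistic each time.
    boot = np.array([stat(rng.choice(x, size=x.size, replace=True))
                     for _ in range(n_boot)])
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    percentile = (lo, hi)                              # percentile interval
    basic = (2 * theta_hat - hi, 2 * theta_hat - lo)   # basic (reflected) interval
    return percentile, basic

# Example on a small skewed sample
sample = np.random.default_rng(1).lognormal(size=50)
print(bootstrap_ci(sample))
```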

1,416 citations


Book
01 Jan 2000
TL;DR: This book presents an applied overview of multivariate statistical techniques from an ecological perspective, covering ordination (principal components analysis), cluster analysis, discriminant analysis, and canonical correlation analysis, including significance tests and validation based on resampling procedures.
Abstract: 1 Introduction and Overview.- 1.1 Objectives.- 1.2 Multivariate Statistics: An Ecological Perspective.- 1.3 Multivariate Description and Inference.- 1.4 Multivariate Confusion!.- 1.5 Types of Multivariate Techniques.- 1.5.1 Ordination.- 1.5.2 Cluster Analysis.- 1.5.3 Discriminant Analysis.- 1.5.4 Canonical Correlation Analysis.- 2 Ordination: Principal Components Analysis.- 2.1 Objectives.- 2.2 Conceptual Overview.- 2.2.1 Ordination.- 2.2.2 Principal Components Analysis (PCA).- 2.3 Geometric Overview.- 2.4 The Data Set.- 2.5 Assumptions.- 2.5.1 Multivariate Normality.- 2.5.2 Independent Random Sample and the Effects of Outliers.- 2.5.3 Linearity.- 2.6 Sample Size Requirements.- 2.6.1 General Rules.- 2.6.2 Specific Rules.- 2.7 Deriving the Principal Components.- 2.7.1 The Use of Correlation and Covariance Matrices.- 2.7.2 Eigenvalues and Associated Statistics.- 2.7.3 Eigenvectors and Scoring Coefficients.- 2.8 Assessing the Importance of the Principal Components.- 2.8.1 Latent Root Criterion.- 2.8.2 Scree Plot Criterion.- 2.8.3 Broken Stick Criterion.- 2.8.4 Relative Percent Variance Criterion.- 2.8.5 Significance Tests.- 2.9 Interpreting the Principal Components.- 2.9.1 Principal Component Structure.- 2.9.2 Significance of Principal Component Loadings.- 2.9.3 Interpreting the Principal Component Structure.- 2.9.4 Communality.- 2.9.5 Principal Component Scores and Associated Plots.- 2.10 Rotating the Principal Components.- 2.11 Limitations of Principal Components Analysis.- 2.12 R-Factor Versus Q-Factor Ordination.- 2.13 Other Ordination Techniques.- 2.13.1 Polar Ordination.- 2.13.2 Factor Analysis.- 2.13.3 Nonmetric Multidimensional Scaling.- 2.13.4 Reciprocal Averaging.- 2.13.5 Detrended Correspondence Analysis.- 2.13.6 Canonical Correspondence Analysis.- Appendix 2.1.- 3 Cluster Analysis.- 3.1 Objectives.- 3.2 Conceptual Overview.- 3.3 The Definition of Cluster.- 3.4 The Data Set.- 3.5 Clustering Techniques.- 3.6 Nonhierarchical Clustering.- 3.6.1 Polythetic Agglomerative Nonhierarchical Clustering.- 3.6.2 Polythetic Divisive Nonhierarchical Clustering.- 3.7 Hierarchical Clustering.- 3.7.1 Polythetic Agglomerative Hierarchical Clustering.- 3.7.2 Polythetic Divisive Hierarchical Clustering.- 3.8 Evaluating the Stability of the Cluster Solution.- 3.9 Complementary Use of Ordination and Cluster Analysis.- 3.10 Limitations of Cluster Analysis.- Appendix 3.1.- 4 Discriminant Analysis.- 4.1 Objectives.- 4.2 Conceptual Overview.- 4.2.1 Overview of Canonical Analysis of Discriminance.- 4.2.2 Overview of Classification.- 4.2.3 Analogy with Multiple Regression Analysis and Multivariate Analysis of Variance.- 4.3 Geometric Overview.- 4.4 The Data Set.- 4.5 Assumptions.- 4.5.1 Equality of Variance-Covariance Matrices.- 4.5.2 Multivariate Normality.- 4.5.3 Singularities and Multicollinearity.- 4.5.4 Independent Random Sample and the Effects of Outliers.- 4.5.5 Prior Probabilities Are Identifiable.- 4.5.6 Linearity 153.- 4.6 Sample Size Requirements.- 4.6.1 General Rules.- 4.6.2 Specific Rules.- 4.7 Deriving the Canonical Functions.- 4.7.1 Stepwise Selection of Variables.- 4.7.2 Eigenvalues and Associated Statistics.- 4.7.3 Eigenvectors and Canonical Coefficients.- 4.8 Assessing the Importance of the Canonical Functions.- 4.8.1 Relative Percent Variance Criterion.- 4.8.2 Canonical Correlation Criterion.- 4.8.3 Classification Accuracy.- 4.8.4 Significance Tests.- 4.8.5 Canonical Scores and Associated Plots.- 4.9 Interpreting the Canonical Functions.- 4.9.1 Standardized Canonical 
Coefficients.- 4.9.2 Total Structure Coefficients.- 4.9.3 Covariance-Controlled Partial F-Ratios.- 4.9.4 Significance Tests Based on Resampling Procedures.- 4.9.5 Potency Index.- 4.10 Validating the Canonical Functions.- 4.10.1 Split-Sample Validation.- 4.10.2 Validation Using Resampling Procedures.- 4.11 Limitations of Discriminant Analysis.- Appendix 4.1.- 5 Canonical Correlation Analysis.- 5.1 Objectives.- 5.2 Conceptual Overview.- 5.3 Geometric Overview.- 5.4 The Data Set.- 5.5 Assumptions.- 5.5.1 Multivariate Normality.- 5.5.2 Singularities and Multicollinearity.- 5.5.3 Independent Random Sample and the Effects of Outliers.- 5.5.4 Linearity.- 5.6 Sample Size Requirements.- 5.6.1 General Rules.- 5.6.2 Specific Rules.- 5.7 Deriving the Canonical Variates.- 5.7.1 The Use of Covariance and Correlation Matrices.- 5.7.2 Eigenvalues and Associated Statistics.- 5.7.3 Eigenvectors and Canonical Coefficients.- 5.8 Assessing the Importance of the Canonical Variates.- 5.8.1 Canonical Correlation Criterion.- 5.8.2 Canonical Redundancy Criterion.- 5.8.3 Significance Tests.- 5.8.4 Canonical Scores and Associated Plots.- 5.9 Interpreting the Canonical Variates.- 5.9.1 Standardized Canonical Coefficients.- 5.9.2 Structure Coefficients.- 5.9.3 Canonical Cross-Loadings.- 5.9.4 Significance Tests Based on Resampling Procedures.- 5.10 Validating the Canonical Variates.- 5.10.1 Split-Sample Validation.- 5.10.2 Validation Using Resampling Procedures.- 5.11 Limitations of Canonical Correlation Analysis.- Appendix 5.1.- 6 Summary and Comparison.- 6.1 Objectives.- 6.2 Relationship Among Techniques.- 6.2.1 Purpose and Source of Variation Emphasized.- 6.2.2 Statistical Procedure.- 6.2.3 Type of Statistical Technique and Variable Set Characteristics.- 6.2.4 Data Structure.- 6.2.5 Sampling Design.- 6.3 Complementary Use of Techniques.- Appendix: Acronyms Used in This Book.

1,371 citations


Journal ArticleDOI
TL;DR: It is shown how the non-parametric bootstrap provides a more flexible alternative for comparing arithmetic mean costs between randomized groups, avoiding the assumptions which limit other methods.
Abstract: Health economic evaluations are now more commonly being included in pragmatic randomized trials. However a variety of methods are being used for the presentation and analysis of the resulting cost data, and in many cases the approaches taken are inappropriate. In order to inform health care policy decisions, analysis needs to focus on arithmetic mean costs, since these will reflect the total cost of treating all patients with the disease. Thus, despite the often highly skewed distribution of cost data, standard non-parametric methods or use of normalizing transformations are not appropriate. Although standard parametric methods of comparing arithmetic means may be robust to non-normality for some data sets, this is not guaranteed. While the randomization test can be used to overcome assumptions of normality, its use for comparing means is still restricted by the need for similarly shaped distributions in the two groups. In this paper we show how the non-parametric bootstrap provides a more flexible alternative for comparing arithmetic mean costs between randomized groups, avoiding the assumptions which limit other methods. Details of several bootstrap methods for hypothesis tests and confidence intervals are described and applied to cost data from two randomized trials. The preferred bootstrap approaches are the bootstrap-t or variance stabilized bootstrap-t and the bias corrected and accelerated percentile methods. We conclude that such bootstrap techniques can be recommended either as a check on the robustness of standard parametric methods, or to provide the primary statistical analysis when making inferences about arithmetic means for moderately sized samples of highly skewed data such as costs.
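As a hedged sketch of one of the preferred approaches named above, the bootstrap-t interval for a difference in arithmetic mean costs between two randomized arms might look as follows; this is a generic illustration under assumed inputs, not the trial analysis code.

```python
import numpy as np

def boot_t_ci_diff(a, b, n_boot=9999, alpha=0.05, seed=0):
    """Bootstrap-t confidence interval for mean(a) - mean(b) (e.g., costs per arm)."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    t_star = np.empty(n_boot)
    for i in range(n_boot):
        ra = rng.choice(a, a.size, replace=True)   # resample within each arm
        rb = rng.choice(b, b.size, replace=True)
        se_star = np.sqrt(ra.var(ddof=1) / ra.size + rb.var(ddof=1) / rb.size)
        t_star[i] = ((ra.mean() - rb.mean()) - diff) / se_star
    q_lo, q_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    return diff - q_hi * se, diff - q_lo * se      # bootstrap-t interval
```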

637 citations


Book
01 Jan 2000
TL;DR: This book surveys methods for analyzing pretest-posttest data, including difference and relative change scores, analysis of covariance, blocking, repeated measures ANOVA, and randomization tests, with Monte Carlo comparisons of the approaches and accompanying SAS code.
Abstract: INTRODUCTION Clinical Applications of Pretest-Posttest Data Why use the Pretest Data Graphical Presentation of Pretest-Posttest Data How to Analyze Pretest-Postest Data: Possible Solutions A Note on SAS Notation Focus of the Book MEASUREMENT CONCEPTS What is Validity? What is Reliability? What is Regression Towards the Mean? Why is Regression Towards the Mean Important? Dealing with Regression Towards the Mean and How to Take Advantage of Test-Retest Reliability What is Pretest Sensitization? Controlling for Pretest Sensitization with Factorial Designs Alternative Methods for Controlling for Pretest Sensitization DIFFERENCE SCORES Definition and Assumptions Case 1: The Absence of a Treatment Intervention Between Measurement of the Pretest and Posttest Scores Case 2: The Application of a Treatment Intervention Between Measurement of the Pretest and Posttest Scores Nonparametric Alternative to Case 1 or Case 2 Case 3: Two Groups with Different Treatment Interventions Between Measurement of Pretest and Posttest Scores Case 4: More than Two Groups with Different Treatment Interventions Between Measurement of Pretest and Posttest Scores Unreliability of Difference Scores Testing the Distribution of Change and Relative change Scores Effect of Regression Towards the Mean on Difference Scores RELATIVE CHANGE FUNCTIONS Definitions and Assumptions Statistical Analyses with Change Scores Change Scores and Regression Towards the Mean Difference Scores or Relative change Scores? Other Relative change Functions Distribution of Relative change Scores ANALYSIS OF COVARIANCE Definitions and Assumptions Parametric ANCOVA ANCOVA with Difference Scores as the Dependent Variable ANCOVA using Percent change as the Dependent Variable Assumptions of the ANCOVA Violation of Homogeneity of Within-Groups Regression Coefficients Error-in-Variables ANCOVA Other Violations Effect of Outliers and Influential Observations Nonrandom Assignment of Subject to Treatment Groups BLOCKING TECHNIQUES Using Stratification to Control for the Pretest Post-Hoc Stratification REPEATED MEASURES ANALYSIS OF VARIANCE Using Repeated Measures ANOVA for Analysis of Pretest-Posttest Data Regression Towards the Mean with Multiple Posttest Measurements Using Repeated Measures ANOVA for Analysis of Pretest-Posttest Data with Multiple Posttest Measurements Analysis of Repeated Measures using Summary Measures CHOOSING A STATISTICAL TEST Choosing a Test Based on how the Data will be Presented Generation of Bivariate, Normally Distributed Data with a Specified Covariance Structure Monte Carlo Simulation when the Assumptions of the Statistical Test are Met Monte Carlo simulation when Systematic Bias Affects the Pretest and Posttest Equally Monte Carlo Simulation when the variance of the Posttest Scores does not Equal the Variance of the Pretest Scores Monte Carlo Simulation when Subjects are Grouped A Priori based on Pretest Score Monte Carlo Simulation when the Marginal Distribution of the Pretest and Posttest Scores is Non-Normal RANDOMIZATION TESTS Permutation Tests and Randomization Tests Randomization Tests and Pretest-Posttest Data Analysis of Covariance Resampling within Block or Time Periods Resampling with Missing Data SPECIAL TOPICS: EQUALITY OF VARIANCE Methods and Procedures APPENDIX: SAS Code

288 citations


Journal ArticleDOI
TL;DR: In this article, a semiparametric approach to smoothing sample extremes, based on local polynomial fitting of the generalized extreme value distribution and related models, is proposed, which is applied to data on extreme temperatures and on record times for the women's 3000 m race.
Abstract: Trends in sample extremes are of interest in many contexts, an example being environmental statistics. Parametric models are often used to model trends in such data, but they may not be suitable for exploratory data analysis. This paper outlines a semiparametric approach to smoothing sample extremes, based on local polynomial fitting of the generalized extreme value distribution and related models. The uncertainty of fits is assessed by using resampling methods. The methods are applied to data on extreme temperatures and on record times for the women's 3000 m race.

161 citations


Journal ArticleDOI
TL;DR: Both the simple nonparametric reference interval estimation procedure and the resampling (bootstrap) principle were studied using simulations based on distribution types that should be relevant for clinical chemistry, i.e., gaussian and skewed distributions.
Abstract: In recent years, increasing interest has arisen in nonparametric estimation of reference intervals. The IFCC recommendation focuses on the nonparametric procedure, and the NCCLS guideline on reference interval estimation deals exclusively with the nonparametric approach (1)(2). The mentioned reports are based on the simple nonparametric approach, taking as a basis the sorted sample values. In addition to this basic approach, modern computer-based procedures have been introduced, which have made it possible to attain slightly increased precision for the nonparametric approach by applying resampling methods, weighted percentile estimation, or smoothing techniques (3)(4). In the present report, both the simple nonparametric reference interval estimation procedure and the resampling (bootstrap) principle were studied using simulations based on distribution types that should be relevant for clinical chemistry, i.e., gaussian and skewed distributions. According to the procedure recommended by the IFCC and NCCLS, the observations are ranked according to size, and the 2.5 and 97.5 percentiles are obtained as the 0.025 (n + 1) and 0.975 (n + 1) ordered observations (1)(2). If the estimated rank values are not integers, then linear interpolation is carried out. In the statistical literature, various modifications of the computation procedure have been considered (5)(6)(7). Here the traditional one used in clinical chemistry as outlined above (called method I) is compared with an alternative (called method II): p /100 × n + 0.5, where p indicates the percentile (6). For the 2.5 and 97.5 percentiles, method II yields the 0.025n + 0.5 and 0.975n + 0.5 ordered values, respectively. In the following, the above-mentioned calculation principles are referred to as “simple” procedures (IS or IIS) as opposed to “bootstrap” modifications described below (IB or IIB). The bootstrap principle consists of repeated random resampling of the original observations with replacement, which …
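For concreteness, here is a sketch of the two rank-based rules quoted above (method I: 0.025(n + 1) and 0.975(n + 1); method II: p/100 × n + 0.5) together with a simple bootstrap variant in the spirit of IB/IIB; the function names and the averaging over resamples are assumptions for illustration, not the exact published procedure.

```python
import numpy as np

def percentile_by_rank(x, p, method="I"):
    """p in percent (e.g. 2.5). Rank rule I: p/100*(n+1); rule II: p/100*n + 0.5."""
    x = np.sort(np.asarray(x, float))
    n = x.size
    r = p / 100 * (n + 1) if method == "I" else p / 100 * n + 0.5
    r = min(max(r, 1.0), float(n))       # clamp to the available ranks
    lo = int(np.floor(r))
    frac = r - lo
    hi = min(lo + 1, n)
    return (1 - frac) * x[lo - 1] + frac * x[hi - 1]   # linear interpolation, 1-based ranks

def bootstrap_reference_interval(x, n_boot=2000, method="I", seed=0):
    """Bootstrap modification: average the percentile estimates over resamples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    lows, highs = [], []
    for _ in range(n_boot):
        xb = rng.choice(x, size=x.size, replace=True)   # resample with replacement
        lows.append(percentile_by_rank(xb, 2.5, method))
        highs.append(percentile_by_rank(xb, 97.5, method))
    return float(np.mean(lows)), float(np.mean(highs))
```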

120 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed simple resampling methods obtained by convexifying Powell's approach in the resampling stage, which can be implemented by efficient linear programming, and showed that the methods are reliable even with moderate sample sizes.

118 citations


Journal ArticleDOI
TL;DR: The 1 to 24 steps ahead load forecasts are obtained through multilayer perceptrons trained by the backpropagation algorithm, and three techniques for the computation of confidence intervals for this neural network based short-term load forecasting are presented.
Abstract: Using traditional statistical models, like ARMA and multilinear regression, confidence intervals can be computed for the short-term electric load forecasting, assuming that the forecast errors are independent and Gaussian distributed. In this paper, the 1 to 24 steps ahead load forecasts are obtained through multilayer perceptrons trained by the backpropagation algorithm. Three techniques for the computation of confidence intervals for this neural network based short-term load forecasting are presented: (1) error output; (2) resampling; and (3) multilinear regression adapted to neural networks. A comparison of the three techniques is performed through simulations of online forecasting.
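A minimal sketch of the resampling idea in technique (2): form empirical prediction intervals by adding resampled validation-set forecast errors to the network's point forecasts. The variable names are illustrative, and the trained multilayer perceptron is assumed to exist outside this snippet.

```python
import numpy as np

def resampled_intervals(point_forecast, val_errors, alpha=0.05, n_boot=5000, seed=0):
    """point_forecast: array of the 1 to 24 steps ahead forecasts from a trained MLP.
    val_errors: 1-D array of forecast errors observed on a validation set."""
    rng = np.random.default_rng(seed)
    point_forecast = np.asarray(point_forecast, float)
    draws = rng.choice(val_errors, size=(n_boot, point_forecast.size), replace=True)
    sims = point_forecast + draws                    # simulated forecast outcomes
    lower = np.quantile(sims, alpha / 2, axis=0)
    upper = np.quantile(sims, 1 - alpha / 2, axis=0)
    return lower, upper
```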

112 citations


Reference BookDOI
TL;DR: This edited reference volume collects contributions on time series modeling and nonparametric and semiparametric inference, including breakdown theory for bootstrap and other resampling-based estimators and second-order properties of the stationary bootstrap for studentized statistics.
Abstract: Some examples of empirical fourier analysis in scientific problems modeling and inference for periodically correlated time series modeling time series of count data seasonal and cyclical long memory nonparametric specification procedures for time series parameter estimation and model selection for multistep prediction of a time series - a review nonlinear estimation for time series observed on arrays some contributions to multivariate nonlinear time series and to bilinear models optimal testing for semiparametric AR models - from Gaussian Lagrange multipliers to autoregression rank scores and adaptive tests statistical analysis based on functionals of nonparametric spectral density estimators efficient estimation in a semiparametric additive regression model with ARMA errors efficient estimation in Markov chain models - an introduction nonparametric functional estimation - an overview minimum distance and nonparametric dispersion functions estimators of changes on inverse estimation approaches for semiparametric Bayesian regression consistency issues in Bayesian nonparametrics breakdown theory for estimators based on bootstrap and other resampling schemes on second-order properties of the stationary bootstrap method for studentized statistics convergence to equilibrium of random dynamical systems generated by IID monotone maps, with applications to economics chi-squared tests of goodness-of-fit for dependent observations positive and negative dependence with some statistical applications second-order information loss due to nuisance parameters - a simple measure. Appendix: publications of Madan Lal Puri.

101 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a method for assessing the validity of applying IRT models when ability estimates are imprecise, where the posterior expectations that are computed are dependent and the distribution of the goodness-of-fit statistic is unknown.
Abstract: Assessing the correspondence between model predictions and observed data is a recommended procedure for justifying the application of an IRT model. However, with shorter tests, current goodness-of-fit procedures that assume precise point estimates of ability, are inappropriate. The present paper describes a goodness-of-fit statistic that considers the imprecision with which ability is estimated and involves constructing item fit tables based on each examinee's posterior distribution of ability, given the likelihood of their response pattern and an assumed marginal ability distribution. However, the posterior expectations that are computed are dependent and the distribution of the goodness-of-fit statistic is unknown. The present paper also describes a Monte Carlo resampling procedure that can be used to assess the significance of the fit statistic and compares this method with a previously used method. The results indicate that the method described herein is an effective and reasonably simple procedure for assessing the validity of applying IRT models when ability estimates are imprecise.

82 citations


Journal ArticleDOI
TL;DR: In this paper, the idea of permutation testing is extended in this application to include confidence intervals for the thresholds and p-values estimated in permutation test procedures, and the confidence intervals are used to account for the Monte Carlo error associated with practical applications.
Abstract: Locating quantitative trait loci (QTL), or genomic regions associated with known molecular markers, is of increasing interest in a wide variety of applications ranging from human genetics to agricultural genetics. The hope of locating QTL (or genes) affecting a quantitative trait is that it will lead to characterization and possible manipulations of these genes. However, the complexity of both statistical and genetic issues surrounding the location of these regions calls into question the asymptotic statistical results supplying the distribution of the test statistics employed. Coupled with the power of current-day computing, permutation theory was reintroduced for the purpose of estimating the distribution of any test statistic used to test for the location of QTL. Permutation techniques have offered an attractive alternative to significance measures based on asymptotic theory. The ideas of permutation testing are extended in this application to include confidence intervals for the thresholds and p-values estimated in permutation testing procedures. The confidence intervals developed account for the Monte Carlo error associated with practical applications of permutation testing and lead to an effective method of determining an efficient permutation sample size.
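A simplified sketch of the permutation machinery with a Monte Carlo error band on the estimated p-value (a normal-approximation binomial interval); the test statistic and the add-one convention are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def permutation_pvalue(stat_fn, y, x, n_perm=1000, seed=0):
    """Permutation p-value with an approximate 95% Monte Carlo confidence interval."""
    rng = np.random.default_rng(seed)
    t_obs = stat_fn(y, x)
    exceed = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)            # break the trait-marker association
        exceed += stat_fn(y_perm, x) >= t_obs
    p_hat = (exceed + 1) / (n_perm + 1)        # add-one rule keeps the estimate positive
    half = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n_perm)   # Monte Carlo error band
    return p_hat, (max(p_hat - half, 0.0), min(p_hat + half, 1.0))

# Example statistic: squared correlation between trait values y and a marker score x.
stat = lambda y, x: np.corrcoef(y, x)[0, 1] ** 2
```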

Reference BookDOI
TL;DR: This edited reference volume covers sampling design and multivariate analysis, with chapters on resampling marked point processes, multivariate rank tests, variance estimators for multivariate nonparametric regression, growth curve models, and optimal Bayesian design for logistic regression.
Abstract: Sampling designs and prediction methods for Gaussian spatial processes design techniques for probabilistic sampling of items with variable monetary value small area estimation -a Bayesian perspective Bayes sampling designs for selection procedures cluster coordinated composites of diverse datasets on several spatial scales for designing extensive environmental sample surveys - prospectus on promising protocols corrected confidence sets for sequentially designed experiments, II -examples resampling marked point processes graphical Markov models in multivariate analysis robust regression with censored and truncated data multivariate calibration some consequences of random effects in multivariate survival models a unified methodology for constructing multivariate autoregressive models statistical model evaluation and information criteria multivariate rank tests asymptotic expansions of the distribution of some test statistics for elliptical populations a review of variance estimators with extensions to multivariate nonparametric regression models on affine invariant sign and rank tests in one and two sample multivariate problems correspondence and component analysis growth curve models dealing with uncertainties in queues and networks of queues optimal Bayesian design for a logistic regression model - geometric and algebraic approaches structure of weighing matrices of small order and weight.

Journal ArticleDOI
TL;DR: In this article, a computer-intensive significance test for estimated power spectra of cyclic sedimentary successions is presented, which requires no more than a few minutes of computer time on a PC-486 and does not require distributional assumptions.

Journal ArticleDOI
TL;DR: In this article, the authors describe a graphical approach, with associated randomization tests, for detecting changes between two or more sample times in the distribution of a variable measured over a number of monitoring sites.
Abstract: We describe a graphical approach for detecting changes in the distribution of the variable over a number of monitoring sites between two or more sample times, with associated randomization tests. This method was derived from the cumulative sum (CUSUM) method that was developed initially for industrial process control, but our use differs in some fundamental ways. In particular, the standard CUSUM procedure is used to detect changes with time in the mean of a variable at one location, whereas our concern is with detecting changes with time in the distribution of a variable measured at a number of different locations. We compare our randomization test for any changes in distribution with the Mann-Kendall test and analysis of covariance in terms of the power for detecting a systematic time trend affecting all sites, with and without serial correlation in time. We also compare these different tests in the situation where trend occurs, but it is not the same at all sites, again with and without serial correlation in time. All tests were adversely affected by high serial correlation, so we repeated the comparisons with the CUSUM and analysis of covariance tests modified to take this into account. We conclude that although our randomization test sometimes has less power than the Mann-Kendall test and analysis of covariance for detecting trends, it does have reasonable power and also has the ability to detect other types of change with time. The modified randomization test is very robust to serial correlation.
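The randomization component can be sketched generically as follows: permute the sample-time labels independently within each site and recompute a change statistic. This is only a minimal illustration of the resampling idea with an arbitrary statistic, not the authors' CUSUM-based procedure.

```python
import numpy as np

def randomization_change_test(data, n_perm=999, seed=0):
    """data: array of shape (n_sites, n_times).
    Illustrative statistic: mean absolute change across sites between the
    first and last sample times."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, float)

    def stat(d):
        return np.mean(np.abs(d[:, -1] - d[:, 0]))

    t_obs = stat(data)
    count = 0
    for _ in range(n_perm):
        # Permute the time labels independently within each site (row).
        permuted = np.array([rng.permutation(row) for row in data])
        count += stat(permuted) >= t_obs
    return t_obs, (count + 1) / (n_perm + 1)
```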

Journal ArticleDOI
TL;DR: The various resampling methods are compared using the new error measure SD_T, the spectral distortion at interval T, which is in turn compared with the reconstruction error.

Abstract: With resampling, a regularly sampled signal is extracted from observations which are irregularly spaced in time. Resampling methods can be divided into simple and complex methods. Simple methods such as Sample and Hold (S and H) and Nearest Neighbor Resampling (NNR) use only one irregular sample for one resampled observation. A theoretical analysis of the simple methods is given. The various resampling methods are compared using the new error measure SD_T: the spectral distortion at interval T. SD_T is zero when the time domain properties of the signal are conserved. Using the time domain approach, an antialiasing filter is no longer necessary: the best possible estimates are obtained by using the data themselves. In the frequency domain approach, both allowing aliasing and applying antialiasing leads to distortions in the spectrum. The error measure SD_T has been compared to the reconstruction error. A small reconstruction error does not necessarily result in an accurate estimate of the statistical signal properties as expressed by SD_T.
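A minimal sketch of the two simple schemes named above, Sample and Hold and Nearest Neighbor Resampling, mapping irregularly spaced observations onto a regular time grid; the implementation is my own reading of those definitions.

```python
import numpy as np

def sample_and_hold(t_irr, x_irr, t_grid):
    """Hold the most recent irregular observation at or before each grid time."""
    t_irr, x_irr = np.asarray(t_irr), np.asarray(x_irr)    # t_irr assumed sorted
    idx = np.searchsorted(t_irr, t_grid, side="right") - 1
    idx = np.clip(idx, 0, t_irr.size - 1)                  # before the first sample: hold it
    return x_irr[idx]

def nearest_neighbor(t_irr, x_irr, t_grid):
    """Take the irregular observation closest in time to each grid time."""
    t_irr, x_irr = np.asarray(t_irr), np.asarray(x_irr)
    t_grid = np.asarray(t_grid)
    idx = np.clip(np.searchsorted(t_irr, t_grid), 1, t_irr.size - 1)
    left_closer = (t_grid - t_irr[idx - 1]) <= (t_irr[idx] - t_grid)
    return x_irr[np.where(left_closer, idx - 1, idx)]
```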

Journal ArticleDOI
TL;DR: A novel method for resampling and enhancing image data using multidimensional adaptive filters is presented and clearly shows an improvement over conventional resampling techniques such as cubic spline interpolation and sinc interpolation.

Journal ArticleDOI
07 Feb 2000-Talanta
TL;DR: Applications of some computer-intensive statistical techniques to chemical problems are demonstrated; Monte Carlo resampling and Latin hypercube sampling do not require sophisticated and often unavailable mathematical treatment.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: Novel techniques are presented for generation of random realisations from the joint smoothing distribution and for MAP estimation of the state sequence in nonlinear non-Gaussian dynamical models.
Abstract: We develop methods for performing filtering and smoothing in nonlinear non-Gaussian dynamical models. The methods rely on a particle cloud representation of the filtering distribution which evolves through time using importance sampling and resampling ideas. In particular, novel techniques are presented for generation of random realisations from the joint smoothing distribution and for MAP estimation of the state sequence. Realisations of the smoothing distribution are generated in a forward-backward procedure, while the MAP estimation procedure can be performed in a single forward pass of the Viterbi algorithm applied to a discretised version of the state space. An application to spectral estimation for time-varying autoregressions is described.
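The resampling step of such a particle-cloud filter can be sketched as follows; systematic resampling is one common choice, shown here as a generic illustration rather than the authors' implementation.

```python
import numpy as np

def systematic_resample(particles, weights, seed=None):
    """Resample a particle cloud in proportion to its importance weights."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, float)
    w = w / w.sum()
    n = w.size
    positions = (rng.uniform() + np.arange(n)) / n    # one stratified point per slot
    idx = np.searchsorted(np.cumsum(w), positions)
    idx = np.minimum(idx, n - 1)                      # guard against round-off at 1.0
    return np.asarray(particles)[idx], np.full(n, 1.0 / n)   # equal weights afterwards
```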

Journal ArticleDOI
TL;DR: In this paper, the nonparametric bootstrap for a single unstructured sample is shown to correspond to the algebraic operation of monoid composition, with a uniform distribution on the monoid.
Abstract: The nonparametric, or resampling, bootstrap for a single unstructured sample corresponds to the algebraic operation of monoid composition, with a uniform distribution on the monoid. With this interpretation, the notion of resampling can be extended to designs having a certain group-invariance property. Two types of exchangeable array structures are considered in some detail, namely the one-way layout, and the two-way row-column exchangeable design. Although in both cases there is a unique group under which the sampling distribution of the observations is exchangeable, the choice of monoid is not unique. Different choices of monoid can lead to drastically different, and in some cases quite misleading, inferences.

Journal ArticleDOI
TL;DR: The post-blackening (PB) approach is introduced for modeling annual streamflows that exhibit significant dependence and seems to offer considerable scope for improvement in hydrologic time series modeling and its applications to water resources planning.

Journal ArticleDOI
TL;DR: This work introduces three families of tapers for time series with gaps and presents a novel resampling technique that gave confidence intervals that attained the correct confidence level in simulations of helioseismic data with gaps.
Abstract: Gaps in time series can produce spurious features in power spectrum estimates. These artifacts can be suppressed by averaging spectrum estimates obtained by first windowing the time series with a collection of orthogonal tapers. Such "multitaper" methods have been used for data without gaps since the early 1980s and for more general sampling schemes since the late 1980s. We introduce three families of tapers for time series with gaps. Two of the families solve optimization problems. They minimize bounds on different measures of bias. Computing them involves solving large eigenvalue problems with special structure that can be exploited to construct efficient algorithms. The third family solves no particular optimization problem but is inexpensive to compute and gives spectrum estimates that are quite similar to the other two for actual and simulated helioseismic data. All three families of gap-adapted multitaper estimates have lower variance and bias than the periodogram. In simulations of helioseismic data with gaps, standard methods for constructing confidence intervals for multitaper spectrum estimates, including parametric approximations and resampling in the temporal and spectral domains, all failed to attain their nominal confidence level. We present a novel resampling technique that, in the same simulations, gave confidence intervals that attained the correct confidence level.

Journal ArticleDOI
TL;DR: Scheiner et al. as discussed by the authors used two resampling techniques, namely the jackknife and the bootstrap, along with the Taylor series approximation and transformation method, for the construction of confidence intervals.

Posted Content
TL;DR: In this article, the authors apply two non-ignorable non-response models to the data of the Norwegian Labour Force Survey, the Fertility Survey and the Alveolar Bone Loss Survey.
Abstract: We apply two non-ignorable non-response models to the data of the Norwegian Labour Force Survey, the Fertility Survey and the Alveolar Bone Loss Survey. Both models focus on the marginal effect which the object variable of interest has on the non-response, where we assume the probability of non-response to be generalized proportional to the size of the object variable. We draw the inference of the parameter of interest based on the first-order theory of the profile likelihood. We adapt the Markov chain sampling techniques to efficiently generate the profile likelihood inference. We explain and demonstrate why the resampling approach is more flexible for the likelihood inference than under the Bayesian framework.

Journal ArticleDOI
TL;DR: It is shown that by permutation‐based resampling, statistical significance may be computed for each voxel belonging to a cluster of interest without parametric distributional assumptions.
Abstract: Exploratory, data-driven analysis approaches such as cluster analysis, principal component analysis, independent component analysis, or neural network-based techniques are complementary to hypothesis-led methods. They may be considered as hypothesis generating methods. The representative time courses they produce may be viewed as alternative hypotheses to the null hypothesis, i.e., “no activation.” We present here a resampling technique to validate the results of exploratory fuzzy clustering analysis. In this case an alternative hypothesis is represented by a cluster centroid. For both simulated and in vivo functional magnetic resonance imaging data, we show that by permutation-based resampling, statistical significance may be computed for each voxel belonging to a cluster of interest without parametric distributional assumptions.
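As a generic illustration of permutation-based voxel significance (not the authors' exact scheme), the correlation of each voxel's time course with the cluster centroid can be compared against a null distribution built by permuting the voxel's time points; note that this simple null ignores temporal autocorrelation.

```python
import numpy as np

def voxel_permutation_p(voxel_ts, centroid_ts, n_perm=1999, seed=0):
    """Permutation p-value for the correlation between one voxel time course
    and a cluster centroid time course (illustrative null: shuffled time points)."""
    rng = np.random.default_rng(seed)
    voxel_ts, centroid_ts = np.asarray(voxel_ts, float), np.asarray(centroid_ts, float)
    r_obs = np.corrcoef(voxel_ts, centroid_ts)[0, 1]
    count = 0
    for _ in range(n_perm):
        r_perm = np.corrcoef(rng.permutation(voxel_ts), centroid_ts)[0, 1]
        count += abs(r_perm) >= abs(r_obs)
    return (count + 1) / (n_perm + 1)
```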

Journal ArticleDOI
TL;DR: In this paper, the cross ratio in Clayton's copula was estimated under the assumption that covariates are incorporated into the marginal distributions via semiparametric accelerated life regression models.
Abstract: We consider estimation of the cross ratio in Clayton's (1978) copula in which covariates are incorporated into the marginal distributions via semiparametric accelerated life regression models. Generalisations of Oakes' (1982, 1986) concordance estimating equations yield a closed form for the association parameter under right censoring. Joint inferences about covariate effects in the marginal models and the cross ratio are obtained with U-statistics, martingales and a resampling technique for nonsmooth estimating equations. Simulating under an exponential model, we find that our procedure may provide a more precise estimate of association than other methods for proportional hazards models. A goodness-of-fit test for the assumed copula is presented and used in an analysis of a diabetic retinopathy dataset.

Journal ArticleDOI
TL;DR: This paper demonstrates the equivalence of the two separable resampling algorithms in the sense that they produce identical output scanlines, and derives a variation of Fant's algorithm that applies when the forward mapping is given and a variation of Wolberg's algorithm when the inverse map is given.
Abstract: Separable resampling algorithms significantly reduce the complexity of image warping. Fant presented a separable algorithm that is well suited for hardware implementation. That method, however, is ...

Journal ArticleDOI
TL;DR: In this paper, the authors extend the Wild Bootstrap approach to censored regression, provide the asymptotic distribution of the underlying marked empirical process, and apply the method to simulated data sets as well as to the Stanford Heart Transplant Data.
Abstract: Let M be a parametric model for an unknown regression function m. In order to check the validity of M, i.e., to test for m ∈ M, it is known that optimal tests should be based on the empirical process of the regressors marked by the residuals. In this paper we extend the methodology to censored regression. The asymptotic distribution of the underlying marked empirical process is provided. The Wild Bootstrap, appropriately modified to account for censorship, provides distributional approximations. The method is applied to simulated data sets as well as to the Stanford Heart Transplant Data.

Journal ArticleDOI
TL;DR: Exact and Monte Carlo resampling FORTRAN programs are described for the Wilcoxon-Mann-Whitney rank-sum test and the Kruskal-Wallis one-way analysis of variance for ranks test.
Abstract: Exact and Monte Carlo resampling FORTRAN programs are described for the Wilcoxon-Mann-Whitney rank sum test and the Kruskal-Wallis one-way analysis of variance for ranks test. The program algorithms compensate for tied values and do not depend on asymptotic approximations for probability values, unlike most algorithms contained in PC-based statistical software packages.
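The programs described are in FORTRAN; purely as an illustration of the Monte Carlo resampling variant with midranks for tied values, a short Python sketch:

```python
import numpy as np
from scipy.stats import rankdata   # default method="average" gives midranks for ties

def monte_carlo_rank_sum(x, y, n_resamples=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the Wilcoxon-Mann-Whitney
    rank-sum statistic of group x, with ties handled through midranks."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([np.asarray(x, float), np.asarray(y, float)])
    n_x = len(x)
    ranks = rankdata(pooled)                   # midranks of the pooled sample
    w_obs = ranks[:n_x].sum()
    mu = n_x * (pooled.size + 1) / 2           # null mean of the rank sum
    count = 0
    for _ in range(n_resamples):
        perm = rng.permutation(pooled.size)    # random reallocation to the two groups
        w = ranks[perm[:n_x]].sum()
        count += abs(w - mu) >= abs(w_obs - mu)
    return (count + 1) / (n_resamples + 1)
```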

Proceedings Article
01 Jan 2000
TL;DR: This paper describes a resampling-based multiple comparison technique that is illustrated on the estimation of the number of hidden units for feed-forward neural networks.
Abstract: In statistical modelling, an investigator must often choose a suitable model among a collection of viable candidates. There is no consensus in the research community on how such a comparative study is performed in a methodologically sound way. The ranking of several methods is usually performed by the use of a selection criterion, which assigns a score to every model based on some underlying statistical principles. The fitted model that is favoured is the one corresponding to the minimum (or the maximum) score. Statistical significance testing can extend this method. However, when enough pairwise tests are performed, the multiplicity effect appears, which can be taken into account by considering multiple comparison procedures. The existing comparison procedures can roughly be categorized as analytical or resampling-based. This paper describes a resampling-based multiple comparison technique. This method is illustrated on the estimation of the number of hidden units for feed-forward neural networks.
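A hedged sketch of one way such a resampling-based comparison can work: bootstrap the paired per-case loss differences between each candidate model and the best-scoring one, and use the maximum over candidates to account for the multiplicity effect. This is an illustration under assumed inputs, not the authors' procedure.

```python
import numpy as np

def compare_models(losses, n_boot=5000, seed=0):
    """losses: array of shape (n_cases, n_models) holding per-case test losses."""
    rng = np.random.default_rng(seed)
    losses = np.asarray(losses, float)
    n, m = losses.shape
    best = losses.mean(axis=0).argmin()
    diffs = losses - losses[:, [best]]          # paired differences versus the best model
    obs = diffs.mean(axis=0)
    centred = diffs - obs                       # impose the null of no difference
    max_null = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)             # resample cases with replacement
        max_null[b] = centred[idx].mean(axis=0).max()
    # Multiplicity-adjusted p-values: how often a maximal null gap exceeds each observed gap.
    p_adj = (max_null[:, None] >= obs[None, :]).mean(axis=0)
    return best, obs, p_adj
```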

Journal ArticleDOI
TL;DR: Assessment of confidence limits at different levels of correctness is discussed using standard and bootstrap methods, and the inferiority of standard normal approaches becomes evident even in mildly non-linear situations.