
Showing papers on "Resampling" published in 2015


Journal ArticleDOI
TL;DR: A frequentist analogue to SUCRA, the P-score, is proposed; it is based solely on the point estimates and standard errors of the frequentist network meta-analysis estimates under a normality assumption and can easily be calculated as the mean of one-sided p-values.
Abstract: Network meta-analysis is used to compare three or more treatments for the same condition. Within a Bayesian framework, for each treatment the probability of being best, or, more generally, the probability that it has a certain rank can be derived from the posterior distributions of all treatments. The treatments can then be ranked by the surface under the cumulative ranking curve (SUCRA). For comparing treatments in a network meta-analysis, we propose a frequentist analogue to SUCRA which we call P-score that works without resampling. P-scores are based solely on the point estimates and standard errors of the frequentist network meta-analysis estimates under a normality assumption and can easily be calculated as means of one-sided p-values. They measure the mean extent of certainty that a treatment is better than the competing treatments. Using case studies of network meta-analysis in diabetes and depression, we demonstrate that the numerical values of SUCRA and P-score are nearly identical. Ranking treatments in frequentist network meta-analysis works without resampling. Like the SUCRA values, P-scores induce a ranking of all treatments that mostly follows that of the point estimates, but takes precision into account. However, neither SUCRA nor P-score offer a major advantage compared to looking at credible or confidence intervals.

819 citations
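As a rough illustration of the P-score calculation described above, the sketch below averages one-sided normal p-values over all pairwise comparisons, assuming a matrix of pairwise effect estimates and their standard errors is already available from a frequentist network meta-analysis; the variable names and toy numbers are illustrative, not taken from the paper.

```python
# Minimal sketch of a P-score-style ranking from pairwise effect estimates
# theta[i, j] (treatment i vs j, positive values favouring i) and standard
# errors se[i, j]. Illustrative only.
import numpy as np
from scipy.stats import norm

def p_scores(theta, se):
    """Mean one-sided 'certainty' that each treatment beats its competitors."""
    k = theta.shape[0]
    scores = np.empty(k)
    for i in range(k):
        others = [j for j in range(k) if j != i]
        # one-sided p-values P(effect of i vs j > 0) under normality
        p_better = norm.cdf(theta[i, others] / se[i, others])
        scores[i] = p_better.mean()
    return scores

# toy example: 3 treatments, antisymmetric effect matrix
theta = np.array([[0.0, 0.3, 0.5],
                  [-0.3, 0.0, 0.2],
                  [-0.5, -0.2, 0.0]])
se = np.full((3, 3), 0.15)
print(p_scores(theta, se))  # higher P-score = better-ranked treatment
```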


Journal ArticleDOI
TL;DR: This work adopts a U-statistics-based C estimator that is asymptotically normal and develops a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators, which is illustrated with an example from the Framingham Heart Study.
Abstract: The area under the receiver operating characteristic curve is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of the area under the receiver operating characteristic curve, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, which could be either two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics-based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study.

238 citations
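The final comparison step described above can be sketched as a simple z-test, assuming the two C-index estimates together with their variances and covariance have already been obtained elsewhere (for example by the paper's U-statistic-based analytical approach or by resampling); the numbers below are placeholders.

```python
# Hedged sketch: z-score comparison of two C indices given their estimated
# variances and covariance. Inputs are illustrative placeholders.
import math
from scipy.stats import norm

def compare_c_indices(c1, c2, var1, var2, cov12):
    """Two-sided z-test for H0: C1 == C2."""
    se_diff = math.sqrt(var1 + var2 - 2.0 * cov12)
    z = (c1 - c2) / se_diff
    p = 2.0 * (1.0 - norm.cdf(abs(z)))
    return z, p

z, p = compare_c_indices(c1=0.74, c2=0.70, var1=4e-4, var2=5e-4, cov12=2e-4)
print(f"z = {z:.2f}, p = {p:.3f}")
```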


Journal ArticleDOI
TL;DR: The purpose of this paper is to present specialized measures for assessing the imbalance level in multilabel datasets (MLDs) and to propose several algorithms designed to reduce the imbalance in MLDs in a classifier-independent way, by means of resampling techniques.

204 citations


Journal ArticleDOI
TL;DR: A diversified sensitivity-based undersampling method that, by iteratively clustering and sampling, selects a balanced set of samples yielding high classifier sensitivity and achieves good generalization on 14 UCI datasets.
Abstract: Undersampling is a widely adopted method to deal with imbalance pattern classification problems. Current methods mainly depend on either random resampling on the majority class or resampling at the decision boundary. Random-based undersampling fails to take into consideration informative samples in the data while resampling at the decision boundary is sensitive to class overlapping. Both techniques ignore the distribution information of the training dataset. In this paper, we propose a diversified sensitivity-based undersampling method. Samples of the majority class are clustered to capture the distribution information and enhance the diversity of the resampling. A stochastic sensitivity measure is applied to select samples from both clusters of the majority class and the minority class. By iteratively clustering and sampling, a balanced set of samples yielding high classifier sensitivity is selected. The proposed method yields a good generalization capability for 14 UCI datasets.

195 citations
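A much-simplified sketch of the clustering idea above (not the authors' full algorithm, and omitting the stochastic sensitivity measure) might look like the following, using k-means to preserve the majority-class distribution before drawing a balanced subset; all names are illustrative.

```python
# Simplified cluster-based undersampling: cluster the majority class, then draw
# an (approximately) equal number of majority samples from each cluster so the
# resulting set is balanced against the minority class.
import numpy as np
from sklearn.cluster import KMeans

def clustered_undersample(X_maj, X_min, n_clusters=5, random_state=0):
    rng = np.random.default_rng(random_state)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X_maj)
    per_cluster = max(1, len(X_min) // n_clusters)
    keep = []
    for c in range(n_clusters):
        idx = np.flatnonzero(km.labels_ == c)
        take = rng.choice(idx, size=min(per_cluster, len(idx)), replace=False)
        keep.extend(take.tolist())
    X_bal = np.vstack([X_maj[keep], X_min])
    y_bal = np.concatenate([np.zeros(len(keep)), np.ones(len(X_min))])
    return X_bal, y_bal

X_maj = np.random.randn(500, 2) + 2   # toy majority class
X_min = np.random.randn(40, 2)        # toy minority class
X_bal, y_bal = clustered_undersample(X_maj, X_min)
print(X_bal.shape, y_bal.mean())      # roughly balanced classes
```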


Journal ArticleDOI
TL;DR: Bayesian phylogenetic analyses of simulated virus data are conducted to evaluate the performance of the date-randomization test and propose guidelines for interpretation of its results, finding that the test sometimes fails to detect rate estimates from data with no temporal signal.
Abstract: Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to detect rate estimates from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, such that a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for interpretation of its results.

155 citations
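The two decision rules discussed above can be written down compactly; the sketch below assumes the rate estimate and 95% credible intervals have already been extracted from the Bayesian analyses, and the inputs are made-up placeholders rather than output of any particular phylogenetic package.

```python
# Sketch of the date-randomization test decision rules: the original criterion
# checks whether the mean rate (correct dates) falls outside every randomized
# CI; the more conservative criterion requires the CIs themselves not to overlap.
def date_randomization_test(true_mean, true_ci, randomized_cis, conservative=True):
    """Return True if the data set passes the date-randomization test."""
    if conservative:
        # stricter criterion: the CI with correct dates must not overlap any
        # CI obtained with randomized sampling times
        return all(true_ci[1] < lo or true_ci[0] > hi for lo, hi in randomized_cis)
    # original criterion: the mean with correct dates must fall outside every
    # randomized CI
    return all(not (lo <= true_mean <= hi) for lo, hi in randomized_cis)

passes = date_randomization_test(
    true_mean=1.2e-3, true_ci=(0.9e-3, 1.5e-3),
    randomized_cis=[(2.1e-3, 4.0e-3), (1.8e-3, 3.5e-3)])
print(passes)  # True: the randomized-date intervals do not overlap the estimate
```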


Journal ArticleDOI
TL;DR: Of the different methods evaluated, only the Knapp and Hartung method and the permutation test provide adequate control of the Type I error rate across all conditions.
Abstract: Several alternative methods are available when testing for moderators in mixed-effects meta-regression models. A simulation study was carried out to compare different methods in terms of their Type I error and statistical power rates. We included the standard (Wald-type) test, the method proposed by Knapp and Hartung (2003) in 2 different versions, the Huber-White method, the likelihood ratio test, and the permutation test in the simulation study. These methods were combined with 7 estimators for the amount of residual heterogeneity in the effect sizes. Our results show that the standard method, applied in most meta-analyses up to date, does not control the Type I error rate adequately, sometimes leading to overly conservative, but usually to inflated, Type I error rates. Of the different methods evaluated, only the Knapp and Hartung method and the permutation test provide adequate control of the Type I error rate across all conditions. Due to its computational simplicity, the Knapp and Hartung method is recommended as a suitable option for most meta-analyses.

140 citations


Journal ArticleDOI
TL;DR: It is demonstrated not only that bootstrapping has insufficient statistical power to provide a rigorous hypothesis test in most conditions but also that bootstrapping has a tendency to exhibit an inflated Type I error rate.
Abstract: Bootstrapping is an analytical tool commonly used in psychology to test the statistical significance of the indirect effect in mediation models. Bootstrapping proponents have particularly advocated for its use for samples of 20-80 cases. This advocacy has been heeded, especially in the Journal of Applied Psychology, as researchers are increasingly utilizing bootstrapping to test mediation with samples in this range. We discuss reasons to be concerned with this escalation, and in a simulation study focused specifically on this range of sample sizes, we demonstrate not only that bootstrapping has insufficient statistical power to provide a rigorous hypothesis test in most conditions but also that bootstrapping has a tendency to exhibit an inflated Type I error rate. We then extend our simulations to investigate an alternative empirical resampling method as well as a Bayesian approach and demonstrate that they exhibit comparable statistical power to bootstrapping in small samples without the associated inflated Type I error. Implications for researchers testing mediation hypotheses in small samples are presented. For researchers wishing to use these methods in their own research, we have provided R syntax in the online supplemental materials.

139 citations
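For readers unfamiliar with the procedure being evaluated, the following is a minimal sketch of a percentile-bootstrap test of the indirect effect a*b in a simple X -> M -> Y mediation model; it is illustrative only and is not the authors' simulation code.

```python
# Percentile bootstrap for the indirect effect in simple mediation:
# a = slope of M on X, b = slope of Y on M controlling for X, effect = a * b.
import numpy as np

def indirect_effect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                       # a path
    Xmat = np.column_stack([np.ones_like(x), x, m])
    coef, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    b = coef[2]                                      # b path
    return a * b

def bootstrap_ci(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    est = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)                  # resample cases with replacement
        est[i] = indirect_effect(x[idx], m[idx], y[idx])
    return np.quantile(est, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(1)
x = rng.normal(size=50)
m = 0.5 * x + rng.normal(size=50)
y = 0.4 * m + rng.normal(size=50)
print(bootstrap_ci(x, m, y))  # "significant" if the interval excludes zero
```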


Journal ArticleDOI
TL;DR: In this article, a modified permutation approach is proposed to improve the small sample behavior of the Wald-type statistic, maintaining its applicability to general settings as crossed or hierarchically nested designs.
Abstract: Summary In general factorial designs where no homoscedasticity or a particular error distribution is assumed, the well-known Wald-type statistic is a simple asymptotically valid procedure. However, it is well known that it suffers from a poor finite sample approximation since the convergence to its χ2 limit distribution is quite slow. This becomes even worse with an increasing number of factor levels. The aim of the paper is to improve the small sample behaviour of the Wald-type statistic, maintaining its applicability to general settings as crossed or hierarchically nested designs by applying a modified permutation approach. In particular, it is shown that this approach approximates the null distribution of the Wald-type statistic not only under the null hypothesis but also under the alternative yielding an asymptotically valid permutation test which is even finitely exact under exchangeability. Finally, its small sample behaviour is compared with competing procedures in an extensive simulation study.

122 citations
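As a toy illustration of the general idea, the sketch below computes a Wald-type statistic for equal means in a heteroscedastic one-way layout and approximates its null distribution by permuting the pooled observations; the paper's results cover general crossed and hierarchically nested factorial designs, which this simplified example does not attempt to reproduce.

```python
# Wald-type statistic for equal group means with unequal variances, with a
# permutation approximation of its null distribution (simplified one-way case).
import numpy as np

def wald_type_statistic(groups):
    means = np.array([g.mean() for g in groups])
    var_of_means = np.array([g.var(ddof=1) / len(g) for g in groups])
    k = len(groups)
    H = np.hstack([np.eye(k - 1), -np.ones((k - 1, 1))])  # contrasts vs last group
    hm = H @ means
    cov = H @ np.diag(var_of_means) @ H.T
    return float(hm @ np.linalg.solve(cov, hm))

def permutation_wald_test(groups, n_perm=2000, seed=0):
    rng = np.random.default_rng(seed)
    sizes = [len(g) for g in groups]
    pooled = np.concatenate(groups)
    w_obs = wald_type_statistic(groups)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        splits = np.split(perm, np.cumsum(sizes)[:-1])   # keep the group sizes
        count += wald_type_statistic(splits) >= w_obs
    return w_obs, count / n_perm

rng = np.random.default_rng(2)
groups = [rng.normal(0, 1, 12), rng.normal(0, 3, 8), rng.normal(0.5, 2, 15)]
print(permutation_wald_test(groups))  # (observed statistic, permutation p-value)
```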


Journal ArticleDOI
02 Jun 2015 - BMJ
TL;DR: The data from a single sample are used here to quantify the variation in the estimate of interest across (hypothetical) multiple samples from the same population.
Abstract: In medical research we study a sample of individuals to make inferences about a target population. Estimates of interest, such as a mean or a difference in proportions, are calculated, usually accompanied by a confidence interval derived from the standard error. The data from a single sample are used here to quantify the variation in the estimate of interest across (hypothetical) multiple samples from the same population.1 As we have only one sample we need to make assumptions about the data. Most methods of analysis are called parametric because they incorporate assumptions about the distribution of the data, such as that observations follow a normal distribution. Non-parametric methods avoid assumptions about distributions but generally provide only P values and not estimates of quantities of interest.2 For a given dataset the assumptions may not be met. In such cases there is an alternative way to estimate standard errors and confidence intervals without any reliance on assumed probability distributions. We use the sample dataset and apply a resampling procedure called …

103 citations
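The truncated sentence above introduces a resampling procedure; assuming the intended method is the familiar bootstrap, a minimal sketch of a bootstrap standard error and percentile confidence interval for a sample mean looks like this.

```python
# Bootstrap standard error and percentile CI for a statistic, using only the
# observed sample and no distributional assumptions. Illustrative sketch.
import numpy as np

def bootstrap_se_ci(sample, stat=np.mean, n_boot=10000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    boots = np.array([stat(rng.choice(sample, size=sample.size, replace=True))
                      for _ in range(n_boot)])
    se = boots.std(ddof=1)
    ci = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return se, ci

data = [5.1, 4.8, 6.3, 5.9, 4.4, 5.5, 6.1, 5.0]
se, ci = bootstrap_se_ci(data)
print(f"bootstrap SE = {se:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```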


Journal ArticleDOI
TL;DR: Resampling approaches are proposed that change the distribution of the given data set to reduce the imbalance between the rare target cases and the most frequent ones, addressing the problem of forecasting rare extreme values of a continuous target variable in regression tasks.
Abstract: Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance that was thoroughly studied within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important applications involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks by resampling approaches that change the distribution of the given data set to decrease the problem of imbalance between the rare target cases and the most frequent ones. We present two modifications of well-known resampling strategies for classification tasks: the under-sampling and the synthetic minority over-sampling technique (SMOTE) methods. These modifications allow the use of these strategies on regression tasks where the goal is to forecast rare extreme values of the target variable. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed resampling methods can be used with any existing regression algorithm, which means that they are general tools for addressing problems of forecasting rare extreme values of a continuous target variable.

97 citations
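A deliberately simplified sketch of random under-sampling for regression with rare extreme (here, high) target values is given below; the quantile-based relevance threshold is an assumption of the sketch, and the SMOTE-for-regression variant is not shown.

```python
# Under-sampling for imbalanced regression: keep all "rare" cases above a
# relevance threshold and subsample the frequent ones. Names are illustrative.
import numpy as np

def undersample_regression(X, y, rare_quantile=0.9, keep_frac=0.3, seed=0):
    rng = np.random.default_rng(seed)
    threshold = np.quantile(y, rare_quantile)
    rare = y >= threshold                      # rare extreme (high) target values
    frequent_idx = np.flatnonzero(~rare)
    n_keep = max(1, int(keep_frac * frequent_idx.size))
    kept = rng.choice(frequent_idx, size=n_keep, replace=False)
    idx = np.concatenate([np.flatnonzero(rare), kept])
    return X[idx], y[idx]

X = np.random.randn(1000, 4)
y = np.random.exponential(scale=1.0, size=1000)     # right-skewed toy target
X_res, y_res = undersample_regression(X, y)
print(len(y_res), (y_res >= np.quantile(y, 0.9)).mean())  # rare cases now dominate less
```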


16 Nov 2015
TL;DR: Conditional inference procedures for the general independence problem including two-sample, K-sample (non-parametric ANOVA), correlation, censored, ordered and multivariate problems.
Abstract: R package metadata (DESCRIPTION file, version 1.0-6, dated 2009-09-08). Title: Conditional Inference Procedures in a Permutation Test Framework. Authors: Torsten Hothorn, Kurt Hornik, Mark A. van de Wiel and Achim Zeileis; maintainer: Torsten Hothorn. Description: Conditional inference procedures for the general independence problem including two-sample, K-sample (non-parametric ANOVA), correlation, censored, ordered and multivariate problems. Depends: R (>= 2.2.0), methods, survival, mvtnorm (>= 0.8-0), modeltools (>= 0.2-9). Suggests: multcomp, xtable, e1071, vcd. Enhances: Biobase. License: GPL-2. Repository: CRAN.

Journal ArticleDOI
TL;DR: REML analysis supports the previous conclusion that the G matrix for this population is full rank, and the REML-MVN approach is computationally very efficient, making it an attractive alternative to both data resampling and MCMC approaches to assessing confidence in parameters of evolutionary interest.
Abstract: We explore the estimation of uncertainty in evolutionary parameters using a recently devised approach for resampling entire additive genetic variance-covariance matrices (G). Large-sample theory shows that maximum-likelihood estimates (including restricted maximum likelihood, REML) asymptotically have a multivariate normal distribution, with covariance matrix derived from the inverse of the information matrix, and mean equal to the estimated G. This suggests that sampling estimates of G from this distribution can be used to assess the variability of estimates of G, and of functions of G. We refer to this as the REML-MVN method. This has been implemented in the mixed-model program WOMBAT. Estimates of sampling variances from REML-MVN were compared to those from the parametric bootstrap and from a Bayesian Markov chain Monte Carlo (MCMC) approach (implemented in the R package MCMCglmm). We apply each approach to evolvability statistics previously estimated for a large, 20-dimensional data set for Drosophila wings. REML-MVN and MCMC sampling variances are close to those estimated with the parametric bootstrap. Both slightly underestimate the error in the best-estimated aspects of the G matrix. REML analysis supports the previous conclusion that the G matrix for this population is full rank. REML-MVN is computationally very efficient, making it an attractive alternative to both data resampling and MCMC approaches to assessing confidence in parameters of evolutionary interest.
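A bare-bones sketch of the REML-MVN idea is shown below: sample parameter vectors from a multivariate normal centred on the REML estimate with covariance equal to the inverse information matrix, rebuild G from each draw, and summarise any function of G (here the mean eigenvalue). The inputs are made-up placeholders rather than WOMBAT output.

```python
# REML-MVN-style resampling of a G matrix: draw parameter vectors from a
# multivariate normal and propagate them through a function of G.
import numpy as np

def reml_mvn_samples(theta_hat, inv_info, n_draws=1000, seed=0):
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(theta_hat, inv_info, size=n_draws)

def g_from_params(theta, dim):
    """Rebuild a symmetric G matrix from its lower-triangular parameters."""
    G = np.zeros((dim, dim))
    G[np.tril_indices(dim)] = theta
    return G + np.tril(G, -1).T

dim = 2
theta_hat = np.array([1.0, 0.3, 0.8])     # G11, G21, G22 (placeholder REML estimates)
inv_info = 0.01 * np.eye(3)               # placeholder sampling covariance
draws = reml_mvn_samples(theta_hat, inv_info)
mean_eigenvalue = [np.linalg.eigvalsh(g_from_params(t, dim)).mean() for t in draws]
print(np.mean(mean_eigenvalue), np.std(mean_eigenvalue))  # estimate and its sampling error
```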

Journal ArticleDOI
TL;DR: Using resampling methods and sensitivity analyses based on randomized subsamples, sampling error in horse teeth from several modern and fossil populations is assessed; the results indicate that mean centroid size is highly accurate; even when sample size is small, errors are generally considerably smaller than differences among populations.
Abstract: One of the most basic but problematic issues in modern morphometrics is how many specimens one needs to achieve accuracy in samples. Indeed, this is one of the most regularly posed questions in introductory courses. There is no simple and certainly no absolute answer to this question. However, there are a number of techniques for exploring the effect of sampling, and our aim is to provide an example of how this might function in a simplified but informative way. Thus, using resampling methods and sensitivity analyses based on randomized subsamples, we assessed sampling error in horse teeth from several modern and fossil populations. Centroid size and shape of an upper premolar (PM2) were captured using Procrustes geometric morphometrics. Means and variances (using three different statistics for shape variance) were estimated, as well as their confidence intervals. Also, the largest population sample was randomly split into progressively smaller subsamples to assess how reducing sample size affects statistical parameters. Results indicate that mean centroid size is highly accurate; even when sample size is small, errors are generally considerably smaller than differences among populations. In contrast, mean shape estimation requires large samples of tens of specimens (ca. >20), although this requirement may be less stringent when variance in a population is very small (e.g. populations that underwent strong genetic bottlenecks). Variance in either centroid size or shape can be highly inaccurate in small samples, to the point that sampling error makes it as variable as differences among spatially and chronologically well-separated populations, including two which are highly distinctive as a consequence of strong artificial selection. Likely, centroid size and shape variance require no fewer than 15–20 specimens to achieve a reasonable degree of accuracy. Results from the simplified sensitivity analysis were largely congruent with the pattern suggested by bootstrapped confidence intervals, as well as with the observations of a previous study on African monkeys. The analyses we performed, especially the sensitivity assessment, are simple and do not require much time or computational effort; however, they do necessitate that at least one sample is large (50 or more specimens). If this type of analysis became more common in geometric morphometrics, it could provide an effective tool for the preliminary exploration of the effect of sampling on results and therefore assist in assessing their robustness. Finally, as the use of sensitivity studies increases, the present case could form part of a set of examples that allow us to better understand and estimate what a desirable sample size might be, depending on the scientific question, type of data and taxonomic level under investigation.

Journal ArticleDOI
TL;DR: Practical guidance is provided on when these estimators are likely to provide substantial precision gains, and a quick assessment method is described that allows clinical investigators to determine whether the estimators could be useful in their specific trial contexts.
Abstract: We focus on estimating the average treatment effect in a randomized trial. If baseline variables are correlated with the outcome, then appropriately adjusting for these variables can improve precision. An example is the analysis of covariance (ANCOVA) estimator, which applies when the outcome is continuous, the quantity of interest is the difference in mean outcomes comparing treatment versus control, and a linear model with only main effects is used. ANCOVA is guaranteed to be at least as precise as the standard unadjusted estimator, asymptotically, under no parametric model assumptions and also is locally semiparametric efficient. Recently, several estimators have been developed that extend these desirable properties to more general settings that allow any real-valued outcome (e.g., binary or count), contrasts other than the difference in mean outcomes (such as the relative risk), and estimators based on a large class of generalized linear models (including logistic regression). To the best of our knowledge, we give the first simulation study in the context of randomized trials that compares these estimators. Furthermore, our simulations are not based on parametric models; instead, our simulations are based on resampling data from completed randomized trials in stroke and HIV in order to assess estimator performance in realistic scenarios. We provide practical guidance on when these estimators are likely to provide substantial precision gains and describe a quick assessment method that allows clinical investigators to determine whether these estimators could be useful in their specific trial contexts.

Journal ArticleDOI
TL;DR: A new command, xtbcfe, is described that performs the iterative bootstrap-based bias correction for the fixed-effects estimator in dynamic panels proposed by Everaert and Pozzi (2007) by using the invariance principle.
Abstract: In this article, we describe a new command, xtbcfe, that performs the iterative bootstrap-based bias correction for the fixed-effects estimator in dynamic panels proposed by Everaert and Pozzi (2007, Journal of Economic Dynamics and Control 31: 1160–1184). We first simplify the core of their algorithm by using the invariance principle and subsequently extend it to allow for unbalanced and higher order dynamic panels. We implement various bootstrap error resampling schemes to account for general heteroskedasticity and contemporaneous cross-sectional dependence. Inference can be performed using a bootstrapped variance–covariance matrix or percentile intervals. Monte Carlo simulations show that the simplification of the original algorithm results in a further bias reduction for very small T. The Monte Carlo results also support the bootstrap-based bias correction in higher order dynamic panels and panels with cross-sectional dependence. We illustrate the command with an empirical example estimating a dynamic labor–demand function.

Proceedings ArticleDOI
08 Dec 2015
TL;DR: In this paper, the impact of resampling on classification accuracy is experimentally investigated, resampling methods are compared, and the key points and difficulties of resampling are highlighted.
Abstract: In many real-world binary classification tasks (e.g. detection of certain objects from images), an available dataset is imbalanced, i.e., it has many fewer representatives of one class (a minor class) than of another. Generally, accurate prediction of the minor class is crucial but it's hard to achieve since there is not much information about the minor class. One approach to deal with this problem is to preliminarily resample the dataset, i.e., add new elements to the dataset or remove existing ones. Resampling can be done in various ways, which raises the problem of choosing the most appropriate one. In this paper we experimentally investigate the impact of resampling on classification accuracy, compare resampling methods and highlight key points and difficulties of resampling.

Journal ArticleDOI
TL;DR: A novel double resampling method is described to quantify uncertainty in MultSE values with increasing sample size; MultSE is proposed as a useful quantity for assessing sample-size adequacy in studies of ecological communities.
Abstract: Ecological studies require key decisions regarding the appropriate size and number of sampling units. No methods currently exist to measure precision for multivariate assemblage data when dissimilarity-based analyses are intended to follow. Here, we propose a pseudo multivariate dissimilarity-based standard error (MultSE) as a useful quantity for assessing sample-size adequacy in studies of ecological communities. Based on sums of squared dissimilarities, MultSE measures variability in the position of the centroid in the space of a chosen dissimilarity measure under repeated sampling for a given sample size. We describe a novel double resampling method to quantify uncertainty in MultSE values with increasing sample size. For more complex designs, values of MultSE can be calculated from the pseudo residual mean square of a permanova model, with the double resampling done within appropriate cells in the design. R code functions for implementing these techniques, along with ecological examples, are provided.
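One common formulation computes MultSE from the sum of squared dissimilarities among sample units; the sketch below uses that formulation together with a simple resampling over increasing sample sizes, which is a simplification of (not a substitute for) the paper's double resampling and PERMANOVA-based extensions. The toy community matrix and the exact formula used here are assumptions of the sketch.

```python
# Sketch of a pseudo multivariate SE: sqrt(V / n), with V a pseudo variance
# derived from the sum of squared dissimilarities among n sample units.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mult_se(D):
    """Pseudo multivariate SE from an n x n dissimilarity matrix D."""
    n = D.shape[0]
    ss = (D[np.triu_indices(n, k=1)] ** 2).sum() / n
    v = ss / (n - 1)
    return np.sqrt(v / n)

rng = np.random.default_rng(0)
community = rng.poisson(3.0, size=(60, 12))      # 60 samples x 12 species (toy data)
D = squareform(pdist(community, metric="braycurtis"))

for m in (5, 10, 20, 40):                        # MultSE for increasing sample size
    vals = []
    for _ in range(200):                         # simple resampling of m units
        idx = rng.choice(60, size=m, replace=False)
        vals.append(mult_se(D[np.ix_(idx, idx)]))
    print(m, round(float(np.mean(vals)), 4))     # precision improves as m grows
```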

Journal ArticleDOI
TL;DR: A nonparametric technique is proposed as an alternative to parametric error models to estimate the uncertainty of hydrological predictions, and the results obtained are compared with those obtained using a formal statistical technique on the same case study.
Abstract: Estimating the uncertainty of hydrological models remains a relevant challenge in applied hydrology, mostly because it is not easy to parameterize the complex structure of hydrological model errors. A nonparametric technique is proposed as an alternative to parametric error models to estimate the uncertainty of hydrological predictions. Within this approach, the above uncertainty is assumed to depend on input data uncertainty, parameter uncertainty and model error, where the latter aggregates all sources of uncertainty that are not considered explicitly. Errors of hydrological models are simulated by resampling from their past realizations using a nearest neighbor approach, therefore avoiding a formal description of their statistical properties. The approach is tested using synthetic data which refer to the case study located in Italy. The results are compared with those obtained using a formal statistical technique (meta-Gaussian approach) from the same case study. Our findings prove that the nea...
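An illustrative sketch of nearest-neighbour resampling of past model errors (not the authors' implementation) is given below: for a new prediction, the k most similar historical predictions are located and their realised errors are resampled to build a predictive ensemble; all data and names are placeholders.

```python
# Nearest-neighbour resampling of past model errors, avoiding any parametric
# description of the error distribution.
import numpy as np

def knn_error_sample(new_pred, past_preds, past_errors, k=10, n_samples=500, seed=0):
    rng = np.random.default_rng(seed)
    dist = np.abs(past_preds - new_pred)             # similarity in prediction space
    nearest = np.argsort(dist)[:k]                   # k most similar past predictions
    sampled = rng.choice(past_errors[nearest], size=n_samples, replace=True)
    return new_pred + sampled                        # ensemble of corrected predictions

past_preds = np.linspace(0, 100, 400)                # toy archive of past simulations
past_errors = np.random.default_rng(1).normal(0, 0.1 * past_preds + 1)
ensemble = knn_error_sample(new_pred=42.0, past_preds=past_preds, past_errors=past_errors)
print(np.quantile(ensemble, [0.05, 0.5, 0.95]))      # predictive uncertainty band
```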

Journal ArticleDOI
TL;DR: In this article, a Bayesian approach was proposed to predict the tool wear rate, linking vibration data measured during machining with the state of tool wear. But the results of the predicted tool wear state and off-line tool wear measurement were not comparable.

ReportDOI
TL;DR: In this paper, a permutation test based on induced ordered statistics for the null hypothesis of continuity of the distribution of baseline covariates at the cut-off is proposed, which is easy to implement and exhibits finite sample validity under stronger conditions than those needed for its asymptotic validity.
Abstract: In the regression discontinuity design (RDD), it is common practice to assess the credibility of the design by testing whether the means of baseline covariates do not change at the cut-off (or threshold) of the running variable. This practice is partly motivated by the stronger implication derived by Lee (2008), who showed that under certain conditions the distribution of baseline covariates in the RDD must be continuous at the cut-off. We propose a permutation test based on the so-called induced ordered statistics for the null hypothesis of continuity of the distribution of baseline covariates at the cut-off; and introduce a novel asymptotic framework to analyse its properties. The asymptotic framework is intended to approximate a small sample phenomenon: even though the total number n of observations may be large, the number of effective observations local to the cut-off is often small. Thus, while traditional asymptotics in RDD require a growing number of observations local to the cut-off as n→∞, our framework keeps the number q of observations local to the cut-off fixed as n→∞. The new test is easy to implement, asymptotically valid under weak conditions, exhibits finite sample validity under stronger conditions than those needed for its asymptotic validity, and has favourable power properties relative to tests based on means. In a simulation study, we find that the new test controls size remarkably well across designs. We then use our test to evaluate the plausibility of the design in Lee (2008), a well-known application of the RDD to study incumbency advantage.

Journal ArticleDOI
TL;DR: More than a dozen typical resampling methods are compared via simulations in terms of sample size variation, sampling variance, computing speed, and estimation accuracy, providing solid guidelines for either selection of existing resamplings methods or new implementations.
Abstract: Resampling is a critical procedure that is of both theoretical and practical significance for efficient implementation of the particle filter. To gain an insight of the resampling process and the filter, this paper contributes in three further respects as a sequel to the tutorial (Li et al., 2015). First, identical distribution (ID) is established as a general principle for the resampling design, which requires the distribution of particles before and after resampling to be statistically identical. Three consistent metrics including the (symmetrical) Kullback-Leibler divergence, Kolmogorov-Smirnov statistic, and the sampling variance are introduced for assessment of the ID attribute of resampling, and a corresponding, qualitative ID analysis of representative resampling methods is given. Second, a novel resampling scheme that obtains the optimal ID attribute in the sense of minimum sampling variance is proposed. Third, more than a dozen typical resampling methods are compared via simulations in terms of sample size variation, sampling variance, computing speed, and estimation accuracy. These form a more comprehensive understanding of the algorithm, providing solid guidelines for either selection of existing resampling methods or new implementations.
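Two of the classical schemes typically included in such comparisons, multinomial and systematic resampling, are sketched below as self-contained functions; systematic resampling generally shows the smaller sampling variance, one of the comparison criteria mentioned above. The toy weights are illustrative.

```python
# Multinomial vs systematic resampling of particle weights, comparing the
# variability of the offspring counts they produce.
import numpy as np

def multinomial_resample(weights, rng):
    n = len(weights)
    return rng.choice(n, size=n, replace=True, p=weights)

def systematic_resample(weights, rng):
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n    # one uniform draw, stratified grid
    return np.searchsorted(np.cumsum(weights), positions)

rng = np.random.default_rng(0)
w = rng.random(1000)
w /= w.sum()                                         # normalised particle weights
for name, scheme in (("multinomial", multinomial_resample),
                     ("systematic", systematic_resample)):
    counts = np.bincount(scheme(w, rng), minlength=len(w))
    print(name, "variance of offspring counts:", round(float(counts.var()), 3))
```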


Journal ArticleDOI
TL;DR: This work investigates a general resampling approach (BI-SS) that combines bootstrap imputation and stability selection, the latter of which was developed for fully observed data; the proposed approach is general and can be applied to a wide range of settings.
Abstract: In the presence of missing data, variable selection methods need to be tailored to missing data mechanisms and statistical approaches used for handling missing data. We focus on the mechanism of missing at random and variable selection methods that can be combined with imputation. We investigate a general resampling approach (BI-SS) that combines bootstrap imputation and stability selection, the latter of which was developed for fully observed data. The proposed approach is general and can be applied to a wide range of settings. Our extensive simulation studies demonstrate that the performance of BI-SS is the best or close to the best and is relatively insensitive to tuning parameter values in terms of variable selection, compared with several existing methods for both low-dimensional and high-dimensional problems. The proposed approach is further illustrated using two applications, one for a low-dimensional problem and the other for a high-dimensional problem.

Journal ArticleDOI
Mark Abney
TL;DR: A formula is provided that predicts the amount of inflation of the type 1 error rate depending on the degree of misspecification of the covariance structure of the polygenic effect and the heritability of the trait and is validated by doing simulations.
Abstract: This article discusses problems with and solutions to performing valid permutation tests for quantitative trait loci in the presence of polygenic effects. Although permutation testing is a popular approach for determining statistical significance of a test statistic with an unknown distribution--for instance, the maximum of multiple correlated statistics or some omnibus test statistic for a gene, gene-set, or pathway--naive application of permutations may result in an invalid test. The risk of performing an invalid permutation test is particularly acute in complex trait mapping where polygenicity may combine with a structured population resulting from the presence of families, cryptic relatedness, admixture, or population stratification. I give both analytical derivations and a conceptual understanding of why typical permutation procedures fail and suggest an alternative permutation-based algorithm, MVNpermute, that succeeds. In particular, I examine the case where a linear mixed model is used to analyze a quantitative trait and show that both phenotype and genotype permutations may result in an invalid permutation test. I provide a formula that predicts the amount of inflation of the type 1 error rate depending on the degree of misspecification of the covariance structure of the polygenic effect and the heritability of the trait. I validate this formula by doing simulations, showing that the permutation distribution matches the theoretical expectation, and that my suggested permutation-based test obtains the correct null distribution. Finally, I discuss situations where naive permutations of the phenotype or genotype are valid and the applicability of the results to other test statistics.

Journal ArticleDOI
01 Nov 2015
TL;DR: In this article, a general approximate permutation test (permutation of the residuals under the reduced model, or reduced residuals) is proposed for any repeated measures and mixed-model design, for any number of repetitions per cell, any number of subjects and factors, and for both balanced and unbalanced designs (all-cell-filled).
Abstract: Repeated measures ANOVA and mixed-model designs are the main classes of experimental designs used in psychology. The usual analysis relies on some parametric assumptions (typically Gaussianity). In this article, we propose methods to analyze the data when the parametric conditions do not hold. The permutation test, which is a non-parametric test, is suitable for hypothesis testing and can be applied to experimental designs. The application of permutation tests in simpler experimental designs such as factorial ANOVA or ANOVA with only between-subject factors has already been considered. The main purpose of this paper is to focus on more complex designs that include only within-subject factors (repeated measures) or designs that include both within-subject and between-subject factors (mixed-model designs). First, a general approximate permutation test (permutation of the residuals under the reduced model or reduced residuals) is proposed for any repeated measures and mixed-model designs, for any number of repetitions per cell, any number of subjects and factors and for both balanced and unbalanced designs (all-cell-filled). Next, a permutation test that uses residuals that are exchangeable up to the second moment is introduced for balanced cases in the same class of experimental designs. This permutation test is therefore exact for spherical data. Finally, we provide simulations results for the comparison of the level and the power of the proposed methods.

Journal ArticleDOI
TL;DR: A novel ordinal regression framework for predicting medical risk stratification from EMR is proposed: a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features, and two indices are introduced that measure the model stability against data resampling.
Abstract: The recent wide adoption of electronic medical records (EMRs) presents great opportunities and challenges for data mining. The EMR data are largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features, and at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework to a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability.

Journal ArticleDOI
TL;DR: Simulation studies and an application to an HIV clinical trial show that the proposed permutation test attains the nominal Type I error rate and can be drastically more powerful than the classical Mann-Whitney U test.
Abstract: The Mann-Whitney U test is frequently used to evaluate treatment effects in randomized experiments with skewed outcome distributions or small sample sizes. It may lack power, however, because it ignores the auxiliary baseline covariate information that is routinely collected. Wald and score tests in so-called probabilistic index models generalize the Mann-Whitney U test to enable adjustment for covariates, but these may lack robustness by demanding correct model specification and do not lend themselves to small sample inference. Using semiparametric efficiency theory, we here propose an alternative extension of the Mann-Whitney U test, which increases its power by exploiting covariate information in an objective way and which lends itself to permutation inference. Simulation studies and an application to an HIV clinical trial show that the proposed permutation test attains the nominal Type I error rate and can be drastically more powerful than the classical Mann-Whitney U test.

Journal ArticleDOI
TL;DR: An adaptive resampling test (ART) is proposed that provides an alternative to the popular (yet conservative) Bonferroni method of controlling family-wise error rates and is evaluated using a simulation study and applied to gene expression data and HIV drug resistance data.
Abstract: This article investigates marginal screening for detecting the presence of significant predictors in high-dimensional regression. Screening large numbers of predictors is a challenging problem due to the nonstandard limiting behavior of post-model-selected estimators. There is a common misconception that the oracle property for such estimators is a panacea, but the oracle property only holds away from the null hypothesis of interest in marginal screening. To address this difficulty, we propose an adaptive resampling test (ART). Our approach provides an alternative to the popular (yet conservative) Bonferroni method of controlling family-wise error rates. ART is adaptive in the sense that thresholding is used to decide whether the centered percentile bootstrap applies, and otherwise adapts to the nonstandard asymptotics in the tightest way possible. The performance of the approach is evaluated using a simulation study and applied to gene expression data and HIV drug resistance data.

Journal ArticleDOI
TL;DR: The results show that using a zero-constrained tree for data simulation can result in a wider null distribution and higher p-values, but does not change the outcome of the SOWH test for most of the data sets tested here.
Abstract: The Swofford-Olsen-Waddell-Hillis (SOWH) test evaluates statistical support for incongruent phylogenetic topologies. It is commonly applied to determine if the maximum likelihood tree in a phylogenetic analysis is significantly different from an alternative hypothesis. The SOWH test compares the observed difference in log-likelihood between two topologies to a null distribution of differences in log-likelihood generated by parametric resampling. The test is a well-established phylogenetic method for topology testing, but it is sensitive to model misspecification, it is computationally burdensome to perform, and its implementation requires the investigator to make several decisions that each have the potential to affect the outcome of the test. We analyzed the effects of multiple factors using seven data sets to which the SOWH test was previously applied. These factors include the number of sample replicates, likelihood software, the introduction of gaps to simulated data, the use of distinct models of evolution for data simulation and likelihood inference, and a suggested test correction wherein an unresolved "zero-constrained" tree is used to simulate sequence data. To facilitate these analyses and future applications of the SOWH test, we wrote SOWHAT, a program that automates the SOWH test. We find that inadequate bootstrap sampling can change the outcome of the SOWH test. The results also show that using a zero-constrained tree for data simulation can result in a wider null distribution and higher p-values, but does not change the outcome of the SOWH test for most of the data sets tested here. These results will help others implement and evaluate the SOWH test and allow us to provide recommendations for future applications of the SOWH test. SOWHAT is available for download from https://github.com/josephryan/SOWHAT. (Phylogenetics; SOWH test; topology test)

Journal ArticleDOI
TL;DR: In this paper, four hypothesis tests based on shifts of flow duration curves (FDCs) are developed and tested using three different experimental designs based on different strategies for resampling of annual FDCs.