
Showing papers on "Resampling published in 2010"


Book
27 Jul 2010
TL;DR: In this article, the multcomp package is used for multiple comparison procedures in parametric models, with applications including comparisons with a control, all pairwise comparisons, and comparisons under heteroscedasticity, together with resampling-based procedures.
Abstract: Introduction.- General Concepts: Error rates and general concepts; Construction methods; Methods based on Bonferroni's inequality; Methods based on Simes' inequality.- Multiple Comparisons in Parametric Models: General linear models; Extensions to general parametric models; The multcomp package.- Applications: Multiple comparisons with a control; All pairwise comparisons; Dose response analyses; Variable selection in regression models; Simultaneous confidence bands; Multiple comparisons under heteroscedasticity; Multiple comparisons in logistic regression models; Multiple comparisons in survival models; Multiple comparisons in mixed-effects models.- Further Topics: Resampling-based multiple comparison procedures; Group sequential and adaptive designs; Combining multiple comparisons with modeling.- Bibliography.- Index

923 citations


Journal ArticleDOI
TL;DR: Numerical simulations show that in tests of significance of the relationship between simple variables and multivariate data tables, the power of linear correlation, regression and canonical analysis is far greater than that of the Mantel test and derived forms, meaning that the former methods are much more likely than the latter to detect a relationship when one is present in the data.
Abstract: The Mantel test is widely used to test the linear or monotonic independence of the elements in two distance matrices. It is one of the few appropriate tests when the hypothesis under study can only be formulated in terms of distances; this is often the case with genetic data. In particular, the Mantel test has been widely used to test for spatial relationship between genetic data and spatial layout of the sampling locations. We describe the domain of application of the Mantel test and derived forms. Formula development demonstrates that the sum-of-squares (SS) partitioned in Mantel tests and regression on distance matrices differs from the SS partitioned in linear correlation, regression and canonical analysis. Numerical simulations show that in tests of significance of the relationship between simple variables and multivariate data tables, the power of linear correlation, regression and canonical analysis is far greater than that of the Mantel test and derived forms, meaning that the former methods are much more likely than the latter to detect a relationship when one is present in the data. Examples of difference in power are given for the detection of spatial gradients. Furthermore, the Mantel test does not correctly estimate the proportion of the original data variation explained by spatial structures. The Mantel test should not be used as a general method for the investigation of linear relationships or spatial structures in univariate or multivariate data. Its use should be restricted to tests of hypotheses that can only be formulated in terms of distances.
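For readers who want to experiment with the permutation scheme discussed here, the following is a minimal Python/NumPy sketch of a simple Mantel test; the statistic and permutation design follow the textbook formulation, not the authors' simulation code, and the function name and defaults are illustrative only.

```python
import numpy as np

def mantel_test(d1, d2, n_perm=999, rng=None):
    """Simple Mantel test: correlate the upper-triangular entries of two
    distance matrices and assess significance by permuting the objects
    (rows and columns) of one matrix."""
    rng = np.random.default_rng(rng)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)                 # off-diagonal, upper triangle
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                   # permute objects, not entries
        r_perm = np.corrcoef(d1[iu], d2[p][:, p][iu])[0, 1]
        if abs(r_perm) >= abs(r_obs):
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)     # statistic and permutation p-value
```

Permuting rows and columns jointly preserves the within-matrix distance structure, which is what distinguishes the Mantel test from a naive permutation of individual matrix entries.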

622 citations


Book ChapterDOI
01 Jan 2010
TL;DR: This paper provides an introduction to an alternative distribution-free approach based on an approximate randomization test, in which a subset of all possible data permutations between sample groups is drawn.
Abstract: To date, multi-group comparisons of Partial Least Squares (PLS) models, in which path estimates are compared across different sampled populations, have been relatively naive. Often, researchers simply examine and discuss the difference in magnitude of specific model path estimates from two or more data sets. When evaluating the significance of path differences, a t-test is applied, based on the pooled standard errors obtained from each data set via a resampling procedure such as bootstrapping. Yet problems can occur if the assumptions of normal populations or similar sample sizes do not hold. This paper provides an introduction to an alternative distribution-free approach based on an approximate randomization test, in which a subset of all possible data permutations between sample groups is drawn. The performance of this permutation procedure is tested both on simulated data and in a study exploring the differences in factors that impact outsourcing between the US and Germany. Furthermore, as an initial examination of the consistency of this new procedure, the outsourcing results are compared with those obtained from covariance-based SEM (AMOS 7).
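A minimal sketch of the kind of approximate randomization test described above, assuming a user-supplied estimation routine; `estimate_path` is a hypothetical placeholder for whatever procedure (e.g. a PLS fit) returns the path coefficient of interest, and the permutation count and two-sided comparison are illustrative choices rather than the chapter's exact procedure.

```python
import numpy as np

def permutation_group_diff(data_a, data_b, estimate_path, n_perm=999, rng=None):
    """Approximate randomization test for the difference of a model estimate
    (e.g. a PLS path coefficient) between two samples.  `estimate_path` is a
    placeholder: it takes a (cases x variables) array and returns a scalar."""
    rng = np.random.default_rng(rng)
    obs_diff = estimate_path(data_a) - estimate_path(data_b)
    pooled = np.vstack([data_a, data_b])
    n_a = len(data_a)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))       # randomly reassign cases to groups
        diff = (estimate_path(pooled[idx[:n_a]])
                - estimate_path(pooled[idx[n_a:]]))
        if abs(diff) >= abs(obs_diff):
            count += 1
    return obs_diff, (count + 1) / (n_perm + 1)  # two-sided permutation p-value
```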

447 citations


Journal ArticleDOI
TL;DR: The analysis shows that studying the classification error via permutation tests is effective; in particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data.
Abstract: We explore the framework of permutation-based p-values for assessing the performance of classifiers. In this paper we study two simple permutation tests. The first test assesses whether the classifier has found a real class structure in the data; the corresponding null distribution is estimated by permuting the labels in the data. This test has been used extensively in classification problems in computational biology. The second test studies whether the classifier is exploiting the dependency between the features in classification; the corresponding null distribution is estimated by permuting the features within classes, inspired by restricted randomization techniques traditionally used in statistics. This new test can serve to identify descriptive features, which can be valuable information in improving the classifier performance. We study the properties of these tests and present an extensive empirical evaluation on real and synthetic data. Our analysis shows that studying the classifier performance via permutation tests is effective. In particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data.
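A sketch of the first test (label permutation), using scikit-learn's cross-validated accuracy as the performance measure; the choice of classifier, cross-validation scheme, and number of permutations are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def label_permutation_pvalue(X, y, n_perm=200, rng=None):
    """Test whether a classifier finds real class structure: compare its
    cross-validated accuracy with the null distribution obtained by
    refitting on label-permuted data."""
    rng = np.random.default_rng(rng)
    clf = LogisticRegression(max_iter=1000)      # any classifier could be used
    obs = cross_val_score(clf, X, y, cv=5).mean()
    null = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)              # break the feature-label link
        null[i] = cross_val_score(clf, X, y_perm, cv=5).mean()
    p_value = (np.sum(null >= obs) + 1) / (n_perm + 1)
    return obs, p_value
```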

436 citations


Book
01 Dec 2010
TL;DR: The capital asset pricing model, factor models and principal components, and nonparametric regression and splines are presented.
Abstract: Introduction.- Returns.- Fixed income securities.- Exploratory data analysis.- Modeling univariate distributions.- Resampling.- Multivariate statistical models.- Copulas.- Time series models: basics.- Time series models: further topics.- Portfolio theory.- Regression: basics.- Regression: troubleshooting.- Regression: advanced topics.- Cointegration.- The capital asset pricing model.- Factor models and principal components.- GARCH models.- Risk management.- Bayesian data analysis and MCMC.- Nonparametric regression and splines.

299 citations


Journal ArticleDOI
TL;DR: In this article, a Bayesian decision-theoretic framework is proposed for optimal portfolio selection using a skew normal distribution, which has many attractive features for modeling multivariate returns. The results suggest that it is important to incorporate higher-order moments in portfolio selection, and that the approach leads to higher expected utility than competing methods such as the resampling methods common in the practice of finance.
Abstract: We propose a method for optimal portfolio selection using a Bayesian decision theoretic framework that addresses two major shortcomings of the traditional Markowitz approach: the ability to handle higher moments and parameter uncertainty. We employ the skew normal distribution which has many attractive features for modeling multivariate returns. Our results suggest that it is important to incorporate higher order moments in portfolio selection. Further, our comparison to other methods where parameter uncertainty is either ignored or accommodated in an ad hoc way, shows that our approach leads to higher expected utility than competing methods, such as the resampling methods that are common in the practice of finance.

296 citations


Journal ArticleDOI
TL;DR: In this article, a generic method for providing prediction intervals of wind power generation is described; it employs a fuzzy inference model that makes it possible to integrate expertise on the characteristics of prediction errors into conditional interval forecasts.
Abstract: A generic method for providing prediction intervals of wind power generation is described. Prediction intervals complement the more common wind power point forecasts by giving a range of potential outcomes for a given probability, their so-called nominal coverage rate. Ideally, they inform of the situation-specific uncertainty of point forecasts. In order to avoid a restrictive assumption on the shape of forecast error distributions, focus is given to an empirical and nonparametric approach named adapted resampling. This approach employs a fuzzy inference model that makes it possible to integrate expertise on the characteristics of prediction errors when producing conditional interval forecasts. By simultaneously generating prediction intervals with various nominal coverage rates, one obtains full predictive distributions of wind generation. Adapted resampling is applied here to the case of an onshore Danish wind farm, for which three point forecasting methods are considered as input. The probabilistic forecasts generated are evaluated based on their reliability and sharpness, and compared to forecasts based on quantile regression and the climatology benchmark. The operational application of adapted resampling to the case of a large number of wind farms in Europe and Australia, among others, is finally discussed.
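As a rough illustration of the resampling idea (not the adapted resampling method itself, which conditions the error distribution on the forecast situation via fuzzy inference), here is an unconditional Python/NumPy sketch that forms an interval around a point forecast from resampled historical forecast errors; all names and defaults are illustrative.

```python
import numpy as np

def empirical_prediction_interval(point_forecast, past_errors,
                                  coverage=0.90, n_boot=2000, rng=None):
    """Form an interval around a point forecast by resampling historical
    forecast errors and taking empirical quantiles (unconditional version)."""
    rng = np.random.default_rng(rng)
    sample = rng.choice(past_errors, size=n_boot, replace=True)
    lo, hi = np.quantile(sample, [(1 - coverage) / 2, (1 + coverage) / 2])
    return point_forecast + lo, point_forecast + hi
```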

291 citations


Journal ArticleDOI
TL;DR: The estimators most widely used to evaluate the prediction error of a non-linear regression model are examined and the repeated-corrected 10-fold Cross-Validation estimator and the Parametric Bootstrap estimator obtained the best performance in the simulations.
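For context, a plain repeated 10-fold cross-validation estimate of prediction error might look like the sketch below (using scikit-learn's RepeatedKFold); this is the uncorrected version, not the repeated corrected estimator evaluated in the paper.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

def repeated_cv_error(model, X, y, n_splits=10, n_repeats=20, seed=0):
    """Repeated 10-fold cross-validation estimate of mean squared prediction
    error for any regression model with fit/predict methods; X and y are
    NumPy arrays."""
    rkf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    errors = []
    for train, test in rkf.split(X):
        model.fit(X[train], y[train])
        errors.append(np.mean((y[test] - model.predict(X[test])) ** 2))
    return np.mean(errors)                        # averaged over folds and repeats
```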

215 citations


Journal ArticleDOI
TL;DR: A canonical large deviations criterion for optimality is considered, and it is shown that inference based on the empirical likelihood ratio statistic is optimal; a new empirical likelihood bootstrap is also introduced that provides a valid resampling method for moment inequality models and overcomes the implementation challenges that arise as a result of non-pivotal limit distributions.

190 citations


Journal ArticleDOI
TL;DR: In this article, a new resampling procedure, the dependent wild bootstrap, was proposed for stationary time series, which can be easily extended to irregularly spaced time series with no implementational difficulty.
Abstract: We propose a new resampling procedure, the dependent wild bootstrap, for stationary time series. As a natural extension of the traditional wild bootstrap to the time series setting, the dependent wild bootstrap offers a viable alternative to the existing block-based bootstrap methods, whose properties have been extensively studied over the last two decades. Unlike all of the block-based bootstrap methods, the dependent wild bootstrap can be easily extended to irregularly spaced time series with no implementational difficulty. Furthermore, it preserves the favorable bias and mean squared error property of the tapered block bootstrap, which is the state-of-the-art block-based method in terms of asymptotic accuracy of variance estimation and distribution approximation. The consistency of the dependent wild bootstrap in distribution approximation is established under the framework of the smooth function model. In addition, we obtain the bias and variance expansions of the dependent wild bootstrap variance estimator...
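A compact sketch of the dependent wild bootstrap idea for the sample mean: each pseudo-series multiplies the centred observations by a dependent, mean-zero, unit-variance multiplier process. The Bartlett-kernel covariance and the Cholesky-based generation of the multipliers are illustrative implementation choices, not necessarily those of the paper.

```python
import numpy as np

def dependent_wild_bootstrap_means(x, l, n_boot=500, rng=None):
    """Dependent wild bootstrap for the sample mean of a stationary series x:
    pseudo-series are xbar + (x - xbar) * W, where W is a mean-zero,
    unit-variance Gaussian process whose covariance is a Bartlett kernel
    with bandwidth l."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    cov = np.clip(1.0 - lags / l, 0.0, None)      # Bartlett-kernel covariance
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n))
    xbar, centred = x.mean(), x - x.mean()
    means = np.empty(n_boot)
    for b in range(n_boot):
        w = chol @ rng.standard_normal(n)         # dependent multipliers
        means[b] = np.mean(xbar + centred * w)    # bootstrap sample mean
    return means    # e.g. np.var(means) estimates the variance of the mean
```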

170 citations


Journal ArticleDOI
TL;DR: It is shown that the weighted ensemble technique is statistically exact for a wide class of Markovian and non-Markovian dynamics, including the use of bins which can adaptively find the target state in a simple model.
Abstract: The “weighted ensemble” method, introduced by Huber and Kim [Biophys. J. 70, 97 (1996)], is one of a handful of rigorous approaches to path sampling of rare events. Expanding earlier discussions, we show that the technique is statistically exact for a wide class of Markovian and non-Markovian dynamics. The derivation is based on standard path-integral (path probability) ideas, but recasts the weighted-ensemble approach as simple “resampling” in path space. Similar reasoning indicates that arbitrary nonstatic binning procedures, which merely guide the resampling process, are also valid. Numerical examples confirm the claims, including the use of bins which can adaptively find the target state in a simple model.

Journal ArticleDOI
TL;DR: The results suggest that trait sampling for many objectives in species-rich plant communities may require the considerable effort of sampling at least one individual of each species in each plot, and that investment in complete sampling, though great, may be worthwhile for at least some traits.
Abstract: 1. Despite considerable interest in the application of plant functional traits to questions of community assembly and ecosystem structure and function, there is no consensus on the appropriateness of sampling designs to obtain plot-level estimates in diverse plant communities. 2. We measured 10 plant functional traits describing leaf and stem morphology and ecophysiology for all trees in nine 1-ha plots in terra firme lowland tropical rain forests of French Guiana (N = 4709). 3. We calculated, by simulation, the mean and variance in trait values for each plot and each trait expected under seven sampling methods and a range of sampling intensities. Simulated sampling methods included a variety of spatial designs, as well as the application of existing data base values to all individuals of a given species. 4. For each trait in each plot, we defined a performance index for each sampling design as the proportion of resampling events that resulted in observed means within 5% of the true plot mean, and observed variance within 20% of the true plot variance. 5. The relative performance of sampling designs was consistent for estimations of means and variances. Data base use had consistently poor performance for most traits across all plots, whereas sampling one individual per species per plot resulted in relatively high performance. We found few differences among different spatial sampling strategies; however, for a given strategy, increased intensity of sampling resulted in markedly improved accuracy in estimates of trait mean and variance. 6. We also calculated the financial cost of each sampling design based on data from our 'every individual per plot' strategy and estimated the sampling and botanical effort required. The relative performance of designs was strongly positively correlated with relative financial cost, suggesting that sampling investment returns are relatively constant. 7. Our results suggest that trait sampling for many objectives in species-rich plant communities may require the considerable effort of sampling at least one individual of each species in each plot, and that investment in complete sampling, though great, may be worthwhile for at least some traits.

Journal ArticleDOI
TL;DR: In this article, an incremental mixture importance sampling (IMIS) algorithm is proposed that iteratively builds up a better importance sampling function; it retains the simplicity and transparency of sampling importance resampling but is much more efficient computationally.
Abstract: The Joint United Nations Programme on HIV/AIDS (UNAIDS) has decided to use Bayesian melding as the basis for its probabilistic projections of HIV prevalence in countries with generalized epidemics. This combines a mechanistic epidemiological model, prevalence data, and expert opinion. Initially, the posterior distribution was approximated by sampling-importance-resampling, which is simple to implement, easy to interpret, transparent to users, and gave acceptable results for most countries. For some countries, however, this is not computationally efficient because the posterior distribution tends to be concentrated around nonlinear ridges and can also be multimodal. We propose instead incremental mixture importance sampling (IMIS), which iteratively builds up a better importance sampling function. This retains the simplicity and transparency of sampling importance resampling, but is much more efficient computationally. It also leads to a simple estimator of the integrated likelihood that is the basis for Bayesian model comparison and model averaging. In simulation experiments and on real data, it outperformed both sampling importance resampling and three publicly available generic Markov chain Monte Carlo algorithms for this kind of problem.
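For reference, the sampling-importance-resampling baseline that IMIS improves upon can be sketched in a few lines; `sample_prior` and `log_lik` are hypothetical callables standing in for the epidemiological model's prior and likelihood, and the sample sizes are arbitrary.

```python
import numpy as np

def sampling_importance_resampling(sample_prior, log_lik, n_draw=100000,
                                   n_keep=3000, rng=None):
    """Plain sampling-importance-resampling: draw from the prior, weight by
    the likelihood, and resample with replacement according to the weights."""
    rng = np.random.default_rng(rng)
    theta = sample_prior(n_draw)                  # (n_draw, d) prior draws
    logw = log_lik(theta)                         # log importance weights
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(n_draw, size=n_keep, replace=True, p=w)
    return theta[idx]                             # approximate posterior sample
```

IMIS replaces the single prior proposal with a growing mixture of Gaussian components placed where the weights are highest, which is why it copes much better with the concentrated, ridge-like posteriors described in the abstract.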

Journal ArticleDOI
TL;DR: For a given finite set of nonuniformly sampled data, a reasonable way to choose the Nyquist frequency and the resampling time is discussed, and the performance of the different methods is evaluated.

Journal ArticleDOI
TL;DR: The main approaches to resampling variance estimation in complex survey data: balanced repeated replication, the jackknife, and the bootstrap are discussed.
Abstract: In this article, I discuss the main approaches to resampling variance estimation in complex survey data: balanced repeated replication, the jackknife, and the bootstrap. Balanced repeated replication and the jackknife are implemented in the Stata svy suite. The bootstrap for complex survey data is implemented by the bsweights command. I describe this command and provide working examples. Editors' note. This article was submitted and accepted before the new svy bootstrap prefix was made available in the Stata 11.1 update. The variance estimation method implemented in the new svy bootstrap prefix is equivalent to the one in bs4rw. The only real difference is syntax. For example,
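Outside of Stata, the basic delete-one jackknife variance estimator can be sketched as follows; this is the i.i.d. version, whereas the survey implementations discussed above delete whole primary sampling units within strata and work with replicate weights.

```python
import numpy as np

def jackknife_variance(x, statistic):
    """Delete-one jackknife variance estimate of a statistic computed on a
    one-dimensional sample x."""
    n = len(x)
    theta_i = np.array([statistic(np.delete(x, i)) for i in range(n)])
    return (n - 1) / n * np.sum((theta_i - theta_i.mean()) ** 2)

# usage: jackknife_variance(np.random.default_rng(0).normal(size=50), np.mean)
```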

Journal ArticleDOI
TL;DR: Of the MI approaches investigated, MI using MICE-PMM produced the least biased estimates and better model performance measures; however, this MI approach still produced biased regression coefficient estimates with 75% missingness.
Abstract: The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model. Observed data for 1000 cases were sampled with replacement from a large complete dataset of 7507 patients to obtain 500 replications. Five levels of missingness (ranging from 5% to 75%) were imposed on three covariates using a missing at random (MAR) mechanism. Five missing data methods were applied; a) complete case analysis (CC) b) single imputation using regression switching with predictive mean matching (SI), c) multiple imputation using regression switching imputation, d) multiple imputation using regression switching with predictive mean matching (MICE-PMM) and e) multiple imputation using flexible additive imputation models. A Cox proportional hazards model was fitted to each dataset and estimates for the regression coefficients and model performance measures obtained. CC produced biased regression coefficient estimates and inflated standard errors (SEs) with 25% or more missingness. The underestimated SE after SI resulted in poor coverage with 25% or more missingness. Of the MI approaches investigated, MI using MICE-PMM produced the least biased estimates and better model performance measures. However, this MI approach still produced biased regression coefficient estimates with 75% missingness. Very few differences were seen between the results from all missing data approaches with 5% missingness. However, performing MI using MICE-PMM may be the preferred missing data approach for handling between 10% and 50% MAR missingness.

Journal ArticleDOI
TL;DR: In this article, the authors propose a general transition kernel (iterative spatial resampling, ISR) that preserves any spatial model produced by conditional simulation, which can be used with any conditional geostatistical simulation method, whether it generates continuous or discrete variables.
Abstract: Measurements are often unable to uniquely characterize the subsurface at a desired modeling resolution. In particular, inverse problems involving the characterization of hydraulic properties are typically ill-posed since they generally present more unknowns than data. In a Bayesian context, solutions to such problems consist of a posterior ensemble of models that fit the data (up to a certain precision specified by a likelihood function) and that are a subset of a prior distribution. Two possible approaches for this problem are Markov chain Monte Carlo (McMC) techniques and optimization (calibration) methods. Both frameworks rely on a perturbation mechanism to steer the search for solutions. When the model parameters are spatially dependent variable fields obtained using geostatistical realizations, such as hydraulic conductivity or porosity, it is not trivial to incur perturbations that respect the prior spatial model. To overcome this problem, we propose a general transition kernel (iterative spatial resampling, ISR) that preserves any spatial model produced by conditional simulation. We also present a stochastic stopping criterion for the optimizations inspired from importance sampling. In the studied cases, this yields posterior distributions reasonably close to the ones obtained by a rejection sampler, but with a greatly reduced number of forward model runs. The technique is general in the sense that it can be used with any conditional geostatistical simulation method, whether it generates continuous or discrete variables. Therefore it allows sampling of different priors and conditioning to a variety of data types. Several examples are provided based on either multi-Gaussian or multiple-point statistics.

Journal ArticleDOI
TL;DR: This work puts forward a new strategy designed for situations when there is no a priori information about 'when' and 'where' these differences appear in the spatio-temporal domain, simultaneously testing numerous hypotheses, which increases the risk of false positives.
Abstract: Current analysis of event-related potentials (ERP) data is usually based on the a priori selection of channels and time windows of interest for studying the differences between experimental conditions in the spatio-temporal domain. In this work we put forward a new strategy designed for situations when there is not a priori information about 'when' and 'where' these differences appear in the spatio-temporal domain, simultaneously testing numerous hypotheses, which increase the risk of false positives. This issue is known as the problem of multiple comparisons and has been managed with methods that control the false discovery rate (FDR), such as permutation test and FDR methods. Although the former has been previously applied, to our knowledge, the FDR methods have not been introduced in the ERP data analysis. Here we compare the performance (on simulated and real data) of permutation test and two FDR methods (Benjamini and Hochberg (BH) and local-fdr, by Efron). All these methods have been shown to be valid for dealing with the problem of multiple comparisons in the ERP analysis, avoiding the ad hoc selection of channels and/or time windows. FDR methods are a good alternative to the common and computationally more expensive permutation test. The BH method for independent tests gave the best overall performance regarding the balance between type I and type II errors. The local-fdr method is preferable for high dimensional (multichannel) problems where most of the tests conform to the empirical null hypothesis. Differences among the methods according to assumptions, null distributions and dimensionality of the problem are also discussed.
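A minimal sketch of the Benjamini-Hochberg step-up procedure applied to a vector of p-values (one per channel-time point, say); this is the generic procedure, not the local-fdr variant also compared in the paper.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: return a boolean mask of the
    hypotheses rejected at false discovery rate q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m      # i*q/m for the sorted p-values
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()            # largest i with p_(i) <= i*q/m
        reject[order[:k + 1]] = True
    return reject
```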

Journal ArticleDOI
TL;DR: It is proved that the strategy of nonparametric bootstrapping on the highest level is better than that on lower levels, and some resampling strategies for hierarchical data are provided.
Abstract: Nonparametric bootstrapping for hierarchical data is relatively underdeveloped and not straightforward: certainly it does not make sense to use simple nonparametric resampling, which treats all observations as independent. We provide several resampling strategies for hierarchical data, prove that the strategy of nonparametric bootstrapping on the highest level (randomly sampling all other levels without replacement within each highest-level unit, where the highest-level units are themselves selected by random sampling with replacement) is better than bootstrapping on lower levels, analyze real data, and perform simulation studies.
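A sketch of the recommended strategy for two-level data: resample the highest-level units with replacement and keep each selected unit's observations intact. Function and argument names are illustrative.

```python
import numpy as np

def highest_level_bootstrap(groups, data, statistic, n_boot=1000, rng=None):
    """Nonparametric bootstrap for two-level data: resample the top-level
    units with replacement and keep each selected unit's observations intact.
    `groups` holds the top-level unit label of each observation in `data`."""
    rng = np.random.default_rng(rng)
    ids = np.unique(groups)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        chosen = rng.choice(ids, size=len(ids), replace=True)
        resample = np.concatenate([data[groups == g] for g in chosen])
        stats[b] = statistic(resample)
    return stats          # e.g. np.std(stats) estimates the standard error
```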

Journal ArticleDOI
TL;DR: It is proposed to use the out-of-bag estimates of the generalization accuracy to select a near-optimal value for the sampling ratio.


Journal ArticleDOI
TL;DR: McMurry et al. as mentioned in this paper proposed an estimator that leaves the main diagonals of the sample autocovariance matrix intact while gradually down-weighting off-diagonal entries towards zero.
Abstract: We address the problem of estimating the autocovariance matrix of a stationary process. Under short range dependence assumptions, convergence rates are established for a gradually tapered version of the sample autocovariance matrix and for its inverse. The proposed estimator is formed by leaving the main diagonals of the sample autocovariance matrix intact while gradually down-weighting off-diagonal entries towards zero. In addition we show the same convergence rates hold for a positive definite version of the estimator, and we introduce a new approach for selecting the banding parameter. The new matrix estimator is shown to perform well theoretically and in simulation studies. As an application we introduce a new resampling scheme for stationary processes termed the linear process bootstrap (LPB). The LPB is shown to be asymptotically valid for the sample mean and related statistics. The effectiveness of the proposed methods is demonstrated in a simulation study.
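A sketch of the flavor of estimator described above, using a trapezoidal taper that keeps sample autocovariances up to lag l intact and shrinks them linearly to zero by lag 2l; the exact taper shape and the paper's banding-parameter selection rule may differ.

```python
import numpy as np

def tapered_autocov_matrix(x, l):
    """Tapered estimate of the n x n autocovariance matrix of a stationary
    series: sample autocovariances up to lag l are kept intact and then
    shrunk linearly to zero by lag 2l (a trapezoidal taper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma = np.array([np.dot(xc[:n - k], xc[k:]) / n for k in range(n)])
    lags = np.arange(n)
    taper = np.clip(2.0 - lags / l, 0.0, 1.0)     # 1 up to lag l, 0 beyond 2l
    g = gamma * taper
    idx = np.abs(np.subtract.outer(lags, lags))   # |i - j| for each matrix entry
    return g[idx]                                 # Toeplitz matrix estimate
```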

Proceedings ArticleDOI
18 Jul 2010
TL;DR: It is demonstrated that classification results comparable to those of random oversampling can be obtained through two sets of efficient resampling methods that use 50% and 75% less data, respectively, than the datasets generated by the random oversampling method.
Abstract: Random undersampling and oversampling are simple but well-known resampling methods applied to solve the problem of class imbalance. In this paper we show that the random oversampling method can produce better classification results than the random undersampling method, since oversampling can increase the minority-class recognition rate while sacrificing less of the majority-class recognition rate than undersampling does. However, the random oversampling method greatly increases the computational cost of SVM training due to the addition of new training examples. In this paper we present an investigation carried out to develop efficient resampling methods that can produce classification results comparable to those of random oversampling, but using less data. The main idea of the proposed methods is to first select the most informative data examples located closer to the class boundary region, by using the separating hyperplane found by training an SVM model on the original imbalanced dataset, and then to use only those examples in resampling. We demonstrate that classification results comparable to those of random oversampling can be obtained through two sets of efficient resampling methods which use 50% less data and 75% less data, respectively, compared to the sizes of the datasets generated by the random oversampling method.
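A rough sketch of the boundary-informed selection idea for a binary problem, using scikit-learn's SVC: rank majority-class examples by their distance to the learned hyperplane and keep only the closest fraction. The selection rule, linear kernel, and the absence of a subsequent oversampling step are simplifying assumptions, not the paper's exact methods.

```python
import numpy as np
from sklearn.svm import SVC

def boundary_undersample(X, y, keep_fraction=0.5):
    """Keep only the majority-class examples closest to the SVM decision
    boundary (binary classification assumed), plus all minority examples."""
    X, y = np.asarray(X), np.asarray(y)
    svm = SVC(kernel="linear").fit(X, y)
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[np.argmax(counts)]
    maj_idx = np.flatnonzero(y == majority)
    dist = np.abs(svm.decision_function(X[maj_idx]))   # closeness to boundary
    n_keep = int(keep_fraction * len(maj_idx))
    keep = maj_idx[np.argsort(dist)[:n_keep]]          # most informative cases
    sel = np.concatenate([keep, np.flatnonzero(y != majority)])
    return X[sel], y[sel]
```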

Journal ArticleDOI
TL;DR: In this paper, the authors proposed to use a surrogate consistency resampling method for the reliability assessment of non-parametric density forecasts of short-term wind-power generation.
Abstract: Reliability is seen as a primary requirement when verifying probabilistic forecasts, since a lack of reliability would introduce a systematic bias in subsequent decision-making. Reliability diagrams comprise popular and practical diagnostic tools for the reliability evaluation of density forecasts of continuous variables. Such diagrams relate to the assessment of the unconditional calibration of probabilistic forecasts. A reason for their appeal is that deviations from perfect reliability can be visually assessed based on deviations from the diagonal. Deviations from the diagonal may, however, be caused by both sampling effects and serial correlation in the forecast-verification pairs. We build on a recent proposal, consisting of associating reliability diagrams with consistency bars that would reflect the deviations from the diagonal that are potentially observable even if density forecasts are perfectly reliable. Our consistency bars, however, reflect potential deviations originating from the combined effects of limited counting statistics and serial correlation in the forecast-verification pairs. They are generated based on an original surrogate consistency resampling method. Its ability to provide consistency bars with a significantly better coverage against the independent and identically distributed (i.i.d.) resampling alternative is shown from simulations. Finally, a practical example of the reliability assessment of non-parametric density forecasts of short-term wind-power generation is given. Copyright © 2010 Royal Meteorological Society

Journal ArticleDOI
TL;DR: The proposed algorithm can maintain the diversity of particles and thus avoid sample impoverishment in particle filters, and it can obtain the same estimation accuracy with a smaller number of sample particles.
Abstract: In this correspondence, an improvement on the resampling algorithm (also called the systematic resampling algorithm) of particle filters is presented. First, the resampling algorithm is analyzed from a new viewpoint and its defects are demonstrated. Then some refinements are introduced in order to overcome these defects, such as comparing the weights of particles by stages and constructing the new particles based on a quasi-Monte Carlo method, from which an exquisite resampling (ER) algorithm is derived. Compared to the standard resampling algorithm, the proposed algorithm can maintain the diversity of particles and thus avoid sample impoverishment in particle filters, and it can obtain the same estimation accuracy with a smaller number of sample particles. These advantages are finally verified by simulations of a non-stationary growth model and of re-entry ballistic object tracking.
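For reference, the standard systematic resampling step that the ER algorithm refines can be written as follows; the implementation details (single random offset, searchsorted lookup) are the usual textbook choices.

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Classical systematic resampling: one uniform offset, then evenly
    spaced positions over the cumulative weights.  Returns the indices of
    the particles to propagate."""
    rng = np.random.default_rng(rng)
    w = np.asarray(weights, dtype=float)
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(w / w.sum())
    cumulative[-1] = 1.0                          # guard against rounding error
    return np.searchsorted(cumulative, positions)
```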

Posted Content
TL;DR: The Measures of Analysis of Time Series (MATS) MATLAB toolkit is designed to handle an arbitrary large set of scalar time series and compute a large variety of measures on them, allowing for the specification of varying measure parameters as well.
Abstract: In many applications, such as physiology and finance, large time series data bases are to be analyzed requiring the computation of linear, nonlinear and other measures. Such measures have been developed and implemented in commercial and freeware softwares rather selectively and independently. The Measures of Analysis of Time Series (MATS) MATLAB toolkit is designed to handle an arbitrary large set of scalar time series and compute a large variety of measures on them, allowing for the specification of varying measure parameters as well. The variety of options with added facilities for visualization of the results support different settings of time series analysis, such as the detection of dynamics changes in long data records, resampling (surrogate or bootstrap) tests for independence and linearity with various test statistics, and discrimination power of different measures and for different combinations of their parameters. The basic features of MATS are presented and the implemented measures are briefly described. The usefulness of MATS is illustrated on some empirical examples along with screenshots.

Journal Article
TL;DR: Bagging is a simple way to combine estimates in order to improve their performance, and it is shown that the bagged nearest neighbor estimate may achieve the optimal rate of convergence, regardless of whether resampling is done with or without replacement.
Abstract: Bagging is a simple way to combine estimates in order to improve their performance. This method, suggested by Breiman in 1996, proceeds by resampling from the original data set, constructing a predictor from each subsample, and deciding by combining. By bagging an n-sample, the crude nearest neighbor regression estimate is turned into a consistent weighted nearest neighbor regression estimate, which is amenable to statistical analysis. Letting the resampling size k_n grow appropriately with n, it is shown that this estimate may achieve the optimal rate of convergence, independently of whether resampling is done with or without replacement. Since the estimate with the optimal rate of convergence depends on the unknown distribution of the observations, adaptation results by data-splitting are presented.
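A minimal sketch of the bagged 1-nearest-neighbour regression estimate at a single query point, with subsamples of size k_n drawn without replacement; names and defaults are illustrative, and sampling with replacement would also fit the theory described above.

```python
import numpy as np

def bagged_1nn_predict(X, y, x0, k_n, n_boot=200, rng=None):
    """Bagged 1-nearest-neighbour regression at a query point x0: average the
    1-NN predictions over subsamples of size k_n drawn without replacement."""
    rng = np.random.default_rng(rng)
    n = len(X)
    preds = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.choice(n, size=k_n, replace=False)   # subsample of size k_n
        d = np.linalg.norm(X[idx] - x0, axis=1)
        preds[b] = y[idx[np.argmin(d)]]                # 1-NN prediction
    return preds.mean()                                # bagged estimate at x0
```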

31 Mar 2010
TL;DR: In this paper, the authors address the problem of estimating the autocovariance matrix of a stationary process under short range dependence assumptions, and establish convergence rates for a gradually tapered version of the sample autocovariance matrix and for its inverse.
Abstract: We address the problem of estimating the autocovariance matrix of a stationary process. Under short range dependence assumptions, convergence rates are established for a gradually tapered version of the sample autocovariance matrix and for its inverse. The proposed estimator is formed by leaving the main diagonals of the sample autocovariance matrix intact while gradually down-weighting off-diagonal entries towards zero. In addition we show the same convergence rates hold for a positive definite version of the estimator, and we introduce a new approach for selecting the banding parameter. The new matrix estimator is shown to perform well theoretically and in simulation studies. As an application we introduce a new resampling scheme for stationary processes termed the linear process bootstrap (LPB). The LPB is shown to be asymptotically valid for the sample mean and related statistics. The effectiveness of the proposed methods is demonstrated in a simulation study.

Journal ArticleDOI
TL;DR: This article reviews the current practice of statistical analysis of longitudinal data in these fields, and recommends resampling as a method that readily adjusts the post hoc testing to be limited to only interesting comparisons and thereby avoids unduly sacrificing the power.

Journal ArticleDOI
TL;DR: The Measures of Analysis of Time Series (MATS) MATLAB toolkit, as discussed by the authors, is designed to handle an arbitrarily large set of scalar time series and compute a large variety of measures on them, allowing for the specification of varying measure parameters.
Abstract: In many applications, such as physiology and finance, large time series data bases are to be analyzed requiring the computation of linear, nonlinear and other measures. Such measures have been developed and implemented in commercial and freeware softwares rather selectively and independently. The Measures of Analysis of Time Series (MATS) MATLAB toolkit is designed to handle an arbitrary large set of scalar time series and compute a large variety of measures on them, allowing for the specification of varying measure parameters as well. The variety of options with added facilities for visualization of the results support different settings of time series analysis, such as the detection of dynamics changes in long data records, resampling (surrogate or bootstrap) tests for independence and linearity with various test statistics, and discrimination power of different measures and for different combinations of their parameters. The basic features of MATS are presented and the implemented measures are briefly described. The usefulness of MATS is illustrated on some empirical examples along with screenshots.