
Showing papers on "Resampling" published in 1998



Journal ArticleDOI
TL;DR: The method of simulated scores (MSS) estimator, as discussed by the authors, uses a recursive conditioning of the multivariate normal density through a Cholesky triangularization of its variance-covariance matrix.
Abstract: The method of simulated scores (MSS) is presented for estimating limited dependent variable (LDV) models with flexible correlation structure in the unobservables. We propose simulators that are continuous in the unknown parameter vectors, and hence standard optimization methods can be used to compute the MSS estimators that employ these simulators. The first continuous method relies on a recursive conditioning of the multivariate normal density through a Cholesky triangularization of its variance-covariance matrix. The second method combines results about the conditionals of the multivariate normal distribution with Gibbs resampling techniques. We establish consistency and asymptotic normality of the MSS estimators and derive suitable rates at which the number of simulations must rise if biased simulators are used.

266 citations
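The first continuous method above is essentially a smooth recursive-conditioning simulator of multivariate normal probabilities. A minimal sketch of such a simulator (a GHK-style construction, not the authors' exact procedure; the function name, example covariance matrix, and number of draws are all illustrative) might look like this in Python:

```python
import numpy as np
from scipy.stats import norm

def ghk_orthant_probability(mu, sigma, upper, n_draws=1000, rng=None):
    """Estimate P(Y < upper) for Y ~ N(mu, sigma) by recursive conditioning
    through the Cholesky factor of sigma (a GHK-style smooth simulator)."""
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(sigma)          # sigma = L @ L.T
    d = len(mu)
    probs = np.ones(n_draws)
    eta = np.zeros((n_draws, d))           # truncated standard-normal draws
    for j in range(d):
        # Conditional upper bound for the j-th standard-normal component
        b = (upper[j] - mu[j] - eta[:, :j] @ L[j, :j]) / L[j, j]
        Fb = norm.cdf(b)
        probs *= Fb                        # running product of conditional probabilities
        # Draw eta_j from N(0,1) truncated to (-inf, b) via the inverse CDF
        u = rng.uniform(size=n_draws)
        eta[:, j] = norm.ppf(u * Fb)
    return probs.mean()

# Example: probability that a trivariate normal falls below zero in each coordinate
mu = np.zeros(3)
sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
print(ghk_orthant_probability(mu, sigma, upper=np.zeros(3), n_draws=5000, rng=0))
```

Because the uniform draws are held fixed across parameter values, the simulated probability is continuous in the parameters, which is what allows standard optimizers to be applied to the simulated score.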


Journal ArticleDOI
TL;DR: It is concluded that for PSD estimation of unevenly sampled signals the Lomb method is more suitable than the fast Fourier transform or autoregressive estimates with linear or cubic interpolation, but in extreme situations the Lomb estimate still introduces high-frequency contamination, which suggests further study of interpolators with superior performance.
Abstract: This work studies the frequency behavior of a least-squares method for estimating the power spectral density (PSD) of unevenly sampled signals. When the uneven sampling can be modeled as uniform sampling plus a stationary random deviation, the resulting spectrum is a periodic repetition of the original continuous-time spectrum at the mean Nyquist frequency, with a low-pass effect on the upper frequency bands that depends on the sampling dispersion. If the dispersion is small compared with the mean sampling period, the estimate at the base band is unbiased with practically no dispersion. When the uneven sampling is modeled as a deterministic sinusoidal variation with respect to uniform sampling, the results agree with those obtained for small random deviations. This approximation is usually well satisfied in signals such as heart rate (HR) series. The theoretically predicted performance has been tested and corroborated with simulated and real HR signals. The Lomb method has been compared with the classical PSD estimators that include resampling to obtain uniform sampling. The authors found that the Lomb method avoids the major problem of the classical methods: the low-pass effect of the resampling. Also, only frequencies up to the mean Nyquist frequency should be considered (lower than 0.5 Hz if the HR is lower than 60 bpm). It is concluded that for PSD estimation of unevenly sampled signals the Lomb method is more suitable than the fast Fourier transform or autoregressive estimates with linear or cubic interpolation. In extreme situations (low HR or high-frequency components) the Lomb estimate still introduces high-frequency contamination, which suggests that interpolators with superior performance deserve further study. For HR signals the authors also note the importance of selecting a stationary heart rate period for heart rate variability analysis.

264 citations
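As a rough illustration of the recommended estimator, the sketch below computes a Lomb periodogram for an unevenly sampled, heart-rate-like toy series with scipy.signal.lombscargle and evaluates it only up to the mean Nyquist frequency; the signal, jitter level, and frequency grid are invented for the example.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)

# Toy "heart-rate-like" series: uneven sampling = uniform grid + random jitter
n = 512
mean_dt = 0.8                                 # mean sampling period in seconds (~75 bpm)
t = np.arange(n) * mean_dt + rng.normal(scale=0.05 * mean_dt, size=n)
t.sort()
x = np.sin(2 * np.pi * 0.1 * t) + 0.5 * np.sin(2 * np.pi * 0.25 * t)
x += 0.1 * rng.standard_normal(n)
x -= x.mean()                                 # lombscargle expects a zero-mean signal

# Evaluate only up to the mean Nyquist frequency, as the abstract recommends
f_nyq = 1.0 / (2.0 * mean_dt)                 # in Hz
freqs = np.linspace(0.01, f_nyq, 400)
pgram = lombscargle(t, x, 2 * np.pi * freqs)  # lombscargle takes angular frequencies

print("peak at %.3f Hz" % freqs[np.argmax(pgram)])
```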


Journal ArticleDOI
TL;DR: In this paper, a general Cox-type regression model is proposed to formulate the marginal distributions of multivariate failure time data, which allows different baseline hazard functions among distinct failure types and imposes a common baseline hazard function on the failure times of the same type.
Abstract: In this article we propose a general Cox-type regression model to formulate the marginal distributions of multivariate failure time data. This model has a nested structure in that it allows different baseline hazard functions among distinct failure types and imposes a common baseline hazard function on the failure times of the same type. We prove that the maximum “quasi-partial-likelihood” estimator for the vector of regression parameters under the independence working assumption is consistent and asymptotically normal with a covariance matrix for which a consistent estimator is provided. Furthermore, we establish the uniform consistency and joint weak convergence of the Aalen-Breslow type estimators for the cumulative baseline hazard functions, and develop a resampling technique to approximate the joint distribution of these processes, which enables one to make simultaneous inference about the survival functions over the time axis and across failure types. Finally, we assess the small-sample pro...

206 citations


Journal ArticleDOI
TL;DR: This article provides a readable, self-contained introduction to the bootstrap and jackknife methodology for statistical inference; in particular, the focus is on the derivation of confidence intervals in general situations.
Abstract: As far back as the late 1970s, the impact of affordable, high-speed computers on the theory and practice of modern statistics was recognized by Efron (1979, 1982). As a result, the bootstrap and other computer-intensive statistical methods (such as subsampling and the jackknife) have been developed extensively since that time and now constitute very powerful (and intuitive) tools to do statistics with. This article provides a readable, self-contained introduction to the bootstrap and jackknife methodology for statistical inference; in particular, the focus is on the derivation of confidence intervals in general situations. A guide to the available bibliography on bootstrap methods is also offered.

153 citations
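A minimal sketch of the two resampling ideas the article introduces, the percentile bootstrap confidence interval and the jackknife standard error, applied to a generic statistic; the data and tuning choices below are illustrative only.

```python
import numpy as np

def percentile_bootstrap_ci(x, stat, n_boot=2000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for stat(x)."""
    rng = np.random.default_rng(rng)
    n = len(x)
    boot = np.array([stat(x[rng.integers(0, n, n)]) for _ in range(n_boot)])
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

def jackknife_se(x, stat):
    """Jackknife (leave-one-out) standard error of stat(x)."""
    n = len(x)
    leave_one_out = np.array([stat(np.delete(x, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((leave_one_out - leave_one_out.mean()) ** 2))

rng = np.random.default_rng(1)
x = rng.lognormal(size=40)                    # skewed sample; the mean is the target
print("95% bootstrap CI for the mean:", percentile_bootstrap_ci(x, np.mean, rng=1))
print("jackknife SE of the mean:     ", jackknife_se(x, np.mean))
```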


Journal ArticleDOI
TL;DR: This article uses the Gibbs sampling approach in the context of a three-state Markov-switching model to show how heteroskedasticity affects inference and suggests two strategies for valid inference.

149 citations


Journal ArticleDOI
01 Aug 1998
TL;DR: It is shown that an increase in computation, necessary for the statistical resampling methods, produces networks that perform better than those constructed in the traditional manner.
Abstract: Neural networks must be constructed and validated with strong empirical dependence, which is difficult under conditions of sparse data. The paper examines the most common methods of neural network validation along with several general validation methods from the statistical resampling literature, as applied to function approximation networks with small sample sizes. It is shown that an increase in computation, necessary for the statistical resampling methods, produces networks that perform better than those constructed in the traditional manner. The statistical resampling methods also result in lower variance of validation; however, some of the methods are biased in estimating network error.

121 citations
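As a rough illustration of the comparison described above, the sketch below contrasts a single fixed hold-out split with repeated bootstrap (out-of-bag) validation of a small function-approximation network; scikit-learn's MLPRegressor stands in for the networks in the paper, and the data, architecture, and number of resamples are arbitrary choices.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 60                                     # deliberately small sample
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.standard_normal(n)

def make_net():
    return MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)

# Traditional single hold-out split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_mse = mean_squared_error(y_te, make_net().fit(X_tr, y_tr).predict(X_te))

# Bootstrap validation: train on resamples, test on the out-of-bag cases
boot_mse = []
for _ in range(25):
    idx = rng.integers(0, n, n)
    oob = np.setdiff1d(np.arange(n), idx)
    if oob.size == 0:
        continue
    net = make_net().fit(X[idx], y[idx])
    boot_mse.append(mean_squared_error(y[oob], net.predict(X[oob])))

print("single hold-out MSE:        %.3f" % holdout_mse)
print("bootstrap (out-of-bag) MSE: %.3f +/- %.3f"
      % (np.mean(boot_mse), np.std(boot_mse)))
```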


Journal ArticleDOI
TL;DR: In this article, the authors proposed a matching algorithm based on a kernel estimate of the conditional lag one distribution or on a fitted autoregression of small order to align with higher likelihood those blocks which match at their ends.
Abstract: The block bootstrap for time series consists of randomly resampling blocks of consecutive values of the given data and aligning these blocks into a bootstrap sample. Here we suggest improving the performance of this method by aligning with higher likelihood those blocks which match at their ends. This is achieved by resampling the blocks according to a Markov chain whose transitions depend on the data. The matching algorithms that we propose take some of the dependence structure of the data into account. They are based on a kernel estimate of the conditional lag-one distribution or on a fitted autoregression of small order. Numerical and theoretical analysis in the case of estimating the variance of the sample mean shows that matching reduces bias and, perhaps unexpectedly, has relatively little effect on variance. Our theory extends to the case of smooth functions of a vector mean.

107 citations
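The sketch below illustrates the matched-block idea in a deliberately simplified form: the next block is drawn with probability proportional to a Gaussian kernel weight comparing its first value with the current block's last value. This is not the paper's exact rule (which is based on the conditional lag-one distribution or a fitted autoregression); the block length, bandwidth, and AR(1) example are illustrative.

```python
import numpy as np

def matched_block_bootstrap(x, block_len=10, n_out=None, bandwidth=None, rng=None):
    """Resample blocks of consecutive values, preferring blocks whose first
    value is close to the last value of the previously chosen block."""
    rng = np.random.default_rng(rng)
    n = len(x)
    n_out = n if n_out is None else n_out
    bandwidth = bandwidth or np.std(x) / 2
    starts = np.arange(n - block_len + 1)
    out = []
    s = rng.choice(starts)                    # first block: chosen uniformly
    out.extend(x[s:s + block_len])
    while len(out) < n_out:
        last = out[-1]
        # Gaussian kernel weight between each candidate block's start and `last`
        w = np.exp(-0.5 * ((x[starts] - last) / bandwidth) ** 2)
        w /= w.sum()
        s = rng.choice(starts, p=w)
        out.extend(x[s:s + block_len])
    return np.array(out[:n_out])

# Toy AR(1) series; variance of the sample mean estimated from matched blocks
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()
means = [matched_block_bootstrap(x, rng=i).mean() for i in range(500)]
print("bootstrap estimate of var(sample mean): %.4f" % np.var(means))
```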


Journal ArticleDOI
TL;DR: In this paper, a sieve bootstrap procedure for time series with a deterministic trend is proposed, which is based on nonparametric trend estimation and autoregressive approximation for some noise process.
Abstract: We propose a sieve bootstrap procedure for time series with a deterministic trend. The sieve for constructing the bootstrap is based on nonparametric trend estimation and autoregressive approximation for the noise process. The bootstrap scheme itself does i.i.d. resampling of estimated innovations from fitted autoregressive models. We show the validity and indicate second-order correctness of such sieve bootstrap approximations for the limiting distribution of nonparametric linear smoothers. The resampling can then be used to construct nonparametric confidence intervals for the underlying trend. In particular, we show asymptotic validity for constructing confidence bands that hold simultaneously over a neighborhood whose size is of the order of the smoothing bandwidth. Our resampling procedure yields satisfactory results in a simulation study for finite sample sizes. We also apply it to the longest series of total ozone measurements from Arosa (Switzerland) and find a significant decreasing trend.

89 citations
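A minimal sketch of a trend-plus-AR-sieve bootstrap along the lines described: estimate the trend with a simple kernel smoother, fit an AR(p) to the detrended series, and regenerate bootstrap series by i.i.d. resampling of the centered innovations. The smoother, the fixed AR order, and the percentile band at the end are simplifications, not the paper's exact procedure.

```python
import numpy as np

def kernel_smooth(t, y, bandwidth):
    """Nadaraya-Watson estimate of a smooth trend."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

def fit_ar(e, p):
    """Least-squares AR(p) fit; returns coefficients and innovations."""
    Y = e[p:]
    X = np.column_stack([e[p - k:-k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return phi, Y - X @ phi

def sieve_bootstrap_trend(y, bandwidth=15.0, p=2, n_boot=200, rng=None):
    """Bootstrap replicates of the estimated trend for y_t = trend(t) + noise."""
    rng = np.random.default_rng(rng)
    n = len(y)
    t = np.arange(n, dtype=float)
    trend = kernel_smooth(t, y, bandwidth)
    phi, innov = fit_ar(y - trend, p)
    innov = innov - innov.mean()                       # centre the innovations
    trends = np.empty((n_boot, n))
    for b in range(n_boot):
        e_star = np.zeros(n)
        eps = rng.choice(innov, size=n, replace=True)  # i.i.d. resampling
        for s in range(p, n):
            e_star[s] = sum(phi[k] * e_star[s - 1 - k] for k in range(p)) + eps[s]
        trends[b] = kernel_smooth(t, trend + e_star, bandwidth)
    return trend, np.quantile(trends, [0.025, 0.975], axis=0)

# Example: sinusoidal trend plus AR(1) noise
rng = np.random.default_rng(0)
n = 300
noise = np.zeros(n)
for s in range(1, n):
    noise[s] = 0.5 * noise[s - 1] + 0.3 * rng.standard_normal()
y = 0.5 * np.sin(np.arange(n) / 40.0) + noise
trend_hat, band = sieve_bootstrap_trend(y, rng=1)
print("trend and 95%% band at t=150: %.3f  [%.3f, %.3f]"
      % (trend_hat[150], band[0, 150], band[1, 150]))
```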


Journal ArticleDOI
TL;DR: In this article, the authors argue that the number of resamples required for bootstrap variance estimation should be determined by the conditional coefficient of variation, involving only resampling variability.
Abstract: It is widely believed that the number of resamples required for bootstrap variance estimation is relatively small. An argument based on the unconditional coefficient of variation of the Monte Carlo approximation suggests that as few as 25 resamples will give reasonable results. In this article we argue that the number of resamples should, in fact, be determined by the conditional coefficient of variation, involving only resampling variability. Our conditional analysis is founded on the belief that Monte Carlo error should not be allowed to determine the conclusions of a statistical analysis, and it indicates that approximately 800 resamples are required for this purpose. The argument can be generalized to the multivariate setting, and a simple formula is given for determining a lower bound on the number of resamples required to approximate an m-dimensional bootstrap variance-covariance matrix.

86 citations
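The conditional argument can be illustrated numerically: for one fixed data set, repeat the B-resample variance estimate many times and observe how its resampling-only (conditional) coefficient of variation shrinks as B grows from 25 toward 800. The sketch below does this with invented data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)          # one fixed data set: condition on it

def bootstrap_variance(data, n_resamples, rng):
    """Bootstrap estimate of var(sample mean) from `n_resamples` resamples."""
    n = len(data)
    means = np.array([data[rng.integers(0, n, n)].mean()
                      for _ in range(n_resamples)])
    return means.var(ddof=1)

for B in (25, 100, 400, 800):
    # Repeat the B-resample estimate many times for the SAME data to isolate
    # the Monte Carlo (conditional) variability
    estimates = np.array([bootstrap_variance(x, B, rng) for _ in range(300)])
    cv = estimates.std() / estimates.mean()
    print("B = %4d   conditional CV of variance estimate: %.3f" % (B, cv))
```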


Journal ArticleDOI
TL;DR: In this article, the authors compare the uncertainty in the solution stemming from the data splitting with neural-network specific uncertainties (parameter initialization, choice of number of hidden units, etc.).
Abstract: This paper exposes problems with the commonly used technique of splitting the available data into training, validation, and test sets that are held fixed, warns against drawing too strong conclusions from such static splits, and shows potential pitfalls of ignoring variability across splits. Using a bootstrap or resampling method, we compare the uncertainty in the solution stemming from the data splitting with neural-network-specific uncertainties (parameter initialization, choice of the number of hidden units, etc.). We present two results on data from the New York Stock Exchange. First, the variation due to different resamplings is significantly larger than the variation due to different network conditions. This result implies that it is important not to over-interpret a model (or an ensemble of models) estimated on one specific split of the data. Second, on each split, the neural-network solution with early stopping is very close to a linear model; no significant nonlinearities are extracted.

Journal ArticleDOI
TL;DR: In this article, a Monte Carlo analysis of the t-test, the Mann-Whitney U-test, and the exact randomization t-test is conducted, and it is shown that the exact randomization t-test compares favorably to the other two tests in terms of both size and power.
Abstract: Data created in a controlled laboratory setting are a relatively new phenomenon to economists. Traditional data analysis methods using either parametric or nonparametric tests are not necessarily the best option available to economists analyzing laboratory data. In 1935, Fisher proposed the randomization technique as an alternative data analysis method when examining treatment effects. The observed data are used to create a test statistic. Then treatment labels are shuffled across the data and the test statistic is recalculated. The original statistic can be ranked against all possible test statistics that can be generated by these data, and a p-value can be obtained. A Monte Carlo analysis of the t-test, the Mann-Whitney U-test, and the exact randomization t-test is conducted. The exact randomization t-test compares favorably to the other two tests in terms of both size and power. Given the limited distributional assumptions necessary for implementation of the exact randomization test, these results suggest that experimental economists should consider using the exact randomization test more often.
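A minimal sketch of the randomization procedure described above, using a Monte Carlo shuffle of treatment labels rather than full enumeration of all label assignments; the data and the number of permutations are illustrative.

```python
import numpy as np
from scipy import stats

def randomization_t_test(treated, control, n_perm=9999, rng=None):
    """Monte Carlo approximation to Fisher's randomization test using the
    two-sample t statistic; returns a two-sided p-value."""
    rng = np.random.default_rng(rng)
    pooled = np.concatenate([treated, control])
    n_t = len(treated)
    t_obs = stats.ttest_ind(treated, control).statistic
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                        # reshuffle treatment labels
        t_perm = stats.ttest_ind(pooled[:n_t], pooled[n_t:]).statistic
        if abs(t_perm) >= abs(t_obs):
            count += 1
    return (count + 1) / (n_perm + 1)              # include the observed ordering

rng = np.random.default_rng(0)
treated = rng.normal(loc=0.5, size=12)             # small, lab-sized samples
control = rng.normal(loc=0.0, size=12)
print("randomization p-value: %.4f" % randomization_t_test(treated, control, rng=1))
```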

Book
01 Jan 1998
TL;DR: In this article, the authors consider the problem of constructing confidence regions, whose coverage probabilities are nearly equal to the nominal ones, for the treatment effects associated with the primary and secondary endpoints of a clinical trial whose stopping rule, specified by a group sequential test, makes the approximate pivots in the nonsequential bootstrap method highly "non-pivotal".
Abstract: This paper considers the problem of constructing confidence intervals for a single parameter θ in a multiparameter or nonparametric family. Hybrid resampling methods, which "hybridize" the essential features of bootstrap and exact methods, are proposed and developed for both parametric and nonparametric situations. In particular, we apply such methods to construct confidence regions, whose coverage probabilities are nearly equal to the nominal ones, for the treatment effects associated with the primary and secondary endpoints of a clinical trial whose stopping rule, specified by a group sequential test, makes the approximate pivots in the nonsequential bootstrap method highly "non-pivotal". We also apply hybrid resampling methods to construct second-order correct confidence intervals in possibly non-ergodic autoregressive models and branching processes.

Book
01 Jan 1998
TL;DR: Fourier Transform; Linear System Theory; Sampling; Sampling Devices; Resampling; Reconstruction; Reconstructed Signal Appearance; System Analysis; System Resolution; Image Quality Metrics.
Abstract: Fourier Transform; Linear System Theory; Sampling; Sampling Devices; Resampling; Reconstruction; Reconstructed Signal Appearance; System Analysis; System Resolution; Image Quality Metrics.

Journal ArticleDOI
Harry Mager, Gernot Göller
TL;DR: The results obtained show that the resampling techniques can be considered a reliable alternative to Bailer's approach for the estimation of the standard error of the AUC from time 0 to t(k) in the case of normally distributed concentration data.

Journal ArticleDOI
01 Jan 1998-Genetics
TL;DR: It appears that the selective resampling schemes are either unbiased or least biased when the QTL is situated near the middle of the chromosome, but the problem remains open as to how the method should be altered to take into account the bias of the original estimate of the QTL's position.
Abstract: Several nonparametric bootstrap methods are tested to obtain better confidence intervals for the quantitative trait loci (QTL) positions, i.e., with minimal width and unbiased coverage probability. Two selective resampling schemes are proposed as a means of conditioning the bootstrap on the number of genetic factors in our model inferred from the original data. The selection is based on criteria related to the estimated number of genetic factors, and only the retained bootstrapped samples will contribute a value to the empirically estimated distribution of the QTL position estimate. These schemes are compared with a nonselective scheme across a range of simple configurations of one QTL on a one-chromosome genome. In particular, the effect of the chromosome length and the relative position of the QTL are examined for a given experimental power, which determines the confidence interval size. With the test protocol used, it appears that the selective resampling schemes are either unbiased or least biased when the QTL is situated near the middle of the chromosome. When the QTL is closer to one end, the likelihood curve of its position along the chromosome becomes truncated, and the nonselective scheme then performs better inasmuch as the percentage of estimated confidence intervals that actually contain the real QTL's position is closer to expectation. The nonselective method, however, produces larger confidence intervals. Hence, we advocate use of the selective methods, regardless of the QTL position along the chromosome (to reduce confidence interval sizes), but we leave the problem open as to how the method should be altered to take into account the bias of the original estimate of the QTL's position.


Journal ArticleDOI
TL;DR: In this article, the null hypothesis that a symbolic sequence is of nth Markov order is tested using binary and heptary test sequences of known order.

Journal ArticleDOI
TL;DR: This paper proposes to obtain the underlying estimators for the estimator bank using a new type of resampling of a chosen eigenstructure estimator via pseudo-randomly generated noise; the approach can be applied in the same way to any other direction-finding algorithm.

Journal ArticleDOI
TL;DR: In this article, the determination of empirical confidence intervals for the location of quantitative trait loci (QTLs) by interval mapping was investigated using simulation, and confidence intervals were created using a non-parametric (resampling) bootstrap and a parametric (resimulation) bootstrap for a backcross population derived from inbred lines.
Abstract: The determination of empirical confidence intervals for the location of quantitative trait loci (QTLs) by interval mapping was investigated using simulation. Confidence intervals were created using a non-parametric (resampling method) and parametric (resimulation method) bootstrap for a backcross population derived from inbred lines. QTLs explaining 1%, 5% and 10% of the phenotypic variance were tested in populations of 200 or 500 individuals. Results from the two methods were compared at all locations along one half of the chromosome. The non-parametric bootstrap produced results close to expectation at all non-marker locations, but confidence intervals when the QTL was located at the marker were conservative. The parametric method performed poorly; results varied from conservative confidence intervals at the location of the marker, to anti-conservative intervals midway between markers. The results were shown to be influenced by a bias in the mapping procedure and by the accumulation of type 1 errors at the location of the markers. The parametric bootstrap is not a suitable method for constructing confidence intervals in QTL mapping. The confidence intervals from the non-parametric bootstrap are accurate and suitable for practical use.

Journal ArticleDOI
TL;DR: In this paper, nonparametric resampling methods are introduced for the construction of confidence intervals for treatment effects associated with the primary and secondary endpoints of a clinical trial whose stopping rule is specified by a group sequential test.
Abstract: Resampling methods are introduced for the construction of confidence intervals for treatment effects associated with the primary and secondary endpoints of a clinical trial whose stopping rule is specified by a group sequential test. These methods are nonparametric and compare favourably with the exact methods that assume the responses to be normally distributed.

Journal ArticleDOI
TL;DR: Application to a real data set illustrates the advantages of distribution-free statistical methods, including freedom from distribution assumptions without loss of power, complete choice over test statistics, easy adaptation to design complexities and missing data, and considerable intuitive appeal.

Proceedings ArticleDOI
12 May 1998
TL;DR: A global motion estimation algorithm based on the Taylor expansion equation and robust regression technique using probabilistic thresholding is proposed that can improve both the coding efficiency and the quality of motion compensation on sequences involving camera movement.
Abstract: In the H.263 Version 2 (H.263+) coding standard, global motion compensation can be introduced by using the Reference Picture Resampling (Annex P) syntax. Such an application requires that the global motion parameters be estimated automatically. We propose a global motion estimation algorithm based on the Taylor expansion equation and a robust regression technique using probabilistic thresholding. The experimental results confirm that the proposed algorithm can improve both the coding efficiency and the quality of motion compensation on sequences involving camera movement.

Journal ArticleDOI
TL;DR: In this paper, the authors examined a new resampling methodology for estimating reference levels of 137Cs in uneroded locations and determined the influence of under-sampling on sediment redistribution and landscape stability.
Abstract: The objective of this study was to examine a new resampling methodology for estimating reference levels of 137Cs in uneroded locations. Accurate and precise measurement of 137Cs is required from reference locations to estimate long-term (c. 40 years) sediment redistribution (SRD) and landscape stability. Without reliable long-term, quantitative erosion data it is extremely difficult for land managers to make optimal decisions to ensure landscape sustainability. To determine the influence of 137Cs reference-site sampling, particularly under-sampling, on SRD and landscape stability, two statistical approaches were applied to a grid-based data set. Caesium-137 inventories in the reference location (n=36) indicated that the data were normally distributed, with a mean inventory of 2150±130 Bq m^-2 (±95% confidence band) and a coefficient of variation of 18%. The two approaches used to determine the effect of under-sampling were: (1) one-time random subsampling from the total sample collected, with subsamples ranging from n=3 to n=30; from these data, means and parametric confidence bands were calculated; and (2) random subsamples (n=3 to n=36) were selected from the total 137Cs reference sample, and each subsample was in turn resampled 1000 times with replacement to establish a sampling distribution of means, from which an empirically derived mean and 95% confidence bands were established. Caesium-137 activities determined from each approach were input into equations to estimate SRD for two cultivated fields. Results indicate that the one-time random sampling approach for subsamples of size ≤12 significantly over- or under-estimated net SRD, particularly for the gently sloping agricultural field. Computer-intensive resampling produced significantly better estimates of net SRD than the random one-sample approach, especially when a subsample of size three was used. Landscape stability, based on partitioning the agricultural fields into areas exhibiting erosion, stability and deposition, was better approximated for both fields by applying resampling.
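A minimal sketch of approach (2), bootstrapping a reference subsample 1000 times with replacement to obtain an empirical mean and 95% band; the "reference" inventories below are simulated stand-ins generated from the reported mean and coefficient of variation, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the 36 reference-site 137Cs inventories (Bq m^-2): roughly
# normal with the reported mean of ~2150 and ~18% coefficient of variation
reference = rng.normal(loc=2150, scale=0.18 * 2150, size=36)

def resampled_reference(sample, n_boot=1000, rng=None):
    """Bootstrap the subsample with replacement to obtain an empirical mean
    and 95% confidence band for the reference inventory."""
    rng = np.random.default_rng(rng)
    k = len(sample)
    boot_means = np.array([sample[rng.integers(0, k, k)].mean()
                           for _ in range(n_boot)])
    return boot_means.mean(), np.quantile(boot_means, [0.025, 0.975])

for k in (3, 12, 36):                    # effect of under-sampling
    sub = rng.choice(reference, size=k, replace=False)
    mean, (lo, hi) = resampled_reference(sub, rng=k)
    print("n = %2d  bootstrap mean = %6.0f  95%% band = [%6.0f, %6.0f]"
          % (k, mean, lo, hi))
```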

Journal ArticleDOI
TL;DR: In this paper, the authors developed permutation tests for estimated distribution functions, which are formed by averaging a functional of estimated distribution function that are calculated from independent sampling units, where the units may be a single response, a set of repeated responses, or a censored response.
Abstract: In this article we develop permutation tests for estimated distribution functions. The tests are formed by averaging a functional of estimated distribution functions that are calculated from independent sampling units, where the units may be a single response, a set of repeated responses, or a censored response. We study primarily two functionals—the difference in means functional and the Mann-Whitney functional, and two types of responses—repeated conditionally independent responses and censored responses. For repeated responses, the permutation test using the difference in means functional produces a permutation form of the corresponding mixed-effects test. A new permutation test is developed when we apply the Mann-Whitney functional to the repeated responses. This is a case in which the rank-transform method does not work. On the other hand, for right-censored or interval-censored data, we obtain permutation forms of standard rank tests using the Mann-Whitney functional (or weighted forms of t...
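As a rough sketch of the unit-level permutation idea for repeated responses, the example below reduces each independent sampling unit to the mean of its repeated measurements (a simplified use of the difference-in-means functional) and permutes treatment labels across units; the data, group sizes, and function names are invented.

```python
import numpy as np

def unit_permutation_test(units_a, units_b, n_perm=5000, rng=None):
    """Permutation test for repeated responses: each unit contributes the mean
    of its repeated measurements; treatment labels are permuted across units."""
    rng = np.random.default_rng(rng)
    summaries = np.array([np.mean(u) for u in units_a + units_b])
    n_a = len(units_a)
    obs = summaries[:n_a].mean() - summaries[n_a:].mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(summaries)
        if abs(perm[:n_a].mean() - perm[n_a:].mean()) >= abs(obs):
            count += 1
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
# Each unit has a random number of conditionally independent repeated responses
units_a = [rng.normal(0.4, 1.0, size=rng.integers(3, 7)) for _ in range(15)]
units_b = [rng.normal(0.0, 1.0, size=rng.integers(3, 7)) for _ in range(15)]
print("permutation p-value: %.4f" % unit_permutation_test(units_a, units_b, rng=1))
```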

Proceedings ArticleDOI
01 Dec 1998
TL;DR: The paper defines bootstrapping for random simulations with replicated runs on linear regression metamodels and concludes that bootstrapping Rao's lack-of-fit statistic is a good alternative to the F-test because it gives virtually identical results when the assumptions of the F-test are known to apply.
Abstract: Bootstrapping is a resampling technique that requires less computer time than simulation does. Bootstrapping, like simulation, must be defined for each type of application. The paper defines bootstrapping for random simulations with replicated runs. The focus is on linear regression metamodels. The metamodel's parameters are estimated through Generalized Least Squares. Its fit is measured through C.R. Rao's (1959) lack-of-fit F-statistic. The distribution of this statistic is estimated through bootstrapping. The main conclusions are: (i) it is not the regression residuals that should be bootstrapped, but the deviations that also occur in the standard deviation; (ii) bootstrapping Rao's lack-of-fit statistic is a good alternative to the F-test, because it gives virtually identical results when the assumptions of the F-test are known to apply, and somewhat better results otherwise.

Proceedings ArticleDOI
16 Aug 1998
TL;DR: An algorithm called MIXFIT is described that automatically undertakes the fitting of normal mixture models to multivariate data, using maximum likelihood via the EM algorithm, including the specification of suitable initial values if not supplied by the user.
Abstract: We consider the fitting of normal mixture models to multivariate data, using maximum likelihood via the EM algorithm. This approach requires the specification of an initial estimate of the vector of unknown parameters, or equivalently, of an initial classification of the data with respect to the components of the mixture model being fitted. We describe an algorithm called MIXFIT that automatically undertakes this fitting, including the specification of suitable initial values if not supplied by the user. The MIXFIT algorithm has several options, including the provision to carry out a resampling-based test for the number of components in the mixture model.
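The sketch below imitates two ingredients of such a fitting routine with scikit-learn's GaussianMixture: multiple random EM initializations per fit, and a parametric-bootstrap likelihood-ratio test for the number of components. It is a generic illustration, not the MIXFIT algorithm itself, and all tuning values are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_lrt_components(X, g, n_boot=50, rng=None):
    """Parametric-bootstrap likelihood-ratio test of g vs. g+1 normal mixture
    components, with multiple EM starts for each fit."""
    rng = np.random.default_rng(rng)
    def fit(data, k):
        return GaussianMixture(n_components=k, n_init=5, random_state=0).fit(data)
    m0, m1 = fit(X, g), fit(X, g + 1)
    lrt_obs = 2 * (m1.score(X) - m0.score(X)) * len(X)   # score() is mean log-lik
    null_lrt = []
    for _ in range(n_boot):
        Xb, _ = m0.sample(len(X))                         # simulate under H0: g components
        b0, b1 = fit(Xb, g), fit(Xb, g + 1)
        null_lrt.append(2 * (b1.score(Xb) - b0.score(Xb)) * len(Xb))
    p_value = (1 + np.sum(np.array(null_lrt) >= lrt_obs)) / (n_boot + 1)
    return lrt_obs, p_value

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, size=(150, 2)),
               rng.normal([3, 3], 1.0, size=(150, 2))])   # two well-separated groups
print("LRT and p-value for 2 vs 3 components:", bootstrap_lrt_components(X, g=2, rng=1))
```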

01 Jan 1998
TL;DR: A novel method for building and evaluating ANN under the condition of scarce data is proposed: Committee networks by resampling.
Abstract: Artificial neural networks (ANN) are nonparametric models and, like all nonparametric models, require a large number of observations to build and evaluate the model. But data are always finite and most often scarce in real-world applications. The question then becomes: how does a modeler build an ANN model on a limited sample size and obtain an accurate and reliable estimate of the model's error? The research presented in this paper proposes a novel method for building and evaluating ANN under the condition of scarce data: committee networks by resampling.
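A minimal sketch of a resampling-based committee, here built with BaggingRegressor around a small MLPRegressor (assuming a recent scikit-learn in which the base model parameter is named `estimator`); the data, architecture, and committee size are illustrative, not the paper's.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 80                                             # scarce data
X = rng.uniform(-2, 2, size=(n, 2))
y = X[:, 0] ** 2 - X[:, 1] + 0.1 * rng.standard_normal(n)

base = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
committee = BaggingRegressor(estimator=base, n_estimators=10,
                             bootstrap=True, random_state=0)

# Compare a single network with a committee trained on bootstrap resamples
for name, model in [("single network", base), ("committee (bagging)", committee)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print("%-20s mean CV MSE: %.3f" % (name, -scores.mean()))
```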

Journal ArticleDOI
TL;DR: In this paper, the bias of the Monte Carlo approximation to the additively corrected endpoints is shown to be of smaller order than in the case of direct coverage calibration, while the asymptotic variance is the same.
Abstract: Use of the iterated bootstrap is often recommended for calibration of bootstrap intervals, using either direct calibration of the nominal coverage probability (prepivoting), or additive correction of the interval endpoints. Monte Carlo resampling is a straightforward, but computationally expensive way to approximate the endpoints of bootstrap intervals. Booth and Hall examined the case of coverage calibration of Efron's percentile interval, and developed an asymptotic approximation for the error in the Monte Carlo approximation of the endpoints. Their results can be used to determine an approximately optimal allocation of resamples to the first and second level of the bootstrap. An extension of this result to the case of the additively corrected percentile interval shows that the bias of the Monte Carlo approximation to the additively corrected endpoints is of smaller order than in the case of direct coverage calibration, and the asymptotic variance is the same. Because the asymptotic bias is con...

Posted Content
TL;DR: In this paper, the authors discuss how computer-intensive methods may be used to adjust the test distribution so that the actual significance level coincides with the desired nominal level; otherwise, too many true null hypotheses will falsely be rejected.
Abstract: This dissertation contains five essays in the field of time series econometrics. The main issue discussed is the lack of coherence between small-sample and asymptotic inference. Frequently, in modern econometrics, distributional results are strictly valid only for a hypothetical infinite sample. Studies show that the attained actual level of a test may be considerably different from the nominal significance level, and as a consequence too many true null hypotheses will falsely be rejected. This leads, by extension, to applied users too often rejecting evidence in the data for theoretical predictions. Broadly, the thesis discusses how computer-intensive methods may be used to adjust the test distribution so that the actual significance level coincides with the desired nominal level. The first two essays focus on how to improve testing for persistence in data through a bootstrap procedure within a univariate framework. The remaining three essays are studies of multivariate time series models. The third essay considers the identification problem of the basic stationary vector autoregressive model, which is also the baseline econometric specification for maximum likelihood cointegration analysis. In the fourth essay the multivariate framework is expanded to allow for components of different orders of integration, and in this setting the paper discusses how fractional cointegration affects inference in maximum likelihood cointegration analysis. The fifth essay considers once again the bootstrap testing approach, now in a multivariate application, to correct inference on long-run relations in maximum likelihood cointegration analysis.