
Showing papers on "Sampling distribution published in 1993"


Book
09 Aug 1993
TL;DR: This book presents the theoretical justification for bootstrap statistical inference and applies bootstrap confidence intervals to statistics with unknown sampling distributions and to inference when traditional distributional assumptions are violated, closing with a discussion of the limitations of the bootstrap.
Abstract: PART ONE: INTRODUCTION: Traditional Parametric Statistical Inference; Bootstrap Statistical Inference; Bootstrapping a Regression Model; Theoretical Justification; The Jackknife; Monte Carlo Evaluation of the Bootstrap. PART TWO: STATISTICAL INFERENCE USING THE BOOTSTRAP: Bias Estimation; Bootstrap Confidence Intervals. PART THREE: APPLICATIONS OF BOOTSTRAP CONFIDENCE INTERVALS: Confidence Intervals for Statistics With Unknown Sampling Distributions; Inference When Traditional Distributional Assumptions Are Violated. PART FOUR: CONCLUSION: Future Work; Limitations of the Bootstrap; Concluding Remarks.
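
Since the book's central tool is the bootstrap confidence interval, a minimal sketch may help fix ideas. This is a generic percentile-interval illustration, not the book's own code; the data and the number of replications B are invented.

```python
# A minimal sketch of a bootstrap percentile confidence interval for a mean.
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=50)   # hypothetical skewed sample

B = 2000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(B)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {data.mean():.3f}, 95% percentile CI = ({lo:.3f}, {hi:.3f})")
```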

1,682 citations


Journal ArticleDOI
01 Sep 1993-Ecology
TL;DR: This paper attempts to introduce some distribution-free and robust techniques to ecologists and to offer a critical appraisal of the potential advantages and drawbacks of these methods.
Abstract: After making a case for the prevalence of nonnormality, this paper attempts to introduce some distribution-free and robust techniques to ecologists and to offer a critical appraisal of the potential advantages and drawbacks of these methods. The techniques presented fall into two distinct categories, methods based on ranks and "computer-intensive" techniques. Distribution-free rank tests have features that can be recommended. They free the practitioner from concern about the underlying distribution and are very robust to outliers. If the distribution underlying the observations is other than normal, rank tests tend to be more efficient than their parametric counterparts. The absence, in computing packages, of rank procedures for complex designs may, however, severely limit their use for ecological data. An entire body of novel distribution-free methods has been developed in parallel with the increasing capacities of today's computers to process large quantities of data. These techniques either reshuffle or resample a data set (i.e., sample with replacement) in order to perform their analyses. The former we shall refer to as "permutation" or "randomization" methods and the latter as "bootstrap" techniques. These computer-intensive methods provide new alternatives for the problem of a small and/or unbalanced data set, and they may be the solution for parameter estimation when the sampling distribution cannot be derived analytically. Caution must be exercised in the interpretation of these estimates because confidence limits may be too small.
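
To make the "reshuffle" versus "resample" distinction concrete, here is a minimal permutation test for a difference in means. The lognormal data, group sizes, and replication count are invented, and this is a generic sketch rather than the paper's procedure.

```python
# A sketch of a two-sample permutation ("reshuffle") test of mean difference.
import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(size=12)            # hypothetical non-normal ecological data
y = rng.lognormal(mean=0.5, size=15)

observed = y.mean() - x.mean()
pooled = np.concatenate([x, y])
n_perm, count = 5000, 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)    # reshuffle group labels
    diff = perm[x.size:].mean() - perm[:x.size].mean()
    if abs(diff) >= abs(observed):
        count += 1
print("two-sided permutation p-value:", count / n_perm)
```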

462 citations


Journal ArticleDOI
TL;DR: This article develops Bayesian model-based theory for post-stratification, a common technique in survey analysis for incorporating population distributions of variables into survey estimates, such as functions of means and totals.
Abstract: Post-stratification is a common technique in survey analysis for incorporating population distributions of variables into survey estimates. The basic technique divides the sample into post-strata, and computes a post-stratification weight w_h = r P_h / r_h for each sample case in post-stratum h, where r_h is the number of survey respondents in post-stratum h, P_h is the population proportion from a census, and r is the respondent sample size. Survey estimates, such as functions of means and totals, then weight cases by w_h. Variants and extensions of the method include truncation of the weights to avoid excessive variability and raking to a set of two or more univariate marginal distributions. Literature on post-stratification is limited and has mainly taken the randomization (or design-based) perspective, where inference is based on the sampling distribution with population values held fixed. This article develops Bayesian model-based theory for the method. A basic normal post-stratification mod...
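
The weighting step can be transcribed almost directly. A small sketch, with invented strata counts and census proportions:

```python
# Direct transcription of the post-stratification weight w_h = r * P_h / r_h.
import numpy as np

r_h = np.array([120, 260, 420])      # respondents per post-stratum (invented)
P_h = np.array([0.25, 0.35, 0.40])   # census population proportions (invented)
r = r_h.sum()                        # total respondent sample size

w_h = r * P_h / r_h                  # weight applied to every case in stratum h
print(w_h)

# A weighted mean then uses these weights case by case:
# sum over respondents i of w_h(i) * y_i, divided by sum of w_h(i).
```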

253 citations


Journal ArticleDOI
TL;DR: A universally accepted definition for vector correlation in oceanography and meteorology does not presently exist; to address this need, a generalized correlation coefficient, originally proposed by Hooper and later expanded upon by Jupp and Mardia, is explored.
Abstract: A universally accepted definition for vector correlation in oceanography and meteorology does not presently exist. To address this need, a generalized correlation coefficient, originally proposed by Hooper and later expanded upon by Jupp and Mardia, is explored. A short history of previous definitions is presented. Then the definition originally proposed by Hooper is presented together with supporting theory and associated properties. The most significant properties of this vector correlation coefficient are that it is a generalization of the square of the simple one-dimensional correlation coefficient, and when the vectors are independent, its asymptotic distribution is known; hence, it can be used for hypothesis testing. Because the asymptotic results hold only for large samples, and in practical situations only small samples are often available, modified sampling distributions are derived using simulation techniques for samples as small as eight. It is symmetric with respect to its arguments a...
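
One concrete form of such a statistic, stated here as an assumption rather than the paper's exact definition, is the trace quantity trace(S11^{-1} S12 S22^{-1} S21) built from the blocks of the joint sample covariance matrix; in one dimension it reduces to the squared simple correlation, matching the property described above. The data below are synthetic.

```python
# A sketch of a generalized (Hooper / Jupp-Mardia type) vector correlation.
import numpy as np

rng = np.random.default_rng(2)
n = 200
u = rng.standard_normal((n, 2))                   # first 2-D vector series
v = 0.6 * u + 0.8 * rng.standard_normal((n, 2))   # correlated second series

z = np.hstack([u, v])
z = z - z.mean(axis=0)
S = z.T @ z / n                                   # joint sample covariance
S11, S12 = S[:2, :2], S[:2, 2:]
S21, S22 = S[2:, :2], S[2:, 2:]

rho2 = np.trace(np.linalg.inv(S11) @ S12 @ np.linalg.inv(S22) @ S21)
print("squared vector correlation:", rho2)        # equals r^2 when 1-D
```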

132 citations


Journal ArticleDOI
TL;DR: This article uses a Monte Carlo simulation of the sampling process to evaluate the resulting estimates of the characteristics of the underlying drop population, including the liquid water concentration and the reflectivity factor; the maximum particle size in each sample is also determined.
Abstract: Because of the randomness associated with sampling from a population of raindrops, variations in the data reflect some undetermined mixture of sampling variability and inhomogeneity in the precipitation. Better understanding of the effects of sampling variability can aid in interpreting drop size observations. This study begins with a Monte Carlo simulation of the sampling process and then evaluates the resulting estimates of the characteristics of the underlying drop population. The characteristics considered include the liquid water concentration and the reflectivity factor; the maximum particle size in each sample is also determined. The results show that skewness in the sampling distributions when the samples are small (which is the usual case in practice) produces a propensity to underestimate all of the characteristic quantities. In particular, the distribution of the sample maximum drop sizes suggests that it may be futile to try to infer an upper truncation point for the size distribution...
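
A minimal Monte Carlo sketch of the kind of sampling experiment described above: draw small Poisson-sized samples of drop diameters, then compute moments that stand in for liquid water content (proportional to the sum of D^3) and reflectivity (sum of D^6). The exponential size distribution and the sample sizes are assumptions, not the paper's configuration.

```python
# Monte Carlo sketch: skewed sampling distributions from small drop samples.
import numpy as np

rng = np.random.default_rng(3)
mean_n, trials = 10, 20000          # small expected sample size, as in practice

lwc, refl, dmax = [], [], []
for _ in range(trials):
    n = rng.poisson(mean_n)
    d = rng.exponential(scale=1.0, size=n)   # hypothetical drop diameters
    lwc.append(np.sum(d**3))
    refl.append(np.sum(d**6))
    dmax.append(d.max() if n > 0 else 0.0)

# Skewness: the median falls well below the mean, so a single small sample
# tends to underestimate the population quantity.
print("reflectivity mean vs median:", np.mean(refl), np.median(refl))
```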

101 citations


Journal ArticleDOI
TL;DR: A textbook covering probability, random variables and their distributions, limiting distributions, statistics and sampling distributions, point and interval estimation, tests of hypotheses, contingency tables and goodness-of-fit, nonparametric methods, regression and linear models, and reliability and survival distributions.
Abstract: Probability; Random Variables and Their Distributions; Special Probability Distributions; Joint Distributions; Properties of Random Variables; Functions of Random Variables; Limiting Distributions; Statistics and Sampling Distributions; Point Estimation; Sufficiency and Completeness; Interval Estimation; Tests of Hypotheses; Contingency Tables and Goodness-of-Fit; Nonparametric Methods; Regression and Linear Models; Reliability and Survival Distributions; Answers to Selected Exercises.

95 citations



Journal ArticleDOI
J.S. Sadowsky
01 Jan 1993
TL;DR: It is shown that the embedded parametric family of exponentially twisted distributions has a certain uniform asymptotic stability property, and the technique is stable even if the optimal twisting parameter(s) cannot be precisely determined.
Abstract: Estimation of the large deviations probability p_n = P(S_n >= γn) via importance sampling is considered, where S_n is a sum of n i.i.d. random variables. It has been previously shown that within the nonparametric candidate family of all i.i.d. (or, more generally, Markov) distributions, the optimized exponentially twisted distribution is the unique asymptotically optimal sampling distribution. As n → ∞, the sampling cost required to stabilize the normalized variance grows with strictly positive exponential rate for any suboptimal sampling distribution, while this sampling cost for the optimal exponentially twisted distribution is only O(n^{1/2}). Here, it is established that the optimality is actually much stronger. As n → ∞, this solution simultaneously stabilizes all error moments of both the sample mean and the sample variance estimators with sampling cost O(n^{1/2}). In addition, it is shown that the embedded parametric family of exponentially twisted distributions has a certain uniform asymptotic stability property. The technique is stable even if the optimal twisting parameter(s) cannot be precisely determined.
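
A sketch of importance sampling with an exponentially twisted distribution for p_n = P(S_n >= γn) in the simplest setting of i.i.d. N(0,1) summands, where twisting by θ shifts the mean to θ and the optimal twist is θ = γ. This standard Gaussian illustration is an assumption, not the paper's general construction.

```python
# Importance sampling of a large-deviations probability via exponential twisting.
import numpy as np

rng = np.random.default_rng(4)
n, gamma, trials = 30, 0.5, 100000
theta = gamma                                    # optimized twisting parameter

s = rng.normal(loc=theta, scale=1.0, size=(trials, n)).sum(axis=1)
weights = np.exp(-theta * s + n * theta**2 / 2)  # likelihood ratio dP/dP_theta
est = np.mean((s >= gamma * n) * weights)
print("importance-sampling estimate of p_n:", est)
```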

70 citations


Journal ArticleDOI
TL;DR: In this article, a Bayesian bootstrap for a censored data model is introduced; its small-sample distributional properties are discussed and found to be similar to those of Efron's bootstrap for censored data.
Abstract: A Bayesian bootstrap for a censored data model is introduced. Its small sample distributional properties are discussed and found to be similar to Efron's bootstrap for censored data. In the absence of censoring, the Bayesian bootstrap for censored data reduces to Rubin's Bayesian bootstrap for complete data. A first-order large-sample theory is developed. This theory shows that both censored data bootstraps are consistent bootstraps for approximating the sampling distribution of the Kaplan-Meier estimator. It also shows that both bootstraps are consistent bootstraps for approximating a posterior distribution of the survival function with respect to each member of the class of conjugate beta-neutral process priors.
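
In the absence of censoring the method reduces to Rubin's Bayesian bootstrap, which is easy to sketch: posterior draws of a functional use Dirichlet(1, ..., 1) weights on the observed points rather than multinomial resampling. The data and number of draws below are invented.

```python
# Rubin's Bayesian bootstrap for complete data (the no-censoring special case).
import numpy as np

rng = np.random.default_rng(5)
data = rng.gamma(shape=2.0, size=40)        # hypothetical survival times

draws = 4000
w = rng.dirichlet(np.ones(data.size), size=draws)   # Dirichlet(1,...,1) weights
posterior_means = w @ data
print("posterior mean:", posterior_means.mean())
print("95% interval:", np.percentile(posterior_means, [2.5, 97.5]))
```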

66 citations


Book
07 Jan 1993
TL;DR: This book covers both multivariate analysis and matrix algebra, focusing on tests of hypotheses such as the Lagrange multiplier test; it is suitable for beginning graduate courses in mathematical statistics and econometrics.
Abstract: Covers both multivariate analysis and matrix algebra. This work focuses on tests of hypotheses such as the Lagrange multiplier test. It discusses asymptotic distribution theory, and characteristic functions in depth. It is suitable for beginning graduate courses in mathematical statistics and econometrics.

64 citations


Book ChapterDOI
TL;DR: In this paper, the authors review Efron's method called the bootstrap and briefly mention its relation to the jackknife, with a particular emphasis on econometric applications.
Abstract: Publisher Summary This chapter reviews Efron's method called the bootstrap, and briefly mentions its relation to the jackknife, with a particular emphasis on econometric applications. Bootstrap literature has made tremendous progress in solving an old statistical problem: making reliable confidence statements in complicated small-sample, multi-step, dependent, and non-normal cases. Resampling methods provide radically new solutions to several modeling problems involving interdependence, simultaneity, nonlinearity, nonstationarity, instability, nonnormality, heteroscedasticity, small or missing data, the Hawthorne effect, and more. The bootstrap handles these problems nonparametrically and intuitively, avoiding complicated power functions, Cramer-Rao lower bounds, bias corrections for Wald or Lagrange multiplier tests, and such. Many early applications of the bootstrap in econometrics attempted to provide an alternative to asymptotic standard error estimates. The jackknife is also used to find improved estimates of the standard errors. The bootstrap offers a potentially valuable insight into the sampling distributions, beyond simpler and improved estimation of standard errors. When two or more statistical tests are used, their power is difficult to determine analytically. The bootstrap sampling distribution can eliminate the need for tedious computations of the power in some cases. The chapter also discusses the post hoc technique for cleverly manipulating the bootstrap replications, computational aspects of bootstrap methods, and simultaneous equation and dynamic econometric models, which require a special setup different from the traditional bootstrap.
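
The early econometric use mentioned above, bootstrap alternatives to asymptotic standard errors, can be sketched with a pairs bootstrap for an OLS slope. The data-generating values are invented and this is a generic illustration, not the chapter's code.

```python
# Pairs-bootstrap standard error for an OLS slope under heavy-tailed errors.
import numpy as np

rng = np.random.default_rng(6)
n = 60
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=n)   # non-normal errors

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

B = 2000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)               # resample (x_i, y_i) pairs
    boot[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0][1]
print("slope:", beta_hat[1], "bootstrap SE:", boot.std(ddof=1))
```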

Journal ArticleDOI
01 Nov 1993-Oikos
TL;DR: In this article, the mean and standard error of the population growth rate are approximated using a Taylor's series expansion; assuming a particular type of sampling distribution for the population growth rate, one may use this analytic estimate to assign confidence limits.
Abstract: Estimates of population growth rates are subject to uncertainties because of errors incurred in estimating the individual rates of fecundity, survival and growth. However, the non-linear relationship between the population growth rate and the vital rates makes this a difficult task, and confidence limits are rarely assigned to growth rate estimates. The mean and standard error of the population growth rate may be approximated using a Taylor's series expansion. Assuming a particular type of sampling distribution for the population growth rate, one may use this analytic estimate to assign confidence limits. Available simulations do not enable generalizations concerning the performance of this analytic approach, especially for estimates obtained for age- or stage-structured populations.
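
The Taylor-series (delta-method) idea is easy to sketch numerically: approximate var(lambda) by summing the squared sensitivities of the dominant eigenvalue with respect to each vital rate, times the vital-rate sampling variances. The 2x2 stage matrix and the variances below are made-up illustrations, not the paper's data.

```python
# Delta-method approximation to the variance of the population growth rate.
import numpy as np

A = np.array([[0.3, 1.2],        # hypothetical stage-structured projection matrix
              [0.5, 0.8]])
var_A = 0.01 * np.ones_like(A)   # assumed sampling variances of the vital rates

def lam(M):
    # Dominant eigenvalue (real for a nonnegative projection matrix).
    return np.max(np.linalg.eigvals(M).real)

eps, var_lambda = 1e-6, 0.0
for i in range(2):
    for j in range(2):
        Ap = A.copy()
        Ap[i, j] += eps
        sens = (lam(Ap) - lam(A)) / eps      # numerical sensitivity d(lambda)/da_ij
        var_lambda += sens**2 * var_A[i, j]
print("lambda:", lam(A), "approximate SE:", var_lambda**0.5)
```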

Journal ArticleDOI
TL;DR: In this article, the authors develop statistical inference based on the maximum likelihood method in elliptical populations with an unknown density function, and show that the method assuming the multivariate normal distribution, using the sample mean and the sample covariance matrix, is basically correct even for elliptical populations under a certain kurtosis adjustment, but is not statistically efficient.
Abstract: In this article we develop statistical inference based on the maximum likelihood method in elliptical populations with an unknown density function. The method assuming the multivariate normal distribution, using the sample mean and the sample covariance matrix, is basically correct even for elliptical populations under a certain kurtosis adjustment, but is not statistically efficient, especially when the kurtosis of the population distribution has higher than moderate values. On the other hand, several methods of statistical inference assuming a particular family (e.g., multivariate T distribution) of elliptical distributions have been recommended as a robust procedure against outliers or distributions with heavy tails. Such inference also will be important to maintain a high efficiency of statistical inference in elliptical populations. In practice, however, it is very difficult to choose an appropriate family of elliptical distributions, and one may misspecify the family. Furthermore, extra parameters (...

Journal Article
TL;DR: Computer simulations conducted using real DNA typing data indicate that, while the sampling distribution of estimated genotype probabilities is not symmetric around the point estimate, the confidence interval of estimated (single-locus or multilocus) genotype probabilities can be obtained from the sampling distribution of a logarithmic transformation of the estimated values.
Abstract: Multilocus genotype probabilities, estimated using the assumption of independent association of alleles within and across loci, are subject to sampling fluctuation, since allele frequencies used in such computations are derived from samples drawn from a population. We derive exact sampling variances of estimated genotype probabilities and provide simple approximations of sampling variances. Computer simulations conducted using real DNA typing data indicate that, while the sampling distribution of estimated genotype probabilities is not symmetric around the point estimate, the confidence interval of estimated (single-locus or multilocus) genotype probabilities can be obtained from the sampling distribution of a logarithmic transformation of the estimated values. This, in turn, allows an examination of heterogeneity of estimators derived from data on different reference populations. Applications of this theory to DNA typing data at VNTR loci suggest that use of different reference population data may yield significantly different estimates. However, significant differences generally occur with rare (less than 1 in 40,000) genotype probabilities. Conservative estimates of five-locus DNA profile probabilities are always less than 1 in 1 million in an individual from the United States, irrespective of the racial/ethnic origin.
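
The log-scale interval construction can be sketched by simulation: draw allele-frequency estimates, form a genotype probability under independence, and note that the raw-scale sampling distribution is far more skewed than the log-scale one. The allele frequencies, sample size, and single-locus heterozygote example are assumptions, not the paper's data.

```python
# Skewness of genotype-probability estimates and a log-scale interval.
import numpy as np

rng = np.random.default_rng(7)
p_a, p_b, n = 0.05, 0.10, 200           # assumed allele frequencies, genes sampled
sims = 20000
pa_hat = rng.binomial(n, p_a, size=sims) / n
pb_hat = rng.binomial(n, p_b, size=sims) / n
g_hat = 2 * pa_hat * pb_hat             # heterozygote probability estimate

g_hat = g_hat[g_hat > 0]                # log requires positive estimates
log_g = np.log(g_hat)
skew = lambda z: np.mean((z - z.mean())**3) / z.std()**3
print("skewness, raw vs log scale:", skew(g_hat), skew(log_g))
print("interval via log scale:", np.exp(np.percentile(log_g, [2.5, 97.5])))
```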

Journal ArticleDOI
TL;DR: It is proved that, under a Jeffreys' type improper prior on the scale parameter, posterior inference on the location parameters is the same for all l_q-spherical sampling models with common q.
Abstract: The class of multivariate l_q-spherical distributions is introduced and defined through their isodensity surfaces. We prove that, under a Jeffreys' type improper prior on the scale parameter, posterior inference on the location parameters is the same for all l_q-spherical sampling models with common q. This gives us perfect inference robustness with respect to any departures from the reference case of independent sampling from the exponential power distribution.

Journal ArticleDOI
TL;DR: In this article, a second-order property for the small-sample Bayesian bootstrap clone (BBC) in unweighted i.i.d. sampling is given, and it is shown that in weighted sampling models, BBC approximations to a posterior distribution of the reciprocal of the weighted mean are asymptotically accurate.
Abstract: Bayesian statistical inference for sampling from weighted distribution models is studied. Small-sample Bayesian bootstrap clone (BBC) approximations to the posterior distribution are discussed. A second-order property for the BBC in unweighted i.i.d. sampling is given. A consequence is that BBC approximations to a posterior distribution of the mean and to the sampling distribution of the sample average, can be made asymptotically accurate by a proper choice of the random variables that generate the clones. It also follows from this result that in weighted sampling models, BBC approximations to a posterior distribution of the reciprocal of the weighted mean are asymptotically accurate; BBC approximations to a sampling distribution of the reciprocal of the empirical weighted mean are also asymptotically accurate.

Journal ArticleDOI
01 Jul 1993-Genetics
TL;DR: Exact sampling distributions of several statistics are derived here, using combinatorial approaches parallel to the classical occupancy problem to help overcome the difficulty of applying large sample approximations for hypothesis testing purposes.
Abstract: In categorical genetic data analysis when the sampling units are classified into an arbitrary number of distinct classes, sometimes the sample size may not be large enough to apply large sample approximations for hypothesis testing purposes. Exact sampling distributions of several statistics are derived here, using combinatorial approaches parallel to the classical occupancy problem to help overcome this difficulty. Since the multinomial probabilities can be unequal, this situation is described as a generalized occupancy problem. The sampling properties derived are used to examine nonrandomness of occurrence of mutagen-induced mutations across loci, to devise tests of Hardy-Weinberg proportions of genotype frequencies in the presence of a large number of alleles, and to provide a global test of gametic phase disequilibrium of several restriction site polymorphisms.
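
The "generalized occupancy" setting is easy to picture by simulation: with unequal multinomial probabilities, tabulate the distribution of the number of occupied classes (the paper derives this exactly by combinatorial arguments). The class probabilities and sample size below are invented.

```python
# Monte Carlo stand-in for the generalized occupancy distribution.
import numpy as np

rng = np.random.default_rng(8)
probs = np.array([0.4, 0.3, 0.15, 0.1, 0.05])   # unequal class probabilities
n, sims = 12, 50000

counts = rng.multinomial(n, probs, size=sims)
occupied = (counts > 0).sum(axis=1)             # number of non-empty classes
values, freq = np.unique(occupied, return_counts=True)
print(dict(zip(values.tolist(), (freq / sims).round(4).tolist())))
```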

Journal ArticleDOI
TL;DR: For a spherically symmetric multivariate normal random sample, the asymptotic distribution of the largest interpoint Euclidean distance is derived in this article, where the number of interpoint distances exceeding a high level is shown to have a limiting Poisson distribution.
Abstract: For a spherically symmetric multivariate normal random sample, the asymptotic distribution of the largest interpoint Euclidean distance is derived. The number of interpoint distances exceeding a high level is shown to have a limiting Poisson distribution.
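
A quick simulation of the statistic studied above: the largest interpoint Euclidean distance in a spherical multivariate normal sample, together with the count of distances exceeding a high level (which the paper shows is asymptotically Poisson). The dimension, sample size, and level are arbitrary choices.

```python
# Largest interpoint distance in a spherical multivariate normal sample.
import numpy as np

rng = np.random.default_rng(9)
n, d, level = 100, 3, 5.0
x = rng.standard_normal((n, d))

diff = x[:, None, :] - x[None, :, :]
dist = np.sqrt((diff**2).sum(axis=-1))
pairs = dist[np.triu_indices(n, k=1)]     # each interpoint distance once
print("largest interpoint distance:", pairs.max())
print("count exceeding the level:  ", np.sum(pairs > level))
```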

Journal ArticleDOI
TL;DR: A way in which the UMVUE for a normal mean can be calculated using software capable of determining the operating characteristics of a group-sequential test is presented.

Journal ArticleDOI
TL;DR: In this paper, the authors present a method to carry out Bayesian predictive inference for a finite population proportion, which is useful for many surveys of this type and yields simple analytical expressions for the prior and posterior mean and variance.
Abstract: Given binary data from a two-stage cluster sample, we present a method to carry out Bayesian predictive inference for a finite population proportion. Our probabilistic specification should be useful for many surveys of this type and yields simple analytical expressions for the prior and posterior mean and variance. Within cluster k, we assume that the Y_ki are a random sample from the Bernoulli distribution with probability θ_k. Conditional on β and τ, θ_1, ..., θ_N are a random sample from a beta distribution. Finally, β has a discrete distribution with specified probabilities. We use data from the National Health Interview Survey to illustrate the methodology and to show how to choose values for the parameters in the prior distribution.
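
A forward simulation of the two-stage model described above: cluster success probabilities θ_k drawn from a beta distribution, then binary responses within clusters. The (β, τ) mean/precision parameterization of the beta distribution and all numeric values are assumptions for illustration.

```python
# Forward simulation of the beta-Bernoulli two-stage cluster model.
import numpy as np

rng = np.random.default_rng(10)
N, n_per = 30, 20                  # clusters, respondents per cluster (invented)
beta, tau = 0.3, 15.0              # assumed mean and precision of theta_k

theta = rng.beta(beta * tau, (1 - beta) * tau, size=N)   # cluster probabilities
y = rng.binomial(1, theta[:, None], size=(N, n_per))     # binary responses
print("finite-population proportion estimate:", y.mean())
```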

Journal ArticleDOI
TL;DR: In this article, the authors extend the M and M motif to parametric and nonparametric statistics, particularly with reference to power, robustness, scale of measurement, the null hypothesis, and generality of application.
Abstract: Some Myths Concerning Parametric and Nonparametric Tests, by Hunter and May. Hunter and May offer a paper on myths and misconceptions (M and M's) that is an excellent companion article to Brewer (1985), who wrote on M and M's in statistical textbooks. Brewer addressed hypothesis testing, confidence intervals, sampling distributions, and the Central Limit Theorem. Hunter and May extend the M and M motif to parametric and nonparametric statistics, particularly with reference to power, robustness, scale of measurement, the null hypothesis, and generality of application. In the section on power, Hunter and May point out that when underlying assumptions of the parametric test are violated, nonparametric tests may be more powerful. They call this a "knee-jerk argument" because this fact is usually ignored in selecting tests. In considering alternatives to normal theory statistics, they offer what they consider to be the definitive argument: "... the reason some nonparametric tests are less powerful than parametric tests is not because they are nonparametric tests per se, but because they are rank or nominal-scale tests and therefore are based on less information." In contradistinction to their reasoning, consider the following analogy: an accomplished opera singer sings, and an off-key beginning tuba player plays, the dots and dashes of the International Morse Code. While some may consider the opera singer's notes to be sounds of music, there is, in fact, no more information in those dots and dashes than in the off-key notes of the beginning tuba player, with respect to the code. If the complexity and subtlety of what is often imagined to be included in interval scales is noise and not signal, parametric tests will have no more information available than a rank test, and will be less efficient by trying to discriminate a signal from noise when in fact there isn't any. This is my interpretation of Hemelrijk (1961): the cost of being robust with respect to both Type I and Type II error under nonnormality precludes the t test from remaining the Uniformly Most Powerful Unbiased test under nonnormality. In the M and M section on the robustness of parametric tests, they cite Micceri (1989) as evidence of the widespread problem of nonnormality in psychology and education data. Yet there are many, many Monte Carlo studies that demonstrate that normal theory tests such as the F and t tests are robust to departures from normality. These studies used well-known mathematical functions (e.g., Cauchy, chi-square, exponential, uniform) to model real data and showed that so long as sample sizes are about equal, sample sizes are at least 20-25 per group, and the tests are two-tailed rather than one-tailed, the t test is robust. Micceri's (1989) argument, echoed by Hunter and May, was that those mathematical functions are poor models of psychology and education data, and consequently Monte Carlo studies based on them are not convincing. His study pointed out how radical real distributions may be, such as the so-called multimodal lumpy, extreme bimodal, extreme asymmetric, digit preference, and discrete mass at zero with gap distributions.
Nevertheless, a Monte Carlo study by Sawilowsky and Blair (1992), which sampled with replacement from Micceri's data sets, demonstrated that so long as sample sizes were equal and about 20-25, and tests were two-tailed, the independent- and dependent-samples t tests were robust by any definition. The real issue of the effects of nonnormality, as indicated by Sawilowsky and Blair (1992), is on the comparative power, not robustness, of the t test. For example, a Monte Carlo comparison (10,000 repetitions) of the power of the t test and the Wilcoxon test, with sample sizes of (5, 15) drawn from an extreme asymmetric distribution identified by Micceri (1989), indicated that at the .05 alpha level and an effect size of .20, the power of the Wilcoxon test was …
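
The kind of Monte Carlo power comparison described above is straightforward to sketch. The lognormal stand-in for an "extreme asymmetric" distribution, the shift of 0.2, and the replication count are assumptions, not Sawilowsky and Blair's design.

```python
# Monte Carlo power: t test vs Wilcoxon rank-sum under a skewed distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
reps, alpha, shift = 5000, 0.05, 0.2
t_rej = w_rej = 0
for _ in range(reps):
    x = rng.lognormal(size=5)            # unequal group sizes (5, 15)
    y = rng.lognormal(size=15) + shift   # shifted alternative
    if stats.ttest_ind(x, y).pvalue < alpha:
        t_rej += 1
    if stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
        w_rej += 1
print("t power:", t_rej / reps, "Wilcoxon power:", w_rej / reps)
```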

Journal ArticleDOI
TL;DR: The multivariate portmanteau test proposed by Hosking (1980) for testing the adequacy of an autoregressive moving average model is based on the first s residual autocovariances of the fitted model.
Abstract: The multivariate portmanteau test proposed by Hosking (1980) for testing the adequacy of an autoregressive moving average model is based on the first s residual autocovariances of the fitted model. In practice a value for s is chosen depending on the sample size n, mostly s = 20 for n between 50 and 200. In this paper it is shown by simulations that the usual choice of s = 20 often leads to a significant deviation of the sample distribution of the test statistic P_m from the asymptotic chi-square distribution. In the case of pure multivariate AR models, the Kolmogorov-Smirnov test is used to find those values of s for which the sample distribution shows the best agreement with the chi-square distribution. In this manner s depends not only on the sample size n but also on the order of the model p and the dimension m. A table for the best choice of s is given for n between 100 and 1000, p between 1 and 5, and m between 1 and 12.

Journal ArticleDOI
TL;DR: The authors analyze posterior distributions of the moving average parameter in the first-order case and sampling distributions of the corresponding maximum likelihood estimator, and conclude that flat-prior posterior distributions do not pile up at unity regardless of the parameter's proximity to unity.
Abstract: We analyze posterior distributions of the moving average parameter in the first-order case and sampling distributions of the corresponding maximum likelihood estimator. Sampling distributions “pile up” at unity when the true parameter is near unity; hence if one were to difference such a process, estimates of the moving average component of the resulting series would spuriously tend to indicate that the process was overdifferenced. Flat-prior posterior distributions do not pile up, however, regardless of the parameter's proximity to unity; hence caution should be taken in dismissing evidence that a series has been overdifferenced.
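
The "pile-up" effect on the sampling-distribution side can be sketched by estimating an MA(1) parameter near unity over a grid: a noticeable fraction of estimates lands exactly on the boundary. Grid minimization of the conditional sum of squares is a simplification of full maximum likelihood, and all numeric settings are invented.

```python
# Pile-up of MA(1) estimates at unity when the true parameter is near unity.
import numpy as np

rng = np.random.default_rng(13)
n, theta_true, sims = 100, 0.95, 500
grid = np.linspace(0.0, 1.0, 201)

def css(y, theta):
    # Conditional sum of squares: e_t = y_t - theta * e_{t-1}, e_0 given y_0.
    e = np.zeros_like(y)
    for t in range(len(y)):
        e[t] = y[t] - (theta * e[t - 1] if t > 0 else 0.0)
    return np.sum(e**2)

hits = 0
for _ in range(sims):
    eps = rng.standard_normal(n)
    y = eps + theta_true * np.r_[0.0, eps[:-1]]       # MA(1) process
    theta_hat = grid[np.argmin([css(y, th) for th in grid])]
    hits += theta_hat == 1.0
print("fraction of estimates exactly at unity:", hits / sims)
```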


Journal ArticleDOI
01 Oct 1993-Networks
TL;DR: This paper shows how information obtained during an iterative procedure for computing the probability that l ≤ δ < u can be used for designing an efficient Monte Carlo sampling plan that performs sampling at few capacity distributions and uses sampling data to estimate the probabilities of interest at each distribution in ℱ.
Abstract: Consider a flow network whose nodes do not restrict flow transmission and arcs have random, discrete, and independent capacities. Let s and t be a pair of selected nodes, let δ denote the value of a maximum s-t flow, and let Γ denote a set of s-t cuts. Also, let ℱ denote a set of independent joint capacity distributions with common state space. For fixed l < u, this paper develops methods for approximating the probability that l ≤ δ < u and the probability that a cut in Γ is minimum given that l ≤ δ < u for each distribution in ℱ. Since these evaluations are NP-hard problems, it shows how information obtained during an iterative procedure for computing the probability that l ≤ δ < u can be used for designing an efficient Monte Carlo sampling plan that performs sampling at few capacity distributions and uses sampling data to estimate the probabilities of interest at each distribution in ℱ. The set of sampling distributions is chosen by solving an uncapacitated facility location problem. The paper also describes techniques for computing confidence intervals and includes an algorithm for implementing the sampling experiment. An example illustrates the efficiency of the proposed method. This method is applicable to the computation of performance measures for networks whose elements have discrete random weights (lengths, gains, etc.) for a set of joint weight distributions with common state space. © 1993 by John Wiley & Sons, Inc.
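
A stripped-down Monte Carlo estimate of P(l ≤ δ < u) for a tiny network with independent discrete arc capacities, using SciPy's maximum-flow routine. The 4-node topology, capacity distribution, and crude (non-variance-reduced) sampling are invented stand-ins; the paper's facility-location sampling plan is not reproduced here.

```python
# Crude Monte Carlo for P(l <= delta < u) with random discrete arc capacities.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

rng = np.random.default_rng(18)
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]   # s = 0, t = 3 (invented)
levels = np.array([0, 1, 2])                        # possible arc capacities
l, u, sims, hits = 2, 4, 2000, 0

for _ in range(sims):
    caps = rng.choice(levels, size=len(edges))      # independent capacities
    graph = np.zeros((4, 4), dtype=np.int32)
    for (a, b), c in zip(edges, caps):
        graph[a, b] = c
    flow = maximum_flow(csr_matrix(graph), 0, 3).flow_value
    hits += (l <= flow < u)
print("estimated P(l <= delta < u):", hits / sims)
```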

Book
01 Jan 1993
TL;DR: This book discusses statistical thinking for process management and quality improvement and introduces the fundamental elements of statistical analysis.
Abstract: 1. INTRODUCTION TO STATISTICS AND STATISTICAL THINKING Introduction / The Fundamental Elements of Statistical Analysis / The Evaluation of Statistical Analyses / Obtaining Data / Statistical Thinking for Process Management and Quality Improvement / An Introduction to the Design of Experiments / Statistical Notation / Use of Computers in Statistical Analysis / Looking Ahead / Summary / Appendix 1: Introduction to MINITAB, Excel, and JMP IN 2. EXPLORING AND SUMMARIZING DATA Introduction / Types of Data / Distributions of Data / Measures of Location: The Center of the Data / Measures of Variation / Measures of Relative Standing / Relationships Between Two Variables / Exploring and Summarizing Data: A Comprehensive Example / Summary / Appendix 2A: Computer Instructions for Using MINITAB, Excel, and JMP IN / Appendix 2B: Your Turn to Perform a Statistical Study 3. PROBABILITY, RANDOM VARIABLES, AND PROBABILITY DISTRIBUTIONS Bridging to New Topics / The Basic Elements of Probability / Interpretations and Fundamental Rules of Probabilities / Discrete and Continuous Random Variables / Probability Distributions of Discrete Random Variables / Probability Distributions of Continuous Random Variables / Expected Values of Random Variables / Summary / Appendix 3: Calculus-Based Introduction to Probability Distributions for Continuous Random Variables 4. SOME IMPORTANT PROBABILITY DISTRIBUTIONS Bridging to New Topics / The Binomial Distribution / The Normal Distribution / The Normal Distribution as an Approximation to the Binomial Distribution / Summary / Appendix 4: Computer Instructions for Using MINITAB, Excel, and JMP IN 5. STATISTICS AND SAMPLING DISTRIBUTIONS Bridging to New Topics / Sampling Techniques / Parameters, Statistics, and Fundamentals of Statistical Inference / Desirable Properties of Statistics / The Sampling Distribution of the Sample Mean X̄ / The Sampling Distribution of the Sample Proportion p / Summary / Appendix 5: Computer Instructions for Using MINITAB, Excel, and JMP IN 6. STATISTICAL INFERENCES FOR A SINGLE POPULATION OR PROCESS Bridging to New Topics / An Introduction to Confidence Intervals and Hypothesis Testing / Statistical Inferences on μ Based on X̄ / Statistical Inference for π Based on p / Summary / Appendix 6: Computer Instructions for Using MINITAB and Excel 7. STATISTICAL INFERENCES FOR TWO POPULATIONS OR PROCESSES Bridging to New Topics / Planning a Comparison of Two Means / Statistical Inferences for Two Means Based on Independent Samples / Statistical Inferences for Two Means Based on Paired Samples / Statistical Inferences for Two Proportions Based on Independent Samples / Statistical Inferences for Two Populations or Processes: A Comprehensive Example / Summary / Appendix 7: Computer Instructions for Using MINITAB, Excel, and JMP IN 8. ANALYSIS OF VARIANCE Bridging to New Topics / Comparing More Than Two Population or Process Means with Independent Samples / Comparing More Than Two Treatments with Samples Selected in Blocks / Analysis of Variance: A Comprehensive Example / Summary / Appendix 8: Computer Instructions for Using MINITAB, Excel, and JMP IN 9.
SIMPLE LINEAR REGRESSION ANALYSIS Bridging to New Topics / Relationships Between Two Variables: The Simple Linear Regression Model / Estimating the Parameters of the Simple Linear Regression Model / Statistical Inferences for the Simple Linear Regression Model / The Reliability of Estimates and Predictions / Factors That Affect Regression Standard Errors: Some Design Considerations / Correlation: Measuring the Linear Association Between Y and X / Simple Linear Regression: A Comprehensive Example / Summary / Appendix 9A: Computer Instructions for Using MINITAB, Excel, and JMP IN / Appendix 9B: Determining Least Squares Estimates Using a Calculator 10. MULTIPLE LINEAR REGRESSION Bridging to New Topics / The Multiple Linear Regression Model / Estimating the Parameters of the Multiple Linear Regression Model / How Good Is the Model? / Statistical Inference for Multiple Linear Regression / Incorporating Qualitative Variables in Multiple Linear Regression: Dummy Variables / Curvilinear Regression Models / Detecting Model Deficiencies and Avoiding Pitfalls: Residual Analysis and Collinearity / Criteria for Selecting the Best Set of Predictor Variables / Multiple Linear Regression: A Comprehensive Example / Summary / Appendix 10: Computer Instructions for Using MINITAB, Excel, and JMP IN 11. GOODNESS-OF-FIT PROCEDURES AND CONTINGENCY TABLES Bridging to New Topics / The Chi-Square Goodness-of-Fit Procedure / Analysis of Two-Way Contingency Tables: The Chi-Square Procedure for Independence / Summary / Appendix 11: Computer Instructions for Using MINITAB, Excel, and JMP IN 12. TIME SERIES ANALYSIS AND FORECASTING Bridging to New Topics / Time Series Patterns / Forecasting with Exponential Smoothing / Forecasting with Regression Models / Summary / Appendix 12: Computer Instructions for Using MINITAB, Excel, and JMP IN 13. METHODS FOR PROCESS IMPROVEMENT AND STATISTICAL QUALITY CONTROL Bridging to New Topics / Process Improvement Strategies / Statistical Control Charts / Control Charts for the Average and Variation of Process Outputs: X̄ and S charts / Control Charts for Process Proportions: p Charts / Summary / Appendix 13: Computer Instructions for Using MINITAB, Excel, and JMP IN / APPENDIX: STATISTICAL TABLES / ANSWERS TO SELECTED ODD-NUMBERED EXERCISES / INDEX

Journal ArticleDOI
TL;DR: The asymptotic distribution of the least-squares estimators in the random walk model was first found by White [17] and is described in terms of functionals of Brownian motion, with no closed-form expression known.
Abstract: The asymptotic distribution of the least-squares estimators in the random walk model was first found by White [17] and is described in terms of functionals of Brownian motion with no closed-form expression known. Evans and Savin [5,6] and others have examined numerically both the asymptotic and finite-sample distributions. The purpose of this paper is to derive an asymptotic expansion for the distribution. Our approach is in contrast to Phillips [12,13], who has already derived some terms in a general expansion by analyzing the functionals. We proceed by assuming that the errors are normally distributed and expand the characteristic function directly. Then, via numerical integration, we invert the characteristic function to find the distribution. The approximation is shown to be extremely accurate for all sample sizes ≥ 25, and can be used to construct simple tests for the presence of a unit root in a univariate time series model. This could have useful applications in applied economics.
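
The finite-sample distribution in question is easy to look at by direct simulation of the normalized least-squares estimator n(ρ̂ - 1) in the random walk model y_t = y_{t-1} + e_t. The sample size and replication count are arbitrary; this is a brute-force stand-in for the paper's analytic expansion.

```python
# Simulated distribution of n * (rho_hat - 1) in the random walk model.
import numpy as np

rng = np.random.default_rng(14)
n, sims = 100, 20000
stat = np.empty(sims)
for m in range(sims):
    y = np.cumsum(rng.standard_normal(n))               # random walk
    rho_hat = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])      # least-squares estimate
    stat[m] = n * (rho_hat - 1.0)
print("simulated 5% critical value:", np.percentile(stat, 5))
```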

01 Jan 1993
TL;DR: I consider the problem of making statistical inference from information inherent in the structure of a complex mechanistic model and from stochastic evidence about model inputs and outputs, and present a Bayesian approach which consists of translating all the available information into a joint 'pre-model' distribution on the model inputs and outputs.
Abstract: I consider the problem of making statistical inference from information inherent in the structure of a complex mechanistic model and from stochastic evidence about model inputs and outputs. A Bayesian approach is presented which consists of translating all the available information into a joint 'pre-model' distribution on the model inputs and outputs, and then restricting this to the submanifold defined by the model's mapping to obtain a joint 'post-model' distribution. Marginalizing this yields inference, conditional on the model, about quantities of interest which can be functions of model inputs, model outputs, or both. Importance sampling can be used to obtain a sample of model simulations from the post-model distribution. I apply this approach to two deterministic population dynamics models for the Western Arctic stock of bowhead whales. The results are a multivariate sample which can be explored using the full range of exploratory data analysis techniques. Methods for comparing competing models are developed, based on a generalization of the Bayes factor idea. Sensitivity analysis may be performed with a simple reweighting scheme. Examples of these techniques are given, and the advantages of this approach compared to existing methods are discussed in detail. Simple importance sampling can sometimes produce a poor post-model sample because mechanistic models are frequently overparameterized so that the post-model distribution of inputs and outputs may be much less diffuse than the pre-model distribution, nearly lower-dimensional and have strong non-linear relationships between variables. An iterative, adaptive importance sampling algorithm is presented as a possible solution. At each stage, it builds a nonparametric kernel density estimate of the post-model distribution based on the importance sample from the previous stage. This estimate is then used as a sampling envelope for the next stage. The unusual shape of the post-model distribution may require kernels which vary to reflect the estimated local structure of the distribution. Monte Carlo simulation results show that a locally adaptive algorithm can consistently estimate quantities of interest with lower MSE than simple importance sampling or globally adaptive methods, while requiring fewer model simulations. The strong consistency of the locally adaptive kernel density estimator is proved.
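
The reweighting idea used above for sensitivity analysis can be sketched generically: given a sample of model runs drawn under one input prior, reweight by the ratio of a new prior to the old one instead of re-running the model. The normal priors and the toy deterministic "model" below are assumptions, not the bowhead-whale application.

```python
# Importance reweighting of model runs under a changed input prior.
import numpy as np

rng = np.random.default_rng(17)
inputs = rng.normal(0.0, 1.0, size=5000)     # draws from the original prior
outputs = np.exp(0.5 * inputs)               # toy deterministic model

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

w = normal_pdf(inputs, 0.2, 1.0) / normal_pdf(inputs, 0.0, 1.0)
w /= w.sum()                                  # normalized importance weights
print("reweighted mean of the model output:", np.sum(w * outputs))
```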

Book ChapterDOI
TL;DR: The chapter discusses the accuracy of the bootstrap methods, the use of the bootstrap for non-smooth functions, the computation of the bootstrap distribution, bootstrap significance tests, bootstrap confidence intervals, randomly censored models, regression models, time series models, autoregressive models, sample survey models, and Bayesian bootstrap methods.
Abstract: Publisher Summary The bootstrap technique can also be used in situations where the observed data are not iid from a common distribution function F, as in the regression problem. In such a case, it is necessary to estimate the entire probability mechanism that gave rise to the observations. In the iid case for sample means and sample quantiles, the bootstrap distribution consistently estimates the sampling distribution. That is, the difference between the sampling distribution and the bootstrap distribution goes to zero uniformly, for almost all sample sequences, as the sample size increases. If the statistics are not uniform in some sense, the bootstrap might fail to be consistent. Examples of U-statistics and extreme value statistics, where the bootstrap method fails to approximate the sampling distribution, are given in the chapter. Blind application of the bootstrap method could lead to disastrous results. This can be avoided in some cases by modifying the bootstrap statistic. The chapter discusses the accuracy of the bootstrap methods, the use of the bootstrap for non-smooth functions, the computation of the bootstrap distribution, bootstrap significance tests, bootstrap confidence intervals, randomly censored models, regression models, time series models, autoregressive models, sample survey models, and Bayesian bootstrap methods.
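
The extreme-value failure mentioned above has a compact demonstration: the nonparametric bootstrap of the sample maximum puts probability roughly 1 - (1 - 1/n)^n ≈ 0.632 on the observed maximum itself, so it cannot mimic the true sampling distribution of the maximum. The uniform data are an illustrative choice.

```python
# Why the bootstrap fails for the sample maximum.
import numpy as np

rng = np.random.default_rng(15)
n, B = 50, 5000
x = rng.uniform(size=n)

boot_max = np.array([rng.choice(x, size=n, replace=True).max() for _ in range(B)])
print("P*(bootstrap max == observed max):", np.mean(boot_max == x.max()))
print("1 - (1 - 1/n)^n                  :", 1 - (1 - 1 / n) ** n)
```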

Journal ArticleDOI
TL;DR: In this article, an envelope-rejection method is used to generate random variates from the Watson distribution; the method is competitive with, if not superior to, existing sampling algorithms.
Abstract: An envelope-rejection method is used to generate random variates from the Watson distribution. The method is compact and is competitive with, if not superior to, the existing sampling algorithms. For the girdle form of the Watson distribution, a faster algorithm is proposed. As a result, Johnson's sampling algorithm for the Bingham distribution is improved.
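
A generic envelope-rejection sketch (not the paper's Watson-specific envelope): to sample from a density f using an envelope g with f ≤ c·g, accept a draw y from g with probability f(y) / (c·g(y)). Here f is a Beta(2, 2) density and g is uniform on (0, 1), purely for illustration.

```python
# Generic envelope-rejection sampling.
import numpy as np

rng = np.random.default_rng(16)

def f(y):
    # Target density: Beta(2, 2), i.e., 6 * y * (1 - y) on (0, 1).
    return 6.0 * y * (1.0 - y)

c = 1.5          # max of f on (0, 1), so f <= c * g with g(y) = 1

def sample(size):
    out = []
    while len(out) < size:
        y = rng.uniform()                 # proposal from the envelope g
        if rng.uniform() < f(y) / c:      # accept with probability f(y)/(c*g(y))
            out.append(y)
    return np.array(out)

print(sample(5))
```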