
Showing papers on "Sampling distribution" published in 1998


Journal ArticleDOI
TL;DR: The authors show that, of the two estimators of the standard error of the difference between regression coefficients used in criminology, one is correct and the other is incorrect; the incorrect estimator biases one's hypothesis test in favor of rejecting the null hypothesis that b1 = b2.
Abstract: Criminologists are often interested in examining interactive effects within a regression context. For example, “holding other relevant factors constant, is the effect of delinquent peers on one's own delinquent conduct the same for males and females?” or “is the effect of a given treatment program comparable between first-time and repeat offenders?” A frequent strategy in examining such interactive effects is to test for the difference between two regression coefficients across independent samples. That is, does b1 = b2? Traditionally, criminologists have employed a t or z test for the difference between slopes in making these coefficient comparisons. While there is considerable consensus as to the appropriateness of this strategy, there has been some confusion in the criminological literature as to the correct estimator of the standard error of the difference, the standard deviation of the sampling distribution of coefficient differences, in the t or z formula. Criminologists have employed two different estimators of this standard deviation in their empirical work. In this note, we point out that one of these estimators is correct while the other is incorrect. The incorrect estimator biases one's hypothesis test in favor of rejecting the null hypothesis that b1 = b2. Unfortunately, the use of this incorrect estimator of the standard error of the difference has been fairly widespread in criminology. We provide the formula for the correct statistical test and illustrate with two examples from the literature how the biased estimator can lead to incorrect conclusions.
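
The correct test is straightforward to compute. Below is a minimal Python sketch of a z test for the equality of two coefficients estimated from independent samples, using the standard error of the difference, sqrt(SE1^2 + SE2^2); the coefficient values are hypothetical, not from the paper:

```python
import math
from scipy.stats import norm

def coef_difference_test(b1, se1, b2, se2):
    """Two-sided z test of H0: b1 == b2 for coefficients estimated
    from two independent samples. The correct standard error of the
    difference is sqrt(se1**2 + se2**2)."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = (b1 - b2) / se_diff
    p_value = 2 * norm.sf(abs(z))  # two-sided p
    return z, p_value

# Hypothetical slopes: effect of delinquent peers for males vs. females.
z, p = coef_difference_test(b1=0.45, se1=0.08, b2=0.28, se2=0.07)
print(f"z = {z:.2f}, p = {p:.4f}")
```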

2,346 citations


Book
01 Oct 1998
TL;DR: This textbook covers descriptive techniques and probability, including graphical techniques for quantitative data and the art and science of graphical presentations, followed by statistical inference: sampling distributions, estimation, and hypothesis testing.
Abstract: 1. WHAT IS STATISTICS?. Introduction to Statistics. Key Statistical Concepts. How Managers Use Statistics. Statistics and the Computer. World Wide Web and Learning Center. Part I. DESCRIPTIVE TECHNIQUES AND PROBABILITY. 2. Graphical Descriptive Techniques. Introduction. Types of Data. Graphical Techniques for Quantitative Data. Scatter Diagrams. Pie Charts, Bar Charts, and Line Charts. Summary. Case 2.1 Pacific Salmon Catches. Case 2.2 Bombardier Inc. Case 2.3 The North American Free Trade Agreement (NAFTA). Appendix 2.A Minitab Instructions. Appendix 2.B Excel Instructions. 3. Art and Science of Graphical Presentations. Introduction. Graphical Excellence. Graphical Deception. Summary. Case 3.1 Canadian Federal Budget. 4. Numerical Descriptive Measures. Introduction. Measures of Central Location. Measures of Variability. Interpreting Standard Deviation. Measures of Relative Standing and Box Plots. Measures of Association. General Guidelines on the Exploration of Data. Summary. Appendix 4.A Minitab Instructions. Appendix 4.B Summation Notation. 5. Data Collection and Sampling. Introduction. Sources of Data. Sampling. Sampling Plans. Errors Involved in Sampling. Use of Sampling in Auditing. Summary. 6. Probability and Discrete Probability Distributions. Introduction. Assigning Probabilities to Events. Probability Rules and Trees. Random Variables and Probability Distributions. Expected Value and Variance. Bivariate Distributions. Binomial Distribution. Poisson Distribution. Summary. Case 6.1 Let's Make a Deal. Case 6.2 Gains from Market Timing. Case 6.3 Calculating Probabilities Associated with the Stock Market. Appendix 6.A Minitab Instructions. Appendix 6.B Excel Instructions. 7. Continuous Probability Distributions. Introduction. Continuous Probability Distributions. Normal Distribution. Exponential Distribution. Summary. Appendix 7.A Minitab Instructions. Appendix 7.B Excel Instructions. Part II. STATISTICAL INFERENCE. 8. Sampling Distributions. Introduction. Sampling Distribution of the Mean. Summary. 9. Introduction to Estimation. Introduction. Concepts of Estimation. Estimating the Population Mean When the Population Variance Is Known. Selecting the Sample Size. Summary. Appendix 9.A Minitab Instructions. Appendix 9.B Excel Instructions. 10. Introduction to Hypothesis Testing. Introduction. Concepts of Hypothesis Testing. Testing the Population Mean When the Population Variance Is Known. The p-Value of a Test of Hypothesis. Calculating the Probability of a Type II Error. The Road Ahead. Summary. Appendix 10.A Minitab Instructions. Appendix 10.B Excel Instructions. 11. Inference about the Description of a Single Population. Introduction. Inference about a Population Mean When the Population Variance Is Unknown. Inference about a Population Variance. Inference about a Population Proportion. The Myth of the Law of Averages. Case 11.1 Number of Uninsured Motorists. Case 11.2 National Patent Development Corporation.

805 citations


Posted Content
TL;DR: In this paper, the authors derived the asymptotic sampling distribution of various estimators frequently used to order distributions in terms of poverty, welfare and inequality, and established the statistical results for deterministic or stochastic poverty lines as well as for paired or independent samples of incomes.
Abstract: We derive the asymptotic sampling distribution of various estimators frequently used to order distributions in terms of poverty, welfare and inequality. This includes estimators of most of the poverty indices currently in use, as well as estimators of the curves used to infer stochastic dominance of any order. These curves can be used to determine whether poverty, inequality or social welfare is greater in one distribution than in another for general classes of indices. We also derive the sampling distribution of the maximal poverty lines (or income censoring thresholds) up to which we may confidently assert that poverty or social welfare is greater in one distribution than in another. The sampling distribution of convenient estimators for dual approaches to the measurement of poverty is also established. The statistical results are established for deterministic or stochastic poverty lines as well as for paired or independent samples of incomes. Our results are briefly illustrated using data for 6 countries drawn from the Luxembourg Income Study data bases.
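
To make the flavor of estimator concrete, here is a hedged sketch for the simplest case treated in the paper: a deterministic poverty line and an independent sample. The FGT index P_alpha (used here as a representative of the poverty indices covered) is a sample mean, so its asymptotic standard error follows directly; the function name, simulated incomes, and poverty line are illustrative only:

```python
import numpy as np

def fgt_index(incomes, z, alpha):
    """FGT poverty index P_alpha with deterministic poverty line z,
    plus its asymptotic standard error for an i.i.d. sample. P_alpha
    is the mean of g_i = ((z - y_i)/z)**alpha for poor units, 0 else,
    so se(P_alpha) = sd(g)/sqrt(n)."""
    y = np.asarray(incomes, dtype=float)
    g = np.where(y < z, ((z - y) / z) ** alpha, 0.0)
    return g.mean(), g.std(ddof=1) / np.sqrt(len(g))

rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=0.0, sigma=0.7, size=2000)
line = 0.6                      # fixed (deterministic) poverty line
for alpha in (0, 1, 2):         # headcount, poverty gap, squared gap
    est, se = fgt_index(incomes, line, alpha)
    print(f"P_{alpha} = {est:.4f}  (se {se:.4f})")
```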

738 citations


Journal ArticleDOI
TL;DR: This work constructs Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic, with examples including contingency tables, logistic regression, and spectral analysis of permutation data.
Abstract: We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include contingency tables, logistic regression, and spectral analysis of permutation data. The algorithms involve computations in polynomial rings using Gröbner bases.
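
For two-way contingency tables, the Gröbner-basis moves reduce to the familiar +1/-1 swaps on 2x2 minors. Below is a minimal sketch (not the authors' code) of a Metropolis chain over tables with fixed margins, targeting the hypergeometric conditional distribution given the sufficient statistic:

```python
import numpy as np

def mh_step(table, rng):
    """One Metropolis step over two-way tables with fixed margins.
    Proposal: +1/-1 on a random 2x2 minor (the Markov-basis moves);
    target: the hypergeometric distribution of the table given its
    margins, i.e. the conditional distribution under independence."""
    nrow, ncol = table.shape
    i1, i2 = rng.choice(nrow, size=2, replace=False)
    j1, j2 = rng.choice(ncol, size=2, replace=False)
    a, b = table[i1, j1], table[i1, j2]
    c, d = table[i2, j1], table[i2, j2]
    if b == 0 or c == 0:
        return  # move (+1 at a and d, -1 at b and c) would go negative
    # pi(n) is proportional to prod 1/n_ij!; only four cells change.
    accept = (b * c) / ((a + 1) * (d + 1))
    if rng.random() < accept:
        table[i1, j1] += 1
        table[i2, j2] += 1
        table[i1, j2] -= 1
        table[i2, j1] -= 1

rng = np.random.default_rng(1)
tab = np.array([[10, 2, 5], [3, 15, 4]])
for _ in range(5000):
    mh_step(tab, rng)
print(tab, tab.sum(axis=0), tab.sum(axis=1))  # margins are preserved
```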

724 citations


Journal ArticleDOI
TL;DR: Bias in average r and average rz' was examined empirically; the use of z' decreased bias when correlations from a matrix were averaged, and, contrary to analytical expectations, average rz' was generally the less biased statistic for independent correlations as well.
Abstract: R. A. Fisher's z (z'; 1958) essentially normalizes the sampling distribution of Pearson r and can thus be used to obtain an average correlation that is less affected by sampling distribution skew, suggesting a less biased statistic. Analytical formulae, however, indicate less expected bias in average r than in average z' back-converted to average rz' . In large part because of this fact, J. E. Hunter and F. L. Schmidt (1990) have argued that average r is preferable to average rz' . In the present study, bias in average r and average rz' was empirically examined. When correlations from a matrix were averaged, the use of z' decreased bias. For independent correlations, contrary to analytical expectations, average rz' was also generally the less biased statistic. It is concluded that (a) average rz' is a less biased estimate of the population correlation than average r and (b) expected values formulae do not adequately predict bias in average rz' when a small number of correlations are averaged.
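
The two averages being compared are easy to state: average r is the arithmetic mean of the correlations, while average rz' back-converts the mean of the Fisher z' transforms. A small illustration with hypothetical correlations:

```python
import numpy as np

rs = np.array([0.30, 0.45, 0.52, 0.38])    # hypothetical correlations

avg_r = rs.mean()                          # average r
avg_r_z = np.tanh(np.arctanh(rs).mean())   # average rz': back-converted
                                           # mean of Fisher z' = atanh(r)
print(f"average r   = {avg_r:.4f}")
print(f"average rz' = {avg_r_z:.4f}")
```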

270 citations


Journal ArticleDOI
TL;DR: In this article, the authors extend the Markov switching model and use the information contained in leading indicator data to forecast transition probabilities, which can then be used to calculate expected durations.

264 citations


Journal ArticleDOI
David Higdon
TL;DR: Two applications in Bayesian image analysis are considered: a binary classification problem in which partial decoupling outperforms Swendsen-Wang and single-site Metropolis methods, and a positron emission tomography reconstruction that uses the gray level prior of Geman and McClure.
Abstract: Suppose that one wishes to sample from the density π(x) using Markov chain Monte Carlo (MCMC). An auxiliary variable u and its conditional distribution π(u|x) can be defined, giving the joint distribution π(x, u) = π(x)π(u|x). An MCMC scheme that samples over this joint distribution can lead to substantial gains in efficiency compared to standard approaches. The revolutionary algorithm of Swendsen and Wang is one such example. Besides reviewing the Swendsen-Wang algorithm and its generalizations, this article introduces a new auxiliary variable method called partial decoupling. Two applications in Bayesian image analysis are considered: a binary classification problem in which partial decoupling outperforms Swendsen-Wang and single-site Metropolis methods, and a positron emission tomography (PET) reconstruction that uses the gray level prior of Geman and McClure. A generalized Swendsen-Wang algorithm is developed for this problem, which reduces the computing time to the point where MCMC is a viab...
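
The slice sampler is perhaps the simplest instance of this auxiliary-variable construction (it is not Higdon's partial decoupling, which is specific to Ising-type models). A sketch for a standard normal target:

```python
import numpy as np

def slice_sample_std_normal(n_draws, seed=0):
    """Auxiliary-variable (slice) sampler for f(x) = exp(-x**2/2).
    Alternates u | x ~ Uniform(0, f(x)) with x | u uniform on the
    slice {x : f(x) > u} = (-w, w), w = sqrt(-2*log(u)); alternating
    the two conditionals leaves the joint pi(x, u) invariant."""
    rng = np.random.default_rng(seed)
    x, out = 0.0, np.empty(n_draws)
    for t in range(n_draws):
        u = rng.uniform(0.0, np.exp(-0.5 * x * x))
        w = np.sqrt(-2.0 * np.log(u))
        x = rng.uniform(-w, w)
        out[t] = x
    return out

draws = slice_sample_std_normal(5000)
print(draws.mean(), draws.std())   # approximately 0 and 1
```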

245 citations


Journal ArticleDOI
TL;DR: In this paper, spatial simulated annealing is presented as a method to optimize spatial environmental sampling schemes, and it is shown that SSA is superior to conventional methods of designing sampling schemes.
Abstract: Spatial sampling is an important issue in environmental studies because the sample configuration influences both costs and effectiveness of a survey. Practical sampling constraints and available pre-information can help to optimize the sampling scheme. In this paper, spatial simulated annealing (SSA) is presented as a method to optimize spatial environmental sampling schemes. Sampling schemes are optimized at the point-level, taking into account sampling constraints and preliminary observations. Two optimization criteria have been used. The first optimizes even spreading of the points over a region, whereas the second optimizes variogram estimation using a proposed criterion from the literature. For several examples it is shown that SSA is superior to conventional methods of designing sampling schemes. Improvements up to 30% occur for the first criterion, and an almost complete solution is found for the second criterion. Spatial simulated annealing is especially useful in studies with many sampling constraints. It is flexible in implementing additional, quantitative criteria.
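
A toy version of SSA for the first criterion (even spreading) can be written in a few lines. The cooling schedule, perturbation size, and evaluation grid below are illustrative choices, not those of the paper:

```python
import numpy as np

def ssa_even_spread(n_pts=15, n_iter=3000, seed=2):
    """Toy spatial simulated annealing on the unit square. Criterion:
    mean distance from a fine evaluation grid to the nearest sample
    point (small = evenly spread). One random point is perturbed per
    iteration; worse schemes are accepted with probability
    exp(-increase / temperature), with a linearly cooling temperature."""
    rng = np.random.default_rng(seed)
    side = np.linspace(0.025, 0.975, 20)
    grid = np.dstack(np.meshgrid(side, side)).reshape(-1, 2)

    def criterion(pts):
        d = np.linalg.norm(grid[:, None, :] - pts[None, :, :], axis=2)
        return d.min(axis=1).mean()

    pts = rng.uniform(size=(n_pts, 2))
    f = criterion(pts)
    for t in range(n_iter):
        temp = max(0.01 * (1 - t / n_iter), 1e-12)
        cand = pts.copy()
        k = rng.integers(n_pts)
        cand[k] = np.clip(cand[k] + rng.normal(0.0, 0.05, size=2), 0, 1)
        fc = criterion(cand)
        if fc < f or rng.random() < np.exp((f - fc) / temp):
            pts, f = cand, fc
    return pts, f

pts, f = ssa_even_spread()
print(f"final mean nearest-point distance: {f:.4f}")
```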

230 citations


Journal ArticleDOI
TL;DR: This article uses the Gibbs sampling approach in the context of a three-state Markov-switching model to show how heteroskedasticity affects inference, and suggests two strategies for valid inference.

149 citations


Journal ArticleDOI
TL;DR: In this article, a method of assessing the extent to which the value of a variable on a given segment of a network influences values of that variable on contiguous segments is examined using network autocorrelation analysis.

114 citations


01 Jan 1998
TL;DR: In this article, a general approach for approximating the marginal sample distribution for a given population distribution and first-order sample selection probabilities is discussed and illustrated, and a general method of inference on the population distribution (model) under informative sampling is proposed.
Abstract: The sample distribution is defined as the distribution of the sample measurements given the selected sample. Under informative sampling, this distribution is different from the corresponding population distribution, although for several examples the two distributions are shown to be in the same family and only differ in some or all the parameters. A general approach for approximating the marginal sample distribution for a given population distribution and first order sample selection probabilities is discussed and illustrated. Theoretical and simulation results indicate that under common sampling methods of selection with unequal probabilities, when the population measurements are independently drawn from some distribution (superpopulation), the sample measurements are asymptotically independent as the population size increases. This asymptotic independence combined with the approximation of the marginal sample distribution permits the use of standard methods such as direct likelihood inference or residual analysis for inference on the population distribution. Survey data may be viewed as the outcome of two random processes: The process generating the values in the finite population, often referred to as the 'superpopulation model', and the process selecting the sample data from the finite population values, known as the 'sample selection mechanism'. Analytic inference from survey data relates to the superpopulation model, but when the sample selection probabilities are correlated with the values of the model response variables even after conditioning on auxiliary variables, the sampling mechanism becomes informative and the selection effects need to be accounted for in the inference process. In this article, we propose a general method of inference on the population distribution (model) under informative sampling that consists of approximating the parametric distribution of the sample measurements. The sample distribution is defined as the distribution of measurements corresponding to the units in
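
A quick simulation illustrates why the sample distribution differs from the population distribution under informative selection. Here units are Poisson-sampled with first-order inclusion probabilities proportional to y itself (an assumption chosen purely for illustration), which size-biases the sample:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000
y = rng.exponential(scale=1.0, size=N)      # superpopulation values

# Poisson sampling: include unit i with probability proportional to y_i
# (expected sample size about 5,000). Selection depends on y, so the
# sampling is informative.
pi = np.clip(5_000 * y / y.sum(), 0.0, 1.0)
sample = y[rng.random(N) < pi]

print("population mean:", round(y.mean(), 3))       # about 1.0
print("sample mean:    ", round(sample.mean(), 3))  # about 2.0 (size-biased)
```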

01 Jan 1998
TL;DR: A three-part simulation study of the distribution of person-fit statistics for computerized adaptive testing (CAT) is described, in which the theoretical distribution of the often-used l(z) statistic across theta levels was studied in conventional testing and in CAT, with both theta and estimated theta used to calculate l(z).
Abstract: Several person-fit statistics have been proposed to detect item score patterns that do not fit an item response theory model. To classify response patterns as not fitting a model, a distribution of a person-fit statistic is needed. The null distributions of several fit statistics have been investigated using conventionally administered tests, but less is known about the distribution of fit statistics for computerized adaptive testing (CAT). A three-part simulation study of this distribution is described. First, the theoretical distribution of the often-used l(z) statistic across theta levels was studied in conventional testing and in CAT, where both theta and estimated theta were used to calculate l(z). The distribution of l*(z), a statistic proposed by T. Snijders (1998) to correct for the error in estimated theta, was also studied in both testing environments. Second, simulation of the distribution of l(z) under the two-parameter logistic model was studied for conventional tests. Two procedures for simulating the distribution of l(z) and l*(z) in a CAT were examined: (1) item scores were simulated with a fixed set of administered items; and (2) item scores were generated according to a stochastic design, where the choice of administered item i + 1 depended on responses to previously administered items. The third study was a power study conducted to compare detection rates of l*(z) with l(z) for conventional tests. Results indicate that when estimated theta was used, the distribution of l(z) differed from the theoretical distribution in both conventional and CAT environments; when theta itself was used, the distribution of l(z) in conventional testing was in accord with the theoretical distribution, but in the CAT it still differed. In the context of conventional testing, simulating the sampling distribution of l(z) for every examinee, based on theta, resulted in an appropriate approximation of the distribution. However, for the CAT environment, simulating the sampling distributions of both l(z) and l*(z) was problematic. Two appendixes show the derivation of the l*(z) statistic and discuss modeling local dependence.
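
For reference, the l(z) statistic itself is simple to compute under the two-parameter logistic model; the standardization below uses the usual moment formulas for the log-likelihood statistic. The item parameters are hypothetical:

```python
import numpy as np

def lz(u, theta, a, b):
    """Standardized log-likelihood person-fit statistic l(z) under the
    two-parameter logistic model: l(z) = (l0 - E[l0]) / sqrt(Var[l0]),
    with the usual moment formulas evaluated at theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    mean = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - mean) / np.sqrt(var)

rng = np.random.default_rng(4)
a = rng.uniform(0.8, 2.0, size=40)   # hypothetical discriminations
b = rng.normal(0.0, 1.0, size=40)    # hypothetical difficulties
theta = 0.5
p_true = 1.0 / (1.0 + np.exp(-a * (theta - b)))
u = (rng.random(40) < p_true).astype(int)   # model-fitting responses
print(lz(u, theta, a, b))   # large negative values signal misfit
```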

Book
08 Jan 1998
TL;DR: This textbook covers presenting data in tables and charts, numerical descriptive measures, discrete and continuous probability distributions, sampling distributions, estimation, hypothesis testing, regression, and statistical applications in quality management.
Abstract: Preface 1. Introduction and Data Collection 2. Presenting Data in Tables and Charts 3. Numerical Descriptive Measures 4. Basic Probability 5. Some Important Discrete Probability Distributions 6. The Normal Distribution and Other Continuous Distributions 7. Sampling and Sampling Distributions 8. Confidence Interval Estimation 9. Fundamentals of Hypothesis Testing 10. Two Sample Tests and One-Way Anova 11. Chi-Square Tests 12. Simple Linear Regression 13. Multiple Regression 14. Statistical Applications in Quality Management Self-Test Solutions and Answers to Selected Even-Numbered Problems Index

Journal ArticleDOI
TL;DR: In this article, a method for correcting polynomial-type edge estimators for bias, and for constructing simultaneous confidence bands for the data edge, is proposed; it is based on deriving large-sample approximations to the distributions of the edge estimators and then developing algorithms for simulating from them, so as to produce Monte Carlo approximations to the distribution of the difference between the true edge and its estimator.

Journal ArticleDOI
TL;DR: A modification of the Pearson chi-square test is proposed to measure the appropriate distance between observed and hypothesized cell counts in a multiple-response contingency table; since its null distribution is no longer the familiar chi-square, a bootstrap resampling method is proposed to obtain the null sampling distribution.
Abstract: In many studies, multiple categorical responses or measurements are made on members of different populations or treatment groups. This arises often in surveys where individuals may mark all answers that apply when responding to a multiple-choice question. Frequently, it is of interest to determine whether the distributions of responses differ among groups. In this situation, the test statistic of the usual Pearson chi-square test no longer measures a scaled distance between observed and hypothesized cell counts in a contingency table, and its distribution is no longer the familiar chi-square. This paper presents a modification to the Pearson statistic that measures the appropriate distance for multiple-response tables. The asymptotic distribution is shown to be that of a linear combination of chi-square random variables with coefficients depending on the true probabilities. A bootstrap resampling method is proposed instead to obtain a null-hypothesis sampling distribution. Simulations show that this bootstrap method maintains its size under a variety of circumstances, while a naively applied Pearson chi-square test is severely affected by multiple responses.
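
The resampling idea can be sketched independently of the exact modified statistic (the distance below is a generic Pearson-type one, not necessarily the paper's): resample whole respondents from the pooled data so that the dependence among a respondent's multiple responses is preserved under the null:

```python
import numpy as np

def stat(g1, g2):
    """Pearson-type distance between per-item selection proportions of
    two groups; g1, g2 are (respondents x items) 0/1 arrays from a
    'mark all that apply' question. Assumes no item has pooled
    proportion exactly 0 or 1."""
    p1, p2 = g1.mean(axis=0), g2.mean(axis=0)
    pp = np.concatenate([g1, g2]).mean(axis=0)
    v = pp * (1 - pp) * (1 / len(g1) + 1 / len(g2))
    return np.sum((p1 - p2) ** 2 / v)

def bootstrap_p(g1, g2, n_boot=2000, seed=5):
    """Null distribution by resampling whole respondents from the
    pooled sample, preserving within-respondent dependence."""
    rng = np.random.default_rng(seed)
    obs, pooled, n1 = stat(g1, g2), np.vstack([g1, g2]), len(g1)
    hits = 0
    for _ in range(n_boot):
        rs = pooled[rng.integers(len(pooled), size=len(pooled))]
        hits += stat(rs[:n1], rs[n1:]) >= obs
    return hits / n_boot
```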

Journal ArticleDOI
TL;DR: In this paper, the authors examined both the interpretational and the statistical properties of the FSC and concluded that it has an intuitive interpretation that is no less useful than either a standard correlation coefficient or its competitors, its sampling distribution is approximately normal, and the conventional formula for the estimated standard error may underestimate the true standard error in some circumstances.
Abstract: For both public policy and theoretical reasons, criminologists have been interested in the degree to which criminal offenders specialize in particular crimes. Traditionally, offense specialization has been measured with the forward specialization coefficient (FSC). Recently, the FSC has been criticized for being interpretationally obtuse and having no known sampling distribution. In this paper we examine both the interpretational and the statistical properties of the FSC. We conclude that (1) it has an intuitive interpretation that is no less useful than either a standard correlation coefficient or its competitors, (2) its sampling distribution is approximately normal, and (3) the conventional formula for the estimated standard error of the FSC may underestimate the true standard error in some circumstances. With these results behind us, we propose and illustrate both a parametric statistical test for the difference between two independent FSCs and two nonparametric alternatives.

Journal ArticleDOI
TL;DR: In this article, the authors investigate alternative unconditional and conditional distributional models for the returns on Japan's Nikkei 225 stock market index among them is the recently introduced class of ARMA-GARCH models driven by α-stable (or stable Paretian) distributed innovations, designed to capture the observed serial dependence, conditional heteroskedasticity and fat-tailedness present in the return data.
Abstract: We investigate alternative unconditional and conditional distributional models for the returns on Japan's Nikkei 225 stock market index. Among them is the recently introduced class of ARMA-GARCH models driven by α-stable (or stable Paretian) distributed innovations, designed to capture the observed serial dependence, conditional heteroskedasticity and fat-tailedness present in the return data. Of the eight entertained distributions, the partially asymmetric Weibull, Student's t and asymmetric α-stable present themselves as the most viable candidates in terms of overall fit. However, the tails of the sample distribution are approximated best by the asymmetric α-stable distribution. Good tail approximations are particularly important for risk assessments.
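
A conditional model of this family (AR mean, GARCH(1,1) variance) with Student's t innovations, one of the lighter-tailed candidates compared in the paper, can be fit with the third-party arch package; α-stable innovations are not available there, so this is only a partial illustration on a placeholder series:

```python
import numpy as np
from arch import arch_model   # third-party: pip install arch

rng = np.random.default_rng(6)
returns = rng.standard_t(df=5, size=1500)   # placeholder return series

# AR(1) mean, GARCH(1,1) variance, Student-t innovations.
model = arch_model(returns, mean="AR", lags=1,
                   vol="GARCH", p=1, q=1, dist="t")
result = model.fit(disp="off")
print(result.params)
```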

Journal ArticleDOI
TL;DR: The use of the Jeffreys prior in Bayesian analysis of the simultaneous equations model (SEM) was studied in this article, where the posterior density of the structural coefficient β in canonical SEMs with two endogenous variables was derived.

Book
01 Apr 1998
TL;DR: This book covers probability theory, estimation theory, hypothesis testing, regression and analysis of variance, non-parametric methods, stochastic analysis and its statistical applications, and vectorial statistics.
Abstract: Probability theory: probability spaces stochastic variables product measures and statistical independence functions of stochastic vectors expectation, variance and covariance of stochastic variables distribution functions and probability distributions moments, moment generating functions and characteristic function the central limit theorem exercises. Statistics and their probability distributions, estimation theory: introduction the Gamma distribution and the Chi-2-distribution the t-distribution statistics to measure differences in mean the F-distribution the Beta distribution populations which are not normally distributed Bayesian estimation estimation theory in a more general framework maximum likelihood estimation, sufficiency exercises. Hypothesis testing: the Neyman-Pearson theory hypothesis tests concerning normally distributed populations the Chi-2 test on goodness of fit the Chi-2 test on statistical independence exercises. Simple regression analysis: the method of least squares construction of an unbiased estimator of Sigma-2 normal regression analysis Pearson's product-moment correlation coefficient the sum of squares of errors as a measure of the amount of linear structure exercises. Normal analysis of variance: one-way analysis of variance two-way analysis of variance exercises. Non-parametric methods: the sign test Wilcoxon's signed-rank test Wilcoxon's rank-sum test the runs test rank correlation tests the Kruskal-Wallis test Friedman's test exercises. Stochastic analysis and its applications in statistics: the empirical distribution function associated with a sample convergence of stochastic variables the Glivenko-Cantelli theorem the Kolmogorov-Smirnov test statistic metrics on the set of distribution functions smoothing techniques robustness of statistics trimmed means, the median, and their robustness statistical functionals the von Mises derivative influence functions Bootstrap methods estimation of densities by means of kernel densities estimation of densities by means of histograms exercises. Vectorial statistics: linear algebra the expectation vector and the covariance operator of stochastic vectors vectorial samples the vectorial normal distribution conditional probability distributions that emanate from Gaussian ones vectorial samples from Gaussian distributed populations normal correlation analysis multiple regression analysis the multiple correlation coefficient exercises. Appendices: Lebesgue's convergence theorems product measures conditional probabilities the characteristic function of the Cauchy distribution metric spaces, equicontinuity the Fourier transform and the existence of stoutly tailed distribution functions. List of elementary probability densities frequently used symbols statistical tables references.

Journal ArticleDOI
TL;DR: A finite-sample mean square error (MSE) criterion is proposed for selecting thresholds and estimating tail characteristics, and confidence limits are obtained using the sampling distribution of estimators at the optimal threshold.

Journal ArticleDOI
TL;DR: In this paper, median ranked set sampling with probability proportional to size and with errors in ranking is considered and compared with ranked set sampling with errors in ranking; computer simulation results for some probability distributions are also given.
Abstract: Median ranked set sampling may be combined with size-biased probability of selection. A two-phase sample is assumed. In the first phase, units are selected with probability proportional to their size. In the second phase, units are selected using median ranked set sampling to increase the efficiency of the estimators relative to simple random sampling. There is also an increase in efficiency relative to ranked set sampling (for some probability distribution functions). There will be a loss in efficiency depending on the amount of error in ranking the units; median ranked set sampling can be used to reduce the errors in ranking the units selected from the population. Estimators of the population mean and the population size are considered. Median ranked set sampling with probability proportional to size and with errors in ranking is considered and compared with ranked set sampling with errors in ranking. Computer simulation results for some probability distributions are also given.
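
A small simulation conveys the efficiency mechanism: with error-free ranking, the median of each ranked set is less variable than a single random draw, so the MRSS mean beats the SRS mean of the same final sample size. The distribution and set size below are illustrative:

```python
import numpy as np

def mrss(rng, m, n):
    """Median ranked set sample of size n with (odd) set size m:
    draw n sets of m units, rank each set, keep each set's median.
    Ranking is assumed error-free here; ranking errors would erode
    the efficiency gain, as the paper discusses."""
    sets = rng.normal(0.0, 1.0, size=(n, m))
    return np.sort(sets, axis=1)[:, m // 2]

rng = np.random.default_rng(7)
reps, n, m = 4000, 20, 5
mrss_means = [mrss(rng, m, n).mean() for _ in range(reps)]
srs_means = [rng.normal(0.0, 1.0, size=n).mean() for _ in range(reps)]
print("relative efficiency (MRSS vs SRS):",
      np.var(srs_means) / np.var(mrss_means))   # > 1 for the normal
```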

Journal ArticleDOI
TL;DR: In this article, a test for the presence of a stationary first-order autoregressive process embedded in white noise is constructed, which is shown to have a Cramér-von Mises distribution in large samples.

Journal ArticleDOI
TL;DR: In this article, the authors present estimation and likelihood-ratio testing for various monotone trend alternatives to independence in contingency tables with ordered categories, characterized by non-negative values for four types of log odds ratios for ordinal data: Local, global, cumulative, and continuation ratios.

Journal ArticleDOI
TL;DR: In this article, the authors derive the asymptotic distribution for the main representatives of the two classes of measures: indices based on inequality measures and transition matrices, and derive rigorous statistical inferences about levels of or changes in mobility.

Journal ArticleDOI
TL;DR: In this article, the universal kriging model is reexamined together with methods for handling problems in the inference of parameters, including bias, variance, and mean square error of the sampling distribution obtained by Monte Carlo simulation for three different estimators.
Abstract: Universal kriging originally was developed for problems of spatial interpolation if a drift seemed to be justified to model the experimental data. But its use has been questioned in relation to the bias of the estimated underlying variogram (variogram of the residuals), and furthermore universal kriging came to be considered an old-fashioned method after the theory of intrinsic random functions was developed. In this paper the model is reexamined together with methods for handling problems in the inference of parameters. The efficiency of the inference of covariance parameters is shown in terms of bias, variance, and mean square error of the sampling distribution obtained by Monte Carlo simulation for three different estimators (maximum likelihood, bias corrected maximum likelihood, and restricted maximum likelihood). It is shown that unbiased estimates for the covariance parameters may be obtained but if the number of samples is small there can be no guarantee of ‘good’ estimates (estimates close to the true value) because the sampling variance usually is large. This problem is not specific to the universal kriging model but rather arises in any model where parameters are inferred from experimental data. The validity of the estimates may be evaluated statistically as a risk function as is shown in this paper.

Journal ArticleDOI
TL;DR: In this paper, a nonparametric Bayesian model for data that can be accommodated in a contingency table with fixed right margin totals is proposed, where cell count vectors for each group are conditionally independent, with multinomial distribution given vectors of classification probabilities.
Abstract: In this work I postulate a nonparametric Bayesian model for data that can be accommodated in a contingency table with fixed right margin totals. This data structure usually arises when comparing different groups regarding classification probabilities for a number of categories. I assume that cell count vectors for each group are conditionally independent, with multinomial distribution given vectors of classification probabilities. In turn, these vectors of probabilities are assumed to be a sample from a distribution F, and the prior distribution of F is assumed to be a Dirichlet process, centered on a probability measure α and with weight c. I also assume a prior distribution for c, as a way of obtaining a better control on the clustering structure induced by the Dirichlet process. I use this setting to assess homogeneity of classification probabilities, and propose a “Bayes factor.” I derive exact expressions for the relevant quantities. These can be directly computed when the number of rows k i...

Journal ArticleDOI
TL;DR: The research reported in this paper examined the hypothesis that frequency distribution versions of sample-size tasks yield higher solution rates than corresponding sampling distribution versions, and found a substantial difference between solution rates for the two types of task.
Abstract: Different studies on how well people take sample size into account have found a wide range of solution rates. In a recent review, Sedlmeier and Gigerenzer (1997) suggested that a substantial part of the variation in results can be explained by the fact that experimenters have used two different types of sample-size tasks, one involving frequency distributions and the other sampling distributions. This suggestion rested on an analysis of studies that, with one exception, did not systematically manipulate type of distribution. In the research reported in this paper, well-known sample-size tasks were used to examine the hypothesis that frequency distribution versions of sample-size tasks yield higher solution rates than corresponding sampling distribution versions. In Study 1, a substantial difference between solution rates for the two types of task was found. Study 2 replicated this finding and ruled out an alternative explanation for it, namely, that the solution rate for sampling distribution tasks was lower because the information they contained was harder to extract than that in frequency distribution tasks. Finally, in Study 3 an attempt was made to reduce the gap between the solution rates for the two types of tasks by giving participants as many hints as possible for solving a sampling distribution task. Even with hints, the gap in performance remained. A new computational model of statistical reasoning specifies cognitive processes that might explain why people are better at solving frequency than sampling distribution tasks.

Journal ArticleDOI
01 Nov 1998-Extremes
TL;DR: In this paper, the authors developed a methodology for conducting inference based on record values and record times derived from a sequence of independent and identically distributed random variables, and showed that record times and record values jointly contain considerably more information about F than do the record values alone.
Abstract: We develop methodology for conducting inference based on record values and record times derived from a sequence of independent and identically distributed random variables. The advantage of using information about record times as well as record values is stressed. This point is a subtle one, since if the sampling distribution F is continuous then there is no information at all about F in the record times alone; the joint distribution of any number of them does not depend on F. However, the record times and record values jointly contain considerably more information about F than do the record values alone. Indeed, in the case of a distribution with regularly varying tails, the rate of convergence of the exponent of regular variation is two orders of magnitude faster if information about record times is included. Optimal estimators and convergence rates are derived under simple, specific models, and shown to be surprisingly robust against significant departures from those models. However, even under our special models the estimators have irregular properties, including an undefined information matrix. To some extent these difficulties may be alleviated by conditioning and by considering the relationship between maximum likelihood and maximum probability estimators.
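
Extracting record times and record values from a sequence is the basic data-processing step behind this methodology; a minimal sketch:

```python
import numpy as np

def records(x):
    """Record times and record values of a sequence: the indices t at
    which x[t] exceeds every earlier observation."""
    times, values, best = [], [], -np.inf
    for t, v in enumerate(x):
        if v > best:
            times.append(t)
            values.append(v)
            best = v
    return np.array(times), np.array(values)

rng = np.random.default_rng(8)
x = rng.pareto(a=2.0, size=10_000)   # regularly varying tail
t, v = records(x)
print(len(t), "records; first record times:", t[:6])
```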

Journal ArticleDOI
TL;DR: In this paper, a trimmed version of the Mallows distance (Mallows 1972) between two cumulative distribution functions (c.d.f.'s) F and G is used to assess the similarity of two c.d.f.'s with respect to this distance at a controlled type I error rate.
Abstract: The problem of assessing similarity of two cumulative distribution functions (c.d.f.'s) has been the topic of a previous paper by the authors (Munk and Czado (1995)). There, we developed an asymptotic test based on a trimmed version of the Mallows distance (Mallows 1972) between two c.d.f.'s F and G. This allows one to assess the similarity of two c.d.f.'s with respect to this distance at a controlled type I error rate. In particular, this applies to bioequivalence testing within a purely nonparametric setting. In this paper, we investigate the finite sample behavior of this test. The effect of trimming and non-equal sample size on the observed power and level is studied. Sample size driven recommendations for the choice of the trimming bounds are given in order to minimize the bias. Finally, assuming normality and homogeneous variances, we simulate the relative efficiency of the Mallows test to the (asymptotically optimal) standard equivalence t test, which reveals the Mallows test as a robust alternative to th...
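
A plug-in version of the trimmed Mallows distance is easy to compute from empirical quantiles; the grid-based approximation below is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def trimmed_mallows(x, y, alpha=0.05, grid=1000):
    """Empirical trimmed Mallows (L2-Wasserstein) distance: root of the
    average squared difference of sample quantiles over
    u in (alpha, 1 - alpha)."""
    u = np.linspace(alpha, 1.0 - alpha, grid)
    return np.sqrt(np.mean((np.quantile(x, u) - np.quantile(y, u)) ** 2))

rng = np.random.default_rng(9)
x = rng.normal(0.0, 1.0, size=500)
y = rng.normal(0.1, 1.1, size=500)
print(trimmed_mallows(x, y))
```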