
Showing papers on "Sample size determination published in 1968"


Journal ArticleDOI
TL;DR: In this article, several methods of estimating error rates in discriminant analysis are evaluated by sampling methods, and two methods in most common use are found to be significantly poorer than some new methods that are proposed.
Abstract: Several methods of estimating error rates in Discriminant Analysis are evaluated by sampling methods. Multivariate normal samples are generated on a computer which have various true probabilities of misclassification for different combinations of sample sizes and different numbers of parameters. The two methods in most common use are found to be significantly poorer than some new methods that are proposed.
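
As a rough illustration of the setup (not the specific estimators evaluated in the paper), the following Python sketch generates multivariate normal training samples, fits a linear discriminant rule, and contrasts the optimistic "apparent" (resubstitution) error rate with a holdout estimate; all sizes and separations are arbitrary choices.

```python
# Sketch: comparing two simple error-rate estimates for a linear
# discriminant rule on simulated multivariate normal data.
# Illustrative only -- not the estimators compared in the paper.
import numpy as np

rng = np.random.default_rng(0)
p, n, delta = 2, 25, 1.5                     # dimension, per-group size, separation

def ldf_errors(n_train, n_test=2000):
    mu0, mu1 = np.zeros(p), np.full(p, delta / np.sqrt(p))
    x0 = rng.normal(mu0, 1.0, size=(n_train, p))
    x1 = rng.normal(mu1, 1.0, size=(n_train, p))
    m0, m1 = x0.mean(0), x1.mean(0)
    s = ((x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)) / (2 * n_train - 2)
    w = np.linalg.solve(s, m1 - m0)          # Fisher discriminant direction
    c = w @ (m0 + m1) / 2                    # midpoint cutoff
    classify = lambda x: (x @ w > c).astype(int)
    # apparent (resubstitution) error: rate on the training sample itself
    apparent = np.mean(np.r_[classify(x0) != 0, classify(x1) != 1])
    # holdout error: rate on fresh samples from the same populations
    t0 = rng.normal(mu0, 1.0, size=(n_test, p))
    t1 = rng.normal(mu1, 1.0, size=(n_test, p))
    holdout = np.mean(np.r_[classify(t0) != 0, classify(t1) != 1])
    return apparent, holdout

reps = [ldf_errors(n) for _ in range(200)]
print("mean apparent error:", np.mean([r[0] for r in reps]))
print("mean holdout  error:", np.mean([r[1] for r in reps]))
```

The apparent rate is typically biased downward, which is the kind of deficiency such sampling studies quantify.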

1,513 citations


Journal ArticleDOI
TL;DR: In this paper, an empirical sampling study of the sensitivities of nine statistical procedures for evaluating the normality of a complete sample was carried out, including W (Shapiro and Wilk, 1965), √b1 (standard third moment), b2 (standard fourth moment), KS (Kolmogorov-Smirnov), CM (Cramer-von Mises), WCM (weighted CM), D (modified KS), CS (chi-squared) and u (Studentized range).
Abstract: Results are given of an empirical sampling study of the sensitivities of nine statistical procedures for evaluating the normality of a complete sample. The nine statistics are W (Shapiro and Wilk, 1965), √b1 (standard third moment), b2 (standard fourth moment), KS (Kolmogorov-Smirnov), CM (Cramer-von Mises), WCM (weighted CM), D (modified KS), CS (chi-squared) and u (Studentized range). Forty-five alternative distributions in twelve families and five sample sizes were studied. Results are included on the comparison of the statistical procedures in relation to groupings of the alternative distributions, on means and variances of the statistics under the various alternatives, on dependence of sensitivities on sample size, on approach to normality as measured by the W statistic within some classes of distribution, and on the effect of misspecification of parameters on the performance of the simple hypothesis test statistics. The general findings include: (i) The W statistic provides a generally superio...
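
A minimal sketch of this kind of sensitivity study, using three of the named tests as implemented in scipy (the KS variant here plugs in estimated parameters, so its p-value is only approximate, unlike the simple-hypothesis version the paper also examines); the alternative and sample size are arbitrary.

```python
# Sketch: empirical power of a few normality tests against an
# exponential alternative, in the spirit of the sampling study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 20, 2000, 0.05

def power(test):
    rejections = 0
    for _ in range(reps):
        x = rng.exponential(size=n)          # non-normal alternative
        if test(x) < alpha:
            rejections += 1
    return rejections / reps

tests = {
    "W (Shapiro-Wilk)": lambda x: stats.shapiro(x).pvalue,
    "KS (estimated params)": lambda x: stats.kstest(
        (x - x.mean()) / x.std(ddof=1), "norm").pvalue,  # approximate only
    "sqrt(b1) (skewness)": lambda x: stats.skewtest(x).pvalue,
}
for name, t in tests.items():
    print(f"{name:25s} power ~ {power(t):.2f}")
```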

1,093 citations


Journal ArticleDOI
TL;DR: In this paper, a Monte Carlo experiment is carried out to examine the small sample properties of five alternative estimators of a set of linear regression equations with mutually correlated disturbances, and the results show that three of the five estimation methods lead to identical estimates for any sample size, that in many cases the two-stage Aitken estimator performs as well as or better than the other estimators, and most of the asymptotic properties of this estimator tend to hold in small samples as well.
Abstract: A Monte Carlo experiment is carried out to examine the small sample properties of five alternative estimators of a set of linear regression equations with mutually correlated disturbances. The estimators considered are ordinary least squares, Zellner's two-stage Aitken, Zellner's iterative Aitken, Telser's iterative, and maximum likelihood. The experiment, based on 100 samples, provides approximate sampling distributions for samples of size 10, 20 and 100 for various model specifications. The results show that three of the five estimation methods lead to identical estimates for any sample size, that in many cases the two-stage Aitken estimator performs as well as or better than the other estimators, and that most of the asymptotic properties of this estimator tend to hold in small samples as well.
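
For concreteness, here is a minimal numpy sketch of Zellner's two-stage Aitken (seemingly unrelated regressions) estimator for a two-equation system; the data-generating coefficients and disturbance correlation are invented for illustration.

```python
# Sketch: Zellner's two-stage Aitken (SUR) estimator for two equations
# with mutually correlated disturbances, on simulated data.
import numpy as np

rng = np.random.default_rng(2)
T = 20                                        # small sample, as in the study
X1 = np.c_[np.ones(T), rng.normal(size=T)]    # regressors, equation 1
X2 = np.c_[np.ones(T), rng.normal(size=T)]    # regressors, equation 2
b1, b2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
cov = np.array([[1.0, 0.8], [0.8, 1.0]])      # correlated disturbances
u = rng.multivariate_normal([0, 0], cov, size=T)
y1, y2 = X1 @ b1 + u[:, 0], X2 @ b2 + u[:, 1]

# Stage 1: equation-by-equation OLS, to estimate the disturbance covariance
ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
e1, e2 = y1 - X1 @ ols(X1, y1), y2 - X2 @ ols(X2, y2)
S = np.cov(np.c_[e1, e2], rowvar=False, bias=True)

# Stage 2: GLS on the stacked system with Omega = S kron I_T
X = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y = np.r_[y1, y2]
Oinv = np.kron(np.linalg.inv(S), np.eye(T))
beta = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)
print("two-stage Aitken estimates:", beta.round(3))   # ~ [1, 2, -1, 0.5]
```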

465 citations


Journal ArticleDOI
TL;DR: Statistical properties of several methods for obtaining estimates of factor loadings and procedures for estimating the number of factors are compared by means of random sampling experiments.
Abstract: Statistical properties of several methods for obtaining estimates of factor loadings and procedures for estimating the number of factors are compared by means of random sampling experiments. The effect of increasing the ratio of the number of observed variables to the number of factors, and of increasing sample size, is examined. A description is given of a procedure which makes use of the Bartlett decomposition of a Wishart matrix to generate random correlation matrices.
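
The Bartlett-decomposition device mentioned in the abstract is easy to sketch: build a lower-triangular matrix with chi-distributed diagonal and standard normal sub-diagonal entries, form the Wishart matrix, and rescale it to a correlation matrix.

```python
# Sketch: generating a random correlation matrix via the Bartlett
# decomposition of a Wishart matrix (identity scale).
import numpy as np

def random_correlation(p, n, rng):
    """Draw W ~ Wishart_p(n, I) via Bartlett's decomposition, then
    rescale W to a correlation matrix."""
    A = np.zeros((p, p))
    for i in range(p):
        A[i, i] = np.sqrt(rng.chisquare(n - i))   # chi on the diagonal
        A[i, :i] = rng.normal(size=i)             # N(0,1) below the diagonal
    W = A @ A.T
    d = np.sqrt(np.diag(W))
    return W / np.outer(d, d)

rng = np.random.default_rng(3)
print(random_correlation(4, n=30, rng=rng).round(2))
```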

185 citations


Journal ArticleDOI
TL;DR: This article showed that the increase in the probabilities of misclassification is directly related to shrinkage of the multiple correlation coefficient R² in new samples, and that these are related to the unbiased estimation of Mahalanobis' δ² using D².
Abstract: When a sample discriminant function Ds is computed, it is desired to estimate the chance of misclassification using Ds. This is often done by classifying the sample with the help of Ds or by computing Φ(−½D), where Φ is the cumulative normal distribution and D² is Mahalanobis' distance. When Ds is applied to a new sample, the observed probabilities of misclassification are usually found to be greater than those computed from the initial sample. The purposes of this paper are to show that this increase in the probabilities of misclassification is directly related to the 'shrinkage' of the multiple correlation coefficient R² in new samples and that these are related to the unbiased estimation of Mahalanobis' δ² using D².
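
The plug-in estimate Φ(−D/2) mentioned in the abstract is a one-liner; the example values below are made up to show the computation.

```python
# Sketch: the plug-in estimate Phi(-D/2) of the misclassification
# probability, computed from a sample Mahalanobis distance D.
import numpy as np
from scipy.stats import norm

def plug_in_error_rate(m0, m1, S_pooled):
    """Estimated common misclassification probability for the sample
    linear discriminant, using Phi(-D/2) with D^2 the Mahalanobis distance."""
    d = m1 - m0
    D2 = d @ np.linalg.solve(S_pooled, d)     # sample Mahalanobis D^2
    return norm.cdf(-np.sqrt(D2) / 2)

# Example: two bivariate groups separated by D = 2 => Phi(-1) ~ 0.159
print(plug_in_error_rate(np.zeros(2), np.array([2.0, 0.0]), np.eye(2)))
```

As the paper argues, this estimate is optimistic in new samples, for the same reason R² shrinks on cross-validation.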

154 citations


Journal ArticleDOI
TL;DR: In this paper, the robustness of the F-test between means to its underlying assumptions (normally distributed populations with equal variances) is investigated using two nonnormal distributions (exponential and lognormal).
Abstract: In this study of robustness the insensitivity of the F-test between means to its underlying assumptions (normally distributed populations with equal variances) is investigated. Using two nonnormal distributions (exponential and lognormal), it is found that the test is fairly insensitive for moderate and equal sample size (n = 32) when the variances are equal. Further, for small samples (n < 32), the test is conservative with respect to Type I error. It is also conservative with respect to Type II error for a large range of φ (noncentrality), depending on the size of the sample and α. When the within-cell error variances are heterogeneous, the test continues to be conservative for the upper values of φ and slightly biased toward larger Type II errors for smaller values of φ, depending on the size of α. Analysis of the correlation between the numerator and denominator of F under the null hypothesis indicates that the robustness feature is largely due to this correlation. Analytic proofs under the non-null hyp...
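
A minimal Monte Carlo check in the same spirit: the empirical Type I error of the one-way F test under a lognormal parent (identical lognormal groups share a common variance, so only normality is violated); group count and sizes are arbitrary.

```python
# Sketch: Monte Carlo Type I error of the one-way F test when the
# parent populations are lognormal rather than normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
k, n, reps, alpha = 3, 32, 5000, 0.05
rej = 0
for _ in range(reps):
    groups = [rng.lognormal(mean=0.0, sigma=1.0, size=n) for _ in range(k)]
    if stats.f_oneway(*groups).pvalue < alpha:
        rej += 1
print(f"empirical Type I error ~ {rej / reps:.3f} (nominal {alpha})")
```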

149 citations


Journal ArticleDOI
TL;DR: In this paper, the authors assume that the results of the medical trial are to be evaluated by comparing the incidence of events in the control and experimental groups, and that sampling is the only consideration.

123 citations


Journal ArticleDOI
TL;DR: In this paper, an approximate method for estimating the sample size in simple random sampling and a systematic way of transformation of sample data are derived by using the parameters α and β of the regression of mean crowding on mean density in the spatial distribution per quadrat of animal populations.
Abstract: An approximate method for estimating the sample size in simple random sampling and a systematic way of transformation of sample data are derived by using the parameters α and β of the regression of mean crowding on mean density in the spatial distribution per quadrat of animal populations (Iwao, 1968). If the values of α and β are known for the species concerned, the sample size needed to attain a desired precision can be estimated by simply knowing the approximate level of mean density of the population to be sampled. Also, an appropriate variance-stabilizing transformation of sample data can be obtained by the method given here without restrictions on the distribution pattern of the frequency counts.
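
A sketch of the sample-size idea, assuming the standard reading of Iwao's regression (mean crowding m* = α + βm, hence quadrat variance V = (α+1)m + (β−1)m²): for a desired precision D = SE/mean, the required size is roughly n = ((α+1)/m + (β−1))/D². The coefficients below are illustrative placeholders, not values from the paper.

```python
# Sketch: Iwao-type sample-size estimate for simple random quadrat
# sampling, from mean-crowding regression parameters alpha, beta.
# V = (alpha+1)m + (beta-1)m^2  =>  n = V/(D^2 m^2).

def iwao_sample_size(alpha, beta, m, D=0.1):
    """Quadrats needed for relative precision D at mean density m."""
    return ((alpha + 1.0) / m + (beta - 1.0)) / D**2

# e.g. a mildly aggregated species at mean density 2 per quadrat:
print(round(iwao_sample_size(alpha=0.5, beta=1.4, m=2.0, D=0.2)))  # ~29
```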

120 citations


Journal ArticleDOI
TL;DR: General asymptotic results for power, sample size, and efficiency are derived for fixed sample size procedures in the case of all-or-none responses, and application to an epidemiologic problem is shown.
Abstract: The properties of the matched pairs design in the case of all-or-none responses have been the object of an increasing research interest. Wald ([1947] p. 108) pointed out that pairing on the basis of some irrelevant criterion leads to a loss of efficiency in small samples. This matter was later scrutinized by Youkeles [1963], who concluded that the loss vanishes as the sizes of the comparison series become 30 or more. A condition on which matching produces a gain in efficiency was derived by Worcester [1964]. Empirical work on the efficiency of the design was conducted by Billewicz [1964, 1965]. Miettinen [1966] derived asymptotic results for power, sample size, and efficiency, but the nature of the approach limits the applicability of the results to the vicinity of the null state. In the present work, general asymptotic results for power, sample size, and efficiency are derived for fixed sample size procedures in the case of all-or-none responses. Application to an epidemiologic problem is shown.
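
As a concrete companion, a Monte Carlo power estimate for McNemar's test, the standard fixed-sample-size procedure for matched pairs with all-or-none responses (the paper's asymptotic results are more general); the cell probabilities are assumptions for illustration.

```python
# Sketch: Monte Carlo power of McNemar's matched-pairs test under
# assumed discordant-pair probabilities.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_pairs, reps, alpha = 100, 2000, 0.05
# cell probabilities (both+, 1+only, 2+only, both-), assumed:
p = [0.5, 0.20, 0.10, 0.2]

rej = 0
for _ in range(reps):
    cells = rng.multinomial(n_pairs, p)
    n10, n01 = cells[1], cells[2]           # discordant pairs
    if n10 + n01 == 0:
        continue
    chi2 = (n10 - n01) ** 2 / (n10 + n01)   # McNemar statistic
    if stats.chi2.sf(chi2, df=1) < alpha:
        rej += 1
print(f"empirical power ~ {rej / reps:.2f}")
```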

116 citations


Journal ArticleDOI
S. S. Shapiro, M. B. Wilk
TL;DR: In this paper, the authors developed an approximation to the null distribution of W, a statistic suggested for testing normality by Shapiro and Wilk (1965), based on fitting and smoothing empirical sampling results.
Abstract: The present note deals with the development of an approximation to the null distribution of W, a statistic suggested for testing normality by Shapiro and Wilk (1965). This approximation is based on fitting and smoothing empirical sampling results. The W statistic is defined as the ratio of the square of a linear combination of the ordered sample to the usual sum of squares of deviations about the mean. For a sample from a normal distribution, the ratio is statistically independent of its denominator and so the moments of the ratio are equal to the ratio of the moments. This enables the simple computation of the ½ and 1 moments of W. Higher moments of W are not available and hence the Cornish-Fisher expansion could not be used as an approximation method. Good approximation was attained, after preliminary investigations, using Johnson's (1949) S_B distribution, which is defined as that of the random variable u for which z = γ + δ ln[(u − ε)/(λ + ε − u)] is distributed as standard normal, and where ε and λ + ε are the minimum and maximum attainable values of u, respectively. For W, λ + ε = 1 for all n, while ε is a known function of sample size (see Shapiro and Wilk (1965)). Values of ε are given in Table 1 of the present note for n = 3(1)50. To obtain suitable values of γ and δ, in the case when the bounds are known, Johnson (1949) recommends matching chosen percentage points. An alternative method might be to match two moments, but this would require heavy computation. Also, while the matching of two moments could be done solely using theoretical values, in principle, it would not necessarily provide for weighting the fit so as to be good in the tails of the distribution, which is what is wanted. The procedure actually used here was to do, for each n, the simple least squares regression of the empirical sampling value of
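
The form of the fitted transform is simple to state in code: with lower bound ε (tabulated in the note) and upper bound 1, z = γ + δ ln((W − ε)/(1 − W)) is treated as standard normal. The constants below are placeholders, not the fitted values from the paper.

```python
# Sketch: Johnson S_B normalizing transform of the W statistic.
# gamma, delta, eps are HYPOTHETICAL values for illustration only;
# the paper tabulates the fitted constants by sample size n.
import math
from scipy.stats import norm

def w_test_pvalue(w, gamma, delta, eps):
    z = gamma + delta * math.log((w - eps) / (1.0 - w))
    return norm.cdf(z)          # small W (small z) suggests non-normality

print(w_test_pvalue(w=0.92, gamma=-2.0, delta=1.5, eps=0.20))
```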

84 citations


Journal ArticleDOI
TL;DR: In this paper, the mean of a normal distribution with unknown variance was estimated to lie within an interval of given fixed width at a prescribed confidence level using a procedure which overcomes ignorance about the distribution with no more than a finite number of observations.
Abstract: It is shown that the mean of a normal distribution with unknown variance $\sigma^2$ may be estimated to lie within an interval of given fixed width at a prescribed confidence level using a procedure which overcomes ignorance about $\sigma^2$ with no more than a finite number of observations. That is, the expected sample size exceeds the (fixed) sample size one would use if $\sigma^2$ were known by a finite amount, the difference depending on the confidence level $\alpha$ but not depending on the values of the mean $\mu$, the variance $\sigma^2$ and the interval width $2d$. A number of unpublished results on the moments of the sample size are presented. Some do not depend on an assumption of normality.
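
A minimal sketch of the classical Stein-type two-stage construction behind results of this kind (the paper's procedure and its moment results differ in detail): take a pilot sample, estimate σ², and extend the sample until the t-based interval has the required half-width d.

```python
# Sketch: Stein-type two-stage procedure for a fixed-width confidence
# interval for a normal mean with unknown variance. Illustrative only.
import numpy as np
from scipy import stats

def fixed_width_ci(draw, d=0.5, conf=0.95, n1=15):
    """Return (center, total n) for a CI of half-width d."""
    x1 = draw(n1)                                  # first-stage sample
    s2 = x1.var(ddof=1)
    t = stats.t.ppf(1 - (1 - conf) / 2, df=n1 - 1)
    n = max(n1, int(np.ceil(t**2 * s2 / d**2)))    # total size needed
    x2 = draw(n - n1) if n > n1 else np.array([])  # second-stage sample
    xbar = np.concatenate([x1, x2]).mean()
    return xbar, n

rng = np.random.default_rng(6)
center, n = fixed_width_ci(lambda m: rng.normal(10, 3, size=m))
print(f"interval {center:.2f} +/- 0.5 used n = {n}")
```

The expected n exceeds the known-σ² sample size by a bounded amount, which is the phenomenon the abstract describes.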

Journal ArticleDOI
TL;DR: In this article, the authors describe a practical investigation to determine how large the sample size must be for various kinds of non-normality to cease to have an appreciable effect on the distribution of Student's t.
Abstract: Student's t assumes that the parent population is normal. This paper describes a practical investigation to determine how large the sample size must be for various kinds of non‐normality to cease to have an appreciable effect on the distribution of t.

Journal ArticleDOI
TL;DR: In this article, the problem of determining the sample size to take for a tolerance limit L(X), where L(X) is a function of a random sample X1, …, Xn from a distribution with density f(x; θ), is investigated.
Abstract: The problem of determining the sample size to take for a tolerance limit L(X), where L(X) is a function of a random sample X1, …, Xn from a distribution with density f(x; θ), is investigated. A criterion of "goodness" of tolerance limits is developed and a method given, using this criterion, for solving the sample size problem. Examples are given using the uniform, exponential, and normal distributions as underlying models.
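
For flavor, the simplest distribution-free relative of this problem (the paper itself works with parametric models): choose n so that the sample minimum lies below the p-th population quantile with confidence γ, which gives 1 − (1 − p)^n ≥ γ, i.e. n ≥ ln(1 − γ)/ln(1 − p).

```python
# Sketch: sample size so the sample minimum is a lower (p, gamma)
# tolerance limit, distribution-free.
import math

def tolerance_n(p=0.10, gamma=0.95):
    return math.ceil(math.log(1 - gamma) / math.log(1 - p))

print(tolerance_n())  # 29: min of 29 draws is below the 10% point
                      # with 95% confidence
```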

Journal ArticleDOI
A. Hald
TL;DR: In this article, the authors derive and discuss the properties of a system of single sampling attribute plans obtained by minimizing the average costs under the assumptions that sampling and decision costs are linear in lot size, sample size, and the number of defectives in the lot and the sample, and that the distribution of lot quality is a mixed binomial distribution.
Abstract: The purpose of the paper is to derive and discuss the properties of a system of single sampling attribute plans obtained by minimizing the average costs under the assumptions that sampling and decision costs are linear in lot size, sample size, and the number of defectives in the lot and the sample, that sampling is without replacement, and that the distribution of lot quality is a mixed binomial distribution, i.e., each lot is produced by a process in binomial control but the process average varies from lot to lot according to a frequency distribution which is assumed to be differentiable in the neighbourhood of the break-even value. It is shown that the optimum sample size is approximately a linear function of the square root of the lot size, and that the optimum acceptance number is approximately a linear function of the sample size. The accuracy of the approximation has been evaluated numerically for the beta distribution as prior. Some auxiliary tables are given with examples of applications. A short...

Journal ArticleDOI
TL;DR: In this article, a series for the cumulative distribution function of the multiple correlation coefficient in which the coefficients are interpretable as binomial probabilities is obtained, and a comparatively simple approximation to the general distribution of the MCC is presented which is more accurate than Fisher's z-transform over a large set of values of the parameters involved.
Abstract: SUMMARY A series is obtained for the cumulative distribution function of the multiple correlation coefficient in which the coefficients are interpretable as binomial probabilities. When N − p is even (N is the sample size and p is the number of components in the underlying multivariate normal distribution) the series is finite with ½(N − p) + 1 terms. Other series are also considered. A comparatively simple approximation to the general distribution of the multiple correlation coefficient is presented which, in particular for p = 2, is more accurate than Fisher's z-transform over a large set of values of the parameters involved.

Journal ArticleDOI
TL;DR: In this article, a procedure for estimating the population of young Pribilof fur seals is described. But the sampling process is assumed to be random with respect to the presence or absence of the tag.
Abstract: It is well known that usual tag-sample estimates are valid if it can be assumed that the sampling process is random with respect to the presence or absence of the tag. This may occur because of natural processes or it may be the result of deliberate design of the experiment either in the tagging phase or in sampling, or in both. Procedures of this type, which have been applied to estimate the population of young Pribilof fur seals, are discussed. Particular consideration is given to variance estimates, comparisons of sample size, and methods of determining the randomness of allocation of the tags.
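
The generic machinery behind estimates of this kind is the Petersen tag-recapture estimate; below is the Chapman form with its usual variance estimate (the counts are made up, not the fur-seal data).

```python
# Sketch: Chapman's form of the tag-recapture (Petersen) estimate
# and its variance estimate, assuming random sampling w.r.t. tags.
def chapman(M, C, R):
    """M animals tagged, C sampled later, R tagged among the C."""
    N_hat = (M + 1) * (C + 1) / (R + 1) - 1
    var = ((M + 1) * (C + 1) * (M - R) * (C - R)) / ((R + 1) ** 2 * (R + 2))
    return N_hat, var

N_hat, var = chapman(M=5000, C=2000, R=160)
print(f"N_hat ~ {N_hat:,.0f}, SE ~ {var ** 0.5:,.0f}")
```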

Journal ArticleDOI
TL;DR: In this article, the problem of selecting a subset of the smallest possible fixed size s that will contain the t best of k populations (t < s < k), based on any given common sample size from each of the k populations, is considered.
Abstract: SUMMARY The problem considered is that of selecting a subset of the smallest possible fixed size s that will contain the t best of k populations (t < s < k), based on any given common sample size from each of the k populations. Special emphasis, with a table included for finding s, is given to the case of normal distributions with larger means being better and with a common known variance. A criterion for efficiency, comparisons with other procedures, and a dual problem are also discussed.

Journal ArticleDOI
TL;DR: In this article, the Cum method of Dalenius and Hodges for approximately optimal construction of strata is utilized to approximate the variance of the stratified estimate, for estimation of the population mean of a random variable Y by the technique of stratified random sampling.
Abstract: The cum method of Dalenius and Hodges for approximately optimal construction of strata is utilized to approximate the variance of the stratified estimate, for estimation of the population mean of a random variable Y by the technique of stratified random sampling. The approximation provides a basis for choosing optimally, for fixed cost, the number of strata to be constructed and the total sample size to be used. It also facilitates other purposes, such as the comparison of optimal stratified with optimal simple random sampling. The study is carried out for the situations of stratification on the estimation variable and of stratification on a covariable closely associated with the estimation variable.
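
The Dalenius-Hodges construction the paper builds on is short enough to sketch: cumulate √(frequency) over a fine grouping of the stratification variable and cut at equal increments.

```python
# Sketch: Dalenius-Hodges cum-sqrt(f) rule for approximately optimal
# stratum boundaries on a stratification variable y.
import numpy as np

def cum_sqrt_f_boundaries(y, L, bins=50):
    """Approximately optimal boundaries for L strata on variable y."""
    freq, edges = np.histogram(y, bins=bins)
    cum = np.cumsum(np.sqrt(freq))                 # cumulative sqrt(f)
    targets = cum[-1] * np.arange(1, L) / L        # equal cum-sqrt(f) cuts
    idx = np.searchsorted(cum, targets)
    return edges[idx + 1]                          # boundary points

rng = np.random.default_rng(7)
y = rng.lognormal(0, 1, size=10_000)               # skewed population
print("boundaries for L=4:", cum_sqrt_f_boundaries(y, L=4).round(2))
```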

Journal ArticleDOI
01 Jan 1968
TL;DR: In this article, the operating characteristics of fixed-sample-size statistical techniques are investigated when the sample size N is itself a random variable, covering procedures based on the sum X1 + … + XN and on max(X1, …, XN).
Abstract: In many applied probability models, one is concerned with a sequence {Xn : n ≥ 1} of independent random variables (r.v.'s) with a common distribution function (d.f.), F say. When making statistical inferences within such a model, one frequently must do so on the basis of observations X1, X2, …, XN where the sample size N is a r.v. For example, N might be the number of observations that it was possible to take within a given period of time or within a fixed cost of experimentation. In cases such as these it is not uncommon for statisticians to use fixed-sample-size techniques, even though the random sample size, N, is not independent of the sample. It is therefore important to investigate the operating characteristics of these techniques under random sample sizes. Much work has been done since 1952 on this problem for techniques based on the sum X1 + … + XN (see, for example, the references in (3)). Also, for techniques based on max(X1, X2, …, XN), results have been obtained independently by Barndorff-Nielsen (2) and Lamperti (9).

Journal ArticleDOI
TL;DR: In this paper, the authors extended the results of Madansky and Buehler for series, parallel and series-parallel systems to a much wider class of systems and showed that the use of the asymptotic chi-square method to determine confidence limits yields results which are substantially better for small sample sizes and equivalent for large ones for the cases considered.
Abstract: 0. Summary. The asymptotic chi-square distribution of the log-likelihood ratio is used to obtain approximate confidence intervals for the reliability of any system which may be represented by a monotone function of Bernoulli variates. This generalizes the results of A. Madansky, Technometrics, November 1965, for series, parallel and series-parallel systems. The method used is to parameterize the log-likelihood equation so as to find the interval of parameter values which keeps the log-likelihood less than or equal to the specified quantile of the chi-square distribution. This is done by introducing an operator depending upon the parameter, a fixed point of which is the solution of the likelihood ratio equation, and by showing the operator is a contractive map and hence has a unique fixed point depending continuously on the parameter. The solution can be found simply by iteration. 1. Introduction. Several approaches have been made to the problem of determining confidence limits for system reliability. Buehler [2] has given upper confidence limits for the product of two binomial parameters using Poisson approximations with an accuracy believed to be adequate whenever both sample sizes exceed forty. Buehler's method has been extended by Lipow [5] and Steck [8] for certain cases, and short tabulations have been made. Madansky [6] has made use of the log-likelihood ratio's asymptotic chi-square distribution to obtain confidence intervals for series, parallel and series-parallel systems. It is the extension of this last result to a much wider class of systems that we present here. The reason for this effort is the acknowledged inadequacy in many practical instances of the only known method of finding confidence bounds, from data on the performance of the components, for the reliability of general systems. This known method is the use of the asymptotic normality of the maximum likelihood estimates. A comparison, for certain cases of practical importance, of the method presented here with this alternative method is carried out in [7]. There the authors have shown that the use of the asymptotic chi-square method to determine confidence limits yields results which are substantially better for small sample sizes and equivalent for large ones for the cases considered. In order to be specific about the class of systems which we consider here, it is necessary to introduce some notation. For the ith component among m, let the Bernoulli random variable Yi indicate performance by taking the value one for success and zero for failure. The state of the components then is the vector Y = (Y1, …, Ym). By a coherent (monotone) structure we follow [1] to mean there
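
A minimal sketch of the underlying idea for the simplest case, a two-component series system with reliability R = p1·p2: keep the values of R whose profile log-likelihood stays within half the chi-square quantile of the maximum. The sketch scans a grid instead of using the paper's contraction-mapping iteration, and the test data are invented.

```python
# Sketch: likelihood-ratio (asymptotic chi-square) confidence limits
# for R = p1*p2 of a two-component series system, by grid scan.
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

data = [(48, 50), (45, 50)]          # (successes, trials) per component, made up

def max_loglik(R):
    """Profile log-likelihood: maximize over p1 subject to p1*p2 = R."""
    def nll(p1):
        ps = [p1, R / p1]
        if not all(0 < q < 1 for q in ps):
            return np.inf
        return -sum(s * np.log(q) + (n - s) * np.log(1 - q)
                    for (s, n), q in zip(data, ps))
    res = minimize_scalar(nll, bounds=(R + 1e-9, 1 - 1e-9), method="bounded")
    return -res.fun

mle = np.prod([s / n for s, n in data])
ll_hat = max_loglik(mle)
crit = stats.chi2.ppf(0.95, df=1) / 2
grid = np.linspace(0.70, 0.999, 600)
inside = [R for R in grid if ll_hat - max_loglik(R) <= crit]
print(f"R_hat = {mle:.3f}, approx 95% CI = ({min(inside):.3f}, {max(inside):.3f})")
```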


Journal ArticleDOI
TL;DR: In this article, the problem of optimum allocation of sample size to strata is examined in the light of a priori distributions, and the sample size is allocated so as to minimize the expected variance of the strategy consisting of πPS sampling scheme and the Horvitz-Thompson estimator under a general super-population model.
Abstract: The problem of optimum allocation of sample size to strata is examined in the light of a priori distributions. In this context, we discuss with illustrations the justification for the assumption that the unknown proportionate values of the σ²'s can be replaced by the proportionate values of the known s²'s, which are estimates of the σ²'s. The sample size is allocated so as to minimize the expected variance of the strategy consisting of the πPS sampling scheme and the Horvitz-Thompson estimator under a general super-population model. It is further shown that, in the sense of expected variance, πPS sampling for unstratified sampling is inferior to πPS stratified sampling with this type of allocation, unless the super-population parameter g attains the value 2, in which case both schemes are equivalent.
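
For orientation, here is the classical Neyman allocation n_h ∝ N_h·σ_h, the simpler deterministic counterpart of the expected-variance-minimizing allocation discussed above (the paper's πPS/super-population setting replaces the unknown σ_h by estimates or model values); the stratum figures are invented.

```python
# Sketch: Neyman allocation of a total sample size across strata,
# n_h proportional to N_h * S_h (stratum size times std. deviation).
import numpy as np

def neyman_allocation(N_h, S_h, n_total):
    w = np.asarray(N_h) * np.asarray(S_h)
    # rounding may perturb the total slightly
    return np.maximum(1, np.round(n_total * w / w.sum())).astype(int)

print(neyman_allocation(N_h=[5000, 3000, 2000], S_h=[1.0, 2.5, 6.0],
                        n_total=200))  # more variable strata get more
```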

Journal ArticleDOI
TL;DR: Results show that for a given level of accuracy, significantly less computer time is required when sample sizes are determined by the method suggested in this study than when they are equal.
Abstract: This paper investigates the problem of efficiently allocating computer time between two simulation experiments when the objective is to make a statistical comparison of means. For a given level of accuracy our results show that significantly less computer time is required when the sample sizes are determined according to a certain rule than when the sample sizes are equal. A graphical analysis suggests that small errors in estimating the population parameters of the allocation rule do not significantly affect the efficient allocation of time. The influence that the degree of autocorrelation has on the time allocation is also investigated; results show that small differences in the autocorrelation functions are important when each process is highly autocorrelated. Positively correlated samples for the two experiments are examined and incorporated into the efficient allocation rule. It is shown that their use leads to a saving in computer time. A two-stage procedure is described wherein initial estimates of...
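
The flavor of such a rule, for independent samples and ignoring the autocorrelation refinements the paper treats: to minimize the variance of the difference of means under a budget, take sample sizes proportional to s_i/√c_i (standard deviation over square root of per-observation cost).

```python
# Sketch: square-root allocation of a computing budget between two
# simulation experiments when comparing their means. Illustrative only.
import numpy as np

def allocate(total, s1, s2, c1=1.0, c2=1.0):
    """Split a budget `total` (in cost units); returns (n1, n2)."""
    w1, w2 = s1 / np.sqrt(c1), s2 / np.sqrt(c2)
    n1 = total * w1 / (w1 + w2) / c1
    return int(round(n1)), int(round((total - n1 * c1) / c2))

print(allocate(total=1000, s1=2.0, s2=6.0))   # noisier experiment gets more
```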

Journal ArticleDOI
TL;DR: In this paper, the authors present the results of a Monte Carlo study of the accuracy of an approximation to the power of the chi-square goodness of fit test with small but equal expected frequencies.
Abstract: This paper presents the results of a Monte Carlo study of the accuracy of an approximation to the power of the chi-square goodness of fit test with small but equal expected frequencies. Various combinations of sample size, number of groups, and alpha level are considered, and in most instances the actual power of the test is estimated to be less than the nominal power. The degree of accuracy appears to be more related to the size of the sample than to the size of the expected frequencies. The following rule of thumb is offered for obtaining crude estimates of the actual power from the nominal power for sample sizes from 10 to 50. The actual power of the test equals about eight-tenths of the nominal power.
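
The "nominal power" being checked is the usual noncentral chi-square approximation, which is straightforward to compute; cell probabilities below are arbitrary.

```python
# Sketch: noncentral chi-square approximation to the power of the
# chi-square goodness-of-fit test (the nominal power in the study).
import numpy as np
from scipy import stats

def nominal_power(p0, p1, n, alpha=0.05):
    p0, p1 = np.asarray(p0), np.asarray(p1)
    lam = n * np.sum((p1 - p0) ** 2 / p0)          # noncentrality
    crit = stats.chi2.ppf(1 - alpha, df=len(p0) - 1)
    return stats.ncx2.sf(crit, df=len(p0) - 1, nc=lam)

# 4 equiprobable cells under H0, a mild alternative, n = 40:
print(f"nominal power ~ {nominal_power([.25]*4, [.35,.25,.25,.15], 40):.2f}")
```

Per the paper's rule of thumb, the actual small-sample power runs at roughly eight-tenths of this figure for n between 10 and 50.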

Journal ArticleDOI
TL;DR: In this paper, two nonparametric tests for analyzing data from a one-way layout are presented, one is intended for k samples of equal size n, and the other is for k sample of possibly unequal sample sizes.
Abstract: Two nonparametric tests are presented for analyzing data from a one-way layout. Their principal characteristic is ease in computation. One test is intended for k samples of equal size n, and the other is for k samples of possibly unequal sample sizes. Both tests may be regarded as generalizations of Mosteller's slippage test. Multiple decision aspects of both tests are discussed. Monte Carlo studies show the first test to be more powerful than the F test in some situations, such as when the populations are normal and the larger means are accompanied by larger variances. Tables of critical values are presented.

Journal ArticleDOI
M. Hecht, M. Schwartz
TL;DR: M-ary sequential tests are proposed and analyzed for amplitude-modulated signals that can be represented in a one- or two-dimensional signal space, and the probability of error as a function of the average sample size for the sequential test is compared to the optimum fixed sample test.
Abstract: M-ary sequential tests are proposed and analyzed for amplitude-modulated signals that can be represented in a one- or two-dimensional signal space. Coherent detection in Gaussian noise has been assumed. Expressions for the probability of error and average sample size are found as a function of the threshold constants. The probability of error as a function of the average sample size for the sequential test is then compared to the optimum fixed sample test. For error rates in the range of 10⁻⁶, the average sample size of the sequential tests is about 2.5 times (4 dB) smaller than the sample size of the corresponding fixed sample tests. For comparison purposes, the length of the binary sequential test will be 3.2 times (5.2 dB) smaller than the length of the corresponding binary fixed sample test. The sequential likelihood ratio test for amplitude-modulated signals in one dimension leads to a fixed sample test of a certain length to determine the two most likely signals and then a binary sequential test between these two signals to determine the most likely one. The binary sequential portion of the test incorporates the data obtained from the fixed portion of the test. For two-dimensional amplitude-modulated signals, the sequential test consists of a fixed sample test through a certain length and then two concurrent binary sequential tests on the two most likely amplitudes of each dimension.
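
The binary building block of these procedures is Wald's sequential probability ratio test; a minimal sketch for two known signal levels in Gaussian noise (signal values and error targets are arbitrary):

```python
# Sketch: Wald's binary SPRT for signal mu0 vs mu1 in Gaussian noise,
# with thresholds from Wald's approximations.
import math, random

def sprt(draw, mu0, mu1, sigma=1.0, alpha=1e-3, beta=1e-3):
    a, b = math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)
    llr, n = 0.0, 0
    while a < llr < b:
        x = draw()
        n += 1
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
    return (1 if llr >= b else 0), n         # decision, sample size used

random.seed(8)
runs = [sprt(lambda: random.gauss(1.0, 1.0), mu0=0.0, mu1=1.0)
        for _ in range(500)]
print("error rate:", sum(d == 0 for d, _ in runs) / 500)
print("average sample size:", sum(n for _, n in runs) / 500)
```

The average sample size advantage over a fixed-length test is exactly the kind of saving the abstract quantifies.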

Journal ArticleDOI
TL;DR: In this article, the effect of sample size on the degree of mixedness of powders is studied for an idealized model, and it is shown that samples of a single size provide all the necessary information, and a minimum sample size is recommended.
Abstract: One of the greatest difficulties in the analysis of mixing operations for powders is that of specifying the degree of mixedness in a quantitative manner. This problem is studied for an idealized model. A precise index of the degree of mixing is found for this model, which holds promise of being useful for real mixtures. It is shown that samples of a single size provide all the necessary information. The effect of sample size is defined quantitatively, and a minimum sample size is recommended.

Journal ArticleDOI
TL;DR: In this paper, a sample of unknown size n is obtained from a known population and the order statistics are then examined sequentially for the purpose of drawing inferences about n. This procedure provides a basis for examining some popular conjectures and approximations in sequential analysis.
Abstract: A sample of unknown size n is obtained from a known population. The order statistics are then examined sequentially for the purpose of drawing inferences about n. A procedure for testing the unknown sample size n is given and its exact properties are obtained. This procedure provides a basis for examining some popular conjectures and approximations in sequential analysis. In particular two suggested approximations to the average sample number of a sequential probability ratio test are discussed.

Journal ArticleDOI
TL;DR: In this paper, the authors developed computing formulas and helpful approximations for standard errors of these statistics, and apply them to a series of data from the Center's Surveys of Consumer Attitudes & Expectations.
Abstract: Methods of computing standard errors were developed and applied for several statistics of importance in economic and social surveys. These statistics are based on ratio means r = y/x; x is often the variable sample size of complex samples. Some statistics involve weighted sums ΣWjrj; others concern the relatives R1 = r1/r0 = (y1/x1)/(y0/x0), the ratio of the current (1) to the base (0) mean. From these relatives the indexes are constructed, and changes, I2 − I1, in the index. We develop computing formulas and helpful approximations for standard errors of these statistics, and apply them to a series of data from our Center's Surveys of Consumer Attitudes & Expectations. A large group of empirical results yields evidence on the behavior of these statistics, with wide implications for the design of surveys to measure economic and social indicators.
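
The basic formula behind such computations is the Taylor-linearization standard error of a ratio mean r = y/x, sketched here for simple random sampling (the paper handles complex designs); the data are simulated.

```python
# Sketch: linearization SE of a ratio mean r = sum(y)/sum(x), where x
# plays the role of the random realized sample size.
import numpy as np

def ratio_se(y, x):
    """SE of r for an SRS of n pairs (y_i, x_i)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n, r = len(y), y.sum() / x.sum()
    z = y - r * x                       # linearized residuals
    return np.sqrt(n / (n - 1) * np.sum(z ** 2)) / x.sum()

rng = np.random.default_rng(9)
x = rng.poisson(5, size=300) + 1        # e.g. cluster sizes
y = x * 0.4 + rng.normal(0, 1, size=300)
print(f"r = {y.sum()/x.sum():.3f}, SE ~ {ratio_se(y, x):.4f}")
```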

Journal ArticleDOI
TL;DR: Simple empirical formulae are presented for estimating appropriate sample size for simple randomized analysis of variance designs involving 2, 3, 4 or 5 treatments using the magnitude of a meaningful treatment difference and an estimate of the error variance.
Abstract: Simple empirical formulae are presented for estimating appropriate sample size for simple randomized analysis of variance designs involving 2, 3, 4 or 5 treatments. In order to use these formulae one must specify the magnitude of a meaningful treatment difference and must have an estimate of the error variance. Sample size estimates derived from the simple formulae have been found to differ from values obtained using constant power curves by no more than one sampling unit on the low side and no more than two sampling units on the high side.
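
The constant-power-curve calculation that such simple formulae approximate can be done directly from the noncentral F distribution: fix the smallest meaningful difference between two treatment means and search for the per-group n reaching the target power (worst-case mean pattern assumed below).

```python
# Sketch: per-group sample size for a one-way ANOVA via the
# noncentral F distribution, given a meaningful difference `diff`
# and an error-variance estimate `sigma2`.
import numpy as np
from scipy import stats

def anova_n(k, diff, sigma2, alpha=0.05, power=0.80):
    """Smallest per-group n detecting two means `diff` apart."""
    # worst case: two means diff apart, the rest midway, so the sum of
    # squared deviations of the means from the grand mean is diff^2 / 2
    for n in range(2, 10_000):
        lam = n * (diff ** 2 / 2) / sigma2          # noncentrality
        crit = stats.f.ppf(1 - alpha, k - 1, k * (n - 1))
        if stats.ncf.sf(crit, k - 1, k * (n - 1), lam) >= power:
            return n

print(anova_n(k=3, diff=1.0, sigma2=1.0))   # per-treatment sample size
```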