
Showing papers on "Sample size determination" published in 1976


Journal ArticleDOI
TL;DR: In this article, three different methods for testing all pairs of ȳk − ȳk′ were compared under varying sample size (n) and variance conditions, with unequal n's of six and up.
Abstract: Three different methods for testing all pairs of ȳk − ȳk′ were contrasted under varying sample size (n) and variance conditions. With unequal n's of six and up, only the Behrens-Fisher statistic...
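
As an illustrative aside (not code from the paper), the Behrens-Fisher-type comparison it studies can be sketched with Welch's approximate t test applied to every pair of groups; `scipy.stats.ttest_ind(..., equal_var=False)` computes Welch's statistic. The groups and seed below are invented, and no multiplicity adjustment is shown:

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Hypothetical groups with unequal n's and unequal variances.
groups = [rng.normal(0, 1, 8), rng.normal(0, 3, 12), rng.normal(1, 2, 6)]

# Welch's approximate t (a Behrens-Fisher-type statistic) for every
# pairwise mean difference.
for (i, yi), (j, yj) in combinations(enumerate(groups), 2):
    t, p = stats.ttest_ind(yi, yj, equal_var=False)
    print(f"group {i} vs group {j}: t = {t:.2f}, p = {p:.3f}")
```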

830 citations


Journal ArticleDOI
TL;DR: In this article, a method for summarizing the power of the parametric t tests and the nonparametric Spearman's rho test and Mann-Whitney's test against step and linear trends in a dimensionless trend number is presented.
Abstract: Classical statistical tests for trend, both parametric and nonparametric, assume independence of observations, a condition rarely encountered in time series obtained by using moderate to high sample frequencies. A method is developed for summarizing the power of the parametric t tests and the nonparametric Spearman's rho test and Mann-Whitney's test against step and linear trends in a dimensionless ‘trend number’ which is a function of trend magnitude, standard deviation of the time series, and sample size. For the case of dependent observations, use of an equivalent independent sample size rather than the actual sample size is shown to enable use of the same trend number developed for the independent case. An important related result is the existence of an upper limit on power (trend detectability) over a fixed time horizon, regardless of the number of samples taken, for a lag 1 Markov process.
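
The paper's "equivalent independent sample size" can be illustrated for a lag-1 Markov (AR(1)) process. The sketch below uses the standard large-sample approximation n(1 − ρ)/(1 + ρ); the abstract does not give the formula, so treat this as our assumption:

```python
import numpy as np

def effective_sample_size(n, rho):
    """Equivalent independent sample size for a lag-1 Markov (AR(1))
    process with lag-1 autocorrelation rho: the standard large-n
    approximation n * (1 - rho) / (1 + rho)."""
    return n * (1.0 - rho) / (1.0 + rho)

# Doubling n under strong dependence buys little effective information,
# consistent with the paper's upper limit on trend detectability.
for n in (50, 100, 200, 400):
    print(n, round(effective_sample_size(n, rho=0.8), 1))
```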

287 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose a new class of statistics which are distribution free when the populations are identical but are not linear rank statistics; each has the same Pitman efficiency as its corresponding linear rank statistic, yet its small-sample power is significantly higher.
Abstract: Several linear rank statistics have been proposed in the literature for the two-sample scale problem. We propose a new class of statistics which are distribution free when the populations are identical, but are not linear rank statistics. Our analogs of the Ansari-Bradley, Mood, and Klotz tests are of particular interest. Each has the same Pitman efficiency as its corresponding linear rank statistic, and yet our small-sample power is significantly higher. In addition, our tests are consistent for scale differences with unequal location in the case when the populations are symmetric and the sample sizes are equal.
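
For orientation (these are the classical tests, not the paper's new statistics), the linear rank scale tests the paper builds analogs of are available in SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 15)   # scale 1
y = rng.normal(0.0, 2.0, 15)   # scale 2, same location

# Classical linear rank tests for the two-sample scale problem; the
# paper's statistics are distribution-free analogs of these with
# higher small-sample power.
print(stats.ansari(x, y))
print(stats.mood(x, y))
```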

240 citations


Journal ArticleDOI
TL;DR: In this article, the optimum sample size is defined as the smallest sample size that would assure the desired reliability of the estimate, and the formula giving it depends on the way we choose to define reliability.
Abstract: Estimation of field population parameters is an indispensable part of many ecological and various pest management projects. Obviously, the greater the sample size, the more reliable the estimates are. However, the cost per unit sampled is often substantial, which makes the collection of unnecessarily large samples unwise. Thus, we must determine in advance the smallest sample size that would assure us the desired reliability of the estimate. This is called the optimum sample size. The formula giving it depends on the way we choose to define “reliability.”
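
A minimal sketch of two common ways to operationalize "reliability", assuming a normal-approximation confidence interval and pilot estimates of the mean and standard deviation (all numbers below are hypothetical):

```python
import math

def n_absolute(s, d, z=1.96):
    """Smallest n so that a z-level CI half-width is at most d:
    n = (z * s / d)**2, rounded up."""
    return math.ceil((z * s / d) ** 2)

def n_relative(s, mean, D, z=1.96):
    """Smallest n so that the CI half-width is at most a fraction D of
    the mean: n = (z * s / (D * mean))**2, rounded up."""
    return math.ceil((z * s / (D * mean)) ** 2)

# Pilot estimates: mean density 12.4 per sample unit, s = 9.1.
print(n_absolute(s=9.1, d=2.0))             # precision of +/- 2.0 units
print(n_relative(s=9.1, mean=12.4, D=0.1))  # 10% relative precision
```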

184 citations


Journal ArticleDOI
TL;DR: This paper considers the project scheduling problem with multiple constrained resources and shows that the choice of priority rule is important with the parallel method, but with the sampling method, although it does affect the distribution of the sample, the choice of rule is not significant.
Abstract: This paper considers the project scheduling problem with multiple constrained resources. Two classes of heuristic procedure, both making use of priority rules, are discussed: the parallel method, which generates just one schedule; and the sampling method, which generates a set of schedules using probabilistic techniques and selects the best schedule from this sample. An experimental investigation is described in which a set of projects with different characteristics is scheduled by each of these heuristics with a variety of priority rules. The effects of the heuristic method, the project characteristics and the priority rules are assessed. It is shown that the choice of priority rule is important with the parallel method, but with the sampling method, although it does affect the distribution of the sample, the choice of rule is not significant. The sampling method with sample size 100 is shown to produce samples at least 7% better than those generated by the corresponding parallel method, with 99% confidence. Further results are discussed and conclusions are presented.

180 citations


Journal ArticleDOI
TL;DR: In this article, the authors present maps showing the geographical distribution of estimates of the standard deviations at each grid point for the January climatological statistics of the NCAR GCM based on a sample of five independent realizations.
Abstract: One of the key problems in analyzing the results of climate experiments with general circulation models (GCMs) is the matter of estimating the statistical significance of a prescribed change response. This question involves separating the signal (that part of the response attributable to the prescribed change) and the noise (some measure of the inherent variability of model statistics). In this paper we present maps showing the geographical distribution of estimates of the standard deviations at each grid point for the January climatological statistics of the NCAR GCM based on a sample of five independent realizations. Also, a formalism for estimating the statistical significance of a prescribed change response is given based on the classical Student's t-test, and the implications of varying the sample size are discussed. The most telling implication of the results is that these statistical significance questions could mean that a large percentage of total computational effort in a particular pr...
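
A hedged sketch of the t-test formalism: a two-sample Student's t-test applied independently at every grid point, with random numbers standing in for model output (the grid shape, effect size, and seed are made up; the NCAR GCM fields are not reproduced here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-ins for 5 control and 5 perturbed January realizations on a
# small lat-lon grid.
control = rng.normal(0.0, 1.0, size=(5, 24, 48))
perturbed = rng.normal(0.3, 1.0, size=(5, 24, 48))

# Two-sample t-test at every grid point; the 'signal' is the mean
# response, the 'noise' the inter-realization variability.
t, p = stats.ttest_ind(perturbed, control, axis=0)
significant = p < 0.05
print(f"{significant.mean():.1%} of grid points significant at the 5% level")
```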

169 citations


Journal ArticleDOI
TL;DR: In this article, the contrasts of interest are limited to the pairwise comparisons among the means of K samples of equal or unequal sizes, and four normal univariate single-stage multiple comparison procedures are compared for significance levels not exceeding 0.05.
Abstract: For the situation in which the contrasts of interest are limited to the pairwise comparisons among the means of K samples of equal or unequal sizes, four normal univariate single-stage multiple comparison procedures are compared for significance levels not exceeding 0.05: Scheffé's S-method, Dunn's [1, 2] and Šidák's [17] improved version of the Bonferroni method, Hochberg's GT2 procedure [8] utilizing the maximum modulus, and Spjøtvoll and Stoline's T′-method [19]. Rules are given for determining if any method is uniformly preferable (best for all contrasts). Nonuniform preference rules are also proposed and applied to some examples. Auxiliary tables are provided for selecting a method for significance levels 0.01 and 0.05 for several values of v, the number of degrees of freedom of an independent variance estimate, and K. It is shown that the T′-method is uniformly preferable when the sample sizes are “nearly” equal, while one of the other methods will be uniformly preferable when all sample sizes are “s...

146 citations


Journal ArticleDOI
TL;DR: In this paper, the distribution of the characteristic roots of the sample covariance matrix is studied for multivariate populations with finite fourth cumulants, and the asymptotic forms of the marginal and joint distributions are given.
Abstract: SUMMARY The distribution of the characteristic roots of the sample covariance matrix is studied for multivariate populations with finite fourth cumulants. For large sample sizes, the asymptotic forms of the marginal and joint distributions are given. For moderate and small sample sizes, empirical sampling techniques are used: various trivariate models are simulated, and the distributions of the sample roots are empirically observed. It is shown that the distributional results for a multivariate normal population are nonrobust to departures from normality; they are mainly affected by nonzero fourth cumulants and cross-cumulants of the parent population.
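
The empirical-sampling side of the study can be imitated in a few lines: simulate trivariate samples, compute the characteristic roots of the sample covariance matrix, and compare a normal parent with a heavier-tailed one (the specific models and sizes below are our choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_roots(n, dist, reps=2000):
    """Empirically sample the characteristic roots (eigenvalues) of the
    3x3 sample covariance matrix for samples of size n from 'dist'."""
    roots = np.empty((reps, 3))
    for i in range(reps):
        x = dist(size=(n, 3))
        roots[i] = np.sort(np.linalg.eigvalsh(np.cov(x, rowvar=False)))[::-1]
    return roots

normal = sample_roots(25, rng.standard_normal)
heavy = sample_roots(25, lambda size: rng.standard_t(5, size=size))

# Heavier tails (nonzero fourth cumulants) inflate the sampling spread
# of the largest root relative to the normal case.
print(normal[:, 0].std(), heavy[:, 0].std())
```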

139 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider one class of multivariate matching methods which yield the same percent reduction in expected bias for each of the matching variables, and derive the expression for the maximum attainable percent reduction of bias given fixed distributions and fixed sample sizes.
Abstract: Matched sampling is a method of data collection designed to reduce bias and variability due to specific matching variables. Although often used to control for bias in studies in which randomization is practically impossible, there is virtually no statistical literature devoted to investigating the ability of matched sampling to control bias in the common case of many matching variables. An obvious problem in studying the multivariate matching situation is the variety of sampling plans, underlying distributions, and intuitively reasonable matching methods. This article considers one class of multivariate matching methods which yield the same percent reduction in expected bias for each of the matching variables. The primary result is the derivation of the expression for the maximum attainable percent reduction in bias given fixed distributions and fixed sample sizes. An examination of trends in this maximum leads to a procedure for estimating minimum ratios of sample sizes needed to obtain well-matched samples.

136 citations


Journal ArticleDOI
TL;DR: In this article, a simple combination of one-sided sequential probability ratio tests, called a 2-SPRT, is shown to approximately minimize the expected sample size at a given point θ0 among all tests with error probabilities controlled at two other points, θ1 and θ2.
Abstract: A simple combination of one-sided sequential probability ratio tests, called a 2-SPRT, is shown to approximately minimize the expected sample size at a given point θ0 among all tests with error probabilities controlled at two other points, θ1 and θ2. In the symmetric normal and binomial testing problems, this result applies directly to the Kiefer-Weiss problem of minimizing the maximum over θ of the expected sample size. Extensive computer calculations for the normal case indicate that 2-SPRT's have efficiencies greater than 99% regardless of the size of the error probabilities. Accurate approximations to the error probabilities and expected sample sizes of these tests are given.
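
A rough sketch of the 2-SPRT idea for the symmetric normal problem: run two one-sided likelihood-ratio tests anchored at θ0 and stop when either crosses its boundary. The thresholds below are ad hoc illustrations, not calibrated to prescribed error probabilities as in the paper:

```python
import numpy as np

def two_sprt(xs, th0, th1, th2, log_a, log_b):
    """2-SPRT sketch for a unit-variance normal mean: test H1 (theta=th1)
    vs H2 (theta=th2), minimizing expected sample size near th0.
    Accept H2 once log L(th0)/L(th1) >= log_a;
    accept H1 once log L(th0)/L(th2) >= log_b."""
    llr1 = llr2 = 0.0
    for n, x in enumerate(xs, start=1):
        llr1 += x * (th0 - th1) + (th1**2 - th0**2) / 2.0
        llr2 += x * (th0 - th2) + (th2**2 - th0**2) / 2.0
        if llr1 >= log_a:
            return "accept H2", n
        if llr2 >= log_b:
            return "accept H1", n
    return "no decision", len(xs)

rng = np.random.default_rng(2)
# Symmetric problem: th1 = -0.5, th2 = +0.5, guarding E[N] at th0 = 0.
print(two_sprt(rng.normal(0.5, 1.0, 500), 0.0, -0.5, 0.5,
               log_a=np.log(20), log_b=np.log(20)))
```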

122 citations


Journal ArticleDOI
TL;DR: The square root rule is generalized to the case where the response variable has different variances in each group, and three situations in which unequal allocation may be preferred are discussed.

Journal ArticleDOI
TL;DR: In this article, adaptive estimates for the parameters of a stationary rth-order autoregressive process are constructed from the observed portion of a sample path, and the asymptotic efficiency of these estimates relative to the least squares estimates is greater than or equal to one for all regular distributions.
Abstract: Let {Xt: t = 0, ±1, ±2, ...} be a stationary rth-order autoregressive process whose generating disturbances are independent identically distributed random variables with marginal distribution function F. Adaptive estimates for the parameters of {Xt} are constructed from the observed portion of a sample path. The asymptotic efficiency of these estimates relative to the least squares estimates is greater than or equal to one for all regular F. The nature of the adaptive estimates encourages stable behavior for moderate sample sizes. A similar approach can be taken to estimation problems in the general linear model.

Journal ArticleDOI
TL;DR: The relationship of sample size to number of variables in the use of factor analysis has been treated by many investigators as discussed by the authors, but none of these investigators pointed out the constraints imposed on the dimensionality of the variables by using a sample size smaller than the number of variables.
Abstract: The relationship of sample size to number of variables in the use of factor analysis has been treated by many investigators. In attempting to explore what the minimum sample size should be, none of these investigators pointed out the constraints imposed on the dimensionality of the variables by using a sample size smaller than the number of variables. A review of studies in this area is made as well as suggestions for resolution of the problem.
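
The constraint the review points to is easy to demonstrate: with n observations on p > n variables, the sample correlation matrix has rank at most n − 1, so no more than n − 1 dimensions can be extracted regardless of the "true" dimensionality (sizes and seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# Fewer observations than variables makes the sample correlation matrix
# singular: its rank is capped at n - 1.
n, p = 20, 30
r = np.corrcoef(rng.normal(size=(n, p)), rowvar=False)
print(r.shape, np.linalg.matrix_rank(r))   # (30, 30) but rank <= 19
```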

Journal ArticleDOI
TL;DR: In this paper, a regret criterion is proposed to avoid the arbitrariness of the significance level and the undesirable quadratic risk properties of the prior F-test, extending the results of Sawa and Hiromatsu [4].
Abstract: In linear models, an F-test may be used to decide on restricted or unrestricted estimators. To avoid the arbitrariness of the significance level and undesirable quadratic risk properties, a regret criterion is proposed, extending the results of Sawa and Hiromatsu [4]. Optimal critical points of the prior F-test and their corresponding significance levels are tabulated for different sample sizes and number of restrictions. The critical value is generally close to two, but much smaller if the columns of the design matrix are nonorthogonal. This suggests that if the F-statistic is more than two, the unrestricted estimator should be used.

Journal ArticleDOI
TL;DR: In this article, a model is developed for production processes which have an in-control state and may jump to one of several out-of-control states in course of time, each such state being associated with an assignable cause.
Abstract: A model is developed for production processes which have an in-control state and may jump to one of several out-of-control states in course of time, each such state being associated with an assignable cause. The quality characteristic is an attribute so that an np-chart control scheme is applied. Various cost and time elements are included in the derivation of the loss-cost function. By minimizing this function with respect to the three control variables, namely, the sampling interval, the sample size and the acceptance number, the economically optimal control plan can be obtained. A numerical example is given using a new prior distribution. The results compare favourably with those derived from a matched single cause model.
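
A toy version of the economic-design calculation, using a single-cause stand-in with an invented cost structure rather than the paper's multi-cause model and loss-cost function; it only illustrates the search over the three control variables, the sampling interval h, sample size n, and acceptance number c:

```python
from itertools import product

from scipy.stats import binom

def loss_cost(h, n, c, p0=0.01, p1=0.05, theta=0.01,
              c_sample=0.5, c_alarm=50.0, c_out=200.0):
    """Illustrative hourly loss-cost for an np-chart (a toy single-cause
    stand-in). theta is the hourly rate of jumps to the out-of-control
    fraction defective p1; all cost figures are hypothetical."""
    alpha = binom.sf(c, n, p0)           # false-alarm probability per sample
    power = binom.sf(c, n, p1)           # detection probability per sample
    arl1 = 1.0 / max(power, 1e-12)       # samples needed to detect a jump
    return (n * c_sample / h             # sampling cost rate
            + c_alarm * alpha / h        # false-alarm cost rate
            + c_out * theta * h * arl1)  # out-of-control exposure cost

best = min(product((0.5, 1, 2, 4), range(20, 201, 20), range(0, 5)),
           key=lambda v: loss_cost(*v))
print("h*, n*, c* =", best, "cost =", round(loss_cost(*best), 3))
```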

Journal ArticleDOI
TL;DR: In this paper, the authors consider sequential tests of the hypothesis H0: θ ≤ θ0 and compare the properties of the approximate Bayesian test, the sequential probability ratio test, and the fixed sample size test.
Abstract: SUMMARY Let X1, X2, ... denote independent random variables which are normally distributed with unknown mean θ and unit variance. We consider sequential tests of the hypothesis H0: θ ≤ θ0. The tests which we consider were shown by Schwarz (1962) to approximate the optimal Bayesian tests with respect to a general loss structure and any prior density which is everywhere positive. Their continuation regions are bounded subsets of the (n, Sn) plane, where Sn is the cumulative sum. We give both inequalities and asymptotic expressions for the power function and the expected sample size. We also give comparisons of the properties of the approximate Bayesian test, the sequential probability ratio test, and the fixed sample size test.

ReportDOI
01 Aug 1976
TL;DR: In this article, the variances and covariances of the normal order statistics for samples of size N less than or equal to 20 were given by Sarhan and Greenberg (1956).
Abstract: Tables of the variances and covariances of the normal order statistics for samples of size N less than or equal to 20 were given by Sarhan and Greenberg (1956), based on tables of expected values given by Teichroew (1956). This report extends these results to N less than or equal to 50. 2 tables.

Journal ArticleDOI
TL;DR: In this article, a two-sample distribution-free procedure is proposed for testing equal locations under a stochastic ordering restriction, where the test statistic M is an estimator of P(X ≤ Y) based upon maximum likelihood estimators of stochastically ordered distribution functions.
Abstract: A two-sample distribution-free procedure is proposed for testing equal locations under a stochastic ordering restriction. The test statistic M is an estimator of P(X ≤ Y) based upon maximum likelihood estimators of stochastically ordered distribution functions. Some properties of the M test are developed and critical values are provided for selected sample sizes. A Monte Carlo power study indicates that the M test is more effective (for shift alternatives) than the Mann-Whitney-Wilcoxon test when the form of the underlying distribution is heavy-tailed, but the Mann-Whitney-Wilcoxon is preferred for moderate-tailed distributions.
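
For contrast with the paper's restricted estimator: the unrestricted plug-in estimate of P(X ≤ Y) is just the Mann-Whitney count scaled by mn, whereas the M statistic plugs in MLEs of the distribution functions computed under the stochastic-ordering restriction. A sketch of the unrestricted version with made-up heavy-tailed data:

```python
import numpy as np

def p_x_le_y(x, y):
    """Unrestricted plug-in estimate of P(X <= Y): the Mann-Whitney
    statistic divided by m*n. (The paper's M statistic instead uses
    MLEs of stochastically ordered distribution functions.)"""
    x, y = np.asarray(x), np.asarray(y)
    return (x[:, None] <= y[None, :]).mean()

rng = np.random.default_rng(3)
# Heavy-tailed samples with a location shift of 0.5.
print(p_x_le_y(rng.standard_t(3, 30), rng.standard_t(3, 30) + 0.5))
```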

Journal ArticleDOI
TL;DR: In this paper, a simple and economical method for estimating initial parameter values for the normal ogive or logistic latent trait mental test model is outlined and the accuracy of the method in comparison with maximum likelihood estimation is investigated through the use of Monte-Carlo data.
Abstract: A very simple and economical method for estimating initial parameter values for the normal ogive or logistic latent trait mental test model is outlined. The accuracy of the method in comparison with maximum likelihood estimation is investigated through the use of Monte-Carlo data. The study yields a number of observations concerning how sample size and true item parameter values of the data influence both estimation methods.

Journal ArticleDOI
TL;DR: In this paper, the effects of several experimental criteria including sample quality, heating rate, sample size, sample vessel, test configuration, and thermodynamic interference are discussed, and it is argued that the practicality of this method as a general standard procedure can only be realized when these effects are considered.


Journal ArticleDOI
TL;DR: In this article, the Neyman-Pearson framework of hypothesis testing with fixed-error-level specifications was used to obtain two-stage hypothesis testing designs with known variance and binomially distributed variates, and it was shown that when the alternative hypothesis is true, these optimal twostage designs generally achieve between one-half and two-thirds of the ASN differential between the two extremes of analogous fixed-sample designs (maximum ASN) and item-by-item Wald SPRT design (minimum ASN).
Abstract: Within the Neyman-Pearson framework of hypothesis testing with fixed-error-level specifications, two-stage designs are obtained such that sample size is minimized when the alternative hypothesis is true. Normally distributed variates with known variance and binomially distributed variates are considered. It is shown that when the alternative hypothesis is true, these optimal two-stage designs generally achieve between one-half and two-thirds of the ASN differential between the two extremes of analogous fixed-sample designs (maximum ASN) and item-by-item Wald SPRT design (minimum ASN when alternative hypothesis is true).

Journal ArticleDOI
TL;DR: The Bayesian algorithm presented in this paper provides a generalized procedure for determining the minimum cost sample size n* and acceptance number c* for single sample attribute acceptance plans.
Abstract: The Bayesian algorithm presented in this paper provides a generalized procedure for determining the minimum cost sample size n* and acceptance number c* for single sample attribute acceptance plans. The algorithm is applicable to a broad range of acceptance sampling problems, assuming only that the distributions of product quality are discrete, and that the sampling cost is either a linear or strictly convex function of the sample size. Experimental results are presented that compare the solution quality and the computational requirements of this algorithm with three types of previously reported procedures: (1) Bayesian decision tree methods, (2) analytic approximation methods, and (3) direct search techniques. The results indicate that the algorithm produces the optimal solution with minimal computational requirements over a wide range of acceptance sampling problem types.
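
The flavor of the optimization (though not the paper's algorithm) can be shown with a brute-force direct search over (n, c) under a discrete prior on lot quality and a linear cost structure; every number below is hypothetical:

```python
from scipy.stats import binom

# Hypothetical discrete prior on the lot fraction defective and costs.
prior = {0.01: 0.6, 0.05: 0.3, 0.15: 0.1}   # P(fraction defective = p)
N, k_sample, k_accept, k_reject = 1000, 0.4, 10.0, 150.0

def expected_cost(n, c):
    """Sampling cost + cost of accepted defectives + lot-rejection cost,
    averaged over the prior; a direct-search stand-in for the paper's
    Bayesian algorithm."""
    cost = k_sample * n
    for p, w in prior.items():
        pa = binom.cdf(c, n, p)   # P(accept lot | fraction defective p)
        cost += w * (pa * k_accept * p * (N - n) + (1 - pa) * k_reject)
    return cost

n_star, c_star = min(((n, c) for n in range(1, 201) for c in range(0, 11)),
                     key=lambda nc: expected_cost(*nc))
print(n_star, c_star, round(expected_cost(n_star, c_star), 2))
```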

01 Sep 1976
TL;DR: In this paper, the central-limit theorem is used to determine the sample size for travel time and delay studies that involve either general travel conditions for all vehicles along a study route, or for only public transportation vehicles on scheduled routes.
Abstract: Based on the central-limit theorem, sample means of travel times are assumed to have a normal distribution, regardless of the actual distribution of the population of travel times along a study route. Using this assumption, the techniques of statistical quality control are applied to provide a procedure for sample size determination. This procedure is applicable for both the license-plate and the test car techniques; it is also used for estimation of sample sizes for travel time and delay studies that involve either general travel conditions for all vehicles along a study route, or for only public transportation vehicles on scheduled routes.
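
Under the report's CLT assumption, the resulting sample-size rule is the familiar n = (z s / e)²; a minimal sketch with invented pilot-study numbers:

```python
import math

def travel_time_sample_size(s, e, z=1.96):
    """Runs needed so the mean travel time is within +/- e minutes with
    z-level confidence: n = (z * s / e)**2, rounded up (CLT-based)."""
    return math.ceil((z * s / e) ** 2)

# Pilot test-car runs gave s = 3.2 min; want the mean within 1 min, 95%.
print(travel_time_sample_size(s=3.2, e=1.0))
```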

Journal ArticleDOI
TL;DR: In this paper, an Edgeworth-type expansion for the distribution of a sample quantile was proposed, and the error of the approximation was shown to be of order O(n √ s+1) for all Borel sets.
Abstract: This paper deals with an Edgeworth-type expansion for the distribution of a sample quantile. As the sample size $n$ increases, these expansions establish a higher order approximation which holds uniformly for all Borel sets. If the underlying distribution function has $s + 2$ left and right derivatives at the true quantile, the error of the approximation is of order $O(n^{-(s+1)})$. From this result asymptotic expansions for the distribution functions of sample quantiles and for percentage points are derived.

Journal ArticleDOI
TL;DR: An exact distribution of a finite sample drawn from an infinite population in Hardy-Weinberg Equilibrium is described for k alleles, and an exact test of the law is presented and compared with two χ²-tests for two and three alleles.
Abstract: An exact distribution of a finite sample drawn from an infinite population in Hardy-Weinberg Equilibrium is described for k alleles. Accordingly, an exact test of the law is presented and compared with two χ²-tests for two and three alleles. For two alleles, it is shown that the "classical" χ²-test is very adequate for sample sizes as small as ten. For three alleles, it is shown that a simpler formulation based on Levene's distribution approximates the exact test of this paper rather closely. However, it is recommended that researchers continue to employ the standard χ²-test for all sample sizes and abide by it if the corresponding probability value is not "too close" to the critical level; otherwise, an exact test should be used.
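
For the two-allele case, an exact test conditions on the observed allele counts; a sketch, assuming the standard Levene-type conditional distribution of the heterozygote count and a "sum of probabilities no larger than observed" p-value (this is a generic construction, not necessarily the paper's exact formulation):

```python
from math import factorial

def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg test for two alleles: condition on the allele
    counts and sum the probabilities of all heterozygote counts that are
    no more likely than the observed one."""
    n = n_aa + n_ab + n_bb
    na = 2 * n_aa + n_ab                  # copies of allele A
    nb = 2 * n - na                       # copies of allele B

    def prob(nab):                        # P(heterozygotes = nab | na, nb)
        naa, nbb = (na - nab) // 2, (nb - nab) // 2
        return (factorial(n) * 2**nab * factorial(na) * factorial(nb) /
                (factorial(naa) * factorial(nab) * factorial(nbb) *
                 factorial(2 * n)))

    support = range(na % 2, min(na, nb) + 1, 2)  # parity-feasible counts
    probs = {k: prob(k) for k in support}
    return sum(p for p in probs.values() if p <= probs[n_ab] + 1e-12)

print(hwe_exact_p(3, 5, 2))   # a sample of 10, where chi-square is shaky
```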

Journal ArticleDOI
TL;DR: Dispersion statistics parameters of Taylor's power law (b) and the common k of the negative binomial distribution to be used in establishing a sampling survey system for integrated pest management were calculated for the mite species Panonychus ulmi and Amblyseius fallacis occurring in Michigan apple orchards.
Abstract: Dispersion statistics parameters of Taylor's power law (b) and the common k of the negative binomial distribution to be used in establishing a sampling survey system for integrated pest management were calculated for the mite species Panonychus ulmi and Amblyseius fallacis occurring in Michigan apple orchards of ≤ 10 acres in size. For both species, a data set consisting of 6600 tree samples was analyzed and for P. ulmi several smaller data sets were also evaluated. Sample data for either species was adequately described by a negative binomial distribution. When b and common k values were compared for all life stages of both species, they were very similar. Based on the assumption of a negative binomial distribution, sample size estimates for both species at varying densities and error margins (expressed as ratios of standard error to mean) were determined.
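
Under a negative binomial model with mean m and common k, the variance is m + m²/k, so requiring SE/mean ≤ D gives the familiar sample-size rule n = (1/D²)(1/m + 1/k); a sketch with invented densities and k:

```python
import math

def nb_sample_size(mean, k, D):
    """Trees to sample so that SE/mean <= D for negative binomial counts
    with variance m + m**2/k: n = (1/D**2) * (1/mean + 1/k)."""
    return math.ceil((1.0 / D**2) * (1.0 / mean + 1.0 / k))

# Hypothetical mite densities per leaf with a common k of 1.5:
for m in (0.5, 2.0, 10.0):
    print(m, nb_sample_size(m, k=1.5, D=0.25))
```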

Journal ArticleDOI
TL;DR: In this paper, a comparison of conditional confidence interval procedures for estimating the parameters in the power law model, and for estimating mean life at a given future stress level, is made based on asymptotic properties of the maximum likelihood estimates.
Abstract: SUMMARY Estimation and prediction procedures are discussed for the inverse power law model, when the time to failure follows an exponential distribution. In the context of accelerated life test experiments, procedures are given for estimating the parameters in the power law model, and for estimating mean life at a given future stress level. The procedures given are conditional confidence interval procedures, obtained by conditioning on ancillary statistics. A comparison is made of these procedures and procedures based on asymptotic properties of the maximum likelihood estimates. In studies concerning the length of life of certain types of manufactured items, it is often wished to consider the relationship between length of life and one or more concomitant variables. Thus, for example, in an experiment to study the lifetimes of a certain type of electrical insulation, the relationship between length of life and environmental temperature was studied (Nelson, 1970). Sometimes the main problem of interest involves determining the effect of environmental factors (concomitant variables) on the life distribution of the items in question, and in incorporating this into a useful statistical model. In other situations, the general form of the model may be considered determined, and it may be wished to estimate various parameters in the model. The estimated relationship between length of life and the concomitant variables allows the prediction of item life under specified environmental conditions. This latter situation commonly arises in accelerated life testing where, on the basis of tests run at "accelerated" test conditions, it is desired to predict item life under standard operating conditions. This paper deals with the second type of problem: we discuss estimation and prediction procedures for a model which is commonly used in reliability and life testing work, the so-called inverse power law model, with exponential time-to-failure data. This model has been discussed a number of times in the statistical literature, and a number of estimation procedures have been proposed for it, mainly based on large sample theory. Our purpose here is to describe confidence interval estimation procedures for this model, and to illustrate their use. The procedures do not involve the use of any asymptotic approximations and so all distributions given are exact for any sample size. Before describing the model, we remark that an excellent survey of work on the inverse power law model and its application in accelerated life testing is given by Nelson (1970), who discusses the more general situation in which the time to failure follows the two-parameter Weibull distribution. We consider the inverse power law in the following form: let the lifetime of an item under environmental condition i have an exponential distribution with mean θi. In this model the environmental conditions are specified by means of a single covariate vi (which we will call the "stress"), and the relationship θi = c/vi^p is assumed, where c, p are (unknown) constants.
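
For comparison with the paper's exact conditional intervals, the large-sample ML route it benchmarks against can be sketched as follows; the stress levels, true parameters, and use-level stress are all invented:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)

# Hypothetical accelerated life test: exponential lifetimes with mean
# theta_i = c / v_i**p at stresses v_i (c = 5000, p = 2 used to simulate).
stress = np.repeat([100.0, 150.0, 200.0], 10)
life = rng.exponential(5000.0 / stress**2)

def neg_loglik(params):
    """Negative exponential log-likelihood under theta_i = c / v_i**p,
    parameterized by (log c, p) for numerical stability."""
    log_c, p = params
    theta = np.exp(log_c) / stress**p
    return np.sum(np.log(theta) + life / theta)

fit = minimize(neg_loglik, x0=[np.log(1000.0), 1.0], method="Nelder-Mead")
log_c_hat, p_hat = fit.x

# ML point prediction of mean life at a use-level stress of 50.
print(np.exp(log_c_hat) / 50.0**p_hat)
```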

Journal ArticleDOI
TL;DR: It is shown that time-invariant algorithms can use knowledge of the sample size to obtain lower error rates than in the infinite-sample problem, and that the minimal error rate achievable after N samples goes to zero.
Abstract: This paper explores the structure and performance of optimal finite state machines used to test between two simple hypotheses. It is shown that time-invariant algorithms can use knowledge of the sample size to obtain lower error rates than in the infinite sample problem. The existence of an optimal rule is established and its structure is found for optimal time-varying algorithms. The structure of optimal time-invariant rules is partially established. The particular problem of testing between two Gaussian distributions differing only by a shift is then examined. It is shown that the minimal error rate achievable after N samples goes to zero like exp[−(ln N)^{1/2}].

Journal ArticleDOI
TL;DR: As discussed in this paper, the problem of low response rates arises when information from questionnaires is generalized to a group larger than the one that supplied the data. When a random sample is drawn from a population and questionnaires are mailed to that sample, the data are generalizable to the population within the confidence and precision limits associated with the sample size only if every member of the sample cooperates.
Abstract: Smith recently completed a study using questionnaires in which a response rate of 74% was reported. Jones reported a study with a response rate of 81%. Which of these two studies had the higher response rate? You may think the answer to this question is obvious; the intention of this article is to convince you that the obvious answer may be misleading. The problems associated with low response rates in studies using questionnaires have been discussed extensively. Basically, problems associated with low response rates are a function of generalizing information from questionnaires to a group larger than that from which the questionnaire data were obtained. Such generalizations may be from a randomly chosen sample to the population from which the sample was drawn, or from incomplete questionnaire data obtained from a population to that population. In the former case, the response rate problem is, in essence, a sampling problem. When a random sample is drawn from a population, and questionnaires are mailed to that sample, only if each member of that sample cooperates in the study will the data obtained be generalizable to the population within the confidence and precision limits associated with the sample size. This is because nonresponse patterns may be systematic; that is, all nonrespondents may share similar characteristics with each other-characteristics relevant to the study that distinguish them from respondents-and this will lead to their underrepresentation in the obtained data. "For instance,