
Showing papers on "Sample size determination published in 1975"


Journal ArticleDOI
01 Oct 1975-Ecology
TL;DR: In this article, an explicit means of calculating the expected number of species E(Sn) and the variance of Sn in a random sample of n individuals from a collection containing N individuals and S species is presented.
Abstract: An explicit means of calculating the expected number of species [E(Sn)] and the variance of Sn in a random sample of n individuals from a collection containing N individuals and S species is presented. An example illustrates a new use of E(Sn): determination of the sample size required for any desired degree of accuracy in collecting species known to occur in a particular area.

890 citations
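The E(Sn) calculation described above is the standard rarefaction expectation: a species with abundance Ni is missed by a sample of n individuals with hypergeometric probability C(N−Ni, n)/C(N, n). A minimal sketch (function name and example abundances are illustrative, not from the paper):

```python
from math import comb

def expected_species(abundances, n):
    """E(S_n): expected number of species in a random sample of n
    individuals drawn without replacement from the whole collection."""
    N = sum(abundances)
    # Each species is missed with probability C(N - N_i, n) / C(N, n);
    # math.comb returns 0 when n exceeds N - N_i, as required here.
    return sum(1 - comb(N - Ni, n) / comb(N, n) for Ni in abundances)

# Two species of 5 individuals each: one draw finds one species on
# average, and exhausting the collection finds both.
print(expected_species([5, 5], 1))   # 1.0
print(expected_species([5, 5], 10))  # 2.0
```

Solving E(Sn) >= target for n gives the sample size use the abstract mentions.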


Journal ArticleDOI
TL;DR: In this paper, the authors compared the predictive ability of standard linear regression and simple unit weighting schemes, and found that unit weights are a viable alternative to standard regression in certain situations and not greatly inferior in others.

648 citations
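The comparison can be reproduced in miniature: estimate least squares weights on a small training sample, then score a holdout against simple unit weights. The simulated data and all names are illustrative; the true coefficients are set equal, the situation in which unit weighting is near-optimal:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_train = 5, 40  # few training cases relative to predictors
X = rng.standard_normal((2000, p))
y = X @ np.ones(p) + 2.0 * rng.standard_normal(2000)

# Least squares weights fitted on the small training sample.
w_ols, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)

# Holdout predictive correlations: fitted weights vs. unit weights.
Xte, yte = X[n_train:], y[n_train:]
r_ols = np.corrcoef(Xte @ w_ols, yte)[0, 1]
r_unit = np.corrcoef(Xte @ np.ones(p), yte)[0, 1]
print(round(r_ols, 3), round(r_unit, 3))
```

With noisy small-sample weights, the unit-weight composite predicts the holdout about as well as the fitted regression, which is the paper's point.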


Journal ArticleDOI
TL;DR: In this article, the authors propose several "many outlier" procedures to detect more than one outlier in a sample and compare several different procedures for various sample sizes.
Abstract: This article is concerned with "many outlier" procedures, i.e., procedures that can detect more than one outlier in a sample. Several many outlier procedures are proposed in Section 2 and, via power comparisons in Section 3, are found to be much superior to one outlier procedures in detecting many outliers. We then compare several different many outlier procedures in Section 4 and find that the procedure based on the extreme studentized deviate (ESD) is slightly the best. Finally, 5%, 1% and .5% points are given for the ESD procedure for various sample sizes in Section 5.

257 citations
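The extreme studentized deviate behind the winning procedure is simple to state: repeatedly studentize the most extreme point and remove it. A bare sketch of the statistic sequence (the critical values tabulated in the paper are omitted; names are illustrative):

```python
from statistics import mean, stdev

def esd_statistics(x, k):
    """R_1, ..., R_k: extreme studentized deviates, with mean and
    standard deviation recomputed after the most extreme observation
    is removed at each step (as in Rosner-style many-outlier testing)."""
    data = list(x)
    out = []
    for _ in range(k):
        m, s = mean(data), stdev(data)
        extreme = max(data, key=lambda v: abs(v - m))
        out.append(abs(extreme - m) / s)
        data.remove(extreme)
    return out

print(esd_statistics([1.0, 2.0, 3.0, 100.0], 2))
```

Each R_i is compared with its tabled critical value, and one declares as many outliers as the largest i whose R_i exceeds it; this is what protects the procedure against masking by multiple outliers.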


Journal ArticleDOI
TL;DR: In this paper, the error model is transformed and reparameterized to induce regular estimation on the boundary with one or both degrees of freedom infinite, leading to bivariate score tests for normal, extreme value and logistic special cases as well as an evaluation of these models within a more general framework.
Abstract: SUMMARY Linear models, with errors that follow the distribution of the logarithm of an F statistic, are shown to include a number of common statistical models as special cases. The error model is transformed and reparameterized to induce regular estimation on the boundary with one or both degrees of freedom infinite. This leads to bivariate score tests for normal, extreme value and logistic special cases as well as an evaluation of these models within a more general framework. In particular, the test for normality is found to reduce to the usual tests based on sample skewness and kurtosis. Sample sizes are given for pairwise discrimination among some specific models. Applications are indicated.

162 citations


Journal ArticleDOI
TL;DR: Asymptotic formulae, lower and upper bounds for the expected sample size of certain sequential tests of the parameter of an exponential family of distributions, were presented in this paper, where the tests involved are tests of power one based on mixture-type stopping rules and tests for detecting a change in the underlying distribution.
Abstract: This paper presents asymptotic formulae, lower and upper bounds for the expected sample size of certain sequential tests of the parameter of an exponential family of distributions. The tests involved are tests of power one based on mixture-type stopping rules and tests for detecting a change in the underlying distribution. Analysis under incorrect assumptions about the underlying distribution yields asymptotic formulae for such cases, showing robustness of the original formulae. Monte Carlo results indicate the validity of the asymptotic formulae for sample sizes one would expect in practical applications.

138 citations


Journal ArticleDOI
TL;DR: In this article, a type of hybrid retrospective design for epidemiologic studies within defined populations is considered, which requires exposure information for the complete population of cases and for a random sample of the original population at risk (PAR).
Abstract: A type of hybrid retrospective design for epidemiologic studies within defined populations is considered. The design requires “exposure” information for the complete population of cases and for a random sample of the original population at risk (PAR). Assuming that the only stochastic feature occurs with the random sampling of the PAR, confidence intervals for incidence rates and relative risk (R) are developed in conjunction with PAR sample size requirements. Analysis by stratification is discussed and a numerical example included.

111 citations


Journal ArticleDOI
TL;DR: In this article, the authors provide tables of percentage points of the asymptotic distribution of the one sample truncated Kolmogorov-Smirnov statistic for goodness of fit problems involving truncated or censored data and indicate that the tables provide accurate critical values for sample sizes greater than 30.
Abstract: In this article we provide tables of percentage points of the asymptotic distribution of the one sample truncated Kolmogorov-Smirnov statistic. We discuss use of the tables in goodness of fit problems involving truncated or censored data and indicate that the tables provide accurate critical values for sample sizes greater than 30. We also discuss use of the tables in situations involving censored data and in two sample testing problems.

106 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present procedures for estimating the appropriate number of subjects to include in educational experiments designed to detect differences among two or more treatments, adaptable to both directional and non-directional planned comparisons.
Abstract: The purpose of this paper is to present some new procedures for estimating the appropriate (or at least, the approximate) number of subjects to include in educational experiments designed to detect differences among two or more treatments. The major appeal of these procedures is essentially threefold: (a) they do not require an estimate of the unknown within-treatment variance, previously acknowledged as a difficult task for most researchers (e.g., Feldt, 1973), (b) they consist of a straightforward extension (to multi-treatment and multi-factor designs) of a concept proposed for the case in which only two treatments are compared (e.g., Cohen, 1969), and (c) they are adaptable to the testing of "planned comparisons" (cf. Hays, 1973)-both directional and nondirectional-which in certain conditions has distinct economic advantages. In addition, because of the great versatility of the basic approach, it is applicable not only to situations wherein subjects are randomly assigned to treatments (completely randomized designs), but to those wherein subjects are matched or "blocked" on some relevant control variable prior to assignment (randomized block designs), when subjects serve as their own controls (repeated measures designs), when a control variable is employed as a "covariate" in the design, or when the appropriate "experimental unit" consists of something other than individual subjects (as when groups of students or classrooms are simultaneously administered a single treatment). Due to space limitations, however, only a brief discussion of the basic approach, along with an example illustrating its usage, will be included here. To set the stage, suppose that a researcher is interested in determining whether there are performance differences associated with K independently administered treatments.
Under the assumptions of random sampling and, especially, random assignment of subjects to treatments, a one-way analysis of variance (as an extension of the two-sample t test) might be conducted to assess whether the observed mean differences in performance exceed what would have been expected on the basis of chance. Should such an F test prove statistically significant, the researcher might then wish to identify which of the K treatment means differ from one another. Thus, a multiple comparison procedure such as that of Tukey or Scheffé (cf. Kirk, 1968) would generally be selected in order to determine which means or combinations of means were responsible for rejecting the hypothesis of no differences. While previous approaches to the determination of "sample size" have generally been framed with regard to the probability of rejecting the no-difference hypothesis per se, the present approach is framed with regard to the probability of detecting

78 citations
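For the simplest two-treatment case, the effect-size idea the authors extend can be sketched with the usual normal approximation: the per-group n depends only on the standardized difference d and the target error rates. This is a generic illustration of the approach, not the paper's multi-treatment procedure:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means at standardized effect size d (Cohen's d),
    via the normal approximation to the power function."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

print(n_per_group(0.5))  # medium effect: about 63 per group
```

Note that no estimate of the within-treatment variance is needed once the difference is expressed in standard-deviation units, which is exactly the appeal claimed in the abstract.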


Journal ArticleDOI
TL;DR: In this paper, the statistical significance of results obtained by analysing data through the Automatic Interaction Detection technique is investigated and a test statistic which is asymptotically independent of the sample size is examined and critical values are obtained for a special case.
Abstract: SUMMARY This paper initiates an investigation into the statistical significance of results obtained by analysing data through the Automatic Interaction Detection technique. A test statistic which is asymptotically independent of the sample size is examined and critical values are obtained for a special case. The asymptotic results are compared with the exact distribution for varying sample size in a specific example. A brief discussion on the application of the results to a particular case study concludes the paper.

70 citations


Journal ArticleDOI
TL;DR: Although the standard deviation is the most widely used measure of the precision of quantitative methods, there is a need to re-examine the conditions necessary to obtain a meaningful estimate of this quantity.
Abstract: Although the standard deviation is the most widely used measure of the precision of quantitative methods, there is a need to re-examine the conditions necessary to obtain a meaningful estimate of this quantity. The importance of the material to be sampled, the sample size, the calculation of confidence intervals, and the segregation of outliers are discussed.

65 citations


Journal ArticleDOI
TL;DR: In this paper, the sampling distributions of Kelley's e2 and Hays' ω2 were studied empirically by computer simulation within the context of a three level one-way fixed effects analysis of variance design.
Abstract: Statistics used to estimate the population correlation ratio were reviewed and evaluated. The sampling distributions of Kelley's e2 and Hays' ω2 were studied empirically by computer simulation within the context of a three level one-way fixed effects analysis of variance design. These statistics were found to have rather large standard errors when small samples were used. As with other correlation indices, large samples are recommended for accuracy of estimation. Both e2 and ω2 were found to be negligibly biased. Heterogeneity of variances had negligible effects on the estimates under conditions of proportional representativeness of sample sizes with respect to their population counterparts, but combinations of heterogeneity of variance and unrepresentative sample sizes yielded especially poor estimates.
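Both estimators come straight from the one-way ANOVA sums of squares; only the denominator differs. A self-contained sketch (the paper's three-level simulation design is not reproduced; names are illustrative):

```python
from statistics import mean

def correlation_ratios(groups):
    """Kelley's epsilon^2 and Hays' omega^2 for a one-way fixed
    effects layout, from the usual sums of squares."""
    obs = [v for g in groups for v in g]
    grand, k, N = mean(obs), len(groups), len(obs)
    ss_total = sum((v - grand) ** 2 for v in obs)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ms_within = (ss_total - ss_between) / (N - k)
    eps2 = (ss_between - (k - 1) * ms_within) / ss_total
    omega2 = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eps2, omega2

print(correlation_ratios([[1, 2], [3, 4]]))  # (0.7, 0.6363...)
```

The extra ms_within in omega^2's denominator makes it slightly the smaller (more conservative) of the two.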

Journal ArticleDOI
TL;DR: This paper brings to the attention of fishery biologists a method of estimating survival and exploitation rates when a series of tag recaptures from angler-killed fish is available; this model is appropriate only for recaptures that are removed from the population.
Abstract: This paper brings to the attention of fishery biologists a method of estimating survival and exploitation rates when a series of tag recaptures from angler-killed fish is available; this model is appropriate only for recaptures that are removed from the population.

Book
01 Jan 1975
TL;DR: A guide to testing statistical hypotheses for readers familiar with the Neyman-Pearson theory of hypothesis testing including the notion of power, the general linear hypothesis (multiple regression) problem, and the special case of analysis of variance is given in this paper.
Abstract: A guide to testing statistical hypotheses for readers familiar with the Neyman-Pearson theory of hypothesis testing, including the notion of power, the general linear hypothesis (multiple regression) problem, and the special case of analysis of variance. The second edition (date of first not mentioned).

Journal ArticleDOI
TL;DR: The authors made an empirical comparison of three measures of relative predictor variable contribution: (1) the scaled weights of the first discriminant function, (2) the total group estimates of the correlations between each predictor variable and the first function, and (3) the within-groups estimates of the correlations between each predictor variable and the first function.
Abstract: An empirical comparison is made of three proposed indices of relative predictor variable contribution: (1) the scaled weights of the first discriminant function; (2) the total group estimates of the correlations between each predictor variable and the first function; and (3) the within-groups estimates of the correlations between each predictor variable and the first function. It was found that given a single run of an experiment, none of the indices was sufficiently reliable in identifying the rank-order of the variables except possibly when the total sample size was very large.

Journal ArticleDOI
TL;DR: In this paper, the authors compare the performance of two asymptotically equivalent statistics, based on estimated residuals and Durbin's h, for detecting serial correlation when some of the regressors are lagged dependent variables.

Journal ArticleDOI
TL;DR: The Stochastic Dominance (SD) approach as discussed by the authors does not depend on specific assumptions about the investor's utility function and has been shown to be theoretically superior to the two-moment methods.
Abstract: Preference orderings of uncertain prospects have progressed from the two-moment EV model first developed by Markowitz [1952] to the more general efficiency analysis that is based on the entire probability function. This general efficiency approach, referred to as the Stochastic Dominance (SD) approach, does not depend on specific assumptions about the investor's utility function and has been shown to be theoretically superior to the “moment methods” [1].

Journal ArticleDOI
TL;DR: The Dollar-Unit Sampling (DUS) method as mentioned in this paper performs hypothesis tests on total dollar amounts for populations which have zero or low error rates by sampling individual dollars rather than entire accounts, and infers the maximum percentage of dollars that are in error in the population.
Abstract: The most interesting new development for statistical sampling in auditing has been the recent series of articles describing Dollar-Unit Sampling (DUS). While such a procedure has been part of the Haskins and Sells AUDITAPE system for many years, it has only been with the appearance of Meikle [1972], Anderson and Teitlebaum [1973], Teitlebaum [1973], and Neter, Goodfellow, and Loebbecke [1974] that a public and critical discussion of the merits and demerits of Dollar-Unit Sampling has occurred. Since this paper will deal with only a limited aspect of DUS, I will not present a detailed exposition of the entire method. I will assume that the reader is familiar with at least one of the above references, each of which contains an extensive description of the DUS method. Briefly, DUS can perform hypothesis tests on total dollar amounts for populations which have zero or low error rates. It does this by sampling individual dollars in the population, rather than entire accounts, and inferring the maximum percentage of dollars that are in error in the population. DUS has the advantage of being able to accept populations on the basis of a variables test when no or only a very few errors are found in the sample. This circumstance, which is typical of many auditing populations, causes considerable grief to normal estimation procedures (see Kaplan [1973]). DUS has been criticized for its inability to specify in advance the risks associated with use of this procedure. In the most common version of DUS, a sample size is selected assuming that no errors will occur in the dollars sampled; that is, a sample size computed on a discovery sampling basis. If errors occur the population is either rejected or else accepted at a lower
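The discovery-sampling sample size mentioned at the end is the smallest n for which an error-free sample would be acceptably unlikely if the population error rate were at the tolerable maximum. A sketch under that zero-errors-expected assumption (names and numbers are illustrative):

```python
import math

def discovery_sample_size(max_error_rate, beta):
    """Smallest n with (1 - max_error_rate)**n <= beta: if the true
    error rate were at the tolerable maximum, a sample of n dollar
    units would contain zero errors with probability at most beta."""
    return math.ceil(math.log(beta) / math.log(1.0 - max_error_rate))

# Tolerable error rate 5%, desired risk 5%:
print(discovery_sample_size(0.05, 0.05))  # 59
```

If any errors do turn up in the n sampled dollars, the plan's stated risk no longer applies, which is the criticism the abstract goes on to describe.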

Journal ArticleDOI
TL;DR: In this paper, a method is developed for determining economical acceptance sampling plans where the characteristics of interest are a mixture of variables and attributes, and an optimum plan is found by minimizing the expected cost model with respect to the decision variables which are the sample size and control limits on the sample means for variables and the sample sizes and acceptance numbers for attributes.
Abstract: A method is developed for determining economical acceptance sampling plans where the characteristics of interest are a mixture of variables and attributes. The method uses a model which has been developed to represent the total expected cost per lot of exercising acceptance sampling. An optimum plan is found by minimizing the expected cost model with respect to the decision variables which are the sample sizes and control limits on the sample means for variables and the sample sizes and acceptance numbers for attributes. Optimization is accomplished using the pattern search.

Journal ArticleDOI
TL;DR: The T2 test is shown to be asymptotically equivalent to the analysis of variance F test in the balanced case and some investigation is made of the distribution of the test statistic by applying the method of Box (1954).
Abstract: SUMMARY The analysis of data from multiclinic experiments is put in the framework of a mixed model in which the treatment effects are assumed to be fixed, and clinic and clinic by treatment effects are assumed to be random. The analysis in the unbalanced case is developed under the usual mixed model assumptions. The maximum likelihood estimates and likelihood ratio tests of the parameters are derived following the approach of Hemmerle and Hartley [1973]. The T2 test is shown to be asymptotically equivalent to the analysis of variance F test in the balanced case. The distribution of the test statistic in the case of proportional cell frequencies is also discussed. Some investigation is made of the distribution of the test statistic by applying the method of Box [1954]. Clinical studies in which several clinics participate in the evaluation of therapies using a common protocol have become relatively common. Each clinic contributes patients in proportion to the size of the population it serves that is eligible for the study. The number of clinics is often relatively large, of order 10-20, and the number of patients contributed by each clinic may be relatively small. The evaluation of the efficacy of treatments may not be the same in every clinic due to random variation among the clinics and small sample sizes. The objective of the investigation reported here is to develop a method of global evaluation in the circumstances described above. We shall assume an experimental design in which the clinics participating in a study are presumed to be a random sample from the population of clinics which might conceivably use the treatments under investigation and that patients are allocated at random within each clinic. We assume further that the clinics use identical criteria for selection of patients and for their management and evaluation.
The design described and the assumptions made are identical to those of a randomized blocks design in which block effects (clinic effects) and clinic by treatment interactions are random. In most applications to medical research, we have the added complication of unequal sample sizes. Thus an appropriate model for this situation is a two-way classified mixed effects model with unequal cell frequencies, generally called unbalanced data. In the case of balanced data (i.e. when cell frequencies are equal) the inference problems of this model have been studied by several authors (Eisenhart [1947], Scheffe [1959], Graybill [1961] among others). By considering cell means Scheffe constructed Hotelling's T2-test for testing the hypothesis of equality of fixed effects, while by imposing a symmetry condition on the covariance matrix, Graybill considered the analysis of variance F-test. Since analysis of variance (A.O.V.) estimates of variance components may sometimes be negative, the maximum likelihood (ML) procedure may not lead to this A.O.V. F-test even in the balanced case.

Journal ArticleDOI
TL;DR: In this article, one-sided and two-sided nonparametric prediction limits which are based on the smallest and/or largest ordered observations are derived and tables which give the probabilities associated with various sample sizes are presented.
Abstract: In this report one-sided and two-sided non-parametric prediction limits which are based on the smallest and/or largest ordered observations are derived. Tables which give the probabilities associated with various sample sizes are presented...
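For order-statistic prediction limits of this kind the coverage probabilities are distribution-free and depend only on the sample size: for a continuous distribution, a future observation falls below the current maximum of n observations with probability n/(n+1), and between the minimum and maximum with probability (n−1)/(n+1). A one-line check of those standard identities (not the paper's tables):

```python
def one_sided_coverage(n):
    # P(next observation <= max of n past observations)
    return n / (n + 1)

def two_sided_coverage(n):
    # P(min of n past observations <= next observation <= max)
    return (n - 1) / (n + 1)

# Sample sizes giving 95% coverage:
print(one_sided_coverage(19), two_sided_coverage(39))  # 0.95 0.95
```

Inverting these relations (e.g. n = 19 for a one-sided 95% limit) is how such tables translate a desired confidence into a required sample size.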

Journal ArticleDOI
TL;DR: In this article, different modifications of Mahalanobis' generalized distance (D2) utilized in anthropological research are given, and it is shown that they all derive from the distribution of D2 when the distances between the populations investigated are real.

Journal ArticleDOI
TL;DR: An unbiased estimator of population true score from sample data is derived and its variance is shown to decrease with increasing sample size.
Abstract: A model is proposed for the description of ordinal test scores based on the definition of true score as expected rank. Derivations from the model are compared with results from classical test theory as developed by Lord and Novick, in particular with respect to parallel tests and composites. An unbiased estimator of population true score from sample data is derived and its variance is shown to decrease with increasing sample size. Population reliability is shown to be analytically related to expected sample reliability, and methods of reliability estimation are discussed.

Journal ArticleDOI
TL;DR: In this article, double sample tests are presented for testing one-sided hypotheses about the mean of a normal population, based on the statistics suggested by Bulgren, Dykstra and Hewett.
Abstract: Double sample tests are presented for testing one-sided hypotheses about the mean of a normal population, based on the statistics suggested by Bulgren, Dykstra and Hewett [4]. Tables of decision values are given based on the criterion that the power curves of the double sample test and the corresponding single sample test are approximately the same. A discussion of the numerical methods is presented, as well as a comparison of expected sample sizes for a particular example.

Journal ArticleDOI
TL;DR: In this paper, the power of five tests for detecting heavy-tailed distributions with the null hypothesis of normality and alternative hypotheses of each of eleven members of the symmetric stable Paretian distribution was presented.
Abstract: This article presents estimates of the power of five tests for detecting heavy-tailed distributions with the null hypothesis of normality and alternative hypotheses of each of eleven members of the symmetric stable Paretian distribution. The findings indicate that for small samples and low values of the characteristic exponent, the ratio of the standard deviation to the mean deviation would be the most appropriate test. With larger sample sizes and/or larger values of the characteristic exponent, kurtosis and the ratio of one-half the range to the mean deviation from the sample median appear to be the best tests.
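Two of the statistics named above are trivial to compute; heavy tails inflate both ratios above their normal-theory values (about 1.253 for the sd-to-mean-deviation ratio under normality). A sketch with illustrative names:

```python
from statistics import mean, median

def sd_over_mean_dev(x):
    """Ratio of the (population) standard deviation to the mean
    absolute deviation about the mean."""
    m, n = mean(x), len(x)
    sd = (sum((v - m) ** 2 for v in x) / n) ** 0.5
    return sd / (sum(abs(v - m) for v in x) / n)

def half_range_over_mean_dev(x):
    """One-half the range over the mean deviation from the median."""
    md = median(x)
    return (max(x) - min(x)) / 2 / (sum(abs(v - md) for v in x) / len(x))

# A two-point distribution: both ratios equal 1 exactly.
print(sd_over_mean_dev([-1, 1, -1, 1]))       # 1.0
print(half_range_over_mean_dev([-1, 1, -1, 1]))  # 1.0
```

The heavier the tails, the more the standard deviation and the range grow relative to the mean deviation, pushing both ratios upward.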

Journal ArticleDOI
TL;DR: The probability of misclassifying normal variates using the usual discriminant function when the parameters are unknown is estimated, by Monte Carlo simulation, as a function of n 1 and n 2 (sample sizes), p (number of variates) and α (measure of separation between the two populations).

Journal ArticleDOI
TL;DR: In this paper, the authors investigate conditions under which nonnegative unbiased estimators for the variance of the Horvitz and Thompson estimator exist and characterize the form of such estimators, when the design is of fixed sample size.
Abstract: In this article we investigate conditions under which nonnegative unbiased estimators for the variance of the Horvitz and Thompson estimator exist. We have characterized the form of such estimators, when the design is of fixed sample size.

Journal ArticleDOI
TL;DR: In this article, various transfomantions of chi-square to an approximate normal varialte are compared in the context of determining sample size in exponential life testing procedures for a given level of significance and probability of Type II error.
Abstract: Various transfomantions of chi-square to an approximate normal varialte are compared in the context of determining sample size in exponential life testing procedures for a given level of significance and probability of Type II error.
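One standard transformation compared in work of this kind is the Wilson-Hilferty cube-root approximation, which maps chi-square/df to an approximately normal variate. A sketch of that one transformation (the paper's full set of transformations and its life-testing sample-size results are not reproduced):

```python
import math

def wilson_hilferty_z(chi2, df):
    """Approximate standard normal deviate for a chi-square variate:
    (chi2/df)**(1/3) is roughly N(1 - 2/(9*df), 2/(9*df))."""
    c = 2.0 / (9.0 * df)
    return ((chi2 / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)

# At chi2 = df the cube root is exactly 1, so z = sqrt(2/(9*df)):
print(wilson_hilferty_z(9.0, 9))
```

Inverting such a transformation lets sample-size equations for exponential life tests be solved with ordinary normal percentiles instead of chi-square tables.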

Journal ArticleDOI
TL;DR: Previous work by the author in which estimates of the heterogeneity variances are obtained and used in weighting is extended to the case of comparing means in two dependent samples and the resulting empirical weighting estimates are asymptotically equivalent to exact least squares estimates.
Abstract: Data which appear to be binomial proportions sometimes exhibit heterogeneity which results in greater variation than would be exhibited under the binomial distribution. Previous work by the author (Kleinman [1973]) in which estimates of the heterogeneity variances are obtained and used in weighting is extended to the case of comparing means in two dependent samples. The resulting empirical weighting estimates are asymptotically equivalent to exact least squares estimates and Monte Carlo studies for sample size 10 indicate high efficiency relative to exact least squares estimates.

Journal ArticleDOI
TL;DR: The problem of finding an optimum sampling strategy from the class of all linear unbiased strategies with a given expected sample size is considered in this paper, where the authors deal with the properties of admissibility, completeness and strong admissibility of the Horvitz-Thompson strategies.
Abstract: The problem of finding an optimum sampling strategy from the class of all linear unbiased strategies with a given expected sample size is considered. The paper deals with the properties of admissibility, completeness and strong admissibility of the Horvitz-Thompson strategies.