
Showing papers on "Sample size determination published in 1974"


Book
01 Jan 1974
TL;DR: This book presents analysis of variance and regression methods, beginning with one-way analysis of variance with fixed and random effects, with emphasis on checking whether the data fit each model and on what to do when they do not.
Abstract: Preface.1. Data Screening.1.1 Variables and Their Classification.1.2 Describing the Data.1.2.1 Errors in the Data.1.2.2 Descriptive Statistics.1.2.3 Graphical Summarization.1.3 Departures from Assumptions.1.3.1 The Normal Distribution.1.3.2 The Normality Assumption.1.3.3 Transformations.1.3.4 Independence.1.4 Summary.Problems.References.2. One-Way Analysis of Variance Design.2.1 One-Way Analysis of Variance with Fixed Effects.2.1.1 Example.2.1.2 The One-Way Analysis of Variance Model with Fixed Effects.2.1.3 Null Hypothesis: Test for Equality of Population Means.2.1.4 Estimation of Model Terms.2.1.5 Breakdown of the Basic Sum of Squares.2.1.6 Analysis of Variance Table.2.1.7 The F Test.2.1.8 Analysis of Variance with Unequal Sample Sizes.2.2 One-Way Analysis of Variance with Random Effects.2.2.1 Data Example.2.2.2 The One-Way Analysis of Variance Model with Random Effects.2.2.3 Null Hypothesis: Test for Zero Variance of Population Means.2.2.4 Estimation of Model Terms.2.2.5 The F Test.2.3 Designing an Observational Study or Experiment.2.3.1 Randomization for Experimental Studies.2.3.2 Sample Size and Power.2.4 Checking if the Data Fit the One-Way ANOVA Model.2.4.1 Normality.2.4.2 Equality of Population Variances.2.4.3 Independence.2.4.4 Robustness.2.4.5 Missing Data.2.5 What to Do if the Data Do Not Fit the Model.2.5.1 Making Transformations.2.5.2 Using Nonparametric Methods.2.5.3 Using Alternative ANOVAs.2.6 Presentation and Interpretation of Results.2.7 Summary.Problems.References.3. 
Estimation and Simultaneous Inference.3.1 Estimation for Single Population Means.3.1.1 Parameter Estimation.3.1.2 Confidence Intervals.3.2 Estimation for Linear Combinations of Population Means.3.2.1 Differences of Two Population Means.3.2.2 General Contrasts for Two or More Means.3.2.3 General Contrasts for Trends.3.3 Simultaneous Statistical Inference.3.3.1 Straightforward Approach to Inference.3.3.2 Motivation for Multiple Comparison Procedures and Terminology.3.3.3 The Bonferroni Multiple Comparison Method.3.3.4 The Tukey Multiple Comparison Method.3.3.5 The Scheffe Multiple Comparison Method.3.4 Inference for Variance Components.3.5 Presentation and Interpretation of Results.3.6 Summary.Problems.References.4. Hierarchical or Nested Design.4.1 Example.4.2 The Model.4.3 Analysis of Variance Table and F Tests.4.3.1 Analysis of Variance Table.4.3.2 F Tests.4.3.3 Pooling.4.4 Estimation of Parameters.4.4.1 Comparison with the One-Way ANOVA Model of Chapter 2.4.5 Inferences with Unequal Sample Sizes.4.5.1 Hypothesis Testing.4.5.2 Estimation.4.6 Checking If the Data Fit the Model.4.7 What to Do If the Data Don't Fit the Model.4.8 Designing a Study.4.8.1 Relative Efficiency.4.9 Summary.Problems.References.5. 
Two Crossed Factors: Fixed Effects and Equal Sample Sizes.5.1 Example.5.2 The Model.5.3 Interpretation of Models and Interaction.5.4 Analysis of Variance and F Tests.5.5 Estimates of Parameters and Confidence Intervals.5.6 Designing a Study.5.7 Presentation and Interpretation of Results.5.8 Summary.Problems.References.6 Randomized Complete Block Design.6.1 Example.6.2 The Randomized Complete Block Design.6.3 The Model.6.4 Analysis of Variance Table and F Tests.6.5 Estimation of Parameters and Confidence Intervals.6.6 Checking If the Data Fit the Model.6.7 What to Do if the Data Don't Fit the Model.6.7.1 Friedman's Rank Sum Test.6.7.2 Missing Data.6.8 Designing a Randomized Complete Block Study.6.8.1 Experimental Studies.6.8.2 Observational Studies.6.9 Model Extensions.6.10 Summary.Problems.References.7. Two Crossed Factors: Fixed Effects and Unequal Sample Sizes.7.1 Example.7.2 The Model.7.3 Analysis of Variance and F Tests.7.4 Estimation of Parameters and Confidence Intervals.7.4.1 Means and Adjusted Means.7.4.2 Standard Errors and Confidence Intervals.7.5 Checking If the Data Fit the Two-Way Model.7.6 What To Do If the Data Don't Fit the Model.7.7 Summary.Problems.References.8. Crossed Factors: Mixed Models.8.1 Example.8.2 The Mixed Model.8.3 Estimation of Fixed Effects.8.4 Analysis of Variance.8.5 Estimation of Variance Components.8.6 Hypothesis Testing.8.7 Confidence Intervals for Means and Variance Components.8.7.1 Confidence Intervals for Population Means.8.7.2 Confidence Intervals for Variance Components.8.8 Comments on Available Software.8.9 Extensions of the Mixed Model.8.9.1 Unequal Sample Sizes.8.9.2 Fixed, Random, or Mixed Effects.8.9.3 Crossed versus Nested Factors.8.9.4 Dependence of Random Effects.8.10 Summary.Problems.References.9. 
Repeated Measures Designs.9.1 Repeated Measures for a Single Population.9.1.1 Example.9.1.2 The Model.9.1.3 Hypothesis Testing: No Time Effect.9.1.4 Simultaneous Inference.9.1.5 Orthogonal Contrasts.9.1.6 F Tests for Trends over Time.9.2 Repeated Measures with Several Populations.9.2.1 Example.9.2.2 Model.9.2.3 Analysis of Variance Table and F Tests.9.3 Checking if the Data Fit the Repeated Measures Model.9.4 What to Do if the Data Don't Fit the Model.9.5 General Comments on Repeated Measures Analyses.9.6 Summary.Problems.References.10. Linear Regression: Fixed X Model.10.1 Example.10.2 Fitting a Straight Line.10.3 The Fixed X Model.10.4 Estimation of Model Parameters and Standard Errors.10.4.1 Point Estimates.10.4.2 Estimates of Standard Errors.10.5 Inferences for Model Parameters: Confidence Intervals.10.6 Inference for Model Parameters: Hypothesis Testing.10.6.1 t Tests for Intercept and Slope.10.6.2 Division of the Basic Sum of Squares.10.6.3 Analysis of Variance Table and F Test.10.7 Checking if the Data Fit the Regression Model.10.7.1 Outliers.10.7.2 Checking for Linearity.10.7.3 Checking for Equality of Variances.10.7.4 Checking for Normality.10.7.5 Summary of Screening Procedures.10.8 What to Do if the Data Don't Fit the Model.10.9 Practical Issues in Designing a Regression Study.10.9.1 Is Fixed X Regression an Appropriate Technique?10.9.2 What Values of X Should Be Selected?10.9.3 Sample Size Calculations.10.10 Comparison with One-Way ANOVA.10.11 Summary.Problems.References.11. 
Linear Regression: Random X Model and Correlation.11.1 Example.11.1.1 Sampling and Summary Statistics.11.2 Summarizing the Relationship Between X and Y.11.3 Inferences for the Regression of Y and X.11.3.1 Comparison of Fixed X and Random X Sampling.11.4 The Bivariate Normal Model.11.4.1 The Bivariate Normal Distribution.11.4.2 The Correlation Coefficient.11.4.3 The Correlation Coefficient: Confidence Intervals and Tests.11.5 Checking if the Data Fit the Random X Regression Model.11.5.1 Checking for High-Leverage, Outlying, and Influential Observations.11.6 What to Do if the Data Don't Fit the Random X Model.11.6.1 Nonparametric Alternatives to Simple Linear Regression.11.6.2 Nonparametric Alternatives to the Pearson Correlation.11.7 Summary.Problem.References.12. Multiple Regression.12.1 Example.12.2 The Sample Regression Plane.12.3 The Multiple Regression Model.12.4 Parameters Standard Errors, and Confidence Intervals.12.4.1 Prediction of E(Y\X1,...,Xk).12.4.2 Standardized Regression Coefficients.12.5 Hypothesis Testing.12.5.1 Test That All Partial Regression Coefficients Are 0.12.5.2 Tests that One Partial Regression Coefficient is 0.12.6 Checking If the Data Fit the Multiple Regression Model.12.6.1 Checking for Outlying, High Leverage and Influential Points.12.6.2 Checking for Linearity.12.6.3 Checking for Equality of Variances.12.6.4 Checking for Normality of Errors.12.6.5 Other Potential Problems.12.7 What to Do If the Data Don't Fit the Model.12.8 Summary.Problems.References.13. 
Multiple and Partial Correlation.13.1 Example.13.2 The Sample Multiple Correlation Coefficient.13.3 The Sample Partial Correlation Coefficient.13.4 The Joint Distribution Model.13.4.1 The Population Multiple Correlation Coefficient.13.4.2 The Population Partial Correlation Coefficient.13.5 Inferences for the Multiple Correlation Coefficient.13.6 Inferences for Partial Correlation Coefficients.13.6.1 Confidence Intervals for Partial Correlation Coefficients.13.6.2 Hypothesis Tests for Partial Correlation Coefficients.13.7 Checking If the Data Fit the Joint Normal Model.13.8 What to Do If the Data Don't Fit the Model.13.9 Summary.Problems.References.14. Miscellaneous Topics in Regression.14.1 Models with Dummy Variables.14.2 Models with Interaction Terms.14.3 Models with Polynomial Terms.14.3.1 Polynomial Model.14.4 Variable Selection.14.4.1 Criteria for Evaluating and Comparing Models.14.4.2 Methods for Variable Selection.14.4.3 General Comments on Variable Selection.14.5 Summary.Problems.References.15. Analysis of Covariance.15.1 Example.15.2 The ANCOVA Model.15.3 Estimation of Model Parameters.15.4 Hypothesis Tests.15.5 Adjusted Means.15.5.1 Estimation of Adjusted Means and Standard Errors.15.5.2 Confidence Intervals for Adjusted Means.15.6 Checking If the Data Fit the ANCOVA Model.15.7 What to Do if the Data Don't Fit the Model.15.8 ANCOVA in Observational Studies.15.9 What Makes a Good Covariate.15.10 Measurement Error.15.11 ANCOVA versus Other Methods of Adjustment.15.12 Comments on Statistical Software.15.13 Summary.Problems.References.16. 
Summaries, Extensions, and Communication.16.1 Summaries and Extensions of Models.16.2 Communication of Statistics in the Context of a Research Project.References.Appendix A.A.1 Expected Values and Parameters.A.2 Linear Combinations of Variables and Their Parameters.A.3 Balanced One-Way ANOVA, Expected Mean Squares.A.3.1 To Show EMS(MSa) = sigma^2 + n SUM(i=1 to a) alpha_i^2/(a - 1).A.3.2 To Show EMS(MSr) = sigma^2.A.4 Balanced One-Way ANOVA, Random Effects.A.5 Balanced Nested Model.A.6 Mixed Model.A.6.1 Variances and Covariances of Yijk.A.6.2 Variance of Yi..A.6.3 Variance of Yi. - Yi'..A.7 Simple Linear Regression-Derivation of Least Squares Estimators.A.8 Derivation of Variance Estimates from Simple Linear Regression.Appendix B.Index.

607 citations


Journal ArticleDOI
TL;DR: Sample size requirements for cohort and case-control studies of exposure factor(s) and disease are discussed, and examples of the 2 types of studies, along with the mathematical determination of sample size for each, are presented.
Abstract: Sample size requirements for cohort and case-control studies of exposure factor(s) and disease are discussed. The sample size requirement for a cohort study depends on the incidence of the disease among the nonexposed and on the relative risk of disease. For the case-control study sample size depends on the prevalence of exposure to the factor(s) and on the relative risk of disease. Examples of the 2 types of studies, along with the mathematical determination of sample size for each, are presented. Matching is ignored in the determination of sample size.
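The cohort-study calculation the abstract describes can be sketched with the standard two-proportion formula; the z-values, the 80% power target, and the function name are illustrative assumptions, not taken from the paper.

```python
from math import sqrt

# Assumed design choices (not from the paper): two-sided 5% level, 80% power.
Z_ALPHA = 1.959964  # standard normal quantile for alpha/2 = 0.025
Z_BETA = 0.841621   # standard normal quantile for 80% power

def cohort_sample_size(p0, relative_risk, z_alpha=Z_ALPHA, z_beta=Z_BETA):
    """Subjects needed in each group (exposed and nonexposed) of a cohort
    study, given the incidence p0 among the nonexposed and the relative
    risk to be detected. A classical two-proportion sketch, not
    necessarily the paper's exact expression."""
    p1 = relative_risk * p0               # incidence among the exposed
    p_bar = (p0 + p1) / 2                 # pooled incidence
    num = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return num / (p1 - p0) ** 2
```

As the abstract notes, the requirement is driven by the incidence among the nonexposed and the relative risk: rarer disease or a smaller relative risk pushes the required sample sharply upward.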

258 citations


Journal ArticleDOI
TL;DR: The subjective sampling distributions appeared to be unaffected by sample size (N=5 or 10) and number of outcomes, and were flatter than the corresponding "objective" sampling distributions as mentioned in this paper.
Abstract: Previous studies of sampling distributions have been conducted almost exclusively under the assumption that persons behave in accordance with the “fundamental convention” of probability, i.e. that the sum of all probability estimates will equal 1. When this assumption was tested by asking subjects to give “unrestricted” probability estimates of all possible outcomes of samples from a given population, a general tendency of overestimation made the sum of all probabilities exceed 1 to a considerable extent. The subjective sampling distributions appeared to be unaffected by sample size (N=5 or 10) and number of outcomes, and were flatter than the corresponding “objective” sampling distributions.

136 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider a system of two seemingly unrelated regression equations (yi = Xiβi + ui, i = 1, 2) and examine some finite sample properties of β-estimators based on the unrestricted estimate S of Σ = (σij ).
Abstract: In this article, we consider a system of two “seemingly unrelated regression” equations (yi = Xiβi + ui, i = 1, 2) and examine some finite sample properties of β-estimators based on the unrestricted estimate S of Σ = (σij ). The estimator of β2 is shown to be identical with the direct OLS estimator b2 obtained from Equation 2. The estimator of β1 is shown to be more efficient than b1, the direct OLS estimator of β1 obtained from Equation 1, for moderate departures of ρ from zero; the efficiency, moreover, is shown to increase rapidly with the sample size for any given ρ.
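A numpy sketch of the estimator class the abstract studies: equation-by-equation OLS residuals give the unrestricted estimate S of Σ, which is then plugged into feasible GLS on the stacked two-equation system. All numerical values (sample size, coefficients, error correlation) are illustrative, and this is a generic SUR/FGLS sketch rather than the paper's specific analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-equation system y_i = X_i b_i + u_i with correlated errors.
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
beta1, beta2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
rho, s1, s2 = 0.6, 1.0, 1.5        # illustrative error correlation and sds
cov = np.array([[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]])
U = rng.multivariate_normal([0.0, 0.0], cov, size=n)
y1 = X1 @ beta1 + U[:, 0]
y2 = X2 @ beta2 + U[:, 1]

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: equation-by-equation OLS; residuals give the unrestricted
# estimate S of the error covariance matrix Sigma.
b1_ols, b2_ols = ols(X1, y1), ols(X2, y2)
R = np.column_stack([y1 - X1 @ b1_ols, y2 - X2 @ b2_ols])
S = R.T @ R / n

# Step 2: feasible GLS on the stacked system, with Omega = S kron I_n.
X = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y = np.concatenate([y1, y2])
Omega_inv = np.kron(np.linalg.inv(S), np.eye(n))
b_fgls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
```

The efficiency gain over single-equation OLS grows with |ρ|, in line with the abstract's finite-sample findings.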

110 citations


Journal ArticleDOI
TL;DR: Asymptotic approximations to the expected sample size are given for a class of tests of power one introduced in [10]. Comparisons are made with the method of mixtures of likelihood ratios and an application is given to Breiman's gambling theory for favorable games.
Abstract: Asymptotic approximations to the expected sample size are given for a class of tests of power one introduced in [10]. Comparisons are made with the method of mixtures of likelihood ratios, and an application is given to Breiman's gambling theory for favorable games.

88 citations


Journal ArticleDOI
TL;DR: In this paper, a method for generating a sequence of random variates from an empirical distribution was proposed; the method requires less computation time than two standard methods at the cost of only ten more words of memory.
Abstract: This note presents a method for generating a sequence of random variates from an empirical distribution. Computational results show that the proposed method requires less computation time than two standard methods at the cost of only ten more words of memory. The savings in time become more significant as the number of distinct values contained in the distribution, or the sample size, increases.
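The baseline that such a method improves on can be sketched as a cumulative-table lookup: precompute cumulative probabilities once, then binary-search a uniform variate on each draw. This illustrates the task only; the 1974 note's faster method is not reproduced here.

```python
import bisect
import random

def make_sampler(values, counts, seed=None):
    """Return a zero-argument function that draws variates from the
    empirical distribution given by distinct values and observed counts.
    Simple cumulative-table lookup via binary search."""
    total = sum(counts)
    cum, acc = [], 0
    for c in counts:
        acc += c
        cum.append(acc / total)
    cum[-1] = 1.0                    # guard against rounding shortfall
    rnd = random.Random(seed)
    def draw():
        return values[bisect.bisect_left(cum, rnd.random())]
    return draw
```

Each draw costs O(log k) for k distinct values, which is why table size and sample size drive the comparison the abstract reports.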

83 citations


Journal ArticleDOI
TL;DR: Power tables are presented for estimates of heritability and genetic correlation for four research designs as a function of sample size and indicate that, for the study of a single population, 400 families of four members each is a sufficient sample to achieve a statistical power in excess of 95% for a heritability estimate of 0.20.
Abstract: The question of statistical power as it relates to estimates of heritability and genetic correlation, particularly with reference to population comparisons, is briefly discussed. Power tables (α=0.05) are presented for estimates of heritability and genetic correlation for four research designs (the regression of offspring on mid-parent values, the regression of offspring on single-parent values, the intraclass correlation of full sibs; and the intraclass correlation of half sibs) as a function of sample size. In addition, tables of statistical power for the comparison of heritabilities obtained from two different populations, using these four methodologies, are presented. These tables indicate that, for the study of a single population, 400 families of four members each is a sufficient sample to achieve a statistical power in excess of 95% for a heritability estimate of 0.20. However, in the comparison of heritabilities from two populations, 800 families of four members each, measured in each population, would be required to achieve equivalent statistical power for a difference in heritability of 0.20.
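How such power tables are generated can be illustrated with the usual Fisher-z approximation for detecting a correlation-type coefficient; mapping a heritability onto the relevant regression or intraclass correlation depends on the design and is not reproduced here, so this is a generic sketch, not the paper's tabulation.

```python
from math import atanh, erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_for_correlation(r, n_families, z_alpha=1.645):
    """Approximate one-sided power (alpha = 0.05) to detect a population
    correlation of size r with n_families observations, using Fisher's z
    transformation; illustrates how power grows with sample size."""
    return normal_cdf(sqrt(n_families - 3) * atanh(r) - z_alpha)
```

For a coefficient of 0.20, a few hundred families already give high power, consistent in spirit with the sample sizes the abstract reports.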

74 citations



Journal ArticleDOI
TL;DR: In this article, the authors used the method of maximum likelihood to estimate the parameters of a mixture of two regression lines and found that when the sample size exceeds 250 and the regression lines are more than three standard deviations apart for at least one half of the data, the maximum likelihood estimates are reliable.
Abstract: The method of maximum likelihood is used to estimate the parameters of a mixture of two regression lines. The results of a small simulation study show that when the sample size exceeds 250 and the regression lines are more than three standard deviations apart for at least one half of the data, the maximum likelihood estimates are reliable. When this is not the case, their sampling variances are so large that the estimates may not be reliable.
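Maximum likelihood for a two-line mixture is usually computed by EM; a textbook sketch on simulated data in the spirit of the paper's study (n > 250, lines several error standard deviations apart over much of the range) follows. The starting lines are rough guesses as one might read off a scatterplot, and all numerical values are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Well-separated mixture of two regression lines plus unit-variance noise.
n = 500
x = rng.uniform(0, 10, n)
z = rng.random(n) < 0.5                        # latent component labels
y = np.where(z, 1.0 + 2.0 * x, 8.0 - 1.0 * x) + rng.normal(0.0, 1.0, n)

def em_two_lines(x, y, iters=100):
    """EM for a two-component mixture of simple linear regressions with a
    common error variance; a sketch, not the paper's exact ML routine."""
    X = np.column_stack([np.ones_like(x), x])
    B = np.array([[0.0, 2.0], [10.0, -1.0]])   # scatterplot-style guesses
    pi, sigma2 = 0.5, float(np.var(y))
    for _ in range(iters):
        resid = y[:, None] - X @ B.T           # residuals under each line
        dens = np.exp(-resid**2 / (2.0 * sigma2))
        w = pi * dens[:, 0] / (pi * dens[:, 0] + (1.0 - pi) * dens[:, 1])
        for k, wk in enumerate((w, 1.0 - w)):  # weighted-LS M-step
            B[k] = np.linalg.solve(X.T @ (wk[:, None] * X), X.T @ (wk * y))
        resid = y[:, None] - X @ B.T           # residuals under updated lines
        pi = float(w.mean())
        sigma2 = float((w * resid[:, 0]**2 + (1.0 - w) * resid[:, 1]**2).mean())
    return B, pi, sigma2

B_hat, pi_hat, sigma2_hat = em_two_lines(x, y)
```

With well-separated lines the fit is stable; when the lines are close, the likelihood surface flattens and the estimates become erratic, which is the unreliability the abstract describes.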

63 citations


Journal ArticleDOI
TL;DR: In this paper, a cross-validation approach to the a priori determination of sample size requirements and the a posteriori estimation of the validity of a derived regression equation is developed for regression models, where sampling from multivariate normal populations is discussed in particular.
Abstract: A cross-validation approach to the a priori determination of sample size requirements and the a posteriori estimates of the validity of a derived regression equation is developed for regression models, where sampling from multivariate normal populations is discussed in particular. Tables of sample size estimates are presented for the random model and their applications illustrated. An algorithm is given to obtain tables for the fixed model directly from those for the random case.

62 citations


Journal ArticleDOI
TL;DR: In this article, a method of estimating the reliability of a test which has been divided into three parts is presented, where the parts are homogeneous in content (congeneric), i.e., if their true scores are linearly related and if sample size is large.
Abstract: This paper gives a method of estimating the reliability of a test which has been divided into three parts. The parts do not have to satisfy any statistical criteria like parallelism or τ-equivalence. If the parts are homogeneous in content (congeneric), i.e., if their true scores are linearly related, and if sample size is large, then the method described in this paper will give the precise value of the reliability parameter. If the homogeneity condition is violated then underestimation will typically result. However, the estimate will always be at least as accurate as coefficient α and Guttman's lower bound λ3 when the same data are used. An application to real data is presented by way of illustration. Seven different splits of the same test are analyzed. The new method yields remarkably stable reliability estimates across splits as predicted by the theory. One deviating value can be accounted for by a certain unsuspected peculiarity of the test composition. Both coefficient α and λ3 would not have led to the same discovery.

Journal ArticleDOI
TL;DR: In this paper, two new unbiased estimators for the common mean of two normal distributions were proposed for the equal sample size case, and it was shown that a slight modification of one of the estimators is better than either sample mean simultaneously for sample sizes of 10 or more.
Abstract: Consider the problem of estimating the common mean of two normal distributions. Two new unbiased estimators of the common mean are offered for the equal sample size case. Both are better than the sample mean based on one population for sample sizes of 5 or more. A slight modification of one of the estimators is better than either sample mean simultaneously for sample sizes of 10 or more. This same estimator has desirable large sample properties and an explicit simple upper bound is given for its variance. A final result is concerned with confidence estimation. Suppose the variance of the first population, say, is known. Then if the sample mean of that population, plus and minus a constant, is used as a confidence interval, it is shown that an improved confidence interval can be found provided the sample sizes are at least 3.
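The classical benchmark in this problem is the combined estimator that weights each sample mean inversely to its estimated variance (Graybill–Deal type). The sketch below shows that standard competitor, not the paper's two new unbiased estimators; simulated values are illustrative.

```python
import numpy as np

def combined_mean(x, y):
    """Weight each sample mean by the reciprocal of its estimated
    variance of the mean (s^2/n). A sketch of the classical combined
    estimator of a common mean, not the paper's new estimators."""
    wx = len(x) / np.var(x, ddof=1)
    wy = len(y) / np.var(y, ddof=1)
    return (wx * np.mean(x) + wy * np.mean(y)) / (wx + wy)

rng = np.random.default_rng(7)
x = rng.normal(5.0, 1.0, 50)   # first sample: common mean 5, sd 1
y = rng.normal(5.0, 3.0, 50)   # second sample: common mean 5, sd 3
est = combined_mean(x, y)
```

By construction the estimate is a convex combination of the two sample means, leaning toward the less variable sample.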

Journal ArticleDOI
TL;DR: In this article, the authors apply the jackknife method to stratified samples from a multivariate population of finite size, where the data omitted are those for a group of sampling units that cut across all strata, thereby compacting the region where approximate relationships are assumed to hold.
Abstract: SUMMARY The jackknife method of investigating and reducing the bias in nonlinear estimates of parameters is applied to stratified samples from a multivariate population of finite size. Attention is focused on estimators that can be expressed as functions of sample means. As in other applications of the jackknife method, an approximate relationship between the bias in an estimator and the sample size is exploited to reduce this bias by employing a linear combination of an estimate computed from all the data and several estimates each computed after omitting part of the data. Unlike some other applications to stratified sampling, however, where the data omitted are those for a group of sampling units that cut across all strata, the present application involves omitting just one sampling unit at a time, thereby compacting the region where approximate relationships are assumed to hold, and increasing the stability of the variance estimates. The relationships are derived by an analytic approach. Results are carried out far enough for use in second-order jackknife estimates of parameters and variances so as to eliminate bias to third-order moments of the variables observed. When there is just one stratum, the first-order estimators defined by equations (4.3) and (6.1), which are unbiased to second-order moments, are asymptotically the same as those proposed by Tukey (1958).
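The single-stratum, delete-one idea underlying the paper can be sketched as follows (first-order only; the stratified finite-population and second-order extensions are not reproduced):

```python
def jackknife(estimator, data):
    """First-order delete-one jackknife: returns a bias-corrected
    estimate and a variance estimate built from the pseudovalues."""
    n = len(data)
    full = estimator(data)
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    corrected = n * full - (n - 1) * sum(loo) / n
    pseudo = [n * full - (n - 1) * t for t in loo]
    var = sum((p - corrected) ** 2 for p in pseudo) / (n * (n - 1))
    return corrected, var

# Example nonlinear estimator: the squared sample mean (biased for mu^2
# by sigma^2/n); the jackknife removes exactly that first-order bias.
sq_mean = lambda d: (sum(d) / len(d)) ** 2
theta, var = jackknife(sq_mean, [1.0, 2.0, 3.0, 4.0])
```

For the squared mean the correction is exact: the jackknifed value equals x̄² − s²/n, an unbiased estimate of μ².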

Journal ArticleDOI
Abstract: SUMMARY The problem of comparing several ordered dose levels with a control when a larger sample size is taken on the control is considered. The distributions of Bartholomew's tests are determined for the limiting case where the control mean is known and an approximation is given for the problem. The existing tables for Bartholomew's tests are extended. It is considered that these tests are superior in all situations where the sample size for the control is greater than the sample sizes for the nonzero dose levels.

Journal ArticleDOI
TL;DR: It is shown that the optimum number of quantization levels decreases with increasing dimensionality for a fixed sample size, and increases with the sample size for fixed dimensionality.
Abstract: It is known that, in general, the number of measurements in a pattern classification problem cannot be increased arbitrarily, when the class-conditional densities are not completely known and only a finite number of learning samples are available. Above a certain number of measurements, the performance starts deteriorating instead of improving steadily. It was earlier shown by one of the authors that an exception to this "curse of finite sample size" is constituted by the case of binary independent measurements if a Bayesian approach is taken and uniform a priori distributions on the unknown parameters are assumed. In this paper, the following generalizations are considered: arbitrary quantization and the use of maximum likelihood estimates. Further, the existence of an optimal quantization complexity is demonstrated, and its relationship to both the dimensionality of the measurement vector and the sample size is discussed. It is shown that the optimum number of quantization levels decreases with increasing dimensionality for a fixed sample size, and increases with the sample size for fixed dimensionality.

Journal ArticleDOI
TL;DR: In this article, the overidentification restrictions on a system of linear simultaneous equations are expressed in terms of restrictions on the reduced form parameters, which provide the basis of a test of the structure using only the unrestricted reduced form parameter estimates.
Abstract: In the first section of this paper the overidentifying restrictions on a system of linear simultaneous equations are expressed in terms of restrictions on the reduced form parameters. These restrictions provide the basis of a test of the structure using only the unrestricted reduced form parameter estimates. Under H0 the test proposed is asymptotically equivalent to a likelihood ratio test. The test may be applied as a single equation or complete system procedure and it may be presented as either a χ2 or an F statistic. The case is also made here for system overidentification tests rather than single equation procedures, the arguments being drawn from the statistical literature on hypothesis testing by induction. The computational advantages of the present proposals are substantial when compared to FIML based likelihood-ratio tests and Monte Carlo experiments confirm that a system version of the test performs well in large samples. The system version of the test behaves like the FIML likelihood ratio test in large sample situations both under H0 and H1. However, the Monte Carlo studies indicate that both the single equation and system versions of the test perform poorly in small samples. THE AIM OF THIS paper is to investigate the possibility of deciding on the specification of a simultaneous equation model prior to the estimation of the structure. A well established test procedure is suggested which uses OLS estimates of the reduced form parameters; it enables the null hypothesis, that the model specified is not significantly different from the model which generated the sample, to be tested. Because of the one-to-one correspondence between the overidentified structure and the restricted reduced form, it is possible to make inferences about the structure from the observed compatibility of the reduced form restrictions with the sample information.
In addition, the reduced form restrictions resulting from a particular equation may be isolated and tested separately, if desired. The principle underlying the test would appear to be due to Wald [17]; namely, that if the null hypothesis is correct and the structure postulated as the maintained hypothesis was responsible for the generation of the observed sample, then the unrestricted reduced form parameter estimates will tend, if the sample size is large enough, to satisfy the reduced form restrictions advanced under the maintained hypothesis. A number of problems relating to identification of linear simultaneous equations make life a little difficult and are discussed subsequently. Now, take the linear structure

Journal ArticleDOI
TL;DR: The methods presented here employ the concept of a distribution-free tolerance region to construct various sets whose elements have the common property of satisfying the chance constraint with a preassigned level of confidence.
Abstract: This paper concerns developing methods for approximating a chance-constrained set when any information concerning the random variables must be derived from actual samples. Such a situation has not been presented in the literature. When existing chance-constrained programming techniques are used, it is not possible to relate the accuracy of sample-based assumptions to actual constraint satisfaction. The methods presented here employ the concept of a distribution-free tolerance region to construct various sets whose elements have the common property of satisfying the chance constraint with a preassigned level of confidence. The sample size required to meet the desired confidence is readily available in tabular or graphical form.
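The tabulated sample sizes the abstract refers to come from classical distribution-free (Wilks-type) tolerance-limit results on order statistics. A sketch of how such a table is generated (function names are mine):

```python
def one_sided_n(coverage, confidence):
    """Smallest n so that the sample maximum exceeds at least a fraction
    `coverage` of the population with the given confidence:
    1 - coverage**n >= confidence."""
    n = 1
    while 1 - coverage**n < confidence:
        n += 1
    return n

def two_sided_n(coverage, confidence):
    """Smallest n so that (sample min, sample max) is a distribution-free
    tolerance interval: 1 - g**n - n*(1-g)*g**(n-1) >= confidence,
    where g is the required coverage."""
    n = 2
    while 1 - coverage**n - n * (1 - coverage) * coverage**(n - 1) < confidence:
        n += 1
    return n
```

For 95% coverage at 95% confidence these give the classical answers of 59 (one-sided) and 93 (two-sided) samples, illustrating how the required n is read off for a preassigned confidence level.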

Journal ArticleDOI
TL;DR: A unified treatment of three different models concerning the problem of obtaining suitably accurate confidence bounds on series or parallel system reliability from subsystem test data is explored and developed in this paper, where the component or subsystem test sets are assumed to be (1) exponentially distributed with censoring or truncation for a fixed number of failures, (2) exponential distributed with truncation of tests at fixed times, and (3) binomially distributed (pass-fail) with fixed but different sample sizes and random numbers of failures for subsystem tests.
Abstract: A unified treatment of three different models concerning the problem of obtaining suitably accurate confidence bounds on series or parallel system reliability from subsystem test data is explored and developed. The component or subsystem test data are assumed to be (1) exponentially distributed with censoring or truncation for a fixed number of failures, (2) exponentially distributed with truncation of tests at fixed times, and (3) binomially distributed (pass-fail) with fixed but different sample sizes and random numbers of failures for subsystem tests. Rather unique relations between the three models are found and discussed based on the binomial reliability study of Mann (1973). In fact, the approximate theory developed herein applies to "mixed" data systems, i.e. the case where some subsystem data are binomial and the others exponential in character. The extension of results to complex systems is also treated. The methodology developed for combining component failure data should perhaps be useful in p...

Journal ArticleDOI
TL;DR: In this paper, the authors consider several approximate confidence interval methods for θ, including a new method based on exact confidence intervals for linear functions of μ and σ2, and demonstrate the suitability of the new method for applications involving a wide range of data transformations, parameter values and sample sizes.
Abstract: When data are transformed to satisfy a spherical normal linear model, the mean θ of a variate in the original scale is a function of the mean μ and variance σ2 of a normal variate. We consider several approximate confidence interval methods for θ, including a new method based on exact confidence intervals for linear functions of μ and σ2. Monte Carlo estimates of coverage probabilities demonstrate the suitability of the new method for applications involving a wide range of data transformations, parameter values and sample sizes.
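For the common log transformation the original-scale mean is θ = exp(μ + σ²/2), and one widely used approximate interval is Cox's method. The sketch below shows that standard competitor, not the paper's new method based on exact intervals for linear functions of μ and σ².

```python
from math import exp, log, sqrt

def cox_interval(data, z=1.959964):
    """Approximate 95% confidence interval for theta = exp(mu + sigma^2/2),
    the original-scale mean after a log transformation, via Cox's method:
    interval for mu + s^2/2 on the log scale, then exponentiate."""
    logs = [log(v) for v in data]
    n = len(logs)
    m = sum(logs) / n
    s2 = sum((v - m) ** 2 for v in logs) / (n - 1)
    center = m + s2 / 2
    half = z * sqrt(s2 / n + s2 * s2 / (2 * (n - 1)))
    return exp(center - half), exp(center + half)
```

Monte Carlo coverage checks like those in the paper would compare how often such intervals capture the true θ across transformations, parameter values, and sample sizes.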

Journal ArticleDOI
TL;DR: In this paper, the robustness of three multiple-comparison procedures (i.e., multiple t test, Tukey WSD and Scheffe S test) to the violation of the homogeneous population variance assumption was investigated using three different standard error estimates.
Abstract: The robustness of three multiple-comparison procedures (multiple t test, Tukey WSD and Scheffe S test) to the violation of the homogeneous population variance assumption was investigated using three different standard error estimates. These estimates were (1) the square root of 2(mean square within)/n, (2) the standard error of the traditional t test, and (3) the standard error of the Behrens-Fisher t′ statistic. The procedures were highly robust to variance heterogeneity using the Behrens-Fisher standard error estimate. The procedures were considerably less robust using MSW. The universal use of the Behrens-Fisher statistic with the Welch solution for critical values is recommended.
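The Behrens–Fisher t′ with the Welch solution that the study recommends can be sketched from summary statistics (the function name is mine):

```python
from math import sqrt

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Behrens-Fisher t' statistic with the Welch-Satterthwaite
    approximate degrees of freedom: each group contributes its own
    variance to the standard error, so unequal population variances
    are handled directly."""
    se2_1, se2_2 = var1 / n1, var2 / n2
    t = (mean1 - mean2) / sqrt(se2_1 + se2_2)
    df = (se2_1 + se2_2) ** 2 / (se2_1**2 / (n1 - 1) + se2_2**2 / (n2 - 1))
    return t, df
```

The approximate df falls between min(n1, n2) − 1 and n1 + n2 − 2, equaling the pooled value only when the per-group standard errors match.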

Journal ArticleDOI
TL;DR: In this paper, a graphical procedure is given to evaluate the sample size for which confidence intervals of specified lengths, for the parameters of the multinomial distribution, are simultaneously verified at a given significance level.
Abstract: A graphical procedure is given to evaluate the sample size for which confidence intervals of specified lengths, for the parameters of the multinomial distribution, are simultaneously verified at a given significance level.
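The quantity the graphical procedure reads off can also be computed directly. A rough sketch (my own, not the paper's method) using a Bonferroni split of the significance level and the worst-case variance p(1 − p) = 1/4:

```python
import math
from statistics import NormalDist

def multinomial_n(k, half_width, alpha=0.05):
    """Smallest n so that k simultaneous normal-approximation CIs for
    multinomial proportions each have half-width <= half_width,
    using a Bonferroni split of alpha and worst case p(1-p) = 1/4."""
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    return math.ceil(z**2 * 0.25 / half_width**2)

print(multinomial_n(k=4, half_width=0.05))
```

The Bonferroni split is conservative; the paper's graphical solution can give smaller n for the same simultaneous level.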

Journal ArticleDOI
TL;DR: In this paper, it is shown that, except in two special cases, the M.L.E. is inadmissible whenever the total sample size is 7 or more.
Abstract: Admissibility properties of the M.L.E. for the parameters of $m$ independent binomial distributions (when these parameters are known to be ordered) are determined for certain convex loss functions. It is shown that, except in two special cases, the M.L.E. is inadmissible whenever the total sample size is 7 or more.
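The order-restricted M.L.E. discussed here is computed by pooling adjacent violators. A compact sketch (my own implementation of the standard PAVA, not the paper's code), with trials as weights:

```python
def ordered_binomial_mle(successes, trials):
    """MLE of m binomial parameters under p_1 <= ... <= p_m,
    computed with the pool-adjacent-violators algorithm (weights = trials)."""
    blocks = []  # each block: [pooled successes, pooled trials, cells covered]
    for s, n in zip(successes, trials):
        blocks.append([s, n, 1])
        # merge adjacent blocks while their pooled rates are out of order
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s2, n2, c2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += c2
    est = []
    for s, n, c in blocks:
        est.extend([s / n] * c)
    return est

# Unrestricted rates 0.5, 0.2, 0.8 violate the ordering in the first two cells
print(ordered_binomial_mle([5, 2, 8], [10, 10, 10]))
```

The first two cells are pooled to 7/20 = 0.35 each, and the third keeps its unrestricted rate 0.8.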

Journal ArticleDOI
TL;DR: In this article, a mathematical model for the derivation of the expected number of matched pairs is presented, employing only that information which can be assumed at the planning stage of research, and an approximate expression for calculating the sample variance is derived empirically.
Abstract: SUMMARY A mathematical model for the derivation of the expected number of matched pairs is presented, employing only that information which can be assumed at the planning stage of research. An approximate expression for calculating the sample variance is derived empirically. Using numerical values of the expected number of matches, for varying sample sizes, numbers of matching categories and matching distributions, it is shown that (a) in order to pair match all (or most) of an initial sample of cases the control reservoir has to be at least five to ten times as large in most situations; (b) for two samples of comparable size, the expected number of matches never reaches 100 per cent; and (c) when the number of categories is equivalent to the number in equal initial samples, only 50 per cent of the maximum pairs can be expected. The implications of these results for the planning of research are discussed.
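Finding (b), that two comparable samples never match completely, is easy to reproduce by simulation. A minimal sketch (my own, with arbitrary category probabilities and sample sizes) counting per-category matches as min(cases, controls):

```python
import random

random.seed(2)

def expected_matches(n_cases, n_controls, probs, reps=2000):
    """Monte Carlo estimate of the expected number of pair matches when
    cases and controls fall into categories with the given probabilities."""
    k = len(probs)
    cum = [sum(probs[:i + 1]) for i in range(k)]
    def draw(n):
        counts = [0] * k
        for _ in range(n):
            u = random.random()
            counts[next(i for i, c in enumerate(cum) if u <= c)] += 1
        return counts
    total = 0
    for _ in range(reps):
        cases, controls = draw(n_cases), draw(n_controls)
        total += sum(min(a, b) for a, b in zip(cases, controls))
    return total / reps

p = [0.25] * 4
equal = expected_matches(50, 50, p)       # comparable reservoir
big = expected_matches(50, 500, p)        # reservoir 10x the case sample
print(f"equal reservoirs: {equal:.1f} of 50 matched")
print(f"10x reservoir:    {big:.1f} of 50 matched")
```

With equal samples a noticeable fraction of cases goes unmatched, while a tenfold reservoir matches essentially all of them, consistent with finding (a).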

Journal ArticleDOI
TL;DR: The extreme values of the distribution of a sample of healthy persons are not acceptable because they include values from a substantial number of abnormal persons, thus resulting in a large number of false normal diagnoses.
Abstract: The three main prerequisites for determination of valid normal limits are adequate sample size, sample composition representative of the average healthy population, and adequate statistical evaluation. In a substantial number of publications, even recent publications, one or another of these principles (and occasionally all three) has been disregarded. The most common error is inadequate sample size. The minimal sample size of a male adult population for determination of normal limits should exceed 500, and approach 1,000 for a mixed sample of men and women. Constitutional variables affecting the electrocardiogram (relative body weight, age, sex and race) must be considered. In view of the skewed distribution of most electrocardiographic items, the 95th or 98th percentiles should be used for determination of valid normal limits. Use of standard deviations (usually ± 2 SD) may result in erroneous limits. The extreme values of the distribution of a sample of healthy persons are not acceptable because they include values from a substantial number of abnormal persons, thus resulting in a large number of false normal diagnoses.
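The percentile-versus-±2-SD point is easy to demonstrate on skewed data. A small sketch (my own, using a lognormal stand-in for a right-skewed measurement; all constants are arbitrary):

```python
import random, statistics

random.seed(3)
# A right-skewed "healthy population" measurement (lognormal stand-in)
sample = sorted(random.lognormvariate(3.0, 0.4) for _ in range(1000))

# Percentile limits (2.5th and 97.5th) vs. mean +/- 2 SD
lo_pct, hi_pct = sample[24], sample[974]
m, sd = statistics.fmean(sample), statistics.stdev(sample)
lo_sd, hi_sd = m - 2 * sd, m + 2 * sd

below = sum(x < lo_sd for x in sample)
print(f"percentile limits: ({lo_pct:.1f}, {hi_pct:.1f})")
print(f"mean +/- 2 SD:     ({lo_sd:.1f}, {hi_sd:.1f})")
print(f"values below the lower 2-SD limit: {below} (expect ~25 for a valid limit)")
```

On skewed data the lower ±2 SD limit falls far below the 2.5th percentile, so almost no healthy value lies under it: the limit is erroneous in exactly the way the abstract warns.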

Journal ArticleDOI
TL;DR: In this article, the authors conducted a cross-validation study of the Bayesian m-group regression method developed by Jackson, Novick, and Thayer (1971) based on a theory described in Lindley and Smith (1972).
Abstract: Novick, Jackson, Thayer, and Cole (1972) conducted a cross-validation study of the Bayesian m-group regression method developed by Jackson, Novick, and Thayer (1971) based on a theory described in Lindley and Smith (1972). The context of this cross validation was the use of ACT Assessment Scores to predict first semester grade point averages in traditional junior colleges. Within-group least squares regression lines were calculated in each of 22 carefully selected, academically oriented junior colleges using 1968 data and these lines were used for prediction on 1969 data. The principal focus of accuracy of prediction was on the average over colleges of the mean-squared errors when the predictions for persons were compared with their actual attained grade point averages. The primary conclusions of the study were based on a 25% within-college sample. The sample sizes ranged from 26 to 184. The average, over the 22 colleges, of the mean-squared errors for the within-college regression was .62. This represents the average result to be expected if each college did its own work using only information from that college. There has always been the thought that some improvement on within-college least squares could be attained by some central prediction system. However, we are unaware of a single example in the literature where a large scale cross-validation study has substantially supported this contention. In the instances we know of, cross validation has either not been done, has not been done successfully, or has not been done on a sufficient scale to clearly establish the general validity of the system used. We assume that there have
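The baseline being compared against, fit a least-squares line in each college on one year's data and score it on the next year's, can be sketched as follows (my own simulation with invented parameters, not the study's data):

```python
import random, statistics

random.seed(4)

def ols(xs, ys):
    """Simple least-squares intercept and slope."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def simulate_college(n, a, b):
    xs = [random.gauss(20, 4) for _ in range(n)]          # e.g. ACT-like scores
    ys = [a + b * x + random.gauss(0, 0.7) for x in xs]   # first-semester GPA
    return xs, ys

mses = []
for _ in range(22):                             # 22 hypothetical colleges
    a, b = random.gauss(0.5, 0.2), random.gauss(0.08, 0.01)
    x68, y68 = simulate_college(60, a, b)       # "1968" fitting data
    x69, y69 = simulate_college(60, a, b)       # "1969" validation data
    a_hat, b_hat = ols(x68, y68)
    mses.append(statistics.fmean((y - (a_hat + b_hat * x)) ** 2
                                 for x, y in zip(x69, y69)))
print(f"average cross-validated MSE over colleges: {statistics.fmean(mses):.2f}")
```

A central (e.g. m-group Bayesian) predictor must beat this per-college average MSE to justify itself, which is the contention the study cross-validates.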

Journal ArticleDOI
TL;DR: In this article, the authors consider one class of multivariate matching methods which yield the same percent reduction in expected bias for each of the matching variables, and derive the expression for the maximum attainable percent reduction of bias given fixed distributions and fixed sample sizes.
Abstract: Matched sampling is a method of data collection designed to reduce bias and variability due to specific matching variables. Although often used to control for bias in studies in which randomization is practically impossible, there is virtually no statistical literature devoted to investigating the ability of matched sampling to control bias in the common case of many matching variables. An obvious problem in studying the multivariate matching situation is the variety of sampling plans, underlying distributions, and intuitively reasonable matching methods. This article considers one class of multivariate matching methods which yield the same percent reduction in expected bias for each of the matching variables. The primary result is the derivation of the expression for the maximum attainable percent reduction in bias given fixed distributions and fixed sample sizes. An examination of trends in this maximum leads to a procedure for estimating minimum ratios of sample sizes needed to obtain well-matched samples.
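Percent reduction in bias, the quantity whose maximum this article derives, is simple to estimate in a one-covariate simulation (my own nearest-available matching sketch; distributions and sample sizes are arbitrary):

```python
import random

random.seed(5)
# Treated group drawn from a distribution shifted above the control pool
treated = [random.gauss(1.0, 1.0) for _ in range(50)]
controls = [random.gauss(0.0, 1.0) for _ in range(500)]  # reservoir 10x larger

# Nearest-available matching on the single covariate, hardest cases first
pool = sorted(controls)
matched = []
for t in sorted(treated, reverse=True):
    best = min(range(len(pool)), key=lambda i: abs(pool[i] - t))
    matched.append(pool.pop(best))

mean = lambda xs: sum(xs) / len(xs)
bias_before = mean(treated) - mean(controls)
bias_after = mean(treated) - mean(matched)
pct_reduction = 100 * (1 - bias_after / bias_before)
print(f"percent reduction in bias: {pct_reduction:.0f}%")
```

With a large reservoir most of the initial bias is removed; the article's result bounds how much reduction is attainable for given distributions and sample-size ratios.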

Journal ArticleDOI
TL;DR: In this article, the subject was required to make numerical judgments based on a brief inspection of a pair of samples of IQ scores; each sample within a pair contained 20 scores, and the values of the two sample means were varied factorially.

Journal ArticleDOI
TL;DR: For testing H0: μ ≤ 0 against H1: μ > 0 when the observations are independent normal with known variance, tests of the following structure are considered: for fixed n, stop with the first i ≤ n at which the cumulative sum crosses a rejection boundary and reject H0; otherwise stop with n observations and accept H0.
Abstract: For testing H0: μ ≤ 0 against H1: μ > 0 when the observations are independent normal with known variance, tests of the following structure are considered: for fixed n, stop with the first i ≤ n at which the cumulative sum crosses a rejection boundary and reject H0; otherwise stop with n observations and accept H0. Bounds on the power function and expected sample size are obtained, and these become exact limits as n → ∞. The power function of these tests never falls below 94% of that of the corresponding nonsequential UMP (UMPU) tests.
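A truncated test of this shape can be simulated to see both quantities the abstract bounds: power and expected sample size. In this sketch (my own; the boundary c = z·√n is an illustrative choice, not necessarily the paper's) the walk is stopped at the first crossing:

```python
import math, random, statistics

random.seed(6)
N = 100
BOUND = statistics.NormalDist().inv_cdf(0.95) * math.sqrt(N)  # illustrative c = z * sqrt(n)

def run_test(mu, reps=4000):
    """Simulate the truncated sequential test: reject H0 at the first i <= N
    whose cumulative sum crosses the boundary; otherwise accept at N."""
    rejections = 0
    sample_sizes = []
    for _ in range(reps):
        s = 0.0
        for i in range(1, N + 1):
            s += random.gauss(mu, 1.0)
            if s >= BOUND:
                rejections += 1
                break
        sample_sizes.append(i)
    return rejections / reps, statistics.fmean(sample_sizes)

power0, en0 = run_test(0.0)
power1, en1 = run_test(0.3)
print(f"mu=0.0: rejection rate {power0:.3f}, E[sample size] {en0:.1f}")
print(f"mu=0.3: power {power1:.3f}, E[sample size] {en1:.1f}")
```

Under the alternative the test stops well before n on average, which is the appeal of the sequential structure; the early-crossing option also inflates the size relative to the fixed-sample test, so the boundary must be calibrated.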

Journal ArticleDOI
M. P. Cowles
TL;DR: A brief discussion of four aspects of a statistical test (alpha level, strength of relationship, power and sample size) is followed by the suggestion that we might adopt N = 35 as a rule of thumb.
Abstract: A brief discussion of four aspects of a statistical test (alpha level, strength of relationship, power and sample size) is followed by the suggestion that we might adopt N = 35 as a rule of thumb.
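The interplay of the four aspects can be illustrated with a normal-approximation power calculation (my own sketch; the effect size d = 0.6 and the group sizes are arbitrary choices, and the approximation ignores the t correction):

```python
import math
from statistics import NormalDist

def approx_power(n, d, alpha=0.05):
    """Normal-approximation power of a two-sample, two-sided test with
    n per group and standardized effect size d."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ncp = d * math.sqrt(n / 2)          # approximate noncentrality
    return 1 - NormalDist().cdf(z - ncp)

for n in (20, 35, 60):
    print(f"n = {n:>2} per group, d = 0.6: power ~ {approx_power(n, 0.6):.2f}")
```

At n = 35 per group a moderately large effect reaches respectable power, which is the kind of trade-off behind a rule of thumb like N = 35.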

Journal ArticleDOI
TL;DR: A Bayesian decision theoretic approach is employed to compare sampling schemes designed to estimate the reliability of series and parallel systems by testing individual components, and schemes are found which minimize Bayes risk plus sampling cost.
Abstract: A Bayesian decision theoretic approach is employed to compare sampling schemes designed to estimate the reliability of series and parallel systems by testing individual components. Quadratic loss is assumed and schemes are found which minimize Bayes risk plus sampling cost. Several kinds of initial information concerning the reliability of the individual components, all of which assume the components function independently, are considered. The case for which the initial information is in terms of the system's reliability is briefly considered and related to the aforementioned case.
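For a single component with a conjugate Beta prior, "Bayes risk plus sampling cost" has a closed form that can be minimized over the number of tests. A minimal sketch (my own single-component simplification, with invented prior and cost values, not the paper's system-level schemes):

```python
def bayes_risk(a, b, n):
    """Preposterior Bayes risk (expected posterior variance) for estimating a
    binomial reliability p under squared-error loss with a Beta(a, b) prior
    and n component tests."""
    return a * b / ((a + b) * (a + b + 1) * (a + b + n))

def best_n(a, b, cost_per_test, n_max=1000):
    """Choose the number of component tests minimizing risk + sampling cost."""
    return min(range(n_max + 1),
               key=lambda n: bayes_risk(a, b, n) + cost_per_test * n)

# Beta(2, 2) prior on component reliability, small per-test cost
print(best_n(2, 2, 1e-4))
```

Comparing this optimum across components is the single-component analogue of the paper's comparison of sampling schemes for series and parallel systems.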