
Showing papers on "Sample size determination" published in 1989


Journal Article
Fumio Tajima
30 Oct 1989-Genomics
TL;DR: It is suggested that natural selection against large insertions/deletions is so weak that a large amount of variation is maintained in the population.

11,521 citations


Journal ArticleDOI
TL;DR: Two-stage designs are presented that are optimal in the sense that the expected sample size is minimized when the regimen has low activity, subject to constraints on the type 1 and type 2 error rates.

3,316 citations
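For orientation, the sketch below computes the operating characteristics (probability of early termination, expected sample size, and error rates) of a generic two-stage single-arm design of the kind described above. The design parameters (n1, r1, n, r) and the response rates p0, p1 are illustrative placeholders, not values taken from the paper.

```python
# Sketch: operating characteristics of a two-stage single-arm design
# (in the spirit of the optimal two-stage designs described above).
# The design parameters and response rates below are illustrative.
from scipy.stats import binom

def two_stage_oc(n1, r1, n, r, p):
    """Probability of stopping and declaring the drug inactive, probability of
    early termination (PET), and expected sample size when the true response
    probability is p.
    Stage 1: treat n1 patients; stop (drug inactive) if <= r1 responses.
    Stage 2: treat n - n1 more; declare inactive if <= r total responses."""
    pet = binom.cdf(r1, n1, p)                      # prob. of early termination
    p_inactive = pet
    for x1 in range(r1 + 1, min(n1, r) + 1):
        p_inactive += binom.pmf(x1, n1, p) * binom.cdf(r - x1, n - n1, p)
    en = n1 + (1 - pet) * (n - n1)                  # expected sample size
    return p_inactive, pet, en

p0, p1 = 0.10, 0.30                                 # null and target response rates
n1, r1, n, r = 10, 0, 29, 4                         # a candidate design
inact0, pet0, en0 = two_stage_oc(n1, r1, n, r, p0)
inact1, _, _ = two_stage_oc(n1, r1, n, r, p1)
print(f"type I error  = {1 - inact0:.3f}")          # P(declare active | p = p0)
print(f"type II error = {inact1:.3f}")              # P(declare inactive | p = p1)
print(f"PET(p0) = {pet0:.3f}, E[N | p0] = {en0:.1f}")
```

In practice one would search over (n1, r1, n, r) for the design minimizing E[N | p0] subject to the error constraints; the function above supplies the quantities that search needs.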


Journal ArticleDOI
TL;DR: The statistical literature on tests to compare treatments after the analysis of variance is reviewed, the use of these tests in ecology is examined, and particular strategies are recommended.
Abstract: The statistical literature on tests to compare treatments after the analysis of variance is reviewed, and the use of these tests in ecology is examined. Monte Carlo simulations on normal and lognormal data indicate that many of the tests commonly used are inappropriate or inefficient. Particular tests are recommended for unplanned multiple comparisons on the basis of controlling experimentwise type I error rate and providing maximum power. These include tests for parametric and nonparametric cases, equal and unequal sample sizes, homogeneous and heterogeneous variances, non-independent means (repeated measures or adjusted means), and comparing treatments to a control. Formulae and a worked example are provided. The problem of violations of assumptions, especially variance heterogeneity, was investigated using simulations, and particular strategies are recommended. The advantages and use of planned comparisons in ecology are discussed, and the philosophy of hypothesis testing with unplanned multiple comparisons is considered in relation to confidence intervals and statistical estimation.

1,841 citations
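As a minimal, generic illustration of an experimentwise-error-controlling unplanned comparison after ANOVA (the review above evaluates and recommends specific procedures for unequal sample sizes, heterogeneous variances, and other cases), here is Tukey's HSD applied to simulated data; the group means, sizes, and seed are arbitrary.

```python
# Minimal sketch: one-way ANOVA followed by Tukey's HSD on simulated data.
# Illustrative only; it does not reproduce the review's full recommendations.
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

rng = np.random.default_rng(1)
groups = [rng.normal(loc=mu, scale=1.0, size=12) for mu in (0.0, 0.0, 0.8)]

F, p = f_oneway(*groups)
print(f"one-way ANOVA: F = {F:.2f}, p = {p:.4f}")

res = tukey_hsd(*groups)   # all pairwise comparisons, experimentwise control
print(res)                 # adjusted p-values and confidence intervals
```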


Journal Article
TL;DR: This study demonstrates that it is possible to design a simple method of risk stratification of open-heart surgery patients that makes it feasible to analyze operative results by risk groups and to compare results in similar groups between institutions.
Abstract: The purpose of the study was to devise a method of stratifying open-heart operations into levels of predicted operative mortality, using objective data that are readily available in any hospital. Following univariate regression analysis of 3,500 consecutive operations, 14 risk factors were chosen that met these conditions. A few factors were excluded because they were insufficiently objective or not always available. An additive model was constructed, using the factors chosen, to calculate the probability of mortality within 30 days. The method was then tested prospectively in 1,332 open-heart procedures at the Newark Beth Israel Medical Center. Patients were categorized in five groups of increasing risk: good (0-4%), fair (5-9%), poor (10-14%), high (15-19%), and extremely high (greater than or equal to 20%). The correlation coefficient of anticipated and observed operative mortality, using the additive model, was 0.99. The operative mortality also correlated closely with complication rates and length of hospital stay. The additive model was compared with a second model based on logistic multiple regression; the resulting correlation coefficient was 0.85. The method was also tested at two other hospitals; although their sample sizes were smaller, the outcomes in each risk group were comparable with those at this institution. The collection of data proved to be acceptably simple for all three centers. This study demonstrates that it is possible to design a simple method of risk stratification of open-heart surgery patients that makes it feasible to analyze operative results by risk groups and to compare results in similar groups between institutions. Wider application of the system is recommended.

1,202 citations


Journal ArticleDOI
TL;DR: By this method, no lumping of data is required, and the accuracy of the estimate of alpha (i.e., a type 1 error) depends only on the number of randomizations of the original data set.
Abstract: Significance levels obtained from a χ2 contingency test are suspect when sample sizes are small. Traditionally this has meant that data must be combined. However, such an approach may obscure heterogeneity and hence potentially reduce the power of the statistical test. In this paper, we present a Monte Carlo solution to this problem: by this method, no lumping of data is required, and the accuracy of the estimate of α (i.e., a type 1 error) depends only on the number of randomizations of the original data set. We illustrate this technique with data from mtDNA studies, where numerous genotypes are often observed and sample sizes are relatively small.

948 citations
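The sketch below shows the general idea: estimate the p-value of a contingency-table chi-square statistic by Monte Carlo randomization (holding the margins fixed) instead of relying on asymptotic theory. The example table and the number of randomizations are illustrative, and this is a generic implementation rather than the paper's own code.

```python
# Sketch of a Monte Carlo randomization test for an R x C contingency
# table with small expected counts.  Table and n_rand are illustrative.
import numpy as np
from scipy.stats import chi2_contingency

def chisq_stat(table):
    return chi2_contingency(table, correction=False)[0]

def monte_carlo_chisq(table, n_rand=5000, seed=0):
    rng = np.random.default_rng(seed)
    table = np.asarray(table)
    # expand the table into one row label and one column label per observation
    rows = np.repeat(np.arange(table.shape[0]), table.sum(axis=1))
    cols = np.repeat(np.arange(table.shape[1]), table.sum(axis=0))
    observed = chisq_stat(table)
    count = 0
    for _ in range(n_rand):
        perm = rng.permutation(cols)              # shuffle column labels
        rand_table = np.zeros_like(table)
        np.add.at(rand_table, (rows, perm), 1)    # rebuild a table with same margins
        if chisq_stat(rand_table) >= observed:
            count += 1
    return (count + 1) / (n_rand + 1)             # randomization p-value

# e.g. hypothetical mtDNA genotype counts in two small population samples
table = [[6, 1, 2, 0],
         [2, 3, 0, 4]]
print("Monte Carlo p =", monte_carlo_chisq(table))
```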


Book
01 Dec 1989
TL;DR: This book discusses statistical power in treatment effectiveness research, as well as useful approaches and techniques for improving the statistical power of a study.
Abstract: PART ONE: STATISTICAL POWER IN TREATMENT EFFECTIVENESS RESEARCH: Treatment Effectiveness Research and Design Sensitivity; The Statistical Power Framework; Effect Size: The Problematic Parameter; How to Estimate Statistical Power. PART TWO: USEFUL APPROACHES AND TECHNIQUES: Dependent Measures; Design, Sample Size, and Alpha; The Independent Variable and the Role of Theory; Putting It All Together.

797 citations


Journal ArticleDOI
01 Feb 1989-Genetics
TL;DR: It is shown that two sampling plans whose differences have been stressed by previous authors can be treated in a uniform way and the temporal method is best suited for use with organisms having high juvenile mortality and, perhaps, a limited effective population size.
Abstract: The temporal method for estimating effective population size (Ne) from the standardized variance in allele frequency change (F) is presented in a generalized form. Whereas previous treatments of this method have adopted rather limiting assumptions, the present analysis shows that the temporal method is generally applicable to a wide variety of organisms. Use of a revised model of gene sampling permits a more generalized interpretation of Ne than that used by some other authors studying this method. It is shown that two sampling plans (individuals for genetic analysis taken before or after reproduction) whose differences have been stressed by previous authors can be treated in a uniform way. Computer simulations using a wide variety of initial conditions show that different formulas for computing F have much less effect on Ne than do sample size (S), number of generations between samples (t), or the number of loci studied (L). Simulation results also indicate that (1) bias of F is small unless alleles with very low frequency are used; (2) precision is typically increased by about the same amount with a doubling of S, t, or L; (3) confidence intervals for Ne computed using a chi-square approximation are accurate and unbiased under most conditions; (4) the temporal method is best suited for use with organisms having high juvenile mortality and, perhaps, a limited effective population size.

784 citations
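For concreteness, here is a sketch of one common variant of the temporal estimator: Nei and Tajima's Fc averaged over loci, with the sampling correction Ne ≈ t / (2[F − 1/(2S0) − 1/(2St)]). The paper compares several formulas for F and sampling plans, so this particular choice, the weighting by allele count, and the example frequencies are all assumptions for illustration.

```python
# Sketch of a temporal Ne estimate from allele frequencies sampled t
# generations apart, using one common variant of F (Nei & Tajima's Fc).
import numpy as np

def fc_locus(x0, xt):
    """Fc for one locus: mean over alleles of (x-y)^2 / ((x+y)/2 - x*y)."""
    x0, xt = np.asarray(x0, float), np.asarray(xt, float)
    return np.mean((x0 - xt) ** 2 / ((x0 + xt) / 2.0 - x0 * xt))

def temporal_ne(freqs0, freqst, s0, st, t):
    """freqs0/freqst: lists of allele-frequency arrays, one per locus.
    s0, st: numbers of individuals sampled at the two times; t: generations."""
    n_alleles = np.array([len(f) for f in freqs0])
    f_by_locus = np.array([fc_locus(a, b) for a, b in zip(freqs0, freqst)])
    f_bar = np.average(f_by_locus, weights=n_alleles)    # weight by allele count
    denom = 2.0 * (f_bar - 1.0 / (2 * s0) - 1.0 / (2 * st))
    return t / denom if denom > 0 else np.inf            # no detectable drift

# illustrative data: two loci scored in 50 individuals at each time point
f0 = [np.array([0.60, 0.40]), np.array([0.25, 0.50, 0.25])]
ft = [np.array([0.50, 0.50]), np.array([0.30, 0.45, 0.25])]
print("Ne estimate:", temporal_ne(f0, ft, s0=50, st=50, t=5))
```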


Journal ArticleDOI
TL;DR: Sample size tables that extend the use of Whittemore's formula are presented for epidemiologic studies; simulations show that although the tables can be inaccurate for risk factors having double exponential distributions, they are reasonably adequate for normal and exponential distributions.

Abstract: Sample size tables that extend the use of Whittemore's formula are presented for epidemiologic studies. The tables are easy to use for both simple and multiple logistic regressions. Monte Carlo simulations are performed which show three important results. Firstly, the sample size tables are suitable for studies with either high or low event proportions. Secondly, although the tables can be inaccurate for risk factors having double exponential distributions, they are reasonably adequate for normal distributions and exponential distributions. Finally, the power of a study varies both with the number of events and the number of individuals at risk.

343 citations
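A generic way to cross-check any closed-form sample-size table of this kind is to estimate power for a simple logistic regression by Monte Carlo simulation, as sketched below. The event proportion, slope, sample size, and number of replications are illustrative, and the snippet assumes the statsmodels package for the logistic fit; it does not reproduce the paper's tables or Whittemore's formula.

```python
# Generic Monte Carlo power check for a simple logistic regression with a
# standard normal risk factor.  All numerical values are illustrative.
import numpy as np
import statsmodels.api as sm

def simulated_power(n, beta0, beta1, n_sim=500, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        x = rng.normal(size=n)                          # risk factor
        p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))  # logistic model
        y = rng.binomial(1, p)
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        if fit.pvalues[1] < alpha:                      # Wald test of the slope
            hits += 1
    return hits / n_sim

# beta0 = log(0.1/0.9) gives roughly a 10% event proportion at x = 0
print("power:", simulated_power(n=300, beta0=np.log(1 / 9), beta1=0.5))
```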


Journal ArticleDOI
TL;DR: The effect of finite sample-size on parameter estimates and their subsequent use in a family of functions are discussed, and an empirical approach is presented to enable asymptotic performance to be accurately estimated using a very small number of samples.
Abstract: The effect of finite sample-size on parameter estimates and their subsequent use in a family of functions are discussed. General and parameter-specific expressions for the expected bias and variance of the functions are derived. These expressions are then applied to the Bhattacharyya distance and the analysis of the linear and quadratic classifiers, providing insight into the relationship between the number of features and the number of training samples. Because of the functional form of the expressions, an empirical approach is presented to enable asymptotic performance to be accurately estimated using a very small number of samples. Results were experimentally verified using artificial data in controlled cases and using real, high-dimensional data.

337 citations
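The simulation below illustrates the underlying phenomenon: the plug-in estimate of the Bhattacharyya distance between two Gaussian classes is biased when estimated from few training samples, and the bias shrinks as the training size grows. It uses the standard closed form for Gaussian class-conditional densities; the dimension, true parameters, and sample sizes are illustrative, and the paper's analytic bias expressions are not reproduced here.

```python
# Sketch: finite-sample bias of the plug-in Bhattacharyya distance between
# two Gaussian classes.  Parameters and sample sizes are illustrative.
import numpy as np

def bhattacharyya_gauss(m1, S1, m2, S2):
    Sm = (S1 + S2) / 2.0
    d = m2 - m1
    term1 = d @ np.linalg.solve(Sm, d) / 8.0
    term2 = 0.5 * np.log(np.linalg.det(Sm) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2

rng = np.random.default_rng(0)
p = 8                                         # number of features
mu1, mu2 = np.zeros(p), np.full(p, 0.5)
Sigma = np.eye(p)
true_B = bhattacharyya_gauss(mu1, Sigma, mu2, Sigma)

for n in (20, 50, 200, 1000):                 # training samples per class
    est = []
    for _ in range(200):
        X1 = rng.multivariate_normal(mu1, Sigma, size=n)
        X2 = rng.multivariate_normal(mu2, Sigma, size=n)
        est.append(bhattacharyya_gauss(X1.mean(0), np.cov(X1.T),
                                       X2.mean(0), np.cov(X2.T)))
    print(f"n = {n:4d}: mean estimate = {np.mean(est):.3f} (true {true_B:.3f})")
```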


Journal ArticleDOI
TL;DR: This article should serve as a useful guide for MIS researchers in the planning, execution, and interpretation of inferential statistical analyses.

Abstract: Statistical power is a topic of importance to any researcher using statistical inference testing. Studies with low levels of statistical power usually result in inconclusive findings, even though the researcher may have expended much time and effort gathering the data for analysis. A survey of the statistical power of articles employing statistical inference testing published in leading MIS journals shows that their statistical power is, on average, substantially below accepted norms. The consequence of this low power is that MIS researchers typically have a 40 percent chance of not detecting the phenomenon under study, even though it, in fact, may exist. Fortunately, there are several techniques, beyond expanding the sample size (which often may be impossible), that researchers can use to improve the power of their studies. Some are as easy as using a different but more powerful statistical test, while others require developing more elaborate sampling plans or a more careful construction of the research design. Attention to the statistical power of a study is one key ingredient in assuring the success of the study. This article should serve as a useful guide for MIS researchers in the planning, execution, and interpretation of inferential statistical analyses.

324 citations


Journal ArticleDOI
TL;DR: In this article, an alternative measure of goodness-of-fit, based like Akaike's on the noncentrality parameter, appears to be consistent over variations in sample size.
Abstract: Akaike's Information Criterion is systematically dependent on sample size, and therefore cannot be used in practice as a basis for model selection. An alternative measure of goodness-of-fit, based like Akaike's on the noncentrality parameter, appears to be consistent over variations in sample size.
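The numerical illustration below conveys the sample-size argument: for a misspecified covariance-structure model with a fixed population discrepancy F0, the model chi-square grows roughly as df + (N − 1)F0, so a chi-square-based AIC analogue grows with N while a noncentrality-based index stabilizes. The index shown here is the widely used RMSEA, chosen purely for illustration; it is not necessarily the specific index proposed in this article, and the values of df, q, and F0 are assumptions.

```python
# Illustration: a chi-square-based AIC analogue grows with sample size for a
# misspecified model, while a noncentrality-based index (here RMSEA) does not.
import math

df, q, F0 = 20, 10, 0.05        # degrees of freedom, free parameters, discrepancy
for N in (100, 400, 1600):
    chi2 = df + (N - 1) * F0    # expected chi-square under misspecification
    aic_like = chi2 + 2 * q     # chi-square-based AIC analogue: grows with N
    rmsea = math.sqrt(max(chi2 - df, 0.0) / (df * (N - 1)))
    print(f"N = {N:5d}: chi2 = {chi2:7.1f}  AIC-type = {aic_like:7.1f}  "
          f"RMSEA = {rmsea:.3f}")
```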

Journal ArticleDOI
P. R. Freeman
TL;DR: It is concluded that the two-stage analysis for analysing the data from a two-treatment, two-period crossover trial is too potentially misleading to be of practical use.
Abstract: In the two-treatment, two-period crossover trial, patients are randomly allocated either to one group that receives treatment A followed by treatment B, or to another group that receives the treatments in the reverse order. Grizzle first proposed a two-stage procedure for analysing the data from such a trial. This paper examines the long-run sampling properties of this procedure, in terms of mean square error of point estimates, coverage probability of confidence intervals and actual significance level of hypothesis tests for the differences between the effects of the two treatments. The advantages of incorporating baseline observations into the analysis are also explored. Because the preliminary test for carryover is highly correlated with the analysis of data from the first period only, actual significance levels are higher than nominal levels even when there is no differential carryover. When carryover is present, the nominal level very seriously understates the actual level, and this becomes even worse when baseline observations are ignored. Increasing sample size only exacerbates the problem since this adverse behaviour then occurs at smaller values of the carryover effect. It is concluded that the two-stage analysis is too potentially misleading to be of practical use.
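The simulation sketch below estimates the actual significance level of a Grizzle-type two-stage crossover analysis under the null of no treatment effect and no carryover, which is the kind of long-run property the paper examines. The per-sequence sample size, variance components, 10% level for the preliminary carryover test, and use of simple two-sample t tests are all illustrative assumptions rather than the paper's exact setup.

```python
# Simulation sketch of the two-stage (Grizzle-type) AB/BA crossover analysis
# under H0 (no treatment effect, no carryover): estimate the actual level.
import numpy as np
from scipy.stats import ttest_ind

def one_trial(n_per_seq, sd_between, sd_within, rng):
    subj = rng.normal(0, sd_between, size=(2, n_per_seq))      # subject effects
    y1 = subj + rng.normal(0, sd_within, size=(2, n_per_seq))  # period 1
    y2 = subj + rng.normal(0, sd_within, size=(2, n_per_seq))  # period 2
    # stage 1: preliminary carryover test on subject totals, level 0.10
    totals = y1 + y2
    p_carry = ttest_ind(totals[0], totals[1]).pvalue
    if p_carry >= 0.10:
        diffs = y1 - y2                            # within-subject crossover test
        p_trt = ttest_ind(diffs[0], diffs[1]).pvalue
    else:
        p_trt = ttest_ind(y1[0], y1[1]).pvalue     # first-period data only
    return p_trt < 0.05

rng = np.random.default_rng(0)
n_sim = 2000
rejections = sum(one_trial(12, sd_between=2.0, sd_within=1.0, rng=rng)
                 for _ in range(n_sim))
print(f"actual significance level ~ {rejections / n_sim:.3f} (nominal 0.05)")
```

Because the preliminary carryover test is correlated with the first-period analysis, the estimated level typically exceeds the nominal 0.05, which is the behavior the paper describes.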

Journal ArticleDOI
TL;DR: The mathematical framework applicable when a multivariate normal distribution can be assumed is reviewed, a method for calculating exact power and sample sizes using a series expansion for the distribution of the multiple correlation coefficient is described and Cohen's approximations are described.
Abstract: This article discusses power and sample size calculations for observational studies in which the values of the independent variables cannot be fixed in advance but are themselves outcomes of the study. It reviews the mathematical framework applicable when a multivariate normal distribution can be assumed and describes a method for calculating exact power and sample sizes using a series expansion for the distribution of the multiple correlation coefficient. A table of exact sample sizes for level .05 tests is provided. Approximations to the exact power are discussed, most notably those of Cohen (1977). A rigorous justification of Cohen's approximations is given. Comparisons with exact answers show that the approximations are quite accurate in many situations of practical interest. More extensive tables and a computer program for exact calculations can be obtained from the authors.
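For reference, the snippet below computes Cohen-style approximate power for the overall test that the population multiple correlation is zero, using the fixed-predictor noncentral-F approximation that the article evaluates against its exact method. The values of R², the number of predictors k, and n are illustrative; the exact calculation for random multivariate normal predictors is not reproduced here.

```python
# Cohen-style (fixed-predictor, noncentral F) approximation to the power of
# the overall test of R^2 = 0 in multiple regression.  Values illustrative.
from scipy.stats import f, ncf

def approx_power(R2, k, n, alpha=0.05):
    u, v = k, n - k - 1                  # numerator / denominator df
    f2 = R2 / (1.0 - R2)                 # Cohen's effect size f^2
    lam = f2 * (u + v + 1)               # noncentrality parameter
    fcrit = f.ppf(1 - alpha, u, v)
    return ncf.sf(fcrit, u, v, lam)      # P(F > fcrit | noncentrality lam)

print("power:", round(approx_power(R2=0.10, k=3, n=100), 3))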

Journal ArticleDOI
TL;DR: A two-stage design is proposed in which patients are first randomized among the experimental treatments, and the single treatment having the highest observed success rate is identified, and if this highest rate falls below a fixed cutoff then the trial is terminated.
Abstract: In clinical trials where several experimental treatments are of interest, the goal may be viewed as identification of the best of these and comparison of that treatment to a standard control therapy. However, it is undesirable to commit patients to a large-scale comparative trial of a new regimen without evidence that its therapeutic success rate is acceptably high. We propose a two-stage design in which patients are first randomized among the experimental treatments, and the single treatment having the highest observed success rate is identified. If this highest rate falls below a fixed cutoff then the trial is terminated. Otherwise, the "best" new treatment is compared to the control at a second stage. Locally optimal values of the cutoff and the stage-1 and stage-2 sample sizes are derived by minimizing expected total sample size. The design has both high power and high probability of terminating early when no experimental treatment is superior to the control. Numerical results for implementing the design are presented, and comparison to Dunnett's (1984, in Design of Experiments: Ranking and Selection, T. J. Santner and A. C. Tamhane (eds), 47-66; New York: Marcel Dekker) optimal one-stage procedure is made.

Journal ArticleDOI
TL;DR: In this paper, the authors provide empirical evidence that this underestimation phenomenon is extreme for certain sample size formulas based on confidence interval width, and they also discuss common sample size models that consider statistical power.
Abstract: One concern in the early stages of study planning and design is the minimum sample size needed to provide statistically credible results. This minimum sample size is usually determined via the use of simple formulas or, equivalently, from tables. The more popular formulas, however, involve large-sample approximations and hence may underestimate required sample sizes. This article provides empirical evidence indicating that this underestimation phenomenon is extreme for certain sample size formulas based on confidence interval width. Common sample size formulas that consider statistical power are also discussed; these are shown to perform quite well, even for small sample size situations.
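The underestimation described above is easy to demonstrate: the large-sample formula n = (z·σ/d)² for an interval of half-width d treats σ as known, but with σ estimated the t-based interval achieves that width only part of the time. The sketch below is a generic demonstration with illustrative values of α, σ, and d, not the paper's own study.

```python
# Demonstration: the large-sample sample-size formula for a CI of half-width d
# understates the n needed once sigma is estimated and a t interval is used.
import numpy as np
from scipy.stats import norm, t

alpha, sigma, d = 0.05, 1.0, 0.5
z = norm.ppf(1 - alpha / 2)
n = int(np.ceil((z * sigma / d) ** 2))        # large-sample formula
print("formula sample size:", n)

rng = np.random.default_rng(0)
tcrit = t.ppf(1 - alpha / 2, n - 1)
half_widths = np.array([
    tcrit * rng.normal(0, sigma, n).std(ddof=1) / np.sqrt(n)
    for _ in range(10000)
])
print("P(half-width <= d) with that n:",
      np.round(np.mean(half_widths <= d), 3))
```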

Journal ArticleDOI
TL;DR: What sample size n is needed to have 1 - beta chance of detecting a magnitude delta response to pollution impact?
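One standard normal-approximation answer to that question, for a two-sample comparison of means, is sketched below: the sample size per group needed to detect a shift of size delta with power 1 − beta at two-sided level alpha, given a standard deviation sigma. The numerical values are illustrative, and impact-assessment designs may call for other formulas.

```python
# Normal-approximation sample size per group for detecting a mean shift of
# size delta with power 1 - beta at two-sided level alpha.  Values illustrative.
import math
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, beta=0.20):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(1 - beta)
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

print(n_per_group(delta=1.0, sigma=2.0))   # -> 63 samples per group
```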


Journal ArticleDOI
TL;DR: A generalized test of the hypothesis that observed changes in allele frequency can be satisfactorily explained by drift follows directly from the model, and simulation results indicate that the true α level of this adjusted test is close to the nominal one under most conditions.
Abstract: Although standard statistical tests (such as contingency chi-square or G tests) are not well suited to the analysis of temporal changes in allele frequencies, they continue to be used routinely in this context. Because the null hypothesis stipulated by the test is violated if samples are temporally spaced, the true probability of a significant test statistic will not equal the nominal α level, and conclusions drawn on the basis of such tests can be misleading. A generalized method, applicable to a wide variety of organisms and sampling schemes, is developed here to estimate the probability of a significant test statistic if the only forces acting on allele frequencies are stochastic ones (i.e., sampling error and genetic drift). Results from analyses and simulations indicate that the rate at which this probability increases with time is determined primarily by the ratio of sample size to effective population size. Because this ratio differs considerably among species, the seriousness of the error in using the standard test will also differ. Bias is particularly strong in cases in which a high percentage of the total population can be sampled (for example, endangered species). The model used here is also applicable to the analysis of parent-offspring data and to comparisons of replicate samples from the same generation. A generalized test of the hypothesis that observed changes in allele frequency can be satisfactorily explained by drift follows directly from the model, and simulation results indicate that the true α level of this adjusted test is close to the nominal one under most conditions.

Journal ArticleDOI
TL;DR: A one-parameter family of symmetric one-sided group sequential designs that are nearly fully efficient in terms of the average sample number and extended to a two-sided hypothesis test are presented.
Abstract: In Phase III clinical trials, ethical considerations often demand interim analyses in order that the better treatment be made available to all patients as soon as possible. Group sequential test designs that do not treat the hypotheses symmetrically may not fully address this concern since early termination of the study may be easier under one of the hypotheses. We present a one-parameter family of symmetric one-sided group sequential designs that are nearly fully efficient in terms of the average sample number. The symmetric tests are then extended to a two-sided hypothesis test. These symmetric two-sided group sequential tests are found to have improved overall efficiency when compared to the tests proposed by Pocock (1977, Biometrika 64, 191-199) and O'Brien and Fleming (1979, Biometrics 35, 549-556). Tables of critical values for both one-sided and two-sided symmetric designs are provided, thus allowing easy determination of sample sizes and stopping boundaries for a group sequential test. Approximate tests based on these designs are proposed for use when the number and timing of analyses are random.

Journal ArticleDOI
TL;DR: It is shown that for distributions satisfying mild regularity conditions, if attention is restricted to test statistics that are monotone nondecreasing functions of Si, then regardless of their covariance structure the min test is an optimal alpha-level test.
Abstract: We consider the problem of testing whether an identified treatment is better than each of K treatments. Suppose there are univariate test statistics Si that contrast the identified treatment with treatment i for i = 1, 2, ..., K. The min test is defined to be the α-level procedure that rejects the null hypothesis that the identified treatment is not best when, for all i, Si rejects the one-sided hypothesis, at the α-level, that the identified treatment is not better than the ith treatment. In the normal case where Si are t statistics the min test is the likelihood ratio test. For distributions satisfying mild regularity conditions, if attention is restricted to test statistics that are monotone nondecreasing functions of Si, then regardless of their covariance structure the min test is an optimal α-level test. Tables of the sample size needed to achieve power .50, .80, .90, and .95 are given for the min test when the Si are Student's t and Wilcoxon.
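A minimal implementation of the min test with Student's t statistics is sketched below: the identified treatment is declared better than all K others only if every one-sided two-sample t test rejects at level alpha. The simulated data, group sizes, and seed are illustrative.

```python
# Minimal sketch of the min test with two-sample t statistics: reject the
# null only if every one-sided comparison rejects at level alpha.
import numpy as np
from scipy.stats import ttest_ind

def min_test(identified, others, alpha=0.05):
    pvals = [ttest_ind(identified, grp, alternative='greater').pvalue
             for grp in others]
    return max(pvals) < alpha, pvals      # reject iff all one-sided tests do

rng = np.random.default_rng(0)
best = rng.normal(1.0, 1.0, 30)                      # identified treatment
others = [rng.normal(0.0, 1.0, 30) for _ in range(3)]  # K = 3 comparators
reject, pvals = min_test(best, others)
print("p-values:", np.round(pvals, 4), "-> reject:", reject)
```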

Book ChapterDOI
TL;DR: The results indicate that the decision-making process in sampling must be viewed as a flexible exercise, dictated not by generalized recommendations but by specific objectives: there is no panacea in ecological sampling.
Abstract: In this paper we emphasize that sampling decisions in population and community ecology are context dependent. Thus, the selection of an appropriate sampling procedure should follow directly from considerations of the objectives of an investigation. We recognize eight sampling alternatives, which arise as a result of three basic dichotomies: parameter estimation versus pattern detection, univariate versus multivariate, and a discrete versus continuous sampling universe. These eight alternative sampling procedures are discussed as they relate to decisions regarding the required empirical sample size, the selection or arrangement of sampling units, and plot size and shape. Our results indicate that the decision-making process in sampling must be viewed as a flexible exercise, dictated not by generalized recommendations but by specific objectives: there is no panacea in ecological sampling. We also point to a number of unresolved sampling problems in ecology.

Journal ArticleDOI
TL;DR: There are striking trends for each endpoint, with small studies appearing to possess large treatment effects and large studies possessing relatively small effects, and it is believed that these differences are primarily due to publication bias.
Abstract: The potential magnitude of publication bias has been examined with a consecutive sample of published cancer clinical trials. The analysis is based on the premise that the magnitude of the true treatment effect is unrelated to design features of the study, in particular sample size. This assumption permits an analysis based only on published studies. Three primary endpoints are examined: overall patient survival, disease-free survival, and tumor response rate. There are striking trends for each endpoint, with small studies appearing to possess large treatment effects and large studies possessing relatively small effects. It is believed that these differences are primarily due to publication bias. The bias is very large: absolute differences observed were 41% for overall survival, 79% for disease-free survival, and 17% for response rates. Other study features have been examined that might be associated with bias, or that might be responsible for the striking trends regarding sample size. The result ...

Journal ArticleDOI
01 Dec 1989-Ecology
TL;DR: Observed sighting distributions are compared with theoretical ones to examine the effects of model and sampling biases, and the technique successfully estimated populations of badgers, bison, and crested porcupines.

Abstract: The use of capture–resight data for population estimation has seldom been exploited. It offers potential flexibility and advantages to the design of biological investigations in which a population estimate is required. Presently, the Petersen model is the only method for estimating closed populations using capture–resight data. A simple Monte Carlo simulation method can lead to a full probability distribution for the population. From this probability distribution, one can compute maximum likelihood estimates and a likelihood interval on the population. The shape and asymmetry of the distribution and width of likelihood intervals are determined by sampling heterogeneity and sample size. The method is simple and can be used by anyone with access to a microcomputer. Since it is data-intensive, estimates based on small data sets (including capture–recapture) with few animals can be quickly calculated. The method is especially applicable to species and habitats in which capture–resight, radiotelemetry, or other tracking data can be obtained and to situations in which nonrandom catchability or sightability is likely after the initial capture. The technique successfully estimated populations of badgers, bison, and crested porcupines. We compare observed with theoretical sighting distributions to examine the effects of model and sampling biases.
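As a simplified single-occasion version of the idea, the sketch below profiles the hypergeometric likelihood of the population size N given M marked animals and a resighting sample of n animals containing m marked, yielding a maximum likelihood estimate and an approximate likelihood interval. The counts are illustrative, and the Monte Carlo handling of multiple resighting occasions and sampling heterogeneity described above is not reproduced here.

```python
# Simplified sketch: likelihood profile for population size N from a single
# Petersen-type capture-resight sample (M marked, n sighted, m of them marked).
import numpy as np
from scipy.stats import hypergeom

M, n, m = 50, 60, 12                          # illustrative counts
N_grid = np.arange(max(M, n, M + n - m), 2001)
# scipy parameterization: hypergeom(pop size, # marked, # drawn)
loglik = hypergeom.logpmf(m, N_grid, M, n)

mle = N_grid[np.argmax(loglik)]
# approximate 95% likelihood interval: log-likelihood within 1.92 of the maximum
inside = N_grid[loglik >= loglik.max() - 1.92]
print(f"N_hat = {mle}, 95% likelihood interval ~ ({inside.min()}, {inside.max()})")
```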

Journal ArticleDOI
TL;DR: Test-retest reliability data gathered from 79 sources (34 separate studies) were analyzed by a multiple-regression method in an attempt to estimate the effects of several factors on the temporal stability of individually tested intelligence.
Abstract: Test-retest reliability data gathered from 79 sources (34 separate studies) were analyzed by a multiple-regression method in an attempt to estimate the effects of several factors on the temporal stability of individually tested intelligence. Five intelligence tests were examined: the Stanford-Binet (except the fourth edition), the WISC, the WISC-R, the WAIS, and the WAIS-R. Samples encompassed a wide range of subjects divergent on status, age, and sample size. Subject age and status, gender, and test-retest interval were evaluated, and age and interval were found to be significant predictors of reliability. Subject sex and specific instrument were not found to have a significant effect on reliability. A summary table provides expected reliability coefficients, standard error, and percent of persons with IQ change in excess of 15 points, tabulated for combinations of each of the two predictors.

Journal ArticleDOI
TL;DR: In this article, the authors investigate estimation of the parameter, K, of the negative binomial distribution for small samples, using a method-of-moments estimate (MME) and a maximum quasi-likelihood estimate (MQLE).
Abstract: We investigate estimation of the parameter, K, of the negative binomial distribution for small samples, using a method-of-moments estimate (MME) and a maximum quasi-likelihood estimate (MQLE). Previous work is reviewed; the importance of indirect estimation of K through its reciprocal, a, and of allowance for negative estimates of K (or a) are discussed. Samples of size 50 are simulated 10,000 times for each of several parameter combinations to examine the properties of the estimates. Samples of sizes 10, 20, 30, and 50 are simulated 1,000 times to investigate the effect of sample size. Both estimators perform reasonably well except when the mean is small and the sample size does not exceed 20. Three examples are given, one of a designed experiment, for which the MQLE is especially suited; confidence limits are derived for the MQLE. Further work along these lines is required for adequate assessment of the usual maximum likelihood estimate.
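The method-of-moments estimate discussed above is a one-liner, and the small-sample difficulty is visible directly: when the sample variance does not exceed the mean, the estimate of a = 1/K is zero or negative. The sketch below shows the MME only; the maximum quasi-likelihood estimate studied in the paper is not reproduced, and the simulated sample is illustrative.

```python
# Method-of-moments estimate of the negative binomial parameter K via its
# reciprocal a = 1/K.  For NB data, var = mean + mean^2 / K.
import numpy as np

def mme_k(counts):
    counts = np.asarray(counts, float)
    mean, var = counts.mean(), counts.var(ddof=1)
    a_hat = (var - mean) / mean ** 2         # MME of a = 1/K
    k_hat = np.inf if a_hat <= 0 else 1.0 / a_hat   # undefined if var <= mean
    return k_hat, a_hat

rng = np.random.default_rng(0)
# simulate a small sample with K = 2 and mean 5 (p = K / (K + mean))
sample = rng.negative_binomial(n=2, p=2 / 7, size=20)
print("K_hat, a_hat =", mme_k(sample))
```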

Journal ArticleDOI
TL;DR: This article showed that using an intensive design and the slope of response on time as the outcome measure maximizes sample retention and decreases within-group variability, thus, maximizing the power of test procedures without requiring increased sample sizes.
Abstract: Soft data are defined as measures having substantial intrasubject variability due to errors of measurement or to the inconsistency of subjects' responses. Such data are often important measures of response in randomized clinical trials. In this context, we show that using an intensive design and the slope of response on time as the outcome measure (a) maximizes sample retention and (b) decreases within-group variability, thus (c) maximizing the power of test procedures without requiring increased sample sizes.

Journal ArticleDOI
TL;DR: A method is presented for sample size determination based on the premise that a confidence interval for a simple mean, or for the difference between two means, with normally distributed data is to be used, and a concept of power relevant to confidence intervals is given.
Abstract: Sample size determination is usually based on the premise that a hypothesis test is to be used. A confidence interval can sometimes serve better than a hypothesis test. In this paper a method is presented for sample size determination based on the premise that a confidence interval for a simple mean, or for the difference between two means, with normally distributed data is to be used. For this purpose, a concept of power relevant to confidence intervals is given. Some useful tables giving required sample size using this method are also presented.
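One way to operationalize a "power" concept for confidence intervals, in the single-mean case, is sketched below: choose the smallest n such that the half-width of the t-based interval, t·S/√n, is at most d with a specified probability, using the chi-square distribution of the sample variance. This need not match the paper's method or tables exactly, and sigma, d, and the target probability are illustrative.

```python
# Smallest n such that a (1 - alpha) t interval for a normal mean has
# half-width <= d with probability >= prob, given true sigma.
from scipy.stats import t, chi2

def ci_sample_size(sigma, d, alpha=0.05, prob=0.80, n_max=10000):
    for n in range(2, n_max):
        tcrit = t.ppf(1 - alpha / 2, n - 1)
        # half-width = tcrit * S / sqrt(n), with (n-1) S^2 / sigma^2 ~ chi2_{n-1}
        p = chi2.cdf(n * (n - 1) * d ** 2 / (tcrit ** 2 * sigma ** 2), n - 1)
        if p >= prob:
            return n
    return None

print("required n:", ci_sample_size(sigma=1.0, d=0.3, prob=0.80))
```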

Journal ArticleDOI
TL;DR: In this paper, Monte Carlo simulation is used to assess the statistical properties of some Bayes procedures in situations where only a few data on a system governed by a NHPP (nonhomogeneous Poisson process) can be collected and where there is little or imprecise prior information available.
Abstract: Monte Carlo simulation is used to assess the statistical properties of some Bayes procedures in situations where only a few data on a system governed by a NHPP (nonhomogeneous Poisson process) can be collected and where there is little or imprecise prior information available. In particular, in the case of failure truncated data, two Bayes procedures are analyzed. The first uses a uniform prior PDF (probability distribution function) for the power law and a noninformative prior PDF for alpha, while the other uses a uniform PDF for the power law while assuming an informative PDF for the scale parameter obtained by using a gamma distribution for the prior knowledge of the mean number of failures in a given time interval. For both cases, point and interval estimation of the power law and point estimation of the scale parameter are discussed. Comparisons are given with the corresponding point and interval maximum-likelihood estimates for sample sizes of 5 and 10. The Bayes procedures are computationally much more onerous than the corresponding maximum-likelihood ones, since they in general require a numerical integration. In the case of small sample sizes, however, their use may be justified by the exceptionally favorable statistical properties shown when compared with the classical ones. In particular, their robustness with respect to a wrong assumption on the prior beta mean is interesting.

Journal ArticleDOI
TL;DR: In this article, the authors compare parameter estimates from the proportional hazards model, the cumulative logistic model and a new modified logistic approach (referred to as the person-time logistic models) with the use of simulated data sets and with the following quantities varied: disease incidence, risk factor strength, length of follow-up, proportion censored, non-proportional hazards, and sample size.
Abstract: We compare parameter estimates from the proportional hazards model, the cumulative logistic model and a new modified logistic model (referred to as the person-time logistic model), with the use of simulated data sets and with the following quantities varied: disease incidence, risk factor strength, length of follow-up, the proportion censored, non-proportional hazards, and sample size. Parameter estimates from the person-time logistic regression model closely approximated those from the Cox model when the survival time distribution was close to exponential, but could differ substantially in other situations. We found parameter estimates from the cumulative logistic model similar to those from the Cox and person-time logistic models when the disease was rare, the risk factor moderate, and censoring rates similar across the covariates. We also compare the models with analysis of a real data set that involves the relationship of age, race, sex, blood pressure, and smoking to subsequent mortality. In this example, the length of follow-up among survivors varied from 5 to 14 years and the Cox and person-time logistic approaches gave nearly identical results. The cumulative logistic results had somewhat larger p-values but were substantively similar for all but one coefficient (the age-race interaction). The latter difference reflects differential censoring rates by age, race and sex.

Journal ArticleDOI
TL;DR: Group sequential testing for randomized clinical trials designed with multiple endpoints is considered in this paper, where the authors show that the sample size required when a trial is designed using more than one endpoint is smaller than the sample size based on any single endpoint, if the two calculations are undertaken using matching significance levels and powers.
Abstract: Group sequential testing for randomized clinical trials designed with multiple endpoints is considered. Previously computed tables for single endpoints are still useful, with a change in interpretation of certain parameters. The advantage in setting sample size based on multiple endpoints is that the sample size required when a trial is designed using more than one endpoint is smaller than the sample size based on any single endpoint, if the two calculations are undertaken using matching significance levels and powers. An example is provided, and possible extensions of the work are discussed.