
Showing papers on "Sample size determination" published in 1982


Journal ArticleDOI
TL;DR: In this paper, the authors reviewed the necessary weighting factors for gridded data and the sampling errors incurred when too small a sample is available, and a rule of thumb indicating when an EOF is likely to be subject to large sampling fluctuations is presented.
Abstract: Empirical Orthogonal Functions (EOF's), eigenvectors of the spatial cross-covariance matrix of a meteorological field, are reviewed with special attention given to the necessary weighting factors for gridded data and the sampling errors incurred when too small a sample is available. The geographical shape of an EOF shows large intersample variability when its associated eigenvalue is “close” to a neighboring one. A rule of thumb indicating when an EOF is likely to be subject to large sampling fluctuations is presented. An explicit example, based on the statistics of the 500 mb geopotential height field, displays large intersample variability in the EOF's for sample sizes of a few hundred independent realizations, a size seldom exceeded by meteorological data sets.

2,793 citations
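
A minimal sketch of the EOF calculation and the sampling-error rule of thumb discussed above, assuming gridded anomalies weighted by the square root of the cosine of latitude and eigenvalue sampling errors of order lambda*sqrt(2/N); the function and variable names are illustrative, not the authors' code.

import numpy as np

def eof_analysis(anomalies, lats):
    """anomalies: (n_samples, n_lat, n_lon) anomalies from the time mean.
    lats: latitudes in degrees for the grid rows."""
    n, nlat, nlon = anomalies.shape
    # Area weighting: grid boxes shrink toward the poles, so weight by sqrt(cos(lat)).
    w = np.sqrt(np.cos(np.deg2rad(lats)))[None, :, None]
    x = (anomalies * w).reshape(n, nlat * nlon)
    cov = x.T @ x / (n - 1)                    # spatial cross-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # EOFs are the eigenvectors
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Rule of thumb: the sampling error of an eigenvalue is roughly lambda*sqrt(2/N);
    # an EOF is suspect when this error is comparable to the gap to its neighbour.
    err = eigvals * np.sqrt(2.0 / n)
    gap = np.abs(np.diff(eigvals, append=0.0))
    degenerate = err >= gap
    return eigvals, eigvecs, degenerate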


Journal ArticleDOI
TL;DR: The performance of four rules for determining the number of components to retain (Kaiser's eigenvalue greater than unity, Cattell's SCREE, Bartlett's test, and Velicer's MAP) was investigated across four systematically varied factors.
Abstract: The performance of four rules for determining the number of components to retain (Kaiser's eigenvalue greater than unity, Cattell's SCREE, Bartlett's test, and Velicer's MAP) was investigated across four systematically varied factors (sample size, number of variables, number of components, and component saturation). Ten sample correlation matrices were generated from each of 48 known population correlation matrices representing the combinations of conditions. The performance of the SCREE and MAP rules was generally the best across all situations. Bartlett's test was generally adequate except when the number of variables was close to the sample size. Kaiser's rule tended to severely overestimate the number of components.

532 citations
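
As a hedged illustration of the simplest of the four rules compared above, Kaiser's criterion retains as many components as there are eigenvalues of the sample correlation matrix exceeding one; the function name and the example data are illustrative only.

import numpy as np

def kaiser_rule(data):
    """Number of components with correlation-matrix eigenvalues > 1 (Kaiser's rule)."""
    r = np.corrcoef(data, rowvar=False)      # sample correlation matrix
    eigvals = np.linalg.eigvalsh(r)
    return int(np.sum(eigvals > 1.0))

# Example: 10 variables, n = 150. As the study above reports, this rule tends
# to overestimate the number of components.
rng = np.random.default_rng(0)
x = rng.standard_normal((150, 10))
print(kaiser_rule(x))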


Journal ArticleDOI
TL;DR: In this article, a simulation study of the effects of sample size on the overall fit statistic provided by the LISREL program is presented, showing that the statistic is well behaved over a wide range of sample sizes for simple mod...
Abstract: A simulation study of the effects of sample size on the overall fit statistic provided by the LISREL program indicates the statistic is well behaved over a wide range of sample sizes for simple mod...

511 citations


Journal ArticleDOI
TL;DR: A test of the assumption of multivariate normality and methods for the detection of outlying families and outlying individuals are introduced and a method for the estimation of effects of measured genetic markers as variance components is introduced.
Abstract: Lange, Westlake & Spence (1976) used the assumption of multivariate normality to apply a likelihood method to the analysis of quantitative traits measured over pedigrees. We now introduce a test of the assumption of multivariate normality and methods for the detection of outlying families and outlying individuals. We also introduce a method for the estimation of effects of measured genetic markers as variance components, a flexible parameterization to estimate effects of shared family environment, and a method to allow for the ascertainment of pedigrees through probands. These innovations have been applied using numerical methods for maximization of the likelihood. Simulation studies and available theory suggest that the likelihood ratio criterion used in significance testing follows the expected asymptotic distribution with sample sizes encountered in typical applications.

509 citations


Journal Article
Don H. Card1
TL;DR: In this article, it is shown how one can use knowledge of map-category relative sizes to improve estimates of various probabilities, by means of two simple sampling plans suggested in the accuracy assessment literature.
Abstract: By means of two simple sampling plans suggested in the accuracy-assessment literature, it is shown how one can use knowledge of map-category relative sizes to improve estimates of various probabilities. The fact that maximum likelihood estimates of cell probabilities for the simple random sampling and map category-stratified sampling were identical has permitted a unified treatment of the contingency-table analysis. A rigorous analysis of the effect of sampling independently within map categories is made possible by results for the stratified case. It is noted that such matters as optimal sample size selection for the achievement of a desired level of precision in various estimators are irrelevant, since the estimators derived are valid irrespective of how sample sizes are chosen.

359 citations
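
A sketch of how known map-category relative sizes can be folded into the cell-probability estimates under sampling stratified by map category; the estimator form shown (pi_j * n_ij / n_j) is an assumption based on standard accuracy-assessment practice, and the example numbers are illustrative.

import numpy as np

def cell_probabilities(confusion, map_proportions):
    """confusion[i, j]: sample count with true class i and map class j,
    sampled by stratifying on map class j.
    map_proportions[j]: known relative area of map class j (sums to 1)."""
    confusion = np.asarray(confusion, dtype=float)
    n_j = confusion.sum(axis=0)                   # sample size per map class
    p_hat = confusion / n_j * map_proportions     # p_ij = pi_j * n_ij / n_j
    overall_accuracy = np.trace(p_hat)            # sum_j pi_j * n_jj / n_j
    return p_hat, overall_accuracy

# Illustrative 2-class example with map classes covering 70% and 30% of the map.
conf_matrix = [[45, 4],
               [5, 46]]
print(cell_probabilities(conf_matrix, np.array([0.7, 0.3])))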


Journal ArticleDOI
TL;DR: This paper quantifies the effects of using batch sizes larger than necessary to satisfy normality and independence assumptions, finding that the effects of using fewer than 10 batches are large while the effects of using more than 30 batches are small.
Abstract: Batching is a commonly used method for calculating confidence intervals on the mean of a sequence of correlated observations arising from a simulation experiment. Several recent papers have considered the effect of using batch sizes too small to satisfy assumptions of normality and/or independence, and the resulting incorrect probabilities of the confidence interval covering the mean. This paper quantifies the effects of using batch sizes larger than necessary to satisfy normality and independence assumptions. These effects include (1) correct probability of covering the mean, (2) an increase in expected half length, (3) an increase in the standard deviation and coefficient of variation of the half length, and (4) an increase in the probability of covering points not equal to the mean. For any sample size and independent and normal batch means, the results are (1) the effects of less than 10 batches are large and the effects of more than 30 batches small, and (2) additional batches have lesser effects on ...

345 citations
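
For reference, a minimal sketch of the batch-means confidence interval that the paper analyzes: the correlated output sequence is split into b batches of size m and a t interval is formed from the (approximately independent, approximately normal) batch means. Names are illustrative.

import numpy as np
from scipy import stats

def batch_means_ci(output, n_batches, alpha=0.05):
    """Confidence interval for the mean of a correlated simulation output sequence."""
    m = len(output) // n_batches                  # batch size (drop any remainder)
    batches = np.reshape(output[: m * n_batches], (n_batches, m))
    means = batches.mean(axis=1)                  # batch means
    grand_mean = means.mean()
    half = stats.t.ppf(1 - alpha / 2, n_batches - 1) * means.std(ddof=1) / np.sqrt(n_batches)
    return grand_mean - half, grand_mean + half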


Journal ArticleDOI
01 Aug 1982-Ecology
TL;DR: In this article, the maximum likelihood estimator is derived and shown to be readily calculated using an iterative procedure that starts with the Mayfield (1975) estimate as a trial value.
Abstract: Statistical methods for estimating and comparing constant survival rates are developed here for sampling designs in which survival of a subject is checked at irregular intervals. The maximum likelihood estimator is derived and shown to be readily calculated using an iterative procedure that starts with the Mayfield (1975) estimate as a trial value. Sampling distributions of this estimator and of the product of two or more estimates are skewed, and normalizing transformations are provided to facilitate valid confidence interval estimation. The sampling distribution of the difference between two independent estimates is found to be sufficiently normal without transformation to allow valid use of conventional normal theory procedures for testing differences and determining sample size for specified power. Statistical validity under the variable intensity sampling design does require that the duration of intervisit periods vary independently of observer perceptions concerning the survival status of the subject and, in order to achieve robustness with respect to the assumption of constant survivorship, sampling intensity must vary independently of any temporal changes in the daily survival rate. Investigators are warned not to return earlier than planned to subjects thought to have died, as this observer behavior may cause serious bias in the survivorship estimate.

232 citations
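
A sketch under a common likelihood formulation (a subject checked alive after an interval of L days contributes s^L, a subject found dead contributes 1 - s^L); this form and the names are assumptions rather than the authors' exact development, and the daily-survival MLE is found here by a bounded numerical search rather than the authors' iteration, with the Mayfield estimate returned as the natural starting value.

import numpy as np
from scipy.optimize import minimize_scalar

def constant_survival_mle(intervals, died):
    """intervals: days between visits; died: 1 if the subject was found dead."""
    intervals = np.asarray(intervals, float)
    died = np.asarray(died, bool)
    # Mayfield-style trial value: 1 - deaths / exposure-days,
    # counting half of the final interval for subjects that died.
    exposure = intervals[~died].sum() + 0.5 * intervals[died].sum()
    mayfield = 1.0 - died.sum() / exposure

    def negloglik(s):
        return -(np.sum(intervals[~died] * np.log(s)) +
                 np.sum(np.log(1.0 - s ** intervals[died])))

    res = minimize_scalar(negloglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    return mayfield, res.x     # trial value and daily survival MLE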


Journal ArticleDOI
01 Oct 1982-Ecology
TL;DR: In this article, the authors evaluate bias for six measures of overlap: the likelihood ratio measure, the chi-square measure, the measure based on the Freeman-Tukey statistic, Morisita's adjusted index, Morisita's original index, and Horn's information index, and present an exact formula for a seventh, the percentage similarity measure.
Abstract: Bias refers to the accuracy of a particular estimator. We evaluate bias, using analytic and simulation techniques, for six measures of overlap: the likelihood ratio measure, the chi-square measure, the measure based on the Freeman-Tukey statistic, Morisita's adjusted index, Morisita's original index, and Horn's information index. We present an exact formula for a seventh, the percentage similarity measure. We consider bias due to resource sample size, total number of different resources, and evenness of resource distribution. Results indicate that of the seven measures, changes in evenness of resource distribution produce significant bias only in the percentage similarity measure and Morisita's adjusted index. All measures show increasing bias with increasing number of resources. For estimating unbiased overlap, Morisita's original measure of overlap gives the most accurate results, especially when using small sample sizes. The percentage similarity measure, one of the most commonly used measures among ecologists, is also one of the most biased and for this reason is not preferred.

228 citations
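
For orientation, sketches of two of the measures compared above, in their standard forms (which may differ in notation from the paper's exact expressions); the example counts are illustrative.

import numpy as np

def percentage_similarity(p, q):
    """Percentage (proportional) similarity of two resource-use distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.minimum(p / p.sum(), q / q.sum()).sum()

def morisita_original(x, y):
    """Morisita's original overlap index from raw resource-use counts x and y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    X, Y = x.sum(), y.sum()
    lam_x = np.sum(x * (x - 1)) / (X * (X - 1))   # Simpson-type concentration
    lam_y = np.sum(y * (y - 1)) / (Y * (Y - 1))
    return 2 * np.sum(x * y) / ((lam_x + lam_y) * X * Y)

counts_a = np.array([30, 10, 5, 5])
counts_b = np.array([20, 15, 10, 5])
print(percentage_similarity(counts_a, counts_b), morisita_original(counts_a, counts_b))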


Journal ArticleDOI
TL;DR: In this article, the authors assessed the accuracy of simultaneous estimation of item and person parameters in item response theory using the root mean squared error between recovered and actual item characteristic curves served as the principal measure of estimation accuracy for items.
Abstract: This monte carlo study assessed the accuracy of simultaneous estimation of item and person parameters in item response theory. Item responses were simulated using the two- and three-parameter logistic models. Samples of 200, 500, 1,000, and 2,000 simulated examinees and tests of 15, 30, and 60 items were generated. Item and person parameters were then estimated using the appropriate model. The root mean squared error between recovered and actual item characteristic curves served as the principal measure of estimation accuracy for items. The accuracy of estimates of ability was assessed by both correlation and root mean squared error. The results indicate that minimum sample sizes and test lengths depend upon the response model and the purposes of an investigation. With item responses generated by the two-parameter model, tests of 30 items and samples of 500 appear adequate for some purposes. Estimates of ability and item parameters were less accurate in small sample sizes when item responses were generat...

206 citations
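
A hedged sketch of the data-generation step in a design like the one above: simulating 0/1 responses from the three-parameter logistic model for a chosen sample size and test length. The parameter ranges are illustrative assumptions, not the study's exact settings.

import numpy as np

def simulate_3pl(n_examinees, n_items, seed=0):
    """Simulate 0/1 item responses under the three-parameter logistic model."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_examinees)             # abilities
    a = rng.uniform(0.5, 2.0, n_items)                    # discriminations
    b = rng.standard_normal(n_items)                      # difficulties
    c = rng.uniform(0.0, 0.25, n_items)                   # lower asymptotes (guessing)
    p = c + (1 - c) / (1 + np.exp(-a * (theta[:, None] - b)))
    return (rng.random((n_examinees, n_items)) < p).astype(int)

responses = simulate_3pl(500, 30)    # one of the sample-size / test-length cells above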


Journal ArticleDOI
M. T. Chao1
TL;DR: In this article, a general purpose unequal probability without replacement sampling plan with fixed sample size was proposed, which keeps the sample size fixed and lets the population units enter the sample one at a time through a carefully designed random mechanism.
Abstract: SUMMARY We present a general purpose unequal probability without replacement sampling plan with fixed sample size. In contrast to existing such plans, our scheme keeps the sample size fixed and lets the population units enter the sample one at a time through a carefully designed random mechanism. Consequently, all high-order inclusion probabilities can be easily computed.

160 citations
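
A simplified sketch of the sequential idea described above, restricted to the easy case where no unit's size would push its inclusion probability above one: unit k enters a reservoir of fixed size n with probability n*w_k/W_k (W_k the running total of sizes) and, if accepted, displaces a current member chosen uniformly. This omits the adjustments the full plan makes for over-large units and for computing joint inclusion probabilities, and the names are illustrative.

import random

def fixed_size_pps_sample(units, sizes, n, seed=0):
    """Fixed-size unequal-probability sampling without replacement (simplified sketch)."""
    rng = random.Random(seed)
    reservoir, running_total = [], 0.0
    for unit, w in zip(units, sizes):
        running_total += w
        if len(reservoir) < n:
            reservoir.append(unit)                  # first n units enter directly
        elif rng.random() < n * w / running_total:  # acceptance probability n*w_k/W_k
            reservoir[rng.randrange(n)] = unit      # evict a current member uniformly
        # otherwise unit k never enters the sample
    return reservoir

print(fixed_size_pps_sample(list(range(20)), [1 + i for i in range(20)], n=5))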


Journal ArticleDOI
TL;DR: In this paper, the mathematics required to calculate the asymptotic standard errors of the parameters of three commonly used logistic item response models is described and used to generate values for some common situations.
Abstract: The mathematics required to calculate the asymptotic standard errors of the parameters of three commonly used logistic item response models is described and used to generate values for some common situations. It is shown that the maximum likelihood estimation of a lower asymptote can wreak havoc with the accuracy of estimation of a location parameter, indicating that if one needs to have accurate estimates of location parameters (say for purposes of test linking/equating or computerized adaptive testing) the sample sizes required for acceptable accuracy may be unattainable in most applications. It is suggested that other estimation methods be used if the three parameter model is applied in these situations.

Journal ArticleDOI
TL;DR: A meta-analysis of outcomes from 32 studies investigating pretest effects was conducted as discussed by the authors, where standardized differences between pretested and non-pretested groups were computed as standardized differences.
Abstract: A meta-analysis of outcomes from 32 studies investigating pretest effects was conducted. All outcomes were computed as standardized differences between pretested and nonpretested groups. Eleven other variables were coded for each outcome. Initial descriptive statistics were indicative of differences between randomized and nonrandomized studies, so all further analyses were based on randomized group outcomes (n = 134). For all outcomes the average effect size was +.22, indicating the general elevating effect of pretest on posttest. Cognitive outcomes were raised .43, attitude outcomes .29, personality .48, and others about .00 standard deviations. Sixty-four percent of all effects were positive, and 81 percent of the cognitive effects were positive. Duration of time between pre- and posttesting was related to effect size, with effect size generally being small for durations less than a day or over 1 month. Year of publication, sample size, presence of experimental treatment, and sameness or difference of p...

Journal ArticleDOI
TL;DR: A concise table based on a general measure of magnitude of effect is presented in this paper, which allows direct determinations of statistical power over a practical range of values and alpha levels, and facilitates the setting of the research sample size needed to provide a given degree of power.
Abstract: A concise table based on a general measure of magnitude of effect is presented. This table allows direct determinations of statistical power over a practical range of values and alpha levels. The table also facilitates the setting of the research sample size needed to provide a given degree of power.
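
Not the paper's table, but a sketch of the power computation such a table summarizes, here for a two-group comparison with a standardized effect size d and the noncentral t distribution; names and the example values are illustrative.

import numpy as np
from scipy import stats

def two_sample_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample t-test for standardized effect size d."""
    df = 2 * n_per_group - 2
    nc = d * np.sqrt(n_per_group / 2.0)          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return 1 - stats.nct.cdf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

# e.g. a medium effect (d = 0.5) with 64 per group gives roughly 80% power
print(round(two_sample_power(0.5, 64), 2))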

Journal ArticleDOI
TL;DR: Levin's measure of attributable risk is extended to adjust for confounding by aetiologic factors other than the exposure of interest and there appears no advantage in using the log-based interval suggested by Walter which is always longer than the simpler symmetric interval.
Abstract: This paper extends Levin's measure of attributable risk to adjust for confounding by aetiologic factors other than the exposure of interest. One can estimate this extended measure from case-control data provided either (i) from the control data one can estimate exposure prevalence within each stratum of the confounding factor; or (ii) one has additional information available concerning the confounder distribution and the stratum-specific disease rates. In both cases we give maximum likelihood estimates and their estimated asymptotic variances, and show them to be independent of the sampling design (matched vs. random). Computer simulations investigate the behaviour of these estimates and of three types of confidence intervals when sample size is small relative to the number of confounder strata. The simulations indicate that attributable risk estimates tend to be too low. The bias is not serious except when exposure prevalence is high among controls. In this case the estimates and their standard error estimates are also highly unstable. In general, the asymptotic standard error estimates performed quite well, even in small samples, and even when the true asymptotic standard error was too small. By contrast, the bootstrap estimate tended to be too large. None of the three confidence intervals proved superior in accuracy to the other two. Thus there appears no advantage in using the log-based interval suggested by Walter which is always longer than the simpler symmetric interval.
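
For orientation, Levin's unadjusted attributable risk, which the paper extends to adjust for confounding, can be written from exposure prevalence and relative risk; the stratified extension itself is not reproduced here, and the example numbers are illustrative.

def levin_attributable_risk(prevalence, relative_risk):
    """Levin's (unadjusted) population attributable risk."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# e.g. 30% exposure prevalence and a relative risk of 2 gives about 0.23
print(round(levin_attributable_risk(0.30, 2.0), 3))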

Journal ArticleDOI
TL;DR: It is suggested that fluctuating dental asymmetry is not yet established as a useful and reliable measure of general stress in human populations.
Abstract: Randomly distributed or "fluctuating" dental asymmetry has been accorded evolutionary meaning and interpreted as a result of environmental stress. However, except for congenital malformation syndromes, the determinants of human crown size asymmetry are still equivocal. Both a computer simulated sampling experiment using a combined sample size of N = 3000, and the requirements of adequate statistical power show that sample sizes of several hundred are needed to detect population differences in dental asymmetry. Using the largest available sample of children with defined prenatal stresses, we are unable to find systematic increases in crown size asymmetry. Given sampling limitations and the current inability to link increased human dental asymmetry to defined prenatal stresses, we suggest that fluctuating dental asymmetry is not yet established as a useful and reliable measure of general stress in human populations.

Journal ArticleDOI
TL;DR: The proposed confidence limits are shown to be asymptotically correct for continuous survival data, and the intervals suggested by Rothman are preferred for smaller samples.
Abstract: For survival probabilities with censored data, Rothman (1978, Journal of Chronic Diseases 31, 557-560) has recommended the use of quadratic confidence limits based on the assumption that the product of the 'effective' sample size at time t and the life-table estimate of the survival probability past time t follows a binomial distribution. This paper shows that the proposed confidence limits are asymptotically correct for continuous survival data. These intervals, as well as those based on the arcsine transformation, the logit transformation and the log(--log) transformation, are compared by simulation to those based on Greenwood's formula--the usual method of interval estimation in life-table analysis. With large amounts of data, the alternatives to the Greenwood method all produce acceptable intervals. On the basis of overall performance, the intervals suggested by Rothman are preferred for smaller samples. Any of these methods may be used to generate confidence sets for the median survival time or for any other quantile.
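
A sketch of quadratic (score-type) limits of the kind recommended by Rothman, treating n' times the life-table survival estimate as binomial with effective sample size n'; the exact form used by Rothman is assumed here to be the usual quadratic solution, and the inputs are illustrative.

import numpy as np
from scipy import stats

def quadratic_survival_ci(s_hat, n_eff, alpha=0.05):
    """Quadratic limits for a survival probability, treating n_eff * s_hat as binomial."""
    z2 = stats.norm.ppf(1 - alpha / 2) ** 2
    centre = (s_hat + z2 / (2 * n_eff)) / (1 + z2 / n_eff)
    half = np.sqrt(z2) * np.sqrt(s_hat * (1 - s_hat) / n_eff + z2 / (4 * n_eff ** 2)) / (1 + z2 / n_eff)
    return centre - half, centre + half

print(quadratic_survival_ci(0.75, 40.0))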

Journal ArticleDOI
TL;DR: In this article, a class of two-sample distribution-free tests that are appropriate for situations where one of the sample sizes is large relative to the other is considered, and optimality criteria for choosing a test from this class are discussed and limiting distributions for the associated class of test statistics are determined for the case where only one sample sizes goes to infinity.
Abstract: We consider a class of two-sample distribution-free tests that are appropriate for situations where one of the sample sizes is large relative to the other. These procedures are based on the placements of the observations in the smaller sample among the ordered observations in the larger sample, and this class of tests generalizes the Mann-Whitney (1947) procedure in much the same way that the class of linear rank tests generalizes the equivalent Wilcoxon (1945) rank sum form. Optimality criteria for choosing a test from this class are discussed and limiting distributions for the associated class of test statistics are determined for the case where only one of the sample sizes goes to infinity.
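
A hedged sketch of the placement construction: each observation in the smaller sample is placed by counting how many of the ordered larger-sample observations fall below it, and summing the placements recovers the Mann-Whitney count; the class of tests above replaces the sum by other functions of the placements. Names and data are illustrative, and ties are ignored.

import numpy as np

def placements(small_sample, large_sample):
    """Placement of each small-sample value among the ordered large sample."""
    return np.searchsorted(np.sort(large_sample), small_sample, side="left")

def mann_whitney_from_placements(small_sample, large_sample):
    """Summing the placements gives the Mann-Whitney U count (no ties assumed)."""
    return placements(small_sample, large_sample).sum()

rng = np.random.default_rng(1)
x = rng.normal(size=12)          # smaller sample
y = rng.normal(size=400)         # much larger sample
print(placements(x, y), mann_whitney_from_placements(x, y))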

Journal ArticleDOI
Luc Devroye1
TL;DR: Any attempt to find a nontrivial distribution-free upper bound for Rn will fail, and any results on the rate of convergence of Rn to R* must use assumptions about the distribution of (X, Y).
Abstract: Consider the basic discrimination problem based on a sample of size n drawn from the distribution of (X, Y) on the Borel sets of R^d × {0, 1}. [...] Thus, any attempt to find a nontrivial distribution-free upper bound for Rn, the error probability of a rule based on the sample, will fail, and any results on the rate of convergence of Rn to the Bayes probability of error R* must use assumptions about the distribution of (X, Y).

Journal ArticleDOI
TL;DR: The truncated test is more favorable than the sequential probability ratio test in the sense that it has smaller average sample size when the actual location parameter is between θ0 and θ1.
Abstract: Truncation of a sequential test with constant boundaries is considered for the problem of testing a location hypothesis: f(x − θ0) versus f(x − θ1). A test design procedure is developed by using bounds for the error probabilities under the hypothesis and alternative. By viewing the truncated sequential test as a mixture of a sequential probability ratio test and a fixed sample size test, its boundaries and truncation point can be obtained once the degree of mixture is specified. Asymptotically correct approximations for the operating characteristic function and the average sample number function of the resulting test are derived. Numerical results show that an appropriately designed truncated sequential test performs favorably as compared to both the fixed sample size test and the sequential probability ratio test with the same error probabilities. The average sample number function of the truncated test is uniformly smaller than that of the fixed sample size test, and the truncated test maintains average sample sizes under the hypothesis and the alternative that are close to those optimum values achieved by Wald's sequential probability ratio test. Moreover, the truncated test is more favorable than the sequential probability ratio test in the sense that it has smaller average sample size when the actual location parameter is between θ0 and θ1. This behavior becomes more pronounced as the error probabilities become smaller, implying that the truncated sequential test becomes more favorable as the error probabilities become smaller.
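
A sketch of the kind of truncated sequential test described above, with Wald-style constant boundaries set from the target error probabilities and a fixed-sample decision applied at the truncation point; the boundary and truncation choices here are illustrative, not the paper's design procedure.

import numpy as np

def truncated_sprt(x, loglik_ratio, alpha, beta, max_n):
    """Sequential test of H0 vs H1 with constant boundaries, truncated at max_n.
    loglik_ratio(xi) = log f1(xi) - log f0(xi) for a single observation."""
    upper = np.log((1 - beta) / alpha)     # accept H1 when the sum crosses this
    lower = np.log(beta / (1 - alpha))     # accept H0 when the sum crosses this
    s = 0.0
    for n, xi in enumerate(x[:max_n], start=1):
        s += loglik_ratio(xi)
        if s >= upper:
            return "accept H1", n
        if s <= lower:
            return "accept H0", n
    return ("accept H1" if s > 0 else "accept H0"), min(len(x), max_n)

# Example: testing a normal-mean shift, theta0 = 0 vs theta1 = 1, unit variance,
# so the per-observation log likelihood ratio is x - 0.5.
rng = np.random.default_rng(2)
data = rng.normal(loc=1.0, size=200)
print(truncated_sprt(data, lambda xi: xi - 0.5, alpha=0.01, beta=0.01, max_n=100))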

Journal ArticleDOI
TL;DR: In this article, a method is outlined for incorporating into the sample size calculations the uncertainty of the estimate made at the design stage of a clinical trial. In particular, a formal scheme is described for deciding how many interim analyses should be performed to satisfy ethical and pragmatic requirements of large clinical trial design. Although the argument will be 'Bayesian', the criteria for assessment and comparison will be strictly of a Neyman-Pearson (i.e. significance testing) kind.
Abstract: Small but important therapeutic effects of new treatments can be most efficiently detected through the study of large randomized prospective series of patients. Such large scale clinical trials are nowadays commonplace. The alternative is years of polemic and debate surrounding several trials each too small to detect plausible differences with any certainty. Such trials produce equivocal and contradictory results, which could be predicted from power calculations based upon sensible pre-trial estimates of treatment differences. Unfortunately such calculations often lead to sample sizes of several thousands. It is not surprising that investigators tend to be over-optimistic in their estimation of treatment effects (which are necessarily uncertain) especially when the sample size requirements are so stark. In this paper a method is outlined for incorporating into the sample size calculations the uncertainty of the estimate made at the design stage of a clinical trial. In particular a formal scheme is described for deciding how many interim analyses should be performed to satisfy ethical and pragmatic requirements of large clinical trial design. Although the argument will be 'Bayesian', the criteria for assessment and comparison will be strictly of a Neyman-Pearson (i.e. significance testing) kind.
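
One way to make the abstract's idea concrete: instead of computing power at a single assumed treatment effect, average the power curve over a prior describing the design-stage uncertainty. The normal-approximation power formula and the prior below are illustrative assumptions, not the paper's scheme.

import numpy as np
from scipy import stats

def expected_power(n_per_arm, effect_prior_mean, effect_prior_sd, sigma=1.0, alpha=0.05):
    """Average power of a two-arm comparison over a normal prior on the true effect."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    deltas = np.linspace(effect_prior_mean - 4 * effect_prior_sd,
                         effect_prior_mean + 4 * effect_prior_sd, 801)
    weights = stats.norm.pdf(deltas, effect_prior_mean, effect_prior_sd)
    ncp = deltas / (sigma * np.sqrt(2.0 / n_per_arm))      # standardized difference
    power = stats.norm.cdf(ncp - z_alpha) + stats.norm.cdf(-ncp - z_alpha)
    return np.trapz(power * weights, deltas)               # prior-averaged power

print(round(expected_power(500, effect_prior_mean=0.15, effect_prior_sd=0.05), 2))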

Journal ArticleDOI
TL;DR: In this paper, an estimator for the coefficients of the quadratic functional relationship is presented and the estimator is shown to be asymptotically normally distributed as the sample size increases.
Abstract: SUMMARY An estimator is presented for the coefficients of the quadratic functional relationship. The estimator is shown to be asymptotically normally distributed as the sample size increases. In deriving this result it is not assumed that replication occurs. A Monte Carlo study is presented agreeing well with the asymptotic results. An example from the earth sciences is analysed.

Journal ArticleDOI
TL;DR: In this article, sample size requirements are provided for the k group comparative clinical trial in which time-to-failure is the measure of treatment efficacy, and it is shown that heuristic use of sample size formulae or tables for comparing two treatment groups is not adequate for obtaining sufficient power and properly accounting for the multiple comparisons possible with k ⩾ 3 treatment groups.

Journal ArticleDOI
TL;DR: In this article, a goodness of fit test for quantal response bioassays is described and extended to the multiple regressor situation, and the size and power characteristics of the test statistic are adequately described by a chi square variate for sample sizes as small as 50.
Abstract: A goodness of fit test statistic for the univariate logistic response model proposed by Prentice (1976) for quantal response bioassays is described and extended to the multiple regressor variable situation. The size and power characteristics of the test statistic were found to be adequately described by a chi square variate for sample sizes as small as 50, and its power was found to be dependent upon both the sample size and the true underlying response model.


Journal ArticleDOI
01 Jan 1982-Stroke
TL;DR: Critical evaluation of the literature was used to identify remediable flaws in the design of clinical trials of stroke treatment, and prognostic stratification is suggested as a method of overcoming problems of unbalanced allocation.
Abstract: Critical evaluation of the literature was used to identify remediable flaws in the design of clinical trials of stroke treatment. Trials of dexamethasone, dextran, and glycerol were reviewed. Available studies have in common major weaknesses in case selection (failure to exclude strokes due to hemorrhage or lacunar infarction), and failure to estimate required sample size. Problems of case selection can be avoided with computerized tomography; the sample size required to show superiority of active treatment over placebo can be estimated using standard formulas. Prognostic stratification is suggested as a method of overcoming problems of unbalanced allocation. Further studies with improved design are required to evaluate the prospects for medical limitation of cerebral infarct size.
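
The "standard formulas" mentioned above, for a placebo-controlled trial with a binary outcome, reduce to the usual two-proportion sample-size calculation; a sketch, with illustrative event rates.

import math
from scipy import stats

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per arm for comparing two proportions (two-sided test)."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)

# e.g. reducing a poor-outcome rate from 50% to 35% needs roughly 167 patients per arm
print(n_per_group(0.50, 0.35))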

Journal ArticleDOI
TL;DR: In this paper, the authors compared the significance level and power of the uniformly most powerful test and an approximate test of equality of the failure intensities of two Poisson processes, and provided an approximate formula for determining the experiment length needed to achieve a specified power for both the equal and unequal interval cases.
Abstract: SUMMARY The uniformly most powerful test and an approximate test of equality of the failure intensities of two Poisson processes are studied. An approximately normally distributed test statistic is generalized to apply when the experiment is conducted over unequal time intervals (or sample sizes) from the two populations. The significance level and power of this test are compared to those of the uniformly most powerful exact fixed length test. An approximate formula is provided for determining the experiment length needed to achieve a specified power for both the equal and unequal interval cases. Numerical comparisons show that the approximation is useful over a wide range of specified values for either the exact or the approximate test.
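
The exact test referred to above conditions on the total count: given x1 + x2 events over observation lengths t1 and t2, x1 is binomial with success probability t1/(t1 + t2) under equal intensities. A sketch, with illustrative inputs.

from scipy import stats

def compare_poisson_rates(x1, t1, x2, t2):
    """Exact conditional test of equal failure intensities for two Poisson processes."""
    total = x1 + x2
    p0 = t1 / (t1 + t2)                       # success probability under H0
    # Two-sided p-value from the conditional binomial distribution of x1.
    return stats.binomtest(x1, total, p0).pvalue

# 14 failures in 1000 hours versus 30 failures in 1200 hours
print(compare_poisson_rates(14, 1000.0, 30, 1200.0))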

Journal ArticleDOI
TL;DR: In this paper, the equality of two binomial proportions with partially incomplete paired data is examined using Monte Carlo methods, and the results suggest that the size of each test is more or less satisfactory for nominal sample sizes.
Abstract: Six simple procedures and the likelihood ratio criterion for testing the equality of two binomial proportions with partially incomplete paired data are examined using Monte Carlo methods. The results suggest that the size of each test is more or less satisfactory for nominal sample sizes. Tests which discard some of the observations are seen to have inferior power in comparison to those utilizing all observations. Recommendations are made for the various situations.

Journal ArticleDOI
TL;DR: In this paper, a quantification of the amount of robustness of the two-sample t-test procedure under violations of the assumption of equal variances is presented.
Abstract: When the two-sample t-test has equal sample sizes, it is widely considered to be a robust procedure (with respect to the significance level) under violation of the assumption of equal variances. This paper is concerned with a quantification of the amount of robustness which this procedure has under such violations. The approach is through the concept of "region of robustness", and the results show an extremely strong degree of robustness for the equal sample size t-test, probably more so than most statisticians realise. This extremely high level of robustness, however, reduces quickly as the sample sizes begin to vary from equality. The regions of robustness obtained show that while most users would likely be satisfied with the degree of robustness inherent when the two sample sizes each vary by 10% from equality, most would wish to be much more cautious when the variation is 20%. The study covers sample sizes n1 = n2 = 5(5)30(10)50 plus 10% and 2...
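
The kind of behaviour the paper quantifies analytically can be seen in a quick Monte Carlo of the actual significance level of the pooled t-test under unequal variances, for equal and mildly unequal sample sizes; the settings and names are illustrative, not the paper's design.

import numpy as np
from scipy import stats

def empirical_size(n1, n2, sd_ratio, n_sim=20000, alpha=0.05, seed=3):
    """Actual Type I error of the pooled two-sample t-test when variances differ."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, (n_sim, n1))
    y = rng.normal(0.0, sd_ratio, (n_sim, n2))
    p = stats.ttest_ind(x, y, axis=1, equal_var=True).pvalue
    return (p < alpha).mean()

# Equal n: close to the nominal 0.05 even with very unequal variances.
print(empirical_size(20, 20, sd_ratio=3.0))
# Unequal n: the actual level drifts away from 0.05.
print(empirical_size(20, 30, sd_ratio=3.0))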

Journal ArticleDOI
TL;DR: In this paper, the authors used Ferguson's (1973) nonparametric priors in a Bayesian analysis of finite populations, and showed that the usual estimates and confidence intervals for the population mean in simple and stratified random samples can be justified in Bayesian terms.
Abstract: SUMMARY Using Ferguson's (1973) non-parametric priors in a Bayesian analysis of finite populations, we show that asymptotically, at least, the usual estimates and confidence intervals for the population mean in simple and stratified random samples can be justified in Bayesian terms. We then apply these models for estimating population percentiles, and a new procedure for interval estimates in stratified sampling is developed. At present there are a number of general approaches in the statistical literature for making inferences based on data selected from a finite population. Many of these are summarized in Smith (1976). Probably the most widely used methodology, in practice, is the design-based inference from random samples, where criteria such as bias and mean-squared error, derived from the sampling distribution of the data, are used. This methodology does not preclude the possibility that the estimator may be based on an inherent superpopulation model; however, confidence intervals and mean-squared errors are based only on the probability distribution of the design, and ignore the model. One of the reasons for the popularity of this approach may be that it is non-parametric in nature and sample sizes tend to be large for many applications, so that the loss of efficiency compared with parametric superpopulation approaches is considered less important than the robustness of the design-based approach with respect to a wide class of models.

Journal ArticleDOI
TL;DR: Levin et al. as discussed by the authors examined the effect of heterogeneous residual variances on the robustness of the F test of parallelism of regression lines and found that the standard F test performed acceptably well as long as sample sizes were equal.
Abstract: One of the assumptions underlying the F test of parallelism of two or more regression lines is that the within-group residual variances are homogeneous. In the present two-group Monte Carlo investigation, the effect of violating this assumption was examined for F, a large-sample chi-square approximation (C/0), and an approximate F test (F*). In terms of Type I error probabilities, the standard F test performed acceptably well as long as sample sizes were equal. This was not true when sample sizes were unequal, with F* proving to be clearly superior. The pattern of results paralleled exactly what is known about the robustness of the F test when testing for mean differences in the presence of unequal variances. Recommendations for the applied researcher follow directly from the findings. Questions concerning the parallelism (homogeneity) or nonparallelism (heterogeneity) of K independent regression lines are relevant in at least two common empirical research situations. First, homogeneity of regression is an underlying assumption of classical analysis of covariance (ANCOVA; see Elashoff, 1969, and Glass, Peckham, & Sanders, 1972). Second, regression heterogeneity is of principal concern to researchers who are interested in discovering Aptitude X Treatment interactions (ATIs; see Cronbach & Snow, 1977). In either case, a statistical test of the homogeneity-of-regression hypothesis can be conducted, and usually is in the ATI situation. Moreover, the F test associated with this hypothesis is itself founded on several assumptions, including (among others) independence, normality, and homoscedasticity of the residuals. The focal assumption of the present investigation is that of equal residual variation in the K populations. This assumption means that the variation of the dependent variable (y) about the regression line is the same in each population. With the residual variance in the kth population defined as