
Showing papers on "Sample size determination published in 1998"


Journal ArticleDOI
TL;DR: For interval estimation of a proportion, this paper showed that inverting the score test with null rather than estimated standard error yields coverage probabilities close to nominal confidence levels, even for very small sample sizes, and that the 95% score interval behaves similarly to the adjusted Wald interval obtained after adding two "successes" and two "failures" to the sample.
Abstract: For interval estimation of a proportion, coverage probabilities tend to be too large for “exact” confidence intervals based on inverting the binomial test and too small for the interval based on inverting the Wald large-sample normal test (i.e., sample proportion ± z-score × estimated standard error). Wilson's suggestion of inverting the related score test with null rather than estimated standard error yields coverage probabilities close to nominal confidence levels, even for very small sample sizes. The 95% score interval behaves similarly to the adjusted Wald interval obtained after adding two “successes” and two “failures” to the sample. In elementary courses, with the score and adjusted Wald methods it is unnecessary to provide students with awkward sample size guidelines.

3,276 citations
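The score and adjusted Wald intervals described above are short enough to sketch directly. The following is a minimal illustration (not the authors' code), using the conventional z = 1.96 for 95% confidence; function names are my own:

```python
from math import sqrt

def wilson_interval(x, n, z=1.96):
    """Wilson score interval for x successes in n trials."""
    p = x / n
    center = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = (z / (1 + z * z / n)) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

def adjusted_wald_interval(x, n, z=1.96):
    """Adjusted Wald interval: add two 'successes' and two 'failures',
    then apply the ordinary Wald formula to the augmented sample."""
    x2, n2 = x + 2, n + 4
    p2 = x2 / n2
    half = z * sqrt(p2 * (1 - p2) / n2)
    return p2 - half, p2 + half
```

For 5 successes in 20 trials, both intervals stay well inside (0, 1) and nearly coincide, in line with the abstract's observation that the two methods behave similarly.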


Journal ArticleDOI
TL;DR: A functional approximation to earlier exact results is shown to have excellent agreement with the exact results and one can use it easily without intensive numerical computation.
Abstract: A method is developed to calculate the required number of subjects k in a reliability study, where reliability is measured using the intraclass correlation rho. The method is based on a functional approximation to earlier exact results. The approximation is shown to have excellent agreement with the exact results and one can use it easily without intensive numerical computation. Optimal design configurations are also discussed; for reliability values of about 40 per cent or higher, use of two or three observations per subject will minimize the total number of observations required.

1,795 citations


Journal ArticleDOI
TL;DR: Two new approaches which also avoid aberrations are developed and evaluated, and a tail area profile likelihood based method produces the best coverage properties, but is difficult to calculate for large denominators.
Abstract: Several existing unconditional methods for setting confidence intervals for the difference between binomial proportions are evaluated. Computationally simpler methods are prone to a variety of aberrations and poor coverage properties. The closely interrelated methods of Mee and Miettinen and Nurminen perform well but require a computer program. Two new approaches which also avoid aberrations are developed and evaluated. A tail area profile likelihood based method produces the best coverage properties, but is difficult to calculate for large denominators. A method combining Wilson score intervals for the two proportions to be compared also performs well, and is readily implemented irrespective of sample size.

1,634 citations
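The method that "combines Wilson score intervals for the two proportions" has a simple closed form, often called the square-and-add construction. This is an illustrative reimplementation, not the paper's code:

```python
from math import sqrt

def wilson(x, n, z=1.96):
    """Wilson score interval for a single proportion."""
    p = x / n
    center = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = (z / (1 + z * z / n)) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

def newcombe_diff(x1, n1, x2, n2, z=1.96):
    """Square-and-add interval for p1 - p2, built from the two
    single-proportion Wilson intervals."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson(x1, n1, z)
    l2, u2 = wilson(x2, n2, z)
    d = p1 - p2
    lower = d - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper
```

As the abstract notes, this construction is readily implemented irrespective of sample size and avoids the aberrations (overshoot, zero-width intervals) of the simpler methods.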


Book
15 May 1998
TL;DR: This book focuses on the design and analysis of nested cross-sectional designs and the applications of these designs in the context of nested cohort designs.
Abstract: 1. Introduction 2. Planning the Trials 3. Research Design 4. Planning the Analysis 5. Analysis for Nested Cross-Sectional Designs 6. Analysis for Nested-Cohort Designs 7. Applications of Analyses for Nested Cross-Sectional Designs 8. Applications of Analyses for Nested-Cohort Designs 9. Sample Size, Detectable Difference and Power 10. Case Studies

1,404 citations


Journal ArticleDOI
TL;DR: In this article, the modified t test is used to compare an individual's test score with a normative sample, where the normative sample is small and the individual is treated as a sample of N = 1.
Abstract: The standard method for comparing an individual's test score with a normative sample involves converting the score to a z score and evaluating it using a table of the area under the normal curve. When the normative sample is small, a more appropriate method is to treat the individual as a sample of N = 1 and use a modified t test described by Sokal and Rohlf (1995). The use of this t test is illustrated with examples and its results compared to those from the standard procedure. It is suggested that the t test be used when the N of the normative sample is less than 50. Finally, a computer program that implements the modified t-test procedure is described. This program can be downloaded from the first author's website.

1,166 citations
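The modified t test of Sokal and Rohlf used here has a simple closed form: the individual is treated as a sample of N = 1 and compared against the normative sample of size n. A sketch (function name is my own; the p-value would then come from a t table with the returned df):

```python
from math import sqrt

def modified_t(score, norm_mean, norm_sd, n):
    """Compare one individual's score with a normative sample of size n,
    treating the individual as a sample of N = 1. Returns (t, df)."""
    t = (score - norm_mean) / (norm_sd * sqrt((n + 1) / n))
    df = n - 1
    return t, df
```

With a large normative sample the sqrt((n + 1) / n) factor approaches 1 and the statistic reduces to the ordinary z score, which is why the authors suggest the t test mainly when the normative N is below about 50.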


Journal ArticleDOI
01 Jan 1998-Pain
TL;DR: The meta‐analytic methodology used to provide quantitative evidence to address the question of the magnitude of sex differences in response to experimentally induced pain found the effect size to range from large to moderate, depending on whether threshold or tolerance was measured and which method of stimulus administration was used.
Abstract: Fillingim and Maixner (Fillingim, R.B. and Maixner, W., Pain Forum, 4(4) (1995) 209-221) recently reviewed the body of literature examining possible sex differences in responses to experimentally induced noxious stimulation. Using a 'box score' methodology, they concluded the literature supports sex differences in response to noxious stimuli, with females displaying greater sensitivity. However, Berkley (Berkley, K.J., Pain Forum, 4(4) (1995) 225-227) suggested the failure of a number of studies to reach statistical significance suggests the effect may be small and of little practical significance. This study used meta-analytic methodology to provide quantitative evidence to address the question of the magnitude of these sex differences in response to experimentally induced pain. We found the effect size to range from large to moderate, depending on whether threshold or tolerance was measured and which method of stimulus administration was used. The values for pressure pain and electrical stimulation, for both threshold and tolerance measures, were the largest. For studies employing a threshold measure, the effect for thermal pain was smaller and more variable. The failures to reject the null hypothesis in a number of these studies appear to have been a function of lack of power from an insufficient number of subjects. Given the estimated effect size of 0.55 for threshold or 0.57 for tolerance, 41 subjects per group are necessary to provide adequate power (0.70) to test for this difference. Of the 34 studies reviewed by Fillingim and Maixner, only seven were conducted with groups of this magnitude. The results of this study compel us to caution authors to obtain adequate sample sizes, and we hope that this meta-analytic review can aid in the determination of sample size for future studies.

1,030 citations
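The abstract's figure of 41 subjects per group can be reproduced with the usual normal-approximation power formula for a two-sample comparison of standardized means (a sketch, not necessarily the authors' exact computation; it ignores the small upward correction for the t distribution):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.70):
    """Approximate per-group n for a two-sample comparison with
    standardized effect size d (two-sided alpha)."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    return ceil(2 * ((za + zb) / d) ** 2)
```

With d = 0.55 and power 0.70 this returns 41, matching the abstract.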


Journal ArticleDOI
TL;DR: This paper suggests use of sample size formulae for comparing means or for comparing proportions in order to calculate the required sample size for a simple logistic regression model.
Abstract: A sample size calculation for logistic regression involves complicated formulae. This paper suggests use of sample size formulae for comparing means or for comparing proportions in order to calculate the required sample size for a simple logistic regression model. One can then adjust the required sample size for a multiple logistic regression model by a variance inflation factor. This method requires no assumption of low response probability in the logistic model as in a previous publication. One can similarly calculate the sample size for linear regression models. This paper also compares the accuracy of some existing sample-size software for logistic regression with computer power simulations. An example illustrates the methods.

963 citations
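A hedged sketch of the suggested approach: compute the sample size with a standard two-proportion formula (here for the case of a binary covariate), then inflate it by a variance inflation factor 1/(1 - rho^2) for the multiple model, where rho is the multiple correlation between the covariate of interest and the remaining covariates. Function names and defaults are illustrative, not the paper's notation:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Standard per-group sample size for comparing two proportions,
    used in place of the exact simple-logistic formula."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    pbar = (p1 + p2) / 2
    num = (za * sqrt(2 * pbar * (1 - pbar))
           + zb * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

def adjust_for_covariates(n_simple, rho):
    """Inflate a simple-model sample size by the variance inflation
    factor 1 / (1 - rho^2) for a multiple logistic model."""
    return ceil(n_simple / (1 - rho ** 2))
```

For example, detecting response probabilities of 0.4 versus 0.6 needs 97 per group in the simple model; with rho = 0.5 among covariates the adjusted figure grows by a third.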


Journal ArticleDOI
TL;DR: This article presents methods for sample size and power calculations for studies involving linear regression, applicable to clinical trials designed to detect a regression slope of a given magnitude or to studies that test whether the slopes or intercepts of two independent regression lines differ by a given amount.

929 citations
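For the simplest case named in the TL;DR, detecting a regression slope of a given magnitude, a normal-approximation sketch is as follows (not necessarily the article's exact formula; sd_x is the standard deviation of the predictor and sd_resid the residual standard deviation, both assumed known at the design stage):

```python
from math import ceil
from statistics import NormalDist

def n_for_slope(slope, sd_x, sd_resid, alpha=0.05, power=0.8):
    """Approximate n to detect a true regression slope of the given
    magnitude, via the large-sample normal approximation."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    return ceil(((za + zb) * sd_resid / (slope * sd_x)) ** 2)
```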


Book
01 Apr 1998
TL;DR: In this book, the authors present a simple and general model for power analysis, apply it to minimum-effect tests, t-tests, and the analysis of variance, and discuss the implications of power analysis.
Abstract: 1. The Power of Statistical Tests. 2. A Simple and General Model for Power Analysis. 3. Power Analyses for Minimum-Effect Tests. 4. Using Power Analyses. 5. Correlation and Regression. 6. t-Tests and the Analysis of Variance. 7. Multi-Factor ANOVA Designs. 8. Split-Plot Factorial and Multivariate Analyses. 9. The Implications of Power Analyses. Appendices.

782 citations


Journal ArticleDOI
TL;DR: This article examined the use of data analysis tools by researchers in four research paradigms: between-subjects univariate, multivariate, repeated measures, and covariance designs, concluding that researchers rarely verify that validity assumptions are satisfied and that, accordingly, they typically use analyses that are nonrobust to assumption violations.
Abstract: Articles published in several prominent educational journals were examined to investigate the use of data analysis tools by researchers in four research paradigms: between-subjects univariate designs, between-subjects multivariate designs, repeated measures designs, and covariance designs. In addition to examining specific details pertaining to the research design (e.g., sample size, group size equality/inequality) and methods employed for data analysis, the authors also catalogued whether (a) validity assumptions were examined, (b) effect size indices were reported, (c) sample sizes were selected on the basis of power considerations, and (d) appropriate textbooks and/or articles were cited to communicate the nature of the analyses that were performed. The present analyses imply that researchers rarely verify that validity assumptions are satisfied and that, accordingly, they typically use analyses that are nonrobust to assumption violations. In addition, researchers rarely report effect size statistics, ...

571 citations


Journal ArticleDOI
TL;DR: The authors examined the performance of Jöreskog's goodness-of-fit index (GFI) under conditions of varying sample size, model specification, and magnitude of factor loadings, and concluded that these three factors influenced the GFI both independently and jointly through interactions.

Journal ArticleDOI
TL;DR: The authors developed methods for constructing asymptotically valid confidence intervals for the date of a single break in multivariate time series, including I(0), I(1), and deterministically trending regressors.
Abstract: This paper develops methods for constructing asymptotically valid confidence intervals for the date of a single break in multivariate time series, including I(0), I(1), and deterministically trending regressors. Although the width of the asymptotic confidence interval does not decrease as the sample size increases, it is inversely related to the number of series which have a common break date, so there are substantial gains to multivariate inference about break dates. These methods are applied to two empirical examples: the mean growth rate of output in three European countries, and the mean growth rate of U.S. consumption, investment, and output.

Journal ArticleDOI
TL;DR: The authors examined the results of nonparametric tests with small sample sizes published in a recent issue of Animal Behaviour and found that in more than half of the articles concerned, the asymptotic variant had apparently been inappropriately used and incorrect P values had been presented.


Journal ArticleDOI
TL;DR: In this paper, test statistics are proposed that can be used to test hypotheses about the parameters of the deterministic trend function of a univariate time series, and the tests are valid for I(0) and I(1) errors.
Abstract: In this paper test statistics are proposed that can be used to test hypotheses about the parameters of the deterministic trend function of a univariate time series. The tests are valid in the presence of general forms of serial correlation in the errors and can be used without having to estimate the serial correlation parameters either parametrically or nonparametrically. The tests are valid for I(0) and I(1) errors. Trend functions that are permitted include general linear polynomial trend functions that may have breaks at either known or unknown locations. Asymptotic distributions are derived, and consistency of the tests is established. The general results are applied to a model with a simple linear trend. A local asymptotic analysis is used to compute asymptotic size and power of the tests for this example. Size is well controlled and is relatively unaffected by the variance of the initial condition. Asymptotic power curves are computed for the simple linear trend model and are compared to existing tests. It is shown that the new tests have nontrivial asymptotic power. A simulation study shows that the asymptotic approximations are adequate for sample sizes typically used in economics. The tests are used to construct confidence intervals for average GNP growth rates for eight industrialized countries using post-war data.

01 Jan 1998
TL;DR: The purpose of this paper is to present the historical development of these fit indices and the various transformations and to examine the impact of sample size on both the fit mean squares and the t-transformations of those mean squares.
Abstract: Throughout the mid-to-late 1970s, considerable research was conducted on the properties of Rasch fit mean squares. This work culminated in a variety of transformations to convert the mean squares into approximate t-statistics, motivated primarily by the influence sample size has on the magnitude of the mean squares and the desire for a single critical value that can generally be applied to most cases. In the late 1980s and early 1990s the trend seems to have reversed, with numerous researchers using the untransformed fit mean squares to test fit to the Rasch measurement models; the principal motivation cited is the influence sample size has on the sensitivity of the t-converted mean squares. The purpose of this paper is to present the historical development of these fit indices and the various transformations, and to examine the impact of sample size on both the fit mean squares and their t-transformations. Because sample size has little influence on the person fit mean squares, owing to the relatively short test length (100 items or fewer), this paper focuses on the item fit mean squares, where it is common to find the statistics used with sample sizes ranging from 30 to 10,000.

Journal ArticleDOI
09 May 1998-BMJ
TL;DR: The calculation of sample size when subjects are randomised in groups or clusters is described in terms of two variances—the variance of observations taken from individuals in the same cluster, s_w², and the variance of true cluster means, s_c².
Abstract: We have described the calculation of sample size when subjects are randomised in groups or clusters in terms of two variances—the variance of observations taken from individuals in the same cluster, s_w², and the variance of true cluster means, s_c².1 We described how such a study could be analysed using the sample cluster means. The variance of such means would be s_c² + s_w²/m, where m is the number of subjects in a cluster. We used this to estimate the sample size needed for a cluster randomised trial. This sum of two components of variance is analogous to what happens with measurement error, where we have the variance within the subject, also denoted by s_w², and between subjects (s_b …
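Using the variance of sample cluster means given above, s_c² + s_w²/m, a rough per-arm number of clusters for detecting a mean difference d can be sketched with the normal approximation (function names and the two-arm formula are my own framing, not the note's exact presentation):

```python
from math import ceil
from statistics import NormalDist

def clusters_per_group(d, s_w2, s_c2, m, alpha=0.05, power=0.8):
    """Clusters per arm to detect a mean difference d when the analysis
    unit is the cluster mean, whose variance is s_c2 + s_w2 / m."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    var_mean = s_c2 + s_w2 / m
    return ceil(2 * (za + zb) ** 2 * var_mean / d ** 2)
```

Note that increasing m only shrinks the s_w2/m term; the s_c2 component puts a floor under the variance, which is why adding subjects per cluster eventually stops helping.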

Journal ArticleDOI
TL;DR: In this article, the authors compare properties of parameter estimators under Akaike information criterion (AIC) and consistent AIC (CAIC) model selection in a nested sequence of open population capture-recapture models.
Abstract: We compare properties of parameter estimators under Akaike information criterion (AIC) and 'consistent' AIC (CAIC) model selection in a nested sequence of open population capture-recapture models. These models consist of product multinomials, where the cell probabilities are parameterized in terms of survival (φ_i) and capture (p_i) probabilities for each time interval i. The sequence of models is derived from 'treatment' effects that might be (1) absent, model H_0; (2) only acute, model H_2p; or (3) acute and chronic, lasting several time intervals, model H_3. Using a 3^5 factorial design, 1000 repetitions were simulated for each of 243 cases. The true number of parameters ranged from 7 to 42, and the sample size ranged from approximately 470 to 55 000 per case. We focus on the quality of the inference about the model parameters and model structure that results from the two selection criteria. We use achieved confidence interval coverage as an integrating metric to judge what constitutes a ...

Journal ArticleDOI
TL;DR: In this article, the authors examined the effect of sample size on the mean productive efficiency of firms when the efficiency is evaluated using the nonparametric approach of Data Envelopment Analysis by employing Monte Carlo simulation.
Abstract: This study examines the effect of sample size on the mean productive efficiency of firms when the efficiency is evaluated using the non-parametric approach of Data Envelopment Analysis. By employing Monte Carlo simulation, we show how the mean efficiency is related to the sample size. The paper discusses the implications for international comparisons. As an application, we investigate the efficiency of the electricity distribution industries in Australia, Sweden and New Zealand.

Journal ArticleDOI
TL;DR: It is proposed that robust statistical analysis can be of great use for determinations of reference intervals from limited or possibly unreliable data.
Abstract: We propose a new methodology for the estimation of reference intervals for data sets with small numbers of observations or for those with substantial numbers of outliers. We propose a prediction interval that uses robust estimates of location and scale. The SAS software can be readily modified to do these calculations. We compared four reference interval procedures (nonparametric, transformed, robust with a nonparametric lower limit, and transformed robust) for sample sizes of 20, 40, 60, 80, 100, and 120 from χ² distributions of 1, 4, 7, and 10 df. χ² distributions were chosen because they simulate the skewness of distributions often found in clinical chemistry populations. We used the root mean square error as the measure of performance and used computer simulation to calculate this measure. The robust estimator showed the best performance for small sample sizes. As the sample size increased, the performance values converged. The robust method for calculating upper reference interval values yields reasonable results. In two examples using real data for haptoglobin and glucose, the robust estimator provides slightly smaller upper reference limits than the other procedures. Lastly, the robust estimator was compared with the other procedures in a population where 5% of the values were multiplied by a factor of 5. The reference intervals were calculated with and without outlier detection. In this case, the robust approach consistently yielded upper reference interval values that were closer to those of the true underlying distributions. We propose that robust statistical analysis can be of great use for determinations of reference intervals from limited or possibly unreliable data.
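The paper's robust estimators are SAS-based and more refined than this, but the underlying idea of a prediction interval from robust location and scale can be illustrated with the median and the MAD (purely illustrative; not the authors' estimators):

```python
from statistics import median

def robust_reference_interval(values, z=1.96):
    """Illustrative robust 95% interval: median as location, scaled MAD
    as scale (1.4826 makes the MAD consistent with the SD under normality)."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    s = 1.4826 * mad
    return m - z * s, m + z * s
```

On data with a gross outlier (e.g. a value of 100 among single digits), the median/MAD interval is barely perturbed, whereas a mean/SD interval would be blown wide open, which is the behavior the abstract reports for contaminated populations.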

Journal ArticleDOI
TL;DR: This paper provides a consistent and asymptotically normal estimator for the intercept of a semiparametrically estimated sample selection model, which uses a decreasingly small fraction of all observations as the sample size goes to infinity.
Abstract: This paper provides a consistent and asymptotically normal estimator for the intercept of a semiparametrically estimated sample selection model. The estimator uses a decreasingly small fraction of all observations as the sample size goes to infinity, as in Heckman (1990). In the semiparametrics literature, estimation of the intercept has typically been subsumed in the nonparametric sample selection bias correction term. The estimation of the intercept, however, is important from an economic perspective. For instance, it permits one to determine the "wage gap" between unionized and nonunionized workers, decompose the wage differential between different socioeconomic groups (e.g. male-female and black-white), and evaluate the net benefits of a social programme.

Journal ArticleDOI
TL;DR: In this article, the authors consider inference in general binary response regression models under retrospective sampling plans and show that the estimating function obtained from the prospective likelihood is optimal in a class of unbiased estimating functions.
Abstract: We consider inference in general binary response regression models under retrospective sampling plans. Prentice & Pyke (1979) discovered that inference for the odds-ratio parameter in a logistic model can be based on a prospective likelihood even though the sampling scheme is retrospective. We show that the estimating function obtained from the prospective likelihood is optimal in a class of unbiased estimating functions. Also we link case-control sampling with a two-sample biased sampling problem, where the ratio of two densities is assumed to take a known parametric form. Connections between this model and the Cox proportional hazards model are pointed out. Large and small sample size behaviour of the proposed estimators is studied.

Journal ArticleDOI
TL;DR: In this article, the authors discuss methods for reducing the bias of consistent estimators that are biased in finite samples, and apply them to two problems: estimating the autoregressive parameter in an AR(1) model with a constant term, and estimating a logit model.

Journal ArticleDOI
TL;DR: This paper adapted the generalized estimating equation (GEE) approach of Liang and Zeger to sample size calculations for discrete and continuous outcome variables, and used the damped exponential family of correlation structures described in Munoz et al. for the working correlation matrix among the repeated measures.
Abstract: Derivation of the minimum sample size is an important consideration in an applied research effort. When the outcome is measured at a single time point, sample size procedures are well known and widely applied. The corresponding situation for longitudinal designs, however, is less well developed. In this paper, we adapt the generalized estimating equation (GEE) approach of Liang and Zeger to sample size calculations for discrete and continuous outcome variables. The non-central version of the Wald χ2 test is considered. We use the damped exponential family of correlation structures described in Munoz et al. for the ‘working’ correlation matrix among the repeated measures. We present a table of minimum sample sizes for binary outcomes, and discuss extensions that account for unequal allocation, staggered entry and loss to follow-up. © 1998 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this article, the authors examined the behavior of 8 measures of fit used to evaluate confirmatory factor analysis models and found that the measure of centrality was most affected by the design variables, with values of η² > .10 for sample size, model size, and level of nonnormality and interaction effects for Model Size x Level of Nonnormality.
Abstract: The purpose of this study was to examine the behavior of 8 measures of fit used to evaluate confirmatory factor analysis models. This study employed Monte Carlo simulation to determine to what extent sample size, model size, estimation procedure, and level of nonnormality affected fit when polytomous data were analyzed. The 3 indexes least affected by the design conditions were the comparative fit index, incremental fit index, and nonnormed fit index, which were affected only by level of nonnormality. The measure of centrality was most affected by the design variables, with values of η² > .10 for sample size, model size, and level of nonnormality and interaction effects for Model Size x Level of Nonnormality and Estimation x Level of Nonnormality. Findings from this study should alert applied researchers to exercise caution when evaluating model fit with nonnormal, polytomous data.

Journal ArticleDOI
TL;DR: In this paper, the exact mean-squared error (MSE) of estimators of the variance in nonparametric regression based on quadratic forms is investigated, and it is shown that in many situations ordinary difference-based estimators are more appropriate for estimating the variance, because they control the bias much better and hence have a much better overall performance.
Abstract: The exact mean-squared error (MSE) of estimators of the variance in nonparametric regression based on quadratic forms is investigated. In particular, two classes of estimators are compared: Hall, Kay and Titterington's optimal difference-based estimators and a class of ordinary difference-based estimators which generalize methods proposed by Rice and Gasser, Sroka and Jennen-Steinmetz. For small sample sizes the MSE of the first estimator is essentially increased by the magnitude of the integrated first two squared derivatives of the regression function. It is shown that in many situations ordinary difference-based estimators are more appropriate for estimating the variance, because they control the bias much better and hence have a much better overall performance. It is also demonstrated that Rice's estimator does not always behave well. Data-driven guidelines are given to select the estimator with the smallest MSE.
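Rice's estimator mentioned in the abstract is the simplest of the ordinary difference-based estimators and has a one-line form: under y_i = f(x_i) + e_i with sorted x, successive differences mostly cancel the smooth signal and leave the noise. A sketch:

```python
def rice_variance(y):
    """Rice's first-order difference-based estimator of the error
    variance in nonparametric regression (observations ordered in x)."""
    n = len(y)
    return sum((y[i] - y[i - 1]) ** 2 for i in range(1, n)) / (2 * (n - 1))
```

The abstract's caveat applies: when the regression function is steep, the first differences retain signal and the estimator is biased upward, which is why higher-order difference sequences are often preferred.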

Journal ArticleDOI
TL;DR: In this article, the authors assessed the accuracy of parallel analysis, a technique in which the observed eigenvalues are compared to eigen values from simulated data in which no real factors are present.
Abstract: Selecting the correct number of factors to retain in a factor analysis is a crucial step in developing psychometric tools or developing theories. The present study assessed the accuracy of parallel analysis, a technique in which the observed eigenvalues are compared to eigenvalues from simulated data in which no real factors are present. Study 1 investigated the effect of the presence of one real factor on the size of subsequent noise eigenvalues. The size of real factors and the sample size were manipulated. Study 2 examined the effect that the pattern of structure coefficients and continuousness of the variables have on the size of real and noise eigenvalues. Study 3 compared the results of Studies 1 and 2 to actual psychometric data. These examples illustrate the importance of modeling the data more closely when parallel analysis is used to determine the number of real factors.
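Parallel analysis as described above, comparing observed eigenvalues against eigenvalues from simulated data with no real factors, can be sketched as follows (my own implementation, assuming numpy is available and using the 95th simulated percentile as the retention threshold, one common choice):

```python
import numpy as np

def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Count factors whose observed correlation-matrix eigenvalues exceed
    the chosen percentile of eigenvalues from random normal data of the
    same shape, stopping at the first failure."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sims = np.empty((n_sims, p))
    for s in range(n_sims):
        r = rng.standard_normal((n, p))
        sims[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(r, rowvar=False)))[::-1]
    threshold = np.percentile(sims, percentile, axis=0)
    k = 0
    for o, t in zip(obs, threshold):
        if o > t:
            k += 1
        else:
            break
    return k, obs, threshold
```

The studies summarized above refine exactly the middle step: simulating plain random normal data ignores how a strong first factor inflates later "noise" eigenvalues, so modeling the data more closely can change the retained count.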

Journal ArticleDOI
TL;DR: Health-related studies with populations composed partly or entirely of volunteers should take potential volunteer bias into account when analyzing and interpreting data.
Abstract: Background. Selection methods vary greatly in ease and cost-effectiveness. The effects of selection factors associated with subjects' recruitment into studies can introduce bias and seriously limit the generalizability of results. Methods. For an epidemiologic study, we recruited an age-stratified random sample of 1,422 community-dwelling individuals aged 65+ years from the voter registration lists in a rural area of southwestern Pennsylvania. The first 1,366 of these were accrued through intensive recruitment efforts; the last 56 of them responded to a single mailing. To increase sample size for future risk factor analyses, we also recruited by direct advertisement a sample of 259 volunteers from the same area. The three groups were compared on selected baseline characteristics and subsequent mortality. Results. The two subgroups of the random sample were not significantly different on any of the variables we examined. Compared to the random sample, in cross-sectional analyses, volunteers were significantly more likely to be women, more educated, and less likely to have used several health and human services. Volunteers also had higher cognitive test scores and Instrumental Activities of Daily Living (IADL) ability. Over 6-8 years (10,861 person-years) of follow-up, volunteers had significantly lower mortality rates than randomly selected subjects. Conclusions. Health-related studies with populations composed partly or entirely of volunteers should take potential volunteer bias into account when analyzing and interpreting data.

Journal ArticleDOI
TL;DR: Methods for determining sample size for studies of the accuracy of diagnostic tests are reviewed, and various study design issues are discussed, such as sampling methods, choices in format for the test results, and the issue of replicated readings.
Abstract: Methods for determining sample size for studies of the accuracy of diagnostic tests are reviewed. Several accuracy indices are considered, including sensitivity and specificity, the full and partial area under the receiver operating characteristic curve, the sensitivity at a fixed false positive rate, and the likelihood ratio. Sample size formulae are presented for studies evaluating a single test and studies comparing the accuracy of tests. Four real examples illustrate the concepts involved in sample size determination. Lastly, various study design issues are discussed, such as sampling methods, choices in format for the test results, and the issue of replicated readings.
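For the simplest accuracy index, sensitivity, the sample size calculation reduces to a binomial precision formula; a sketch under the normal approximation (not the review's full machinery, and names are my own):

```python
from math import ceil
from statistics import NormalDist

def n_for_sensitivity(se, width, alpha=0.05, prevalence=None):
    """Diseased subjects needed to estimate a sensitivity of se to
    within +/- width; if prevalence is given, also the total number
    to screen to accrue that many diseased subjects."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_dis = ceil(z ** 2 * se * (1 - se) / width ** 2)
    if prevalence is None:
        return n_dis
    return n_dis, ceil(n_dis / prevalence)
```

The prevalence adjustment illustrates one of the design issues the review discusses: a cohort sampling plan may require screening many subjects to obtain enough cases, which is why case-control sampling of verified positives is often used instead.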

Journal ArticleDOI
14 Feb 1998-BMJ
TL;DR: This paper describes sample size calculations for a cluster randomised trial; most texts do not discuss sample size for trials which randomise groups (clusters) of people rather than individuals.
Abstract: Techniques for estimating sample size for randomised trials are well established,12 but most texts do not discuss sample size for trials which randomise groups (clusters) of people rather than individuals. For example, in a study of different preparations to control head lice all children in the same class were allocated to receive the same preparation. This was done to avoid contaminating the treatment groups through contact with control children in the same class.3 The children in the class cannot be considered independent of one another and the analysis should take this into account.45 There will be some loss of power due to randomising by cluster rather than individual and this should be reflected in the sample size calculations. Here we describe sample size calculations for a cluster randomised trial.
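The loss of power from randomising by cluster is commonly folded into the sample size via the design effect 1 + (m - 1) * ICC, where m is the cluster size and ICC the intracluster correlation. A minimal sketch (the ICC formulation is one standard route, not necessarily this paper's exact presentation):

```python
from math import ceil

def cluster_adjusted_n(n_individual, m, icc):
    """Inflate an individually-randomised sample size by the design
    effect 1 + (m - 1) * icc for a cluster randomised trial."""
    deff = 1 + (m - 1) * icc
    return ceil(n_individual * deff)
```

Even a small ICC matters: with classes of 10 children and an ICC of 0.05, a trial needing 200 children under individual randomisation needs 290 under cluster randomisation.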