
Showing papers on "Sample size determination published in 2007"


Journal ArticleDOI
TL;DR: Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test proved to be a very consistent indicator of classes across all of the models considered.
Abstract: Mixture modeling is a widely applied data analysis technique used to identify unobserved heterogeneity in a population. Despite mixture models' usefulness in practice, one unresolved issue in the application of mixture models is that there is not one commonly accepted statistical indicator for deciding on the number of classes in a study population. This article presents the results of a simulation study that examines the performance of likelihood-based tests and the traditionally used Information Criteria (ICs) for determining the number of classes in mixture modeling. We look at the performance of these tests and indexes for 3 types of mixture models: latent class analysis (LCA), a factor mixture model (FMA), and a growth mixture model (GMM). We evaluate the ability of the tests and indexes to correctly identify the number of classes at three different sample sizes (n = 200, 500, 1,000). Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test ...
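
The class-enumeration comparison described above can be approximated with off-the-shelf tools. Below is a minimal, hedged sketch (not the authors' code or models) that fits Gaussian mixtures with one to five classes to simulated data and picks the class count that minimizes the BIC, the IC the study found to perform best; the three-class simulated data and all settings are invented for illustration.

```python
# A minimal sketch, assuming scikit-learn; data and class counts are invented.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulate n = 500 observations from a three-class univariate mixture.
X = np.concatenate([rng.normal(-4.0, 1.0, 150),
                    rng.normal(0.0, 1.0, 200),
                    rng.normal(4.0, 1.0, 150)]).reshape(-1, 1)

bic_by_k = {}
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bic_by_k[k] = gm.bic(X)            # lower BIC indicates a preferred class solution

print(bic_by_k, "selected classes:", min(bic_by_k, key=bic_by_k.get))
```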

7,716 citations


Journal ArticleDOI
TL;DR: This paper proposed parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples.
Abstract: Non-biological experimental variation or “batch effects” are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.
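
For intuition about what batch adjustment does, the sketch below standardizes each gene within each batch to a common per-gene location and scale. It is only a naive location/scale adjustment and deliberately omits the empirical Bayes shrinkage that is the paper's contribution for small batches; the actual method is distributed as the software at the URL above, and the function and variable names here are hypothetical.

```python
# A naive location/scale batch adjustment, for intuition only; it is NOT the
# paper's empirical Bayes method. Names and data layout are assumptions.
import numpy as np

def naive_batch_adjust(expr, batches):
    """expr: genes x samples array; batches: per-sample batch labels."""
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    adjusted = expr.copy()
    grand_mean = expr.mean(axis=1, keepdims=True)
    pooled_sd = expr.std(axis=1, ddof=1, keepdims=True)
    for b in np.unique(batches):
        cols = batches == b
        batch_mean = expr[:, cols].mean(axis=1, keepdims=True)
        batch_sd = expr[:, cols].std(axis=1, ddof=1, keepdims=True)
        # Rescale each batch to the pooled per-gene location and scale.
        adjusted[:, cols] = (expr[:, cols] - batch_mean) / batch_sd * pooled_sd + grand_mean
    return adjusted
```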

6,319 citations


Journal ArticleDOI
01 Sep 2007
TL;DR: In this paper, the authors discuss the relationship of sample size and power and propose statistical rules for selecting sample sizes large enough for sufficient power to detect differences, associations, and factor analyses.
Abstract: This article addresses the definition of power and its relationship to Type I and Type II errors. We discuss the relationship of sample size and power. Finally, we offer statistical rules of thumb guiding the selection of sample sizes large enough for sufficient power to detect differences, associations, chi‐square, and factor analyses.
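
Readers who want to reproduce this kind of rule-of-thumb calculation can do so with standard software. The hedged sketch below uses statsmodels (an assumption, not software discussed in the article) to relate sample size and power for a two-sample t-test; the effect size, alpha, and power values are conventional benchmarks, not numbers from the article.

```python
# Hedged illustration of the sample size / power relationship for a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Sample size per group to detect a medium effect (Cohen's d = 0.5)
# with alpha = .05 and power = .80 in a two-sided test.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(round(n_per_group))   # roughly 64 per group

# Conversely, the power achieved with only 30 participants per group:
print(analysis.power(effect_size=0.5, nobs1=30, alpha=0.05))  # about 0.48
```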

1,534 citations


Journal ArticleDOI
TL;DR: A quantile-adjusted conditional maximum likelihood estimator for the dispersion parameter of the negative binomial distribution is derived and an "exact" test is derived that outperforms the standard approximate asymptotic tests.
Abstract: We derive a quantile-adjusted conditional maximum likelihood estimator for the dispersion parameter of the negative binomial distribution and compare its performance, in terms of bias, to various other methods. Our estimation scheme outperforms all other methods in very small samples, typical of those from serial analysis of gene expression studies, the motivating data for this study. The impact of dispersion estimation on hypothesis testing is studied. We derive an "exact" test that outperforms the standard approximate asymptotic tests.
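
To make the dispersion parameter concrete, here is a hedged method-of-moments sketch under the parameterization Var(Y) = mu + phi * mu^2. It is not the quantile-adjusted conditional maximum likelihood estimator the paper derives; it only illustrates the quantity being estimated, and the example counts are invented.

```python
# Method-of-moments dispersion estimate for negative binomial counts (illustrative only;
# NOT the paper's quantile-adjusted conditional ML estimator).
import numpy as np

def mom_dispersion(counts):
    counts = np.asarray(counts, dtype=float)
    m, v = counts.mean(), counts.var(ddof=1)
    return max((v - m) / m**2, 0.0)   # phi = 0 recovers the Poisson case

# Example with a small SAGE-like sample of counts (values are hypothetical):
print(mom_dispersion([3, 7, 2, 11, 5, 4, 9]))
```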

1,038 citations


Journal ArticleDOI
TL;DR: A framework for making sampling and sample size considerations in interpretive research is provided, helping researchers select sample sizes and sampling designs that are most compatible with their research purposes.
Abstract: The purpose of this paper is to emphasize the importance of sampling and sample size considerations in all qualitative research. Such considerations would help qualitative researchers to select sample sizes and sampling designs that are most compatible with their research purposes. First, we discuss the importance of sampling in qualitative research. Next, we outline 24 designs for selecting a sample in qualitative research. We then discuss the importance of selecting a sample size that yields data that have a realistic chance of reaching data saturation, theoretical saturation, or informational redundancy. Based on the literature, we then provide sample size guidelines for several qualitative research designs. As such, we provide a framework for making sampling and sample size considerations in interpretive research.

805 citations


Book
17 Apr 2007
TL;DR: In this book, the authors present procedures for sample size calculation in clinical research, covering considerations prior to calculation (confounding and interaction, one-sided versus two-sided tests, crossover versus parallel designs, subgroup/interim analyses, data transformation) and sample size determination for comparing means, large-sample and exact tests for proportions, goodness-of-fit and contingency table tests, time-to-event data, group sequential methods, comparing variabilities, bioequivalence testing, and nonparametric tests.
Abstract: Contents: Introduction (Regulatory Requirement; Basic Considerations; Procedures for Sample Size Calculation; Aims and Structure of the Book). Considerations Prior to Sample Size Calculation (Confounding and Interaction; One-Sided Test Versus Two-Sided Test; Crossover Design Versus Parallel Design; Subgroup/Interim Analyses; Data Transformation; Practical Issues). Comparing Means (One-Sample Design; Two-Sample Parallel Design; Two-Sample Crossover Design; Multiple-Sample One-Way ANOVA; Multiple-Sample Williams Design; Practical Issues). Large Sample Tests for Proportions (One-Sample Design; Two-Sample Parallel Design; Two-Sample Crossover Design; One-Way Analysis of Variance; Williams Design; Relative Risk - Parallel Design; Relative Risk - Crossover Design; Practical Issues). Exact Tests for Proportions (Binomial Test; Fisher's Exact Test; Optimal Multiple-Stage Designs for Single Arm Trials; Flexible Designs for Multiple-Arm Trials; Remarks). Tests for Goodness-of-Fit and Contingency Tables (Tests for Goodness-of-Fit; Test for Independence - Single Stratum; Test for Independence - Multiple Strata; Test for Categorical Shift; Carry-Over Effect Test; Practical Issues). Comparing Time-to-Event Data (Basic Concepts; Exponential Model; Cox's Proportional Hazards Model; Weighted Log-Rank Test; Practical Issues). Group Sequential Methods (Pocock's Test; O'Brien and Fleming's Test; Wang and Tsiatis' Test; Inner Wedge Test; Binary Variables; Time-to-Event Data; Alpha Spending Function; Sample Size Re-Estimation; Conditional Power; Practical Issues). Comparing Variabilities (Comparing Intra-Subject Variabilities; Comparing Intra-Subject CVs; Comparing Inter-Subject Variabilities; Comparing Total Variabilities; Practical Issues). Bioequivalence Testing (Bioequivalence Criteria; Average Bioequivalence; Population Bioequivalence; Individual Bioequivalence; In Vitro Bioequivalence). Nonparametrics (Violation of Assumptions; One-Sample Location Problem; Two-Sample Location Problem; Test for Independence; Practical Issues). Sample Size Calculation in Other Areas (Dose Response Studies; ANOVA with Repeated Measures; Quality of Life; Bridging Studies; Vaccine Clinical Trials). Appendix: Tables of Quantiles. References. Index.
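
As a flavor of the book's "Comparing Means, Two-Sample Parallel Design" setting, the sketch below implements the familiar normal-approximation formula n per arm = 2 * sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2. The numeric inputs are illustrative and are not taken from the book.

```python
# Hedged sketch of the standard two-sample parallel-design sample size formula.
import math
from scipy.stats import norm

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(2 * (sigma * (z_a + z_b) / delta) ** 2)

print(n_per_arm(delta=5.0, sigma=10.0))   # ~63 patients per arm for these illustrative inputs
```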

744 citations


Journal ArticleDOI
TL;DR: In this paper, the authors provide a compilation of intraclass correlation values of academic achievement and related covariate effects that could be used for planning group-randomized experiments in education.
Abstract: Experiments that assign intact groups to treatment conditions are increasingly common in social research. In educational research, the groups assigned are often schools. The design of group-randomized experiments requires knowledge of the intraclass correlation structure to compute statistical power and sample sizes required to achieve adequate power. This article provides a compilation of intraclass correlation values of academic achievement and related covariate effects that could be used for planning group-randomized experiments in education. It also provides variance component information that is useful in planning experiments involving covariates. The use of these values to compute the statistical power of group-randomized experiments is illustrated.
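
The role of the intraclass correlation in such planning can be illustrated with the design effect, 1 + (m - 1) * rho, which inflates the variance of a cluster-randomized design relative to simple random sampling. The cluster size and ICC in the hedged sketch below are hypothetical, not values from the article's compilation.

```python
# Hedged sketch: design effect and effective sample size for a group-randomized design.
def design_effect(cluster_size, icc):
    return 1.0 + (cluster_size - 1) * icc

m, rho, n_total = 25, 0.20, 1000      # e.g. 40 schools of 25 students each (invented)
deff = design_effect(m, rho)
print(deff)                           # 5.8: variance inflation relative to simple random sampling
print(n_total / deff)                 # ~172 effective independent observations
```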

726 citations


01 Jan 2007
TL;DR: In this paper, the authors study a covariance matrix whose eigenvalues are all one except for a finite number of larger spiked eigenvalues, and show that when a population eigenvalue lies above a certain threshold the corresponding sample eigenvalue has a Gaussian limiting distribution.
Abstract: This paper deals with a multivariate Gaussian observation model where the eigenvalues of the covariance matrix are all one, except for a finite number which are larger. Of interest is the asymptotic behavior of the eigenvalues of the sample covariance matrix when the sample size and the dimension of the observations both grow to infinity so that their ratio converges to a positive constant. When a population eigenvalue is above a certain threshold and of multiplicity one, the corresponding sample eigenvalue has a Gaussian limiting distribution. There is a "phase transition" of the sample eigenvectors in the same setting. Another contribution here is a study of the second order asymptotics of sample eigenvectors when corresponding eigenvalues are simple and sufficiently large.
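
The phase-transition behavior described above is easy to see numerically. The hedged sketch below simulates a single spiked eigenvalue ell above the threshold 1 + sqrt(gamma), with gamma = p/n, and compares the top sample eigenvalue with the limiting value ell * (1 + gamma / (ell - 1)); the dimensions and spike size are arbitrary choices, not values from the paper.

```python
# Hedged simulation of the spiked covariance model; sizes and the spike are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n, p, ell = 2000, 500, 4.0            # gamma = p/n = 0.25; threshold 1 + sqrt(gamma) = 1.5
gamma = p / n

X = rng.standard_normal((n, p))
X[:, 0] *= np.sqrt(ell)               # one population eigenvalue equals ell, the rest are one
sample_cov = X.T @ X / n
top_eig = np.linalg.eigvalsh(sample_cov)[-1]

print(top_eig)                         # fluctuates (Gaussian limit) around the value below
print(ell * (1 + gamma / (ell - 1)))   # limiting sample eigenvalue ~ 4.33 for these choices
```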

716 citations


Journal ArticleDOI
TL;DR: The optimum test policy was found to be analysis by the 'N-1' chi-squared test when the minimum expected number is at least 1, and otherwise, by the Fisher-Irwin test by Irwin's rule (taking the total probability of tables in either tail that are as likely as, or less likely than the one observed).
Abstract: Two-by-two tables commonly arise in comparative trials and cross-sectional studies. In medical studies, two-by-two tables may have a small sample size due to the rarity of a condition, or to limited resources. Current recommendations on the appropriate statistical test mostly specify the chi-squared test for tables where the minimum expected number is at least 5 (following Fisher and Cochran), and otherwise the Fisher-Irwin test; but there is disagreement on which versions of the chi-squared and Fisher-Irwin tests should be used. A further uncertainty is that, according to Cochran, the number 5 was chosen arbitrarily. Computer-intensive techniques were used in this study to compare seven two-sided tests of two-by-two tables in terms of their Type I errors. The tests were K. Pearson's and Yates's chi-squared tests and the 'N-1' chi-squared test (first proposed by E. Pearson), together with four versions of the Fisher-Irwin test (including two mid-P versions). The optimum test policy was found to be analysis by the 'N-1' chi-squared test when the minimum expected number is at least 1, and otherwise, by the Fisher-Irwin test by Irwin's rule (taking the total probability of tables in either tail that are as likely as, or less likely than the one observed). This policy was found to have increased power compared to Cochran's recommendations.
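
The recommended 'N-1' chi-squared statistic is simply Pearson's statistic scaled by (N - 1)/N, referred to a chi-squared distribution with one degree of freedom. The hedged sketch below uses an invented 2x2 table and assumes scipy; it does not implement the Fisher-Irwin fallback for very small expected counts.

```python
# Hedged sketch of the 'N-1' chi-squared test; the 2x2 table is invented.
import numpy as np
from scipy.stats import chi2, chi2_contingency

table = np.array([[7, 3],
                  [2, 8]])
N = table.sum()
pearson_chi2 = chi2_contingency(table, correction=False)[0]   # K. Pearson's statistic
n_minus_1_chi2 = pearson_chi2 * (N - 1) / N                    # the 'N-1' version
p_value = chi2.sf(n_minus_1_chi2, df=1)
print(n_minus_1_chi2, p_value)
```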

701 citations


Book
29 Mar 2007
TL;DR: Economic Evaluation in Clinical Trials provides practical advice on how to conduct cost-effectiveness analyses in controlled trials of medical therapies, and topics discussed range from design issues such as the types of services that should be measured and price weights, to assessment of quality-adjusted life years.
Abstract: 1. Introduction to economic evaluations in clinical trials 2. Designing economic evaluations in clinical trials 3. Valuing medical service use 4. Assessing quality-adjusted life years 5. Analyzing cost 6. Analyzing censored cost 7. Comparing cost and effect: point estimates for cost-effectiveness ratios and net monetary benefit 8. Understanding sampling uncertainty: the concepts 9. Sampling uncertainty: calculation, sample size and power, and decision criteria 10. Transferability of the results from trials 11. Relevance of trial-based economic analyses

683 citations


Journal ArticleDOI
TL;DR: The study shows that inter-subject variability plays a prominent role in the relatively low sensitivity and reliability of group studies and focuses on the notion of reproducibility by bootstrapping.

Journal ArticleDOI
TL;DR: The distinctive feature of this approach is its acknowledgment of the asymmetry of sampling distributions for single correlations, which requires only the availability of confidence limits for the separate correlations and a method for taking into account the dependency between correlations.
Abstract: Confidence intervals are widely accepted as a preferred way to present study results. They encompass significance tests and provide an estimate of the magnitude of the effect. However, comparisons of correlations still rely heavily on significance testing. The persistence of this practice is caused primarily by the lack of simple yet accurate procedures that can maintain coverage at the nominal level in a nonlopsided manner. The purpose of this article is to present a general approach to constructing approximate confidence intervals for differences between (a) 2 independent correlations, (b) 2 overlapping correlations, (c) 2 nonoverlapping correlations, and (d) 2 independent R2s. The distinctive feature of this approach is its acknowledgment of the asymmetry of sampling distributions for single correlations. This approach requires only the availability of confidence limits for the separate correlations and, for correlated correlations, a method for taking into account the dependency between correlations. These closed-form procedures are shown by simulation studies to provide very satisfactory results in small to moderate sample sizes. The proposed approach is illustrated with worked examples.
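
For case (a), two independent correlations, the approach described above combines the Fisher-z confidence limits of each correlation into an asymmetric interval for their difference. The sketch below follows that recipe but is an unofficial illustration with invented inputs; the overlapping and nonoverlapping cases additionally require the dependency between the correlations and are not shown.

```python
# Hedged sketch: asymmetric CI for the difference of two independent correlations,
# built from the separate Fisher-z confidence limits. Inputs are invented.
import numpy as np
from scipy.stats import norm

def fisher_ci(r, n, alpha=0.05):
    z = np.arctanh(r)
    half = norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)
    return np.tanh(z - half), np.tanh(z + half)

def diff_ci_independent(r1, n1, r2, n2, alpha=0.05):
    l1, u1 = fisher_ci(r1, n1, alpha)
    l2, u2 = fisher_ci(r2, n2, alpha)
    lower = r1 - r2 - np.sqrt((r1 - l1) ** 2 + (u2 - r2) ** 2)
    upper = r1 - r2 + np.sqrt((u1 - r1) ** 2 + (r2 - l2) ** 2)
    return lower, upper

print(diff_ci_independent(0.50, 100, 0.30, 120))
```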

Journal ArticleDOI
TL;DR: In this paper, it was shown that the sample size effect can be rationalized almost completely by considering the stochastics of dislocation source lengths in samples of finite size, and the statistical first and second moments of the effective source length were derived as a function of sample size.

Journal ArticleDOI
TL;DR: In this article, the authors presented a method for the sample size calculation when ANCOVA is used and derived an approximate sample size formula for small randomized clinical trials that compare two treatments on a continuous outcome using analysis of covariance or a t-test approach.

Journal ArticleDOI
TL;DR: This work presents an approach that corrects for the ascertainment bias and generates an estimate of the frequency of a variant and its penetrance parameters and shows that application of the method to case-control data can improve the design of replication studies considerably.
Abstract: Genomewide association studies are now a widely used approach in the search for loci that affect complex traits. After detection of significant association, estimates of penetrance and allele-frequency parameters for the associated variant indicate the importance of that variant and facilitate the planning of replication studies. However, when these estimates are based on the original data used to detect the variant, the results are affected by an ascertainment bias known as the "winner's curse." The actual genetic effect is typically smaller than its estimate. This overestimation of the genetic effect may cause replication studies to fail because the necessary sample size is underestimated. Here, we present an approach that corrects for the ascertainment bias and generates an estimate of the frequency of a variant and its penetrance parameters. The method produces a point estimate and confidence region for the parameter estimates. We study the performance of this method using simulated data sets and show that it is possible to greatly reduce the bias in the parameter estimates, even when the original association study had low power. The uncertainty of the estimate decreases with increasing sample size, independent of the power of the original test for association. Finally, we show that application of the method to case-control data can improve the design of replication studies considerably.

Journal ArticleDOI
TL;DR: In this article, the authors compare 10 modeling techniques in terms of predictive power and sensitivity to location error, change in map resolution, and sample size, and assess whether some species traits can explain variation in model performance.
Abstract: Data characteristics and species traits are expected to influence the accuracy with which species' distributions can be modeled and predicted. We compare 10 modeling techniques in terms of predictive power and sensitivity to location error, change in map resolution, and sample size, and assess whether some species traits can explain variation in model performance. We focused on 30 native tree species in Switzerland and used presence-only data to model current distribution, which we evaluated against independent presence-absence data. While there are important differences between the predictive performance of modeling methods, the variance in model performance is greater among species than among techniques. Within the range of data perturbations in this study, some extrinsic parameters of data affect model performance more than others: location error and sample size reduced performance of many techniques, whereas grain had little effect on most techniques. No technique can rescue species that are difficult to predict. The predictive power of species-distribution models can partly be predicted from a series of species characteristics and traits based on growth rate, elevational distribution range, and maximum elevation. Slow-growing species or species with narrow and specialized niches tend to be better modeled. The Swiss presence-only tree data produce models that are reliable enough to be useful in planning and management applications.

Journal ArticleDOI
TL;DR: Simulation studies are used to assess the effect of varying sample size at both the individual and group level on the accuracy of the estimates of the parameters and variance components of multilevel logistic regression models, and suggest that low-prevalence events require larger sample sizes.
Abstract: Background Many studies conducted in health and social sciences collect individual level data as outcome measures. Usually, such data have a hierarchical structure, with patients clustered within physicians, and physicians clustered within practices. Large survey data, including national surveys, have a hierarchical or clustered structure; respondents are naturally clustered in geographical units (e.g., health regions) and may be grouped into smaller units. Outcomes of interest in many fields not only reflect continuous measures, but also binary outcomes such as depression, presence or absence of a disease, and self-reported general health. In the framework of multilevel studies an important problem is calculating an adequate sample size that generates unbiased and accurate estimates.

Journal ArticleDOI
TL;DR: In this paper, it is shown that the population parameter values of a model can also influence the χ2 test statistic and lead to erroneous decisions about model acceptance/rejection, based on the examination of hypothetical population factor analytic models.

Book
12 Dec 2007
TL;DR: In this paper, Dattalo provides a pocket guide to sample size determination in empirical social work research, including techniques for advanced and emerging statistical strategies such as structural equation modeling, multilevel analysis, repeated measures MANOVA and repeated measures ANOVA.
Abstract: A researcher's decision about the sample to draw in a study may have an enormous impact on the results, and it rests on numerous statistical and practical considerations that can be difficult to juggle. Computer programs help, but no single software package exists that allows researchers to determine sample size across all statistical procedures. This pocket guide shows social work students, educators, and researchers how to prevent some of the mistakes that would result from a wrong sample size decision by describing and critiquing four main approaches to determining sample size. In concise, example-rich chapters, Dattalo covers sample-size determination using power analysis, confidence intervals, computer-intensive strategies, and ethical or cost considerations, as well as techniques for advanced and emerging statistical strategies such as structural equation modeling, multilevel analysis, repeated measures MANOVA and repeated measures ANOVA. He also offers strategies for mitigating pressures to increase sample size when doing so may not be feasible. Whether as an introduction to the process for students or as a refresher for experienced researchers, this practical guide is a perfect overview of a crucial but often overlooked step in empirical social work research.

Journal ArticleDOI
TL;DR: It is shown that nonparametric statistical tests provide convincing and elegant solutions for both problems and make it possible to incorporate biophysically motivated constraints in the test statistic, which may drastically increase the sensitivity of the test.

Journal ArticleDOI
TL;DR: In this paper, the authors examined the performance of alternatives to the naive test for comparison of survival curves and compared the type I errors and power of these tests for a variety of sample sizes by a Monte Carlo study.
Abstract: A common problem encountered in many medical applications is the comparison of survival curves. Often, rather than comparison of the entire survival curves, interest is focused on the comparison at a fixed point in time. In most cases, the naive test based on a difference in the estimates of survival is used for this comparison. In this note, we examine the performance of alternatives to the naive test. These include tests based on a number of transformations of the survival function and a test based on a generalized linear model for pseudo-observations. The type I errors and power of these tests for a variety of sample sizes are compared by a Monte Carlo study. We also discuss how these tests may be extended to situations where the data are stratified. The pseudo-value approach is also applicable in more detailed regression analysis of the survival probability at a fixed point in time. The methods are illustrated on a study comparing survival for autologous and allogeneic bone marrow transplants.
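
As one concrete example of the transformation-based tests compared above, the hedged sketch below computes Kaplan-Meier estimates with Greenwood variances in each group and compares them at a fixed time on the complementary log-log scale. It is a simplified illustration, not the authors' code and not the pseudo-value approach; the function names and the usage line are hypothetical.

```python
# Hedged sketch: compare two survival curves at a fixed time t0 on the cloglog scale.
import numpy as np
from scipy.stats import norm

def km_at(t0, times, events):
    """Kaplan-Meier estimate and Greenwood variance at time t0 (events: 1=event, 0=censored)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    s, gw = 1.0, 0.0
    for t in np.unique(times[(events == 1) & (times <= t0)]):
        at_risk = np.sum(times >= t)
        d = np.sum((times == t) & (events == 1))
        s *= 1.0 - d / at_risk
        gw += d / (at_risk * (at_risk - d))
    return s, s ** 2 * gw                      # Greenwood's formula for Var(S-hat)

def fixed_time_test(t0, times1, events1, times2, events2):
    s1, v1 = km_at(t0, times1, events1)
    s2, v2 = km_at(t0, times2, events2)
    g = lambda s: np.log(-np.log(s))                 # complementary log-log transformation
    gvar = lambda s, v: v / (s * np.log(s)) ** 2     # delta-method variance of g(S-hat)
    z = (g(s1) - g(s2)) / np.sqrt(gvar(s1, v1) + gvar(s2, v2))
    return z, 2 * norm.sf(abs(z))

# usage (hypothetical data): fixed_time_test(365, times_auto, events_auto, times_allo, events_allo)
```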

Journal ArticleDOI
TL;DR: The minimum sample size necessary to ensure the specified median life is obtained under the assumption that the lifetimes of the test units follow a generalized Birnbaum–Saunders distribution.
Abstract: In this article, we develop acceptance sampling plans when the life test is truncated at a pre-fixed time. The minimum sample size necessary to ensure the specified median life is obtained by assuming that the lifetimes of the test units follow a generalized Birnbaum–Saunders distribution. The operating characteristic values of the sampling plans as well as producer's risk are presented. Two examples are also given to illustrate the procedure developed here, with one of them being based on a real data from software reliability.

Journal ArticleDOI
TL;DR: In this article, explicit asymptotic bias formulae are given for dynamic panel regression estimators as the cross section sample size N → ∞, and the results extend earlier work by Nickell [1981] and later authors.

Journal ArticleDOI
TL;DR: This work describes a data augmentation approach to the analysis of multinomial models with unknown index that provides for a generic and efficient Bayesian implementation, illustrated with three examples: estimating the size of an animal population, estimating the number of diabetes cases in a population using the Rasch model, and the motivating example of estimating the number of species in an animal community with latent probabilities of species occurrence and detection.
Abstract: Multinomial models with unknown index (“sample size”) arise in many practical settings. In practice, Bayesian analysis of such models has proved difficult because the dimension of the parameter space is not fixed, being in some cases a function of the unknown index. We describe a data augmentation approach to the analysis of this class of models that provides for a generic and efficient Bayesian implementation. Under this approach, the data are augmented with all-zero detection histories. The resulting augmented dataset is modeled as a zero-inflated version of the complete-data model where an estimable zero-inflation parameter takes the place of the unknown multinomial index. Interestingly, data augmentation can be justified as being equivalent to imposing a discrete uniform prior on the multinomial index. We provide three examples involving estimating the size of an animal population, estimating the number of diabetes cases in a population using the Rasch model, and the motivating example of estimating t...

Journal ArticleDOI
01 Oct 2007-Genetics
TL;DR: Numerical evaluations of exact probability distributions and computer simulations verify that this new estimator yields unbiased estimates also when based on a modest number of alleles and loci, and eliminates the bias associated with earlier estimators.
Abstract: Amounts of genetic drift and the effective size of populations can be estimated from observed temporal shifts in sample allele frequencies. Bias in this so-called temporal method has been noted in cases of small sample sizes and when allele frequencies are highly skewed. We characterize bias in commonly applied estimators under different sampling plans and propose an alternative estimator for genetic drift and effective size that weights alleles differently. Numerical evaluations of exact probability distributions and computer simulations verify that this new estimator yields unbiased estimates also when based on a modest number of alleles and loci. At the cost of a larger standard deviation, it thus eliminates the bias associated with earlier estimators. The new estimator should be particularly useful for microsatellite loci and panels of SNPs, representing a large number of alleles, many of which will occur at low frequencies.

Journal ArticleDOI
TL;DR: Although PLS with the product indicator approach provides higher point estimates of interaction paths, it also produces wider confidence intervals, and thus provides less statistical power than multiple regression, and this disadvantage increases with the number of indicators and (up to a point) with sample size.
Abstract: A significant amount of information systems (IS) research involves hypothesizing and testing for interaction effects. Chin et al. (2003) completed an extensive experiment using Monte Carlo simulation that compared two different techniques for detecting and estimating such interaction effects: partial least squares (PLS) with a product indicator approach versus multiple regression with summated indicators. By varying the number of indicators for each construct and the sample size, they concluded that PLS using product indicators was better (at providing higher and presumably more accurate path estimates) than multiple regression using summated indicators. Although we view the Chin et al. (2003) study as an important step in using Monte Carlo analysis to investigate such issues, we believe their results give a misleading picture of the efficacy of the product indicator approach with PLS. By expanding the scope of the investigation to include statistical power, and by replicating and then extending their work, we reach a different conclusion---that although PLS with the product indicator approach provides higher point estimates of interaction paths, it also produces wider confidence intervals, and thus provides less statistical power than multiple regression. This disadvantage increases with the number of indicators and (up to a point) with sample size. We explore the possibility that these surprising results can be explained by capitalization on chance. Regardless of the explanation, our analysis leads us to recommend that if sample size or statistical significance is a concern, regression or PLS with product of the sums should be used instead of PLS with product indicators for testing interaction effects.

Journal ArticleDOI
TL;DR: In this article, the authors compared different statistical models to predict species distributions under different shapes of occurrence-environment relationship, using real and simulated data from a real landscape, the state of California, and simulated species distributions within this landscape.
Abstract: Aim To test statistical models used to predict species distributions under different shapes of occurrence-environment relationship. We addressed three questions: (1) Is there a statistical technique that has a consistently higher predictive ability than others for all kinds of relationships? (2) How does species prevalence influence the relative performance of models? (3) When an automated stepwise selection procedure is used, does it improve predictive modelling, and are the relevant variables being selected? Location We used environmental data from a real landscape, the state of California, and simulated species distributions within this landscape. Methods Eighteen artificial species were generated, which varied in their occurrence response to the environmental gradients considered (random, linear, Gaussian, threshold or mixed), in the interaction of those factors (no interaction vs. multiplicative), and on their prevalence (50% vs. 5%). The landscape was then randomly sampled with a large (n = 2000) or small (n = 150) sample size, and the predictive ability of each statistical approach was assessed by comparing the true and predicted distributions using five different indexes of performance (area under the receiver operating characteristic curve, Kappa, correlation between true and predictive probability of occurrence, sensitivity and specificity). We compared generalized additive models (GAM) with and without flexible degrees of freedom, logistic regressions (generalized linear models, GLM) with and without variable selection, classification trees, and the genetic algorithm for rule-set production (GARP). Results Species with threshold and mixed responses, additive environmental effects, and high prevalence generated better predictions than did other species for all statistical models. In general, GAM outperforms all other strategies, although differences with GLM are usually not significant. The two variable-selection strategies presented here did not discriminate successfully between truly causal factors and correlated environmental variables. Main conclusions Based on our analyses, we recommend the use of GAM or GLM over classification trees or GARP, and the specification of any suspected interaction terms between predictors. An expert-based variable selection procedure was preferable to the automated procedures used here. Finally, for low-prevalence species, variability in model performance is both very high and sample-dependent. This suggests that distribution models for species with low prevalence can be improved through targeted sampling.

Journal ArticleDOI
TL;DR: The standard MLE method consistently outperformed the so-called robust variations of the MLE-based and LPR-based methods, as well as the various NP methods, for both the 95th percentile and the mean of right-skewed occupational exposure data.
Abstract: The purpose of this study was to compare the performance of several methods for statistically analyzing censored datasets [i.e. datasets that contain measurements that are less than the field limit-of-detection (LOD)] when estimating the 95th percentile and the mean of right-skewed occupational exposure data. The methods examined were several variations on the maximum likelihood estimation (MLE) and log-probit regression (LPR) methods, the common substitution methods, several non-parametric (NP) quantile methods for the 95th percentile and the NP Kaplan-Meier (KM) method. Each method was challenged with computer-generated censored datasets for a variety of plausible scenarios where the following factors were allowed to vary randomly within fairly wide ranges: the true geometric standard deviation, the censoring point or LOD and the sample size. This was repeated for both a single-laboratory scenario (i.e. single LOD) and a multiple-laboratory scenario (i.e. three LODs) as well as a single lognormal distribution scenario and a contaminated lognormal distribution scenario. Each method was used to estimate the 95th percentile and mean for the censored datasets (the NP quantile methods estimated only the 95th percentile). For each scenario, the method bias and overall imprecision (as indicated by the root mean square error or rMSE) were calculated for the 95th percentile and mean. No single method was unequivocally superior across all scenarios, although nearly all of the methods excelled in one or more scenarios. Overall, only the MLE- and LPR-based methods performed well across all scenarios, with the robust versions generally showing less bias than the standard versions when challenged with a contaminated lognormal distribution and multiple LODs. All of the MLE- and LPR-based methods were remarkably robust to departures from the lognormal assumption, nearly always having lower rMSE values than the NP methods for the exposure scenarios postulated. In general, the MLE methods tended to have smaller rMSE values than the LPR methods, particularly for the small sample size scenarios. The substitution methods tended to be strongly biased, but in some scenarios had the smaller rMSE values, especially for sample sizes <20. Surprisingly, the various NP methods were not as robust as expected, performing poorly in the contaminated distribution scenarios for both the 95th percentile and the mean. In conclusion, when using the rMSE rather than bias as the preferred comparison metric, the standard MLE method consistently outperformed the so-called robust variations of the MLE-based and LPR-based methods, as well as the various NP methods, for both the 95th percentile and the mean. When estimating the mean, the standard LPR method tended to outperform the robust LPR-based methods. Whenever bias is the main consideration, the robust MLE-based methods should be considered. The KM method, currently hailed by some as the preferred method for estimating the mean when the lognormal distribution assumption is questioned, did not perform well for either the 95th percentile or mean and is not recommended.
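
A hedged sketch of the standard MLE approach evaluated above is given below: detected measurements contribute normal log-densities on the log scale, and non-detects contribute the log-CDF at their LOD (multiple LODs are handled by recording each non-detect at its own LOD). The data layout, function name, and starting values are assumptions, and the robust MLE/LPR variants are not implemented.

```python
# Hedged sketch: lognormal MLE with left-censoring at the LOD; invented data layout.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def censored_lognormal_mle(values, detected):
    """values: measurements, with each non-detect recorded at its LOD;
    detected: boolean array. Returns (geometric mean, geometric SD)."""
    y = np.log(np.asarray(values, dtype=float))
    detected = np.asarray(detected, dtype=bool)

    def negloglik(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)                         # keep sigma positive
        ll = norm.logpdf(y[detected], mu, sigma).sum()    # detects: density terms
        ll += norm.logcdf(y[~detected], mu, sigma).sum()  # non-detects: P(X < LOD)
        return -ll

    res = minimize(negloglik, x0=[y.mean(), np.log(y.std(ddof=1))])
    mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
    # 95th percentile: exp(mu + 1.645*sigma); arithmetic mean: exp(mu + sigma^2/2)
    return np.exp(mu_hat), np.exp(sigma_hat)
```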

Journal ArticleDOI
Abstract: The paper examines various tests for assessing whether a time series model requires a slope component. We first consider the simple t-test on the mean of first differences and show that it achieves high power against the alternative hypothesis of a stochastic nonstationary slope as well as against a purely deterministic slope. The test may be modified, parametrically or nonparametrically, to deal with serial correlation. Using both local limiting power arguments and finite sample Monte Carlo results, we compare the t-test with the nonparametric tests of Vogelsang (1998) and with a modified stationarity test. Overall the t-test seems a good choice, particularly if it is implemented by fitting a parametric model to the data. When standardized by the square root of the sample size, the simple t-statistic, with no correction for serial correlation, has a limiting distribution if the slope is stochastic. We investigate whether it is a viable test for the null hypothesis of a stochastic slope and conclude that its value may be limited by an inability to reject a small deterministic slope. Empirical illustrations are provided using series of relative prices in the euro-area and data on global temperature.
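
The simplest version of the slope test discussed above, the t-test on the mean of first differences with no serial-correlation correction, can be sketched as follows; the simulated series and its drift are invented for illustration.

```python
# Hedged sketch: t-test on the mean of first differences as a test for a slope component.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
T, beta = 200, 0.2                                       # invented series length and drift
y = beta * np.arange(T) + rng.normal(size=T).cumsum()    # random walk with drift

dy = np.diff(y)                          # first differences; their mean estimates the slope
t_stat, p_value = ttest_1samp(dy, popmean=0.0)
print(t_stat, p_value)
```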

15 Dec 2007
TL;DR: In this article, the authors use the statistical software package GPower to illustrate the importance of effect and sample size in optimising the probability of a study to detect treatment effects, without requiring these effects to be massive.
Abstract: Background. The issue of sample size has become a dominant concern for UK research ethics committees since their reform in 2004. Sample size estimation is now a major, but often misunderstood concern for researchers, academic supervisors and members of research ethics committees. Aim. To enable researchers and research ethics committee members with non-statistical backgrounds to use freely available statistical software to explore and address issues relating to sample size, effect size and power. Method. Basic concepts are examined before utilising the statistical software package GPower to illustrate the use of alpha level, beta level and effect size in sample size calculation. Examples involving t-tests, analysis of variance (ANOVA) and chi-square tests are used. Results. The examples illustrate the importance of effect and sample size in optimising the probability of a study to detect treatment effects, without requiring these effects to be massive. Conclusions. Researchers and research ethics committee members need to be familiar with the technicalities of sample size estimation in order to make informed judgements on sample size, power of tests and associated ethical issues. Alpha and power levels can be pre-specified, but effect size is more problematic. GPower may be used to replicate the examples in this paper, which may be generalised to more complex study designs.
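
A hedged example of the kind of calculation the paper carries out in GPower, reproduced here with statsmodels (an assumption, not software used in the paper): the total sample size for a one-way ANOVA with three groups, a medium effect (Cohen's f = 0.25), alpha = .05 and power = .80. The inputs are the conventional benchmarks, not figures taken from the paper.

```python
# Hedged sketch using statsmodels; effect size, alpha, and power are the usual benchmarks.
from statsmodels.stats.power import FTestAnovaPower

anova_power = FTestAnovaPower()
n_total = anova_power.solve_power(effect_size=0.25, alpha=0.05, power=0.80, k_groups=3)
print(n_total)        # total sample size; roughly 53 participants per group for these inputs
```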