
Showing papers on "Sample size determination" published in 2009


Journal ArticleDOI
TL;DR: A more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science is presented, and forthright advice on controversial or novel issues is offered.
Abstract: Statistical guidelines and expert statements are now available to assist in the analysis and reporting of studies in some biomedical disciplines. We present here a more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science. We offer forthright advice on the following controversial or novel issues: using precision of estimation for inferences about population effects in preference to null-hypothesis testing, which is inadequate for assessing clinical or practical importance; justifying sample size via acceptable precision or confidence for clinical decisions rather than via adequate power for statistical significance; showing SD rather than SEM, to better communicate the magnitude of differences in means and nonuniformity of error; avoiding purely nonparametric analyses, which cannot provide inferences about magnitude and are unnecessary; using regression statistics in validity studies, in preference to the impractical and biased limits of agreement; making greater use of qualitative methods to enrich sample-based quantitative projects; and seeking ethics approval for public access to the depersonalized raw data of a study, to address the need for more scrutiny of research and better meta-analyses. Advice on less contentious issues includes the following: using covariates in linear models to adjust for confounders, to account for individual differences, and to identify potential mechanisms of an effect; using log transformation to deal with nonuniformity of effects and error; identifying and deleting outliers; presenting descriptive, effect, and inferential statistics in appropriate formats; and contending with bias arising from problems with sampling, assignment, blinding, measurement error, and researchers' prejudices. This article should advance the field by stimulating debate, promoting innovative approaches, and serving as a useful checklist for authors, reviewers, and editors.

6,467 citations


Journal ArticleDOI
TL;DR: In this paper, a large-scale Monte Carlo simulation was conducted to compare the performance of covariance-based SEM (CBSEM) and partial least squares (PLS) analysis.

1,864 citations


Journal ArticleDOI
TL;DR: In this article, a large-scale Monte Carlo simulation was conducted to compare the performance of covariance-based SEM (CBSEM) and partial least squares (PLS) analysis.
Abstract: Variance-based SEM, also known under the term partial least squares (PLS) analysis, is an approach that has gained increasing interest among marketing researchers in recent years. During the last 25 years, more than 30 articles have been published in leading marketing journals that have applied this approach instead of the more traditional alternative of covariance-based SEM (CBSEM). However, although an analysis of these previous publications shows that there seems to be at least an implicit agreement about the factors that should drive the choice between PLS analysis and CBSEM, no research has until now empirically compared the performance of these approaches given a set of different conditions. Our study addresses this open question by conducting a large-scale Monte-Carlo simulation. We show that justifying the choice of PLS due to a lack of assumptions regarding indicator distribution and measurement scale is often inappropriate, as CBSEM proves extremely robust with respect to violations of its underlying distributional assumptions. Additionally, CBSEM clearly outperforms PLS in terms of parameter consistency and is preferable in terms of parameter accuracy as long as the sample size exceeds a certain threshold (250 observations). Nevertheless, PLS analysis should be preferred when the emphasis is on prediction and theory development, as the statistical power of PLS is always larger than or equal to that of CBSEM; already, 100 observations can be sufficient to achieve acceptable levels of statistical power given a certain quality of the measurement model.

1,378 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a critical review of existing blanket test procedures for goodness-of-fit testing of copula models and suggest new ones. They also describe and interpret the results of a large Monte Carlo experiment designed to assess the effect of the sample size and the strength of dependence on the level and power of the blanket tests for various combinations of copula models under the null hypothesis and the alternative.
Abstract: Many proposals have been made recently for goodness-of-fit testing of copula models. After reviewing them briefly, the authors concentrate on “blanket tests”, i.e., those whose implementation requires neither an arbitrary categorization of the data nor any strategic choice of smoothing parameter, weight function, kernel, window, etc. The authors present a critical review of these procedures and suggest new ones. They describe and interpret the results of a large Monte Carlo experiment designed to assess the effect of the sample size and the strength of dependence on the level and power of the blanket tests for various combinations of copula models under the null hypothesis and the alternative. To circumvent problems in the determination of the limiting distribution of the test statistics under composite null hypotheses, they recommend the use of a double parametric bootstrap procedure, whose implementation is detailed. They conclude with a number of practical recommendations.

995 citations


Journal ArticleDOI
TL;DR: Results showed that when data are well conditioned, EFA can yield reliable results for N well below 50, even in the presence of small distortions; such well-conditioned data may be uncommon but should certainly not be ruled out in behavioral research.
Abstract: Exploratory factor analysis (EFA) is generally regarded as a technique for large sample sizes (N), with N = 50 as a reasonable absolute minimum. This study offers a comprehensive overview of the conditions in which EFA can yield good quality results for N below 50. Simulations were carried out to estimate the minimum required N for different levels of loadings (λ), number of factors (f), and number of variables (p) and to examine the extent to which a small N solution can sustain the presence of small distortions such as interfactor correlations, model error, secondary loadings, unequal loadings, and unequal p/f. Factor recovery was assessed in terms of pattern congruence coefficients, factor score correlations, Heywood cases, and the gap size between eigenvalues. A subsampling study was also conducted on a psychological dataset of individuals who filled in a Big Five Inventory via the Internet. Results showed that when data are well conditioned (i.e., high λ, low f, high p), EFA can yield reliable results for N well below 50, even in the presence of small distortions. Such conditions may be uncommon but should certainly not be ruled out in behavioral research data.
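To make the design of such a simulation concrete, here is a minimal sketch in the same spirit: data are generated from a known loading matrix at small N, loadings are re-estimated, Procrustes-rotated towards the true pattern, and factor recovery is summarized by Tucker's congruence. The estimator, rotation, and all settings are simplifications chosen for illustration, not the paper's exact design.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from sklearn.decomposition import FactorAnalysis

def mean_congruence(n, loadings, rng):
    """One replication: simulate data from a known factor model at sample size n,
    re-estimate the loadings, rotate them towards the true pattern, and return
    the mean Tucker congruence across factors."""
    p, f = loadings.shape
    scores = rng.normal(size=(n, f))
    uniq_sd = np.sqrt(1.0 - np.sum(loadings ** 2, axis=1))   # unique standard deviations
    data = scores @ loadings.T + rng.normal(size=(n, p)) * uniq_sd
    est = FactorAnalysis(n_components=f).fit(data).components_.T  # p x f loading matrix
    rot, _ = orthogonal_procrustes(est, loadings)                 # align with the true pattern
    est = est @ rot
    num = np.sum(est * loadings, axis=0)
    den = np.sqrt(np.sum(est ** 2, axis=0) * np.sum(loadings ** 2, axis=0))
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
true_loadings = np.zeros((20, 2))
true_loadings[:10, 0] = 0.8     # a "well-conditioned" case: high loadings,
true_loadings[10:, 1] = 0.8     # few factors, many variables per factor
for n in (20, 30, 50):
    reps = [mean_congruence(n, true_loadings, rng) for _ in range(50)]
    print(n, round(float(np.mean(reps)), 3))
```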

777 citations


Journal ArticleDOI
TL;DR: D2 seems a better alternative than I2 for quantifying model variation in any random-effects meta-analysis, regardless of the between-trial variance estimator that constitutes the model.
Abstract: There is increasing awareness that meta-analyses require a sufficiently large information size to detect or reject an anticipated intervention effect. The required information size in a meta-analysis may be calculated from an anticipated a priori intervention effect or from an intervention effect suggested by trials with low-risk of bias. Information size calculations need to consider the total model variance in a meta-analysis to control type I and type II errors. Here, we derive an adjusting factor for the required information size under any random-effects model meta-analysis. We devise a measure of diversity (D 2) in a meta-analysis, which is the relative variance reduction when the meta-analysis model is changed from a random-effects into a fixed-effect model. D 2 is the percentage that the between-trial variability constitutes of the sum of the between-trial variability and a sampling error estimate considering the required information size. D 2 is different from the intuitively obvious adjusting factor based on the common quantification of heterogeneity, the inconsistency (I 2), which may underestimate the required information size. Thus, D 2 and I 2 are compared and interpreted using several simulations and clinical examples. In addition we show mathematically that diversity is equal to or greater than inconsistency, that is D 2 ≥ I 2, for all meta-analyses. We conclude that D 2 seems a better alternative than I 2 to consider model variation in any random-effects meta-analysis despite the choice of the between trial variance estimator that constitutes the model. Furthermore, D 2 can readily adjust the required information size in any random-effects model meta-analysis.
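In symbols, and following the abstract's description (the notation below is mine, not necessarily the paper's): with τ̂² the estimated between-trial variance, σ̂²_M the "typical" within-trial sampling variance behind I², and σ̂²_D the sampling-error term tied to the required information size,

```latex
I^2 = \frac{\hat\tau^2}{\hat\tau^2 + \hat\sigma_M^2},
\qquad
D^2 = \frac{\hat\tau^2}{\hat\tau^2 + \hat\sigma_D^2}
    = \frac{V_R - V_F}{V_R},
```

where V_R and V_F are the variances of the pooled estimate under the random-effects and fixed-effect models. Read this way, a fixed-effect information size calculation is inflated by the factor V_R/V_F = 1/(1 − D²), and the stated inequality D² ≥ I² corresponds to σ̂²_D ≤ σ̂²_M.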

701 citations


Book
01 Jan 2009
TL;DR: Although the book mainly focuses on medical and public health applications, it shows that the rigorous evidence of intervention effects provided by CRTs has the potential to inform public policy in a wide range of other areas.
Abstract: BASIC CONCEPTS Introduction Randomised Trials Variability between Clusters Introduction The Implications of between-Cluster Variability: Some Examples Measures of between-Cluster Variability The Design Effect Sources of within-Cluster Correlation Choosing whether to Randomise by Cluster Introduction Rationale for Cluster Randomisation Using Cluster Randomisation to Capture Indirect Effects of Intervention Disadvantages and Limitations of Cluster Randomisation

DESIGN ISSUES Choice of Clusters Introduction Types of Cluster Size of Clusters Strategies to Reduce Contamination Levels of Randomisation, Intervention, Data Collection, and Inference Matching and Stratification Introduction Rationale for Matching Disadvantages of Matching Stratification as an Alternative to Matching Choice of Matching Variables Choosing whether to Match or Stratify Randomisation Procedures Introduction Restricted Randomisation Some Practical Aspects of Randomisation Sample Size Introduction Sample Size for Unmatched Trials Sample Size for Matched and Stratified Trials Estimating the between-Cluster Coefficient of Variation Choice of Sample Size in Each Cluster Further Issues in Sample Size Calculation Alternative Study Designs Introduction Design Choices for Treatment Arms Design Choices for Impact Evaluation

ANALYTICAL METHODS Basic Principles of Analysis Introduction Experimental and Observational Units Parameters of Interest Approaches to Analysis Baseline Analysis Analysis Based on Cluster-Level Summaries Introduction Point Estimates of Intervention Effects Statistical Inference Based on the t Distribution Statistical Inference Based on a Quasilikelihood Approach Adjusting for Covariates Nonparametric Methods Analysing for Effect Modification Regression Analysis Based on Individual-Level Data Introduction Random Effects Models Generalised Estimating Equations Choice of Analytical Method Analysing for Effect Modification More Complex Analyses Analysis of Trials with More Complex Designs Introduction Analysis of Pair-Matched Trials Analysis of Stratified Trials Analysis of Other Study Designs

MISCELLANEOUS TOPICS Ethical Considerations Introduction General Principles Ethical Issues in Group Allocation Informed Consent in Cluster Randomised Trials Other Ethical Issues Conclusion Data Monitoring Introduction Data Monitoring Committees Interim Analyses Reporting and Interpretation Introduction Reporting of Cluster Randomised Trials Interpretation and Generalisability References Index
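The chapters on "Sample Size for Unmatched Trials" and "Estimating the between-Cluster Coefficient of Variation" correspond to a calculation of roughly the following form. This sketch is a paraphrase from memory of the Hayes-and-Moulton-style formula for proportions, so the exact constants and small-sample adjustments should be checked against the book before use.

```python
from scipy.stats import norm

def clusters_per_arm(p0, p1, m, k, alpha=0.05, power=0.8):
    """Approximate number of clusters required per arm for an unmatched
    cluster-randomised trial comparing proportions p0 (control) and p1
    (intervention), with m individuals sampled per cluster and k the
    between-cluster coefficient of variation of the true proportions.
    The leading '+1' adjustment and the exact form are quoted from memory."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    within = p0 * (1 - p0) / m + p1 * (1 - p1) / m   # binomial sampling variation
    between = k ** 2 * (p0 ** 2 + p1 ** 2)           # true between-cluster variation
    return 1 + z ** 2 * (within + between) / (p0 - p1) ** 2

# e.g. detecting a fall from 15% to 10% with 200 people per cluster and k = 0.25
print(clusters_per_arm(0.15, 0.10, m=200, k=0.25))
```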

695 citations


Journal ArticleDOI
TL;DR: A trial sequential analysis (TSA) may reduce risk of random errors due to repetitive testing of accumulating data by evaluating meta-analyses not reaching the information size with monitoring boundaries, analogous to sequential monitoring boundaries in a single trial.
Abstract: Background Random error may cause misleading evidence in meta-analyses. The required number of participants in a meta-analysis (i.e. information size) should be at least as large as an adequately powered single trial. Trial sequential analysis (TSA) may reduce risk of random errors due to repetitive testing of accumulating data by evaluating meta-analyses not reaching the information size with monitoring boundaries. This is analogous to sequential monitoring boundaries in a single trial. Methods We selected apparently conclusive (P < 0.05) Cochrane neonatal meta-analyses. We applied heterogeneity-adjusted and unadjusted TSA on these meta-analyses by calculating the information size, the monitoring boundaries, and the cumulative Z-statistic after each trial. We identified the proportion of meta-analyses that did not reach the required information size and the proportion of these meta-analyses in which the Z-curve did not cross the monitoring boundaries. Results Of 54 apparently conclusive meta-analyses, 39 (72%) did not reach the heterogeneity-adjusted information size required to accept or reject an intervention effect of 25% relative risk reduction. Of these 39, 19 meta-analyses (49%) were considered inconclusive, because the cumulative Z-curve did not cross the monitoring boundaries. The median number of participants required to reach the required information size was 1591 (range, 339–6149). TSA without heterogeneity adjustment largely confirmed these results. Conclusions Many apparently conclusive Cochrane neonatal meta-analyses may become inconclusive when the statistical analyses take into account the risk of random error due to repetitive testing.
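A bare-bones illustration of the mechanics follows: a fixed-effect cumulative Z-statistic recomputed after each trial, compared against an O'Brien-Fleming-shaped boundary z/√(information fraction), used here as a stand-in for the Lan-DeMets spending-function boundaries of TSA. The numbers and the boundary constant are illustrative only, not taken from the paper.

```python
import numpy as np

def cumulative_z(effects, variances):
    """Fixed-effect cumulative Z-statistic after each successive trial.
    effects, variances: per-trial effect estimates (e.g. log risk ratios) and
    their variances, in chronological order."""
    w = 1.0 / np.asarray(variances, dtype=float)
    cum_w = np.cumsum(w)
    pooled = np.cumsum(w * np.asarray(effects, dtype=float)) / cum_w
    return pooled * np.sqrt(cum_w)

def obf_shaped_boundary(info_fraction, z_final=1.96):
    """O'Brien-Fleming-shaped boundary z_final / sqrt(information fraction):
    very strict early, relaxing towards z_final at the required information
    size. In a real TSA the constant is calibrated with a Lan-DeMets
    alpha-spending function rather than fixed at 1.96."""
    return z_final / np.sqrt(np.asarray(info_fraction, dtype=float))

# hypothetical accrual against a required information size of 1600 participants
participants = np.array([120, 250, 420, 700, 1050])
info_frac = np.minimum(np.cumsum(participants) / 1600.0, 1.0)
z = cumulative_z(effects=[-0.40, -0.15, -0.30, -0.22, -0.25],
                 variances=[0.20, 0.12, 0.09, 0.05, 0.03])
print(np.abs(z) > obf_shaped_boundary(info_frac))   # boundary crossed at each look?
```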

683 citations


Journal ArticleDOI
TL;DR: Evaluating statistical inference with trial sequential monitoring boundaries when meta-analyses fall short of a required IS may reduce the risk of false positive results and of importantly inaccurate effect estimates.
Abstract: Background Results from apparently conclusive meta-analyses may be false. A limited number of events from a few small trials and the associated random error may be under-recognized sources of spurious findings. The information size (IS, i.e. number of participants) required for a reliable and conclusive meta-analysis should be no less rigorous than the sample size of a single, optimally powered randomized clinical trial. If a meta-analysis is conducted before a sufficient IS is reached, it should be evaluated in a manner that accounts for the increased risk that the result might represent a chance finding (i.e. applying trial sequential monitoring boundaries). Methods We analysed 33 meta-analyses with a sufficient IS to detect a treatment effect of 15% relative risk reduction (RRR). We successively monitored the results of the meta-analyses by generating interim cumulative meta-analyses after each included trial and evaluated their results using a conventional statistical criterion (α = 0.05) and two-sided Lan-DeMets monitoring boundaries. We examined the proportion of false positive results and important inaccuracies in estimates of treatment effects that resulted from the two approaches.

635 citations


Journal ArticleDOI
TL;DR: It is argued that the statistical power to detect a causative variant should be the major criterion in study design and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage.
Abstract: Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical “complete” chip containing all the SNPs in HapMap. Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
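As a point of reference for what "power by simulation" means here, the sketch below does the single-SNP version: simulate case-control allele counts under a multiplicative-odds model and count how often an allelic chi-square test clears a genome-wide threshold. The paper's method simulates whole LD regions from HapMap haplotypes and evaluates real chips; everything below (names, model, threshold) is an illustrative reduction, not the authors' software.

```python
import numpy as np
from scipy.stats import chi2_contingency

def power_by_simulation(maf, odds_ratio, n_cases, n_controls,
                        alpha=5e-8, n_sims=2000, rng=None):
    """Estimate power for a single causal SNP by simulating allelic
    case-control tables and counting rejections at the chosen threshold."""
    rng = rng or np.random.default_rng(1)
    odds_ctrl = maf / (1 - maf)
    p_case = odds_ratio * odds_ctrl / (1 + odds_ratio * odds_ctrl)  # risk-allele freq in cases
    hits = 0
    for _ in range(n_sims):
        case_alt = rng.binomial(2 * n_cases, p_case)
        ctrl_alt = rng.binomial(2 * n_controls, maf)
        table = [[case_alt, 2 * n_cases - case_alt],
                 [ctrl_alt, 2 * n_controls - ctrl_alt]]
        _, p, _, _ = chi2_contingency(table, correction=False)
        hits += p < alpha
    return hits / n_sims

print(power_by_simulation(maf=0.3, odds_ratio=1.3, n_cases=2000, n_controls=2000))
```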

608 citations


Journal ArticleDOI
01 May 2009-Oikos
TL;DR: Partial least squares regression analysis (PLSR) as mentioned in this paper is a statistical technique particularly well suited to analyzing a large array of related predictor variables (i.e. not truly independent), with a sample size not large enough compared to the number of independent variables, and in cases in which an attempt is made to approach complex phenomena or syndromes that must be defined as a combination of several variables obtained independently.
Abstract: This paper briefly presents the aims, requirements and results of partial least squares regression analysis (PLSR), and its potential utility in ecological studies. This statistical technique is particularly well suited to analyzing a large array of related predictor variables (i.e. not truly independent), with a sample size not large enough compared to the number of independent variables, and in cases in which an attempt is made to approach complex phenomena or syndromes that must be defined as a combination of several variables obtained independently. A simulation experiment is carried out to compare this technique with multiple regression (MR) and with a combination of principal component analysis and multiple regression (PCA+MR), varying the number of predictor variables and sample sizes. PLSR models explained a similar amount of variance to those results obtained by MR and PCA+MR. However, PLSR was more reliable than other techniques when identifying relevant variables and their magnitudes of influence, especially in cases of small sample size and low tolerance. Finally, we present one example of PLSR to illustrate its application and interpretation in ecology.
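For readers who want to try the technique, here is a minimal sketch with scikit-learn's PLSRegression on synthetic data of the kind the abstract describes (many correlated predictors, modest sample size). All names and numbers are illustrative and this is not the simulation design used in the paper.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 40, 25                          # small n relative to the number of predictors
latent = rng.normal(size=(n, 3))       # a few underlying "syndrome" axes
X = latent @ rng.normal(size=(3, p)) + 0.5 * rng.normal(size=(n, p))   # correlated predictors
y = latent @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.5, size=n)

pls = PLSRegression(n_components=3)
pls.fit(X, y)
print("cross-validated R^2:", cross_val_score(pls, X, y, cv=5).mean())
# x_weights_ (and VIP scores derived from them, not computed here) indicate
# which predictors drive each latent component.
print("first-component weights:", np.round(pls.x_weights_[:, 0], 2))
```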

Journal ArticleDOI
TL;DR: In this paper, the authors quantify two key characteristics of computer codes that affect the sample size required for a desired level of accuracy when approximating the code via a Gaussian process (GP) and provide reasons and evidence supporting the informal rule that the number of runs for an effective initial computer experiment should be about 10 times the input dimension.
Abstract: We provide reasons and evidence supporting the informal rule that the number of runs for an effective initial computer experiment should be about 10 times the input dimension. Our arguments quantify two key characteristics of computer codes that affect the sample size required for a desired level of accuracy when approximating the code via a Gaussian process (GP). The first characteristic is the total sensitivity of a code output variable to all input variables; the second corresponds to the way this total sensitivity is distributed across the input variables, specifically the possible presence of a few prominent input factors and many impotent ones (i.e., effect sparsity). Both measures relate directly to the correlation structure in the GP approximation of the code. In this way, the article moves toward a more formal treatment of sample size for a computer experiment. The evidence supporting these arguments stems primarily from a simulation study and via specific codes modeling climate and ligand activa...
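A small sketch of the rule in practice, assuming a Latin hypercube design of size 10·d and a GP fit via scikit-learn; the cheap test function stands in for an expensive computer code and is not from the article.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def code(X):
    """Stand-in for an expensive simulator: one strong input, two weaker ones."""
    return np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * X[:, 2]

d = 4
n = 10 * d                                    # the "10 times input dimension" rule
sampler = qmc.LatinHypercube(d=d, seed=0)
X_train = sampler.random(n)                   # design on [0, 1]^d
gp = GaussianProcessRegressor(kernel=RBF(length_scale=[0.3] * d),
                              alpha=1e-6, normalize_y=True).fit(X_train, code(X_train))

X_test = sampler.random(200)                  # hold-out points for accuracy check
rmse = np.sqrt(np.mean((gp.predict(X_test) - code(X_test)) ** 2))
print(f"{n}-run design, hold-out RMSE: {rmse:.3f}")
```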

Journal ArticleDOI
TL;DR: The unique factors influencing power in multilevel models and calculations for estimating power for simple fixed effects, variance components, and cross-level interactions are presented.
Abstract: The use of multilevel modeling to investigate organizational phenomena is rapidly increasing. Unfortunately, little advice is readily available for organizational researchers attempting to determin...

Journal ArticleDOI
TL;DR: The theoretical and simulation results and a real data example demonstrate that the use of stabilized weights in the pseudo data preserves the sample size of the original data, produces appropriate estimation of the variance of the main effect, and maintains an appropriate type I error rate.

Journal ArticleDOI
01 Apr 2009-Ecology
TL;DR: This work develops the first statistically rigorous nonparametric method for estimating the minimum number of additional individuals, samples, or sampling area required to detect any arbitrary proportion of the estimated asymptotic species richness.
Abstract: Biodiversity sampling is labor intensive, and a substantial fraction of a biota is often represented by species of very low abundance, which typically remain undetected by biodiversity surveys. Statistical methods are widely used to estimate the asymptotic number of species present, including species not yet detected. Additional sampling is required to detect and identify these species, but richness estimators do not indicate how much sampling effort (additional individuals or samples) would be necessary to reach the asymptote of the species accumulation curve. Here we develop the first statistically rigorous nonparametric method for estimating the minimum number of additional individuals, samples, or sampling area required to detect any arbitrary proportion (including 100%) of the estimated asymptotic species richness. The method uses the Chao1 and Chao2 nonparametric estimators of asymptotic richness, which are based on the frequencies of rare species in the original sampling data. To evaluate the performance of the proposed method, we randomly subsampled individuals or quadrats from two large biodiversity inventories (light trap captures of Lepidoptera in Great Britain and censuses of woody plants on Barro Colorado Island [BCI], Panama). The simulation results suggest that the method performs well but is slightly conservative for small sample sizes. Analyses of the BCI results suggest that the method is robust to nonindependence arising from small-scale spatial aggregation of species occurrences. When the method was applied to seven published biodiversity data sets, the additional sampling effort necessary to capture all the estimated species ranged from 1.05 to 10.67 times the original sample (median approximately equal to 2.23). Substantially less effort is needed to detect 90% of the species (0.33-1.10 times the original effort; median approximately equal to 0.80). An Excel spreadsheet tool is provided for calculating necessary sampling effort for either abundance data or replicated incidence data.
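The extrapolation formulas themselves are in the paper; as background, the Chao1 abundance-based estimator they build on is simple to compute (Chao2 is the incidence-based analogue, with singletons and doubletons replaced by uniques and duplicates). The toy data below are illustrative.

```python
import numpy as np

def chao1(abundances):
    """Chao1 lower-bound estimator of asymptotic species richness from
    abundance data, based on the counts of singletons (f1) and doubletons (f2).
    Uses the bias-corrected form when no doubletons are observed."""
    counts = np.asarray(abundances)
    s_obs = np.sum(counts > 0)
    f1 = np.sum(counts == 1)   # species seen exactly once
    f2 = np.sum(counts == 2)   # species seen exactly twice
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2.0

# toy community: 6 observed species, 3 singletons, 1 doubleton
print(chao1([12, 7, 2, 1, 1, 1]))   # 6 + 3^2 / (2 * 1) = 10.5 estimated species
```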

Proceedings Article
21 Jul 2009
TL;DR: Improved constants are given for data-dependent, variance-sensitive confidence bounds called empirical Bernstein bounds; these are extended to hold uniformly over classes of functions whose growth function is polynomial in the sample size n, and sample variance penalization is considered.
Abstract: We give improved constants for data dependent and variance sensitive confidence bounds, called empirical Bernstein bounds, and extend these inequalities to hold uniformly over classes of functions whose growth function is polynomial in the sample size n. The bounds lead us to consider sample variance penalization, a novel learning method which takes into account the empirical variance of the loss function. We give conditions under which sample variance penalization is effective. In particular, we present a bound on the excess risk incurred by the method. Using this, we argue that there are situations in which the excess risk of our method is of order 1/n, while the excess risk of empirical risk minimization is of order 1/√n. We show some experimental results, which confirm the theory. Finally, we discuss the potential application of our results to sample compression schemes.
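For orientation, the empirical Bernstein bound usually quoted from this line of work has the following shape for i.i.d. losses in [0, 1]; the constants are given from memory and should be checked against the paper.

```latex
% Z_1,\dots,Z_n \in [0,1] i.i.d., \bar{Z}_n the sample mean, V_n the sample variance.
\Pr\!\left[\, \mathbb{E}[Z] \;>\; \bar{Z}_n
  + \sqrt{\frac{2\,V_n \ln(2/\delta)}{n}}
  + \frac{7\ln(2/\delta)}{3(n-1)} \,\right] \;\le\; \delta .
```

The variance term is what drives the 1/n versus 1/√n distinction in the abstract: when the empirical variance of the loss is small, the √(V_n/n) term is negligible and the bound is dominated by the O(1/n) term, which is the situation sample variance penalization is designed to exploit.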

Journal ArticleDOI
TL;DR: An approximate Bayes factor is described that is straightforward to use and is appropriate when sample sizes are large, and various choices of the prior on the effect size are considered, including those that allow effect size to vary with the minor allele frequency of the marker.
Abstract: The Bayes factor is a summary measure that provides an alternative to the P-value for the ranking of associations, or the flagging of associations as "significant". We describe an approximate Bayes factor that is straightforward to use and is appropriate when sample sizes are large. We consider various choices of the prior on the effect size, including those that allow effect size to vary with the minor allele frequency (MAF) of the marker. An important contribution is the description of a specific prior that gives identical rankings between Bayes factors and P-values, providing a link between the two approaches, and allowing the implications of the use of P-values to be more easily understood. As a summary measure of noteworthiness P-values are difficult to calibrate since their interpretation depends on MAF and, crucially, on sample size. A consequence is that a consistent decision-making procedure using P-values requires a threshold for significance that reduces with sample size, contrary to common practice.
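One way to see the construction (my notation, derived from the normal approximations the abstract relies on rather than copied from the paper): with θ̂ the estimated log odds ratio, V its estimated variance, Z = θ̂/√V, and a N(0, W) prior on θ under the alternative, the marginal distributions of θ̂ are N(0, V) under H0 and N(0, V + W) under H1, giving

```latex
\mathrm{ABF}_{10}
  = \frac{N(\hat\theta;\,0,\,V+W)}{N(\hat\theta;\,0,\,V)}
  = \sqrt{\frac{V}{V+W}}\;
    \exp\!\left(\frac{Z^{2}}{2}\cdot\frac{W}{V+W}\right).
```

Letting the prior variance W depend on minor allele frequency gives the MAF-dependent priors discussed in the abstract. Note that the paper may define the Bayes factor in the reciprocal direction (evidence for the null), so the orientation should be checked before reuse.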

Journal ArticleDOI
TL;DR: If several small studies are pooled without consideration of the bias introduced by the inherent mathematical properties of the logistic regression model, researchers may be misled into erroneous interpretation of the results.
Abstract: In epidemiological studies, researchers use logistic regression as an analytical tool to study the association of a binary outcome with a set of possible exposures. Using a simulation study, we illustrate how the analytically derived bias of odds ratio modelling in logistic regression varies as a function of the sample size. Logistic regression overestimates odds ratios in studies with small to moderate sample sizes. The bias induced by small sample size is systematic: bias away from the null. Regression coefficient estimates shift away from zero, and odds ratios away from one. If several small studies are pooled without consideration of the bias introduced by the inherent mathematical properties of the logistic regression model, researchers may be misled into erroneous interpretation of the results.
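The kind of simulation the abstract describes is easy to reproduce in outline: the sketch below refits a one-exposure logistic model at several sample sizes and reports the average estimated odds ratio, which drifts away from the true value (and from 1) as n shrinks. Scenario parameters and the handling of separated samples are my own choices, not the paper's.

```python
import numpy as np
import statsmodels.api as sm

def mean_estimated_or(n, true_or=2.0, p_exposed=0.3, baseline=0.2,
                      n_sims=2000, rng=None):
    """Average estimated odds ratio from repeated logistic fits at sample size n,
    illustrating small-sample bias away from the null."""
    rng = rng or np.random.default_rng(0)
    beta0 = np.log(baseline / (1 - baseline))
    beta1 = np.log(true_or)
    ors = []
    for _ in range(n_sims):
        x = rng.binomial(1, p_exposed, size=n)
        prob = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
        y = rng.binomial(1, prob)
        if len(np.unique(y[x == 0])) < 2 or len(np.unique(y[x == 1])) < 2:
            continue  # separation: no finite MLE, so the odds ratio is undefined
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        ors.append(np.exp(fit.params[1]))
    return float(np.mean(ors))

for n in (30, 100, 500):
    print(n, round(mean_estimated_or(n), 2))  # average OR drifts upward as n shrinks
```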

Journal ArticleDOI
TL;DR: This study compares some basic feature-selection methods in settings involving thousands of features, using both model-based synthetic data and real data, and evaluates the performance of feature-selection algorithms for different distribution models and classifiers.

Journal ArticleDOI
TL;DR: In this article, the effects of dimension, sample size, and strength of dependence on the nominal level and power of several copula goodness-of-fit approaches, three of which are proposed in this paper, are examined.
Abstract: Several copula goodness-of-fit approaches are examined, three of which are proposed in this paper. Results are presented from an extensive Monte Carlo study, where we examine the effect of dimension, sample size and strength of dependence on the nominal level and power of the different approaches. While no approach is always the best, some stand out and conclusions and recommendations are made. A novel study of p-value variation due to permutation order, for approaches based on Rosenblatt's transformation is also carried out. Results show significant variation due to permutation order for some of the approaches based on this transform. However, when approaching rejection regions, the additional variation is negligible.

Journal ArticleDOI
TL;DR: Only in certain cases, for instance, in estimating a value of the cumulative distribution function and when the assumed model is very different from the true model, can the use of dichotomized outcomes be considered a reasonable approach.
Abstract: Dichotomization is the transformation of a continuous outcome (response) to a binary outcome. This approach, while somewhat common, is harmful from the viewpoint of statistical estimation and hypothesis testing. We show that this leads to loss of information, which can be large. For normally distributed data, this loss in terms of Fisher's information is at least 1-2/pi (or 36%). In other words, 100 continuous observations are statistically equivalent to 158 dichotomized observations. The amount of information lost depends greatly on the prior choice of cut points, with the optimal cut point depending upon the unknown parameters. The loss of information leads to loss of power or conversely a sample size increase to maintain power. Only in certain cases, for instance, in estimating a value of the cumulative distribution function and when the assumed model is very different from the true model, can the use of dichotomized outcomes be considered a reasonable approach.
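The headline numbers follow from the stated efficiency: for normally distributed data split at the optimal (median) cut point, the relative efficiency of the dichotomized analysis is 2/π, so

```latex
\text{information loss} \;\ge\; 1 - \frac{2}{\pi} \approx 0.363,
\qquad
n_{\text{dichotomized}} \;=\; \frac{n_{\text{continuous}}}{2/\pi}
  \;=\; \frac{\pi}{2}\, n_{\text{continuous}},
```

so 100 continuous observations correspond to about (π/2) × 100 ≈ 157–158 dichotomized ones, matching the figure in the abstract.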

Journal ArticleDOI
TL;DR: In this paper, the authors extend the scope of empirical likelihood methodology in three directions: to allow for plug-in estimates of nuisance parameters in estimating equations, slower than root-n rates of convergence, and settings in which there are a relatively large number of estimating equations compared to the sample size.
Abstract: This article extends the scope of empirical likelihood methodology in three directions: to allow for plug-in estimates of nuisance parameters in estimating equations, slower than root-n rates of convergence, and settings in which there are a relatively large number of estimating equations compared to the sample size. Calibrating empirical likelihood confidence regions with plug-in is sometimes intractable due to the complexity of the asymptotics, so we introduce a bootstrap approximation that can be used in such situations. We provide a range of examples from survival analysis and nonparametric statistics to illustrate the main results.

Journal ArticleDOI
TL;DR: In this paper, a rigorous simulation-based approach to power calculation that deals more comprehensively with analytic complexity has been implemented on the web as ESPRESSO: (www.p3gobservatory.org/powercalculator.htm).
Abstract: Background Despite earlier doubts, a string of recent successes indicates that if sample sizes are large enough, it is possible—both in theory and in practice—to identify and replicate genetic associations with common complex diseases. But human genome epidemiology is expensive and, from a strategic perspective, it is still unclear what ‘large enough’ really means. This question has critical implications for governments, funding agencies, bioscientists and the tax-paying public. Difficult strategic decisions with imposing price tags and important opportunity costs must be taken. Methods Conventional power calculations for case–control studies disregard many basic elements of analytic complexity—e.g. errors in clinical assessment, and the impact of unmeasured aetiological determinants—and can seriously underestimate true sample size requirements. This article describes, and applies, a rigorous simulation-based approach to power calculation that deals more comprehensively with analytic complexity and has been implemented on the web as ESPRESSO: (www.p3gobservatory.org/powercalculator.htm). Results Using this approach, the article explores the realistic power profile of stand-alone and nested case–control studies in a variety of settings and provides a robust quantitative foundation for determining the required sample size both of individual biobanks and of large disease-based consortia. Despite universal acknowledgment of the importance of large sample sizes, our results suggest that contemporary initiatives are still, at best, at the lower end of the range of desirable sample size. Insufficient power remains particularly problematic for studies exploring gene–gene or gene–environment interactions. Discussion Sample size calculation must be both accurate and realistic, and we must continue to strengthen national and international cooperation in the design, conduct, harmonization and integration of studies in human genome epidemiology.
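The closing point about interactions can be illustrated with the simplest possible simulation-based power loop: simulate a cohort with a gene-environment interaction on the log-odds scale, refit a logistic model, and count how often the interaction term is detected. ESPRESSO additionally models assessment errors, unmeasured determinants and other analytic complexity omitted here; all parameter names and values below are illustrative.

```python
import numpy as np
import statsmodels.api as sm

def gxe_interaction_power(n, maf=0.3, p_env=0.5, or_g=1.2, or_e=1.3, or_gxe=1.5,
                          baseline=0.1, alpha=0.05, n_sims=500, rng=None):
    """Simulation-based power for the gene-environment interaction term in a
    logistic model fitted to a cohort of size n."""
    rng = rng or np.random.default_rng(0)
    b0 = np.log(baseline / (1 - baseline))
    hits = 0
    for _ in range(n_sims):
        g = rng.binomial(2, maf, size=n)            # additive genotype coding 0/1/2
        e = rng.binomial(1, p_env, size=n)          # binary environmental exposure
        logit = b0 + np.log(or_g) * g + np.log(or_e) * e + np.log(or_gxe) * g * e
        y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
        X = sm.add_constant(np.column_stack([g, e, g * e]).astype(float))
        fit = sm.Logit(y, X).fit(disp=0)
        hits += fit.pvalues[3] < alpha              # column order: const, g, e, g*e
    return hits / n_sims

print(gxe_interaction_power(5000))  # interaction terms need far larger n than comparable main effects
```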

Journal ArticleDOI
TL;DR: A hypothetical case is used to illustrate the process of designing a population genetics project, and results from simulations are presented that address several issues for maximizing statistical power to detect differentiation while minimizing the amount of effort in developing SNPs.
Abstract: Single nucleotide polymorphisms (SNPs) have been proposed by some as the new frontier for population studies, and several papers have presented theoretical and empirical evidence reporting the advantages and limitations of SNPs. As a practical matter, however, it remains unclear how many SNP markers will be required or what the optimal characteristics of those markers should be in order to obtain sufficient statistical power to detect different levels of population differentiation. We use a hypothetical case to illustrate the process of designing a population genetics project, and present results from simulations that address several issues for maximizing statistical power to detect differentiation while minimizing the amount of effort in developing SNPs. Results indicate that (i) while ~30 SNPs should be sufficient to detect moderate (F(ST) = 0.01) levels of differentiation, studies aimed at detecting demographic independence (e.g. F(ST) < 0.005) may require 80 or more SNPs and large sample sizes; (ii) different SNP allele frequencies have little effect on power, and thus, selection of SNPs can be relatively unbiased; (iii) increasing the sample size has a strong effect on power, so that the number of loci can be minimized when sample number is known, and increasing sample size is almost always beneficial; and (iv) power is increased by including multiple SNPs within loci and inferring haplotypes, rather than trying to use only unlinked SNPs. This also has the practical benefit of reducing the SNP ascertainment effort, and may influence the decision of whether to seek SNPs in coding or noncoding regions.

Journal ArticleDOI
12 May 2009-BMJ
TL;DR: Sample size calculation is still inadequately reported, often erroneous, and based on assumptions that are frequently inaccurate, raising questions about how sample size is calculated in randomised controlled trials.
Abstract: Objectives To assess quality of reporting of sample size calculation, ascertain accuracy of calculations, and determine the relevance of assumptions made when calculating sample size in randomised controlled trials. Design Review. Data sources We searched MEDLINE for all primary reports of two arm parallel group randomised controlled trials of superiority with a single primary outcome published in six high impact factor general medical journals between 1 January 2005 and 31 December 2006. All extra material related to design of trials (other articles, online material, online trial registration) was systematically assessed. Data extracted by use of a standardised form included parameters required for sample size calculation and corresponding data reported in results sections of articles. We checked completeness of reporting of the sample size calculation, systematically replicated the sample size calculation to assess its accuracy, then quantified discrepancies between a priori hypothesised parameters necessary for calculation and a posteriori estimates. Results Of the 215 selected articles, 10 (5%) did not report any sample size calculation and 92 (43%) did not report all the required parameters. The difference between the sample size reported in the article and the replicated sample size calculation was greater than 10% in 47 (30%) of the 157 reports that gave enough data to recalculate the sample size. The difference between the assumptions for the control group and the observed data was greater than 30% in 31% (n=45) of articles and greater than 50% in 17% (n=24). Only 73 trials (34%) reported all data required to calculate the sample size, had an accurate calculation, and used accurate assumptions for the control group. Conclusions Sample size calculation is still inadequately reported, often erroneous, and based on assumptions that are frequently inaccurate. Such a situation raises questions about how sample size is calculated in randomised controlled trials.

Journal ArticleDOI
TL;DR: This paper examined the relationship between sample size and effect size in education and found that there was a significant negative correlation between sample sizes and effect sizes, and the differences in effect sizes between small and large experiments were much greater than those between randomized and matched experiments.
Abstract: Research in fields other than education has found that studies with small sample sizes tend to have larger effect sizes than those with large samples. This article examines the relationship between sample size and effect size in education. It analyzes data from 185 studies of elementary and secondary mathematics programs that met the standards of the Best Evidence Encyclopedia. As predicted, there was a significant negative correlation between sample size and effect size. The differences in effect sizes between small and large experiments were much greater than those between randomized and matched experiments. Explanations for the effects of sample size on effect size are discussed.

Journal ArticleDOI
Khaled H. Hamed
TL;DR: In this paper, a procedure is presented for calculating the exact distribution of the Mann-Kendall trend test statistic for persistent data with an arbitrary correlation structure, illustrated for the AR(1) (first-order autoregressive) model and the fractional Gaussian noise (FGN) model.

Journal ArticleDOI
TL;DR: The SSADT model when the degradation path follows a gamma process is introduced and, under the constraint that the total experimental cost does not exceed a pre-specified budget, the optimal settings such as sample size, measurement frequency, and termination time are obtained by minimizing the approximate variance of the estimated MTTF of the lifetime distribution of the product.
Abstract: Step-stress accelerated degradation testing (SSADT) is a useful tool for assessing the lifetime distribution of highly reliable products (under a typical-use condition) when the available test items are very few. Recently, an optimal SSADT plan was proposed based on the assumption that the underlying degradation path follows a Wiener process. However, the degradation model of many materials (especially in the case of fatigue data) may be more appropriately modeled by a gamma process which exhibits a monotone increasing pattern. Hence, in practice, designing an efficient SSADT plan for a gamma degradation process is of great interest. In this paper, we first introduce the SSADT model when the degradation path follows a gamma process. Next, under the constraint that the total experimental cost does not exceed a pre-specified budget, the optimal settings such as sample size, measurement frequency, and termination time are obtained by minimizing the approximate variance of the estimated MTTF of the lifetime distribution of the product. Finally, an example is presented to illustrate the proposed method.

Journal ArticleDOI
TL;DR: Two methods for estimating confidence intervals (CIs) around absolute and relative changes in outcomes calculated from segmented regression parameter estimates are described and illustrated.

Journal ArticleDOI
TL;DR: A randomized algorithm is proposed that provides a probabilistic solution circumventing the potential conservatism of the bounds previously derived, and it is proved that the required sample size is inversely proportional to the accuracy for fixed confidence.
Abstract: In this paper, we study two general semi-infinite programming problems by means of a randomized strategy based on statistical learning theory. The sample size results obtained with this approach are generally considered to be very conservative by the control community. The first main contribution of this paper is to demonstrate that this is not necessarily the case. Utilizing as a starting point one-sided results from statistical learning theory, we obtain bounds on the number of required samples that are manageable for "reasonable" values of probabilistic confidence and accuracy. In particular, we show that the number of required samples grows with the accuracy parameter ε as (1/ε) ln(1/ε), and this is a significant improvement when compared to the existing bounds, which depend on (1/ε²) ln(1/ε²). Secondly, we present new results for optimization and feasibility problems involving Boolean expressions consisting of polynomials. In this case, when the accuracy parameter is sufficiently small, an explicit bound that only depends on the number of decision variables, and on the confidence and accuracy parameters is presented. For convex optimization problems, we also prove that the required sample size is inversely proportional to the accuracy for fixed confidence. Thirdly, we propose a randomized algorithm that provides a probabilistic solution circumventing the potential conservatism of the bounds previously derived.