Showing papers on "Sample size determination" published in 2012


Journal ArticleDOI
TL;DR: A straightforward guide to understanding, selecting, calculating, and interpreting effect sizes for many types of data and to methods for calculating effect size confidence intervals and power analysis is provided.
Abstract: The Publication Manual of the American Psychological Association (American Psychological Association, 2001, American Psychological Association, 2010) calls for the reporting of effect sizes and their confidence intervals. Estimates of effect size are useful for determining the practical or theoretical importance of an effect, the relative contributions of factors, and the power of an analysis. We surveyed articles published in 2009 and 2010 in the Journal of Experimental Psychology: General, noting the statistical analyses reported and the associated reporting of effect size estimates. Effect sizes were reported for fewer than half of the analyses; no article reported a confidence interval for an effect size. The most often reported analysis was analysis of variance, and almost half of these reports were not accompanied by effect sizes. Partial η2 was the most commonly reported effect size estimate for analysis of variance. For t tests, 2/3 of the articles did not report an associated effect size estimate; Cohen's d was the most often reported. We provide a straightforward guide to understanding, selecting, calculating, and interpreting effect sizes for many types of data and to methods for calculating effect size confidence intervals and power analysis.
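
As a concrete illustration of the kind of calculation such a guide covers, the sketch below computes Cohen's d for two independent groups from the pooled standard deviation, together with a large-sample confidence interval. The data, function names, and the normal-approximation standard error are illustrative assumptions, not material taken from the article.

    import numpy as np
    from scipy import stats

    def cohens_d(x, y):
        """Cohen's d for two independent samples, using the pooled SD."""
        nx, ny = len(x), len(y)
        pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
        return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

    def d_confidence_interval(d, nx, ny, level=0.95):
        """Approximate CI for d based on its large-sample standard error."""
        se = np.sqrt((nx + ny) / (nx * ny) + d ** 2 / (2 * (nx + ny)))
        z = stats.norm.ppf(1 - (1 - level) / 2)
        return d - z * se, d + z * se

    # hypothetical data
    rng = np.random.default_rng(1)
    x = rng.normal(0.5, 1.0, 40)
    y = rng.normal(0.0, 1.0, 40)
    d = cohens_d(x, y)
    print(d, d_confidence_interval(d, len(x), len(y)))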

3,117 citations


Journal ArticleDOI
TL;DR: A simulation study compared the performance of robust normal theory maximum likelihood (ML) and robust categorical least squares (cat-LS) methodology for estimating confirmatory factor analysis models with ordinal variables and found cat-LS to be more sensitive to sample size and to violations of the assumption of normality of the underlying continuous variables.
Abstract: A simulation study compared the performance of robust normal theory maximum likelihood (ML) and robust categorical least squares (cat-LS) methodology for estimating confirmatory factor analysis models with ordinal variables. Data were generated from 2 models with 2-7 categories, 4 sample sizes, 2 latent distributions, and 5 patterns of category thresholds. Results revealed that factor loadings and robust standard errors were generally most accurately estimated using cat-LS, especially with fewer than 5 categories; however, factor correlations and model fit were assessed equally well with ML. Cat-LS was found to be more sensitive to sample size and to violations of the assumption of normality of the underlying continuous variables. Normal theory ML was found to be more sensitive to asymmetric category thresholds and was especially biased when estimating large factor loadings. Accordingly, we recommend cat-LS for data sets containing variables with fewer than 5 categories and ML when there are 5 or more categories, sample size is small, and category thresholds are approximately symmetric. With 6-7 categories, results were similar across methods for many conditions; in these cases, either method is acceptable.

1,472 citations


Journal ArticleDOI
TL;DR: The essentials in calculating power and sample size for a wide range of applied study designs are covered.
Abstract: Determining the optimal sample size for a study assures an adequate power to detect statistical significance. Hence, it is a critical step in the design of a planned research protocol. Using too many participants in a study is expensive and exposes more subjects than necessary to the procedure. Similarly, if the study is underpowered, it will be statistically inconclusive and may make the whole protocol a failure. This paper covers the essentials in calculating power and sample size for a variety of applied study designs. Sample size computations for a single group mean, survey-type studies, two-group studies based on means and on proportions or rates, correlation studies, and case-control studies assessing a categorical outcome are presented in detail.
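
For example, the familiar normal-approximation formula for comparing two independent means, n per group = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sigma / delta)^2, can be sketched as follows. The effect size, SD, and function name here are hypothetical, and this is only one of the several designs the paper covers.

    from math import ceil
    from scipy.stats import norm

    def n_per_group_two_means(delta, sigma, alpha=0.05, power=0.80):
        """Per-group n to detect a mean difference delta (common SD sigma), two-sided test."""
        z_alpha = norm.ppf(1 - alpha / 2)
        z_beta = norm.ppf(power)
        return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

    print(n_per_group_two_means(delta=5, sigma=10, alpha=0.05, power=0.80))  # about 63 per group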

691 citations


Journal ArticleDOI
TL;DR: A definition of effect size is proposed, which is purposely more inclusive than the way many have defined and conceptualized effect size, and it is unique with regard to linking effect size to a question of interest.
Abstract: The call for researchers to report and interpret effect sizes and their corresponding confidence intervals has never been stronger. However, there is confusion in the literature on the definition of effect size, and consequently the term is used inconsistently. We propose a definition for effect size, discuss 3 facets of effect size (dimension, measure/index, and value), outline 10 corollaries that follow from our definition, and review ideal qualities of effect sizes. Our definition of effect size is general and subsumes many existing definitions of effect size. We define effect size as a quantitative reflection of the magnitude of some phenomenon that is used for the purpose of addressing a question of interest. Our definition of effect size is purposely more inclusive than the way many have defined and conceptualized effect size, and it is unique with regard to linking effect size to a question of interest. Additionally, we review some important developments in the effect size literature and discuss the importance of accompanying an effect size with an interval estimate that acknowledges the uncertainty with which the population value of the effect size has been estimated. We hope that this article will facilitate discussion and improve the practice of reporting and interpreting effect sizes.

689 citations


Journal ArticleDOI
TL;DR: Trialists should calculate the appropriate size of a pilot study, just as they should the size of the main RCT, taking into account the twin needs to demonstrate efficiency in terms of recruitment and to produce precise estimates of treatment effect.

524 citations


Journal ArticleDOI
TL;DR: In this paper, the authors provide guidelines for determining the sample size (number of individuals and number of measurements per individual) required to accurately estimate the intraclass correlation coefficient (ICC).
Abstract: 1. Researchers frequently take repeated measurements of individuals in a sample with the goal of quantifying the proportion of the total variation that can be attributed to variation among individuals vs. variation among measurements within individuals. The proportion of the variation attributed to variation among individuals is known as repeatability and is most frequently estimated as the intraclass correlation coefficient (ICC). The goal of our study is to provide guidelines for determining the sample size (number of individuals and number of measurements per individual) required to accurately estimate the ICC.
2. We report a range of ICCs from the literature and estimate 95% confidence intervals for these estimates. We introduce a predictive equation derived by Bonett (2002), and we test the assumptions of this equation through simulation. Finally, we create an R statistical package for the planning of experiments and estimation of ICCs.
3. Repeatability estimates were reported in 1.5% of the articles published in the journals surveyed. Repeatabilities tended to be highest when the ICC was used to estimate measurement error and lowest when it was used to estimate repeatability of behavioural and physiological traits. Few authors report confidence intervals, but our estimated 95% confidence intervals for published ICCs generally indicated a low level of precision associated with these estimates. This survey demonstrates the need for a protocol to estimate repeatability.
4. Analysis of the predictions from Bonett's equation over a range of sample sizes, expected repeatabilities and desired confidence interval widths yields both analytical and intuitive guidelines for designing experiments to estimate repeatability. However, we find a tendency for the confidence interval to be underestimated by the equation when ICCs are high and overestimated when ICCs and the number of measurements per individual are low.
5. The sample size to use when estimating repeatability is a question pitting investigator effort against expected precision of the estimate. We offer guidelines that apply over a wide variety of ecological and evolutionary studies estimating repeatability, measurement error or heritability. Additionally, we provide the R package, icc, to facilitate analyses and determine the most economic use of resources when planning experiments to estimate repeatability.
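
A minimal sketch of the quantity being planned for: the one-way ANOVA ICC together with its exact F-based confidence interval. Note that this is the standard textbook interval, not Bonett's (2002) approximation and not the authors' icc package; the simulated data, layout (individuals by measurements), and names are assumptions.

    import numpy as np
    from scipy.stats import f

    def icc_oneway(data, alpha=0.05):
        """ICC(1) from a one-way ANOVA on an (n individuals x k measurements) array,
        with the exact F-based confidence interval (not Bonett's approximation)."""
        n, k = data.shape
        grand = data.mean()
        ms_between = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n - 1)
        ms_within = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
        icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
        F = ms_between / ms_within
        f_upper = f.ppf(1 - alpha / 2, n - 1, n * (k - 1))
        f_lower = f.ppf(alpha / 2, n - 1, n * (k - 1))
        fl, fu = F / f_upper, F / f_lower
        return icc, ((fl - 1) / (fl + k - 1), (fu - 1) / (fu + k - 1))

    # simulated repeated measurements: 30 individuals, 4 measurements each
    rng = np.random.default_rng(2)
    ind_effect = rng.normal(0, 1.0, size=(30, 1))         # among-individual variation
    data = ind_effect + rng.normal(0, 0.7, size=(30, 4))  # within-individual "error"
    print(icc_oneway(data))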

505 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that the common factors based on maximum likelihood are consistent for the size of the cross-section (n) and the sample size (T) going to infinity along any path of n and T and therefore maximum likelihood is viable for n large.
Abstract: Is maximum likelihood suitable for factor models in large cross-sections of time series? We answer this question from both an asymptotic and an empirical perspective. We show that estimates of the common factors based on maximum likelihood are consistent for the size of the cross-section (n) and the sample size (T) going to infinity along any path of n and T and that therefore maximum likelihood is viable for n large. The estimator is robust to misspecification of the cross-sectional and time series correlation of the idiosyncratic components. In practice, the estimator can be easily implemented using the Kalman smoother and the EM algorithm as in traditional factor analysis.

497 citations


Journal ArticleDOI
TL;DR: Under the correct conditions, multiple instrument analyses are a promising approach for Mendelian randomisation studies, and further research is required into multiple imputation methods to address missing data issues in IV estimation.
Abstract: Mendelian randomisation analyses use genetic variants as instrumental variables (IVs) to estimate causal effects of modifiable risk factors on disease outcomes. Genetic variants typically explain a small proportion of the variability in risk factors; hence Mendelian randomisation analyses can require large sample sizes. However, an increasing number of genetic variants have been found to be robustly associated with disease-related outcomes in genome-wide association studies. Use of multiple instruments can improve the precision of IV estimates, and also permit examination of underlying IV assumptions. We discuss the use of multiple genetic variants in Mendelian randomisation analyses with continuous outcome variables where all relationships are assumed to be linear. We describe possible violations of IV assumptions, and how multiple instrument analyses can be used to identify them. We present an example using four adiposity-associated genetic variants as IVs for the causal effect of fat mass on bone density, using data on 5509 children enrolled in the ALSPAC birth cohort study. We also use simulation studies to examine the effect of different sets of IVs on precision and bias. When each instrument independently explains variability in the risk factor, use of multiple instruments increases the precision of IV estimates. However, inclusion of weak instruments could increase finite sample bias. Missing data on multiple genetic variants can diminish the available sample size, compared with single instrument analyses. In simulations with additive genotype-risk factor effects, IV estimates using a weighted allele score had similar properties to estimates using multiple instruments. Under the correct conditions, multiple instrument analyses are a promising approach for Mendelian randomisation studies. Further research is required into multiple imputation methods to address missing data issues in IV estimation.
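
A minimal sketch of how multiple genetic variants can be combined in an instrumental-variable estimate via two-stage least squares. The simulated data, effect sizes, and function name are hypothetical; this is not the ALSPAC analysis described in the paper, only an illustration of the multiple-instrument idea.

    import numpy as np

    def two_stage_least_squares(G, X, Y):
        """2SLS estimate of the causal effect of X on Y using instruments G (n x m matrix).
        Stage 1: regress X on G; stage 2: regress Y on the stage-1 fitted values."""
        n = len(Y)
        G1 = np.column_stack([np.ones(n), G])
        x_hat = G1 @ np.linalg.lstsq(G1, X, rcond=None)[0]   # first-stage fitted values
        X1 = np.column_stack([np.ones(n), x_hat])
        return np.linalg.lstsq(X1, Y, rcond=None)[0][1]

    # simulated toy example (not the ALSPAC data): 4 independent biallelic variants
    rng = np.random.default_rng(3)
    n, true_effect = 5000, 0.3
    G = rng.binomial(2, 0.3, size=(n, 4)).astype(float)
    U = rng.normal(size=n)                                   # unmeasured confounder
    X = G @ np.array([0.2, 0.15, 0.1, 0.1]) + U + rng.normal(size=n)
    Y = true_effect * X - 0.8 * U + rng.normal(size=n)
    print(two_stage_least_squares(G, X, Y))                  # should be near 0.3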

494 citations


Journal ArticleDOI
TL;DR: Among transformation approaches, a general purpose rank-based inverse normal transformation was most beneficial; however, when samples were both small and extremely nonnormal, the permutation test often outperformed other alternatives, including various bootstrap tests.
Abstract: It is well known that when data are nonnormally distributed, a test of the significance of Pearson's r may inflate Type I error rates and reduce power. Statistics textbooks and the simulation literature provide several alternatives to Pearson's correlation. However, the relative performance of these alternatives has been unclear. Two simulation studies were conducted to compare 12 methods, including Pearson, Spearman's rank-order, transformation, and resampling approaches. With most sample sizes (n ≥ 20), Type I and Type II error rates were minimized by transforming the data to a normal shape prior to assessing the Pearson correlation. Among transformation approaches, a general purpose rank-based inverse normal transformation (i.e., transformation to rankit scores) was most beneficial. However, when samples were both small (n ≤ 10) and extremely nonnormal, the permutation test often outperformed other alternatives, including various bootstrap tests.
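
A small sketch of the rank-based inverse normal (rankit) transformation applied before computing Pearson's r, assuming the common (rank - 0.5)/n form; the example data are made up.

    import numpy as np
    from scipy import stats

    def rankit(x):
        """Rank-based inverse normal transformation (rankit scores)."""
        ranks = stats.rankdata(x)                   # average ranks for ties
        return stats.norm.ppf((ranks - 0.5) / len(x))

    # hypothetical skewed data
    rng = np.random.default_rng(4)
    x = rng.lognormal(size=50)
    y = 0.5 * x + rng.lognormal(size=50)
    print(stats.pearsonr(x, y))                     # raw, nonnormal data
    print(stats.pearsonr(rankit(x), rankit(y)))     # after transformation to a normal shape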

471 citations


Journal ArticleDOI
TL;DR: Monte Carlo simulation was used more extensively than previous research to evaluate PLS, multiple regression, and LISREL in terms of accuracy and statistical power under varying conditions of sample size, normality of the data, number of indicators per construct, reliability of the indicators, and complexity of the research model.
Abstract: There is a pervasive belief in the MIS research community that PLS has advantages over other techniques when analyzing small sample sizes or data with non-normal distributions. Based on these beliefs, major MIS journals have published studies using PLS with sample sizes that would be deemed unacceptably small if used with other statistical techniques. We used Monte Carlo simulation more extensively than previous research to evaluate PLS, multiple regression, and LISREL in terms of accuracy and statistical power under varying conditions of sample size, normality of the data, number of indicators per construct, reliability of the indicators, and complexity of the research model. We found that PLS performed as effectively as the other techniques in detecting actual paths, and not falsely detecting non-existent paths. However, because PLS (like regression) apparently does not compensate for measurement error, PLS and regression were consistently less accurate than LISREL. When used with small sample sizes, PLS, like the other techniques, suffers from increased standard deviations, decreased statistical power, and reduced accuracy. All three techniques were remarkably robust against moderate departures from normality, and equally so. In total, we found that the similarities in results across the three techniques were much stronger than the differences.

459 citations


Journal ArticleDOI
TL;DR: A much lower sample size was required with a strong effect size, common SNP, and increased LD, and it was found that case-parent studies require more samples than case-control studies.
Abstract: A sample size with sufficient statistical power is critical to the success of genetic association studies to detect causal genes of human complex diseases. Genome-wide association studies require much larger sample sizes to achieve an adequate statistical power. We estimated the statistical power with increasing numbers of markers analyzed and compared the sample sizes that were required in case-control studies and case-parent studies. We computed the effective sample size and statistical power using Genetic Power Calculator. An analysis using a larger number of markers requires a larger sample size. Testing a single-nucleotide polymorphism (SNP) marker requires 248 cases, while testing 500,000 SNPs and 1 million markers requires 1,206 cases and 1,255 cases, respectively, under the assumption of an odds ratio of 2, 5% disease prevalence, 5% minor allele frequency, complete linkage disequilibrium (LD), 1:1 case/control ratio, and a 5% error rate in an allelic test. Under a dominant model, a smaller sample size is required to achieve 80% power than other genetic models. We found that a much lower sample size was required with a strong effect size, common SNP, and increased LD. In addition, studying a common disease in a case-control study of a 1:4 case-control ratio is one way to achieve higher statistical power. We also found that case-parent studies require more samples than case-control studies. Although we have not covered all plausible cases in study design, the estimates of sample size and statistical power computed under various assumptions in this study may be useful to determine the sample size in designing a population-based genetic association study.
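
A simplified sketch of a two-proportion (allelic test) sample size calculation with a Bonferroni-adjusted significance level. Here the case and control allele frequencies are supplied directly rather than derived from the odds ratio, prevalence, MAF, and LD as the Genetic Power Calculator does, so the numbers will not reproduce those in the abstract; the function name and inputs are assumptions.

    from math import ceil
    from scipy.stats import norm

    def cases_for_allelic_test(p_case, p_control, n_markers=1, alpha=0.05, power=0.80, ratio=1.0):
        """Number of cases for comparing allele frequencies between cases and controls,
        with a Bonferroni-corrected two-sided alpha. ratio = controls per case;
        each individual contributes 2 alleles."""
        a = alpha / n_markers
        z_a, z_b = norm.ppf(1 - a / 2), norm.ppf(power)
        p_bar = (p_case + ratio * p_control) / (1 + ratio)
        num = (z_a * ((1 + 1 / ratio) * p_bar * (1 - p_bar)) ** 0.5
               + z_b * (p_case * (1 - p_case) + p_control * (1 - p_control) / ratio) ** 0.5) ** 2
        alleles = num / (p_case - p_control) ** 2
        return ceil(alleles / 2)   # allele count -> individuals

    print(cases_for_allelic_test(0.30, 0.20, n_markers=500_000))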

Journal ArticleDOI
TL;DR: A simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves outperformed an un-weighted algorithm described in previous literature and can help researchers determine annotation sample size for supervised machine learning.
Abstract: Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target. We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness of fit measures. As control we used an un-weighted fitting method. A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the data set and sampling method, it took between 80 to 560 annotated samples to achieve mean average and root mean squared error below 0.01. Results also show that our weighted fitting method outperformed the baseline un-weighted method (p < 0.05). This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an un-weighted algorithm described in previous literature. It can help researchers determine annotation sample size for supervised machine learning.
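
A rough sketch of the core idea: fitting an inverse power law learning curve by weighted nonlinear least squares and extrapolating to larger sample sizes. The functional form a - b*n^(-c), the weights, and the learning-curve points are assumptions for illustration, not the authors' implementation.

    import numpy as np
    from scipy.optimize import curve_fit

    def inverse_power_law(n, a, b, c):
        """Classifier performance as a function of training set size n."""
        return a - b * n ** (-c)

    # hypothetical learning-curve points: (training set size, observed accuracy)
    sizes = np.array([50, 100, 200, 400, 800], dtype=float)
    accuracy = np.array([0.71, 0.76, 0.80, 0.83, 0.85])
    sigma = 1.0 / np.sqrt(sizes)        # assumed noise model: points from larger samples weigh more

    params, cov = curve_fit(inverse_power_law, sizes, accuracy, p0=[0.9, 0.5, 0.5], sigma=sigma)
    print(params)
    print(inverse_power_law(5000, *params))   # predicted accuracy at a larger annotation budget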

Journal ArticleDOI
TL;DR: A criterion is proposed for increasing the sample size based on variance estimates obtained during the computation of a batch gradient, and an O(1/ε) complexity bound is established on the total cost of a gradient method.
Abstract: This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large-scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient. We establish an O(1/ε) complexity bound on the total cost of a gradient method. The second part of the paper describes a practical Newton method that uses a smaller sample to compute Hessian vector-products than to evaluate the function and the gradient, and that also employs a dynamic sampling technique. The focus of the paper shifts in the third part to L1-regularized problems designed to produce sparse solutions. We propose a Newton-like method that consists of two phases: a (minimalistic) gradient projection phase that identifies zero variables, and a subspace phase that applies a subsampled Hessian Newton iteration in the free variables. Numerical tests on speech recognition problems illustrate the performance of the algorithms.
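
A rough sketch, in the spirit of the criterion described, of a variance test that decides whether the current batch is large enough for the averaged gradient to be trusted, and of a rule for growing the batch when it is not. The threshold theta, the growth rule, and the fake gradients are assumed values, not the paper's exact conditions.

    import numpy as np

    def sample_size_sufficient(per_example_grads, theta=0.9):
        """Variance test: the batch is deemed large enough when the estimated variance of the
        averaged gradient is small relative to its squared norm.
        per_example_grads: (|S|, d) array of individual gradients in the current batch."""
        S, _ = per_example_grads.shape
        g = per_example_grads.mean(axis=0)
        var_of_mean = per_example_grads.var(axis=0, ddof=1).sum() / S
        return var_of_mean <= (theta ** 2) * np.dot(g, g)

    def next_sample_size(per_example_grads, theta=0.9):
        """If the test fails, suggest a batch size that would satisfy it (assumed growth rule)."""
        g = per_example_grads.mean(axis=0)
        var_total = per_example_grads.var(axis=0, ddof=1).sum()
        return int(np.ceil(var_total / ((theta ** 2) * np.dot(g, g))))

    rng = np.random.default_rng(11)
    grads = rng.normal(loc=0.05, scale=1.0, size=(128, 20))   # fake per-example gradients
    print(sample_size_sufficient(grads), next_sample_size(grads))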

Journal ArticleDOI
14 Aug 2012-PLOS ONE
TL;DR: It is shown that the population sample size can be significantly reduced when using an appropriate estimator and a large number of bi-allelic genetic markers, and conservation genetic studies can now obtain almost the same statistical power as studies performed on model organisms using markers developed with next-generation sequencing.
Abstract: Population genetic studies provide insights into the evolutionary processes that influence the distribution of sequence variants within and among wild populations. FST is among the most widely used measures for genetic differentiation and plays a central role in ecological and evolutionary genetic studies. It is commonly thought that large sample sizes are required in order to precisely infer FST and that small sample sizes lead to overestimation of genetic differentiation. Until recently, studies in ecological model organisms incorporated a limited number of genetic markers, but since the emergence of next generation sequencing, the panel size of genetic markers available even in non-reference organisms has rapidly increased. In this study we examine whether a large number of genetic markers can substitute for small sample sizes when estimating FST. We tested the behavior of three different estimators that infer FST and that are commonly used in population genetic studies. By simulating populations, we assessed the effects of sample size and the number of markers on the various estimates of genetic differentiation. Furthermore, we tested the effect of ascertainment bias on these estimates. We show that the population sample size can be significantly reduced (as small as n = 4–6) when using an appropriate estimator and a large number of bi-allelic genetic markers (k>1,000). Therefore, conservation genetic studies can now obtain almost the same statistical power as studies performed on model organisms using markers developed with next-generation sequencing.
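
To make the setting concrete, here is a sketch of one commonly used FST estimator (Hudson's, computed as a ratio of averages across many biallelic markers) applied to simulated data with very small samples and many SNPs. It is not necessarily one of the three estimators compared in the paper, and the drift model and parameters are assumptions.

    import numpy as np

    def hudson_fst(p1, p2, n1, n2):
        """Hudson's FST estimator for two populations, as a ratio of averages over biallelic loci.
        p1, p2: arrays of sample allele frequencies; n1, n2: numbers of sampled alleles."""
        num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den = p1 * (1 - p2) + p2 * (1 - p1)
        return num.mean() / den.mean()

    # toy simulation: many markers, very small samples per population
    rng = np.random.default_rng(5)
    k, n1, n2 = 2000, 8, 8                        # 2000 SNPs, 4 diploid individuals per population
    anc = rng.uniform(0.1, 0.9, k)                # ancestral allele frequencies
    pop1 = rng.beta(anc * 20, (1 - anc) * 20)     # drifted population frequencies
    pop2 = rng.beta(anc * 20, (1 - anc) * 20)
    p1 = rng.binomial(n1, pop1) / n1              # sample allele frequencies
    p2 = rng.binomial(n2, pop2) / n2
    print(hudson_fst(p1, p2, n1, n2))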

01 Jan 2012
TL;DR: In this article, sample standardized differences for continuous and categorical variables are presented, along with guidance on interpreting the results and a SAS macro that performs the calculation without using the IML procedure.
Abstract: Standardized difference scores are intuitive indexes which measure the effect size between two groups. Compared to a t-test or Wilcoxon rank-sum test, they are independent of sample size. Thus, their use can be recommended for comparing baseline covariates in clinical trials as well as propensity-score matched studies. In this paper, we show how to calculate sample standardized differences for continuous and categorical variables and how to interpret results. We also provide a SAS macro which performs the calculation without using the IML procedure.
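
For readers working outside SAS, a minimal Python sketch of the two standard formulas (continuous and binary covariates); the example values and the commonly cited 0.1 balance threshold are illustrative, not taken from the paper.

    import numpy as np

    def std_diff_continuous(x_treated, x_control):
        """Standardized difference for a continuous covariate between two groups."""
        m1, m0 = np.mean(x_treated), np.mean(x_control)
        v1, v0 = np.var(x_treated, ddof=1), np.var(x_control, ddof=1)
        return (m1 - m0) / np.sqrt((v1 + v0) / 2)

    def std_diff_binary(p_treated, p_control):
        """Standardized difference for a binary covariate, given group proportions."""
        return (p_treated - p_control) / np.sqrt(
            (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2)

    # hypothetical baseline covariates; values near or below 0.1 are often read as balanced
    rng = np.random.default_rng(6)
    print(std_diff_continuous(rng.normal(62, 10, 200), rng.normal(60, 10, 200)))
    print(std_diff_binary(0.45, 0.40))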

Journal ArticleDOI
TL;DR: Panel data and multivariate latent state-trait models are used to isolate reliable occasion-specific variance from random error and to estimate reliability for scores from single-item life satisfaction measures.
Abstract: Life satisfaction is often assessed using single-item measures. However, estimating the reliability of these measures can be difficult because internal consistency coefficients cannot be calculated. Existing approaches use longitudinal data to isolate occasion-specific variance from variance that is either completely stable or variance that changes systematically over time. In these approaches, reliable occasion-specific variance is typically treated as measurement error, which would negatively bias reliability estimates. In the current studies, panel data and multivariate latent state-trait models are used to isolate reliable occasion-specific variance from random error and to estimate reliability for scores from single-item life satisfaction measures. Across four nationally representative panel studies with a combined sample size of over 68,000, reliability estimates increased by an average of 16% when the multivariate model was used instead of the more standard univariate longitudinal model.

Journal ArticleDOI
TL;DR: A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW) test and the two-sample t-test for increasing sample size, concluding that non-parametric tests are most useful for small studies.
Abstract: During the last 30 years, the median sample size of research studies published in high-impact medical journals has increased manyfold, while the use of non-parametric tests has increased at the expense of t-tests. This paper explores this paradoxical practice and illustrates its consequences. A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW) test and the two-sample t-test for increasing sample size. Samples are drawn from skewed distributions with equal means and medians but with a small difference in spread. A hypothetical case study is used for illustration and motivation. The WMW test produces, on average, smaller p-values than the t-test. This discrepancy increases with increasing sample size, skewness, and difference in spread. For heavily skewed data, the proportion of p<0.05 with the WMW test can be greater than 90% if the standard deviations differ by 10% and the number of observations is 1000 in each group. The high rejection rates of the WMW test should be interpreted as the power to detect that the probability that a random sample from one of the distributions is less than a random sample from the other distribution is greater than 50%. Non-parametric tests are most useful for small studies. Using non-parametric tests in large studies may provide answers to the wrong question, thus confusing readers. For studies with a large sample size, t-tests and their corresponding confidence intervals can and should be used even for heavily skewed data.
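
A small simulation sketch of the comparison described: two skewed (shifted lognormal) distributions chosen numerically to share approximately the same mean and median but to differ in spread, with rejection rates of Welch's t-test and the WMW test tracked as n grows. All parameters are assumptions for illustration and will not reproduce the paper's figures.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)

    def draw(n):
        """Two shifted lognormal samples with approximately equal mean and median but
        different spread; the constants were chosen numerically for illustration only."""
        a = rng.lognormal(mean=0.0, sigma=0.5, size=n)                 # median 1.00, mean ~1.13
        b = 0.647 + rng.lognormal(mean=-1.0413, sigma=0.8, size=n)     # median ~1.00, mean ~1.13
        return a, b

    def rejection_rates(n, reps=1000, alpha=0.05):
        t_rej = w_rej = 0
        for _ in range(reps):
            a, b = draw(n)
            t_rej += stats.ttest_ind(a, b, equal_var=False).pvalue < alpha
            w_rej += stats.mannwhitneyu(a, b, alternative='two-sided').pvalue < alpha
        return t_rej / reps, w_rej / reps

    for n in (25, 100, 1000):
        print(n, rejection_rates(n))   # t-test stays near alpha; WMW rejections grow with n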

Journal ArticleDOI
TL;DR: In this article, the number of factors and the factor loadings are estimated in terms of an eigenanalysis for a nonnegative definite matrix, which is applicable when the dimension of time series is on the order of a few thousands.
Abstract: This paper deals with the factor modeling for high-dimensional time series based on a dimension-reduction viewpoint. Under stationary settings, the inference is simple in the sense that both the number of factors and the factor loadings are estimated in terms of an eigenanalysis for a nonnegative definite matrix, and is therefore applicable when the dimension of time series is on the order of a few thousands. Asymptotic properties of the proposed method are investigated under two settings: (i) the sample size goes to infinity while the dimension of time series is fixed; and (ii) both the sample size and the dimension of time series go to infinity together. In particular, our estimators for zero-eigenvalues enjoy faster convergence (or slower divergence) rates, hence making the estimation for the number of factors easier. In particular, when the sample size and the dimension of time series go to infinity together, the estimators for the eigenvalues are no longer consistent. However, our estimator for the number of the factors, which is based on the ratios of the estimated eigenvalues, still works fine. Furthermore, this estimation shows the so-called "blessing of dimensionality" property in the sense that the performance of the estimation may improve when the dimension of time series increases. A two-step procedure is investigated when the factors are of different degrees of strength. Numerical illustration with both simulated and real data is also reported.
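
A rough sketch of the eigenvalue-ratio idea described in the abstract: form a nonnegative definite matrix from lagged sample autocovariances and estimate the number of factors from ratios of its ordered eigenvalues. The lag choice, the simulated data, and the implementation details are assumptions, not the authors' procedure verbatim.

    import numpy as np

    def estimate_num_factors(y, max_lag=2, r_max=10):
        """Eigenvalue-ratio estimate of the number of factors for a (T x p) time series:
        build M = sum_k Sigma_y(k) Sigma_y(k)^T over lags k = 1..max_lag and pick the
        number of factors minimizing the ratio of successive eigenvalues of M."""
        T, p = y.shape
        yc = y - y.mean(axis=0)
        M = np.zeros((p, p))
        for k in range(1, max_lag + 1):
            sigma_k = yc[k:].T @ yc[:-k] / T      # lag-k sample autocovariance
            M += sigma_k @ sigma_k.T
        eigvals = np.sort(np.linalg.eigvalsh(M))[::-1]
        ratios = eigvals[1:r_max + 1] / eigvals[:r_max]
        return int(np.argmin(ratios)) + 1

    # toy data: 3 AR(1) factors loaded onto p = 50 series plus noise
    rng = np.random.default_rng(8)
    T, p, r = 400, 50, 3
    x = np.zeros((T, r))
    for t in range(1, T):
        x[t] = 0.7 * x[t - 1] + rng.normal(size=r)
    A = rng.normal(size=(p, r))
    y = x @ A.T + rng.normal(size=(T, p))
    print(estimate_num_factors(y))                # expected: 3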

Journal ArticleDOI
TL;DR: In this article, the effect of all-combinations model strategy and model averaging strategies on parameter estimates and variable selection was investigated in the Cormack-Jolly-Seber data type.
Abstract: One challenge an analyst often encounters when dealing with complex mark–recapture models is how to limit the number of a priori models. While all possible combinations of model structures on the different parameters (e.g., ϕ, p) can be considered, such a strategy often results in a burdensome number of models, leading to the use of ad hoc strategies to reduce the number of models constructed. For the Cormack–Jolly–Seber data type, one example of an ad hoc strategy is to hold a general ϕ model structure constant while investigating model structures on p, and then to hold the resulting best structure on p constant and investigate structures on ϕ. Many comparable strategies exist. The effect of following ad hoc strategies on parameter estimates as well as for variable selection and whether model averaging can ameliorate any problems are unknown. By means of a simulation study, we have investigated this informational gap by comparing the all-combinations model building strategy with two ad hoc strategies and with truth, as well as considering the results of model averaging. We found that model selection strategy had little effect on parameter estimator bias and precision and that model averaging did improve bias and precision slightly. In terms of variable selection (i.e., cumulative Akaike’s information criterion weights), model sets based on ad hoc strategies did not perform as well as those based on all combinations, as less important variables often had higher weights with the former than with the all possible combinations strategy. Increased sample size resulted in increased variable weights, with an infinite sample size resulting in all variable weights equaling 1 for variables with any predictive influence. Thus, the distinction between statistical importance (dependent on sample size) and biological importance must be recognized when utilizing cumulative weights. We recommend that all-combinations model strategy and model averaging be used. However, if an ad hoc strategy is relied upon to reduce the computational demand, parameter estimates will generally be comparable to the all-combinations strategy, but variable weights will not correspond to the all-combinations strategy.

Journal ArticleDOI
TL;DR: Two computer simulations were conducted to examine the findings of previous studies of testing mediation models and found that stagnation and decreases in statistical power as a function of the effect size of the a path occurred primarily when the path between M and Y, b, was small.
Abstract: Previous studies of different methods of testing mediation models have consistently found two anomalous results. The first result is elevated Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap tests not found in nonresampling tests or in resampling tests that did not include a bias correction. This is of special concern as the bias-corrected bootstrap is often recommended and used due to its higher statistical power compared with other tests. The second result is statistical power reaching an asymptote far below 1.0 and in some conditions even declining slightly as the size of the relationship between X and M, a, increased. Two computer simulations were conducted to examine these findings in greater detail. Results from the first simulation found that the increased Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap are a function of an interaction between the size of the individual paths making up the mediated effect and the sample size, such...

Journal ArticleDOI
TL;DR: This paper presents a method that explicitly incorporates a prespecified probability of achieving the prespecified width or lower limit of a confidence interval, and the resultant closed-form formulas are shown to be very accurate.
Abstract: The number of subjects required to estimate the intraclass correlation coefficient in a reliability study has usually been determined on the basis of the expected width of a confidence interval. However, this approach fails to explicitly consider the probability of achieving the desired interval width and may thus provide sample sizes that are too small to have adequate chance of achieving the desired precision. In this paper, we present a method that explicitly incorporates a prespecified probability of achieving the prespecified width or lower limit of a confidence interval. The resultant closed-form formulas are shown to be very accurate. Copyright © 2012 John Wiley & Sons, Ltd.

Journal ArticleDOI
20 Dec 2012-PLOS ONE
TL;DR: The statistical approaches for several tests of hypothesis and power/sample size calculations are detailed and applied to taxonomic abundance distribution and rank abundance distribution data using HMP Jumpstart data on 24 subjects for saliva, subgingival, and supragingival samples.
Abstract: This paper presents new biostatistical methods for the analysis of microbiome data based on a fully parametric approach using all the data. The Dirichlet-multinomial distribution allows the analyst to calculate power and sample sizes for experimental design, perform tests of hypotheses (e.g., compare microbiomes across groups), and to estimate parameters describing microbiome properties. The use of a fully parametric model for these data has the benefit over alternative non-parametric approaches such as bootstrapping and permutation testing, in that this model is able to retain more information contained in the data. This paper details the statistical approaches for several tests of hypothesis and power/sample size calculations, and applies them for illustration to taxonomic abundance distribution and rank abundance distribution data using HMP Jumpstart data on 24 subjects for saliva, subgingival, and supragingival samples. Software for running these analyses is available.
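
A minimal sketch of the generative building block behind simulation-based design with this model: drawing Dirichlet-multinomial taxon counts for a set of samples. The taxon proportions and overdispersion value are made-up parameters, and this is not the authors' software.

    import numpy as np

    def dirichlet_multinomial(alpha, n_reads, n_samples, rng):
        """Draw taxon count vectors: per-sample composition ~ Dirichlet(alpha),
        counts ~ Multinomial(n_reads, composition)."""
        compositions = rng.dirichlet(alpha, size=n_samples)
        return np.array([rng.multinomial(n_reads, p) for p in compositions])

    rng = np.random.default_rng(9)
    base = np.array([0.4, 0.3, 0.2, 0.1])     # hypothetical mean taxon proportions
    theta = 0.02                              # assumed overdispersion; smaller alpha sum = more spread
    alpha = base * (1 - theta) / theta
    counts = dirichlet_multinomial(alpha, n_reads=1000, n_samples=24, rng=rng)
    print(counts[:3])
    print(counts.mean(axis=0) / 1000)         # should be close to the base proportions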

Journal ArticleDOI
TL;DR: In this article, an estimator that is asymptotically equivalent to an oracle estimator suggested in previous work is presented, based on nonlinear transformations of the sample eigenvalues.
Abstract: Many statistical applications require an estimate of a covariance matrix and/or its inverse. When the matrix dimension is large compared to the sample size, which happens frequently, the sample covariance matrix is known to perform poorly and may suffer from ill-conditioning. There already exists an extensive literature concerning improved estimators in such situations. In the absence of further knowledge about the structure of the true covariance matrix, the most successful approach so far, arguably, has been shrinkage estimation. Shrinking the sample covariance matrix to a multiple of the identity, by taking a weighted average of the two, turns out to be equivalent to linearly shrinking the sample eigenvalues to their grand mean, while retaining the sample eigenvectors. Our paper extends this approach by considering nonlinear transformations of the sample eigenvalues. We show how to construct an estimator that is asymptotically equivalent to an oracle estimator suggested in previous work. As demonstrated in extensive Monte Carlo simulations, the resulting bona fide estimator can result in sizeable improvements over the sample covariance matrix and also over linear shrinkage.
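
To make the baseline concrete, a small sketch of linear shrinkage of the sample eigenvalues toward their grand mean while keeping the sample eigenvectors. The shrinkage intensity here is a fixed, arbitrary assumption, whereas the paper's contribution is a nonlinear, data-driven transformation of the eigenvalues.

    import numpy as np

    def linear_shrinkage_covariance(X, rho):
        """Shrink the sample eigenvalues toward their grand mean with intensity rho,
        keeping the sample eigenvectors (rho = 0: sample covariance, rho = 1: scaled identity)."""
        S = np.cov(X, rowvar=False)
        eigval, eigvec = np.linalg.eigh(S)
        shrunk = (1 - rho) * eigval + rho * eigval.mean()
        return eigvec @ np.diag(shrunk) @ eigvec.T

    # toy example: dimension comparable to the sample size
    rng = np.random.default_rng(10)
    p, n = 80, 100
    true_cov = np.diag(np.linspace(0.5, 3.0, p))
    X = rng.multivariate_normal(np.zeros(p), true_cov, size=n)
    S_shrunk = linear_shrinkage_covariance(X, rho=0.3)        # rho chosen arbitrarily here
    print(np.linalg.cond(np.cov(X, rowvar=False)), np.linalg.cond(S_shrunk))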

Journal ArticleDOI
TL;DR: It is shown that the most accurate characterizations are achieved by using prior knowledge of where to expect neurodegeneration (hippocampus and parahippocampal gyrus) and that feature selection does improve the classification accuracies, but it depends on the method adopted.

Journal ArticleDOI
TL;DR: Sample size is an element of research design that significantly affects the validity and clinical relevance of the findings identified in research studies, and appropriate planning optimises the likelihood of finding an important result that is both clinically and statistically meaningful.

Journal ArticleDOI
TL;DR: Respondent-driven sampling should be regarded as a (potentially superior) form of convenience sampling method, and caution is required when interpreting findings based on the sampling method.
Abstract: BACKGROUND: Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total population data. METHODS: Total population data on age, tribe, religion, socioeconomic status, sexual activity, and HIV status were available on a population of 2402 male household heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, using current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). RESULTS: We recruited 927 household heads. Full and small RDS samples were largely representative of the total population, but both samples underrepresented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven sampling statistical inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven sampling bootstrap 95% confidence intervals included the population proportion. CONCLUSIONS: Respondent-driven sampling produced a generally representative sample of this well-connected nonhidden population. However, current respondent-driven sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience sampling method, and caution is required when interpreting findings based on the sampling method.

Journal ArticleDOI
TL;DR: The author has given the definition of important key terms used in the calculation of sample size but left out the distinction between one-tailed and two-tailed situations, which is an important issue.
Abstract: Dear Editor, Thanks for writing a nice editorial on the importance of sample size and calculation in medical research.[1] The principle of sample size calculation and formulas to determine an adequate sample have been explained for testing the hypothesis for a single mean, two means and two proportions.[2] The author has given the definition of important key terms used in the calculation of sample size but left out the distinction between one-tailed and two-tailed situations, which is an important issue. Also, the standardized difference (generally called effect size) compares two populations, and it is equal to the clinically important difference between the populations divided by the standard deviation (SD) of the population, assuming both populations' SDs are equal. In addition, there are many lapses in formulas and their description given by the author.

Journal ArticleDOI
TL;DR: Clinical trials should employ an ITT analysis strategy, comprising a design that attempts to follow up all randomised individuals, a main analysis that is valid under a stated plausible assumption about the missing data, and sensitivity analyses that include allrandomised individuals in order to explore the impact of departures from the assumption underlying the main analysis.
Abstract: Background: Intention-to-treat (ITT) analysis requires all randomised individuals to be included in the analysis in the groups to which they were randomised. However, there is confusion about how ITT analysis should be performed in the presence of missing outcome data. Purposes: To explain, justify, and illustrate an ITT analysis strategy for randomised trials with incomplete outcome data. Methods: We consider several methods of analysis and compare their underlying assumptions, plausibility, and numbers of individuals included. We illustrate the ITT analysis strategy using data from the UK700 trial in the management of severe mental illness. Results: Depending on the assumptions made about the missing data, some methods of analysis that include all randomised individuals may be less valid than methods that do not include all randomised individuals. Furthermore, some methods of analysis that include all randomised individuals are essentially equivalent to methods that do not include all randomised individuals. Limita...