
Showing papers on "Sample size determination" published in 2001


01 Jan 2001
TL;DR: In this article, the procedures for determining sample size for continuous and categorical variables using Cochran's (1977) formulas are described, and a table is provided that can be used to select the sample size for a research problem based on three alpha levels and a set error rate.
Abstract: The determination of sample size is a common task for many organizational researchers. Inappropriate, inadequate, or excessive sample sizes continue to influence the quality and accuracy of research. This manuscript describes the procedures for determining sample size for continuous and categorical variables using Cochran’s (1977) formulas. A discussion and illustration of sample size formulas, including the formula for adjusting the sample size for smaller populations, is included. A table is provided that can be used to select the sample size for a research problem based on three alpha levels and a set error rate. Procedures for determining the appropriate sample size for multiple regression and factor analysis, and common issues in sample size determination are examined. Non-respondent sampling issues are addressed.
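
A minimal sketch of the Cochran (1977) calculations the abstract describes, assuming z = 1.96 for alpha = .05; the example numbers are illustrative, not taken from the paper's table:

```python
import math

Z = 1.96  # two-sided alpha = .05

def cochran_continuous(sd, margin, z=Z):
    """Cochran's n0 for a continuous variable: n0 = (z^2 * s^2) / d^2."""
    return (z ** 2 * sd ** 2) / margin ** 2

def cochran_categorical(p, margin, z=Z):
    """Cochran's n0 for a proportion: n0 = (z^2 * p * (1 - p)) / d^2."""
    return (z ** 2 * p * (1 - p)) / margin ** 2

def small_population_adjustment(n0, population):
    """One common form of Cochran's correction: n = n0 / (1 + n0 / N)."""
    return n0 / (1 + n0 / population)

# Illustrative only: 5-point scale, sd guessed at 1.25 (range / 4),
# acceptable margin of error of 3% of the 5-point range (0.15 points).
n0 = cochran_continuous(sd=1.25, margin=0.15)
print(math.ceil(n0))                                      # ~267
print(math.ceil(small_population_adjustment(n0, 1679)))   # adjusted for N = 1679
```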

3,170 citations


Journal ArticleDOI
TL;DR: In this paper, the authors evaluated candidate funnel plot axes: standard error, precision (the inverse of the standard error), variance, the inverse of the variance, sample size, and log sample size for the vertical axis, plotted against log odds ratio, log risk ratio, and risk difference for the horizontal axis.

2,661 citations


Journal ArticleDOI
TL;DR: Suggestions for successful and meaningful sample size determination are offered and criticism is made of some ill-advised shortcuts relating to power and sample size.
Abstract: Sample size determination is often an important step in planning a statistical study—and it is usually a difficult one. Among the important hurdles to be surpassed, one must obtain an estimate of one or more error variances and specify an effect size of importance. There is the temptation to take some shortcuts. This article offers some suggestions for successful and meaningful sample size determination. Also discussed is the possibility that sample size may not be the main issue, that the real goal is to design a high-quality study. Finally, criticism is made of some ill-advised shortcuts relating to power and sample size.
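
As a concrete illustration of the two hurdles the abstract mentions (an error-variance estimate and an effect size of importance), here is a hedged sketch of the standard normal-approximation sample size formula for comparing two means; it is generic textbook material, not this article's own procedure:

```python
from scipy.stats import norm

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Two-sample comparison of means, normal approximation:
    n per group = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * ((z_a + z_b) * sigma / delta) ** 2

# One must still supply an error-variance estimate (sigma) and an effect
# size of importance (delta) -- the two inputs the article emphasizes.
print(round(n_per_group(sigma=10, delta=5), 1))  # roughly 63 per group
```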

1,060 citations


Journal ArticleDOI
TL;DR: Based on the empirical type I error rates, a regression of treatment effect on sample size, weighted by the inverse of the variance of the logit of the pooled proportion (using the marginal total) is the preferred method.
Abstract: Meta-analyses are subject to bias for many reasons, including publication bias. Asymmetry in a funnel plot of study size against treatment effect is often used to identify such bias. We compare the performance of three simple methods of testing for bias: the rank correlation method; a simple linear regression of the standardized estimate of treatment effect on the precision of the estimate; and a regression of the treatment effect on sample size. The tests are applied to simulated meta-analyses in the presence and absence of publication bias. Both one-sided and two-sided censoring of studies based on statistical significance were used. The results indicate that none of the tests performs consistently well. Test performance varied with the magnitude of the true treatment effect, distribution of study size and whether a one- or two-tailed significance test was employed. Overall, the power of the tests was low when the number of studies per meta-analysis was close to that often observed in practice. Tests that showed the highest power also had type I error rates higher than the nominal level. Based on the empirical type I error rates, a regression of treatment effect on sample size, weighted by the inverse of the variance of the logit of the pooled proportion (using the marginal total), is the preferred method.
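
For illustration, a sketch of the second of the three tests compared above (linear regression of the standardized treatment effect on precision, i.e. Egger's test); the data are simulated and the preferred weighted sample-size regression is not shown:

```python
import numpy as np
import statsmodels.api as sm

def egger_test(effects, ses):
    """Egger-style regression: standardized effect on precision; an
    intercept far from zero suggests small-study asymmetry."""
    effects, ses = np.asarray(effects), np.asarray(ses)
    y = effects / ses             # standardized treatment effects
    x = sm.add_constant(1 / ses)  # precision as the predictor
    fit = sm.OLS(y, x).fit()
    return fit.params[0], fit.pvalues[0]  # intercept and its p-value

# Hypothetical log odds ratios and standard errors from ten trials
rng = np.random.default_rng(0)
se = rng.uniform(0.1, 0.5, 10)
logor = rng.normal(-0.3, se)
intercept, p = egger_test(logor, se)
print(f"Egger intercept = {intercept:.2f}, p = {p:.3f}")
```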

1,041 citations


Journal ArticleDOI
TL;DR: It is hypothesized that lack of fit of the model in the population will not, on the average, influence recovery of population factors in analysis of sample data, regardless of degree of model error and regardless of sample size.
Abstract: This article examines effects of sample size and other design features on correspondence between factors obtained from analysis of sample data and those present in the population from which the samples were drawn. We extend earlier work on this question by examining these phenomena in the situation in which the common factor model does not hold exactly in the population. We present a theoretical framework for representing such lack of fit and examine its implications in the population and sample. Based on this approach we hypothesize that lack of fit of the model in the population will not, on the average, influence recovery of population factors in analysis of sample data, regardless of degree of model error and regardless of sample size. Rather, such recovery will be affected only by phenomena related to sampling error which have been studied previously. These hypotheses are investigated and verified in two sampling studies, one using artificial data and one using empirical data.

901 citations


Journal ArticleDOI
TL;DR: In this paper, the authors evaluate the performance of the bootstrap resampling method for estimating model test statistic p values and parameter standard errors under nonnormal data conditions.
Abstract: Though the common default maximum likelihood estimator used in structural equation modeling is predicated on the assumption of multivariate normality, applied researchers often find themselves with data clearly violating this assumption and without sufficient sample size to utilize distribution-free estimation methods. Fortunately, promising alternatives are being integrated into popular software packages. Bootstrap resampling, which is offered in AMOS (Arbuckle, 1997), is one potential solution for estimating model test statistic p values and parameter standard errors under nonnormal data conditions. This study is an evaluation of the bootstrap method under varied conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Accuracy of the test statistic p values is evaluated in terms of model rejection rates, whereas accuracy of bootstrap standard error estimates takes the form of bias and variability of the standard error estimates themselves.
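
A minimal sketch of the generic nonparametric bootstrap principle the study evaluates; real SEM applications (as in AMOS) refit the full model to every resample, which this toy correlation example omits:

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Naive nonparametric bootstrap of a statistic's standard error:
    resample rows with replacement, take the SD of the replicates."""
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        reps[b] = statistic(data[idx])
    return reps.std(ddof=1)

# Example: standard error of a correlation under clearly nonnormal data
rng = np.random.default_rng(1)
x = rng.exponential(size=(200, 1))
data = np.hstack([x, x + rng.exponential(size=(200, 1))])
corr = lambda d: np.corrcoef(d[:, 0], d[:, 1])[0, 1]
print(round(bootstrap_se(data, corr), 4))
```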

715 citations


Journal ArticleDOI
TL;DR: In rehabilitation intervention, effects larger than 12% of baseline score (6% of maximal score) can be attained and detected as MCID by the transition method in both the WOMAC and the SF-36.
Abstract: Objective To discuss the concepts of the minimal clinically important difference (MCID) and the smallest detectable difference (SDD) and to examine their relation to required sample sizes for future studies using concrete data of the condition-specific Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the generic Medical Outcomes Study 36-Item Short Form (SF-36) in patients with osteoarthritis of the lower extremities undergoing a comprehensive inpatient rehabilitation intervention. Methods SDD and MCID were determined in a prospective study of 122 patients before a comprehensive inpatient rehabilitation intervention and at the 3-month followup. MCID was assessed by the transition method. Required SDD and sample sizes were determined by applying the normal approximation and taking the power calculation into account. Results In the WOMAC sections the SDD and MCID ranged from 0.51 to 1.33 points (scale 0 to 10), and in the SF-36 sections the SDD and MCID ranged from 2.0 to 7.8 points (scale 0 to 100). Both questionnaires showed 2 moderately responsive sections that led to required sample sizes of 40 to 325 per treatment arm for a clinical study with unpaired data or total for paired followup data. Conclusion In rehabilitation intervention, effects larger than 12% of baseline score (6% of maximal score) can be attained and detected as MCID by the transition method in both the WOMAC and the SF-36. Effects of this size lead to reasonable sample sizes for future studies lying below n = 300. The same holds true for moderately responsive questionnaire sections with effect sizes higher than 0.25. When designing studies, assumed effects below the MCID may be detectable but are clinically meaningless.

680 citations


Journal ArticleDOI
TL;DR: Two Monte Carlo simulations are presented that compare the efficacy of the Hedges and colleagues, Rosenthal-Rubin, and Hunter-Schmidt methods for combining correlation coefficients for cases in which population effect sizes were both fixed and variable.
Abstract: The efficacy of the Hedges and colleagues, Rosenthal-Rubin, and Hunter-Schmidt methods for combining correlation coefficients was tested for cases in which population effect sizes were both fixed and variable. After a brief tutorial on these meta-analytic methods, the author presents two Monte Carlo simulations that compare these methods for cases in which the number of studies in the meta-analysis and the average sample size of studies were varied. In the fixed case the methods produced comparable estimates of the average effect size; however, the Hunter-Schmidt method failed to control the Type I error rate for the associated significance tests. In the variable case, for both the Hedges and colleagues and Hunter-Schmidt methods, Type I error rates were not controlled for meta-analyses including 15 or fewer studies and the probability of detecting small effects was less than .3. Some practical recommendations are made about the use of meta-analysis.
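
For orientation, a hedged sketch of one of the three approaches compared (a Hedges-and-colleagues-style fixed-effect combination via Fisher's z, weighting each study by n - 3); the Hunter-Schmidt method instead averages raw correlations weighted by n:

```python
import numpy as np

def combine_correlations_fisher(rs, ns):
    """Fixed-effect combination of correlations via Fisher's z:
    weight each z by n - 3, then back-transform with tanh."""
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    z = np.arctanh(rs)                  # Fisher z transform
    w = ns - 3
    z_bar = np.sum(w * z) / np.sum(w)
    se = 1 / np.sqrt(np.sum(w))
    ci = np.tanh([z_bar - 1.96 * se, z_bar + 1.96 * se])
    return np.tanh(z_bar), ci

# Three hypothetical studies: correlations and their sample sizes
r_mean, ci = combine_correlations_fisher([0.30, 0.25, 0.42], [50, 80, 120])
print(round(r_mean, 3), np.round(ci, 3))
```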

677 citations


Journal ArticleDOI
TL;DR: The ability of formal interaction tests to identify subgroup effects improved as the size of the interaction increased relative to the overall treatment effect, and the performance of the formal interaction test was generally superior to that of the subgroup-specific analyses.
Abstract: BACKGROUND Subgroup analyses are common in randomised controlled trials (RCTs). There are many easily accessible guidelines on the selection and analysis of subgroups but the key messages do not seem to be universally accepted and inappropriate analyses continue to appear in the literature. This has potentially serious implications because erroneous identification of differential subgroup effects may lead to inappropriate provision or withholding of treatment. OBJECTIVES (1) To quantify the extent to which subgroup analyses may be misleading. (2) To compare the relative merits and weaknesses of the two most common approaches to subgroup analysis: separate (subgroup-specific) analyses of treatment effect and formal statistical tests of interaction. (3) To establish what factors affect the performance of the two approaches. (4) To provide estimates of the increase in sample size required to detect differential subgroup effects. (5) To provide recommendations on the analysis and interpretation of subgroup analyses. METHODS The performances of subgroup-specific and formal interaction tests were assessed by simulating data with no differential subgroup effects and determining the extent to which the two approaches (incorrectly) identified such an effect, and simulating data with a differential subgroup effect and determining the extent to which the two approaches were able to (correctly) identify it. Initially, data were simulated to represent the 'simplest case' of two equal-sized treatment groups and two equal-sized subgroups. Data were first simulated with no differential subgroup effect and then with a range of types and magnitudes of subgroup effect with the sample size determined by the nominal power (50-95%) for the overall treatment effect. Additional simulations were conducted to explore the individual impact of the sample size, the magnitude of the overall treatment effect, the size and number of treatment groups and subgroups and, in the case of continuous data, the variability of the data. The simulated data covered the types of outcomes most commonly used in RCTs, namely continuous (Gaussian) variables, binary outcomes and survival times. All analyses were carried out using appropriate regression models, and subgroup effects were identified on the basis of statistical significance at the 5% level. RESULTS While there was some variation for smaller sample sizes, the results for the three types of outcome were very similar for simulations with a total sample size of greater than or equal to 200. With simulated simplest case data with no differential subgroup effects, the formal tests of interaction were significant in 5% of cases as expected, while subgroup-specific tests were less reliable and identified effects in 7-66% of cases depending on whether there was an overall treatment effect. The most common type of subgroup effect identified in this way was where the treatment effect was seen to be significant in one subgroup only. When a simulated differential subgroup effect was included, the results were dependent on the nominal power of the simulated data and the type and magnitude of the subgroup effect. However, the performance of the formal interaction test was generally superior to that of the subgroup-specific analyses, with more differential effects correctly identified. In addition, the subgroup-specific analyses often suggested the wrong type of differential effect. 
The ability of formal interaction tests to (correctly) identify subgroup effects improved as the size of the interaction increased relative to the overall treatment effect. When the size of the interaction was twice the overall effect or greater, the interaction tests had at least the same power as the overall treatment effect. However, power was considerably reduced for smaller interactions, which are much more likely in practice. The inflation factor required to increase the sample size to enable detection of the interaction with the same power as the overall effect varied with the size of the interaction. For an interaction of the same magnitude as the overall effect, the inflation factor was 4, and this increased dramatically to 100 or greater for more subtle interactions of < 20% of the overall effect. Formal interaction tests were generally robust to alterations in the number and size of the treatment and subgroups and, for continuous data, the variance in the treatment groups, with the only exception being a change in the variance in one of the subgroups. In contrast, the performance of the subgroup-specific tests was affected by almost all of these factors with only a change in the number of treatment groups having no impact at all. CONCLUSIONS While it is generally recognised that subgroup analyses can produce spurious results, the extent of the problem is almost certainly under-estimated. This is particularly true when subgroup-specific analyses are used. In addition, the increase in sample size required to identify differential subgroup effects may be substantial and the commonly used 'rule of four' may not always be sufficient, especially when interactions are relatively subtle, as is often the case. CONCLUSIONS--RECOMMENDATIONS FOR SUBGROUP ANALYSES AND THEIR INTERPRETATION: (1) Subgroup analyses should, as far as possible, be restricted to those proposed before data collection. Any subgroups chosen after this time should be clearly identified. (2) Trials should ideally be powered with subgroup analyses in mind. However, for modest interactions, this may not be feasible. (3) Subgroup-specific analyses are particularly unreliable and are affected by many factors. Subgroup analyses should always be based on formal tests of interaction although even these should be interpreted with caution. (4) The results from any subgroup analyses should not be over-interpreted. Unless there is strong supporting evidence, they are best viewed as a hypothesis-generation exercise. In particular, one should be wary of evidence suggesting that treatment is effective in one subgroup only. (5) Any apparent lack of differential effect should be regarded with caution unless the study was specifically powered with interactions in mind. CONCLUSIONS--RECOMMENDATIONS FOR RESEARCH: (1) The implications of considering confidence intervals rather than p-values could be considered. (2) The same approach as in this study could be applied to contexts other than RCTs, such as observational studies and meta-analyses. (3) The scenarios used in this study could be examined more comprehensively using other statistical methods, incorporating clustering effects, considering other types of outcome variable and using other approaches, such as bootstrapping or Bayesian methods.
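
The report's inflation factors follow from a simple variance argument: with two equal subgroups, the interaction estimate has four times the variance of the overall treatment effect estimate, so an interaction of fraction f of the overall effect needs roughly 4/f^2 times the sample. A small sketch reproducing the figures quoted above:

```python
def inflation_factor(interaction_ratio):
    """Approximate multiplier on sample size needed to detect a
    treatment-by-subgroup interaction with the same power as the
    overall effect, for two equal subgroups: 4 / ratio^2."""
    return 4 / interaction_ratio ** 2

for ratio in (2.0, 1.0, 0.5, 0.2):
    print(f"interaction = {ratio:4.1f} x overall effect -> "
          f"inflation factor = {inflation_factor(ratio):6.0f}")
# ratio 1.0 gives the 'rule of four'; ratio 0.2 gives 100, as reported
```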

509 citations


Journal ArticleDOI
TL;DR: It is argued that attempts at bias correction give unsatisfactory results, and that pointwise estimation in an independent data set may be the only way of obtaining reliable estimates of locus-specific effect, and then only if one does not condition on statistical significance being obtained.
Abstract: The primary goal of a genomewide scan is to estimate the genomic locations of genes influencing a trait of interest. It is sometimes said that a secondary goal is to estimate the phenotypic effects of each identified locus. Here, it is shown that these two objectives cannot be met reliably by use of a single data set of a currently realistic size. Simulation and analytical results, based on variance-components linkage analysis as an example, demonstrate that estimates of locus-specific effect size at genomewide LOD score peaks tend to be grossly inflated and can even be virtually independent of the true effect size, even for studies on large samples when the true effect size is small. However, the bias diminishes asymptotically. The explanation for the bias is that the LOD score is a function of the locus-specific effect-size estimate, such that there is a high correlation between the observed statistical significance and the effect-size estimate. When the LOD score is maximized over the many pointwise tests being conducted throughout the genome, the locus-specific effect-size estimate is therefore effectively maximized as well. We argue that attempts at bias correction give unsatisfactory results, and that pointwise estimation in an independent data set may be the only way of obtaining reliable estimates of locus-specific effect—and then only if one does not condition on statistical significance being obtained. We further show that the same factors causing this bias are responsible for frequent failures to replicate initial claims of linkage or association for complex traits, even when the initial localization is, in fact, correct. The findings of this study have wide-ranging implications, as they apply to all statistical methods of gene localization. It is hoped that, by keeping this bias in mind, we will more realistically interpret and extrapolate from the results of genomewide scans.
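
A toy Monte Carlo sketch of the maximization bias described above: scanning many loci and reading off the effect size at the peak inflates the estimate, almost regardless of the true effect. This uses plain normal statistics rather than variance-components LOD scores, and all numbers are illustrative:

```python
import numpy as np

# Simulate many genome scans: one causal locus with a small true effect
# plus many null loci, each with a noisy locus-wise effect estimate.
rng = np.random.default_rng(2)
n_scans, n_loci = 5000, 1000
true_effect = 0.5   # true standardized effect at the causal locus
se = 1.0            # sampling SD of each locus-wise estimate

est_at_peak = []
for _ in range(n_scans):
    null = rng.normal(0.0, se, n_loci - 1)   # null loci
    causal = rng.normal(true_effect, se)     # the one real locus
    estimates = np.append(null, causal)
    est_at_peak.append(estimates.max())      # estimate at the genomewide peak

print(f"true effect = {true_effect}, mean estimate at peak = "
      f"{np.mean(est_at_peak):.2f}")         # grossly inflated
```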

500 citations


Journal ArticleDOI
TL;DR: The authors consider how choices in the duration of the study, frequency of observation, and number of participants affect statistical power and show that power depends on a standardized effect size, the sample size, and a person-specific reliability coefficient.
Abstract: Consider a study in which 2 groups are followed over time to assess group differences in the average rate of change, rate of acceleration, or higher degree polynomial effect. In designing such a study, one must decide on the duration of the study, frequency of observation, and number of participants. The authors consider how these choices affect statistical power and show that power depends on a standardized effect size, the sample size, and a person-specific reliability coefficient. This reliability, in turn, depends on study duration and frequency. These relations enable researchers to weigh alternative designs with respect to feasibility and power. The authors illustrate the approach using data from published studies of antisocial thinking during adolescence and vocabulary growth during infancy.

Journal ArticleDOI
TL;DR: In this paper, an effect size measure was developed for the logistic regression (LR) procedure for differential item functioning (DIF) detection, which is a model-based approach designed to identify both uniform and non-uniform DIF.
Abstract: The logistic regression (LR) procedure for differential item functioning (DIF) detection is a model-based approach designed to identify both uniform and nonuniform DIF. However, this procedure tends to produce inflated Type I errors. This outcome is problematic because it can result in the inefficient use of testing resources, and it may interfere with the study of the underlying causes of DIF. Recently, an effect size measure was developed for the LR DIF procedure and a classification method was proposed. However, the effect size measure and classification method have not been systematically investigated. In this study, we developed a new classification method based on those established for the Simultaneous Item Bias Test. A simulation study also was conducted to determine if the effect size measure affects the Type I error and power rates for the LR DIF procedure across sample sizes, ability distributions, and percentage of DIF items included on a test. The results indicate that the inclusion of the eff...
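
A hedged sketch of the LR DIF procedure itself (nested logistic models with group and group-by-score terms, compared by a 2-df likelihood-ratio test); the simulated data and the McFadden pseudo-R-squared difference used as an effect size stand-in are our illustrative choices, not the study's exact measure:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
score = rng.normal(size=n)                 # matching/ability criterion
group = rng.integers(0, 2, size=n)         # 0 = reference, 1 = focal
logit = -0.2 + 1.1 * score + 0.4 * group   # uniform DIF built in
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X0 = sm.add_constant(np.column_stack([score]))                         # score only
X1 = sm.add_constant(np.column_stack([score, group, score * group]))   # + DIF terms
m0 = sm.Logit(y, X0).fit(disp=0)
m1 = sm.Logit(y, X1).fit(disp=0)
lr = 2 * (m1.llf - m0.llf)                 # chi-square with 2 df under no DIF
print(f"LR statistic (2 df) = {lr:.2f}")
print(f"pseudo-R2 difference = {m1.prsquared - m0.prsquared:.4f}")
```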

Journal ArticleDOI
TL;DR: It is shown that confidence intervals better inform readers about the possibility of an inadequate sample size than do post hoc power calculations.
Abstract: Using a hypothetical scenario typifying the experience that authors have when submitting manuscripts that report results of negative clinical trials, we illustrate the pitfalls of post hoc power analysis. We used the same scenario to explain how confidence intervals are used in interpreting results of clinical trials. We showed that confidence intervals better inform readers about the possibility of an inadequate sample size than do post hoc power calculations.
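
To make the contrast concrete, a sketch of the confidence-interval reading of a hypothetical "negative" trial: a wide interval around the observed difference shows directly that clinically relevant effects were not ruled out, with no post hoc power number needed. The counts are invented:

```python
import numpy as np
from scipy.stats import norm

def diff_in_proportions_ci(x1, n1, x2, n2, level=0.95):
    """Wald confidence interval for a difference in proportions."""
    p1, p2 = x1 / n1, x2 / n2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = norm.ppf(0.5 + level / 2)
    d = p1 - p2
    return d, (d - z * se, d + z * se)

# Hypothetical small trial: 12/50 vs 9/50 responders, p > .05
d, ci = diff_in_proportions_ci(12, 50, 9, 50)
print(f"difference = {d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
# The wide interval spans clinically important effects: the sample
# was inadequate, which no post hoc power calculation conveys as plainly.
```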

Journal ArticleDOI
TL;DR: In this article, the robustness of the multilevel factor and path analysis with unequal groups, small sample sizes at both the individual and the group level, in the presence of a low or a high intraclass correlation (ICC), was assessed.
Abstract: Hierarchical structured data cause problems in analysis, because the usual assumptions of independently and identically distributed variables are violated. Muthen (1989) described an estimation method for multilevel factor and path analysis with hierarchical data. This article assesses the robustness of the method with unequal groups, small sample sizes at both the individual and the group level, and a low or high intraclass correlation (ICC). The within-groups part of the model poses no problems. The most important problem in the between-groups part of the model is the occurrence of inadmissible estimates, especially when group level sample size is small (50) while the intracluster correlation is low. This is partly compensated by using large group sizes. When an admissible solution is reached, the factor loadings are generally accurate. However, the residual variances are underestimated, and the standard errors are generally too small. Having more or larger groups or a higher ICC does n...

Journal ArticleDOI
TL;DR: Tables for single-stage phase II trials based on the exact binomial distribution are presented; these avoid an anomaly of Fleming's normal-approximation design, under which the lower success rate that the trial is designed to reject may be included in the final confidence interval even when the upper success rate is accepted.
Abstract: Tables for single-stage phase II trials based on the exact binomial distribution are presented. These are preferable to those generated using Fleming's design, which are based on the normal approximation and can give rise to anomalous results. For example, if the upper success rate is accepted, the lower success rate, which the trial is designed to reject, may be included in the final confidence interval for the proportion being estimated.
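
A sketch of the exact-binomial search that such tables embody: find the smallest sample size n and cutoff r meeting the error constraints directly from the binomial distribution rather than the normal approximation. The search strategy and parameter names are our assumptions, not the paper's algorithm:

```python
from scipy.stats import binom

def exact_single_stage(p0, p1, alpha=0.05, beta=0.20, n_max=200):
    """Smallest exact single-stage design: n and cutoff r such that
    P(X >= r | p0) <= alpha and P(X >= r | p1) >= 1 - beta."""
    for n in range(1, n_max + 1):
        for r in range(n + 1):
            type1 = 1 - binom.cdf(r - 1, n, p0)   # reject H0 when X >= r
            power = 1 - binom.cdf(r - 1, n, p1)
            if type1 <= alpha and power >= 1 - beta:
                return n, r
    return None

# e.g. rule out a 5% response rate in favour of 20%
print(exact_single_stage(0.05, 0.20))  # (n, minimum responses to accept)
```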

Journal ArticleDOI
TL;DR: This critique will stress the inappropriateness of considering precision solely in the context of increasing N, or using sample sizes of 400 and more, as appears to be Charter's main objective or desideratum.
Abstract: Methodological Commentary: The Precision of Reliability and Validity Estimates Re-Visited: Distinguishing Between Clinical and Statistical Significance of Sample Size Requirements. Journal of Clinical and Experimental Neuropsychology, Vol. 23, No. 5, pp. 695-700 (2001).

Journal ArticleDOI
TL;DR: An international meta-analysis of individual-patient data on the CCR5, CCR2, and SDF-1 alleles was conducted; data were contributed by 19 teams of investigators and a common protocol was developed in collaboration with research teams identified through these efforts.
Abstract: The burgeoning information on the human genome creates opportunities and challenges for studies of disease associations. Because genetic differences often produce modest effects, many patients must be studied to reach definitive conclusions. In the absence of a single large study, meta-analysis of individual-patient data (1 - 3) from smaller studies offers a way to assemble an adequate sample size. This approach is based on a unifying protocol that has standardized analytic definitions. When the protocol is applied to data contributed by most investigators working in a field, this method can provide more convincing results than a simple pooling of data or a meta-analysis of published reports (3). A meta-analysis of individual-patient data is also superior to a meta-analysis of published reports for examining differences in reported results.

Journal ArticleDOI
Daniel J. Sargent
15 Apr 2001-Cancer
TL;DR: Artificial neural networks have many attractive theoretic properties, specifically, the ability to detect non-predefined relations such as nonlinear effects and/or interactions, but this comes at the cost of reduced interpretability of the model output.
Abstract: BACKGROUND. In recent years, considerable attention has been given to the development of sophisticated techniques for exploring data sets. One such class of techniques is artificial neural networks (ANNs). Artificial neural networks have many attractive theoretic properties, specifically, the ability to detect non-predefined relations such as nonlinear effects and/or interactions. These theoretic advantages come at the cost of reduced interpretability of the model output. Weighing these factors, many authors have analyzed the same data set with both standard statistical methods (such as logistic or Cox regression) and ANN. METHODS. The goal of this work is to review the literature comparing the performance of ANN with standard statistical techniques when applied to medium to large data sets (sample size > 200 patients). A thorough literature search was performed, with specific criteria for a published comparison to be included in this review. RESULTS. In the 28 studies included in this review, ANN outperformed regression in 10 cases (36%), was outperformed by regression in 4 cases (14%), and the 2 methods had similar performance in the remaining 14 cases (50%). However, in the 8 largest studies (sample size > 5000), regression and ANN tied in 7 cases, with regression winning in the remaining case. In addition, there is some suggestion of publication bias. CONCLUSIONS. Neither method achieves the desired performance. Both methods should continue to be used and explored in a complementary manner. However, based on the available data, ANN should not replace standard statistical approaches as the method of choice for the classification of medical data.

Book
08 Feb 2001
TL;DR: This book provides an overview of monitoring, covering the selection among priorities, qualitative and general field techniques, data collection and management, sampling principles and design, statistical analysis, and analysis of trends.
Abstract: Preface. 1. Introduction To Monitoring. 2. Monitoring Overview. 3. Selecting Among Priorities. 4. Qualitative Techniques For Monitoring. 5. General Field Techniques. 6. Data Collection And Data Management. 7. Basic Principles Of Sampling. 8. Sampling Design. 9. Statistical Analysis. 10. Analysis Of Trends. 11. Selecting Random Samples. 12. Field Techniques For Measuring Vegetation. 13. Specialized Sampling Methods And Field Techniques For Animals. 14. Objectives. 15. Communication And Monitoring Plans. Appendix I: Monitoring Communities. Appendix II: Sample Size Equations. Appendix III: Confidence Interval Equations. Appendix IV: Sample Size And Confidence Intervals For Complex Sampling Designs. Literature Cited. Index.

Journal ArticleDOI
TL;DR: In this article, the authors investigated whether it was appropriate to use spatial interpolation methods with limited (n = 46), coarse-scaled (1188 ha) soils data from a Vertisol plain.
Abstract: Spatial interpolation methods are frequently used to characterize patterns in soil properties over various spatial scales provided that the data are abundant and spatially dependent. Establishing these criteria involved comparisons of abundant data from many fine-scaled (<100 ha) investigations. In this study we investigated whether it was appropriate to use spatial interpolation methods with limited (n = 46), coarse-scaled (1188 ha) soils data from a Vertisol plain. Methods investigated included ordinary kriging, inverse-distance weighting, and thin-plate smoothing splines with tensions. Comparison was based on accuracy and effectiveness measures, and analyzed using ANOVA and pairwise comparison t-tests. Results indicated that spatial interpolation was appropriate when the data exhibited smooth and consistent patterns of spatial dependency within the study area and the selected ranges of estimation and weighting used in this investigation. Nine of twelve soil properties we investigated exhibited characteristics other than these, however, including independent data, variable and erratic behavior, and extreme values. Our sample design may have been an important factor as well. Ordinary kriging and inverse-distance weighting were similarly accurate and effective methods; thin-plate smoothing splines with tensions was not. Results illustrate that sample size is as important for coarse-scale investigations as it is for fine-scale investigations with most soils data. However, our ability to predict successfully with some of our data raises the question as to the exact nature of the relationship between accuracy, sample size, and sample spacing, and to what extent these factors are related to the property under investigation, particularly when data are limited.
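
Of the methods compared, inverse-distance weighting is the simplest to state; a minimal sketch with invented coordinates and clay percentages (n = 46, mirroring the study's sample size but not its data):

```python
import numpy as np

def idw(xy_known, values, xy_query, power=2.0):
    """Inverse-distance-weighted interpolation: each prediction is a
    weighted mean of observed values, with weights 1 / distance^power."""
    diffs = xy_query[:, None, :] - xy_known[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(-1))
    dist = np.where(dist == 0, 1e-12, dist)  # exact hits keep their value
    w = 1.0 / dist ** power
    return (w * values).sum(1) / w.sum(1)

# Hypothetical sparse soil samples (coordinates in metres, clay %)
rng = np.random.default_rng(4)
pts = rng.uniform(0, 3000, size=(46, 2))
clay = 30 + 0.004 * pts[:, 0] + rng.normal(0, 2, 46)
grid = np.array([[500.0, 500.0], [1500.0, 2000.0]])
print(np.round(idw(pts, clay, grid), 1))
```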

Journal ArticleDOI
TL;DR: In this article, the authors considered a variant of the competing risks problem in which a terminal event censors a non-terminal event, but not vice versa, and formulated the joint distribution of the events via a gamma frailty model in the upper wedge where data are observable, with the marginal distributions unspecified.
Abstract: SUMMARY We consider a variation of the competing risks problem in which a terminal event censors a non-terminal event, but not vice versa. The joint distribution of the events is formulated via a gamma frailty model in the upper wedge where data are observable (Day et al., 1997), with the marginal distributions unspecified. An estimator for the association parameter is obtained from a concordance estimating function. A novel plug-in estimator for the marginal distribution of the non-terminal event is shown to be uniformly consistent and to converge weakly to a Gaussian process. The assumptions on the joint distribution outside the upper wedge are weaker than those usually made in competing risks analyses. Simulations demonstrate that the methods work well with practical sample sizes. The proposals are illustrated with data on morbidity and mortality in leukaemia patients.

Journal ArticleDOI
TL;DR: In this paper, the authors present standardized effect size measures for latent mean differences inferred from both structured means modeling and MIMIC approaches to hypothesis testing about differences among means on a single latent construct, which are then related to post hoc power analysis, a priori sample size determination, and a relevant measure of construct reliability.
Abstract: While effect size estimates, post hoc power estimates, and a priori sample size determination are becoming a routine part of univariate analyses involving measured variables (e.g., ANOVA), such measures and methods have not been articulated for analyses involving latent means. The current article presents standardized effect size measures for latent mean differences inferred from both structured means modeling and MIMIC approaches to hypothesis testing about differences among means on a single latent construct. These measures are then related to post hoc power analysis, a priori sample size determination, and a relevant measure of construct reliability.

Journal ArticleDOI
TL;DR: The design considerations specific to implementation research studies are described, focusing particularly on the estimation of sample size requirements and on the need for reliable information on intracluster correlation coefficients for both effectiveness and efficiency outcomes.
Abstract: The cluster randomized trial with a concurrent economic evaluation is considered the gold standard evaluative design for the conduct of implementation research evaluating different strategies to promote the transfer of research findings into clinical practice. This has implications for the planning of such studies, as information is needed on the effects of clustering on both effectiveness and efficiency outcomes. This paper describes the design considerations specific to implementation research studies, focusing particularly on the estimation of sample size requirements and on the need for reliable information on intracluster correlation coefficients for both effectiveness and efficiency outcomes.
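
The clustering adjustment at the heart of such sample size estimation is the design effect, 1 + (m - 1) * ICC for clusters of size m; a small sketch with illustrative numbers:

```python
def clustered_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomized sample size for clustering:
    design effect = 1 + (m - 1) * ICC."""
    deff = 1 + (cluster_size - 1) * icc
    return n_individual * deff, deff

# e.g. 300 patients needed per arm under individual randomization,
# 20 patients per practice, ICC = 0.05 for the effectiveness outcome
n, deff = clustered_sample_size(300, 20, 0.05)
print(f"design effect = {deff:.2f}; inflated n per arm = {n:.0f}")
```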

Journal ArticleDOI
TL;DR: In this article, an empirically based simulation using a behavioral model, generated a probability distribution from those data, and randomly selected locations from that distribution in a chronological sequence as the simulated individual moved through its home range.
Abstract: Simulations are necessary to assess the performance of home-range estimators because the true distribution of empirical data is unknown, but we must question whether that performance applies to empirical data. Some studies have used empirically based simulations, randomly selecting subsets of data to evaluate estimator performance, but animals do not move randomly within a home range. We created an empirically based simulation using a behavioral model, generated a probability distribution from those data, and randomly selected locations from that distribution in a chronological sequence as the simulated individual moved through its home range. Thus, we examined the influence of temporal patterns of space use and determined the effects of smoothing, number of locations, and autocorrelation on kernel estimates. Additionally, home-range estimators were designed to evaluate species that use space with few restrictions, traveling almost anywhere on the landscape. Many species, however, confine their movements ...

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the effect of sample size, measured-variable reliability, and the number of measured variables per factor on the performance of maximum likelihood confirmatory factor analysis.
Abstract: A number of authors have proposed that determining an adequate sample size in structural equation modeling can be aided by considering the number of parameters to be estimated. This study directly investigates this assumption in the context of maximum likelihood confirmatory factor analysis. The findings support previous research on the effect of sample size, measured-variable reliability, and the number of measured variables per factor. However, no practically significant effect was found for the number of observations per estimated parameter.

Journal ArticleDOI
TL;DR: It is shown that the pseudo-likelihood method gives more accurate and precise estimates of Ne than the F-statistic method, and the performance difference is mainly due to the presence of rare alleles in the samples.
Abstract: A pseudo maximum likelihood method is proposed to estimate effective population size (Ne) using temporal changes in allele frequencies at multi-allelic loci. The computation is simplified dramatically by (1) approximating the multi-dimensional joint probabilities of all the data by the product of marginal probabilities (hence the name pseudo-likelihood), (2) exploiting the special properties of the transition matrix and (3) using a hidden Markov chain algorithm. Simulations show that the pseudo-likelihood method has a similar performance but needs much less computing time and storage compared with the full likelihood method in the case of 3 alleles per locus. Due to computational developments, I was able to assess the performance of the pseudo-likelihood method against the F-statistic method over a wide range of parameters by extensive simulations. It is shown that the pseudo-likelihood method gives more accurate and precise estimates of Ne than the F-statistic method, and the performance difference is mainly due to the presence of rare alleles in the samples. The pseudo-likelihood method is also flexible and can use three or more temporal samples simultaneously to estimate satisfactorily the Ne of each period, or the growth parameters of the population. The accuracy and precision of both methods depend on the ratio of the product of sample size and the number of generations involved to Ne, and on the number of independent alleles used. In an application of the pseudo-likelihood method to a large data set of an olive fly population, more precise estimates of Ne are obtained than those from the F-statistic method.

Journal ArticleDOI
TL;DR: In this paper, the potential for scaling equal-channel angular pressing (ECAP) for use with large samples was investigated by conducting tests on an aluminum alloy using cylinders having diameters from 6-40 mm.
Abstract: The potential for scaling equal-channel angular pressing (ECAP) for use with large samples was investigated by conducting tests on an aluminum alloy using cylinders having diameters from 6–40 mm. The results show the refinement of the microstructure and the subsequent mechanical properties after pressing are independent of the initial size of the sample and, for the largest sample with a diameter of 40 mm, independent of the location within the sample at least to a distance of ∼5 mm from the sample edge. By making direct measurements of the imposed load during ECAP, it is shown that the applied load is determined by the sample strength rather than frictional effects between the sample and the die walls. The results demonstrate the feasibility of scaling ECAP to large sizes for use in industrial applications.

Journal ArticleDOI
TL;DR: Two new likelihood‐ratio test statistics for multi‐generational quantitative traits to test either for linkage in the presence of allelic association or for allelic associations in the absence of linkage, such as may be due to linkage disequilibrium are investigated.
Abstract: Spielman et al. [1993] proposed a transmission-disequilibrium test (TDT), based on marker data collected on affected offspring and their parents, to test for linkage between a genetic marker and a binary trait provided there is allelic association. It has been shown that this TDT is powerful and is not affected by allelic association due to population stratification in the absence of linkage. For quantitative traits, George and Elston [1987] proposed a likelihood method to detect the effect of a candidate gene in pedigree data when familial correlations are present. This test will detect allelic association but will do so in the absence of linkage. In this paper, we investigate two new likelihood-ratio test statistics for multi-generational quantitative traits to test either for linkage in the presence of allelic association or for allelic association in the presence of linkage, such as may be due to linkage disequilibrium. We compare these two tests analytically and by simulation with respect to 1) the sample size required for the asymptotic null distributions to be valid and 2) their power to detect association in those cases in which they are not sensitive to population stratification unless linkage is present. In general, 80 nuclear families with two children each and at least one heterozygous parent, or the equivalent number of children in large pedigrees, are enough for the asymptotic null distribution of the proposed conditional and TDT methods to be valid. The theoretical power is close to the simulated power except for the case of a recessive allele with low frequency. A sampling strategy is proposed that dramatically improves power.
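
For background, a sketch of the original binary-trait TDT of Spielman et al. [1993] referenced above; the quantitative-trait likelihood-ratio tests the paper develops are more involved and are not reproduced here. Counts are hypothetical:

```python
from scipy.stats import chi2

def tdt(b, c):
    """Spielman et al. (1993) TDT: b and c count heterozygous parents
    transmitting vs not transmitting the candidate allele; under the
    null, (b - c)^2 / (b + c) is chi-square with 1 df."""
    stat = (b - c) ** 2 / (b + c)
    return stat, chi2.sf(stat, df=1)

stat, p = tdt(b=62, c=38)  # hypothetical transmission counts
print(f"TDT = {stat:.2f}, p = {p:.4f}")
```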

Journal ArticleDOI
TL;DR: An approximation of the expected width of the 95 per cent confidence interval of the ICC is derived and is shown to be of good accuracy and can therefore be used reliably in reproducibility studies.
Abstract: Reproducibility of a quantitative outcome is usually assessed by means of the intraclass correlation coefficient (ICC). When we are interested in assessing reproducibility from only one sample, we suggest that the study be planned with regard to the expected width of the 95 per cent confidence interval of the ICC. An approximation of this width is derived, which allows appraisal of the influence of n, the number of subjects, and p, the number of replicates. Through simulation studies, the approximation is shown to be of good accuracy and can therefore be used reliably. Optimal designs are also discussed, such as the optimal distribution between the number of subjects and the number of replicates per subject for a fixed total number of measures.
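
A Monte Carlo sketch in the spirit of the paper's simulation check: estimate the expected 95% CI width for a one-way random-effects ICC given n subjects and p replicates, using the standard exact F-based interval rather than the paper's closed-form approximation (which is not reproduced here):

```python
import numpy as np
from scipy.stats import f as fdist

def expected_icc_ci_width(n, p, rho, n_sim=2000, seed=5):
    """Average width of the exact F-based 95% CI for a one-way
    random-effects ICC, over simulated reproducibility studies."""
    rng = np.random.default_rng(seed)
    widths = np.empty(n_sim)
    for s in range(n_sim):
        subj = rng.normal(0, np.sqrt(rho), (n, 1))          # between-subject
        y = subj + rng.normal(0, np.sqrt(1 - rho), (n, p))  # + within-subject
        msb = p * y.mean(1).var(ddof=1)                     # between mean square
        msw = ((y - y.mean(1, keepdims=True)) ** 2).sum() / (n * (p - 1))
        F = msb / msw
        fl = F / fdist.ppf(0.975, n - 1, n * (p - 1))
        fu = F / fdist.ppf(0.025, n - 1, n * (p - 1))
        lo, hi = (fl - 1) / (fl + p - 1), (fu - 1) / (fu + p - 1)
        widths[s] = hi - lo
    return widths.mean()

# e.g. 30 subjects, 3 replicates each, true ICC = 0.8
print(round(expected_icc_ci_width(n=30, p=3, rho=0.8), 3))
```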