
Showing papers on "Sample size determination published in 1999"


Journal ArticleDOI
TL;DR: A fundamental misconception about this issue is that the minimum sample size, or the minimum ratio of sample size to the number of variables, is invariant across studies.
Abstract: The factor analysis literature includes a range of recommendations regarding the minimum sample size necessary to obtain factor solutions that are adequately stable and that correspond closely to population factors. A fundamental misconception about this issue is that the minimum sample size, or the minimum ratio of sample size to the number of variables, is invariant across studies.

4,166 citations


Journal ArticleDOI
TL;DR: In this paper, a Monte Carlo simulation study was conducted to investigate the effects on structural equation modeling (SEM) fit indexes of sample size, estimation method, and model specification, and two primary conclusions were suggested: (a) some fit indexes appear to be noncomparable in terms of the information they provide about model fit for misspecified models and (b) estimation method strongly influenced almost all the fit indexes examined.
Abstract: A Monte Carlo simulation study was conducted to investigate the effects on structural equation modeling (SEM) fit indexes of sample size, estimation method, and model specification. Based on a balanced experimental design, samples were generated from a prespecified population covariance matrix and fitted to structural equation models with different degrees of model misspecification. Ten SEM fit indexes were studied. Two primary conclusions were suggested: (a) some fit indexes appear to be noncomparable in terms of the information they provide about model fit for misspecified models and (b) estimation method strongly influenced almost all the fit indexes examined, especially for misspecified models. These 2 issues do not seem to have drawn enough attention from SEM practitioners. Future research should study not only different models vis‐a‐vis model complexity, but a wider range of model specification conditions, including correctly specified models and models specified incorrectly to varying degrees.

1,516 citations


Journal ArticleDOI
TL;DR: It is recommended that home range studies using kernel estimates use LSCV to determine the amount of smoothing, obtain a minimum of 30 observations per animal (but preferably >50), and report sample sizes in published results.
Abstract: Kernel methods for estimating home range are being used increasingly in wildlife research, but the effect of sample size on their accuracy is not known. We used computer simulations of 10-200 points/ home range and compared accuracy of home range estimates produced by fixed and adaptive kernels with the reference (REF) and least-squares cross-validation (LSCV) methods for determining the amount of smoothing. Simulated home ranges varied from simple to complex shapes created by mixing bivariate normal distributions. We used the size of the 95% home range area and the relative mean squared error of the surface fit to assess the accuracy of the kernel home range estimates. For both measures, the bias and variance approached an asymptote at about 50 observations/home range. The fixed kernel with smoothing selected by LSCV provided the least-biased estimates of the 95% home range area. All kernel methods produced similar surface fit for most simulations, but the fixed kernel with LSCV had the lowest frequency and magnitude of very poor estimates. We reviewed 101 papers published in The Journal of Wildlife Management (JWM) between 1980 and 1997 that estimated animal home ranges. A minority of these papers used nonparametric utilization distribution (UD) estimators, and most did not adequately report sample sizes. We recommend that home range studies using kernel estimates use LSCV to determine the amount of smoothing, obtain a minimum of 30 observations per animal (but preferably >50), and report sample sizes in published results.
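The 95% home range area referred to above is the area of the 95% isopleth of the estimated utilization distribution. The following is a minimal Python sketch of that calculation using a fixed Gaussian kernel with SciPy's default rule-of-thumb bandwidth; the LSCV bandwidth selection the authors recommend is not implemented here, and the function name and grid settings are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from scipy.stats import gaussian_kde

def home_range_area(xy, isopleth=0.95, grid=200, pad=0.2):
    """Area of the `isopleth` (e.g. 95%) utilization distribution estimated
    from relocation points `xy` (n x 2 array) with a fixed Gaussian kernel.
    Bandwidth is SciPy's default rule of thumb, not the LSCV choice
    recommended in the paper."""
    xy = np.asarray(xy, dtype=float)
    kde = gaussian_kde(xy.T)                       # gaussian_kde expects shape (2, n)

    (xmin, ymin), (xmax, ymax) = xy.min(axis=0), xy.max(axis=0)
    dx, dy = (xmax - xmin) * pad, (ymax - ymin) * pad
    xs = np.linspace(xmin - dx, xmax + dx, grid)
    ys = np.linspace(ymin - dy, ymax + dy, grid)
    X, Y = np.meshgrid(xs, ys)
    dens = kde(np.vstack([X.ravel(), Y.ravel()]))
    cell = (xs[1] - xs[0]) * (ys[1] - ys[0])

    # Rank grid cells by density and keep the smallest set whose
    # probability mass reaches the requested isopleth.
    order = np.sort(dens)[::-1]
    cum = np.cumsum(order) * cell
    n_cells = np.searchsorted(cum, isopleth) + 1
    return n_cells * cell                          # area in squared input units
```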

1,366 citations


Book
15 Sep 1999
TL;DR: This book gives a short history of sequential and group sequential methods and presents a roadmap for their application, beginning with two-sided tests for comparing two treatments with normal response of known variance.
Abstract: INTRODUCTION About This Book Why Sequential Methods A Short History of Sequential and Group Sequential Methods Chapter Organization: A Roadmap Bibliography and Notes
TWO-SIDED TESTS: INTRODUCTION Two-Sided Tests for Comparing Two Treatments with Normal Response of Known Variance A Fixed Sample Test Group Sequential Tests Pocock's Test O'Brien and Fleming's Test Properties of Pocock and O'Brien and Fleming Tests Other Tests Conclusions
TWO-SIDED TESTS: GENERAL APPLICATIONS A Unified Formulation Applying the Tests with Equal Group Sizes Applying the Tests with Unequal Increments in Information Normal Linear Models Other Parametric Models Binary Data: Group Sequential Tests for Proportions The Group Sequential Log-Rank Test for Survival Data Group Sequential t-Tests
ONE-SIDED TESTS Introduction The Power Family of One-Sided Group Sequential Tests Adapting Power Family Tests to Unequal Increments in Information Group Sequential One-Sided t-Tests Whitehead's Triangular Test
TWO-SIDED TESTS WITH EARLY STOPPING UNDER THE NULL HYPOTHESIS Introduction The Power Family of Two-Sided, Inner Wedge Tests Whitehead's Double Triangular Test
EQUIVALENCE TESTS Introduction One-Sided Tests of Equivalence Two-Sided Tests of Equivalence: Application to Comparative Bioavailability Studies Individual Bioequivalence: A One-Sided Test for Proportions Bibliography and Notes
FLEXIBLE MONITORING: THE ERROR SPENDING APPROACH Unpredictable Information Sequences Two-Sided Tests One-Sided Tests Data Dependent Timing of Analyses Computations for Error Spending Tests
ANALYSIS FOLLOWING A SEQUENTIAL TEST Introduction Distribution Theory Point Estimation P-Values Confidence Intervals
REPEATED CONFIDENCE INTERVALS Introduction Example: Difference of Normal Means Derived Tests: Use of RCIs to Aid Early Stopping Decisions Repeated P-Values Discussion
STOCHASTIC CURTAILMENT Introduction Conditional Power Approach Predictive Power Approach A Parameter-Free Approach A Case Study with Survival Data Bibliography and Notes
GENERAL GROUP SEQUENTIAL DISTRIBUTION THEORY Introduction A Standard Joint Distribution for Successive Estimates of a Parameter Vector Normal Linear Models Normal Linear Models with Unknown Variance: Group Sequential t-Tests Example: An Exact One-Sample Group Sequential t-Test General Parametric Models: Generalized Linear Models Connection with Survival Analysis
BINARY DATA A Single Bernoulli Probability Two Bernoulli Probabilities The Odds Ratio and Multiple 2 x 2 Tables Case-Control and Matched Pair Analysis Logistic Regression: Adjusting for Covariates Bibliography and Notes
SURVIVAL DATA Introduction The Log Rank Test The Stratified Log-Rank Test Group Sequential Methods for Survival Data with Covariates Repeated Confidence Intervals for a Hazard Ratio Example: A Clinical Trial for Carcinoma of the Oropharynx Survival Probabilities and Quantiles Bibliography and Notes
INTERNAL PILOT STUDIES: SAMPLE SIZE RE-ESTIMATION The Role of an Internal Pilot Phase Sample Size Re-estimation for a Fixed Sample Test Sample Size Re-estimation in Group Sequential Tests
MULTIPLE ENDPOINTS Introduction The Bonferroni Procedure A Group Sequential Hotelling Test A Group Sequential Version of O'Brien's Test Tests Based on Other Global Statistics Tests Based on Marginal Criteria Bibliography and Notes
MULTI-ARMED TRIALS Introduction Global Tests Monitoring Pairwise Comparisons Bibliography and Notes
ADAPTIVE TREATMENT ASSIGNMENT A Multi-Stage Adaptive Design A Multi-Stage Adaptive Design with Time Trends Validity of Adaptive Multi-Stage Procedures Bibliography and Notes
BAYESIAN APPROACHES The Bayesian Paradigm Stopping Rules Choice of Prior Distribution Discussion
NUMERICAL COMPUTATIONS FOR GROUP SEQUENTIAL TESTS Introduction The Basic Calculation Error Probabilities and Sample Size Distributions Tests Defined by Error Spending Functions Analysis Following a Group Sequential Test Further Applications of Numerical Computation Computer Software

1,138 citations


Book
25 May 1999
TL;DR: This textbook covers epidemiological study design and data analysis, including measures of risk, confounding and interaction, cohort, case-control and intervention studies, sample size determination, and regression modelling of quantitative, binary and follow-up outcomes.
Abstract: PREFACE
FUNDAMENTAL ISSUES What is Epidemiology? Case Studies: The Work of Doll and Hill Populations and Samples Measuring Disease Measuring the Risk Factor Causality Studies Using Routine Data Study Design Data Analysis Exercises
BASIC ANALYTICAL PROCEDURES Introduction Case Study Types of Variables Tables and Charts Inferential Techniques for Categorical Variables Descriptive Techniques for Quantitative Variables Inferences about Means Inferential Techniques for Non-Normal Data Measuring Agreement Assessing Diagnostic Tests Exercises
ASSESSING RISK FACTORS Risk and Relative Risk Odds and Odds Ratio Relative Risk or Odds Ratio? Prevalence Studies Testing Association Risk Factors Measured at Several Levels Attributable Risk Rate and Relative Rate Measures of Difference Exercises
CONFOUNDING AND INTERACTION Introduction The Concept of Confounding Identification of Confounders Assessing Confounding Standardization Mantel-Haenszel Methods The Concept of Interaction Testing for Interaction Dealing with Interaction Exercises
COHORT STUDIES Design Considerations Analytical Considerations Cohort Life Tables Kaplan-Meier Estimation Comparison of Two Sets of Survival Probabilities The Person-Years Method Period-Cohort Analysis Exercises
CASE-CONTROL STUDIES Basic Design Concepts Basic Methods of Analysis Selection of Cases Selection of Controls Matching The Analysis of Matched Studies Nested Case-Control Studies Case-Cohort Studies Case-Crossover Studies Exercises
INTERVENTION STUDIES Introduction Ethical Considerations Avoidance of Bias Parallel Group Studies Cross-Over Studies Sequential Studies Allocation to Treatment Group Exercises
SAMPLE SIZE DETERMINATION Introduction Power Testing a Mean Value Testing a Difference Between Means Testing a Proportion Testing a Relative Risk Case-Control Studies Complex Sampling Designs Concluding Remarks Exercises
MODELLING QUANTITATIVE OUTCOME VARIABLES Statistical Models One Categorical Explanatory Variable One Quantitative Explanatory Variable Two Categorical Explanatory Variables Model Building General Linear Models Several Explanatory Variables Model Checking Confounding Longitudinal Data Non-Normal Alternatives Exercises
MODELLING BINARY OUTCOME DATA Introduction Problems with Standard Regression Models Logistic Regression Interpretation of Logistic Regression Coefficients Generic Data Multiple Logistic Regression Models Tests of Hypotheses Confounding Interaction Model Checking Regression Dilution Case-Control Studies Outcomes with Several Ordered Levels Longitudinal Data Complex Sampling Designs Exercises
MODELLING FOLLOW-UP DATA Introduction Basic Functions of Survival Time Estimating the Hazard Function Probability Models Proportional Hazards Regression Models The Cox Proportional Hazards Model The Weibull Proportional Hazards Model Model Checking Poisson Regression Pooled Logistic Regression Exercises
META-ANALYSIS Reviewing Evidence Systematic Review A General Approach to Pooling Investigating Heterogeneity Pooling Tabular Data Individual Participant Data Dealing with Aspects of Study Quality Publication Bias Is Meta-Analysis a Valid Tool in Epidemiology? Exercises
APPENDIX A: MATERIALS AVAILABLE FROM THE WEBSITE
APPENDIX B: STATISTICAL TABLES
APPENDIX C: EXAMPLE DATA SETS
SOLUTIONS TO EXERCISES
REFERENCES
INDEX

730 citations


Journal ArticleDOI
TL;DR: Simple formulae are given for sample size determination in unmatched and pair-matched cluster-randomized trials, expressed in terms of the coefficient of variation of cluster rates, proportions or means, and illustrated with a bednet trial in Kenya and an STD-treatment trial in Tanzania.
Abstract: Background Cluster-randomized trials, in which health interventions are allocated randomly to intact clusters or communities rather than to individual subjects, are increasingly being used to evaluate disease control strategies both in industrialized and in developing countries. Sample size computations for such trials need to take into account between-cluster variation, but field epidemiologists find it difficult to obtain simple guidance on such procedures. Methods In this paper, we provide simple formulae for sample size determination for both unmatched and pair-matched trials. Outcomes considered include rates per person-year, proportions and means. For simplicity, formulae are expressed in terms of the coefficient of variation (SD/mean) of cluster rates, proportions or means. Guidance is also given on the estimation of this value, with or without the use of prior data on between-cluster variation. Case studies The methods are illustrated using two case studies: an unmatched trial of the impact of impregnated bednets on child mortality in Kenya, and a pair-matched trial of improved sexually-transmitted disease (STD) treatment services for HIV prevention in Tanzania.
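As a rough sketch of the style of formula described (for the unmatched case with a binary outcome), the number of clusters per arm can be computed from the assumed proportions, the number sampled per cluster, and the between-cluster coefficient of variation k. The exact expression, including the "+1" adjustment, should be checked against the paper; this is an illustration, not a transcription of it.

```python
import math
from scipy.stats import norm

def clusters_per_arm(p0, p1, m, k, alpha=0.05, power=0.80):
    """Approximate number of clusters per arm for an unmatched
    cluster-randomized trial comparing proportions p0 and p1, with m
    individuals sampled per cluster and between-cluster coefficient of
    variation k (SD/mean of the true cluster proportions)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    within = (p0 * (1 - p0) + p1 * (1 - p1)) / m   # binomial sampling variation
    between = k**2 * (p0**2 + p1**2)               # between-cluster variation
    c = 1 + z**2 * (within + between) / (p0 - p1)**2
    return math.ceil(c)

# Example: detect a drop from 2% to 1% with 500 children per cluster, k = 0.25
print(clusters_per_arm(0.02, 0.01, m=500, k=0.25))
```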

688 citations


Journal ArticleDOI
TL;DR: This article studies the small sample behavior of several test statistics that are based on maximum likelihood estimator, but are designed to perform better with nonnormal data.
Abstract: Structural equation modeling is a well-known technique for studying relationships among multivariate data. In practice, high dimensional nonnormal data with small to medium sample sizes are very common, and large sample theory, on which almost all modeling statistics are based, cannot be invoked for model evaluation with test statistics. The most natural method for nonnormal data, the asymptotically distribution free procedure, is not defined when the sample size is less than the number of nonduplicated elements in the sample covariance. Since normal theory maximum likelihood estimation remains defined for intermediate to small sample size, it may be invoked but with the probable consequence of distorted performance in model evaluation. This article studies the small sample behavior of several test statistics that are based on maximum likelihood estimator, but are designed to perform better with nonnormal data. We aim to identify statistics that work reasonably well for a range of small sample sizes and distribution conditions. Monte Carlo results indicate that Yuan and Bentler's recently proposed F-statistic performs satisfactorily.

553 citations


Journal ArticleDOI
TL;DR: A method for group sequential trials that is based on the inverse normal method for combining the results of the separate stages is proposed, which enables data-driven sample size reassessments during the course of the study.
Abstract: A method for group sequential trials that is based on the inverse normal method for combining the results of the separate stages is proposed. Without exaggerating the Type I error rate, this method enables data-driven sample size reassessments during the course of the study. It uses the stopping boundaries of the classical group sequential tests. Furthermore, exact test procedures may be derived for a wide range of applications. The procedure is compared with the classical designs in terms of power and expected sample size.
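A minimal sketch of the inverse normal combination itself: stagewise one-sided p-values are converted to normal scores and combined with prespecified weights. The stopping boundaries and exact procedures described in the paper are not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def inverse_normal_combination(p_values, weights=None):
    """Combine stagewise one-sided p-values with the inverse normal method.
    Each stage contributes z_i = Phi^{-1}(1 - p_i); the combined statistic
    is a weighted sum renormalized to be standard normal under H0.  The
    weights (e.g. square roots of the planned stage sample sizes) must be
    fixed in advance, which is what permits data-driven sample size
    reassessment without inflating the Type I error."""
    p = np.asarray(p_values, dtype=float)
    w = np.ones_like(p) if weights is None else np.asarray(weights, dtype=float)
    z = norm.ppf(1 - p)
    return np.sum(w * z) / np.sqrt(np.sum(w**2))

# Example: two stages with equal planned sizes
z_comb = inverse_normal_combination([0.04, 0.10])
p_comb = 1 - norm.cdf(z_comb)
```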

538 citations


Journal ArticleDOI
TL;DR: A new group sequential test procedure is developed by modifying the weights used in the traditional repeated significance two-sample mean test, which has the type I error probability preserved at the target level and can provide a substantial gain in power with the increase of sample size.
Abstract: In group sequential clinical trials, sample size reestimation can be a complicated issue when it allows for change of sample size to be influenced by an observed sample path. Our simulation studies show that increasing sample size based on an interim estimate of the treatment difference can substantially inflate the probability of type I error in most practical situations. A new group sequential test procedure is developed by modifying the weights used in the traditional repeated significance two-sample mean test. The new test has the type I error probability preserved at the target level and can provide a substantial gain in power with the increase of sample size. Generalization of the new procedure is discussed.
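The weighting idea can be sketched as follows: the stagewise statistics are combined with weights fixed by the original design, so that a data-driven change of the second-stage sample size does not alter the null distribution. This is a schematic illustration, not the paper's full procedure.

```python
import math

def weighted_two_stage_z(z1, z2, tau):
    """Combine stagewise z-statistics with prespecified weights.
    tau is the originally planned fraction of information at the interim
    analysis; z2 is computed from second-stage data only, whatever the
    reestimated second-stage sample size turns out to be."""
    return math.sqrt(tau) * z1 + math.sqrt(1.0 - tau) * z2
```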

497 citations


Journal ArticleDOI
TL;DR: Stratified randomization is important only for small trials in which treatment outcome may be affected by known clinical factors that have a large effect on prognosis, large trials when interim analyses are planned with small numbers of patients, and trials designed to show the equivalence of two therapies.

465 citations


Journal ArticleDOI
TL;DR: The authors compared empirical type I error and power of different permutation techniques for the test of significance of a single partial regression coefficient in a multiple regression model, using simulations, and found that two methods that had been identified as equivalent formulations of permutation under the reduced model were actually quite different.
Abstract: This study compared empirical type I error and power of different permutation techniques for the test of significance of a single partial regression coefficient in a multiple regression model, using simulations. The methods compared were permutation of raw data values, two alternative methods proposed for permutation of residuals under the reduced model, and permutation of residuals under the full model. The normal-theory t-test was also included in simulations. We investigated effects of (1) the sample size, (2) the degree of collinearity between the predictor variables, (3) the size of the covariable’s parameter, (4) the distribution of the added random error and (5) the presence of an outlier in the covariable on these methods. We found that two methods that had been identified as equivalent formulations of permutation under the reduced model were actually quite different. One of these methods resulted in consistently inflated type 1 error. In addition, when the covariable contained an extreme outlier,...
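A sketch of one of the residual-permutation schemes compared above (permutation of residuals under the reduced model, in the spirit of Freedman and Lane); the other variants differ in which residuals are permuted and how the test statistic is recomputed. Variable names and the number of permutations are illustrative.

```python
import numpy as np

def freedman_lane_pvalue(y, x, z, n_perm=4999, rng=None):
    """Permutation p-value for the partial coefficient of x in y ~ x + z,
    permuting residuals under the reduced model y ~ z."""
    rng = np.random.default_rng(rng)
    n = len(y)

    def t_for_x(yy):
        X = np.column_stack([np.ones(n), x, z])
        beta, _, _, _ = np.linalg.lstsq(X, yy, rcond=None)
        resid = yy - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)
        return beta[1] / np.sqrt(cov[1, 1])

    # Reduced model: regress y on the covariable(s) z only
    Z = np.column_stack([np.ones(n), z])
    gamma, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ gamma
    resid_red = y - fitted

    t_obs = t_for_x(y)
    count = 0
    for _ in range(n_perm):
        y_star = fitted + rng.permutation(resid_red)   # permute reduced-model residuals
        if abs(t_for_x(y_star)) >= abs(t_obs):
            count += 1
    return (count + 1) / (n_perm + 1)
```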

Journal ArticleDOI
TL;DR: FST‐based estimates are always better than RST when sample sizes are moderate or small and the number of loci scored is low, and therefore it is concluded that in many cases the most conservative approach is to use FST.
Abstract: We compare the performance of Nm estimates based on FST and RST obtained from microsatellite data using simulations of the stepwise mutation model with range constraints in allele size classes. The results of the simulations suggest that the use of microsatellite loci can lead to serious overestimations of Nm, particularly when population sizes are large (N > 5000) and range constraints are high (K < 20). The simulations also indicate that, when population sizes are small (N ≤ 500) and migration rates are moderate (Nm ≈ 2), violations to the assumption used to derive the Nm estimators lead to biased results. Under ideal conditions, i.e. large sample sizes (ns ≥ 50) and many loci (nl ≥ 20), RST performs better than FST for most of the parameter space. However, FST-based estimates are always better than RST when sample sizes are moderate or small (ns ≤ 10) and the number of loci scored is low (nl < 20). These are the conditions under which many real investigations are carried out and therefore we conclude that in many cases the most conservative approach is to use FST.
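The Nm estimators discussed here rest on Wright's island-model approximation FST ≈ 1/(4Nm + 1), so that Nm ≈ (1 − FST)/(4·FST); an analogous relation is applied to RST. A minimal sketch:

```python
def nm_from_fst(fst):
    """Nm under the island-model approximation FST ≈ 1/(4*Nm + 1),
    i.e. Nm = (1 - FST) / (4 * FST).  The same form is used with RST."""
    return (1.0 - fst) / (4.0 * fst)

print(nm_from_fst(0.11))   # ≈ 2.02 migrants per generation
```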

Journal ArticleDOI
TL;DR: In this article, a comparison of two groups in terms of single degree-of-freedom contrasts of population means across the study timepoints is presented to provide specified levels of power for tests of significance from a longitudinal design allowing for subject attrition.
Abstract: Formulas for estimating sample sizes are presented to provide specified levels of power for tests of significance from a longitudinal design allowing for subject attrition. These formulas are derived for a comparison of two groups in terms of single degree-of-freedom contrasts of population means across the study timepoints. Contrasts of this type can often capture the main and interaction effects in a two-group repeated measures design. For example, a two-group comparison of either an average across time or a specific trend across time (e.g., linear or quadratic) can be considered. Since longitudinal data with attrition are often analyzed using an unbalanced repeated measures model (with a structured variance-covariance matrix for the repeated measures) or a random-effects model for incomplete longitudinal data, the variance-covariance matrix of the repeated measures is allowed to assume a variety of forms. Tables are presented listing sample size determinations assuming compound symmetry, a first-order ...
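As a simplified illustration of the general approach (not the attrition-adjusted formulas tabulated in the article), the per-group sample size for a single degree-of-freedom contrast c of the repeated measures follows the usual two-sample normal formula with the contrast variance c'Σc; the contrast, covariance matrix and effect size below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def n_per_group(contrast, sigma, delta, alpha=0.05, power=0.8):
    """Sample size per group to detect a between-group difference `delta`
    in the contrast c'mu of repeated measures with covariance `sigma`:
        n = 2 * (c' Sigma c) * (z_{1-a/2} + z_{1-b})^2 / delta^2.
    Attrition adjustments, as in the paper, would further inflate n."""
    c = np.asarray(contrast, dtype=float)
    var_c = c @ np.asarray(sigma, dtype=float) @ c
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * var_c * z**2 / delta**2))

# Example: average across 3 timepoints, compound-symmetric covariance (rho = 0.5)
Sigma = 0.5 * np.ones((3, 3)) + 0.5 * np.eye(3)
print(n_per_group([1/3, 1/3, 1/3], Sigma, delta=0.4))
```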

Journal ArticleDOI
01 Jan 1999-Genetics
TL;DR: It is shown that with an appropriate evaluation of the coverage probability a support region is approximately a confidence region, and a theoretical explanation is given of the empirical observation that the size of the support region is inversely proportional to the sample size, not the square root of the sample size, as one might expect from standard statistical theory.
Abstract: Lander and Botstein introduced statistical methods for searching an entire genome for quantitative trait loci (QTL) in experimental organisms, with emphasis on a backcross design and QTL having only additive effects. We extend their results to intercross and other designs, and we compare the power of the resulting test as a function of the magnitude of the additive and dominance effects, the sample size and intermarker distances. We also compare three methods for constructing confidence regions for a QTL: likelihood regions, Bayesian credible sets, and support regions. We show that with an appropriate evaluation of the coverage probability a support region is approximately a confidence region, and we provide a theoretical explanation of the empirical observation that the size of the support region is inversely proportional to the sample size, not the square root of the sample size, as one might expect from standard statistical theory.

Journal ArticleDOI
TL;DR: A meta-analysis of 33 studies found a moderate relationship between maternal depression and child behaviour problems in children 1 year of age and older and covaried significantly with the predictors of sample size and quality index scores.
Abstract: Children of depressed mothers are not only at risk for the development of psychopathology, but also for behaviour problems. A meta-analysis of 33 studies was conducted to determine the magnitude of the relationship between maternal depression and behaviour problems in children 1 year of age and older. Substantive, methodological, and miscellaneous variables were extracted and coded by both the researcher and a research assistant. The initial inter-rater agreement reached in coding these variables ranged from 85% to 100%. Effect sizes were calculated in three ways: unweighted, weighted by sample size, and weighted by quality index score. The mean effect size for the r index ranged from 0.29 when weighted by sample size to 0.35 when unweighted, indicating a moderate relationship between maternal depression and child behaviour problems. Children between the ages of 1-18 whose mothers were depressed displayed more conduct behaviour problems than children whose mothers were not depressed. The magnitude of this relationship covaried significantly with the predictors of sample size and quality index scores. Implications for future research are addressed.
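The sample-size weighting mentioned above amounts to a weighted mean of the study-level correlations; a minimal sketch follows (the study also weighted by quality-index scores, and many meta-analyses average Fisher-z transformed values instead).

```python
import numpy as np

def weighted_mean_effect(r, n):
    """Sample-size-weighted mean correlation across studies."""
    r, n = np.asarray(r, dtype=float), np.asarray(n, dtype=float)
    return np.sum(n * r) / np.sum(n)
```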

Journal ArticleDOI
TL;DR: This work analyzed the simple measure of richness, the first- and second-order jackknife, and the bootstrap estimators with simulation and resampling methods to examine the effects of sample size on estimator performance.
Abstract: Species richness is a widely used surrogate for the more complex concept of biological diversity. Because species richness is often central to ecological study and the establishment of conservation priorities, the biases and merits of richness measurements demand evaluation. The jackknife and bootstrap estimators can be used to compensate for the underestimation associated with simple richness estimation (or the sum of species counted in a sample). Using data from five forest communities, we analyzed the simple measure of richness, the first- and second-order jackknife, and the bootstrap estimators with simulation and resampling methods to examine the effects of sample size on estimator performance. Performance parameters examined were systematic under- or overestimation (bias), ability to estimate consistently (precision), and ability to estimate true species richness (accuracy). For small sample sizes in all studied communities (less than ∼25% of the total community), the least biased estimator was the ...
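For reference, the standard incidence-based forms of the estimators compared above can be written in a few lines; this is a sketch of the usual formulas, not the authors' code.

```python
import numpy as np

def richness_estimators(incidence):
    """Species-richness estimators from an incidence matrix
    (rows = samples, columns = species, entries 0/1)."""
    X = np.asarray(incidence, dtype=bool)
    n = X.shape[0]                    # number of samples
    freq = X.sum(axis=0)              # samples containing each species
    s_obs = int(np.sum(freq > 0))
    f1 = int(np.sum(freq == 1))       # species found in exactly one sample
    f2 = int(np.sum(freq == 2))       # species found in exactly two samples

    jack1 = s_obs + f1 * (n - 1) / n
    jack2 = s_obs + f1 * (2 * n - 3) / n - f2 * (n - 2) ** 2 / (n * (n - 1))
    p = freq[freq > 0] / n            # proportion of samples containing each species
    boot = s_obs + np.sum((1 - p) ** n)
    return {"observed": s_obs, "jackknife1": jack1,
            "jackknife2": jack2, "bootstrap": boot}
```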

Book
29 Dec 1999
TL;DR: This book introduces data analysis by resampling, covering bootstrap estimates of standard error, bias and confidence intervals (including bootstrap-t and BCA intervals), bootstrap hypothesis testing, and randomization and subsampling methods.
Abstract: PREFACE: DATA ANALYSIS BY RESAMPLING
PART I: RESAMPLING CONCEPTS INTRODUCTION
CONCEPTS 1: TERMS AND NOTATION Case, Attributes, Scores, and Treatments / Experimental and Observational Studies / Data Sets, Samples, and Populations / Parameters, Statistics, and Distributions / Distribution Functions APPLICATIONS 1: CASES, ATTRIBUTES, AND DISTRIBUTIONS Attributes, Scores, Groups, and Treatments / Distributions of Scores and Statistics / Exercises
CONCEPTS 2: POPULATIONS AND RANDOM SAMPLES Varieties of Populations / Random Samples APPLICATIONS 2: RANDOM SAMPLING Simple Random Samples / Exercises
CONCEPTS 3: STATISTICS AND SAMPLING DISTRIBUTIONS Statistics and Estimators / Accuracy of Estimation / The Sampling Distribution / Bias of an Estimator / Standard Error of a Statistic / RMS Error of an Estimator / Confidence Interval APPLICATIONS 3: SAMPLING DISTRIBUTION COMPUTATIONS Exercises
CONCEPTS 4: TESTING POPULATION HYPOTHESES Population Statistical Hypotheses / Population Hypothesis Testing APPLICATIONS 4: NULL SAMPLING DISTRIBUTION P-VALUES The p-value of a Directional Test / The p-value of a Nondirectional Test / Exercises
CONCEPTS 5: PARAMETRICS, PIVOTALS, AND ASYMPTOTICS The Unrealizable Sampling Distribution / Sampling Distribution of a Sample Mean / Parametric Population Distributions / Pivotal Form Statistics / Asymptotic Sampling Distributions / Limitations of the Mathematical Approach APPLICATIONS 5: CIs FOR NORMAL POPULATION MEAN AND VARIANCE CI for a Normal Population Mean / CI for a Normal Population Variance / Nonparametric CI Estimation / Exercises
CONCEPTS 6: LIMITATIONS OF PARAMETRIC INFERENCE Range and Precision of Scores / Size of Population / Size of Sample / Roughness of Population Distribution / Parameters and Statistics of Interests / Scarcity of Random Samples / Resampling Inference APPLICATIONS 6: RESAMPLING APPROACHES TO INFERENCE Exercises
CONCEPTS 7: THE REAL AND BOOTSTRAP WORLDS The Real World of Population Inference / The Bootstrap World of Population Inference / Real World Population Distribution Estimates / Nonparametric Population Estimates / Sample Size and Distribution Estimates APPLICATIONS 7: BOOTSTRAP POPULATION DISTRIBUTIONS Nonparametric Population Estimates / Exercises
CONCEPTS 8: THE BOOTSTRAP SAMPLING DISTRIBUTION The Bootstrap Conjecture / Complete Bootstrap Sampling Distributions / Monte Carlo Bootstrap Estimate of Standard Error / The Bootstrap Estimate of Bias / Simple Bootstrap CI Estimates APPLICATIONS 8: BOOTSTRAP SE, BIAS, AND CI ESTIMATES Example / Exercises
CONCEPTS 9: BETTER BOOTSTRAP CIs: THE BOOTSTRAP-T Pivotal Form Statistics / The Bootstrap-t Pivotal Transformation / Forming Bootstrap-t CIs / Estimating the Standard Error of an Estimate / Range of Applications of the Bootstrap-t / Iterated Bootstrap CIs APPLICATIONS 9: SE AND CIs FOR TRIMMED MEANS Definition of the Trimmed Mean / Importance of the Trimmed Mean / A Note on Outliers / Determining the Trimming Fraction / Sampling Distribution of the Trimmed Mean / Applications / Exercises
CONCEPTS 10: BETTER BOOTSTRAP CIs: BCA INTERVALS Bias Corrected and Accelerated CI Estimates / Applications of BCA CI / Better Confidence Interval Estimates APPLICATIONS 10: USING CI CORRECTION FACTORS Requirements for a BCA CI / Implementations of the BCA Algorithm / Exercise
CONCEPTS 11: BOOTSTRAP HYPOTHESIS TESTING CIs, Null Hypothesis Tests, and p-values / Bootstrap-t Hypothesis Testing / Bootstrap Hypothesis Testing Alternatives / CI Hypothesis Testing / Confidence Intervals or p-values? APPLICATIONS 11: BOOTSTRAP P-VALUES Computing a Bootstrap-t p-value / Fixed-alpha CIs and Hypothesis Testing / Computing a BCA CI p-Value / Exercise
CONCEPTS 12: RANDOMIZED TREATMENT ASSIGNMENT Two Functions of Randomization / Randomization of Sampled Cases / Randomization of Two Available Cases / Statistical Basis for Local Causal Inference / Population Hypothesis Revisited APPLICATIONS 12: MONTE CARLO REFERENCE DISTRIBUTIONS Serum Albumen in Diabetic Mice / Resampling Stats Analysis / SC Analysis / S-Plus Analysis / Exercises
CONCEPTS 13: STRATEGIES FOR RANDOMIZING CASES Independent Randomization of Cases / Completely Randomized Designs / Randomized Blocks Designs / Restricted Randomization / Constraints on Rerandomization APPLICATIONS 13: IMPLEMENTING CASE RERANDOMIZATION Completely Randomized Designs / Randomized Blocks Designs / Independent Randomization of Cases / Restricted Randomization / Exercises
CONCEPTS 14: RANDOM TREATMENT SEQUENCES Between- and Within-Cases Designs / Randomizing the Sequence of Treatments / Causal Inference for Within-Cases Designs / Sequence of Randomization Strategies APPLICATIONS 14: RERANDOMIZING TREATMENT SEQUENCES Analysis of the AB-BA Design / Sequences of k > 2 Treatments / Exercises
CONCEPTS 15: BETWEEN- AND WITHIN-CASE DECISIONS Between/Within Designs / Between/Within Resampling Strategies / Doubly Randomized Available Cases APPLICATIONS 15: INTERACTIONS AND SIMPLE EFFECTS Simple and Main Effects / Exercises
CONCEPTS 16: SUBSAMPLES: STABILITY OF DESCRIPTION Nonrandom Studies and Data Sets / Local Descriptive Inference / Descriptive Stability and Case Homogeneity / Subsample Descriptions / Employing Subsample Descriptions / Subsamples and Randomized Studies APPLICATIONS 16: STRUCTURED & UNSTRUCTURED DATA Half-Samples of Unstructured Data / Subsamples of Source-Structured Cases / Exercises
PART II: RESAMPLING APPLICATIONS INTRODUCTION
APPLICATIONS 17: A SINGLE GROUP OF CASES Random Sample or Set of Available Cases / Typical Size of Score Distribution / Variability of Attribute Scores / Association Between Two Attributes / Exercises
APPLICATIONS 18: TWO INDEPENDENT GROUPS OF CASES Constitution of Independent Groups / Location Comparisons for Samples / Magnitude Differences, CR and RB Designs / Magnitude Differences, Nonrandom Designs / Study Size / Exercises
APPLICATIONS 19: MULTIPLE INDEPENDENT GROUPS Multiple Group Parametric Comparisons / Nonparametric K-group Comparison / Comparisons among Randomized Groups / Comparisons among Nonrandom Groups / Adjustment for Multiple Comparisons / Exercises
APPLICATIONS 20: MULTIPLE FACTORS AND COVARIATES Two Treatment Factors / Treatment and Blocking Factors / Covariate Adjustment of Treatment Scores / Exercises
APPLICATIONS 21: WITHIN-CASES TREATMENT COMPARISONS Normal Models, Univariate and Multivariate / Bootstrap Treatment Comparisons / Randomized Sequence of Treatments / Nonrandom Repeated Measures / Exercises
APPLICATIONS 22: LINEAR MODELS: MEASURED RESPONSE The Parametric Linear Model / Nonparametric Linear Models / Prediction Accuracy / Linear Models for Randomized Cases / Linear Models for Nonrandom Studies / Exercises
APPLICATIONS 23: CATEGORICAL RESPONSE ATTRIBUTES Cross-Classification of Cases / The 2 x 2 Table / Logistic Regression / Exercises
POSTSCRIPT: GENERALITY, CAUSALITY & STABILITY Study Design and Resampling / Resampling Tools
REFERENCES / INDEX

Posted Content
TL;DR: In this paper, the authors provide cumulative distribution functions, densities, and finite sample critical values for the single-equation error correction statistic for testing cointegration, and provide a convenient way for calculating finite-sample critical values at standard levels; and a computer program can be used to calculate both critical values and p-values.
Abstract: This paper provides cumulative distribution functions, densities, and finite sample critical values for the single-equation error correction statistic for testing cointegration. Graphs and response surfaces summarize extensive Monte Carlo simulations and highlight simple dependencies of the statistic's quantiles on the number of variables in the error correction model, the choice of deterministic components, and the estimation sample size. The response surfaces provide a convenient way for calculating finite sample critical values at standard levels; and a computer program, freely available over the Internet, can be used to calculate both critical values and p-values. Three empirical examples illustrate these tools.

Journal Article
TL;DR: In this article, the authors proposed test-based methods of constructing exact confidence intervals for the difference in two binomial proportions, and compared the performance of these confidence intervals to ones based on the observed difference alone.
Abstract: Confidence intervals are often provided to estimate a treatment difference. When the sample size is small, as is typical in early phases of clinical trials, confidence intervals based on large sample approximations may not be reliable. In this report, we propose test‐based methods of constructing exact confidence intervals for the difference in two binomial proportions. These exact confidence intervals are obtained from the unconditional distribution of two binomial responses, and they guarantee the level of coverage. We compare the performance of these confidence intervals to ones based on the observed difference alone. We show that a large improvement can be achieved by using the standardized Z test with a constrained maximum likelihood estimate of the variance.
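The asymptotic core of the approach, inverting the standardized Z statistic with the variance evaluated at the maximum likelihood estimate constrained to the hypothesized difference, can be sketched as below. The exact unconditional intervals proposed in the paper additionally maximize the tail probability over the nuisance parameter, which this sketch omits.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def score_ci_diff(x1, n1, x2, n2, alpha=0.05, grid=2000):
    """Approximate CI for p1 - p2 by inverting the standardized Z test with
    variance evaluated at the MLE constrained to p1 - p2 = delta."""
    z_crit = norm.ppf(1 - alpha / 2)
    phat1, phat2 = x1 / n1, x2 / n2

    def z_stat(delta):
        # Constrained MLE: maximize the two-binomial log-likelihood over p2
        # with p1 = p2 + delta held fixed.
        lo, hi = max(0.0, -delta), min(1.0, 1.0 - delta)
        eps = 1e-12
        def negll(p2):
            p1 = p2 + delta
            return -(x1 * np.log(p1 + eps) + (n1 - x1) * np.log(1 - p1 + eps)
                     + x2 * np.log(p2 + eps) + (n2 - x2) * np.log(1 - p2 + eps))
        res = minimize_scalar(negll, bounds=(lo + 1e-9, hi - 1e-9), method="bounded")
        p2t = res.x
        p1t = p2t + delta
        se = np.sqrt(p1t * (1 - p1t) / n1 + p2t * (1 - p2t) / n2)
        return (phat1 - phat2 - delta) / se

    deltas = np.linspace(-0.999, 0.999, grid)
    accepted = [d for d in deltas if abs(z_stat(d)) <= z_crit]
    return min(accepted), max(accepted)

# Example: 12/20 responders versus 5/20
print(score_ci_diff(12, 20, 5, 20))
```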

Journal ArticleDOI
TL;DR: A computer program to perform sample size and power calculations to detect additive or multiplicative models of gene-environment interactions using the Lubin and Gail approach will be available free of charge in the near future from the National Cancer Institute.
Abstract: Power and sample size considerations are critical for the design of epidemiologic studies of gene-environment interactions. Hwang et al. (Am J Epidemiol 1994;140:1029-37) and Foppa and Spiegelman (Am J Epidemiol 1997;146:596-604) have presented power and sample size calculations for case-control studies of gene-environment interactions. Comparisons of calculations using these approaches and an approach for general multivariate regression models for the odds ratio previously published by Lubin and Gail (Am J Epidemiol 1990; 131:552-66) have revealed substantial differences under some scenarios. These differences are the result of a highly restrictive characterization of the null hypothesis in Hwang et al. and Foppa and Spiegelman, which results in an underestimation of sample size and overestimation of power for the test of a gene-environment interaction. A computer program to perform sample size and power calculations to detect additive or multiplicative models of gene-environment interactions using the Lubin and Gail approach will be available free of charge in the near future from the National Cancer Institute.

Journal ArticleDOI
TL;DR: This tutorial provides an introduction to the analysis of longitudinal binary data, at a level suited to statisticians familiar with logistic regression and survival analysis but not necessarily experienced in longitudinal analysis or estimating equation methods.
Abstract: Longitudinal studies are increasingly popular in epidemiology. In this tutorial we provide a detailed review of methods used by us in the analysis of a longitudinal (multiwave or panel) study of adolescent health, focusing on smoking behaviour. This example is explored in detail with the principal aim of providing an introduction to the analysis of longitudinal binary data, at a level suited to statisticians familiar with logistic regression and survival analysis but not necessarily experienced in longitudinal analysis or estimating equation methods. We describe recent advances in statistical methodology that can play a practical role in applications and are available with standard software. Our approach emphasizes the importance of stating clear research questions, and for binary outcomes we suggest these are best organized around the key epidemiological concepts of prevalence and incidence. For prevalence questions, we show how unbiased estimating equations and information-sandwich variance estimates may be used to produce a valid and robust analysis, as long as sample size is reasonably large. We also show how the estimating equation approach readily extends to accommodate adjustments for missing data and complex survey design. A detailed discussion of gender-related differences over time in our smoking outcome is used to emphasize the need for great care in separating longitudinal from cross-sectional information. We show how incidence questions may be addressed using a discrete-time version of the proportional hazards regression model. This approach has the advantages of providing estimates of relative risks, being feasible with standard software, and also allowing robust information-sandwich variance estimates.
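A minimal sketch of the kind of analysis described for prevalence questions, using statsmodels' GEE implementation (which reports information-sandwich standard errors by default); the data file and variable names are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical long-format panel data: one row per adolescent per wave,
# with a binary smoking indicator, wave number, sex, and a subject id.
df = pd.read_csv("smoking_panel.csv")

# Model the log-odds of current smoking as a function of wave and sex,
# with an exchangeable working correlation for the repeated measures.
model = smf.gee("smoker ~ wave + sex", groups="id", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())   # robust (sandwich) standard errors by default
```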

Journal ArticleDOI
TL;DR: Empirical comparisons between model selection using VC-bounds and classical methods are performed for various noise levels, sample size, target functions and types of approximating functions, demonstrating the advantages of VC-based complexity control with finite samples.
Abstract: It is well known that for a given sample size there exists a model of optimal complexity corresponding to the smallest prediction (generalization) error. Hence, any method for learning from finite samples needs to have some provisions for complexity control. Existing implementations of complexity control include penalization (or regularization), weight decay (in neural networks), and various greedy procedures (aka constructive, growing, or pruning methods). There are numerous proposals for determining optimal model complexity (aka model selection) based on various (asymptotic) analytic estimates of the prediction risk and on resampling approaches. Nonasymptotic bounds on the prediction risk based on Vapnik-Chervonenkis (VC)-theory have been proposed by Vapnik. This paper describes application of VC-bounds to regression problems with the usual squared loss. An empirical study is performed for settings where the VC-bounds can be rigorously applied, i.e., linear models and penalized linear models where the VC-dimension can be accurately estimated, and the empirical risk can be reliably minimized. Empirical comparisons between model selection using VC-bounds and classical methods are performed for various noise levels, sample size, target functions and types of approximating functions. Our results demonstrate the advantages of VC-based complexity control with finite samples.

Journal ArticleDOI
TL;DR: In this article, a comparison of the sampling properties between L-moments and conventional product moments for generalised Normal, generalised Extreme Value, generalised Pareto and Pearson-3 distributions, in a relative form, is presented.

Journal Article
TL;DR: It is possible to design and implement a national probability sample of persons with a low-prevalence disease, even if it is stigmatized, according to the HCSUS consortium.
Abstract: Objective: The design and implementation of a nationally representative probability sample of persons with a low-prevalence disease, HIV/AIDS.
Data sources/study setting: One of the most significant roadblocks to the generalizability of primary data collected about persons with a low-prevalence disease is the lack of a complete methodology for efficiently generating and enrolling probability samples. The methodology developed by the HCSUS consortium uses a flexible, provider-based approach to multistage sampling that minimizes the quantity of data necessary for implementation.
Study design: To produce a valid national probability sample, we combined a provider-based multistage design with the M.D.-colleague recruitment model often used in non-probability site-specific studies.
Data collection: Across the contiguous United States, reported AIDS cases for metropolitan areas and rural counties. In selected areas, caseloads for known providers for HIV patients and a random sample of other providers. For selected providers, anonymous patient visit records.
Principal findings: It was possible to obtain all data necessary to implement a multistage design for sampling individual HIV-infected persons under medical care with known probabilities. Taking account of both patient and provider nonresponse, we succeeded in obtaining in-person or proxy interviews from subjects representing over 70 percent of the eligible target population.
Conclusions: It is possible to design and implement a national probability sample of persons with a low-prevalence disease, even if it is stigmatized.

Journal ArticleDOI
TL;DR: In this paper, the authors performed geostatistical analyses to describe the spatial distribution of the organisms/processes, coupled with power analyses to assess required sample sizes, and found that required sample size and type II error rates can be significantly reduced for many belowground variables when using a stratified sampling design.

Journal ArticleDOI
TL;DR: In this article, the reliability coefficient (r) is investigated as a function of sample size (N) for retest, alternate-form, split-half, alpha, intraclass, interrater, and validity coefficients.
Abstract: Precision of the reliability coefficient (r) is investigated. The width of the confidence interval for r as a function of sample size (N) is shown for retest, alternate-form, split-half, alpha, intraclass, interrater, and validity coefficients. Although the determination of the N needed for reliability studies is somewhat subjective, a minimum of 400 subjects is recommended. Much larger Ns may be needed for validity studies. A survey of published reliability studies shows that 59% of the sample sizes were less than 100. Confidence intervals for obtained test scores are used as a practical application measure that also leads to the conclusion of a minimum of 400 subjects.
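A standard large-sample confidence interval for a correlation-type coefficient uses the Fisher z transformation; the article tabulates interval widths rather than prescribing this particular formula, so the sketch below is only an illustration of why several hundred subjects are needed for a precise reliability estimate.

```python
import numpy as np
from scipy.stats import norm

def fisher_z_ci(r, n, conf=0.95):
    """Approximate confidence interval for a correlation-type reliability
    coefficient r, based on the Fisher z transformation."""
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n - 3)
    zcrit = norm.ppf(1 - (1 - conf) / 2)
    return np.tanh(z - zcrit * se), np.tanh(z + zcrit * se)

# Interval width shrinks slowly with N:
for n in (50, 100, 400):
    lo, hi = fisher_z_ci(0.80, n)
    print(n, round(hi - lo, 3))
```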

01 Jan 1999
TL;DR: In this paper, the authors compared three methods to obtain a confidence interval for size at 50% maturity, and in general for P% maturity: Fieller's analytical method, nonparametric bootstrap, and a Monte Carlo algorithm.
Abstract: Size at 50% maturity is commonly evaluated for wild populations, but the uncertainty involved in such computation has been frequently overlooked in the application to marine fisheries. Here we evaluate three procedures to obtain a confidence interval for size at 50% maturity, and in general for P% maturity: Fieller's analytical method, nonparametric bootstrap, and a Monte Carlo algorithm. The three methods are compared in estimating size at 50% maturity (l50%) by using simulated data from an age-structured population, with von Bertalanffy growth and constant natural mortality, for sample sizes of 500 to 10,000 individuals. Performance was assessed by using four criteria: 1) the proportion of times that the confidence interval did contain the true and known size at 50% maturity, 2) bias in estimating l50%, 3) length and 4) shape of the confidence interval around l50%. Judging from criteria 2-4, the three methods performed equally well, but in criterion 1, the Monte Carlo method outperformed the bootstrap and Fieller methods with a frequency remaining very close to the nominal 95% at all sample sizes. The Monte Carlo method was also robust to variations in natural mortality rate (M), although with lengthier and more asymmetric confidence intervals as M increased. This method was applied to two sets of real data. First, we used data from the squat lobster Pleuroncodes monodon with several levels of proportion mature, so that a confidence interval for the whole maturity curve could be outlined. Second, we compared two samples of the anchovy Engraulis ringens from different localities in central Chile to test the hypothesis that they differed in size at 50% maturity and concluded that they were not statistically different. statistical uncertainty of the model-based l50% is ignored (Table 1). In this work, we show three alternative procedures: an analytical method derived from generalized linear models (McCullagh and Nelder, 1989), nonparametric bootstrap (Efron and Tibshirani, 1993), and a Monte Carlo algorithm developed in our study. We show by simulation the behavior of the three methods for sample sizes of 500 to 10,000 individuals, concluding that they are similar in terms of bias, length, and shape of confidence intervals but that the Monte Carlo method outperforms the other two methods in percentage of times that the confidence interval contains the true parameter, which remained close to the nominal 95% at all sample sizes.
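The quantity l50% is usually obtained from a logistic maturity-at-length fit as −b0/b1. The sketch below shows that fit together with a nonparametric bootstrap percentile interval, one of the three approaches compared above; the Fieller and Monte Carlo intervals are constructed differently and are not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

def l50_with_bootstrap_ci(length, mature, n_boot=999, conf=0.95, seed=0):
    """Length at 50% maturity from a logistic fit, l50 = -b0/b1, with a
    nonparametric bootstrap percentile interval."""
    rng = np.random.default_rng(seed)
    length = np.asarray(length, dtype=float)
    mature = np.asarray(mature, dtype=int)

    def fit_l50(x, y):
        X = sm.add_constant(x)
        b0, b1 = sm.Logit(y, X).fit(disp=0).params
        return -b0 / b1

    l50 = fit_l50(length, mature)
    idx = np.arange(len(length))
    boots = []
    for _ in range(n_boot):
        b = rng.choice(idx, size=len(idx), replace=True)
        try:
            boots.append(fit_l50(length[b], mature[b]))
        except Exception:      # occasional non-convergence on a resample
            continue
    lo, hi = np.percentile(boots, [100 * (1 - conf) / 2, 100 * (1 + conf) / 2])
    return l50, (lo, hi)
```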

Journal ArticleDOI
TL;DR: The understanding of the performance of the classifiers under the constraint of a finite design sample size is expected to facilitate the selection of a proper classifier for a given classification task and the design of an efficient resampling scheme.
Abstract: Classifier design is one of the key steps in the development of computer-aided diagnosis (CAD) algorithms. A classifier is designed with case samples drawn from the patient population. Generally, the sample size available for classifier design is limited, which introduces variance and bias into the performance of the trained classifier, relative to that obtained with an infinite sample size. For CAD applications, a commonly used performance index for a classifier is the area, Az, under the receiver operating characteristic (ROC) curve. We have conducted a computer simulation study to investigate the dependence of the mean performance, in terms of Az, on design sample size for a linear discriminant and two nonlinear classifiers, the quadratic discriminant and the backpropagation neural network (ANN). The performances of the classifiers were compared for four types of class distributions that have specific properties: multivariate normal distributions with equal covariance matrices and unequal means, unequal covariance matrices and unequal means, and unequal covariance matrices and equal means, and a feature space where the two classes were uniformly distributed in disjoint checkerboard regions. We evaluated the performances of the classifiers in feature spaces of dimensionality ranging from 3 to 15, and design sample sizes from 20 to 800 per class. The dependence of the resubstitution and hold-out performance on design (training) sample size (Nt) was investigated. For multivariate normal class distributions with equal covariance matrices, the linear discriminant is the optimal classifier. It was found that its Az-versus-1/Nt curves can be closely approximated by linear dependences over the range of sample sizes studied. In the feature spaces with unequal covariance matrices where the quadratic discriminant is optimal, the linear discriminant is inferior to the quadratic discriminant or the ANN when the design sample size is large. However, when the design sample is small, a relatively simple classifier, such as the linear discriminant or an ANN with very few hidden nodes, may be preferred because performance bias increases with the complexity of the classifier. In the regime where the classifier performance is dominated by the 1/Nt term, the performance in the limit of infinite sample size can be estimated as the intercept (1/Nt = 0) of a linear regression of Az versus 1/Nt. The understanding of the performance of the classifiers under the constraint of a finite design sample size is expected to facilitate the selection of a proper classifier for a given classification task and the design of an efficient resampling scheme.
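The extrapolation described in the last part of the abstract is a simple linear regression of the hold-out Az on 1/Nt, with the intercept taken as the infinite-sample performance. A sketch with made-up values:

```python
import numpy as np

# Hypothetical hold-out Az values measured at several training sample sizes
# per class; the infinite-sample performance is estimated as the intercept
# of a linear fit of Az against 1/Nt.
Nt = np.array([50, 100, 200, 400, 800])
Az = np.array([0.86, 0.89, 0.905, 0.912, 0.916])   # illustrative values only

slope, intercept = np.polyfit(1.0 / Nt, Az, deg=1)
print("estimated Az at infinite sample size:", round(intercept, 3))
```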

Journal ArticleDOI
TL;DR: In this paper, the authors investigate the potentials of neural network models by employing two cross-validation schemes and show that neural networks are a more robust forecasting method than the random walk model.
Abstract: Econometric methods used in foreign exchange rate forecasting have produced inferior out-of-sample results compared to a random walk model. Applications of neural networks have shown mixed findings. In this paper, we investigate the potentials of neural network models by employing two cross-validation schemes. The effects of different in-sample time periods and sample sizes are examined. Out-of-sample performance evaluated with four criteria across three forecasting horizons shows that neural networks are a more robust forecasting method than the random walk model. Moreover, neural network predictions are quite accurate even when the sample size is relatively small.

Journal ArticleDOI
TL;DR: A comparison among the first three procedures indicates that the Stein test is, unexpectedly, the test of choice under the original design alternative, whereas the approximate-optimal and Wittes-Brittain procedures appear to have superior power for detecting smaller treatment differences.
Abstract: The two-stage design involves sample size recalculation using an interim variance estimate. Stein proposed the design in 1945; biostatisticians recently have shown renewed interest in it. Wittes and Brittain proposed a modification aimed at greater efficiency; Gould and Shih proposed a similar procedure, but with a different interim variance estimate based on blinded data. We compare the power of Stein's original test, an idealized version of the Wittes-Brittain test, and a theoretical optimal test which can be approximated in practice. We also compare two procedures that control the conditional type I error rate given the actual final sample size: Gould and Shih's procedure and a newly proposed 'second segment' procedure. The comparison among the first three procedures indicates that the Stein test is, unexpectedly, the test of choice under the original design alternative, whereas the approximate-optimal and Wittes-Brittain procedures appear to have superior power for detecting smaller treatment differences. As between the latter two procedures, the second segment procedure is more powerful when many observations are likely to be taken after the interim resizing, whereas otherwise the Gould-Shih procedure is superior.
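The recalculation these two-stage designs share is to plug the interim variance estimate into the standard two-sample formula; the procedures compared above differ in how they then control the type I error. A minimal sketch of that basic step, with illustrative numbers:

```python
import math
from scipy.stats import norm

def reestimated_n_per_arm(s1_sq, delta, n1, alpha=0.05, power=0.9):
    """Recompute the per-arm sample size of a two-arm trial at an interim
    look, plugging the interim variance estimate s1_sq into the usual
    two-sample normal formula for detecting a mean difference delta.
    The final size is not allowed to fall below the interim size n1."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n = 2.0 * s1_sq * z**2 / delta**2
    return max(n1, math.ceil(n))

# Example: interim SD larger than planned (s1 = 12 rather than the assumed 10)
print(reestimated_n_per_arm(s1_sq=144.0, delta=5.0, n1=60))
```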