
Showing papers on "Sample size determination published in 1991"


Journal ArticleDOI
TL;DR: A slightly more complex rule-of-thumb is introduced that estimates minimum sample size as a function of effect size as well as the number of predictors, and it is argued that researchers should use methods of determining sample size that incorporate effect size.
Abstract: Numerous rules-of-thumb have been suggested for determining the minimum number of subjects required to conduct multiple regression analyses. These rules-of-thumb are evaluated by comparing their results against those based on power analyses for tests of hypotheses of multiple and partial correlations. The results did not support the use of rules-of-thumb that simply specify some constant (e.g., 100 subjects) as the minimum number of subjects or a minimum ratio of number of subjects (N) to number of predictors (m). Some support was obtained for a rule-of-thumb that N ≥ 50 + 8m for the multiple correlation and N ≥ 104 + m for the partial correlation. However, the rule-of-thumb for the multiple correlation yields values too large for N when m ≥ 7, and both rules-of-thumb assume all studies have a medium-size relationship between criterion and predictors. Accordingly, a slightly more complex rule-of-thumb is introduced that estimates minimum sample size as a function of effect size as well as the number of predictors. It is argued that researchers should use methods to determine sample size that incorporate effect size.
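The rules-of-thumb evaluated in the abstract are easy to compare side by side. A minimal sketch (the function names are mine, and the effect-size rule uses the common N ≥ 8/f² + m − 1 form with Cohen's f², where 0.02 is small, 0.15 medium, 0.35 large; this is an assumed form, not necessarily the paper's exact formula):

```python
import math

def n_multiple_correlation(m):
    """Rule-of-thumb for testing the multiple correlation: N >= 50 + 8m."""
    return 50 + 8 * m

def n_partial_correlation(m):
    """Rule-of-thumb for testing a partial correlation: N >= 104 + m."""
    return 104 + m

def n_effect_size(m, f2=0.15):
    """Effect-size-aware minimum N (assumed form): N >= 8/f^2 + m - 1."""
    return math.ceil(8 / f2 + m - 1)
```

For m = 7 predictors and a medium effect, the three rules give 106, 111, and 60 subjects respectively, illustrating how much the effect-size assumption drives the answer.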

3,105 citations


Book
01 Feb 1991
TL;DR: The sample size calculation for a prevalence estimate needs only a simple formula, but there are a number of practical issues in selecting values for the parameters the formula requires.
Abstract: The sample size calculation for a prevalence estimate needs only a simple formula. However, there are a number of practical issues in selecting values for the parameters required in the formula. Lwanga SK, Lemeshow S. Sample Size Determination in Health Studies: A Practical Manual. Geneva: World Health Organization, 1991, pp. 1-80.
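The "simple formula" for a prevalence estimate is the usual normal-approximation sample size for a proportion; a minimal sketch (the function and parameter names are mine):

```python
import math

def n_prevalence(p, d, z=1.96):
    """Minimum n to estimate a prevalence p to within absolute precision d
    at ~95% confidence: n = z^2 * p * (1 - p) / d^2, rounded up."""
    return math.ceil(z * z * p * (1 - p) / (d * d))
```

With the conservative choice p = 0.5 and precision d = 0.05, this gives the familiar minimum of 385 respondents; a rarer (or commoner) anticipated prevalence shrinks the required sample.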

1,814 citations


Journal ArticleDOI
TL;DR: The effects of sample size on feature selection and error estimation for several types of classifiers are discussed and an emphasis is placed on giving practical advice to designers and users of statistical pattern recognition systems.
Abstract: The effects of sample size on feature selection and error estimation for several types of classifiers are discussed. The focus is on the two-class problem. Classifier design in the context of small design sample size is explored. The estimation of error rates under small test sample size is given. Sample size effects in feature selection are discussed. Recommendations for the choice of learning and test sample sizes are given. In addition to surveying prior work in this area, an emphasis is placed on giving practical advice to designers and users of statistical pattern recognition systems.

1,323 citations


Journal ArticleDOI
TL;DR: This work presents a method for calculating likelihood ratio confidence intervals for tests that have positive or negative results, tests with non-positive/non-negative results, and tests reported on an ordinal outcome scale, and demonstrates a sample size estimation procedure for diagnostic test studies based on the desired likelihood ratio confidence interval.

877 citations


01 Jan 1991
TL;DR: General guidelines are presented for the use of cluster-sample surveys for health surveys in developing countries, with particular attention paid to allowing for the structure of the survey in estimating sample size, using the design effect and the rate of homogeneity.
Abstract: General guidelines are presented for the use of cluster-sample surveys for health surveys in developing countries. The emphasis is on methods which can be used by practitioners with little statistical expertise and no background in sampling. A simple self-weighting design is used, based on that used by the World Health Organization's Expanded Programme on Immunization (EPI). Topics covered include sample design, methods of random selection of areas and households, sample-size calculation and the estimation of proportions, ratios and means with standard errors appropriate to the design. Extensions are discussed, including stratification and multiple stages of selection. Particular attention is paid to allowing for the structure of the survey in estimating sample size, using the design effect and the rate of homogeneity. Guidance is given on possible values for these parameters. A spreadsheet is included for the calculation of standard errors.
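The abstract's inflation of sample size by the design effect can be sketched directly; the standard form is deff = 1 + (b − 1)·roh, with b the average cluster take and roh the rate of homogeneity (the function names are mine):

```python
import math

def design_effect(b, roh):
    """deff = 1 + (b - 1) * roh for average cluster take b and
    rate of homogeneity (intra-cluster correlation) roh."""
    return 1 + (b - 1) * roh

def n_cluster(n_srs, b, roh):
    """Inflate a simple-random-sample size by the design effect."""
    return math.ceil(n_srs * design_effect(b, roh))
```

For example, a survey needing 385 subjects under simple random sampling, run with clusters of 20 and roh = 0.1, has deff = 2.9 and so needs roughly 1,117 subjects.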

865 citations


Journal ArticleDOI
TL;DR: The reference values and charts presented here have two major advantages over the current Swedish ones: the sample size used is now sufficiently large at the lower gestational ages, so that empirically found variations can be used, and the skewness of the birth weight distribution has been taken into account.
Abstract: An update of the Swedish reference standards for weight, length, and head circumference at birth, for each week of gestational age, is presented. It is based on the total Swedish cohorts of infants born 1977-1981 (n = 475,588). A "healthy population" (79%) was extracted, using prospectively collected data. Weekly (28-42 weeks) grouped data for length and head circumference were well approximated by the normal distribution, but the distributions for birthweight were positively skewed. The original skewed distributions for birthweight were transformed, using the square root, resulting in distributions close to the Gaussian. For smoothing purposes, the weekly values for the mean and the standard deviation were both fitted by a third-degree polynomial function. These functions also make possible the calculation of the continuous variable, standard deviation score, for individual newborn infants, as well as a comparison of distributions between groups of infants. The reference values and charts presented here have two major advantages over the current Swedish ones: the sample size used is now sufficiently large at the lower gestational ages, so that empirically found variations can be used, and the skewness of the birthweight distribution has been taken into account. The use of the reference standards presented here improves and facilitates evaluation of size deviation at birth.

740 citations



Journal ArticleDOI
TL;DR: It is shown that the empirical likelihood method for constructing confidence intervals is Bartlett-correctable, meaning that a simple adjustment for the expected value of the log-likelihood ratio reduces coverage error to O(n^{-2}), where n denotes sample size.
Abstract: It is shown that, in a very general setting, the empirical likelihood method for constructing confidence intervals is Bartlett-correctable. This means that a simple adjustment for the expected value of log-likelihood ratio reduces coverage error to an extremely low $O(n^{-2})$, where $n$ denotes sample size. That fact makes empirical likelihood competitive with methods such as the bootstrap which are not Bartlett-correctable and which usually have coverage error of size $n^{-1}$. Most importantly, our work demonstrates a strong link between empirical likelihood and parametric likelihood, since the Bartlett correction had previously only been available for parametric likelihood. A general formula is given for the Bartlett correction, valid in a very wide range of problems, including estimation of mean, variance, covariance, correlation, skewness, kurtosis, mean ratio, mean difference, variance ratio, etc. The efficacy of the correction is demonstrated in a simulation study for the case of the mean.
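The correction the abstract describes can be stated compactly; a standard textbook statement of the result (with b denoting the Bartlett factor for the problem at hand):

```latex
% Chi-squared calibration of the empirical likelihood ratio R(\theta)
% has coverage error of order n^{-1}:
\Pr\bigl\{-2\log R(\theta_0) \le \chi^2_{1,1-\alpha}\bigr\} = 1-\alpha + O(n^{-1}).
% Rescaling the threshold by the mean adjustment (the Bartlett correction)
% removes the leading error term:
\Pr\bigl\{-2\log R(\theta_0) \le \chi^2_{1,1-\alpha}\,(1 + b/n)\bigr\} = 1-\alpha + O(n^{-2}).
```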

410 citations


20 Sep 1991
TL;DR: Sample size determination in health studies (Ahvaz Jundishapur Digital Library).
Abstract: Sample size determination in health studies (Ahvaz Jundishapur Digital Library).

389 citations


Book
24 Apr 1991
TL;DR: Sequential analysis refers to statistical theory and methods in which the sample size depends in a random manner on the accumulating data; a formal theory in which optimal tests are derived for simple statistical hypotheses in such a framework was developed by Abraham Wald in the 1940s.
Abstract: Sequential analysis refers to the body of statistical theory and methods where the sample size may depend in a random manner on the accumulating data. A formal theory in which optimal tests are derived for simple statistical hypotheses in such a framework was developed by Abraham Wald in the early 1940s.

349 citations


Journal ArticleDOI
TL;DR: This paper discusses, from a philosophical perspective, the reasons for considering the power of any statistical test used in environmental biomonitoring, because Type II errors can be more costly than Type I errors for environmental management.
Abstract: This paper discusses, from a philosophical perspective, the reasons for considering the power of any statistical test used in environmental biomonitoring. Power is inversely related to the probability of making a Type II error (i.e. low power indicates a high probability of Type II error). In the context of environmental monitoring, a Type II error is made when it is concluded that no environmental impact has occurred even though one has. Type II errors have been ignored relative to Type I errors (the mistake of concluding that there is an impact when one has not occurred), the rates of which are stipulated by the α values of the test. In contrast, power depends on the value of α, the sample size used in the test, the effect size to be detected, and the variability inherent in the data. Although power ideas have been known for years, only recently have these issues attracted the attention of ecologists and have methods been available for calculating power easily. Understanding statistical power gives three ways to improve environmental monitoring and to inform decisions about actions arising from monitoring. First, it allows the most sensitive tests to be chosen from among those applicable to the data. Second, preliminary power analysis can be used to indicate the sample sizes necessary to detect an environmental change. Third, power analysis should be used after any nonsignificant result is obtained in order to judge whether that result can be interpreted with confidence or the test was too weak to examine the null hypothesis properly. Power procedures are concerned with the statistical significance of tests of the null hypothesis, and they lend little insight, on their own, into the workings of nature. Power analyses are, however, essential to designing sensitive tests and correctly interpreting their results. The biological or environmental significance of any result, including whether the impact is beneficial or harmful, is a separate issue.
The most compelling reason for considering power is that Type II errors can be more costly than Type I errors for environmental management. This is because the commitment of time, energy and people to fighting a false alarm (a Type I error) may continue only in the short term until the mistake is discovered. In contrast, the cost of not doing something when in fact it should be done (a Type II error) will have both short- and long-term costs (e.g. ensuing environmental degradation and the eventual cost of its rectification). Low power can be disastrous for environmental monitoring programmes.
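The abstract notes that power depends on α, sample size, effect size, and variability. For a two-group comparison, a normal-approximation sketch makes that dependence concrete (the z-test form and function names are my own, not the paper's):

```python
from math import sqrt, erf

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_sample(d, n):
    """Approximate power of a two-sided two-sample z-test with n per group
    and standardized effect size d = (mu1 - mu2) / sigma, at alpha = 0.05
    (normal approximation; the negligible lower-tail term is dropped)."""
    z_crit = 1.959963984540054  # Phi^{-1}(0.975)
    return phi(d * sqrt(n / 2) - z_crit)
```

With n = 64 per group and a medium standardized effect d = 0.5, power is about 0.81; shrinking either n or d drives power down, which is exactly the situation in which a nonsignificant monitoring result is uninterpretable.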

Journal ArticleDOI
TL;DR: Simulations show that there is substantial differential bias when comparing conditions with fewer than 10 observations against conditions with more than 20, and that with strongly skewed distributions and a cutoff of 3.0 standard deviations the bias can influence comparisons of conditions with even more observations.
Abstract: To remove the influence of spuriously long response times, many investigators compute “restricted means”, obtained by throwing out any response time more than 2.0, 2.5, or 3.0 standard deviations from the overall sample average. Because reaction time distributions are skewed, however, the computation of restricted means introduces a bias: the restricted mean underestimates the true average of the population of response times. This problem may be very serious when investigators compare restricted means across conditions with different numbers of observations, because the bias increases with sample size. Simulations show that there is substantial differential bias when comparing conditions with fewer than 10 observations against conditions with more than 20. With strongly skewed distributions and a cutoff of 3.0 standard deviations, differential bias can influence comparisons of conditions with even more observations.
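The bias mechanism is easy to reproduce by simulation. A sketch using a right-skewed exponential population with true mean 1 (the population, cutoff, and function names are illustrative assumptions, not the paper's simulation design):

```python
import random
import statistics

def restricted_mean(xs, cutoff=3.0):
    """Drop observations more than `cutoff` sample SDs from the sample
    mean, then average the remainder (the 'restricted mean')."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    kept = [x for x in xs if abs(x - m) <= cutoff * s]
    return statistics.mean(kept)

def mean_bias(n, reps=2000, seed=1):
    """Average (restricted mean - true mean) over repeated samples of
    size n from an exponential(1) population (true mean 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        total += restricted_mean(xs) - 1.0
    return total / reps
```

Because the population is skewed to the right, only high values are ever trimmed, so the restricted mean systematically underestimates the true mean, and the amount of underestimation depends on n.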

Journal ArticleDOI
TL;DR: Argues that reporting p < .05 alone is insufficient and that estimates of effect size should accompany tests of statistical significance.
Abstract: (1991). What is Missing in p < .05? Effect Size. Research Quarterly for Exercise and Sport: Vol. 62, No. 3, pp. 344-348.

Journal ArticleDOI
TL;DR: In this paper, the authors argue that the influence of sample size is not necessarily undesirable, and the rationale behind this point of view is described in terms of the relationships among the population covariance matrix and 2 model-based estimates of it.
Abstract: Complex models for covariance matrices are structures that specify many parameters, whereas simple models require only a few. When a set of models of differing complexity is evaluated by means of some goodness of fit indices, structures with many parameters are more likely to be selected when the number of observations is large, regardless of other utility considerations. This is known as the sample size problem in model selection decisions. This article argues that this influence of sample size is not necessarily undesirable. The rationale behind this point of view is described in terms of the relationships among the population covariance matrix and 2 model-based estimates of it. The implications of these relationships for practical use are discussed.

Journal ArticleDOI
TL;DR: Several types of estimators are developed which are unbiased for the population mean or total under stratified adaptive cluster sampling, in which additional units are added to the sample from the neighbourhood of any selected unit whose observed value satisfies a condition of interest.
Abstract: SUMMARY Stratified adaptive cluster sampling refers to designs in which, following an initial stratified sample, additional units are added to the sample from the neighbourhood of any selected unit with an observed value that satisfies a condition of interest. If any of the added units in turn satisfies the condition, still more units are added to the sample. Estimation of the population mean or total with the stratified adaptive cluster designs is complicated by the possibility that a selection in one stratum may result in the addition of units from other strata to the sample, so that observations in separate strata are not independent. Since conventional estimators such as the stratified sample mean are biased with the adaptive designs of this paper, several types of estimators are developed which are unbiased for the population mean or total with stratified adaptive cluster sampling.

Journal ArticleDOI
TL;DR: Using the sample second moment as an estimate of the second moment of the Rice distribution, two techniques, the method of moments and maximum likelihood, are applied in order to estimate the parameter from different sample sizes.
Abstract: This paper deals with the problem of estimating the parameters of the Rice distribution. The distribution has applications in sonar and radar signal processing and a proper estimation procedure with associated confidence intervals is important. Using the sample second moment as an estimate of the second moment of the distribution, two techniques, viz., methods of moments and maximum likelihood are applied to synthetic envelope data of known signal-to-noise ratios, in order to estimate the parameter from different sample sizes. It is concluded that the sample second moment is an unbiased estimate of the theoretical second moment and for the signal-to-noise ratio parameter both methods work without any significant bias and satisfy the criterion of maximum efficiency. However, the method of moments is simpler, easier to apply and therefore recommended as the method of choice.
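The second-moment identity the estimator rests on is E[X²] = ν² + 2σ² for a Rice(ν, σ) envelope. A minimal method-of-moments sketch, with σ treated as known for simplicity (a simplification of the paper's setting; the function names are mine):

```python
import math
import random

def rice_sample(nu, sigma, n, seed=7):
    """Draw n Rice(nu, sigma) envelope values as the magnitude of a
    complex Gaussian with mean nu on the in-phase component."""
    rng = random.Random(seed)
    return [math.hypot(nu + rng.gauss(0, sigma), rng.gauss(0, sigma))
            for _ in range(n)]

def mom_nu(xs, sigma):
    """Method-of-moments estimate of the signal amplitude nu from
    E[X^2] = nu^2 + 2*sigma^2, clipping at zero for small samples."""
    m2 = sum(x * x for x in xs) / len(xs)
    return math.sqrt(max(m2 - 2 * sigma * sigma, 0.0))
```

On synthetic data with known signal-to-noise ratio, the estimate converges to the true amplitude as the sample grows, mirroring the paper's finding that the sample second moment is unbiased for the theoretical one.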

Journal ArticleDOI
TL;DR: This approach serves to unify several ideas in the literature on evaluation of community studies, including use of time-series regression and the question of whether the individual or the community should be the unit of analysis.

Journal ArticleDOI
TL;DR: In this article, a first-order estimate of the systematic sample-size error is used to compare the efficiencies of various computing strategies, and it is found that slow-growth, free-energy perturbation calculations will always have lower errors from this source than window-growth free energy perturbations for the same computing effort.
Abstract: Although the free energy perturbation procedure is exact when an infinite sample of configuration space is used, for finite sample size there is a systematic error resulting in hysteresis for forward and backward simulations. The qualitative behavior of this systematic error is first explored for a Gaussian distribution, then a first-order estimate of the error for any distribution is derived. To first order the error depends only on the fluctuations in the sample of potential energies, {Delta}E, and the sample size, n, but not on the magnitude of {Delta}E. The first-order estimate of the systematic sample-size error is used to compare the efficiencies of various computing strategies. It is found that slow-growth, free energy perturbation calculations will always have lower errors from this source than window-growth, free energy perturbation calculations for the same computing effort. The systematic sample-size errors can be entirely eliminated by going to thermodynamic integration rather than free energy perturbation calculations. When {Delta}E is a very smooth function of the coupling parameter, {lambda}, thermodynamic integration with a relatively small number of windows is the recommended procedure because the time required for equilibration is reduced with a small number of windows. These results give a method of estimating this sample-size hysteresis during the course of a slow-growth, free energy perturbation run. This is important because in these calculations time-lag and sample-size errors can cancel, so that separate methods of estimating and correcting for each are needed. When dynamically modified window procedures are used, it is recommended that the estimated sample-size error be kept constant, not that the magnitude of {Delta}E be kept constant.
Tests on two systems showed a rather small sample-size hysteresis in slow-growth calculations except in the first stages of creating a particle, where both fluctuations and sample-size hysteresis are large.

Journal ArticleDOI
TL;DR: In this paper, an exact expression for Fisher's information matrix, based upon the moment generating function of the distribution of covariates, is calculated for the Poisson regression model, and the resulting asymptotic variance of the maximum likelihood estimate of the parameters is used to calculate the sample size required to test hypotheses about the parameters at a specified significance and power.
Abstract: SUMMARY For the Poisson regression model, an exact expression for Fisher's information matrix, based upon the moment generating function of the distribution of covariates, is calculated. This parallels a similar, approximate, calculation by Whittemore (1981) for logistic regression. The resulting asymptotic variance of the maximum likelihood estimate of the parameters is used to calculate the sample size required to test hypotheses about the parameters at a specified significance and power. Methods for calculating sample size are derived for various distributions of a single covariate, and for a family of multivariate exponential-type distributions of multiple covariates. The procedures are illustrated with two examples.

Journal ArticleDOI
TL;DR: In this paper, a simple method is described for estimating the sample size per group required for specified power to detect a linear contrast among J group means, which can also be used to find sample size for a complex contrast in a nonfactorial design.
Abstract: A simple method is described for estimating the sample size per group required for specified power to detect a linear contrast among J group means. This allows comparison of sample sizes to detect main effects with those needed to detect several realistic kinds of interaction in 2 × 2 and 2 × 2 × 2 designs with a fixed-effects model. For example, when 2 factors are multiplicative, the sample size required to detect the presence of nonadditivity is 7 to 9 times as large as that needed to detect main effects with the same degree of power. In certain other situations, effect sizes for the main effects and interaction may be identical, in which case power and necessary sample sizes to detect the effects will be the same. The method can also be used to find sample size for a complex contrast in a nonfactorial design.
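A per-group sample size for a linear contrast can be sketched with the standard normal-approximation formula (this is a textbook approximation in the spirit of the abstract, not the paper's exact tables; the function name is mine):

```python
import math

def n_per_group_contrast(c, means, sigma):
    """Per-group n to detect the contrast psi = sum(c_j * mu_j) among
    group means with common SD sigma, two-sided alpha = 0.05, power 0.80:
        n = (z_.975 + z_.80)^2 * sigma^2 * sum(c_j^2) / psi^2
    """
    z_a = 1.959963984540054   # Phi^{-1}(0.975)
    z_b = 0.8416212335729143  # Phi^{-1}(0.80)
    psi = sum(cj * mj for cj, mj in zip(c, means))
    ssq = sum(cj * cj for cj in c)
    return math.ceil((z_a + z_b) ** 2 * sigma ** 2 * ssq / (psi * psi))
```

For the two-group contrast c = (1, −1) with a standardized mean difference of 0.5, this returns the familiar 63 per group; halving the contrast to 0.25 quadruples the requirement, which is the mechanism behind the much larger samples needed to detect interactions.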

Book
28 Jun 1991
TL;DR: The Compatibility of the Clinical and Epidemiologic Approaches and Categorical Data Analysis are examined, as well as Statistical Inference for Continuous Variables and Nonparametric Tests of Two Means, which help clarify the aims and objectives of the study.
Abstract: I Epidemiologic Research Design.- 1: Introduction.- 1.1 The Compatibility of the Clinical and Epidemiologic Approaches.- 1.2 Clinical Epidemiology: Main Areas of Interest.- 1.3 Historical Roots.- 1.4 Current and Future Relevance: Controversial Questions and Unproven Hypotheses.- 2: Measurement.- 2.1 Types of Variables and Measurement Scales.- 2.2 Sources of Variation in a Measurement.- 2.3 Properties of Measurement.- 2.4 "Hard" vs "Soft" Data.- 2.5 Consequences of Erroneous Measurement.- 2.6 Sources of Data.- 3: Rates.- 3.1 What is a Rate?.- 3.2 Prevalence and Incidence Rates.- 3.3 Stratification and Adjustment of Rates.- 3.4 Concluding Remarks.- 4: Epidemiologic Research Design: an Overview.- 4.1 The Research Objective: Descriptive vs Analytic Studies.- 4.2 Exposure and Outcome.- 4.3 The Three Axes of Epidemiologic Research Design.- 4.4 Concluding Remarks.- 5: Analytic Bias.- 5.1 Validity and Reproducibility of Exposure-Outcome Associations.- 5.2 Internal and External Validity.- 5.3 Sample Distortion Bias.- 5.4 Information Bias.- 5.5 Confounding Bias.- 5.6 Reverse Causality ("Cart-vs-Horse") Bias.- 5.7 Concluding Remarks.- 6: Observational Cohort Studies.- 6.1 Research Design Components.- 6.2 Analysis of Results.- 6.3 Bias Assessment and Control.- 6.4 Effect Modification and Synergism.- 6.5 Advantages and Disadvantages of Cohort Studies.- 7: Clinical Trials.- 7.1 Research Design Components.- 7.2 Assignment of Exposure (Treatment).- 7.3 Blinding in Clinical Trials.- 7.4 Analysis of Results.- 7.5 Interpretation of Results.- 7.6 Ethical Considerations.- 7.7 Advantages and Disadvantages of Clinical Trials.- 8: Case-Control Studies.- 8.1 Introduction.- 8.2 Research Design Components.- 8.3 Analysis of Results.- 8.4 Bias Assessment and Control.- 8.5 Advantages and Disadvantages of Case-Control Studies.- 9: Cross-Sectional Studies.- 9.1 Introduction.- 9.2 Research Design Components.- 9.3 Analysis of Results.- 9.4 Bias Assessment and Control.- 9.5 "Pseudo-Cohort" 
Cross-Sectional Studies.- 9.6 Advantages, Disadvantages, and Uses of Cross-Sectional Studies.- II Biostatistics.- 10: Introduction to Statistics.- 10.1 Variables.- 10.2 Populations, Samples, and Sampling Variation.- 10.3 Description vs Statistical Inference.- 10.4 Statistical vs Analytic Inference.- 11: Descriptive Statistics and Data Display.- 11.1 Continuous Variables.- 11.2 Categorical Variables.- 11.3 Concluding Remarks.- 12: Hypothesis Testing and P Values.- 12.1 Formulating and Testing a Research Hypothesis.- 12.2 The Testing of Ho.- 12.3 Type II Error and Statistical Power.- 12.4 Bayesian vs Frequentist Inference.- 13: Statistical Inference for Continuous Variables.- 13.1 Repetitive Sampling and the Central Limit Theorem.- 13.2 Statistical Inferences Using the t-Distribution.- 13.3 Calculation of Sample Sizes.- 13.4 Nonparametric Tests of Two Means.- 13.5 Comparing Three or More Means: Analysis of Variance.- 13.6 Control for Confounding Factors.- 14: Statistical Inference for Categorical Variables.- 14.1 Introduction to Categorical Data Analysis.- 14.2 Comparing Two Proportions.- 14.3 Statistical Inferences for a Single Proportion.- 14.4 Comparison of Three or More Proportions.- 14.5 Analysis of Larger (r x c) Contingency Tables.- 15: Linear Correlation and Regression.- 15.1 Linear Correlation.- 15.2 Linear Regression.- 15.3 Correlation vs Regression.- 15.4 Statistical Inference.- 15.5 Control for Confounding Factors.- 15.6 Rank (Nonparametric) Correlation.- III Special Topics.- 16: Diagnostic Tests.- 16.1 Introduction.- 16.2 Defining "Normal" and "Abnormal" Test Results.- 16.3 The Reproducibility and Validity of Diagnostic Tests.- 16.4 The Predictive Value of Diagnostic Tests.- 16.5 Bayes' Theorem.- 16.6 The Uses of Diagnostic Tests.- 17: Decision Analysis.- 17.1 Strategies for Decision-Making.- 17.2 Constructing a Decision Tree.- 17.3 Probabilities and Utilities.- 17.4 Completing the Analysis.- 17.5 Cost-Benefit Analysis.- 17.6 Cost-Effectiveness Analysis.- 18: Life-Table (Survival) Analysis.- 18.1 Introduction.- 18.2 Alternative Methods of Analysis: an Example.- 18.3 The Actuarial Method.- 18.4 The Kaplan-Meier (Product-Limit) Method.- 18.5 Statistical Inference.- 19: Causality.- 19.1 What is a "Cause"?.- 19.2 Necessary,
Sufficient, and Multiple Causes.- 19.3 Patterns of Cause.- 19.4 Probability and Uncertainty.- 19.5 Can Exposure Cause Outcome?.- 19.6 Is Exposure an Important Cause of Outcome?.- 19.7 Did Exposure Cause Outcome in a Specific Case?.- Appendix Tables.

Journal ArticleDOI
TL;DR: In this article, the dominant Lyapunov exponent is estimated from time-series data using nonparametric regression, and the estimate is shown to converge to the true value as the sample size increases.
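As a concrete illustration of the convergence claim, the sketch below estimates the dominant Lyapunov exponent of a simulated logistic-map series. It uses a simple nearest-neighbour divergence average rather than the article's nonparametric-regression estimator, and the series length and starting value are illustrative choices; for the fully chaotic logistic map the true exponent is ln 2 ≈ 0.693.

```python
import math

def logistic_series(n, r=4.0, x0=0.3):
    """Simulate n points of the logistic map x -> r*x*(1-x)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def lyapunov_nn(xs):
    """Estimate the dominant Lyapunov exponent of a scalar series by
    averaging the one-step log-divergence rate of nearest neighbours."""
    n = len(xs)
    logs = []
    for i in range(n - 1):
        # nearest neighbour of xs[i] among points that still have a successor
        j = min((k for k in range(n - 1) if k != i),
                key=lambda k: abs(xs[k] - xs[i]))
        d0 = abs(xs[j] - xs[i])
        d1 = abs(xs[j + 1] - xs[i + 1])
        if d0 > 0 and d1 > 0:
            logs.append(math.log(d1 / d0))
    return sum(logs) / len(logs)

series = logistic_series(1000)
print(lyapunov_nn(series))  # tends toward ln 2 ~ 0.693 as the series grows
```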

Journal ArticleDOI
TL;DR: A common problem in multivariate applications is the comparison of pattern matrices obtained from two independent studies; in the simulations reported here, an increase in either saturation or sample size resulted in more accurate index values.
Abstract: A common problem in multivariate applications is the comparison of pattern matrices obtained from two independent studies. We compared the performance of four pattern matching indices (the coefficient of congruence [c], the s-statistic [s], Pearson's r [r], and kappa [k]) under a variety of experimental conditions. We constructed population pattern matrices by varying (a) saturation or the size of the loadings, (b) sample size, (c) the number of observed variables, and (d) the number of derived variables. Sample patterns were computer generated and matched, employing each index, to their population pattern. With the exception of r, little difference in matching performance between indices was observed. In general, an increase in either saturation or sample size resulted in more accurate index values.
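For readers unfamiliar with these indices, here is a minimal, self-contained sketch (not from the article; the loading values below are hypothetical) of two of the four matching indices, the coefficient of congruence and Pearson's r, applied to a pair of loading columns:

```python
import math

def congruence(x, y):
    """Coefficient of congruence between two loading vectors:
    like Pearson's r, but computed about zero rather than the means."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

def pearson_r(x, y):
    """Pearson's r, written as congruence of the mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return congruence([a - mx for a in x], [b - my for b in y])

# Hypothetical loadings: a "population" column and a noisy sample column.
pop    = [0.80, 0.75, 0.70, 0.10, 0.05]
sample = [0.78, 0.72, 0.74, 0.15, 0.02]
print(congruence(pop, sample), pearson_r(pop, sample))  # both close to 1
```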

Journal ArticleDOI
TL;DR: The information matrix (IM) test is shown in this paper to have a finite sample distribution which is poorly approximated by its asymptotic χ² distribution in models and sample sizes commonly encountered in applied econometric research.
Abstract: The information matrix (IM) test is shown to have a finite sample distribution which is poorly approximated by its asymptotic χ² distribution in models and sample sizes commonly encountered in applied econometric research. The quality of the χ² approximation depends upon the method chosen to compute the test. Failure to exploit restrictions on the covariance matrix of the test can lead to a test with appalling finite sample properties. Order O(n^-1) approximations to the exact distribution of an efficient form of the IM test are reported. These are developed from asymptotic expansions of the Edgeworth and Cornish-Fisher types. They are compared with Monte Carlo estimates of the finite sample distribution of the test and are found to be superior to the usual χ² approximations in sample sizes of the magnitude found in applied micro-econometric work. The methods developed in the paper are applied to normal and exponential models and to normal regression models. Results are provided for the full IM test and for heteroskedasticity and nonnormality diagnostic tests which are special cases of the IM test. In general the quality of alternative approximations is sensitive to covariate design. However, commonly used nonnormality tests are found to have distributions which, to order O(n^-1), are invariant under changes in covariate design. This leads to simple design- and parameter-invariant size corrections for nonnormality tests.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the consistency property of some test statistics based on a time series of data and provide Monte Carlo evidence on the power, in finite samples, of the tests studied, allowing various combinations of span and sampling frequencies.
Abstract: This paper considers the consistency property of some test statistics based on a time series of data. While the usual consistency criterion is based on keeping the sampling interval fixed, we let the sampling interval take any equispaced path as the sample size increases to infinity. We consider tests of the null hypotheses of the random walk and randomness against positive autocorrelation (stationary or explosive). We show that tests of the unit root hypothesis based on the first-order correlation coefficient of the original data are consistent as long as the span of the data is increasing. Tests of the same hypothesis based on the first-order correlation coefficient of the first-differenced data are consistent against stationary alternatives only if the span is increasing at a rate greater than T^(1/2), where T is the sample size. On the other hand, tests of the randomness hypothesis based on the first-order correlation coefficient applied to the original data are consistent as long as the span is not increasing too fast. We provide Monte Carlo evidence on the power, in finite samples, of the tests studied, allowing various combinations of span and sampling frequencies. It is found that the consistency properties summarize well the behavior of the power in finite samples. The power of tests for a unit root is more influenced by the span than by the number of observations, while tests of randomness are more powerful when a small sampling frequency is available.
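The span-versus-frequency distinction can be illustrated with a small simulation. The sketch below is not the authors' Monte Carlo design: it simulates one finely observed AR(1) path with ρ = 0.99 (a stationary alternative close to a unit root) and compares the first-order correlation coefficient from 200 observations over a short span against 200 observations over a span ten times longer. Subsampling every tenth point lowers the implied correlation toward 0.99^10 ≈ 0.90, making departure from a unit root easier to detect with the same number of observations.

```python
import random

def ar1_path(n, rho, seed=1):
    """Simulate n steps of an AR(1) process with standard normal shocks."""
    random.seed(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + random.gauss(0.0, 1.0)
        out.append(x)
    return out

def first_order_corr(xs):
    """Sample first-order correlation coefficient of a series."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in xs)
    return num / den

T = 200                            # observations in both designs
fine = ar1_path(10 * T, rho=0.99)  # stationary but close to a unit root
short_span = fine[:T]              # sampling interval 1, span T
long_span  = fine[::10]            # sampling interval 10, span 10T
print(first_order_corr(short_span))  # near 1: hard to tell from a random walk
print(first_order_corr(long_span))   # near 0.99**10, farther from 1
```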

Journal Article
Neil Risch
TL;DR: It is shown that testing multiple markers decreases the posterior false-positive rate among significant tests, rather than increasing it; this is true whether the trait of interest is simply monogenic or complex, or even if the genetic model is misspecified.
Abstract: Controversy over the impact of multiple testing procedures in linkage analysis is reexamined in this report. Despite some recent claims to the contrary, it is shown that testing multiple markers decreases the posterior false-positive rate among significant tests, rather than increasing it; this is true whether the trait of interest is simply monogenic or complex, or even if the genetic model is misspecified. However, if the true mode of inheritance is complex, or if the genetic model is misspecified, the power to obtain a significant result when linkage is present may be reduced, while the significance level is not, leading to an inflation of the posterior false-positive rate. Furthermore, the posterior false-positive rate increases with decreasing sample size and may be unacceptably high for very small samples. By contrast, testing multiple genetic models, by varying either mode-of-inheritance parameters or diagnostic categories, does lead to an inflation of the posterior false-positive rate. A conservative correction for this case is to subtract log10(t) from the obtained maximum lod score, where t different genetic and/or diagnostic models have been tested.
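The conservative correction in the last sentence is straightforward to apply; the helper below is an illustrative wrapper, not code from the paper:

```python
import math

def corrected_max_lod(max_lod, n_models):
    """Conservative multiple-model correction described above: subtract
    log10(t) from the maximum lod score when t genetic and/or diagnostic
    models were tested."""
    return max_lod - math.log10(n_models)

# e.g. a maximum lod of 3.2 after trying 4 model/diagnosis combinations
print(corrected_max_lod(3.2, 4))  # 3.2 - log10(4), about 2.6
```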

Journal ArticleDOI
TL;DR: In this article, sequential and group sequential procedures are proposed for monitoring repeated t, χ², or F statistics. These procedures can be used to construct sequential hypothesis tests or repeated confidence intervals when the parameter of interest is a normal mean with unknown variance or a multivariate normal mean with variance matrix known or known up to a scale factor.
Abstract: Sequential and group sequential procedures are proposed for monitoring repeated t, χ², or F statistics. These can be used to construct sequential hypothesis tests or repeated confidence intervals when the parameter of interest is a normal mean with unknown variance or a multivariate normal mean with variance matrix known or known up to a scale factor. Exact methods for calculating error probabilities and sample size distributions are described, and tables of critical values needed to implement the procedures are provided.

Journal ArticleDOI
TL;DR: In this paper, the minimum sample sizes required for several hypotheses arising from repeated-measures designs are presented, and the autoregressive model and the assumption of compound symmetry are compared and contrasted throughout.
Abstract: Vonesh and Schork (1986, Biometrics 42, 601-610) presented a statistical methodology for computing the minimum sample size required for the within-subjects repeated-measures design. They applied Hotelling's T2 analysis and demonstrated the utility of these techniques under general covariance structures. In this paper, we extend these procedures to the between-subjects repeated-measures design when there are two treatment groups under consideration. The multivariate analysis of variance approach to analyzing repeated measurements is considered and this model also resolves to Hotelling's T2 analysis. Two models are put forward to contend with those situations where the correlation structure among the repeated measures is unknown. These include the autoregressive model and the assumption of compound symmetry, and they are compared and contrasted throughout. Tables of the minimum sample sizes required for several hypotheses arising from repeated-measures designs are presented.
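The two working correlation structures contrasted in the paper are easy to write down explicitly. The sketch below (illustrative, not from the paper) constructs the compound-symmetry and first-order autoregressive covariance matrices for p repeated measures:

```python
def compound_symmetry(p, sigma2, rho):
    """p x p covariance: equal variances, and the same correlation rho
    between any two repeated measures regardless of their spacing."""
    return [[sigma2 * (1.0 if i == j else rho) for j in range(p)]
            for i in range(p)]

def ar1(p, sigma2, rho):
    """p x p covariance: correlation decays as rho**|i-j| with the lag
    between measurement occasions."""
    return [[sigma2 * rho ** abs(i - j) for j in range(p)]
            for i in range(p)]

for row in compound_symmetry(4, 1.0, 0.5):
    print(row)
for row in ar1(4, 1.0, 0.5):
    print(row)
```

Under compound symmetry, every pair of occasions is equally correlated; under AR(1), measures taken farther apart are less correlated, which generally changes the sample size needed to detect the same group difference.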

Journal ArticleDOI
TL;DR: In this paper, Monte Carlo sampling is used to estimate the probability that a Weibull random variable Y is less than another independent Weibull variable X for the case where both X and Y have the same shape parameter.
Abstract: The probability that a Weibull random variable Y is less than another independent Weibull random variable X is considered for the case where both X and Y have the same, but unknown, shape parameter. Tables, developed by Monte Carlo sampling, are presented whereby 90% confidence limits for this probability may be found in terms of its maximum likelihood estimate for 21 combinations of the size of the sample taken from each of the two populations and the ordered observation number at which the samples are type II censored. A normal approximation is also discussed and its accuracy vis-à-vis the exact values is examined as a function of sample size for a particular case.
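When the common shape parameter is known, P(Y < X) has a simple closed form, since the substitution t = y^k turns both variables into exponentials. The sketch below (illustrative, not from the paper, which treats the harder problem of confidence limits under type II censoring) checks that closed form against uncensored Monte Carlo sampling:

```python
import random

def p_y_less_x_closed_form(scale_x, scale_y, shape):
    """P(Y < X) for independent Weibulls sharing shape parameter k:
    the substitution t = y**k reduces both to exponentials, giving
    scale_x**k / (scale_x**k + scale_y**k)."""
    return scale_x ** shape / (scale_x ** shape + scale_y ** shape)

def p_y_less_x_monte_carlo(scale_x, scale_y, shape, n=100_000, seed=7):
    """Crude check by direct simulation; random.weibullvariate takes the
    scale first, then the shape."""
    random.seed(seed)
    hits = sum(random.weibullvariate(scale_y, shape)
               < random.weibullvariate(scale_x, shape)
               for _ in range(n))
    return hits / n

print(p_y_less_x_closed_form(2.0, 1.0, 1.5))   # about 0.739
print(p_y_less_x_monte_carlo(2.0, 1.0, 1.5))   # should agree to ~2 decimals
```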

Journal ArticleDOI
TL;DR: In this article, an analytic solution for the theoretical distribution of optimal values for univariate optimal linear discriminant analysis, under the assumption that the data are random and continuous, is presented.
Abstract: Optimal linear discriminant models maximize percentage accuracy for dichotomous classifications, but are rarely used because a theoretical framework that allows one to make valid statements about the statistical significance of the outcomes of such analyses does not exist. This paper describes an analytic solution for the theoretical distribution of optimal values for univariate optimal linear discriminant analysis, under the assumption that the data are random and continuous. We also present the theoretical distribution for sample sizes up to N = 30. The discovery of a statistical framework for evaluating the performance of optimal discriminant models should greatly increase their use by scientists in all disciplines.
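In the univariate case, the optimal linear discriminant reduces to choosing a single cutpoint. The brute-force sketch below (illustrative; not the authors' analytic solution, and the scores are hypothetical) computes the maximized classification accuracy that such an analysis optimizes:

```python
def optimal_cutpoint_accuracy(xs, labels):
    """Scan every observed value as a candidate cutpoint, in both
    directions, and return the maximum attainable classification
    accuracy for a dichotomous outcome."""
    n = len(xs)
    best = 0.0
    for c in set(xs):
        correct = sum((x >= c) == (y == 1) for x, y in zip(xs, labels))
        # the reversed rule assigns class 1 below the cutpoint instead
        best = max(best, correct / n, (n - correct) / n)
    return best

# hypothetical scores for two groups (labels 0 and 1)
xs     = [1.0, 1.5, 2.0, 2.2, 3.0, 3.5, 4.0, 4.5]
labels = [0,   0,   1,   0,   1,   1,   1,   1]
print(optimal_cutpoint_accuracy(xs, labels))  # 0.875: one overlapping case is always misclassified
```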