
Showing papers on "Statistical hypothesis testing published in 1999"


Journal ArticleDOI
TL;DR: In this article, a non-standard asymptotic theory of inference is developed which allows construction of confidence intervals and testing of hypotheses, and the methods are applied to a 15-year sample of 565 US firms to test whether financial constraints affect investment decisions.

3,019 citations


Journal ArticleDOI
TL;DR: In this article, a goodness-of-fit process for quantile regression analogous to the conventional R2 statistic of least squares regression is introduced, and several related inference processes designed to test composite hypotheses about the combined effect of several covariates over an entire range of conditional quantile functions are also formulated.
Abstract: We introduce a goodness-of-fit process for quantile regression analogous to the conventional R2 statistic of least squares regression. Several related inference processes designed to test composite hypotheses about the combined effect of several covariates over an entire range of conditional quantile functions are also formulated. The asymptotic behavior of the inference processes is shown to be closely related to earlier p-sample goodness-of-fit theory involving Bessel processes. The approach is illustrated with some hypothetical examples, an application to recent empirical models of international economic growth, and some Monte Carlo evidence.

1,243 citations
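The goodness-of-fit criterion described above, one minus the ratio of the fitted model's check-loss to that of an intercept-only model (whose optimal intercept is the tau-th sample quantile), can be sketched directly. The fitted values below are hypothetical stand-ins, not an actual quantile-regression fit.

```python
def check_loss(residuals, tau):
    """Koenker's check (pinball) loss summed over residuals."""
    return sum(r * (tau - (r < 0)) for r in residuals)

def sample_quantile(y, tau):
    """tau-th sample quantile (lower empirical quantile)."""
    ys = sorted(y)
    return ys[min(int(tau * len(ys)), len(ys) - 1)]

def r1_goodness_of_fit(y, fitted, tau):
    """R1(tau) = 1 - V_hat / V_tilde: check-loss of the fitted model
    relative to the check-loss of an intercept-only (quantile) model."""
    v_hat = check_loss([yi - fi for yi, fi in zip(y, fitted)], tau)
    q = sample_quantile(y, tau)
    v_tilde = check_loss([yi - q for yi in y], tau)
    return 1.0 - v_hat / v_tilde

y = [1.0, 2.0, 3.0, 4.0, 10.0]
fitted = [1.1, 1.9, 3.2, 4.1, 9.5]   # hypothetical model fit
print(round(r1_goodness_of_fit(y, fitted, tau=0.5), 3))
```

As with R2, values near 1 indicate that the covariates explain most of the conditional quantile, and the criterion can be tracked across a range of tau values.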


Journal ArticleDOI
TL;DR: The historical and logical foundations of the dominant school of medical statistics, sometimes referred to as frequentist statistics, are explored, and the logical fallacy at the heart of this system, which maintains such a tenacious hold on the minds of investigators, policymakers, and journal editors, is explicated.
Abstract: An important problem exists in the interpretation of modern medical research data: Biological understanding and previous research play little formal role in the interpretation of quantitative results. This phenomenon is manifest in the discussion sections of research articles and ultimately can affect the reliability of conclusions. The standard statistical approach has created this situation by promoting the illusion that conclusions can be produced with certain "error rates," without consideration of information from outside the experiment. This statistical approach, the key components of which are P values and hypothesis tests, is widely perceived as a mathematically coherent approach to inference. There is little appreciation in the medical community that the methodology is an amalgam of incompatible elements, whose utility for scientific inference has been the subject of intense debate among statisticians for almost 70 years. This article introduces some of the key elements of that debate and traces the appeal and adverse impact of this methodology to the P value fallacy, the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result. This argument is made as a prelude to the suggestion that another measure of evidence should be used--the Bayes factor, which properly separates issues of long-run behavior from evidential strength and allows the integration of background knowledge with statistical findings.

1,123 citations
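The Bayes factor the article recommends can be illustrated with the well-known minimum Bayes factor bound exp(-z^2/2) for a two-sided p-value, a standard result in this literature. This is only a bound on the evidence, sketched here, not the article's full proposal.

```python
from math import exp
from statistics import NormalDist

def minimum_bayes_factor(p_value):
    """Minimum Bayes factor bound exp(-z^2 / 2) for a two-sided p-value:
    the strongest evidence against the null hypothesis that the observed
    z-statistic can provide under any normal-mean alternative."""
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return exp(-z * z / 2)

# p = 0.05 corresponds to a minimum Bayes factor of roughly 0.15: the null
# is, at best, about 1/7 as well supported as the alternative, which is far
# weaker evidence than the "1 in 20" intuition the p-value suggests.
print(round(minimum_bayes_factor(0.05), 2))  # → 0.15
```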


Journal ArticleDOI
TL;DR: In this article, the authors describe how statistical hypothesis tests are often viewed, contrast that interpretation with the correct one, and discuss the arbitrariness of P-values, conclusions that the null hypothesis is true, power analysis, and distinctions between statistical and biological significance.
Abstract: Despite their wide use in scientific journals such as The Journal of Wildlife Management, statistical hypothesis tests add very little value to the products of research. Indeed, they frequently confuse the interpretation of data. This paper describes how statistical hypothesis tests are often viewed, and then contrasts that interpretation with the correct one. I discuss the arbitrariness of P-values, conclusions that the null hypothesis is true, power analysis, and distinctions between statistical and biological significance. Statistical hypothesis testing, in which the null hypothesis about the properties of a population is almost always known a priori to be false, is contrasted with scientific hypothesis testing, which examines a credible null hypothesis about phenomena in nature. More meaningful alternatives are briefly outlined, including estimation and confidence intervals for determining the importance of factors, decision theory for guiding actions in the face of uncertainty, and Bayesian approaches to hypothesis testing and other statistical practices.

1,041 citations


Journal ArticleDOI
TL;DR: Almost entirely automated procedures for estimation of global, voxel, and cluster-level statistics to test the null hypothesis of zero neuroanatomical difference between two groups of structural magnetic resonance imaging (MRI) data are described.
Abstract: The authors describe almost entirely automated procedures for estimation of global, voxel, and cluster-level statistics to test the null hypothesis of zero neuroanatomical difference between two groups of structural magnetic resonance imaging (MRI) data. Theoretical distributions under the null hypothesis are available for (1) global tissue class volumes; (2) standardized linear model [analysis of variance (ANOVA and ANCOVA)] coefficients estimated at each voxel; and (3) an area of spatially connected clusters generated by applying an arbitrary threshold to a two-dimensional (2-D) map of normal statistics at voxel level. The authors describe novel methods for economically ascertaining probability distributions under the null hypothesis, with fewer assumptions, by permutation of the observed data. Nominal Type I error control by permutation testing is generally excellent; whereas theoretical distributions may be over conservative. Permutation has the additional advantage that it can be used to test any statistic of interest, such as the sum of suprathreshold voxel statistics in a cluster (or cluster mass), regardless of its theoretical tractability under the null hypothesis. These issues are illustrated by application to MRI data acquired from 18 adolescents with hyperkinetic disorder and 16 control subjects matched for age and gender.

1,036 citations
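A permutation test of a group difference, of the kind used above for voxel and cluster statistics, can be sketched in miniature. Here the permutation distribution is enumerated exhaustively for a toy two-group comparison on hypothetical data, rather than sampled by Monte Carlo as one would for imaging statistics.

```python
from itertools import combinations

def exact_permutation_pvalue(group_a, group_b):
    """Exact two-sample permutation test on the difference in means:
    enumerate every reassignment of the pooled observations to the two
    groups and report the fraction whose absolute mean difference is at
    least as large as the observed one."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    count = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        total += 1
        if diff >= observed - 1e-12:   # tolerate float rounding
            count += 1
    return count / total

print(exact_permutation_pvalue([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]))  # → 0.1
```

With 3 + 3 observations there are only 20 relabelings, so the exact p-value is available; with realistic sample sizes one samples relabelings at random, which is the approach the paper exploits for arbitrary statistics such as cluster mass.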


Journal ArticleDOI
29 Nov 1999
TL;DR: This work performs a theoretical investigation of the variance of a variant of the cross-validation estimator of the generalization error that takes into account the variability due to the randomness of the training set as well as test examples and proposes new estimators of this variance.
Abstract: In order to compare learning algorithms, experimental results reported in the machine learning literature often use statistical tests of significance to support the claim that a new learning algorithm generalizes better. Such tests should take into account the variability due to the choice of training set and not only that due to the test examples, as is often the case. This could lead to gross underestimation of the variance of the cross-validation estimator, and to the wrong conclusion that the new algorithm is significantly better when it is not. We perform a theoretical investigation of the variance of a variant of the cross-validation estimator of the generalization error that takes into account the variability due to the randomness of the training set as well as test examples. Our analysis shows that all the variance estimators that are based only on the results of the cross-validation experiment must be biased. This analysis allows us to propose new estimators of this variance. We show, via simulations, that tests of hypothesis about the generalization error using those new variance estimators have better properties than tests involving variance estimators currently in use and listed in Dietterich (1998). In particular, the new tests have correct size and good power. That is, the new tests do not reject the null hypothesis too often when the hypothesis is true, but they tend to frequently reject the null hypothesis when the latter is false.

925 citations
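A corrected variance estimator in the spirit the abstract describes can be sketched as follows. The (1/J + n_test/n_train) inflation shown here is the widely cited correction from this line of work by Nadeau and Bengio; it is used illustratively and is not claimed to be the paper's exact estimator.

```python
from math import sqrt
from statistics import mean, variance

def corrected_resampled_t(diffs, n_train, n_test):
    """Corrected resampled t-statistic: inflate the naive variance of the
    J per-split performance differences by n_test / n_train to account for
    the overlap of training sets across random splits."""
    j = len(diffs)
    d_bar = mean(diffs)
    var_d = variance(diffs)   # sample variance of the J differences
    se = sqrt((1.0 / j + n_test / n_train) * var_d)
    return d_bar / se

# Hypothetical accuracy differences between two learners over 10 random
# 90/10 train/test splits of the same data set.
diffs = [0.02, 0.03, 0.01, 0.04, 0.02, 0.03, 0.02, 0.01, 0.03, 0.02]
print(round(corrected_resampled_t(diffs, n_train=90, n_test=10), 2))
```

The naive variance estimate would divide var_d by J alone; the extra n_test/n_train term is what keeps the test from rejecting the null hypothesis too often when the two algorithms in fact generalize equally well.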


Book
01 Jan 1999
TL;DR: In this book, eight steps to successful data analysis are laid out, along with guidance on choosing a test, hypothesis testing, sampling, experimental design, statistics, variables and distributions, and data exploration.
Abstract: Preface
1 Eight steps to successful data analysis
2 The basics
3 Choosing a test: a key
4 Hypothesis testing, sampling and experimental design
5 Statistics, variables and distributions
6 Descriptive and presentational techniques
7 The tests 1: tests to look at differences
8 The tests 2: tests to look at relationships
9 The tests 3: tests for data exploration
10 Symbols and letters used in statistics
11 Assumptions of the tests
12 Hints and tips
Glossary
Bibliography and short reviews of selected texts
Index

856 citations


Journal ArticleDOI
TL;DR: In this paper, a new false discovery rate controlling procedure is proposed for multiple hypotheses testing, which makes use of resampling-based p-value adjustment, and is designed to cope with correlated test statistics.

681 citations
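For context, the baseline Benjamini-Hochberg step-up procedure that false discovery rate controlling procedures of this kind extend can be sketched as follows. This is the classic procedure for independent or positively dependent p-values, not the paper's resampling-based adjustment for correlated test statistics.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject the hypotheses with
    the k smallest p-values, where k is the largest rank i such that
    p_(i) <= (i / m) * q.  Returns the indices of rejected hypotheses."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k = rank
    return sorted(order[:k])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, q=0.05))  # → [0, 1]
```

The step-up character matters: a hypothesis can be rejected even though its own p-value exceeds its per-comparison threshold, provided some larger rank satisfies the inequality.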


Journal ArticleDOI
TL;DR: This article studies the small sample behavior of several test statistics that are based on the maximum likelihood estimator, but are designed to perform better with nonnormal data.
Abstract: Structural equation modeling is a well-known technique for studying relationships among multivariate data. In practice, high dimensional nonnormal data with small to medium sample sizes are very common, and large sample theory, on which almost all modeling statistics are based, cannot be invoked for model evaluation with test statistics. The most natural method for nonnormal data, the asymptotically distribution free procedure, is not defined when the sample size is less than the number of nonduplicated elements in the sample covariance. Since normal theory maximum likelihood estimation remains defined for intermediate to small sample size, it may be invoked but with the probable consequence of distorted performance in model evaluation. This article studies the small sample behavior of several test statistics that are based on the maximum likelihood estimator, but are designed to perform better with nonnormal data. We aim to identify statistics that work reasonably well for a range of small sample sizes and distribution conditions. Monte Carlo results indicate that Yuan and Bentler's recently proposed F-statistic performs satisfactorily.

553 citations


Book
16 Sep 1999
TL;DR: What is Bootstrapping?
Abstract: What is Bootstrapping? / Estimation / Confidence Sets and Hypothesis Testing / Regression Analysis / Forecasting and Time Series Analysis / Which Resampling Method Should You Use? / Efficient and Effective Simulation / Special Topics / When Does Bootstrapping Fail? / Bibliography / Indexes

450 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of testing for linearity and the number of regimes in the context of self-exciting threshold autoregressive (SETAR) models is reviewed.
Abstract: The problem of testing for linearity and the number of regimes in the context of self-exciting threshold autoregressive (SETAR) models is reviewed. We describe least-squares methods of estimation and inference. The primary complication is that the testing problem is non-standard, due to the presence of parameters which are only defined under the alternative, so the asymptotic distribution of the test statistics is non-standard. Simulation methods to calculate asymptotic and bootstrap distributions are presented. As the sampling distributions are quite sensitive to conditional heteroskedasticity in the error, careful modeling of the conditional variance is necessary for accurate inference on the conditional mean. We illustrate these methods with two applications--annual sunspot means and monthly U.S. industrial production. We find that annual sunspots and monthly industrial production are SETAR(2) processes. Copyright 1999 by Blackwell Publishers Ltd

Journal ArticleDOI
TL;DR: Intensive sampling of the individual's home range and habitat use during the time frame of the study leads to improved estimates for the individual, but using location estimates as the sample unit to compare across animals is pseudoreplication, so the authors recommend against habitat selection analysis techniques that use locations instead of individuals as the sample unit.
Abstract: The wildlife literature has been contradictory about the importance of autocorrelation in radiotracking data used for home range estimation and hypothesis tests of habitat selection. By definition, the concept of a home range involves autocorrelated movements, but estimates or hypothesis tests based on sampling designs that predefine a time frame of interest, and that generate representative samples of an animal's movement during this time frame, should not be affected by length of the sampling interval and autocorrelation. Intensive sampling of the individual's home range and habitat use during the time frame of the study leads to improved estimates for the individual, but use of location estimates as the sample unit to compare across animals is pseudoreplication. We therefore recommend against use of habitat selection analysis techniques that use locations instead of individuals as the sample unit. We offer a general outline for sampling designs for radiotracking studies.

Journal ArticleDOI
TL;DR: Several possible hypothesis test methods are evaluated: the paired t test, the nonparametric Wilcoxon signed-rank test, and two resampling tests; the results indicate that the more involved resampling test methodology is the most appropriate when testing threat scores from nonprobabilistic forecasts.
Abstract: When evaluating differences between competing precipitation forecasts, formal hypothesis testing is rarely performed. This may be due to the difficulty in applying common tests given the spatial correlation of and non-normality of errors. Possible ways around these difficulties are explored here. Two datasets of precipitation forecasts are evaluated, a set of two competing gridded precipitation forecasts from operational weather prediction models and sets of competing probabilistic quantitative precipitation forecasts from model output statistics and from an ensemble of forecasts. For each test, data from each competing forecast are collected into one sample for each case day to avoid problems with spatial correlation. Next, several possible hypothesis test methods are evaluated: the paired t test, the nonparametric Wilcoxon signed-rank test, and two resampling tests. The more involved resampling test methodology is the most appropriate when testing threat scores from nonprobabilistic forecasts. ...

Journal ArticleDOI
TL;DR: The results show that the asymptotic DerSimonian and Laird Q statistic and the bootstrap versions of the other tests give the correct type I error under the null hypothesis but that all of the tests considered have low statistical power, especially when the number of studies included in the meta-analysis is small.
Abstract: The identification of heterogeneity in effects between studies is a key issue in meta-analyses of observational studies, since it is critical for determining whether it is appropriate to pool the individual results into one summary measure. The result of a hypothesis test is often used as the decision criterion. In this paper, the authors use a large simulation study patterned from the key features of five published epidemiologic meta-analyses to investigate the type I error and statistical power of five previously proposed asymptotic homogeneity tests, a parametric bootstrap version of each of the tests, and tau2-bootstrap, a test proposed by the authors. The results show that the asymptotic DerSimonian and Laird Q statistic and the bootstrap versions of the other tests give the correct type I error under the null hypothesis but that all of the tests considered have low statistical power, especially when the number of studies included in the meta-analysis is small (<20). From the point of view of validity, power, and computational ease, the Q statistic is clearly the best choice. The authors found that the performance of all of the tests considered did not depend appreciably upon the value of the pooled odds ratio, both for size and for power. Because tests for heterogeneity will often be underpowered, random effects models can be used routinely, and heterogeneity can be quantified by means of R(I), the proportion of the total variance of the pooled effect measure due to between-study variance, and CV(B), the between-study coefficient of variation.
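The Q statistic evaluated above is simple to compute directly. The effect sizes and within-study variances below are hypothetical log odds ratios; the resulting Q would be referred to a chi-square distribution with k - 1 degrees of freedom.

```python
def cochran_q(effects, variances):
    """Cochran's Q statistic for between-study heterogeneity, as used in
    the DerSimonian and Laird test: Q = sum_i w_i * (y_i - ybar_w)^2 with
    inverse-variance weights w_i = 1 / v_i and the weighted mean ybar_w."""
    weights = [1.0 / v for v in variances]
    ybar = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - ybar) ** 2 for w, y in zip(weights, effects))

# Hypothetical log odds ratios and within-study variances from 4 studies.
effects = [0.10, 0.30, 0.35, 0.65]
variances = [0.04, 0.05, 0.05, 0.04]
print(round(cochran_q(effects, variances), 2))  # compare to chi-square, 3 df
```

With only four studies the test has little power, which is exactly the situation the simulation study above warns about: a nonsignificant Q here would be weak evidence of homogeneity.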

Journal ArticleDOI
TL;DR: This article examines how model selection in neural networks can be guided by statistical procedures such as hypothesis tests, information criteria and cross validation, and proposes five specification strategies based on different statistical procedures.

Journal ArticleDOI
TL;DR: These results provide a complete generalization of the results given by Veeravalli and Baum, where it was shown that the quasi-Bayesian MSPRT is asymptotically efficient with respect to the expected sample size for i.i.d. observations.
Abstract: The problem of sequential testing of multiple hypotheses is considered, and two candidate sequential test procedures are studied. Both tests are multihypothesis versions of the binary sequential probability ratio test (SPRT), and are referred to as MSPRTs. The first test is motivated by Bayesian optimality arguments, while the second corresponds to a generalized likelihood ratio test. It is shown that both MSPRTs are asymptotically optimal relative not only to the expected sample size but also to any positive moment of the stopping time distribution, when the error probabilities or, more generally, risks associated with incorrect decisions are small. The results are first derived for the discrete-time case of independent and identically distributed (i.i.d.) observations and simple hypotheses. They are then extended to general, possibly continuous-time, statistical models that may include correlated and nonhomogeneous observation processes. It is also demonstrated that the results can be extended to hypothesis testing problems with nuisance parameters, where the composite hypotheses, due to nuisance parameters, can be reduced to simple ones by using the principle of invariance. These results provide a complete generalization of the results given by Veeravalli and Baum (see ibid., vol.41, p.1994-97, 1995), where it was shown that the quasi-Bayesian MSPRT is asymptotically efficient with respect to the expected sample size for i.i.d. observations.
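The binary SPRT that the MSPRTs generalize can be sketched for a Bernoulli rate using Wald's approximate thresholds. This shows only the building block, not the multihypothesis procedure analyzed in the paper.

```python
from math import log

def wald_sprt(observations, p0, p1, alpha=0.05, beta=0.05):
    """Binary sequential probability ratio test for a Bernoulli success
    rate: accumulate the log-likelihood ratio of H1 (rate p1) versus H0
    (rate p0) and stop at Wald's approximate thresholds.  Returns the
    decision ("H0", "H1", or None if undecided) and the samples used."""
    upper = log((1 - beta) / alpha)    # cross above: accept H1
    lower = log(beta / (1 - alpha))    # cross below: accept H0
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        llr += log(p1 / p0) if x else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return None, len(observations)     # data exhausted, no decision

# A run of mostly successes quickly favours H1: p = 0.8 over H0: p = 0.2.
print(wald_sprt([1, 1, 1, 1, 0, 1, 1], p0=0.2, p1=0.8))  # → ('H1', 3)
```

The appeal of the sequential formulation, which the asymptotic optimality results above make precise, is that the test typically stops well before a fixed-sample test of the same error probabilities would.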

Journal ArticleDOI
TL;DR: This tutorial provides an introduction to the hierarchical linear models technique in general terms, then specifies model notation and assumptions in detail, elaborates on model interpretation, and provides guidelines for model checking.
Abstract: Hierarchical linear models are useful for understanding relationships in hierarchical data structures, such as patients within hospitals or physicians within hospitals. In this tutorial we provide an introduction to the technique in general terms, and then specify model notation and assumptions in detail. We describe estimation techniques and hypothesis testing procedures for the three types of parameters involved in hierarchical linear models: fixed effects, covariance components, and random effects. We illustrate the application using an example from the Type II Diabetes Patient Outcomes Research Team (PORT) study and use two popular PC-based statistical computing packages, HLM/2L and SAS Proc Mixed, to perform two-level hierarchical analysis. We compare output from the two packages applied to our example data as well as to simulated data. We elaborate on model interpretation and provide guidelines for model checking.

Journal ArticleDOI
TL;DR: In this paper, the authors provide a theoretical framework to study the accuracy of bootstrap P values, which may be based on a parametric or nonparametric bootstrap, and they show that, in many circumstances, the error in rejection probability of a bootstrap test will be one whole order of magnitude smaller than that of the corresponding asymptotic test.
Abstract: We provide a theoretical framework in which to study the accuracy of bootstrap P values, which may be based on a parametric or nonparametric bootstrap. In the parametric case, the accuracy of a bootstrap test will depend on the shape of what we call the critical value function. We show that, in many circumstances, the error in rejection probability of a bootstrap test will be one whole order of magnitude smaller than that of the corresponding asymptotic test. We also propose a simulation method for estimating this error that requires the calculation of only two test statistics per replication.
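The Monte Carlo bootstrap p-value underlying such tests reduces to a simple proportion. The bootstrap statistics below are hypothetical placeholders for test statistics recomputed on parametric or nonparametric resamples drawn under the null.

```python
def bootstrap_pvalue(t_obs, t_boot):
    """Monte Carlo bootstrap p-value: the proportion of bootstrap
    statistics (simulated under the null hypothesis) at least as extreme
    as the observed statistic, with the usual +1 correction so the
    estimate is never exactly zero."""
    b = len(t_boot)
    exceed = sum(1 for t in t_boot if t >= t_obs)
    return (1 + exceed) / (b + 1)

# Hypothetical bootstrap statistics; in practice these come from refitting
# the model on B resamples generated under the null.
t_boot = [0.4, 1.1, 0.7, 2.3, 0.2, 1.8, 0.9, 1.5, 0.6, 2.9]
print(bootstrap_pvalue(2.0, t_boot))
```

In a real application B would be in the hundreds or thousands; the point of the paper is that the resulting p-value typically has a rejection-probability error an order of magnitude smaller than the asymptotic approximation it replaces.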

Book
29 Dec 1999
TL;DR: This book presents data analysis by resampling, covering bootstrap sampling distributions, standard error and bias estimation, bootstrap-t and BCA confidence intervals, bootstrap hypothesis testing, and randomization and subsampling designs.
Abstract: PREFACE: DATA ANALYSIS BY RESAMPLING
PART I: RESAMPLING CONCEPTS
INTRODUCTION
CONCEPTS 1: TERMS AND NOTATION. Case, Attributes, Scores, and Treatments / Experimental and Observational Studies / Data Sets, Samples, and Populations / Parameters, Statistics, and Distributions / Distribution Functions
APPLICATIONS 1: CASES, ATTRIBUTES, AND DISTRIBUTIONS. Attributes, Scores, Groups, and Treatments / Distributions of Scores and Statistics / Exercises
CONCEPTS 2: POPULATIONS AND RANDOM SAMPLES. Varieties of Populations / Random Samples
APPLICATIONS 2: RANDOM SAMPLING. Simple Random Samples / Exercises
CONCEPTS 3: STATISTICS AND SAMPLING DISTRIBUTIONS. Statistics and Estimators / Accuracy of Estimation / The Sampling Distribution / Bias of an Estimator / Standard Error of a Statistic / RMS Error of an Estimator / Confidence Interval
APPLICATIONS 3: SAMPLING DISTRIBUTION COMPUTATIONS. Exercises
CONCEPTS 4: TESTING POPULATION HYPOTHESES. Population Statistical Hypotheses / Population Hypothesis Testing
APPLICATIONS 4: NULL SAMPLING DISTRIBUTION P-VALUES. The p-value of a Directional Test / The p-value of a Nondirectional Test / Exercises
CONCEPTS 5: PARAMETRICS, PIVOTALS, AND ASYMPTOTICS. The Unrealizable Sampling Distribution / Sampling Distribution of a Sample Mean / Parametric Population Distributions / Pivotal Form Statistics / Asymptotic Sampling Distributions / Limitations of the Mathematical Approach
APPLICATIONS 5: CIs FOR NORMAL POPULATION MEAN AND VARIANCE. CI for a Normal Population Mean / CI for a Normal Population Variance / Nonparametric CI Estimation / Exercises
CONCEPTS 6: LIMITATIONS OF PARAMETRIC INFERENCE. Range and Precision of Scores / Size of Population / Size of Sample / Roughness of Population Distribution / Parameters and Statistics of Interest / Scarcity of Random Samples / Resampling Inference
APPLICATIONS 6: RESAMPLING APPROACHES TO INFERENCE. Exercises
CONCEPTS 7: THE REAL AND BOOTSTRAP WORLDS. The Real World of Population Inference / The Bootstrap World of Population Inference / Real World Population Distribution Estimates / Nonparametric Population Estimates / Sample Size and Distribution Estimates
APPLICATIONS 7: BOOTSTRAP POPULATION DISTRIBUTIONS. Nonparametric Population Estimates / Exercises
CONCEPTS 8: THE BOOTSTRAP SAMPLING DISTRIBUTION. The Bootstrap Conjecture / Complete Bootstrap Sampling Distributions / Monte Carlo Bootstrap Estimate of Standard Error / The Bootstrap Estimate of Bias / Simple Bootstrap CI Estimates
APPLICATIONS 8: BOOTSTRAP SE, BIAS, AND CI ESTIMATES. Example / Exercises
CONCEPTS 9: BETTER BOOTSTRAP CIs: THE BOOTSTRAP-T. Pivotal Form Statistics / The Bootstrap-t Pivotal Transformation / Forming Bootstrap-t CIs / Estimating the Standard Error of an Estimate / Range of Applications of the Bootstrap-t / Iterated Bootstrap CIs
APPLICATIONS 9: SE AND CIs FOR TRIMMED MEANS. Definition of the Trimmed Mean / Importance of the Trimmed Mean / A Note on Outliers / Determining the Trimming Fraction / Sampling Distribution of the Trimmed Mean / Applications / Exercises
CONCEPTS 10: BETTER BOOTSTRAP CIs: BCA INTERVALS. Bias Corrected and Accelerated CI Estimates / Applications of BCA CI / Better Confidence Interval Estimates
APPLICATIONS 10: USING CI CORRECTION FACTORS. Requirements for a BCA CI / Implementations of the BCA Algorithm / Exercise
CONCEPTS 11: BOOTSTRAP HYPOTHESIS TESTING. CIs, Null Hypothesis Tests, and p-values / Bootstrap-t Hypothesis Testing / Bootstrap Hypothesis Testing Alternatives / CI Hypothesis Testing / Confidence Intervals or p-values?
APPLICATIONS 11: BOOTSTRAP P-VALUES. Computing a Bootstrap-t p-value / Fixed-alpha CIs and Hypothesis Testing / Computing a BCA CI p-value / Exercise
CONCEPTS 12: RANDOMIZED TREATMENT ASSIGNMENT. Two Functions of Randomization / Randomization of Sampled Cases / Randomization of Two Available Cases / Statistical Basis for Local Causal Inference / Population Hypothesis Revisited
APPLICATIONS 12: MONTE CARLO REFERENCE DISTRIBUTIONS. Serum Albumen in Diabetic Mice / Resampling Stats Analysis / SC Analysis / S-Plus Analysis / Exercises
CONCEPTS 13: STRATEGIES FOR RANDOMIZING CASES. Independent Randomization of Cases / Completely Randomized Designs / Randomized Blocks Designs / Restricted Randomization / Constraints on Rerandomization
APPLICATIONS 13: IMPLEMENTING CASE RERANDOMIZATION. Completely Randomized Designs / Randomized Blocks Designs / Independent Randomization of Cases / Restricted Randomization / Exercises
CONCEPTS 14: RANDOM TREATMENT SEQUENCES. Between- and Within-Cases Designs / Randomizing the Sequence of Treatments / Causal Inference for Within-Cases Designs / Sequence of Randomization Strategies
APPLICATIONS 14: RERANDOMIZING TREATMENT SEQUENCES. Analysis of the AB-BA Design / Sequences of k > 2 Treatments / Exercises
CONCEPTS 15: BETWEEN- AND WITHIN-CASE DECISIONS. Between/Within Designs / Between/Within Resampling Strategies / Doubly Randomized Available Cases
APPLICATIONS 15: INTERACTIONS AND SIMPLE EFFECTS. Simple and Main Effects / Exercises
CONCEPTS 16: SUBSAMPLES: STABILITY OF DESCRIPTION. Nonrandom Studies and Data Sets / Local Descriptive Inference / Descriptive Stability and Case Homogeneity / Subsample Descriptions / Employing Subsample Descriptions / Subsamples and Randomized Studies
APPLICATIONS 16: STRUCTURED & UNSTRUCTURED DATA. Half-Samples of Unstructured Data / Subsamples of Source-Structured Cases / Exercises
PART II: RESAMPLING APPLICATIONS
INTRODUCTION
APPLICATIONS 17: A SINGLE GROUP OF CASES. Random Sample or Set of Available Cases / Typical Size of Score Distribution / Variability of Attribute Scores / Association Between Two Attributes / Exercises
APPLICATIONS 18: TWO INDEPENDENT GROUPS OF CASES. Constitution of Independent Groups / Location Comparisons for Samples / Magnitude Differences, CR and RB Designs / Magnitude Differences, Nonrandom Designs / Study Size / Exercises
APPLICATIONS 19: MULTIPLE INDEPENDENT GROUPS. Multiple Group Parametric Comparisons / Nonparametric K-group Comparison / Comparisons among Randomized Groups / Comparisons among Nonrandom Groups / Adjustment for Multiple Comparisons / Exercises
APPLICATIONS 20: MULTIPLE FACTORS AND COVARIATES. Two Treatment Factors / Treatment and Blocking Factors / Covariate Adjustment of Treatment Scores / Exercises
APPLICATIONS 21: WITHIN-CASES TREATMENT COMPARISONS. Normal Models, Univariate and Multivariate / Bootstrap Treatment Comparisons / Randomized Sequence of Treatments / Nonrandom Repeated Measures / Exercises
APPLICATIONS 22: LINEAR MODELS: MEASURED RESPONSE. The Parametric Linear Model / Nonparametric Linear Models / Prediction Accuracy / Linear Models for Randomized Cases / Linear Models for Nonrandom Studies / Exercises
APPLICATIONS 23: CATEGORICAL RESPONSE ATTRIBUTES. Cross-Classification of Cases / The 2 x 2 Table / Logistic Regression / Exercises
POSTSCRIPT: GENERALITY, CAUSALITY & STABILITY. Study Design and Resampling / Resampling Tools
REFERENCES / INDEX

Journal ArticleDOI
TL;DR: A Monte Carlo study of the problem of testing for a centre effect in multi-centre studies following a proportional hazards regression analysis shows that for moderate samples the fixed effects tests have nominal levels much higher than specified, but the random effect test performs as expected under the null hypothesis.
Abstract: The problem of testing for a centre effect in multi-centre studies following a proportional hazards regression analysis is considered. Two approaches to the problem can be used. One fits a proportional hazards model with a fixed covariate included for each centre (except one). The need for a centre specific adjustment is evaluated using either a score, Wald or likelihood ratio test of the hypothesis that all the centre specific effects are equal to zero. An alternative approach is to introduce a random effect or frailty for each centre into the model. Recently, Commenges and Andersen have proposed a score test for this random effects model. By a Monte Carlo study we compare the performance of these two approaches when either the fixed or random effects model holds true. The study shows that for moderate samples the fixed effects tests have nominal levels much higher than specified, but the random effect test performs as expected under the null hypothesis. Under the alternative hypothesis the random effect test has good power to detect relatively small fixed or random centre effects. Also, if the centre effect is ignored the estimator of the main treatment effect may be quite biased and is inconsistent. The tests are illustrated on a retrospective multi-centre study of recovery from bone marrow transplantation.

Journal ArticleDOI
TL;DR: In this paper, the authors present an overview of the statistical model underlying the Mayfield method and of its assumptions, and present a test of goodness-of-fit based on the deviance.
Abstract: The Mayfield method of estimating a constant probability of daily nest success adjusts for the fact that nests found part-way through a nesting stage (egg-laying, incubation or brood-rearing) have, by definition, not failed since the stage began. The equality of two independently estimated probabilities can be tested using their associated standard errors. However, published Mayfield methodology does not extend to testing the equality of three or more probabilities, considering multi-way comparisons, or fitting complex regression-type models. It is important that such flexibility is accessible to biologists. I present an overview of the statistical model underlying the Mayfield method and of its assumptions, and present a test of goodness-of-fit based on the deviance. The model is extended to one-way classifications numbering two or more categories. A hypothesis test of the equality of all category-specific probabilities is derived based on the likelihood-ratio statistic. Relevant formulae are given explicitly.
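The basic Mayfield point estimate and a commonly used variance approximation can be sketched as follows. The counts are hypothetical, and the variance formula is the standard approximation attributed to Johnson, not the likelihood-ratio extensions this paper develops.

```python
from math import sqrt

def mayfield_daily_survival(failures, exposure_days):
    """Mayfield estimate of a constant daily nest survival probability:
    one minus failures per exposure-day, with the commonly used variance
    approximation p * (1 - p) / exposure_days for its standard error."""
    p = 1.0 - failures / exposure_days
    se = sqrt(p * (1.0 - p) / exposure_days)
    return p, se

# Hypothetical data: 10 nest failures observed over 500 exposure-days.
p, se = mayfield_daily_survival(failures=10, exposure_days=500)
print(round(p, 3), round(se, 4))
```

Two independently estimated daily survival probabilities can then be compared with a z-test on their difference using these standard errors, which is exactly the two-group comparison the published methodology supports and this paper generalizes.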

Journal ArticleDOI
TL;DR: Some aspects of signal detection theory relevant to FNI are discussed, along with some common approaches to statistical inference used in FNI; low-pass filtering in relation to functional-anatomical variability and some effects of filtering on signal detection of interest to FNI are also discussed.
Abstract: The field of functional neuroimaging (FNI) methodology has developed into a mature but evolving area of knowledge and its applications have been extensive. A general problem in the analysis of FNI data is finding a signal embedded in noise. This is sometimes called signal detection. Signal detection theory focuses in general on issues relating to the optimization of conditions for separating the signal from noise. When methods from probability theory and mathematical statistics are directly applied in this procedure it is also called statistical inference. In this paper we briefly discuss some aspects of signal detection theory relevant to FNI and, in addition, some common approaches to statistical inference used in FNI. Low-pass filtering in relation to functional-anatomical variability and some effects of filtering on signal detection of interest to FNI are discussed. Also, some general aspects of hypothesis testing and statistical inference are discussed. This includes the need for characterizing the signal in data when the null hypothesis is rejected, the problem of multiple comparisons that is central to FNI data analysis, omnibus tests and some issues related to statistical power in the context of FNI. In turn, random field, scale space, non-parametric and Monte Carlo approaches are reviewed, representing the most common approaches to statistical inference used in FNI. Complementary to these issues an overview and discussion of non-inferential descriptive methods, common statistical models and the problem of model selection is given in a companion paper. In general, model selection is an important prelude to subsequent statistical inference. The emphasis in both papers is on the assumptions and inherent limitations of the methods presented. Most of the methods described here generally serve their purposes well when the inherent assumptions and limitations are taken into account. Significant differences in results between different methods are most apparent in extreme parameter ranges, for example at low effective degrees of freedom or at small spatial autocorrelation. In such situations, or in situations where assumptions and approximations are seriously violated, it is of central importance to choose the most suitable method in order to obtain valid results.
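The non-parametric and Monte Carlo approaches reviewed in the paper share one core idea: approximate the null distribution of a test statistic by resampling rather than by analytic theory. As a minimal illustration of that idea (not code from the paper; the data values are invented), a two-sample permutation test for a difference in means can be sketched as:

```python
import random
import statistics

def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sample permutation test for a difference in means.

    Approximates a two-sided p-value by randomly relabelling the
    pooled observations, mimicking the null of no group effect.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value away from exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Invented values standing in for one voxel under task vs. rest.
signal = [1.8, 2.1, 2.4, 1.9, 2.2, 2.6]
baseline = [0.9, 1.1, 1.3, 1.0, 0.8, 1.2]
print(permutation_test(signal, baseline))
```

In FNI the same resampling logic is applied per voxel, which is where the multiple-comparisons problem discussed above enters; the sketch deliberately ignores that layer.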

Journal ArticleDOI
TL;DR: A generalized framework for hypothesis testing as a means of controlling for size and density effects is provided, and this method is applied to several well-known sets of social network data.

Journal ArticleDOI
TL;DR: It is concluded that the most productive way to define management units is on a case‐by‐case basis and that creating analytical tools designed specifically to address decision making in a management context, rather than re‐tooling academic tools designed for other purposes, will increase and improve the use of genetics in conservation.
Abstract: In contrast to the goals of the symposium from which this series of papers originated, we argue that attempts to apply unambiguously defined and general management unit criteria based solely on genetic parameters can easily lead to incorrect management decisions. We maintain that conservation genetics is best served by altering the perspective of data analysis so that decision making is optimally facilitated. To do so requires accounting for policy objectives early in the design and execution of the science. This contrasts with typical hypothesis testing approaches to analysing genetic data for determining population structure, which often aspire to objectivity by considering management objectives only after the analysis is complete. The null hypothesis is generally taken as panmixia with a strong predilection towards avoiding false acceptance of the alternative hypothesis (the existence of population structure). We show by example how defining management units using genetic data and standard scientific analyses that do not consider either the specific management objectives or the anthropogenic risks facing the populations being studied can easily result in a management failure by losing local populations. We then use the same example to show how an ‘applied’ approach driven by specific objectives and knowledge of abundance and mortality results in appropriate analyses and better decisions. Because management objectives stem from public policy, which differs among countries and among species groups, criteria for defining management units must be specific, not general. Therefore, we conclude that the most productive way to define management units is on a case-by-case basis. We also suggest that creating analytical tools designed specifically to address decision making in a management context, rather than re-tooling academic tools designed for other purposes, will increase and improve the use of genetics in conservation.

Journal ArticleDOI
TL;DR: The utility of a Bayesian perspective, especially for complex problems, is becoming increasingly clear to the statistics community; geneticists are also finding this framework useful and are increasingly utilizing the power of this approach.

Book
01 Jan 1999
TL;DR: The Elements of Inference discusses statistical models, hypothesis testing and confidence intervals, classical and Bayesian estimation of linear models, and analytical approximation methods.
Abstract: Contents:
Introduction: Information; The concept of probability; Assessing subjective probabilities; An example; Linear algebra and probability; Notation; Outline of the book.
Elements of Inference: Common statistical models; Likelihood-based functions; Bayes theorem; Exchangeability; Sufficiency and exponential family; Parameter elimination.
Prior Distribution: Entirely subjective specification; Specification through functional forms; Conjugacy with the exponential family; Non-informative priors; Hierarchical priors.
Estimation: Introduction to decision theory; Bayesian point estimation; Classical point estimation; Empirical Bayes estimation; Comparison of estimators; Interval estimation; Estimation in the Normal model.
Approximating Methods: The general problem of inference; Optimization techniques; Asymptotic theory; Other analytical approximations; Numerical integration methods; Simulation methods.
Hypothesis Testing: Introduction; Classical hypothesis testing; Bayesian hypothesis testing; Hypothesis testing and confidence intervals; Asymptotic tests.
Prediction: Bayesian prediction; Classical prediction; Prediction in the Normal model; Linear prediction.
Introduction to Linear Models: The linear model; Classical estimation of linear models; Bayesian estimation of linear models; Hierarchical linear models; Dynamic linear models; Linear models with constraints.
Back matter: Sketched Solutions to Selected Exercises; List of Distributions; References; Index. Exercises appear at the end of each chapter.
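The conjugacy-with-the-exponential-family material listed in the contents can be illustrated with the simplest conjugate pair. The sketch below (illustrative only, not from the book) updates a Beta prior on a binomial proportion, where the posterior stays in the Beta family:

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior on a binomial
    proportion combined with the observed counts yields a
    Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1), then observe 7 successes in 10 trials.
a, b = beta_binomial_update(1, 1, 7, 3)
print((a, b), round(beta_mean(a, b), 3))  # (8, 4), posterior mean 0.667
```

The posterior mean 8/12 sits between the prior mean 1/2 and the sample proportion 7/10, the shrinkage behaviour that conjugate Bayesian point estimation makes explicit.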

Journal ArticleDOI
TL;DR: The role of the confidence interval (CI) in statistical inference and its advantages over conventional hypothesis testing are examined, particularly when data are applied in the context of clinical practice.
Abstract: This article examines the role of the confidence interval (CI) in statistical inference and its advantages over conventional hypothesis testing, particularly when data are applied in the context of clinical practice. A CI provides a range of population values with which a sample statistic is consistent at a given level of confidence (usually 95%). Conventional hypothesis testing serves to either reject or retain a null hypothesis. A CI, while also functioning as a hypothesis test, provides additional information on the variability of an observed sample statistic (ie, its precision) and on its probable relationship to the value of this statistic in the population from which the sample was drawn (ie, its accuracy). Thus, the CI focuses attention on the magnitude and the probability of a treatment or other effect. It thereby assists in determining the clinical usefulness and importance of, as well as the statistical significance of, findings. The CI is appropriate for both parametric and nonparametric analyses and for both individual studies and aggregated data in meta-analyses. It is recommended that, when inferential statistical analysis is performed, CIs should accompany point estimates and conventional hypothesis tests wherever possible.
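The dual role of the CI described in this abstract, an interval estimate that simultaneously functions as a hypothesis test, can be sketched in a few lines. This is an illustrative normal-approximation version with invented data, not code from the article; for small samples a t critical value would replace the fixed z = 1.96:

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% CI for a population mean.

    Uses the sample standard error with a normal critical value;
    a t critical value is more appropriate for small samples.
    """
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se

data = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.4, 4.7]  # invented measurements
low, high = mean_confidence_interval(data)
print(f"95% CI: ({low:.2f}, {high:.2f})")

# The CI doubles as a two-sided test at the 5% level: a null value
# outside the interval would be rejected.
print("reject H0: mu = 4.0?", not (low <= 4.0 <= high))
```

Unlike the bare reject/retain verdict, the interval's width shows the precision of the estimate, which is exactly the extra clinical information the article argues for.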

Journal ArticleDOI
TL;DR: In this paper, alternative test statistics are presented and better-approximating test distributions are derived; the methods are discussed for the unbalanced heteroscedastic 1-way random ANOVA model and for the probability difference method.
Abstract: In many fields of application, test statistics are obtained by combining estimates from several experiments, studies or centres of a multi-centre trial. The commonly used test procedure for judging the evidence of a common overall effect can considerably overstate the significance level, leading to a high rate of overly liberal decisions. Alternative test statistics are presented and better-approximating test distributions are derived. The methods are discussed explicitly for the unbalanced heteroscedastic 1-way random ANOVA model and for the probability difference method, including treatment-by-centre (or study) interaction. Numerical results from simulation studies are presented.
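The liberal behaviour of the naive combined test can be checked directly by simulation. The sketch below is illustrative only (my assumptions, not the paper's setup: four heteroscedastic centres of five normal observations each, inverse-variance weighting with estimated variances, a normal reference distribution); it estimates the empirical size of a nominal 5% test under the null:

```python
import math
import random
import statistics

def pooled_z(centre_samples):
    """Inverse-variance weighted combination of centre means.

    Estimated (not true) variances are plugged into the weights,
    which is what makes the naive N(0,1) reference distribution
    too liberal when centres are small and heteroscedastic.
    """
    weights, means = [], []
    for sample in centre_samples:
        v = statistics.variance(sample) / len(sample)  # est. var of mean
        weights.append(1.0 / v)
        means.append(statistics.mean(sample))
    w = sum(weights)
    combined = sum(wi * mi for wi, mi in zip(weights, means)) / w
    return combined * math.sqrt(w)  # approximately N(0,1) under H0

def empirical_size(n_reps=4000, seed=1):
    """Fraction of null replications with |Z| > 1.96 (nominal 5%)."""
    rng = random.Random(seed)
    sds = [0.5, 1.0, 2.0, 4.0]  # heteroscedastic centres
    rejections = 0
    for _ in range(n_reps):
        centres = [[rng.gauss(0.0, sd) for _ in range(5)] for sd in sds]
        if abs(pooled_z(centres)) > 1.96:
            rejections += 1
    return rejections / n_reps

print(empirical_size())  # typically well above the nominal 0.05 here
```

With only 4 degrees of freedom per centre variance estimate, the rejection rate exceeds 5% under the null, which is the overestimation of the significance level the abstract describes and the motivation for the better-approximating test distributions it derives.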

Journal ArticleDOI
TL;DR: In this article, the authors consider the issue of performing statistical inference for Lorenz curve orderings, which involves testing for an ordered relationship in a multivariate context and making comparisons among more than two population distributions.
Abstract: In this paper we consider the issue of performing statistical inference for Lorenz curve orderings. This involves testing for an ordered relationship in a multivariate context and making comparisons among more than two population distributions. Our approach is to frame the hypotheses of interest as sets of linear inequality constraints on the vector of Lorenz curve ordinates, and apply order-restricted statistical inference to derive test statistics and their sampling distributions. We go on to relate our results to others which have appeared in recent literature, and use Monte Carlo analysis to highlight their respective properties and comparative performances. Finally, we discuss in general terms the issue and problems of framing hypotheses, and testing them, in the context of the study of income inequality, and suggest ways in which the distributional analyst could best proceed, illustrating with empirical examples.
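The linear inequality constraints on Lorenz curve ordinates that frame the paper's hypotheses can be made concrete with a small sketch (invented income data; a simple empirical Lorenz ordinate that ignores the sampling-distribution machinery the paper develops):

```python
def lorenz_ordinates(incomes, points=(0.2, 0.4, 0.6, 0.8)):
    """Empirical Lorenz curve ordinates L(p): the cumulative income
    share of the poorest fraction p of the population.  Assumes the
    grid points hit whole numbers of observations."""
    xs = sorted(incomes)
    total = sum(xs)
    n = len(xs)
    return [sum(xs[:int(round(p * n))]) / total for p in points]

def lorenz_dominates(a, b):
    """True if a Lorenz-dominates b: the linear inequality
    constraints L_a(p) >= L_b(p) hold at every grid point."""
    return all(la >= lb for la, lb in zip(lorenz_ordinates(a),
                                          lorenz_ordinates(b)))

equal_ish = [9, 10, 10, 11, 10, 10, 9, 11, 10, 10]   # invented incomes
unequal = [1, 2, 3, 4, 5, 10, 15, 20, 30, 10]
print(lorenz_dominates(equal_ish, unequal))  # True: the more equal curve lies above
```

The paper's contribution is the inferential layer on top of this: treating the vector of ordinate differences as a random quantity and testing the full set of inequality constraints jointly, rather than reading dominance off point estimates as the sketch does.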

Journal ArticleDOI
TL;DR: In this paper, a semiparametric estimator of a household equivalence scale under the assumption of base independence without putting any further restrictions on the shape of household Engel curves is presented.