
Showing papers on "Statistical hypothesis testing" published in 1984


Book ChapterDOI
TL;DR: In this chapter, the authors examine the Wald, Likelihood Ratio, and Lagrange Multiplier (LM) tests in a maximum likelihood framework and their use in forming diagnostic tests for econometric models.
Abstract: Publisher Summary The use of increasingly complex statistical models has led to heavy reliance on maximum likelihood methods for both estimation and testing. In such a setting, only asymptotic properties can be expected for estimators or tests. Often there are asymptotically equivalent procedures that differ substantially in computational difficulty and finite sample performance. In a maximum likelihood framework, the Wald, Likelihood Ratio, and Lagrange Multiplier (LM) tests are a natural trio. They all share the property of being asymptotically locally most powerful invariant tests, and in fact all are asymptotically equivalent. However, in practice, there are substantial differences in the way the tests look at particular models. Frequently, when one is very complex, another will be much simpler. Furthermore, this formulation guides the intuition as to what is testable and how best to formulate a model in order to test it. In terms of forming diagnostic tests, the LM test is frequently computationally convenient, as many of the test statistics are already available from estimation under the null. The application of these test principles, and particularly the LM principle, to a wide range of econometric problems is a natural development of the field, and it is a development that is proceeding at a very rapid pace.

687 citations
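The trio described above can be made concrete with a small worked example. The sketch below is not taken from the chapter; it assumes a Gaussian linear regression with a single zero restriction, computes the Wald, LR, and LM statistics from the unrestricted and restricted fits using maximum likelihood (divide-by-n) variance estimates, and refers each to the chi-square distribution. The simulated data and variable names are illustrative only.

```python
# Minimal sketch (not from the chapter): Wald, LR, and LM statistics for
# testing H0: beta_2 = 0 in a Gaussian linear regression y = X @ beta + e.
# Assumed setup: ML (divide-by-n) variance estimates, under which the classic
# finite-sample ordering Wald >= LR >= LM holds even though all three tests
# share the same chi-square limit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.5 * x1 + 0.0 * x2 + rng.normal(size=n)    # H0 is true here

X_full = np.column_stack([np.ones(n), x1, x2])         # unrestricted model
X_rest = X_full[:, :2]                                 # model imposing beta_2 = 0

def ml_fit(X, y):
    """OLS coefficients, residuals, and ML residual variance (divide by n)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid, resid @ resid / len(y)

b_u, e_u, s2_u = ml_fit(X_full, y)
b_r, e_r, s2_r = ml_fit(X_rest, y)

# Likelihood ratio: twice the difference in maximized Gaussian log-likelihoods.
LR = n * np.log(s2_r / s2_u)

# Wald: squared t-type statistic for beta_2 using the unrestricted variance.
XtX_inv = np.linalg.inv(X_full.T @ X_full)
W = b_u[2] ** 2 / (s2_u * XtX_inv[2, 2])

# LM (score): n * R^2 from regressing the restricted residuals on all regressors.
e_hat = X_full @ np.linalg.lstsq(X_full, e_r, rcond=None)[0]
LM = n * (e_hat @ e_hat) / (e_r @ e_r)

p = stats.chi2.sf([W, LR, LM], df=1)
print(f"Wald={W:.3f}  LR={LR:.3f}  LM={LM:.3f}  (chi2(1) p-values: {p.round(3)})")
```

With these maximum likelihood variance conventions the statistics differ in finite samples, which is exactly the gap between asymptotic equivalence and finite sample performance that the chapter emphasizes.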


Book
01 Jan 1984
TL;DR: In this book, the authors cover the binomial and normal probability distributions, choosing samples, statistical inference (estimation and hypothesis testing), sample size and power, linear regression and correlation, analysis of variance, and factorial designs.
Abstract: Basic definitions and concepts; data graphics; introduction to probability - the binomial and normal probability distributions; choosing samples; statistical inference - estimation and hypothesis testing; sample size and power; linear regression and correlation; analysis of variance; factorial designs; transformations and outliers; experimental design in clinical trials; quality control; validation; computer-intensive methods; nonparametric methods; optimization techniques and screening designs. Appendices: some properties of the variance; comparison of slopes and testing of linearity - determination of relative potency; multiple regression tables; outlier tests and chemical assays; should a single unexplained failing assay be reason to reject a batch; answers to exercises.

481 citations


Book
01 Jan 1984
TL;DR: In this book, the authors introduce bivariate association for nominal- and ordinal-level variables and present measures of central tendency and the chi-square distribution.
Abstract: 1. Introduction. PART I: DESCRIPTIVE STATISTICS. 2. Basic Descriptive Statistics: Tables, Percentages, Ratios and Rates, and Graphs. 3. Measures of Central Tendency. 4. Measures of Dispersion. 5. The Normal Curve. PART II INFERENTIAL STATISTICS. 6. Introduction to Inferential Statistics: Sampling and the Sampling Distribution. 7. Estimation Procedures. 8. Hypothesis Testing I: The One-Sample Case. 9. Hypothesis Testing II: The Two-Sample Case. 10. Hypothesis Testing III: The Analysis of Variance. 11. Hypothesis Testing IV: Chi Square. PART III BIVARIATE MEASURES OF ASSOCIATION. 12. Bivariate Association for Nominal- and Ordinal-Level Variables. 13. Association Between Variables Measured at the Interval-Ratio Level. PART IV: MULTIVARIATE TECHNIQUES. 14. Elaborating Bivariate Tables. 15. Partial Correlation and Multiple Regression and Correlation. Appendix A: Area Under the Normal Curve. Appendix B: Distribution of t. Appendix C: Distribution of Chi Square. Appendix D: Distribution of F.

351 citations


Journal ArticleDOI
TL;DR: Results encouraged investigations into modeling the picture as a mosaic of patches where the gray-value function within each patch is described as a second-order bivariate polynomial of the pixel coordinates, facilitating the determination of threshold values related to a priori confidence limits.
Abstract: Modeling the image as a piecewise linear gray-value function of the pixel coordinates considerably improved a change detection test based previously on a piecewise constant gray-value function. These results encouraged investigations into modeling the picture as a mosaic of patches in which the gray-value function within each patch is described as a second-order bivariate polynomial of the pixel coordinates. This more appropriate model supported the assumption that the remaining gray-value variation within each patch can be attributed to noise from the sensing and digitizing devices, independent of the individual image frames in a sequence. This assumption made it possible to relate the likelihood test for change detection to well-known statistical tests ( t test, F test), facilitating the determination of threshold values related to a priori confidence limits.

213 citations
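As a hedged illustration of the idea in the abstract above (a second-order bivariate polynomial per patch, with an F test whose threshold comes from an a priori confidence level), the sketch below shows one way such a change test could look. It is not the authors' exact likelihood formulation; the patch size, noise level, and simulated frames are illustrative.

```python
# Hedged sketch (not the paper's exact procedure): model the gray values of an
# image patch as a second-order bivariate polynomial of the pixel coordinates,
# and use a Chow-type F test to decide whether two frames of the same patch
# share one polynomial (no change) or need separate ones (change).
import numpy as np
from scipy import stats

def quad_design(h, w):
    """Design matrix [1, x, y, x^2, x*y, y^2] over an h-by-w pixel grid."""
    yy, xx = np.mgrid[0:h, 0:w]
    x, y = xx.ravel(), yy.ravel()
    return np.column_stack([np.ones(x.size), x, y, x**2, x * y, y**2])

def rss(X, z):
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    r = z - X @ beta
    return float(r @ r)

def change_f_test(patch_a, patch_b, alpha=0.01):
    h, w = patch_a.shape
    X = quad_design(h, w)
    p = X.shape[1]                               # 6 polynomial coefficients
    n = 2 * h * w
    z = np.concatenate([patch_a.ravel(), patch_b.ravel()])
    rss_pooled = rss(np.vstack([X, X]), z)       # one surface for both frames
    rss_sep = rss(X, patch_a.ravel()) + rss(X, patch_b.ravel())
    F = ((rss_pooled - rss_sep) / p) / (rss_sep / (n - 2 * p))
    crit = stats.f.ppf(1 - alpha, p, n - 2 * p)  # threshold from confidence level
    return F, crit, F > crit

rng = np.random.default_rng(1)
h = w = 16
yy, xx = np.mgrid[0:h, 0:w]
base = 100 + 0.5 * xx + 0.2 * yy + 0.01 * xx * yy
frame1 = base + rng.normal(scale=2.0, size=(h, w))
frame2 = base + 8.0 + rng.normal(scale=2.0, size=(h, w))   # brightness change
F, crit, changed = change_f_test(frame1, frame2)
print(f"F = {F:.1f}, 1% critical value = {crit:.2f}, change detected: {changed}")
```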


Journal ArticleDOI
TL;DR: The anomalous data identification procedures existing today in power system state estimation become problematic, if not totally inefficient, under stringent conditions such as multiple and interacting bad data.
Abstract: The anomalous data identification procedures existing today in power system state estimation become problematic, if not totally inefficient, under stringent conditions such as multiple and interacting bad data. The identification method presented in this paper attempts to alleviate these difficulties. It consists in: (i) computing measurement error estimates and using them as the random variables of concern; (ii) making decisions on the basis of hypothesis testing that takes their statistical properties into account. Two identification techniques are then derived and further investigated and assessed by means of a realistic illustrative example. Conceptually novel, the identification methodology is thus shown to lead to practical procedures which are efficient, reliable, and workable under all theoretically feasible conditions.

184 citations


Journal ArticleDOI
TL;DR: The basic structure of the bivariate generalization of Engle's ARCH model is described in this paper, and conditions which guarantee that the conditional covariance matrix is well defined are summarized, as are estimation and hypothesis testing.

181 citations
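Since no abstract is shown for this entry, here is only a generic sketch of the kind of bivariate ARCH recursion the TL;DR refers to: each element of the conditional covariance matrix responds to the corresponding element of the lagged outer product of innovations, and positive definiteness (the "well defined" condition mentioned above) is checked at every step. The parameter values are illustrative, not the paper's.

```python
# Generic sketch only (parameters are illustrative, not the paper's):
# a bivariate ARCH(1)-type recursion in which each element of the conditional
# covariance matrix H_t responds to the corresponding element of
# eps_{t-1} eps_{t-1}'. With A chosen positive semidefinite, the Schur product
# theorem keeps the ARCH term positive semidefinite, so adding the positive
# definite constant C keeps H_t well defined; this is also checked numerically.
import numpy as np

rng = np.random.default_rng(2)
C = np.array([[0.5, 0.1],
              [0.1, 0.5]])            # constant part of H_t (positive definite)
A = np.array([[0.3, 0.2],
              [0.2, 0.3]])            # elementwise ARCH loadings (PSD)
T = 1000

eps = np.zeros((T, 2))
H = np.tile(C, (T, 1, 1))
for t in range(1, T):
    H[t] = C + A * np.outer(eps[t - 1], eps[t - 1])   # elementwise (Hadamard) product
    if np.linalg.eigvalsh(H[t]).min() <= 0:           # "well defined" check
        raise ValueError(f"H_t not positive definite at t={t}")
    eps[t] = rng.multivariate_normal(np.zeros(2), H[t])

print("unconditional covariance of eps:", np.cov(eps.T).round(3))
print("average conditional covariance:", H.mean(axis=0).round(3))
```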


Journal ArticleDOI
TL;DR: In this paper, the authors discuss the problems of hypothesis testing that arise when regression analysis is applied to data sets containing multiple measurements from individual sampling units.

156 citations


Journal ArticleDOI
TL;DR: Several guidelines concerning use of multiple-comparison procedures for comparing means, medians, or proportions are proposed, including specification of which test was used and why, and a recommendation encouraging a switch to the use of confidence intervals instead of hypothesis tests.
Abstract: Multiple-comparison procedures for comparing means, medians, or proportions are commonly used by entomologists publishing in ecological and agricultural journals of the Entomological Society of America. Unfortunately, there is confusion among many researchers and reviewers with respect to the type I error rates of the various tests. The calculation of and reasoning behind the error rate and relative conservativeness or liberalness of each test are discussed. Several guidelines concerning use of these tests are proposed, including specification of which test was used and why, and a recommendation encouraging a switch to the use of confidence intervals instead of hypothesis tests. It is felt that adoption of these and other proposals for reporting results will increase the meaningfulness and scientific merit of published entomological research.

139 citations
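To make the type I error discussion above concrete, the following hedged sketch simulates the familywise error rate of all pairwise t tests among several equal-mean groups, unadjusted versus Bonferroni-adjusted, and ends with a Bonferroni-adjusted confidence interval in line with the paper's recommendation to report intervals. The group sizes and replication count are arbitrary choices, not taken from the paper.

```python
# Hedged illustration (not from the paper): Monte Carlo check of the familywise
# type I error rate when all pairwise t tests among k equal-mean groups are run
# unadjusted versus Bonferroni-adjusted, plus a matching adjusted confidence
# interval for one comparison. Settings are arbitrary.
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(3)
k, n, alpha, reps = 5, 10, 0.05, 2000
pairs = list(combinations(range(k), 2))          # 10 pairwise comparisons

unadj_err = bonf_err = 0
for _ in range(reps):
    groups = rng.normal(size=(k, n))             # all true means equal
    pvals = np.array([stats.ttest_ind(groups[i], groups[j]).pvalue
                      for i, j in pairs])
    unadj_err += (pvals < alpha).any()           # any false rejection?
    bonf_err += (pvals < alpha / len(pairs)).any()

print(f"familywise error, unadjusted:  {unadj_err / reps:.3f}")
print(f"familywise error, Bonferroni:  {bonf_err / reps:.3f}")

# Confidence interval for one mean difference at the Bonferroni-adjusted level,
# in the spirit of the paper's recommendation to report intervals.
a, b = rng.normal(size=(2, n))
sp2 = (a.var(ddof=1) + b.var(ddof=1)) / 2        # pooled variance, equal n
se = np.sqrt(2 * sp2 / n)
tcrit = stats.t.ppf(1 - alpha / (2 * len(pairs)), df=2 * n - 2)
diff = a.mean() - b.mean()
print(f"Bonferroni-adjusted CI for one mean difference: "
      f"({diff - tcrit * se:.2f}, {diff + tcrit * se:.2f})")
```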


Journal ArticleDOI
TL;DR: In this paper, the authors examine whether the hypothesis-testing strategies employed by auditors affect their search for, attention to, and use of judgment data and find that the decision maker tends to search for evidence that confirms the hypothesis under scrutiny.
Abstract: Auditors make numerous judgments in evaluating internal control, assessing audit risk, designing and implementing sampling plans, and appraising and reporting on uncertainties. Libby [1981] notes that auditors often explicitly or implicitly formulate hypotheses within these judgment tasks, and typically, alternative hypotheses are possible for any given judgment. After the hypothesis has been framed, the auditor searches for data to test it. In this paper I examine whether the hypothesis-testing strategies employed by auditors affect their search for, attention to, and use of judgment data. Snyder and Swann [1978] indicate that at least three such strategies might be employed. In one, the decision maker tends to search for evidence that confirms the hypothesis under scrutiny. For example, in evaluating the hypothesis that a target individual was an extrovert, the decision maker might recall or search for instances where the target behaved in an extroverted manner. Using this type of confirmatory strategy, the individual would be more inclined to accept the hypothesis if a number of hypothesis-confirming events are uncovered. A second strategy would be to search for disconfirming evidence (e.g., introverted behavior) when testing whether a target was an extrovert. If sufficient evidence is uncovered, the decision maker would be more prone to reject the extrovert hypothesis. Finally, the decision maker could invest equal effort in searching for both confirming and disconfirming evidence.

129 citations


Book ChapterDOI
TL;DR: In this chapter, the authors survey multiple hypothesis testing procedures, with an emphasis on those that can be applied in the context of the classical linear regression model, and review the 't' and 'F' tests most frequently used in econometrics.
Abstract: Publisher Summary This chapter presents a survey of multiple hypothesis testing procedures with an emphasis on those procedures that can be applied in the context of the classical linear regression model. Multiple hypothesis testing is the testing of two or more separate hypotheses simultaneously. The ‘t’ and ‘F’ tests are the most frequently used tests in econometrics. In regression analysis, there are two different procedures that can be used to test the hypothesis that all the coefficients are zero. One procedure is to test each coefficient separately with a ‘t’ test, and the other is to test all coefficients jointly using an ‘F’ test. The investigator usually performs both procedures when analyzing the sample data. It has been proved that the F test is equivalent to carrying out a set of simultaneous ‘t’ tests. The chapter also discusses an induced test, which is either finite or infinite depending on whether there are a finite or infinite number of separate hypotheses. In the case of finite induced tests, the exact sampling distributions of the test statistics can be complicated, so that, in practice, the critical regions of the tests are based on probability inequalities. On the other hand, infinite induced tests are commonly constructed such that the correct critical value can be readily calculated.

129 citations
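The chapter's point that separate 't' tests and the joint 'F' test are genuinely different procedures can be seen in a small example. The sketch below is not from the chapter; it uses two nearly collinear regressors, so with these simulated data the joint F test for "both slopes are zero" typically rejects decisively while neither individual t test does. Data and settings are illustrative.

```python
# Hedged sketch (not from the chapter): separate-'t' and joint-'F' procedures
# for "all slope coefficients are zero" can disagree, here because the two
# regressors are nearly collinear.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)            # nearly collinear with x1
y = 1.0 + 0.5 * (x1 + x2) + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - k)                   # unbiased error variance
cov = s2 * np.linalg.inv(X.T @ X)

# Separate t tests on the two slopes.
t_stats = beta[1:] / np.sqrt(np.diag(cov)[1:])
t_pvals = 2 * stats.t.sf(np.abs(t_stats), df=n - k)

# Joint F test that both slopes are zero (restricted model: intercept only).
rss_u = resid @ resid
rss_r = ((y - y.mean()) ** 2).sum()
F = ((rss_r - rss_u) / 2) / (rss_u / (n - k))
F_pval = stats.f.sf(F, 2, n - k)

print("t p-values:", t_pvals.round(3), " joint F p-value:", round(float(F_pval), 5))
```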


Journal ArticleDOI
TL;DR: In this article, the authors discuss the sample size problem and four factors which affect its solution: significance level, statistical power, analysis procedure, and effect size, and demonstrate the interrelationship between these factors.
Abstract: In planning a research study, investigators are frequently uncertain regarding the minimal number of subjects needed to adequately test a hypothesis of interest. The present paper discusses the sample size problem and four factors which affect its solution: significance level, statistical power, analysis procedure, and effect size. The interrelationship between these factors is discussed and demonstrated by calculating minimal sample size requirements for a variety of research conditions.
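As a minimal illustration of how significance level, power, and effect size jointly determine sample size, the sketch below uses the standard normal-approximation formula for a two-sided, two-sample comparison of means. It is not taken from the paper; the effect sizes and power levels shown are conventional illustrative values.

```python
# Hedged sketch of the interplay the paper describes: minimal n per group for a
# two-sided, two-sample comparison of means, via the normal approximation
#   n ~ 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
# where d is the standardized effect size. Values shown are illustrative.
import math
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

for d in (0.2, 0.5, 0.8):                      # small / medium / large effects
    for power in (0.80, 0.90):
        n = n_per_group(d, alpha=0.05, power=power)
        print(f"effect size d={d:.1f}, power={power:.2f}: about {n} per group")
```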

Journal ArticleDOI
TL;DR: In this paper, the authors present an approach to analysis of variance modeling in designs where all factors are orthogonal, based on formal mathematical definitions of concepts related to factors and experimental designs.
Abstract: Summary This paper presents an approach to analysis of variance modelling in designs where all factors are orthogonal, based on formal mathematical definitions of concepts related to factors and experimental designs. The structure of an orthogonal design is described by a factor structure diagram containing the information about nestedness relations between the factors. An orthogonal design determines a unique decomposition of the observation space as a direct sum of orthogonal subspaces, one for each factor of the design. A class of well-behaved variance component models, stated in terms of fixed and random effects of factors from a given design, is characterized, and the solutions to problems of estimation and hypothesis testing within this class are given in terms of the factor structure diagram and the analysis of variance table induced by the decomposition.

Journal ArticleDOI
TL;DR: In this paper, a noncentral F approximation to power is proposed, in the null and non-null cases, for all roots but the largest, and an upper bound on the power of the maximum root test is provided.


Journal ArticleDOI
Abstract: The variance-covariance matrix of the parameter estimates in a model containing the Box-Cox transformation is analytically examined. Breaking the variance-covariance matrix into components helps in understanding (1) why some estimation algorithms are more efficient than others, (2) why both iterated OLS estimation and first derivative-only gradient estimation methods obtain biased estimates of the variances of the coefficients (with OLS underestimating the variances, and first derivative methods overestimating them), and (3) how the lack of scale invariance in the t-ratios for the linear coefficients makes hypothesis testing very misleading.

Journal ArticleDOI
TL;DR: In this article, an extremely general procedure is developed for performing a wide variety of model specification tests by running artificial linear regressions, allowing non-nested hypothesis tests to be constructed for any set of models that attempt to explain the same dependent variable(s), even when the error specifications of the competing models differ.
Abstract: This paper develops an extremely general procedure for performing a wide variety of model specification tests by running artificial linear regressions. Inference may then be based either on a Lagrange Multiplier statistic from the procedure, or on conventional asymptotic t or F tests based on the artificial regressions. This procedure allows us to develop non-nested hypothesis tests for any set of models which attempt to explain the same dependent variable(s), even when the error specifications of the competing models differ.
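One concrete instance of an artificial-regression test of this kind is the familiar construction in which fitted values from a rival, non-nested model are added as an extra regressor and tested with a conventional asymptotic t test (a J-test-style device). The sketch below illustrates only that special case; the paper's procedure is more general, and the models and data here are invented for illustration.

```python
# Hedged sketch in the spirit of the abstract: a non-nested test run as an
# artificial linear regression. Fitted values from the rival model are added as
# an extra regressor, and an asymptotic t test on that coefficient tests the
# null model (a J-test-style construction; the paper's own procedure is more
# general). Data and model choices are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 + 0.8 * z + rng.normal(size=n)         # data actually follow model H2

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

X1 = np.column_stack([np.ones(n), x])          # null model H1: y on x
X2 = np.column_stack([np.ones(n), z])          # rival model H2: y on z
_, yhat2 = ols(X2, y)

# Artificial regression: H1 regressors plus fitted values from H2.
Xa = np.column_stack([X1, yhat2])
beta, fitted = ols(Xa, y)
resid = y - fitted
s2 = resid @ resid / (n - Xa.shape[1])
se = np.sqrt(s2 * np.linalg.inv(Xa.T @ Xa)[-1, -1])
t = beta[-1] / se
p = 2 * stats.t.sf(abs(t), n - Xa.shape[1])
print(f"t on rival fitted values: {t:.2f}, p = {p:.4f}")
# A large |t| is evidence against H1 in the direction of H2.
```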

Journal ArticleDOI
TL;DR: In this paper, higher-order asymptotic expansions are developed for comparing the size and power of some common procedures for testing linear hypotheses on the regression coefficients in a class of generalized normal linear models.
Abstract: A well developed exact statistical theory exists for hypothesis testing in the normal linear regression model when the errors are independent and homoscedastic. In the more general case where the error covariance matrix is nonscalar and depends on a set of unknown parameters, exact analysis is difficult and reliance is usually placed on asymptotic approximations for large sample size n. In this paper, higher-order asymptotic expansions are developed for comparing the size and power of some common procedures for testing linear hypotheses on the regression coefficients in a class of generalized normal linear models. The class investigated is essentially the same as in Breusch [4] and Magnus [6] and includes many of the examples of heteroscedasticity and autocorrelation discussed in the literature. We assume simply that the regressors are nonrandom and that the error covariance matrix is a smooth function of a few parameters that can be efficiently estimated by maximum likelihood. Tests based on the Wald, likelihood ratio, and Lagrange multiplier principles are considered. These principles lead to three tests which, though distinct in finite samples, are locally asymptotically equivalent and share certain asymptotic optimality properties. Of course, there are infinitely many other tests that are asymptotically equivalent to the ones examined here. Although the techniques of this paper can be applied to any of them, our results concern only the tests arising from the three traditional principles. We show that, to a second order of approximation under local alternatives, the likelihood ratio test statistic is a simple average of the Wald statistic and the Lagrange multiplier statistic. When the null hypothesis is one dimensional, the three tests are, to second order, equally powerful; that is, after the critical regions are adjusted so that the tests have (to order n^-1) the same size, the local power functions differ by terms of smaller order than n^-1. When the null hypothesis contains more than one

Journal ArticleDOI
TL;DR: Researchers often test the wrong statistical hypothesis in evaluating their experiments, thereby drawing wrong conclusions, and data containing correlations among subjects require a different form of statistical analysis than do data involving independent observations.
Abstract: Researchers often test the wrong statistical hypothesis in evaluating their experiments, thereby drawing wrong conclusions. Examples of this are given. In addition, data containing correlations among subjects require a different form of statistical analysis than do data involving independent observations. Errors that can result from an incorrect analysis are illustrated.

Journal ArticleDOI
TL;DR: Sample size graphs are given for clinical trials designed to test whether an experimental therapy is as effective as a standard therapy, assuming a dichotomous outcome variable.

Journal ArticleDOI
TL;DR: In this paper, the authors consider a point process model in which the conditional intensity function increases monotonically between events and drops by determined (nonrandom) amounts after each event; the Hessian matrix is not asymptotically constant.

ReportDOI
01 Sep 1984
TL;DR: In this paper, the authors consider the problem of testing a sequence of independent and normally distributed random variables with common mean against alternatives involving a shift in the mean at an unknown time point.
Abstract: This document considers the problem of testing a sequence of independent and normally distributed random variables with a common mean against alternatives involving a shift in the mean at an unknown time point. The purpose of this paper is to study the asymptotic operating characteristics of the likelihood ratio test. In the next section, the authors derive, by invoking a result of Darling and Erdos (1956), the asymptotic null distribution of the likelihood ratio test, which is related to the extreme value behavior of the Ornstein-Uhlenbeck process. Section 3 studies the asymptotic operating characteristics of the likelihood ratio test and makes comparisons between likelihood ratio and Bayesian tests.
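A hedged sketch of the testing problem described above: for independent N(mu, 1) observations, the likelihood ratio statistic against a mean shift at an unknown time is the maximum over candidate change points of the squared two-sample z statistic. Rather than the Darling-Erdos extreme-value approximation used in the report, the sketch below calibrates the 5% critical value by Monte Carlo; the sample size and shift magnitude are illustrative.

```python
# Hedged sketch of the change-point problem in the abstract: independent
# N(mu, 1) observations, null of a common mean vs. a shift in the mean at an
# unknown time. The likelihood ratio statistic is the maximum over candidate
# change points of the squared two-sample z statistic; its null critical value
# is obtained here by Monte Carlo rather than by the Darling-Erdos result.
import numpy as np

def lr_mean_shift(x):
    """max over k of z_k^2, comparing the first k to the last n-k observations."""
    n = len(x)
    k = np.arange(1, n)
    csum = np.cumsum(x)[:-1]                   # sums of the first k observations
    mean1 = csum / k
    mean2 = (x.sum() - csum) / (n - k)
    return np.max((mean1 - mean2) ** 2 / (1.0 / k + 1.0 / (n - k)))

rng = np.random.default_rng(6)
n = 200
null_stats = np.array([lr_mean_shift(rng.normal(size=n)) for _ in range(2000)])
crit = np.quantile(null_stats, 0.95)           # Monte Carlo 5% critical value

x = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(0.7, 1.0, 80)])
stat = lr_mean_shift(x)
print(f"statistic = {stat:.2f}, 5% critical value = {crit:.2f}, reject = {stat > crit}")
```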

Journal ArticleDOI
TL;DR: This article examined the small sample properties of three testing strategies used to analyze the rationality, monetary neutrality and market efficiency hypotheses, and highlighted the extensive bias incurred by drawing inferences from simple unadjusted "two-step" estimates.

Journal ArticleDOI
01 Oct 1984
TL;DR: Some 40 years ago, Harold Hotelling pointed out that the statistical textbooks of that period were full of misconceptions, and were rather uniformly unaware of the new and dramatic development of the mathematical discipline of statistical inference.
Abstract: Some 40 years ago, Harold Hotelling pointed out that the statistical textbooks of that period were written largely by non-mathematicians. Those books were full of misconceptions, and were rather uniformly unaware of the new and dramatic development of the mathematical discipline of statistical inference. They did not take advantage of the sharpened logic for making decisions about populations on the basis of sample statistics, including the improved logic of estimation and of hypothesis testing. The situation was slowly remedied as more mathematical statisticians began to issue textbooks, until today the pendulum may have swung too far. In some quarters, the symbols of inference rather than the substance may have taken over. This appears to be especially true in the social sciences with which I am most acquainted, and to which this paper is largely (but not exclusively) addressed. For example, referees and editors of some journals insist on decorating tables of various kinds of data with stars and double stars, and on presenting lists of "standard errors", despite the fact that the implied probabilities for significance or confidence are quite erroneous from the point of view of statistical inference (see Problems 3 and 1 below).

Journal ArticleDOI
TL;DR: In this article, photon-limited images are cross correlated with a classical intensity reference scene to provide an estimate of the high-light-level cross-correlation function, and the theory of hypothesis testing is used to calculate the probabilities of detection and false alarm.
Abstract: A method for scene matching at low-light levels is analyzed. In this method photon-limited images are cross correlated with a classical intensity reference scene to provide an estimate of the high-light-level cross-correlation function. Expressions for the probability density function and characteristic function of the correlation signal are given for general input scenes and reference images. The theory of hypothesis testing is used to calculate the probabilities of detection and false alarm. The recognition capabilities of the method are illustrated with a simple example.

Journal ArticleDOI
TL;DR: In general, it is found that the failure to meet the assumption of independence leads to a conservative test of the goodness‐of‐fit of the path model, although likelihood ratio tests of specific null hypotheses were at times liberal, at times conservative, and at times nearly exact.
Abstract: Path analysis of family data has been widely applied to resolve genetic and environmental patterns of familial resemblance. A prevalent statistical approach in path analysis has been, first, to estimate the familial correlations and, second, by assuming these estimates to be independently distributed, define a likelihood function from which maximum likelihood estimates of model parameters can be obtained and likelihood ratio tests of hypotheses performed. Although it is generally known that the independence assumption does not hold when multiple familial correlations are estimated from the same family data, this statistical method has still been used in these situations owing, in part, to the lack of any viable alternatives and, in part, to the lack of any knowledge about the specific quantitative effects of not meeting the assumption of independence. Here, using computer-simulation methods, we evaluate the robustness of this statistical method to deviations from the assumption of independence. In general, we found that the failure to meet the assumption of independence leads to a conservative test of the goodness-of-fit of the path model, although likelihood ratio tests of specific null hypotheses were at times liberal, at times conservative, and at times nearly exact. Although the test statistics were found to be distorted, the parameter estimates using this method were nearly unbiased.

Journal ArticleDOI
TL;DR: It was found that the overwhelming majority of curves were linear, though the ability to detect non-linearity of dose-response curves in the standard plate test is limited, and authors tended to judge more experiments as positive when the dose-response is not linear.
Abstract: We searched the published literature for Salmonella test data on some 450 chemicals. Only 137 of more than 400 articles containing original data satisfied minimum criteria for a quantitative analysis [1751 experiments, comprising data on 152 chemicals (Table 1)]. Many of these papers did not report basic information about the test protocol (Table 2). We used previously described statistical procedures (Bernstein et al., 1982) to estimate the initial slopes of the dose-response curves and corresponding standard errors. We also applied tests for significance and linear goodness-of-fit. We then used the results of these analyses to examine several issues: (1) Linearity of the low dose region of the dose-response curve. We found that the overwhelming majority of curves were linear, though the ability to detect non-linearity of dose-response curves in the standard plate test is limited. 7% of all experiments to which the goodness-of-fit test was applied were curves of increasing slope, and with a few possible exceptions, these were not obviously associated with any particular mutagens, even those generally considered to produce non-linear effects such as MNNG and EMS (Table 3). (2) Performance of the statistical test for significance. Results of the statistical test for significance of the dose-response were compared with authors' opinions as to positivity. In almost all cases (94%) results of the statistical test and authors' opinions were the same. In the examples of conflicting opinions, the reasons were: (a) the statistical test places more weight than do most authors on the presence of a linear dose-response; (b) most authors tend to require at least a 2-fold increase over the spontaneous background for 'significance'; and (c) when the number of spontaneous revertants is small (e.g., TA1537), authors tend to require a larger increase in induced revertants than when the spontaneous background is large, whereas the statistical procedure makes no such distinction. These factors result in the statistical test tending to identify more experiments as positive than do authors, provided there is a linear dose-response, and authors tending to judge more experiments as positive when the dose-response is not linear. (3) Reproducibility. Among the 1751 experiments there were 122 data-sets (a total of 333 experiments) in which the same chemical was tested by two or more different laboratories under the same protocol. 21 of the 122 data-sets had some disagreement between experiments as to whether results were positive or negative (Table 4). (ABSTRACT TRUNCATED AT 400 WORDS)


Journal ArticleDOI
TL;DR: In this paper, a Trend Detection Method is presented that provides: 1) Hypothesis Formulation - statement of the problem to be tested, 2) Data Preparation - selection of water quality variable and data, 3) Data Analysis - exploratory data analysis techniques, and 4) Statistical Tests - tests for detecting trends.
Abstract: With the advent of standards and criteria for water quality variables, there has been an increasing concern about the changes of these variables over time. Thus, sound statistical methods for determining the presence or absence of trends are needed. A Trend Detection Method is presented that provides: 1) Hypothesis Formulation - statement of the problem to be tested, 2) Data Preparation - selection of water quality variable and data, 3) Data Analysis - exploratory data analysis techniques, and 4) Statistical Tests - tests for detecting trends. The method is utilized in a stepwise fashion and is presented in a nonstatistical manner to allow use by those not well versed in statistical theory. While the emphasis herein is on lakes, the method may be adopted easily to other water bodies.
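As one concrete choice for step 4 (Statistical Tests), the sketch below applies a nonparametric trend test based on Kendall's tau between concentration and time, together with a Theil-Sen slope as a robust estimate of trend magnitude. This is only one option consistent with the spirit of the method, not the paper's prescribed test, and the simulated series is illustrative.

```python
# Hedged sketch of a nonparametric trend test for a water-quality series:
# Kendall's tau between concentration and time (close in spirit to the
# Mann-Kendall test commonly used for this purpose), plus a Theil-Sen slope.
# The simulated series and trend size are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
years = np.arange(1970, 1984)                   # one observation per year
conc = 5.0 + 0.15 * (years - years[0]) + rng.normal(scale=0.5, size=years.size)

tau, pvalue = stats.kendalltau(years, conc)      # monotonic trend test
slope, intercept, lo, hi = stats.theilslopes(conc, years)   # robust trend size
print(f"Kendall tau = {tau:.2f}, p = {pvalue:.4f}, Theil-Sen slope = {slope:.3f} per year")
```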

Journal ArticleDOI
TL;DR: In this paper, Monte Carlo experiments were used to evaluate whether trace-level water-quality data that are routinely censored (not reported) contain valuable information for trend detection, and the resulting classes of data were subjected to a nonparametric statistical test for trend.
Abstract: Monte Carlo experiments were used to evaluate whether trace-level water-quality data that are routinely censored (not reported) contain valuable information for trend detection. Measurements are commonly censored if they fall below a level associated with some minimum acceptable level of reliability (detection limit). Trace-level organic data were simulated with best- and worst-case estimates of measurement uncertainty, various concentrations and degrees of linear trend, and different censoring rules. The resulting classes of data were subjected to a nonparametric statistical test for trend. For all classes of data evaluated, trends were most effectively detected in uncensored data as compared to censored data even when the data censored were highly reliable. Thus, censoring data at any concentration level may eliminate valuable information. Whether or not valuable information for trend analysis is, in fact, eliminated by censoring of actual rather than simulated data depends on whether the analytical process is in statistical control and bias is predictable for a particular type of chemical analyses.

Journal ArticleDOI
TL;DR: In this article, the authors show how application and consideration of the scientific context in which statistics is used can initiate important advances such as least squares, ratio estimators, correlation, contingency tables, studentization, experimental design, the analysis of variance, randomization, fractional replication, variance component analysis, bioassay, limits for a ratio, quality control, sampling inspection, nonparametric tests, transformation theory, ARIMA time series models, sequential tests, cumulative sum charts, data analysis plotting techniques.
Abstract: The article shows how application and consideration of the scientific context in which statistics is used can initiate important advances such as least squares, ratio estimators, correlation, contingency tables, studentization, experimental design, the analysis of variance, randomization, fractional replication, variance component analysis, bioassay, limits for a ratio, quality control, sampling inspection, nonparametric tests, transformation theory, ARIMA time series models, sequential tests, cumulative sum charts, data analysis plotting techniques, and a resolution of the Bayes-frequentist controversy. It appears that advances of this kind are frequently made because practical context reveals a novel formulation that eliminates an unnecessarily limiting framework.