
Showing papers on "Statistical hypothesis testing published in 1973"


Journal ArticleDOI
TL;DR: The usefulness of common statistical tests as applied to method comparison studies is studied and different types of errors in test sets of data are simulated to determine the sensitivity of different statistical parameters.
Abstract: We have studied the usefulness of common statistical tests as applied to method comparison studies. We simulated different types of errors in test sets of data to determine the sensitivity of different statistical parameters. Least-squares parameters (slope of the least-squares line, its y intercept, and the standard error of estimate in the y direction) provide specific estimates of proportional, constant, and random errors, but comparison data must be presented graphically to detect limitations caused by nonlinearity and errant points. t-test parameters (bias, standard deviation of difference) provide estimates of constant and random errors, but only when proportional error is absent. Least-squares analysis can estimate proportional error and should be considered a prerequisite to t-test analysis. The correlation coefficient (r) is sensitive only to random error, but is not easily interpreted. Values for r, t, and F are not useful in making decisions on the acceptability of performance. These decisions should be judgments on the errors that are tolerable. Statistical tests can be applied in a manner that provides specific estimates of these errors.

322 citations
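
As a rough illustration of the parameters the abstract describes, the sketch below computes the least-squares slope, intercept, and standard error of estimate together with the paired-t bias and standard deviation of differences for a pair of methods; the data arrays are invented for illustration, not taken from the study.

```python
import numpy as np
from scipy import stats

# Hypothetical paired results from a method-comparison study
x = np.array([4.1, 5.0, 6.2, 7.1, 8.3, 9.0, 10.4, 11.2])  # comparison method
y = np.array([4.4, 5.3, 6.1, 7.6, 8.2, 9.5, 10.9, 11.8])  # test method

# Least-squares parameters: slope ~ proportional error, intercept ~ constant
# error, standard error of estimate (S_y.x) ~ random error about the line.
slope, intercept, r, p_value, slope_se = stats.linregress(x, y)
resid = y - (intercept + slope * x)
s_yx = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))

# t-test parameters: bias (mean difference) and SD of the differences.  These
# estimate constant and random error only when proportional error is absent.
d = y - x
bias, sd_diff = d.mean(), d.std(ddof=1)
t_stat, t_p = stats.ttest_rel(y, x)

print(f"slope={slope:.3f} intercept={intercept:.3f} S_y.x={s_yx:.3f}")
print(f"bias={bias:.3f} SD of differences={sd_diff:.3f} r={r:.3f} t={t_stat:.2f}")
```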


Journal ArticleDOI
TL;DR: In this paper, a discussion is given of some of the pertinent literature for estimating variances in errors of measurement, or the "imprecisions" of measurement when two or three instruments are used to take the same observations on a series of items or characteristics.
Abstract: A very important and yet widely misunderstood concept or problem in science and technology is that of precision and accuracy of measurement. It is therefore necessary to define the terms precision and accuracy (or imprecision and inaccuracy) clearly and analytically if possible. Also, we need to establish and develop appropriate statistical tests of significance for these measures, since generally a relatively small number of measurements will be made or taken in most investigations. In this paper a discussion is given of some of the pertinent literature for estimating variances in errors of measurement, or the “imprecisions” of measurement, when two or three instruments are used to take the same observations on a series of items or characteristics. Also, present techniques for comparing the imprecision of measurement of one instrument with that of a second instrument through the use of statistical tests of significance are reviewed, as well as procedures for detecting the significance of the difference i...

129 citations
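
The review surveys estimators of each instrument's "imprecision" from paired readings; the sketch below shows one classical (Grubbs-type) variance decomposition and a Pitman-Morgan style check on whether two instruments differ in imprecision, using simulated readings rather than any data from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated paired readings: two instruments measure the same 30 items.
true = rng.normal(100.0, 10.0, size=30)
inst1 = true + rng.normal(0.0, 2.0, size=30)   # imprecision sigma1 = 2
inst2 = true + rng.normal(0.0, 3.0, size=30)   # imprecision sigma2 = 3

s11, s22 = inst1.var(ddof=1), inst2.var(ddof=1)
s12 = np.cov(inst1, inst2, ddof=1)[0, 1]

# Grubbs-type estimators: with independent errors, the covariance estimates
# the variance of the true values, and the excess of each instrument's
# variance over it estimates that instrument's error variance.
var_err1 = s11 - s12
var_err2 = s22 - s12
print(f"estimated error variances: instrument 1 = {var_err1:.2f}, instrument 2 = {var_err2:.2f}")

# A crude significance check on the *difference* in imprecision uses the
# Pitman-Morgan idea: the error variances differ iff the sum and difference
# of the paired readings are correlated.
r, p = stats.pearsonr(inst1 + inst2, inst1 - inst2)
print(f"Pitman-Morgan correlation r = {r:.3f}, p = {p:.4f}")
```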


Journal ArticleDOI
TL;DR: In this paper, simple theories of government coalition formation are described and tested, and a statistical method is presented for comparing the theories and evaluating each of them relative to a certain null hypothesis.
Abstract: Several simple theories of government coalition formation are described and tested. A statistical method is presented for comparing the theories and evaluating each of them relative to a certain null hypothesis. The statistical tests are based upon the formation of 132 government coalitions in twelve countries of Western Europe during 1945–71.

123 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present methods for estimating and testing hypotheses about linear functions of the unknown parameters in a generalization of the growth curve model which allows missing data, and the estimators proposed are best asymptotically normal (BAN).

104 citations


Journal ArticleDOI
TL;DR: In this article, the authors test the robustness of adaptive regression to specification errors causing structural change over time, relative to ordinary least squares analysis with and without the autoregressive correction.
Abstract: Any econometric equation representing a complex behavioral or technical relationship is, of necessity, an approximation of reality. As such, it is subject to errors in specification and structural change over time. This problem is well recognized by econometricians. Duesenberry and Klein (1965) point out that ". . . as technology, institutional arrangements, tastes and managerial techniques change over time, the relationships represented by our equations inevitably change." Furthermore, when statistical tests are applied to econometric relationships, the hypothesis of structural stability is frequently rejected.[1] Some methods for dealing with structural change have evolved. Quandt (1957) has developed a maximum likelihood technique for estimating a point of structural change within a sample.[2] Klein and Evans (1967) adjust the intercepts of the Wharton Model to account for structural change.[3] The purpose of this paper is to test the robustness of Adaptive Regression (1973) to specification errors causing structural change over time, relative to ordinary least squares analysis with and without the autoregressive correction.[4] Since econometricians are inevitably faced with structural change and errors in specification, they should use a technique which is robust relative to such problems. The device most commonly used is to assume that the disturbances are subject to an autoregressive process. The autoregressive correction may frequently ameliorate the effects of misspecification and structural change, but it is doubtful whether such processes, except in rare instances, describe the true distribution of the disturbances. The economics literature seldom gives any justification for this scheme except that omitted variables may be subject to an autoregressive process or the structure of the model may be changing.[5] We suspect the reasons for the widespread use of the autoregressive correction are that it is a simple hypothesis, explains serial correlation in the disturbances, and can be dealt with efficiently. The adaptive regression model considered in this paper is equally simple but more general, explains serial correlation, and can also be dealt with efficiently.[6] In the next section the adaptive regression model is presented and the Bayesian estimators are developed. In section II the results of a Monte Carlo study are presented. Two models are considered for which data are generated by eleven different schemes. The estimation and forecasting efficiency of adaptive regression and ordinary least squares with and without the autoregressive correction are compared. Section III contains an analysis of the role of time trends in econometric relationships. In section IV the relative forecasting ability of the three estimation techniques is tested on real data. The three models suggested by
Received for publication February 10, 1972. Revision accepted for publication November 30, 1972. The authors acknowledge helpful comments of Professors F. G. Adams, R. Roll and R. Summers and the participants of the NBER conference on Bayesian Statistical Inference in Economics. Computations were executed on the University of Pennsylvania computer. [1] Examples of such tests include Brown (1966), Goldfield (1969) and Howrey (1970). One of the most extensive studies was done by Duffy (1969). [2] The Quandt technique is limited by the fact that it is mainly useful for finding stable subsamples. If structural change occurs often, it is not very useful. Rosenberg (1968) has used stepwise composition to develop the computationally efficient Aitken estimates of a model subject to structural change over time. His procedure, however, requires that the true covariance matrix of the disturbances be known up to a constant scale factor. [3] Adjusting the intercepts is an ad hoc method for keeping the model on track for ex ante forecasting. The intercepts are not assumed to change over the sample period, which is always much longer than the forecasting period. [4] The autoregressive correction assumes the error is subject to a first or second order autoregressive scheme. See Dhrymes (1969) for the maximum likelihood approach and Zellner and Tiao (1965) for the Bayesian development. The latter approach is used in this paper. [5] In fact, if omitted variables are subject to an autoregressive process, the disturbances will, in general, be subject to a more complicated process. [6] A test with sufficient power to differentiate between these two models (or others which result in serial correlation) using sample sizes generally available to econometricians does not appear to exist. Further, if one did, its usefulness would be limited as neither structure is likely to be an exact representation of reality. That one structure is more likely on the basis of the data does not imply that it will forecast better if, in fact, a third structure is generating the data.

65 citations
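
The paper's adaptive regression estimator is Bayesian and is not reproduced here; as a simplified stand-in for the kind of Monte Carlo comparison described in Section II, the sketch below contrasts plain OLS with an iterated Cochrane-Orcutt AR(1) "autoregressive correction" on data whose intercept drifts over the sample. All parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def cochrane_orcutt(y, X, n_iter=20):
    """Simple AR(1) autoregressive correction by iterated quasi-differencing."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        e = y - X @ beta
        rho = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
        y_s = y[1:] - rho * y[:-1]          # quasi-differenced response
        X_s = X[1:] - rho * X[:-1]          # quasi-differenced regressors
        beta = np.linalg.lstsq(X_s, y_s, rcond=None)[0]
    return beta

T = 120
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
# Structural change: the intercept drifts slowly over the sample period.
intercept = 1.0 + 0.02 * np.arange(T)
y = intercept + 0.5 * x + rng.normal(scale=0.5, size=T)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_ar1 = cochrane_orcutt(y, X)
print("OLS estimate of slope:", round(b_ols[1], 3))
print("AR(1)-corrected estimate of slope:", round(b_ar1[1], 3))
```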


Proceedings ArticleDOI
04 Jun 1973
TL;DR: Before an investigator can claim that his simulation model is a useful tool for studying behavior under new hypothetical conditions, he is well advised to check its consistency with the true system, as it exists before any change is made.
Abstract: Before an investigator can claim that his simulation model is a useful tool for studying behavior under new hypothetical conditions, he is well advised to check its consistency with the true system as it exists before any change is made. The success of this validation establishes a basis for confidence in results that the model generates under new conditions. After all, if a model cannot reproduce system behavior without change, then we hardly expect it to produce truly representative results with change.

The problem of how to validate a simulation model arises in every simulation study in which some semblance of a system exists. The space devoted to validation in Naylor's book Computer Simulation Experiments with Models of Economic Systems indicates both the relative importance of the topic and the difficulty of establishing universally applicable criteria for accepting a simulation model as a valid representation.

One way to approach the validation of a simulation model is through its three essential components: input, structural representation, and output. For example, the input consists of exogenous stimuli that drive the model during a run. Consequently, one would like to assure himself that the probability distributions and time series representations used to characterize input variables are consistent with available data. With regard to structural representation, one would like to test whether the mathematical and logical representations conflict with the true system's behavior. With regard to output, one could feel comfortable with a simulation model if it behaved similarly to the true system when exposed to the same input.

Interestingly enough, the greatest effort in model validation of large econometric models has concentrated on structural representation. No doubt this is due to the fact that regression methods, whether the simple least-squares method or a more comprehensive simultaneous-equations technique, in addition to providing procedures for parameter estimation, facilitate hypothesis testing regarding structural representation. Because of the availability of these regression methods, it seems hard to believe that at least some part of a model's structural representation cannot be validated. Lamentably, some researchers choose to discount and avoid the use of available test procedures.

With regard to input analysis, techniques exist for determining the temporal and probabilistic characteristics of exogenous variables. For example, the autoregressive moving-average schemes described in Box and Jenkins' book, Time Series Analysis: Forecasting and Control, are available today in canned statistical computer programs. Maximum likelihood estimation procedures are available for most common probability distributions, and tables based on sufficient statistics have begun to appear in the literature. Regardless of how little data is available, a model's use would benefit from a conscientious effort to characterize the mechanism that produced those data.

As mentioned earlier, a check of consistency between model and system output in response to the same input would be an appropriate step in validation. A natural question that arises is: what form should the consistency check take? One approach might go as follows: let X1, ..., Xn be the model's output in n consecutive time intervals and let Y1, ..., Yn be the system's output for n consecutive time intervals in response to the same stimuli.
Test the hypothesis that the joint probability distribution of X1, ..., Xn is identical with that of Y1, ..., Yn. My own feeling is that this test is too stringent and creates a misplaced emphasis on statistical exactness. I would prefer to frame output validation in more of a decision-making context. In particular, one question that seems useful to answer is: in response to the same input, does the model's output lead decision makers to take the same action that they would take in response to the true system's output? While less stringent than the test first described, its implementation requires access to decision makers. This seems to me to be a desirable requirement, for only through continual interaction with decision makers can an investigator hope to gauge the sensitive issues to which his model should be responsive and the degree of accuracy that these sensitivities require.

51 citations
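
The author deliberately stops short of prescribing a specific output test; purely as an illustration of a less stringent consistency check on marginal output distributions, the sketch below applies a two-sample Kolmogorov-Smirnov test to invented model and system output series.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical output series: X from the simulation model, Y from the real
# system, both observed over n consecutive intervals under the same input.
n = 60
model_out = rng.gamma(shape=2.0, scale=5.0, size=n)
system_out = rng.gamma(shape=2.0, scale=5.5, size=n)

# Two-sample Kolmogorov-Smirnov test: are the two marginal output
# distributions consistent?  (It ignores serial dependence, so it is at best
# a screening device, not a full test of equal joint distributions.)
ks_stat, p_value = stats.ks_2samp(model_out, system_out)
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Marginal output distributions look inconsistent at the 5% level.")
else:
    print("No evidence of inconsistency in the marginal output distributions.")
```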


Journal ArticleDOI
TL;DR: A study to determine the adequacy of statistical testing in communication research discovered a large amount of inconsistency in the reporting of statistical findings; a condition which hampers any clear understanding of what the researcher did and what he found.
Abstract: A study was conducted to determine the adequacy of statistical testing in communication research. Every article published in the 1971–72 issues of the Journal of Communication was studied. For those studies employing statistical testing, we computed the power of those tests and the observed effect size. We found the average a priori power to be .55, a figure which is probably much lower than communication researchers would desire. While the average observed effect size was high, we found little evidence of its being used in the interpretation of findings. This study also discovered a large amount of inconsistency in the reporting of statistical findings, a condition which hampers any clear understanding of what the researcher did and what he found. In this regard we suggest some guidelines for presenting statistical data.

50 citations
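
The article's power survey is not reproducible from the abstract alone; the sketch below merely shows the kind of calculation involved, approximating the power of a two-sided two-sample t test for an assumed standardized effect size and per-group sample size (illustrative numbers only).

```python
from math import sqrt
from scipy.stats import norm

def approx_power_two_sample_t(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample t test (normal approximation)
    for standardized effect size d and n subjects per group."""
    ncp = d * sqrt(n_per_group / 2.0)          # noncentrality parameter
    z_crit = norm.ppf(1.0 - alpha / 2.0)
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

# Illustrative values: a medium effect with the modest samples common in the
# surveyed journals gives power well below the .80 usually recommended.
for n in (10, 20, 40):
    print(n, round(approx_power_two_sample_t(0.5, n), 2))
```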




Journal ArticleDOI
TL;DR: The division is sometimes made as one between hypothesis testing and estimation, but the tool of hypothesis testing can be used as an interpretive aid in conjunction with the tools of estimation, so I prefer to make the divisions in terms of the uses made by the biologist.
Abstract: Statistics can be used in a biological assay as either a method of checking on the validity of conclusions or as a guide to the interpretation of experimental results. The division is sometimes made as one between hypothesis testing and estimation. However, the tool of hypothesis testing can be used as an interpretive aid in conjunction with the tools of estimation, so I prefer to make the divisions in terms of the uses made by the biologist. If we are to apply statistical procedures to the dominant-lethal trial as a check of validity of conclusions, we would have to consider the inherent theoretical faults in the design. For instance, the animals actually treated constitute a small set of males and any conclusions must be conditional upon that set of males; thus, the probability levels computed are, themselves, random variables, functions of the random choice of males. Or, a typical trial will involve more than one test compound or dose of a test compound against the controls. Thus, any formal probability level computed must take into account questions of multiple comparisons. With comments like these, the statistician can sit in the marble halls of his floating island and throw bolts of doubt at the whole procedure. But, almost any biological assay can be made to suffer from this kind of criticism. At this stage in the develop-

20 citations


Journal ArticleDOI
TL;DR: The power of statistical tests recently appearing in the JEM was determined using the power calculation guidelines proposed by Cohen (1969) as discussed by the authors, and the results indicated that power was generally below .50 for small effect sizes and above .50 for medium and large effect sizes.
Abstract: The power of statistical tests recently appearing in the JEM was determined using the power calculation guidelines proposed by Cohen (1969). All the articles containing tests of significance were surveyed. The results indicated that power was generally below .50 for small effect sizes and above .50 for medium and large effect sizes. A suggestion for reporting statistical results to include power of the tests was made.
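
As an illustration of Cohen's (1969) guidelines rather than the article's own survey, the sketch below computes exact power of a two-sample t test via the noncentral t distribution for Cohen's small, medium, and large effect sizes; the per-group n of 30 is an arbitrary choice.

```python
from math import sqrt
from scipy.stats import nct, t as t_dist

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Exact power of a two-sided two-sample t test via the noncentral t."""
    df = 2 * n_per_group - 2
    ncp = d * sqrt(n_per_group / 2.0)
    t_crit = t_dist.ppf(1.0 - alpha / 2.0, df)
    return nct.sf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)

# Cohen's conventions: small d = 0.2, medium d = 0.5, large d = 0.8.
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(label, round(power_two_sample_t(d, n_per_group=30), 2))
```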

Journal ArticleDOI
TL;DR: In this article, the authors derived statistical tests of this hypothesis when the relation is specified with the exception of the additive constant, and then the results were reinterpreted in terms of the possible existence of an unspecified perfect linear relation between true scores of two psychological tests.
Abstract: We concern ourselves with the hypothesis that two variables have a perfect disattenuated correlation, hence measure the same trait except for errors of measurement. This hypothesis is equivalent to saying, within the adopted model, that true scores of two psychological tests satisfy a perfect linear relation. Statistical tests of this hypothesis are derived when the relation is specified with the exception of the additive constant. Two approaches are presented and various assumptions concerning the error parameters are used. Then the results are reinterpreted in terms of the possible existence of an unspecified perfect linear relation between true scores of two psychological tests. A numerical example is appended by way of illustration.
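
The paper derives formal tests under specific error assumptions, which are not reproduced here; the sketch below only computes the quantity at issue, the correlation disattenuated for unreliability, from invented values of the observed correlation and the two reliabilities.

```python
from math import sqrt

# Illustrative (invented) values: observed correlation between two tests and
# their reliability coefficients.
r_xy = 0.62      # observed correlation
r_xx = 0.75      # reliability of test X
r_yy = 0.70      # reliability of test Y

# Correction for attenuation: the estimated correlation between true scores.
# The hypothesis of interest is that this disattenuated correlation equals 1,
# i.e. the two tests measure the same trait apart from errors of measurement.
rho_true = r_xy / sqrt(r_xx * r_yy)
print(f"disattenuated correlation = {rho_true:.3f}")
```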

Journal ArticleDOI
TL;DR: Empirical evidence obtained through the use of a Monte-Carlo sampling process justifies theUse of parametric inference techniques with the OHI-S and PI in some instances.
Abstract: Empirical evidence obtained through the use of a Monte-Carlo sampling process justifies the use of parametric inference techniques with the OHI-S and PI in some instances.

Journal ArticleDOI
TL;DR: An iterative algorithm for solving the inverse problem in ecological modelling (the problem of using empirical population data to evaluate the parameters of a given model) and certain results from the theory of statistical hypothesis testing provide a method by which the many possible interspecies interactions in an ecosystem can be examined and the dominant interactions can be identified.

Journal ArticleDOI
TL;DR: A system is proposed which combines hypothesis testing and prediction to estimate the future value of a stochastic process whose statistics are known only to belong to some finite set of possible hypotheses.

01 Oct 1973
TL;DR: In this paper, the simple notion of relative conditional expectation is combined with Girsanov's Theorem and, in places, the Ito differential rule to provide a direct and attractive approach to hypothesis testing and detection, estimation, and representation theorems.
Abstract: : The simple notion of relative conditional expectation is combined with Girsanov's Theorem and, in places, the Ito differential rule to provide a direct and attractive approach to hypothesis testing and detection, estimation, and representation theorems. (Author)

01 Feb 1973
TL;DR: In this paper, the authors summarized several methods for obtaining various useful types of models, particularly ones with U-shaped hazard functions, including mixed models, composite models, components in series, nonhomogeneous Poisson processes, polynomial models and models obtained by transformations from other well-known models.
Abstract: : Section I summarizes several methods for obtaining various useful types of models, particularly ones with U-shaped hazard functions. These models include mixed models, composite models, components in series, nonhomogeneous Poisson Processes, polynomial models and models obtained by transformations from other well-known models. Some point estimation and hypothesis testing results are given for linear and quadratic hazard function models in Section 2. In Section 3 point and interval estimation procedures based on the maximum likelihood estimators for the location and scale parameters of the logistic distribution are considered. (Author Modified Abstract)
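
As a sketch of two of the ingredients mentioned in the abstract (assumed forms and invented parameter values, not the report's own models), the code below evaluates a quadratic, U-shaped hazard with its implied survival function and fits the location and scale of a logistic distribution by maximum likelihood.

```python
import numpy as np
from scipy import stats

# A quadratic (U-shaped) hazard h(t) = a + b*t + c*t**2 and its survival
# function S(t) = exp(-H(t)), where H is the cumulative hazard.
a, b, c = 0.5, -0.2, 0.03          # invented coefficients giving a bathtub shape
t = np.linspace(0.0, 10.0, 6)
hazard = a + b * t + c * t ** 2
cum_hazard = a * t + b * t ** 2 / 2.0 + c * t ** 3 / 3.0
survival = np.exp(-cum_hazard)
print("h(t):", np.round(hazard, 3))
print("S(t):", np.round(survival, 3))

# Maximum-likelihood estimates of the location and scale of a logistic
# distribution from a simulated sample.
rng = np.random.default_rng(3)
sample = rng.logistic(loc=5.0, scale=2.0, size=200)
loc_hat, scale_hat = stats.logistic.fit(sample)
print(f"logistic MLE: location = {loc_hat:.2f}, scale = {scale_hat:.2f}")
```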


Journal ArticleDOI
TL;DR: In this paper, a hierarchy of hypotheses for parallel psychological tests with respect to their means and/or covariances has been developed, and the maximum likelihood estimators under various models and under the assumption of normally distributed test scores have been obtained, as well as related likelihood-ratio statistics.
Abstract: Suppose that we are utilizing k different psychological tests, each having one subtest in common. Of particular concern is the hypothesis that these tests are parallel with respect to their means and/or covariances. A complete hierarchy of hypotheses for this situation has been developed. For example, Hm'vc is the hypothesis that the tests are parallel only with respect to the means of the common test, but with respect to the covariances of both tests. (The prime indicates equality for the common test only.) This hypothesis might be tested against Hvc, the hypothesis of parallelism with respect to the covariances. Other hypotheses considered are Hmvc, Hm'vc, Hm'vc and Hvc. Maximum-likelihood estimators under the various models (and under the assumption of normally distributed test scores) have been obtained, as well as the related likelihood-ratio statistics. Approximate distributions of the likelihood-ratio statistics are worked out, so that the tests can be applied, and an example of their use is provided.

Journal Article
TL;DR: In multivariate analysis there exists a large class of important hypothesis testing problems all of which may be tested by a set of criteria that depend functionally on a matrix variate with a Beta distribution as mentioned in this paper.
Abstract: In multivariate analysis there exists a large class of important hypothesis testing problems all of which may be tested by a set of criteria that depend functionally on a matrix variate with a Beta distribution. Examples of such hypotheses are the following. (i) Hypothesis of independence of two sets of variates, considering the second set fixed. (We shall deal later with the case when both sets are comprised of random variables.) (ii) Linear hypotheses about regression coefficients. (iii) General linear hypothesis in MANOVA.


Journal ArticleDOI
TL;DR: In this paper, a model for dichotomous congeneric items is presented which has mean errors of zero, dichotomous true scores that are uncorrelated with errors, and errors that are mutually uncorrelated.
Abstract: To resolve recent controversy between Klein and Cleary and Levy, a model for dichotomous congeneric items is presented which has mean errors of zero, dichotomous true scores that are uncorrelated with errors, and errors that are mutually uncorrelated.


Journal ArticleDOI
TL;DR: In this paper, a non-central F-distribution for testing a partial null-hypothesis in the analysis of variance of a randomized PBIB design with m associate classes under the Neyman model is presented.
Abstract: As the final results of a series of our investigations [5]–[8], [10], [11], we present in this article a probability distribution, a non-central F-distribution, which is asymptotically equivalent in the sense of type (M)d to the power function of the F-statistic for testing a partial null-hypothesis in the analysis of variance of a randomized PBIB design with m associate classes under the Neyman model, which is a linear model taking both technical and unit errors into account. Thus this seems to be the final answer we have been after from the very beginning of our investigation.
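
The asymptotic equivalence result concerns a specific PBIB/Neyman-model setting; the sketch below only illustrates the generic use of a non-central F distribution to evaluate the power of an F test, with arbitrary degrees of freedom and noncentrality.

```python
from scipy.stats import f as f_dist, ncf

# Power of an F test with (df1, df2) degrees of freedom at level alpha when
# the noncentrality parameter is lam: P(F' > critical value), F' non-central F.
df1, df2, alpha, lam = 4, 30, 0.05, 8.0   # illustrative values only
f_crit = f_dist.ppf(1.0 - alpha, df1, df2)
power = ncf.sf(f_crit, df1, df2, lam)
print(f"critical F = {f_crit:.2f}, power = {power:.3f}")
```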

Journal ArticleDOI
John Gaito
TL;DR: In this article, the authors discuss the possible means of analyzing the null hypothesis (Ho) within repeated measurements designs and indicate the consequences of each of these analyses, including multivariate tests, the usual univariate F test, Box's univariate approximate F test, and the Geisser and Greenhouse univariate conservative F test.
Abstract: In spite of the fact that serious distortions of p levels can ensue in the analysis of repeated measurements, and that a number of individuals (e.g. Box, 1954; Geisser and Greenhouse, 1958; Gaito, 1961; Lana and Lubin, 1963; Collier, Baker, Mandeville, and Hayes, 1967) discussed these problems, this aspect seems to have been disregarded by many psychological researchers as indicated by a survey of recent issues of a few psychological journals. The purpose of this paper is to discuss the possible means of analyzing the null hypothesis (Ho) within repeated measurements designs and to indicate the consequences of each of these analyses. In repeated measurements designs there are four possible ways to test Ho: multivariate tests, the usual univariate F test, Box's univariate approximate F test, and the Geisser and Greenhouse univariate conservative F test.
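
Of the four options listed, the conservative test is the simplest to illustrate: the sketch below computes the usual repeated-measures F on invented data and compares the ordinary p value with the Geisser-Greenhouse lower-bound p value obtained by dividing both degrees of freedom by (k - 1).

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(11)

# Invented repeated-measures data: n subjects each measured under k conditions.
n, k = 12, 4
subject_effect = rng.normal(0.0, 1.0, size=(n, 1))
condition_effect = np.array([0.0, 0.3, 0.6, 0.9])
data = 10.0 + subject_effect + condition_effect + rng.normal(0.0, 1.0, size=(n, k))

grand = data.mean()
ss_subjects = k * np.sum((data.mean(axis=1) - grand) ** 2)
ss_conditions = n * np.sum((data.mean(axis=0) - grand) ** 2)
ss_total = np.sum((data - grand) ** 2)
ss_error = ss_total - ss_subjects - ss_conditions

F = (ss_conditions / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))

# Usual univariate F test vs. the Geisser-Greenhouse conservative test, which
# guards against violated sphericity by dividing both df by (k - 1).
p_usual = f_dist.sf(F, k - 1, (n - 1) * (k - 1))
p_conservative = f_dist.sf(F, 1, n - 1)
print(f"F = {F:.2f}, usual p = {p_usual:.4f}, conservative p = {p_conservative:.4f}")
```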

Journal ArticleDOI
TL;DR: In this article, a Type III error is identified within the context of traditional classical hypothesis testing as an inferential error of direction, and a recommendation is made that errors of the third kind should be presented in elementary statistics courses as part of the section dealing with quasi-experimentation.
Abstract: A Type III error is identified within the context of traditional-classical hypothesis testing as an inferential error of direction. Its relevance for educational research is exemplified in the form of Kaiser’s (10) 2-tailed directional null hypothesis model. A recommendation is made that “errors of the third kind” should be presented in elementary statistics courses as part of the section dealing with quasi-experimentation.

01 Jun 1973
TL;DR: Bayesian fixed time tests, Bayesian/Classical fixed time tests, and Sequential Bayesian tests were developed and tabulated, and further study showed that updates in the prior distribution are easily made.
Abstract: ...(called Bayes/Classical in this report) when the producer and consumer cannot agree on a prior distribution; Develop methods of updating existing prior distributions; Develop a preliminary military standard for BRDT; Investigate some special problems; Fit additional prior distributions. Bayesian fixed time tests, Bayesian/Classical fixed time tests, and Sequential Bayesian tests were developed and tabulated. These tests form an essential part of the preliminary military standard which was also developed. Additional fits of the inverted gamma distribution reconfirmed its choice as a prior distribution, and further study showed that updates in the prior distribution are easily made. A test based on probability of acceptance is satisfactory to test for shifts in the prior distribution. Tables were developed giving the truncation points for the sequential tests. At this time, no satisfactory solution has been found for placing more than one equipment on test at a time.
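
The report's tables are not reproduced here; the sketch below only illustrates why updating an inverted gamma prior is "easily made": with exponential failure times, the prior on the MTBF is conjugate, so the posterior adds the observed failure count to the shape and the accumulated test time to the scale. All numbers are invented.

```python
from scipy.stats import invgamma

# Conjugate update of an inverted gamma prior on the MTBF (theta) of an
# exponential lifetime model: prior IG(alpha, beta), data = r failures in
# total test time T  ->  posterior IG(alpha + r, beta + T).
alpha_prior, beta_prior = 3.0, 400.0      # prior shape and scale (hours)
failures, test_time = 2, 500.0            # observed on test

alpha_post = alpha_prior + failures
beta_post = beta_prior + test_time

# Posterior mean of the MTBF for an inverted gamma is beta / (alpha - 1).
prior_mean = beta_prior / (alpha_prior - 1.0)
post_mean = beta_post / (alpha_post - 1.0)
print(f"prior mean MTBF     = {prior_mean:.1f} h")
print(f"posterior mean MTBF = {post_mean:.1f} h")

# A demonstration-style acceptance check against a (hypothetical) requirement.
required_mtbf = 150.0
p_meet = invgamma.sf(required_mtbf, alpha_post, scale=beta_post)
print(f"posterior P(MTBF > {required_mtbf:.0f} h) = {p_meet:.3f}")
```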

Journal ArticleDOI
TL;DR: The concept of function-free birameters is introduced and general classes of two-sample hypothesis testing problems based on these birameters are defined; a method for generating test procedures for these hypotheses is proposed, and this generating scheme requires that we obtain, under those assumptions that we feel are justifiable, estimators for the underlying distribution functions.
Abstract: The concept of a function-free birameter is introduced and general classes of two-sample hypothesis testing problems based on these birameters are defined. A method for generating test procedures for these hypotheses is proposed; this generating scheme requires that we obtain, under those assumptions that we feel are justifiable, estimators for the underlying distribution functions. The resulting classes of test statistics are based on the same set of assumptions. A variety of examples illustrating the broad applicability of the approach are discussed, some of which yield new test statistics for specific problems.

Journal ArticleDOI
TL;DR: The basic principles of hypothesis testing are reviewed in this paper, including the development of the hypothesis, the statistical assumptions made, and the test of hypothesis, and appropriate experimental design and sampling technique for evaluation of hypotheses posed are discussed.
Abstract: The basic principles of hypothesis testing are reviewed, including the development of the hypothesis, the statistical assumptions made, and the test of the hypothesis. The appropriate experimental design and sampling technique for evaluation of hypotheses posed are discussed. Because the analysis of variance involving the F-test should be used in a wide variety of geological experiments, emphasis is placed on this analysis. Many geological experiments result in the measurement of one or more factors on a continuous scale, whereas others are recorded in a discrete fashion. This necessitates the use of a covariance analysis to evaluate the effect of discrete and continuous factors in the same model. Orthogonal comparisons are discussed as they are used to evaluate specific hypotheses following the general test of hypothesis in the analysis of variance or covariance. All procedures discussed are illustrated using actual palynofloral data.
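
As a small illustration of the overall F test followed by an orthogonal comparison (on made-up measurements standing in for palynofloral data, not the paper's own), consider:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Invented abundance measurements for three sampled horizons (groups).
g1 = rng.normal(20.0, 4.0, size=10)
g2 = rng.normal(24.0, 4.0, size=10)
g3 = rng.normal(25.0, 4.0, size=10)

# Overall one-way analysis of variance F test.
F, p = stats.f_oneway(g1, g2, g3)
print(f"ANOVA: F = {F:.2f}, p = {p:.4f}")

# Orthogonal contrast following the overall test: horizon 1 vs the mean of
# horizons 2 and 3 (coefficients 2, -1, -1 sum to zero).
groups = [g1, g2, g3]
coef = np.array([2.0, -1.0, -1.0])
means = np.array([g.mean() for g in groups])
ns = np.array([len(g) for g in groups])
mse = sum(np.sum((g - g.mean()) ** 2) for g in groups) / (sum(ns) - len(groups))
contrast = coef @ means
se = np.sqrt(mse * np.sum(coef ** 2 / ns))
t_val = contrast / se
df_err = sum(ns) - len(groups)
p_contrast = 2.0 * stats.t.sf(abs(t_val), df_err)
print(f"contrast t = {t_val:.2f}, p = {p_contrast:.4f}")
```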

Journal ArticleDOI
TL;DR: In this article, the problem of order discrimination of a linear system with noisy observations is viewed as a sequence of hypothesis tests where, at the nth step, a choice is made between models of order n and (n+1).
Abstract: The problem considered in this paper is the order discrimination of a linear system with noisy observations. The order discrimination problem is viewed as a sequence of hypothesis tests; at the nth step, a choice is made between models of order n and (n+1). The sequence is terminated the first time that the lower-order hypothesis is preferable. A series of detailed digital computer studies were performed to determine the effectiveness of the proposed order discrimination scheme.
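
The paper's scheme is not specified in the abstract beyond the nesting idea; the sketch below implements a simplified version for autoregressive models, testing order n against n + 1 with an F test on residual sums of squares and stopping the first time the lower order is preferred.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(2)

# Simulate a noisy AR(2) series; the scheme below should typically stop at 2.
T = 400
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal(scale=1.0)

def ar_rss(y, order, start):
    """Residual sum of squares of a least-squares AR(order) fit on y[start:]."""
    Y = y[start:]
    X = np.column_stack([y[start - i: len(y) - i] for i in range(1, order + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return np.sum((Y - X @ beta) ** 2), len(Y)

alpha = 0.05
for n in range(1, 8):
    rss_n, m = ar_rss(y, n, n + 1)       # both fits use the same sample
    rss_n1, _ = ar_rss(y, n + 1, n + 1)
    # F test of H0: order n is adequate, against the order-(n+1) alternative.
    F = (rss_n - rss_n1) / (rss_n1 / (m - (n + 1)))
    if f_dist.sf(F, 1, m - (n + 1)) > alpha:
        print(f"order {n} preferred over order {n + 1}; stop.")
        break
    print(f"order {n + 1} preferred over order {n} (F = {F:.1f})")
```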