
Showing papers on "Statistical hypothesis testing published in 1969"


Book
01 Jan 1969
TL;DR: This book covers describing a single variable, probability, confidence intervals, hypothesis testing (general principles and one-sample tests for means, proportions, counts and rates), two-group comparisons, regression and correlation between two quantitative variables, and the control of confounding.
Abstract: 1. Describing Data -- A Single Variable. 2. Probability, Populations and Samples. 3. Associations: Chance, Confounded or Causal? 4. Confidence Intervals: General Principles, Proportions, Means, Medians, Counts and Rates. 5. Hypothesis Testing: General Principles and One-sample Tests for Means, Proportions, Counts and Rates. 6. Epidemiological and Clinical Research Methods. 7. Confidence Intervals and Hypothesis Tests: Two-group Comparisons. 8. Sample Size Determination. 9. Comparison of More than Two Independent Groups. 10. Associations between Two Quantitative Variables: Regression and Correlation. 11. Multivariate Analysis and the Control of Confounding. 12. Bias and Measurement Error. Bibliography. Appendix A Computational Shortcuts. Appendix B Statistical Tables. Appendix C A "Cookbook" of Hypothesis Tests and Confidence Intervals. Appendix D World Medical Association Declaration of Helsinki

497 citations


Journal ArticleDOI
TL;DR: The problem of finding shortest-path probability distributions in graphs whose branches are weighted with random lengths is considered, and an exact method for computing the probability distribution is given, as well as methods based on hypothesis testing and statistical estimation.
Abstract: This paper considers the problem of finding shortest-path probability distributions in graphs whose branches are weighted with random lengths, examines the consequences of various assumptions concerning the nature of the available statistical information, and gives an exact method for computing the probability distribution, as well as methods based on hypothesis testing and statistical estimation. It presents Monte Carlo results and, based on these results, it develops an efficient method of hypothesis testing. Finally, it discusses briefly the pairwise comparison of paths.

362 citations
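
To make the Monte Carlo side of this concrete, here is a minimal sketch (not the paper's exact procedure) of estimating the shortest-path length distribution in a small directed graph whose branch lengths are random. The graph, the exponential length distributions, and the sample size are illustrative assumptions.

```python
# Monte Carlo sketch: distribution of the shortest s-t path length when
# branch lengths are random. The graph, length distributions, and sample
# size below are illustrative assumptions, not taken from the paper.
import heapq
import random

EDGES = {  # node -> list of (neighbor, mean length)
    "s": [("a", 1.0), ("b", 2.0)],
    "a": [("b", 0.5), ("t", 2.5)],
    "b": [("t", 1.5)],
    "t": [],
}

def dijkstra(lengths, source="s", target="t"):
    """Shortest path length for one realization of the edge lengths."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in lengths[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def sample_lengths():
    """One realization: each branch length is exponential with the given mean."""
    return {u: [(v, random.expovariate(1.0 / mean)) for v, mean in nbrs]
            for u, nbrs in EDGES.items()}

random.seed(0)
draws = [dijkstra(sample_lengths()) for _ in range(20000)]
for t in (1.0, 2.0, 3.0, 4.0):
    p = sum(d <= t for d in draws) / len(draws)
    print(f"P(shortest path <= {t}) ~ {p:.3f}")
```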


Journal ArticleDOI
TL;DR: This paper provided estimates of the statistical significance of results yielded by Kruskal's non-metric multidimensional scaling, revealing the relative frequency with which apparent structure is erroneously found in unstructured data.
Abstract: Recent advances in computer based psychometric techniques have yielded a collection of powerful tools for analyzing nonmetric data. These tools, although particularly well suited to the behavioral sciences, have several potential pitfalls. Among other things, there is no statistical test for evaluating the significance of the results. This paper provides estimates of the statistical significance of results yielded by Kruskal's nonmetric multidimensional scaling. The estimates, obtained from attempts to scale many randomly generated sets of data, reveal the relative frequency with which apparent structure is erroneously found in unstructured data. For a small number of points (i.e., six or seven) it is very likely that a good fit will be obtained in two or more dimensions when in fact the data are generated by a random process. The estimates presented here can be used as a bench mark against which to evaluate the significance of the results obtained from empirically based nonmetric multidimensional scaling.

175 citations
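
The benchmarking idea can be sketched as follows: scale many random, structureless dissimilarity matrices with nonmetric MDS and record the resulting stress values as a null reference distribution. This sketch uses scikit-learn's SMACOF-based nonmetric MDS as a stand-in for Kruskal's original program; the point count and number of replications are arbitrary choices, and the stress_ attribute is raw or normalized depending on the scikit-learn version.

```python
# Null benchmark for nonmetric MDS stress, in the spirit of the study above.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_points, n_reps = 6, 50
stresses = []
for _ in range(n_reps):
    # Random, structureless dissimilarities among n_points objects.
    d = rng.random((n_points, n_points))
    d = (d + d.T) / 2.0
    np.fill_diagonal(d, 0.0)
    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
              n_init=4, random_state=0)
    mds.fit(d)
    stresses.append(mds.stress_)  # raw or normalized, depending on sklearn version

# An empirically obtained stress should be compared against this null spread
# before two-dimensional structure is taken seriously.
print("null stress: mean %.4f, 5th percentile %.4f" %
      (np.mean(stresses), np.percentile(stresses, 5)))
```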


Journal ArticleDOI
TL;DR: In this paper, the authors considered the problem of estimating the distribution of true scores from the observed scores for a group of examinees given that the frequency distribution of the errors of measurement is known.
Abstract: The following problem is considered: Given that the frequency distribution of the errors of measurement is known, determine or estimate the distribution of true scores from the distribution of observed scores for a group of examinees. Typically this problem does not have a unique solution. However, if the true-score distribution is “smooth,” then any two smooth solutions to the problem will differ little from each other. Methods for finding smooth solutions are developed a) for a population and b) for a sample of examinees. The results of a number of tryouts on actual test data are summarized.

87 citations
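
As an illustration of the general idea (not the authors' algorithm), the sketch below recovers a smooth true-score distribution from an observed-score distribution when the error distribution is known, by penalized least squares: the score grid, the normal error kernel, and the roughness-penalty weight are all assumptions made for the example.

```python
# Illustrative sketch: recover a smooth true-score distribution f from an
# observed-score distribution g when the error kernel is known.
import numpy as np

scores = np.arange(0, 41)                       # possible score values
# Known error kernel: observed = true + normal(0, sd=2) error, discretized.
sd = 2.0
K = np.exp(-0.5 * ((scores[:, None] - scores[None, :]) / sd) ** 2)
K /= K.sum(axis=0, keepdims=True)               # columns sum to 1

# A "true" distribution and the observed distribution it implies (for demo).
true = np.exp(-0.5 * ((scores - 22) / 5.0) ** 2)
true /= true.sum()
observed = K @ true

# Smooth solution: minimize ||K f - g||^2 + lam * ||second differences of f||^2.
D2 = np.diff(np.eye(len(scores)), n=2, axis=0)  # second-difference operator
lam = 0.5
A = np.vstack([K, np.sqrt(lam) * D2])
b = np.concatenate([observed, np.zeros(D2.shape[0])])
f_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
f_hat = np.clip(f_hat, 0, None)
f_hat /= f_hat.sum()

print("max abs error of recovered true-score distribution:",
      float(np.abs(f_hat - true).max()))
```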


Journal ArticleDOI
TL;DR: For a general multivariate linear hypothesis testing problem, a class of permutationally (conditionally) distribution-free tests is proposed and studied in this article, along with a generalization of the elegant results of Hajek (1968) to the multistatistics and multivariate situations.
Abstract: For a general multivariate linear hypothesis testing problem, a class of permutationally (conditionally) distribution-free tests is proposed and studied. The asymptotic distribution theory of the proposed class of test statistics is studied along with a generalization of the elegant results of Hajek (1968) to the multistatistics and multivariate situations. Asymptotic power and optimality of the proposed tests are established and a characterization of the multivariate multisample location problem [cf. Puri and Sen (1966)] in terms of the proposed linear hypothesis is also considered.

76 citations
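
The "permutationally (conditionally) distribution-free" idea can be illustrated with a minimal two-sample multivariate location example using a rank-sum-type statistic; this is only a sketch of the conditional reference distribution, not the specific class of statistics studied in the paper, and the simulated data are assumptions.

```python
# Minimal permutation-test sketch for a two-sample multivariate location
# hypothesis, using a rank-based statistic and the permutation distribution
# of group labels as the (conditional) reference distribution.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(15, 3))          # group 1
y = rng.normal(0.4, 1.0, size=(12, 3))          # group 2, shifted location
pooled = np.vstack([x, y])
n1 = len(x)

def statistic(data, n_first):
    # Sum over coordinates of squared differences in mean ranks.
    ranks = np.apply_along_axis(rankdata, 0, data)
    s1 = ranks[:n_first].mean(axis=0)
    s2 = ranks[n_first:].mean(axis=0)
    return float(np.sum((s1 - s2) ** 2))

obs = statistic(pooled, n1)
perm = []
for _ in range(5000):
    idx = rng.permutation(len(pooled))
    perm.append(statistic(pooled[idx], n1))
p_value = (1 + sum(t >= obs for t in perm)) / (1 + len(perm))
print(f"observed statistic {obs:.2f}, permutation p-value {p_value:.4f}")
```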




Journal ArticleDOI
TL;DR: In this paper, the authors demonstrate the importance of using disaggregated data even when micro-components exhibit different behaviors, and demonstrate that models estimated from micro data will give generally superior out-of-sample forecasts.
Abstract: In a previous article with Professor Harold Watts, the authors demonstrated empirically the loss of information in the parameter estimators when data are aggregated prior to computing least-squares regressions [3]. These results came from simulations with a simple economic model containing identical microcomponents. Specifically, in addition to the error term, each component spent 0.9 of its previous income and 0.2 of its cash balance. The main point of our previous paper was that estimation prior to aggregation yielded substantially greater precision in the estimates of the parameters and their standard errors than did estimation of the same parameters after aggregation. The implications of this for hypothesis testing and the development of satisfactory policy response models seemed obvious. On the basis of a variety of evidence, including the paper with Watts and a paper by Orcutt [4], the case for seeking and frequently using disaggregated data seemed strong but one nagging concern remained. Suppose, as seems likely, the microcomponents exhibit different behaviors. In this case it might not be sensible to pool the data and treat it as a single sample from a single universe. However, if estimators from each micro equation are computed separately, would it still be desirable to use disaggregated data instead of data aggregated over all components? This turned out to be the case with identical components but would it be with nonidentical components in which something more than constant terms were different? This paper copes directly with this issue, and we demonstrate the importance of using disaggregated data even when microcomponents exhibit different behaviors. We do not deal with cases where microcomponents have nonlinear relations, but the need for disaggregated data in such cases seems fairly obvious without Monte Carlo experiments. If we wish to compare the accuracy of estimation at different levels of aggregation, we need a measure of merit different from the extent of bias and variance of parameter estimators, which we used in our previous study, because in an aggregate model whose components have different behaviors, the expected values of the estimators may be meaningless or nonstationary [Zellner, pp. 3-5]. Therefore, we use the accuracy of the out-of-sample forecasts to measure the merit of the estimated equations. In particular, we forecast the aggregate expenditure for the eight time periods following the last sample period. The root-mean-square forecast errors from models based on data at different levels of aggregation provide the yardstick for comparisons. Our results suggest that models estimated from micro data will give generally superior out-of-sample forecasts. This finding is at variance with the belief that one reaps an "aggregation gain" by aggregating the micro data prior to estimation. The concept of a possible aggregation gain was formalized in a 1960 article in this Review by Grunfeld and Griliches:

59 citations
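
The micro-versus-aggregate comparison can be sketched in a few lines: estimate the same linear relation from micro data and from aggregated data, then compare out-of-sample forecasts of the aggregate by root-mean-square error. The two-component model, its coefficients, and the sample sizes below are illustrative assumptions, not the authors' experimental design.

```python
# Sketch of the aggregation comparison judged by out-of-sample forecast RMSE.
import numpy as np

rng = np.random.default_rng(2)
T, H = 40, 8                                   # sample periods, forecast periods
betas = [0.9, 0.5]                             # different micro behaviors
x = rng.uniform(50, 150, size=(T + H, 2))      # exogenous micro variables
y = x * betas + rng.normal(0, 5, size=(T + H, 2))

def ols_slope(xv, yv):
    return float(np.sum(xv * yv) / np.sum(xv * xv))   # no-intercept OLS

# Micro estimation: one equation per component, then add the forecasts.
b_micro = [ols_slope(x[:T, i], y[:T, i]) for i in range(2)]
fc_micro = sum(b_micro[i] * x[T:, i] for i in range(2))

# Aggregate estimation: regress the aggregate on the aggregate.
X_agg, Y_agg = x.sum(axis=1), y.sum(axis=1)
b_agg = ols_slope(X_agg[:T], Y_agg[:T])
fc_agg = b_agg * X_agg[T:]

actual = Y_agg[T:]
rmse = lambda f: float(np.sqrt(np.mean((f - actual) ** 2)))
print("out-of-sample RMSE, micro-based forecast:     %.2f" % rmse(fc_micro))
print("out-of-sample RMSE, aggregate-based forecast: %.2f" % rmse(fc_agg))
```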


Journal ArticleDOI
TL;DR: In this paper, a simple model specifying the fixed and variable costs of testing is adopted and equations are derived that indicate the sample size that maximizes the non-centrality parameter subject to the cost constraints.
Abstract: Formulae are developed from the assumptions of classical test theory to demonstrate the effect of error of measurement on the power of the F test for a fixed-effects one-way analysis of variance model. A simple model specifying the fixed and variable costs of testing is adopted and equations are derived that indicate the sample size that maximizes the non-centrality parameter subject to the cost constraints. Given the sample size, the corresponding test length is then implied by the cost model. A computer program used for the estimation of power for permissible allocations of resources is described.

56 citations
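
A minimal sketch of the power-and-measurement-error link: under classical test theory, error of measurement inflates the within-group variance, so the noncentrality parameter of the one-way fixed-effects F test is proportional to the reliability of the scores. The group count, per-group sample size, effect size, and reliabilities below are assumptions for illustration, not the paper's formulae or cost model.

```python
# Power of a fixed-effects one-way ANOVA F test as reliability falls.
from scipy.stats import f as f_dist, ncf

k, n = 4, 30                       # groups, subjects per group
effect = 0.15                      # sum_j (mu_j - mu)^2 / true within-group variance
alpha = 0.05
dfn, dfd = k - 1, k * (n - 1)
f_crit = f_dist.ppf(1 - alpha, dfn, dfd)

for reliability in (1.0, 0.8, 0.6, 0.4):
    lam = n * effect * reliability            # noncentrality attenuated by reliability
    power = 1 - ncf.cdf(f_crit, dfn, dfd, lam)
    print(f"reliability {reliability:.1f}: noncentrality {lam:5.2f}, power {power:.3f}")
```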



Journal ArticleDOI
TL;DR: A statistical test for homogeneity of regression is described and computational formulas and interpretive discussion are presented in an effort to widen the use of this technique in educational research.
Abstract: This article describes a statistical test for homogeneity of regression and presents computational formulas and interpretive discussion in an effort to widen the use of this technique in educational research. First, the rationale for the test is presented intuitively followed by a precise statement of the statistical problem. Second, the derivation and computation of an F-ratio for testing the hypothesis of homogeneity of regression is described. Finally, some diverse examples of experimental and correlation studies where the test has been useful are described, and arguments are made for its wider use in other studies.

21 citations
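
The F-ratio described can be sketched directly: compare the residual sum of squares from a model with a common slope (and separate intercepts) against one with a separate slope in each group. The simulated data, group sizes, and slopes below are illustrative assumptions.

```python
# Sketch of the F test for homogeneity of regression across groups.
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)
groups = []
for slope in (1.0, 1.0, 1.6):                  # third group has a different slope
    x = rng.uniform(0, 10, 25)
    y = 2.0 + slope * x + rng.normal(0, 1.5, 25)
    groups.append((x, y))

def sse(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

k = len(groups)
N = sum(len(x) for x, _ in groups)

# Full model: separate intercept and slope in each group.
sse_full = sum(sse(np.column_stack([np.ones_like(x), x]), y) for x, y in groups)

# Reduced model: separate intercepts, one common slope.
X_red = np.zeros((N, k + 1))
y_all = np.concatenate([y for _, y in groups])
row = 0
for g, (x, _) in enumerate(groups):
    X_red[row:row + len(x), g] = 1.0           # group intercept column
    X_red[row:row + len(x), k] = x             # common slope column
    row += len(x)
sse_red = sse(X_red, y_all)

F = ((sse_red - sse_full) / (k - 1)) / (sse_full / (N - 2 * k))
p = 1 - f_dist.cdf(F, k - 1, N - 2 * k)
print(f"F({k - 1}, {N - 2 * k}) = {F:.2f}, p = {p:.4f}")
```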


Journal ArticleDOI
TL;DR: This article developed a set of factors which would enable one to answer questions regarding specific alpha and beta values, whether these errors should be large or small, and what kinds of decisions they relate to.

Abstract: The purpose of this paper was to develop a set of factors which would enable one to answer questions regarding specific alpha and beta values, whether these errors should be large or small, and what kinds of decisions they relate to. The set of factors found useful in thinking through the difficulties encountered are: 1) number of alternatives, 2) planning horizon, 3) past success of decision-maker, and 4) cost-revenue consequences of an action. In classical hypothesis testing there are no rules that allow one to systematically set alpha and beta error levels. Generally, type I error is set at 5 per cent (alpha per cent) and type II error is almost ignored. A number of people in diverse areas of specialization have drawn attention to the unsystematic treatment in setting alpha and beta in classical hypothesis testing. Important too is the confusion between classical statistical inference and Bayesian statistical decision-making. Selecting error levels has been made difficult by a failure to di...
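
As a small numerical illustration of the alpha/beta trade-off the abstract discusses (not the paper's decision factors), for a one-sided z test of a mean, beta falls as alpha, the sample size, or the effect size grows. The effect size, sigma, and sample sizes below are assumptions.

```python
# Beta (type II error) for a one-sided z test, as a function of alpha and n.
from scipy.stats import norm

effect, sigma = 0.4, 1.0                       # true shift under the alternative
for n in (25, 100):
    for alpha in (0.01, 0.05, 0.10):
        z_alpha = norm.ppf(1 - alpha)
        beta = norm.cdf(z_alpha - effect * n ** 0.5 / sigma)
        print(f"n={n:3d}  alpha={alpha:.2f}  beta={beta:.3f}")
```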

Journal ArticleDOI
TL;DR: In this article, the locally most powerful rank test for testing the symmetry axis of a symmetric circular distribution is derived and an efficiency expression for a class of linear rank tests is obtained.
Abstract: SUMMARY In this paper the locally most powerful rank test for testing the symmetry axis of a symmetric circular distribution is derived. Then an efficiency expression for a class of linear rank tests is obtained. The results are applied to the class of von Mises distributions and efficiency results for the sign test and the Wilcoxon test are calculated. 1. INTRODUCTION Circular distributions are encountered in many areas of scientific investigation. Perhaps the most important example is the distribution of phases of periodic phenomena, for example, in biology and physics. Another area of application is the analysis of directions, for example, in earth sciences, migration, etc. Several examples are given by Batschelet (1965). Assume that X1, ..., Xn are independent observations from a random variable, which takes its values on the unit circle. In this paper we are concerned with the problem of finding nonparametric tests for the hypothesis that the underlying distribution is symmetric with respect to the horizontal axis against the alternative of a displaced or rotated centre of symmetry. We proceed as follows: First we obtain a locally most powerful rank test against rotation alternatives, §2. After studying the distribution of a class of related test statistics under the hypothesis as well as under alternatives, §3, we derive the efficiency of such test sequences with respect to parametric competitors and we show that a fully efficient nonparametric test exists, §4. Finally, we apply some of the results to the class of von Mises distributions, §5.
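
A minimal sketch of the sign test mentioned in the abstract: under symmetry about the horizontal axis, an observation lies above the axis (sin X > 0) with probability 1/2, so the count of such observations is binomial(n, 1/2) and a rotation of the centre shifts it. The von Mises parameters and sample size below are illustrative assumptions.

```python
# Sign test for the symmetry axis of a circular distribution.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(4)
n = 60
# Under H0 the centre would be mu = 0; here it is rotated by 0.4 radians.
angles = rng.vonmises(mu=0.4, kappa=2.0, size=n)

k_above = int(np.sum(np.sin(angles) > 0))
result = binomtest(k_above, n, p=0.5, alternative="two-sided")
print(f"{k_above}/{n} observations above the axis, sign-test p = {result.pvalue:.4f}")
```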

Journal ArticleDOI
TL;DR: In this article, an intuitive correspondence between two-sample tests for shift alternatives on a circle and tests for uniformity of a circular distribution is found, supported by formal analysis, and leads to a method for constructing and inferring the properties of specific twosample tests from the mathematically simpler tests for the uniformity.
Abstract: Some recent developments in two-sample tests for shift alternatives on a circle and in tests for uniformity of a circular distribution are related in this paper. An intuitive correspondence, supported by formal analysis, is found between the two classes of test statistics in form, asymptotic null distribution and asymptotic behaviour under alternatives and leads to a method for constructing and inferring the properties of specific two-sample tests from the mathematically simpler tests for uniformity. Several useful two-sample tests on the circle, some previously suggested, the others new, are systematically derived.

Journal ArticleDOI
TL;DR: In this article, the results of the 1965-66 Value Line investment contest were analyzed and its implications for the random walk hypothesis concerning stock market prices were discussed. But the analysis was limited to the first six months of the contest, and it is difficult to believe the price changes were generated randomly.
Abstract: In a recent article in this Journal, Professor John Shelton presents a fascinating analysis of the results of the 1965-66 Value Line investment contest and its implications for the random walk hypothesis concerning stock market prices. He concludes that "the evidence from this study indicates that the stock market, during at least the six months of the contest, had enough elements of predictability that it is difficult to believe the price changes were generated randomly." I wish to call attention to an inappropriate statistical test used in Shelton's paper, to consider a new null hypothesis which might explain the results of the Value Line contest and at the same time be consistent with the random walk hypothesis, and to present an illustration based on this hypothesis.


Journal ArticleDOI
TL;DR: In this paper, the Hotelling-Lawley and Pillai criteria are partitioned into direction and collinearity parts and large sample tests corresponding to them are derived for testing the goodness of fit of an assigned function.
Abstract: H. Ruben (1966) has suggested a simple approximate normalization for the correlation coefficient in normal samples, by representing it as the ratio of a linear combination of a standard normal variable and a chi variable to an independent chi variable and then using Fisher's approximation to a chi variable. This result is extended in this paper to a matrix, which in a sense is the correlation coefficient between two vector variables x and y. The result is then used to obtain large sample null and non-null (but in the linear case) distributions of the Hotelling-Lawley criterion and the Pillai criterion in multivariate analysis. Williams (1955) and Bartlett (1951) have derived some exact tests for the goodness of fit of a single hypothetical function to bring out adequately the entire relationship between two vectors x and y, by factorizing Wilks' lambda suitably. These factors are known as 'direction' and 'collinearity' factors, as they refer to the direction and collinearity aspects of the null hypothesis. In this paper, the other two criteria viz. the Hotelling-Lawley and Pillai criteria are partitioned into direction and collinearity parts and large sample tests corresponding to them are derived for testing the goodness of fit of an assigned function.
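
For reference, the two criteria partitioned in the paper are simple traces of the hypothesis (H) and error (E) sums-of-squares-and-products matrices: Pillai V = tr[H(H + E)^-1] and Hotelling-Lawley U = tr[H E^-1]. The two-group data below are simulated purely to make the computation concrete; they are not related to the paper's example.

```python
# Computing the Pillai and Hotelling-Lawley criteria from SSCP matrices.
import numpy as np

rng = np.random.default_rng(5)
g1 = rng.normal(0.0, 1.0, size=(20, 3))
g2 = rng.normal(0.5, 1.0, size=(20, 3))

grand = np.vstack([g1, g2]).mean(axis=0)
H = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
        for g in (g1, g2))                                      # between-groups SSCP
E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in (g1, g2))  # within-groups SSCP

pillai = np.trace(H @ np.linalg.inv(H + E))
hotelling_lawley = np.trace(H @ np.linalg.inv(E))
print(f"Pillai trace V = {pillai:.4f}, Hotelling-Lawley trace U = {hotelling_lawley:.4f}")
```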



Journal ArticleDOI
Behram H. Bharucha
TL;DR: A precise version of the loose assertion that "minimizing the error probability is equivalent to maximizing the a posteriori probability" is stated and proved, and it is demonstrated that function space type models are, in a sense, natural models for detection problems.

Abstract: The notion of a posteriori probability, often used in hypothesis testing in connection with problems of optimum signal detection, is put on a firm basis. The number of hypotheses is countable, and the observation space ω is abstract so as to include the case where the observation is a realization of a continuous parameter random process. The a posteriori probability is defined without recourse to limiting arguments on "finite dimensional" conditional probabilities. The existence of the a posteriori probability is established, its a.e. uniqueness is studied, and it is then used to define other a posteriori quantities and to solve the decision problem of minimizing the error probability. In particular, a precise version of the loose assertion that "minimizing the error probability is equivalent to maximizing the a posteriori probability" is stated and proved. The results are then applied to the case where the observation is a sample path of a random process, devoting considerable attention to questions of convergence and of having satisfactory models for the observation space, the random process, and the observables. The deficiencies of a common function space type model are pointed out and ways of correcting these deficiencies are discussed. The use of time samples and Karhunen-Loeve expansion coefficients as observables is investigated. The paper closes with an examination of non function space type models, and a demonstration that function space type models are, in a sense, natural models for detection problems.
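
A small simulation of the assertion the paper makes precise: among decision rules based on the observation, choosing the hypothesis with the largest a posteriori probability minimizes the error probability. The two-hypothesis Gaussian setting, the priors, and the competing threshold rule are illustrative choices, not the paper's abstract setting.

```python
# Maximum a posteriori decisions versus a plausible but suboptimal rule.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
prior1, mu0, mu1, sigma = 0.3, 0.0, 1.0, 1.0
n = 200_000

h = rng.random(n) < prior1                       # true hypothesis (H1 if True)
x = rng.normal(np.where(h, mu1, mu0), sigma)     # observation

# MAP rule: decide H1 when the posterior of H1 exceeds that of H0.
post1 = prior1 * norm.pdf(x, mu1, sigma)
post0 = (1 - prior1) * norm.pdf(x, mu0, sigma)
map_decision = post1 > post0

# A plausible but suboptimal competitor: split at the midpoint of the means.
midpoint_decision = x > (mu0 + mu1) / 2

print("error rate, MAP rule:      %.4f" % np.mean(map_decision != h))
print("error rate, midpoint rule: %.4f" % np.mean(midpoint_decision != h))
```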

Journal ArticleDOI
TL;DR: In this article, a modification of the blank-trials technique of Levine2 was employed that on each outcome trial, Ss received information about only one hypothesis, i.e. one value of one dimension.
Abstract: The major purposes of this paper were twofold: to test extensively the generality and validity of the conclusion by Fishbein, Benton, Osborne, and Wise that in concept-identification tasks, the stimulus dimensions play an important organizational role in the way Ss test hypotheses," and to determine the extent to which the number of dimensions and values within each dimension affects selective memory loss from Trial n to Trial n+ 1. In the Fishbein et al. study, Ss were run in either three-dimension, two-value, or six-dimension, twovalue, successive discrimination-learning tasks. A modification of the blank-trials technique of Levine2 was so employed that on each outcome trial, Ss received information about only one hypothesis, i.e. one value of one dimension. These authors found that following an error on Trial 1, the probability of Ss testing on Trial 2 a hypothesis that lay along the same stimulus dimension as their Trial 1 hypothesis was significantly greater than chance. Does this behavior occur when simultaneous discrimination tasks are used? More importantly, does the dimension continue to organize Ss' behavior beyond the first two outcome trials? The second major set of questions arose from the fact that when the blank-trials technique is utilized, S is informed on each outcome trial about both the correctness of the hypothesis he held during the no-outcome trials and the correctness of other hypotheses, i.e. the other values of the chosen stimulus. Levine has shown that in

Journal ArticleDOI
TL;DR: This paper is concerned with the application of the general linear model to the situation in which the observations are divided into several groups and it is assumed that some of the regression coefficients may be common to all groups whilst other regression coefficients and also the error variance may vary from group to group.
Abstract: This paper is concerned with the application of the general linear model to the situation in which the observations are divided into several groups. It is assumed that some of the regression coefficients may be common to all groups whilst other regression coefficients and also the error variance may vary from group to group. The problems of estimation and hypothesis testing are discussed. The least squares method is applicable if the model involves the assumption of a common error variance, but if the error variance differs from group to group the maximum likelihood approach is called for. An example is given of the testing of the hypothesis of equal error variances in a study of ventilatory function in coal miners.
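
A sketch of the grouped linear-model setting: a regression coefficient common to all groups, group-specific intercepts fitted by least squares, and a check of the equal-error-variance assumption on the group residuals. Bartlett's test is used here as a convenient stand-in for the likelihood-ratio approach the paper develops, and the simulated data are illustrative assumptions.

```python
# Common slope, separate intercepts, then a test of equal error variances.
import numpy as np
from scipy.stats import bartlett

rng = np.random.default_rng(7)
group_sd = (1.0, 1.0, 2.5)                      # third group is noisier
xs, ys = [], []
for sd in group_sd:
    x = rng.uniform(0, 10, 30)
    xs.append(x)
    ys.append(1.0 + 0.8 * x + rng.normal(0, sd, 30))

# Design matrix: one intercept column per group plus a shared slope column.
k, N = len(xs), sum(len(x) for x in xs)
X = np.zeros((N, k + 1))
y = np.concatenate(ys)
row = 0
for g, x in enumerate(xs):
    X[row:row + len(x), g] = 1.0
    X[row:row + len(x), k] = x
    row += len(x)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Group-by-group residuals, then a test of equal error variances.
resid, groups_resid, row = y - X @ beta, [], 0
for x in xs:
    groups_resid.append(resid[row:row + len(x)])
    row += len(x)
stat, p = bartlett(*groups_resid)
print(f"Bartlett statistic {stat:.2f}, p = {p:.4f}  (small p: variances differ)")
```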

01 Oct 1969
TL;DR: In this paper, a general linear hypothesis pertaining to untransformed data is tested by minimum chi-square techniques, without the assumptions required in ordinary analysis of variance.
Abstract: Testing a general linear hypothesis pertaining to untransformed data is accomplished by minimum chi-square techniques without assumptions required in ordinary analysis of variance. (Author)


Book ChapterDOI
01 Jan 1969
TL;DR: The binomial distribution is an example of a discrete probability distribution, that is, the variable r may take only the discrete values 0, 1, 2, ..., n and may not take fractional values, as discussed by the authors.

Abstract: Publisher Summary This chapter provides an overview of probability. If an event can occur in N mutually exclusive and equally likely ways and if n of these outcomes have a character A, then the probability of A is n/N. A sample is the collection of results observed. A population is the collection of results that would have been obtained had one collected observations, under the same conditions, indefinitely. Statistical tests have been designed to test statements about the population. Estimation procedures have been chosen to enable the estimation of population values and the assessment of the precision of such estimates. The binomial distribution is an example of a discrete probability distribution, that is, the variable r may take only the discrete values 0, 1, 2, ..., n and may not take fractional values. It is the discrete nature of the values that is important and not the fact that the number of values r may take is finite.
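
A quick illustration of the discreteness point: the binomial variable r takes only the values 0, 1, ..., n, each with a definite probability that sums to one over those values. The n and p chosen here are arbitrary.

```python
# Binomial probabilities over the discrete support 0, 1, ..., n.
from scipy.stats import binom

n, p = 5, 0.3
for r in range(n + 1):
    print(f"P(r = {r}) = {binom.pmf(r, n, p):.4f}")
print("total =", round(sum(binom.pmf(r, n, p) for r in range(n + 1)), 6))
```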

ReportDOI
01 Dec 1969
TL;DR: The use of alpha-level tests for fixed alpha is not a valid 'blade' for Occam's Razor because of the lack of an adequate formulation of the problems so that practical solutions can be derived.
Abstract: The use of alpha-level tests for fixed alpha is not a valid 'blade' for Occam's Razor. In the most important cases of scientific inference, the null hypothesis is known to be false, and consequently the type I error probability is irrelevant. The author points out the lack of an adequate formulation of the problems so that practical solutions can be derived.

Journal ArticleDOI
TL;DR: In this article, the authors developed test procedures which could be considered as competitors to the Sen-David (1968) and to the Davidson-Bradley ( 1968) tests, which are applicable to situations which involve the preferences of each individual comparison.
Abstract: The only non-parametric multivariate paired comparison tests presently available for testing the hypothesis of no difference among several treatments are (i) the Sen-David (1968) test and (ii) the Davidson-Bradley (1969) test. Both these tests are applicable to situations which involve the preferences of each individual comparison. Both these tests are the generalizations of the one-sample multivariate sign tests [1]. As such their A.R.E.'s (Asymptotic Relative Efficiencies) with respect to the normal theory $\mathfrak{F}$-test are not expected to be high. In fact the A.R.E. of the Sen-David (1968) test with respect to the normal theory $\mathfrak{F}$-test can be as low as zero (under normality). The purpose of this paper is to develop test procedures which could be considered as competitors to the Sen-David (1968) and to the Davidson-Bradley (1968) tests. The proposed procedures are based on the ranks of the observed comparison differences, and include as special cases the multivariate normal scores and the multivariate rank sum paired comparison tests. For convenience of presentation we develop the theory when the paired comparisons involve paired characteristics. Under suitable regularity conditions the limiting distributions of the proposed test statistics are derived under the null as well as non-null hypotheses, and their large sample properties are studied. It is shown that for various situations of interest the proposed procedures have considerable efficiency improvements over the Sen-David (1968) and the normal theory procedures.


ReportDOI
26 Jun 1969
TL;DR: In this paper, some methods of analyzing experimental data for complex targets and reverberation are outlined, including a heuristic discussion of what constitutes an ensemble and how to obtain one experimentally.
Abstract: Some methods of analyzing experimental data for complex targets and reverberation are outlined, including a heuristic discussion of what constitutes an ensemble and how to obtain one experimentally. The problem of then determining whether or not the actual data form a valid ensemble is considered. Tests for the stability, or stationarity, of the underlying random mechanism are briefly described (tests of independence and homogeneity); tests of whether or not a particular data set belongs to some postulated probability distribution are provided (goodness-of-fit), including a powerful test for establishing the normality or non-normality of the sample. Among the tests considered are the chi-squared, the Kolmogorov-Smirnov, the runs test, and the W-test for normality. These tests are carried out for some hypothetical reverberation data to illustrate the individual tests, at particular ranges, which can include the domain of a target reverberation. Some second-order properties of various classes of second moments of these data are also discussed, and an approach to relating simulated data to those from a real environment is briefly sketched. This memorandum is intended as a preliminary guide to the statistical treatment of data that are obtained in target and background measurements.
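
A sketch of the kind of test battery the memorandum describes, applied to a simulated data set standing in for reverberation samples: a chi-squared goodness-of-fit test, a Kolmogorov-Smirnov test, the Shapiro-Wilk W test of normality, and a simple above/below-median runs test of independence. The Rayleigh-like data and bin choices are illustrative assumptions, and fitting the normal's parameters from the same data makes the first two p-values only approximate.

```python
# A small goodness-of-fit / stationarity test battery on simulated data.
import numpy as np
from scipy.stats import chisquare, kstest, shapiro, norm

rng = np.random.default_rng(8)
x = rng.rayleigh(scale=1.0, size=200)           # stand-in envelope data

# Chi-squared goodness of fit against a fitted normal, using histogram bins.
counts, edges = np.histogram(x, bins=8)
cdf_vals = norm.cdf(edges, loc=x.mean(), scale=x.std(ddof=1))
expected = len(x) * np.diff(cdf_vals)
expected *= counts.sum() / expected.sum()       # rescale so totals match
chi2_stat, chi2_p = chisquare(counts, expected)

# Kolmogorov-Smirnov test against the same fitted normal.
ks_stat, ks_p = kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

# Shapiro-Wilk W test of normality.
w_stat, w_p = shapiro(x)

# Runs test of independence: runs above/below the median, normal approximation.
above = x > np.median(x)
runs = 1 + int(np.sum(above[1:] != above[:-1]))
n1, n2 = int(above.sum()), int((~above).sum())
mean_runs = 1 + 2 * n1 * n2 / (n1 + n2)
var_runs = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
runs_p = 2 * norm.sf(abs(runs - mean_runs) / var_runs ** 0.5)

print(f"chi-squared p = {chi2_p:.3f}, K-S p = {ks_p:.3f}, "
      f"Shapiro-Wilk p = {w_p:.3f}, runs-test p = {runs_p:.3f}")
```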

01 Jul 1969
TL;DR: In this article, the Kolmogorov-Smirnov test is compared to the median test for one-sided and two-sided alternatives.

Abstract: The Kolmogorov-Smirnov test is compared to the median test for one-sided and two-sided alternatives. In the two-sided case, although the asymptotic Bayes risk efficiency is the same for symmetric unimodal distributions, the relative efficiency approaches one slowly. In the one-sided case, the efficiency depends heavily on the test. For uniform alternatives, the K-S test is much better than the median test, and the test based on the difference of the positive and negative deviations is still better. This latter test appears to have good properties even when the one-sided K-S test does not. (Author)
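
A Monte Carlo sketch in the spirit of the comparison above: empirical power of the one-sample Kolmogorov-Smirnov test versus the median (sign) test against a shifted alternative. The standard-uniform null, the size of the shift, alpha, and the sample size are illustrative assumptions, not the report's settings.

```python
# Empirical power: one-sample K-S test vs. the median/sign test.
import numpy as np
from scipy.stats import kstest, binomtest

rng = np.random.default_rng(9)
n, reps, alpha, shift = 40, 2000, 0.05, 0.12

rej_ks = rej_med = 0
for _ in range(reps):
    x = rng.uniform(0, 1, n) + shift            # shifted alternative
    if kstest(x, "uniform", args=(0, 1)).pvalue < alpha:
        rej_ks += 1
    k_above = int(np.sum(x > 0.5))              # sign test about the null median
    if binomtest(k_above, n, 0.5).pvalue < alpha:
        rej_med += 1

print(f"empirical power: K-S {rej_ks / reps:.3f}, median/sign {rej_med / reps:.3f}")
```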

Journal ArticleDOI
TL;DR: In this paper, the truncated distribution is transformed so that the abscissa is translated to the truncation points and the curve above the new abscissa is given unit area, and then the curve is half-rectified.
Abstract: In many cases, data are drawn from a population which is distributed approximately normal, but with bounded, rather than infinite, domain. The traditional approach is a truncated normal, but if the population probability approaches zero at the bounds of the domain, serious errors in hypothesis testing may accompany truncation since the tails of the assumed distribution are used in error probability and critical region computation. In this paper, the truncated distribution is transformed so that the abscissa is translated to the truncation points and the curve above the new abscissa is given unit area, and then the curve is half-rectified. The result is a quasi-normal distribution with finite domain yet appearing to retain many properties of the normal. Presented is exact sampling theory, tests of hypothesis methodology, tables of associated probabilities, and illustrations from electronic reliability and oceanography.
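
For contrast with the quasi-normal construction described above, the sketch below shows the conventional truncated normal that the paper argues can distort tail-based error probabilities: the normal density is simply renormalized over a bounded domain. The bounds, mean, and standard deviation here are illustrative assumptions, and this is not the paper's transformed distribution.

```python
# Conventional truncated normal on a bounded domain, via scipy.stats.truncnorm.
from scipy.stats import norm, truncnorm

mu, sigma = 0.0, 1.0
lo, hi = -2.0, 2.0                              # physical bounds of the domain
a, b = (lo - mu) / sigma, (hi - mu) / sigma     # standardized bounds

dist = truncnorm(a, b, loc=mu, scale=sigma)
for x in (0.0, 1.5, 1.95):
    print(f"x = {x:4.2f}: truncated pdf {dist.pdf(x):.4f} vs normal pdf {norm.pdf(x, mu, sigma):.4f}")
print("truncated P(X > 1.9) =", round(dist.sf(1.9), 5))
```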