
Showing papers in "Journal of the American Statistical Association in 1959"



Journal ArticleDOI
TL;DR: In fields where statistical tests of significance are commonly used, there is some evidence that research which yields nonsignificant results is not published; such research, being unknown to other investigators, may be repeated independently until eventually by chance a significant result occurs (an "error of the first kind") and is published.
Abstract: There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published. Such research being unknown to other investigators may be repeated independently until eventually by chance a significant result occurs—an “error of the first kind”—and is published. Significant results published in these fields are seldom verified by independent replication. The possibility thus arises that the literature of such a field consists in substantial part of false conclusions resulting from errors of the first kind in statistical tests of significance. * The author wishes to express his thanks to Sir Ronald Fisher whose discussion on related topics stimulated this research in the first place, and to Leo Katz, Oliver Lacey, Enders Robinson, and Paul Siegel for reading and criticizing earlier drafts of this manuscript.

958 citations


Journal ArticleDOI
TL;DR: In this paper, the problem is considered of choosing a design such that the polynomial f(ξ) = f(ξ1, ξ2, ···, ξk) fitted by the method of least squares most closely represents the true function over some region of interest R in the ξ space, no restriction being introduced that the experimental points should necessarily lie inside R.
Abstract: The general problem is considered of choosing a design such that (a) the polynomial f(ξ) = f(ξ1, ξ2, · · ·, ξ k ) in the k continuous variables ξ' = (ξ1, ξ2, · · ·, ξ k ) fitted by the method of least squares most closely represents the true function g(ξ1, ξ2, · · ·, ξ k ) over some “region of interest” R in the ξ space, no restrictions being introduced that the experimental points should necessarily lie inside R; and (b) subject to satisfaction of (a), there is a high chance that inadequacy of f(ξ) to represent g(ξ) will be detected. When the observations are subject to error, discrepancies between the fitted polynomial and the true function occur: i. due to sampling error (called here “variance error”), and ii. due to the inadequacy of the polynomial f(ξ) exactly to represent g(ξ) (called here “bias error”). To meet requirement (a) the design is selected so as to minimize J, the expected mean square error averaged over the region R. J contains two components, one associated entirely with varian...

697 citations


Journal ArticleDOI
Albert Madansky1
TL;DR: In this paper, the authors survey and comment on the solutions to the problem of obtaining consistent estimates of α and β from a sample of (x, y)s, when one makes various assumptions about properties of the errors and the true values other than those mentioned above, and when one has various kinds of "additional information" which aids in constructing these consistent estimates.
Abstract: Consider the situation where X and Y are related by Y = α + βX, where α and β are unknown and where we observe X and Y with error, i.e., we observe x = X + u and y = Y + v. Assume that Eu = Ev = 0 and that the errors (u and v) are uncorrelated with the true values (X and Y). We survey and comment on the solutions to the problem of obtaining consistent estimates of α and β from a sample of (x, y)'s, (1) when one makes various assumptions about properties of the errors and the true values other than those mentioned above, and (2) when one has various kinds of “additional information” which aids in constructing these consistent estimates. The problems of obtaining confidence intervals for β and of testing hypotheses about β are not discussed, though approximate variances of some of the estimates of β are given. * This paper is an outgrowth of a Master's Thesis submitted to the Department of Statistics, University of Chicago. I am indebted for helpful comments and criticisms to T. E. Harris, W. H. Kr...

560 citations
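The inconsistency the survey addresses is easy to exhibit numerically. The sketch below is our own illustration, not from the paper: it simulates the errors-in-variables model above, shows the attenuation of the naive least-squares slope, and applies one consistent moment-based estimator that is available when the error variance Var(u) is known (a simple instance of the "additional information" the survey catalogues).

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta = 200_000, 1.0, 2.0
sigma_X, sigma_u, sigma_v = 1.0, 0.5, 0.5

X = rng.normal(0.0, sigma_X, n)        # true regressor
Y = alpha + beta * X                   # exact linear relation
x = X + rng.normal(0.0, sigma_u, n)    # observed with error u
y = Y + rng.normal(0.0, sigma_v, n)    # observed with error v

# Naive least squares is inconsistent: the slope is attenuated by
# Var(X) / (Var(X) + Var(u)) = 1 / 1.25 = 0.8 here.
b_naive = np.cov(x, y)[0, 1] / np.var(x)

# With sigma_u known, subtracting the error variance from the
# denominator gives a consistent method-of-moments estimate.
b_mm = np.cov(x, y)[0, 1] / (np.var(x) - sigma_u**2)

print(round(b_naive, 2), round(b_mm, 2))  # roughly 1.6 versus 2.0
```

With this large a sample the attenuation factor shows up almost exactly, while the corrected estimate recovers the true slope.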


Journal ArticleDOI
TL;DR: Nearly optimum stratification points are obtained if they are chosen to equalize the integrals, over the various strata, of the square root of the population density; a simple method for the iterative improvement of the points is given and illustrated on several examples.
Abstract: When estimating the mean value of a quantity x, in a population to be divided into L strata according to the value of a quantity closely correlated with x, it is necessary to choose the L — 1 points of stratification. Nearly optimum points are obtained if they are chosen to equalize the integrals over the various strata of the square root of the population density. A simple method for the iterative improvement of the points is given and illustrated on several examples.

312 citations
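The equal-integrals rule lends itself to a few lines of code. The sketch below is our own discrete-grid illustration (the function name and the normal-density example are not from the paper): boundaries are placed at equally spaced quantiles of the cumulative square root of the density.

```python
import numpy as np

def strat_boundaries(density, grid, L):
    """Approximate stratification points by equalizing the integral of
    sqrt(density) over each of the L strata (the cum-sqrt(f) rule)."""
    croot = np.cumsum(np.sqrt(density))   # cumulative sqrt(f) along the grid
    croot /= croot[-1]
    # boundaries at the j/L quantiles of cumulative sqrt(f), j = 1..L-1
    return np.interp(np.arange(1, L) / L, croot, grid)

# Example: a standard normal density on a grid, four strata.
grid = np.linspace(-4, 4, 2001)
dens = np.exp(-grid**2 / 2)
pts = strat_boundaries(dens, grid, 4)
print(pts)  # three increasing points, symmetric about 0
```

These grid points would then serve as starting values for the paper's iterative improvement.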


Book ChapterDOI
TL;DR: An earlier discussion of measures of association for cross classifications is extended in two ways: supplementary remarks, including the presentation of some new measures, are made, and historical and bibliographical material beyond the earlier work is critically surveyed.
Abstract: Our earlier discussion of measures of association for cross classifications [66] is extended in two ways. First, a number of supplementary remarks to [66] are made, including the presentation of some new measures. Second, historical and bibliographical material beyond that in [66] is critically surveyed; this includes discussion of early work in America by Doolittle and Peirce, early work in Europe by Korosy, Benini, Lipps, Deuchler and Gini, more recent work based on Shannon-Wiener information, association measures based on latent structure, and relevant material in the literatures of meteorology, ecology, sociology, and anthropology. New expressions are given for some of the earlier measures of association. * This research was supported in part by the Army, Navy, and Air Force through the Joint Services Advisory Committee for Research Groups in Applied Mathematics and Statistics, Contract No. N6ori-02035; and in part by the Office of Naval Research. This paper, in whole or in part, may be repro...

302 citations


Journal ArticleDOI
TL;DR: Linear programming techniques are shown to solve the least-absolute-deviations and least-maximum-deviations regression problems; in particular, fitting by the Chebyshev criterion leads to a standard-form p+1 equation linear programming model.
Abstract: In regression problems alternative criteria of “best fit” to least squares are least absolute deviations and least maximum deviations. In this paper it is noted that linear programming techniques may be employed to solve the latter two problems. In particular, if the linear regression relation contains p parameters, minimizing the sum of the absolute value of the “vertical” deviations from the regression line is shown to reduce to a p equation linear programming model with bounded variables; and fitting by the Chebyshev criterion is exhibited to lead to a standard-form p+1 equation linear programming model.

301 citations
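The least-absolute-deviations reduction can be sketched with a modern LP solver. This is an illustration of the general reduction, not the paper's exact bounded-variable formulation: it introduces 2n nonnegative residual variables u, v with Xβ + u − v = y and minimizes the sum of u + v.

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """Least-absolute-deviations regression via linear programming.
    Variables: p free coefficients beta, then residual parts u, v >= 0
    with X beta + u - v = y; objective: minimize sum(u + v)."""
    n, p = X.shape
    c = np.r_[np.zeros(p), np.ones(2 * n)]
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 100.0])   # y = 1 + 2x with one gross outlier
beta = lad_fit(np.c_[np.ones_like(x), x], y)
print(beta)  # the L1 fit ignores the outlier: intercept 1, slope 2
```

The Chebyshev (minimax) criterion mentioned in the abstract reduces similarly, with a single scalar bound on all residuals in place of u and v.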



Journal ArticleDOI
John W. Pratt1
TL;DR: In this article, a procedure for avoiding these difficulties is proposed, namely to rank the observations including the 0's, drop the ranks of the 0s, and reject the null hypothesis if the sum of the remaining negative (or positive) ranks falls in the tail of its null distribution (given the number of 0's).
Abstract: A Wilcoxon one-sample signed rank test may be made when some of the observations are 0 by dropping the 0's before ranking. However, a sample can be not significantly positive while a more negative sample (obtained by decreasing each observation equally), is significantly positive by the ordinary Wilcoxon test. The reverse is also possible. Two-piece confidence regions result. A procedure for avoiding these difficulties is proposed, namely to rank the observations including the 0's, drop the ranks of the 0's, and reject the null hypothesis if the sum of the remaining negative (or positive) ranks falls in the tail of its null distribution (given the number of 0's). If observations are tied in absolute value, their ranks may be averaged before attaching signs. This changes the null distribution. A sample may be significantly positive which is not significant if the observations are increased (unequally), or if the ties are broken in any way. * This research was supported by the United States Navy th...

226 citations
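Pratt's zero-handling procedure is short to implement. The sketch below (our own code; ties in |x| are not midranked, for brevity) ranks the observations including the zeros, drops the zeros' ranks, and returns the positive and negative rank sums. In modern software the same rule appears as, e.g., scipy.stats.wilcoxon(..., zero_method='pratt').

```python
import numpy as np

def signed_rank_pratt(x):
    """Wilcoxon signed-rank sums with zeros handled by Pratt's proposal:
    rank |x| including the zeros, then drop the ranks of the zeros."""
    x = np.asarray(x, dtype=float)
    order = np.abs(x).argsort(kind="stable")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    w_pos = ranks[x > 0].sum()   # zeros contribute to neither sum,
    w_neg = ranks[x < 0].sum()   # but they do shift the other ranks up
    return w_pos, w_neg

# |x| = [0, 2, 1, 3] ranks to [1, 3, 2, 4]; the zero's rank (1) is dropped.
print(signed_rank_pratt([0.0, 2.0, -1.0, 3.0]))  # (7.0, 2.0)
```

Note that the rank sums differ from those of the drop-zeros-first procedure, which is exactly the point of the paper.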


Journal ArticleDOI
TL;DR: In this paper, a table of divisors of the sample sums of squared deviations from the mean is provided to yield either the confidence interval of minimum length or the "shortest" unbiased interval for the variance of a normal distribution.
Abstract: Tables of divisors of the sample sums of squared deviations from the mean are provided to yield either the confidence interval of minimum length or the “shortest” unbiased interval for the variance of a normal distribution. Some questions are raised concerning confidence intervals of minimum length.

141 citations
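The distinction between the equal-tailed and minimum-length intervals can be checked numerically. Writing the interval for the variance as [S/b, S/a], where S is the sum of squared deviations and P(a ≤ χ²(df) ≤ b) = 1 − α, the length per unit S is 1/a − 1/b. The grid search below is our own sketch, not the paper's tabulation method.

```python
import numpy as np
from scipy.stats import chi2

def variance_interval_lengths(df, alpha=0.05):
    """Compare the equal-tailed divisors with numerically located
    minimum-length divisors for the normal-variance interval [S/b, S/a]."""
    # equal-tailed divisors
    a_et, b_et = chi2.ppf(alpha / 2, df), chi2.ppf(1 - alpha / 2, df)
    len_et = 1 / a_et - 1 / b_et   # interval length per unit S
    # scan the lower divisor; the upper one is pinned by the coverage constraint
    a_grid = np.linspace(1e-6, chi2.ppf(alpha, df) - 1e-6, 20_000)
    b_grid = chi2.ppf(chi2.cdf(a_grid, df) + 1 - alpha, df)
    lengths = 1 / a_grid - 1 / b_grid
    return len_et, lengths.min()

len_et, len_min = variance_interval_lengths(df=10)
print(len_et, len_min)  # the minimum-length interval is strictly shorter
```

Because the chi-square density is skewed, the equal-tailed divisors are never length-optimal, which is what the paper's tables quantify.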


Journal ArticleDOI
TL;DR: In this article, several possible methods are presented for constructing confidence intervals for the means of normally distributed, dependent variables when nothing is known about the correlations, and the extent to which these intervals may be shortened when some knowledge of the correlation structure is available is suggested as a problem for investigation.
Abstract: Several possible methods are presented for constructing confidence intervals for the means of normally distributed, dependent variables when nothing is known about the correlations. One, which uses the Student t distribution, is found, when the degrees of freedom is not too small compared to the number of variables, to give intervals almost as short as can possibly be attained. Methods based on Hotelling's T and on Scheffe's confidence intervals for all linear contrasts are found to yield intervals appreciably longer than those using the t distribution. The extent to which these intervals may be shortened when some knowledge of the correlation structure is available is suggested as a problem for investigation.

Journal ArticleDOI
TL;DR: A map which shows, by hatching or shading, the geographical distribution of some phenomenon in terms of absolute frequencies or percentages may usefully be supplemented by a map showing the probabilities of the observed deviations from the mean, calculated on the assumption that the true geographic distribution is uniform.
Abstract: A map which shows, by hatching or shading, the geographical distribution of some phenomenon in terms of absolute frequencies or percentages may usefully be supplemented by a map showing the probabilities of the observed deviations from the mean, calculated on the assumption that the true geographic distribution is uniform. Such a supplementary map helps guard against attaching geographic significance to random variations. * This paper was written while the author was a Visiting Ford Foundation Fellow in the United States.
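The idea of a probability map can be sketched directly: under the uniform hypothesis, each region's count is binomial with success probability equal to that region's share of the total. The counts and shares below are invented for illustration; the p-values are what such a supplementary map would shade.

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical counts of some phenomenon in five regions, with each
# region's share of the population; under the uniform hypothesis each
# count is Binomial(total, share), expected value 20 here.
counts = np.array([30, 22, 25, 11, 12])
shares = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
total = int(counts.sum())   # 100

pvals = [binomtest(int(k), total, float(p)).pvalue
         for k, p in zip(counts, shares)]
for k, pv in zip(counts, pvals):
    print(k, round(pv, 3))   # small p-values flag regions worth shading
```

This guards against attaching geographic significance to random variation, exactly as the abstract suggests.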


Journal ArticleDOI
TL;DR: The custom of only publishing research when it reaches a certain degree of significance is likely to lead to errors, not through repetition of the same experiments, but over many different experiments.
Abstract: The custom of only publishing research when it reaches a certain degree of significance is likely to lead to errors, not through repetition of the same experiments, but over many different experiments.

Journal ArticleDOI
TL;DR: In using Student's t-distribution in testing component parts, a need for extending the table of upper percentage points was revealed; the method of calculating these percentage points is presented, and a table containing the results is given.
Abstract: In using Student's t-distribution in testing component parts a need for extending the table of upper percentage points was revealed. The method of calculation of these percentage points is presented and a table containing these results is given.

Journal ArticleDOI
TL;DR: In this article, the authors present an attempt to bring together the agricultural and non-agricultural sectors by grafting onto a master model of the total economy a set of estimated relationships for the agricultural sector.
Abstract: Economists and agricultural economists have done parallel research in estimating quantitative economic relations for various economic sectors. This study is an attempt to bring together the agricultural and non-agricultural sectors by grafting onto a master model of the total economy a set of estimated relationships for the agricultural sector. In this way it is possible to trace the effects of changes in the non-agricultural economy through the agricultural sector and in turn to make some estimates of the contribution of agriculture to the total. In estimating supply, demand and price relationships within agriculture twelve product categories are examined. This permits the results to be applied as forecasting devices for disaggregated commodity groups within agriculture. The results are not as gratifying as one would wish and consequently many of the relationships are being re-examined and re-estimated.

Journal ArticleDOI
TL;DR: In this article, a class of ratio and regression type estimators is given such that the estimators are unbiased for random sampling, without replacement, from a finite population, and non-negative, unbiased estimators of estimator variance are provided for a subclass.
Abstract: A class of ratio and regression type estimators is given such that the estimators are unbiased for random sampling, without replacement, from a finite population. Non-negative, unbiased estimators of estimator variance are provided for a subclass. Similar results are given for the case of generalized procedures of sampling without replacement. Efficiency is compared with comparable sample selection and estimation methods for this case. * Most of this research was done while the author was associated with Iowa State College. It was completed while he was associated with the RAND Corporation.

Journal ArticleDOI
TL;DR: The authors studied the demographic structure and original interest in the subject matter of the study, at the time of the first and of each of the four subsequent interviews, and found that the demographic structure of the panel after five rounds of interviewing remained very similar to that of the original panel.
Abstract: Parts I and II of this paper evaluate “panel mortality” by studying the demographic structure and original interest in the subject matter of the study, at the time of the first and for each of the four subsequent interviews. Because of cancelling variations, the demographic structure of the panel after five rounds of interviewing remained very similar to that of the original panel. There was some tendency, however, for a disproportionate number of renters, low income people, and people not interested in the subject matter of the study to drop out after repeated interviews. The third part of this paper evaluates “repeated interview effects” by comparing the answers of panel members to the answers of members of a new probability sample of the urban, non-institutional population of the United States, who were interviewed at the same time on the same questions. Once the effects of differing income distribution in these groups were eliminated, there was little indication that the attitudes of a panel ...

Journal ArticleDOI
TL;DR: In this paper, it is conjectured that 2(min r − n/2)/ is distributed approximately as Dunnett's t. Tables based on this conjecture are computed and values are seen to agree well with comparable values from the exact distribution.
Abstract: Let (X0j, X1j, …, Xkj) be the result of a single trial, where the subscript 0 is associated with a control and the subscripts 1, …, k with treatments. To test the joint hypothesis P(Xij – X0j > 0) = 1/2 = P(Xij – X0j < 0), all i, compute the test criterion (r1, …, rk) where ri is the number of times Xij – X0j is negative in n trials. A method for computing the distribution of (r1, …, rk) is illustrated. Exact probability distributions of min ri are given for k = 2, n = 4(1)10 and k = 3, n = 4(1)7. It is conjectured that 2(min ri – n/2)/ is distributed approximately as Dunnett's t. Tables based on this conjecture are computed and values are seen to agree well with comparable values from the exact distribution.


Journal ArticleDOI
TL;DR: One class of patterned matrices is characterized by linear combinations of the first (r − 1) rows providing the right-hand portion, starting with the elements on the principal diagonal, of the rth and remaining rows; the inverses of such matrices are called diagonal matrices of type r.
Abstract: Matrix inversion is used in the least squares analysis of data to estimate parameters and their variances and covariances. When the data come from the analysis of variance, analysis of covariance, order statistics, or the fitting of response-surfaces, the matrix to be inverted usually falls into a structured pattern that simplifies its inversion. One class of patterned matrices is characterized by non-singular symmetrical arrangements, with vij = vji for all i ≠ j, in which linear combinations of the first (r − 1) rows provide the right-hand portion, starting with the elements on the principal diagonal, of the rth and remaining rows. The inverses of matrices of this class contain a non-null principal diagonal and, immediately adjacent to the principal diagonal, (r − 1) non-null superdiagonals and (r − 1) non-null subdiagonals. All other elements are zero. These inverses are called diagonal matrices of type r. That is, a matrix is diagonal of type r if aij = 0 for |i − j| ≥ r and aij = aji ...

Journal ArticleDOI
TL;DR: In this paper, a Beta-approximation for the null distribution of the Kruskal-Wallis H-statistic for one-way analysis of variance of ranks is proposed.
Abstract: A Beta-approximation, commonly used to approximate permutation test distributions in the analysis of variance, is proposed for the null distribution of the Kruskal-Wallis H-statistic for one-way analysis of variance of ranks. The approximation seems slightly simpler and better than the Beta-approximation given by Kruskal and Wallis, particularly in bringing the H-test into closer relation to ordinary analysis of variance tests. Simple conditions on the group sizes allow further substantial simplifications in the approximations. Numerical comparisons for very small samples illustrate the various approximations.
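A quick way to judge any approximation to the null distribution of H is to simulate that distribution by permuting ranks. The sketch below is a generic Monte Carlo check of our own, not Wallace's Beta fit itself; it also illustrates the exact null mean E[H] = k − 1 when there are no ties.

```python
import numpy as np

def kruskal_h(ranks, sizes):
    """Kruskal-Wallis H from a 1-D array of ranks split into k groups."""
    N = ranks.size
    parts = np.split(ranks, np.cumsum(sizes)[:-1])
    s = sum(n * (p.mean() - (N + 1) / 2) ** 2 for p, n in zip(parts, sizes))
    return 12.0 / (N * (N + 1)) * s

# Simulate the permutation null distribution of H for three small groups.
sizes = (4, 5, 6)
N = sum(sizes)
rng = np.random.default_rng(0)
hs = np.array([kruskal_h(rng.permutation(N) + 1.0, sizes)
               for _ in range(20_000)])

# Under the null with no ties, E[H] = k - 1 exactly; here k = 3.
print(hs.mean())  # close to 2
```

Comparing the simulated tail frequencies against any candidate approximation (Beta or chi-square) reproduces, in miniature, the numerical comparisons the paper reports for very small samples.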

Journal ArticleDOI
TL;DR: The proportion of men who said that they inhale differed very little between those smoking filter tip cigarettes and those smoking non-filter tip cigarettes.
Abstract: A survey conducted by mail was made to obtain information on inhalation in relation to type and amount of smoking. The proportion of men who said that they inhaled: (1) increased with amount of smoking and decreased with age, (2) was very much higher for cigarette smokers than for cigar and pipe smokers, and (3) was much higher for men who smoked only cigarettes than for men who smoked both cigarettes and cigars. The proportion of men who said that they inhale differed very little between those smoking filter tip cigarettes and those smoking non-filter tip cigarettes. A test was made to determine whether the wording of the letter of transmittal enclosed with the questionnaires, the organization from which the questionnaires were sent, the presence or absence of a postage stamp on the envelope enclosed for reply, or the failure of some men to reply had an influence on the findings. It appeared that these factors made very little difference in the percentage distribution of responses to questions o...

Journal ArticleDOI
TL;DR: In this article, the influence of linear autonomous growth on least squares (LS) and limited information single equation (LISE) estimates is examined using simulated economic data, and some procedures to improve estimates are suggested when linear autonomous growth is thought to be present.
Abstract: The influence of certain specification errors on estimates of parameters in economic models is examined using Monte Carlo techniques. Autonomous growth is a secular change in the endogenous variables not explained by the exogenous variables and parameters of the structural equations. Autonomous growth will therefore appear in the shocks, and this usually causes them to become correlated with the exogenous variables in economic models—a specification error. The influence of linear autonomous growth on least squares (LS) and limited information single equation (LISE) estimates is examined using simulated economic data. Estimates by both methods are badly biased when autonomous growth is present but ignored, and the use of probability theory tends to give very bad decisions. A simple change in the model removes the difficulty for the LISE estimates. Some procedures to improve estimates are suggested when linear autonomous growth is thought to be present. * The research reported here was made possibl...

Journal ArticleDOI
TL;DR: In this paper, it is argued that attention must be paid to non-null cases (in testing theory) if a satisfactory probabilistic model for sensory sorting tests is to be built, and if the efficiency of various experimental designs is to be considered.
Abstract: The well-known discussion of the principles of experimentation, illustrated by a taste-testing problem, in R. A. Fisher's Design of Experiments, is the basis of this expository paper. The notion of a hypothetical population of identical experiments is defended. It is argued that attention must be paid to non-null cases (in testing theory) if a satisfactory probabilistic model for sensory sorting tests is to be built, and if the efficiency of various experimental designs is to be considered. Finally, some remarks are made on the role of randomization, and on the problem of “inexact” acceptance regions in discrete distributions.

Journal ArticleDOI
TL;DR: In this article, the authors present a theory for estimating the proportions of names common to two or more lists of names, through use of samples drawn from the lists, covering the probability distributions, expected values, variances, and the third and fourth moments of the estimates of the proportions duplicated.
Abstract: This paper presents theory for estimation of the proportions of names common to two or more lists of names, through use of samples drawn from the lists. The theory covers (a) the probability distributions, expected values, variances, and the third and fourth moments of the estimates of the proportions duplicated; (b) testing a hypothesis with respect to a proportion; (c) optimum allocation of the samples; (d) the effect of duplicates within a list; (e) possible gains from stratification. Examples illustrate some of the theory.
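The basic sampling estimate of a duplication proportion is simple to sketch: draw a sample from one list and record the fraction of sampled names found on the other. The lists, sizes, and the binomial standard-error approximation below are our own illustration, not the paper's full theory.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical mailing lists with a known 30% overlap, for checking.
list_a = [f"name{i}" for i in range(10_000)]
list_b = set(f"name{i}" for i in range(7_000, 17_000))   # 3,000 shared names

n = 500
sample = rng.choice(list_a, size=n, replace=False)
dup = sum(name in list_b for name in sample)

p_hat = dup / n                          # estimated proportion duplicated
se = np.sqrt(p_hat * (1 - p_hat) / n)    # binomial approximation to the s.e.
print(p_hat, round(se, 3))
```

The paper's contribution is the exact distribution theory, optimum allocation, and stratification gains behind this simple estimate.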

Journal ArticleDOI
TL;DR: In this article, the authors discuss the computation of variances for the estimators r = y/x and (r − r') where the random variables (variates) y and x are sample totals for two variates obtained from some multi-stage design.
Abstract: We discuss the computation of variances for the estimators r = y/x and (r — r') where the random variables (variates) y and x are sample totals for two variates obtained from some multi-stage design. The variate x often represents the sample size; then the ratio r = y/x is the simple and usual sample mean or proportion and a common statistic for presenting the results of sample surveys. The difference (r — r') occurs frequently and importantly in multistage samples either as the change in the estimates of the same characteristic obtained from two different surveys, or as the comparison of some characteristic estimated for two subclasses from the same survey. Several useful computational forms are presented for var(r — r') = var(r) + var(r') — 2 cov(r, r'). The aims of the presentation are: (a) to be general enough to cover the complexities which arise frequently in practical sample designs; (b) to provide easy computing formulas for good approximations; and (c) to make the procedures comprehensib...
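One common computational form for var(r) is the linearization ("Taylor") estimator built from per-primary-unit totals. The sketch below is our own and assumes with-replacement selection of A primary units; the paper's formulas cover more general multi-stage designs and the difference (r − r').

```python
import numpy as np

def var_ratio(y, x):
    """Linearization variance estimate for r = sum(y)/sum(x), where y and x
    hold per-primary-unit sample totals (with-replacement approximation)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    A, X = len(y), x.sum()
    r = y.sum() / X
    z = y - r * x   # linearized residuals; centered below
    return r, (A / (A - 1)) * ((z - z.mean()) ** 2).sum() / X ** 2

# Tiny worked example with three primary-unit totals.
r, v = var_ratio([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(r, v)  # r = 1.0; here z = [-1, 0, 1], so v = (3/2) * 2 / 36 = 1/12
```

var(r − r') then follows from the identity in the abstract, var(r) + var(r') − 2 cov(r, r'), with the covariance computed from the same residuals.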

Journal ArticleDOI
TL;DR: Evidence from earlier surveys and from a pretest conducted in Charlotte, North Carolina, was used in making certain decisions about the conduct of the interview for the National Health Survey, and certain procedures were adopted to improve the codability of disease and injury information secured.
Abstract: Evidence from earlier surveys and from a pretest conducted in Charlotte, North Carolina, was used in making certain decisions about the conduct of the interview for the National Health Survey. The inconclusiveness of evidence on the use of proxy respondents and on between-interviewer variance led to decisions to accept proxy respondents under certain conditions, and to continue with plans to use a staff of about 140 interviewers, but to accumulate further evidence on both these matters on a continuing basis. Check lists of diseases again proved efficacious in the Charlotte pretest. A recall period of two weeks was adopted for most illness and medical and dental care data, but it was decided not to attempt to count separate attacks of chronic illness. Certain procedures were adopted to improve the codability of disease and injury information secured. * Presented at the “Miscellaneous Session” of the Biometrics Section, American Statistical Association and the Biometric Society (ENAR), 117th Annual...

Journal ArticleDOI
TL;DR: A critical review of many sources on statistics of religious affiliation, including references to studies by social scientists that treat of or are closely related to religious affiliation can be found in this paper.
Abstract: The following bibliographic essay is a critical review of many sources on statistics of religious affiliation, including references to studies by social scientists that treat of or are closely related to religious affiliation. It is found that statistics of religious affiliation generally originate with the unstandardized records kept by clergymen or lay clerks in over 300,000 local churches, who are for the most part untrained. Officials of national religious bodies probably receive and publish reports from most local churches, but a considerable proportion of these officers make public official reports that are only their own estimates. Periodic compilations of “the latest information” are noted. A brief summary of the U. S. Censuses of Religious Bodies is also made. A Church Distribution Study is described. Social scientists probably regard most current statistics on religious affiliation as crude. The limitations and defects of these statistics have received relatively little documentary stud...