
Showing papers in "Journal of the American Statistical Association in 1971"


Journal ArticleDOI
TL;DR: This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data.
Abstract: Many intuitively appealing methods have been suggested for clustering data, however, interpretation of their results has been hindered by the lack of objective criteria. This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data. These criteria depend on a measure of similarity between two different clusterings of the same set of data; the measure essentially considers how each pair of data points is assigned in each clustering.

6,179 citations
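
The pair-counting measure described in the abstract can be sketched in a few lines. The function below is an illustrative reading of it (the name pair_agreement and the toy labelings are ours, not the article's): a pair of points counts as an agreement if the two clusterings either place it together in both or apart in both.

```python
from itertools import combinations

def pair_agreement(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings of the same data
    agree: a pair agrees if it is placed together in both clusterings
    or apart in both (a pair-counting similarity of the kind described)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Identical partitions agree on every pair, whatever the label names.
print(pair_agreement([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Note that only the induced partitions matter, not the cluster labels themselves.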


Journal ArticleDOI
TL;DR: Proper scoring rules, i.e., devices of a certain class for eliciting a person's probabilities and other expectations, are studied, mainly theoretically but with some speculations about application.
Abstract: Proper scoring rules, i.e., devices of a certain class for eliciting a person's probabilities and other expectations, are studied, mainly theoretically but with some speculations about application. The relation of proper scoring rules to other economic devices and to the foundations of the personalistic theory of probability is brought out. The implications of various restrictions, especially symmetry restrictions, on scoring rules is explored, usually with a minimum of regularity hypothesis.

1,174 citations
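
The defining property of a proper scoring rule is that honest reporting minimizes the forecaster's expected penalty. A quick numerical check with the quadratic (Brier) rule, using an arbitrary true probability of 0.3, illustrates this:

```python
def expected_brier_penalty(r, q):
    """Expected quadratic (Brier) penalty when the event occurs with true
    probability q and the forecaster reports probability r."""
    return q * (1 - r) ** 2 + (1 - q) * r ** 2

q = 0.3  # true probability (arbitrary choice for the demonstration)
grid = [i / 100 for i in range(101)]
best = min(grid, key=lambda r: expected_brier_penalty(r, q))
print(best)  # 0.3 -- the honest report minimizes the expected penalty
```

The same check fails for an improper rule such as the linear penalty |outcome − r|, whose expected value is minimized by reporting 0 or 1.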


Journal ArticleDOI
TL;DR: In this paper, estimators for the scale parameter and characteristic exponent of symmetric stable distributions are proposed, Monte Carlo studies of these estimators are reported, and the powers of various goodness-of-fit tests of a Gaussian null hypothesis against non-Gaussian stable alternatives are investigated.
Abstract: Building on results of an earlier article [6], estimators are suggested for the scale parameter and characteristic exponent of symmetric stable distributions, and Monte Carlo studies of these estimators are reported. The powers of various goodness-of-fit tests of a Gaussian null hypothesis against non-Gaussian stable alternatives are also investigated. Finally, a test of the stability property of symmetric stable variables is suggested and demonstrated.

529 citations


Journal ArticleDOI
TL;DR: In this article, the problem of adjusting monthly or quarterly time series to make them conform with independent annual totals or averages without introducing artificial discontinuities is considered, and a general approach and some specific procedures involving constrained minimization of a quadratic form in the differences between revised and unrevised series are proposed.
Abstract: This article considers the problem of adjusting monthly or quarterly time series to make them accord with independent annual totals or averages without introducing artificial discontinuities. A general approach and some specific procedures involving constrained minimization of a quadratic form in the differences between revised and unrevised series are proposed. Some computational advantages are noted. Attention is given to the relationships between the adjustment problem and earlier work by other authors on the creation of monthly or quarterly series when only annual figures are available. An example is provided to illustrate the application of the proposed adjustment procedures.

446 citations
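
The simplest instance of the constrained-quadratic idea has a closed form: minimizing the sum of squared revisions subject to the annual constraint spreads the discrepancy equally across periods (by the Lagrange conditions). The sketch below shows only this simplest criterion; the article's preferred procedures penalize differences of revisions between adjacent periods, precisely to avoid the artificial steps between years that equal spreading can leave.

```python
def benchmark_to_total(series, target_total):
    """Adjust a sub-annual series so it sums to an independent annual
    total, minimizing the sum of squared revisions. The Lagrange
    solution adds an equal share of the discrepancy to every period."""
    gap = target_total - sum(series)
    return [x + gap / len(series) for x in series]

quarters = [100.0, 110.0, 120.0, 130.0]         # sums to 460
adjusted = benchmark_to_total(quarters, 480.0)  # independent annual figure
print(adjusted, sum(adjusted))  # each quarter raised by 5.0; total now 480
```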


Journal ArticleDOI
TL;DR: In this article, a variant of the two-stage least squares technique is used to estimate the parameters of a nonlinear model, and the reduced form equations of such models are derived and discussed.
Abstract: It is demonstrated that a variant of the two-stage least squares technique can be used to estimate the parameters of a nonlinear model. To do this, the reduced form equations of such models are derived and discussed; then certain problems particular to the estimation of nonlinear models are considered.

363 citations


Journal ArticleDOI
TL;DR: Cluster analysis involves the problem of optimal partitioning of a given set of entities into a pre-assigned number of mutually exclusive and exhaustive clusters that lead to different kinds of linear and non-linear integer programming problems.
Abstract: Cluster analysis involves the problem of optimal partitioning of a given set of entities into a pre-assigned number of mutually exclusive and exhaustive clusters. Here the problem is formulated, in terms of the distance function, in two different ways: (a) minimizing the within-groups sums of squares and (b) minimizing the maximum distance within groups. These lead to different kinds of linear and non-linear (0–1) integer programming problems. Computational difficulties are discussed and efficient algorithms are provided for some special cases.

357 citations
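
For a tiny data set, criterion (a) can be solved by brute force, which makes the combinatorial difficulty concrete: the sketch below enumerates every two-cluster partition (our own toy setup, not the article's algorithms), and the number of partitions explodes as n grows, which is exactly why integer-programming formulations are needed.

```python
from itertools import combinations

def within_ss(cluster):
    """Within-cluster sum of squared deviations from the cluster mean."""
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)

def best_two_partition(points):
    """Exhaustively search all two-cluster partitions for the one
    minimizing the within-groups sum of squares (criterion (a)).
    Feasible only because the data set is tiny."""
    idx = range(len(points))
    best = None
    for size in range(1, len(points) // 2 + 1):
        for group in combinations(idx, size):
            a = [points[i] for i in group]
            b = [points[i] for i in idx if i not in group]
            ss = within_ss(a) + within_ss(b)
            if best is None or ss < best[0]:
                best = (ss, set(group))
    return best

points = [1.0, 1.2, 0.9, 5.0, 5.3]
ss, group = best_two_partition(points)
print(group)  # {3, 4}: the two large values form one optimal cluster
```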


Journal ArticleDOI
TL;DR: In this article, a measure of variation for categorical data is discussed, and a test statistic is constructed on the basis of these properties, and its asymptotic behavior under the null hypothesis of independence is studied.
Abstract: A measure of variation for categorical data is discussed. We develop an analysis of variance for a one-way table, where the response variable is categorical. The data can be viewed alternatively as falling in a two-dimensional contingency table with one margin fixed. Components of variation are derived, and their properties are investigated under a common multinomial model. Using these components, we propose a measure of the variation in the response variable explained by the grouping variable. A test statistic is constructed on the basis of these properties, and its asymptotic behavior under the null hypothesis of independence is studied. Empirical sampling results confirming the asymptotic behavior and investigating power are included.

320 citations
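
One standard measure of variation for categorical data, of the kind such an analysis of variance can be built around, is the Gini-Simpson index: the probability that two randomly chosen responses fall in different categories. The sketch below is our illustration, not necessarily the article's exact measure.

```python
from collections import Counter

def gini_variation(values):
    """Gini-Simpson index: 1 minus the sum of squared category
    proportions. Zero when all responses agree; largest when the
    responses are spread evenly over the categories."""
    n = len(values)
    counts = Counter(values)
    return 1 - sum((c / n) ** 2 for c in counts.values())

print(gini_variation(["a", "a", "a", "a"]))  # 0.0  (no variation)
print(gini_variation(["a", "b", "c", "d"]))  # 0.75 (maximal spread)
```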


Journal ArticleDOI
TL;DR: In this paper, a procedure for obtaining maximum likelihood estimates and likelihood confidence regions in the intersecting two-phase linear regression model is presented, illustrated on a small set of data, and the distributional properties are examined empirically.
Abstract: Procedures are outlined for obtaining maximum likelihood estimates and likelihood confidence regions in the intersecting two-phase linear regression model. The procedures are illustrated on a small set of data, and the distributional properties are examined empirically.

306 citations
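
The two-phase structure can be profiled over the unknown change point: for each candidate split, fit a line to each phase by least squares and keep the split with the smallest total error. This sketch (our own simplification) fits the two phases freely, whereas the intersecting model of the article constrains the two lines to meet at the join.

```python
def ols(xs, ys):
    """Simple least-squares line fit; returns intercept, slope, SSE."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def two_phase_fit(xs, ys):
    """Grid over the change point: fit a line to each phase and keep
    the split minimizing the total SSE (unconstrained sketch of the
    likelihood maximization for the two-phase model)."""
    best = None
    for k in range(2, len(xs) - 1):
        _, _, s1 = ols(xs[:k], ys[:k])
        _, _, s2 = ols(xs[k:], ys[k:])
        if best is None or s1 + s2 < best[0]:
            best = (s1 + s2, k)
    return best

xs = [float(x) for x in range(10)]
# Slope 1 up to x = 4, slope 3 afterwards; the lines intersect at (4, 4).
ys = [float(x) for x in range(5)] + [4.0 + 3 * (x - 4) for x in range(5, 10)]
sse, split = two_phase_fit(xs, ys)
print(split, sse)  # split at the intersection point, with zero error
```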


Journal ArticleDOI
TL;DR: In this paper, the authors show the useful results which can be obtained by simply reversing the order between selection units and component variables in this linear expression, assuming that the samples are large enough to justify using the Taylor approximation.
Abstract: A method often used for computing the variance of a complicated sample estimate is to first apply the Taylor approximation to reduce non-linear forms of the variables to linear form. This article shows the useful results which can be obtained by merely reversing the order between selection units and component variables in this linear expression. The method is completely general (assuming that the samples are large enough to justify using the Taylor approximation) involving no restrictions on (a) the form of the estimate, (b) the number of random variables involved in the estimate, (c) the type, complexity or number of the sample designs involved in the estimate.

304 citations
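
The first step the article builds on, reducing a non-linear estimate to linear form, can be shown for the ratio R = ybar/xbar: linearize into z_i = (y_i − R x_i)/xbar and take the variance of the mean of the (now linear) z_i. The data-generating choices below are ours, for illustration only.

```python
import random

def ratio_var_taylor(xs, ys):
    """Taylor-linearization variance of the ratio R = ybar/xbar under
    simple random sampling: variance of the mean of the linearized
    variates z_i = (y_i - R * x_i) / xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    r = sum(ys) / sum(xs)
    z = [(y - r * x) / xbar for x, y in zip(xs, ys)]
    zbar = sum(z) / n
    s2 = sum((v - zbar) ** 2 for v in z) / (n - 1)
    return s2 / n

random.seed(1)
xs = [random.uniform(5, 15) for _ in range(500)]
ys = [2 * x + random.gauss(0, 1) for x in xs]
print(ratio_var_taylor(xs, ys))  # small: the ratio is tightly determined
```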


Journal ArticleDOI
TL;DR: In this paper, the randomized response technique of reducing respondent bias in obtaining answers to sensitive questions is extended from the situation where response is categorical to that in which the response is quantitative, and results are reported on the application of the method to estimating mean number of abortions in an urban population of women, and mean income of heads of households.
Abstract: The randomized response technique of reducing respondent bias in obtaining answers to sensitive questions is extended from the situation where response is categorical to that in which the response is quantitative. Results are reported on the application of the method to estimating mean number of abortions in an urban population of women, and mean income of heads of households. The efficiency of estimators based on the method of moments in the randomized response procedure is studied and representative results are reported and discussed.

300 citations
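
One simple quantitative design of this kind (our illustration, not necessarily the article's exact scheme): with known probability p the respondent answers the sensitive question truthfully, otherwise reports a draw from an innocuous distribution with known mean. The method of moments then inverts E[Z] = p·mu + (1 − p)·mu0.

```python
import random

random.seed(7)
p, mu0 = 0.7, 10.0   # device probability of the sensitive question; known innocuous mean
true_mu = 2.0        # sensitive mean the survey hopes to recover

# Each respondent privately operates the randomizing device.
reports = []
for _ in range(200_000):
    if random.random() < p:
        reports.append(random.gauss(true_mu, 1.0))  # truthful sensitive answer
    else:
        reports.append(random.gauss(mu0, 1.0))      # innocuous answer

# Method of moments: E[Z] = p*mu + (1 - p)*mu0, so solve for mu.
zbar = sum(reports) / len(reports)
mu_hat = (zbar - (1 - p) * mu0) / p
print(round(mu_hat, 2))  # close to the true sensitive mean of 2.0
```

No individual report reveals which question was answered, yet the population mean is recoverable.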


Journal ArticleDOI
TL;DR: In this article, an empirical study of the chi-square approximations as commonly encountered in behavioral research involved: (1) both tests of goodness-of-fit and of independence, (2) uniform distributions and two levels of departure from uniform, (3) sample sizes ranging from 10 to 100.
Abstract: This empirical study of the chi-square approximations as commonly encountered in behavioral research involved: (1) both tests of goodness-of-fit and of independence, (2) uniform distributions and two levels of departure from uniform, (3) sample sizes ranging from 10 to 100. Excellent approximations were obtained with average expected frequencies of one or two in tests of goodness-of-fit to uniform; slightly higher expected frequencies were required with the non-uniform cases. Tests of independence were strikingly robust with respect to Type I errors; in almost all cases the errors were in the conservative direction.
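
The study's central finding is easy to reproduce by simulation: with 10 uniform cells and average expected frequency 2, the chi-square goodness-of-fit test holds close to its nominal level. The sketch below is our small Monte Carlo in the same spirit (the tabled 0.95 chi-square quantile for 9 df is 16.919).

```python
import random

def chisq_stat(counts, expected):
    """Pearson chi-square statistic against equal expected counts."""
    return sum((o - expected) ** 2 / expected for o in counts)

random.seed(3)
k, n, reps = 10, 20, 20_000  # 10 uniform cells, expected count n/k = 2
crit = 16.919                # tabled chi-square 0.95 quantile, df = 9
rejections = 0
for _ in range(reps):
    counts = [0] * k
    for _ in range(n):
        counts[random.randrange(k)] += 1
    if chisq_stat(counts, n / k) > crit:
        rejections += 1
print(rejections / reps)  # close to the nominal 0.05 despite expected counts of 2
```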

Journal ArticleDOI
TL;DR: In this paper, four methods of combining independent tests of hypothesis are compared via exact Bahadur relative efficiency: Fisher's method, the mean of the normal transforms of the significance levels, the maximum significance level, and the minimum significance level.
Abstract: Four methods of combining independent tests of hypothesis are compared via exact Bahadur relative efficiency. The methods considered are Fisher's method, the mean of the normal transforms of the significance levels, the maximum significance level, and the minimum significance level. None of these is uniformly more powerful than the others, but, according to Bahadur efficiency, Fisher's method is the most efficient of the four. In some cases, Fisher's method is most efficient of all tests based on the data, but this is not generally true.
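
Fisher's method, the winner under Bahadur efficiency, is short to implement: X = −2·Σ log p_i is chi-square with 2k degrees of freedom under the joint null, and for even degrees of freedom the chi-square survival function has a closed form, so no tables are needed.

```python
import math

def fisher_combine(pvals):
    """Fisher's method for combining independent p-values.
    X = -2 * sum(log p_i) ~ chi-square with 2k df under the null;
    for df = 2k the survival function is
    exp(-x/2) * sum_{i<k} (x/2)**i / i!."""
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # x/2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

print(round(fisher_combine([0.04, 0.10, 0.30]), 3))  # combined p, about 0.036
```

Note how one fairly small p-value pulls the combined p below any of the individual levels except the smallest.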

Journal ArticleDOI
TL;DR: In this article, the authors studied the economic design of -charts used to maintain current control of a process when there is a single assignable cause occurring randomly, but with known effect.
Abstract: An earlier article by the author [4] studied the economic design of -charts used to maintain current control of a process when there is a single assignable cause occurring randomly, but with known effect. The present article extends the study to allow for the occurrence of several assignable causes the probability distribution of which is known. The initial model studied reveals the existence of readily acceptable (local minimum) solutions that are relatively stable with respect to model changes, including marked changes in the distribution of assignable causes. There were also found in some cases economically better solutions that would not be as readily acceptable as those offered by the local minima (e.g., the limits might fall at ± 6σ). The article argues that as extensions of the model approach reality, only the local-minimum solutions will remain. It then goes on to show that these can be well approximated by solutions of single-cause models. Thus in practice it may be sufficient to use sin...

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of estimating the mean of a normal distribution when the mean itself has a normal prior, and propose a set of rules which are compromises between the Bayes rule and the MLE.
Abstract: The first part of this article considers the Bayesian problem of estimating the mean, θ, of a normal distribution when the mean itself has a normal prior. The usual Bayes estimator for this situation has high risk if θ is far from the mean of the prior distribution. We suggest rules which do not have this bad property and still perform well against the normal prior. These rules are compromises between the Bayes rule and the MLE. Similar rules are suggested for the empirical Bayes situation where the mean and variance of the prior is unknown but can be estimated from the data provided by several simultaneous estimation problems. In this case the suggested rules compromise between the James-Stein estimator of a mean vector and the MLE.

Journal ArticleDOI
TL;DR: Building on Ibragimov's classical theorem on the strong unimodality of log-concave probability density functions, comparable results for lattice distributions are exhibited and their potential significance is suggested.
Abstract: In a classical theorem, Ibragimov demonstrated the strong unimodality of log-concave probability density functions. Comparable results for lattice distributions are exhibited and their potential significance is suggested.

Journal ArticleDOI
TL;DR: A more complete, unifying approach to statistical theory and communication theory is proposed, and it is asked whether convexity conditions required for competitive market equilibrium are satisfied.
Abstract: In an information-processing chain, only the initial inputs (“environment”) and the terminal outputs (“actions”) affect directly the benefit to the user who maximizes its expected excess over cost. All intermediate flows (“symbols”) affect directly only costs and delays. Delays affect benefit non-additively, through “impatience” and, possibly, “obsolescence.” Traditionally, statistical theory disregards delays, and communication theory treats them as costs. A more complete, unifying approach is proposed, and it is asked whether convexity conditions (e.g., “decreasing marginal returns”) required for competitive market equilibrium are satisfied.

Journal ArticleDOI
TL;DR: In this article, the problem of choosing the optimal design to estimate a regression function which can be well-approximated by a polynomial is considered, and two new optimality criteria are presented and discussed.
Abstract: The problem of choosing the optimal design to estimate a regression function which can be well-approximated by a polynomial is considered, and two new optimality criteria are presented and discussed. Use of these criteria is illustrated by a detailed discussion of the case that the regression function can be assumed approximately linear. These criteria, which can be considered as compromises between the incompatible goals of inference about the regression function under an assumed model and of checking the model's adequacy, are found to yield designs superior in certain respects to others which have been proposed to deal with this problem, including minimum bias designs.


Journal ArticleDOI
TL;DR: The correctness of most of Greenberg et al.'s recommendations is proved, one recommendation is shown to be far from the most efficient, and the optimized method is shown to be superior to Warner's Technique.
Abstract: In their article [1] B. G. Greenberg, et al. discussed Simmons' modification of Warner's Randomized Response Technique and advised on the choice of the parameters in the model. One of their recommendations, however, is far from being the most efficient. The correctness of their other recommendations is proved and it is shown that the optimized method is superior to Warner's Technique.
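
Warner's original technique, the baseline the optimized method is compared against, is easy to simulate: with device probability p the respondent is asked "Are you in group A?", otherwise the complement question, and the estimator inverts P(yes) = p·π + (1 − p)(1 − π). The parameter values below are our illustrative choices.

```python
import random

random.seed(5)
p, pi_true = 0.75, 0.2  # device probability; true sensitive proportion
n = 100_000

yes = 0
for _ in range(n):
    member = random.random() < pi_true
    if random.random() < p:     # asked "Are you a member of A?"
        yes += member
    else:                       # asked the complement question
        yes += not member

lam = yes / n                             # observed proportion of "yes"
pi_hat = (lam - (1 - p)) / (2 * p - 1)    # Warner's estimator
print(round(pi_hat, 3))  # close to the true proportion 0.2
```

The denominator 2p − 1 shows why p near 1/2 (maximal privacy) makes the estimator very noisy: the choice of p trades privacy against efficiency, which is what the recommendations under discussion concern.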

Journal ArticleDOI
TL;DR: In this article, bivariate failure rate is defined, and it is shown that no absolutely continuous bivariate distribution with constant failure rate exists except in the special case when the marginals are independently distributed.
Abstract: In this article bivariate failure rate is defined, and it is shown that no absolutely continuous bivariate distribution with constant failure rate exists except in the special case when the marginals are independently distributed.

Journal ArticleDOI
TL;DR: The results reported here show that methods such as scoring rules and bets are useful in leading individuals to make careful probability assessments, with predictions determined by mechanical schemes and by the organized betting market proving superior to those of many of the subjects.
Abstract: This article concerns a study in which personal probability assessments regarding the outcomes of football games were obtained. The results reported here, which include a detailed investigation of the assessments, an evaluation of the assessments in an inferential context and in a decision-theoretic context, and a discussion of the performance of a consensus, show that methods such as scoring rules and bets are useful in leading individuals to make careful probability assessments. Considerable variability existed among subjects, however, with predictions determined by mechanical schemes and by the organized betting market proving superior to those of many of the subjects.

Journal ArticleDOI
TL;DR: A general linear randomized response model is established, with estimates and variances obtained by analogy with familiar linear regression models; all existing randomized response procedures are shown to be special cases, and competing procedures are suggested by the model's applicability to multivariate mixes of randomized and non-randomized response using either discrete or continuous random variables.
Abstract: A general linear randomized response model is established with estimates and variances obtained through analogy with familiar linear regression models. All existing randomized response procedures are shown to be special cases of this more general model. Some competing procedures are suggested by the applicability of the model for multivariate mixes of randomized and non-randomized response using either discrete or continuous random variables. Some additional applications are suggested by the applicability of the model for situations where the data have already been collected by some agency but where there are disclosure restrictions.

Journal ArticleDOI
TL;DR: In this article, a family of random variables defined by the transformation Z = [U^λ − (1 − U)^λ]/λ, where U is uniformly distributed on [0, 1], is described with emphasis on properties of the sample range.
Abstract: Tukey introduced a family of random variables defined by the transformation Z = [U^λ − (1 − U)^λ]/λ where U is uniformly distributed on [0, 1]. Some of its properties are described with emphasis on properties of the sample range. The rectangular and logistic distributions are members of this family and distributions corresponding to certain values of λ give useful approximations to the normal and t distributions. Closed form expressions are given for the expectation and coefficient of variation of the range and numerical values are computed for n = 2(1)6(2)12, 15, 20 for several values of λ. It is observed that Plackett's upper bound on the expectation of the range for samples of size n is attained for a λ distribution with λ = n − 1.
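
Because the family is defined through the inverse of its distribution function, sampling is a one-liner: apply the transform to a uniform draw. The check below uses λ = 1, where the transform reduces to Z = 2U − 1, i.e. the rectangular member of the family on (−1, 1).

```python
import random

def tukey_lambda(lam, u=None):
    """One draw from Tukey's lambda family via the defining transform
    Z = (U**lam - (1 - U)**lam) / lam, with U uniform on [0, 1]."""
    if u is None:
        u = random.random()
    return (u ** lam - (1 - u) ** lam) / lam

random.seed(11)
# lam = 1 gives the rectangular member on (-1, 1): Z = 2U - 1.
sample = [tukey_lambda(1.0) for _ in range(10_000)]
rng = max(sample) - min(sample)
print(round(rng, 2))  # sample range close to the full support width of 2
```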

Journal ArticleDOI
TL;DR: In this paper, small-sample significance level and power are obtained using Monte Carlo methods for Hotelling's test, Williams's modification of Hotelling's test, and two tests based on Fisher's z transformation.
Abstract: When two correlation coefficients are calculated from a single sample, rather than from two samples, they are not statistically independent, and the usual methods for testing equality of the population correlation coefficients no longer apply. This article considers tests to be made using a sample from a multivariate normal distribution. Small sample level of significance and power are obtained using Monte Carlo methods for Hotelling's test, Williams's modification of Hotelling's test, and for two tests based on Fisher's z transformation.

Journal ArticleDOI
TL;DR: In this article, the authors considered the problem of estimating θ = Pr[Y < X] in both distribution-free and parametric frameworks, using a Bayesian approach.
Abstract: The problem of estimating θ = Pr[Y < X] has been considered in the literature in both distribution-free and parametric frameworks. In this article, using a Bayesian approach, we consider the estimation of θ from two approaches. The first, analogous to the classical procedure, is concerned with the problem of parametric estimation. The second, peculiar to the Bayesian approach, is directed to the query, “For two future observations, X and Y, what is the probability (given only the available sample data) that Y is less than X?” This probability, termed the predictive probability, is not an estimate but is, in fact, a probability. These two views are related in that this predictive probability is the mean of the posterior distribution of θ. In the following sections, these Bayesian procedures are applied to the case of independent exponentially distributed random variables and to various cases of the normal distribution. The Bayesian estimates thus obtained are compared, whenever possible, with their...
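
For the exponential case the abstract mentions, θ has a simple closed form: with independent X ~ Exp(rate_x) and Y ~ Exp(rate_y), Pr[Y < X] = rate_y / (rate_x + rate_y). A Monte Carlo check (our illustration, with arbitrary rates) confirms it:

```python
import random

def pr_y_less_x_exponential(rate_x, rate_y):
    """Closed form for independent exponentials:
    P(Y < X) = rate_y / (rate_x + rate_y)."""
    return rate_y / (rate_x + rate_y)

random.seed(2)
rate_x, rate_y = 1.0, 3.0
n = 200_000
hits = sum(
    random.expovariate(rate_y) < random.expovariate(rate_x) for _ in range(n)
)
print(hits / n, pr_y_less_x_exponential(rate_x, rate_y))  # both near 0.75
```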

Journal ArticleDOI
TL;DR: In this article, a step-by-step method is presented for partitioning a certain kind of hypothesis H about the m-way contingency table into a series of hypotheses about marginal tables formed from the m-way table by ignoring one or more of the table's m dimensions.
Abstract: A step-by-step method is presented herein for partitioning a certain kind of hypothesis H about the m-way contingency table into (a) a series of hypotheses about marginal tables formed from the m-way table by ignoring one or more of the table's m dimensions; and (b) a hypothesis about independence, conditional independence, or conditional equiprobability in the m-way table. This step-by-step method facilitates both the testing of H and the calculation of the estimated expected frequencies in the m-way table under H. The method introduced herein for calculating them is easier to apply than the usual iterative-scaling method in many cases.

Journal ArticleDOI
TL;DR: In this paper, a class of ratio-type estimators for estimating the mean of a finite population using information on p auxiliary characters x1, ···, xp is considered, and asymptotic expressions for the bias and the variance of the estimator are obtained.
Abstract: For estimating the mean of a finite population using information on p auxiliary characters x1, ···, xp, a class of ratio-type estimators is considered. For any function h(u1, ···, up) = h(u), where ui is the ratio of the simple random sample mean to the population mean of the character xi, such that h(e) = 1 with ei = 1, i = 1, ···, p, and such that it satisfies the conditions (1), (2) and (3) of Section 3, the estimator considered is ȳh(u). Asymptotic expressions for the bias and the variance of the estimator are obtained and it has been shown that ratio estimators of this form are asymptotically no more efficient than the regression estimator.
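
The p = 1 member of this class is the classical ratio estimator ȳ·(X̄/x̄), which exploits a known population mean of the auxiliary character. The simulation below (our own toy population, with y nearly proportional to x) shows the efficiency gain over the plain sample mean that such estimators are designed to deliver:

```python
import random

def srs_estimates(pop_x, pop_y, n, reps, seed=0):
    """Monte Carlo MSEs of the plain sample mean of y versus the
    one-auxiliary ratio estimator ybar * (Xbar / xbar), which uses
    the known population mean of x."""
    rnd = random.Random(seed)
    Xbar = sum(pop_x) / len(pop_x)
    Ybar = sum(pop_y) / len(pop_y)
    idx = range(len(pop_x))
    mse_plain = mse_ratio = 0.0
    for _ in range(reps):
        s = rnd.sample(idx, n)
        xbar = sum(pop_x[i] for i in s) / n
        ybar = sum(pop_y[i] for i in s) / n
        mse_plain += (ybar - Ybar) ** 2
        mse_ratio += (ybar * Xbar / xbar - Ybar) ** 2
    return mse_plain / reps, mse_ratio / reps

rnd = random.Random(42)
pop_x = [rnd.uniform(10, 20) for _ in range(1000)]
pop_y = [2 * x + rnd.gauss(0, 1) for x in pop_x]
plain, ratio = srs_estimates(pop_x, pop_y, n=30, reps=2000)
print(plain > ratio)  # True: the ratio estimator is far more precise here
```

The gain depends on the correlation between y and x; the article's point is that no estimator of this ratio form can asymptotically beat the regression estimator.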


Journal ArticleDOI
TL;DR: In this paper, a general procedure for reducing the bias of point estimators is introduced, including the "jackknife" as a special case, and an interesting algorithm for the correct method is defined.
Abstract: A general procedure for reducing the bias of point estimators is introduced. The technique includes the “jackknife” as a special case. The existing notion of reapplication is shown to lack a desirable bias removal property for which it was originally designed. Proper reapplication is proposed to conform to the general notion of higher order bias elimination and an interesting algorithm for the correct method is defined. Illustrative examples are drawn from ratio estimation, reliability and truncated distributions. The reduced mean square error which attracted some attention to the jackknife is present in the generalization for some applications.
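
The "jackknife" special case is compact enough to show in full: Quenouille's estimator n·T − (n − 1)·(mean of the leave-one-out values) removes the O(1/n) bias term. For the plug-in (n-divisor) variance it removes the bias exactly, recovering the unbiased sample variance:

```python
def plug_in_var(xs):
    """Maximum-likelihood variance (divides by n): biased by (n - 1) / n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def jackknife(stat, xs):
    """Quenouille's jackknife, the special case of the general
    bias-reduction procedure: n*T minus (n - 1) times the mean of
    the leave-one-out recomputations of the statistic."""
    n = len(xs)
    t_full = stat(xs)
    t_loo = sum(stat(xs[:i] + xs[i + 1:]) for i in range(n)) / n
    return n * t_full - (n - 1) * t_loo

xs = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
m = sum(xs) / len(xs)
unbiased = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
print(jackknife(plug_in_var, xs), unbiased)  # equal: bias removed exactly
```

The article's "proper reapplication" generalizes this to higher-order bias terms, where the naive idea of simply jackknifing twice fails.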

Journal ArticleDOI
TL;DR: In this article, the authors surveyed aggregation articles in various areas of economics, statistics, and accounting to develop a common body of fundamental queries which underlie these articles and made an assessment of the significance of aggregation theory in scientific investigations.
Abstract: Aggregation articles in various areas of economics, statistics, and accounting are surveyed to develop a common body of fundamental queries which underlie these articles. The conditions for total consistency between a microsystem and a macrosystem that have been developed for various systems are investigated and are related to the consistency conditions for general relational systems. This is followed by an analysis of other types of queries commonly observed in the aggregation literature, e.g., partial consistency, errors and biases, evaluation and selection of aggregation functions. Finally, an assessment is made of the significance of aggregation theory in scientific investigations.