
Showing papers in "Journal of the American Statistical Association in 1983"



Journal ArticleDOI
TL;DR: After a prediction rule is constructed from data, cross-validation provides a nearly unbiased estimate of its error rate in classifying future observations, and this estimate is shown to be closely related to the bootstrap estimate of the error rate.
Abstract: We construct a prediction rule on the basis of some data, and then wish to estimate the error rate of this rule in classifying future observations. Cross-validation provides a nearly unbiased estimate, using only the original data. Cross-validation turns out to be related closely to the bootstrap estimate of the error rate. This article has two purposes: to understand better the theoretical basis of the prediction problem, and to investigate some related estimators, which seem to offer considerably improved estimation in small samples.

2,331 citations
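
A minimal Python sketch of the comparison the abstract describes: the apparent (resubstitution) error rate, the leave-one-out cross-validation estimate, and a bootstrap estimate of the error rate. The nearest-class-mean classifier and the simulated data are illustrative assumptions, not the paper's setup, and the improved small-sample estimators the paper investigates are not reproduced here.

```python
# Sketch: leave-one-out cross-validation vs. a simple bootstrap estimate of the
# error rate of a prediction rule. The nearest-class-mean rule and the data are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def fit(X, y):
    """Nearest-class-mean rule: store the mean of each class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(rule, X):
    classes = sorted(rule)
    dists = np.stack([np.linalg.norm(X - rule[c], axis=1) for c in classes], axis=1)
    return np.array(classes)[dists.argmin(axis=1)]

def error_rate(rule, X, y):
    return float(np.mean(predict(rule, X) != y))

# Small two-class training sample.
n = 40
X = np.vstack([rng.normal(0, 1, (n // 2, 2)), rng.normal(1, 1, (n // 2, 2))])
y = np.repeat([0, 1], n // 2)

# Apparent (resubstitution) error rate: optimistically biased.
apparent = error_rate(fit(X, y), X, y)

# Leave-one-out cross-validation: nearly unbiased for the true error rate.
cv_errors = []
for i in range(n):
    keep = np.arange(n) != i
    rule = fit(X[keep], y[keep])
    cv_errors.append(predict(rule, X[[i]])[0] != y[i])
cv = float(np.mean(cv_errors))

# Bootstrap estimate of the optimism of the apparent error rate.
B = 200
optimism = []
for _ in range(B):
    idx = rng.integers(0, n, n)
    rule = fit(X[idx], y[idx])
    optimism.append(error_rate(rule, X, y) - error_rate(rule, X[idx], y[idx]))
boot = apparent + float(np.mean(optimism))

print(f"apparent {apparent:.3f}  cross-validation {cv:.3f}  bootstrap {boot:.3f}")
```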


Journal ArticleDOI
Naihua Duan
TL;DR: The smearing estimate as discussed by the authors is a nonparametric estimate of the expected response on the untransformed scale after fitting a linear regression model on a transformed scale, which is consistent under mild regularity conditions, and usually attains high efficiency relative to parametric estimates.
Abstract: The smearing estimate is proposed as a nonparametric estimate of the expected response on the untransformed scale after fitting a linear regression model on a transformed scale. The estimate is consistent under mild regularity conditions, and usually attains high efficiency relative to parametric estimates. It can be viewed as a low-premium insurance policy against departures from parametric distributional assumptions. A real-world example of predicting medical expenditures shows that the smearing estimate can outperform parametric estimates even when the parametric assumption is nearly satisfied.

2,093 citations
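
A short sketch of the smearing retransformation itself, assuming a log transformation and simulated data: fit ordinary least squares on the transformed scale, then estimate the expected response on the original scale by averaging the back-transformed residuals.

```python
# Sketch of the smearing estimate: fit OLS on log(y), then estimate E[y | x]
# by averaging exp(fitted value + residual) over all residuals. The log
# transform and the simulated data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 2, n)
y = np.exp(0.5 + 1.2 * x + rng.normal(0, 0.6, n))   # lognormal-style response

# Linear regression on the transformed (log) scale.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
resid = np.log(y) - X @ beta

def smearing_prediction(x_new):
    """Nonparametric retransformation: average exp(fit + residual) over residuals."""
    fit_log = beta[0] + beta[1] * x_new
    return np.mean(np.exp(fit_log + resid))

# Compare with the naive back-transform, which ignores the retransformation bias.
x0 = 1.0
naive = np.exp(beta[0] + beta[1] * x0)
print(f"naive {naive:.2f}  smearing {smearing_prediction(x0):.2f}")
```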


Journal ArticleDOI
TL;DR: In this paper, a review of the state of the art in multiparameter shrinkage estimators with emphasis on the empirical Bayes viewpoint, particularly in the case of parametric prior distributions, is presented.
Abstract: This article reviews the state of multiparameter shrinkage estimators with emphasis on the empirical Bayes viewpoint, particularly in the case of parametric prior distributions. Some successful applications of major importance are considered. Recent results concerning estimates of error and confidence intervals are described and illustrated with data.

1,409 citations
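
As a hedged illustration of the empirical Bayes viewpoint with a parametric prior (not one of the article's applications), the sketch below shrinks a set of group means toward the grand mean under an assumed normal-normal model, with the prior variance estimated from the data by the method of moments.

```python
# Minimal parametric empirical Bayes sketch: shrink several group means toward
# the grand mean under a normal-normal model. The data and the method-of-moments
# estimate of the prior variance are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
k, m = 12, 5                      # 12 groups, 5 observations each
true_means = rng.normal(10, 2, k)
data = true_means[:, None] + rng.normal(0, 3, (k, m))

ybar = data.mean(axis=1)                    # group sample means
v = data.var(axis=1, ddof=1).mean() / m     # pooled sampling variance of a mean

# Method-of-moments estimates of the prior mean and variance.
mu = ybar.mean()
tau2 = max(ybar.var(ddof=1) - v, 0.0)

# Posterior-mean (shrinkage) estimates: pull each group mean toward mu.
shrink = tau2 / (tau2 + v)
eb_estimates = mu + shrink * (ybar - mu)

print("shrinkage factor:", round(shrink, 3))
print("raw means   :", np.round(ybar, 2))
print("EB estimates:", np.round(eb_estimates, 2))
```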


Journal ArticleDOI
TL;DR: A measure of similarity between two hierarchical clusterings, Bk, is derived from the matching matrix [mij] formed by cutting the two hierarchical trees and counting the number of matching entries in the k clusters of each tree; (k, Bk) plots are used to portray the similarity of two clusterings.
Abstract: This article concerns the derivation and use of a measure of similarity between two hierarchical clusterings. The measure, Bk , is derived from the matching matrix, [mij ], formed by cutting the two hierarchical trees and counting the number of matching entries in the k clusters in each tree. The mean and variance of Bk are determined under the assumption that the margins of [mij ] are fixed. Thus, Bk represents a collection of measures for k = 2, …, n – 1. (k, Bk ) plots are found to be useful in portraying the similarity of two clusterings. Bk is compared to other measures of similarity proposed respectively by Baker (1974) and Rand (1971). The use of (k, Bk ) plots for studying clustering methods is explored by a series of Monte Carlo sampling experiments. An example of the use of (k, Bk ) on real data is given.

1,376 citations
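
A small sketch of a Bk-style computation: cut two trees into k clusters, build the matching matrix [mij], and form Bk = Tk / sqrt(Pk · Qk), with Tk = sum of squared cells minus n, Pk = sum of squared row sums minus n, and Qk = sum of squared column sums minus n, which is the form usually quoted for this measure (the abstract does not spell the formula out). The scipy clusterings and data are illustrative.

```python
# Sketch: compute a (k, Bk) profile for two hierarchical clusterings of the
# same n points, using the matching-matrix form Bk = Tk / sqrt(Pk * Qk).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def bk(labels_a, labels_b):
    n = len(labels_a)
    a_ids, a = np.unique(labels_a, return_inverse=True)
    b_ids, b = np.unique(labels_b, return_inverse=True)
    m = np.zeros((len(a_ids), len(b_ids)))
    np.add.at(m, (a, b), 1)                      # matching matrix [m_ij]
    tk = (m ** 2).sum() - n
    pk = (m.sum(axis=1) ** 2).sum() - n
    qk = (m.sum(axis=0) ** 2).sum() - n
    return tk / np.sqrt(pk * qk)

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
tree1 = linkage(X, method="average")
tree2 = linkage(X + rng.normal(0, 0.3, X.shape), method="complete")

# A (k, Bk) profile for k = 2, ..., 10.
for k in range(2, 11):
    la = fcluster(tree1, t=k, criterion="maxclust")
    lb = fcluster(tree2, t=k, criterion="maxclust")
    print(k, round(bk(la, lb), 3))
```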


Journal ArticleDOI
TL;DR: This volume develops the martingale approach to point processes and queues, covering stochastic intensities, integral representations of point-process martingales, filtering, flows in Markovian networks of queues, likelihood ratios, optimal control, and marked point processes, with background appendices on probability, stopping times, Wiener-driven dynamical systems, and the Stieltjes-Lebesgue calculus.
Abstract: I Martingales.- 1. Histories and Stopping Times.- 2. Martingales.- 3. Predictability.- 4. Square-Integrable Martingales.- References.- Solutions to Exercises, Chapter I.- II Point Processes, Queues, and Intensities.- 1. Counting Processes and Queues.- 2. Watanabe's Characterization.- 3. Stochastic Intensity, General Case.- 4. Predictable Intensities.- 5. Representation of Queues.- 6. Random Changes of Time.- 7. Cryptographic Point Processes.- References.- Solutions to Exercises, Chapter II.- III Integral Representation of Point-Process Martingales.- 1. The Structure of Internal Histories.- 2. Regenerative Form of the Intensity.- 3. The Representation Theorem.- 4. Hilbert-Space Theory of Poissonian Martingales.- 5. Useful Extensions.- References.- Solutions to Exercises, Chapter III.- IV Filtering.- 1. The Theory of Innovations.- 2. State Estimates for Queues and Markov Chains.- 3. Continuous States and Nontrivial Prehistory.- References.- Solutions to Exercises, Chapter IV.- V Flows in Markovian Networks of Queues.- 1. Single Station: The Historical Results and the Filtering Method.- 2. Jackson's Networks.- 3. Burke's Output Theorem for Networks.- 4. Cascades and Loops in Jackson's Networks.- 5. Independence and Poissonian Flows in Markov Chains.- References.- Solutions to Exercises, Chapter V.- VI Likelihood Ratios.- 1. Radon-Nikodym Derivatives and Tests of Hypotheses.- 2. Changes of Intensities "à la Girsanov".- 3. Filtering by the Method of the Probability of Reference.- 4. Applications.- 5. The Capacity of a Point-Process Channel.- 6. Detection Formula.- References.- Solutions to Exercises, Chapter VI.- VII Optimal Control.- 1. Modeling Intensity Controls.- 2. Dynamic Programming for Intensity Controls: Complete-Observation Case.- 3. Input Regulation. A Case Study in Impulsive Control.- 4. Attraction Controls.- 5. Existence via Likelihood Ratio.- References.- Solutions to Exercises, Chapter VII.- VIII Marked Point Processes.- 1. Counting Measure and Intensity Kernels.- 2. Martingale Representation and Filtering.- 3. Radon-Nikodym Derivatives.- 4. Towards a General Theory of Intensity.- References.- Solutions to Exercises, Chapter VIII.- A1 Background in Probability and Stochastic Processes.- 1. Introduction.- 2. Monotone Class Theorem.- 3. Random Variables.- 4. Expectations.- 5. Conditioning and Independence.- 6. Convergence.- 7. Stochastic Processes.- 8. Markov Processes.- References.- A2 Stopping Times and Point-Process Histories.- 1. Stopping Times.- 2. Changes of Time and Meyer-Dellacherie's Integration Formula.- 3. Point-Process Histories.- References.- A3 Wiener-Driven Dynamical Systems.- 1. Ito's Stochastic Integral.- 2. Square-Integrable Brownian Martingales.- 3. Girsanov's Theorem.- References.- A4 Stieltjes-Lebesgue Calculus.- 1. The Stieltjes-Lebesgue Integral.- 2. The Product and Exponential Formulas.- References.- General Bibliography.

1,363 citations


Journal ArticleDOI
TL;DR: The computation of Fisher's exact test for an r × c contingency table is transformed into the problem of identifying all paths through a directed acyclic network that equal or exceed a fixed length, which considerably extends the bounds of computational feasibility of the exact test.
Abstract: An exact test of significance of the hypothesis that the row and column effects are independent in an r × c contingency table can be executed in principle by generalizing Fisher's exact treatment of the 2 × 2 contingency table. Each table in a conditional reference set of r × c tables with fixed marginal sums is assigned a generalized hypergeometric probability. The significance level is then computed by summing the probabilities of all tables that are no larger (on the probability scale) than the observed table. However, the computational effort required to generate all r × c contingency tables with fixed marginal sums severely limits the use of Fisher's exact test. A novel technique that considerably extends the bounds of computational feasibility of the exact test is proposed here. The problem is transformed into one of identifying all paths through a directed acyclic network that equal or exceed a fixed length. Some interesting new optimization theorems are developed in the process. The numer...

960 citations
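
The "in principle" computation the abstract describes can be written down directly for small tables: enumerate every r × c table with the observed margins, assign each its generalized hypergeometric probability, and sum the probabilities of tables no more probable than the observed one. The brute-force sketch below is only to make the problem concrete; it is not the paper's network algorithm, which is what makes larger tables feasible. The example table is an assumption.

```python
# Brute-force sketch of the exact test for an r x c table with fixed margins.
from itertools import product
from math import lgamma, exp

def log_factorial(k):
    return lgamma(k + 1)

def table_log_prob(table, row_sums, col_sums, total):
    # log of (prod row! * prod col!) / (N! * prod cell!)
    lp = sum(log_factorial(r) for r in row_sums)
    lp += sum(log_factorial(c) for c in col_sums)
    lp -= log_factorial(total)
    lp -= sum(log_factorial(x) for row in table for x in row)
    return lp

def enumerate_tables(row_sums, col_sums):
    """Yield all nonnegative integer tables with the given margins (small cases only)."""
    r, c = len(row_sums), len(col_sums)
    def rec(i, remaining):
        if i == r - 1:
            if sum(remaining) == row_sums[-1]:
                yield [list(remaining)]
            return
        for cells in product(*[range(min(row_sums[i], remaining[j]) + 1) for j in range(c)]):
            if sum(cells) != row_sums[i]:
                continue
            rest = tuple(remaining[j] - cells[j] for j in range(c))
            for tail in rec(i + 1, rest):
                yield [list(cells)] + tail
    yield from rec(0, tuple(col_sums))

observed = [[3, 1, 2], [1, 4, 1]]
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
n = sum(rows)

p_obs = table_log_prob(observed, rows, cols, n)
p_value = 0.0
for t in enumerate_tables(rows, cols):
    lp = table_log_prob(t, rows, cols, n)
    if lp <= p_obs + 1e-12:          # no larger on the probability scale
        p_value += exp(lp)
print(f"exact p-value = {p_value:.4f}")
```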




Journal ArticleDOI
TL;DR: The rationale for using sample survey weights in a least squares regression analysis is examined with respect to four increasingly general specifications of the population regression model; the appropriateness of the weighted regression estimate depends on which model is chosen.
Abstract: The rationale for the use of sample survey weights in a least squares regression analysis is examined with respect to four increasingly general specifications of the population regression model. The appropriateness of the weighted regression estimate depends on which model is chosen. A proposal is made to use the difference between the weighted and unweighted estimates as an aid in choosing the appropriate model and hence the appropriate estimator. When applied to an analysis of the familial and environmental determinants of the educational level attained by a sample of young adults, the methods lead to a revision of the initial additive model in which interaction terms between county unemployment and race, as well as between sex and mother's education, are included.

597 citations
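
A minimal sketch of the diagnostic idea in the abstract: fit the regression both with and without the survey weights and inspect the difference between the two coefficient vectors. The simulated data and weights are assumptions; the formal comparison and model hierarchy in the paper are not reproduced.

```python
# Sketch: compare survey-weighted and unweighted least squares fits. With a
# correctly specified model the two estimates should roughly agree; a large
# difference suggests the unweighted model is misspecified.
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
w = rng.uniform(1, 10, n)                 # survey weights (e.g., inverse selection probabilities)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])

# Unweighted OLS.
beta_u = np.linalg.solve(X.T @ X, X.T @ y)

# Weighted least squares using the survey weights.
XtW = X.T * w                             # equivalent to X.T @ diag(w)
beta_w = np.linalg.solve(XtW @ X, XtW @ y)

print("unweighted:", np.round(beta_u, 3))
print("weighted:  ", np.round(beta_w, 3))
print("difference:", np.round(beta_w - beta_u, 3))
```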


Journal ArticleDOI
TL;DR: For the commonly occurring statistical problem of minimizing a least squares expression subject to side constraints, a simple iterative algorithm is presented and shown to converge to the desired solution.
Abstract: A commonly occurring problem in statistics is that of minimizing a least squares expression subject to side constraints. Here a simple iterative algorithm is presented and shown to converge to the desired solution. Several examples are presented, including finding the closest concave (convex) function to a set of points and other general quadratic programming problems. The dual problem to the basic problem is also discussed and a solution for it is given in terms of the algorithm. Finally, extensions to expressions other than least squares are given.
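
A toy sketch of an iterative scheme of this general kind: cyclically project onto each constraint set while carrying correction increments between passes, which drives the iterate to the constrained least squares solution. The two constraint sets used here (nonnegativity and a fixed sum) are illustrative assumptions rather than the paper's examples.

```python
# Minimal cyclic-projection sketch for restricted least squares: find the point
# closest to y that is (a) nonnegative and (b) sums to a fixed total.
import numpy as np

def project_nonnegative(x):
    return np.maximum(x, 0.0)

def project_sum(x, total):
    return x + (total - x.sum()) / x.size

def restricted_ls(y, total, n_iter=200):
    x = y.copy()
    incr = [np.zeros_like(y), np.zeros_like(y)]      # one correction per constraint
    projections = [project_nonnegative, lambda z: project_sum(z, total)]
    for _ in range(n_iter):
        for i, proj in enumerate(projections):
            z = x + incr[i]       # add back this constraint's correction
            x = proj(z)
            incr[i] = z - x       # store the new correction increment
    return x

y = np.array([2.0, -1.0, 0.5, 3.0])
x_hat = restricted_ls(y, total=4.0)
print(np.round(x_hat, 4), "sum =", round(float(x_hat.sum()), 4))
```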

Journal ArticleDOI
TL;DR: In this paper, three methods of cohort analysis are presented for a statistical model wherein the explanatory or exposure variables act multiplicatively on age × calendar year specific death rates, and all three approaches yield roughly equivalent estimates of the relative risk associated with arsenic exposure.
Abstract: Three methods of cohort analysis are presented for a statistical model wherein the explanatory or exposure variables act multiplicatively on age × calendar year specific death rates. The first method, which assumes that the baseline rates are known from national vital statistics, is a multiple regression analysis of the standardized mortality ratio. The second method is a variant of Cox's proportional hazards analysis in which the baseline rates are treated as unknown nuisance parameters. The third method consists of case-control sampling from the risk sets formed in the course of applying Cox's model. It requires substantially less computation than do the other two. In illustrative analysis of respiratory cancer deaths among a cohort of smelter workers, all three approaches yield roughly equivalent estimates of the relative risk associated with arsenic exposure. The discussion centers on the tradeoff between efficiency and bias in the selection of a particular method of analysis, and on practica...

Journal ArticleDOI
TL;DR: For X with a Binomial (n, p) distribution, it is shown that because of the nonuniformity of the Binomial → Normal convergence as p → 0, the usual approximate confidence intervals fail to have their stated asymptotic confidence coefficients; simple corrections are given for this.
Abstract: For X with Binomial (n, p) distribution, Section 1 gives a one-page table of .95 and .99 confidence intervals for p, for n = 1, 2, …, 30. This interval is equivariant under X → n − X and p → 1 − p, has approximately equal probability tails, is approximately unbiased, has Crow's property of minimizing the sum of the n + 1 possible lengths, and each of its ends is increasing in X and decreasing in n with about as regular steps as possible. Sections 2 and 3 consider the usual approximate confidence intervals. Calculations and asymptotic results show the need for the continuity correction in these even when n is large. Because of the nonuniformity of the Binomial → Normal convergence as p → 0, these intervals fail to have their stated asymptotic confidence coefficients; simple corrections are given for this. In the approximate interval (X/n) ± {(c/√n)√[(X/n)(1 − X/n)] + 1/(2n)} it is shown that the factor (c/√n) should be replaced by c/√(n − c² − 2c/√n − 1/n).
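
A short sketch implementing the continuity-corrected approximate interval and the corrected factor exactly as they are written in the abstract above, with c the standard normal quantile; the tabulated exact intervals of Section 1 are not reproduced.

```python
# Sketch: approximate binomial confidence interval with continuity correction,
# and the variant using the corrected factor c / sqrt(n - c^2 - 2c/sqrt(n) - 1/n)
# in place of c / sqrt(n), both taken directly from the abstract's formulas.
from math import sqrt
from statistics import NormalDist

def approx_interval(x, n, conf=0.95, corrected=False):
    c = NormalDist().inv_cdf(0.5 + conf / 2)
    p_hat = x / n
    factor = c / sqrt(n)
    if corrected:
        factor = c / sqrt(n - c**2 - 2 * c / sqrt(n) - 1 / n)
    half_width = factor * sqrt(p_hat * (1 - p_hat)) + 1 / (2 * n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

for corrected in (False, True):
    lo, hi = approx_interval(x=3, n=30, corrected=corrected)
    print(("corrected " if corrected else "standard  ") + f"[{lo:.3f}, {hi:.3f}]")
```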

Journal ArticleDOI
TL;DR: In this paper, splines are presented as a nonparametric function estimating technique, and the method of cross-validation for choosing the smoothing parameter is discussed and the general multivariate regression/surface estimation problem is addressed.
Abstract: This is a survey article that attempts to synthesize a broad variety of work on splines in statistics. Splines are presented as a nonparametric function estimating technique. After a general introduction to the theory of interpolating and smoothing splines, splines are treated in the nonparametric regression setting. The method of cross-validation for choosing the smoothing parameter is discussed and the general multivariate regression/surface estimation problem is addressed. An extensive discussion of splines as nonparametric density estimators is followed by a discussion of their role in time series analysis. A comparison of the spline and isotonic regression methodologies leads to a formulation of a hybrid estimator. The closing section provides a brief overall summary and formulates a number of open/unsolved problems relating to splines in statistics.
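
A brief sketch of the smoothing-parameter choice by cross-validation described above, using scipy's UnivariateSpline; the simulated data, the grid of candidate smoothing values, and the use of leave-one-out refitting are illustrative assumptions.

```python
# Sketch: smoothing-spline regression with the smoothing parameter chosen by
# leave-one-out cross-validation over a small grid.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(5)
n = 80
x = np.sort(rng.uniform(0, 10, n))
y = np.sin(x) + rng.normal(0, 0.3, n)

def loo_cv_score(s):
    errors = []
    for i in range(n):
        keep = np.arange(n) != i
        spl = UnivariateSpline(x[keep], y[keep], s=s)
        errors.append((y[i] - spl(x[i])) ** 2)
    return float(np.mean(errors))

grid = [0.5, 1, 2, 5, 10, 20, 50]
scores = {s: loo_cv_score(s) for s in grid}
best = min(scores, key=scores.get)
print("CV scores:", {s: round(v, 4) for s, v in scores.items()})
print("chosen smoothing parameter:", best)

fit = UnivariateSpline(x, y, s=best)   # final fit with the chosen parameter
```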

Journal ArticleDOI
Morris H. Hansen
TL;DR: In this article, the authors contrast inferences that are dependent on an assumed model with inferences based on the randomization induced by the sample selection plan and conclude with a summary of principles that should guide the practitioner of sample surveys of finite populations.
Abstract: In this paper we are concerned with inferences from a sample survey to a finite population. We contrast inferences that are dependent on an assumed model with inferences based on the randomization induced by the sample selection plan. Randomization consistency for finite population estimators is defined and adopted as a requirement of probability sampling. A numerical example is examined to illustrate the dangers in the use of model-dependent estimators even when the model is apparently consonant with the sample data. The paper concludes with a summary of principles that we believe should guide the practitioner of sample surveys of finite populations.




Journal ArticleDOI
TL;DR: This article shows how to fit a smooth curve (polynomial spline) to pairs of data values (yi, xi) using the Kalman filter to evaluate the likelihood function and achieve significant computational advantages over previous approaches to this problem.
Abstract: This article shows how to fit a smooth curve (polynomial spline) to pairs of data values (yi, xi ). Prior specification of a parametric functional form for the curve is not required. The resulting curve can be used to describe the pattern of the data, and to predict unknown values of y given x. Both point and interval estimates are produced. The method is easy to use, and the computational requirements are modest, even for large sample sizes. Our method is based on maximum likelihood estimation of a signal-in-noise model of the data. We use the Kalman filter to evaluate the likelihood function and achieve significant computational advantages over previous approaches to this problem.
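
A minimal state-space sketch in the same spirit: a signal-plus-noise model (here an assumed local linear trend observed with noise at equally spaced x values) whose Gaussian log-likelihood is accumulated by the Kalman filter from the one-step prediction errors. The article's polynomial-spline state-space model and its parameterization are not reproduced.

```python
# Kalman-filter sketch for a signal-in-noise model: level + slope random walk
# observed with noise; the log-likelihood comes from the prediction errors.
import numpy as np

def kalman_loglik(y, q_level, q_slope, r_obs):
    T = np.array([[1.0, 1.0], [0.0, 1.0]])      # state transition (level, slope)
    Q = np.diag([q_level, q_slope])              # state noise variances
    H = np.array([[1.0, 0.0]])                   # we observe the level
    a = np.zeros(2)                               # state mean
    P = np.eye(2) * 1e6                           # diffuse-ish initial variance
    loglik, filtered = 0.0, []
    for obs in y:
        # Predict.
        a = T @ a
        P = T @ P @ T.T + Q
        # Update with the new observation.
        v = obs - (H @ a)[0]                      # one-step prediction error
        F = (H @ P @ H.T)[0, 0] + r_obs           # prediction-error variance
        K = (P @ H.T)[:, 0] / F                   # Kalman gain
        a = a + K * v
        P = P - np.outer(K, H @ P)
        loglik += -0.5 * (np.log(2 * np.pi * F) + v ** 2 / F)
        filtered.append(a[0])
    return loglik, np.array(filtered)

rng = np.random.default_rng(6)
x = np.arange(100)
signal = 0.002 * (x - 50) ** 2
y = signal + rng.normal(0, 1.0, x.size)

ll, fitted = kalman_loglik(y, q_level=0.0, q_slope=0.01, r_obs=1.0)
print("log-likelihood:", round(ll, 2))
```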

Journal ArticleDOI
TL;DR: Variance estimators under several sample designs are proposed and compared when auxiliary information is available; a Monte Carlo comparison illustrates improvements in bias and mean squared error over some estimators commonly used in practice.
Abstract: Variance estimators under several sample designs are proposed and compared when auxiliary information is available. Improvement of the bias and mean squared error over some estimators commonly used in practice is illustrated. A Monte Carlo comparison of the estimators is also presented.


Journal ArticleDOI
TL;DR: In this article, the robustness and the number of design points for methods involving initial estimates, and for sequential methods in a small number of stages, are discussed, as well as the criterion of constant information for models involving one or two parameters.
Abstract: Models for binary data are usually such that the information matrix depends on the unknown parameters. Thus the standard criteria for optimality in regression experiments cannot be applied without modification. Methods of going around this difficulty include the use of initial point estimates, sequential methods, and Bayesian analysis. This article is mainly concerned with the robustness and the number of design points for methods involving initial estimates, and for sequential methods in a small number of stages. A final section discusses the criterion of constant information for models involving one or two parameters, and summarizes recent results in this field.
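
A small sketch of why initial estimates are needed: for a two-parameter logistic model the information matrix depends on the unknown parameters, so candidate designs can only be ranked at an assumed initial point estimate. The initial values and the two-point candidate designs below are assumptions; the D-criterion (determinant of the information matrix) is used for the comparison.

```python
# Sketch: compare candidate two-point designs for a logistic model
# P(y=1 | x) = 1 / (1 + exp(-(a + b*x))) at an initial guess (a0, b0).
import numpy as np

def information_matrix(design_points, a, b):
    """Fisher information of an equally weighted design for the logistic model."""
    info = np.zeros((2, 2))
    for x in design_points:
        p = 1.0 / (1.0 + np.exp(-(a + b * x)))
        f = np.array([1.0, x])
        info += p * (1 - p) * np.outer(f, f) / len(design_points)
    return info

a0, b0 = 0.0, 1.0        # assumed initial parameter estimates
candidates = {
    "narrow": [-0.5, 0.5],
    "moderate": [-1.5, 1.5],
    "wide": [-4.0, 4.0],
}
for name, pts in candidates.items():
    d_crit = np.linalg.det(information_matrix(pts, a0, b0))
    print(f"{name:9s} design {pts}: D-criterion = {d_crit:.4f}")
```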



Journal ArticleDOI
TL;DR: A study of past errors in U.S. Census Bureau and United Nations population projections provides a means for constructing confidence intervals for future projections, illustrated for the total population of the United States through the year 2000.
Abstract: Population projections are key elements of many planning and policy studies but are inherently inaccurate. This study of past population projection errors provides a means for constructing confidence intervals for future projections. We first define a statistic to measure projection errors independently of the size of the population and the length of the projection period. A sample of U.S. Census Bureau and United Nations projections indicates that the distributions of components of the error statistic are relatively stable. We then use this information to construct confidence intervals for the total population of the United States through the year 2000. We find that for projections of total population size, simple projection techniques are more accurate than more complex techniques.
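
The abstract defines an error statistic that is independent of population size and projection horizon but does not reproduce its formula here; as an assumed stand-in, the sketch below uses the annualized log ratio of projected to actual population, with hypothetical numbers.

```python
# Sketch of a size- and horizon-free projection error statistic (an assumed
# annualized form, not necessarily the article's exact definition).
import math

def annualized_error(projected, actual, years):
    """Per-year proportional projection error."""
    return math.log(projected / actual) / years

# Hypothetical example: a 10-year-ahead projection of 232 million versus an
# actual count of 226 million.
e = annualized_error(projected=232e6, actual=226e6, years=10)
print(f"annualized error: {e:+.4%} per year")
```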

Journal ArticleDOI
TL;DR: Multimodal generalizations of the normal, gamma, inverse gamma, and beta distributions are introduced within a unified framework, together with a statistic for bimodality based on Cardan's discriminant for a cubic shape polynomial.
Abstract: Multimodal generalizations of the normal, gamma, inverse gamma, and beta distributions are introduced within a unified framework. These multimodal distributions, belonging to the exponential family, require fewer parameters than corresponding mixture densities and have unique maximum likelihood estimators. Simple moment recursion relations, which make maximum likelihood estimation feasible, also yield easily computed estimators that themselves are shown to be consistent and asymptotically normal. Lastly, a statistic for bimodality, based on Cardan's discriminant for a cubic shape polynomial, is introduced.
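
A short sketch of the discriminant underlying the bimodality statistic: Cardan's (Cardano's) discriminant of a cubic is positive exactly when the cubic has three distinct real roots. How the cubic shape polynomial is obtained from a fitted density is not reproduced here, and the example coefficients are assumptions.

```python
# Cardano's discriminant of a*x^3 + b*x^2 + c*x + d:
# positive  -> three distinct real roots, negative -> one real root.
def cubic_discriminant(a, b, c, d):
    return 18*a*b*c*d - 4*b**3*d + b**2*c**2 - 4*a*c**3 - 27*a**2*d**2

print(cubic_discriminant(1, 0, -1, 0))   # x^3 - x: 4 > 0, three real roots
print(cubic_discriminant(1, 0, 1, 0))    # x^3 + x: -4 < 0, one real root
```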


Journal ArticleDOI
TL;DR: A class of Bayesian statistical methods for interspecies extrapolation of dose-response functions, using a system of hierarchical prior distributions similar to that of Lindley and Smith (1972), is proposed for the estimation of human lung cancer risk from various environmental emissions.
Abstract: We propose a class of Bayesian statistical methods for interspecies extrapolation of dose-response functions. The methods distinguish formally between the conventional sampling error within each dose-response experiment and a novel error of uncertain relevance between experiments. Through a system of hierarchical prior distributions similar to that of Lindley and Smith (1972), the dose-response data from many substances and species are used to estimate the interexperimental error. The data, the estimated error of interspecies extrapolation, and prior biological information on the relations between species or between substances each contribute to the posterior densities of human dose-response. We apply our methods to an illustrative problem in the estimation of human lung cancer risk from various environmental emissions.

Journal ArticleDOI
TL;DR: The optimal number of regressors for minimizing mean squared prediction error is shown to be a small fraction of the number of data points, and the Sp criterion provides an asymptotically optimal rule for the number of variables to enter.
Abstract: The optimal number of regressors is determined to minimize mean squared prediction error and is shown to be a small fraction of the number of data points. As the number of regressors grows large, the Sp criterion provides an asymptotically optimal rule for the number of variables to enter.
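
A hedged sketch of selecting the number of regressors by an estimated mean squared prediction error. The criterion below is the classical FPE-style estimate RSS/(n − k) · (1 + k/n), with k the number of fitted coefficients, used only as an illustrative stand-in; the paper's Sp criterion is not reproduced exactly, and the simulated data are assumptions.

```python
# Sketch: choose the number of regressors by minimizing an estimated mean
# squared prediction error (FPE-style stand-in for an Sp-type rule).
import numpy as np

rng = np.random.default_rng(7)
n, p_max = 100, 40
X = rng.normal(size=(n, p_max))
beta = np.zeros(p_max)
beta[:5] = [2.0, -1.5, 1.0, 0.5, 0.25]          # only the first 5 regressors matter
y = X @ beta + rng.normal(0, 1, n)

def criterion(p):
    Xp = np.column_stack([np.ones(n), X[:, :p]])
    resid = y - Xp @ np.linalg.lstsq(Xp, y, rcond=None)[0]
    rss = float(resid @ resid)
    k = Xp.shape[1]
    return rss / (n - k) * (1 + k / n)

scores = {p: criterion(p) for p in range(1, p_max + 1)}
best_p = min(scores, key=scores.get)
print("chosen number of regressors:", best_p)
```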