
Showing papers in "Journal of the American Statistical Association in 1993"


Journal ArticleDOI
TL;DR: Addresses a class of statistical models that generalizes classical linear models, extending them to include many other models useful in statistical analysis; of particular interest to statisticians in medicine, biology, agriculture, social science, and engineering.
Abstract: Addresses a class of statistical models that generalizes classical linear models, extending them to include many other models useful in statistical analysis. Incorporates numerous exercises, both theoretical and data-analytic. Discusses quasi-likelihood functions and estimating equations, models for dispersion effects, components of dispersion, and conditional likelihoods. Holds particular interest for statisticians in medicine, biology, agriculture, social science, and engineering.

5,678 citations


Journal ArticleDOI
TL;DR: In this paper, inference for generalized linear mixed models (GLMMs) is based on approximating the marginal quasi-likelihood by Laplace's method, leading to penalized quasi-likelihood estimating equations for the mean parameters and pseudo-likelihood for the variance components; for time series, spatial aggregation, and smoothing problems, the dispersion may be specified in terms of a rank-deficient inverse covariance matrix.
Abstract: Statistical approaches to overdispersion, correlated errors, shrinkage estimation, and smoothing of regression relationships may be encompassed within the framework of the generalized linear mixed model (GLMM). Given an unobserved vector of random effects, observations are assumed to be conditionally independent with means that depend on the linear predictor through a specified link function and conditional variances that are specified by a variance function, known prior weights and a scale factor. The random effects are assumed to be normally distributed with mean zero and dispersion matrix depending on unknown variance components. For problems involving time series, spatial aggregation and smoothing, the dispersion may be specified in terms of a rank deficient inverse covariance matrix. Approximation of the marginal quasi-likelihood using Laplace's method leads eventually to estimating equations based on penalized quasilikelihood or PQL for the mean parameters and pseudo-likelihood for the variances. Im...

4,317 citations


Journal ArticleDOI
TL;DR: In this paper, exact Bayesian methods for modeling categorical response data are developed using the idea of data augmentation, which can be summarized as follows: the probit regression model for binary outcomes is seen to have an underlying normal regression structure on latent continuous data, and values of the latent data can be simulated from suitable truncated normal distributions.
Abstract: A vast literature in statistics, biometrics, and econometrics is concerned with the analysis of binary and polychotomous response data. The classical approach fits a categorical response regression model using maximum likelihood, and inferences about the model are based on the associated asymptotic theory. The accuracy of classical confidence statements is questionable for small sample sizes. In this article, exact Bayesian methods for modeling categorical response data are developed using the idea of data augmentation. The general approach can be summarized as follows. The probit regression model for binary outcomes is seen to have an underlying normal regression structure on latent continuous data. Values of the latent data can be simulated from suitable truncated normal distributions. If the latent data are known, then the posterior distribution of the parameters can be computed using standard results for normal linear models. Draws from this posterior are used to sample new latent data, and t...
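As an illustration of the data-augmentation scheme sketched in the abstract, the following is a minimal Python sketch; the flat prior on the coefficients, the inverse-CDF sampling of the truncated normals, and all names are illustrative assumptions, not code from the article.

```python
import numpy as np
from scipy.stats import norm

def probit_gibbs(X, y, n_iter=2000, seed=0):
    """Data-augmentation Gibbs sampler for probit regression (illustrative sketch).

    Assumes a flat prior on beta. Latent z_i ~ N(x_i'beta, 1), truncated to
    (0, inf) when y_i = 1 and to (-inf, 0) when y_i = 0.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)          # posterior covariance of beta given z
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        mu = X @ beta
        # draw latent data from the appropriate truncated normals (inverse-CDF method)
        lo = np.where(y == 1, norm.cdf(-mu), 0.0)   # truncate to z > 0 when y = 1
        hi = np.where(y == 1, 1.0, norm.cdf(-mu))   # truncate to z < 0 when y = 0
        z = mu + norm.ppf(rng.uniform(lo, hi))
        # beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}) under the flat prior
        beta = XtX_inv @ (X.T @ z) + chol @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```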

3,272 citations


Journal ArticleDOI
TL;DR: In this paper, the Gibbs sampler is used to indirectly sample from the multinomial posterior distribution on the set of possible subset choices to identify the promising subsets by their more frequent appearance in the Gibbs sample.
Abstract: A crucial problem in building a multiple regression model is the selection of predictors to include. The main thrust of this article is to propose and develop a procedure that uses probabilistic considerations for selecting promising subsets. This procedure entails embedding the regression setup in a hierarchical normal mixture model where latent variables are used to identify subset choices. In this framework the promising subsets of predictors can be identified as those with higher posterior probability. The computational burden is then alleviated by using the Gibbs sampler to indirectly sample from this multinomial posterior distribution on the set of possible subset choices. Those subsets with higher probability—the promising ones—can then be identified by their more frequent appearance in the Gibbs sample.

2,780 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the scale estimators S_n = 1.1926 med_i {med_j |x_i − x_j|} and Q_n, given by the .25 quantile of the distances {|x_i − x_j|; i < j}, as more efficient 50% breakdown alternatives to the median absolute deviation MAD_n.
Abstract: In robust estimation one frequently needs an initial or auxiliary estimate of scale. For this one usually takes the median absolute deviation MAD_n = 1.4826 med_i {|x_i − med_j x_j|}, because it has a simple explicit formula, needs little computation time, and is very robust as witnessed by its bounded influence function and its 50% breakdown point. But there is still room for improvement in two areas: the fact that MAD_n is aimed at symmetric distributions and its low (37%) Gaussian efficiency. In this article we set out to construct explicit and 50% breakdown scale estimators that are more efficient. We consider the estimator S_n = 1.1926 med_i {med_j |x_i − x_j|} and the estimator Q_n given by the .25 quantile of the distances {|x_i − x_j|; i < j}. Note that S_n and Q_n do not need any location estimate. Both S_n and Q_n can be computed using O(n log n) time and O(n) storage. The Gaussian efficiency of S_n is 58%, whereas Q_n attains 82%. We study S_n and Q_n by means of their influence functions, their b...
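The three estimators can be coded directly; the sketch below uses the naive O(n²) formulas rather than the O(n log n) algorithms of the article, and the Gaussian-consistency factor for Q_n (about 2.2219) is supplied as an assumption.

```python
import numpy as np

def mad_n(x):
    """Median absolute deviation: 1.4826 * med_i |x_i - med_j x_j|."""
    x = np.asarray(x, dtype=float)
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def s_n(x):
    """S_n = 1.1926 * med_i { med_j |x_i - x_j| } (naive O(n^2) version)."""
    x = np.asarray(x, dtype=float)
    inner = np.median(np.abs(x[:, None] - x[None, :]), axis=1)
    return 1.1926 * np.median(inner)

def q_n(x):
    """Q_n: .25 quantile of the pairwise distances {|x_i - x_j|; i < j} (naive O(n^2)).

    The Gaussian-consistency factor (about 2.2219) is applied as an assumption;
    the finite-sample corrections from the article are omitted.
    """
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    return 2.2219 * np.quantile(np.abs(x[i] - x[j]), 0.25)
```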

1,924 citations


Journal ArticleDOI
Jun Shao1
TL;DR: In this article, the authors show that the inconsistency of leave-one-out cross-validation can be rectified by using a leave-n_v-out cross-validation with n_v, the number of observations reserved for validation, satisfying n_v/n → 1 as n → ∞.
Abstract: We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the Akaike information criterion (AIC), the C_p, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations n → ∞. We show that the inconsistency of the leave-one-out cross-validation can be rectified by using a leave-n_v-out cross-validation with n_v, the number of observations reserved for validation, satisfying n_v/n → 1 as n → ∞. This is a somewhat shocking discovery, because n_v/n → 1 is totally opposite to the popular leave-one-out recipe in cross-validation. Motivations, justifications, and discussions of some practical aspects of the use of the leave-n_v-out cross-validation method are provided, and results ...
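A rough Monte Carlo sketch of leave-n_v-out cross-validation for linear model selection; the article also analyzes balanced and analytic versions, and the splitting scheme and names below are illustrative assumptions.

```python
import numpy as np

def leave_nv_out_cv(X, y, candidate_columns, n_v, n_splits=200, seed=0):
    """Monte Carlo leave-n_v-out CV for linear model selection (illustrative sketch).

    candidate_columns: list of tuples of column indices of X, one per candidate model.
    Repeatedly holds out n_v observations, fits each candidate model on the
    remaining n - n_v by least squares, and averages the squared prediction error.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = {cols: 0.0 for cols in candidate_columns}
    for _ in range(n_splits):
        val = rng.choice(n, size=n_v, replace=False)
        train = np.setdiff1d(np.arange(n), val)
        for cols in candidate_columns:
            Xtr, Xva = X[np.ix_(train, cols)], X[np.ix_(val, cols)]
            beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
            scores[cols] += np.mean((y[val] - Xva @ beta) ** 2) / n_splits
    return min(scores, key=scores.get)   # model with smallest estimated prediction error

# e.g. candidate_columns = [(0, 1), (0, 1, 2)], with n_v chosen close to n per the article
```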

1,700 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examined the problem of selecting an Archimedean copula providing a suitable representation of the dependence structure between two variates X and Y in the light of a random sample (X_1, Y_1), …, (X_n, Y_n).
Abstract: A bivariate distribution function H(x, y) with marginals F(x) and G(y) is said to be generated by an Archimedean copula if it can be expressed in the form H(x, y) = ϕ⁻¹[ϕ{F(x)} + ϕ{G(y)}] for some convex, decreasing function ϕ defined on [0, 1] in such a way that ϕ(1) = 0. Many well-known systems of bivariate distributions belong to this class, including those of Gumbel, Ali-Mikhail-Haq, Clayton, Frank, and Hougaard. Frailty models also fall under that general prescription. This article examines the problem of selecting an Archimedean copula providing a suitable representation of the dependence structure between two variates X and Y in the light of a random sample (X_1, Y_1), …, (X_n, Y_n). The key to the estimation procedure is a one-dimensional empirical distribution function that can be constructed whether the uniform representation of X and Y is Archimedean or not, and independently of their marginals. This semiparametric estimator, based on a decomposition of Kendall's tau statistic...
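To make the generator notation concrete, here is a small sketch that evaluates H(u, v) = ϕ⁻¹[ϕ(u) + ϕ(v)] using a Clayton generator; the specific parameterization is an assumption, not taken from the article.

```python
import numpy as np

def archimedean_cdf(u, v, phi, phi_inv):
    """H = phi_inv(phi(u) + phi(v)) for an Archimedean copula evaluated at uniforms u, v."""
    return phi_inv(phi(u) + phi(v))

# Clayton generator with theta > 0 (one standard parameterization, assumed here).
theta = 2.0

def phi(t):
    return (t ** (-theta) - 1.0) / theta      # convex, decreasing, phi(1) = 0

def phi_inv(s):
    return (1.0 + theta * s) ** (-1.0 / theta)

# With marginals F and G, the joint CDF is H(x, y) = archimedean_cdf(F(x), G(y), phi, phi_inv).
print(archimedean_cdf(0.3, 0.7, phi, phi_inv))  # copula value C(0.3, 0.7)
```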

1,246 citations


Journal ArticleDOI
TL;DR: For multivariate data with a general pattern of missing values, the literature has tended to adopt the selection-modeling approach (see for example Little and Rubin); here, pattern-mixture models are proposed for this more general problem.
Abstract: Consider a random sample on variables X_1, …, X_V with some values of X_V missing. Selection models specify the distribution of X_1, …, X_V over respondents and nonrespondents to X_V, and the conditional distribution that X_V is missing given X_1, …, X_V. In contrast, pattern-mixture models specify the conditional distribution of X_1, …, X_V given that X_V is observed or missing, respectively, and the marginal distribution of the binary indicator for whether or not X_V is missing. For multivariate data with a general pattern of missing values, the literature has tended to adopt the selection-modeling approach (see for example Little and Rubin); here, pattern-mixture models are proposed for this more general problem. Pattern-mixture models are chronically underidentified; in particular, for the case of univariate nonresponse mentioned above, there are no data on the distribution of X_V given X_1, …, X_{V−1} in the stratum with X_V missing. Thus the models require restrictions or prior information to identify the paramet...

992 citations


Journal ArticleDOI
Boxin Tang1
TL;DR: It is proved that when used for integration, the sampling scheme with OA-based Latin hypercubes offers a substantial improvement over Latin hypercube sampling.
Abstract: In this article, we use orthogonal arrays (OA's) to construct Latin hypercubes. Besides preserving the univariate stratification properties of Latin hypercubes, these strength r OA-based Latin hypercubes also stratify each r-dimensional margin. Therefore, such OA-based Latin hypercubes provide more suitable designs for computer experiments and numerical integration than do general Latin hypercubes. We prove that when used for integration, the sampling scheme with OA-based Latin hypercubes offers a substantial improvement over Latin hypercube sampling.
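For context, here is a plain (non-OA-based) Latin hypercube sampler showing the univariate stratification that OA-based designs extend to r-dimensional margins; construction of the orthogonal array itself is not attempted, and the toy integrand is purely illustrative.

```python
import numpy as np

def latin_hypercube(n, d, seed=0):
    """Plain Latin hypercube sample of n points in [0, 1]^d.

    Each one-dimensional margin receives exactly one point per interval
    [k/n, (k+1)/n); OA-based Latin hypercubes additionally stratify each
    r-dimensional margin, which this sketch does not attempt.
    """
    rng = np.random.default_rng(seed)
    u = rng.random((n, d))
    perms = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (perms + u) / n

sample = latin_hypercube(n=9, d=3)
estimate = np.mean(np.prod(np.sin(np.pi * sample), axis=1))  # toy integral over [0, 1]^3
```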

768 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the number of classes C in a population (for example, the number of species of plants or animals) is reviewed; the focus is not on estimating the relative sizes of the classes but on the estimation of C itself.
Abstract: How many kinds are there? Suppose that a population is partitioned into C classes. In many situations interest focuses not on estimation of the relative sizes of the classes, but on estimation of C itself. For example, biologists and ecologists may be interested in estimating the number of species in a population of plants or animals, numismatists may be concerned with estimating the number of dies used to produce an ancient coin issue, and linguists may be interested in estimating the size of an author's vocabulary. In this article we review the problem of statistical estimation of C. Many approaches have been proposed, some purely data-analytic and others based in sampling theory. In the latter case numerous variations have been considered. The population may be finite or infinite. If finite, samples may be taken with replacement (multinomial sampling) or without replacement (hypergeometric sampling), or by Bernoulli sampling; if infinite, sampling may be multinomial or Bernoulli, or the sample may be th...

736 citations


Journal ArticleDOI
TL;DR: An iterative outlier detection and adjustment procedure is used to obtain joint estimates of model parameters and outlier effects, and the issues of spurious and masking effects are discussed.
Abstract: Time series data are often subject to uncontrolled or unexpected interventions, from which various types of outlying observations are produced. Outliers in time series, depending on their nature, may have a moderate to significant impact on the effectiveness of the standard methodology for time series analysis with respect to model identification, estimation, and forecasting. In this article we use an iterative outlier detection and adjustment procedure to obtain joint estimates of model parameters and outlier effects. Four types of outliers are considered, and the issues of spurious and masking effects are discussed. The major differences between this procedure and those proposed in earlier literature include (a) the types and effects of outliers are obtained based on less contaminated estimates of model parameters, (b) the outlier effects are estimated simultaneously using multiple regression, and (c) the model parameters and the outlier effects are estimated jointly. The sampling behavior of the test s...

Journal ArticleDOI
TL;DR: This article defines outliers in terms of their position relative to the model for the good observations, in a sense derived from Donoho and Huber.
Abstract: One approach to identifying outliers is to assume that the outliers have a different distribution from the remaining observations. In this article we define outliers in terms of their position relative to the model for the good observations. The outlier identification problem is then the problem of identifying those observations that lie in a so-called outlier region. Methods based on robust statistics and outward testing are shown to have the highest possible breakdown points in a sense derived from Donoho and Huber. But a more detailed analysis shows that methods based on robust statistics perform better with respect to worst-case behavior. A concrete outlier identifier based on a suggestion of Hampel is given.

Journal ArticleDOI
TL;DR: The product partition model as discussed by the authors assumes that the probability of any partition is proportional to a product of prior cohesions, one for each block in the partition, and given the blocks the parameters in different blocks have independent prior distributions.
Abstract: A sequence of observations undergoes sudden changes at unknown times. We model the process by supposing that there is an underlying sequence of parameters partitioned into contiguous blocks of equal parameter values; the beginning of each block is said to be a change point. Observations are then assumed to be independent in different blocks given the sequence of parameters. In a Bayesian analysis it is necessary to give probability distributions to both the change points and the parameters. We use product partition models (Barry and Hartigan 1992), which assume that the probability of any partition is proportional to a product of prior cohesions, one for each block in the partition, and that given the blocks the parameters in different blocks have independent prior distributions. Given the observations a new product partition model holds, with posterior cohesions for the blocks and new independent block posterior distributions for parameters. The product model thus provides a convenient machinery for allo...

Book ChapterDOI
TL;DR: In this paper, a generalized version of the confidence interval is defined, and the generalized confidence interval can be applied to the problem of constructing confidence intervals for the difference in two exponential means and for variance components in mixed models.
Abstract: The definition of a confidence interval is generalized so that problems such as constructing exact confidence regions for the difference in two normal means can be tackled without the assumption of equal variances. Under certain conditions, the extended definition is shown to preserve a repeated sampling property that a practitioner expects from exact confidence intervals. The proposed procedure is also applied to the problem of constructing confidence intervals for the difference in two exponential means and for variance components in mixed models. A repeated sampling property of generalized p values is also given. With this characterization one can carry out fixed level tests of parameters of continuous distributions on the basis of generalized p values. Finally, Pratt's paradox is revisited, and a procedure that resolves the paradox is given.

Journal ArticleDOI
TL;DR: In this article, a nonparametric maximum likelihood estimation of the probability of failing from a particular cause by time t in the presence of other acting causes (i.e., the cause-specific failure probability) is discussed.
Abstract: Nonparametric maximum likelihood estimation of the probability of failing from a particular cause by time t in the presence of other acting causes (i.e., the cause-specific failure probability) is discussed. A commonly used incorrect approach is to take 1 minus the Kaplan-Meier (KM) estimator (1 – KM), whereby patients who fail of extraneous causes are treated as censored observations. Examples showing the extent of bias in using the 1-KM approach are presented using clinical oncology data. This bias can be quite large if the data are uncensored or if a large percentage of patients fail from extraneous causes prior to the occurrence of failures from the cause of interest. Each cause-specific failure probability is mathematically defined as a function of all of the cause-specific hazards. Therefore, nonparametric estimates of the cause-specific failure probabilities may not be able to identify categorized covariate effects on the cause-specific hazards. These effects would be correctly identified ...
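A small sketch contrasting the cause-specific failure probability (built from all of the cause-specific hazards) with the biased 1 − KM estimate; it assumes distinct event times and an event code of 0 for censoring, both illustrative assumptions.

```python
import numpy as np

def failure_probabilities(time, event, cause):
    """Cause-specific failure probability vs. the biased 1 - KM estimate (sketch).

    time  : observed times (assumed distinct for simplicity)
    event : 0 = censored, otherwise the failure cause (1, 2, ...)
    cause : cause of interest
    Returns the sorted times and two step functions evaluated at them: the
    cumulative incidence and 1 - KM with other-cause failures treated as censored.
    """
    order = np.argsort(time)
    time = np.asarray(time, float)[order]
    event = np.asarray(event)[order]
    n = len(time)
    surv_all, km_cause, current_cif = 1.0, 1.0, 0.0
    cif, one_minus_km = [], []
    for j, e in enumerate(event):
        at_risk = n - j
        if e == cause:
            current_cif += surv_all / at_risk   # uses all-cause survival just before t
            km_cause *= 1.0 - 1.0 / at_risk     # KM that censors other causes (biased)
        if e != 0:
            surv_all *= 1.0 - 1.0 / at_risk     # overall (all-cause) survival
        cif.append(current_cif)
        one_minus_km.append(1.0 - km_cause)
    return time, np.array(cif), np.array(one_minus_km)
```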

Journal ArticleDOI
TL;DR: In this article, generalized raking is used for estimation in surveys with auxiliary information in the form of known marginal counts in a frequency table in two or more dimensions, where the original weights are derived by minimizing the total distance between original weights and new weights.
Abstract: We propose the name generalized raking for the class of procedures developed in this article, because the classical raking ratio of W. E. Deming is a special case. Generalized raking can be used for estimation in surveys with auxiliary information in the form of known marginal counts in a frequency table in two or more dimensions. An important property of the generalized raking weights is that they reproduce the known marginal counts when applied to the categorical variables that define the frequency table. Our starting point is a class of distance measures and a set of original weights in the form of the standard sampling weights 1/π k , where π k is the inclusion probability of element k. New weights are derived by minimizing the total distance between original weights and new weights. The article makes contributions in three areas: (1) statistical inference conditionally on estimated cell counts, (2) simple calculation of variance estimates for the generalized raking estimators, and (3) presen...
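Classical raking ratio (the special case mentioned above) can be sketched as iterative proportional fitting of the sampling weights to two sets of known marginal counts; the generalized procedures of the article replace this multiplicative update with other distance measures. The names below are illustrative.

```python
import numpy as np

def raking_ratio(weights, rows, cols, row_totals, col_totals, n_iter=50):
    """Classical raking of sampling weights 1/pi_k to known two-way marginal counts.

    rows, cols : category labels of the two raking variables for each sample element
    row_totals, col_totals : dicts mapping each category to its known population count
    At convergence the weighted counts reproduce both sets of marginal counts.
    """
    w = np.asarray(weights, float).copy()
    rows, cols = np.asarray(rows), np.asarray(cols)
    for _ in range(n_iter):
        for labels, totals in ((rows, row_totals), (cols, col_totals)):
            for level, total in totals.items():
                mask = labels == level
                w[mask] *= total / w[mask].sum()   # multiplicative (ratio) update
    return w
```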

Journal ArticleDOI
TL;DR: In this paper, a new class of models for nonlinear time series analysis is proposed, and a modeling procedure for building such a model is suggested, which makes use of ideas from both parametric and nonparametric statistics.
Abstract: In this article we propose a new class of models for nonlinear time series analysis, investigate properties of the proposed model, and suggest a modeling procedure for building such a model. The proposed modeling procedure makes use of ideas from both parametric and nonparametric statistics. A consistency result is given to support the procedure. For illustration we apply the proposed model and procedure to several data sets and show that the resulting models substantially improve postsample multi-step ahead forecasts over other models.

Journal ArticleDOI
TL;DR: The parameter Q = Q(F, G) as mentioned in this paper measures the overall "outlyingness" of population G relative to population F and is defined using any concept of data depth; its value ranges from 0 to 1, and is .5 when F and G are identical.
Abstract: Let F and G be the distribution functions of two given populations on Rp, p ≥ 1. We introduce and study a parameter Q = Q(F, G), which measures the overall "outlyingness" of population G relative to population F. The parameter Q can be defined using any concept of data depth. Its value ranges from 0 to 1, and is .5 when F and G are identical. We show that within the class of elliptical distributions when G departs from F in location or G has a larger spread, or both, the value of Q dwindles down from .5. Hence Q can be used to detect the loss of accuracy or precision of a manufacturing process, and thus it should serve as an important measure in quality assurance. This in fact is the reason why we refer to Q as a quality index in this article. In addition to studying the properties of Q, we provide an exact rank test for testing Q = .5 vs. Q < .5. This can be viewed as a multivariate analog of Wilcoxon's rank sum test. The tests proposed here have power against location change and scale increase simultaneo...

Journal ArticleDOI
TL;DR: In this article, the authors compared the performance of different smoothing parameterizations of the kernel density estimator, using both the asymptotic and exact mean integrated squared error.
Abstract: The basic kernel density estimator in one dimension has a single smoothing parameter, usually referred to as the bandwidth. For higher dimensions, however, there are several options for smoothing parameterization of the kernel estimator. For the bivariate case, there can be between one and three independent smoothing parameters in the estimator, which leads to a flexibility versus complexity trade-off when using this estimator in practice. In this article the performances of the different possible smoothing parameterizations are compared, using both the asymptotic and exact mean integrated squared error. Our results show that it is important to have independent smoothing parameters for each of the coordinate directions. Although this is enough for many situations, for densities with high amounts of curvature in directions different to those of the coordinate axes, substantial gains can be made by allowing the kernel mass to have arbitrary orientations. The “sphering” approaches to choosing this o...
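A short sketch of a bivariate Gaussian kernel estimator parameterized by a bandwidth matrix H, which can be constrained to the single-parameter, diagonal, or full forms compared in the article; this is an illustrative implementation, not the authors' code.

```python
import numpy as np

def kde2d(data, points, H):
    """Bivariate Gaussian KDE with bandwidth matrix H: f(x) = n^-1 sum_i K_H(x - X_i)."""
    data = np.asarray(data, float)        # (n, 2) sample
    points = np.asarray(points, float)    # (m, 2) evaluation points
    H = np.asarray(H, float)
    Hinv = np.linalg.inv(H)
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(H)))
    diff = points[:, None, :] - data[None, :, :]              # (m, n, 2)
    quad = np.einsum('mni,ij,mnj->mn', diff, Hinv, diff)      # Mahalanobis-type form
    return norm * np.exp(-0.5 * quad).mean(axis=1)

# Single smoothing parameter:  H = h**2 * np.eye(2)
# Independent per-coordinate:  H = np.diag([h1**2, h2**2])
# Full (arbitrary orientation): H = np.array([[1.0, 0.4], [0.4, 0.8]])
```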


Journal ArticleDOI
TL;DR: In this paper, the authors introduce two test procedures for the detection of multiple outliers that appear to be less sensitive to the observations they are supposed to identify, and compare them with various existing methods.
Abstract: We consider the problem of identifying and testing multiple outliers in linear models. The available outlier identification methods often do not succeed in detecting multiple outliers because they are affected by the observations they are supposed to identify. We introduce two test procedures for the detection of multiple outliers that appear to be less sensitive to this problem. Both procedures attempt to separate the data into a set of “clean” data points and a set of points that contain the potential outliers. The potential outliers are then tested to see how extreme they are relative to the clean subset, using an appropriately scaled version of the prediction error. The procedures are illustrated and compared to various existing methods, using several data sets known to contain multiple outliers. Also, the performances of both procedures are investigated by a Monte Carlo study. The data sets and the Monte Carlo indicate that both procedures are effective in the detection of multiple outliers ...

Journal ArticleDOI
TL;DR: A tree-based method for censored survival data is developed, based on maximizing the difference in survival between groups of patients represented by nodes in a binary tree; it includes a pruning algorithm with optimal properties analogous to the classification and regression tree (CART) pruning algorithm.
Abstract: A tree-based method for censored survival data is developed, based on maximizing the difference in survival between groups of patients represented by nodes in a binary tree. The method includes a pruning algorithm with optimal properties analogous to the classification and regression tree (CART) pruning algorithm. Uniform convergence of the estimates of the conditional cumulative hazard and survival functions is discussed, and an example is given to show the utility of the algorithm for developing prognostic classifications for patients.

Book ChapterDOI
TL;DR: The Fisher and Neyman-Pearson approaches to testing statistical hypotheses are compared with respect to their attitudes to the interpretation of the outcome, to power, to conditioning, and to the use of fixed significance levels as discussed by the authors.
Abstract: The Fisher and Neyman-Pearson approaches to testing statistical hypotheses are compared with respect to their attitudes to the interpretation of the outcome, to power, to conditioning, and to the use of fixed significance levels. It is argued that despite basic philosophical differences, in their main practical aspects the two theories are complementary rather than contradictory and that a unified approach is possible that combines the best features of both. As applications, the controversies about the Behrens-Fisher problem and the comparison of two binomials (2 × 2 tables) are considered from the present point of view.

Journal ArticleDOI
TL;DR: In this paper, a new approach to regression modeling of recurrent event time data is suggested and contrasted with existing methods, which do not require that an explicit model be formulated for the probabilistic association between failure times within an individual.
Abstract: Recurrent event time data are common in medical research; examples include infections in AIDS patients and seizures in epilepsy patients. In this context, as well as in the more usual context of a single failure time variable, time-dependent covariates are frequently of interest. We suggest some rate functions that might be displayed when analyzing recurrent failure time data or when the effect of a categorical time-dependent covariate is of interest. Estimators of these functions are provided along with two-sample test statistics. A new approach to regression modeling of these data is suggested and contrasted with existing methods. Our methods do not require that an explicit model be formulated for the probabilistic association between failure times within an individual. This is in line with the currently popular generalized estimating equation approach to longitudinal data. If the nature of such associations is known or is of particular interest, then alternative methods may be appropriate.

Journal ArticleDOI
TL;DR: In this article, the author discusses the importance of enhancing statistical literacy and its role in enriching our society.
Abstract: (1993). Enhancing Statistical Literacy: Enriching Our Society. Journal of the American Statistical Association: Vol. 88, No. 421, pp. 1-8.

Journal ArticleDOI
TL;DR: In this article, a method for modeling a changing periodic pattern is developed using time-varying splines, which enables this to be done relatively parsimoniously and is applied in a model used to forecast hourly electricity demand, with the periodic movements being intradaily or intraweekly.
Abstract: A method for modeling a changing periodic pattern is developed. The use of time-varying splines enables this to be done relatively parsimoniously. The method is applied in a model used to forecast hourly electricity demand, with the periodic movements being intradaily or intraweekly. The full model contains other components, including a temperature response, which is also modeled using splines.

Journal ArticleDOI
TL;DR: In this article, a monotonicity problem concerning the critical values in stepwise multiple test procedures for comparing k parameters was considered and solved for a large class of distributional settings by means of an inequality for cumulative distribution functions of test statistics.
Abstract: We consider a monotonicity problem concerning the critical values in stepwise multiple test procedures for comparing k parameters. This problem will be solved for a large class of distributional settings by means of an inequality for cumulative distribution functions of test statistics satisfying a simple monotonicity condition.

Journal ArticleDOI
TL;DR: In this article, Fisher's linear discriminant function is used to find a linear combination of markers that maximizes the sensitivity uniformly over the entire specificity range under the multivariate normal distribution model with proportional covariance matrices.
Abstract: The receiver operating characteristic (ROC) curve is a simple and meaningful measure to assess the usefulness of diagnostic markers. To use the information carried by multiple markers, we note that Fisher's linear discriminant function provides a linear combination of markers to maximize the sensitivity over the entire specificity range uniformly under the multivariate normal distribution model with proportional covariance matrices. With no restriction on covariance matrices, we also provide a solution of the best linear combination of markers in the sense that the area under the ROC curve of this combination is maximized among all possible linear combinations. We illustrate both situations discussed in the article with a cancer clinical trial data.
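A minimal sketch of combining markers with Fisher's linear discriminant coefficients and scoring the combination by the empirical area under the ROC curve; the best linear combination derived in the article for unequal covariance matrices is not reproduced here, and the names are illustrative.

```python
import numpy as np

def fisher_combination(cases, controls):
    """Fisher's linear discriminant coefficients (pooled covariance) for combining markers."""
    cases, controls = np.asarray(cases, float), np.asarray(controls, float)
    pooled = (np.cov(cases, rowvar=False) * (len(cases) - 1)
              + np.cov(controls, rowvar=False) * (len(controls) - 1)) \
             / (len(cases) + len(controls) - 2)
    return np.linalg.solve(pooled, cases.mean(0) - controls.mean(0))

def empirical_auc(scores_pos, scores_neg):
    """Empirical area under the ROC curve (Mann-Whitney form, ties counted as 1/2)."""
    diff = np.asarray(scores_pos)[:, None] - np.asarray(scores_neg)[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

# a = fisher_combination(cases, controls)
# auc = empirical_auc(cases @ a, controls @ a)   # ROC area of the linear combination
```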

Journal ArticleDOI
TL;DR: In this article, a general solution to the problem of missing covariate data under the Cox regression model is provided; the estimating function for the vector of regression parameters is an approximation to the partial likelihood score function with full covariate measurements and reduces to the pseudolikelihood score function of Self and Prentice in the special case of case-cohort designs.
Abstract: This article provides a general solution to the problem of missing covariate data under the Cox regression model. The estimating function for the vector of regression parameters is an approximation to the partial likelihood score function with full covariate measurements and reduces to the pseudolikelihood score function of Self and Prentice in the special setting of case-cohort designs. The resulting parameter estimator is consistent and asymptotically normal with a covariance matrix for which a simple and consistent estimator is provided. Extensive simulation studies show that the large-sample approximations are adequate for practical use. The proposed approach tends to be more efficient than the complete-case analysis, especially for large cohorts with infrequent failures. For case-cohort designs, the new methodology offers a variance-covariance estimator that is much easier to calculate than the existing ones and allows multiple subcohort augmentations to improve efficiency. Real data taken f...

Journal ArticleDOI
TL;DR: This article developed Bayesian model-based theory for post-stratification, which is a common technique in survey analysis for incorporating population distributions of variables into survey estimates, such as functions of means and totals.
Abstract: Post-stratification is a common technique in survey analysis for incorporating population distributions of variables into survey estimates. The basic technique divides the sample into post-strata, and computes a post-stratification weight w_h = r P_h / r_h for each sample case in post-stratum h, where r_h is the number of survey respondents in post-stratum h, P_h is the population proportion from a census, and r is the respondent sample size. Survey estimates, such as functions of means and totals, then weight cases by w_h. Variants and extensions of the method include truncation of the weights to avoid excessive variability and raking to a set of two or more univariate marginal distributions. Literature on post-stratification is limited and has mainly taken the randomization (or design-based) perspective, where inference is based on the sampling distribution with population values held fixed. This article develops Bayesian model-based theory for the method. A basic normal post-stratification mod...
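The basic weight formula w_h = r P_h / r_h can be illustrated in a few lines; the post-strata and census shares below are purely hypothetical.

```python
import numpy as np

def poststratification_weights(strata, census_proportions):
    """Weight w_h = r * P_h / r_h for each respondent in post-stratum h.

    strata             : post-stratum label for each respondent
    census_proportions : dict mapping each post-stratum h to its census share P_h
    """
    strata = np.asarray(strata)
    r = len(strata)                      # respondent sample size
    weights = np.empty(r, dtype=float)
    for h, P_h in census_proportions.items():
        mask = strata == h
        weights[mask] = r * P_h / mask.sum()   # r_h = number of respondents in h
    return weights

# 6 respondents, two post-strata with census shares 0.7 and 0.3:
w = poststratification_weights(["a", "a", "a", "a", "b", "b"], {"a": 0.7, "b": 0.3})
# a post-stratified mean of a survey variable y would be np.average(y, weights=w)
```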