
Showing papers in "Journal of the American Statistical Association in 2006"


Journal ArticleDOI
Hui Zou
TL;DR: A new version of the lasso is proposed, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the ℓ1 penalty, and the nonnegative garrote is shown to be consistent for variable selection.
Abstract: The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a necessary condition for the lasso variable selection to be consistent. Consequently, there exist certain scenarios where the lasso is inconsistent for variable selection. We then propose a new version of the lasso, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the ℓ1 penalty. We show that the adaptive lasso enjoys the oracle properties; namely, it performs as well as if the true underlying model were given in advance. Similar to the lasso, the adaptive lasso is shown to be near-minimax optimal. Furthermore, the adaptive lasso can be solved by the same efficient algorithm for solving the lasso. We also discuss the extension of the adaptive lasso in generalized linear models and show that the oracle properties still hold under mild regularity conditions. As a bypro...
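The weighted ℓ1 penalty is straightforward to prototype: minimizing ‖y − Xβ‖² + λ Σj ŵj|βj| with data-driven weights ŵj = 1/|β̂j|^γ (β̂ a root-n-consistent pilot estimate) reduces to an ordinary lasso after rescaling the columns of X. A minimal sketch, assuming scikit-learn is available; the OLS pilot and the fixed λ and γ below are illustrative choices, not a tuning recommendation:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def adaptive_lasso(X, y, lam=0.1, gamma=1.0):
    """Adaptive lasso via column rescaling (illustrative sketch)."""
    # Pilot estimate: OLS here; any root-n-consistent estimator works.
    beta_init = LinearRegression().fit(X, y).coef_
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)    # adaptive weights w_j
    Xs = X / w                                        # scale column j by 1/w_j
    fit = Lasso(alpha=lam).fit(Xs, y)                 # ordinary lasso on scaled data
    return fit.coef_ / w                              # undo the scaling
```

Because the weights blow up for coefficients whose pilot estimates are near zero, those variables are penalized heavily and tend to be dropped, which is the mechanism behind the oracle property.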

6,765 citations


Journal ArticleDOI
TL;DR: This work considers problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups, and considers a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process.
Abstract: We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes ...
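The atom-sharing mechanism is easy to see in a finite truncation of the generative process: global stick-breaking weights are drawn over a common set of atoms, and each group's mixing weights are then drawn from a Dirichlet distribution centered on those global weights, so all groups reuse the same atoms. A rough sketch, with the truncation level, the N(0, 1) base measure, and the concentration values all chosen for illustration:

```python
import numpy as np

def truncated_hdp(n_groups, K=50, gamma_conc=1.0, alpha0=1.0, seed=0):
    """Finite-truncation simulation of a hierarchical Dirichlet process."""
    rng = np.random.default_rng(seed)
    # Global weights beta ~ GEM(gamma), via stick-breaking truncated at K sticks.
    v = rng.beta(1.0, gamma_conc, size=K)
    beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    beta /= beta.sum()                              # renormalize the truncation
    atoms = rng.normal(size=K)                      # shared atoms phi_k ~ H = N(0, 1)
    # Group-level weights pi_j ~ DP(alpha0, sum_k beta_k delta_{phi_k}); with a
    # discrete base measure this is a finite Dirichlet over the *same* atoms.
    pi = rng.dirichlet(alpha0 * beta + 1e-10, size=n_groups)
    return beta, pi, atoms
```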

3,755 citations



Journal ArticleDOI
TL;DR: Because the text uses Excel exclusively, students entering the job market or applying for graduate school cannot claim knowledge of a statistical package—a potentially marketable skill.
Abstract: (2006). Quantitative Risk Management: Concepts, Techniques, and Tools. Journal of the American Statistical Association: Vol. 101, No. 476, pp. 1731-1732.

1,472 citations


Journal ArticleDOI
TL;DR: A general quantitative relationship between the risk as assessed using the 0–1 loss and the riskAs assessed using any nonnegative surrogate loss function is provided, and it is shown that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function.
Abstract: Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0–1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function—that it satisfies a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the...
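The main comparison can be stated compactly: with R(f) the 0–1 risk, R* the Bayes risk, and R_φ(f), R*_φ the corresponding surrogate risks, the bound has the form below, where ψ is the convex transform obtained from the loss φ by the variational calculation mentioned in the abstract (the hinge-loss case, where ψ reduces to the identity on [0, 1], is shown as an example):

```latex
\psi\bigl(R(f) - R^{*}\bigr) \;\le\; R_{\varphi}(f) - R_{\varphi}^{*},
\qquad\text{e.g. } \psi(\theta) = |\theta| \text{ for the hinge loss } \varphi(t) = \max(0,\, 1 - t).
```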

1,352 citations


Journal Article
TL;DR: In this article, the secret to improve the quality of life by reading this group-based modeling of development is found, which is a kind of book that you need now, and it can be your favorite book to read after having this book.
Abstract: Find the secret to improve the quality of life by reading this group based modeling of development. This is a kind of book that you need now. Besides, it can be your favorite book to read after having this book. Do you ask why? Well, this is a book that has different characteristic with others. You may not need to know who the author is, how well-known the work is. As wise word, never judge the words from who speaks, but make the words as your good value to your life.

864 citations


Journal ArticleDOI
TL;DR: Supervised Principal Component Analysis (SPCA) as mentioned in this paper is similar to conventional principal components analysis except that it uses a subset of the predictors selected based on their association with the outcome, and can be applied to regression and generalized regression problems, such as survival analysis.
Abstract: In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal components is similar to conventional principal components analysis except that it uses a subset of the predictors selected based on their association with the outcome. Supervised principal components can be applied to regression and generalized regression problems, such as survival analysis. It compares favorably to other techniques for this type of problem, and can also account for the effects of other covariates and help identify which predictor variables are most important. We also provide asymptotic consistency results to help support our empirical findings. These methods could become important tools for DNA microarray data, where they may be used to more accurately diagnose and treat cancer.
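The core of the procedure fits in a few lines of numpy: score each predictor by its univariate association with the outcome, keep those above a threshold, and use the leading principal component of the retained columns as the regressor. The t-like screening statistic, the single retained component, and the continuous-outcome regression below are illustrative simplifications of the method described above:

```python
import numpy as np

def supervised_pc(X, y, threshold=2.0):
    """Supervised principal components, simplified sketch for a continuous outcome."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Univariate association of each predictor with the outcome (correlation -> t-like score).
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    score = r * np.sqrt(len(y) - 2) / np.sqrt(1.0 - r ** 2)
    keep = np.abs(score) > threshold
    # Leading principal component of the retained predictors.
    U, s, Vt = np.linalg.svd(Xc[:, keep], full_matrices=False)
    pc1 = U[:, 0] * s[0]
    # Regress the outcome on the supervised principal component.
    slope = (pc1 @ yc) / (pc1 @ pc1)
    return keep, pc1, slope
```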

773 citations


Journal Article
TL;DR: In this paper, the identification and inference for econometric models essays in honor of thomas rothenberg PDF is available on our online library. But they do not provide a review of the essays.
Abstract: IDENTIFICATION AND INFERENCE FOR ECONOMETRIC MODELS ESSAYS IN HONOR OF THOMAS ROTHENBERG PDF Are you looking for Ebook identification and inference for econometric models essays in honor of thomas rothenberg PDF? You will be glad to know that right now identification and inference for econometric models essays in honor of thomas rothenberg PDF is available on our online library. With our online resources, you can find identification and inference for econometric models essays in honor of thomas rothenberg or just about any type of ebooks, for any type of product.

767 citations


Journal Article
TL;DR: Alho and Spencer, as discussed by the authors, published a book on statistical and mathematical demography; it is reviewed here alongside a new edition of Applied Mathematical Demography, whose new material focuses on matrix population models, the particular focus of the new coauthor (see, e.g., Caswell 2000).
Abstract: Here are two books on a topic new to Technometrics: statistical and mathematical demography. The first author of Applied Mathematical Demography wrote the first two editions of this book alone. The second edition was published in 1985. Professor Keyfitz noted in the Preface (p. vii) that at age 90 he had no interest in doing another edition; however, the publisher encouraged him to find a coauthor. The result is an additional focus for the book in the world of biology that makes it much more relevant for the sciences. The book is now part of the publisher’s series on Statistics for Biology and Health. Much of it, of course, focuses on the many aspects of human populations. The new material focuses on matrix population models, the particular focus of the new author (see, e.g., Caswell 2000). As one might expect from a book that was originally written in the 1970s, it does not include a lot of information on statistical computing. The new book by Alho and Spencer is focused on putting a better emphasis on statistics in the discipline of demography (Preface, p. vii). It is part of the publisher’s Series in Statistics. The authors are both statisticians, so the focus is on statistics as used for demographic problems. The authors are targeting human applications, so their perspective on science does not extend any further than epidemiology. The book actually strikes a good balance between statistical tools and demographic applications. The authors use the first two chapters to teach statisticians about the concepts of demography. The next four chapters are very similar to the statistics content found in introductory books on survival analysis, such as the recent book by Kleinbaum and Klein (2005), reported by Ziegel (2006). The next three chapters are focused on various aspects of forecasting demographic rates. The book concludes with chapters focusing on three areas of applications: errors in census numbers, financial applications, and small-area estimates.

710 citations


Journal ArticleDOI
TL;DR: A general method for assessing evidence inconsistency in the framework of Bayesian hierarchical models is proposed: evidence consistency is represented as a set of linear relations between effect parameters on the log odds ratio scale, and these relations are relaxed to allow for inconsistency by adding random inconsistency factors (ICFs) to the model.
Abstract: Randomized comparisons among several treatments give rise to an incomplete-blocks structure known as mixed treatment comparisons (MTCs). To analyze such data structures, it is crucial to assess whether the disparate evidence sources provide consistent information about the treatment contrasts. In this article we propose a general method for assessing evidence inconsistency in the framework of Bayesian hierarchical models. We begin with the distinction between basic parameters, which have prior distributions, and functional parameters, which are defined in terms of basic parameters. Based on a graphical analysis of MTC structures, evidence inconsistency is defined as a relation between a functional parameter and at least two basic parameters, supported by at least three evidence sources. The inconsistency degrees of freedom (ICDF) is the number of such inconsistencies. We represent evidence consistency as a set of linear relations between effect parameters on the log odds ratio scale, then relax these rela...
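In its smallest instance, a three-treatment loop A, B, C with direct evidence on all three contrasts, the consistency relation and its ICF relaxation read as follows (d denotes a treatment contrast on the log odds ratio scale; notation simplified from the article):

```latex
\text{consistency:}\quad d_{BC} = d_{AC} - d_{AB},
\qquad
\text{relaxed:}\quad d_{BC} = d_{AC} - d_{AB} + \omega_{ABC},
\quad \omega_{ABC} \sim N(0, \sigma_{\omega}^{2}).
```

Here d_AB and d_AC are basic parameters with priors, d_BC is a functional parameter, and this loop contributes one inconsistency degree of freedom.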

631 citations


Journal ArticleDOI
TL;DR: This work considers the problem of variable or feature selection for model-based clustering and proposes a greedy search algorithm for finding a local optimum in model space, which consistently yielded more accurate estimates of the number of groups and lower classification error rates.
Abstract: We consider the problem of variable or feature selection for model-based clustering. The problem of comparing two nested subsets of variables is recast as a model comparison problem and addressed using approximate Bayes factors. A greedy search algorithm is proposed for finding a local optimum in model space. The resulting method selects variables (or features), the number of clusters, and the clustering model simultaneously. We applied the method to several simulated and real examples and found that removing irrelevant variables often improved performance. Compared with methods based on all of the variables, our variable selection method consistently yielded more accurate estimates of the number of groups and lower classification error rates, as well as more parsimonious clustering models and easier visualization of results.

Journal ArticleDOI
TL;DR: It is shown that random forests with adaptive splitting schemes assign weights to k-PNNs in a desirable way: for the estimation at a given target point, these random forests assign voting weights to the k-PNNs of the target point according to the local importance of different input variables.
Abstract: In this article we study random forests through their connection with a new framework of adaptive nearest-neighbor methods. We introduce a concept of potential nearest neighbors (k-PNNs) and show that random forests can be viewed as adaptively weighted k-PNN methods. Various aspects of random forests can be studied from this perspective. We study the effect of terminal node sizes on the prediction accuracy of random forests. We further show that random forests with adaptive splitting schemes assign weights to k-PNNs in a desirable way: for the estimation at a given target point, these random forests assign voting weights to the k-PNNs of the target point according to the local importance of different input variables. We propose a new simple splitting scheme that achieves desirable adaptivity in a straightforward fashion. This simple scheme can be combined with existing algorithms. The resulting algorithm is computationally faster and gives comparable results. Other possible aspects of random forests, such...

Journal ArticleDOI
TL;DR: A framework for causal inference when interference is present is developed and a number of causal estimands of interest are defined; ignoring interference can lead a researcher to conclude that a treatment is beneficial when in fact it is universally harmful.
Abstract: During the past 20 years, social scientists using observational studies have generated a large and inconclusive literature on neighborhood effects. Recent workers have argued that estimates of neighborhood effects based on randomized studies of housing mobility, such as the “Moving to Opportunity” (MTO) demonstration, are more credible. These estimates are based on the implicit assumption of no interference between units; that is, a subject's value on the response depends only on the treatment to which that subject is assigned, not on the treatment assignments of other subjects. For the MTO studies, this assumption is not reasonable. Although little work has been done on the definition and estimation of treatment effects when interference is present, interference is common in studies of neighborhood effects and in many other social settings (e.g., schools and networks), and when data from such studies are analyzed under the “no-interference assumption,” very misleading inferences can result. Furthermore, ...

Journal ArticleDOI
TL;DR: This article considers the problem of modeling a class of nonstationary time series using piecewise autoregressive (AR) processes, and the minimum description length principle is applied to compare various segmented AR fits to the data.
Abstract: This article considers the problem of modeling a class of nonstationary time series using piecewise autoregressive (AR) processes. The number and locations of the piecewise AR segments, as well as the orders of the respective AR processes, are assumed unknown. The minimum description length principle is applied to compare various segmented AR fits to the data. The goal is to find the “best” combination of the number of segments, the lengths of the segments, and the orders of the piecewise AR processes. Such a “best” combination is implicitly defined as the optimizer of an objective function, and a genetic algorithm is implemented to solve this difficult optimization problem. Numerical results from simulation experiments and real data analyses show that the procedure has excellent empirical properties. The segmentation of multivariate time series is also considered. Assuming that the true underlying model is a segmented autoregression, this procedure is shown to be consistent for estimating the location of...
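The objective being optimized by the genetic algorithm can be sketched directly: for a candidate set of break points and AR orders, fit each segment by least squares and add up a code length for the breaks, the orders, the fitted parameters, and the residuals. The penalty terms below follow the generic two-part MDL form and are a simplified stand-in for the exact criterion in the article; the break encoding and base-2 logs are illustrative choices:

```python
import numpy as np

def ar_residual_var(x, p):
    """Least-squares AR(p) fit on one segment; returns the residual variance."""
    n = len(x)
    y = x[p:]
    if p == 0:
        resid = y - y.mean()
    else:
        Z = np.column_stack([np.ones(n - p)] + [x[p - k: n - k] for k in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
    return float(resid @ resid) / len(y)

def mdl_score(x, breaks, orders):
    """Approximate description length of a piecewise-AR fit (smaller is better)."""
    segments = np.split(np.asarray(x, float), breaks[1:])   # breaks[0] must be 0
    total = np.log2(len(segments)) + len(segments) * np.log2(len(x))  # cost of the breaks
    for seg, p in zip(segments, orders):
        sigma2 = ar_residual_var(seg, p)
        total += np.log2(p + 1)                              # cost of the AR order
        total += 0.5 * (p + 2) * np.log2(len(seg))           # cost of the fitted parameters
        total += 0.5 * len(seg) * np.log2(2 * np.pi * np.e * sigma2)  # cost of the residuals
    return total
```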

Journal ArticleDOI
TL;DR: This article allows the nonparametric distribution to depend on covariates through ordering the random variables building the weights in the stick-breaking representation and derives the correlation between distributions at different covariate values.
Abstract: In this article we propose a new framework for Bayesian nonparametric modeling with continuous covariates. In particular, we allow the nonparametric distribution to depend on covariates through ordering the random variables building the weights in the stick-breaking representation. We focus mostly on the class of random distributions that induces a Dirichlet process at each covariate value. We derive the correlation between distributions at different covariate values and use a point process to implement a practically useful type of ordering. Two main constructions with analytically known correlation structures are proposed. Practical and efficient computational methods are introduced. We apply our framework, through mixtures of these processes, to regression modeling, the modeling of stochastic volatility in time series data, and spatial geostatistical modeling.

Journal ArticleDOI
TL;DR: In this article, the authors consider the policy of retaining low-achieving children in kindergarten rather than promoting them to first grade and develop a causal model that allows school assignment and peer treatments to affect potential outcomes.
Abstract: This article considers the policy of retaining low-achieving children in kindergarten rather than promoting them to first grade. Under the stable unit treatment value assumption (SUTVA) as articulated by Rubin, each child at risk of retention has two potential outcomes: Y(1) if retained and Y(0) if promoted. But SUTVA is questionable, because a child's potential outcomes will plausibly depend on which school that child attends and also on treatment assignments of other children. We develop a causal model that allows school assignment and peer treatments to affect potential outcomes. We impose an identifying assumption that peer effects can be summarized through a scalar function of the vector of treatment assignments in a school. Using a large, nationally representative sample, we then estimate (1) the effect of being retained in kindergarten rather than being promoted to the first grade in schools having a low retention rate, (2) the retention effect in schools having a high retention rate, and (3) the e...

Journal ArticleDOI
TL;DR: An easy-to-implement global procedure for testing the four assumptions of the linear model is proposed, and its performance is compared with three potential competitors, including a procedure based on the Box–Cox power transformation.
Abstract: An easy-to-implement global procedure for testing the four assumptions of the linear model is proposed. The test can be viewed as a Neyman smooth test and it only relies on the standardized residual vector. If the global procedure indicates a violation of at least one of the assumptions, the components of the global test statistic can be utilized to gain insights into which assumptions have been violated. The procedure can also be used in conjunction with associated deletion statistics to detect unusual observations. Simulation results are presented indicating the sensitivity of the procedure in detecting model violations under a variety of situations, and its performance is compared with three potential competitors, including a procedure based on the Box-Cox power transformation. The procedure is demonstrated by applying it to a new car mileage data set and a water salinity data set that has been used previously to illustrate model diagnostics.

Journal ArticleDOI
TL;DR: In this article, the authors consider a method that stops the simulation when the width of a confidence interval based on an ergodic average is less than a user-specified value.
Abstract: Markov chain Monte Carlo is a method of producing a correlated sample to estimate features of a target distribution through ergodic averages. A fundamental question is when sampling should stop; that is, at what point the ergodic averages are good estimates of the desired quantities. We consider a method that stops the simulation when the width of a confidence interval based on an ergodic average is less than a user-specified value. Hence calculating a Monte Carlo standard error is a critical step in assessing the simulation output. We consider the regenerative simulation and batch means methods of estimating the variance of the asymptotic normal distribution. We give sufficient conditions for the strong consistency of both methods and investigate their finite-sample properties in various examples.
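The batch means calculation behind such a stopping rule is short: split the chain into non-overlapping batches and use the spread of the batch means to estimate the Monte Carlo standard error of the overall ergodic average. The fixed number of batches and the normal-quantile half-width below are illustrative choices (the article's conditions concern how the batch size must grow with the simulation length):

```python
import numpy as np

def batch_means_halfwidth(chain, n_batches=30, z=1.96):
    """MCSE and confidence-interval half-width via non-overlapping batch means."""
    chain = np.asarray(chain, dtype=float)
    n = (len(chain) // n_batches) * n_batches        # drop the remainder
    batch_means = chain[:n].reshape(n_batches, -1).mean(axis=1)
    est = chain[:n].mean()
    mcse = batch_means.std(ddof=1) / np.sqrt(n_batches)   # std. error of the ergodic average
    return est, mcse, z * mcse

# Fixed-width stopping: keep simulating while z * mcse exceeds the user-specified tolerance.
```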

Journal ArticleDOI
TL;DR: The Pareto distribution is a simple model for nonnegative data with a power law probability tail as mentioned in this paper, and there is a natural upper bound that truncates the probability tail.
Abstract: The Pareto distribution is a simple model for nonnegative data with a power law probability tail. In many practical applications, there is a natural upper bound that truncates the probability tail. This article derives estimators for the truncated Pareto distribution, investigates their properties, and illustrates a way to check for fit. These methods are illustrated with applications from finance, hydrology, and atmospheric science.
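For intuition, the upper-truncated Pareto on [γ, ν] has density f(x) = αγ^α x^{-(α+1)} / (1 − (γ/ν)^α), and a crude estimate of the tail exponent comes from maximizing this log-likelihood with the sample minimum and maximum plugged in for γ and ν. The one-parameter numerical fit below is a simplification of the estimators studied in the article:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def truncated_pareto_fit(x):
    """Crude MLE of the tail exponent alpha for an upper-truncated Pareto sample."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()                 # plug-ins for the truncation limits

    def neg_loglik(alpha):
        log_f = (np.log(alpha) + alpha * np.log(lo)
                 - (alpha + 1.0) * np.log(x)
                 - np.log1p(-(lo / hi) ** alpha))
        return -log_f.sum()

    res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
    return lo, hi, res.x                      # (gamma_hat, nu_hat, alpha_hat)
```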


Journal ArticleDOI
TL;DR: In this article, a subclass of generalized pivotal quantities, called fiducial generalized pivotal quantities (FGPQs), is proposed, and generalized confidence intervals constructed from FGPQs are shown to have correct frequentist coverage, at least asymptotically.
Abstract: Generalized pivotal quantities (GPQs) and generalized confidence intervals (GCIs) have proven to be useful tools for making inferences in many practical problems. Although GCIs are not guaranteed to have exact frequentist coverage, a number of published and unpublished simulation studies suggest that the coverage probabilities of such intervals are sufficiently close to their nominal value so as to be useful in practice. In this article we single out a subclass of generalized pivotal quantities, which we call fiducial generalized pivotal quantities (FGPQs), and show that under some mild conditions, GCIs constructed using FGPQs have correct frequentist coverage, at least asymptotically. We describe three general approaches for constructing FGPQs—a recipe based on invertible pivotal relationships, and two extensions of it—and demonstrate their usefulness by deriving some previously unknown GPQs and GCIs. It is fair to say that nearly every published GCI can be obtained using one of these recipes. As an inte...
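A concrete instance of the invertible-pivotal-relationship recipe: for a normal sample, R = (n − 1)s²_obs / U with U ∼ χ²_{n−1} is a fiducial GPQ for σ², and percentiles of its Monte Carlo draws give a generalized confidence interval (in this simple case it reproduces the textbook chi-squared interval, which makes it a convenient sanity check). A minimal sketch:

```python
import numpy as np

def gci_normal_variance(sample, level=0.95, n_draws=100_000, seed=0):
    """Generalized CI for a normal variance from the fiducial GPQ (n-1) s^2 / U."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)
    n, s2_obs = len(x), x.var(ddof=1)
    U = rng.chisquare(df=n - 1, size=n_draws)   # pivot: (n-1) S^2 / sigma^2 ~ chi2_{n-1}
    R = (n - 1) * s2_obs / U                    # FGPQ draws for sigma^2
    lo, hi = np.quantile(R, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi
```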


Journal ArticleDOI
TL;DR: In this paper, the authors study the asymptotic behavior of the estimate of the CDR space with high-dimensional covariates, that is, when the dimension of the covariates goes to infinity along with the sample size.
Abstract: Sliced inverse regression is a promising method for the estimation of the central dimension-reduction subspace (CDR space) in semiparametric regression models. It is particularly useful in tackling cases with high-dimensional covariates. In this article we study the asymptotic behavior of the estimate of the CDR space with high-dimensional covariates, that is, when the dimension of the covariates goes to infinity as the sample size goes to infinity. Strong and weak convergence are obtained. We also suggest an estimation procedure of the Bayes information criterion type to ascertain the dimension of the CDR space and derive the consistency. A simulation study is conducted.
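A basic sliced inverse regression step is compact in numpy: standardize the covariates, slice the sample on the response, average the standardized covariates within each slice, and take the leading eigenvectors of the between-slice covariance of those means as the estimated directions. The fixed number of slices and the plain eigendecomposition below are illustrative; the article's contribution concerns the behavior of this estimate as the covariate dimension grows, not the finite-dimensional recipe itself:

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Sliced inverse regression estimate of dimension-reduction directions."""
    n, p = X.shape
    mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = (X - mu) @ Sigma_inv_sqrt                  # standardized covariates
    # Slice on y and average the standardized covariates within each slice.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)       # weighted between-slice covariance
    # Leading eigenvectors of M, mapped back to the original covariate scale.
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_dirs]]
    return Sigma_inv_sqrt @ top                    # columns span the estimated CDR space
```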

Journal ArticleDOI
TL;DR: In this article, the authors adopt Rubin's potential outcomes framework for causal inference and propose two methods serving complementary purposes: one can be used to estimate average causal effects, assuming no confounding given measured covariates; the other can assess how the estimates might change under various departures from no confounding.
Abstract: Drawing inferences about the effects of treatments and actions is a common challenge in economics, epidemiology, and other fields. We adopt Rubin's potential outcomes framework for causal inference and propose two methods serving complementary purposes. One can be used to estimate average causal effects, assuming no confounding given measured covariates. The other can be used to assess how the estimates might change under various departures from no confounding. Both methods are developed from a nonparametric likelihood perspective. The propensity score plays a central role and is estimated through a parametric model. Under the assumption of no confounding, the joint distribution of covariates and each potential outcome is estimated as a weighted empirical distribution. Expectations from the joint distribution are estimated as weighted averages or, equivalently to first order, regression estimates. The likelihood estimator is at least as efficient and the regression estimator is at least as efficient and r...
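For orientation, the role the propensity score plays can be illustrated with a generic inverse-probability-weighted estimate of the average causal effect under no confounding. This is the familiar weighting estimator with a logistic propensity model, shown only to fix ideas; it is not the likelihood or regression estimator developed in the article:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, treat, y):
    """Inverse-probability-weighted average treatment effect (illustrative only)."""
    # Parametric propensity score model.
    e_hat = LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)[:, 1]
    w1 = treat / e_hat                     # weights for the treated
    w0 = (1 - treat) / (1 - e_hat)         # weights for the controls
    # Normalized (Hajek-type) weighted means of the two potential outcomes.
    mu1 = np.sum(w1 * y) / np.sum(w1)
    mu0 = np.sum(w0 * y) / np.sum(w0)
    return mu1 - mu0
```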


Journal Article
TL;DR: In this article, quantile autoregression (QAR) models are considered, where the autoregressive coefficients can be expressed as monotone functions of a single, scalar random variable.
Abstract: We consider quantile autoregression (QAR) models in which the autoregressive coefficients can be expressed as monotone functions of a single, scalar random variable. The models can capture systematic influences of conditioning variables on the location, scale, and shape of the conditional distribution of the response, and thus constitute a significant extension of classical constant coefficient linear time series models in which the effect of conditioning is confined to a location shift. The models may be interpreted as a special case of the general random-coefficient autoregression model with strongly dependent coefficients. Statistical properties of the proposed model and associated estimators are studied. The limiting distributions of the autoregression quantile process are derived. QAR inference methods are also investigated. Empirical applications of the model to the U.S. unemployment rate, short-term interest rate, and gasoline prices highlight the model's potential.
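A QAR(1) fit at a grid of quantiles can be sketched with statsmodels' quantile regression: regress y_t on an intercept and y_{t−1} separately at each quantile, so the autoregressive coefficient is allowed to vary with the quantile level. The single lag and the quantile grid are illustrative choices:

```python
import numpy as np
import statsmodels.api as sm

def qar1_fit(y, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Quantile autoregression of order 1: one quantile regression per tau."""
    y = np.asarray(y, dtype=float)
    X = sm.add_constant(y[:-1])                # regressors: intercept and y_{t-1}
    fits = {}
    for tau in quantiles:
        res = sm.QuantReg(y[1:], X).fit(q=tau)
        fits[tau] = res.params                 # (theta_0(tau), theta_1(tau))
    return fits
```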

Journal ArticleDOI
TL;DR: The reviewed text discusses only a very small subset of the possible statistical methods and designs available, with little or no mention of the importance of randomizing the run order of an experiment to obtain valid and interpretable results, the distinction between replicates and repeated observations, or the role of blocking or restrictions on randomization to run an experiment more effectively.
Abstract: without “bogging them down in theoretical underpinnings.” What results is a book that discusses only a very small subset of the possible statistical methods and designs available. There is little development of the concepts on which design of experiments are built. For example, there is little or no mention of the importance of randomizing the run order of an experiment to obtain valid and interpretable results, the distinction between replicates and repeated observations, or the role of blocking or restrictions on randomization to run an experiment more effectively. Often the reasons for choosing between design types (such as designs for first or second-order models, or cuboidal or spherical regions of interest) are not explained. Written at a very introductory level, the book does a good job explaining the details of running an experiment, including appropriate conversion of units for factors to the unscaled notation standard for most designs, the need for confirming results with subsequent experimentation, and how some calculations are performed for simple analyses. However, if a practitioner were to rely solely on this book to plan and run an experiment, she/he would be left with an overly simplified impression of what the goals of experimentation should be and what tools are available. Alternatives such as Box, Hunter, and Hunter (2005) and Montgomery (2001) provide a much more comprehensive treatment of the topic, complete with wonderful insights into general industrial experimentation thought, including the benefits of sequential experimentation based on learning at each stage, and the power and potential of a good experiment to provide a tailored solution to a diverse set of questions.

Journal ArticleDOI
TL;DR: A Bayesian model-based hierarchical clustering algorithm is introduced for curve data to investigate mechanisms of regulation in the genes concerned and reveals structure within the data not captured by other approaches.
Abstract: Malaria represents one of the major worldwide challenges to public health. A recent breakthrough in the study of the disease follows the annotation of the genome of the malaria parasite Plasmodium falciparum and the mosquito vector (an organism that spreads an infectious disease) Anopheles. Of particular interest is the molecular biology underlying the immune response system of Anopheles, which actively fights against Plasmodium infection. This article reports a statistical analysis of gene expression time profiles from mosquitoes that have been infected with a bacterial agent. Specifically, we introduce a Bayesian model-based hierarchical clustering algorithm for curve data to investigate mechanisms of regulation in the genes concerned; that is, we aim to cluster genes having similar expression profiles. Genes displaying similar, interesting profiles can then be highlighted for further investigation by the experimenter. We show how our approach reveals structure within the data not captured by other appro...

Journal ArticleDOI
TL;DR: The National Health Interview Survey (NHIS) provides a rich source of data for studying relationships between income and health and for monitoring health and health care for persons at different income levels.
Abstract: The National Health Interview Survey (NHIS) provides a rich source of data for studying relationships between income and health and for monitoring health and health care for persons at different income levels. However, the nonresponse rates are high for two key items, total family income in the previous calendar year and personal earnings from employment in the previous calendar year. To handle the missing data on family income and personal earnings in the NHIS, multiple imputation of these items, along with employment status and ratio of family income to the federal poverty threshold (derived from the imputed values of family income), has been performed for the survey years 1997–2004. (There are plans to continue this work for years beyond 2004 as well.) Files of the imputed values, as well as documentation, are available at the NHIS website (http://www.cdc.gov/nchs/nhis.htm). This article describes the approach used in the multiple-imputation project and evaluates the methods through analyses of the mul...

Journal ArticleDOI
TL;DR: The text provides an overview of regression methods that is particularly strong in its breadth of coverage and emphasis on insight in place of mathematical detail, and should appeal to students who learn conceptually and verbally.
Abstract: Overall, the text provides an overview of regression methods that is particularly strong in its breadth of coverage and emphasis on insight in place of mathematical detail. As intended, this well-unified approach should appeal to students who learn conceptually and verbally. While there are places in the book where the lack of mathematical exposition may pose a slight hindrance, instructors of biostatistics will find it much easier to supplement on that level rather than on the conceptual level at which the book shines brightest.