Showing papers in "Journal of the American Statistical Association in 2018"


Journal ArticleDOI
TL;DR: This paper develops a nonparametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm, and shows that causal forests are pointwise consistent for the true treatment effect and have an asymptotically Gaussian and centered sampling distribution.
Abstract: Many scientific and engineering challenges—ranging from personalized medicine to customized marketing recommendations—require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical infe...
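
To make the estimand and the inferential claim concrete (notation is ours, not quoted from the paper): under unconfoundedness the causal forest targets the conditional average treatment effect, and the centered, asymptotically Gaussian limit is what licenses the plug-in confidence intervals mentioned above.

```latex
% conditional average treatment effect targeted by the forest
\tau(x) = \mathbb{E}\bigl[\, Y(1) - Y(0) \mid X = x \,\bigr],
\qquad
\frac{\hat{\tau}(x) - \tau(x)}{\sqrt{\widehat{\operatorname{Var}}[\hat{\tau}(x)]}}
\;\xrightarrow{\, d \,}\; \mathcal{N}(0, 1),
% so an asymptotic level-(1 - alpha) interval is
% \hat{\tau}(x) \pm z_{1 - \alpha/2} \sqrt{\widehat{\operatorname{Var}}[\hat{\tau}(x)]}.
```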

1,156 citations


Journal ArticleDOI
TL;DR: In this article, a general class of weights, called balancing weights, is defined that balance the weighted distributions of the covariates between treatment groups, and a new weighting scheme, the overlap weights, is proposed that minimizes the asymptotic variance of the weighted average treatment effect among the class of balancing weights.
Abstract: Covariate balance is crucial for unconfounded descriptive or causal comparisons. However, lack of balance is common in observational studies. This article considers weighting strategies for balancing covariates. We define a general class of weights—the balancing weights—that balance the weighted distributions of the covariates between treatment groups. These weights incorporate the propensity score to weight each group to an analyst-selected target population. This class unifies existing weighting methods, including commonly used weights such as inverse-probability weights as special cases. General large-sample results on nonparametric estimation based on these weights are derived. We further propose a new weighting scheme, the overlap weights, in which each unit’s weight is proportional to the probability of that unit being assigned to the opposite group. The overlap weights are bounded, and minimize the asymptotic variance of the weighted average treatment effect among the class of balancing wei...
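
A minimal numpy sketch of the two weighting schemes named above, assuming the propensity scores have already been estimated; the function name and arguments are ours, not the paper's.

```python
import numpy as np

def weighted_ate(y, z, e, scheme="overlap"):
    """Weighted difference in means for binary treatment z, outcome y, and
    estimated propensity scores e = P(Z = 1 | X).

    scheme="overlap": each unit is weighted by the probability of being
    assigned to the *opposite* group (treated: 1 - e, control: e).
    scheme="ipw":     inverse-probability weights (1/e, 1/(1 - e)), a special
    case of the balancing-weights class targeting the combined population."""
    if scheme == "overlap":
        w1, w0 = 1.0 - e, e
    elif scheme == "ipw":
        w1, w0 = 1.0 / e, 1.0 / (1.0 - e)
    else:
        raise ValueError("unknown scheme")
    mu1 = np.average(y[z == 1], weights=w1[z == 1])
    mu0 = np.average(y[z == 0], weights=w0[z == 0])
    return mu1 - mu0
```

Unlike inverse-probability weights, the overlap weights stay between 0 and 1, which is the boundedness property noted in the abstract.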

508 citations


Journal ArticleDOI
TL;DR: A general framework is developed for distribution-free predictive inference in regression, using conformal inference, which allows a prediction band for the response variable to be constructed from any estimator of the regression function, together with a model-free notion of variable importance called leave-one-covariate-out (LOCO) inference.
Abstract: We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency a...
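
A compact sketch of the split conformal variant compared above, using sklearn's LinearRegression as a stand-in for "any estimator of the regression function"; the function name, base learner, and miscoverage level are our choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal_interval(X, y, X_new, alpha=0.1, base=LinearRegression):
    """Fit on one half of the data, calibrate on the other half with absolute
    residuals; the band has finite-sample marginal coverage >= 1 - alpha
    whenever the data are exchangeable, regardless of model correctness."""
    n = len(y)
    idx = np.random.permutation(n)
    train, calib = idx[: n // 2], idx[n // 2:]
    model = base().fit(X[train], y[train])
    resid = np.abs(y[calib] - model.predict(X[calib]))
    # conformal quantile: the ceil((n_cal + 1)(1 - alpha))-th smallest residual
    k = int(np.ceil((len(calib) + 1) * (1 - alpha)))
    q = np.sort(resid)[min(k, len(calib)) - 1]  # clamped for tiny calibration sets
    pred = model.predict(X_new)
    return pred - q, pred + q
```

Full conformal refits the model for every candidate response value and is therefore far more expensive, which is the accuracy/computation tradeoff discussed in the abstract.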

449 citations


Journal ArticleDOI
TL;DR: It is shown on simulated data that the fully Bayes penalty mimics oracle performance, providing a viable alternative to cross-validation, and theory is developed for the separable and nonseparable variants of the penalty.
Abstract: Despite the wide adoption of spike-and-slab methodology for Bayesian variable selection, its potential for penalized likelihood estimation has largely been overlooked. In this paper, we bridge this gap by cross-fertilizing these two paradigms with the Spike-and-Slab LASSO procedure for variable selection and parameter estimation in linear regression. We introduce a new class of self-adaptive penalty functions that arise from a fully Bayes spike-and-slab formulation, ultimately moving beyond the separable penalty framework. A virtue of these non-separable penalties is their ability to borrow strength across coordinates, adapt to ensemble sparsity information and exert multiplicity adjustment. The Spike-and-Slab LASSO procedure harvests efficient coordinate-wise implementations with a path-following scheme for dynamic posterior exploration. We show on simulated data that the fully Bayes penalty mimics oracle performance, providing a viable alternative to cross-validation. We develop theory for the s...
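
Schematically, and in our own notation (a sketch of the usual spike-and-slab LASSO form, hedged because the abstract is truncated): each coefficient receives a two-component mixture of Laplace distributions, a concentrated "spike" and a diffuse "slab", and the fully Bayes treatment places a prior on the mixing weight shared across coordinates, which is what makes the induced penalty non-separable and lets it borrow strength across coordinates.

```latex
\pi(\beta_j \mid \theta) \;=\; \theta\, \psi(\beta_j \mid \lambda_1) \;+\; (1 - \theta)\, \psi(\beta_j \mid \lambda_0),
\qquad
\psi(\beta \mid \lambda) = \tfrac{\lambda}{2}\, e^{-\lambda |\beta|},
\quad \lambda_0 \gg \lambda_1,
\qquad
\theta \sim \operatorname{Beta}(a, b).
```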

299 citations


Journal ArticleDOI
TL;DR: In this paper, the tradeoff between privacy guarantees and the risk of the resulting statistical estimators is studied under a model of privacy in which data remain private even from the statistician.
Abstract: Working under a model of privacy in which data remain private even from the statistician, we study the tradeoff between privacy guarantees and the risk of the resulting statistical estimators. We develop private versions of classical information-theoretical bounds, in particular those due to Le Cam, Fano, and Assouad. These inequalities allow for a precise characterization of statistical rates under local privacy constraints and the development of provably (minimax) optimal estimation procedures. We provide a treatment of several canonical families of problems: mean estimation and median estimation, generalized linear models, and nonparametric density estimation. For all of these families, we provide lower and upper bounds that match up to constant factors, and exhibit new (optimal) privacy-preserving mechanisms and computationally efficient estimators that achieve the bounds. Additionally, we present a variety of experimental results for estimation problems involving sensitive data, including sal...
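
For a concrete feel of the "private even from the statistician" setting, here is randomized response, the textbook locally private mechanism for a binary attribute, together with its debiased mean estimator; this is only an illustration of local privacy, not necessarily one of the optimal mechanisms exhibited in the paper, and all names are ours.

```python
import numpy as np

def randomized_response(x, eps, rng=None):
    """Each user holds a bit x_i in {0, 1} and reports it truthfully with
    probability p = e^eps / (1 + e^eps), flipping it otherwise; the report
    satisfies eps-local differential privacy."""
    rng = rng or np.random.default_rng()
    p = np.exp(eps) / (1.0 + np.exp(eps))
    flip = rng.random(len(x)) > p
    return np.where(flip, 1 - x, x)

def debiased_mean(reports, eps):
    """E[report] = (2p - 1) * mu + (1 - p), so invert the affine map to get an
    unbiased estimate of the mean mu of the private bits."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return (reports.mean() - (1.0 - p)) / (2.0 * p - 1.0)
```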

232 citations


Journal ArticleDOI
TL;DR: In this paper, the theoretical basis for multivariate functional principal component analysis is given in terms of a Karhunen-Loeve Theorem and a relationship between univariate and multivariate FP analysis is established.
Abstract: Existing approaches for multivariate functional principal component analysis are restricted to data on the same one-dimensional interval. The presented approach focuses on multivariate functional data on different domains that may differ in dimension, such as functions and images. The theoretical basis for multivariate functional principal component analysis is given in terms of a Karhunen–Loeve Theorem. For the practically relevant case of a finite Karhunen–Loeve representation, a relationship between univariate and multivariate functional principal component analysis is established. This offers an estimation strategy to calculate multivariate functional principal components and scores based on their univariate counterparts. For the resulting estimators, asymptotic results are derived. The approach can be extended to finite univariate expansions in general, not necessarily orthonormal bases. It is also applicable for sparse functional data or data with measurement error. A flexible R implementati...
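
A rough numpy sketch, under our own naming, of the combination step described above: univariate FPCA is run separately on each element (curves, images, ...), and multivariate principal component scores are obtained from an eigendecomposition of the joint covariance of the stacked univariate scores. Forming the multivariate eigenfunctions from blocks of the eigenvectors is omitted here.

```python
import numpy as np

def mfpca_scores(univariate_scores):
    """univariate_scores: list of (n x M_j) arrays of FPCA scores, one per
    element of the multivariate functional data.
    Returns multivariate scores, eigenvalues, and per-element block sizes."""
    Xi = np.hstack(univariate_scores)        # n x (M_1 + ... + M_p)
    n = Xi.shape[0]
    Z = Xi.T @ Xi / (n - 1)                  # covariance of the stacked scores
    evals, evecs = np.linalg.eigh(Z)
    order = np.argsort(evals)[::-1]          # sort by decreasing eigenvalue
    evals, evecs = evals[order], evecs[:, order]
    rho = Xi @ evecs                         # multivariate FPC scores
    return rho, evals, [s.shape[1] for s in univariate_scores]
```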

227 citations


Journal ArticleDOI
TL;DR: In this paper, the authors move beyond Markov chain Monte Carlo (MCMC) techniques that rely on discrete-time reversible Markov processes whose transition kernels are variations of the Metropolis–Hastings algorithm.
Abstract: Many Markov chain Monte Carlo techniques currently available rely on discrete-time reversible Markov processes whose transition kernels are variations of the Metropolis–Hastings algorithm. We explo...

199 citations


Journal ArticleDOI
TL;DR: A two-step algorithm is developed to efficiently approximate the maximum likelihood estimate in logistic regression, and optimal subsampling probabilities are derived that minimize the asymptotic mean squared error of the resultant estimator.
Abstract: For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least-square esti...
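
A simplified sketch of a two-step subsampling scheme in the spirit of the abstract, not the paper's exact algorithm: a uniform pilot subsample gives rough fitted probabilities, points are then resampled with probabilities proportional to |y_i - p_i| * ||x_i|| (one simple criterion), and a weighted refit corrects for the non-uniform sampling. The names, defaults, and use of sklearn are our choices; a large C approximates the unpenalized MLE.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def two_step_subsample_logit(X, y, r0=1000, r=5000, rng=None):
    rng = rng or np.random.default_rng()
    n = len(y)
    # Step 1: uniform pilot subsample -> rough fitted probabilities.
    pilot = rng.choice(n, size=r0, replace=False)
    fit0 = LogisticRegression(C=1e6).fit(X[pilot], y[pilot])  # ~ unpenalized MLE
    p_hat = fit0.predict_proba(X)[:, 1]
    # Step 2: informative subsampling probabilities and a weighted refit.
    score = np.abs(y - p_hat) * np.linalg.norm(X, axis=1)
    probs = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)
    weights = 1.0 / probs[idx]               # undo the non-uniform sampling
    return LogisticRegression(C=1e6).fit(X[idx], y[idx], sample_weight=weights)
```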

172 citations


Journal ArticleDOI
TL;DR: A Bayesian point of view is taken, showing how to construct priors on decision tree ensembles that adapt to sparsity in the predictors by placing a sparsity-inducing Dirichlet hyperprior on the splitting proportions of the regression tree prior.
Abstract: Decision tree ensembles are an extremely popular tool for obtaining high-quality predictions in nonparametric regression problems. Unmodified, however, many commonly used decision tree ensemble methods do not adapt to sparsity in the regime in which the number of predictors is larger than the number of observations. A recent stream of research concerns the construction of decision tree ensembles that are motivated by a generative probabilistic model, the most influential method being the Bayesian additive regression trees (BART) framework. In this article, we take a Bayesian point of view on this problem and show how to construct priors on decision tree ensembles that are capable of adapting to sparsity in the predictors by placing a sparsity-inducing Dirichlet hyperprior on the splitting proportions of the regression tree prior. We characterize the asymptotic distribution of the number of predictors included in the model and show how this prior can be easily incorporated into existing Markov chai...
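
In our notation (a sketch of the idea, hedged because the abstract is truncated), the sparsity-inducing hyperprior described above is a symmetric Dirichlet on the splitting proportions over the p predictors, with each tree node drawing its split variable from those proportions; a small concentration parameter pushes mass onto a few predictors.

```latex
(s_1, \ldots, s_p) \sim \operatorname{Dirichlet}\!\left(\tfrac{\alpha}{p}, \ldots, \tfrac{\alpha}{p}\right),
\qquad
\text{split variable at each tree node} \sim \operatorname{Categorical}(s_1, \ldots, s_p).
```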

168 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed an efficient network cross-validation (NCV) approach to determine the number of communities, as well as to choose between the regular stochastic block model and the degree corrected block model (DCBM).
Abstract: The stochastic block model (SBM) and its variants have been a popular tool for analyzing large network data with community structures. In this article, we develop an efficient network cross-validation (NCV) approach to determine the number of communities, as well as to choose between the regular stochastic block model and the degree corrected block model (DCBM). The proposed NCV method is based on a block-wise node-pair splitting technique, combined with an integrated step of community recovery using sub-blocks of the adjacency matrix. We prove that the probability of under-selection vanishes as the number of nodes increases, under mild conditions satisfied by a wide range of popular community recovery algorithms. The solid performance of our method is also demonstrated in extensive simulations and two data examples. Supplementary materials for this article are available online.

164 citations


Journal ArticleDOI
TL;DR: In this article, the effects of bias correction on confidence interval coverage are studied in the context of kernel density and local polynomial regression estimation, and it is shown that bias correction can be preferred to undersmoothing for minimizing coverage error and increasing robustness to tuning parameter choice.
Abstract: Nonparametric methods play a central role in modern empirical work. While they provide inference procedures that are more robust to parametric misspecification bias, they may be quite sensitive to tuning parameter choices. We study the effects of bias correction on confidence interval coverage in the context of kernel density and local polynomial regression estimation, and prove that bias correction can be preferred to undersmoothing for minimizing coverage error and increasing robustness to tuning parameter choice. This is achieved using a novel, yet simple, Studentization, which leads to a new way of constructing kernel-based bias-corrected confidence intervals. In addition, for practical cases, we derive coverage error optimal bandwidths and discuss easy-to-implement bandwidth selectors. For interior points, we show that the MSE-optimal bandwidth for the original point estimator (before bias correction) delivers the fastest coverage error decay rate after bias correction when second-order (equi...

Journal ArticleDOI
TL;DR: In this article, the authors study a class of nonsharp null hypotheses about treatment effects in a setting with data from experiments involving members of a single connected network, including null hypotheses that limit the effect of one unit's treatment status on another according to the distance between units, for example, the hypothesis might specify that the treatment status of immediate neighbors has no effect, or that units more than two edges away have no effect.
Abstract: We study the calculation of exact p-values for a large class of nonsharp null hypotheses about treatment effects in a setting with data from experiments involving members of a single connected network. The class includes null hypotheses that limit the effect of one unit’s treatment status on another according to the distance between units, for example, the hypothesis might specify that the treatment status of immediate neighbors has no effect, or that units more than two edges away have no effect. We also consider hypotheses concerning the validity of sparsification of a network (e.g., based on the strength of ties) and hypotheses restricting heterogeneity in peer effects (so that, e.g., only the number or fraction treated among neighboring units matters). Our general approach is to define an artificial experiment, such that the null hypothesis that was not sharp for the original experiment is sharp for the artificial experiment, and such that the randomization analysis for the artificial experime...

Journal ArticleDOI
TL;DR: As discussed by the authors, the most commonly used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces.
Abstract: A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with symmetric Dirichlet weights, and put a prior on the number of components—that is, to use a mixture of finite mixtures (MFM). The most commonly used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces. Meanwhile, there are samplers for Dirichlet process mixture (DPM) models that are relatively simple and are easily adapted to new applications. It turns out that, in fact, many of the essential properties of DPMs are also exhibited by MFMs—an exchangeable partition distribution, restaurant process, random measure representation, and stick-breaking representation—and crucially, the MFM analogues are simple enough that they can be used much like the corresponding DPM properties. Consequently, many of the powerful methods developed for inference in DPMs ...

Journal ArticleDOI
TL;DR: Proposed methods for partial and point identification of the ATE under IV assumptions are reviewed, the identification results are expressed in a common notation and terminology, and a taxonomy is proposed that is based on sets of identifying assumptions.
Abstract: Several methods have been proposed for partially or point identifying the average treatment effect (ATE) using instrumental variable (IV) type assumptions. The descriptions of these methods are widespread across the statistical, economic, epidemiologic, and computer science literature, and the connections between the methods have not been readily apparent. In the setting of a binary instrument, treatment, and outcome, we review proposed methods for partial and point identification of the ATE under IV assumptions, express the identification results in a common notation and terminology, and propose a taxonomy that is based on sets of identifying assumptions. We further demonstrate and provide software for the application of these methods to estimate bounds. Supplementary materials for this article are available online.

Journal ArticleDOI
TL;DR: The Hollywood model is identified here as the canonical family of edge exchangeable distributions; it is computationally tractable, admits a clear interpretation, exhibits good theoretical properties, and performs reasonably well in estimation and prediction, as demonstrated on real network datasets.
Abstract: Many modern network datasets arise from processes of interactions in a population, such as phone calls, email exchanges, co-authorships, and professional collaborations. In such interaction networks, the edges comprise the fundamental statistical units, making a framework for edge-labeled networks more appropriate for statistical analysis. In this context we initiate the study of edge exchangeable network models and explore its basic statistical properties. Several theoretical and practical features make edge exchangeable models better suited to many applications in network analysis than more common vertex-centric approaches. In particular, edge exchangeable models allow for sparse structure and power law degree distributions, both of which are widely observed empirical properties that cannot be handled naturally by more conventional approaches. Our discussion culminates in the Hollywood model, which we identify here as the canonical family of edge exchangeable distributions. The Hollywood model i...

Journal ArticleDOI
TL;DR: In this paper, Wang et al. propose two-stage regularization methods for model selection in high-dimensional quadratic regression (QR) models that maintain the hierarchical model structure between main effects and interaction effects.
Abstract: Quadratic regression (QR) models naturally extend linear models by considering interaction effects between the covariates. To conduct model selection in QR, it is important to maintain the hierarchical model structure between main effects and interaction effects. Existing regularization methods generally achieve this goal by solving complex optimization problems, which usually demands high computational cost and hence are not feasible for high-dimensional data. This article focuses on scalable regularization methods for model selection in high-dimensional QR. We first consider two-stage regularization methods and establish theoretical properties of the two-stage LASSO. Then, a new regularization method, called regularization algorithm under marginality principle (RAMP), is proposed to compute a hierarchy-preserving regularization solution path efficiently. Both methods are further extended to solve generalized QR models. Numerical results are also shown to demonstrate performance of the methods. S...
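
A simplified two-stage lasso illustration of the hierarchy-preserving idea above (not the RAMP algorithm itself; the names and tuning values are ours): main effects are selected first, and only the quadratic and interaction terms generated by the selected main effects enter the second stage, so an interaction cannot be selected without its parents.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

def two_stage_lasso_qr(X, y, alpha1=0.1, alpha2=0.1):
    # Stage 1: lasso on the main effects only.
    stage1 = Lasso(alpha=alpha1).fit(X, y)
    keep = np.flatnonzero(stage1.coef_ != 0)
    if keep.size == 0:
        return stage1, keep, None
    # Stage 2: lasso on selected main effects plus the second-order terms
    # (squares and pairwise interactions) that they generate.
    quad = PolynomialFeatures(degree=2, include_bias=False)
    expanded = quad.fit_transform(X[:, keep])
    X2 = np.hstack([X[:, keep], expanded[:, keep.size:]])
    stage2 = Lasso(alpha=alpha2).fit(X2, y)
    return stage1, keep, stage2
```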

Journal ArticleDOI
TL;DR: A formal definition for moderated effects in terms of potential outcomes is introduced, a definition that is particularly suited to mobile interventions, where treatment occasions are numerous, individuals are not always available for treatment, and potential moderators might be influenced by past treatment.
Abstract: In mobile health interventions aimed at behavior change and maintenance, treatments are provided in real time to manage current or impending high-risk situations or promote healthy behaviors in nea...

Journal ArticleDOI
TL;DR: In this article, the authors propose a methodology for testing linear hypotheses in high-dimensional linear models that does not impose any restriction on the size of the model, i.e., on model sparsity or on the loading vector representing the hypothesis.
Abstract: We propose a methodology for testing linear hypothesis in high-dimensional linear models. The proposed test does not impose any restriction on the size of the model, i.e. model sparsity or the loading vector representing the hypothesis. Providing asymptotically valid methods for testing general linear functions of the regression parameters in high-dimensions is extremely challenging - especially without making restrictive or unverifiable assumptions on the number of non-zero elements. We propose to test the moment conditions related to the newly designed restructured regression, where the inputs are transformed and augmented features. These new features incorporate the structure of the null hypothesis directly. The test statistics are constructed in such a way that lack of sparsity in the original model parameter does not present a problem for the theoretical justification of our procedures. We establish asymptotically exact control on Type I error without imposing any sparsity assumptions on mode...

Journal ArticleDOI
TL;DR: This article establishes identifiability conditions that ensure the estimability of the latent structure matrix and proposes a likelihood-based method to estimate the latent structure from data that outperforms the existing approaches.
Abstract: This article focuses on a family of restricted latent structure models with wide applications in psychological and educational assessment, where the model parameters are restricted via a latent structure matrix to reflect prespecified assumptions on the latent attributes. Such a latent matrix is often provided by experts and assumed to be correct upon construction, yet it may be subjective and misspecified. Recognizing this problem, researchers have been developing methods to estimate the matrix from data. However, the fundamental issue of the identifiability of the latent structure matrix has not been addressed until now. The first goal of this article is to establish identifiability conditions that ensure the estimability of the structure matrix. With the theoretical development, the second part of the article proposes a likelihood-based method to estimate the latent structure from the data. Simulation studies show that the proposed method outperforms the existing approaches. We further illustra...

Journal ArticleDOI
TL;DR: A new copula model for replicated spatial data is proposed, based on the assumption that a common factor exists and affects the joint dependence of all measurements of the process; the proposed copula can model tail dependence and tail asymmetry.
Abstract: We propose a new copula model that can be used with replicated spatial data. Unlike the multivariate normal copula, the proposed copula is based on the assumption that a common factor exists and affects the joint dependence of all measurements of the process. Moreover, the proposed copula can model tail dependence and tail asymmetry. The model is parameterized in terms of a covariance function that may be chosen from the many models proposed in the literature, such as the Matern model. For some choice of common factors, the joint copula density is given in closed form and therefore likelihood estimation is very fast. In the general case, one-dimensional numerical integration is needed to calculate the likelihood, but estimation is still reasonably fast even with large datasets. We use simulation studies to show the wide range of dependence structures that can be generated by the proposed model with different choices of common factors. We apply the proposed model to spatial temperature data and com...

Journal ArticleDOI
TL;DR: As discussed by the authors, the regression kink (RK) design is an increasingly popular empirical method for estimating causal effects of policies, such as the effect of unemployment benefits on unemployment duration.
Abstract: The regression kink (RK) design is an increasingly popular empirical method for estimating causal effects of policies, such as the effect of unemployment benefits on unemployment duration. Using si...
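
For context (standard RK notation, ours rather than necessarily the paper's): the RK estimand divides the change in slope of the conditional mean outcome at the kink point of a deterministic policy formula by the known change in slope of the formula itself.

```latex
\tau_{\mathrm{RK}}
= \frac{\displaystyle \lim_{v \downarrow v_0} \frac{\partial \, \mathbb{E}[Y \mid V = v]}{\partial v}
      \;-\; \lim_{v \uparrow v_0} \frac{\partial \, \mathbb{E}[Y \mid V = v]}{\partial v}}
       {\displaystyle \lim_{v \downarrow v_0} b'(v) \;-\; \lim_{v \uparrow v_0} b'(v)},
```

where V is the assignment variable, v_0 the kink point, and b(.) the policy schedule (e.g., the benefit formula).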

Journal ArticleDOI
TL;DR: The proposed population stochastic approximation Monte Carlo algorithm is implemented on the OpenMP platform and successfully applied to identification of the genes that are associated with anticancer drug sensitivities based on the data collected in the cancer cell line encyclopedia study.
Abstract: Recent advances in high-throughput biotechnologies have provided an unprecedented opportunity for biomarker discovery, which, from a statistical point of view, can be cast as a variable selection problem. This problem is challenging due to the high-dimensional and nonlinear nature of omics data and, in general, it suffers three difficulties: (i) an unknown functional form of the nonlinear system, (ii) variable selection consistency, and (iii) high-demanding computation. To circumvent the first difficulty, we employ a feed-forward neural network to approximate the unknown nonlinear function motivated by its universal approximation ability. To circumvent the second difficulty, we conduct structure selection for the neural network, which induces variable selection, by choosing appropriate prior distributions that lead to the consistency of variable selection. To circumvent the third difficulty, we implement the population stochastic approximation Monte Carlo algorithm, a parallel adaptive Markov Chai...

Journal ArticleDOI
TL;DR: This article develops a Bayesian semiparametric analysis of moment condition models by casting the problem within the exponentially tilted empirical likelihood (ETEL) framework.
Abstract: In this article, we develop a Bayesian semiparametric analysis of moment condition models by casting the problem within the exponentially tilted empirical likelihood (ETEL) framework. We use this f...

Journal ArticleDOI
TL;DR: This work proposes unbiased estimators for a broad class of individual- and household-weighted estimands, with corresponding theoretical and estimated variances, and connects two common approaches for analyzing two-stage designs: linear regression and randomization inference.
Abstract: Two-stage randomization is a powerful design for estimating treatment effects in the presence of interference; that is, when one individual’s treatment assignment affects another individual’s outcomes. Our motivating example is a two-stage randomized trial evaluating an intervention to reduce student absenteeism in the School District of Philadelphia. In that experiment, households with multiple students were first assigned to treatment or control; then, in treated households, one student was randomly assigned to treatment. Using this example, we highlight key considerations for analyzing two-stage experiments in practice. Our first contribution is to address additional complexities that arise when household sizes vary; in this case, researchers must decide between assigning equal weight to households or equal weight to individuals. We propose unbiased estimators for a broad class of individual- and household-weighted estimands, with corresponding theoretical and estimated variances. Our second co...
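
A small pandas sketch of the weighting choice highlighted above, comparing an individual-weighted and a household-weighted difference in means for a household-level treatment indicator; the column names are illustrative rather than taken from the paper, and the paper's estimators additionally handle the within-household second stage and variance estimation.

```python
import pandas as pd

def individual_vs_household_ate(df):
    """df columns: 'household', 'treated' (0/1, constant within a household),
    and 'outcome'; returns the two weighted difference-in-means estimates."""
    # Individual-weighted: every student counts equally.
    ind = (df.loc[df.treated == 1, "outcome"].mean()
           - df.loc[df.treated == 0, "outcome"].mean())
    # Household-weighted: every household counts equally, regardless of size.
    hh = df.groupby("household").agg(treated=("treated", "first"),
                                     outcome=("outcome", "mean"))
    house = (hh.loc[hh.treated == 1, "outcome"].mean()
             - hh.loc[hh.treated == 0, "outcome"].mean())
    return ind, house
```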

Journal ArticleDOI
TL;DR: This work derives the theoretically optimal treatment rule under the risk constraint, drawing an analogy to the Neyman–Pearson lemma in the proof, and presents algorithms that can be easily implemented with any off-the-shelf quadratic programming package.
Abstract: Individualized medical decision making is often complex due to patient treatment response heterogeneity. Pharmacotherapy may exhibit distinct efficacy and safety profiles for different patient popu...

Journal ArticleDOI
TL;DR: In this article, a new surrogate model is proposed to provide efficient prediction and uncertainty quantification of turbulent flows in swirl injectors with varying geometries, devices commonly used in many engineering applications.
Abstract: In the quest for advanced propulsion and power-generation systems, high-fidelity simulations are too computationally expensive to survey the desired design space, and a new design methodology is needed that combines engineering physics, computer simulations, and statistical modeling. In this article, we propose a new surrogate model that provides efficient prediction and uncertainty quantification of turbulent flows in swirl injectors with varying geometries, devices commonly used in many engineering applications. The novelty of the proposed method lies in the incorporation of known physical properties of the fluid flow as simplifying assumptions for the statistical model. In view of the massive simulation data at hand, which is on the order of hundreds of gigabytes, these assumptions allow for accurate flow predictions in around an hour of computation time. To contrast, existing flow emulators which forgo such simplifications may require more computation time for training and prediction than is n...

Journal ArticleDOI
TL;DR: This article addresses models with intractable normalizing functions, which arise frequently in statistics; common examples include exponential random graph models for social networks and Markov point processes.
Abstract: Models with intractable normalizing functions arise frequently in statistics. Common examples of such models include exponential random graph models for social networks and Markov point processes f...

Journal ArticleDOI
TL;DR: This article proposes an alternative formulation of the estimator as a solution of an optimization problem with an estimated nuisance parameter, and derives theory involving a nonstandard convergence rate and a nonnormal limiting distribution of the quantile-optimal treatment regime.
Abstract: Finding the optimal treatment regime (or a series of sequential treatment regimes) based on individual characteristics has important applications in areas such as precision medicine, government policies and active labor market interventions. In the current literature, the optimal treatment regime is usually defined as the one that maximizes the average benefit in the potential population. This paper studies a general framework for estimating the quantile-optimal treatment regime, which is of importance in many real-world applications. Given a collection of treatment regimes, we consider robust estimation of the quantile-optimal treatment regime, which does not require the analyst to specify an outcome regression model. We propose an alternative formulation of the estimator as a solution of an optimization problem with an estimated nuisance parameter. This novel representation allows us to investigate the asymptotic theory of the estimated optimal treatment regime using empirical process techniques...

Journal ArticleDOI
TL;DR: An estimator is proposed for an optimal treatment regime composed of a sequence of decision rules, each expressible as a list of "if-then" statements that can be presented as either a paragraph or a simple flowchart immediately interpretable to domain experts.
Abstract: Precision medicine is currently a topic of great interest in clinical and intervention science. A key component of precision medicine is that it is evidence-based, i.e., data-driven, and consequently there has been tremendous interest in estimation of precision medicine strategies using observational or randomized study data. One way to formalize precision medicine is through a treatment regime, which is a sequence of decision rules, one per stage of clinical intervention, that map up-to-date patient information to a recommended treatment. An optimal treatment regime is defined as maximizing the mean of some cumulative clinical outcome if applied to a population of interest. It is well-known that even under simple generative models an optimal treatment regime can be a highly nonlinear function of patient information. Consequently, a focal point of recent methodological research has been the development of flexible models for estimating optimal treatment regimes. However, in many settings, estimati...
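
To illustrate the "if-then" list format described above, this is what an estimated regime might look like once written out; the variables, thresholds, and treatment labels are invented for illustration and are not taken from the paper.

```python
def example_decision_list(a1c, bmi):
    """One stage of a hypothetical treatment regime expressed as a decision list."""
    if a1c > 7.5:
        return "treatment A"
    elif bmi > 30:
        return "treatment B"
    else:
        return "treatment C"
```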

Journal ArticleDOI
TL;DR: This work proposes a class of models for nonmonotone missing data mechanisms that spans the MAR model while allowing the underlying full data law to remain unrestricted, and introduces an unconstrained maximum likelihood estimator of the missing data probabilities that is easily implemented using existing software.
Abstract: The development of coherent missing data models to account for nonmonotone missing at random (MAR) data by inverse probability weighting (IPW) remains to date largely unresolved. As a consequence, IPW has essentially been restricted for use only in monotone MAR settings. We propose a class of models for nonmonotone missing data mechanisms that spans the MAR model, while allowing the underlying full data law to remain unrestricted. For parametric specifications within the proposed class, we introduce an unconstrained maximum likelihood estimator for estimating the missing data probabilities which is easily implemented using existing software. To circumvent potential convergence issues with this procedure, we also introduce a constrained Bayesian approach to estimate the missing data process which is guaranteed to yield inferences that respect all model restrictions. The efficiency of standard IPW estimation is improved by incorporating information from incomplete cases through an augmented estimati...