
Showing papers in "Biometrika in 2021"


Journal ArticleDOI
TL;DR: In this paper, a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies is developed, which can use any loss-minimization method, e.g., penalized regression, deep neural networks, or boosting, and can be fine-tuned by cross validation.
Abstract: Flexible estimation of heterogeneous treatment effects lies at the heart of many statistical challenges, such as personalized medicine and optimal resource allocation. In this paper, we develop a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies. We first estimate marginal effects and treatment propensities in order to form an objective function that isolates the causal component of the signal. Then, we optimize this data-adaptive objective function. Our approach has several advantages over existing methods. From a practical perspective, our method is flexible and easy to use: In both steps, we can use any loss-minimization method, e.g., penalized regression, deep neural networks, or boosting; moreover, these methods can be fine-tuned by cross validation. Meanwhile, in the case of penalized kernel regression, we show that our method has a quasi-oracle property: Even if the pilot estimates for marginal effects and treatment propensities are not particularly accurate, we achieve the same error bounds as an oracle who has a priori knowledge of these two nuisance components. We implement variants of our approach based on penalized regression, kernel ridge regression, and boosting in a variety of simulation setups, and find promising performance relative to existing baselines.
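
To make the two-step recipe concrete, here is a minimal Python sketch of the general idea: pilot fits for the outcome mean and the propensity, followed by minimization of a residual-on-residual objective. The quadratic form of the objective, the particular learners used, and the omission of cross-fitting are illustrative simplifications, not the paper's prescribed implementation.

```python
# Minimal sketch of a two-step heterogeneous-effect estimator:
# 1) fit pilot models for the outcome mean m(x) and propensity e(x);
# 2) minimize a residual-on-residual objective for tau(x).
# Learner choices are illustrative; cross-fitting is omitted for brevity.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))              # true propensity
W = rng.binomial(1, e)                      # treatment indicator
tau = 1 + X[:, 1]                           # true heterogeneous effect
Y = X[:, 0] + W * tau + rng.normal(size=n)  # outcome

# Step 1: pilot estimates of m(x) = E[Y | X] and e(x) = P(W = 1 | X).
m_hat = GradientBoostingRegressor().fit(X, Y).predict(X)
e_hat = GradientBoostingClassifier().fit(X, W).predict_proba(X)[:, 1]

# Step 2: for a linear tau(x), minimizing
#   sum_i [(Y_i - m_hat_i) - (W_i - e_hat_i) * tau(X_i)]^2
# is a weighted regression of the outcome residual divided by the treatment
# residual, with weights (W_i - e_hat_i)^2.
y_res, w_res = Y - m_hat, W - e_hat
tau_model = LinearRegression().fit(X, y_res / w_res, sample_weight=w_res**2)
print("estimated effect function:", tau_model.intercept_.round(2), tau_model.coef_.round(2))
```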

141 citations


Journal ArticleDOI
TL;DR: In this paper, maximum penalized likelihood estimates are shown to be finite in a broad class of binomial generalized linear models, including logistic, probit and log-log regression.
Abstract: Penalization of the likelihood by Jeffreys’ invariant prior, or a positive power thereof, is shown to produce finite-valued maximum penalized likelihood estimates in a broad class of binomial generalized linear models. The class of models includes logistic regression, where the Jeffreys-prior penalty is known additionally to reduce the asymptotic bias of the maximum likelihood estimator, and models with other commonly used link functions, such as probit and log-log. Shrinkage towards equiprobability across observations, relative to the maximum likelihood estimator, is established theoretically and studied through illustrative examples. Some implications of finiteness and shrinkage for inference are discussed, particularly when inference is based on Wald-type procedures. A widely applicable procedure is developed for computation of maximum penalized likelihood estimates, by using repeated maximum likelihood fits with iteratively adjusted binomial responses and totals. These theoretical results and methods underpin the increasingly widespread use of reduced-bias and similarly penalized binomial regression models in many applied fields.
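
For concreteness, the penalty in question is a positive multiple of the log-determinant of the Fisher information of the binomial generalized linear model; in schematic notation (mine), with design matrix X and the usual working-weight matrix W(β):

```latex
% Penalized log-likelihood for a binomial GLM with a Jeffreys-type penalty;
% a = 1/2 corresponds to Jeffreys' invariant prior.  Notation is schematic.
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; a \,\log \det\!\left\{ X^{\top} W(\beta)\, X \right\},
\qquad a > 0 .
```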

66 citations


Journal ArticleDOI
TL;DR: This work proposes a general framework based on selectively traversed accumulation rules (STAR) for interactive multiple testing with generic structural constraints on the rejection set, and suggests update rules for a variety of applications with complex structural constraints.
Abstract: We propose a general framework based on selectively traversed accumulation rules (STAR) for interactive multiple testing with generic structural constraints on the rejection set. It combines accumulation tests from ordered multiple testing with data-carving ideas from post-selection inference, allowing for highly flexible adaptation to generic structural information. Our procedure defines an interactive protocol for gradually pruning a candidate rejection set, beginning with the set of all hypotheses and shrinking with each step. By restricting the information at each step via a technique we call masking, our protocol enables interaction while controlling the false discovery rate (FDR) in finite samples for any data-adaptive update rule that the analyst may choose. We suggest update rules for a variety of applications with complex structural constraints, show that STAR performs well for problems ranging from convex region detection to FDR control on directed acyclic graphs, and show how to extend it to regression problems where knockoff statistics are available in lieu of $p$-values.

48 citations


Journal ArticleDOI
TL;DR: In this article, optimal subsampling for quantile regression is investigated; algorithms based on the optimal subsampling probabilities are proposed, and the asymptotic distributions and optimality of the resulting estimators are established.
Abstract: We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the asymptotic variance-covariance matrix for a linearly transformed parameter estimator and the other minimizes that of the original parameter estimator. The former does not depend on the densities of the responses given covariates and is easy to implement. Algorithms based on optimal subsampling probabilities are proposed and asymptotic distributions and asymptotic optimality of the resulting estimators are established. Furthermore, we propose an iterative subsampling procedure based on the optimal subsampling probabilities in the linearly transformed parameter estimation which has great scalability to utilize available computational resources. In addition, this procedure yields standard errors for parameter estimators without estimating the densities of the responses given the covariates. We provide numerical examples based on both simulated and real data to illustrate the proposed method.
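
The overall pattern of such a procedure can be illustrated with a toy sketch: a pilot fit on a small uniform subsample, pilot-based subsampling probabilities, and a weighted quantile fit on the informative subsample. The specific probability formula below, proportional to |τ − 1{y ≤ xᵀβ̃}|·‖x‖, and the use of an off-the-shelf quantile regression routine are illustrative choices rather than the authors' exact algorithm.

```python
# Toy sketch of subsampling for quantile regression: pilot fit on a uniform
# subsample, pilot-based subsampling probabilities, weighted fit on the
# informative subsample.  Formulas and learners are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
tau, n, r0, r = 0.75, 100_000, 1_000, 2_000
X = sm.add_constant(rng.normal(size=(n, 3)))
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.standard_t(df=3, size=n)

def qr_fit(Xs, ys, weights=None):
    # Weighted check loss: since rho_tau(w*u) = w*rho_tau(u) for w > 0,
    # a weighted fit equals an unweighted fit on row-scaled data.
    if weights is not None:
        Xs, ys = Xs * weights[:, None], ys * weights
    return sm.QuantReg(ys, Xs).fit(q=tau).params

# Step 1: pilot estimate from a small uniform subsample.
idx0 = rng.choice(n, r0, replace=False)
beta_pilot = qr_fit(X[idx0], y[idx0])

# Step 2: subsampling probabilities, e.g. proportional to
# |tau - 1{y <= x' beta_pilot}| * ||x||  (a density-free choice).
score = np.abs(tau - (y <= X @ beta_pilot)) * np.linalg.norm(X, axis=1)
probs = score / score.sum()

# Step 3: informative subsample, fitted with inverse-probability weights.
idx = rng.choice(n, r, replace=True, p=probs)
beta_sub = qr_fit(X[idx], y[idx], weights=1.0 / (r * probs[idx]))
print("pilot:", beta_pilot.round(3), "\nsubsample:", beta_sub.round(3))
```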

46 citations


Journal ArticleDOI
TL;DR: A definition of the causal excursion effect is proposed that can be used in primary aim analyses of micro-randomized trials with binary outcomes, and a semiparametric, locally efficient estimator of the causal effect is developed.
Abstract: Advances in wearables and digital technology now make it possible to deliver behavioral mobile health interventions to individuals in their everyday life. The micro-randomized trial is increasingly used to provide data to inform the construction of these interventions. In a micro-randomized trial, each individual is repeatedly randomized among multiple intervention options, often hundreds or even thousands of times, over the course of the trial. This work is motivated by multiple micro-randomized trials that have been conducted or are currently in the field, in which the primary outcome is a longitudinal binary outcome. The primary aim of such micro-randomized trials is to examine whether a particular time-varying intervention has an effect on the longitudinal binary outcome, often marginally over all but a small subset of the individual's data. We propose the definition of causal excursion effect that can be used in such primary aim analysis for micro-randomized trials with binary outcomes. Under rather restrictive assumptions one can, based on existing literature, derive a semiparametric, locally efficient estimator of the causal effect. Starting from this estimator, we develop an estimator that can be used as the basis of a primary aim analysis under more plausible assumptions. Simulation studies are conducted to compare the estimators. We illustrate the developed methods using data from the micro-randomized trial, BariFit. In BariFit, the goal is to support weight maintenance for individuals who received bariatric surgery.

43 citations


Journal ArticleDOI
TL;DR: This work provides the first consistency guarantees, both uniform and high-dimensional, of a greedy permutation-based search over the edge-graph of a sub-polytope of the permutohedron, called the DAG associahedron.
Abstract: Directed acyclic graphical models, or DAG models, are widely used to represent complex causal systems. Since the basic task of learning such a model from data is NP-hard, a standard approach is greedy search over the space of directed acyclic graphs or Markov equivalence classes of directed acyclic graphs. As the space of directed acyclic graphs on $p$ nodes and the associated space of Markov equivalence classes are both much larger than the space of permutations, it is desirable to consider permutation-based greedy searches. Here, we provide the first consistency guarantees, both uniform and high-dimensional, of a greedy permutation-based search. This search corresponds to a simplex-like algorithm operating over the edge-graph of a sub-polytope of the permutohedron, called a DAG associahedron. Every vertex in this polytope is associated with a directed acyclic graph, and hence with a collection of permutations that are consistent with the directed acyclic graph ordering. A walk is performed on the edges of the polytope maximizing the sparsity of the associated directed acyclic graphs. We show via simulated and real data that this permutation search is competitive with current approaches.

37 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that a class of parameters with the mixed bias property admits rate doubly robust estimators, i.e., estimators that are consistent and asymptotically normal whenever both nuisance functions are estimated at sufficiently fast rates, with the possibility of trading off a slower rate of convergence for one nuisance function against a faster rate for the other.
Abstract: In this article we study a class of parameters with the so-called `mixed bias property'. For parameters with this property, the bias of the semiparametric efficient one step estimator is equal to the mean of the product of the estimation errors of two nuisance functions. In non-parametric models, parameters with the mixed bias property admit so-called rate doubly robust estimators, i.e. estimators that are consistent and asymptotically normal when one succeeds in estimating both nuisance functions at sufficiently fast rates, with the possibility of trading off slower rates of convergence for the estimator of one of the nuisance functions with faster rates for the estimator of the other nuisance. We show that the class of parameters with the mixed bias property strictly includes two recently studied classes of parameters which, in turn, include many parameters of interest in causal inference. We characterize the form of parameters with the mixed bias property and of their influence functions. Furthermore, we derive two functional moment equations, each being solved at one of the two nuisance functions, as well as, two functional loss functions, each being minimized at one of the two nuisance functions. These loss functions can be used to derive loss based penalized estimators of the nuisance functions.
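
Schematically, writing a and b for the two nuisance functions, â and b̂ for their estimators, and ψ̂₁ for the one-step estimator of ψ, the defining property reads as follows, where S(O) is a parameter-specific weighting factor and the notation is mine:

```latex
% The mixed bias property, schematically: the bias of the one-step estimator
% equals the mean of the product of the two nuisance estimation errors,
% weighted by a parameter-specific factor S(O).  Notation is illustrative.
E(\hat\psi_{1}) - \psi
 \;=\; E\!\left[\, S(O)\,\{\hat a(O) - a(O)\}\,\{\hat b(O) - b(O)\} \,\right] .
```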

31 citations


Journal ArticleDOI
TL;DR: In this article, the authors study the asymptotic properties of covariate adjustment under the potential outcomes model and propose a bias-corrected estimator that is consistent and asymptotically normal under weaker conditions.
Abstract: Randomized experiments have become important tools in empirical research. In a completely randomized treatment-control experiment, the simple difference in means of the outcome is unbiased for the average treatment effect, and covariate adjustment can further improve the efficiency without assuming a correctly specified outcome model. In modern applications, experimenters often have access to many covariates, motivating the need for a theory of covariate adjustment under the asymptotic regime with a diverging number of covariates. We study the asymptotic properties of covariate adjustment under the potential outcomes model and propose a bias-corrected estimator that is consistent and asymptotically normal under weaker conditions. Our theory is purely randomization-based without imposing any parametric outcome model assumptions. To prove the theoretical results, we develop novel vector and matrix concentration inequalities for sampling without replacement.

30 citations


Journal ArticleDOI
TL;DR: In this article, a conditional central limit theorem for data oblivious sketches is proved for Gaussian, Hadamard and Clarkson-Woodruff estimators, and the authors show that the best sketching algorithm in terms of mean square error is related to the signal to noise ratio in the source dataset.
Abstract: Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a smaller surrogate dataset. Typically, inference proceeds on the compressed dataset. Sketching algorithms generally use random projections to compress the original dataset and this stochastic generation process makes them amenable to statistical analysis. We argue that the sketched data can be modelled as a random sample, thus placing this family of data compression methods firmly within an inferential framework. In particular, we focus on the Gaussian, Hadamard and Clarkson-Woodruff sketches, and their use in single pass sketching algorithms for linear regression with huge $n$. We explore the statistical properties of sketched regression algorithms and derive new distributional results for a large class of sketched estimators. A key result is a conditional central limit theorem for data oblivious sketches. An important finding is that the best choice of sketching algorithm in terms of mean square error is related to the signal to noise ratio in the source dataset. Finally, we demonstrate the theory and the limits of its applicability on two real datasets.
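
As a toy illustration of the single-pass idea, a Gaussian sketch compresses (X, y) by a random projection S and the small sketched least-squares problem is solved instead. The snippet below is a generic sketch of this recipe, not the paper's code; the Hadamard or Clarkson–Woodruff sketches would replace the dense Gaussian S with structured or sparse projections, and the problem size is kept modest so the dense sketch matrix fits in memory.

```python
# Minimal illustration of sketched least squares: compress (X, y) with a
# random projection S and solve the small m x p problem.
import numpy as np

rng = np.random.default_rng(2)
n, p, m = 20_000, 10, 500             # large n, modest p, sketch size m
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(size=n)

# Gaussian sketch: S has i.i.d. N(0, 1/m) entries, so E[S'S] = I.
S = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
X_s, y_s = S @ X, S @ y               # in principle a single pass over the data
beta_sketch, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
print("max abs difference from full-data fit:", np.abs(beta_sketch - beta_full).max())
```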

27 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a theoretical and computational study of the properties of the interventional (in)direct effect estimands based on the efficient influence function (EIF) in the non-parametric statistical model.
Abstract: Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects in the presence of a mediator-outcome confounder affected by exposure. We present a theoretical and computational study of the properties of the interventional (in)direct effect estimands based on the efficient influence function (EIF) in the non-parametric statistical model. We use the EIF to develop two asymptotically optimal, non-parametric estimators that leverage data-adaptive regression for estimation of the nuisance parameters: a one-step estimator and a targeted minimum loss estimator. A free and open-source R package implementing our proposed estimators is made available on GitHub. We further present results establishing the conditions under which these estimators are consistent, multiply robust, $n^{1/2}$-consistent and efficient. We illustrate the finite-sample performance of the estimators and corroborate our theoretical results in a simulation study. We also demonstrate the use of the estimators in our motivating application to elucidate the mechanisms behind the unintended harmful effects that a housing intervention had on adolescent girls' risk behavior.

27 citations


Journal ArticleDOI
TL;DR: A study of this limiting chain allows us to provide parameter dimension-dependent guidelines on how to optimally scale a normal random walk proposal, and the number of Monte Carlo samples for the pseudo-marginal method in the large-sample regime.
Abstract: The pseudo-marginal algorithm is a variant of the Metropolis–Hastings algorithm which samples asymptotically from a probability distribution when it is only possible to estimate unbiasedly an unnormalized version of its density. Practically, one has to trade off the computational resources used to obtain this estimator against the asymptotic variances of the ergodic averages obtained by the pseudo-marginal algorithm. Recent works on optimizing this trade-off rely on some strong assumptions, which can cast doubts over their practical relevance. In particular, they all assume that the distribution of the difference between the log-density and its estimate is independent of the parameter value at which it is evaluated. Under regularity conditions we show that as the number of data points tends to infinity, a space-rescaled version of the pseudo-marginal chain converges weakly to another pseudo-marginal chain for which this assumption indeed holds. A study of this limiting chain allows us to provide parameter dimension-dependent guidelines on how to optimally scale a normal random walk proposal, and the number of Monte Carlo samples for the pseudo-marginal method in the large-sample regime. These findings complement and validate currently available results.
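
To fix ideas, a small sketch of the pseudo-marginal Metropolis–Hastings recursion is given below: the intractable likelihood is replaced by an unbiased importance-sampling estimate based on a fixed number of Monte Carlo samples, and that estimate is carried along with the current state. The toy model, the number of samples and the proposal scale are purely illustrative; the paper's contribution is precisely guidance on how such tuning parameters should scale in the large-sample regime.

```python
# Pseudo-marginal Metropolis-Hastings sketch: replace the intractable
# likelihood with an unbiased Monte Carlo estimate and carry the estimate
# along with the current state.  Model and tuning choices are illustrative.
import numpy as np

rng = np.random.default_rng(3)
# Toy latent-variable model: z_i ~ N(theta, 1), y_i | z_i ~ N(z_i, 1).
# The marginal likelihood is tractable here, but it is estimated by
# importance sampling as a pseudo-marginal method would.
y = rng.normal(loc=1.5, scale=np.sqrt(2.0), size=50)

def loglik_hat(theta, n_mc=30):
    # Unbiased estimate of prod_i p(y_i | theta) via z-samples from p(z | theta).
    z = rng.normal(loc=theta, scale=1.0, size=(n_mc, y.size))
    w = np.exp(-0.5 * (y - z) ** 2) / np.sqrt(2 * np.pi)   # p(y_i | z)
    return np.log(w.mean(axis=0)).sum()

theta, ll = 0.0, loglik_hat(0.0)
samples = []
for _ in range(5000):
    prop = theta + 0.3 * rng.normal()                      # random-walk proposal
    ll_prop = loglik_hat(prop)
    # Flat prior: accept with probability min(1, estimated likelihood ratio).
    if np.log(rng.uniform()) < ll_prop - ll:
        theta, ll = prop, ll_prop                          # keep the estimate!
    samples.append(theta)
print("posterior mean (approx):", np.mean(samples[1000:]))
```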

Journal ArticleDOI
TL;DR: In this article, the authors compare Chatterjee's rank correlation with three classical rank correlations, namely Hoeffding's D, Blum–Kiefer–Rosenblatt's R and Bergsma–Dassios–Yanagimoto's τ, and show that it is rate-suboptimal against popular local alternatives in the independence testing literature.
Abstract: Recently, Chatterjee (2020) introduced a new rank correlation that has attracted considerable attention from statisticians. This paper compares it to three well-established rank correlations from the literature: Hoeffding’s D, Blum–Kiefer–Rosenblatt’s R, and Bergsma–Dassios–Yanagimoto’s τ. Three criteria are considered: (i) computational efficiency, (ii) consistency against fixed alternatives, and (iii) power against local alternatives. Our main results show the unfortunate rate sub-optimality of Chatterjee’s rank correlation against three popular local alternatives in the independence testing literature. Along with some recent computational breakthroughs, they favor the other three in many settings.
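
For readers unfamiliar with the new coefficient, Chatterjee's ξ has a strikingly simple form: sort the observations by x and measure how much the ranks of y jump between consecutive observations. The snippet below implements the no-ties version of the formula; tie handling and the competing coefficients D, R and τ are omitted.

```python
# Chatterjee's rank correlation xi_n (no-ties version): sort by x, take the
# ranks of y in that order, and measure how much consecutive ranks jump.
import numpy as np

def chatterjee_xi(x, y):
    order = np.argsort(x)                          # sort observations by x
    ranks = np.argsort(np.argsort(y[order])) + 1   # ranks of y in x-order
    n = len(x)
    return 1 - 3 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

rng = np.random.default_rng(4)
x = rng.uniform(size=2000)
print(chatterjee_xi(x, np.sin(8 * np.pi * x)),     # strong non-monotone dependence: near 1
      chatterjee_xi(x, rng.uniform(size=2000)))    # independence: near 0
```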

Journal ArticleDOI
TL;DR: This paper proposes distributed algorithms which account for the heterogeneous distributions by allowing site-specific nuisance parameters and establishes the non-asymptotic risk bound of the proposed distributed estimator and its limiting distribution in the two-index asymptotic setting.
Abstract: In multicenter research, individual-level data are often protected against sharing across sites. To overcome the barrier of data sharing, many distributed algorithms, which only require sharing aggregated information, have been developed. The existing distributed algorithms usually assume the data are homogeneously distributed across sites. This assumption ignores the important fact that the data collected at different sites may come from various sub-populations and environments, which can lead to heterogeneity in the distribution of the data. Ignoring the heterogeneity may lead to erroneous statistical inference. In this paper, we propose distributed algorithms which account for the heterogeneous distributions by allowing site-specific nuisance parameters. The proposed methods extend the surrogate likelihood approach to the heterogeneous setting by applying a novel density ratio tilting method to the efficient score function. The proposed algorithms maintain the same communication cost as the existing communication-efficient algorithms. We establish a non-asymptotic risk bound for the proposed distributed estimator and its limiting distribution in the two-index asymptotic setting, which allows both the sample size per site and the number of sites to go to infinity. In addition, we show that the asymptotic variance of the estimator attains the Cramér–Rao lower bound when the number of sites grows at a slower rate than the sample size at each site. Finally, we use simulation studies and a real data application to demonstrate the validity and feasibility of the proposed methods.

Journal ArticleDOI
TL;DR: The asymptotic normality of underlying DRO estimators as well as the properties of an optimal (in a suitable sense) confidence region induced by the Wasserstein DRO formulation are studied.
Abstract: Wasserstein distributionally robust optimization estimators are obtained as solutions of min-max problems in which the statistician selects a parameter minimizing the worst-case loss among all probability models within a certain distance (in a Wasserstein sense) from the underlying empirical measure. While motivated by the need to identify optimal model parameters or decision choices that are robust to model misspecification, these distributionally robust estimators recover a wide range of regularized estimators, including square-root lasso and support vector machines, among others, as particular cases. This paper studies the asymptotic normality of these distributionally robust estimators as well as the properties of an optimal (in a suitable sense) confidence region induced by the Wasserstein distributionally robust optimization formulation. In addition, key properties of min-max distributionally robust optimization problems are also studied, for example, we show that distributionally robust estimators regularize the loss based on its derivative and we also derive general sufficient conditions which show the equivalence between the min-max distributionally robust optimization problem and the corresponding max-min formulation.

Journal ArticleDOI
TL;DR: This work lays the foundation for a general theory of elicitation complexity, including several basic results about how elicitation complexity behaves, and the complexity of standard properties of interest.
Abstract: A property, or statistical functional, is said to be elicitable if it minimizes expected loss for some loss function. The study of which properties are elicitable sheds light on the capabilities and limitations of point estimation and empirical risk minimization. While recent work asks which properties are elicitable, we instead advocate for a more nuanced question: how many dimensions are required to indirectly elicit a given property? This number is called the elicitation complexity of the property. We lay the foundation for a general theory of elicitation complexity, including several basic results about how elicitation complexity behaves, and the complexity of standard properties of interest. Building on this foundation, our main result gives tight complexity bounds for the broad class of Bayes risks. We apply these results to several properties of interest, including variance, entropy, norms, and several classes of financial risk measures. We conclude with discussion and open directions.

Journal ArticleDOI
TL;DR: In this article, the authors proposed several estimators for model free inference on average treatment effect defined as the difference between response means under two treatments, and established asymptotic normality of the proposed estimators under all popular covariate-adaptive randomization schemes including the minimization whose theoretical property is unclear.
Abstract: Covariate-adaptive randomization schemes such as minimization and stratified permuted blocks are often applied in clinical trials to balance treatment assignments across prognostic factors. The existing theoretical developments on inference after covariate-adaptive randomization are mostly limited to situations where a correct model between the response and covariates can be specified or the randomization method has well-understood properties. Based on stratification with the covariate levels used in randomization and further adjustment for covariates not used in randomization, in this article we propose several estimators for model-free inference on the average treatment effect, defined as the difference between response means under two treatments. We establish asymptotic normality of the proposed estimators under all popular covariate-adaptive randomization schemes, including minimization, whose theoretical properties were previously unclear, and we show that the asymptotic distributions are invariant with respect to covariate-adaptive randomization methods. Consistent variance estimators are constructed for asymptotic inference. Asymptotic relative efficiencies and finite sample properties of the estimators are also studied. We recommend using one of our proposed estimators for valid and model-free inference after covariate-adaptive randomization.
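
The simplest estimator of the kind described is the stratified difference in means over the strata formed by the covariate levels used in randomization; in notation of my own, with n_s units in stratum s and Ȳ_{s,w} the mean response under treatment w, it is the quantity below, which the paper's proposals further adjust using covariates not employed in the randomization:

```latex
% Stratified difference-in-means estimator over the randomization strata s,
% with n_s units in stratum s and \bar Y_{s,w} the mean response under arm w.
\hat\tau \;=\; \sum_{s} \frac{n_s}{n}\,\left( \bar{Y}_{s,1} - \bar{Y}_{s,0} \right) .
```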

Journal ArticleDOI
TL;DR: In this paper, adaptive Markov chain Monte Carlo algorithms are proposed for Bayesian variable selection in large-p, small-n settings, exploiting the observation that the majority of the p variables will be approximately uncorrelated a posteriori.
Abstract: The availability of datasets with large numbers of variables is rapidly increasing. The effective application of Bayesian variable selection methods for regression with these datasets has proved difficult since available Markov chain Monte Carlo methods do not perform well in typical problem sizes of interest. We propose new adaptive Markov chain Monte Carlo algorithms to address this shortcoming. The adaptive design of these algorithms exploits the observation that in large-p, small-n settings, the majority of the p variables will be approximately uncorrelated a posteriori. The algorithms adaptively build suitable nonlocal proposals that result in moves with squared jumping distance significantly larger than standard methods. Their performance is studied empirically in high-dimensional problems and speed-ups of up to four orders of magnitude are observed.

Journal ArticleDOI
TL;DR: If the predictor is augmented by an artificially generated random vector, then the parts of the eigenvectors of the matrix induced by the augmentation display a pattern that reveals information about the order to be determined, which greatly enhances the accuracy of order determination.
Abstract: In many dimension reduction problems in statistics and machine learning, such as in principal component analysis, canonical correlation analysis, independent component analysis and sufficient dimension reduction, it is important to determine the dimension of the reduced predictor, which often amounts to estimating the rank of a matrix. This problem is called order determination. In this article, we propose a novel and highly effective order-determination method based on the idea of predictor augmentation. We show that if the predictor is augmented by an artificially generated random vector, then the parts of the eigenvectors of the matrix induced by the augmentation display a pattern that reveals information about the order to be determined. This information, when combined with the information provided by the eigenvalues of the matrix, greatly enhances the accuracy of order determination.
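
A rough numerical caricature of the augmentation idea, in the simplest setting of principal component analysis, is sketched below: append columns of pure noise to the predictor and examine how much each leading eigenvector of the augmented covariance loads on the noise coordinates. This deliberately ignores the eigenvalue information and the averaging over repeated augmentations that the proposed method also uses, and the cut-off is ad hoc.

```python
# Caricature of the predictor-augmentation idea in the simplest setting (PCA):
# append r pure-noise columns, then check how much each leading eigenvector of
# the augmented sample covariance loads on the noise block.  Eigenvectors of
# genuine signal directions load on it only negligibly.  The 0.3 cut-off and
# the single augmentation (no averaging over repetitions) are ad hoc choices.
import numpy as np

rng = np.random.default_rng(5)
n, p, d = 500, 20, 3                              # true order d = 3
loadings = 3 * rng.normal(size=(d, p))
X = rng.normal(size=(n, d)) @ loadings + rng.normal(size=(n, p))

r = 2 * p                                         # number of augmented columns
X_aug = np.hstack([X, rng.normal(size=(n, r))])   # augmentation by pure noise
eigvals, eigvecs = np.linalg.eigh(np.cov(X_aug, rowvar=False))
eigvecs = eigvecs[:, ::-1]                        # leading eigenvectors first

noise_mass = (eigvecs[p:, :p] ** 2).sum(axis=0)   # mass on the noise block
print("mass on augmented block:", noise_mass.round(2))
print("estimated order:", int(np.argmax(noise_mass > 0.3)))  # first "noisy" eigenvector
```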

Journal ArticleDOI
TL;DR: A fast algorithm is described and proved to control the relevant error rates under certain assumptions on the dependence between the $p$-values; simulations show that it provides the desired guarantees under a range of dependency structures.
Abstract: We introduce a multiple testing procedure that controls global error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses that are organized hierarchically in a tree structure. We describe a fast algorithm and prove that it controls relevant error rates given certain assumptions on the dependence between the $p$-values. Through simulations, we demonstrate that the proposed procedure provides the desired guarantees under a range of dependency structures and that it has the potential to gain power over alternative methods. Finally, we apply the method to studies on the genetic regulation of gene expression across multiple tissues and on the relation between the gut microbiome and colorectal cancer.

Journal ArticleDOI
TL;DR: In this article, confidence intervals for conditional treatment effects that are uniformly valid, regardless of whether the outcome model is correct, are obtained by incorporating an additional model for the treatment selection mechanism.
Abstract: Eliminating the effect of confounding in observational studies typically involves fitting a model for an outcome adjusted for covariates. When, as often, these covariates are high-dimensional, this necessitates the use of sparse estimators, such as the lasso, or other regularization approaches. Naïve use of such estimators yields confidence intervals for the conditional treatment effect parameter that are not uniformly valid. Moreover, as the number of covariates grows with the sample size, correctly specifying a model for the outcome is nontrivial. In this article we deal with both of these concerns simultaneously, obtaining confidence intervals for conditional treatment effects that are uniformly valid, regardless of whether the outcome model is correct. This is done by incorporating an additional model for the treatment selection mechanism. When both models are correctly specified, we can weaken the standard conditions on model sparsity. Our procedure extends to multivariate treatment effect parameters and complex longitudinal settings.

Journal ArticleDOI
TL;DR: In this article, a Gibbs version of the ABC approach is explored, which runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions.
Abstract: Approximate Bayesian computation methods are useful for generative models with intractable likelihoods. These methods are however sensitive to the dimension of the parameter space, requiring exponentially increasing resources as this dimension grows. To tackle this difficulty, we explore a Gibbs version of the ABC approach that runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions. While lacking the standard justifications for the Gibbs sampler, the resulting Markov chain is shown to converge in distribution under some partial independence conditions. The associated stationary distribution can further be shown to be close to the true posterior distribution and some hierarchical versions of the proposed mechanism enjoy a closed form limiting distribution. Experiments also demonstrate the gain in efficiency brought by the Gibbs version over the standard solution.
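
A toy sketch of the component-wise mechanism: in a two-parameter model, each Gibbs-like sweep runs a small rejection-ABC update for one parameter, conditioning on the current value of the other and matching only a one-dimensional summary relevant to that parameter. The model, summaries, priors and tolerance below are illustrative and are not the hierarchical examples analysed in the paper.

```python
# Toy ABC-within-Gibbs sketch for y ~ N(mu, sigma^2): each component is
# updated by a rejection-ABC step that conditions on the other component and
# matches only a one-dimensional summary.  All tuning choices are illustrative.
import numpy as np

rng = np.random.default_rng(6)
y = rng.normal(loc=2.0, scale=1.5, size=200)         # observed data
s_mean, s_sd = y.mean(), y.std()                     # low-dimensional summaries
eps = 0.05

def abc_step(simulate, summary, target, prior_draw):
    # Rejection ABC for one component: draw from the prior until the
    # simulated summary is within eps of the observed one.
    while True:
        theta = prior_draw()
        if abs(summary(simulate(theta)) - target) < eps:
            return theta

mu, sigma = 0.0, 1.0
draws = []
for _ in range(300):                                 # Gibbs-like sweeps
    mu = abc_step(lambda m: rng.normal(m, sigma, y.size),
                  np.mean, s_mean, lambda: rng.normal(0.0, 5.0))
    sigma = abc_step(lambda s: rng.normal(mu, s, y.size),
                     np.std, s_sd, lambda: rng.uniform(0.1, 5.0))
    draws.append((mu, sigma))
mu_d, sigma_d = np.array(draws[50:]).T
print("posterior means (approx):", mu_d.mean().round(2), sigma_d.mean().round(2))
```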

Journal ArticleDOI
TL;DR: It is proved that given the initialization, the estimator converges linearly with a nontrivial, minimax optimal statistical error, and it is shown that the proposed nonconvex procedure outperforms existing methods.
Abstract: Differential graphical models are designed to represent the difference between the conditional dependence structures of two groups, and are thus of particular interest for scientific investigation. Motivated by modern applications, this manuscript considers an extended setting where each group is generated by a latent variable Gaussian graphical model. Due to the existence of latent factors, the differential network is decomposed into sparse and low-rank components, both of which are symmetric indefinite matrices. We estimate these two components simultaneously using a two-stage procedure: (i) an initialization stage, which computes a simple, consistent estimator, and (ii) a convergence stage, implemented using a projected alternating gradient descent algorithm applied to a nonconvex objective, initialized using the output of the first stage. We prove that given the initialization, the estimator converges linearly with a nontrivial, minimax optimal statistical error. Experiments on synthetic and real data illustrate that the proposed nonconvex procedure outperforms existing methods.

Journal ArticleDOI
TL;DR: In this paper, the authors present a proof of the conjecture in Pearl (1995) about testing the validity of an instrumental variable in hidden variable models, which implies that instrument validity cannot be tested in the case where the endogenous treatment is continuously distributed.
Abstract: This note presents a proof of the conjecture in Pearl (1995) about testing the validity of an instrumental variable in hidden variable models. It implies that instrument validity cannot be tested in the case where the endogenous treatment is continuously distributed. This stands in contrast to the classical testability results for instrument validity when the treatment is discrete. However, imposing weak structural assumptions on the model, such as continuity between the observable variables, can re-establish theoretical testability in the continuous setting.
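
For context, the classical testability results for discrete treatments alluded to above rest on constraints such as Pearl's instrumental inequality, which for a discrete treatment X, outcome Y and instrument Z reads:

```latex
% Pearl's instrumental inequality for a discrete treatment X,
% outcome Y and instrument Z.
\max_{x} \sum_{y} \, \max_{z} \, \Pr(Y = y,\, X = x \mid Z = z) \;\le\; 1 .
```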

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new method for functional nonparametric regression with a predictor that resides on a finite-dimensional manifold but is only observable in an infinite-dimensional space.
Abstract: We propose a new method for functional nonparametric regression with a predictor that resides on a finite-dimensional manifold but is only observable in an infinite-dimensional space. Contamination of the predictor due to discrete/noisy measurements is also accounted for. By using functional local linear manifold smoothing, the proposed estimator enjoys a polynomial rate of convergence that adapts to the intrinsic manifold dimension and the contamination level. This is in contrast to the logarithmic convergence rate in the literature of functional nonparametric regression. We also observe a phase transition phenomenon regarding the interplay of the manifold dimension and the contamination level. We demonstrate that the proposed method has favorable numerical performance relative to commonly used methods via simulated and real data examples.

Journal ArticleDOI
TL;DR: In this article, the authors consider estimation of the local average treatment effect under the binary instrumental variable model and propose novel modelling and estimation procedures that improve upon existing proposals in terms of model congeniality, interpretability, robustness and efficiency.
Abstract: Instrumental variables are widely used to deal with unmeasured confounding in observational studies and imperfect randomized controlled trials. In these studies, researchers often target the so-called local average treatment effect as it is identifiable under mild conditions. In this paper we consider estimation of the local average treatment effect under the binary instrumental variable model. We discuss the challenges of causal estimation with a binary outcome and show that, surprisingly, it can be more difficult than in the case with a continuous outcome. We propose novel modelling and estimation procedures that improve upon existing proposals in terms of model congeniality, interpretability, robustness and efficiency. Our approach is illustrated via simulation studies and a real data analysis.
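
As a reminder of the target of inference, under the usual binary instrumental variable assumptions (with instrument Z, treatment D and outcome Y) the local average treatment effect is identified by the Wald ratio:

```latex
% The local average treatment effect identified by the Wald ratio under the
% usual binary-IV assumptions (randomized instrument, exclusion, monotonicity).
\mathrm{LATE}
 \;=\; E\{\,Y(1) - Y(0) \mid \text{compliers}\,\}
 \;=\; \frac{E(Y \mid Z = 1) - E(Y \mid Z = 0)}{E(D \mid Z = 1) - E(D \mid Z = 0)} .
```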

Journal ArticleDOI
TL;DR: This work studies posterior contraction rates in sparse high-dimensional generalized linear models using priors incorporating sparsity, and shows that Bayesian methods achieve convergence properties analogous to lasso-type procedures.
Abstract: We study posterior contraction rates in sparse high-dimensional generalized linear models using priors incorporating sparsity. A mixture of a point mass at zero and a continuous distribution is used as the prior distribution on regression coefficients. In addition to the usual posterior, the fractional posterior, which is obtained by applying the Bayes theorem on a fractional power of the likelihood, is also considered. The latter allows uniformity in posterior contraction over a larger subset of the parameter space. In our setup, the link function of the generalized linear model need not be canonical. We show that Bayesian methods achieve convergence properties analogous to lasso-type procedures. Our results can be used to derive posterior contraction rates in many generalized linear models including logistic, Poisson regression, and others.
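
The fractional posterior mentioned above is obtained by tempering the likelihood with a power α ∈ (0, 1] before applying Bayes' theorem; schematically, with likelihood L_n(θ) and prior density π:

```latex
% Fractional (tempered) posterior with exponent alpha in (0, 1];
% alpha = 1 recovers the usual posterior.
\pi_{n,\alpha}(\theta \mid \text{data})
 \;=\; \frac{L_n(\theta)^{\alpha}\, \pi(\theta)}
            {\int L_n(\theta')^{\alpha}\, \pi(\theta')\, d\theta'} .
```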

Journal ArticleDOI
TL;DR: This work proposes a nonparametric empirical-Bayes approach for constructing optimal selection-adjusted confidence sets, which produces confidence sets that are as short as possible on average, while both adjusting for selection and maintaining exact frequentist coverage uniformly over the parameter space.
Abstract: Many recently developed Bayesian methods have focused on sparse signal detection. However, much less work has been done addressing the natural follow-up question: how to make valid inferences for the magnitude of those signals after selection. Ordinary Bayesian credible intervals suffer from selection bias, owing to the fact that the target of inference is chosen adaptively. Existing Bayesian approaches for correcting this bias produce credible intervals with poor frequentist properties, while existing frequentist approaches require sacrificing the benefits of shrinkage typical in Bayesian methods, resulting in confidence intervals that are needlessly wide. We address this gap by proposing a nonparametric empirical-Bayes approach for constructing optimal selection-adjusted confidence sets. Our method produces confidence sets that are as short as possible on average, while both adjusting for selection and maintaining exact frequentist coverage uniformly over the parameter space. Our main theoretical result establishes an important consistency property of our procedure: that under mild conditions, it asymptotically converges to the results of an oracle-Bayes analysis in which the prior distribution of signal sizes is known exactly. Across a series of examples, the method outperforms existing frequentist techniques for post-selection inference, producing confidence sets that are notably shorter but with the same coverage guarantee.

Journal ArticleDOI
TL;DR: A dimension reduction framework that effectively reduces the estimation of the individualized dose rule to a lower-dimensional subspace of the covariates, leading to a more parsimonious model, and a pseudo-direct learning approach that focuses more on estimating the dimensionality-reduced subspace of the treatment outcome.
Abstract: Learning an individualized dose rule in personalized medicine is a challenging statistical problem. Existing methods often suffer from the curse of dimensionality, especially when the decision function is estimated nonparametrically. To tackle this problem, we propose a dimension reduction framework that effectively reduces the estimation to a lower-dimensional subspace of the covariates. We exploit the fact that the individualized dose rule can be defined in a subspace spanned by a few linear combinations of the covariates, leading to a more parsimonious model. Moreover, because the value function is maximized directly, our framework does not require inverse probability weighting by the propensity score in observational studies. This distinguishes it from the outcome weighted learning framework, which also estimates decision rules directly. Under the same framework, we further propose a pseudo-direct learning approach that focuses more on estimating the dimensionality-reduced subspace of the treatment outcome. Parameters in both approaches can be estimated efficiently using an orthogonality constrained optimization algorithm on the Stiefel manifold. Under mild regularity assumptions, asymptotic normality of the proposed estimators is established. We also derive the consistency and convergence rate of the value function under the estimated optimal dose rule. We evaluate the performance of the proposed approaches through extensive simulation studies and a warfarin pharmacogenetic dataset.

Journal ArticleDOI
TL;DR: In this paper, a Bayesian nonparametric methodology is proposed to predict the number of new variants in the follow-up study based on the pilot study, and when experimental conditions are kept constant between the pilot and the followup, the prediction is more accurate than three recent proposals and competitive with a more classic proposal.
Abstract: While the cost of sequencing genomes has decreased dramatically in recent years, this expense often remains non-trivial. Under a fixed budget, then, scientists face a natural trade-off between quantity and quality; they can spend resources to sequence a greater number of genomes (quantity) or spend resources to sequence genomes with increased accuracy (quality). Our goal is to find the optimal allocation of resources between quantity and quality. Optimizing resource allocation promises to reveal as many new variations in the genome as possible, and thus as many new scientific insights as possible. In this paper, we consider the common setting where scientists have already conducted a pilot study to reveal variants in a genome and are contemplating a follow-up study. We introduce a Bayesian nonparametric methodology to predict the number of new variants in the follow-up study based on the pilot study. When experimental conditions are kept constant between the pilot and follow-up, we demonstrate on real data from the gnomAD project that our prediction is more accurate than three recent proposals, and competitive with a more classic proposal. Unlike existing methods, though, our method allows practitioners to change experimental conditions between the pilot and the follow-up. We demonstrate how this distinction allows our method to be used for (i) more realistic predictions and (ii) optimal allocation of a fixed budget between quality and quantity.

Journal ArticleDOI
TL;DR: In this article, the authors show that the empirical likelihood statistic may lose asymptotic pivotalness under the above nonstandard asymPTotic frameworks, and argue that these phenomena are understood as emergence of Efron and Stein's (1981) bias of the jackknife variance estimator in the first order.
Abstract: This paper sheds light on inference problems for statistical models under alternative or nonstandard asymptotic frameworks from the perspective of jackknife empirical likelihood. Examples include small bandwidth asymptotics for semiparametric inference and goodness-of- fit testing, sparse network asymptotics, many covariates asymptotics for regression models, and many-weak instruments asymptotics for instrumental variable regression. We first establish Wilks’ theorem for the jackknife empirical likelihood statistic on a general semiparametric in- ference problem under the conventional asymptotics. We then show that the jackknife empirical likelihood statistic may lose asymptotic pivotalness under the above nonstandard asymptotic frameworks, and argue that these phenomena are understood as emergence of Efron and Stein’s (1981) bias of the jackknife variance estimator in the first order. Finally we propose a modi- fication of the jackknife empirical likelihood to recover asymptotic pivotalness under both the conventional and nonstandard asymptotics. Our modification works for all above examples and provides a unified framework to investigate nonstandard asymptotic problems.