
Showing papers in "Journal of The Royal Statistical Society Series B-statistical Methodology in 2018"


Journal ArticleDOI
TL;DR: In this paper, the authors propose a new framework of "model-X" knockoffs, which reinterprets, from a different perspective, the knockoff procedure originally designed to control the false discovery rate in linear models.
Abstract: Many contemporary large‐scale applications involve building interpretable models linking a large set of potential covariates to a response in a non‐linear fashion, such as when the response is binary. Although this modelling problem has been extensively studied, it remains unclear how to control the fraction of false discoveries effectively even in high dimensional logistic regression, not to mention general high dimensional non‐linear models. To address such a practical problem, we propose a new framework of ‘model‐X’ knockoffs, which reads from a different perspective the knockoff procedure that was originally designed for controlling the false discovery rate in linear models. Whereas the knockoffs procedure is constrained to homoscedastic linear models with n⩾p, the key innovation here is that model‐X knockoffs provide valid inference from finite samples in settings in which the conditional distribution of the response is arbitrary and completely unknown. Furthermore, this holds no matter the number of covariates. Correct inference in such a broad setting is achieved by constructing knockoff variables probabilistically instead of geometrically. To do this, our approach requires that the covariates are random (independent and identically distributed rows) with a distribution that is known, although we provide preliminary experimental evidence that our procedure is robust to unknown or estimated distributions. To our knowledge, no other procedure solves the controlled variable selection problem in such generality but, in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. Finally, we apply our procedure to data from a case–control study of Crohn's disease in the UK, making twice as many discoveries as the original analysis of the same data.
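As a concrete illustration of the selection step, here is a minimal sketch of the knockoff+ filter. It assumes feature statistics W_j (positive when the original variable looks more important than its knockoff, sign-symmetric under the null) have already been computed from the data and the constructed model-X knockoffs; the function name and interface are illustrative.

```python
import numpy as np

def knockoff_select(W, q=0.1):
    """Knockoff+ filter: return indices j with W_j >= T, where T is the
    smallest threshold whose estimated false discovery proportion is <= q."""
    thresholds = np.sort(np.abs(W[W != 0]))
    for t in thresholds:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.where(W >= t)[0]
    return np.array([], dtype=int)  # no threshold controls the estimated FDP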

371 citations


Journal ArticleDOI
TL;DR: A method for debiasing penalized regression adjustments to allow sparse regression methods like the lasso to be used for √n‐consistent inference of average treatment effects in high dimensional linear models.
Abstract: There are many settings where researchers are interested in estimating average treatment effects and are willing to rely on the unconfoundedness assumption, which requires that the treatment assignment be as good as random conditional on pretreatment variables. The unconfoundedness assumption is often more plausible if a large number of pretreatment variables are included in the analysis, but this can worsen the performance of standard approaches to treatment effect estimation. We develop a method for debiasing penalized regression adjustments to allow sparse regression methods like the lasso to be used for √n‐consistent inference of average treatment effects in high dimensional linear models. Given linearity, we do not need to assume that the treatment propensities are estimable, or that the average treatment effect is a sparse contrast of the outcome model parameters. Rather, in addition to standard assumptions used to make lasso regression on the outcome model consistent under 1‐norm error, we require only overlap, i.e. that the propensity score be uniformly bounded away from 0 and 1. Procedurally, our method combines balancing weights with a regularized regression adjustment.
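The sketch below conveys the "regularized regression adjustment plus weighting" idea in simplified form. It substitutes clipped inverse-propensity weights for the paper's balancing weights, so it is an AIPW-style stand-in rather than the authors' estimator, and all function names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV

def debiased_ate(X, y, w):
    """ATE via lasso outcome models debiased by a weighted residual correction.
    w is the 0/1 treatment indicator; overlap is enforced by clipping."""
    mu1 = LassoCV(cv=5).fit(X[w == 1], y[w == 1]).predict(X)   # outcome model, treated
    mu0 = LassoCV(cv=5).fit(X[w == 0], y[w == 0]).predict(X)   # outcome model, controls
    e = LogisticRegressionCV(cv=5, max_iter=2000).fit(X, w).predict_proba(X)[:, 1]
    e = np.clip(e, 0.01, 0.99)                   # propensity bounded away from 0 and 1
    psi = mu1 - mu0 + w * (y - mu1) / e - (1 - w) * (y - mu0) / (1 - e)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))   # estimate and standard error
```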

326 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose a two-stage procedure called inspect for estimation of change points: a good projection direction is first obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series, and an existing univariate change point estimation algorithm is then applied to the projected series.
Abstract: Summary Change points are a very common feature of ‘big data’ that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called inspect for estimation of the change points: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. We then apply an existing univariate change point estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated change points and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data-generating mechanisms. Software implementing the methodology is available in the R package InspectChangepoint.
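A bare-bones sketch of the single-change-point step follows: it uses the leading left singular vector of the raw CUSUM matrix as the projection direction, whereas the paper obtains a sparse direction from a convex relaxation; names are illustrative.

```python
import numpy as np

def cusum_transform(X):
    """CUSUM transformation of a p x n data matrix X (rows are coordinates)."""
    p, n = X.shape
    cum = np.cumsum(X, axis=1)
    total = cum[:, -1][:, None]
    t = np.arange(1, n)
    return np.sqrt(t * (n - t) / n) * (cum[:, :-1] / t - (total - cum[:, :-1]) / (n - t))

def inspect_single_changepoint(X):
    """Project onto the leading left singular vector of the CUSUM matrix,
    then locate the change as the peak of the projected CUSUM series."""
    T = cusum_transform(X)
    v = np.linalg.svd(T, full_matrices=False)[0][:, 0]
    return int(np.argmax(np.abs(v @ T))) + 1
```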

162 citations


Journal ArticleDOI
TL;DR: This work embeds the joint distribution and the product of the marginals in a reproducing kernel Hilbert space and defines the d‐variable Hilbert–Schmidt independence criterion dHSIC as the squared distance between the embeddings.
Abstract: Summary We investigate the problem of testing whether d possibly multivariate random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two-variable Hilbert–Schmidt independence criterion but allows for an arbitrary number of variables. We embed the joint distribution and the product of the marginals in a reproducing kernel Hilbert space and define the d-variable Hilbert–Schmidt independence criterion dHSIC as the squared distance between the embeddings. In the population case, the value of dHSIC is 0 if and only if the d variables are jointly independent, as long as the kernel is characteristic. On the basis of an empirical estimate of dHSIC, we investigate three non-parametric hypothesis tests: a permutation test, a bootstrap analogue and a procedure based on a gamma approximation. We apply non-parametric independence testing to a problem in causal discovery and illustrate the new methods on simulated and real data sets.
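Below is a small sketch of the empirical dHSIC statistic with Gaussian kernels (median-heuristic bandwidth) and a permutation test; the estimator follows the standard V-statistic form, and the function names are illustrative.

```python
import numpy as np

def gaussian_gram(x):
    """Gaussian kernel Gram matrix with the median-heuristic bandwidth."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    h2 = np.median(d2[d2 > 0]) if np.any(d2 > 0) else 1.0
    return np.exp(-d2 / h2)

def dhsic(*samples):
    """Empirical dHSIC of d samples of common length n (V-statistic form)."""
    grams = [gaussian_gram(s) for s in samples]
    term1 = np.mean(np.prod(grams, axis=0))     # (1/n^2) sum_ij prod_l K_l[i,j]
    term2 = np.prod([K.mean() for K in grams])  # prod_l (1/n^2) sum_ij K_l[i,j]
    term3 = 2 * np.mean(np.prod([K.mean(axis=1) for K in grams], axis=0))
    return term1 + term2 - term3

def dhsic_permutation_test(samples, B=200, seed=0):
    """Permutation p-value: resample each variable's rows independently."""
    rng = np.random.default_rng(seed)
    stat = dhsic(*samples)
    null = [dhsic(*(rng.permutation(s) for s in samples)) for _ in range(B)]
    return stat, (1 + sum(v >= stat for v in null)) / (B + 1)
```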

138 citations


Journal ArticleDOI
TL;DR: The correlated pseudomarginal method, as discussed by the authors, is a modification of the pseudomarginal method that uses a likelihood ratio estimator computed from two correlated likelihood estimators.
Abstract: The pseudomarginal algorithm is a Metropolis–Hastings‐type scheme which samples asymptotically from a target probability density when we can only estimate unbiasedly an unnormalized version of it. In a Bayesian context, it is a state of the art posterior simulation technique when the likelihood function is intractable but can be estimated unbiasedly by using Monte Carlo samples. However, for the performance of this scheme not to degrade as the number T of data points increases, it is typically necessary for the number N of Monte Carlo samples to be proportional to T to control the relative variance of the likelihood ratio estimator appearing in the acceptance probability of this algorithm. The correlated pseudomarginal method is a modification of the pseudomarginal method using a likelihood ratio estimator computed by using two correlated likelihood estimators. For random‐effects models, we show under regularity conditions that the parameters of this scheme can be selected such that the relative variance of this likelihood ratio estimator is controlled when N increases sublinearly with T and we provide guidelines on how to optimize the algorithm on the basis of a non‐standard weak convergence analysis. The efficiency of computations for Bayesian inference relative to the pseudomarginal method empirically increases with T and exceeds two orders of magnitude in some examples.
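A minimal sketch of one correlated pseudomarginal Metropolis-Hastings step is given below. It assumes the user supplies loglik_hat(theta, u), the log of a likelihood estimate that is unbiased on the likelihood scale and is driven by standard normal auxiliary variables u; the random-walk proposal and parameter names are illustrative.

```python
import numpy as np

def correlated_pm_step(theta, u, loglik_hat, log_prior, rng, step=0.1, rho=0.99):
    """One correlated pseudomarginal MH step; rho close to 1 keeps the
    likelihood estimates at the current and proposed states highly correlated."""
    theta_prop = theta + step * rng.standard_normal(theta.shape)    # random-walk proposal
    u_prop = rho * u + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(u.shape)
    log_alpha = (loglik_hat(theta_prop, u_prop) + log_prior(theta_prop)
                 - loglik_hat(theta, u) - log_prior(theta))
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop, u_prop, True
    return theta, u, False
```

The autoregressive move for u leaves the standard normal distribution of the auxiliary variables invariant, which is why only the likelihood estimates and prior appear in the acceptance ratio.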

124 citations


Journal ArticleDOI
TL;DR: The asymptotic distribution of empirical Wasserstein distances is derived as the optimal value of a linear programme with random objective function, which facilitates statistical inference in large generality.
Abstract: Summary The Wasserstein distance is an attractive tool for data analysis but statistical inference is hindered by the lack of distributional limits. To overcome this obstacle, for probability measures supported on finitely many points, we derive the asymptotic distribution of empirical Wasserstein distances as the optimal value of a linear programme with random objective function. This facilitates statistical inference (e.g. confidence intervals for sample-based Wasserstein distances) in large generality. Our proof is based on directional Hadamard differentiability. Failure of the classical bootstrap and alternatives are discussed. The utility of the distributional results is illustrated on two data sets.
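For measures supported on finitely many points of the real line, the empirical Wasserstein distance is the optimal value of a small transportation linear programme; this sketch solves it with scipy's linprog, and the interface is illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_lp(x, y, r, s, p=1):
    """p-Wasserstein distance between sum_i r_i * delta_{x_i} and
    sum_j s_j * delta_{y_j} on the real line (r and s must each sum to 1)."""
    cost = np.abs(x[:, None] - y[None, :]) ** p
    m, n = cost.shape
    A_eq = []
    for i in range(m):                       # row sums of the coupling equal r
        row = np.zeros((m, n)); row[i, :] = 1.0; A_eq.append(row.ravel())
    for j in range(n):                       # column sums of the coupling equal s
        col = np.zeros((m, n)); col[:, j] = 1.0; A_eq.append(col.ravel())
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.concatenate([r, s]),
                  bounds=(0, None), method="highs")
    return res.fun ** (1.0 / p)
```

With r and s taken as empirical frequencies of two samples, this returns the plug-in Wasserstein distance whose fluctuations the paper characterizes.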

110 citations


Journal ArticleDOI
TL;DR: In this paper, the adaptive p-value thresholding (AdaPT) procedure is proposed; it adaptively estimates a Bayes-optimal p-value rejection threshold and controls the false discovery rate in finite samples.
Abstract: We consider the problem of multiple hypothesis testing with generic side information: for each hypothesis $H_i$ we observe both a p-value $p_i$ and some predictor $x_i$ encoding contextual information about the hypothesis. For large-scale problems, adaptively focusing power on the more promising hypotheses (those more likely to yield discoveries) can lead to much more powerful multiple testing procedures. We propose a general iterative framework for this problem, called the Adaptive p-value Thresholding (AdaPT) procedure, which adaptively estimates a Bayes-optimal p-value rejection threshold and controls the false discovery rate (FDR) in finite samples. At each iteration of the procedure, the analyst proposes a rejection threshold and observes partially censored p-values, estimates the false discovery proportion (FDP) below the threshold, and either stops to reject or proposes another threshold, until the estimated FDP is below $\alpha$. Our procedure is adaptive in an unusually strong sense, permitting the analyst to use any statistical or machine learning method she chooses to estimate the optimal threshold, and to switch between different models at each iteration as information accrues. We demonstrate the favorable performance of AdaPT by comparing it to state-of-the-art methods in five real applications and two simulation studies.
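A bare-bones sketch of the AdaPT loop with a constant threshold follows: the real procedure fits a threshold s_t(x_i) from the side information and the partially masked p-values at every iteration, whereas here the threshold is simply shrunk geometrically; the FDP estimate is the mirrored-count estimate described above.

```python
import numpy as np

def adapt_constant_threshold(p, alpha=0.1, shrink=0.95):
    """AdaPT-style loop with a constant threshold s: estimate the FDP from the
    mirrored large p-values, stop when it drops below alpha, otherwise shrink s."""
    s = 0.45                                          # initial threshold, below 1/2
    while s > 1e-6:
        fdp_hat = (1 + np.sum(p >= 1 - s)) / max(1, np.sum(p <= s))
        if fdp_hat <= alpha:
            return np.where(p <= s)[0]                # indices of rejected hypotheses
        s *= shrink
    return np.array([], dtype=int)
```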

105 citations


Journal ArticleDOI
TL;DR: Novel assumptions are proposed that allow for identification of the average treatment effect (ATE) under the standard IV model and are clearly separated from model assumptions needed for estimation, so that researchers are not required to commit to a specific observed data model in establishing identification.
Abstract: Instrumental variables (IVs) are widely used for estimating causal effects in the presence of unmeasured confounding. Under the standard IV model, however, the average treatment effect (ATE) is only partially identifiable. To address this, we propose novel assumptions that allow for identification of the ATE. Our identification assumptions are clearly separated from model assumptions needed for estimation, so that researchers are not required to commit to a specific observed data model in establishing identification. We then construct multiple estimators that are consistent under three different observed data models, and multiply robust estimators that are consistent in the union of these observed data models. We pay special attention to the case of binary outcomes, for which we obtain bounded estimators of the ATE that are guaranteed to lie between -1 and 1. Our approaches are illustrated with simulations and a data analysis evaluating the causal effect of education on earnings.

105 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose sparsity-inducing soft decision trees in which the decisions are treated as probabilistic, and show that the posterior distribution concentrates at the minimax rate for sparse functions and functions with additive structure in the high dimensional regime.
Abstract: Ensembles of decision trees are a useful tool for obtaining flexible estimates of regression functions. Examples of these methods include gradient‐boosted decision trees, random forests and Bayesian classification and regression trees. Two potential shortcomings of tree ensembles are their lack of smoothness and their vulnerability to the curse of dimensionality. We show that these issues can be overcome by instead considering sparsity inducing soft decision trees in which the decisions are treated as probabilistic. We implement this in the context of the Bayesian additive regression trees framework and illustrate its promising performance through testing on benchmark data sets. We provide strong theoretical support for our methodology by showing that the posterior distribution concentrates at the minimax rate (up to a logarithmic factor) for sparse functions and functions with additive structures in the high dimensional regime where the dimensionality of the covariate space is allowed to grow nearly exponentially in the sample size. Our method also adapts to the unknown smoothness and sparsity levels, and can be implemented by making minimal modifications to existing Bayesian additive regression tree algorithms.
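A toy sketch of the soft-split idea: each internal node routes a point left with probability given by a logistic gate, so the tree's prediction is a smooth function of the covariates. The tree encoding, bandwidth tau and example values are illustrative, not the paper's implementation.

```python
import numpy as np

def soft_tree_predict(x, node, tau=0.1):
    """Prediction of a single soft decision tree. An internal node is a tuple
    (feature, split, left, right); a leaf is a float. A point goes left with
    probability sigmoid((split - x[feature]) / tau), so predictions vary
    smoothly with x instead of jumping at the split."""
    if isinstance(node, float):
        return node
    feature, split, left, right = node
    p_left = 1.0 / (1.0 + np.exp(-(split - x[feature]) / tau))
    return (p_left * soft_tree_predict(x, left, tau)
            + (1.0 - p_left) * soft_tree_predict(x, right, tau))

# toy tree: split on x[0] at 0.5, then on x[1] at -1.0 in the left branch
tree = (0, 0.5, (1, -1.0, 2.0, 0.5), -1.0)
print(soft_tree_predict(np.array([0.4, -2.0]), tree))
```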

87 citations


Journal ArticleDOI
TL;DR: In this article, the authors use tail expectiles to estimate Value at Risk (VaR), Expected Shortfall (ES) and Marginal Expected Shortfall (MES), three instruments of risk protection of utmost importance in actuarial science and statistical finance.
Abstract: We use tail expectiles to estimate Value at Risk (VaR), Expected Shortfall (ES) and Marginal Expected Shortfall (MES), three instruments of risk protection of utmost importance in actuarial science and statistical finance. The concept of expectiles is a least squares analogue of quantiles. Both expectiles and quantiles were embedded in the more general class of M-quantiles as the minimizers of an asymmetric convex loss function. It has been proved very recently that the only M-quantiles that are coherent risk measures are the expectiles. Moreover, expectiles define the only coherent risk measure that is also elicitable. The elicitability corresponds to the existence of a natural backtesting methodology. The estimation of expectiles did not, however, receive yet any attention from the perspective of extreme values. The first estimation method that we propose enables the usage of advanced high quantile and tail-index estimators. The second method joins together the least asymmetrically weighted squares estimation with the tail restrictions of extreme-value theory. We establish the limit distributions of the proposed estimators when they are located in the range of the data or near and even beyond the maximum observed loss. A main tool is to first estimate the intermediate large expectile-based VaR, ES and MES, and then extrapolate these estimates to the very far tails. We show through a detailed simulation study the good performance of the procedures, and also present concrete applications to medical insurance data and three large US investment banks.
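As a building block, the sample tau-expectile solves an asymmetric least squares problem and can be computed by iteratively reweighted averaging, as sketched below; the extreme-value extrapolation beyond the sample range developed in the paper is not attempted here, and the function name is illustrative.

```python
import numpy as np

def expectile(y, tau=0.99, tol=1e-10, max_iter=200):
    """Sample tau-expectile via asymmetric least squares, computed by
    iteratively reweighted averaging."""
    y = np.asarray(y, dtype=float)
    e = y.mean()
    for _ in range(max_iter):
        w = np.where(y > e, tau, 1.0 - tau)      # asymmetric weights
        e_new = np.sum(w * y) / np.sum(w)
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e
```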

75 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose two-stage hard thresholding with voting, a general inference procedure that selects instruments that are valid (that is, have no direct effect on the outcome and are ignorable) and consistently estimates the causal effect even when some putative instruments are invalid.
Abstract: A major challenge in instrumental variable (IV) analysis is to find instruments that are valid, or have no direct effect on the outcome and are ignorable. Typically one is unsure whether all of the putative IVs are in fact valid. We propose a general inference procedure in the presence of invalid IVs, called two‐stage hard thresholding with voting. The procedure uses two hard thresholding steps to select strong instruments and to generate candidate sets of valid IVs. Voting takes the candidate sets and uses majority and plurality rules to determine the true set of valid IVs. In low dimensions with invalid instruments, our proposal correctly selects valid IVs, consistently estimates the causal effect, produces valid confidence intervals for the causal effect and has oracle optimal width, even if the so‐called 50% rule or the majority rule is violated. In high dimensions, we establish nearly identical results without oracle optimality. In simulations, our proposal outperforms traditional and recent methods in the invalid IV literature. We also apply our method to reanalyse the causal effect of education on earnings.

Journal ArticleDOI
TL;DR: The main results highlight that the fully functional procedure performs best under conditions when analogous fPCA based estimators are at their worst, namely when the feature of interest is orthogonal to the leading principal components of the data.
Abstract: Summary Methodology is proposed to uncover structural breaks in functional data that is ‘fully functional’ in the sense that it does not rely on dimension reduction techniques. A thorough asymptotic theory is developed for a fully functional break detection procedure as well as for a break date estimator, assuming a fixed break size and a shrinking break size. The latter result is utilized to derive confidence intervals for the unknown break date. The main results highlight that the fully functional procedures perform best under conditions when analogous estimators based on functional principal component analysis are at their worst, namely when the feature of interest is orthogonal to the leading principal components of the data. The theoretical findings are confirmed by means of a Monte Carlo simulation study in finite samples. An application to annual temperature curves illustrates the practical relevance of the procedures proposed.

Journal ArticleDOI
Nathan Kallus
TL;DR: In this article, a unified theory of designs for controlled experiments that balance baseline covariates a priori (before treatment and before randomization) is developed within the framework of minimax variance, together with a new design method called kernel allocation.
Abstract: Summary We develop a unified theory of designs for controlled experiments that balance baseline covariates a priori (before treatment and before randomization) using the framework of minimax variance and a new method called kernel allocation. We show that any notion of a priori balance must go hand in hand with a notion of structure, since with no structure on the dependence of outcomes on baseline covariates complete randomization (no special covariate balance) is always minimax optimal. Restricting the structure of dependence, either parametrically or non-parametrically, gives rise to certain covariate imbalance metrics and optimal designs. This recovers many popular imbalance metrics and designs previously developed ad hoc, including randomized block designs, pairwise-matched allocation and rerandomization. We develop a new design method called kernel allocation based on the optimal design when structure is expressed by using kernels, which can be parametric or non-parametric. Relying on modern optimization methods, kernel allocation, which ensures nearly perfect covariate balance without biasing estimates under model misspecification, offers sizable advantages in precision and power as demonstrated in a range of real and synthetic examples. We provide strong theoretical guarantees on variance, consistency and rates of convergence and develop special algorithms for design and hypothesis testing.

Journal ArticleDOI
TL;DR: An L2-type test for testing mutual independence and banded dependence structure for high dimensional data is introduced; it is shown to identify non-linear dependence successfully in empirical data analysis, and an infeasible version of the test statistic attains rate optimality in the class of Gaussian distributions with equal correlation.
Abstract: Summary We introduce an L2-type test for testing mutual independence and banded dependence structure for high dimensional data. The test is constructed on the basis of the pairwise distance covariance and it accounts for the non-linear and non-monotone dependences among the data, which cannot be fully captured by the existing tests based on either Pearson correlation or rank correlation. Our test can be conveniently implemented in practice as the limiting null distribution of the test statistic is shown to be standard normal. It exhibits excellent finite sample performance in our simulation studies even when the sample size is small albeit the dimension is high and is shown to identify non-linear dependence in empirical data analysis successfully. On the theory side, asymptotic normality of our test statistic is shown under quite mild moment assumptions and with little restriction on the growth rate of the dimension as a function of sample size. As a demonstration of good power properties for our distance-covariance-based test, we further show that an infeasible version of our test statistic has the rate optimality in the class of Gaussian distributions with equal correlation.
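The statistic is built from pairwise squared distance covariances; the sketch below computes that building block from double-centred distance matrices, omitting the studentisation that yields the standard normal limit. Function names are illustrative.

```python
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance of two univariate samples."""
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

def pairwise_dcov_sum(X):
    """Sum of pairwise squared distance covariances over all coordinate pairs
    of an n x p data matrix: the building block of the L2-type statistic."""
    p = X.shape[1]
    return sum(dcov2(X[:, j], X[:, k]) for j in range(p) for k in range(j + 1, p))
```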

Journal ArticleDOI
TL;DR: This work proposes a popularity‐adjusted block model for flexible and realistic modelling of node popularity, establishes consistency of likelihood modularity for community detection as well as estimation of node popularities and model parameters, and demonstrates the advantages of the new modularity over the degree‐corrected block model modularity in simulations.
Abstract: Summary The community structure that is observed in empirical networks has been of particular interest in the statistics literature, with a strong emphasis on the study of block models. We study an important network feature called node popularity, which is closely associated with community structure. Neither the classical stochastic block model nor its degree-corrected extension can satisfactorily capture the dynamics of node popularity as observed in empirical networks. We propose a popularity-adjusted block model for flexible and realistic modelling of node popularity. We establish consistency of likelihood modularity for community detection as well as estimation of node popularities and model parameters, and demonstrate the advantages of the new modularity over the degree-corrected block model modularity in simulations. By analysing the political blogs network, the British Members of Parliament network and the ‘Digital bibliography and library project’ bibliographical network, we illustrate that improved empirical insights can be gained through this methodology.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a matrix variate regression model for high dimensional data, where the response on each unit is a random matrix and the predictor X can be either a scalar, a vector or a matrix, treated as non-stochastic in terms of the conditional distribution Y|X.
Abstract: Summary Modern technology often generates data with complex structures in which both response and explanatory variables are matrix valued. Existing methods in the literature can tackle matrix-valued predictors but are rather limited for matrix-valued responses. We study matrix variate regressions for such data, where the response Y on each experimental unit is a random matrix and the predictor X can be either a scalar, a vector or a matrix, treated as non-stochastic in terms of the conditional distribution Y|X. We propose models for matrix variate regressions and then develop envelope extensions of these models. Under the envelope framework, redundant variation can be eliminated in estimation and the number of parameters can be notably reduced when the matrix variate dimension is large, possibly resulting in significant gains in efficiency. The methods proposed are applicable to high dimensional settings.

Journal ArticleDOI
TL;DR: In this paper, the authors extend significance analysis of microarrays by providing 1−α upper confidence bounds for the FDP, so that exact confidence statements can be made, and using a closed testing procedure, they decrease the upper bounds and estimates in such a way that the confidence level is maintained.
Abstract: Summary Significance analysis of microarrays is a highly popular permutation-based multiple-testing method that estimates the false discovery proportion (FDP): the fraction of false positive results among all rejected hypotheses. Perhaps surprisingly, until now this method had no known properties. This paper extends significance analysis of microarrays by providing 1−α upper confidence bounds for the FDP, so that exact confidence statements can be made. As a special case, an estimate of the FDP is obtained that underestimates the FDP with probability at most 0.5. Moreover, using a closed testing procedure, this paper decreases the upper bounds and estimates in such a way that the confidence level is maintained. We base our methods on a general result on exact testing with random permutations.

Journal ArticleDOI
TL;DR: A novel thrifty algorithm for solving standard DWD and generalized DWD is proposed, and it can be several hundred times faster than the existing state of the art algorithm based on second‐order cone programming.
Abstract: Summary Distance-weighted discrimination (DWD) is a modern margin-based classifier with an interesting geometric motivation. It was proposed as a competitor to the support vector machine (SVM). Despite many recent references on DWD, DWD is far less popular than the SVM, mainly because of computational and theoretical reasons. We greatly advance the current DWD methodology and its learning theory. We propose a novel thrifty algorithm for solving standard DWD and generalized DWD, and our algorithm can be several hundred times faster than the existing state of the art algorithm based on second-order cone programming. In addition, we exploit the new algorithm to design an efficient scheme to tune generalized DWD. Furthermore, we formulate a natural kernel DWD approach in a reproducing kernel Hilbert space and then establish the Bayes risk consistency of the kernel DWD by using a universal kernel such as the Gaussian kernel. This result solves an open theoretical problem in the DWD literature. A comparison study on 16 benchmark data sets shows that data-driven generalized DWD consistently delivers higher classification accuracy with less computation time than the SVM.
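For orientation, here is a sketch of the margin loss that distinguishes DWD from the SVM's hinge loss; the generalized form with exponent q reduces to the standard DWD loss at q = 1. The exact parametrization is written from memory and should be checked against the paper before use.

```python
import numpy as np

def dwd_loss(u, q=1.0):
    """Generalized DWD loss of the margin u = y * f(x); q = 1 gives the
    standard DWD loss: 1 - u for u <= 1/2 and 1/(4u) for u > 1/2."""
    u = np.asarray(u, dtype=float)
    thresh = q / (q + 1.0)
    tail = (q ** q / (q + 1.0) ** (q + 1.0)) / np.maximum(u, 1e-12) ** q
    return np.where(u <= thresh, 1.0 - u, tail)

# a DWD-type classifier minimises mean(dwd_loss(y * (X @ beta + b))) + lam * beta @ beta
```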

Journal ArticleDOI
TL;DR: In this paper, a two-stage computational framework is proposed for the sparse generalized eigenvalue problem (GEP): a convex relaxation of the sparse GEP is solved first, and its solution initializes the truncated Rayleigh flow method ('rifle'), which estimates the leading generalized eigenvector.
Abstract: The sparse generalized eigenvalue problem (GEP) plays a pivotal role in a large family of high dimensional statistical models, including sparse Fisher's discriminant analysis, canonical correlation analysis and sufficient dimension reduction. The sparse GEP involves solving a non‐convex optimization problem. Most existing methods and theory in the context of specific statistical models that are special cases of the sparse GEP require restrictive structural assumptions on the input matrices. We propose a two‐stage computational framework to solve the sparse GEP. At the first stage, we solve a convex relaxation of the sparse GEP. Taking the solution as an initial value, we then exploit a non‐convex optimization perspective and propose the truncated Rayleigh flow method (which we call ‘rifle’) to estimate the leading generalized eigenvector. We show that rifle converges linearly to a solution with the optimal statistical rate of convergence. Theoretically, our method significantly improves on the existing literature by eliminating structural assumptions on the input matrices. To achieve this, our analysis involves two key ingredients: a new analysis of the gradient‐based method on non‐convex objective functions, and a fine‐grained characterization of the evolution of sparsity patterns along the solution path. Thorough numerical studies are provided to validate the theoretical results.
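A sketch of the truncated Rayleigh flow idea: gradient ascent on the generalized Rayleigh quotient x'Ax / x'Bx followed by hard truncation to the k largest coordinates and renormalization. The paper initializes from a convex relaxation and uses a specific step-size scheme; this simplified update and the names are illustrative.

```python
import numpy as np

def rifle(A, B, k, x0, eta=0.01, iters=500):
    """Truncated Rayleigh flow sketch: ascend the generalized Rayleigh quotient
    x'Ax / x'Bx, keep only the k largest coordinates, renormalize."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        rho = (x @ A @ x) / (x @ B @ x)
        x = x + eta * (A @ x - rho * (B @ x))    # gradient step on the quotient
        keep = np.argsort(np.abs(x))[-k:]
        truncated = np.zeros_like(x)
        truncated[keep] = x[keep]
        x = truncated / np.linalg.norm(truncated)
    return x
```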

Journal ArticleDOI
TL;DR: In this paper, a variant of expectation propagation (EP) called averaged EP is introduced; it operates on a smaller parameter space, and in the limit of infinite data both averaged EP and EP are shown to behave like iterations of Newton's algorithm for finding the mode of a function.
Abstract: Summary Expectation propagation (EP) is a widely successful algorithm for variational inference. EP is an iterative algorithm used to approximate complicated distributions, typically to find a Gaussian approximation of posterior distributions. In many applications of this type, EP performs extremely well. Surprisingly, despite its widespread use, there are very few theoretical guarantees on Gaussian EP, and it is quite poorly understood. To analyse EP, we first introduce a variant of EP: averaged EP, which operates on a smaller parameter space. We then consider averaged EP and EP in the limit of infinite data, where the overall contribution of each likelihood term is small and where posteriors are almost Gaussian. In this limit, we prove that the iterations of both averaged EP and EP are simple: they behave like iterations of Newton's algorithm for finding the mode of a function. We use this limit behaviour to prove that EP is asymptotically exact, and to obtain other insights into the dynamic behaviour of EP, e.g. that it may diverge under poor initialization exactly like Newton's method. EP is a simple algorithm to state, but a difficult one to study. Our results should facilitate further research into the theoretical properties of this important method.

Journal ArticleDOI
TL;DR: It is argued that residual prediction tests can be designed to test for model misspecifications as diverse as heteroscedasticity and non-linearity, and that some form of the parametric bootstrap can do the same when the high dimensional linear model is under consideration.
Abstract: Rajen Shah was supported in part by the Forschungsinstitut für Mathematik at the Eidgenössische Technische Hochschule Zürich.

Journal ArticleDOI
TL;DR: A novel methodology performs Bayesian inference for spatiotemporal Cox processes whose intensity function depends on a multivariate Gaussian process, using a Markov chain Monte Carlo algorithm that samples from the joint posterior distribution of the parameters and latent variables of the model.
Abstract: Summary We present a novel inference methodology to perform Bayesian inference for spatiotemporal Cox processes where the intensity function depends on a multivariate Gaussian process. Dynamic Gaussian processes are introduced to enable evolution of the intensity function over discrete time. The novelty of the method lies in the fact that no discretization error is involved despite the non-tractability of the likelihood function and infinite dimensionality of the problem. The method is based on a Markov chain Monte Carlo algorithm that samples from the joint posterior distribution of the parameters and latent variables of the model. A particular choice of the dominating measure to obtain the likelihood function is shown to be crucial to devise a valid Markov chain Monte Carlo algorithm. The models are defined in a general and flexible way but they are amenable to direct sampling from the relevant distributions because of careful characterization of their components. The models also enable the inclusion of regression covariates and/or temporal components to explain the variability of the intensity function. These components may be subject to relevant interaction with space and/or time. Real and simulated examples illustrate the methodology, followed by concluding remarks.

Journal ArticleDOI
TL;DR: This paper considers patients' heterogeneity caused by groupwise individualized treatment effects, assuming the same marginal treatment effects for all groups, and proposes a new maximin projection learning method for estimating a single treatment decision rule that works reliably for a group of future patients from a possibly new subpopulation.
Abstract: A salient feature of data from clinical trials and medical studies is inhomogeneity. Patients not only differ in baseline characteristics, but also in the way that they respond to treatment. Optimal individualized treatment regimes are developed to select effective treatments based on patients' heterogeneity. However, the optimal treatment regime might also vary for patients across different subgroups. We mainly consider patients' heterogeneity caused by groupwise individualized treatment effects assuming the same marginal treatment effects for all groups. We propose a new maximin projection learning method for estimating a single treatment decision rule that works reliably for a group of future patients from a possibly new subpopulation. Based on estimated optimal treatment regimes for all subgroups, the proposed maximin treatment regime is obtained by solving a quadratically constrained linear programming problem, which can be efficiently computed by interior point methods. Consistency and asymptotic normality of the estimator are established. Numerical examples show the reliability of the methodology proposed.

Journal ArticleDOI
TL;DR: In this article, the authors introduce a new family of Markov chain Monte Carlo samplers that combine auxiliary variables, Gibbs sampling and Taylor expansions of the target density for hyperparameter learning.
Abstract: Summary We introduce a new family of Markov chain Monte Carlo samplers that combine auxiliary variables, Gibbs sampling and Taylor expansions of the target density. Our approach permits the marginalization over the auxiliary variables, yielding marginal samplers, or the augmentation of the auxiliary variables, yielding auxiliary samplers. The well-known Metropolis-adjusted Langevin algorithm MALA and preconditioned Crank–Nicolson–Langevin algorithm pCNL are shown to be special cases. We prove that marginal samplers are superior in terms of asymptotic variance and demonstrate cases where they are slower in computing time compared with auxiliary samplers. In the context of latent Gaussian models we propose new auxiliary and marginal samplers whose implementation requires a single tuning parameter, which can be found automatically during the transient phase. Extensive experimentation shows that the increase in efficiency (measured as the effective sample size per unit of computing time) relative to (optimized implementations of) pCNL, elliptical slice sampling and MALA ranges from tenfold in binary classification problems to 25 fold in log-Gaussian Cox processes to 100 fold in Gaussian process regression, and it is on a par with Riemann manifold Hamiltonian Monte Carlo sampling in an example where that algorithm has the same complexity as the aforementioned algorithms. We explain this remarkable improvement in terms of the way that alternative samplers try to approximate the eigenvalues of the target. We introduce a novel Markov chain Monte Carlo sampling scheme for hyperparameter learning that builds on the auxiliary samplers. The MATLAB code for reproducing the experiments in the paper is publicly available and an on-line supplement to this paper contains additional experiments and implementation details.
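MALA is named as a special case of the samplers studied; for reference, one Metropolis-adjusted Langevin step looks as follows. This is a generic sketch, not the paper's auxiliary or marginal samplers, and the names are illustrative.

```python
import numpy as np

def mala_step(x, log_target, grad_log_target, rng, step=0.1):
    """One Metropolis-adjusted Langevin step for a target with log-density
    log_target and gradient grad_log_target; step plays the role of epsilon^2."""
    def log_q(to, frm):                     # log N(to; frm + (step/2) grad, step I), up to a constant
        mean = frm + 0.5 * step * grad_log_target(frm)
        return -np.sum((to - mean) ** 2) / (2.0 * step)
    prop = (x + 0.5 * step * grad_log_target(x)
            + np.sqrt(step) * rng.standard_normal(np.shape(x)))
    log_alpha = (log_target(prop) + log_q(x, prop)
                 - log_target(x) - log_q(prop, x))
    return (prop, True) if np.log(rng.uniform()) < log_alpha else (x, False)
```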

Journal ArticleDOI
TL;DR: In this article, using insights from classical least squares theory, an improved variance estimator is proposed for finely stratified experiments such as paired designs; it remains conservative in expectation but is asymptotically no more conservative than the classical estimator.
Abstract: Although attractive from a theoretical perspective, finely stratified experiments such as paired designs suffer from certain analytical limitations that are not present in block‐randomized experiments with multiple treated and control individuals in each block. In short, when using a weighted difference in means to estimate the sample average treatment effect, the traditional variance estimator in a paired experiment is conservative unless the pairwise average treatment effects are constant across pairs; however, in more coarsely stratified experiments, the corresponding variance estimator is unbiased if treatment effects are constant within blocks, even if they vary across blocks. Using insights from classical least squares theory, we present an improved variance estimator that is appropriate in finely stratified experiments. The variance estimator remains conservative in expectation but is asymptotically no more conservative than the classical estimator and can be considerably less conservative. The magnitude of the improvement depends on the extent to which effect heterogeneity can be explained by observed covariates. Aided by this estimator, a new test for the null hypothesis of a constant treatment effect is proposed. These findings extend to some, but not all, superpopulation models, depending on whether the covariates are viewed as fixed across samples.

Journal ArticleDOI
TL;DR: In this paper, conditional independence relationships for random networks and their interplay with exchangeability are studied, and a new class of Markov network models corresponding to bidirected Kneser graphs is identified.
Abstract: We study conditional independence relationships for random networks and their interplay with exchangeability. We show that, for finitely exchangeable network models, the empirical subgraph densities are maximum likelihood estimates of their theoretical counterparts. We then characterize all possible Markov structures for finitely exchangeable random graphs, thereby identifying a new class of Markov network models corresponding to bidirected Kneser graphs. In particular, we demonstrate that the fundamental property of dissociatedness corresponds to a Markov property for exchangeable networks described by bidirected line graphs. Finally we study those exchangeable models that are also summarized in the sense that the probability of a network depends only on the degree distribution, and we identify a class of models that is dual to the Markov graphs of Frank and Strauss. Particular emphasis is placed on studying consistency properties of network models under the process of forming subnetworks and we show that the only consistent systems of Markov properties correspond to the empty graph, the bidirected line graph of the complete graph and the complete graph.

Journal ArticleDOI
TL;DR: This work employs non‐convex penalization to tackle the estimation of multiple graphs from matrix‐valued data under a matrix normal distribution and establishes the asymptotic properties of the estimator, which requires less stringent conditions and has a sharper probability error bound than existing results.
Abstract: Matrix-valued data, where the sampling unit is a matrix consisting of rows and columns of measurements, are emerging in numerous scientific and business applications. Matrix Gaussian graphical model is a useful tool to characterize the conditional dependence structure of rows and columns. In this article, we employ nonconvex penalization to tackle the estimation of multiple graphs from matrix-valued data under a matrix normal distribution. We propose a highly efficient nonconvex optimization algorithm that can scale up for graphs with hundreds of nodes. We establish the asymptotic properties of the estimator, which requires less stringent conditions and has a sharper probability error bound than existing results. We demonstrate the efficacy of our proposed method through both simulations and real functional magnetic resonance imaging analyses.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new paradigm for evaluating confidence regions by showing that the distance between an estimated region and the desired region (with proper coverage) tends to 0 faster than the regions shrink to a point.
Abstract: Summary Functional data analysis is now a well-established discipline of statistics, with its core concepts and perspectives in place. Despite this, there are still fundamental statistical questions which have received relatively little attention. One of these is the systematic construction of confidence regions for functional parameters. This work is concerned with developing, understanding and visualizing such regions. We provide a general strategy for constructing confidence regions in a real separable Hilbert space by using hyperellipsoids and hyper-rectangles. We then propose specific implementations which work especially well in practice. They provide powerful hypothesis tests and useful visualization tools without relying on simulation. We also demonstrate the negative result that nearly all regions, including our own, have zero coverage when working with empirical covariances. To overcome this challenge we propose a new paradigm for evaluating confidence regions by showing that the distance between an estimated region and the desired region (with proper coverage) tends to 0 faster than the regions shrink to a point. We call this phenomenon ghosting and refer to the empirical regions as ghost regions. We illustrate the proposed methods in a simulation study and an application to fractional anisotropy tract profile data.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of testing functional constraints in a class of functional concurrent linear models where both the predictors and the response are functional data measured at discrete time points.
Abstract: Summary We consider the problem of testing functional constraints in a class of functional concurrent linear models where both the predictors and the response are functional data measured at discrete time points. We propose test procedures based on the empirical likelihood with bias-corrected estimating equations to conduct both pointwise and simultaneous inferences. The asymptotic distributions of the test statistics are derived under the null and local alternative hypotheses, where sparse and dense functional data are considered in a unified framework. We find a phase transition in the asymptotic null distributions and the orders of detectable alternatives from sparse to dense functional data. Specifically, the tests proposed can detect alternatives of √n-order when the number of repeated measurements per curve is of an order larger than n^{η0} with n being the number of curves. The transition points η0 for pointwise and simultaneous tests are different and both are smaller than the transition point in the estimation problem. Simulation studies and real data analyses are conducted to demonstrate the methods proposed.

Journal ArticleDOI
TL;DR: In this article, a hybrid quantile regression approach for the generalized auto-regressive conditional heteroscedastic (GARCH) model is proposed, which takes advantage of the efficiency of the GARCH model in modelling the volatility globally as well as the flexibility of quantile regression in fitting quantiles at a specific level.
Abstract: Estimating conditional quantiles of financial time series is essential for risk management and many other financial applications. For time series models with conditional heteroscedasticity, although it is the generalized auto-regressive conditional heteroscedastic (GARCH) model that has the greatest popularity, quantile regression for this model usually gives rise to non-smooth non-convex optimization which may hinder its practical feasibility. The paper proposes an easy-to-implement hybrid quantile regression estimation procedure for the GARCH model, where we overcome the intractability due to the square-root form of the conditional quantile function by a simple transformation. The method takes advantage of the efficiency of the GARCH model in modelling the volatility globally as well as the flexibility of quantile regression in fitting quantiles at a specific level. The asymptotic distribution of the estimator is derived and is approximated by a novel mixed bootstrapping procedure. A portmanteau test is further constructed to check the adequacy of fitted conditional quantiles. The finite sample performance of the method is examined by simulation studies, and its advantages over existing methods are illustrated by an empirical application to value-at-risk forecasting.
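To convey the flavour of combining a global GARCH volatility fit with quantile-level flexibility, here is a simplified stand-in: filter volatilities with a GARCH(1,1) recursion at given parameter values and rescale an empirical quantile of the standardized residuals. This is not the paper's hybrid quantile regression estimator, and all names and parameters are illustrative.

```python
import numpy as np

def garch_quantile_path(y, omega, alpha, beta, tau=0.05):
    """Filter sigma_t with a GARCH(1,1) recursion at given parameters and set
    the conditional tau-quantile to sigma_t times the empirical tau-quantile
    of the standardized residuals y_t / sigma_t."""
    y = np.asarray(y, dtype=float)
    sigma2 = np.empty(len(y))
    sigma2[0] = y.var()
    for t in range(1, len(y)):
        sigma2[t] = omega + alpha * y[t - 1] ** 2 + beta * sigma2[t - 1]
    sigma = np.sqrt(sigma2)
    return np.quantile(y / sigma, tau) * sigma
```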