
Showing papers in "Biometrika in 2019"


Journal ArticleDOI
TL;DR: In this paper, a multivariate framework for terminating simulation in MCMC is presented; it relies on strongly consistent estimators of the covariance matrix in the Markov chain central limit theorem (CLT) and yields a lower bound on the minimum number of effective samples required for a desired level of precision.
Abstract: Markov chain Monte Carlo (MCMC) produces a correlated sample for estimating expectations with respect to a target distribution. A fundamental question is when should sampling stop so that we have good estimates of the desired quantities? The key to answering this question lies in assessing the Monte Carlo error through a multivariate Markov chain central limit theorem (CLT). The multivariate nature of this Monte Carlo error has largely been ignored in the MCMC literature. We present a multivariate framework for terminating simulation in MCMC. We define a multivariate effective sample size, estimating which requires strongly consistent estimators of the covariance matrix in the Markov chain CLT; a property we show for the multivariate batch means estimator. We then provide a lower bound on the minimum number of effective samples required for a desired level of precision. This lower bound depends on the problem only through the dimension of the expectation being estimated, and not on the underlying stochastic process. This result is obtained by drawing a connection between terminating simulation via effective sample size and terminating simulation using a relative standard deviation fixed-volume sequential stopping rule, which we demonstrate is an asymptotically valid procedure. The finite sample properties of the proposed method are demonstrated in a variety of examples.
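
To make the termination idea concrete, here is a minimal sketch (ours, not the paper's code) of a multivariate batch means covariance estimate and an effective sample size of the general form used in this literature; the exact definitions, batch-size choices and stopping thresholds in the paper may differ.

```python
import numpy as np

def multivariate_ess(chain, batch_size=None):
    """Sketch: multivariate effective sample size via batch means.

    chain : (n, p) array of MCMC draws, p >= 2.
    Returns an ESS of the form n * (det(Lambda) / det(Sigma))**(1/p), where
    Lambda is the sample covariance of the draws and Sigma is a batch means
    estimate of the asymptotic covariance in the Markov chain CLT.
    """
    n, p = chain.shape
    b = batch_size or int(np.floor(np.sqrt(n)))      # a common default batch size
    a = n // b                                       # number of batches
    draws = chain[:a * b]

    lam = np.cov(draws, rowvar=False)                # sample covariance of draws

    batch_means = draws.reshape(a, b, p).mean(axis=1)
    centred = batch_means - draws.mean(axis=0)
    sigma = b * centred.T @ centred / (a - 1)        # batch means estimator

    return n * (np.linalg.det(lam) / np.linalg.det(sigma)) ** (1.0 / p)
```

Simulation would then be terminated once this effective sample size exceeds a precision-dependent lower bound of the kind derived in the paper.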

219 citations


Journal ArticleDOI
TL;DR: In this paper, the authors extend the methodology of knockoffs to problems where the distribution of the covariates can be described by a hidden Markov model, and they develop an exact and efficient algorithm to sample knockoff variables in this setting and then argue that this provides a natural and powerful tool for inference in genome-wide association studies with guaranteed false discovery rate control.
Abstract: Modern scientific studies often require the identification of a subset of explanatory variables. Several statistical methods have been developed to automate this task, and the framework of knockoffs has been proposed as a general solution for variable selection under rigorous Type I error control, without relying on strong modelling assumptions. In this paper, we extend the methodology of knockoffs to problems where the distribution of the covariates can be described by a hidden Markov model. We develop an exact and efficient algorithm to sample knockoff variables in this setting and then argue that, combined with the existing selective framework, this provides a natural and powerful tool for inference in genome-wide association studies with guaranteed false discovery rate control. We apply our method to datasets on Crohn's disease and some continuous phenotypes.

127 citations


Journal ArticleDOI
TL;DR: This work proposes a test of independence of two multivariate random vectors, given a sample from the underlying population, based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances.
Abstract: We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach, which we call MINT, is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values, which may be obtained from simulation (in the case where one marginal is known) or resampling, guarantee that the test has nominal size, and we provide local power analyses, uniformly over classes of densities whose mutual information satisfies a lower bound. Our ideas may be extended to provide a new goodness-of-fit test of normal linear models based on assessing the independence of our vector of covariates and an appropriately-defined notion of an error vector. The theory is supported by numerical studies on both simulated and real data.
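
The decomposition $I(X;Y) = H(X) + H(Y) - H(X,Y)$ underlying the test can be illustrated with the classical Kozachenko-Leonenko nearest-neighbour entropy estimator. The sketch below is ours (function names included) and omits the weighting and efficiency refinements used by the authors; it assumes continuous data without ties.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=1):
    """Kozachenko-Leonenko k-nearest-neighbour entropy estimate (in nats)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    # Distance from each point to its k-th nearest neighbour (self excluded).
    rho = cKDTree(x).query(x, k=k + 1)[0][:, k]
    log_unit_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_unit_ball + d * np.mean(np.log(rho))

def mi_estimate(x, y, k=1):
    """Mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    x2, y2 = (np.atleast_2d(np.asarray(a).T).T for a in (x, y))
    return kl_entropy(x2, k) + kl_entropy(y2, k) - kl_entropy(np.hstack([x2, y2]), k)

def independence_pvalue(x, y, n_perm=199, k=1, seed=0):
    """Permutation p-value: permuting the rows of y breaks any dependence."""
    rng = np.random.default_rng(seed)
    obs = mi_estimate(x, y, k)
    null = [mi_estimate(x, np.asarray(y)[rng.permutation(len(y))], k)
            for _ in range(n_perm)]
    return (1 + sum(s >= obs for s in null)) / (n_perm + 1)
```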

83 citations


Journal ArticleDOI
TL;DR: In this paper, a class of weighting methods that find the weights of minimum dispersion that approximately balance the covariates is studied, and the resulting weighting estimator is shown to be consistent, asymptotically normal, and semiparametrically efficient.
Abstract: Weighting methods are widely used to adjust for covariates in observational studies, sample surveys, and regression settings. In this paper, we study a class of recently proposed weighting methods which find the weights of minimum dispersion that approximately balance the covariates. We call these weights "minimal weights" and study them under a common optimization framework. The key observation is the connection between approximate covariate balance and shrinkage estimation of the propensity score. This connection leads to both theoretical and practical developments. From a theoretical standpoint, we characterize the asymptotic properties of minimal weights and show that, under standard smoothness conditions on the propensity score function, minimal weights are consistent estimates of the true inverse probability weights. Also, we show that the resulting weighting estimator is consistent, asymptotically normal, and semiparametrically efficient. From a practical standpoint, we present a finite sample oracle inequality that bounds the loss incurred by balancing more functions of the covariates than strictly needed. This inequality shows that minimal weights implicitly bound the number of active covariate balance constraints. We finally provide a tuning algorithm for choosing the degree of approximate balance in minimal weights. We conclude the paper with four empirical studies that suggest approximate balance is preferable to exact balance, especially when there is limited overlap in covariate distributions. In these studies, we show that the root mean squared error of the weighting estimator can be reduced by as much as a half with approximate balance.
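
In generic notation (ours, not necessarily the paper's), the class of problems studied here can be written as a convex program that minimizes the dispersion of the weights subject to approximate balance constraints:

$$
\begin{aligned}
\min_{w}\ & \sum_{i:\,Z_i=1} f(w_i) \\
\text{subject to}\ & \Bigl|\sum_{i:\,Z_i=1} w_i B_k(X_i) - \frac{1}{n}\sum_{i=1}^{n} B_k(X_i)\Bigr| \le \delta_k, \qquad k = 1,\dots,K,
\end{aligned}
$$

where $f$ measures the dispersion of the weights, the $B_k$ are functions of the covariates to be balanced, and the tolerances $\delta_k \ge 0$ control the degree of approximate balance; setting every $\delta_k = 0$ recovers exact balance.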

76 citations


Journal ArticleDOI
TL;DR: In this article, the authors formalize the concept of a conditioning mechanism, which provides a framework for constructing valid and powerful randomization tests under general forms of interference, and apply their approach to a randomized evaluation of an intervention targeting student absenteeism in the school district of Philadelphia.
Abstract: Many causal questions involve interactions between units, also known as interference, for example between individuals in households, students in schools, or firms in markets. In this paper we formalize the concept of a conditioning mechanism, which provides a framework for constructing valid and powerful randomization tests under general forms of interference. We describe our framework in the context of two-stage randomized designs and apply our approach to a randomized evaluation of an intervention targeting student absenteeism in the school district of Philadelphia. We show improvements over existing methods in terms of both computational efficiency and statistical power.

70 citations



Journal ArticleDOI
TL;DR: In this paper, the authors derive a central limit theorem for the Fréchet variance under mild regularity conditions, and also provide a consistent estimator of the asymptotic variance.
Abstract: Fréchet mean and variance provide a way of obtaining a mean and variance for metric space-valued random variables, and can be used for statistical analysis of data objects that lie in abstract spaces devoid of algebraic structure and operations. Examples of such data objects include covariance matrices, graph Laplacians of networks and univariate probability distribution functions. We derive a central limit theorem for the Fréchet variance under mild regularity conditions, using empirical process theory, and also provide a consistent estimator of the asymptotic variance. These results lead to a test for comparing $k$ populations of metric space-valued data objects in terms of Fréchet means and variances. We examine the finite-sample performance of this novel inference procedure through simulation studies on several special cases that include probability distributions and graph Laplacians, leading to a test for comparing populations of networks. The proposed approach has good finite-sample performance in simulations for different kinds of random objects. We illustrate the proposed methods by analysing data on mortality profiles of various countries and resting-state functional magnetic resonance imaging data.
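
For a random object $Y$ taking values in a metric space $(\Omega, d)$, the Fréchet mean and variance referred to above are defined, in standard notation, by

$$
\mu_\oplus = \arg\min_{\omega \in \Omega} E\{d^2(Y, \omega)\}, \qquad V_\oplus = E\{d^2(Y, \mu_\oplus)\},
$$

and the proposed $k$-sample test compares populations of random objects through sample versions of $\mu_\oplus$ and $V_\oplus$.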

50 citations


Journal ArticleDOI
TL;DR: A transformation is used to define a vector space on the positive orthant and it is shown that transformed-linear operations applied to regularly-varying random vectors preserve regular variation.
Abstract: Employing the framework of regular variation, we propose two decompositions which help to summarize and describe high-dimensional tail dependence. Via transformation, we define a vector space on the positive orthant, yielding the notion of basis. With a suitably-chosen transformation, we show that transformed-linear operations applied to regularly varying random vectors preserve regular variation. Rather than model the angular measure of regular variation, we summarize tail dependence via a matrix of pairwise tail dependence metrics. This matrix is positive semidefinite, and eigendecomposition allows one to interpret tail dependence via the resulting eigenbasis. Additionally, this matrix is completely positive, and a resulting decomposition allows one to easily construct regularly varying random vectors which share the same pairwise tail dependencies. We illustrate our methods with Swiss rainfall data and financial return data.

49 citations


Journal ArticleDOI
TL;DR: The choice of algorithmic parameters and the efficiency of the proposed approach are illustrated on a logistic regression with 300 covariates and a log-Gaussian Cox point process model with low- to fine-grained discretizations.
Abstract: We propose a methodology to parallelize Hamiltonian Monte Carlo estimators. Our approach constructs a pair of Hamiltonian Monte Carlo chains that are coupled in such a way that they meet exactly after some random number of iterations. These chains can then be combined so that resulting estimators are unbiased. This allows us to produce independent replicates in parallel and average them to obtain estimators that are consistent in the limit of the number of replicates, instead of the usual limit of the number of Markov chain iterations. We investigate the scalability of our coupling in high dimensions on a toy example. The choice of algorithmic parameters and the efficiency of our proposed methodology are then illustrated on a logistic regression with 300 covariates, and a log-Gaussian Cox point process model with low- to fine-grained discretizations.
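
The debiasing device behind this construction is the coupling identity from the unbiased MCMC literature (the notation below is generic rather than the paper's): if two chains $(X_t)$ and $(Y_t)$ each converge to the target $\pi$ and are coupled so that $X_t = Y_{t-1}$ for all $t \ge \tau$, where $\tau$ is the random meeting time, then for a fixed burn-in $k$

$$
H_k(X, Y) = h(X_k) + \sum_{t=k+1}^{\tau-1} \{h(X_t) - h(Y_{t-1})\}
$$

is, under suitable moment and meeting-time conditions, an unbiased estimator of $E_\pi\{h(X)\}$, so that averaging independent replicates of $H_k$ across parallel machines gives an estimator that is consistent as the number of replicates grows.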

49 citations


Journal ArticleDOI
TL;DR: A scalar tuning parameter is introduced that controls the posterior distribution spread, and a Monte Carlo algorithm is developed that sets this parameter so that the corresponding credible region achieves the nominal frequentist coverage probability.
Abstract: An advantage of methods that base inference on a posterior distribution is that credible regions are readily obtained. Except in well-specified situations, however, there is no guarantee that such regions will achieve the nominal frequentist coverage probability, even approximately. To overcome this difficulty, we propose a general strategy that introduces an additional scalar tuning parameter to control the posterior spread, and we develop an algorithm that chooses this parameter so that the corresponding credible region achieves the nominal coverage probability.
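
As a purely illustrative stand-in for the calibration algorithm (the paper's Monte Carlo procedure is more refined), one can picture a brute-force version that scans a grid of spread parameters and keeps the smallest value whose credible regions attain the nominal coverage in repeated simulations. All function names below are hypothetical placeholders.

```python
import numpy as np

def calibrate_spread(simulate_data, fit_posterior, credible_region, truth,
                     omega_grid, level=0.95, n_rep=200, seed=0):
    """Illustrative grid-search calibration of a posterior-spread parameter.

    simulate_data(truth, rng) -> dataset                   (placeholder)
    fit_posterior(data, omega) -> posterior summary        (placeholder)
    credible_region(posterior, level) -> object with .contains(truth)
    Returns the smallest omega whose estimated coverage reaches the nominal
    level, together with the estimated coverage over the whole grid.
    """
    rng = np.random.default_rng(seed)
    coverage = {}
    for omega in omega_grid:
        hits = 0
        for _ in range(n_rep):
            data = simulate_data(truth, rng)
            region = credible_region(fit_posterior(data, omega), level)
            hits += bool(region.contains(truth))
        coverage[omega] = hits / n_rep
    feasible = [w for w in omega_grid if coverage[w] >= level]
    return (min(feasible) if feasible else max(omega_grid)), coverage
```

This sketch assumes a known truth for the purposes of the simulation loop; in practice the calibration must be driven by the observed data, which is where the paper's algorithm does the real work.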

49 citations


Journal ArticleDOI
TL;DR: In this article, a generalized meta-analysis approach is proposed to combine information on multivariate regression parameters across multiple studies that have varying levels of covariate information using algebraic relationships among regression parameters in different dimensions.
Abstract: Meta-analysis is widely popular for synthesizing information on common parameters of interest across multiple studies because of its logistical convenience and statistical efficiency. We develop a generalized meta-analysis approach to combining information on multivariate regression parameters across multiple studies that have varying levels of covariate information. Using algebraic relationships among regression parameters in different dimensions, we specify a set of moment equations for estimating parameters of a maximal model through information available from sets of parameter estimates for a series of reduced models from the different studies. The specification of the equations requires a reference dataset for estimating the joint distribution of the covariates. We propose to solve these equations using the generalized method of moments approach, with the optimal weighting of the equations taking into account uncertainty associated with estimates of the parameters of the reduced models. We describe extensions of the iterated reweighted least-squares algorithm for fitting generalized linear regression models using the proposed framework. Based on the same moment equations, we also develop a diagnostic test for detecting violations of underlying model assumptions, such as those arising from heterogeneity in the underlying study populations. The proposed methods are illustrated with extensive simulation studies and a real-data example involving the development of a breast cancer risk prediction model using disparate risk factor information from multiple studies.
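
In generic generalized-method-of-moments notation (ours), the maximal-model parameter $\theta$ is estimated by solving

$$
\hat{\theta} = \arg\min_{\theta}\, U_n(\theta)^{\mathrm{T}} \hat{W} U_n(\theta), \qquad U_n(\theta) = \{u_1(\theta; \hat{\beta}_1)^{\mathrm{T}}, \dots, u_K(\theta; \hat{\beta}_K)^{\mathrm{T}}\}^{\mathrm{T}},
$$

where $u_k$ encodes the algebraic relationship between $\theta$ and the reduced-model parameter estimates $\hat{\beta}_k$ from study $k$, evaluated with the help of the reference dataset, and the weight matrix $\hat{W}$ accounts for the uncertainty in the $\hat{\beta}_k$.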

Journal ArticleDOI
TL;DR: In this paper, confidence upper bounds for the false discovery proportion are constructed, which are simultaneous over all rejection cut-offs, and a generalization of the method is provided that lets the user select the shape of the simultaneous confidence bounds; this gives the user more freedom in determining the power properties of the methods.
Abstract: When multiple hypotheses are tested, interest is often in ensuring that the proportion of false discoveries (FDP) is small with high confidence. In this paper, confidence upper bounds for the FDP are constructed, which are simultaneous over all rejection cut-offs. In particular, this allows the user to select a set of hypotheses post hoc such that the FDP lies below some constant with high confidence. Our method uses permutations to account for the dependence structure in the data. So far, only Meinshausen has provided an exact, permutation-based and computationally feasible method for simultaneous FDP bounds. We provide an exact method that uniformly improves this procedure, and we further generalize it so that the user can select the shape of the simultaneous confidence bounds, which gives more freedom in determining the power properties of the method. Interestingly, several existing permutation methods, such as Significance Analysis of Microarrays (SAM) and Westfall and Young's maxT method, are obtained as special cases.

Journal ArticleDOI
TL;DR: It is proved that the posterior distribution for the probit coefficients has a unified skew-normal kernel under Gaussian priors, which allows efficient Bayesian inference for a wide class of applications, especially in large-$p$ and small-to-moderate-$n$ settings where state-of-the-art computational methods face notable challenges.
Abstract: Regression models for dichotomous data are ubiquitous in statistics. Besides being useful for inference on binary responses, these methods serve also as building blocks in more complex formulations, such as density regression, nonparametric classification and graphical models. Within the Bayesian framework, inference proceeds by updating the priors for the coefficients, typically set to be Gaussians, with the likelihood induced by probit or logit regressions for the responses. In this updating, the apparent absence of a tractable posterior has motivated a variety of computational methods, including Markov Chain Monte Carlo routines and algorithms which approximate the posterior. Despite being routinely implemented, Markov Chain Monte Carlo strategies face mixing or time-inefficiency issues in large p and small n studies, whereas approximate routines fail to capture the skewness typically observed in the posterior. This article proves that the posterior distribution for the probit coefficients has a unified skew-normal kernel, under Gaussian priors. Such a novel result allows efficient Bayesian inference for a wide class of applications, especially in large p and small-to-moderate n studies where state-of-the-art computational methods face notable issues. These advances are outlined in a genetic study, and further motivate the development of a wider class of conjugate priors for probit models along with methods to obtain independent and identically distributed samples from the unified skew-normal posterior.
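
To fix ideas, write the probit model with a Gaussian prior in standard notation (ours): $y_i \in \{0,1\}$, $\mathrm{pr}(y_i = 1 \mid \beta) = \Phi(x_i^{\mathrm{T}}\beta)$ and $\beta \sim N_p(\xi, \Omega)$. The posterior kernel is then

$$
\pi(\beta \mid y) \propto \phi_p(\beta - \xi; \Omega) \prod_{i=1}^{n} \Phi\{(2y_i - 1) x_i^{\mathrm{T}} \beta\},
$$

a product of a multivariate Gaussian density and $n$ Gaussian cumulative distribution functions, which is precisely the kind of kernel the unified skew-normal family accommodates.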

Journal ArticleDOI
TL;DR: For the special case of closed testing with Simes local tests, the authors showed that the average power to detect false hypotheses at any desired FDP level does not vanish, and that the confidence bounds for FDP are consistent estimators for the true FDP for every non-vanishing subset.
Abstract: Closed testing procedures are classically used for familywise error rate (FWER) control, but they can also be used to obtain simultaneous confidence bounds for the false discovery proportion (FDP) in all subsets of the hypotheses. In this paper we investigate the special case of closed testing with Simes local tests. We construct a novel fast and exact shortcut which we use to investigate the power of this method when the number of hypotheses goes to infinity. We show that, if a minimal amount of signal is present, the average power to detect false hypotheses at any desired FDP level does not vanish. Additionally, we show that the confidence bounds for FDP are consistent estimators for the true FDP for every non-vanishing subset. For the case of a finite number of hypotheses, we show connections between Simes-based closed testing and the procedure of Benjamini and Hochberg.
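
For reference, the Simes local test used here rejects an intersection hypothesis $H_I$, with ordered p-values $p_{(1)} \le \dots \le p_{(|I|)}$, at level $\alpha$ whenever

$$
p_{(i)} \le \frac{i\alpha}{|I|} \quad \text{for at least one } i \in \{1, \dots, |I|\}.
$$

Closed testing rejects an individual hypothesis only if every intersection containing it is rejected, and the same machinery yields the simultaneous FDP bounds discussed above; the shortcut constructed in the paper makes this computation fast and exact.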

Journal ArticleDOI
TL;DR: In this article, the authors characterize the behavior of perturbed eigenvectors for a range of signal-plus-noise matrix models encountered in both statistical and random-matrix-theoretic settings.
Abstract: Estimating eigenvectors and low-dimensional subspaces is of central importance for numerous problems in statistics, computer science and applied mathematics. In this paper we characterize the behaviour of perturbed eigenvectors for a range of signal-plus-noise matrix models encountered in statistical and random-matrix-theoretic settings. We establish both first-order approximation results, i.e., sharp deviations, and second-order distributional limit theory, i.e., fluctuations. The concise methodology presented in this paper synthesizes tools rooted in two core concepts, namely deterministic decompositions of matrix perturbations and probabilistic matrix concentration phenomena. We illustrate our theoretical results with simulation examples involving stochastic block model random graphs.

Journal ArticleDOI
TL;DR: In this paper, a debiased version of the Whittle likelihood is proposed for second-order stationary stochastic processes, which can be computed in O(n log n) time.
Abstract: The Whittle likelihood is a widely used and computationally efficient pseudolikelihood. However, it is known to produce biased parameter estimates with finite sample sizes for large classes of models. We propose a method for debiasing Whittle estimates for second-order stationary stochastic processes. The debiased Whittle likelihood can be computed in the same O(n log n) operations as the standard Whittle approach. We demonstrate the superior performance of our method in simulation studies and in application to a large-scale oceanographic dataset, where in both cases the debiased approach reduces bias by up to two orders of magnitude, achieving estimates that are close to those of the exact maximum likelihood, at a fraction of the computational cost. We prove that the method yields estimates that are consistent at an optimal convergence rate of $n^{-1/2}$ for Gaussian processes and for certain classes of non-Gaussian or nonlinear processes. This is established under weaker assumptions than in the standard theory, and in particular the power spectral density is not required to be continuous in frequency. We describe how the method can be readily combined with standard methods of bias reduction, such as tapering and differencing, to further reduce bias in parameter estimates.
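
In standard notation (ours), the Whittle pseudolikelihood for a parametric spectral density $f_\theta$ and periodogram $I(\omega)$ is

$$
\ell_W(\theta) = -\sum_{\omega \in \Omega_n} \Bigl\{\log f_\theta(\omega) + \frac{I(\omega)}{f_\theta(\omega)}\Bigr\},
$$

summed over the Fourier frequencies $\Omega_n$; the finite-sample bias arises because $E_\theta\{I(\omega)\} \neq f_\theta(\omega)$. The debiasing strategy in this line of work replaces $f_\theta$ by the expected periodogram $\bar{f}_{n,\theta}(\omega) = E_\theta\{I(\omega)\}$, which can itself be computed with fast Fourier transforms, preserving the O(n log n) cost quoted above.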

Journal ArticleDOI
TL;DR: In this paper, the optimality of the uniform design measure is established via the approximate theory for a broad range of criteria, and the closed-form construction of a class of robust optimal fractional designs is explored and illustrated.
Abstract: In an order-of-addition experiment, each treatment is a permutation of m components. It is often unaffordable to test all the m! treatments, and the design problem arises. We consider a model that incorporates the order of each pair of components and can also account for the distance between the two components in every such pair. Under this model, the optimality of the uniform design measure is established, via the approximate theory, for a broad range of criteria. Coupled with an eigen-analysis, this result serves as a benchmark that paves the way for assessing the efficiency and robustness of any exact design. The closed-form construction of a class of robust optimal fractional designs is then explored and illustrated.
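
A pairwise-order model of the general type considered here can be written, in our notation, for a permutation $\pi$ of the $m$ components as

$$
E\{y(\pi)\} = \beta_0 + \sum_{1 \le j < k \le m} z_{jk}(\pi)\, \beta_{jk}, \qquad z_{jk}(\pi) = \begin{cases} +1, & \text{if component } j \text{ precedes } k \text{ in } \pi, \\ -1, & \text{otherwise,} \end{cases}
$$

with the paper's model additionally allowing each pair's contribution to depend on the distance between the two components' positions in $\pi$. The uniform design measure whose optimality is established places equal mass $1/m!$ on all permutations.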

Journal ArticleDOI
TL;DR: For observational data that follow a structural equation model whose error terms have equal variance, this paper shows that the known identifiability of the causal structure is implied by an ordering among conditional variances.
Abstract: Prior work has shown that causal structure can be uniquely identified from observational data when these follow a structural equation model whose error terms have equal variance. We show that this fact is implied by an ordering among conditional variances. We demonstrate that ordering estimates of these variances yields a simple yet state-of-the-art method for causal structure learning that is readily extendable to high-dimensional problems.
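
A toy numerical rendering of the ordering idea (ignoring the refinements and high-dimensional extensions developed in the paper): under equal error variances, a variable whose parents all lie among the already-selected variables has residual variance equal to the common error variance, which is the smallest achievable, so a topological ordering can be built greedily by repeatedly selecting the remaining variable with the smallest residual variance after regression on the variables chosen so far.

```python
import numpy as np

def causal_order_equal_variance(X):
    """Toy sketch (not the paper's exact estimator): greedy ordering under
    the equal-error-variance assumption. At each step, pick the remaining
    variable with the smallest residual variance after regressing it on the
    variables already ordered.  X : (n, p) data matrix; returns column indices.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                       # centre the columns
    p = X.shape[1]
    order, remaining = [], list(range(p))
    while remaining:
        resid_var = {}
        for j in remaining:
            if order:
                Z = X[:, order]
                beta, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
                resid = X[:, j] - Z @ beta
            else:
                resid = X[:, j]
            resid_var[j] = resid.var()
        nxt = min(resid_var, key=resid_var.get)  # most 'source-like' variable
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

Given an estimated ordering, the edges themselves can then be recovered by, for example, regressing each variable on its predecessors, with sparse regression in high dimensions.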

Journal ArticleDOI
TL;DR: An experimental design strategy is introduced for testing the classic assumption of no interference among users, under which the outcome of one user does not depend on the treatment assigned to other users; this assumption is rarely tenable on such platforms.
Abstract: Experimentation platforms are essential to large modern technology companies, as they are used to carry out many randomized experiments daily. The classic assumption of no interference among users, under which the outcome for one user does not depend on the treatment assigned to other users, is rarely tenable on such platforms. Here, we introduce an experimental design strategy for testing whether this assumption holds. Our approach is in the spirit of the Durbin–Wu–Hausman test for endogeneity in econometrics, where multiple estimators return the same estimate if and only if the null hypothesis holds. The design that we introduce makes no assumptions on the interference model between units, nor on the network among the units, and has a sharp bound on the variance and an implied analytical bound on the Type I error rate. We discuss how to apply the proposed design strategy to large experimentation platforms, and we illustrate it in the context of an experiment on the LinkedIn platform.

Journal ArticleDOI
TL;DR: GreedyExperimentalDesign, as mentioned in this paper, is an experimental design procedure that divides a set of experimental units into two groups so that the two groups are balanced on a prespecified set of covariates while being almost as random as complete randomization.
Abstract: We present a new experimental design procedure that divides a set of experimental units into two groups so that the two groups are balanced on a prespecified set of covariates while being almost as random as complete randomization. Under complete randomization, the difference in covariate balance as measured by the standardized average difference between treatment and control will be $O_p(n^{-1/2})$. If the sample size is not too large, this imbalance may be material. In this article, we present an algorithm which greedily switches assignment pairs. The resultant designs produce balance of the much lower order $O_p(n^{-3})$ for one covariate. However, our algorithm creates assignments which are, strictly speaking, non-random. We introduce two metrics which capture departures from randomization, one in the style of entropy and one in the style of standard error, and demonstrate that our assignments are nearly as random as complete randomization in terms of both measures. The results are extended to more than one covariate, simulations are provided to illustrate the results, and statistical inference under our design is discussed. We provide an open-source R package available on CRAN, called GreedyExperimentalDesign, which generates designs according to our algorithm.
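
To convey the flavour of greedy pair switching (the imbalance measure, switching rule and stopping criterion in the paper differ in detail), here is an illustrative version that starts from complete randomization and repeatedly applies the single best treated/control swap until no swap reduces a simple standardized imbalance measure.

```python
import numpy as np

def greedy_pair_switch(X, seed=0, max_passes=50):
    """Illustrative greedy pair-switching design (not the paper's exact
    algorithm).  X : (n, p) covariate matrix with n even and non-constant
    columns.  Returns a 0/1 assignment vector with equal group sizes.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.zeros(n, dtype=int)
    w[rng.choice(n, n // 2, replace=False)] = 1          # complete randomization

    def imbalance(w):
        diff = X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)
        return float(np.sum((diff / X.std(axis=0)) ** 2))

    best = imbalance(w)
    for _ in range(max_passes):
        treated, control = np.where(w == 1)[0], np.where(w == 0)[0]
        best_swap = None
        for i in treated:
            for j in control:
                w[i], w[j] = 0, 1                        # try the swap
                score = imbalance(w)
                if score < best:
                    best, best_swap = score, (i, j)
                w[i], w[j] = 1, 0                        # undo it
        if best_swap is None:
            break                                        # no improving swap left
        i, j = best_swap
        w[i], w[j] = 0, 1                                # commit the best swap
    return w
```

The point of the paper is that designs of this greedy type achieve dramatically better balance than complete randomization while remaining nearly as random by the entropy- and standard-error-style metrics it introduces.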

Journal ArticleDOI
TL;DR: Methods for estimating the spectral density of a random field on a $d$-dimensional lattice from incomplete gridded data are introduced, along with a parametric filtering method that is designed to reduce periodogram smoothing bias.
Abstract: We introduce methods for estimating the spectral density of a random field on a $d$-dimensional lattice from incomplete gridded data. Data are iteratively imputed onto an expanded lattice according to a model with a periodic covariance function. The imputations are convenient computationally, in that circulant embedding and preconditioned conjugate gradient methods can produce imputations in $O(n \log n)$ time and $O(n)$ memory. However, these so-called periodic imputations are motivated mainly by their ability to produce accurate spectral density estimates. In addition, we introduce a parametric filtering method that is designed to reduce periodogram smoothing bias. The paper contains theoretical results on properties of the imputed-data periodogram and numerical and simulation studies comparing the performance of the proposed methods to existing approaches in a number of scenarios. We present an application to a gridded satellite surface temperature dataset with missing values.

Journal ArticleDOI
TL;DR: In this article, rank-based functionals of Cramér–von Mises type were proposed for testing the hypothesis that arbitrary random variables are mutually independent, which can be used for contingency tables which are sparse or whose dimension depends on the sample size.
Abstract: Statistics are proposed for testing the hypothesis that arbitrary random variables are mutually independent. The tests are consistent and well behaved for any marginal distributions; they can be used, for example, for contingency tables which are sparse or whose dimension depends on the sample size, as well as for mixed data. No regularity conditions, data jittering, or binning mechanisms are required. The statistics are rank-based functionals of Cramér–von Mises type whose asymptotic behaviour derives from the empirical multilinear copula process. Approximate $p$-values are computed using a wild bootstrap. The procedures are simple to implement and computationally efficient, and maintain their level well in moderate to large samples. Simulations suggest that the tests are robust with respect to the number of ties in the data, can easily detect a broad range of alternatives, and outperform existing procedures in many settings. Additional insight into their performance is provided through asymptotic local power calculations under contiguous alternatives. The procedures are illustrated on traumatic brain injury data.

Journal ArticleDOI
TL;DR: A flexible Markov random field model for microbial network structure is proposed and a hypothesis testing framework for detecting differences between networks, also known as differential network analysis is introduced, which is particularly powerful against sparse alternatives.
Abstract: Micro-organisms such as bacteria form complex ecological community networks that can be greatly influenced by diet and other environmental factors. Differential analysis of microbial community structures aims to elucidate systematic changes during an adaptive response to changes in environment. In this paper, we propose a flexible Markov random field model for microbial network structure and introduce a hypothesis testing framework for detecting differences between networks, also known as differential network analysis. Our global test for differential networks is particularly powerful against sparse alternatives. In addition, we develop a multiple testing procedure with false discovery rate control to identify the structure of the differential network. The proposed method is applied to data from a gut microbiome study on U.K. twins to evaluate how age affects the microbial community network.

Journal ArticleDOI
TL;DR: This work proves that existing methods to account for unobserved covariates often inflate Type I error, and provides alternative estimators for the coefficients of interest that correct the inflation, and proves that their estimators are asymptotically equivalent to the ordinary least squares estimators obtained when every covariate is observed.
Abstract: An important phenomenon in high-throughput biological data is the presence of unobserved covariates that can have a significant impact on the measured response. When these covariates are also correlated with the covariate of interest, ignoring or improperly estimating them can lead to inaccurate estimates of and spurious inference on the corresponding coefficients of interest in a multivariate linear model. We first prove that existing methods to account for these unobserved covariates often inflate Type I error for the null hypothesis that a given coefficient of interest is zero. We then provide alternative estimators for the coefficients of interest that correct the inflation, and prove that our estimators are asymptotically equivalent to the ordinary least squares estimators obtained when every covariate is observed. Lastly, we use previously published DNA methylation data to show that our method can more accurately estimate the direct effect of asthma on DNA methylation levels compared to existing methods, the latter of which likely fail to recover and account for latent cell type heterogeneity.

Journal ArticleDOI
TL;DR: In this paper, the natural lasso estimator for the error variance is proposed, where the likelihood is expressed in terms of the natural parameterization of the multiparameter exponential family of a Gaussian with unknown mean and variance.
Abstract: The lasso has been studied extensively as a tool for estimating the coefficient vector in the high-dimensional linear model; however, considerably less is known about estimating the error variance in this context. In this paper, we propose the natural lasso estimator for the error variance, which maximizes a penalized likelihood objective. A key aspect of the natural lasso is that the likelihood is expressed in terms of the natural parameterization of the multiparameter exponential family of a Gaussian with unknown mean and variance. The result is a remarkably simple estimator of the error variance with provably good performance in terms of mean squared error. These theoretical results do not require placing any assumptions on the design matrix or the true regression coefficients. We also propose a companion estimator, called the organic lasso, which theoretically does not require tuning of the regularization parameter. Both estimators do well empirically compared to preexisting methods, especially in settings where successful recovery of the true support of the coefficient vector is hard. Finally, we show that existing methods can do well under fewer assumptions than previously known, thus providing a fuller story about the problem of estimating the error variance in high-dimensional linear models.
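
In the notation of the high-dimensional linear model $y = X\beta + \varepsilon$, an estimator of the flavour described here takes the attained value of a lasso-type objective as the variance estimate (our paraphrase; the paper's exact scaling, and the organic lasso variant, differ in detail):

$$
\hat{\sigma}^2_{\lambda} = \min_{\beta \in \mathbb{R}^p} \Bigl\{\frac{1}{n}\|y - X\beta\|_2^2 + 2\lambda \|\beta\|_1\Bigr\},
$$

so that, once the lasso problem has been solved, the error variance estimate comes essentially for free.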

Journal ArticleDOI
TL;DR: In this paper, a Wasserstein covariance measure is proposed for dependent density data, where a $p$-vector of univariate random densities is observed for each sampling unit, together with intuitively appealing estimators for both fixed and diverging $p$.
Abstract: A common feature of methods for analyzing samples of probability density functions is that they respect the geometry inherent to the space of densities. Once a metric is specified for this space, the Fréchet mean is typically used to quantify and visualize the average density from the sample. For one-dimensional densities, the Wasserstein metric is popular due to its theoretical appeal and interpretive value as an optimal transport metric, leading to the Wasserstein-Fréchet mean or barycenter as the mean density. We extend the existing methodology for samples of densities in two key directions. First, motivated by applications in neuroimaging, we consider dependent density data, where a $p$-vector of univariate random densities is observed for each sampling unit. Second, we introduce a Wasserstein covariance measure and propose intuitively appealing estimators for both fixed and diverging $p$, where the latter corresponds to continuously-indexed densities. We also give theory demonstrating consistency and asymptotic normality, while accounting for errors introduced in the unavoidable preparatory density estimation step. The utility of the Wasserstein covariance matrix is demonstrated through applications to functional connectivity in the brain using functional magnetic resonance imaging data and to the secular evolution of mortality for various countries.
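
For univariate densities the 2-Wasserstein geometry used here has an explicit quantile-function representation: writing $F^{-1}$ and $G^{-1}$ for the quantile functions of densities $f$ and $g$,

$$
d_W^2(f, g) = \int_0^1 \{F^{-1}(t) - G^{-1}(t)\}^2\, dt,
$$

and the Wasserstein-Fréchet mean of a sample of densities is the distribution whose quantile function is the pointwise average of the sample quantile functions. The Wasserstein covariance proposed in the paper extends this framework to quantify dependence between the components of a vector of random densities.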

Journal ArticleDOI
TL;DR: A constraint reduction method is developed that constructs a set of active constraints from super-exponentially many constraints; coupled with an alternating direction method of multipliers and a difference convex method, this permits efficient computation for large-graph learning.
Abstract: Directed acyclic graphs are widely used to describe directional pairwise relations. Such relations are estimated by reconstructing a directed acyclic graph's structure, which is challenging when the ordering of nodes of the graph is unknown. In such a situation, existing methods such as the neighbourhood and search-and-score methods have high estimation errors or computational complexities, especially when a local or sequential approach is used to enumerate edge directions by testing or optimizing a criterion locally, as a local method may break down even for moderately sized graphs. We propose a novel approach to simultaneously identifying all estimable directed edges and model parameters, using constrained maximum likelihood with nonconvex constraints. We develop a constraint reduction method that constructs a set of active constraints from super-exponentially many constraints. This, coupled with an alternating direction method of multipliers and a difference convex method, permits efficient computation for large-graph learning. We show that the proposed method consistently reconstructs identifiable directions of the true graph and achieves the optimal performance in terms of parameter estimation. Numerically, the method compares favourably with competitors. A protein network is analysed to demonstrate that the proposed method can make a difference in identifying the network's structure.

Journal ArticleDOI
TL;DR: In this paper, the authors consider nonparametric estimation of a covariance function on the unit square, given a sample of discretely observed fragments of functional data, and give precise deterministic conditions on how fine the observation grid needs to be relative to the rank and fragment length to hold true.
Abstract: We consider nonparametric estimation of a covariance function on the unit square, given a sample of discretely observed fragments of functional data. When each sample path is observed only on a subinterval of length $\delta<1$, one has no statistical information on the unknown covariance outside a $\delta$-band around the diagonal. The problem seems unidentifiable without parametric assumptions, but we show that nonparametric estimation is feasible under suitable smoothness and rank conditions on the unknown covariance. This remains true even when the observations are discrete, and we give precise deterministic conditions on how fine the observation grid needs to be relative to the rank and fragment length for identifiability to hold true. We show that our conditions translate the estimation problem to a low-rank matrix completion problem, construct a nonparametric estimator in this vein, and study its asymptotic properties. We illustrate the numerical performance of our method on real and simulated data.

Journal ArticleDOI
TL;DR: In this paper, the authors propose an approach to test network dependence via diffusion maps and distance-based correlations, which yields a consistent test statistic under mild distributional assumptions on the graph structure, and demonstrate that it is able to efficiently identify the most informative graph embedding with respect to diffusion time.
Abstract: Deciphering the associations between network connectivity and nodal attributes is one of the core problems in network science. The dependency structure and high dimensionality of networks pose unique challenges to traditional dependency tests in terms of theoretical guarantees and empirical performance. We propose an approach to test network dependence via diffusion maps and distance-based correlations. We prove that the new method yields a consistent test statistic under mild distributional assumptions on the graph structure, and demonstrate that it is able to efficiently identify the most informative graph embedding with respect to the diffusion time. The methodology is illustrated on both simulated and real data.
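
One ingredient of the proposed test, the sample distance correlation between paired multivariate observations, can be computed directly as below; the diffusion-map embedding of the network and the choice of diffusion time are separate steps not shown here, and this is the textbook biased-sample version rather than the authors' implementation.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two samples with matching rows.
    x : (n, dx) array (or length-n vector), y : (n, dy) array.
    """
    def centred_distances(z):
        z = np.asarray(z, dtype=float)
        if z.ndim == 1:
            z = z[:, None]
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    a, b = centred_distances(x), centred_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0
```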

Journal ArticleDOI
TL;DR: This work considers how different choices of kinetic energy in Hamiltonian Monte Carlo affect algorithm performance, and introduces two quantities which can be easily evaluated, the composite gradient and the implicit noise.
Abstract: We consider how different choices of kinetic energy in Hamiltonian Monte Carlo affect algorithm performance. To this end, we introduce two quantities which can be easily evaluated, the composite gradient and the implicit noise. Results are established on integrator stability and geometric convergence, and we show that choices of kinetic energy that result in heavy-tailed momentum distributions can exhibit an undesirable negligible moves property, which we define. A general efficiency-robustness trade-off is outlined, and implementations which rely on approximate gradients are also discussed. Two numerical studies illustrate our theoretical findings, showing that the standard choice which results in a Gaussian momentum distribution is not always optimal in terms of either robustness or efficiency.
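
For reference, the Hamiltonian Monte Carlo setup that the paper varies is the following: for a target $\pi(x) \propto \exp\{-U(x)\}$, one augments the state with a momentum $p$ and simulates dynamics for the Hamiltonian

$$
H(x, p) = U(x) + K(p),
$$

where the kinetic energy $K$ determines the momentum distribution, proportional to $\exp\{-K(p)\}$. The standard choice $K(p) = \tfrac{1}{2} p^{\mathrm{T}} M^{-1} p$ gives Gaussian momenta; the composite gradient and implicit noise introduced in the paper are the tools used to compare this choice with heavier-tailed alternatives.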