scispace - formally typeset

Showing papers in "arXiv: Methodology in 2019"


Journal Article
TL;DR: In this paper, quantile g-computation is used to estimate the joint effect of exposure mixtures, which is needed to understand public health actions that act on exposure sources, such as regulations on industrial emissions or mining processes, dietary changes, or consumer behavioral changes.
Abstract: Exposure mixtures frequently occur in data across many domains, particularly in the fields of environmental and nutritional epidemiology. Various strategies have arisen to answer questions about mixtures, including methods such as weighted quantile sum (WQS) regression that estimate a joint effect of the mixture components. We demonstrate a new approach to estimating the joint effects of a mixture: quantile g-computation. This approach combines the inferential simplicity of WQS regression with the flexibility of g-computation, a method of causal effect estimation. We use simulations to examine whether quantile g-computation and WQS regression can accurately and precisely estimate effects of mixtures in common scenarios. We examine the bias, confidence interval coverage, and bias-variance tradeoff of quantile g-computation and WQS regression, and how these quantities are impacted by the presence of non-causal exposures, exposure correlation, unmeasured confounding, and non-linear effects. Quantile g-computation, unlike WQS regression, allows inference on mixture effects that is unbiased with appropriate confidence interval coverage at sample sizes typically encountered in epidemiologic studies and when the assumptions of WQS regression are not met. Further, WQS regression can magnify bias from unmeasured confounding that might occur if important components of the mixture are omitted. Unlike inferential approaches that examine effects of individual exposures, methods like quantile g-computation that can estimate the effect of a mixture are essential for understanding effects of potential public health actions that act on exposure sources. Our approach may serve to help bridge gaps between epidemiologic analysis and interventions such as regulations on industrial emissions or mining processes, dietary changes, or consumer behavioral changes that act on multiple exposures simultaneously.

305 citations
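For the quantile g-computation entry above, here is a minimal Python sketch of the core idea under simplifying assumptions. Exposures are converted to quartile scores (0 to 3), and a linear outcome model with no interactions and no covariates is fit. The joint mixture effect psi, the expected change in outcome when every exposure rises by one quantile at once, then reduces to the sum of the exposure coefficients. The helper names `quantize` and `qgcomp_linear` are hypothetical. This is not the authors' software, which also handles covariates, non-linear terms and resampling-based inference.

```python
import numpy as np

def quantize(x, q=4):
    """Map each exposure column to quantile scores 0..q-1 (hypothetical helper)."""
    # Interior cut points per column, shape (q-1, n_exposures).
    cuts = np.quantile(x, np.linspace(0, 1, q + 1)[1:-1], axis=0)
    return np.column_stack([np.searchsorted(cuts[:, j], x[:, j], side="right")
                            for j in range(x.shape[1])])

def qgcomp_linear(x, y, q=4):
    """Joint effect psi of a one-quantile increase in all exposures, assuming a
    linear, additive outcome model without covariates."""
    xq = quantize(x, q)
    design = np.column_stack([np.ones(len(y)), xq])  # intercept + quantized exposures
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Under this model, setting every exposure one quantile higher shifts the
    # predicted outcome by the sum of the exposure coefficients.
    return beta[1:].sum(), beta[1:]
```

With a non-linear or interacting outcome model, psi would instead be obtained by predicting outcomes with all exposures set to each quantile level and contrasting those predictions. The sum-of-coefficients shortcut holds only in the linear, additive case.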


Posted Content
TL;DR: This paper proposes a new method that is fully adaptive to heteroscedasticity, which combines conformal prediction with classical quantile regression, inheriting the advantages of both.
Abstract: Conformal prediction is a technique for constructing prediction intervals that attain valid coverage in finite samples, without making distributional assumptions. Despite this appeal, existing conformal methods can be unnecessarily conservative because they form intervals of constant or weakly varying length across the input space. In this paper we propose a new method that is fully adaptive to heteroscedasticity. It combines conformal prediction with classical quantile regression, inheriting the advantages of both. We establish a theoretical guarantee of valid coverage, supplemented by extensive experiments on popular regression datasets. We compare the efficiency of conformalized quantile regression to other conformal methods, showing that our method tends to produce shorter intervals.

222 citations
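For the conformalized quantile regression entry above, here is a minimal split-conformal sketch of the calibration step. It assumes that lower and upper conditional-quantile predictions (at levels alpha/2 and 1 - alpha/2) are already available from any quantile regressor fit on a separate training split. The function name is hypothetical. Conformity scores measure how far each calibration response falls outside its band, and the band is widened (or narrowed, if the score quantile is negative) by their finite-sample-corrected quantile.

```python
import numpy as np

def cqr_interval(lo_cal, hi_cal, y_cal, lo_test, hi_test, alpha=0.1):
    """Calibrate quantile-regression bands on a held-out split (CQR-style sketch)."""
    lo_cal, hi_cal, y_cal = (np.asarray(a, dtype=float) for a in (lo_cal, hi_cal, y_cal))
    lo_test, hi_test = np.asarray(lo_test, dtype=float), np.asarray(hi_test, dtype=float)
    # Positive score: y lies outside the band; negative: inside it.
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the corrected empirical quantile
    if k > n:  # too few calibration points for this alpha: return infinite intervals
        return np.full_like(lo_test, -np.inf), np.full_like(hi_test, np.inf)
    q_hat = np.sort(scores)[k - 1]
    return lo_test - q_hat, hi_test + q_hat
```

Because the band comes from quantile regression rather than a fixed-width residual band, its width can vary with the input, and the conformal step only adjusts it by a single constant.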


Posted Content
TL;DR: In this article, the authors propose two novel approaches to estimate the sample mean and standard deviation when data are suspected to be non-normal; these methods often perform better than existing methods when applied to non-normal data.
Abstract: Researchers increasingly use meta-analysis to synthesize the results of several studies in order to estimate a common effect. When the outcome variable is continuous, standard meta-analytic approaches assume that the primary studies report the sample mean and standard deviation of the outcome. However, when the outcome is skewed, authors sometimes summarize the data by reporting the sample median and one or both of (i) the minimum and maximum values and (ii) the first and third quartiles, but do not report the mean or standard deviation. To include these studies in meta-analysis, several methods have been developed to estimate the sample mean and standard deviation from the reported summary data. A major limitation of these widely used methods is that they assume that the outcome distribution is normal, which is unlikely to be tenable for studies reporting medians. We propose two novel approaches to estimate the sample mean and standard deviation when data are suspected to be non-normal. Our simulation results and empirical assessments show that the proposed methods often perform better than the existing methods when applied to non-normal data.

212 citations
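For the entry above on estimating the sample mean and standard deviation from reported quartiles, here is a sketch of the kind of normality-based estimator the paper argues against. It is a simplified, large-sample version and is not one of the two proposed methods. The mean is approximated by the average of the first quartile, median and third quartile. Because a normal distribution has an interquartile range of 2 × z(0.75) × sigma (about 1.349 sigma), the SD is taken as the IQR divided by that constant. The function name is hypothetical.

```python
from statistics import NormalDist

def normal_based_mean_sd(q1, median, q3):
    """Estimate the mean and SD from Q1, the median and Q3, assuming normality."""
    mean = (q1 + median + q3) / 3
    # For a normal distribution, Q3 - Q1 = 2 * z_{0.75} * sigma, with z_{0.75} ≈ 0.674.
    sd = (q3 - q1) / (2 * NormalDist().inv_cdf(0.75))
    return mean, sd
```

For a normal outcome these estimates recover the true parameters. For skewed outcomes, which are typically the ones reported as medians, the symmetric formulas can be badly off, and that is the gap the proposed methods target.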


Monograph
TL;DR: Cattaneo et al. provide an accessible and practical guide for the analysis and interpretation of regression discontinuity (RD) designs that encourages the use of a common set of practices and facilitates the accumulation of RD-based empirical evidence.
Abstract: In this Element and its accompanying Element, Matias D. Cattaneo, Nicolas Idrobo, and Rocio Titiunik provide an accessible and practical guide for the analysis and interpretation of Regression Discontinuity (RD) designs that encourages the use of a common set of practices and facilitates the accumulation of RD-based empirical evidence. In this Element, the authors discuss the foundations of the canonical Sharp RD design, which has the following features: (i) the score is continuously distributed and has only one dimension, (ii) there is only one cutoff, and (iii) compliance with the treatment assignment is perfect. In the accompanying Element, the authors discuss practical and conceptual extensions to the basic RD setup.

173 citations
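For the regression discontinuity entry above, here is a minimal sketch of a local linear estimate in the canonical sharp RD design. It assumes a single continuous score, one cutoff and perfect compliance, with treatment whenever score >= cutoff. Separate kernel-weighted linear fits are made on each side of the cutoff, and the effect is the jump between their fitted values at the cutoff. The bandwidth `h` is user-supplied here. The sketch omits data-driven bandwidth selection and the robust bias-corrected inference the authors recommend, and the function name is hypothetical.

```python
import numpy as np

def sharp_rd_local_linear(score, y, cutoff, h):
    """Sharp RD effect at the cutoff: local linear fits with a triangular kernel."""
    def boundary_intercept(mask):
        x = score[mask] - cutoff                 # center the score at the cutoff
        w = np.clip(1 - np.abs(x) / h, 0, None)  # triangular kernel weights
        X = np.column_stack([np.ones_like(x), x])
        sw = np.sqrt(w)                          # weighted least squares via rescaling
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[mask] * sw, rcond=None)
        return beta[0]                           # fitted value at the cutoff
    inside = np.abs(score - cutoff) < h
    treated = inside & (score >= cutoff)
    control = inside & (score < cutoff)
    return boundary_intercept(treated) - boundary_intercept(control)
```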


Posted Content
TL;DR: The authors apply causal forests to a dataset derived from the National Study of Learning Mindsets, consider the resulting practical and conceptual challenges, and discuss how causal forests use estimated propensity scores to be more robust to confounding and how they handle data with clustered errors.
Abstract: We apply causal forests to a dataset derived from the National Study of Learning Mindsets, and consider resulting practical and conceptual challenges. In particular, we discuss how causal forests use estimated propensity scores to be more robust to confounding, and how they handle data with clustered errors.

165 citations


Posted Content
TL;DR: It is shown that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known.
Abstract: We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known---or, in practice, can be estimated accurately with access to a large set of unlabeled data (test covariate points). Our weighted extension of conformal prediction also applies more generally, to settings in which the data satisfies a certain weighted notion of exchangeability. We discuss other potential applications of our new conformal methodology, including latent variable and missing data problems.

124 citations
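For the weighted conformal prediction entry above, here is a minimal sketch of the weighted split-conformal quantile for a single test point. It assumes that absolute residuals |y_i - mu(x_i)| on a calibration split are available, along with the covariate likelihood ratio w(x) = dP_test/dP_train evaluated at the calibration points and at the test point. Each calibration score receives probability mass proportional to its weight, the test point's mass is placed at +infinity, and the (1 - alpha) quantile of that weighted distribution gives the interval half-width. The function name is hypothetical.

```python
import numpy as np

def weighted_split_conformal(resid_cal, w_cal, w_test, alpha=0.1):
    """Half-width q such that mu(x) ± q is a weighted split-conformal interval."""
    resid_cal = np.asarray(resid_cal, dtype=float)
    w_cal = np.asarray(w_cal, dtype=float)
    order = np.argsort(resid_cal)
    scores = np.append(resid_cal[order], np.inf)  # test point's mass sits at +inf
    weights = np.append(w_cal[order], w_test)
    cum = np.cumsum(weights)
    # First index where the weighted CDF reaches 1 - alpha (unnormalized comparison).
    idx = np.searchsorted(cum, (1 - alpha) * cum[-1])
    return scores[min(idx, len(scores) - 1)]
```

With all weights equal, this reduces to ordinary split conformal prediction. A large weight on the test point, meaning it lies where training data are scarce, can push the quantile to infinity.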


Posted Content
TL;DR: Some of the work on directed acyclic graphs, including the recent "The Book of Why," by Pearl and MacKenzie, and the potential outcome framework developed by Rubin and coauthors are reviewed.
Abstract: In this essay I discuss potential outcome and graphical approaches to causality, and their relevance for empirical work in economics. I review some of the work on directed acyclic graphs, including the recent "The Book of Why," by Pearl and MacKenzie. I also discuss the potential outcome framework developed by Rubin and coauthors, building on work by Neyman. I then discuss the relative merits of these approaches for empirical work in economics, focusing on the questions each answers well, and why much of the work in economics is closer in spirit to the potential outcome framework.

97 citations


Posted Content
TL;DR: This paper argues that breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data, and finds support for previous claims in the literature that PaP metrics tend to over-emphasize correlated features.
Abstract: This paper advocates against permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because of their ability to provide model-agnostic measures that depend only on the pre-trained model output. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. Rather than simply add to this growing literature by further demonstrating such issues, here we seek to provide an explanation for the observed behavior. In particular, we argue that breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects through various settings where a ground-truth is understood and find support for previous claims in the literature that PaP metrics tend to over-emphasize correlated features both in variable importance and partial dependence plots, even though applying permutation methods to the ground-truth models does not. As an alternative, we recommend more direct approaches that have proven successful in other settings: explicitly removing features, conditional permutations, or model distillation methods.

94 citations
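For the permute-and-predict entry above, here is a small sketch contrasting permutation importance with one of the recommended alternatives, explicitly removing a feature and refitting. It uses ordinary least squares on two strongly correlated features, which is a simplifying assumption. The paper's extrapolation argument concerns flexible black-box models, and this linear example only illustrates how far the two measures can diverge for correlated features. All function names are hypothetical.

```python
import numpy as np

def mse(X, y, beta):
    return np.mean((y - X @ beta) ** 2)

def fit_ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def permute_and_predict(X, y, beta, j, rng):
    """Increase in hold-out MSE when column j is permuted and the fitted model is reused."""
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # breaks the dependence between x_j and the rest
    return mse(Xp, y, beta) - mse(X, y, beta)

def drop_and_refit(X_tr, y_tr, X_te, y_te, j):
    """Increase in hold-out MSE when column j is removed and the model is refit."""
    keep = [k for k in range(X_tr.shape[1]) if k != j]
    full = mse(X_te, y_te, fit_ols(X_tr, y_tr))
    reduced = mse(X_te[:, keep], y_te, fit_ols(X_tr[:, keep], y_tr))
    return reduced - full
```

When x1 and x2 have correlation 0.95 and y = x1 + x2 + noise, permuting x1 raises the error by roughly 2, because the refit-free model still relies on x1. Dropping x1 and refitting raises it by only about 0.1, because x2 carries almost the same information.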


Posted Content
TL;DR: In this paper, the authors present simple methods for sensitivity analysis that do not require detailed background knowledge about specific unknown or unmeasured determinants of the outcome or modifiers of the treatment effect.
Abstract: Extending (generalizing or transporting) causal inferences from a randomized trial to a target population requires "generalizability" or "transportability" assumptions, which state that randomized and non-randomized individuals are exchangeable conditional on baseline covariates. These assumptions are made on the basis of background knowledge, which is often uncertain or controversial, and need to be subjected to sensitivity analysis. We present simple methods for sensitivity analyses that do not require detailed background knowledge about specific unknown or unmeasured determinants of the outcome or modifiers of the treatment effect. Instead, our methods directly parameterize violations of the assumptions using bias functions. We show how the methods can be applied to non-nested trial designs, where the trial data are combined with a separately obtained sample of non-randomized individuals, as well as to nested trial designs, where a clinical trial is embedded within a cohort sampled from the target population. We illustrate the methods using data from a clinical trial comparing treatments for chronic hepatitis C infection.

65 citations


Journal Article
TL;DR: In this paper, the Shannon transform of the $P$-value $p$, also known as the binary surprisal or $S$-value $s=-\log_{2}(p)$, is used to measure the information supplied by the testing procedure and to help calibrate intuitions against simple physical experiments like coin tossing.
Abstract: Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and $P$-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we review simple aids to statistical interpretations. These aids emphasize logical and information concepts over probability, and thus may be more robust to common misinterpretations than are traditional descriptions. We use the Shannon transform of the $P$-value $p$, also known as the binary surprisal or $S$-value $s=-\log_{2}(p)$, to measure the information supplied by the testing procedure, and to help calibrate intuitions against simple physical experiments like coin tossing. We also use tables or graphs of test statistics for alternative hypotheses, and interval estimates for different percentile levels, to thwart fallacies arising from arbitrary dichotomies. Finally, we reinterpret $P$-values and interval estimates in unconditional terms, which describe compatibility of data with the entire set of analysis assumptions. We illustrate these methods with a reanalysis of data from an existing record-based cohort study. In line with other recent recommendations, we advise that teaching materials and research reports discuss $P$-values as measures of compatibility rather than significance, compute $P$-values for alternative hypotheses whenever they are computed for null hypotheses, and interpret interval estimates as showing values of high compatibility with data, rather than regions of confidence. Our recommendations emphasize cognitive devices for displaying the compatibility of the observed data with various hypotheses of interest, rather than focusing on single hypothesis tests or interval estimates. We believe these simple reforms are well worth the minor effort they require.

65 citations
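For the S-value entry above, here is the transform itself as a tiny Python function, with a hypothetical name. Surprisal is measured in bits, so s can be read as the number of consecutive heads from a fair coin that would be equally surprising.

```python
import math

def s_value(p):
    """Binary surprisal (S-value) of a P-value, in bits: s = -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)
```

For example, p = 0.05 gives s of about 4.32 bits, only slightly more surprising than four heads in a row (p = 1/16), while p = 0.25 gives exactly 2 bits.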


Posted Content
TL;DR: This work provides an alternate view of conformal prediction that starts with a sequence of nested sets and calibrates them to find a valid prediction region, and uses the framework to derive a new algorithm that combines four ideas: quantile regression, cross-conformalization, ensemble methods and out-of-bag predictions.
Abstract: Conformal prediction is a popular tool for providing valid prediction sets for classification and regression problems, without relying on any distributional assumptions on the data. While the traditional description of conformal prediction starts with a nonconformity score, we provide an alternate (but equivalent) view that starts with a sequence of nested sets and calibrates them to find a valid prediction set. The nested framework subsumes all nonconformity scores, including recent proposals based on quantile regression and density estimation. While these ideas were originally derived based on sample splitting, our framework seamlessly extends them to other aggregation schemes like cross-conformal, jackknife+ and out-of-bag methods. We use the framework to derive a new algorithm (QOOB, pronounced cube) that combines four ideas: quantile regression, cross-conformalization, ensemble methods and out-of-bag predictions. We develop a computationally efficient implementation of cross-conformal prediction that is also used by QOOB. In a detailed numerical investigation, QOOB performs either the best or close to the best on all simulated and real datasets.

Posted Content
TL;DR: A new model is introduced, the common subspace independent-edge multiple random graph model, which describes a heterogeneous collection of networks with a shared latent structure on the vertices but potentially different connectivity patterns for each graph, and is both flexible enough to meaningfully account for important graph differences and tractable enough to allow for accurate inference in multiple networks.
Abstract: The development of models for multiple heterogeneous network data is of critical importance both in statistical network theory and across multiple application domains. Although single-graph inference is well-studied, multiple graph inference is largely unexplored, in part because of the challenges inherent in appropriately modeling graph differences and yet retaining sufficient model simplicity to render estimation feasible. This paper addresses exactly this gap, by introducing a new model, the common subspace independent-edge (COSIE) multiple random graph model, which describes a heterogeneous collection of networks with a shared latent structure on the vertices but potentially different connectivity patterns for each graph. The COSIE model encompasses many popular network representations, including the stochastic blockmodel. The model is both flexible enough to meaningfully account for important graph differences and tractable enough to allow for accurate inference in multiple networks. In particular, a joint spectral embedding of adjacency matrices - the multiple adjacency spectral embedding (MASE) - leads, in a COSIE model, to simultaneous consistent estimation of underlying parameters for each graph. Under mild additional assumptions, MASE estimates satisfy asymptotic normality and yield improvements for graph eigenvalue estimation and hypothesis testing. In both simulated and real data, the COSIE model and the MASE embedding can be deployed for a number of subsequent network inference tasks, including dimensionality reduction, classification, hypothesis testing and community detection. Specifically, when MASE is applied to a dataset of connectomes constructed through diffusion magnetic resonance imaging, the result is an accurate classification of brain scans by patient and a meaningful determination of heterogeneity across scans of different subjects.
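For the COSIE/MASE entry above, here is a minimal NumPy sketch of the multiple adjacency spectral embedding as described there. Each graph is embedded separately using the leading d eigenvectors of its adjacency matrix, ranked by absolute eigenvalue. The embeddings are concatenated, and the leading d left singular vectors of that concatenation give a shared vertex basis V. Each graph then gets its own score matrix R_i = V^T A_i V. The function names, the choice of unscaled eigenvectors, and the fixed known d are assumptions of this sketch.

```python
import numpy as np

def ase(A, d):
    """Leading d eigenvectors of a symmetric adjacency matrix, by |eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top]

def mase(adjs, d):
    """Shared vertex basis V and per-graph score matrices R_i = V^T A_i V."""
    U = np.hstack([ase(A, d) for A in adjs])         # n x (d * number of graphs)
    V = np.linalg.svd(U, full_matrices=False)[0][:, :d]
    R = [V.T @ A @ V for A in adjs]                  # d x d, symmetric for symmetric A_i
    return V, R
```

Applied to expected adjacency matrices Z B_i Z^T of stochastic blockmodels that share the community assignment Z but have different block matrices B_i, V recovers the column space of Z, while the R_i capture each graph's own connectivity pattern.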

Posted Content
TL;DR: An approach to spatial extreme value theory based on the conditional multivariate extreme value model, whereby the limit theory is formed through conditioning upon the value at a particular site being extreme, allows for a flexible class of dependence structures, as well as models that can be fitted in high dimensions.
Abstract: Currently available models for spatial extremes suffer either from inflexibility in the dependence structures that they can capture, lack of scalability to high dimensions, or in most cases, both of these. We present an approach to spatial extreme value theory based on the conditional multivariate extreme value model, whereby the limit theory is formed through conditioning upon the value at a particular site being extreme. The ensuing methodology allows for a flexible class of dependence structures, as well as models that can be fitted in high dimensions. To overcome issues of conditioning on a single site, we suggest a joint inference scheme based on all observation locations, and implement an importance sampling algorithm to provide spatial realizations and estimates of quantities conditioning upon the process being extreme at any one of an arbitrary set of locations. The modelling approach is applied to Australian summer temperature extremes, permitting assessment of the spatial extent of high temperature events over the continent.

Book
TL;DR: This chapter focuses on delineating the mixture of experts modelling framework and demonstrates the utility and flexibility of mixtures of experts models as an analytic tool.
Abstract: Mixtures of experts models provide a framework in which covariates may be included in mixture models. This is achieved by modelling the parameters of the mixture model as functions of the concomitant covariates. Given their mixture model foundation, mixtures of experts models possess a diverse range of analytic uses, from clustering observations to capturing parameter heterogeneity in cross-sectional data. This chapter focuses on delineating the mixture of experts modelling framework and demonstrates the utility and flexibility of mixtures of experts models as an analytic tool.

Posted Content
TL;DR: In this paper, the authors identify and illustrate gaps in the literature on selecting variables and functional forms for continuous variables in multivariable models, and present them at a moderate technical level to the wide community of practitioners, researchers, and students of statistics.
Abstract: How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc 'traditional' approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More recently, many alternative approaches to address these two challenges have been proposed, but knowledge of their properties and meaningful comparisons between them are scarce. Before a state of the art can be defined and evidence-supported guidance can be given to researchers who have only a basic level of statistical knowledge, many outstanding issues in multivariable modelling must be resolved. Our main aims are to identify and illustrate such gaps in the literature and present them at a moderate technical level to the wide community of practitioners, researchers and students of statistics. We briefly discuss general issues in building descriptive regression models, strategies for variable selection, different ways of choosing functional forms for continuous variables, and methods for combining the selection of variables and functions. We discuss two examples, taken from the medical literature, to illustrate problems in the practice of modelling. Our overview revealed that there is not yet enough evidence on which to base recommendations for the selection of variables and functional forms in multivariable analysis. Such evidence may come from comparisons between alternative methods. In particular, we highlight seven important topics that require further investigation and make suggestions for the direction of further research.

Posted Content
TL;DR: This work proposes a new method for community detection that operates directly on the hypergraph, and introduces a degree-corrected block model for hypergraphs (hDCBM), and shows that Tensor-SCORE yields consistent community detection for a wide range of network sparsity and degree heterogeneity.
Abstract: To date, social network analysis has been largely focused on pairwise interactions. The study of higher-order interactions, via a hypergraph network, brings in new insights. We study community detection in a hypergraph network. A popular approach is to project the hypergraph to a graph and then apply community detection methods for graph networks, but we show that this approach may cause unwanted information loss. We propose a new method for community detection that operates directly on the hypergraph. At the heart of our method is a regularized higher-order orthogonal iteration (reg-HOOI) algorithm that computes an approximate low-rank decomposition of the network adjacency tensor. Compared with existing tensor decomposition methods such as HOSVD and vanilla HOOI, reg-HOOI yields better performance, especially when the hypergraph is sparse. Given the output of tensor decomposition, we then generalize the community detection method SCORE (Jin, 2015) from graph networks to hypergraph networks. We call our new method Tensor-SCORE. In theory, we introduce a degree-corrected block model for hypergraphs (hDCBM), and show that Tensor-SCORE yields consistent community detection for a wide range of network sparsity and degree heterogeneity. As a byproduct, we derive the rates of convergence on estimating the principal subspace by reg-HOOI, with different initializations, including the two new initialization methods we propose, a diagonal-removed HOSVD and a randomized graph projection. We apply our method to several real hypergraph networks and obtain encouraging results, which suggest that exploring higher-order interactions provides additional information not seen in graph representations.

Posted Content
TL;DR: Theoretical properties of a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable but simulation is cheap, are studied, showing that they are consistent, asymptotically normal and robust to model misspecification.
Abstract: While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models.
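For the minimum-MMD entry above, here is a toy sketch of the estimator. It uses the unbiased squared-MMD estimate with a Gaussian kernel, and the model is X = theta + N(0, 1), simulated with common random numbers across theta. In place of the paper's natural-gradient algorithm it does a plain grid search. The model, bandwidth, grid and function names are all assumptions chosen for illustration.

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between 1-D samples, Gaussian kernel."""
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    m, n = len(x), len(y)
    return ((kxx.sum() - np.trace(kxx)) / (m * (m - 1))   # drop i == j terms
            + (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
            - 2 * kxy.mean())

def min_mmd_location(data, grid, n_sim=500, seed=0):
    """Grid-search minimum-MMD estimate of theta in the toy model X = theta + N(0, 1)."""
    noise = np.random.default_rng(seed).normal(size=n_sim)  # reused for every theta
    losses = [mmd2(data, theta + noise) for theta in grid]
    return grid[int(np.argmin(losses))]
```

This also shows the robustness property. Gross outliers have near-zero kernel similarity with every simulated sample, so they barely move the estimate, while they can shift the sample mean substantially.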

Journal Article
TL;DR: In this paper, the authors compare two recently proposed methods that combine ideas from conformal inference and quantile regression to produce locally adaptive and marginally valid prediction intervals under sample exchangeability.
Abstract: We compare two recently proposed methods that combine ideas from conformal inference and quantile regression to produce locally adaptive and marginally valid prediction intervals under sample exchangeability (Romano et al., 2019; Kivaranovic et al., 2019). First, we prove that these two approaches are asymptotically efficient in large samples, under some additional assumptions. Then we compare them empirically on simulated and real data. Our results demonstrate that the method in Romano et al. (2019) typically yields tighter prediction intervals in finite samples. Finally, we discuss how to tune these procedures by fixing the relative proportions of observations used for training and conformalization.

Posted Content
TL;DR: In this article, the authors present a hitchhiker's guide to rigorous comparisons of reinforcement learning algorithms, comparing relevant statistical tests empirically in terms of false positive rate and statistical power as a function of sample size and effect size.
Abstract: Consistently checking the statistical significance of experimental results is the first mandatory step towards reproducible science. This paper presents a hitchhiker's guide to rigorous comparisons of reinforcement learning algorithms. After introducing the concepts of statistical testing, we review the relevant statistical tests and compare them empirically in terms of false positive rate and statistical power as a function of the sample size (number of seeds) and effect size. We further investigate the robustness of these tests to violations of the most common hypotheses (normal distributions, same distributions, equal variances). Besides simulations, we compare empirical distributions obtained by running Soft Actor-Critic and Twin-Delayed Deep Deterministic Policy Gradient on Half-Cheetah. We conclude by providing guidelines and code to perform rigorous comparisons of RL algorithm performances.
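For the reinforcement learning comparison entry above, here is a standard-library sketch of Welch's t statistic and its Welch-Satterthwaite degrees of freedom. This is one of the standard tests for comparing final performance across random seeds when the two algorithms may have unequal variances. The function name is hypothetical. A two-sided p-value would additionally need the Student t CDF, and `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the full test.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For example, with a = [1, 2, 3, 4, 5] and b = [2, 4, 6, 8, 10], t is about -1.897 and df is about 5.88. With only a handful of seeds per algorithm, the power of such tests is limited, which is one of the points the guide quantifies.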

Journal Article
TL;DR: Using Pareto smoothed importance sampling, a method is proposed for approximating exact LFO-CV that drastically reduces the computational costs while also providing informative diagnostics about the quality of the approximation.
Abstract: One of the common goals of time series analysis is to use the observed series to inform predictions for future observations. In the absence of any actual new data to predict, cross-validation can be used to estimate a model's future predictive accuracy, for instance, for the purpose of model comparison or selection. Exact cross-validation for Bayesian models is often computationally expensive, but approximate cross-validation methods have been developed, most notably methods for leave-one-out cross-validation (LOO-CV). If the actual prediction task is to predict the future given the past, LOO-CV provides an overly optimistic estimate because the information from future observations is available to influence predictions of the past. To properly account for the time series structure, we can use leave-future-out cross-validation (LFO-CV). Like exact LOO-CV, exact LFO-CV requires refitting the model many times to different subsets of the data. Using Pareto smoothed importance sampling, we propose a method for approximating exact LFO-CV that drastically reduces the computational costs while also providing informative diagnostics about the quality of the approximation.

Journal Article
TL;DR: In this article, the authors proposed to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data.
Abstract: A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions. Approximate Bayesian computation (ABC) has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data. This generalizes the well-known approach of using order statistics within ABC to arbitrary dimensions. We describe how recently developed approximations of the Wasserstein distance allow the method to scale to realistic data sizes, and propose a new distance based on the Hilbert space-filling curve. We provide a theoretical study of the proposed method, describing consistency as the threshold goes to zero while the observations are kept fixed, and concentration properties as the number of observations grows. Various extensions to time series data are discussed. The approach is illustrated on various examples, including univariate and multivariate g-and-k distributions, a toggle switch model from systems biology, a queueing model, and a Lévy-driven stochastic volatility model.
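For the Wasserstein ABC entry above, here is a minimal one-dimensional sketch of rejection ABC that compares whole empirical distributions instead of summaries. For two equal-size univariate samples, the 1-Wasserstein distance is the mean absolute difference between their order statistics. The sketch keeps the closest fraction of prior draws rather than using a fixed threshold, and the simulator, prior and function names are assumptions chosen for illustration. It does not cover the multivariate approximations or the Hilbert-curve distance the paper proposes.

```python
import numpy as np

def wasserstein1_1d(x, y):
    """W1 between two equal-size 1-D empirical distributions via order statistics."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def wasserstein_abc(y_obs, simulate, prior_draw, n_draws=5000, keep=0.01, seed=0):
    """Rejection ABC: keep the prior draws whose synthetic data are closest to y_obs."""
    rng = np.random.default_rng(seed)
    thetas = np.array([prior_draw(rng) for _ in range(n_draws)])
    dists = np.array([wasserstein1_1d(y_obs, simulate(t, len(y_obs), rng))
                      for t in thetas])
    n_keep = max(1, int(keep * n_draws))   # threshold set as a quantile of distances
    return thetas[np.argsort(dists)[:n_keep]]
```

Example usage, with a normal location model and a uniform prior: `wasserstein_abc(y_obs, lambda t, n, r: r.normal(t, 1.0, n), lambda r: r.uniform(-10, 10))` returns accepted draws concentrated near the location of `y_obs`.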

Journal Article
TL;DR: A mean-field spike and slab variational Bayes (VB) approximation to Bayesian model selection priors in sparse high-dimensional linear regression is studied, showing that it performs comparably to other state-of-the-art Bayesian variable selection methods.
Abstract: We study a mean-field spike and slab variational Bayes (VB) approximation to Bayesian model selection priors in sparse high-dimensional linear regression. Under compatibility conditions on the design matrix, oracle inequalities are derived for the mean-field VB approximation, implying that it converges to the sparse truth at the optimal rate and gives optimal prediction of the response vector. The empirical performance of our algorithm is studied, showing that it performs comparably to other state-of-the-art Bayesian variable selection methods. We also numerically demonstrate that the widely used coordinate-ascent variational inference (CAVI) algorithm can be highly sensitive to the parameter updating order, leading to potentially poor performance. To mitigate this, we propose a novel prioritized updating scheme that uses a data-driven updating order and performs better in simulations. The variational algorithm is implemented in the R package 'sparsevb'.

Posted Content
Jin Ying, Weilin Fu, Jian Kang, Guo Jiadong, Jian Guo 
TL;DR: The proposed BSR (Bayesian Symbolic Regression) method saves computer memory because it does not need to maintain an updated 'genome pool', and numerical experiments show that, compared with GP, the solutions of BSR are closer to the ground truth and the expressions are more concise.
Abstract: Interpretability is crucial for machine learning in many scenarios, such as quantitative finance, banking, and healthcare. Symbolic regression (SR) is a classic interpretable machine learning method that links X and Y through mathematical expressions composed of basic functions. However, the search space of all possible expressions grows exponentially with the length of the expression, making enumeration infeasible. Genetic programming (GP) has traditionally been used in SR to search for the optimal solution, but it suffers from several limitations, e.g., difficulty in incorporating prior knowledge, overly complicated output expressions, and reduced interpretability. To address these issues, we propose a new method to fit SR under a Bayesian framework. First, a Bayesian model can naturally incorporate prior knowledge (e.g., preferences for basis functions, operators, and raw features) to improve the efficiency of fitting SR. Second, to improve the interpretability of expressions in SR, we aim to capture concise but informative signals. To this end, we assume the expected signal has an additive structure, i.e., a linear combination of several concise expressions, whose complexity is controlled by a well-designed prior distribution. In our setup, each expression is characterized by a symbolic tree, and the proposed SR model can be solved by sampling symbolic trees from the posterior distribution using an efficient Markov chain Monte Carlo (MCMC) algorithm. Finally, compared with GP, the proposed BSR (Bayesian Symbolic Regression) method saves computer memory because it does not need to maintain an updated 'genome pool'. Numerical experiments show that, compared with GP, the solutions of BSR are closer to the ground truth and the expressions are more concise. We also find that the solution of BSR is robust to hyper-parameter specifications such as the number of trees.
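The additive structure, a linear combination of expressions that are each stored as a symbolic tree, can be illustrated with a small tree data structure whose linear coefficients are fitted by least squares. This is a hypothetical sketch: the operator set, the `Node` class and `fit_additive` are assumptions, and neither the priors that control tree complexity nor the MCMC search over trees is shown.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical operator set; a real BSR prior would weight these choices.
BINARY = {"+": np.add, "-": np.subtract, "*": np.multiply}
UNARY = {"sin": np.sin, "cos": np.cos, "neg": np.negative}


@dataclass
class Node:
    op: str              # "x" (feature leaf), a UNARY key, or a BINARY key
    children: tuple = ()
    feature: int = -1    # column index, used only when op == "x"

    def evaluate(self, X):
        """Evaluate the expression on every row of X."""
        if self.op == "x":
            return X[:, self.feature]
        if self.op in UNARY:
            return UNARY[self.op](self.children[0].evaluate(X))
        left, right = (child.evaluate(X) for child in self.children)
        return BINARY[self.op](left, right)

    def __str__(self):
        if self.op == "x":
            return f"x{self.feature}"
        if self.op in UNARY:
            return f"{self.op}({self.children[0]})"
        return f"({self.children[0]} {self.op} {self.children[1]})"


def fit_additive(trees, X, y):
    """Least-squares coefficients for y ~ beta_0 + sum_i beta_i * tree_i(X)."""
    design = np.column_stack([np.ones(len(y))] + [t.evaluate(X) for t in trees])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef
```

For example, the trees `sin(x0)` and `(x0 * x1)` with coefficients `[1, 2, 3]` represent the concise additive signal 1 + 2 sin(x0) + 3 x0 x1; in a Bayesian treatment, the tree shapes would be sampled rather than fixed.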

Posted Content
TL;DR: In this paper, a factor model approach for analyzing high-dimensional dynamic tensor time series and multi-category dynamic transport networks is presented, together with two estimation procedures, their theoretical properties, and simulation results.
Abstract: Large tensor (multi-dimensional array) data are now routinely collected in a wide range of applications, due to modern data collection capabilities. Often such observations are taken over time, forming tensor time series. In this paper we present a factor model approach for analyzing high-dimensional dynamic tensor time series and multi-category dynamic transport networks. Two estimation procedures along with their theoretical properties and simulation results are presented. Two applications are used to illustrate the model and its interpretations.
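For the special case of a matrix-valued (order-2 tensor) time series, a factor model can be written as X_t = A_1 F_t A_2^T + E_t. The sketch below estimates the column space of A_1 from lagged cross-products, which cancel white noise in expectation, followed by an eigen-decomposition. It is an assumption that this resembles either of the paper's two estimation procedures; the function name, the number of lags `h0` and the simulated dimensions in the test are hypothetical.

```python
import numpy as np


def mode1_loading_space(X, r1, h0=2):
    """Estimate an orthonormal basis for the column space of A1.

    X has shape (T, d1, d2), with X[t] = A1 @ F[t] @ A2.T + E[t] and E white noise.
    For each lag h, V_h = mean_t X[t + h] @ X[t].T is (in expectation) a d1 x d1
    matrix whose columns lie in span(A1); the top r1 eigenvectors of
    M = sum_h V_h @ V_h.T estimate that span.
    """
    T, d1, _ = X.shape
    M = np.zeros((d1, d1))
    for h in range(1, h0 + 1):
        V = np.einsum("tik,tjk->ij", X[h:], X[:-h]) / (T - h)
        M += V @ V.T
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :r1]
```

Using lags h >= 1 rather than the contemporaneous covariance is the design choice that removes the noise term: E[E_{t+h} E_t^T] = 0 for white noise, so only the factor part contributes.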

Posted Content
TL;DR: This work proposes a novel filtering methodology that harnesses transportation of measures, convex optimization, and ideas from probabilistic graphical models to yield robust ensemble approximations of the filtering distribution in high dimensions, and avoids any form of importance sampling and introduces non-Gaussian localization approaches for dimension scalability.
Abstract: We consider filtering in high-dimensional non-Gaussian state-space models with intractable transition kernels, nonlinear and possibly chaotic dynamics, and sparse observations in space and time. We propose a novel filtering methodology that harnesses transportation of measures, convex optimization, and ideas from probabilistic graphical models to yield robust ensemble approximations of the filtering distribution in high dimensions. Our approach can be understood as the natural generalization of the ensemble Kalman filter (EnKF) to nonlinear updates, using stochastic or deterministic couplings. The use of nonlinear updates can reduce the intrinsic bias of the EnKF at a marginal increase in computational cost. We avoid any form of importance sampling and introduce non-Gaussian localization approaches for dimension scalability. Our framework achieves state-of-the-art tracking performance on challenging configurations of the Lorenz-96 model in the chaotic regime.
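For reference, the linear update that the approach generalizes is the standard stochastic (perturbed-observation) EnKF analysis step, sketched below. It assumes a linear observation operator `H` and Gaussian observation noise with covariance `R`, and omits localization and inflation. It shows the baseline the abstract refers to, not the paper's nonlinear transport-based update; the function name is hypothetical.

```python
import numpy as np


def enkf_analysis(ensemble, y, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    ensemble: (N, n) forecast members; y: (m,) observation;
    H: (m, n) linear observation operator; R: (m, m) observation noise covariance.
    """
    N = ensemble.shape[0]
    X = ensemble - ensemble.mean(axis=0)   # state anomalies
    Y = X @ H.T                            # predicted-observation anomalies
    C_xy = X.T @ Y / (N - 1)               # sample state-observation covariance
    C_yy = Y.T @ Y / (N - 1) + R           # sample innovation covariance
    K = np.linalg.solve(C_yy, C_xy.T).T    # Kalman gain C_xy @ inv(C_yy)
    # Each member assimilates its own perturbed copy of the observation.
    perturbed = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    innovations = perturbed - ensemble @ H.T
    return ensemble + innovations @ K.T
```

Every member is moved by the same linear map of its innovation; replacing that affine map with a nonlinear one is the kind of generalization the abstract describes.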

Book Chapter
TL;DR: A focus of the paper is on methodology to standardise the different characteristics of a clustering so that users can aggregate them in a suitable way specifying weights for the various criteria that are relevant in the clustering application at hand.
Abstract: There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; "relative cluster validation" is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method run with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. There are various characteristics of a clustering that can be relevant in practice, depending on the aim of clustering, such as low within-cluster distances and high between-cluster separation. In this paper, a number of validation criteria will be introduced that refer to different desirable characteristics of a clustering, and that characterise a clustering in a multidimensional way. In specific applications the user may be interested in some of these criteria rather than others. A focus of the paper is on methodology to standardise the different characteristics so that users can aggregate them in a suitable way specifying weights for the various criteria that are relevant in the clustering application at hand.
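The idea of standardising several characteristics and aggregating them with user-chosen weights can be sketched with the two characteristics named above, within-cluster distances and between-cluster separation. The specific index definitions and the z-score standardisation across candidate clusterings are simplifying assumptions; the paper's own standardisation method is not reproduced, and all function names are hypothetical.

```python
import numpy as np


def within_cluster_mean_distance(D, labels):
    """Average distance between distinct points in the same cluster (lower is better)."""
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    return float(D[same].mean())


def separation(D, labels):
    """Mean distance from each point to its nearest point in another cluster (higher is better)."""
    other = labels[:, None] != labels[None, :]
    nearest = np.where(other, D, np.inf).min(axis=1)
    return float(nearest.mean())


def aggregate(indices, weights):
    """Standardise each criterion across candidate clusterings, then take a weighted sum.

    indices: (n_clusterings, n_criteria), each column oriented so larger is better.
    Assumes every column varies across candidates (nonzero standard deviation).
    """
    Z = (indices - indices.mean(axis=0)) / indices.std(axis=0)
    return Z @ weights
```

Negating the within-cluster distance before aggregation puts both criteria on a "larger is better" scale, and the weights then express how much the application cares about homogeneity versus separation.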

Posted Content
TL;DR: New separable effects are proposed to study the causal effect of a treatment on an event of interest; unlike existing definitions, they do not require cross-world contrasts or hypothetical interventions to prevent death.
Abstract: In time-to-event settings, the presence of competing events complicates the definition of causal effects. Here we propose the new separable effects to study the causal effect of a treatment on an event of interest. The separable direct effect is the treatment effect on the event of interest not mediated by its effect on the competing event. The separable indirect effect is the treatment effect on the event of interest only through its effect on the competing event. Similar to Robins and Richardson's extended graphical approach for mediation analysis, the separable effects can only be identified under the assumption that the treatment can be decomposed into two distinct components that exert their effects through distinct causal pathways. Unlike existing definitions of causal effects in the presence of competing events, our estimands do not require cross-world contrasts or hypothetical interventions to prevent death. As an illustration, we apply our approach to a randomized clinical trial on estrogen therapy in individuals with prostate cancer.
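Under the decomposition assumption, the treatment A is split into a component A_Y acting on the event of interest and a component A_D acting on the competing event. One way to write the two estimands at follow-up time k, assuming counterfactual risks under joint assignment of the components and holding the other component at a reference level (0 here, an assumed choice), is:

$$
\begin{aligned}
\text{separable direct effect:}\quad & \Pr\left(Y_k^{a_Y=1,\,a_D=0}=1\right) - \Pr\left(Y_k^{a_Y=0,\,a_D=0}=1\right),\\
\text{separable indirect effect:}\quad & \Pr\left(Y_k^{a_Y=1,\,a_D=1}=1\right) - \Pr\left(Y_k^{a_Y=1,\,a_D=0}=1\right).
\end{aligned}
$$

With these reference levels the two effects sum to Pr(Y_k^{a_Y=1, a_D=1} = 1) - Pr(Y_k^{a_Y=0, a_D=0} = 1), which under the decomposition is the total effect of setting A = 1 versus A = 0.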

Posted Content
TL;DR: In this paper, the authors used directed acyclic graphs (DAGs) and simple simulations to provide an accessible explanation of why change scores do not estimate causal effects in observational data.
Abstract: Background: In longitudinal data, it is common to create 'change scores' by subtracting measurements taken at baseline from those taken at follow-up, and then to analyse the resulting 'change' as the outcome variable. In observational data, this approach can produce misleading causal effect estimates. The present article uses directed acyclic graphs (DAGs) and simple simulations to provide an accessible explanation of why change scores do not estimate causal effects in observational data. Methods: Data were simulated to match three general scenarios where the variable representing measurements of the outcome at baseline was a 1) competing exposure, 2) confounder, or 3) mediator for the total causal effect of the exposure on the variable representing measurements of the outcome at follow-up. Regression coefficients were compared between change-score analyses and DAG-informed analyses. Results: Change-score analyses do not provide meaningful causal effect estimates unless the variable representing measurements of the outcome at baseline is a competing exposure, as in a randomised experiment. Where such variables (i.e. baseline measurements of the outcome) are confounders or mediators, the conclusions drawn from analyses of change scores diverge (potentially substantially) from those of DAG-informed analyses. Conclusions: Future observational studies that seek causal effect estimates should avoid analysing change scores and adopt alternative analytical strategies.
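A minimal version of the confounder scenario (scenario 2) can be simulated directly. The linear structural equations and their coefficients (0.5 from baseline to exposure, 0.7 from baseline to follow-up, true effect 1.0) are hypothetical. Under them E[Y0 | X] = 0.4X and the change score gives baseline a coefficient of 0.7 - 1 = -0.3, so the change-score slope converges to 1.0 - 0.3 × 0.4 = 0.88, whereas regressing follow-up on exposure adjusted for baseline recovers 1.0.

```python
import numpy as np


def ols_slopes(y, *covariates):
    """OLS coefficients of y on the covariates (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


def simulate_confounder_scenario(n=50_000, effect=1.0, seed=0):
    """Baseline outcome y0 confounds exposure x -> follow-up y1 (hypothetical coefficients)."""
    rng = np.random.default_rng(seed)
    y0 = rng.standard_normal(n)                          # baseline measurement
    x = 0.5 * y0 + rng.standard_normal(n)                # exposure depends on baseline
    y1 = effect * x + 0.7 * y0 + rng.standard_normal(n)  # follow-up measurement
    change_slope = ols_slopes(y1 - y0, x)[0]             # change-score analysis
    adjusted_slope = ols_slopes(y1, x, y0)[0]            # DAG-informed: adjust for y0
    return change_slope, adjusted_slope
```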

Posted Content
TL;DR: This work considers the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and proposes a method called BCOPS (balanced and conformal optimized prediction sets), which tries to optimize the out-of-sample performance, together with a method to estimate the outlier detection rate of a given procedure.
Abstract: We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set $C(x)$ as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class as often as possible, but also detecting outliers $x$, for which the method returns no prediction (corresponding to $C(x)$ equal to the empty set). The proposed method combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given method. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
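The way a prediction set C(x) can contain several labels or be empty can be illustrated with a generic class-conditional split-conformal sketch. The score here is a hypothetical stand-in (negative distance to a class centre) for the learned scores that BCOPS builds with supervised-learning algorithms; this is not the BCOPS procedure, and all names are assumptions. With n calibration points for class k and rank floor(alpha (n + 1)), a new point from class k is excluded with probability at most alpha.

```python
import numpy as np


def fit_class_thresholds(X_cal, y_cal, centers, alpha):
    """Per-class conformal thresholds for the score s_k(x) = -||x - center_k||."""
    thresholds = {}
    for k, c in centers.items():
        scores = np.sort(-np.linalg.norm(X_cal[y_cal == k] - c, axis=1))
        rank = int(np.floor(alpha * (len(scores) + 1)))
        # Exclude class k when a new score falls below the rank-th smallest calibration score.
        thresholds[k] = scores[rank - 1] if rank >= 1 else -np.inf
    return thresholds


def prediction_set(x, centers, thresholds):
    """All labels whose score clears its threshold; an empty set flags a possible outlier."""
    return {k for k, c in centers.items() if -np.linalg.norm(x - c) >= thresholds[k]}
```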

Book Chapter
TL;DR: This chapter distinguishes between selecting G as a density estimation problem in Section 2 and selecting G in a model-based clustering framework in Section 3, and presents some of the Bayesian solutions to the different interpretations of picking the "right" number of components in a mixture.
Abstract: Determining the number G of components in a finite mixture distribution is an important and difficult inference issue. This is a most important question, because statistical inference about the resulting model is highly sensitive to the value of G. Selecting an erroneous value of G may produce a poor density estimate. This is also a most difficult question from a theoretical perspective as it relates to unidentifiability issues of the mixture model. This is further a most relevant question from a practical viewpoint since the meaning of the number of components G is strongly related to the modelling purpose of a mixture distribution. We distinguish in this chapter between selecting G as a density estimation problem in Section 2 and selecting G in a model-based clustering framework in Section 3. Both sections discuss frequentist as well as Bayesian approaches. We present here some of the Bayesian solutions to the different interpretations of picking the "right" number of components in a mixture, before concluding on the ill-posed nature of the question.
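As a simple frequentist baseline for the density-estimation view of choosing G, one can fit mixtures for several G by EM and compare BIC values. The sketch assumes 1-D Gaussian components, a quantile-based initialization and a small variance floor; these are assumptions, not the chapter's methods. The Bayesian solutions it discusses are not shown, and the sketch does not resolve the ill-posedness noted above.

```python
import numpy as np


def _log_joint(x, weights, means, variances):
    """log(weight_g) + log N(x_i | mean_g, var_g), shape (n, G)."""
    return (np.log(weights) - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)


def em_loglik_1d(x, G, n_iter=200):
    """Fit a 1-D G-component Gaussian mixture by EM; return the final log-likelihood."""
    x = np.asarray(x, dtype=float)
    means = np.quantile(x, (np.arange(G) + 0.5) / G)  # deterministic, spread-out start
    variances = np.full(G, x.var())
    weights = np.full(G, 1.0 / G)
    for _ in range(n_iter):
        lj = _log_joint(x, weights, means, variances)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])  # E-step
        nk = resp.sum(axis=0)                                          # M-step
        weights = nk / len(x)
        means = resp.T @ x / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6  # floor
    return float(np.logaddexp.reduce(_log_joint(x, weights, means, variances), axis=1).sum())


def select_G_by_bic(x, G_max=4):
    """Return the G in 1..G_max minimising BIC = -2 loglik + (3G - 1) log n, and all BICs."""
    n = len(x)
    bic = {G: -2.0 * em_loglik_1d(x, G) + (3 * G - 1) * np.log(n) for G in range(1, G_max + 1)}
    return min(bic, key=bic.get), bic
```

The penalty counts 3G - 1 free parameters (G means, G variances, G - 1 weights); the selected G is only as meaningful as the component family, which echoes the point that the "right" G depends on the modelling purpose.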