Showing papers in "arXiv: Methodology in 2015"


Posted Content
TL;DR: This work shows that failure to converge typically is not due to a suboptimal estimation algorithm, but is a consequence of attempting to fit a model that is too complex to be properly supported by the data, irrespective of whether estimation is based on maximum likelihood or on Bayesian hierarchical modeling with uninformative or weakly informative priors.
Abstract: The analysis of experimental data with mixed-effects models requires decisions about the specification of the appropriate random-effects structure. Recently, Barr, Levy, Scheepers, and Tily, 2013 recommended fitting `maximal' models with all possible random effect components included. Estimation of maximal models, however, may not converge. We show that failure to converge typically is not due to a suboptimal estimation algorithm, but is a consequence of attempting to fit a model that is too complex to be properly supported by the data, irrespective of whether estimation is based on maximum likelihood or on Bayesian hierarchical modeling with uninformative or weakly informative priors. Importantly, even under convergence, overparameterization may lead to uninterpretable models. We provide diagnostic tools for detecting overparameterization and guiding model simplification.

889 citations


Posted Content
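TL;DR: This work develops a non-parametric causal forest for estimating heterogeneous treatment effects. Its theory is, to the authors' knowledge, the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference, and in experiments causal forests are found to be substantially more powerful than classical methods based on nearest-neighbor matching.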
Abstract: Many scientific and engineering challenges -- ranging from personalized medicine to customized marketing recommendations -- require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.

816 citations


Posted Content
TL;DR: This work proposes to exploit the invariance of a prediction under a causal model for causal inference: given different experimental settings (e.g. various interventions), the authors collect all models that show invariance in their predictive accuracy across settings and interventions, an approach that yields valid confidence intervals for the causal relationships in quite general scenarios.
Abstract: What is the difference of a prediction that is made with a causal model and a non-causal model? Suppose we intervene on the predictor variables or change the whole environment. The predictions from a causal model will in general work as well under interventions as for observational data. In contrast, predictions from a non-causal model can potentially be very wrong if we actively intervene on variables. Here, we propose to exploit this invariance of a prediction under a causal model for causal inference: given different experimental settings (for example various interventions) we collect all models that do show invariance in their predictive accuracy across settings and interventions. The causal model will be a member of this set of models with high probability. This approach yields valid confidence intervals for the causal relationships in quite general scenarios. We examine the example of structural equation models in more detail and provide sufficient assumptions under which the set of causal predictors becomes identifiable. We further investigate robustness properties of our approach under model misspecification and discuss possible extensions. The empirical properties are studied for various data sets, including large-scale gene perturbation experiments.

491 citations


Posted Content
TL;DR: In this article, the authors propose a new framework for measuring connectedness among financial variables that arises due to heterogeneous frequency responses to shocks, based on the spectral representation of variance decompositions.
Abstract: We propose a new framework for measuring connectedness among financial variables that arises due to heterogeneous frequency responses to shocks. To estimate connectedness in short-, medium-, and long-term financial cycles, we introduce a framework based on the spectral representation of variance decompositions. In an empirical application, we document the rich time-frequency dynamics of volatility connectedness in US financial institutions. Economically, periods in which connectedness is created at high frequencies are periods when stock markets seem to process information rapidly and calmly, and a shock to one asset in the system will have an impact mainly in the short term. When the connectedness is created at lower frequencies, it suggests that shocks are persistent and are being transmitted for longer periods.

323 citations


Journal ArticleDOI
TL;DR: In this article, a general framework for smoothing parameter estimation for models with regular likelihoods constructed in terms of unknown smooth functions of covariates is discussed, where the smoothing parameters controlling the extent of penalization are estimated by Laplace approximate marginal likelihood.
Abstract: This paper discusses a general framework for smoothing parameter estimation for models with regular likelihoods constructed in terms of unknown smooth functions of covariates. Gaussian random effects and parametric terms may also be present. By construction the method is numerically stable and convergent, and enables smoothing parameter uncertainty to be quantified. The latter enables us to fix a well known problem with AIC for such models. The smooth functions are represented by reduced rank spline like smoothers, with associated quadratic penalties measuring function smoothness. Model estimation is by penalized likelihood maximization, where the smoothing parameters controlling the extent of penalization are estimated by Laplace approximate marginal likelihood. The methods cover, for example, generalized additive models for non-exponential family responses (for example beta, ordered categorical, scaled t distribution, negative binomial and Tweedie distributions), generalized additive models for location scale and shape (for example two stage zero inflation models, and Gaussian location-scale models), Cox proportional hazards models and multivariate additive models. The framework reduces the implementation of new model classes to the coding of some standard derivatives of the log likelihood.

305 citations


Journal ArticleDOI
TL;DR: The main approaches to building cross-covariance models are reviewed, including the linear model of coregionalization, convolution methods, the multivariate Matérn and nonstationary and space-time extensions of these among others, and specialized constructions, including those designed for asymmetry, compact support and spherical domains, are covered.
Abstract: Continuously indexed datasets with multiple variables have become ubiquitous in the geophysical, ecological, environmental and climate sciences, and pose substantial analysis challenges to scientists and statisticians. For many years, scientists developed models that aimed at capturing the spatial behavior for an individual process; only within the last few decades has it become commonplace to model multiple processes jointly. The key difficulty is in specifying the cross-covariance function, that is, the function responsible for the relationship between distinct variables. Indeed, these cross-covariance functions must be chosen to be consistent with marginal covariance functions in such a way that the second-order structure always yields a nonnegative definite covariance matrix. We review the main approaches to building cross-covariance models, including the linear model of coregionalization, convolution methods, the multivariate Matérn and nonstationary and space-time extensions of these among others. We additionally cover specialized constructions, including those designed for asymmetry, compact support and spherical domains, with a review of physics-constrained models. We illustrate select models on a bivariate regional climate model output example for temperature and pressure, along with a bivariate minimum and maximum temperature observational dataset; we compare models by likelihood value as well as via cross-validation co-kriging studies. The article closes with a discussion of unsolved problems.

212 citations


Posted Content
TL;DR: In this article, a principled joint prior for the range and the marginal variance of one-dimensional, two-dimensional and three-dimensional Matérn GRFs with fixed smoothness is proposed.
Abstract: Priors are important for achieving proper posteriors with physically meaningful covariance structures for Gaussian random fields (GRFs) since the likelihood typically only provides limited information about the covariance structure under in-fill asymptotics. We extend the recent Penalised Complexity prior framework and develop a principled joint prior for the range and the marginal variance of one-dimensional, two-dimensional and three-dimensional Matérn GRFs with fixed smoothness. The prior is weakly informative and penalises complexity by shrinking the range towards infinity and the marginal variance towards zero. We propose guidelines for selecting the hyperparameters, and a simulation study shows that the new prior provides a principled alternative to reference priors that can leverage prior knowledge to achieve shorter credible intervals while maintaining good coverage. We extend the prior to a non-stationary GRF parametrized through local ranges and marginal standard deviations, and introduce a scheme for selecting the hyperparameters based on the coverage of the parameters when fitting simulated stationary data. The approach is applied to a dataset of annual precipitation in southern Norway and the scheme for selecting the hyperparameters leads to conservative estimates of non-stationarity and improved predictive performance over the stationary model.

199 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches for variable subset selection for regression and classification.
Abstract: The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches. We focus on the variable subset selection for regression and classification and perform several numerical experiments using both simulated and real world data. The results show that the optimization of a utility estimate such as the cross-validation (CV) score is liable to finding overfitted models due to relatively high variance in the utility estimates when the data is scarce. This can also lead to substantial selection induced bias and optimism in the performance evaluation for the selected model. From a predictive viewpoint, best results are obtained by accounting for model uncertainty by forming the full encompassing model, such as the Bayesian model averaging solution over the candidate models. If the encompassing model is too complex, it can be robustly simplified by the projection method, in which the information of the full model is projected onto the submodels. This approach is substantially less prone to overfitting than selection based on CV-score. Overall, the projection method appears to outperform also the maximum a posteriori model and the selection of the most probable variables. The study also demonstrates that the model selection can greatly benefit from using cross-validation outside the searching process both for guiding the model size selection and assessing the predictive performance of the finally selected model.

197 citations


Posted Content
TL;DR: Traditional meta-regression methods are ill-equipped to handle the complex and often unknown correlations among non-independent effect sizes. Robust variance estimation (RVE) is a recently proposed meta-analytic method for dealing with dependent effect sizes, and the robumeta package implements it with both large and small sample RVE estimators under various weighting schemes.
Abstract: Meta-regression models are commonly used to synthesize and compare effect sizes. Unfortunately, traditional meta-regression methods are ill-equipped to handle the complex and often unknown correlations among non-independent effect sizes. Robust variance estimation (RVE) is a recently proposed meta-analytic method for dealing with dependent effect sizes. The robumeta package provides functions for performing robust variance meta-regression using both large and small sample RVE estimators under various weighting schemes. These methods are distribution free and provide valid point estimates, standard errors and hypothesis tests even when the degree and structure of dependence between effect sizes is unknown.

181 citations


Journal ArticleDOI
TL;DR: A multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space is proposed, which can capture spatial structure from very fine to very large scales.
Abstract: Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big datasets. We propose a multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space. The M-RA process is specified as a linear combination of basis functions at multiple levels of spatial resolution, which can capture spatial structure from very fine to very large scales. The basis functions are automatically chosen to approximate a given covariance function, which can be nonstationary. All computations involving the M-RA, including parameter inference and prediction, are highly scalable for massive datasets. Crucially, the inference algorithms can also be parallelized to take full advantage of large distributed-memory computing environments. In comparisons using simulated data and a large satellite dataset, the M-RA outperforms a related state-of-the-art method.

160 citations


Book ChapterDOI
TL;DR: In this article, the authors explore the use of Hamiltonian Monte Carlo for hierarchical models and demonstrate how the algorithm can overcome the pathologies of hierarchical models in practical applications.
Abstract: Hierarchical modeling provides a framework for modeling the complex interactions typical of problems in applied statistics. By capturing these relationships, however, hierarchical models also introduce distinctive pathologies that quickly limit the efficiency of most common methods of inference. In this paper we explore the use of Hamiltonian Monte Carlo for hierarchical models and demonstrate how the algorithm can overcome those pathologies in practical applications.

Posted Content
TL;DR: It turns out that many of the essential properties of DPMs are also exhibited by MFMs, and the MFM analogues are simple enough that they can be used much like the corresponding DPM properties; this simplifies the implementation of MFMs and can substantially improve mixing.
Abstract: A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with Dirichlet weights, and put a prior on the number of components---that is, to use a mixture of finite mixtures (MFM). While inference in MFMs can be done with methods such as reversible jump Markov chain Monte Carlo, it is much more common to use Dirichlet process mixture (DPM) models because of the relative ease and generality with which DPM samplers can be applied. In this paper, we show that, in fact, many of the attractive mathematical properties of DPMs are also exhibited by MFMs---a simple exchangeable partition distribution, restaurant process, random measure representation, and in certain cases, a stick-breaking representation. Consequently, the powerful methods developed for inference in DPMs can be directly applied to MFMs as well. We illustrate with simulated and real data, including high-dimensional gene expression data.

Posted Content
TL;DR: In this paper, the authors explore statistical properties and frequentist inference in a model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the nodes groups through time.
Abstract: Statistical node clustering in discrete time dynamic networks is an emerging field that raises many challenges. Here, we explore statistical properties and frequentist inference in a model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the nodes groups through time. We model binary data as well as weighted dynamic random graphs (with discrete or continuous edges values). Our approach, motivated by the importance of controlling for label switching issues across the different time steps, focuses on detecting groups characterized by a stable within group connectivity behavior. We study identifiability of the model parameters, propose an inference procedure based on a variational expectation maximization algorithm as well as a model selection criterion to select for the number of groups. We carefully discuss our initialization strategy which plays an important role in the method and compare our procedure with existing ones on synthetic datasets. We also illustrate our approach on dynamic contact networks, one of encounters among high school students and two others on animal interactions. An implementation of the method is available as an R package called dynsbm.

Posted Content
TL;DR: Unlike OWL, which weights misclassification errors by clinical outcomes, RWL weights these errors by residuals of the outcome from a regression fit on clinical covariates excluding treatment assignment; the authors also obtain a rate of convergence for the difference between the expected outcome using the estimated ITR and that of the optimal treatment rule.
Abstract: Personalized medicine has received increasing attention among statisticians, computer scientists, and clinical practitioners. A major component of personalized medicine is the estimation of individualized treatment rules (ITRs). Recently, Zhao et al. (2012) proposed outcome weighted learning (OWL) to construct ITRs that directly optimize the clinical outcome. Although OWL opens the door to introducing machine learning techniques to optimal treatment regimes, it still has some problems in performance. In this article, we propose a general framework, called Residual Weighted Learning (RWL), to improve finite sample performance. Unlike OWL which weights misclassification errors by clinical outcomes, RWL weights these errors by residuals of the outcome from a regression fit on clinical covariates excluding treatment assignment. We utilize the smoothed ramp loss function in RWL, and provide a difference of convex (d.c.) algorithm to solve the corresponding non-convex optimization problem. By estimating residuals with linear models or generalized linear models, RWL can effectively deal with different types of outcomes, such as continuous, binary and count outcomes. We also propose variable selection methods for linear and nonlinear rules, respectively, to further improve the performance. We show that the resulting estimator of the treatment rule is consistent. We further obtain a rate of convergence for the difference between the expected outcome using the estimated ITR and that of the optimal treatment rule. The performance of the proposed RWL methods is illustrated in simulation studies and in an analysis of cystic fibrosis clinical trial data.

Posted Content
TL;DR: This work explores an alternative scheme, recently introduced in the physics literature, in which the target distribution is explored using a continuous-time non-reversible piecewise-deterministic Markov process, and proposes several computationally efficient implementations of this Markov chain Monte Carlo scheme.
Abstract: Markov chain Monte Carlo methods have become standard tools in statistics to sample from complex probability measures. Many available techniques rely on discrete-time reversible Markov chains whose transition kernels build up over the Metropolis-Hastings algorithm. We explore and propose several original extensions of an alternative approach introduced recently in Peters and de With (2012) where the target distribution of interest is explored using a continuous-time Markov process. In the Metropolis-Hastings algorithm, a trial move to a region of lower target density, equivalently "higher energy", than the current state can be rejected with positive probability. In this alternative approach, a particle moves along straight lines continuously around the space and, when facing a high energy barrier, it is not rejected but its path is modified by bouncing against this barrier. The resulting non-reversible Markov process provides a rejection-free MCMC sampling scheme. We propose several original techniques to simulate this continuous-time process exactly in a wide range of scenarios of interest to statisticians. When the target distribution factorizes as a product of factors involving only subsets of variables, such as the posterior distribution associated to a probabilistic graphical model, it is possible to modify the original algorithm to exploit this structure and update in parallel variables within each clique. We present several extensions by proposing methods to sample mixed discrete-continuous distributions and distributions restricted to a connected smooth domain. We also show that it is possible to move the particle using a general flow instead of straight lines. We demonstrate the efficiency of this methodology through simulations on a variety of applications and show that it can outperform Hybrid Monte Carlo schemes in interesting scenarios.
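
The bounce step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `bounce` and `event_rate` are hypothetical, and the intensity max(0, <v, grad U(x)>) is the standard bouncy-particle form rather than something stated in the abstract itself. Here U denotes the energy (negative log target density) and `grad` its gradient at the current position.

```python
def bounce(v, grad):
    """Reflect velocity v off the hyperplane orthogonal to grad U(x).

    Hypothetical helper: the particle is not rejected at an energy
    barrier; instead its velocity is mirrored, which preserves speed.
    """
    g2 = sum(g * g for g in grad)
    if g2 == 0.0:
        return list(v)  # flat energy: nothing to bounce against
    coef = 2.0 * sum(vi * gi for vi, gi in zip(v, grad)) / g2
    return [vi - coef * gi for vi, gi in zip(v, grad)]


def event_rate(v, grad):
    """Bounce intensity: positive only while the particle moves uphill in energy."""
    return max(0.0, sum(vi * gi for vi, gi in zip(v, grad)))
```

A full sampler would also need to simulate the bounce times from this intensity along each straight-line segment, which is the part the abstract says the authors handle exactly in a range of scenarios; that step is omitted here.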

Posted Content
TL;DR: An overview of FDA is provided, starting with simple statistical notions such as mean and covariance functions and then covering some core techniques, the most popular of which is Functional Principal Component Analysis (FPCA), an important dimension reduction tool that, in sparse data situations, can be used to impute functional data that are sparsely observed.
Abstract: With the advance of modern technology, more and more data are being recorded continuously during a time interval or intermittently at several discrete time points. They are both examples of "functional data", which have become a prevailing type of data. Functional Data Analysis (FDA) encompasses the statistical methodology for such data. Broadly interpreted, FDA deals with the analysis and theory of data that are in the form of functions. This paper provides an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is Functional Principal Component Analysis (FPCA). FPCA is an important dimension reduction tool and in sparse data situations can be used to impute functional data that are sparsely observed. Other dimension reduction approaches are also discussed. In addition, we review another core technique, functional linear regression, as well as clustering and classification of functional data. Beyond linear and single or multiple index methods we touch upon a few nonlinear approaches that are promising for certain applications. They include additive and other nonlinear functional regression models, such as time warping, manifold learning, and dynamic modeling with empirical differential equations. The paper concludes with a brief discussion of future directions.

Journal ArticleDOI
TL;DR: This tutorial provides a practical introduction to fitting LMMs in a Bayesian framework using the probabilistic programming language Stan, which provides an elegant and scalable framework for fitting models in most of the standard applications of LMMs.
Abstract: With the arrival of the R packages nlme and lme4, linear mixed models (LMMs) have come to be widely used in experimentally-driven areas like psychology, linguistics, and cognitive science. This tutorial provides a practical introduction to fitting LMMs in a Bayesian framework using the probabilistic programming language Stan. We choose Stan (rather than WinBUGS or JAGS) because it provides an elegant and scalable framework for fitting models in most of the standard applications of LMMs. We ease the reader into fitting increasingly complex LMMs, first using a two-condition repeated measures self-paced reading study, followed by a more complex $2\times 2$ repeated measures factorial design that can be generalized to much more complex designs.

Journal ArticleDOI
TL;DR: A simulation study is performed to investigate the behaviour of the standard HKSJ and modified mKH procedures in a range of circumstances, with a focus on the common case of meta-analysis based on only a few studies.
Abstract: BACKGROUND: Random-effects meta-analysis is commonly performed by first deriving an estimate of the between-study variation, the heterogeneity, and subsequently using this as the basis for combining results, i.e., for estimating the effect, the figure of primary interest. The heterogeneity variance estimate however is commonly associated with substantial uncertainty, especially in contexts where there are only few studies available, such as in small populations and rare diseases. METHODS: Confidence intervals and tests for the effect may be constructed via a simple normal approximation, or via a Student-t distribution, using the Hartung-Knapp-Sidik-Jonkman (HKSJ) approach, which additionally uses a refined estimator of variance of the effect estimator. The modified Knapp-Hartung method (mKH) applies an ad hoc correction and has been proposed to prevent counterintuitive effects and to yield more conservative inference. We performed a simulation study to investigate the behaviour of the standard HKSJ and modified mKH procedures in a range of circumstances, with a focus on the common case of meta-analysis based on only a few studies. RESULTS: The standard HKSJ procedure works well when the treatment effect estimates to be combined are of comparable precision, but nominal error levels are exceeded when standard errors vary considerably between studies (e.g. due to variations in study size). Application of the modification on the other hand yields more conservative results with error rates closer to the nominal level. Differences are most pronounced in the common case of few studies of varying size or precision. CONCLUSIONS: Use of the modified mKH procedure is recommended, especially when only a few studies contribute to the meta-analysis and the involved studies' precisions (standard errors) vary.
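
The two procedures compared in this abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: the function name `random_effects_hksj` is hypothetical, the DerSimonian-Laird heterogeneity estimator is an assumption (the abstract does not name one), and the modification is taken to be the truncation of the variance scaling factor at 1.

```python
import math


def random_effects_hksj(y, se, modified=False):
    """Pooled effect and HKSJ-style standard error (hypothetical helper).

    y: study effect estimates; se: their standard errors.
    Returns (pooled estimate, standard error, tau^2).
    """
    k = len(y)
    v = [s * s for s in se]

    # Heterogeneity variance tau^2 via DerSimonian-Laird (an assumption here).
    w_fixed = [1.0 / vi for vi in v]
    sw = sum(w_fixed)
    mu_fixed = sum(w * yi for w, yi in zip(w_fixed, y)) / sw
    q_stat = sum(w * (yi - mu_fixed) ** 2 for w, yi in zip(w_fixed, y))
    c = sw - sum(w * w for w in w_fixed) / sw
    tau2 = max(0.0, (q_stat - (k - 1)) / c)

    # Random-effects weights and pooled estimate.
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # HKSJ scaling factor; the modified variant never lets it drop below 1.
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
    if modified:
        q = max(1.0, q)
    return mu, math.sqrt(q / sum(w)), tau2
```

A confidence interval would then be the pooled estimate plus or minus the standard error times a Student-t quantile with k - 1 degrees of freedom, which has to come from a table or a statistics library. With `modified=True` the interval can never be narrower than the plain normal-approximation standard error would imply, which is the source of the more conservative behaviour.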

Journal ArticleDOI
TL;DR: An overview of the models and estimation methods developed over the years, primarily by James Robins, is provided, along with insight into their advantages over other methods.
Abstract: Structural nested models (SNMs) and the associated method of G-estimation were first proposed by James Robins over two decades ago as approaches to modeling and estimating the joint effects of a sequence of treatments or exposures. The models and estimation methods have since been extended to dealing with a broader series of problems, and have considerable advantages over the other methods developed for estimating such joint effects. Despite these advantages, the application of these methods in applied research has been relatively infrequent; we view this as unfortunate. To remedy this, we provide an overview of the models and estimation methods as developed, primarily by Robins, over the years. We provide insight into their advantages over other methods, and consider some possible reasons for failure of the methods to be more broadly adopted, as well as possible remedies. Finally, we consider several extensions of the standard models and estimation methods.

Posted Content
TL;DR: It is established via numerical experiments that the MIO approach performs better than Lasso and other popularly used sparse learning procedures, in terms of achieving sparse solutions with good predictive power.
Abstract: In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed Integer Optimization (MIO) problems. We present a MIO approach for solving the classical best subset selection problem of choosing $k$ out of $p$ features in linear regression given $n$ observations. We develop a discrete extension of modern first order continuous optimization methods to find high quality feasible solutions that we use as warm starts to a MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with $n$ in the 1000s and $p$ in the 100s in minutes to provable optimality, and finds near optimal solutions for $n$ in the 100s and $p$ in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than Lasso and other popularly used sparse learning procedures, in terms of achieving sparse solutions with good predictive power.

Journal ArticleDOI
TL;DR: It is demonstrated that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables makes it possible to break the rotational invariance and to identify optimal whitening transformations.
Abstract: Whitening, or sphering, is a common preprocessing step in statistical analysis to transform random variables to orthogonality. However, due to rotational freedom there are infinitely many possible whitening procedures. Consequently, there is a diverse range of sphering methods in use, for example based on principal component analysis (PCA), Cholesky matrix decomposition and zero-phase component analysis (ZCA), among others. Here we provide an overview of the underlying theory and discuss five natural whitening procedures. Subsequently, we demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables allows to break the rotational invariance and to identify optimal whitening transformations. As a result we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables.
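
Two of the whitening procedures named in this abstract can be sketched with NumPy. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, and the formulas used are the usual ones, namely the inverse symmetric square root of the covariance matrix for ZCA, and the inverse symmetric square root of the correlation matrix applied after scaling each variable by its standard deviation for ZCA-cor.

```python
import numpy as np


def inv_sqrt_sym(a):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs / np.sqrt(vals)) @ vecs.T


def zca_whitening_matrix(x):
    """ZCA: W = Sigma^(-1/2), with Sigma the sample covariance of the columns of x."""
    sigma = np.cov(x, rowvar=False)
    return inv_sqrt_sym(sigma)


def zca_cor_whitening_matrix(x):
    """ZCA-cor: W = P^(-1/2) V^(-1/2), with P the correlation matrix
    and V the diagonal matrix of variances."""
    sigma = np.cov(x, rowvar=False)
    sd = np.sqrt(np.diag(sigma))
    corr = sigma / np.outer(sd, sd)
    return inv_sqrt_sym(corr) @ np.diag(1.0 / sd)
```

Given a data matrix `x` with observations in rows, the sphered data would be `(x - x.mean(axis=0)) @ w.T` for either matrix `w`. Both choices give an identity sample covariance; they differ only in the rotation, which is the freedom the abstract refers to.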

Journal ArticleDOI
TL;DR: This paper, using random matrix theory (RMT), motivates data-driven tools to perceive the complex grids in high dimension; meanwhile, an architecture with detailed procedures is proposed, the first attempt to apply big data technology to smart grids.
Abstract: Model-based analysis tools, built on assumptions and simplifications, struggle to handle smart grids whose data are characterized by the 4Vs. This paper, using random matrix theory (RMT), motivates data-driven tools to perceive the complex grids in high dimension; meanwhile, an architecture with detailed procedures is proposed. From an algorithmic perspective, the architecture performs a high-dimensional analysis and compares the findings with RMT predictions to conduct anomaly detection. Mean Spectral Radius (MSR), as a statistical indicator, is defined to reflect the correlations of system data in different dimensions. From a management perspective, a group-work mode is discussed for smart grid operation. This mode breaks through regional limitations for energy flows and data flows, and makes advanced big data analyses possible. For a specific large-scale zone-dividing system with multiple connected utilities, each site, operating under the group-work mode, is able to work out the regional MSR only with its own measured/simulated data. The large-scale interconnected system, in this way, is naturally decoupled from the perspective of statistical parameters, rather than from the perspective of engineering models. Furthermore, a comparative analysis of these distributed MSRs, even with imperceptibly different raw data, will produce a contour line to detect the event and locate the source. It demonstrates that the architecture is compatible with block calculation using only the regional small database; beyond that, this architecture, as a data-driven solution, is sensitive to system situation awareness, and practical for real large-scale interconnected systems. Five case studies and their visualizations validate the designed architecture in various fields of power systems. To the best of our knowledge, this study is the first attempt to apply big data technology to smart grids.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a formal representation called "selection diagrams" for expressing knowledge about differences and commonalities between populations of interest and, using this representation, reduce questions of transportability to symbolic derivations in the do-calculus.
Abstract: The generalizability of empirical findings to new environments, settings or populations, often called "external validity," is essential in most scientific explorations. This paper treats a particular problem of generalizability, called "transportability," defined as a license to transfer causal effects learned in experimental studies to a new population, in which only observational studies can be conducted. We introduce a formal representation called "selection diagrams" for expressing knowledge about differences and commonalities between populations of interest and, using this representation, we reduce questions of transportability to symbolic derivations in the do-calculus. This reduction yields graph-based procedures for deciding, prior to observing any data, whether causal effects in the target population can be inferred from experimental findings in the study population. When the answer is affirmative, the procedures identify what experimental and observational findings need be obtained from the two populations, and how they can be combined to ensure bias-free transport.

Journal ArticleDOI
TL;DR: The possibility of automating the process of constructing summary statistics by training deep neural networks to predict the parameters from artificially generated data is explored: the resulting summary statistics are approximately posterior means of the parameters.
Abstract: Approximate Bayesian Computation (ABC) methods are used to approximate posterior distributions in models with unknown or computationally intractable likelihoods. Both the accuracy and computational efficiency of ABC depend on the choice of summary statistic, but outside of special cases where the optimal summary statistics are known, it is unclear which guiding principles can be used to construct effective summary statistics. In this paper we explore the possibility of automating the process of constructing summary statistics by training deep neural networks to predict the parameters from artificially generated data: the resulting summary statistics are approximately posterior means of the parameters. With minimal model-specific tuning, our method constructs summary statistics for the Ising model and the moving-average model, which match or exceed theoretically-motivated summary statistics in terms of the accuracies of the resulting posteriors.
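
To make the role of the summary statistic concrete, the following sketch runs plain rejection ABC on a toy Gaussian-mean problem with a hand-picked summary (the sample mean). Everything here is an assumption for illustration: the model, the uniform prior, the tolerance and the function names. In the approach described above, the `summary` argument would instead be the output of a neural network trained on simulated data, which is not shown.

```python
import random
import statistics

def abc_rejection(observed, summary, simulate, prior_draw, tol, n_draws, rng):
    """Keep prior draws whose simulated summary lands within tol of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        if abs(summary(simulate(theta, rng)) - s_obs) < tol:
            accepted.append(theta)
    return accepted

rng = random.Random(0)
n = 50
# Hypothetical "observed" data drawn from N(2, 1).
observed = [rng.gauss(2.0, 1.0) for _ in range(n)]

posterior = abc_rejection(
    observed,
    summary=statistics.fmean,                                   # hand-picked summary
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(n)],
    prior_draw=lambda r: r.uniform(-5.0, 5.0),
    tol=0.2,
    n_draws=5000,
    rng=rng,
)
posterior_mean = statistics.fmean(posterior)
print(len(posterior), round(posterior_mean, 2))
```

Swapping in a different `summary` callable changes which prior draws are accepted, which is why both accuracy and efficiency depend on that choice.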

Posted Content
TL;DR: In this article, the authors extend their work to calibration problems with stochastic physical data and propose a novel method, called the $L_2$ calibration, and show its semiparametric efficiency.
Abstract: Many computer models contain unknown parameters which need to be estimated using physical observations. Earlier work shows that the calibration method based on Gaussian process models proposed by Kennedy and O'Hagan (2001) may lead to unreasonable estimates for imperfect computer models. In this work, we extend that study to calibration problems with stochastic physical data. We propose a novel method, called the $L_2$ calibration, and show its semiparametric efficiency. The conventional ordinary least squares method is also studied. Theoretical analysis shows that it is consistent but not efficient. Numerical examples show that the proposed method outperforms the existing ones.

Posted Content
TL;DR: In this paper, a simple screening technique called the high-dimensional ordinary least-squares projection (HOLP) was proposed to overcome the negative effect of the strong correlation assumption.
Abstract: Variable selection is a challenging issue in statistical applications when the number of predictors $p$ far exceeds the number of observations $n$. In this ultra-high dimensional setting, the sure independence screening (SIS) procedure was introduced to significantly reduce the dimensionality by preserving the true model with overwhelming probability, before a refined second stage analysis. However, the aforementioned sure screening property strongly relies on the assumption that the important variables in the model have large marginal correlations with the response, which rarely holds in reality. To overcome this, we propose a novel and simple screening technique called the high-dimensional ordinary least-squares projection (HOLP). We show that HOLP possesses the sure screening property and gives consistent variable selection without the strong correlation assumption, and has a low computational complexity. A ridge type HOLP procedure is also discussed. A simulation study shows that HOLP performs competitively compared to many other marginal correlation based methods. An application to mammalian eye disease data illustrates the attractiveness of HOLP.
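
A minimal sketch of HOLP-style screening follows. It assumes the estimator takes the form $X^T (X X^T)^{-1} y$ and that predictors are ranked by its absolute entries; the abstract itself does not state the formula. The function name, the retained set size `d` and the synthetic data are hypothetical, and the usual centering and standardization steps are omitted.

```python
import numpy as np

def holp_screen(X, y, d):
    """Rank predictors by |X^T (X X^T)^{-1} y| and return the indices of the top d."""
    beta = X.T @ np.linalg.solve(X @ X.T, y)  # requires X X^T invertible (p >= n)
    return np.argsort(-np.abs(beta))[:d]

# Hypothetical p >> n problem with three active predictors.
rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 5.0
y = X @ beta_true + 0.5 * rng.standard_normal(n)

selected = holp_screen(X, y, d=20)
print(sorted(selected.tolist()))
```

Only an n-by-n linear system is solved, which is where the low computational cost comes from when $p$ is much larger than $n$. A refined second-stage method would then be applied to the retained predictors.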

Journal ArticleDOI
TL;DR: A conceptual classification scheme is developed to better describe this vast literature of BMA, understand its trends and future directions and provide guidance for the researcher interested in both the application and development of the methodology.
Abstract: Bayesian Model Averaging (BMA) is an application of Bayesian inference to the problems of model selection, combined estimation and prediction that produces a straightforward model choice criterion and less risky predictions. However, the application of BMA is not always straightforward, leading to diverse assumptions and situational choices on its different aspects. Despite the widespread application of BMA in the literature, there have been few accounts of these differences and trends besides a few landmark reviews in the late 1990s and early 2000s, which therefore do not take into account any advancements made in the last 15 years. In this work, we present an account of these developments through a careful content analysis of 587 articles on BMA published between 1996 and 2014. We also develop a conceptual classification scheme to better describe this vast literature, understand its trends and future directions and provide guidance for the researcher interested in both the application and development of the methodology. The results of the classification scheme and content review are then used to discuss the present and future of the BMA literature.

Journal ArticleDOI
TL;DR: In this article, the Fourier decomposition method (FDM) was proposed for the analysis of nonlinear (i.e. data generated by nonlinear systems) and nonstationary time series.
Abstract: For many decades, there has been a general perception in the literature that Fourier methods are not suitable for the analysis of nonlinear and nonstationary data. In this paper, we propose a Fourier Decomposition Method (FDM) and demonstrate its efficacy for the analysis of nonlinear (i.e. data generated by nonlinear systems) and nonstationary time series. The proposed FDM decomposes any data into a small number of `Fourier intrinsic band functions' (FIBFs). The FDM presents a generalized Fourier expansion with variable amplitudes and frequencies of a time series by the Fourier method itself. Building on the FDM, we propose a zero-phase filter bank based multivariate FDM (MFDM) algorithm for the analysis of multivariate nonlinear and nonstationary time series. We also present an algorithm to obtain cutoff frequencies for MFDM. The MFDM algorithm generates a finite number of band-limited multivariate FIBFs (MFIBFs). The MFDM preserves some intrinsic physical properties of the multivariate data, such as scale alignment, trend and instantaneous frequency. The proposed methods produce results in a time-frequency-energy distribution that reveals the intrinsic structures of the data. Simulations have been carried out and comparisons made with the Empirical Mode Decomposition (EMD) methods in the analysis of various simulated as well as real-life time series, and the results show that the proposed methods are powerful tools for analyzing and obtaining the time-frequency-energy representation of any data.
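
The idea of splitting a signal into band-limited components with a zero-phase filter bank can be sketched by masking disjoint ranges of FFT bins. This is a simplified illustration only: the band edges are fixed by hand, whereas the FDM described above chooses its bands adaptively. The function name and the test signal are assumptions.

```python
import numpy as np

def fourier_band_components(x, edges):
    """Split x into components whose spectra occupy disjoint frequency-bin ranges."""
    spectrum = np.fft.rfft(x)
    bounds = [0, *edges, len(spectrum)]
    components = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        band = np.zeros_like(spectrum)
        band[lo:hi] = spectrum[lo:hi]        # masking bins leaves the phase untouched
        components.append(np.fft.irfft(band, n=len(x)))
    return components

# Hypothetical two-tone signal: 5 cycles and 40 cycles over the record.
t = np.arange(256) / 256.0
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

low, high = fourier_band_components(x, edges=[20])
print(np.allclose(low + high, x))
```

Because the bands partition the spectrum, the components always sum back to the original signal, so the decomposition is complete by construction.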

Posted Content
TL;DR: In this article, the authors provide a selective review of several recent developments on estimating large covariance and precision matrices, focusing on two general approaches: rank based method and factor model based method.
Abstract: Estimating large covariance and precision matrices is fundamental in modern multivariate analysis. The problems arise from statistical analysis of large panel economic and financial data. The covariance matrix reveals marginal correlations between variables, while the precision matrix encodes conditional correlations between pairs of variables given the remaining variables. In this paper, we provide a selective review of several recent developments on estimating large covariance and precision matrices. We focus on two general approaches: rank-based methods and factor-model-based methods. Theories and applications of both approaches are presented. These methods are expected to be widely applicable to the analysis of economic and financial data.
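
The distinction between marginal and conditional correlation drawn in this abstract can be checked on a small example. The sketch below assumes an AR(1)-type covariance with entries $\rho^{|i-j|}$, chosen purely for illustration: every pair of variables is marginally correlated, yet non-adjacent pairs have zero partial correlation because the precision matrix is tridiagonal.

```python
import numpy as np

rho = 0.5
idx = np.arange(4)
cov = rho ** np.abs(idx[:, None] - idx[None, :])   # covariance: rho^|i-j|
precision = np.linalg.inv(cov)                     # precision matrix

# Partial correlation between i and j given the rest: -Omega_ij / sqrt(Omega_ii * Omega_jj)
scale = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(scale, scale)
np.fill_diagonal(partial_corr, 1.0)

print(np.round(cov, 3))
print(np.round(partial_corr, 3))
```

Variables 0 and 2 have marginal correlation 0.25 but essentially zero partial correlation once variable 1 is accounted for. This kind of sparsity in the precision matrix is what many estimators for large precision matrices are designed to exploit.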

Posted Content
TL;DR: In this article, the authors propose a geometric posterior inference method that combines the posterior distributions estimated across different data subsets through their barycenter in a Wasserstein space of probability measures.
Abstract: Divide-and-conquer based methods for Bayesian inference provide a general approach for tractable posterior inference when the sample size is large. These methods divide the data into smaller subsets, sample from the posterior distribution of parameters in parallel on all the subsets, and combine posterior samples from all the subsets to approximate the full data posterior distribution. The smaller size of any subset compared to the full data implies that posterior sampling on any subset is computationally more efficient than sampling from the true posterior distribution. Since the combination step takes negligible time relative to sampling, posterior computations can be scaled to massive data by dividing the full data into a sufficiently large number of data subsets. One such approach relies on the geometry of posterior distributions estimated across different subsets and combines them through their barycenter in a Wasserstein space of probability measures. We provide theoretical guarantees on the accuracy of approximation that are valid in many applications. We show that the geometric method approximates the full data posterior distribution better than its competitors across diverse simulations and reproduces known results when applied to a movie ratings database.
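
For a scalar parameter, the combination step has a simple form: the 2-Wasserstein barycenter of one-dimensional distributions is obtained by averaging their quantile functions. The sketch below applies this to equally sized sets of subset posterior draws by averaging order statistics. It covers only this one-dimensional special case, and the function name and toy draws are hypothetical; the general multivariate computation is not shown.

```python
def wasserstein_barycenter_1d(subset_samples):
    """Average the order statistics of equally sized sample sets (1-D W2 barycenter)."""
    sizes = {len(s) for s in subset_samples}
    if len(sizes) != 1:
        raise ValueError("all subsets must have the same number of draws")
    ordered = [sorted(s) for s in subset_samples]
    # The i-th atom of the barycenter is the mean of the i-th smallest draws.
    return [sum(vals) / len(vals) for vals in zip(*ordered)]

# Hypothetical posterior draws from two data subsets.
subset_posteriors = [[1.0, 3.0, 2.0], [5.0, 4.0, 6.0]]
combined = wasserstein_barycenter_1d(subset_posteriors)
print(combined)
```

The combined draws keep the common shape and spread of the subset posteriors while centering between them. Pooling the draws instead would give a wider, possibly bimodal mixture.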