
Showing papers on "Bayesian probability" published in 2016


Journal ArticleDOI
TL;DR: Variational inference (VI), a method from machine learning that approximates probability densities through optimization, is reviewed and a variant that uses stochastic optimization to scale up to massive data is derived.
Abstract: One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
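For readers who want the optimization target stated explicitly, the standard identity below (written in generic notation rather than the paper's) shows why maximizing the evidence lower bound (ELBO) is equivalent to minimizing the KL divergence from the approximation q to the posterior:

```latex
\mathrm{KL}\!\left(q(\theta)\,\middle\|\,p(\theta \mid x)\right)
  = \log p(x)
  - \underbrace{\Bigl(\mathbb{E}_{q}\bigl[\log p(x,\theta)\bigr] - \mathbb{E}_{q}\bigl[\log q(\theta)\bigr]\Bigr)}_{\mathrm{ELBO}(q)}
```

Because log p(x) does not depend on q, maximizing the ELBO over the chosen family minimizes the KL divergence without ever evaluating the intractable normalizer.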

852 citations


Book
14 Jan 2016
TL;DR: This book provides a compact self-contained introduction to the theory and application of Bayesian statistical methods and ends with modern topics such as variable selection in regression, generalized linear mixed effects models, and semiparametric copula estimation.
Abstract: This book provides a compact self-contained introduction to the theory and application of Bayesian statistical methods. The book is accessible to readers having a basic familiarity with probability, yet allows more advanced readers to quickly grasp the principles underlying Bayesian theory and methods. The examples and computer code allow the reader to understand and implement basic Bayesian data analyses using standard statistical models and to extend the standard models to specialized data analysis situations. The book begins with fundamental notions such as probability, exchangeability and Bayes' rule, and ends with modern topics such as variable selection in regression, generalized linear mixed effects models, and semiparametric copula estimation. Numerous examples from the social, biological and physical sciences show how to implement these methodologies in practice. Monte Carlo summaries of posterior distributions play an important role in Bayesian data analysis. The open-source R statistical computing environment provides sufficient functionality to make Monte Carlo estimation very easy for a large number of statistical models, and example R code is provided throughout the text. Much of the example code can be run 'as is' in R, and essentially all of it can be run after downloading the relevant datasets from the companion website for this book.

684 citations



Journal ArticleDOI
TL;DR: It is argued that a valid update of a prior belief distribution to a posterior can be made for parameters which are connected to observations through a loss function rather than the traditional likelihood function, which is recovered as a special case.
Abstract: We propose a framework for general Bayesian inference. We argue that a valid update of a prior belief distribution to a posterior can be made for parameters which are connected to observations through a loss function rather than the traditional likelihood function, which is recovered as a special case. Modern application areas make it increasingly challenging for Bayesians to attempt to model the true data-generating mechanism. For instance, when the object of interest is low dimensional, such as a mean or median, it is cumbersome to have to achieve this via a complete model for the whole data distribution. More importantly, there are settings where the parameter of interest does not directly index a family of density functions and thus the Bayesian approach to learning about such parameters is currently regarded as problematic. Our framework uses loss functions to connect information in the data to functionals of interest. The updating of beliefs then follows from a decision theoretic approach involving cumulative loss functions. Importantly, the procedure coincides with Bayesian updating when a true likelihood is known yet provides coherent subjective inference in much more general settings. Connections to other inference frameworks are highlighted.
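The update itself can be written in one line. In this loss-based framework the prior π(θ) is reweighted by an exponentiated cumulative loss rather than by a likelihood; the generic form below (with a loss scale w > 0) is only a sketch of the idea, and it recovers standard Bayesian updating when the loss is the negative log-likelihood and w = 1:

```latex
\pi(\theta \mid x_{1:n}) \;\propto\; \exp\!\Bigl\{ -\, w \sum_{i=1}^{n} \ell(\theta, x_i) \Bigr\}\, \pi(\theta)
```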

359 citations


Proceedings Article
05 Dec 2016
TL;DR: This work presents a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible and obtaining scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness is improved via a scale adaptation.
Abstract: Bayesian optimization is a prominent method for optimizing expensive-to-evaluate black-box functions that is widely applied to tuning the hyperparameters of machine learning algorithms. Despite its successes, the prototypical Bayesian optimization approach - using Gaussian process models - does not scale well to either many hyperparameters or many function evaluations. Attacking this lack of scalability and flexibility is thus one of the key challenges of the field. We present a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible. We obtain scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness we improve via a scale adaptation. Experiments including multi-task Bayesian optimization with 21 tasks, parallel optimization of deep neural networks and deep reinforcement learning show the power and flexibility of this approach.
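The sampler behind the scalability claim is stochastic gradient Hamiltonian Monte Carlo. Below is a minimal sketch of one SGHMC update in the style of Chen et al. (2014); the gradient function, step size and friction values are placeholders, and the paper's scale adaptation is not reproduced.

```python
import numpy as np

def sghmc_step(theta, v, grad_log_post, eps=1e-2, friction=0.05, rng=None):
    """One stochastic-gradient HMC update (momentum with friction plus injected noise)."""
    rng = np.random.default_rng() if rng is None else rng
    g = grad_log_post(theta)                               # minibatch estimate of the log-posterior gradient
    noise = rng.normal(size=np.shape(theta)) * np.sqrt(2.0 * friction * eps)
    v = v + eps * g - friction * v + noise                 # momentum update
    return theta + v, v                                    # position update uses the new momentum

# Toy usage: approximate samples from a standard normal "posterior" (placeholder target).
rng = np.random.default_rng(0)
theta, v = np.zeros(1), np.zeros(1)
samples = []
for _ in range(5000):
    theta, v = sghmc_step(theta, v, grad_log_post=lambda t: -t, rng=rng)
    samples.append(theta[0])
print(np.mean(samples), np.std(samples))
```

In the paper's setting the same update is run on minibatch gradients of a Bayesian neural network's log posterior, and the resulting weight samples drive the acquisition function of the optimization loop.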

358 citations


01 Jan 2016
Bayesian Forecasting and Dynamic Models

338 citations


Book
21 Jul 2016
TL;DR: This book takes an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s, with speculation on the future direction of statistics and data science.
Abstract: The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and in influence. 'Big data', 'data science', and 'machine learning' have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.

323 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that relying on software defaults or diffuse priors with small samples can yield more biased estimates than frequentist methods, especially if frequentist small sample corrections are utilized.
Abstract: As Bayesian methods continue to grow in accessibility and popularity, more empirical studies are turning to Bayesian methods to model small sample data. Bayesian methods do not rely on asymptotics, a reliance that can be a hindrance when employing frequentist methods in small sample contexts. Although Bayesian methods are better equipped to model data with small sample sizes, estimates are highly sensitive to the specification of the prior distribution. If this aspect is not heeded, Bayesian estimates can actually be worse than frequentist methods, especially if frequentist small sample corrections are utilized. We show with illustrative simulations and applied examples that relying on software defaults or diffuse priors with small samples can yield more biased estimates than frequentist methods. We discuss conditions that need to be met if researchers want to responsibly harness the advantages that Bayesian methods offer for small sample problems, as well as leading small sample frequentist methods.
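A toy illustration of the prior-sensitivity point (ours, not the paper's) fits in a few lines: with only five observations, the posterior mean of a variance parameter under a conjugate inverse-gamma prior moves noticeably depending on whether a "diffuse" software-style default or a weakly informative prior is used.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_var = 5, 1.0
x = rng.normal(0.0, np.sqrt(true_var), size=n)        # small sample, mean known to be 0
ss = np.sum(x ** 2)

def posterior_mean_variance(a, b):
    """Posterior mean of sigma^2 under sigma^2 ~ InvGamma(a, b) with known mean 0."""
    return (b + ss / 2.0) / (a + n / 2.0 - 1.0)

print("frequentist (MLE) estimate       :", ss / n)
print("'diffuse' InvGamma(0.001, 0.001) :", posterior_mean_variance(0.001, 0.001))
print("weakly informative InvGamma(2, 2):", posterior_mean_variance(2.0, 2.0))
```

Which prior is defensible depends on the model and the substantive context, which is precisely the authors' point about relying on defaults.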

295 citations


01 Jan 2016
Data Analysis: A Bayesian Tutorial

284 citations


Journal ArticleDOI
TL;DR: In this paper, the authors focus on two common inferential scenarios: testing the nullity of a normal mean (i.e., the Bayesian equivalent of the t-test) and testing the presence of a correlation.

276 citations


Journal ArticleDOI
TL;DR: It is shown that Bayesian analysis of macroevolutionary mixtures (BAMM)—a method for identifying lineage-specific diversification rates—is flawed and the inability to correctly compute the likelihood or to correctly specify the prior for rate-variable trees precludes the use of Bayesian approaches for testing hypotheses regarding the number and location of diversification-rate shifts using BAMM.
Abstract: Bayesian analysis of macroevolutionary mixtures (BAMM) has recently taken the study of lineage diversification by storm. BAMM estimates the diversification-rate parameters (speciation and extinction) for every branch of a study phylogeny and infers the number and location of diversification-rate shifts across branches of a tree. Our evaluation of BAMM reveals two major theoretical errors: (i) the likelihood function (which estimates the model parameters from the data) is incorrect, and (ii) the compound Poisson process prior model (which describes the prior distribution of diversification-rate shifts across branches) is incoherent. Using simulation, we demonstrate that these theoretical issues cause statistical pathologies; posterior estimates of the number of diversification-rate shifts are strongly influenced by the assumed prior, and estimates of diversification-rate parameters are unreliable. Moreover, the inability to correctly compute the likelihood or to correctly specify the prior for rate-variable trees precludes the use of Bayesian approaches for testing hypotheses regarding the number and location of diversification-rate shifts using BAMM.

Proceedings Article
10 Dec 2016
TL;DR: This work proposes a new approach to likelihood-free inference based on Bayesian conditional density estimation, which requires fewer model simulations than Monte Carlo ABC methods need to produce a single sample from an approximate posterior.
Abstract: Many statistical models can be simulated forwards but have intractable likelihoods. Approximate Bayesian Computation (ABC) methods are used to infer properties of these models from data. Traditionally these methods approximate the posterior over parameters by conditioning on data being inside an ε-ball around the observed data, which is only correct in the limit ε→0. Monte Carlo methods can then draw samples from the approximate posterior to approximate predictions or error bars on parameters. These algorithms critically slow down as ε→0, and in practice draw samples from a broader distribution than the posterior. We propose a new approach to likelihood-free inference based on Bayesian conditional density estimation. Preliminary inferences based on limited simulation data are used to guide later simulations. In some cases, learning an accurate parametric representation of the entire true posterior distribution requires fewer model simulations than Monte Carlo ABC methods need to produce a single sample from an approximate posterior.
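For contrast with the proposed conditional-density-estimation approach, the rejection-ABC baseline whose ε→0 behaviour the paper criticizes can be sketched in a few lines; the simulator, summary statistic and tolerance below are placeholders.

```python
import numpy as np

def rejection_abc(observed, simulate, prior_sample, eps, n_samples, rng=None):
    """Keep parameter draws whose simulated summary lands inside an eps-ball of the observation."""
    rng = np.random.default_rng(0) if rng is None else rng
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample(rng)
        x = simulate(theta, rng)
        if abs(x - observed) < eps:           # exact posterior only in the limit eps -> 0
            accepted.append(theta)
    return np.array(accepted)

# Toy example: infer a Gaussian mean from the sample mean of 20 points (placeholder model).
samples = rejection_abc(
    observed=1.3,
    simulate=lambda th, rng: rng.normal(th, 1.0, size=20).mean(),
    prior_sample=lambda rng: rng.normal(0.0, 3.0),
    eps=0.05,
    n_samples=500)
print(samples.mean(), samples.std())
```

The acceptance rate collapses as eps shrinks, which is exactly the inefficiency the conditional density estimator is designed to avoid.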

Journal ArticleDOI
TL;DR: A two-phase data-driven method for RUL prediction is presented, and the results show the effectiveness of the method in predicting the RUL for both applications.
Abstract: Reliability of prognostics and health management systems relies upon accurate understanding of critical components' degradation process to predict the remaining useful life (RUL). Traditionally, the degradation process is represented in the form of physical or expert models. Such models require extensive experimentation and verification that are not always feasible. Another approach, known as data-driven, builds up knowledge about the system degradation over time from component sensor data. Data-driven models, however, require that sufficient historical data have been collected. In this paper, a two-phase data-driven method for RUL prediction is presented. In the offline phase, the proposed method builds on finding variables that contain information about the degradation behavior using an unsupervised variable selection method. Different health indicators (HIs) are constructed from the selected variables, which represent the degradation as a function of time, and saved in the offline database as reference models. In the online phase, the method finds the offline HI most similar to the online HI using a k-nearest neighbors classifier and uses it as an RUL predictor. The method finally estimates the degradation state using a discrete Bayesian filter. The method is verified using battery and turbofan engine degradation simulation data acquired from the NASA data repository. The results show the effectiveness of the method in predicting the RUL for both applications.
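The last step mentioned in the abstract is a discrete Bayesian filter over the degradation state. A generic predict/update cycle is sketched below; the transition matrix and measurement likelihoods are placeholders, not values from the paper.

```python
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    """One predict/update cycle of a discrete Bayes filter over K degradation states.

    belief     : current state probabilities, shape (K,)
    transition : transition[i, j] = P(next state j | current state i), shape (K, K)
    likelihood : P(new measurement | state), shape (K,)
    """
    predicted = belief @ transition            # predict: push the belief through the degradation model
    posterior = predicted * likelihood         # update: weight by the measurement likelihood
    return posterior / posterior.sum()         # normalize

# Toy usage: four degradation states with a slow drift toward failure (placeholder numbers).
T = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.0, 0.9, 0.1, 0.0],
              [0.0, 0.0, 0.9, 0.1],
              [0.0, 0.0, 0.0, 1.0]])
belief = np.array([1.0, 0.0, 0.0, 0.0])
belief = bayes_filter_step(belief, T, likelihood=np.array([0.2, 0.5, 0.25, 0.05]))
print(belief)
```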

Journal ArticleDOI
TL;DR: A sparse Bayesian method is introduced by exploiting Laplace priors, namely, SBLaplace, for EEG classification by learning a sparse discriminant vector with a Laplace prior in a hierarchical fashion under a Bayesian evidence framework.
Abstract: Regularization has been one of the most popular approaches to prevent overfitting in electroencephalogram (EEG) classification of brain–computer interfaces (BCIs). The effectiveness of regularization is often highly dependent on the selection of regularization parameters that are typically determined by cross-validation (CV). However, the CV imposes two main limitations on BCIs: 1) a large amount of training data is required from the user and 2) it takes a relatively long time to calibrate the classifier. These limitations substantially deteriorate the system’s practicability and may cause a user to be reluctant to use BCIs. In this paper, we introduce a sparse Bayesian method by exploiting Laplace priors, namely, SBLaplace, for EEG classification. A sparse discriminant vector is learned with a Laplace prior in a hierarchical fashion under a Bayesian evidence framework. All required model parameters are automatically estimated from training data without the need of CV. Extensive comparisons are carried out between the SBLaplace algorithm and several other competing methods based on two EEG data sets. The experimental results demonstrate that the SBLaplace algorithm achieves better overall performance than the competing algorithms for EEG classification.

Journal ArticleDOI
TL;DR: The practical advantages of Bayesian inference are demonstrated here through two concrete examples as mentioned in this paper, which demonstrate how Bayesian analyses can be more informative, more elegant, and more flexible than the orthodox methodology that remains dominant within the field of psychology.
Abstract: The practical advantages of Bayesian inference are demonstrated here through two concrete examples. In the first example, we wish to learn about a criminal’s IQ: a problem of parameter estimation. In the second example, we wish to quantify and track support in favor of the null hypothesis that Adam Sandler movies are profitable regardless of their quality: a problem of hypothesis testing. The Bayesian approach unifies both problems within a coherent predictive framework, in which parameters and models that predict the data successfully receive a boost in plausibility, whereas parameters and models that predict poorly suffer a decline. Our examples demonstrate how Bayesian analyses can be more informative, more elegant, and more flexible than the orthodox methodology that remains dominant within the field of psychology.

Journal ArticleDOI
TL;DR: In this paper, the authors present a new package in R implementing Bayesian additive regression trees (BART), which introduces many new features for data analysis using BART such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data and the ability to save trees for future prediction.
Abstract: We present a new package in R implementing Bayesian additive regression trees (BART). The package introduces many new features for data analysis using BART such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data and the ability to save trees for future prediction. It is significantly faster than the current R implementation, parallelized, and capable of handling both large sample sizes and high-dimensional data.

Journal ArticleDOI
TL;DR: It is proposed that Bayesian brains need not represent or calculate probabilities at all and are, indeed, poorly adapted to do so: the brain is a Bayesian sampler.

Posted ContentDOI
22 Jun 2016-bioRxiv
TL;DR: The bModelTest as discussed by the authors is a Bayesian approach to inferring and marginalizing site models in a phylogenetic analysis based on trans-dimensional Markov chain Monte Carlo (MCMC) proposals that allow switching between substitution models as well as estimating the posterior probability for gamma-distributed rate heterogeneity, a proportion of invariable sites and unequal base frequencies.
Abstract: Background: Reconstructing phylogenies through Bayesian methods has many benefits, which include providing a mathematically sound framework, providing realistic estimates of uncertainty and being able to incorporate different sources of information based on formal principles. Bayesian phylogenetic analyses are popular for interpreting nucleotide sequence data; however, for such studies one needs to specify a site model and associated substitution model. Often, the parameters of the site model are of no interest and an ad hoc or additional likelihood-based analysis is used to select a single site model. Results: bModelTest allows for a Bayesian approach to inferring and marginalizing site models in a phylogenetic analysis. It is based on trans-dimensional Markov chain Monte Carlo (MCMC) proposals that allow switching between substitution models as well as estimating the posterior probability for gamma-distributed rate heterogeneity, a proportion of invariable sites and unequal base frequencies. The model can be used with the full set of time-reversible models on nucleotides, but we also introduce and demonstrate the use of two subsets of time-reversible substitution models. Conclusion: With the new method the site model can be inferred (and marginalized) during the MCMC analysis and does not need to be pre-determined, as is now often the case in practice, by likelihood-based methods. The method is implemented in the bModelTest package of the popular BEAST 2 software, which is open source, licensed under the GNU Lesser General Public License and allows joint site model and tree inference under a wide range of models.

Journal ArticleDOI
TL;DR: This paper proposes a general framework for Bayesian deep learning and reviews its recent applications on recommender systems, topic models, and control and discusses the relationship and differences between Bayesiandeep learning and other related topics such as the Bayesian treatment of neural networks.
Abstract: While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, subsequent tasks that involve inference, reasoning, and planning require an even higher level of intelligence. The past few years have seen major advances in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning . In this unified framework, the perception of text or images using deep learning can boost the performance of higher-level inference and in return, the feedback from the inference process is able to enhance the perception of text or images. This paper proposes a general framework for Bayesian deep learning and reviews its recent applications on recommender systems, topic models, and control. In this paper, we also discuss the relationship and differences between Bayesian deep learning and other related topics such as the Bayesian treatment of neural networks.

Journal ArticleDOI
TL;DR: The animal track analysis suggests some general principles for the exploratory analysis of movement data, including ways to exploit the strengths of the various methods.
Abstract: Summary Movement data provide a window – often our only window – into the cognitive, social and biological processes that underlie the behavioural ecology of animals in the wild. Robust methods for identifying and interpreting distinct modes of movement behaviour are of great importance, but complicated by the fact that movement data are complex, multivariate and dependent. Many different approaches to exploratory analysis of movement have been developed to answer similar questions, and practitioners are often at a loss for how to choose an appropriate tool for a specific question. We apply and compare four methodological approaches: first passage time (FPT), Bayesian partitioning of Markov models (BPMM), behavioural change point analysis (BCPA) and a fitted multistate random walk (MRW) to three simulated tracks and two animal trajectories – a sea lamprey (Petromyzon marinus) tracked for 12 h and a wolf (Canis lupus) tracked for 1 year. The simulations – in which, respectively, velocity, tortuosity and spatial bias change – highlight the sensitivity of all methods to model misspecification. Methods that do not account for autocorrelation in the movement variables lead to spurious change points, while methods that do not account for spatial bias completely miss changes in orientation. When applied to the animal data, the methods broadly agree on the structure of the movement behaviours. Important discrepancies, however, reflect differences in the assumptions and nature of the outputs. Important trade-offs are between the strength of the a priori assumptions (low in BCPA, high in MRW), complexity of output (high in the BCPA, low in the BPMM and MRW) and explanatory potential (highest in the MRW). The animal track analysis suggests some general principles for the exploratory analysis of movement data, including ways to exploit the strengths of the various methods. We argue for close and detailed exploratory analysis of movement before fitting complex movement models.

Proceedings ArticleDOI
19 Jun 2016
TL;DR: A new approximate Bayesian learning scheme is developed that enables DGPs to be applied to a range of medium to large scale regression problems for the first time and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks.
Abstract: Deep Gaussian processes (DGPs) are multilayer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. This paper develops a new approximate Bayesian learning scheme that enables DGPs to be applied to a range of medium to large scale regression problems for the first time. The new method uses an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm for learning. We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks.


Proceedings Article
Daniel Russo
06 Jun 2016
TL;DR: In this paper, the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs is studied. But the authors focus on the problem of selecting the best design after a small number of measurements.
Abstract: This paper considers the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality with the goal of confidently identifying the best design after a small number of measurements. I propose three simple Bayesian algorithms for adaptively allocating measurement effort. One is Top-Two Probability sampling, which computes the two designs with the highest posterior probability of being optimal, and then randomizes to select among these two. Another is a variant of top-two sampling that considers not only the probability a design is optimal, but the expected amount by which its quality exceeds that of other designs. The final algorithm is a modified version of Thompson sampling that is tailored for identifying the best design. I prove that these simple algorithms satisfy a strong optimality property. In a frequentist setting where the true quality of the designs is fixed, one hopes the posterior definitively identifies the optimal design, in the sense that the posterior probability assigned to the event that some other design is optimal converges to zero as measurements are collected. I show that under the proposed algorithms this convergence occurs at an exponential rate, and the corresponding exponent is the best possible among all allocation rules.
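A compact sketch of the top-two selection rule may make it concrete. The version below is for Bernoulli-valued measurements with Beta posteriors and a tilting parameter beta = 0.5; the reward probabilities in the toy loop are placeholders.

```python
import numpy as np

def top_two_thompson(successes, failures, beta=0.5, rng=None):
    """Choose the next design to measure via top-two Thompson sampling (Bernoulli case)."""
    rng = np.random.default_rng() if rng is None else rng
    draw = rng.beta(successes + 1, failures + 1)       # one posterior sample per design
    first = int(np.argmax(draw))
    if rng.random() < beta:
        return first
    while True:                                        # re-sample until a different design tops the draw
        draw = rng.beta(successes + 1, failures + 1)
        second = int(np.argmax(draw))
        if second != first:
            return second

# Toy loop: three designs with unknown success probabilities (placeholders).
true_p = np.array([0.30, 0.50, 0.55])
s, f = np.zeros(3), np.zeros(3)
rng = np.random.default_rng(1)
for _ in range(2000):
    i = top_two_thompson(s, f, rng=rng)
    if rng.random() < true_p[i]:
        s[i] += 1
    else:
        f[i] += 1
print("posterior means:", (s + 1) / (s + f + 2))
```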

Journal ArticleDOI
TL;DR: In this article, a sequential importance sampling (SIS) algorithm is proposed for estimating the probability of failure in structural reliability problems; the method is applicable to general problems with a small to moderate number of random variables and is especially efficient for tackling high-dimensional problems.
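As background for the sequential scheme, a single-stage importance-sampling estimate of a failure probability P(g(X) ≤ 0) for standard normal inputs looks as follows; the limit-state function and the shifted proposal are placeholders, and the sequential reweighting across intermediate distributions that the paper proposes is not shown.

```python
import numpy as np
from scipy import stats

def failure_prob_is(g, shift, n=100_000, rng=None):
    """Importance-sampling estimate of P(g(X) <= 0) for X ~ N(0, I), proposal N(shift, I)."""
    rng = np.random.default_rng(0) if rng is None else rng
    dim = len(shift)
    x = rng.normal(size=(n, dim)) + shift
    log_w = (stats.multivariate_normal.logpdf(x, mean=np.zeros(dim))
             - stats.multivariate_normal.logpdf(x, mean=shift))
    return float(np.mean((g(x) <= 0) * np.exp(log_w)))

# Toy limit state: failure when the sum of two standard normal variables exceeds 5 (placeholder).
g = lambda x: 5.0 - x.sum(axis=1)
print(failure_prob_is(g, shift=np.array([2.5, 2.5])))
```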

Journal ArticleDOI
TL;DR: A structured Bayesian group factor analysis model is developed that extends the factor model to multiple coupled observation matrices and allows for both dense and sparse latent factors so that covariation among either all features or only a subset of features can be recovered.
Abstract: Latent factor models are the canonical statistical tool for exploratory analyses of low-dimensional linear structure for a matrix of p features across n samples. We develop a structured Bayesian group factor analysis model that extends the factor model to multiple coupled observation matrices; in the case of two observations, this reduces to a Bayesian model of canonical correlation analysis. Here, we carefully define a structured Bayesian prior that encourages both element-wise and column-wise shrinkage and leads to desirable behavior on high-dimensional data. In particular, our model puts a structured prior on the joint factor loading matrix, regularizing at three levels, which enables element-wise sparsity and unsupervised recovery of latent factors corresponding to structured variance across arbitrary subsets of the observations. In addition, our structured prior allows for both dense and sparse latent factors so that covariation among either all features or only a subset of features can be recovered. We use fast parameter-expanded expectation-maximization for parameter estimation in this model. We validate our method on simulated data with substantial structure. We show results of our method applied to three high-dimensional data sets, comparing results against a number of state-of-the-art approaches. These results illustrate useful properties of our model, including i) recovering sparse signal in the presence of dense effects; ii) the ability to scale naturally to large numbers of observations; iii) flexible observation- and factor-specific regularization to recover factors with a wide variety of sparsity levels and percentage of variance explained; and iv) tractable inference that scales to modern genomic and text data sizes.

Journal ArticleDOI
TL;DR: In this article, the authors study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints and show that a Bayesian approach can achieve variable-selection consistency under relatively mild conditions on the design matrix.
Abstract: We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints. We first show that a Bayesian approach can achieve variable-selection consistency under relatively mild conditions on the design matrix. We then demonstrate that the statistical criterion of posterior concentration need not imply the computational desideratum of rapid mixing of the MCMC algorithm. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee both variable-selection consistency and rapid mixing of a particular Metropolis-Hastings algorithm. The mixing time is linear in the number of covariates up to a logarithmic factor. Our proof controls the spectral gap of the Markov chain by constructing a canonical path ensemble that is inspired by the steps taken by greedy algorithms for variable selection.

Posted Content
TL;DR: The reasons for the success of the INLA approach, the R-INLA package, why it is so accurate, why the approximations are very quick to compute, and why LGMs make such a useful concept for Bayesian computing are discussed.
Abstract: The key operation in Bayesian inference is to compute high-dimensional integrals. An old approximate technique is the Laplace method or approximation, which dates back to Pierre-Simon Laplace (1774). This simple idea approximates the integrand with a second order Taylor expansion around the mode and computes the integral analytically. By developing a nested version of this classical idea, combined with modern numerical techniques for sparse matrices, we obtain the approach of Integrated Nested Laplace Approximations (INLA) to do approximate Bayesian inference for latent Gaussian models (LGMs). LGMs represent an important model-abstraction for Bayesian inference and include a large proportion of the statistical models used today. In this review, we will discuss the reasons for the success of the INLA approach, the R-INLA package, why it is so accurate, why the approximations are very quick to compute and why LGMs make such a useful concept for Bayesian computing.
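The classical building block can be sketched directly: find the posterior mode, expand the log posterior to second order there, and read off a Gaussian approximation together with an approximate normalizing constant. The one-dimensional example below uses generic SciPy tools, not the R-INLA implementation, and the test density is a placeholder.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def laplace_approx(log_post, bounds=(-10.0, 10.0), h=1e-4):
    """Gaussian (Laplace) approximation to an unnormalized 1-D density exp(log_post)."""
    res = minimize_scalar(lambda t: -log_post(t), bounds=bounds, method="bounded")
    mode = res.x
    # Curvature of the log posterior at the mode via central differences.
    curv = (log_post(mode + h) - 2.0 * log_post(mode) + log_post(mode - h)) / h ** 2
    var = -1.0 / curv
    log_evidence = log_post(mode) + 0.5 * np.log(2.0 * np.pi * var)   # approximate normalizer
    return mode, var, log_evidence

# Toy target: the log density (up to a constant) of logit(p) when p ~ Beta(3, 5).
log_post = lambda t: 3.0 * t - 8.0 * np.log1p(np.exp(t))
print(laplace_approx(log_post))
```

INLA's contribution is to nest this idea over the latent Gaussian field and the hyperparameters and to pair it with sparse-matrix numerics, which is where the speed reported in the review comes from.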

Proceedings Article
01 Jan 2016
TL;DR: In this paper, the authors leverage the idea that data is often redundant to obtain a weighted subset of the data (called a coreset) that is much smaller than the original dataset, which can then be used in any number of existing posterior inference algorithms without modification.
Abstract: The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. Standard Bayesian inference algorithms are computationally expensive, however, making their direct application to large datasets difficult or infeasible. Recent work on scaling Bayesian inference has focused on modifying the underlying algorithms to, for example, use only a random data subsample at each iteration. We leverage the insight that data is often redundant to instead obtain a weighted subset of the data (called a coreset) that is much smaller than the original dataset. We can then use this small coreset in any number of existing posterior inference algorithms without modification. In this paper, we develop an efficient coreset construction algorithm for Bayesian logistic regression models. We provide theoretical guarantees on the size and approximation quality of the coreset -- both for fixed, known datasets, and in expectation for a wide class of data generative models. Crucially, the proposed approach also permits efficient construction of the coreset in both streaming and parallel settings, with minimal additional effort. We demonstrate the efficacy of our approach on a number of synthetic and real-world datasets, and find that, in practice, the size of the coreset is independent of the original dataset size. Furthermore, constructing the coreset takes a negligible amount of time compared to that required to run MCMC on it.
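The object that downstream samplers actually consume is a weighted log-posterior: once a coreset of points with weights is available, it can replace the full-data log posterior in any inference routine that accepts a log-density. A minimal sketch for logistic regression follows; the weights are treated as given, since the paper's construction algorithm is not reproduced here.

```python
import numpy as np

def coreset_log_posterior(theta, X_core, y_core, weights, prior_scale=10.0):
    """Weighted logistic-regression log posterior evaluated on a coreset.

    X_core, y_core : coreset points and labels in {0, 1}
    weights        : per-point weights produced by the coreset construction
    """
    logits = X_core @ theta
    # Weighted Bernoulli log-likelihood in a numerically stable form.
    loglik = np.sum(weights * (y_core * logits - np.logaddexp(0.0, logits)))
    logprior = -0.5 * np.sum(theta ** 2) / prior_scale ** 2     # isotropic Gaussian prior
    return loglik + logprior
```

Handing this function to an off-the-shelf MCMC sampler in place of the full-data log posterior is where the reported speedup comes from.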

Journal ArticleDOI
TL;DR: In this paper, a Bayesian graphical VAR (BGVAR) model is proposed to identify the causal structures of the structural VAR model, where the contemporaneous and temporal causal structures are represented by two different graphs.
Abstract: Summary This paper proposes a Bayesian, graph-based approach to identification in vector autoregressive (VAR) models. In our Bayesian graphical VAR (BGVAR) model, the contemporaneous and temporal causal structures of the structural VAR model are represented by two different graphs. We also provide an efficient Markov chain Monte Carlo algorithm to estimate jointly the two causal structures and the parameters of the reduced-form VAR model. The BGVAR approach is shown to be quite effective in dealing with model identification and selection in multivariate time series of moderate dimension, as those considered in the economic literature. In the macroeconomic application the BGVAR identifies the relevant structural relationships among 20 US economic variables, thus providing a useful tool for policy analysis. The financial application contributes to the recent econometric literature on financial interconnectedness. The BGVAR approach provides evidence of a strong unidirectional linkage from financial to non-financial super-sectors during the 2007–2009 financial crisis and a strong bidirectional linkage between the two sectors during the 2010–2013 European sovereign debt crisis. Copyright © 2015 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A Bayesian approach to regularizing the RNN-LM and applying it to continuous speech recognition is presented, compensating for the uncertainty of the estimated model parameters through a Gaussian prior.
Abstract: A language model (LM) assigns a probability to a word sequence and provides the solution to word prediction in a variety of information systems. A recurrent neural network (RNN) is powerful for learning the large-span dynamics of a word sequence in continuous space. However, the training of the RNN-LM is an ill-posed problem because of too many parameters from a large dictionary size and a high-dimensional hidden layer. This paper presents a Bayesian approach to regularizing the RNN-LM and applies it to continuous speech recognition. We aim to penalize an overly complex RNN-LM by compensating for the uncertainty of the estimated model parameters, which is represented by a Gaussian prior. The objective function in a Bayesian classification network is formed as the regularized cross-entropy error function. The regularized model is constructed not only by calculating the regularized parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameter by maximizing the marginal likelihood. A rapid approximation to a Hessian matrix is developed to implement the Bayesian RNN-LM (BRNN-LM) by selecting a small set of salient outer-products. The proposed BRNN-LM achieves a sparser model than the RNN-LM. Experiments on different corpora show the robustness of system performance when applying the rapid BRNN-LM under different conditions.
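The regularized objective described here is the familiar MAP form: with a zero-mean Gaussian prior of precision α on the collected RNN weights θ, the negative log posterior is the cross-entropy error plus a quadratic penalty. The notation below is generic rather than the paper's:

```latex
E(\boldsymbol{\theta}) \;=\; -\sum_{t} \log p\bigl(w_t \mid w_{<t}, \boldsymbol{\theta}\bigr)
\;+\; \frac{\alpha}{2}\, \lVert \boldsymbol{\theta} \rVert^{2}
```

The hyperparameter α is then chosen by maximizing the marginal likelihood, which is where the rapid Hessian approximation mentioned in the abstract enters.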