
Showing papers on "Bayes' theorem published in 2020"


Posted Content
TL;DR: Variational Bayes (VB) has been proposed as a method to facilitate calculation of the posterior distribution for nonlinear models, providing fast Bayesian inference by estimating the parameters of a factorized approximation to the posterior distribution.
Abstract: Variational Bayes (VB) has been used to facilitate the calculation of the posterior distribution in the context of Bayesian inference of the parameters of nonlinear models from data. Previously, an analytical formulation of VB has been derived for nonlinear model inference on data with additive Gaussian noise, as an alternative to nonlinear least squares. Here a stochastic solution is derived that avoids some of the approximations required of the analytical formulation, offering a solution that can be more flexibly deployed for nonlinear model inference problems. The stochastic VB solution was used for inference on a biexponential toy case and the algorithmic parameter space explored, before being deployed on real data from a magnetic resonance imaging study of perfusion. The new method was found to achieve comparable parameter recovery to the analytic solution and be competitive in terms of computational speed despite being reliant on sampling.
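For orientation, the quantity optimised in any such VB scheme is the evidence lower bound (ELBO); the stochastic variant described above replaces the intractable expectation with a Monte Carlo average over draws from the approximate posterior. A generic sketch of that objective (not the paper's specific parameterisation):

```latex
\log p(y) = \underbrace{\mathbb{E}_{q(\theta)}\!\left[\log p(y \mid \theta)\right] - \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta)\right)}_{\mathrm{ELBO}(q)} + \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta \mid y)\right),
\qquad
\mathrm{ELBO}(q) \approx \frac{1}{S}\sum_{s=1}^{S}\log p\!\left(y \mid \theta^{(s)}\right) - \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta)\right), \quad \theta^{(s)} \sim q(\theta).
```

Maximising this stochastic estimate with respect to the parameters of q (for example via reparameterised gradients) gives the sampling-based alternative to the analytical update equations.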

261 citations


Proceedings Article
12 Jul 2020
TL;DR: This work demonstrates through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
Abstract: During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are—as of early 2020—no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a “cold posterior” that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as a heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors. Code available on GitHub.
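For reference, the "cold posterior" discussed above is usually written as a tempered posterior: with temperature T = 1 it is the ordinary Bayes posterior, while T < 1 sharpens the distribution and effectively overcounts the evidence. In generic notation (a sketch, not necessarily the paper's exact convention):

```latex
p_T(\theta \mid \mathcal{D}) \propto \exp\!\left(-\frac{U(\theta)}{T}\right),
\qquad
U(\theta) = -\sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) - \log p(\theta),
\qquad 0 < T < 1 \;\;\text{(``cold'')}.
```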

198 citations


Journal ArticleDOI
TL;DR: This article proposes a long short-term memory (LSTM)-Gauss-NBayes method, which is a synergy of the LSTM-NN and the Gaussian Bayes model for outlier detection in the IIoT, and demonstrates that the proposed techniques outperform the best-known competitors.
Abstract: The data generated by millions of sensors in the industrial Internet of Things (IIoT) are extremely dynamic, heterogeneous, and large scale, and they pose great challenges for real-time analysis and decision making in anomaly detection for the IIoT. In this article, we propose a long short-term memory (LSTM)-Gauss-NBayes method, a synergy of the long short-term memory neural network (LSTM-NN) and the Gaussian Naive Bayes model for outlier detection in the IIoT. In a nutshell, the LSTM-NN builds a model on normal time series and detects outliers by feeding the prediction error to a Gaussian Naive Bayes classifier. Our method exploits the advantages of both models: the strong predictive capability of the LSTM for future time points and the classification performance of the Gaussian Naive Bayes model applied to the prediction error. We evaluate our approach on three real-life datasets that involve both long-term and short-term time dependence. Empirical studies demonstrate that the proposed techniques outperform the best-known competitors, making them a preferable choice for detecting anomalies.
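A minimal sketch of this pipeline, assuming a naive one-step predictor as a stand-in for the trained LSTM and synthetic data with injected anomalies; the only feature passed to the Gaussian Naive Bayes classifier is the prediction error, as described above:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in time series with injected anomalies (label 1).
n = 2000
x = np.sin(np.arange(n) / 20.0) + 0.1 * rng.normal(size=n)
labels = np.zeros(n, dtype=int)
anom_idx = rng.choice(n - 1, size=50, replace=False) + 1
x[anom_idx] += rng.normal(2.0, 0.5, size=anom_idx.size)
labels[anom_idx] = 1

# Stand-in for the LSTM: predict the next point from the previous one.
pred = x[:-1]                  # naive one-step-ahead forecast
err = np.abs(x[1:] - pred)     # prediction error used as the NB feature
y = labels[1:]

# Gaussian Naive Bayes on the prediction error separates normal vs. outlier points.
clf = GaussianNB().fit(err.reshape(-1, 1), y)
print("training accuracy:", clf.score(err.reshape(-1, 1), y))
```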

158 citations


Book ChapterDOI
23 Aug 2020
TL;DR: In this paper, the authors propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions, where each training epoch has a Bayes model whose parameters are specifically learned and deployed.
Abstract: Few-shot learning aims to train efficient predictive models with a few examples. The lack of training data leads to poor models that make high-variance or low-confidence predictions. In this paper, we propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions. “Epoch-wise” means that each training epoch has a Bayes model whose parameters are specifically learned and deployed. “Empirical” means that the hyperparameters, e.g., those used for learning and ensembling the epoch-wise models, are generated by hyperprior learners conditional on task-specific data. We introduce four kinds of hyperprior learners by considering inductive vs. transductive, and epoch-dependent vs. epoch-independent, in the paradigm of meta-learning. We conduct extensive experiments for five-class few-shot tasks on three challenging benchmarks: miniImageNet, tieredImageNet, and FC100, and achieve top performance using the epoch-dependent transductive hyperprior learner, which captures the richest information. Our ablation study shows that both “epoch-wise ensemble” and “empirical” encourage high efficiency and robustness in the model performance (our code is open-sourced at https://gitlab.mpi-klsb.mpg.de/yaoyaoliu/e3bm).

107 citations


Journal ArticleDOI
TL;DR: A novel intelligent system is designed that performs feature selection with a hybrid approach combining rough set theory and Bayes' theorem, reducing the false alarm rate, computational complexity, and training complexity while increasing the detection rate.

83 citations


Posted Content
TL;DR: A novel amortized variational inference that couples all the variational posteriors into a meta-model, which consists of a synthetic gradient network and an initialization network that allows for backpropagating information from unlabeled data, thereby enabling transduction.
Abstract: We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging the unlabeled query set in addition to the support set to generate a more powerful model for each task. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior on the query set of each task. We derive a novel amortized variational inference that couples all the variational posteriors via a meta-model, which consists of a synthetic gradient network and an initialization network. Each variational posterior is derived from synthetic gradient descent to approximate the true posterior on the query set, where we do not have access to the true gradient. Our results on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification outperform previous state-of-the-art methods. In addition, we conduct two zero-shot learning experiments to further explore the potential of the synthetic gradient.

79 citations


Posted Content
TL;DR: The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior or Bayes' rule; the structure of neural networks gives rise to a structured prior in function space that reflects the inductive biases which help them generalize.
Abstract: The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior, or Bayes rule. Bayesian inference is especially compelling for deep neural networks. (1) Neural networks are typically underspecified by the data, and can represent many different but high performing models corresponding to different settings of parameters, which is exactly when marginalization will make the biggest difference for both calibration and accuracy. (2) Deep ensembles have been mistaken as competing approaches to Bayesian methods, but can be seen as approximate Bayesian marginalization. (3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases of neural networks that help them generalize. (4) The observed correlation between parameters in flat regions of the loss and a diversity of solutions that provide good generalization is further conducive to Bayesian marginalization, as flat regions occupy a large volume in a high dimensional space, and each different solution will make a good contribution to a Bayesian model average. (5) Recent practical advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training, while retaining scalability.
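Point (2) is easy to state concretely: averaging the predictive distributions of independently trained ensemble members is a simple Monte Carlo approximation to the Bayesian model average p(y | x, D) = ∫ p(y | x, θ) p(θ | D) dθ. A minimal sketch with hypothetical per-member class probabilities:

```python
import numpy as np

# probs[m, c]: predictive class probabilities from ensemble member m (hypothetical values).
probs = np.array([
    [0.70, 0.20, 0.10],
    [0.55, 0.35, 0.10],
    [0.80, 0.15, 0.05],
])

# Approximate Bayesian model average: p(y | x, D) ~ (1/M) * sum_m p(y | x, theta_m).
bma = probs.mean(axis=0)
print(bma, "predicted class:", bma.argmax())
```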

75 citations


Posted Content
TL;DR: This article showed that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD, and showed that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence.
Abstract: During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are, as of early 2020, no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as a heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.

68 citations


Journal ArticleDOI
TL;DR: Simulation results show that the sampling distributions of the posterior means of these fit indices are similar to their frequentist counterparts across sample sizes, model types, and levels of misspecification when BSEMs are estimated with noninformative priors; applying them to models specified with informative priors, where Bayesian and frequentist estimation methods might not yield similar results, raises additional issues.
Abstract: In a frequentist framework, the exact fit of a structural equation model (SEM) is typically evaluated with the chi-square test and at least one index of approximate fit. Current Bayesian SEM (BSEM) software provides one measure of overall fit: the posterior predictive p value (PPP-χ²). Because of the noted limitations of PPP-χ², common practice for evaluating Bayesian model fit instead focuses on model comparison, using information criteria or Bayes factors. Fit indices developed under maximum-likelihood estimation have not been incorporated into software for BSEM. We propose adapting 7 chi-square-based approximate fit indices for BSEM, using a Bayesian analog of the chi-square model-fit statistic. Simulation results show that the sampling distributions of the posterior means of these fit indices are similar to their frequentist counterparts across sample sizes, model types, and levels of misspecification when BSEMs are estimated with noninformative priors. The proposed fit indices therefore allow overall model-fit evaluation using familiar metrics of the original indices, with an accompanying interval to quantify their uncertainty. Illustrative examples with real data raise some important issues about the proposed fit indices' application to models specified with informative priors, when Bayesian and frequentist estimation methods might not yield similar results. (PsycINFO Database Record (c) 2020 APA, all rights reserved).
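For orientation, the frequentist indices being adapted are simple functions of a chi-square statistic and its degrees of freedom; the proposal above plugs a Bayesian analog of that statistic into such formulas and summarises the resulting posterior distribution of each index. A sketch of two common indices under their standard definitions (illustrative values only, not the paper's Bayesian computation):

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation from a chi-square statistic."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_null, df_null):
    """Comparative fit index relative to the baseline (null) model."""
    num = max(chi2 - df, 0.0)
    den = max(chi2_null - df_null, chi2 - df, 0.0)
    return 1.0 - num / den if den > 0 else 1.0

# Illustrative values only.
print(rmsea(chi2=85.2, df=40, n=500),
      cfi(chi2=85.2, df=40, chi2_null=950.0, df_null=55))
```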

67 citations


Proceedings ArticleDOI
02 Feb 2020
TL;DR: In this article, the authors present a new theory of hypothesis testing built around the S-value, a notion of evidence which, unlike p-values, allows for effortlessly combining evidence from several tests, even in the common scenario where the decision to perform a new test depends on the previous test outcome.
Abstract: We present a new theory of hypothesis testing. The main concept is the S-value, a notion of evidence which, unlike p-values, allows for effortlessly combining evidence from several tests, even in the common scenario where the decision to perform a new test depends on the previous test outcome: safe tests based on S-values generally preserve Type-I error guarantees under such ‘optional continuation’. S-values exist for completely general testing problems with composite null and alternatives. Their prime interpretation is in terms of gambling or investing, each S-value corresponding to a particular investment. Surprisingly, optimal "GROW" S-values, which lead to fastest capital growth, are fully characterized by the joint information projection (JIPr) between the set of all Bayes marginal distributions on H0 and H1. Thus, optimal S-values also have an interpretation as Bayes factors, with priors given by the JIPr. We illustrate the theory using two classical testing scenarios: the one-sample t-test and the 2 × 2 contingency table. In the t-test setting, GROW S-values correspond to adopting the right Haar prior on the variance, like in Jeffreys’ Bayesian t-test. However, unlike Jeffreys’, the default safe t-test puts a discrete 2-point prior on the effect size, leading to better behaviour in terms of statistical power. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, S-values and safe tests may provide a methodology acceptable to adherents of all three schools.
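The calculus behind "optional continuation" is compact enough to state in generic notation (a sketch, not the paper's formal development): an S-value is a non-negative statistic whose expectation is at most one under every null distribution, Markov's inequality then yields the Type-I error guarantee, and a product of S-values computed on successive batches is again an S-value.

```latex
\mathbb{E}_{P}[S] \le 1 \;\; \forall P \in \mathcal{H}_0
\;\Longrightarrow\;
P\!\left(S \ge \tfrac{1}{\alpha}\right) \le \alpha,
\qquad
S_{1:K} = \prod_{k=1}^{K} S_k \;\text{ remains an S-value when each } S_k \text{ is computed on new data.}
```

Formally, the product step requires each S_k to have conditional expectation at most one given the earlier batches, which is what makes stopping or continuing based on past outcomes safe.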

67 citations


Journal ArticleDOI
TL;DR: Modelling revealed that, during an interoceptive perturbation condition (inspiratory breath-holding during heartbeat tapping), healthy individuals assigned greater precision to ascending cardiac signals than individuals with symptoms of anxiety, depression, or co-morbid depression/anxiety, who failed to increase their precision estimates from resting levels.
Abstract: Recent neurocomputational theories have hypothesized that abnormalities in prior beliefs and/or the precision-weighting of afferent interoceptive signals may facilitate the transdiagnostic emergence of psychopathology. Specifically, it has been suggested that, in certain psychiatric disorders, interoceptive processing mechanisms either over-weight prior beliefs or under-weight signals from the viscera (or both), leading to a failure to accurately update beliefs about the body. However, this has not been directly tested empirically. To evaluate the potential roles of prior beliefs and interoceptive precision in this context, we fit a Bayesian computational model to behavior in a transdiagnostic patient sample during an interoceptive awareness (heartbeat tapping) task. Modelling revealed that, during an interoceptive perturbation condition (inspiratory breath-holding during heartbeat tapping), healthy individuals (N = 52) assigned greater precision to ascending cardiac signals than individuals with symptoms of anxiety (N = 15), depression (N = 69), co-morbid depression/anxiety (N = 153), substance use disorders (N = 131), and eating disorders (N = 14), who failed to increase their precision estimates from resting levels. In contrast, we did not find strong evidence for differences in prior beliefs. These results provide the first empirical computational modeling evidence of a selective dysfunction in adaptive interoceptive processing in psychiatric conditions, and lay the groundwork for future studies examining how reduced interoceptive precision influences visceral regulation and interoceptively-guided decision-making.

Posted Content
TL;DR: A sophisticated kind of active inference using a recursive form of expected free energy, which effectively implements a deep tree search over actions and outcomes in the future over sequences of belief states as opposed to states per se.
Abstract: Active inference offers a first principle account of sentient behaviour, from which special and important cases can be derived, e.g., reinforcement learning, active learning, Bayes optimal inference, Bayes optimal design, etc. Active inference resolves the exploitation-exploration dilemma in relation to prior preferences, by placing information gain on the same footing as reward or value. In brief, active inference replaces value functions with functionals of (Bayesian) beliefs, in the form of an expected (variational) free energy. In this paper, we consider a sophisticated kind of active inference, using a recursive form of expected free energy. Sophistication describes the degree to which an agent has beliefs about beliefs. We consider agents with beliefs about the counterfactual consequences of action for states of affairs and beliefs about those latent states. In other words, we move from simply considering beliefs about 'what would happen if I did that' to 'what would I believe about what would happen if I did that'. The recursive form of the free energy functional effectively implements a deep tree search over actions and outcomes in the future. Crucially, this search is over sequences of belief states, as opposed to states per se. We illustrate the competence of this scheme, using numerical simulations of deep decision problems.

Posted Content
TL;DR: A theoretical analysis using the PAC-Bayesian framework is provided and novel generalization bounds for meta-learning with unbounded loss functions and Bayesian base learners are derived, which are used to develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-regularization.
Abstract: Meta-learning can successfully acquire useful inductive biases from data. Yet, its generalization properties to unseen learning tasks are poorly understood. Particularly if the number of meta-training tasks is small, this raises concerns about overfitting. We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning. Using these bounds, we develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-level regularization. Unlike previous PAC-Bayesian meta-learners, our method results in a standard stochastic optimization problem which can be solved efficiently and scales well. When instantiating our PAC-optimal hyper-posterior (PACOH) with Gaussian processes and Bayesian Neural Networks as base learners, the resulting methods yield state-of-the-art performance, both in terms of predictive accuracy and the quality of uncertainty estimates. Thanks to their principled treatment of uncertainty, our meta-learners can also be successfully employed for sequential decision problems.
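For context, a classical single-task PAC-Bayesian bound (for losses bounded in [0, 1]) has the shape below; the paper's contribution is to lift this style of guarantee to the meta-learning level and to unbounded losses, which this sketch does not cover.

```latex
\text{With probability at least } 1-\delta:\quad
\mathbb{E}_{\theta \sim Q}\!\left[L(\theta)\right]
\;\le\;
\mathbb{E}_{\theta \sim Q}\!\left[\hat{L}_n(\theta)\right]
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
\quad \text{simultaneously for all posteriors } Q.
```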

Journal ArticleDOI
TL;DR: A Boolean Bayesian filter is designed that can be utilized to provide the minimum MSE state estimate for the STVBNs and a recursive matrix-based algorithm is obtained to calculate the one-step prediction and estimation of the forward–backward state probability distribution vectors.
Abstract: In this article, a general theoretical framework is developed for the state estimation problem of stochastic time-varying Boolean networks (STVBNs). The STVBN consists of a system model describing the evolution of the Boolean states and a model relating the noisy measurements to the Boolean states. Both the process noise and the measurement noise are characterized by sequences of mutually independent Bernoulli distributed stochastic variables taking values of 1 or 0, which imply that the state/measurement variables may be flipped with certain probabilities. First, an algebraic representation of the STVBNs is derived based on the semitensor product. Then, based on Bayes’ theorem, a recursive matrix-based algorithm is obtained to calculate the one-step prediction and estimation of the forward–backward state probability distribution vectors. Owing to the Boolean nature of the state variables, a Boolean Bayesian filter is designed that provides the minimum mean-square error (MSE) state estimate for the STVBNs. The fixed-interval smoothing filter is also obtained by resorting to the forward–backward technique. Finally, a simulation experiment is carried out for the context estimation problem of the p53-MDM2 negative-feedback gene regulatory network.
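The predict/update recursion behind such a filter is Bayes' theorem applied to a finite probability vector. A minimal sketch for a single Boolean state with an assumed transition matrix and a Bernoulli measurement-flip probability (illustrative numbers, not the paper's semitensor-product construction):

```python
import numpy as np

A = np.array([[0.9, 0.2],   # A[i, j] = P(x_k = i | x_{k-1} = j), states {0, 1}
              [0.1, 0.8]])
p_flip = 0.1                # probability that the measurement flips the true state

def likelihood(z):
    """P(z | x) for x in {0, 1} under the Bernoulli flip model."""
    return np.array([1 - p_flip if z == 0 else p_flip,
                     p_flip if z == 0 else 1 - p_flip])

p = np.array([0.5, 0.5])    # prior belief over the Boolean state
for z in [1, 1, 0, 1]:      # a sequence of noisy Boolean measurements
    p = A @ p                       # prediction step
    p = likelihood(z) * p           # Bayes update (unnormalized)
    p /= p.sum()                    # normalize
    # For {0,1}-valued estimates, minimizing MSE is the same as picking the more probable state.
    print(p, "Boolean estimate:", int(p[1] >= 0.5))
```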

Journal ArticleDOI
TL;DR: A statistical approach is presented to infer the epigenetic architecture of complex disease, determine the variation captured by epigenetic effects, and estimate phenotype-epigenetic probe associations jointly.
Abstract: Linking epigenetic marks to clinical outcomes improves insight into molecular processes, disease prediction, and therapeutic target identification. Here, a statistical approach is presented to infer the epigenetic architecture of complex disease, determine the variation captured by epigenetic effects, and estimate phenotype-epigenetic probe associations jointly. Implicitly adjusting for probe correlations, data structure (cell-count or relatedness), and single-nucleotide polymorphism (SNP) marker effects improves association estimates; in 9,448 individuals, 75.7% (95% CI 71.7–79.3) of body mass index (BMI) variation and 45.6% (95% CI 37.3–51.9) of cigarette consumption variation was captured by whole blood methylation array data. Pathway-linked probes of blood cholesterol, lipid transport and sterol metabolism for BMI, and xenobiotic stimuli response for smoking, showed >1.5 times larger associations with >95% posterior inclusion probability. Prediction accuracy improved by 28.7% for BMI and 10.2% for smoking over a LASSO model, with age- and tissue-specificity, implying that the associations are a phenotypic consequence rather than causal.

Proceedings Article
30 Apr 2020
TL;DR: This work addresses continual learning for non-stationary data, using Bayesian neural networks and memory-based online variational Bayes, by introducing a novel method for sequentially updating both components of the posterior approximation.
Abstract: This work addresses continual learning for non-stationary data, using Bayesian neural networks and memory-based online variational Bayes. We represent the posterior approximation of the network weights by a diagonal Gaussian distribution and a complementary memory of raw data. This raw data corresponds to likelihood terms that cannot be well approximated by the Gaussian. We introduce a novel method for sequentially updating both components of the posterior approximation. Furthermore, we propose Bayesian forgetting and a Gaussian diffusion process for adapting to non-stationary data. The experimental results show that our update method improves on existing approaches for streaming data. Additionally, the adaptation methods lead to better predictive performance for non-stationary data.
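The recursion underlying memory-based online variational Bayes is that the previous posterior approximation acts as the prior for the next batch; a generic statement (omitting the paper's complementary raw-data memory and the Bayesian forgetting/diffusion adaptations) is:

```latex
q_t(\theta) \;=\; \arg\min_{q \in \mathcal{Q}} \;\mathrm{KL}\!\left(q(\theta) \,\Big\|\, \tfrac{1}{Z_t}\, p(\mathcal{D}_t \mid \theta)\, q_{t-1}(\theta)\right),
\qquad
\mathcal{Q} = \{\text{diagonal Gaussians over the network weights}\}.
```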

Journal ArticleDOI
TL;DR: In this paper, the authors studied the mean field method for community detection under the stochastic block model and showed that it has a linear convergence rate and converges to the minimax rate within log n iterations.
Abstract: The mean field variational Bayes method is becoming increasingly popular in statistics and machine learning. Its iterative coordinate ascent variational inference algorithm has been widely applied to large scale Bayesian inference. See Blei et al. (2017) for a recent comprehensive review. Despite the popularity of the mean field method, there exists remarkably little fundamental theoretical justification. To the best of our knowledge, the iterative algorithm has never been investigated for any high-dimensional and complex model. In this paper, we study the mean field method for community detection under the stochastic block model. For an iterative batch coordinate ascent variational inference algorithm, we show that it has a linear convergence rate and converges to the minimax rate within log n iterations. This complements the results of Bickel et al. (2013), which studied the global minimum of the mean field variational Bayes and obtained asymptotic normal estimation of global model parameters. In addition, we obtain similar optimality results for Gibbs sampling and an iterative procedure to calculate maximum likelihood estimation, which can be of independent interest.

Journal ArticleDOI
TL;DR: IBI3 can be used as an instance-level complexity measure of imbalance and BI3 as a criterion to demonstrate the degree to which imbalance deteriorates the classification of a data set, and hence to assess whether it is worth using imbalance recovery methods to recover the performance loss of a classifier.
Abstract: Recent studies of imbalanced data classification have shown that the imbalance ratio (IR) is not the only cause of performance loss in a classifier, as other data factors, such as small disjuncts, noise, and overlapping, can also make the problem difficult. The relationship between the IR and other data factors has been demonstrated, but to the best of our knowledge, there is no measurement of the extent to which class imbalance influences the classification performance of imbalanced data. In addition, it is also unknown which data factor serves as the main barrier for classification in a data set. In this article, we focus on the Bayes optimal classifier and examine the influence of class imbalance from a theoretical perspective. We propose an instance measure called the Individual Bayes Imbalance Impact Index (IBI3) and a data measure called the Bayes Imbalance Impact Index (BI3). IBI3 and BI3 reflect the extent of influence using only the imbalance factor, in terms of each minority class sample and the whole data set, respectively. Therefore, IBI3 can be used as an instance complexity measure of imbalance and BI3 as a criterion to demonstrate the degree to which imbalance deteriorates the classification of a data set. We can, therefore, use BI3 to assess whether it is worth using imbalance recovery methods, such as sampling or cost-sensitive methods, to recover the performance loss of a classifier. The experiments show that IBI3 is highly consistent with the increase of the prediction score obtained by the imbalance recovery methods and that BI3 is highly consistent with the improvement in the F1 score obtained by the imbalance recovery methods on both synthetic and real benchmark data sets.

Journal ArticleDOI
TL;DR: The results imply, in particular, that every NPMLE achieves near parametric risk (up to logarithmic multiplicative factors) when the true density is a discrete Gaussian mixture without any prior information on the number of mixture components.
Abstract: We study the nonparametric maximum likelihood estimator (NPMLE) for estimating Gaussian location mixture densities in d dimensions from independent observations. Unlike usual likelihood-based methods for fitting mixtures, NPMLEs are based on convex optimization. We prove finite sample results on the Hellinger accuracy of every NPMLE. Our results imply, in particular, that every NPMLE achieves near parametric risk (up to logarithmic multiplicative factors) when the true density is a discrete Gaussian mixture without any prior information on the number of mixture components. NPMLEs can naturally be used to yield empirical Bayes estimates of the oracle Bayes estimator in the Gaussian denoising problem. We prove bounds for the accuracy of the empirical Bayes estimate as an approximation to the oracle Bayes estimator. Here our results imply that the empirical Bayes estimator performs at nearly the optimal level (up to logarithmic factors) for denoising in clustering situations without any prior knowledge of the number of clusters.
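To make the "empirical Bayes estimate of the oracle Bayes estimator" concrete: in the Gaussian denoising problem X = θ + Z with Z ~ N(0, 1), the posterior mean under a discrete mixing distribution has a closed form, and the empirical Bayes version plugs in fitted atoms and weights (e.g. from the NPMLE). A sketch with hypothetical atoms and weights:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical fitted mixing distribution (e.g. from an NPMLE): atoms a_j with weights w_j.
atoms = np.array([-2.0, 0.0, 3.0])
weights = np.array([0.3, 0.5, 0.2])

def posterior_mean(x):
    """Bayes (posterior mean) denoiser for X = theta + N(0, 1), theta ~ sum_j w_j * delta_{a_j}."""
    dens = weights * norm.pdf(x - atoms)          # w_j * phi(x - a_j)
    return np.sum(atoms * dens) / np.sum(dens)

for x in [-2.5, 0.4, 2.0]:
    print(x, "->", posterior_mean(x))
```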

Journal ArticleDOI
01 Jun 2020 - Test
TL;DR: An overview of some widely applicable frameworks for regression models in which a response variable is related to smooth functions of some predictor variables and the equivalence of smoothing, Gaussian latent process models and Gaussian random effects is provided.
Abstract: Regression models in which a response variable is related to smooth functions of some predictor variables are popular as a result of their appealing balance between flexibility and interpretability. Since the original generalized additive models of Hastie and Tibshirani (Generalized additive models. Chapman & Hall, Boca Raton, 1990) numerous model extensions have been proposed, and a variety of practically useful computational strategies have emerged. This paper provides an overview of some widely applicable frameworks for this type of modelling, emphasizing the similarities between the different approaches, and the equivalence of smoothing, Gaussian latent process models and Gaussian random effects. The focus is particularly on Bayes empirical smoother theory, fully Bayesian inference via stochastic simulation or integrated nested Laplace approximation and boosting.
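The "equivalence of smoothing, Gaussian latent process models and Gaussian random effects" can be summarised in one line: the penalized least squares fit of a smooth is also the posterior mean of the basis coefficients under a corresponding Gaussian prior. In generic penalized-spline notation (a sketch, assuming a basis matrix X and penalty matrix S):

```latex
\hat{\beta} = \arg\min_{\beta}\; \|y - X\beta\|^{2} + \lambda\, \beta^{\top} S \beta
            = \left(X^{\top}X + \lambda S\right)^{-1} X^{\top} y,
```

which is exactly the posterior mean when y | β ~ N(Xβ, σ²I) and β ~ N(0, (σ²/λ) S⁻), with S⁻ a generalized inverse of the penalty matrix.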

Journal ArticleDOI
03 Jan 2020 - Energies
TL;DR: The results obtained reveal that the proposed NB classifier outperforms both MLP and the Bayes classifier in terms of accuracy rate, misclassification rate, kappa statistics, mean absolute error (MAE), root mean square error (RMSE), percentage relative absolute error (% RAE) and percentage root relative square error (% RRSE).
Abstract: This paper presents a methodology to detect and identify the type of fault that occurs in a shunt compensated static synchronous compensator (STATCOM) transmission line using a combination of Discrete Wavelet Transform (DWT) and Naive Bayes (NB) classifiers. To study this, the network model is designed using Matlab/Simulink. Different types of faults, such as Line to Ground (LG), Line to Line (LL), Double Line to Ground (LLG) and the three-phase (LLLG) fault, are applied at disparate zones of the system, with and without STATCOM, considering the effect of varying fault resistance. The three-phase fault current waveforms obtained are decomposed into several levels using the Daubechies (db4) mother wavelet to extract features such as the standard deviation (SD) and energy values. The extracted features are then used to train classifiers, namely the Multi-Layer Perceptron Neural Network (MLP), the Bayes classifier and the Naive Bayes (NB) classifier, to classify the type of fault that occurs in the system. The results obtained reveal that the proposed NB classifier outperforms both the MLP and the Bayes classifier in terms of accuracy rate, misclassification rate, kappa statistics, mean absolute error (MAE), root mean square error (RMSE), percentage relative absolute error (% RAE) and percentage root relative square error (% RRSE).
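A compact sketch of the feature-extraction-plus-classification pipeline described above, assuming PyWavelets for the db4 decomposition and synthetic waveforms in place of the simulated fault currents; the standard deviation and energy of each coefficient level serve as the features for the Naive Bayes classifier:

```python
import numpy as np
import pywt
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

def dwt_features(signal, wavelet="db4", level=4):
    """Per-level standard deviation and energy of the DWT coefficients."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([v for c in coeffs for v in (np.std(c), np.sum(c**2))])

# Synthetic stand-ins for fault-current waveforms of two fault classes.
def make_signal(noise_amplitude):
    t = np.linspace(0, 0.2, 1024)
    return np.sin(2 * np.pi * 50 * t) + noise_amplitude * rng.normal(size=t.size)

X = np.array([dwt_features(make_signal(a)) for a in [0.1] * 40 + [0.8] * 40])
y = np.array([0] * 40 + [1] * 40)   # two illustrative fault classes

clf = GaussianNB().fit(X, y)
print("training accuracy:", clf.score(X, y))
```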

Journal ArticleDOI
16 Oct 2020
TL;DR: In this paper, the state of the art of statistical modelling as applied to plant breeding is presented, with emphasis on model selection and parameter estimation in a practical way.
Abstract: This paper presents the state of the art of statistical modelling as applied to plant breeding. Classes of inference, statistical models, estimation methods and model selection are emphasized in a practical way. Restricted Maximum Likelihood (REML), Hierarchical Maximum Likelihood (HIML) and Bayesian (BAYES) approaches are highlighted. Distributions of data and effects, and the dimension and structure of the models, are considered for model selection and parameter estimation. Theory and practical examples of selecting between models with different fixed-effects factors are given using Full Maximum Likelihood (FML). An analytical FML way of defining random or fixed effects is presented to avoid the usual subjective or conceptual definitions. Examples of the application of the Hierarchical Maximum Likelihood/Hierarchical Generalized Best Linear Unbiased Prediction (HIML/HG-BLUP) procedure are also presented. Sample sizes for achieving high experimental quality and accuracy are indicated, and simple interpretations of the estimates of key genetic parameters are given. Phenomics and genomics are also addressed. Maximum accuracy under the truest model is the key to achieving efficacy in plant breeding programs.

Journal ArticleDOI
TL;DR: The authors show that confidence reports are best explained by the difference between the posterior probabilities of the best and the next-best options, rather than by the posterior probability of the chosen (best) option alone, or by the overall uncertainty of the posterior distribution.
Abstract: Decision confidence reflects our ability to evaluate the quality of decisions and guides subsequent behavior. Experiments on confidence reports have almost exclusively focused on two-alternative decision-making. In this realm, the leading theory is that confidence reflects the probability that a decision is correct (the posterior probability of the chosen option). There is, however, another possibility, namely that people are less confident if the best two options are closer to each other in posterior probability, regardless of how probable they are in absolute terms. This possibility has not previously been considered because in two-alternative decisions, it reduces to the leading theory. Here, we test this alternative theory in a three-alternative visual categorization task. We found that confidence reports are best explained by the difference between the posterior probabilities of the best and the next-best options, rather than by the posterior probability of the chosen (best) option alone, or by the overall uncertainty (entropy) of the posterior distribution. Our results upend the leading notion of decision confidence and instead suggest that confidence reflects the observer’s subjective probability that they made the best possible decision. Conventional theory suggests that people’s confidence about a decision reflects their subjective probability that the decision was correct. By studying decisions with multiple alternatives, the authors show that confidence reports instead reflect the difference in probabilities between the chosen and the next-best alternative.
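The three candidate read-outs of confidence compared above can be written in a few lines: the posterior probability of the chosen (best) option, the difference between the best and next-best probabilities (the account favoured by the data), and the negative entropy of the posterior. A sketch with an illustrative three-alternative posterior:

```python
import numpy as np

posterior = np.array([0.48, 0.42, 0.10])   # illustrative posterior over three categories
p_sorted = np.sort(posterior)[::-1]

conf_max = p_sorted[0]                     # posterior probability of the chosen option
conf_diff = p_sorted[0] - p_sorted[1]      # best minus next-best (favoured by the results above)
entropy = -np.sum(posterior * np.log(posterior))
conf_entropy = -entropy                    # lower entropy -> higher confidence

print(conf_max, conf_diff, conf_entropy)
```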

Journal ArticleDOI
TL;DR: It is shown that Bayes’ rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents, a compelling benefit because parallel documents are not always available.
Abstract: We show that Bayes’ rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents, a compelling benefit because parallel documents are not always available.
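The factorisation behind this use of Bayes' rule is the document-level noisy channel: the translation model needs only parallel sentences, while the language model over target documents can be trained on monolingual text. In generic notation:

```latex
\hat{y} = \arg\max_{y}\; p(y \mid x) = \arg\max_{y}\; p(x \mid y)\, p(y),
```

where x is the source document, p(x | y) is a (sentence-level) translation model and p(y) is a document-level language model.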

Journal ArticleDOI
TL;DR: This article exploits recent theoretical and computational advances to carry out modeling at the continuous spatial level, which induces a spatial model for the discrete areas and presents two approaches: a fully Bayesian implementation using a Hamiltonian Monte Carlo algorithm and an empirical Bayes implementation, that is much faster and is based on Laplace approximations.
Abstract: The analysis of area-level aggregated summary data is common in many disciplines, including epidemiology and the social sciences. Typically, Markov random field spatial models have been employed to acknowledge spatial dependence and allow data-driven smoothing. In the context of an irregular set of areas, these models always have an ad hoc element with respect to the definition of a neighborhood scheme. In this article, we exploit recent theoretical and computational advances to carry out modeling at the continuous spatial level, which induces a spatial model for the discrete areas. This approach also allows reconstruction of the continuous underlying surface, but the interpretation of such surfaces is delicate, since it depends on the quality, extent and configuration of the observed data. We focus on models based on stochastic partial differential equations. We also consider the interesting case in which the aggregate data are supplemented with point data. We carry out Bayesian inference and, in the language of generalized linear mixed models, if the link is linear, an efficient implementation of the model is available via integrated nested Laplace approximations. For nonlinear links, we present two approaches: a fully Bayesian implementation using a Hamiltonian Monte Carlo algorithm and an empirical Bayes implementation, that is much faster and is based on Laplace approximations. We examine the properties of the approach using simulation, and then apply the model to the classic Scottish lip cancer data.

Journal ArticleDOI
TL;DR: A thinking guideline to assist researchers in conducting Bayesian inference in the social and behavioural sciences and a summary of agreements and disagreements of the authors on several discussion points regardingBayesian inference are provided.
Abstract: Why is there no consensual way of conducting Bayesian analyses? We present a summary of agreements and disagreements of the authors on several discussion points regarding Bayesian inference. We also provide a thinking guideline to assist researchers in conducting Bayesian inference in the social and behavioural sciences.

Proceedings Article
30 Apr 2020
TL;DR: This article proposes a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging unlabeled information in the query set to learn a more powerful meta-model.
Abstract: We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging unlabeled information in the query set to learn a more powerful meta-model. To develop our framework we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior of each task. We derive a novel amortized variational inference that couples all the variational posteriors into a meta-model, which consists of a synthetic gradient network and an initialization network. The combination of local KL divergences and synthetic gradient network allows for backpropagating information from unlabeled data, thereby enabling transduction. Our results on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification significantly outperform previous state-of-the-art methods.

Journal ArticleDOI
TL;DR: It is shown that this theory can explain why and when people underreact to the data or the prior, and a new experiment demonstrates that these two forms of underreaction can be systematically controlled by manipulating the query distribution.
Abstract: Bayesian theories of cognition assume that people can integrate probabilities rationally. However, several empirical findings contradict this proposition: human probabilistic inferences are prone to systematic deviations from optimality. Puzzlingly, these deviations sometimes go in opposite directions. Whereas some studies suggest that people underreact to prior probabilities (base rate neglect), other studies find that people underreact to the likelihood of the data (conservatism). We argue that these deviations arise because the human brain does not rely solely on a general-purpose mechanism for approximating Bayesian inference that is invariant across queries. Instead, the brain is equipped with a recognition model that maps queries to probability distributions. The parameters of this recognition model are optimized to get the output as close as possible, on average, to the true posterior. Because of our limited computational resources, the recognition model will allocate its resources so as to be more accurate for high probability queries than for low probability queries. By adapting to the query distribution, the recognition model learns to infer. We show that this theory can explain why and when people underreact to the data or the prior, and a new experiment demonstrates that these two forms of underreaction can be systematically controlled by manipulating the query distribution. The theory also explains a range of related phenomena: memory effects, belief bias, and the structure of response variability in probabilistic reasoning. We also discuss how the theory can be integrated with prior sampling-based accounts of approximate inference. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
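To make "underreacting to the prior" concrete, here is the textbook worked example with illustrative numbers (not taken from the paper): with a 1% base rate, a 90% hit rate and a 5% false-positive rate, full Bayesian updating gives a posterior far below the intuitive answer, and judgments above this value exhibit base rate neglect.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D \mid H)\, P(H) + P(D \mid \neg H)\, P(\neg H)}
            = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.05 \times 0.99} \approx 0.154 .
```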

Journal ArticleDOI
TL;DR: In this paper, a connection between empirical Bayes posterior distributions and false discovery rate (FDR) control is explored in the Gaussian sequence model, and it is shown that empirical Bayes-calibrated spike and slab posterior distributions allow correct FDR control under sparsity.
Abstract: This paper explores a connection between empirical Bayes posterior distributions and false discovery rate (FDR) control. In the Gaussian sequence model, this work shows that empirical Bayes-calibrated spike and slab posterior distributions allow a correct FDR control under sparsity. Doing so, it offers a frequentist theoretical validation of empirical Bayes methods in the context of multiple testing. Our theoretical results are illustrated with numerical experiments.
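The quantity connecting the empirical Bayes posterior to FDR control here is the posterior probability that a coordinate is null. Under a spike and slab prior with estimated weight ŵ on the slab g, Bayes' theorem gives, in generic notation (a sketch, not the paper's exact procedure):

```latex
\ell(x_i) = P\!\left(\theta_i = 0 \mid x_i\right)
          = \frac{(1 - \hat{w})\, \varphi(x_i)}{(1 - \hat{w})\, \varphi(x_i) + \hat{w}\, (g \ast \varphi)(x_i)},
```

where φ is the standard normal density; coordinates whose value of ℓ falls below a chosen threshold are reported as discoveries.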

Journal ArticleDOI
TL;DR: It is concluded that BN incorporating genetic anchors is a useful complementary method to conventional MR for exploring causal relationships in complex data sets such as those generated from modern “omics” technologies.
Abstract: Mendelian randomization (MR) implemented through instrumental variables analysis is an increasingly popular causal inference tool used in genetic epidemiology. But it can have limitations for evaluating simultaneous causal relationships in complex data sets that include, for example, multiple genetic predictors and multiple potential risk factors associated with the same genetic variant. Here we use real and simulated data to investigate Bayesian network analysis (BN) with the incorporation of directed arcs, representing genetic anchors, as an alternative approach. A Bayesian network describes the conditional dependencies/independencies of variables using a graphical model (a directed acyclic graph) with an accompanying joint probability. In real data, we found BN could be used to infer simultaneous causal relationships that confirmed the individual causal relationships suggested by bi-directional MR, while allowing for the existence of potential horizontal pleiotropy (that would violate MR assumptions). In simulated data, BN with two directional anchors (mimicking genetic instruments) had greater power for a fixed type 1 error than bi-directional MR, while BN with a single directional anchor performed better than or as well as bi-directional MR. Both BN and MR could be adversely affected by violations of their underlying assumptions (such as genetic confounding due to unmeasured horizontal pleiotropy). BN with no directional anchor generated inference that was no better than by chance, emphasizing the importance of directional anchors in BN (as in MR). Under highly pleiotropic simulated scenarios, BN outperformed both MR (and its recent extensions) and two recently-proposed alternative approaches: a multi-SNP mediation intersection-union test (SMUT) and a latent causal variable (LCV) test. We conclude that BN incorporating genetic anchors is a useful complementary method to conventional MR for exploring causal relationships in complex data sets such as those generated from modern "omics" technologies.