
Showing papers on "Bayes' theorem published in 2020"


Posted Content
TL;DR: Variational Bayes (VB) has been proposed as a method to facilitate calculation of the posterior distribution for nonlinear models, providing fast Bayesian inference by estimating the parameters of a factorized approximation to the posterior distribution.
Abstract: Variational Bayes (VB) has been used to facilitate the calculation of the posterior distribution in the context of Bayesian inference of the parameters of nonlinear models from data. Previously, an analytical formulation of VB has been derived for nonlinear model inference on data with additive Gaussian noise, as an alternative to nonlinear least squares. Here a stochastic solution is derived that avoids some of the approximations required of the analytical formulation, offering a solution that can be more flexibly deployed for nonlinear model inference problems. The stochastic VB solution was used for inference on a biexponential toy case and the algorithmic parameter space explored, before being deployed on real data from a magnetic resonance imaging study of perfusion. The new method was found to achieve comparable parameter recovery to the analytic solution and be competitive in terms of computational speed despite being reliant on sampling.
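For orientation, the quantity optimised in any such VB scheme is the evidence lower bound (ELBO); the stochastic variant described above replaces the intractable expectation with a Monte Carlo average over draws from the approximate posterior. A generic sketch of that objective (not the paper's specific parameterisation):

```latex
\log p(y) = \underbrace{\mathbb{E}_{q(\theta)}\!\left[\log p(y \mid \theta)\right] - \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta)\right)}_{\mathrm{ELBO}(q)} + \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta \mid y)\right),
\qquad
\mathrm{ELBO}(q) \approx \frac{1}{S}\sum_{s=1}^{S}\log p\!\left(y \mid \theta^{(s)}\right) - \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta)\right), \quad \theta^{(s)} \sim q(\theta).
```

Maximising this stochastic estimate with respect to the parameters of q (for example via reparameterised gradients) gives the sampling-based alternative to the analytical update equations.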

261 citations


Proceedings Article
12 Jul 2020
TL;DR: This work demonstrates through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
Abstract: During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are—as of early 2020—no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a “cold posterior” that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as a heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors. Code available on GitHub.
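For reference, the "cold posterior" discussed above is usually written as a tempered posterior: with temperature T = 1 it is the ordinary Bayes posterior, while T < 1 sharpens the distribution and effectively overcounts the evidence. In generic notation (a sketch, not necessarily the paper's exact convention):

```latex
p_T(\theta \mid \mathcal{D}) \propto \exp\!\left(-\frac{U(\theta)}{T}\right),
\qquad
U(\theta) = -\sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) - \log p(\theta),
\qquad 0 < T < 1 \;\;\text{(``cold'')}.
```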

198 citations


Journal ArticleDOI
TL;DR: This article proposes a long short-term memory (LSTM)-Gauss-NBayes method, which is a synergy of the LSTM-NN and the Gaussian Bayes model for outlier detection in the IIoT, and demonstrates that the proposed techniques outperform the best-known competitors.
Abstract: The data generated by millions of sensors in the industrial Internet of Things (IIoT) are extremely dynamic, heterogeneous, and large scale, and they pose great challenges for real-time analysis and decision making in anomaly detection for the IIoT. In this article, we propose a long short-term memory (LSTM)-Gauss-NBayes method, a synergy of the long short-term memory neural network (LSTM-NN) and the Gaussian Naive Bayes model for outlier detection in the IIoT. In a nutshell, the LSTM-NN builds a model on normal time series and detects outliers by feeding the prediction error to a Gaussian Naive Bayes classifier. Our method exploits the advantages of both models: the strong predictive capability of the LSTM for future time points and the classification performance of the Gaussian Naive Bayes model applied to the prediction error. We evaluate our approach on three real-life datasets that involve both long-term and short-term time dependence. Empirical studies demonstrate that the proposed techniques outperform the best-known competitors, making them a preferable choice for detecting anomalies.
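A minimal sketch of this pipeline, assuming a naive one-step predictor as a stand-in for the trained LSTM and synthetic data with injected anomalies; the only feature passed to the Gaussian Naive Bayes classifier is the prediction error, as described above:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in time series with injected anomalies (label 1).
n = 2000
x = np.sin(np.arange(n) / 20.0) + 0.1 * rng.normal(size=n)
labels = np.zeros(n, dtype=int)
anom_idx = rng.choice(n - 1, size=50, replace=False) + 1
x[anom_idx] += rng.normal(2.0, 0.5, size=anom_idx.size)
labels[anom_idx] = 1

# Stand-in for the LSTM: predict the next point from the previous one.
pred = x[:-1]                  # naive one-step-ahead forecast
err = np.abs(x[1:] - pred)     # prediction error used as the NB feature
y = labels[1:]

# Gaussian Naive Bayes on the prediction error separates normal vs. outlier points.
clf = GaussianNB().fit(err.reshape(-1, 1), y)
print("training accuracy:", clf.score(err.reshape(-1, 1), y))
```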

158 citations


Book ChapterDOI
23 Aug 2020
TL;DR: In this paper, the authors propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions, where each training epoch has a Bayes model whose parameters are specifically learned and deployed.
Abstract: Few-shot learning aims to train efficient predictive models with a few examples. The lack of training data leads to poor models that make high-variance or low-confidence predictions. In this paper, we propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions. “Epoch-wise” means that each training epoch has a Bayes model whose parameters are specifically learned and deployed. “Empirical” means that the hyperparameters, e.g., those used for learning and ensembling the epoch-wise models, are generated by hyperprior learners conditional on task-specific data. We introduce four kinds of hyperprior learners by considering inductive vs. transductive, and epoch-dependent vs. epoch-independent, in the paradigm of meta-learning. We conduct extensive experiments for five-class few-shot tasks on three challenging benchmarks: miniImageNet, tieredImageNet, and FC100, and achieve top performance using the epoch-dependent transductive hyperprior learner, which captures the richest information. Our ablation study shows that both “epoch-wise ensemble” and “empirical” encourage high efficiency and robustness in the model performance (our code is open-sourced at https://gitlab.mpi-klsb.mpg.de/yaoyaoliu/e3bm).

107 citations


Journal ArticleDOI
TL;DR: A novel intelligent system is designed that performs feature selection with a hybrid approach combining rough set theory and Bayes' theorem, reducing the false alarm rate, computational complexity, and training complexity while increasing the detection rate.

83 citations


Posted Content
TL;DR: A novel amortized variational inference that couples all the variational posteriors into a meta-model, which consists of a synthetic gradient network and an initialization network that allows for backpropagating information from unlabeled data, thereby enabling transduction.
Abstract: We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging the unlabeled query set in addition to the support set to generate a more powerful model for each task. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior on the query set of each task. We derive a novel amortized variational inference that couples all the variational posteriors via a meta-model, which consists of a synthetic gradient network and an initialization network. Each variational posterior is derived from synthetic gradient descent to approximate the true posterior on the query set, where we do not have access to the true gradient. Our results on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification outperform previous state-of-the-art methods. In addition, we conduct two zero-shot learning experiments to further explore the potential of the synthetic gradient.

79 citations


Posted Content
TL;DR: The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior or Bayes' rule; the structure of neural networks gives rise to a structured prior in function space that reflects the inductive biases which help them generalize.
Abstract: The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior, or Bayes rule. Bayesian inference is especially compelling for deep neural networks. (1) Neural networks are typically underspecified by the data, and can represent many different but high performing models corresponding to different settings of parameters, which is exactly when marginalization will make the biggest difference for both calibration and accuracy. (2) Deep ensembles have been mistaken as competing approaches to Bayesian methods, but can be seen as approximate Bayesian marginalization. (3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases of neural networks that help them generalize. (4) The observed correlation between parameters in flat regions of the loss and a diversity of solutions that provide good generalization is further conducive to Bayesian marginalization, as flat regions occupy a large volume in a high dimensional space, and each different solution will make a good contribution to a Bayesian model average. (5) Recent practical advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training, while retaining scalability.
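Point (2) is easy to state concretely: averaging the predictive distributions of independently trained ensemble members is a simple Monte Carlo approximation to the Bayesian model average p(y | x, D) = ∫ p(y | x, θ) p(θ | D) dθ. A minimal sketch with hypothetical per-member class probabilities:

```python
import numpy as np

# probs[m, c]: predictive class probabilities from ensemble member m (hypothetical values).
probs = np.array([
    [0.70, 0.20, 0.10],
    [0.55, 0.35, 0.10],
    [0.80, 0.15, 0.05],
])

# Approximate Bayesian model average: p(y | x, D) ~ (1/M) * sum_m p(y | x, theta_m).
bma = probs.mean(axis=0)
print(bma, "predicted class:", bma.argmax())
```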

75 citations


Posted Content
TL;DR: This article showed that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD, and showed that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence.
Abstract: During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are, as of early 2020, no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as a heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.

68 citations


Journal ArticleDOI
TL;DR: Simulation results show that the sampling distributions of the posterior means of these fit indices are similar to their frequentist counterparts across sample sizes, model types, and levels of misspecification when BSEMs are estimated with noninformative priors; applying them to models specified with informative priors, where Bayesian and frequentist estimation methods might not yield similar results, raises additional issues.
Abstract: In a frequentist framework, the exact fit of a structural equation model (SEM) is typically evaluated with the chi-square test and at least one index of approximate fit. Current Bayesian SEM (BSEM) software provides one measure of overall fit: the posterior predictive p value (PPP-χ²). Because of the noted limitations of PPP-χ², common practice for evaluating Bayesian model fit instead focuses on model comparison, using information criteria or Bayes factors. Fit indices developed under maximum-likelihood estimation have not been incorporated into software for BSEM. We propose adapting 7 chi-square-based approximate fit indices for BSEM, using a Bayesian analog of the chi-square model-fit statistic. Simulation results show that the sampling distributions of the posterior means of these fit indices are similar to their frequentist counterparts across sample sizes, model types, and levels of misspecification when BSEMs are estimated with noninformative priors. The proposed fit indices therefore allow overall model-fit evaluation using familiar metrics of the original indices, with an accompanying interval to quantify their uncertainty. Illustrative examples with real data raise some important issues about the proposed fit indices' application to models specified with informative priors, when Bayesian and frequentist estimation methods might not yield similar results. (PsycINFO Database Record (c) 2020 APA, all rights reserved).
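For orientation, the frequentist indices being adapted are simple functions of a chi-square statistic and its degrees of freedom; the proposal above plugs a Bayesian analog of that statistic into such formulas and summarises the resulting posterior distribution of each index. A sketch of two common indices under their standard definitions (illustrative values only, not the paper's Bayesian computation):

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation from a chi-square statistic."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_null, df_null):
    """Comparative fit index relative to the baseline (null) model."""
    num = max(chi2 - df, 0.0)
    den = max(chi2_null - df_null, chi2 - df, 0.0)
    return 1.0 - num / den if den > 0 else 1.0

# Illustrative values only.
print(rmsea(chi2=85.2, df=40, n=500),
      cfi(chi2=85.2, df=40, chi2_null=950.0, df_null=55))
```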

67 citations


Proceedings ArticleDOI
02 Feb 2020
TL;DR: In this article, the authors present a new theory of hypothesis testing built around the S-value, a notion of evidence which, unlike p-values, allows for effortlessly combining evidence from several tests, even in the common scenario where the decision to perform a new test depends on the previous test outcome.
Abstract: We present a new theory of hypothesis testing. The main concept is the S-value, a notion of evidence which, unlike p-values, allows for effortlessly combining evidence from several tests, even in the common scenario where the decision to perform a new test depends on the previous test outcome: safe tests based on S-values generally preserve Type-I error guarantees under such ‘optional continuation’. S-values exist for completely general testing problems with composite null and alternatives. Their prime interpretation is in terms of gambling or investing, each S-value corresponding to a particular investment. Surprisingly, optimal "GROW" S-values, which lead to fastest capital growth, are fully characterized by the joint information projection (JIPr) between the set of all Bayes marginal distributions on H0 and H1. Thus, optimal S-values also have an interpretation as Bayes factors, with priors given by the JIPr. We illustrate the theory using two classical testing scenarios: the one-sample t-test and the 2 × 2 contingency table. In the t-test setting, GROW S-values correspond to adopting the right Haar prior on the variance, like in Jeffreys’ Bayesian t-test. However, unlike Jeffreys’, the default safe t-test puts a discrete 2-point prior on the effect size, leading to better behaviour in terms of statistical power. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, S-values and safe tests may provide a methodology acceptable to adherents of all three schools.
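The calculus behind "optional continuation" is compact enough to state in generic notation (a sketch, not the paper's formal development): an S-value is a non-negative statistic whose expectation is at most one under every null distribution, Markov's inequality then yields the Type-I error guarantee, and a product of S-values computed on successive batches is again an S-value.

```latex
\mathbb{E}_{P}[S] \le 1 \;\; \forall P \in \mathcal{H}_0
\;\Longrightarrow\;
P\!\left(S \ge \tfrac{1}{\alpha}\right) \le \alpha,
\qquad
S_{1:K} = \prod_{k=1}^{K} S_k \;\text{ remains an S-value when each } S_k \text{ is computed on new data.}
```

Formally, the product step requires each S_k to have conditional expectation at most one given the earlier batches, which is what makes stopping or continuing based on past outcomes safe.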

67 citations


Journal ArticleDOI
TL;DR: Modelling revealed that, during an interoceptive perturbation condition (inspiratory breath-holding during heartbeat tapping), healthy individuals assigned greater precision to ascending cardiac signals than individuals with symptoms of anxiety, depression, or co-morbid depression/anxiety, who failed to increase their precision estimates from resting levels.
Abstract: Recent neurocomputational theories have hypothesized that abnormalities in prior beliefs and/or the precision-weighting of afferent interoceptive signals may facilitate the transdiagnostic emergence of psychopathology. Specifically, it has been suggested that, in certain psychiatric disorders, interoceptive processing mechanisms either over-weight prior beliefs or under-weight signals from the viscera (or both), leading to a failure to accurately update beliefs about the body. However, this has not been directly tested empirically. To evaluate the potential roles of prior beliefs and interoceptive precision in this context, we fit a Bayesian computational model to behavior in a transdiagnostic patient sample during an interoceptive awareness (heartbeat tapping) task. Modelling revealed that, during an interoceptive perturbation condition (inspiratory breath-holding during heartbeat tapping), healthy individuals (N = 52) assigned greater precision to ascending cardiac signals than individuals with symptoms of anxiety (N = 15), depression (N = 69), co-morbid depression/anxiety (N = 153), substance use disorders (N = 131), and eating disorders (N = 14), who failed to increase their precision estimates from resting levels. In contrast, we did not find strong evidence for differences in prior beliefs. These results provide the first empirical computational modeling evidence of a selective dysfunction in adaptive interoceptive processing in psychiatric conditions, and lay the groundwork for future studies examining how reduced interoceptive precision influences visceral regulation and interoceptively-guided decision-making.

Posted Content
TL;DR: A sophisticated kind of active inference using a recursive form of expected free energy, which effectively implements a deep tree search over actions and outcomes in the future over sequences of belief states as opposed to states per se.
Abstract: Active inference offers a first principle account of sentient behaviour, from which special and important cases can be derived, e.g., reinforcement learning, active learning, Bayes optimal inference, Bayes optimal design, etc. Active inference resolves the exploitation-exploration dilemma in relation to prior preferences, by placing information gain on the same footing as reward or value. In brief, active inference replaces value functions with functionals of (Bayesian) beliefs, in the form of an expected (variational) free energy. In this paper, we consider a sophisticated kind of active inference, using a recursive form of expected free energy. Sophistication describes the degree to which an agent has beliefs about beliefs. We consider agents with beliefs about the counterfactual consequences of action for states of affairs and beliefs about those latent states. In other words, we move from simply considering beliefs about 'what would happen if I did that' to 'what would I believe about what would happen if I did that'. The recursive form of the free energy functional effectively implements a deep tree search over actions and outcomes in the future. Crucially, this search is over sequences of belief states, as opposed to states per se. We illustrate the competence of this scheme, using numerical simulations of deep decision problems.

Posted Content
TL;DR: A theoretical analysis using the PAC-Bayesian framework is provided and novel generalization bounds for meta-learning with unbounded loss functions and Bayesian base learners are derived, which are used to develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-regularization.
Abstract: Meta-learning can successfully acquire useful inductive biases from data. Yet, its generalization properties to unseen learning tasks are poorly understood. Particularly if the number of meta-training tasks is small, this raises concerns about overfitting. We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning. Using these bounds, we develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-level regularization. Unlike previous PAC-Bayesian meta-learners, our method results in a standard stochastic optimization problem which can be solved efficiently and scales well. When instantiating our PAC-optimal hyper-posterior (PACOH) with Gaussian processes and Bayesian Neural Networks as base learners, the resulting methods yield state-of-the-art performance, both in terms of predictive accuracy and the quality of uncertainty estimates. Thanks to their principled treatment of uncertainty, our meta-learners can also be successfully employed for sequential decision problems.
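For context, a classical single-task PAC-Bayesian bound (for losses bounded in [0, 1]) has the shape below; the paper's contribution is to lift this style of guarantee to the meta-learning level and to unbounded losses, which this sketch does not cover.

```latex
\text{With probability at least } 1-\delta:\quad
\mathbb{E}_{\theta \sim Q}\!\left[L(\theta)\right]
\;\le\;
\mathbb{E}_{\theta \sim Q}\!\left[\hat{L}_n(\theta)\right]
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
\quad \text{simultaneously for all posteriors } Q.
```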

Journal ArticleDOI
TL;DR: A Boolean Bayesian filter is designed that can be utilized to provide the minimum MSE state estimate for the STVBNs and a recursive matrix-based algorithm is obtained to calculate the one-step prediction and estimation of the forward–backward state probability distribution vectors.
Abstract: In this article, a general theoretical framework is developed for the state estimation problem of stochastic time-varying Boolean networks (STVBNs). The STVBN consists of a system model describing the evolution of the Boolean states and a model relating the noisy measurements to the Boolean states. Both the process noise and the measurement noise are characterized by sequences of mutually independent Bernoulli distributed stochastic variables taking values of 1 or 0, which imply that the state/measurement variables may be flipped with certain probabilities. First, an algebraic representation of the STVBNs is derived based on the semitensor product. Then, based on Bayes’ theorem, a recursive matrix-based algorithm is obtained to calculate the one-step prediction and estimation of the forward–backward state probability distribution vectors. Owing to the Boolean nature of the state variables, a Boolean Bayesian filter is designed that provides the minimum mean-square error (MSE) state estimate for the STVBNs. The fixed-interval smoothing filter is also obtained by resorting to the forward–backward technique. Finally, a simulation experiment is carried out for the context estimation problem of the p53-MDM2 negative-feedback gene regulatory network.
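The predict/update recursion behind such a filter is Bayes' theorem applied to a finite probability vector. A minimal sketch for a single Boolean state with an assumed transition matrix and a Bernoulli measurement-flip probability (illustrative numbers, not the paper's semitensor-product construction):

```python
import numpy as np

A = np.array([[0.9, 0.2],   # A[i, j] = P(x_k = i | x_{k-1} = j), states {0, 1}
              [0.1, 0.8]])
p_flip = 0.1                # probability that the measurement flips the true state

def likelihood(z):
    """P(z | x) for x in {0, 1} under the Bernoulli flip model."""
    return np.array([1 - p_flip if z == 0 else p_flip,
                     p_flip if z == 0 else 1 - p_flip])

p = np.array([0.5, 0.5])    # prior belief over the Boolean state
for z in [1, 1, 0, 1]:      # a sequence of noisy Boolean measurements
    p = A @ p                       # prediction step
    p = likelihood(z) * p           # Bayes update (unnormalized)
    p /= p.sum()                    # normalize
    # For {0,1}-valued estimates, minimizing MSE is the same as picking the more probable state.
    print(p, "Boolean estimate:", int(p[1] >= 0.5))
```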

Journal ArticleDOI
TL;DR: A statistical approach is presented to infer the epigenetic architecture of complex disease, determine the variation captured by epigenetic effects, and estimate phenotype-epigenetic probe associations jointly.
Abstract: Linking epigenetic marks to clinical outcomes improves insight into molecular processes, disease prediction, and therapeutic target identification. Here, a statistical approach is presented to infer the epigenetic architecture of complex disease, determine the variation captured by epigenetic effects, and estimate phenotype-epigenetic probe associations jointly. Implicitly adjusting for probe correlations, data structure (cell-count or relatedness), and single-nucleotide polymorphism (SNP) marker effects improves association estimates; in 9,448 individuals, 75.7% (95% CI 71.7–79.3) of body mass index (BMI) variation and 45.6% (95% CI 37.3–51.9) of cigarette consumption variation was captured by whole blood methylation array data. Pathway-linked probes of blood cholesterol, lipid transport and sterol metabolism for BMI, and xenobiotic stimuli response for smoking, showed >1.5 times larger associations with >95% posterior inclusion probability. Prediction accuracy improved by 28.7% for BMI and 10.2% for smoking over a LASSO model, with age- and tissue-specificity, implying that the associations are a phenotypic consequence rather than causal.

Proceedings Article
30 Apr 2020
TL;DR: This work addresses continual learning for non-stationary data, using Bayesian neural networks and memory-based online variational Bayes, by introducing a novel method for sequentially updating both components of the posterior approximation.
Abstract: This work addresses continual learning for non-stationary data, using Bayesian neural networks and memory-based online variational Bayes. We represent the posterior approximation of the network weights by a diagonal Gaussian distribution and a complementary memory of raw data. This raw data corresponds to likelihood terms that cannot be well approximated by the Gaussian. We introduce a novel method for sequentially updating both components of the posterior approximation. Furthermore, we propose Bayesian forgetting and a Gaussian diffusion process for adapting to non-stationary data. The experimental results show that our update method improves on existing approaches for streaming data. Additionally, the adaptation methods lead to better predictive performance for non-stationary data.
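The recursion underlying memory-based online variational Bayes is that the previous posterior approximation acts as the prior for the next batch; a generic statement (omitting the paper's complementary raw-data memory and the Bayesian forgetting/diffusion adaptations) is:

```latex
q_t(\theta) \;=\; \arg\min_{q \in \mathcal{Q}} \;\mathrm{KL}\!\left(q(\theta) \,\Big\|\, \tfrac{1}{Z_t}\, p(\mathcal{D}_t \mid \theta)\, q_{t-1}(\theta)\right),
\qquad
\mathcal{Q} = \{\text{diagonal Gaussians over the network weights}\}.
```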

Journal ArticleDOI
TL;DR: In this paper, the authors studied the mean field method for community detection under the stochastic block model and showed that it has a linear convergence rate and converges to the minimax rate within log n iterations.
Abstract: The mean field variational Bayes method is becoming increasingly popular in statistics and machine learning. Its iterative coordinate ascent variational inference algorithm has been widely applied to large scale Bayesian inference. See Blei et al. (2017) for a recent comprehensive review. Despite the popularity of the mean field method, there exists remarkably little fundamental theoretical justification. To the best of our knowledge, the iterative algorithm has never been investigated for any high-dimensional and complex model. In this paper, we study the mean field method for community detection under the stochastic block model. For an iterative batch coordinate ascent variational inference algorithm, we show that it has a linear convergence rate and converges to the minimax rate within log n iterations. This complements the results of Bickel et al. (2013), which studied the global minimum of the mean field variational Bayes and obtained asymptotic normal estimation of global model parameters. In addition, we obtain similar optimality results for Gibbs sampling and an iterative procedure to calculate maximum likelihood estimation, which can be of independent interest.

Journal ArticleDOI
TL;DR: IBI3 can be used as an instance-level complexity measure of imbalance and BI3 as a criterion to demonstrate the degree to which imbalance deteriorates the classification of a data set, and hence to assess whether it is worth using imbalance recovery methods to recover the performance loss of a classifier.
Abstract: Recent studies of imbalanced data classification have shown that the imbalance ratio (IR) is not the only cause of performance loss in a classifier, as other data factors, such as small disjuncts, noise, and overlapping, can also make the problem difficult. The relationship between the IR and other data factors has been demonstrated, but to the best of our knowledge, there is no measurement of the extent to which class imbalance influences the classification performance of imbalanced data. In addition, it is also unknown which data factor serves as the main barrier for classification in a data set. In this article, we focus on the Bayes optimal classifier and examine the influence of class imbalance from a theoretical perspective. We propose an instance measure called the Individual Bayes Imbalance Impact Index (IBI3) and a data measure called the Bayes Imbalance Impact Index (BI3). IBI3 and BI3 reflect the extent of influence using only the imbalance factor, in terms of each minority class sample and the whole data set, respectively. Therefore, IBI3 can be used as an instance complexity measure of imbalance and BI3 as a criterion to demonstrate the degree to which imbalance deteriorates the classification of a data set. We can, therefore, use BI3 to assess whether it is worth using imbalance recovery methods, such as sampling or cost-sensitive methods, to recover the performance loss of a classifier. The experiments show that IBI3 is highly consistent with the increase of the prediction score obtained by the imbalance recovery methods and that BI3 is highly consistent with the improvement in the F1 score obtained by the imbalance recovery methods on both synthetic and real benchmark data sets.

Journal ArticleDOI
TL;DR: The results imply, in particular, that every NPMLE achieves near parametric risk (up to logarithmic multiplicative factors) when the true density is a discrete Gaussian mixture without any prior information on the number of mixture components.
Abstract: We study the nonparametric maximum likelihood estimator (NPMLE) for estimating Gaussian location mixture densities in d dimensions from independent observations. Unlike usual likelihood-based methods for fitting mixtures, NPMLEs are based on convex optimization. We prove finite sample results on the Hellinger accuracy of every NPMLE. Our results imply, in particular, that every NPMLE achieves near parametric risk (up to logarithmic multiplicative factors) when the true density is a discrete Gaussian mixture without any prior information on the number of mixture components. NPMLEs can naturally be used to yield empirical Bayes estimates of the oracle Bayes estimator in the Gaussian denoising problem. We prove bounds for the accuracy of the empirical Bayes estimate as an approximation to the oracle Bayes estimator. Here our results imply that the empirical Bayes estimator performs at nearly the optimal level (up to logarithmic factors) for denoising in clustering situations without any prior knowledge of the number of clusters.
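To make the "empirical Bayes estimate of the oracle Bayes estimator" concrete: in the Gaussian denoising problem X = θ + Z with Z ~ N(0, 1), the posterior mean under a discrete mixing distribution has a closed form, and the empirical Bayes version plugs in fitted atoms and weights (e.g. from the NPMLE). A sketch with hypothetical atoms and weights:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical fitted mixing distribution (e.g. from an NPMLE): atoms a_j with weights w_j.
atoms = np.array([-2.0, 0.0, 3.0])
weights = np.array([0.3, 0.5, 0.2])

def posterior_mean(x):
    """Bayes (posterior mean) denoiser for X = theta + N(0, 1), theta ~ sum_j w_j * delta_{a_j}."""
    dens = weights * norm.pdf(x - atoms)          # w_j * phi(x - a_j)
    return np.sum(atoms * dens) / np.sum(dens)

for x in [-2.5, 0.4, 2.0]:
    print(x, "->", posterior_mean(x))
```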

Journal ArticleDOI
01 Jun 2020 - Test
TL;DR: An overview of some widely applicable frameworks for regression models in which a response variable is related to smooth functions of some predictor variables and the equivalence of smoothing, Gaussian latent process models and Gaussian random effects is provided.
Abstract: Regression models in which a response variable is related to smooth functions of some predictor variables are popular as a result of their appealing balance between flexibility and interpretability. Since the original generalized additive models of Hastie and Tibshirani (Generalized additive models. Chapman & Hall, Boca Raton, 1990) numerous model extensions have been proposed, and a variety of practically useful computational strategies have emerged. This paper provides an overview of some widely applicable frameworks for this type of modelling, emphasizing the similarities between the different approaches, and the equivalence of smoothing, Gaussian latent process models and Gaussian random effects. The focus is particularly on Bayes empirical smoother theory, fully Bayesian inference via stochastic simulation or integrated nested Laplace approximation and boosting.
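The "equivalence of smoothing, Gaussian latent process models and Gaussian random effects" can be summarised in one line: the penalized least squares fit of a smooth is also the posterior mean of the basis coefficients under a corresponding Gaussian prior. In generic penalized-spline notation (a sketch, assuming a basis matrix X and penalty matrix S):

```latex
\hat{\beta} = \arg\min_{\beta}\; \|y - X\beta\|^{2} + \lambda\, \beta^{\top} S \beta
            = \left(X^{\top}X + \lambda S\right)^{-1} X^{\top} y,
```

which is exactly the posterior mean when y | β ~ N(Xβ, σ²I) and β ~ N(0, (σ²/λ) S⁻), with S⁻ a generalized inverse of the penalty matrix.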

Journal ArticleDOI
03 Jan 2020 - Energies
TL;DR: The results obtained reveal that the proposed NB classifier outperforms both MLP and the Bayes classifier in terms of accuracy rate, misclassification rate, kappa statistics, mean absolute error (MAE), root mean square error (RMSE), percentage relative absolute error (% RAE) and percentage root relative square error (% RRSE).
Abstract: This paper presents a methodology to detect and identify the type of fault that occurs in a shunt compensated static synchronous compensator (STATCOM) transmission line using a combination of Discrete Wavelet Transform (DWT) and Naive Bayes (NB) classifiers. To study this, the network model is designed using Matlab/Simulink. Different types of faults, such as Line to Ground (LG), Line to Line (LL), Double Line to Ground (LLG) and the three-phase (LLLG) fault, are applied at disparate zones of the system, with and without STATCOM, considering the effect of varying fault resistance. The three-phase fault current waveforms obtained are decomposed into several levels using the Daubechies (db4) mother wavelet to extract features such as the standard deviation (SD) and energy values. The extracted features are then used to train classifiers, namely the Multi-Layer Perceptron Neural Network (MLP), the Bayes classifier and the Naive Bayes (NB) classifier, to classify the type of fault that occurs in the system. The results obtained reveal that the proposed NB classifier outperforms both the MLP and the Bayes classifier in terms of accuracy rate, misclassification rate, kappa statistics, mean absolute error (MAE), root mean square error (RMSE), percentage relative absolute error (% RAE) and percentage root relative square error (% RRSE).
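A compact sketch of the feature-extraction-plus-classification pipeline described above, assuming PyWavelets for the db4 decomposition and synthetic waveforms in place of the simulated fault currents; the standard deviation and energy of each coefficient level serve as the features for the Naive Bayes classifier:

```python
import numpy as np
import pywt
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

def dwt_features(signal, wavelet="db4", level=4):
    """Per-level standard deviation and energy of the DWT coefficients."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([v for c in coeffs for v in (np.std(c), np.sum(c**2))])

# Synthetic stand-ins for fault-current waveforms of two fault classes.
def make_signal(noise_amplitude):
    t = np.linspace(0, 0.2, 1024)
    return np.sin(2 * np.pi * 50 * t) + noise_amplitude * rng.normal(size=t.size)

X = np.array([dwt_features(make_signal(a)) for a in [0.1] * 40 + [0.8] * 40])
y = np.array([0] * 40 + [1] * 40)   # two illustrative fault classes

clf = GaussianNB().fit(X, y)
print("training accuracy:", clf.score(X, y))
```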

Journal ArticleDOI
16 Oct 2020
TL;DR: In this paper, the state of the art of statistical modelling as applied to plant breeding is presented, with emphasis on model selection and parameter estimation in a practical way.
Abstract: This paper presents the state of the art of statistical modelling as applied to plant breeding. Classes of inference, statistical models, estimation methods and model selection are emphasized in a practical way. Restricted Maximum Likelihood (REML), Hierarchical Maximum Likelihood (HIML) and Bayesian (BAYES) approaches are highlighted. Distributions of data and effects, and the dimension and structure of the models, are considered for model selection and parameter estimation. Theory and practical examples of selecting between models with different fixed-effects factors are given using Full Maximum Likelihood (FML). An analytical FML way of defining random or fixed effects is presented to avoid the usual subjective or conceptual definitions. Examples of the application of the Hierarchical Maximum Likelihood/Hierarchical Generalized Best Linear Unbiased Prediction (HIML/HG-BLUP) procedure are also presented. Sample sizes for achieving high experimental quality and accuracy are indicated, and simple interpretations of the estimates of key genetic parameters are given. Phenomics and genomics are also addressed. Maximum accuracy under the truest model is the key to achieving efficacy in plant breeding programs.

Journal ArticleDOI
TL;DR: The authors show that confidence reports are best explained by the difference between the posterior probabilities of the best and the next-best options, rather than by the posterior probability of the chosen (best) option alone, or by the overall uncertainty of the posterior distribution.
Abstract: Decision confidence reflects our ability to evaluate the quality of decisions and guides subsequent behavior. Experiments on confidence reports have almost exclusively focused on two-alternative decision-making. In this realm, the leading theory is that confidence reflects the probability that a decision is correct (the posterior probability of the chosen option). There is, however, another possibility, namely that people are less confident if the best two options are closer to each other in posterior probability, regardless of how probable they are in absolute terms. This possibility has not previously been considered because in two-alternative decisions, it reduces to the leading theory. Here, we test this alternative theory in a three-alternative visual categorization task. We found that confidence reports are best explained by the difference between the posterior probabilities of the best and the next-best options, rather than by the posterior probability of the chosen (best) option alone, or by the overall uncertainty (entropy) of the posterior distribution. Our results upend the leading notion of decision confidence and instead suggest that confidence reflects the observer’s subjective probability that they made the best possible decision. Conventional theory suggests that people’s confidence about a decision reflects their subjective probability that the decision was correct. By studying decisions with multiple alternatives, the authors show that confidence reports instead reflect the difference in probabilities between the chosen and the next-best alternative.
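The three candidate read-outs of confidence compared above can be written in a few lines: the posterior probability of the chosen (best) option, the difference between the best and next-best probabilities (the account favoured by the data), and the negative entropy of the posterior. A sketch with an illustrative three-alternative posterior:

```python
import numpy as np

posterior = np.array([0.48, 0.42, 0.10])   # illustrative posterior over three categories
p_sorted = np.sort(posterior)[::-1]

conf_max = p_sorted[0]                     # posterior probability of the chosen option
conf_diff = p_sorted[0] - p_sorted[1]      # best minus next-best (favoured by the results above)
entropy = -np.sum(posterior * np.log(posterior))
conf_entropy = -entropy                    # lower entropy -> higher confidence

print(conf_max, conf_diff, conf_entropy)
```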

Journal ArticleDOI
TL;DR: It is shown that Bayes’ rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents, a compelling benefit because parallel documents are not always available.
Abstract: We show that Bayes’ rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents, a compelling benefit because parallel documents are not always available.
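The factorisation behind this use of Bayes' rule is the document-level noisy channel: the translation model needs only parallel sentences, while the language model over target documents can be trained on monolingual text. In generic notation:

```latex
\hat{y} = \arg\max_{y}\; p(y \mid x) = \arg\max_{y}\; p(x \mid y)\, p(y),
```

where x is the source document, p(x | y) is a (sentence-level) translation model and p(y) is a document-level language model.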

Journal ArticleDOI
TL;DR: This article exploits recent theoretical and computational advances to carry out modeling at the continuous spatial level, which induces a spatial model for the discrete areas and presents two approaches: a fully Bayesian implementation using a Hamiltonian Monte Carlo algorithm and an empirical Bayes implementation, that is much faster and is based on Laplace approximations.
Abstract: The analysis of area-level aggregated summary data is common in many disciplines, including epidemiology and the social sciences. Typically, Markov random field spatial models have been employed to acknowledge spatial dependence and allow data-driven smoothing. In the context of an irregular set of areas, these models always have an ad hoc element with respect to the definition of a neighborhood scheme. In this article, we exploit recent theoretical and computational advances to carry out modeling at the continuous spatial level, which induces a spatial model for the discrete areas. This approach also allows reconstruction of the continuous underlying surface, but the interpretation of such surfaces is delicate, since it depends on the quality, extent and configuration of the observed data. We focus on models based on stochastic partial differential equations. We also consider the interesting case in which the aggregate data are supplemented with point data. We carry out Bayesian inference and, in the language of generalized linear mixed models, if the link is linear, an efficient implementation of the model is available via integrated nested Laplace approximations. For nonlinear links, we present two approaches: a fully Bayesian implementation using a Hamiltonian Monte Carlo algorithm and an empirical Bayes implementation, that is much faster and is based on Laplace approximations. We examine the properties of the approach using simulation, and then apply the model to the classic Scottish lip cancer data.

Journal ArticleDOI
TL;DR: A thinking guideline to assist researchers in conducting Bayesian inference in the social and behavioural sciences and a summary of agreements and disagreements of the authors on several discussion points regardingBayesian inference are provided.
Abstract: Why is there no consensual way of conducting Bayesian analyses? We present a summary of agreements and disagreements of the authors on several discussion points regarding Bayesian inference. We also provide a thinking guideline to assist researchers in conducting Bayesian inference in the social and behavioural sciences.

Proceedings Article
30 Apr 2020
TL;DR: This article proposes a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging unlabeled information in the query set to learn a more powerful meta-model.
Abstract: We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging unlabeled information in the query set to learn a more powerful meta-model. To develop our framework we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior of each task. We derive a novel amortized variational inference that couples all the variational posteriors into a meta-model, which consists of a synthetic gradient network and an initialization network. The combination of local KL divergences and synthetic gradient network allows for backpropagating information from unlabeled data, thereby enabling transduction. Our results on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification significantly outperform previous state-of-the-art methods.

Journal ArticleDOI
TL;DR: It is shown that this theory can explain why and when people underreact to the data or the prior, and a new experiment demonstrates that these two forms of underreaction can be systematically controlled by manipulating the query distribution.
Abstract: Bayesian theories of cognition assume that people can integrate probabilities rationally. However, several empirical findings contradict this proposition: human probabilistic inferences are prone to systematic deviations from optimality. Puzzlingly, these deviations sometimes go in opposite directions. Whereas some studies suggest that people underreact to prior probabilities (base rate neglect), other studies find that people underreact to the likelihood of the data (conservatism). We argue that these deviations arise because the human brain does not rely solely on a general-purpose mechanism for approximating Bayesian inference that is invariant across queries. Instead, the brain is equipped with a recognition model that maps queries to probability distributions. The parameters of this recognition model are optimized to get the output as close as possible, on average, to the true posterior. Because of our limited computational resources, the recognition model will allocate its resources so as to be more accurate for high probability queries than for low probability queries. By adapting to the query distribution, the recognition model learns to infer. We show that this theory can explain why and when people underreact to the data or the prior, and a new experiment demonstrates that these two forms of underreaction can be systematically controlled by manipulating the query distribution. The theory also explains a range of related phenomena: memory effects, belief bias, and the structure of response variability in probabilistic reasoning. We also discuss how the theory can be integrated with prior sampling-based accounts of approximate inference. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
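To make "underreacting to the prior" concrete, here is the textbook worked example with illustrative numbers (not taken from the paper): with a 1% base rate, a 90% hit rate and a 5% false-positive rate, full Bayesian updating gives a posterior far below the intuitive answer, and judgments above this value exhibit base rate neglect.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D \mid H)\, P(H) + P(D \mid \neg H)\, P(\neg H)}
            = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.05 \times 0.99} \approx 0.154 .
```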

Journal ArticleDOI
TL;DR: In this paper, a connection between empirical Bayes posterior distributions and false discovery rate (FDR) control is explored in the Gaussian sequence model, and it is shown that empirical Bayes-calibrated spike and slab posterior distributions allow correct FDR control under sparsity.
Abstract: This paper explores a connection between empirical Bayes posterior distributions and false discovery rate (FDR) control. In the Gaussian sequence model, this work shows that empirical Bayes-calibrated spike and slab posterior distributions allow a correct FDR control under sparsity. Doing so, it offers a frequentist theoretical validation of empirical Bayes methods in the context of multiple testing. Our theoretical results are illustrated with numerical experiments.
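The quantity connecting the empirical Bayes posterior to FDR control here is the posterior probability that a coordinate is null. Under a spike and slab prior with estimated weight ŵ on the slab g, Bayes' theorem gives, in generic notation (a sketch, not the paper's exact procedure):

```latex
\ell(x_i) = P\!\left(\theta_i = 0 \mid x_i\right)
          = \frac{(1 - \hat{w})\, \varphi(x_i)}{(1 - \hat{w})\, \varphi(x_i) + \hat{w}\, (g \ast \varphi)(x_i)},
```

where φ is the standard normal density; coordinates whose value of ℓ falls below a chosen threshold are reported as discoveries.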

Journal ArticleDOI
TL;DR: It is concluded that BN incorporating genetic anchors is a useful complementary method to conventional MR for exploring causal relationships in complex data sets such as those generated from modern “omics” technologies.
Abstract: Mendelian randomization (MR) implemented through instrumental variables analysis is an increasingly popular causal inference tool used in genetic epidemiology. But it can have limitations for evaluating simultaneous causal relationships in complex data sets that include, for example, multiple genetic predictors and multiple potential risk factors associated with the same genetic variant. Here we use real and simulated data to investigate Bayesian network analysis (BN) with the incorporation of directed arcs, representing genetic anchors, as an alternative approach. A Bayesian network describes the conditional dependencies/independencies of variables using a graphical model (a directed acyclic graph) with an accompanying joint probability. In real data, we found BN could be used to infer simultaneous causal relationships that confirmed the individual causal relationships suggested by bi-directional MR, while allowing for the existence of potential horizontal pleiotropy (that would violate MR assumptions). In simulated data, BN with two directional anchors (mimicking genetic instruments) had greater power for a fixed type 1 error than bi-directional MR, while BN with a single directional anchor performed better than or as well as bi-directional MR. Both BN and MR could be adversely affected by violations of their underlying assumptions (such as genetic confounding due to unmeasured horizontal pleiotropy). BN with no directional anchor generated inference that was no better than by chance, emphasizing the importance of directional anchors in BN (as in MR). Under highly pleiotropic simulated scenarios, BN outperformed both MR (and its recent extensions) and two recently-proposed alternative approaches: a multi-SNP mediation intersection-union test (SMUT) and a latent causal variable (LCV) test. We conclude that BN incorporating genetic anchors is a useful complementary method to conventional MR for exploring causal relationships in complex data sets such as those generated from modern "omics" technologies.