
Showing papers on "Bayes' theorem published in 1999"


Journal ArticleDOI
TL;DR: The second article on evidence-based statistics explores the inductive Bayesian approach to measuring evidence and combining information and addresses the epistemologic uncertainties that affect beliefs in the absence of evidence.
Abstract: The second article on evidence-based statistics explores the inductive Bayesian approach to measuring evidence and combining information and addresses the epistemologic uncertainties that affect al...

809 citations


Proceedings Article
30 Jul 1999
TL;DR: The Variational Bayes framework as discussed by the authors approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner without resorting to sampling methods.
Abstract: Current methods for learning graphical models with latent variables and a fixed structure estimate optimal values for the model parameters. Whereas this approach usually produces overfitting and suboptimal generalization performance, carrying out the Bayesian program of computing the full posterior distributions over the parameters remains a difficult problem. Moreover, learning the structure of models with latent variables, for which the Bayesian approach is crucial, is yet a harder problem. In this paper I present the Variational Bayes framework, which provides a solution to these problems. This approach approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner without resorting to sampling methods. Unlike in the Laplace approximation, these posteriors are generally non-Gaussian and no Hessian needs to be computed. The resulting algorithm generalizes the standard Expectation Maximization algorithm, and its convergence is guaranteed. I demonstrate that this algorithm can be applied to a large class of models in several domains, including unsupervised clustering and blind source separation.
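As a concrete illustration of the mean-field idea behind this framework (not the paper's graphical-model algorithm itself), here is a minimal Python sketch of variational Bayes for a Gaussian with unknown mean and precision, iterating the factorized updates for q(mu) and q(tau) until convergence; the data and all hyperparameter values are arbitrary.

import numpy as np

# Synthetic data
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)
N, xbar = x.size, x.mean()

# Priors: p(mu | tau) = N(mu0, (lam0*tau)^-1), p(tau) = Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

# Mean-field factors: q(mu) = N(mu_N, 1/lam_N), q(tau) = Gamma(a_N, b_N)
E_tau = 1.0
for _ in range(100):
    # Update q(mu) given the current E[tau]
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # Update q(tau) given the first two moments of q(mu)
    a_N = a0 + 0.5 * (N + 1)
    E_ss = np.sum((x - mu_N) ** 2) + N / lam_N      # E_q[sum_i (x_i - mu)^2]
    E_pr = (mu_N - mu0) ** 2 + 1.0 / lam_N          # E_q[(mu - mu0)^2]
    b_N = b0 + 0.5 * (E_ss + lam0 * E_pr)
    E_tau = a_N / b_N

print("approximate posterior mean of mu:", round(mu_N, 3))
print("approximate posterior sigma:", round((b_N / a_N) ** 0.5, 3))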

615 citations


Journal ArticleDOI
TL;DR: The BFS is compared with Monte Carlo simulation and “ensemble forecasting” technique, none of which can alone produce a probabilistic forecast that meets requirements of rational decision making, but each can serve as a component of the BFS.
Abstract: Rational decision making (for flood warning, navigation, or reservoir systems) requires that the total uncertainty about a hydrologic predictand (such as river stage, discharge, or runoff volume) be quantified in terms of a probability distribution, conditional on all available information and knowledge. Hydrologic knowledge is typically embodied in a deterministic catchment model. Fundamentals are presented of a Bayesian forecasting system (BFS) for producing a probabilistic forecast of a hydrologic predictand via any deterministic catchment model. The BFS decomposes the total uncertainty into input uncertainty and hydrologic uncertainty, which are quantified independently and then integrated into a predictive (Bayes) distribution. This distribution results from a revision of a prior (climatic) distribution, is well calibrated, and has a nonnegative ex ante economic value. The BFS is compared with Monte Carlo simulation and “ensemble forecasting” technique, none of which can alone produce a probabilistic forecast that meets requirements of rational decision making, but each can serve as a component of the BFS.
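The BFS itself quantifies input and hydrologic uncertainty separately and then integrates them; the sketch below illustrates only the Bayesian revision step, updating a climatic (prior) normal distribution of river stage with a deterministic model forecast treated as a noisy observation. The normal-normal error model, the numbers, and the flood threshold are all hypothetical.

import numpy as np
from scipy.stats import norm

# Climatic (prior) distribution of river stage from historical records (hypothetical)
prior_mean, prior_sd = 3.0, 1.2            # metres

# Deterministic catchment-model forecast plus a Gaussian error model standing in
# for hydrologic uncertainty (hypothetical values)
forecast, forecast_err_sd = 4.1, 0.5

# Conjugate normal-normal revision of the prior into a predictive (Bayes) distribution
prior_prec, like_prec = 1 / prior_sd**2, 1 / forecast_err_sd**2
post_prec = prior_prec + like_prec
post_mean = (prior_prec * prior_mean + like_prec * forecast) / post_prec
post_sd = post_prec ** -0.5

print(f"posterior stage ~ N({post_mean:.2f}, {post_sd:.2f}^2)")
print("P(stage > 4.5 m) =", round(1 - norm.cdf(4.5, post_mean, post_sd), 3))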

429 citations


Proceedings Article
29 Nov 1999
TL;DR: This work presents an efficient algorithm for learning Bayes networks from data by first identifying each node's Markov blankets, then connecting nodes in a maximally consistent way, and proves that under mild assumptions, the approach requires time polynomial in the size of the data and the number of nodes.
Abstract: In recent years, Bayesian networks have become a highly successful tool for diagnosis, analysis, and decision making in real-world domains. We present an efficient algorithm for learning Bayes networks from data. Our approach constructs Bayesian networks by first identifying each node's Markov blankets, then connecting nodes in a maximally consistent way. In contrast to the majority of work, which typically uses hill-climbing approaches that may produce dense and causally incorrect nets, our approach yields much more compact causal networks by heeding independencies in the data. Compact causal networks facilitate fast inference and are also easier to understand. We prove that under mild assumptions, our approach requires time polynomial in the size of the data and the number of nodes. A randomized variant, also presented here, yields comparable results at much higher speeds.
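A rough sketch of the Markov-blanket identification step for a single target variable, in the grow-then-shrink spirit described above; a thresholded conditional-mutual-information estimate on binary data stands in for a proper statistical independence test, and the threshold, helper names, and toy network are ours.

import numpy as np
from itertools import product

def cond_mi(data, x, y, cond):
    """Estimate I(X; Y | cond) in nats for 0/1 columns of `data`."""
    n = len(data)
    cmi = 0.0
    for s in product([0, 1], repeat=len(cond)):
        mask = np.all(data[:, cond] == s, axis=1) if cond else np.ones(n, bool)
        ns = mask.sum()
        if ns == 0:
            continue
        sub = data[mask]
        for a, b in product([0, 1], repeat=2):
            pxy = np.mean((sub[:, x] == a) & (sub[:, y] == b))
            px, py = np.mean(sub[:, x] == a), np.mean(sub[:, y] == b)
            if pxy > 0:
                cmi += (ns / n) * pxy * np.log(pxy / (px * py))
    return cmi

def markov_blanket(data, target, eps=0.01):
    nodes = [v for v in range(data.shape[1]) if v != target]
    mb = []
    changed = True
    while changed:                          # grow: add variables still dependent on the target
        changed = False
        for v in nodes:
            if v not in mb and cond_mi(data, target, v, mb) > eps:
                mb.append(v)
                changed = True
    for v in list(mb):                      # shrink: drop variables rendered independent
        rest = [u for u in mb if u != v]
        if cond_mi(data, target, v, rest) <= eps:
            mb.remove(v)
    return mb

# Toy chain X0 -> X1 -> X2 with 10% flip noise; the blanket of X2 should be {1}
rng = np.random.default_rng(1)
x0 = rng.integers(0, 2, 5000)
x1 = x0 ^ (rng.random(5000) < 0.1).astype(int)
x2 = x1 ^ (rng.random(5000) < 0.1).astype(int)
print(markov_blanket(np.column_stack([x0, x1, x2]), target=2))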

347 citations


Journal ArticleDOI
21 Aug 1999-BMJ
TL;DR: Current thinking on the value of the Bayesian approach to health technology assessment is reviewed, and it is argued that a bayesian approach allows conclusions to be provided in a form that is most suitable for decisions specific to patients and decisions affecting public policy.
Abstract: This is the third of four articles Bayes's theorem arose from a posthumous publication in 1763 by Thomas Bayes, a non-conformist minister from Tunbridge Wells. Although it gives a simple and uncontroversial result in probability theory, specific uses of the theorem have been the subject of considerable controversy for more than two centuries. In recent years a more balanced and pragmatic perspective has emerged, and in this paper we review current thinking on the value of the Bayesian approach to health technology assessment. A concise definition of bayesian methods in health technology assessment has not been established, but we suggest the following: the explicit quantitative use of external evidence in the design, monitoring, analysis, interpretation, and reporting of a health technology assessment. This approach acknowledges that judgments about the benefits of a new technology will rarely be based solely on the results of a single study but should synthesise evidence from multiple sources—for example, pilot studies, trials of similar interventions, and even subjective judgments about the generalisability of the study's results. A bayesian perspective leads to an approach to clinical trials that is claimed to be more flexible and ethical than traditional methods,1 and to elegant ways of handling multiple substudies—for example, when simultaneously estimating the effects of a treatment on many subgroups.2 Proponents have also argued that a bayesian approach allows conclusions to be provided in a form that is most suitable for decisions specific to patients and decisions affecting public policy.3 #### Summary points Bayesian methods interpret data from a study in the light of external evidence and judgment, and the form in which conclusions are drawn contributes naturally to decision making Prior plausibility of hypotheses is taken into account, just as when interpreting the results of a diagnostic test Scepticism about large treatment effects can be formally …

324 citations


Journal ArticleDOI
TL;DR: Some of the issues in developing adaptive methods for Markov chain Monte Carlo methods are outlined and some preliminary results are presented.
Abstract: Monte Carlo methods, in particular Markov chain Monte Carlo methods, have become increasingly important as a tool for practical Bayesian inference in recent years. A wide range of algorithms is available, and choosing an algorithm that will work well on a specific problem is challenging. It is therefore important to explore the possibility of developing adaptive strategies that choose and adjust the algorithm to a particular context based on information obtained during sampling as well as information provided with the problem. This paper outlines some of the issues in developing adaptive methods and presents some preliminary results.
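One of the simplest adaptive strategies of the kind the paper has in mind is tuning a random-walk Metropolis proposal scale toward a target acceptance rate during a preliminary adaptation phase. The sketch below does this for a standard normal target; the target density, the adaptation rule, and all constants are illustrative rather than taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    return -0.5 * x**2                      # unnormalized log density of N(0, 1)

x, scale = 0.0, 5.0                         # deliberately poor initial proposal scale
target_accept, accepts, samples = 0.44, 0, []

for i in range(1, 20001):
    prop = x + scale * rng.normal()
    if np.log(rng.random()) < log_target(prop) - log_target(x):
        x, accepts = prop, accepts + 1
    samples.append(x)
    # Adapt only during the first 5000 iterations, so the retained chain is standard MH:
    # nudge the log proposal scale toward the target acceptance rate.
    if i <= 5000:
        rate = accepts / i
        scale *= np.exp((rate - target_accept) / np.sqrt(i))

kept = np.array(samples[5000:])
print("adapted proposal scale:", round(scale, 2))
print("sample mean, sd:", round(kept.mean(), 2), round(kept.std(), 2))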

293 citations


Journal ArticleDOI
TL;DR: A theory of naive probability is outlined, which explains how naive reasoners can infer posterior probabilities without relying on Bayes's theorem, and predicts several phenomena of reasoning about absolute probabilities, including typical biases.
Abstract: This article outlines a theory of naive probability. According to the theory, individuals who are unfamiliar with the probability calculus can infer the probabilities of events in an extensional way: They construct mental models of what is true in the various possibilities. Each model represents an equiprobable alternative unless individuals have beliefs to the contrary, in which case some models will have higher probabilities than others. The probability of an event depends on the proportion of models in which it occurs. The theory predicts several phenomena of reasoning about absolute probabilities, including typical biases. It correctly predicts certain cognitive illusions in inferences about relative probabilities. It accommodates reasoning based on numerical premises, and it explains how naive reasoners can infer posterior probabilities without relying on Bayes's theorem. Finally, it dispels some common misconceptions of probabilistic reasoning. The defence were permitted to lead evidence of the Bayes Theorem in connection with the statistical evaluation of the DNA profile. Although their Lordships expressed no concluded view on the matter, they had very grave doubts as to whether that evidence was properly admissible . .. their Lordships had never heard it suggested that a jury

283 citations


Journal ArticleDOI
TL;DR: This essay uses Bayes' theorem and concepts from decision theory to describe and explain some well-documented errors in clinical reasoning and heuristics and biases that produce these errors.
Abstract: Many clinical decisions are made in uncertainty. When the diagnosis is uncertain, the goal is to establish a diagnosis or to treat even if the diagnosis remains unknown. If the diagnosis is known (e.g., breast cancer or prostate cancer) but the treatment is risky and its outcome uncertain, still a choice must be made. In researching the psychology of clinical judgment and decision making, the major strategy is to compare observed clinical judgments and decisions with the normative model established by statistical decision theory. In this framework, the process of diagnosing is conceptualized as using imperfect information to revise opinions; Bayes' theorem is the formal rule for updating a diagnosis as new data are available. Treatment decisions should be made so as to maximize expected value. This essay uses Bayes' theorem and concepts from decision theory to describe and explain some well-documented errors in clinical reasoning. Heuristics and biases are the cognitive factors that produce these errors.
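To make the normative benchmark concrete, here is a small sketch of the two steps the essay describes: revising a disease probability with Bayes' theorem when a test result arrives, then choosing the action with the highest expected utility. The prevalence, test characteristics, and utilities are invented for illustration.

def bayes_update(prior, sensitivity, specificity, positive=True):
    """Posterior probability of disease after one test result."""
    if positive:
        num = prior * sensitivity
        den = num + (1 - prior) * (1 - specificity)
    else:
        num = prior * (1 - sensitivity)
        den = num + (1 - prior) * specificity
    return num / den

p = bayes_update(prior=0.02, sensitivity=0.90, specificity=0.95, positive=True)
print("post-test probability of disease:", round(p, 3))

# Expected utility of treating vs. waiting (utilities on a 0-1 scale, invented)
utility = {("treat", "disease"): 0.85, ("treat", "healthy"): 0.95,
           ("wait", "disease"): 0.40, ("wait", "healthy"): 1.00}
for action in ("treat", "wait"):
    eu = p * utility[(action, "disease")] + (1 - p) * utility[(action, "healthy")]
    print(action, "expected utility:", round(eu, 3))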

276 citations


Journal ArticleDOI
TL;DR: A thought-provoking discussion of the basis of the Bayesian information criterion (BIC), where Weakliem seems to agree that the Bayes factor framework is a useful one for hypothesis testing and model selection; his concern is with how the Bayes factors are to be evaluated.
Abstract: I would like to thank David L. Weakliem (1999 [this issue]) for a thought-provoking discussion of the basis of the Bayesian information criterion (BIC). We may be in closer agreement than one might think from reading his article. When writing about Bayesian model selection for social researchers, 1 I focused on the BIC approximation on the grounds that it is easily implemented and often reasonable, and simplifies the exposition of an already technical topic. As Weakliem says, BIC corresponds to one of many possible priors, although I will argue that this prior is such as to make BIC appropriate for baseline reference use and reporting, albeit not necessarily always appropriate for drawing final conclusions. When writing about the same subject for statistical journals, 2 however, I have paid considerable attention to the choice of priors for Bayes factors. I thank Weakliem for bringing this subtle but important topic to the attention of sociologists. In 1986, I proposed replacing Pvalues by Bayes factors as the basis for hypothesis testing and model selection in social research, and I suggested BIC as a simple and convenient, albeit crude, approximation. Since then, a great deal has been learned about Bayes factors in general, and about BIC in particular. Weakliem seems to agree that the Bayes factor framework is a useful one for hypothesis testing and model selection; his concern is with how the Bayes factors are to be evaluated. Weakliem makes two main points about the BIC approximation. The first is that BIC yields an approximation to Bayes factors that corresponds closely to a particular prior (the unit information prior) on
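For readers who want the BIC-Bayes factor relationship in computable form, here is a hedged sketch for the simplest case: a Gaussian mean with known variance, comparing the point null mu = 0 against an alternative with a N(0, 1) prior on mu (a unit-information-style prior). The data and prior scale are arbitrary; the point is only that exp((BIC0 - BIC1)/2) tracks the exact Bayes factor.

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(0.3, 1.0, size=30)           # data with a small true effect
n = x.size

def loglik(mu):
    return np.sum(norm.logpdf(x, loc=mu, scale=1.0))

# Exact Bayes factor B10: marginal likelihood of M1 (mu ~ N(0,1)) over that of M0 (mu = 0)
m1, _ = quad(lambda mu: np.exp(loglik(mu) + norm.logpdf(mu, 0, 1)), -5, 5)
m0 = np.exp(loglik(0.0))
print("exact B10     :", round(m1 / m0, 3))

# BIC approximation: BIC_i = -2 * max log-likelihood + k_i * log(n)
bic0 = -2 * loglik(0.0)                      # no free parameters under the null
bic1 = -2 * loglik(x.mean()) + 1 * np.log(n)
print("BIC-based B10 :", round(np.exp((bic0 - bic1) / 2), 3))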

232 citations


Journal ArticleDOI
19 Nov 1999-Science
TL;DR: Bayesian statistics, which allows researchers to use everything from hunches to hard data to compute the probability that a hypothesis is correct, is experiencing a renaissance in fields of science ranging from astrophysics to genomics and in real-world applications such as testing new drugs and setting catch limits for fish.
Abstract: Bayesian statistics, which allows researchers to use everything from hunches to hard data to compute the probability that a hypothesis is correct (see p. 1461), is experiencing a renaissance in fields of science ranging from astrophysics to genomics and in real-world applications such as testing new drugs and setting catch limits for fish. Advances in computers and the limitations of traditional statistical methods are part of the reason for the new popularity of this approach, first proposed in a 1763 paper by the Reverend Thomas Bayes. In addition, advocates say it produces answers that are easier to understand and forces users to be explicit about biases obscured by reigning "frequentist" approaches. Detractors, on the other hand, fear that because Bayesian analysis can take into account prior opinion, it could spawn less objective evaluations of experimental results.

220 citations


Journal ArticleDOI
TL;DR: The basic concepts of frequentist and Bayesian techniques for the identification of model parameters and the estimation of model prediction uncertainty are briefly reviewed in this paper, where a simple example with synthetically generated data sets of a model for microbial substrate conversion is used as a didactical tool for analyzing strengths and weaknesses of both techniques in the context of environmental system identification.

Posted Content
TL;DR: The relationship between the Bayesian approach and the minimum description length approach is established in this article, where the authors sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity.
Abstract: The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's minimal sufficient statistic. In general we show that data compression is almost always the best strategy, both in hypothesis identification and prediction.
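A toy sketch of the two-part code idea the abstract refers to: pick the hypothesis that minimizes code length of the hypothesis plus code length of the data given the hypothesis. Here the hypothesis class is a small grid of Bernoulli biases with a uniform hypothesis code, which is our simplification; the ideal MDL of the paper is defined via Kolmogorov complexity and is uncomputable.

import numpy as np

rng = np.random.default_rng(7)
data = rng.random(200) < 0.3                 # 200 coin flips with true bias 0.3
n, k = data.size, int(data.sum())

# Hypotheses: Bernoulli biases on a coarse grid, each costing log2(grid size) bits
grid = np.linspace(0.05, 0.95, 19)
L_hyp = np.log2(len(grid))                   # -log P(H) under a uniform hypothesis code

best = None
for p in grid:
    # -log P(D | H): Shannon code length of the data under bias p, in bits
    L_data = -(k * np.log2(p) + (n - k) * np.log2(1 - p))
    total = L_hyp + L_data
    if best is None or total < best[0]:
        best = (total, p)

print(f"MDL choice: p = {best[1]:.2f}, total code length = {best[0]:.1f} bits")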


Book
01 Jan 1999
TL;DR: The book covers the elements of inference, including statistical models, hypothesis testing and confidence intervals, Bayesian and classical estimation of linear models, and other analytical approximations.
Abstract: Table of contents:
Introduction: Information; The concept of probability; Assessing subjective probabilities; An example; Linear algebra and probability; Notation; Outline of the book
Elements of Inference: Common statistical models; Likelihood-based functions; Bayes theorem; Exchangeability; Sufficiency and exponential family; Parameter elimination
Prior Distribution: Entirely subjective specification; Specification through functional forms; Conjugacy with the exponential family; Non-informative priors; Hierarchical priors
Estimation: Introduction to decision theory; Bayesian point estimation; Classical point estimation; Empirical Bayes estimation; Comparison of estimators; Interval estimation; Estimation in the Normal model
Approximating Methods: The general problem of inference; Optimization techniques; Asymptotic theory; Other analytical approximations; Numerical integration methods; Simulation methods
Hypothesis Testing: Introduction; Classical hypothesis testing; Bayesian hypothesis testing; Hypothesis testing and confidence intervals; Asymptotic tests
Prediction: Bayesian prediction; Classical prediction; Prediction in the Normal model; Linear prediction
Introduction to Linear Models: The linear model; Classical estimation of linear models; Bayesian estimation of linear models; Hierarchical linear models; Dynamic linear models; Linear models with constraints
Sketched Solutions to Selected Exercises; List of Distributions; References; Index. Exercises appear at the end of each chapter.

Proceedings Article
30 Jul 1999
TL;DR: A new unified approach is presented that combines approximate inference and the clique tree algorithm, thereby circumventing the need to maintain an exact representation of the clique potentials.
Abstract: The clique tree algorithm is the standard method for doing inference in Bayesian networks. It works by manipulating clique potentials - distributions over the variables in a clique. While this approach works well for many networks, it is limited by the need to maintain an exact representation of the clique potentials. This paper presents a new unified approach that combines approximate inference and the clique tree algorithm, thereby circumventing this limitation. Many known approximate inference algorithms can be viewed as instances of this approach. The algorithm essentially does clique tree propagation, using approximate inference to estimate the densities in each clique. In many settings, the computation of the approximate clique potential can be done easily using statistical importance sampling. Iterations are used to gradually improve the quality of the estimation.
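The paper's contribution lies inside clique tree propagation itself; as a standalone illustration of the importance-sampling ingredient only, the sketch below estimates a conditional marginal in a tiny two-link chain by sampling non-evidence nodes from the prior and weighting by the likelihood of the evidence (the network and CPT numbers are made up).

import numpy as np

rng = np.random.default_rng(0)

# Tiny chain A -> B -> C with made-up CPTs; evidence: C = 1
pA = 0.3
pB_given_A = {0: 0.2, 1: 0.8}                # P(B=1 | A)
pC_given_B = {0: 0.1, 1: 0.7}                # P(C=1 | B)

# Likelihood-weighted importance sampling: sample A and B from the prior,
# weight each sample by P(evidence | parents)
n = 100_000
a = rng.random(n) < pA
b = rng.random(n) < np.where(a, pB_given_A[1], pB_given_A[0])
w = np.where(b, pC_given_B[1], pC_given_B[0])
print("IS estimate of P(B=1 | C=1):", round((w * b).sum() / w.sum(), 4))

# Exact answer for comparison
pB1 = pA * pB_given_A[1] + (1 - pA) * pB_given_A[0]
exact = pB1 * pC_given_B[1] / (pB1 * pC_given_B[1] + (1 - pB1) * pC_given_B[0])
print("exact          P(B=1 | C=1):", round(exact, 4))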

Posted Content
TL;DR: It is demonstrated that source separation problems are well-suited for the Bayesian approach which provides a natural and logically consistent method by which one can incorporate prior knowledge to estimate the most probable solution given that knowledge.
Abstract: The problem of source separation is by its very nature an inductive inference problem. There is not enough information to deduce the solution, so one must use any available information to infer the most probable solution. We demonstrate that source separation problems are well-suited for the Bayesian approach which provides a natural and logically consistent method by which one can incorporate prior knowledge to estimate the most probable solution given that knowledge. We derive the Bell-Sejnowski ICA algorithm from first principles, i.e. Bayes' Theorem and demonstrate how the Bayesian methodology makes explicit the underlying assumptions. We then further demonstrate the power of the Bayesian approach by deriving two separation algorithms that incorporate additional prior information. One algorithm separates signals that are known a priori to be decorrelated and the other utilizes information about the signal propagation through the medium from the sources to the detectors.
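As a sketch of the maximum a posteriori / maximum likelihood view the authors describe, here is a minimal natural-gradient ICA update using tanh as an assumed score function for super-Gaussian sources. The mixing matrix, source model, learning rate, and iteration count are synthetic, and the sketch omits the extra prior information exploited by the paper's extended algorithms.

import numpy as np

rng = np.random.default_rng(0)
T = 20_000

# Two super-Gaussian (Laplacian) sources mixed by a fixed matrix
S = rng.laplace(size=(2, T))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

# Natural-gradient update dW = (I - E[tanh(u) u^T]) W, with u = W x
W = np.eye(2)
lr = 0.05
for _ in range(500):
    U = W @ X
    dW = (np.eye(2) - np.tanh(U) @ U.T / T) @ W
    W += lr * dW

print("W @ A (should be roughly a scaled permutation matrix):")
print(np.round(W @ A, 2))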

Journal ArticleDOI
TL;DR: It is found that frequency-based predictions are different from-but no better than-case-specific judgments of probability, while results from studies of overconfidence in general knowledge and base rate neglect in categorical prediction underline a general conclusion.

Proceedings Article
30 Jul 1999
TL;DR: Details for the special cases of item response theory (IRT) and multivariate latent class modeling are given, with a numerical example of the latter.
Abstract: As observations and student models become complex, educational assessments that exploit advances in technology and cognitive psychology can outstrip familiar testing models and analytic methods. Within the Portal conceptual framework for assessment design, Bayesian inference networks (BINS) record beliefs about students' knowledge and skills, in light of what they say and do. Joining evidence model BIN fragments--which contain observable variables and pointers to student model variables--to the student model allows one to update belief about knowledge and skills as observations arrive. Markov Chain Monte Carlo (MCMC) techniques can estimate the required conditional probabilities from empirical data, supplemented by expert judgment or substantive theory. Details for the special cases of item response theory (IRT) and multivariate latent class modeling are given, with a numerical example of the latter.
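For the IRT special case mentioned, the core belief update can be written in a few lines: a grid posterior over a single proficiency theta given scored responses under a 2PL model. The item parameters, responses, and standard normal prior are invented; the paper's BIN/MCMC machinery handles much richer student models.

import numpy as np
from scipy.stats import norm

# Hypothetical 2PL item parameters (discrimination a, difficulty b) and responses
a = np.array([1.2, 0.8, 1.5, 1.0])
b = np.array([-0.5, 0.0, 0.5, 1.0])
u = np.array([1, 1, 0, 0])                   # observed right/wrong pattern

theta = np.linspace(-4, 4, 401)              # grid over proficiency
prior = norm.pdf(theta)                      # standard normal prior belief

# P(correct | theta) for every grid point and item under the 2PL model
p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
likelihood = np.prod(np.where(u == 1, p, 1 - p), axis=1)

posterior = prior * likelihood
posterior /= posterior.sum()
print("posterior mean proficiency:", round(float((theta * posterior).sum()), 3))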

Proceedings ArticleDOI
23 Jun 1999
TL;DR: The framework and background for this approach are described, and the family of spatio-temporal receptive fields used for characterizing activities is presented, followed by a review of probabilistic recognition of patterns from joint statistics of receptive field responses.
Abstract: This paper addresses the problem of probabilistic recognition of activities from local spatio-temporal appearance. Joint statistics of space-time filters are employed to define histograms which characterize the activities to be recognized. These histograms provide the joint probability density functions required for recognition using Bayes rule. The result is a technique for recognition of activities which is robust to partial occlusions as well as changes in illumination. In this paper the framework and background for this approach are first described. Then the family of spatio-temporal receptive fields used for characterizing activities is presented. This is followed by a review of probabilistic recognition of patterns from joint statistics of receptive field responses. The approach is validated with the results of experiments in the discrimination of persons walking in different directions, and the recognition of a simple set of hand gestures in an augmented reality scenario.
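To show the recognition step in isolation: the sketch below quantizes a one-dimensional "filter response", builds per-class histograms as empirical class-conditional densities, and labels new responses with Bayes' rule. The synthetic Gaussian features are stand-ins for the paper's spatio-temporal receptive-field responses, and a real system would use joint histograms over several filters.

import numpy as np

rng = np.random.default_rng(2)
# Synthetic "filter responses" for two activity classes
train = {"walk_left": rng.normal(-1.0, 0.7, 2000),
         "walk_right": rng.normal(+1.0, 0.7, 2000)}

edges = np.linspace(-4, 4, 33)
hists, priors = {}, {}
for label, feats in train.items():
    h, _ = np.histogram(feats, bins=edges)
    hists[label] = (h + 1) / (h.sum() + len(h))   # Laplace-smoothed P(bin | class)
    priors[label] = 0.5                            # equal class priors assumed

def classify(value):
    """Bayes rule over histogram densities: argmax_c P(bin | c) * P(c)."""
    idx = int(np.clip(np.digitize(value, edges) - 1, 0, len(edges) - 2))
    return max(hists, key=lambda c: hists[c][idx] * priors[c])

print(classify(-0.8), classify(1.3))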

Journal ArticleDOI
TL;DR: In this paper, the default Bayes factors, such as the fractional Bayes factor, median intrinsic factor, and expected intrinsic factor are compared with each other and with the p value in normal one-sided testing, to illustrate the basic issues.
Abstract: Bayesian hypothesis testing for nonnested hypotheses is studied, using various “default” Bayes factors, such as the fractional Bayes factor, the median intrinsic Bayes factor, and the encompassing and expected intrinsic Bayes factors. The different default methods are first compared with each other and with the p value in normal one-sided testing, to illustrate the basic issues. General results for one-sided testing in location and scale models are then presented. The default Bayes factors are also studied for specific models involving multiple hypotheses. In particular, a multiple hypothesis testing example involving a sequential clinical trial is discussed. In most of the examples presented we also derive the intrinsic prior; this is the prior distribution, which, if used directly, would yield answers (asymptotically) equivalent to those for the given default Bayes factor.

Journal ArticleDOI
TL;DR: It is shown that the two problems are in fact of the same difficulty in terms of rates of convergence under a sufficient condition, which is satisfied by many function classes including Besov (Sobolev), Lipschitz, and bounded variation.
Abstract: This paper studies minimax aspects of nonparametric classification. We first study minimax estimation of the conditional probability of a class label, given the feature variable. This function, say f, is assumed to be in a general nonparametric class. We show the minimax rate of convergence under squared L2 loss is determined by the massiveness of the class as measured by metric entropy. The second part of the paper studies minimax classification. The loss of interest is the difference between the probability of misclassification of a classifier and that of the Bayes decision. As is well known, an upper bound on risk for estimating f gives an upper bound on the risk for classification, but the rate is known to be suboptimal for the class of monotone functions. This suggests that one does not have to estimate f well in order to classify well. However, we show that the two problems are in fact of the same difficulty in terms of rates of convergence under a sufficient condition, which is satisfied by many function classes including Besov (Sobolev), Lipschitz, and bounded variation. This is somewhat surprising in view of a result of Devroye, Györfi, and Lugosi (see A Probabilistic Theory of Pattern Recognition, New York: Springer-Verlag, 1996).
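The standard link between the two problems, for the plug-in classifier built from an estimate \hat f of the conditional class probability f(x) = P(Y = 1 | X = x), is the classical bound below (stated here for context; it is the route the abstract describes as valid but not always rate-optimal):

R(\hat g) - R(g^*) \;\le\; 2\,\mathbb{E}\,\bigl|\hat f(X) - f(X)\bigr| \;\le\; 2\,\bigl(\mathbb{E}\,[\hat f(X) - f(X)]^2\bigr)^{1/2},

where \hat g(x) = 1\{\hat f(x) \ge 1/2\}, g^* is the Bayes rule, and R denotes the probability of misclassification. The paper's result is that, under its metric-entropy condition, classification is nevertheless exactly as hard (in rate) as estimating f.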

Journal ArticleDOI
TL;DR: Results indicate that teaching frequency representations fosters insight into Bayesian reasoning in medical experts, and opens up applications in medicine, law, statistics education, and other fields.
Abstract: Bayesian reasoning can be improved by representing information in frequency formats rather than in probabilities. This thesis opens up applications in medicine, law, statistics education, and other fields. The beneficial effect is no longer in dispute, but rather its cause and its boundary conditions. C. Lewis and G. Keren (1999) argued that the effect of frequency formats is due to "joint statements" rather than to "frequency statements." However, they overlooked the fact that our thesis is about frequency formats, not just any kind of frequency statements. We show that joint statements alone cannot account for the effect. B. A. Mellers and A. P. McGraw (1999) proposed a boundary condition under which the beneficial effect is reduced. In a reanalysis of our original data, we found this reduction for the problem they used but not for any other problem. We conclude by summarizing results indicating that teaching frequency representations fosters insight into Bayesian reasoning. Degrees of uncertainty can be represented in various ways, including probability and frequency formats. Let us first illustrate a frequency format and how it improves Bayesian reasoning in medical experts. We asked a sample of 48 physicians with an average of 14 years of professional experience, including private practitioners, university professors, and clinic directors (Hoffrage & Gigerenzer, 1998), to make inferences about the presence of a disease given a positive result for four routinely used medical diagnostic tests. One was mammography. The relevant information (concerning a population of women aged 40 years) was presented to half of the physicians in a probability format, which can be summarized as follows: The probability of breast cancer is 1%; the probability of a positive test given breast cancer is 80%; and the probability of a positive test given no breast cancer is 10%. The question was What is the probability that a woman who tests positive actually has breast cancer? The other half of the physicians in the study received the same information in a frequency format: 10 of every 1,000 women have breast cancer; 8 of those 10 women with breast cancer will test positive; and 99 of the 990 women without breast cancer will also test positive. The question was How many of those who test positive actually have breast cancer? When the information concerning mammography and breast cancer was presented in a probability format, only 8% of the physicians gave an estimate close to that yielded by Bayes's rule (i.e., .075). When the information was presented in a frequency format, in contrast, 46% of them arrived at the Bayesian response. This beneficial effect of the frequency format on physicians' judgments was obtained in each of the four diagnostic tasks. Thus, frequency formats help to improve Bayesian reasoning not only in
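Both formats lead to the same Bayes computation; writing it out for the mammography numbers quoted above:

P(\text{cancer} \mid \text{positive}) = \frac{0.01 \times 0.80}{0.01 \times 0.80 + 0.99 \times 0.10} = \frac{0.008}{0.107} \approx 0.075,

which in the frequency format is simply 8 / (8 + 99) = 8/107, the same value read off directly from the natural frequencies.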

Book
22 Jun 1999
TL;DR: In this article, the authors present an overview of Wavelet regularization and prior models for image denoising in the Wavelet domain, and present a multiresolution Wavelet analysis in Hierarchical Bayesian Turbulence Models.
Abstract: Table of contents:
I Introduction: 1 An Introduction to Wavelets; 2 Spectral View of Wavelets and Nonlinear Regression
II Prior Models - Independent Case: 3 Bayesian Approach to Wavelet Decomposition and Shrinkage; 4 Some Observations on the Tractability of Certain Multi-Scale Models; 5 Bayesian Analysis of Change-Point Models; 6 Prior Elicitation in the Wavelet Domain; 7 Wavelet Nonparametric Regression Using Basis Averaging
III Decision Theoretic Wavelet Shrinkage: 8 An Overview of Wavelet Regularization; 9 Minimax Restoration and Deconvolution; 10 Robust Bayesian and Bayesian Decision Theoretic Wavelet Shrinkage; 11 Best Basis Representations with Prior Statistical Models
IV Prior Models - Dependent Case: 12 Modeling Dependence in the Wavelet Domain; 13 MCMC Methods in Wavelet Shrinkage
V Spatial Models: 14 Empirical Bayesian Spatial Prediction Using Wavelets; 15 Geometrical Priors for Noisefree Wavelet Coefficients in Image Denoising; 16 Multiscale Hidden Markov Models for Bayesian Image Analysis; 17 Wavelets for Object Representation and Recognition in Computer Vision; 18 Bayesian Denoising of Visual Images in the Wavelet Domain
VI Empirical Bayes: 19 Empirical Bayes Estimation in Wavelet Nonparametric Regression; 20 Nonparametric Empirical Bayes Estimation via Wavelets
VII Case Studies: 21 Multiresolution Wavelet Analyses in Hierarchical Bayesian Turbulence Models; 22 Low Dimensional Turbulent Transport Mechanics Near the Forest-Atmosphere Interface; 23 Latent Structure Analyses of Turbulence Data Using Wavelets and Time Series Decompositions

Journal ArticleDOI
TL;DR: A new method is developed for calculating Bayes posterior distributions of future catches that conform to a specified harvest control law while incorporating uncertainty in biological reference points.
Abstract: A new method is developed for calculating Bayes posterior distributions of future catches that conform to a specified harvest control law while incorporating uncertainty in biological reference poi...

Book
01 Jan 1999
TL;DR: This chapter discusses the role of failure data, Bayesian Inference, Predictive Distributions, and Maximization of Likelihood in Software Engineering, and the development of Software Reliability Models.
Abstract: Table of contents:
1 Introduction and Overview: 1.1 What is Software Engineering?; 1.2 Uncertainty in Software Production; 1.2.1 The Software Development Process; 1.2.2 Sources of Uncertainty in the Development Process; 1.3 The Quantification of Uncertainty; 1.3.1 Probability as an Approach for Quantifying Uncertainty; 1.3.2 Interpretations of Probability; 1.3.3 Interpreting Probabilities in Software Engineering; 1.4 The Role of Statistical Methods in Software Engineering; 1.5 Chapter Summary
2 Foundational Issues: Probability and Reliability: 2.0 Preamble; 2.1 The Calculus of Probability; 2.1.1 Notation and Preliminaries; 2.1.2 Conditional Probabilities and Conditional Independence; 2.1.3 The Calculus of Probability; 2.1.4 The Law of Total Probability, Bayes' Law, and the Likelihood Function; 2.1.5 The Notion of Exchangeability; 2.2 Probability Models and Their Parameters; 2.2.1 What is a Software Reliability Model?; 2.2.2 Some Commonly Used Probability Models; 2.2.3 Moments of Probability Distributions and Expectation of Random Variables; 2.2.4 Moments of Probability Models: The Mean Time to Failure; 2.3 Point Processes and Counting Process Models; 2.3.1 The Nonhomogeneous Poisson Process Model; 2.3.2 The Homogeneous Poisson Process Model; 2.3.3 Generalizations of the Point Process Model; 2.4 Fundamentals of Reliability; 2.4.1 The Notion of a Failure Rate Function; 2.4.2 Some Commonly Used Model Failure Rates; 2.4.3 Covariates in the Failure Rate Function; 2.4.4 The Concatenated Failure Rate Function; 2.5 Chapter Summary; Exercises for Chapter 2
3 Models for Measuring Software Reliability: 3.1 Background: The Failure of Software; 3.1.1 The Software Failure Process and Its Associated Randomness; 3.1.2 Classification of Software Reliability Models; 3.2 Models Based on the Concatenated Failure Rate Function; 3.2.1 The Failure Rate of Software; 3.2.2 The Model of Jelinski and Moranda (1972); 3.2.3 Extensions and Generalizations of the Model by Jelinski and Moranda; 3.2.4 Hierarchical Bayesian Reliability Growth Models; 3.3 Models Based on Failure Counts; 3.3.1 Time Dependent Error Detection Models; 3.4 Models Based on Times Between Failures; 3.4.1 The Random Coefficient Autoregressive Process Model; 3.4.2 A Non-Gaussian Kalman Filter Model; 3.5 Unification of Software Reliability Models; 3.5.1 Unification via the Bayesian Paradigm; 3.5.2 Unification via Self-Exciting Point Process Models; 3.5.3 Other Approaches to Unification; 3.6 An Adaptive Concatenated Failure Rate Model; 3.6.1 The Model and Its Motivation; 3.6.2 Properties of the Model and Interpretation of Model Parameters; 3.7 Chapter Summary; Exercises for Chapter 3
4 Statistical Analysis of Software Failure Data: 4.1 Background: The Role of Failure Data; 4.2 Bayesian Inference, Predictive Distributions, and Maximization of Likelihood; 4.2.1 Bayesian Inference and Prediction; 4.2.2 The Method of Maximum Likelihood; 4.2.3 Application: Inference and Prediction Using Jelinski and Moranda's Model; 4.2.4 Application: Inference and Prediction Under an Error Detection Model; 4.3 Specification of Prior Distributions; 4.3.1 Standard of Reference-Noninformative Priors; 4.3.2 Subjective Priors Based on Elicitation of Specialist Knowledge; 4.3.3 Extensions of the Elicitation Model; 4.3.4 Example: Eliciting Priors for the Logarithmic-Poisson Model; 4.3.5 Application: Failure Prediction Using Logarithmic-Poisson Model; 4.4 Inference and Prediction Using a Hierarchical Model; 4.4.1 Application to NTDS Data: Assessing Reliability Growth; 4.5 Inference and Predictions Using Dynamic Models; 4.5.1 Inference for the Random Coefficient Exchangeable Model; 4.5.2 Inference for the Adaptive Kalman Filter Model; 4.5.3 Inference for the Non-Gaussian Kalman Filter Model; 4.6 Prequential Prediction, Bayes Factors, and Model Comparison; 4.6.1 Prequential Likelihoods and Prequential Prediction; 4.6.2 Bayes' Factors and Model Averaging; 4.6.3 Model Complexity: Occam's Razor; 4.6.4 Application: Comparing the Exchangeable, Adaptive, and Non-Gaussian Models; 4.6.5 An Example of Reversals in the Prequential Likelihood Ratio; 4.7 Inference for the Concatenated Failure Rate Model; 4.7.1 Specification of the Prior Distribution; 4.7.2 Calculating Posteriors by Markov Chain Monte Carlo; 4.7.3 Testing Hypotheses About Reliability Growth or Decay; 4.7.4 Application to System 40 Data; 4.8 Chapter Summary; Exercises for Chapter 4
5 Software Productivity and Process Management: 5.1 Background: Producing Quality Software; 5.2 A Growth-Curve Model for Estimating Software Productivity; 5.2.1 The Statistical Model; 5.2.2 Inference and Prediction Under the Growth-Curve Model; 5.2.3 Application: Estimating Individual Software Productivity; 5.3 The Capability Maturity Model for Process Management; 5.3.1 The Conceptual Framework; 5.3.2 The Probabilistic Approach for Hierarchical Classification; 5.3.3 Application: Classifying a Software Developer; 5.4 Chapter Summary; Exercises for Chapter 5
6 The Optimal Testing and Release of Software: 6.1 Background: Decision Making and the Calculus of Probability; 6.2 Decision Making Under Uncertainty; 6.3 Utility and Choosing the Optimal Decision; 6.3.1 Maximization of Expected Utility; 6.3.2 The Utility of Money; 6.4 Decision Trees; 6.4.1 Solving Decision Trees; 6.5 Software Testing Plans; 6.6 Examples of Optimal Testing Plans; 6.6.1 One-Stage Testing Using the Jelinski-Moranda Model; 6.6.2 One- and Two-Stage Testing Using the Model by Goel and Okumoto; 6.6.3 One-Stage Lookahead Testing Using the Model by Goel and Okumoto; 6.6.4 Fixed-Time Lookahead Testing for the Goel-Okumoto Model; 6.6.5 One-Bug Lookahead Testing Plans; 6.6.6 Optimality of One-Stage Look Ahead Plans; 6.7 Application: Testing the NTDS Data; 6.8 Chapter Summary; Exercises for Chapter 6
7 Other Developments: Open Problems: 7.0 Preamble; 7.1 Dynamic Modeling and the Operational Profile; 7.1.1 Martingales, Predictable Processes, and Compensators: An Overview; 7.1.2 The Doob-Meyer Decomposition of Counting Processes; 7.1.3 Incorporating the Operational Profile; 7.2 Statistical Aspects of Software Testing: Experimental Designs; 7.2.1 Inferential Issues in Random and Partition Testing; 7.2.2 Comparison of Random and Partition Testing; 7.2.3 Design of Experiments in Software Testing; 7.2.4 Design of Experiments in Multiversion Programming; 7.2.5 Concluding Remarks; 7.3 The Integration of Module and System Performance; 7.3.1 The Protocols of Control Flow and Data Flow; 7.3.2 The Structure Function of Modularized Software
Appendices: Appendix A Statistical Computations Using the Gibbs Sampler: A.1 An Overview of the Gibbs Sampler; A.2 Generating Random Variates: The Rejection Method; A.3 Examples: Using the Gibbs Sampler; A.3.1 Gibbs Sampling the Jelinski-Moranda Model; A.3.2 Gibbs Sampling the Hierarchical Model; A.3.3 Gibbs Sampling the Adaptive Kalman Filter Model; A.3.4 Gibbs Sampling the Non-Gaussian Kalman Filter Model. Appendix B The Maturity Questionnaire and Responses: B.1 The Maturity Questionnaire; B.2 Binary (Yes, No) Responses to the Maturity Questionnaire; B.3 Prior Probabilities and Likelihoods
References; Author Index

Journal ArticleDOI
TL;DR: A comparison of the weights-of-evidence method to probabilistic neural networks is performed here with data from Chisel Lake-Anderson Lake, Manitoba, Canada to demonstrate the neural network's ability at making unbiased probability estimates and lower error rates when measured by number of polygons or by the area of land misclassified.
Abstract: The need to integrate large quantities of digital geoscience information to classify locations as mineral deposits or nondeposits has been met by the weights-of-evidence method in many situations. Widespread selection of this method may be more the result of its ease of use and interpretation rather than comparisons with alternative methods. A comparison of the weights-of-evidence method to probabilistic neural networks is performed here with data from Chisel Lake-Anderson Lake, Manitoba, Canada. Each method is designed to estimate the probability of belonging to learned classes where the estimated probabilities are used to classify the unknowns. Using these data, significantly lower classification error rates were observed for the neural network, not only when test and training data were the same (0.02 versus 23%), but also when validation data, not used in any training, were used to test the efficiency of classification (0.7 versus 17%). Despite these data containing too few deposits, these tests of this set of data demonstrate the neural network's ability at making unbiased probability estimates and lower error rates when measured by number of polygons or by the area of land misclassified. For both methods, independent validation tests are required to ensure that estimates are representative of real-world results. Results from the weights-of-evidence method demonstrate a strong bias where most errors are barren areas misclassified as deposits. The weights-of-evidence method is based on Bayes rule, which requires independent variables in order to make unbiased estimates. The chi-square test for independence indicates no significant correlations among the variables in the Chisel Lake-Anderson Lake data. However, the expected number of deposits test clearly demonstrates that these data violate the independence assumption. Other, independent simulations with three variables show that using variables with correlations of 1.0 can double the expected number of deposits, as can correlations of -1.0. Studies done in the 1970s on methods that use Bayes rule show that moderate correlations among attributes seriously affect estimates and even small correlations lead to increases in misclassifications. Adverse effects have been observed with small to moderate correlations when only six to eight variables were used. Consistent evidence of upward biased probability estimates from multivariate methods founded on Bayes rule must be of considerable concern to institutions and governmental agencies where unbiased estimates are required. In addition to increasing the misclassification rate, biased probability estimates make classification into deposit and nondeposit classes an arbitrary subjective decision. The probabilistic neural network has no problem dealing with correlated variables; its performance depends strongly on having a thoroughly representative training set. Probabilistic neural networks or logistic regression should receive serious consideration where unbiased estimates are required. The weights-of-evidence method would serve to estimate thresholds between anomalies and background and for exploratory data analysis.
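For readers unfamiliar with the method being criticized: weights of evidence adds a weight W+ (layer present) or W- (layer absent) to the prior log-odds of a deposit for each binary evidence layer, and summing those weights is precisely where the conditional-independence assumption enters. A schematic sketch with invented layer statistics:

import numpy as np

# Invented statistics for two binary evidence layers:
# (P(layer present | deposit), P(layer present | no deposit))
layers = {"geochemical_anomaly": (0.70, 0.20),
          "favourable_host_rock": (0.60, 0.30)}

prior = 0.01                                  # prior probability of a deposit
log_odds = np.log(prior / (1 - prior))

# Each observed layer contributes W+ = ln[P(B | D) / P(B | not D)]; an absent layer
# would contribute W- = ln[(1 - P(B | D)) / (1 - P(B | not D))]. Adding the weights
# assumes the layers are conditionally independent given deposit status,
# which is the assumption the abstract shows these data violate.
for name, (p_d, p_nd) in layers.items():
    w_plus = np.log(p_d / p_nd)
    log_odds += w_plus
    print(f"{name}: W+ = {w_plus:.2f}")

posterior = 1 / (1 + np.exp(-log_odds))
print("posterior P(deposit | both layers present):", round(posterior, 4))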

Proceedings Article
29 Nov 1999
TL;DR: This paper argues that two apparently distinct modes of generalizing concepts - abstracting rules and computing similarity to exemplars - should both be seen as special cases of a more general Bayesian learning framework.
Abstract: This paper argues that two apparently distinct modes of generalizing concepts - abstracting rules and computing similarity to exemplars - should both be seen as special cases of a more general Bayesian learning framework. Bayes explains the specific workings of these two modes - which rules are abstracted, how similarity is measured - as well as why generalization should appear rule - or similarity-based in different situations. This analysis also suggests why the rules/similarity distinction, even if not computationally fundamental, may still be useful at the algorithmic level as part of a principled approximation to fully Bayesian learning.
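A compact sketch of the kind of Bayesian generalization the paper describes, for one-dimensional interval hypotheses over the integers 1-100: the size principle makes the posterior favour small hypotheses consistent with the examples, and generalization to a new value is a posterior-weighted vote over hypotheses. The hypothesis space, the 1/size prior, and the example set are our simplifications.

import numpy as np
from itertools import combinations

values = np.arange(1, 101)
examples = [60, 52, 57, 55]                   # observed members of the concept

# Hypotheses: all integer intervals [lo, hi]; prior mildly favouring small intervals
hyps = [(lo, hi) for lo, hi in combinations(values, 2)] + [(v, v) for v in values]

def size(h):
    return int(h[1] - h[0] + 1)

def contains(h, xs):
    return all(h[0] <= x <= h[1] for x in xs)

prior = np.array([1.0 / size(h) for h in hyps])
# Size principle: examples are assumed sampled uniformly from the true hypothesis
likelihood = np.array([(1.0 / size(h)) ** len(examples) if contains(h, examples) else 0.0
                       for h in hyps])
posterior = prior * likelihood
posterior /= posterior.sum()

def p_in_concept(y):
    """Probability that a new value y belongs to the concept."""
    return posterior[[contains(h, [y]) for h in hyps]].sum()

for y in (56, 63, 70, 90):
    print(y, round(float(p_in_concept(y)), 3))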

Journal ArticleDOI
TL;DR: In this paper, a Bayesian approach with a non-informative prior distribution is proposed for nonlinear time-series models, where the MCMC technique is used to iteratively simulate missing states, innovations and parameters until convergence.

Book ChapterDOI
01 Jan 1999
TL;DR: This chapter provides an analytical framework to quantify the improvements in classification results due to combining, and derives expressions that indicate how much the median, the maximum and in general the ith order statistic can improve classifier performance.
Abstract: Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that to a first order approximation, the error rate obtained over and above the Bayes error rate, is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the “added” error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
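In symbols, the chapter's headline result for the simple-averaging combiner of N unbiased classifiers can be summarized as follows (our paraphrase of the form of the result): with uncorrelated boundary-approximation errors,

E_{\mathrm{add}}^{\mathrm{ave}} = \frac{E_{\mathrm{add}}}{N},

and, in the correlated case the chapter goes on to analyze, a factor of the form (1 + \delta(N-1))/N multiplies E_{\mathrm{add}}, where \delta is the average correlation among the individual classifiers' errors, so the benefit of combining shrinks as the ensemble members become more correlated.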

Proceedings ArticleDOI
23 Jun 1999
TL;DR: It is argued that Bayesian network models are an attractive statistical framework for cue fusion in these applications because they combine a natural mechanism for expressing contextual information with efficient algorithms for learning and inference.
Abstract: The development of user interfaces based on vision and speech requires the solution of a challenging statistical inference problem: The intentions and actions of multiple individuals must be inferred from noisy and ambiguous data. We argue that Bayesian network models are an attractive statistical framework for cue fusion in these applications. Bayes nets combine a natural mechanism for expressing contextual information with efficient algorithms for learning and inference. We illustrate these points through the development of a Bayes net model for detecting when a user is speaking. The model combines four simple vision sensors: face detection, skin color, skin texture, and mouth motion. We present some promising experimental results.
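The paper learns a Bayes net over the four visual cues; as a deliberately simplified stand-in that still shows the fusion arithmetic, here is a naive-Bayes combination of four binary cues for the "user is speaking" hypothesis, with an invented prior and invented sensor reliabilities.

import numpy as np

# Invented per-cue reliabilities: (P(cue fires | speaking), P(cue fires | not speaking))
cues = {"face_detected": (0.95, 0.60),
        "skin_color":    (0.90, 0.70),
        "skin_texture":  (0.85, 0.65),
        "mouth_motion":  (0.80, 0.10)}

prior_speaking = 0.3

def p_speaking(observed):
    """Naive-Bayes fusion: add per-cue log likelihood ratios to the prior log-odds."""
    log_odds = np.log(prior_speaking / (1 - prior_speaking))
    for name, fired in observed.items():
        p1, p0 = cues[name]
        log_odds += np.log(p1 / p0) if fired else np.log((1 - p1) / (1 - p0))
    return 1 / (1 + np.exp(-log_odds))

print(p_speaking({"face_detected": 1, "skin_color": 1, "skin_texture": 1, "mouth_motion": 1}))
print(p_speaking({"face_detected": 1, "skin_color": 1, "skin_texture": 1, "mouth_motion": 0}))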