Showing papers on "Bayesian probability published in 1997"


Journal ArticleDOI
TL;DR: In this paper, a hierarchical prior model is proposed to deal with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context; the resulting sample from the full joint posterior can be used as a basis for a thorough presentation of many aspects of the posterior distribution.
Abstract: New methodology for fully Bayesian mixture analysis is developed, making use of reversible jump Markov chain Monte Carlo methods that are capable of jumping between the parameter subspaces corresponding to different numbers of components in the mixture. A sample from the full joint distribution of all unknown variables is thereby generated, and this can be used as a basis for a thorough presentation of many aspects of the posterior distribution. The methodology is applied here to the analysis of univariate normal mixtures, using a hierarchical prior model that offers an approach to dealing with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context.
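As a hedged illustration of the kind of hierarchical prior the abstract refers to (a sketch of the general structure in generic notation, not necessarily the paper's exact specification), a univariate normal mixture with k components is often written as

y_i \mid k, w, \mu, \sigma \sim \sum_{j=1}^{k} w_j \, \mathcal{N}(\mu_j, \sigma_j^2), \qquad w \sim \mathrm{Dirichlet}(\delta, \ldots, \delta),
\mu_j \sim \mathcal{N}(\xi, \kappa^{-1}), \qquad \sigma_j^{-2} \sim \mathrm{Gamma}(\alpha, \beta), \qquad \beta \sim \mathrm{Gamma}(g, h),

with the extra layer on \beta supplying proper but weakly informative prior information; reversible jump moves then add or delete components by jumping between the subspaces indexed by k.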

2,018 citations


Journal ArticleDOI
TL;DR: An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data, with the posterior probabilities of phylogenies used to estimate the maximum posterior probability (MAP) tree; in an application to nine primate species, the MAP tree has a posterior probability of approximately 95%.
Abstract: An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data. The birth-death process with species sampling is used to specify the prior distribution of phylogenies and ancestral speciation times, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree. Monte Carlo integration is used to integrate over the ancestral speciation times for particular trees. A Markov Chain Monte Carlo method is used to generate the set of trees with the highest posterior probabilities. Methods are described for an empirical Bayesian analysis, in which estimates of the speciation and extinction rates are used in calculating the posterior probabilities, and a hierarchical Bayesian analysis, in which these parameters are removed from the model by an additional integration. The Markov Chain Monte Carlo method avoids the requirement of our earlier method for calculating MAP trees to sum over all possible topologies (which limited the number of taxa in an analysis to about five). The methods are applied to analyze DNA sequences for nine species of primates, and the MAP tree, which is identical to a maximum-likelihood estimate of topology, has a probability of approximately 95%.
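As a brief, hedged sketch of the quantity being targeted (generic Bayesian phylogenetics notation, not the paper's exact parameterization): for candidate topologies \tau_i and sequence data X, with t denoting ancestral speciation times, the posterior probability of a topology is

p(\tau_i \mid X) = \frac{p(\tau_i) \int p(X \mid \tau_i, t)\, p(t \mid \tau_i)\, dt}{\sum_j p(\tau_j) \int p(X \mid \tau_j, t)\, p(t \mid \tau_j)\, dt},

where the inner integral is the Monte Carlo integration over speciation times; the MAP tree is the topology maximizing this posterior, and the Markov chain Monte Carlo method is used to find the set of topologies with highest posterior probability.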

1,230 citations


Book
01 Mar 1997
TL;DR: This book provides an up-to-date survey of statistical and econometric techniques for the analysis of count data, with a focus on conditional distribution models.
Abstract: The book provides graduate students and researchers with an up-to-date survey of statistical and econometric techniques for the analysis of count data, with a focus on conditional distribution models. Proper count data probability models allow for rich inferences, both with respect to the stochastic count process that generated the data, and with respect to predicting the distribution of outcomes. The book starts with a presentation of the benchmark Poisson regression model. Alternative models address unobserved heterogeneity, state dependence, selectivity, endogeneity, underreporting, and clustered sampling. Testing and estimation are discussed from frequentist and Bayesian perspectives. Finally, applications are reviewed in fields such as economics, marketing, sociology, demography, and health sciences. The fifth edition contains several new topics, including copula functions, Poisson regression for non-counts, additional semi-parametric methods, and discrete factor models. Other sections have been reorganized, rewritten, and extended.
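For readers unfamiliar with the benchmark model mentioned above, the standard Poisson regression specification (a generic statement, not tied to the book's notation) is

P(y_i = k \mid x_i) = \frac{e^{-\lambda_i} \lambda_i^{k}}{k!}, \qquad \lambda_i = \exp(x_i' \beta), \qquad k = 0, 1, 2, \ldots,

which imposes equidispersion, E(y_i \mid x_i) = \mathrm{Var}(y_i \mid x_i) = \lambda_i; the alternative models surveyed in the book relax this and related assumptions.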

977 citations


Book
01 Jun 1997
TL;DR: This book develops the Law of Likelihood as the basis for measuring the strength of statistical evidence, contrasting the likelihood paradigm with Neyman-Pearson theory and Fisherian tests of significance and examining how the familiar paradoxes of hypothesis testing are resolved.
Abstract: Contents: The First Principle (Introduction; The Law of Likelihood; Three Questions; Towards Verification; Relativity of Evidence; Strength of Evidence; Counterexamples; Testing Simple Hypotheses; Composite Hypotheses; Another Counterexample; Irrelevance of the Sample Space; The Likelihood Principle; Evidence and Uncertainty; Summary; Exercises). Neyman-Pearson Theory (Introduction; Neyman-Pearson Statistical Theory; Evidential Interpretation of Results of Neyman-Pearson Decision Procedures; Neyman-Pearson Hypothesis Testing in Planning Experiments: Choosing the Sample Size; Summary; Exercises). Fisherian Theory (Introduction; A Method for Measuring Statistical Evidence: The Test of Significance; The Rationale for Significance Tests; Troubles with p-Values; Rejection Trials; A Sample of Interpretations; The Illogic of Rejection Trials; Confidence Sets from Rejection Trials; Alternative Hypothesis in Science; Summary). Paradigms for Statistics (Introduction; Three Paradigms; An Alternative Paradigm; Probabilities of Weak and Misleading Evidence: Normal Distribution Mean; Understanding the Likelihood Paradigm; Evidence about a Probability: Planning a Clinical Trial and Interpreting the Results; Summary; Exercises). Resolving the Old Paradoxes (Introduction; Why is Power of Only 0.80 OK?; Peeking at Data; Repeated Tests; Testing More than One Hypothesis; What's Wrong with One-Sided Tests?; Must the Significance Level be Predetermined?; And is the Strength of Evidence Limited by the Researcher's Expectations?; Summary). Looking at Likelihoods (Introduction; Evidence about Hazard Rates in Two Factories; Evidence about an Odds Ratio; A Standardized Mortality Rate; Evidence about a Finite Population Total; Determinants of Plans to Attend College; Evidence about the Probabilities in a 2x2x2x2 Table; Evidence from a Community Intervention Study of Hypertension; Effects of Sugars on Growth of Pea Sections: Analysis of Variance; Summary; Exercises). Nuisance Parameters (Introduction; Orthogonal Parameters; Marginal Likelihoods; Conditional Likelihoods; Estimated Likelihoods; Profile Likelihoods; Synthetic Conditional Likelihoods; Summary; Exercises). Bayesian Statistical Inference (Introduction; Bayesian Statistical Models; Subjectivity in Bayesian Models; The Trouble with Bayesian Statistics; Are Likelihood Methods Bayesian?; Objective Bayesian Inference; Bayesian Integrated Likelihoods; Summary). Appendix: The Paradox of the Ravens.
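For context, the Law of Likelihood that opens the book is conventionally stated as follows (a standard formulation offered here as a reminder, not a quotation from the text): an observation x is evidence supporting hypothesis H_1 over H_2 if and only if

\frac{P(x \mid H_1)}{P(x \mid H_2)} > 1,

and this likelihood ratio measures the strength of that evidence.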

880 citations


Journal ArticleDOI
TL;DR: Methods for estimating non-Gaussian time series models rely on Markov chain Monte Carlo to carry out simulation smoothing and Bayesian posterior analysis of parameters, and on importance sampling to estimate the likelihood function for classical inference.
Abstract: SUMMARY In this paper we provide methods for estimating non-Gaussian time series models. These techniques rely on Markov chain Monte Carlo to carry out simulation smoothing and Bayesian posterior analysis of parameters, and on importance sampling to estimate the likelihood function for classical inference. The time series structure of the models is used to ensure that our simulation algorithms are efficient.
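A hedged sketch of the importance-sampling step described above (generic notation; the paper's own algorithms exploit the time series structure more carefully): with latent states \alpha and parameters \psi, the likelihood

L(\psi) = \int p(y \mid \alpha, \psi)\, p(\alpha \mid \psi)\, d\alpha \;\approx\; \frac{1}{N} \sum_{i=1}^{N} \frac{p(y \mid \alpha^{(i)}, \psi)\, p(\alpha^{(i)} \mid \psi)}{g(\alpha^{(i)} \mid y, \psi)}, \qquad \alpha^{(i)} \sim g(\cdot \mid y, \psi),

where g is an importance density, typically constructed from an approximating Gaussian model of the same time series.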

732 citations


Journal ArticleDOI
TL;DR: It is found that Gibbs sampling performs as well as, or better than, importance sampling and that the Gibbs sampling algorithms are less adversely affected by model size.
Abstract: In Bayesian analysis of vector autoregressive models, and especially in forecasting applications, the Minnesota prior of Litterman is frequently used. In many cases other prior distributions provide ...

635 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that if the reference prior is used the posterior for the number of components in a mixture of normals is not well defined, and that posterior simulation does not provide a direct estimate of the posterior for the number of components; practical methods for coping with these problems are presented.
Abstract: Mixtures of normals provide a flexible model for estimating densities in a Bayesian framework. There are some difficulties with this model, however. First, standard reference priors yield improper posteriors. Second, the posterior for the number of components in the mixture is not well defined (if the reference prior is used). Third, posterior simulation does not provide a direct estimate of the posterior for the number of components. We present some practical methods for coping with these problems. Finally, we give some results on the consistency of the method when the maximum number of components is allowed to grow with the sample size.

545 citations


Journal ArticleDOI
TL;DR: In this article, Bayesian inferential methods for causal estimands in the presence of noncompliance are presented, where the binary treatment assignment is random and hence ignorable, but the treatment received is not ignorable.
Abstract: For most of this century, randomization has been a cornerstone of scientific experimentation, especially when dealing with humans as experimental units. In practice, however, noncompliance is relatively common with human subjects, complicating traditional theories of inference that require adherence to the random treatment assignment. In this paper we present Bayesian inferential methods for causal estimands in the presence of noncompliance, when the binary treatment assignment is random and hence ignorable, but the binary treatment received is not ignorable. We assume that both the treatment assigned and the treatment received are observed. We describe posterior estimation using EM and data augmentation algorithms. Also, we investigate the role of two assumptions often made in econometric instrumental variables analyses, the exclusion restriction and the monotonicity assumption, without which the likelihood functions generally have substantial regions of maxima. We apply our procedures to real and artificial data, thereby demonstrating the technology and showing that our new methods can yield valid inferences that differ in practically important ways from those based on previous methods for analysis in the presence of noncompliance, including intention-to-treat analyses and analyses based on econometric instrumental variables techniques. Finally, we perform a simulation to investigate the operating characteristics of the competing procedures in a simple setting, which indicates relatively dramatic improvements in frequency operating characteristics attainable using our Bayesian procedures.
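To make the role of the two assumptions concrete (a standard identity from the instrumental variables literature, stated generically rather than as the paper's full Bayesian machinery): with random binary assignment Z, treatment received D, and outcome Y, the exclusion restriction and monotonicity identify the complier average causal effect

\mathrm{CACE} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}.

The Bayesian analysis instead models the full mixture of compliance types, which is what allows either assumption to be relaxed and the resulting uncertainty to be quantified.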

542 citations


Journal ArticleDOI
TL;DR: It is argued that for many common machine learning problems, although in general the authors do not know the true (objective) prior for the problem, they do have some idea of a set of possible priors to which the true prior belongs.
Abstract: A Bayesian model of learning to learn by sampling from multiple tasks is presented. The multiple tasks are themselves generated by sampling from a distribution over an environment of related tasks. Such an environment is shown to be naturally modelled within a Bayesian context by the concept of an objective prior distribution. It is argued that for many common machine learning problems, although in general we do not know the true (objective) prior for the problem, we do have some idea of a set of possible priors to which the true prior belongs. It is shown that under these circumstances a learner can use Bayesian inference to learn the true prior by learning sufficiently many tasks from the environment. In addition, bounds are given on the amount of information required to learn a task when it is simultaneously learnt with several other tasks. The bounds show that if the learner has little knowledge of the true prior, but the dimensionality of the true prior is small, then sampling multiple tasks is highly advantageous. The theory is applied to the problem of learning a common feature set or equivalently a low-dimensional representation (LDR) for an environment of related tasks.
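A minimal sketch of the hierarchical structure described above, in generic notation (the paper's bounds involve further quantities): tasks \theta_1, \ldots, \theta_n are drawn from an unknown environment prior \pi^*, which is assumed to lie in a known family \{\pi_\alpha\}; the learner's joint predictive distribution for the task data sets is then

p(D_1, \ldots, D_n) = \int \prod_{i=1}^{n} \left[ \int p(D_i \mid \theta_i)\, \pi_\alpha(d\theta_i) \right] P(d\alpha),

so that observing sufficiently many tasks concentrates the posterior over \alpha on the true prior \pi^*.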

496 citations


Journal ArticleDOI
TL;DR: A new estimator, called the maximum local mass (MLM) estimate, integrates local probability density and uses an optimality criterion that is appropriate for perception tasks: it finds the most probable approximately correct answer.
Abstract: The problem of color constancy may be solved if we can recover the physical properties of illuminants and surfaces from photosensor responses. We consider this problem within the framework of Bayesian decision theory. First, we model the relation among illuminants, surfaces, and photosensor responses. Second, we construct prior distributions that describe the probability that particular illuminants and surfaces exist in the world. Given a set of photosensor responses, we can then use Bayes’s rule to compute the posterior distribution for the illuminants and the surfaces in the scene. There are two widely used methods for obtaining a single best estimate from a posterior distribution. These are maximum a posteriori (MAP) and minimum mean-squared-error (MMSE) estimation. We argue that neither is appropriate for perception problems. We describe a new estimator, which we call the maximum local mass (MLM) estimate, that integrates local probability density. The new method uses an optimality criterion that is appropriate for perception tasks: It finds the most probable approximately correct answer. For the case of low observation noise, we provide an efficient approximation. We develop the MLM estimator for the color-constancy problem in which flat matte surfaces are uniformly illuminated. In simulations we show that the MLM method performs better than the MAP estimator and better than a number of standard color-constancy algorithms. We note conditions under which even the optimal estimator produces poor estimates: when the spectral properties of the surfaces in the scene are biased. © 1997 Optical Society of America [S0740-3232(97)01607-4]
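A hedged sketch of the estimator's defining idea (schematic notation; the paper develops the precise loss function and an efficient approximation): whereas MAP maximizes the posterior density itself, the maximum local mass estimate maximizes the posterior probability mass in a small neighbourhood,

\hat{\theta}_{\mathrm{MLM}} = \arg\max_{\theta} \int K_\epsilon(\theta' - \theta)\, p(\theta' \mid \mathrm{data})\, d\theta',

where K_\epsilon is a localized kernel defining "approximately correct"; as \epsilon \to 0 this reduces to MAP, while for finite \epsilon it rewards estimates surrounded by substantial posterior mass.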

Journal ArticleDOI
TL;DR: It is shown that any statistician who ignores the randomization probabilities is unable to construct well-behaved nominal 95 per cent confidence intervals for the true treatment effect, and a curse-of-dimensionality-appropriate (CODA) asymptotic theory for inference in non- and semi-parametric models is proposed.
Abstract: We argue that, due to the curse of dimensionality, there are major difficulties with any pure or smoothed likelihood-based method of inference in designed studies with randomly missing data when missingness depends on a high-dimensional vector of variables. We study in detail a semi-parametric superpopulation version of continuously stratified random sampling. We show that all estimators of the population mean that are uniformly consistent or that achieve an algebraic rate of convergence, no matter how slow, require the use of the selection (randomization) probabilities. We argue that, in contrast to likelihood methods which ignore these probabilities, inverse selection probability weighted estimators continue to perform well, achieving uniform n^{1/2}-rates of convergence. We propose a curse of dimensionality appropriate (CODA) asymptotic theory for inference in non- and semi-parametric models in an attempt to formalize our arguments. We discuss whether our results constitute a fatal blow to the likelihood principle and study the attitude toward these that a committed subjective Bayesian would adopt. Finally, we apply our CODA theory to analyse the effect of the 'curse of dimensionality' in several interesting semi-parametric models, including a model for a two-armed randomized trial with randomization probabilities depending on a vector of continuous pretreatment covariates X. We provide substantive settings under which a subjective Bayesian would ignore the randomization probabilities in analysing the trial data. We then show that any statistician who ignores the randomization probabilities is unable to construct nominal 95 per cent confidence intervals for the true treatment effect that have both: (i) an expected length which goes to zero with increasing sample size; and (ii) a guaranteed expected actual coverage rate of at least 95 per cent over the ensemble of trials analysed by the statistician during his or her lifetime. However, we derive a new interval estimator, depending on the randomization probabilities, that satisfies (i) and (ii).
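For concreteness, the inverse selection probability weighted estimator referred to above has the familiar Horvitz-Thompson form (generic notation; the paper treats more elaborate semi-parametric versions): with R_i the selection indicator, Y_i the outcome, and \pi(X_i) the known selection probability given covariates X_i,

\hat{\mu}_{\mathrm{IPW}} = \frac{1}{n} \sum_{i=1}^{n} \frac{R_i Y_i}{\pi(X_i)},

which is unbiased for the population mean whenever missingness depends only on X and \pi(X_i) > 0, precisely because it uses the randomization probabilities that likelihood-based methods ignore.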

Journal ArticleDOI
TL;DR: This paper develops Bayesian tools for making inferences about firm-specific inefficiencies in panel data models using Monte Carlo integration or Gibbs sampling to study the influence of the particular priors used on the firm effects.

Journal ArticleDOI
TL;DR: In this paper, a small number of simple problems, such as estimating the mean of a normal distribution or the slope in a regression equation, are covered, and some key techniques are presented.
Abstract: This paper is concerned with methods of sample size determination. The approach is to cover a small number of simple problems, such as estimating the mean of a normal distribution or the slope in a regression equation, and to present some key techniques. The methods covered are in two groups: frequentist and Bayesian. Frequentist methods specify a null and alternative hypothesis for the parameter of interest and then find the sample size by controlling both size and power. These methods often need to use prior information but cannot allow for the uncertainty that is associated with it. By contrast, the Bayesian approach offers a wide variety of techniques, all of which offer the ability to deal with uncertainty associated with prior information.
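As a reminder of the frequentist calculation the paper contrasts with the Bayesian approaches (the standard two-sided approximation for a normal mean with known \sigma, not a result specific to this paper): to test H_0: \mu = \mu_0 against the alternative \mu = \mu_1 with size \alpha and power 1 - \beta, the required sample size is approximately

n = \left( \frac{(z_{1-\alpha/2} + z_{1-\beta})\, \sigma}{\mu_1 - \mu_0} \right)^{2},

where prior information enters only through the point values chosen for \sigma and \mu_1 - \mu_0; that inability to reflect uncertainty in those inputs is exactly the limitation the Bayesian methods address.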

Proceedings Article
23 Aug 1997
TL;DR: Patterns of reasoning are described that allow identity sentences to be grounded in sensory observations, thereby bridging the gap between standard probability theory and sensory observations.
Abstract: Object identification--the task of deciding that two observed objects are in fact one and the same object--is a fundamental requirement for any situated agent that reasons about individuals. Object identity, as represented by the equality operator between two terms in predicate calculus, is essentially a first-order concept. Raw sensory observations, on the other hand, are essentially propositional--especially when formulated as evidence in standard probability theory. This paper describes patterns of reasoning that allow identity sentences to be grounded in sensory observations, thereby bridging the gap. We begin by defining a physical event space over which probabilities are defined. We then introduce an identity criterion, which selects those events that correspond to identity between observed objects. From this, we are able to compute the probability that any two objects are the same, given a stream of observations of many objects. We show that the appearance probability, which defines how an object can be expected to appear at subsequent observations given its current appearance, is a natural model for this type of reasoning. We apply the theory to the task of recognizing cars observed by cameras at widely separated sites in a freeway network, with new heuristics to handle the inevitable complexity of matching large numbers of objects and with online learning of appearance probability models. Despite extremely noisy observations, we are able to achieve high levels of performance.
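A hedged sketch of the core computation (schematic; the paper's event-space formulation and matching heuristics are richer): given appearance observations a_1 and a_2 of two sightings, and assuming the first appearance is equally likely under either hypothesis, the posterior odds that the sightings come from the same object combine an appearance likelihood ratio with prior odds,

\frac{P(\mathrm{same} \mid a_1, a_2)}{P(\mathrm{diff} \mid a_1, a_2)} = \frac{p(a_2 \mid a_1, \mathrm{same})}{p(a_2 \mid \mathrm{diff})} \cdot \frac{P(\mathrm{same})}{P(\mathrm{diff})},

where p(a_2 \mid a_1, \mathrm{same}) is the appearance probability: how an object seen with appearance a_1 is expected to appear at a later observation.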

Journal ArticleDOI
01 Jul 1997
TL;DR: A fully Bayesian treatment of the problem of how large a sample to take from a population in order to make an inference about, or to take a decision concerning, some feature of the population is described.
Abstract: This paper discusses the problem of how large a sample to take from a population in order to make an inference about, or to take a decision concerning, some feature of the population. It first describes a fully Bayesian treatment. It then compares this with other methods described in recent papers in The Statistician. The major contrast lies in the use of a utility function in the Bayesian approach, whereas other methods use constraints, such as fixing an error probability.
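A compact statement of the fully Bayesian treatment described above, in generic decision-theoretic notation (a sketch of the structure, not the paper's specific utility): choose the sample size n maximizing preposterior expected utility net of sampling cost,

n^* = \arg\max_{n} \left\{ \int\!\!\int u\big(d^*(x_n), \theta\big)\, p(x_n \mid \theta)\, p(\theta)\, d\theta\, dx_n \; - \; c(n) \right\},

where d^*(x_n) is the optimal terminal decision given data x_n; the contrast with the other methods is that error-probability constraints are replaced by the utility u and the cost c(n).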

Journal ArticleDOI
TL;DR: The authors showed that the conditional frequentist method can be made virtually equivalent to Bayesian testing, which is of considerable interest because it is often perceived that Bayesian and frequentist testing are incompatible in this situation.
Abstract: In this paper, we show that the conditional frequentist method of testing a precise hypothesis can be made virtually equivalent to Bayesian testing. The conditioning strategy proposed by Berger, Brown and Wolpert in 1994, for the simple versus simple case, is generalized to testing a precise null hypothesis versus a composite alternative hypothesis. Using this strategy, both the conditional frequentist and the Bayesian will report the same error probabilities upon rejecting or accepting. This is of considerable interest because it is often perceived that Bayesian and frequentist testing are incompatible in this situation. That they are compatible, when conditional frequentist testing is allowed, is a strong indication that the "wrong" frequentist tests are currently being used for postexperimental assessment of accuracy. The new unified testing procedure is discussed and illustrated in several common testing situations.
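As a hedged sketch of why the two reports can coincide (a schematic version of the unified test; the paper specifies the exact conditioning statistic): with equal prior weights on H_0 and H_1 and Bayes factor B(x) in favour of H_0, the posterior probability of the null is

P(H_0 \mid x) = \frac{B(x)}{1 + B(x)},

and the unified procedure reports this same quantity as a conditional frequentist Type I error probability upon rejection, so the Bayesian and the conditional frequentist announce identical numbers.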

Journal ArticleDOI
TL;DR: In this article, a Bayesian approach to vector autoregression with stochastic volatility is proposed, where the multiplicative evolution of the precision matrix is driven by a multivariate beta variate.
Abstract: This paper proposes a Bayesian approach to a vector autoregression with stochastic volatility, where the multiplicative evolution of the precision matrix is driven by a multivariate beta variate. Exact updating formulas are given for the nonlinear filtering of the precision matrix. Estimation of the autoregressive parameters requires numerical methods: an importance-sampling-based approach is explained here.

Journal ArticleDOI
TL;DR: In this paper, the advantages and disadvantages of both the classical and the Bayesian methodology are discussed, and it is argued that from a methodical point of view, for poorly identifiable systems typical in ecological modelling, the bayesian technique is the superior approach.

Journal ArticleDOI
TL;DR: Several Bayesian and mixed Bayesian/likelihood approaches to sample size calculations based on lengths and coverages of posterior credible intervals are applied to the design of an experiment to estimate the difference between two binomial proportions.
Abstract: Sample size estimation is a major component of the design of virtually every experiment in medicine. Prudent use of the available prior information is a crucial element of experimental planning. Most sample size formulae in current use employ this information only in the form of point estimates, even though it is usually more accurately expressed as a distribution over a range of values. In this paper, we review several Bayesian and mixed Bayesian/likelihood approaches to sample size calculations based on lengths and coverages of posterior credible intervals. We apply these approaches to the design of an experiment to estimate the difference between two binomial proportions, and we compare results to those derived from standard formulae. Consideration of several criteria can contribute to selection of a final sample size.
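The following is a minimal simulation sketch of one such criterion, the average length of the posterior credible interval for the difference of two binomial proportions under independent Beta priors; the function name, the uniform Beta(1, 1) priors, and the target length of 0.25 are illustrative assumptions, not the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)

def avg_credible_length(n, a1=1, b1=1, a2=1, b2=1,
                        level=0.95, n_datasets=500, n_post=2000):
    """Average 95% credible interval length for p1 - p2 with n subjects
    per arm, averaged over the prior predictive distribution of the data
    (Monte Carlo approximation with conjugate Beta posteriors)."""
    lo, hi = (1 - level) / 2, 1 - (1 - level) / 2
    lengths = np.empty(n_datasets)
    for i in range(n_datasets):
        # draw plausible "true" proportions from the priors, then data
        p1, p2 = rng.beta(a1, b1), rng.beta(a2, b2)
        x1, x2 = rng.binomial(n, p1), rng.binomial(n, p2)
        # sample the posterior of p1 - p2 and record the interval length
        d = rng.beta(a1 + x1, b1 + n - x1, n_post) - \
            rng.beta(a2 + x2, b2 + n - x2, n_post)
        q = np.quantile(d, [lo, hi])
        lengths[i] = q[1] - q[0]
    return lengths.mean()

# smallest n per arm whose average interval length falls below the target
target = 0.25
n = 10
while avg_credible_length(n) > target:
    n += 10
print("approximate sample size per arm:", n)

In practice one would also examine coverage-based criteria alongside average length, which is the comparison the paper carries out.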

Journal ArticleDOI
TL;DR: It is shown that the assignment of penalties in the puzzling step of the QP algorithm is a special case of a more general Bayesian weighting scheme for quartet topologies, which leads to an improvement in the efficiency of QP at recovering the true tree as well as to better theoretical understanding of the method itself.
Abstract: Quartet puzzling (QP), a heuristic tree search procedure for maximum-likelihood trees, has recently been introduced (Strimmer and von Haeseler 1996). This method uses maximum-likelihood criteria for quartets of taxa which are then combined to form trees based on larger numbers of taxa. Thus, QP can be practically applied to data sets comprising a much greater number of taxa than can other search algorithms such as stepwise addition and subsequent branch swapping as implemented, e.g., in DNAML (Felsenstein 1993). However, its ability to reconstruct the true tree is less than that of DNAML (Strimmer and von Haeseler 1996). Here, we show that the assignment of penalties in the puzzling step of the QP algorithm is a special case of a more general Bayesian weighting scheme for quartet topologies. Application of this general framework leads to an improvement in the efficiency of QP at recovering the true tree as well as to better theoretical understanding of the method itself. On average, the accuracy of QP increases by 10% over all cases studied, without compromising speed or requiring more computer memory. Consider the three different fully-bifurcating tree topologies Q1, Q2, and Q3 for four taxa (fig. 1). Denote by m1, m2, and m3 their corresponding maximum-likelihood (not log-likelihood) values. Note that m1 + m2 + m3 << 1. Evaluation via Bayes' theorem of the three tree topologies given uniform prior information leads to posterior probabilities
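The truncated final sentence points to the standard computation: with a uniform prior over the three quartet topologies, Bayes' theorem gives

p(Q_i \mid \mathrm{data}) = \frac{m_i}{m_1 + m_2 + m_3}, \qquad i = 1, 2, 3,

and, as the TL;DR states, the penalty assignment of the original puzzling step can be viewed as a special case of weighting quartet topologies by such posterior probabilities.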

Journal ArticleDOI
TL;DR: In this paper Bayesian analysis and the Wiener process are used in order to build an algorithm to solve the problem of global optimization, and the Bayesian approach is exploited not only in the choice of the Wiener model but also in the estimation of the parameter σ^2 of the Wiener process.
Abstract: In this paper Bayesian analysis and the Wiener process are used in order to build an algorithm to solve the problem of global optimization. The paper is divided into two main parts. In the first part an already known algorithm is considered: a new (Bayesian) stopping rule is added to it and some results are given, such as an upper bound for the number of iterations under the new stopping rule. In the second part a new algorithm is introduced in which the Bayesian approach is exploited not only in the choice of the Wiener model but also in the estimation of the parameter σ^2 of the Wiener process, whose value appears to be quite crucial. Some results about this algorithm are also given.
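For readers unfamiliar with why the Wiener model is convenient here, a standard conditioning fact used by such algorithms (a generic Brownian-motion property, not a result of the paper): between two evaluated points a < b with observed values f(a) and f(b), the conditional distribution of W(t) for t in (a, b) is

W(t) \mid W(a) = f(a),\, W(b) = f(b) \;\sim\; \mathcal{N}\!\left( f(a) + \frac{t - a}{b - a}\big(f(b) - f(a)\big),\;\; \sigma^2 \frac{(t - a)(b - t)}{b - a} \right),

so posterior expected improvements can be computed interval by interval; this also makes clear why σ^2, which scales every posterior variance, is crucial and worth estimating in a Bayesian way.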

Journal ArticleDOI
TL;DR: In this article, a Bayesian hierarchical analysis of the desired finite population quantities is carried out to include all sources of variation in the model, and the hierarchical Bayes estimates are compared to empirical Bayes estimates.
Abstract: The National Health Interview Survey is designed to produce precise estimates of finite population parameters for the entire United States but not for small geographical areas or subpopulations. Our investigation concerns estimates of proportions such as the probability of at least one visit to a doctor within the past 12 months. To include all sources of variation in the model, we carry out a Bayesian hierarchical analysis for the desired finite population quantities. First, for each cluster (county) a separate logistic regression relates the individual's probability of a doctor visit to his or her characteristics. Second, a multivariate linear regression links cluster regression parameters to covariates measured at the cluster level. We describe the numerical methods needed to obtain the desired posterior moments. Then we compare estimates produced using the exact numerical method with approximations. Finally, we compare the hierarchical Bayes estimates to empirical Bayes estimates and to stand...
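A compact sketch of the two-stage model described above, in generic notation (the actual covariates and hyperpriors are those of the paper): for individual i in county j,

\mathrm{logit}\, P(y_{ij} = 1 \mid x_{ij}, \beta_j) = x_{ij}' \beta_j, \qquad \beta_j = \Gamma' z_j + e_j, \quad e_j \sim \mathcal{N}(0, \Sigma),

so county-level regression coefficients \beta_j are themselves linked by a multivariate linear regression on county-level covariates z_j, with posterior moments of the desired finite population proportions computed from this hierarchy.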

Journal ArticleDOI
TL;DR: Adopting an exploratory data analysis viewpoint, diagnostic tools based on conditional predictive ordinates that conveniently get tied in with Markov chain Monte Carlo fitting of models are developed.
Abstract: SUMMARY In this paper, we propose a general model-determination strategy based on Bayesian methods for nonlinear mixed effects models. Adopting an exploratory data analysis viewpoint, we develop diagnostic tools based on conditional predictive ordinates that conveniently get tied in with Markov chain Monte Carlo fitting of models. Sampling-based methods are used to carry out these diagnostics. Two examples are presented to illustrate the effectiveness of these criteria. The first one is the famous Langmuir equation, commonly used in pharmacokinetic models, whereas the second model is used in the growth curve model for longitudinal data.
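For reference, the conditional predictive ordinate for observation y_i is the cross-validated predictive density, and a commonly used Monte Carlo estimate from posterior draws \theta^{(1)}, \ldots, \theta^{(M)} is the harmonic-mean form (a standard identity under conditional independence given the parameters, stated generically rather than as this paper's exact estimator):

\mathrm{CPO}_i = p(y_i \mid y_{-i}) \approx \left( \frac{1}{M} \sum_{m=1}^{M} \frac{1}{p(y_i \mid \theta^{(m)})} \right)^{-1},

with small CPO values flagging observations that the fitted nonlinear mixed effects model predicts poorly.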

Journal ArticleDOI
01 Sep 1997
TL;DR: This paper demonstrates how Bayesian and evidential reasoning can address the same target identification problem involving multiple levels of abstraction, such as identification based on type, class, and nature, and shows that probability theory can accommodate all of these issues that are present in dealing with uncertainty.
Abstract: This paper demonstrates how Bayesian and evidential reasoning can address the same target identification problem involving multiple levels of abstraction, such as identification based on type, class, and nature. In the process of demonstrating target identification with these two reasoning methods, we compare their convergence time to a long run asymptote for a broad range of aircraft identification scenarios that include missing reports and misassociated reports. Our results show that probability theory can accommodate all of these issues that are present in dealing with uncertainty and that the probabilistic results converge to a solution much faster than those of evidence theory.

Book ChapterDOI
TL;DR: This chapter discusses the Bayesian statistical viewpoint for the determination of macromolecular crystal structures, which combines a statistical scheme for describing in a quantitatively correct fashion the ambiguities present at each stage of a structure determination with a general exploratory mechanism for resolving them.
Abstract: Publisher Summary This chapter discusses the Bayesian statistical viewpoint for the determination of macromolecular crystal structures. Its application to ab initio phasing at typical macromolecular resolutions requires in addition the incorporation of stereochemical information into structure factor statistics. The concepts and methods of Bayesian statistics are designed to enable the numerical representation (via probabilities) and the bookkeeping of various states of incomplete knowledge about a system, especially of the transition from an initial (or prior) state of knowledge toward subsequent (or posterior) states as new information acquired through observations is incorporated by means of Bayes' theorem. The concepts and methods of the Bayesian statistics provide a natural framework for a unified approach combining a statistical scheme for describing in a quantitatively correct fashion the ambiguities present at each stage of a structure determination, encompassing all current methods, together with a general exploratory mechanism for resolving these ambiguities by systematically forming and evaluating multiple hypotheses about the missing information.

Journal ArticleDOI
TL;DR: In this paper, a Bayesian methodology for estimating the size of a closed population from multiple incomplete administrative lists is proposed, which allows for a variety of dependence structures between the lists, can make use of covariates, and explicitly accounts for model uncertainty.
Abstract: SUMMARY A Bayesian methodology for estimating the size of a closed population from multiple incomplete administrative lists is proposed. The approach allows for a variety of dependence structures between the lists, can make use of covariates, and explicitly accounts for model uncertainty. Interval estimates from this approach are compared to frequentist and previously published Bayesian approaches. Several examples are considered.

Proceedings Article
14 Aug 1997
TL;DR: This paper empirically tests two alternative explanations for why bagging works: that it is an approximation to the optimal procedure of Bayesian model averaging, with an appropriate implicit prior, or that it effectively shifts the prior to a more appropriate region of model space.
Abstract: The error rate of decision-tree and other classification learners can often be much reduced by bagging: learning multiple models from bootstrap samples of the database, and combining them by uniform voting. In this paper we empirically test two alternative explanations for this, both based on Bayesian learning theory: (1) bagging works because it is an approximation to the optimal procedure of Bayesian model averaging, with an appropriate implicit prior; (2) bagging works because it effectively shifts the prior to a more appropriate region of model space. All the experimental evidence contradicts the first hypothesis, and confirms the second.
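A minimal sketch of the procedure being analysed, bootstrap aggregation of decision trees with uniform voting; this is illustrative code using scikit-learn's tree learner on an assumed synthetic dataset with 25 bootstrap models, demonstrating bagging itself rather than the paper's Bayesian experiments.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

n_models = 25
votes = np.zeros((n_models, len(y_te)), dtype=int)
for m in range(n_models):
    # learn each model from a bootstrap sample of the training database
    idx = rng.integers(0, len(y_tr), size=len(y_tr))
    tree = DecisionTreeClassifier(random_state=m).fit(X_tr[idx], y_tr[idx])
    votes[m] = tree.predict(X_te)

# combine the models by uniform (unweighted majority) voting
bagged_pred = (votes.mean(axis=0) >= 0.5).astype(int)
single_pred = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)

print("single tree accuracy:", (single_pred == y_te).mean())
print("bagged trees accuracy:", (bagged_pred == y_te).mean())

The abstract's conclusion is that any gain observed this way reflects an effective shift of the prior over models rather than an approximation to Bayesian model averaging.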

Journal ArticleDOI
TL;DR: In this paper, a regional estimation procedure that combines the index-flood concept with an empirical Bayes method for inferring regional information is introduced; the model is based on the partial duration series approach with generalized Pareto (GP) distributed exceedances.
Abstract: A regional estimation procedure that combines the index-flood concept with an empirical Bayes method for inferring regional information is introduced. The model is based on the partial duration series approach with generalized Pareto (GP) distributed exceedances. The prior information of the model parameters is inferred from regional data using generalized least squares (GLS) regression. Two different Bayesian T-year event estimators are introduced: a linear estimator that requires only some moments of the prior distributions to be specified and a parametric estimator that is based on specified families of prior distributions. The regional method is applied to flood records from 48 New Zealand catchments. In the case of a strongly heterogeneous intersite correlation structure, the GLS procedure provides a more efficient estimate of the regional GP shape parameter as compared to the usually applied weighted regional average. If intersite dependence is ignored, the uncertainty of the regional estimator may be seriously underestimated and erroneous conclusions with respect to regional homogeneity may be drawn. The GLS procedure is shown to provide a general framework for a reliable evaluation of parameter uncertainty as well as for an objective appraisal of regional homogeneity. A comparison of the two different Bayesian T-year event estimators reveals that generally the simple linear estimator is adequate.
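As a hedged illustration of the quantity being estimated (one common partial-duration-series convention; signs and parameterizations of the GP distribution vary across the literature): with threshold u, Poisson exceedance rate \lambda per year, and GP scale \alpha and shape \kappa, the T-year event is

x_T = u + \frac{\alpha}{\kappa}\left[ 1 - (\lambda T)^{-\kappa} \right],

which reduces to x_T = u + \alpha \ln(\lambda T) as \kappa \to 0; the regional GLS regression supplies the prior information about \kappa and the index-flood scaling that the Bayesian T-year event estimators then combine with the at-site data.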