Showing papers on "Posterior probability published in 1995"


Journal ArticleDOI
TL;DR: The book offers a comprehensive introduction to applied probability models that stresses intuition; professionals, researchers, and interested readers alike will agree that this is the most solid and widely used book for probability theory.
Abstract: The Seventh Edition of the successful Introduction to Probability Models introduces elementary probability theory and stochastic processes. This book is particularly well-suited to those applying probability theory to the study of phenomena in engineering, management science, the physical and social sciences, and operations research. Skillfully organized, Introduction to Probability Models covers all essential topics. Sheldon Ross, a talented and prolific textbook author, distinguishes this book by his effort to develop in students an intuitive, and therefore lasting, grasp of probability theory. Ross' classic and best-selling text has been carefully and substantially revised. The Seventh Edition includes many new examples and exercises, with the majority of the new exercises being of the easier type. Also, the book introduces stochastic processes, stressing applications, in an easily understood manner. There is a comprehensive introduction to the applied models of probability that stresses intuition. Professionals, researchers, and interested readers alike will agree that this is the most solid and widely used book for probability theory. Features: * Provides detailed coverage of Markov chain Monte Carlo methods and Markov chain cover times * Gives a thorough presentation of k-record values and the surprising Ignatov's theorem * Includes examples relating to: "Random walks to circles," "The matching rounds problem," "The best prize problem," and many more * Contains a comprehensive appendix with the answers to approximately 100 exercises from throughout the text * Accompanied by a complete instructor's solutions manual with step-by-step solutions to all exercises New to this edition: * Includes many new and easier examples and exercises * Offers new material on utilizing the probabilistic method in combinatorial optimization problems * Includes new material on suspended animation reliability models * Contains new material on random algorithms and cycles of random permutations

4,945 citations


Journal ArticleDOI
TL;DR: In this article, a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data is presented, which is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence.
Abstract: We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence when combined with previously made assumptions implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen—a prior network—and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k = 1 parent. For the general case (k > 1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and apply this approach to a comparison of various approaches.
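
As a compact statement of the quantity being compared (our notation, not necessarily the paper's), the relative posterior probability of two network structures S1 and S2 given data D is the prior odds times the ratio of marginal likelihoods:

\[
\frac{p(S_1 \mid D)}{p(S_2 \mid D)} = \frac{p(S_1)\, p(D \mid S_1)}{p(S_2)\, p(D \mid S_2)},
\qquad
p(D \mid S) = \int p(D \mid \theta_S, S)\, p(\theta_S \mid S)\, d\theta_S ,
\]

where the parameter priors p(θ_S | S) are the ones encoded by the prior network and its associated measure of confidence.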

4,124 citations


Journal ArticleDOI
TL;DR: This work exploits the fact that the marginal density can be expressed as the prior times the likelihood function over the posterior density, so that Bayes factors for model comparisons can be routinely computed as a by-product of the simulation.
Abstract: In the context of Bayes estimation via Gibbs sampling, with or without data augmentation, a simple approach is developed for computing the marginal density of the sample data (marginal likelihood) given parameter draws from the posterior distribution. Consequently, Bayes factors for model comparisons can be routinely computed as a by-product of the simulation. Hitherto, this calculation has proved extremely challenging. Our approach exploits the fact that the marginal density can be expressed as the prior times the likelihood function over the posterior density. This simple identity holds for any parameter value. An estimate of the posterior density is shown to be available if all complete conditional densities used in the Gibbs sampler have closed-form expressions. To improve accuracy, the posterior density is estimated at a high density point, and the numerical standard error of the resulting estimate is derived. The ideas are applied to probit regression and finite mixture models.
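
The identity exploited here can be written out briefly (our notation): evaluating Bayes' rule at any fixed parameter value θ* and solving for the marginal likelihood gives

\[
m(y) = \frac{f(y \mid \theta^{*})\, \pi(\theta^{*})}{\pi(\theta^{*} \mid y)},
\qquad
\ln \hat m(y) = \ln f(y \mid \theta^{*}) + \ln \pi(\theta^{*}) - \ln \hat\pi(\theta^{*} \mid y),
\]

where the posterior ordinate \(\hat\pi(\theta^{*} \mid y)\) is estimated from the Gibbs output, ideally at a high-density point such as the posterior mean or mode.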

1,954 citations


Journal ArticleDOI
TL;DR: The basic methodology of MCMC is presented, emphasizing the Bayesian paradigm, conditional probability, and the intimate relationship with Markov random fields in spatial statistics, with particular emphasis on the calculation of posterior probabilities.
Abstract: Markov chain Monte Carlo (MCMC) methods have been used extensively in statistical physics over the last 40 years, in spatial statistics for the past 20 and in Bayesian image analysis over the last decade. In the last five years, MCMC has been introduced into significance testing, general Bayesian inference and maximum likelihood estimation. This paper presents basic methodology of MCMC, emphasizing the Bayesian paradigm, conditional probability and the intimate relationship with Markov random fields in spatial statistics. Hastings algorithms are discussed, including Gibbs, Metropolis and some other variations. Pairwise difference priors are described and are used subsequently in three Bayesian applications, in each of which there is a pronounced spatial or temporal aspect to the modeling. The examples involve logistic regression in the presence of unobserved covariates and ordinal factors; the analysis of agricultural field experiments, with adjustment for fertility gradients; and processing of low-resolution medical images obtained by a gamma camera. Additional methodological issues arise in each of these applications and in the Appendices. The paper lays particular emphasis on the calculation of posterior probabilities and concurs with others in its view that MCMC facilitates a fundamental breakthrough in applied Bayesian modeling.
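
For readers who have not seen the machinery surveyed here, the following is a minimal random-walk Metropolis sketch in Python (a generic illustration, not code from the paper); log_post stands for any unnormalized log posterior density, and the example target is the posterior of a normal mean under a flat prior with made-up data summaries.

import math, random

def metropolis(log_post, x0, n_samples, step=1.0):
    # Random-walk Metropolis: draws from a density known only up to a constant.
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        x_new = x + random.gauss(0.0, step)          # symmetric proposal
        lp_new = log_post(x_new)
        if math.log(random.random()) < lp_new - lp:  # accept with probability min(1, ratio)
            x, lp = x_new, lp_new
        samples.append(x)
    return samples

# Posterior of a normal mean (flat prior), data summarized by ybar, n, sigma.
ybar, n, sigma = 1.3, 25, 2.0
draws = metropolis(lambda m: -0.5 * n * (ybar - m) ** 2 / sigma ** 2, x0=0.0, n_samples=5000)
print(sum(draws[1000:]) / len(draws[1000:]))         # crude posterior-mean estimate after burn-in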

1,006 citations


Journal ArticleDOI
01 Dec 1995-Genetics
TL;DR: A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species, and the new likelihood-based method was found to be superior to the parsimony method.
Abstract: A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze data of the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51, respectively. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.
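
In outline (our notation), the reconstruction at a site compares, over candidate states x at an interior node, the posterior

\[
P(x \mid X, \hat\theta) = \frac{P(X, x \mid \hat\theta)}{\sum_{x'} P(X, x' \mid \hat\theta)},
\]

where X denotes the observed tip states at that site and \(\hat\theta\) the maximum likelihood estimates of branch lengths and substitution parameters; the assignment with the highest posterior probability is reported as the reconstruction.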

710 citations


Journal ArticleDOI
TL;DR: The simulation smoother is introduced, which draws from the multivariate posterior distribution of the disturbances of the model, so avoiding the degeneracies inherent in state samplers.
Abstract: SUMMARY Recently suggested procedures for simulating from the posterior density of states given a Gaussian state space time series are refined and extended. We introduce and study the simulation smoother, which draws from the multivariate posterior distribution of the disturbances of the model, so avoiding the degeneracies inherent in state samplers. The technique is important in Gibbs sampling with non-Gaussian time series models, and for performing Bayesian analysis of Gaussian time series.

587 citations


01 Jan 1995
TL;DR: A metric developed by Heckerman et al. (1994a,b,c) for computing the relative posterior probability of a network structure given data is reviewed and shown to have a property useful for inferring causation from data.
Abstract: We discuss Bayesian approaches for learning Bayesian networks from data. First, we review a metric for computing the relative posterior probability of a network structure given data developed by Heckerman et al. (1994a,b,c). We see that the metric has a property useful for inferring causation from data. Next, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k = 1 parent. We show that the general case (k > 1) is NP-hard, and review heuristic search algorithms for this general case. Finally, we describe a methodology for evaluating learning algorithms, and use this methodology to evaluate various scoring metrics and search procedures.

259 citations


Proceedings Article
18 Aug 1995
TL;DR: In this paper, the authors present simulation algorithms that use the evidence observed at each time step to push the set of trials back towards reality in dynamic probabilistic networks, and compare the performance of each algorithm with likelihood weighting on the original network, and also investigate the benefits of combining the ER and SOF methods.
Abstract: Stochastic simulation algorithms such as likelihood weighting often give fast, accurate approximations to posterior probabilities in probabilistic networks, and are the methods of choice for very large networks. Unfortunately, the special characteristics of dynamic probabilistic networks (DPNs), which are used to represent stochastic temporal processes, mean that standard simulation algorithms perform very poorly. In essence, the simulation trials diverge further and further from reality as the process is observed over time. In this paper, we present simulation algorithms that use the evidence observed at each time step to push the set of trials back towards reality. The first algorithm, "evidence reversal" (ER), restructures each time slice of the DPN so that the evidence nodes for the slice become ancestors of the state variables. The second algorithm, called "survival of the fittest" sampling (SOF), "repopulates" the set of trials at each time step using a stochastic reproduction rate weighted by the likelihood of the evidence according to each trial. We compare the performance of each algorithm with likelihood weighting on the original network, and also investigate the benefits of combining the ER and SOF methods. The ER/SOF combination appears to maintain bounded error independent of the number of time steps in the simulation.
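
The SOF step is essentially a likelihood-weighted resampling of the trial set; a minimal Python sketch follows (ours, not the authors' code), in which sample_transition and evidence_likelihood are hypothetical placeholders for the DPN's transition and evidence models.

import math, random

def sof_step(trials, evidence, sample_transition, evidence_likelihood):
    # Propagate each trial one time step through the transition model.
    propagated = [sample_transition(s) for s in trials]
    # Weight each trial by how well it explains the observed evidence for this slice.
    weights = [evidence_likelihood(s, evidence) for s in propagated]
    if sum(weights) == 0:
        return propagated                        # degenerate case: keep the set unchanged
    # "Repopulate" the trial set in proportion to the weights (stochastic reproduction).
    return random.choices(propagated, weights=weights, k=len(propagated))

# Toy usage: one-dimensional random-walk state with a Gaussian evidence model.
trials = [0.0] * 100
trials = sof_step(trials, evidence=1.2,
                  sample_transition=lambda s: s + random.gauss(0.0, 1.0),
                  evidence_likelihood=lambda s, e: math.exp(-0.5 * (e - s) ** 2))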

231 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose a method for reconstructing the large-scale structure of the universe from noisy, sparse, and incomplete data, based on a linear minimum variance solution given the data and an assumed prior model which specifies the covariance matrix of the field to be reconstructed.
Abstract: The formalism of Wiener filtering is developed here for the purpose of reconstructing the large-scale structure of the universe from noisy, sparse, and incomplete data. The method is based on a linear minimum variance solution, given data and an assumed prior model which specifies the covariance matrix of the field to be reconstructed. While earlier applications of the Wiener filter have focused on estimation, namely suppressing the noise in the measured quantities, we extend the method here to perform both prediction and dynamical reconstruction. The Wiener filter is used to predict the values of unmeasured quantities, such as the density field in unsampled regions of space, or to deconvolve blurred data. The method is developed, within the context of linear gravitational instability theory, to perform dynamical reconstruction of one field which is dynamically related to some other observed field. This is the case, for example, in the reconstruction of the real space galaxy distribution from its redshift distribution or the prediction of the radial velocity field from the observed density field. When the field to be reconstructed is a Gaussian random field, such as the primordial perturbation field predicted by the canonical model of cosmology, the Wiener filter can be pushed to its fullest potential. In such a case the Wiener estimator coincides with the Bayesian estimator designed to maximize the posterior probability. The Wiener filter can also be derived by assuming a quadratic regularization function, in analogy with the "maximum entropy" method. The mean field obtained by the minimal variance solution can be supplemented with constrained realizations of the Gaussian field to create random realizations of the residual from the mean.
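
For reference, the linear minimum variance (Wiener) estimator described here can be written (our notation) for data d = Rs + ε as

\[
\hat{s} = \langle s\, d^{\dagger} \rangle \, \langle d\, d^{\dagger} \rangle^{-1} d
        = S R^{\dagger} \left( R S R^{\dagger} + N \right)^{-1} d ,
\]

where S and N are the signal and noise covariance matrices supplied by the prior model and R is the response operator (blurring, masking, or the mapping from real space to redshift space).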

196 citations


Journal ArticleDOI
TL;DR: A model of fragment generation and retention for data involving two or more copies of the chromosome of interest per clone is presented, and statistical criteria such as minimum obligate breaks, maximum likelihood ratios, and Bayesian posterior probabilities can be used to decide locus order.
Abstract: Radiation hybrid mapping is a somatic cell technique for ordering genetic loci along a chromosome and estimating physical distances between adjacent loci. This paper presents a model of fragment generation and retention for data involving two or more copies of the chromosome of interest per clone. Such polyploid data can be generated by initially irradiating normal diploid cells or by pooling haploid or diploid clones. The current model assumes that fragments are generated in the ancestral cell of a clone according to an independent Poisson breakage process along each chromosome. Once generated, fragments are independently retained in the clone with a common retention probability. On the basis of this and less restrictive retention models, statistical criteria such as minimum obligate breaks, maximum likelihood ratios, and Bayesian posterior probabilities can be used to decide locus order. Distances can be estimated by maximum likelihood. Likelihood computation is particularly challenging, and computing techniques from the theory of hidden Markov chains prove crucial. Within this context it is possible to incorporate typing errors. The statistical tools discussed here are applied to 14 loci on the short arm of human chromosome 4.

170 citations


Journal ArticleDOI
TL;DR: A variety of examples demonstrate that the proposed method can provide classification ability close to or superior to learning VQ while simultaneously providing superior compression performance.
Abstract: We describe a method of combining classification and compression into a single vector quantizer by incorporating a Bayes risk term into the distortion measure used in the quantizer design algorithm. Once trained, the quantizer can operate to minimize the Bayes risk weighted distortion measure if there is a model providing the required posterior probabilities, or it can operate in a suboptimal fashion by minimizing the squared error only. Comparisons are made with other vector quantizer based classifiers, including the independent design of quantization and minimum Bayes risk classification and Kohonen's LVQ. A variety of examples demonstrate that the proposed method can provide classification ability close to or superior to learning VQ while simultaneously providing superior compression performance.

Journal ArticleDOI
TL;DR: In this article, three different Bayesian approaches to sample size calculations based on highest posterior density (HPD) intervals are discussed and illustrated in the context of a binomial experiment.
Abstract: Three different Bayesian approaches to sample size calculations based on highest posterior density (HPD) intervals are discussed and illustrated in the context of a binomial experiment. The preposterior marginal distribution of the data is used to find the sample size needed to attain an expected HPD coverage probability for a given fixed interval length. Alternatively, one can find the sample size required to attain an expected HPD interval length for a fixed coverage. These two criteria can lead to different sample size requirements. In addition to averaging, a worst possible outcome scenario is also considered. The results presented here provide an exact solution to a problem recently addressed in the literature.
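
A minimal preposterior sketch of the fixed-length criterion for a Beta-Binomial model is given below in Python (ours, not the paper's algorithm): for a candidate sample size n, the coverage of the best posterior interval of fixed length l is averaged over the marginal distribution of the data under a Beta(a, b) prior.

from scipy.stats import beta, betabinom

def expected_hpd_coverage(n, l, a=1.0, b=1.0, grid=200):
    total = 0.0
    for x in range(n + 1):                      # possible numbers of successes
        pa, pb = a + x, b + n - x               # conjugate Beta posterior
        # Coverage of the best interval of length l: maximize F(t + l) - F(t) over t in [0, 1].
        best = max(beta.cdf(min(t / grid + l, 1.0), pa, pb) - beta.cdf(t / grid, pa, pb)
                   for t in range(grid + 1))
        total += betabinom.pmf(x, n, a, b) * best   # weight by the preposterior pmf of x
    return total

print(expected_hpd_coverage(100, 0.2))          # expected coverage of a length-0.2 interval at n = 100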

Proceedings Article
20 Aug 1995
TL;DR: A hierarchical clustering algorithm that constructs a set of clusters having the maximum Bayesian posterior probability, the probability that the given texts are classified into clusters, is proposed.
Abstract: Text classification, the grouping of texts into several clusters, has been used as a means of improving both the efficiency and the effectiveness of text retrieval/categorization. In this paper we propose a hierarchical clustering algorithm that constructs a set of clusters having the maximum Bayesian posterior probability, the probability that the given texts are classified into clusters. We call the algorithm Hierarchical Bayesian Clustering (HBC). The advantages of HBC are experimentally verified from several viewpoints: (1) HBC can reconstruct the original clusters more accurately than do other non-probabilistic algorithms; (2) when a probabilistic text categorization is extended to a cluster-based one, the use of HBC offers better performance than does the use of non-probabilistic algorithms.

Journal ArticleDOI
TL;DR: In this article, the authors derive the differential equation that a prior must satisfy if the posterior probability of a one-sided credibility interval for a parametric function and its frequentist probability agree up to O(n^{-1}).
Abstract: SUMMARY We derive the differential equation that a prior must satisfy if the posterior probability of a one-sided credibility interval for a parametric function and its frequentist probability agree up to O(n^{-1}). This equation turns out to be identical with Stein's equation for a slightly different problem, for which our method also provides a rigorous justification. Our method differs in details from Stein's but is similar in spirit to Dawid (1991) and Bickel & Ghosh (1990). Some examples are provided.

Journal ArticleDOI
TL;DR: Bayesian residuals for binary regression data have continuous-valued posterior distributions which can be graphed to learn about outlying observations; two such residual definitions are proposed and used for outlier detection.
Abstract: SUMMARY In a binary response regression model, classical residuals are difficult to define and interpret due to the discrete nature of the response variable. In contrast, Bayesian residuals have continuous-valued posterior distributions which can be graphed to learn about outlying observations. Two definitions of Bayesian residuals are proposed for binary regression data. Plots of the posterior distributions of the basic 'observed - fitted' residuals can be helpful in outlier detection. Alternatively, the notion of a tolerance random variable can be used to define latent data residuals that are functions of the tolerance random variables and the parameters. In the probit setting, these residuals are attractive in that a priori they are a sample from a standard normal distribution, and therefore the corresponding posterior distributions are easy to interpret. These residual definitions are illustrated in examples and contrasted with classical outlier detection methods for binary data.
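
In the probit setting referred to above, the latent data residual can be written (our notation) as

\[
r_i = z_i - x_i^{\mathsf T}\beta, \qquad z_i = x_i^{\mathsf T}\beta + \varepsilon_i, \quad \varepsilon_i \sim N(0,1), \quad y_i = \mathbf{1}\{z_i > 0\},
\]

so that a priori the r_i are a sample from the standard normal distribution; their posterior distributions, obtained for example from Gibbs draws of (z, β), can then be compared with N(0, 1) to flag outlying observations.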

Book ChapterDOI
01 Jan 1995
TL;DR: This paper presents a framework for statistical inference in which an ensemble of parameter vectors is optimized rather than a single parameter vector; the ensemble approximates the posterior probability distribution of the parameters.
Abstract: Ensemble learning by variational free energy minimization is a framework for statistical inference in which an ensemble of parameter vectors is optimized rather than a single parameter vector. The ensemble approximates the posterior probability distribution of the parameters.
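
The objective behind this framework can be stated in one line (our notation): for an approximating ensemble Q(w) over parameters w, the variational free energy

\[
\mathcal{F}(Q) = \int Q(\mathbf{w}) \ln \frac{Q(\mathbf{w})}{P(D \mid \mathbf{w})\, P(\mathbf{w})}\, d\mathbf{w}
              = \mathrm{KL}\!\left(Q \,\|\, P(\mathbf{w} \mid D)\right) - \ln P(D)
\]

is minimized over a tractable family, so that Q is driven toward the posterior P(w | D).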

Journal ArticleDOI
TL;DR: It is suggested that general model comparison, model selection, and model probability estimation be performed using the Schwarz criterion, which can be implemented given the model log likelihoods using only a hand calculator.
Abstract: We investigate the performance of empirical criteria for comparing and selecting quantitative models from among a candidate set. A simulation based on empirically observed parameter values is used to determine which criterion is the most accurate at identifying the correct model specification. The simulation is composed of both nested and nonnested linear regression models. We then derive posterior probability estimates of the superiority of the alternative models from each of the criteria and evaluate the relative accuracy, bias, and information content of these probabilities. To investigate whether additional accuracy can be derived from combining criteria, a method for obtaining a joint prediction from combinations of the criteria is proposed and the incremental improvement in selection accuracy considered. Based on the simulation, we conclude that most leading criteria perform well in selecting the best model, and several criteria also produce accurate probabilities of model superiority. Computationally intensive criteria failed to perform better than criteria which were computationally simpler. Also, the use of several criteria in combination failed to appreciably outperform the use of one model. The Schwarz criterion performed best overall in terms of selection accuracy, accuracy of posterior probabilities, and ease of use. Thus, we suggest that general model comparison, model selection, and model probability estimation be performed using the Schwarz criterion, which can be implemented given the model log likelihoods using only a hand calculator.
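
As a sketch of the hand-calculator recipe suggested above (assuming equal prior model probabilities; the numbers below are made up), the Schwarz criterion and the implied posterior model probabilities can be computed from the model log likelihoods as follows.

import math

def schwarz(log_lik, k, n):
    # Schwarz criterion for a model with k free parameters fit to n observations.
    return log_lik - 0.5 * k * math.log(n)

def posterior_model_probs(log_liks, ks, n):
    # Approximate posterior model probabilities from the Schwarz criteria.
    s = [schwarz(ll, k, n) for ll, k in zip(log_liks, ks)]
    m = max(s)                                   # subtract the max for numerical stability
    w = [math.exp(si - m) for si in s]
    return [wi / sum(w) for wi in w]

# Two hypothetical regression models fit to the same n = 50 observations:
print(posterior_model_probs(log_liks=[-120.4, -118.9], ks=[3, 5], n=50))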

Journal ArticleDOI
TL;DR: It is shown that a condition previously obtained by the authors as necessary for the convergence of a suitably centered (and normalized) posterior to a constant limit, expressed in terms of the limiting likelihood ratio process, is also sufficient to imply posterior convergence.
Abstract: A general (asymptotic) theory of estimation was developed by Ibragimov and Has’minskii under certain conditions on the normalized likelihood ratios. In an earlier work, the present authors studied the limiting behaviour of the posterior distributions under the general setup of Ibragimov and Has’minskii. In particular, they obtained a necessary condition for the convergence of a suitably centered (and normalized) posterior to a constant limit in terms of the limiting likelihood ratio process. In this paper, it is shown that this condition is also sufficient to imply the posterior convergence. Some related results are also presented.

Journal ArticleDOI
TL;DR: In this paper, the authors consider a process control procedure with fixed sample sizes and sampling intervals, where the fraction defective is the quality variable of interest, and they show that relatively standard cost assumptions lead to the formulation of the process control problem as a partially observed Markov decision process.
Abstract: We consider a process control procedure with fixed sample sizes and sampling intervals, where the fraction defective is the quality variable of interest, a standard attributes control chart methodology. We show that relatively standard cost assumptions lead to the formulation of the process control problem as a partially observed Markov decision process, where the posterior probability of a process shift is a sufficient statistic for decision making. We characterize features of the optimal solution and show that the optimal policy has a simple control limit structure. Numerical results are provided which indicate that the procedure may provide significant savings over non-Bayesian techniques.
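
A sketch of the posterior update that drives such a control-limit policy is given below in Python (an illustrative parameterization, not necessarily the paper's model): the process is either in control with fraction defective p0 or shifted to p1, a shift occurs with probability lam in each sampling interval, and x defectives are observed in a sample of size n.

from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def update_shift_prob(prior, x, n, p0=0.01, p1=0.05, lam=0.02):
    # Predictive shift probability before sampling, then Bayes update on the observed count.
    pred = prior + (1 - prior) * lam
    num = pred * binom_pmf(x, n, p1)
    return num / (num + (1 - pred) * binom_pmf(x, n, p0))

# The policy signals when the posterior exceeds a control limit, e.g. 0.5:
p = 0.0
for defects in [0, 1, 0, 3, 2]:
    p = update_shift_prob(p, defects, n=50)
    print(round(p, 3), "-> investigate" if p > 0.5 else "")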

Patent
23 Jun 1995
TL;DR: In this article, a Bayesian updating rule is employed to build a local posterior distribution for the primary variable at each simulated location, where the posterior distribution is the product of a Gaussian kernel function obtained by simple kriging of the primary variables and a secondary probability function obtained directly from a scatter diagram between primary and secondary variables.
Abstract: A multivariate stochastic simulation application that involves the mapping of a primary variable from a combination of sparse primary data and more densely sampled secondary data. The method is applicable when the relationship between the simulated primary variable and one or more secondary variables is non-linear. The method employs a Bayesian updating rule to build a local posterior distribution for the primary variable at each simulated location. The posterior distribution is the product of a Gaussian kernel function obtained by simple kriging of the primary variable and a secondary probability function obtained directly from a scatter diagram between primary and secondary variables.

Journal ArticleDOI
TL;DR: A probabilistic interpretation is presented for two important issues in neural network based classification, namely the interpretation of discriminative training criteria and the neural network outputs, and the interpretation of the structure of the neural network; the criteria can be interpreted in terms of weighted maximum likelihood estimation.
Abstract: A probabilistic interpretation is presented for two important issues in neural network based classification, namely the interpretation of discriminative training criteria and the neural network outputs as well as the interpretation of the structure of the neural network. The problem of finding a suitable structure of the neural network can be linked to a number of well established techniques in statistical pattern recognition. Discriminative training of neural network outputs amounts to approximating the class or posterior probabilities of the classical statistical approach. This paper extends these links by introducing and analyzing novel criteria such as maximizing the class probability and minimizing the smoothed error rate. These criteria are defined in the framework of class conditional probability density functions. We show that these criteria can be interpreted in terms of weighted maximum likelihood estimation. In particular, this approach covers widely used techniques such as corrective training, learning vector quantization, and linear discriminant analysis.
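
The link asserted here rests on a standard result (our notation): with 1-of-K target vectors t and either the squared-error criterion E[\sum_k (g_k(x) - t_k)^2] or the cross-entropy criterion -E[\sum_k t_k \ln g_k(x)], the minimizing network outputs are

\[
g_k^{*}(x) = \Pr(k \mid x),
\]

i.e. at the optimum the outputs approximate the class posterior probabilities of the classical statistical approach.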

Book ChapterDOI
09 Jul 1995
TL;DR: A hybrid scheme which uses decision trees to find the relevant structure in high-dimensional classification problems and then uses local kernel density estimates to fit smooth probability estimates within this structure is discussed.
Abstract: A novel method for combining decision trees and kernel density estimators is proposed. Standard classification trees, or class probability trees, provide piecewise constant estimates of class posterior probabilities. Kernel density estimators can provide smooth non-parametric estimates of class probabilities, but scale poorly as the dimensionality of the problem increases. This paper discusses a hybrid scheme which uses decision trees to find the relevant structure in high-dimensional classification problems and then uses local kernel density estimates to fit smooth probability estimates within this structure. Experimental results on simulated data indicate that the method provides substantial improvement over trees or density methods alone for certain classes of problems. The paper briefly discusses various extensions of the basic approach and the types of application for which the method is best suited.

Journal ArticleDOI
TL;DR: This paper describes methods for assessing the robustness of the posterior distribution to the specification of the prior and illustrates how to use these methods to help a data monitoring committee decide whether or not to stop a trial early.
Abstract: Bayesian methods for the analysis of clinical trials data have received increasing attention recently as they offer an approach for dealing with difficult problems that arise in practice. A major criticism of the Bayesian approach, however, has focused on the need to specify a single, often subjective, prior distribution for the parameters of interest. In an attempt to address this criticism, we describe methods for assessing the robustness of the posterior distribution to the specification of the prior. The robust Bayesian approach to data analysis replaces the prior distribution with a class of prior distributions and investigates how the inferences might change as the prior varies over this class. The purpose of this paper is to illustrate the application of robust Bayesian methods to the analysis of clinical trials data. Using two examples of clinical trials taken from the literature, we illustrate how to use these methods to help a data monitoring committee decide whether or not to stop a trial early.

Journal ArticleDOI
TL;DR: This note is a discussion of the problem which the forensic scientist faces, particularly when at court, of avoiding making probability statements which are logically incorrect.

Journal ArticleDOI
TL;DR: In this paper, the authors developed the statistical mechanics formulation of the image restoration problem and established the posterior probability distribution for restored images, for given data (corrupted image) and prior (assumptions about source and corruption process).
Abstract: We develop the statistical mechanics formulation of the image restoration problem, pioneered by Geman and Geman (1984). Using Bayesian methods we establish the posterior probability distribution for restored images, for given data (corrupted image) and prior (assumptions about source and corruption process). In the simplest cases, studied here, the posterior is controlled by a cost function analogous to the configurational energy of an Ising model with local fields whose sense is defined by the data. Through a combination of Monte Carlo simulation and mean-field theory we address three key issues. First, we explore the sensitivity of the posterior distribution to the choice of prior parameters: we find phase transitions separating regions in which the distribution is effective (data-dominated) from regions in which it is ineffective (prior-dominated). Second, we examine the question of how best to use the posterior distribution to prescribe a single "optimal" restored image: we argue that the mean of the posterior is, in general, to be preferred over the mode, both in principle and in practice. Finally, borrowing from Monte Carlo techniques for free-energy calculations, we address the question of prior parameter estimation within the "evidence" framework of Gull (1989) and MacKay (1992): our results suggest that parameters identified by this framework provide effective priors, leading to optimal restoration, only to the extent that the forms of the priors are well matched to the processes they claim to represent.
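
In the simplest binary case studied here, the posterior takes a Gibbs/Ising form (our notation, with pixels x_i and data y_i taking values ±1):

\[
P(x \mid y) \propto \exp\Big\{ \beta \sum_{\langle i j \rangle} x_i x_j + h \sum_i x_i y_i \Big\},
\]

where the first (prior) term favours locally smooth configurations and the second couples each pixel to the data through a local field of strength h fixed by the assumed corruption process.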

Journal ArticleDOI
TL;DR: In this paper, necessary and sufficient conditions for the existence of the posterior distribution of the variance components in a class of mixed models for binomial responses are given for the binomial response.
Abstract: SUMMARY Necessary and sufficient conditions are given for the existence of the posterior distribution of the variance components in a class of mixed models for binomial responses. The implications of our results are illustrated through an example.

Journal ArticleDOI
LW Hepple
TL;DR: In this paper, the problems of specification and non-nested model comparison in spatial and network econometrics are examined, and the Bayesian posterior probabilities approach is developed.
Abstract: In this paper the problems of specification and nonnested model comparison in spatial and network econometrics are examined, and the Bayesian posterior probabilities approach is developed. The theo...

Journal ArticleDOI
TL;DR: The problem of combining information from I binomial experiments, each having a distinct probability of success θi, is considered; instead of using a standard exchangeable prior for θ = (θ1, …, θI), a more flexible distribution is proposed that takes into account various degrees of similarity among the θi's.
Abstract: The problem of combining information related to I binomial experiments, each having a distinct probability of success θi, is considered. Instead of using a standard exchangeable prior for θ = (θ1, …, θI), we propose a more flexible distribution that takes into account various degrees of similarity among the θi's. Using ideas developed by Malec and Sedransk, we consider a partition g of the experiments and take the θi's belonging to the same partition subset to be exchangeable and the θi's belonging to distinct subsets to be independent. Next we perform Bayesian inference on θ conditional on g. Of course, one is typically uncertain about which partition to use, and so a prior distribution is assigned on a set of plausible partitions g. The final inference on θ is obtained by combining the conditional inferences according to the posterior distribution of g. The methodology adopted in this article offers a wide flexibility in structuring the dependence among the θi's. This allows the …
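
The final step described above is a model average over partitions (our notation):

\[
p(\boldsymbol{\theta} \mid \text{data}) = \sum_{g} p(g \mid \text{data})\, p(\boldsymbol{\theta} \mid g, \text{data}),
\]

with the θi's exchangeable within each subset of g and independent across subsets.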

Proceedings Article
27 Nov 1995
TL;DR: REMAP is introduced, an approach for the training and estimation of posterior probabilities using a recursive algorithm that is reminiscent of the EM-based Forward-Backward (Liporace 1982) algorithm for the estimation of sequence likelihoods.
Abstract: In this paper, we introduce REMAP, an approach for the training and estimation of posterior probabilities using a recursive algorithm that is reminiscent of the EM-based Forward-Backward (Liporace 1982) algorithm for the estimation of sequence likelihoods. Although very general, the method is developed in the context of a statistical model for transition-based speech recognition using Artificial Neural Networks (ANN) to generate probabilities for Hidden Markov Models (HMMs). In the new approach, we use local conditional posterior probabilities of transitions to estimate global posterior probabilities of word sequences. Although we still use ANNs to estimate posterior probabilities, the network is trained with targets that are themselves estimates of local posterior probabilities. An initial experimental result shows a significant decrease in error-rate in comparison to a baseline system.

Book ChapterDOI
01 Jan 1995
TL;DR: The treatment of uncertainty in Artificial Intelligence can be based on numerical and non-numerical methods, and among the numerical methods different uncertainty measures have been proposed to manage vague information and imprecise data.
Abstract: The treatment of uncertainty in Artificial Intelligence can be based on numerical and non-numerical methods. Among the numerical methods, different uncertainty measures have been proposed in the literature to manage vague information and imprecise data.