scispace - formally typeset

Showing papers on "Posterior probability published in 1996"


Journal ArticleDOI
TL;DR: The results of the method are found to be insensitive to changes in the rate parameter of the branching process, and the best trees estimated by the new method are the same as those from the maximum likelihood analysis of separate topologies, but the posterior probabilities are quite different from the bootstrap proportions.
Abstract: A new method is presented for inferring evolutionary trees using nucleotide sequence data. The birth-death process is used as a model of speciation and extinction to specify the prior distribution of phylogenies and branching times. Nucleotide substitution is modeled by a continuous-time Markov process. Parameters of the branching model and the substitution model are estimated by maximum likelihood. The posterior probabilities of different phylogenies are calculated and the phylogeny with the highest posterior probability is chosen as the best estimate of the evolutionary relationship among species. We refer to this as the maximum posterior probability (MAP) tree. The posterior probability provides a natural measure of the reliability of the estimated phylogeny. Two example data sets are analyzed to infer the phylogenetic relationship of human, chimpanzee, gorilla, and orangutan. The best trees estimated by the new method are the same as those from the maximum likelihood analysis of separate topologies, but the posterior probabilities are quite different from the bootstrap proportions. The results of the method are found to be insensitive to changes in the rate parameter of the branching process.
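
As a hedged sketch of the method's final step (the topology log-likelihoods and priors below are made-up numbers, not values from the paper; in the method itself they come from the substitution model and the birth-death prior), the posterior probability of each candidate tree follows from Bayes' rule:

```python
import math

# Hypothetical per-topology log-likelihoods and a uniform prior over the
# three rooted topologies for human (H), chimpanzee (C), gorilla (G).
log_likelihoods = {"((H,C),G)": -2653.2, "((H,G),C)": -2658.7, "((C,G),H)": -2659.1}
priors = {"((H,C),G)": 1 / 3, "((H,G),C)": 1 / 3, "((C,G),H)": 1 / 3}

def topology_posteriors(log_likelihoods, priors):
    """Posterior P(tree | data) proportional to P(data | tree) * P(tree)."""
    # Work in log space and subtract the maximum for numerical stability.
    log_joint = {t: ll + math.log(priors[t]) for t, ll in log_likelihoods.items()}
    m = max(log_joint.values())
    unnorm = {t: math.exp(v - m) for t, v in log_joint.items()}
    z = sum(unnorm.values())
    return {t: u / z for t, u in unnorm.items()}

post = topology_posteriors(log_likelihoods, priors)
map_tree = max(post, key=post.get)  # the maximum posterior probability (MAP) tree
```

Note how a log-likelihood gap of only a few units already concentrates the posterior heavily on one topology.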

1,508 citations


01 Jan 1996

1,282 citations


Book ChapterDOI
01 Jan 1996
TL;DR: In this article, it was shown that the search problem of identifying a Bayesian network with a relative posterior probability greater than a given constant is NP-complete, when the BDe metric is used.
Abstract: Algorithms for learning Bayesian networks from data have two components: a scoring metric and a search procedure. The scoring metric computes a score reflecting the goodness-of-fit of the structure to the data. The search procedure tries to identify network structures with high scores. Heckerman et al. (1995) introduce a Bayesian metric, called the BDe metric, that computes the relative posterior probability of a network structure given data. In this paper, we show that the search problem of identifying a Bayesian network—among those where each node has at most K parents—that has a relative posterior probability greater than a given constant is NP-complete, when the BDe metric is used.

1,133 citations


Journal ArticleDOI
TL;DR: This article introduces a new criterion called the intrinsic Bayes factor, which is fully automatic in the sense of requiring only standard noninformative priors for its computation and yet seems to correspond to very reasonable actual Bayes factors.
Abstract: In the Bayesian approach to model selection or hypothesis testing with models or hypotheses of differing dimensions, it is typically not possible to utilize standard noninformative (or default) prior distributions. This has led Bayesians to use conventional proper prior distributions or crude approximations to Bayes factors. In this article we introduce a new criterion called the intrinsic Bayes factor, which is fully automatic in the sense of requiring only standard noninformative priors for its computation and yet seems to correspond to very reasonable actual Bayes factors. The criterion can be used for nested or nonnested models and for multiple model comparison and prediction. From another perspective, the development suggests a general definition of a “reference prior” for model comparison.

993 citations


Journal ArticleDOI
TL;DR: A locally adaptive form of nearest neighbor classification is proposed to ameliorate the curse of dimensionality, together with a method for global dimension reduction that combines local dimension information.
Abstract: Nearest neighbour classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions. We propose a locally adaptive form of nearest neighbour classification to try to ameliorate this curse of dimensionality. We use a local linear discriminant analysis to estimate an effective metric for computing neighbourhoods. We determine the local decision boundaries from centroid information, and then shrink neighbourhoods in directions orthogonal to these local decision boundaries, and elongate them parallel to the boundaries. Thereafter, any neighbourhood-based classifier can be employed, using the modified neighbourhoods. The posterior probabilities tend to be more homogeneous in the modified neighbourhoods. We also propose a method for global dimension reduction, that combines local dimension information. In a number of examples, the methods demonstrate the potential for substantial improvements over nearest neighbour classification.
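
A minimal sketch of the neighbourhood adaptation, under assumed simplifications (two classes, one adaptation pass, a plain eigendecomposition for the inverse square root; `dann_metric` and its default parameters are illustrative, not the authors' implementation):

```python
import numpy as np

def dann_metric(X, y, x0, k=50, eps=1.0):
    """Locally adaptive metric in the spirit of discriminant-adaptive
    nearest neighbours: a sketch, not the published algorithm."""
    # 1. Take the k Euclidean nearest neighbours of the query point x0.
    d = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(d)[:k]
    Xk, yk = X[idx], y[idx]
    # 2. Local within-class (W) and between-class (B) scatter matrices.
    mu = Xk.mean(axis=0)
    p = X.shape[1]
    W, B = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(yk):
        Xc = Xk[yk == c]
        pc = len(Xc) / len(Xk)
        W += pc * np.cov(Xc.T, bias=True)
        diff = (Xc.mean(axis=0) - mu)[:, None]
        B += pc * (diff @ diff.T)
    # 3. Sigma = W^{-1/2} (W^{-1/2} B W^{-1/2} + eps*I) W^{-1/2}: distance is
    # stretched across the local decision boundary and left roughly
    # unchanged parallel to it, so neighbourhoods elongate along it.
    vals, vecs = np.linalg.eigh(W + 1e-8 * np.eye(p))
    W_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
    Bstar = W_inv_half @ B @ W_inv_half
    return W_inv_half @ (Bstar + eps * np.eye(p)) @ W_inv_half

def adaptive_distances(X, x0, Sigma):
    """Squared distances (x - x0)^T Sigma (x - x0) under the local metric."""
    diffs = X - x0
    return np.einsum("ij,jk,ik->i", diffs, Sigma, diffs)
```

Because distances grow fastest in the direction that separates the local class centroids, a fixed-size neighbourhood under this metric is automatically shrunk orthogonal to the decision boundary.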

908 citations


Journal ArticleDOI
TL;DR: The method of Hidden Markov Models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences, and it is shown how to use the Newton-Raphson method to estimate branch lengths of a phylogeny and to infer from a phylogeny which assignment of rates to sites has the largest posterior probability.
Abstract: The method of Hidden Markov Models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences. Rates of evolution at different sites are assumed to be drawn from a set of possible rates, with a finite number of possibilities. The overall likelihood of a phylogeny is calculated as a sum of terms, each term being the probability of the data given a particular assignment of rates to sites, times the prior probability of that particular combination of rates. The probabilities of different rate combinations are specified by a stationary Markov chain that assigns rate categories to sites. While there will be a very large number of possible ways of assigning rates to sites, a simple recursive algorithm allows the contributions to the likelihood from all possible combinations of rates to be summed, in a time proportional to the number of different rates at a single site. Thus with three rates, the effort involved is no greater than three times that for a single rate. This “Hidden Markov Model” method allows for rates to differ between sites and for correlations between the rates of neighboring sites. By summing over all possibilities it does not require us to know the rates at individual sites. However, it does not allow for correlation of rates at nonadjacent sites, nor does it allow for a continuous distribution of rates over sites. It is shown how to use the Newton-Raphson method to estimate branch lengths of a phylogeny and to infer from a phylogeny what assignment of rates to sites has the largest posterior probability. An example is given using β-hemoglobin DNA sequences in eight mammal species; the regions of high and low evolutionary rates are inferred, as is the average length of patches of similar rates.
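
The recursive summation is the standard HMM forward algorithm. A sketch, with `site_liks` standing in for the per-site phylogenetic likelihoods (the test uses hypothetical placeholder values; the paper computes them from the tree and substitution model):

```python
import numpy as np

def hmm_site_likelihood(site_liks, trans, init):
    """Sum the likelihood over all assignments of rate categories to sites.

    site_liks[i, r]: P(data at site i | rate category r)  (placeholder values)
    trans[r, s]:     P(rate s at site i+1 | rate r at site i)
    init[r]:         stationary probability of rate r at the first site
    """
    alpha = init * site_liks[0]  # joint prob. of site-1 data and rate r
    for i in range(1, len(site_liks)):
        alpha = (alpha @ trans) * site_liks[i]
    return alpha.sum()  # total likelihood, summed over all rate paths
```

The recursion is linear in the number of sites, rather than exponential as naive enumeration of all rates-to-sites assignments would be.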

831 citations


Journal ArticleDOI
TL;DR: The Gibbs sampler may be used to explore the posterior distribution without ever having established propriety of the posterior, showing that the output from a Gibbs chain corresponding to an improper posterior may appear perfectly reasonable.
Abstract: Often, either from a lack of prior information or simply for convenience, variance components are modeled with improper priors in hierarchical linear mixed models. Although the posterior distributions for these models are rarely available in closed form, the usual conjugate structure of the prior specification allows for painless calculation of the Gibbs conditionals. Thus the Gibbs sampler may be used to explore the posterior distribution without ever having established propriety of the posterior. An example is given showing that the output from a Gibbs chain corresponding to an improper posterior may appear perfectly reasonable. Thus one cannot expect the Gibbs output to provide a “red flag,” informing the user that the posterior is improper. The user must demonstrate propriety before a Markov chain Monte Carlo technique is used. A theorem is given that classifies improper priors according to the propriety of the resulting posteriors. Applications concerning Bayesian analysis of animal breeding...
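
The phenomenon is easy to reproduce with a textbook example (not the animal-breeding model of the paper): the full conditionals X|y ~ Exp(y) and Y|x ~ Exp(x) are perfectly proper, yet the implied joint f(x, y) ∝ exp(−xy) is improper. The Gibbs chain below runs without complaint:

```python
import random

def gibbs_improper(n_iter=5000, seed=42):
    """Gibbs sampler with proper conditionals X|y ~ Exp(y), Y|x ~ Exp(x)
    but no proper joint distribution (a classic illustrative example)."""
    rng = random.Random(seed)
    x, y = 1.0, 1.0
    xs = []
    for _ in range(n_iter):
        x = rng.expovariate(y)  # draw X | Y = y
        y = rng.expovariate(x)  # draw Y | X = x
        xs.append(x)
    return xs

samples = gibbs_improper()
```

Nothing in the trace warns that no proper posterior exists, which is exactly the paper's point: propriety must be demonstrated analytically before trusting MCMC output.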

450 citations


Journal ArticleDOI
TL;DR: A notion of causal independence is presented that enables one to further factorize the conditional probabilities into a combination of even smaller factors and consequently obtain a finer-grain factorization of the joint probability.
Abstract: A new method is proposed for exploiting causal independencies in exact Bayesian network inference. A Bayesian network can be viewed as representing a factorization of a joint probability into the multiplication of a set of conditional probabilities. We present a notion of causal independence that enables one to further factorize the conditional probabilities into a combination of even smaller factors and consequently obtain a finer-grain factorization of the joint probability. The new formulation of causal independence lets us specify the conditional probability of a variable given its parents in terms of an associative and commutative operator, such as "or", "sum" or "max", on the contribution of each parent. We start with a simple algorithm VE for Bayesian network inference that, given evidence and a query variable, uses the factorization to find the posterior distribution of the query. We show how this algorithm can be extended to exploit causal independence. Empirical studies, based on the CPCS networks for medical diagnosis, show that this method is more efficient than previous methods and allows for inference in larger networks than previous algorithms.
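
Noisy-OR is the standard concrete instance of causal independence (used here as an assumed illustration, not necessarily the paper's running example): a full conditional table over n binary parents needs 2^n entries, while noisy-OR needs only n per-parent inhibition probabilities, combined with the associative, commutative "or" operator:

```python
def noisy_or(parent_states, q):
    """P(effect present | parents) under noisy-OR causal independence.

    Each active parent i independently fails to produce the effect with
    inhibition probability q[i]; contributions combine by logical 'or'.
    """
    fail = 1.0
    for active, qi in zip(parent_states, q):
        if active:
            fail *= qi  # the effect is absent only if every cause fails
    return 1.0 - fail
```

Because the operator is associative and commutative, the conditional probability factorizes parent by parent, which is what permits the finer-grain factorization exploited by the extended VE algorithm.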

449 citations


Journal ArticleDOI
TL;DR: In this article, a hierarchical model is proposed for a Bayesian semiparametric analysis of randomised block experiments, in which a Dirichlet process is inserted at the middle stage for the distribution of the block effects.
Abstract: SUMMARY A model is proposed for a Bayesian semiparametric analysis of randomised block experiments. The model is a hierarchical model in which a Dirichlet process is inserted at the middle stage for the distribution of the block effects. This model allows an arbitrary distribution of block effects, and it results in effective estimates of treatment contrasts, block effects and the distribution of block effects. An effective computational strategy is presented for describing the posterior distribution.

348 citations


Journal ArticleDOI
TL;DR: It is shown that a similarity network can always answer any query of the form “What is the posterior probability of a hypothesis given evidence?”; this property is called diagnostic completeness.

Book ChapterDOI
26 Aug 1996
TL;DR: New results are presented which show that within a Bayesian framework not only grammars, but also logic programs are learnable with arbitrarily low expected error from positive examples only and the upper bound for expected error of a learner which maximises the Bayes' posterior probability is within a small additive term of one which does the same from a mixture of positive and negative examples.
Abstract: Gold showed in 1967 that not even regular grammars can be exactly identified from positive examples alone. Since it is known that children learn natural grammars almost exclusively from positive examples, Gold's result has been used as a theoretical support for Chomsky's theory of innate human linguistic abilities. In this paper new results are presented which show that within a Bayesian framework not only grammars, but also logic programs are learnable with arbitrarily low expected error from positive examples only. In addition, we show that the upper bound for expected error of a learner which maximises the Bayes' posterior probability when learning from positive examples is within a small additive term of one which does the same from a mixture of positive and negative examples. An Inductive Logic Programming implementation is described which avoids the pitfalls of greedy search by global optimisation of this function during the local construction of individual clauses of the hypothesis. Results of testing this implementation on artificially-generated data-sets are reported. These results are in agreement with the theoretical predictions.

Journal ArticleDOI
TL;DR: Simulations show that density estimation correctly finds movement directions for nonuniform distributions of preferred directions and noncosine cell tuning curves, whereas the population vector method fails for these cases.
Abstract: 1. Electrophysiological recording data from multiple cells in motor cortex and elsewhere often are interpreted using the population vector method pioneered by Georgopoulos and coworkers. This paper proposes an alternative method for interpreting coding across populations of cells that may succeed under circumstances in which the population vector fails. 2. Population codes are analyzed using probability theory to find the complete conditional probability density of a movement parameter given the firing pattern of a set of cells. 3. The conditional probability density when a single cell fires is proportional to the shape of the cell's tuning curve of firing rate in response to different movement parameters. 4. The conditional density when multiple cells fire is proportional to the product of their tuning curves. 5. Movement parameters can be estimated from the conditional density using statistical maximum likelihood or minimum mean-squared error methods. 6. Simulations show that density estimation correctly finds movement directions for nonuniform distributions of preferred directions and noncosine cell tuning curves, whereas the population vector method fails for these cases. 7. Probability methods thus provide a statistically based alternative to the population vector for interpreting electrophysiological recording data from multiple cells.
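
A sketch of the decoding step under assumed cosine tuning and Poisson-style spike-count weighting (the function and its parameter values are illustrative, not the paper's):

```python
import numpy as np

def decode_direction(spike_counts, preferred, directions, baseline=10.0, gain=8.0):
    """Conditional density of movement direction given population firing.

    The density is proportional to the product of the firing cells'
    tuning curves, each raised to the cell's spike count (a Poisson-style
    weighting); assumed cosine tuning with a positive baseline.
    """
    # tuning[c, d]: mean rate of cell c for candidate direction d
    tuning = baseline + gain * np.cos(directions[None, :] - preferred[:, None])
    tuning = np.clip(tuning, 1e-9, None)
    log_density = (spike_counts[:, None] * np.log(tuning)).sum(axis=0) - tuning.sum(axis=0)
    log_density -= log_density.max()  # stabilize before exponentiating
    density = np.exp(log_density)
    density /= density.sum()
    return density, directions[np.argmax(density)]  # full density + ML estimate
```

Because each cell contributes its own tuning curve to the product, a nonuniform distribution of preferred directions introduces no bias here, whereas the population vector sum is pulled toward over-represented directions.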

Journal ArticleDOI
07 Sep 1996-BMJ
TL;DR: Data presented as a series of posterior probability distributions would be a much better guide to policy, reflecting the reality that degrees of belief are often continuous, not dichotomous, and often vary from one person to another in the face of inconclusive evidence.
Abstract: The recent controversy over the increased risk of venous thrombosis with third generation oral contraceptives illustrates the public policy dilemma that can be created by relying on conventional statistical tests and estimates: case-control studies showed a significant increase in risk and forced a decision either to warn or not to warn. Conventional statistical tests are an improper basis for such decisions because they dichotomise results according to whether they are or are not significant and do not allow decision makers to take explicit account of additional evidence—for example, of biological plausibility or of biases in the studies. A Bayesian approach overcomes both these problems. A Bayesian analysis starts with a “prior” probability distribution for the value of interest (for example, a true relative risk)—based on previous knowledge—and adds the new evidence (via a model) to produce a “posterior” probability distribution. Because different experts will have different prior beliefs sensitivity analyses are important to assess the effects on the posterior distributions of these differences. Sensitivity analyses should also examine the effects of different assumptions about biases and about the model which links the data with the value of interest. One advantage of this method is that it allows such assumptions to be handled openly and explicitly. Data presented as a series of posterior probability distributions would be a much better guide to policy, reflecting the reality that degrees of belief are often continuous, not dichotomous, and often vary from one person to another in the face of inconclusive evidence. Every five to 10 years a “pill scare” hits the headlines. Imagine that you are the chairperson of the Committee on Safety of Medicines. You have been sent the galley proofs of four case-control studies showing that the leading brands of oral contraceptive, which have been widely used for some five years, …
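
On the log relative-risk scale, the prior-to-posterior update is, in the simplest conjugate-normal case, just precision-weighted averaging. A sketch with illustrative numbers (a sceptical prior centred on RR = 1 combined with a hypothetical study estimating RR ≈ 2; these are not the figures from the contraceptive studies):

```python
import math

def posterior_log_rr(prior_mean, prior_sd, est_log_rr, est_se):
    """Normal prior + normal likelihood on the log relative-risk scale.

    Returns the posterior mean and sd; precisions (1/variance) add, and
    the posterior mean is the precision-weighted average.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / est_se ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * est_log_rr) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Sceptical prior centred on log RR = 0; study reports RR ~ 2 with se 0.25.
m, s = posterior_log_rr(0.0, 0.5, math.log(2.0), 0.25)
```

Reporting the full posterior (m, s) rather than a significance verdict lets different priors, and hence different experts, be swapped in for sensitivity analysis.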

Posted Content
TL;DR: In this article, the authors use the technique of low-temperature expansions to derive a systematic series for the Bayesian posterior probability of a model family that significantly extends known results in the literature.
Abstract: The task of parametric model selection is cast in terms of a statistical mechanics on the space of probability distributions. Using the techniques of low-temperature expansions, we arrive at a systematic series for the Bayesian posterior probability of a model family that significantly extends known results in the literature. In particular, we arrive at a precise understanding of how Occam's Razor, the principle that simpler models should be preferred until the data justifies more complex models, is automatically embodied by probability theory. These results require a measure on the space of model parameters and we derive and discuss an interpretation of Jeffreys' prior distribution as a uniform prior over the distributions indexed by a family. Finally, we derive a theoretical index of the complexity of a parametric family relative to some true distribution that we call the “razor” of the model. The form of the razor immediately suggests several interesting questions in the theory of learning that can be studied using the techniques of statistical mechanics.

Journal ArticleDOI
TL;DR: Based on orthogonalization of the space of candidate predictors, this approach can approximate the posterior probabilities of models by products of predictor-specific terms, and leads to an importance sampling function for sampling directly from the joint distribution over the model space, without resorting to Markov chains.
Abstract: We introduce an approach and algorithms for model mixing in large prediction problems with correlated predictors. We focus on the choice of predictors in linear models, and mix over possible subsets of candidate predictors. Our approach is based on expressing the space of models in terms of an orthogonalization of the design matrix. Advantages are both statistical and computational. Statistically, orthogonalization often leads to a reduction in the number of competing models by eliminating correlations. Computationally, large model spaces cannot be enumerated; recent approaches are based on sampling models with high posterior probability via Markov chains. Based on orthogonalization of the space of candidate predictors, we can approximate the posterior probabilities of models by products of predictor-specific terms. This leads to an importance sampling function for sampling directly from the joint distribution over the model space, without resorting to Markov chains. Compared to the latter, ortho...

Journal ArticleDOI
TL;DR: This article proposes a method for simultaneous variable selection and outlier identification based on the computation of posterior model probabilities, which avoids the problem that the result depends upon the order in which variable selection and outlier identification are carried out.

Book ChapterDOI
01 Jan 1996
TL;DR: This chapter describes a use of recurrent neural networks (i.e., feedback is incorporated in the computation) as an acoustic model for continuous speech recognition as well as an appropriate parameter estimation procedure.
Abstract: This chapter describes a use of recurrent neural networks (i.e., feedback is incorporated in the computation) as an acoustic model for continuous speech recognition. The form of the recurrent neural network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probability of the possible phones given the observed acoustic signal. The posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of using recurrent networks are that they require a small number of parameters and provide a fast decoding capability (relative to conventional, large-vocabulary, HMM systems).
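
The posterior-to-scaled-likelihood conversion is a direct application of Bayes' rule, sketched here with hypothetical phone labels and values:

```python
def scaled_likelihoods(posteriors, priors):
    """Convert network outputs P(phone | acoustics) into scaled likelihoods.

    By Bayes' rule, P(phone | acoustics) / P(phone) is proportional to
    P(acoustics | phone), so dividing out the phone priors yields a
    quantity usable as an emission probability in Viterbi decoding.
    """
    return {ph: p / priors[ph] for ph, p in posteriors.items()}
```

The unknown factor P(acoustics) is constant across phones for a given frame, so it cancels in the decoder's comparisons.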

Journal ArticleDOI
TL;DR: The mass assignment theory of conditional probabilities is shown to be probability/possibility consistent and an alternative theory of unconditional probabilities based on mass assignments is presented together with a number of results illustrating some intuitive properties.

Book ChapterDOI
01 Jan 1996
TL;DR: Two approximate methods for computational implementation of Bayesian hierarchical models which include unknown hyperparameters such as regularization constants are examined, and comparisons are made with the ideal hierarchical Bayesian solution.
Abstract: I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants. In the ‘evidence framework’ the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized hyperparameters are used to define a Gaussian approximation to the posterior distribution. In the alternative ‘MAP’ method, the true posterior probability is found by integrating over the hyperparameters. The true posterior is then maximized over the model parameters, and a Gaussian approximation is made. The similarities of the two approaches, and their relative merits, are discussed, and comparisons are made with the ideal hierarchical Bayesian solution.

Journal ArticleDOI
TL;DR: The problem of simultaneous wavelet estimation and deconvolution is investigated with a Bayesian approach under the assumption that the reflectivity obeys a Bernoulli-Gaussian distribution.
Abstract: The problem of simultaneous wavelet estimation and deconvolution is investigated with a Bayesian approach under the assumption that the reflectivity obeys a Bernoulli-Gaussian distribution. Unknown quantities, including the seismic wavelet, the reflection sequence, and the statistical parameters of reflection sequence and noise are all treated as realizations of random variables endowed with suitable prior distributions. Instead of deterministic procedures that can be quite computationally burdensome, a simple Monte Carlo method, called Gibbs sampler, is employed to produce random samples iteratively from the joint posterior distribution of the unknowns. Modifications are made in the Gibbs sampler to overcome the ambiguity problems inherent in seismic deconvolution. Simple averages of the random samples are used to approximate the minimum mean-squared error (MMSE) estimates of the unknowns. Numerical examples are given to demonstrate the performance of the method.

Patent
16 Apr 1996
TL;DR: In this article, a queryless, multimedia database search method incorporating a Bayesian inference engine that refines its answer with each user response is presented; the set of user responses consists of a series of displays and user actions.
Abstract: A queryless, multimedia database search method incorporating a Bayesian inference engine that refines its answer with each user response. The set of user responses consists of a series of displays and user actions, and is defined by a relatively simple user interface.

Proceedings Article
01 Aug 1996
TL;DR: It is shown that even asymmetric, log-odds-normal noise has modest effects, and that the gold-standard posterior probabilities are often near zero or one and are little disturbed by noise.
Abstract: Recent research has found that diagnostic performance with Bayesian belief networks is often surprisingly insensitive to imprecision in the numerical probabilities. For example, the authors have recently completed an extensive study in which they applied random noise to the numerical probabilities in a set of belief networks for medical diagnosis: subsets of the CPCS network, itself a subset of the QMR (Quick Medical Reference) knowledge base focused on liver and bile diseases. The diagnostic performance, in terms of the average probabilities assigned to the actual diseases, showed small sensitivity even to large amounts of noise. In this paper, we summarize the findings of this study and discuss possible explanations of this low sensitivity. One reason is that the criterion for performance is the average probability of the true hypotheses, rather than the average error in probability, a criterion that is insensitive to symmetric noise distributions. But we show that even asymmetric, log-odds-normal noise has modest effects. A second reason is that the gold-standard posterior probabilities are often near zero or one, and are little disturbed by noise.
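
A sketch of log-odds-normal perturbation (the noise level is illustrative, not the study's exact noise model):

```python
import math
import random

def perturb_log_odds(p, sigma, rng):
    """Apply log-odds-normal noise to a probability: add N(0, sigma^2)
    noise to logit(p) and map back through the logistic function."""
    logit = math.log(p / (1.0 - p))
    noisy = logit + rng.gauss(0.0, sigma)
    return 1.0 / (1.0 + math.exp(-noisy))
```

Probabilities near zero or one sit far out on the logit scale, so they barely move under this noise, illustrating the second explanation for the low sensitivity.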

Journal ArticleDOI
TL;DR: In this article, the authors discuss sufficient conditions under which this reasoning to a foregone conclusion cannot occur, and illustrate how when the sufficient conditions fail, because probability is finitely but not countably additive, it may be that a Bayesian can design an experiment to lead his/her posterior probability into a final conclusion.
Abstract: When can a Bayesian select an hypothesis H and design an experiment (or a sequence of experiments) to make certain that, given the experimental outcome(s), the posterior probability of H will be greater than its prior probability? We discuss an elementary result that establishes sufficient conditions under which this reasoning to a foregone conclusion cannot occur. We illustrate how when the sufficient conditions fail, because probability is finitely but not countably additive, it may be that a Bayesian can design an experiment to lead his/her posterior probability into a foregone conclusion. The problem has a decision theoretic version in which a Bayesian might rationally pay not to see the outcome of certain cost-free experiments, which we discuss from several perspectives. Also, we relate this issue in Bayesian hypothesis testing to various concerns about “optional stopping.”

Journal ArticleDOI
TL;DR: This work investigates several VQ-based algorithms that seek to minimize both the distortion of compressed images and errors in classifying their pixel blocks and introduces a tree-structured posterior estimator to produce the class posterior probabilities required for the Bayes risk computation in this design.
Abstract: Classification and compression play important roles in communicating digital information. Their combination is useful in many applications, including the detection of abnormalities in compressed medical images. In view of the similarities of compression and low-level classification, it is not surprising that there are many similar methods for their design. Because some of these methods are useful for designing vector quantizers, it seems natural that vector quantization (VQ) is explored for the combined goal. We investigate several VQ-based algorithms that seek to minimize both the distortion of compressed images and errors in classifying their pixel blocks. These algorithms are investigated with both full search and tree-structured codes. We emphasize a nonparametric technique that minimizes both error measures simultaneously by incorporating a Bayes risk component into the distortion measure used for the design and encoding. We introduce a tree-structured posterior estimator to produce the class posterior probabilities required for the Bayes risk computation in this design. For two different image sources, we demonstrate that this system provides superior classification while maintaining compression close or superior to that of several other VQ-based designs, including Kohonen's (1992) "learning vector quantizer" and a sequential quantizer/classifier design.

Journal ArticleDOI
TL;DR: It is argued here that the alternative solution position is correct, the posterior moment position representing a conflation of the criterion, which is provided by the equations of the model, with metaphors, analogies, and senses of "factor" that are external to the model.
Abstract: The issue of indeterminacy in the factor analysis model has been the source of a lengthy and on-going debate. This debate can be seen as featuring two relevant interpretations of indeterminacy. The alternative solution position considers the latent common factor to be a random variate whose properties are determined by functional constraints inherent in the model. When the model fits the data, an infinity of random variates are criterially latent common factors to the set of manifest variates analyzed. The posterior moment position considers the latent common factor to be a single random entity with a non-point posterior distribution, given the manifest variables. It is argued here that: (a) The issue of indeterminacy centres on the criterion for the claim "X is a latent common factor to Y"; (b) the alternative solution position is correct, the posterior moment position representing a conflation of the criterion, which is provided by the equations of the model, with metaphors, analogies, and senses of "factor" that are external to the model. A number of implications for applied work involving factor analysis are discussed.

Journal ArticleDOI
TL;DR: Through analyses of data from an innovative mathematics curriculum, the authors examine when and why it becomes important to employ a fully Bayesian approach, and discuss the need to study the sensitivity of results to alternative prior distributional assumptions for the variance components and for the random regression parameters.
Abstract: In applications of hierarchical models (HMs), a potential weakness of empirical Bayes estimation approaches is that they do not to take into account uncertainty in the estimation of the variance components (see, e.g., Dempster, 1987). One possible solution entails employing a fully Bayesian approach, which involves specifying a prior probability distribution for the variance components and then integrating over the variance components as well as other unknowns in the HM to obtain a marginal posterior distribution of interest (see, e.g., Draper, 1995; Rubin, 1981). Though the required integrations are often exceedingly complex, Markov-chain Monte Carlo techniques (e.g., the Gibbs sampler) provide a viable means of obtaining marginal posteriors of interest in many complex settings. In this article, we fully generalize the Gibbs sampling algorithms presented in Seltzer (1993) to a broad range of settings in which vectors of random regression parameters in the HM (e.g., school means and slopes) are assumed mu...

Journal ArticleDOI
TL;DR: The Hit-and-Run sampler, a Monte Carlo approach that estimates the value of a high-dimensional integral with integrand h(x)f(x) by sampling from a time-reversible Markov chain over the support of the density f, is elaborated on.

Journal ArticleDOI
TL;DR: In this paper, the authors model the occurrence time and reporting delay of claims as a marked point process and apply it to a portfolio of accident insurances, where the distribution of the process is described by 14 onedimensional components.
Abstract: Occurrences and developments of claims are modelled as a marked point process. The individual claim consists of an occurrence time, two covariates, a reporting delay, and a process describing partial payments and settlement of the claim. Under certain likelihood assumptions the distribution of the process is described by 14 one-dimensional components. The modelling is nonparametric Bayesian. The posterior distribution of the components and the posterior distribution of the outstanding IBNR and RBNS liabilities are found simultaneously. The method is applied to a portfolio of accident insurances.