Journal ArticleDOI

Inferences from Multinomial Data: Learning About a Bag of Marbles

01 Jan 1996 - Journal of the Royal Statistical Society: Series B (Methodological) (John Wiley & Sons, Ltd) - Vol. 58, Iss. 1, pp. 3-34
TL;DR: In this article, the imprecise Dirichlet model is proposed for multinomial data in cases where there is no prior information, with inferences expressed in terms of posterior upper and lower probabilities.
Abstract: A new method is proposed for making inferences from multinomial data in cases where there is no prior information. A paradigm is the problem of predicting the colour of the next marble to be drawn from a bag whose contents are (initially) completely unknown. In such problems we may be unable to formulate a sample space because we do not know what outcomes are possible. This suggests an invariance principle: inferences based on observations should not depend on the sample space in which the observations and future events of interest are represented. Objective Bayesian methods do not satisfy this principle. This paper describes a statistical model, called the imprecise Dirichlet model, for drawing coherent inferences from multinomial data. Inferences are expressed in terms of posterior upper and lower probabilities. The probabilities are initially vacuous, reflecting prior ignorance, but they become more precise as the number of observations increases. This model does satisfy the invariance principle. Two sets of data are analysed in detail. In the first example one red marble is observed in six drawings from a bag. Inferences from the imprecise Dirichlet model are compared with objective Bayesian and frequentist inferences. The second example is an analysis of data from medical trials which compared two treatments for cardiorespiratory failure in newborn babies. There are two problems: to draw conclusions about which treatment is more effective and to decide when the randomized trials should be terminated. This example shows how the imprecise Dirichlet model can be used to analyse data in the form of a contingency table.
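As a concrete illustration of these predictive inferences, the following is a minimal Python sketch, assuming the standard IDM predictive bounds (lower n_A/(N + s) and upper (n_A + s)/(N + s) after observing an event n_A times in N trials, with hyperparameter s > 0); the helper name and printed intervals are illustrative, not code from the paper.

# Minimal sketch of IDM posterior predictive bounds (assumed standard formulas).
def idm_predictive_bounds(n_a, n_total, s=1.0):
    """Lower/upper probability that the next observation falls in the event."""
    return n_a / (n_total + s), (n_a + s) / (n_total + s)

# The paper's first example: one red marble observed in six drawings.
for s in (1.0, 2.0):
    lo, hi = idm_predictive_bounds(n_a=1, n_total=6, s=s)
    print(f"s = {s}: P(next marble is red) lies in [{lo:.3f}, {hi:.3f}]")
# s = 1 gives [1/7, 2/7]; s = 2 gives [1/8, 3/8].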
Citations
Journal ArticleDOI
TL;DR: zCompositions, as discussed by the authors, is an R package for the imputation of left-censored data under a compositional approach; such data arise in fields like the geochemistry of waters or sedimentary rocks, environmental studies related to air pollution, and the physicochemical analysis of glass fragments in forensic science.

551 citations

Journal Article
TL;DR: In this article, the authors argue for abandoning NHST by exposing its fallacies and, more importantly, by offering sounder and more useful alternatives to it.
Abstract: The machine learning community adopted the use of null hypothesis significance testing (NHST) in order to ensure the statistical validity of results. Many scientific fields, however, have realized the shortcomings of frequentist reasoning, and in the most radical cases have even banned its use in publications. We should do the same: just as we have embraced the Bayesian paradigm in the development of new machine learning methods, so we should also use it in the analysis of our own results. We argue for the abandonment of NHST by exposing its fallacies and, more importantly, by offering sounder and more useful alternatives to it.

227 citations

Journal ArticleDOI
TL;DR: In this article, the authors gratefully acknowledge the support by the Operational Program Education for Competitiveness-European Social Fund (project CZ.1.07/2.3.00/20.0170 of the Ministry of Education, Youth and Sports of the Czech Republic) and the Scottish Government's Rural and Environment Science and Analytical Services Division (RESAS).
Abstract: This research was supported by the Ministerio de Economia y Competividad under the project 'METRICS' Ref. MTM2012-33236, by the Agencia de Gestio d'Ajuts Universitaris i de Recerca of the Generalitat de Catalunya under the project Ref: 2009SGR424, and by the Scottish Government's Rural and Environment Science and Analytical Services Division (RESAS). The authors also gratefully acknowledge the support by the Operational Program Education for Competitiveness-European Social Fund (project CZ.1.07/2.3.00/20.0170 of the Ministry of Education, Youth and Sports of the Czech Republic)

175 citations


Cites background from "Inferences from Multinomial Data: L..."

  • ...In addition, the Perks, Jeffreys, Bayes-Laplace and SQ priors (Table 1) have another difficulty because they violate the representation invariance principle (RIP): the posterior probability should not depend on D, the number of mutually exclusive categories (Walley, 1996)....

  • ...Following Walley (1996), we assume that the compositional count data set contains zero values resulting from insufficiently large samples....

  • ...In particular, it holds for the imprecise Dirichlet model presented in Walley (1996) which assumes a Dirichlet model Dir(D; st) as a prior distribution of the random vector r....

  • ...According to Walley (1996), ‘the value s = √n yields a posterior mean that is equal to the minimax estimator of r under quadratic loss’....
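To see why the representation invariance principle (RIP) mentioned in these excerpts matters, here is a hypothetical Python sketch (my own illustration, not code from the citing paper): under a symmetric Dirichlet prior that gives weight c to each of K categories, the posterior predictive probability of the observed event changes as the sample space is refined, whereas the IDM bounds do not depend on K. The function names and counts are made up.

def bayes_predictive(n_a, n_total, k, c):
    """Posterior predictive P(next in A) under a symmetric Dirichlet(c, ..., c) prior."""
    return (n_a + c) / (n_total + k * c)

def idm_bounds(n_a, n_total, s):
    """IDM lower/upper predictive probabilities; these do not involve K."""
    return n_a / (n_total + s), (n_a + s) / (n_total + s)

n_a, n_total = 1, 6                  # one occurrence of the event in six trials
for k in (2, 5, 100):                # refine the sample space into more categories
    bl = bayes_predictive(n_a, n_total, k, c=1.0)   # Bayes-Laplace prior
    jf = bayes_predictive(n_a, n_total, k, c=0.5)   # Jeffreys prior
    lo, hi = idm_bounds(n_a, n_total, s=1.0)
    print(f"K={k:3d}: Bayes-Laplace={bl:.3f}, Jeffreys={jf:.3f}, IDM=[{lo:.3f}, {hi:.3f}]")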

Journal ArticleDOI
TL;DR: It is shown that a very simple base classifier, based on imprecise probabilities and uncertainty measures, attains a better trade-off among the aspects of interest for this type of study, such as accuracy and area under the ROC curve (AUC).
Abstract: In recent years, the application of artificial intelligence methods to credit risk assessment has brought improvements over classic methods. Small improvements in credit scoring and bankruptcy prediction systems can translate into large profits, so any improvement is of great interest to banks and financial institutions. Recent works show that ensembles of classifiers achieve the best results for these kinds of tasks. This paper extends previous work on selecting the best base classifier for ensembles on credit data sets. It is shown that a very simple base classifier, based on imprecise probabilities and uncertainty measures, attains a better trade-off among the aspects of interest for this type of study, such as accuracy and area under the ROC curve (AUC). The AUC can be considered the more appropriate measure in this setting, where different types of errors have different costs or consequences. The results presented here show this simple classifier to be an interesting choice as a base classifier in ensembles for credit scoring and bankruptcy prediction, demonstrating that the individual performance of a classifier is not the only criterion for selecting it in an ensemble scheme.

166 citations

Journal ArticleDOI
01 Jun 2005
TL;DR: The imprecise Dirichlet model (IDM) was recently proposed by Walley as a model for objective statistical inference from multinomial data with chances θ, and some of its recent applications to various statistical problems are reviewed.
Abstract: The imprecise Dirichlet model (IDM) was recently proposed by Walley as a model for objective statistical inference from multinomial data with chances θ. In the IDM, prior or posterior uncertainty about θ is described by a set of Dirichlet distributions, and inferences about events are summarized by lower and upper probabilities. The IDM avoids shortcomings of alternative objective models, either frequentist or Bayesian. We review the properties of the model, for both parametric and predictive inferences, and some of its recent applications to various statistical problems.
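A small Python sketch of parametric inference in this spirit, assuming the standard IDM Beta marginals (the chance theta_j has posterior Beta(n_j + s*t_j, N - n_j + s*(1 - t_j)) for t_j in (0, 1), so P(theta_j > c) is extremized as t_j approaches 0 or 1); the function name and numbers are illustrative, not from the review.

from scipy.stats import beta

def idm_event_probability(n_j, n_total, c, s=2.0):
    """Lower/upper posterior probability of {theta_j > c} under the IDM.

    Assumes 0 < n_j < n_total so both limiting Beta distributions are proper.
    """
    lower = beta.sf(c, n_j, n_total - n_j + s)       # limit t_j -> 0
    upper = beta.sf(c, n_j + s, n_total - n_j)       # limit t_j -> 1
    return lower, upper

# Example: 7 successes in 20 trials; lower/upper probability that theta exceeds 0.2.
print(idm_event_probability(n_j=7, n_total=20, c=0.2, s=2.0))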

165 citations


Cites background or methods from "Inferences from Multinomial Data: L..."

  • ...Problem: s ≥ K/2 or s ≥ K would be needed to encompass frequentist inferences, but if s is too high, inferences are too weak. Suggested values: s = 1 or s = 2 (Walley, 1996). Why does the IDM satisfy the RIP?...

  • ...The IDM in brief: a model for statistical inference proposed by Walley (1996), which generalizes the IBM (Walley, 1991)....

  • ...(Walley, 1996). Which value for s? Suggested value(s) for s?...

  • ...“Bag of marbles” problems (Walley, 1996): “I have . . . a closed bag of coloured marbles....

  • ...coefficient ρ (a function of θ11, θ1. and θ.1), with observed value r_obs = 0.467. Hypothesis to be tested: H0: ρ ≤ 0 vs. H1: ρ > 0. Comparison of frequentist, Bayesian and IDM inferences (Altham, 1969; Walley, 1996; Walley et al., 1996; Bernard, 2003). Frequentist inference: Fisher's exact test for a 2 × 2 table amounts to considering all…...
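As a minimal illustration of the frequentist baseline mentioned in the excerpt above (Fisher's exact test for a 2 × 2 table, with the one-sided alternative playing the role of H1: ρ > 0), here is a short Python sketch; the counts are hypothetical, not the trial data analysed in the paper.

from scipy.stats import fisher_exact

table = [[10, 2],   # row 1: successes, failures (hypothetical counts)
         [ 4, 8]]   # row 2: successes, failures (hypothetical counts)

# One-sided test of positive association between rows and columns.
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, one-sided p-value = {p_value:.4f}")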

References
Journal ArticleDOI
TL;DR: In this article, the authors discuss the problem of estimating the sampling distribution of a pre-specified random variable R(X, F) on the basis of the observed data x.
Abstract: We discuss the following problem: given a random sample $X = (X_1, X_2, \ldots, X_n)$ from an unknown probability distribution $F$, estimate the sampling distribution of some prespecified random variable $R(X, F)$ on the basis of the observed data $x$. (Standard jackknife theory gives an approximate mean and variance in the case $R(X, F) = \theta(\hat{F}) - \theta(F)$, $\theta$ some parameter of interest.) A general method, called the “bootstrap”, is introduced and shown to work satisfactorily on a variety of estimation problems. The jackknife is shown to be a linear approximation method for the bootstrap. The exposition proceeds by a series of examples: variance of the sample median, error rates in a linear discriminant analysis, ratio estimation, estimating regression parameters, etc.
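A short Python sketch of the idea (not Efron's code): the sampling distribution of a statistic, here the sample median, is approximated by recomputing it on resamples of the observed data drawn with replacement; the data are simulated purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=3.0, size=50)       # stand-in for the observed sample

def bootstrap_distribution(data, statistic, n_boot=2000, rng=rng):
    """Statistic recomputed on n_boot with-replacement resamples of the data."""
    n = len(data)
    return np.array([statistic(rng.choice(data, size=n, replace=True))
                     for _ in range(n_boot)])

medians = bootstrap_distribution(x, np.median)
print("bootstrap estimate of Var(sample median):", medians.var(ddof=1))
print("bootstrap 95% interval for the median:", np.percentile(medians, [2.5, 97.5]))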

14,483 citations

Book
01 Jan 1939
TL;DR: In this book, the authors cover fundamental notions, direct probabilities, estimation problems, approximate methods and simplifications, and significance tests for one new parameter and for various complications, as well as frequency definitions and direct methods.
Abstract: 1. Fundamental notions 2. Direct probabilities 3. Estimation problems 4. Approximate methods and simplifications 5. Significance tests: one new parameter 6. Significance tests: various complications 7. Frequency definitions and direct methods 8. General questions

7,086 citations

Journal ArticleDOI
TL;DR: In this article, a class of prior distributions, called Dirichlet process priors, is proposed for nonparametric problems, under which many nonparametric statistical problems can be treated, yielding results that are comparable to the classical theory.
Abstract: The Bayesian approach to statistical problems, though fruitful in many ways, has been rather unsuccessful in treating nonparametric problems. This is due primarily to the difficulty in finding workable prior distributions on the parameter space, which in nonparametric problems is taken to be a set of probability distributions on a given sample space. There are two desirable properties of a prior distribution for nonparametric problems. (I) The support of the prior distribution should be large--with respect to some suitable topology on the space of probability distributions on the sample space. (II) Posterior distributions given a sample of observations from the true probability distribution should be manageable analytically. These properties are antagonistic in the sense that one may be obtained at the expense of the other. This paper presents a class of prior distributions, called Dirichlet process priors, broad in the sense of (I), for which (II) is realized, and for which treatment of many nonparametric statistical problems may be carried out, yielding results that are comparable to the classical theory. In Section 2, we review the properties of the Dirichlet distribution needed for the description of the Dirichlet process given in Section 3. Briefly, this process may be described as follows. Let $\mathscr{X}$ be a space and $\mathscr{A}$ a $\sigma$-field of subsets, and let $\alpha$ be a finite non-null measure on $(\mathscr{X}, \mathscr{A})$. Then a stochastic process $P$ indexed by elements $A$ of $\mathscr{A}$ is said to be a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha$ if for any measurable partition $(A_1, \cdots, A_k)$ of $\mathscr{X}$, the random vector $(P(A_1), \cdots, P(A_k))$ has a Dirichlet distribution with parameter $(\alpha(A_1), \cdots, \alpha(A_k))$. $P$ may be considered a random probability measure on $(\mathscr{X}, \mathscr{A})$. The main theorem states that if $P$ is a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha$, and if $X_1, \cdots, X_n$ is a sample from $P$, then the posterior distribution of $P$ given $X_1, \cdots, X_n$ is also a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha + \sum^n_1 \delta_{x_i}$, where $\delta_x$ denotes the measure giving mass one to the point $x$. In Section 4, an alternative definition of the Dirichlet process is given. This definition exhibits a version of the Dirichlet process that gives probability one to the set of discrete probability measures on $(\mathscr{X}, \mathscr{A})$. This is in contrast to Dubins and Freedman [2], whose methods for choosing a distribution function on the interval [0, 1] lead with probability one to singular continuous distributions. Methods of choosing a distribution function on [0, 1] that with probability one is absolutely continuous have been described by Kraft [7]. The general method of choosing a distribution function on [0, 1], described in Section 2 of Kraft and van Eeden [10], can of course be used to define the Dirichlet process on [0, 1]. Special mention must be made of the papers of Freedman and Fabius. Freedman [5] defines a notion of tailfree for a distribution on the set of all probability measures on a countable space $\mathscr{X}$. For a tailfree prior, the posterior distribution given a sample from the true probability measure may be fairly easily computed.
Fabius [3] extends the notion of tailfree to the case where $\mathscr{X}$ is the unit interval [0, 1], but it is clear his extension may be made to cover quite general $\mathscr{X}$. With such an extension, the Dirichlet process would be a special case of a tailfree distribution for which the posterior distribution has a particularly simple form. There are disadvantages to the fact that $P$ chosen by a Dirichlet process is discrete with probability one. These appear mainly because in sampling from a $P$ chosen by a Dirichlet process, we expect eventually to see one observation exactly equal to another. For example, consider the goodness-of-fit problem of testing the hypothesis $H_0$ that a distribution on the interval [0, 1] is uniform. If on the alternative hypothesis we place a Dirichlet process prior with parameter $\alpha$ itself a uniform measure on [0, 1], and if we are given a sample of size $n \geqq 2$, the only nontrivial nonrandomized Bayes rule is to reject $H_0$ if and only if two or more of the observations are exactly equal. This is really a test of the hypothesis that a distribution is continuous against the hypothesis that it is discrete. Thus, there is still a need for a prior that chooses a continuous distribution with probability one and yet satisfies properties (I) and (II). Some applications in which the possible doubling up of the values of the observations plays no essential role are presented in Section 5. These include the estimation of a distribution function, of a mean, of quantiles, of a variance and of a covariance. A two-sample problem is considered in which the Mann-Whitney statistic, equivalent to the rank-sum statistic, appears naturally. A decision theoretic upper tolerance limit for a quantile is also treated. Finally, a hypothesis testing problem concerning a quantile is shown to yield the sign test. In each of these problems, useful ways of combining prior information with the statistical observations appear. Other applications exist. In his Ph. D. dissertation [1], Charles Antoniak finds a need to consider mixtures of Dirichlet processes. He treats several problems, including the estimation of a mixing distribution, bio-assay, empirical Bayes problems, and discrimination problems.
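A minimal Python sketch of the main theorem quoted above, under the assumption that the base measure alpha is a uniform measure of total mass m on [0, 1]: for a measurable partition (A_1, ..., A_k), the posterior of (P(A_1), ..., P(A_k)) given a sample X_1, ..., X_n is Dirichlet with parameters alpha(A_j) + #{X_i in A_j}. The partition and data below are illustrative.

import numpy as np

rng = np.random.default_rng(1)
m = 1.0                                          # total mass of the base measure alpha
edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])    # partition of [0, 1] into four cells
x = rng.beta(2, 5, size=30)                      # simulated sample from the "true" P

alpha_cells = m * np.diff(edges)                 # alpha(A_j) for a uniform base measure
counts, _ = np.histogram(x, bins=edges)          # #{X_i in A_j}
posterior_draws = rng.dirichlet(alpha_cells + counts, size=5000)

print("posterior mean of (P(A_1), ..., P(A_4)):", posterior_draws.mean(axis=0))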

5,033 citations

Book
01 Jun 1973
TL;DR: In this book, the authors assess the effect of non-normality on inferences about a population mean, with generalizations, and treat inference about means with information from more than one source.
Abstract: Nature of Bayesian Inference Standard Normal Theory Inference Problems Bayesian Assessment of Assumptions: Effect of Non-Normality on Inferences About a Population Mean with Generalizations Bayesian Assessment of Assumptions: Comparison of Variances Random Effect Models Analysis of Cross Classification Designs Inference About Means with Information from More than One Source: One-Way Classification and Block Designs Some Aspects of Multivariate Analysis Estimation of Common Regression Coefficients Transformation of Data Tables References Indexes.

3,896 citations