Book ChapterDOI

Penalty Specialists Among Goalkeepers: A Nonparametric Bayesian Analysis of 44 Years of German Bundesliga

TL;DR: In this paper, the authors use Bayesian hierarchical random effects models to shrink the individual goalkeepers' estimates towards an overall estimate, with the degree of shrinkage depending on the amount of information available for each goalkeeper.
Abstract: Penalty saving abilities are of major importance for a goalkeeper in modern football. However, statistical investigations of the performance of individual goalkeepers in penalties, leading to a ranking or a clustering of the keepers, are rare in the scientific literature. In this paper we will perform such an analysis based on all penalties in the German Bundesliga from 1963 to 2007. A challenge when analyzing such a data set is the fact that the counts of penalties for the different goalkeepers are highly imbalanced, leading to the question of how to compare goalkeepers who were involved in a disparate number of penalties. We will approach this issue by using Bayesian hierarchical random effects models. These models shrink the individual goalkeepers' estimates towards an overall estimate, with the degree of shrinkage depending on the amount of information that is available for each goalkeeper. The underlying random effects distribution will be modelled nonparametrically based on the Dirichlet process. Proceeding this way relaxes the assumptions underlying parametric random effects models and additionally makes it possible to find clusters among the goalkeepers.
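
The shrinkage idea described in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the chapter's model (which uses hierarchical random effects with a Dirichlet process); it assumes a conjugate Beta-Binomial model, a hypothetical `prior_strength` parameter, and made-up counts, purely to show how keepers with few penalties are pulled more strongly towards the overall rate.

```python
def shrunken_save_rates(records, prior_strength=20.0):
    """records: {name: (saves, penalties)}; returns {name: shrunken save rate}.

    Assumption: Beta-Binomial stand-in, not the model used in the chapter.
    """
    total_saves = sum(s for s, _ in records.values())
    total_pens = sum(n for _, n in records.values())
    overall = total_saves / total_pens  # pooled save rate across all keepers
    # Beta(a, b) prior centred on the pooled rate; prior_strength = a + b
    a = overall * prior_strength
    b = (1.0 - overall) * prior_strength
    # Posterior mean for each keeper: (a + saves) / (a + b + penalties)
    return {name: (a + s) / (a + b + n) for name, (s, n) in records.items()}


# Hypothetical counts, invented for illustration only
example = {"keeper_A": (2, 4), "keeper_B": (20, 80), "keeper_C": (0, 3)}
rates = shrunken_save_rates(example)
overall_rate = 22 / 87
```

Here `keeper_A` has a raw save rate of 0.5 from only four penalties, so its estimate moves most of the way to the pooled rate, while `keeper_B` with 80 penalties barely moves.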


Citations
DOI
02 Jul 2010
TL;DR: A Bayesian approach to these models, which has several advantages, can solve problems with unbounded likelihood functions, that can occur in mixture models and allows an elegant extension of finite to (countable) infinite mixture models.
Abstract: Finite mixture models assume that a distribution is a combination of several parametric distributions. They offer a compromise between the interpretability of parametric models and the flexibility of nonparametric models. This thesis considers a Bayesian approach to these models, which has several advantages. For example, using only weak prior information, it can solve problems with unbounded likelihood functions, that can occur in mixture models. The Bayesian approach also allows an elegant extension of finite to (countable) infinite mixture models. Depending on the application, the components of mixture models can either be viewed as just a means to the flexible modeling of a distribution or as defining subgroups of a population with different parametric distributions. Regarding the former case consistency results for Bayesian mixtures are stated. An example concerning the flexible modeling of a random effects distribution in a logistic regression is also given. The application considers the goalkeeper's effect in saving a penalty. In the latter case mixture models can be used for clustering. Bayesian mixtures then allow the estimation of the number of clusters at the same time as the cluster-specific parameters. For cluster analysis the standard approach for fitting Bayesian mixtures, Markov Chain Monte Carlo (MCMC), unfortunately leads to inferential difficulties. The labels associated with the clusters can change during the MCMC run, a phenomenon called label-switching. The problem gets severe, if the number of clusters is allowed to vary. Existing methods to deal with label-switching and a varying number of components are reviewed and new approaches are proposed for both situations. The first consists of a variant of the relabeling algorithm of Stephens (2000). The variant is more general, as it applies to drawn clusterings and not drawn parameter values. Therefore it does not depend on the specific form of the component distributions. 
The second approach is based on pairwise posterior probabilities and is an improvement of a commonly used loss function due to Binder (1978). Minimization of this loss is shown to be equivalent to maximizing the posterior expected Rand index with the true clustering. As the adjusted Rand index is preferable to the raw index, the maximization of the posterior expected adjusted Rand is proposed. The new approaches are compared to the previous methods on simulated and real data. The real data used for cluster analysis are two gene expression data sets and Fisher's iris data.
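
As a rough illustration of the pairwise-probability approach mentioned above, the following sketch estimates co-clustering probabilities from sampled clusterings and picks the sampled clustering with the smallest Binder-type loss (equal misclassification costs assumed). The function names and the toy samples are hypothetical, and restricting the search to the sampled clusterings is a simplification, not the thesis's procedure.

```python
from itertools import combinations


def similarity_matrix(samples):
    """Fraction of sampled clusterings in which items i and j share a label."""
    n = len(samples[0])
    sim = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                sim[i][j] += 1.0 / len(samples)
    return sim  # only the upper triangle (i < j) is filled


def binder_loss(labels, sim):
    """Sum over pairs of |1{same cluster} - posterior co-clustering probability|."""
    n = len(labels)
    return sum(abs(float(labels[i] == labels[j]) - sim[i][j])
               for i, j in combinations(range(n), 2))


def binder_estimate(samples):
    """Sampled clustering with the smallest loss (first one wins on ties)."""
    sim = similarity_matrix(samples)
    return min(samples, key=lambda labels: binder_loss(labels, sim))


# Toy MCMC output: the second draw is the first with its labels switched
samples = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
sim = similarity_matrix(samples)
best = binder_estimate(samples)
```

Because the loss depends only on which pairs share a cluster, the first two draws receive the same loss even though their labels differ, which is why this kind of criterion is unaffected by label-switching.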

2 citations

Posted Content
TL;DR: In this paper, the authors investigate a potential hot shoe effect for penalty takers in the German Bundesliga using hidden Markov models with player-specific abilities, and find states that can be tied to different forms of players.
Abstract: Although academic research on the 'hot hand' effect (in particular, in sports, especially in basketball) has been going on for more than 30 years, it still remains a central question in different areas of research whether such an effect exists. In this contribution, we investigate the potential occurrence of a 'hot shoe' effect for the performance of penalty takers in football based on data from the German Bundesliga. For this purpose, we consider hidden Markov models (HMMs) to model the (latent) forms of players. To further account for individual heterogeneity of the penalty taker as well as the opponent's goalkeeper, player-specific abilities are incorporated in the model formulation together with a LASSO penalty. Our results suggest states which can be tied to different forms of players, thus providing evidence for the hot shoe effect, and shed some light on exceptionally well-performing goalkeepers, which are of potential interest to managers and sports fans.
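
To make the hidden Markov model idea concrete, here is a minimal sketch of the scaled forward algorithm for an HMM with Bernoulli (goal or no goal) emissions. It deliberately omits the player-specific abilities and the LASSO penalty described in the abstract; the function name, argument names, and parameter values are assumptions for illustration.

```python
import math


def hmm_log_likelihood(outcomes, init, trans, p_score):
    """Log-likelihood of a 0/1 sequence under a Bernoulli HMM.

    init[s]: initial state probabilities; trans[r][s]: P(next = s | current = r);
    p_score[s]: P(goal | state s). Uses the scaled forward recursion.
    """
    n_states = len(init)

    def emit(s, y):
        return p_score[s] if y == 1 else 1.0 - p_score[s]

    alpha = [init[s] * emit(s, outcomes[0]) for s in range(n_states)]
    log_lik = 0.0
    for y in outcomes[1:]:
        scale = sum(alpha)              # rescale to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit(s, y)
                 for s in range(n_states)]
    return log_lik + math.log(sum(alpha))


# Hypothetical two-state example: a "cold" state and a "hot" state
trans = [[0.9, 0.1], [0.2, 0.8]]
ll = hmm_log_likelihood([1, 0, 1, 1], [0.5, 0.5], trans, [0.6, 0.85])
```

When both states share the same scoring probability, the latent state carries no information and the result reduces to an ordinary Bernoulli log-likelihood, which gives a simple sanity check.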

2 citations

DOI
14 Dec 2009
TL;DR: This thesis starts with a short introduction to the Bayesian world of thought from the nonparametric perspective and a motivation for shape constraints from three concrete applications, and then develops novel Bayesian nonparametric models for particular shape constrained problems.
Abstract: Shape constraints are a way of incorporating geometric or structural prior information into a statistical model. An advantage of shape constraints over other structural modelling assumptions is the fact that shape constraints are usually directly motivated from the science underlying the studied application, while other modelling assumptions (e.g. parametric assumptions) often lack such a motivation. Compared to plain nonparametric inference, shape constrained nonparametric inference has the advantage that shape constraints, when adequate, can substantially improve the efficiency of inference, as they reduce the effective modelling complexity (by restricting the model). Examples of shape constraints which are covered in this thesis are monotonicity, convexity and concavity constraints, when modelling a functional relationship and a stochastic ordering constraint when modelling probability distributions. The thesis starts with a short introduction to the Bayesian world of thought from the nonparametric perspective and a motivation for shape constraints from three concrete applications. In Chapter 2 we then present an overview of nonparametric (i.e. infinite dimensional) Bayesian inference. We first review prior distributions for probability measures and functions, and then present a brief summary of an asymptotic analysis of Bayesian nonparametrics. The focus of the review is on methods, which are needed in the following chapters. In Chapter 3, 4 and 5 we then develop novel Bayesian nonparametric models for particular shape constrained problems. In Chapter 3 a Bayesian nonparametric model for monotone regression is developed. For this purpose the monotone function is modelled as a mixture of shifted and scaled parametric probability distribution functions, and a general random probability measure is assumed as the prior for the mixing distribution. 
We investigate the theoretical properties of the model and illustrate it on two practical examples (dose-response analysis and growth curve analysis). Chapter 4 then extends the model developed in Chapter 3 to the case of more general constraints on derivatives of the modelled function, such as, for example, convexity and monotone convexity. Beside a practical illustration of the model, we also derive a consistency result in this chapter. In Chapter 5 we finally introduce a model for estimating a set of stochastically ordered densities, where the ordering is assumed with respect to multivariate continuous covariates. For this purpose the residual density is modelled as a mixture of normal distributions and the stochastic ordering is induced by assuming multivariate monotone functions as component specific means. We investigate the support properties of the so formed prior distribution and illustrate the method on an epidemiologic data set. The thesis ends with a summary and an outlook of possible future work.

1 citation

Journal ArticleDOI
TL;DR: This paper considers how to rank penalty takers in the German Bundesliga based on historical data from 1963 to 2021, using Bayesian models that improve inference on ability measures of individual players by imposing structural assumptions on an associated high-dimensional parameter space.
Abstract: Judging by its significant potential to affect the outcome of a game in one single action, the penalty kick is arguably the most important set piece in football. Scientific studies on how the ability to convert a penalty kick is distributed among professional football players are scarce. In this paper, we consider how to rank penalty takers in the German Bundesliga based on historical data from 1963 to 2021. We use Bayesian models that improve inference on ability measures of individual players by imposing structural assumptions on an associated high-dimensional parameter space. These methods prove useful for our application, coping with the inherent difficulty that many players only take few penalties, making purely frequentist inference rather unreliable.

1 citation

References
Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. They derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest, and add pD to the posterior mean deviance to obtain a deviance information criterion, which is related to other information criteria and has an approximate decision theoretic justification.
Abstract: Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the ‘hat’ matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
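
The two quantities defined in this abstract are simple to compute from posterior draws. The sketch below assumes a toy model that is not taken from the paper: i.i.d. normal data with known standard deviation 1 and a flat prior on the mean, so that the posterior can be sampled directly rather than by MCMC. All names are hypothetical.

```python
import math
import random


def deviance(mu, data, sigma=1.0):
    """-2 * log-likelihood of i.i.d. Normal(mu, sigma^2) data."""
    n = len(data)
    sse = sum((x - mu) ** 2 for x in data)
    return n * math.log(2 * math.pi * sigma ** 2) + sse / sigma ** 2


def dic(draws, data):
    """Return (pD, DIC) from posterior draws of mu."""
    mean_dev = sum(deviance(m, data) for m in draws) / len(draws)
    dev_at_mean = deviance(sum(draws) / len(draws), data)
    p_d = mean_dev - dev_at_mean   # effective number of parameters
    return p_d, mean_dev + p_d     # DIC = posterior mean deviance + pD


random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(50)]
# Flat prior on mu with known sigma = 1: posterior is Normal(mean(data), 1/n)
post_mean = sum(data) / len(data)
draws = [random.gauss(post_mean, 1 / math.sqrt(len(data))) for _ in range(5000)]
p_d, dic_value = dic(draws, data)
```

In this one-parameter model pD should come out close to 1, matching the intuition that it counts the effective number of parameters.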

11,691 citations

Journal ArticleDOI
TL;DR: This article presents a class of prior distributions, called Dirichlet process priors, with which many nonparametric statistical problems can be treated, yielding results that are comparable to the classical theory.
Abstract: The Bayesian approach to statistical problems, though fruitful in many ways, has been rather unsuccessful in treating nonparametric problems. This is due primarily to the difficulty in finding workable prior distributions on the parameter space, which in nonparametric problems is taken to be a set of probability distributions on a given sample space. There are two desirable properties of a prior distribution for nonparametric problems. (I) The support of the prior distribution should be large, with respect to some suitable topology on the space of probability distributions on the sample space. (II) Posterior distributions given a sample of observations from the true probability distribution should be manageable analytically. These properties are antagonistic in the sense that one may be obtained at the expense of the other. This paper presents a class of prior distributions, called Dirichlet process priors, broad in the sense of (I), for which (II) is realized, and for which treatment of many nonparametric statistical problems may be carried out, yielding results that are comparable to the classical theory. In Section 2, we review the properties of the Dirichlet distribution needed for the description of the Dirichlet process given in Section 3. Briefly, this process may be described as follows. Let $\mathscr{X}$ be a space and $\mathscr{A}$ a $\sigma$-field of subsets, and let $\alpha$ be a finite non-null measure on $(\mathscr{X}, \mathscr{A})$. Then a stochastic process $P$ indexed by elements $A$ of $\mathscr{A}$, is said to be a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha$ if for any measurable partition $(A_1, \cdots, A_k)$ of $\mathscr{X}$, the random vector $(P(A_1), \cdots, P(A_k))$ has a Dirichlet distribution with parameter $(\alpha(A_1), \cdots, \alpha(A_k))$. 
$P$ may be considered a random probability measure on $(\mathscr{X}, \mathscr{A})$. The main theorem states that if $P$ is a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha$, and if $X_1, \cdots, X_n$ is a sample from $P$, then the posterior distribution of $P$ given $X_1, \cdots, X_n$ is also a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with a parameter $\alpha + \sum^n_1 \delta_{x_i}$, where $\delta_x$ denotes the measure giving mass one to the point $x$. In Section 4, an alternative definition of the Dirichlet process is given. This definition exhibits a version of the Dirichlet process that gives probability one to the set of discrete probability measures on $(\mathscr{X}, \mathscr{A})$. This is in contrast to Dubins and Freedman [2], whose methods for choosing a distribution function on the interval [0, 1] lead with probability one to singular continuous distributions. Methods of choosing a distribution function on [0, 1] that with probability one is absolutely continuous have been described by Kraft [7]. The general method of choosing a distribution function on [0, 1], described in Section 2 of Kraft and van Eeden [10], can of course be used to define the Dirichlet process on [0, 1]. Special mention must be made of the papers of Freedman and Fabius. Freedman [5] defines a notion of tailfree for a distribution on the set of all probability measures on a countable space $\mathscr{X}$. For a tailfree prior, posterior distribution given a sample from the true probability measure may be fairly easily computed. Fabius [3] extends the notion of tailfree to the case where $\mathscr{X}$ is the unit interval [0, 1], but it is clear his extension may be made to cover quite general $\mathscr{X}$. With such an extension, the Dirichlet process would be a special case of a tailfree distribution for which the posterior distribution has a particularly simple form. 
There are disadvantages to the fact that $P$ chosen by a Dirichlet process is discrete with probability one. These appear mainly because in sampling from a $P$ chosen by a Dirichlet process, we expect eventually to see one observation exactly equal to another. For example, consider the goodness-of-fit problem of testing the hypothesis $H_0$ that a distribution on the interval [0, 1] is uniform. If on the alternative hypothesis we place a Dirichlet process prior with parameter $\alpha$ itself a uniform measure on [0, 1], and if we are given a sample of size $n \geqq 2$, the only nontrivial nonrandomized Bayes rule is to reject $H_0$ if and only if two or more of the observations are exactly equal. This is really a test of the hypothesis that a distribution is continuous against the hypothesis that it is discrete. Thus, there is still a need for a prior that chooses a continuous distribution with probability one and yet satisfies properties (I) and (II). Some applications in which the possible doubling up of the values of the observations plays no essential role are presented in Section 5. These include the estimation of a distribution function, of a mean, of quantiles, of a variance and of a covariance. A two-sample problem is considered in which the Mann-Whitney statistic, equivalent to the rank-sum statistic, appears naturally. A decision theoretic upper tolerance limit for a quantile is also treated. Finally, a hypothesis testing problem concerning a quantile is shown to yield the sign test. In each of these problems, useful ways of combining prior information with the statistical observations appear. Other applications exist. In his Ph. D. dissertation [1], Charles Antoniak finds a need to consider mixtures of Dirichlet processes. He treats several problems, including the estimation of a mixing distribution, bio-assay, empirical Bayes problems, and discrimination problems.
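
The conjugacy result stated in this abstract (the posterior is again a Dirichlet process, with parameter $\alpha + \sum \delta_{x_i}$) gives a closed form for the posterior mean of the distribution function: a weighted average of the prior guess and the empirical distribution. The sketch below assumes $\alpha$ is written as a total mass times a base distribution function; the names `total_mass` and `base_cdf` and the sample values are hypothetical.

```python
def dp_posterior_mean_cdf(t, data, total_mass, base_cdf):
    """Posterior mean of P((-inf, t]) under a Dirichlet process prior.

    The prior parameter is alpha = total_mass * base measure, so the posterior
    parameter assigns mass total_mass * base_cdf(t) + #{x_i <= t} to (-inf, t].
    """
    n = len(data)
    empirical_count = sum(1 for x in data if x <= t)
    return (total_mass * base_cdf(t) + empirical_count) / (total_mass + n)


def uniform_cdf(t):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(max(t, 0.0), 1.0)


data = [0.1, 0.4, 0.4, 0.9]  # made-up sample for illustration
value = dp_posterior_mean_cdf(0.5, data, total_mass=2.0, base_cdf=uniform_cdf)
```

With a total mass of 2 and four observations, the prior guess receives weight 2/6 and the empirical distribution weight 4/6; a larger total mass expresses more confidence in the prior guess.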

5,033 citations


"Penalty Specialists Among Goalkeepe..." refers background or methods in this paper

  • ...A flexible and convenient solution is to use the Dirichlet process, dating back to [5]....

    [...]

  • ...which allows for an efficient exact implementation in many cases (see [5] for details)....

    [...]

ReportDOI
01 May 1991
TL;DR: Bayesian nonparametrics needs a large class of priors on the space of probability measures for which posterior distributions can be easily calculated; a class of priors known as Dirichlet measures has been used for the distribution of a random variable X taking values in $R_K$.
Abstract: The parameter in a Bayesian nonparametric problem is the unknown distribution P of the observation X. A Bayesian uses a prior distribution for P, and after observing X, solves the statistical inference problem by using the posterior distribution of P, which is the conditional distribution of P given X. For Bayesian nonparametrics to be successful one needs a large class of priors for which posterior distributions can be easily calculated. Unless X takes values in a finite space, the unknown distribution P varies in an infinite dimensional space. Thus one has to talk about measures in a complicated space like the space of all probability measures on a large space. This has always required a more careful attention to the attendant measure theoretic problems. A class of priors known as Dirichlet measures has been used for the distribution of a random variable X when it takes values in $R_K$.

2,162 citations


"Penalty Specialists Among Goalkeepe..." refers background in this paper

  • ...Another reason for the popularity of Dirichlet process priors is the constructive stick-breaking representation of the Dirichlet process given by [17]....

    [...]

Journal ArticleDOI
TL;DR: This article extends Ferguson's result to the case where the random measure is a mixing distribution: the conditional distribution of the random measure, given the observations, is no longer that of a simple Dirichlet process, but can be described as a mixture of Dirichlet processes.
Abstract: process. This paper extends Ferguson's result to cases where the random measure is a mixing distribution for a parameter which determines the distribution from which observations are made. The conditional distribution of the random measure, given the observations, is no longer that of a simple Dirichlet process, but can be described as being a mixture of Dirichlet processes. This paper gives a formal definition for these mixtures and develops several theorems about their properties, the most important of which is a closure property for such mixtures. Formulas for computing the conditional distribution are derived and applications to problems in bio-assay, discrimination, regression, and mixing distributions are given.

2,146 citations


"Penalty Specialists Among Goalkeepe..." refers background in this paper

  • ...For a random sample of size n from a probability distribution realized by a Dirichlet process [1] has shown that the prior density of the number of distinct values (clusters/components) k in n realizations is...

    [...]
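
The formula itself is cut off in the excerpt above and is not reproduced here. As a rough companion, the prior distribution of the number of distinct values under a Dirichlet process can be computed numerically with the standard Pólya urn (Chinese restaurant) recursion, in which draw $i$ opens a new cluster with probability $\alpha / (\alpha + i - 1)$. The function name and the use of a scalar total mass `alpha` are assumptions of this sketch.

```python
def num_clusters_prior(n, alpha):
    """Return probs with probs[k - 1] = P(k distinct values among n draws)."""
    probs = [1.0]  # after one draw there is exactly one distinct value
    for i in range(2, n + 1):
        p_new = alpha / (alpha + i - 1)  # chance that draw i opens a new cluster
        nxt = [0.0] * i
        for idx, p in enumerate(probs):      # idx + 1 clusters so far
            nxt[idx] += p * (1.0 - p_new)    # draw i joins an existing cluster
            nxt[idx + 1] += p * p_new        # draw i starts a new cluster
        probs = nxt
    return probs


def expected_num_clusters(n, alpha):
    """E[k] = sum over i = 1..n of alpha / (alpha + i - 1)."""
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))


prior_k = num_clusters_prior(3, 1.0)
```

The expected number of clusters grows only logarithmically in n for fixed `alpha`, so a small total mass favours a few large clusters.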

Journal ArticleDOI
TL;DR: Two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors are presented: the Polya urn Gibbs sampler and the blocked Gibbs sampler, the latter based on an entirely different approach that works by directly sampling values from the posterior of the random measure.
Abstract: A rich and flexible class of random probability measures, which we call stick-breaking priors, can be constructed using a sequence of independent beta random variables. Examples of random measures that have this characterization include the Dirichlet process, its two-parameter extension, the two-parameter Poisson–Dirichlet process, finite dimensional Dirichlet priors, and beta two-parameter processes. The rich nature of stick-breaking priors offers Bayesians a useful class of priors for nonparametric problems, while the similar construction used in each prior can be exploited to develop a general computational procedure for fitting them. In this article we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Polya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling method currently employed for Dirichlet process computing. This method applies t...
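
The construction from independent beta random variables described above can be sketched for the Dirichlet process case, where each break is drawn from Beta(1, alpha). The truncation at a fixed number of atoms (with the last break set to one so the weights sum to one) is an approximation chosen for this sketch; the function names, the truncation level, and the standard normal base distribution are all assumptions.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Weights w_k = V_k * prod_{j<k} (1 - V_j) with V_k ~ Beta(1, alpha)."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(truncation):
        # The last break takes the whole remainder so the weights sum to one
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def sample_truncated_dp(alpha, truncation, base_sampler, rng):
    """One draw of a truncated random measure: (weights, atom locations)."""
    weights = stick_breaking_weights(alpha, truncation, rng)
    atoms = [base_sampler(rng) for _ in range(truncation)]
    return weights, atoms


rng = random.Random(1)
weights, atoms = sample_truncated_dp(1.0, 25, lambda r: r.gauss(0.0, 1.0), rng)
```

The result is a discrete distribution placing mass `weights[k]` on `atoms[k]`, in line with the discreteness of Dirichlet process draws noted in the Ferguson abstract.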

1,701 citations