Book Chapter · DOI

Penalty Specialists Among Goalkeepers: A Nonparametric Bayesian Analysis of 44 Years of German Bundesliga

01 Jan 2009 · pp. 63–76

Abstract: Penalty-saving ability is of major importance for a goalkeeper in modern football. However, statistical investigations of the performance of individual goalkeepers in penalties, leading to a ranking or a clustering of the keepers, are rare in the scientific literature. In this paper we perform such an analysis based on all penalties in the German Bundesliga from 1963 to 2007. A challenge when analyzing this data set is that the counts of penalties for the different goalkeepers are highly imbalanced, raising the question of how to compare goalkeepers who were involved in disparate numbers of penalties. We approach this issue using Bayesian hierarchical random effects models. These models shrink the individual goalkeepers' estimates towards an overall estimate, with the degree of shrinkage depending on the amount of information available for each goalkeeper. The underlying random effects distribution is modelled nonparametrically based on the Dirichlet process. Proceeding this way relaxes the assumptions underlying parametric random effects models and additionally allows clusters to be found among the goalkeepers.
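The shrinkage mechanism described in the abstract can be illustrated with a deliberately simplified, parametric stand-in: a Beta-Binomial posterior mean that pulls each goalkeeper's raw save rate toward a league-wide rate, more strongly the fewer penalties he faced. The prior mean, prior strength, and all counts below are hypothetical, and this sketch omits the paper's actual nonparametric (Dirichlet process) random effects distribution.

```python
# Minimal sketch (not the authors' model): partial pooling via the
# Beta-Binomial posterior mean. All numbers are hypothetical.

def shrunk_save_rate(saves, faced, prior_mean=0.25, prior_strength=20.0):
    """Posterior mean of the save probability under a Beta prior.

    prior_mean plays the role of a league-wide save rate and
    prior_strength that of a pseudo-count of penalties; both are
    illustrative choices, not estimates from the paper.
    """
    a = prior_mean * prior_strength          # prior "saves"
    b = (1.0 - prior_mean) * prior_strength  # prior "goals conceded"
    return (a + saves) / (prior_strength + faced)

# A keeper with 3 saves in 6 penalties is pulled strongly toward 0.25,
# while one with 30 saves in 60 penalties keeps most of the raw 0.50 rate.
few = shrunk_save_rate(3, 6)     # (5 + 3) / (20 + 6)  ~= 0.308
many = shrunk_save_rate(30, 60)  # (5 + 30) / (20 + 60) = 0.4375
print(round(few, 3), round(many, 3))
```

The same raw rate thus yields different estimates depending on sample size, which is exactly why a naive ranking by raw save percentage is misleading for keepers with few penalties.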


Citations

01 Jan 1992

75 citations

Journal Article · DOI
Abstract: Much of the existing goalkeeper (GK) research is based around the GK's performance but not within a match-analysis framework. Research has focused on physiology, psychology and injury prevention. Performance...

16 citations

Journal Article · DOI

Abstract: This study contributes to research on migrant pay disparities by analysing the impact of players' domestic/foreign status on performance-based pay offered to professional footballers, to understand if foreign players benefit from a preferential labour market. We used information from publicly available data of 275 footballers who played for two consecutive seasons in the Italian league Serie A. We found that the relationship between previous and current performance was partially mediated by the current salary. This result reinforced earlier findings on the pay-performance relationship, where seasonal performance is particularly relevant. Moreover, our results show that pay discrimination does not indicate a straightforward (dis)advantage for one group, but presents a more complex picture. We have examined possible underlying reasons for these disparities and offered suggestions for further research. We conclude by discussing how clubs and managers could consider incentives to strengthen pay-performance relationships by being sensitive to the complex influence of players' origins.

11 citations

Cites background or result from "Penalty Specialists Among Goalkeepe..."

• ...This view is consistent with recent literature in football that focuses on the role of positional skills in determining the performance of both players and teams (e.g. Bornkamp et al., 2009; Seaton and Campos, 2011)....


• ...Nonetheless, using game statistics has the advantage of ensuring the data’s objectivity, but it might not entirely reflect each player’s performance in a particular game (Bornkamp et al., 2009)....


• ...Empirical evidence suggests that different roles require different skills to be successful (e.g. Bornkamp et al., 2009; Seaton and Campos, 2011)....


01 Jan 2002
Abstract: Sport scientists have devoted relatively little attention to soccer penalty kicks, despite their decisive role in important competitions such as the World Cup. Two possible kicker strategies have been described: ignoring the goalkeeper action (open loop) or trying to react to the goalkeeper action (closed loop). We used a paradigm simulating a penalty kick in the laboratory to investigate the dynamics of the closed-loop strategy in these controlled conditions. The probability of correctly responding to the simulated goalkeeper motion as a function of time available followed a logistic curve. Kickers on average reached perfect performance only if the goalkeeper committed him or herself to one side about 400 ms before ball contact and showed chance performance if the goalkeeper motion occurred less than 150 ms before ball contact. Interestingly, coincidence judgement - another aspect of the laboratory responses - appeared to be affected for a much longer time (>500 ms) than was needed to correctly determine...

6 citations

DOI
02 Jul 2010
TL;DR: A Bayesian approach to these models has several advantages: it can solve problems with unbounded likelihood functions that can occur in mixture models, and it allows an elegant extension of finite to (countably) infinite mixture models.
Abstract: Finite mixture models assume that a distribution is a combination of several parametric distributions. They offer a compromise between the interpretability of parametric models and the flexibility of nonparametric models. This thesis considers a Bayesian approach to these models, which has several advantages. For example, using only weak prior information, it can solve problems with unbounded likelihood functions that can occur in mixture models. The Bayesian approach also allows an elegant extension of finite to (countably) infinite mixture models. Depending on the application, the components of mixture models can either be viewed as just a means to the flexible modeling of a distribution or as defining subgroups of a population with different parametric distributions. For the former case, consistency results for Bayesian mixtures are stated. An example concerning the flexible modeling of a random effects distribution in a logistic regression is also given. The application considers the goalkeeper's effect in saving a penalty. In the latter case mixture models can be used for clustering. Bayesian mixtures then allow the estimation of the number of clusters at the same time as the cluster-specific parameters. For cluster analysis the standard approach for fitting Bayesian mixtures, Markov chain Monte Carlo (MCMC), unfortunately leads to inferential difficulties. The labels associated with the clusters can change during the MCMC run, a phenomenon called label-switching. The problem becomes severe if the number of clusters is allowed to vary. Existing methods to deal with label-switching and a varying number of components are reviewed, and new approaches are proposed for both situations. The first is a variant of the relabeling algorithm of Stephens (2000). The variant is more general, as it applies to drawn clusterings rather than drawn parameter values and therefore does not depend on the specific form of the component distributions.
The second approach is based on pairwise posterior probabilities and improves on a commonly used loss function due to Binder (1978). Minimization of this loss is shown to be equivalent to maximizing the posterior expected Rand index with the true clustering. As the adjusted Rand index is preferable to the raw index, maximization of the posterior expected adjusted Rand index is proposed. The new approaches are compared to the previous methods on simulated and real data. The real data used for cluster analysis are two gene expression data sets and Fisher's iris data.
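The adjusted Rand index mentioned above is the chance-corrected version of Rand's raw index (the correction is due to Hubert and Arabie). A minimal stdlib-only sketch, not the thesis code, computes it from two label lists:

```python
# Hedged sketch of the adjusted Rand index between two clusterings,
# the quantity whose posterior expectation the thesis proposes to maximize.
from collections import Counter
from math import comb

def adjusted_rand(u, v):
    """Adjusted Rand index between two clusterings given as label lists."""
    n = len(u)
    nij = Counter(zip(u, v))                      # contingency-table counts
    a, b = Counter(u), Counter(v)                 # row / column marginals
    idx = sum(comb(c, 2) for c in nij.values())   # co-clustered pairs
    ra = sum(comb(c, 2) for c in a.values())
    rb = sum(comb(c, 2) for c in b.values())
    exp = ra * rb / comb(n, 2)                    # expected index under chance
    mx = (ra + rb) / 2
    return (idx - exp) / (mx - exp) if mx != exp else 1.0

# Relabelling the clusters does not change the index -> 1.0
print(adjusted_rand([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the index depends only on which pairs of items share a cluster, it is invariant to label permutations, which is precisely why it sidesteps the label-switching problem described in the abstract.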

2 citations

References

Journal Article · DOI
Abstract: Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the ‘hat’ matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
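The quantities defined in this abstract, pD (posterior mean deviance minus deviance at the posterior mean) and DIC (posterior mean deviance plus pD), are straightforward to compute from posterior draws. The sketch below uses a toy normal-mean model with known unit variance and synthetic "posterior" draws; all numbers are illustrative, not from any real analysis.

```python
# Illustrative computation of pD and DIC for a toy N(theta, 1) model.
import math
import random

random.seed(1)
y = [0.3, -0.1, 0.4, 0.2]                              # toy data
draws = [random.gauss(0.2, 0.3) for _ in range(5000)]  # synthetic posterior draws

def deviance(theta):
    """-2 * log-likelihood of y under N(theta, 1)."""
    ll = sum(-0.5 * math.log(2 * math.pi) - 0.5 * (yi - theta) ** 2 for yi in y)
    return -2.0 * ll

dbar = sum(deviance(t) for t in draws) / len(draws)  # posterior mean deviance
dhat = deviance(sum(draws) / len(draws))             # deviance at posterior mean
p_d = dbar - dhat                                    # effective number of parameters
dic = dbar + p_d                                     # deviance information criterion
print(round(p_d, 2), round(dic, 2))
```

For this model pD reduces to n times the posterior variance of the mean, so with 4 observations and draws of standard deviation 0.3 it lands near 0.36, well below the nominal single parameter, reflecting the informative prior implicit in the tight draws.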

10,825 citations

Journal Article · DOI
Abstract: The Bayesian approach to statistical problems, though fruitful in many ways, has been rather unsuccessful in treating nonparametric problems. This is due primarily to the difficulty in finding workable prior distributions on the parameter space, which in nonparametric problems is taken to be a set of probability distributions on a given sample space. There are two desirable properties of a prior distribution for nonparametric problems. (I) The support of the prior distribution should be large, with respect to some suitable topology on the space of probability distributions on the sample space. (II) Posterior distributions given a sample of observations from the true probability distribution should be manageable analytically. These properties are antagonistic in the sense that one may be obtained at the expense of the other. This paper presents a class of prior distributions, called Dirichlet process priors, broad in the sense of (I), for which (II) is realized, and for which treatment of many nonparametric statistical problems may be carried out, yielding results that are comparable to the classical theory. In Section 2, we review the properties of the Dirichlet distribution needed for the description of the Dirichlet process given in Section 3. Briefly, this process may be described as follows. Let $\mathscr{X}$ be a space and $\mathscr{A}$ a $\sigma$-field of subsets, and let $\alpha$ be a finite non-null measure on $(\mathscr{X}, \mathscr{A})$. Then a stochastic process $P$ indexed by elements $A$ of $\mathscr{A}$ is said to be a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha$ if for any measurable partition $(A_1, \cdots, A_k)$ of $\mathscr{X}$, the random vector $(P(A_1), \cdots, P(A_k))$ has a Dirichlet distribution with parameter $(\alpha(A_1), \cdots, \alpha(A_k))$.
$P$ may be considered a random probability measure on $(\mathscr{X}, \mathscr{A})$. The main theorem states that if $P$ is a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha$, and if $X_1, \cdots, X_n$ is a sample from $P$, then the posterior distribution of $P$ given $X_1, \cdots, X_n$ is also a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha + \sum^n_1 \delta_{x_i}$, where $\delta_x$ denotes the measure giving mass one to the point $x$. In Section 4, an alternative definition of the Dirichlet process is given. This definition exhibits a version of the Dirichlet process that gives probability one to the set of discrete probability measures on $(\mathscr{X}, \mathscr{A})$. This is in contrast to Dubins and Freedman [2], whose methods for choosing a distribution function on the interval [0, 1] lead with probability one to singular continuous distributions. Methods of choosing a distribution function on [0, 1] that is absolutely continuous with probability one have been described by Kraft [7]. The general method of choosing a distribution function on [0, 1], described in Section 2 of Kraft and van Eeden [10], can of course be used to define the Dirichlet process on [0, 1]. Special mention must be made of the papers of Freedman and Fabius. Freedman [5] defines a notion of tailfree for a distribution on the set of all probability measures on a countable space $\mathscr{X}$. For a tailfree prior, the posterior distribution given a sample from the true probability measure may be fairly easily computed. Fabius [3] extends the notion of tailfree to the case where $\mathscr{X}$ is the unit interval [0, 1], but it is clear that his extension may be made to cover quite general $\mathscr{X}$. With such an extension, the Dirichlet process would be a special case of a tailfree distribution for which the posterior distribution has a particularly simple form.
There are disadvantages to the fact that $P$ chosen by a Dirichlet process is discrete with probability one. These appear mainly because in sampling from a $P$ chosen by a Dirichlet process, we expect eventually to see one observation exactly equal to another. For example, consider the goodness-of-fit problem of testing the hypothesis $H_0$ that a distribution on the interval [0, 1] is uniform. If on the alternative hypothesis we place a Dirichlet process prior with parameter $\alpha$ itself a uniform measure on [0, 1], and if we are given a sample of size $n \geqq 2$, the only nontrivial nonrandomized Bayes rule is to reject $H_0$ if and only if two or more of the observations are exactly equal. This is really a test of the hypothesis that a distribution is continuous against the hypothesis that it is discrete. Thus, there is still a need for a prior that chooses a continuous distribution with probability one and yet satisfies properties (I) and (II). Some applications in which the possible doubling up of the values of the observations plays no essential role are presented in Section 5. These include the estimation of a distribution function, of a mean, of quantiles, of a variance and of a covariance. A two-sample problem is considered in which the Mann-Whitney statistic, equivalent to the rank-sum statistic, appears naturally. A decision theoretic upper tolerance limit for a quantile is also treated. Finally, a hypothesis testing problem concerning a quantile is shown to yield the sign test. In each of these problems, useful ways of combining prior information with the statistical observations appear. Other applications exist. In his Ph. D. dissertation [1], Charles Antoniak finds a need to consider mixtures of Dirichlet processes. He treats several problems, including the estimation of a mixing distribution, bio-assay, empirical Bayes problems, and discrimination problems.
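The almost-sure discreteness discussed above, and the resulting ties among observations, can be seen directly by simulating a sample from a Dirichlet-process-distributed $P$ via the Pólya urn scheme: given $X_1, \ldots, X_i$, the next observation repeats one of the past values with probability $i/(\alpha + i)$ and is otherwise a fresh draw from the base measure. The concrete choices below (concentration $\alpha = 1$, Uniform[0, 1] base measure, $n = 50$) are illustrative.

```python
# Sketch of the discreteness property: Polya urn sampling from a
# Dirichlet process with concentration alpha and Uniform[0, 1] base measure.
import random

def polya_urn_sample(n, alpha=1.0, base=random.random, rng=random):
    """Draw X_1, ..., X_n from P ~ DP(alpha * Uniform[0, 1])."""
    xs = []
    for i in range(n):
        if xs and rng.random() < i / (alpha + i):
            xs.append(rng.choice(xs))  # repeat a past value (discrete atom)
        else:
            xs.append(base())          # fresh draw from the base measure
    return xs

random.seed(0)
xs = polya_urn_sample(50)
# Ties occur with probability one: far fewer distinct values than draws.
print(len(set(xs)), "distinct values among", len(xs))
```

This is exactly the "doubling up" of observations the abstract refers to: even with a continuous base measure, the realized $P$ is discrete, so repeated values appear, which the goalkeeper paper exploits to cluster keepers sharing a random effect value.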

4,678 citations

"Penalty Specialists Among Goalkeepe..." refers background or methods in this paper

• ...A flexible and convenient solution is to use the Dirichlet process, dating back to [5]....


• ...which allows for an efficient exact implementation in many cases (see [5] for details)....


Report · DOI
01 May 1991
Abstract: The parameter in a Bayesian nonparametric problem is the unknown distribution P of the observation X. A Bayesian uses a prior distribution for P and, after observing X, solves the statistical inference problem by using the posterior distribution of P, which is the conditional distribution of P given X. For Bayesian nonparametrics to be successful one needs a large class of priors for which posterior distributions can be easily calculated. Unless X takes values in a finite space, the unknown distribution P varies in an infinite-dimensional space. Thus one has to work with measures on a complicated space, such as the space of all probability measures on a large space, which has always required careful attention to the attendant measure-theoretic problems. A class of priors known as Dirichlet measures has been used for the distribution of a random variable X when it takes values in $R^k$.

2,013 citations

"Penalty Specialists Among Goalkeepe..." refers background in this paper

• ...Another reason for the popularity of Dirichlet process priors is the constructive stick-breaking representation of the Dirichlet process given by [17]....


Journal Article · DOI
Abstract: This paper extends Ferguson's result to cases where the random measure is a mixing distribution for a parameter which determines the distribution from which observations are made. The conditional distribution of the random measure, given the observations, is no longer that of a simple Dirichlet process, but can be described as a mixture of Dirichlet processes. This paper gives a formal definition for these mixtures and develops several theorems about their properties, the most important of which is a closure property for such mixtures. Formulas for computing the conditional distribution are derived, and applications to problems in bio-assay, discrimination, regression, and mixing distributions are given.

2,011 citations

"Penalty Specialists Among Goalkeepe..." refers background in this paper

• ...For a random sample of size n from a probability distribution realized by a Dirichlet process [1] has shown that the prior density of the number of distinct values (clusters/components) k in n realizations is...


01 Jan 1991

1,560 citations