Author

Cinzia Carota

Bio: Cinzia Carota is an academic researcher at the University of Turin. Her research focuses on topics including the Dirichlet process and Bayesian probability. She has an h-index of 7 and has co-authored 29 publications receiving 198 citations. Previous affiliations include the University of Pavia and the University of Genoa.

Papers
Journal ArticleDOI
TL;DR: A class of Bayesian semiparametric models for regression problems in which the response variable is a count is introduced, to provide a flexible, easy-to-implement and robust extension of generalised linear models, for datasets of moderate or large size.
Abstract: We introduce a class of Bayesian semiparametric models for regression problems in which the response variable is a count. Our goal is to provide a flexible, easy-to-implement and robust extension of generalised linear models, for datasets of moderate or large size. Our approach is based on modelling the distribution of the response variable using a Dirichlet process, whose mean distribution function is itself random and is given a parametric form, such as a generalised linear model. The effects of the explanatory variables on the response are modelled via both the parameters of the mean distribution function of the Dirichlet process and the total mass parameter. We discuss modelling options and relationships with other approaches. We derive in closed form the marginal posterior distribution of the regression coefficients and discuss its use in inference and computing. We illustrate the benefits of our approach with a prognostic model for early breast cancer patients.
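As an illustration of the generative core of this model, the following minimal Python sketch (not the authors' code; function names, the truncation level K and all parameter values are illustrative) draws counts from one realisation of a Dirichlet process whose base measure is a Poisson GLM, via truncated stick-breaking.

```python
# Minimal sketch: a Dirichlet process prior on the count distribution
# whose base (mean) distribution is a parametric GLM, here Poisson with
# a log link. `sample_dp_counts` and K are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def sample_dp_counts(x, beta, alpha, n, K=200):
    """Draw n counts (all sharing covariates x) from one realisation of a
    DP with total mass alpha and Poisson(exp(x @ beta)) base measure,
    using a truncated stick-breaking construction."""
    mu = np.exp(x @ beta)                  # GLM mean of the base measure
    atoms = rng.poisson(mu, size=K)        # atom locations ~ base measure
    v = rng.beta(1.0, alpha, size=K)       # stick-breaking proportions
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # weights
    w /= w.sum()                           # renormalise after truncation
    return rng.choice(atoms, size=n, p=w)

x = np.array([1.0, 0.5])                   # intercept + one covariate
beta = np.array([1.2, -0.4])
y = sample_dp_counts(x, beta, alpha=5.0, n=1000)
print(y.mean(), np.exp(x @ beta))          # DP draw fluctuates around the GLM mean
```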

65 citations

Journal ArticleDOI
TL;DR: This work approaches model criticism by identifying possible troublesome features of the currently entertained model, embedding the model in an elaborated model, and measuring the value of elaborating, which allows model criticism to be performed jointly with parameter inference and prediction.
Abstract: We discuss the problem of model criticism, with emphasis on developing summary diagnostic measures. We approach model criticism by identifying possible troublesome features of the currently entertained model, embedding the model in an elaborated model, and measuring the value of elaborating. This requires three elements: a model elaboration, a prior distribution, and a utility function. Each triplet generates a different diagnostic measure. We focus primarily on the measure given by a Kullback-Leibler divergence between the marginal prior and posterior distributions on the elaboration parameter. We also develop a linearized version of this diagnostic and use it to show that our procedure is related to other tools commonly used for model diagnostics, such as Bayes factors and the score function. One attraction of this approach is that it allows model criticism to be performed jointly with parameter inference and prediction. Also, this diagnostic approach aims at maintaining an exploratory nature ...
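To make the headline diagnostic concrete, here is a hedged sketch that approximates the Kullback-Leibler divergence between the prior and posterior on an elaboration parameter by moment-matching normals to simulated draws; the distributions and the KL direction are illustrative assumptions, not the paper's exact construction.

```python
# Sketch of the diagnostic: KL divergence between prior and posterior on
# an elaboration parameter phi, with both marginals summarised by
# moment-matched normals (a crude stand-in for the exact distributions).
import numpy as np

def kl_normal(m0, s0, m1, s1):
    """KL( N(m0, s0^2) || N(m1, s1^2) ), closed form."""
    return np.log(s1 / s0) + (s0**2 + (m0 - m1)**2) / (2 * s1**2) - 0.5

rng = np.random.default_rng(1)
phi_prior = rng.normal(0.0, 1.0, 20_000)   # prior draws of phi
phi_post = rng.normal(0.8, 0.4, 20_000)    # posterior draws (e.g. from MCMC)

d = kl_normal(phi_post.mean(), phi_post.std(),
              phi_prior.mean(), phi_prior.std())
print(f"diagnostic KL(posterior || prior) = {d:.3f}")
# A large value means the data moved phi away from its prior,
# i.e. the elaboration is worth taking seriously.
```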

50 citations

11 May 2015
TL;DR: The Ross Sea can be considered, in a biological sense, one of the better-known areas of Antarctica, owing to the many expeditions conducted there since 1899; the hundreds of mollusc species collected and classified over the years in a unique database, now available for study, allow species diversity to be studied through accumulation curves based on Hill numbers.
Abstract: The Ross Sea can be considered, in a biological sense, one of the better-known areas in Antarctica, due to the high number of expeditions conducted there since 1899. Hundreds of mollusc species have been collected and classified over the years in a unique database, which is now available for study. Access to such an impressive body of information offers the opportunity to apply important recent results to the study of biodiversity in that area. Recent influential scientific contributions lead us to study species diversity by means of accumulation curves based on Hill numbers, i.e. the effective number of equally frequent species.
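The Hill number of order q is ^qD = (Σ_i p_i^q)^{1/(1-q)}, interpreted as the effective number of equally frequent species; the sketch below computes it for a hypothetical abundance vector (the counts are invented for illustration, not taken from the mollusc database).

```python
# Hill numbers: q = 0 gives species richness, q -> 1 the exponential of
# Shannon entropy, q = 2 the inverse Simpson concentration.
import numpy as np

def hill_number(counts, q):
    """^qD = (sum_i p_i^q)^(1/(1-q)); the q -> 1 limit is exp(Shannon entropy)."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    if np.isclose(q, 1.0):
        return np.exp(-np.sum(p * np.log(p)))   # limit as q -> 1
    return np.sum(p ** q) ** (1.0 / (1.0 - q))

abundances = [120, 60, 30, 15, 8, 4, 2, 1]      # hypothetical species counts
for q in (0, 1, 2):
    print(q, round(hill_number(abundances, q), 2))
```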

16 citations

Journal ArticleDOI
TL;DR: Dirichlet process random effects are used to reduce the number of fixed effects required to achieve reliable disclosure risk estimates; the results show that mixed models with main effects only produce estimates roughly equivalent to those of all-two-way-interaction models and are effective in defusing potential shortcomings of traditional log-linear models.
Abstract: Statistical agencies and other institutions collect data under the promise to protect the confidentiality of respondents. When releasing microdata samples, the risk that records can be identified must be assessed. To this aim, a widely adopted approach is to isolate categorical variables key to the identification and analyze multi-way contingency tables of such variables. Common disclosure risk measures focus on sample unique cells in these tables and adopt parametric log-linear models as the standard statistical tools for the problem. Such models often have to deal with large and extremely sparse tables that pose a number of challenges to risk estimation. This paper proposes to overcome these problems by studying nonparametric alternatives based on Dirichlet process random effects. The main finding is that the inclusion of such random effects allows us to reduce considerably the number of fixed effects required to achieve reliable risk estimates. This is studied on applications to real data, suggesting, in particular, that our mixed models with main effects only produce roughly equivalent estimates compared to the all two-way interactions models, and are effective in defusing potential shortcomings of traditional log-linear models. This paper adopts a fully Bayesian approach that accounts for all sources of uncertainty, including that about the population frequencies, and supplies unconditional (posterior) variances and credible intervals.
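For context, here is a sketch of the classical per-cell risk measures this line of work builds on, assuming a fitted Poisson log-linear model and Bernoulli sampling; this reflects the standard setup, not the paper's fully Bayesian machinery, and all numbers are toy values.

```python
# Assume a Poisson log-linear model gives an estimated population rate
# lam[k] for each key cell, with sampling fraction pi. Then, given the
# sample count f_k, the unseen part F_k - f_k ~ Poisson((1 - pi) * lam[k]),
# and the standard risk measures over sample-unique cells are
#   tau1 = sum of P(F_k = 1 | f_k = 1) = sum of exp(-mu_k)
#   tau2 = sum of E[1 / F_k | f_k = 1] = sum of (1 - exp(-mu_k)) / mu_k
# with mu_k = (1 - pi) * lam[k].
import numpy as np

def risk_measures(f, lam, pi):
    mu = (1.0 - pi) * np.asarray(lam)      # mean of the unseen population part
    unique = np.asarray(f) == 1            # sample-unique cells
    tau1 = np.exp(-mu[unique]).sum()
    tau2 = ((1.0 - np.exp(-mu[unique])) / mu[unique]).sum()
    return tau1, tau2

f = np.array([1, 1, 0, 2, 1, 0])                # toy sample cell counts
lam = np.array([0.5, 8.0, 1.0, 4.0, 0.2, 3.0])  # fitted population rates
print(risk_measures(f, lam, pi=0.05))
```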

13 citations


Cited by
Journal ArticleDOI
TL;DR: This work considers problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups, and considers a hierarchical model, specifically one in which the base measure for the childDirichlet processes is itself distributed according to a Dirichlet process.
Abstract: We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes ...
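A minimal sketch of the hierarchical construction described above: because the global DP draw G0 is discrete, group-level DPs with base measure G0 necessarily re-use its atoms. The truncation level and parameter values are illustrative.

```python
# Truncated stick-breaking sketch of a hierarchical Dirichlet process:
# a global measure G0 with weights beta over shared atoms, and per-group
# DPs whose atoms are drawn from G0, hence shared across groups.
import numpy as np

rng = np.random.default_rng(2)

def stick_breaking(alpha, K):
    v = rng.beta(1.0, alpha, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum()

K = 50
gamma, alpha0 = 5.0, 3.0
beta = stick_breaking(gamma, K)             # global weights of G0
atoms = rng.normal(0.0, 5.0, size=K)        # shared mixture components

# Each group's DP re-weights atoms sampled from the SAME discrete G0,
# so mixture components are necessarily shared between groups.
groups = []
for _ in range(3):
    idx = rng.choice(K, size=K, p=beta)     # group atoms drawn from G0
    w = stick_breaking(alpha0, K)
    groups.append((atoms[idx], w))

for j, (a, w) in enumerate(groups):
    print(f"group {j}: dominant shared atom = {a[np.argmax(w)]:.2f}")
```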

3,755 citations

01 Jan 2012

3,692 citations

Journal ArticleDOI
TL;DR: A review of Subset Selection in Regression (Monographs on Statistics and Applied Probability, no. 40) by A. J. Miller, Chapman and Hall, London, 1990.
Abstract: 8. Subset Selection in Regression (Monographs on Statistics and Applied Probability, no. 40). By A. J. Miller. ISBN 0 412 35380 6. Chapman and Hall, London, 1990. 240 pp. £25.00.

1,154 citations

Journal Article
TL;DR: The methodology proposed automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density, and substantial improvements in the time‐normalized effective sample size are reported when compared with alternative sampling approaches.
Abstract: The paper proposes Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold to resolve the shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlations. The methods provide fully automated adaptation mechanisms that circumvent the costly pilot runs that are required to tune proposal densities for Metropolis-Hastings or indeed Hamiltonian Monte Carlo and Metropolis adjusted Langevin algorithms. This allows for highly efficient sampling even in very high dimensions where different scalings may be required for the transient and stationary phases of the Markov chain. The methodology proposed exploits the Riemann geometry of the parameter space of statistical models and thus automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density. The performance of these Riemann manifold Monte Carlo methods is rigorously assessed by performing inference on logistic regression models, log-Gaussian Cox point processes, stochastic volatility models and Bayesian estimation of dynamic systems described by non-linear differential equations. Substantial improvements in the time-normalized effective sample size are reported when compared with alternative sampling approaches. MATLAB code that is available from http://www.ucl.ac.uk/statistics/research/rmhmc allows replication of all the results reported.
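Here is a hedged sketch of the simplified manifold MALA proposal from this line of work: the Langevin drift and noise are preconditioned by a metric tensor G. The target and the constant metric below are toy choices; with a position-dependent G(theta), the proposal covariance's log-determinant would enter the acceptance ratio and not cancel as it does here.

```python
# Simplified manifold MALA on a toy 2-D standard normal target with a
# fixed metric G: proposal mean theta + (eps^2 / 2) * Ginv * grad log pi,
# proposal covariance eps^2 * Ginv, plus a Metropolis-Hastings correction.
import numpy as np

rng = np.random.default_rng(3)

def log_pi(th):                              # toy target: N(0, I)
    return -0.5 * th @ th

def grad_log_pi(th):
    return -th

G = np.array([[2.0, 0.3], [0.3, 1.0]])       # metric tensor (constant here)
Ginv = np.linalg.inv(G)
L = np.linalg.cholesky(Ginv)                 # Ginv = L @ L.T

def proposal_mean(th, eps):
    return th + 0.5 * eps**2 * Ginv @ grad_log_pi(th)

def log_q(to, frm, eps):
    """Gaussian proposal log-density up to a constant (constant G, so
    normalising terms cancel in the acceptance ratio)."""
    d = to - proposal_mean(frm, eps)
    return -0.5 * d @ np.linalg.solve(eps**2 * Ginv, d)

def smmala_step(th, eps=0.8):
    prop = proposal_mean(th, eps) + eps * L @ rng.standard_normal(2)
    log_a = (log_pi(prop) - log_pi(th)
             + log_q(th, prop, eps) - log_q(prop, th, eps))
    return prop if np.log(rng.random()) < log_a else th

th, chain = np.zeros(2), []
for _ in range(5000):
    th = smmala_step(th)
    chain.append(th)
print(np.mean(chain, axis=0))                # should be near [0, 0]
```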

1,031 citations

01 Jan 1980
TL;DR: Model criticism is considered from a Bayesian viewpoint: predictive checking functions are developed for transformation, serial correlation and bad values, their relation with Bayesian options is discussed, and for the bad-value problem the approach is compared with M estimators.
Abstract: Scientific learning is an iterative process employing Criticism and Estimation. Correspondingly, the formulated model factors into two complementary parts: a predictive part allowing model criticism, and a Bayes posterior part allowing estimation. Implications for significance tests, the theory of precise measurement, and for ridge estimates are considered. Predictive checking functions for transformation, serial correlation, bad values, and their relation with Bayesian options are considered. Robustness is seen from a Bayesian viewpoint and examples are given. For the bad value problem a comparison with M estimators is made.
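A minimal sketch of the predictive-checking idea: a prior predictive p-value for a checking function g, here the sample maximum as a natural choice for the bad-value problem. The model and the numbers are illustrative only.

```python
# Prior predictive check: simulate replicate data from the prior
# predictive distribution and see how extreme g(y_obs) is among the
# replicates. The normal-normal model here is a toy assumption.
import numpy as np

rng = np.random.default_rng(4)

def prior_predictive_pvalue(y_obs, g, n_sim=20_000):
    n = len(y_obs)
    mu = rng.normal(0.0, 2.0, n_sim)                   # prior on the mean
    y_rep = rng.normal(mu[:, None], 1.0, (n_sim, n))   # predictive draws
    return np.mean(g(y_rep, axis=1) >= g(y_obs))

y = np.array([0.1, -0.4, 0.3, 0.6, 5.2])   # last point looks like a bad value
print(prior_predictive_pvalue(y, np.max))  # a small p-value flags trouble
```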

768 citations