Open Access, Posted Content

Assessing Bayesian Nonparametric Log-Linear Models: an application to Disclosure Risk estimation

TLDR
A method is proposed for identifying models with good predictive performance within the family of Bayesian log-linear mixed models with Dirichlet process random effects for count data.
Abstract
We present a method for identifying models with good predictive performance in the family of Bayesian log-linear mixed models with Dirichlet process random effects. Such a problem arises in many different applications; here we consider it in the context of disclosure risk estimation, an issue of growing relevance given the increasing demand for data collected under a pledge of confidentiality. Two different criteria are proposed and used jointly within a two-stage selection procedure, adopting an M-open view. The first stage identifies a search path; in the second, a small number of nonparametric models is evaluated through an application-specific, score-based Bayesian information criterion. We test our method on a variety of contingency tables based on microdata samples from the US Census Bureau and the Italian National Security Administration, treated here as populations, and carefully discuss its features. This leads us on a journey through different forms and sources of bias, along which we show that (i) although it is based on the so-called "score+search" paradigm, our method is by construction well protected from selection-induced bias, and (ii) models with good performance are invariably characterized by an extraordinarily simple structure of fixed effects. The complexity of model selection, a very challenging task in a strictly parametric context with large and sparse tables, is therefore significantly defused by our approach. An attractive collateral result of our analysis is a set of fruitful new ideas about modeling in small area estimation problems, where interest lies in total counts over cells with a small number of observations.
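The abstract names the model family and the application but does not write either one down. As a rough orientation only, here is a minimal sketch under assumptions taken from the standard disclosure-risk setup, not from this paper. Population cell counts are F_k ~ Poisson(lambda_k) with log lambda_k = x_k'beta + phi_k. The random effects phi_k come from a truncated stick-breaking approximation to a Dirichlet process with a N(0, sigma^2) base measure. The risk measure is tau_1, the number of cells that are unique in both the sample and the population. All function names, the truncation level, and the assumption that units are sampled independently with probability pi are hypothetical. The code simulates from the model and computes a plug-in risk estimate; it does not reproduce the authors' selection procedure.

```python
import math
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Weights of a truncated stick-breaking draw from DP(alpha, G0) (assumed truncation)."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # V_h ~ Beta(1, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom absorbs leftover mass, so weights sum to 1
    return weights


def draw_dp_random_effects(n_cells, alpha, sigma, truncation, rng):
    """One random effect per cell from G ~ DP(alpha, N(0, sigma^2)), truncated.

    Cells that draw the same atom share the same effect; this ties is how
    the DP clusters cells nonparametrically.
    """
    weights = stick_breaking_weights(alpha, truncation, rng)
    atoms = [rng.gauss(0.0, sigma) for _ in range(truncation)]
    return rng.choices(atoms, weights=weights, k=n_cells)


def cell_rates(fixed_predictor, random_effects):
    """Poisson means lambda_k = exp(x_k' beta + phi_k) for each cell k."""
    return [math.exp(eta + phi) for eta, phi in zip(fixed_predictor, random_effects)]


def estimate_tau1(sample_counts, rates, sampling_fraction):
    """Plug-in estimate of tau_1 = #{k : f_k = 1 and F_k = 1}.

    Assumes F_k ~ Poisson(lambda_k) and each unit is sampled independently
    with probability pi. Poisson thinning then gives an unsampled count
    F_k - f_k ~ Poisson((1 - pi) * lambda_k), independent of f_k, so
    P(F_k = 1 | f_k = 1) = exp(-(1 - pi) * lambda_k).
    """
    return sum(
        math.exp(-(1.0 - sampling_fraction) * lam)
        for f, lam in zip(sample_counts, rates)
        if f == 1
    )


# Usage with made-up numbers (hypothetical, not data from the paper):
rng = random.Random(0)
phi = draw_dp_random_effects(n_cells=8, alpha=1.0, sigma=1.0, truncation=10, rng=rng)
lam = cell_rates([0.5] * 8, phi)  # intercept-only fixed effects
print(len(set(phi)), "distinct random effects among 8 cells")
print(estimate_tau1([1, 0, 3, 1], [0.5, 2.0, 4.0, 1.0], sampling_fraction=0.1))
```

The truncation keeps the sketch short and dependency-free. Fitting such a model to an observed table would require posterior sampling (for example, Markov chain Monte Carlo), which this sketch does not attempt.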
