
Showing papers in "International Statistical Review in 2011"


Journal ArticleDOI
TL;DR: In this paper, the authors review statistical methods which combine hidden Markov models (HMMs) and random effects models in a longitudinal setting, leading to the class of so-called mixed HMMs.
Abstract: In this paper we review statistical methods which combine hidden Markov models (HMMs) and random effects models in a longitudinal setting, leading to the class of so-called mixed HMMs. This class of models has several interesting features. It deals with the dependence of a response variable on covariates, serial dependence, and unobserved heterogeneity in an HMM framework. It exploits the properties of HMMs, such as the relatively simple dependence structure and the efficient computational procedure, and allows one to handle a variety of real-world time-dependent data. We give details of the Expectation-Maximization algorithm for computing the maximum likelihood estimates of model parameters and we illustrate the method with two real applications describing the relationship between patent counts and research and development expenditures, and between stock and market returns via the Capital Asset Pricing Model.

106 citations
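
For readers who want the mechanics, here is a minimal sketch (not the authors' mixed-HMM code) of the scaled forward-algorithm log-likelihood for a plain two-state Poisson HMM, the building block that mixed HMMs extend with random effects before EM estimation; all parameter names and values are illustrative assumptions.

```python
# Minimal sketch: log-likelihood of a K-state Poisson hidden Markov model
# computed with the scaled forward algorithm.
import numpy as np
from scipy.stats import poisson

def hmm_loglik(counts, lam, gamma, delta):
    """counts: observed count series; lam: per-state Poisson means (K,);
    gamma: K x K transition matrix; delta: initial state distribution (K,)."""
    alpha = delta * poisson.pmf(counts[0], lam)  # forward probabilities at t = 0
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()                  # rescale to avoid underflow
    for y in counts[1:]:
        alpha = (alpha @ gamma) * poisson.pmf(y, lam)
        ll += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return ll

y = np.array([0, 2, 1, 5, 7, 1, 0, 3])           # toy data
print(hmm_loglik(y,
                 lam=np.array([1.0, 5.0]),
                 gamma=np.array([[0.9, 0.1], [0.2, 0.8]]),
                 delta=np.array([0.5, 0.5])))
```

The mixed HMMs reviewed in the paper add subject-specific random effects to a likelihood of this form and estimate the parameters by the EM algorithm.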



Journal ArticleDOI
TL;DR: This paper relates the survey calibration estimators to the semiparametric incomplete-data estimators of Robins and coworkers, and to adjustment for baseline variables in a randomized trial.
Abstract: Summary Survey calibration (or generalized raking) estimators are a standard approach to the use of auxiliary information in survey sampling, improving on the simple Horvitz–Thompson estimator. In this paper we relate the survey calibration estimators to the semiparametric incomplete-data estimators of Robins and coworkers, and to adjustment for baseline variables in a randomized trial. The development based on calibration estimators explains the “estimated weights” paradox and provides useful heuristics for constructing practical estimators. We present some examples of using calibration to gain precision without making additional modelling assumptions in a variety of regression models.

100 citations
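
A minimal sketch of one standard member of the calibration family, the linear (chi-square distance) calibration estimator; the names d, X, and totals are assumptions for illustration, and this is not the paper's estimators.

```python
# Linear calibration: adjust design weights d so the weighted sample totals
# of the auxiliaries X exactly match known population totals.
import numpy as np

def calibrate(d, X, totals):
    """d: design weights (n,); X: auxiliary variables (n, p);
    totals: known population totals of X (p,).
    Returns w = d * (1 + X @ lam) satisfying w @ X == totals."""
    M = (d[:, None] * X).T @ X                 # sum_i d_i x_i x_i'
    lam = np.linalg.solve(M, totals - d @ X)
    return d * (1.0 + X @ lam)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(2.0, 1.0, 50)])
d = np.full(50, 20.0)                          # Horvitz-Thompson weights
w = calibrate(d, X, totals=np.array([1000.0, 2100.0]))
print(w @ X)                                   # reproduces the known totals
```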


Journal ArticleDOI
TL;DR: The synthetic Longitudinal Business Database, created by simulating establishment records from statistical models fitted to the confidential Longitudinal Business Database, is the first-ever business microdata set publicly released in the United States; its release was approved by the U.S. Bureau of the Census and the Internal Revenue Service.
Abstract: In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.

96 citations
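
The core synthesis idea (replace a sensitive variable by draws from a model fitted to the confidential data) can be shown in a few lines. This toy sketch is purely hypothetical and is not the procedure used for the synthetic Longitudinal Business Database; all variable names and numbers are invented.

```python
# Toy synthesis: fit log(payroll) ~ log(employment) on "confidential" data,
# then release model-simulated payroll instead of the real values.
import numpy as np

rng = np.random.default_rng(1)
employment = rng.poisson(30, 1000) + 1           # pretend confidential data
payroll = np.exp(1.0 + 0.9 * np.log(employment) + rng.normal(0, 0.3, 1000))

A = np.column_stack([np.ones_like(employment, dtype=float), np.log(employment)])
beta, *_ = np.linalg.lstsq(A, np.log(payroll), rcond=None)
resid_sd = np.std(np.log(payroll) - A @ beta)

# synthetic values mimic the fitted distribution, not the real records
synthetic_payroll = np.exp(A @ beta + rng.normal(0, resid_sd, 1000))
```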


Journal ArticleDOI
TL;DR: Probabilistic and graphical rules for detecting situations in which a dependence of one variable on another is altered by adjusting for a third variable are considered, whether that dependence is causal or purely predictive.
Abstract: We consider probabilistic and graphical rules for detecting situations in which a dependence of one variable on another is altered by adjusting for a third variable (i.e., noncollapsibility), whether that dependence is causal or purely predictive. We focus on distinguishing situations in which adjustment will reduce, increase, or leave unchanged the degree of bias in an association of two variables when that association is taken to represent a causal effect of one variable on the other. We then consider situations in which adjustment may partially remove or introduce a potential source of bias in estimating causal effects, and some additional special cases useful for case-control studies, cohort studies with loss, and trials with noncompliance (nonadherence).

87 citations
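
A worked numeric example of noncollapsibility, with assumed risks: the odds ratio equals 4 in both strata of a covariate Z, yet the marginal odds ratio differs even though treatment is independent of Z, so the change under adjustment is not due to confounding.

```python
# Noncollapsibility of the odds ratio without confounding (assumed numbers).
def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# risk of the outcome under treated/untreated in two equally sized strata of Z
strata = [(0.8, 0.5), (0.5, 0.2)]
for p1, p0 in strata:
    print("stratum OR:", odds_ratio(p1, p0))             # 4.0 in both strata

p1_marg = sum(p for p, _ in strata) / 2                   # 0.65
p0_marg = sum(p for _, p in strata) / 2                   # 0.35
print("marginal OR:", round(odds_ratio(p1_marg, p0_marg), 2))  # about 3.45, not 4
```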


Journal ArticleDOI
TL;DR: In this paper, the authors show that fundamentalness of the structural shocks is always guaranteed in the dynamic factor model framework, and they use structural dynamic factor models to analyze the relationship between hours worked and technology shocks, comparing the results with traditional VAR findings.
Abstract: Economic theory typically assumes the existence of a few unobserved, unpredictable stochastic disturbances, called structural shocks, driving the whole economy. Were the economy representable as a very high dimensional stochastic vector process, those shocks would be the reduced-rank innovation of that process. In practice, however, only a few components of that process are observable, and their innovation, in general, does not coincide with the structural shocks. As a result, an MA representation of the observed process in terms of the structural shocks will be a noninvertible one. In other words, the structural shocks driving the economy do not belong to the past of the observed series, but also involve their future, i.e. they are nonfundamental for the observed process. It follows that the present values of those structural shocks cannot be recovered from the observations; in particular, fitting causal VARs to the observed series can be extremely misleading. We review the economic literature on VARs and provide many examples of small-scale economic models that imply nonfundamentalness. If fundamental shocks nevertheless are to be recovered, the only solution consists in enlarging the space of observations, that is, in considering a very large panel of related time series. Among several alternatives, we consider the dynamic factor model methodology, which requires very few additional assumptions. We first review its economic interpretation and then show that fundamentalness of the shocks is always guaranteed in this framework. Finally, by means of structural dynamic factor models, we provide new empirical evidence on the relation between hours worked and technology shocks and compare it with traditional VAR results.

84 citations
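
The nonfundamentalness point can be seen in a toy simulation, assuming a noninvertible MA(1): the innovations of a long causal AR fitted to the observed series correlate only imperfectly (about 0.5 here) with the true structural shocks, so the shocks are not recoverable from past observations.

```python
# x_t = e_t + 2 e_{t-1} has its MA root inside the unit circle, so the true
# shocks e_t are nonfundamental for x; causal-AR innovations differ from them.
import numpy as np

rng = np.random.default_rng(2)
e = rng.normal(size=10_000)
x = e[1:] + 2.0 * e[:-1]                       # noninvertible MA(1)

p = 30                                         # long causal AR approximation
Y = x[p:]
Z = np.column_stack([x[p - k:-k] for k in range(1, p + 1)])
phi, *_ = np.linalg.lstsq(Z, Y, rcond=None)
innov = Y - Z @ phi                            # the causal "shocks"
print(np.corrcoef(innov, e[p + 1:])[0, 1])     # well below 1
```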

Journal ArticleDOI
TL;DR: Risk-utility formulations for problems of statistical disclosure limitation are now common; the authors argue that these approaches are powerful guides to official statistics agencies in thinking about disclosure limitation problems, but that they fall short of providing a sound basis for acting upon them.
Abstract: Risk-utility formulations for problems of statistical disclosure limitation are now common. We argue that these approaches are powerful guides to official statistics agencies in regard to how to think about disclosure limitation problems, but that they fall short in essential ways of providing a sound basis for acting upon the problems. We illustrate this position in three specific contexts (transparency, tabular data, and survey weights), with shorter consideration of two key emerging issues: longitudinal data and the use of administrative data to augment surveys.

37 citations



Journal ArticleDOI
TL;DR: In this article, a modern perspective of the conditional likelihood approach to the analysis of capture-recapture experiments is presented, which shows the conditional likelihood to be a member of the generalized linear model (GLM) class, so there is the potential to apply the full range of GLM methodologies.
Abstract: We present a modern perspective of the conditional likelihood approach to the analysis of capture-recapture experiments, which shows the conditional likelihood to be a member of the generalized linear model (GLM) class. Hence, there is the potential to apply the full range of GLM methodologies. To put this method in context, we first review some approaches to capture-recapture experiments with heterogeneous capture probabilities in closed populations, covering parametric and non-parametric mixture models and the use of covariates. We then review in more detail the analysis of capture-recapture experiments when the capture probabilities depend on a covariate.
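
A minimal sketch of one such conditional likelihood, assuming a Huggins-style closed-population model in which capture probability depends on a covariate through a logit link and the likelihood conditions on at least one capture; the data, names, and parameter values below are illustrative, not from the paper.

```python
# Conditional likelihood for capture-recapture with T occasions, fitted only
# to the animals that were captured at least once.
import numpy as np
from scipy.optimize import minimize

def neg_cond_loglik(beta, y, z, T):
    """y: captures per observed animal (n,); z: covariate (n,).
    The binomial coefficient is constant in beta and is omitted."""
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * z)))
    ll = y * np.log(p) + (T - y) * np.log1p(-p) - np.log1p(-(1 - p) ** T)
    return -ll.sum()

rng = np.random.default_rng(3)
T, z_all = 5, rng.normal(size=500)
p_all = 1 / (1 + np.exp(-(-1.0 + 0.8 * z_all)))
y_all = rng.binomial(T, p_all)
seen = y_all > 0                                  # only captured animals are observed
fit = minimize(neg_cond_loglik, x0=[0.0, 0.0], args=(y_all[seen], z_all[seen], T))
print(fit.x)                                      # roughly (-1.0, 0.8)
```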

Journal ArticleDOI
TL;DR: In this article, generalized dynamic models for time series of count data are reviewed, including zero-inflated versions in which a parameter gives the probability that the disease is present given that no cases were observed; the mixture models also carry a parameter for each time which captures possible extra-variation present in the data.
Abstract: We review generalized dynamic models for time series of count data. Usually temporal counts are modelled as following a Poisson distribution, and a transformation of the mean depends on parameters which evolve smoothly with time. We generalize the usual dynamic Poisson model by considering continuous mixtures of the Poisson distribution. We consider Poisson-gamma and Poisson-log-normal mixture models. These models have a parameter for each time t which captures possible extra-variation present in the data. If the time interval between observations is short, many observed zeros might result. We also propose zero-inflated versions of the models mentioned above. In epidemiology, when a count is equal to zero, one does not know if the disease is present or not. Our model has a parameter which provides the probability of presence of the disease given that no cases were observed. We rely on the Bayesian paradigm to obtain estimates of the parameters of interest, and discuss numerical methods to obtain samples from the resultant posterior distribution. We fit the proposed models to artificial data sets and also to a weekly time series of the registered number of cases of dengue fever in a district of the city of Rio de Janeiro, Brazil, during 2001 and 2002.
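
As a static skeleton of these models, the sketch below evaluates a zero-inflated Poisson-gamma (negative binomial) log-likelihood; it is illustrative only, with assumed parameter names and toy data, and is not the authors' Bayesian implementation.

```python
# Zero-inflated negative binomial (Poisson-gamma mixture) log-likelihood.
import numpy as np
from scipy.stats import nbinom

def zinb_loglik(y, mu, k, pi):
    """y: counts; mu: NB mean; k: NB dispersion; pi: extra-zero probability."""
    p = k / (k + mu)                     # scipy's NB success parameter gives mean mu
    base = nbinom.pmf(y, k, p)
    lik = np.where(y == 0, pi + (1 - pi) * base, (1 - pi) * base)
    return np.log(lik).sum()

y = np.array([0, 0, 3, 1, 0, 7, 0, 2])   # toy weekly counts
print(zinb_loglik(y, mu=2.0, k=1.5, pi=0.3))
```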



Journal ArticleDOI
TL;DR: In this article, three alternative approaches to modelling survey non-contact and refusal are reviewed: multinomial, sequential, and sample selection (bivariate probit) models. These are compared in an analysis of household non-response in the United Kingdom, using a data set with unusually rich information on both respondents and non-respondents from six major surveys.
Abstract: We review three alternative approaches to modelling survey non-contact and refusal: multinomial, sequential, and sample selection (bivariate probit) models. We then propose a multilevel extension of the sample selection model to allow for both interviewer effects and dependency between non-contact and refusal rates at the household and interviewer level. All methods are applied and compared in an analysis of household non-response in the United Kingdom, using a data set with unusually rich information on both respondents and non-respondents from six major surveys. After controlling for household characteristics, there is little evidence of residual correlation between the unobserved characteristics affecting non-contact and refusal propensities at either the household or the interviewer level. We also find that the estimated coefficients of the multinomial and sequential models are surprisingly similar, which further investigation via a simulation study suggests is due to non-contact and refusal having largely different predictors.
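
A single-level sketch of the sample selection (bivariate probit) likelihood for non-contact and refusal, without the paper's multilevel extension; the outcome coding, variable names, and latent-variable setup are assumptions for illustration.

```python
# Latent model: contact if c* = X a + u >= 0, response (given contact) if
# r* = X b + v >= 0, with (u, v) standard bivariate normal, correlation rho.
import numpy as np
from scipy.stats import norm, multivariate_normal

def sel_loglik(theta, X, outcome):
    """outcome: 0 = non-contact, 1 = contact and refusal, 2 = contact and response."""
    k = X.shape[1]
    a, b = theta[:k], theta[k:2 * k]
    rho = np.tanh(theta[-1])                     # keeps |rho| < 1
    xa, xb = X @ a, X @ b
    ll = 0.0
    for i, out in enumerate(outcome):
        if out == 0:                             # c* < 0
            ll += norm.logcdf(-xa[i])
        elif out == 1:                           # c* >= 0 and r* < 0
            ll += np.log(multivariate_normal.cdf(
                [xa[i], -xb[i]], mean=[0, 0], cov=[[1, -rho], [-rho, 1]]))
        else:                                    # c* >= 0 and r* >= 0
            ll += np.log(multivariate_normal.cdf(
                [xa[i], xb[i]], mean=[0, 0], cov=[[1, rho], [rho, 1]]))
    return ll
```

Passing the negative of this function to scipy.optimize.minimize would give maximum likelihood estimates; the paper's extension additionally allows interviewer effects and household/interviewer-level dependence between the two processes.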

Journal ArticleDOI
TL;DR: In this article, the authors estimate a large-scale mixed-frequency dynamic factor model for economic activity in the euro area.
Abstract: The paper estimates a large-scale mixed-frequency dynamic factor model for the euro area, using monthly series along with gross domestic product (GDP) and its main components, obtained from the quarterly national accounts (NA). The latter define broad measures of real economic activity (such as GDP and its decomposition by expenditure type and by branch of activity) that we are willing to include in the factor model, in order to improve its coverage of the economy and thus the representativeness of the factors. The main problem with their inclusion is not one of model consistency, but rather of data availability and timeliness, as the NA series are quarterly and are available with a large publication lag. Our model is a traditional dynamic factor model formulated at the monthly frequency in terms of the stationary representation of the variables, which however becomes nonlinear when the observational constraints are taken into account. These are of two kinds: nonlinear temporal aggregation constraints, due to the fact that the model is formulated in terms of the unobserved monthly logarithmic changes, but we observe only the sum of the monthly levels within a quarter, and nonlinear cross-sectional constraints, since GDP and its main components are linked by the NA identities, but the series are expressed in chained volumes. The paper provides an exact treatment of the observational constraints and proposes iterative algorithms for estimating the parameters of the factor model and for signal extraction, thereby producing nowcasts of monthly GDP and its main components, as well as measures of their reliability.
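
The nonlinear temporal aggregation constraint can be stated in a few lines, with assumed numbers: the model's variables are unobserved monthly log-changes, while the national accounts report only the quarterly sum of the implied monthly levels.

```python
# The observed quarterly figure is a nonlinear function of the monthly
# log-changes: levels are exponentials of cumulated changes, then summed.
import numpy as np

d = np.array([0.01, -0.02, 0.015])       # monthly log-changes within a quarter
level0 = 100.0                           # level in the last month of prior quarter
levels = level0 * np.exp(np.cumsum(d))   # implied monthly levels
quarterly_obs = levels.sum()             # what the national accounts report
print(levels, quarterly_obs)
```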


Journal ArticleDOI
TL;DR: A Treatise on Probability, published by John Maynard Keynes in 1921, contains a critical assessment of the foundations of probability and of the statistical methodology of its time, including a critique of the Bayesian approach.
Abstract: The book A Treatise on Probability was published by John Maynard Keynes in 1921. It contains a critical assessment of the foundations of probability and of the statistical methodology of the time. As modern readers, we review here the aspects that are most related to statistics, avoiding a neophyte's perspective on the philosophical issues. In particular, the book is quite critical of the Bayesian approach, and we examine the arguments provided by Keynes as well as the alternative he proposes. This review does not subsume the scholarly study of Aldrich (2008a) relating Keynes to the statistics community of the time.

Journal ArticleDOI
TL;DR: In this paper, a bias reduction indicator is proposed and expressed as a product of three factors reflecting familiar statistical ideas, which provide a useful perspective on the components that constitute non-response bias in estimates.
Abstract: Non-response causes bias in survey estimates. The unknown bias can be reduced, for example, as in this paper, by the use of a calibration estimator built on powerful auxiliary information. Still, some bias will always remain. A bias reduction indicator is proposed and expressed as a product of three factors reflecting familiar statistical ideas. These factors provide a useful perspective on the components that constitute non-response bias in estimates. To illustrate the indicator, we focus on the important case with information defined by one or more categorical auxiliary variables, each expressed by two or more properties or traits. Together, the auxiliary variables may represent a large number of traits, more or less important for bias reduction. An examination of the three factors of the bias reduction indicator brings the insight that the ultimate auxiliary vector for calibration need not or should not contain all available traits; some are unimportant or detrimental to bias reduction. The question becomes one of selection of traits, not of complete auxiliary variables. Empirical examples are given, and a stepwise procedure for selecting important traits is proposed.

Journal ArticleDOI
TL;DR: The African Statistical Development Index (ASDI) as mentioned in this paper is a composite index that aims at supporting the monitoring and evaluation of the implementation of the Reference Regional Strategic Framework for Statistical Capacity Building in Africa.
Abstract: This paper presents the development of the African Statistical Development Index, a composite index that aims at supporting the monitoring and evaluation of the implementation of the Reference Regional Strategic Framework for Statistical Capacity Building in Africa. It also helps to identify, for each African country, weaknesses and strengths of the National Statistical Systems so that support interventions can be developed. The paper first gives the rationale behind the development of the index as well as the context. It then elaborates on the methodology used to develop the index, including the selection of components and variables, the scaling of the variables, the weighting and aggregation schemes, and the validation process. The methodology is applied to a sample of African countries. Finally, the paper compares the proposed index to existing statistical capacity building indicators and highlights the related limitations.
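
A generic sketch of the composite-index recipe the paper describes (scaling, weighting, aggregation), with invented data and weights rather than the actual ASDI components.

```python
# Composite index: min-max scale each indicator to [0, 1], then take a
# weighted average across indicators for each country.
import numpy as np

def composite_index(X, weights):
    """X: countries x indicators matrix; weights: one weight per indicator."""
    scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    return scaled @ (weights / weights.sum())

X = np.array([[0.6, 120.0, 3.0],         # invented indicator values
              [0.9, 300.0, 8.0],
              [0.4,  80.0, 5.0]])
print(composite_index(X, weights=np.array([0.5, 0.3, 0.2])))
```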


Journal ArticleDOI
TL;DR: In this paper, the analysis of data from population-based case-control studies with appreciable non-response is discussed, and a class of estimating equations that are relatively easy to implement is developed.
Abstract: In this paper we discuss the analysis of data from population-based case-control studies when there is appreciable non-response. We develop a class of estimating equations that are relatively easy to implement. For some important special cases, we also provide efficient semi-parametric maximum-likelihood methods. We compare the methods in a simulation study based on data from the Women's Cardiovascular Health Study discussed in Arbogast et al. (Estimating incidence rates from population-based case-control studies in the presence of non-respondents, Biometrical Journal 44, 227–239, 2002).
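
One simple member of such a class is an inverse-probability-of-response weighted logistic score equation, sketched below under the simplifying assumption that the response probabilities are known; this is illustrative only, not the authors' estimator, and all data are simulated.

```python
# Weighted logistic score: solve sum_i w_i x_i (y_i - expit(x_i' beta)) = 0
# over respondents, with weights w_i = 1 / pi_i (inverse response probability).
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def weighted_logistic(X, y, w, iters=25):
    """Newton-Raphson on the weighted logistic score equations."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = expit(X @ beta)
        score = X.T @ (w * (y - mu))
        info = (X * (w * mu * (1 - mu))[:, None]).T @ X
        beta += np.linalg.solve(info, score)
    return beta

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(400), rng.normal(size=400)])
y = rng.binomial(1, expit(X @ np.array([-0.5, 1.0])))
pi = np.where(y == 1, 0.9, 0.6)          # response probability by case status
resp = rng.binomial(1, pi).astype(bool)  # who actually responds
print(weighted_logistic(X[resp], y[resp], 1.0 / pi[resp]))  # near (-0.5, 1.0)
```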