
Showing papers in "International Statistical Review in 2011"


Journal ArticleDOI
TL;DR: In this paper, the authors review statistical methods which combine hidden Markov models (HMMs) and random effects models in a longitudinal setting, leading to the class of so-called mixed HMMs.
Abstract: In this paper we review statistical methods which combine hidden Markov models (HMMs) and random effects models in a longitudinal setting, leading to the class of so-called mixed HMMs. This class of models has several interesting features. It deals with the dependence of a response variable on covariates, serial dependence, and unobserved heterogeneity in an HMM framework. It exploits the properties of HMMs, such as the relatively simple dependence structure and the efficient computational procedure, and allows one to handle a variety of real-world time-dependent data. We give details of the Expectation-Maximization algorithm for computing the maximum likelihood estimates of model parameters and we illustrate the method with two real applications describing the relationship between patent counts and research and development expenditures, and between stock and market returns via the Capital Asset Pricing Model.

106 citations
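
For readers who want the mechanics, here is a minimal sketch (not the authors' mixed-HMM code) of the scaled forward-algorithm log-likelihood for a plain two-state Poisson HMM, the building block that mixed HMMs extend with random effects before EM estimation; all parameter names and values are illustrative assumptions.

```python
# Minimal sketch: log-likelihood of a K-state Poisson hidden Markov model
# computed with the scaled forward algorithm.
import numpy as np
from scipy.stats import poisson

def hmm_loglik(counts, lam, gamma, delta):
    """counts: observed count series; lam: per-state Poisson means (K,);
    gamma: K x K transition matrix; delta: initial state distribution (K,)."""
    alpha = delta * poisson.pmf(counts[0], lam)  # forward probabilities at t = 0
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()                  # rescale to avoid underflow
    for y in counts[1:]:
        alpha = (alpha @ gamma) * poisson.pmf(y, lam)
        ll += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return ll

y = np.array([0, 2, 1, 5, 7, 1, 0, 3])           # toy data
print(hmm_loglik(y,
                 lam=np.array([1.0, 5.0]),
                 gamma=np.array([[0.9, 0.1], [0.2, 0.8]]),
                 delta=np.array([0.5, 0.5])))
```

The mixed HMMs reviewed in the paper add subject-specific random effects to a likelihood of this form and estimate the parameters by the EM algorithm.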



Journal ArticleDOI
TL;DR: This paper relates the survey calibration estimators to the semiparametric incomplete-data estimators of Robins and coworkers, and to adjustment for baseline variables in a randomized trial.
Abstract: Summary Survey calibration (or generalized raking) estimators are a standard approach to the use of auxiliary information in survey sampling, improving on the simple Horvitz–Thompson estimator. In this paper we relate the survey calibration estimators to the semiparametric incomplete-data estimators of Robins and coworkers, and to adjustment for baseline variables in a randomized trial. The development based on calibration estimators explains the “estimated weights” paradox and provides useful heuristics for constructing practical estimators. We present some examples of using calibration to gain precision without making additional modelling assumptions in a variety of regression models.

100 citations
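
A minimal sketch of one standard member of the calibration family, the linear (chi-square distance) calibration estimator; the names d, X, and totals are assumptions for illustration, and this is not the paper's estimators.

```python
# Linear calibration: adjust design weights d so the weighted sample totals
# of the auxiliaries X exactly match known population totals.
import numpy as np

def calibrate(d, X, totals):
    """d: design weights (n,); X: auxiliary variables (n, p);
    totals: known population totals of X (p,).
    Returns w = d * (1 + X @ lam) satisfying w @ X == totals."""
    M = (d[:, None] * X).T @ X                 # sum_i d_i x_i x_i'
    lam = np.linalg.solve(M, totals - d @ X)
    return d * (1.0 + X @ lam)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(2.0, 1.0, 50)])
d = np.full(50, 20.0)                          # Horvitz-Thompson weights
w = calibrate(d, X, totals=np.array([1000.0, 2100.0]))
print(w @ X)                                   # reproduces the known totals
```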


Journal ArticleDOI
TL;DR: The synthetic Longitudinal Business Database, created by simulating establishment records from statistical models fitted to the confidential Longitudinal Business Database, is the first-ever business microdata set publicly released in the United States; its release was approved by the U.S. Bureau of the Census and the Internal Revenue Service.
Abstract: In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.

96 citations
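
The core synthesis idea (replace a sensitive variable by draws from a model fitted to the confidential data) can be shown in a few lines. This toy sketch is purely hypothetical and is not the procedure used for the synthetic Longitudinal Business Database; all variable names and numbers are invented.

```python
# Toy synthesis: fit log(payroll) ~ log(employment) on "confidential" data,
# then release model-simulated payroll instead of the real values.
import numpy as np

rng = np.random.default_rng(1)
employment = rng.poisson(30, 1000) + 1           # pretend confidential data
payroll = np.exp(1.0 + 0.9 * np.log(employment) + rng.normal(0, 0.3, 1000))

A = np.column_stack([np.ones_like(employment, dtype=float), np.log(employment)])
beta, *_ = np.linalg.lstsq(A, np.log(payroll), rcond=None)
resid_sd = np.std(np.log(payroll) - A @ beta)

# synthetic values mimic the fitted distribution, not the real records
synthetic_payroll = np.exp(A @ beta + rng.normal(0, resid_sd, 1000))
```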


Journal ArticleDOI
TL;DR: Probabilistic and graphical rules for detecting situations in which a dependence of one variable on another is altered by adjusting for a third variable are considered, whether that dependence is causal or purely predictive.
Abstract: We consider probabilistic and graphical rules for detecting situations in which a dependence of one variable on another is altered by adjusting for a third variable (i.e., noncollapsibility), whether that dependence is causal or purely predictive. We focus on distinguishing situations in which adjustment will reduce, increase, or leave unchanged the degree of bias in an association of two variables when that association is taken to represent a causal effect of one variable on the other. We then consider situations in which adjustment may partially remove or introduce a potential source of bias in estimating causal effects, and some additional special cases useful for case-control studies, cohort studies with loss, and trials with noncompliance (nonadherence).

87 citations
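
A worked numeric example of noncollapsibility, with assumed risks: the odds ratio equals 4 in both strata of a covariate Z, yet the marginal odds ratio differs even though treatment is independent of Z, so the change under adjustment is not due to confounding.

```python
# Noncollapsibility of the odds ratio without confounding (assumed numbers).
def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# risk of the outcome under treated/untreated in two equally sized strata of Z
strata = [(0.8, 0.5), (0.5, 0.2)]
for p1, p0 in strata:
    print("stratum OR:", odds_ratio(p1, p0))             # 4.0 in both strata

p1_marg = sum(p for p, _ in strata) / 2                   # 0.65
p0_marg = sum(p for _, p in strata) / 2                   # 0.35
print("marginal OR:", round(odds_ratio(p1_marg, p0_marg), 2))  # about 3.45, not 4
```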


Journal ArticleDOI
TL;DR: In this paper, the authors show that fundamentalness of the structural shocks is always guaranteed in the dynamic factor model framework, and they use structural dynamic factor models to analyze the relationship between hours worked and technology shocks, comparing the results with traditional VAR findings.
Abstract: Economic theory typically assumes the existence of a few unobserved, unpredictable stochastic disturbances, called structural shocks, driving the whole economy. Were the economy representable as a very high dimensional stochastic vector process, those shocks would be the reduced-rank innovation of that process. In practice, however, only a few components of that process are observable, and their innovation, in general, does not coincide with the structural shocks. As a result, an MA representation of the observed process in terms of the structural shocks will be a noninvertible one. In other words, the structural shocks driving the economy do not belong to the past of the observed series, but also involve their future, i.e. they are nonfundamental for the observed process. It follows that the present values of those structural shocks cannot be recovered from the observations; in particular, fitting causal VARs to the observed series can be extremely misleading. We review the economic literature on VARs and provide many examples of small-scale economic models that imply nonfundamentalness. If fundamental shocks nevertheless are to be recovered, the only solution consists in enlarging the space of observations, that is, in considering a very large panel of related time series. Among several alternatives, we consider the dynamic factor model methodology, which requires very few additional assumptions. We first review its economic interpretation and then show that fundamentalness of the shocks is always guaranteed in this framework. Finally, by means of structural dynamic factor models, we provide new empirical evidence on the relation between hours worked and technology shocks and compare it with traditional VAR results.

84 citations
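
The nonfundamentalness point can be seen in a toy simulation, assuming a noninvertible MA(1): the innovations of a long causal AR fitted to the observed series correlate only imperfectly (about 0.5 here) with the true structural shocks, so the shocks are not recoverable from past observations.

```python
# x_t = e_t + 2 e_{t-1} has its MA root inside the unit circle, so the true
# shocks e_t are nonfundamental for x; causal-AR innovations differ from them.
import numpy as np

rng = np.random.default_rng(2)
e = rng.normal(size=10_000)
x = e[1:] + 2.0 * e[:-1]                       # noninvertible MA(1)

p = 30                                         # long causal AR approximation
Y = x[p:]
Z = np.column_stack([x[p - k:-k] for k in range(1, p + 1)])
phi, *_ = np.linalg.lstsq(Z, Y, rcond=None)
innov = Y - Z @ phi                            # the causal "shocks"
print(np.corrcoef(innov, e[p + 1:])[0, 1])     # well below 1
```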

Journal ArticleDOI
TL;DR: Risk-utility formulations for problems of statistical disclosure limitation are now common; the authors argue that these approaches are powerful guides to official statistics agencies in thinking about disclosure limitation problems, but that they fall short of providing a sound basis for acting upon them.
Abstract: Risk-utility formulations for problems of statistical disclosure limitation are now common. We argue that these approaches are powerful guides to official statistics agencies in regard to how to think about disclosure limitation problems, but that they fall short in essential ways of providing a sound basis for acting upon the problems. We illustrate this position in three specific contexts (transparency, tabular data, and survey weights), with shorter consideration of two key emerging issues: longitudinal data and the use of administrative data to augment surveys.

37 citations



Journal ArticleDOI
TL;DR: In this article, a modern perspective of the conditional likelihood approach to the analysis of capture-recapture experiments is presented, which shows the conditional likelihood to be a member of the generalized linear model (GLM) class, so there is the potential to apply the full range of GLM methodologies.
Abstract: We present a modern perspective of the conditional likelihood approach to the analysis of capture-recapture experiments, which shows the conditional likelihood to be a member of the generalized linear model (GLM) class. Hence, there is the potential to apply the full range of GLM methodologies. To put this method in context, we first review some approaches to capture-recapture experiments with heterogeneous capture probabilities in closed populations, covering parametric and non-parametric mixture models and the use of covariates. We then review in more detail the analysis of capture-recapture experiments when the capture probabilities depend on a covariate.
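
A minimal sketch of one such conditional likelihood, assuming a Huggins-style closed-population model in which capture probability depends on a covariate through a logit link and the likelihood conditions on at least one capture; the data, names, and parameter values below are illustrative, not from the paper.

```python
# Conditional likelihood for capture-recapture with T occasions, fitted only
# to the animals that were captured at least once.
import numpy as np
from scipy.optimize import minimize

def neg_cond_loglik(beta, y, z, T):
    """y: captures per observed animal (n,); z: covariate (n,).
    The binomial coefficient is constant in beta and is omitted."""
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * z)))
    ll = y * np.log(p) + (T - y) * np.log1p(-p) - np.log1p(-(1 - p) ** T)
    return -ll.sum()

rng = np.random.default_rng(3)
T, z_all = 5, rng.normal(size=500)
p_all = 1 / (1 + np.exp(-(-1.0 + 0.8 * z_all)))
y_all = rng.binomial(T, p_all)
seen = y_all > 0                                  # only captured animals are observed
fit = minimize(neg_cond_loglik, x0=[0.0, 0.0], args=(y_all[seen], z_all[seen], T))
print(fit.x)                                      # roughly (-1.0, 0.8)
```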

Journal ArticleDOI
TL;DR: In this article, generalized dynamic models for time series of count data are reviewed, including zero-inflated versions in which a parameter gives the probability that the disease is present given that no cases were observed; the mixture models also carry a parameter for each time which captures possible extra-variation present in the data.
Abstract: We review generalized dynamic models for time series of count data. Usually temporal counts are modelled as following a Poisson distribution, and a transformation of the mean depends on parameters which evolve smoothly with time. We generalize the usual dynamic Poisson model by considering continuous mixtures of the Poisson distribution. We consider Poisson-gamma and Poisson-log-normal mixture models. These models have a parameter for each time t which captures possible extra-variation present in the data. If the time interval between observations is short, many observed zeros might result. We also propose zero-inflated versions of the models mentioned above. In epidemiology, when a count is equal to zero, one does not know if the disease is present or not. Our model has a parameter which provides the probability of presence of the disease given that no cases were observed. We rely on the Bayesian paradigm to obtain estimates of the parameters of interest, and discuss numerical methods to obtain samples from the resultant posterior distribution. We fit the proposed models to artificial data sets and also to a weekly time series of the registered number of cases of dengue fever in a district of the city of Rio de Janeiro, Brazil, during 2001 and 2002.
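
As a static skeleton of these models, the sketch below evaluates a zero-inflated Poisson-gamma (negative binomial) log-likelihood; it is illustrative only, with assumed parameter names and toy data, and is not the authors' Bayesian implementation.

```python
# Zero-inflated negative binomial (Poisson-gamma mixture) log-likelihood.
import numpy as np
from scipy.stats import nbinom

def zinb_loglik(y, mu, k, pi):
    """y: counts; mu: NB mean; k: NB dispersion; pi: extra-zero probability."""
    p = k / (k + mu)                     # scipy's NB success parameter gives mean mu
    base = nbinom.pmf(y, k, p)
    lik = np.where(y == 0, pi + (1 - pi) * base, (1 - pi) * base)
    return np.log(lik).sum()

y = np.array([0, 0, 3, 1, 0, 7, 0, 2])   # toy weekly counts
print(zinb_loglik(y, mu=2.0, k=1.5, pi=0.3))
```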



Journal ArticleDOI
TL;DR: In this article, three alternative approaches to modelling survey non-contact and refusal are reviewed: multinomial, sequential, and sample selection (bivariate probit) models. These are compared in an analysis of household non-response in the United Kingdom, using a data set with unusually rich information on both respondents and non-respondents from six major surveys.
Abstract: We review three alternative approaches to modelling survey non-contact and refusal: multinomial, sequential, and sample selection (bivariate probit) models. We then propose a multilevel extension of the sample selection model to allow for both interviewer effects and dependency between non-contact and refusal rates at the household and interviewer level. All methods are applied and compared in an analysis of household non-response in the United Kingdom, using a data set with unusually rich information on both respondents and non-respondents from six major surveys. After controlling for household characteristics, there is little evidence of residual correlation between the unobserved characteristics affecting non-contact and refusal propensities at either the household or the interviewer level. We also find that the estimated coefficients of the multinomial and sequential models are surprisingly similar, which further investigation via a simulation study suggests is due to non-contact and refusal having largely different predictors.
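
A single-level sketch of the sample selection (bivariate probit) likelihood for non-contact and refusal, without the paper's multilevel extension; the outcome coding, variable names, and latent-variable setup are assumptions for illustration.

```python
# Latent model: contact if c* = X a + u >= 0, response (given contact) if
# r* = X b + v >= 0, with (u, v) standard bivariate normal, correlation rho.
import numpy as np
from scipy.stats import norm, multivariate_normal

def sel_loglik(theta, X, outcome):
    """outcome: 0 = non-contact, 1 = contact and refusal, 2 = contact and response."""
    k = X.shape[1]
    a, b = theta[:k], theta[k:2 * k]
    rho = np.tanh(theta[-1])                     # keeps |rho| < 1
    xa, xb = X @ a, X @ b
    ll = 0.0
    for i, out in enumerate(outcome):
        if out == 0:                             # c* < 0
            ll += norm.logcdf(-xa[i])
        elif out == 1:                           # c* >= 0 and r* < 0
            ll += np.log(multivariate_normal.cdf(
                [xa[i], -xb[i]], mean=[0, 0], cov=[[1, -rho], [-rho, 1]]))
        else:                                    # c* >= 0 and r* >= 0
            ll += np.log(multivariate_normal.cdf(
                [xa[i], xb[i]], mean=[0, 0], cov=[[1, rho], [rho, 1]]))
    return ll
```

Passing the negative of this function to scipy.optimize.minimize would give maximum likelihood estimates; the paper's extension additionally allows interviewer effects and household/interviewer-level dependence between the two processes.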

Journal ArticleDOI
TL;DR: In this article, the authors estimate a large-scale mixed-frequency dynamic factor model for economic activity in the euro area.
Abstract: The paper estimates a large-scale mixed-frequency dynamic factor model for the euro area, using monthly series along with gross domestic product (GDP) and its main components, obtained from the quarterly national accounts (NA). The latter define broad measures of real economic activity (such as GDP and its decomposition by expenditure type and by branch of activity) that we are willing to include in the factor model, in order to improve its coverage of the economy and thus the representativeness of the factors. The main problem with their inclusion is not one of model consistency, but rather of data availability and timeliness, as the NA series are quarterly and are available with a large publication lag. Our model is a traditional dynamic factor model formulated at the monthly frequency in terms of the stationary representation of the variables, which however becomes nonlinear when the observational constraints are taken into account. These are of two kinds: nonlinear temporal aggregation constraints, due to the fact that the model is formulated in terms of the unobserved monthly logarithmic changes, but we observe only the sum of the monthly levels within a quarter, and nonlinear cross-sectional constraints, since GDP and its main components are linked by the NA identities, but the series are expressed in chained volumes. The paper provides an exact treatment of the observational constraints and proposes iterative algorithms for estimating the parameters of the factor model and for signal extraction, thereby producing nowcasts of monthly GDP and its main components, as well as measures of their reliability.
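
The nonlinear temporal aggregation constraint can be stated in a few lines, with assumed numbers: the model's variables are unobserved monthly log-changes, while the national accounts report only the quarterly sum of the implied monthly levels.

```python
# The observed quarterly figure is a nonlinear function of the monthly
# log-changes: levels are exponentials of cumulated changes, then summed.
import numpy as np

d = np.array([0.01, -0.02, 0.015])       # monthly log-changes within a quarter
level0 = 100.0                           # level in the last month of prior quarter
levels = level0 * np.exp(np.cumsum(d))   # implied monthly levels
quarterly_obs = levels.sum()             # what the national accounts report
print(levels, quarterly_obs)
```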


Journal ArticleDOI
TL;DR: A Treatise on Probability, published by John Maynard Keynes in 1921, contains a critical assessment of the foundations of probability and of the statistical methodology of its time, including a critique of the Bayesian approach.
Abstract: The book A Treatise on Probability was published by John Maynard Keynes in 1921. It contains a critical assessment of the foundations of probability and of the statistical methodology of the time. As modern readers, we review here the aspects that are most related to statistics, avoiding a neophyte's perspective on the philosophical issues. In particular, the book is quite critical of the Bayesian approach, and we examine the arguments provided by Keynes as well as the alternative he proposes. This review does not subsume the scholarly study of Aldrich (2008a) relating Keynes to the statistics community of the time.

Journal ArticleDOI
TL;DR: In this paper, a bias reduction indicator is proposed and expressed as a product of three factors reflecting familiar statistical ideas, which provide a useful perspective on the components that constitute non-response bias in estimates.
Abstract: Non-response causes bias in survey estimates. The unknown bias can be reduced, for example, as in this paper, by the use of a calibration estimator built on powerful auxiliary information. Still, some bias will always remain. A bias reduction indicator is proposed and expressed as a product of three factors reflecting familiar statistical ideas. These factors provide a useful perspective on the components that constitute non-response bias in estimates. To illustrate the indicator, we focus on the important case with information defined by one or more categorical auxiliary variables, each expressed by two or more properties or traits. Together, the auxiliary variables may represent a large number of traits, more or less important for bias reduction. An examination of the three factors of the bias reduction indicator brings the insight that the ultimate auxiliary vector for calibration need not or should not contain all available traits; some are unimportant or detrimental to bias reduction. The question becomes one of selection of traits, not of complete auxiliary variables. Empirical examples are given, and a stepwise procedure for selecting important traits is proposed.

Journal ArticleDOI
TL;DR: The African Statistical Development Index (ASDI) as mentioned in this paper is a composite index that aims at supporting the monitoring and evaluation of the implementation of the Reference Regional Strategic Framework for Statistical Capacity Building in Africa.
Abstract: This paper presents the development of the African Statistical Development Index, a composite index that aims at supporting the monitoring and evaluation of the implementation of the Reference Regional Strategic Framework for Statistical Capacity Building in Africa. It also helps to identify, for each African country, weaknesses and strengths of the National Statistical Systems so that support interventions can be developed. The paper first gives the rationale behind the development of the index as well as the context. It then elaborates on the methodology used to develop the index, including the selection of components and variables, the scaling of the variables, the weighting and aggregation schemes, and the validation process. The methodology is applied to a sample of African countries. Finally, the paper compares the proposed index to existing statistical capacity building indicators and highlights the related limitations.
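
A generic sketch of the composite-index recipe the paper describes (scaling, weighting, aggregation), with invented data and weights rather than the actual ASDI components.

```python
# Composite index: min-max scale each indicator to [0, 1], then take a
# weighted average across indicators for each country.
import numpy as np

def composite_index(X, weights):
    """X: countries x indicators matrix; weights: one weight per indicator."""
    scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    return scaled @ (weights / weights.sum())

X = np.array([[0.6, 120.0, 3.0],         # invented indicator values
              [0.9, 300.0, 8.0],
              [0.4,  80.0, 5.0]])
print(composite_index(X, weights=np.array([0.5, 0.3, 0.2])))
```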


Journal ArticleDOI
TL;DR: In this paper, the analysis of data from population-based case-control studies with appreciable non-response is discussed, and a class of estimating equations that are relatively easy to implement is developed.
Abstract: In this paper we discuss the analysis of data from population-based case-control studies when there is appreciable non-response. We develop a class of estimating equations that are relatively easy to implement. For some important special cases, we also provide efficient semi-parametric maximum-likelihood methods. We compare the methods in a simulation study based on data from the Women's Cardiovascular Health Study discussed in Arbogast et al. (Estimating incidence rates from population-based case-control studies in the presence of non-respondents, Biometrical Journal 44, 227–239, 2002).
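
One simple member of such a class is an inverse-probability-of-response weighted logistic score equation, sketched below under the simplifying assumption that the response probabilities are known; this is illustrative only, not the authors' estimator, and all data are simulated.

```python
# Weighted logistic score: solve sum_i w_i x_i (y_i - expit(x_i' beta)) = 0
# over respondents, with weights w_i = 1 / pi_i (inverse response probability).
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def weighted_logistic(X, y, w, iters=25):
    """Newton-Raphson on the weighted logistic score equations."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = expit(X @ beta)
        score = X.T @ (w * (y - mu))
        info = (X * (w * mu * (1 - mu))[:, None]).T @ X
        beta += np.linalg.solve(info, score)
    return beta

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(400), rng.normal(size=400)])
y = rng.binomial(1, expit(X @ np.array([-0.5, 1.0])))
pi = np.where(y == 1, 0.9, 0.6)          # response probability by case status
resp = rng.binomial(1, pi).astype(bool)  # who actually responds
print(weighted_logistic(X[resp], y[resp], 1.0 / pi[resp]))  # near (-0.5, 1.0)
```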