
Showing papers in "Statistical Science in 2010"


Journal ArticleDOI
TL;DR: A structure for thinking about matching methods and guidance on their use is provided, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.
Abstract: When estimating causal effects using observational data, it is desirable to replicate a randomized experiment as closely as possible by obtaining treated and control groups with similar covariate distributions. This goal can often be achieved by choosing well-matched samples of the original treated and control groups, thereby reducing bias due to the covariates. Since the 1970's, work on matching methods has examined how to best choose treated and control subjects for comparison. Matching methods are gaining popularity in fields such as economics, epidemiology, medicine, and political science. However, until now the literature and related advice has been scattered across disciplines. Researchers who are interested in using matching methods-or developing methods related to matching-do not have a single place to turn to learn about past and current research. This paper provides a structure for thinking about matching methods and guidance on their use, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.
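
As a concrete illustration of the kind of procedure the paper surveys, here is a minimal sketch of 1:1 greedy nearest-neighbor matching on an estimated propensity score, one of the most common matching methods. It is not the paper's own code; the arrays X (covariates), t (0/1 treatment) and y (outcome) and the caliper value are hypothetical placeholders.

# Minimal sketch: 1:1 greedy nearest-neighbor matching on an estimated
# propensity score, without replacement. X, t and y are hypothetical arrays.
import numpy as np
from sklearn.linear_model import LogisticRegression

def nn_propensity_match(X, t, caliper=None):
    """Return matched (treated, control) index pairs and the propensity scores."""
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    treated = np.where(t == 1)[0]
    controls = list(np.where(t == 0)[0])
    pairs = []
    for i in treated:                                  # greedy matching
        if not controls:
            break
        d = np.abs(ps[controls] - ps[i])
        j = int(np.argmin(d))
        if caliper is None or d[j] <= caliper:
            pairs.append((i, controls.pop(j)))
    return pairs, ps

# Example use: estimate a difference in means on the matched sample.
# pairs, ps = nn_propensity_match(X, t, caliper=0.1)
# att = np.mean([y[i] - y[j] for i, j in pairs])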

3,952 citations


Journal ArticleDOI
TL;DR: The distinction between explanatory and predictive modeling is clarified in this paper, which discusses the sources of the distinction and its practical implications for each step of the modeling process.
Abstract: Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.

1,747 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that under a particular version of sequential ignorability assumption, the average causal mediation effect (ACME) is nonparametrically identified.
Abstract: Causal mediation analysis is routinely conducted by applied researchers in a variety of disciplines. The goal of such an analysis is to investigate alternative causal mechanisms by examining the roles of intermediate variables that lie in the causal paths between the treatment and outcome variables. In this paper we first prove that under a particular version of sequential ignorability assumption, the average causal mediation effect (ACME) is nonparametrically identified. We compare our identification assumption with those proposed in the literature. Some practical implications of our identification result are also discussed. In particular, the popular estimator based on the linear structural equation model (LSEM) can be interpreted as an ACME estimator once additional parametric assumptions are made. We show that these assumptions can easily be relaxed within and outside of the LSEM framework and propose simple nonparametric estimation strategies. Second, and perhaps most importantly, we propose a new sensitivity analysis that can be easily implemented by applied researchers within the LSEM framework. Like the existing identifying assumptions, the proposed sequential ignorability assumption may be too strong in many applied settings. Thus, sensitivity analysis is essential in order to examine the robustness of empirical findings to the possible existence of an unmeasured confounder. Finally, we apply the proposed methods to a randomized experiment from political psychology. We also make easy-to-use software available to implement the proposed methods.
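
For readers unfamiliar with the LSEM estimator mentioned above, the following sketch shows the familiar product-of-coefficients calculation that the paper reinterprets as an ACME estimator under sequential ignorability plus additional parametric assumptions. The arrays T, M and Y are hypothetical, and this illustrates the estimator only, not the paper's sensitivity analysis.

# Product-of-coefficients ACME estimator under the LSEM. Interpreting it
# causally requires the assumptions discussed in the abstract above.
import numpy as np
import statsmodels.api as sm

def acme_lsem(T, M, Y):
    mediator_fit = sm.OLS(M, sm.add_constant(T)).fit()                        # M ~ 1 + T
    outcome_fit = sm.OLS(Y, sm.add_constant(np.column_stack([T, M]))).fit()   # Y ~ 1 + T + M
    alpha = mediator_fit.params[1]   # effect of treatment on the mediator
    beta = outcome_fit.params[2]     # effect of the mediator on the outcome, holding T fixed
    return alpha * beta              # ACME estimate under the LSEM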

1,158 citations


Journal ArticleDOI
TL;DR: Particle learning (PL) as mentioned in this paper extends existing particle methods by incorporating the estimation of static parameters via a fully-adapted filter that utilizes conditional sufficient statistics for parameters and/or states as particles.
Abstract: Particle learning (PL) provides state filtering, sequential parameter learning and smoothing in a general class of state space models. Our approach extends existing particle methods by incorporating the estimation of static parameters via a fully-adapted filter that utilizes conditional sufficient statistics for parameters and/or states as particles. State smoothing in the presence of parameter uncertainty is also solved as a by-product of PL. In a number of examples, we show that PL outperforms existing particle filtering alternatives and proves to be a competitor to MCMC.

291 citations


Journal ArticleDOI
TL;DR: One of the conclusions is that all IV methods encounter problems in the presence of effect modification by unobserved confounders, and it is recommended that practical applications of IV estimators be evaluated routinely by a sensitivity analysis.
Abstract: Instrumental variable (IV) methods are becoming increasingly popular as they seem to offer the only viable way to overcome the problem of unobserved confounding in observational studies. However, some attention has to be paid to the details, as not all such methods target the same causal parameters and some rely on more restrictive parametric assumptions than others. We therefore discuss and contrast the most common IV approaches with relevance to typical applications in observational epidemiology. Further, we illustrate and compare the asymptotic bias of these IV estimators when underlying assumptions are violated in a numerical study. One of our conclusions is that all IV methods encounter problems in the presence of effect modification by unobserved confounders. Since this can never be ruled out for sure, we recommend that practical applications of IV estimators be accompanied routinely by a sensitivity analysis.
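
For orientation, a minimal sketch of the standard two-stage least squares (2SLS) estimator, one IV approach of the kind contrasted in the paper, is given below. The arrays Z (instrument), X (exposure) and Y (outcome) are hypothetical; with a single instrument the estimate coincides with the Wald ratio cov(Z, Y) / cov(Z, X).

# Minimal two-stage least squares (2SLS) sketch for a single instrument.
import numpy as np

def tsls(Z, X, Y):
    Z1 = np.column_stack([np.ones_like(Z, dtype=float), Z])
    xhat = Z1 @ np.linalg.lstsq(Z1, X, rcond=None)[0]    # first stage: fitted exposure
    X1 = np.column_stack([np.ones_like(xhat), xhat])
    beta = np.linalg.lstsq(X1, Y, rcond=None)[0]         # second stage
    return beta[1]                                        # IV estimate of the exposure effect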

193 citations


Journal ArticleDOI
TL;DR: In an application on the association between black carbon particulate matter air pollution and birth weight, controlling for large-scale spatial variation appears to reduce bias from unmeasured confounders, while increasing uncertainty in the estimated pollution effect.
Abstract: Residuals in regression models are often spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results. When unmeasured confounding introduces spatial structure into the residuals, regression models with spatial random effects and closely-related models such as kriging and penalized splines are biased, even when the residual variance components are known. Analytic and simulation results show how the bias depends on the spatial scales of the covariate and the residual: one can reduce bias by fitting a spatial model only when there is variation in the covariate at a scale smaller than the scale of the unmeasured confounding. I also discuss how the scales of the residual and the covariate affect efficiency and uncertainty estimation when the residuals are independent of the covariate. In an application on the association between black carbon particulate matter air pollution and birth weight, controlling for large-scale spatial variation appears to reduce bias from unmeasured confounders, while increasing uncertainty in the estimated pollution effect.

187 citations


Journal ArticleDOI
TL;DR: In this paper, a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects is proposed, including conjugate random effects at the level of the mean and normal random effects embedded within the linear predictor.
Abstract: Non-Gaussian outcomes are often modeled using members of the so-called exponential family. Notorious members are the Bernoulli model for binary data, leading to logistic regression, and the Poisson model for count data, leading to Poisson regression. Two of the main reasons for extending this family are (1) the occurrence of overdispersion, meaning that the variability in the data is not adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of hierarchical structure in the data, stemming from clustering in the data which, in turn, may result from repeatedly measuring the outcome, for various members of the same family, etc. The first issue is dealt with through a variety of overdispersion models, such as, for example, the beta-binomial model for grouped binary data and the negative-binomial model for counts. Clustering is often accommodated through the inclusion of random subject-specific effects. Though not always, one conventionally assumes such random effects to be normally distributed. While both of these phenomena may occur simultaneously, models combining them are uncommon. This paper proposes a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. We place particular emphasis on so-called conjugate random effects at the level of the mean for the first aspect and normal random effects embedded within the linear predictor for the second aspect, even though our family is more general. The binary, count and time-to-event cases are given particular emphasis. Apart from model formulation, we present an overview of estimation methods, and then settle for maximum likelihood estimation with analytic-numerical integration. Implications for the derivation of marginal correlation functions are discussed. The methodology is applied to data from a study in epileptic seizures, a clinical trial in toenail infection named onychomycosis and survival data in children with asthma.
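
To make the combined-model idea concrete, the following small simulation generates clustered counts with both a conjugate (gamma) overdispersion term at the level of the mean and a normal random intercept in the linear predictor, in the spirit of the count case described above. The parameter values are illustrative only, not taken from the paper's examples.

# Simulate clustered counts from a combined model: Poisson kernel, mean-one
# gamma overdispersion, and a normal random intercept. Values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_obs = 50, 6
beta0, sigma_b, alpha = 1.0, 0.7, 2.0                         # fixed intercept, RE sd, gamma shape

b = rng.normal(0.0, sigma_b, n_subjects)                       # normal random effects (clustering)
theta = rng.gamma(alpha, 1.0 / alpha, (n_subjects, n_obs))     # mean-one gamma overdispersion
mu = theta * np.exp(beta0 + b[:, None])                        # combined conditional mean
y = rng.poisson(mu)                                            # counts show both sources of extra variation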

141 citations


Journal ArticleDOI
TL;DR: A reparameterisation of the MMPP which leads to a highly efficient RWM within Gibbs algorithm in certain circumstances is also developed.
Abstract: The random walk Metropolis (RWM) is one of the most common Markov Chain Monte Carlo algorithms in practical use today. Its theoretical properties have been extensively explored for certain classes of target, and a number of results with important practical implications have been derived. This article draws together a selection of new and existing key results and concepts and describes their implications. The impact of each new idea on algorithm efficiency is demonstrated for the practical example of the Markov modulated Poisson process (MMPP). A reparameterisation of the MMPP which leads to a highly efficient RWM within Gibbs algorithm in certain circumstances is also developed.
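
For reference, the basic RWM algorithm discussed in the article can be sketched in a few lines for an arbitrary log target density; the proposal scale sigma is the tuning constant whose choice the reviewed theory informs. This is a generic sketch, not the article's MMPP implementation.

# Generic random walk Metropolis for a user-supplied log target density.
import numpy as np

def rwm(log_target, x0, sigma, n_iter, rng=None):
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    lp = log_target(x)
    chain, accepted = [x.copy()], 0
    for _ in range(n_iter):
        prop = x + sigma * rng.standard_normal(x.shape)    # symmetric Gaussian proposal
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:           # Metropolis accept/reject
            x, lp = prop, lp_prop
            accepted += 1
        chain.append(x.copy())
    return np.array(chain), accepted / n_iter

# Example: chain, rate = rwm(lambda x: -0.5 * np.sum(x**2), np.zeros(5), 0.8, 10_000)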

116 citations


Journal ArticleDOI
TL;DR: The EM algorithm is a special case of a more general algorithm called the MM algorithm as mentioned in this paper, which is used to solve high-dimensional optimization and estimation problems, such as random graph models, discriminant analysis and image restoration.
Abstract: The EM algorithm is a special case of a more general algorithm called the MM algorithm. Specific MM algorithms often have nothing to do with missing data. The first M step of an MM algorithm creates a surrogate function that is optimized in the second M step. In minimization, MM stands for majorize–minimize; in maximization, it stands for minorize–maximize. This two-step process always drives the objective function in the right direction. Construction of MM algorithms relies on recognizing and manipulating inequalities rather than calculating conditional expectations. This survey walks the reader through the construction of several specific MM algorithms. The potential of the MM algorithm in solving high-dimensional optimization and estimation problems is its most attractive feature. Our applications to random graph models, discriminant analysis and image restoration showcase this ability.
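
A toy example may help fix the MM recipe described above. To minimize the nondifferentiable objective f(m) = sum_i |x_i - m| (whose minimizer is a median), each term can be majorized at the current iterate by a quadratic, and minimizing the surrogate gives a simple weighted-mean update. This example is for illustration only and is not one of the paper's applications.

# Toy majorize-minimize example for f(m) = sum_i |x_i - m|. Each |x_i - m| is
# majorized at m_k by (x_i - m)^2 / (2|x_i - m_k|) + |x_i - m_k|/2, so the
# surrogate is quadratic and its minimizer is a weighted mean.
import numpy as np

def mm_median(x, n_iter=100, eps=1e-10):
    m = float(np.mean(x))                              # starting value
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(x - m), eps)       # majorization weights
        m_new = np.sum(w * x) / np.sum(w)              # minimize the surrogate
        if abs(m_new - m) < 1e-12:
            break
        m = m_new
    return m

# Each step drives the objective downhill, the defining property of an MM algorithm.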

94 citations


Journal ArticleDOI
TL;DR: Aldous et al. as mentioned in this paper introduce and motivate a particular statistic R measuring shortness of routes in a network, and illustrate the trade-off between normalized network length and R in a one-parameter family of proximity graphs.
Abstract: We review mathematically tractable models for connected networks on random points in the plane, emphasizing the class of proximity graphs which deserves to be better known to applied probabilists and statisticians. We introduce and motivate a particular statistic R measuring shortness of routes in a network. We illustrate, via Monte Carlo in part, the trade-off between normalized network length and R in a one-parameter family of proximity graphs. How close this family comes to the optimal trade-off over all possible networks remains an intriguing open question. The paper is a write-up of a talk developed by the first author during 2007-2009.

74 citations


Journal ArticleDOI
TL;DR: Weak belief (WB) as discussed by the authors is an extension of the Dempster-Shafer (DS) theory for probabilistic reasoning based on a formal calculus for combining evidence.
Abstract: The Dempster-Shafer (DS) theory is a powerful tool for probabilistic reasoning based on a formal calculus for combining evidence. DS theory has been widely used in computer science and engineering applications, but has yet to reach the statistical mainstream, perhaps because the DS belief functions do not satisfy long-run frequency properties. Recently, two of the authors proposed an extension of DS, called the weak belief (WB) approach, that can incorporate desirable frequency properties into the DS framework by systematically enlarging the focal elements. The present paper reviews and extends this WB approach. We present a general description of WB in the context of inferential models, its interplay with the DS calculus, and the maximal belief solution. New applications of the WB method in two high-dimensional hypothesis testing problems are given. Simulations show that the WB procedures, suitably calibrated, perform well compared to popular classical methods. Most importantly, the WB approach combines the probabilistic reasoning of DS with the desirable frequency properties of classical statistics.

Journal ArticleDOI
TL;DR: This work considers situations where data have been collected such that the sampling depends on the outcome of interest and possibly further covariates, as for instance in case-control studies, and gives sufficient graphical conditions for testing and estimating the causal effect of exposure on outcome.
Abstract: We consider situations where data have been collected such that the sampling depends on the outcome of interest and possibly further covariates, as for instance in case-control studies. Graphical models represent assumptions about the conditional independencies among the variables. By including a node for the sampling indicator, assumptions about sampling processes can be made explicit. We demonstrate how to read off such graphs whether consistent estimation of the association between exposure and outcome is possible. Moreover, we give sufficient graphical conditions for testing and estimating the causal effect of exposure on outcome. The practical use is illustrated with a number of examples.

Journal ArticleDOI
TL;DR: This article is basically the text of a recent talk featuring some examples from current practice, with a little bit of futuristic speculation, where indirect evidence seems too important to ignore.
Abstract: Familiar statistical tests and estimates are obtained by the direct observation of cases of interest: a clinical trial of a new drug, for instance, will compare the drug’s effects on a relevant set of patients and controls. Sometimes, though, indirect evidence may be temptingly available, perhaps the results of previous trials on closely related drugs. Very roughly speaking, the difference between direct and indirect statistical evidence marks the boundary between frequentist and Bayesian thinking. Twentieth-century statistical practice focused heavily on direct evidence, on the grounds of superior objectivity. Now, however, new scientific devices such as microarrays routinely produce enormous data sets involving thousands of related situations, where indirect evidence seems too important to ignore. Empirical Bayes methodology offers an attractive direct/indirect compromise. There is already some evidence of a shift toward a less rigid standard of statistical objectivity that allows better use of indirect evidence. This article is basically the text of a recent talk featuring some examples from current practice, with a little bit of futuristic speculation.

Book ChapterDOI
TL;DR: Omitting an important predictor of toxicity when dose assignments to cancer patients are determined results in a high percent of patients experiencing severe side effects and a significant proportion treated at sub-optimal doses, as shown in the recently completed ABR-217620 (naptumomab estafenatox) trial.
Abstract: Traditionally, the major objective in phase I trials is to identify a working dose for subsequent studies, whereas the major endpoint in phase II and III trials is treatment efficacy. The dose sought is typically referred to as the maximum tolerated dose (MTD). Several statistical methodologies have been proposed to select the MTD in cancer phase I trials. In this manuscript, we focus on a Bayesian adaptive design, known as escalation with overdose control (EWOC). Several aspects of this design are discussed, including large sample properties of the sequence of doses selected in the trial, choice of prior distributions, and use of covariates. The methodology is exemplified with real-life examples of cancer phase I trials. In particular, we show in the recently completed ABR-217620 (naptumomab estafenatox) trial that omitting an important predictor of toxicity when dose assignments to cancer patients are determined results in a high percent of patients experiencing severe side effects and a significant proportion treated at sub-optimal doses.

Journal ArticleDOI
TL;DR: The basic approach, some connections to other methods, some generalizations, as well as further applications of the model are investigated and some new results which can provide guidance in practice are obtained.
Abstract: During the last twenty years there have been considerable methodological developments in the design and analysis of Phase 1, Phase 2 and Phase 1/2 dose-finding studies. Many of these developments are related to the continual reassessment method (CRM), first introduced by O’Quigley, Pepe and Fisher (1990). CRM models have proven themselves to be of practical use and, in this discussion, we investigate the basic approach, some connections to other methods, some generalizations, as well as further applications of the model. We obtain some new results which can provide guidance in practice.
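
As a concrete reference point, a bare-bones version of the CRM dose-assignment step can be written with the usual one-parameter power model: dose-level toxicity probabilities are the skeleton values raised to exp(a), the posterior of a is computed on a grid, and the next dose is the one whose posterior mean toxicity is closest to the target. The skeleton, normal prior standard deviation and target below are illustrative choices, not values from the paper.

# Bare-bones CRM dose assignment with a one-parameter power model.
import numpy as np

def crm_next_dose(skeleton, doses, tox, target=0.25, prior_sd=1.34):
    """doses: indices of doses given so far; tox: 0/1 toxicity outcomes."""
    a = np.linspace(-4, 4, 2001)                            # grid for the model parameter
    p = np.power.outer(np.asarray(skeleton), np.exp(a))     # p_j(a) = skeleton_j ** exp(a)
    lik = np.ones_like(a)
    for d, y in zip(doses, tox):                            # binomial likelihood of the data so far
        lik *= p[d] ** y * (1 - p[d]) ** (1 - y)
    post = lik * np.exp(-0.5 * (a / prior_sd) ** 2)         # unnormalized posterior on the grid
    post /= np.trapz(post, a)
    p_mean = np.trapz(p * post, a, axis=1)                  # posterior mean toxicity per dose
    return int(np.argmin(np.abs(p_mean - target)))          # dose closest to the target

# Example: crm_next_dose([0.05, 0.12, 0.25, 0.40], doses=[0, 1, 1], tox=[0, 0, 1])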

Journal ArticleDOI
TL;DR: In this paper, the authors discuss the potential of graphics processing units (GPUs) in high-dimensional optimization problems and demonstrate the utility of GPUs in nonnegative matrix factorization, PET image reconstruction, and multidimensional scaling.
Abstract: This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of data. These criteria favor EM and MM algorithms that separate parameters and data. To a lesser extent block relaxation and coordinate descent and ascent also qualify. We demonstrate the utility of GPUs in nonnegative matrix factorization, PET image reconstruction, and multidimensional scaling. Speedups of 100 fold can easily be attained. Over the next decade, GPUs will fundamentally alter the landscape of computational statistics. It is time for more statisticians to get on-board.
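
As an example of the "separated parameters" structure the paper singles out as GPU-friendly, the multiplicative NMF updates of Lee and Seung are sketched below in plain NumPy: every entry of the factors is updated by an elementwise formula, so the work maps naturally onto many parallel cores. The sketch is illustrative only and makes no claim about the paper's GPU code.

# Multiplicative NMF updates (an MM-type algorithm) for minimizing
# ||X - W H||_F^2 with nonnegative factors. Each entry of W and H is updated
# by an elementwise formula, which is why the method parallelizes so well.
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.uniform(size=(n, rank))
    H = rng.uniform(size=(rank, m))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)    # elementwise multiplicative update of H
        W *= (X @ H.T) / (W @ H @ H.T + eps)    # elementwise multiplicative update of W
    return W, H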

Journal ArticleDOI
TL;DR: The resulting design is a convex combination of a "treatment" design and a "learning" design, thus directly addressing the treatment versus experimentation dilemma inherent in Phase I trials and providing a simple and intuitive design for clinical use.
Abstract: Optimal design of a Phase I cancer trial can be formulated as a stochastic optimization problem. By making use of recent advances in approximate dynamic programming to tackle the problem, we develop an approximation of the Bayesian optimal design. The resulting design is a convex combination of a “treatment” design, such as Babb et al.’s (1998) escalation with overdose control, and a “learning” design, such as Haines et al.’s (2003) c-optimal design, thus directly addressing the treatment versus experimentation dilemma inherent in Phase I trials and providing a simple and intuitive design for clinical use. Computational details are given and the proposed design is compared to existing designs in a simulation study. The design can also be readily modified to include a first stage that cautiously escalates doses similarly to traditional nonparametric step-up/down schemes, while validating the Bayesian parametric model for the efficient model-based design in the second stage.

Journal ArticleDOI
TL;DR: In this paper, a variety of extensions and refinements have been developed for data augmentation based model fitting routines, such as the deterministic EM algorithm for mode finding and stochastic Gibbs sampler and other auxiliary-variable based methods for posterior sampling.
Abstract: In recent years, a variety of extensions and refinements have been developed for data augmentation based model fitting routines. These developments aim to extend the application, improve the speed and/or simplify the implementation of data augmentation methods, such as the deterministic EM algorithm for mode finding and stochastic Gibbs sampler and other auxiliary-variable based methods for posterior sampling. In this overview article we graphically illustrate and compare a number of these extensions, all of which aim to maintain the simplicity and computational stability of their predecessors. We particularly emphasize the usefulness of identifying similarities between the deterministic and stochastic counterparts as we seek more efficient computational strategies. We also demonstrate the applicability of data augmentation methods for handling complex models with highly hierarchical structure, using a high-energy high-resolution spectral imaging model for data from satellite telescopes, such as the Chandra X-ray Observatory.
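
One classic stochastic data augmentation scheme of the kind surveyed here is the Albert-Chib Gibbs sampler for probit regression, sketched below: latent normal variables are imputed given the coefficients, and the coefficients are then drawn from their conditional normal. A flat prior on the coefficients is assumed for simplicity; this is a generic textbook example, not the article's spectral imaging application.

# Albert-Chib data augmentation Gibbs sampler for probit regression,
# assuming a flat prior on beta. X is the design matrix, y is 0/1.
import numpy as np
from scipy.stats import truncnorm

def probit_da_gibbs(X, y, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)                 # z > 0 when y = 1
        hi = np.where(y == 1, np.inf, -mu)                  # z < 0 when y = 0
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)    # stochastic imputation of latent data
        beta_hat = XtX_inv @ (X.T @ z)
        beta = beta_hat + chol @ rng.standard_normal(p)     # conditional normal draw of beta
        draws[it] = beta
    return draws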

Journal ArticleDOI
TL;DR: In this article, the authors examine the development in the critical period 1980-1990, when the ideas of Markov chain simulation from the statistical physics literature and the latent variable formulation in maximum likelihood computation (i.e., EM algorithm) came together to spark the widespread application of MCMC methods in Bayesian computation.
Abstract: It was known from Metropolis et al. [J. Chem. Phys. 21 (1953) 1087–1092] that one can sample from a distribution by performing Monte Carlo simulation from a Markov chain whose equilibrium distribution is equal to the target distribution. However, it took several decades before the statistical community embraced Markov chain Monte Carlo (MCMC) as a general computational tool in Bayesian inference. The usual reasons that are advanced to explain why statisticians were slow to catch on to the method include lack of computing power and unfamiliarity with the early dynamic Monte Carlo papers in the statistical physics literature. We argue that there was a deeper reason, namely, that the structure of problems in statistical mechanics and those in the standard statistical literature are different. To make the methods usable in standard Bayesian problems, one had to exploit the power that comes from the introduction of judiciously chosen auxiliary variables and collective moves. This paper examines the development in the critical period 1980–1990, when the ideas of Markov chain simulation from the statistical physics literature and the latent variable formulation in maximum likelihood computation (i.e., EM algorithm) came together to spark the widespread application of MCMC methods in Bayesian computation.

Journal ArticleDOI
TL;DR: In this paper, a subclass of block-sequential models called block-conditional MAR (BCMAR) models is proposed, along with an associated block-monotone reduced likelihood strategy that typically yields consistent estimates by selectively discarding some data.
Abstract: Two major ideas in the analysis of missing data are (a) the EM algorithm [Dempster, Laird and Rubin, J. Roy. Statist. Soc. Ser. B 39 (1977) 1–38] for maximum likelihood (ML) estimation, and (b) the formulation of models for the joint distribution of the data Z and missing data indicators M, and associated “missing at random” (MAR) condition under which a model for M is unnecessary [Rubin, Biometrika 63 (1976) 581–592]. Most previous work has treated Z and M as single blocks, yielding selection or pattern-mixture models depending on how their joint distribution is factorized. This paper explores “block-sequential” models that interleave subsets of the variables and their missing data indicators, and then make parameter restrictions based on assumptions in each block. These include models that are not MAR. We examine a subclass of block-sequential models we call block-conditional MAR (BCMAR) models, and an associated block-monotone reduced likelihood strategy that typically yields consistent estimates by selectively discarding some data. Alternatively, full ML estimation can often be achieved via the EM algorithm. We examine in some detail BCMAR models for the case of two multinomially distributed categorical variables, and a two block structure where the first block is categorical and the second block arises from a (possibly multivariate) exponential family distribution.

Journal ArticleDOI
TL;DR: This paper will review several Bayesian early phase trial designs that were tailored to accommodate specific complexities of the treatment regime and patient outcomes in particular clinical settings.
Abstract: An early phase clinical trial is the first step in evaluating the effects in humans of a potential new anti-disease agent or combination of agents. Usually called “phase I” or “phase I/II” trials, these experiments typically have the nominal scientific goal of determining an acceptable dose, most often based on adverse event probabilities. This arose from a tradition of phase I trials to evaluate cytotoxic agents for treating cancer, although some methods may be applied in other medical settings, such as treatment of stroke or immunological diseases. Most modern statistical designs for early phase trials include model-based, outcome-adaptive decision rules that choose doses for successive patient cohorts based on data from previous patients in the trial. Such designs have seen limited use in clinical practice, however, due to their complexity, the requirement of intensive, computer-based data monitoring, and the medical community’s resistance to change. Still, many actual applications of model-based outcome-adaptive designs have been remarkably successful in terms of both patient benefit and scientific outcome. In this paper, I will review several Bayesian early phase trial designs that were tailored to accommodate specific complexities of the treatment regime and patient outcomes in particular clinical settings.

Journal ArticleDOI
TL;DR: First, the information principle, which is that the key to a good statistical method is not its underlying philosophy or mathematical reasoning, but rather what information the method allows us to use.
Abstract: First, the information principle, which is that the key to a good statistical method is not its underlying philosophy or mathematical reasoning, but rather what information the method allows us to use. Good methods make use of more information. This can come in different ways: in my own experience (following the lead of Efron and Morris, 1971, among others), hierarchical Bayes allows us to combine different data sources and weight them appropriately using partial pooling. Other statisticians find parametric Bayes too restrictive: in practice, parametric modeling typically comes down to conventional models such as the normal and

Journal ArticleDOI
TL;DR: In this paper, a two-groups mixed-effects model for the comparison of (normalized) microarray data from two treatment groups is considered, where the posterior odds of treatment × gene interactions, derived from the model, involve shrinkage estimates of both the interactions and of the gene specific error variances.
Abstract: A two-groups mixed-effects model for the comparison of (normalized) microarray data from two treatment groups is considered. Most competing parametric methods that have appeared in the literature are obtained as special cases or by minor modification of the proposed model. Approximate maximum likelihood fitting is accomplished via a fast and scalable algorithm, which we call LEMMA (Laplace approximated EM Microarray Analysis). The posterior odds of treatment × gene interactions, derived from the model, involve shrinkage estimates of both the interactions and of the gene specific error variances. Genes are classified as being associated with treatment based on the posterior odds and the local false discovery rate (f.d.r.) with a fixed cutoff. Our model-based approach also allows one to declare the non-null status of a gene by controlling the false discovery rate (FDR). It is shown in a detailed simulation study that the approach outperforms well-known competitors. We also apply the proposed methodology to two previously analyzed microarray examples. Extensions of the proposed method to paired treatments and multiple treatments are also discussed.

Journal ArticleDOI
TL;DR: In this article, the authors explore similarities and differences between the dose-finding and the stochastic approximation literatures, and also shed light on the present and future relevance of the latter to dose finding clinical trials.
Abstract: In 1951 Robbins and Monro published the seminal paper on stochastic approximation and made a specific reference to its application to the "estimation of a quantal using response, non-response data". Since the 1990s, statistical methodology for dose-finding studies has grown into an active area of research. The dose-finding problem is at its core a percentile estimation problem and is in line with what the Robbins-Monro method sets out to solve. In this light, it is quite surprising that the dose-finding literature has developed rather independently of the older stochastic approximation literature. The fact that stochastic approximation has seldom been used in actual clinical studies stands in stark contrast with its constant application in engineering and finance. In this article, I explore similarities and differences between the dose-finding and the stochastic approximation literatures. This review also sheds light on the present and future relevance of stochastic approximation to dose-finding clinical trials. Such connections will in turn steer dose-finding methodology on a rigorous course and extend its ability to handle increasingly complex clinical situations.
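
For readers unfamiliar with the recursion, the Robbins-Monro scheme applied to the percentile-estimation reading of dose finding can be sketched as follows: after each binary outcome the dose is moved down if a response (toxicity) occurred and up otherwise, with diminishing step sizes. The response curve, step-size constant and target below are hypothetical.

# Robbins-Monro recursion for percentile estimation: seek the dose x* at which
# the response probability equals `target`, using only binary outcomes.
import numpy as np

def robbins_monro(respond, x0, target=0.3, c=1.0, n=500, rng=None):
    """respond(x, rng) returns a 0/1 outcome at dose x, with P(1) increasing in x."""
    rng = rng or np.random.default_rng(0)
    xs = [float(x0)]
    for i in range(1, n + 1):
        y = respond(xs[-1], rng)                       # binary outcome at the current dose
        xs.append(xs[-1] - (c / i) * (y - target))     # step down after a response, up after none
    return np.array(xs)

# Example with a hypothetical logistic dose-toxicity curve:
# curve = lambda x, rng: rng.uniform() < 1 / (1 + np.exp(-(x - 2.0)))
# doses = robbins_monro(curve, x0=0.0)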

Journal ArticleDOI
TL;DR: In this paper, the relationship between the two measures and the effects of conditional dependence between the observable quantities on the Bayesian information measures are explored through the information provided by the sample about the parameter and prediction jointly.
Abstract: The Bayesian measure of sample information about the parameter, known as Lindley’s measure, is widely used in various problems such as developing prior distributions, models for the likelihood functions and optimal designs. The predictive information is defined similarly and used for model selection and optimal designs, though to a lesser extent. The parameter and predictive information measures are proper utility functions and have been also used in combination. Yet the relationship between the two measures and the effects of conditional dependence between the observable quantities on the Bayesian information measures remain unexplored. We address both issues. The relationship between the two information measures is explored through the information provided by the sample about the parameter and prediction jointly. The role of dependence is explored along with the interplay between the information measures, prior and sampling design. For the conditionally independent sequence of observable quantities, decompositions of the joint information characterize Lindley’s measure as the sample information about the parameter and prediction jointly and the predictive information as part of it. For the conditionally dependent case, the joint information about parameter and prediction exceeds Lindley’s measure by an amount due to the dependence. More specific results are shown for the normal linear models and a broad subfamily of the exponential family. Conditionally independent samples provide relatively little information for prediction, and the gap between the parameter and predictive information measures grows rapidly with the sample size. Three dependence structures are studied: the intraclass (IC) and serially correlated (SC) normal models, and order statistics. For IC and SC models, the information about the mean parameter decreases and the predictive information increases with the correlation, but the joint information is not monotone and has a unique minimum. Compensation of the loss of parameter information due to dependence requires larger samples. For the order statistics, the joint information exceeds Lindley’s measure by an amount which does not depend on the prior or the model for the data, but it is not monotone in the sample size and has a unique maximum.
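
For concreteness, and in notation not taken from the paper itself, Lindley's measure of the information a sample X provides about the parameter θ is the expected Kullback-Leibler divergence of the posterior from the prior (equivalently, the mutual information between X and θ), and the predictive information is defined analogously with a future observable Y in place of θ:

\[
  I(\theta; X) = \mathbb{E}_X\!\left[\int \pi(\theta \mid X)\,
      \log\frac{\pi(\theta \mid X)}{\pi(\theta)}\,d\theta\right],
  \qquad
  I(Y; X) = \mathbb{E}_X\!\left[\int p(y \mid X)\,
      \log\frac{p(y \mid X)}{p(y)}\,dy\right].
\]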

Journal ArticleDOI
TL;DR: This review gives a nontechnical introduction to the EM algorithm for a general scientific audience, and presents a few examples characteristic of its application.
Abstract: The popularity of the EM algorithm owes much to the 1977 paper by Dempster, Laird and Rubin. That paper gave the algorithm its name, identified the general form and some key properties of the algorithm and established its broad applicability in scientific research. This review gives a nontechnical introduction to the algorithm for a general scientific audience, and presents a few examples characteristic of its application.
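
As a small worked illustration of the two steps (not one of the review's own examples), here is EM for a two-component univariate Gaussian mixture: the E-step computes each point's responsibility under the current parameters, and the M-step re-estimates the mixing weight, means and variances from those responsibilities.

# EM for a two-component univariate Gaussian mixture (illustrative sketch).
import numpy as np

def em_two_gaussians(x, n_iter=100):
    x = np.asarray(x, dtype=float)
    w, mu, var = 0.5, np.array([x.min(), x.max()]), np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1
        d0 = np.exp(-0.5 * (x - mu[0])**2 / var[0]) / np.sqrt(var[0])
        d1 = np.exp(-0.5 * (x - mu[1])**2 / var[1]) / np.sqrt(var[1])
        r = w * d1 / ((1 - w) * d0 + w * d1)
        # M-step: weighted maximum likelihood updates
        w = r.mean()
        mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r), np.sum(r * x) / np.sum(r)])
        var = np.array([np.sum((1 - r) * (x - mu[0])**2) / np.sum(1 - r),
                        np.sum(r * (x - mu[1])**2) / np.sum(r)])
    return w, mu, var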

Journal ArticleDOI
TL;DR: In this paper, an active basis model is proposed for learning image templates of object categories where the learning is not fully supervised, where the unknown locations, orientations and scales are incorporated as latent variables into the image generation process.
Abstract: EM algorithm is a convenient tool for maximum likelihood model fitting when the data are incomplete or when there are latent variables or hidden states. In this review article we explain that EM algorithm is a natural computational scheme for learning image templates of object categories where the learning is not fully supervised. We represent an image template by an active basis model, which is a linear composition of a selected set of localized, elongated and oriented wavelet elements that are allowed to slightly perturb their locations and orientations to account for the deformations of object shapes. The model can be easily learned when the objects in the training images are of the same pose, and appear at the same location and scale. This is often called supervised learning. In the situation where the objects may appear at different unknown locations, orientations and scales in the training images, we have to incorporate the unknown locations, orientations and scales as latent variables into the image generation process, and learn the template by EM-type algorithms. The E-step imputes the unknown locations, orientations and scales based on the currently learned template. This step can be considered self-supervision, which involves using the current template to recognize the objects in the training images. The M-step then relearns the template based on the imputed locations, orientations and scales, and this is essentially the same as supervised learning. So the EM learning process iterates between recognition and supervised learning. We illustrate this scheme by several experiments.

Journal ArticleDOI
TL;DR: The Mendel-Fisher controversy as discussed by the authors arose from Ronald Fisher's 1936 claim that the data in Gregor Mendel's 1866 paper, which laid the foundations of modern genetics, had been falsified to agree with Mendel's expectations.
Abstract: In 1866 Gregor Mendel published a seminal paper containing the foundations of modern genetics. In 1936 Ronald Fisher published a statistical analysis of Mendel’s data concluding that “the data of most, if not all, of the experiments have been falsified so as to agree closely with Mendel’s expectations.” The accusation gave rise to a controversy which has reached the present time. There are reasonable grounds to assume that a certain unconscious bias was systematically introduced in Mendel’s experimentation. Based on this assumption, a probability model that fits Mendel’s data and does not offend Fisher’s analysis is given. This reconciliation model may well be the end of the Mendel–Fisher controversy.

Journal ArticleDOI
TL;DR: This article surveys the use of the EM algorithm in a few important computational biology problems surrounding the central dogma of molecular biology: from DNA to RNA and then to proteins.
Abstract: In the past decade, computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models, and mRNA expression microarray data analysis.

Journal ArticleDOI
TL;DR: Rothman and Greenland as mentioned in this paper argued that the problem with conventional methods lies not so much with frequentism, but rather with frequentist tools for designed experiments being misapplied to observational data, and that Bayesians can and do misapply their methods similarly; they just haven't been given as much opportunity to do so.
Abstract: tribution to the merging of frequentist and Bayesian thinking into a harmonious (even if not strictly coherent) statistical viewpoint. I will review my thinking along those lines and some inspirations for it. I agree with most of Dr. Efron's views expressed here and in Efron (2005), with these important exceptions: First, I disagree that frequentism has supplied a good set of working rules. Instead, I argue that frequentism has been a prime source of reckless overconfidence in many fields (especially but not only in the form of 0.05-level testing; see Rothman, Greenland and Lash, 2008, Chapter 10 for examples and further citations). I also disagree that Bayesians are more aggressive than frequentists in modeling. The most aggressive modeling is that which fixes unknown parameters at some known constant like zero (whence they disappear from the model and are forgotten), thus generating overconfident inferences and an illusion of simplicity; such practice is a hallmark of conventional frequentist applications in observational studies. As working rules, the problem with conventional methods lies not so much with frequentism, but rather with frequentist tools for designed experiments being misapplied to observational data (Greenland, 2005a). Bayesians can and do misapply their methods similarly; they just haven't been given as much opportunity to do so. Conversely, many frequentist as well as Bayesian tools for observational studies have been developed, especially for sensitivity analysis. But the overconfidence problem has been perpetuated by the ongoing concealment of unbelievable point-mass priors within models in order to maintain frequentist identification of target parameters. The problem can be addressed by sacrificing identification and replacing bad modeling assumptions with explicit and reasonable priors (Gustafson, 2005;