
Showing papers in "Statistical Science in 2011"


Journal ArticleDOI
TL;DR: In this article, the authors trace the history and development of Markov chain Monte Carlo (MCMC) from its early inception in the late 1940s through its use today, and show how the earlier stages of Monte Carlo (MC, not MCMC) research led to the algorithms currently in use.
Abstract: We attempt to trace the history and development of Markov chain Monte Carlo (MCMC) from its early inception in the late 1940s through its use today. We see how the earlier stages of Monte Carlo (MC, not MCMC) research have led to the algorithms currently in use. More importantly, we see how the development of this methodology has not only changed our solutions to problems, but has changed the way we think about problems.

246 citations


Journal ArticleDOI
TL;DR: A survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM) or parsimony and use of covariates in low dimensions, and (2) regularization or sparsity for high-dimensional data.
Abstract: Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM) or parsimony and use of covariates in low dimensions, and (2) regularization or sparsity for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing a covariance estimation problem to that of estimating a sequence of regression problems. We point out several instances of the regression-based formulation. A notable case is in sparse estimation of a precision matrix or a Gaussian graphical model leading to the fast graphical LASSO algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The former provides an unconstrained and statistically interpretable reparameterization, and guarantees the positive-definiteness of the estimated covariance matrix. It reduces the unintuitive task of covariance estimation to that of modeling a sequence of regressions at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix such as banding, tapering and thresholding has desirable asymptotic properties and the sparse estimated covariance matrix is positive definite with probability tending to one for large samples and dimensions.
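
The elementwise regularization schemes surveyed here (banding, tapering, thresholding) are easy to prototype. Below is a minimal sketch, not taken from the paper, that soft-thresholds the off-diagonal entries of a sample covariance matrix; the threshold `lam` is a hypothetical tuning parameter that would in practice be chosen by cross-validation, and the sketch does not by itself enforce positive-definiteness in finite samples.

```python
import numpy as np

def soft_threshold_cov(X, lam):
    """Soft-threshold the off-diagonal entries of the sample covariance of X.

    X is an n x p data matrix; `lam` is the threshold.  The diagonal is left
    untouched.  This illustrates elementwise regularization (thresholding);
    it does not by itself guarantee a positive-definite estimate.
    """
    S = np.cov(X, rowvar=False)                       # p x p sample covariance
    off = S - np.diag(np.diag(S))                     # off-diagonal part
    off = np.sign(off) * np.maximum(np.abs(off) - lam, 0.0)
    return off + np.diag(np.diag(S))

# toy usage: n = 50 observations on p = 20 variables
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
S_hat = soft_threshold_cov(X, lam=0.1)
print(np.mean(S_hat == 0.0))                          # fraction of zeroed entries
```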

220 citations


Journal ArticleDOI
TL;DR: This article examined the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects, and estimation of random effects variances.
Abstract: Statistical models that include random effects are commonly used to analyze longitudinal and correlated data, often with strong and parametric assumptions about the random effects distribution. There is marked disagreement in the literature as to whether such parametric assumptions are important or innocuous. In the context of generalized linear mixed models used to analyze clustered or longitudinal data, we examine the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects and estimation of random effects variances. We describe examples, theoretical calculations and simulations to elucidate situations in which the specification is and is not important. A key conclusion is the large degree of robustness of maximum likelihood for a wide variety of commonly encountered situations.

171 citations


Journal ArticleDOI
TL;DR: In this article, a reversal of roles of the user and the multiple testing procedure is proposed, where the user can choose the rejected set freely, and the test procedure returns a confidence statement on the number of false rejections incurred.
Abstract: Motivated by the practice of exploratory research, we formulate an approach to multiple testing that reverses the conventional roles of the user and the multiple testing procedure. Traditionally, the user chooses the error criterion, and the procedure the resulting rejected set. Instead, we propose to let the user choose the rejected set freely, and to let the multiple testing procedure return a confidence statement on the number of false rejections incurred. In our approach, such confidence statements are simultaneous for all choices of the rejected set, so that post hoc selection of the rejected set does not compromise their validity. The proposed reversal of roles requires nothing more than a review of the familiar closed testing procedure, but with a focus on the non-consonant rejections that this procedure makes. We suggest several shortcuts to avoid the computational problems associated with closed testing.
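
The role reversal can be made concrete with a brute-force implementation of closed testing. The sketch below, mine rather than the authors' code, uses Bonferroni local tests and, for any rejected set chosen by the user after seeing the data, returns a simultaneous (1 − α) upper confidence bound on the number of false rejections; it ignores the computational shortcuts the paper develops and is only feasible for a handful of hypotheses.

```python
from itertools import chain, combinations

def nonempty_subsets(items):
    s = list(items)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))

def false_rejection_bound(pvals, rejected, alpha=0.05):
    """Simultaneous (1 - alpha) upper bound on false rejections in `rejected`.

    Closed testing with Bonferroni local tests: the intersection hypothesis
    H_I is locally rejected if min_{i in I} p_i <= alpha / |I|, and rejected
    by closed testing if every superset of I is locally rejected.  The bound
    is the size of the largest subset of `rejected` that closed testing does
    not reject.  Brute force, so only practical for a handful of hypotheses.
    """
    m = len(pvals)

    def locally_rejected(I):
        return min(pvals[i] for i in I) <= alpha / len(I)

    def closed_rejected(I):
        rest = [j for j in range(m) if j not in I]
        supersets = chain([()], nonempty_subsets(rest))
        return all(locally_rejected(tuple(I) + tuple(extra)) for extra in supersets)

    return max((len(I) for I in nonempty_subsets(rejected)
                if not closed_rejected(I)), default=0)

# toy usage: the rejected set is chosen freely after seeing the p-values
pvals = [0.001, 0.004, 0.02, 0.30, 0.75]
print(false_rejection_bound(pvals, rejected=[0, 1, 2]))   # bound on false rejections
```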

159 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a method to estimate conditional causal odds ratios, which express the effect of an arbitrary exposure on a dichotomous outcome conditional on the exposure level, instrumental variable and measured covariates.
Abstract: Inference for causal effects can benefit from the availability of an instrumental variable (IV) which, by definition, is associated with the given exposure, but not with the outcome of interest other than through a causal exposure effect. Estimation methods for instrumental variables are now well established for continuous outcomes, but much less so for dichotomous outcomes. In this article we review IV estimation of so-called conditional causal odds ratios which express the effect of an arbitrary exposure on a dichotomous outcome conditional on the exposure level, instrumental variable and measured covariates. In addition, we propose IV estimators of so-called marginal causal odds ratios which express the effect of an arbitrary exposure on a dichotomous outcome at the population level, and are therefore of greater public health relevance. We explore interconnections between the different estimators and support the results with extensive simulation studies and three applications.

124 citations


Journal ArticleDOI
TL;DR: It is shown that the proportion of true statements in the Bayes case depends critically on the presence of linearity in the model; and with departure from this linearity the Bayesian approach can be seriously misleading.
Abstract: Bayes (1763) introduced the observed likelihood function to statistical inference and provided a weight function to calibrate the parameter; he also introduced a confidence approach, and the Bayes and confidence results were different when the model was not location. This paper examines the occurrence of true statements from the Bayes approach and from the confidence approach, and shows that the proportion of true statements in the Bayes case depends critically on the presence of linearity in the model; with departure from this linearity the Bayes approach can be seriously misleading. Bayesian integration of weighted likelihood provides a first-order linear approximation to confidence, but without linearity can give substantially incorrect results.

110 citations


Journal ArticleDOI
TL;DR: A statistical model for estimating isoform abundance from RNA-Seq data is introduced that is flexible enough to accommodate both single-end and paired-end RNA-Seq data and sampling bias along the length of the transcript.
Abstract: Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data that is flexible enough to accommodate both single-end and paired-end RNA-Seq data and sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator of the model is provided. Further, it is shown that using paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing at a fixed sequencing depth. Simulation studies are also given.
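
Stripped of paired-end information and positional bias, the estimation problem resembles a finite mixture: each read is compatible with a set of isoforms and we seek maximum likelihood estimates of the isoform proportions. The following sketch is a deliberately simplified single-end, equal-effective-length version fitted by EM; it illustrates the general idea and is not the authors' model or implementation.

```python
import numpy as np

def isoform_em(compat, n_iter=200):
    """EM for isoform proportions from a read-isoform compatibility matrix.

    compat[i, j] = 1 if read i could have come from isoform j, else 0.
    Simplifying assumptions (mine, not the paper's): single-end reads,
    equal effective lengths and no positional bias, so P(read | isoform)
    is constant over compatible pairs.
    """
    n_reads, n_iso = compat.shape
    theta = np.full(n_iso, 1.0 / n_iso)          # start from uniform proportions
    for _ in range(n_iter):
        w = compat * theta                        # E-step: responsibilities
        w /= w.sum(axis=1, keepdims=True)
        theta = w.mean(axis=0)                    # M-step: average responsibility
    return theta

# toy usage: 5 reads, 3 isoforms with overlapping compatibility
compat = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [1, 1, 1],
                   [0, 1, 1],
                   [0, 0, 1]], dtype=float)
print(isoform_em(compat))                         # estimated isoform proportions
```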

98 citations


Journal ArticleDOI
TL;DR: This paper presents a unified treatment of Gaussian process models that extends to data from the exponential dispersion family and to survival data and describes a general framework that employs mixture priors.
Abstract: This paper presents a unified treatment of Gaussian process models that extends to data from the exponential dispersion family and to survival data. Our specific interest is in the analysis of data sets with predictors that have an a priori unknown form of possibly nonlinear associations to the response. The modeling approach we describe incorporates Gaussian processes in a generalized linear model framework to obtain a class of nonparametric regression models where the covariance matrix depends on the predictors. We consider, in particular, continuous, categorical and count responses. We also look into models that account for survival outcomes. We explore alternative covariance formulations for the Gaussian process prior and demonstrate the flexibility of the construction. Next, we focus on the important problem of selecting variables from the set of possible predictors and describe a general framework that employs mixture priors. We compare alternative MCMC strategies for posterior inference and achieve a computationally efficient and practical approach. We demonstrate performances on simulated and benchmark data sets.
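
As a toy illustration of how the covariance matrix depends on the predictors, here is a sketch that builds a squared-exponential GP prior covariance from a covariate matrix and draws one realization of the latent function values; the kernel choice and its parameters are generic assumptions, not the specific covariance formulations compared in the paper.

```python
import numpy as np

def sq_exp_cov(X, lengthscale=1.0, variance=1.0, jitter=1e-8):
    """Squared-exponential GP prior covariance evaluated at the predictors X.

    X is an (n, p) matrix of covariates; the covariance between two
    observations depends only on the distance between their predictor values.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)
    return K + jitter * np.eye(len(X))            # jitter for numerical stability

# toy usage: one draw of the latent GP values at 5 two-dimensional covariates
rng = np.random.default_rng(2)
X = rng.standard_normal((5, 2))
f = rng.multivariate_normal(np.zeros(len(X)), sq_exp_cov(X, lengthscale=0.7))
print(f)
```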

90 citations


Journal ArticleDOI
TL;DR: It is argued that introductory courses often mis-characterize the process of statistical inference and an alternative "big picture" depiction is proposed.
Abstract: Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labeled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mischaracterize the process of statistical inference and I propose an alternative "big picture" depiction. The protracted battle for the foundations of statistics, joined vociferously by Fisher, Jeffreys, Neyman, Savage and many disciples, has been deeply illuminating, but it has left statistics without a philosophy that matches contemporary attitudes. Because each camp took as its goal exclusive ownership of inference, each was doomed to failure. We have all, or nearly all, moved past these old debates, yet our textbook explanations have not caught up with the eclecticism of statistical practice. The difficulties go both ways. Bayesians have denied the utility of confidence and statistical significance, attempting to sweep aside the obvious success of these concepts in applied work. Meanwhile, for their part, frequentists have ignored the possibility of inference about unique events despite their ubiquitous occurrence throughout science. Furthermore, interpretations of posterior probability in terms of subjective belief, or confidence in terms of long-run frequency, give students a limited and sometimes confusing view of the nature of statistical inference. When used to introduce the expression of uncertainty based on a random sample ...

87 citations


Journal ArticleDOI
TL;DR: In this article, a nonnegative martingale with initial value equal to one measures evidence against a probabilistic hypothesis and the inverse of its value at some stopping time can be interpreted as a Bayes factor.
Abstract: A nonnegative martingale with initial value equal to one measures evidence against a probabilistic hypothesis. The inverse of its value at some stopping time can be interpreted as a Bayes factor. If we exaggerate the evidence by considering the largest value attained so far by such a martingale, the exaggeration will be limited, and there are systematic ways to eliminate it. The inverse of the exaggerated value at some stopping time can be interpreted as a p-value. We give a simple characterization of all increasing functions that eliminate the exaggeration.
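
A small simulation, my sketch rather than the authors' example, illustrates the setup: under a simple null hypothesis a likelihood-ratio process is a nonnegative martingale with initial value one, and Ville's inequality bounds the probability that its running maximum (the "exaggerated" evidence) ever exceeds a level c by 1/c.

```python
import numpy as np

rng = np.random.default_rng(1)

def max_martingale(n_steps, mu_alt=0.5):
    """Running maximum of a likelihood-ratio test martingale under H0.

    Data are simulated under H0: X ~ N(0, 1); each factor is the likelihood
    ratio N(x; mu_alt, 1) / N(x; 0, 1), so the product has expectation 1.
    """
    x = rng.standard_normal(n_steps)
    log_lr = mu_alt * x - 0.5 * mu_alt ** 2       # log of each martingale factor
    return np.exp(np.cumsum(log_lr)).max()

# Ville's inequality: P(sup_n M_n >= c) <= 1/c under the null.
c = 10.0
maxima = np.array([max_martingale(1000) for _ in range(2000)])
print("empirical P(max M >= c):", (maxima >= c).mean(), "  bound 1/c:", 1 / c)
```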

74 citations


Journal ArticleDOI
TL;DR: The authors show that for regular models where asymptotic normality holds, Jeffreys' general rule prior, the positive square root of the determinant of the Fisher information matrix, enjoys many optimality properties in the absence of nuisance parameters.
Abstract: Bayesian methods are increasingly applied these days in the theory and practice of statistics. Any Bayesian inference depends on a likelihood and a prior. Ideally one would like to elicit a prior from related sources of information or past data. However, in its absence, Bayesian methods need to rely on some “objective” or “default” priors, and the resulting posterior inference can still be quite valuable. Not surprisingly, over the years, the catalog of objective priors also has become prohibitively large, and one has to set some specific criteria for the selection of such priors. Our aim is to review some of these criteria, compare their performance, and illustrate them with some simple examples. While for very large sample sizes it may not matter much what objective prior one uses, the selection of such a prior does influence inference for small or moderate samples. For regular models where asymptotic normality holds, Jeffreys’ general rule prior, the positive square root of the determinant of the Fisher information matrix, enjoys many optimality properties in the absence of nuisance parameters. In the presence of nuisance parameters, however, there are many other priors which emerge as optimal depending on the criterion selected. One new feature in this article is that a prior different from Jeffreys’ is shown to be optimal under the chi-square divergence criterion even in the absence of nuisance parameters. The latter is also invariant under one-to-one reparameterization.
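
For a concrete instance of the general rule mentioned in the abstract, the standard textbook calculation of Jeffreys' prior for a Bernoulli parameter (not taken from the article) goes as follows.

```latex
% Jeffreys' prior for a Bernoulli(\theta) observation:
% log-likelihood, Fisher information, and the resulting prior.
\begin{align*}
  \ell(\theta; x) &= x \log\theta + (1 - x)\log(1 - \theta), \\
  I(\theta)       &= -\mathbb{E}\!\left[\frac{\partial^2 \ell}{\partial\theta^2}\right]
                   = \frac{1}{\theta(1 - \theta)}, \\
  \pi_J(\theta)   &\propto \sqrt{I(\theta)}
                   = \theta^{-1/2}(1 - \theta)^{-1/2},
\end{align*}
% i.e. the Beta(1/2, 1/2) distribution.
```

The resulting Beta(1/2, 1/2) prior illustrates the invariance property: applying the same construction after any one-to-one reparameterization yields the transformed version of the same prior.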

Journal ArticleDOI
TL;DR: In this paper, the Calibrated Bayesian (CB) approach to statistical inference is described, and applications of the CB approach to normal models are presented for both monotone and nonmonotone missing data patterns.
Abstract: It is argued that the Calibrated Bayesian (CB) approach to statistical inference capitalizes on the strength of Bayesian and frequentist approaches to statistical inference. In the CB approach, inferences under a particular model are Bayesian, but frequentist methods are useful for model development and model checking. In this article the CB approach is outlined. Bayesian methods for missing data are then reviewed from a CB perspective. The basic theory of the Bayesian approach, and the closely related technique of multiple imputation, is described. Then applications of the Bayesian approach to normal models are described, both for monotone and nonmonotone missing data patterns. Sequential Regression Multivariate Imputation and Penalized Spline of Propensity Models are presented as two useful approaches for relaxing distributional assumptions.

Journal ArticleDOI
TL;DR: The purpose of this work is to characterize priors that can be used as conservative inputs to an analysis relative to the base prior, in terms of a priori measures of prior-data conflict.
Abstract: A question of some interest is how to characterize the amount of information that a prior puts into a statistical analysis. Rather than a general characterization, we provide an approach to characterizing the amount of information a prior puts into an analysis, when compared to another base prior. The base prior is considered to be the prior that best reflects the current available information. Our purpose then is to characterize priors that can be used as conservative inputs to an analysis relative to the base prior. The characterization that we provide is in terms of a priori measures of prior-data conflict.

Journal ArticleDOI
TL;DR: Probability sampling designs and randomization inference are widely accepted as the standard approach in sample surveys, particularly by federal agencies and other survey organizations conducting complex large scale surveys on topics related to public policy.
Abstract: According to Hansen, Madow and Tepping [J. Amer. Statist. Assoc. 78 (1983) 776–793], “Probability sampling designs and randomization inference are widely accepted as the standard approach in sample surveys.” In this article, reasons are advanced for the wide use of this design-based approach, particularly by federal agencies and other survey organizations conducting complex large scale surveys on topics related to public policy. Impact of Bayesian methods in survey sampling is also discussed in two different directions: nonparametric calibrated Bayesian inferences from large samples and hierarchical Bayes methods for small area estimation based on parametric models.

Journal ArticleDOI
TL;DR: In this article, a simple and quick general test to screen for numerical anomalies is presented, which can be applied, for example, to electoral processes, both electronic and manual, using vote counts in officially published voting units, which are typically widely available and institutionally backed.
Abstract: A simple and quick general test to screen for numerical anomalies is presented. It can be applied, for example, to electoral processes, both electronic and manual. It uses vote counts in officially published voting units, which are typically widely available and institutionally backed. The test examines the frequencies of digits on voting counts and rests on the First (NBL1) and Second Digit Newcomb–Benford Laws (NBL2), and on a novel generalization of the law under restrictions on the maximum number of voters per unit (RNBL2). We apply the test to the 2004 USA presidential elections, the Puerto Rico (1996, 2000 and 2004) governor elections, the 2004 Venezuelan presidential recall referendum (RRP) and the previous 2000 Venezuelan Presidential election. The NBL2 is compellingly rejected only in the Venezuelan referendum and only for electronic voting units. Our original suggestion on the RRP (Pericchi and Torres, 2004) was criticized by The Carter Center report (2005). Acknowledging this, Mebane (2006) and The Economist (US) (2007) presented voting models and case studies in favor of NBL2. Further evidence is presented here. Moreover, under the RNBL2, Mebane’s voting models are valid under wider conditions. The adequacy of the law is assessed through Bayes Factors (and corrections of p-values) instead of significance testing, since for large sample sizes and fixed α levels the null hypothesis is over-rejected. Our tests are extremely simple and can become a standard screening that a fair electoral process should pass.
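
The second-digit law has the closed form P(d) = Σ_{k=1}^{9} log10(1 + 1/(10k + d)) for d = 0, …, 9. The sketch below, mine rather than the authors' procedure, computes these frequencies and a simple chi-square screening statistic for a vector of vote counts; note that the paper itself argues for Bayes factors rather than fixed-level p-values, precisely because a chi-square test at a fixed level over-rejects for large samples.

```python
import numpy as np
from scipy.stats import chi2

def nbl2_probs():
    """Second-digit Newcomb-Benford probabilities P(d), d = 0..9."""
    return np.array([np.sum(np.log10(1 + 1.0 / (10 * np.arange(1, 10) + d)))
                     for d in range(10)])

def second_digit_chi2(counts):
    """Chi-square screening of the second digits of vote counts (>= 10)."""
    counts = np.asarray(counts)
    counts = counts[counts >= 10]                  # a second digit must exist
    ndig = np.floor(np.log10(counts)).astype(int) + 1
    second = (counts // 10 ** (ndig - 2)) % 10     # extract the second digit
    observed = np.bincount(second.astype(int), minlength=10)
    expected = nbl2_probs() * observed.sum()
    stat = np.sum((observed - expected) ** 2 / expected)
    return stat, chi2.sf(stat, df=9)

# toy usage with hypothetical vote counts per voting unit
votes = np.array([132, 87, 241, 310, 198, 455, 77, 1024, 389, 266])
print(second_digit_chi2(votes))
```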

Journal ArticleDOI
TL;DR: The score test is generalized to a "pseudo-score" test derived from Besag's pseudo-likelihood and to a class of diagnostics based on point process residuals, and new tools, such as the compensator of the K-function, are developed for testing other fitted models.
Abstract: We develop new tools for formal inference and informal model validation in the analysis of spatial point pattern data. The score test is generalized to a "pseudo-score" test derived from Besag's pseudo-likelihood, and to a class of diagnostics based on point process residuals. The results lend theoretical support to the established practice of using functional summary statistics, such as Ripley's K-function, when testing for complete spatial randomness; and they provide new tools such as the compensator of the K-function for testing other fitted models. The results also support localization methods such as the scan statistic and smoothed residual plots. Software for computing the diagnostics is provided.

Journal ArticleDOI
TL;DR: The authors argue that Bayesian approaches with formal and informal assessments of priors and likelihood functions are well accepted and should become the norm in public settings, noting that the Bayesian framework often provides the primary way to respond to questions raised in these settings and that the number and diversity of Bayesian applications have grown dramatically in recent years.
Abstract: Starting with the neo-Bayesian revival of the 1950s, many statisticians argued that it was inappropriate to use Bayesian methods, and in particular subjective Bayesian methods in governmental and public policy settings because of their reliance upon prior distributions. But the Bayesian framework often provides the primary way to respond to questions raised in these settings and the numbers and diversity of Bayesian applications have grown dramatically in recent years. Through a series of examples, both historical and recent, we argue that Bayesian approaches with formal and informal assessments of priors AND likelihood functions are well accepted and should become the norm in public settings. Our examples include census-taking and small area estimation, US election night forecasting, studies reported to the US Food and Drug Administration, assessing global climate change, and measuring potential declines in disability among the elderly.

Journal ArticleDOI
TL;DR: In this article, a catch-all approach is proposed to match the joint probability distribution of the observable time series, including long-term features of the dynamics that underpin the data, rather than short-term prediction.
Abstract: Using a time series model to mimic an observed time series has a long history. However, with regard to this objective, conventional estimation methods for discrete-time dynamical models are frequently found to be wanting. In fact, they are characteristically misguided in at least two respects: (i) assuming that there is a true model; (ii) evaluating the efficacy of the estimation as if the postulated model is true. There are numerous examples of models, when fitted by conventional methods, that fail to capture some of the most basic global features of the data, such as cycles with good matching periods, singularities of spectral density functions (especially at the origin) and others. We argue that the shortcomings need not always be due to the model formulation but the inadequacy of the conventional fitting methods. After all, all models are wrong, but some are useful if they are fitted properly. The practical issue becomes one of how to best fit the model to data. Thus, in the absence of a true model, we prefer an alternative approach to conventional model fitting that typically involves one-step-ahead prediction errors. Our primary aim is to match the joint probability distribution of the observable time series, including long-term features of the dynamics that underpin the data, such as cycles, long memory and others, rather than short-term prediction. For want of a better name, we call this specific aim feature matching. The challenges of model misspecification, measurement errors and the scarcity of data are forever present in real time series modeling. In this paper, by synthesizing earlier attempts into an extended-likelihood, we develop a systematic approach to empirical time series analysis to address these challenges and to aim at achieving better feature matching. Rigorous proofs are included but relegated to the Appendix. Numerical results, based on both simulations and real data, suggest that the proposed catch-all approach has several advantages over the conventional methods, especially when the time series is short or with strong cyclical fluctuations. We conclude with listing directions that require further development.

Journal ArticleDOI
TL;DR: In this article, it was shown that the spectrum of the sandwich operator dominates the spectrum of the DA operator, in the sense that the ordered elements of the former are all less than or equal to the corresponding elements of the latter.
Abstract: The reversible Markov chains that drive the data augmentation (DA) and sandwich algorithms define self-adjoint operators whose spectra encode the convergence properties of the algorithms. When the target distribution has uncountable support, as is nearly always the case in practice, it is generally quite difficult to get a handle on these spectra. We show that, if the augmentation space is finite, then (under regularity conditions) the operators defined by the DA and sandwich chains are compact, and the spectra are finite subsets of [0, 1). Moreover, we prove that the spectrum of the sandwich operator dominates the spectrum of the DA operator in the sense that the ordered elements of the former are all less than or equal to the corresponding elements of the latter. As a concrete example, we study a widely used DA algorithm for the exploration of posterior densities associated with Bayesian mixture models (Diebolt and Robert, 1994). In particular, we compare this mixture DA algorithm with an alternative algorithm proposed by Frühwirth-Schnatter (2001) that is based on random label switching.

Journal ArticleDOI
TL;DR: In this article, the authors developed and evaluated point and interval estimates for the random effects θi, having made observations yi|θi ∼ind N[θi, Vi], i = 1, …, k, that follow a two-level Normal hierarchical model.
Abstract: We develop and evaluate point and interval estimates for the random effects θi, having made observations yi|θi ∼ind N[θi, Vi], i = 1, …, k that follow a two-level Normal hierarchical model. Fitting this model requires assessing the Level-2 variance A ≡ Var(θi) to estimate shrinkages Bi ≡ Vi / (Vi + A) toward a (possibly estimated) subspace, with Bi as the target because the conditional means and variances of θi depend linearly on Bi, not on A. Adjustment for density maximization, ADM, can do the fitting for any smooth prior on A. Like the MLE, ADM bases inferences on two derivatives, but ADM can approximate with any Pearson family, with Beta distributions being appropriate because shrinkage factors satisfy 0 ≤ Bi ≤ 1. Our emphasis is on frequency properties, which leads to adopting a uniform prior on A ≥ 0, which then puts Stein’s harmonic prior (SHP) on the k random effects. It is known for the “equal variances case” V1 = ⋯ = Vk that formal Bayes procedures for this prior produce admissible minimax estimates of the random effects, and that the posterior variances are large enough to provide confidence intervals that meet their nominal coverages. Similar results are seen to hold for our approximating “ADM-SHP” procedure for equal variances and also for the unequal variances situations checked here. For shrinkage coefficient estimation, the ADM-SHP procedure allows an alternative frequency interpretation. Writing L(A) as the likelihood of Bi with i fixed, ADM-SHP estimates Bi as Bi = Vi / (Vi + Â) with Â ≡ argmax(A ∗ L(A)). This justifies the term “adjustment for likelihood maximization,” ALM.
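
A minimal numerical sketch of the "adjustment for likelihood maximization" idea, simplified well beyond the paper's procedure: assume the Level-2 mean is known to be zero, so the marginal likelihood L(A) is a product of N(0, Vi + A) densities, maximize A · L(A) to get Â, and shrink each yi by Bi = Vi/(Vi + Â). The zero mean and the search bounds are my assumptions, not the authors'.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def alm_shrinkage(y, V):
    """ALM-style shrinkage for y_i | theta_i ~ N(theta_i, V_i), theta_i ~ N(0, A).

    Simplifications (mine): the Level-2 mean is known and equal to zero, and
    A is estimated by maximizing A * L(A), with L(A) the marginal likelihood
    of A.  Returns (A_hat, shrinkages B_i, shrunken estimates theta_hat_i).
    """
    y, V = np.asarray(y, float), np.asarray(V, float)

    def neg_objective(A):                        # minus log of A * L(A)
        marg_var = V + A
        loglik = -0.5 * np.sum(np.log(2 * np.pi * marg_var) + y ** 2 / marg_var)
        return -(np.log(A) + loglik)

    A_hat = minimize_scalar(neg_objective, bounds=(1e-8, 1e4),
                            method="bounded").x
    B = V / (V + A_hat)                          # shrinkage factors in [0, 1]
    return A_hat, B, (1 - B) * y                 # shrink each y_i toward 0

# toy usage with unequal sampling variances
y = np.array([2.1, -0.3, 1.5, 0.2, -1.8])
V = np.array([1.0, 0.5, 2.0, 1.0, 0.8])
print(alm_shrinkage(y, V))
```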

Journal ArticleDOI
TL;DR: For the recall referendum held in Venezuela on August 15, 2004, this article finds that the deviation pattern across precincts in the relationship between the signatures collected to request the referendum in November 2003 (the so-called Reafirmazo) and the YES votes on August 15 is positively and significantly correlated with the deviation pattern in the relationship between exit polls and votes in those same precincts, which the authors interpret as evidence of the presence of fraud.
Abstract: This study analyzes diverse hypotheses of electronic fraud in the Recall Referendum celebrated in Venezuela on August 15, 2004. We define fraud as the difference between the elector’s intent, and the official vote tally. Our null hypothesis is that there was no fraud, and we attempt to search for evidence that will allow us to reject this hypothesis. We find no evidence that fraud was committed by applying numerical maximums to machines in some precincts. Equally, we discard any hypothesis that implies altering some machines and not others, at each electoral precinct, because the variation patterns between machines at each precinct are normal. However, the statistical evidence is compatible with the occurrence of fraud that has affected every machine in a single precinct, but differentially more in some precincts than others. We find that the deviation pattern between precincts, based on the relationship between the signatures collected to request the referendum in November 2003 (the so-called Reafirmazo) and the YES votes on August 15, is positive and significantly correlated with the deviation pattern in the relationship between exit polls and votes in those same precincts. In other words, those precincts in which, according to the number of signatures, there is an unusually low number of YES votes (i.e., votes to impeach the president) are also those where, according to the exit polls, the same thing occurs. Using statistical techniques, we rule out that this is due to spurious errors in the data or to random coefficients in such relationships. Our interpretation is that both the signatures and the exit polls are imperfect measurements of the elector’s intent but not of the possible fraud, and therefore what causes their correlation is precisely the presence of fraud. Moreover, we find that the sample used in the audit conducted on August 18 was neither random nor representative of the entire universe of precincts. In this sample, the Reafirmazo signatures are associated with 10 percent more votes than in the non-audited precincts. We built 1,000 random samples in non-audited precincts and found that this result occurs with a frequency lower than 1 percent. This result is compatible with the hypothesis that the sample for the audit was chosen only among those precincts whose results had not been altered.

Journal ArticleDOI
TL;DR: A critical review of recent statistical literature on the Venezuelan recall referendum can be found in this paper, where the main conclusion is that there were a significant number of irregularities in the vote counting that introduced a bias in favor of the winning option.
Abstract: The best way to reconcile political actors in a controversial electoral process is a full audit. When this is not possible, statistical tools may be useful for measuring the likelihood of the results. The Venezuelan recall referendum (2004) provides a suitable dataset for thinking about this important problem. The cost of errors in examining an allegation of electoral fraud can be enormous. They can range from legitimizing an unfair election to supporting an unfounded accusation, with serious political implications. For this reason, we must be very selective about data, hypotheses and test statistics that will be used. This article offers a critical review of recent statistical literature on the Venezuelan referendum. In addition, we propose a testing methodology, based exclusively on vote counting, that is potentially useful in election forensics. The referendum is reexamined, offering new and intriguing aspects to previous analyses. The main conclusion is that there were a significant number of irregularities in the vote counting that introduced a bias in favor of the winning option. A plausible scenario in which the irregularities could overturn the results is also discussed.

Journal ArticleDOI
TL;DR: In this paper, the results of two major exit polls conducted during the recall referendum that took place in Venezuela on August 15, 2004, are compared to the official results of the Venezuelan National Electoral Council “Consejo Nacional Electoral” (CNE).
Abstract: We present a simulation-based study in which the results of two major exit polls conducted during the recall referendum that took place in Venezuela on August 15, 2004, are compared to the official results of the Venezuelan National Electoral Council “Consejo Nacional Electoral” (CNE). The two exit polls considered here were conducted independently by Sumate, a nongovernmental organization, and Primero Justicia, a political party. We find significant discrepancies between the exit poll data and the official CNE results in about 60% of the voting centers that were sampled in these polls. We show that discrepancies between exit polls and official results are not due to a biased selection of the voting centers or to problems related to the size of the samples taken at each center. We found discrepancies in all the states where the polls were conducted. We do not have enough information on the exit poll data to determine whether the observed discrepancies are the consequence of systematic biases in the selection of the people interviewed by the pollsters around the country. Neither do we have information to study the possibility of a high number of false or nonrespondents. We have limited data suggesting that the discrepancies are not due to a drastic change in the voting patterns that occurred after the exit polls were conducted. We notice that the two exit polls were done independently and had few centers in common, yet their overall results were very similar.

Journal ArticleDOI
Heping Zhang
TL;DR: The challenges and methods of studying the genetics of mental illnesses, a problem of significant public health importance, are presented from a statistical perspective, with a focus on genetic association studies.
Abstract: Identifying the risk factors for mental illnesses is of significant public health importance. Diagnosis, stigma associated with mental illnesses, comorbidity, and complex etiologies, among others, make it very challenging to study mental disorders. Genetic studies of mental illnesses date back at least a century ago, beginning with descriptive studies based on Mendelian laws of inheritance. A variety of study designs including twin studies, family studies, linkage analysis, and more recently, genomewide association studies have been employed to study the genetics of mental illnesses, or complex diseases in general. In this paper, I will present the challenges and methods from a statistical perspective and focus on genetic association studies.

Journal ArticleDOI
TL;DR: The significance of the high linear correlation (0.99) between the number of signatures requesting the recall petition and the number of opposition votes in computerized centers is analyzed in this article.
Abstract: On August 15th, 2004, Venezuelans had the opportunity to vote in a Presidential Recall Referendum to decide whether or not President Hugo Chavez should be removed from office. The process was largely computerized using a touch-screen system. In general, the ballots were not manually counted. The significance of the high linear correlation (0.99) between the number of signatures requesting the recall petition and the number of opposition votes in computerized centers is analyzed. The same-day audit was found to be not only ineffective but a source of suspicion. Official results were compared with the 1998 presidential election and other electoral events, and distortions were found.

Journal ArticleDOI
TL;DR: Kass proposes a new philosophy of "pragmatism" that replaces frequentism and Bayesianism with a more ecumenical and practical approach, put forward as a way to enshrine in foundations what good statisticians already do.
Abstract: In this piece, Rob Kass brings to bear his insights from a long career in both theoretical and applied statistics to reflect on the disconnect between what we teach and what we do. Not content to focus just on didactic and professional matters, the focus of his 2009 article (Brown and Kass, 2009), he proposes in this commentary a remake of the foundations of inference. He proposes to replace two fundamental “isms”—frequentism and Bayesianism— with a new “ism”—“pragmatism,” an approach that he puts forward as more ecumenical and practical, enshrining in foundations what good statisticians already do. There is a lot to commend in this piece, particularly the emphasis on the subjunctive nature of all model-based inference, and I am sure the other commentators will do justice to its strengths. But in spite of its clarity and initial promise, I found Kass’s proposal ultimately unsatisfying. It seems less a new foundational philosophy than a call for a truce, one of many over the years. It is telling that all of the examples show practical equivalence between Bayesian and frequentist estimates, so the biggest stakes here seem to be what people think, not what they do. The difficulty with “big tent” foundations is that in circumstances where different philosophies within the tent dictate different actions, there is no guidance as to what route to take. It is interesting to contrast this with the philosophic version of “pragmatism,” originally put forth by the polymath C. S. Peirce in the late 1800s ...

Journal ArticleDOI
TL;DR: In this paper, statistical comparisons of electoral variables are made between groups of electronic voting machines and voting centers classified by types of transmissions according to the volume of traffic in incoming and outgoing data of machines from and toward the National Electoral Council (CNE) totalizing servers.
Abstract: Statistical comparisons of electoral variables are made between groups of electronic voting machines and voting centers classified by types of transmissions according to the volume of traffic in incoming and outgoing data of machines from and toward the National Electoral Council (CNE) totalizing servers. One unexpectedly finds two types of behavior in wire telephony data transmissions and only one type where cellular telephony is employed, contravening any reasonable electoral regulation. Differentiation in data transmissions arises when comparing the number of incoming and outgoing data bytes per machine against the total number of votes per machine reported officially by the CNE. The respective distributions of electoral variables for each type of transmission show that the groups classified by it do not correspond to random sets of the electoral universe. In particular, the distributions for the NO percentage of votes per machine differ statistically across groups. The presidential elections of 1998, 2000 and the 2004 Presidential Recall Referendum (2004 PRR) are compared according to the type of transmissions in the 2004 PRR. Statistically, the difference between the empirical distributions of the 2004 PRR NO results and the 2000 Chavez votes results by voting centers is not significant.

Journal ArticleDOI
TL;DR: In this article, the authors add another anchoring point: calibration, which is an objective, not subjective process, although some subjectivity (or scientific judgment) is necessarily involved in the choice of events used in the calibration.
Abstract: Kass describes probability theory as anchored upon physical randomization (coin flips, die rolls and the like) but being useful more generally as a mathematical model. I completely agree but would also add another anchoring point: calibration. Calibration of probability assessments is an objective, not subjective process, although some subjectivity (or scientific judgment) is necessarily involved in the choice of events used in the calibration. In that way, Bayesian probability calibration is closely connected to frequentist probability statements, in that both are conditional on "reference sets" of comparable events. We discuss these issues further in Chapter 1 of Bayesian Data Analysis, featuring examples from sports betting and record linkage.

Journal ArticleDOI
TL;DR: A referendum to recall President Hugo Chavez was held in Venezuela in August of 2004 as mentioned in this paper, which was monitored by various international groups including the Organization of American States and the Carter Center (both of which declared that the referendum had been conducted in a free and transparent manner).
Abstract: A referendum to recall President Hugo Chavez was held in Venezuela in August of 2004. In the referendum, voters were to vote YES if they wished to recall the President and NO if they wanted him to continue in office. The official results were 59% NO and 41% YES. Even though the election was monitored by various international groups including the Organization of American States and the Carter Center (both of which declared that the referendum had been conducted in a free and transparent manner), the outcome of the election was questioned by other groups both inside and outside of Venezuela. The collection of manuscripts that comprise this issue of Statistical Science discusses the general topic of election forensics but also focuses on different statistical approaches to explore, post-election, whether irregularities in the voting, vote transmission or vote counting processes could be detected in the 2004 presidential recall referendum. In this introduction to the Venezuela issue, we discuss the more recent literature on post-election auditing, describe the institutional context for the 2004 Venezuelan referendum, and briefly introduce each of the five contributions.

Journal ArticleDOI
TL;DR: While Fraser's paper sheds new insights on the evaluation of Bayesian bounds in a frequentist light, the main point of the paper seems to be a radical reexamination of the relevance of the whole Bayesian approach to confidence regions.
Abstract: While Fraser's paper offers new insights on the evaluation of Bayesian bounds in a frequentist light, the main point of the paper seems to be a radical reexamination of the relevance of the whole Bayesian approach to confidence regions. This is surprising given that the disagreement between the Bayesian and frequentist perspectives is usually quite limited.