Journal ArticleDOI

Consideration sets, intentions and the inclusion of "don't know" in a two-stage model for voter choice

TL;DR: In this paper, the authors present a statistical model for voter choice that incorporates a consideration set stage and final vote intention stage and explicitly accounts for the three types of missing data encountered in polling.
About: This article was published in the International Journal of Forecasting on 2004-05-01 and is currently open access. It has received 20 citations to date. The article focuses on the topics: multinomial probit and missing data.

Summary (3 min read)

1 INTRODUCTION

  • Modeling and forecasting voter choice is a key topic in the political science literature.
  • Given the use of the models for such an important topic, it should not come as a surprise that there is abundant literature on the design and implementation of models for voter choice.
  • The key problem in including non-voters, that is, those who do not know and those who do not want to say, in a forecasting model is of course that these individuals give no answer to the question concerning their voting intention.
  • Hence, it is of interest to include questions on consideration sets in the questionnaire, especially since these may receive fewer "do not know" or "do not want to say" responses.
  • In Section 5 the authors report on the empirical results and they provide a discussion of the main conclusions that can be drawn from these results.

2.1 Preliminaries

  • The authors denote the stated vote intention prior to the election by the variable d_i, which can take J + 3 different values.
  • If d_i = J + 1 the individual does not intend to vote, while d_i = J + 2 indicates that the individual does not know yet and d_i = J + 3 indicates that the individual does not want to say his or her vote intention.
  • Individuals are assumed not to consider all parties in their decision process, but to choose one party from a particular subset of the parties at a particular point in time.
  • Hence, for each individual there are Q = 2^J potential consideration sets.
  • Note that an individual may have an empty consideration set, which implies that he or she does not consider any party at all.
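The count Q = 2^J can be made concrete: with J parties, every subset of the party list, including the empty set, is a potential consideration set. A minimal sketch in Python (the party names are purely illustrative, not tied to the paper's data):

```python
from itertools import chain, combinations

def consideration_sets(parties):
    """Enumerate all Q = 2**J possible consideration sets
    (including the empty set) for a list of J parties."""
    return list(chain.from_iterable(
        combinations(parties, r) for r in range(len(parties) + 1)))

sets_ = consideration_sets(["CDA", "D66", "PvdA", "VVD"])
# With J = 4 parties there are 2**4 = 16 subsets, () being the empty set.
```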

2.2 Two-stage Model

  • The goal of this paper is to construct a choice model for the vote intention of the individuals stated prior to the elections and to use this model to predict the actual vote of the individuals at the election.
  • In the second stage the authors model the actual choice among the parties in the consideration set of this individual.
  • The corresponding parameter therefore models the effect of state dependence.
  • Additionally, the authors assume that an individual may always decide not to vote at all, and hence they impose that c_{i,J+1} = 1 for all i.
  • For identification purposes the authors set the parameter for option J + 1 equal to zero.
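The two-stage mechanism described above can be sketched as a small simulation: a probit-style consideration stage, with the "no vote" option always available, followed by utility maximization within the consideration set. This is a hypothetical illustration of the mechanism under simplifying assumptions (independent standard-normal errors), not the authors' full model:

```python
import numpy as np

rng = np.random.default_rng(0)

def two_stage_choice(x, gamma, beta, J):
    """Sketch of a two-stage voter choice for one individual.
    x:     (k,)   individual characteristics
    gamma: (J, k) consideration-stage coefficients
    beta:  (J+1, k) choice-stage coefficients (last row = "no vote")."""
    # Stage 1: party j enters the consideration set if its latent index > 0.
    considered = (gamma @ x + rng.standard_normal(J)) > 0
    considered = np.append(considered, True)   # "no vote" is always available
    # Stage 2: pick the considered option with the highest utility.
    utility = beta @ x + rng.standard_normal(J + 1)
    utility[~considered] = -np.inf
    return int(np.argmax(utility))

k, J = 3, 5
choice = two_stage_choice(rng.standard_normal(k),
                          rng.standard_normal((J, k)),
                          rng.standard_normal((J + 1, k)), J)
```

Because the "no vote" option is forced into every consideration set, the returned index is always well defined even when no party is considered.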

3.1 Estimation

  • To estimate the model parameters in their two-stage model, the authors consider the likelihood function for the stated considerations and party choices of the individuals, c = {c_i : i = 1, ..., N}.
  • The likelihood function involves the product of the probability that the consideration set of individual i is c_i and the probability that the party choice is d_i given c_i for all individuals, and thus for a given individual the product of a multivariate and a multinomial probit probability.
  • The authors assume a flat prior distribution for the model parameters and use the Gibbs sampling approach of Geman and Geman (1984) to obtain posterior results.
  • The unobserved utilities D_ij and C_i are sampled alongside the model parameters (data augmentation); see Tanner and Wong (1987).
  • For details on a Gibbs sampling approach in multivariate and multinomial probit models, the authors refer to Albert and Chib (1993), McCulloch and Rossi (1994), Chib and Greenberg (1998), and McCulloch, Polson and Rossi (2000).
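The data-augmentation idea can be illustrated on the simplest member of this model family, a binary probit, in the spirit of Albert and Chib (1993): sample the latent utilities from truncated normals given the data, then sample the coefficients from a normal given the latent utilities. A minimal sketch under a flat prior (not the authors' full multivariate/multinomial sampler):

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, n_iter=500, seed=0):
    """Gibbs sampler with data augmentation for a binary probit model
    (Albert-and-Chib-style sketch), flat prior on the coefficients."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = np.zeros(k)
    draws = np.empty((n_iter, k))
    for it in range(n_iter):
        mu = X @ beta
        # Data augmentation: latent z | y, beta from truncated normals,
        # positive where y = 1 and negative where y = 0.
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}) under the flat prior.
        beta = rng.multivariate_normal(XtX_inv @ X.T @ z, XtX_inv)
        draws[it] = beta
    return draws
```

Discarding an initial burn-in and averaging the remaining draws gives the posterior mean of the coefficients.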

3.2 Forecasting

  • The goal of this paper is to model the vote intention of individuals prior to the election and to use the estimated model to predict the actual voting behavior of the individuals in the upcoming election.
  • These predictive probabilities can easily be computed using the Gibbs output.
  • If the vote intention is unknown, the authors use their model and parameter estimates to predict the vote.
  • In forecasting, the authors do not reweight the sample.
  • In that case the choice simply corresponds to the largest utility.
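Computing predictive probabilities from the Gibbs output amounts to a Monte Carlo average: for each posterior draw of the parameters, simulate utilities for the individual and record which option has the largest utility. A minimal sketch under simplifying assumptions (independent standard-normal utility errors; the array shapes are hypothetical):

```python
import numpy as np

def predictive_vote_probs(x, beta_draws, rng=None):
    """Monte Carlo predictive choice probabilities from Gibbs output.
    x:          (k,)              individual characteristics
    beta_draws: (n_draws, J+1, k) posterior draws of choice coefficients."""
    rng = rng or np.random.default_rng(0)
    n_draws, n_options, _ = beta_draws.shape
    counts = np.zeros(n_options)
    for beta in beta_draws:
        u = beta @ x + rng.standard_normal(n_options)
        counts[np.argmax(u)] += 1   # the choice is the largest utility
    return counts / n_draws
```

The resulting vector sums to one and averages over both parameter uncertainty (the Gibbs draws) and utility noise.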

4.1 Political Context

  • The political context in this paper concerns the Dutch 1994 parliament elections.
  • Shifts in the preference distribution can have substantial political consequences.
  • To form the government a coalition of two or more parties has to represent 50% or more of the voters.
  • Currently the four important political parties are CDA (Christian Democrats), D’66 (Democrats 66), PvdA (Social Democrats), and VVD (Liberals).
  • The government of The Netherlands has formed a coalition between two or three of these four parties for the past several decades.

4.2 Data

  • The authors' data were collected by the market research agency Inter/View in April and June 1994 (elections were held in June 1994).
  • The authors combine the other parties into one “Other party” (j = 5) category.
  • As explanatory variables, individual-specific socio-economic and demographic variables are contained in the k-dimensional vector xi.
  • The authors believe this stated voting behavior to be the best yardstick available for validation in the absence of the option of retrieving the real votes.

4.3 Descriptive Statistics of the Data

  • Table 1 summarizes the actual votes in 1989, the intentions in 1994, the consideration set 1994, the actual votes in 1994, as well as a cross-tabulation of the vote intention variable with the explanatory variables.
  • VVD voters appear to be home owners with high incomes.
  • This category could consider politics to be something quite distinct from their own interests.
  • Finally, the “do not want to say” category does not have a particularly clear profile.
  • For only 2289 (49.5%) of the individuals is the vote intention the same as their actual vote, which may be considered rather low.
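Cross-tabulations like those in Table 1 can be reproduced for any survey data frame. A minimal sketch with pandas, where the toy rows and column names are hypothetical rather than the Inter/View sample:

```python
import pandas as pd

# Hypothetical mini-survey: stated vote intention vs. home ownership.
df = pd.DataFrame({
    "intention":  ["CDA", "VVD", "VVD", "PvdA", "don't know"],
    "home_owner": [0, 1, 1, 0, 0],
})

# Row-normalized cross-tabulation: within each intention category,
# the share of home owners vs. non-owners.
table = pd.crosstab(df["intention"], df["home_owner"], normalize="index")
```

The same call with the raw counts (`normalize=False`, the default) reproduces the frequency layout of a summary table.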

5 RESULTS AND COMPARISONS

  • Next to the two-stage model presented above, the authors estimate three simplified versions of it.
  • Note that this model does not predict the “no vote” option.
  • Table 5 shows the posterior mean and standard deviation of the covariance matrix of the multivariate probit model for the consideration set formation.
  • And, if the authors finally add the consideration set stage as in model D, the prediction becomes even better.
  • Similarly, VVD appears to be competing less with PvdA and D’66 in model D1 relative to C.

6 CONCLUSIONS AND LIMITATIONS

  • The prediction of outcomes of elections for political parties and candidates receives much interest in democracies across the world, an interest that is reflected in intensive media coverage.
  • Rather than imputing those missing responses based on available procedures, the authors have postulated a plausible behavioral mechanism for them that enables forecasts of votes of subjects in each of those categories.
  • Loosely speaking, this sampler switches the two steps in the Metropolis–Hastings sampler of Metropolis et al. (1953).
  • To derive the full conditional posterior distributions, the authors consider three cases.
  • Sampling of the regression parameters: model (A.4) can be seen as a multivariate regression model.


Citations
Posted Content
20 Feb 2006
TL;DR: A list of all publications over the period 1956-2005, as reported in the Rotterdam Econometric Institute Reprint series during 1957-2005 can be found in this article.
Abstract: This paper contains a list of all publications over the period 1956-2005, as reported in the Rotterdam Econometric Institute Reprint series during 1957-2005.

84 citations

Journal ArticleDOI
TL;DR: This paper illustrates the computational advantages of Bayesian estimation using MCMC in several popular latent variable models; in many cases, Bayesian parameter estimation is faster than classical maximum likelihood estimation.
Abstract: Recent developments in Markov chain Monte Carlo [MCMC] methods have increased the popularity of Bayesian inference in many fields of research in economics, such as marketing research and financial econometrics. Gibbs sampling in combination with data augmentation allows inference in statistical/econometric models with many unobserved variables. The likelihood functions of these models may contain many integrals, which often makes a standard classical analysis difficult or even unfeasible. The advantage of the Bayesian approach using MCMC is that one only has to consider the likelihood function conditional on the unobserved variables. In many cases this implies that Bayesian parameter estimation is faster than classical maximum likelihood estimation. In this paper we illustrate the computational advantages of Bayesian estimation using MCMC in several popular latent variable models.

39 citations

Journal ArticleDOI
TL;DR: In this paper, the theoretical foundations of consideration set models of electoral choice are discussed, along with three methodological issues: research design, measurement, and statistical modelling. The authors recommend the use of pre-election panel surveys, direct measures of electoral consideration sets, and statistical models suitable for analysing dichotomous variables and voter-party dyads.

24 citations

Journal ArticleDOI
TL;DR: This article examines two aspects of social network influence on voters’ electoral choices that are not well understood: the role of party systems as institutional contexts, and the relationship between social pressure and information sharing as mechanisms of influence. A panel survey at the 2009 German federal election clearly confirms the first proposition; the evidence supports the second, although less unequivocally.
Abstract: This article addresses two aspects of social network influence on voters’ electoral choices that are not well understood: the role of party systems as institutional contexts and the relationship between social pressure and information sharing as mechanisms of influence. It argues that in the cleavage-based multiparty systems of Western Europe, discussant influence at elections occurs in two stages. First, discussants place social pressure on voters to opt for parties from the same ideological camp. Secondly, by providing information, discussants influence which parties voters eventually choose out of these restricted ‘consideration sets’. The study tests these assumptions using a panel survey conducted at the 2009 German federal election. The first proposition is clearly confirmed, and the evidence supports the second proposition, although less unequivocally.

14 citations

Journal ArticleDOI
TL;DR: In this paper, the authors show that the common practice of estimating models using only the set of alternatives deemed to be in the set considered by a consumer will usually result in estimated parameters that are biased due to a sample selection effect.
Abstract: The concept of a consideration set has become a central concept in the study of consumer behavior. This paper shows that the common practice of estimating models using only the set of alternatives deemed to be in the set considered by a consumer will usually result in estimated parameters that are biased due to a sample selection effect. This effect is generic to many consideration set models and can be large in practice. To overcome this problem, models of an antecedent volition process that defines consideration will effectively need to incorporate the selection mechanism used for inclusion of choice alternatives in the consideration set.

10 citations

References
Journal ArticleDOI
TL;DR: In this article, a modified Monte Carlo integration over configuration space is used to investigate the properties of a two-dimensional rigid-sphere system with a set of interacting individual molecules, and the results are compared to free volume equations of state and a four-term virial coefficient expansion.
Abstract: A general method, suitable for fast computing machines, for investigating such properties as equations of state for substances consisting of interacting individual molecules is described. The method consists of a modified Monte Carlo integration over configuration space. Results for the two‐dimensional rigid‐sphere system have been obtained on the Los Alamos MANIAC and are presented here. These results are compared to the free volume equation of state and to a four‐term virial coefficient expansion.

35,161 citations

Journal ArticleDOI
TL;DR: The analogy between images and statistical mechanics systems is made and the analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, creating a highly parallel ``relaxation'' algorithm for MAP estimation.
Abstract: We make an analogy between images and statistical mechanics systems. Pixel gray levels and the presence and orientation of edges are viewed as states of atoms or molecules in a lattice-like physical system. The assignment of an energy function in the physical system determines its Gibbs distribution. Because of the Gibbs distribution, Markov random field (MRF) equivalence, this assignment also determines an MRF image model. The energy function is a more convenient and natural mechanism for embodying picture attributes than are the local characteristics of the MRF. For a range of degradation mechanisms, including blurring, nonlinear deformations, and multiplicative or additive noise, the posterior distribution is an MRF with a structure akin to the image model. By the analogy, the posterior distribution defines another (imaginary) physical system. Gradual temperature reduction in the physical system isolates low energy states (``annealing''), or what is the same thing, the most probable states under the Gibbs distribution. The analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations. The result is a highly parallel ``relaxation'' algorithm for MAP estimation. We establish convergence properties of the algorithm and we experiment with some simple pictures, for which good restorations are obtained at low signal-to-noise ratios.

18,761 citations

Journal ArticleDOI
TL;DR: A generalization of the sampling method introduced by Metropolis et al. as mentioned in this paper is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates.
Abstract: SUMMARY A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates. Examples of the methods, including the generation of random orthogonal matrices and potential applications of the methods to numerical problems arising in statistics, are discussed. For numerical problems in a large number of dimensions, Monte Carlo methods are often more efficient than conventional numerical methods. However, implementation of the Monte Carlo methods requires sampling from high dimensional probability distributions and this may be very difficult and expensive in analysis and computer time. General methods for sampling from, or estimating expectations with respect to, such distributions are as follows. (i) If possible, factorize the distribution into the product of one-dimensional conditional distributions from which samples may be obtained. (ii) Use importance sampling, which may also be used for variance reduction. That is, in order to evaluate the integral J = ∫ f(x) p(x) dx = E_p(f), where p(x) is a probability density function, instead of obtaining independent samples x_1, ..., x_N from p(x) and using the estimate J_1 = Σ f(x_i)/N, we instead obtain the sample from a distribution with density q(x) and use the estimate J_2 = Σ {f(x_i) p(x_i)}/{q(x_i) N}. This may be advantageous if it is easier to sample from q(x) than p(x), but it is a difficult method to use in a large number of dimensions, since the values of the weights w(x_i) = p(x_i)/q(x_i) for reasonable values of N may all be extremely small, or a few may be extremely large. In estimating the probability of an event A, however, these difficulties may not be as serious since the only values of w(x) which are important are those for which x ∈ A.
Since the methods proposed by Trotter & Tukey (1956) for the estimation of conditional expectations require the use of importance sampling, the same difficulties may be encountered in their use. (iii) Use a simulation technique; that is, if it is difficult to sample directly from p(x) or if p(x) is unknown, sample from some distribution q(y) and obtain the sample x values as some function of the corresponding y values. If we want samples from the conditional distribution

14,965 citations

Journal ArticleDOI
TL;DR: If data augmentation can be used in the calculation of the maximum likelihood estimate, then in the same cases one ought to be able to use it in the computation of the posterior distribution of parameters of interest.
Abstract: The idea of data augmentation arises naturally in missing value problems, as exemplified by the standard ways of filling in missing cells in balanced two-way tables. Thus data augmentation refers to a scheme of augmenting the observed data so as to make it more easy to analyze. This device is used to great advantage by the EM algorithm (Dempster, Laird, and Rubin 1977) in solving maximum likelihood problems. In situations when the likelihood cannot be approximated closely by the normal likelihood, maximum likelihood estimates and the associated standard errors cannot be relied upon to make valid inferential statements. From the Bayesian point of view, one must now calculate the posterior distribution of parameters of interest. If data augmentation can be used in the calculation of the maximum likelihood estimate, then in the same cases one ought to be able to use it in the computation of the posterior distribution. It is the purpose of this article to explain how this can be done. The basic idea ...

4,020 citations

Journal ArticleDOI
TL;DR: In this paper, exact Bayesian methods for modeling categorical response data are developed using the idea of data augmentation, which can be summarized as follows: the probit regression model for binary outcomes is seen to have an underlying normal regression structure on latent continuous data, and values of the latent data can be simulated from suitable truncated normal distributions.
Abstract: A vast literature in statistics, biometrics, and econometrics is concerned with the analysis of binary and polychotomous response data. The classical approach fits a categorical response regression model using maximum likelihood, and inferences about the model are based on the associated asymptotic theory. The accuracy of classical confidence statements is questionable for small sample sizes. In this article, exact Bayesian methods for modeling categorical response data are developed using the idea of data augmentation. The general approach can be summarized as follows. The probit regression model for binary outcomes is seen to have an underlying normal regression structure on latent continuous data. Values of the latent data can be simulated from suitable truncated normal distributions. If the latent data are known, then the posterior distribution of the parameters can be computed using standard results for normal linear models. Draws from this posterior are used to sample new latent data, and t...

3,272 citations

Frequently Asked Questions (8)
Q1. What contributions have the authors mentioned in the paper "Consideration sets, intentions and the inclusion of “don’t know” in a two-stage model for voter choice" ?

The authors present a statistical model for voter choice that incorporates a consideration set stage and a final vote intention stage. Thus, the authors consider the missing data generating mechanism to be non-ignorable and build a model based on utility maximization to describe the voting intentions of these respondents. The authors illustrate the merits of the model as they have information on a sample of about 5000 individuals from the Netherlands for whom they know how they voted last time (if at all), which parties they would consider for the upcoming election, and what their voting intention is. The authors find that the inclusion of the consideration set stage in the model enables the user to make more precise inferences on the competitive structure in the political domain and to get better out-of-sample forecasts.

The authors find that the predictive capacity of the model is further improved by the inclusion of a state-dependence variable accounting for previous voting behavior as an explanatory variable. The development of models to accommodate such effects is an important topic for future research. The authors leave these issues for future research. However, the authors cannot split this set into all the parties it consists of, because the response parameters would be unidentified due to too few observations.

As explanatory variables, individual-specific socio-economic and demographic variables are contained in the k-dimensional vector xi. 

The term I[d_i^{-1} = j] is a dummy variable that equals 1 if the actual vote of the individual in the previous election is party j and zero otherwise.

The authors find that the addition of the consideration set stage improves predictive accuracy and strongly decreases the posterior variance of the predictions. 

One limitation is that the “other party” option includes very heterogeneous parties, whereas their model assumes that the explanatory variables affect the choice for all parties within this set in the same way.

The prediction of outcomes of elections for political parties and candidates receives much interest in democracies across the world, an interest that is reflected in intensive media coverage.
