
Estimation of ergodic agent-based models by simulated minimum distance

01 Feb 2015 - Journal of Economic Dynamics and Control - Vol. 51, pp 148-165 - Jakob Grazzini and Matteo Richiardi
TL;DR: In this article, the authors show how to consistently estimate ergodic models by simulated minimum distance techniques, both in a long-run equilibrium and during an adjustment phase, under a variety of conditions.
About: This article was published in the Journal of Economic Dynamics and Control on 2015-02-01 and is currently open access. It has received 122 citations to date. The article focuses on the topic of ergodic theory.

Summary (6 min read)

1 Introduction

  • Agent-based (AB) models are sometimes considered as a candidate to replace or at least complement dynamic stochastic general equilibrium (DSGE) as the standard tool for macroeconomic analysis.
  • When present, empirical validation is often limited to some ad hoc calibration of the relevant parameters; this resembles the state of the art in DSGE modeling a few years ago, which has since moved toward more formal estimation.
  • The authors then turn to some basic questions concerning estimation of AB models.
  • Therefore, the aggregate properties of an AB model remain hidden in the complexity of the relations among the different elements and the different layers (micro, macro and possibly meso) of the system.
  • In section 6 the authors give two examples of estimation of ergodic AB models: in the first, estimation is performed in an absorbing equilibrium; in the second, in a transient equilibrium. (In this paper they focus on estimation of ergodic models.)

2 Little AB models grow big

  • AB models have long been considered as theoretical exercises aimed at investigating the macro effects arising from the interaction of many individuals –each following possibly simple rules of behavior– or the individual routines/strategies underlying some observed macro phenomenon (Richiardi, 2012).
  • Here, empirical validation is crucial, as the lack of empirical relevance is the ultimate critique that has been leveled at DSGE models.
  • While in the last decade the literature on DSGE models has evolved from simple calibration to proper estimation, AB macro models are still lagging behind, although some sophisticated examples of calibration are starting to close the gap (Bianchi et al., 2007, 2008; Fabretti, 2012; Cirillo and Gallegati, 2012).
  • As such, estimation is concerned with the properties of the estimators and the quantification of the uncertainty around the estimates.
  • Winker and Gilli (2001) and Gilli and Winker (2003) estimate respectively two and three parameters of an AB model of the foreign exchange market, by employing the method of simulated moments (MSM).

3 Estimation of DSGE models

  • In DSGE models aggregation is generally not a problem, thanks to a very low level of heterogeneity.
  • A common strategy to solve the models involves the linearization of first order conditions and constraints by means of a Taylor approximation around the steady state.
  • In the presence of nonstationarity, the model is rescaled and detrended in order to make it stationary around a balanced growth path; the trend is often assumed to follow an exogenous stochastic process.
  • Turning to estimation methods, old-vintage DSGE models (like most current AB macro models) were mainly calibrated: the values of the parameters were chosen according to some external knowledge, theoretical belief or empirical criterion.
  • The tide has however turned in favor of a more formal estimation approach.

3.1 Maximum likelihood

  • This is the standard method for estimating DSGE models, so it deserves a little more space.
  • Calibrating some parameters and estimating the others conditional on the calibrated values raises serious identification issues (see subsection 5.2.2 below).
  • If, however, the state space representation is not linear and the shocks are not normal, filtering becomes more complicated and must be performed by means of simulation (for instance, with the use of the so-called particle filters –see below): the conditional distribution of the states given the past observations and the parameters is replaced by a simulated distribution.
  • In turn, the flatness of the likelihood function raises identification problems. (Strictly speaking, stochastic singularity is a feature of linearized DSGE models, but it may also have implications for the estimation of nonlinear models, depending on the extent to which they differ from their linearized counterparts.)
  • Alternatively, one could think of using simulated empirical maximum likelihood to approximate the likelihood function, as for instance in Kristensen and Shin (2012).
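
A minimal sketch of the particle-filter idea just mentioned, applied to approximating the likelihood of a generic nonlinear, non-Gaussian state-space model by simulation. The transition rule, measurement density and parameter vector below are illustrative placeholders, not the models discussed in the paper:

```python
import numpy as np

def bootstrap_particle_loglik(y, theta, n_particles=1000, seed=0):
    """Approximate log-likelihood of observations y under a toy nonlinear
    state-space model, using the bootstrap particle filter. The transition
    and measurement equations are illustrative placeholders."""
    rho, sigma_x, sigma_y = theta                      # hypothetical parameters
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)      # draws from the initial state distribution
    loglik = 0.0
    for obs in y:
        # propagate each particle through the (nonlinear) transition equation
        particles = rho * np.tanh(particles) + sigma_x * rng.normal(size=n_particles)
        # weight particles by the measurement density p(y_t | x_t), here Gaussian
        weights = np.exp(-0.5 * ((obs - particles) / sigma_y) ** 2) / (sigma_y * np.sqrt(2.0 * np.pi))
        loglik += np.log(weights.mean() + 1e-300)      # simulated likelihood contribution of y_t
        # resample in proportion to the weights: the conditional distribution of the
        # states given past observations is replaced by a simulated distribution
        weights = weights / weights.sum()
        particles = rng.choice(particles, size=n_particles, p=weights)
    return loglik
```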

3.2 Simulated minimum distance

  • The main alternatives to ML are the generalized method of moments (GMM), the method of simulated moments (MSM) and indirect inference (II, also referred to as the extended method of simulated moments, EMSM).
  • SMD summarizes the real system and the artificial system by means of a set of functions on the observed and artificial data and minimizes the difference between the “short summaries” of the two systems (a minimal sketch is given at the end of this list).
  • Often, a vector autoregression (VAR) is used as meta-model.
  • ML estimation is limited by the number of linearly independent variables, while moment-based estimation is limited by the number of linearly independent moments; “[t]he latter is a weaker restriction because it is possible to find independent moments that incorporate information about more variables than those that are linearly independent” (Ruge-Murcia, 2007, p. 2600).
  • He concludes that moment-based estimation methods (GMM and MSM) compare very favorably to the more widely used method of ML.
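
To fix ideas, here is a hedged sketch of the SMD objective described above: the same vector of summary statistics is computed on the real and the simulated data, and a weighted distance between the two is minimized. The particular statistics (mean, variance, lag-1 autocorrelation) and the `simulate(theta, seed)` interface are illustrative assumptions, not the paper's specification:

```python
import numpy as np

def summary_stats(x):
    """'Short summary' of a time series: mean, variance and lag-1 autocorrelation."""
    x = np.asarray(x, dtype=float)
    return np.array([x.mean(), x.var(), np.corrcoef(x[:-1], x[1:])[0, 1]])

def smd_objective(theta, real_data, simulate, weight=None, seed=123):
    """Weighted quadratic distance between observed and simulated summaries.
    `simulate(theta, seed)` is a hypothetical interface to the model; the seed
    is held fixed across evaluations (common random numbers)."""
    h_obs = summary_stats(real_data)
    h_sim = summary_stats(simulate(theta, seed))
    W = np.eye(len(h_obs)) if weight is None else weight
    diff = h_obs - h_sim
    return float(diff @ W @ diff)
```

Estimation then amounts to minimizing `smd_objective` over the parameter space, for instance by the grid search sketched in subsection 5.2 below.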

4 AB models as recursive systems

  • This is an essential feature, as the elimination of simultaneous equations allows results to be obtained by simulating the model recursively. (As an aside, indirect inference has a very interesting application which allows a model to be estimated without directly using the data.)
  • While eq. (2) has an explicit analytical formulation in DSGE models, it remains only implicitly defined by the micro transition equations (1) in AB models.
  • Here and in the following the authors use “behavioral rules” and similar terms in a loose sense that encompasses the actual intentional behaviors of individuals as well as other factors such as technology. For a discussion of the Markov chain representation of AB models, see Izquierdo et al. (2009).
  • In general it may happen that, for a given Markov chain, some projections are Markov and others are not.
  • Let Z0 = {X0, s} be the set of initial conditions of the simulation, where X0 is the initial state of the system and s stands for the seed(s) used by the random number generator(s) in the simulation, which determine(s) the evolution of ξt and κt.
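
The recursive structure described in this section can be made concrete with a short, purely illustrative sketch: micro states are updated period by period by the transition rules (eq. 1), the aggregate is obtained as a projection G of the micro states, and the whole path is pinned down by the initial conditions Z0 = {X0, s}. The specific updating rule and parameter names are assumptions made only for the example:

```python
import numpy as np

def run_ab_model(theta, x0, seed, n_periods=200):
    """Recursive simulation of a stylized AB model. x0 plays the role of X_0 and
    seed the role of s, so (x0, seed) = Z_0 fully determines the simulated path.
    The micro transition rule below is a placeholder, not a model from the paper."""
    rng = np.random.default_rng(seed)          # governs the idiosyncratic shocks
    x = np.array(x0, dtype=float)              # micro states X_t, one entry per agent
    y_path = []
    for t in range(n_periods):
        xi = rng.normal(scale=theta["sigma"], size=x.shape)   # shocks xi_t
        # micro transition equations (1): each agent reacts to its own state,
        # to the current aggregate, and to its shock
        x = (1.0 - theta["adjust"]) * x + theta["adjust"] * x.mean() + xi
        # aggregation: projection G from micro states to the aggregate variable
        y_path.append(x.mean())
    return np.array(y_path)                    # aggregate series Y_t
```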

4.1 Equilibrium

  • One important difference between DSGE and AB models lies in the definition of equilibrium.
  • On the contrary, if Yt is stationary but not ergodic, different (absorbing and/or transient) equilibria are obtained, for the same values of the parameters, depending on the initial conditions.
  • Once the time series settles into such a statistical regularity, also referred to as an equilibrium, its properties are suited for estimation.
  • This new regularity breaks down when GDP reaches its steady state: it is therefore a transient statistical equilibrium.

5 Estimation of AB models

  • In the following the authors assume that the model is correctly specified: this is a fundamental hypothesis, which implies that the model perfectly describes the real world, that the structural parameters exist and have a finite value and that all the parameters in the real system are represented in the model.
  • This can be seen directly in eq. (3): while it is always theoretically possible to solve for Yt by substituting in eqs. (1) and (2), in practice the only way to analyze the mapping of (X0, θ) into Yt is by means of Monte Carlo analysis.
  • At an empirical level, ergodicity of the true DGP cannot be assessed in the real data, as Nature only plays once, and is therefore assumed (or derived from the assumption of correct specification).
  • To summarize, there are two issues in estimation of AB models, and they both descend from the fact that the properties of the model are not analytically known: the objective function used for estimation must be numerically evaluated, and ergodicity and stationarity of the aggregated series used for estimation must be tested.
  • The first problem is common to other modeling strategies (as in DSGE), and calls for the use of simulation-based econometric techniques (Stern, 1997, 2000).

5.1 Consistency in minimum distance and simulated minimum distance estimators

  • Define a set of estimators ĥn and a vector of functions h(θ), where the estimators are functions of the data, the functions h(θ) are the mapping between the model and the estimators and θ are the structural parameters.
  • The minimum distance estimator belongs to the class of extremum estimators, which also includes maximum likelihood, nonlinear least square and generalized method of moments.
  • The theorem states that if Q̂n(θ) converges to Q0(θ) for all θ ∈ Θ and Q0(θ) is maximized only at the true parameters, then the limit of the maximum θ̂ is the maximum of the limit θ0.
  • If it is possible (in terms of computational time) to simulate the theoretical moments for each possible combination of the parameters, the technique gives the value of the objective function for all θ ∈ Θ, so that a θ̂ is chosen such that the objective function is maximized.
  • The authors now turn to investigate under which conditions an AB model can be consistently estimated by SMD.

5.2 Consistency conditions and AB models

  • Consistency conditions for AB models refer to the properties of the statistics used for estimation.
  • The attention therefore turns from the objective function to the constituents of the objective function.
  • Preliminarily, note that in a computer program such as an AB model the objective function is discrete by construction, since only a countable number of values of the parameters can be tested.
  • Fixing the random numbers across iterations is required to achieve uniform convergence (McFadden, 1989; Pakes and Pollard, 1989).
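
Putting the last two points together, a hedged sketch of a grid-search SMD estimator looks as follows: the objective is evaluated on a discrete grid of parameter values, and the random numbers behind each evaluation are held fixed.

```python
import numpy as np

def smd_grid_estimate(objective, grid):
    """Evaluate an SMD objective on a discrete grid of parameter values and return
    the minimizer together with the whole objective profile. `objective(theta)` is
    assumed to reuse the same random seed at every call (common random numbers)."""
    values = np.array([objective(theta) for theta in grid])
    best = int(np.argmin(values))
    return grid[best], values

# usage sketch, assuming the smd_objective interface sketched in section 3.2:
# grid = np.linspace(0.1, 0.9, 81)
# theta_hat, profile = smd_grid_estimate(lambda th: smd_objective(th, data, simulate), grid)
```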

5.2.1 Convergence

  • The first essential condition for convergence of the objective function is the convergence of its elements to their true value.
  • Each statistic used in the objective function should therefore be tested for stationarity in the simulated data (a sketch of such a check is given after this list).
  • The second condition is that the observed moments converge in probability to their expected value, so that in turn the objective function Q̂n(θ) converges in probability to Q0(θ).
  • Note that the authors are talking here about convergence in time.
  • This is a crucial point which distinguishes estimation from many calibration exercises, where cross-sectional statistics are used to characterize a model.
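
A minimal sketch of the kind of check implied above: test each simulated aggregate series for stationarity before using its time average as a longitudinal moment. The augmented Dickey-Fuller test from statsmodels is used purely as one possible illustration; the paper does not prescribe this specific test:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def check_longitudinal_moment(series, alpha=0.05):
    """Run an augmented Dickey-Fuller test on a simulated series and, if the null
    of a unit root is rejected, return its time average as a candidate moment."""
    series = np.asarray(series, dtype=float)
    pvalue = adfuller(series, autolag="AIC")[1]
    stationary = pvalue < alpha                 # rejection => treat the series as stationary
    moment = series.mean() if stationary else np.nan
    return stationary, moment
```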

5.2.2 Identification

  • Canova and Sala (2009) distinguish between observational equivalence (when the objective function does not display a unique minimum), under-identification (when the objective function is independent of certain structural parameters), partial identification (when some parameters enter the objective function only as a linear combination of other parameters), and weak identification (when the objective function has a unique minimum, but its curvature is very flat at the minimum).
  • As they argue, “[t]hese problems turn out to be inherent to DSGE models –they are typically produced because the mapping from the structural parameters to the coefficients of the solution is ill-behaved– and standard choices of data summaries and of objective functions may make identification deficiencies worse” (p. 432).
  • A common practice when dealing with identification issues is to reduce the number of parameters to be estimated (for instance by dropping observationally equivalent parameters from the specification) or calibrate some of the parameters and estimate the others, conditional on the calibrated values.
  • Monotonicity implies that the only point in the parameter space in which all h(θ) = h0 is θ0.
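
One simple diagnostic in this spirit is to trace the simulated binding function h(θ) along a grid for each parameter and check that it is neither flat nor non-monotonic. The sketch below is an informal probe of that kind, not a formal identification test:

```python
import numpy as np

def probe_identification(moment_fn, grid, tol=1e-8):
    """Evaluate a simulated moment h(theta) along a one-dimensional grid and flag
    apparent flatness (under-identification) or non-monotonicity (risk of multiple
    minima). `moment_fn(theta)` should hold the random seed fixed, so that
    differences across grid points reflect theta only."""
    h = np.array([moment_fn(theta) for theta in grid])
    diffs = np.diff(h)
    flat = bool(np.all(np.abs(diffs) < tol))
    monotone = bool(np.all(diffs >= -tol) or np.all(diffs <= tol))
    return {"flat": flat, "monotone": monotone, "h_values": h}
```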

5.3 Small-sample estimation bias

  • If the moments are a nonlinear function of the parameters, the authors get a bias in the estimates.
  • This is easy to show in the case of perfect identification when there is only one parameter and one moment.
  • The observed moment can be written as h(θ0, ζn) = h(θ0) + ζn (eq. 10): the “true” moment plus an error ζn which depends on the sample (and on the sample size).
  • In the same way it is possible to show that if the moment function is concave, the estimated parameter is upward biased if the moment function is increasing and downward biased if the moment function is decreasing (the toy Monte Carlo sketched after this list illustrates the convex case).
  • For simplicity, the authors do not consider this additional source of variability here.
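
The mechanism can be reproduced with a toy Monte Carlo: inverting a convex, increasing moment function at a noisy observed moment yields estimates that are downward biased on average, by Jensen's inequality. The quadratic moment function below is an arbitrary illustration, not one of the paper's moments:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0
h = lambda th: th ** 2          # convex, increasing moment function (illustrative)
h_inv = np.sqrt                 # its inverse on the positive half-line

# observed moment = true moment + sampling error zeta_n, as in eq. (10)
zeta = rng.normal(scale=0.5, size=100_000)
theta_hat = h_inv(np.clip(h(theta0) + zeta, 0.0, None))   # invert the moment condition

print(theta_hat.mean() - theta0)   # negative on average: downward small-sample bias
```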

5.4 Summary: estimation of ergodic models

  • As the authors have seen, estimating an ergodic AB model by means of simulated minimum distance is conceptually straightforward, the crucial step being the choice of appropriate longitudinal statistics (longitudinal “moments”) to match simulated and real data.
  • In order for these longitudinal moments to be meaningful quantities, the time series considered (both real and simulated) must be stationary: (weak) stationarity implies, in loose terms, that the theoretical mean and variance are constant in time.
  • Stationarity can be a permanent property of the time series, that is, once it becomes stationary, the time series remains stationary.
  • This the authors call an absorbing equilibrium of the model, with respect to the time series considered.
  • The observed moments are then constructed by averaging over the available real data.

6 Examples

  • In the first example the model is estimated in an absorbing equilibrium; in the second, in a transient equilibrium.
  • In both cases, because the moments used for estimation are non-linear, the authors get a small-sample bias (of predictable direction).
  • All estimation strategies are explored by means of Monte Carlo experiments: pseudo-true data are created by running the model with some chosen value of the parameters; then, estimates are obtained by matching the simulated moments to the moments computed on the pseudo-true data.
  • This procedure is repeated many times in order to get a distribution of the estimates.
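
Schematically, the Monte Carlo protocol just described can be written as follows; `simulate` and `estimate_by_smd` stand for a generic AB model and a generic SMD routine and are assumed interfaces, not the authors' code:

```python
import numpy as np

def monte_carlo_experiment(simulate, estimate_by_smd, theta_true, n_replications=100):
    """Monte Carlo assessment of an SMD estimator: generate pseudo-true data from the
    model at theta_true, estimate theta by matching simulated moments to the
    pseudo-true moments, and repeat to build a distribution of estimates."""
    estimates = []
    for rep in range(n_replications):
        pseudo_true = simulate(theta_true, seed=rep)      # 'real' data with known parameters
        estimates.append(estimate_by_smd(pseudo_true))    # one estimate per replication
    estimates = np.array(estimates)
    return estimates.mean() - theta_true, estimates.std() # small-sample bias and dispersion
```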

6.1 Estimation in an absorbing equilibrium

  • The model used here is an AB stock market model proposed in Cliff and Bruten (1997) to reproduce the experimental results obtained by Smith (1962), showing how a small number of inexperienced traders converge rapidly to a competitive equilibrium under the double auction mechanism (Smith, 1962, p.157).
  • In each period traders look at the book and define a target price τi(t): traders increase their target price if the last trade occurred at a high price, and lower it otherwise –see Cliff and Bruten (1997) for details (a deliberately stylized caricature of such a rule is sketched after this list).
  • The behavior around the equilibrium price, on the other hand, depends on β.
  • The variance is convex in the parameter: this introduces a downward bias in the estimates.
  • Figure 3 shows the shape of the objective function, for one particular Monte Carlo experiment (pseudo-true series) using the variance (left panel) and the standard deviation (right panel).
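
As flagged above, the following is a deliberately stylized caricature of a target-price updating rule of this kind, intended only to show where a parameter like β enters; it is not Cliff and Bruten's (1997) actual specification:

```python
import numpy as np

def update_target_price(tau, last_trade_price, beta, rng):
    """Toy rule: move the trader's target price tau toward the last trade price at
    speed beta, plus a small idiosyncratic perturbation. A caricature of the kind of
    rule used in Cliff and Bruten (1997), for illustration only."""
    noise = rng.normal(scale=0.01)
    return tau + beta * (last_trade_price - tau) + noise
```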

6.2 Estimation in a transient equilibrium

  • The authors estimate an AB version of the well-known Bass (1969) model, which provides a mathematical explanation of the different stages of product adoption described in Rogers (1962) (innovators, early adopters, early majority, late majority, and laggards), and formalizes the crucial distinction between innovators and imitators.
  • The authors therefore focus on the adjustment process, looking for regularities which define the transient equilibria of the system.
  • First, the authors get an estimate p̂ for the external influence parameter, conditional on a specific value of the population parameter m, by matching the observed and simulated adoption rate at t = 0.
  • Because the moment τ is not linear in m (figure 5), the authors get an upward small sample bias.
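
For concreteness, here is a hedged sketch of an agent-based Bass-type diffusion process and of the first statistic used above, the adoption rate at t = 0 (whose expectation is p when nobody has adopted yet, so matching it identifies p conditional on m). The implementation details are illustrative, not the authors' exact model code:

```python
import numpy as np

def simulate_bass_ab(p, q, m, n_periods=50, seed=0):
    """Agent-based Bass diffusion: each of m agents who has not yet adopted does so
    with probability p + q * (share of adopters so far) in every period.
    Returns the number of new adopters in each period."""
    rng = np.random.default_rng(seed)
    adopted = np.zeros(m, dtype=bool)
    new_adopters = []
    for t in range(n_periods):
        prob = p + q * adopted.sum() / m                  # external + internal influence
        adopt_now = (~adopted) & (rng.random(m) < prob)   # Bernoulli adoption decisions
        new_adopters.append(int(adopt_now.sum()))
        adopted |= adopt_now
    return np.array(new_adopters)

# simulated adoption rate at t = 0, to be matched to its observed counterpart:
# rate0 = simulate_bass_ab(p=0.03, q=0.4, m=10_000)[0] / 10_000
```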

7 Conclusions

  • In this paper the authors have identified simulated minimum distance as a natural approach to estimation of AB models.
  • The theoretical quantities or statistics used for characterizing the model conditional on the values of the parameters (the moments, for instance), for which no analytical expression is available in AB models, are replaced by their simulated counterparts.
  • This requires that these statistics are appropriately chosen so that their estimates in the simulated data converge to the theoretical values.
  • One can think of many open issues and avenues for research.
  • Finally, one could think of DSGE and AB models as a mechanism for generating priors, rather than a model of the data.

Citations
Book ChapterDOI
TL;DR: A review of the existing validation techniques for agent-based models in economics can be found in this paper, where the authors sketch a simple theoretical framework that conceptualizes existing validation approaches, which examine along three different dimensions: (i) comparison between artificial and real-world data; (ii) calibration and estimation of model parameters; and (iii) parameter space exploration.
Abstract: Since the survey by Windrum et al. (Journal of Artificial Societies and Social Simulation 10:8, 2007), research on empirical validation of agent-based models in economics has made substantial advances, thanks to a constant flow of high-quality contributions. This Chapter attempts to take stock of such recent literature to offer an updated critical review of the existing validation techniques. We sketch a simple theoretical framework that conceptualizes existing validation approaches, which we examine along three different dimensions: (i) comparison between artificial and real-world data; (ii) calibration and estimation of model parameters; and (iii) parameter space exploration. Finally, we discuss open issues in the field of ABM validation and estimation. In particular, we argue that more research efforts should be devoted toward advancing hypothesis testing in ABM, with specific emphasis on model stationarity and ergodicity.

194 citations

Journal ArticleDOI
TL;DR: In this article, the authors present a critical discussion of the theoretical, empirical and political-economy pitfalls of the DSGE-based approach to policy analysis and suggest that a more fruitful research avenue to pursue is to explore alternative theoretical paradigms, which can escape the strong theoretical requirements of neoclassical models (e.g., equilibrium, rationality, representative agent, etc.).
Abstract: The Great Recession seems to be a natural experiment for macroeconomics showing the inadequacy of the predominant theoretical framework - the New Neoclassical Synthesis - grounded on the DSGE model. In this paper, we present a critical discussion of the theoretical, empirical and political-economy pitfalls of the DSGE-based approach to policy analysis. We suggest that a more fruitful research avenue to pursue is to explore alternative theoretical paradigms, which can escape the strong theoretical requirements of neoclassical models (e.g., equilibrium, rationality, representative agent, etc.). We briefly introduce one of the most successful alternative research projects - known in the literature as agent-based computational economics (ACE) - and we present the way it has been applied to policy analysis issues. We then provide a survey of agent-based models addressing macroeconomic policy issues. Finally, we conclude by discussing the methodological status of ACE, as well as the (many) problems it raises.

173 citations

Posted Content
01 Jan 2015
TL;DR: Issues surrounding variance stability, sensitivity analysis, spatio-temporal analysis, visualization, and effective communication of all these to non-technical audiences, such as various stakeholders are examined.
Abstract: The proliferation of agent-based models (ABMs) in recent decades has motivated model practitioners to improve the transparency, replicability, and trust in results derived from ABMs. The complexity of ABMs has risen in stride with advances in computing power and resources, resulting in larger models with complex interactions and learning and whose outputs are often high-dimensional and require sophisticated analytical approaches. Similarly, the increasing use of data and dynamics in ABMs has further enhanced the complexity of their outputs. In this article, we offer an overview of the state-of-the-art approaches in analyzing and reporting ABM outputs highlighting challenges and outstanding issues. In particular, we examine issues surrounding variance stability (in connection with determination of appropriate number of runs and hypothesis testing), sensitivity analysis, spatio-temporal analysis, visualization, and effective communication of all these to non-technical audiences, such as various stakeholders.

157 citations

Journal ArticleDOI
TL;DR: The critical choices that must be made in developing an ABM, namely the modelling of decision processes and social networks, are identified and advice on how these challenges might be overcome is offered.
Abstract: We review agent-based models (ABM) of human migration with respect to their decision-making rules. The most prominent behavioural theories used as decision rules are the random utility theory, as implemented in the discrete choice model, and the theory of planned behaviour. We identify the critical choices that must be made in developing an ABM, namely the modelling of decision processes and social networks. We also discuss two challenges that hamper the widespread use of ABM in the study of migration and, more broadly, demography and the social sciences: (a) the choice and the operationalisation of a behavioural theory (decision-making and social interaction) and (b) the selection of empirical evidence to validate the model. We offer advice on how these challenges might be overcome.

150 citations


Cites background or methods from "Estimation of ergodic agent-based m..."

  • ...The parameters most in line with the data can be estimated consistently using simulated minimum distance (Grazzini and Richiardi 2015)....

  • ...Those are the values that are the most likely or that minimise a function of the distance between observations and simulations (see, e.g. Kennedy and O’Hagan 2001; Grazzini and Richiardi 2015)....

Journal ArticleDOI
TL;DR: In this article, a surrogate meta-model is proposed for parameter space exploration and calibration of agent-based models (ABM) combining supervised machine learning and intelligent sampling to reduce computation time.

148 citations

References
Book
01 Jan 1962
TL;DR: A history of diffusion research can be found in this paper, where the authors present a glossary of developments in the field of Diffusion research and discuss the consequences of these developments.
Abstract: Contents Preface CHAPTER 1. ELEMENTS OF DIFFUSION CHAPTER 2. A HISTORY OF DIFFUSION RESEARCH CHAPTER 3. CONTRIBUTIONS AND CRITICISMS OF DIFFUSION RESEARCH CHAPTER 4. THE GENERATION OF INNOVATIONS CHAPTER 5. THE INNOVATION-DECISION PROCESS CHAPTER 6. ATTRIBUTES OF INNOVATIONS AND THEIR RATE OF ADOPTION CHAPTER 7. INNOVATIVENESS AND ADOPTER CATEGORIES CHAPTER 8. DIFFUSION NETWORKS CHAPTER 9. THE CHANGE AGENT CHAPTER 10. INNOVATION IN ORGANIZATIONS CHAPTER 11. CONSEQUENCES OF INNOVATIONS Glossary Bibliography Name Index Subject Index

38,750 citations

Journal ArticleDOI
TL;DR: Upon returning to the U.S., author Singhal’s Google search revealed the following: in January 2001, the impeachment trial against President Estrada was halted by senators who supported him and the government fell without a shot being fired.

23,419 citations


Additional excerpts

  • ...…AB version of the well known Bass (1969) model, which provides a mathematical explanation of the different stages of product adoption described in Rogers (1962) (innovators, early adopters, early majority, large majority, and laggards), and formalizes the crucial distinction between innovators…...

Journal ArticleDOI
TL;DR: In this article, the authors argue that the style in which their builders construct claims for a connection between these models and reality is inappropriate, to the point at which claims for identification in these models cannot be taken seriously.
Abstract: Existing strategies for econometric analysis related to macroeconomics are subject to a number of serious objections, some recently formulated, some old. These objections are summarized in this paper, and it is argued that taken together they make it unlikely that macroeconomic models are in fact over identified, as the existing statistical theory usually assumes. The implications of this conclusion are explored, and an example of econometric work in a non-standard style, taking account of the objections to the standard style, is presented. THE STUDY OF THE BUSINESS cycle, fluctuations in aggregate measures of economic activity and prices over periods from one to ten years or so, constitutes or motivates a large part of what we call macroeconomics. Most economists would agree that there are many macroeconomic variables whose cyclical fluctuations are of interest, and would agree further that fluctuations in these series are interrelated. It would seem to follow almost tautologically that statistical models involving large numbers of macroeconomic variables ought to be the arena within which macroeconomic theories confront reality and thereby each other. Instead, though large-scale statistical macroeconomic models exist and are by some criteria successful, a deep vein of skepticism about the value of these models runs through that part of the economics profession not actively engaged in constructing or using them. It is still rare for empirical research in macroeconomics to be planned and executed within the framework of one of the large models. In this lecture I intend to discuss some aspects of this situation, attempting both to offer some explanations and to suggest some means for improvement. I will argue that the style in which their builders construct claims for a connection between these models and reality-the style in which "identification" is achieved for these models-is inappropriate, to the point at which claims for identification in these models cannot be taken seriously. This is a venerable assertion; and there are some good old reasons for believing it;2 but there are also some reasons which have been more recently put forth. After developing the conclusion that the identification claimed for existing large-scale models is incredible, I will discuss what ought to be done in consequence. The line of argument is: large-scale models do perform useful forecasting and policy-analysis functions despite their incredible identification; the restrictions imposed in the usual style of identification are neither essential to constructing a model which can perform these functions nor innocuous; an alternative style of identification is available and practical. Finally we will look at some empirical work based on an alternative style of macroeconometrics. A six-variable dynamic system is estimated without using 1 Research for this paper was supported by NSF Grant Soc-76-02482. Lars Hansen executed the computations. The paper has benefited from comments by many people, especially Thomas J. Sargent

11,195 citations


"Estimation of ergodic agent-based m..." refers background in this paper

  • ...However, Sims (1980) warns against judging macro models only in terms of fit with the data, and suggests that large scale models may fit the data well but may perform poorly out-of-sample due to non-credible identification restrictions....

Journal ArticleDOI
TL;DR: In this article, the authors developed a model of staggered prices along the lines of Phelps (1978) and Taylor (1979, 1980), but utilizing an analytically more tractable price-setting technology.

8,580 citations


"Estimation of ergodic agent-based m..." refers methods in this paper

  • ...Similarly, heterogeneity in pricing decisions by firms is modeled according to a variant of the Calvo (1983) or Taylor (1980) mechanism (see Dixon and Bihan, 2012), which allows a solution without explicitly tracking the distribution of prices across firms. (9)Linearization however is not neutral: it eliminates asymmetries, threshold effects and many other interesting phenomena. Other solution methods, which however also involve some degree of approximation, are projection algorithms and value function iteration (Fernández-Villaverde, 2010). (10)An intermediate approach is to calibrate some parameters, and then estimate the others conditional on the values of the calibrated set. This raises serious identification issues (see subsection 5.2.2 below). (11)See Canova (2007), Tovar (2008) and Fernández-Villaverde (2010) for an introduction....

  • ...Similarly, heterogeneity in pricing decisions by firms is modeled according to a variant of the Calvo (1983) or Taylor (1980) mechanism (see Dixon and Bihan, 2012), which allows a solution without explicitly tracking the distribution of prices across firms....

Journal ArticleDOI
01 Apr 1993
TL;DR: An algorithm, the bootstrap filter, is proposed for implementing recursive Bayesian filters, represented as a set of random samples, which are updated and propagated by the algorithm.
Abstract: An algorithm, the bootstrap filter, is proposed for implementing recursive Bayesian filters. The required density of the state vector is represented as a set of random samples, which are updated and propagated by the algorithm. The method is not restricted by assumptions of linearity or Gaussian noise: it may be applied to any state transition or measurement model. A simulation example of the bearings only tracking problem is presented. This simulation includes schemes for improving the efficiency of the basic algorithm. For this example, the performance of the bootstrap filter is greatly superior to the standard extended Kalman filter.

8,018 citations


"Estimation of ergodic agent-based m..." refers background in this paper

  • ...Since their introduction in the early 1990s (Gordon et al., 1993), particle filters –also known as sequential Monte Carlo methods– have become a popular solution of optimal estimation problems in nonlinear nonGaussian scenarios....

Frequently Asked Questions (18)
Q1. What are the contributions mentioned in the paper "Estimation of ergodic agent-based models by simulated minimum distance" ?

In this paper the authors show how to circumvent these difficulties and under which conditions ergodic models can be consistently estimated by simulated minimum distance techniques, both in a long-run equilibrium and during an adjustment phase. 

Of course, the analysis and recommendations contained in this paper must be regarded as a mere introduction to the problem of estimating AB models (in their companion paper (Grazzini and Richiardi, 2014) the authors extend the analysis to estimation of non-ergodic models). Third, the possibility of applying Bayesian methods should be investigated, not only in conjunction with simulated maximum likelihood but also with the other estimation procedures. However, including priors allows for a more general estimation procedure, and leaves open the possibility of using little informative priors. In this respect, the authors can think of many open issues and avenues for research.

In particular, since the simulated moments (or other summary measures) are used as an estimate of the theoretical moments, it is crucial to know whether the corresponding estimators are consistent. 

In particular, DSGE models not only feature a large number of parameters, but share with AB models an important aspect of complex systems: they include many nonlinear feedback effects. 

The minimum distance estimator belongs to the class of extremum estimators, which also includes maximum likelihood, nonlinear least square and generalized method of moments. 

The tests are used to assess whether the statistics of interest are constant in time and across runs: the stationarity test uses samples from a given simulation run, while the ergodicity test uses samples from different runs. 

35If the real data cannot be tested for stationarity due to small sample size, the assumption of correct specification must again be invoked. 

A vector of aggregate variables Yt is defined as a (vectorial) function over the state of the system, that is, as a projection from X to Y: Yt = G(Xt, κt).

The first one is stochastic singularity, which arises when a small number of structural shocks is used to generate predictions about a large number of observable variables: the model then predicts a deterministic linear combination of observable variables which causes the likelihood to be 0 with probability 1. Solutions to this problem involve reducing the number of observable variables on which inference is made (or using a projection from the set of observables to a smaller set of composite indicators), increasing the number of shocks, or adding measurement errors.

Because the theoretical moments cannot be analytically derived, the authors proceed by simulating them: stationarity (and ergodicity) ensure that the empirical means computed on the artificial data converge to the theoretical ones. 

This is because of the curse of dimensionality, the fact that the convergence of any estimator to the true value of a smooth function defined on a space of high dimension (the parameter space) is very slow (De Marchi, 2005; Weeks, 1995). 

If the state space is nonlinear, an extended Kalman filter (XKF) can be implemented on a system linearized around the steady state; this provides sub-optimal inference, but still relies on the Gaussian assumption.

The consistency conditions for extremum estimators, including the minimum distance estimator defined in equation (7), are given in the following theorem (Newey and McFadden, 1994, p. 2121): if there is a function Q0(θ) such that (i) Q0(θ) is uniquely maximized at θ0; (ii) Θ is compact; (iii) Q0(θ) is continuous; and (iv) Q̂n(θ) converges uniformly in probability to Q0(θ), then θ̂n converges in probability to θ0.

For some of them direct estimation techniques can be used, as the models are simple enough to derive a closed form solution for the distribution of relevant statistics. 

As stated by Canova and Sala (2009, p. 448), when models are under- or weakly identified, “reasonable estimates are obtained not because the model and the data are informative but because auxiliary restrictions make the likelihood of the data (or a portion of it) informative.”

This is because “ML estimation is limited by the number of linearly independent variables while moment-based estimation is limited by the number of linearly independent moments. 

the only way to analyze the mapping of (X0,θ) into Y t is by means of Monte Carlo analysis, by simulating the model for different initial states and values of the parameters, and repeating each simulation experiment many times to obtain a distribution of Y t. 

An intermediate approach is to calibrate some parameters, and then estimate the others conditional on the values of the calibrated set.