# A path integral method for data assimilation

## Summary

### 1 Introduction

- Data assimilation refers to finding an estimator that derives its statistical nature from a model as well as from data; the stochastic components of the model and of the data are referred to as model error and measurement error, respectively.
- The state vector for which an estimator is sought can consist of dynamic quantities (e.g., physical variables) as well as parameters.
- Another sampling strategy is the generalized Hybrid Monte Carlo (gHMC) approach (see Ferreira and Toral (1993)).
- It leads to very small time steps in the assimilation: beyond the obvious issue of computational expense for a given accuracy tolerance, the new dimension of this problem is that, on finite-precision machines, very small steps may affect how the relative variances of the model and the data are balanced.

### 2 Problem Statement

- Considered here is the determination of the statistics of the state variable x(t), whose dimension is Nx, given incomplete and possibly imprecise observations of that system.
- The distribution of the stochastic component of the model is assumed known.
- The diffusion matrix is D, and the vector-valued dW(t) represents an incremental standard Wiener process.
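The bullets above name only the diffusion matrix D and the Wiener increment dW(t); as a point of reference, here is a minimal sketch of the implied Itô model, where the drift f and the noise factorization σ are assumptions rather than quantities fixed by the summary:

```latex
% Hypothetical Ito form of the model SDE; f and sigma are assumed,
% and conventions relating sigma to the diffusion matrix D vary:
dx(t) = f\bigl(x(t), t\bigr)\,dt + \sigma\,dW(t),
\qquad D \propto \sigma\sigma^{\mathsf{T}} .
```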

### 3 The Path Integral Method

- The Euler-Maruyama discretization is the simplest and makes the presentation of the method clearest.
- More importantly, it is a convenient scheme with regard to adapting legacy code in the assimilation strategy.
- Without loss of generality it will be assumed that the discrete time steps are equally spaced, and further, that the measurement times are commensurate with δt, the time step interval.
- The total action, or cost functional, associated with the distribution of the state variable conditioned on measurements is then U = Udynamics + Uobs (10). Rather than trying to minimize the cost functional (10), PIMC proposes to sample exp(−U), which is proportional to the conditional probability distribution.
- A slight modification of this sampling strategy, which is referred to here as gMC, would differ from the above in only one respect: the new proposal consists of a new configuration for all components of the state vector at every time slice.
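As a concrete illustration of sampling exp(−U), below is a minimal random-walk Metropolis sketch over a discretized path. The whole-path proposal mimicking gMC, the function name, and the step size are assumptions of this sketch, not the paper's exact scheme; U stands for the total action (10).

```python
import numpy as np

def metropolis_path_sampler(U, x0, n_trials, step=0.1, rng=None):
    """Sample exp(-U(x)) over a discretized path x of shape (T, Nx)
    via random-walk Metropolis. Proposing a new configuration for
    every component of the state vector at every time slice, all at
    once, corresponds to the gMC variant described above."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    u = U(x)
    accepted = 0
    for _ in range(n_trials):
        prop = x + step * rng.standard_normal(x.shape)  # global proposal
        u_prop = U(prop)
        if np.log(rng.random()) < u - u_prop:  # accept w.p. min(1, e^{-dU})
            x, u = prop, u_prop
            accepted += 1
    return x, accepted / n_trials
```

Perturbing one degree of freedom at a time instead of the whole path would recover the plain MC strategy.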

### 3.1 The Accelerated Sampler: Using Molecular Dynamics

- The two hybrid samplers considered next take the correspondence between the estimation problem and the mechanics of a multi-particle system further.
- Equally important, however, is to choose a matrix that does not increase the computational cost unreasonably: at worst, (12) can increase the cost by O(T²) due to the matrix/vector multiplies.
- The gHMC proposes a dynamic over fictitious time that may no longer be Hamiltonian, with the purpose of overcoming the torpor of standard Monte Carlo sampling.
- The Metropolis step corrects for τ-time discretization errors.
- In the manner presented here the intermediate values, in fictitious time, for the conjugate position, are not saved.
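A hedged sketch of a single hybrid trial of this kind: leapfrog (Verlet) integration in fictitious time, followed by the Metropolis correction on ΔH. The function names, the quadratic kinetic term, and the fresh-momenta draw are assumptions of this sketch, not a transcription of the paper's scheme.

```python
import numpy as np

def hmc_step(U, gradU, x, dtau, J, rng):
    """One hybrid Monte Carlo trial: J leapfrog steps of size dtau in
    fictitious time, then accept/reject on the Hamiltonian change,
    which corrects the tau-discretization error. Intermediate
    fictitious-time values are not saved, as in the text above."""
    p = rng.standard_normal(x.shape)  # fresh conjugate momenta
    x_new, p_new = x.copy(), p.copy()
    p_new -= 0.5 * dtau * gradU(x_new)  # initial half kick
    for j in range(J):
        x_new += dtau * p_new  # drift
        g = gradU(x_new)
        p_new -= (dtau if j < J - 1 else 0.5 * dtau) * g  # kick (half at end)
    H_old = U(x) + 0.5 * np.sum(p ** 2)
    H_new = U(x_new) + 0.5 * np.sum(p_new ** 2)
    if np.log(rng.random()) < H_old - H_new:  # accept w.p. min(1, e^{-dH})
        return x_new, True
    return x, False
```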

### 4 Higher Order SDE Integrators

- Nevertheless, described in what follows is how to modify the PIMC assimilation methodology to handle integrators other than Euler-Maruyama (note that the point-vortex/drifter system is in fact one in which a higher-order method would perhaps be more suitable for integrating the stochastic differential equations).
- A concrete example illustrates this: Euler-Maruyama results from retaining the first three terms in the above expression.
- The modifications required are in the definition of the cost function, relevant to the accept/reject stage, and the Hamiltonian in the molecular dynamics stage (see Section 3.1).
- Assume, for illustration, that the noise is additive and that D is constant; then the probability distribution associated with the predictor is, by definition, capable of generating samples independently of the distribution for the corrector (8).
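To make the Euler-Maruyama case concrete, here is a minimal sketch of the dynamics part of the action that the accept/reject stage would use. The scalar diffusion D and the 2·D·δt noise-variance convention are assumptions of this sketch; the paper's own weighting may differ.

```python
import numpy as np

def euler_maruyama_action(x, f, D, dt):
    """Dynamics contribution to the cost function under the
    Euler-Maruyama discretization: the squared one-step model residual
    per time slice, weighted by an (assumed) noise variance 2*D*dt.
    A higher-order integrator changes this residual, which is the
    modification to the action described above."""
    resid = x[1:] - x[:-1] - dt * f(x[:-1])  # one-step model residual
    return 0.5 * np.sum(resid ** 2) / (2.0 * D * dt)
```

A path generated by the noiseless discrete model has zero residual and hence zero dynamics action, which is a convenient sanity check.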

### 5 Implementation Details

- Presuming that the model is already in the form of (6), what is required is code that implements the MCMC trials (consisting of a scheme to generate proposals, calculate the cost function, and accept/reject the proposal within a Markov Chain Monte Carlo context).
- Proposals are generated via random walks of single degrees of freedom in MC, of sets of state vectors and time steps in gMC, or via molecular dynamics runs in HMC and gHMC.
- Within the MCMC trials loop, care must be taken when computing ∆Ĥ (see (14)): if Nx×T is very large, the computation can be degraded by loss-of-precision errors when subtracting large numbers from one another.
- Solving the Hamiltonian system of HMC, e.g., with the Verlet integrator, entails an increase in storage compared with MC.
- If the data being assimilated was produced by (6), then a good starting guess is produced by solving the SDE itself with no noise.
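The starting-guess suggestion in the last bullet can be sketched as a noiseless forward integration of the model; forward Euler and the function name are assumptions here, standing in for whatever discretization (6) actually uses.

```python
import numpy as np

def noiseless_first_guess(f, x0, dt, T):
    """Generate an initial path for the MCMC chain by integrating the
    model with the noise switched off (deterministic forward Euler),
    as suggested when the assimilated data came from the discretized
    SDE itself."""
    x = np.empty((T, np.size(x0)))
    x[0] = x0
    for n in range(T - 1):
        x[n + 1] = x[n] + dt * f(x[n])  # Euler step, no dW contribution
    return x
```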

### 5.1 The MCMC Trial Loop

- The MCMC trial section of the code, in its simplest guise, might look as in Algorithm 1.
- Far more elegant and compact versions of this algorithm are possible, but the one presented here is easy to follow.
- The while loop turns out to be a convenient construct from the point of view of monitoring progress in the MCMC loop.
- The while loop can be implemented with added diagnostic tools that can examine the statistics of the cost function itself.
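A minimal sketch of such a trial loop, using the while construct and a running record of the cost function as the diagnostic. This is an illustration in the spirit of Algorithm 1, not a transcription of it; the proposal mechanism is passed in so the same loop serves MC, gMC, or the hybrid samplers.

```python
import numpy as np

def mcmc_trials(propose, U, x0, n_target, rng):
    """Simplest form of the MCMC trial loop: generate a proposal,
    evaluate the cost function, accept/reject via Metropolis, and
    keep a history of the cost for diagnostics (e.g. statistics of
    the cost function itself, as suggested above)."""
    x, u = x0.copy(), U(x0)
    trials, accepts, cost_history = 0, 0, []
    while trials < n_target:  # while loop eases progress monitoring
        x_prop = propose(x, rng)
        u_prop = U(x_prop)
        if np.log(rng.random()) < u - u_prop:  # Metropolis rule
            x, u = x_prop, u_prop
            accepts += 1
        cost_history.append(u)  # diagnostic record of the cost
        trials += 1
    return x, accepts / trials, np.asarray(cost_history)
```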

### 5.2 Acceleration by Fictitious-Time Molecular Dynamics

- The parameters dτ and J are used to change the ratio of accepted trials to total trials.
- gMC results from not using the molecular dynamics routine.
- When the model is simple the subroutine representing Fk can be built by hand.
- J is the number of fictitious time steps, and N{·} is the number of MCMC trials required by each of the sampling methods.
- This defines a "correlation length" for the method, which, incidentally, pins down the value of N{·}: statistical stability of the results was obtained with N{·} on the order of 8 to 10 times the correlation length.
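One common way to estimate such a correlation length from, e.g., the cost-function history is an integrated-autocorrelation estimator. The sum-until-negative truncation below is a standard heuristic assumed for this sketch, not necessarily the paper's own procedure.

```python
import numpy as np

def correlation_length(series):
    """Estimate the integrated autocorrelation time of an MCMC series
    (e.g. the cost-function history). The text reports statistical
    stability with roughly 8-10x this length in trials. The sum is
    truncated at the first non-positive autocorrelation."""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    acf = np.correlate(s, s, mode="full")[len(s) - 1:]
    acf /= acf[0]  # normalize so acf[0] == 1
    tau = 1.0
    for k in range(1, len(acf)):
        if acf[k] <= 0:
            break
        tau += 2.0 * acf[k]
    return tau
```

For nearly independent samples the estimate is close to 1; strongly correlated chains give much larger values, and N{·} would scale accordingly.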

### 5.3 Testing the Outcomes

- Debugging the code is greatly facilitated by the relatively simple structure of the algorithm: the gMC strategy can be checked by turning off the molecular dynamics routine.
- Finally gHMC can be tested by checking for agreement with gMC and HMC.
- Monitoring the spectrum of the cost function is also very useful.
- Another test requires generating the data y(tm) by solving the SDE.
- For MCMC trials on the order of 10 million, the mean relative error was found to be very small for all of the methods used in the next section to evaluate the samplers.

### 6 Example Calculations

- The example featured here consists of producing the conditional mean history and the uncertainty of a system comprising Np passive drifters and Nv point vortices (see Friedrichs (1966)).
- They attribute failure in the estimation, mainly, to repeated crossings of saddle points by the drifter.
- The estimated histories, shown in the following figures, were computed via HMC with J = 1.
- The predicted standard deviation for the drifter position is plotted as a function of time in Figure 2b.
- In Figure 2c appear the estimated vortical paths, as connected dots; the connected circles are the vortical paths that were not used in the assimilation process but were produced by the numerical run that generated the drifter data.

### 6.1 Comparison of the Different Samplers

- Table 1 summarizes the computational efficiency of the methods in obtaining the results portrayed in Figure 2. HMCJ refers to HMC using J fictitious time steps.
- The time of each run is quoted in seconds, all of the cases were run using 10 million MCMC trials.
- The correlation time, in seconds, for gMC is only about 1.5 times better than for MC, and higher than for a couple of the HMC entries; however, this strategy does not require a gradient.
- In order for this method to have been competitive on this specific problem, the correlation length would have had to be approximately 353, a value that seemed impossible to achieve, though trial and error was the only strategy available to test this conjecture computationally.
- It is also possible that the circulant matrix was well matched to the double-well problem but not to the Lagrangian problem, which would explain why gHMC produced much more impressive results on the double-well problem than on the Lagrangian one, by the correlation-length metric.

### 6.2 Assimilating Discretized-SDE Data

- PIMC should deliver highly accurate estimates when the discretized SDE that was used in deriving the action is also used to produce the data and the uncertainty in the data is small.
- This should still hold in the hidden-variable case.
- To illustrate this, drifter data was generated by actually solving the SDE via (6), with all parameters equal to those used in generating the results in Figure 2.
- Figure 3a shows the drifter path data as well as its estimated mean, and Figure 3b shows the assimilated vortical paths.
- Granted, the data is considerably smoother and less noisy than that in Figure 2a, since the noise enters the creation of the data in a different way.

### 6.3 Uncertainties, Data Insertion Frequency

- Considered here are two more examples that further illustrate how the method deals with changes in data parameters.
- The data was the same as that used in generating Figure 2.
- The first case will correspond to dropping the insertion rate from every 4 to every 15 time steps.
- Figure 4 shows how the results are modified by a significantly lower contribution in the cost function from observations.
- The insertion interval of every 15 steps was chosen because it is not commensurate with the inherent frequencies of motion of the dynamical system (a commensurate interval would either follow the twists and turns of the drifter data more faithfully or suppress the higher-frequency motions completely).

### 7 Conclusions

- PIMC is a data assimilation scheme which makes use of the discretized model in the formulation of the cost function.
- The preferred outcomes of the assimilation process are moments of histories conditioned on data, and thus the structure of the action is also influenced by a Bayesian interrelation between model and data.
- In order to exploit the computational savings that the hybrid Monte Carlo methods afford, a code for the gradient of the action is needed.
- gHMC had a higher computational overhead than the other methods explored here.
- When linear and linearized methods fail, the matter of efficiency becomes a moot point, and the feasibility of getting an estimate, particularly one that is optimal or near-optimal, makes the methods developed for nonlinear/non-Gaussian problems viable.
