
HAL Id: hal-02475962
https://hal.archives-ouvertes.fr/hal-02475962v2
Submitted on 10 Jun 2020
To cite this version:
Pierre Dubois, Thomas Gomez, Laurent Planckaert, Laurent Perret. Data-driven predictions of the Lorenz system. Physica D: Nonlinear Phenomena, Elsevier, 2020, 408, pp.132495. doi:10.1016/j.physd.2020.132495. hal-02475962v2

Data-driven predictions of the Lorenz system

Pierre Dubois (a, *), Thomas Gomez (a), Laurent Planckaert (a), Laurent Perret (b)

(a) Univ. Lille, CNRS, ONERA, Arts et Metiers Institute of Technology, Centrale Lille, UMR 9014 - LMFL - Laboratoire de Mécanique des Fluides de Lille - Kampé de Fériet, F-59000 Lille, France
(b) Centrale Nantes, LHEEA UMR CNRS 6598, Nantes, France

(*) Corresponding author: pierre.dubois@onera.fr
Abstract
This paper investigates the use of a data-driven method to model the dynamics of the chaotic Lorenz
system. An architecture based on a recurrent neural network with long and short term dependencies
predicts multiple time steps ahead the position and velocity of a particle using a sequence of past
states as input. To account for modeling errors and make a continuous forecast, a dense artificial
neural network assimilates online data to detect and update wrong predictions such as non-relevant
switchings between lobes. The data-driven strategy leads to good prediction scores and does not require
statistics of errors to be known, thus providing significant benefits compared to a simple Kalman filter
update.
Keywords: data-driven modeling, data assimilation, chaotic system, neural networks
1. Introduction
Chaotic dynamical systems exhibit characteristics (nonlinearities, boundedness, initial condition sensitivity) [1] encountered in real-world problems such as meteorology [2] and oceanography [3]. The multiple time steps ahead prediction of such a system is challenging because governing equations may be unknown or too costly to evaluate. For instance, the Navier-Stokes equations require prohibitive computational resources to predict with great accuracy the velocity field of a turbulent flow [4].
Data-driven modeling of dynamical systems is an active research field whose objective is to infer dynamics from data [5]. Regressive methods in machine learning [6] are particularly suitable for such tasks and have proven to reliably reconstruct the state of a given system [7]. If parameters are not overfitted to training examples, the data-driven model can also be used for predictive tasks, provided the input lies in the input domain used for training. Main techniques in the literature include autoregressive techniques [8], dynamic mode decomposition (DMD) [9], Hankel alternative view of Koopman (HAVOK) [10] or unsupervised methods such as CROM [11]. Neural networks are also of increasing interest since they can perform nonlinear regressions that are fast to evaluate. Architectures with recurrent units are recommended for time-series predictions because memory is incorporated in the prediction process. Neural networks can then learn chaotic dynamics [12] and predict the future state with great accuracy [13].
However, errors in modeling can lead to poor multiple time steps ahead predictions of chaotic dynamical systems: a tiny change in the initial condition results in a big change in the output [12]. To overcome the propagation of uncertainties from the dynamical model (bad regression choice in a data-driven approach or bad turbulence modeling in CFD, for instance), data assimilation (DA) techniques have been developed [14].

They combine the predicted state of a system with online measurements to get an updated state. Such methods have successfully been applied in fluid mechanics to obtain a better description of initial or boundary conditions by finding the best compromise between experimental measurements and CFD predictions [15]. Nevertheless, the dynamical model can be slow to evaluate (limiting its use to offline assimilations) and errors (initial condition, dynamical model, measurements and uncertainties) can be hard to estimate in real-world applications.
In this paper, a data-driven approach is used to discover a dynamical model for the Lorenz system. To handle the chaotic nature of the system, a recurrent neural network (RNN) dealing with long and short term dependencies (LSTM) is considered [16]. To correct modeling errors, a dense neural network (denoted hereafter DAN) whose design is based on Kalman filtering techniques is developed. Results are promising for predicting multiple steps ahead the position and velocity of a particle on the Lorenz attractor, using only the initial sequence and real-time measurements of the complete acceleration, the complete velocity or a single component of the velocity.
The paper is organized as follows. In Section 2, the overall strategy is presented, along with a quick overview of how neural networks work. In Section 3, results on the low dimensional Lorenz system are shown, with a particular interest in the impact of forecast horizon and noise. A discussion is given in Section 4, followed by concluding remarks.
2. Strategy
2.1. Proposed methodology
This paper investigates the use of neural networks to continuously predict a chaotic system using a data-driven dynamical model and online measurements. The method is summarized in Figure 1 and contains the following steps:
- Consider m temporal states of the system. The sequence is denoted [s]_{t-m+1}^{t}, where s is the state of the system, of dimension n_f.
- Predict n future states using a RNN with long and short-term memory (LSTM). This gives a predicted sequence [s^b]_{t+1}^{t+n}, where superscript b indicates a prediction.
- Predict the measured sequence. This gives [y^b]_{t+1}^{t+n}, where y^b is the predicted measure of the state. The mapping between the state space and the measurement space is performed by a dense neural network called the shallow encoder (SE).
- Assimilate the exact sequence of measurements [y]_{t+1}^{t+n} and update the predicted sequence of states. This work is performed by a dense neural network which gives an updated sequence [s^a]_{t+1}^{t+n}, where superscript a stands for "analyzed". The network is called the data assimilation network (DAN).
- Construct [s^a]_{t+n-m+1}^{t+n} by adding m - n updated states from the previous iteration. This gives a new input that can be used to cycle and continue the forecasting process (a minimal sketch of this cycle is given below).
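The following sketch illustrates how this cycle could be organized in code; the function names and the exact inputs expected by the DAN are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def forecast_cycle(s_input, get_measurements, rnn_lstm, se, dan, n_cycles):
    """Illustrative forecast/assimilation loop (hypothetical interfaces).

    s_input          : array (m, n_f) holding the last m known or updated states
    get_measurements : callable returning the real measurement sequence [y]_{t+1}^{t+n}
    rnn_lstm, se, dan: trained networks (dynamical model, shallow encoder, DAN)
    """
    trajectory = []
    for _ in range(n_cycles):
        # 1) predict n future states [s^b]_{t+1}^{t+n} from the last m states
        s_pred = rnn_lstm.predict(s_input[np.newaxis])[0]            # (n, n_f)
        # 2) map predicted states to predicted measurements [y^b]_{t+1}^{t+n}
        y_pred = se.predict(s_pred[np.newaxis])[0]                    # (n, n_y)
        # 3) assimilate the real measurements [y]_{t+1}^{t+n}
        y_real = get_measurements()                                   # (n, n_y)
        s_upd = dan.predict([s_pred[np.newaxis], y_pred[np.newaxis],
                             y_real[np.newaxis]])[0]                  # (n, n_f)
        trajectory.append(s_upd)
        # 4) next input: the last m - n states of the previous input
        #    followed by the n freshly updated states
        n = s_upd.shape[0]
        s_input = np.vstack([s_input[n:], s_upd])                     # (m, n_f)
    return np.vstack(trajectory)
```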
In this section, we give a quick overview of neural networks and explain the architectures behind the dynamical model (RNN-LSTM), the measurement operator (SE) and the data assimilation process (DAN).
2.2. Quick overview of neural networks
Figure 1: Summary of the data-driven method to make predictions of a chaotic system. A data-driven dynamical model (RNN-LSTM) maps the input sequence [s]_{t-m+1}^{t} to n future states [s^b]_{t+1}^{t+n}; the shallow encoder (SE) maps them to predicted measurements [y^b]_{t+1}^{t+n}; these are combined by the data assimilation network (DAN) with the real measurement sequence [y]_{t+1}^{t+n} to give the updated sequence [s^a]_{t+1}^{t+n}, from which the next input [s^a]_{t+n-m+1}^{t+n} is built.

A neuron is a unit passing a sum of weighted inputs through an activation function that introduces nonlinearities. These functions are classically a sigmoid σ(x) = 1/(1 + e^{-x}), a hyperbolic tangent tanh(x) or a rectified linear unit relu(x) = max(0, x). When neurons are organized in fully connected layers, the resulting network is called a dense neural network. The universal approximation theorem [17] states that any function can be approximated by a sufficiently large network, i.e. one hidden layer with a large number of neurons. Just like a linear regression y = ax + b aims at learning the best a and b parameters, a neural network regression y = NN(x) aims at learning the best weights and biases in the network by optimizing a loss function evaluated on a set of training data.
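As a minimal sketch of such a regression with the Keras library [21] (the data, layer size and optimizer are arbitrary illustrative choices, not the paper's settings):

```python
import numpy as np
from tensorflow import keras

# toy regression data: learn y = sin(x) on [-3, 3]
x = np.linspace(-3.0, 3.0, 1000).reshape(-1, 1)
y = np.sin(x)

# dense network y = NN(x): one hidden layer of 32 neurons (arbitrary size)
model = keras.Sequential([
    keras.layers.Dense(32, activation="tanh", input_shape=(1,)),
    keras.layers.Dense(1),
])

# learning = optimizing a loss (here the mean squared error) on training data
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=200, batch_size=32, verbose=0)

print(model.predict(np.array([[0.5]])))  # should be close to sin(0.5) ~ 0.479
```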
Although they are universal approximators, dense neural networks face some limitations: they may suffer from vanishing or exploding gradients (arising from derivatives of activation functions, see [18]), are prone to overfitting (fitting that corresponds too closely to the training data) and inputs are not individually processed. Other architectures of artificial neural networks have therefore been developed, including convolutional networks (CNN, for image recognition) or recurrent neural networks (RNN, whose inputs are taken sequentially). Recurrent networks use their internal state (denoted h) to process each input from the sequence of inputs. This internal state is computed using an activation function but, to avoid the limitations of dense networks, its form is more elaborate. For example, Long Short-Term Memory (LSTM) cells [19] are combinations of classical activation functions (sigmoids and tanh) that incorporate a long and short term memory mechanism through the cell state (see Figure 2).
Several techniques exist to learn the parameters of a neural network. The most common is gradient descent, which iteratively updates parameters according to the gradient of the cost function with respect to weights and biases. Gradients are computed by backpropagating errors through the network, using backpropagation for dense neural networks or backpropagation through time for RNN [6]. The equations can be found in [20] for the curious reader. In this paper, all neural networks are implemented using the Keras library [21].
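As a toy illustration of a gradient-descent step (not taken from the paper), applied to the earlier linear regression y = ax + b with a mean squared error loss:

```python
import numpy as np

def gradient_descent_step(a, b, x, y, lr=0.1):
    """One gradient-descent update of the parameters of y_pred = a*x + b
    for the loss L = mean((a*x + b - y)**2)."""
    error = a * x + b - y
    grad_a = 2.0 * np.mean(error * x)   # dL/da
    grad_b = 2.0 * np.mean(error)       # dL/db
    return a - lr * grad_a, b - lr * grad_b

# fit y = 3x + 1 from noisy samples
x = np.random.rand(200)
y = 3.0 * x + 1.0 + 0.01 * np.random.randn(200)
a, b = 0.0, 0.0
for _ in range(2000):
    a, b = gradient_descent_step(a, b, x, y)
print(a, b)  # converges towards (3, 1)
```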
In this paper, hyperparameters are not tuned. No grid search or genetic optimization is attempted; the number of neurons, the number of hidden layers and the activation functions are found by successive trials. The architectures defined here must therefore not be considered as a rule of thumb.
2.3. Novelty of the work
This paper proposes a regressive framework for assimilating data, as opposed to standard data assimilation techniques whose architecture does not depend on the problem. Besides, the present paper considers time marching of an entire sequence of the state, while the most standard approaches involve a time marching of the predicted state at regular time units. More details about existing works are given in Section 4.
2.4. Dynamical model
The first step is to establish a dynamical model mapping m previous states s(t) to n future states. The chosen architecture is summarized in Figure 3.

Figure 2: Two types of recurrent neural networks: (a) a simple RNN handling short-term dependencies via a hidden state h, and (b, c) an RNN-LSTM handling short and long-term dependencies via a hidden state h, a cell state C and gating mechanisms. Each time step s(j) from the input sequence is combined with h(j-1) (and C(j-1) for the LSTM-RNN) computed at the previous time step. (c) LSTM cell: the recurrent unit is composed of a cell state and gating mechanisms; the cell state C is modified when fed with a new time step from the input sequence, forgetting past information (via the Forget Gate, FG), storing new information (via the Input Gate, IG) and creating a short-term memory (via the Output Gate, OG). Mathematical details are given in the appendix.
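For reference, the standard LSTM cell of [19], to which the gates FG, IG and OG above correspond, can be written as follows (this is the textbook formulation; the authors' exact equations are given in their appendix, which is not reproduced here):

```latex
\begin{aligned}
f(j) &= \sigma\big(W_f [h(j-1), s(j)] + b_f\big) && \text{forget gate (FG)}\\
i(j) &= \sigma\big(W_i [h(j-1), s(j)] + b_i\big) && \text{input gate (IG)}\\
\tilde{C}(j) &= \tanh\big(W_C [h(j-1), s(j)] + b_C\big) && \text{candidate cell state}\\
C(j) &= f(j) \odot C(j-1) + i(j) \odot \tilde{C}(j) && \text{cell state update}\\
o(j) &= \sigma\big(W_o [h(j-1), s(j)] + b_o\big) && \text{output gate (OG)}\\
h(j) &= o(j) \odot \tanh\big(C(j)\big) && \text{hidden state}
\end{aligned}
```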
In the recurrent layer, 2m LSTM cells (making the cell state a 2m dimensional vector) process the input sequence [s]_{t-m+1}^{t}. This results in a final output o(t) = h(t) summarizing all relevant information from the input sequence. In dense layers, the final output from the recurrent layer is used to predict n future states [s^b]_{t+1}^{t+n}. Concerning the number of recurrent units, it has been chosen to echo the results of Faqih et al. [1], where the best scores were obtained by considering twice as many neurons as the history window. The authors made this conclusion after trying to predict multiple steps ahead the state of the Lorenz 63 system using a dense neural network with radial basis functions.
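A minimal Keras sketch consistent with this description (the values of m, n and n_f are placeholders, and the intermediate dense layer and output reshaping are assumptions, since the exact dense layers are not detailed here):

```python
from tensorflow import keras

m, n, n_f = 10, 5, 6   # history window, forecast horizon, state dimension (placeholders)

# recurrent layer: 2m LSTM cells processing the input sequence [s]_{t-m+1}^{t}
inputs = keras.Input(shape=(m, n_f))
h = keras.layers.LSTM(2 * m)(inputs)               # final output h(t) summarizing the sequence
# dense layers mapping h(t) to the n future states [s^b]_{t+1}^{t+n}
x = keras.layers.Dense(64, activation="relu")(h)   # hidden width is an arbitrary assumption
out = keras.layers.Dense(n * n_f)(x)
out = keras.layers.Reshape((n, n_f))(out)

model = keras.Model(inputs, out)
model.compile(optimizer="adam", loss="mse")
model.summary()
```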
About the training of the model, the procedure is as follows:
1. Simulate the system to get data t → s(t). For the considered Lorenz system, only one trajectory is simulated but it covers a good region of the phase space.
2. Split data into training and testing sets. In this work, 2/3 of the data is used for the

References

- Kingma, D. P., Ba, J. Adam: A Method for Stochastic Optimization. ICLR, 2015.
- Hochreiter, S., Schmidhuber, J. Long short-term memory. Neural Computation, 1997.
- Hornik, K., Stinchcombe, M., White, H. Multilayer feedforward networks are universal approximators. Neural Networks, 1989.
- Lorenz, E. N. Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 1963.
Frequently Asked Questions (15)
Q1. What are the contributions in "Data-driven predictions of the lorenz system" ?

This paper investigates the use of a data-driven method to model the dynamics of the chaotic Lorenz system. An architecture based on a recurrent neural network with long and short term dependencies predicts multiple time steps ahead the position and velocity of a particle using a sequence of past states as input. 

Future works could include the tuning of hyperparameters ( to have an optimal design for each neural networks ) and the application to a high dimensional attractor where, similarly to Lorenz system, extreme events could be encountered. 

To make a continuous forecast of the state using a data-driven dynamical model, it is necessary to limit the accumulation of prediction errors [23] by incorporating online data in the prediction process. 

The system is simulated using a Runge Kutta 4 method, a random initial condition and a time step of 0.005s, for a total of 15000 samples. 

Future works could include the tuning of hyperparameters (to have an optimal design for each neural networks) and the application to a high dimensional attractor where, similarly to Lorenz system, extreme events could be encountered. 

Other architectures of artificial neural networks have then been developed, including convolutional networks (CNN, for image recognition) or recurrent neural networks (RNN, inputs are taken sequentially). 

To avoid overfitting and ensure that weights and biases learned during training are relevant for future use on test set, errors evaluated on training and validation sets should be close. 

errors in modeling can lead to bad multiple time steps ahead predictions of chaotic dynamical systems: a tiny change in the initial condition results in a big change in the output [12]. 

It appears that small sequences of vxare linearly correlated to all features in the state (linear correlation coefficient close to 1), which is no longer the case for medium and large sequences where nonlinearities arise (linear correlation coefficient between 0.6 and 0.7). 

Vashista [27] directly train a RNN - LSTM network to simulate ensemble kalman filter data assimilation using the differentiable architecture search framework. 

Results are promising for predicting multiple steps ahead the position and velocity of a particle on the Lorenz attractor, using only the initial sequence and real-time measurements of the complete acceleration, the complete velocity or a single component of the velocity. 

In Loh et al. [23], authors update LSTM predictions of flow rates in gaz wells using an ensemble kalman filter, thus estimating errors via the covariance of an ensemble of predictions. 

As expected, increasing the forecast window leads to a bigger impact on the global score (e2/e1 increasing) because prediction errors accumulate on longer sequences. 

Following this method, forcing statistics appear nongaussian, with long tails corresponding to rare intermitting forcing preceding switching events (see Figures 7a and 7c). 

The authors can observe that the DAN performs better for medium and large sequences but has poor performance on small sequences compared to the Kalman filter.