
HAL Id: hal-02179706
https://hal.archives-ouvertes.fr/hal-02179706
Submitted on 11 Jul 2019
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
A Simple Machine Learning Technique for Model
Predictive Control
Didier Georges
To cite this version:
Didier Georges. A Simple Machine Learning Technique for Model Predictive Control. MED
2019 - 27th Mediterranean Conference on Control and Automation, Jul 2019, Akko, Israel.
DOI: 10.1109/MED.2019.8798512. HAL Id: hal-02179706.

A Simple Machine Learning Technique for Model Predictive Control
Didier Georges
Abstract— This paper is devoted to a simple approach for the offline computation of closed-loop optimal control for dynamical systems with an imposed terminal state, as arising in model predictive control (MPC). The proposed approach relies only on integrations of the characteristic equations associated with the optimal control problem, together with classical supervised learning of a one-hidden-layer neural network, to obtain a closed-loop MPC law computed entirely offline. Several examples are provided in the paper, which demonstrate the ability of this approach to tackle quite large problems, with state dimensions reaching 50, without encountering limitations due to the so-called curse of dimensionality.
I. INTRODUCTION
The computation of nonlinear model predictive control in closed-loop form still remains a challenge due to the so-called curse of dimensionality of the associated optimal control problem. An alternative is to solve the associated optimal control problem online. However, the computational cost may be incompatible with real-time requirements for fast systems. Except for linear-quadratic optimal control problems, the computation of closed-loop solutions remains largely challenging when the state dimension is typically greater than 5. Several attempts have been made to compute the offline closed-loop solution to nonlinear optimal control problems by using a polynomial approximation of the solution of the associated Hamilton-Jacobi-Bellman equation [?] or some more general functional approximations of the optimal control thanks to Galerkin approaches [?], [?]. Several approaches based on reinforcement learning and adaptive dynamic programming have also been proposed [?]. However, these latter approaches remain computationally expensive for medium- or large-scale nonlinear systems. In practice, all the above-mentioned approaches fail to offer practical solutions to problems whose state dimension is greater than 3 or 4, and are not appropriate for including terminal constraints. Other approaches can be derived by using model reduction to deal with the optimal control of a "small" system (see [?] for a recent paper in the linear case). However, such an approach has yet to be extended to the closed-loop optimal control of nonlinear systems. In this paper, the combination of a supervised learning technique with the integration of the characteristics of the Hamilton-Jacobi equation of the optimal control problem associated with an MPC scheme, with a terminal state constraint and possibly some input bounds, is proposed and tested on several case studies.
The paper is organized as follows. In Section 2, some background is provided on nonlinear MPC and the necessary conditions for optimality of the associated optimal control problem. Section 3 describes the control design methodology proposed in this paper. In Section 4, four illustrative examples demonstrate the effectiveness and simplicity of the approach. Some conclusions and perspectives are given in Section 5.

1 Didier Georges is with Univ. Grenoble Alpes, CNRS, Grenoble INP (Institute of Engineering Univ. Grenoble Alpes), GIPSA-lab, 38000 Grenoble, France. didier.georges@grenoble-inp.fr
II. SOME BACKGROUND ON MODEL PREDICTIVE
CONTROL OF NONLINEAR SYSTEMS
We consider a class of nonlinear systems defined by

$\dot{x}(t) = F(x(t)) + G(x(t))u(t)$,   (1)

where $x(t) \in \mathbb{R}^n$ and $u(t) \in \mathbb{R}^m$, with $F(0) = 0$ (0 is an equilibrium state of the system). $F$ and $G$ are also assumed to be at least continuously differentiable.
In this paper we consider the continuous-time model predictive control of such systems around the origin with a terminal state constraint, which consists in:

1) Solving at time instant $t$, knowing the current state $x(t)$, an open-loop optimal control problem with finite control horizon $t + T$ defined by:

$\min_u \int_t^{t+T} L(x(\tau), u(\tau)) \, d\tau$   (2)

subject to $\dot{x}(\tau) = F(x(\tau)) + G(x(\tau))u(\tau)$, $\tau \in [t, t+T]$, with $x(t)$ known, $x(t+T) = 0$, and where

$L(x, u) = l(x) + \frac{1}{2} u^T R(x) u - g(x)^T u$,   (3)

$l(x) > 0, \ \forall x \neq 0, \quad l(0) = 0$,   (4)

$R(x) = R^T(x) > 0, \ \forall x$,   (5)

where $R(x)$ is an $m \times m$ matrix and $g(x)$ is a vector of $\mathbb{R}^m$.
2) Applying the optimal control solution $u^*(t)$ obtained at time $t$. At time $t + \epsilon$, the system reaches a new state

$x(t+\epsilon) = x(t) + \int_t^{t+\epsilon} \left( F(x(\tau)) + G(x(\tau))u^*(\tau) \right) d\tau \approx x(t) + \epsilon \left( F(x(t)) + G(x(t))u^*(t) \right)$.

3) Repeating the above sequence with $t \leftarrow t + \epsilon$.
Under mainly system controllability and zero-state observability of $(L(x,0), F(x))$ assumptions, it can be shown that the optimal cost function $V(t, x(t)) = \min_u \int_t^{t+T} L(x(\tau), u(\tau)) \, d\tau$ with $x(t+T) = 0$ is a Lyapunov function of the closed-loop system [?] under the control law $u^*(t, x(t))$, the optimal control solution of (2) at time $t$. Therefore, the model predictive control scheme described above ensures asymptotic stability around the origin.

In what follows, we will consider a nonsingular optimal control problem defined from 0 to $T$, since both $F$ and $G$ do not explicitly depend on $t$ (time-invariant systems).

If $H(x, p) = L(x, u) + p^T (F(x) + G(x)u)$ defines the so-called Hamiltonian associated with the problem, Pontryagin's principle [?] provides the following necessary conditions for optimality:

$\nabla_u H = 0 \implies u^*(t) = g(x(t)) - R^{-1}(x(t)) G^T(x(t)) p(t)$,   (6)

$\dot{x} = \nabla_p H \implies \dot{x} = F(x) + G(x) u^*$,   (7)

$x(0)$ known, $x(T) = 0$,   (8)

$\dot{p} = -\nabla_x H \implies \dot{p} = -\nabla_x L(x, u^*) - \nabla_x [F(x) + G(x) u^*]^T p$,   (9)

where $p(t)$ is the adjoint state of the system.
where p(t) is the adjoint state of the system.
This defines a two-point boundary value problem (TPBVP) which can be solved, for instance, by using a shooting method, which consists in finding $p(0, x(0))$ such that $x(T) = 0$, in order to get $u^*(0)$ thanks to (6). A basic shooting method can be defined as the solution of the following nonlinear least-squares problem:

$\min_{p(0)} \frac{1}{2} \| x(T) \|^2$   (10)

subject to

$\dot{x} = F(x) + G(x) g(x) - G(x) R^{-1}(x) G^T(x) p, \quad x(0)$ known,   (11)

$\dot{p} = -\nabla_x L(x, g(x) - R^{-1}(x) G^T(x) p) - \nabla_x [F(x) + G(x) g(x) - G(x) R^{-1}(x) G^T(x) p]^T p$.   (12)

Once $p(0)$ is obtained, the optimal control at time $t = 0$, $u^*(0, x(0))$, can easily be derived from (6).
Multi-shooting methods ([?]) are recommended for the computation of TPBVPs with a large horizon $T$, to avoid ill-conditioning.
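As a concrete illustration, the basic shooting method can be sketched in a few lines of Python (an illustrative sketch on a toy scalar problem, not the paper's MATLAB implementation): take $F(x) = x$, $G(x) = 1$, $l(x) = x^2/2$, $R = 1$, $g = 0$, so that $u^* = -p$ and the characteristics (11)-(12) reduce to $\dot{x} = x - p$, $\dot{p} = -x - p$.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Toy single-shooting solve of TPBVP (10)-(12): find p(0) such that x(T) = 0.
T = 1.0
x0 = 1.0  # known initial state x(0)

def characteristics(t, y):
    x, p = y
    return [x - p, -x - p]   # xdot = F + G u*, pdot = -grad_x H, with u* = -p

def terminal_miss(p0):
    sol = solve_ivp(characteristics, (0.0, T), [x0, p0[0]],
                    rtol=1e-9, atol=1e-9)
    return sol.y[0, -1]      # residual x(T), driven to 0 by the optimizer

res = least_squares(terminal_miss, x0=[0.0])  # min_{p(0)} ||x(T)||^2 / 2
p0 = res.x[0]
u0 = -p0                     # optimal control at t = 0, from (6)
print(p0, u0)
```

For the linear toy dynamics the residual is linear in $p(0)$, so the least-squares solve converges almost immediately; for genuinely nonlinear dynamics and long horizons, the multi-shooting variants cited above are preferable.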
In practice, solving TPBVP (10)-(12) can be very time consuming, especially for large systems, and the approach is then not appropriate for fast systems, since the TPBVP has to be solved at each control time instant.
Rather than trying to solve TPBVP (10)-(12) at every control time instant, we will consider an approach which generates offline a closed-loop solution of the model predictive control scheme defined above. This approach relies on a sequence of simple integrations of differential equations (11) and (12) performed backward in time, always starting from $x(T) = 0$. It can be noticed that equations (11) and (12) are nothing else than the equations of the characteristics of the Hamilton-Jacobi-Bellman equation

$\frac{\partial V}{\partial t}(t, x) + \min_{u(\cdot)} H\left(x, \frac{\partial V}{\partial x}\right) = 0$,   (13)

associated with optimal control problem (2), where $p(t) = \frac{\partial V}{\partial x}$.
Constrained input case: According to Pontryagin's principle, if $u$ is constrained to belong to a compact set $U$ (for instance a hypercube of $\mathbb{R}^m$), necessary condition (6) has to be replaced by

$u^* = \arg\min_{u \in U} H \implies u^* = \mathrm{Proj}_U \left( g(x) - R^{-1}(x) G^T(x) p \right)$,   (14)

where $\mathrm{Proj}_U$ denotes the projection operator onto $U$. The approach remains unchanged by replacing (6) with (14) in characteristic equations (11) and (12), which however become discontinuous, potentially making the integration much more tricky.
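When $U$ is a hypercube, the projection in (14) is simply a componentwise clip of the unconstrained control, so the constrained law stays cheap to evaluate. A minimal sketch (function names and toy data are illustrative, not from the paper):

```python
import numpy as np

# Sketch of the projected-control formula (14) for a hypercube U:
# the projection reduces to a componentwise clip.
def projected_control(x, p, g, Rinv, G, u_min, u_max):
    """u* = Proj_U(g(x) - R^{-1}(x) G^T(x) p)."""
    u_unc = g(x) - Rinv(x) @ G(x).T @ p
    return np.clip(u_unc, u_min, u_max)

# Toy data: m = 1 input bounded in [0, 1], as for a duty cycle.
u = projected_control(
    x=np.zeros(2), p=np.array([2.0, 0.0]),
    g=lambda x: np.array([0.5]),
    Rinv=lambda x: np.eye(1),
    G=lambda x: np.array([[1.0], [0.0]]),
    u_min=0.0, u_max=1.0,
)
print(u)   # g - R^{-1} G^T p = 0.5 - 2.0 = -1.5, clipped to 0.0
```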
III. CLOSED-LOOP SOLUTION TO THE MODEL
PREDICTIVE CONTROL PROBLEM
The offline derivation of a closed-loop solution of the model predictive control scheme relies on two ingredients:

1) The definition of a one-hidden-layer neural network, with $N$ neurons in the hidden layer, $n$ input neurons and $m$ output neurons, used to approximate $u^*(x)$, the model predictive control given by (6) at time $t = 0$, as a function of any initial state $x(0) = x$ in a sufficiently large domain:

$u^*(x) \approx \sum_{i=1}^{N} w_i \, \sigma(\alpha_i^T x + b_i) = \Phi_\theta(x)$,   (15)

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ or $\tanh$ represents the neuron activation function, and $w_i \in \mathbb{R}^m$, $\alpha_i \in \mathbb{R}^n$, $b_i \in \mathbb{R}$, and $\theta = (w_i, \alpha_i, b_i)_{i=1,\ldots,N}$.

It has been shown in [?] that such networks can be used as multi-dimensional approximants. In this paper, the best results have been obtained with the $\tanh$ activation function. The use of multi-layer networks does not seem to provide significant improvements in the present case, according to preliminary experiments. However, this point should be investigated more deeply.
2) The use of low-discrepancy sequences such as the ones proposed by Halton, Sobol or Faure (see [?] for instance) to generate learning sequences. Such sequences have been proposed to solve the problem of optimally choosing $M$ samples $x_i$ in a hypercube $C = [0,1]^n$ so as to "minimize holes", in the sense of the best possible approximation of integrals:

$\left| \frac{1}{M} \sum_{i=1}^{M} f(x_i) - \int_C f(x) \, dx \right| \leq W(f) \frac{\log(M)^n}{M}$,   (16)

where $W(f)$ is the variation of $f$ in the sense of Hardy & Krause. This approach usually provides better approximation results than approaches based on random sequences for $n \leq 20$.

Low-discrepancy sequences could be used to generate initial state sequences by "filling" a hypercube of $\mathbb{R}^n$ and then computing the corresponding sequence of TPBVPs (10)-(12). Even though this approach can certainly provide an effective way to get a closed-loop approximation to the nonlinear optimal solution thanks to supervised learning, it turns out to be very computationally expensive for medium/large-scale systems.
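As a tooling sketch (an assumption: the paper used MATLAB, while SciPy ships Sobol generators in `scipy.stats.qmc`), a low-discrepancy sequence in a given box can be produced as follows, here mirroring the boost-converter example of Section IV with 2000 samples of $p(T)$ in $[-0.4, 0.4]^2$:

```python
import numpy as np
from scipy.stats import qmc

# Sobol sampling of the adjoint terminal values p_i(T) in a box V_d.
n, M = 2, 2000
sampler = qmc.Sobol(d=n, scramble=False)
unit = sampler.random(M)                             # Sobol points in [0, 1]^n
p_T = qmc.scale(unit, l_bounds=[-0.4] * n, u_bounds=[0.4] * n)
print(p_T.shape)                                     # (2000, 2)
```

(SciPy emits a balance warning when $M$ is not a power of two; the samples are still valid.)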
In order to overcome this drawback, a low-discrepancy sequence of $M$ samples $\{p_i(T)\}_{i=1,\ldots,M}$, each of them belonging to a domain $V_d$ of $\mathbb{R}^n$, is generated. Then a standard backward-in-time numerical integration of characteristic equations (11) and (12) is performed from $t = T$ down to the initial time $t = 0$, with each of the $M$ samples $\{p_i(T)\}_{i=1,\ldots,M}$ and $x(T) = 0$, in order to get a sequence of $M$ samples $x_i(0, p_i(T))$, representing the initial states of the optimal trajectories satisfying $x(T) = 0$, together with the set of $M$ samples of control $u_i^*(x_i(0, p_i(T)))$ obtained by using control equation (6) or (14).
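The backward data-generation pass just described can be sketched as follows, again on the toy scalar problem $F(x) = x$, $G = 1$, $l(x) = x^2/2$, $R = 1$, $g = 0$ (so $u^* = -p$) rather than on the paper's examples:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate the characteristics backward from x(T) = 0, p(T) = p_i
# down to t = 0 to collect training pairs (x_i(0), u_i*(x_i(0))).
T, M = 1.0, 64
p_T_samples = np.linspace(-0.5, 0.5, M)   # stand-in for a Sobol sequence

def characteristics(t, y):
    x, p = y
    return [x - p, -x - p]                # xdot = F + G u*, pdot = -grad_x H

X0, U0 = [], []
for pT in p_T_samples:
    sol = solve_ivp(characteristics, (T, 0.0), [0.0, pT], rtol=1e-9, atol=1e-9)
    x0, p0 = sol.y[:, -1]                 # state and adjoint at t = 0
    X0.append(x0)
    U0.append(-p0)                        # u*(x_i(0)) from (6)
X0, U0 = np.array(X0), np.array(U0)
print(X0.shape, U0.shape)                 # the M supervised-learning pairs
```

Note that `solve_ivp` handles the backward integration directly when given a decreasing time span `(T, 0)`.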
A classical supervised learning technique is finally used to tune the parameters $\theta$ of the neural network, as the solution of the following nonlinear regression problem, using the $M$ pairs of samples $(x_i(0, p_i(T)), u_i^*(x_i(0, p_i(T))))$, $i = 1, \ldots, M$, generated above:

$\min_\theta \frac{1}{2} \sum_{i=1}^{M} \left\| \Phi_\theta(x_i(0, p_i(T))) - u_i^*(x_i(0, p_i(T))) \right\|^2$.   (17)

Many approaches can be used to solve this problem (stochastic gradient, quasi-Newton, etc.; see [?] for instance). In this paper, the Levenberg-Marquardt algorithm [?] implemented in the MATLAB neural network toolbox has been used. In the author's experience, this approach appears to be more efficient than the stochastic gradient approach for this class of problems. Due to nonconvexity, several learning trials are needed in order to retain the best solution in the least-squares sense.
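A sketch of this regression step with a hand-rolled one-hidden-layer tanh network and a Levenberg-Marquardt-type solver (the training pairs below are synthetic stand-ins for the $(x_i(0), u_i^*)$ samples, and SciPy replaces the MATLAB toolbox used in the paper):

```python
import numpy as np
from scipy.optimize import least_squares

# Fit the one-hidden-layer network (15) by solving regression (17),
# keeping the best of several random trials (the problem is nonconvex).
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200)             # stand-ins for the x_i(0, p_i(T))
U = np.tanh(2.0 * X)                        # stand-ins for the u_i*(x_i(0))
N = 5                                       # hidden neurons

def phi(theta, x):
    w, a, b = np.split(theta, 3)            # output weights w_i, alpha_i, b_i
    return np.tanh(np.outer(x, a) + b) @ w  # sum_i w_i tanh(a_i x + b_i)

best = min(
    (least_squares(lambda th: phi(th, X) - U,
                   rng.normal(scale=0.5, size=3 * N), method="lm")
     for _ in range(5)),                    # several learning trials
    key=lambda r: r.cost,
)
print(best.cost)                            # least-squares cost of retained fit
```

The `method="lm"` option selects a Levenberg-Marquardt implementation; it requires at least as many residuals as parameters, which holds here ($200 \geq 3N$).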
Remarks.

- The adjoint state domain $V_d$ should be adequately chosen to ensure a well-defined domain $V$ of the $x_i(0, p_i(T))$'s including the origin (for instance a hypercube of the form $V = \prod_{i=1}^{n} [a_i, b_i]$). For that purpose, the computation of a finite number of TPBVPs (10)-(12) with initial state $x(0)$ defined in $V$ can be helpful to infer $V_d$ as a hypercube in $\mathbb{R}^n$ of the form $V_d = \prod_{i=1}^{n} [a_i^d, b_i^d]$.

- The control horizon $T$ should not be chosen too large, to avoid the ill-conditioning and/or discontinuities which occur in nonlinear problems when the trajectories of characteristic equations (11) and (12) intersect, generating shocks and leading to integration failures and singularities.
IV. SOME ILLUSTRATIVE EXAMPLES
A. Constrained nonlinear model predictive control of a boost converter

In this section, the model predictive control of a boost converter is considered. DC-to-DC boost converters are a class of power electronics devices used to step up an input voltage ([?]). This system exhibits non-minimum-phase behavior, which makes it quite difficult to control.
The average model of a boost converter is given by

$\frac{di}{dt} = -(1-u) \frac{v}{L} + \frac{E}{L}, \quad \frac{dv}{dt} = (1-u) \frac{i}{C} - \frac{v}{RC}$,   (18)

where $E$ is the input voltage, $i$ is the inductance current, $v$ is the output voltage, and $u$ is the duty cycle of the converter acting as the control input. The duty cycle is constrained to belong to the interval $[0, 1]$.
In order to avoid ill-conditioning, a classical normalization procedure is performed using the following changes of variables:

$\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} \frac{1}{E}\sqrt{L/C} & 0 \\ 0 & \frac{1}{E} \end{pmatrix} \begin{pmatrix} i \\ v \end{pmatrix}, \quad \tau = \frac{t}{\sqrt{LC}}$.   (19)
The normalized average model is then given by

$\frac{dz_1}{d\tau} = -(1-u) z_2 + 1, \quad \frac{dz_2}{d\tau} = (1-u) z_1 - \frac{1}{R\sqrt{C/L}} z_2$.   (20)
Here we consider the nonlinear MPC around the equilibrium $(z_1^e, z_2^e, u^e)$ of the boost converter described by Table I.
The optimal control problem associated with the MPC scheme is defined by

$\min_{u \in [0,1]} \frac{1}{2} \int_0^2 \left( \| z - z^e \|^2 + (u - u^e)^2 \right) dt$,   (21)

where $z = (z_1, z_2)^T$ and $z^e = (z_1^e, z_2^e)^T$, subject to the average model dynamics (20).
TABLE I
BOOST CONVERTER PARAMETERS

L = 15 mH, C = 50 µF, R = 50 Ω, E = 12 V, i_e = 2.165 A, v_e = 2.5 V, u_e = 0.6
Fig. 1 shows the closed-loop dynamics of the boost converter under the constrained MPC, together with the constrained closed-loop control input; Fig. 2 shows the learned input as a function of the two states. Clearly, the bounds of the duty cycle are satisfied. The number of neurons in the hidden layer is equal to 40. The number of Sobol sequence samples of the final-time adjoint state $p(T) \in [-0.4, 0.4]^2$ is equal to 2000. Although the characteristic equations are nonsmooth due to the input constraints, the MATLAB function ode45 was successfully used for the numerical integration. The retained solution was the best among 10 learning trials.
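As a quick numerical sanity check of the normalized model (20) and Table I (a sketch under the assumption that the tabulated $i_e$, $v_e$ are the normalized equilibrium values $z_1^e$, $z_2^e$), one can verify that $(z_1, z_2) = (2.165, 2.5)$ with $u = u^e = 0.6$ is a stable equilibrium:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Verify the equilibrium of (20) for the Table I parameters and u = u_e.
L_, C_, R_ = 15e-3, 50e-6, 50.0
k = np.sqrt(L_ / C_) / R_          # = 1 / (R sqrt(C/L)), damping term in (20)
u_e = 0.6
z_e = np.array([2.165, 2.5])

def boost(tau, z, u):
    z1, z2 = z
    return [-(1.0 - u) * z2 + 1.0, (1.0 - u) * z1 - k * z2]

residual = boost(0.0, z_e, u_e)    # approximately (0, 0) at the equilibrium
sol = solve_ivp(boost, (0.0, 30.0), 1.05 * z_e, args=(u_e,), rtol=1e-9)
print(residual, sol.y[:, -1])      # perturbed state decays back toward z_e
```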
B. A small-scale linear MPC problem

The goal here is to compare the approximate solution of the following MPC problem, for a controllable but unstable linear system (with 3 unstable eigenvalues) and control horizon $T = 4$:

$\min_u \frac{1}{2} \int_t^{t+4} \left( \| x \|^2 + \| u \|^2 \right) d\tau$   (22)

subject to

$\dot{x} = Ax + Bu, \quad x \in \mathbb{R}^6, \ u \in \mathbb{R}^2, \quad x(t+4) = 0$,   (23)

Fig. 1. Constrained MPC of the boost converter (closed-loop dynamics: normalized current and voltage; control input).
Fig. 2. Constrained MPC duty cycle as a function of the two states (neural network closed-loop input).
where the coefficients of both $A$ and $B$ were randomly generated using a uniform distribution on $[-1, 1]$, with the reference solution derived from the explicit solution of the Hamiltonian system given by

$\begin{pmatrix} \dot{x} \\ \dot{p} \end{pmatrix} = H \begin{pmatrix} x \\ p \end{pmatrix} = \begin{pmatrix} A & -BB^T \\ -I_6 & -A^T \end{pmatrix} \begin{pmatrix} x \\ p \end{pmatrix}$,   (24)

$x(t)$ known, $x(t+T) = 0$. The closed-loop control is explicitly given by

$u(t, x(t)) = -B^T p(t)$,   (25)

$p(t) = -E_2(t+T)^{-1} E_1(t+T) x(t)$,   (26)

where $E_1$ and $E_2$ are given by

$E(t) = \begin{pmatrix} E_1(t) & E_2(t) \\ E_3(t) & E_4(t) \end{pmatrix} = \exp(Ht)$.   (27)
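The explicit reference solution (24)-(27) can be checked numerically. The sketch below uses a smaller random system ($n = 4$, $m = 2$, horizon $T = 2$; illustrative data, not the paper's) and verifies that the resulting trajectory meets the terminal constraint:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Explicit LQ-with-terminal-constraint solution via the Hamiltonian matrix.
rng = np.random.default_rng(1)
n, m, T = 4, 2, 2.0
A = rng.uniform(-1.0, 1.0, (n, n))
B = rng.uniform(-1.0, 1.0, (n, m))
H = np.block([[A, -B @ B.T],
              [-np.eye(n), -A.T]])         # Hamiltonian matrix of (24)

E = expm(H * T)
E1, E2 = E[:n, :n], E[:n, n:]              # upper blocks of (27)
x0 = rng.uniform(-1.0, 1.0, n)
p0 = -np.linalg.solve(E2, E1 @ x0)         # p = -E2^{-1} E1 x, as in (26)
u0 = -B.T @ p0                             # closed-loop control (25) at t = 0

sol = solve_ivp(lambda t, y: H @ y, (0.0, T), np.concatenate([x0, p0]),
                rtol=1e-10, atol=1e-10)
print(np.linalg.norm(sol.y[:n, -1]))       # ||x(T)||, close to 0
```

Invertibility of $E_2$ holds generically for a random controllable pair, which is why a random draw is acceptable in this sketch.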
Supervised learning was performed using a Sobol sequence of 3000 samples $p_i(t+T)$ defined in the hypercube $[-0.5, 0.5]^6$. The hidden layer of the neural network used 25 neurons. Fig. 3 shows the closed-loop dynamics with the best approximate control, in terms of least-squares cost function value, obtained after solving 10 learning problems (17) for different initializations of the network parameters $\theta$.
Figs. 3 and 4 show very similar behaviors. The relative error between the Hamiltonian solution $u$ and the approximate solution $u_a$,

$\frac{\int_t^{t+T} \| u(\tau) - u_a(\tau) \|^2 \, d\tau}{\int_t^{t+T} \| u(\tau) \|^2 \, d\tau}$,

is equal to $7.6 \times 10^{-2}$.
Fig. 3. Approximate MPC solution (closed-loop dynamics under the NN MPC; control inputs).
Fig. 4. Explicit linear MPC solution (closed-loop dynamics with the Hamiltonian solution; control inputs).
C. A medium-scale nonlinear MPC problem
In this section, the ability of the approach to deal with a
medium-scale MPC problem is investigated.

References

- G. Cybenko, "Approximation by superpositions of a sigmoidal function."
- D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert, "Constrained model predictive control: Stability and optimality" (survey).
- R. W. Erickson, "Fundamentals of Power Electronics."
- C. T. Kelley, "Iterative Methods for Optimization."
- "Approximation by Superpositions of a Sigmoidal Function."
Frequently Asked Questions

Q1. What are the contributions mentioned in the paper "A Simple Machine Learning Technique for Model Predictive Control"?

This paper is devoted to a simple approach for the offline computation of closed-loop optimal control for dynamical systems with an imposed terminal state arising in model predictive control (MPC). Some examples are provided in the paper which demonstrate the ability of this approach to tackle some quite large problems, with state dimensions reaching 50, without encountering limitations due to the so-called curse of dimensionality.

Further studies are still needed to evaluate medium-scale nonlinear problems and to investigate optimal state estimation. A potential limitation of this approach is the ill-conditioning and loss of continuity of the characteristic equations when the value of the control horizon is too large. Since the method intrinsically solves an optimal control problem with a final state constraint, it can also be used to solve other regular optimal control problems (with a final-time cost, for instance).

The medium-scale example demonstrates the effectiveness of the proposed approach in dealing with medium-scale MPC problems, at least when the dynamics are linear and the cost function is convex and nonlinear. Small control horizons can potentially lead to large optimal controls, which may sometimes be incompatible with physical constraints. There, the MPC problem is defined by the following convex but nonlinear optimal control problem with control horizon $T = 0.25$:

$\min_u \frac{1}{2} \int_t^{t+0.25} \left( \log\left(1 + \frac{1}{2}\|x(\tau)\|^2\right) + \|u(\tau)\|^2 \right) d\tau$.   (28)

Supervised learning was performed from a Sobol sequence of only 1000 samples $p_i(t+T)$ defined in the hypercube $[-0.15, 0.15]^{50}$.