
HAL Id: hal-02179706
https://hal.archives-ouvertes.fr/hal-02179706
Submitted on 11 Jul 2019
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
A Simple Machine Learning Technique for Model
Predictive Control
Didier Georges
To cite this version:
Didier Georges. A Simple Machine Learning Technique for Model Predictive Control. MED
2019 - 27th Mediterranean Conference on Control and Automation, Jul 2019, Akko, Israel.
DOI: 10.1109/MED.2019.8798512. HAL Id: hal-02179706.

A Simple Machine Learning Technique for Model Predictive Control
Didier Georges
Abstract— This paper is devoted to a simple approach for the offline computation of closed-loop optimal control for dynamical systems with an imposed terminal state, as arising in model predictive control (MPC). The proposed approach relies only on integrations of the characteristic equations associated with the optimal control problem, together with classical supervised learning of a one-hidden-layer neural network, to obtain a closed-loop MPC law computed entirely offline. Several examples are provided in the paper, which demonstrate the ability of this approach to tackle quite large problems, with state dimensions reaching 50, without encountering limitations due to the so-called curse of dimensionality.
I. INTRODUCTION
The computation of nonlinear model predictive control in closed-loop form still remains a challenge due to the so-called curse of dimensionality of the associated optimal control problem. An alternative is to solve the associated optimal control problem online. However, the computational cost may be incompatible with real-time requirements for fast systems. Except for linear-quadratic optimal control problems, the computation of closed-loop solutions remains largely challenging when the state dimension is typically greater than 5. Several attempts have been made to compute the offline closed-loop solution to nonlinear optimal control problems by using a polynomial approximation of the solution of the associated Hamilton-Jacobi-Bellman equation [?] or some more general functional approximations of the optimal control thanks to Galerkin approaches [?], [?]. Several approaches based on reinforcement learning and adaptive dynamic programming have also been proposed [?]. However, these latter approaches remain computationally expensive for medium- or large-scale nonlinear systems. In practice, all the above-mentioned approaches fail to offer practical solutions to problems whose state dimension is greater than 3 or 4, and are not appropriate for including terminal constraints. Other approaches can be derived by using model reduction to deal with the optimal control of a "small" system (see [?] for a recent paper in the linear case). However, such an approach has yet to be extended to the closed-loop optimal control of nonlinear systems. In this paper, the combination of a supervised learning technique with the integration of the characteristics of the Hamilton-Jacobi equation of the optimal control problem associated with an MPC scheme, with a terminal state constraint and possibly some input bounds, is proposed and tested on several case studies.
The paper is organized as follows. In Section 2, some background is provided on nonlinear MPC and the necessary conditions for optimality of the associated optimal control problem. Section 3 describes the control design methodology proposed in this paper. In Section 4, four illustrative examples demonstrate the effectiveness and simplicity of the approach. Some conclusions and perspectives are given in Section 5.

1 Didier Georges is with Univ. Grenoble Alpes, CNRS, Grenoble INP (Institute of Engineering Univ. Grenoble Alpes), GIPSA-lab, 38000 Grenoble, France. didier.georges@grenoble-inp.fr
II. SOME BACKGROUND ON MODEL PREDICTIVE
CONTROL OF NONLINEAR SYSTEMS
We consider a class of nonlinear systems defined by

$\dot{x}(t) = F(x(t)) + G(x(t))u(t)$,   (1)

where $x(t) \in \mathbb{R}^n$ and $u(t) \in \mathbb{R}^m$, with $F(0) = 0$ (0 is an equilibrium state of the system). $F$ and $G$ are also assumed to be at least continuously differentiable.
In this paper we consider the continuous-time model predictive control of such systems around the origin with a terminal state constraint, which consists in:

1) Solving at time instant $t$, knowing the current state $x(t)$, an open-loop optimal control problem with finite control horizon $t + T$ defined by:

$\min_u \int_t^{t+T} L(x(\tau), u(\tau)) \, d\tau$   (2)

subject to $\dot{x}(\tau) = F(x(\tau)) + G(x(\tau))u(\tau)$, $\tau \in [t, t+T]$, with $x(t)$ known, $x(t+T) = 0$, and where

$L(x, u) = l(x) + \frac{1}{2} u^T R(x) u - g(x)^T u$,   (3)

$l(x) > 0, \ \forall x \neq 0, \quad l(0) = 0$,   (4)

$R(x) = R^T(x) > 0, \ \forall x$,   (5)

where $R(x)$ is an $m \times m$ matrix and $g(x)$ is a vector of $\mathbb{R}^m$.
2) Applying the optimal control solution $u^*(t)$ obtained at time $t$. At time $t + \epsilon$, the system reaches a new state

$x(t+\epsilon) = x(t) + \int_t^{t+\epsilon} \left( F(x(\tau)) + G(x(\tau))u^*(\tau) \right) d\tau \approx x(t) + \epsilon \left( F(x(t)) + G(x(t))u^*(t) \right)$.

3) Repeating the above sequence with $t \leftarrow t + \epsilon$.
Under mainly system controllability and zero-state observability of $(L(x,0), F(x))$ assumptions, it can be shown that the optimal cost function $V(t, x(t)) = \min_u \int_t^{t+T} L(x(\tau), u(\tau)) \, d\tau$ with $x(t+T) = 0$ is a Lyapunov function of the closed-loop system [?] under the control law $u^*(t, x(t))$, the optimal control solution of (2) at time $t$. Therefore, the model predictive control scheme described above ensures asymptotic stability around the origin.

In what follows, we will consider a nonsingular optimal control problem defined from 0 to $T$, since both $F$ and $G$ do not explicitly depend on $t$ (time-invariant systems).

If $H(x, p) = L(x, u) + p^T (F(x) + G(x)u)$ defines the so-called Hamiltonian associated with the problem, Pontryagin's principle [?] provides the following necessary conditions for optimality:

$\nabla_u H = 0 \implies u^*(t) = g(x(t)) - R^{-1}(x(t)) G^T(x(t)) p(t)$,   (6)

$\dot{x} = \nabla_p H \implies \dot{x} = F(x) + G(x) u^*$,   (7)

$x(0)$ known, $x(T) = 0$,   (8)

$\dot{p} = -\nabla_x H \implies \dot{p} = -\nabla_x L(x, u^*) - \nabla_x [F(x) + G(x) u^*]^T p$,   (9)

where $p(t)$ is the adjoint state of the system.
where p(t) is the adjoint state of the system.
This defines a two-point boundary value problem (TPBVP) which can be solved, for instance, by using a shooting method, which consists in finding $p(0, x(0))$ such that $x(T) = 0$, in order to get $u^*(0)$ thanks to (6). A basic shooting method can be defined as the solution of the following nonlinear least-squares problem:

$\min_{p(0)} \frac{1}{2} \| x(T) \|^2$   (10)

subject to

$\dot{x} = F(x) + G(x) g(x) - G(x) R^{-1}(x) G^T(x) p, \quad x(0)$ known,   (11)

$\dot{p} = -\nabla_x L(x, g(x) - R^{-1}(x) G^T(x) p) - \nabla_x [F(x) + G(x) g(x) - G(x) R^{-1}(x) G^T(x) p]^T p$.   (12)

Once $p(0)$ is obtained, the optimal control at time $t = 0$, $u^*(0, x(0))$, can easily be derived from (6).
Multi-shooting methods ([?]) are recommended for the computation of TPBVPs with a large horizon $T$, to avoid ill-conditioning.
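As a concrete illustration, the basic shooting method can be sketched in a few lines of Python (an illustrative sketch on a toy scalar problem, not the paper's MATLAB implementation): take $F(x) = x$, $G(x) = 1$, $l(x) = x^2/2$, $R = 1$, $g = 0$, so that $u^* = -p$ and the characteristics (11)-(12) reduce to $\dot{x} = x - p$, $\dot{p} = -x - p$.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Toy single-shooting solve of TPBVP (10)-(12): find p(0) such that x(T) = 0.
T = 1.0
x0 = 1.0  # known initial state x(0)

def characteristics(t, y):
    x, p = y
    return [x - p, -x - p]   # xdot = F + G u*, pdot = -grad_x H, with u* = -p

def terminal_miss(p0):
    sol = solve_ivp(characteristics, (0.0, T), [x0, p0[0]],
                    rtol=1e-9, atol=1e-9)
    return sol.y[0, -1]      # residual x(T), driven to 0 by the optimizer

res = least_squares(terminal_miss, x0=[0.0])  # min_{p(0)} ||x(T)||^2 / 2
p0 = res.x[0]
u0 = -p0                     # optimal control at t = 0, from (6)
print(p0, u0)
```

For the linear toy dynamics the residual is linear in $p(0)$, so the least-squares solve converges almost immediately; for genuinely nonlinear dynamics and long horizons, the multi-shooting variants cited above are preferable.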
In practice, solving TPBVP (10)-(12) can be very time consuming, especially for large systems, and the approach is then not appropriate for fast systems, since the TPBVP has to be solved at each control time instant.
Rather than trying to solve TPBVP (10)-(12) at every control time instant, we will consider an approach which generates offline a closed-loop solution of the model predictive control scheme defined above. This approach relies on a sequence of simple integrations of differential equations (11) and (12) performed backward in time, always starting from $x(T) = 0$. It can be noticed that equations (11) and (12) are nothing else than the equations of the characteristics of the Hamilton-Jacobi-Bellman equation

$\frac{\partial V}{\partial t}(t, x) + \min_{u(\cdot)} H\left(x, \frac{\partial V}{\partial x}\right) = 0$,   (13)

associated with optimal control problem (2), where $p(t) = \frac{\partial V}{\partial x}$.
Constrained input case: According to Pontryagin's principle, if $u$ is constrained to belong to a compact set $U$ (for instance a hypercube of $\mathbb{R}^m$), necessary condition (6) has to be replaced by

$u^* = \arg\min_{u \in U} H \implies u^* = \mathrm{Proj}_U \left( g(x) - R^{-1}(x) G^T(x) p \right)$,   (14)

where $\mathrm{Proj}_U$ denotes the projection operator onto $U$. The approach remains unchanged by replacing (6) with (14) in characteristic equations (11) and (12), which however become discontinuous, potentially making the integration much more tricky.
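When $U$ is a hypercube, the projection in (14) is simply a componentwise clip of the unconstrained control, so the constrained law stays cheap to evaluate. A minimal sketch (function names and toy data are illustrative, not from the paper):

```python
import numpy as np

# Sketch of the projected-control formula (14) for a hypercube U:
# the projection reduces to a componentwise clip.
def projected_control(x, p, g, Rinv, G, u_min, u_max):
    """u* = Proj_U(g(x) - R^{-1}(x) G^T(x) p)."""
    u_unc = g(x) - Rinv(x) @ G(x).T @ p
    return np.clip(u_unc, u_min, u_max)

# Toy data: m = 1 input bounded in [0, 1], as for a duty cycle.
u = projected_control(
    x=np.zeros(2), p=np.array([2.0, 0.0]),
    g=lambda x: np.array([0.5]),
    Rinv=lambda x: np.eye(1),
    G=lambda x: np.array([[1.0], [0.0]]),
    u_min=0.0, u_max=1.0,
)
print(u)   # g - R^{-1} G^T p = 0.5 - 2.0 = -1.5, clipped to 0.0
```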
III. CLOSED-LOOP SOLUTION TO THE MODEL
PREDICTIVE CONTROL PROBLEM
The offline derivation of a closed-loop solution of the model predictive control scheme relies on two ingredients:

1) The definition of a one-hidden-layer neural network, with $N$ neurons in the hidden layer, $n$ input neurons and $m$ output neurons, used to approximate $u^*(x)$, the model predictive control given by (6) at time $t = 0$, as a function of any initial state $x(0) = x$ in a sufficiently large domain:

$u^*(x) \approx \sum_{i=1}^{N} w_i \, \sigma(\alpha_i^T x + b_i) = \Phi_\theta(x)$,   (15)

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ or $\tanh$ represents the neuron activation function, and $w_i \in \mathbb{R}^m$, $\alpha_i \in \mathbb{R}^n$, $b_i \in \mathbb{R}$, and $\theta = (w_i, \alpha_i, b_i)_{i=1,\ldots,N}$.

It has been shown in [?] that such networks can be used as multi-dimensional approximants. In this paper, the best results have been obtained with the $\tanh$ activation function. The use of multi-layer networks does not seem to provide significant improvements in the present case, according to preliminary experiments. However, this point should be investigated more deeply.
2) The use of low-discrepancy sequences such as the ones proposed by Halton, Sobol or Faure (see [?] for instance) to generate learning sequences. Such sequences have been proposed to solve the problem of optimally choosing $M$ samples $x_i$ in a hypercube $C = [0,1]^n$ so as to "minimize holes", in the sense of the best possible approximation of integrals:

$\left| \frac{1}{M} \sum_{i=1}^{M} f(x_i) - \int_C f(x) \, dx \right| \leq W(f) \frac{\log(M)^n}{M}$,   (16)

where $W(f)$ is the variation of $f$ in the sense of Hardy & Krause. This approach usually provides better approximation results than approaches based on random sequences for $n \leq 20$.

Low-discrepancy sequences could be used to generate initial state sequences by "filling" a hypercube of $\mathbb{R}^n$ and then computing the corresponding sequence of TPBVPs (10)-(12). Even though this approach can certainly provide an effective way to get a closed-loop approximation to the nonlinear optimal solution thanks to supervised learning, it turns out to be very computationally expensive for medium/large-scale systems.
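As a tooling sketch (an assumption: the paper used MATLAB, while SciPy ships Sobol generators in `scipy.stats.qmc`), a low-discrepancy sequence in a given box can be produced as follows, here mirroring the boost-converter example of Section IV with 2000 samples of $p(T)$ in $[-0.4, 0.4]^2$:

```python
import numpy as np
from scipy.stats import qmc

# Sobol sampling of the adjoint terminal values p_i(T) in a box V_d.
n, M = 2, 2000
sampler = qmc.Sobol(d=n, scramble=False)
unit = sampler.random(M)                             # Sobol points in [0, 1]^n
p_T = qmc.scale(unit, l_bounds=[-0.4] * n, u_bounds=[0.4] * n)
print(p_T.shape)                                     # (2000, 2)
```

(SciPy emits a balance warning when $M$ is not a power of two; the samples are still valid.)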
In order to overcome this drawback, a low-discrepancy sequence of $M$ samples $\{p_i(T)\}_{i=1,\ldots,M}$, each of them belonging to a domain $V_d$ of $\mathbb{R}^n$, is generated. Then a standard backward-in-time numerical integration of characteristic equations (11) and (12) is performed from $t = T$ down to the initial time $t = 0$, with each of the $M$ samples $\{p_i(T)\}_{i=1,\ldots,M}$ and $x(T) = 0$, in order to get a sequence of $M$ samples $x_i(0, p_i(T))$, representing the initial states of the optimal trajectories satisfying $x(T) = 0$, together with the set of $M$ samples of control $u_i^*(x_i(0, p_i(T)))$ obtained by using control equation (6) or (14).
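The backward data-generation pass just described can be sketched as follows, again on the toy scalar problem $F(x) = x$, $G = 1$, $l(x) = x^2/2$, $R = 1$, $g = 0$ (so $u^* = -p$) rather than on the paper's examples:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate the characteristics backward from x(T) = 0, p(T) = p_i
# down to t = 0 to collect training pairs (x_i(0), u_i*(x_i(0))).
T, M = 1.0, 64
p_T_samples = np.linspace(-0.5, 0.5, M)   # stand-in for a Sobol sequence

def characteristics(t, y):
    x, p = y
    return [x - p, -x - p]                # xdot = F + G u*, pdot = -grad_x H

X0, U0 = [], []
for pT in p_T_samples:
    sol = solve_ivp(characteristics, (T, 0.0), [0.0, pT], rtol=1e-9, atol=1e-9)
    x0, p0 = sol.y[:, -1]                 # state and adjoint at t = 0
    X0.append(x0)
    U0.append(-p0)                        # u*(x_i(0)) from (6)
X0, U0 = np.array(X0), np.array(U0)
print(X0.shape, U0.shape)                 # the M supervised-learning pairs
```

Note that `solve_ivp` handles the backward integration directly when given a decreasing time span `(T, 0)`.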
A classical supervised learning technique is finally used to tune the parameters $\theta$ of the neural network, as the solution of the following nonlinear regression problem, using the $M$ pairs of samples $(x_i(0, p_i(T)), u_i^*(x_i(0, p_i(T))))$, $i = 1, \ldots, M$, generated above:

$\min_\theta \frac{1}{2} \sum_{i=1}^{M} \left\| \Phi_\theta(x_i(0, p_i(T))) - u_i^*(x_i(0, p_i(T))) \right\|^2$.   (17)

Many approaches can be used to solve this problem (stochastic gradient, quasi-Newton, etc.; see [?] for instance). In this paper, the Levenberg-Marquardt algorithm [?] implemented in the MATLAB neural network toolbox has been used. In the author's experience, this approach appears to be more efficient than the stochastic gradient approach for this class of problems. Due to nonconvexity, several learning trials are needed in order to retain the best solution in the least-squares sense.
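A sketch of this regression step with a hand-rolled one-hidden-layer tanh network and a Levenberg-Marquardt-type solver (the training pairs below are synthetic stand-ins for the $(x_i(0), u_i^*)$ samples, and SciPy replaces the MATLAB toolbox used in the paper):

```python
import numpy as np
from scipy.optimize import least_squares

# Fit the one-hidden-layer network (15) by solving regression (17),
# keeping the best of several random trials (the problem is nonconvex).
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200)             # stand-ins for the x_i(0, p_i(T))
U = np.tanh(2.0 * X)                        # stand-ins for the u_i*(x_i(0))
N = 5                                       # hidden neurons

def phi(theta, x):
    w, a, b = np.split(theta, 3)            # output weights w_i, alpha_i, b_i
    return np.tanh(np.outer(x, a) + b) @ w  # sum_i w_i tanh(a_i x + b_i)

best = min(
    (least_squares(lambda th: phi(th, X) - U,
                   rng.normal(scale=0.5, size=3 * N), method="lm")
     for _ in range(5)),                    # several learning trials
    key=lambda r: r.cost,
)
print(best.cost)                            # least-squares cost of retained fit
```

The `method="lm"` option selects a Levenberg-Marquardt implementation; it requires at least as many residuals as parameters, which holds here ($200 \geq 3N$).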
Remarks.

- The adjoint state domain $V_d$ should be adequately chosen to ensure a well-defined domain $V$ of the $x_i(0, p_i(T))$'s including the origin (for instance a hypercube of the form $V = \prod_{i=1}^{n} [a_i, b_i]$). For that purpose, the computation of a finite number of TPBVPs (10)-(12) with initial state $x(0)$ defined in $V$ can be helpful to infer $V_d$ as a hypercube in $\mathbb{R}^n$ of the form $V_d = \prod_{i=1}^{n} [a_i^d, b_i^d]$.

- The control horizon $T$ should not be chosen too large, to avoid the ill-conditioning and/or discontinuities which occur in nonlinear problems when the trajectories of characteristic equations (11) and (12) intersect, generating shocks and leading to integration failures and singularities.
IV. SOME ILLUSTRATIVE EXAMPLES
A. Constrained nonlinear model predictive control of a boost converter

In this section, the model predictive control of a boost converter is considered. DC-to-DC boost converters are a class of power electronics devices used to step up an input voltage ([?]). This system exhibits non-minimum-phase behavior, which makes it quite difficult to control.
The average model of a boost converter is given by

$\frac{di}{dt} = -(1-u) \frac{v}{L} + \frac{E}{L}, \quad \frac{dv}{dt} = (1-u) \frac{i}{C} - \frac{v}{RC}$,   (18)

where $E$ is the input voltage, $i$ is the inductance current, $v$ is the output voltage, and $u$ is the duty cycle of the converter acting as the control input. The duty cycle is constrained to belong to the interval $[0, 1]$.
In order to avoid ill-conditioning, a classical normalization procedure is performed using the following changes of variables:

$\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} \frac{1}{E}\sqrt{L/C} & 0 \\ 0 & \frac{1}{E} \end{pmatrix} \begin{pmatrix} i \\ v \end{pmatrix}, \quad \tau = \frac{t}{\sqrt{LC}}$.   (19)
The normalized average model is then given by

$\frac{dz_1}{d\tau} = -(1-u) z_2 + 1, \quad \frac{dz_2}{d\tau} = (1-u) z_1 - \frac{1}{R\sqrt{C/L}} z_2$.   (20)
Here we consider the nonlinear MPC around the equilibrium $(z_1^e, z_2^e, u^e)$ of the boost converter described by Table I.
The optimal control problem associated with the MPC scheme is defined by

$\min_{u \in [0,1]} \frac{1}{2} \int_0^2 \left( \| z - z^e \|^2 + (u - u^e)^2 \right) dt$,   (21)

where $z = (z_1, z_2)^T$ and $z^e = (z_1^e, z_2^e)^T$, subject to the average model dynamics (20).
TABLE I
BOOST CONVERTER PARAMETERS

L = 15 mH, C = 50 µF, R = 50 Ω, E = 12 V, i_e = 2.165 A, v_e = 2.5 V, u_e = 0.6
Fig. 1 shows the closed-loop dynamics of the boost converter under the constrained MPC, together with the constrained closed-loop control input; Fig. 2 shows the learned input as a function of the two states. Clearly, the bounds of the duty cycle are satisfied. The number of neurons in the hidden layer is equal to 40. The number of Sobol sequence samples of the final-time adjoint state $p(T) \in [-0.4, 0.4]^2$ is equal to 2000. Although the characteristic equations are nonsmooth due to the input constraints, the MATLAB function ode45 was successfully used for the numerical integration. The retained solution was the best among 10 learning trials.
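As a quick numerical sanity check of the normalized model (20) and Table I (a sketch under the assumption that the tabulated $i_e$, $v_e$ are the normalized equilibrium values $z_1^e$, $z_2^e$), one can verify that $(z_1, z_2) = (2.165, 2.5)$ with $u = u^e = 0.6$ is a stable equilibrium:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Verify the equilibrium of (20) for the Table I parameters and u = u_e.
L_, C_, R_ = 15e-3, 50e-6, 50.0
k = np.sqrt(L_ / C_) / R_          # = 1 / (R sqrt(C/L)), damping term in (20)
u_e = 0.6
z_e = np.array([2.165, 2.5])

def boost(tau, z, u):
    z1, z2 = z
    return [-(1.0 - u) * z2 + 1.0, (1.0 - u) * z1 - k * z2]

residual = boost(0.0, z_e, u_e)    # approximately (0, 0) at the equilibrium
sol = solve_ivp(boost, (0.0, 30.0), 1.05 * z_e, args=(u_e,), rtol=1e-9)
print(residual, sol.y[:, -1])      # perturbed state decays back toward z_e
```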
B. A small-scale linear MPC problem

The goal here is to compare the approximate solution of the following MPC problem, for a controllable but unstable linear system (with 3 unstable eigenvalues) and control horizon $T = 4$:

$\min_u \frac{1}{2} \int_t^{t+4} \left( \| x \|^2 + \| u \|^2 \right) d\tau$   (22)

subject to

$\dot{x} = Ax + Bu, \quad x \in \mathbb{R}^6, \ u \in \mathbb{R}^2, \quad x(t+4) = 0$,   (23)

Fig. 1. Constrained MPC of the boost converter (closed-loop dynamics: normalized current and voltage; control input).
Fig. 2. Constrained MPC duty cycle as a function of the two states (neural network closed-loop input).
where the coefficients of both $A$ and $B$ were randomly generated using a uniform distribution on $[-1, 1]$, with the reference solution derived from the explicit solution of the Hamiltonian system given by

$\begin{pmatrix} \dot{x} \\ \dot{p} \end{pmatrix} = H \begin{pmatrix} x \\ p \end{pmatrix} = \begin{pmatrix} A & -BB^T \\ -I_6 & -A^T \end{pmatrix} \begin{pmatrix} x \\ p \end{pmatrix}$,   (24)

$x(t)$ known, $x(t+T) = 0$. The closed-loop control is explicitly given by

$u(t, x(t)) = -B^T p(t)$,   (25)

$p(t) = -E_2(t+T)^{-1} E_1(t+T) x(t)$,   (26)

where $E_1$ and $E_2$ are given by

$E(t) = \begin{pmatrix} E_1(t) & E_2(t) \\ E_3(t) & E_4(t) \end{pmatrix} = \exp(Ht)$.   (27)
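The explicit reference solution (24)-(27) can be checked numerically. The sketch below uses a smaller random system ($n = 4$, $m = 2$, horizon $T = 2$; illustrative data, not the paper's) and verifies that the resulting trajectory meets the terminal constraint:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Explicit LQ-with-terminal-constraint solution via the Hamiltonian matrix.
rng = np.random.default_rng(1)
n, m, T = 4, 2, 2.0
A = rng.uniform(-1.0, 1.0, (n, n))
B = rng.uniform(-1.0, 1.0, (n, m))
H = np.block([[A, -B @ B.T],
              [-np.eye(n), -A.T]])         # Hamiltonian matrix of (24)

E = expm(H * T)
E1, E2 = E[:n, :n], E[:n, n:]              # upper blocks of (27)
x0 = rng.uniform(-1.0, 1.0, n)
p0 = -np.linalg.solve(E2, E1 @ x0)         # p = -E2^{-1} E1 x, as in (26)
u0 = -B.T @ p0                             # closed-loop control (25) at t = 0

sol = solve_ivp(lambda t, y: H @ y, (0.0, T), np.concatenate([x0, p0]),
                rtol=1e-10, atol=1e-10)
print(np.linalg.norm(sol.y[:n, -1]))       # ||x(T)||, close to 0
```

Invertibility of $E_2$ holds generically for a random controllable pair, which is why a random draw is acceptable in this sketch.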
Supervised learning was performed using a Sobol sequence of 3000 samples $p_i(t+T)$ defined in the hypercube $[-0.5, 0.5]^6$. The hidden layer of the neural network used 25 neurons. Fig. 3 shows the closed-loop dynamics with the best approximate control, in terms of least-squares cost function value, obtained after solving 10 learning problems (17) for different initializations of the network parameters $\theta$.
Figs. 3 and 4 show very similar behaviors. The relative error between the Hamiltonian solution $u$ and the approximate solution $u_a$,

$\frac{\int_t^{t+T} \| u(\tau) - u_a(\tau) \|^2 \, d\tau}{\int_t^{t+T} \| u(\tau) \|^2 \, d\tau}$,

is equal to $7.6 \times 10^{-2}$.
Fig. 3. Approximate MPC solution (closed-loop dynamics under the NN MPC; control inputs).
Fig. 4. Explicit linear MPC solution (closed-loop dynamics with the Hamiltonian solution; control inputs).
C. A medium-scale nonlinear MPC problem
In this section, the ability of the approach to deal with a
medium-scale MPC problem is investigated.

References

- G. Cybenko, "Approximation by superpositions of a sigmoidal function."
- D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert, "Constrained model predictive control: Stability and optimality" (survey).
- R. W. Erickson, "Fundamentals of Power Electronics."
- C. T. Kelley, "Iterative Methods for Optimization."
- "Approximation by Superpositions of a Sigmoidal Function."
Frequently Asked Questions

Q1. What are the contributions mentioned in the paper "A Simple Machine Learning Technique for Model Predictive Control"?

This paper is devoted to a simple approach for the offline computation of closed-loop optimal control for dynamical systems with an imposed terminal state arising in model predictive control (MPC). Some examples are provided in the paper which demonstrate the ability of this approach to tackle some quite large problems, with state dimensions reaching 50, without encountering limitations due to the so-called curse of dimensionality.

Further studies are still needed to evaluate medium-scale nonlinear problems and to investigate optimal state estimation. A potential limitation of this approach is the ill-conditioning and loss of continuity of the characteristic equations when the value of the control horizon is too large. Since the method intrinsically solves an optimal control problem with a final state constraint, it can also be used to solve other regular optimal control problems (with a final-time cost, for instance).

The medium-scale example demonstrates the effectiveness of the proposed approach in dealing with medium-scale MPC problems, at least when the dynamics are linear and the cost function is convex and nonlinear. Small control horizons can potentially lead to large optimal controls, which may sometimes be incompatible with physical constraints. There, the MPC problem is defined by the following convex but nonlinear optimal control problem with control horizon $T = 0.25$:

$\min_u \frac{1}{2} \int_t^{t+0.25} \left( \log\left(1 + \frac{1}{2}\|x(\tau)\|^2\right) + \|u(\tau)\|^2 \right) d\tau$.   (28)

Supervised learning was performed from a Sobol sequence of only 1000 samples $p_i(t+T)$ defined in the hypercube $[-0.15, 0.15]^{50}$.