Proceedings ArticleDOI

A classification-based approach to the optimal control of affine switched systems

01 Dec 2015-pp 2963-2968
TL;DR: A key feature of the proposed approach is the use of a classification method that provides guarantees on the generalization properties of the classifier; the approach is tested on a multi-room heating control problem.
Abstract: This paper deals with the optimal control of discrete-time switched systems, characterized by a finite set of operating modes, each one associated with given affine dynamics. The objective is the design of the switching law so as to minimize an infinite-horizon expected cost, that penalizes frequent switchings. The optimal switching law is computed off-line, which allows an efficient online operation of the control via a state feedback policy. The latter associates a mode to each state and, as such, can be viewed as a classifier. In order to train such classifier-type controller one needs first to generate a set of training data in the form of optimal state-mode pairs. In the considered setting, this involves solving a Mixed Integer Quadratic Programming (MIQP) problem for each pair. A key feature of the proposed approach is the use of a classification method that provides guarantees on the generalization properties of the classifier. The approach is tested on a multi-room heating control problem.

Summary (2 min read)

Introduction

  • The paper considers discrete-time switched affine systems, i.e. hybrid systems where the discrete state identifies the mode of the system and the continuous state is governed by mode-specific affine dynamics.
  • The objective is to compute a state–feedback switching law that minimizes a cost function over an infinite time horizon.
  • [7] considers switching costs, but only for finite switching sequences.
  • For each state in a training set, a state–action pair is generated based on the estimated value function associated to the current policy.
  • To address the classification task, the authors employ the Guaranteed Error Machine (GEM) classifier [17], which provides theoretical guarantees on the probability of classification errors.

II. PRELIMINARIES AND PROBLEM STATEMENT

  • The switching signal at time k may depend on the previous mode of the system, identified by σ(k−1).
  • Open-loop policies select the control input σ(k) based on the initial state, σ(k) = π_ol(x(0), q(0), k), i.e. without any direct measurement of the effects of past inputs.
  • The authors focus on the infinite–horizon case, for which the optimal control design and implementation are easier.
  • Furthermore, only a finite number of initial states can be considered, so that the desired map must be constructed by generalization over the whole state space based on a finite number of samples.

III. CLASSIFICATION-BASED TWO-STAGE ALGORITHM

  • In this section the authors introduce an algorithm to compute offline a closed-loop control policy π̄∗cl that minimizes (7).
  • The computed optimal policy is then stored in memory and applied on–line.
  • π̄*_cl is a map from the state to the switching signal.
  • The next two sub–sections explain in detail the two main stages of the algorithm.

A. Data-set generation: the DATAGEN subroutine

  • If L is sufficiently large, the finite-horizon cost is a good approximation of the infinite horizon cost and the value obtained for the control input at time 0 approximates the optimal one for the infinite horizon case.
  • This optimization problem can be efficiently solved via MIQP [19].
  • This equation is nonlinear, since it involves products between states and logical inputs; the nonlinearity is removed by introducing the auxiliary continuous variables z_i(k) ∈ R^n defined as z_i(k) = [A_i x(k) + f_i] δ_i(k).
  • The cost function can be convexified by adding a correction term that is constant with respect to the optimization variables and hence does not change the optimal switching sequence (details are omitted for brevity).

B. The GEM classification machine: the LEARN procedure

  • The LEARN procedure consists in training a classifier on the data gathered as explained in the previous subsection.
  • Interestingly, the derived bound for N is independent of the state space dimension n.
  • All instances included into R1 share the same label as x1 and are thus removed from the training set, while the instances on the boundary Ω(R1) of the region are marked as “active” points and added to a set Q.
  • For more details on the GEM algorithm refer to [17].
  • As noted in Remark 3.2, the specific nature of the considered problem, which deals with switched affine systems, may induce a special structure in the switching regions.

IV. NUMERICAL EXAMPLE

  • In the following the proposed classification-based control design methodology is tested on a modified version of a benchmark multi-room heating control problem described in [22].
  • A switching control strategy must be designed to decide at each time step which room should be heated, depending on the temperature values in all the rooms.
  • To evaluate the performance of the policy obtained by the proposed algorithm (denoted π̄*_GEM in the following), the authors compared it with a standard MPC policy (referred to as π̄*_MPC).
  • Notice that in the absence of switching costs the optimal policy is independent of the current mode.
  • Notice also that, while equation (13) provides a lower bound for the probability that PE(π̄*_GEM) ≤ ε, the practical application of the GEM classifier typically leads to better performances.

V. CONCLUSIONS

  • A classification-based approach has been proposed for the optimal control of discrete-time switched affine systems.
  • The proposed method operates in two steps.
  • First, a number of initial states is drawn from a uniform distribution over the state space, and an optimal control action is associated to each of them.
  • Precise bounds can be derived on the generalization capabilities of the classifier, which indirectly affect the control performance.
  • Some simulation experiments on a benchmark problem reveal that the difference with respect to a standard MPC policy is small.



A classification–based approach to the optimal control
of affine switched systems*
Giorgio Manganini, Luigi Piroddi, and Maria Prandini
Abstract— This paper deals with the optimal control of discrete-time switched systems, characterized by a finite set of operating modes, each one associated with given affine dynamics. The objective is the design of the switching law so as to minimize an infinite-horizon expected cost, that penalizes frequent switchings. The optimal switching law is computed off-line, which allows an efficient online operation of the control via a state feedback policy. The latter associates a mode to each state and, as such, can be viewed as a classifier. In order to train such a classifier-type controller one needs first to generate a set of training data in the form of optimal state-mode pairs. In the considered setting, this involves solving a Mixed Integer Quadratic Programming (MIQP) problem for each pair. A key feature of the proposed approach is the use of a classification method that provides guarantees on the generalization properties of the classifier. The approach is tested on a multi-room heating control problem.
I. INTRODUCTION
In this paper we consider the optimal control of discrete–
time switched affine systems, which are hybrid systems
where the discrete state identifies the mode of the system
and the continuous state is governed by mode–specific affine
dynamics. The objective is to compute a state–feedback
switching law that minimizes a cost function over an infinite
time horizon.
A general framework for the optimal control of switched
systems was established by [1] in the context of hybrid
systems. For the class of switched affine systems with
no continuous input, [2] showed that the optimal control
formulation leads to a two-point boundary value problem,
and a general solution is difficult to obtain both analytically
and numerically. A state-space discretization technique
was adopted in [3] to solve the Hamilton–Jacobi–Bellman
equations and determine the optimal feedback control law.
One way to reduce the resulting computational complexity
consists in performing optimal control decisions over a
(short) receding horizon, as in Model Predictive Control
(MPC) [4]. Alternatively, [5] dealt with positive switched
systems and used piecewise co–positive Lyapunov functions
to obtain suboptimal switching rules with a guaranteed level
of performance, still retaining a full horizon for the decisions.
Optimal feedback laws are designed for the particular case
where the continuous component has linear dynamics and the
cost function is piecewise-quadratic, under the assumption
that the switching sequence has finite length and that the
mode sequence is fixed, so that only the switching instants
must be optimized [6]. Later, in [7] the approach is extended
to affine dynamics and the optimal control problem is solved
*The authors gratefully acknowledge financial support by the European
Commission project UnCoVerCPS under grant number 643921.
1 Giorgio Manganini, Luigi Piroddi, and Maria Prandini are with the Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, via Ponzio 34/5, 20133 Milano, Italy. {giorgio.manganini, luigi.piroddi, maria.prandini}@polimi.it
with both the switching instants and the mode sequence
as decision variables. Finally, in [8] an infinite number of
switchings is allowed.
A similar setting is studied here, albeit circumventing
some of the limitations of the mentioned approaches. In
particular, an infinite horizon problem is analyzed that allows
for infinite switchings and, at the same time, encompasses
switching costs. Combining all these features together
appears to be a challenging problem. For example, [7] considers
switching costs, but only for finite switching sequences.
Switching costs are handled also in [9], using an ad–hoc
heuristic, but only for a finite time horizon control problem.
Additionally, the presented approach removes the
requirement of the discretization of the state space, which is a
well-known source of computational complexity even for
relatively small scale systems. Also, differently from most
of the works in the literature, we adopt a discrete–time
framework. This can be viewed as a way of implicitly
enforcing a certain minimum dwell time between consecutive
switchings, thus avoiding chattering phenomena.
Similarly to [10], we employ Mixed Integer Quadratic
Programming (MIQP) to determine the optimal control
policy. However, while in [10] MIQP problems are repeatedly
solved on–line to determine the optimal control sequence
at each time instant (only the first control is applied every
time, according to the receding horizon strategy), MIQP is
here used for the off–line computation of the switching law.
When online optimization is not viable, [11], [12] suggest a
multi-parametric programming approach for solving a
finite-horizon hybrid optimal control problem in a state-feedback
form. Notice, however, that the solution proposed in these
works is not applicable here, since it refers to a continuous
control input rather than the mode switching signal.
In order to transfer the computational load off-line, we
introduce a classification-based algorithm that (approximately)
computes the optimal feedback policy as a map from the
state space onto the control input space, given by the set of
operating modes for the system. The key idea is to represent
such a policy by means of a (multi–class) classifier: each
operating mode is viewed as a distinct class and states are
instances to be classified. The use of classifiers for
representing control policies has been recently suggested in the
reinforcement learning literature in combination with policy
iteration schemes [13]–[16]. For instance, [13] formulates the
policy improvement step as a classification problem. For each
state in a training set, a state–action pair is generated based
on the estimated value function associated to the current
policy. Then, the updated policy is obtained as a classifier
trained over the given state-action pairs.
Rather than resorting to a policy iteration scheme, we here
perform a direct policy search by first generating a data-set of
optimal state–input pairs, and then training a classifier over

these data to provide an (approximately) optimal closed-loop
control policy. In the case of systems affine in the state and
quadratic cost, the data-set generation reduces to solving a
convex optimization problem per initial state. To address the
classification task, we employ the Guaranteed Error Machine
(GEM) classifier [17], which provides theoretical guarantees
on the probability of classification errors. Moreover, this
error probability can be tuned by the user by adequately
setting some parameters entering the classifier definition.
II. PRELIMINARIES AND PROBLEM STATEMENT
In this paper we consider the class of hybrid systems, commonly denoted as switched systems, with affine dynamics:

x(k + 1) = A_σ(k) x(k) + f_σ(k),  σ(k) ∈ I,   (1)

where x(k) ∈ R^n is the continuous state, σ(k) ∈ I = {1, 2, . . . , m} is the switching signal that selects the mode of the system. Each mode i ∈ I identifies a matrix A_i ∈ R^{n×n} and a vector f_i ∈ R^n characterizing the system dynamics.
The switching signal at time k may depend on the previous mode of the system, identified by σ(k−1). To model this, we add the discrete variable q(k) ∈ I that specifies the previous active mode:

q(k + 1) = σ(k).   (2)

Note that (1)-(2) can be viewed as the equations of a nonlinear stationary system of the form:

z(k + 1) = F(z(k), σ(k)),   (3)

with state z = [x^T q]^T and control input σ.
With reference to system (1), a control policy is a rule that selects which mode to activate at each time instant, so as to achieve a desired behavior of the controlled system. The control objective can be formulated, e.g., in terms of a discounted cost over the time horizon [0, L]:

J_L(x(0), q(0)) = Σ_{k=0}^{L−1} γ^k g(x(k), q(k), σ(k)) + γ^L g_L(x(L), q(L)),   (4)

where x(0) is the initial state and q(0) is the initial mode, γ ∈ (0, 1) is the discount factor, g(x(k), q(k), σ(k)) ∈ R_+ is the cost per stage (assumed stationary for simplicity), and g_L(x(L), q(L)) is the terminal cost. In the sequel, the cost per stage includes a state tracking term and a switching cost:

g(x(k), q(k), σ(k)) = (x(k) − x_ref)^T Q (x(k) − x_ref) + H_{q(k)σ(k)},   (5)

where x_ref is some reference set point and Q = Q^T ⪰ 0 is a given positive semi-definite weighting matrix. Parameter H_{i,j}, i, j ∈ I, identifies the cost incurred for switching from mode i to mode j, and satisfies H_{i,j} ≥ 0 (H_{i,j} = 0 when i = j). The desired trade-off between the two objectives can be enforced by suitably setting Q and H_{i,j}, i, j ∈ I. Finally, the terminal cost is set to

g_L(x(L), q(L)) = (x(L) − x_ref)^T Q (x(L) − x_ref).   (6)

If the time horizon length L grows to infinity, the cost function becomes:

J_∞(x(0), q(0)) = Σ_{k=0}^{∞} γ^k g(x(k), q(k), σ(k)),   (7)

and the terminal cost disappears.
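For illustration, the following Python sketch simulates dynamics (1)-(2) under a given switching sequence and accumulates the discounted cost (4) with stage cost (5) and terminal cost (6); the two-mode system matrices, switching costs, and reference are made-up values, not taken from the paper.

```python
import numpy as np

# Illustrative two-mode switched affine system (made-up data, n = 2, m = 2).
A = [np.array([[0.9, 0.1], [0.0, 0.95]]), np.array([[1.0, 0.0], [0.05, 0.9]])]
f = [np.array([0.5, 0.0]), np.array([0.0, 0.3])]
Q = np.eye(2)                        # state tracking weight
H = np.array([[0.0, 1.0],            # H[i, j]: cost of switching from mode i+1 to mode j+1
              [1.0, 0.0]])
x_ref = np.array([1.0, 1.0])
gamma = 0.95

def finite_horizon_cost(x0, q0, sigma):
    """Simulate (1)-(2) under the switching sequence `sigma` (modes in {1, ..., m})
    and return the discounted cost (4) with stage cost (5) and terminal cost (6)."""
    x, q, J = np.array(x0, dtype=float), q0, 0.0
    for k, s in enumerate(sigma):
        stage = (x - x_ref) @ Q @ (x - x_ref) + H[q - 1, s - 1]
        J += gamma**k * stage
        x = A[s - 1] @ x + f[s - 1]          # x(k+1) = A_sigma(k) x(k) + f_sigma(k)
        q = s                                # q(k+1) = sigma(k)
    J += gamma**len(sigma) * (x - x_ref) @ Q @ (x - x_ref)   # terminal cost (6)
    return J

print(finite_horizon_cost(x0=[0.0, 0.0], q0=1, sigma=[1, 2, 2, 1, 1]))
```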
The overall cost depends on the sequence σ(0), . . . , σ(L−1) of control input values. A control sequence is optimal (denoted σ*) for a given initial state x(0) and mode q(0) if it minimizes J_L(x(0), q(0)). A control policy π is a (possibly time-varying) map from the state (x, q) to σ. A control policy can be defined either in open loop or closed loop. Open-loop policies select the control input σ(k) based on the initial state:

σ(k) = π_ol(x(0), q(0), k),

i.e. without any direct measurement of the effects of past inputs. On the other hand, closed-loop policies select the control input based on the knowledge of the current state:

σ(k) = π_cl(x(k), q(k), k).

A control policy is optimal (denoted π*) if it provides the optimal control sequence for all initial states. In particular, it must hold that π*_cl(x, q, 0) = π*_ol(x, q, 0) for every (x, q), whereas for k > 0 the two policies provide the same control input sequence only in the absence of disturbances and model uncertainty. A closed-loop policy implementation of the optimal control sequence is preferable, in order to deal with possible uncertainties affecting the real system.
In this paper, we focus on the infinite-horizon case, for which the optimal control design and implementation are easier. Indeed, since the system is stationary and the cost function has a time-invariant cost per stage, it turns out that there exists an optimal closed-loop stationary control policy [18]. Denoting such policy as π̄*_cl, it holds that π̄*_cl(x, q) = π*_cl(x, q, k), ∀ k ≥ 0. A consequence of this is that π̄*_cl(x, q) = π*_ol(x, q, 0). In principle, then, one could compute the optimal closed-loop policy by calculating the optimal input at time k = 0 according to the open-loop policy for all possible (initial) states.
In practice, the calculation of the optimal open-loop policy is approximated over a finite horizon [0, L]. Furthermore, only a finite number of initial states can be considered, so that the desired map must be constructed by generalization over the whole state space based on a finite number of samples. Both these sources of approximation can be controlled by the user, by extending the time horizon length L and exploiting the generalization error guarantees of the GEM classifier.
III. CLASSIFICATION-BASED TWO-STAGE ALGORITHM
In this section we introduce an algorithm to compute off-line a closed-loop control policy π̄*_cl that minimizes (7). The computed optimal policy is then stored in memory and applied on-line. π̄*_cl is a map from the (hybrid) state to the (discrete) switching signal. For convenience purposes we represent it as a collection of functions of the continuous state component x, indexed by the mode q: {π̄*_cl(·, q) : R^n → I}_{q∈I}. For a given q ∈ I, the switching policy π̄*_cl(·, q) is a piece-wise constant function of x.
For each q ∈ I, the algorithm computes the control policy π̄*_cl(·, q) in two stages:

1) Data-set generation stage: a data-set E^q_N of (x, σ*) pairs is generated, where x is drawn from μ_x and σ* is the optimal switching mode associated to (x, q);
2) Learning stage: a classifier is trained over E^q_N to provide (an approximation of) the optimal closed-loop policy π̄*_cl(·, q).
A pseudo-code of the algorithm is provided in Algorithm 1.

Algorithm 1 Classification-based two-stage algorithm
Input: μ_x (initial state distribution), N (size of the training set)
1: for all q ∈ I do
2:   E^q_N ← ∅
3:   for i = 1 to N do
4:     Draw x^(i) ∼ μ_x
5:     σ*^(i) ← DATAGEN(x^(i), q)
6:     E^q_N ← E^q_N ∪ {(x^(i), σ*^(i))}
7:   end for
8:   π̄*_cl(·, q) ← LEARN(E^q_N)
9: end for
Output: Policy π̄*_cl
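A minimal Python rendering of Algorithm 1 is sketched below; `datagen`, `learn`, and `sample_x0` are placeholders standing in for the DATAGEN and LEARN subroutines described in the next two subsections and for sampling from μ_x.

```python
def two_stage_policy(modes, sample_x0, datagen, learn, N):
    """Algorithm 1: for each previous mode q, build a training set of
    (initial state, optimal first switch) pairs and fit a classifier."""
    policy = {}
    for q in modes:                      # for all q in I
        dataset = []                     # E^q_N
        for _ in range(N):
            x0 = sample_x0()             # draw x ~ mu_x
            sigma_star = datagen(x0, q)  # optimal switching mode for (x0, q)
            dataset.append((x0, sigma_star))
        policy[q] = learn(dataset)       # classifier approximating pi_cl(., q)
    return policy
```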
The next two sub–sections explain in detail the two main
stages of the algorithm.
A. Data-set generation: the DATAGEN subroutine
To generate the data-set E^q_N of training examples associated to mode q ∈ I, we solve the following optimization problem for any given initial state x(0) = x_0 extracted at random according to μ_x:

min_{σ(0),...,σ(L−1)}  J_L(x(0), q(0))
subject to:  x(k + 1) = A_σ(k) x(k) + f_σ(k),  k = 0, . . . , L−1
             q(k + 1) = σ(k),  k = 0, . . . , L−1
             x(0) = x_0,  q(0) = q   (8)

where J_L is defined as in (4). Finally, the pair (x_0, σ*(0)) is stored in E^q_N.
If L is sufficiently large, the finite-horizon cost is a good approximation of the infinite-horizon cost and the value obtained for the control input at time 0 approximates the optimal one for the infinite horizon case.
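The paper solves (8) as an MIQP (described next); purely for illustration, the sketch below replaces the MIQP with an exhaustive search over all m^L switching sequences, which is viable only for small m and L but returns the same optimal first control σ*(0). It follows the same simulation logic as the sketch in Section II, with the system data passed in as arguments.

```python
import itertools
import numpy as np

def datagen_bruteforce(x0, q0, A, f, Q, H, x_ref, gamma, L):
    """Exhaustive-search stand-in for the DATAGEN step, i.e. problem (8):
    enumerate all switching sequences of length L, evaluate cost (4) with
    stage cost (5) and terminal cost (6), and return the first element of
    the best sequence."""
    m = len(A)
    best_cost, best_first = np.inf, None
    for sigma in itertools.product(range(1, m + 1), repeat=L):
        x, q, J = np.array(x0, dtype=float), q0, 0.0
        for k, s in enumerate(sigma):
            J += gamma**k * ((x - x_ref) @ Q @ (x - x_ref) + H[q - 1, s - 1])
            x = A[s - 1] @ x + f[s - 1]
            q = s
        J += gamma**L * (x - x_ref) @ Q @ (x - x_ref)     # terminal cost (6)
        if J < best_cost:
            best_cost, best_first = J, sigma[0]
    return best_first
```

In the Algorithm 1 skeleton above, such a function (or the MIQP formulation of the paper) plays the role of DATAGEN.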
This optimization problem can be efficiently solved via MIQP [19]. To this end, it is useful to reformulate system (1) as a Mixed Logical Dynamical (MLD) system [10]. An MLD system is described by affine dynamic equations subject to linear mixed-integer inequalities involving both continuous and binary variables.
The first step in the reformulation is the introduction of the binary input variables δ_i(k) ∈ {0, 1} to model the choice of the control input at time k. More precisely, we have

δ_i(k) = 1  ⟺  σ(k) = i,  k = 0, . . . , L−1,  i ∈ I
Σ_{i=1}^{m} δ_i(k) = 1,  k = 0, . . . , L−1,   (9)

where condition (9) implements the exclusive-or constraint that the control input can take only one value at any time. System (1) can now be rewritten as:

x(k + 1) = Σ_{i=1}^{m} [A_i x(k) + f_i] δ_i(k).
This equation is nonlinear, since it involves products between states and logical inputs. However, introducing the auxiliary continuous variables z_i(k) ∈ R^n:

z_i(k) = [A_i x(k) + f_i] δ_i(k),  i ∈ I   (10)

and setting x(k + 1) = Σ_{i=1}^{m} z_i(k), one can transform constraint (10) into a set of mixed-integer linear inequalities by using the so-called “big-M” approach [10]. More precisely, for each i ∈ I we can set:

z_i(k) ≤ M δ_i(k)
z_i(k) ≥ m δ_i(k)
z_i(k) ≤ A_i x(k) + f_i − m (1 − δ_i(k))
z_i(k) ≥ A_i x(k) + f_i − M (1 − δ_i(k))   (11)

where M ∈ R^n and m ∈ R^n are an upper and lower bound on the state vector x, respectively.
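As a quick sanity check of the big-M constraints (11), the snippet below verifies numerically that, for binary δ_i(k), the four inequalities force z_i(k) = [A_i x(k) + f_i] δ_i(k) whenever m and M bound the state; the matrix, affine term, and bounds are made-up illustrative values.

```python
import numpy as np

A_i = np.array([[0.9, 0.1], [0.0, 0.95]])
f_i = np.array([0.5, 0.0])
m_lb, M_ub = np.array([-10.0, -10.0]), np.array([10.0, 10.0])   # state bounds

def bigM_feasible(z, x, delta):
    """Componentwise check of the four inequalities in (11)."""
    w = A_i @ x + f_i
    return (np.all(z <= M_ub * delta) and np.all(z >= m_lb * delta)
            and np.all(z <= w - m_lb * (1 - delta))
            and np.all(z >= w - M_ub * (1 - delta)))

x = np.array([2.0, -1.0])
w = A_i @ x + f_i
# delta = 1: only z = A_i x + f_i is feasible; delta = 0: only z = 0 is feasible.
print(bigM_feasible(w, x, 1), bigM_feasible(np.zeros(2), x, 1))   # True  False
print(bigM_feasible(np.zeros(2), x, 0), bigM_feasible(w, x, 0))   # True  False
```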
The switching cost in (5) can be reformulated as a quadratic function of the decision variables, by expressing the state variable q(k) in a binary form, i.e. by introducing q^δ(k) ∈ {0, 1}^m such that q^δ_i(k) = 1 if q(k) = i and 0 otherwise:

H_{q(k)σ(k)} = [q^δ_1(k), . . . , q^δ_m(k)] · [[0, H_12, . . . , H_1m], [H_21, 0, . . . , H_2m], . . . , [H_m1, H_m2, . . . , 0]] · [δ_1(k), . . . , δ_m(k)]^T   (12)

In view of expressions (10)-(12), problem (8) is reformulated as an MIQP, which can be solved via standard solvers like CPLEX [20].
Remark 3.1: The main diagonal of matrix H in (12) contains all zero elements and, therefore, leads to a non-convex quadratic function of the control variables, which cannot be handled efficiently by an MIQP solver. However, the cost function can be convexified by adding a correction term that is constant with respect to the optimization variables and hence does not change the optimal switching sequence (details are omitted for brevity).
B. The GEM classification machine: the LEARN procedure
The LEARN procedure consists in training a classifier on the data gathered as explained in the previous subsection. For this purpose, we employ the GEM [17] algorithm, in view of the guarantees it provides on the probability of classification errors. These guarantees can be used to prescribe a desired level of the generalization error and determine the number N of training data to be calculated by the DATAGEN procedure, or, alternatively, to calculate the error probability associated to a given size N of the training data-set.
Let x ∈ R^d be a vector of measured attributes and y = y(x) ∈ Y = {1, . . . , m} the corresponding (discrete) class label. A classifier ŷ = ŷ(x) provides an estimate for the class y of x. It errs on x if y(x) ≠ ŷ(x). Differently from most other classifying machines, the GEM returns an augmented class set Y ∪ {unknown}, that includes an unknown label, expressing the inability to classify the sample.

Now, let E_N = {(x_1, y_1), . . . , (x_N, y_N)} be a data-set of N training samples, where x_1, . . . , x_N are independently extracted according to a probability distribution μ and y_i = y(x_i). The probability of error (or generalization error) of a classifier ŷ_N(·) trained on these data is defined as

PE(ŷ_N) = μ(ŷ_N(x) ∈ Y ∧ y(x) ≠ ŷ_N(x)),

that is the probability that an output is issued and that the output is not correct. Given that ŷ_N(·) is derived based on the set E_N of randomly sampled training data, PE(ŷ_N) is a random variable whose distribution depends on the unknown data generation mechanism (μ, y(·)).
Theorem 1 in [17] provides a formal expression for the probability distribution F_PE(ε) := μ^N{PE(ŷ_N) ≤ ε}, where 1 − ε, with ε ∈ (0, 1), is the accuracy level:

F_PE(ε) ≥ Σ_{i=k}^{N−1} (N−1 choose i) ε^i (1 − ε)^{N−1−i},   (13)

where the right hand side is the value at ε of a Beta(k, N − k) distribution, that represents the confidence over PE(ŷ_N) and can be expressed as Beta(k, N − k) = 1 − δ, with δ ∈ (0, 1).
Following [21], we can compute the sample complexity, i.e. a lower bound on the minimum number of data N that are needed for expression (13) to hold, as a function of ε, δ, and k. Given δ ∈ (0, 1), ε ∈ (0, 1), and the nonnegative integer k, if N satisfies the inequality

N ≥ 1 + (1/ε) [ k − 1 + log(1/(1 − δ)) + √( 2(k − 1) log(1/(1 − δ)) ) ],

then (13) holds. Interestingly, the derived bound for N is independent of the state space dimension n.
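The lower bound in (13) is just a binomial tail, so it is easy to evaluate numerically; the sketch below does so in plain Python (in log space to avoid overflow). The specific call uses the values from Section IV (N = 3000, ε = 0.03) with an arbitrary illustrative k; the returned value is close to 1 whenever k is well below (N−1)ε.

```python
from math import exp, lgamma, log

def gem_confidence(N: int, k: int, eps: float) -> float:
    """Right-hand side of (13): sum over i = k, ..., N-1 of
    (N-1 choose i) * eps**i * (1-eps)**(N-1-i), i.e. a lower bound on the
    probability that PE(y_hat_N) <= eps."""
    total = 0.0
    for i in range(k, N):
        log_term = (lgamma(N) - lgamma(i + 1) - lgamma(N - i)
                    + i * log(eps) + (N - 1 - i) * log(1.0 - eps))
        total += exp(log_term)
    return total

# e.g. with N = 3000 and eps = 0.03 as in Section IV; k here is illustrative.
print(gem_confidence(N=3000, k=50, eps=0.03))
```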
The GEM algorithm takes as input the training data E_N and the tuning parameter k ≤ N − 1. It constructs an ordered sequence of (hyper)ellipsoidal regions R = {R_1, . . . , R_r} which constitute a (sub)partition of the space of the training examples. Each of these regions R_j has an associated label ℓ_j ∈ Y, such that the union of these regions corresponds to that part of the input space where the GEM issues an answer, whereas in the remaining (uncovered) part of the space the machine returns the label unknown.
More in detail, starting from x_1, the algorithm constructs region R_1, that contains x_1 and extends until it touches another datum x_j, j ≠ 1, such that y_j ≠ y_1. All instances included into R_1 share the same label as x_1 and are thus removed from the training set, while the instances on the boundary Ω(R_1) of the region are marked as “active” points and added to a set Q. If |Q| ≤ k and the training data-set is not empty, then the active point farthest from x_1 is selected as the new base instance, and a new region is constructed. Ensuring that |Q| ≤ k is the key property that guarantees that (13) holds [17]. For more details on the GEM algorithm refer to [17].
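For intuition only, the following is a heavily simplified sketch of the region-growing idea just described, using spheres in place of the ellipsoidal regions of [17] and omitting the bookkeeping of the actual GEM procedure; it is not the GEM algorithm itself.

```python
import numpy as np

def gem_like_train(X, y, k):
    """Simplified, sphere-based sketch of the GEM region-growing idea.
    Returns an ordered list of (center, radius, label) regions."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    remaining = set(range(len(X)))
    regions, active = [], []
    base = 0                                       # start from the first sample
    while remaining:
        xb, yb = X[base], y[base]
        remaining.discard(base)
        rem = sorted(remaining)
        if not rem:
            break
        d = np.linalg.norm(X[rem] - xb, axis=1)
        diff = [j for j in range(len(rem)) if y[rem[j]] != yb]
        if not diff:                               # only one class left: cover it all
            regions.append((xb, np.inf, yb))
            break
        r = d[diff].min()                          # grow until a different label is touched
        regions.append((xb, r, yb))
        inside = {rem[j] for j in range(len(rem)) if d[j] < r}    # same label as the base
        touched = [rem[j] for j in diff if np.isclose(d[j], r)]   # "active" points
        active.extend(t for t in touched if t not in active)
        remaining -= inside
        if len(active) > k:                        # active-point budget exhausted
            break
        candidates = [a for a in active if a in remaining]
        base = (max(candidates, key=lambda a: np.linalg.norm(X[a] - xb))
                if candidates else min(remaining))
    return regions

def gem_like_predict(regions, x):
    x = np.asarray(x, dtype=float)
    for center, radius, label in regions:          # regions are checked in order
        if np.linalg.norm(x - center) <= radius:
            return label
    return "unknown"                               # uncovered part of the space
```

In this sketch a query falling outside the union of the constructed regions is labelled "unknown", mirroring the augmented class set Y ∪ {unknown} of the GEM.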
Remark 3.2: The specific nature of the considered prob-
lem, which deals with affine switching systems, may induce
a special structure in the switching regions. More precisely,
in the absence of switching costs these regions are conical
[7]. This can be exploited to tailor the classification method.
IV. NUMERICAL EXAMPLE
In the following the proposed classification-based control
design methodology is tested on a modified version of a
benchmark multi-room heating control problem described
in [22]. The problem concerns the simultaneous temperature
regulation in n rooms, each room being endowed with a
heater, with the constraint that at most one heater at a time
can be active. A switching control strategy must be designed
to decide at each time step which room should be heated,
depending on the temperature values in all the rooms. With
respect to the original benchmark description, a deterministic
setting is here adopted and an energy–related cost associated
to the on/off switching of the heaters is included.
The n-room heating system can be modeled as a switched system with continuous state component x = (x_1, . . . , x_n) ∈ X = R^n representing the (average) temperature in each room. The switching signal σ ∈ I = {1, . . . , n + 1} has n + 1 command options, namely turning on the heater of the ith room (σ = i), i = 1, . . . , n, or turning them all off (σ = n + 1).
The average temperature in room i is ruled by the following difference equation, obtained by Euler discretization of the corresponding continuous-time dynamics with constant time step Δt:

x_i(k+1) = x_i(k) + [ b_i (x_a − x_i(k)) + c_i h_i(σ(k)) + Σ_{j=1,...,n; j≠i} a_ij (x_j(k) − x_i(k)) ] Δt,  i = 1, . . . , n,   (14)

where x_a is the ambient temperature (assumed constant), and h_i(σ(k)) is a boolean function equal to 1 when room i is heated, and 0 otherwise. Parameters a_ij, b_i and c_i in (14) are non-negative constants representing the heat exchange coefficients between room i and room j (a_ij), the heat loss rate of room i to the ambient (b_i) and the heat rate supplied by the heater in room i (c_i), all normalized with respect to the average thermal capacity of room i. The parameters are set as follows: Δt = 1/30, x_a = 6, b_i = 0.25 and c_i = 12 for i = 1, . . . , n, a_ij = a_ji = 0.33, for i = 1, . . . , n − 1, j = i + 1.
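A direct Python implementation of the room dynamics (14) with the parameter values just listed might look as follows, so that candidate switching policies can be simulated; the "heat the coldest room" rule used in the demo is a naive policy introduced here only for illustration.

```python
import numpy as np

def make_heating_step(n, dt=1/30, xa=6.0, b=0.25, c=12.0, a_adj=0.33):
    """One Euler step of (14) for n rooms; adjacent rooms i and i+1 exchange
    heat with coefficient a_adj, as in the parameter setting above."""
    a = np.zeros((n, n))
    for i in range(n - 1):
        a[i, i + 1] = a[i + 1, i] = a_adj
    def step(x, sigma):
        # sigma in {1, ..., n} heats room sigma; sigma = n + 1 heats nothing
        h = np.zeros(n)
        if sigma <= n:
            h[sigma - 1] = 1.0
        coupling = a @ x - a.sum(axis=1) * x          # sum_j a_ij (x_j - x_i)
        return x + (b * (xa - x) + c * h + coupling) * dt
    return step

# Demo: 2 rooms, naive "heat the coldest room" switching rule (illustrative only).
step = make_heating_step(n=2)
x = np.array([17.0, 17.0])
for k in range(30):
    sigma = int(np.argmin(x)) + 1
    x = step(x, sigma)
print(x)   # temperatures after 30 steps
```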
The control problem can be formulated as in Section II, where the objective is to track the (constant) reference state x_ref, while weighting the cost penalty component associated with changing the heated room. Accordingly, we adopt the infinite horizon cost (7), with a discount factor γ = 0.95. The initial state distribution μ_x is uniform over the domain [15.25, 24.25]^n. A prediction horizon of length L = 10 is used for building the training data-set according to the approach in Section III-A. The control performance is evaluated over the same look-ahead horizon. The one-step cost function defined in (5) is employed with Q = I_{n×n} and x_{i,ref} = 19 for all rooms, whereas the switching cost H_{i,j} represents a tuning parameter.
To evaluate the performance of the policy obtained by the proposed algorithm (denoted π̄*_GEM in the following), we compared it with a standard MPC policy (referred to as π̄*_MPC). In the MPC approach, a closed-loop policy is computed by using a time receding horizon strategy, solving on-line at every time instant k the optimization problem (8), with the initial state given by x(k) and q(k), and a time horizon of k + L. Notice that the data used to train the GEM classifier are obtained by solving the same optimization problem, so that the performance of the GEM-based controller can be put in direct relation with that of the MPC controller, and the approximation and generalization properties of the former can be evaluated.
To illustrate the results, we first make reference to the 2-room case. In this case, N = 3000 training state-control input pairs have been generated for modes q = 1, 2, 3, while the confidence and accuracy parameters for the GEM classifiers have been set to δ = 10^−5 and 1 − ε = 0.97, respectively. No unknown regions resulted after the GEM training. Figures 1-2 show the system trajectories starting from the initial state x(0) = [17 17]^T and mode q(0) = 3, under both the control policies π̄*_MPC and π̄*_GEM. In particular, Figure 1 refers to the case with no switching costs (H_{i,j} = 0, i, j ∈ I), while in Figure 2 we assign a high cost penalty to a switch occurrence (H_{i,j} = 50, i, j ∈ I, i ≠ j). Notice that in the absence of switching costs the optimal policy is independent of the current mode. Conversely, its dependence on the mode increases with the switching cost.
[Figure 1: two panels, each showing temperature [°C] and the switching signal σ(k) versus the trajectory step k.]
Fig. 1. Continuous state trajectories (red and blue lines represent the 1st and 2nd room, respectively) and switching sequences starting from x(0) = [17 17]^T and q(0) = 3, under policies π̄*_MPC (top) and π̄*_GEM (bottom), in the absence of switching costs.
The policy π̄*_GEM computed by the proposed algorithm shows a similar behavior with respect to π̄*_MPC, though it generates a slightly different switching sequence. Not surprisingly, better tracking performances are obtained when the switching cost is absent (Figure 1), which allows both policies to alternate the heating in the two rooms. On the other hand, when the switching cost becomes significant, both policies have to exercise a trade-off between the conflicting goals, and the number of switching occurrences is greatly reduced. For the reader's reference, the performance of the MPC policy is equal to J = 37.0980 and J = 174.5605 in the two examined conditions, respectively, which are only 0.3% and 0.12% better than the corresponding values obtained with π̄*_GEM.

[Figure 2: two panels, each showing temperature [°C] and the switching signal σ(k) versus the trajectory step k.]
Fig. 2. Continuous state trajectories (red and blue lines represent the 1st and 2nd room, respectively) and switching sequences starting from x(0) = [17 17]^T and q(0) = 3, under policies π̄*_MPC (top) and π̄*_GEM (bottom), in the presence of large switching costs.
The performance loss due to the approximation introduced by the GEM classifier in the definition of the policy π̄*_GEM has been further tested in larger case instances. A Monte Carlo estimate of the expected cost

E_{x(0)∼μ_x, q(0)∼μ_q} [ J_∞(x(0), q(0)) ]

with μ_q uniform, is evaluated for the n-room scenario, when n = 1, . . . , 4, for both control policies: the simulations are performed following each policy for 50 steps, starting from 100 initial states drawn from the distribution μ_x, with initial mode q(0) = q = n + 1, and no switching cost is considered. N = 15000 training input pairs are generated for the computation of the policy π̄*_GEM. Figure 3 shows the relative performance loss of the policy π̄*_GEM with respect to π̄*_MPC as a function of the accuracy ε of the GEM machine. Apparently, this dependence is essentially linear, showing a graceful degradation of the performance as ε is increased. Notice also that, while equation (13) provides a lower bound for the probability that PE(π̄*_GEM) ≤ ε, the practical application of the GEM classifier typically leads to better performances.
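The Monte Carlo evaluation described above is straightforward to reproduce in outline; the sketch below estimates the expected discounted cost of a given state-feedback policy by simulating it for 50 steps from 100 random initial states, using a quadratic stage cost with the reference and domain values of this section (the policy and the room-model step are left as callables, e.g. the make_heating_step sketch above).

```python
import numpy as np

def mc_expected_cost(policy, step, n, n_init=100, horizon=50, gamma=0.95,
                     x_ref=19.0, H=None, lo=15.25, hi=24.25, seed=0):
    """Monte Carlo estimate of E[J(x(0), q(0))] for a policy sigma = policy(x, q),
    simulated for `horizon` steps from `n_init` initial states drawn uniformly
    from [lo, hi]^n, with initial mode q(0) = n + 1."""
    rng = np.random.default_rng(seed)
    H = np.zeros((n + 1, n + 1)) if H is None else H     # switching costs
    costs = []
    for _ in range(n_init):
        x, q, J = rng.uniform(lo, hi, size=n), n + 1, 0.0
        for k in range(horizon):
            sigma = policy(x, q)
            J += gamma**k * (np.sum((x - x_ref) ** 2) + H[q - 1, sigma - 1])
            x, q = step(x, sigma), sigma
        costs.append(J)
    return float(np.mean(costs))

# Example use with the make_heating_step() sketch above and a naive policy:
# step = make_heating_step(n=2)
# print(mc_expected_cost(lambda x, q: int(np.argmin(x)) + 1, step, n=2))
```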

Citations
Proceedings ArticleDOI
05 Nov 2015
TL;DR: The effectiveness of the proposed majority voting classifier is shown on both synthetic and real benchmark data-sets, and the results are compared with other well-established classification algorithms.
Abstract: This paper deals with supervised learning for classification. A new general purpose classifier is proposed that builds upon the Guaranteed Error Machine (GEM). Standard GEM can be tuned to guarantee a desired (small) misclassification probability and this is achieved by letting the classifier return an unknown label. In the proposed classifier, the size of the unknown classification region is reduced by introducing a majority voting mechanism over multiple GEMs. At the same time, the possibility of tuning the misclassification probability is retained. The effectiveness of the proposed majority voting classifier is shown on both synthetic and real benchmark data-sets, and the results are compared with other well-established classification algorithms.

3 citations

Journal ArticleDOI
TL;DR: The preliminary results presented in this paper are very general and apply in principle to any weighted majority voting scheme involving individual classifiers that come with statistical guarantees, in the spirit of Probably Approximately Correct (PAC) learning.

2 citations

Journal ArticleDOI
01 Nov 2021
TL;DR: In this paper, the authors focus on the case where the voting agents are binary classifiers and introduce novel bounds on the probability of misclassification conditioned on the size of the majority.
Abstract: Majority voting is often employed as a tool to increase the robustness of data-driven decisions and control policies, a fact which calls for rigorous, quantitative evaluations of the limits and the potentials of majority voting schemes. This letter focuses on the case where the voting agents are binary classifiers and introduces novel bounds on the probability of misclassification conditioned on the size of the majority. We show that these bounds can be much smaller than the traditional upper bounds on the probability of misclassification. These bounds can be used in a ‘Probably Approximately Correct’ (PAC) setting, which allows for a practical implementation.

1 citation

References
Journal ArticleDOI
TL;DR: A predictive control scheme is proposed which is able to stabilize MLD systems on desired reference trajectories while fulfilling operating constraints, and possibly take into account previous qualitative knowledge in the form of heuristic rules.

2,980 citations


"A classification-based approach to ..." refers background or methods in this paper

  • ...and setting x(k + 1) = ∑m i=1 zi(k), one can transform constraint (10) into a set of mixed-integer linear inequalities by using the so-called “big-M” approach [10]....


  • ...To this end, it is useful to reformulate system (1) as a Mixed Logical Dynamical (MLD) system [10]....


  • ...Similarly to [10], we employ Mixed Integer Quadratic Programming (MIQP) to determine the optimal control policy....


  • ...However, while in [10] MIQP problems are repeatedly solved on–line to determine the optimal control sequence at each time instant (only the first control is applied every time, according to the receding horizon strategy), MIQP is here used for the off–line computation of the switching law....


Book
01 Feb 2007
TL;DR: This research monograph is the authoritative and comprehensive treatment of the mathematical foundations of stochastic optimal control of discrete-time systems, including thetreatment of the intricate measure-theoretic issues.
Abstract: This research monograph is the authoritative and comprehensive treatment of the mathematical foundations of stochastic optimal control of discrete-time systems, including the treatment of the intricate measure-theoretic issues.

1,811 citations


"A classification-based approach to ..." refers background in this paper

  • ...Indeed, since the system is stationary and the cost function has a time–invariant cost per stage, it turns out that there exists an optimal closed–loop stationary control policy, [18]....


Journal ArticleDOI
TL;DR: This work introduces a mathematical model of hybrid systems as interacting collections of dynamical systems, evolving on continuous-variable state spaces and subject to continuous controls and discrete transitions, and develops a theory for synthesizing hybrid controllers for hybrid plants in all optimal control framework.
Abstract: We propose a very general framework that systematizes the notion of a hybrid system, combining differential equations and automata, governed by a hybrid controller that issues continuous-variable commands and makes logical decisions. We first identify the phenomena that arise in real-world hybrid systems. Then, we introduce a mathematical model of hybrid systems as interacting collections of dynamical systems, evolving on continuous-variable state spaces and subject to continuous controls and discrete transitions. The model captures the identified phenomena, subsumes previous models, yet retains enough structure to pose and solve meaningful control problems. We develop a theory for synthesizing hybrid controllers for hybrid plants in all optimal control framework. In particular, we demonstrate the existence of optimal (relaxed) and near-optimal (precise) controls and derive "generalized quasi-variational inequalities" that the associated value function satisfies. We summarize algorithms for solving these inequalities based on a generalized Bellman equation, impulse control, and linear programming.

1,363 citations


"A classification-based approach to ..." refers background in this paper

  • ...A general framework for the optimal control of switched systems was established by [1] in the context of hybrid systems....


Journal ArticleDOI
TL;DR: It is proved that the framework of piecewise linear systems can be used to analyze smooth nonlinear dynamics with arbitrary accuracy and an upper bound to the optimal cost is obtained by another convex optimization problem using the given control law.
Abstract: The use of piecewise quadratic cost functions is extended from stability analysis of piecewise linear systems to performance analysis and optimal control. Lower bounds on the optimal control cost are obtained by semidefinite programming based on the Bellman inequality. This also gives an approximation to the optimal control law. An upper bound to the optimal cost is obtained by another convex optimization problem using the given control law. A compact matrix notation is introduced to support the calculations and it is proved that the framework of piecewise linear systems can be used to analyze smooth nonlinear dynamics with arbitrary accuracy.

516 citations

Journal ArticleDOI
TL;DR: The aim of the paper is to give basic theoretical results on the structure of the optimal state-feedback solution and of the value function and to describe how the state- feedback optimal control law can be constructed by combining multiparametric programming and dynamic programming.

372 citations


"A classification-based approach to ..." refers methods in this paper

  • ...When online optimization is not viable, [11], [12] suggest a multi–parametric programming approach for solving a finite– horizon hybrid optimal control problem in a state–feedback form....


Frequently Asked Questions (1)
Q1. What are the contributions mentioned in the paper "A classification–based approach to the optimal control of affine switched systems*" ?

This paper deals with the optimal control of discrete–time switched systems, characterized by a finite set of operating modes, each one associated with given affine dynamics. A key feature of the proposed approach is the use of a classification method that provides guarantees on the generalization properties of the classifier.