
Journal ArticleDOI

A comparison of classical and learning controllers

01 Jan 2011-IFAC Proceedings Volumes (Elsevier)-Vol. 44, Iss: 1, pp 1102-1107

Abstract: This paper focuses on evaluating Locally Weighted Projection Regression (LWPR) as an alternative control method to traditional model-based control schemes. LWPR is used to estimate the inverse dynamics function of a 6 degree of freedom (DOF) manipulator. The performance of the resulting controller is compared to that of the resolved acceleration and the adaptive computed torque (ACT) controller. Simulations are carried out in order to evaluate the position and orientation tracking performance of each controller while varying trajectory velocities, end effector loading and errors in the known parameters. Both the adaptive controller and LWPR controller have comparable performance in the presence of parametric uncertainty including friction. The ACT controller outperforms LWPR when the dynamic structure is accurately known and the trajectory is persistently exciting.

Topics: Open-loop controller (62%), Control theory (58%), Adaptive control (55%), Inverse dynamics (50%)

Summary (2 min read)

1. INTRODUCTION

  • The use of robotics worldwide is most prevalent in the industrial setting where the environment is highly controlled.
  • Furthermore, uncertainties in the physical parameters of a system may be introduced from discrepancies between the manufacturer data and the actual system (Ayusawa et al., 2008).
  • More recently, statistical regression approaches have been used to infer the optimal structure to describe the observed data, making it possible to encode nonlinearities whose structure may not be well-known.

2. OVERVIEW OF MODEL-BASED CONTROL

  • This term globally linearizes and decouples the system, and thus a linear controller can be applied for the feedback term, uFB, which provides stability and disturbance rejection.
  • Desirable performance of the computed torque approach is based on the assumption that values of the parameters in (1) match the actual parameters of the physical system.
  • Otherwise, imperfect cancelation of the nonlinearities and coupling occurs.

3. LEARNING INVERSE DYNAMICS

  • While the adaptive approach requires accurate knowledge of the structure of the dynamic model of the manipulator, the learning approach obtains a model using measured data (Nguyen-Tuong et al., 2009), allowing unknown or unmodeled nonlinearities such as friction and backlash to be accounted for.
  • In order to be practical for manipulator control, learning algorithms must process continuous streams of training data to update the model and predict outputs fast enough for real-time control.
  • $\hat{y}_i(X) = \sum_{k=1}^{K} w_{ik}\hat{y}_{ik} / \sum_{k=1}^{K} w_{ik}$ (8), where K is the number of linear models and $\hat{y}_{ik}$ is the prediction of the ik-th local linear model given by (6), weighted by the $w_{ik}$ associated with its receptive field.
  • This prediction is repeated for each of the i dimensions of the output vector y.
  • To reduce computational effort, LWPR assumes that the data can be characterized by local low-dimensional distributions, and attempts to reduce the dimensionality of the input space X using Partial Least Squares regression (PLS).

4. SIMULATIONS

  • In order to evaluate the performance of the LWPR learning controller, two ‘classical’ controllers in the joint space were also implemented: the resolved acceleration (RA) controller (Sciavicco and Siciliano, 2000) and the adaptive computed torque (ACT) controller (Craig et al., 1986; Ortega and Spong, 1988) given by (3).
  • The LWPR controller was trained for 60s on the 0.25Hz trajectory, after which training was stopped and tracking performance was evaluated.
  • Training was stopped when the observed MSE had asymptotically decreased to a low value.
  • The same conditions were repeated on the ACT and RA controller.
  • The third simulation adds varying amounts of error in the inertia parameters of the model while observing the resulting performance of the three controllers when tracking the ‘figure 8’ trajectory.

4.1 Parameter Tuning and Initialization

  • The stability of the ACT controller was found to be highly sensitive to the adaptive gain parameter, γ (3).
  • The initial value for the distance parameter D (7) dictates how large a receptive field is upon initialization.
  • This parameter was generally tuned through a trial-and-error process which involved monitoring the MSE of the predicted values during the training phase.
  • The initial performance of the LWPR controller is also highly dependent upon the data sets that are used to train the LWPR model.
  • Because LWPR is a local learning approach, it must be trained in the region(s) of input space that the manipulator will be operating in.

4.2 Results

  • The LWPR model was trained on the ‘figure 8’ trajectory at 0.25Hz, enabling it to predict the necessary torques for tracking at frequencies near 0.25Hz.
  • Table 3 illustrates that the inaccurate knowledge of the dynamic parameters causes significant degradation in performance for the RA controller, due to the imperfect linearization of the system dynamics.
  • For the LWPR controller, similar findings to that of simulations one and two were observed in that inertia parameter perturbations greater than 10% were sufficient to prevent LWPR from predicting accurate joint torques.
  • The first test involved using the model that was learned in simulation three to attempt to track the PE trajectory.
  • Unlike the adaptive controller which relies on persistence of excitation for tracking performance, the LWPR approach can be trained on an arbitrary trajectory provided that it is given sufficient time to learn.


A Comparison of Classical and Learning Controllers

Joseph Sun de la Cruz ∗  Dana Kulić ∗  William Owen ∗∗

∗ Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada (e-mail: {jsundela, dkulic}@uwaterloo.ca)
∗∗ Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, ON, Canada (e-mail: bowen@uwaterloo.ca)
Abstract: This paper focuses on evaluating Locally Weighted Projection Regression (LWPR) as an alternative control method to traditional model-based control schemes. LWPR is used to estimate the inverse dynamics function of a 6 degree of freedom (DOF) manipulator. The performance of the resulting controller is compared to that of the resolved acceleration and the adaptive computed torque (ACT) controller. Simulations are carried out in order to evaluate the position and orientation tracking performance of each controller while varying trajectory velocities, end effector loading and errors in the known parameters. Both the adaptive controller and LWPR controller have comparable performance in the presence of parametric uncertainty including friction. The ACT controller outperforms LWPR when the dynamic structure is accurately known and the trajectory is persistently exciting.
Keywords: Learning Control, Adaptive Control, Robot Dynamics
1. INTRODUCTION
The use of robotics worldwide is most prevalent in the industrial setting, where the environment is highly controlled. Under these conditions, robot manipulation often consists of repetitive tasks such as pick-and-place motions, allowing the use of simple, computationally inexpensive decentralized Proportional-Integral-Derivative (PID) control which treats the nonlinearities and highly coupled nature of manipulators as disturbances. Unlike decentralized controllers, control strategies that are based on the dynamic model of the manipulator, known as model-based controllers, present numerous advantages such as increased performance during high-speed movements, reduced energy consumption, improved tracking accuracy and the possibility of compliance (Nguyen-Tuong et al., 2009). However, the performance of model-based control is highly dependent upon an accurate representation of the robot's dynamics, which includes precise knowledge of the inertial parameters of link mass, centre of mass and moments of inertia, and of the friction parameters (Craig et al., 1986). In practice, obtaining such a model is a challenging task which involves modeling physical processes that are not well understood or are difficult to model, such as friction (Armstrong-Hélouvry et al., 1994) and backlash. Thus, assumptions concerning these effects are often made, leading to inaccuracies in the model. Furthermore, uncertainties in the physical parameters of a system may be introduced from discrepancies between the manufacturer data and the actual system (Ayusawa et al., 2008). Changes to operating conditions can also cause the structure of the system model to change, thus resulting in degraded performance.

(This work was partially supported by the Natural Sciences and Engineering Research Council of Canada.)

Traditionally, adaptive control strategies have been used to estimate parameters of the dynamic model online (Craig et al., 1986), but with the requirements of knowing joint accelerations and good initial estimates of the system parameters. Since (Craig et al., 1986), subsequent work has been done to eliminate these constraints (Ortega and Spong, 1988; Burdet and Codourey, 1998; Yu and Lloyd, 1995). A number of adaptive laws have been proposed (Slotine and Li, 1987; Landau and Horowitz, 1988) based on the preservation of passivity properties of the robot. Although they differ from the class of controllers in (Craig et al., 1986), the motivation for these schemes is also to eliminate the need for joint acceleration measurement (Ortega and Spong, 1988). Despite these advancements, adaptive methods are still reliant upon adequate knowledge of the structure of the dynamic model and are thus susceptible to modeling errors and changes in the model structure. An alternative solution is sliding mode control (Slotine, 1985), which has been shown to be robust to system modeling errors but is susceptible to control chattering due to its discontinuity across sliding surfaces (Yao and Tomizuka, 1996).
Whereas adaptive control strategies assume an underlying dynamic structure to the system, model learning controllers attempt to learn the dynamic model of the system. An early approach to this problem, named MEMory (Burdet and Codourey, 1998), involves storing the dynamics of a system in memory for use as the feedforward term of subsequent runs, assuming that the system repeats the same trajectory. More recently, statistical regression approaches have been used to infer the optimal structure to describe the observed data, making it possible to encode nonlinearities whose structure may not be well known. Solutions to this form of supervised learning can be broadly categorized into two types (Vijayakumar et al., 2005): global methods such as Gaussian Process Regression (GPR) (Rasmussen and Williams, 2006) and Support Vector Regression (SVR) (Nguyen-Tuong et al., 2009), and local methods such as Locally Weighted Projection Regression (LWPR). Recent studies comparing these learning methods (Nguyen-Tuong et al., 2008) show that while SVR and GPR can potentially yield higher accuracy than LWPR, their computational cost is still prohibitive for online incremental learning.

Rather than learning the underlying model structure of the system, Iterative Learning Control (ILC) (Arimoto et al., 1984; Bristow et al., 2006) incorporates information from error signals in previous iterations to directly modify the control input for subsequent iterations. However, ILC is limited primarily to systems which track a specific repeating trajectory and are subject to repeating disturbances (Bristow et al., 2006), whereas model learning approaches such as LWPR can be incrementally trained to deal with non-repeating trajectories.

(Burdet and Codourey, 1998) compared various non-parametric learning approaches, including Neural Networks and the MEMory algorithm, to model-based adaptive controllers. An experimental comparison of several adaptive laws is given in (Whitcomb et al., 1993). However, to the authors' knowledge, since these papers there has not been any work comparing the newer generation of regression-based learning techniques to model-based control strategies. Hence, this paper focuses on evaluating LWPR as an alternative control method to traditional model-based control schemes. The performance of the LWPR controller is compared to model-based controllers, i.e. Resolved Acceleration (Sciavicco and Siciliano, 2000) and Adaptive Computed Torque (Craig et al., 1986). A quantitative analysis and comparison of the performance of each controller is given by carrying out simulations involving various trajectories, acceleration and velocity profiles, as well as parameter uncertainty. By analyzing the performance of each controller under these conditions, this paper also aims to identify scenarios for which each controller is best suited.
2. OVERVIEW OF MODEL-BASED CONTROL
The dynamic equation of a manipulator characterizes the relationship between its motion (position, velocity and acceleration) and the joint torques (Sciavicco and Siciliano, 2000):

$M(q)\ddot{q} + C(q, \dot{q}) + G(q) = \tau$    (1)

where $q$ is the $n \times 1$ vector of joint angles for an n-degree-of-freedom (DOF) manipulator, $M(q)$ is the $n \times n$ inertia matrix, $C(q, \dot{q})$ is the $n \times 1$ centripetal and Coriolis force vector, $G(q)$ is the $n \times 1$ gravity loading vector and $\tau$ is the $n \times 1$ torque vector. Equation (1) does not include additional torque components caused by friction, backlash, actuator dynamics and contact with the environment. If accounted for, these components are modeled as additional terms in (1).

Model-based controllers apply the joint space dynamic equation (1) to cancel the nonlinear and coupling effects of the manipulator. A common example of this is the computed torque approach (Sciavicco and Siciliano, 2000), in which the control signal $u$ is composed of the computed torque signal $u_{CT}$, which is set to the torque determined directly from (1). This term globally linearizes and decouples the system, and thus a linear controller can be applied for the feedback term $u_{FB}$, which provides stability and disturbance rejection. Typically a PD scheme is used. Desirable performance of the computed torque approach is based on the assumption that the values of the parameters in (1) match the actual parameters of the physical system. Otherwise, imperfect cancellation of the nonlinearities and coupling occurs. Hence, the resulting system is not fully linearized and decoupled, and thus higher feedback gains are necessary to achieve good performance.
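To make the structure of this scheme concrete, the sketch below shows one common inverse-dynamics/PD variant of computed torque control, assuming hypothetical callables M(q), C(q, qd) and G(q) that return the corresponding terms of (1). It is an illustrative sketch under those assumptions, not the exact controller implementation used in the paper.

```python
import numpy as np

def computed_torque_control(q, qd, q_des, qd_des, qdd_des, M, C, G, Kp, Kd):
    """One step of computed torque control with PD feedback.

    M, C, G are assumed callables returning the model terms of (1);
    Kp, Kd are (n, n) PD gain matrices. Illustrative sketch only.
    """
    e = q_des - q                      # joint position error
    ed = qd_des - qd                   # joint velocity error
    a = qdd_des + Kd @ ed + Kp @ e     # desired acceleration plus PD feedback
    # Feedback linearization: cancel the nonlinear and coupling terms of (1)
    return M(q) @ a + C(q, qd) + G(q)
```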
In practice, the dynamic parameters of a manipulator are not known precisely enough to perfectly cancel the nonlinear and coupling terms (Craig et al., 1986). One solution is the adaptive control approach (Craig et al., 1986), (Ortega and Spong, 1988). In addition to an underlying control objective, an adaptive controller also incorporates a parameter update law which estimates the unknown parameters based on the tracking error. In (Craig et al., 1986), an adaptive version of the computed torque control method is presented. In order to estimate the inertia parameters of the robot, the dynamic model (1) is reformulated as:

$\tau = \phi(q, \dot{q}, \ddot{q})\,\theta$    (2)

where $\phi$ is an $n \times r$ regressor matrix which depends on the kinematics of the robot and $\theta$ is an $r \times 1$ vector of unknown parameters. This model is linear in the parameters, allowing a Lyapunov-based parameter update law to be implemented:

$\dot{\hat{\theta}} = \gamma\,\phi^{T}\hat{M}^{-1}e$    (3)

where $\hat{\theta}$ is the estimate of the unknown inertia parameters, $\gamma$ is an $r \times r$ gain matrix, $\hat{M}$ is the estimated inertia matrix, $e$ is the filtered servo error and $r$ is the number of unknown parameters. For this paper, the inertia parameters (Khosla, 1989), as well as the Coulomb and viscous friction parameters, are treated as unknowns.
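A minimal sketch of how the update law (3) can be integrated numerically is given below, assuming that the regressor, the estimated inertia matrix and the filtered servo error are available at each model update step; the 5 ms update period and the adaptive gain of 0.01 reported later in the paper would be passed in as dt and gamma.

```python
import numpy as np

def update_parameter_estimate(theta_hat, phi, M_hat, e, gamma, dt):
    """Euler integration of the adaptive law (3): d(theta_hat)/dt = gamma * phi^T * M_hat^-1 * e.

    theta_hat : (r,) current parameter estimate
    phi       : (n, r) regressor matrix at the current state
    M_hat     : (n, n) estimated inertia matrix
    e         : (n,) filtered servo error
    gamma     : (r, r) adaptive gain matrix
    dt        : model update period in seconds
    """
    theta_hat_dot = gamma @ phi.T @ np.linalg.solve(M_hat, e)
    return theta_hat + dt * theta_hat_dot
```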
3. LEARNING INVERSE DYNAMICS
While the adaptive approach requires accurate knowledge of the structure of the dynamic model of the manipulator, the learning approach obtains a model using measured data (Nguyen-Tuong et al., 2009), allowing unknown or unmodeled nonlinearities such as friction and backlash to be accounted for. In order to be practical for manipulator control, learning algorithms must process continuous streams of training data to update the model and predict outputs fast enough for real-time control. Locally Weighted Projection Regression (LWPR) achieves these objectives using nonparametric statistics (Schaal et al., 2002).

The problem of learning the inverse dynamics relationship in the joint space can be described as the map from joint positions, velocities and accelerations to torques

$(q, \dot{q}, \ddot{q}_d) \mapsto \tau$    (4)

where $\tau$ is the $n \times 1$ torque vector and $q$ is the $n \times 1$ vector of generalized coordinates.

LWPR approximates this mapping with a set of piecewise local linear models based on the training data that the algorithm receives. Formally stated, this approach assumes a standard regression model of the form

$y = f(X) + \varepsilon$    (5)

where $X$ is the input vector, $y$ the output vector and $\varepsilon$ a zero-mean random noise term. For a single output dimension of $y$, given a data point $X_c$ and a subset of data close to $X_c$, with an appropriately chosen measure of closeness, a linear model can be fit to the subset of data:

$y_{ik} = \beta_{ik}^{T}X + \varepsilon$    (6)

where $y_{ik}$ denotes the $k$-th subset of data close to $X_c$ corresponding to the $i$-th output dimension and $\beta_{ik}$ is the set of parameters of the hyperplane that describes $y_{ik}$. The region of validity, termed the receptive field (Vijayakumar et al., 2005), is given by

$w_{ik} = \exp\!\left(-\tfrac{1}{2}(X - X_{ck})^{T}D_{ik}(X - X_{ck})\right)$    (7)

where $w_{ik}$ determines the weight of the $k$-th local linear model of the $i$-th output dimension (i.e. the $ik$-th local linear model), $X_{ck}$ is the centre of the $k$-th linear model and $D_{ik}$ is a positive semidefinite distance parameter which determines the size of the $ik$-th receptive field. Given a query point $X$, LWPR calculates a predicted output

$\hat{y}_i(X) = \sum_{k=1}^{K} w_{ik}\hat{y}_{ik} \Big/ \sum_{k=1}^{K} w_{ik}$    (8)

where $K$ is the number of linear models and $\hat{y}_{ik}$ is the prediction of the $ik$-th local linear model given by (6), weighted by the $w_{ik}$ associated with its receptive field. Thus, the prediction $\hat{y}_i(X)$ is the weighted sum of all the predictions of the local models, where the models having receptive fields centered closest to the query point are most significant to the prediction. This prediction is repeated for each of the $i$ dimensions of the output vector $y$.

Determining the set of parameters $\beta$ of the hyperplane is done via regression, but this can be a time-consuming task in the presence of high-dimensional input data. To reduce computational effort, LWPR assumes that the data can be characterized by local low-dimensional distributions, and attempts to reduce the dimensionality of the input space $X$ using Partial Least Squares regression (PLS). PLS fits linear models using a set of univariate regressions along selected projections in input space, which are chosen according to the correlation between input and output data (Schaal et al., 2002).
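The sketch below illustrates the prediction step of equations (6)-(8) for a single output dimension: each local model contributes a linear prediction that is weighted by its Gaussian receptive field (7), and the results are normalized as in (8). It omits the PLS projections and the incremental updates of the receptive fields that the full LWPR algorithm performs; the centred form of the local models and the explicit offset term are implementation assumptions.

```python
import numpy as np

def lwpr_predict_1d(x, centers, D_list, betas, offsets, w_cutoff=1e-3):
    """Weighted LWPR-style prediction for one output dimension, cf. eq. (8).

    x       : (d,) query point
    centers : list of (d,) receptive field centres X_ck
    D_list  : list of (d, d) positive semidefinite distance metrics D_ik
    betas   : list of (d,) local hyperplane parameters beta_ik
    offsets : list of scalar offsets of the local models (assumed centred form)
    """
    num, den = 0.0, 0.0
    for c, D, beta, b0 in zip(centers, D_list, betas, offsets):
        diff = x - c
        w = np.exp(-0.5 * diff @ D @ diff)   # receptive field activation, eq. (7)
        if w < w_cutoff:                     # skip models far from the query point
            continue
        y_local = beta @ diff + b0           # local linear prediction, cf. eq. (6)
        num += w * y_local
        den += w
    # With no activated receptive field the prediction defaults to zero, which
    # matches the behaviour reported when queries leave the trained region.
    return num / den if den > 0.0 else 0.0
```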
4. SIMULATIONS
In order to evaluate the performance of the LWPR learning controller, two 'classical' controllers in the joint space were also implemented: the resolved acceleration (RA) controller (Sciavicco and Siciliano, 2000) and the adaptive computed torque (ACT) controller (Craig et al., 1986; Ortega and Spong, 1988) given by (3). The performance of these controllers was evaluated in simulation using Matlab, the Robotics Toolbox (RTB) (Corke, 1996) and the open source LWPR code (Vijayakumar et al., 2005). LWPR was used to learn the joint space dynamics of a standard 6 DOF Puma 560 with the kinematic and dynamic parameters obtained from the RTB. The control loop executed at 1 ms while the model was updated every 5 ms for both the LWPR and ACT algorithms.

In order to properly assess the performance of the model-based controllers, a trajectory which excites the dynamics of the system caused by inertia, gravity and Coriolis/centripetal effects in (1) must be tracked. The 'figure 8' trajectory (Nakanishi et al., 2008) is used, as it includes both straight and curved sections which induce significant torques from Coriolis/centripetal effects.
The first simulation involves position tracking of the 'figure 8' trajectory in the horizontal XY plane, with a length and width of 0.2 m. The position component of the end effector trajectory was designed in the task space and converted to a joint space trajectory using a numerical inverse kinematics algorithm (Corke, 1996). To evaluate orientation control, a sinusoidal signal was input as the desired angular velocity of the end effector. Figure 8 frequencies of approximately 0.25 and 0.3 Hz were used to test the tracking capabilities of the controllers under various velocities. The LWPR controller was trained for 60 s on the 0.25 Hz trajectory, after which training was stopped and tracking performance was evaluated. The system was then allowed to train for another 120 s. Next the trajectory frequency was increased to 0.3 Hz, and the system was trained for an additional 30 s, after which tracking performance was evaluated. These results are then compared against the performance of the RA controller. The durations of training were determined by observing the mean squared error (MSE) of the predicted torques from the LWPR controller. Training was stopped when the observed MSE had asymptotically decreased to a low value. Since no parameter perturbation was introduced yet, only the RA and LWPR controllers are compared in the first simulation.
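For illustration, one common way to parameterize such a planar 'figure 8' path is as a Lissajous curve, sketched below. The exact parameterization used in the paper follows (Nakanishi et al., 2008) and may differ; the workspace centre point chosen here is an assumption.

```python
import numpy as np

def figure_eight(t, freq=0.25, length=0.2, width=0.2, centre=(0.1, -0.3)):
    """Task-space figure-8 (Lissajous) position at time t [s].

    freq is the repetition frequency in Hz; length and width give the overall
    extent of the pattern in metres; centre is an assumed workspace point.
    """
    w = 2.0 * np.pi * freq
    x = centre[0] + 0.5 * length * np.sin(w * t)       # one lobe per period in x
    y = centre[1] + 0.5 * width * np.sin(2.0 * w * t)  # two lobes per period in y
    return np.array([x, y])

# Example: sample one period of the 0.25 Hz pattern at the 1 ms control rate
t = np.arange(0.0, 4.0, 0.001)
xy = np.array([figure_eight(ti) for ti in t])
```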
The second simulation involves tracking the 'figure 8' pattern with varied end effector loads, thereby introducing model parameter errors. The trained system from the first simulation (0.25 Hz and 180 s of training) was used to track the 'figure 8' trajectory, but with additional end effector masses of 0.5 and 1 kg. The same conditions were repeated on the ACT and RA controllers. After 30 s of training time for the LWPR controller and ACT controller, the performance of all three controllers was evaluated.

The third simulation adds varying amounts of error in the inertia parameters of the model while observing the resulting performance of the three controllers when tracking the 'figure 8' trajectory. Training and adaptation of these models was done in the same manner as above.
The fourth simulation introduces friction in addition to inertia parameter uncertainty while tracking a persistently exciting (PE) trajectory as described in (Craig et al., 1986), which is designed directly in the joint space as a linear combination of sinusoids. The resulting task space trajectory is a cardioid-like shape, as seen in Figure 2. This trajectory is used for two reasons. Firstly, by using a trajectory with significant frequency content, the effect of persistence of excitation on the tracking performance of ACT can be evaluated. Secondly, by shifting the operating range of the manipulator away from that of the 'figure 8' trajectory, the generalization performance of LWPR will be tested. Furthermore, both Coulomb and viscous friction are introduced into the simulation. In order to assess the ACT controller's ability to cope with unmodeled dynamics, two cases are tested: one in which both Coulomb and viscous friction are accounted for in equation (1), and one in which only Coulomb friction is modeled. Friction is modeled as:

$\tau_f = c\,\mathrm{sign}(\dot{q}) + v\,\dot{q}$    (9)

where $\tau_f$ is the torque due to Coulomb and viscous friction, $c$ is the Coulomb friction constant, and $v$ is the viscous friction constant. The friction constants were obtained from the defaults for the Puma 560 in the RTB.
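Equation (9) can be transcribed directly, treating the friction constants as per-joint vectors (the actual values come from the RTB Puma 560 defaults):

```python
import numpy as np

def friction_torque(qd, c, v):
    """Joint friction torque from eq. (9): tau_f = c*sign(qd) + v*qd.

    qd : (n,) joint velocities
    c  : (n,) Coulomb friction constants
    v  : (n,) viscous friction constants
    """
    return c * np.sign(qd) + v * qd
```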
4.1 Parameter Tuning and Initialization
The stability of the ACT controller was found to be highly sensitive to the adaptive gain parameter, γ (3). While a higher value of γ generally results in faster adaptation, it increases the system's sensitivity to noise and to numerical errors from integration of the time derivative of the estimated parameters (3). An adaptive gain of 0.01 was found to be the best tradeoff.

Although LWPR incorporates many algorithms which enable the system to automatically adjust its parameters for optimal performance, the initial values of these parameters can significantly impact the convergence rate. The initial value for the distance parameter D (7) dictates how large a receptive field is upon initialization. Too small a value of D (corresponding to large receptive fields) tends to delay convergence, while too large a value of D results in overfitting of the data (Vijayakumar et al., 2005). This parameter was generally tuned through a trial-and-error process which involved monitoring the MSE of the predicted values during the training phase. The initial performance of the LWPR controller is also highly dependent upon the data sets that are used to train the LWPR model. Because LWPR is a local learning approach, it must be trained in the region(s) of input space that the manipulator will be operating in. In order to train the model, a low-gain PD controller was used to track the desired trajectory while the LWPR model obtained training data according to the mapping in (4). The initial value of the distance parameter, D, was set to 0.05 for each input dimension.
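The training procedure described above can be sketched as follows: a low-gain PD controller tracks the desired trajectory while (q, q̇, q̈_d) → τ samples are fed to the learner, and the running MSE of the torque predictions is monitored to decide when to stop training. The `robot.step`/`robot.state` simulation helpers and the `model.update`/`model.predict` methods are assumed stand-ins, not the actual API of the open-source LWPR code.

```python
import numpy as np

def collect_and_train(model, robot, trajectory, Kp, Kd, dt=0.001, update_every=5):
    """Train an inverse dynamics model while a low-gain PD controller tracks `trajectory`.

    trajectory yields (q_des, qd_des, qdd_des) samples at the control rate (1 ms);
    the model is queried and updated every `update_every` steps (5 ms in the paper).
    Returns the mean squared torque prediction error over the run.
    """
    mse_sum, n_updates = 0.0, 0
    q, qd = robot.state()                              # assumed helper: current joint state
    for k, (q_des, qd_des, qdd_des) in enumerate(trajectory):
        tau = Kp @ (q_des - q) + Kd @ (qd_des - qd)    # low-gain PD tracking torque
        x = np.concatenate([q, qd, qdd_des])           # input of the mapping (4)
        if k % update_every == 0:
            tau_hat = model.predict(x)                 # assumed learner method
            mse_sum += np.mean((tau - tau_hat) ** 2)
            n_updates += 1
            model.update(x, tau)                       # incremental training sample
        q, qd = robot.step(tau, dt)                    # assumed helper: advance simulation
    return mse_sum / max(n_updates, 1)
```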
4.2 Results
The LWPR model was trained on the 'figure 8' trajectory at 0.25 Hz, enabling it to predict the necessary torques for tracking at frequencies near 0.25 Hz. As seen in Figure 1 and Table 1, after an additional training period of 40 s, the LWPR controller compensated for the 0.3 Hz trajectory, allowing it to perform nearly as well as the ideal RA controller. This illustrates the ability of the LWPR controller to rapidly adjust to changes in its operating conditions. However, frequencies greater than 0.3 Hz were sufficient to push the system far enough from the trained region of input space, eventually resulting in the prediction of zero for all the joint torques. A potential solution is a more complete initial training of the LWPR model. If the initial training set were to include a larger subset of the input space, obtained through motor babbling (Peters and Schaal, 2008) for example, it is expected that the LWPR controller would be able to handle larger perturbations.

Table 1. Frequency - RMS tracking error (mm, deg)

  Controller     0.25 Hz       0.30 Hz
  RA             1.80, 0.15    2.03, 0.22
  LWPR, 60 s     2.71, 1.31    /
  LWPR, 180 s    1.95, 0.25    2.20, 0.42
Fig. 1. Position Tracking Error at 0.25 Hz: end effector path X vs Xref in the x-y plane [m], showing the reference Xr and the RA, LWPR 60s and LWPR 180s controllers (plot not reproduced).
The second simulation evaluated the ability of the controllers to handle unmodeled end effector loads. As seen in Table 2, due to the unknown mass of the end effector, imperfect linearization and decoupling cause a decrease in the tracking performance of the RA controller. Both the LWPR and ACT controllers are able to outperform the RA controller after 30 s of additional training. Although the ACT controller has a-priori knowledge of the structure of the dynamic equation of the manipulator, it does not perform any better than the LWPR controller in position tracking after the same length of adaptation time. This is due to the slow convergence of the estimated masses to their actual values, which can in turn be explained by the relatively low adaptive gain and the lack of a PE trajectory. However, as seen in the orientation results, by tracking sinusoidal angular velocities on each joint of the wrist, the ACT controller yields much better orientation tracking than the LWPR controller, illustrating the importance of PE trajectories in the performance of the ACT controller. The LWPR controller was able to learn the inverse dynamics for both the 0.5 kg and 1 kg payloads, but not for masses greater than 1 kg. Similar to the first simulation, a sufficiently large disturbance will push the system to operate in a region outside its training, thus yielding poor tracking performance.

Table 2. Payload - RMS tracking error (mm, deg)

  Controller     +0.5 kg       +1 kg
  RA             4.95, 0.60    16.45, 1.15
  ACT, 40 s      4.44, 0.44    2.70, 1.26
  LWPR, +40 s    4.16, 0.65    7.01, 1.29

Simulation three introduces parameter estimate errors into all the link masses, centre of mass locations and the moments of inertia of each link. Table 3 illustrates that the inaccurate knowledge of the dynamic parameters causes significant degradation in performance for the RA controller, due to the imperfect linearization of the system dynamics. As seen in Table 3, the performance of the ACT controller was particularly poor in comparison to the LWPR controller, and even the RA controller. This can be explained by the fact that the perturbation of the inertia parameters was applied to all the joints of the manipulator, unlike the case of end effector loading where only the parameters of one link were perturbed. Hence, it is expected that the adaptive controller would require both a persistently exciting trajectory, which excites all the dynamic modes of the structure (Craig et al., 1986), and a longer adaptation time than 40 s to yield better tracking results in this scenario. The importance of PE trajectories will be illustrated in the next simulation. For the LWPR controller, similar findings to those of simulations one and two were observed, in that inertia parameter perturbations greater than 10% were sufficient to prevent LWPR from predicting accurate joint torques.

Table 3. Parameter Error - RMS tracking error (mm, deg)

  Controller     +1%           +5%
  RA             2.15, 0.25    2.65, 0.30
  ACT, 40 s      2.20, 0.12    2.75, 0.15
  LWPR, +40 s    1.55, 0.75    1.66, 0.86

Simulation four introduces the PE trajectory and Coulomb and viscous friction in addition to inertia parameter error. As seen in Table 4, the RA controller performed the poorest due to the error in its model parameters. When the ACT is presented with the structure of the full friction model, the resulting tracking performance is significantly better than that of the RA controller.
Fig. 2. PE Trajectory - 5% inertia and friction error: end effector path X vs Xref in the x-y plane [m], showing the reference Xr and the RA, LWPR and ACT controllers (plot not reproduced).
However, when only partial knowledge of the model is present (in this case only viscous friction), the performance gain is no longer present. Unlike simulation three, the ACT controller now outperforms the LWPR controller, provided that the friction model is fully known. This further illustrates the importance of 1) a persistently exciting trajectory and 2) accurate knowledge of the structure of the dynamic model when using adaptive control.

The PE trajectory was also chosen to be significantly different from the 'figure 8' in order to test the generalization of the LWPR model. The first test involved using the model that was learned in simulation three to attempt to track the PE trajectory. This model had seen roughly 10,000 training points, all of which were localized to the 'figure 8' trajectory. As expected, this LWPR model was unable to predict adequate torques over the operating range of the PE trajectory. Hence, the LWPR model had to be re-trained on the PE trajectory, taking roughly 240 seconds to achieve good performance in the presence of friction. This simulation was then repeated with a model that was initialized through the use of motor babbling (Peters and Schaal, 2008). Here, a joint space trajectory was made by randomly selecting a point in the robot's expected operating range, about which small sinusoidal trajectories were executed by the joints. This sequence was repeated at different points until sufficient coverage of the operating range was seen. Roughly 10,000 training points were generated from motor babbling. A PD controller was then used to track this trajectory and the resulting data was used to train the LWPR model. As seen in Table 4, by initializing the LWPR model this way, good performance on the PE trajectory could be learned in half the time compared to the case without motor babbling.

Table 4. Parameter Error and Friction - RMS tracking error (mm, deg)

  Controller                       +1%          +5%
  RA                               2.45, 0.55   3.10, 0.80
  ACT, partial friction, 40 s      2.0, 0.40    2.1, 0.45
  ACT, full friction, 40 s         1.65, 0.30   1.70, 0.32
  LWPR, 240 s                      1.75, 0.45   1.80, 0.50
  LWPR, motor babbling, 120 s      1.82, 0.50   1.85, 0.55
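The motor babbling initialization can be sketched as below: random set-points are drawn within the robot's expected operating range and small sinusoidal excursions are commanded about each one, with the resulting trajectory tracked by a PD controller to generate training data. The amplitude, frequency and dwell time used here are illustrative assumptions.

```python
import numpy as np

def motor_babbling_trajectory(q_min, q_max, n_points=50, amp=0.1, freq=0.5,
                              dwell=2.0, dt=0.001, rng=None):
    """Joint-space babbling trajectory: small sinusoids about random set-points.

    q_min, q_max : (n,) joint limits of the expected operating range [rad]
    amp, freq    : sinusoid amplitude [rad] and frequency [Hz] (assumed values)
    dwell        : time spent around each random set-point [s]
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(q_min)
    t = np.arange(0.0, dwell, dt)
    segments = []
    for _ in range(n_points):
        q0 = rng.uniform(q_min, q_max)                  # random point in the operating range
        phase = rng.uniform(0.0, 2.0 * np.pi, size=n)   # de-synchronize the joints
        seg = q0 + amp * np.sin(2.0 * np.pi * freq * t[:, None] + phase)
        segments.append(seg)
    return np.vstack(segments)   # (n_points * dwell/dt, n) array of joint set-points
```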
Even without any a-priori knowledge of the structure or parameters of the dynamics, LWPR is able to learn the inverse dynamics function of a manipulator accurately enough to yield near-optimal control results within minutes of training. Unlike the adaptive controller, which relies on persistence of excitation for tracking performance, the LWPR approach can be trained on an arbitrary trajectory provided that it is given sufficient time to learn.

When tracking a PE trajectory with a known dynamic model, the ACT clearly outperforms the LWPR controller in terms of tracking accuracy and adaptation time, which is expected due to its incorporation of a-priori knowledge. However, if LWPR is given sufficient time to learn, its performance will closely approach that of ACT, though it will not surpass it due to the use of local linear approximations of the system dynamics. The ACT controller is nevertheless at a disadvantage since not all trajectories meet the PE requirement. For this reason, the identification of system parameters is often done offline on a predetermined trajectory which is optimized to yield the best parameter estimates (Khosla, 1989). While this may yield results better than the LWPR controller, the benefits of online, incremental learning, where LWPR excels, are lost.

The performance of LWPR outside of the areas in which it has trained is poor. This was clearly illustrated when a large perturbation to the inertia parameters caused the

Citations

Journal ArticleDOI
Modelling and Control of Robot Manipulators

473 citations


Proceedings ArticleDOI
24 Dec 2012
TL;DR: The proposed approach for online learning of the inverse dynamics model using Gaussian Process Regression is compared to existing learning and fixed control algorithms and shown to be capable of fast initialization and learning rate.
Abstract: Model-based control strategies for robot manipulators can present numerous performance advantages when an accurate model of the system dynamics is available. In practice, obtaining such a model is a challenging task which involves modeling such physical processes as friction, which may not be well understood and difficult to model. This paper proposes an approach for online learning of the inverse dynamics model using Gaussian Process Regression. The Sparse Online Gaussian Process (SOGP) algorithm is modified to allow for incremental updates of the model and hyperparameters. The influence of initialization on the performance of the learning algorithms, based on any a-priori knowledge available, is also investigated. The proposed approach is compared to existing learning and fixed control algorithms and shown to be capable of fast initialization and learning rate.

15 citations


Cites background from "A comparison of classical and learn..."

  • ...While adaptive control can provide an online estimate of the dynamic parameters, is still reliant upon adequate knowledge of the structure of the dynamic model and is thus particularly susceptible to the effects of unmodeled dynamics [13]....

    [...]

  • ...A comparison between learning approaches such as LWPR and classical control techniques [13] shows that both the adaptive controller and LWPR controller have comparable performance in the presence of parametric uncertainty....

    [...]


Journal ArticleDOI
TL;DR: The proposed approach for online learning of the dynamic model of a robot manipulator is tested on an industrial robot, and shown to outperform independent joint and fixed model-based control.
Abstract: This paper proposes an approach for online learning of the dynamic model of a robot manipulator. The dynamic model is formulated as a weighted sum of locally linear models, and Locally Weighted Projection Regression (LWPR) is used to learn the models based on training data obtained during operation. The LWPR model can be initialized with partial knowledge of rigid body parameters to improve the initial performance. The resulting dynamic model is used to implement a model-based controller. Both feedforward and feedback configurations are investigated. The proposed approach is tested on an industrial robot, and shown to outperform independent joint and fixed model-based control.

12 citations


Cites background or methods from "A comparison of classical and learn..."

  • ...A technique for improving the initial learning performance by making use of full or partial knowledge of the rigid body model is applied (de la Cruz et al. (2011b))....

    [...]

  • ...However, both dynamic parameter estimation and adaptive control methods assume a known dynamic model structure, and can be sensitive to error when the structure is not modeled accurately (de la Cruz et al. (2011a))....

    [...]

  • ...In order to improve the generalization performance of LWPR, an algorithm for incorporating a-priori knowledge from the RBD model (1) into the LWPR algorithm is applied (de la Cruz et al. (2011b))....

    [...]

  • ...The same motor babbling strategy as in (de la Cruz et al. (2011a)) was applied for this experiment, resulting in roughly 50,000 initial training points from motor babbling....

    [...]

  • ...Although LWPR has the ability to learn in an online, incremental manner due to its local learning approach, performance deteriorates quickly as the system moves outside of the region of state space it has been trained in (de la Cruz et al. (2011a))....

    [...]


Proceedings ArticleDOI
21 May 2018
TL;DR: This work proposes a scheme for collecting a sample of correspondences from the robots for training transfer models, and demonstrates the benefit of knowledge transfer in accelerating online learning of the inverse dynamics model between several robots, including between a low-cost Interbotix PhantomX Pincher arm and a more expensive and relatively heavier Kuka youBot arm.
Abstract: Online learning of a robot's inverse dynamics model for trajectory tracking necessitates an interaction between the robot and its environment to collect training data. This is challenging for physical robots in the real world, especially for humanoids and manipulators, due to their large and high dimensional state and action spaces, as a large amount of data must be collected over time. This can put the robot in danger when learning tabula rasa and can also be a time-intensive process, especially in a multi-robot setting, where each robot is learning its model from scratch. We propose accelerating learning of the inverse dynamics model for trajectory tracking tasks in this multi-robot setting using knowledge transfer, where robots share and re-use data collected by preexisting robots, in order to speed up learning for new robots. We propose a scheme for collecting a sample of correspondences from the robots for training transfer models, and demonstrate, in simulations, the benefit of knowledge transfer in accelerating online learning of the inverse dynamics model between several robots, including between a low-cost Interbotix PhantomX Pincher arm and a more expensive and relatively heavier Kuka youBot arm. We show that knowledge transfer can save up to 63% of the training time of the youBot arm compared to learning from scratch, and about 58% for the lighter Pincher arm.

10 citations


Cites background or methods from "A comparison of classical and learn..."

  • ...In some cases it has been shown that learning can be accelerated by initializing the model with random data generated through a motor babbling process [28], [29]....

    [...]

  • ...Random initialization We also separately initialize learning with random data generated in the motor babbling session as a benchmark, denoted ‘random’, as it has previously been shown to accelerate learning [28], [29]....

    [...]


References

Book
23 Nov 2005
TL;DR: The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.
Abstract: A comprehensive and self-contained introduction to Gaussian processes, which provide a principled, practical, probabilistic approach to learning in kernel machines. Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.

11,343 citations


Journal ArticleDOI
TL;DR: A betterment process for the operation of a mechanical robot is proposed, in the sense that it betters the next operation of a robot by using the previous operation's data.
Abstract: This article proposes a betterment process for the operation of a mechanical robot in a sense that it betters the next operation of a robot by using the previous operation's data. The process has an iterative learning structure such that the (k + 1)th input to joint actuators consists of the kth input plus an error increment composed of the derivative difference between the kth motion trajectory and the given desired motion trajectory. The convergence of the process to the desired motion trajectory is assured under some reasonable conditions. Numerical results by computer simulation are presented to show the effectiveness of the proposed learning scheme.

2,966 citations


"A comparison of classical and learn..." refers background in this paper

  • ...Rather than learning the underlying model structure of the system, Iterative learning control (ILC) (Arimoto et al., 1984; Bristow et al., 2006) incorporates information from error signals in previous iterations to directly modify the control input for subsequent iterations....

    [...]

  • ...However, ILC is limited primarily to systems which track a specific repeating trajectory and are subject to repeating disturbances (Bristow et al., 2006), whereas the model learning approaches such as LWPR can be incrementally trained to deal with non-repeating trajectories....

    [...]


Journal ArticleDOI
TL;DR: This survey is the first to bring to the attention of the controls community the important contributions from the tribology, lubrication and physics literatures, and provides a set of models and tools for friction compensation which will be of value to both research and application engineers.
Abstract: While considerable progress has been made in friction compensation, this is, apparently, the first survey on the topic. In particular, it is the first to bring to the attention of the controls community the important contributions from the tribology, lubrication and physics literatures. By uniting these results with those of the controls community, a set of models and tools for friction compensation is provided which will be of value to both research and application engineers. The successful design and analysis of friction compensators depends heavily upon the quality of the friction model used, and the suitability of the analysis technique employed. Consequently, this survey first describes models of machine friction, followed by a discussion of relevant analysis techniques and concludes with a survey of friction compensation methods reported in the literature. An overview of techniques used by practising engineers and a bibliography of 280 papers is included.

2,545 citations


"A comparison of classical and learn..." refers background in this paper

  • ...In practice, obtaining such a model is a challenging task which involves modeling physical processes that are not well understood or difficult to model, such as friction (Armstrong-Hélouvry et al., 1994) and backlash....

    [...]

  • ...In practice, obtaining such a model is a challenging task which involves modeling physical processes that are not well understood or difficult to model, such as friction (Armstrong-Hélouvry et al., 1994) and backlash....

    [...]


Journal ArticleDOI
TL;DR: Though beginning its third decade of active research, the field of ILC shows no sign of slowing down and includes many results and learning algorithms beyond the scope of this survey.
Abstract: This article surveyed the major results in iterative learning control (ILC) analysis and design over the past two decades. Problems in stability, performance, learning transient behavior, and robustness were discussed along with four design techniques that have emerged as among the most popular. The content of this survey was selected to provide the reader with a broad perspective of the important ideas, potential, and limitations of ILC. Indeed, the maturing field of ILC includes many results and learning algorithms beyond the scope of this survey. Though beginning its third decade of active research, the field of ILC shows no sign of slowing down.

2,261 citations


"A comparison of classical and learn..." refers background in this paper

  • ...Rather than learning the underlying model structure of the system, Iterative learning control (ILC) (Arimoto et al., 1984; Bristow et al., 2006) incorporates information from error signals in previous iterations to directly modify the control input for subsequent iterations....

    [...]

  • ...However, ILC is limited primarily to systems which track a specific repeating trajectory and are subject to repeating disturbances (Bristow et al., 2006), whereas the model learning approaches such as LWPR can be incrementally trained to deal with non-repeating trajectories....

    [...]

  • ...limited primarily to systems which track a specific repeating trajectory and are subject to repeating disturbances (Bristow et al., 2006), whereas the model learning approaches such as LWPR can be incrementally trained to deal with non-repeating trajectories....

    [...]


Journal ArticleDOI
Abstract: A new adaptive robot control algorithm is derived, which consists of a PD feedback part and a full dynamics feedforward compensation part, with the unknown manipulator and payload parameters being estimated online. The algorithm is computationally simple, because of an effective exploitation of the structure of manipulator dynamics. In particular, it requires neither feedback of joint accelerations nor inversion of the estimated inertia matrix. The algorithm can also be applied directly in Cartesian space.

2,033 citations


Frequently Asked Questions (2)
Q1. What have the authors stated for future works in "A comparison of classical and learning controllers" ?

Future studies will involve experimental validation on physical robots as well as incorporating a-priori knowledge of the dynamic model of the system to improve performance of learning methods outside of the trained regions. 

This paper focuses on evaluating Locally Weighted Projection Regression ( LWPR ) as an alternative control method to traditional model-based control schemes.