
Closed-form conditions for convergence of the Gaussian kernel-least-mean-square algorithm

Cédric Richard (1), Jose-Carlos M. Bermudez (2)
(1) Université de Nice Sophia-Antipolis, France
(2) Department of Electrical Engineering, Federal University of Santa Catarina, Florianópolis, SC, Brazil
Abstract—In addition to the choice of the usual linear adaptive filter parameters, designing kernel adaptive filters requires the choice of the kernel and its parameters. One of our recent works has brought a new contribution to the discussion about kernel-based adaptive filtering by providing the first convergence analysis of the kernel-LMS algorithm with Gaussian kernel. A necessary and sufficient condition for convergence has been clearly established. Checking the stability of the algorithm can, unfortunately, be computationally expensive because one needs to calculate the extreme eigenvalues of a large matrix for each set of candidate tuning parameters. The aim of this paper is to circumvent this drawback by examining two easy-to-handle conditions that show how the stability limit varies as a function of the step-size, the kernel bandwidth, and the filter length. One of them is a conjectured necessary and sufficient condition for convergence that greatly simplifies the calculations.
I. INTRODUCTION
Many practical applications require nonlinear signal pro-
cessing. Nonlinear system identification methods based on
reproducing kernel Hilbert spaces (RKHS) have gained popu-
larity over the last decades [2], [6]. Recently, kernel adaptive
filtering has been recognized as an appealing solution to the
nonlinear adaptive filtering problem, as working in RKHS
allows the use of linear structures to solve nonlinear estima-
tion problems. For an overview, see [8]. The block diagram
of a kernel-based adaptive system identification problem is
presented in Figure 1. Here, U is a compact subspace of ℝ^q, κ : U × U → ℝ is a reproducing kernel, (H, ⟨·,·⟩_H) is the induced RKHS with its inner product, and z(n) is a zero-mean additive noise uncorrelated with any other signal. The representer theorem [6] states that the function ψ(·) which minimizes the cost function Σ_{n=1}^{N} (ψ(u(n)) − d(n))², given N input vectors u(n) and desired outputs d(n), can be written as ψ(·) = Σ_{n=1}^{N} α_n κ(·, u(n)). Since the order of the model is equal to the number N of available data u(n), this approach cannot be considered for online applications. To overcome this barrier, authors in the field have focused on finite-order models

    ψ(·) = Σ_{j=1}^{M} α_j κ(·, u(ω_j)).    (1)
In [8], the authors present an overview of the existing tech-
niques to select the M kernel functions in (1) that form the
so-called dictionary, an example of which is the coherence
criterion [12]. The algorithms developed using these ideas
include the kernel least-mean-square (KLMS) algorithm [7],
the kernel recursive-least-square (KRLS) algorithm [3], the
kernel normalized least-mean-square (KNLMS) algorithm and
the kernel affine projection (KAPA) algorithm [5], [12], [13].
In addition to the choice of the usual linear adaptive filter pa-
rameters, designing kernel adaptive filters requires the choice
of the kernel and its parameters. Choosing the algorithm and
nonlinear model parameters to achieve a prescribed perfor-
mance is a difficult task, and requires an extensive analysis of
the algorithm stochastic behavior. Our work [11] has recently
brought a new contribution to the discussion about kernel-
based adaptive filtering by providing the first convergence
analysis of the KLMS algorithm with Gaussian kernel. The
filtering process is defined by
    α(n + 1) = α(n) + η e(n) κ_ω(n),    (2)

where κ_ω(n) = [κ(u(n), u(ω_1)), . . . , κ(u(n), u(ω_M))]^T, and κ(u, u′) is the Gaussian kernel

    κ(u, u′) = exp(−‖u − u′‖² / 2ξ²)    (3)
with kernel bandwidth ξ. In [11], we derived expressions
for the mean-weight-error vector and the mean-square-error.
These models give engineers the opportunity to choose the
algorithm parameters a priori in order to achieve prescribed
convergence speed and quality of the estimate, and allow the
determination of stability limits. Checking the stability of the algorithm (2) can be computationally expensive, as it requires calculating the extreme eigenvalues of an (M² × M²) matrix, say G, for each set of candidate tuning parameters η, M and ξ.
The aim of this paper is to circumvent this drawback by examining two easy-to-handle conditions that show how the stability limit varies as a function of the step-size, the kernel bandwidth, and the filter length. The first one is a sufficient condition based on the Gerschgorin disk theorem, which has already been derived in [11]. The second one is a conjectured necessary and sufficient condition for convergence. It greatly simplifies the calculations and makes it possible to examine how the stability limits vary as a function of the step-size η, the kernel bandwidth ξ, and the filter length M.
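For concreteness, the update (2) with the Gaussian kernel (3) can be transcribed in a few lines of code. The sketch below is an illustrative Python/NumPy version, not the MATLAB code referenced in Section III; it assumes the dictionary {u(ω_1), . . . , u(ω_M)} has already been selected and is stored as the rows of an (M × q) array, it uses e(n) = d(n) − d̂(n), the estimation error of Figure 1, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(u, dictionary, xi):
    """Kernelized input kappa_omega(n): Gaussian kernel (3) between u(n) and each dictionary element."""
    diff = dictionary - u                         # dictionary: (M, q) array, u: length-q vector
    return np.exp(-np.sum(diff**2, axis=1) / (2 * xi**2))

def klms_step(alpha, u, d, dictionary, eta, xi):
    """One iteration of the Gaussian KLMS update (2); returns the new weights and the error e(n)."""
    kappa = gaussian_kernel(u, dictionary, xi)    # kappa_omega(n), shape (M,)
    e = d - alpha @ kappa                         # e(n) = d(n) - d_hat(n)
    return alpha + eta * e * kappa, e
```

Iterating klms_step over the input stream yields the weight trajectory α(n) whose stochastic behavior is analyzed below.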
II. CONVERGENCE ANALYSIS
Let v(n) = α(n) − α_opt be the weight-error vector. Let the vector c_v(n) be the lexicographic representation of the autocorrelation matrix C_v(n) = E{v(n) v^T(n)}, i.e., the matrix C_v(n) is stacked column-wise into the vector c_v(n).

Fig. 1. Kernel-based adaptive system identification.
It was shown in [11] that, under some simplifying statistical assumptions, we have

    c_v(n + 1) = G c_v(n) + η² J_min r_κκ    (4)

with r_κκ the lexicographic representation of the correlation matrix R_κκ = E{κ_ω(n) κ_ω^T(n)} of the kernelized input, and J_min the minimum MSE corresponding to the optimum weight vector α_opt = R_κκ^{−1} p_κd, where p_κd = E{d(n) κ_ω(n)} is the cross-correlation vector between κ_ω(n) and d(n). Matrix G, of size (M² × M²), is defined as

    G = [h_11  h_12  . . .  h_1M  . . .  h_MM]    (5)
with h_ℓp the (M² × 1) lexicographic representation of the matrix H_ℓp, given by

if (i = j):
    [H_ii]_ii = 1 − 2η r_md + η² µ1
    [H_ii]_pp = η² µ3,                              p ≠ i
    [H_ii]_ip = [H_ii]_pi = η² µ2 − η r_od,         p ≠ i
    [H_ii]_pℓ = η² µ4,                              otherwise

if (i ≠ j):
    [H_ij]_ij = [H_ij]_ji = (1/2)(1 − 2η r_md + 2η² µ3)
    [H_ij]_pp = η² µ4,                              p ≠ i, j
    [H_ij]_ii = [H_ij]_jj = η² µ2 − η r_od
    [H_ij]_ip = [H_ij]_pi = (1/2)(2η² µ4 − η r_od), p ≠ i, j
    [H_ij]_pj = [H_ij]_jp = (1/2)(2η² µ4 − η r_od), p ≠ i, j
    [H_ij]_pℓ = η² µ5,                              otherwise
where the µ_k's are the fourth-order moments of the kernelized input, defined as

    µ1 := E{κ_ωi⁴(n)}
    µ2 := E{κ_ωi³(n) κ_ωj(n)}
    µ3 := E{κ_ωi²(n) κ_ωj²(n)}
    µ4 := E{κ_ωi(n) κ_ωj(n) κ_ωℓ²(n)}
    µ5 := E{κ_ωi(n) κ_ωj(n) κ_ωℓ(n) κ_ωp(n)}.    (6)
Parameters r_md and r_od are the main-diagonal and off-diagonal entries of the correlation matrix R_κκ, given by

    r_md := E{κ_ωi²(n)}
    r_od := E{κ_ωi(n) κ_ωj(n)}.    (7)
As extensively explained in [11], the parameters µ_k, r_md and r_od can be calculated theoretically in the case of i.i.d. Gaussian inputs u(n). Their values depend on the moments of u(n), the filter length M, and the kernel bandwidth ξ.
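When those closed-form expressions are not at hand, or the input deviates from the i.i.d. Gaussian assumption, the moments can also be approximated empirically. The sketch below is such a rough fallback, not the closed-form computation of [11]: it averages over input samples for one fixed dictionary and one arbitrary choice of distinct indices (i, j, ℓ, p), which only approximates the ensemble averages used in the analysis, and it reuses the hypothetical gaussian_kernel helper from the sketch in Section I.

```python
import numpy as np

def estimate_moments(inputs, dictionary, xi, rng=np.random.default_rng(0)):
    """Empirical estimates of r_md, r_od and mu_1..mu_5 for a fixed dictionary (requires M >= 4)."""
    M = dictionary.shape[0]
    K = np.array([gaussian_kernel(u, dictionary, xi) for u in inputs])   # (N, M) kernelized inputs
    i, j, l, p = rng.permutation(M)[:4]           # one arbitrary choice of distinct indices
    r_md = np.mean(K[:, i] ** 2)
    r_od = np.mean(K[:, i] * K[:, j])
    mu = (np.mean(K[:, i] ** 4),                                 # mu_1
          np.mean(K[:, i] ** 3 * K[:, j]),                       # mu_2
          np.mean(K[:, i] ** 2 * K[:, j] ** 2),                  # mu_3
          np.mean(K[:, i] * K[:, j] * K[:, l] ** 2),             # mu_4
          np.mean(K[:, i] * K[:, j] * K[:, l] * K[:, p]))        # mu_5
    return r_md, r_od, mu
```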
Before concluding this section, let us introduce the following inequalities relating the fourth-order moments µ_k and the entries of the correlation matrix R_κκ, which will be used in the sequel. Using Hölder's inequality, it can be shown that

    µ5 ≤ µ4 ≤ µ3 ≤ µ2 ≤ µ1,    (8)

and, by virtue of Chebyshev's sum inequality,

    r_md² ≤ µ3.    (9)
We refer the reader to the proofs in [11].
We shall now examine the conditions for convergence
of the Gaussian KLMS algorithm using model (4). It can
be checked that the matrix G is symmetric. This implies
that it can be diagonalized, and all its eigenvalues are real-
valued. A necessary and sufficient condition for convergence
is that all these eigenvalues lie inside (−1, 1) [9, Section 5.9].
First, we shall consider a sufficient condition based on the
Gerschgorin disk theorem. After arguing that these conditions
are too restrictive, we provide an easy-to-handle necessary and
sufficient condition for convergence, based on a conjecture.
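For reference, this brute-force check can be written down directly from the block definitions above. The illustrative sketch below (function names and 0-based indexing are not from [11]) assembles G column by column and tests whether its spectrum lies in (−1, 1); with the dictionary sizes of Tables I–III (M ≤ 17, so G is at most 289 × 289) each test is cheap, but it must be repeated for every candidate triple (η, M, ξ), which is what the conditions derived next avoid.

```python
import numpy as np

def build_G(eta, M, r_md, r_od, mu):
    """Assemble the (M^2 x M^2) matrix G of (5) from the blocks H_lp defined above; mu = (mu1..mu5)."""
    mu1, mu2, mu3, mu4, mu5 = mu
    G = np.empty((M * M, M * M))
    col = 0
    for l in range(M):                             # block index pair (l, p), as in (5)
        for p in range(M):
            if l == p:
                i = l
                H = np.full((M, M), eta**2 * mu4)  # "otherwise" entries of H_ii
                H[i, i] = 1 - 2 * eta * r_md + eta**2 * mu1
                for q in range(M):
                    if q != i:
                        H[q, q] = eta**2 * mu3
                        H[i, q] = H[q, i] = eta**2 * mu2 - eta * r_od
            else:
                i, j = l, p
                H = np.full((M, M), eta**2 * mu5)  # "otherwise" entries of H_ij
                H[i, j] = H[j, i] = 0.5 * (1 - 2 * eta * r_md + 2 * eta**2 * mu3)
                H[i, i] = H[j, j] = eta**2 * mu2 - eta * r_od
                for q in range(M):
                    if q not in (i, j):
                        H[q, q] = eta**2 * mu4
                        H[i, q] = H[q, i] = 0.5 * (2 * eta**2 * mu4 - eta * r_od)
                        H[q, j] = H[j, q] = 0.5 * (2 * eta**2 * mu4 - eta * r_od)
            G[:, col] = H.flatten(order="F")       # column-wise (lexicographic) stacking
            col += 1
    return G

def stable_bruteforce(eta, M, r_md, r_od, mu):
    """Necessary and sufficient condition: all eigenvalues of G lie inside (-1, 1)."""
    eigs = np.linalg.eigvalsh(build_G(eta, M, r_md, r_od, mu))   # G is symmetric
    return float(np.max(np.abs(eigs))) < 1.0
```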
A. Gerschgorin disk conditions
The eigenvalues of matrix G lie inside the union of Ger-
schgorin disks [4], each disk being centered at a diagonal
element of G, with radius given by the sum of the absolute
values of the remaining elements of the same row. A sufficient
condition for stability of (4) is thus given by
    |[G]_ii| + Σ_{ℓ=1, ℓ≠i}^{M²} |[G]_iℓ| < 1    (10)

for i = 1, . . . , M². The definition of G shows that the rows
of this matrix have only two distinct forms, in the sense that
each row of G has the same entries as one of these two
distinct rows, up to a permutation. This implies that only two
Gerschgorin disks can be distinguished. Using (8), it can be
shown that all the entries of G are positive except possibly
for [G]_iℓ = η² µ2 − η r_od and [G]_iℓ = (1/2)(2η² µ4 − η r_od). Expression (10) thus leads to only two sufficient conditions, defined as follows for M ≥ 3,
    λ_ger^(1) := (1 − 2η r_md + η² µ1) + (M − 1) η² µ3
               + 2(M − 1) |η² µ2 − η r_od|
               + (M − 1)(M − 2) η² µ4 < 1,    (11a)

TABLE I
STABILITY RESULTS FOR EXAMPLE 1

    ξ        M    η_max, η_conj    η_ger
    0.0075   17   1.70             0.29
    0.01     13   1.70             0.30
    0.025    6    1.66             0.22
    0.05     3    1.80             1.47

TABLE II
STABILITY RESULTS FOR EXAMPLE 2

    ξ        M    η_max, η_conj    η_ger
    0.05     7    2.33             —
    0.065    4    2.49             0.68
    0.075    3    2.60             1.92
    0.125    2    2.39             2.32

TABLE III
STABILITY RESULTS FOR EXAMPLE 3

    ξ        M    η_max, η_conj    η_ger
    0.15     11   1.17             —
    0.20     7    1.19             —
    0.25     5    1.24             —
    0.30     3    1.59             —
    λ_ger^(2) := (1 − 2η r_md + 2η² µ3) + 2 |η² µ2 − η r_od|
               + (M − 2) η² µ4 + 2(M − 2) |2η² µ4 − η r_od|
               + (M − 2)(M − 3) η² µ5 < 1.    (11b)
The intersection of these two conditions provides the following sufficient condition for stability

    λ_ger(η, M, ξ) := max{λ_ger^(1), λ_ger^(2)} < 1,    (12)

which avoids multiple time-consuming diagonalizations of the matrix G. Solving λ_ger = 1 to derive upper bounds with respect to η, M or ξ requires (basic) numerical methods.
We observe that λ_ger^(1) and λ_ger^(2) are piecewise polynomial functions in η. Because they are both equal to 1 for η = 0, their derivative at the origin must be strictly negative for the conditions (11a)–(11b) to be meaningful. This leads to the condition (M − 1) r_od < r_md, which is very restrictive.
Application examples in Section III show situations where the
Gerschgorin disk test is ineffective.
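Evaluating the bound (12) is nevertheless a direct transcription of (11a)–(11b), as the following illustrative sketch shows (the function name and argument layout are hypothetical):

```python
def lambda_ger(eta, M, r_md, r_od, mu):
    """Gerschgorin bound (12): maximum of the two sufficient conditions (11a)-(11b)."""
    mu1, mu2, mu3, mu4, mu5 = mu
    lam1 = ((1 - 2 * eta * r_md + eta**2 * mu1)                      # (11a)
            + (M - 1) * eta**2 * mu3
            + 2 * (M - 1) * abs(eta**2 * mu2 - eta * r_od)
            + (M - 1) * (M - 2) * eta**2 * mu4)
    lam2 = ((1 - 2 * eta * r_md + 2 * eta**2 * mu3)                  # (11b)
            + 2 * abs(eta**2 * mu2 - eta * r_od)
            + (M - 2) * eta**2 * mu4
            + 2 * (M - 2) * abs(2 * eta**2 * mu4 - eta * r_od)
            + (M - 2) * (M - 3) * eta**2 * mu5)
    return max(lam1, lam2)                        # stability guaranteed if < 1
```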
B. Conjectured necessary and sufficient condition
It can be shown that there exist θ1, θ2 ∈ ℝ, not simultaneously equal to zero, such that the (M² × 1) vector w with i-th entry defined by

    w_i = θ1,  if (i − 1) is a multiple of M + 1 (i.e., i indexes a diagonal entry of C_v(n)),
    w_i = θ2,  otherwise,    (13)
is an eigenvector of G. The conjecture is that the largest eigenvalue of G in absolute value is associated with an eigenvector of the form (13). While we currently have no proof, we have not encountered any numerical counterexample. The difficulty in proving this result is that it relies not only on the specific structure of the matrix, but also on the expressions and/or some order relations of its entries such as (8).
Due to symmetries in matrix G, the eigensystem Gw = λw of M² linear equations in the unknowns θ1 and θ2 reduces to the equation det(A − λI) = 0, where A is the (2 × 2) matrix whose entries a_ij are given by

    a11 = η²(µ1 + (M − 1)µ3) − 2η r_md + 1
    a12 = (M − 1)[η²(2µ2 + (M − 2)µ4) − 2η r_od]
    a21 = η²(2µ2 + (M − 2)µ4) − 2η r_od
    a22 = η²(2µ3 + 4(M − 2)µ4 + (M − 2)(M − 3)µ5) − 2η(r_md + (M − 2)r_od) + 1.    (14)
Solving the above-mentioned equation yields the following two real-valued eigenvalues

    λ  = (1/2)(a11 + a22 − √Δ)
    λ′ = (1/2)(a11 + a22 + √Δ)    (15)

with Δ = (a11 − a22)² + 4(M − 1) a21². This finally implies the conjectured necessary and sufficient condition for convergence

    λ_conj(η, M, ξ) := (1/2)(|a11 + a22| + √Δ) < 1.    (16)
Obviously, exploiting this condition is much less computationally demanding than diagonalizing the (M² × M²) matrix G and checking whether its eigenvalues lie inside (−1, 1). In addition, it provides an upper bound that can be easily studied, even if solving λ_conj = 1 with respect to η, M or ξ requires (basic) numerical methods.
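Condition (16) is equally cheap to evaluate, as the following illustrative sketch shows (again, the function name is hypothetical, and the moments are assumed to be available, either in closed form as in [11] or estimated):

```python
import numpy as np

def lambda_conj(eta, M, r_md, r_od, mu):
    """Conjectured bound (16), computed from the 2x2 matrix entries (14)."""
    mu1, mu2, mu3, mu4, mu5 = mu
    a11 = eta**2 * (mu1 + (M - 1) * mu3) - 2 * eta * r_md + 1
    a21 = eta**2 * (2 * mu2 + (M - 2) * mu4) - 2 * eta * r_od        # a12 = (M - 1) * a21
    a22 = (eta**2 * (2 * mu3 + 4 * (M - 2) * mu4 + (M - 2) * (M - 3) * mu5)
           - 2 * eta * (r_md + (M - 2) * r_od) + 1)
    delta = (a11 - a22)**2 + 4 * (M - 1) * a21**2                    # discriminant of (15)
    return 0.5 * (abs(a11 + a22) + np.sqrt(delta))                   # convergence conjectured if < 1
```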
III. EXPERIMENTS
We shall now consider the experiments described in [11], and compare the upper bounds λ_ger and λ_conj provided by the Gerschgorin disk conditions (11a)–(11b) and the (conjectured) necessary and sufficient condition (16), respectively. We shall also check that λ_conj matches the estimated largest eigenvalue λ_max of the matrix G in absolute value. Let η_ger, η_conj and η_max be the maximum step sizes provided by these three approaches, for fixed parameters M and ξ.
All the Matlab codes used in this paper are available on the
personal website of the first author: www.cedric-richard.fr
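To illustrate how such maximum step sizes can be obtained from either bound, the sketch below runs a simple bisection on λ(η) = 1; the initial bracket and tolerance are arbitrary illustrative choices, not values taken from the paper or its MATLAB code.

```python
def max_step_size(lam, M, r_md, r_od, mu, eta_hi=10.0, tol=1e-6):
    """Largest eta in (0, eta_hi] with lam(eta, ...) < 1, found by bisection.

    lam is either lambda_ger or lambda_conj. Returns None if lam stays below 1 over the whole
    bracket, and a value close to 0 when the condition gives no useful bound (as for the
    Gerschgorin test in Tables II-III)."""
    lo, hi = 0.0, eta_hi
    if lam(hi, M, r_md, r_od, mu) < 1.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lam(mid, M, r_md, r_od, mu) < 1.0:
            lo = mid
        else:
            hi = mid
    return lo
```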
A. Experiment 1
We consider the problem studied in [10], for which

    y(n) = y(n − 1) / (1 + y²(n − 1)) + u³(n − 1)    (17)

where the output signal d(n) = y(n) + z(n) is corrupted by a zero-mean i.i.d. Gaussian noise z(n) of variance σ_z² = 10⁻⁴. The input sequence u(n) is a zero-mean i.i.d. Gaussian sequence with standard deviation σ_u = 0.15.
Table I reports the maximum step sizes η_ger, η_conj and η_max for several values of M and ξ. It can be observed that the condition imposed by the Gerschgorin disks is very restrictive compared to the two others. Figure 2 (left) represents λ_ger, λ_conj and λ_max as a function of η, with parameters M and ξ defined as in the first row of Table I. It can be noticed that the two latter superimpose perfectly. Figure 3 represents the conjectured largest eigenvalue λ_conj of G as a function of the parameters η (left), M (middle), and ξ (right), in the vicinity of the stability limit defined by the first row of Table I.

Fig. 2. Comparison of the upper bounds (λ_ger^(1), λ_ger^(2)) and λ_conj provided by the Gerschgorin disk conditions (11a)–(11b), and the (conjectured) necessary and sufficient condition (16), respectively, with the largest eigenvalue λ_max of G in absolute value. The three experimental setups are described in the first row of Table I (left), Table II (middle) and Table III (right).
Finally, Figure 4 (left) illustrates the convergence of the mean-square error estimated by averaging over 500 runs. The step size was arbitrarily chosen to be 1/3 of the maximum step size η_conj.
We encourage the reader to refer to [11] for an analysis of the
stochastic behavior of the KLMS algorithm.
B. Experiment 2
We now consider the nonlinear dynamic system identifica-
tion problem studied in [14]. The input signal is a sequence
of statistically independent vectors u(n) = [u1(n) u2(n)] with correlated samples satisfying u1(n) = 0.5 u2(n) + η_u(n). The second component of u(n) is an i.i.d. Gaussian noise sequence with variance σ_{u2}² = 0.0156, and η_u(n) is a white Gaussian noise with variance σ_{u1}² = 0.0156. The nonlinear system under study consists of the linear system with memory defined by

    y(n) = u1(n) + 0.5 u2(n) − 0.2 y(n − 1) + 0.35 y(n − 2),    (18)
and the nonlinear Wiener function
    ϕ_y(n) = y(n) / [3 (0.1 + 0.9 y²(n))^(1/2)]    if y(n) ≥ 0
    ϕ_y(n) = −y²(n) (1 − exp(0.7 y(n))) / 3        otherwise.    (19)

The signal d(n) = ϕ_y(n) + z(n) is corrupted by a zero-mean i.i.d. Gaussian noise z(n) with variance σ_z² = 10⁻⁶. The initial condition y(1) = 0 was considered in this example.
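A similar illustrative sketch for this second setup, with the variances taken literally from the description above, is:

```python
import numpy as np

def experiment2_data(N, rng=np.random.default_rng(0)):
    """Generate N samples (u, d) of the Wiener system (18)-(19)."""
    u2 = np.sqrt(0.0156) * rng.standard_normal(N)
    eta_u = np.sqrt(0.0156) * rng.standard_normal(N)          # variance 0.0156 as stated above
    u1 = 0.5 * u2 + eta_u
    y = np.zeros(N)                                           # zero initial conditions, y(1) = 0
    for n in range(2, N):
        y[n] = u1[n] + 0.5 * u2[n] - 0.2 * y[n - 1] + 0.35 * y[n - 2]
    phi = np.where(y >= 0,
                   y / (3 * np.sqrt(0.1 + 0.9 * y**2)),       # first segment of (19)
                   -y**2 * (1 - np.exp(0.7 * y)) / 3)         # second segment of (19)
    d = phi + np.sqrt(1e-6) * rng.standard_normal(N)
    return np.column_stack((u1, u2)), d
```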
Table II reports the maximum step sizes η_ger, η_conj and η_max for several values of M and ξ. Observe in Figure 2 (middle) that, with the experimental setup described in the first row of Table II, no bound on η was provided by the Gerschgorin disk condition (12). The reason is that (M − 1) r_od < r_md is not satisfied in this case, because r_md = 0.0439 and r_od = 0.0088. Finally, Figure 4 (middle) illustrates the convergence of the mean-square error estimated by averaging over 500 runs. The step size was arbitrarily chosen to be 1/3 of η_conj.
C. Experiment 3
Finally, as a third example, we considered the fluid-flow
control problem studied in [1], [15]. The input signal was a
sequence u(n) = [u1(n) u2(n)] of statistically independent vectors with samples satisfying u1(n) = 0.5 u2(n) + η_u(n). The second component u2(n) is an i.i.d. Gaussian sequence with variance σ_{u2}² = 0.0625, and η_u(n) is an i.i.d. Gaussian noise such that u1(n) has variance σ_{u1}² = 0.0625. The nonlinear system under study consists of the linear system

    y(n) = 0.1044 u1(n) + 0.0883 u2(n) + 1.4138 y(n − 1) − 0.6065 y(n − 2)    (20)
and the nonlinear Wiener function
    ϕ_y(n) = 0.3163 y(n) / √(0.10 + 0.90 y²(n)).    (21)

The signal d(n) = ϕ_y(n) + z(n) is corrupted by a zero-mean i.i.d. Gaussian noise z(n) with variance σ_z² = 10⁻⁶. The initial condition y(1) = y(2) = 0 was considered in this example.
It can be noticed in Table III that no upper bound for the
step size η was provided by the Gerschgorin disk condition.
As previously, the condition (M − 1) r_od < r_md was not satisfied in these cases. Figure 2 (right) represents λ_ger, λ_conj and λ_max as a function of η, with parameters M and ξ defined as in the first row of Table III. It can be noticed that λ_conj and λ_max superimpose perfectly. Finally, Figure 4 (right) illustrates the convergence of the mean-square error estimated by averaging over 500 runs. The step size was arbitrarily chosen to be 1/3 of the maximum step size η_conj.
IV. CONCLUSION
The kernel least-mean-square filter has become a popular
algorithm in nonlinear adaptive filtering due to its simplicity
and robustness. One of our recent works has brought a new
contribution to the analysis of this approach by providing the
first analytical models of convergence of the Gaussian kernel
least-mean-square algorithm. Checking its stability can be
computationally expensive, as it requires calculating the extreme eigenvalues of a large matrix for each candidate parameter setting. To circumvent this drawback, we presented in this paper two easy-to-handle conditions. The first one is a sufficient condition based on the Gerschgorin disk theorem. The second one is a conjectured necessary and sufficient condition for convergence that greatly simplifies the calculations.

Fig. 3. Largest eigenvalue of G in absolute value provided by the (conjectured) expression λ_conj as a function of η (left), M (middle), and ξ (right), in the vicinity of the stability limit defined by the first row of Table I.

Fig. 4. Monte-Carlo simulation of the KLMS algorithm with Gaussian kernel. The three experimental setups are described in the first row of Table I (left), Table II (middle) and Table III (right). In each case, the step size η was arbitrarily chosen to be 1/3 of the maximum step size.
REFERENCES
[1] H. Al-Duwaish, M. N. Karim, and V. Chandrasekar. Use of multilayer feedforward neural networks in identification and control of Wiener model. In Proc. Control Theory Appl., volume 143, May 1996.
[2] D. L. Duttweiler and T. Kailath. An RKHS approach to detection and estimation theory: Some parameter estimation problems (Part V). IEEE Trans. Inf. Theory, 19(1):29–37, 1973.
[3] Y. Engel, S. Mannor, and R. Meir. Kernel recursive least squares. IEEE Trans. Signal Process., 52(8):2275–2285, 2004.
[4] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
[5] P. Honeine, C. Richard, and J. C. M. Bermudez. On-line nonlinear sparse approximation of functions. In Proc. IEEE ISIT'07, pages 956–960, Nice, France, June 2007.
[6] G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33:82–95, 1971.
[7] W. Liu, P. P. Pokharel, and J. C. Principe. The kernel least-mean-squares algorithm. IEEE Trans. Signal Process., 56(2):543–554, February 2008.
[8] W. Liu, J. C. Principe, and S. Haykin. Kernel Adaptive Filtering. John Wiley & Sons, Inc., 2010.
[9] D. G. Luenberger. Introduction to Dynamic Systems: Theory, Models, and Applications. John Wiley & Sons, Inc., 1979.
[10] D. P. Mandic. A generalized normalized gradient descent algorithm. IEEE Signal Processing Letters, 11(2):115–118, February 2004.
[11] W. D. Parreira, J. C. M. Bermudez, C. Richard, and J.-Y. Tourneret. Stochastic behavior analysis of the Gaussian kernel-least-mean-square algorithm. IEEE Trans. Signal Process., 60(5):2208–2222, 2012.
[12] C. Richard, J. C. M. Bermudez, and P. Honeine. Online prediction of time series data with kernels. IEEE Trans. Signal Process., 57(3):1058–1067, March 2009.
[13] K. Slavakis and S. Theodoridis. Sliding window generalized kernel affine projection algorithm using projection mappings. EURASIP Journal on Advances in Signal Processing, 2008:ID 735351, 2008.
[14] J. Vörös. Modeling and identification of Wiener systems with two-segment nonlinearities. IEEE Transactions on Control Systems Technology, 11(2):253–257, March 2003.
[15] J.-S. Wang and Y.-L. Hsu. Dynamic nonlinear system identification using a Wiener-type recurrent network with OKID algorithm. Journal of Information Science and Engineering, 24:891–905, 2008.