
Closed-form conditions for convergence of the Gaussian kernel-least-mean-square algorithm

Cédric Richard (1), Jose-Carlos M. Bermudez (2)
(1) Université de Nice Sophia-Antipolis, France
(2) Department of Electrical Engineering, Federal University of Santa Catarina, Florianópolis, SC, Brazil
Abstract—In addition to the choice of the usual linear adaptive filter parameters, designing kernel adaptive filters requires the choice of the kernel and its parameters. One of our recent works has brought a new contribution to the discussion about kernel-based adaptive filtering by providing the first convergence analysis of the kernel-LMS algorithm with Gaussian kernel. A necessary and sufficient condition for convergence has been clearly established. Checking the stability of the algorithm can, unfortunately, be computationally expensive because one needs to calculate the extreme eigenvalues of a large matrix for each set of candidate tuning parameters. The aim of this paper is to circumvent this drawback by examining two easy-to-handle conditions that show how the stability limit varies as a function of the step-size, the kernel bandwidth, and the filter length. One of them is a conjectured necessary and sufficient condition for convergence that greatly simplifies the calculations.
I. INTRODUCTION
Many practical applications require nonlinear signal pro-
cessing. Nonlinear system identification methods based on
reproducing kernel Hilbert spaces (RKHS) have gained popu-
larity over the last decades [2], [6]. Recently, kernel adaptive
filtering has been recognized as an appealing solution to the
nonlinear adaptive filtering problem, as working in RKHS
allows the use of linear structures to solve nonlinear estima-
tion problems. For an overview, see [8]. The block diagram
of a kernel-based adaptive system identification problem is
presented in Figure 1. Here, U is a compact subspace of ℝ^q, κ : U × U → ℝ is a reproducing kernel, (H, ⟨·,·⟩_H) is the induced RKHS with its inner product, and z(n) is a zero-mean additive noise uncorrelated with any other signal. The representer theorem [6] states that the function ψ(·) which minimizes the cost function Σ_{n=1}^{N} (ψ(u(n)) − d(n))², given N input vectors u(n) and desired outputs d(n), can be written as ψ(·) = Σ_{n=1}^{N} α_n κ(·, u(n)). Since the order of the model is equal to the number N of available data u(n), this approach cannot be considered for online applications. To overcome this barrier, authors in the field have focused on finite-order models

    ψ(·) = Σ_{j=1}^{M} α_j κ(·, u(ω_j)).    (1)
In [8], the authors present an overview of the existing tech-
niques to select the M kernel functions in (1) that form the
so-called dictionary, an example of which is the coherence
criterion [12]. The algorithms developed using these ideas
include the kernel least-mean-square (KLMS) algorithm [7],
the kernel recursive-least-square (KRLS) algorithm [3], the
kernel normalized least-mean-square (KNLMS) algorithm and
the kernel affine projection (KAPA) algorithm [5], [12], [13].
In addition to the choice of the usual linear adaptive filter pa-
rameters, designing kernel adaptive filters requires the choice
of the kernel and its parameters. Choosing the algorithm and
nonlinear model parameters to achieve a prescribed perfor-
mance is a difficult task, and requires an extensive analysis of
the algorithm stochastic behavior. Our work [11] has recently
brought a new contribution to the discussion about kernel-
based adaptive filtering by providing the first convergence
analysis of the KLMS algorithm with Gaussian kernel. The
filtering process is defined by
    α(n + 1) = α(n) + η e(n) κ_ω(n),    (2)

where κ_ω(n) = [κ(u(n), u(ω_1)), . . . , κ(u(n), u(ω_M))]^T, and κ(u, u′) is the Gaussian kernel

    κ(u, u′) = exp(−‖u − u′‖² / 2ξ²)    (3)
with kernel bandwidth ξ. In [11], we derived expressions
for the mean-weight-error vector and the mean-square-error.
These models give engineers the opportunity to choose the
algorithm parameters a priori in order to achieve prescribed
convergence speed and quality of the estimate, and allow the
determination of stability limits. Checking the stability of the algorithm (2) can be computationally expensive, as it requires calculating the extreme eigenvalues of an (M² × M²) matrix, say G, for each set of candidate tuning parameters η, M and ξ.
The aim of this paper is to circumvent this drawback by examining two easy-to-handle conditions that show how the stability limit varies as a function of the step-size, the kernel bandwidth, and the filter length. The first one is a sufficient condition based on the Gerschgorin disk theorem, which has already been derived in [11]. The second one is a conjectured necessary and sufficient condition for convergence. It greatly simplifies the calculations and makes it possible to examine how the stability limits vary as a function of the step-size η, the kernel bandwidth ξ, and the filter length M.
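For concreteness, the update (2) with the Gaussian kernel (3) can be transcribed in a few lines of code. The sketch below is an illustrative Python/NumPy version, not the MATLAB code referenced in Section III; it assumes the dictionary {u(ω_1), . . . , u(ω_M)} has already been selected and is stored as the rows of an (M × q) array, it uses e(n) = d(n) − d̂(n), the estimation error of Figure 1, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(u, dictionary, xi):
    """Kernelized input kappa_omega(n): Gaussian kernel (3) between u(n) and each dictionary element."""
    diff = dictionary - u                         # dictionary: (M, q) array, u: length-q vector
    return np.exp(-np.sum(diff**2, axis=1) / (2 * xi**2))

def klms_step(alpha, u, d, dictionary, eta, xi):
    """One iteration of the Gaussian KLMS update (2); returns the new weights and the error e(n)."""
    kappa = gaussian_kernel(u, dictionary, xi)    # kappa_omega(n), shape (M,)
    e = d - alpha @ kappa                         # e(n) = d(n) - d_hat(n)
    return alpha + eta * e * kappa, e
```

Iterating klms_step over the input stream yields the weight trajectory α(n) whose stochastic behavior is analyzed below.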
II. CONVERGENCE ANALYSIS
Let v(n) = α(n) − α_opt be the weight-error vector. Let the vector c_v(n) be the lexicographic representation of the autocorrelation matrix C_v(n) = E{v(n) v^T(n)}, i.e., the matrix C_v(n) is stacked column-wise into the vector c_v(n).

Fig. 1. Kernel-based adaptive system identification.
It was shown in [11] that, under some simplifying statistical assumptions, we have

    c_v(n + 1) = G c_v(n) + η² J_min r_κκ    (4)

with r_κκ the lexicographic representation of the correlation matrix R_κκ = E{κ_ω(n) κ_ω^T(n)} of the kernelized input, and J_min the minimum MSE corresponding to the optimum weight vector α_opt = R_κκ^{−1} p_κd, where p_κd = E{d(n) κ_ω(n)} is the cross-correlation vector between κ_ω(n) and d(n). Matrix G, of size (M² × M²), is defined as

    G = [h_11  h_12  . . .  h_1M  . . .  h_MM]    (5)
with h_ℓp the (M² × 1) lexicographic representation of the matrix H_ℓp, given by

if (i = j):
    [H_ii]_ii = 1 − 2η r_md + η² µ1
    [H_ii]_pp = η² µ3,                              p ≠ i
    [H_ii]_ip = [H_ii]_pi = η² µ2 − η r_od,         p ≠ i
    [H_ii]_pℓ = η² µ4,                              otherwise

if (i ≠ j):
    [H_ij]_ij = [H_ij]_ji = (1/2)(1 − 2η r_md + 2η² µ3)
    [H_ij]_pp = η² µ4,                              p ≠ i, j
    [H_ij]_ii = [H_ij]_jj = η² µ2 − η r_od
    [H_ij]_ip = [H_ij]_pi = (1/2)(2η² µ4 − η r_od), p ≠ i, j
    [H_ij]_pj = [H_ij]_jp = (1/2)(2η² µ4 − η r_od), p ≠ i, j
    [H_ij]_pℓ = η² µ5,                              otherwise
where the µ_k's are the fourth-order moments of the kernelized input, defined as

    µ1 := E{κ_ωi⁴(n)}
    µ2 := E{κ_ωi³(n) κ_ωj(n)}
    µ3 := E{κ_ωi²(n) κ_ωj²(n)}
    µ4 := E{κ_ωi(n) κ_ωj(n) κ_ωℓ²(n)}
    µ5 := E{κ_ωi(n) κ_ωj(n) κ_ωℓ(n) κ_ωp(n)}.    (6)
Parameters r_md and r_od are the main-diagonal and off-diagonal entries of the correlation matrix R_κκ, given by

    r_md := E{κ_ωi²(n)}
    r_od := E{κ_ωi(n) κ_ωj(n)}.    (7)
As extensively explained in [11], the parameters µ_k, r_md and r_od can be calculated theoretically in the case of i.i.d. Gaussian inputs u(n). Their values depend on the moments of u(n), the filter length M, and the kernel bandwidth ξ.
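When those closed-form expressions are not at hand, or the input deviates from the i.i.d. Gaussian assumption, the moments can also be approximated empirically. The sketch below is such a rough fallback, not the closed-form computation of [11]: it averages over input samples for one fixed dictionary and one arbitrary choice of distinct indices (i, j, ℓ, p), which only approximates the ensemble averages used in the analysis, and it reuses the hypothetical gaussian_kernel helper from the sketch in Section I.

```python
import numpy as np

def estimate_moments(inputs, dictionary, xi, rng=np.random.default_rng(0)):
    """Empirical estimates of r_md, r_od and mu_1..mu_5 for a fixed dictionary (requires M >= 4)."""
    M = dictionary.shape[0]
    K = np.array([gaussian_kernel(u, dictionary, xi) for u in inputs])   # (N, M) kernelized inputs
    i, j, l, p = rng.permutation(M)[:4]           # one arbitrary choice of distinct indices
    r_md = np.mean(K[:, i] ** 2)
    r_od = np.mean(K[:, i] * K[:, j])
    mu = (np.mean(K[:, i] ** 4),                                 # mu_1
          np.mean(K[:, i] ** 3 * K[:, j]),                       # mu_2
          np.mean(K[:, i] ** 2 * K[:, j] ** 2),                  # mu_3
          np.mean(K[:, i] * K[:, j] * K[:, l] ** 2),             # mu_4
          np.mean(K[:, i] * K[:, j] * K[:, l] * K[:, p]))        # mu_5
    return r_md, r_od, mu
```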
Before concluding this section, let us introduce the following inequalities relating the fourth-order moments µ_k and the entries of the correlation matrix R_κκ, which will be used in the sequel. Using Hölder's inequality, it can be shown that

    µ5 ≤ µ4 ≤ µ3 ≤ µ2 ≤ µ1,    (8)

and, by virtue of Chebyshev's sum inequality,

    r_md² ≤ µ3.    (9)
We refer the reader to the proofs in [11].
We shall now examine the conditions for convergence
of the Gaussian KLMS algorithm using model (4). It can
be checked that the matrix G is symmetric. This implies
that it can be diagonalized, and all its eigenvalues are real-
valued. A necessary and sufficient condition for convergence
is that all these eigenvalues lie inside (−1, 1) [9, Section 5.9].
First, we shall consider a sufficient condition based on the
Gerschgorin disk theorem. After arguing that these conditions
are too restrictive, we provide an easy-to-handle necessary and
sufficient condition for convergence, based on a conjecture.
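For reference, this brute-force check can be written down directly from the block definitions above. The illustrative sketch below (function names and 0-based indexing are not from [11]) assembles G column by column and tests whether its spectrum lies in (−1, 1); with the dictionary sizes of Tables I–III (M ≤ 17, so G is at most 289 × 289) each test is cheap, but it must be repeated for every candidate triple (η, M, ξ), which is what the conditions derived next avoid.

```python
import numpy as np

def build_G(eta, M, r_md, r_od, mu):
    """Assemble the (M^2 x M^2) matrix G of (5) from the blocks H_lp defined above; mu = (mu1..mu5)."""
    mu1, mu2, mu3, mu4, mu5 = mu
    G = np.empty((M * M, M * M))
    col = 0
    for l in range(M):                             # block index pair (l, p), as in (5)
        for p in range(M):
            if l == p:
                i = l
                H = np.full((M, M), eta**2 * mu4)  # "otherwise" entries of H_ii
                H[i, i] = 1 - 2 * eta * r_md + eta**2 * mu1
                for q in range(M):
                    if q != i:
                        H[q, q] = eta**2 * mu3
                        H[i, q] = H[q, i] = eta**2 * mu2 - eta * r_od
            else:
                i, j = l, p
                H = np.full((M, M), eta**2 * mu5)  # "otherwise" entries of H_ij
                H[i, j] = H[j, i] = 0.5 * (1 - 2 * eta * r_md + 2 * eta**2 * mu3)
                H[i, i] = H[j, j] = eta**2 * mu2 - eta * r_od
                for q in range(M):
                    if q not in (i, j):
                        H[q, q] = eta**2 * mu4
                        H[i, q] = H[q, i] = 0.5 * (2 * eta**2 * mu4 - eta * r_od)
                        H[q, j] = H[j, q] = 0.5 * (2 * eta**2 * mu4 - eta * r_od)
            G[:, col] = H.flatten(order="F")       # column-wise (lexicographic) stacking
            col += 1
    return G

def stable_bruteforce(eta, M, r_md, r_od, mu):
    """Necessary and sufficient condition: all eigenvalues of G lie inside (-1, 1)."""
    eigs = np.linalg.eigvalsh(build_G(eta, M, r_md, r_od, mu))   # G is symmetric
    return float(np.max(np.abs(eigs))) < 1.0
```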
A. Gerschgorin disk conditions
The eigenvalues of matrix G lie inside the union of Ger-
schgorin disks [4], each disk being centered at a diagonal
element of G, with radius given by the sum of the absolute
values of the remaining elements of the same row. A sufficient
condition for stability of (4) is thus given by
    |[G]_ii| + Σ_{ℓ=1, ℓ≠i}^{M²} |[G]_iℓ| < 1    (10)

for i = 1, . . . , M². The definition of G shows that the rows
of this matrix have only two distinct forms, in the sense that
each row of G has the same entries as one of these two
distinct rows, up to a permutation. This implies that only two
Gerschgorin disks can be distinguished. Using (8), it can be
shown that all the entries of G are positive except possibly
for [G]_iℓ = η² µ2 − η r_od and [G]_iℓ = (1/2)(2η² µ4 − η r_od). Expression (10) thus leads to only two sufficient conditions, defined as follows for M ≥ 3,
    λ_ger^(1) := (1 − 2η r_md + η² µ1) + (M − 1) η² µ3
               + 2(M − 1) |η² µ2 − η r_od|
               + (M − 1)(M − 2) η² µ4 < 1,    (11a)

TABLE I
STABILITY RESULTS FOR EXAMPLE 1

    ξ        M    η_max, η_conj    η_ger
    0.0075   17   1.70             0.29
    0.01     13   1.70             0.30
    0.025    6    1.66             0.22
    0.05     3    1.80             1.47

TABLE II
STABILITY RESULTS FOR EXAMPLE 2

    ξ        M    η_max, η_conj    η_ger
    0.05     7    2.33             —
    0.065    4    2.49             0.68
    0.075    3    2.60             1.92
    0.125    2    2.39             2.32

TABLE III
STABILITY RESULTS FOR EXAMPLE 3

    ξ        M    η_max, η_conj    η_ger
    0.15     11   1.17             —
    0.20     7    1.19             —
    0.25     5    1.24             —
    0.30     3    1.59             —
    λ_ger^(2) := (1 − 2η r_md + 2η² µ3) + 2 |η² µ2 − η r_od|
               + (M − 2) η² µ4 + 2(M − 2) |2η² µ4 − η r_od|
               + (M − 2)(M − 3) η² µ5 < 1.    (11b)
The intersection of these two conditions provides the following sufficient condition for stability

    λ_ger(η, M, ξ) := max{λ_ger^(1), λ_ger^(2)} < 1,    (12)

which avoids multiple time-consuming diagonalizations of the matrix G. Solving λ_ger = 1 to derive upper bounds with respect to η, M or ξ requires (basic) numerical methods.
We observe that λ_ger^(1) and λ_ger^(2) are piecewise polynomial functions in η. Because they are both equal to 1 for η = 0, their derivative at the origin must be strictly negative for the conditions (11a)–(11b) to be meaningful. This leads to the condition (M − 1) r_od < r_md, which is very restrictive.
Application examples in Section III show situations where the
Gerschgorin disk test is ineffective.
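Evaluating the bound (12) is nevertheless a direct transcription of (11a)–(11b), as the following illustrative sketch shows (the function name and argument layout are hypothetical):

```python
def lambda_ger(eta, M, r_md, r_od, mu):
    """Gerschgorin bound (12): maximum of the two sufficient conditions (11a)-(11b)."""
    mu1, mu2, mu3, mu4, mu5 = mu
    lam1 = ((1 - 2 * eta * r_md + eta**2 * mu1)                      # (11a)
            + (M - 1) * eta**2 * mu3
            + 2 * (M - 1) * abs(eta**2 * mu2 - eta * r_od)
            + (M - 1) * (M - 2) * eta**2 * mu4)
    lam2 = ((1 - 2 * eta * r_md + 2 * eta**2 * mu3)                  # (11b)
            + 2 * abs(eta**2 * mu2 - eta * r_od)
            + (M - 2) * eta**2 * mu4
            + 2 * (M - 2) * abs(2 * eta**2 * mu4 - eta * r_od)
            + (M - 2) * (M - 3) * eta**2 * mu5)
    return max(lam1, lam2)                        # stability guaranteed if < 1
```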
B. Conjectured necessary and sufficient condition
It can be shown that there exist θ1, θ2 ∈ ℝ, not simultaneously equal to zero, such that the (M² × 1) vector w with i-th entry defined by

    w_i = θ1,  if (i − 1) is a multiple of M + 1 (i.e., i indexes a diagonal entry of C_v(n)),
    w_i = θ2,  otherwise,    (13)
is an eigenvector of G. The conjecture is that the largest eigenvalue of G in absolute value is associated with an eigenvector of the form (13). While we currently have no proof, we have not encountered any numerical counterexample. The difficulty in proving this result is that it relies not only on the specific structure of the matrix, but also on the expressions and/or some order relations of its entries such as (8).
Due to symmetries in matrix G, the eigensystem Gw = λw of M² linear equations in the unknowns θ1 and θ2 reduces to the equation det(A − λI) = 0, where A is the (2 × 2) matrix whose entries a_ij are given by

    a11 = η²(µ1 + (M − 1)µ3) − 2η r_md + 1
    a12 = (M − 1)[η²(2µ2 + (M − 2)µ4) − 2η r_od]
    a21 = η²(2µ2 + (M − 2)µ4) − 2η r_od
    a22 = η²(2µ3 + 4(M − 2)µ4 + (M − 2)(M − 3)µ5) − 2η(r_md + (M − 2)r_od) + 1.    (14)
Solving the above-mentioned equation yields the following two real-valued eigenvalues

    λ  = (1/2)(a11 + a22 − √Δ)
    λ′ = (1/2)(a11 + a22 + √Δ)    (15)

with Δ = (a11 − a22)² + 4(M − 1) a21². This finally implies the conjectured necessary and sufficient condition for convergence

    λ_conj(η, M, ξ) := (1/2)(|a11 + a22| + √Δ) < 1.    (16)
Obviously, exploiting this condition is much less computationally demanding than diagonalizing the (M² × M²) matrix G and checking whether its eigenvalues lie inside (−1, 1). In addition, it provides an upper bound that can be easily studied, even if solving λ_conj = 1 with respect to η, M or ξ requires (basic) numerical methods.
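Condition (16) is equally cheap to evaluate, as the following illustrative sketch shows (again, the function name is hypothetical, and the moments are assumed to be available, either in closed form as in [11] or estimated):

```python
import numpy as np

def lambda_conj(eta, M, r_md, r_od, mu):
    """Conjectured bound (16), computed from the 2x2 matrix entries (14)."""
    mu1, mu2, mu3, mu4, mu5 = mu
    a11 = eta**2 * (mu1 + (M - 1) * mu3) - 2 * eta * r_md + 1
    a21 = eta**2 * (2 * mu2 + (M - 2) * mu4) - 2 * eta * r_od        # a12 = (M - 1) * a21
    a22 = (eta**2 * (2 * mu3 + 4 * (M - 2) * mu4 + (M - 2) * (M - 3) * mu5)
           - 2 * eta * (r_md + (M - 2) * r_od) + 1)
    delta = (a11 - a22)**2 + 4 * (M - 1) * a21**2                    # discriminant of (15)
    return 0.5 * (abs(a11 + a22) + np.sqrt(delta))                   # convergence conjectured if < 1
```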
III. EXPERIMENTS
We shall now consider the experiments described in [11], and compare the upper bounds λ_ger and λ_conj provided by the Gerschgorin disk conditions (11a)–(11b) and the (conjectured) necessary and sufficient condition (16), respectively. We shall also check that λ_conj matches the estimated largest eigenvalue λ_max of the matrix G in absolute value. Let η_ger, η_conj and η_max be the maximum step sizes provided by these three approaches, for fixed parameters M and ξ.
All the Matlab codes used in this paper are available on the
personal website of the first author: www.cedric-richard.fr
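To illustrate how such maximum step sizes can be obtained from either bound, the sketch below runs a simple bisection on λ(η) = 1; the initial bracket and tolerance are arbitrary illustrative choices, not values taken from the paper or its MATLAB code.

```python
def max_step_size(lam, M, r_md, r_od, mu, eta_hi=10.0, tol=1e-6):
    """Largest eta in (0, eta_hi] with lam(eta, ...) < 1, found by bisection.

    lam is either lambda_ger or lambda_conj. Returns None if lam stays below 1 over the whole
    bracket, and a value close to 0 when the condition gives no useful bound (as for the
    Gerschgorin test in Tables II-III)."""
    lo, hi = 0.0, eta_hi
    if lam(hi, M, r_md, r_od, mu) < 1.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lam(mid, M, r_md, r_od, mu) < 1.0:
            lo = mid
        else:
            hi = mid
    return lo
```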
A. Experiment 1
We consider the problem studied in [10], for which

    y(n) = y(n − 1) / (1 + y²(n − 1)) + u³(n − 1)    (17)

where the output signal d(n) = y(n) + z(n) is corrupted by a zero-mean i.i.d. Gaussian noise z(n) of variance σ_z² = 10⁻⁴. The input sequence u(n) is a zero-mean i.i.d. Gaussian sequence with standard deviation σ_u = 0.15.
Table I reports the maximum step sizes η_ger, η_conj and η_max for several values of M and ξ. It can be observed that the condition imposed by the Gerschgorin disks is very restrictive compared to the two others. Figure 2 (left) represents λ_ger, λ_conj and λ_max as a function of η, with parameters M and ξ defined as in the first row of Table I. It can be noticed that the two latter superimpose perfectly. Figure 3 represents the conjectured largest eigenvalue λ_conj of G as a function of the parameters η (left), M (middle), and ξ (right), in the vicinity of the stability limit defined by the first row of Table I.

Fig. 2. Comparison of the upper bounds (λ_ger^(1), λ_ger^(2)) and λ_conj provided by the Gerschgorin disk conditions (11a)–(11b), and the (conjectured) necessary and sufficient condition (16), respectively, with the largest eigenvalue λ_max of G in absolute value. The three experimental setups are described in the first row of Table I (left), Table II (middle) and Table III (right).
Finally, Figure 4 (left) illustrates the convergence of the mean-square error estimated by averaging over 500 runs. The step size was arbitrarily chosen to be 1/3 of the maximum step size η_conj.
We encourage the reader to refer to [11] for an analysis of the
stochastic behavior of the KLMS algorithm.
B. Experiment 2
We now consider the nonlinear dynamic system identifica-
tion problem studied in [14]. The input signal is a sequence
of statistically independent vectors u(n) = [u1(n) u2(n)] with correlated samples satisfying u1(n) = 0.5 u2(n) + η_u(n). The second component of u(n) is an i.i.d. Gaussian noise sequence with variance σ_{u2}² = 0.0156, and η_u(n) is a white Gaussian noise with variance σ_{u1}² = 0.0156. The nonlinear system under study consists of the linear system with memory defined by

    y(n) = u1(n) + 0.5 u2(n) − 0.2 y(n − 1) + 0.35 y(n − 2),    (18)
and the nonlinear Wiener function
    ϕ_y(n) = y(n) / [3 (0.1 + 0.9 y²(n))^(1/2)]    if y(n) ≥ 0
    ϕ_y(n) = −y²(n) (1 − exp(0.7 y(n))) / 3        otherwise.    (19)

The signal d(n) = ϕ_y(n) + z(n) is corrupted by a zero-mean i.i.d. Gaussian noise z(n) with variance σ_z² = 10⁻⁶. The initial condition y(1) = 0 was considered in this example.
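A similar illustrative sketch for this second setup, with the variances taken literally from the description above, is:

```python
import numpy as np

def experiment2_data(N, rng=np.random.default_rng(0)):
    """Generate N samples (u, d) of the Wiener system (18)-(19)."""
    u2 = np.sqrt(0.0156) * rng.standard_normal(N)
    eta_u = np.sqrt(0.0156) * rng.standard_normal(N)          # variance 0.0156 as stated above
    u1 = 0.5 * u2 + eta_u
    y = np.zeros(N)                                           # zero initial conditions, y(1) = 0
    for n in range(2, N):
        y[n] = u1[n] + 0.5 * u2[n] - 0.2 * y[n - 1] + 0.35 * y[n - 2]
    phi = np.where(y >= 0,
                   y / (3 * np.sqrt(0.1 + 0.9 * y**2)),       # first segment of (19)
                   -y**2 * (1 - np.exp(0.7 * y)) / 3)         # second segment of (19)
    d = phi + np.sqrt(1e-6) * rng.standard_normal(N)
    return np.column_stack((u1, u2)), d
```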
Table II reports the maximum step sizes η_ger, η_conj and η_max for several values of M and ξ. Observe in Figure 2 (middle) that, with the experimental setup described in the first row of Table II, no bound on η was provided by the Gerschgorin disk condition (12). The reason is that (M − 1) r_od < r_md is not satisfied in this case, because r_md = 0.0439 and r_od = 0.0088. Finally, Figure 4 (middle) illustrates the convergence of the mean-square error estimated by averaging over 500 runs. The step size was arbitrarily chosen to be 1/3 of η_conj.
C. Experiment 3
Finally, as a third example, we considered the fluid-flow
control problem studied in [1], [15]. The input signal was a
sequence u(n) = [u1(n) u2(n)] of statistically independent vectors with samples satisfying u1(n) = 0.5 u2(n) + η_u(n). The second component u2(n) is an i.i.d. Gaussian sequence with variance σ_{u2}² = 0.0625, and η_u(n) is an i.i.d. Gaussian noise such that u1(n) has variance σ_{u1}² = 0.0625. The nonlinear system under study consists of the linear system

    y(n) = 0.1044 u1(n) + 0.0883 u2(n) + 1.4138 y(n − 1) − 0.6065 y(n − 2)    (20)
and the nonlinear Wiener function
    ϕ_y(n) = 0.3163 y(n) / √(0.10 + 0.90 y²(n)).    (21)

The signal d(n) = ϕ_y(n) + z(n) is corrupted by a zero-mean i.i.d. Gaussian noise z(n) with variance σ_z² = 10⁻⁶. The initial condition y(1) = y(2) = 0 was considered in this example.
It can be noticed in Table III that no upper bound for the
step size η was provided by the Gerschgorin disk condition.
As previously, the condition (M − 1) r_od < r_md was not satisfied in these cases. Figure 2 (right) represents λ_ger, λ_conj and λ_max as a function of η, with parameters M and ξ defined as in the first row of Table III. It can be noticed that λ_conj and λ_max superimpose perfectly. Finally, Figure 4 (right) illustrates the convergence of the mean-square error estimated by averaging over 500 runs. The step size was arbitrarily chosen to be 1/3 of the maximum step size η_conj.
IV. CONCLUSION
The kernel least-mean-square filter has become a popular
algorithm in nonlinear adaptive filtering due to its simplicity
and robustness. One of our recent works has brought a new
contribution to the analysis of this approach by providing the
first analytical models of convergence of the Gaussian kernel
least-mean-square algorithm. Checking its stability can be
computationally expensive, as it requires calculating the extreme eigenvalues of a large matrix for each candidate parameter setting. To circumvent this drawback, we presented in this paper two easy-to-handle conditions. The first one is a sufficient condition based on the Gerschgorin disk theorem. The second one is a conjectured necessary and sufficient condition for convergence that greatly simplifies the calculations.

Fig. 3. Largest eigenvalue of G in absolute value provided by the (conjectured) expression λ_conj as a function of η (left), M (middle), and ξ (right), in the vicinity of the stability limit defined by the first row of Table I.

Fig. 4. Monte-Carlo simulation of the KLMS algorithm with Gaussian kernel. The three experimental setups are described in the first row of Table I (left), Table II (middle) and Table III (right). In each case, the step size η was arbitrarily chosen to be 1/3 of the maximum step size.
REFERENCES
[1] H. Al-Duwaish, M. N. Karim, and V. Chandrasekar. Use of multilayer feedforward neural networks in identification and control of Wiener model. In Proc. Control Theory Appl., volume 143, May 1996.
[2] D. L. Duttweiler and T. Kailath. An RKHS approach to detection and estimation theory: Some parameter estimation problems (Part V). IEEE Trans. Inf. Theory, 19(1):29–37, 1973.
[3] Y. Engel, S. Mannor, and R. Meir. Kernel recursive least squares. IEEE Trans. Signal Process., 52(8):2275–2285, 2004.
[4] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
[5] P. Honeine, C. Richard, and J. C. M. Bermudez. On-line nonlinear sparse approximation of functions. In Proc. IEEE ISIT'07, pages 956–960, Nice, France, June 2007.
[6] G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33:82–95, 1971.
[7] W. Liu, P. P. Pokharel, and J. C. Principe. The kernel least-mean-squares algorithm. IEEE Trans. Signal Process., 56(2):543–554, February 2008.
[8] W. Liu, J. C. Principe, and S. Haykin. Kernel Adaptive Filtering. John Wiley & Sons, Inc., 2010.
[9] D. G. Luenberger. Introduction to Dynamic Systems: Theory, Models, and Applications. John Wiley & Sons, Inc., 1979.
[10] D. P. Mandic. A generalized normalized gradient descent algorithm. IEEE Signal Processing Letters, 11(2):115–118, February 2004.
[11] W. D. Parreira, J. C. M. Bermudez, C. Richard, and J.-Y. Tourneret. Stochastic behavior analysis of the Gaussian kernel-least-mean-square algorithm. IEEE Trans. Signal Process., 60(5):2208–2222, 2012.
[12] C. Richard, J. C. M. Bermudez, and P. Honeine. Online prediction of time series data with kernels. IEEE Trans. Signal Process., 57(3):1058–1067, March 2009.
[13] K. Slavakis and S. Theodoridis. Sliding window generalized kernel affine projection algorithm using projection mappings. EURASIP Journal on Advances in Signal Processing, 2008:ID 735351, 2008.
[14] J. Vörös. Modeling and identification of Wiener systems with two-segment nonlinearities. IEEE Transactions on Control Systems Technology, 11(2):253–257, March 2003.
[15] J.-S. Wang and Y.-L. Hsu. Dynamic nonlinear system identification using a Wiener-type recurrent network with OKID algorithm. Journal of Information Science and Engineering, 24:891–905, 2008.