
Supplementary Material for ECCV 2014 Paper:
Spatio-temporal Event Classification using
Time-series Kernel based Structured Sparsity
László A. Jeni¹, András Lőrincz², Zoltán Szabó³, Jeffrey F. Cohn¹,⁴, and Takeo Kanade¹

¹ Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
² Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary
³ Gatsby Computational Neuroscience Unit, University College London, London, UK
⁴ Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA

laszlo.jeni@ieee.org, andras.lorincz@elte.hu,
zoltan.szabo@gatsby.ucl.ac.uk, jeffcohn@cs.cmu.edu, tk@cs.cmu.edu
Abstract. This document accompanies the paper "Spatio-temporal Event Classification using Time-series Kernel based Structured Sparsity". We provide implementation details of the Kernel Structured Sparsity (KSS) method for FISTA (fast iterative shrinkage-thresholding algorithm) [2]. We also show scaling results on the 6D Motion Gesture Database [3]. Note that the information given in this document is not necessary to understand the content of the main paper.
S1 Implementation
We can rewrite our optimization cost function (Eq. (S1)) as

J(\alpha) = \frac{1}{2} \left\langle \varphi(x) - \sum_{i=1}^{M} \varphi(d_i)\alpha_i, \; \varphi(x) - \sum_{i=1}^{M} \varphi(d_i)\alpha_i \right\rangle_{\mathcal{H}} + \kappa\Omega(\alpha)   (S1)

= \frac{1}{2} \left[ \langle \varphi(x), \varphi(x) \rangle_{\mathcal{H}} - 2 \left\langle \varphi(x), \sum_{i=1}^{M} \varphi(d_i)\alpha_i \right\rangle_{\mathcal{H}} + \left\langle \sum_{i=1}^{M} \varphi(d_i)\alpha_i, \; \sum_{i=1}^{M} \varphi(d_i)\alpha_i \right\rangle_{\mathcal{H}} \right] + \kappa\Omega(\alpha)

= \frac{1}{2} \left[ \langle \varphi(x), \varphi(x) \rangle_{\mathcal{H}} - 2 \sum_{i=1}^{M} k(x, d_i)\alpha_i + \sum_{i=1}^{M} \sum_{j=1}^{M} \alpha_i k(d_i, d_j)\alpha_j \right] + \kappa\Omega(\alpha)

= \frac{1}{2} \left( \alpha^T G \alpha - 2 k^T \alpha \right) + \kappa\Omega(\alpha) + c,   (S2)

where we used that the kernel represents an inner product and exploited the bilinearity of the inner product; superscript 'T' denotes transposition. Here we introduced the notations

k = [k(x, d_1); \ldots; k(x, d_M)] \in \mathbb{R}^M,   (S3)

G = [G_{ij}] = [k(d_i, d_j)] \in \mathbb{R}^{M \times M},   (S4)

and c is an additive constant independent of α. By discarding c, our objective function takes the form
J(\alpha) = f(\alpha) + \kappa\Omega(\alpha),   (S5)

where

f(\alpha) = \frac{1}{2} \alpha^T G \alpha - k^T \alpha   (S6)
is a quadratic function.
Using this form, a FISTA optimization can be adapted to the solution. Our experiments were based on a modification of the SLEP package (http://www.public.asu.edu/~jye02/Software/SLEP/). We need the following elements for the implementation:

1. The proximal operator of Ω (it has not changed).
2. f(α) from Eq. (S6).
3. The gradient of f:

   \nabla_\alpha f(\alpha) = G\alpha - k.   (S7)

4. The stopping criterion for FISTA.

Furthermore, for practical purposes it is important to identify a set of candidate values for the regularization parameter κ, since it affects the performance of the fitted model. We address the stopping criterion and the regularization parameter selection in the following two subsections; a sketch putting the above elements together is given below.
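To make these elements concrete, here is a minimal Python sketch of the resulting FISTA loop. This is an illustration rather than the SLEP implementation: the function names are hypothetical, Ω is taken to be the ℓ1/ℓ2 group norm of Sec. S1.2 with contiguous groups of size d (so the vector length must be divisible by d), and the Lipschitz constant is taken as the largest eigenvalue of G.

```python
import numpy as np

def prox_group_l2(v, t, d):
    # Proximal operator of t * sum_G ||u_G||_2 over contiguous groups of size d:
    # block soft-thresholding, which shrinks each group's norm by t and zeroes
    # any group whose norm falls below t.
    V = v.reshape(-1, d)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return (V * scale).ravel()

def fista_kss(G, k, kappa, d, max_iter=500, tol=1e-6):
    # Minimizes J(alpha) = 0.5 * alpha^T G alpha - k^T alpha + kappa * Omega(alpha),
    # i.e. f(alpha) from Eq. (S6) plus the structured sparsity penalty.
    L = np.linalg.eigvalsh(G)[-1]            # Lipschitz constant of grad f
    alpha = np.zeros(k.shape[0])
    beta, t = alpha.copy(), 1.0
    for _ in range(max_iter):
        grad = G @ beta - k                  # Eq. (S7)
        alpha_new = prox_group_l2(beta - grad / L, kappa / L, d)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = alpha_new + ((t - 1.0) / t_new) * (alpha_new - alpha)  # extrapolation
        # Simple surrogate stop: relative change in alpha (see Sec. S1.1).
        if np.linalg.norm(alpha_new - alpha) <= tol * max(1.0, np.linalg.norm(alpha)):
            alpha = alpha_new
            break
        alpha, t = alpha_new, t_new
    return alpha
```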
S1.1 FISTA Stopping Criterion
The principle behind FISTA is to iteratively form quadratic approximations
Q(α, β) to J(α) around a carefully chosen point β, and to minimize Q(α, β)
rather than the original cost function J(α). We can stop the iterations when the relative change in α between consecutive iterations is sufficiently small.
Let us define Q(α, β) as

Q(\alpha, \beta) = f(\beta) + \langle \alpha - \beta, \nabla f(\beta) \rangle + \frac{L}{2} \|\alpha - \beta\|^2 + \kappa\Omega(\alpha),   (S8)

where L = L(f) is a Lipschitz constant of ∇f.
To define the stopping criterion we need the following fundamental property
for a smooth function [2]:
Lemma 1. Let f : \mathbb{R}^n \to \mathbb{R} be a continuously differentiable function with Lipschitz continuous gradient and Lipschitz constant L(f). Then, for any L \ge L(f),

f(\alpha) \le f(\beta) + \langle \alpha - \beta, \nabla f(\beta) \rangle + \frac{L}{2} \|\alpha - \beta\|^2,   (S9)

for every \alpha, \beta \in \mathbb{R}^n.
Applying Lemma 1 to Eq. (S6) we get

\frac{1}{2}\alpha^T G\alpha - k^T\alpha \le \frac{1}{2}\beta^T G\beta - k^T\beta + (\alpha - \beta)^T (G\beta - k) + \frac{L}{2}\|\alpha - \beta\|^2,   (S10)

which, after expanding the middle term, reads

\frac{1}{2}\alpha^T G\alpha - k^T\alpha \le \frac{1}{2}\beta^T G\beta - k^T\beta + \alpha^T G\beta - \alpha^T k - \beta^T G\beta + \beta^T k + \frac{L}{2}\|\alpha - \beta\|^2.   (S11, S12)

Cancelling the terms involving k on both sides gives

\frac{1}{2}\alpha^T G\alpha - \alpha^T G\beta + \frac{1}{2}\beta^T G\beta \le \frac{L}{2}\|\alpha - \beta\|^2,   (S13, S14)

which leads to the stopping criterion:

\alpha^T G\alpha - 2\alpha^T G\beta + \beta^T G\beta \le L \|\alpha - \beta\|^2.   (S15)
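Since G is symmetric, the left-hand side of Eq. (S15) is just (α − β)^T G (α − β), so the criterion is cheap to evaluate. A minimal sketch (hypothetical helper name):

```python
import numpy as np

def fista_stop_ok(G, alpha, beta, L):
    # Checks Eq. (S15): alpha^T G alpha - 2 alpha^T G beta + beta^T G beta
    #                   <= L * ||alpha - beta||^2.
    # For symmetric G, the left-hand side equals (alpha - beta)^T G (alpha - beta).
    diff = alpha - beta
    return diff @ (G @ diff) <= L * (diff @ diff)
```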
S1.2 Candidate Values for Regularization Parameter κ
It is important to choose an appropriate regularization parameter from a set of candidate values, because it affects the predictive performance of the fitted model. Since the derived objective function (Eq. (13)) is of the form of Eq. (1.1) in [1], in case of

\Omega^*(\nabla f(0)) \le \kappa   (S16)

the solution is guaranteed to be α = 0 (see the notes for Proposition 1.2 in [1]).
Let Ω be the \ell_1/\ell_q norm (q \ge 1) induced by the partition \mathcal{G} = \{\{1, \ldots, d\}, \{d+1, \ldots, 2d\}, \ldots, \{(M-1)d+1, \ldots, Md\}\}:

\Omega(u) = \sum_{G \in \mathcal{G}} \|u_G\|_q.   (S17)

Then

\Omega^*(u) = \max_{G \in \mathcal{G}} \|u_G\|_{q'}   (S18)

is the \ell_\infty/\ell_{q'} norm, where q and q' are conjugate, i.e.,

\frac{1}{q} + \frac{1}{q'} = 1.   (S19)
For example, in case of q = 2 we have q' = 2, and using \nabla f(0) = -k (Eq. (S7)), α = 0 is guaranteed for

\max_{G \in \mathcal{G}} \|k_G\|_2 \le \kappa.   (S20)

In other words, the application of a

\kappa = \kappa_0 \max_{G \in \mathcal{G}} \|k_G\|_2   (S21)

parameterization is advisable, where \kappa_0 \in (0, 1).
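A minimal sketch of generating the candidate values (hypothetical helper name; assumes the gradient at zero is partitioned into contiguous groups of size d as in Eq. (S17), and uses an illustrative κ₀ grid):

```python
import numpy as np

def kappa_candidates(k, d, kappa0_grid=(0.05, 0.1, 0.2, 0.3, 0.5)):
    # Omega*(grad f(0)) = max_G ||k_G||_2 (Eq. (S20)); any kappa at or above
    # this value is guaranteed to yield alpha = 0.
    group_norms = np.linalg.norm(k.reshape(-1, d), axis=1)
    kappa_max = group_norms.max()
    # Candidates kappa = kappa_0 * kappa_max with kappa_0 in (0, 1), Eq. (S21).
    return [k0 * kappa_max for k0 in kappa0_grid]
```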
S2 Scalability Experiment on the 6DMG
In this set of experiment we studied the scalability of the proposed methods by
varying the dimensions of the optimization problems. We measured the perfor-
mances of the methods for gesture classification. We calculated Gram matrices
using the GA kernel from the time-series provided with the dataset and per-
formed leave-one-subject out cross validation. We searched for the best param-
eter (σ of GA kernel) between 0.4 and 20 and selected the parameter having
the lowest mean classification error. The SVM regularization parameter (C) was
searched within 2
10
and 2
10
and the KSS regularization parameter (κ) was
searched within 0 and 0.5 in a similar fashion. We summarize the results in
Table S1.
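For reference, a minimal sketch of the Global Alignment (GA) kernel recursion between two time-series, following Cuturi et al.'s formulation; the exact variant and normalization used in the experiments may differ, and practical implementations work in log-space to avoid underflow:

```python
import numpy as np

def ga_kernel(X, Y, sigma):
    # X: (n, p) and Y: (m, p) time-series; returns the GA kernel value, which
    # sums Gaussian local similarities over all monotone alignments of X and Y.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    k_local = np.exp(-sq_dists / (2.0 * sigma ** 2))
    n, m = X.shape[0], Y.shape[0]
    M = np.zeros((n + 1, m + 1))
    M[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Dynamic programming over alignments (diagonal, up, and left moves).
            M[i, j] = k_local[i - 1, j - 1] * (M[i - 1, j] + M[i, j - 1] + M[i - 1, j - 1])
    return M[n, m]
```

Each Gram matrix entry G_ij in the experiments corresponds to such a kernel value between a pair of gesture sequences.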
Table S1. The character error rates (CER) of motion character recognition using single modalities with varying numbers of classes. The different attributes in the columns: position (P), velocity (V), acceleration (A), angular velocity (W), orientation (O). The best results are denoted with bold letters.

# of Classes | Classifier | P    | W    | O     | A     | V
-------------+------------+------+------+-------+-------+------
3            | GA + SVM   | 0.53 | 0.27 | 0.8   | 0.537 | 0.27
3            | GA + KSS   | 0.13 | 0.13 | 0.53  | 0.13  | 0.27
10           | GA + SVM   | 2.08 | 1.8  | 4.52  | 2.32  | 1.8
10           | GA + KSS   | 0.8  | 0.56 | 4.04  | 0.96  | 0.84
26           | GA + SVM   | 3.68 | 4.83 | 7.82  | 6.15  | 3.69
26           | GA + KSS   | 3.43 | 4.95 | 13.15 | 4.8   | 3.88

In all experiments the KSS method outperformed the time-series SVM. In the cases with 3 and 10 class labels, the KSS method achieved a two- and a three-fold reduction in recognition error, respectively.
Regarding the parameters, we note that the KSS achieved the best results with κ₀ ≠ 0 values, which supports the relevance of the structured sparse regularization.
Figure S1 shows the confusion matrix of the KSS method using all 26 class labels. Most gestures were predicted with 100% accuracy, while some similar-looking gesture pairs, such as D-P, K-R, U-V and Q-G, show some misclassification error.
Fig. S1. Confusion matrix for gesture classification on the 6DMG dataset using Kernel
Structured Sparsity (KSS) applied to position data (P).

References

1. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with Sparsity-Inducing Penalties. Foundations and Trends in Machine Learning 4(1), 1-106 (2012)
2. Beck, A., Teboulle, M.: A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences 2(1), 183-202 (2009)
3. Chen, M., AlRegib, G., Juang, B.H.: 6DMG: A New 6D Motion Gesture Database. In: Proceedings of the 3rd ACM Multimedia Systems Conference (MMSys) (2012)