
Supplementary Material for ECCV 2014 Paper:
Spatio-temporal Event Classification using
Time-series Kernel based Structured Sparsity
László A. Jeni¹, András Lőrincz², Zoltán Szabó³, Jeffrey F. Cohn¹,⁴, and Takeo Kanade¹

¹ Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
² Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary
³ Gatsby Computational Neuroscience Unit, University College London, London, UK
⁴ Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA

laszlo.jeni@ieee.org, andras.lorincz@elte.hu,
zoltan.szabo@gatsby.ucl.ac.uk, jeffcohn@cs.cmu.edu, tk@cs.cmu.edu
Abstract. This document accompanies the paper "Spatio-temporal Event Classification using Time-series Kernel based Structured Sparsity". We provide implementation details of the Kernel Structured Sparsity (KSS) method for FISTA (fast iterative shrinkage-thresholding algorithm) [2]. We also show scaling results on the 6D Motion Gesture Database [3]. Note that the information given in this document is not necessary to understand the content of the main paper.
S1 Implementation
We can rewrite our optimization cost function (Eq. (S1)) as

J(\alpha) = \frac{1}{2} \left\langle \varphi(x) - \sum_{i=1}^{M} \varphi(d_i)\alpha_i, \; \varphi(x) - \sum_{i=1}^{M} \varphi(d_i)\alpha_i \right\rangle_{\mathcal{H}} + \kappa\Omega(\alpha)   (S1)

= \frac{1}{2} \left[ \langle \varphi(x), \varphi(x) \rangle_{\mathcal{H}} - 2 \left\langle \varphi(x), \sum_{i=1}^{M} \varphi(d_i)\alpha_i \right\rangle_{\mathcal{H}} + \left\langle \sum_{i=1}^{M} \varphi(d_i)\alpha_i, \; \sum_{i=1}^{M} \varphi(d_i)\alpha_i \right\rangle_{\mathcal{H}} \right] + \kappa\Omega(\alpha)

= \frac{1}{2} \left[ \langle \varphi(x), \varphi(x) \rangle_{\mathcal{H}} - 2 \sum_{i=1}^{M} k(x, d_i)\alpha_i + \sum_{i=1}^{M} \sum_{j=1}^{M} \alpha_i k(d_i, d_j)\alpha_j \right] + \kappa\Omega(\alpha)

= \frac{1}{2} \left( \alpha^T G \alpha - 2 k^T \alpha \right) + \kappa\Omega(\alpha) + c,   (S2)

where we used that the kernel represents an inner product and exploited the bilinearity of the inner product; superscript 'T' denotes transposition. Here we introduced the notations

k = [k(x, d_1); \ldots; k(x, d_M)] \in \mathbb{R}^M,   (S3)

G = [G_{ij}] = [k(d_i, d_j)] \in \mathbb{R}^{M \times M},   (S4)

and c is an additive constant independent of α. By discarding c, our objective function takes the form
J(\alpha) = f(\alpha) + \kappa\Omega(\alpha),   (S5)

where

f(\alpha) = \frac{1}{2} \alpha^T G \alpha - k^T \alpha   (S6)
is a quadratic function.
Using this form, a FISTA optimization can be adapted to the solution. Our experiments were based on a modification of the SLEP package (http://www.public.asu.edu/~jye02/Software/SLEP/). We need the following elements for the implementation:

1. The proximal operator of Ω (it has not changed).
2. f(α) from Eq. (S6).
3. The gradient of f:

   \nabla_\alpha f(\alpha) = G\alpha - k.   (S7)

4. The stopping criterion for FISTA.

Furthermore, for practical purposes it is important to identify a set of candidate values for the regularization parameter κ, since it affects the performance of the fitted model. We address the stopping criterion and the regularization parameter selection in the following two subsections; a sketch putting the above elements together is given below.
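To make these elements concrete, here is a minimal Python sketch of the resulting FISTA loop. This is an illustration rather than the SLEP implementation: the function names are hypothetical, Ω is taken to be the ℓ1/ℓ2 group norm of Sec. S1.2 with contiguous groups of size d (so the vector length must be divisible by d), and the Lipschitz constant is taken as the largest eigenvalue of G.

```python
import numpy as np

def prox_group_l2(v, t, d):
    # Proximal operator of t * sum_G ||u_G||_2 over contiguous groups of size d:
    # block soft-thresholding, which shrinks each group's norm by t and zeroes
    # any group whose norm falls below t.
    V = v.reshape(-1, d)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return (V * scale).ravel()

def fista_kss(G, k, kappa, d, max_iter=500, tol=1e-6):
    # Minimizes J(alpha) = 0.5 * alpha^T G alpha - k^T alpha + kappa * Omega(alpha),
    # i.e. f(alpha) from Eq. (S6) plus the structured sparsity penalty.
    L = np.linalg.eigvalsh(G)[-1]            # Lipschitz constant of grad f
    alpha = np.zeros(k.shape[0])
    beta, t = alpha.copy(), 1.0
    for _ in range(max_iter):
        grad = G @ beta - k                  # Eq. (S7)
        alpha_new = prox_group_l2(beta - grad / L, kappa / L, d)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = alpha_new + ((t - 1.0) / t_new) * (alpha_new - alpha)  # extrapolation
        # Simple surrogate stop: relative change in alpha (see Sec. S1.1).
        if np.linalg.norm(alpha_new - alpha) <= tol * max(1.0, np.linalg.norm(alpha)):
            alpha = alpha_new
            break
        alpha, t = alpha_new, t_new
    return alpha
```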
S1.1 FISTA Stopping Criterion
The principle behind FISTA is to iteratively form quadratic approximations
Q(α, β) to J(α) around a carefully chosen point β, and to minimize Q(α, β)
rather than the original cost function J(α). We can stop the iterations when the relative change in α between consecutive iterations is sufficiently small.
Let us define Q(α, β) as

Q(\alpha, \beta) = f(\beta) + \langle \alpha - \beta, \nabla f(\beta) \rangle + \frac{L}{2} \|\alpha - \beta\|^2 + \kappa\Omega(\alpha),   (S8)

where L = L(f) is a Lipschitz constant of ∇f.
To define the stopping criterion we need the following fundamental property
for a smooth function [2]:
Lemma 1. Let f : \mathbb{R}^n \to \mathbb{R} be a continuously differentiable function with Lipschitz continuous gradient and Lipschitz constant L(f). Then, for any L \ge L(f),

f(\alpha) \le f(\beta) + \langle \alpha - \beta, \nabla f(\beta) \rangle + \frac{L}{2} \|\alpha - \beta\|^2,   (S9)

for every \alpha, \beta \in \mathbb{R}^n.
Applying Lemma 1 to Eq. (S6) we get

\frac{1}{2}\alpha^T G\alpha - k^T\alpha \le \frac{1}{2}\beta^T G\beta - k^T\beta + (\alpha - \beta)^T (G\beta - k) + \frac{L}{2}\|\alpha - \beta\|^2,   (S10)

which, after expanding the middle term, reads

\frac{1}{2}\alpha^T G\alpha - k^T\alpha \le \frac{1}{2}\beta^T G\beta - k^T\beta + \alpha^T G\beta - \alpha^T k - \beta^T G\beta + \beta^T k + \frac{L}{2}\|\alpha - \beta\|^2.   (S11, S12)

Cancelling the terms involving k on both sides gives

\frac{1}{2}\alpha^T G\alpha - \alpha^T G\beta + \frac{1}{2}\beta^T G\beta \le \frac{L}{2}\|\alpha - \beta\|^2,   (S13, S14)

which leads to the stopping criterion:

\alpha^T G\alpha - 2\alpha^T G\beta + \beta^T G\beta \le L \|\alpha - \beta\|^2.   (S15)
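Since G is symmetric, the left-hand side of Eq. (S15) is just (α − β)^T G (α − β), so the criterion is cheap to evaluate. A minimal sketch (hypothetical helper name):

```python
import numpy as np

def fista_stop_ok(G, alpha, beta, L):
    # Checks Eq. (S15): alpha^T G alpha - 2 alpha^T G beta + beta^T G beta
    #                   <= L * ||alpha - beta||^2.
    # For symmetric G, the left-hand side equals (alpha - beta)^T G (alpha - beta).
    diff = alpha - beta
    return diff @ (G @ diff) <= L * (diff @ diff)
```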
S1.2 Candidate Values for Regularization Parameter κ
It is important to choose an appropriate regularization parameter from a set of candidate values, because it affects the predictive performance of the fitted model. Since the derived objective function (Eq. (13)) is of the form of Eq. (1.1) in [1], in case of

\Omega^*(\nabla f(0)) \le \kappa   (S16)

the solution is guaranteed to be α = 0 (see the notes for Proposition 1.2 in [1]).
Let Ω be the \ell_1/\ell_q norm (q \ge 1) induced by the partition \mathcal{G} = \{\{1, \ldots, d\}, \{d+1, \ldots, 2d\}, \ldots, \{(M-1)d+1, \ldots, Md\}\}:

\Omega(u) = \sum_{G \in \mathcal{G}} \|u_G\|_q.   (S17)

Then

\Omega^*(u) = \max_{G \in \mathcal{G}} \|u_G\|_{q'}   (S18)

is the \ell_\infty/\ell_{q'} norm, where q and q' are conjugate, i.e.,

\frac{1}{q} + \frac{1}{q'} = 1.   (S19)
For example, in case of q = 2 we have q' = 2, and using \nabla f(0) = -k (Eq. (S7)), α = 0 is guaranteed for

\max_{G \in \mathcal{G}} \|k_G\|_2 \le \kappa.   (S20)

In other words, the application of a

\kappa = \kappa_0 \max_{G \in \mathcal{G}} \|k_G\|_2   (S21)

parameterization is advisable, where \kappa_0 \in (0, 1).
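A minimal sketch of generating the candidate values (hypothetical helper name; assumes the gradient at zero is partitioned into contiguous groups of size d as in Eq. (S17), and uses an illustrative κ₀ grid):

```python
import numpy as np

def kappa_candidates(k, d, kappa0_grid=(0.05, 0.1, 0.2, 0.3, 0.5)):
    # Omega*(grad f(0)) = max_G ||k_G||_2 (Eq. (S20)); any kappa at or above
    # this value is guaranteed to yield alpha = 0.
    group_norms = np.linalg.norm(k.reshape(-1, d), axis=1)
    kappa_max = group_norms.max()
    # Candidates kappa = kappa_0 * kappa_max with kappa_0 in (0, 1), Eq. (S21).
    return [k0 * kappa_max for k0 in kappa0_grid]
```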
S2 Scalability Experiment on the 6DMG
In this set of experiment we studied the scalability of the proposed methods by
varying the dimensions of the optimization problems. We measured the perfor-
mances of the methods for gesture classification. We calculated Gram matrices
using the GA kernel from the time-series provided with the dataset and per-
formed leave-one-subject out cross validation. We searched for the best param-
eter (σ of GA kernel) between 0.4 and 20 and selected the parameter having
the lowest mean classification error. The SVM regularization parameter (C) was
searched within 2
10
and 2
10
and the KSS regularization parameter (κ) was
searched within 0 and 0.5 in a similar fashion. We summarize the results in
Table S1.
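For reference, a minimal sketch of the Global Alignment (GA) kernel recursion between two time-series, following Cuturi et al.'s formulation; the exact variant and normalization used in the experiments may differ, and practical implementations work in log-space to avoid underflow:

```python
import numpy as np

def ga_kernel(X, Y, sigma):
    # X: (n, p) and Y: (m, p) time-series; returns the GA kernel value, which
    # sums Gaussian local similarities over all monotone alignments of X and Y.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    k_local = np.exp(-sq_dists / (2.0 * sigma ** 2))
    n, m = X.shape[0], Y.shape[0]
    M = np.zeros((n + 1, m + 1))
    M[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Dynamic programming over alignments (diagonal, up, and left moves).
            M[i, j] = k_local[i - 1, j - 1] * (M[i - 1, j] + M[i, j - 1] + M[i - 1, j - 1])
    return M[n, m]
```

Each Gram matrix entry G_ij in the experiments corresponds to such a kernel value between a pair of gesture sequences.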
Table S1. The character error rates (CER) of motion character recognition using single modalities with varying numbers of classes. The different attributes in the columns: position (P), velocity (V), acceleration (A), angular velocity (W), orientation (O). The best results are denoted with bold letters.

# of Classes | Classifier | P    | W    | O     | A     | V
-------------+------------+------+------+-------+-------+------
3            | GA + SVM   | 0.53 | 0.27 | 0.8   | 0.537 | 0.27
3            | GA + KSS   | 0.13 | 0.13 | 0.53  | 0.13  | 0.27
10           | GA + SVM   | 2.08 | 1.8  | 4.52  | 2.32  | 1.8
10           | GA + KSS   | 0.8  | 0.56 | 4.04  | 0.96  | 0.84
26           | GA + SVM   | 3.68 | 4.83 | 7.82  | 6.15  | 3.69
26           | GA + KSS   | 3.43 | 4.95 | 13.15 | 4.8   | 3.88

In all experiments the KSS method outperformed the time-series SVM. In the cases with 3 and 10 class labels, the KSS method achieved a two- and a three-fold reduction in recognition error, respectively.
Regarding the parameters, we note that the KSS achieved the best results with κ₀ ≠ 0 values, which supports the relevance of the structured sparse regularization.
Figure S1 shows the confusion matrix of the KSS method using all 26 class labels. Most gestures were predicted with 100% accuracy, while some similar-looking gesture pairs, such as D-P, K-R, U-V and Q-G, show some misclassification error.
Fig. S1. Confusion matrix for gesture classification on the 6DMG dataset using Kernel
Structured Sparsity (KSS) applied to position data (P).

References

1. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with Sparsity-Inducing Penalties. Foundations and Trends in Machine Learning 4(1), 1-106 (2012)
2. Beck, A., Teboulle, M.: A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences 2(1), 183-202 (2009)
3. Chen, M., AlRegib, G., Juang, B.H.: 6DMG: A New 6D Motion Gesture Database. In: Proceedings of the 3rd ACM Multimedia Systems Conference (MMSys) (2012)