Online Video SEEDS for Temporal Window Objectness
Michael Van den Bergh¹  Gemma Roig¹  Xavier Boix¹  Santiago Manen¹  Luc Van Gool¹,²
¹ETH Zürich, Switzerland  ²KU Leuven, Belgium
{vamichae,boxavier,gemmar,vangool}@vision.ee.ethz.ch
Abstract
Superpixel and objectness algorithms are broadly used as a pre-processing step to generate support regions and to speed up further computations. Recently, many algorithms have been extended to video in order to exploit the temporal consistency between frames. However, most methods are computationally too expensive for real-time applications. We introduce an online, real-time video superpixel algorithm based on the recently proposed SEEDS superpixels. A new capability is incorporated which delivers multiple diverse samples (hypotheses) of superpixels in the same image or video sequence. The multiple samples are shown to provide a strong cue to efficiently measure the objectness of image windows, and we introduce the novel concept of objectness in temporal windows. Experiments show that the video superpixels achieve comparable performance to state-of-the-art offline methods while running at 30 fps on a single 2.8 GHz i7 CPU. State-of-the-art performance on objectness is also demonstrated, yet orders of magnitude faster and extended to temporal windows in video.
1. Introduction
Many algorithms use superpixels or objectness scores to efficiently select areas to analyze further. With an increasing number of papers on the analysis of videos, the interest in extracting similar concepts from time sequences is increasing as well. The exploitation of temporal continuity can indeed help boost several types of applications. Yet, most current solutions are computationally expensive and non-causal (i.e. they need to see the whole video first). We propose a novel method for the online extraction of video superpixels. Among its still-image counterparts, it comes closest to the recently introduced SEEDS superpixels [15].
Similar to SEEDS, we define an objective function that prefers video superpixels to have a homogeneous color, and our video superpixels can be extracted efficiently. Their optimization is based on iteratively refining the partition by exchanging blocks of pixels between superpixels. When starting off the partition of a new video frame, we exploit the hierarchical superpixel organization of the previous frame, the coarser levels of which serve as initialization.

Moreover, we propose a method to extract multiple superpixel partitions with a value of the objective function close to that of the optimum. Typically the overlapping superpixels differ in non-essential parts of their contours, but those segments that correspond to a genuine object contour are shared. This allows us to introduce a new and highly efficient objectness measure, together with its natural extension to videos (a tube of bounding boxes spanning a time interval). Fig. 1 depicts a summary of the contributions of the paper.

Figure 1. Top: Video SEEDS provide temporal superpixel tubes. Bottom: Randomized SEEDS efficiently produce multiple label hypotheses per frame. Based on these, a Video Objectness measure is introduced to propose temporal windows (tubes of bounding boxes) that are likely to contain objects.

This work has been supported by the European Commission project RADHAR (FP7 ICT 248873).
We experimentally validate the video superpixel and objectness algorithms, using standard benchmarks where possible. Both methods achieve state-of-the-art results, but at much higher speeds than available methods.

2013 IEEE International Conference on Computer Vision
1550-5499/13 $31.00 © 2013 IEEE
DOI 10.1109/ICCV.2013.54
377
2. Related Work
In this section, we review previous work related to superpixels and objectness in videos, the two tasks tackled in this paper.
Video Superpixels. Most methods are approaches for still images that have been extended to video. They either progressively add cuts or grow superpixels from centers. Methods that add cuts include the graph-based method [5] and its hierarchical extensions [8, 17], segmentation by weighted aggregation (SWA) [12], and normalized cuts with Nystrom optimization [7]. Methods that grow superpixels from centers are based on mean shift [10, 9]. Our method also starts from a still-image method, i.e. the recently introduced SEEDS approach [15]. Thus, our approach can be seen as adding a third strand to video superpixel extraction, namely one that moves the boundaries in an initial superpixel partition.
Recently, Xu et al. [16, 17] proposed a benchmark to evaluate video superpixels and a framework for streaming video segmentation using the graph-based superpixel approach of [5]. They achieved state-of-the-art results, but only at 4 seconds/frame, i.e. 2 orders of magnitude from real-time.
Temporal Window Objectness. The objectness measure was introduced by Alexe et al. [1] for still images, whereafter [11] and [6] introduced new cues to boost performance. To the best of our knowledge, objectness throughout video shots has not been introduced before. It should not be confused with the recently introduced dynamic objectness [13], which extracts objectness within a frame by including instantaneous motion. In contrast, we deliver tubes of bounding boxes throughout extended time intervals.
3. Video SEEDS
In this section, we first review the SEEDS algorithm [15] for the extraction of superpixels in stills. Subsequently, we discuss the extension of this concept to videos, the corresponding energy function, and how to optimize it.
3.1. SEEDS for stills
Let s represent the superpixel partition of an image, such that s : {1, ..., N} → {1, ..., K}, in which N represents the number of pixels in the image and K the number of superpixels. Superpixels are constrained to be contiguous blobs, which is indicated by s ∈ S, where S is the set of valid superpixel partitions. The SEEDS approach [15] for extracting superpixels in stills serves as the starting point for our video extension. Yet, we propose important refinements on which the algorithm's efficiency critically depends.
Figure 2. Overview of the Video SEEDS algorithm: the superpixel labels are propagated at an intermediary step of block-level updates. The result is fine-tuned for each frame individually.
SEEDS extracts superpixels by maximizing an objective function, thus enforcing the color histogram of each superpixel to be concentrated in a single bin. The hill climbing optimization starts from a grid of square superpixels, which it iteratively refines by swapping blocks of pixels at their boundaries. We chose SEEDS as it extracts superpixels in real-time on a single CPU.
3.2. SEEDS for videos
Our video approach propagates superpixels over multiple frames to build 3D spatio-temporal constructs. As time goes on, new video superpixels can appear and others may terminate. In the literature, this is controlled by constraining the number of superpixel tubes in the sequence. For online applications this is not possible, however, since the upcoming length and content of the sequence are unknown. Thus, we use alternative constraints defined through 2 parameters:

Superpixels per frame: the number of superpixels into which each single frame is partitioned.

Superpixel rate: the rate of creating/terminating superpixels over time.

In order to fulfill both constraints, the termination of a superpixel implies the creation of a new one in the same frame. In the experiments, we discuss how we select these parameters.
Let S be the set of valid partitions of a video. These are the partitions for which the superpixels are contiguous blobs in all frames and that exhibit the correct superpixel-per-frame and superpixel-rate behavior. Let A_k^t denote the set of pixels that belong to superpixel k at frame t. To indicate all pixels of the video superpixel up to frame t, we use A_k^{t:0}.

Figure 3. Hierarchy of blocks of pixels of 4 layers.
Similarly to [15], the energy function encourages color homogeneity within the 3D superpixels. We use a color histogram of each superpixel to evaluate this. The color histogram of A_k^{t:0} is written as c_{A_k^{t:0}}. Let H_j be a subset of the color space which determines the colors in a bin of the histogram. Then the energy function is

H(s) = \sum_k \sum_{H_j} \left( c_{A_k^{t:0}}(j) \right)^2,    (1)

which is maximal when the histograms have only one non-zero bin for each video superpixel.
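As a concrete illustration, the energy of Eq. (1) for a single frame can be sketched as below; the function and variable names are ours, and we assume each pixel's color bin has been precomputed (a simplification, not the paper's implementation):

```python
import numpy as np

def seeds_energy(labels, bins, num_superpixels, num_bins):
    """Per-frame version of Eq. (1): sum over superpixels of the squared
    (normalized) color-histogram bins. The value is maximal, equal to the
    number of superpixels, when every superpixel's histogram is
    concentrated in a single bin."""
    energy = 0.0
    for k in range(num_superpixels):
        hist = np.bincount(bins[labels == k], minlength=num_bins).astype(float)
        if hist.sum() > 0:
            hist /= hist.sum()        # normalized histogram c_{A_k}
        energy += np.sum(hist ** 2)   # sum_j c_{A_k}(j)^2
    return energy
```

With normalized histograms, a two-superpixel frame in which each superpixel is single-colored scores 2.0, while mixing colors inside a superpixel lowers its contribution.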
3.3. Online Optimization via Hill Climbing
The optimization algorithm is designed to maximize the energy function in an online fashion (i.e. only using past frames and at video rate). It computes the partition of the current frame, starting from an approximation of the last partition. Once the partition of the current frame is delivered, it remains fixed. We introduce a hill climbing algorithm that runs in real-time. It maximizes the energy by exchanging pixels between superpixels at their boundaries. This section describes the optimization in more detail. See Fig. 2 for an overview of the algorithm.
Hierarchy of blocks of pixels. Both the pixel exchange between superpixels and their temporal propagation are regulated through blocks of pixels. The SEEDS algorithm [15] started by dividing a still image into a regular grid of blocks. An important difference with our algorithm is that we consider a hierarchy of blocks of different sizes. Starting from pixels as the most detailed scale, 2×2 or 3×3 pixel blocks are formed for the second layer (how that choice is made is clarified below). Further layers each time combine 2×2 blocks of the previous one. The block size at the second layer (2×2 or 3×3) and the number of layers are chosen such that the image subdivision at the highest layer approximately yields the prescribed number of superpixels per frame. Fig. 3 illustrates an example hierarchy with 4 layers of block sizes.
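The choice of base block size and layer count can be sketched as a small search; this helper and its names are our illustration, not the authors' code:

```python
def choose_hierarchy(width, height, target_superpixels):
    """Pick the base block size (2x2 or 3x3) and the number of layers so
    that the grid of top-layer blocks approximately matches the requested
    superpixels per frame. Each layer above the base merges 2x2 blocks,
    so a top-layer block has side base * 2**(layers - 2) pixels."""
    best = None
    for base in (2, 3):
        for layers in range(2, 8):
            side = base * 2 ** (layers - 2)
            count = (width // side) * (height // side)
            err = abs(count - target_superpixels)
            if best is None or err < best[0]:
                best = (err, base, layers, count)
    _, base, layers, count = best
    return base, layers, count
```

For a 480×360 frame and a target of 200 superpixels, this search selects 2×2 base blocks and 6 layers (15×11 = 165 top-layer blocks).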
Pixel and block-level updates. An initial partition of the current frame is provided by the previous frame; this propagation process is described below. For the first frame, the initial partition corresponds to the highest block layer as just described, i.e. a regular grid. The hill climbing optimization starts from the initialization and then iteratively proposes local changes to the partition. Multiple pixel block exchanges between superpixels are considered, one after the other. If such an exchange increases the objective function, it is accepted and the partition is updated; otherwise, the exchange is discarded. The exchanged pixel blocks are adjacent to the superpixel boundaries. The algorithm starts by exchanging bigger blocks, and then it descends in the block hierarchy until it reaches the pixel level. Thus, in the first iterations larger blocks are exchanged to quickly arrive at a coarse partition that captures the global structure. Later, the partition is refined through smaller blocks and pixels that capture more details. This process is shown in Fig. 4.

Figure 4. Efficient updating at different block sizes.
Let B_n^t be a block of pixels of the current frame that belongs to superpixel n, i.e. B_n^t ⊂ A_n^t ⊂ A_n^{t:0}. To evaluate whether exchanging the block B_n^t from superpixel n to m increases the objective function, we can use one histogram intersection computation, rather than evaluating the complete energy function. This is

int(c_{B_n^t}, c_{A_m^{t:0}}) ≥ int(c_{B_n^t}, c_{A_n^{t:0} \ B_n^t}),    (2)

in which int(·, ·) denotes the intersection between two histograms, and \ the exclusion of a set. Thus, if the intersection of B_n^t with the video superpixel A_m^{t:0} is higher than the intersection with the superpixel it currently belongs to, the exchange is accepted; otherwise it is discarded. The speed of the hill climbing optimization stems from Eq. (2), since it can evaluate a block exchange with a single intersection distance computation.
In the supplementary material we show that using Eq. (2) maximizes the energy under the assumptions that |A_m^{t:0}| ≈ |A_n^{t:0}| and |B_n^t| ≪ |A_n^{t:0}|, where |·| is the cardinality of the set. Also, it assumes that the histogram of B_n^t is concentrated in a single bin. The first assumption is that video superpixels are of similar size and that the blocks are much smaller than the video superpixels. This holds most of the time, since superpixels indeed tend to be of the same size, and the blocks are defined to be at most one fourth of a superpixel in a frame, and hence are much smaller than superpixels extending over multiple frames in the video. The second assumption is that the blocks of pixels have homogeneous color histograms. This was empirically shown to hold in practice by [15] (in more than 90% of the cases), and we observed the same.
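The acceptance test of Eq. (2) amounts to comparing two histogram intersections; a minimal sketch (the names are ours, and the histograms are assumed to have comparable mass):

```python
import numpy as np

def hist_intersection(h1, h2):
    """Intersection of two histograms: sum of bin-wise minima."""
    return float(np.minimum(h1, h2).sum())

def accept_exchange(c_block, c_target, c_source_minus_block):
    """Eq. (2): move block B from superpixel n to m iff B's histogram
    intersects c_{A_m^{t:0}} at least as much as c_{A_n^{t:0} \\ B_n^t}."""
    return hist_intersection(c_block, c_target) >= \
           hist_intersection(c_block, c_source_minus_block)
```

A block that is purely of the target superpixel's dominant color is thus accepted, since its histogram barely intersects the remainder of its source superpixel.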
Creating and terminating video superpixels. According to the superpixel rate, some frames are selected to terminate and create superpixels. When a frame is selected, we first terminate a superpixel, and then we create a new one. To this aim, we introduce inequalities similar to Eq. (2). They allow us to evaluate which termination and creation of superpixels yield higher energy, again using efficient intersection distances.
Fig. 5 illustrates the creation and termination of superpixels with the notation used. When a superpixel is terminated, its pixels at frame t are incorporated into a neighbor superpixel. Let A_n^t ⊂ A_n^{t:0} and A_m^t ⊂ A_m^{t:0} be two candidate superpixels to terminate at frame t. Let A_p^{t:0} and A_q^{t:0} be the superpixel candidates to incorporate A_n^t and A_m^t, respectively. The superpixel with larger intersection with its neighbor is the one selected to terminate, i.e.

int(c_{A_n^t}, c_{A_p^{t:0}}) ≥ int(c_{A_m^t}, c_{A_q^{t:0}}).    (3)
We terminate the superpixel with the highest intersection with its neighbor among all superpixels in the frame. In the supplementary material, we show that Eq. (3) leads to the highest energy state, under the assumptions that |A_p^{t:0}| ≈ |A_q^{t:0}|, |A_n^t| ≪ |A_p^{t:0}|, |A_m^t| ≪ |A_q^{t:0}|, and that both A_n^t and A_m^t have histograms concentrated in one bin. These are similar to the assumptions for Eq. (2). Additionally, it is also assumed that c_{A_n^{t:0}} ≈ c_{A_n^{(t-1):0}} and c_{A_m^{t:0}} ≈ c_{A_m^{(t-1):0}}. That is, the color histogram of the temporal superpixel remains approximately the same whether the pixels at the current frame are included or excluded. This holds most of the time, given that |A_n^t| ≪ |A_n^{t:0}|.
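Selecting which superpixel to terminate per Eq. (3) is then an argmax of intersections over the candidates; a sketch under our own naming:

```python
import numpy as np

def select_termination(candidates):
    """Eq. (3): candidates is a list of (c_frame, c_neighbor) pairs, where
    c_frame is the histogram of a superpixel's pixels in the current frame
    and c_neighbor that of the superpixel that would absorb them.
    Terminate the candidate with the largest intersection."""
    scores = [float(np.minimum(cf, cn).sum()) for cf, cn in candidates]
    return int(np.argmax(scores))
```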
If a superpixel is terminated, a new one should be created to fulfill the constraint on the number of superpixels per frame (Sec. 3.2). The candidates to form a new superpixel are blocks of pixels that belong to an existing video superpixel. Let B_n^t ⊂ A_n^{t:0} and B_m^t ⊂ A_m^{t:0} be candidate blocks from which to create a new superpixel. We select the block of pixels whose histogram minimally intersects with its current superpixel. This is

int(c_{B_m^t}, c_{A_m^{t:0} \ B_m^t}) ≤ int(c_{B_n^t}, c_{A_n^{t:0} \ B_n^t}).    (4)
We select the block of pixels with minimum intersection in the frame. We show in the supplementary material that this yields the highest energy, assuming that |A_m^{t:0}| ≈ |A_n^{t:0}|, |B_n^t| ≪ |A_n^{t:0}|, |B_m^t| ≪ |A_m^{t:0}|, and that both B_n^t and B_m^t have histograms concentrated in one bin. These assumptions are similar to the ones of Eq. (3).
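Symmetrically, Eq. (4) picks the seed of the new superpixel as the block that fits its current superpixel worst; a sketch with our own names:

```python
import numpy as np

def select_creation(candidates):
    """Eq. (4): candidates is a list of (c_block, c_superpixel_minus_block)
    pairs. The new superpixel is created from the block whose histogram has
    the minimum intersection with the rest of its current superpixel."""
    scores = [float(np.minimum(cb, cs).sum()) for cb, cs in candidates]
    return int(np.argmin(scores))
```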
Figure 5. Termination and creation of superpixels.
Iterations. We can stop the optimization for a frame at any time and obtain a valid partition. We expect a higher value of the energy function if we let the hill climbing do more iterations, until convergence. We can fix the allowed time to run per frame, or set it on-the-fly, depending on the application. In principle, the algorithm can run on an infinitely long video, since it generates the partition online, and in memory we only need the histograms of the video superpixels that propagate to the current frame.
Initialization and Propagation. In the first frame of the video, the superpixels are initialized along a grid using the hierarchy of blocks. In the subsequent frames, the block hierarchy is exploited to initialize the superpixels. Rather than re-initializing along a grid, the new frame is initialized by taking an intermediary block-level result from the previous frame (Fig. 2). In this way, the superpixel structure is propagated from the previous frame while small details are discarded. In practice, we use 4 block layers and propagate at the 2nd layer, as shown in Fig. 4.
4. Randomized SEEDS
Some superpixel methods offer extra capabilities, such as the extraction of a hierarchy of superpixels [17]. In this section, we introduce a new capability of superpixels that, to the best of our knowledge, has not been explored before. In the next section we exploit it to design an objectness measure for temporal windows, though we expect that applications are not limited to that one.

Superpixels are over-segmentations with many more regions than objects in the image. A region that is uniform in color can be over-segmented in many different correct ways, and thus more than one partition can be valid. In Fig. 6 we give an example of different partitions with the same number of superpixels and similar energy values, whose solutions have very similar accuracy according to the superpixel benchmarks. This shows that we can extract multiple samples of superpixel partitions from the same video, all of them of comparable quality.
Figure 6. Different samples of randomized SEEDS segmentations of the same frame and with the same accuracy are combined. For randomized SEEDS, we show the average of the different samples. The objectness score is computed as the sum of the distances to the common superpixel boundaries.

Since there may be a considerable number of such partitions, we aim at extracting samples that differ as much as possible from each other. We found a heuristic that is effective and fast to compute, which consists of injecting noise into the evaluation of the block exchanges in the hill climbing, i.e. into Eq. (2). This is

int(c_{B_n^t}, c_{A_m^{t:0}}) + aξ ≥ int(c_{B_n^t}, c_{A_n^{t:0} \ B_n^t}),    (5)
where ξ is a uniform random noise variable in the interval [−1, 1] and a is a scale factor. Note that if a is small, the noise only affects the block exchanges which do not produce a large change in the energy value. In the experiments section, we analyze the effect of injecting noise by changing its scale a and show that, up to a certain level, the performance is not degraded compared to the sample obtained without adding noise, i.e. a = 0. This corroborates that there exists a diversity of over-segmentations with energy very close to the maximum that are equally valid.
Injecting noise may not be the only way to extract samples, but it is by far the most efficient to compute that we found. For example, changing the order in which we propose the exchanges of blocks of pixels in the hill climbing turned out to be successful, but slower in our implementation.
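The noisy acceptance rule of Eq. (5) only changes the left-hand side of Eq. (2); a sketch (the names are ours, the noise model follows the text):

```python
import random
import numpy as np

def accept_exchange_noisy(c_block, c_target, c_source_minus_block, a, rng=random):
    """Eq. (5): the test of Eq. (2) with additive noise a * xi, xi ~ U[-1, 1].
    A small scale a only flips decisions whose two intersections are nearly
    tied, which is what yields diverse but near-optimal partitions."""
    xi = rng.uniform(-1.0, 1.0)
    lhs = float(np.minimum(c_block, c_target).sum()) + a * xi
    rhs = float(np.minimum(c_block, c_source_minus_block).sum())
    return lhs >= rhs
```

With a = 0 the rule reduces exactly to Eq. (2).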
5. Video Objectness
In this section, we introduce an application of randomized SEEDS to video objectness. It is based on the observation that the coincidences among multiple superpixel partitions reveal the true boundaries of objects. Fig. 6 shows that when superimposing a diverse set of superpixel samples obtained with randomized SEEDS, the boundaries of the objects are preserved, while the boundaries due to over-segmentation fade away. This is because the over-segmentations coincide where there are true region boundaries, and do not in regions of similar uniform color.

In the following, we first define the objectness measure for a still image, and then we introduce how to extend it to temporal windows (tubes of bounding boxes).
Objectness Measure for Still Images. We use O to represent the intersection of several superpixel samples of randomized SEEDS. O(i) takes value 1 if all samples have a superpixel boundary at pixel i, and 0 otherwise. Thus, O is an image that indicates at which pixels the samples of randomized SEEDS agree that there is a superpixel boundary.

We define the objectness score for a still image using O. It measures the closed-boundary characteristic of objects. A bounding box is more likely to contain an object when there is a closed line in O that fits the bounding box tightly. Specifically, we compute the distance from each pixel on the perimeter of the bounding box to the nearest pixel that fulfills O(i) = 1. Thus, for pixels on the bottom or top of the bounding box, the distance is computed to the closest pixel in the same column, and for pixels on one of the sides, in the same row. See Fig. 6 for an illustration. Let X be the set of pixels inside the bounding box, Per(X) the set of pixels on the perimeter of the bounding box, and X_{R,C(p)} the pixels that are inside the bounding box and in the same row or column as pixel p. Thus, the objectness score is

(1/A) \sum_{p ∈ Per(X)} \min_{i ∈ X_{R,C(p)}, O(i)=1} d(p, i),    (6)

where d(·, ·) is the Euclidean distance, and A normalizes the score using the area of the bounding box. In the supplementary material, we show that the score can be computed very efficiently using two levels of integral images, with only 8 additions, allowing for the evaluation of over 100 million bounding boxes per second. To the best of our knowledge, no earlier work has used multiple superpixel hypotheses to build an objectness score. In the experiments, we show that using multiple hypotheses has an important impact on the performance.
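A direct, unoptimized sketch of Eq. (6) follows (the paper uses integral images instead; the names are ours, and rows or columns containing no boundary pixel are simply skipped, an assumption the text does not spell out):

```python
import numpy as np

def objectness_score(O, x0, y0, x1, y1):
    """Eq. (6), naive form: for each pixel on the box perimeter, find the
    nearest boundary pixel (O == 1) inside the box along the same column
    (top/bottom edges) or row (left/right edges), sum those distances and
    normalize by the box area. A smaller score means a closed boundary in
    O fits the box more tightly."""
    area = (x1 - x0 + 1) * (y1 - y0 + 1)
    total = 0.0
    for x in range(x0, x1 + 1):               # top and bottom edges: search column
        ys = np.flatnonzero(O[y0:y1 + 1, x])
        if len(ys):
            total += ys.min()                 # distance from top edge pixel
            total += (y1 - y0) - ys.max()     # distance from bottom edge pixel
    for y in range(y0, y1 + 1):               # left and right edges: search row
        xs = np.flatnonzero(O[y, x0:x1 + 1])
        if len(xs):
            total += xs.min()
            total += (x1 - x0) - xs.max()
    return total / area
```

A box whose perimeter lies exactly on a closed rectangle of boundary pixels scores 0, and the score grows as the box is loosened.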

References
- M. Everingham et al. The Pascal Visual Object Classes (VOC) Challenge.
- P. Felzenszwalb and D. Huttenlocher. Efficient Graph-Based Image Segmentation.
- P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik. Contour Detection and Hierarchical Image Segmentation.
- C. Fowlkes, S. Belongie, F. Chung, and J. Malik. Spectral grouping using the Nystrom method.