
DeGraF-Flow: Extending DeGraF Features for Accurate and Efficient Sparse-to-Dense Optical Flow Estimation

TL;DR: Evaluation on established real-world benchmark datasets shows test performance in an autonomous vehicle setting where DeGraF-Flow shows promising results in terms of accuracy with competitive computational efficiency among non-GPU based methods, including a marked increase in speed over the conceptually similar EpicFlow approach.
Abstract: Modern optical flow methods make use of salient scene feature points detected and matched within the scene as a basis for sparse-to-dense optical flow estimation. Current feature detectors, however, either give sparse, non-uniform point clouds (resulting in flow inaccuracies) or lack the efficiency for frame-rate real-time applications. In this work we use the novel Dense Gradient Based Features (DeGraF) as the input to a sparse-to-dense optical flow scheme. This consists of three stages: 1) efficient detection of uniformly distributed Dense Gradient Based Features (DeGraF) [1]; 2) feature tracking via robust local optical flow [2]; and 3) edge preserving flow interpolation [3] to recover overall dense optical flow. The tunable density and uniformity of DeGraF features yield superior dense optical flow estimation compared to other popular feature detectors within this three stage pipeline. Furthermore, the comparable speed of feature detection also lends itself well to the aim of real-time optical flow recovery. Evaluation on established real-world benchmark datasets shows test performance in an autonomous vehicle setting where DeGraF-Flow shows promising results in terms of accuracy with competitive computational efficiency among non-GPU based methods, including a marked increase in speed over the conceptually similar EpicFlow approach [3].

Summary (2 min read)

1. INTRODUCTION

  • Optical flow estimation is the recovery of the motion fields between temporally adjacent images within a sequence.
  • Such matching of image points can accurately recover long range motions, however, as the matches are sparse (only cover a fraction of the image) they can lead to loss of accuracy at motion boundaries [3] in the refined dense optical flow.
  • This shows improved results compared against previous interpolation methods [19] and is currently used as a popular post-processing step in contemporary state-of-the-art dense optical flow estimation methods [20, 21, 22].
  • By contrast, recent work on the use of Dense Gradient Based Features [1] addresses these issues.
  • This approach provides spatially tunable feature density and uniformity in addition to sub-pixel accuracy.

2. APPROACH

  • Dense optical flow is recovered from two temporally adjacent images (Figure 2A) using a three step process: Point detection on the first image is carried out by calculation of an even grid of DeGraF points [1] shown in Figure 2B.
  • A sliding window is passed over the image with step size of δ.
  • For an image region I of dimensions w × h containing grayscale pixels, two centroids, Cpos and Cneg, are defined, which define a gradient vector from Cpos to Cneg for the region.
  • The key-point in each region is taken as the location of the most stable centroid, i.e. if Sneg > Spos then the key-point is at (xneg, yneg) and vice versa.
  • The sparse optical flow vectors recovered are shown in Figure 2C.

3. EVALUATION

  • Statistical accuracy is measured using the established End-Point Error (EPE) metric [10, 13].
  • The authors' method (DeGraF / DeGraF-Flow) was implemented in C++ and all experiments run on a Core i7 using four CPU cores.
  • All timings reported are for the run-time of the algorithm excluding image input/output and display.
  • For RLOF, the global motion and illumination models are used as per [2] with the adaptive cross based support region described in [29].

3.1. Comparison of Feature Detectors

  • To justify the use of DeGraF points, the authors compare with established feature detectors for dense flow computation on KITTI 2012 [10].
  • As is shown in [26], when using RLOF and interpolation to recover dense optical flow, a uniform grid of points is a superior input compared to other feature point detectors.
  • Here the authors repeat the experiment of [26] but with the addition of DeGraF (Table 1).
  • To allow meaningful comparison, each detector is tuned to ensure a comparable number of points are detected.
  • DeGraF shows the best EPE and a detection time equal to that of FAST.

3.2. Benchmark Comparison - KITTI 2012

  • Semi dense (50%) ground truth, calculated using a LiDAR, is provided.
  • The result of standalone Pyramidal Lucas-Kanade (PLK) [24] is shown as a baseline reference.
  • The first two columns give the percentage of estimated flow vectors that have an EPE of more than 3px.
  • DeGraF-Flow shows promising results in terms of balancing computational efficiency (run-time, Table 2) and accuracy.
  • In Table 2 it ranks thirteenth in accuracy (Out-Noc) and is shown to be the fourth fastest CPU method with a non-occluded pixel outlier percentage of less than 10%.

3.3. Benchmark Comparison - KITTI 2015

  • These exhibit far larger pixel displacements in some areas resulting in lower algorithm performance on KITTI 2015 (Table 3) than on KITTI 2012 (Table 2).
  • Table 3 shows the results of DeGraF-Flow on the test set compared to DeepFlow [18] and EpicFlow [3] (which both use DeepMatch as the sparse matching technique).
  • The percentage of flow vectors with an EPE greater than 3px are shown, with fg and bg referring to foreground objects and the background scene respectively.
  • As with the KITTI 2012 benchmark results their approach shows comparable accuracy with significantly reduced runtime over EpicFlow and DeepFlow.
  • In terms of accuracy it places 13th (out of 22) among CPU methods that take less than 10 seconds to process an image pair.

4. CONCLUSION

  • The authors present DeGraF-Flow, a novel optical flow estimation method.
  • DeGraF-Flow uses a rapidly computed grid of Dense Gradient Based Features and then combines an existing state-of-the-art sparse point tracker (RLOF [2]) and interpolator (EPIC [3]) to recover dense flow.
  • With only minimal impact on accuracy (within 2% of EPICFlow [3] and DeepFlow [18] across all metrics) their approach offers significant gains in computational performance for dense optic flow estimation.
  • On the KITTI 2012 and 2015 benchmarks [10, 13] their method shows competitive run-time and comparable accuracy with other CPU methods.


Citation for published item:
Stephenson, F., Breckon, T.P. and Katramados, I. (2019) 'DeGraF-Flow: extending DeGraF features for accurate and efficient sparse-to-dense optical flow estimation', in 2019 IEEE International Conference on Image Processing (ICIP): proceedings. Piscataway, NJ: IEEE, pp. 1277-1281.
Further information on publisher's website:
https://doi.org/10.1109/ICIP.2019.8803739

DEGRAF-FLOW: EXTENDING DEGRAF FEATURES FOR ACCURATE AND EFFICIENT
SPARSE-TO-DENSE OPTICAL FLOW ESTIMATION
Felix Stephenson¹, Toby P. Breckon¹, Ioannis Katramados²
¹ Durham University, UK | ² NHL Stenden University of Applied Sciences, Netherlands
ABSTRACT
Modern optical flow methods make use of salient scene feature points detected and matched within the scene as a basis for sparse-to-dense optical flow estimation. Current feature detectors, however, either give sparse, non-uniform point clouds (resulting in flow inaccuracies) or lack the efficiency for frame-rate real-time applications. In this work we use the novel Dense Gradient Based Features (DeGraF) as the input to a sparse-to-dense optical flow scheme. This consists of three stages: 1) efficient detection of uniformly distributed Dense Gradient Based Features (DeGraF) [1]; 2) feature tracking via robust local optical flow [2]; and 3) edge preserving flow interpolation [3] to recover overall dense optical flow. The tunable density and uniformity of DeGraF features yield superior dense optical flow estimation compared to other popular feature detectors within this three stage pipeline. Furthermore, the comparable speed of feature detection also lends itself well to the aim of real-time optical flow recovery. Evaluation on established real-world benchmark datasets shows test performance in an autonomous vehicle setting where DeGraF-Flow shows promising results in terms of accuracy with competitive computational efficiency among non-GPU based methods, including a marked increase in speed over the conceptually similar EpicFlow approach [3].
Index Terms: optical flow, Dense Gradient Based Features, DeGraF, automotive vision, feature points
1. INTRODUCTION
Optical flow estimation is the recovery of the motion fields
between temporally adjacent images within a sequence. Since
its conception over 35 years ago [4, 5] it remains an area of
intense interest in computer vision in terms of accurate and
efficient computation for numerous applications [6].
Dense optical flow estimation aims to accurately recover
per-pixel motion vectors from every pixel in a video frame to
the corresponding locations in the subsequent (or previous)
image frame in the sequence. These vector fields form the
basis for applications such as scene segmentation [7], object
detection and tracking [8], structure from motion and visual
odometry [9]. The development of autonomous vehicles has
revealed the necessity for real-time scene understanding [10].
This has led to increased pressure to improve both the quality
and computational efficiency of dense optical flow estimation.
To date, recent progress in this area has been driven by the introduction of increasingly challenging benchmark datasets [11, 12, 10, 13] providing accurate ground truth optical flow for comparison. Approaches which are robust to scene discontinuities (occlusions, motion boundaries) and appearance changes (illumination, chromacity) have shown strong results under static scene conditions [11]. However, the recent KITTI optical flow benchmarks [10, 13], which comprise video sequences of dynamic real-world urban driving scenarios, present additional challenges. In particular, the large displacement vectors that arise between subsequent video frames, due to vehicle velocity through the scene, remain a key problem for accurate optical flow estimation [14].

Fig. 1. Dense optical flow results and error maps from KITTI 2012 [10] (left) and KITTI 2015 [13] (right).
To cope with such challenges, contemporary optical flow methods use a sparse-to-dense estimation scheme, whereby a sparse set of points on a video frame are matched to points in the subsequent frame. The sparse set of optical flow vectors recovered from this matching are then used as input to a refinement scheme to recover dense optical flow [14, 15, 16, 3, 17, 18]. Such matching of image points can accurately recover long range motions; however, as the matches are sparse (only cover a fraction of the image) they can lead to loss of accuracy at motion boundaries [3] in the refined dense optical flow. To address this issue, the recent work of [3] (EpicFlow) incorporates a novel state-of-the-art interpolator, Edge Preserving Interpolation of Correspondences (EPIC), which recovers dense flow from sparse matches using edge detection to preserve accuracy at motion boundaries. This shows improved results compared against previous interpolation methods [19] and is currently used as a popular post-processing step in contemporary state-of-the-art dense optical flow estimation methods [20, 21, 22].
Many of these sparse matching techniques are highly accurate but are notably incapable of the real-time performance required for applications such as vehicle autonomy [3, 17, 18, 23]. By contrast, computational efficiency is more readily achievable via sparse point tracking, whereby flow estimation only takes place on a fraction of the image. Interestingly, the seminal Lucas-Kanade [5] sparse point tracker proposed over 30 years ago still forms the basis for many contemporary state-of-the-art sparse flow techniques [24, 25, 2]. Robust Local Optical Flow (RLOF) [2] is one such derivative that shows state-of-the-art accuracy on the KITTI benchmark [10, 13]. The most recent work on RLOF [26] demonstrates that combining sparse flow fields with interpolators can achieve both efficient and accurate dense optical flow.

All methods that employ sparse feature tracking require a well defined feature set upon which matching can take place. Furthermore, sparse flow vectors with uniform spatial coverage are ideal for accurate dense optical flow recovery [26], making uniform feature distribution across the scene a key conduit to success. Common feature choices are Harris [27] or FAST key-points [28] due to their relative speed. However, these detectors do not guarantee uniform spatial feature distribution as they locate points only on highly textured image regions (e.g. corners and edges). To address this issue, a current state-of-the-art sparse flow method, denoted Fast Semi Dense Epipolar Flow (FSDEF) [25], forces uniformity of FAST key-point features by use of a block-wise selection; this however adversely affects saliency, causing increased erroneous matches. Further to this, FAST points do not provide sub-pixel precision, so further refinement of the point matches is required to recover accurate optical flow. By contrast, recent work on the use of Dense Gradient Based Features (DeGraF) [1] addresses these issues. This approach provides spatially tunable feature density and uniformity in addition to sub-pixel accuracy. DeGraF has comparable speed to FAST and has been shown to be superior in terms of noise and illumination invariance [1].
Motivated by these desirable attributes, our proposed method, DeGraF-Flow, takes a spatially uniform grid of DeGraF feature points as an input to a sparse-to-dense optical flow estimation scheme. Given two temporally adjacent images in a sequence, DeGraF points are detected in the first image and then efficiently tracked to the subsequent image using RLOF [2]. Finally, dense optical flow is recovered using the established EPIC interpolation approach [3].
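For illustration, the sketch below assembles a simplified version of this three-stage pipeline. It is not the authors' implementation: a regular point grid stands in for DeGraF detection, OpenCV's pyramidal Lucas-Kanade tracker stands in for RLOF, and plain linear interpolation stands in for the edge-preserving EPIC step; the file names are hypothetical.

```python
# Simplified sparse-to-dense pipeline sketch (grid points -> LK tracking -> interpolation).
# Stand-ins are used for each stage; see the lead-in above for the assumptions made.
import cv2
import numpy as np
from scipy.interpolate import griddata

def sparse_to_dense_flow(img0, img1, step=9):
    """Estimate dense flow between two grayscale frames (H x W uint8 arrays)."""
    h, w = img0.shape
    # Stage 1: uniformly distributed seed points (stand-in for DeGraF detection [1]).
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    pts0 = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    # Stage 2: sparse tracking into the next frame (stand-in for RLOF [2]).
    pts1, status, _ = cv2.calcOpticalFlowPyrLK(
        img0, img1, pts0.reshape(-1, 1, 2), None, winSize=(21, 21), maxLevel=3)
    pts1 = pts1.reshape(-1, 2)
    ok = status.ravel() == 1
    sparse_flow = pts1[ok] - pts0[ok]
    # Stage 3: interpolate the sparse vectors to a dense field (stand-in for EPIC [3]).
    gy, gx = np.mgrid[0:h, 0:w]
    dense = np.zeros((h, w, 2), np.float32)
    for c in range(2):
        dense[..., c] = griddata(pts0[ok], sparse_flow[:, c], (gx, gy),
                                 method='linear', fill_value=0.0)
    return dense

if __name__ == "__main__":
    # Hypothetical file names; any temporally adjacent image pair will do.
    f0 = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)
    f1 = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)
    flow = sparse_to_dense_flow(f0, f1)
    print(flow.shape)  # (H, W, 2): per-pixel (dx, dy)
```

Each stage can be swapped for the components actually used in the paper (DeGraF detection, RLOF tracking and EPIC interpolation) without changing the surrounding structure.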
2. APPROACH
Dense optical flow is recovered from two temporally adjacent images (Figure 2A) using a three step process:
Point detection on the first image is carried out by calculation of an even grid of DeGraF points [1], shown in Figure 2B. A sliding window is passed over the image with step size δ. A key-point is detected within the window at each step as follows.

Fig. 2. An overview of the DeGraF-Flow pipeline.
For an image region I of dimensions w × h containing grayscale pixels, two centroids, $C_{pos}$ and $C_{neg}$, are defined, which define a gradient vector $\overrightarrow{C_{pos}C_{neg}}$ for the region. $C_{pos}$ is computed as the spatially weighted average pixel value:

$$C_{pos}(x_{pos}, y_{pos}) = \left( \frac{\sum_{i=0}^{h-1}\sum_{j=0}^{w-1} i\, I(i,j)}{S_{pos}},\; \frac{\sum_{i=0}^{h-1}\sum_{j=0}^{w-1} j\, I(i,j)}{S_{pos}} \right) \quad (1)$$

where $S_{pos} = \sum_{i=0}^{h-1}\sum_{j=0}^{w-1} I(i,j)$. The negative centroid $C_{neg}$ is similarly defined as the weighted average of the inverted pixel values:

$$C_{neg}(x_{neg}, y_{neg}) = \left( \frac{\sum_{i=0}^{h-1}\sum_{j=0}^{w-1} i\,(1 + m - I(i,j))}{S_{neg}},\; \frac{\sum_{i=0}^{h-1}\sum_{j=0}^{w-1} j\,(1 + m - I(i,j))}{S_{neg}} \right) \quad (2)$$

where $S_{neg} = \sum_{i=0}^{h-1}\sum_{j=0}^{w-1} (1 + m - I(i,j))$ and $m = \max_{(i,j)} I(i,j)$. Inverted pixel values are normalised to the range [1, 256] to avoid division by zero.
The key-point in each region is taken as the location of the most stable centroid, i.e. if $S_{neg} > S_{pos}$ then the key-point is at $(x_{neg}, y_{neg})$ and vice versa. This choice is made because the larger of $S_{neg}$ and $S_{pos}$ is less sensitive to noise and so the corresponding centroid is more robust.
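A minimal sketch of this per-window key-point computation, following Eq. (1) and (2), is given below. It is an illustrative reimplementation rather than the authors' released code; window coordinates are local, and the caller is assumed to add the window offset and slide the window with step δ.

```python
# Per-window DeGraF key-point selection, following Eq. (1)-(2) above.
import numpy as np

def degraf_keypoint(window):
    """window: h x w grayscale patch. Returns the (row, col) key-point location."""
    I = window.astype(np.float64)
    h, w = I.shape
    rows = np.arange(h)[:, None]   # i index
    cols = np.arange(w)[None, :]   # j index
    # Positive centroid: intensity-weighted average position, Eq. (1).
    s_pos = I.sum()
    c_pos = ((rows * I).sum() / s_pos, (cols * I).sum() / s_pos)
    # Negative centroid: weights use the inverted intensities 1 + m - I, Eq. (2);
    # the +1 keeps every weight non-zero and so avoids division by zero.
    m = I.max()
    inv = 1.0 + m - I
    s_neg = inv.sum()
    c_neg = ((rows * inv).sum() / s_neg, (cols * inv).sum() / s_neg)
    # The centroid with the larger normalising sum is less noise-sensitive,
    # so its location is returned as the key-point.
    return c_neg if s_neg > s_pos else c_pos
```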
Sparse point tracking of each DeGraF point to the subsequent image is carried out using the Robust Local Optical Flow approach of [2]. The sparse optical flow vectors recovered are shown in Figure 2C.

Interpolation of the sparse vectors to recover the dense flow field in Figure 2D is achieved using EPIC (Edge Preserving Interpolation of Correspondences) [3]. An affine transformation is fitted to the k nearest support flow vectors, estimated using a geodesic distance which penalises the crossing of image edges. This work uses image gradients instead of structured edge maps for defining image edges. The result is a dense optical flow field estimation as illustrated in Figure 2D.
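The sketch below illustrates the locally affine fitting idea under a strong simplification: the k nearest support vectors are selected by plain Euclidean distance rather than the edge-aware geodesic distance used by EPIC, so it will not preserve motion boundaries as well; it is included only to make the fitting step concrete, and it is unoptimised (one least-squares solve per pixel).

```python
# Simplified locally-affine sparse-to-dense interpolation in the spirit of EPIC [3].
# Euclidean k-NN replaces the edge-aware geodesic neighbourhood of the real method.
import numpy as np
from scipy.spatial import cKDTree

def affine_interpolate(seed_pts, seed_flow, shape, k=16):
    """seed_pts: (N,2) x,y positions; seed_flow: (N,2) flow vectors; shape: (H,W)."""
    h, w = shape
    tree = cKDTree(seed_pts)
    ys, xs = np.mgrid[0:h, 0:w]
    query = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    _, idx = tree.query(query, k=k)
    dense = np.zeros((h * w, 2))
    for p in range(query.shape[0]):
        nb = idx[p]
        # Fit [x y 1] * A = flow by least squares over the k support vectors,
        # then evaluate the affine model at the query pixel.
        X = np.hstack([seed_pts[nb], np.ones((k, 1))])
        A, *_ = np.linalg.lstsq(X, seed_flow[nb], rcond=None)
        dense[p] = np.array([query[p, 0], query[p, 1], 1.0]) @ A
    return dense.reshape(h, w, 2)
```

A full EPIC implementation additionally uses approximations of the geodesic neighbourhood to keep this per-pixel fitting tractable at frame resolution.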
3. EVALUATION
Evaluation is carried out on the KITTI optic flow estimation
benchmark data sets (denoted as KITTI 2012 [10] and KITTI
2015 [13]).
Statistical accuracy is measured using the established End-Point Error (EPE) metric [10, 13]. For a predicted optical flow vector $u^{p}$ at every pixel with corresponding ground truth flow vector $u^{gt}$, the EPE is defined as the average difference between the predicted and ground truth vectors over the image:

$$\mathrm{EPE} = \frac{1}{N} \sum_{i} \left\| u^{p}_{i} - u^{gt}_{i} \right\|_{2}, \quad (3)$$

where N is the number of pixels; the EPE is hence measured in pixels.
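For reference, Eq. (3), together with the outlier percentage used in the KITTI tables below (fraction of pixels with EPE above 3 px), can be computed with a small helper such as the following; this is an illustrative sketch, not the official KITTI evaluation code.

```python
# End-Point Error (Eq. (3)) and the KITTI-style > 3 px outlier rate.
import numpy as np

def end_point_error(flow_pred, flow_gt, valid_mask=None):
    """flow_pred, flow_gt: (H,W,2) arrays; valid_mask: optional (H,W) bool array
    marking pixels that have ground truth (KITTI ground truth is semi-dense)."""
    err = np.linalg.norm(flow_pred - flow_gt, axis=2)   # per-pixel L2 difference
    if valid_mask is not None:
        err = err[valid_mask]
    epe = err.mean()                                    # average EPE in pixels
    outlier_rate = (err > 3.0).mean()                   # fraction of pixels with EPE > 3 px
    return epe, outlier_rate
```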
Our method (DeGraF / DeGraF-Flow) was implemented in C++ and all experiments run on a Core i7 using four CPU cores. All timings reported are for the run-time of the algorithm excluding image input/output and display.
Parameters: DeGraF window size, w = h = 3, and step size, δ = 9. For RLOF, the global motion and illumination models are used as per [2], with the adaptive cross based support region described in [29]. This algorithm is termed RLOF(IM-GM) in the results reported in Table 2. For EPIC we use k = 128 [3].
3.1. Comparison of Feature Detectors
To justify the use of DeGraF points, we compare with established feature detectors for dense flow computation on KITTI 2012 [10].
As is shown in [26], when using RLOF and interpolation to recover dense optical flow, a uniform grid of points is a superior input compared to other feature point detectors. Here we repeat the experiment of [26] but with the addition of DeGraF (Table 1).
Table 1 shows the EPE on a KITTI image pair when using popular feature detectors for the first stage of optical flow estimation. To allow meaningful comparison, each detector is tuned to ensure a comparable number of points are detected. DeGraF shows the best EPE and a detection time equal to that of FAST.
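The sketch below shows one way such a fixed point budget can be approximated with standard OpenCV detectors (assuming OpenCV 4.4 or later, where SIFT is in the main module); the thresholds are illustrative assumptions rather than the settings used for Table 1, and DeGraF itself is not part of the OpenCV distribution.

```python
# Tune several OpenCV detectors towards a comparable per-frame point budget
# (roughly 5400 points, as in Table 1). Threshold values are illustrative only.
import cv2

def detect_fixed_budget(gray, budget=5400):
    detectors = {
        "FAST":  cv2.FastFeatureDetector_create(threshold=12),
        "AGAST": cv2.AgastFeatureDetector_create(threshold=12),
        "ORB":   cv2.ORB_create(nfeatures=budget),
        "SIFT":  cv2.SIFT_create(nfeatures=budget),
    }
    results = {}
    for name, det in detectors.items():
        kps = det.detect(gray, None)
        # Keep the strongest responses so every detector feeds a point set of the
        # same size into the tracking and interpolation stages.
        kps = sorted(kps, key=lambda k: k.response, reverse=True)[:budget]
        results[name] = cv2.KeyPoint_convert(kps)  # array of (x, y) point coordinates
    return results
```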
3.2. Benchmark Comparison - KITTI 2012
The KITTI 2012 benchmark [10] comprises 194 training and 194 test image pairs (1240 × 376 pixels) depicting static road scenes. Semi-dense (50%) ground truth, calculated using a LiDAR, is provided.

| Point Detector | # Points | EPE | Detection Time (s) |
|---|---|---|---|
| DeGraF [1] | 5400 | 1.34 | 0.07 |
| SURF [30] | 5282 | 1.43 | 0.90 |
| SIFT [31] | 5400 | 1.77 | 1.80 |
| AGAST [32] | 5624 | 1.92 | 0.11 |
| FAST [28] | 5562 | 2.88 | 0.07 |
| ORB [33] | 5400 | 5.60 | 0.28 |

Table 1. Comparison of differing point detectors for dense optical flow computation on KITTI 2012 example #164, with each tuned to detect approximately 5400 points per image.
Table 2 shows the results on the KITTI test set for CPU methods that can process an image pair in under 20 seconds. All such methods that perform better than DeGraF-Flow in terms of EPE are included in the table; not all less accurate methods are shown. Below the dividing line the best sparse flow methods are shown. Note how, although these methods have very low error, this error is only reported over a greatly reduced density of points. The result of standalone Pyramidal Lucas-Kanade (PLK) [24] is shown as a baseline reference. The first two columns give the percentage of estimated flow vectors that have an EPE of more than 3px. The next two columns give average EPE values over the test set. Noc denotes statistics computed only on the non-occluded pixels, where occluded pixels are those which appear in the first image but not in the second. Methods are ranked in order of increasing non-occluded outlier percentage (Out-Noc).
DeGraF-Flow shows promising results in terms of balancing computational efficiency (run-time, Table 2) and accuracy. In Table 2 it ranks thirteenth in accuracy (Out-Noc) and is shown to be the fourth fastest CPU method with a non-occluded pixel outlier percentage of less than 10%. Other faster methods such as PCA-Flow and DIS-Fast show significantly higher EPE. PCA-Flow and PCA-Layers both employ an interpolation scheme for computing dense flow. The superior results of DeGraF-Flow show that the EPIC interpolator is the correct choice, which agrees with the findings in [26]. At the time of testing, DeGraF-Flow was the 18th fastest overall, including GPU methods, with an overall accuracy rank of 56 from a total of 95 submissions within KITTI 2012.

EpicFlow and RLOF(IM-GM) are highlighted in Table 2 as these are the two constituent components of our method. DeGraF-Flow is outperformed by EpicFlow but runs almost five times faster. RLOF(IM-GM) represents near state of the art in sparse optical flow, making it an excellent candidate for tracking DeGraF points. RLOF is second only to FSDEF [25], which is contemporary work to that presented here.
3.3. Benchmark Comparison - KITTI 2015
The KITTI 2015 benchmark [13] comprises 200 training and 200 test image pairs (1242 × 375 pixels) with the increased challenges of dynamic scene objects (vehicles).

| Method | Out-Noc | Out-All | Avg-Noc | Avg-All | Density | Run-time | Environment |
|---|---|---|---|---|---|---|---|
| SPS-Fl [23] | 3.38 % | 10.06 % | 0.9 px | 2.9 px | 100.00 % | 11 s | 1 core @ 3.5 GHz |
| SDF [34] | 3.80 % | 7.69 % | 1.0 px | 2.3 px | 100.00 % | TBA | 1 core @ 2.5 GHz |
| MotionSLIC [35] | 3.91 % | 10.56 % | 0.9 px | 2.7 px | 100.00 % | 11 s | 1 core @ 3.0 GHz |
| RicFlow [36] | 4.96 % | 13.04 % | 1.3 px | 3.2 px | 100.00 % | 5 s | 1 core @ 3.5 GHz |
| CPM2 [37] | 5.60 % | 13.52 % | 1.3 px | 3.3 px | 100.00 % | 4 s | 1 core @ 2.5 GHz |
| CPM-Flow [37] | 5.79 % | 13.70 % | 1.3 px | 3.2 px | 100.00 % | 4.2 s | 1 core @ 3.5 GHz |
| MEC-Flow [38] | 6.95 % | 17.91 % | 1.8 px | 6.0 px | 100.00 % | 3 s | 1 core @ 2.5 GHz |
| DeepFlow [18] | 7.22 % | 17.79 % | 1.5 px | 5.8 px | 100.00 % | 17 s | 1 core @ 3.6 GHz |
| RecSPy+ [39] | 7.51 % | 15.96 % | 1.6 px | 3.6 px | 100.00 % | 0.16 s | 1 core @ 2.5 GHz |
| RDENSE(anon) | 7.72 % | 14.02 % | 1.9 px | 4.6 px | 100.00 % | 0.5 s | 4 cores @ 2.5 GHz |
| EpicFlow [3] | 7.88 % | 17.08 % | 1.5 px | 3.8 px | 100.00 % | 15 s | 1 core @ 3.6 GHz |
| SparseFlow [17] | 9.09 % | 19.32 % | 2.6 px | 7.6 px | 100.00 % | 10 s | 1 core @ 3.5 GHz |
| DeGraF-Flow | 9.41 % | 16.93 % | 2.5 px | 8.4 px | 100.00 % | 3.2 s | 4 cores @ 2.5 GHz |
| PCA-Layers [19] | 12.02 % | 19.11 % | 2.5 px | 5.2 px | 100.00 % | 3.2 s | 1 core @ 2.5 GHz |
| PCA-Flow [19] | 15.67 % | 24.59 % | 2.7 px | 6.2 px | 100.00 % | 0.19 s | 1 core @ 2.5 GHz |
| DB-TV-L1 [40] | 30.87 % | 39.25 % | 7.9 px | 14.6 px | 100.00 % | 16 s | 1 core @ 2.5 GHz |
| DIS-FAST [41] | 38.58 % | 46.21 % | 7.8 px | 14.4 px | 100.00 % | 0.023 s | 1 core @ 4 GHz |
| FSDEF [25] | 1.07 % | 1.17 % | 0.7 px | 0.7 px | 41.81 % | 0.26 s | 4 cores @ 3.5 GHz |
| RLOF(IM-GM) [2] | 2.48 % | 2.64 % | 0.8 px | 1.0 px | 11.84 % | 3.7 s | 4 cores @ 3.4 GHz |
| RLOF [42] | 3.14 % | 3.39 % | 1.0 px | 1.2 px | 14.76 % | 0.488 s | GPU @ 700 MHz |
| BERLOF [43] | 3.31 % | 3.60 % | 1.0 px | 1.2 px | 15.26 % | 0.231 s | GPU @ 700 MHz |
| PLK [24] | 27.44 % | 31.04 % | 11.3 px | 17.3 px | 92.33 % | 1.3 s | 4 cores @ 3.5 GHz |

Table 2. KITTI 2012 benchmark results - comparison of the best CPU methods that can process an image pair in under 20 seconds. The final five rows (FSDEF to PLK) are sparse flow algorithms; Noc = pixels that are not occluded in the second image; the first two columns give the percentage of flow vectors with an EPE greater than 3px; methods are ordered by Out-Noc. The full table of results can be found on the KITTI benchmark website [44].
| Rank | Method | Fl-bg | Fl-fg | Fl-all | Run-time | Environment |
|---|---|---|---|---|---|---|
| 65 | EpicFlow [3] | 25.81 % | 28.69 % | 26.29 % | 15 s | 1 core @ 3.6 GHz |
| 67 | DeepFlow [18] | 27.96 % | 31.06 % | 28.48 % | 17 s | 1 core @ 3.5 GHz |
| 68 | DeGraF-Flow | 28.78 % | 29.69 % | 28.94 % | 3.2 s | 4 cores @ 2.5 GHz |

Table 3. KITTI 2015 benchmark results - the percentage of pixels with an EPE greater than 3 pixels is given; fg and bg refer to the motion of foreground moving vehicles and the static background scene respectively.
These dynamic objects exhibit far larger pixel displacements in some areas, resulting in lower algorithm performance on KITTI 2015 (Table 3) than on KITTI 2012 (Table 2).
Table 3 shows the results of DeGraF-Flow on the test set compared to DeepFlow [18] and EpicFlow [3] (which both use DeepMatch as the sparse matching technique). The percentage of flow vectors with an EPE greater than 3px is shown, with fg and bg referring to foreground objects and the background scene respectively.
As with the KITTI 2012 benchmark results, our approach shows comparable accuracy with significantly reduced run-time over EpicFlow and DeepFlow. Over all submissions to the KITTI 2015 benchmark, DeGraF-Flow reports the 10th fastest CPU method. In terms of accuracy it places 13th (out of 22) among CPU methods that take less than 10 seconds to process an image pair. Over the total of 90 submissions, DeGraF-Flow places 68th in terms of Fl-all and 20th in terms of run-time.
4. CONCLUSION
In this paper we present DeGraF-Flow, a novel optical flow estimation method. DeGraF-Flow uses a rapidly computed grid of Dense Gradient Based Features (DeGraF) and then combines an existing state-of-the-art sparse point tracker (RLOF [2]) and interpolator (EPIC [3]) to recover dense flow. With only minimal impact on accuracy (within 2% of EpicFlow [3] and DeepFlow [18] across all metrics), our approach offers significant gains in computational performance for dense optic flow estimation.
We show that the invariance, density and uniformity of DeGraF points yield superior dense flow results compared to other popular point detectors. The rapid end-to-end optic flow estimation time is also conducive to real-time applications such as scene understanding for future autonomous vehicle applications.
On the KITTI 2012 and 2015 benchmarks [10, 13] our method shows competitive run-time and comparable accuracy with other CPU methods. Future work will exploit the tracking of DeGraF features for applications in an autonomous vehicle setting.


