
Degraf-Flow: Extending Degraf Features for Accurate and Efficient Sparse-To-Dense Optical Flow Estimation



Durham Research Online

Deposited in DRO: 05 June 2019
Version of attached file: Accepted Version
Peer-review status of attached file: Peer-reviewed
Citation for published item: Stephenson, F., Breckon, T.P. and Katramados, I. (2019) 'DeGraF-Flow: extending DeGraF features for accurate and efficient sparse-to-dense optical flow estimation', in 2019 IEEE International Conference on Image Processing (ICIP): proceedings. Piscataway, NJ: IEEE, pp. 1277-1281.
Further information on publisher's website: https://doi.org/10.1109/ICIP.2019.8803739
Publisher's copyright statement: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

DEGRAF-FLOW: EXTENDING DEGRAF FEATURES FOR ACCURATE AND EFFICIENT SPARSE-TO-DENSE OPTICAL FLOW ESTIMATION

Felix Stephenson¹, Toby P. Breckon¹, Ioannis Katramados²
¹ Durham University, UK | ² NHL Stenden University of Applied Sciences, Netherlands
ABSTRACT

Modern optical flow methods make use of salient scene feature points detected and matched within the scene as a basis for sparse-to-dense optical flow estimation. Current feature detectors however either give sparse, non uniform point clouds (resulting in flow inaccuracies) or lack the efficiency for frame-rate real-time applications. In this work we use the novel Dense Gradient Based Features (DeGraF) as the input to a sparse-to-dense optical flow scheme. This consists of three stages: 1) efficient detection of uniformly distributed Dense Gradient Based Features (DeGraF) [1]; 2) feature tracking via robust local optical flow [2]; and 3) edge preserving flow interpolation [3] to recover overall dense optical flow. The tunable density and uniformity of DeGraF features yield superior dense optical flow estimation compared to other popular feature detectors within this three stage pipeline. Furthermore, the comparable speed of feature detection also lends itself well to the aim of real-time optical flow recovery. Evaluation on established real-world benchmark datasets shows test performance in an autonomous vehicle setting where DeGraF-Flow gives promising results in terms of accuracy with competitive computational efficiency among non-GPU based methods, including a marked increase in speed over the conceptually similar EpicFlow approach [3].

Index Terms: optical flow, Dense Gradient Based Features, DeGraF, automotive vision, feature points
1. INTRODUCTION
Optical flow estimation is the recovery of the motion fields between temporally adjacent images within a sequence. Since its conception over 35 years ago [4, 5] it remains an area of intense interest in computer vision in terms of accurate and efficient computation for numerous applications [6].

Dense optical flow estimation aims to accurately recover per-pixel motion vectors from every pixel in a video frame to the corresponding locations in the subsequent (or previous) image frame in the sequence. These vector fields form the basis for applications such as scene segmentation [7], object detection and tracking [8], structure from motion and visual odometry [9]. The development of autonomous vehicles has revealed the necessity for real-time scene understanding [10]. This has led to increased pressure to improve both the quality and computational efficiency of dense optical flow estimation.

Fig. 1. Dense optical flow results and error maps from KITTI 2012 [10] (left) and KITTI 2015 [13] (right).

To date, recent progress in this area has been driven by the introduction of increasingly challenging benchmark datasets [11, 12, 10, 13] providing accurate ground truth optical flow for comparison. Approaches which are robust to scene discontinuities (occlusions, motion boundaries) and appearance changes (illumination, chromaticity) have shown strong results under static scene conditions [11]. However, the recent KITTI optical flow benchmarks [10, 13], which comprise video sequences of dynamic real-world urban driving scenarios, present additional challenges. In particular, significant motion vectors between subsequent video frames, due to vehicle velocity through the scene, present a key problem in accurate optical flow estimation [14] for such large displacement vectors.

To cope with such challenges, contemporary optical flow methods use a sparse-to-dense estimation scheme, whereby a sparse set of points on a video frame is matched to points in the subsequent frame. The sparse set of optical flow vectors recovered from this matching is then used as input to a refinement scheme to recover dense optical flow [14, 15, 16, 3, 17, 18]. Such matching of image points can accurately recover long range motions; however, as the matches are sparse (covering only a fraction of the image) they can lead to a loss of accuracy at motion boundaries [3] in the refined dense optical flow. To address this issue, the recent work of [3] (EpicFlow) incorporates a novel state-of-the-art interpolator, Edge Preserving Interpolation of Correspondences (EPIC), which recovers dense flow from sparse matches using edge detection to preserve accuracy at motion boundaries. This shows improved results compared against previous interpolation methods [19] and is currently used as a popular post-processing step in contemporary state-of-the-art dense optical flow estimation methods [20, 21, 22].
Many of these sparse matching techniques are highly accurate but are notably incapable of the real-time performance required for applications such as vehicle autonomy [3, 17, 18, 23]. By contrast, computational efficiency is more readily achievable via sparse point tracking, whereby flow estimation only takes place on a fraction of the image. Interestingly, the seminal Lucas-Kanade [5] sparse point tracker proposed over 30 years ago still forms the basis for many contemporary state-of-the-art sparse flow techniques [24, 25, 2]. Robust Local Optical Flow (RLOF) [2] is one such derivative that shows state-of-the-art accuracy on the KITTI benchmark [10, 13]. The most recent work on RLOF [26] demonstrates that combining sparse flow fields with interpolators can achieve both efficient and accurate dense optical flow.

All methods that employ sparse feature tracking require a well defined feature set upon which matching can take place. Furthermore, sparse flow vectors with uniform spatial coverage are ideal for accurate dense optical flow recovery [26], making uniform feature distribution across the scene a key conduit to success. Common feature choices are Harris [27] or FAST key-points [28] due to their relative speed. However, these detectors do not guarantee uniform spatial feature distribution as they locate points only on highly textured image regions (e.g. corners and edges). To address this issue, a current state-of-the-art sparse flow method, denoted Fast Semi Dense Epipolar Flow (FSDEF) [25], forces uniformity of FAST key-point features by use of a block-wise selection; this however adversely affects saliency, causing increased erroneous matches. Further to this, FAST points do not provide sub-pixel precision, so further refinement of the point matches is required to recover accurate optical flow. By contrast, recent work on the use of Dense Gradient Based Features (DeGraF) [1] addresses these issues. This approach provides spatially tunable feature density and uniformity in addition to sub-pixel accuracy. DeGraF has comparable speed to FAST and has been shown to be superior in terms of noise and illumination invariance [1].

Motivated by these desirable attributes, our proposed method, DeGraF-Flow, takes a spatially uniform grid of DeGraF feature points as the input to a sparse-to-dense optical flow estimation scheme. Given two temporally adjacent images in a sequence, DeGraF points are detected in the first image and then efficiently tracked to the subsequent image using RLOF [2]. Finally, dense optical flow is recovered using the established EPIC interpolation approach [3].
2. APPROACH
Dense optical flow is recovered from two temporally adjacent images (Figure 2A) using a three step process:

Point detection on the first image is carried out by calculation of an even grid of DeGraF points [1], shown in Figure 2B. A sliding window is passed over the image with step size of δ. A key-point is detected within the window at each step as follows.
Fig. 2. An overview of the DeGraF-Flow pipeline.

For an image region $I$ of dimensions $w \times h$ containing grayscale pixels, two centroids, $C_{pos}$ and $C_{neg}$, are defined which define a gradient vector $\overrightarrow{C_{pos}C_{neg}}$ for the region. $C_{pos}$ is computed as the spatially weighted average pixel value:

$$C_{pos}(x_{pos}, y_{pos}) = \left( \frac{\sum_{i=0}^{h-1}\sum_{j=0}^{w-1} i\,I(i,j)}{S_{pos}},\; \frac{\sum_{i=0}^{h-1}\sum_{j=0}^{w-1} j\,I(i,j)}{S_{pos}} \right), \quad (1)$$

where $S_{pos} = \sum_{i=0}^{h-1}\sum_{j=0}^{w-1} I(i,j)$. The negative centroid $C_{neg}$ is similarly defined as the weighted average of inverted pixel values:

$$C_{neg}(x_{neg}, y_{neg}) = \left( \frac{\sum_{i=0}^{h-1}\sum_{j=0}^{w-1} i\,(1 + m - I(i,j))}{S_{neg}},\; \frac{\sum_{i=0}^{h-1}\sum_{j=0}^{w-1} j\,(1 + m - I(i,j))}{S_{neg}} \right), \quad (2)$$

where $S_{neg} = \sum_{i=0}^{h-1}\sum_{j=0}^{w-1} (1 + m - I(i,j))$ and $m = \max_{(i,j)} I(i,j)$. Inverted pixel values are normalised to the range 1 to 256 to avoid division by zero.
The key-point in each region is taken as the location of the most stable centroid, i.e. if $S_{neg} > S_{pos}$ then the key-point is at $(x_{neg}, y_{neg})$ and vice versa. This choice is made because the larger value of $S_{neg}$ and $S_{pos}$ is less sensitive to noise and so the corresponding centroid is more robust.
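A minimal sketch of this per-window computation is given below, assuming an 8-bit grayscale image stored row-major; the function and type names (degrafKeypoint, Keypoint) are illustrative and not taken from the authors' implementation.

```cpp
#include <cstdint>
#include <algorithm>
#include <vector>

// Illustrative sketch of the DeGraF per-window key-point computation of
// equations (1)-(2); names are ours, not the authors' implementation.
struct Keypoint { double row, col; };   // sub-pixel position in image coordinates

// img: row-major 8-bit grayscale image with 'stride' pixels per row;
// (row0, col0): top-left corner of the w x h window.
Keypoint degrafKeypoint(const std::vector<uint8_t>& img, int stride,
                        int row0, int col0, int w, int h)
{
    // m = maximum intensity over the window, used to invert pixel values (Eq. 2).
    double m = 0.0;
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < w; ++j)
            m = std::max(m, double(img[(row0 + i) * stride + (col0 + j)]));

    // Accumulate weighted sums for the positive and negative (inverted) centroids.
    double sPos = 0.0, sNeg = 0.0;
    double iPos = 0.0, jPos = 0.0, iNeg = 0.0, jNeg = 0.0;
    for (int i = 0; i < h; ++i) {
        for (int j = 0; j < w; ++j) {
            double v    = img[(row0 + i) * stride + (col0 + j)];
            double vInv = 1.0 + m - v;             // offset by 1 avoids division by zero
            sPos += v;    iPos += i * v;    jPos += j * v;
            sNeg += vInv; iNeg += i * vInv; jNeg += j * vInv;
        }
    }

    // The key-point is the more stable centroid: S_neg > S_pos selects C_neg,
    // otherwise C_pos, since the larger total weight is less sensitive to noise.
    Keypoint kp;
    if (sNeg > sPos) { kp.row = row0 + iNeg / sNeg; kp.col = col0 + jNeg / sNeg; }
    else             { kp.row = row0 + iPos / sPos; kp.col = col0 + jPos / sPos; }
    return kp;
}
```

Calling such a routine at every step of the δ-spaced sliding window yields the even grid of sub-pixel key-points shown in Figure 2B.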
Sparse point tracking of each DeGraF point to the subsequent image is carried out using the Robust Local Optical Flow approach of [2]. The sparse optical flow vectors recovered are shown in Figure 2C.

Interpolation of the sparse vectors to recover the dense flow field in Figure 2D is achieved using EPIC (Edge Preserving Interpolation of Correspondences) [3]. An affine transformation is fitted to the k nearest support flow vectors, estimated using a geodesic distance which penalises crossing of image edges. This work uses image gradients instead of structured edge maps for defining image edges. The result is a dense optical flow field estimation, as illustrated in Figure 2D.
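To make the three-stage pipeline concrete, the sketch below assembles an approximation of it from stock OpenCV building blocks: a plain regular grid stands in for the DeGraF detector (each grid point would be refined to its DeGraF centroid as above), pyramidal Lucas-Kanade stands in for RLOF [2], and the opencv_contrib EdgeAwareInterpolator provides EPIC-style edge-preserving interpolation [3]. This is an illustrative reimplementation under those substitutions, not the authors' code.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/video/tracking.hpp>   // calcOpticalFlowPyrLK
#include <opencv2/ximgproc.hpp>         // EdgeAwareInterpolator (opencv_contrib)
#include <vector>

// Sparse-to-dense pipeline sketch in the spirit of DeGraF-Flow:
// 1) grid-based point detection, 2) sparse tracking, 3) edge-aware interpolation.
int main()
{
    cv::Mat img0 = cv::imread("frame_0.png", cv::IMREAD_GRAYSCALE);
    cv::Mat img1 = cv::imread("frame_1.png", cv::IMREAD_GRAYSCALE);
    if (img0.empty() || img1.empty()) return 1;

    // 1) Point detection: uniform grid with step delta (stand-in for DeGraF points).
    const int delta = 9;
    std::vector<cv::Point2f> pts0;
    for (int y = delta / 2; y < img0.rows; y += delta)
        for (int x = delta / 2; x < img0.cols; x += delta)
            pts0.emplace_back((float)x, (float)y);

    // 2) Sparse tracking into the second frame (PLK here; RLOF in the paper,
    //    an RLOF implementation also ships with opencv_contrib and could be substituted).
    std::vector<cv::Point2f> pts1;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(img0, img1, pts0, pts1, status, err,
                             cv::Size(21, 21), 3);

    // Keep only successfully tracked correspondences.
    std::vector<cv::Point2f> from, to;
    for (size_t i = 0; i < pts0.size(); ++i)
        if (status[i]) { from.push_back(pts0[i]); to.push_back(pts1[i]); }

    // 3) Edge-aware sparse-to-dense interpolation (EPIC-style).
    cv::Ptr<cv::ximgproc::EdgeAwareInterpolator> epic =
        cv::ximgproc::createEdgeAwareInterpolator();
    epic->setK(128);                    // k nearest support vectors
    cv::Mat denseFlow;                  // CV_32FC2, one (dx, dy) per pixel
    epic->interpolate(img0, from, img1, to, denseFlow);

    return 0;
}
```

In the full method, RLOF with the illumination and global motion models replaces PLK, DeGraF points replace the plain grid, and k = 128 matches the EPIC setting reported in Section 3.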
3. EVALUATION
Evaluation is carried out on the KITTI optic flow estimation benchmark data sets (denoted as KITTI 2012 [10] and KITTI 2015 [13]).
Statistical accuracy is measured using the established End-Point Error (EPE) metric [10, 13]. For a predicted optical flow vector $u^{p}$ at every pixel with corresponding ground truth flow vector $u^{gt}$, the EPE is defined as the average difference between the predicted and ground truth vectors over the image:

$$EPE = \frac{1}{N} \sum_{i} \left\| u^{p}_{i} - u^{gt}_{i} \right\|_{2}, \quad (3)$$

where $N$ is the number of pixels; EPE is hence measured in pixels.
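Read literally, Eq. (3) averages the Euclidean length of the per-pixel error vectors. A small helper in that spirit is sketched below for flow fields stored as OpenCV CV_32FC2 images, with an optional validity mask to handle the semi-dense KITTI ground truth; the function name is ours.

```cpp
#include <opencv2/core.hpp>
#include <cmath>

// Average end-point error (Eq. 3) between predicted and ground-truth flow,
// both CV_32FC2 (dx, dy per pixel). 'valid' is an optional CV_8U mask marking
// pixels that have ground truth (non-zero = valid), as KITTI ground truth
// is only semi-dense.
double endPointError(const cv::Mat& pred, const cv::Mat& gt, const cv::Mat& valid)
{
    CV_Assert(pred.type() == CV_32FC2 && gt.type() == CV_32FC2 &&
              pred.size() == gt.size());
    double sum = 0.0;
    long   n   = 0;
    for (int y = 0; y < pred.rows; ++y) {
        for (int x = 0; x < pred.cols; ++x) {
            if (!valid.empty() && valid.at<uchar>(y, x) == 0) continue;
            cv::Vec2f d = pred.at<cv::Vec2f>(y, x) - gt.at<cv::Vec2f>(y, x);
            sum += std::sqrt(double(d[0]) * d[0] + double(d[1]) * d[1]);
            ++n;
        }
    }
    return n > 0 ? sum / n : 0.0;   // measured in pixels
}
```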
Our method (DeGraF / DeGraF-Flow) was implemented in C++ and all experiments run on a Core i7 using four CPU cores. All timings reported are for the run-time of the algorithm excluding image input/output and display.

Parameters: DeGraF window size w = h = 3 and step size δ = 9. For RLOF, the global motion and illumination models are used as per [2] with the adaptive cross-based support region described in [29]; this algorithm is termed RLOF(IM-GM) in the results reported in Table 2. For EPIC we use k = 128 [3].
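For reference, these reported settings can be gathered into a single configuration record; the struct below is an illustrative grouping of the values quoted above, not a structure taken from the paper's implementation.

```cpp
// Parameter values reported for the experiments, collected for clarity.
struct DeGraFFlowParams {
    int  degrafWindowW = 3;     // DeGraF window width  w
    int  degrafWindowH = 3;     // DeGraF window height h
    int  degrafStep    = 9;     // sliding-window step size delta
    int  epicK         = 128;   // k nearest support vectors for EPIC interpolation
    bool rlofGlobalMotionModel = true;   // RLOF(IM-GM): global motion model [2]
    bool rlofIlluminationModel = true;   // RLOF(IM-GM): illumination model [2]
    bool rlofCrossBasedSupport = true;   // adaptive cross-based support region [29]
};
```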
3.1. Comparison of Feature Detectors
To justify the use of DeGraF points, we compare with established feature detectors for dense flow computation on KITTI 2012 [10].

As is shown in [26], when using RLOF and interpolation to recover dense optical flow, a uniform grid of points is a superior input compared to other feature point detectors. Here we repeat the experiment of [26] but with the addition of DeGraF (Table 1).
Table 1 shows the EPE on a KITTI image pair when using popular feature detectors for the first stage of optical flow estimation. To allow a meaningful comparison, each detector is tuned so that a comparable number of points is detected. DeGraF gives the lowest EPE and, jointly with FAST, the fastest detection time.

Point Detector  # Points  EPE (px)  Detection Time (s)
DeGraF [1]  5400  1.34  0.07
SURF [30]  5282  1.43  0.90
SIFT [31]  5400  1.77  1.80
AGAST [32]  5624  1.92  0.11
FAST [28]  5562  2.88  0.07
ORB [33]  5400  5.60  0.28

Table 1. Comparison of differing point detectors for dense optical flow computation on KITTI 2012 example #164, with each tuned to detect approximately 5400 points per image.

3.2. Benchmark Comparison - KITTI 2012

The KITTI 2012 benchmark [10] comprises 194 training and 194 test image pairs (1240 × 376 pixels) depicting static road scenes. Semi-dense (50%) ground truth, calculated using a LiDAR, is provided.
Table 2 shows the results on the KITTI test set for CPU methods that can process an image pair in under 20 seconds. All such methods that perform better than DeGraF-Flow in terms of EPE are included in the table; not all less accurate methods are shown. Below the dividing line the best sparse flow methods are shown. Note how, although these methods have very low error, this error is only reported over a greatly reduced density of points. The result of standalone Pyramidal Lucas-Kanade (PLK) [24] is shown as a baseline reference. The first two columns give the percentage of estimated flow vectors that have an EPE of more than 3 px. The next two columns give average EPE values over the test set. Noc denotes statistics on only the non-occluded pixels, where occluded pixels are those which appear in the first image but not in the second. Methods are ranked in order of increasing non-occluded outlier percentage (Out-Noc).

DeGraF-Flow shows promising results in terms of balancing computational efficiency (run-time, Table 2) and accuracy. In Table 2 it ranks thirteenth in accuracy (Out-Noc) and is the fourth fastest CPU method with a non-occluded outlier percentage of less than 10%. Other faster methods such as PCA-Flow and DIS-FAST show significantly higher EPE. PCA-Flow and PCA-Layers both employ an interpolation scheme for computing dense flow; the superior results of DeGraF-Flow show that the EPIC interpolator is the correct choice, which agrees with the findings in [26]. At the time of testing, DeGraF-Flow was the 18th fastest overall, including GPU methods, with an overall accuracy rank of 56 from a total of 95 submissions within KITTI 2012.

EpicFlow and RLOF(IM-GM) are highlighted in Table 2 as these are the two constituent components of our method. DeGraF-Flow is outperformed by EpicFlow but runs almost five times faster. RLOF(IM-GM) represents near state of the art in sparse optical flow, making it an excellent candidate for tracking DeGraF points. RLOF is second only to FSDEF [25], which is contemporary work to that presented here.
3.3. Benchmark Comparison - KITTI 2015
The KITTI 2015 benchmark [13] comprises 200 training and 200 test image pairs (1242 × 375 pixels) with the increased challenges of dynamic scene objects (vehicles).
Method  Out-Noc  Out-All  Avg-Noc  Avg-All  Density  Run-time  Environment
SPS-Fl [23]  3.38 %  10.06 %  0.9 px  2.9 px  100.00 %  11 s  1 core @ 3.5 GHz
SDF [34]  3.80 %  7.69 %  1.0 px  2.3 px  100.00 %  TBA  1 core @ 2.5 GHz
MotionSLIC [35]  3.91 %  10.56 %  0.9 px  2.7 px  100.00 %  11 s  1 core @ 3.0 GHz
RicFlow [36]  4.96 %  13.04 %  1.3 px  3.2 px  100.00 %  5 s  1 core @ 3.5 GHz
CPM2 [37]  5.60 %  13.52 %  1.3 px  3.3 px  100.00 %  4 s  1 core @ 2.5 GHz
CPM-Flow [37]  5.79 %  13.70 %  1.3 px  3.2 px  100.00 %  4.2 s  1 core @ 3.5 GHz
MEC-Flow [38]  6.95 %  17.91 %  1.8 px  6.0 px  100.00 %  3 s  1 core @ 2.5 GHz
DeepFlow [18]  7.22 %  17.79 %  1.5 px  5.8 px  100.00 %  17 s  1 core @ 3.6 GHz
RecSPy+ [39]  7.51 %  15.96 %  1.6 px  3.6 px  100.00 %  0.16 s  1 core @ 2.5 GHz
RDENSE (anon)  7.72 %  14.02 %  1.9 px  4.6 px  100.00 %  0.5 s  4 cores @ 2.5 GHz
EpicFlow [3]  7.88 %  17.08 %  1.5 px  3.8 px  100.00 %  15 s  1 core @ 3.6 GHz
SparseFlow [17]  9.09 %  19.32 %  2.6 px  7.6 px  100.00 %  10 s  1 core @ 3.5 GHz
DeGraF-Flow  9.41 %  16.93 %  2.5 px  8.4 px  100.00 %  3.2 s  4 cores @ 2.5 GHz
PCA-Layers [19]  12.02 %  19.11 %  2.5 px  5.2 px  100.00 %  3.2 s  1 core @ 2.5 GHz
PCA-Flow [19]  15.67 %  24.59 %  2.7 px  6.2 px  100.00 %  0.19 s  1 core @ 2.5 GHz
DB-TV-L1 [40]  30.87 %  39.25 %  7.9 px  14.6 px  100.00 %  16 s  1 core @ 2.5 GHz
DIS-FAST [41]  38.58 %  46.21 %  7.8 px  14.4 px  100.00 %  0.023 s  1 core @ 4 GHz
--------
FSDEF [25]  1.07 %  1.17 %  0.7 px  0.7 px  41.81 %  0.26 s  4 cores @ 3.5 GHz
RLOF(IM-GM) [2]  2.48 %  2.64 %  0.8 px  1.0 px  11.84 %  3.7 s  4 cores @ 3.4 GHz
RLOF [42]  3.14 %  3.39 %  1.0 px  1.2 px  14.76 %  0.488 s  GPU @ 700 MHz
BERLOF [43]  3.31 %  3.60 %  1.0 px  1.2 px  15.26 %  0.231 s  GPU @ 700 MHz
PLK [24]  27.44 %  31.04 %  11.3 px  17.3 px  92.33 %  1.3 s  4 cores @ 3.5 GHz
Table 2. KITTI 2012 Benchmark Results - comparison of the best CPU methods that can process an image pair in under 20 seconds. Below the line = sparse flow algorithms; Noc = pixels that are not occluded in the second image; first two columns = percentage of flow vectors that have an EPE of greater than 3 px; methods are ordered by Out-Noc. The full table of results can be found on the KITTI benchmark website [44].
Rank  Method  Fl-bg  Fl-fg  Fl-all  Run-time  Environment
65  EpicFlow [3]  25.81 %  28.69 %  26.29 %  15 s  1 core @ 3.6 GHz
67  DeepFlow [18]  27.96 %  31.06 %  28.48 %  17 s  1 core @ 3.5 GHz
68  DeGraF-Flow  28.78 %  29.69 %  28.94 %  3.2 s  4 cores @ 2.5 GHz
Table 3. KITTI 2015 Benchmark Results - the percentage of pixels with an EPE greater than 3 px is given; fg and bg refer to the motion of foreground moving vehicles and the static background scene respectively.
These exhibit far larger pixel displacements in some areas, resulting in lower algorithm performance on KITTI 2015 (Table 3) than on KITTI 2012 (Table 2).

Table 3 shows the results of DeGraF-Flow on the test set compared to DeepFlow [18] and EpicFlow [3] (which both use DeepMatch as the sparse matching technique). The percentage of flow vectors with an EPE greater than 3 px is shown, with fg and bg referring to foreground objects and the background scene respectively.

As with the KITTI 2012 benchmark results, our approach shows comparable accuracy with significantly reduced run-time over EpicFlow and DeepFlow. Over all submissions to the KITTI 2015 benchmark, DeGraF-Flow is the 10th fastest CPU method. In terms of accuracy it places 13th (out of 22) among CPU methods that take less than 10 seconds to process an image pair. Over the total of 90 submissions, DeGraF-Flow places 68th in terms of Fl-all and 20th in terms of run-time.
4. CONCLUSION
In this paper we present DeGraF-Flow, a novel optical flow estimation method. DeGraF-Flow uses a rapidly computed grid of Dense Gradient based Features (DeGraF) and then combines an existing state-of-the-art sparse point tracker (RLOF [2]) and interpolator (EPIC [3]) to recover dense flow. With only minimal impact on accuracy (within 2% of EpicFlow [3] and DeepFlow [18] across all metrics), our approach offers significant gains in computational performance for dense optic flow estimation.

We show that the invariance, density and uniformity of DeGraF points yield superior dense flow results compared to other popular point detectors. The rapid end-to-end optic flow estimation time is also conducive to real-time applications such as scene understanding for future autonomous vehicle applications.

On the KITTI 2012 and 2015 benchmarks [10, 13] our method shows competitive run-time and comparable accuracy with other CPU methods. Future work will exploit the tracking of DeGraF features for applications in an autonomous vehicle setting.


References (selected, as listed on the publisher's page)

D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 2004. Presents SIFT: distinctive features invariant to scale and rotation that support reliable matching across views, clutter and occlusion.

"The Scale-Invariant Feature Transform (SIFT)", 2011. An overview of the SIFT algorithm, first proposed by Lowe, for extracting and matching distinctive invariant features.

C. Harris and M. Stephens, "A Combined Corner and Edge Detector", Proceedings of the Alvey Vision Conference, 1988. Cited in this paper as one of the common fast corner detectors: "Common feature choices are Harris [27] or FAST key-points [28] due to their relative speed."

B. D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision", Proceedings of IJCAI, 1981. Introduces the gradient-based registration underlying Lucas-Kanade sparse point tracking.

H. Bay et al., "Speeded-Up Robust Features (SURF)", 2008. A scale- and rotation-invariant detector and descriptor that matches or outperforms earlier schemes in repeatability, distinctiveness and robustness while being faster to compute and compare.
