Exploiting Semantic Information and Deep Matching for Optical Flow

doi:10.1007/978-3-319-46466-4_10

Home
/
Papers
/
Exploiting Semantic Information and Deep Matching for Optical Flow

Book Chapter•DOI•

Exploiting Semantic Information and Deep Matching for Optical Flow

Min Bai¹, Wenjie Luo¹, Kaustav Kundu¹, Raquel Urtasun¹•Institutions (1)

University of Toronto¹

08 Oct 2016-pp 154-170

TL;DR: This work tackles the problem of estimating optical flow from a monocular camera in the context of autonomous driving by proposing to estimate the traffic participants using instance-level segmentation, and introduces a new convolutional net that learns to perform flow matching, and is able to estimates the uncertainty of its matches.

read less

Abstract: We tackle the problem of estimating optical flow from a monocular camera in the context of autonomous driving. We build on the observation that the scene is typically composed of a static background, as well as a relatively small number of traffic participants which move rigidly in 3D. We propose to estimate the traffic participants using instance-level segmentation. For each traffic participant, we use the epipolar constraints that govern each independent motion for faster and more accurate estimation. Our second contribution is a new convolutional net that learns to perform flow matching, and is able to estimate the uncertainty of its matches. This is a core element of our flow estimation pipeline. We demonstrate the effectiveness of our approach in the challenging KITTI 2015 flow benchmark, and show that our approach outperforms published approaches by a large margin.

...read moreread less

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

[...]

Deqing Sun¹, Xiaodong Yang¹, Ming-Yu Liu¹, Jan Kautz¹•Institutions (1)

Nvidia¹

01 Jun 2018

TL;DR: PWC-Net as discussed by the authors uses the current optical flow estimate to warp the CNN features of the second image, which is processed by a CNN to estimate the optical flow, and achieves state-of-the-art performance on the MPI Sintel final pass and KITTI 2015 benchmarks.

...read moreread less

Abstract: We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct a cost volume, which is processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in size and easier to train than the recent FlowNet2 model. Moreover, it outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024 A— 436) images. Our models are available on our project website.

...read moreread less

2,231 citations

Book•

Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art

[...]

Joel Janai¹, Fatma Güney², Aseem Behl¹, Andreas Geiger¹•Institutions (2)

Max Planck Society¹, Koç University²

03 Jul 2020

TL;DR: This survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving.

...read moreread less

Abstract: Recent years have witnessed enormous progress in AI-related fields such as computer vision, machine learning, and autonomous vehicles. As with any rapidly growing field, it becomes increasingly difficult to stay up-to-date or enter the field as a beginner. While several survey papers on particular sub-problems have appeared, no comprehensive survey on problems, datasets, and methods in computer vision for autonomous vehicles has been published. This monograph attempts to narrow this gap by providing a survey on the state-of-the-art datasets and techniques. Our survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving. Towards this goal, we analyze the performance of the state of the art on several challenging benchmarking datasets, including KITTI, MOT, and Cityscapes. Besides, we discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we also provide a website that allows navigating topics as well as methods and provides additional information.

...read moreread less

579 citations

Cites background or methods from "Exploiting Semantic Information and..."

...As mentioned above, Bai et al. (2016) extract traffic participants using instance-level segmentation and estimate the optical flow independently for different instances....
[...]
...When the goal is to represent real-world scenes with independent object motion as in scene flow or optical flow, superpixels can be generalized to rigidly moving segments (Vogel et al. (2015); Menze & Geiger (2015)), or semantic segments (Bai et al. (2016); Sevilla-Lara et al. (2016))....
[...]
...(2016)) or use semantic segmentation to split the scene into independently moving objects Bai et al. (2016); Sevilla-Lara et al....
[...]
...(Bai et al. (2016); Sevilla-Lara et al. (2016)) and learned highcapacity models (Dosovitskiy et al. (2015); Ranjan & Black (2016); Ilg et al. (2016)) have already proven to improve optical flow estimation by resolving ambiguities in the data....
[...]
...(Bai et al. (2016); Sevilla-Lara et al. (2016)) and learned highcapacity models (Dosovitskiy et al. (2015); Ranjan & Black (2016); Ilg et al....
[...]

Proceedings Article•DOI•

Deep Feature Flow for Video Recognition

[...]

Xizhou Zhu¹, Yuwen Xiong², Jifeng Dai², Lu Yuan², Yichen Wei² - Show less +1 more•Institutions (2)

University of Science and Technology of China¹, Microsoft²

01 Jul 2017

TL;DR: Deep Feature Flow as mentioned in this paper runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field, which achieves significant speedup as flow computation is relatively fast.

...read moreread less

Abstract: Deep convolutional neutral networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup as flow computation is relatively fast. The end-to-end training of the whole architecture significantly boosts the recognition accuracy. Deep feature flow is flexible and general. It is validated on two recent large scale video datasets. It makes a large step towards practical video recognition. Code would be released.

...read moreread less

422 citations

Posted Content•

Deep Feature Flow for Video Recognition

[...]

Xizhou Zhu¹, Yuwen Xiong², Jifeng Dai², Lu Yuan², Yichen Wei² - Show less +1 more•Institutions (2)

University of Science and Technology of China¹, Microsoft²

23 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: Deep feature flow is presented, a fast and accurate framework for video recognition that runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field and achieves significant speedup as flow computation is relatively fast.

...read moreread less

356 citations

Cites background from "Exploiting Semantic Information and..."

...Other works attempt to exploit semantic segmentation information to help optical flow estimation [37, 1, 19], e....
[...]

Book Chapter•DOI•

SegStereo: Exploiting Semantic Information for Disparity Estimation

[...]

Guorun Yang¹, Hengshuang Zhao², Jianping Shi³, Zhidong Deng¹, Jiaya Jia² - Show less +1 more•Institutions (3)

Tsinghua University¹, The Chinese University of Hong Kong², SenseTime³

08 Sep 2018

TL;DR: This paper suggests that appropriate incorporation of semantic cues can greatly rectify prediction in commonly-used disparity estimation frameworks and proposes a unified model SegStereo, which employs semantic features from segmentation and introduces semantic softmax loss, which helps improve the prediction accuracy of disparity maps.

...read moreread less

Abstract: Disparity estimation for binocular stereo images finds a wide range of applications. Traditional algorithms may fail on featureless regions, which could be handled by high-level clues such as semantic segments. In this paper, we suggest that appropriate incorporation of semantic cues can greatly rectify prediction in commonly-used disparity estimation frameworks. Our method conducts semantic feature embedding and regularizes semantic cues as the loss term to improve learning disparity. Our unified model SegStereo employs semantic features from segmentation and introduces semantic softmax loss, which helps improve the prediction accuracy of disparity maps. The semantic cues work well in both unsupervised and supervised manners. SegStereo achieves state-of-the-art results on KITTI Stereo benchmark and produces decent prediction on both CityScapes and FlyingThings3D datasets.

...read moreread less

329 citations

Cites background from "Exploiting Semantic Information and..."

...[2] tackled instance-level segmentation and epipolar constraints to reduce the uncertainty of optical flow estimation....
[...]

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

Collapse

References

PDF

Open Access

More filters

Proceedings Article•

Adam: A Method for Stochastic Optimization

[...]

Diederik P. Kingma¹, Jimmy Ba²•Institutions (2)

University of Amsterdam¹, University of Toronto²

01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

...read moreread less

Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

...read moreread less

111,197 citations

Journal Article•DOI•

Distinctive Image Features from Scale-Invariant Keypoints

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

46,906 citations

Proceedings Article•

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

[...]

Sergey Ioffe¹, Christian Szegedy¹•Institutions (1)

Google¹

06 Jul 2015

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

...read moreread less

Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

...read moreread less

30,843 citations

Posted Content•

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

[...]

Sergey Ioffe¹, Christian Szegedy¹•Institutions (1)

Google¹

11 Feb 2015-arXiv: Learning

TL;DR: Batch Normalization as mentioned in this paper normalizes layer inputs for each training mini-batch to reduce the internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.

...read moreread less

Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.

...read moreread less

17,184 citations

Proceedings Article•DOI•

Fast R-CNN

[...]

Ross Girshick¹•Institutions (1)

Microsoft¹

07 Dec 2015

TL;DR: Fast R-CNN as discussed by the authors proposes a Fast Region-based Convolutional Network method for object detection, which employs several innovations to improve training and testing speed while also increasing detection accuracy and achieves a higher mAP on PASCAL VOC 2012.

...read moreread less

Abstract: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

...read moreread less

14,824 citations