scispace - formally typeset
Search or ask a question
Author

Moritz Menze

Bio: Moritz Menze is an academic researcher from Leibniz University of Hanover. The author has contributed to research in topics: Optical flow & Motion estimation. The author has an hindex of 6, co-authored 9 publications receiving 1862 citations.

Papers
More filters
Proceedings ArticleDOI
07 Jun 2015
TL;DR: A novel model and dataset for 3D scene flow estimation with an application to autonomous driving by representing each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object.
Abstract: This paper proposes a novel model and dataset for 3D scene flow estimation with an application to autonomous driving. Taking advantage of the fact that outdoor scenes often decompose into a small number of independently moving objects, we represent each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object. This minimal representation increases robustness and leads to a discrete-continuous CRF where the data term decomposes into pairwise potentials between superpixels and objects. Moreover, our model intrinsically segments the scene into its constituting dynamic components. We demonstrate the performance of our model on existing benchmarks as well as a novel realistic dataset with scene flow ground truth. We obtain this dataset by annotating 400 dynamic scenes from the KITTI raw data collection using detailed 3D CAD models for all vehicles in motion. Our experiments also reveal novel challenges which cannot be handled by existing methods.

1,918 citations

Journal ArticleDOI
TL;DR: A novel unified approach which reasons jointly about 3D scene flow as well as the pose, shape and motion of vehicles in the scene is proposed and the results provide a prove of concept and demonstrate the usefulness of the method.
Abstract: . driving. While much progress has been made in recent years, imaging conditions in natural outdoor environments are still very challenging for current reconstruction and recognition methods. In this paper, we propose a novel unified approach which reasons jointly about 3D scene flow as well as the pose, shape and motion of vehicles in the scene. Towards this goal, we incorporate a deformable CAD model into a slanted-plane conditional random field for scene flow estimation and enforce shape consistency between the rendered 3D models and the parameters of all superpixels in the image. The association of superpixels to objects is established by an index variable which implicitly enables model selection. We evaluate our approach on the challenging KITTI scene flow dataset in terms of object and scene flow estimation. Our results provide a prove of concept and demonstrate the usefulness of our method.

315 citations

Journal ArticleDOI
TL;DR: A unified random field model which reasons jointly about 3D scene flow as well as the location, shape and motion of vehicles in the observed scene is proposed, which is the first to provide stereo and optical flow ground truth for dynamic real-world urban scenes at large scale.
Abstract: This work investigates the estimation of dense three-dimensional motion fields, commonly referred to as scene flow. While great progress has been made in recent years, large displacements and adverse imaging conditions as observed in natural outdoor environments are still very challenging for current approaches to reconstruction and motion estimation. In this paper, we propose a unified random field model which reasons jointly about 3D scene flow as well as the location, shape and motion of vehicles in the observed scene. We formulate the problem as the task of decomposing the scene into a small number of rigidly moving objects sharing the same motion parameters. Thus, our formulation effectively introduces long-range spatial dependencies which commonly employed local rigidity priors are lacking. Our inference algorithm then estimates the association of image segments and object hypotheses together with their three-dimensional shape and motion. We demonstrate the potential of the proposed approach by introducing a novel challenging scene flow benchmark which allows for a thorough comparison of the proposed scene flow approach with respect to various baseline models. In contrast to previous benchmarks, our evaluation is the first to provide stereo and optical flow ground truth for dynamic real-world urban scenes at large scale. Our experiments reveal that rigid motion segmentation can be utilized as an effective regularizer for the scene flow problem, improving upon existing two-frame scene flow methods. At the same time, our method yields plausible object segmentations without requiring an explicitly trained recognition model for a specific object class.

198 citations

Book ChapterDOI
07 Oct 2015
TL;DR: Three different strategies are investigated, each able to reduce computation and memory demands by several orders of magnitude, and their combination allows us to estimate large-displacement optical flow both accurately and efficiently and demonstrates the potential of discrete optimization for optical flow.
Abstract: We propose to look at large-displacement optical flow from a discrete point of view. Motivated by the observation that sub-pixel accuracy is easily obtained given pixel-accurate optical flow, we conjecture that computing the integral part is the hardest piece of the problem. Consequently, we formulate optical flow estimation as a discrete inference problem in a conditional random field, followed by sub-pixel refinement. Naive discretization of the 2D flow space, however, is intractable due to the resulting size of the label set. In this paper, we therefore investigate three different strategies, each able to reduce computation and memory demands by several orders of magnitude. Their combination allows us to estimate large-displacement optical flow both accurately and efficiently and demonstrates the potential of discrete optimization for optical flow. We obtain state-of-the-art performance on MPI Sintel and KITTI.

128 citations

Proceedings Article
01 Oct 2012
TL;DR: Robust multi-camera multi-person tracking is combined with a flexible analysis module, which uses online learning classification algorithms as well as user-generated filters to process the persons' trajectories in the surveillance space.
Abstract: The CamInSens system is a next-generation self-organizing video surveillance system that combines research being done in the fields of person-tracking, trajectory analysis, visual analytics, and self-organizing system management algorithms. Its purpose is the online threat detection by analysing anomalies in persons' trajectories. Therefore, robust multi-camera multi-person tracking is combined with a flexible analysis module, which uses online learning classification algorithms as well as user-generated filters to process the persons' trajectories in the surveillance space.

8 citations


Cited by
More filters
Proceedings ArticleDOI
01 Jun 2018
TL;DR: PWC-Net as discussed by the authors uses the current optical flow estimate to warp the CNN features of the second image, which is processed by a CNN to estimate the optical flow, and achieves state-of-the-art performance on the MPI Sintel final pass and KITTI 2015 benchmarks.
Abstract: We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct a cost volume, which is processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in size and easier to train than the recent FlowNet2 model. Moreover, it outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024 A— 436) images. Our models are available on our project website.

2,231 citations

Proceedings ArticleDOI
TL;DR: In this article, a large-scale synthetic stereo video dataset is proposed to enable training and evaluation of optical flow estimation with a convolutional network and disparity estimation with CNNs.
Abstract: Recent work has shown that optical flow estimation can be formulated as a supervised learning task and can be successfully solved with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. Our datasets are the first large-scale datasets to enable training and evaluating scene flow methods. Besides the datasets, we present a convolutional network for real-time disparity estimation that provides state-of-the-art results. By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network.

1,759 citations

Proceedings ArticleDOI
01 Oct 2017
TL;DR: A novel deep learning architecture for regressing disparity from a rectified pair of stereo images is proposed, leveraging knowledge of the problem’s geometry to form a cost volume using deep feature representations and incorporating contextual information using 3-D convolutions over this volume.
Abstract: We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem’s geometry to form a cost volume using deep feature representations. We learn to incorporate contextual information using 3-D convolutions over this volume. Disparity values are regressed from the cost volume using a proposed differentiable soft argmin operation, which allows us to train our method end-to-end to sub-pixel accuracy without any additional post-processing or regularization. We evaluate our method on the Scene Flow and KITTI datasets and on KITTI we set a new stateof-the-art benchmark, while being significantly faster than competing approaches.

1,204 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper proposes three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks and presents a convolutional network for real-time disparity estimation that provides state-of-the-art results.
Abstract: Recent work has shown that optical flow estimation can be formulated as a supervised learning task and can be successfully solved with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. Our datasets are the first large-scale datasets to enable training and evaluation of scene flow methods. Besides the datasets, we present a convolutional network for real-time disparity estimation that provides state-of-the-art results. By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network.

1,184 citations

Book ChapterDOI
23 Aug 2020
TL;DR: RAFT as mentioned in this paper extracts per-pixel features, builds multi-scale 4D correlation volumes for all pairs of pixels, and iteratively updates a flow field through a recurrent unit that performs lookups on the correlation volumes.
Abstract: We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network architecture for optical flow. RAFT extracts per-pixel features, builds multi-scale 4D correlation volumes for all pairs of pixels, and iteratively updates a flow field through a recurrent unit that performs lookups on the correlation volumes. RAFT achieves state-of-the-art performance. On KITTI, RAFT achieves an F1-all error of 5.10%, a 16% error reduction from the best published result (6.10%). On Sintel (final pass), RAFT obtains an end-point-error of 2.855 pixels, a 30% error reduction from the best published result (4.098 pixels). In addition, RAFT has strong cross-dataset generalization as well as high efficiency in inference time, training speed, and parameter count. Code is available at https://github.com/princeton-vl/RAFT.

1,006 citations