scispace - formally typeset
Search or ask a question
Author

Martin R. Oswald

Bio: Martin R. Oswald is an academic researcher from ETH Zurich. The author has contributed to research in topics: Computer science & 3D reconstruction. The author has an hindex of 16, co-authored 78 publications receiving 907 citations. Previous affiliations of Martin R. Oswald include Technische Universität München & University of Bonn.


Papers
More filters
Posted Content
TL;DR: This work proposes a novel method for instance label segmentation of dense 3D voxel grids that achieves state-of-the-art performance on the ScanNet 3D instance segmentation benchmark.
Abstract: We propose a novel method for instance label segmentation of dense 3D voxel grids. We target volumetric scene representations, which have been acquired with depth sensors or multi-view stereo methods and which have been processed with semantic 3D reconstruction or scene completion methods. The main task is to learn shape information about individual object instances in order to accurately separate them, including connected and incompletely scanned objects. We solve the 3D instance-labeling problem with a multi-task learning strategy. The first goal is to learn an abstract feature embedding, which groups voxels with the same instance label close to each other while separating clusters with different instance labels from each other. The second goal is to learn instance information by densely estimating directional information of the instance's center of mass for each voxel. This is particularly useful to find instance boundaries in the clustering post-processing step, as well as, for scoring the segmentation quality for the first goal. Both synthetic and real-world experiments demonstrate the viability and merits of our approach. In fact, it achieves state-of-the-art performance on the ScanNet 3D instance segmentation benchmark.

116 citations

Proceedings ArticleDOI
20 Jun 2019
TL;DR: In this paper, a multi-task learning strategy is proposed to learn an abstract feature embedding, which groups voxels with the same instance label close to each other while separating clusters with different instance labels from each other.
Abstract: We propose a novel method for instance label segmentation of dense 3D voxel grids. We target volumetric scene representations, which have been acquired with depth sensors or multi-view stereo methods and which have been processed with semantic 3D reconstruction or scene completion methods. The main task is to learn shape information about individual object instances in order to accurately separate them, including connected and incompletely scanned objects. We solve the 3D instance-labeling problem with a multi-task learning strategy. The first goal is to learn an abstract feature embedding, which groups voxels with the same instance label close to each other while separating clusters with different instance labels from each other. The second goal is to learn instance information by densely estimating directional information of the instance's center of mass for each voxel. This is particularly useful to find instance boundaries in the clustering post-processing step, as well as, for scoring the segmentation quality for the first goal. Both synthetic and real-world experiments demonstrate the viability and merits of our approach. In fact, it achieves state-of-the-art performance on the ScanNet 3D instance segmentation benchmark.

106 citations

Book ChapterDOI
08 Oct 2016
TL;DR: A novel prior for variational 3D reconstruction that favors symmetric solutions when dealing with noisy or incomplete data and is able to denoise and complete surface geometry and even hallucinate large scene parts is proposed.
Abstract: We propose a novel prior for variational 3D reconstruction that favors symmetric solutions when dealing with noisy or incomplete data. We detect symmetries from incomplete data while explicitly handling unexplored areas to allow for plausible scene completions. The set of detected symmetries is then enforced on their respective support domain within a variational reconstruction framework. This formulation also handles multiple symmetries sharing the same support. The proposed approach is able to denoise and complete surface geometry and even hallucinate large scene parts. We demonstrate in several experiments the benefit of harnessing symmetries when regularizing a surface.

61 citations

Proceedings ArticleDOI
01 Oct 2016
TL;DR: This work proposes a way to reduce the memory consumption of existing methods by determining early on in the reconstruction process which labels need to be active in which block, and shows results of joint semantic 3D reconstruction and semantic segmentation with significantly more labels than previous approaches were able to handle.
Abstract: Techniques that jointly perform dense 3D reconstruction and semantic segmentation have recently shown very promising results. One major restriction so far is that they can often only handle a very low number of semantic labels. This is mostly due to their high memory consumption caused by the necessity to store indicator variables for every label and transition. We propose a way to reduce the memory consumption of existing methods. Our approach is based on the observation that many semantic labels are only present at very localized positions in the scene, such as cars. Therefore this label does not need to be active at every location. We exploit this observation by dividing the scene into blocks in which generally only a subset of labels is active. By determining early on in the reconstruction process which labels need to be active in which block the memory consumption can be significantly reduced. In order to recover from mistakes we propose to update the set of active labels during the iterative optimization procedure based on the current solution. We also propose a way to initialize the set of active labels using a boosted classifier. In our experimental evaluation we show the reduction of memory usage quantitatively. Eventually, we show results of joint semantic 3D reconstruction and semantic segmentation with significantly more labels than previous approaches were able to handle.

60 citations

Book ChapterDOI
23 Aug 2020
TL;DR: Local Invariance Selection at Runtime for Descriptors (LISRD) as mentioned in this paper proposes to combine local and meta descriptors to select the right invariance when matching the local descriptors.
Abstract: To be invariant, or not to be invariant: that is the question formulated in this work about local descriptors. A limitation of current feature descriptors is the trade-off between generalization and discriminative power: more invariance means less informative descriptors. We propose to overcome this limitation with a disentanglement of invariance in local descriptors and with an online selection of the most appropriate invariance given the context. Our framework (https://github.com/rpautrat/LISRD) consists in a joint learning of multiple local descriptors with different levels of invariance and of meta descriptors encoding the regional variations of an image. The similarity of these meta descriptors across images is used to select the right invariance when matching the local descriptors. Our approach, named Local Invariance Selection at Runtime for Descriptors (LISRD), enables descriptors to adapt to adverse changes in images, while remaining discriminative when invariance is not required. We demonstrate that our method can boost the performance of current descriptors and outperforms state-of-the-art descriptors in several matching tasks, when evaluated on challenging datasets with day-night illumination as well as viewpoint changes.

55 citations


Cited by
More filters
Proceedings ArticleDOI
21 Jul 2017
TL;DR: This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.
Abstract: A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available – current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval.

2,305 citations

Book ChapterDOI
08 Oct 2016
TL;DR: The core contributions are the joint estimation of depth andnormal information, pixelwise view selection using photometric and geometric priors, and a multi-view geometric consistency term for the simultaneous refinement and image-based depth and normal fusion.
Abstract: This work presents a Multi-View Stereo system for robust and efficient dense modeling from unstructured image collections. Our core contributions are the joint estimation of depth and normal information, pixelwise view selection using photometric and geometric priors, and a multi-view geometric consistency term for the simultaneous refinement and image-based depth and normal fusion. Experiments on benchmarks and large-scale Internet photo collections demonstrate state-of-the-art performance in terms of accuracy, completeness, and efficiency.

1,372 citations

Journal ArticleDOI
TL;DR: This paper presents a comprehensive review of recent progress in deep learning methods for point clouds, covering three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation.
Abstract: Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks Recently, deep learning on point clouds has become even thriving, with numerous methods being proposed to address different problems in this area To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions

1,021 citations

Posted Content
TL;DR: The ScanNet dataset as discussed by the authors contains 2.5M RGB-D views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations.
Abstract: A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available -- current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval. The dataset is freely available at this http URL.

978 citations