Top 32 papers published by Thomas Brox from University of Freiburg in 2019

Journal Article•DOI•

U-Net: deep learning for cell counting, detection, and morphometry

Thorsten Falk¹, Dominic Mai, Robert Bensch¹, Özgün Çiçek¹, Ahmed Abdulkadir¹,², Yassine Marrakchi¹, Anton Böhm¹, Jan Deubner¹, Zoe Jäckel¹, Katharina Seiwald¹, Alexander Dovzhenko¹, Olaf Tietz¹, Cristina Dal Bosco¹, Sean Walsh¹, Deniz Saltukoglu, Tuan Leng Tay¹, Marco Prinz¹, Klaus Palme¹, Matias Simons, Ilka Diester¹, Thomas Brox, Olaf Ronneberger¹•Institutions (2)

University of Freiburg¹, University of Bern²

01 Jan 2019-Nature Methods

TL;DR: An ImageJ plugin is presented that enables non-machine-learning experts to analyze their data with U-Net on either a local computer or a remote server/cloud service.

Abstract: U-Net is a generic deep-learning solution for frequently occurring quantification tasks such as cell detection and shape measurements in biomedical image data. We present an ImageJ plugin that enables non-machine-learning experts to analyze their data with U-Net on either a local computer or a remote server/cloud service. The plugin comes with pretrained models for single-cell segmentation and allows for U-Net to be adapted to new tasks on the basis of a few annotated samples.

1,222 citations

Proceedings Article•DOI•

FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape From Single RGB Images

Christian Zimmermann¹, Duygu Ceylan², Jimei Yang², Bryan Russell², Max Argus¹, Thomas Brox¹•Institutions (2)

University of Freiburg¹, Adobe Systems²

01 Oct 2019

TL;DR: This paper introduces the first large-scale, multi-view hand dataset that is accompanied by both 3D hand pose and shape annotations and proposes an iterative, semi-automated `human-in-the-loop' approach, which includes hand fitting optimization to infer both the 3D pose and shape for each sample.

Abstract: Estimating 3D hand pose from single RGB images is a highly ambiguous problem that relies on an unbiased training dataset. In this paper, we analyze cross-dataset generalization when training on existing datasets. We find that approaches perform well on the datasets they are trained on, but do not generalize to other datasets or in-the-wild scenarios. As a consequence, we introduce the first large-scale, multi-view hand dataset that is accompanied by both 3D hand pose and shape annotations. For annotating this real-world dataset, we propose an iterative, semi-automated `human-in-the-loop' approach, which includes hand fitting optimization to infer both the 3D pose and shape for each sample. We show that methods trained on our dataset consistently perform well when tested on other datasets. Moreover, the dataset allows us to train a network that predicts the full articulated hand shape from a single RGB image. The evaluation set can serve as a benchmark for articulated hand shape estimation.

293 citations

Proceedings Article•DOI•

What Do Single-View 3D Reconstruction Networks Learn?

Maxim Tatarchenko, Stephan R. Richter¹, Rene Ranftl¹, Zhuwen Li, Vladlen Koltun¹, Thomas Brox²•Institutions (2)

Intel¹, University of Freiburg²

15 Jun 2019

TL;DR: This work sets up two alternative approaches that perform image classification and retrieval respectively and shows that encoder-decoder methods are statistically indistinguishable from these baselines, indicating that the current state of the art in single-view object reconstruction does not actually perform reconstruction but image classification.

Abstract: Convolutional networks for single-view object reconstruction have shown impressive performance and have become a popular subject of research. All existing techniques are united by the idea of having an encoder-decoder network that performs non-trivial reasoning about the 3D structure of the output space. In this work, we set up two alternative approaches that perform image classification and retrieval respectively. These simple baselines yield better results than state-of-the-art methods, both qualitatively and quantitatively. We show that encoder-decoder methods are statistically indistinguishable from these baselines, thus indicating that the current state of the art in single-view object reconstruction does not actually perform reconstruction but image classification. We identify aspects of popular experimental procedures that elicit this behavior and discuss ways to improve the current state of research.

258 citations

Proceedings Article•DOI•

Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

Osama Makansi¹, Eddy Ilg¹, Özgün Çiçek¹, Thomas Brox¹•Institutions (1)

University of Freiburg¹

15 Jun 2019

TL;DR: This paper proposes predicting several samples of the future with a winner-takes-all loss and iteratively grouping the samples into multiple modes, and discusses how to evaluate predicted multimodal distributions in the common real scenario where only a single ground-truth sample is available.

Abstract: Future prediction is a fundamental principle of intelligence that helps plan actions and avoid possible dangers. As the future is uncertain to a large extent, modeling the uncertainty and multimodality of the future states is of great relevance. Existing approaches are rather limited in this regard and mostly yield a single hypothesis of the future or, at the best, strongly constrained mixture components that suffer from instabilities in training and mode collapse. In this work, we present an approach that involves the prediction of several samples of the future with a winner-takes-all loss and iterative grouping of samples to multiple modes. Moreover, we discuss how to evaluate predicted multimodal distributions, including the common real scenario, where only a single sample from the ground-truth distribution is available for evaluation. We show on synthetic and real data that the proposed approach triggers good estimates of multimodal distributions and avoids mode collapse.
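
The winner-takes-all idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain Euclidean distance as the per-hypothesis error and a hypothetical helper `wta_loss`; the paper's actual loss, its relaxations, and the subsequent grouping of samples into modes are not reproduced here.

```python
import math

def wta_loss(hypotheses, target):
    """Winner-takes-all: only the hypothesis closest to the target is penalized.

    `hypotheses` is a list of predicted points, `target` the single observed
    future state. Returns the winning distance and the winner's index.
    """
    dists = [math.dist(h, target) for h in hypotheses]
    winner = min(range(len(dists)), key=dists.__getitem__)
    return dists[winner], winner

# Three hypothetical predictions of a 2D future position.
hypotheses = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
loss, winner = wta_loss(hypotheses, target=(3.0, 0.0))
```

Because only the winner receives a training signal, the other hypotheses remain free to cover different modes of the future.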

120 citations

Posted Content•

What Do Single-view 3D Reconstruction Networks Learn?

Maxim Tatarchenko, Stephan R. Richter¹, Rene Ranftl¹, Zhuwen Li, Vladlen Koltun¹, Thomas Brox²•Institutions (2)

Intel¹, University of Freiburg²

09 May 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper showed that the current state-of-the-art in single-view object reconstruction does not actually perform reconstruction but image classification, and proposed two alternative approaches that perform image classification and retrieval respectively.

Abstract: Convolutional networks for single-view object reconstruction have shown impressive performance and have become a popular subject of research. All existing techniques are united by the idea of having an encoder-decoder network that performs non-trivial reasoning about the 3D structure of the output space. In this work, we set up two alternative approaches that perform image classification and retrieval respectively. These simple baselines yield better results than state-of-the-art methods, both qualitatively and quantitatively. We show that encoder-decoder methods are statistically indistinguishable from these baselines, thus indicating that the current state of the art in single-view object reconstruction does not actually perform reconstruction but image classification. We identify aspects of popular experimental procedures that elicit this behavior and discuss ways to improve the current state of research.

113 citations

Journal Article•DOI•

Lucid Data Dreaming for Video Object Segmentation

Anna Khoreva¹, Rodrigo Benenson², Eddy Ilg³, Thomas Brox³, Bernt Schiele¹•Institutions (3)

Max Planck Society¹, Google², University of Freiburg³

15 Mar 2019-International Journal of Computer Vision

TL;DR: The results indicate that using a larger training set is not automatically better, and that for the video object segmentation task a smaller training set that is closer to the target domain is more effective.

Abstract: Convolutional networks reach top quality in pixel-level video object segmentation but require a large amount of training data (1k–100k) to deliver such results. We propose a new training strategy which achieves state-of-the-art results across three evaluation datasets while using 20×–1000× less annotated data than competing methods. Our approach is suitable for both single and multiple object segmentation. Instead of using large training sets hoping to generalize across domains, we generate in-domain training data using the provided annotation on the first frame of each video to synthesize ("lucid dream") plausible future video frames. (In a lucid dream the sleeper is aware that he or she is dreaming and is sometimes able to control the course of the dream.) In-domain per-video training data allows us to train high quality appearance- and motion-based models, as well as tune the post-processing stage. This approach makes it possible to reach competitive results even when training from only a single annotated frame, without ImageNet pre-training. Our results indicate that using a larger training set is not automatically better, and that for the video object segmentation task a smaller training set that is closer to the target domain is more effective. This changes the mindset regarding how many training samples and general "objectness" knowledge are required for the video object segmentation task.

109 citations

Proceedings Article•DOI•

Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

Osama Makansi¹, Eddy Ilg¹, Özgün Çiçek¹, Thomas Brox¹•Institutions (1)

University of Freiburg¹

09 Jun 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work presents an approach that involves the prediction of several samples of the future with a winner-takes-all loss and iterative grouping of samples to multiple modes and shows on synthetic and real data that the proposed approach triggers good estimates of multimodal distributions and avoids mode collapse.

Abstract: Future prediction is a fundamental principle of intelligence that helps plan actions and avoid possible dangers. As the future is uncertain to a large extent, modeling the uncertainty and multimodality of the future states is of great relevance. Existing approaches are rather limited in this regard and mostly yield a single hypothesis of the future or, at the best, strongly constrained mixture components that suffer from instabilities in training and mode collapse. In this work, we present an approach that involves the prediction of several samples of the future with a winner-takes-all loss and iterative grouping of samples to multiple modes. Moreover, we discuss how to evaluate predicted multimodal distributions, including the common real scenario, where only a single sample from the ground-truth distribution is available for evaluation. We show on synthetic and real data that the proposed approach triggers good estimates of multimodal distributions and avoids mode collapse. Source code is available online.

93 citations

Proceedings Article•

DeepUSPS: Deep Robust Unsupervised Saliency Prediction via Self-supervision

Tam Nguyen, Maximilian Dax¹, Chaithanya Kumar Mummadi¹, Nhung Ngo, Thi Hoai Phuong Nguyen, Zhongyu Lou¹, Thomas Brox²•Institutions (2)

Bosch¹, University of Freiburg²

01 Jan 2019

TL;DR: This work proposes a two-stage mechanism for robust unsupervised object saliency prediction, where the first stage involves refinement of the noisy pseudo labels generated from different handcrafted methods, and shows that this self-learning procedure outperforms all the existing unsupervised methods over different datasets.

Abstract: Deep neural network (DNN) based salient object detection in images based on high-quality labels is expensive. Alternative unsupervised approaches rely on careful selection of multiple handcrafted saliency methods to generate noisy pseudo-ground-truth labels. In this work, we propose a two-stage mechanism for robust unsupervised object saliency prediction, where the first stage involves refinement of the noisy pseudo labels generated from different handcrafted methods. Each handcrafted method is substituted by a deep network that learns to generate the pseudo labels. These labels are refined incrementally in multiple iterations via our proposed self-supervision technique. In the second stage, the refined labels produced from multiple networks representing multiple saliency methods are used to train the actual saliency detection network. We show that this self-learning procedure outperforms all the existing unsupervised methods over different datasets. Results are even comparable to those of fully-supervised state-of-the-art approaches.

85 citations

Proceedings Article•DOI•

CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth

José M. Fácil¹, Benjamin Ummenhofer², Huizhong Zhou³, Luis Montesano¹, Thomas Brox³, Javier Civera¹•Institutions (3)

University of Zaragoza¹, Intel², University of Freiburg³

01 Jun 2019

TL;DR: In this article, a new type of convolution is proposed to take the camera parameters into account, thus allowing neural networks to learn calibration-aware patterns, which improves the generalization capabilities of depth prediction networks considerably.

Abstract: Single-view depth estimation suffers from the problem that a network trained on images from one camera does not generalize to images taken with a different camera model. Thus, changing the camera model requires collecting an entirely new training dataset. In this work, we propose a new type of convolution that can take the camera parameters into account, thus allowing neural networks to learn calibration-aware patterns. Experiments confirm that this improves the generalization capabilities of depth prediction networks considerably, and clearly outperforms the state of the art when the train and test images are acquired with different cameras.
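
The abstract does not spell out how camera parameters enter the convolution. One plausible way to make a network calibration-aware is to feed it per-pixel maps derived from the intrinsics as extra input channels. The sketch below assumes a pinhole camera with focal lengths `fx`, `fy` and principal point `cx`, `cy`; the function name `camera_maps` and the choice of maps are illustrative assumptions, not the authors' confirmed design.

```python
import math

def camera_maps(width, height, fx, fy, cx, cy):
    """Build per-pixel maps that encode pinhole intrinsics.

    Returns centered pixel coordinates and the corresponding viewing
    angles; each map has shape height x width and could be concatenated
    to the feature channels before a convolution.
    """
    ccx = [[x - cx for x in range(width)] for _ in range(height)]
    ccy = [[y - cy for _ in range(width)] for y in range(height)]
    fov_x = [[math.atan(v / fx) for v in row] for row in ccx]
    fov_y = [[math.atan(v / fy) for v in row] for row in ccy]
    return ccx, ccy, fov_x, fov_y

# Tiny 4x2 image with a hypothetical calibration.
ccx, ccy, fov_x, fov_y = camera_maps(4, 2, fx=2.0, fy=2.0, cx=2.0, cy=1.0)
```

Two cameras with different focal lengths produce different angle maps for the same pixel grid, which gives the network a signal it can use to adapt its depth prediction.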

54 citations

Journal Article•DOI•

Author Correction: U-Net: deep learning for cell counting, detection, and morphometry

Thorsten Falk¹, Dominic Mai, Robert Bensch¹, Özgün Çiçek¹, Ahmed Abdulkadir¹,², Yassine Marrakchi¹, Anton Böhm¹, Jan Deubner¹, Zoe Jäckel¹, Katharina Seiwald¹, Alexander Dovzhenko¹, Olaf Tietz¹, Cristina Dal Bosco¹, Sean Walsh¹, Deniz Saltukoglu, Tuan Leng Tay¹, Marco Prinz¹, Klaus Palme¹, Matias Simons, Ilka Diester¹, Thomas Brox, Olaf Ronneberger¹•Institutions (2)

University of Freiburg¹, University of Bern²

01 Apr 2019-Nature Methods

TL;DR: Corrections have been made in the PDF and HTML versions of the article, as well as in any cover sheets for associated Supplementary Information.

Abstract: In the version of this paper originally published, one of the affiliations for Dominic Mai was incorrect: "Center for Biological Systems Analysis (ZBSA), Albert-Ludwigs-University, Freiburg, Germany" should have been "Life Imaging Center, Center for Biological Systems Analysis, Albert-Ludwigs-University, Freiburg, Germany." This change required some renumbering of subsequent author affiliations. These corrections have been made in the PDF and HTML versions of the article, as well as in any cover sheets for associated Supplementary Information.

53 citations

Proceedings Article•DOI•

Defending Against Universal Perturbations With Shared Adversarial Training

Chaithanya Kumar Mummadi¹, Thomas Brox², Jan Hendrik Metzen¹•Institutions (2)

Bosch¹, University of Freiburg²

01 Oct 2019

TL;DR: This work shows that adversarial training is more effective in preventing universal perturbations, where the same perturbation needs to fool a classifier on many inputs, and investigates the trade-off between robustness against universal perturbations and performance on unperturbed data.

Abstract: Classifiers such as deep neural networks have been shown to be vulnerable against adversarial perturbations on problems with high-dimensional input space. While adversarial training improves the robustness of image classifiers against such adversarial perturbations, it leaves them sensitive to perturbations on a non-negligible fraction of the inputs. In this work, we show that adversarial training is more effective in preventing universal perturbations, where the same perturbation needs to fool a classifier on many inputs. Moreover, we investigate the trade-off between robustness against universal perturbations and performance on unperturbed data and propose an extension of adversarial training that handles this trade-off more gracefully. We present results for image classification and semantic segmentation to showcase that universal perturbations that fool a model hardened with adversarial training become clearly perceptible and show patterns of the target scene.

Proceedings Article•DOI•

AutoDispNet: Improving Disparity Estimation With AutoML

Tonmoy Saikia¹, Yassine Marrakchi¹, Arber Zela¹, Frank Hutter¹, Thomas Brox¹•Institutions (1)

University of Freiburg¹

01 Oct 2019

TL;DR: In this article, the authors leverage gradient-based neural architecture search and Bayesian optimization for hyperparameter search to optimize large-scale U-Net-like encoder-decoder architectures.

Abstract: Much research work in computer vision is being spent on optimizing existing network architectures to obtain a few more percentage points on benchmarks. Recent AutoML approaches promise to relieve us from this effort. However, they are mainly designed for comparatively small-scale classification tasks. In this work, we show how to use and extend existing AutoML techniques to efficiently optimize large-scale U-Net-like encoder-decoder architectures. In particular, we leverage gradient-based neural architecture search and Bayesian optimization for hyperparameter search. The resulting optimization does not require a large-scale compute cluster. We show results on disparity estimation that clearly outperform the manually optimized baseline and reach state-of-the-art performance.

Posted Content•

CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth

José M. Fácil¹, Benjamin Ummenhofer², Huizhong Zhou³, Luis Montesano¹, Thomas Brox³, Javier Civera¹•Institutions (3)

University of Zaragoza¹, Intel², University of Freiburg³

03 Apr 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: A new type of convolution is proposed that can take the camera parameters into account, thus allowing neural networks to learn calibration-aware patterns, and improves the generalization capabilities of depth prediction networks considerably.

Abstract: Single-view depth estimation suffers from the problem that a network trained on images from one camera does not generalize to images taken with a different camera model. Thus, changing the camera model requires collecting an entirely new training dataset. In this work, we propose a new type of convolution that can take the camera parameters into account, thus allowing neural networks to learn calibration-aware patterns. Experiments confirm that this improves the generalization capabilities of depth prediction networks considerably, and clearly outperforms the state of the art when the train and test images are acquired with different cameras.

Posted Content•

Parting with Illusions about Deep Active Learning

Sudhanshu Mittal, Maxim Tatarchenko, Özgün Çiçek, Thomas Brox

11 Dec 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work re-implements several recent active learning approaches for image classification, evaluates them under more realistic settings, and proposes a more suitable evaluation protocol based on a realistic assessment of the current state of the field.

Abstract: Active learning aims to reduce the high labeling cost involved in training machine learning models on large datasets by efficiently labeling only the most informative samples. Recently, deep active learning has shown success on various tasks. However, the conventional evaluation scheme used for deep active learning is below par. Current methods disregard some apparent parallel work in closely related fields. Active learning methods are quite sensitive w.r.t. changes in the training procedure like data augmentation. They improve by a large margin when integrated with semi-supervised learning, but barely perform better than the random baseline. We re-implement various recent active learning approaches for image classification and evaluate them under more realistic settings. We further validate our findings for semantic segmentation. Based on our observations, we realistically assess the current state of the field and propose a more suitable evaluation protocol.
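
To make the comparison against the random baseline concrete, here is a minimal sketch of one common acquisition rule, entropy-based uncertainty sampling, next to random selection. The function names and the toy probabilities are hypothetical, and the specific methods evaluated in the paper are not reproduced here.

```python
import math
import random

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_uncertain(pool_probs, budget):
    """Pick the `budget` unlabeled samples with the highest predictive entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:budget]

def select_random(pool_size, budget, seed=0):
    """Random baseline: pick `budget` samples uniformly without replacement."""
    return random.Random(seed).sample(range(pool_size), budget)

# Hypothetical softmax outputs of a model on four unlabeled images.
pool_probs = [
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]
chosen = select_uncertain(pool_probs, budget=2)
baseline = select_random(len(pool_probs), budget=2)
```

A fair evaluation in the spirit of the abstract would train both variants with the same augmentation and semi-supervised setup before comparing their accuracy.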

Journal Article•DOI•

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

Peter Ochs¹, Jalal M. Fadili², Thomas Brox³•Institutions (3)

Saarland University¹, Centre national de la recherche scientifique², University of Freiburg³

01 Apr 2019-Journal of Optimization Theory and Applications

TL;DR: A flexible algorithm for non-smooth non-convex optimization is proposed for which (subsequential) convergence to a stationary point under weak assumptions on the growth of the model function error is proved.

Abstract: We propose a unifying algorithm for non-smooth non-convex optimization. The algorithm approximates the objective function by a convex model function and finds an approximate (Bregman) proximal point of the convex model. This approximate minimizer of the model function yields a descent direction, along which the next iterate is found. Complemented with an Armijo-like line search strategy, we obtain a flexible algorithm for which we prove (subsequential) convergence to a stationary point under weak assumptions on the growth of the model function error. Special instances of the algorithm with a Euclidean distance function are, for example, Gradient Descent, Forward-Backward Splitting, ProxDescent, without the common requirement of a "Lipschitz continuous gradient". In addition, we consider a broad class of Bregman distance functions (generated by Legendre functions), replacing the Euclidean distance. The algorithm has a wide range of applications including many linear and non-linear inverse problems in signal/image processing and machine learning.
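
The simplest special instance named in the abstract, Gradient Descent with a line search, can be sketched as follows. This is a minimal illustration that assumes a smooth objective, the Euclidean distance, and the textbook Armijo condition; the parameter names `tau`, `gamma`, and `delta` are assumptions, and the paper's general model-function and Bregman setting is not reproduced.

```python
def armijo_gradient_descent(f, grad, x, tau=1.0, gamma=0.5, delta=0.5, iters=200):
    """Gradient descent with Armijo backtracking (Euclidean special case).

    The minimizer of the linearized model plus a quadratic proximity term
    is x - tau * grad(x); the difference to x serves as descent direction.
    """
    for _ in range(iters):
        g = grad(x)
        d = [-tau * gi for gi in g]
        slope = sum(gi * di for gi, di in zip(g, d))  # directional derivative, <= 0
        if slope == 0.0:
            break  # stationary point reached
        fx, eta = f(x), 1.0
        # Shrink the step until the sufficient-decrease condition holds.
        while eta > 1e-12 and f([xi + eta * di for xi, di in zip(x, d)]) > fx + gamma * eta * slope:
            eta *= delta
        x = [xi + eta * di for xi, di in zip(x, d)]
    return x

# Example objective whose gradient is not globally Lipschitz continuous.
def f(x):
    return x[0] ** 4 + x[1] ** 2

def grad(x):
    return [4.0 * x[0] ** 3, 2.0 * x[1]]

x_start = [2.0, -3.0]
x_min = armijo_gradient_descent(f, grad, x_start)
```

The backtracking loop replaces a fixed step size derived from a Lipschitz constant, which is why the quartic term in the example poses no problem.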

Proceedings Article•

Anomaly Detection With Multiple-Hypotheses Predictions

Duc Tam Nguyen¹, Zhongyu Lou¹, Michael Klar¹, Thomas Brox²•Institutions (2)

Bosch¹, University of Freiburg²

24 May 2019

TL;DR: In this paper, a multi-hypotheses autoencoder is proposed to learn the data distribution of the foreground more efficiently; the model is criticized by a discriminator, which prevents artificial data modes not supported by data and enforces diversity across hypotheses.

Abstract: In one-class-learning tasks, only the normal case (foreground) can be modeled with data, whereas the variation of all possible anomalies is too erratic to be described by samples. Thus, due to the lack of representative data, the wide-spread discriminative approaches cannot cover such learning tasks, and rather generative models, which attempt to learn the input density of the foreground, are used. However, generative models suffer from a large input dimensionality (as in images) and are typically inefficient learners. We propose to learn the data distribution of the foreground more efficiently with a multi-hypotheses autoencoder. Moreover, the model is criticized by a discriminator, which prevents artificial data modes not supported by data, and enforces diversity across hypotheses. Our multiple-hypotheses-based anomaly detection framework allows the reliable identification of out-of-distribution samples. For anomaly detection on CIFAR-10, it yields an improvement of up to 3.9 percentage points over previously reported results. On a real anomaly detection task, the approach reduces the error of the baseline models from 6.8% to 1.5%.

Posted Content•

Understanding and Robustifying Differentiable Architecture Search

Arber Zela¹, Thomas Elsken¹, Tonmoy Saikia¹, Yassine Marrakchi¹, Thomas Brox¹, Frank Hutter²•Institutions (2)

University of Freiburg¹, Bosch²

20 Sep 2019-arXiv: Learning

TL;DR: This work shows that differentiable architecture search (DARTS) yields degenerate architectures on a wide range of search spaces, links this failure to high validation loss curvature in the architecture space, and proposes regularized variations of DARTS that perform substantially more robustly.

Abstract: Differentiable Architecture Search (DARTS) has attracted a lot of attention due to its simplicity and small search costs achieved by a continuous relaxation and an approximation of the resulting bi-level optimization problem. However, DARTS does not work robustly for new problems: we identify a wide range of search spaces for which DARTS yields degenerate architectures with very poor test performance. We study this failure mode and show that, while DARTS successfully minimizes validation loss, the found solutions generalize poorly when they coincide with high validation loss curvature in the architecture space. We show that by adding one of various types of regularization we can robustify DARTS to find solutions with less curvature and better generalization properties. Based on these observations, we propose several simple variations of DARTS that perform substantially more robustly in practice. Our observations are robust across five search spaces on three image classification tasks and also hold for the very different domains of disparity estimation (a dense regression task) and language modelling.
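
The continuous relaxation mentioned in the abstract can be illustrated with a toy mixed operation: each edge outputs a softmax-weighted sum of all candidate operations, and the final architecture keeps the operation with the largest architecture parameter. The scalar operations and the names `CANDIDATE_OPS`, `mixed_op`, and `discretize` are hypothetical stand-ins for real network layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy candidate operations on a scalar input (placeholders for conv, pooling, etc.).
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def mixed_op(x, alphas):
    """Relaxed edge: softmax(alphas)-weighted sum of all candidate operations."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))

def discretize(alphas):
    """After the search, keep only the operation with the largest alpha."""
    names = list(CANDIDATE_OPS)
    return names[max(range(len(alphas)), key=alphas.__getitem__)]

relaxed_output = mixed_op(3.0, [0.0, 0.0, 0.0])
chosen_op = discretize([0.1, 2.0, -1.0])
```

The gap between the relaxed mixture used during search and the single discretized operation used afterwards is one place where the generalization problems described in the abstract can surface.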

Posted Content•

CrossNorm: Normalization for Off-Policy TD Reinforcement Learning

Aditya Bhatt, Max Argus¹, Artemij Amiranashvili, Thomas Brox•Institutions (1)

University of Freiburg¹

14 Feb 2019-arXiv: Learning

TL;DR: It is shown that naive application of existing normalization techniques is indeed not effective, but that well-designed normalization improves optimization stability and removes the necessity of target networks, and introduces a normalization based on a mixture of on- and off-policy transitions, which is called cross-normalization.

Abstract: Off-policy temporal difference (TD) methods are a powerful class of reinforcement learning (RL) algorithms. Intriguingly, deep off-policy TD algorithms are not commonly used in combination with feature normalization techniques, despite positive effects of normalization in other domains. We show that naive application of existing normalization techniques is indeed not effective, but that well-designed normalization improves optimization stability and removes the necessity of target networks. In particular, we introduce a normalization based on a mixture of on- and off-policy transitions, which we call cross-normalization. It can be regarded as an extension of batch normalization that re-centers data for two different distributions, as present in off-policy learning. Applied to DDPG and TD3, cross-normalization improves over the state of the art across a range of MuJoCo benchmark tasks.
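
The re-centering idea can be sketched as follows, under stated assumptions: features from the off-policy batch and the on-policy batch are centered with a single mean mixed from both distributions. The mixing weight `alpha`, the function name `cross_normalize`, and the omission of variance scaling are simplifications for illustration, not the paper's exact formulation.

```python
def column_means(batch):
    """Per-feature mean of a batch given as a list of equal-length rows."""
    n = len(batch)
    return [sum(row[j] for row in batch) / n for j in range(len(batch[0]))]

def cross_normalize(off_policy_batch, on_policy_batch, alpha=0.5):
    """Re-center both batches with one mean mixed from the two distributions."""
    mean_off = column_means(off_policy_batch)
    mean_on = column_means(on_policy_batch)
    mixed = [alpha * on + (1.0 - alpha) * off for on, off in zip(mean_on, mean_off)]

    def center(batch):
        return [[v - m for v, m in zip(row, mixed)] for row in batch]

    return center(off_policy_batch), center(on_policy_batch), mixed

# Hypothetical feature batches from replay-buffer actions and current-policy actions.
off_batch = [[0.0, 0.0], [2.0, 2.0]]
on_batch = [[4.0, 4.0], [6.0, 6.0]]
off_centered, on_centered, mixed_mean = cross_normalize(off_batch, on_batch)
```

Using one shared mean keeps both batches in the same coordinate frame, whereas normalizing each batch separately would erase the offset between the two distributions.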

Posted Content•

AutoDispNet: Improving Disparity Estimation With AutoML

Tonmoy Saikia¹, Yassine Marrakchi¹, Arber Zela¹, Frank Hutter¹, Thomas Brox¹•Institutions (1)

University of Freiburg¹

17 May 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work shows how to use and extend existing AutoML techniques to efficiently optimize large-scale U-Net-like encoder-decoder architectures by leveraging gradient-based neural architecture search and Bayesian optimization for hyperparameter search.

Abstract: Much research work in computer vision is being spent on optimizing existing network architectures to obtain a few more percentage points on benchmarks. Recent AutoML approaches promise to relieve us from this effort. However, they are mainly designed for comparatively small-scale classification tasks. In this work, we show how to use and extend existing AutoML techniques to efficiently optimize large-scale U-Net-like encoder-decoder architectures. In particular, we leverage gradient-based neural architecture search and Bayesian optimization for hyperparameter search. The resulting optimization does not require a large-scale compute cluster. We show results on disparity estimation that clearly outperform the manually optimized baseline and reach state-of-the-art performance.

Posted Content•

SELF: Learning to Filter Noisy Labels with Self-Ensembling

Duc Tam Nguyen¹, Chaithanya Kumar Mummadi², Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Laura Beggel², Thomas Brox¹•Institutions (2)

University of Freiburg¹, Bosch²

04 Oct 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: Self-ensemble label filtering (SELF) improves task performance by gradually allowing supervision only from the potentially non-noisy (clean) labels and stopping learning on the filtered noisy labels.

Abstract: Deep neural networks (DNNs) have been shown to over-fit a dataset when being trained with noisy labels for a long enough time. To overcome this problem, we present a simple and effective method self-ensemble label filtering (SELF) to progressively filter out the wrong labels during training. Our method improves the task performance by gradually allowing supervision only from the potentially non-noisy (clean) labels and stops learning on the filtered noisy labels. For the filtering, we form running averages of predictions over the entire training dataset using the network output at different training epochs. We show that these ensemble estimates yield more accurate identification of inconsistent predictions throughout training than the single estimates of the network at the most recent training epoch. While filtered samples are removed entirely from the supervised training loss, we dynamically leverage them via semi-supervised learning in the unsupervised loss. We demonstrate the positive effect of such an approach on various image classification tasks under both symmetric and asymmetric label noise and at different noise ratios. It substantially outperforms all previous works on noise-aware learning across different datasets and can be applied to a broad set of network architectures.
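
The filtering step described above can be sketched with a running average of per-sample predictions across epochs. This is a minimal illustration: the exponential form of the average, the `momentum` value, the zero initialization, and the helper names are assumptions, and the semi-supervised use of the filtered samples is left out.

```python
def update_ensemble(ensemble, predictions, momentum=0.5):
    """Blend this epoch's predictions into the running per-sample average."""
    return [
        [momentum * e + (1.0 - momentum) * p for e, p in zip(e_row, p_row)]
        for e_row, p_row in zip(ensemble, predictions)
    ]

def keep_mask(ensemble, labels):
    """Keep a sample only if the ensemble's top class agrees with its given label."""
    return [
        max(range(len(row)), key=row.__getitem__) == label
        for row, label in zip(ensemble, labels)
    ]

# Two samples, two classes; the second label is assumed to be noisy.
labels = [0, 1]
epoch_predictions = [
    [[0.6, 0.4], [0.7, 0.3]],  # network outputs after epoch 1
    [[0.8, 0.2], [0.9, 0.1]],  # network outputs after epoch 2
]
ensemble = [[0.0, 0.0], [0.0, 0.0]]
for predictions in epoch_predictions:
    ensemble = update_ensemble(ensemble, predictions)
mask = keep_mask(ensemble, labels)
```

Samples whose mask entry is `False` would be dropped from the supervised loss, while the averaged predictions are less sensitive to a single epoch's fluctuations than the latest network output alone.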

Posted Content•

FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images

Christian Zimmermann¹, Duygu Ceylan², Jimei Yang², Bryan Russell², Max Argus¹, Thomas Brox¹•Institutions (2)

University of Freiburg¹, Adobe Systems²

10 Sep 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a large-scale multi-view hand dataset with both 3D hand pose and shape annotations is introduced, and an iterative, semi-automated human-in-the-loop approach is proposed.

Abstract: Estimating 3D hand pose from single RGB images is a highly ambiguous problem that relies on an unbiased training dataset. In this paper, we analyze cross-dataset generalization when training on existing datasets. We find that approaches perform well on the datasets they are trained on, but do not generalize to other datasets or in-the-wild scenarios. As a consequence, we introduce the first large-scale, multi-view hand dataset that is accompanied by both 3D hand pose and shape annotations. For annotating this real-world dataset, we propose an iterative, semi-automated 'human-in-the-loop' approach, which includes hand fitting optimization to infer both the 3D pose and shape for each sample. We show that methods trained on our dataset consistently perform well when tested on other datasets. Moreover, the dataset allows us to train a network that predicts the full articulated hand shape from a single RGB image. The evaluation set can serve as a benchmark for articulated hand shape estimation.

Proceedings Article•

Motion Perception in Reinforcement Learning with Dynamic Objects

Artemij Amiranashvili¹, Alexey Dosovitskiy², Vladlen Koltun², Thomas Brox¹•Institutions (2)

University of Freiburg¹, Intel²

10 Jan 2019

TL;DR: It is shown that for continuous control tasks learning an explicit representation of motion improves the quality of the learned controller in dynamic scenarios, and that using an image difference between the current and the previous frame as an additional input leads to better results than a temporal stack of frames.

Abstract: In dynamic environments, learned controllers are supposed to take motion into account when selecting the action to be taken. However, in existing reinforcement learning works motion is rarely treated explicitly; it is rather assumed that the controller learns the necessary motion representation from temporal stacks of frames implicitly. In this paper, we show that for continuous control tasks learning an explicit representation of motion improves the quality of the learned controller in dynamic scenarios. We demonstrate this on common benchmark tasks (Walker, Swimmer, Hopper), on target reaching and ball catching tasks with simulated robotic arms, and on a dynamic single ball juggling task. Moreover, we find that when equipped with an appropriate network architecture, the agent can, on some tasks, learn motion features also with pure reinforcement learning, without additional supervision. Further we find that using an image difference between the current and the previous frame as an additional input leads to better results than a temporal stack of frames.
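
The last finding, feeding an image difference as an additional input, can be illustrated with a short sketch. This is only an assumed input layout for illustration: the function name `with_frame_difference`, the channel-wise concatenation, and the toy frame size are not taken from the paper.

```python
import numpy as np

def with_frame_difference(current, previous):
    """Stack the current frame with (current - previous) along the channel axis.

    Frames are cast to float32 first so the subtraction cannot wrap around
    for unsigned 8-bit images.
    """
    cur = current.astype(np.float32)
    diff = cur - previous.astype(np.float32)
    return np.concatenate([cur, diff], axis=-1)

# Toy example: a single bright pixel appears between two 4x4 grayscale frames.
previous = np.zeros((4, 4, 1), dtype=np.uint8)
current = previous.copy()
current[1, 2, 0] = 255

obs = with_frame_difference(current, previous)  # shape (4, 4, 2)
```

The second channel is nonzero only where the scene changed, which gives the controller an explicit motion cue without stacking several past frames.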

Proceedings Article•DOI•

Self-supervised 3D Shape and Viewpoint Estimation from Single Images for Robotics

Oier Mees¹, Maxim Tatarchenko¹, Thomas Brox¹, Wolfram Burgard¹•Institutions (1)

University of Freiburg¹

01 Nov 2019

TL;DR: In this article, a convolutional neural network was proposed for joint 3D shape prediction and viewpoint estimation from a single input image, which does not require ground truth data for 3D shapes and the viewpoints.

Abstract: We present a convolutional neural network for joint 3D shape prediction and viewpoint estimation from a single input image. During training, our network gets the learning signal from a silhouette of an object in the input image, a form of self-supervision. It does not require ground truth data for 3D shapes and the viewpoints. Because it relies on such a weak form of supervision, our approach can easily be applied to real-world data. We demonstrate that our method produces reasonable qualitative and quantitative results on natural images for both shape estimation and viewpoint prediction. Unlike previous approaches, our method does not require multiple views of the same object instance in the dataset, which significantly expands the applicability in practical robotics scenarios. We showcase it by using the hallucinated shapes to improve the performance on the task of grasping real-world objects both in simulation and with a PR2 robot.

Posted Content•

Self-supervised 3D Shape and Viewpoint Estimation from Single Images for Robotics

Oier Mees¹, Maxim Tatarchenko¹, Thomas Brox¹, Wolfram Burgard¹•Institutions (1)

University of Freiburg¹

17 Oct 2019-arXiv: Robotics

TL;DR: A convolutional neural network for joint 3D shape prediction and viewpoint estimation from a single input image that gets the learning signal from a silhouette of an object in the input image, a form of self-supervision.

Abstract: We present a convolutional neural network for joint 3D shape prediction and viewpoint estimation from a single input image. During training, our network gets the learning signal from a silhouette of an object in the input image - a form of self-supervision. It does not require ground truth data for 3D shapes and the viewpoints. Because it relies on such a weak form of supervision, our approach can easily be applied to real-world data. We demonstrate that our method produces reasonable qualitative and quantitative results on natural images for both shape estimation and viewpoint prediction. Unlike previous approaches, our method does not require multiple views of the same object instance in the dataset, which significantly expands the applicability in practical robotics scenarios. We showcase it by using the hallucinated shapes to improve the performance on the task of grasping real-world objects both in simulation and with a PR2 robot.

Posted Content•

Robust Learning Under Label Noise With Iterative Noise-Filtering

Duc Tam Nguyen, Thi-Phuong-Nhung Ngo¹, Zhongyu Lou¹, Michael Klar¹, Laura Beggel¹, Thomas Brox•Institutions (1)

Bosch¹

01 Jun 2019-arXiv: Learning

TL;DR: This paper proposes an iterative semi-supervised mechanism for robust learning which excludes noisy labels but is still able to learn from the corresponding samples, and adds an unsupervised loss term that also serves as a regularizer against the remaining label noise.

Abstract: We consider the problem of training a model under the presence of label noise. Current approaches identify samples with potentially incorrect labels and reduce their influence on the learning process by either assigning lower weights to them or completely removing them from the training set. In the first case the model however still learns from noisy labels; in the latter approach, good training data can be lost. In this paper, we propose an iterative semi-supervised mechanism for robust learning which excludes noisy labels but is still able to learn from the corresponding samples. To this end, we add an unsupervised loss term that also serves as a regularizer against the remaining label noise. We evaluate our approach on common classification tasks with different noise ratios. Our robust models outperform the state-of-the-art methods by a large margin. Especially for very large noise ratios, we achieve up to 20 % absolute improvement compared to the previous best model.

Proceedings Article•DOI•

Automated Boxwood Topiary Trimming with a Robotic Arm and Integrated Stereo Vision

Dejan Kaljaca¹, Nikolaus Mayer², B.A. Vroegindeweij¹, Angelo Mencarelli, Eldert J. van Henten¹, Thomas Brox²•Institutions (2)

Wageningen University and Research Centre¹, University of Freiburg²

01 Nov 2019

TL;DR: This paper presents an integrated hardware-software solution to perform fully automated robotic bush trimming to user-specified shapes via a vision-based shape fitting module that allows fitting an arbitrary mesh into a bush at hand.

Abstract: This paper presents an integrated hardware-software solution to perform fully automated robotic bush trimming to user-specified shapes. In contrast to specialized solutions that can trim only bushes of a certain shape, the approach ensures flexibility via a vision-based shape fitting module that allows fitting an arbitrary mesh into a bush at hand. A trimming planning method considers the available degrees of freedom of the robot arm to achieve effective cutting motions. The performance of the mesh fitting module is assessed in multiple experiments involving both artificial and real plants with a variety of shapes. The trimming accuracy of the overall approach is quantitatively evaluated by inspecting the bush pointcloud before and after robotic trimming, and measuring the change in the deviation from the originally computed target mesh.

Posted Content•

DeepUSPS: Deep Robust Unsupervised Saliency Prediction With Self-Supervision

Duc Tam Nguyen¹, Maximilian Dax, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Zhongyu Lou, Thomas Brox•Institutions (1)

Bosch¹

28 Sep 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes a two-stage mechanism for robust unsupervised object saliency prediction, where the first stage involves refinement of the noisy pseudo labels generated from different handcrafted methods.

Abstract: Deep neural network (DNN) based salient object detection in images based on high-quality labels is expensive. Alternative unsupervised approaches rely on careful selection of multiple handcrafted saliency methods to generate noisy pseudo-ground-truth labels. In this work, we propose a two-stage mechanism for robust unsupervised object saliency prediction, where the first stage involves refinement of the noisy pseudo labels generated from different handcrafted methods. Each handcrafted method is substituted by a deep network that learns to generate the pseudo labels. These labels are refined incrementally in multiple iterations via our proposed self-supervision technique. In the second stage, the refined labels produced from multiple networks representing multiple saliency methods are used to train the actual saliency detection network. We show that this self-learning procedure outperforms all the existing unsupervised methods over different datasets. Results are even comparable to those of fully-supervised state-of-the-art approaches. The code is available at this https URL .

Posted Content•

Semi-Supervised Semantic Segmentation with High- and Low-level Consistency

Sudhanshu Mittal¹, Maxim Tatarchenko¹, Thomas Brox¹•Institutions (1)

University of Freiburg¹

15 Aug 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a dual-branch approach is proposed for semi-supervised semantic segmentation, which learns from limited pixel-wise annotated samples while exploiting additional annotation-free images.

Abstract: The ability to understand visual information from limited labeled data is an important aspect of machine learning. While image-level classification has been extensively studied in a semi-supervised setting, dense pixel-level classification with limited data has only drawn attention recently. In this work, we propose an approach for semi-supervised semantic segmentation that learns from limited pixel-wise annotated samples while exploiting additional annotation-free images. It uses two network branches that link semi-supervised classification with semi-supervised segmentation including self-training. The dual-branch approach reduces both the low-level and the high-level artifacts typical when training with few labels. The approach attains significant improvement over existing methods, especially when trained with very few labeled samples. On several standard benchmarks - PASCAL VOC 2012, PASCAL-Context, and Cityscapes - the approach achieves new state-of-the-art in semi-supervised learning.

Posted Content•

Adaptive Curriculum Generation from Demonstrations for Sim-to-Real Visuomotor Control

Lukas Hermann¹, Max Argus¹, Andreas Eitel¹, Artemij Amiranashvili¹, Wolfram Burgard¹, Thomas Brox¹•Institutions (1)

University of Freiburg¹

17 Oct 2019-arXiv: Robotics

TL;DR: This article proposed Adaptive Curriculum Generation from Demonstrations (ACGD) for reinforcement learning in the presence of sparse rewards, which adaptively sets the appropriate task difficulty for the learner by controlling where to sample from the demonstration trajectories and which set of simulation parameters to use.

Abstract: We propose Adaptive Curriculum Generation from Demonstrations (ACGD) for reinforcement learning in the presence of sparse rewards. Rather than designing shaped reward functions, ACGD adaptively sets the appropriate task difficulty for the learner by controlling where to sample from the demonstration trajectories and which set of simulation parameters to use. We show that training vision-based control policies in simulation while gradually increasing the difficulty of the task via ACGD improves the policy transfer to the real world. The degree of domain randomization is also gradually increased through the task difficulty. We demonstrate zero-shot transfer for two real-world manipulation tasks: pick-and-stow and block stacking. A video showing the results can be found at this https URL

Book Chapter•DOI•

Group Pruning using a Bounded-Lp norm for Group Gating and Regularization

Chaithanya Kumar Mummadi¹, Tim Genewein¹, Dan Zhang¹, Thomas Brox², Volker Fischer¹•Institutions (2)

Bosch¹, University of Freiburg²

10 Sep 2019

TL;DR: A gating factor after every convolutional layer is proposed to induce channel-level sparsity, encouraging insignificant channels to become exactly zero, and a bounded variant of the $\ell_1$ regularizer is introduced, which interpolates between $\ell_1$ and $\ell_0$-norms to retain performance of the network at higher pruning rates.

Abstract: Deep neural networks achieve state-of-the-art results on several tasks while increasing in complexity. It has been shown that neural networks can be pruned during training by imposing sparsity-inducing regularizers. In this paper, we investigate two techniques for group-wise pruning during training in order to improve network efficiency. We propose a gating factor after every convolutional layer to induce channel-level sparsity, encouraging insignificant channels to become exactly zero. Further, we introduce and analyse a bounded variant of the $\ell_1$ regularizer, which interpolates between $\ell_1$ and $\ell_0$-norms to retain performance of the network at higher pruning rates. To underline the effectiveness of the proposed methods, we show that the number of parameters of ResNet-164, DenseNet-40 and MobileNetV2 can be reduced by $30\%$, $69\%$, and $75\%$ on CIFAR100 respectively without a significant drop in accuracy. We achieve state-of-the-art pruning results for ResNet-50 with higher accuracy on ImageNet. Furthermore, we show that the lightweight MobileNetV2 can further be compressed on ImageNet without a significant drop in performance.
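
To make the idea of a bounded regularizer that interpolates between $\ell_1$ and $\ell_0$ concrete, here is a small numerical sketch. The exponential form, the scale parameter `sigma`, and the function name `bounded_l1` are assumptions chosen for illustration; the abstract does not state the formula, so this should not be read as the paper's exact definition.

```python
import numpy as np

def bounded_l1(x, sigma):
    """Assumed bounded penalty: sum_i (1 - exp(-|x_i| / sigma)).

    Each term lies in [0, 1). For small sigma every nonzero entry contributes
    about 1 (an l0-like count); for large sigma each term is about |x_i| / sigma
    (an l1-like penalty up to scaling).
    """
    x = np.asarray(x, dtype=np.float64)
    return float(np.sum(1.0 - np.exp(-np.abs(x) / sigma)))

gates = np.array([0.0, 0.5, -2.0, 0.0])  # toy channel gating factors

l0_like = bounded_l1(gates, sigma=1e-3)        # close to the number of nonzero gates
l1_like = 1e4 * bounded_l1(gates, sigma=1e4)   # close to the sum of absolute values
```

Under this assumed form, `sigma` controls where the penalty sits between counting active channels and shrinking their magnitudes.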

Showing papers by "Thomas Brox" published in 2019