
Showing papers by "Thomas M. Breuel published in 2017"


Proceedings Article
02 Mar 2017
TL;DR: This work makes a shared-latent space assumption and proposes an unsupervised image-to-image translation framework based on Coupled GANs that achieves state-of-the-art performance on benchmark datasets.
Abstract: Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can yield the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high-quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets. Code and additional results are available at https://github.com/mingyuliutw/unit.

1,496 citations
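The shared-latent space assumption above can be sketched in a few lines of PyTorch: each domain gets its own encoder and generator, both encoders map into one common latent space, and translation is encoding with one domain's encoder followed by decoding with the other domain's generator. This is an illustrative toy model with made-up layer sizes, not the paper's architecture (which also includes adversarial discriminators and weight sharing).

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy encoder mapping a 64x64 RGB image into the shared latent space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Toy generator decoding a shared latent code back into an image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z)

# One encoder/generator pair per domain; both encoders target the SAME
# latent space, so cross-domain translation is E1 -> G2 (or E2 -> G1).
E1, E2, G1, G2 = Encoder(), Encoder(), Generator(), Generator()

x1 = torch.randn(1, 3, 64, 64)   # image from domain 1
z = E1(x1)                       # shared latent code
x1_to_2 = G2(z)                  # translated into domain 2
x1_recon = G1(z)                 # reconstructed within domain 1
```

Training then ties the pairs together with reconstruction and adversarial losses so that both encoders really do agree on a common latent code.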




Proceedings ArticleDOI
01 Nov 2017
TL;DR: A new, open-source line recognizer combining deep convolutional networks and LSTMs, implemented in PyTorch and using CUDA kernels for speed.
Abstract: Optical character recognition (OCR) has made great progress in recent years due to the introduction of recognition engines based on recurrent neural networks, in particular the LSTM architecture. This paper describes a new, open-source line recognizer combining deep convolutional networks and LSTMs, implemented in PyTorch and using CUDA kernels for speed. Experimental results are given comparing the performance of different combinations of geometric normalization, 1D LSTM, deep convolutional networks, and 2D LSTM networks. An important result is that while deep hybrid networks without geometric text line normalization outperform 1D LSTM networks with geometric normalization, deep hybrid networks with geometric text line normalization still outperform all other networks. The best networks achieve a throughput of more than 100 lines per second and test set error rates on UW3 of 0.25%.

86 citations
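The convolutional-plus-LSTM line recognizer described above can be sketched as follows: convolutional layers extract features from the text line image, the feature map is sliced into one vector per horizontal position, and a bidirectional LSTM scans the sequence to emit per-timestep character scores (e.g. for a CTC loss). Class count, layer widths, and the fixed line height are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvLSTMLineRecognizer(nn.Module):
    """Hypothetical conv + 1D-LSTM text line recognizer (not the paper's exact model)."""
    def __init__(self, num_classes=97, height=48):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat = 64 * (height // 4)  # channels x remaining rows after two 2x poolings
        self.lstm = nn.LSTM(feat, 128, bidirectional=True, batch_first=True)
        self.out = nn.Linear(256, num_classes)  # per-timestep class scores for CTC

    def forward(self, x):                # x: (batch, 1, height, width)
        f = self.conv(x)                 # (batch, 64, height/4, width/4)
        b, c, h, w = f.shape
        # Flatten each image column into one feature vector -> a left-to-right sequence.
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)
        y, _ = self.lstm(f)
        return self.out(y)               # (batch, width/4, num_classes)

model = ConvLSTMLineRecognizer()
logits = model(torch.randn(2, 1, 48, 200))  # two normalized 48px-high line images
```

In a full pipeline the logits would feed `nn.CTCLoss` during training and a greedy or beam-search decoder at inference time.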


Proceedings ArticleDOI
01 Nov 2017
TL;DR: It is demonstrated that relatively simple networks are capable of fast, reliable text line segmentation and document layout analysis even on complex and noisy inputs, without manual parameter tuning or heuristics.
Abstract: Analyzing and segmenting scanned documents is an important step in optical character recognition. The problem is difficult because of the complexity of 2D layouts, the small tolerance of segmentation errors in the output, and the relatively small amount of labeled training data available. Traditional approaches have relied on a combination of sophisticated geometric algorithms, domain knowledge, heuristics, and carefully tuned parameters. This paper describes the use of deep neural networks, in particular a combination of convolutional and multidimensional LSTM networks, for document image segmentation, and demonstrates that relatively simple networks are capable of fast, reliable text line segmentation and document layout analysis even on complex and noisy inputs, without manual parameter tuning or heuristics. The method is easily adaptable to new datasets by retraining, and an open source implementation is available.

22 citations
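The approach above treats layout analysis as per-pixel labeling. A minimal sketch of that idea, assuming plain convolutions in place of the paper's multidimensional LSTM layers (which have no stock PyTorch module) and an invented two-class labeling (background vs. text-line marker):

```python
import torch
import torch.nn as nn

# Per-pixel segmentation sketch: the network maps a grayscale page image to
# a score map with one channel per label class; argmax over channels gives
# the hard segmentation. Layer sizes and class count are illustrative.
seg_net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 2, 1),  # 2 classes: background vs text-line region
)

page = torch.randn(1, 1, 128, 128)   # grayscale page crop
logits = seg_net(page)               # (1, 2, 128, 128) per-pixel class scores
mask = logits.argmax(dim=1)          # (1, 128, 128) hard segmentation map
```

Because the network is fully convolutional, the same weights apply to pages of any size, and adapting to a new dataset is just retraining on new pixel labels rather than re-tuning geometric heuristics.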