
Showing papers by "Thomas M. Breuel published in 2017"


Proceedings Article
02 Mar 2017
TL;DR: This work makes a shared-latent space assumption and proposes an unsupervised image-to-image translation framework based on Coupled GANs that achieves state-of-the-art performance on benchmark datasets.
Abstract: Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can yield the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high-quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets. Code and additional results are available at https://github.com/mingyuliutw/unit.

1,496 citations
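The shared-latent space assumption above can be sketched in a few lines of PyTorch: each domain gets its own encoder and generator, both encoders map into one common latent space, and translation is encoding with one domain's encoder followed by decoding with the other domain's generator. This is an illustrative toy model with made-up layer sizes, not the paper's architecture (which also includes adversarial discriminators and weight sharing).

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy encoder mapping a 64x64 RGB image into the shared latent space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Toy generator decoding a shared latent code back into an image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z)

# One encoder/generator pair per domain; both encoders target the SAME
# latent space, so cross-domain translation is E1 -> G2 (or E2 -> G1).
E1, E2, G1, G2 = Encoder(), Encoder(), Generator(), Generator()

x1 = torch.randn(1, 3, 64, 64)   # image from domain 1
z = E1(x1)                       # shared latent code
x1_to_2 = G2(z)                  # translated into domain 2
x1_recon = G1(z)                 # reconstructed within domain 1
```

Training then ties the pairs together with reconstruction and adversarial losses so that both encoders really do agree on a common latent code.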




Proceedings ArticleDOI
01 Nov 2017
TL;DR: A new, open-source line recognizer combining deep convolutional networks and LSTMs, implemented in PyTorch and using CUDA kernels for speed.
Abstract: Optical character recognition (OCR) has made great progress in recent years due to the introduction of recognition engines based on recurrent neural networks, in particular the LSTM architecture. This paper describes a new, open-source line recognizer combining deep convolutional networks and LSTMs, implemented in PyTorch and using CUDA kernels for speed. Experimental results are given comparing the performance of different combinations of geometric normalization, 1D LSTM, deep convolutional networks, and 2D LSTM networks. An important result is that while deep hybrid networks without geometric text line normalization outperform 1D LSTM networks with geometric normalization, deep hybrid networks with geometric text line normalization still outperform all other networks. The best networks achieve a throughput of more than 100 lines per second and test set error rates on UW3 of 0.25%.

86 citations
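The convolutional-plus-LSTM line recognizer described above can be sketched as follows: convolutional layers extract features from the text line image, the feature map is sliced into one vector per horizontal position, and a bidirectional LSTM scans the sequence to emit per-timestep character scores (e.g. for a CTC loss). Class count, layer widths, and the fixed line height are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvLSTMLineRecognizer(nn.Module):
    """Hypothetical conv + 1D-LSTM text line recognizer (not the paper's exact model)."""
    def __init__(self, num_classes=97, height=48):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat = 64 * (height // 4)  # channels x remaining rows after two 2x poolings
        self.lstm = nn.LSTM(feat, 128, bidirectional=True, batch_first=True)
        self.out = nn.Linear(256, num_classes)  # per-timestep class scores for CTC

    def forward(self, x):                # x: (batch, 1, height, width)
        f = self.conv(x)                 # (batch, 64, height/4, width/4)
        b, c, h, w = f.shape
        # Flatten each image column into one feature vector -> a left-to-right sequence.
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)
        y, _ = self.lstm(f)
        return self.out(y)               # (batch, width/4, num_classes)

model = ConvLSTMLineRecognizer()
logits = model(torch.randn(2, 1, 48, 200))  # two normalized 48px-high line images
```

In a full pipeline the logits would feed `nn.CTCLoss` during training and a greedy or beam-search decoder at inference time.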


Proceedings ArticleDOI
01 Nov 2017
TL;DR: It is demonstrated that relatively simple networks are capable of fast, reliable text line segmentation and document layout analysis even on complex and noisy inputs, without manual parameter tuning or heuristics.
Abstract: Analyzing and segmenting scanned documents is an important step in optical character recognition. The problem is difficult because of the complexity of 2D layouts, the small tolerance of segmentation errors in the output, and the relatively small amount of labeled training data available. Traditional approaches have relied on a combination of sophisticated geometric algorithms, domain knowledge, heuristics, and carefully tuned parameters. This paper describes the use of deep neural networks, in particular a combination of convolutional and multidimensional LSTM networks, for document image segmentation, and demonstrates that relatively simple networks are capable of fast, reliable text line segmentation and document layout analysis even on complex and noisy inputs, without manual parameter tuning or heuristics. The method is easily adaptable to new datasets by retraining, and an open source implementation is available.

22 citations
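The approach above treats layout analysis as per-pixel labeling. A minimal sketch of that idea, assuming plain convolutions in place of the paper's multidimensional LSTM layers (which have no stock PyTorch module) and an invented two-class labeling (background vs. text-line marker):

```python
import torch
import torch.nn as nn

# Per-pixel segmentation sketch: the network maps a grayscale page image to
# a score map with one channel per label class; argmax over channels gives
# the hard segmentation. Layer sizes and class count are illustrative.
seg_net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 2, 1),  # 2 classes: background vs text-line region
)

page = torch.randn(1, 1, 128, 128)   # grayscale page crop
logits = seg_net(page)               # (1, 2, 128, 128) per-pixel class scores
mask = logits.argmax(dim=1)          # (1, 128, 128) hard segmentation map
```

Because the network is fully convolutional, the same weights apply to pages of any size, and adapting to a new dataset is just retraining on new pixel labels rather than re-tuning geometric heuristics.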