Book Chapter

Artistic Style Transfer for Videos

12 Sep 2016 · pp. 26–36
TL;DR: This work presents an approach that transfers the style from one image (for example, a painting) to a whole video sequence, and makes use of recent advances in style transfer in still images and proposes new initializations and loss functions applicable to videos.
Abstract: In the past, manually re-drawing an image in a certain artistic style required a professional artist and a long time. Doing this for a video sequence single-handed was beyond imagination. Nowadays computers provide new possibilities. We present an approach that transfers the style from one image (for example, a painting) to a whole video sequence. We make use of recent advances in style transfer in still images and propose new initializations and loss functions applicable to videos. This allows us to generate consistent and stable stylized video sequences, even in cases with large motion and strong occlusion. We show that the proposed method clearly outperforms simpler baselines both qualitatively and quantitatively.
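
The key ingredient for temporal stability is a short-term temporal loss that penalizes deviation from the previous stylized frame warped by optical flow, disabled where the warp is unreliable. A sketch of the published form (x is the current stylized frame with D pixels, ω the flow-warped previous stylized frame, and c ∈ [0, 1]^D per-pixel weights that are zero in occluded regions and at motion boundaries):

```latex
\mathcal{L}_{\mathrm{temporal}}(\mathbf{x}, \boldsymbol{\omega}, \mathbf{c})
  = \frac{1}{D} \sum_{k=1}^{D} c_k \, (x_k - \omega_k)^2
```
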
Citations
Proceedings Article
01 Oct 2017
TL;DR: In this paper, adaptive instance normalization (AdaIN) is proposed to align the mean and variance of the content features with those of the style features, which enables arbitrary style transfer in real-time.
Abstract: Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer. However, their framework requires a slow iterative optimization process, which limits its practical application. Fast approximations with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is usually tied to a fixed set of styles and cannot adapt to arbitrary new styles. In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. At the heart of our method is a novel adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Our method achieves speed comparable to the fastest existing approach, without the restriction to a pre-defined set of styles. In addition, our approach allows flexible user controls such as content-style trade-off, style interpolation, color & spatial controls, all using a single feed-forward neural network.
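
Conceptually, the AdaIN layer is only a few lines: normalize each content feature channel, then re-scale it with the style's channel statistics. A minimal PyTorch sketch of the operation as described above (not the authors' code; feature maps assumed to be shaped (N, C, H, W)):

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Align the per-channel mean and std of the content features
    with those of the style features (adaptive instance normalization)."""
    # Per-sample, per-channel statistics over the spatial dimensions.
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True)
    # Normalize away the content statistics, then apply the style's.
    return s_std * (content_feat - c_mean) / c_std + s_mean
```

In the paper, the stylized result is produced by decoding adain(f(content), f(style)) with a learned decoder, which is what makes the method style-agnostic at test time.
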

2,266 citations

Posted Content
TL;DR: This paper presents a simple yet effective approach that for the first time enables arbitrary style transfer in real time, at speed comparable to the fastest existing approach and without the restriction to a pre-defined set of styles.
Abstract: Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer. However, their framework requires a slow iterative optimization process, which limits its practical application. Fast approximations with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is usually tied to a fixed set of styles and cannot adapt to arbitrary new styles. In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. At the heart of our method is a novel adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Our method achieves speed comparable to the fastest existing approach, without the restriction to a pre-defined set of styles. In addition, our approach allows flexible user controls such as content-style trade-off, style interpolation, color & spatial controls, all using a single feed-forward neural network.

1,286 citations


Cites background from "Artistic Style Transfer for Videos"

  • ...[45] improved the quality...


Proceedings Article
14 Oct 2017
TL;DR: DeepXplore efficiently finds thousands of incorrect corner case behaviors in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data.
Abstract: Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including self-driving cars and malware detection, where the correctness and predictability of a system's behavior for corner case inputs are of great importance. Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs. We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems. First, we introduce neuron coverage for systematically measuring the parts of a DL system exercised by test inputs. Next, we leverage multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking. Finally, we demonstrate how finding inputs for DL systems that both trigger many differential behaviors and achieve high neuron coverage can be represented as a joint optimization problem and solved efficiently using gradient-based search techniques. DeepXplore efficiently finds thousands of incorrect corner case behaviors (e.g., self-driving cars crashing into guard rails and malware masquerading as benign software) in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data. For all tested DL models, on average, DeepXplore generated one test input demonstrating incorrect behavior within one second while running only on a commodity laptop. We further show that the test inputs generated by DeepXplore can also be used to retrain the corresponding DL model to improve the model's accuracy by up to 3%.
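
To make the neuron-coverage idea concrete: a neuron counts as covered once its activation exceeds a threshold on at least one test input, and coverage is the fraction of covered neurons. A simplified PyTorch sketch, illustrating the metric rather than reproducing DeepXplore's implementation (which also scales activations per layer); it assumes the model's nonlinearities are nn.ReLU modules:

```python
import torch
import torch.nn as nn

def neuron_coverage(model: nn.Module, inputs: torch.Tensor,
                    threshold: float = 0.0) -> float:
    """Fraction of ReLU outputs exceeding `threshold` on at least one input."""
    fired = {}

    def make_hook(name):
        def hook(module, inp, out):
            # A neuron is covered if it fires above threshold for any input.
            f = (out > threshold).flatten(1).any(dim=0)
            fired[name] = fired.get(name, torch.zeros_like(f)) | f
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.ReLU)]
    with torch.no_grad():
        model(inputs)
    for h in handles:
        h.remove()
    total = sum(v.numel() for v in fired.values())
    return sum(int(v.sum()) for v in fired.values()) / max(total, 1)
```
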

884 citations


Cites methods from "Artistic Style Transfer for Videos"

  • ...Gradients have been used in the past for visualizing activation of different intermediate layers of a DNN for tasks like object segmentation [44, 66], artistic style transfer between two images [24, 43, 59], etc....


Proceedings Article
TL;DR: DeepXplore is a whitebox framework for systematically testing real-world deep learning (DL) systems; it leverages multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking.
Abstract: Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including self-driving cars and malware detection, where the correctness and predictability of a system's behavior for corner case inputs are of great importance. Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs. We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems. First, we introduce neuron coverage for systematically measuring the parts of a DL system exercised by test inputs. Next, we leverage multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking. Finally, we demonstrate how finding inputs for DL systems that both trigger many differential behaviors and achieve high neuron coverage can be represented as a joint optimization problem and solved efficiently using gradient-based search techniques. DeepXplore efficiently finds thousands of incorrect corner case behaviors (e.g., self-driving cars crashing into guard rails and malware masquerading as benign software) in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data. For all tested DL models, on average, DeepXplore generated one test input demonstrating incorrect behavior within one second while running only on a commodity laptop. We further show that the test inputs generated by DeepXplore can also be used to retrain the corresponding DL model to improve the model's accuracy by up to 3%.

651 citations

Posted Content
TL;DR: In this article, a video-to-video synthesis approach under the generative adversarial learning framework is proposed, which achieves high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats.
Abstract: We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. While its image counterpart, the image-to-image synthesis problem, is a popular topic, the video-to-video synthesis problem is less explored in the literature. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality. In this paper, we propose a novel video-to-video synthesis approach under the generative adversarial learning framework. Through carefully-designed generator and discriminator architectures, coupled with a spatio-temporal adversarial objective, we achieve high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats including segmentation masks, sketches, and poses. Experiments on multiple benchmarks show the advantage of our method compared to strong baselines. In particular, our model is capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state-of-the-art of video synthesis. Finally, we apply our approach to future video prediction, outperforming several state-of-the-art competing systems.
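
Schematically, the training objective described here combines a conditional image-level GAN term, a video-level GAN term over consecutive frames, and a flow-estimation term. A hedged paraphrase of that structure (weights and exact conditioning simplified):

```latex
\min_{G} \left( \max_{D_I} \mathcal{L}_I(G, D_I)
  + \max_{D_V} \mathcal{L}_V(G, D_V)
  + \lambda_W \, \mathcal{L}_W(G) \right)
```

Here D_I judges individual output frames against their source inputs, D_V judges short clips for temporal coherence, and L_W supervises the flow prediction used to warp previous frames.
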

385 citations

References
Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations


"Artistic Style Transfer for Videos" refers methods in this paper

  • ...We used the following layers of the VGG-19 network [10] for computing the losses: relu4_2 for the content and relu1_1, relu2_1, relu3_1, relu4_1, relu5_1 for the style....


  • ...Their approach uses high-level feature representations of the images from hidden layers of the VGG convolutional network [10] to separate and reassemble content and style....

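The first quote above names the exact VGG-19 activations used for the losses. A minimal sketch of collecting them with torchvision's pretrained VGG-19 (not the paper's Torch code; the layer indices follow torchvision's standard ordering, and the weights enum assumes torchvision ≥ 0.13):

```python
import torch
import torchvision.models as models

# Activations used for the losses: relu4_2 for content,
# relu1_1 ... relu5_1 for style (indices into vgg19().features).
CONTENT_LAYERS = {22: "relu4_2"}
STYLE_LAYERS = {1: "relu1_1", 6: "relu2_1", 11: "relu3_1",
                20: "relu4_1", 29: "relu5_1"}

def vgg_features(x: torch.Tensor, vgg: torch.nn.Module) -> dict:
    """Run an image batch through VGG-19 and collect the named activations.
    In practice x should be ImageNet-normalized first."""
    wanted = {**CONTENT_LAYERS, **STYLE_LAYERS}
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in wanted:
            feats[wanted[i]] = x
        if i >= max(wanted):
            break
    return feats

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
feats = vgg_features(torch.rand(1, 3, 256, 256), vgg)  # keys: relu1_1 ... relu5_1
```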

Book Chapter
07 Oct 2012
TL;DR: A new optical flow data set derived from the open source 3D animated short film Sintel is introduced, which has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects.
Abstract: Ground truth optical flow is difficult to measure in real scenes with natural motion. As a result, optical flow data sets are restricted in terms of size, complexity, and diversity, making optical flow algorithms difficult to train and test on realistic data. We introduce a new optical flow data set derived from the open source 3D animated short film Sintel. This data set has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects. Because the graphics data that generated the movie is open source, we are able to render scenes under conditions of varying complexity to evaluate where existing flow algorithms fail. We evaluate several recent optical flow algorithms and find that current highly-ranked methods on the Middlebury evaluation have difficulty with this more complex data set, suggesting further research on optical flow estimation is needed. To validate the use of synthetic data, we compare the image- and flow-statistics of Sintel to those of real films and videos and show that they are similar. The data set, metrics, and evaluation website are publicly available.

1,742 citations


"Artistic Style Transfer for Videos" refers methods in this paper


  • ...We evaluated our short-term temporal loss on 5 diverse scenes from the MPI Sintel Dataset [1], with 20 to 50 frames of resolution 1024 × 436 pixels per scene, and 6 famous paintings (shown in section 7.1) as style images....


Proceedings Article
01 Jan 2011
TL;DR: Torch7 is a versatile numeric computing framework and machine learning library that extends Lua; it can easily be interfaced to third-party software thanks to Lua's light interface.
Abstract: Torch7 is a versatile numeric computing framework and machine learning library that extends Lua. Its goal is to provide a flexible environment to design and train learning machines. Flexibility is obtained via Lua, an extremely lightweight scripting language. High performance is obtained via efficient OpenMP/SSE and CUDA implementations of low-level numeric routines. Torch7 can easily be interfaced to third-party software thanks to Lua’s light interface.

1,602 citations


"Artistic Style Transfer for Videos" refers methods in this paper

  • ...Our implementation is based on the Torch [2] implementation called neural-style....


Proceedings Article
01 Dec 2013
TL;DR: This work proposes a descriptor matching algorithm, tailored to the optical flow problem, that boosts performance on fast motions and sets a new state of the art on the MPI-Sintel dataset.
Abstract: Optical flow computation is a key component in many computer vision systems designed for tasks such as action detection or activity recognition. However, despite several major advances over the last decade, handling large displacements in optical flow remains an open problem. Inspired by the large displacement optical flow of Brox and Malik, our approach, termed DeepFlow, blends a matching algorithm with a variational approach for optical flow. We propose a descriptor matching algorithm, tailored to the optical flow problem, that boosts performance on fast motions. The matching algorithm builds upon a multi-stage architecture with 6 layers, interleaving convolutions and max-pooling, a construction akin to deep convolutional nets. Using dense sampling, it efficiently retrieves quasi-dense correspondences and enjoys a built-in smoothing effect on descriptor matches, a valuable asset for integration into an energy minimization framework for optical flow estimation. DeepFlow efficiently handles large displacements occurring in realistic videos and shows competitive performance on optical flow benchmarks. Furthermore, it sets a new state of the art on the MPI-Sintel dataset.

1,099 citations

Posted Content
TL;DR: This work introduces an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality and offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.
Abstract: In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.
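
In code, "separate and recombine content and style" reduces to two losses over CNN activations: match raw features for content and feature correlations (Gram matrices) for style. A minimal PyTorch sketch (illustrative, not the authors' implementation; the Gram normalization constant varies across implementations):

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise correlations of a (N, C, H, W) activation map."""
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_loss(gen_feat: torch.Tensor, content_feat: torch.Tensor) -> torch.Tensor:
    # Content: match raw activations at a chosen layer.
    return F.mse_loss(gen_feat, content_feat)

def style_loss(gen_feat: torch.Tensor, style_feat: torch.Tensor) -> torch.Tensor:
    # Style: match second-order statistics of the activations.
    return F.mse_loss(gram_matrix(gen_feat), gram_matrix(style_feat))
```

The stylized image is then obtained by iteratively optimizing the pixels of an image to minimize a weighted sum of these two losses.
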

1,019 citations


"Artistic Style Transfer for Videos" refers background or result in this paper

  • ...[3] and extends style transfer to video sequences....


  • ...[3] showed remarkable results by using the VGG-19 deep neural network for style transfer....


  • ...[3] proposed a novel approach using neural networks to capture the style of artistic images and transfer it to real world photographs....
