Book Chapter

Artistic Style Transfer for Videos

12 Sep 2016 · pp. 26–36
TL;DR: This work presents an approach that transfers the style from one image (for example, a painting) to a whole video sequence, and makes use of recent advances in style transfer in still images and proposes new initializations and loss functions applicable to videos.
Abstract: In the past, manually re-drawing an image in a certain artistic style required a professional artist and a long time. Doing this for a video sequence single-handed was beyond imagination. Nowadays computers provide new possibilities. We present an approach that transfers the style from one image (for example, a painting) to a whole video sequence. We make use of recent advances in style transfer in still images and propose new initializations and loss functions applicable to videos. This allows us to generate consistent and stable stylized video sequences, even in cases with large motion and strong occlusion. We show that the proposed method clearly outperforms simpler baselines both qualitatively and quantitatively.
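
The key ingredient for temporal stability is a short-term temporal loss that penalizes deviation from the previous stylized frame warped by optical flow, disabled where the warp is unreliable. A sketch of the published form (x is the current stylized frame with D pixels, ω the flow-warped previous stylized frame, and c ∈ [0, 1]^D per-pixel weights that are zero in occluded regions and at motion boundaries):

```latex
\mathcal{L}_{\mathrm{temporal}}(\mathbf{x}, \boldsymbol{\omega}, \mathbf{c})
  = \frac{1}{D} \sum_{k=1}^{D} c_k \, (x_k - \omega_k)^2
```
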
Citations
Proceedings Article
01 Oct 2017
TL;DR: In this paper, adaptive instance normalization (AdaIN) is proposed to align the mean and variance of the content features with those of the style features, which enables arbitrary style transfer in real-time.
Abstract: Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer. However, their framework requires a slow iterative optimization process, which limits its practical application. Fast approximations with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is usually tied to a fixed set of styles and cannot adapt to arbitrary new styles. In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. At the heart of our method is a novel adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Our method achieves speed comparable to the fastest existing approach, without the restriction to a pre-defined set of styles. In addition, our approach allows flexible user controls such as content-style trade-off, style interpolation, color & spatial controls, all using a single feed-forward neural network.
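
Conceptually, the AdaIN layer is only a few lines: normalize each content feature channel, then re-scale it with the style's channel statistics. A minimal PyTorch sketch of the operation as described above (not the authors' code; feature maps assumed to be shaped (N, C, H, W)):

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Align the per-channel mean and std of the content features
    with those of the style features (adaptive instance normalization)."""
    # Per-sample, per-channel statistics over the spatial dimensions.
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True)
    # Normalize away the content statistics, then apply the style's.
    return s_std * (content_feat - c_mean) / c_std + s_mean
```

In the paper, the stylized result is produced by decoding adain(f(content), f(style)) with a learned decoder, which is what makes the method style-agnostic at test time.
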

2,266 citations

Posted Content
TL;DR: This paper presents a simple yet effective approach that for the first time enables arbitrary style transfer in real time, at speed comparable to the fastest existing approach and without the restriction to a pre-defined set of styles.
Abstract: Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer. However, their framework requires a slow iterative optimization process, which limits its practical application. Fast approximations with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is usually tied to a fixed set of styles and cannot adapt to arbitrary new styles. In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. At the heart of our method is a novel adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Our method achieves speed comparable to the fastest existing approach, without the restriction to a pre-defined set of styles. In addition, our approach allows flexible user controls such as content-style trade-off, style interpolation, color & spatial controls, all using a single feed-forward neural network.

1,286 citations


Cites background from "Artistic Style Transfer for Videos"

  • ...[45] improved the quality...


Proceedings Article
14 Oct 2017
TL;DR: DeepXplore efficiently finds thousands of incorrect corner case behaviors in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data.
Abstract: Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including self-driving cars and malware detection, where the correctness and predictability of a system's behavior for corner case inputs are of great importance. Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs. We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems. First, we introduce neuron coverage for systematically measuring the parts of a DL system exercised by test inputs. Next, we leverage multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking. Finally, we demonstrate how finding inputs for DL systems that both trigger many differential behaviors and achieve high neuron coverage can be represented as a joint optimization problem and solved efficiently using gradient-based search techniques. DeepXplore efficiently finds thousands of incorrect corner case behaviors (e.g., self-driving cars crashing into guard rails and malware masquerading as benign software) in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data. For all tested DL models, on average, DeepXplore generated one test input demonstrating incorrect behavior within one second while running only on a commodity laptop. We further show that the test inputs generated by DeepXplore can also be used to retrain the corresponding DL model to improve the model's accuracy by up to 3%.
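
To make the neuron-coverage idea concrete: a neuron counts as covered once its activation exceeds a threshold on at least one test input, and coverage is the fraction of covered neurons. A simplified PyTorch sketch, illustrating the metric rather than reproducing DeepXplore's implementation (which also scales activations per layer); it assumes the model's nonlinearities are nn.ReLU modules:

```python
import torch
import torch.nn as nn

def neuron_coverage(model: nn.Module, inputs: torch.Tensor,
                    threshold: float = 0.0) -> float:
    """Fraction of ReLU outputs exceeding `threshold` on at least one input."""
    fired = {}

    def make_hook(name):
        def hook(module, inp, out):
            # A neuron is covered if it fires above threshold for any input.
            f = (out > threshold).flatten(1).any(dim=0)
            fired[name] = fired.get(name, torch.zeros_like(f)) | f
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.ReLU)]
    with torch.no_grad():
        model(inputs)
    for h in handles:
        h.remove()
    total = sum(v.numel() for v in fired.values())
    return sum(int(v.sum()) for v in fired.values()) / max(total, 1)
```
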

884 citations


Cites methods from "Artistic Style Transfer for Videos"

  • ...Gradients have been used in the past for visualizing activation of different intermediate layers of a DNN for tasks like object segmentation [44, 66], artistic style transfer between two images [24, 43, 59], etc....


Proceedings Article
TL;DR: DeepXplore is a whitebox framework for systematically testing real-world deep learning (DL) systems; it leverages multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking.
Abstract: Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including self-driving cars and malware detection, where the correctness and predictability of a system's behavior for corner case inputs are of great importance. Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs. We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems. First, we introduce neuron coverage for systematically measuring the parts of a DL system exercised by test inputs. Next, we leverage multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking. Finally, we demonstrate how finding inputs for DL systems that both trigger many differential behaviors and achieve high neuron coverage can be represented as a joint optimization problem and solved efficiently using gradient-based search techniques. DeepXplore efficiently finds thousands of incorrect corner case behaviors (e.g., self-driving cars crashing into guard rails and malware masquerading as benign software) in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data. For all tested DL models, on average, DeepXplore generated one test input demonstrating incorrect behavior within one second while running only on a commodity laptop. We further show that the test inputs generated by DeepXplore can also be used to retrain the corresponding DL model to improve the model's accuracy by up to 3%.

651 citations

Posted Content
TL;DR: In this article, a video-to-video synthesis approach under the generative adversarial learning framework is proposed, which achieves high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats.
Abstract: We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. While its image counterpart, the image-to-image synthesis problem, is a popular topic, the video-to-video synthesis problem is less explored in the literature. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality. In this paper, we propose a novel video-to-video synthesis approach under the generative adversarial learning framework. Through carefully-designed generator and discriminator architectures, coupled with a spatio-temporal adversarial objective, we achieve high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats including segmentation masks, sketches, and poses. Experiments on multiple benchmarks show the advantage of our method compared to strong baselines. In particular, our model is capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state-of-the-art of video synthesis. Finally, we apply our approach to future video prediction, outperforming several state-of-the-art competing systems.
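
Schematically, the training objective described here combines a conditional image-level GAN term, a video-level GAN term over consecutive frames, and a flow-estimation term. A hedged paraphrase of that structure (weights and exact conditioning simplified):

```latex
\min_{G} \left( \max_{D_I} \mathcal{L}_I(G, D_I)
  + \max_{D_V} \mathcal{L}_V(G, D_V)
  + \lambda_W \, \mathcal{L}_W(G) \right)
```

Here D_I judges individual output frames against their source inputs, D_V judges short clips for temporal coherence, and L_W supervises the flow prediction used to warp previous frames.
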

385 citations

References
Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations


"Artistic Style Transfer for Videos" refers methods in this paper

  • ...We used the following layers of the VGG-19 network [10] for computing the losses: relu4_2 for the content and relu1_1, relu2_1, relu3_1, relu4_1, relu5_1 for the style....


  • ...Their approach uses high-level feature representations of the images from hidden layers of the VGG convolutional network [10] to separate and reassemble content and style....

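The first quote above names the exact VGG-19 activations used for the losses. A minimal sketch of collecting them with torchvision's pretrained VGG-19 (not the paper's Torch code; the layer indices follow torchvision's standard ordering, and the weights enum assumes torchvision ≥ 0.13):

```python
import torch
import torchvision.models as models

# Activations used for the losses: relu4_2 for content,
# relu1_1 ... relu5_1 for style (indices into vgg19().features).
CONTENT_LAYERS = {22: "relu4_2"}
STYLE_LAYERS = {1: "relu1_1", 6: "relu2_1", 11: "relu3_1",
                20: "relu4_1", 29: "relu5_1"}

def vgg_features(x: torch.Tensor, vgg: torch.nn.Module) -> dict:
    """Run an image batch through VGG-19 and collect the named activations.
    In practice x should be ImageNet-normalized first."""
    wanted = {**CONTENT_LAYERS, **STYLE_LAYERS}
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in wanted:
            feats[wanted[i]] = x
        if i >= max(wanted):
            break
    return feats

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
feats = vgg_features(torch.rand(1, 3, 256, 256), vgg)  # keys: relu1_1 ... relu5_1
```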

Book Chapter
07 Oct 2012
TL;DR: A new optical flow data set derived from the open source 3D animated short film Sintel is introduced, which has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects.
Abstract: Ground truth optical flow is difficult to measure in real scenes with natural motion. As a result, optical flow data sets are restricted in terms of size, complexity, and diversity, making optical flow algorithms difficult to train and test on realistic data. We introduce a new optical flow data set derived from the open source 3D animated short film Sintel. This data set has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects. Because the graphics data that generated the movie is open source, we are able to render scenes under conditions of varying complexity to evaluate where existing flow algorithms fail. We evaluate several recent optical flow algorithms and find that current highly-ranked methods on the Middlebury evaluation have difficulty with this more complex data set, suggesting further research on optical flow estimation is needed. To validate the use of synthetic data, we compare the image- and flow-statistics of Sintel to those of real films and videos and show that they are similar. The data set, metrics, and evaluation website are publicly available.

1,742 citations


"Artistic Style Transfer for Videos" refers methods in this paper


  • ...We evaluated our short-term temporal loss on 5 diverse scenes from the MPI Sintel Dataset [1], with 20 to 50 frames of resolution 1024 × 436 pixels per scene, and 6 famous paintings (shown in section 7.1) as style images....


Proceedings Article
01 Jan 2011
TL;DR: Torch7 is a versatile numeric computing framework and machine learning library that extends Lua; it can easily be interfaced to third-party software thanks to Lua's light interface.
Abstract: Torch7 is a versatile numeric computing framework and machine learning library that extends Lua. Its goal is to provide a flexible environment to design and train learning machines. Flexibility is obtained via Lua, an extremely lightweight scripting language. High performance is obtained via efficient OpenMP/SSE and CUDA implementations of low-level numeric routines. Torch7 can easily be interfaced to third-party software thanks to Lua’s light interface.

1,602 citations


"Artistic Style Transfer for Videos" refers methods in this paper

  • ...Our implementation is based on the Torch [2] implementation called neural-style....


Proceedings Article
01 Dec 2013
TL;DR: This work proposes a descriptor matching algorithm, tailored to the optical flow problem, that boosts performance on fast motions and sets a new state of the art on the MPI-Sintel dataset.
Abstract: Optical flow computation is a key component in many computer vision systems designed for tasks such as action detection or activity recognition. However, despite several major advances over the last decade, handling large displacements in optical flow remains an open problem. Inspired by the large displacement optical flow of Brox and Malik, our approach, termed DeepFlow, blends a matching algorithm with a variational approach for optical flow. We propose a descriptor matching algorithm, tailored to the optical flow problem, that boosts performance on fast motions. The matching algorithm builds upon a multi-stage architecture with 6 layers, interleaving convolutions and max-pooling, a construction akin to deep convolutional nets. Using dense sampling, it efficiently retrieves quasi-dense correspondences and enjoys a built-in smoothing effect on descriptor matches, a valuable asset for integration into an energy minimization framework for optical flow estimation. DeepFlow efficiently handles large displacements occurring in realistic videos and shows competitive performance on optical flow benchmarks. Furthermore, it sets a new state of the art on the MPI-Sintel dataset.

1,099 citations

Posted Content
TL;DR: This work introduces an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality and offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.
Abstract: In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.
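
In code, "separate and recombine content and style" reduces to two losses over CNN activations: match raw features for content and feature correlations (Gram matrices) for style. A minimal PyTorch sketch (illustrative, not the authors' implementation; the Gram normalization constant varies across implementations):

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise correlations of a (N, C, H, W) activation map."""
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_loss(gen_feat: torch.Tensor, content_feat: torch.Tensor) -> torch.Tensor:
    # Content: match raw activations at a chosen layer.
    return F.mse_loss(gen_feat, content_feat)

def style_loss(gen_feat: torch.Tensor, style_feat: torch.Tensor) -> torch.Tensor:
    # Style: match second-order statistics of the activations.
    return F.mse_loss(gram_matrix(gen_feat), gram_matrix(style_feat))
```

The stylized image is then obtained by iteratively optimizing the pixels of an image to minimize a weighted sum of these two losses.
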

1,019 citations


"Artistic Style Transfer for Videos" refers background or result in this paper

  • ...[3] and extends style transfer to video sequences....


  • ...[3] showed remarkable results by using the VGG-19 deep neural network for style transfer....


  • ...[3] proposed a novel approach using neural networks to capture the style of artistic images and transfer it to real world photographs....
