FiLM: Visual Reasoning with a General Conditioning Layer

Posted Content (Open Access)

Ethan Perez, +5 more

- 22 Sep 2017 -

arXiv: Computer Vision and Pattern Recog...

TLDR

Feature-wise Linear Modulation (FiLM) is a general-purpose conditioning method for neural networks that influences network computation via a simple, feature-wise affine transformation based on conditioning information.

Abstract:

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
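
The feature-wise affine transformation described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the usual formulation in which each feature map is scaled by a coefficient gamma and shifted by a coefficient beta, both predicted from the conditioning input. The names `film` and `film_generator`, the single linear map used as the generator, and all tensor shapes are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def film(features, gamma, beta):
    """Feature-wise affine modulation: gamma * features + beta.

    features: array of shape (batch, channels, height, width)
    gamma, beta: arrays of shape (batch, channels), one pair per feature map
    """
    # Broadcast each per-channel coefficient over the spatial dimensions.
    return gamma[:, :, None, None] * features + beta[:, :, None, None]


def film_generator(cond, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical generator: a linear map from the conditioning vector
    to per-channel (gamma, beta). Assumed here only to keep the sketch short."""
    gamma = cond @ w_gamma + b_gamma
    beta = cond @ w_beta + b_beta
    return gamma, beta


rng = np.random.default_rng(0)
batch, channels, height, width, cond_dim = 2, 4, 3, 3, 5

features = rng.normal(size=(batch, channels, height, width))
cond = rng.normal(size=(batch, cond_dim))  # e.g. an encoded question
w_gamma = rng.normal(size=(cond_dim, channels))
w_beta = rng.normal(size=(cond_dim, channels))

gamma, beta = film_generator(
    cond, w_gamma, np.ones(channels), w_beta, np.zeros(channels)
)
modulated = film(features, gamma, beta)
print(modulated.shape)  # (2, 4, 3, 3)
```

With gamma fixed to one and beta to zero the layer reduces to the identity, so the conditioning input can leave a feature map untouched, rescale it, negate it, or switch it off entirely.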

Citations

Journal Article

Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Oriol Vinyals, +41 more

- 30 Oct 2019 -

Nature

TL;DR: AlphaStar, an agent trained with a multi-agent reinforcement learning algorithm, reached Grandmaster level in the real-time strategy game StarCraft II, ranking among the top 0.2% of human players.

Posted Content

Large Scale GAN Training for High Fidelity Natural Image Synthesis

Andrew Brock, +2 more

- 28 Sep 2018 -

arXiv: Learning

TL;DR: BigGAN applies orthogonal regularization to the generator, allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the generator's input, and yields models that set a new state of the art in class-conditional image synthesis.

Proceedings Article

TADAM: Task dependent adaptive metric for improved few-shot learning

Boris N. Oreshkin, +2 more

TL;DR: This work identifies metric scaling and metric task conditioning as important for improving the performance of few-shot algorithms, and proposes and empirically tests a practical end-to-end optimization procedure based on auxiliary task co-training to learn a task-dependent metric space.

Proceedings Article

Meta-Transfer Learning for Few-Shot Learning

Qianru Sun, +3 more

TL;DR: This paper proposes a meta-transfer learning approach that adapts a base learner to a new task for which only a few labeled samples are available by learning scaling and shifting functions of DNN weights for each task.

Posted Content

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

Saining Xie, +4 more

- 13 Dec 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: The paper shows that many of the 3D convolutions can be replaced by low-cost 2D convolutions, suggesting that temporal representation learning on high-level “semantic” features is more useful.
