Open Access · Posted Content

Dynamic Inference with Neural Interpreters

TLDR
Neural Interpreters, as presented in this paper, factorize inference in a self-attention network into a system of modules called "functions"; inputs to the model are routed through a sequence of these functions in a way that is learned end-to-end.
Abstract
Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they are less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge. In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call "functions". Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training. To demonstrate the versatility of Neural Interpreters, we evaluate it in two distinct settings: image classification and visual abstract reasoning on Raven Progressive Matrices. In the former, we show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner. In the latter, we find that Neural Interpreters are competitive with the state-of-the-art in terms of systematic generalization.
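As a rough illustration of the mechanism the abstract describes, the sketch below routes tokens through a small set of "function" modules via compatibility between per-token types and learned per-function signatures. The class names, layer choices, and hyperparameters are illustrative assumptions based only on the abstract, not the authors' implementation.

```python
# Hedged sketch: soft, end-to-end learned routing of tokens through "functions".
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionModule(nn.Module):
    """One 'function': a small attention + MLP block with a learned signature vector."""
    def __init__(self, dim, sig_dim):
        super().__init__()
        self.signature = nn.Parameter(torch.randn(sig_dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        h = x + self.attn(self.norm1(x), self.norm1(x), self.norm1(x))[0]
        return h + self.mlp(self.norm2(h))

class RoutedLayer(nn.Module):
    """Weights each function's output by how well the token's inferred type matches its signature."""
    def __init__(self, dim, sig_dim, num_functions):
        super().__init__()
        self.type_inference = nn.Linear(dim, sig_dim)  # infers a 'type' per token
        self.functions = nn.ModuleList(FunctionModule(dim, sig_dim) for _ in range(num_functions))

    def forward(self, x):                                               # x: (B, N, dim)
        types = self.type_inference(x)                                  # (B, N, sig_dim)
        sigs = torch.stack([f.signature for f in self.functions])       # (F, sig_dim)
        weights = F.softmax(types @ sigs.t(), dim=-1)                   # (B, N, F) routing weights
        outputs = torch.stack([f(x) for f in self.functions], dim=-1)   # (B, N, dim, F)
        return (outputs * weights.unsqueeze(2)).sum(-1)                 # soft, learned routing
```

Stacking several such layers composes computation along depth, while adding functions to a layer extends it along width.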


References
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
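The core operation of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # pairwise query-key similarities
    return F.softmax(scores, dim=-1) @ v           # weighted sum of values
```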
Proceedings Article

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
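A minimal sketch of that fine-tuning recipe, assuming a generic pretrained bidirectional encoder (the encoder and hidden size here are placeholders, not a specific library API):

```python
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Pretrained encoder + the single task-specific output layer described above."""
    def __init__(self, encoder, hidden_dim, num_labels):
        super().__init__()
        self.encoder = encoder                            # assumed: returns per-token hidden states
        self.output = nn.Linear(hidden_dim, num_labels)   # the one added output layer

    def forward(self, token_ids):
        hidden = self.encoder(token_ids)                  # (batch, seq_len, hidden_dim)
        cls = hidden[:, 0]                                # representation of the leading [CLS] token
        return self.output(cls)                           # task logits
```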
Posted Content

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
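The "16x16 words" idea amounts to splitting an image into fixed-size patches and linearly projecting each patch into a token embedding, so a standard Transformer can consume the result as a sequence. A short sketch with illustrative hyperparameters:

```python
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, patch_size=16, in_channels=3, dim=768):
        super().__init__()
        # A strided convolution is equivalent to a per-patch linear projection.
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, images):                     # (B, 3, H, W)
        patches = self.proj(images)                # (B, dim, H/16, W/16)
        return patches.flatten(2).transpose(1, 2)  # (B, num_patches, dim): a sequence of "words"
```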

Reading Digits in Natural Images with Unsupervised Feature Learning

TL;DR: A new benchmark dataset for research use is introduced, containing over 600,000 labeled digits cropped from Street View images, and variants of two recently proposed unsupervised feature learning methods are employed and found to be convincingly superior on these benchmarks.
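For reference, one common way to access this benchmark today is through torchvision; this is purely an illustration of using the dataset, not part of the original paper:

```python
from torchvision import datasets, transforms

# 32x32 cropped digit images; the 'extra' split holds the bulk of the labeled digits.
train_set = datasets.SVHN(root="./data", split="train", download=True,
                          transform=transforms.ToTensor())
extra_set = datasets.SVHN(root="./data", split="extra", download=True,
                          transform=transforms.ToTensor())
print(len(train_set), len(extra_set))
```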
Proceedings Article

Dynamic Routing Between Capsules

TL;DR: It is shown that a discriminatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits.
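A minimal sketch of the routing-by-agreement step behind this result; tensor shapes and the iteration count are illustrative, not the exact published implementation:

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Capsule nonlinearity: shrinks short vectors toward 0, long vectors toward unit length."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / (norm_sq.sqrt() + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: prediction vectors of shape (batch, in_caps, out_caps, out_dim)."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)    # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                               # coupling coefficients per input capsule
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)              # weighted sum -> (batch, out_caps, out_dim)
        v = squash(s)                                         # output capsule vectors
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)          # agreement updates the logits
    return v
```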