
Showing papers by "Geoffrey E. Hinton published in 2021"


Journal ArticleDOI
TL;DR: This paper asks how neural networks can learn the rich internal representations required for difficult tasks such as recognizing objects or understanding language.
Abstract: How can neural networks learn the rich internal representations required for difficult tasks such as recognizing objects or understanding language?

294 citations


Posted Content
TL;DR: The authors present GLOM, an imaginary system that combines advances from transformers, neural fields, contrastive representation learning, distillation, and capsules. The paper does not describe a working system; instead it presents a single idea about representation that allows these advances, made by several different groups, to be combined.
Abstract: This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy which has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.
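The "islands of identical vectors" idea from the abstract can be illustrated with a toy sketch. The helper below (`find_islands`, a hypothetical name, not from the paper) groups adjacent locations whose embedding vectors at one level nearly agree into an island; each island stands in for one node of the parse tree at that level.

```python
import numpy as np

def find_islands(level_vectors, tau=0.95):
    """Group adjacent locations whose level embeddings nearly agree
    (cosine similarity > tau) into 'islands' -- a stand-in for nodes
    of GLOM's parse tree at that level.

    level_vectors: (num_locations, dim) array of embeddings along a
    1-D strip of image locations (a simplification of the 2-D case).
    """
    norms = level_vectors / np.linalg.norm(level_vectors, axis=1, keepdims=True)
    islands, current = [], [0]
    for i in range(1, level_vectors.shape[0]):
        if norms[i] @ norms[i - 1] > tau:
            current.append(i)        # still inside the same island
        else:
            islands.append(current)  # agreement broke: close the island
            current = [i]
    islands.append(current)
    return islands
```

With three locations sharing one vector and two sharing another, the sketch recovers two islands, i.e. two parse-tree nodes.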

39 citations


Posted Content
Ting Chen1, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey E. Hinton 
TL;DR: Pix2Seq casts object detection as a language modeling task conditioned on the observed pixel inputs: object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and a neural network is trained to perceive the image and generate the desired sequence.
Abstract: This paper presents Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we simply cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural net to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural net knows about where and what the objects are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset, compared to highly specialized and well optimized detection algorithms.
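The token encoding described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the bin count, and the layout (four quantized coordinates followed by a class token) are assumptions for the sketch.

```python
import numpy as np

def box_to_tokens(box, label, num_bins=500, image_size=640):
    """Quantize a bounding box (xmin, ymin, xmax, ymax, in pixels) into
    discrete coordinate tokens, then append a class-label token."""
    coords = np.asarray(box, dtype=np.float64)
    # Map each pixel coordinate into one of `num_bins` integer bins.
    tokens = np.round(coords / image_size * (num_bins - 1)).astype(int).tolist()
    # Class labels occupy the token range after the coordinate bins.
    tokens.append(num_bins + label)
    return tokens

def tokens_to_box(tokens, num_bins=500, image_size=640):
    """Invert the quantization (lossy, up to half a bin width)."""
    coords = [t / (num_bins - 1) * image_size for t in tokens[:4]]
    label = tokens[4] - num_bins
    return coords, label
```

Once boxes are token sequences, detection reduces to next-token prediction, which is the sense in which the paper "casts object detection as a language modeling task".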

4 citations



Proceedings Article
03 May 2021
TL;DR: This paper proposes a flexible teaching framework using commentaries, meta-learned information helpful for training on a particular task or dataset, and explores diverse applications of commentaries, from learning weights for individual training examples, to parameterising label-dependent data augmentation policies, to representing attention masks that highlight salient image regions.
Abstract: Effective training of deep neural networks can be challenging, and there remain many open questions on how to best learn these models. Recently developed methods to improve neural network training examine teaching: providing learned information during the training process to improve downstream model performance. In this paper, we take steps towards extending the scope of teaching. We propose a flexible teaching framework using commentaries, meta-learned information helpful for training on a particular task or dataset. We present an efficient and scalable gradient-based method to learn commentaries, leveraging recent work on implicit differentiation. We explore diverse applications of commentaries, from learning weights for individual training examples, to parameterising label-dependent data augmentation policies, to representing attention masks that highlight salient image regions. In these settings, we find that commentaries can improve training speed and/or performance and also provide fundamental insights about the dataset and training process.
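One application named in the abstract, commentaries as learned weights for individual training examples, can be sketched in a few lines. The function below is a hypothetical illustration of only the inner-loop side (rescaling each example's loss by a commentary-derived weight); the meta-learning of the commentary parameters via implicit differentiation is not shown.

```python
import numpy as np

def commentary_weighted_loss(per_example_losses, commentary_logits):
    """Rescale each training example's loss by a weight derived from
    commentary parameters (here, one logit per example), normalised so
    the weights average to 1. Uniform logits recover the plain mean."""
    w = np.exp(commentary_logits - commentary_logits.max())  # stable softmax
    w = w / w.sum() * len(w)
    return float((w * per_example_losses).mean())
```

In the full framework the commentary logits would themselves be optimized in an outer loop so that down-weighting or up-weighting particular examples improves validation performance.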