Posted Content

Inductive Biases for Deep Learning of Higher-Level Cognition

TL;DR: This work considers a larger list of inductive biases that humans and animals exploit, focusing on those which concern mostly higher-level and sequential conscious processing, and suggests they could potentially help build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization.
Abstract: A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopedic list of heuristics). If that hypothesis was correct, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis would suggest that studying the kind of inductive biases that humans and animals exploit could help both clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing. The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization, which is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.
Citations
Journal Article
TL;DR: Prospect Theory led cognitive psychology in a new direction that began to uncover other human biases in thinking that are probably not learned but are part of the brain's wiring.
Abstract: In 1974 an article appeared in Science magazine with the dry-sounding title “Judgment Under Uncertainty: Heuristics and Biases” by a pair of psychologists who were not well known outside their discipline of decision theory. In it Amos Tversky and Daniel Kahneman introduced the world to Prospect Theory, which mapped out how humans actually behave when faced with decisions about gains and losses, in contrast to how economists assumed that people behave. Prospect Theory turned Economics on its head by demonstrating through a series of ingenious experiments that people are much more concerned with losses than they are with gains, and that framing a choice from one perspective or the other will result in decisions that are exactly the opposite of each other, even if the outcomes are monetarily the same. Prospect Theory led cognitive psychology in a new direction that began to uncover other human biases in thinking that are probably not learned but are part of our brain’s wiring.

4,351 citations

Journal ArticleDOI
TL;DR: Megen et al. as discussed by the authors presented a multi-stage 3D computer-aided detection and diagnosis (CAD) model for automated localization of clinically significant prostate cancer (csPCa) in bi-parametric MR imaging (bpMRI).

53 citations

Journal ArticleDOI
07 Jul 2021-Neuron
TL;DR: In this paper, dual-process theories of cognition have been used to explain human cognitive abilities, such as semantic understanding of world structure, logical reasoning, and communication via language, and they have been integrated with global workspace theory to explain the dynamic relay of information products between the two systems.

33 citations

Journal ArticleDOI
John Essington
TL;DR: Dehaene, as discussed by the author, used a quote from MIT President L. Rafael Reif: "If we don't know how to learn, we can't learn how to adapt."
Abstract: French neuroscientist Stanislas Dehaene begins his newest book How we learn: why brains learn better than any machine … for now with a quote from MIT President L. Rafael Reif: “If we don’t know how...

33 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
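The residual reformulation the abstract describes is compact enough to sketch. Below is a minimal, illustrative residual block in PyTorch (an assumed framework, not the authors' original implementation): the layers learn a residual function F(x), and an identity shortcut adds it back to the input, so the block outputs x + F(x).

```python
# Minimal residual block sketch (illustrative; real ResNets also use
# downsampling shortcuts and bottleneck blocks, omitted here).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(x + out)  # identity shortcut: output = x + F(x)

block = ResidualBlock(64)
y = block(torch.randn(1, 64, 56, 56))  # spatial and channel dimensions preserved
```

Stacking many such blocks yields the very deep networks evaluated in the paper.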

123,388 citations

Proceedings Article
03 Dec 2012
TL;DR: In this paper, the authors trained a large, deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieving state-of-the-art performance on ImageNet classification.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
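For orientation, the layer layout described in the abstract can be approximated as below. This is a simplified, single-device PyTorch sketch under stated assumptions (the original model split channels across two GPUs and used local response normalization, both omitted here); the layer sizes follow the published architecture.

```python
# Rough sketch of the described architecture: five convolutional layers
# (some followed by max-pooling), three fully connected layers with dropout,
# and a final 1000-way classifier (softmax is applied by the loss function).
import torch
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(), nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)

logits = alexnet_like(torch.randn(1, 3, 227, 227))  # -> shape (1, 1000)
```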

73,978 citations

Journal ArticleDOI
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
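A single update step of the LSTM the abstract describes can be sketched as follows (an illustrative PyTorch version, not the original implementation; it also includes a forget gate, which was added to the LSTM after this paper). The multiplicative gates open and close access to the cell state, and the additive cell-state update is the constant-error-carousel mechanism behind the O(1) per-step cost per weight.

```python
# One LSTM time step (illustrative sketch). i/f/o are multiplicative gates;
# the additive update of the cell state c acts as the constant error carousel.
import torch

def lstm_step(x, h, c, W, U, b):
    # W: (4*hidden, input_dim), U: (4*hidden, hidden), b: (4*hidden,)
    gates = W @ x + U @ h + b
    i, f, g, o = gates.chunk(4)            # input gate, forget gate, candidate, output gate
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c_new = f * c + i * g                  # additive cell-state update
    h_new = o * torch.tanh(c_new)          # gated read-out of the cell
    return h_new, c_new

hidden, inp = 32, 16
W, U, b = torch.randn(4 * hidden, inp), torch.randn(4 * hidden, hidden), torch.zeros(4 * hidden)
h = c = torch.zeros(hidden)
h, c = lstm_step(torch.randn(inp), h, c, W, U, b)
```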

72,897 citations


"Inductive Biases for Deep Learning ..." refers methods in this paper

  • ..., 2017), parameter sharing (Hochreiter and Schmidhuber, 1997; Pham et al., 2018), implicit effects of the choice of the optimization method, or choices of prior distributions in a Bayesian model....


  • ...…architectural constraints (Yu and Koltun, 2015; Long et al., 2015; Dumoulin and Visin, 2016; He et al., 2016; Huang et al., 2017), parameter sharing (Hochreiter and Schmidhuber, 1997; Pham et al., 2018), implicit effects of the choice of the optimization method, or choices of prior distributions…...


Proceedings Article
12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieved state-of-the-art performance on English-to-French translation.
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single-model state-of-the-art by 0.7 BLEU, achieving a BLEU score of 41.1.
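The attention mechanism at the core of this architecture reduces to scaled dot-product attention, sketched below in a simplified single-head form (an illustrative PyTorch version; the full model stacks multi-head attention and feed-forward layers in an encoder-decoder arrangement, omitted here).

```python
# Scaled dot-product attention (single head, illustrative sketch).
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Q, K, V: (..., seq_len, d_k); weights over keys sum to 1 for each query.
    scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ V

q = k = v = torch.randn(2, 10, 64)           # batch of 2, 10 tokens, d_k = 64
out = scaled_dot_product_attention(q, k, v)  # -> shape (2, 10, 64)
```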

52,856 citations

Proceedings ArticleDOI
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.
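As a practical aside (not part of the paper, which predates these libraries), an ImageNet-style dataset is commonly stored as one directory of images per WordNet synset ID, and that layout can be loaded with torchvision roughly as follows; the directory path here is hypothetical.

```python
# Loading an ImageNet-style directory tree (one folder per synset) with
# torchvision. Expects a hypothetical layout like:
#   imagenet/train/<synset_id>/<image>.JPEG
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("imagenet/train", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

images, labels = next(iter(loader))  # images: (64, 3, 224, 224); labels index the synset folders
```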

49,639 citations


"Inductive Biases for Deep Learning ..." refers background in this paper

  • ...Many machine learning systems have achieved excellent accuracy across a variety of tasks (Deng et al., 2009; Mnih et al., 2013; Schrittwieser et al., 2019), yet the question of whether their reasoning or judgement is correct has come under question, and answers seem to be wildly inconsistent,…...


  • ...Is 100% accuracy on the test set enough? Many machine learning systems have achieved excellent accuracy across a variety of tasks (Deng et al., 2009; Mnih et al., 2013; Schrittwieser et al., 2019), yet the question of whether their reasoning or judgement is correct has come under question, and answers seem to be wildly inconsistent, depending on the task, architecture, training data, and interestingly, the extent to which test conditions match the training distribution....
