
Hybrid computing using a neural network with dynamic external memory

Abstract
Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data. When trained with supervised learning, we demonstrate that a DNC can successfully answer synthetic questions designed to emulate reasoning and inference problems in natural language. We show that it can learn tasks such as finding the shortest path between specified points and inferring the missing links in randomly generated graphs, and then generalize these tasks to specific graphs such as transport networks and family trees. When trained with reinforcement learning, a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols. Taken together, our results demonstrate that DNCs have the capacity to solve complex, structured tasks that are inaccessible to neural networks without external read-write memory.


Symbolic Reasoning with Differentiable Neural Computers

Alex Graves*, Greg Wayne*, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwińska, Sergio Gomez, Edward Grefenstette, Tiago Ramalho, John Agapiou, Adrià Puigdomènech Badia, Karl Moritz Hermann, Yori Zwols, Georg Ostrovski, Adam Cain, Helen King, Christopher Summerfield, Phil Blunsom, Koray Kavukcuoglu, Demis Hassabis.

*Joint first authors
Recent breakthroughs demonstrate that neural networks are remarkably adept at sensory processing [1], sequence learning [2, 3] and reinforcement learning [4]. However, cognitive scientists and neuroscientists have argued that neural networks are limited in their ability to define variables and data structures [5–9], store data over long time scales without interference [10, 11], and manipulate it to solve tasks. Conventional computers, on the other hand, can easily be programmed to store and process large data structures in memory, but cannot learn to recognise complex patterns. This work aims to combine the advantages of neural and computational processing by providing a neural network with read-write access to an external memory. We refer to the resulting architecture as a Differentiable Neural Computer (DNC). Memory access is sparse, minimising interference among memoranda and enabling long-term storage [12, 13], and the entire system can be trained with gradient descent, allowing the network to learn how to operate and organise the memory in a goal-directed manner. We demonstrate DNC's ability to manipulate large data structures by applying it to a set of synthetic question-answering tasks involving graphs, such as finding shortest paths and inferring missing links. We then show that DNC can learn, based solely on behavioural reinforcement [14, 15], to carry out complex symbolic instructions in a game environment [16]. Taken together, these results suggest that DNC is a promising model for tasks requiring a combination of pattern recognition and symbol manipulation, such as question-answering and memory-based reinforcement learning.
Modern computers separate computation and memory. Computation is performed by a processor, which can use an addressable memory to bring operands in and out of play. This confers on the computer two important properties: it provides extensible storage to write new information as it arrives and the ability to treat the contents of memory locations as variables. Variables are critical to algorithm generality: to perform the same procedure on one datum or another, an algorithm merely has to change the address it looks up or the content of the address. In contrast to computers, the computational and memory resources of artificial neural networks are mixed together in the network weights and neuron activity. This is a major liability: as the memory demands of a task increase, these networks cannot allocate new storage dynamically, nor easily learn algorithms that act independently of the values realised by the task variables.
The Differentiable Neural Computer (DNC) is a neural network coupled to an external memory matrix (Figure 1). The behaviour of the controller network is independent of the memory size as long as the memory is not filled to capacity, which is why we view the memory as "external". If the memory can be thought of as DNC's RAM, then the network, referred to as the controller, is a CPU whose operations are learned. DNCs differ from recent neural memory frameworks [17, 18] in that the memory can be selectively written to as well as read, allowing iterative modification of memory content. An earlier form of DNC, the Neural Turing Machine [19], had a similar structure but less flexible memory access methods (Methods).
While conventional computers use unique addresses to access memory contents, DNC uses differentiable attention mechanisms [2, 19–22] to define distributions over the rows, or locations, in the memory matrix. These distributions, which we call weightings, represent the degree to which each location is involved in a read or write operation, and are typically very sparse in a trained system. For example, the read vector r returned by weighting w over memory M is simply a weighted sum over the N memory locations: r = Σ_{i=1}^{N} M[i, ·] w[i]. The functional units that determine and apply the weightings are called read and write heads. Crucially, the heads are differentiable, allowing the complete system to learn by gradient descent.
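As a concrete illustration, the weighted read above can be sketched in a few lines of NumPy. The sizes and values here are made up for the example; they are not taken from the paper.

```python
import numpy as np

# Memory matrix M: N locations (rows), each a vector of width W.
N, W = 4, 3
M = np.arange(N * W, dtype=float).reshape(N, W)

# A read weighting w is a distribution over the N locations;
# in a trained system it is typically very sparse.
w = np.array([0.0, 0.9, 0.1, 0.0])

# The read vector is the weighting-weighted sum of memory rows:
# r = sum_i M[i, :] * w[i]
r = M.T @ w  # shape (W,)
```

Because the read is a weighted sum rather than a hard lookup, it is differentiable with respect to both the weighting and the memory contents, which is what allows gradients to flow through memory access.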
The heads employ three distinct forms of attention. The first is content lookup [19, 20, 23–25], in which a key emitted by the controller is compared to the content of each location in memory according to a similarity measure (here: cosine similarity). The similarity scores determine a weighting that can be used by the read heads for associative recall [26] or by the write head to modify an existing vector in memory. Importantly, a key that only partially matches the content of a memory location can still be used to attend strongly to that location. This enables key-value retrieval, where the value recovered by reading the memory location includes additional information not present in the key. Key-value retrieval provides a rich mechanism for navigating associative data structures in the external memory, as the content of one address can effectively encode references to other addresses. In our experiments, this proved essential to processing graph data.
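Content lookup of this kind can be sketched as cosine similarity followed by a sharpened softmax. The sharpness scalar `beta` and the specific values below are illustrative assumptions, not the paper's parameterisation.

```python
import numpy as np

def content_weighting(M, key, beta):
    """Content-based addressing sketch: cosine similarity between the
    key and each memory row, sharpened by `beta` and normalised to a
    distribution over locations."""
    eps = 1e-8  # avoid division by zero for empty rows
    sims = (M @ key) / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + eps)
    scores = np.exp(beta * sims)
    return scores / scores.sum()

M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
key = np.array([1.0, 0.1])  # only a partial match to row 0
w = content_weighting(M, key, beta=10.0)  # still attends strongly to row 0
```

Note that the partially matching key still concentrates most of the weighting on the best-matching row, which is the property that makes key-value retrieval possible: the row read back can carry information the key did not contain.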
A second attention mechanism records transitions between consecutively written locations in an N × N temporal link matrix L (Figure 1d). L[i, j] is close to one if i was the next location written after j, and is close to zero otherwise. For any weighting w, the operation Lw smoothly shifts the focus forward to the locations written after those emphasised in w, while L⊤w shifts the focus backward. This gives DNC the native ability to recover sequences in the order in which they were presented.

Figure 1: DNC Architecture. a: A recurrent controller network receives input from an external data source and produces output. b & c: The controller also outputs vectors that parameterise one write head (green) and multiple read heads (two in this case: blue and pink). The heads define weightings that selectively focus on the rows, or locations, in the memory matrix (stronger colour for higher weight). The read vectors returned by the read heads are passed to the controller at the next time step. d: A temporal link matrix records the order in which locations were written; here, we represent that order using directed arrows. The grey arrows indicate a write event that was split between two locations.
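The forward and backward shifts can be illustrated with a hard (0/1) link matrix. In a trained DNC the entries of L are soft and built up from write weightings; the hand-written matrix below is only a toy to show what Lw and L⊤w do.

```python
import numpy as np

# Toy write order over 3 locations: 0 first, then 2, then 1.
# L[i, j] = 1 means location i was written immediately after location j.
L = np.zeros((3, 3))
L[2, 0] = 1.0  # 2 written after 0
L[1, 2] = 1.0  # 1 written after 2

w = np.array([1.0, 0.0, 0.0])  # read focus on location 0

forward = L @ w         # shifts focus to the location written next (2)
backward = L.T @ forward  # shifts focus back to the previous location (0)
```

Iterating the forward shift replays the memory in write order, which is how the architecture recovers sequences as presented.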
The third form of attention allocates memory for writing. The usage of each location is represented as a number between zero and one. Based on the usages, a weighting over unused locations is delivered to the write head. As well as automatically increasing with each write to a location, usage can be decreased after each read using the free gates. This allows the controller to reallocate memory that is no longer required (Supplementary Figure 3). As a consequence of its allocation mechanism, DNC can be trained to solve a task using one size of memory and later be upgraded to a larger memory without retraining and without any impact on performance (Supplementary Figure 1). This property would also make it possible to use an unbounded external memory by automatically increasing the number of locations every time the usage of all locations passes a certain threshold.
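One simple way to turn per-location usage into a weighting over unused locations, consistent with the description above, is to rank locations by usage and give the most weight to the freest ones. This is a sketch of the idea, not a claim about the exact formula used in the paper.

```python
import numpy as np

def allocation_weighting(usage):
    """Allocation sketch: locations with low usage (in [0, 1]) receive
    the most allocation weight, with later candidates discounted by the
    usage of the freer locations ahead of them."""
    order = np.argsort(usage)            # "free list": least-used first
    sorted_u = usage[order]
    # discount each candidate by the product of usages of freer locations
    prod = np.concatenate(([1.0], np.cumprod(sorted_u)[:-1]))
    a_sorted = (1.0 - sorted_u) * prod
    a = np.zeros_like(usage)
    a[order] = a_sorted                  # scatter back to original order
    return a

u = np.array([0.9, 0.1, 0.5])
a = allocation_weighting(u)  # concentrates on location 1, the least used
```

Because the weighting is a smooth function of the usages, allocation stays differentiable; and since it depends only on usage ranks and values, the same trained controller can operate over a larger memory without retraining.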
Although the design of DNC was motivated largely by computational considerations, we cannot resist drawing some connection between the attention mechanisms and the mammalian hippocampus' functional capabilities. DNC memory modification is fast and can be one-shot, resembling the associative long-term potentiation of hippocampal CA3 and CA1 synapses [27]. The hippocampal dentate gyrus, a region known to support neurogenesis [28], has been proposed to in-

References

- ImageNet Classification with Deep Convolutional Neural Networks. Proceedings article.
- Long short-term memory. Journal article.
- Visualizing Data using t-SNE. Journal article.
- Human-level control through deep reinforcement learning. Journal article.
- Sequence to Sequence Learning with Neural Networks. Proceedings article.