
ORIGINAL RESEARCH
published: 21 March 2017
doi: 10.3389/fncom.2017.00013
Frontiers in Computational Neuroscience | www.frontiersin.org 1 March 2017 | Volume 11 | Article 13
Edited by: Marcel van Gerven, Radboud University Nijmegen, Netherlands
Reviewed by: Michael W. Spratling, King’s College London, UK; Kandan Ramakrishnan, University of Amsterdam, Netherlands
*Correspondence: Alberto Testolin (alberto.testolin@unipd.it); Marco Zorzi (marco.zorzi@unipd.it)
Received: 30 November 2016; Accepted: 27 February 2017; Published: 21 March 2017
Citation: Testolin A, De Filippo De Grazia M and Zorzi M (2017) The Role of Architectural and Learning Constraints in Neural Network Models: A Case Study on Visual Space Coding. Front. Comput. Neurosci. 11:13. doi: 10.3389/fncom.2017.00013
The Role of Architectural and Learning Constraints in Neural Network Models: A Case Study on Visual Space Coding

Alberto Testolin¹*, Michele De Filippo De Grazia¹ and Marco Zorzi¹,²*

¹ Department of General Psychology and Padova Neuroscience Center, University of Padova, Padova, Italy; ² San Camillo Hospital IRCCS, Venice, Italy
The recent “deep learning revolution” in artificial neural networks has had a strong impact and widespread deployment in engineering applications, but the use of deep learning
for neurocomputational modeling has been so far limited. In this article we argue
that unsupervised deep learning represents an important step forward for improving
neurocomputational models of perception and cognition, because it emphasizes the role
of generative learning as opposed to discriminative (supervised) learning. As a case study,
we present a series of simulations investigating the emergence of neural coding of visual
space for sensorimotor transformations. We compare different network architectures
commonly used as building blocks for unsupervised deep learning by systematically
testing the type of receptive fields and gain modulation developed by the hidden
neurons. In particular, we compare Restricted Boltzmann Machines (RBMs), which are
stochastic, generative networks with bidirectional connections trained using contrastive
divergence, with autoencoders, which are deterministic networks trained using error
backpropagation. For both learning architectures we also explore the role of sparse
coding, which has been identified as a fundamental principle of neural computation. The
unsupervised models are then compared with supervised, feed-forward networks that
learn an explicit mapping between different spatial reference frames. Our simulations
show that both architectural and learning constraints strongly influenced the emergent
coding of visual space in terms of distribution of tuning functions at the level of single
neurons. Unsupervised models, and particularly RBMs, were found to more closely
adhere to neurophysiological data from single-cell recordings in the primate parietal
cortex. These results provide new insights into how basic properties of artificial neural
networks might be relevant for modeling neural information processing in biological
systems.
Keywords: connectionist modeling, unsupervised deep learning, restricted Boltzmann machines, autoencoders,
sparseness, space coding, gain modulation, sensorimotor transformations

Testolin et al. Architecture and Learning Shape Space Coding
INTRODUCTION
Artificial neural network models aim at explaining human
cognition and behavior in terms of the emergent consequences
of a large number of simple, subcognitive processes (McClelland
et al., 2010). Within this framework, the pattern seen in
overt behavior (macroscopic dynamics of the system) reflects
the coordinated operations of simple biophysical mechanisms
(microscopic dynamics of the system), such as the propagation
of activation and inhibition among elementary processing units.
Though this general tenet is shared by all connectionist models, there is large variability in processing architectures and learning algorithms, which translates into varying degrees of psychological and biological realism (e.g., Thorpe and Imbert, 1989; O’Reilly, 1998).
When the aim is to investigate high-level cognitive functions, simplification is essential (McClelland, 2009) and the underlying processing mechanisms do not need to faithfully implement the neuronal circuits supposed to carry out such functions in the brain. However, modelers should strive to consider biological plausibility if this can bridge different levels of description (Testolin and Zorzi, 2016).
Recent theoretical and technical progress in artificial neural networks has significantly expanded the range of tasks that can be solved by machine intelligence. In particular, the advent of powerful parallel computing architectures based on Graphic Processing Units (GPUs), coupled with the availability of “big data,” has made it possible to create and train large-scale, hierarchical neural networks known as deep neural networks (LeCun et al., 2015, for review). These powerful learning systems achieve impressive performance in many challenging cognitive tasks, such as visual object recognition (Krizhevsky et al., 2012), speech processing (Mohamed et al., 2012), and natural language understanding (Collobert et al., 2011). However, while the impact of deep learning for engineering applications is undisputed, its relevance for modeling neural information processing in biological systems still needs to be fully evaluated (for seminal attempts, see Stoianov and Zorzi, 2012; Khaligh-Razavi and Kriegeskorte, 2014; Güçlü and van Gerven, 2015).
One critical aspect of most deep learning systems is the reliance on a feed-forward architecture trained with error backpropagation (Rumelhart et al., 1986), which has been repeatedly shown to yield state-of-the-art performance in a variety of problems (LeCun et al., 2015). However, the assumptions that learning is largely discriminative (e.g., classification or function learning) and that an external teaching signal is always available at each learning event (i.e., all training data is “labeled”) are clearly implausible from both a cognitive and a biological perspective (Zorzi et al., 2013; Cox and Dean, 2014). Reinforcement learning is a valuable alternative, and it has already shown promising results when combined with deep learning (Mnih et al., 2015; Silver et al., 2016), but there is a broad range of situations where learning seems to be fully unsupervised and its only objective is that of discovering the latent structure of the input data in order to build rich, internal representations of the environment (Hinton and Sejnowski, 1999). We argue that more realistic neurocognitive models should therefore also exploit unsupervised forms of deep learning, where the objective is not to explicitly classify the input patterns but rather to discover internal representations by fitting a hierarchical generative model to the sensory data (Hinton, 2007, 2013; Zorzi et al., 2013). Compared to its supervised counterpart, this modeling approach emphasizes the role of feedback, recurrent connections (Sillito et al., 2006), which carry top-down expectations that are gradually adjusted to better reflect the observed data (Hinton and Ghahramani, 1997; Friston, 2010) and which can be used to implement concurrent probabilistic inference along the whole cortical hierarchy (Lee and Mumford, 2003; Gilbert and Sigman, 2007). Notably, top-down processing is also relevant for understanding attentional mechanisms in terms of modulation of neural information processing (Kastner and Ungerleider, 2000).
A powerful class of stochastic neural networks that learn a generative model of the data is that of Restricted Boltzmann Machines (RBMs), which can efficiently discover internal representations (i.e., latent features) using Hebbian-like learning mechanisms (Hinton, 2002). RBMs constitute the building block of hierarchical generative models such as Deep Belief Networks (Hinton and Salakhutdinov, 2006) and Deep Boltzmann Machines (Salakhutdinov, 2015). These unsupervised deep learning models have been successfully used to simulate a variety of cognitive functions, such as numerosity perception (Stoianov and Zorzi, 2012), letter perception (Testolin et al., under review), location-invariant visual word recognition (Di Bono and Zorzi, 2013), and visual hallucinations in psychiatric syndromes (Reichert et al., 2013). A similar approach has been used to simulate how early visual cortical representations are adapted to statistical regularities in natural images, in order to predict single voxel responses to natural images and identify images from stimulus-evoked multiple voxel responses (Güçlü and van Gerven, 2014). A temporal extension of RBMs has also been recently used to model sequential orthographic processing and spontaneous pseudoword generation (Testolin et al., 2016).
Unsupervised deep learning can be implemented using an alternative architecture based on autoencoders (Bengio et al., 2007), which are deterministic, feed-forward networks whose learning goal is to accurately reconstruct the input data into a separate layer of output units. Single-layer autoencoders are trained using error backpropagation, and can be stacked in order to build more complex, multi-layer architectures. However, despite the common view that RBMs and autoencoders could be considered equivalent (Ranzato et al., 2007), we note that their underlying architectural and learning assumptions are significantly different. In this study we empirically compare RBMs and autoencoders in terms of the type of internal encoding emerging in the hidden neurons. Moreover, we investigate how additional learning constraints, such as sparsity and limitation of computational resources (i.e., hidden layer size), could influence the representations developed by the networks. As a case study, we focus on the problem of learning visuospatial coding for sensorimotor transformations, which is a prominent example of how the emergentist approach based on learning in artificial neural networks has offered important insights into the computations performed by biological neurons (Zipser and Andersen, 1988).

Sensorimotor transformations refer to the process by which sensory stimuli are converted into motor commands. For example, reaching requires mapping visual information, represented in retinal coordinates, into a system of coordinates that is centered on the effector. Coordinate transformations can be accomplished by combining sensory information with extra-retinal information, such as postural signals representing the position of eyes, head, or hand, thereby obtaining abstract representations of the space interposed between the sensory input and the motor output (Pouget and Snyder, 2000). Single-neuron recordings from monkey posterior parietal cortex have shown that the response amplitude of many neurons indeed depends on the position of the eyes, thereby unveiling a fundamental coding principle used to perform this type of signal integration (Andersen et al., 1985). The term gain field was coined to describe this gaze-dependent response of parietal neurons, and since then the notion of gain modulation has been generalized to indicate the multiplicative control of one neuron’s responses by the responses of another set of neurons (Salinas and Thier, 2000). Another fundamental property unveiled by neuronal recordings is that the encoding of space used for coordinate transformations involves a variety of different, complementary frames of reference. For example, although many parietal neurons are centered on retinal coordinates (Andersen et al., 1985; Duhamel et al., 1992), others represent space using body-centered (Snyder et al., 1998) or effector-centered (Sakata et al., 1995) coordinate systems. Moreover, some neurons exhibit multiple gain modulation (Chang et al., 2009), suggesting more complex forms of spatial coding. For example, postural information related to both eye and head positions can be combined in order to encode “gaze direction” (Brotchie et al., 1995; Stricanne et al., 1996; Duhamel et al., 1997).
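The multiplicative character of gain modulation can be illustrated with a toy tuning function. The sketch below (all parameter values are illustrative assumptions, not taken from the recordings discussed above) models a parietal-like neuron as a Gaussian retinal receptive field whose amplitude is scaled by a planar gain field over eye position:

```python
import numpy as np

def gain_modulated_response(stim_pos, eye_pos,
                            pref_pos=0.0, sigma=4.0,
                            gain_slope=0.02, gain_offset=0.5):
    # Gaussian retinal tuning curve, multiplicatively scaled by a
    # planar gain field over eye position (a simple gain-field model).
    tuning = np.exp(-(stim_pos - pref_pos) ** 2 / (2 * sigma ** 2))
    gain = gain_offset + gain_slope * eye_pos
    return tuning * gain

# Same retinal stimulus, two eye positions: the shape of the tuning
# curve is preserved, only its amplitude changes.
r_left = gain_modulated_response(stim_pos=0.0, eye_pos=-10.0)
r_right = gain_modulated_response(stim_pos=0.0, eye_pos=+10.0)
```

Because the eye signal rescales the entire tuning curve rather than shifting it, the ratio of responses at any two retinal positions is identical across eye positions, which is the defining signature of multiplicative gain modulation.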
From a computational perspective, the seminal work of Zipser and Andersen (1988) showed that gain modulation could spontaneously emerge in supervised, feed-forward neural networks trained to explicitly map visual targets into head-centered coordinates, given as input any arbitrary pair of eye and retinal positions. Similar results have been observed using more biologically-plausible learning settings, such as reinforcement learning (Mazzoni et al., 1991) and predictive coding (De Meyer and Spratling, 2011). Note that these learning settings assume that gain modulation emerges because the task requires establishing a mapping between different reference frames. However, it is unclear whether the form of modulation and the distribution of neuronal tuning functions are influenced by the type of learning algorithm and/or by the nature of the learning task (i.e., learning input-output mappings vs. unsupervised learning of internal representations). We also note that a popular alternative framework for modeling sensorimotor transformations is not based on learning, but rather stipulates that parietal neurons represent a set of basis functions that combine visual and postural information (for review, see Pouget and Snyder, 2000).
In summary, space coding represents an interesting case
study for testing the adequacy of different neural network
architectures and learning algorithms, because it provides a
wealth of neurophysiological data (both at the population and
single-neuron levels), and it departs from the classic problem of
visual object recognition investigated in the large majority of deep
learning research.
MATERIALS AND METHODS
In this section we describe the space coding tasks used in our
simulations, including training and test stimuli, the different
learning architectures, and the procedures for analyzing the
emergent neural representations.
Space Coding Tasks
In this study we consider a visual signal in retinotopic coordinates and two different postural signals, one for eye position and another for a generic “effector,” which might represent, for example, the position of the hand. We do not consider the integration between different modalities (see Xing and Andersen, 2000, for a computational investigation of multimodal integration in several coordinate frames). We implemented three types of space coding tasks to test the different learning architectures.
Unsupervised Learning with No Coordinate Transformation
The first learning architecture is depicted in Figure 1A. Unsupervised learning is represented by undirected arrows, which connect the sensory input to a separate layer of hidden neurons. The input signal to the network consists of a visual map, which represents target location in retinotopic coordinates, and two postural maps, which represent eye and effector positions. The learning goal is only to build a compact representation of these input signals in the hidden layer, which is later read out by a simple linear associator in order to establish a mapping with the corresponding motor program. Details of input and output representations are provided in Section Dataset and Stimuli. The unsupervised learning phase does not involve any coordinate transformation because information about the motor program is not available.
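A minimal sketch of such a linear read-out, here fit by ordinary least squares on synthetic data (the matrix shapes and random data are illustrative assumptions, not actual network activations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: H holds hidden-layer activations (patterns x hidden
# units), M the corresponding motor programs (patterns x output units);
# here both are synthetic, linearly related data.
H = rng.random((200, 50))
W_true = rng.standard_normal((50, 30))
M = H @ W_true

# Linear associator: fit read-out weights by ordinary least squares.
W_readout, *_ = np.linalg.lstsq(H, M, rcond=None)
M_pred = H @ W_readout
mse = np.mean((M - M_pred) ** 2)
```

In the simulations the same read-out procedure is applied to each architecture, so decoding accuracy reflects how well the hidden code linearly supports the motor mapping rather than differences in the decoder itself.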
Unsupervised Learning with Coordinate Transformation
The second learning architecture is depicted in Figure 1B. The input signal to the network still consists of a visual map and two postural maps, but in this case we also provide as input the corresponding motor program. In this setting the unsupervised learning phase implicitly involves coordinate transformation (i.e., different coordinate systems become associated). In order to compare the mapping accuracy of different learning architectures using the same method, the motor program is still read out from hidden neurons via a simple linear associator.
Supervised Learning with Coordinate Transformation
The third learning architecture is depicted in Figure 1C, and it corresponds to the model used by Zipser and Andersen (1988). The input is the same as in the unsupervised architecture shown in Figure 1A, but in this case supervised learning (directed arrows) is used to establish an explicit mapping between input signals

FIGURE 1 | Graphical representations of the learning architectures
used to simulate the space coding tasks. Undirected edges entail
bidirectional (recurrent) connections, while directed arrows represent
feed-forward connections. (A) Unsupervised learning with no coordinate
transformation. (B) Unsupervised learning with coordinate transformation. (C)
Supervised learning with coordinate transformation.
and motor programs. As for the previous architectures, accuracy
of the motor program is also tested by read-out from hidden
neurons via linear association.
Dataset and Stimuli
The representation format adopted for the sensory stimuli was the same used in previous computational investigations (Zipser and Andersen, 1988; Pouget and Snyder, 2000; De Filippo De Grazia et al., 2012), which is broadly consistent with neurophysiological data recorded in animals performing tasks involving coordinate transformations (e.g., Andersen et al., 1985).
The visual input to the models consisted of a real-valued vector representing the position of the stimulus as a Gaussian peak of activity in a specific location. These visible neurons simulate the activity of the cortical areas supplying retinotopic sensory information to the posterior parietal cortex. The retinotopic map consisted of a square matrix of 17 × 17 neurons, which employed a population code with Gaussian tuning functions (standard deviation = 4°). Visual receptive fields were uniformly spread between −9° and +9° with increments of 3°, both in the horizontal and vertical dimensions.
Four postural maps, each one consisting of 17 neurons, were used to represent the horizontal and vertical positions of the eye and the effector. These visible neurons used a sigmoid activation function (steepness parameter = 0.125) to represent postural information between −18° and +18°, with steps of 3°.
The motor program consisted of a real-valued vector representing the target position of the stimulus. Similarly to the retinotopic map, it was coded as a square matrix of 25 × 25 neurons, which employed a population code with Gaussian tuning functions to represent target position in coordinates centered on the effector (standard deviation = 6°). Motor programs were uniformly spread between −9° and +9° with increments of 3°, both in the horizontal and vertical dimensions.
In order to create the stimulus dataset, all possible combinations of visual input and postural signals were first generated, and the corresponding motor program (target location) was computed. We then balanced the patterns to ensure that target locations were equally distributed across the motor map, to avoid position biases when decoding the motor program. This resulted in a total of 28,880 patterns, which were randomly split into a training set (20,000 patterns) and an independent test set (8,880 patterns). The latter was used to assess the generalization performance of the models.
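A sketch of how such population-coded stimuli might be generated. The 17 × 17 Gaussian visual map and the 17-unit sigmoid postural maps follow the description above, but the extent of the receptive-field grid and the exact form of the sigmoid code are assumptions of this sketch:

```python
import numpy as np

def gaussian_population_code(target_xy, n=17, lo=-24.0, hi=24.0, sigma=4.0):
    # Visual map: n x n neurons with Gaussian tuning functions; the
    # extent of the receptive-field grid (lo, hi) is an assumed value.
    centers = np.linspace(lo, hi, n)
    cx, cy = np.meshgrid(centers, centers)
    tx, ty = target_xy
    return np.exp(-((cx - tx) ** 2 + (cy - ty) ** 2) / (2 * sigma ** 2))

def sigmoid_postural_code(pos, n=17, lo=-18.0, hi=18.0, steepness=0.125):
    # Postural map: n neurons whose sigmoid activation depends on the
    # signed distance between the position and each neuron's threshold.
    thresholds = np.linspace(lo, hi, n)
    return 1.0 / (1.0 + np.exp(-steepness * (pos - thresholds)))

v_map = gaussian_population_code((0.0, 3.0))  # 17 x 17 retinotopic visual map
eye_h = sigmoid_postural_code(6.0)            # 17-unit horizontal eye-position map
```

The full input pattern for one trial would then be the concatenation of the flattened visual map with the four postural vectors.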
Learning Architectures
Although they differ in several respects, Boltzmann machines and autoencoders can both be defined within the mathematical framework of energy-based models (Ranzato et al., 2007), where the learning objective is to carve the surface of an energy function so as to minimize the energies of training points and maximize the energies of unobserved points. A set of latent variables is used to learn an internal code that can efficiently represent the observed data points, and since the number of latent variables is usually smaller than the number of observed variables, the encoding process can be interpreted as a form of dimensionality reduction (Hinton and Salakhutdinov, 2006). In this unsupervised setting, the model learns the statistical structure of the data without the need for any explicit, external label.
Restricted Boltzmann Machines (RBMs)
Boltzmann machines are stochastic neural networks that use a
set of hidden neurons to model the latent causes of the observed
data vectors, which are presented to the network through a set of
visible neurons (Ackley et al., 1985). In the “restricted” case, the
network connectivity is constrained in order to obtain a bipartite
graph (i.e., there are no connections within the same layer; see
Figure 2A for a graphical representation). The behavior of the
network is driven by an energy function E, which defines the
joint distribution of the hidden and visible neurons by assigning
a probability value to each of their possible configurations:
p(v, h) = e^{−E(v, h)} / Z
where v and h are the column vectors containing the values of
visible and hidden neurons, respectively, and Z is the partition
function. The energy function is defined as a linear combination of visible and hidden neuron activations:

E(v, h) = −b^T v − c^T h − h^T W v
where W is the matrix of connection weights, b and c are two additional parameters known as unit biases, and T denotes the transpose operator. Since there are no connections within the same layer, hidden neurons are conditionally independent given the state of visible neurons (and vice versa). In particular, the activation probability of the neurons in each layer conditioned on the activation of the neurons in the opposite layer can be efficiently computed in one parallel step:
P(h_j = 1 | v) = σ(c_j + Σ_i w_ij v_i)

P(v_i = 1 | h) = σ(b_i + Σ_j w_ij h_j)
where σ is the sigmoid function, c_j and b_i are the biases of hidden and visible neurons (h_j and v_i, respectively), and w_ij is the connection weight between h_j and v_i. Learning in RBMs can be performed through maximum likelihood, where each weight is changed at each step according to a Hebbian-like learning rule:
ΔW = η(v⁺h⁺ − v⁻h⁻)
where η represents the learning rate, v⁺h⁺ denotes the visible-hidden correlations computed on the training data (positive phase), and v⁻h⁻ denotes the visible-hidden correlations computed according to the model’s expectations (negative phase). The model’s expectations have traditionally been computed by running Gibbs sampling until the network reaches equilibrium (Ackley et al., 1985). However, more efficient algorithms such as contrastive divergence (Hinton, 2002) speed up learning by approximating the log-probability gradient. The reader is referred to Hinton (2010) and Zorzi et al. (2013) for more details about RBMs and for a discussion of the hyper-parameters of the learning algorithm.
In our simulations, RBMs were trained using 1-step contrastive divergence with a learning rate of 0.03, a weight decay of 0.0002, and a momentum coefficient of 0.9, which was initialized to 0.5 for the first few epochs. Learning was performed using a mini-batch scheme, with a mini-batch size of 4 patterns, for a total of 100 learning epochs (reconstruction error always converged). Sparse representations were encouraged by forcing the network’s internal representations to rely on a limited number of active hidden units, that is, by driving the probability q of a unit to be active toward a certain desired (low) probability p (Lee et al., 2008). For logistic units, this can be implemented by first calculating the quantity q − p, which is then multiplied by a scaling factor and used to adjust the bias of each hidden unit at every weight update. When the sparsity constraint was applied, we always verified that the average activation of hidden units was indeed maintained below the desired level. All the simulations were performed using an efficient implementation of RBMs on graphic processors (Testolin et al., 2013). The complete source code is available for download¹.
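The CD-1 procedure described above can be sketched as follows. The learning rate, weight decay, and bias-based sparsity mechanism follow the text, but the toy data, layer sizes, sparsity target, and scaling factor are illustrative assumptions, and momentum and mini-batching are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recon_error(V, W, b, c):
    # Mean squared error of a deterministic one-step reconstruction.
    return np.mean((V - sigmoid(sigmoid(V @ W + c) @ W.T + b)) ** 2)

# Toy problem: sizes and binary training data are illustrative assumptions.
n_vis, n_hid, n_pat = 20, 10, 64
V = (rng.random((n_pat, n_vis)) < 0.2).astype(float)

W = 0.01 * rng.standard_normal((n_vis, n_hid))
b = np.zeros(n_vis)                    # visible biases
c = np.zeros(n_hid)                    # hidden biases
eta, weight_decay = 0.03, 0.0002       # hyperparameters from the text
p_target, sparsity_scale = 0.1, 0.01   # sparsity target and scale (assumed)

init_error = recon_error(V, W, b, c)
for epoch in range(100):
    # Positive phase: hidden probabilities and stochastic states given data.
    h_pos = sigmoid(V @ W + c)
    h_samp = (rng.random(h_pos.shape) < h_pos).astype(float)
    # Negative phase (CD-1): one step of Gibbs sampling from the model.
    v_neg = sigmoid(h_samp @ W.T + b)
    h_neg = sigmoid(v_neg @ W + c)
    # Hebbian-like update: data correlations minus model correlations.
    W += eta * ((V.T @ h_pos - v_neg.T @ h_neg) / n_pat - weight_decay * W)
    b += eta * (V - v_neg).mean(axis=0)
    c += eta * (h_pos - h_neg).mean(axis=0)
    # Sparsity: nudge each hidden bias so its mean activation q moves
    # toward the low target probability p.
    q = h_pos.mean(axis=0)
    c += sparsity_scale * (p_target - q)

final_error = recon_error(V, W, b, c)
```

On this toy data the reconstruction error decreases over epochs, mirroring the convergence check mentioned in the text.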
Autoencoders
Similarly to RBMs, autoencoders rely on a single layer of nonlinear hidden units to compactly represent the statistical regularities of the training data. However, autoencoders are feed-forward, deterministic networks trained with error backpropagation (Bengio et al., 2007). The training data is presented to a layer of input units, and the learning goal is to accurately reconstruct the input vector into a separate output layer. An autoencoder is therefore composed of a set of encoding weights W₁ that are used to compute the activation of the hidden units h given the activation of the input units v, and a set of decoding weights W₂ that are used to compute the network reconstructions v_rec from the activations of the hidden units:
h = σ(W₁v + c)

v_rec = σ(W₂h + b)
where b and c are the vectors of output and hidden unit
biases, and σ is the sigmoid function (see Figure 2B for a
graphical representation). The error function E to be minimized
corresponds to the average reconstruction error, which is
quantified by the sum across all output units of the squared
difference between the original and the reconstructed values:
E = (1/N) Σ_{n=1}^{N} Σ_{k=1}^{K} (v_k − v_rec_k)² + β Ω_sparsity
where K is the number of output units, N is the number of training patterns, and β weights the sparsity penalty. Similarly to RBMs, sparse representations can be induced by adding to the cost function a regularization term Ω_sparsity that takes a large value when the average activation q of each hidden neuron diverges from a certain desired (low) value p. In particular, the sparsity constraint was implemented as the Kullback-Leibler divergence from q to p:

Ω_sparsity = Σ_{i=1}^{H} KL(p || q_i)

where H is the number of hidden units. As for RBMs, when sparsity was applied we always verified that the average activation of hidden units was indeed maintained below the desired level.
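The full autoencoder objective, reconstruction error plus the KL sparsity penalty, can be sketched as follows (the toy data, layer sizes, and the values of p and β are illustrative assumptions; the KL term uses the standard Bernoulli form for sigmoid units):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def autoencoder_loss(V, W1, W2, b, c, p=0.1, beta=3.0):
    # Average reconstruction error plus KL sparsity penalty; p and beta
    # are assumed values, not the hyperparameters of the paper.
    H = sigmoid(V @ W1 + c)        # encoder: hidden activations
    V_rec = sigmoid(H @ W2 + b)    # decoder: reconstructions
    recon = np.mean(np.sum((V - V_rec) ** 2, axis=1))
    q = H.mean(axis=0)             # average activation of each hidden unit
    kl = np.sum(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q)))
    return recon + beta * kl

rng = np.random.default_rng(0)
V = rng.random((50, 30))                  # toy input patterns
W1 = 0.1 * rng.standard_normal((30, 12))  # encoding weights
W2 = 0.1 * rng.standard_normal((12, 30))  # decoding weights
loss = autoencoder_loss(V, W1, W2, np.zeros(30), np.zeros(12))
```

Since the KL term is non-negative and vanishes only when every hidden unit's average activation equals p, gradient descent on this loss trades reconstruction accuracy against sparsity, with β controlling the trade-off.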
¹ http://ccnl.psy.unipd.it/research/deeplearning
Frontiers in Computational Neuroscience | www.frontiersin.org 5 March 2017 | Volume 11 | Article 13

Citations
More filters
01 Jun 2014
TL;DR: This article found that higher order correlations in natural scenes induced a sparser code, in which information is encoded by reliable activation of a smaller set of neurons and can be read out more easily.
Abstract: Neural codes are believed to have adapted to the statistical properties of the natural environment. However, the principles that govern the organization of ensemble activity in the visual cortex during natural visual input are unknown. We recorded populations of up to 500 neurons in the mouse primary visual cortex and characterized the structure of their activity, comparing responses to natural movies with those to control stimuli. We found that higher order correlations in natural scenes induced a sparser code, in which information is encoded by reliable activation of a smaller set of neurons and can be read out more easily. This computationally advantageous encoding for natural scenes was state-dependent and apparent only in anesthetized and active awake animals, but not during quiet wakefulness. Our results argue for a functional benefit of sparsification that could be a general principle governing the structure of the population activity throughout cortical microcircuits.

125 citations

Journal ArticleDOI
TL;DR: It is concluded that advanced deep learning architectures are combinations of few conventional architectures, which are more robust to explore the problem space and thus can be the answer to build a general‐purpose architecture.

77 citations

Journal ArticleDOI
TL;DR: It is shown that deep neural networks endowed with basic visuospatial processing exhibit a remarkable performance in numerosity discrimination before any experience-dependent learning, whereas unsupervised sensory experience with visual sets leads to subsequent improvement of number acuity and reduces the influence of continuous visual cues.
Abstract: The finding that human infants and many other animal species are sensitive to numerical quantity has been widely interpreted as evidence for evolved, biologically determined numerical capacities ac...

55 citations

Journal ArticleDOI
TL;DR: This article proposes to study deep belief networks using techniques commonly employed in the study of complex networks, in order to gain some insights into the structural and functional properties of the computational graph resulting from the learning process.
Abstract: Thanks to the availability of large scale digital datasets and massive amounts of computational power, deep learning algorithms can learn representations of data by exploiting multiple levels of abstraction. These machine-learning methods have greatly improved the state-of-the-art in many challenging cognitive tasks, such as visual object recognition, speech processing, natural language understanding and automatic translation. In particular, one class of deep learning models, known as deep belief networks (DBNs), can discover intricate statistical structure in large datasets in a completely unsupervised fashion, by learning a generative model of the data using Hebbian-like learning mechanisms. Although these self-organizing systems can be conveniently formalized within the framework of statistical mechanics, their internal functioning remains opaque, because their emergent dynamics cannot be solved analytically. In this article, we propose to study DBNs using techniques commonly employed in the study of complex networks, in order to gain some insights into the structural and functional properties of the computational graph resulting from the learning process.

22 citations


Cites background from "The Role of Architectural and Learn..."

  • ...For example, response profiles of individual neurons in deep networks often exhibit an impressive match with neurophysiological data [31, 54, 56, 61]....

Journal ArticleDOI
TL;DR: In this brief, sustaining and intermittent run-to-run controllers are designed to achieve the stability of singular discrete-time neural networks with state-dependent coefficients.
Abstract: In this brief, sustaining and intermittent run-to-run controllers are designed to achieve the stability of singular discrete-time neural networks with state-dependent coefficients. The controllers are designed for two reasons: 1) it is very difficult and almost impossible to only measure the in situ feedback information for the controllers and 2) the controllers may not always exist at any time. The stability is then established for singular discrete-time neural networks with state-dependent coefficients. Finally, numerical simulations are shown to illustrate the usefulness of the obtained criteria.

15 citations


Cites background from "The Role of Architectural and Learn..."

  • ...[23] illustrate how learning constraints, such as limited computational resources, can shape the representations that emerge in neural networks....

References
Proceedings Article
03 Dec 2012
TL;DR: This paper introduced a deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, that achieved state-of-the-art performance on ImageNet classification.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
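The dropout regularization mentioned in the abstract can be sketched in a few lines; this is a generic "inverted dropout" illustration, not the authors' GPU implementation, and the shapes and drop probability are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, train=True):
    """Inverted dropout: randomly zero units during training and
    rescale the survivors so the expected activation matches test time."""
    if not train:
        return activations
    mask = (rng.random(activations.shape) >= p_drop)
    return activations * mask / (1.0 - p_drop)

h = np.ones((2, 8))                      # a batch of hidden activations
h_train = dropout(h, p_drop=0.5, train=True)   # roughly half zeroed, rest scaled to 2.0
h_test = dropout(h, train=False)               # identity at test time
```

Because each unit is randomly silenced during training, the fully-connected layers cannot rely on fragile co-adaptations between specific units, which is what reduces overfitting.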

73,978 citations

Journal ArticleDOI
28 May 2015-Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

46,982 citations

Journal ArticleDOI
13 May 1983-Science
TL;DR: There is a deep and useful connection between statistical mechanics and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters), and a detailed analogy with annealing in solids provides a framework for optimization of very large and complex systems.
Abstract: There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.
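The annealing analogy described above can be sketched as a Metropolis-style acceptance loop with a decreasing temperature; this is a minimal illustration on a toy one-dimensional cost, with the cooling schedule, step size, and iteration count chosen arbitrarily:

```python
import math
import random

random.seed(0)

def anneal(cost, neighbor, x0, T0=1.0, cooling=0.995, steps=2000):
    """Minimize `cost` by accepting candidate moves with the
    Metropolis criterion while the temperature T is slowly lowered."""
    x, T = x0, T0
    best = x
    for _ in range(steps):
        cand = neighbor(x)
        delta = cost(cand) - cost(x)
        # Always accept improvements; accept worse moves with prob e^(-delta/T),
        # which lets the search escape local minima early, when T is high.
        if delta < 0 or random.random() < math.exp(-delta / T):
            x = cand
        if cost(x) < cost(best):
            best = x
        T *= cooling
    return best

# Toy instance: minimize (x - 3)^2 over the reals.
best = anneal(lambda x: (x - 3.0) ** 2,
              lambda x: x + random.uniform(-0.5, 0.5),
              x0=0.0)
```

At high temperature the system wanders freely over the cost landscape; as T falls, only improving moves survive, mirroring the slow cooling that lets a solid settle into a low-energy state.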

41,772 citations

Journal ArticleDOI
01 Jan 1988-Nature
TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.
Abstract: We describe a new learning procedure, back-propagation, for networks of neurone-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal ‘hidden’ units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure1.
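The weight-adjustment procedure described in the abstract can be sketched for a tiny two-layer network; this is a generic NumPy illustration of back-propagation with squared error and sigmoid units, not the original implementation, and the XOR task, hidden-layer size, and learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny two-layer network trained on XOR with plain back-propagation.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1 = rng.normal(scale=1.0, size=(2, 4))   # input -> hidden weights
W2 = rng.normal(scale=1.0, size=(4, 1))   # hidden -> output weights
lr = 1.0

for _ in range(5000):
    # Forward pass.
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)
    # Backward pass: propagate the output error through the layers.
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    # Adjust weights to reduce the squared difference between Y and T.
    W2 -= lr * H.T @ dY
    W1 -= lr * X.T @ dH

error = np.mean((Y - T) ** 2)
```

The hidden units here are exactly the internal units the abstract refers to: they receive no explicit targets, yet the repeated weight adjustments drive them to represent task features (for XOR, combinations of the two inputs) that the output layer can use.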

23,814 citations


"The Role of Architectural and Learn..." refers background in this paper

  • ...For both learning architectures we also explore the role of sparse coding, which has been identified as a fundamental principle of neural computation....

  • ...One critical aspect of most deep learning systems is the reliance on a feed-forward architecture trained with error backpropagation (Rumelhart et al., 1986), which has been repeatedly shown to yield state-of-the-art performance in a variety of problems (LeCun et al., 2015)....

Journal ArticleDOI
26 Feb 2015-Nature
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Abstract: The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. 
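The temporal-difference learning that a deep Q-network approximates with a neural network can be sketched in its tabular form; this is a generic illustration, not the DQN agent itself, and the state/action counts and hyperparameters are arbitrary:

```python
import numpy as np

def td_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * np.max(Q[s_next])   # bootstrapped return estimate
    Q[s, a] += alpha * (target - Q[s, a])    # move toward the TD target
    return Q

Q = np.zeros((3, 2))                         # 3 states, 2 actions
Q = td_update(Q, s=0, a=1, r=1.0, s_next=1)  # Q[0, 1] becomes 0.1
```

A DQN replaces the table Q with a deep network over raw pixels, but the error signal it minimizes is this same temporal-difference target, which is what links it to the phasic dopamine signals mentioned in the abstract.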

23,074 citations


"The Role of Architectural and Learn..." refers background in this paper

  • ...These results provide new insights into how basic properties of artificial neural networks might be relevant for modeling neural information processing in biological systems....

  • ...Reinforcement learning is a valuable alternative and it has already shown promising results when combined with deep learning (Mnih et al., 2015; Silver et al., 2016), but there is a broad range of situations where learning seems to be fully unsupervised and its only objective is that of discovering…...
