
A connectivity-constrained computational account of topographic organization in high-level visual cortex

Posted 30 May 2021 - bioRxiv (Cold Spring Harbor Laboratory)
TL;DR: In this paper, the authors present an account of topographic organization involving a computational model with two components: 1) a feature-extracting encoder model of early visual processes, followed by 2) a model of high-level hierarchical visual processing in IT subject to specific biological constraints.
Abstract: Inferotemporal cortex (IT) in humans and other primates is topographically organized, with multiple domain-selective areas and other general patterns of functional organization. What factors underlie this organization, and what can this neural arrangement tell us about the mechanisms of high level vision? Here, we present an account of topographic organization involving a computational model with two components: 1) a feature-extracting encoder model of early visual processes, followed by 2) a model of high-level hierarchical visual processing in IT subject to specific biological constraints. In particular, minimizing the wiring cost on spatially organized feedforward and lateral connections within IT, combined with constraining the feedforward processing to be strictly excitatory, results in a hierarchical, topographic organization. This organization replicates a number of key properties of primate IT cortex, including the presence of domain-selective spatial clusters preferentially involved in the representation of faces, objects, and scenes, within-domain topographic organization such as animacy and indoor/outdoor distinctions, and generic spatial organization whereby the response correlation of pairs of units falls off with their distance. The model supports a view in which both domain-specific and domain-general topographic organization arise in the visual system from an optimization process that maximizes behavioral performance while minimizing wiring costs.

Summary (2 min read)

Domain selectivity exists within a broader organization similar to that of primate IT cortex.

  • Previous empirical research has demonstrated that the response correlations between pairs of neurons fall off smoothly with increasing distance between the neurons (data from 57, as plotted in 50; Figure 4A).
  • Along with previous results (50), their findings suggest that the domain-level topography may simply be a large-scale manifestation of a more general representational topography in which the information represented by neighboring units is more similar than that represented by more distal units.
  • To compute this wiring cost Lw,u, the authors sparsified the network to contain only the 1% strongest connections (sparsity=0.99) and took the averaged squared distance of the remaining connections (59, see Equation 6; see also the sketch following this list); this sparsification introduces minimal performance deficits in the main ITN model.
  • This model yielded a strong generic topographic organization, but a weaker domain-level topographic organization than the full model.
  • The variant without sign-based constraints, which demonstrated the least emergent topography, also had the highest wiring cost, and this was due to increases in feedforward spatial costs.
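A minimal sketch of this unweighted wiring-cost measure is given below, assuming a single weight matrix between two grids of units with known 2D positions and Euclidean distances; the function and variable names are illustrative rather than taken from the authors' code.

```python
import numpy as np

def unweighted_wiring_cost(weights, coords_pre, coords_post, sparsity=0.99):
    """Mean squared connection distance over the strongest (1 - sparsity)
    fraction of connections, ignoring the retained connections' strengths."""
    # Squared Euclidean distance between every post- and presynaptic unit.
    d2 = ((coords_post[:, None, :] - coords_pre[None, :, :]) ** 2).sum(-1)
    strength = np.abs(weights)                   # shape: (n_post, n_pre)
    cutoff = np.quantile(strength, sparsity)     # keep only the strongest 1%
    kept = strength >= cutoff
    return d2[kept].mean()

# Example: a random projection between two 16 x 16 unit grids.
rng = np.random.default_rng(0)
side = 16
coords = np.stack(np.meshgrid(np.arange(side), np.arange(side), indexing="ij"), -1)
coords = coords.reshape(-1, 2).astype(float)
weights = rng.normal(size=(side * side, side * side))
print(unweighted_wiring_cost(weights, coords, coords))
```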

Discussion

  • The investigations presented here demonstrate that many of the key findings thought to support a modular view of separable, innately-specified mechanisms for the recognition of different high-level domains (faces, objects, scenes) can be accounted for within a learning-based account operating under generic connectivity constraints (also see 21).
  • Specifically, the authors observed that the model developed strongly domain-selective spatial clusters which contain preferential information for each domain, and which, when lesioned, produced largely (but not purely) specific deficits.
  • Third, the final spatial cost is computed as the spatial cost of all between-area connections.
  • In the brain, feature tuning is not actually uniform across the visual field (68).
  • The work presented here makes important progress in modeling, both quantitatively and qualitatively, the factors underlying visual cortical development throughout the visual hierarchy.

Materials and Methods

  • Here, the authors introduce the Interactive Topographic Network (ITN), a framework for computational modeling of high-level visual cortex, under specific biological constraints and in the service of specific task demands.
  • Their main modeling focus is on IT, which consists of a series of pairs of recurrent layers that are subject to biological constraints.
  • The authors reused the same subsets of faces and objects as in (84) , and an additional scene domain was constructed to match the other two domains in total images.
  • An initial learning rate of 0.01 was used, and this learning rate was decayed 5 times by a factor of 10 upon plateau of the validation error; after the 5th learning rate decay, the next validation error plateau determined the end of training.
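The learning-rate schedule in the last bullet can be approximated with a standard plateau-based scheduler; the sketch below assumes a PyTorch-style loop with a placeholder model and validation curve, and is not the authors' training code.

```python
import math
import torch

model = torch.nn.Linear(128, 10)                           # placeholder for the full ITN
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # initial learning rate of 0.01
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5)         # divide the LR by 10 on plateau

base_lr = optimizer.param_groups[0]["lr"]
max_decays = 5
for epoch in range(500):
    val_error = 1.0 / (1 + epoch // 25)                    # placeholder validation error curve
    scheduler.step(val_error)                              # decays the LR when val error plateaus
    n_decays = round(math.log10(base_lr / optimizer.param_groups[0]["lr"]))
    if n_decays > max_decays:
        break                                              # a plateau after the 5th decay ends training
```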

Recurrent neural network formulation of IT.

  • Our model of IT extends the standard discrete-time Recurrent Neural Network (RNN) formulation common in computational neuroscience (e.g., 85).
  • Extending the standard RNN framework with biological constraints.
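The two bullets above summarize the core architectural idea; the sketch below shows one way such an E/I recurrent area could look, assuming a single excitatory and a single inhibitory bank per area, ReLU rate dynamics, nonnegative weights whose sign is fixed by the presynaptic cell type, and a squared-distance penalty on the lateral weights. It is a schematic reading of the description here, not the authors' implementation, and the paper's exact update equations may differ.

```python
import torch
import torch.nn.functional as F

class EIArea(torch.nn.Module):
    """Schematic ITN-style IT area: excitatory (E) and inhibitory (I) banks laid
    out on a 2D sheet, with feedforward input arriving only at the E bank."""

    def __init__(self, n_inputs, side):
        super().__init__()
        n = side * side
        # Raw parameters; rectification below keeps effective weights nonnegative,
        # and I-to-E input enters with a negative sign (separation of E and I).
        self.W_ff = torch.nn.Parameter(0.01 * torch.rand(n, n_inputs))  # previous area's E -> E
        self.W_ee = torch.nn.Parameter(0.01 * torch.rand(n, n))         # E -> E (lateral)
        self.W_ei = torch.nn.Parameter(0.01 * torch.rand(n, n))         # E -> I (lateral)
        self.W_ie = torch.nn.Parameter(0.01 * torch.rand(n, n))         # I -> E (lateral)
        grid = torch.stack(torch.meshgrid(torch.arange(side), torch.arange(side),
                                          indexing="ij"), dim=-1).reshape(-1, 2).float()
        self.register_buffer("d2", torch.cdist(grid, grid) ** 2)        # lateral squared distances

    def forward(self, x, e, i, dt=0.5):
        ff, ee, ei, ie = (F.relu(w) for w in (self.W_ff, self.W_ee, self.W_ei, self.W_ie))
        e_input = x @ ff.T + e @ ee.T - i @ ie.T      # inhibition is subtractive
        i_input = e @ ei.T
        e = (1 - dt) * e + dt * F.relu(e_input)       # discrete-time rate update
        i = (1 - dt) * i + dt * F.relu(i_input)
        return e, i                                   # only e is sent to the next area

    def lateral_wiring_cost(self):
        # Squared weights scaled by squared connection distance (local-connectivity bias);
        # feedforward weights would be penalized analogously with between-area distances.
        return sum((F.relu(w) ** 2 * self.d2).mean() for w in (self.W_ee, self.W_ei, self.W_ie))
```

A three-area IT, as in the main model, would chain such areas so that each area's E activity provides the next area's feedforward input.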

Mass univariate analyses.

  • The first analytic approach is the simple mass-univariate approach, in which each unit is analyzed separately for its mean response to each stimulus domain (objects, faces, scenes), using untrained validation images from the same categories used in training.
  • The authors compare the responses of each domain versus the others using a two-tailed t-test, and given the test statistic t, the significance value p of the test, and the sign of the test statistic s = sign(t), they compute the selectivity as −s log(p).
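This selectivity measure can be computed per unit directly from a response matrix; a minimal sketch, assuming responses of shape (images x units) and per-image domain labels, with the log base left as a presentation choice:

```python
import numpy as np
from scipy import stats

def domain_selectivity(responses, domain_labels, target_domain):
    """Per-unit selectivity -sign(t) * log(p) from a two-tailed t-test of the
    target domain's responses against responses to all other domains."""
    in_domain = responses[domain_labels == target_domain]
    out_domain = responses[domain_labels != target_domain]
    t, p = stats.ttest_ind(in_domain, out_domain, axis=0)   # two-tailed by default
    return -np.sign(t) * np.log10(p)                        # positive = prefers the domain

# Example with random data: 300 images (100 per domain) and 1024 units.
rng = np.random.default_rng(1)
responses = rng.normal(size=(300, 1024))
labels = np.repeat(np.array(["faces", "objects", "scenes"]), 100)
face_selectivity = domain_selectivity(responses, labels, "faces")
```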

Searchlight decoding and lesion analyses.

  • The effect of a lesion is measured by computing the accuracy following the lesion and relating that to the baseline accuracy.
  • To do so, the authors take the selectivity map, perform spatial smoothing, and select the unit u of peak smoothed selectivity.
  • When the topography is smooth and the regions approximately circular, the selectivity-ordered and focal lesions yield similar results.
  • To the extent that the topography is not perfectly smooth or circular, the selectivity-ordered lesion may knock out a more relevant set of units for a given task.
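The two lesion schemes summarized above differ only in how the lesioned units are chosen from the selectivity map; a minimal sketch, assuming a square unit sheet, Gaussian smoothing, and a smoothing width chosen purely for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def focal_lesion_mask(selectivity_map, lesion_frac, smooth_sigma=1.0):
    """Lesion the `lesion_frac` of units nearest the peak of the smoothed map."""
    side = selectivity_map.shape[0]
    smoothed = gaussian_filter(selectivity_map, sigma=smooth_sigma)
    peak = np.unravel_index(np.argmax(smoothed), smoothed.shape)   # unit of peak smoothed selectivity
    rows, cols = np.mgrid[:side, :side]
    distance = np.hypot(rows - peak[0], cols - peak[1])
    n_lesioned = int(lesion_frac * side * side)
    order = np.argsort(distance.ravel())                           # nearest units first
    mask = np.zeros(side * side, dtype=bool)
    mask[order[:n_lesioned]] = True                                # True = silenced
    return mask.reshape(side, side)

def selectivity_ordered_lesion_mask(selectivity_map, lesion_frac):
    """Lesion the `lesion_frac` most selective units, regardless of location."""
    n_lesioned = int(lesion_frac * selectivity_map.size)
    order = np.argsort(selectivity_map.ravel())[::-1]              # most selective first
    mask = np.zeros(selectivity_map.size, dtype=bool)
    mask[order[:n_lesioned]] = True
    return mask.reshape(selectivity_map.shape)
```

Post-lesion recognition accuracy with the masked units silenced, relative to the intact model's accuracy, then gives the deficit measure described in the first bullet.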


A connectivity-constrained computational account of topographic organization in primate high-level visual cortex

Nicholas M. Blauch (a,b), Marlene Behrmann (b,c), and David C. Plaut (b,c)

(a) Program in Neural Computation, Carnegie Mellon University; (b) Neuroscience Institute, Carnegie Mellon University; (c) Department of Psychology, Carnegie Mellon University

This manuscript was compiled on July 13, 2021
Inferotemporal cortex (IT) in humans and other primates is topographically organized, containing multiple hierarchically-organized areas selective for particular domains, such as faces and scenes. This organization is commonly viewed in terms of evolved domain-specific visual mechanisms. Here, we develop an alternative, domain-general and developmental account of IT cortical organization. The account is instantiated as an Interactive Topographic Network (ITN), a form of computational model in which a hierarchy of model IT areas, subject to connectivity-based constraints, learns high-level visual representations optimized for multiple domains. We find that minimizing a wiring cost on spatially organized feedforward and lateral connections within IT, combined with constraining the feedforward processing to be strictly excitatory, results in a hierarchical, topographic organization. This organization replicates a number of key properties of primate IT cortex, including the presence of domain-selective spatial clusters preferentially involved in the representation of faces, objects, and scenes, columnar responses across separate excitatory and inhibitory units, and generic spatial organization whereby the response correlation of pairs of units falls off with their distance. We thus argue that domain-selectivity is an emergent property of a visual system optimized to maximize behavioral performance while minimizing wiring costs.
Inferotemporal cortex | Functional organization | Topography | Neural network | Development
Inferotemporal cortex (IT) subserves higher-order visual abilities in primates, including the visual recognition of objects and faces. By adulthood in humans, IT cortex, and ventral temporal cortex more generally, contains substantial functional topographic organization, including the presence of domain-selective spatial clusters in reliable spatial locations, including clusters for faces (1–3), objects (4), buildings and scenes (5, 6), and words (7). Similar domain-level topographic properties have been found in rhesus macaque monkeys, including multiple regions of clustered face selectivity (8–10). Intriguingly, this selectivity is encompassed in a larger scale “mosaic” of category-selectivity, in which areas of category-selectivity themselves have further columnar clustering within them (11–13), pointing to more general principles of organization beyond the domain level. In line with this idea, human IT cortex also exhibits larger-scale organization for properties such as animacy and real-world size (14, 15), and midlevel features characteristic of these properties and domains have been shown to account well for patterns of high-level visual selectivity (16). How these domain-level and more general facets of functional organization arise, how they are related, and whether and in what ways they rely on innate specification and/or experience-based developmental processes remain contentious.

Recent work has demonstrated that the neural basis of face recognition depends crucially on experience, given that deprivation of face viewing in juvenile macaque monkeys prevents the emergence of face-selective regions (17). Relatedly, the absence of exposure to written forms through reading acquisition precludes the emergence of word-selective regions (18, 19). That there exists clustered neural response selectivity for evolutionarily new visual categories such as written words offers further evidence that the topographic development of the human visual system has a critical experience-dependent component (20, 21). In contrast with a system in which innate mechanisms are determined through natural selection, this experiential plasticity permits the tuning of the visual system based on the most frequent and important visual stimuli that are actually encountered, thereby enabling greater flexibility for ongoing adaptation across the lifespan.

There is considerable computational evidence that experience-dependent neural plasticity can account for the response properties of the visual system at the single neuron level. Classic work demonstrated that the statistics of natural images are sufficient for learning V1-like localized edge-tuning within a sparse coding framework (22, 23). More recently, deep convolutional neural networks (DCNNs) trained on image classification have been successful in accounting for the tuning of neurons in V1, V2, V4, and IT in a hierarchically consistent manner, where deeper layers of the DCNN map onto later layers of the anatomical hierarchy (24, 25).

Significance Statement

We introduce the Interactive Topographic Network, a framework for modeling high-level vision, to demonstrate in computational simulations that the spatial clustering of domains in late stages of the primate visual system may arise from the demands of visual recognition under the constraints of minimal wiring costs and excitatory between-area neuronal communication. The learned organization of the model is highly specialized but not fully modular, capturing many of the properties of organization in primates. Our work is significant for cognitive neuroscience, by providing a domain-general developmental account of topographic functional specialization, and for computational neuroscience, by demonstrating how well-known biological details can be successfully incorporated into neural network models in order to account for critical empirical findings.

N.M.B., M.B., and D.C.P. conceived of the work. N.M.B. developed software and performed simulations and data analyses. M.B. and D.C.P. supervised the project. N.M.B. wrote the first draft of the paper. N.M.B., M.B., and D.C.P. revised the paper.

The authors declare no competing interests.

To whom correspondence should be addressed. E-mail: {blauch,behrmann,plaut}@cmu.edu
Above the single-neuron level, considerable prior work has demonstrated that topographic organization in V1 may emerge from self-organizing, input-driven mechanisms (26–32) (for review, see 33). For example, the pinwheel architecture of spatially repeating smooth orientation selectivity overlaid with global retinotopy has been shown to be well-accounted for by Self-Organizing Maps (SOMs) (29, 30, 34). One notable application of an SOM to modeling high-level visual cortex by Cowell and Cottrell (35) demonstrated stronger topographic clustering for faces compared to other object categories (e.g., chairs, shoes), suggesting that the greater topographic clustering of faces in IT is due to greater within-category similarity among faces compared to these other categories. This work provides a strong case for domain-general developmental principles underlying cortical topography in IT, but at least two important issues remain unaddressed. First, rather than only supporting discrimination of face from non-face categories (as in 35), face representations in humans (and likely non-human primates, though see (36)) must support the more difficult and fine-grained task of individuation; this task requires a “spreading transformation” of representations for different face identities (37, 38), which could alter the feature space and its topographic mapping, and necessitate a more domain-specialized representation than arises in an SOM. And secondly, rather than a single face-selective area, IT cortex actually contains multiple hierarchically-organized face-selective regions with preferential inter-connectivity (39). Generally, SOMs are not well equipped to explain such hierarchical topographic interactions, as they are designed to map a feature space into a topographic embedding, but not to transform the feature space hierarchically in the way needed to untangle invariant visual object representation from the statistics of natural images (40). This suggests that SOMs may not be a good model of topographic development in cortical networks.

An alternative approach to studying topographic organization involves incorporating distance-dependent constraints on neural computation within more general neural network models (41–44). Of particular interest is a hierarchical neural network developed by Jacobs and Jordan (43) in which error-driven learning was augmented with a spatial loss function penalizing large weights to a greater degree on longer versus shorter connections. This model was shown to develop topographic organization for ‘what’ versus ‘where’ information when trained with spatially segregated output units for the two tasks. Closely related work by Plaut and Behrmann (45) demonstrated that a similar spatially-constrained model with biased demands on input (e.g., retinotopy) and output (e.g., left-lateralized language) could account for the organization of domain-specific areas in IT cortex, such as the foveal bias for words and faces, leftward lateralization of words, and rightward lateralization of faces (46–48). However, to date, none of these structurally-biased neural network models have been applied to large-scale sets of naturalistic images, the statistics of which are thought to organize high-level visual representations in IT cortex (49), and the topography in these models (43, 45) has been analyzed at a relatively coarse level. Nonetheless, this early work raises the possibility that the application of distance-dependent constraints in a modern deep neural architecture trained on natural images might provide a more comprehensive account of topographic organization in IT.

Recently, Lee and colleagues (50) have modeled the topography of IT cortex with a deep neural network trained on a large set of natural images, using a correlation-based layout that explicitly encouraged units within a layer of the network to be spatially nearer to units with correlated responses, and farther from units with uncorrelated or anti-correlated responses. As a result, the network developed face-selective topography that corresponded well with data from macaque monkeys. However, this approach imposes topographic functional organization on the network based on measured functional responses, rather than deriving it from realistic principles of cortical structure and function, such as constraints on connectivity. Moreover, like the SOM, the approach can explain only within-area topographic organization, and not relationships between areas, such as multiple stages of IT cortex and their interactions with upstream and downstream cortical areas. Thus, the question remains whether such basic structural principles can account for the topographic organization of IT.

In the current work, we combined the approaches of task-optimized DCNN modeling (49, 50) with flexible connectivity-constrained architectures (43, 45) to develop a hierarchical model of topographic organization in IT cortex. We implemented a bias towards local connectivity through minimization of an explicit wiring cost function (43) alongside a task performance cost function. Intriguingly, we observed that this pressure on local connectivity was, on its own, insufficient to drive topographic organization in our model. This led us to explore two neurobiological constraints on the sign of connectivity—strictly excitatory feedforward connectivity, and the separation of excitation and inhibition—with the result that both, and particularly excitatory feedforward connectivity, provided a powerful further inductive bias for developing topographic organization when combined with a bias towards local connectivity.
Results

A connectivity-constrained model of ventral temporal cortex produces hierarchical, domain-selective response topography. Our Interactive Topographic Network (ITN) framework for modeling high-level visual cortex consists of an encoder that approximates early visual cortex, followed by interactive topography areas that approximate IT cortex (Figure 1A; see Methods for details). We first present the results of simulations of a specific ITN model, in which a ResNet-50 encoder is pre-trained on a large dataset including several categories from the domains of objects, faces, and scenes (each domain matched in total training images). The trained encoder provides input to a 3-area IT with separate posterior (pIT), central (cIT), and anterior (aIT) areas. Each IT area consists of separate banks of excitatory (E) and inhibitory (I) units, and feedforward connectivity between areas is limited to the E units. After training, the model performed well on each domain, reaching a classification accuracy of 86.4% on the face domain, 81.8% on the object domain, and 65.9% on the scene domain (see Supplementary Figure S1). Performance differences across domains are unlikely to be an artifact of the specific architecture as they can be seen across a variety of CNNs, reflecting the intrinsic difficulty of each task given the variability within and between categories of each domain for the given image sets.

Fig. 1. The Interactive Topographic Network produces hierarchical domain-level organization. A. Diagram of the Interactive Topographic Network (ITN). An ITN model consists of three components: an encoder that approximates early visual processing prior to inferotemporal cortex, the interactive topography (IT) areas that approximate inferotemporal cortex, and the readout mechanism for tasks such as object, scene, and face recognition. The architecture of each component is flexible. For example, a 4-layer simple convolutional network or a deep 50-layer ResNet can be used as the encoder; whereas the former facilitates end-to-end training along with a temporally-precise IT model, the latter supports better learning of the features that discriminate among trained categories. In this work, topographic organization is restricted to the IT layers. The figure depicts the main version of the ITN containing three constraints: a spatial connectivity cost pressuring local connectivity, separation of neurons with excitatory and inhibitory influences, and the restriction that all between-area connections are sent by the excitatory neurons. The final IT layer projects to the category readout layer containing one localist unit per learned category, here shown organized into three learned domains. (Note that this organization is merely visual and does not indicate any architectural segregation in the model.) B. Domain selectivity at each level of the IT hierarchy. Selectivity is computed separately for each domain, and then binarized by including all units corresponding to p < 0.001. Each domain is assigned a color channel in order to plot all selectivities simultaneously. Note that a unit can have zero, one, or two selective domains, but not three, as indicated in the color legend. C. Detailed investigation of domain-level topography in aIT. Each heatmap plots a metric for each unit in aIT. The first column shows the mean domain response for each domain, the second column shows domain selectivity, the third column shows the within-domain searchlight decoding accuracy, and the fourth column shows the mean of weights of a given aIT unit into the readout categories of a given domain.
The trained model exhibits domain-level topographic organization that is hierarchically linked across corresponding sectors of each layer (see Figure 1B). This result reflects the fact that the distance-dependent constraints on feedforward connectivity pressured units that have minimal between-area distances to learn a similar tuning, which means that the layers roughly overlap in their respective (separate) 2D topographies. The topographic organization gets somewhat smoother moving from pIT to cIT, most likely because units in cIT and aIT (but not pIT) have local feedforward receptive fields and thus greater constraint on local cooperation.

We next scrutinized the topography in aIT, where there are very smooth domain-level responses, and where we can directly compare responses with those of the recognition readout mechanism. We computed mean domain responses, plotted in the first column of Figure 1C, and domain selectivity, plotted in the second column, which demonstrates corresponding topographic organization. We confirmed the functional significance of response topography by conducting a searchlight analysis inspired by multivariate approaches to analyzing functional magnetic resonance imaging (fMRI) data (51). We used searchlights containing the 10% (102) nearest units. The results of this analysis, shown in the third column of Figure 1C, revealed topographic organization of information for discriminating between categories of each domain that is strongly correlated with the domain selectivity maps for each domain (all ps < 0.0001).
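A minimal sketch of this kind of searchlight decoding over the model's 2D unit sheet is shown below, assuming unit responses of shape (images x units), per-image category labels, and 2D unit coordinates; the cross-validated logistic-regression classifier is an illustrative stand-in rather than the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def searchlight_decoding(responses, labels, coords, frac=0.10):
    """Cross-validated decoding accuracy for a searchlight containing the
    `frac` nearest units around each unit on the 2D sheet."""
    n_units = responses.shape[1]
    k = max(2, int(frac * n_units))            # 10% of the units; ~102 for a 32 x 32 sheet
    accuracy = np.zeros(n_units)
    for u in range(n_units):
        dist = np.linalg.norm(coords - coords[u], axis=1)
        neighborhood = np.argsort(dist)[:k]    # the searchlight around unit u
        clf = LogisticRegression(max_iter=1000)
        accuracy[u] = cross_val_score(clf, responses[:, neighborhood], labels, cv=5).mean()
    return accuracy                            # one map value per unit
```

Correlating the resulting accuracy map with the domain selectivity map then gives the comparison reported above.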
To further confirm the functional significance of the topographic organization, we analyzed the spatial organization of readout weights from aIT to the localist category readout layer. We evaluated whether each domain placed more weight in reading out from the units for which there was greater selectivity, by calculating the mean domain response weight for each unit, averaged over classes in each domain. This produced a map for each domain, shown in the last column of Figure 1C. We find a large positive correlation between the mean readout weight and the mean response for each domain (all rs > 0.7, all ps < 0.0001), further demonstrating the functional significance of the response topography.

Fig. 2. E and I cells act as functional columns. Selectivity of cIT excitatory (E) units (left columns), and inhibitory (I) units (middle column) for each domain, and histograms of response correlations between co-localized E and I units for images from each domain (right column).
Excitatory and inhibitory units operate as functional columns. Thus far we have focused on the representations in the E cells, both for convenience and clarity, and because it is the E units that exclusively project to downstream areas (including the category readout units). We next assessed whether the I units show a similar topographic organization, and whether it is linked with the E cells. The selectivity for E and I cells is plotted and correlated in Figure 2. The I cells show similar domain-selective topography to the E cells. Moreover, the activities of E and I units at the same 2D location are highly correlated over each domain of images, as well as over all images. If we consider a pair of E and I neurons at a given location on the 2D sheet to correspond to a cortical column, our result is reminiscent of the finding that biological neurons in different layers at the same location on the 2D flattened cortex have similar response properties (52). In this way, E and I units in the model appear to act as functional columns.
Effects of lesions indicate strong yet graded domain-level specialization. We next performed a series of “lesion” analyses in the model in order to compare with neuropsychological data on face and object recognition (53–55). First, we performed focal lesions, as would be experienced by most patients with acquired brain damage. To simulate the impairment of patients with maximally specific deficits, we centered circular focal lesions of various sizes at the center of (smoothed) domain selectivity. Performance following each lesion was measured separately for each domain.

The results of this lesion analysis, including two representative lesion sizes of 20% and 30% of the aIT units, are shown in Figure 3A. Focal lesions centered on each domain lead to an especially severe deficit in recognition for that domain, and milder but significant deficits for the other domains as well. For a medium-sized lesion of 20% of the units (Figure 3A, right), the deficit is significant for all domains (all ps < 0.05), and significantly stronger for recognition of the target domain (all ps < 0.05).

Are these more general effects of circumscribed lesions on non-preferred domains the result of imperfect (patchy) or non-circular topographic organization of an underlying modular organization? To answer this question, we performed selectivity-ordered lesions, in which units were sorted by their selectivity for a given domain, and selected according to their sorting index, shown in Figure 3B. The effects of damage in this case are similar to those for focal lesions, with greater damage to the domain on which sorting was performed, and smaller deficits to other domains for lesions targeting at least 20% of the units. Specifically, for 20% lesions, we found smaller but still significant deficits for both the preferred and non-preferred domains compared to focal lesions. This suggests that some but not all of the damage to the non-preferred domain induced by focal lesions may be due to imperfect or non-circular topographic functional organization. Importantly, these more distributed effects of lesions indicate that the functional organization, while highly specialized, is not strictly modular, at least with respect to one influential definition of modularity (56). Supplementary Figures S3 and S4 provide additional data on the nature of domain specialization in the network.
Domain selectivity exists within a broader organization similar to that of primate IT cortex. Previous empirical research has demonstrated that the response correlations between pairs of neurons fall off smoothly with increasing distance between the neurons (data from 57, as plotted in 50; Figure 4A). This finding has been used to develop a class of topographic neural network models that explicitly fits the spatial layout of units to this relationship (50). We explored whether this relationship emerged naturally in our network due to its constrained connectivity, in line with the emergence of domain-selective topography. We thus computed the correlations among pairs of unit activations across images as a function of the distance between the units, focusing on aIT. As shown in Figure 4B, there is, indeed, a smooth decay of response correlations with distance, matching the qualitative trend in the empirical data (50, 57).
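This analysis reduces to computing, for every pair of aIT units, the correlation of their activations across images and the distance between their grid positions. A minimal sketch under those assumptions (array shapes and names are illustrative, not the authors' code):

```python
import numpy as np

def pairwise_corr_and_distance(responses, coords):
    """Response correlation and sheet distance for every pair of units.
    responses: (n_images, n_units); coords: (n_units, 2) grid positions."""
    corr = np.corrcoef(responses.T)                               # units x units
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    upper = np.triu_indices(len(coords), k=1)                     # unique pairs only
    return corr[upper], dist[upper]

# Averaging the correlations within distance bins (e.g., via np.digitize)
# yields a falling correlation-versus-distance profile like Figure 4B.
```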
This result is not simply due to differences between domains, as it is also found when examining responses to images within each domain separately (shown for objects in Figure 4C). Along with previous results (50), our findings suggest that the domain-level topography may simply be a large-scale manifestation of a more general representational topography in which the information represented by neighboring units is more similar than that represented by more distal units. Importantly, our results go beyond previous ones to also demonstrate that this organization can arise under explicit wiring length and sign-based constraints on connectivity.
Fig. 3. Lesion results in the ITN model. Each plot shows the relative effects of a set of same-sized lesions on recognition performance for each domain, relative to the performance on the same domain in the undamaged model. Error bars show bootstrapped 95% confidence intervals over trials; thus, the statistical significance of a given lesion can be assessed by determining whether the confidence interval includes 0. A. Damage from circular focal lesions centered on the peak of smoothed selectivity for each domain. Left: results for a variety of lesion sizes. Right: a focused analysis of an intermediate lesion size of 20% of the aIT units. B. Damage from selectivity-ordered lesions for each domain. Left: results for a variety of lesion sizes. Right: a focused analysis of an intermediate lesion size of 20% of the aIT units.

The generic distance-dependent functional relationship just discussed would suggest that functional organization may be exhibited at finer scales than the domain level. To assess this, we performed a clustering analysis on the readout weights from aIT. We adopted this approach due to the similarity between the readout weights and response topography in aIT (Figure 1C). A given category will achieve its maximal output response when the activation pattern in aIT most closely aligns with the readout weights. Thus, the readout weights for a given category act as a sort of category template to match with representations in aIT. Clustering the readout weights directly, rather than interpreting a set of activations to natural images, enables clustering solutions to be explicitly linked to each category. This allows for a concise clustering solution containing one element for each category: the readout weights projecting from aIT to the identity unit for that category.

We thus performed k-means clustering on the readout weights of all categories separately for each domain using k = 3 clusters (Figure 4D), finding the centroids of these clusters, and visualizing them in the 2D layout of aIT. The centroids and cluster category members are shown in Figure 4E. The cluster centroids show smooth topographic organization, with each cluster having a primary hot-spot of weight, and graded weight in other parts of aIT. Visual inspection of the cluster category members suggests a striking organization for different classes of object categories. This organization is confirmed through cluster assignment quantification in Figure 4F. The first two clusters represent the vast majority of animate categories, with the first cluster representing mostly non-mammalian animate categories such as birds and reptiles, and the second cluster representing mostly dogs and other mammals such as bears and raccoons. Last, the third cluster represents the vast majority of inanimate objects such as clocks and various tools. Further analysis of the scene and face domain readout weights indicated a similar within-domain organization, with scenes being clustered by indoors-outdoors and natural-manmade dimensions, and faces being clustered by gender and hair color dimensions (Supplementary Figures S7, S8).
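Because each category contributes exactly one readout weight vector over aIT, this clustering operates on a (categories x aIT units) weight matrix. A minimal sketch is given below; it assumes a 32 x 32 aIT sheet (consistent with the roughly 102-unit, 10%-of-units searchlights reported earlier) and uses scikit-learn's k-means, so names and defaults are illustrative rather than the authors' exact settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_readout_weights(readout_weights, n_clusters=3, side=32):
    """Cluster per-category readout maps (n_categories x n_aIT_units) and
    return each category's cluster label plus centroid maps on the 2D sheet."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(readout_weights)                  # one label per category
    centroid_maps = km.cluster_centers_.reshape(n_clusters, side, side)
    return labels, centroid_maps
```

Plotting each centroid map then shows where on the aIT sheet the categories of that cluster concentrate their readout weight.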
Networks can reduce spatial costs and maintain performance by increasing topographic organization. The optimization problem introduced by Jacobs and Jordan (43) and employed in this work (Equation 4) explicitly works to both maximize visual recognition performance through a task-based loss term Lt, and to minimize overall wiring cost through a connection-based loss term Lw that scales with the square of connection distance. To what extent does minimizing this wiring cost term compromise performance? To answer this question, we tested multiple ITN models with varying wiring cost penalties λw and measured the resulting wiring cost and task performance. We computed wiring cost in two ways. The first way is by using the Lw term, which takes into account both the length and strength of connections. The second way is inspired by the wiring cost minimization framework (58), which cares only about the presence—rather than the strength—of connections, along with their distance. To compute this wiring cost Lw,u, we sparsified the network to contain only the 1% strongest connections (sparsity = 0.99), and took the averaged squared distance of remaining connections (59, see Equation 6); this sparsification introduces minimal performance deficits in the main ITN model (Figure 5A). The results, shown in Figure 5A, demonstrate that increasing the wiring cost penalty λw by an order of magnitude decreased the first spatial cost Lw by roughly an order of magnitude. Precisely, the log-log plot in Figure 5A (left) revealed a power law relationship of the form y = Ax^m, where m = −1.24 (p < 0.001). The unweighted wiring cost Lw,u similarly decays roughly linearly on the log-log plot up to λw = 0.1, after which Lw,u saturates and then rises for increasing values of λw. Thus, an intermediate value of λw appears sufficient to drive the network towards preferentially local connectivity, and further increasing λw may minimize further the optimization term Lw through other means, such as by further shrinking small long-range weights and reducing participation at the grid boundaries where mean connection lengths are longest (see Figure 5C, top right). In contrast to the wiring costs, the final classification performance was only marginally affected by λw (log-log slope m = −0.0016, p < 0.001, explained variance r² = 0.582; fit was not sig-
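Equation 4 itself is not reproduced in this excerpt, so the sketch below shows only the general form described in this section, under the assumption that the wiring term scales squared weights by squared connection distance and that λw (here `lambda_w`) trades it off against a cross-entropy task loss; the exact normalization and the set of penalized connections follow the paper's Methods and may differ.

```python
import torch
import torch.nn.functional as F

def spatial_wiring_cost(weight, squared_distance):
    """Connection-based cost: squared weight magnitude scaled by the squared
    distance between the units it connects."""
    return (weight ** 2 * squared_distance).mean()

def itn_objective(logits, targets, penalized_connections, lambda_w=0.1):
    """Task loss plus lambda_w times the total wiring cost over all spatially
    embedded (feedforward and lateral) weight matrices."""
    task_loss = F.cross_entropy(logits, targets)
    wiring = sum(spatial_wiring_cost(w, d2) for w, d2 in penalized_connections)
    return task_loss + lambda_w * wiring
```

Here `penalized_connections` would be a list of (weight matrix, squared-distance matrix) pairs, such as the lateral weights and distance matrices kept in the schematic E/I area sketched earlier.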

Citations
Journal ArticleDOI
TL;DR: In this paper , the authors consider the case of face perception using artificial neural networks to test the hypothesis that functional segregation of face recognition in the brain reflects a computational optimization for the broader problem of visual recognition of faces and other visual categories.
Abstract: The human brain contains multiple regions with distinct, often highly specialized functions, from recognizing faces to understanding language to thinking about what others are thinking. However, it remains unclear why the cortex exhibits this high degree of functional specialization in the first place. Here, we consider the case of face perception using artificial neural networks to test the hypothesis that functional segregation of face recognition in the brain reflects a computational optimization for the broader problem of visual recognition of faces and other visual categories. We find that networks trained on object recognition perform poorly on face recognition and vice versa and that networks optimized for both tasks spontaneously segregate themselves into separate systems for faces and objects. We then show functional segregation to varying degrees for other visual categories, revealing a widespread tendency for optimization (without built-in task-specific inductive biases) to lead to functional specialization in machines and, we conjecture, also brains.

30 citations

Journal ArticleDOI
TL;DR: Norman-Haignere et al. reveal a neural population that responds to singing, but not instrumental music or speech, by modeling electrode responses as a weighted sum of canonical response components.

27 citations

Posted ContentDOI
13 Nov 2021-bioRxiv
TL;DR: In this article, the authors used multiple functional localizers to discover regions in the intraparietal sulcus (IPS) that were selectively involved in computing object-centered part relations and found that these regions exhibited task-dependent functional connectivity with ventral cortex.
Abstract: Although there is mounting evidence that input from the dorsal visual pathway is crucial for object processes in the ventral pathway, the specific functional contributions of dorsal cortex to these processes remain poorly understood. Here, we hypothesized that dorsal cortex computes the spatial relations among an object's parts — a process crucial for forming global shape percepts — and transmits this information to the ventral pathway to support object categorization. Using multiple functional localizers, we discovered regions in the intraparietal sulcus (IPS) that were selectively involved in computing object-centered part relations. These regions exhibited task-dependent functional connectivity with ventral cortex, and were distinct from other dorsal regions, such as those representing allocentric relations, 3D shape, and tools. In a subsequent experiment, we found that the multivariate response of posterior IPS, defined on the basis of part-relations, could be used to decode object category at levels comparable to ventral object regions. Moreover, mediation and multivariate connectivity analyses further suggested that IPS may account for representations of part relations in the ventral pathway. Together, our results highlight specific contributions of the dorsal visual pathway to object recognition. We suggest that dorsal cortex is a crucial source of input to the ventral pathway and may support the ability to categorize objects on the basis of global shape.

11 citations

Posted ContentDOI
02 Feb 2022
TL;DR: GLMsingle as mentioned in this paper is a toolbox available in MATLAB and Python that enables accurate estimation of single-trial fMRI responses, which can significantly improve the quality of past, present, and future neuroimaging datasets that sample brain activity across many experimental conditions.
Abstract: ABSTRACT Advances in modern artificial intelligence (AI) have inspired a paradigm shift in human neuroscience, yielding large-scale functional magnetic resonance imaging (fMRI) datasets that provide high-resolution brain responses to tens of thousands of naturalistic visual stimuli. Because such experiments necessarily involve brief stimulus durations and few repetitions of each stimulus, achieving sufficient signal-to-noise ratio can be a major challenge. We address this challenge by introducing GLMsingle , a scalable, user-friendly toolbox available in MATLAB and Python that enables accurate estimation of single-trial fMRI responses ( glmsingle.org ). Requiring only fMRI time-series data and a design matrix as inputs, GLMsingle integrates three techniques for improving the accuracy of trial-wise general linear model (GLM) beta estimates. First, for each voxel, a custom hemodynamic response function (HRF) is identified from a library of candidate functions. Second, cross-validation is used to derive a set of noise regressors from voxels unrelated to the experimental paradigm. Third, to improve the stability of beta estimates for closely spaced trials, betas are regularized on a voxel-wise basis using ridge regression. Applying GLMsingle to the Natural Scenes Dataset and BOLD5000, we find that GLMsingle substantially improves the reliability of beta estimates across visually-responsive cortex in all subjects. Furthermore, these improvements translate into tangible benefits for higher-level analyses relevant to systems and cognitive neuroscience. Specifically, we demonstrate that GLMsingle: (i) improves the decorrelation of response estimates between trials that are nearby in time; (ii) enhances representational similarity between subjects both within and across datasets; and (iii) boosts one-versus-many decoding of visual stimuli. GLMsingle is a publicly available tool that can significantly improve the quality of past, present, and future neuroimaging datasets that sample brain activity across many experimental conditions.

8 citations

Posted ContentDOI
06 Jul 2021-bioRxiv
TL;DR: In this article, the authors used artificial neural networks to test the hypothesis that functional segregation of face recognition in the brain reflects the computational requirements of the task, and found that networks trained on generic object recognition perform poorly on face recognition and vice versa.
Abstract: The last quarter century of cognitive neuroscience has revealed numerous cortical regions in humans with distinct, often highly specialized functions, from recognizing faces to understanding language to thinking about what other people are thinking. But it remains unclear why the cortex exhibits this high degree of functional specialization in the first place. Here, we consider the case of face perception, using artificial neural networks to test the hypothesis that functional segregation of face recognition in the brain reflects the computational requirements of the task. We find that networks trained on generic object recognition perform poorly on face recognition and vice versa, and further that networks optimized for both tasks spontaneously segregate themselves into separate systems for faces and objects. Thus, generic visual features that suffice for object recognition are apparently suboptimal for face recognition and vice versa. We then show functional segregation to varying degrees for other visual categories, revealing a widespread tendency for optimization (without built-in task-specific inductive biases) to lead to functional specialization in machines and, we conjecture, also brains.

4 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations

Proceedings ArticleDOI
Jia Deng1, Wei Dong1, Richard Socher1, Li-Jia Li1, Kai Li1, Li Fei-Fei1 
20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

49,639 citations

Journal ArticleDOI
TL;DR: This article reviews studies investigating complex brain networks in diverse experimental modalities and provides an accessible introduction to the basic principles of graph theory and highlights the technical challenges and key questions to be addressed by future developments in this rapidly moving field.
Abstract: Recent developments in the quantitative analysis of complex networks, based largely on graph theory, have been rapidly translated to studies of brain network organization. The brain's structural and functional systems have features of complex networks--such as small-world topology, highly connected hubs and modularity--both at the whole-brain scale of human neuroimaging and at a cellular scale in non-human animals. In this article, we review studies investigating complex brain networks in diverse experimental modalities (including structural and functional MRI, diffusion tensor imaging, magnetoencephalography and electroencephalography in humans) and provide an accessible introduction to the basic principles of graph theory. We also highlight some of the technical challenges and key questions to be addressed by future developments in this rapidly moving field.

9,700 citations

Journal ArticleDOI
TL;DR: In this paper, the authors describe a self-organizing system in which the signal representations are automatically mapped onto a set of output responses in such a way that the responses acquire the same topological order as that of the primary events.
Abstract: This work contains a theoretical study and computer simulations of a new self-organizing process. The principal discovery is that in a simple network of adaptive physical elements which receives signals from a primary event space, the signal representations are automatically mapped onto a set of output responses in such a way that the responses acquire the same topological order as that of the primary events. In other words, a principle has been discovered which facilitates the automatic formation of topologically correct maps of features of observable events. The basic self-organizing system is a one- or two-dimensional array of processing units resembling a network of threshold-logic units, and characterized by short-range lateral feedback between neighbouring units. Several types of computer simulations are used to demonstrate the ordering process as well as the conditions under which it fails.

8,247 citations

Frequently Asked Questions (1)
Q1. What have the authors contributed in "A connectivity-constrained computational account of topographic organization in primate high-level visual cortex"?

The authors thus argue that domain-selectivity is an emergent property of a visual system optimized to maximize behavioral performance while minimizing wiring costs.