
Showing papers by "Eero P. Simoncelli published in 2019"


Journal ArticleDOI
TL;DR: A methodology for estimating the curvature of an internal trajectory from human perceptual judgments is developed and used to test three distinct predictions: natural sequences that are highly curved in the space of pixel intensities should be substantially straighter perceptually; in contrast, artificial sequences that are straight in the intensity domain should be more curved perceptually; finally, naturalistic sequences that are straight in the intensity domain should be relatively less curved.
Abstract: Many behaviors rely on predictions derived from recent visual input, but the temporal evolution of those inputs is generally complex and difficult to extrapolate. We propose that the visual system transforms these inputs to follow straighter temporal trajectories. To test this ‘temporal straightening’ hypothesis, we develop a methodology for estimating the curvature of an internal trajectory from human perceptual judgments. We use this to test three distinct predictions: natural sequences that are highly curved in the space of pixel intensities should be substantially straighter perceptually; in contrast, artificial sequences that are straight in the intensity domain should be more curved perceptually; finally, naturalistic sequences that are straight in the intensity domain should be relatively less curved. Perceptual data validate all three predictions, as do population models of the early visual system, providing evidence that the visual system specifically straightens natural videos, offering a solution for tasks that rely on prediction. The brain predicts future sensory input. The authors hypothesize that the visual system achieves this by straightening the temporal trajectories of natural videos, and they provide evidence using human perceptual experiments and computational modeling.
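The curvature measure at the heart of this study can be illustrated concretely. Below is a minimal sketch (not the authors' perceptual estimator, which is inferred from human judgments) that computes the average discrete curvature of a trajectory as the turning angle between successive difference vectors; the frame size and test trajectories are arbitrary choices for illustration.

```python
import numpy as np

def mean_curvature(frames):
    """Average discrete curvature of a trajectory.

    frames: array of shape (T, D), one point per time step
    (e.g., video frames flattened into pixel vectors).
    Curvature at step t is the angle between the successive
    difference vectors x[t+1] - x[t] and x[t+2] - x[t+1].
    """
    diffs = np.diff(frames, axis=0)                     # (T-1, D)
    diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)
    cosines = np.sum(diffs[:-1] * diffs[1:], axis=1)    # cos of turning angles
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return np.degrees(angles.mean())

t = np.linspace(0, 1, 10)[:, None]
print(mean_curvature(t * np.ones((1, 100))))     # straight path: 0 degrees
print(mean_curvature(np.random.randn(10, 100)))  # i.i.d. frames: ~120 degrees
```

Under this measure, "perceptual straightening" means that the same frames, mapped through a model of the visual system, trace a trajectory with a smaller mean turning angle than they do in pixel space.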

63 citations


Proceedings ArticleDOI
01 Sep 2019
TL;DR: This work develops a blind IQA (BIQA) model, along with a method of training it without human ratings, and demonstrates that the model outperforms state-of-the-art BIQA models in terms of correlation with human ratings in existing databases, as well as in group maximum differentiation (gMAD) competition.
Abstract: Models for image quality assessment (IQA) are generally optimized and tested by comparing to human ratings, which are expensive to obtain. Here, we develop a blind IQA (BIQA) model, and a method of training it without human ratings. We first generate a large number of corrupted image pairs, and use a set of existing IQA models to identify which image of each pair has higher quality. We then train a convolutional neural network to estimate perceived image quality along with its uncertainty, optimizing for consistency with the binary labels. The reliability of each IQA annotator is also estimated during training. Experiments demonstrate that our model outperforms state-of-the-art BIQA models in terms of correlation with human ratings in existing databases, as well as in group maximum differentiation (gMAD) competition.
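One natural way to train a quality network against binary pairwise labels while also estimating uncertainty is a Thurstone-style ranking loss. The sketch below is an assumption about the general form (the paper's exact loss and its annotator-reliability weighting are not reproduced here): the network predicts a quality mean and variance per image, and the probability that one image beats the other follows from a Gaussian model of their difference.

```python
import torch

def pairwise_rank_loss(q1, v1, q2, v2, label):
    """Thurstone-style pairwise ranking loss (illustrative sketch).

    q1, q2: predicted quality means for the two images in a pair
    v1, v2: predicted variances (uncertainty) for those predictions
    label:  1.0 if image 1 was labeled higher quality, else 0.0
    """
    std_normal = torch.distributions.Normal(0.0, 1.0)
    # Probability that image 1 is perceived as better, under a
    # Gaussian model of the quality difference:
    p = std_normal.cdf((q1 - q2) / torch.sqrt(v1 + v2 + 1e-8))
    p = p.clamp(1e-6, 1.0 - 1e-6)
    return torch.nn.functional.binary_cross_entropy(p, label)
```

Optimizing such a loss pushes the predicted means to agree with the binary labels while letting the variances absorb pairs on which the pseudo-labels disagree.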

46 citations


Posted Content
TL;DR: It is shown that deep convolutional networks systematically overfit the noise levels for which they are trained: when deployed at noise levels outside the training range, performance degrades dramatically. In contrast, a bias-free architecture, obtained by removing the constant terms in every layer of the network (including those used for batch normalization), generalizes robustly across noise levels.
Abstract: Deep convolutional networks often append additive constant ("bias") terms to their convolution operations, enabling a richer repertoire of functional mappings. Biases are also used to facilitate training, by subtracting mean response over batches of training images (a component of "batch normalization"). Recent state-of-the-art blind denoising methods (e.g., DnCNN) seem to require these terms for their success. Here, however, we show that these networks systematically overfit the noise levels for which they are trained: when deployed at noise levels outside the training range, performance degrades dramatically. In contrast, a bias-free architecture -- obtained by removing the constant terms in every layer of the network, including those used for batch normalization -- generalizes robustly across noise levels, while preserving state-of-the-art performance within the training range. Locally, the bias-free network acts linearly on the noisy image, enabling direct analysis of network behavior via standard linear-algebraic tools. These analyses provide interpretations of network functionality in terms of nonlinear adaptive filtering, and projection onto a union of low-dimensional subspaces, connecting the learning-based method to more traditional denoising methodology.
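The architectural change is small. The sketch below shows one way to build such a layer in PyTorch, assuming (as the abstract describes) that every additive constant is removed: convolutions carry no bias, and the normalization only rescales, with no mean subtraction and no learned shift. Details such as the running-statistics update are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class BiasFreeBatchNorm2d(nn.Module):
    """Batch normalization with every additive constant removed:
    no mean subtraction and no learned shift, only per-channel scaling."""
    def __init__(self, channels, eps=1e-5, momentum=0.1):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(channels))
        self.register_buffer("running_var", torch.ones(channels))
        self.eps, self.momentum = eps, momentum

    def forward(self, x):
        if self.training:
            var = x.var(dim=(0, 2, 3), unbiased=False)
            self.running_var = ((1 - self.momentum) * self.running_var
                                + self.momentum * var.detach())
        else:
            var = self.running_var
        scale = self.gamma / torch.sqrt(var + self.eps)
        return x * scale.view(1, -1, 1, 1)

def bias_free_block(c_in, c_out):
    # No additive constant anywhere: conv without bias, scale-only norm, ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        BiasFreeBatchNorm2d(c_out),
        nn.ReLU(),
    )
```

At inference (fixed running variance), every operation in such a block is linear or piecewise linear with no additive offset, so a network built from these blocks computes f(y) = A(y) y exactly, which is what makes the linear-algebraic analysis described in the abstract possible.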

38 citations


Proceedings Article
01 Jan 2019
TL;DR: This work simulates an encoding population of spiking neurons whose rates are modulated by a shared stochastic signal, and shows that a linear decoder with readout weights approximating neuron-specific modulation strength can achieve near-optimal accuracy.
Abstract: Humans and animals are capable of flexibly switching between a multitude of tasks, each requiring rapid, sensory-informed decision making. Incoming stimuli are processed by a hierarchy of neural circuits consisting of millions of neurons with diverse feature selectivity. At any given moment, only a small subset of these carry task-relevant information. In principle, downstream processing stages could identify the relevant neurons through supervised learning, but this would require many example trials. Such extensive learning periods are inconsistent with the observed flexibility of humans or animals, who can adjust to changes in task parameters or structure almost immediately. Here, we propose a novel solution based on functionally-targeted stochastic modulation. It has been observed that trial-to-trial neural activity is modulated by a shared, low-dimensional, stochastic signal that introduces task-irrelevant noise. Counter-intuitively, this noise is preferentially targeted towards task-informative neurons, corrupting the encoded signal. However, we hypothesize that this modulation offers a solution to the identification problem, labeling task-informative neurons so as to facilitate decoding. We simulate an encoding population of spiking neurons whose rates are modulated by a shared stochastic signal, and show that a linear decoder with readout weights approximating neuron-specific modulation strength can achieve near-optimal accuracy. Such a decoder allows fast and flexible task-dependent information routing without relying on hardwired knowledge of the task-informative neurons (as in maximum likelihood) or unrealistically many supervised training trials (as in regression).
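The proposed decoding scheme can be demonstrated in a few lines. The simulation below is a simplified sketch with hypothetical parameters (population size, rates, and modulation strength are all arbitrary): a shared stochastic gain multiplicatively modulates only the task-informative neurons, each neuron's modulation strength is recovered from the dominant mode of trial-to-trial residual variability, and readout weights proportional to that strength suffice to decode the stimulus.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, base = 200, 4000, 10.0
stim = rng.choice([-1.0, 1.0], size=T)        # binary discrimination task
tuning = np.zeros(N)
tuning[:10], tuning[10:20] = +2.0, -2.0       # two opposed informative pools
mod_strength = np.zeros(N)
mod_strength[:20] = 0.3                       # modulator targets informative cells
z = rng.standard_normal(T)                    # shared stochastic modulator

rates = base + tuning[None, :] * stim[:, None]
rates = rates * (1.0 + mod_strength[None, :] * z[:, None])
counts = rng.poisson(np.clip(rates, 0.0, None)).astype(float)   # (T, N)

# Trial-to-trial residuals around the stimulus-conditioned means:
resid = counts.copy()
for s in (-1.0, 1.0):
    resid[stim == s] -= counts[stim == s].mean(axis=0)

# The leading shared mode of the residuals estimates per-neuron
# modulation strength (up to sign).
_, _, vt = np.linalg.svd(resid, full_matrices=False)
strength = np.abs(vt[0])

# Readout: weights proportional to estimated modulation strength,
# signed by a coarse estimate of each neuron's stimulus preference.
pref = counts[stim == 1.0].mean(0) - counts[stim == -1.0].mean(0)
w = strength * np.sign(pref)
scores = counts @ w
pred = np.where(scores > scores.mean(), 1.0, -1.0)
print(f"decoder accuracy: {np.mean(pred == stim):.2f}")
```

Signing the weights by stimulus preference makes the shared modulator fluctuation largely cancel between the two opposed pools while the stimulus signal adds, which is why this modulation-weighted readout performs well here without any knowledge of which neurons are informative.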

7 citations


Journal ArticleDOI
01 Nov 2019
TL;DR: It is concluded that direction selectivity in MT is primarily computed by summing V1 afferents, but pattern-invariant velocity tuning for complex stimuli may arise from local, recurrent interactions.
Abstract: Motion selectivity in primary visual cortex (V1) is approximately separable in orientation, spatial frequency, and temporal frequency ("frequency-separable"). Models for area MT neurons posit that their selectivity arises by combining direction-selective V1 afferents whose tuning is organized around a tilted plane in the frequency domain, specifying a particular direction and speed ("velocity-separable"). This construction explains "pattern direction-selective" MT neurons, which are velocity-selective but relatively invariant to spatial structure, including spatial frequency, texture and shape. We designed a set of experiments to distinguish frequency-separable and velocity-separable models and executed them with single-unit recordings in macaque V1 and MT. Surprisingly, when tested with single drifting gratings, most MT neurons' responses are fit equally well by models with either form of separability. However, responses to plaids (sums of two moving gratings) tend to be better described as velocity-separable, especially for pattern neurons. We conclude that direction selectivity in MT is primarily computed by summing V1 afferents, but pattern-invariant velocity tuning for complex stimuli may arise from local, recurrent interactions.
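For reference, the "tilted plane" mentioned above is the standard frequency-domain signature of rigid translation: an image pattern moving at velocity (v_x, v_y) has all of its spatiotemporal Fourier energy on a plane through the origin. A velocity-separable MT model pools V1 afferents whose preferred frequencies lie near this plane.

```latex
% Translation at velocity (v_x, v_y):  I(x - v_x t,\, y - v_y t)
% concentrates all Fourier energy on the plane
\omega_t = -\left( v_x\,\omega_x + v_y\,\omega_y \right)
```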

7 citations


Posted ContentDOI
02 May 2019-bioRxiv
TL;DR: It is shown in simulations of a modulated Poisson spiking model that a linear decoder with readout weights proportional to the estimated neuron-specific strength of modulation achieves performance close to an optimal decoder.
Abstract: Sensory-guided behavior requires reliable encoding of information (from stimuli to neural responses) and flexible decoding (from neural responses to behavior). In typical decision tasks, a small subset of cells within a large population encode task-relevant stimulus information and need to be identified by later processing stages for relevant information to be transmitted. A statistically optimal decoder (e.g., maximum likelihood) can utilize task-relevant cells for any given task configuration, but relies on complete knowledge of the relationship between the task and the stimulus-response and noise properties of the encoding population. The brain could learn an optimal decoder for a task through supervised learning (i.e., regression), but this typically requires many training trials, and thus lacks the flexibility of humans and animals, who can rapidly adjust to changes in task parameters or structure. Here, we propose a novel decoding solution based on functionally targeted stochastic modulation. Population recordings during different discrimination tasks have revealed that a substantial portion of trial-to-trial variability in cell responses can be explained by stochastic modulatory signals that are shared, and that seem to preferentially target task-informative neurons (Rabinowitz et al., 2015). The variability introduced by these modulators corrupts the encoded stimulus signal, but we propose that it also serves as a label for the informative neurons, allowing the decoder to solve the identification problem. We show in simulations of a modulated Poisson spiking model that a linear decoder with readout weights proportional to the estimated neuron-specific strength of modulation achieves performance close to an optimal decoder.
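The modulated Poisson spiking model referenced here (introduced in earlier work by Goris, Movshon & Simoncelli) has a simple signature worth keeping in mind: if spike counts are Poisson with a rate scaled by a unit-mean stochastic gain G with variance sigma_G^2, the count variance grows quadratically with the mean rather than matching it. A short derivation via the law of total variance:

```latex
% Modulated Poisson: N \mid G \sim \mathrm{Poisson}(G\mu),
% with E[G] = 1 and \mathrm{Var}[G] = \sigma_G^2.
\mathrm{Var}[N] = E\!\left[\mathrm{Var}[N \mid G]\right]
                + \mathrm{Var}\!\left(E[N \mid G]\right)
                = E[G\mu] + \mathrm{Var}(G\mu)
                = \mu + \sigma_G^2\,\mu^2
```

It is this multiplicative gain, shared across neurons and preferentially loaded onto task-informative cells, that the proposed decoder exploits as a label.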

5 citations



Posted ContentDOI
04 Jul 2019-bioRxiv
TL;DR: Fitting a novel, generalized model of motion computation to single-unit recordings, it is concluded that direction selectivity in MT is primarily computed by summing V1 afferents, but that pattern-invariant velocity tuning for complex stimuli may arise from local, recurrent interactions.
Abstract: Motion selectivity in primary visual cortex (V1) is approximately separable in orientation, spatial frequency, and temporal frequency ("frequency-separable"). Models for area MT neurons posit that their selectivity arises by combining direction-selective V1 afferents whose tuning is organized around a tilted plane in the frequency domain, specifying a particular direction and speed ("velocity-separable"). This construction explains "pattern direction selective" MT neurons, which are velocity-selective but relatively invariant to spatial structure, including spatial frequency, texture and shape. Surprisingly, when tested with single drifting gratings, most MT neurons' responses are fit equally well by models with either form of separability. However, responses to plaids (sums of two moving gratings) tend to be better described as velocity-separable, especially for pattern neurons. We conclude that direction selectivity in MT is primarily computed by summing V1 afferents, but pattern-invariant velocity tuning for complex stimuli may arise from local, recurrent interactions. Significance Statement: How do sensory systems build representations of complex features from simpler ones? Visual motion representation in cortex is a well-studied example: the direction and speed of moving objects, regardless of shape or texture, are computed from the local motion of oriented edges. Here we quantify tuning properties based on single-unit recordings in primate area MT, then fit a novel, generalized model of motion computation. The model reveals that two core properties of MT neurons, speed tuning and invariance to local edge orientation, result from a single organizing principle: each MT neuron combines afferents that represent edge motions consistent with a common velocity, much as V1 simple cells combine thalamic inputs consistent with a common orientation.
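The plaid stimuli central to these experiments are simply sums of two drifting sinusoidal gratings. A minimal sketch (the stimulus parameters are arbitrary choices, not the paper's):

```python
import numpy as np

def drifting_grating(size, frames, sf, direction, tf, contrast=0.5):
    """Sinusoidal grating drifting in `direction` (radians).

    sf: spatial frequency in cycles/pixel; tf: temporal frequency
    in cycles/frame. Returns an array of shape (frames, size, size).
    """
    y, x = np.mgrid[0:size, 0:size]
    spatial_phase = 2 * np.pi * sf * (x * np.cos(direction)
                                      + y * np.sin(direction))
    t = np.arange(frames)[:, None, None]
    return contrast * np.sin(spatial_phase[None] - 2 * np.pi * tf * t)

# A 120-degree plaid: two gratings drifting 60 degrees to either side
# of rightward. The component motions differ, but the pattern as a
# whole translates rightward.
g1 = drifting_grating(64, 30, sf=0.1, direction=np.deg2rad(+60), tf=2/30)
g2 = drifting_grating(64, 30, sf=0.1, direction=np.deg2rad(-60), tf=2/30)
plaid = g1 + g2
```

Component-selective neurons respond to the two grating directions separately, while pattern-selective neurons respond to the single rightward pattern motion, which is what makes plaids diagnostic for distinguishing the two model forms.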

4 citations


Journal ArticleDOI
TL;DR: The original and corrected figures are shown in the accompanying Author Correction.
Abstract: The original and corrected figures are shown in the accompanying Author Correction.

1 citation


14 Sep 2019
TL;DR: It is shown that bias terms used in most CNNs interfere with the interpretability of these networks, do not help performance, and in fact prevent generalization of performance to noise levels not included in the training data.
Abstract: Deep convolutional networks often append additive constant ("bias") terms to their convolution operations, enabling a richer repertoire of functional mappings. Biases are also used to facilitate training, by subtracting mean response over batches of training images (a component of "batch normalization"). Recent state-of-the-art blind denoising methods seem to require these terms for their success. Here, however, we show that bias terms used in most CNNs (additive constants, including those used for batch normalization) interfere with the interpretability of these networks, do not help performance, and in fact prevent generalization of performance to noise levels not included in the training data. In particular, bias-free CNNs (BF-CNNs) are locally linear, and hence amenable to direct analysis with linear-algebraic tools. These analyses provide interpretations of network functionality in terms of projection onto a union of low-dimensional subspaces, connecting the learning-based method to more traditional denoising methodology. Additionally, BF-CNNs generalize robustly, achieving near-state-of-the-art performance at noise levels well beyond the range over which they have been trained.
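Local linearity makes the analysis in this abstract directly computable. Since a bias-free network satisfies f(y) = A(y) y exactly in a neighborhood of y, the row of the Jacobian at any output pixel is the adaptive filter the network applies to the noisy input. A sketch using autograd (the interface of `net` is an assumption, not a specific published model):

```python
import torch

def effective_filter(net, noisy, i, j):
    """Row of the denoiser's Jacobian at output pixel (i, j).

    net:   a bias-free denoiser mapping a (1, 1, H, W) tensor to a
           tensor of the same shape (assumed interface), in eval mode
    noisy: the input image, shape (1, 1, H, W)

    For a network with no additive constants, f(y) = A(y) y exactly,
    so this row shows how the network weights input pixels to
    reconstruct output pixel (i, j).
    """
    y = noisy.clone().requires_grad_(True)
    out = net(y)
    out[0, 0, i, j].backward()
    return y.grad[0, 0].detach()   # (H, W) weighting over input pixels
```

Inspecting these rows at different noise levels, along with the singular value decomposition of the full Jacobian, is what yields the adaptive-filtering and union-of-subspaces interpretations described above.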