
Showing papers by "Eero P. Simoncelli published in 2019"


Journal ArticleDOI
TL;DR: A methodology for estimating the curvature of an internal trajectory from human perceptual judgments is developed and used to test three distinct predictions: natural sequences that are highly curved in the space of pixel intensities should be substantially straighter perceptually; in contrast, artificial sequences that are straight in the intensity domain should be more curved perceptually; finally, naturalistic sequences that are straight in the intensity domain should be relatively less curved.
Abstract: Many behaviors rely on predictions derived from recent visual input, but the temporal evolution of those inputs is generally complex and difficult to extrapolate. We propose that the visual system transforms these inputs to follow straighter temporal trajectories. To test this ‘temporal straightening’ hypothesis, we develop a methodology for estimating the curvature of an internal trajectory from human perceptual judgments. We use this to test three distinct predictions: natural sequences that are highly curved in the space of pixel intensities should be substantially straighter perceptually; in contrast, artificial sequences that are straight in the intensity domain should be more curved perceptually; finally, naturalistic sequences that are straight in the intensity domain should be relatively less curved. Perceptual data validate all three predictions, as do population models of the early visual system, providing evidence that the visual system specifically straightens natural videos, offering a solution for tasks that rely on prediction. The brain predicts future sensory input. The authors hypothesize that the visual system achieves this by straightening the temporal trajectories of natural videos, and they provide evidence using human perceptual experiments and computational modeling.
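The curvature measure at the heart of this study can be illustrated concretely. Below is a minimal sketch (not the authors' perceptual estimator, which is inferred from human judgments) that computes the average discrete curvature of a trajectory as the turning angle between successive difference vectors; the frame size and test trajectories are arbitrary choices for illustration.

```python
import numpy as np

def mean_curvature(frames):
    """Average discrete curvature of a trajectory.

    frames: array of shape (T, D), one point per time step
    (e.g., video frames flattened into pixel vectors).
    Curvature at step t is the angle between the successive
    difference vectors x[t+1] - x[t] and x[t+2] - x[t+1].
    """
    diffs = np.diff(frames, axis=0)                     # (T-1, D)
    diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)
    cosines = np.sum(diffs[:-1] * diffs[1:], axis=1)    # cos of turning angles
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return np.degrees(angles.mean())

t = np.linspace(0, 1, 10)[:, None]
print(mean_curvature(t * np.ones((1, 100))))     # straight path: 0 degrees
print(mean_curvature(np.random.randn(10, 100)))  # i.i.d. frames: ~120 degrees
```

Under this measure, "perceptual straightening" means that the same frames, mapped through a model of the visual system, trace a trajectory with a smaller mean turning angle than they do in pixel space.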

63 citations


Proceedings ArticleDOI
01 Sep 2019
TL;DR: This work develops a blind IQA (BIQA) model, along with a method of training it without human ratings, and demonstrates that the model outperforms state-of-the-art BIQA models in terms of correlation with human ratings in existing databases, as well as in group maximum differentiation (gMAD) competition.
Abstract: Models for image quality assessment (IQA) are generally optimized and tested by comparing to human ratings, which are expensive to obtain. Here, we develop a blind IQA (BIQA) model, and a method of training it without human ratings. We first generate a large number of corrupted image pairs, and use a set of existing IQA models to identify which image of each pair has higher quality. We then train a convolutional neural network to estimate perceived image quality along with its uncertainty, optimizing for consistency with the binary labels. The reliability of each IQA annotator is also estimated during training. Experiments demonstrate that our model outperforms state-of-the-art BIQA models in terms of correlation with human ratings in existing databases, as well as in group maximum differentiation (gMAD) competition.
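One natural way to train a quality network against binary pairwise labels while also estimating uncertainty is a Thurstone-style ranking loss. The sketch below is an assumption about the general form (the paper's exact loss and its annotator-reliability weighting are not reproduced here): the network predicts a quality mean and variance per image, and the probability that one image beats the other follows from a Gaussian model of their difference.

```python
import torch

def pairwise_rank_loss(q1, v1, q2, v2, label):
    """Thurstone-style pairwise ranking loss (illustrative sketch).

    q1, q2: predicted quality means for the two images in a pair
    v1, v2: predicted variances (uncertainty) for those predictions
    label:  1.0 if image 1 was labeled higher quality, else 0.0
    """
    std_normal = torch.distributions.Normal(0.0, 1.0)
    # Probability that image 1 is perceived as better, under a
    # Gaussian model of the quality difference:
    p = std_normal.cdf((q1 - q2) / torch.sqrt(v1 + v2 + 1e-8))
    p = p.clamp(1e-6, 1.0 - 1e-6)
    return torch.nn.functional.binary_cross_entropy(p, label)
```

Optimizing such a loss pushes the predicted means to agree with the binary labels while letting the variances absorb pairs on which the pseudo-labels disagree.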

46 citations


Posted Content
TL;DR: It is shown that deep convolutional networks systematically overfit the noise levels for which they are trained: when deployed at noise levels outside the training range, performance degrades dramatically. In contrast, a bias-free architecture, obtained by removing the constant terms in every layer of the network (including those used for batch normalization), generalizes robustly across noise levels.
Abstract: Deep convolutional networks often append additive constant ("bias") terms to their convolution operations, enabling a richer repertoire of functional mappings. Biases are also used to facilitate training, by subtracting mean response over batches of training images (a component of "batch normalization"). Recent state-of-the-art blind denoising methods (e.g., DnCNN) seem to require these terms for their success. Here, however, we show that these networks systematically overfit the noise levels for which they are trained: when deployed at noise levels outside the training range, performance degrades dramatically. In contrast, a bias-free architecture -- obtained by removing the constant terms in every layer of the network, including those used for batch normalization -- generalizes robustly across noise levels, while preserving state-of-the-art performance within the training range. Locally, the bias-free network acts linearly on the noisy image, enabling direct analysis of network behavior via standard linear-algebraic tools. These analyses provide interpretations of network functionality in terms of nonlinear adaptive filtering, and projection onto a union of low-dimensional subspaces, connecting the learning-based method to more traditional denoising methodology.
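The architectural change is small. The sketch below shows one way to build such a layer in PyTorch, assuming (as the abstract describes) that every additive constant is removed: convolutions carry no bias, and the normalization only rescales, with no mean subtraction and no learned shift. Details such as the running-statistics update are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class BiasFreeBatchNorm2d(nn.Module):
    """Batch normalization with every additive constant removed:
    no mean subtraction and no learned shift, only per-channel scaling."""
    def __init__(self, channels, eps=1e-5, momentum=0.1):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(channels))
        self.register_buffer("running_var", torch.ones(channels))
        self.eps, self.momentum = eps, momentum

    def forward(self, x):
        if self.training:
            var = x.var(dim=(0, 2, 3), unbiased=False)
            self.running_var = ((1 - self.momentum) * self.running_var
                                + self.momentum * var.detach())
        else:
            var = self.running_var
        scale = self.gamma / torch.sqrt(var + self.eps)
        return x * scale.view(1, -1, 1, 1)

def bias_free_block(c_in, c_out):
    # No additive constant anywhere: conv without bias, scale-only norm, ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        BiasFreeBatchNorm2d(c_out),
        nn.ReLU(),
    )
```

At inference (fixed running variance), every operation in such a block is linear or piecewise linear with no additive offset, so a network built from these blocks computes f(y) = A(y) y exactly, which is what makes the linear-algebraic analysis described in the abstract possible.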

38 citations


Proceedings Article
01 Jan 2019
TL;DR: This work simulates an encoding population of spiking neurons whose rates are modulated by a shared stochastic signal, and shows that a linear decoder with readout weights approximating neuron-specific modulation strength can achieve near-optimal accuracy.
Abstract: Humans and animals are capable of flexibly switching between a multitude of tasks, each requiring rapid, sensory-informed decision making. Incoming stimuli are processed by a hierarchy of neural circuits consisting of millions of neurons with diverse feature selectivity. At any given moment, only a small subset of these carry task-relevant information. In principle, downstream processing stages could identify the relevant neurons through supervised learning, but this would require many example trials. Such extensive learning periods are inconsistent with the observed flexibility of humans or animals, who can adjust to changes in task parameters or structure almost immediately. Here, we propose a novel solution based on functionally-targeted stochastic modulation. It has been observed that trial-to-trial neural activity is modulated by a shared, low-dimensional, stochastic signal that introduces task-irrelevant noise. Counter-intuitively, this noise is preferentially targeted towards task-informative neurons, corrupting the encoded signal. However, we hypothesize that this modulation offers a solution to the identification problem, labeling task-informative neurons so as to facilitate decoding. We simulate an encoding population of spiking neurons whose rates are modulated by a shared stochastic signal, and show that a linear decoder with readout weights approximating neuron-specific modulation strength can achieve near-optimal accuracy. Such a decoder allows fast and flexible task-dependent information routing without relying on hardwired knowledge of the task-informative neurons (as in maximum likelihood) or unrealistically many supervised training trials (as in regression).
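The proposed decoding scheme can be demonstrated in a few lines. The simulation below is a simplified sketch with hypothetical parameters (population size, rates, and modulation strength are all arbitrary): a shared stochastic gain multiplicatively modulates only the task-informative neurons, each neuron's modulation strength is recovered from the dominant mode of trial-to-trial residual variability, and readout weights proportional to that strength suffice to decode the stimulus.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, base = 200, 4000, 10.0
stim = rng.choice([-1.0, 1.0], size=T)        # binary discrimination task
tuning = np.zeros(N)
tuning[:10], tuning[10:20] = +2.0, -2.0       # two opposed informative pools
mod_strength = np.zeros(N)
mod_strength[:20] = 0.3                       # modulator targets informative cells
z = rng.standard_normal(T)                    # shared stochastic modulator

rates = base + tuning[None, :] * stim[:, None]
rates = rates * (1.0 + mod_strength[None, :] * z[:, None])
counts = rng.poisson(np.clip(rates, 0.0, None)).astype(float)   # (T, N)

# Trial-to-trial residuals around the stimulus-conditioned means:
resid = counts.copy()
for s in (-1.0, 1.0):
    resid[stim == s] -= counts[stim == s].mean(axis=0)

# The leading shared mode of the residuals estimates per-neuron
# modulation strength (up to sign).
_, _, vt = np.linalg.svd(resid, full_matrices=False)
strength = np.abs(vt[0])

# Readout: weights proportional to estimated modulation strength,
# signed by a coarse estimate of each neuron's stimulus preference.
pref = counts[stim == 1.0].mean(0) - counts[stim == -1.0].mean(0)
w = strength * np.sign(pref)
scores = counts @ w
pred = np.where(scores > scores.mean(), 1.0, -1.0)
print(f"decoder accuracy: {np.mean(pred == stim):.2f}")
```

Signing the weights by stimulus preference makes the shared modulator fluctuation largely cancel between the two opposed pools while the stimulus signal adds, which is why this modulation-weighted readout performs well here without any knowledge of which neurons are informative.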

7 citations


Journal ArticleDOI
01 Nov 2019
TL;DR: It is concluded that direction selectivity in MT is primarily computed by summing V1 afferents, but pattern-invariant velocity tuning for complex stimuli may arise from local, recurrent interactions.
Abstract: Motion selectivity in primary visual cortex (V1) is approximately separable in orientation, spatial frequency, and temporal frequency ("frequency-separable"). Models for area MT neurons posit that their selectivity arises by combining direction-selective V1 afferents whose tuning is organized around a tilted plane in the frequency domain, specifying a particular direction and speed ("velocity-separable"). This construction explains "pattern direction-selective" MT neurons, which are velocity-selective but relatively invariant to spatial structure, including spatial frequency, texture and shape. We designed a set of experiments to distinguish frequency-separable and velocity-separable models and executed them with single-unit recordings in macaque V1 and MT. Surprisingly, when tested with single drifting gratings, most MT neurons' responses are fit equally well by models with either form of separability. However, responses to plaids (sums of two moving gratings) tend to be better described as velocity-separable, especially for pattern neurons. We conclude that direction selectivity in MT is primarily computed by summing V1 afferents, but pattern-invariant velocity tuning for complex stimuli may arise from local, recurrent interactions.
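For reference, the "tilted plane" mentioned above is the standard frequency-domain signature of rigid translation: an image pattern moving at velocity (v_x, v_y) has all of its spatiotemporal Fourier energy on a plane through the origin. A velocity-separable MT model pools V1 afferents whose preferred frequencies lie near this plane.

```latex
% Translation at velocity (v_x, v_y):  I(x - v_x t,\, y - v_y t)
% concentrates all Fourier energy on the plane
\omega_t = -\left( v_x\,\omega_x + v_y\,\omega_y \right)
```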

7 citations


Posted ContentDOI
02 May 2019-bioRxiv
TL;DR: It is shown in simulations of a modulated Poisson spiking model that a linear decoder with readout weights proportional to the estimated neuron-specific strength of modulation achieves performance close to an optimal decoder.
Abstract: Sensory-guided behavior requires reliable encoding of information (from stimuli to neural responses) and flexible decoding (from neural responses to behavior). In typical decision tasks, a small subset of cells within a large population encode task-relevant stimulus information and need to be identified by later processing stages for relevant information to be transmitted. A statistically optimal decoder (e.g., maximum likelihood) can utilize task-relevant cells for any given task configuration, but relies on complete knowledge of the relationship between the task and the stimulus-response and noise properties of the encoding population. The brain could learn an optimal decoder for a task through supervised learning (i.e., regression), but this typically requires many training trials, and thus lacks the flexibility of humans and animals, who can rapidly adjust to changes in task parameters or structure. Here, we propose a novel decoding solution based on functionally targeted stochastic modulation. Population recordings during different discrimination tasks have revealed that a substantial portion of trial-to-trial variability in cell responses can be explained by stochastic modulatory signals that are shared, and that seem to preferentially target task-informative neurons (Rabinowitz et al., 2015). The variability introduced by these modulators corrupts the encoded stimulus signal, but we propose that it also serves as a label for the informative neurons, allowing the decoder to solve the identification problem. We show in simulations of a modulated Poisson spiking model that a linear decoder with readout weights proportional to the estimated neuron-specific strength of modulation achieves performance close to an optimal decoder.
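The modulated Poisson spiking model referenced here (introduced in earlier work by Goris, Movshon & Simoncelli) has a simple signature worth keeping in mind: if spike counts are Poisson with a rate scaled by a unit-mean stochastic gain G with variance sigma_G^2, the count variance grows quadratically with the mean rather than matching it. A short derivation via the law of total variance:

```latex
% Modulated Poisson: N \mid G \sim \mathrm{Poisson}(G\mu),
% with E[G] = 1 and \mathrm{Var}[G] = \sigma_G^2.
\mathrm{Var}[N] = E\!\left[\mathrm{Var}[N \mid G]\right]
                + \mathrm{Var}\!\left(E[N \mid G]\right)
                = E[G\mu] + \mathrm{Var}(G\mu)
                = \mu + \sigma_G^2\,\mu^2
```

It is this multiplicative gain, shared across neurons and preferentially loaded onto task-informative cells, that the proposed decoder exploits as a label.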

5 citations



Posted ContentDOI
04 Jul 2019-bioRxiv
TL;DR: Fitting a novel, generalized model of motion computation to single-unit recordings, it is concluded that direction selectivity in MT is primarily computed by summing V1 afferents, but that pattern-invariant velocity tuning for complex stimuli may arise from local, recurrent interactions.
Abstract: Motion selectivity in primary visual cortex (V1) is approximately separable in orientation, spatial frequency, and temporal frequency ("frequency-separable"). Models for area MT neurons posit that their selectivity arises by combining direction-selective V1 afferents whose tuning is organized around a tilted plane in the frequency domain, specifying a particular direction and speed ("velocity-separable"). This construction explains "pattern direction selective" MT neurons, which are velocity-selective but relatively invariant to spatial structure, including spatial frequency, texture and shape. Surprisingly, when tested with single drifting gratings, most MT neurons' responses are fit equally well by models with either form of separability. However, responses to plaids (sums of two moving gratings) tend to be better described as velocity-separable, especially for pattern neurons. We conclude that direction selectivity in MT is primarily computed by summing V1 afferents, but pattern-invariant velocity tuning for complex stimuli may arise from local, recurrent interactions. Significance Statement: How do sensory systems build representations of complex features from simpler ones? Visual motion representation in cortex is a well-studied example: the direction and speed of moving objects, regardless of shape or texture, are computed from the local motion of oriented edges. Here we quantify tuning properties based on single-unit recordings in primate area MT, then fit a novel, generalized model of motion computation. The model reveals that two core properties of MT neurons, speed tuning and invariance to local edge orientation, result from a single organizing principle: each MT neuron combines afferents that represent edge motions consistent with a common velocity, much as V1 simple cells combine thalamic inputs consistent with a common orientation.
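The plaid stimuli central to these experiments are simply sums of two drifting sinusoidal gratings. A minimal sketch (the stimulus parameters are arbitrary choices, not the paper's):

```python
import numpy as np

def drifting_grating(size, frames, sf, direction, tf, contrast=0.5):
    """Sinusoidal grating drifting in `direction` (radians).

    sf: spatial frequency in cycles/pixel; tf: temporal frequency
    in cycles/frame. Returns an array of shape (frames, size, size).
    """
    y, x = np.mgrid[0:size, 0:size]
    spatial_phase = 2 * np.pi * sf * (x * np.cos(direction)
                                      + y * np.sin(direction))
    t = np.arange(frames)[:, None, None]
    return contrast * np.sin(spatial_phase[None] - 2 * np.pi * tf * t)

# A 120-degree plaid: two gratings drifting 60 degrees to either side
# of rightward. The component motions differ, but the pattern as a
# whole translates rightward.
g1 = drifting_grating(64, 30, sf=0.1, direction=np.deg2rad(+60), tf=2/30)
g2 = drifting_grating(64, 30, sf=0.1, direction=np.deg2rad(-60), tf=2/30)
plaid = g1 + g2
```

Component-selective neurons respond to the two grating directions separately, while pattern-selective neurons respond to the single rightward pattern motion, which is what makes plaids diagnostic for distinguishing the two model forms.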

4 citations


Journal ArticleDOI
TL;DR: The original and corrected figures are shown in the accompanying Author Correction.
Abstract: The original and corrected figures are shown in the accompanying Author Correction.

1 citation


14 Sep 2019
TL;DR: It is shown that bias terms used in most CNNs interfere with the interpretability of these networks, do not help performance, and in fact prevent generalization of performance to noise levels not included in the training data.
Abstract: Deep convolutional networks often append additive constant ("bias") terms to their convolution operations, enabling a richer repertoire of functional mappings. Biases are also used to facilitate training, by subtracting mean response over batches of training images (a component of "batch normalization"). Recent state-of-the-art blind denoising methods seem to require these terms for their success. Here, however, we show that bias terms used in most CNNs (additive constants, including those used for batch normalization) interfere with the interpretability of these networks, do not help performance, and in fact prevent generalization of performance to noise levels not included in the training data. In particular, bias-free CNNs (BF-CNNs) are locally linear, and hence amenable to direct analysis with linear-algebraic tools. These analyses provide interpretations of network functionality in terms of projection onto a union of low-dimensional subspaces, connecting the learning-based method to more traditional denoising methodology. Additionally, BF-CNNs generalize robustly, achieving near-state-of-the-art performance at noise levels well beyond the range over which they have been trained.
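Local linearity makes the analysis in this abstract directly computable. Since a bias-free network satisfies f(y) = A(y) y exactly in a neighborhood of y, the row of the Jacobian at any output pixel is the adaptive filter the network applies to the noisy input. A sketch using autograd (the interface of `net` is an assumption, not a specific published model):

```python
import torch

def effective_filter(net, noisy, i, j):
    """Row of the denoiser's Jacobian at output pixel (i, j).

    net:   a bias-free denoiser mapping a (1, 1, H, W) tensor to a
           tensor of the same shape (assumed interface), in eval mode
    noisy: the input image, shape (1, 1, H, W)

    For a network with no additive constants, f(y) = A(y) y exactly,
    so this row shows how the network weights input pixels to
    reconstruct output pixel (i, j).
    """
    y = noisy.clone().requires_grad_(True)
    out = net(y)
    out[0, 0, i, j].backward()
    return y.grad[0, 0].detach()   # (H, W) weighting over input pixels
```

Inspecting these rows at different noise levels, along with the singular value decomposition of the full Jacobian, is what yields the adaptive-filtering and union-of-subspaces interpretations described above.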