
Showing papers by "Eero P. Simoncelli published in 2021"


Journal ArticleDOI
TL;DR: In this paper, the authors perform a large-scale comparison of objective image quality assessment (IQA) models in terms of their use as objectives for the optimization of image processing algorithms, using eleven full-reference IQA models to train deep neural networks for four low-level vision tasks.
Abstract: The performance of objective image quality assessment (IQA) models has been evaluated primarily by comparing model predictions to human quality judgments. Perceptual datasets gathered for this purpose have provided useful benchmarks for improving IQA methods, but their heavy use creates a risk of overfitting. Here, we perform a large-scale comparison of IQA models in terms of their use as objectives for the optimization of image processing algorithms. Specifically, we use eleven full-reference IQA models to train deep neural networks for four low-level vision tasks: denoising, deblurring, super-resolution, and compression. Subjective testing on the optimized images allows us to rank the competing models in terms of their perceptual performance, elucidate their relative advantages and disadvantages in these tasks, and propose a set of desirable properties for incorporation into future IQA models.

65 citations
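
As a rough illustration of the optimization setting described in the abstract above, the sketch below trains a toy denoising network using a full-reference IQA model as its objective. The `iqa_score` function, the network architecture, and the training data are illustrative placeholders, not the paper's implementation; any of the eleven IQA models compared in the paper (e.g., a differentiable SSIM or LPIPS) could be substituted for the placeholder.

```python
# Minimal sketch (not the paper's code): using a differentiable full-reference
# IQA model as the training objective for a denoising network.
import torch
import torch.nn as nn

def iqa_score(pred, ref):
    # Placeholder IQA model: higher is better. A real experiment would plug in
    # a differentiable implementation of e.g. SSIM, MS-SSIM, LPIPS, or DISTS.
    return -torch.mean((pred - ref) ** 2)

denoiser = nn.Sequential(                       # toy denoising CNN
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

clean = torch.rand(8, 1, 64, 64)                # stand-in training batch
noisy = clean + 0.1 * torch.randn_like(clean)

for step in range(100):
    opt.zero_grad()
    restored = denoiser(noisy)
    loss = -iqa_score(restored, clean)          # maximize perceptual quality
    loss.backward()
    opt.step()
```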


Journal Article
TL;DR: A simulation-based denoising (SBD) framework in which CNNs are trained on simulated images; it outperforms existing techniques by a wide margin on a simulated benchmark dataset as well as on real data.
Abstract: Denoising is a fundamental challenge in scientific imaging. Deep convolutional neural networks (CNNs) provide the current state of the art in denoising natural images, where they produce impressive results. However, their potential has barely been explored in the context of scientific imaging. Denoising CNNs are typically trained on real natural images artificially corrupted with simulated noise. In contrast, in scientific applications, noiseless ground-truth images are usually not available. To address this issue, we propose a simulation-based denoising (SBD) framework, in which CNNs are trained on simulated images. We test the framework on data obtained from transmission electron microscopy (TEM), an imaging technique with widespread applications in material science, biology, and medicine. SBD outperforms existing techniques by a wide margin on a simulated benchmark dataset, as well as on real data. Apart from the denoised images, SBD generates likelihood maps to visualize the agreement between the structure of the denoised image and the observed data. Our results reveal shortcomings of state-of-the-art denoising architectures, such as their small field-of-view. Through a gradient-based analysis, we show that substantially increasing the field-of-view of the CNNs allows them to exploit non-local periodic patterns in the data, which is crucial at high noise levels. In addition, we perform a thorough analysis of the generalization capability of SBD, demonstrating that the trained networks are robust to variations of imaging parameters and of the underlying signal structure. Finally, we release the first publicly available benchmark dataset of TEM images, containing 18,000 examples.

22 citations
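
The core of the SBD idea, as described above, is that training pairs are generated entirely in simulation because noiseless ground truth is unavailable for scientific images. The sketch below is a minimal stand-in for that pipeline, assuming a Poisson (shot-noise) observation model; the toy structure generator replaces the physics-based multislice simulator used in the paper.

```python
# Minimal sketch of simulation-based training data (not the paper's simulator).
import numpy as np

rng = np.random.default_rng(0)

def simulate_clean(size=128, n_atoms=40):
    # Hypothetical stand-in: sum of Gaussian blobs mimicking atomic columns.
    img = np.zeros((size, size))
    ys, xs = np.mgrid[0:size, 0:size]
    for _ in range(n_atoms):
        cy, cx = rng.uniform(0, size, 2)
        img += np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * 2.0 ** 2))
    return img

def add_shot_noise(clean, dose=5.0):
    # Electron counts are Poisson-distributed around the (scaled) clean signal.
    return rng.poisson(dose * clean).astype(float)

pairs = []
for _ in range(10):
    clean = simulate_clean()
    pairs.append((clean, add_shot_noise(clean)))
# A denoising CNN would then be trained to map each noisy image back to its
# simulated clean counterpart.
```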


Journal ArticleDOI
TL;DR: An approach based on the log-likelihood ratio test that provides a quantitative measure of the agreement between the noisy observation and the atomic-level structure in the network-denoised image is developed.
Abstract: A deep convolutional neural network has been developed to denoise atomic-resolution TEM image datasets of nanoparticles acquired using direct electron counting detectors, for applications where the image signal is severely limited by shot noise. The network was applied to a model system of CeO2-supported Pt nanoparticles. We leverage multislice image simulations to generate a large and flexible dataset for training and testing the network. The proposed network outperforms state-of-the-art denoising methods by a significant margin both on simulated and experimental test data. Factors contributing to the performance are identified, including most importantly (a) the geometry of the images used during training and (b) the size of the network's receptive field. Through a gradient-based analysis, we investigate the mechanisms learned by the network to denoise experimental images. This shows that the network exploits global and local information in the noisy measurements, for example, by adapting its filtering approach when it encounters atomic-level defects at the nanoparticle surface. Extensive analysis has been done to characterize the network's ability to correctly predict the exact atomic structure at the nanoparticle surface. Finally, we develop an approach based on the log-likelihood ratio test that provides a quantitative measure of the agreement between the noisy observation and the atomic-level structure in the network-denoised image.

18 citations
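
The likelihood-map idea lends itself to a compact illustration. The sketch below computes a pixelwise Poisson log-likelihood ratio comparing how well the denoised image, versus an alternative structure, explains the raw counts; the noise model and the choice of alternative hypothesis are assumptions here, and the paper's exact test statistic may differ.

```python
# Sketch of a pixelwise Poisson log-likelihood ratio (assumed noise model).
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.stats import poisson

def log_likelihood_ratio_map(noisy_counts, denoised, alternative):
    eps = 1e-8
    ll_denoised = poisson.logpmf(noisy_counts, denoised + eps)
    ll_alt = poisson.logpmf(noisy_counts, alternative + eps)
    return ll_denoised - ll_alt   # > 0 where the denoised structure fits better

rng = np.random.default_rng(1)
denoised = rng.uniform(1.0, 10.0, (64, 64))        # stand-in network estimate
noisy = rng.poisson(denoised)                      # simulated raw counts
llr = log_likelihood_ratio_map(noisy, denoised, uniform_filter(denoised, 5))
```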


Journal ArticleDOI
TL;DR: In this article, the authors use a specialized set of texture stimuli and two complementary discrimination tasks that explicitly reveal the opposing perceptual consequences of the tradeoff between selectivity and invariance in sensory processing.
Abstract: Sensory processing necessitates discarding some information in service of preserving and reformatting more behaviorally relevant information. Sensory neurons seem to achieve this by responding selectively to particular combinations of features in their inputs, while averaging over or ignoring irrelevant combinations. Here, we expose the perceptual implications of this tradeoff between selectivity and invariance, using stimuli and tasks that explicitly reveal their opposing effects on discrimination performance. We generate texture stimuli with statistics derived from natural photographs, and ask observers to perform two different tasks: Discrimination between images drawn from families with different statistics, and discrimination between image samples with identical statistics. For both tasks, the performance of an ideal observer improves with stimulus size. In contrast, humans become better at family discrimination but worse at sample discrimination. We demonstrate through simulations that these behaviors arise naturally in an observer model that relies on a common set of physiologically plausible local statistical measurements for both tasks. Visual processing necessitates both extracting and discarding information. Here, the authors use a specialized set of stimuli and two complementary discrimination tasks to demonstrate the opposing perceptual implications of these two aspects of information processing.

10 citations
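
To make the selectivity-invariance tradeoff concrete, the following toy simulation (an assumption, not the paper's observer model) pools a fixed set of noisy local measurements over stimuli of increasing size: pooling sharpens discrimination between families with different statistics while washing out the differences between individual samples from the same family.

```python
# Illustrative simulation of pooling local statistical measurements.
import numpy as np

rng = np.random.default_rng(2)
internal_noise = 0.2      # fixed observer noise on the pooled statistic

for n_local in (4, 16, 64, 256):
    # Family task: family A has local-statistic mean 0, family B mean 0.5.
    a = rng.normal(0.0, 1.0, (5000, n_local)).mean(axis=1)
    b = rng.normal(0.5, 1.0, (5000, n_local)).mean(axis=1)
    d_family = (b.mean() - a.mean()) / np.sqrt(
        0.5 * (a.var() + b.var()) + internal_noise ** 2)

    # Sample task: two specific samples from family A differ only in their
    # particular local values; pooling shrinks that difference toward zero.
    s1, s2 = rng.normal(0.0, 1.0, (2, 5000, n_local)).mean(axis=2)
    d_sample = np.abs(s1 - s2).mean() / internal_noise

    print(f"N={n_local:4d}  family d'={d_family:4.2f}  sample d'={d_sample:4.2f}")
```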


Journal ArticleDOI
TL;DR: In this article, the authors found that V1 populations straighten the temporal trajectories of natural image sequences, that this effect arises in part from the computational mechanisms underlying the stimulus selectivity of V1 cells, and that the early visual system thereby builds representations that can support prediction in the natural environment.
Abstract: Many sensory-driven behaviors rely on predictions about future states of the environment. Visual input typically evolves along complex temporal trajectories that are difficult to extrapolate. We test the hypothesis that spatial processing mechanisms in the early visual system facilitate prediction by constructing neural representations that follow straighter temporal trajectories. We recorded V1 population activity in anesthetized macaques while presenting static frames taken from brief video clips, and developed a procedure to measure the curvature of the associated neural population trajectory. We found that V1 populations straighten naturally occurring image sequences, but entangle artificial sequences that contain unnatural temporal transformations. We show that these effects arise in part from computational mechanisms that underlie the stimulus selectivity of V1 cells. Together, our findings reveal that the early visual system uses a set of specialized computations to build representations that can support prediction in the natural environment. Many behaviours depend on predictions about the environment. Here the authors find neural populations in primary visual cortex to straighten the temporal trajectories of natural video clips, facilitating the extrapolation of past observations.

7 citations
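
A minimal way to quantify trajectory straightness, in the spirit of the abstract above, is the average angle between successive displacement vectors of the population response. The sketch below implements that discrete curvature measure; it omits the noise corrections a real neural estimator would need.

```python
# Discrete curvature of a response trajectory: mean angle between successive
# displacement vectors (0 degrees for a perfectly straight trajectory).
import numpy as np

def mean_curvature(trajectory):
    """trajectory: array of shape (n_frames, n_dims)."""
    diffs = np.diff(trajectory, axis=0)
    diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)
    cosines = np.sum(diffs[:-1] * diffs[1:], axis=1)
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0))).mean()

rng = np.random.default_rng(3)
straight = np.outer(np.arange(10), rng.normal(size=50))   # straight line: ~0 deg
random_pts = rng.normal(size=(10, 50))                     # independent points: ~120 deg
print(mean_curvature(straight), mean_curvature(random_pts))
```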


Journal ArticleDOI
TL;DR: The authors found that the IT neural activity pattern that best aligns with single-exposure visual recognition memory behavior is not repetition suppression but rather sensory referenced suppression: reductions in IT population response magnitude, corrected for sensory modulation.
Abstract: Memories of the images that we have seen are thought to be reflected in the reduction of neural responses in high-level visual areas such as inferotemporal (IT) cortex, a phenomenon known as repetition suppression (RS). We challenged this hypothesis with a task that required rhesus monkeys to report whether images were novel or repeated while ignoring variations in contrast, a stimulus attribute that is also known to modulate the overall IT response. The monkeys' behavior was largely contrast invariant, contrary to the predictions of an RS-inspired decoder, which could not distinguish responses to images that are repeated from those that are of lower contrast. However, the monkeys' behavioral patterns were well predicted by a linearly decodable variant in which the total spike count was corrected for contrast modulation. These results suggest that the IT neural activity pattern that best aligns with single-exposure visual recognition memory behavior is not RS but rather sensory referenced suppression: reductions in IT population response magnitude, corrected for sensory modulation.

7 citations
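
The contrast between the two decoders can be illustrated with synthetic data. In the toy simulation below (an assumption, not the paper's analysis), a repetition-suppression decoder thresholds the raw population count and confuses repeats with low-contrast novel images, whereas a sensory-referenced decoder first removes the contrast-predicted component of the response.

```python
# Toy comparison of an RS decoder vs a sensory-referenced (SRS) decoder.
import numpy as np

rng = np.random.default_rng(4)
n_trials = 2000
contrast = rng.uniform(0.2, 1.0, n_trials)
repeated = rng.integers(0, 2, n_trials).astype(bool)

# Toy response model: count grows with contrast and drops ~20% on repeats.
count = 100 * contrast * np.where(repeated, 0.8, 1.0) + rng.normal(0, 5, n_trials)

rs_pred = count < np.median(count)         # RS decoder: low total count = repeat
resid = count - 100 * contrast             # remove the contrast-predicted part
srs_pred = resid < np.median(resid)        # SRS decoder: corrected count = repeat

print("RS accuracy: ", (rs_pred == repeated).mean())
print("SRS accuracy:", (srs_pred == repeated).mean())
```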


Posted Content
TL;DR: In this paper, a deep convolutional neural network was developed to denoise atomic-resolution TEM image datasets of nanoparticles acquired using direct electron counting detectors, for applications where the image signal is severely limited by shot noise.
Abstract: A deep convolutional neural network has been developed to denoise atomic-resolution TEM image datasets of nanoparticles acquired using direct electron counting detectors, for applications where the image signal is severely limited by shot noise. The network was applied to a model system of CeO2-supported Pt nanoparticles. We leverage multislice image simulations to generate a large and flexible dataset for training and testing the network. The proposed network outperforms state-of-the-art denoising methods by a significant margin both on simulated and experimental test data. Factors contributing to the performance are identified, including most importantly (a) the geometry of the images used during training and (b) the size of the network's receptive field. Through a gradient-based analysis, we investigate the mechanisms learned by the network to denoise experimental images. This shows that the network exploits global and local information in the noisy measurements, for example, by adapting its filtering approach when it encounters atomic-level defects at the nanoparticle surface. Extensive analysis has been done to characterize the network's ability to correctly predict the exact atomic structure at the nanoparticle surface. Finally, we develop an approach based on the log-likelihood ratio test that provides a quantitative measure of the agreement between the noisy observation and the atomic-level structure in the network-denoised image.

3 citations


Posted ContentDOI
23 Feb 2021-bioRxiv
TL;DR: In this article, the authors propose a framework in which shared stochastic modulation of task-informative neurons serves as a label to facilitate downstream decoding, and demonstrate that this modulator label can be used to improve downstream decoding within a small number of training trials, consistent with observed behavior.
Abstract: Sensory-guided behavior requires reliable encoding of stimulus information in neural responses, and task-specific decoding through selective combination of these responses. The former has been the topic of intensive study, but the latter remains largely a mystery. We propose a framework in which shared stochastic modulation of task-informative neurons serves as a label to facilitate downstream decoding. Theoretical analysis and computational simulations demonstrate that a decoder that exploits such a signal can achieve flexible and accurate readout. Using this theoretical framework, we analyze behavioral and physiological data obtained from monkeys performing a visual orientation discrimination task. The responses of recorded V1 neurons exhibit strongly correlated modulation. This modulation is stronger in those neurons that are most informative for the behavioral task and it is substantially reduced in a control condition where recorded neurons are uninformative. We demonstrate that this modulator label can be used to improve downstream decoding within a small number of training trials, consistent with observed behavior. Finally, we find that the trial-by-trial modulatory signal estimated from V1 populations is also present in the activity of simultaneously recorded MT units, and preferentially so if they are task-informative, supporting the hypothesis that it serves as a label for the selection and decoding of relevant downstream neurons.

2 citations
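
The proposed labeling mechanism can be sketched with a toy population: a shared stochastic gain applied preferentially to task-informative neurons lets a downstream readout identify those neurons simply by correlating each neuron's trial-to-trial variability with an estimate of the shared modulator. The simulation below is an illustrative assumption, not the paper's model or analysis.

```python
# Toy simulation: shared stochastic modulation as a "label" for decoding.
import numpy as np

rng = np.random.default_rng(5)
n_trials, n_neurons, n_informative = 500, 60, 15

stim = rng.integers(0, 2, n_trials)            # two-alternative task variable
modulator = rng.gamma(5.0, 0.2, n_trials)      # shared stochastic gain, mean ~1

rates = np.full((n_trials, n_neurons), 10.0)
rates[:, :n_informative] += 4.0 * stim[:, None]     # informative neurons are tuned...
rates[:, :n_informative] *= modulator[:, None]      # ...and carry the shared gain
spikes = rng.poisson(rates)

# Estimate the modulator from the population, then label each neuron by how
# strongly its trial-to-trial residual tracks that estimate.
est_mod = spikes.mean(axis=1) / spikes.mean()
resid = spikes - spikes.mean(axis=0)
label = np.array([np.corrcoef(est_mod, resid[:, i])[0, 1] for i in range(n_neurons)])
selected = np.argsort(label)[-n_informative:]
print("fraction of selected neurons that are informative:",
      np.isin(selected, np.arange(n_informative)).mean())
```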


Posted ContentDOI
09 Feb 2021-bioRxiv
TL;DR: The authors showed that the IT neural activity pattern that best aligns with single-exposure visual recognition memory behavior is not repetition suppression (RS), but rather sensory referenced suppression (SRS): reductions in IT population response magnitude, corrected for sensory modulation.
Abstract: Memories of the images that we have seen are thought to be reflected in the reduction of neural responses in high-level visual areas such as inferotemporal (IT) cortex, a phenomenon known as repetition suppression (RS). We challenged this hypothesis with a task that required rhesus monkeys to report whether images were novel or repeated while ignoring variations in contrast, a stimulus attribute that is also known to modulate the overall IT response. The monkeys' behavior was largely contrast-invariant, contrary to the predictions of an RS-inspired decoder, which could not distinguish responses to images that are repeated from those of lower contrast. However, the monkeys' behavioral patterns were well-predicted by a linearly decodable variant in which the total spike count was corrected for contrast modulation. These results suggest that the IT neural activity pattern that best aligns with single-exposure visual recognition memory behavior is not RS but rather "sensory referenced suppression (SRS)": reductions in IT population response magnitude, corrected for sensory modulation.

2 citations



Posted Content
19 Jan 2021
TL;DR: A deep learning-based convolutional neural network is developed to denoise atomic-resolution in situ TEM image datasets of catalyst nanoparticles acquired on high speed, direct electron counting detectors, where the signal is severely limited by shot noise.
Abstract: A deep learning-based convolutional neural network has been developed to denoise atomic-resolution in situ TEM image datasets of catalyst nanoparticles acquired on high speed, direct electron counting detectors, where the signal is severely limited by shot noise. The network was applied to a model catalyst of CeO2-supported Pt nanoparticles. We leverage multislice simulation to generate a large and flexible dataset for training and testing the network. The proposed network outperforms state-of-the-art denoising methods by a significant margin both on simulated and experimental test data. Factors contributing to the performance are identified, including most importantly (a) the geometry of the images used during training and (b) the size of the network's receptive field. Through a gradient-based analysis, we investigate the mechanisms used by the network to denoise experimental images. This shows that the network exploits information on the surrounding structure and that it adapts its filtering approach when it encounters atomic-level defects at the catalyst surface. Extensive analysis has been done to characterize the network's ability to correctly predict the exact atomic structure at the catalyst surface. Finally, we develop an approach based on the log-likelihood ratio test that provides a quantitative measure of uncertainty regarding the atomic-level structure in the network-denoised image.



Posted Content
TL;DR: GainTuning adapts pre-trained denoising CNNs to individual test images; to avoid overfitting, it optimizes only a single multiplicative scaling parameter (the "Gain") for each channel in the convolutional layers of the CNN.
Abstract: Deep convolutional neural networks (CNNs) for image denoising are usually trained on large datasets. These models achieve the current state of the art, but they have difficulties generalizing when applied to data that deviate from the training distribution. Recent work has shown that it is possible to train denoisers on a single noisy image. These models adapt to the features of the test image, but their performance is limited by the small amount of information used to train them. Here we propose "GainTuning", in which CNN models pre-trained on large datasets are adaptively and selectively adjusted for individual test images. To avoid overfitting, GainTuning optimizes a single multiplicative scaling parameter (the "Gain") of each channel in the convolutional layers of the CNN. We show that GainTuning improves state-of-the-art CNNs on standard image-denoising benchmarks, boosting their denoising performance on nearly every image in a held-out test set. These adaptive improvements are even more substantial for test images differing systematically from the training data, either in noise level or image type. We illustrate the potential of adaptive denoising in a scientific application, in which a CNN is trained on synthetic data, and tested on real transmission-electron-microscope images. In contrast to the existing methodology, GainTuning is able to faithfully reconstruct the structure of catalytic nanoparticles from these data at extremely low signal-to-noise ratios.
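
A minimal sketch of the GainTuning idea is shown below: the weights of a pre-trained denoiser stay frozen, one learnable multiplicative gain is attached to each channel of each convolutional layer, and only those gains are optimized on the single test image. The network, data, and especially the adaptation objective here are simple stand-ins; the paper uses a self-supervised loss matched to the noise model.

```python
# Sketch of per-channel gain adaptation in the spirit of GainTuning.
import torch
import torch.nn as nn

class GainWrapper(nn.Module):
    """Wraps a conv layer with a learnable per-channel multiplicative gain."""
    def __init__(self, conv):
        super().__init__()
        self.conv = conv
        self.gain = nn.Parameter(torch.ones(conv.out_channels))

    def forward(self, x):
        return self.conv(x) * self.gain.view(1, -1, 1, 1)

# Pre-trained denoiser (toy stand-in); its weights stay frozen.
denoiser = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
for p in denoiser.parameters():
    p.requires_grad_(False)

# Replace each conv with a gain-wrapped copy; only the gains are trainable.
adapted = nn.Sequential(*[GainWrapper(m) if isinstance(m, nn.Conv2d) else m
                          for m in denoiser])
gains = [m.gain for m in adapted if isinstance(m, GainWrapper)]
opt = torch.optim.Adam(gains, lr=1e-2)

noisy = torch.rand(1, 1, 64, 64)        # the single test image
for step in range(50):
    opt.zero_grad()
    out = adapted(noisy)
    # Placeholder adaptation objective: penalize disagreement with the noisy
    # observation (a real objective would be e.g. SURE or a blind-spot loss).
    loss = ((out - noisy) ** 2).mean()
    loss.backward()
    opt.step()
```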