Author

Eero P. Simoncelli

Bio: Eero P. Simoncelli is an academic researcher at the Center for Neural Science. He has contributed to research on topics including wavelets and image processing. He has an h-index of 81 and has co-authored 260 publications receiving 68,623 citations. Previous affiliations of Eero P. Simoncelli include New York University and Stanford University.


Papers
Journal Article
TL;DR: A population model for mid-ventral processing is developed, in which nonlinear combinations of V1 responses are averaged in receptive fields that grow with eccentricity, providing a quantitative framework for assessing the capabilities and limitations of everyday vision.
Abstract: Receptive fields of visual neurons get bigger along the ventral visual pathway and, in each area, they grow with distance from the fovea. The authors exploit these properties to build a model for visual representation in the ventral stream, using 'metameric' visual stimuli (which appear perceptually identical, but are actually different) to test the model predictions. The model can also explain deficits in peripheral recognition known as visual crowding.
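To make the pooling idea concrete, here is a minimal numpy sketch, not the authors' published model: nonlinear V1-like responses are averaged within windows whose diameter grows in proportion to eccentricity. The function name, the 20-window layout, and the default scaling factor of 0.5 are illustrative assumptions.

import numpy as np

def pooled_responses(v1_responses, eccentricity, scaling=0.5):
    """Average nonlinear V1-like responses in windows that grow with eccentricity.

    v1_responses : 1D array of (already nonlinear) responses ordered by eccentricity
    eccentricity : 1D array, same length, eccentricity (deg) of each response
    scaling      : pooling-window diameter as a fraction of eccentricity (assumed value)
    """
    pooled = []
    centers = np.linspace(eccentricity.min(), eccentricity.max(), 20)
    for c in centers:
        radius = max(scaling * c / 2, 1e-3)               # window grows with eccentricity
        in_window = np.abs(eccentricity - c) < radius
        if in_window.any():
            pooled.append(v1_responses[in_window].mean())  # local average: the "summary statistic"
    return np.array(pooled)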

576 citations

Journal Article
TL;DR: In this article, a probability model for natural images is proposed based on empirical observation of their statistics in the wavelet transform domain, and an image coder called EPWIC is constructed, in which subband coefficients are encoded one bitplane at a time using a nonadaptive arithmetic encoder.
Abstract: We develop a probability model for natural images, based on empirical observation of their statistics in the wavelet transform domain. Pairs of wavelet coefficients, corresponding to basis functions at adjacent spatial locations, orientations, and scales, are found to be non-Gaussian in both their marginal and joint statistical properties. Specifically, their marginals are heavy-tailed, and although they are typically decorrelated, their magnitudes are highly correlated. We propose a Markov model that explains these dependencies using a linear predictor for magnitude coupled with both multiplicative and additive uncertainties, and show that it accounts for the statistics of a wide variety of images including photographic images, graphical images, and medical images. In order to directly demonstrate the power of the model, we construct an image coder called EPWIC (embedded predictive wavelet image coder), in which subband coefficients are encoded one bitplane at a time using a nonadaptive arithmetic encoder that utilizes conditional probabilities calculated from the model. Bitplanes are ordered using a greedy algorithm that considers the MSE reduction per encoded bit. The decoder uses the statistical model to predict coefficient values based on the bits it has received. Despite the simplicity of the model, the rate-distortion performance of the coder is roughly comparable to the best image coders in the literature.
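As an illustration of the empirical observation driving the model, the following sketch (an assumption-laden example, not the EPWIC code) measures how horizontally adjacent wavelet coefficients of an image are roughly decorrelated in sign yet strongly correlated in magnitude, and how heavy-tailed their marginals are. It assumes the PyWavelets package is available; the wavelet choice and the kurtosis estimate are illustrative.

import numpy as np
import pywt  # PyWavelets; any orthonormal wavelet will do for this illustration

def adjacent_coefficient_stats(image, wavelet='db2'):
    """Statistics of horizontally adjacent wavelet coefficients of one subband."""
    _, (cH, cV, cD) = pywt.dwt2(image.astype(float), wavelet)
    a, b = cH[:, :-1].ravel(), cH[:, 1:].ravel()               # horizontally adjacent coefficients
    corr_signed = np.corrcoef(a, b)[0, 1]                      # typically near zero (decorrelated)
    corr_magnitude = np.corrcoef(np.abs(a), np.abs(b))[0, 1]   # typically large and positive
    kurtosis = np.mean(a**4) / np.mean(a**2)**2                # heavy tails: well above 3 (assumes ~zero mean)
    return corr_signed, corr_magnitude, kurtosis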

576 citations

Journal Article
TL;DR: Recent results in statistical modeling of natural images are reviewed, focusing on models that attempt to explain the scale invariance and non-Gaussian behavior of image statistics, i.e., high kurtosis, heavy tails, and sharp central cusps.
Abstract: Statistical analysis of images reveals two interesting properties: (i) invariance of image statistics to scaling of images, and (ii) non-Gaussian behavior of image statistics, i.e. high kurtosis, heavy tails, and sharp central cusps. In this paper we review some recent results in statistical modeling of natural images that attempt to explain these patterns. Two categories of results are considered: (i) studies of probability models of images or image decompositions (such as Fourier or wavelet decompositions), and (ii) discoveries of underlying image manifolds while restricting to natural images. Applications of these models in areas such as texture analysis, image classification, compression, and denoising are also considered.
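A small sketch of how the scale-invariance property mentioned above can be checked on a single image: the radially averaged power spectrum of a natural image typically falls off roughly as 1/f^2. The function name and binning choices are illustrative assumptions, not taken from the reviewed papers.

import numpy as np

def radial_power_spectrum(image, n_bins=30):
    """Radially averaged power spectrum; scale invariance predicts power ~ 1/f^alpha with alpha near 2."""
    f = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    power = np.abs(f) ** 2
    ny, nx = image.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(y - ny // 2, x - nx // 2)
    edges = np.linspace(1.0, r.max(), n_bins + 1)
    freqs, radial = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (r >= lo) & (r < hi)
        if mask.any():
            freqs.append(0.5 * (lo + hi))
            radial.append(power[mask].mean())
    # slope of the log-log fit; values near -2 correspond to the scale-invariant 1/f^2 spectrum
    alpha = np.polyfit(np.log(freqs), np.log(radial), 1)[0]
    return np.array(freqs), np.array(radial), alpha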

561 citations

Journal Article
TL;DR: This work shows that the responses of MT cells can be captured by a linear-nonlinear model that operates not on the visual stimulus, but on the afferent responses of a population of nonlinear V1 cells, and robustly predicts the separately measured responses to gratings and plaids.
Abstract: Neurons in area MT (V5) are selective for the direction of visual motion. In addition, many are selective for the motion of complex patterns independent of the orientation of their components, a behavior not seen in earlier visual areas. We show that the responses of MT cells can be captured by a linear-nonlinear model that operates not on the visual stimulus, but on the afferent responses of a population of nonlinear V1 cells. We fit this cascade model to responses of individual MT neurons and show that it robustly predicts the separately measured responses to gratings and plaids. The model captures the full range of pattern motion selectivity found in MT. Cells that signal pattern motion are distinguished by having convergent excitatory input from V1 cells with a wide range of preferred directions, strong motion opponent suppression and a tuned normalization that may reflect suppressive input from the surround of V1 cells.
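A schematic of the two-stage cascade described above, written as a minimal numpy sketch rather than the fitted model from the paper: a direction-tuned V1 population is rectified, passed through a simplified untuned divisive normalization, and then weighted and rectified again at the MT stage. The exponent, normalization constant, and function signature are illustrative assumptions.

import numpy as np

def mt_cascade(stimulus_drive, v1_preferred, mt_weights, exponent=2.0, sigma=0.1):
    """Two-stage LN cascade: V1 stage (tuning, rectification, normalization) -> MT stage.

    stimulus_drive : function mapping a preferred direction (deg) to stimulus drive
    v1_preferred   : array of V1 preferred directions (deg)
    mt_weights     : weights on the V1 population (positive = excitatory)
    """
    drive = np.array([stimulus_drive(d) for d in v1_preferred])
    v1 = np.maximum(drive, 0.0) ** exponent      # rectification + expansive nonlinearity
    v1 = v1 / (sigma + v1.mean())                # simplified untuned divisive normalization
    mt_linear = mt_weights @ v1                  # MT linear stage operates on V1 afferents
    return np.maximum(mt_linear, 0.0)            # MT output rectification

In this picture, a pattern cell corresponds to broad, convergent mt_weights spanning many preferred directions, while a component cell corresponds to weights concentrated near a single direction.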

534 citations

Journal Article
TL;DR: This work estimated observers' internal models for orientation and found that they matched the local orientation distribution measured in photographs, and determined how a neural population could embed probabilistic information responsible for such biases.
Abstract: Orientation judgments are more accurate at the horizontal and vertical orientations, possibly reflecting a statistical inference. Here the authors provide evidence for this idea, finding that observers' internal models for orientation match the local orientation distribution measured in photographs, and suggest how such information could be encoded in a neural population.
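A rough sketch of how a local orientation distribution can be measured from photographs, of the kind compared with observers' internal models in this study: orientations are taken from image gradients and histogrammed over 0-180 degrees. The gradient estimator, magnitude weighting, and threshold are illustrative assumptions.

import numpy as np

def orientation_histogram(image, n_bins=36, grad_threshold=1e-3):
    """Histogram of local edge orientations (0-180 deg) estimated from image gradients."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # local edge orientation is perpendicular to the gradient direction, folded into [0, 180)
    theta = (np.degrees(np.arctan2(gy, gx)) + 90.0) % 180.0
    keep = magnitude > grad_threshold                      # ignore flat, near-constant regions
    hist, edges = np.histogram(theta[keep], bins=n_bins, range=(0, 180),
                               weights=magnitude[keep])
    return hist / hist.sum(), edges                        # natural photographs peak near 0 and 90 deg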

528 citations


Cited by
Journal Article
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and its performance is compared with subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
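For reference, the published SSIM index for two patches x and y is ((2*mu_x*mu_y + C1)(2*sigma_xy + C2)) / ((mu_x^2 + mu_y^2 + C1)(sigma_x^2 + sigma_y^2 + C2)). The sketch below evaluates it once over whole images; the released MATLAB code instead computes it in local Gaussian-weighted windows and averages the resulting map, so treat this as a simplified illustration.

import numpy as np

def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two images of the same shape."""
    x, y = x.astype(float), y.astype(float)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2   # stabilizing constants from the paper
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))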

40,609 citations

Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; this book surveys its techniques and applications, including natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Proceedings Article
Sergey Ioffe, Christian Szegedy
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
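A minimal numpy sketch of the transform the abstract describes, for the training pass over one mini-batch; at inference the paper substitutes population (moving-average) statistics for the mini-batch mean and variance. The function name and the (batch, features) layout are illustrative assumptions.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch Normalization over a mini-batch (training mode).

    x     : (batch, features) pre-activations
    gamma : (features,) learned scale
    beta  : (features,) learned shift
    """
    mu = x.mean(axis=0)                       # per-feature mini-batch mean
    var = x.var(axis=0)                       # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)     # normalize each feature
    return gamma * x_hat + beta               # learned scale and shift restore representational power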

30,843 citations

Journal Article
TL;DR: In this paper, it is shown that the difference of information between the approximation of a signal at the resolutions 2^(j+1) and 2^j (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L^2(R^n), the vector space of measurable, square-integrable n-dimensional functions.
Abstract: Multiresolution representations are effective for analyzing the information content of images. The properties of the operator which approximates a signal at a given resolution were studied. It is shown that the difference of information between the approximation of a signal at the resolutions 2^(j+1) and 2^j (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L^2(R^n), the vector space of measurable, square-integrable n-dimensional functions. In L^2(R), a wavelet orthonormal basis is a family of functions built by dilating and translating a unique function psi(x). This decomposition defines an orthogonal multiresolution representation called a wavelet representation. It is computed with a pyramidal algorithm based on convolutions with quadrature mirror filters. The wavelet representation lies between the spatial and Fourier domains. For images, the wavelet representation differentiates several spatial orientations. The application of this representation to data compression in image coding, texture discrimination and fractal analysis is discussed.
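A toy 1D illustration of the pyramidal algorithm, using the Haar quadrature mirror filter pair rather than the general filters of the paper: one analysis step splits a signal at resolution 2^(j+1) into its approximation at resolution 2^j plus the detail coefficients carrying the difference of information between the two, and the synthesis step reconstructs exactly.

import numpy as np

def haar_analysis_step(signal):
    """One level of the pyramidal algorithm with the Haar QMF pair (1D, even-length input)."""
    s = np.asarray(signal, dtype=float)
    lowpass = (s[0::2] + s[1::2]) / np.sqrt(2.0)     # approximation at the coarser resolution
    highpass = (s[0::2] - s[1::2]) / np.sqrt(2.0)    # detail (wavelet) coefficients
    return lowpass, highpass

def haar_synthesis_step(lowpass, highpass):
    """Perfect reconstruction from one analysis step."""
    s = np.empty(2 * lowpass.size)
    s[0::2] = (lowpass + highpass) / np.sqrt(2.0)
    s[1::2] = (lowpass - highpass) / np.sqrt(2.0)
    return s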

20,028 citations

Book
D. L. Donoho
01 Jan 2004
TL;DR: It is possible to design n = O(N log(m)) nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the N most important coefficients, and a good approximation to those N important coefficients is extracted from the n measurements by solving a linear program (Basis Pursuit in signal processing).
Abstract: Suppose x is an unknown vector in R^m (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than the size m. Thus, certain natural classes of images with m pixels need only n = O(m^(1/4) log^(5/2)(m)) nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. More specifically, suppose x has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an l_p ball for 0 < p ≤ 1.
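A small self-contained sketch of the recovery procedure the abstract describes, under the simplifying assumption that the signal is sparse in the standard basis: n random Gaussian measurements are taken, and the signal is recovered by l1 minimization (Basis Pursuit), posed as a linear program via scipy.optimize.linprog. The problem sizes and the basis_pursuit helper are illustrative.

import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 subject to A x = b by splitting x into positive and negative parts."""
    n_meas, m = A.shape
    c = np.ones(2 * m)                                  # objective: sum of x_plus and x_minus
    A_eq = np.hstack([A, -A])                           # enforce A (x_plus - x_minus) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    return res.x[:m] - res.x[m:]

# Usage sketch: a length-m signal with k nonzeros recovered from n_meas << m random measurements.
rng = np.random.default_rng(0)
m, n_meas, k = 200, 60, 8
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((n_meas, m)) / np.sqrt(n_meas)
x_hat = basis_pursuit(A, A @ x_true)
print(np.max(np.abs(x_hat - x_true)))                  # small when n_meas is large enough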

18,609 citations