scispace - formally typeset
Search or ask a question
Author

Eero P. Simoncelli

Bio: Eero P. Simoncelli is an academic researcher from Center for Neural Science. The author has contributed to research in topics: Wavelet & Image processing. The author has an hindex of 81, co-authored 260 publications receiving 68623 citations. Previous affiliations of Eero P. Simoncelli include New York University & Stanford University.


Papers
More filters
Journal ArticleDOI
TL;DR: Spike-triggered average and covariance analyses can be used to estimate the filters and nonlinear combination rule from extracellular experimental data and demonstrated with simulated model neuron examples that emphasize practical issues that arise in experimental situations.
Abstract: Response properties of sensory neurons are commonly described using receptive fields. This description may be formalized in a model that operates with a small set of linear filters whose outputs are nonlinearly combined to determine the instantaneous firing rate. Spike-triggered average and covariance analyses can be used to estimate the filters and nonlinear combination rule from extracellular experimental data. We describe this methodology, demonstrating it with simulated model neuron examples that emphasize practical issues that arise in experimental situations.

466 citations

Proceedings ArticleDOI
03 Jun 1991
TL;DR: Distributed optical flow for a synthetic image sequence is computed, and it is demonstrated that the probabilistic model accounts for the errors in the flow estimates.
Abstract: Gradient methods are widely used in the computation of optical flow. The authors discuss extensions of these methods which compute probability distributions of optical flow. The use of distributions allows representation of the uncertainties inherent in the optical flow computation, facilitating the combination with information from other sources. Distributed optical flow for a synthetic image sequence is computed, and it is demonstrated that the probabilistic model accounts for the errors in the flow estimates. The distributed optical flow for a real image sequence is computed. >

443 citations

Proceedings Article
05 Nov 2016
TL;DR: In this paper, a nonlinear analysis transformation, a uniform quantizer, and a non-linear synthesis transformation are used to optimize the entire model for rate-distortion performance over a database of training images.
Abstract: We describe an image compression method, consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. The transforms are constructed in three successive stages of convolutional linear filters and nonlinear activation functions. Unlike most convolutional neural networks, the joint nonlinearity is chosen to implement a form of local gain control, inspired by those used to model biological neurons. Using a variant of stochastic gradient descent, we jointly optimize the entire model for rate-distortion performance over a database of training images, introducing a continuous proxy for the discontinuous loss function arising from the quantizer. Under certain conditions, the relaxed loss function may be interpreted as the log likelihood of a generative model, as implemented by a variational autoencoder. Unlike these models, however, the compression model must operate at any given point along the rate-distortion curve, as specified by a trade-off parameter. Across an independent set of test images, we find that the optimized method generally exhibits better rate-distortion performance than the standard JPEG and JPEG 2000 compression methods. More importantly, we observe a dramatic improvement in visual quality for all images at all bit rates, which is supported by objective quality estimates using MS-SSIM.

410 citations

Journal ArticleDOI
TL;DR: The fitted model predicts the detailed time structure of responses to novel stimuli, accurately capturing the interaction between the spiking history and sensory stimulus selectivity, and can be used to derive an explicit, maximum-likelihood decoding rule for neural spike trains.
Abstract: Sensory encoding in spiking neurons depends on both the integration of sensory inputs and the intrinsic dynamics and variability of spike generation. We show that the stimulus selectivity, reliability, and timing precision of primate retinal ganglion cell (RGC) light responses can be reproduced accurately with a simple model consisting of a leaky integrate-and-fire spike generator driven by a linearly filtered stimulus, a postspike current, and a Gaussian noise current. We fit model parameters for individual RGCs by maximizing the likelihood of observed spike responses to a stochastic visual stimulus. Although compact, the fitted model predicts the detailed time structure of responses to novel stimuli, accurately capturing the interaction between the spiking history and sensory stimulus selectivity. The model also accounts for the variability in responses to repeated stimuli, even when fit to data from a single (nonrepeating) stimulus sequence. Finally, the model can be used to derive an explicit, maximum-likelihood decoding rule for neural spike trains, thus providing a tool for assessing the limitations that spiking variability imposes on sensory performance.

410 citations

Proceedings ArticleDOI
13 Oct 1987
TL;DR: In this article, a set of pyramid transforms are used to decompose an image into a setof basis functions that are (a) spatial-frequency tuned, orientation tuned, spatially localized, and self-similar.
Abstract: We describe a set of pyramid transforms that decompose an image into a set of basis functions that are (a) spatial-frequency tuned, (b) orientation tuned, (c) spatially localized, and (d) self-similar. For computational reasons the set is also (e) orthogonal and lends itself to (f) rapid computation. The systems are derived from concepts in matrix algebra, but are closely connected to decompositions based on quadrature mirror filters. Our computations take place hierarchically, leading to a pyramid representation in which all of the basis functions have the same basic shape, and appear at many scales. By placing the high-pass and low-pass kernels on staggered grids, we can derive odd-tap QMF kernels that are quite compact. We have developed pyramids using separable, quincunx, and hexagonal kernels. Image data compression with the pyramids gives excellent results, both in terms of MSE and visual appearance. A non-orthogonal variant allows good performance with 3-tap basis kernels and the appropriate inverse sampling kernels.

355 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, which can be applied to both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu//spl sim/lcv/ssim/.

40,609 citations

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

30,843 citations

Journal ArticleDOI
TL;DR: In this paper, it is shown that the difference of information between the approximation of a signal at the resolutions 2/sup j+1/ and 2 /sup j/ (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L/sup 2/(R/sup n/), the vector space of measurable, square-integrable n-dimensional functions.
Abstract: Multiresolution representations are effective for analyzing the information content of images. The properties of the operator which approximates a signal at a given resolution were studied. It is shown that the difference of information between the approximation of a signal at the resolutions 2/sup j+1/ and 2/sup j/ (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L/sup 2/(R/sup n/), the vector space of measurable, square-integrable n-dimensional functions. In L/sup 2/(R), a wavelet orthonormal basis is a family of functions which is built by dilating and translating a unique function psi (x). This decomposition defines an orthogonal multiresolution representation called a wavelet representation. It is computed with a pyramidal algorithm based on convolutions with quadrature mirror filters. Wavelet representation lies between the spatial and Fourier domains. For images, the wavelet representation differentiates several spatial orientations. The application of this representation to data compression in image coding, texture discrimination and fractal analysis is discussed. >

20,028 citations

Book
D.L. Donoho1
01 Jan 2004
TL;DR: It is possible to design n=O(Nlog(m)) nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the N most important coefficients, and a good approximation to those N important coefficients is extracted from the n measurements by solving a linear program-Basis Pursuit in signal processing.
Abstract: Suppose x is an unknown vector in Ropfm (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than the size m. Thus, certain natural classes of images with m pixels need only n=O(m1/4log5/2(m)) nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. More specifically, suppose x has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor)-so the coefficients belong to an lscrp ball for 0

18,609 citations