Edward H. Adelson
Bio: Edward H. Adelson is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Tactile sensor & Image processing. The author has an hindex of 74, co-authored 228 publications receiving 40686 citations. Previous affiliations of Edward H. Adelson include Sarnoff Corporation & New York University.
Papers published on a yearly basis
TL;DR: A technique for image encoding in which local operators of many scales but identical shape serve as the basis functions, which tends to enhance salient image features and is well suited for many image analysis tasks as well as for image compression.
Abstract: We describe a technique for image encoding in which local operators of many scales but identical shape serve as the basis functions. The representation differs from established techniques in that the code elements are localized in spatial frequency as well as in space. Pixel-to-pixel correlations are first removed by subtracting a lowpass filtered copy of the image from the image itself. The result is a net data compression since the difference, or error, image has low variance and entropy, and the low-pass filtered image may represented at reduced sample density. Further data compression is achieved by quantizing the difference image. These steps are then repeated to compress the low-pass image. Iteration of the process at appropriately expanded scales generates a pyramid data structure. The encoding process is equivalent to sampling the image with Laplacian operators of many scales. Thus, the code tends to enhance salient image features. A further advantage of the present code is that it is well suited for many image analysis tasks as well as for image compression. Fast algorithms are described for coding and decoding.
TL;DR: In this article, the first stage consists of linear filters that are oriented in space-time and tuned in spatial frequency, and the outputs of quadrature pairs of such filters are squared and summed to give a measure of motion energy.
Abstract: A motion sequence may be represented as a single pattern in x–y–t space; a velocity of motion corresponds to a three-dimensional orientation in this space. Motion sinformation can be extracted by a system that responds to the oriented spatiotemporal energy. We discuss a class of models for human motion mechanisms in which the first stage consists of linear filters that are oriented in space-time and tuned in spatial frequency. The outputs of quadrature pairs of such filters are squared and summed to give a measure of motion energy. These responses are then fed into an opponent stage. Energy models can be built from elements that are consistent with known physiology and psychophysics, and they permit a qualitative understanding of a variety of motion phenomena.
TL;DR: The authors present an efficient architecture to synthesize filters of arbitrary orientations from linear combinations of basis filters, allowing one to adaptively steer a filter to any orientation, and to determine analytically the filter output as a function of orientation.
Abstract: The authors present an efficient architecture to synthesize filters of arbitrary orientations from linear combinations of basis filters, allowing one to adaptively steer a filter to any orientation, and to determine analytically the filter output as a function of orientation. Steerable filters may be designed in quadrature pairs to allow adaptive control over phase as well as orientation. The authors show how to design and steer the filters and present examples of their use in the analysis of orientation and phase, angularly adaptive filtering, edge detection, and shape from shading. One can also build a self-similar steerable pyramid representation. The same concepts can be generalized to the design of 3-D steerable filters. >
01 Jan 1991
TL;DR: Early vision as discussed by the authors is defined as measuring the amounts of various kinds of visual substances present in the image (e.g., redness or rightward motion energy) rather than in how it labels "things".
Abstract: What are the elements of early vision? This question might be taken to mean, What are the fundamental atoms of vision?—and might be variously answered in terms ofsuch candidate structures as edges, peaks, corners, and so on. In this chapter we adopt a rather different point of view and ask the question, What are the fundamentalsubstances of vision? This distinction is important becausewe wish to focus on the first steps in extraction of visualinformation. At this level it is premature to talk aboutdiscrete objects, even such simple ones as edges and corners.There is general agreement that early vision involvesmeasurements of a number of basic image properties in-cluding orientation, color, motion, and so on. Figure l.lshows a caricature (in the style of Neisser, 1976), of the sort of architecture that has become quite popular as a model for both human and machine vision. The first stageof processing involves a set of parallel pathways, eachdevoted to one particular-visual property. We propose that the measurements of these basic properties be con-sidered as the elements of early vision. We think of earlyvision as measuring the amounts of various kinds of vi-sual "substances" present in the image (e.g., redness orrightward motion energy). In other words, we are inter- ested in how early vision measures “stuff” rather than in how it labels “things.”What, then, are these elementary visual substances?Various lists have been compiled using a mixture of intui-tion and experiment. Electrophysiologists have describedneurons in striate cortex that are selectively sensitive tocertain visual properties; for reviews, see Hubel (1988) and DeValois and DeValois (1988). Psychophysicists haveinferred the existence of channels that are tuned for cer- tain visual properties; for reviews, see Graham (1989), Olzak and Thomas (1986), Pokorny and Smith (1986), and Watson (1986). Researchers in perception have foundaspects of visual stimuli that are processed pre-attentive- ly (Beck, 1966; Bergen & Julesz, 1983; Julesz & Bergen,
TL;DR: Two examples of jointly shiftable transforms that are simultaneously shiftable in more than one domain are explored and the usefulness of these image representations for scale-space analysis, stereo disparity measurement, and image enhancement is demonstrated.
Abstract: One of the major drawbacks of orthogonal wavelet transforms is their lack of translation invariance: the content of wavelet subbands is unstable under translations of the input signal. Wavelet transforms are also unstable with respect to dilations of the input signal and, in two dimensions, rotations of the input signal. The authors formalize these problems by defining a type of translation invariance called shiftability. In the spatial domain, shiftability corresponds to a lack of aliasing; thus, the conditions under which the property holds are specified by the sampling theorem. Shiftability may also be applied in the context of other domains, particularly orientation and scale. Jointly shiftable transforms that are simultaneously shiftable in more than one domain are explored. Two examples of jointly shiftable transforms are designed and implemented: a 1-D transform that is jointly shiftable in position and scale, and a 2-D transform that is jointly shiftable in position and orientation. The usefulness of these image representations for scale-space analysis, stereo disparity measurement, and image enhancement is demonstrated. >
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, which can be applied to both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu//spl sim/lcv/ssim/.
••07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
TL;DR: In this paper, it is shown that the difference of information between the approximation of a signal at the resolutions 2/sup j+1/ and 2 /sup j/ (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L/sup 2/(R/sup n/), the vector space of measurable, square-integrable n-dimensional functions.
Abstract: Multiresolution representations are effective for analyzing the information content of images. The properties of the operator which approximates a signal at a given resolution were studied. It is shown that the difference of information between the approximation of a signal at the resolutions 2/sup j+1/ and 2/sup j/ (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L/sup 2/(R/sup n/), the vector space of measurable, square-integrable n-dimensional functions. In L/sup 2/(R), a wavelet orthonormal basis is a family of functions which is built by dilating and translating a unique function psi (x). This decomposition defines an orthogonal multiresolution representation called a wavelet representation. It is computed with a pyramidal algorithm based on convolutions with quadrature mirror filters. Wavelet representation lies between the spatial and Fourier domains. For images, the wavelet representation differentiates several spatial orientations. The application of this representation to data compression in image coding, texture discrimination and fractal analysis is discussed. >
••01 Dec 2001
TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
01 Jan 1998
TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.