Author
Peter J. Burt
Bio: Peter J. Burt is an academic researcher from Rensselaer Polytechnic Institute. The author has contributed to research in topics including Limit (mathematics) and Window (computing). The author has an h-index of 7 and has co-authored 9 publications receiving 6,960 citations.
Papers
TL;DR: A technique for image encoding in which local operators of many scales but identical shape serve as the basis functions; the resulting code tends to enhance salient image features and is well suited for many image analysis tasks as well as for image compression.
Abstract: We describe a technique for image encoding in which local operators of many scales but identical shape serve as the basis functions. The representation differs from established techniques in that the code elements are localized in spatial frequency as well as in space. Pixel-to-pixel correlations are first removed by subtracting a low-pass filtered copy of the image from the image itself. The result is a net data compression, since the difference, or error, image has low variance and entropy, and the low-pass filtered image may be represented at reduced sample density. Further data compression is achieved by quantizing the difference image. These steps are then repeated to compress the low-pass image. Iteration of the process at appropriately expanded scales generates a pyramid data structure. The encoding process is equivalent to sampling the image with Laplacian operators of many scales. Thus, the code tends to enhance salient image features. A further advantage of the present code is that it is well suited for many image analysis tasks as well as for image compression. Fast algorithms are described for coding and decoding.
6,975 citations
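The coding scheme above reduces to two primitive operations: a blur-and-subsample (often called REDUCE) and an upsample-and-interpolate (EXPAND), with each pyramid level stored as the difference between a Gaussian level and the expanded next-coarser level. A minimal NumPy/SciPy sketch of the idea follows; the Gaussian kernel width, level count, and function names are illustrative choices, not the paper's exact generating kernel.

```python
import numpy as np
from scipy import ndimage

def reduce_(img, sigma=1.0):
    """Blur with a Gaussian, then drop every other row and column."""
    return ndimage.gaussian_filter(img, sigma)[::2, ::2]

def expand(img, shape, sigma=1.0):
    """Upsample to `shape` by zero insertion, then blur to interpolate.
    The factor 4 restores mean brightness lost to the inserted zeros."""
    up = np.zeros(shape, dtype=img.dtype)
    up[::2, ::2] = img
    return 4.0 * ndimage.gaussian_filter(up, sigma)

def laplacian_pyramid(img, levels=4):
    """Each entry is a difference (error) image; the last is the low-pass residual."""
    pyr, g = [], img.astype(float)
    for _ in range(levels):
        g_next = reduce_(g)
        pyr.append(g - expand(g_next, g.shape))
        g = g_next
    pyr.append(g)
    return pyr

def reconstruct(pyr):
    """Invert the code by expanding and summing from coarse to fine."""
    g = pyr[-1]
    for lap in reversed(pyr[:-1]):
        g = lap + expand(g, lap.shape)
    return g

img = np.random.rand(64, 64)
assert np.allclose(reconstruct(laplacian_pyramid(img)), img)  # exact before quantization
```

Compression itself comes from quantizing the difference images, which have low variance and entropy; the reconstruction above is exact only when that quantization step is skipped.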
TL;DR: In this paper, a model of visual apparent motion is derived from four observations on path selection in ambiguous displays in which apparent motion of illuminated dots could, in principle, be perceived along many possible paths.
Abstract: A model of visual apparent motion is derived from four observations on path selection in ambiguous displays in which apparent motion of illuminated dots could, in principle, be perceived along many possible paths: (a) Whereas motion over each path is clearly visible when its stimulus is presented in isolation, motion is usually seen over only one path when two or more such stimuli are combined (competition). (b) Path selection is nearly independent of viewing distance (scale invariance). (c) At transition points between paths i and j (where apparent motion is equally likely to be perceived along i and j), the time t and distance d between successive points along the paths are described by a log-linear d/t relationship; that is, t = A - B log(d/d0). (d) When successive elements along a path differ in orientation or size, the perceived motion along this path is not necessarily weaker than motion along a path composed entirely of identical elements. The model is a form of strength theory in which the path with greatest strength S becomes the dominant path. From scale invariance, we prove that the contributions of time and distance to stimulus strength are independent. From the log-linear d/t relationship, we derive the precise trade-off function between d and t and show the existence of an optimal interstimulus interval to maximize the strength for any path. The model accounts well for the path-selection data and suggests a neural interpretation in which motion perception is based on the outputs of elementary detectors that are scaled replicas of each other, all having the same geometry and time delays, and differing only in size and orientation.

A visual stimulus, such as a bar or a disk, which is flashed first at one position and then flashed again nearby, may evoke a powerful illusion of movement, provided the spacing and timing of the two flashes are chosen appropriately. The vividness of this apparent motion depends strongly on the spatial and temporal separation of the stimuli and only weakly on the figural similarity of one stimulus to the other (see Kolers, 1972, for a review). However, efforts by Korte (1915), Neuhaus (1930), and others to discover a …
244 citations
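As a toy numerical illustration of observation (c): a minimal sketch, assuming hypothetical constants A, B and a reference distance d0 (the paper fits such parameters to transition-point data; the values below are invented purely for illustration).

```python
import numpy as np

# Hypothetical constants for the log-linear trade-off t = A - B*log(d/d0);
# these numbers are NOT from the paper, they only illustrate the form.
A, B, d0 = 300.0, 80.0, 1.0   # ms, ms per log unit, degrees of visual angle

def transition_time(d):
    """Time t between successive dots at which a path with spacing d lies
    on the equal-strength contour: spacing and timing trade off log-linearly."""
    return A - B * np.log(d / d0)

for d in (0.5, 1.0, 2.0, 4.0):
    print(f"spacing d = {d:3.1f} deg -> transition time t = {transition_time(d):5.1f} ms")
```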
01 Mar 1983 - Computer Vision, Graphics, and Image Processing
TL;DR: A highly efficient procedure for computing property estimates within Gaussian-like windows is described; estimates are obtained within windows of many sizes simultaneously.
Abstract: A common task in image analysis is that of measuring image properties within local windows. Often the usefulness of these property estimates is determined by characteristics of the windows themselves. Critical factors include the window size and shape, and the contribution the window makes to the cost of computation. A highly efficient procedure for computing property estimates within Gaussian-like windows is described. Estimates are obtained within windows of many sizes simultaneously.
100 citations
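The efficiency comes from iterating a small generating kernel: each blur-and-subsample step yields local averages within an effectively Gaussian window roughly twice the width of the previous one, so estimates at many scales cost only a constant factor more than estimates at one. A minimal NumPy/SciPy sketch of this idea (the kernel, level count, and the texture-energy example are illustrative, not the paper's exact procedure):

```python
import numpy as np
from scipy import ndimage

def multiscale_means(prop, levels=4):
    """Local means of a per-pixel property image within Gaussian-like
    windows whose width roughly doubles at each level, computed for all
    levels by repeated blur-and-subsample."""
    estimates, g = [], prop.astype(float)
    for _ in range(levels):
        g = ndimage.gaussian_filter(g, sigma=1.0)[::2, ::2]
        estimates.append(g)
    return estimates

# Example property: squared gradient magnitude, a crude texture-energy measure.
img = np.random.rand(128, 128)
gy, gx = np.gradient(img)
for k, e in enumerate(multiscale_means(gx**2 + gy**2), start=1):
    print(f"level {k}: window ~{2**k} px wide, estimate map shape {e.shape}")
```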
TL;DR: The observations reported earlier by the authors are shown to support the view that the constraints for fusion are stimulus-centered and not observer-centered, justifying the reformulation of the disparity limits for binocular fusion in terms of a gradient limit, a reformulation disputed by Krol and van de Grind.
Abstract: The observations reported earlier by the authors are shown to support the view that the constraints for fusion are stimulus-centered and not observer-centered, thus justifying the reformulation of the disparity limits for binocular fusion in terms of a gradient limit, which is disputed by Krol and van de Grind.
9 citations
Cited by
TL;DR: In this paper, it is shown that the difference of information between the approximation of a signal at the resolutions 2^(j+1) and 2^j (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L^2(R^n), the vector space of measurable, square-integrable n-dimensional functions.
Abstract: Multiresolution representations are effective for analyzing the information content of images. The properties of the operator which approximates a signal at a given resolution were studied. It is shown that the difference of information between the approximation of a signal at the resolutions 2^(j+1) and 2^j (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L^2(R^n), the vector space of measurable, square-integrable n-dimensional functions. In L^2(R), a wavelet orthonormal basis is a family of functions which is built by dilating and translating a unique function ψ(x). This decomposition defines an orthogonal multiresolution representation called a wavelet representation. It is computed with a pyramidal algorithm based on convolutions with quadrature mirror filters. The wavelet representation lies between the spatial and Fourier domains. For images, the wavelet representation differentiates several spatial orientations. The application of this representation to data compression in image coding, texture discrimination and fractal analysis is discussed.
20,028 citations
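The pyramidal, filter-based algorithm described here is what standard wavelet libraries implement. As a usage sketch (library code, not the paper's), PyWavelets computes this orthogonal multiresolution decomposition; the wavelet and level count below are arbitrary choices.

```python
import numpy as np
import pywt  # PyWavelets

img = np.random.rand(256, 256)

# Three-level 2-D decomposition on an orthonormal Daubechies basis. Each
# detail level holds three orientation subbands (horizontal, vertical,
# diagonal), matching the spatial orientations the abstract mentions.
coeffs = pywt.wavedec2(img, wavelet='db2', level=3)
approx = coeffs[0]  # coarsest approximation of the signal
for k, (cH, cV, cD) in enumerate(coeffs[1:], start=1):
    print(f"detail level {k}: subband shape {cH.shape}")

# The representation is complete: reconstruction is exact.
rec = pywt.waverec2(coeffs, wavelet='db2')
assert np.allclose(rec, img)
```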
Book
01 Jan 1998
TL;DR: A textbook tour of wavelet signal processing, from Fourier analysis and time-frequency representations through frames and wavelet bases to wavelet packets, local cosine bases, approximation, estimation, and transform coding.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.
17,693 citations
Posted Content
TL;DR: It is shown that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.
9,803 citations
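The sketch below illustrates the paper's two key moves: 1x1 convolutions in place of fully connected classifier layers, so the network accepts input of arbitrary size, and fusion of a coarse, deep prediction with a finer, shallow one via upsampling (a skip). A toy PyTorch sketch with invented channel sizes, not the paper's actual FCN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    """Toy fully convolutional net: deep coarse semantics + shallow skip."""
    def __init__(self, num_classes=21):  # 21 = PASCAL VOC classes
        super().__init__()
        self.shallow = nn.Sequential(  # overall stride 4: fine appearance
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.deep = nn.Sequential(     # overall stride 16: coarse semantics
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())
        # 1x1 convolutions play the role of the converted classifier layers.
        self.score_deep = nn.Conv2d(256, num_classes, 1)
        self.score_skip = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        f_shallow = self.shallow(x)
        f_deep = self.deep(f_shallow)
        # Upsample coarse scores and fuse them with the shallow prediction.
        score = F.interpolate(self.score_deep(f_deep),
                              size=f_shallow.shape[-2:], mode='bilinear',
                              align_corners=False)
        score = score + self.score_skip(f_shallow)
        # Dense per-pixel scores at the input resolution.
        return F.interpolate(score, size=x.shape[-2:], mode='bilinear',
                             align_corners=False)

net = TinyFCN()
print(net(torch.randn(1, 3, 96, 128)).shape)  # torch.Size([1, 21, 96, 128])
```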
TL;DR: Orthonormal bases of compactly supported wavelets with arbitrarily high regularity are constructed, starting from a review of multiresolution analysis and of decomposition and reconstruction algorithms used in vision.
Abstract: We construct orthonormal bases of compactly supported wavelets, with arbitrarily high regularity. The order of regularity increases linearly with the support width. We start by reviewing the concept of multiresolution analysis as well as several algorithms in vision decomposition and reconstruction. The construction then follows from a synthesis of these different approaches.
8,588 citations
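These compactly supported bases are the `dbN` family shipped by wavelet libraries: `dbN` has 2N filter taps, and regularity grows with the support width, as the paper proves. A small PyWavelets sketch inspecting the filters (library usage, not the paper's construction):

```python
import numpy as np
import pywt

# Support width (filter length) grows linearly across db1..db4.
for name in ('db1', 'db2', 'db3', 'db4'):
    lo = np.asarray(pywt.Wavelet(name).dec_lo)  # low-pass (scaling) filter
    print(f"{name}: {len(lo)} taps, coefficient sum = {lo.sum():.6f}")  # sqrt(2)

# Orthonormality: the db4 scaling filter is orthogonal to its even shifts,
# a defining property of these compactly supported orthonormal bases.
lo = np.asarray(pywt.Wavelet('db4').dec_lo)
for shift in range(2, len(lo), 2):
    print(f"shift {shift}: inner product = {np.dot(lo[:-shift], lo[shift:]):+.2e}")
```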
TL;DR: A taxonomy of dense, two-frame stereo methods is presented, together with a stand-alone, flexible C++ implementation that enables the evaluation of individual components and can easily be extended to include new algorithms.
Abstract: Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can be easily extended to include new algorithms. We have also produced several new multiframe stereo data sets with ground truth, and are making both the code and data sets available on the Web.
7,458 citations
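The local, window-based methods in this taxonomy share a common core: compute a per-pixel matching cost at each candidate disparity, aggregate it over a support window, and select the disparity with minimum cost (winner-take-all). A minimal NumPy sketch of SSD block matching in that mold; the window size, disparity range, and synthetic data are arbitrary choices, not code from the paper's C++ testbed.

```python
import numpy as np
from scipy import ndimage

def block_matching(left, right, max_disp=16, win=5):
    """Dense disparity via the taxonomy's local pipeline: squared-difference
    cost, box-window aggregation, winner-take-all disparity selection."""
    h, w = left.shape
    cost = np.empty((max_disp, h, w))
    for d in range(max_disp):
        diff = np.full((h, w), 1e9)  # large penalty where no match exists
        diff[:, d:] = (left[:, d:] - right[:, :w - d]) ** 2
        cost[d] = ndimage.uniform_filter(diff, size=win)  # aggregate over window
    return cost.argmin(axis=0)

left = np.random.rand(64, 64)
right = np.roll(left, -4, axis=1)      # synthetic pair with true disparity 4
disp = block_matching(left, right)
print(np.median(disp[:, 16:-16]))      # ~4.0 away from the image borders
```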