Author
Peter J. Burt
Bio: Peter J. Burt is an academic researcher from Rensselaer Polytechnic Institute. The author has contributed to research in topics including Limit (mathematics) and Window (computing). The author has an h-index of 7 and has co-authored 9 publications receiving 6,960 citations.
Papers
TL;DR: A technique for image encoding in which local operators of many scales but identical shape serve as the basis functions; the resulting code tends to enhance salient image features and is well suited for many image analysis tasks as well as for image compression.
Abstract: We describe a technique for image encoding in which local operators of many scales but identical shape serve as the basis functions. The representation differs from established techniques in that the code elements are localized in spatial frequency as well as in space. Pixel-to-pixel correlations are first removed by subtracting a low-pass filtered copy of the image from the image itself. The result is a net data compression, since the difference, or error, image has low variance and entropy, and the low-pass filtered image may be represented at reduced sample density. Further data compression is achieved by quantizing the difference image. These steps are then repeated to compress the low-pass image. Iteration of the process at appropriately expanded scales generates a pyramid data structure. The encoding process is equivalent to sampling the image with Laplacian operators of many scales. Thus, the code tends to enhance salient image features. A further advantage of the present code is that it is well suited for many image analysis tasks as well as for image compression. Fast algorithms are described for coding and decoding.
6,975 citations
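The coding scheme above reduces to two primitive operations: a blur-and-subsample (often called REDUCE) and an upsample-and-interpolate (EXPAND), with each pyramid level stored as the difference between a Gaussian level and the expanded next-coarser level. A minimal NumPy/SciPy sketch of the idea follows; the Gaussian kernel width, level count, and function names are illustrative choices, not the paper's exact generating kernel.

```python
import numpy as np
from scipy import ndimage

def reduce_(img, sigma=1.0):
    """Blur with a Gaussian, then drop every other row and column."""
    return ndimage.gaussian_filter(img, sigma)[::2, ::2]

def expand(img, shape, sigma=1.0):
    """Upsample to `shape` by zero insertion, then blur to interpolate.
    The factor 4 restores mean brightness lost to the inserted zeros."""
    up = np.zeros(shape, dtype=img.dtype)
    up[::2, ::2] = img
    return 4.0 * ndimage.gaussian_filter(up, sigma)

def laplacian_pyramid(img, levels=4):
    """Each entry is a difference (error) image; the last is the low-pass residual."""
    pyr, g = [], img.astype(float)
    for _ in range(levels):
        g_next = reduce_(g)
        pyr.append(g - expand(g_next, g.shape))
        g = g_next
    pyr.append(g)
    return pyr

def reconstruct(pyr):
    """Invert the code by expanding and summing from coarse to fine."""
    g = pyr[-1]
    for lap in reversed(pyr[:-1]):
        g = lap + expand(g, lap.shape)
    return g

img = np.random.rand(64, 64)
assert np.allclose(reconstruct(laplacian_pyramid(img)), img)  # exact before quantization
```

Compression itself comes from quantizing the difference images, which have low variance and entropy; the reconstruction above is exact only when that quantization step is skipped.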
TL;DR: In this paper, a model of visual apparent motion is derived from four observations on path selection in ambiguous displays in which apparent motion of illuminated dots could, in principle, be perceived along many possible paths.
Abstract: A model of visual apparent motion is derived from four observations on path selection in ambiguous displays in which apparent motion of illuminated dots could, in principle, be perceived along many possible paths: (a) Whereas motion over each path is clearly visible when its stimulus is presented in isolation, motion is usually seen over only one path when two or more such stimuli are combined (competition). (b) Path selection is nearly independent of viewing distance (scale invariance). (c) At transition points between paths i and j (where apparent motion is equally likely to be perceived along i and j), the time t and distance d between successive points along the paths are described by a log-linear d/t relationship; that is, t = A - B log(d/d0). (d) When successive elements along a path differ in orientation or size, the perceived motion along this path is not necessarily weaker than motion along a path composed entirely of identical elements. The model is a form of strength theory in which the path with greatest strength S becomes the dominant path. From scale invariance, we prove that the contributions of time and distance to stimulus strength are independent. From the log-linear d/t relationship, we derive the precise trade-off function between d and t and show the existence of an optimal interstimulus interval to maximize the strength for any path. The model accounts well for the path-selection data and suggests a neural interpretation in which motion perception is based on the outputs of elementary detectors that are scaled replicas of each other, all having the same geometry and time delays, and differing only in size and orientation.

A visual stimulus, such as a bar or a disk, which is flashed first at one position and then flashed again nearby, may evoke a powerful illusion of movement, provided the spacing and timing of the two flashes are chosen appropriately. The vividness of this apparent motion depends strongly on the spatial and temporal separation of the stimuli and only weakly on the figural similarity of one stimulus to the other (see Kolers, 1972, for a review). However, efforts by Korte (1915), Neuhaus (1930), and others to discover a …
244 citations
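As a toy numerical illustration of observation (c): a minimal sketch, assuming hypothetical constants A, B and a reference distance d0 (the paper fits such parameters to transition-point data; the values below are invented purely for illustration).

```python
import numpy as np

# Hypothetical constants for the log-linear trade-off t = A - B*log(d/d0);
# these numbers are NOT from the paper, they only illustrate the form.
A, B, d0 = 300.0, 80.0, 1.0   # ms, ms per log unit, degrees of visual angle

def transition_time(d):
    """Time t between successive dots at which a path with spacing d lies
    on the equal-strength contour: spacing and timing trade off log-linearly."""
    return A - B * np.log(d / d0)

for d in (0.5, 1.0, 2.0, 4.0):
    print(f"spacing d = {d:3.1f} deg -> transition time t = {transition_time(d):5.1f} ms")
```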
01 Mar 1983 - Computer Vision, Graphics, and Image Processing
TL;DR: A highly efficient procedure for computing property estimates within Gaussian-like windows is described; estimates are obtained within windows of many sizes simultaneously.
Abstract: A common task in image analysis is that of measuring image properties within local windows. Often the usefulness of these property estimates is determined by characteristics of the windows themselves. Critical factors include the window size and shape, and the contribution the window makes to the cost of computation. A highly efficient procedure for computing property estimates within Gaussian-like windows is described. Estimates are obtained within windows of many sizes simultaneously.
100 citations
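The efficiency comes from iterating a small generating kernel: each blur-and-subsample step yields local averages within an effectively Gaussian window roughly twice the width of the previous one, so estimates at many scales cost only a constant factor more than estimates at one. A minimal NumPy/SciPy sketch of this idea (the kernel, level count, and the texture-energy example are illustrative, not the paper's exact procedure):

```python
import numpy as np
from scipy import ndimage

def multiscale_means(prop, levels=4):
    """Local means of a per-pixel property image within Gaussian-like
    windows whose width roughly doubles at each level, computed for all
    levels by repeated blur-and-subsample."""
    estimates, g = [], prop.astype(float)
    for _ in range(levels):
        g = ndimage.gaussian_filter(g, sigma=1.0)[::2, ::2]
        estimates.append(g)
    return estimates

# Example property: squared gradient magnitude, a crude texture-energy measure.
img = np.random.rand(128, 128)
gy, gx = np.gradient(img)
for k, e in enumerate(multiscale_means(gx**2 + gy**2), start=1):
    print(f"level {k}: window ~{2**k} px wide, estimate map shape {e.shape}")
```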
TL;DR: The observations reported earlier by the authors are shown to support the view that the constraints for fusion are stimulus-centered and not observer-centered, justifying the reformulation of the disparity limits for binocular fusion in terms of a gradient limit, a reformulation disputed by Krol and van de Grind.
Abstract: The observations reported earlier by the authors are shown to support the view that the constraints for fusion are stimulus-centered and not observer-centered, thus justifying the reformulation of the disparity limits for binocular fusion in terms of a gradient limit, which is disputed by Krol and van de Grind.
9 citations
Cited by
TL;DR: In this paper, it is shown that the difference of information between the approximation of a signal at the resolutions 2^(j+1) and 2^j (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L^2(R^n), the vector space of measurable, square-integrable n-dimensional functions.
Abstract: Multiresolution representations are effective for analyzing the information content of images. The properties of the operator which approximates a signal at a given resolution were studied. It is shown that the difference of information between the approximation of a signal at the resolutions 2^(j+1) and 2^j (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L^2(R^n), the vector space of measurable, square-integrable n-dimensional functions. In L^2(R), a wavelet orthonormal basis is a family of functions which is built by dilating and translating a unique function ψ(x). This decomposition defines an orthogonal multiresolution representation called a wavelet representation. It is computed with a pyramidal algorithm based on convolutions with quadrature mirror filters. The wavelet representation lies between the spatial and Fourier domains. For images, the wavelet representation differentiates several spatial orientations. The application of this representation to data compression in image coding, texture discrimination and fractal analysis is discussed.
20,028 citations
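The pyramidal, filter-based algorithm described here is what standard wavelet libraries implement. As a usage sketch (library code, not the paper's), PyWavelets computes this orthogonal multiresolution decomposition; the wavelet and level count below are arbitrary choices.

```python
import numpy as np
import pywt  # PyWavelets

img = np.random.rand(256, 256)

# Three-level 2-D decomposition on an orthonormal Daubechies basis. Each
# detail level holds three orientation subbands (horizontal, vertical,
# diagonal), matching the spatial orientations the abstract mentions.
coeffs = pywt.wavedec2(img, wavelet='db2', level=3)
approx = coeffs[0]  # coarsest approximation of the signal
for k, (cH, cV, cD) in enumerate(coeffs[1:], start=1):
    print(f"detail level {k}: subband shape {cH.shape}")

# The representation is complete: reconstruction is exact.
rec = pywt.waverec2(coeffs, wavelet='db2')
assert np.allclose(rec, img)
```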
Book
01 Jan 1998
TL;DR: A textbook tour of wavelet signal processing, from Fourier analysis and time-frequency representations through frames and wavelet bases to wavelet packets, local cosine bases, approximation, estimation, and transform coding.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.
17,693 citations
Posted Content
TL;DR: It is shown that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.
9,803 citations
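The sketch below illustrates the paper's two key moves: 1x1 convolutions in place of fully connected classifier layers, so the network accepts input of arbitrary size, and fusion of a coarse, deep prediction with a finer, shallow one via upsampling (a skip). A toy PyTorch sketch with invented channel sizes, not the paper's actual FCN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    """Toy fully convolutional net: deep coarse semantics + shallow skip."""
    def __init__(self, num_classes=21):  # 21 = PASCAL VOC classes
        super().__init__()
        self.shallow = nn.Sequential(  # overall stride 4: fine appearance
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.deep = nn.Sequential(     # overall stride 16: coarse semantics
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())
        # 1x1 convolutions play the role of the converted classifier layers.
        self.score_deep = nn.Conv2d(256, num_classes, 1)
        self.score_skip = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        f_shallow = self.shallow(x)
        f_deep = self.deep(f_shallow)
        # Upsample coarse scores and fuse them with the shallow prediction.
        score = F.interpolate(self.score_deep(f_deep),
                              size=f_shallow.shape[-2:], mode='bilinear',
                              align_corners=False)
        score = score + self.score_skip(f_shallow)
        # Dense per-pixel scores at the input resolution.
        return F.interpolate(score, size=x.shape[-2:], mode='bilinear',
                             align_corners=False)

net = TinyFCN()
print(net(torch.randn(1, 3, 96, 128)).shape)  # torch.Size([1, 21, 96, 128])
```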
TL;DR: Orthonormal bases of compactly supported wavelets with arbitrarily high regularity are constructed, starting from a review of multiresolution analysis and of decomposition and reconstruction algorithms used in vision.
Abstract: We construct orthonormal bases of compactly supported wavelets, with arbitrarily high regularity. The order of regularity increases linearly with the support width. We start by reviewing the concept of multiresolution analysis as well as several algorithms in vision decomposition and reconstruction. The construction then follows from a synthesis of these different approaches.
8,588 citations
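These compactly supported bases are the `dbN` family shipped by wavelet libraries: `dbN` has 2N filter taps, and regularity grows with the support width, as the paper proves. A small PyWavelets sketch inspecting the filters (library usage, not the paper's construction):

```python
import numpy as np
import pywt

# Support width (filter length) grows linearly across db1..db4.
for name in ('db1', 'db2', 'db3', 'db4'):
    lo = np.asarray(pywt.Wavelet(name).dec_lo)  # low-pass (scaling) filter
    print(f"{name}: {len(lo)} taps, coefficient sum = {lo.sum():.6f}")  # sqrt(2)

# Orthonormality: the db4 scaling filter is orthogonal to its even shifts,
# a defining property of these compactly supported orthonormal bases.
lo = np.asarray(pywt.Wavelet('db4').dec_lo)
for shift in range(2, len(lo), 2):
    print(f"shift {shift}: inner product = {np.dot(lo[:-shift], lo[shift:]):+.2e}")
```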
TL;DR: A taxonomy of dense, two-frame stereo methods is presented, together with a stand-alone, flexible C++ implementation that enables the evaluation of individual components and can easily be extended to include new algorithms.
Abstract: Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can be easily extended to include new algorithms. We have also produced several new multiframe stereo data sets with ground truth, and are making both the code and data sets available on the Web.
7,458 citations
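The local, window-based methods in this taxonomy share a common core: compute a per-pixel matching cost at each candidate disparity, aggregate it over a support window, and select the disparity with minimum cost (winner-take-all). A minimal NumPy sketch of SSD block matching in that mold; the window size, disparity range, and synthetic data are arbitrary choices, not code from the paper's C++ testbed.

```python
import numpy as np
from scipy import ndimage

def block_matching(left, right, max_disp=16, win=5):
    """Dense disparity via the taxonomy's local pipeline: squared-difference
    cost, box-window aggregation, winner-take-all disparity selection."""
    h, w = left.shape
    cost = np.empty((max_disp, h, w))
    for d in range(max_disp):
        diff = np.full((h, w), 1e9)  # large penalty where no match exists
        diff[:, d:] = (left[:, d:] - right[:, :w - d]) ** 2
        cost[d] = ndimage.uniform_filter(diff, size=win)  # aggregate over window
    return cost.argmin(axis=0)

left = np.random.rand(64, 64)
right = np.roll(left, -4, axis=1)      # synthetic pair with true disparity 4
disp = block_matching(left, right)
print(np.median(disp[:, 16:-16]))      # ~4.0 away from the image borders
```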