scispace - formally typeset
Search or ask a question

Pyramid methods in image processing.

01 Nov 1984-Vol. 29, Iss: 6, pp 33-41
TL;DR: A variety of pyramid methods that have been developed for image data compression, enhancement, analysis and graphics that are versatile, convenient, and efficient to use.
Abstract: The data structure used to represent image information can be critical to the successful completion of an image processing task. One structure that has attracted considerable attention is the image pyramid This consists of a set of lowpass or bandpass copies of an image, each representing pattern information of a different scale. Here we describe a variety of pyramid methods that we have developed for image data compression, enhancement, analysis and graphics. ©1984 RCA Corporation Final manuscript received November 12, 1984 Reprint Re-29-6-5 that can perform most of the routine visual tasks that humans do effortlessly. It is becoming increasingly clear that the format used to represent image data can be as critical in image processing as the algorithms applied to the data. A digital image is initially encoded as an array of pixel intensities, but this raw format i s not suited to most tasks. Alternatively, an image may be represented by its Fourier transform, with operations applied to the transform coefficients rather than to the original pixel values. This is appropriate for some data compression and image enhancement tasks, but inappropriate for others. The transform representation is particularly unsuited for machine vision and computer graphics, where the spatial location of pattem elements is critical. Recently there has been a great deal of interest in representations that retain spatial localization as well as localization in the spatial—frequency domain. This i s achieved by decomposing the image into a set of spatial frequency bandpass component images. Individual samples of a component image represent image pattern information that is appropriately localized, while the bandpassed image as a whole represents information about a particular fineness of detail or scale. There is evidence that the human visual system uses such a representation, and multiresolution schemes are becoming increasingly popular in machine vision and in image processing in general. The importance of analyzing images at many scales arises from the nature of images themselves. Scenes in the world contain objects of many sizes, and these objects contain features of many sizes. Moreover, objects can be at various distances from the viewer. As a result, any analysis procedure that is applied only at a single scale may miss information at other scales. The solution is to carry out analyses at all scales simultaneously. Convolution is the basic operation of most image analysis systems, and convolution with large weighting functions is a notoriously expensive computation. In a multiresolution system one wishes to perform convolutions with kernels of many sizes, ranging from very small to very large. and the computational problems appear forbidding. Therefore one of the main problems in working with multiresolution representations is to develop fast and efficient techniques. Members of the Advanced Image Processing Research Group have been actively involved in the development of multiresolution techniques for some time. Most of the work revolves around a representation known as a "pyramid," which is versatile, convenient, and efficient to use. We have applied pyramid-based methods to some fundamental problems in image analysis, data compression, and image manipulation.
Citations
More filters
Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But pyramid representations have been avoided in recent object detectors that are based on deep convolutional networks, partially because they are slow to compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

16,727 citations


Cites background from "Pyramid methods in image processing..."

  • ...Feature pyramids built upon image pyramids (for short we call these featurized image pyramids) form the basis of a standard solution [1] (Fig....

    [...]

Posted Content
TL;DR: Feature pyramid networks (FPNets) as mentioned in this paper exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost.
Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

5,438 citations

Proceedings ArticleDOI
13 Jun 2010
TL;DR: It is discovered that “classical” flow formulations perform surprisingly well when combined with modern optimization and implementation techniques, and while median filtering of intermediate flow fields during optimization is a key to recent performance gains, it leads to higher energy solutions.
Abstract: The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that “classical” flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. Moreover, we find that while median filtering of intermediate flow fields during optimization is a key to recent performance gains, it leads to higher energy solutions. To understand the principles behind this phenomenon, we derive a new objective that formalizes the median filtering heuristic. This objective includes a nonlocal term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that ranks at the top of the Middlebury benchmark.

1,529 citations


Cites background from "Pyramid methods in image processing..."

  • ...For the Charbonnier penalty function, the candidate set is [1, 3, 5, 8, 10] and 5 is optimal....

    [...]

  • ...Current best practices include coarse-to-fine estimation to deal with large motions [8, 13], texture decomposition [32, 34] or high-order filter constancy [3, 12, 16, 22, 40] to reduce the influence of lighting changes, bicubic interpolation-based warping [22, 34], temporal averaging of image derivatives [17, 34], graduated non-convexity [11] to minimize non-convex energies [10, 31], and median filtering after each incremental estimation step to remove outliers [34]....

    [...]

Journal ArticleDOI
TL;DR: Two examples of jointly shiftable transforms that are simultaneously shiftable in more than one domain are explored and the usefulness of these image representations for scale-space analysis, stereo disparity measurement, and image enhancement is demonstrated.
Abstract: One of the major drawbacks of orthogonal wavelet transforms is their lack of translation invariance: the content of wavelet subbands is unstable under translations of the input signal. Wavelet transforms are also unstable with respect to dilations of the input signal and, in two dimensions, rotations of the input signal. The authors formalize these problems by defining a type of translation invariance called shiftability. In the spatial domain, shiftability corresponds to a lack of aliasing; thus, the conditions under which the property holds are specified by the sampling theorem. Shiftability may also be applied in the context of other domains, particularly orientation and scale. Jointly shiftable transforms that are simultaneously shiftable in more than one domain are explored. Two examples of jointly shiftable transforms are designed and implemented: a 1-D transform that is jointly shiftable in position and scale, and a 2-D transform that is jointly shiftable in position and orientation. The usefulness of these image representations for scale-space analysis, stereo disparity measurement, and image enhancement is demonstrated. >

1,448 citations


Cites background from "Pyramid methods in image processing..."

  • ...In order to create representations more suited for analysis of scale, many authors in signal processing and computer vision as well as biological vision have advocated transforms with constant logarithmic-width (constant Q) frequency bands [19], [31], [48], [24], [1], [34], [28]....

    [...]

Proceedings ArticleDOI
23 Oct 1995
TL;DR: An architecture for efficient and accurate linear decomposition of an image into scale and orientation subbands and the construction and implementation of the transform is described.
Abstract: We describe an architecture for efficient and accurate linear decomposition of an image into scale and orientation subbands. The basis functions of this decomposition are directional derivative operators of any desired order. We describe the construction and implementation of the transform.

1,236 citations

References
More filters
Journal ArticleDOI
TL;DR: A technique for image encoding in which local operators of many scales but identical shape serve as the basis functions, which tends to enhance salient image features and is well suited for many image analysis tasks as well as for image compression.
Abstract: We describe a technique for image encoding in which local operators of many scales but identical shape serve as the basis functions. The representation differs from established techniques in that the code elements are localized in spatial frequency as well as in space. Pixel-to-pixel correlations are first removed by subtracting a lowpass filtered copy of the image from the image itself. The result is a net data compression since the difference, or error, image has low variance and entropy, and the low-pass filtered image may represented at reduced sample density. Further data compression is achieved by quantizing the difference image. These steps are then repeated to compress the low-pass image. Iteration of the process at appropriately expanded scales generates a pyramid data structure. The encoding process is equivalent to sampling the image with Laplacian operators of many scales. Thus, the code tends to enhance salient image features. A further advantage of the present code is that it is well suited for many image analysis tasks as well as for image compression. Fast algorithms are described for coding and decoding.

6,975 citations


"Pyramid methods in image processing..." refers methods in this paper

  • ...The pyramid representation also permits data compression.(3) Although it has one...

    [...]

Journal ArticleDOI
TL;DR: A multiresolution spline technique for combining two or more images into a larger image mosaic is defined and coarse features occur near borders are blended gradually over a relatively large distance without blurring or otherwise degrading finer image details in the neighborhood of th e border.
Abstract: We define a multiresolution spline technique for combining two or more images into a larger image mosaic. In this procedure, the images to be splined are first decomposed into a set of band-pass filtered component images. Next, the component images in each spatial frequency hand are assembled into a corresponding bandpass mosaic. In this step, component images are joined using a weighted average within a transition zone which is proportional in size to the wave lengths represented in the band. Finally, these band-pass mosaic images are summed to obtain the desired image mosaic. In this way, the spline is matched to the scale of features within the images themselves. When coarse features occur near borders, these are blended gradually over a relatively large distance without blurring or otherwise degrading finer image details in the neighborhood of th e border.

1,246 citations

Journal ArticleDOI
TL;DR: Data on the threshold visibility of spatially localized, aperiodic patterns are used to derive the properties of a general model for threshold spatial vision that quantitatively predicts the spatial modulation transfer function (cosine grating thresholds) under both sustained and transient conditions with no free parameters.
Abstract: Data on the threshold visibility of spatially localized, aperiodic patterns are used to derive the properties of a general model for threshold spatial vision. The model consists of four different size-tuned mechanisms centered at each eccentricity, each with a center-surround sensitivity profile described by the difference of two Gaussian functions. The two smaller mechanisms show relatively sustained temporal characteristics, while the larger two exhibit transient properties. All four mechanisms increase linearly in size with eccentricity. Mechanism responses are combined through spatial probability summation to predict visual thresholds. The model quantitatively predicts the spatial modulation transfer function (cosine grating thresholds) under both sustained and transient conditions with no free parameters.

741 citations