scispace - formally typeset
Search or ask a question
Author

Peter J. Burt

Bio: Peter J. Burt is an academic researcher from Sarnoff Corporation. The author has contributed to research in topics: Image processing & Pyramid (image processing). The author has an hindex of 32, co-authored 50 publications receiving 9183 citations.


Papers
More filters
01 Jan 2000
TL;DR: An overview of theVSAM system, which uses multiple, cooperative video sensors to provide continuous coverage of people and vehicles in a cluttered environment, is presented.
Abstract: Under the three-year Video Surveillance and Monitoring (VSAM) project (1997‐1999), the Robotics Institute at Carnegie Mellon University (CMU) and the Sarnoff Corporation developed a system for autonomous Video Surveillance and Monitoring. The technical approach uses multiple, cooperative video sensors to provide continuous coverage of people and vehicles in a cluttered environment. This final report presents an overview of the system, and of the technical accomplishments that have been achieved.

1,515 citations

Journal ArticleDOI
TL;DR: A multiresolution spline technique for combining two or more images into a larger image mosaic is defined and coarse features occur near borders are blended gradually over a relatively large distance without blurring or otherwise degrading finer image details in the neighborhood of th e border.
Abstract: We define a multiresolution spline technique for combining two or more images into a larger image mosaic. In this procedure, the images to be splined are first decomposed into a set of band-pass filtered component images. Next, the component images in each spatial frequency hand are assembled into a corresponding bandpass mosaic. In this step, component images are joined using a weighted average within a transition zone which is proportional in size to the wave lengths represented in the band. Finally, these band-pass mosaic images are summed to obtain the desired image mosaic. In this way, the spline is matched to the scale of features within the images themselves. When coarse features occur near borders, these are blended gradually over a relatively large distance without blurring or otherwise degrading finer image details in the neighborhood of th e border.

1,246 citations

01 Nov 1984
TL;DR: A variety of pyramid methods that have been developed for image data compression, enhancement, analysis and graphics that are versatile, convenient, and efficient to use.
Abstract: The data structure used to represent image information can be critical to the successful completion of an image processing task. One structure that has attracted considerable attention is the image pyramid This consists of a set of lowpass or bandpass copies of an image, each representing pattern information of a different scale. Here we describe a variety of pyramid methods that we have developed for image data compression, enhancement, analysis and graphics. ©1984 RCA Corporation Final manuscript received November 12, 1984 Reprint Re-29-6-5 that can perform most of the routine visual tasks that humans do effortlessly. It is becoming increasingly clear that the format used to represent image data can be as critical in image processing as the algorithms applied to the data. A digital image is initially encoded as an array of pixel intensities, but this raw format i s not suited to most tasks. Alternatively, an image may be represented by its Fourier transform, with operations applied to the transform coefficients rather than to the original pixel values. This is appropriate for some data compression and image enhancement tasks, but inappropriate for others. The transform representation is particularly unsuited for machine vision and computer graphics, where the spatial location of pattem elements is critical. Recently there has been a great deal of interest in representations that retain spatial localization as well as localization in the spatial—frequency domain. This i s achieved by decomposing the image into a set of spatial frequency bandpass component images. Individual samples of a component image represent image pattern information that is appropriately localized, while the bandpassed image as a whole represents information about a particular fineness of detail or scale. There is evidence that the human visual system uses such a representation, and multiresolution schemes are becoming increasingly popular in machine vision and in image processing in general. The importance of analyzing images at many scales arises from the nature of images themselves. Scenes in the world contain objects of many sizes, and these objects contain features of many sizes. Moreover, objects can be at various distances from the viewer. As a result, any analysis procedure that is applied only at a single scale may miss information at other scales. The solution is to carry out analyses at all scales simultaneously. Convolution is the basic operation of most image analysis systems, and convolution with large weighting functions is a notoriously expensive computation. In a multiresolution system one wishes to perform convolutions with kernels of many sizes, ranging from very small to very large. and the computational problems appear forbidding. Therefore one of the main problems in working with multiresolution representations is to develop fast and efficient techniques. Members of the Advanced Image Processing Research Group have been actively involved in the development of multiresolution techniques for some time. Most of the work revolves around a representation known as a "pyramid," which is versatile, convenient, and efficient to use. We have applied pyramid-based methods to some fundamental problems in image analysis, data compression, and image manipulation.

1,158 citations

Proceedings ArticleDOI
11 May 1993
TL;DR: The authors present an extension to the pyramid approach to image fusion that provides greater shift invariance and immunity to video noise, and provides at least a partial solution to the problem of combining components that have roughly equal salience but opposite contrasts.
Abstract: The authors present an extension to the pyramid approach to image fusion. The modifications address problems that were encountered with past implementations of pyramid-based fusion. In particular, the modifications provide greater shift invariance and immunity to video noise, and provide at least a partial solution to the problem of combining components that have roughly equal salience but opposite contrasts. The fusion algorithm was found to perform well for a range of tasks without requiring adjustment of the algorithm parameters. Results were remarkably insensitive to changes in these parameters, suggesting that the procedure is both robust and generic. A composite imaging technique is outlined that may provide a powerful tool for image capture. By fusing a set of images obtained under restricted, narrowband, imaging conditions, it is often possible to construct an image that has enhanced information content when compared to a single image obtained directly with a broadband sensor. >

917 citations

Patent
03 Dec 1996
TL;DR: In this article, the irises of a human or an animal in an image with little or no active involvement by the human or animal was analyzed. And a method for obtaining and analyzing images of at least one object in a scene comprising capturing a wide field of view image of the object to locate the object in the scene; and then using a narrow field-of-view (NFOV) imager responsive to the location information provided in the capturing step to obtain higher resolution image.
Abstract: A recognition system which obtains and analyzes images of at least one object in a scene comprising a wide field of view (WFOV) imager which is used to capture an image of the scene and to locate the object and a narrow field of view (NFOV) imager which is responsive to the location information provided by the WFOV imager and which is used to capture an image of the object, the image of the object having a higher resolution than the image captured by the WFOV imager is disclosed. In one embodiment, a system that obtains and analyzes images of the irises of eyes of a human or animal in an image with little or no active involvement by the human or animal is disclosed. A method for obtaining and analyzing images of at least one object in a scene comprising capturing a wide field of view image of the object to locate the object in the scene; and then using a narrow field of view imager responsive to the location information provided in the capturing step to obtain higher resolution image of the object is also disclosed.

599 citations


Cited by
More filters
Book
01 Jan 1998
TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But pyramid representations have been avoided in recent object detectors that are based on deep convolutional networks, partially because they are slow to compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

16,727 citations

Journal ArticleDOI
TL;DR: A near-real-time computer system that can locate and track a subject's head, and then recognize the person by comparing characteristics of the face to those of known individuals, and that is easy to implement using a neural network architecture.
Abstract: We have developed a near-real-time computer system that can locate and track a subject's head, and then recognize the person by comparing characteristics of the face to those of known individuals. The computational approach taken in this system is motivated by both physiology and information theory, as well as by the practical requirements of near-real-time performance and accuracy. Our approach treats the face recognition problem as an intrinsically two-dimensional (2-D) recognition problem rather than requiring recovery of three-dimensional geometry, taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. The system functions by projecting face images onto a feature space that spans the significant variations among known face images. The significant features are known as "eigenfaces," because they are the eigenvectors (principal components) of the set of faces; they do not necessarily correspond to features such as eyes, ears, and noses. The projection operation characterizes an individual face by a weighted sum of the eigenface features, and so to recognize a particular face it is necessary only to compare these weights to those of known individuals. Some particular advantages of our approach are that it provides for the ability to learn and later recognize new faces in an unsupervised manner, and that it is easy to implement using a neural network architecture.

14,562 citations

Journal ArticleDOI
Ingrid Daubechies1
TL;DR: This work construct orthonormal bases of compactly supported wavelets, with arbitrarily high regularity, by reviewing the concept of multiresolution analysis as well as several algorithms in vision decomposition and reconstruction.
Abstract: We construct orthonormal bases of compactly supported wavelets, with arbitrarily high regularity. The order of regularity increases linearly with the support width. We start by reviewing the concept of multiresolution analysis as well as several algorithms in vision decomposition and reconstruction. The construction then follows from a synthesis of these different approaches.

8,588 citations

Proceedings ArticleDOI
03 Jun 1991
TL;DR: An approach to the detection and identification of human faces is presented, and a working, near-real-time face recognition system which tracks a subject's head and then recognizes the person by comparing characteristics of the face to those of known individuals is described.
Abstract: An approach to the detection and identification of human faces is presented, and a working, near-real-time face recognition system which tracks a subject's head and then recognizes the person by comparing characteristics of the face to those of known individuals is described. This approach treats face recognition as a two-dimensional recognition problem, taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. Face images are projected onto a feature space ('face space') that best encodes the variation among known face images. The face space is defined by the 'eigenfaces', which are the eigenvectors of the set of faces; they do not necessarily correspond to isolated features such as eyes, ears, and noses. The framework provides the ability to learn to recognize new faces in an unsupervised manner. >

5,489 citations