
Showing papers on "Image processing" published in 1996


Journal ArticleDOI
TL;DR: The image coding results, calculated from actual file sizes and images reconstructed by the decoding algorithm, are either comparable to or surpass previous results obtained through much more sophisticated and computationally complex methods.
Abstract: Embedded zerotree wavelet (EZW) coding, introduced by Shapiro (see IEEE Trans. Signal Processing, vol.41, no.12, p.3445, 1993), is a very effective and computationally simple technique for image compression. We offer an alternative explanation of the principles of its operation, so that the reasons for its excellent performance can be better understood. These principles are partial ordering by magnitude with a set partitioning sorting algorithm, ordered bit plane transmission, and exploitation of self-similarity across different scales of an image wavelet transform. Moreover, we present a new and different implementation based on set partitioning in hierarchical trees (SPIHT), which provides even better performance than our previously reported extension of EZW that surpassed the performance of the original EZW. The image coding results, calculated from actual file sizes and images reconstructed by the decoding algorithm, are either comparable to or surpass previous results obtained through much more sophisticated and computationally complex methods. In addition, the new coding and decoding procedures are extremely fast, and they can be made even faster, with only small loss in performance, by omitting entropy coding of the bit stream by the arithmetic code.

5,890 citations
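To make the "partial ordering by magnitude" and "ordered bit plane transmission" ideas concrete, here is a minimal toy sketch, not the authors' SPIHT coder (no zerotrees, no set partitioning, no entropy coding), assuming the PyWavelets (pywt) package is available:

# Toy illustration of magnitude-ordered bit-plane passes over wavelet
# coefficients, the idea behind EZW/SPIHT; NOT the full SPIHT algorithm.
import numpy as np
import pywt

def bitplane_passes(image, wavelet="bior4.4", levels=4):
    coeffs = pywt.wavedec2(image.astype(float), wavelet, level=levels)
    arr, _slices = pywt.coeffs_to_array(coeffs)        # flatten subbands into one array
    mag = np.abs(arr)
    threshold = 2.0 ** np.floor(np.log2(mag.max()))    # largest power-of-two threshold
    significant = np.zeros(arr.shape, dtype=bool)
    while threshold >= 1.0:
        # Sorting pass: coefficients that become significant at this threshold.
        newly = (~significant) & (mag >= threshold)
        # Refinement pass: one more magnitude bit for already-significant coefficients.
        refinement_bits = ((mag[significant] // threshold) % 2).astype(int)
        yield threshold, int(newly.sum()), refinement_bits
        significant |= newly
        threshold /= 2.0

# Example: count significance/refinement information emitted per bit plane.
img = np.random.default_rng(0).normal(size=(64, 64)) * 50
for t, n_new, refine in bitplane_passes(img):
    print(f"threshold={t:6.1f}  newly significant={n_new:5d}  refinement bits={refine.size}")

Truncating the passes at any point yields an embedded (progressive) code, which is what allows the bit stream to be cut at an exact target rate.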


Journal ArticleDOI
TL;DR: Comparisons with other multiresolution texture features using the Brodatz texture database indicate that the Gabor features provide the best pattern retrieval accuracy.
Abstract: Image content-based retrieval is emerging as an important research area with application to digital libraries and multimedia databases. The focus of this paper is on the image processing aspects, and in particular on using texture information for browsing and retrieval of large image data. We propose the use of Gabor wavelet features for texture analysis and provide a comprehensive experimental evaluation. Comparisons with other multiresolution texture features using the Brodatz texture database indicate that the Gabor features provide the best pattern retrieval accuracy. An application to browsing large air photos is illustrated.

4,017 citations
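A minimal sketch of Gabor texture descriptors of this kind (mean and standard deviation of the filter response magnitude per scale and orientation), assuming scikit-image and NumPy are available; the parameter values are illustrative, not the paper's filter bank design:

# Gabor texture features: filter at several frequencies/orientations and keep
# simple statistics of each response magnitude.
import numpy as np
from skimage import data
from skimage.filters import gabor

def gabor_features(image, frequencies=(0.1, 0.2, 0.3, 0.4), n_orientations=6):
    feats = []
    for f in frequencies:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            real, imag = gabor(image, frequency=f, theta=theta)
            mag = np.hypot(real, imag)
            feats.extend([mag.mean(), mag.std()])
    return np.asarray(feats)

# Retrieval then reduces to nearest neighbours in this feature space,
# e.g. by (optionally normalized) Euclidean distance between vectors.
query = gabor_features(data.camera())
print(query.shape)   # (2 * len(frequencies) * n_orientations,)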


Journal ArticleDOI
TL;DR: In this article, a set of image processing algorithms for extracting quantitative data from digitized video microscope images of colloidal suspensions is described, which can locate submicrometer spheres to within 10 nm in the focal plane and 150 nm in depth.

3,423 citations
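A rough sketch of the usual locate-and-refine pipeline for video microscopy data of this kind (bandpass filtering, local-maximum detection, brightness-weighted centroid refinement), assuming SciPy; parameter names and thresholds are illustrative, not the paper's:

import numpy as np
from scipy import ndimage

def locate_spheres(image, feature_size=9, min_brightness=50.0):
    img = image.astype(float)
    # Bandpass: suppress pixel noise and slowly varying background.
    smoothed = ndimage.gaussian_filter(img, sigma=1.0)
    background = ndimage.uniform_filter(img, size=2 * feature_size + 1)
    bp = np.clip(smoothed - background, 0, None)

    # Candidate peaks: local maxima within a feature-sized neighbourhood.
    maxima = (bp == ndimage.maximum_filter(bp, size=feature_size)) & (bp > min_brightness)
    ys, xs = np.nonzero(maxima)

    # Sub-pixel refinement: brightness-weighted centroid in a small window.
    half = feature_size // 2
    centers = []
    for y, x in zip(ys, xs):
        y0, x0 = max(y - half, 0), max(x - half, 0)
        win = bp[y0:y + half + 1, x0:x + half + 1]
        yy, xx = np.mgrid[0:win.shape[0], 0:win.shape[1]]
        w = win.sum()
        if w > 0:
            centers.append((y0 + (yy * win).sum() / w, x0 + (xx * win).sum() / w))
    return np.array(centers)

Linking the per-frame centers into trajectories (the other half of such a toolkit) is then a separate nearest-neighbour assignment problem across frames.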


PatentDOI
TL;DR: SiMultaneous Acquisition of Spatial Harmonics (SMASH) as mentioned in this paper is a partially parallel imaging strategy, which is readily integrated with many existing fast imaging sequences, yielding multiplicative time savings without a significant sacrifice in spatial resolution or signal-to-noise ratio.
Abstract: A magnetic resonance (MR) imaging apparatus and technique exploits spatial information inherent in a surface coil array to increase MR image acquisition speed, resolution and/or field of view. Partial signals are acquired simultaneously in the component coils of the array and formed into two or more signals corresponding to orthogonal spatial representations. In a Fourier embodiment, lines of the k-space matrix required for image production are formed using a set of separate, preferably linear combinations of the component coil signals to substitute for spatial modulations normally produced by phase encoding gradients. The signal combining may proceed in a parallel or flow-through fashion, or as post-processing, which in either case reduces the need for time-consuming gradient switching and expensive fast magnet arrangements. In the post-processing approach, stored signals are combined after the fact to yield the full data matrix. In the flow-through approach, a plug-in unit consisting of a coil array with an on-board processor outputs two or more sets of combined spatial signals for each spin conditioning cycle, each directly corresponding to a distinct line in k-space. This partially parallel imaging strategy, dubbed SiMultaneous Acquisition of Spatial Harmonics (SMASH), is readily integrated with many existing fast imaging sequences, yielding multiplicative time savings without a significant sacrifice in spatial resolution or signal-to-noise ratio. An experimental system achieved a two-fold improvement in image acquisition time with a prototype three-coil array, and larger factors are achievable with other coil arrangements.

2,256 citations
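A toy NumPy sketch of the SMASH idea described above: fit linear combinations of coil sensitivity profiles to spatial harmonics, then apply the same weights to the measured coil signals to synthesize skipped phase-encode lines. The coil profiles, sizes, and fitting choices below are simulated assumptions for illustration, not the patented system:

import numpy as np

ny, ncoils, accel = 128, 8, 2
y = np.linspace(0, 1, ny, endpoint=False)

# Simulated smooth coil sensitivities along the phase-encode direction.
centers = np.linspace(0.05, 0.95, ncoils)
C = np.exp(-((y[None, :] - centers[:, None]) ** 2) / (2 * 0.12 ** 2))   # (ncoils, ny)

# Fit weights so that sum_c W[c, m] * C[c, y] ~ exp(i * 2*pi * m * y)
# for the harmonics m = 0 .. accel-1.
harmonics = np.exp(1j * 2 * np.pi * np.arange(accel)[:, None] * y[None, :])   # (accel, ny)
W, *_ = np.linalg.lstsq(C.T.astype(complex), harmonics.T, rcond=None)         # (ncoils, accel)

# Given the coil signals s_c(k) for one acquired k-space sample, the combined
# signals sum_c W[c, m] * s_c(k) approximate the samples offset by m harmonics,
# so an accel-fold undersampled acquisition can be filled in.
s = np.random.default_rng(1).normal(size=(ncoils,)) + 0j
synthesized = W.T @ s    # (accel,) composite samples
print(synthesized)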


Proceedings ArticleDOI
01 Aug 1996
TL;DR: This work presents a new approach for modeling and rendering existing architectural scenes from a sparse set of still photographs, which combines both geometry-based and image-based techniques, and presents view-dependent texture mapping, a method of compositing multiple views of a scene that better simulates geometric detail on basic models.
Abstract: We present a new approach for modeling and rendering existing architectural scenes from a sparse set of still photographs. Our modeling approach, which combines both geometry-based and image-based techniques, has two components. The first component is a photogrammetric modeling method which facilitates the recovery of the basic geometry of the photographed scene. Our photogrammetric modeling approach is effective, convenient, and robust because it exploits the constraints that are characteristic of architectural scenes. The second component is a model-based stereo algorithm, which recovers how the real scene deviates from the basic model. By making use of the model, our stereo technique robustly recovers accurate depth from widely-spaced image pairs. Consequently, our approach can model large architectural environments with far fewer photographs than current image-based modeling approaches. For producing renderings, we present view-dependent texture mapping, a method of compositing multiple views of a scene that better simulates geometric detail on basic models. Our approach can be used to recover models for use in either geometry-based or image-based rendering systems. We present results that demonstrate our approach's ability to create realistic renderings of architectural scenes from viewpoints far from the original photographs. CR Descriptors: I.2.10 [Artificial Intelligence]: Vision and Scene Understanding - Modeling and recovery of physical attributes; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Color, shading, shadowing, and texture; I.4.8 [Image Processing]: Scene Analysis - Stereo; J.6 [Computer-Aided Engineering]: Computer-aided design (CAD).

2,159 citations


Journal ArticleDOI
TL;DR: Novel features are a suite of operations relating to the determination, modeling, and correction of the contrast transfer function and the availability of the entire documentation in hypertext format.

2,117 citations


Journal ArticleDOI
TL;DR: A framework based on robust estimation is presented that addresses violations of the brightness constancy and spatial smoothness assumptions caused by multiple motions, and is applied to standard formulations of the optical flow problem, thus reducing their sensitivity to violations of their underlying assumptions.

1,787 citations
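For reference, a hedged sketch of the robust formulation this TL;DR refers to, in the form commonly presented for this framework (the Geman-McClure estimator is one typical choice; the exact notation may differ from the paper): the flow (u, v) minimizes

E(u,v) = \sum_{\mathbf{x}} \rho_D\bigl(I_x u + I_y v + I_t,\ \sigma_D\bigr)
       + \lambda \sum_{\mathbf{x}} \sum_{\mathbf{x}' \in \mathcal{N}(\mathbf{x})}
         \bigl[ \rho_S\bigl(u(\mathbf{x}) - u(\mathbf{x}'),\ \sigma_S\bigr)
              + \rho_S\bigl(v(\mathbf{x}) - v(\mathbf{x}'),\ \sigma_S\bigr) \bigr],
\qquad
\rho(x, \sigma) = \frac{x^2}{\sigma^2 + x^2},

so that outliers in the data term (brightness-constancy violations) and in the spatial term (motion discontinuities) receive bounded influence instead of the unbounded quadratic penalty of least squares.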


Journal ArticleDOI
TL;DR: The Photobook system is described, which is a set of interactive tools for browsing and searching images and image sequences that make direct use of the image content rather than relying on text annotations to provide a sophisticated browsing and search capability.
Abstract: We describe the Photobook system, which is a set of interactive tools for browsing and searching images and image sequences. These query tools differ from those used in standard image databases in that they make direct use of the image content rather than relying on text annotations. Direct search on image content is made possible by use of semantics-preserving image compression, which reduces images to a small set of perceptually-significant coefficients. We discuss three types of Photobook descriptions in detail: one that allows search based on appearance, one that uses 2-D shape, and a third that allows search based on textural properties. These image content descriptions can be combined with each other and with text-based descriptions to provide a sophisticated browsing and search capability. In this paper we demonstrate Photobook on databases containing images of people, video keyframes, hand tools, fish, texture swatches, and 3-D medical data.

1,748 citations


Book
25 Nov 1996
TL;DR: Algorithms for Image Processing and Computer Vision, 2nd Edition provides the tools to speed development of image processing applications.
Abstract: A cookbook of algorithms for common image processing applications. Thanks to advances in computer hardware and software, algorithms have been developed that support sophisticated image processing without requiring an extensive background in mathematics. This bestselling book has been fully updated with the newest of these, including 2D vision methods in content-based searches and the use of graphics cards as image processing computational aids. It's an ideal reference for software engineers and developers, advanced programmers, graphics programmers, scientists, and other specialists who require highly specialized image processing. Algorithms now exist for a wide variety of sophisticated image processing applications required by software engineers and developers, advanced programmers, graphics programmers, scientists, and related specialists. This bestselling book has been completely updated to include the latest algorithms, including 2D vision methods in content-based searches, details on modern classifier methods, and graphics cards used as image processing computational aids. It saves hours of mathematical calculating by using distributed processing and GPU programming, and gives non-mathematicians the shortcuts needed to program relatively sophisticated applications. Algorithms for Image Processing and Computer Vision, 2nd Edition provides the tools to speed development of image processing applications.

1,517 citations


Journal ArticleDOI
TL;DR: The problem of blind deconvolution for images is introduced, the basic principles and methodologies behind the existing algorithms are provided, and the current trends and the potential of this difficult signal processing problem are examined.
Abstract: The goal of image restoration is to reconstruct the original scene from a degraded observation. This recovery process is critical to many image processing applications. Although classical linear image restoration has been thoroughly studied, the more difficult problem of blind image restoration has numerous research possibilities. We introduce the problem of blind deconvolution for images, provide an overview of the basic principles and methodologies behind the existing algorithms, and examine the current trends and the potential of this difficult signal processing problem. A broad review of blind deconvolution methods for images is given to portray the experience of the authors and of the many other researchers in this area. We first introduce the blind deconvolution problem for general signal processing applications. The specific challenges encountered in image related restoration applications are explained. Analytic descriptions of the structure of the major blind deconvolution approaches for images then follows. The application areas, convergence properties, complexity, and other implementation issues are addressed for each approach. We then discuss the strengths and limitations of various approaches based on theoretical expectations and computer simulations.

1,332 citations
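As a one-line reminder of the problem being surveyed (the standard degradation model, not specific to any single algorithm in the review):

g(x, y) = (h * f)(x, y) + n(x, y),

where f is the true image, h the unknown point-spread function, * denotes 2-D convolution, and n is noise. Blind deconvolution estimates both \hat{f} and \hat{h} from the observation g alone, which is why additional constraints (nonnegativity, finite support, a parametric blur model, and so on) are needed to make the problem well posed.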


Journal ArticleDOI
TL;DR: A novel observation model based on motion compensated subsampling is proposed for a video sequence and Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence.
Abstract: The human visual system appears to be capable of temporally integrating information in a video sequence in such a way that the perceived spatial resolution of a sequence appears much higher than the spatial resolution of an individual frame. While the mechanisms in the human visual system that do this are unknown, the effect is not too surprising given that temporally adjacent frames in a video sequence contain slightly different, but unique, information. This paper addresses the use of both the spatial and temporal information present in a short image sequence to create a single high-resolution video frame. A novel observation model based on motion compensated subsampling is proposed for a video sequence. Since the reconstruction problem is ill-posed, Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence. Estimates computed from a low-resolution image sequence containing a subpixel camera pan show dramatic visual and quantitative improvements over bilinear, cubic B-spline, and Bayesian single frame interpolations. Visual and quantitative improvements are also shown for an image sequence containing objects moving with independent trajectories. Finally, the video frame extraction algorithm is used for the motion-compensated scan conversion of interlaced video data, with a visual comparison to the resolution enhancement obtained from progressively scanned frames.
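A hedged NumPy/SciPy sketch of a motion-compensated subsampling observation model of the general form y_k = D B W_k x + n_k (warp, blur, downsample, add noise); the operators and parameters below are illustrative assumptions, not the paper's exact model:

import numpy as np
from scipy import ndimage

def observe(high_res, shift_yx, psf_sigma=1.0, factor=4, noise_sigma=1.0, rng=None):
    rng = rng or np.random.default_rng()
    warped = ndimage.shift(high_res, shift_yx, order=3, mode="reflect")   # W_k: subpixel motion
    blurred = ndimage.gaussian_filter(warped, sigma=psf_sigma)            # B: sensor/optics PSF
    low_res = blurred[::factor, ::factor]                                 # D: subsampling
    return low_res + rng.normal(scale=noise_sigma, size=low_res.shape)    # additive noise

# Simulate a short low-resolution sequence with a subpixel camera pan; a MAP /
# Bayesian estimator with an edge-preserving prior would then invert this model
# jointly over all frames to recover one high-resolution still.
x = np.kron(np.random.default_rng(0).random((16, 16)), np.ones((16, 16)))   # 256x256 "scene"
frames = [observe(x, shift_yx=(0.0, 0.5 * k), rng=np.random.default_rng(k)) for k in range(8)]
print(frames[0].shape)   # (64, 64)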

Proceedings ArticleDOI
TL;DR: The Virage engine provides an open framework for developers to 'plug-in' primitives to solve specific image management problems and can be utilized to address high-level problems as well, such as automatic, unsupervised keyword assignment, or image classification.
Abstract: Until recently, the management of large image databases has relied exclusively on manually entered alphanumeric annotations. Systems are beginning to emerge in both the research and commercial sectors based on 'content-based' image retrieval, a technique which explicitly manages image assets by directly representing their visual attributes. The Virage image search engine provides an open framework for building such systems. The Virage engine expresses visual features as image 'primitives.' Primitives can be very general (such as color, shape, or texture) or quite domain specific (face recognition, cancer cell detection, etc.). The basic philosophy underlying this architecture is a transformation from the data-rich representation of explicit image pixels to a compact, semantic-rich representation of visually salient characteristics. In practice, the design of such primitives is non-trivial, and is driven by a number of conflicting real-world constraints (e.g. computation time vs. accuracy). The virage engine provides an open framework for developers to 'plug-in' primitives to solve specific image management problems. The architecture has been designed to support both static images and video in a unified paradigm. The infrastructure provided by the Virage engine can be utilized to address high-level problems as well, such as automatic, unsupervised keyword assignment, or image classification.
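A hypothetical sketch of the "plug-in primitive" architecture described above: each primitive turns pixels into a compact feature vector and defines a distance. The class and method names here are invented for illustration and are not the Virage engine's actual API:

from abc import ABC, abstractmethod
import numpy as np

class Primitive(ABC):
    name: str

    @abstractmethod
    def extract(self, image: np.ndarray) -> np.ndarray:
        """Map an image to a compact feature vector."""

    def distance(self, a: np.ndarray, b: np.ndarray) -> float:
        return float(np.linalg.norm(a - b))

class ColorHistogram(Primitive):
    name = "color"
    def extract(self, image):
        hist, _ = np.histogram(image, bins=64, range=(0, 256), density=True)
        return hist

class SearchEngine:
    def __init__(self, primitives, weights=None):
        self.primitives = primitives
        self.weights = weights or {p.name: 1.0 for p in primitives}
        self.index = {}   # image id -> {primitive name: feature vector}

    def insert(self, image_id, image):
        self.index[image_id] = {p.name: p.extract(image) for p in self.primitives}

    def query(self, image, top_k=5):
        q = {p.name: p.extract(image) for p in self.primitives}
        scores = {
            iid: sum(self.weights[p.name] * p.distance(q[p.name], feats[p.name])
                     for p in self.primitives)
            for iid, feats in self.index.items()
        }
        return sorted(scores, key=scores.get)[:top_k]

The weighted-sum combination mirrors the idea that general and domain-specific primitives can be mixed, and reweighted per query, without changing the engine itself.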

Journal ArticleDOI
Richard Szeliski
TL;DR: This article presents algorithms that align images and composite scenes of increasing complexity-beginning with simple planar scenes and progressing to panoramic scenes and, finally, to scenes with depth variation.
Abstract: As computer-based video becomes ubiquitous with the expansion of transmission, storage, and manipulation capabilities, it will offer a rich source of imagery for computer graphics applications. This article looks at one way to use video as a new source of high-resolution, photorealistic imagery for these applications. If you walked through an environment, such as a building interior, and filmed a video sequence of what you saw, you could subsequently register and composite the video images together into large mosaics of the scene. In this way, you can achieve an essentially unlimited resolution. Furthermore, since you can acquire the images using any optical technology, you can reconstruct any scene regardless of its range or scale. Video mosaics can be used in many different applications, including the creation of virtual reality environments, computer-game settings, and movie special effects. I present algorithms that align images and composite scenes of increasing complexity-beginning with simple planar scenes and progressing to panoramic scenes and, finally, to scenes with depth variation. I begin with a review of basic imaging equations and conclude with some novel applications of the virtual environments created using the algorithms presented.
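A minimal sketch of planar mosaicking with OpenCV (feature matching plus a RANSAC homography), standing in for the alignment/compositing algorithms described above; it assumes a roughly planar or rotation-only ("panoramic") scene and that the mosaic canvas is already large enough:

import cv2
import numpy as np

def add_to_mosaic(mosaic, frame):
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(mosaic, None)
    k2, d2 = orb.detectAndCompute(frame, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d2, d1), key=lambda m: m.distance)[:200]
    src = np.float32([k2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)    # maps frame -> mosaic
    h, w = mosaic.shape[:2]
    warped = cv2.warpPerspective(frame, H, (w, h))
    mask = cv2.warpPerspective(np.full(frame.shape[:2], 255, np.uint8), H, (w, h))
    out = mosaic.copy()
    out[mask > 0] = warped[mask > 0]    # naive paste; feathering/blending would go here
    return out

Looping this over a video sequence grows the mosaic frame by frame; handling depth variation, as the article discusses, requires going beyond a single homography per frame.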

Journal ArticleDOI
01 Apr 1996
TL;DR: The wavelet properties that are the most important for biomedical applications are described, and an interpretation of the continuous wavelet transform (CWT) as a prewhitening multiscale matched filter is provided.
Abstract: We present an overview of the various uses of the wavelet transform (WT) in medicine and biology. We start by describing the wavelet properties that are the most important for biomedical applications. In particular, we provide an interpretation of the continuous wavelet transform (CWT) as a prewhitening multiscale matched filter. We also briefly indicate the analogy between the WT and some of the biological processing that occurs in the early components of the auditory and visual system. We then review the uses of the WT for the analysis of 1-D physiological signals obtained by phonocardiography, electrocardiography (ECG), and electroencephalography (EEG), including evoked response potentials. Next, we provide a survey of wavelet developments in medical imaging. These include biomedical image processing algorithms (e.g., noise reduction, image enhancement, and detection of microcalcifications in mammograms), image reconstruction and acquisition schemes (tomography and magnetic resonance imaging (MRI)), and multiresolution methods for the registration and statistical analysis of functional images of the brain (positron emission tomography (PET) and functional MRI (fMRI)). In each case, we provide the reader with some general background information and a brief explanation of how the methods work.
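A small sketch of the CWT-as-multiscale-matched-filter idea mentioned above: correlating a 1-D signal with dilated wavelets highlights transient waveforms at the scale that best matches their width. It assumes PyWavelets; the "beat" signal here is synthetic, not real ECG:

import numpy as np
import pywt

fs = 250.0                                   # samples per second
t = np.arange(0, 4, 1 / fs)
signal = np.random.default_rng(0).normal(scale=0.2, size=t.size)
signal[(t * fs).astype(int) % 250 == 125] += 3.0    # a sharp "beat" once per second

scales = np.arange(1, 64)
coeffs, freqs = pywt.cwt(signal, scales, "mexh", sampling_period=1 / fs)   # Mexican-hat wavelet
best_scale = scales[np.argmax(np.abs(coeffs).max(axis=1))]
print("scale with strongest response:", best_scale)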

Journal ArticleDOI
TL;DR: This paper describes the current state of a large set of programs written by various members of the Laboratory of Molecular Biology for processing images of two-dimensional crystals and of particles with helical or icosahedral symmetry for determination of macromolecular structures by electron microscopy.

Proceedings ArticleDOI
18 Jun 1996
TL;DR: A new space-sweep approach to true multi-image matching is presented that simultaneously determines 2D feature correspondences and the 3D positions of feature points in the scene.
Abstract: The problem of determining feature correspondences across multiple views is considered. The term "true multi-image" matching is introduced to describe techniques that make full and efficient use of the geometric relationships between multiple images and the scene. A true multi-image technique must generalize to any number of images, be of linear algorithmic complexity in the number of images, and use all the images in an equal manner. A new space-sweep approach to true multi-image matching is presented that simultaneously determines 2D feature correspondences and the 3D positions of feature points in the scene. The method is illustrated on a seven-image matching example from the aerial image domain.

Journal ArticleDOI
TL;DR: This paper presents a comparison of several shot boundary detection and classification techniques and their variations including histograms, discrete cosine transform, motion vector, and block matching methods.
Abstract: Many algorithms have been proposed for detecting video shot boundaries and classifying shot and shot transition types. Few published studies compare available algorithms, and those that do have looked at a limited range of test material. This paper presents a comparison of several shot boundary detection and classification techniques and their variations including histograms, discrete cosine transform, motion vector, and block matching methods. The performance and ease of selecting good thresholds for these algorithms are evaluated based on a wide variety of video sequences with a good mix of transition types. Threshold selection requires a trade-off between recall and precision that must be guided by the target application. © 1996 SPIE and IS&T.
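A minimal sketch of histogram-based cut detection, one of the technique families compared above: flag a shot boundary when the histogram difference between consecutive frames exceeds a threshold. It assumes OpenCV; the threshold value is illustrative and, as the paper stresses, must be tuned for the recall/precision trade-off of the target application:

import cv2
import numpy as np

def detect_cuts(video_path, threshold=0.4, bins=64):
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [bins], [0, 256]).ravel()
        hist /= hist.sum()
        if prev_hist is not None:
            d = 0.5 * np.abs(hist - prev_hist).sum()   # L1 histogram difference in [0, 1]
            if d > threshold:
                cuts.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts

Gradual transitions (dissolves, wipes) need windowed or cumulative-difference variants rather than this single-frame test.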

Patent
01 Aug 1996
TL;DR: An optical device for reading one- and two-dimensional symbologies at variable depths of field is described in this paper, in which a light source projects emitted light toward the symbol image to be reflected back to an optical assembly, or zoom lens.
Abstract: An optical device for reading one- and two-dimensional symbologies at variable depths of field. The device has a light source for projecting emitted light toward the symbol image to be reflected back to an optical assembly, or zoom lens. The zoom lens gives multiple-field-of-view capability to a CCD detector for detecting the reflected light and generating a proportional electrical signal. The sensor is aimed for reading the symbology by a frame locator including a light source that emits a beam divided by diffractive optics into beamlets matching the dimensions of the respective fields of view. Refractive optics are shifted in response to movement of the zoom lens for aiming the beamlets to form an aiming frame in accordance with the depth of field selected by the zoom lens. The device includes a microcomputer that communicates with a host PC including an API library with downloadable applications for image processing, including segmenting, analyzing, and decoding.

Journal ArticleDOI
TL;DR: The analysis shows that standard regularization penalties induce space-variant local impulse response functions, even for space-invariant tomographic systems, which leads naturally to a modified regularization penalty that yields reconstructed images with nearly uniform resolution.
Abstract: This paper examines the spatial resolution properties of penalized-likelihood image reconstruction methods by analyzing the local impulse response. The analysis shows that standard regularization penalties induce space-variant local impulse response functions, even for space-invariant tomographic systems. Paradoxically, for emission image reconstruction, the local resolution is generally poorest in high-count regions. We show that the linearized local impulse response induced by quadratic roughness penalties depends on the object only through its projections. This analysis leads naturally to a modified regularization penalty that yields reconstructed images with nearly uniform resolution. The modified penalty also provides a very practical method for choosing the regularization parameter to obtain a specified resolution in images reconstructed by penalized-likelihood methods.
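For orientation, a hedged sketch of the quantities involved, in standard notation that may differ in detail from the paper. Penalized-likelihood reconstruction solves

\hat{\theta}(y) = \arg\max_{\theta \ge 0}\; L(y;\theta) - \beta R(\theta),

and the linearized local impulse response at pixel j has the familiar form

l^{j} \approx \left[ A' W A + \beta \mathbf{R} \right]^{-1} A' W A\, e^{j},

where A is the system matrix, W a diagonal statistical weighting determined by the projections, and \mathbf{R} the Hessian of the roughness penalty. Because W varies over the object, a fixed \beta yields space-variant resolution; the modified penalty reweights the roughness term locally so that this dependence is approximately cancelled and the resolution becomes nearly uniform.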

Proceedings ArticleDOI
14 Oct 1996
TL;DR: In this paper, a parametrized image motion of planar patches is constrained to enforce articulated motion and is solved for directly using a robust estimation technique, which provides a rich and concise description of the activity that can be used for recognition.
Abstract: We extend the work of Black and Yacoob (1995) on the tracking and recognition of human facial expressions using parametrized models of optical flow to deal with the articulated motion of human limbs. We define a "cardboard person model" in which a person's limbs are represented by a set of connected planar patches. The parametrized image motion of these patches is constrained to enforce articulated motion and is solved for directly using a robust estimation technique. The recovered motion parameters provide a rich and concise description of the activity that can be used for recognition. We propose a method for performing view-based recognition of human activities from the optical flow parameters that extends previous methods to cope with the cyclical nature of human motion. We illustrate the method with examples of tracking human legs over long image sequences.
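For context, the planar-patch image motion referred to above is usually written with the eight-parameter model (a standard form in this line of work; the paper's notation may differ slightly):

u(x, y) = a_0 + a_1 x + a_2 y + a_6 x^2 + a_7 x y,
\qquad
v(x, y) = a_3 + a_4 x + a_5 y + a_6 x y + a_7 y^2,

where (u, v) is the image motion at patch coordinate (x, y). The articulation constraint ties the parameters of connected patches together at their shared joints, and the a_i are estimated directly from image derivatives with a robust estimator, so that the small set of coefficients per limb becomes the feature vector used for activity recognition.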

Journal ArticleDOI
John Immerkær
TL;DR: The paper presents a fast and simple method for estimating the variance of additive zero mean Gaussian noise in an image that requires only the use of a 3 × 3 mask followed by a summation over the image or a local neighborhood.
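A sketch of this estimator in its commonly cited form (a 3 × 3 Laplacian-difference mask followed by a summation); the constants follow the usual presentation and may differ from the paper in minor details. It assumes NumPy and SciPy:

import numpy as np
from scipy import ndimage

NOISE_MASK = np.array([[ 1, -2,  1],
                       [-2,  4, -2],
                       [ 1, -2,  1]], dtype=float)

def estimate_noise_sigma(image):
    img = image.astype(float)
    h, w = img.shape
    response = ndimage.convolve(img, NOISE_MASK, mode="reflect")
    s = np.abs(response[1:-1, 1:-1]).sum()     # interior pixels, matching (W-2)(H-2)
    return np.sqrt(np.pi / 2.0) * s / (6.0 * (w - 2) * (h - 2))

# Quick check on synthetic data: the estimate should be close to the true sigma.
rng = np.random.default_rng(0)
clean = np.zeros((256, 256))
print(estimate_noise_sigma(clean + rng.normal(scale=10.0, size=clean.shape)))   # ~10

The mask is (approximately) insensitive to image structure that is locally linear, so on real images the sum is dominated by the noise rather than by edges, which is what makes the method usable without any explicit smoothing or segmentation.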

Journal ArticleDOI
TL;DR: A general framework for anisotropic diffusion of multivalued images is presented and the proposed framework is applied to the filtering of color images represented in CIE-L*a*b* space.
Abstract: A general framework for anisotropic diffusion of multivalued images is presented. We propose an evolution equation where, at each point in time, the directions and magnitudes of the maximal and minimal rate of change in the vector-image are first evaluated. These are given by eigenvectors and eigenvalues of the first fundamental form in the given image metric. Then, the image diffuses via a system of coupled differential equations in the direction of minimal change. The diffusion "strength" is controlled by a function that measures the degree of dissimilarity between the eigenvalues. We apply the proposed framework to the filtering of color images represented in CIE-L*a*b* space.
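For reference, the first fundamental form mentioned above is, for a multivalued image I = (I_1, ..., I_m), in standard Di Zenzo-style notation (details may differ from the paper):

g = \begin{pmatrix} g_{11} & g_{12} \\ g_{12} & g_{22} \end{pmatrix},
\quad
g_{11} = \sum_k \Bigl(\frac{\partial I_k}{\partial x}\Bigr)^{2},\;
g_{12} = \sum_k \frac{\partial I_k}{\partial x}\frac{\partial I_k}{\partial y},\;
g_{22} = \sum_k \Bigl(\frac{\partial I_k}{\partial y}\Bigr)^{2},
\qquad
\lambda_{\pm} = \frac{g_{11} + g_{22} \pm \sqrt{(g_{11} - g_{22})^2 + 4 g_{12}^2}}{2}.

The eigenvectors of g give the directions of maximal and minimal vector-valued change, and the diffusion proceeds along the minimal-change direction with a strength that decreases with (\lambda_+ - \lambda_-), so coherent multichannel edges are preserved while nearly isotropic regions are smoothed.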

Journal ArticleDOI
01 Aug 1996
TL;DR: The authors demonstrate a solution to one of the key problems in image watermarking, namely how to hide robust invisible labels inside grey scale or colour digital images.
Abstract: A watermark is an invisible mark placed on an image that is designed to identify both the source of an image as well as its intended recipient. The authors present an overview of watermarking techniques and demonstrate a solution to one of the key problems in image watermarking, namely how to hide robust invisible labels inside grey scale or colour digital images.

Journal ArticleDOI
TL;DR: In this paper, an Occam's inversion algorithm for crosshole resistivity data that uses a finite-element method forward solution is discussed, where the earth is discretized into a series of parameter blocks, each containing one or more elements.
Abstract: An Occam's inversion algorithm for crosshole resistivity data that uses a finite-element method forward solution is discussed. For the inverse algorithm, the earth is discretized into a series of parameter blocks, each containing one or more elements. The Occam's inversion finds the smoothest 2-D model for which the Chi-squared statistic equals an a priori value. Synthetic model data are used to show the effects of noise and noise estimates on the resulting 2-D resistivity images. Resolution of the images decreases with increasing noise. The reconstructions are underdetermined so that at low noise levels the images converge to an asymptotic image, not the true geoelectrical section. If the estimated standard deviation is too low, the algorithm cannot achieve an adequate data fit, the resulting image becomes rough, and irregular artifacts start to appear. When the estimated standard deviation is larger than the correct value, the resolution decreases substantially (the image is too smooth). The same effects are demonstrated for field data from a site near Livermore, California. However, when the correct noise values are known, the Occam's results are independent of the discretization used. A case history of monitoring at an enhanced oil recovery site is used to illustrate problems in comparing successive images over time from a site where the noise level changes. In this case, changes in image resolution can be misinterpreted as actual geoelectrical changes. One solution to this problem is to perform smoothest, but non-Occam's, inversion on later data sets using parameters found from the background data set.

Proceedings ArticleDOI
16 Sep 1996
TL;DR: A watermarking scheme to hide copyright information in an image by filtering a pseudo-noise sequence with a filter that approximates the frequency masking characteristics of the visual system to guarantee that the embedded watermark is invisible and to maximize the robustness of the hidden data.
Abstract: We propose a watermarking scheme to hide copyright information in an image. The scheme employs visual masking to guarantee that the embedded watermark is invisible and to maximize the robustness of the hidden data. The watermark is constructed for arbitrary image blocks by filtering a pseudo-noise sequence (author id) with a filter that approximates the frequency masking characteristics of the visual system. The noise-like watermark is statistically invisible to deter unauthorized removal. Experimental results show that the watermark is robust to several distortions including white and colored noises, JPEG coding at different qualities, and cropping.

Patent
13 May 1996
TL;DR: In this paper, a system for spectroscopic imaging of bodily tissue in which a scintillation screen and a charged coupled device (CCD) are used to accurately image selected tissue.
Abstract: A system for spectroscopic imaging of bodily tissue in which a scintillation screen and a charge-coupled device (CCD) are used to accurately image selected tissue. An x-ray source generates x-rays which pass through a region of a subject's body, forming an x-ray image which reaches the scintillation screen. The scintillation screen reradiates a spatial intensity pattern corresponding to the image, the pattern being detected by a CCD sensor. The image is digitized by the sensor and processed by a controller before being stored as an electronic image. Each image is directed onto an associated respective CCD or amorphous silicon detector to generate individual electronic representations of the separate images.

Journal ArticleDOI
TL;DR: The soft tissue correlation and mutual information measures were found to provide the most robust measures of misregistration, providing results comparable to or better than those from manual point-based registration for all but the most truncated image volumes.
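A sketch of image-to-image mutual information computed from a joint histogram, the similarity measure highlighted above; the implementation details here are generic, not the paper's specific registration pipeline:

import numpy as np

def mutual_information(a, b, bins=32):
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# A registration loop would search over rigid transforms of one image volume
# and keep the pose that maximizes this measure against the other volume.
rng = np.random.default_rng(0)
a = rng.random((128, 128))
print(mutual_information(a, a), mutual_information(a, rng.random((128, 128))))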

Proceedings ArticleDOI
01 Aug 1996
TL;DR: A prototype system for surgical planning and prediction of human facial shape after craniofacial and maxillofacial surgery for patients with facial deformities is described, which combines, unifies, and extends various methods from geometric modeling, finite element analysis, and image processing to render highly realistic 3D images of the post surgical situation.
Abstract: This paper describes a prototype system for surgical planning and prediction of human facial shape after craniofacial and maxillofacial surgery for patients with facial deformities. For this purpose it combines, unifies, and extends various methods from geometric modeling, finite element analysis, and image processing to render highly realistic 3D images of the post-surgical situation. The basic concept of the system is to join advanced geometric modeling and animation systems such as Alias with a special-purpose finite element model of the human face developed under AVS. In contrast to existing facial models we acquire facial surface and soft tissue data both from photogrammetric and CT scans of the individual. After initial data preprocessing, reconstruction, and registration, a finite element model of the facial surface and soft tissue is provided which is based on triangular finite elements. Stiffness parameters of the soft tissue are computed using segmentations of the underlying CT data. All interactive procedures such as bone and soft tissue repositioning are performed under the guidance of the modeling system which feeds the processed geometry into the FEM solver. The resulting shape is generated from minimizing the global energy of the surface under the presence of external forces. Photorealistic pictures are obtained from rendering the facial surface with the advanced animation system on which this prototype is built. Although we do not claim any of the presented algorithms themselves to be new, the synthesis of several methods offers a new facial model quality. Our concept is a significant extension to existing ones and, due to its versatility, can be employed in different applications such as facial animation, facial reconstruction, or the simulation of aging. We illustrate features of our system with some examples from the Visible Human Data Set™. CR Descriptors: I.3.5 [Computational Geometry and Object Modeling]: Physically Based Modeling; I.3.7 [Three-Dimensional Graphics and Realism]; I.4.6 [Segmentation]: Edge and Feature Detection, Pixel Classification; I.6.3 [Applications].

Journal ArticleDOI
TL;DR: An efficient solution is proposed in which the optimum combination of macroblock modes and the associated mode parameters are jointly selected so as to minimize the overall distortion for a given bit-rate budget, and is successfully applied to the emerging H.263 video coding standard.
Abstract: This paper addresses the problem of encoder optimization in a macroblock-based multimode video compression system. An efficient solution is proposed in which, for a given image region, the optimum combination of macroblock modes and the associated mode parameters are jointly selected so as to minimize the overall distortion for a given bit-rate budget. Conditions for optimizing the encoder operation are derived within a rate-constrained product code framework using a Lagrangian formulation. The instantaneous rate of the encoder is controlled by a single Lagrange multiplier that makes the method amenable to mobile wireless networks with time-varying capacity. When rate and distortion dependencies are introduced between adjacent blocks (as is the case when the motion vectors are differentially encoded and/or overlapped block motion compensation is employed), the ensuing encoder complexity is surmounted using dynamic programming. Due to the generic nature of the algorithm, it can be successfully applied to the problem of encoder control in numerous video coding standards, including H.261, MPEG-1, and MPEG-2. Moreover, the strategy is especially relevant for very low bit rate coding over wireless communication channels where the low dimensionality of the images associated with these bit rates makes real-time implementation very feasible. Accordingly, in this paper, the method is successfully applied to the emerging H.263 video coding standard with excellent results at rates as low as 8.0 Kb per second. Direct comparisons with the H.263 test model, TMN5, demonstrate that gains in peak signal-to-noise ratios (PSNR) are achievable over a wide range of rates.
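A minimal sketch of the Lagrangian rate-distortion criterion at the heart of the method: for each macroblock, pick the mode minimizing J = D + λR. This illustrates the selection rule only, not the paper's full treatment of inter-block dependencies via dynamic programming; the mode names and costs below are made up:

def choose_mode(candidates, lam):
    """candidates: list of (mode_name, distortion, rate_bits) tuples."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Hypothetical per-mode costs for one macroblock (SSD distortion, bits).
macroblock_options = [("SKIP", 820.0, 1), ("INTER", 310.0, 46), ("INTRA", 250.0, 110)]
for lam in (5.0, 50.0):        # a larger lambda makes cheaper (lower-rate) modes win
    print(lam, choose_mode(macroblock_options, lam))

Sweeping the single multiplier λ traces out the operational rate-distortion curve, which is what lets one parameter serve as the rate control knob for time-varying channels.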

Patent
Dan S. Bloomberg
06 Sep 1996
TL;DR: In this article, an encoding operation encodes the data unobtrusively in the form of rectangular blocks that have a foreground color and size dimensions proportional to the iconic image so that when placed in the iconic image in horizontal lines, the blocks appear to a viewer to be representative of the text portion of the original image that they replace.
Abstract: Encoded data embedded in an iconic, or reduced size, version of an original text image is decoded and used in a variety of document image management applications to provide input to, or to control the functionality of, an application. The iconic image may be printed in a suitable place (e.g., the margin or other background region) in the original text image so that a text image so annotated will then always carry the embedded data in subsequent copies made from the annotated original. The iconic image may also be used as part of a graphical user interface as a surrogate for the original text image. An encoding operation encodes the data unobtrusively in the form of rectangular blocks that have a foreground color and size dimensions proportional to the iconic image so that when placed in the iconic image in horizontal lines, the blocks appear to a viewer to be representative of the text portion of the original image that they replace. Several embodiments are illustrated, including using the iconic image as a document surrogate for the original text image for data base retrieval operations. The iconic image may also be used in conjunction with the original text image for purposes of authenticating the original document using a digital signature encoded in the iconic image, or for purposes of controlling the authorized distribution of the document. The iconic image may also carry data about the original image that may be used to enhance the performance and accuracy of a subsequent character recognition operation.