
Showing papers in "IEEE Transactions on Image Processing in 2000"


Journal ArticleDOI
TL;DR: An adaptive, data-driven threshold for image denoising via wavelet soft-thresholding, derived in a Bayesian framework with a generalized Gaussian prior on the wavelet coefficients, a model widely used in image processing applications.
Abstract: The first part of this paper proposes an adaptive, data-driven threshold for image denoising via wavelet soft-thresholding. The threshold is derived in a Bayesian framework, and the prior used on the wavelet coefficients is the generalized Gaussian distribution (GGD) widely used in image processing applications. The proposed threshold is simple and closed-form, and it is adaptive to each subband because it depends on data-driven estimates of the parameters. Experimental results show that the proposed method, called BayesShrink, is typically within 5% of the MSE of the best soft-thresholding benchmark with the image assumed known. It also outperforms SureShrink (Donoho and Johnstone 1994, 1995; Donoho 1995) most of the time. The second part of the paper attempts to further validate claims that lossy compression can be used for denoising. The BayesShrink threshold can aid in the parameter selection of a coder designed with the intention of denoising, thereby achieving simultaneous denoising and compression. Specifically, the zero-zone in the quantization step of compression is analogous to the threshold value in the thresholding function. The remaining coder design parameters are chosen based on a criterion derived from Rissanen's minimum description length (MDL) principle. Experiments show that this compression method does indeed remove noise significantly, especially for large noise power. However, it introduces quantization noise and should be used only when bitrate is a concern in addition to denoising.
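
As a rough illustration of the closed-form rule the abstract refers to (subband threshold equal to the noise variance divided by the estimated signal standard deviation), here is a minimal soft-thresholding sketch. It assumes the PyWavelets package and a robust median-based noise estimate from the finest diagonal subband; it is a sketch of the idea, not the authors' exact implementation.

```python
import numpy as np
import pywt  # assumption: PyWavelets is available

def bayes_shrink_denoise(img, wavelet="db8", levels=4):
    """Rough sketch of BayesShrink-style subband soft-thresholding."""
    coeffs = pywt.wavedec2(img, wavelet, level=levels)
    # Robust noise estimate from the finest diagonal subband (HH1).
    hh1 = coeffs[-1][2]
    sigma_n = np.median(np.abs(hh1)) / 0.6745
    out = [coeffs[0]]
    for detail in coeffs[1:]:
        shrunk = []
        for band in detail:
            sigma_y2 = np.mean(band ** 2)                            # noisy-coefficient variance
            sigma_x = np.sqrt(max(sigma_y2 - sigma_n ** 2, 1e-12))   # signal std estimate
            t = sigma_n ** 2 / sigma_x                               # closed-form BayesShrink threshold
            shrunk.append(pywt.threshold(band, t, mode="soft"))
        out.append(tuple(shrunk))
    return pywt.waverec2(out, wavelet)
```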

2,917 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed diamond search (DS) algorithm is better than the four-step search (4SS) and block-based gradient descent search (BBGDS), in terms of mean-square error performance and required number of search points.
Abstract: Based on the study of motion vector distribution from several commonly used test image sequences, a new diamond search (DS) algorithm for fast block-matching motion estimation (BMME) is proposed in this paper. Simulation results demonstrate that the proposed DS algorithm greatly outperforms the well-known three-step search (TSS) algorithm. Compared with the new three-step search (NTSS) algorithm, the DS algorithm achieves close performance but requires less computation by up to 22% on average. Experimental results also show that the DS algorithm is better than the four-step search (4SS) and block-based gradient descent search (BBGDS), in terms of mean-square error performance and required number of search points.
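
For readers unfamiliar with the search patterns involved, the sketch below illustrates the basic large-diamond/small-diamond iteration on a per-block SAD cost (the paper reports error in MSE terms; SAD is used here only to keep the example short). All function and parameter names are illustrative, and the code re-evaluates overlapping points rather than caching them as an optimized implementation would.

```python
import numpy as np

# Large diamond search pattern (centre + 8 points) and small diamond search pattern.
LDSP = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]

def sad(cur, ref, x, y, dx, dy, bs):
    """Sum of absolute differences between a block in `cur` and a displaced block in `ref`."""
    h, w = ref.shape
    if not (0 <= y + dy and y + dy + bs <= h and 0 <= x + dx and x + dx + bs <= w):
        return np.inf
    return np.abs(cur[y:y+bs, x:x+bs].astype(int) -
                  ref[y+dy:y+dy+bs, x+dx:x+dx+bs].astype(int)).sum()

def diamond_search(cur, ref, x, y, bs=16):
    """Motion vector (dx, dy) for the bs-by-bs block with top-left corner (x, y) in `cur`."""
    cx, cy = 0, 0
    while True:
        costs = [sad(cur, ref, x, y, cx + dx, cy + dy, bs) for dx, dy in LDSP]
        best = int(np.argmin(costs))
        if best == 0:          # best point is the centre: switch to the small pattern
            break
        cx, cy = cx + LDSP[best][0], cy + LDSP[best][1]
    costs = [sad(cur, ref, x, y, cx + dx, cy + dy, bs) for dx, dy in SDSP]
    best = int(np.argmin(costs))
    return cx + SDSP[best][0], cy + SDSP[best][1]
```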

1,949 citations


Journal ArticleDOI
TL;DR: A new image compression algorithm is proposed, based on independent embedded block coding with optimized truncation of the embedded bit-streams (EBCOT), which lends itself to optimization with respect to psychovisual metrics capable of modeling the spatially varying visual masking phenomenon.
Abstract: A new image compression algorithm is proposed, based on independent embedded block coding with optimized truncation of the embedded bit-streams (EBCOT). The algorithm exhibits state-of-the-art compression performance while producing a bit-stream with a rich set of features, including resolution and SNR scalability together with a "random access" property. The algorithm has modest complexity and is suitable for applications involving remote browsing of large compressed images. The algorithm lends itself to explicit optimization with respect to MSE as well as more realistic psychovisual metrics, capable of modeling the spatially varying visual masking phenomenon.

1,933 citations


Journal ArticleDOI
TL;DR: LOCO-I is conceived as a low-complexity projection of the universal context modeling paradigm, matching its modeling unit to a simple coding unit; its simple fixed context model approaches the capability of more complex universal techniques for capturing high-order dependencies.
Abstract: LOCO-I (LOw COmplexity LOssless COmpression for Images) is the algorithm at the core of the new ISO/ITU standard for lossless and near-lossless compression of continuous-tone images, JPEG-LS. It is conceived as a "low complexity projection" of the universal context modeling paradigm, matching its modeling unit to a simple coding unit. By combining simplicity with the compression potential of context models, the algorithm "enjoys the best of both worlds." It is based on a simple fixed context model, which approaches the capability of the more complex universal techniques for capturing high-order dependencies. The model is tuned for efficient performance in conjunction with an extended family of Golomb (1966) type codes, which are adaptively chosen, and an embedded alphabet extension for coding of low-entropy image regions. LOCO-I attains compression ratios similar or superior to those obtained with state-of-the-art schemes based on arithmetic coding. Moreover, it is within a few percentage points of the best available compression ratios, at a much lower complexity level. We discuss the principles underlying the design of LOCO-I, and its standardization into JPEG-LS.
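
To give a concrete flavor of the "simple coding unit" idea, here is a hedged sketch of two well-known JPEG-LS ingredients: the median edge detector predictor and a Golomb-Rice code for the mapped prediction residual. LOCO-I's actual context modeling, adaptive parameter selection, and run mode are omitted, and the function names below are illustrative.

```python
def med_predict(a, b, c):
    """JPEG-LS median edge detector: a = left, b = above, c = above-left neighbor."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c

def map_residual(e):
    """Fold signed prediction errors into non-negative integers: 0, -1, 1, -2, 2, ..."""
    return 2 * e if e >= 0 else -2 * e - 1

def rice_encode(value, k):
    """Golomb-Rice code (Golomb parameter 2**k) for a non-negative mapped residual."""
    q, r = value >> k, value & ((1 << k) - 1)
    bits = "1" * q + "0"                 # unary-coded quotient, terminated by a 0
    if k:
        bits += format(r, f"0{k}b")      # k-bit binary remainder
    return bits
```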

1,668 citations


Journal ArticleDOI
TL;DR: A filter-based fingerprint matching algorithm which uses a bank of Gabor filters to capture both local and global details in a fingerprint as a compact fixed length FingerCode and is able to achieve a verification accuracy which is only marginally inferior to the best results of minutiae-based algorithms published in the open literature.
Abstract: Biometrics-based verification, especially fingerprint-based identification, is receiving a lot of attention. There are two major shortcomings of the traditional approaches to fingerprint representation. For a considerable fraction of population, the representations based on explicit detection of complete ridge structures in the fingerprint are difficult to extract automatically. The widely used minutiae-based representation does not utilize a significant component of the rich discriminatory information available in the fingerprints. Local ridge structures cannot be completely characterized by minutiae. Further, minutiae-based matching has difficulty in quickly matching two fingerprint images containing a different number of unregistered minutiae points. The proposed filter-based algorithm uses a bank of Gabor filters to capture both local and global details in a fingerprint as a compact fixed length FingerCode. The fingerprint matching is based on the Euclidean distance between the two corresponding FingerCodes and hence is extremely fast. We are able to achieve a verification accuracy which is only marginally inferior to the best results of minutiae-based algorithms published in the open literature. Our system performs better than a state-of-the-art minutiae-based system when the performance requirement of the application system does not demand a very low false acceptance rate. Finally, we show that the matching performance can be improved by combining the decisions of the matchers based on complementary (minutiae-based and filter-based) fingerprint information.

1,207 citations


Journal ArticleDOI
TL;DR: A scheme for adaptive image-contrast enhancement based on a generalization of histogram equalization (HE), which can produce a range of degrees of contrast enhancement, at one extreme leaving the image unchanged, at another yielding full adaptive equalization.
Abstract: This paper proposes a scheme for adaptive image-contrast enhancement based on a generalization of histogram equalization (HE). HE is a useful technique for improving image contrast, but its effect is too severe for many purposes. However, dramatically different results can be obtained with relatively minor modifications. A concise description of adaptive HE is set out, and this framework is used in a discussion of past suggestions for variations on HE. A key feature of this formalism is a "cumulation function," which is used to generate a grey level mapping from the local histogram. By choosing alternative forms of cumulation function one can achieve a wide variety of effects. A specific form is proposed. Through the variation of one or two parameters, the resulting process can produce a range of degrees of contrast enhancement, at one extreme leaving the image unchanged, at another yielding full adaptive equalization.
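
A minimal sketch of the cumulation-function idea follows, assuming a simple power-law weighting of the histogram as the tunable cumulation function. This particular form is an illustrative assumption, not necessarily the specific function proposed in the paper, and the global (non-windowed) version is shown for brevity.

```python
import numpy as np

def generalized_hist_eq(img, alpha=1.0, nbins=256):
    """Histogram equalization driven by a tunable cumulation function.

    alpha = 1 gives ordinary HE; alpha = 0 gives a near-identity linear mapping,
    so intermediate values produce intermediate degrees of contrast enhancement.
    """
    hist, _ = np.histogram(img, bins=nbins, range=(0, nbins))
    weights = hist.astype(float) ** alpha        # "cumulation function" input
    cdf = np.cumsum(weights)
    cdf /= cdf[-1]
    mapping = np.round((nbins - 1) * cdf).astype(np.uint8)
    return mapping[img.astype(np.uint8)]
```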

1,034 citations


Journal ArticleDOI
TL;DR: A spatially adaptive wavelet thresholding method based on context modeling, a common technique used in image compression to adapt the coder to changing image characteristics, which yields significantly superior image quality and lower MSE than the best uniform thresholding with the original image assumed known.
Abstract: The method of wavelet thresholding for removing noise, or denoising, has been researched extensively due to its effectiveness and simplicity. Much of the literature has focused on developing the best uniform threshold or best basis selection. However, not much has been done to make the threshold values adaptive to the spatially changing statistics of images. Such adaptivity can improve the wavelet thresholding performance because it allows additional local information of the image (such as the identification of smooth or edge regions) to be incorporated into the algorithm. This work proposes a spatially adaptive wavelet thresholding method based on context modeling, a common technique used in image compression to adapt the coder to changing image characteristics. Each wavelet coefficient is modeled as a random variable of a generalized Gaussian distribution with an unknown parameter. Context modeling is used to estimate the parameter for each coefficient, which is then used to adapt the thresholding strategy. This spatially adaptive thresholding is extended to the overcomplete wavelet expansion, which yields better results than the orthogonal transform. Experimental results show that spatially adaptive wavelet thresholding yields significantly superior image quality and lower MSE than the best uniform thresholding with the original image assumed known.
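
As a simplified stand-in for the paper's context-modeling step, the sketch below estimates a per-coefficient signal variance from a local window and applies the corresponding soft threshold. It assumes SciPy and a known noise level, and omits the weighted-context parameter estimation and overcomplete expansion that the paper actually uses.

```python
import numpy as np
from scipy.ndimage import uniform_filter  # assumption: SciPy is available

def spatially_adaptive_soft_threshold(band, sigma_n, win=9):
    """Per-coefficient soft threshold t = sigma_n^2 / sigma_x, with sigma_x
    estimated from a local window (a crude stand-in for context modeling)."""
    local_e2 = uniform_filter(band ** 2, size=win)                 # local E[Y^2]
    sigma_x = np.sqrt(np.maximum(local_e2 - sigma_n ** 2, 1e-12))  # local signal std
    t = sigma_n ** 2 / sigma_x
    return np.sign(band) * np.maximum(np.abs(band) - t, 0.0)
```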

875 citations


Journal ArticleDOI
TL;DR: It is demonstrated how to decouple distortion and additive-noise degradation in a practical image restoration system, and that the nonlinear NQM is a better measure of visual quality than peak signal-to-noise ratio (PSNR) and linear quality measures.
Abstract: We model a degraded image as an original image that has been subject to linear frequency distortion and additive noise injection. Since the psychovisual effects of frequency distortion and noise injection are independent, we decouple these two sources of degradation and measure their effect on the human visual system. We develop a distortion measure (DM) of the effect of frequency distortion, and a noise quality measure (NQM) of the effect of additive noise. The NQM, which is based on Peli's (1990) contrast pyramid, takes into account the following: 1) variation in contrast sensitivity with distance, image dimensions, and spatial frequency; 2) variation in the local luminance mean; 3) contrast interaction between spatial frequencies; 4) contrast masking effects. For additive noise, we demonstrate that the nonlinear NQM is a better measure of visual quality than peak signal-to-noise ratio (PSNR) and linear quality measures. We compute the DM in three steps. First, we find the frequency distortion in the degraded image. Second, we compute the deviation of this frequency distortion from an allpass response of unity gain (no distortion). Finally, we weight the deviation by a model of the frequency response of the human visual system and integrate over the visible frequencies. We demonstrate how to decouple distortion and additive noise degradation in a practical image restoration system.

820 citations


Journal ArticleDOI
TL;DR: PicHunter represents a simple instance of a general Bayesian framework for using relevance feedback to direct a search: with an explicit model of what users would do given the target image they want, it applies Bayes's rule to predict that target from their actions.
Abstract: Presents the theory, design principles, implementation and performance results of PicHunter, a prototype content-based image retrieval (CBIR) system. In addition, this document presents the rationale, design and results of psychophysical experiments that were conducted to address some key issues that arose during PicHunter's development. The PicHunter project makes four primary contributions to research on CBIR. First, PicHunter represents a simple instance of a general Bayesian framework which we describe for using relevance feedback to direct a search. With an explicit model of what users would do, given the target image they want, PicHunter uses Bayes's rule to predict the target they want, given their actions. This is done via a probability distribution over possible image targets, rather than by refining a query. Second, an entropy-minimizing display algorithm is described that attempts to maximize the information obtained from a user at each iteration of the search. Third, PicHunter makes use of hidden annotation rather than a possibly inaccurate/inconsistent annotation structure that the user must learn and make queries in. Finally, PicHunter introduces two experimental paradigms to quantitatively evaluate the performance of the system, and psychophysical experiments are presented that support the theoretical claims.
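
The Bayesian relevance-feedback idea reduces to a posterior update over candidate targets. The sketch below shows that single update step; `predictive_model` stands in for the paper's user model and is a hypothetical callable, not an API from the PicHunter system.

```python
import numpy as np

def update_posterior(prior, user_action, predictive_model):
    """One Bayesian update: P(T = i | action) is proportional to P(action | T = i) * P(T = i).

    `predictive_model(action, i)` plays the role of the user model, returning the
    probability of the observed action if image i were the user's target."""
    likelihood = np.array([predictive_model(user_action, i) for i in range(len(prior))])
    posterior = likelihood * np.asarray(prior, dtype=float)
    return posterior / posterior.sum()
```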

792 citations


Journal ArticleDOI
TL;DR: This work proposes a new method for the intermodal registration of images using a criterion known as mutual information and builds a multiresolution image pyramid around the unifying concept of spline-processing.
Abstract: We propose a new method for the intermodal registration of images using a criterion known as mutual information. Our main contribution is an optimizer that we specifically designed for this criterion. We show that this new optimizer is well adapted to a multiresolution approach because it typically converges in fewer criterion evaluations than other optimizers. We have built a multiresolution image pyramid, along with an interpolation process, an optimizer, and the criterion itself, around the unifying concept of spline-processing. This ensures coherence in the way we model data and yields good performance. We have tested our approach in a variety of experimental conditions and report excellent results. We claim an accuracy of about a hundredth of a pixel under ideal conditions. We are also robust since the accuracy is still about a tenth of a pixel under very noisy conditions. In addition, a blind evaluation of our results compares very favorably to the work of several other researchers.
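
The registration criterion itself is easy to state in code. The sketch below estimates mutual information from a joint histogram of the two images; the paper's actual contributions (the spline-based optimizer, interpolation model, and multiresolution pyramid) are not reproduced here.

```python
import numpy as np

def mutual_information(img_a, img_b, bins=64):
    """Mutual information of two images estimated from their joint histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_ab = joint / joint.sum()                      # joint intensity distribution
    p_a = p_ab.sum(axis=1, keepdims=True)           # marginal of image A
    p_b = p_ab.sum(axis=0, keepdims=True)           # marginal of image B
    nz = p_ab > 0
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))
```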

780 citations


Journal ArticleDOI
TL;DR: A class of fourth-order partial differential equations (PDEs) is proposed to optimize the trade-off between noise removal and edge preservation; speckles are more visible in images processed by the proposed PDEs, because piecewise planar images are less likely to mask speckles.
Abstract: A class of fourth-order partial differential equations (PDEs) is proposed to optimize the trade-off between noise removal and edge preservation. The time evolution of these PDEs seeks to minimize a cost functional which is an increasing function of the absolute value of the Laplacian of the image intensity function. Since the Laplacian of an image at a pixel is zero if the image is planar in its neighborhood, these PDEs attempt to remove noise and preserve edges by approximating an observed image with a piecewise planar image. Piecewise planar images look more natural than the step images which anisotropic diffusion (second-order PDEs) uses to approximate an observed image. So the proposed PDEs are able to avoid the blocky effects widely seen in images processed by anisotropic diffusion, while achieving a degree of noise removal and edge preservation comparable to anisotropic diffusion. Although both approaches seem to be comparable in removing speckles in the observed images, speckles are more visible in images processed by the proposed PDEs, because piecewise planar images are less likely to mask speckles than step images and anisotropic diffusion tends to generate multiple false edges. Speckles can be easily removed by simple algorithms such as the one presented in this paper.
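
A minimal explicit iteration in the spirit of such fourth-order PDEs is sketched below, assuming a Perona-Malik-style diffusivity applied to the Laplacian and periodic boundaries via np.roll; the diffusivity, step size, and boundary handling are illustrative choices, not necessarily those of the paper.

```python
import numpy as np

def laplacian(u):
    """Five-point Laplacian with periodic boundaries (for brevity only)."""
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
            np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)

def fourth_order_step(u, k=10.0, dt=0.1):
    """One explicit step of a fourth-order PDE of the form
    u_t = -Laplacian( c(|Laplacian(u)|) * Laplacian(u) ),  c(s) = 1 / (1 + (s/k)^2).
    dt must be small for the explicit scheme to remain stable."""
    lap = laplacian(u)
    c = 1.0 / (1.0 + (np.abs(lap) / k) ** 2)
    return u - dt * laplacian(c * lap)
```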

Journal ArticleDOI
TL;DR: A new method for unsharp masking for contrast enhancement of images is presented that employs an adaptive filter that controls the contribution of the sharpening path in such a way that contrast enhancement occurs in high detail areas and little or no image sharpening occurs in smooth areas.
Abstract: This paper presents a new method for unsharp masking for contrast enhancement of images. The approach employs an adaptive filter that controls the contribution of the sharpening path in such a way that contrast enhancement occurs in high detail areas and little or no image sharpening occurs in smooth areas.
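
A hedged sketch of the adaptive-gain idea: sharpen strongly where local variance is high, weakly where it is low. The two-level gain and SciPy filters below are stand-ins for the paper's adaptive filter, chosen only to make the control-by-local-detail mechanism concrete.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter  # assumption: SciPy is available

def adaptive_unsharp_mask(img, low=0.2, high=2.0, var_thresh=50.0):
    """Unsharp masking with a gain that grows with local activity, so flat areas
    receive little sharpening and detailed areas receive more."""
    img = img.astype(float)
    highpass = img - gaussian_filter(img, sigma=1.5)                 # sharpening path
    local_var = uniform_filter(img ** 2, 7) - uniform_filter(img, 7) ** 2
    gain = np.where(local_var > var_thresh, high, low)               # simple two-level control
    return np.clip(img + gain * highpass, 0, 255)
```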

Journal ArticleDOI
TL;DR: A spread-spectrum-like discrete cosine transform (DCT) domain watermarking technique for copyright protection of still digital images is analyzed, and analytical expressions for performance measures are derived and contrasted with experimental results.
Abstract: A spread-spectrum-like discrete cosine transform (DCT) domain watermarking technique for copyright protection of still digital images is analyzed. The DCT is applied in blocks of 8×8 pixels, as in the JPEG algorithm. The watermark can encode information to track illegal misuses. For flexibility purposes, the original image is not necessary during the ownership verification process, so it must be modeled by noise. Two tests are involved in the ownership verification stage: watermark decoding, in which the message carried by the watermark is extracted, and watermark detection, which decides whether a given image contains a watermark generated with a certain key. We apply generalized Gaussian distributions to statistically model the DCT coefficients of the original image and show how the resulting detector structures lead to considerable improvements in performance with respect to the correlation receiver, which has been widely considered in the literature and makes use of the Gaussian noise assumption. As a result of our work, analytical expressions for performance measures, such as the probability of errors in watermark decoding and the probabilities of false alarms and of detection in watermark detection, are derived and contrasted with experimental results.
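
The detection side can be illustrated with the generic log-likelihood-ratio statistic for an additive watermark in generalized-Gaussian host coefficients (log f(y - w) - log f(y) summed over coefficients). The sketch below assumes known, constant scale and shape parameters; the paper's per-coefficient modeling, perceptual amplitude shaping, and threshold selection for a target false-alarm rate are not shown.

```python
import numpy as np

def ggd_detection_statistic(y, w, beta, c):
    """Log-likelihood-ratio statistic for detecting an additive watermark w in DCT
    coefficients y when the host is modeled as generalized Gaussian with scale beta
    and shape c: sum((|y|^c - |y - w|^c) / beta^c). The statistic is compared against
    a threshold chosen for the desired false-alarm probability."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    return float(np.sum((np.abs(y) ** c - np.abs(y - w) ** c) / beta ** c))
```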

Journal ArticleDOI
TL;DR: This work offers a rigorous mathematical analysis using a doubly stochastic model of the images, which not only provides the theoretical explanations necessary, but also leads to insights about various other observations from the literature.
Abstract: Over the past two decades, there have been various studies on the distributions of the DCT coefficients for images. However, they have concentrated only on fitting the empirical data from some standard pictures with a variety of well-known statistical distributions, and then comparing their goodness of fit. The Laplacian distribution is the dominant choice balancing simplicity of the model and fidelity to the empirical data. Yet, to the best of our knowledge, there has been no mathematical justification as to what gives rise to this distribution. We offer a rigorous mathematical analysis using a doubly stochastic model of the images, which not only provides the theoretical explanations necessary, but also leads to insights about various other observations from the literature. This model also allows us to investigate how certain changes in the image statistics could affect the DCT coefficient distributions.
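
The mechanism behind the Laplacian marginal can be checked numerically: coefficients that are conditionally Gaussian with an exponentially distributed variance have a Laplacian marginal (excess kurtosis about 3, versus 0 for a Gaussian). The quick simulation below, with an arbitrary scale parameter, illustrates that point; it is not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Doubly stochastic model: per-block variance drawn from an exponential law,
# coefficient conditionally Gaussian given that variance.
variance = rng.exponential(scale=2.0, size=n)
coeff = rng.normal(0.0, np.sqrt(variance))

# The resulting marginal has the heavy-tailed, peaked shape of a Laplacian:
# its excess kurtosis comes out close to 3 (Laplacian) rather than 0 (Gaussian).
kurtosis = np.mean(coeff ** 4) / np.mean(coeff ** 2) ** 2 - 3.0
print(f"excess kurtosis ~= {kurtosis:.2f}")
```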

Journal ArticleDOI
TL;DR: This work presents algorithms for detecting and tracking text in digital video; the system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks.
Abstract: Text that appears in a scene or is graphically added to video can provide an important supplemental source of index information as well as clues for decoding the video's structure and for classification. In this work, we present algorithms for detecting and tracking text in digital video. Our system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks. Our text tracking scheme consists of two modules: a sum of squared difference (SSD) based module to find the initial position and a contour-based module to refine the position. Experiments conducted with a variety of video sources show that our scheme can detect and track text robustly.

Journal ArticleDOI
TL;DR: The paper shows that binary partition trees can be used for a large number of processing goals such as filtering, segmentation, information retrieval and visual browsing, and that the number of bits necessary to encode a binary partition tree remains moderate.
Abstract: This paper discusses the interest of binary partition trees as a region-oriented image representation. Binary partition trees concentrate in a compact and structured representation a set of meaningful regions that can be extracted from an image. They offer a multiscale representation of the image and define a translation invariant 2-connectivity rule among regions. As shown in this paper, this representation can be used for a large number of processing goals such as filtering, segmentation, information retrieval and visual browsing. Furthermore, the processing of the tree representation leads to very efficient algorithms. Finally, for some applications, it may be interesting to compute the binary partition tree once and to store it for subsequent use for various applications. In this context, the paper shows that the amount of bits necessary to encode a binary partition tree remains moderate.

Journal ArticleDOI
TL;DR: A new algorithm based on polar maps is detailed for the accurate and efficient recovery of the template in an image which has undergone a general affine transformation and results are presented which demonstrate the robustness of the method against some common image processing operations.
Abstract: Digital watermarks have been proposed as a method for discouraging illicit copying and distribution of copyrighted material. This paper describes a method for the secure and robust copyright protection of digital images. We present an approach for embedding a digital watermark into an image using the Fourier transform. To this watermark is added a template in the Fourier transform domain to render the method robust against general linear transformations. We detail a new algorithm based on polar maps for the accurate and efficient recovery of the template in an image which has undergone a general affine transformation. We also present results which demonstrate the robustness of the method against some common image processing operations such as compression, rotation, scaling, and aspect ratio changes.

Journal ArticleDOI
TL;DR: Conditions for the existence of solutions in the space of diffeomorphisms are established, with a gradient algorithm provided for generating the optimal flow solving the minimum problem.
Abstract: This paper describes the generation of large deformation diffeomorphisms $\phi:\Omega=[0,1]^3 \rightleftarrows \Omega$ for landmark matching, generated as solutions to the transport equation $d\phi(x,t)/dt=\nu(\phi(x,t),t)$, $t\in[0,1]$, with $\phi(x,0)=x$, the image map defined as $\phi(\cdot,1)$ and therefore controlled via the velocity field $\nu(\cdot,t)$, $t\in[0,1]$. Imagery are assumed characterized via sets of landmarks $\{x_n, y_n,\ n=1,2,\ldots,N\}$. The optimal diffeomorphic match is constructed to minimize a running smoothness cost $\|L\nu\|^2$ associated with a linear differential operator $L$ on the velocity field generating the diffeomorphism, while simultaneously minimizing the matching end-point condition of the landmarks. Both inexact and exact landmark matching is studied here. Given noisy landmarks $x_n$ matched to $y_n$ measured with error covariances $\Sigma_n$, the matching problem is solved by generating the optimal diffeomorphism $\hat\phi(x,1)=\int_0^1 \hat\nu(\hat\phi(x,t),t)\,dt+x$, where $\hat\nu(\cdot)=\arg\min_{\nu(\cdot)} \int_0^1\!\int_\Omega \|L\nu(x,t)\|^2\,dx\,dt+\sum_{n=1}^{N}[y_n-\phi(x_n,1)]^T\Sigma_n^{-1}[y_n-\phi(x_n,1)]$. Conditions for the existence of solutions in the space of diffeomorphisms are established, with a gradient algorithm provided for generating the optimal flow solving the minimum problem. Results on matching two-dimensional (2-D) and three-dimensional (3-D) imagery are presented in the macaque monkey.

Journal ArticleDOI
TL;DR: It is concluded that object retrieval based on composite color and shape invariant features provides excellent retrieval accuracy and the image retrieval scheme is highly robust to partial occlusion, object clutter and a change in the object's pose.
Abstract: We aim at combining color and shape invariants for indexing and retrieving images. To this end, color models are proposed independent of the object geometry, object pose, and illumination. From these color models, color invariant edges are derived from which shape invariant features are computed. Computational methods are described to combine the color and shape invariants into a unified high-dimensional invariant feature set for discriminatory object retrieval. Experiments have been conducted on a database consisting of 500 images taken from multicolored man-made objects in real world scenes. From the theoretical and experimental results it is concluded that object retrieval based on composite color and shape invariant features provides excellent retrieval accuracy. Object retrieval based on color invariants provides very high retrieval accuracy whereas object retrieval based entirely on shape invariants yields poor discriminative power. Furthermore, the image retrieval scheme is highly robust to partial occlusion, object clutter and a change in the object's pose. Finally, the image retrieval scheme is integrated into the PicToSeek system on-line at http://www.wins.uva.nl/research/isis/PicToSeek/ for searching images on the World Wide Web.


Journal ArticleDOI
TL;DR: It is shown that oblivious watermarking techniques that embed information into a host image in a block-wise independent fashion are vulnerable to a counterfeiting attack.
Abstract: We describe a class of attacks on certain block-based oblivious watermarking schemes. We show that oblivious watermarking techniques that embed information into a host image in a block-wise independent fashion are vulnerable to a counterfeiting attack. Specifically, given a watermarked image, one can forge the watermark it contains into another image without knowing the secret key used for watermark insertion and in some cases even without explicitly knowing the watermark. We demonstrate successful implementations of this attack on a few watermarking techniques that have been proposed in the literature. We also describe a possible solution to this problem of block-wise independence that makes our attack computationally intractable.

Journal ArticleDOI
TL;DR: A new, sequential algorithm is presented, which is faster in typical applications and is especially advantageous for image sequences: the KL basis calculation is done with much lower delay and allows for dynamic updating of image databases.
Abstract: The Karhunen-Loeve (KL) transform is an optimal method for approximating a set of vectors or images, which was used in image processing and computer vision for several tasks such as face and object recognition. Its computational demands and its batch calculation nature have limited its application. Here we present a new, sequential algorithm for calculating the KL basis, which is faster in typical applications and is especially advantageous for image sequences: the KL basis calculation is done with much lower delay and allows for dynamic updating of image databases. Systematic tests of the implemented algorithm show that these advantages are indeed obtained with the same accuracy available from batch KL algorithms.

Journal ArticleDOI
TL;DR: At low bit rates, reversible integer-to-integer and conventional versions of transforms were found to often yield results of comparable quality, with the best choice for a given application depending on the relative importance of lossy compression performance, lossless compression performance, and computational complexity.
Abstract: In the context of image coding, a number of reversible integer-to-integer wavelet transforms are compared on the basis of their lossy compression performance, lossless compression performance, and computational complexity. Of the transforms considered, several were found to perform particularly well, with the best choice for a given application depending on the relative importance of the preceding criteria. Reversible integer-to-integer versions of numerous transforms are also compared to their conventional (i.e., nonreversible real-to-real) counterparts for lossy compression. At low bit rates, reversible integer-to-integer and conventional versions of transforms were found to often yield results of comparable quality. Factors affecting the compression performance of reversible integer-to-integer wavelet transforms are also presented, supported by both experimental data and theoretical arguments.
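
As one concrete example of a reversible integer-to-integer transform, here is a sketch of the 5/3 lifting scheme used for lossless coding in JPEG 2000, with periodic boundary handling via np.roll for brevity (standards use symmetric extension). It is meant only to show why integer lifting is exactly invertible.

```python
import numpy as np

def lift_53_forward(x):
    """One level of the reversible integer-to-integer 5/3 wavelet transform
    (lifting form), on a 1-D integer signal of even length."""
    x = np.asarray(x, dtype=np.int64)
    s, d = x[0::2].copy(), x[1::2].copy()
    d -= (s + np.roll(s, -1)) // 2           # predict: detail = odd - floor((left + right) / 2)
    s += (np.roll(d, 1) + d + 2) // 4        # update: smooth = even + floor((d_left + d_right + 2) / 4)
    return s, d

def lift_53_inverse(s, d):
    """Exact inverse: undo the lifting steps in reverse order."""
    s = s - (np.roll(d, 1) + d + 2) // 4
    d = d + (s + np.roll(s, -1)) // 2
    x = np.empty(s.size + d.size, dtype=np.int64)
    x[0::2], x[1::2] = s, d
    return x
```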

Journal ArticleDOI
TL;DR: From the experimental results, it is concluded that global motion estimation provides significant performance gains for video material with camera zoom and/or pan and that the robust error criterion can introduce additional performance gains without increasing computational complexity.
Abstract: In this paper, we propose an efficient, robust, and fast method for the estimation of global motion from image sequences. The method is generic in that it can accommodate various global motion models, from a simple translation to an eight-parameter perspective model. The algorithm is hierarchical and consists of three stages. In the first stage, a low-pass image pyramid is built. Then, an initial translation is estimated with full-pixel precision at the top of the pyramid using a modified n-step search matching. In the third stage, a gradient descent is executed at each level of the pyramid starting from the initial translation at the coarsest level. Due to the coarse initial estimation and the hierarchical implementation, the method is very fast. To increase robustness to outliers, we replace the usual formulation based on a quadratic error criterion with a truncated quadratic function. We have applied the algorithm to various test sequences within an MPEG-4 coding system. From the experimental results we conclude that global motion estimation provides significant performance gains for video material with camera zoom and/or pan. The gains result from a reduced prediction error and a more compact representation of motion. We also conclude that the robust error criterion can introduce additional performance gains without increasing computational complexity.
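
The robust criterion is simple to state: replace the squared residual by a truncated quadratic so that outliers (e.g. foreground objects that do not follow the global motion) saturate instead of dominating the fit. A minimal sketch, with the truncation point t as a free parameter:

```python
import numpy as np

def truncated_quadratic_error(residuals, t):
    """Robust error used in place of the plain quadratic: residuals larger than t
    contribute a constant t**2, so outliers stop dominating the estimate."""
    r2 = np.minimum(np.asarray(residuals, dtype=float) ** 2, t ** 2)
    return float(r2.sum())
```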

Journal ArticleDOI
TL;DR: A new, contrast-independent representation of an image is set out: the image is decomposed into a tree of "shapes" based on connected components of level sets, which provides a full and nonredundant representation of the image.
Abstract: This paper sets out a new representation of an image which is contrast independent. The image is decomposed into a tree of "shapes" based on connected components of level sets, which provides a full and nonredundant representation of the image. A fast algorithm to compute the tree, the fast level lines transform (FLLT), is explained in detail. Some simple and direct applications of this representation are shown.

Journal ArticleDOI
TL;DR: A line-based approach for the implementation of the wavelet transform is introduced, which yields the same results as a "normal" implementation but, unlike prior work, addresses the memory issues arising from the need to synchronize encoder and decoder.
Abstract: This paper addresses the problem of low memory wavelet image compression. While wavelet or subband coding of images has been shown to be superior to more traditional transform coding techniques, little attention has been paid until recently to the important issue of whether both the wavelet transforms and the subsequent coding can be implemented in low memory without significant loss in performance. We present a complete system to perform low memory wavelet image coding. Our approach is "line-based" in that the images are read line by line and only the minimum required number of lines is kept in memory. There are two main contributions of our work. First, we introduce a line-based approach for the implementation of the wavelet transform, which yields the same results as a "normal" implementation, but where, unlike prior work, we address memory issues arising from the need to synchronize encoder and decoder. Second, we propose a novel context-based encoder which requires no global information and stores only a local set of wavelet coefficients. This low memory coder achieves performance comparable to state of the art coders at a fraction of their memory utilization.

Journal ArticleDOI
TL;DR: A novel boundary detection scheme based on "edge flow" that facilitates integration of color and texture into a single framework for boundary detection and demonstrates the usefulness of this method to content based image retrieval.
Abstract: A novel boundary detection scheme based on "edge flow" is proposed in this paper. This scheme utilizes a predictive coding model to identify the direction of change in color and texture at each image location at a given scale, and constructs an edge flow vector. By propagating the edge flow vectors, the boundaries can be detected at image locations which encounter two opposite directions of flow in the stable state. A user defined image scale is the only significant control parameter that is needed by the algorithm. The scheme facilitates integration of color and texture into a single framework for boundary detection. Segmentation results on a large and diverse collections of natural images are provided, demonstrating the usefulness of this method to content based image retrieval.

Journal ArticleDOI
TL;DR: A novel formulation for B-spline snakes is presented that can be used as a tool for fast and intuitive contour outlining; the intrinsic scale of the spline model is adjusted a priori, reducing the number of parameters to be optimized and eliminating the need for internal energies.
Abstract: We present a novel formulation for B-spline snakes that can be used as a tool for fast and intuitive contour outlining. We start with a theoretical argument in favor of splines in the traditional formulation by showing that the optimal, curvature-constrained snake is a cubic spline, irrespective of the form of the external energy field. Unfortunately, such regularized snakes suffer from slow convergence speed because of a large number of control points, as well as from difficulties in determining the weight factors associated to the internal energies of the curve. We therefore propose an alternative formulation in which the intrinsic scale of the spline model is adjusted a priori; this leads to a reduction of the number of parameters to be optimized and eliminates the need for internal energies (i.e., the regularization term). In other words, we are now controlling the elasticity of the spline implicitly and rather intuitively by varying the spacing between the spline knots. The theory is embedded into a multiresolution formulation demonstrating improved stability in noisy image environments. Validation results are presented, comparing the traditional snake using internal energies and the proposed approach without internal energies, showing the similar performance of the latter. Several biomedical examples of applications are included to illustrate the versatility of the method.

Journal ArticleDOI
TL;DR: A technique for enhancing the perceptual sharpness of an image is described; the simplicity of the computations and ease of implementation allow for real-time applications such as high-definition television (HDTV), and results are presented depicting the power-spectra augmentation and the visual enhancement of several images.
Abstract: A technique for enhancing the perceptual sharpness of an image is described. The enhancement algorithm augments the frequency content of the image using shape-invariant properties of edges across scale by using a nonlinearity that generates phase coherent higher harmonics. The procedure utilizes the Laplacian transform and the Laplacian pyramid image representation. Results are presented depicting the power-spectra augmentation and the visual enhancement of several images. Simplicity of computations and ease of implementation allow for real-time applications such as high-definition television (HDTV).

Journal ArticleDOI
TL;DR: In this article, a hierarchical approach to color image segmentation is studied, in which uniform regions are identified via multilevel thresholding on a homogeneity histogram, taking both local and global information into consideration.
Abstract: In this paper, a novel hierarchical approach to color image segmentation is studied. We extend the general idea of a histogram to the homogeneity domain. In the first phase of the segmentation, uniform regions are identified via multilevel thresholding on a homogeneity histogram. While we process the homogeneity histogram, both local and global information is taken into consideration. This is particularly helpful in taking care of small objects and local variation of color images. An efficient peak-finding algorithm is employed to identify the most significant peaks of the histogram. In the second phase, we perform histogram analysis on the color feature hue for each uniform region obtained in the first phase. We successfully remove about 99.7% of the singularities from the original images by redefining the hue values for the unstable points according to the local information. After the hierarchical segmentation is performed, a region merging process is employed to avoid over-segmentation. CIE(L*a*b*) color space is used to measure the color difference. Experimental results have demonstrated the effectiveness and superiority of the proposed method after an extensive set of color images was tested.