
Showing papers by "Alan C. Bovik published in 2000"


Book
01 Jan 2000
TL;DR: The Handbook of Image and Video Processing contains a comprehensive and highly accessible presentation of all essential mathematics, techniques, and algorithms for every type of image and video processing used by scientists and engineers.
Abstract: Contents:

1.0 INTRODUCTION
  1.1 Introduction to Image and Video Processing (Bovik)

2.0 BASIC IMAGE PROCESSING TECHNIQUES
  2.1 Basic Gray-Level Image Processing (Bovik)
  2.2 Basic Binary Image Processing (Desai/Bovik)
  2.3 Basic Image Fourier Analysis and Convolution (Bovik)

3.0 IMAGE AND VIDEO PROCESSING
  Image and Video Enhancement and Restoration
  3.1 Basic Linear Filtering for Image Enhancement (Acton/Bovik)
  3.2 Nonlinear Filtering for Image Enhancement (Arce)
  3.3 Morphological Filtering for Image Enhancement and Detection (Maragos)
  3.4 Wavelet Denoising for Image Enhancement (Wei)
  3.5 Basic Methods for Image Restoration and Identification (Biemond)
  3.6 Regularization for Image Restoration and Reconstruction (Karl)
  3.7 Multi-Channel Image Recovery (Galatsanos)
  3.8 Multi-Frame Image Restoration (Schulz)
  3.9 Iterative Image Restoration (Katsaggelos)
  3.10 Motion Detection and Estimation (Konrad)
  3.11 Video Enhancement and Restoration (Lagendijk)
  Reconstruction from Multiple Images
  3.12 3-D Shape Reconstruction from Multiple Views (Aggarwal)
  3.13 Image Stabilization and Mosaicking (Chellappa)

4.0 IMAGE AND VIDEO ANALYSIS
  Image Representations and Image Models
  4.1 Computational Models of Early Human Vision (Cormack)
  4.2 Multiscale Image Decomposition and Wavelets (Moulin)
  4.3 Random Field Models (Zhang)
  4.4 Modulation Models (Havlicek)
  4.5 Image Noise Models (Boncelet)
  4.6 Color and Multispectral Representations (Trussell)
  Image and Video Classification and Segmentation
  4.7 Statistical Methods (Lakshmanan)
  4.8 Multi-Band Techniques for Texture Classification and Segmentation (Manjunath)
  4.9 Video Segmentation (Tekalp)
  4.10 Adaptive and Neural Methods for Image Segmentation (Ghosh)
  Edge and Boundary Detection in Images
  4.11 Gradient and Laplacian-Type Edge Detectors (Rodriguez)
  4.12 Diffusion-Based Edge Detectors (Acton)
  Algorithms for Image Processing
  4.13 Software for Image and Video Processing (Evans)

5.0 IMAGE COMPRESSION
  5.1 Lossless Coding (Karam)
  5.2 Block Truncation Coding (Delp)
  5.3 Vector Quantization (Smith)
  5.4 Wavelet Image Compression (Ramchandran)
  5.5 The JPEG Lossy Standard (Ansari)
  5.6 The JPEG Lossless Standard (Memon)
  5.7 Multispectral Image Coding (Bouman)

6.0 VIDEO COMPRESSION
  6.1 Basic Concepts and Techniques of Video Coding (Barnett/Bovik)
  6.2 Spatiotemporal Subband/Wavelet Video Compression (Woods)
  6.3 Object-Based Video Coding (Kunt)
  6.4 MPEG-I and MPEG-II Video Standards (Ming-Ting Sun)
  6.5 Emerging MPEG Standards: MPEG-IV and MPEG-VII (Kossentini)

7.0 IMAGE AND VIDEO ACQUISITION
  7.1 Image Scanning, Sampling, and Interpolation (Allebach)
  7.2 Video Sampling and Interpolation (Dubois)

8.0 IMAGE AND VIDEO RENDERING AND ASSESSMENT
  8.1 Image Quantization, Halftoning, and Printing (Wong)
  8.2 Perceptual Criteria for Image Quality Evaluation (Pappas)

9.0 IMAGE AND VIDEO STORAGE, RETRIEVAL AND COMMUNICATION
  9.1 Image and Video Indexing and Retrieval (Tsuhan Chen)
  9.2 A Unified Framework for Video Browsing and Retrieval (Huang)
  9.3 Image and Video Communication Networks (Schonfeld)
  9.4 Image Watermarking (Pitas)

10.0 APPLICATIONS OF IMAGE PROCESSING
  10.1 Synthetic Aperture Radar Imaging (Goodman/Carrera)
  10.2 Computed Tomography (Leahy)
  10.3 Cardiac Imaging (Higgins)
  10.4 Computer-Aided Detection for Screening Mammography (Bowyer)
  10.5 Fingerprint Classification and Matching (Jain)
  10.6 Probabilistic Models for Face Recognition (Pentland/Moghaddam)
  10.7 Confocal Microscopy (Merchant/Bartels)
  10.8 Automatic Target Recognition (Miller)

Index

1,678 citations


Journal ArticleDOI
TL;DR: It is demonstrated how to decouple distortion and additive noise degradation in a practical image restoration system, and the nonlinear NQM is shown to be a better measure of visual quality than peak signal-to-noise ratio (PSNR) and linear quality measures.
Abstract: We model a degraded image as an original image that has been subject to linear frequency distortion and additive noise injection. Since the psychovisual effects of frequency distortion and noise injection are independent, we decouple these two sources of degradation and measure their effect on the human visual system. We develop a distortion measure (DM) of the effect of frequency distortion, and a noise quality measure (NQM) of the effect of additive noise. The NQM, which is based on Peli's (1990) contrast pyramid, takes into account the following: 1) variation in contrast sensitivity with distance, image dimensions, and spatial frequency; 2) variation in the local luminance mean; 3) contrast interaction between spatial frequencies; 4) contrast masking effects. For additive noise, we demonstrate that the nonlinear NQM is a better measure of visual quality than peak signal-to-noise ratio (PSNR) and linear quality measures. We compute the DM in three steps. First, we find the frequency distortion in the degraded image. Second, we compute the deviation of this frequency distortion from an allpass response of unity gain (no distortion). Finally, we weight the deviation by a model of the frequency response of the human visual system and integrate over the visible frequencies. We demonstrate how to decouple distortion and additive noise degradation in a practical image restoration system.
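The three-step DM computation lends itself to a short sketch. The following is an illustrative reconstruction under assumptions of our own (FFT-based estimation of the frequency distortion, and a crude exponential band-pass stand-in for the human-visual-system weighting), not the paper's exact measure:

```python
import numpy as np

def distortion_measure(original, degraded):
    # Illustrative sketch of the three-step DM described above; the
    # contrast-sensitivity weighting here is a toy model, an assumption.
    O = np.fft.fft2(original)
    D = np.fft.fft2(degraded)
    # Step 1: estimate the frequency distortion in the degraded image.
    H = np.abs(D) / (np.abs(O) + 1e-8)
    # Step 2: deviation from an allpass response of unity gain.
    dev = np.abs(H - 1.0)
    # Step 3: weight by a crude band-pass CSF stand-in and integrate.
    fy = np.fft.fftfreq(original.shape[0])[:, None]
    fx = np.fft.fftfreq(original.shape[1])[None, :]
    f = np.sqrt(fx**2 + fy**2)
    csf = f * np.exp(-4.0 * f)
    return float(np.sum(csf * dev) / (np.sum(csf) + 1e-8))
```

An undistorted image scores near zero; any spectral shaping (e.g. blurring) raises the score.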

820 citations


Proceedings ArticleDOI
10 Sep 2000
TL;DR: A new approach that can blindly measure blocking artifacts in images without reference to the originals is proposed, which has the flexibility to integrate human visual system features such as the luminance and the texture masking effects.
Abstract: The objective measurement of blocking artifacts plays an important role in the design, optimization, and assessment of image and video coding systems. We propose a new approach that can blindly measure blocking artifacts in images without reference to the originals. The key idea is to model the blocky image as a non-blocky image interfered with a pure blocky signal. The task of the blocking effect measurement algorithm is then to detect and evaluate the power of the blocky signal. The proposed approach has the flexibility to integrate human visual system features such as the luminance and the texture masking effects.
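As a rough illustration of blind (no-reference) blockiness measurement, and not the paper's algorithm (which models the blocky image as a non-blocky image plus a pure blocky signal and measures that signal's power), one can compare gradient energy at assumed 8×8 block boundaries against the interior:

```python
import numpy as np

def blockiness(img, block=8):
    # Hypothetical no-reference blockiness score: ratio of the mean
    # horizontal gradient magnitude at block boundaries to the mean
    # elsewhere. A score near 1 means no visible block structure.
    d = np.abs(np.diff(img.astype(float), axis=1))
    cols = np.arange(d.shape[1])
    at_boundary = (cols + 1) % block == 0
    boundary = d[:, at_boundary].mean()
    interior = d[:, ~at_boundary].mean()
    return boundary / (interior + 1e-8)
```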

473 citations


Proceedings ArticleDOI
02 Apr 2000
TL;DR: A one-pass pixel-parallel low-complexity method for detecting phase discontinuities based on a supervised feedforward multilayer perceptron neural network that detects the correct unwrapping locations where some conventional methods fail.
Abstract: Imaging systems that construct an image from phase information in received signals include synthetic aperture radar (SAR) and optical Doppler tomography (ODT) systems. A fundamental problem in the image formation is phase ambiguity, i.e., it is impossible to distinguish between phases that differ by 2π. Phase unwrapping in two dimensions essentially consists of detecting the pixel locations of the phase discontinuities, finding an ordering among the pixel locations for unwrapping the phase, and adding offsets of multiples of 2π. In this paper, we propose a new method for detecting phase discontinuities. The method is based on a supervised feedforward multilayer perceptron neural network. We train and test the neural network on simulated phase images formed in an ODT system. For the ODT phase images, the new method detects the correct unwrapping locations where some conventional methods fail. The key contribution of the paper is a one-pass pixel-parallel low-complexity method for detecting phase discontinuities.
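The 2π ambiguity itself is easy to demonstrate in one dimension with the textbook unwrapping step, which adds offsets of multiples of 2π. The hard part, detecting the discontinuity locations in two dimensions (the paper's neural-network contribution), is not reproduced here:

```python
import numpy as np

def unwrap_1d(phase):
    # Textbook 1-D phase unwrapping (equivalent to np.unwrap): fold each
    # sample-to-sample jump back into (-pi, pi] by adding a multiple of 2*pi.
    out = np.array(phase, dtype=float)
    for i in range(1, len(out)):
        d = out[i] - out[i - 1]
        out[i] -= 2 * np.pi * np.round(d / (2 * np.pi))
    return out
```

Valid as long as the true phase changes by less than π between samples.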

142 citations


Journal ArticleDOI
TL;DR: This paper linearizes error diffusion algorithms by modeling the quantizer as a linear gain plus additive noise, and quantifies the two primary effects of error diffusion: edge sharpening and noise shaping.
Abstract: Digital halftoning quantizes a graylevel image to one bit per pixel. Halftoning by error diffusion reduces local quantization error by filtering the quantization error in a feedback loop. In this paper, we linearize error diffusion algorithms by modeling the quantizer as a linear gain plus additive noise. We confirm the accuracy of the linear model in three independent ways. Using the linear model, we quantify the two primary effects of error diffusion: edge sharpening and noise shaping. For each effect, we develop an objective measure of its impact on the subjective quality of the halftone. Edge sharpening is proportional to the linear gain, and we give a formula to estimate the gain from a given error filter. In quantifying the noise, we modify the input image to compensate for the sharpening distortion and apply a perceptually weighted signal-to-noise ratio to the residual of the halftone and modified input image. We compute the correlation between the residual and the original image to show when the residual can be considered signal independent. We also compute a tonality measure similar to total harmonic distortion. We use the proposed measures for edge sharpening, noise shaping, and tonality to evaluate the quality of error diffusion algorithms.
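For readers unfamiliar with error diffusion, a minimal Floyd-Steinberg halftoner, the classic instance of the algorithms this analysis applies to, looks like the following (this is background material, not a reimplementation of the paper's measures):

```python
import numpy as np

def floyd_steinberg(img):
    # Floyd-Steinberg error diffusion on a grayscale image in [0, 1]:
    # quantize each pixel to 0 or 1, then feed the quantization error
    # forward to unprocessed neighbors with the classic 7/16, 3/16,
    # 5/16, 1/16 weights.
    f = img.astype(float).copy()
    h, w = f.shape
    out = np.zeros_like(f)
    for y in range(h):
        for x in range(w):
            out[y, x] = 1.0 if f[y, x] >= 0.5 else 0.0
            err = f[y, x] - out[y, x]          # quantization error
            if x + 1 < w:
                f[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    f[y + 1, x - 1] += err * 3 / 16
                f[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    f[y + 1, x + 1] += err * 1 / 16
    return out
```

The feedback loop preserves local average intensity: a flat 40%-gray patch halftones to roughly 40% white pixels.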

117 citations


Journal ArticleDOI
TL;DR: The inverse halftoning algorithm is based on anisotropic diffusion and uses the new multiscale gradient estimator to vary the tradeoff between spatial resolution and grayscale resolution at each pixel to obtain a sharp image with a low perceived noise level.
Abstract: Halftones and other binary images are difficult to process without causing severe degradation. Degradation is greatly reduced if the halftone is inverse halftoned (converted to grayscale) before scaling, sharpening, rotating, or other processing. For error diffused halftones, we present (1) a fast inverse halftoning algorithm and (2) a new multiscale gradient estimator. The inverse halftoning algorithm is based on anisotropic diffusion. It uses the new multiscale gradient estimator to vary the tradeoff between spatial resolution and grayscale resolution at each pixel to obtain a sharp image with a low perceived noise level. Because the algorithm requires fewer than 300 arithmetic operations per pixel and processes 7×7 neighborhoods of halftone pixels, it is well suited for implementation in VLSI and embedded software. We compare the implementation cost, peak signal-to-noise ratio, and visual quality with other inverse halftoning algorithms.
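A minimal Perona-Malik anisotropic diffusion step, the building block the algorithm is based on, can be sketched as follows. The paper's multiscale gradient estimator, which adapts the spatial/grayscale tradeoff per pixel, is not reproduced; the parameters below are illustrative:

```python
import numpy as np

def perona_malik(img, n_iter=10, kappa=0.1, lam=0.2):
    # Perona-Malik diffusion: average each pixel toward its 4 neighbors,
    # weighted by a conductance that shuts off across strong edges.
    # np.roll gives periodic boundaries, acceptable for a sketch.
    u = img.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)   # edge-stopping conductance
    for _ in range(n_iter):
        dn = np.roll(u, -1, 0) - u
        ds = np.roll(u, 1, 0) - u
        de = np.roll(u, -1, 1) - u
        dw = np.roll(u, 1, 1) - u
        u += lam * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```

Small gradients (noise) are smoothed; large gradients (edges) receive near-zero conductance and survive.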

111 citations


Journal ArticleDOI
TL;DR: Dominant component analysis estimates the locally dominant modulations in a signal, which are useful in a variety of machine vision applications, while channelized components analysis delivers a true multidimensional multicomponent signal representation.
Abstract: We develop multicomponent AM-FM models for multidimensional signals. The analysis is cast in a general n-dimensional framework where the component modulating functions are assumed to lie in certain Sobolev spaces. For both continuous and discrete linear shift invariant (LSI) systems with AM-FM inputs, powerful new approximations are introduced that provide closed form expressions for the responses in terms of the input modulations. The approximation errors are bounded by generalized energy variances quantifying the localization of the filter impulse response and by Sobolev norms quantifying the smoothness of the modulations. The approximations are then used to develop novel spatially localized demodulation algorithms that estimate the AM and FM functions for multiple signal components simultaneously from the channel responses of a multiband linear filterbank used to isolate components. Two discrete computational paradigms are presented. Dominant component analysis estimates the locally dominant modulations in a signal, which are useful in a variety of machine vision applications, while channelized components analysis delivers a true multidimensional multicomponent signal representation. We demonstrate the techniques on several images of general interest in practical applications, and obtain reconstructions that establish the validity of characterizing images of this type as sums of locally narrowband modulated components.
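In one dimension, AM-FM demodulation can be illustrated with the analytic signal. This is a simple stand-in for the paper's filterbank-based multidimensional algorithms, not the method itself:

```python
import numpy as np

def demodulate(x):
    # Estimate AM (envelope) and FM (instantaneous frequency, rad/sample)
    # of a real 1-D signal via the FFT-based analytic signal.
    # Assumes even-length input.
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1 : n // 2] = 2
    h[n // 2] = 1
    z = np.fft.ifft(X * h)                    # analytic signal
    am = np.abs(z)                            # amplitude envelope
    fm = np.diff(np.unwrap(np.angle(z)))      # instantaneous frequency
    return am, fm
```

For a pure cosine whose frequency lies on an FFT bin, the estimates are exact: unit envelope and constant frequency.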

109 citations


Journal ArticleDOI
TL;DR: It is shown that the AM-FM image representation can identify normal repetitive structures and sarcomeres, with a good degree of accuracy, and detect abnormalities in sarcomere ultrastructural pattern which alter the normal regular pattern as seen in muscle pathology.
Abstract: Describes the application of an amplitude modulation-frequency modulation (AM-FM) image representation in segmenting electron micrographs of skeletal muscle for the recognition of: (1) normal sarcomere ultrastructural pattern and (2) abnormal regions that occur in sarcomeres in various myopathies. A total of 26 electron micrographs from different myopathies were used for this study. It is shown that the AM-FM image representation can identify normal repetitive structures and sarcomeres, with a good degree of accuracy. This system can also detect abnormalities in sarcomeres which alter the normal regular pattern, as seen in muscle pathology, with a recognition accuracy of 75%-84% as compared to a human expert.

37 citations


Journal ArticleDOI
TL;DR: This paper introduces a new binary shape coding technique called generalized predictive shape coding (GPSC) to encode the boundary of a visual object compactly by using a vertex-based approach that retains the advantages of existing polygon-based algorithms for visual content description while furnishing better geometric compression.
Abstract: This paper introduces a new binary shape coding technique called generalized predictive shape coding (GPSC) to encode the boundary of a visual object compactly by using a vertex-based approach. GPSC consists of a contour pixel matching algorithm and a motion-compliant contour coding algorithm. The contour pixel matching algorithm utilizes the knowledge of previously decoded contours by using a uniform translational model for silhouette motion, and generalizes polygon approximation for lossless and lossy motion estimation by adjusting a tolerance parameter d_max. To represent motion-compliant regions with minimum information in the transmitted bitstream, we develop a reference index-based coding scheme to represent the 2D positions of the matched segments using 1D reference contour indices. Finally, we encode the mismatched segments by sending residual polygons until the distortion is less than d_max. While GPSC realizes polygon approximation exactly at every encoding stage, we can guarantee quality of service because the peak distortion is no greater than d_max, and we improve coding efficiency as long as a silhouette complies with the model. The tolerance parameter d_max can be assigned to each contour to smooth the transmitted data rate, which is especially useful for constant bandwidth channels. Compared with non-predictive approaches, simulation using MPEG-4 sequences demonstrates that GPSC not only improves objective gain but also enhances visual quality based on MPEG-4 subjective tests. The significance of GPSC is that it provides a generic framework for seamlessly extending conventional vertex coding schemes into the temporal domain yet it retains the advantages of existing polygon-based algorithms for visual content description while furnishing better geometric compression.
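The polygon approximation that GPSC generalizes can be sketched with a classical split-based (Ramer-Douglas-Peucker style) routine controlled by a tolerance d_max; the motion-compliant matching and reference-index coding are the paper's contributions and are not shown:

```python
import numpy as np

def polygon_approx(points, d_max):
    # Recursively split a polyline at the point farthest from the chord
    # joining its endpoints, until every point lies within d_max of the
    # approximating polygon (so peak distortion is bounded by d_max).
    points = np.asarray(points, dtype=float)
    if len(points) <= 2:
        return points
    a, b = points[0], points[-1]
    ab = b - a
    norm = np.hypot(ab[0], ab[1]) + 1e-12
    # Perpendicular distance of each point to the chord (2-D cross product).
    d = np.abs(ab[0] * (points[:, 1] - a[1])
               - ab[1] * (points[:, 0] - a[0])) / norm
    i = int(np.argmax(d))
    if d[i] <= d_max:
        return np.array([a, b])
    left = polygon_approx(points[: i + 1], d_max)
    right = polygon_approx(points[i:], d_max)
    return np.vstack([left[:-1], right])
```

Collinear points collapse to a two-vertex segment, while corners sharper than the tolerance are retained as vertices.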

24 citations


Proceedings ArticleDOI
02 Apr 2000
TL;DR: A novel method for representing and coding wideband signals using permutations and a novel algorithm for encoding the permutation information efficiently that achieves coding gains over Huffman coding is introduced.
Abstract: We introduce a novel method for representing and coding wideband signals using permutations. The signal samples are first sorted, and then encoded using differential pulse code modulation. We show that our method is optimal for DPCM coding and develop a novel algorithm for encoding the permutation information efficiently. We show that the new algorithm achieves coding gains over Huffman coding.
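The sort-then-DPCM idea can be sketched directly. The paper's efficient permutation coder is not reproduced here; this sketch simply returns the raw permutation so the roundtrip is checkable:

```python
import numpy as np

def sort_dpcm_encode(x):
    # Sort the samples; the sorted sequence is monotone, so its DPCM
    # differences are nonnegative and small, which makes them cheap to
    # entropy-code. The permutation restores the original order.
    x = np.asarray(x)
    perm = np.argsort(x, kind="stable")
    s = x[perm]
    return s[0], np.diff(s), perm

def sort_dpcm_decode(first, diffs, perm):
    s = np.concatenate([[first], first + np.cumsum(diffs)])
    out = np.empty_like(s)
    out[perm] = s          # undo the sort
    return out
```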

15 citations


Proceedings ArticleDOI
02 Apr 2000
TL;DR: This work develops unique algorithms for assessing the quality of foveated image/video data and analyzes the increase in compression efficiency that is afforded by the foveation approach.
Abstract: We present a framework for assessing the quality of, and determining the efficiency of foveated and compressed images and video streams. We develop unique algorithms for assessing the quality of foveated image/video data. By interpreting foveation as a coordinate transformation, we analyze the increase in compression efficiency that is afforded by our foveation approach. We demonstrate these concepts on foveated, compressed video streams using modified (foveated) versions of H.263 that are standards-compliant. In the simulations, quality versus compression is enhanced considerably by the foveation approach. We obtain compression gains ranging from 8% to 52% for I pictures and from 7% to 68% for P pictures.
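Foveation itself is easy to illustrate: resolution falls off with eccentricity from the fixation point. The toy below is our own construction for illustration, not the paper's H.263-based scheme; it selects progressively blurred copies of the image by distance from fixation:

```python
import numpy as np

def foveate(img, fix, levels=4):
    # Build a stack of progressively blurred copies (2x2 box averages,
    # periodic boundaries via np.roll), then pick a coarser copy as
    # eccentricity from the fixation point grows. The linear falloff
    # schedule is an arbitrary illustrative choice.
    h, w = img.shape
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        p = pyr[-1]
        p = (p + np.roll(p, 1, 0) + np.roll(p, 1, 1)
             + np.roll(p, (1, 1), (0, 1))) / 4
        pyr.append(p)
    yy, xx = np.mgrid[0:h, 0:w]
    ecc = np.hypot(yy - fix[0], xx - fix[1])
    idx = np.minimum((ecc / (0.5 * max(h, w)) * levels).astype(int),
                     levels - 1)
    return np.choose(idx, pyr)
```

The fixation point keeps full resolution while the periphery is smoothed, which is what makes aggressive peripheral compression perceptually tolerable.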

Proceedings ArticleDOI
01 Dec 2000
TL;DR: An unequal error protection technique for foveation-based error resilience over highly error-prone mobile networks is introduced and unequal delay-constrained ARQ and RCPC codes in H.223 Annex C are employed.
Abstract: In this paper, we introduce an unequal error protection technique for foveation-based error resilience over highly error-prone mobile networks. For point-to-point visual communications, visual quality can be significantly increased by using foveation-based error resilience, where each frame is divided into foveated and background layers according to the gaze direction of the human eye, and two bitstreams are generated. In an effort to increase the source throughput of the foveated layer, we employ unequal delay-constrained ARQ and RCPC (rate compatible punctured convolutional) codes in H.223 Annex C. In simulations, visual quality increases by 0.3 dB to 1 dB over channel SNRs of 5 dB to 15 dB.

Proceedings ArticleDOI
01 Dec 2000
TL;DR: The hand optimized VLIW DSP implementation is 61× faster than the C version compiled with level two optimization, and most of the improvement was due to the efficient placement of data and programs in memory.
Abstract: A Very Long Instruction Word (VLIW) processor and a superscalar processor can execute multiple instructions simultaneously. A VLIW processor depends on the compiler and programmer to find the parallelism in the instructions, whereas a superscalar processor determines the parallelism at runtime. This paper compares TI TMS320C6700 VLIW digital signal processor (DSP) and SimpleScalar superscalar implementations of a baseline H.263 video encoder in C. With level two C compiler optimization, a one-way issue superscalar processor is 7.5 times faster than the VLIW DSP for the same processor clock speed. The superscalar speedup from one-way to four-way issue is 2.88:1, and from four-way to 256-way issue is 2.43:1. To reduce the execution time on the C6700, we write assembly routines for sum-of-absolute-difference, interpolation, and reconstruction, and place frequently used code and data into on-chip memory. We use TI's discrete cosine transform assembly routines. The hand optimized VLIW DSP implementation is 61× faster than the C version compiled with level two optimization. Most of the improvement was due to the efficient placement of data and programs in memory. The hand optimized VLIW implementation is 14% faster than a 256-way superscalar implementation without hand optimizations.
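The sum-of-absolute-differences kernel mentioned above is the inner loop of a video encoder's motion search. A reference version in Python, purely illustrative since the paper's optimizations live in C6700 assembly, looks like:

```python
import numpy as np

def sad(block_a, block_b):
    # Sum of absolute differences between two equal-sized blocks;
    # this is the kernel hand-coded in assembly in the paper.
    return int(np.abs(block_a.astype(int) - block_b.astype(int)).sum())

def best_match(ref, block, top, left, radius=4):
    # Exhaustive full-search block matching over a small window around
    # (top, left); returns the best (dy, dx) offset and its SAD.
    h, w = block.shape
    best = (0, 0, sad(ref[top:top + h, left:left + w], block))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + h <= ref.shape[0] and x + w <= ref.shape[1]:
                s = sad(ref[y:y + h, x:x + w], block)
                if s < best[2]:
                    best = (dy, dx, s)
    return best
```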

Proceedings ArticleDOI
10 Sep 2000
TL;DR: The model describes the degradation of Raman signals by non-uniform illumination, by the microscopic system, and by additive signal-dependent Gaussian noise.
Abstract: Presents a model for Raman microscopic images. The model describes the degradation of Raman signals by non-uniform illumination, by the microscopic system, and by additive signal-dependent Gaussian noise. Synthetic images were created to validate the model. Based on these synthetic images, an anisotropic diffusion filter was applied to reduce the signal-dependent Gaussian noise without blurring object boundaries. A Wiener filter was used to restore the Raman images blurred by the microscopic system, and an image ratioing method was used to correct for the non-uniform illumination. After the restoration, the mean absolute error between the restored image and the true image was minimized.
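The degradation model described above can be sketched directly; the 3×3 box PSF and the square-root scaling of the signal-dependent noise below are illustrative assumptions, not the paper's calibrated values:

```python
import numpy as np

def degrade(true_img, illum, sigma, rng):
    # Blur by a 3x3 box PSF (the microscopic system), multiply by a
    # non-uniform illumination field, then add Gaussian noise whose
    # standard deviation scales with the local signal level.
    blurred = np.zeros_like(true_img, dtype=float)
    for dy in (-1, 0, 1):                # direct 3x3 box convolution,
        for dx in (-1, 0, 1):            # periodic boundaries via np.roll
            blurred += np.roll(true_img, (dy, dx), (0, 1)) / 9.0
    clean = illum * blurred
    noise = sigma * np.sqrt(np.maximum(clean, 0)) \
        * rng.standard_normal(true_img.shape)
    return clean + noise
```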