Journal ArticleDOI

Stereo image compression using wavelet coefficients morphology

01 Apr 2004-Image and Vision Computing (Elsevier)-Vol. 22, Iss: 4, pp 281-290
TL;DR: A new stereo image compression scheme based on the wavelet transform of both images and disparity estimation between the subbands of the stereo pair; it demonstrates very good performance in terms of PSNR and visual quality, at low complexity.
Abstract: In this paper, we propose a new stereo image compression scheme that is based on the wavelet transform of both images and the disparity estimation between the stereo pair subbands. The two images are decomposed using a Discrete Wavelet Transform (DWT) and coded by employing the morphological representation of the wavelet coefficients, a technique that exploits their intraband and interband statistical properties. Progressive pixel-to-pixel evaluation of the disparity has been incorporated into the morphological coder so that a dense disparity field is formed for every subband. The proposed method demonstrates very good performance in terms of PSNR and visual quality, at low complexity.
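As a rough illustration of the two ingredients described above (and not the paper's morphological coder), the following Python sketch decomposes a stereo pair with a one-level DWT and runs a simple block-based horizontal disparity search on one subband; the wavelet choice, block size and search range are assumptions.

```python
import numpy as np
import pywt

def subband_disparity(left_sb, right_sb, block=8, max_disp=8):
    """Integer horizontal disparity per block of a subband (exhaustive search)."""
    h, w = left_sb.shape
    disp = np.zeros((h // block, w // block), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = right_sb[y:y + block, x:x + block]
            best, best_err = 0, np.inf
            for d in range(-max_disp, max_disp + 1):
                xs = x + d
                if xs < 0 or xs + block > w:
                    continue
                err = np.sum((left_sb[y:y + block, xs:xs + block] - ref) ** 2)
                if err < best_err:
                    best, best_err = d, err
            disp[by, bx] = best
    return disp

left = np.random.rand(256, 256)            # stand-in for the left view
right = np.roll(left, 4, axis=1)           # crude synthetic right view
LL_l, (LH_l, HL_l, HH_l) = pywt.dwt2(left, "bior4.4")
LL_r, (LH_r, HL_r, HH_r) = pywt.dwt2(right, "bior4.4")
print(subband_disparity(LL_l, LL_r))       # one per-subband (per-block) disparity field
```

In the paper the disparity field is dense (pixel-to-pixel) and is built progressively inside the coefficient coder; the block search above only conveys the subband-domain matching idea.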
Citations
Journal ArticleDOI
TL;DR: A full reference metric for quality assessment of stereoscopic images based on the binocular fusion process characterizing the 3D human perception is proposed and the difference of binocular energy has shown a high correlation with the human judgement for different impairments and is used to build the Binocular Energy Quality Metric (BEQM).
Abstract: Stereoscopic imaging is becoming very popular and its deployment by means of photography, television, cinema, etc. is rapidly increasing. Obviously, access to this type of content imposes the use of compression and transmission that may generate artifacts of different natures. Consequently, it is important to have appropriate tools to measure the quality of stereoscopic content. Several studies have tried to extend well-known metrics, such as PSNR or SSIM, to 3D. However, the results are not as good as for 2D images, and it becomes important to have metrics that deal with 3D perception. In this work, we propose a full reference metric for quality assessment of stereoscopic images based on the binocular fusion process characterizing 3D human perception. The main idea consists of developing a model that reproduces the binocular signal generated by simple and complex cells and estimates the associated binocular energy. The difference of binocular energy has shown a high correlation with human judgement for different impairments and is used to build the Binocular Energy Quality Metric (BEQM). Extensive experiments demonstrate the performance of the BEQM with respect to the literature.
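A hedged sketch of the underlying binocular-energy idea (not the authors' BEQM implementation): quadrature Gabor responses to the left and right views are summed and squared, in the spirit of complex-cell energy, and the score compares this energy between a reference and a distorted stereo pair. The filter parameters and the final pooling are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size=15, wavelength=6.0, sigma=3.0):
    """Even (cosine) and odd (sine) Gabor kernels, horizontal orientation."""
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    envelope = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return (envelope * np.cos(2 * np.pi * xx / wavelength),
            envelope * np.sin(2 * np.pi * xx / wavelength))

def binocular_energy(left, right):
    even, odd = gabor_pair()
    le = fftconvolve(left, even, mode="same");  lo = fftconvolve(left, odd, mode="same")
    re = fftconvolve(right, even, mode="same"); ro = fftconvolve(right, odd, mode="same")
    return (le + re) ** 2 + (lo + ro) ** 2      # complex-cell style binocular energy

def energy_difference(ref_pair, dist_pair):
    """Scalar score: mean absolute difference of the binocular energy maps."""
    return np.mean(np.abs(binocular_energy(*ref_pair) - binocular_energy(*dist_pair)))
```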

131 citations

Journal ArticleDOI
TL;DR: INSLR, a system that integrates various image processing techniques and computational intelligence techniques in order to deal with sentence recognition, is developed to improve communication between hearing impaired people and normal people promising them better social prospects.
Abstract: We present an Indian Sign Language Recognition system (INSLR) that integrates various image processing techniques and computational intelligence techniques in order to deal with sentence recognition. The system is developed to improve communication between hearing impaired people and normal people, promising them better social prospects. A wavelet based video segmentation technique is proposed which detects the shapes of various hand signs and head movements in a video based setup. Shape features of hand gestures are extracted using elliptical Fourier descriptors, which greatly reduce the size of the feature vector for an image. Principal component analysis (PCA) further reduces the feature vector for a particular gesture video, and the features are not affected by scaling or rotation of gestures within a video, which makes the system more flexible. Features generated using these techniques make the feature vector unique for a particular gesture. Recognition of gestures from the extracted features is done using a Sugeno type fuzzy inference system with linear output membership functions. Finally, the INSLR system employs an audio system to play the recognized gestures along with text output. The system is tested using a data set of 80 words and sentences from 10 different signers. The experimental results show that our system has a recognition rate of 96%.
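The PCA reduction step described above can be illustrated with a minimal sketch; the elliptical Fourier descriptor extraction and the Sugeno fuzzy classifier are assumed to exist elsewhere, and the dimensions below are hypothetical.

```python
import numpy as np

def pca_reduce(features, n_components=10):
    """Project gesture feature vectors (rows) onto their top principal components."""
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data matrix gives the principal directions in Vt.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    return centered @ basis.T, basis, mean

# e.g. 80 gesture samples, each with a 200-dimensional descriptor-based feature vector
samples = np.random.rand(80, 200)
reduced, basis, mean = pca_reduce(samples, n_components=10)
print(reduced.shape)   # (80, 10): compact features fed to the fuzzy classifier
```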

61 citations

Proceedings ArticleDOI
Xin Deng, Wenzhe Yang, Ren Yang, Mai Xu, Enpeng Liu, Qianhan Feng, Radu Timofte
20 Jun 2021
TL;DR: In this paper, HESIC, an end-to-end trainable deep network for stereo image compression (SIC), is proposed; it uses a deep regression model to estimate the homography (H) matrix, and only the residual information between the left and right images is encoded.
Abstract: In this paper, we propose HESIC, an end-to-end trainable deep network for stereo image compression (SIC). To fully explore the mutual information across two stereo images, we use a deep regression model to estimate the homography matrix, i.e., the H matrix. Then, the left image is spatially transformed by the H matrix, and only the residual information between the left and right images is encoded to save bitrate. A two-branch auto-encoder architecture is adopted in HESIC, corresponding to the left and right images, respectively. For entropy coding, we use two conditional stereo entropy models, i.e., a Gaussian mixture model (GMM) based and a context based entropy model, to fully explore the correlation between the two images and reduce the coding bit-rate. In decoding, a cross quality enhancement module is proposed to enhance the image quality based on the inverse H matrix. Experimental results show that our HESIC outperforms state-of-the-art SIC methods on the InStereo2K and KITTI datasets both quantitatively and qualitatively. Code is available at https://github.com/ywz978020607/HESIC.
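The homography-compensation idea can be sketched with classical tools, using OpenCV feature matching as a stand-in for the paper's deep H-matrix regression; only the residual against the warped view would then be encoded. This is an assumption-laden illustration, not the HESIC pipeline.

```python
import cv2
import numpy as np

def homography_residual(left_gray, right_gray):
    # Estimate H from matched ORB keypoints (stand-in for the deep H estimator).
    orb = cv2.ORB_create()
    kl, dl = orb.detectAndCompute(left_gray, None)
    kr, dr = orb.detectAndCompute(right_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(dl, dr)
    src = np.float32([kl[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kr[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    # Warp the left view towards the right view and keep only the residual.
    h, w = right_gray.shape
    warped_left = cv2.warpPerspective(left_gray, H, (w, h))
    residual = right_gray.astype(np.int16) - warped_left.astype(np.int16)
    return H, residual   # H plus the residual is what would be entropy-coded
```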

28 citations

01 Jan 2011
TL;DR: The developed system converts words and sentences of Indian sign language into voice and text in English using Elliptical Fourier descriptors for shape feature extraction and principal component analysis for feature set optimization and reduction.
Abstract: We propose a system to automatically recognize gestures of sign language from a video stream of the signer. The developed system converts words and sentences of Indian sign language into voice and text in English. We have used the power of image processing techniques and artificial intelligence techniques to achieve the objective. To accomplish the task we used powerful image processing techniques such as frame differencing based tracking, edge detection, wavelet transform, and image fusion to segment shapes in our videos. The system also uses elliptical Fourier descriptors for shape feature extraction and principal component analysis for feature set optimization and reduction. The database of extracted features is compared with the input video of the signer using a trained fuzzy inference system. The proposed system converts gestures into a text and voice message with 91 percent accuracy. The training and testing of the system are done using gestures from Indian Sign Language (INSL): around 80 gestures from 10 different signers are used. The entire system was developed in a user friendly environment by creating a graphical user interface in MATLAB. The system is robust and can be trained for new gestures using the GUI.
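A tiny sketch of the frame-differencing step mentioned above, one of several segmentation cues the system combines; the threshold value is an assumption, and the wavelet fusion, descriptor and fuzzy recognition stages are not shown.

```python
import numpy as np

def moving_regions(prev_frame, curr_frame, threshold=25):
    """Binary mask of pixels that changed between consecutive grayscale frames."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# mask = moving_regions(frames[i - 1], frames[i]) marks hand/head motion candidates
```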

28 citations

Proceedings ArticleDOI
22 Jun 2010
TL;DR: The proposed algorithm combines the multi-resolution property of the wavelet transform, overcoming the traditional trade-off between noise suppression and edge localization accuracy in neighborhood-based operators, with the band-pass filtering role of the wavelet transform.
Abstract: This article describes the basic principles of the commonly used edge detection operators (including the Roberts, Sobel, Prewitt, Laplacian and Canny operators) and analyzes and evaluates their performance. Since traditional edge detection operators show obvious shortcomings on images containing Gaussian white noise, this paper presents an edge detection algorithm that combines soft-threshold wavelet de-noising with the Prewitt operator. The algorithm exploits the multi-resolution property of the wavelet transform, overcoming the traditional trade-off between noise suppression and edge localization accuracy, and uses the band-pass filtering role of the wavelet transform: soft-threshold de-noising removes low-amplitude noise and undesired signals at each scale, and, combined with the spatial-domain convolution of the Prewitt operator, the method accurately detects image edges while effectively suppressing noise.
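A minimal sketch of the described pipeline, soft-threshold wavelet de-noising followed by Prewitt edge detection; the universal-threshold estimate used below is a common choice assumed here, not necessarily the paper's.

```python
import numpy as np
import pywt
from scipy import ndimage

def denoise_then_prewitt(image, wavelet="db4", levels=3):
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    # Universal threshold estimated from the finest diagonal detail band.
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    t = sigma * np.sqrt(2 * np.log(image.size))
    denoised_coeffs = [coeffs[0]] + [
        tuple(pywt.threshold(band, t, mode="soft") for band in detail)
        for detail in coeffs[1:]
    ]
    denoised = pywt.waverec2(denoised_coeffs, wavelet)
    gx = ndimage.prewitt(denoised, axis=1)   # horizontal gradient
    gy = ndimage.prewitt(denoised, axis=0)   # vertical gradient
    return np.hypot(gx, gy)                  # edge magnitude map
```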

27 citations

References
Proceedings Article
24 Aug 1981
TL;DR: In this paper, the spatial intensity gradient of the images is used to find a good match using a type of Newton-Raphson iteration, which can be generalized to handle rotation, scaling and shearing.
Abstract: Image registration finds a variety of applications in computer vision. Unfortunately, traditional image registration techniques tend to be costly. We present a new image registration technique that makes use of the spatial intensity gradient of the images to find a good match using a type of Newton-Raphson iteration. Our technique is faster because it examines far fewer potential matches between the images than existing techniques. Furthermore, this registration technique can be generalized to handle rotation, scaling and shearing. We show how our technique can be adapted for use in a stereo vision system.
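A minimal sketch of the gradient-based, Newton-Raphson style registration idea for the pure-translation case; the paper's generalization to rotation, scaling and shearing, and its stereo adaptation, are not shown, and the integer resampling below is a crude simplification (a real implementation would interpolate).

```python
import numpy as np

def estimate_shift(reference, moving, iterations=10):
    """Iteratively estimate the (dy, dx) translation aligning `moving` to `reference`."""
    p = np.zeros(2)
    gy, gx = np.gradient(reference.astype(float))
    G = np.stack([gy.ravel(), gx.ravel()], axis=1)   # spatial intensity gradient
    normal = G.T @ G                                 # 2x2 normal-equations matrix
    for _ in range(iterations):
        # Undo the current translation estimate (integer roll for simplicity).
        shifted = np.roll(moving, shift=(-int(round(p[0])), -int(round(p[1]))), axis=(0, 1))
        error = (reference - shifted).ravel()
        step = np.linalg.solve(normal, G.T @ error)  # Newton-Raphson style update
        p += step
        if np.linalg.norm(step) < 1e-3:
            break
    return p
```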

12,944 citations

Journal ArticleDOI
TL;DR: The image coding results, calculated from actual file sizes and images reconstructed by the decoding algorithm, are either comparable to or surpass previous results obtained through much more sophisticated and computationally complex methods.
Abstract: Embedded zerotree wavelet (EZW) coding, introduced by Shapiro (see IEEE Trans. Signal Processing, vol.41, no.12, p.3445, 1993), is a very effective and computationally simple technique for image compression. We offer an alternative explanation of the principles of its operation, so that the reasons for its excellent performance can be better understood. These principles are partial ordering by magnitude with a set partitioning sorting algorithm, ordered bit plane transmission, and exploitation of self-similarity across different scales of an image wavelet transform. Moreover, we present a new and different implementation based on set partitioning in hierarchical trees (SPIHT), which provides even better performance than our previously reported extension of EZW that surpassed the performance of the original EZW. The image coding results, calculated from actual file sizes and images reconstructed by the decoding algorithm, are either comparable to or surpass previous results obtained through much more sophisticated and computationally complex methods. In addition, the new coding and decoding procedures are extremely fast, and they can be made even faster, with only small loss in performance, by omitting entropy coding of the bit stream by the arithmetic code.
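An illustrative sketch of the ordered bit-plane idea shared by EZW and SPIHT (the set-partitioning lists and tree structures of SPIHT itself are not reproduced): coefficients become significant in the pass whose halved threshold they first exceed, and truncating after any pass yields an embedded code.

```python
import numpy as np

def significance_passes(coeffs, n_passes=6):
    """Yield (threshold, significance map) for successively halved thresholds."""
    threshold = 2 ** int(np.floor(np.log2(np.max(np.abs(coeffs)))))
    for _ in range(n_passes):
        yield threshold, np.abs(coeffs) >= threshold
        threshold //= 2
        if threshold == 0:
            break

# Transmitting the maps (plus refinement bits) in this order gives a bit stream
# that can be cut at any pass and still decoded to a coarser image.
```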

5,890 citations

Journal ArticleDOI
J.M. Shapiro
TL;DR: The embedded zerotree wavelet algorithm (EZW) is a simple, yet remarkably effective, image compression algorithm, having the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code.
Abstract: The embedded zerotree wavelet algorithm (EZW) is a simple, yet remarkably effective, image compression algorithm, having the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code. The embedded code represents a sequence of binary decisions that distinguish an image from the "null" image. Using an embedded coding algorithm, an encoder can terminate the encoding at any point, thereby allowing a target rate or target distortion metric to be met exactly. Also, given a bit stream, the decoder can cease decoding at any point in the bit stream and still produce exactly the same image that would have been encoded at the bit rate corresponding to the truncated bit stream. In addition to producing a fully embedded bit stream, the EZW consistently produces compression results that are competitive with virtually all known compression algorithms on standard test images. Yet this performance is achieved with a technique that requires absolutely no training, no pre-stored tables or codebooks, and no prior knowledge of the image source. The EZW algorithm is based on four key concepts: (1) a discrete wavelet transform or hierarchical subband decomposition, (2) prediction of the absence of significant information across scales by exploiting the self-similarity inherent in images, (3) entropy-coded successive-approximation quantization, and (4) universal lossless data compression achieved via adaptive arithmetic coding.
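A small sketch of the zerotree test at the heart of EZW, under the assumption that same-orientation detail subbands are stored coarse-to-fine and each finer band doubles in height and width: a coefficient is a zerotree root when it and all of its descendants are insignificant against the current threshold.

```python
import numpy as np

def is_zerotree_root(bands, level, i, j, threshold):
    """bands: coarse-to-fine list of same-orientation detail subbands (2x per level)."""
    if abs(bands[level][i, j]) >= threshold:
        return False
    # Descendants at the next finer scale occupy the 2x2 block (2i..2i+1, 2j..2j+1).
    if level + 1 < len(bands):
        for di in (0, 1):
            for dj in (0, 1):
                if not is_zerotree_root(bands, level + 1, 2 * i + di, 2 * j + dj, threshold):
                    return False
    return True
```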

5,559 citations

Journal ArticleDOI
TL;DR: A scheme for image compression that takes into account psychovisual features both in the space and frequency domains is proposed and it is shown that the wavelet transform is particularly well adapted to progressive transmission.
Abstract: A scheme for image compression that takes into account psychovisual features both in the space and frequency domains is proposed. This method involves two steps. First, a wavelet transform is used to obtain a set of biorthogonal subclasses of images: the original image is decomposed at different scales using a pyramidal algorithm architecture. The decomposition is along the vertical and horizontal directions and keeps constant the number of pixels required to describe the image. Second, according to Shannon's rate-distortion theory, the wavelet coefficients are vector quantized using a multiresolution codebook. To encode the wavelet coefficients, a noise-shaping bit allocation procedure which assumes that details at high resolution are less visible to the human eye is proposed. In order to allow the receiver to recognize a picture as quickly as possible at minimum cost, a progressive transmission scheme is presented. It is shown that the wavelet transform is particularly well adapted to progressive transmission.
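A minimal sketch of per-subband vector quantization in the wavelet domain (the psychovisual noise-shaping bit allocation and the multiresolution codebook design of the paper are not reproduced); the block size and codebook below are placeholders.

```python
import numpy as np

def vq_subband(subband, codebook, block=2):
    """Map each block x block vector of a subband to its nearest codeword index."""
    h, w = subband.shape
    vectors = (subband[:h - h % block, :w - w % block]
               .reshape(h // block, block, w // block, block)
               .swapaxes(1, 2)
               .reshape(-1, block * block))
    # codebook: K x (block*block) array of codewords; nearest-neighbour search.
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(dists, axis=1)   # indices to be entropy coded / transmitted

subband = np.random.randn(64, 64)
codebook = np.random.randn(16, 4)     # toy 16-word codebook for 2x2 vectors
print(vq_subband(subband, codebook).shape)   # (1024,) codeword indices
```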

3,925 citations

Journal ArticleDOI
TL;DR: A new image compression algorithm is proposed, based on independent embedded block coding with optimized truncation of the embedded bit-streams (EBCOT), capable of modeling the spatially varying visual masking phenomenon.
Abstract: A new image compression algorithm is proposed, based on independent embedded block coding with optimized truncation of the embedded bit-streams (EBCOT). The algorithm exhibits state-of-the-art compression performance while producing a bit-stream with a rich set of features, including resolution and SNR scalability together with a "random access" property. The algorithm has modest complexity and is suitable for applications involving remote browsing of large compressed images. The algorithm lends itself to explicit optimization with respect to MSE as well as more realistic psychovisual metrics, capable of modeling the spatially varying visual masking phenomenon.
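A hedged sketch of the optimized-truncation idea behind EBCOT (not the actual JPEG 2000 coding-pass machinery): each code-block offers candidate truncation points with rate and distortion values, and for a given Lagrange multiplier every block independently picks the point minimizing D + lambda * R; sweeping the multiplier until the total rate meets the budget gives the truncation used to assemble the bit-stream layers.

```python
def pick_truncation_points(blocks, lam):
    """blocks: list of per-code-block lists of (rate_bits, distortion) candidates."""
    choices = []
    for candidates in blocks:
        # Each block independently minimizes its Lagrangian cost D + lam * R.
        best = min(candidates, key=lambda rd: rd[1] + lam * rd[0])
        choices.append(best)
    total_rate = sum(r for r, _ in choices)
    return choices, total_rate

# Example with two toy code-blocks, each offering three truncation points.
blocks = [[(100, 50.0), (200, 20.0), (300, 10.0)],
          [(80, 40.0), (160, 25.0), (240, 18.0)]]
print(pick_truncation_points(blocks, lam=0.1))
```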

1,933 citations