
Showing papers in "Signal Processing: Image Communication" in 2007


Journal ArticleDOI
TL;DR: This paper discusses an advanced approach for a 3DTV service, which is based on the concept of video-plus-depth data representations, and provides a modular and flexible system architecture supporting a wide range of multi-view structures.
Abstract: Due to enormous progress in the areas of auto-stereoscopic 3D displays, digital video broadcast and computer vision algorithms, 3D television (3DTV) has reached a high level of technical maturity and many people now believe in its readiness for marketing. Experimental prototypes of entire 3DTV processing chains have been demonstrated successfully during the last few years, and the Moving Picture Experts Group (MPEG) of ISO/IEC has launched related ad hoc groups and standardization efforts envisaging the emerging market segment of 3DTV. In this context the paper discusses an advanced approach for a 3DTV service, which is based on the concept of video-plus-depth data representations. It particularly considers aspects of interoperability and multi-view adaptation for the case that different multi-baseline geometries are used for multi-view capturing and 3D display. Furthermore, it presents algorithmic solutions for the creation of depth maps and depth image-based rendering related to this framework of multi-view adaptation. In contrast to other proposals, which are more focused on specialized configurations, the underlying approach provides a modular and flexible system architecture supporting a wide range of multi-view structures.

434 citations
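
As a rough companion to the depth image-based rendering mentioned in the abstract, the following sketch warps a colour image into a nearby virtual view by shifting pixels horizontally in proportion to their depth. It is a generic, simplified DIBR illustration, not the paper's rendering algorithm; the baseline, focal length, depth range and the omission of hole filling are all assumptions.

```python
import numpy as np

def dibr_shift_view(color, depth, baseline=0.05, focal=1000.0, z_near=1.0, z_far=10.0):
    """Render a virtual view by horizontally shifting pixels according to depth.

    Minimal depth-image-based rendering (DIBR) sketch: each pixel is displaced
    by a disparity proportional to focal * baseline / Z.  Disocclusion holes
    are left unfilled.  All parameter values are illustrative.
    """
    h, w = depth.shape
    # Map 8-bit depth values to metric depth Z (larger depth value = nearer = larger shift).
    z = z_near + (255.0 - depth.astype(np.float64)) / 255.0 * (z_far - z_near)
    disparity = np.round(focal * baseline / z).astype(int)

    virtual = np.zeros_like(color)
    xs = np.arange(w)
    for y in range(h):
        x_new = np.clip(xs + disparity[y], 0, w - 1)
        virtual[y, x_new] = color[y, xs]   # forward warping; later pixels overwrite
    return virtual

# Example: a synthetic 64x64 scene with a "near" square on a "far" background.
color = np.full((64, 64, 3), 50, dtype=np.uint8)
depth = np.full((64, 64), 30, dtype=np.uint8)
color[20:40, 20:40] = 200
depth[20:40, 20:40] = 220
warped = dibr_shift_view(color, depth)
```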


Journal ArticleDOI
TL;DR: Recent advances, such as new concepts in rate-distortion modelling and quality-constrained control, are presented, showing how rate control performance can be improved.

Abstract: In this paper, we review the recent advances in rate control techniques for video coding. The rate control algorithms recommended in the video coding standards are briefly described and analyzed. Recent advances, such as new concepts in rate-distortion modelling and quality-constrained control, are presented. With these techniques, the rate control performance can be improved. The paper not only summarizes these recent rate control techniques but also provides explicit directions for future rate control algorithm design.

129 citations
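
To make the rate-distortion modelling theme concrete, here is a hedged sketch of the classic quadratic rate-quantisation model used in several standard rate-control schemes: it solves R = a1*MAD/Q + a2*MAD/Q^2 for the quantisation step given a frame's bit budget. The model parameters and the example numbers are illustrative, and this is not claimed to be any specific algorithm surveyed in the paper.

```python
import math

def quadratic_rq_qstep(target_bits, mad, a1=1.0, a2=50.0, header_bits=0.0):
    """Solve the quadratic rate model  R = header + a1*MAD/Q + a2*MAD/Q^2  for Q.

    a1 and a2 would normally be updated by regression from previously coded
    frames; the defaults here are purely illustrative.
    """
    budget = max(target_bits - header_bits, 1e-6)
    if a2 == 0.0:
        return a1 * mad / budget
    # budget = a1*MAD/Q + a2*MAD/Q^2  ->  budget*Q^2 - a1*MAD*Q - a2*MAD = 0
    disc = (a1 * mad) ** 2 + 4.0 * budget * a2 * mad
    return (a1 * mad + math.sqrt(disc)) / (2.0 * budget)

# Example: a 12 kbit budget for a frame with mean absolute difference (MAD) of 6.
qstep = quadratic_rq_qstep(target_bits=12000, mad=6.0)
print(round(qstep, 3))
```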


Journal ArticleDOI
TL;DR: This paper proposes an innovative scheme, namely the scalable secret image sharing scheme, for sharing an image O among n participants such that the clarity of the reconstructed image scales in proportion to the number of participants.

Abstract: In this paper, we propose an innovative scheme, namely the scalable secret image sharing scheme, for sharing an image O among n participants such that the clarity of the reconstructed image (i.e., the amount of information therein) scales in proportion to the number of participants. The proposed scheme encodes O into n shadow images that exhibit the following features: (a) each shadow image reveals no information about O, (b) each shadow image is only half the size of O, (c) any k (2 ≤ k ≤ n) ...

120 citations
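
The scalable scheme itself cannot be reproduced from the truncated abstract, but the underlying shadow-image idea can be illustrated with plain Shamir (k, n) threshold sharing applied per pixel over GF(251): any k shares recover the value, fewer reveal nothing. This is only a baseline sketch; the paper's construction additionally halves the shadow size and makes reconstruction quality scale with the number of shares.

```python
import random

PRIME = 251  # pixel values above 250 would need special handling; illustrative only

def make_shares(pixel, k, n):
    """Split one pixel value into n shares; any k of them recover it (Shamir)."""
    coeffs = [pixel] + [random.randrange(PRIME) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = make_shares(pixel=137, k=3, n=5)
print(recover(shares[:3]))   # any 3 of the 5 shares reconstruct 137
```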


Journal ArticleDOI
TL;DR: A dense stereo matching algorithm for epipolar rectified images capable of handling large untextured regions, estimating precise depth boundaries and propagating disparity information to occluded regions, which are challenging tasks for conventional stereo methods is described.
Abstract: This paper describes a dense stereo matching algorithm for epipolar rectified images. The method applies colour segmentation on the reference image. Our basic assumptions are that disparity varies smoothly inside a segment, while disparity boundaries coincide with the segment borders. The use of these assumptions makes the algorithm capable of handling large untextured regions, estimating precise depth boundaries and propagating disparity information to occluded regions, which are challenging tasks for conventional stereo methods. We model disparity inside a segment by a planar equation. Initial disparity segments are clustered to form a set of disparity layers, which are planar surfaces that are likely to occur in the scene. Assignments of segments to disparity layers are then derived by minimization of a global cost function. This cost function is based on the observation that occlusions cannot be dealt with in the domain of segments. Therefore, we propose a novel cost function that is defined on two levels, one representing the segments and the other corresponding to pixels. The basic idea is that a pixel has to be assigned to the same disparity layer as its segment, but can as well be occluded. The cost function is then effectively minimized via graph-cuts. In the experimental results, we show that our method produces good-quality results, especially in regions of low texture and close to disparity boundaries. Results obtained for the Middlebury test set indicate that the proposed method is able to compete with the best-performing state-of-the-art algorithms.

114 citations
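
One concrete ingredient of such segment-based stereo methods is fitting a planar disparity model to each colour segment. The sketch below does a plain least-squares plane fit; robust refinements and the subsequent layer assignment via graph cuts are omitted, and the example data are synthetic.

```python
import numpy as np

def fit_disparity_plane(xs, ys, disparities):
    """Least-squares fit of a planar disparity model d(x, y) = a*x + b*y + c.

    In segment-based stereo, a plane like this is typically fitted to the
    initial disparity estimates of all pixels inside one colour segment;
    outlier-robust variants (e.g. iterative reweighting) are omitted here.
    """
    A = np.column_stack([xs, ys, np.ones_like(xs, dtype=np.float64)])
    (a, b, c), *_ = np.linalg.lstsq(A, disparities.astype(np.float64), rcond=None)
    return a, b, c

# Example: noisy samples from the plane d = 0.1x - 0.05y + 12
rng = np.random.default_rng(0)
xs = rng.integers(0, 100, 200)
ys = rng.integers(0, 100, 200)
d = 0.1 * xs - 0.05 * ys + 12 + rng.normal(0, 0.3, 200)
print(fit_disparity_plane(xs, ys, d))   # approximately (0.1, -0.05, 12)
```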


Journal ArticleDOI
TL;DR: A semi-fragile watermarking method for the automatic authentication and restoration of the content of digital images that is robust to common image processing operations such as lossy transcoding and image filtering.
Abstract: This paper presents a semi-fragile watermarking method for the automatic authentication and restoration of the content of digital images. Semi-fragile watermarks, which reveal local malicious tampering, are embedded into the original image. When tampered blocks are detected, the restoration problem is formulated as an irregular sampling problem. These blocks are then reconstructed, using the information embedded in the same watermarked image, through iterative projections onto convex sets. In contrast to previous methods, the restoration process is robust to common image processing operations such as lossy transcoding and image filtering. Simulation results show that the scheme keeps the probability of false alarm to a minimum while maintaining the data integrity of the restored images.

98 citations
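
The restoration step can be illustrated with a generic POCS iteration that alternates between enforcing reference DCT coefficients (standing in for the watermark payload) and clipping to the valid pixel range. This is a sketch of the iterative-projection idea only; the paper's actual constraint sets and irregular-sampling formulation differ.

```python
import numpy as np
from scipy.fft import dctn, idctn

def pocs_restore_block(block, known_dct, known_mask, iters=20):
    """Restore a tampered 8x8 block by alternating projections (POCS sketch).

    Set 1: blocks whose DCT coefficients match the reference values recovered
           from the watermark at the positions flagged in `known_mask`.
    Set 2: blocks with pixel values in the valid range [0, 255].
    `known_dct` and `known_mask` stand in for the embedded payload.
    """
    x = block.astype(np.float64)
    for _ in range(iters):
        coeffs = dctn(x, norm='ortho')
        coeffs[known_mask] = known_dct[known_mask]         # projection onto set 1
        x = np.clip(idctn(coeffs, norm='ortho'), 0, 255)   # projection onto set 2
    return np.round(x).astype(np.uint8)

# Example: recover a zeroed block from its 3x3 lowest-frequency DCT coefficients.
orig = np.outer(np.linspace(40, 200, 8), np.linspace(0.8, 1.2, 8))
ref = dctn(orig, norm='ortho')
mask = np.zeros((8, 8), dtype=bool); mask[:3, :3] = True
restored = pocs_restore_block(np.zeros((8, 8)), ref, mask)
```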


Journal ArticleDOI
TL;DR: A unified framework for text detection, localization, and tracking in compressed videos using the discrete cosine transform (DCT) coefficients is proposed, and experimental results show the effectiveness of the proposed methods.

Abstract: Video text information plays an important role in semantic-based video analysis, indexing and retrieval. Video texts are closely related to the content of a video. Usually, the fundamental steps of text-based video analysis, browsing and retrieval consist of video text detection, localization, tracking, segmentation and recognition. Video sequences are commonly stored in compressed formats where MPEG coding techniques are often adopted. In this paper, a unified framework for text detection, localization, and tracking in compressed videos using the discrete cosine transform (DCT) coefficients is proposed. A coarse-to-fine text detection method is used to find text blocks in terms of the block DCT texture intensity information. The DCT texture intensity of an 8x8 block of an intra-frame is approximately represented by seven AC coefficients. The candidate text block regions are further verified and refined. Text block region localization and tracking are carried out by means of the horizontal and vertical block texture intensity projection profiles. The appearing and disappearing frames of each text line are determined by the text tracking. Experimental results show the effectiveness of the proposed methods.

68 citations
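
A minimal version of the coarse detection step might look as follows: compute a per-block texture intensity from a few low-order AC coefficients of the 8x8 DCT and threshold it. In a real compressed-domain system the coefficients would be read from the intra-frame bitstream rather than recomputed, and the particular AC positions and threshold used here are assumptions.

```python
import numpy as np
from scipy.fft import dctn

def block_texture_map(gray, n_ac=7, block=8):
    """Per-block DCT texture intensity, in the spirit of the coarse detection step.

    For every 8x8 block the intensity is the sum of absolute values of a few
    low-order AC coefficients (seven here, as in the abstract).  Thresholding
    the map yields candidate text blocks.  The zig-zag positions are an assumption.
    """
    h, w = (np.array(gray.shape) // block) * block
    zigzag = [(0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2)][:n_ac]
    tex = np.zeros((h // block, w // block))
    for by in range(0, h, block):
        for bx in range(0, w, block):
            c = dctn(gray[by:by + block, bx:bx + block].astype(np.float64), norm='ortho')
            tex[by // block, bx // block] = sum(abs(c[i, j]) for i, j in zigzag)
    return tex

# Candidate text blocks = blocks whose texture exceeds an adaptive threshold.
gray = np.random.default_rng(1).integers(0, 256, (64, 64)).astype(np.float64)
tex = block_texture_map(gray)
candidates = tex > tex.mean() + tex.std()
```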


Journal ArticleDOI
TL;DR: It is illustrated how scalable video transmission can be improved with efficient use of the proposed cross-layer design, adaptation mechanisms and control information in the case of streaming of scalable video streams.
Abstract: Multimedia applications such as video conferencing, digital video broadcasting (DVB), and streaming video and audio have been gaining popularity during recent years, and these services are increasingly being delivered to mobile users. The demand for quality of service (QoS) for multimedia poses major challenges for network design, concerning not only the physical bandwidth but also protocol design and services. One of the goals of system design is to provide efficient solutions for adaptive multimedia transmission over different access networks in an all-IP environment. The joint source and channel coding (JSCC/D) approach has already given promising results in optimizing multimedia transmission. In practice, however, arranging the required control mechanism and delivering the required side information through the network and protocol stack have caused problems, and the impact of the network has often been neglected in studies. In this paper we propose efficient cross-layer communication methods and a protocol architecture in order to transmit the control information and to optimize multimedia transmission over wireless and wired IP networks. We also apply this architecture to the more specific case of streaming of scalable video streams. Scalable video coding has been an active research topic recently, and it offers simple and flexible solutions for video transmission over heterogeneous networks to heterogeneous terminals. In addition, it provides easy adaptation to varying transmission conditions. In this paper we illustrate how scalable video transmission can be improved with efficient use of the proposed cross-layer design, adaptation mechanisms and control information.

61 citations


Journal ArticleDOI
TL;DR: This work provides a principled analysis of local image distortions and their relation to optical flow and presents the results of a comprehensive DT classification study that compares the performances of different flow features for a NF algorithm and four different complete flow algorithms.
Abstract: We address the problem of dynamic texture (DT) classification using optical flow features. Optical flow based approaches dominate among the currently available DT classification methods. The features used by these approaches often describe local image distortions in terms of such quantities as curl or divergence. Both normal and complete flows have been considered, with normal flow (NF) being used more frequently. However, precise meaning and applicability of normal and complete flow features have never been analysed properly. We provide a principled analysis of local image distortions and their relation to optical flow. Then we present the results of a comprehensive DT classification study that compares the performances of different flow features for a NF algorithm and four different complete flow algorithms. The efficiencies of two flow confidence measures are also studied.

49 citations
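
The local first-order flow features the abstract refers to, curl and divergence, are easy to compute from a dense flow field with central differences, as in this sketch; how they are aggregated into dynamic-texture descriptors (histograms, moments, etc.) is left out.

```python
import numpy as np

def flow_curl_divergence(u, v):
    """Curl and divergence maps of a dense flow field (u, v).

    divergence = du/dx + dv/dy,  curl = dv/dx - du/dy, computed with
    central differences (np.gradient returns derivatives along axis 0 = y,
    axis 1 = x).
    """
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    return dv_dx - du_dy, du_dx + dv_dy

# Example: a purely rotational field has non-zero curl and ~zero divergence.
y, x = np.mgrid[-16:16, -16:16].astype(np.float64)
curl, div = flow_curl_divergence(-y, x)
print(curl.mean(), div.mean())   # ~2.0 and ~0.0
```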


Journal ArticleDOI
TL;DR: An efficient coding method for digital hologram video using a three-dimensional scanning method and a two-dimensional video compression technique achieves compression ratios 8-16 times higher than previous approaches, and is expected to help reduce the amount of digital hologram data for communication or storage.

Abstract: In this paper, we propose an efficient coding method for digital hologram video using a three-dimensional (3D) scanning method and a two-dimensional (2D) video compression technique. It consists of separation of the captured 3D image into R, G, and B color space components, localization by segmenting the fringe pattern into MxN [pixel^2] segments, frequency transformation by the 2D discrete cosine transform (2D DCT), 3D scanning of the segments to form a video sequence, classification of coefficients, and hybrid video coding with H.264/AVC, differential pulse code modulation (DPCM), and a lossless coding method. The experimental results showed that the proposed method achieves compression ratios 8-16 times higher than previous approaches. Thus, we expect it to contribute to reducing the amount of digital hologram data for communication or storage.

44 citations
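
The localisation, frequency-transform and 3D-scanning stages of the pipeline can be sketched as follows: tile the fringe pattern into MxN segments, 2D-DCT each tile, and stack the tiles as frames of a pseudo video sequence that a standard coder could then compress. The raster scan order and tile size are assumptions; the hybrid H.264/AVC + DPCM coding stage is not reproduced.

```python
import numpy as np
from scipy.fft import dctn

def fringe_to_sequence(fringe, m=64, n=64):
    """Segment a fringe pattern into MxN tiles, 2D-DCT each tile, and stack the
    tiles as frames of a pseudo video sequence.

    Only the localisation / frequency-transform / scanning part of the pipeline
    is mirrored here; a simple raster scan order is used, which may differ from
    the paper's 3D scanning order.
    """
    h, w = fringe.shape
    frames = []
    for by in range(0, h - m + 1, m):
        for bx in range(0, w - n + 1, n):
            tile = fringe[by:by + m, bx:bx + n].astype(np.float64)
            frames.append(dctn(tile, norm='ortho'))
    return np.stack(frames)          # shape: (num_tiles, M, N)

seq = fringe_to_sequence(np.random.default_rng(2).random((512, 512)))
print(seq.shape)                     # (64, 64, 64)
```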


Journal ArticleDOI
TL;DR: Experimental results show that WISDOW-Comp outperforms the state of the art of compression-based denoisers in terms of both rate and distortion.

Abstract: This paper presents a novel scheme for simultaneous compression and denoising of images: WISDOW-Comp (Wavelet based Image and Signal Denoising via Overlapping Waves-Compression). It is based on the atomic representation of wavelet details employed in WISDOW for image denoising. However, atoms can also be used for achieving compression. In particular, the core of WISDOW-Comp consists of recovering wavelet details, i.e. atoms, by exploiting wavelet low-frequency information. Therefore, just the approximation band and the significance map of the atoms' absolute maxima have to be encoded and sent to the decoder for recovering a cleaner as well as compressed version of the image under study. Experimental results show that WISDOW-Comp outperforms the state of the art of compression-based denoisers in terms of both rate and distortion. Some technical devices are also investigated for further improving its performance.

40 citations


Journal ArticleDOI
TL;DR: This work proposes content-adaptive stereo video coding (CA-SC), where additional coding gain is targeted by down-sampling one of the views spatially or temporally depending on the content, based on the well-known theory that the human visual system perceives high frequencies in three dimensions (3D) from the higher-quality view.

Abstract: We address efficient compression and real-time streaming of stereoscopic video over the current Internet. We first propose content-adaptive stereo video coding (CA-SC), where additional coding gain, beyond what can be achieved by exploiting only inter-view correlations, is targeted by down-sampling one of the views spatially or temporally depending on the content, based on the well-known theory that the human visual system perceives high frequencies in three dimensions (3D) from the higher-quality view. We also developed a stereoscopic 3D video streaming server and clients by modifying available open-source platforms, where each client can view the video in mono or stereo mode depending on its display capabilities. The performance of the end-to-end stereoscopic streaming system is demonstrated using subjective quality tests.

Journal ArticleDOI
TL;DR: Two versions of the new CBA algorithms are introduced and compared and it is shown that by using a Laplacian probability model for the DCT coefficients as well as down-sampling the subordinate colors, the compression results are further improved.
Abstract: Most coding techniques for color image compression employ a de-correlation approach: the RGB primaries are transformed into a de-correlated color space, such as YUV or YCbCr, and the de-correlated color components are then encoded separately. Examples of this approach are the JPEG and JPEG2000 image compression standards. A different method, based on a correlation-based approach (CBA), is presented in this paper. Instead of de-correlating the color primaries, we employ the existing inter-color correlation to approximate two of the components as a parametric function of the third one, called the base component. We then propose to encode the parameters of the approximation function and part of the approximation errors. We use the DCT (discrete cosine transform) block transform to enhance the algorithm's performance. Thus the approximation of two of the color components based on the third color is performed for each DCT subband separately. We use the rate-distortion theory of subband transform coders to optimize the algorithm's bit allocation for each subband and to find the optimal color components transform to be applied prior to coding. This pre-processing stage is similar to the use of the RGB to YUV transform in JPEG and may further enhance the algorithm's performance. We introduce and compare two versions of the new algorithm and show that by using a Laplacian probability model for the DCT coefficients as well as down-sampling the subordinate colors, the compression results are further improved. Simulation results are provided showing that the new CBA algorithms are superior to presently available algorithms based on the common de-correlation approach, such as JPEG.
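
The correlation-based idea can be sketched per DCT position: fit each subordinate colour coefficient as a linear function of the base-colour coefficient across blocks, then keep only the parameters and residuals. The least-squares fit below is an illustration of this principle, not the paper's optimised bit allocation or colour transform.

```python
import numpy as np
from scipy.fft import dctn

def fit_subordinate_to_base(base_blocks, sub_blocks):
    """Per-subband linear fit of a subordinate colour to the base colour.

    For each DCT position (u, v) the subordinate coefficients are approximated
    as  S(u, v) ~ alpha(u, v) * B(u, v) + beta(u, v)  across all 8x8 blocks,
    by least squares.  Only the parameters and residuals would then be coded.
    """
    B = np.stack([dctn(b.astype(np.float64), norm='ortho') for b in base_blocks])
    S = np.stack([dctn(s.astype(np.float64), norm='ortho') for s in sub_blocks])
    mb, ms = B.mean(axis=0), S.mean(axis=0)
    cov = ((B - mb) * (S - ms)).mean(axis=0)
    var = ((B - mb) ** 2).mean(axis=0) + 1e-12
    alpha = cov / var
    beta = ms - alpha * mb
    residual = S - (alpha * B + beta)
    return alpha, beta, residual

# Example with strongly correlated R (base) and G (subordinate) blocks.
rng = np.random.default_rng(3)
r = [rng.integers(0, 256, (8, 8)) for _ in range(100)]
g = [0.8 * blk + 20 + rng.normal(0, 2, (8, 8)) for blk in r]
alpha, beta, res = fit_subordinate_to_base(r, g)
```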

Journal ArticleDOI
TL;DR: It is shown that the DCT (discrete cosine transform) can be used to transform the RGB components into an efficient set of color components suitable for subband coding, and that the optimal rates can also be used to design adaptive quantization tables in the coding stage, with results superior to fixed quantization tables.

Abstract: Although subband transform coding is a useful approach to image compression and communication, the performance of this method has not been analyzed so far for color images, especially when the selection of color components is considered. Obviously, the RGB components are not suitable for such a compression method due to their high inter-color correlation. On the other hand, the common selection of YUV or YIQ is rather arbitrary and in most cases not optimal. In this work we introduce a rate-distortion model for color image compression and employ it to find the optimal color components and optimal bit allocation (optimal rates) for the compression. We show that the DCT (discrete cosine transform) can be used to transform the RGB components into an efficient set of color components suitable for subband coding. The optimal rates can also be used to design adaptive quantization tables in the coding stage, with results superior to fixed quantization tables. Based on the presented results, our conclusion is that the new approach can improve presently available methods for color image compression and communication.
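
The component transform the abstract argues for amounts to a 3-point DCT applied across the colour axis of every pixel, which can be written in a couple of lines; the subsequent subband coding and adaptive quantisation-table design are not shown.

```python
import numpy as np
from scipy.fft import dct

def rgb_to_dct_components(img):
    """Apply a 3-point DCT across the colour axis of an RGB image.

    The length-3 DCT on the (R, G, B) vector of every pixel decorrelates the
    colours into one luminance-like and two chrominance-like planes that are
    better suited to subband coding than raw RGB.
    """
    return dct(img.astype(np.float64), type=2, norm='ortho', axis=2)

rng = np.random.default_rng(4)
rgb = rng.integers(0, 256, (32, 32, 3))
comps = rgb_to_dct_components(rgb)
# First component is proportional to R+G+B; the colour planes are decorrelated.
print(np.allclose(comps[..., 0], rgb.sum(axis=2) / np.sqrt(3)))
```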

Journal ArticleDOI
TL;DR: An application-independent task mapping and scheduling solution in multi-hop VSNs is presented that provides real-time guarantees to process video feeds and outperforms existing mechanisms in terms of guaranteeing application deadlines with minimum energy consumption.
Abstract: Video sensor networks (VSNs) have become a recent research focus due to the rich information they provide for various data-hungry applications. However, VSN implementations face stringent constraints on communication bandwidth, processing capability, and power supply. In-network processing has been proposed as an efficient means to address these problems. The key component of in-network processing, the task mapping and scheduling problem, is investigated in this paper. Although task mapping and scheduling in wired networks of processors has been extensively studied, its application to VSNs remains largely unexplored. Existing algorithms cannot be directly implemented in VSNs due to limited resource availability and the shared wireless communication medium. In this work, an application-independent task mapping and scheduling solution for multi-hop VSNs is presented that provides real-time guarantees for processing video feeds. The processed data are smaller in volume, which further relieves the burden on end-to-end communication. Using a novel multi-hop channel model and a communication scheduling algorithm, computation tasks and associated communication events are scheduled simultaneously with a dynamic critical-path scheduling algorithm. A dynamic voltage scaling (DVS) mechanism is implemented to further optimize energy consumption. According to the simulation results, the proposed solution outperforms existing mechanisms in terms of guaranteeing application deadlines with minimum energy consumption.

Journal ArticleDOI
TL;DR: A point-sampled approach for capturing 3D video footage and subsequent re-rendering of real-world scenes, which allows for efficient post-processing algorithms and leads to a high resulting rendering quality using enhanced probabilistic EWA volume splatting.
Abstract: This paper presents a point-sampled approach for capturing 3D video footage and subsequent re-rendering of real-world scenes. The acquisition system is composed of multiple sparsely placed 3D video bricks. The bricks contain a low-cost projector, two grayscale cameras and a high-resolution color camera. To improve on depth calculation we rely on structured light patterns. Texture images and pattern-augmented views of the scene are acquired simultaneously by time multiplexed projections of complementary patterns and synchronized camera exposures. High-resolution depth maps are extracted using depth-from-stereo algorithms performed on the acquired pattern images. The surface samples corresponding to the depth values are merged into a view-independent, point-based 3D data structure. This representation allows for efficient post-processing algorithms and leads to a high resulting rendering quality using enhanced probabilistic EWA volume splatting. In this paper, we focus on the 3D video acquisition system and necessary image and video processing techniques.

Journal ArticleDOI
TL;DR: This paper describes a video retrieval system based on local invariant region descriptors that is highly robust to camera and object motions and can withstand severe illumination changes.

Abstract: This paper describes a video retrieval system based on local invariant region descriptors. A novel framework is proposed for combined video segmentation, content extraction and retrieval. A similarity measure, previously proposed by the authors based on local region features, is used for video segmentation. The local regions are tracked throughout a shot and stable features are extracted. The conventional key frame method is replaced with these stable local features to characterise different shots. A grouping technique is introduced to combine these stable tracks into meaningful object clusters. The above method can handle the different scales of object appearance in videos. Compared to previous video retrieval approaches, the proposed method is highly robust to camera and object motions and can withstand severe illumination changes. The proposed framework is applied to scene and object retrieval experiments and significant improvement in performance is demonstrated.

Journal ArticleDOI
TL;DR: A variation of the spectral subtraction method, based on a power spectrum surface of revolution, is proposed and is found to compare favourably with existing direct deconvolution methods for defocus blur identification.
Abstract: A defocus blur metric for use in blind image quality assessment is proposed. Blind image deconvolution methods are used to determine the metric. Existing direct deconvolution methods based on the cepstrum, bicepstrum and on a spectral subtraction technique are compared across 210 images. A variation of the spectral subtraction method, based on a power spectrum surface of revolution, is proposed and is found to compare favourably with existing direct deconvolution methods for defocus blur identification. The method is found to be especially useful when distinguishing between in-focus and out-of-focus images.
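
The 'power spectrum surface of revolution' ingredient reduces, in practice, to a radially averaged (log) power spectrum, whose dips indicate the defocus blur radius. The sketch below computes that 1-D profile only; locating the dips and the comparison against cepstral methods are not reproduced, and the binning scheme is an assumption.

```python
import numpy as np

def radial_log_power_spectrum(gray, n_bins=64):
    """Radially averaged log power spectrum of an image.

    A defocused (disc-blurred) image shows characteristic dips in this 1-D
    profile; locating the first dip gives an estimate of the blur radius.
    """
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    logp = np.log1p(np.abs(f) ** 2)
    h, w = gray.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2)
    bins = np.clip((r / r.max() * n_bins).astype(int), 0, n_bins - 1)
    profile = np.bincount(bins.ravel(), weights=logp.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return profile / np.maximum(counts, 1)

profile = radial_log_power_spectrum(np.random.default_rng(5).random((128, 128)))
```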

Journal ArticleDOI
TL;DR: This work considers two traditional metrics for evaluating performance in automatic image annotation, the normalised score (NS) and the precision/recall (PR) statistics, particularly in connection with a de facto standard 5000 Corel image benchmark annotation task.
Abstract: In this work we consider two traditional metrics for evaluating performance in automatic image annotation, the normalised score (NS) and the precision/recall (PR) statistics, particularly in connection with a de facto standard 5000-image Corel benchmark annotation task. We also motivate and describe another performance measure, de-symmetrised termwise mutual information (DTMI), as a principled compromise between the two traditional extremes. In addition to discussing the measures theoretically, we correlate them experimentally for a family of annotation system configurations derived from the PicSOM image content analysis framework. Looking at the obtained performance figures, we notice that such a system, based on adaptive fusion of numerous global image features, clearly outperforms the methods considered in the literature.

Journal ArticleDOI
TL;DR: In this article, the optical field arising from a 3D object is computed and the driver signals for a given optical display device are then generated to generate a desired optical field in space.
Abstract: Image capture and image display will most likely be decoupled in future 3DTV systems. Due to the need to convert abstract representations of 3D images to display driver signals, and to explicitly consider optical diffraction and propagation effects, it is expected that signal processing issues will be of fundamental importance in 3DTV systems. Since diffraction between two parallel planes can be represented as a 2D linear shift-invariant system, various signal processing techniques naturally play an important role. Diffraction between tilted planes can also be modeled as a relatively simple system, leading to efficient discrete computations. Two fundamental problems are digital computation of the optical field arising from a 3D object, and finding the driver signals for a given optical display device which will then generate a desired optical field in space. The discretization of optical signals leads to several interesting issues; for example, it is possible to violate the Nyquist rate while sampling, but still achieve full reconstruction. The fractional Fourier transform is another signal processing tool which finds applications in optical wave propagation.
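
Since diffraction between two parallel planes is a 2-D linear shift-invariant system, it can be computed as a frequency-domain multiplication. The following is a generic angular-spectrum propagation routine given as an illustration of that point; the sampling parameters are arbitrary, and the fractional-Fourier and tilted-plane cases discussed in the paper are not covered.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a sampled optical field between two parallel planes.

    The diffraction operator is applied as a multiplication in the spatial
    frequency domain (angular spectrum method); evanescent components are
    suppressed.  This is a textbook implementation, not the paper's algorithm.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * z) * (arg > 0)       # drop evanescent waves
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# A 256x256 aperture propagated 5 cm at 633 nm with 10 um sampling.
aperture = np.zeros((256, 256)); aperture[112:144, 112:144] = 1.0
out = angular_spectrum_propagate(aperture, 633e-9, 10e-6, 0.05)
```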

Journal ArticleDOI
TL;DR: A framework to support semantic-based video classification and annotation is presented, and experiments involving several hours of MPEG video and around 1000 candidate logotypes have been carried out to show the robustness of both the detection and classification processes.

Abstract: In conventional video production, logotypes are used to convey information about the content originator or the actual video content. Logotypes contain information that is critical to infer genre, class and other important semantic features of video. This paper presents a framework to support semantic-based video classification and annotation. The backbone of the proposed framework is a technique for logotype extraction and recognition. The method consists of two main processing stages. The first stage performs temporal and spatial segmentation by calculating the minimal luminance variance region (MLVR) for a set of frames. Non-linear diffusion filters (NLDF) are used at this stage to reduce noise in the shape of the logotype. In the second stage, logotype classification and recognition are achieved. The earth mover's distance (EMD) is used as a metric to decide whether the detected MLVR belongs to one of the following logotype categories: learned or candidate. Learned logos are semantically annotated shapes available in the database. The semantic characterization of such logos is obtained through an iterative learning process. Candidate logos are non-annotated shapes extracted during the first processing stage. They are assigned to clusters grouping different instances of logos of similar shape. Using these clusters, false logotypes are removed and different instances of the same logo are averaged to obtain a unique prototype representing the underlying noisy cluster. Experiments involving several hours of MPEG video and around 1000 candidate logotypes have been carried out in order to show the robustness of both the detection and classification processes.
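
The MLVR computation can be approximated very simply: logotypes are static overlays, so their pixels have low temporal luminance variance. The sketch thresholds per-pixel variance over a frame window; the non-linear diffusion filtering, EMD-based matching and clustering stages are not reproduced, and the threshold is illustrative.

```python
import numpy as np

def minimal_luminance_variance_region(frames, var_threshold=20.0):
    """Mask of pixels whose luminance barely changes over a set of frames.

    Static overlays such as logotypes exhibit a much lower temporal luminance
    variance than the moving programme content behind them; thresholding the
    per-pixel variance gives a candidate logo mask.
    """
    stack = np.stack([f.astype(np.float64) for f in frames])
    return stack.var(axis=0) < var_threshold

# Example: a static bright corner patch on top of random "programme" frames.
rng = np.random.default_rng(6)
frames = rng.integers(0, 256, (30, 72, 96)).astype(np.float64)
frames[:, 4:16, 4:28] = 230.0
mask = minimal_luminance_variance_region(list(frames))
print(mask[4:16, 4:28].all(), mask.mean())   # True, roughly the patch fraction
```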

Journal ArticleDOI
TL;DR: In this work, the discrete form of the plane wave decomposition is used to calculate the diffraction field from a given set of arbitrarily distributed data points in space, and two approaches, based on matrix inversion and on projections onto convex sets (POCS), are studied.

Abstract: Computation of the diffraction field from a given set of arbitrarily distributed data points in space is an important signal processing problem arising in digital holographic 3D displays. The field arising from such distributed data points has to be solved simultaneously by considering all mutual couplings to get correct results. In our approach, the discrete form of the plane wave decomposition is used to calculate the diffraction field. Two approaches, based on matrix inversion and on projections onto convex sets (POCS), are studied. Both approaches are able to obtain the desired field when the number of given data points is larger than the number of data points on a transverse cross-section of the space. The POCS-based algorithm outperforms the matrix-inversion-based algorithm when the number of known data points is large.

Journal ArticleDOI
TL;DR: Two methods for bit-stream allocation based on the concept of fractional bit-planes are presented, together with results of selected experiments measuring the peak signal-to-noise ratio (PSNR) of decoded video at various bit-rates.
Abstract: The demand for video access services through wireless networks, as important parts of larger heterogeneous networks, is constantly increasing. To cope with this demand, flexible compression technology to enable optimum coding performance, especially at low bit-rates, is required. In this context, scalable video coding emerges as the most promising technology. A critical problem in wavelet-based scalable video coding is bit-stream allocation at any bit-rate and in particular when low bit-rates are targeted. In this paper two methods for bit-stream allocation based on the concept of fractional bit-planes are reported. The first method assumes that minimum rate-distortion (R-D) slope of the same fractional bit-plane within the same bit-plane across different subbands is higher than or equal to the maximum R-D slope of the next fractional bit-plane. This method is characterised by a very low complexity since no distortion evaluation is required. Contrasting this approach, in the second method the distortion caused by quantisation of the wavelet coefficients is considered. Here, a simple yet effective statistical distortion model that is used for estimation of R-D slopes for each fractional bit-plane is derived. Three different strategies are derived from this method. In the first one it is assumed that the used wavelet is nearly orthogonal, i.e. the distortion in the transform domain is treated as being equivalent to the distortion in the signal domain. To reduce the error caused by direct distortion evaluation in the wavelet domain, the weighting factors are applied to the used statistical distortion model in the second strategy. In the last strategy, the derived statistical model is used during the bit-plane encoding to determine optimal position of the fractional bit-plane corresponding to refinement information in the compressed bit-stream. Results of selected experiments measuring peak signal to noise ratio (PSNR) of decoded video at various bit-rates are reported. Additionally, the PSNR of decoded video at various bit-rates is measured for two specific cases: when the methods for bit-stream allocation are used to assign quality layers in the compressed bit-stream, and when quality layers are not assigned.

Journal ArticleDOI
TL;DR: The novelty of the proposed technique lies in the use of JPWL error resilience tools for codestream partitioning and in optimized UPA among JPWL packets based on genetic algorithms (GA) and supported by "light" FEC channel coding; the proposed system is compared to state-of-the-art UEP techniques for JPEG2000 transmission.

Abstract: This paper deals with the efficient and robust wireless broadcasting of JPEG2000 digital cinema (DC) streams from studios to theatres. Several unequal error protection (UEP) techniques have been proposed in the literature for the transmission of JPEG2000 images. Some are based on variable forward error correction (FEC) coding applied to different parts of the stream according to their importance. Alternatively, UEP can be achieved by means of unequal power allocation (UPA) schemes based on differentiated transmission power over the stream. On the other hand, in DC applications UPA achieves weak performance if considered as the only protection strategy, unless a high power budget is assigned to transmission. This work proposes a novel hybrid FEC-UPA system adopting the resilience tools of the JPEG2000 wireless (JPWL) standard. The JPWL stream is partitioned into a certain number of packet groups to which "light" FEC coding is applied. Groups are then transmitted through separate wavelet packet division multiplexing (WPDM) sub-channels at different power levels. Both stream partitioning and UPA are driven by the sensitivities of the JPWL packets to channel errors. The novelty of the proposed technique lies in the use of JPWL error resilience tools for codestream partitioning and in optimized UPA among JPWL packets based on genetic algorithms (GA) and supported by "light" FEC channel coding. The proposed system is compared to state-of-the-art UEP techniques for JPEG2000 transmission. The performance is evaluated for transmission over wireless channels with both sparse and packet error statistics. Experiments show that the proposed approach achieves an average peak signal-to-noise ratio (PSNR) on the reconstructed frames compliant with the standard quality required by DC applications (40 dB) for bit error rates (BER) up to 10^-4.

Journal ArticleDOI
TL;DR: A joint source channel coding (JSCC) scheme for the transmission of still images in wireless communication applications is proposed that preserves the topological organization of the codebook along the transmission chain while keeping a reduced-complexity system.

Abstract: In this paper, we propose a joint source channel coding (JSCC) scheme for the transmission of still images in wireless communication applications. The ionospheric channel, which presents some characteristics identical to those found on mobile radio channels, such as fading, multipath and the Doppler effect, is our test channel. As this method is based on a wavelet transform, a self-organising map (SOM) vector quantization (VQ) optimally mapped onto a QAM digital modulation, and an unequal error protection (UEP) strategy, it is particularly well adapted to low bit-rate applications. The compression process consists of applying the SOM VQ to the discrete wavelet transform coefficients and computing several codebooks depending on the sub-images preserved. UEP is achieved with a correcting code applied to the most significant data. The JSCC consists of an optimal mapping of the VQ codebook vectors onto a high-spectral-efficiency digital modulation. This feature preserves the topological organization of the codebook along the transmission chain while keeping a reduced-complexity system. This method, applied here to grey-level images, can be used for colour images as well. Several transmission tests on different images have shown the robustness of this method even for high bit error rates (BER > 10^-2). In order to assess the quality of the image after transmission, we use a PSNR% (peak signal-to-noise ratio) parameter, which is the difference between the PSNR after compression at the transmitter and the PSNR after reception at the receiver. This parameter clearly shows that 95% of the PSNR is preserved when the BER is less than 10^-2.

Journal ArticleDOI
TL;DR: An analysis of data embedding in two-dimensional signals based on DCT phase modulation is presented, and closed-form expressions are developed for estimating the number of bits that can be embedded given a specific distortion measure and the probability of bit error.

Abstract: This paper presents an analysis of data embedding in two-dimensional signals based on DCT phase modulation. A communication system model for this data embedding scheme is developed. Closed-form expressions are developed for estimating the number of bits that can be embedded given a specific distortion measure and the probability of bit error. The data embedding process is viewed as transmitting data through a binary symmetric channel with crossover error probabilities that depend only on the power in the selected coefficients and the noise created by the signal processing operations undergone by the image.
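
For a real-valued DCT coefficient, phase modulation effectively reduces to sign modulation, which suggests the following toy embedder and extractor for one bit per 8x8 block. The coefficient position and embedding strength are assumptions, and the pixel-range rounding that creates the 'channel noise' analysed in the paper is deliberately left out.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_bit_in_block(block, bit, pos=(3, 2), strength=25.0):
    """Embed one bit by modulating the sign (phase) of a mid-frequency DCT
    coefficient of an 8x8 block: positive for 1, negative for 0, with at least
    `strength` magnitude.  Rounding/clipping to [0, 255], which would act as
    channel noise, is deliberately omitted in this sketch.
    """
    c = dctn(np.asarray(block, dtype=np.float64), norm='ortho')
    magnitude = max(abs(c[pos]), strength)
    c[pos] = magnitude if bit else -magnitude
    return idctn(c, norm='ortho')

def extract_bit_from_block(block, pos=(3, 2)):
    """Recover the bit from the sign of the selected DCT coefficient."""
    return int(dctn(np.asarray(block, dtype=np.float64), norm='ortho')[pos] > 0)

block = np.random.default_rng(7).integers(0, 256, (8, 8))
marked = embed_bit_in_block(block, bit=1)
print(extract_bit_from_block(marked))   # 1
```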

Journal ArticleDOI
TL;DR: Two fast multiresolution motion estimation algorithms in the shift-invariant wavelet domain are proposed: one is the wavelet matching error characteristic based partial distortion search (WMEC-PDS) algorithm, which improves the computational efficiency of conventional partial distortion search algorithms while keeping the same estimation accuracy as the FSA; the other is the anisotropic double cross search (ADCS) algorithm using a multiresolution spatio-temporal context.

Abstract: Motion estimation and compensation in the wavelet domain have received much attention recently. To overcome the inefficiency of motion estimation in the critically sampled wavelet domain, the low-band-shift (LBS) method and the complete-to-overcomplete discrete wavelet transform (CODWT) method have been proposed for motion estimation in the shift-invariant wavelet domain. However, a major disadvantage of these methods is the computational complexity. Although the CODWT method has reduced the computational complexity by skipping the inverse wavelet transform and making a direct link between the critically sampled subbands and the shift-invariant subbands, the full search algorithm (FSA) increases it. In this paper, we propose two fast multiresolution motion estimation algorithms in the shift-invariant wavelet domain: one is the wavelet matching error characteristic based partial distortion search (WMEC-PDS) algorithm, which improves the computational efficiency of conventional partial distortion search algorithms while keeping the same estimation accuracy as the FSA; the other is the anisotropic double cross search (ADCS) algorithm using a multiresolution spatio-temporal context, which provides a significant computational load reduction while only introducing negligible distortion compared with the FSA. Due to their multiresolution nature, both of the proposed approaches can be applied to wavelet-based scalable video coding. Experimental results show the superiority of the proposed fast motion estimation algorithms against other fast algorithms in terms of speed-up and quality.
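
The partial-distortion idea is easiest to see in the spatial domain: accumulate the SAD row by row and abandon a candidate as soon as the running sum exceeds the best match found so far. The sketch below applies this inside a plain full search; the wavelet matching-error characteristics that WMEC-PDS uses to order comparisons, and the shift-invariant wavelet domain itself, are not modelled.

```python
import numpy as np

def partial_distortion_sad(cur, ref, best_so_far):
    """SAD with early termination (the partial-distortion idea).

    The accumulated row-wise SAD is compared against the best distortion found
    so far; as soon as it exceeds that bound, the candidate is rejected without
    computing the remaining rows.
    """
    sad = 0.0
    for row in range(cur.shape[0]):
        sad += np.abs(cur[row].astype(np.float64) - ref[row]).sum()
        if sad >= best_so_far:
            return None                      # candidate rejected early
    return sad

def full_search(cur_block, ref_frame, y0, x0, search=8):
    """Full search over a +/- `search` window using the early-exit SAD."""
    best, best_mv = np.inf, (0, 0)
    h, w = cur_block.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if 0 <= y <= ref_frame.shape[0] - h and 0 <= x <= ref_frame.shape[1] - w:
                sad = partial_distortion_sad(cur_block, ref_frame[y:y + h, x:x + w], best)
                if sad is not None and sad < best:
                    best, best_mv = sad, (dy, dx)
    return best_mv, best

ref = np.random.default_rng(8).integers(0, 256, (64, 64)).astype(np.float64)
cur_block = ref[20:36, 24:40]                # true motion is (0, 0) here
print(full_search(cur_block, ref, 20, 24))   # ((0, 0), 0.0)
```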

Journal ArticleDOI
TL;DR: This paper develops an analytical model of the expected video distortion at the decoder with respect to the expected latency for TCP, the packetization mechanism, and the error-concealment method used at thedecoder.
Abstract: In this paper we explore the use of a new rate-distortion metric for optimizing real-time Internet video streaming with the transmission control protocol (TCP). We lay out the groundwork by developing a simple model that characterizes the expected latency for packets sent with TCP-Reno. Subsequently, we develop an analytical model of the expected video distortion at the decoder with respect to the expected latency for TCP, the packetization mechanism, and the error-concealment method used at the decoder. By characterizing the protocol/channel pair more accurately, we obtain a better estimate of the expected distortion and the available channel rate. This better knowledge is exploited in the design of a new algorithm for rate-distortion optimized encoding mode selection for video streaming with TCP. Experimental results for real-time video streaming show PSNR improvements of around 2 dB over metrics that do not consider the behavior of the transport protocol.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed method can effectively mitigate the error propagation due to packet loss as well as achieve fairness among clients in a multicast.
Abstract: In this paper, we propose a two-pass error-resilience transcoding scheme based on adaptive intra-refresh for inserting error-resilience features to a compressed video at the intermediate transcoder of a three-tier streaming system. The proposed transcoder adaptively adjusts the intra-refresh rate according to the video content and the channel's packet-loss rate to protect the most important macroblocks against packet loss. In this work, we consider the problem of multicast of video to multiple clients having disparate channel-loss profiles. We propose a MINMAX loss rate estimation scheme to determine a single intra-refresh rate for all the clients in a multicast group. For the scenario that a quality variation constraint is imposed on the users, we also propose a grouping method to partition a multicast group of heterogeneous users into a minimal number of subgroups to minimize the channel bandwidth consumption while meeting the quality variation constraint. Experimental results show that the proposed method can effectively mitigate the error propagation due to packet loss as well as achieve fairness among clients in a multicast.

Journal ArticleDOI
TL;DR: A temporal recognition scheme that classifies a given image in an unseen video into one of the universal facial expression categories using an analysis-synthesis scheme and an efficient recognition scheme based on the detection of keyframes in videos are proposed.
Abstract: In this paper, we propose a novel approach for facial expression analysis and recognition. The main contributions of the paper are as follows. First, we propose a temporal recognition scheme that classifies a given image in an unseen video into one of the universal facial expression categories using an analysis-synthesis scheme. The proposed approach relies on tracked facial actions provided by a real-time face tracker. Second, we propose an efficient recognition scheme based on the detection of keyframes in videos. Third, we use the proposed method for extending the human-machine interaction functionality of the AIBO robot. More precisely, the robot displays an emotional state in response to the user's recognized facial expression. Experiments using unseen videos demonstrated the effectiveness of the developed methods.

Journal ArticleDOI
TL;DR: This work presents an on-line approach to the selection of a variable number of frames from a compressed video sequence, attending only to selection rules applied over domain-independent semantic features.

Abstract: This work presents an on-line approach to the selection of a variable number of frames from a compressed video sequence, attending only to rules applied over domain-independent semantic features. The localization of these semantic features helps infer the heterogeneous distribution of semantically relevant information, which makes it possible to reduce the amount of adapted data while preserving meaningful information. The extraction of the required features is performed on-line, as demanded by many leading applications. This is achieved via techniques operating in the compressed domain, which have been adapted to operate on-line, following a functional analysis model that works transparently over both DCT-based and wavelet-based scalable video. The main innovations presented here are the adaptation of feature extraction techniques to operate on-line, the functional model that achieves independence of the coding scheme, and the subjective evaluation of on-line frame selection validating our results.