
Showing papers in "Signal Processing: Image Communication" in 2008


Journal ArticleDOI
TL;DR: In this article, the performance of permutation-only multimedia ciphers against known/chosen-plaintext attacks was analyzed, showing that only O(log_L(MN)) chosen plaintexts are sufficient to recover, on average, at least half of the plaintext elements.
Abstract: In recent years secret permutations have been widely used for protecting different types of multimedia data, including speech files, digital images and videos. Based on a general model of permutation-only multimedia ciphers, this paper performs a quantitative cryptanalysis of the performance of this kind of cipher against plaintext attacks. When the plaintext is of size M×N and with L different levels of values, the following quantitative cryptanalytic findings have been concluded under the assumption of a uniform distribution of each element in the plaintext: (1) all permutation-only multimedia ciphers are practically insecure against known/chosen-plaintext attacks in the sense that only O(log_L(MN)) known/chosen plaintexts are sufficient to recover not less than (in an average sense) half of the elements of the plaintext; (2) the computational complexity of the known/chosen-plaintext attack is only O(n·(MN)^2), where n is the number of known/chosen plaintexts used. When the plaintext has a non-uniform distribution, the number of required plaintexts and the computational complexity are also discussed. Experiments are given to demonstrate the real performance of the known-plaintext attack for a typical permutation-only image cipher.
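To make the attack concrete, here is a minimal Python/NumPy sketch (not code from the paper): each known plaintext/ciphertext pair prunes, for every ciphertext position, the set of plaintext positions with matching values, so candidate sets shrink by roughly a factor of L per pair. The 64-element toy image and L = 4 are illustrative choices.

```python
import numpy as np

def recover_permutation(plaintexts, ciphertexts):
    # For each ciphertext position j, keep only the plaintext positions whose
    # value matches ciphertexts[k][j] in every available pair k.
    n_pairs, size = plaintexts.shape
    recovered = {}
    for j in range(size):
        candidates = np.arange(size)
        for p, c in zip(plaintexts, ciphertexts):
            candidates = candidates[p[candidates] == c[j]]
            if candidates.size <= 1:
                break
        if candidates.size == 1:
            recovered[j] = int(candidates[0])   # plaintext index mapped to j
    return recovered

# Toy demo: a secret permutation of a 64-element "image" with L = 4 gray levels.
rng = np.random.default_rng(0)
size, n_pairs = 64, 4                 # roughly log_4(64) = 3 pairs expected
perm = rng.permutation(size)
plain = rng.integers(0, 4, size=(n_pairs, size))
cipher = plain[:, perm]               # cipher[k, j] = plain[k, perm[j]]
found = recover_permutation(plain, cipher)
print(f"{len(found)} of {size} positions uniquely recovered")
```

The nested loops over positions reflect the O(n·(MN)^2) complexity quoted in the abstract.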

336 citations


Journal ArticleDOI
TL;DR: A new approach to designing a no-reference image quality evaluation model for JPEG2000 images is proposed, based on pixel distortions and edge information; the model achieves good quality prediction performance.
Abstract: Perceptual image quality evaluation has become an important issue, due to the increasing transmission of multimedia content over the Internet and 3G mobile networks. Most no-reference perceptual image quality evaluations have traditionally attempted to quantify the predefined artifacts of coded images. Under the assumption that human visual perception is very sensitive to the edge information of an image and that any kind of artifact creates pixel distortion, we propose in this paper a new approach to designing a no-reference image quality evaluation model for JPEG2000 images that uses pixel distortions and edge information. Subjective experiment results on the images are used to train and test the model, which has achieved good quality prediction performance.

144 citations


Journal ArticleDOI
TL;DR: An image quality criterion is proposed, called C4, which is fully generic and based on a rather elaborate model of the human visual system and shows a high correlation between produced objective quality scores and subjective ones, even for images that have been distorted through several different distortion processes.
Abstract: When an image is supposed to have been transformed by a process like image enhancement or lossy image compression for storing or transmission, it is often necessary to measure the quality of the distorted image. This can be achieved using an image processing method called a "quality criterion". Such a process must produce objective quality scores in close relationship with subjective quality scores given by human observers during subjective quality assessment tests. In this paper, an image quality criterion is proposed. This criterion, called C4, is fully generic (i.e., not designed for predefined distortion types or for particular image types) and based on a rather elaborate model of the human visual system (HVS). This model describes the organization and operation of many stages of vision, from the eye to the ventral and dorsal pathways in the visual cortex. The novelty of this quality criterion relies on the extraction, from an image represented in a perceptual space, of visual features that can be compared to those used by the HVS. Then a similarity metric computes the objective quality score of a distorted image by comparing the features extracted from this image to features extracted from its reference image (i.e., not distorted). Results show a high correlation between the produced objective quality scores and subjective ones, even for images that have been distorted through several different distortion processes. To illustrate these performances, they have been computed using three different databases that employed different contents, distortion types, displays, viewing conditions and subjective protocols. The features extracted from the reference image constitute a reduced reference which, in a transmission context with data compression, can be computed at the sender side and transmitted in addition to the compressed image data so that the quality of the decompressed image can be objectively assessed at the receiver side. Moreover, the size of the reduced reference is flexible. This work has been integrated into freely available applications in order to offer a practical alternative to the PSNR criterion, which is still too often used despite its low correlation with human judgments. These applications also enable quality assessment for image transmission purposes.

135 citations


Journal ArticleDOI
TL;DR: This paper intends to contribute to the identification of the most DVC-friendly application scenarios, highlighting the expected benefits and drawbacks for each studied scenario.
Abstract: Distributed Video Coding (DVC) is a new video coding paradigm based on two major Information Theory results: the Slepian-Wolf and Wyner-Ziv theorems. Recently, practical DVC solutions have been proposed with promising results; however, there is still a need to study in a more systematic way the set of application scenarios for which DVC may bring major advantages. This paper intends to contribute to the identification of the most DVC-friendly application scenarios, highlighting the expected benefits and drawbacks for each studied scenario. This selection is based on a proposed methodology which involves the characterization and clustering of the applications according to their most relevant characteristics, and their matching with the main potential DVC benefits.

99 citations


Journal ArticleDOI
TL;DR: A technique for unsupervised learning of forward motion vectors during the decoding of a frame with reference to its previous reconstructed frame is proposed; it is an instance of the expectation maximization algorithm.
Abstract: Distributed source coding theory has long promised a new method of encoding video that is much lower in complexity than conventional methods. In the distributed framework, the decoder is tasked with exploiting the redundancy of the video signal. Among the difficulties in realizing a practical codec has been the problem of motion estimation at the decoder. In this paper, we propose a technique for unsupervised learning of forward motion vectors during the decoding of a frame with reference to its previous reconstructed frame. The technique, described for both pixel-domain and transform-domain coding, is an instance of the expectation maximization algorithm. The performance of our transform-domain motion learning video codec improves as GOP size grows. It is better than using motion-compensated temporal interpolation by 0.5 dB when GOP size is 2, and by even more when GOP size is larger. It performs within about 0.25 dB of a codec that knows the motion vectors through an oracle, but is hundreds of orders of magnitude less complex than a corresponding brute-force decoder motion search approach would be.

96 citations


Journal ArticleDOI
TL;DR: The main purpose and novelty of this paper is the solid and comprehensive performance evaluation made, which will provide a strong, and very much needed, performance reference for researchers in this WZ video coding field, as well as a solid way to steer future WZ video coding research.
Abstract: Wyner-Ziv (WZ) video coding, a particular case of distributed video coding (DVC), is a new video coding paradigm based on two major Information Theory results: the Slepian-Wolf and Wyner-Ziv theorems. In recent years, some practical WZ video coding solutions have been proposed with promising results. One of the most popular WZ video coding architectures in the literature uses turbo-code-based Slepian-Wolf coding and a feedback channel to perform rate control at the decoder. This WZ video coding architecture was first proposed by researchers at Stanford University and has since been adopted and improved by many research groups around the world. However, while there are many papers published with changes and improvements to this architecture, a precise and detailed evaluation of its performance, targeting its deep understanding for future advances, has not been made. Available performance results are mostly partial, obtained under unclear and incompatible conditions, using vaguely defined and sometimes architecturally unrealistic codec solutions. This paper targets the provision of a detailed, clear, and complete performance evaluation of an advanced transform-domain WZ video codec derived from the Stanford turbo coding and feedback channel based architecture. Although the WZ video codec proposed for this evaluation is among the best available, the main purpose and novelty of this paper is the solid and comprehensive performance evaluation made, which will provide a strong, and very much needed, performance reference for researchers in this WZ video coding field, as well as a solid way to steer future WZ video coding research.

87 citations


Journal ArticleDOI
Haohao Song, Songyu Yu, Xiaokang Yang, Li Song, Chen Wang
TL;DR: The proposed contourlet-based image adaptive watermarking (CIAW) scheme is particularly superior to conventional watermarking schemes when the watermarked image is attacked by image processing methods that destroy the HF subbands, thanks to the watermarking components preserved in the LF subbands.
Abstract: In the contourlet transform (CT), the Laplacian pyramid (LP) decomposes an image into a low-frequency (LF) subband and a high-frequency (HF) subband. The LF subband is created by filtering the original image with a 2-D low-pass filter. The HF subband, however, is created by subtracting the synthesized LF subband from the original image rather than by 2-D high-pass filtering the original image. In this paper, we propose a contourlet-based image adaptive watermarking (CIAW) scheme, in which the watermark is embedded into the contourlet coefficients of the largest detail subbands of the image. The transform structure of the LP makes the embedded watermark spread out into all subbands, including the LF subbands, when the watermarked image is reconstructed from the watermarked contourlet coefficients. Since both the LF subbands and the HF subbands contain watermarking components, our watermarking scheme is expected to be robust against both LF and HF image processing attacks. A corresponding watermark detection algorithm is proposed to decide whether the watermark is present or not by exploiting the unique transform structure of the LP. With the newly proposed concept of the spread watermark, the watermark is detected by computing the correlation between the spread watermark and the watermarked image across all contourlet subbands. The proposed CIAW scheme is particularly superior to conventional watermarking schemes when the watermarked image is attacked by image processing methods that destroy the HF subbands, thanks to the watermarking components preserved in the LF subbands. Experimental results show the validity of CIAW in terms of both watermarking invisibility and watermarking robustness. In addition, comparison experiments further confirm the high efficiency of CIAW.
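A minimal sketch of the correlation-based detection step, assuming a simple normalized-correlation pooling over subbands (this is not the CIAW detector itself, and the threshold value is an illustrative placeholder):

```python
import numpy as np

def correlation_detect(subband_coeffs, spread_watermark, threshold=0.05):
    # Normalized correlation between the spread watermark and the received
    # coefficients, pooled over all subbands; presence is declared when the
    # correlation exceeds the (illustrative) threshold.
    c = np.concatenate([s.ravel() for s in subband_coeffs]).astype(float)
    w = np.concatenate([s.ravel() for s in spread_watermark]).astype(float)
    c -= c.mean()
    w -= w.mean()
    rho = float(c @ w) / (np.linalg.norm(c) * np.linalg.norm(w) + 1e-12)
    return rho > threshold, rho
```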

62 citations


Journal ArticleDOI
TL;DR: A novel method for repeated sequence detection in an audio-visual TV broadcast is proposed; it relies on a micro-clustering technique that groups similar audio/visual feature vectors and allows inter-program detection and the extraction of useful programs.
Abstract: In this paper, a novel method for repeated sequence detection in an audio-visual TV broadcast is proposed. This method is required for TV broadcast macro-segmentation which is at the root of many novel services related to TV broadcast and in particular to the TV-on-Demand service. Repeated sequence detection allows inter-program detection (commercials, jingles, credits, ...), which allows the segmentation of the TV broadcast and the extraction of useful programs. Our method is completely non-supervised, that is, it does not require a manually created reference database. It relies on a micro-clustering technique that groups similar audio/visual feature vectors. Clusters are then analyzed and repeated sequences are detected. This method is able to continuously analyze the TV broadcast and to periodically return analysis results. The efficiency and effectiveness of the method have been shown on two real broadcasts of 12h and 7 days.
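The micro-clustering idea can be sketched as follows (a simplified, hypothetical version: the radius and minimum temporal gap are illustrative parameters, and the paper's cluster representatives and analysis stage are more elaborate):

```python
import numpy as np

def micro_cluster_repeats(features, radius, min_gap=100):
    # Incremental micro-clustering: a frame feature joins the first cluster
    # whose representative is within 'radius', otherwise it opens a new one.
    # Clusters whose members are far apart in time indicate repeated content.
    reps, members = [], []
    for t, f in enumerate(features):
        for i, r in enumerate(reps):
            if np.linalg.norm(f - r) < radius:
                members[i].append(t)
                break
        else:
            reps.append(np.asarray(f, dtype=float))
            members.append([t])
    return [m for m in members if max(m) - min(m) >= min_gap]
```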

50 citations


Journal ArticleDOI
TL;DR: This paper proposes an efficient no-reference noticeable blockiness estimation algorithm for BDCT coded images; an overall masking map is incorporated with the discontinuity map to generate the Noticeable Blockiness Map (NBM), which can be used to guide perceptual quality assessment, codec parameter optimization, post-processing, etc.
Abstract: Blocking artifacts are a prevailing degradation caused by the block-based Discrete Cosine Transform (BDCT) coding technique under low bit-rate conditions. In this paper, we propose an efficient no-reference noticeable blockiness estimation algorithm for BDCT coded images. The difference on block boundaries is first measured and then transformed into a block discontinuity map. We consider the effects of luminance adaptation and texture masking on blocking and integrate them using a nonlinear operator to form an overall masking map. This map is finally incorporated with the discontinuity map to generate the Noticeable Blockiness Map (NBM), which can be used to guide perceptual quality assessment, codec parameter optimization, post-processing, etc. We have demonstrated the validity of the NBM through its applications in no-reference image quality assessment and image deblocking.
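As an illustration, a simple form of the block-boundary discontinuity measurement might look like the sketch below (an assumed simplification; the masking map and the pooling into the NBM are not shown):

```python
import numpy as np

def block_discontinuity_map(luma, block=8):
    # Absolute luminance differences across the horizontal and vertical
    # 8x8 block boundaries of a BDCT-coded image.
    img = luma.astype(float)
    h, w = img.shape
    dmap = np.zeros_like(img)
    for x in range(block, w, block):          # vertical block boundaries
        dmap[:, x] = np.abs(img[:, x] - img[:, x - 1])
    for y in range(block, h, block):          # horizontal block boundaries
        dmap[y, :] = np.maximum(dmap[y, :], np.abs(img[y, :] - img[y - 1, :]))
    return dmap
```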

42 citations


Journal ArticleDOI
TL;DR: New pixel-decimation patterns are obtained by combining the proposed boundary-based patterns with N-queen patterns and with patterns found by a genetic algorithm (GA)-based search for optimal M-length patterns in an N×N block.
Abstract: This paper presents a boundary-based approach towards pixel decimation with applications in block-matching algorithms (BMAs). The proposed approach is based on the observation that new objects usually enter macroblocks (MBs) through their boundaries. The MBs are therefore matched based on their boundary regions only. The boundary-based patterns can be used to speed up motion estimation with marginal loss in image quality. Different decimation levels, trading off image quality against computational power, are presented. The mathematical intuition in support of the proposed patterns is discussed. Apart from the boundary-based approach, the novelty of our contribution also lies in performing a genetic algorithm (GA)-based search to find optimal M-length patterns in an N×N block. The resulting patterns are found to have better values of the spatial homogeneity and directional coverage metrics compared to the recently proposed N-queen decimation lattices. Subsequently, we obtain new pixel-decimation patterns by combining the proposed boundary-based patterns with the N-queen and GA-based patterns. Experimental results demonstrate considerably improved coding efficiency and comparable prediction quality of these new patterns compared to existing decimation lattices.
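A hedged sketch of boundary-only matching for a 16x16 macroblock (the 'width' parameter and the plain SAD cost are illustrative; the paper's actual patterns are GA-optimized and combined with N-queen lattices):

```python
import numpy as np

def boundary_decimation_mask(n=16, width=2):
    # Keep only the pixels within 'width' of the macroblock border for
    # block matching; interior pixels are decimated.
    mask = np.zeros((n, n), dtype=bool)
    mask[:width, :] = True
    mask[-width:, :] = True
    mask[:, :width] = True
    mask[:, -width:] = True
    return mask

def boundary_sad(block_a, block_b, mask):
    # Sum of absolute differences restricted to the boundary pattern.
    return np.abs(block_a[mask].astype(int) - block_b[mask].astype(int)).sum()
```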

38 citations


Journal ArticleDOI
TL;DR: Simulation results applying super-resolution show that the image quality can further be improved by reducing motion blur and compression artifacts.
Abstract: This paper presents a new approach for the generation of super-resolution stereoscopic and multi-view video from monocular video. Such multi-view video is used, for instance, with multi-user 3D displays or auto-stereoscopic displays with head-tracking to create a depth impression of the observed scenery. Our approach is an extension of the realistic stereo-view synthesis (RSVS) approach, which is based on structure from motion techniques and image-based rendering to generate the desired stereoscopic views for each point in time. Subjective quality measurements with 25 real and 3 synthetic sequences were carried out to test the performance of RSVS against simple time-shift and depth-image-based rendering (DIBR). Our approach heavily enhances the stereoscopic depth perception and gives a more realistic impression of the observed scenery. Simulation results applying super-resolution show that the image quality can further be improved by reducing motion blur and compression artifacts.

Journal ArticleDOI
TL;DR: A listless modified SPIHT (LMSPIHT) approach is proposed, which is a fast and low memory image coding algorithm based on the lifting wavelet transform and incorporates human visual system characteristics in the coding scheme; thus it outperforms the traditional SPIHT algorithm at low bit rate coding.
Abstract: Due to its excellent rate-distortion performance, set partitioning in hierarchical trees (SPIHT) has become the state-of-the-art algorithm for image compression. However, the algorithm does not fully provide the desired features of progressive transmission, spatial scalability and optimal visual quality, at very low bit rate coding. Furthermore, the use of three linked lists for recording the coordinates of wavelet coefficients and tree sets during the coding process becomes the bottleneck of a fast implementation of the SPIHT. In this paper, we propose a listless modified SPIHT (LMSPIHT) approach, which is a fast and low memory image coding algorithm based on the lifting wavelet transform. The LMSPIHT jointly considers the advantages of progressive transmission, spatial scalability, and incorporates human visual system (HVS) characteristics in the coding scheme; thus it outperforms the traditional SPIHT algorithm at low bit rate coding. Compared with the SPIHT algorithm, LMSPIHT provides a better compression performance and a superior perceptual performance with low coding complexity. The compression efficiency of LMSPIHT comes from three aspects. The lifting scheme lowers the number of arithmetic operations of the wavelet transform. Moreover, a significance reordering of the modified SPIHT ensures that it codes more significant information belonging to the lower frequency bands earlier in the bit stream than that of the SPIHT to better exploit the energy compaction of the wavelet coefficients. HVS characteristics are employed to improve the perceptual quality of the compressed image by placing more coding artifacts in the less visually significant regions of the image. Finally, a listless implementation structure further reduces the amount of memory and improves the speed of compression by more than 51% for a 512x512 image, as compared with that of the SPIHT algorithm.

Journal ArticleDOI
TL;DR: A novel shot boundary detection technique is introduced that operates completely in the compressed domain using the H.264/AVC video standard, and is further enhanced to exploit hierarchical coding patterns.
Abstract: The amount of digital video content has grown extensively during recent years, resulting in a rising need for the development of systems for automatic indexing, summarization, and semantic analysis. A prerequisite for video content analysis is the ability to discover the temporal structure of a video sequence. In this paper, a novel shot boundary detection technique is introduced that operates completely in the compressed domain using the H.264/AVC video standard. As this specification contains a number of new coding tools, the characteristics of a compressed bit stream are different from prior video specifications. Furthermore, the H.264/AVC specification introduces new coding structures such as hierarchical coding patterns, which can have a major influence on video analysis algorithms. First, a shot boundary detection algorithm is proposed which can be used to segment H.264/AVC bit streams based on temporal dependencies and spatial dissimilarities. This algorithm is further enhanced to exploit hierarchical coding patterns. As these sequences are characterized by a pyramidal structure, only a subset of frames needs to be considered during analysis, allowing the reduction of the computational complexity. Besides the increased efficiency, experimental results also show that the proposed shot boundary detection algorithm achieves a high accuracy.

Journal ArticleDOI
TL;DR: This paper investigates predictive coding methods to compress images represented in the Radon domain as a set of projections to achieve lossless compression and presents here the evolution of the compression ratio depending on the chosen redundancy.
Abstract: This paper investigates predictive coding methods to compress images represented in the Radon domain as a set of projections. Both the correlation within and between discrete Radon projections at similar angles can be exploited to achieve lossless compression. The discrete Radon projections investigated here are those used to define the Mojette transform first presented by Guedon et al. [Psychovisual image coding via an exact discrete Radon transform, in: T.W. Lance (Ed.), Proceedings of Visual Communications and Image Processing (VCIP), May 1995, Taipei, Taiwan, pp. 562-572]. This work extends the preliminary investigation presented by Autrusseau et al. [Lossless compression based on a discrete and exact Radon transform: a preliminary study, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. II, May 2006, Toulouse, France, pp. 425-428]. The 1D Mojette projections are re-arranged as two-dimensional images, thus allowing the use of 2D image compression techniques on the projections. Besides its compression capabilities, the Mojette transform brings an interesting property: a tunable redundancy. As the Mojette transform is able to both compress and add redundancy, the proposed method can be viewed as a joint lossless source-channel coding technique for images. We present here the evolution of the compression ratio depending on the chosen redundancy.
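For reference, a Dirac Mojette projection can be computed as in the sketch below (one common convention for the bin index is assumed; the exact definition follows Guedon et al.):

```python
import numpy as np

def mojette_projection(img, p, q):
    # Dirac Mojette projection along direction (p, q) with gcd(p, q) = 1:
    # pixel (k, l) contributes to bin b = -q*k + p*l (a common convention).
    h, w = img.shape
    bins = {}
    for k in range(h):
        for l in range(w):
            b = -q * k + p * l
            bins[b] = bins.get(b, 0) + int(img[k, l])
    return np.array([bins[b] for b in sorted(bins)])
```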

Journal ArticleDOI
TL;DR: This work proposes an efficient and fast 4x4 intra-prediction mode selection scheme that reduces about 91% of the mode decision time and 70% of the total encoding time of intra-coding with negligible degradation of coding performance.
Abstract: One of the new features in the H.264/AVC encoder is the use of the Lagrangian rate-distortion optimization (RDO) method during mode decision at the macroblock level. The RDO technique has been employed in H.264/AVC for intra-prediction mode selection to achieve better coding efficiency, but the computational complexity of the mode decision algorithm is extremely high. To reduce the complexity of mode decision, we propose an efficient and fast 4x4 intra-prediction mode selection scheme. The proposed method reduces the candidate prediction modes based on the correlation between neighboring blocks and the sum of absolute transformed differences (SATD) between the original block and the intra-predicted block. First, the rank of each mode is obtained based on the SATD value. Then, the candidate modes are further reduced by using the combination of rank and most probable mode. The proposed method reduces the number of candidate modes to either one or two. Simulation results demonstrate that the proposed mode decision method reduces about 91% of the mode decision time and 70% of the total encoding time of intra-coding with negligible degradation of coding performance.
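The SATD ranking step can be illustrated with the following sketch (the Hadamard-based SATD is the standard H.264 measure; the mode dictionary and the final pruning with the most probable mode are only outlined here):

```python
import numpy as np

H4 = np.array([[1,  1,  1,  1],
               [1,  1, -1, -1],
               [1, -1, -1,  1],
               [1, -1,  1, -1]], dtype=float)

def satd_4x4(original, predicted):
    # Sum of absolute transformed differences of the 4x4 residual using the
    # Hadamard transform (the usual H.264 convention divides by 2).
    residual = original.astype(float) - predicted.astype(float)
    return np.abs(H4 @ residual @ H4.T).sum() / 2.0

def rank_intra_modes(original, predictions):
    # 'predictions' maps mode index -> 4x4 predicted block; a fast scheme
    # keeps only the best-ranked mode(s) plus the most probable mode.
    costs = {mode: satd_4x4(original, pred) for mode, pred in predictions.items()}
    return sorted(costs, key=costs.get)
```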

Journal ArticleDOI
TL;DR: An effective ROI determination method is proposed, and experimental results on TMN8 reveal that the proposed ROI video coding can significantly enhance the quality at ROIs and provides better subjective quality than previous works.
Abstract: Region-of-interest (ROI) coding is an essential task in low bit-rate multimedia communications because of the limited bandwidth of the channels and of transcoding between different standards. In this paper, an effective ROI determination method is proposed. Skin-color extraction is first employed to determine the ROI macroblocks; then, a framework for adjusting the quantization parameters (QPs) of ROI macroblocks according to the distortion and bit-rate variations is proposed to fit the target bit rate. In view of the residual distortion information, the fuzzy logic controller proposed in our framework can adaptively adjust the weighting factor for the corresponding QPs according to the distortion variation in the macroblock layer. A linear prediction formula, derived from the rate variation, is proposed to allocate appropriate bits for each ROI macroblock and maintain the target bit rate and buffer fullness. Experimental results on TMN8 reveal that the proposed ROI video coding can significantly enhance the quality at ROIs. Furthermore, the proposed framework obtains about a 0.5 dB gain in objective performance and also provides better subjective quality than previous works.

Journal ArticleDOI
TL;DR: An efficient iterative scheme is proposed, which reduces considerably the overall computational cost of the image registration problem and properly combined with the proposed similarity measure results in a fast spatial domain technique for subpixel image registration.
Abstract: In this paper a new technique for performing image registration with subpixel accuracy is presented. The proposed technique, which is based on the maximization of the correlation coefficient function, does not require the reconstruction of the intensity values and provides a closed-form solution to the subpixel translation estimation problem. Moreover, an efficient iterative scheme is proposed, which reduces considerably the overall computational cost of the image registration problem. This scheme properly combined with the proposed similarity measure results in a fast spatial domain technique for subpixel image registration.
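For intuition, a common (but different) way to obtain subpixel accuracy is quadratic interpolation around the integer correlation peak, sketched below for a 1-D signal; the paper instead derives a closed-form estimator from the correlation coefficient without reconstructing intensity values.

```python
import numpy as np

def subpixel_shift_1d(ref, obs):
    # Locate the integer cross-correlation peak, then refine it to subpixel
    # precision with a parabolic fit through the peak and its two neighbours
    # (assumes the peak is not at the border of the correlation array).
    r = ref - ref.mean()
    o = obs - obs.mean()
    corr = np.correlate(o, r, mode="full")
    k = int(np.argmax(corr))
    cm, c0, cp = corr[k - 1], corr[k], corr[k + 1]
    delta = 0.5 * (cm - cp) / (cm - 2.0 * c0 + cp)   # parabolic refinement
    return (k - (len(ref) - 1)) + delta
```

For images, the same refinement can be applied along rows and columns around the 2-D correlation peak.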

Journal ArticleDOI
TL;DR: Several results in complex situations prove the capacity of the algorithm to learn people appearance in unspecified poses and viewpoints, and its efficiency for tracking multiple humans in real-time using the specific updated descriptors.
Abstract: Tracking an unspecified number of people in real-time is one of the most challenging tasks in computer vision. In this paper, we propose an original method to achieve this goal, based on the construction of a 2D human appearance model. The general framework, which is a region-based tracking approach, is applicable to any type of object. We show how to specialize the method to take advantage of the structural properties of the human body. We segment its visible parts by using a skeletal graph matching strategy inspired by shock graphs. Only morphological and topological information is encoded in the model graph, making the approach independent of the pose of the person, the viewpoint, the geometry or the appearance of the limbs. The limb labeling makes it possible to build and update an appearance model for each body part. The resulting discriminative feature, which we denote as an articulated appearance model, captures the color, texture and shape properties of the different limbs. It is used to identify people in complex situations (occlusion, field-of-view exit, etc.) and maintain the tracking. Model-to-image matching has proved to be much more robust and better-founded than with existing global appearance descriptors, specifically when dealing with highly deformable objects such as humans. The only assumption for the recognition is the approximate viewpoint correspondence between the different models during the matching process. The method does not make use of skin color detection, which allows us to perform tracking under any viewpoint. Occlusions can be detected by the generic part of the algorithm, and tracking is performed in such cases by means of a particle filter. Several results in complex situations prove the capacity of the algorithm to learn people's appearance in unspecified poses and viewpoints, and its efficiency for tracking multiple humans in real-time using the specific updated descriptors. Finally, the model provides an important clue for further human motion analysis.

Journal ArticleDOI
TL;DR: The proposed method exploits a spatial-domain error concealment technique and a resynchronization technique for detecting errors via some extra parity information, and further protects the bit-streams against errors via channel codes whose parity bits are embedded into other slices.
Abstract: Video transmission over noisy channels makes error concealment an indispensable job. Utilizing data hiding for this problem provides reserve information about the content at the receiver while leaving the transmitted bit-stream syntax unchanged; hence, it improves the reconstructed video quality with almost no extra channel utilization. A spatial-domain error concealment technique, which hides the edge orientation information of a block, and a resynchronization technique, which embeds the bit-length of a block into other blocks, are combined. The proposed method also exploits these two techniques for detecting errors via some extra parity information. Moreover, the motion vectors between consecutive frames are also embedded into the consecutive frames for better concealment at the receiver. Finally, as a novel approach, the bit-streams are further protected against errors via channel codes and the parity bits of these codes are embedded into other slices. In this manner, the implicit utilization of error correction codes improves the reconstruction quality significantly. The simulation results show that the proposed approaches perform quite promisingly in concealing the errors in any compressed video bit-stream.

Journal ArticleDOI
TL;DR: The algorithm presented is easy to use and yields the highest performance in terms of the average number of iterations required to find a specific image, however, it is computationally more expensive and requires more memory than two of the other techniques.
Abstract: CBIR (content-based image retrieval) systems attempt to allow users to perform searches in large picture repositories. In most existing CBIR systems, images are represented by vectors of low level features. Searches in these systems are usually based on distance measurements defined in terms of weighted combinations of the low level features. This paper presents a novel approach to combining features when using multi-image queries consisting of positive and negative selections. A fuzzy set is defined so that the degree of membership of each image in the repository to this fuzzy set is related to the user's interest in that image. Positive and negative selections are then used to determine the degree of membership of each picture to this set. The system attempts to capture the meaning of a selection by modifying a series of parameters at each iteration to imitate user behavior, becoming more selective as the search progresses. The algorithm has been evaluated against four other representative relevance feedback approaches. Both the performance and usability of the five CBIR systems have been studied. The algorithm presented is easy to use and yields the highest performance in terms of the average number of iterations required to find a specific image. However, it is computationally more expensive and requires more memory than two of the other techniques.

Journal ArticleDOI
TL;DR: The performance of the extracted spatio-chromatic patch bases is evaluated in terms of quality of reconstruction with respect to their potential for data compression, leading to a deeper understanding of the role played by chromatic features in data reduction.
Abstract: We investigate the implications of a unified spatio-chromatic basis for image compression and reconstruction. Different adaptive and general methods (principal component analysis, PCA, independent component analysis, ICA, and discrete cosine transform, DCT) are applied to generate bases. While typically such bases with spatial extent are investigated in terms of their correspondence to human visual perception, we are interested in their applicability to multimedia encoding. The performance of the extracted spatio-chromatic spatial patch bases is evaluated in terms of quality of reconstruction with respect to their potential for data compression. Since ICA is not as widely used as it should be, compared to the other decorrelation methods applied here in a new domain, we also provide a review of ICA. The results discussed here are intended to provide another path towards perceptually based encoding of visual data. This leads to a deeper understanding of the role played by chromatic features in data reduction.
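A minimal sketch of building a joint spatio-chromatic PCA basis from image patches (patch size, sample count and component count are illustrative; ICA and DCT bases would be computed over the same patch samples):

```python
import numpy as np

def pca_patch_basis(rgb_image, patch=8, n_samples=2000, n_components=16, seed=0):
    # Vectorize random RGB patches as joint spatio-chromatic samples and
    # return the leading PCA basis vectors; each row reshapes back to a
    # patch x patch x 3 basis patch.
    rng = np.random.default_rng(seed)
    h, w, _ = rgb_image.shape
    ys = rng.integers(0, h - patch, n_samples)
    xs = rng.integers(0, w - patch, n_samples)
    X = np.stack([rgb_image[y:y + patch, x:x + patch].reshape(-1)
                  for y, x in zip(ys, xs)]).astype(float)
    X -= X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_components]
```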

Journal ArticleDOI
TL;DR: The rate-distortion cost difference between coding and skipping a macroblock is used as the single decision feature and an appropriate decision threshold is determined following modeling of the cost difference's class-conditional PDFs in order to further limit system complexity.
Abstract: In order to achieve a high compression ratio, the H.264/AVC standard has incorporated a large number of coding modes which must be evaluated during the coding process to determine the optimal rate-distortion tradeoff. The coding gains of H.264/AVC arise at the expense of significant coder complexity which may not be desired for mobile devices with limited battery life. One coder process that has been identified as having potential for achieving computation savings is the selection between skipping the coding of a macroblock and coding of the macroblock in one of the remaining coding modes. In low-motion subsequences, a large percentage of macroblocks are ''skipped'', that is, no coded data are transmitted for these macroblocks. By estimating which macroblocks are to be skipped during the coding process, significant savings in computation can be realized, since the coder then does not evaluate the rate-distortion costs of all candidate coding modes. In this work, we place this skip versus code decision in a Bayesian framework. We use the rate-distortion cost difference between coding and skipping a macroblock as the single decision feature and determine an appropriate decision threshold following modeling of the cost difference's class-conditional PDFs. Finally, in order to further limit system complexity, we model the threshold's parameters as functions of application- and sequence-specific characteristics, namely, the quantization parameter and an activity factor. This results in a decision threshold that is only a function of these two characteristics, which are either known or easily measured. It is shown that this approach can result in a time savings of over 80% for low-motion sequences at a negligible decrease or, in certain cases, a slight increase in quality over a reference H.264 codec.
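The decision rule itself reduces to a simple threshold test, sketched below; the linear threshold model and its coefficients are illustrative placeholders, not the class-conditional PDF fit described in the paper:

```python
def skip_macroblock(cost_difference, qp, activity, a0=0.0, a1=1.0, a2=1.0):
    # Skip coding when the single feature (rate-distortion cost of coding
    # minus cost of skipping) falls below a threshold modeled as a function
    # of the quantization parameter and an activity factor. The linear form
    # and the coefficients a0..a2 are illustrative, not the fitted model.
    threshold = a0 + a1 * qp + a2 * activity
    return cost_difference < threshold
```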

Journal ArticleDOI
TL;DR: The proposed EMD algorithm detects dim moving targets effectively, estimates their trajectories accurately, and is adaptable to real-time target detection and tracking.
Abstract: A dim target detection and tracking algorithm based on the empirical mode decomposition (EMD) is proposed. EMD is introduced to decompose the original image into a definite number of high-frequency and low-frequency components by means of sifting. With the EMD algorithm, the background can be estimated reliably and the dim target obtained by removing the background from the original image. The algorithm detects dim moving targets effectively and estimates their trajectories accurately. The data analysis and experiments show that the proposed algorithm is adaptable to real-time target detection and tracking.

Journal ArticleDOI
TL;DR: A recursive algorithm is developed to compute the GOP-level transmission distortion at pixel-level precision using pre-computed video information and a piecewise linear-fitting approach is proposed to achieve low-complexity transmission distortion modeling.
Abstract: Unequal loss protection is an effective tool in delivering compressed video streaming over packet-switched networks robustly. A critical component in any unequal-loss-protection scheme is a metric for evaluating the importance of different frames in a Group-Of-Pictures (GOP). In the case of video streaming over 3G mobile networks, packet loss usually corresponds to whole-frame loss due to low bandwidth and small picture size, which results in high error rates and thus most of the existing low-complexity transmission-distortion-estimate models may be ineffective. In this paper, we firstly develop a recursive algorithm to compute the GOP-level transmission distortion at pixel-level precision using pre-computed video information. Based on the study on the propagating behavior of the whole-frame-loss transmission distortion, we then propose a piecewise linear-fitting approach to achieve low-complexity transmission distortion modeling. The simulation results demonstrate that the proposed two models are accurate and robust. The proposed transmission distortion models are fast and accurate importance assessment tools in allocating limited channel resources optimally for the mobile streaming video.
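The propagation idea behind the frame-importance metric can be sketched as follows (the scalar attenuation factor is an illustrative stand-in for the paper's pixel-level recursion and piecewise linear fit):

```python
def gop_frame_importance(frame_loss_distortion, attenuation=0.9):
    # For each frame i of a GOP, accumulate the distortion its loss would
    # inject into frame i and every following frame, attenuated per hop.
    n = len(frame_loss_distortion)
    return [sum(frame_loss_distortion[i] * attenuation ** (k - i) for k in range(i, n))
            for i in range(n)]
```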

Journal ArticleDOI
TL;DR: This contribution describes a cognitive vision system conceived to automatically provide high-level interpretations of complex real-time situations in outdoor and indoor scenarios, and to eventually maintain communication with casual end users in multiple languages.
Abstract: The integration of cognitive capabilities in computer vision systems requires both to enable high semantic expressiveness and to deal with high computational costs as large amounts of data are involved in the analysis. This contribution describes a cognitive vision system conceived to automatically provide high-level interpretations of complex real-time situations in outdoor and indoor scenarios, and to eventually maintain communication with casual end users in multiple languages. The main contributions are: (i) the design of an integrative multilevel architecture for cognitive surveillance purposes; (ii) the proposal of a coherent taxonomy of knowledge to guide the process of interpretation, which leads to the conception of a situation-based ontology; (iii) the use of situational analysis for content detection and a progressive interpretation of semantically rich scenes, by managing incomplete or uncertain knowledge, and (iv) the use of such an ontological background to enable multilingual capabilities and advanced end-user interfaces. Experimental results are provided to show the feasibility of the proposed approach.

Journal ArticleDOI
TL;DR: Experimental results show that this fast intra-mode selection scheme can lessen the encoding time significantly with little loss of bit-rate and visual quality.
Abstract: H.264/AVC is the newest video coding standard recommended by ITU-T and MPEG. Compared with all existing video coding standards, H.264 can achieve superior performance by using many advanced techniques. Intra mode selection is an important feature in the H.264 standard and can reduce the spatial redundancy in intra-frames significantly. An efficient rate-distortion optimization (RDO) technique is employed in H.264 to choose the best mode for each MB, but the computational cost increases drastically. In this paper, a fast intra-mode selection algorithm is introduced. By using a fast edge detection method based on the non-normalized Haar transform (NHT), the edge for each sub-block can be extracted. Based on the local edge information, only a few intra-modes are chosen as mode candidates. A fast RDO algorithm is also proposed in this paper, based on an accurate rate-distortion estimation model and the fast intra-mode RDO method. By combining these two methods, the computational load is reduced remarkably. Experimental results show that this fast intra-mode selection scheme can lessen the encoding time significantly with little loss of bit-rate and visual quality.

Journal ArticleDOI
TL;DR: Experimental results reveal that the proposed error resilient coding scheme has comparable or better performance than a scheme where forward error correction codes are used, and the proposed solution shows good performance when compared to a scheme that uses the intra-macroblock refresh procedure.
Abstract: This paper proposes an error resilient coding scheme that employs distributed video coding tools. A bitstream, produced by any standard motion-compensated predictive codec (MPEG-x, H.26x), is sent over an error-prone channel. In addition, a Wyner-Ziv encoded auxiliary bitstream is sent as redundant information to serve as a forward error correction code. At the decoder side, error concealed reconstructed frames are used as side information by the Wyner-Ziv decoder, and the corrected frame is used as a reference by future frames, thus reducing drift. We explicitly target the problem of rate allocation at the encoder side, by estimating the channel induced distortion in the transform domain. Rate adaptivity is achieved at the frame, subband and bitplane granularity. Experimental results conducted over a simulated error-prone channel reveal that the proposed scheme has comparable or better performance than a scheme where forward error correction codes are used. Moreover the proposed solution shows good performance when compared to a scheme that uses the intra-macroblock refresh procedure.

Journal ArticleDOI
TL;DR: A novel scheme to achieve more effective analysis, retrieval and exploration of large-scale news video collections by performing multi-modal video content analysis and synchronization and a novel hyperbolic visualization scheme is incorporated to visualize large- scale news topics according to their associations and interestingness.
Abstract: In this paper, we have developed a novel scheme to achieve more effective analysis, retrieval and exploration of large-scale news video collections by performing multi-modal video content analysis and synchronization. First, automatic keyword extraction is performed on news closed captions and audio channels to detect the most interesting news topics (i.e., keywords for news topic interpretation), and the associations among these news topics (i.e., contextual relationships among the news topics) are further determined according to their co-occurrence probabilities. Second, visual semantic items, such as human faces, text captions, video concepts, are extracted automatically by using our semantic video analysis techniques. The news topics are automatically synchronized with the most relevant visual semantic items. In addition, an interestingness weight is assigned for each news topic to characterize its importance. Finally, a novel hyperbolic visualization scheme is incorporated to visualize large-scale news topics according to their associations and interestingness. With a better global overview of large-scale news video collections, users can specify their queries more precisely and explore large-scale news video collections interactively. Our experiments on large-scale news video collections have provided very positive results.

Journal ArticleDOI
TL;DR: A symmetrical mode is proposed to replace the conventional bi-directional mode, in which only one motion vector is coded and the other is derived from the coded one under the assumption of approximately constant-speed motion; the accuracy of motion vectors derived in temporal direct mode is also improved with division-free operations.
Abstract: This paper first gives a brief overview of the Chinese audio-video coding standard (AVS) especially on prediction modes of motion compensation for B-picture. Furthermore, two techniques adopted by AVS about how to improve motion compensation for B-picture coding are discussed in detail. The first one is the proposed symmetrical mode that replaces the conventional bi-directional mode, in which only one motion vector is coded and another is derived from the coded one with the assumption of approximate constant-speed motion. It can achieve a better trade-off between prediction accuracy and the bits for coding motion information. The second one is the improved temporal direct mode. It not only solves the problem in AVS on how to correctly derive reference index under the constraint of two reference buffers for both P- and B-pictures but also improves the accuracy of derived motion vectors in temporal direct mode with division-free operations. In experimental results, the proposed symmetrical mode and the improved temporal direct mode were integrated into the H.264/MPEG-4 AVC reference software to exhibit their performances. Furthermore, the B-picture coding performance in AVS is also evaluated using different GOP coding structures.
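The symmetric-mode derivation can be illustrated as follows (a simplified floating-point form; AVS specifies an equivalent division-free integer derivation):

```python
def symmetric_backward_mv(forward_mv, dist_fwd, dist_bwd):
    # Under the constant-speed assumption, the backward motion vector of a
    # B-picture block is obtained by scaling the coded forward vector with
    # the temporal distances to the two reference pictures and negating it.
    mvx, mvy = forward_mv
    return (-round(mvx * dist_bwd / dist_fwd),
            -round(mvy * dist_bwd / dist_fwd))

# Example: forward MV (8, -4) in quarter-pel units, forward reference two
# pictures away, backward reference one picture away -> backward MV (-4, 2).
print(symmetric_backward_mv((8, -4), dist_fwd=2, dist_bwd=1))
```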

Journal ArticleDOI
TL;DR: A distributed algorithm for optimized rate allocation, where the media client iteratively determines the best set of streaming paths, based on information gathered by network nodes, which is shown to quickly converge to the rate allocation that provides a maximal quality to the video client.
Abstract: The paper addresses the distributed path computation and rate allocation problems for video delivery over multipath networks. The streaming rate on each path is determined such that the end-to-end media distortion is minimized, when a media client aggregates packets received via multiple network channels to the streaming server. In common practical scenarios, it is, however, difficult for the server to have the full knowledge about the network status. Therefore, we propose here a distributed path selection and rate allocation algorithm, where the network nodes participate to the optimized path selection and rate allocation based on their local view of the network. This eliminates the need for end-to-end network monitoring, and permits the deployment of large scale rate allocation solutions. We design a distributed algorithm for optimized rate allocation, where the media client iteratively determines the best set of streaming paths, based on information gathered by network nodes. Each intermediate node then forwards incoming media flows on the outgoing paths, in a distributed manner. The proposed algorithm is shown to quickly converge to the rate allocation that provides a maximal quality to the video client. We also propose a distributed greedy algorithm that achieves close-to-optimal end-to-end distortion performance in a single pass. Both algorithms are shown to outperform simple heuristic-based rate allocation approaches for numerous random network topologies. They offer an interesting solution for media-specific rate allocation over large scale multipath networks.