
Showing papers in "Signal Processing: Image Communication" in 2010


Journal ArticleDOI
TL;DR: An extensive list of blind methods for detecting image forgery is presented, and an attempt has been made to make this paper complete by listing most of the existing references and by providing a detailed classification of the methods into groups.
Abstract: Verifying the integrity of digital images and detecting the traces of tampering without using any protective pre-extracted or pre-embedded information have become an important and hot research field. The popularity of this field and the rapid growth in papers published during the last years have created a considerable need for a complete bibliography addressing published papers in this area. In this paper, an extensive list of blind methods for detecting image forgery is presented. By the word blind we refer to those methods that use only the image function. An attempt has been made to make this paper complete by listing most of the existing references and by providing a detailed classification of the methods into groups.

211 citations


Journal ArticleDOI
TL;DR: A three-stage framework for NR QE is described that encompasses the range of potential use scenarios for the NR QE and allows knowledge of the human visual system to be incorporated throughout, and the measurement stage is surveyed, considering methods that rely on bitstream, pixels, or both.
Abstract: This paper reviews the basic background knowledge necessary to design effective no-reference (NR) quality estimators (QEs) for images and video. We describe a three-stage framework for NR QE that encompasses the range of potential use scenarios for the NR QE and allows knowledge of the human visual system to be incorporated throughout. We survey the measurement stage of the framework, considering methods that rely on bitstream, pixels, or both. By exploring both the accuracy requirements of potential uses as well as evaluation criteria to stress-test a QE, we set the stage for our community to make substantial future improvements to the challenging problem of NR quality estimation.

166 citations


Journal ArticleDOI
TL;DR: This work considers a four-component image model that classifies local image regions according to edge and smoothness properties and provides results that are highly consistent with human subjective judgment of the quality of blurred and noisy images, delivering better overall performance than (G-)SSIM and MS-SSIM on the LIVE Image Quality Assessment Database.
Abstract: The assessment of image quality is important in numerous image processing applications. Two prominent examples, the Structural Similarity (SSIM) index and the Multi-scale Structural Similarity (MS-SSIM) index, operate under the assumption that human visual perception is highly adapted for extracting structural information from a scene. Results in large human studies have shown that these quality indices perform very well relative to other methods. However, SSIM and other Image Quality Assessment (IQA) algorithms are less effective when used to rate blurred and noisy images. We address this defect by considering a four-component image model that classifies local image regions according to edge and smoothness properties. In our approach, SSIM scores are weighted by region type, leading to modified versions of (G-)SSIM and MS-(G-)SSIM, called four-component (G-)SSIM (4-(G-)SSIM) and four-component MS-(G-)SSIM (4-MS-(G-)SSIM). Our experimental results show that our new approach provides results that are highly consistent with human subjective judgment of the quality of blurred and noisy images, and also delivers better overall performance than (G-)SSIM and MS-(G-)SSIM on the LIVE Image Quality Assessment Database.
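The region-weighted pooling step described above can be sketched as follows, assuming a per-pixel SSIM map and a region-type label map have already been computed; the label set, weights, and function name are illustrative placeholders rather than the paper's exact four-component definition.

```python
import numpy as np

def region_weighted_pooling(ssim_map, region_labels, weights):
    """Pool a per-pixel SSIM map using one weight per region type.

    ssim_map      : 2-D array of local SSIM values
    region_labels : 2-D integer array of the same shape (e.g. 0 = smooth,
                    1 = edge, 2 = texture, 3 = blurred edge; hypothetical)
    weights       : dict mapping each label to its pooling weight
    """
    num, den = 0.0, 0.0
    for label, w in weights.items():
        mask = (region_labels == label)
        if mask.any():
            num += w * ssim_map[mask].sum()
            den += w * mask.sum()
    return num / den if den > 0 else float(ssim_map.mean())

# Toy usage with random data standing in for a real SSIM map.
rng = np.random.default_rng(0)
ssim_map = rng.uniform(0.5, 1.0, size=(64, 64))
labels = rng.integers(0, 4, size=(64, 64))
print(region_weighted_pooling(ssim_map, labels, {0: 0.25, 1: 1.0, 2: 0.5, 3: 1.0}))
```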

151 citations


Journal ArticleDOI
TL;DR: This paper surveys state-of-the-art signal-driven perceptual audio and video quality assessment methods independently, investigates relevant issues in developing joint audio-visual quality metrics, and proposes feasible solutions for future work in perceptual-based audio-visual quality metrics.
Abstract: Accurate measurement of the perceived quality of audio-visual services at the end-user is becoming a crucial issue in digital applications due to the growing demand for compression and transmission of audio-visual services over communication networks. Content providers strive to offer the best quality of experience to customers linked to their different quality of service (QoS) solutions. Therefore, developing accurate, perceptual-based quality metrics is a key requirement in multimedia services. In this paper, we survey state-of-the-art signal-driven perceptual audio and video quality assessment methods independently, and investigate relevant issues in developing joint audio-visual quality metrics. Experiments against subjective quality results have been conducted to analyze and compare the performance of the quality metrics. We consider emerging trends in audio-visual quality assessment, and propose feasible solutions for future work in perceptual-based audio-visual quality metrics.

132 citations


Journal ArticleDOI
TL;DR: This study examines how people watch a video sequence during free-viewing and quality assessment tasks and observes that saliency-based distortion pooling does not significantly improve the performance of the video quality metric.
Abstract: The aim of this study is to understand how people watch a video sequence during free-viewing and quality assessment tasks. To this end, two eye tracking experiments were carried out. The video dataset is composed of 10 original video sequences and 50 impaired video sequences (five levels of impairment obtained by H.264 video compression). The first experiment consisted of recording eye movements during a free-viewing task on the 10 original video sequences. The second experiment was an eye tracking experiment conducted in the context of a subjective quality assessment: eye movements were recorded while observers judged the quality of the 50 impaired video sequences. The comparison between gaze allocations indicates that the quality task has a moderate impact on the deployment of visual attention. This impact increases with the number of presentations of the impaired video sequences. The locations of regions of interest remain highly similar after several presentations of the same video sequence, suggesting that eye movements are still driven by low-level visual features after several viewings. In addition, the level of distortion does not significantly alter the oculomotor behavior. Finally, we modified the pooling of an objective full-reference video quality metric by adjusting the weight applied to the distortions. This adjustment depends on the visual importance, which is deduced from the eye tracking experiment performed on the impaired video sequences. We observe that saliency-based distortion pooling does not significantly improve the performance of the video quality metric.

80 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel quality metric for JPEG2000 images that achieves performance competitive with the state-of-the-art no-reference metrics on public datasets and is robust to various image contents.
Abstract: No-reference measurement of perceptual image quality is a crucial and challenging issue in modern image processing applications. One of the major difficulties is that some inherent features of natural images and artifacts can be rather ambiguous. In this paper, we tackle this problem using statistical information on image gradient profiles and propose a novel quality metric for JPEG2000 images. The key part of the metric is a histogram representing the sharpness distribution of the gradient profiles, from which a blur metric that is insensitive to inherently blurred structures in the natural image is established. Then a ringing metric is built based on the ringing visibilities of regions associated with the gradient profiles. Finally, a combination model optimized through extensive experiments is developed to predict the perceived image quality. The proposed metric achieves performance competitive with state-of-the-art no-reference metrics on public datasets and is robust to various image contents.
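As a rough illustration of building a sharpness distribution from image gradients, the sketch below histograms gradient magnitudes at edge pixels; it is a simplified stand-in for the paper's gradient-profile analysis, and the normalisation, edge threshold, and bin count are arbitrary assumptions.

```python
import numpy as np

def gradient_sharpness_histogram(image, bins=16, edge_thresh=0.05):
    """Histogram of gradient magnitudes at edge pixels, used here as a crude
    proxy for the sharpness distribution of gradient profiles."""
    img = image.astype(float)
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)  # normalise to [0, 1]
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    edges = mag > edge_thresh                      # arbitrary edge threshold
    top = max(float(mag.max()), 1e-12)
    hist, _ = np.histogram(mag[edges], bins=bins, range=(0.0, top))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)

rng = np.random.default_rng(1)
noisy_flat_patch = rng.normal(0.5, 0.05, size=(128, 128))
print(gradient_sharpness_histogram(noisy_flat_patch))
```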

75 citations


Journal ArticleDOI
TL;DR: A new approach is developed that exploits the probabilistic properties of the phase information of 2-D complex wavelet coefficients for image modeling; by using a feature in which phase information is incorporated, it yields higher accuracy in texture image retrieval as well as in segmentation.
Abstract: In this paper, we develop a new approach which exploits the probabilistic properties of the phase information of 2-D complex wavelet coefficients for image modeling. Instead of directly using the phases of complex wavelet coefficients, we demonstrate why relative phases should be used. The definition, properties and statistics of relative phases of complex coefficients are studied in detail. We propose the von Mises and wrapped Cauchy distributions for the probability density function (pdf) of relative phases in the complex wavelet domain. The maximum-likelihood method is used to estimate the two parameters of the von Mises and wrapped Cauchy distributions. We demonstrate that the von Mises and wrapped Cauchy distributions fit well the real data obtained from various images, including texture images as well as standard test images. The von Mises and wrapped Cauchy models are compared, and the simulation results show that the wrapped Cauchy fits the peaky and heavy-tailed pdfs of relative phases well, while the von Mises fits pdfs with a Gaussian-like shape well. For most of the test images, the wrapped Cauchy model is more accurate than the von Mises model when images are decomposed by different complex wavelet transforms, including the dual-tree complex wavelet transform (DTCWT), the pyramidal dual-tree directional filter bank (PDTDFB) and the uniform discrete curvelet transform (UDCT). Moreover, the relative phase is applied to obtain new features for texture image retrieval and segmentation applications. Instead of using only real or magnitude coefficients, the new approach uses a feature in which phase information is incorporated, yielding higher accuracy in texture image retrieval as well as in segmentation. The relative phase information, which is complementary to the magnitude, is a promising tool in image processing.
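To make the "relative phase" notion concrete, the sketch below computes phase differences between horizontally adjacent complex coefficients and fits a von Mises density by maximum likelihood; the subband is random data, and the concentration parameter uses the standard Best-Fisher approximation rather than the authors' exact estimation procedure.

```python
import numpy as np

def relative_phase(coeffs, axis=1):
    """Phase difference between adjacent complex wavelet coefficients along
    the given axis, wrapped back to (-pi, pi]."""
    phase = np.angle(coeffs)
    d = np.diff(phase, axis=axis)
    return np.angle(np.exp(1j * d))          # wrap the difference

def fit_von_mises(theta):
    """Maximum-likelihood estimates (mu, kappa) for a von Mises sample, with
    kappa obtained from the standard Best-Fisher approximation."""
    z = np.exp(1j * theta).mean()
    mu = np.angle(z)
    r = np.abs(z)                            # mean resultant length
    if r < 0.53:
        kappa = 2 * r + r**3 + 5 * r**5 / 6
    elif r < 0.85:
        kappa = -0.4 + 1.39 * r + 0.43 / (1 - r)
    else:
        kappa = 1.0 / (r**3 - 4 * r**2 + 3 * r)
    return mu, kappa

rng = np.random.default_rng(2)
fake_subband = rng.normal(size=(64, 64)) + 1j * rng.normal(size=(64, 64))
theta = relative_phase(fake_subband).ravel()
print(fit_von_mises(theta))
```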

73 citations


Journal ArticleDOI
TL;DR: Simulation results obtained using some color and gray-level images clearly demonstrate the strong performance of the proposed SCAN-CA-based image security system.
Abstract: This paper presents a novel SCAN-CA-based image security system which belongs to the class of synchronous stream ciphers. Its encryption method is based on permutation of the image pixels and replacement of the pixel values. The permutation is done by scan patterns generated by the SCAN approach. The pixel values are replaced using the recursive cellular automata (CA) substitution. The proposed image encryption method satisfies the properties of confusion and diffusion because the characteristics of the SCAN and CA substitution are flexible. The salient features of the proposed image encryption method are losslessness, symmetric private-key encryption, a very large number of secret keys, key-dependent permutation, and key-dependent pixel value replacement. Simulation results obtained using some color and gray-level images clearly demonstrate the strong performance of the proposed SCAN-CA-based image security system.
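The two building blocks named above, a key-driven pixel permutation followed by a key-driven value substitution, can be illustrated with a deliberately simplified, reversible toy cipher; this is not the SCAN/CA construction itself, only the same confusion-and-diffusion skeleton.

```python
import numpy as np

def toy_encrypt(image, key):
    """Toy stream-cipher-like scheme: key-dependent pixel permutation followed
    by XOR with a key-dependent byte stream (NOT the SCAN-CA method, just the
    same permutation + substitution structure)."""
    rng = np.random.default_rng(key)
    flat = image.astype(np.uint8).ravel()
    perm = rng.permutation(flat.size)                       # permutation stage
    stream = rng.integers(0, 256, flat.size, dtype=np.uint8)
    return (flat[perm] ^ stream).reshape(image.shape)       # substitution stage

def toy_decrypt(cipher, key):
    rng = np.random.default_rng(key)
    perm = rng.permutation(cipher.size)                     # same permutation as above
    stream = rng.integers(0, 256, cipher.size, dtype=np.uint8)
    flat = cipher.ravel() ^ stream
    out = np.empty_like(flat)
    out[perm] = flat                                        # undo the permutation
    return out.reshape(cipher.shape)

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
enc = toy_encrypt(img, key=1234)
assert np.array_equal(toy_decrypt(enc, key=1234), img)      # lossless round trip
```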

60 citations


Journal ArticleDOI
TL;DR: The results suggest that video artifacts have no influence on the deployment of visual attention, even though these artifacts have been judged by observers as at least annoying.
Abstract: The deployment of visual attention in a visual scene is contingent upon a number of factors. The relationship between the observer's attention and the visual quality of the scene is investigated in this paper: can a video artifact disturb the observer's attention? To answer this question, two experiments have been conducted. First, the eye movements of human observers were recorded while they watched ten video clips of natural scenes under a free-viewing task. These clips were impaired to varying degrees by a video encoding scheme (H.264/AVC). The second experiment relies on subjective rating of the quality of the video clips. A quality score was then assigned to each clip, indicating the extent to which the impairments were visible. The standardized double stimulus impairment scale (DSIS) method was used, meaning that each observer viewed the original clip followed by its impaired version. The results of both experiments were analyzed jointly. Our results suggest that video artifacts have no influence on the deployment of visual attention, even though these artifacts were judged by observers as at least annoying.

47 citations


Journal ArticleDOI
TL;DR: This method uses reversible data embedding to embed the recovery data and hash value for reversibility and authentication, respectively, to prevent unauthorized users from approximating the original pixels in the watermarked region.
Abstract: This paper proposes a secure reversible visible watermarking approach. The proposed pixel mapping function superposes a binary watermark image on a host image to create an intermediate visible watermarked image. Meanwhile, an almost-inverse function generates the recovery data for restoring the original pixels. To prevent unauthorized users from approximating the original pixels in the watermarked region, this method adds an integer sequence to the intermediate watermarked image. The sequence is composed of integers generated by two random variables having normal distributions with zero means and distinct variances. The variances facilitate a trade-off between the watermark transparency and the noise generated by unauthorized users. The proposed method also uses Lagrange multipliers to find the optimized variances for this trade-off. Finally, the method uses reversible data embedding to embed the recovery data and a hash value for reversibility and authentication, respectively. Experimental results show the watermark visibility for test images along with the watermark transparency for different variances. Using the optimized variances, the watermarked image achieves a balance between watermark transparency and the noise generated by unauthorized users.
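A tiny sketch of the noise-sequence idea: integers drawn from two zero-mean normal distributions with distinct variances are added inside the watermarked region. The variances, mask, and seed are arbitrary placeholders rather than the Lagrange-optimized values from the paper.

```python
import numpy as np

def add_masking_noise(image, region_mask, sigma_a=2.0, sigma_b=8.0, seed=7):
    """Add an integer noise sequence, mixed from two zero-mean normal
    distributions with different variances, inside the watermarked region."""
    rng = np.random.default_rng(seed)
    out = image.astype(np.int32).copy()
    n = int(region_mask.sum())
    pick = rng.integers(0, 2, n).astype(bool)           # choose a distribution per pixel
    noise = np.where(pick,
                     rng.normal(0.0, sigma_a, n),
                     rng.normal(0.0, sigma_b, n)).round().astype(np.int32)
    out[region_mask] += noise
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.full((8, 8), 128, dtype=np.uint8)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True                                   # hypothetical watermark region
print(add_masking_noise(img, mask))
```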

45 citations


Journal ArticleDOI
TL;DR: To protect the copyrights of digital content securely and without quality degradation, this study develops a process for generating a 2D barcode from a UCI tag and watermarking the barcode into the digital content.
Abstract: A digital object identifier refers to diverse technologies associated with assigning an identifier to a digital resource and managing the identification system. One type of implementation of a digital object identifier developed by the Korean Government is termed the Universal Content Identifier (UCI) system. It circulates and utilizes identifiable resources efficiently by connecting various online and offline identifying schemes. UCI tags can contain not only identifiers but also abundant additional information regarding the content. So, researchers and practitioners have shown great interest in methods that utilize the two-dimensional barcode (2D barcode) to attach UCI tags to digital content. However, attaching a 2D barcode directly to digital content raises two problems. First, the quality of the content may deteriorate due to the insertion of the barcode; second, a malicious user can invalidate the identifying tag simply by removing it from the original content. We believe that these concerns can be mitigated by inserting an invisible digital tag containing information about an identifier and digital copyrights into the entire area of the digital content. In this study, to protect the copyrights of digital content securely and without quality degradation, we develop a process for generating a 2D barcode from a UCI tag and watermarking the barcode into the digital content. Such a UCI system can be widely applied to areas such as e-learning, distribution tracking, transaction certification, and reference linking services when the system is equipped with 2D barcode technology and secure watermarking algorithms. The latter part of this paper analyzes intensive experiments conducted to evaluate the robustness of traditional digital watermarking algorithms against external attacks.

Journal ArticleDOI
Liquan Shen1, Zhi Liu1, Tao Yan1, Zhaoyang Zhang1, Ping An1 
TL;DR: This paper proposes to reduce the complexity of the ME and DE processes with an early SKIP mode decision algorithm based on the analysis of the prediction mode distribution of the corresponding MBs in the neighbor view, achieving a computational saving of 46-57% with no significant loss of rate-distortion performance.
Abstract: In the joint multiview video model (JMVM) proposed by the JVT, variable block-size motion estimation (ME) and disparity estimation (DE) are employed to determine the best coding mode for each macroblock (MB). These give high coding efficiency for multiview video coding (MVC); however, they cause very high computational complexity in the encoding system. This paper proposes to reduce the complexity of the ME and DE processes with an early SKIP mode decision algorithm based on the analysis of the prediction mode distribution of the corresponding MBs in the neighbor view. In this method, the mode decision procedure for most MBs can be terminated early, and thus much of the computation for ME and DE can be avoided. Simulation results demonstrate that our algorithm can achieve a computational saving of 46-57% (depending on the tested sequences) with no significant loss of rate-distortion performance.
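As an illustrative sketch only (not the paper's exact criterion), the early-termination idea can be expressed as a small decision function that inspects the prediction modes of the co-located macroblock and its neighbours in the already-coded neighbor view; the window size and mode labels are hypothetical.

```python
def early_skip_decision(neighbor_view_modes, mb_x, mb_y, window=1):
    """Return True if the full ME/DE mode search for this macroblock can be
    skipped, based on the modes of co-located MBs in the neighbor view.

    neighbor_view_modes : 2-D list of mode strings for the neighbor view
    (mb_x, mb_y)        : macroblock coordinates in the current view
    window              : how many surrounding MBs to inspect (hypothetical)
    """
    h, w = len(neighbor_view_modes), len(neighbor_view_modes[0])
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            y, x = mb_y + dy, mb_x + dx
            if 0 <= y < h and 0 <= x < w and neighbor_view_modes[y][x] != "SKIP":
                return False          # a non-SKIP neighbour: run the full search
    return True                       # all co-located MBs are SKIP: stop early

modes = [["SKIP"] * 4 for _ in range(4)]
modes[1][2] = "INTER_16x8"
print(early_skip_decision(modes, mb_x=0, mb_y=0))   # True
print(early_skip_decision(modes, mb_x=2, mb_y=1))   # False
```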

Journal ArticleDOI
TL;DR: Results obtained reveal that the proposed environment provides a better solution, offering a scriptable program to establish communication between the field programmable gate array (FPGA) with its IP cores and their host application, power consumption estimation for the partial reconfiguration area, and automatic generation of the partial and initial bitstreams.
Abstract: This paper describes a dynamic partial reconfiguration (DPR) design flow and environment for image and signal processing algorithms used in adaptive applications. Based on an evaluation of existing DPR design flows, important features such as overall flexibility, application and standardised interfaces, host applications and DPR area/size placement have been taken into consideration in the proposed design flow and environment. Three intellectual property (IP) cores used in the pre-processing and transform blocks of compression systems, including colour space conversion (CSC), the two-dimensional biorthogonal discrete wavelet transform (2-D DBWT) and the three-dimensional Haar wavelet transform (3-D HWT), have been selected to validate the proposed DPR design flow and environment. Results obtained reveal that the proposed environment provides a better solution, offering a scriptable program to establish communication between the field programmable gate array (FPGA) with its IP cores and their host application, power consumption estimation for the partial reconfiguration area, and automatic generation of the partial and initial bitstreams. The design exploration offered by the proposed DPR environment allows the generation of efficient IP cores with optimised area/speed ratios. An analysis of the bitstream size and dynamic power consumption for both static and reconfigurable areas is also presented in this paper.

Journal ArticleDOI
TL;DR: The proposed approach to data hiding in 3D objects is based on a minimum spanning tree (MST) and is lossless in the sense that the positions of the vertices remain the same before and after embedding.
Abstract: Data hiding has become increasingly important for many applications, like confidential transmission, video surveillance, military and medical applications. In this paper we present a new approach to 3D object data hiding that does not change the position of the vertices in 3D space. The main idea of the proposed method is to find and to synchronize particular areas of the 3D objects used to embed the message. The embedding is carried out by changing the connectivity of edges in the selected areas composed of quadruples. The proposed approach to data hiding in 3D objects is based on a minimum spanning tree (MST). This method is lossless in the sense that the positions of the vertices remain the same before and after embedding. Moreover, the method is blind and does not depend on the order of the data in the files. This approach is very interesting when the 3D objects have been digitized with high precision.

Journal ArticleDOI
TL;DR: A novel tag refinement technique that aims at differentiating noisy tag assignments from correct tag assignments is discussed, which increases the effectiveness of tag recommendation for non-annotated images by 45% when using the P@5 metric and by 41% when using the NDCG metric.
Abstract: Noisy tag assignments lower the effectiveness of multimedia applications that rely on the availability of user-supplied tags for retrieving user-contributed images for further processing. This paper discusses a novel tag refinement technique that aims at differentiating noisy tag assignments from correct tag assignments. The correctness of tag assignments is determined through the combined use of visual similarity and tag co-occurrence statistics. To verify the effectiveness of our tag refinement technique, experiments were performed with user-contributed images retrieved from Flickr. For the image set used, the proposed tag refinement technique reduces the number of noisy tag assignments by 36% (benefit), while removing 10% of the correct tag assignments (cost). In addition, our tag refinement technique increases the effectiveness of tag recommendation for non-annotated images by 45% when using the P@5 metric and by 41% when using the NDCG metric.
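A hedged sketch of the scoring idea described above, combining tag co-occurrence with votes from visually similar images; the mixing weight, data structures, and function name are illustrative assumptions, not the paper's formulation.

```python
from collections import Counter

def tag_scores(image_tags, cooccurrence, visual_neighbors, alpha=0.5):
    """Score each tag of an image by combining (i) how often it co-occurs with
    the image's other tags and (ii) how often visually similar images carry it.
    Low-scoring tags are treated as noisy. alpha is an arbitrary mixing weight."""
    neighbor_counts = Counter(t for tags in visual_neighbors for t in tags)
    scores = {}
    for tag in image_tags:
        others = [t for t in image_tags if t != tag]
        co = sum(cooccurrence.get(frozenset((tag, o)), 0.0) for o in others)
        co = co / max(len(others), 1)                       # co-occurrence evidence
        vis = neighbor_counts[tag] / max(len(visual_neighbors), 1)  # visual evidence
        scores[tag] = alpha * co + (1 - alpha) * vis
    return scores

cooc = {frozenset(("beach", "sea")): 0.9, frozenset(("beach", "party")): 0.1,
        frozenset(("sea", "party")): 0.05}
neighbors = [["beach", "sea"], ["beach", "sand"], ["sea"]]
print(tag_scores(["beach", "sea", "party"], cooc, neighbors))
```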

Journal ArticleDOI
TL;DR: Experimental results show that the improved H.264/AVC comprehensive video encryption scheme is computationally efficient, the encryption process does not greatly affect the compression ratio, and the encryption/decryption process hardly affects the video quality.
Abstract: An improved H.264/AVC comprehensive video encryption scheme is proposed. In the proposed scheme, the intra-prediction mode, motion vector difference, and quantization coefficients are encrypted. A novel hierarchical key generation method is likewise proposed, in which the encryption keys are generated based on a cryptographic hash function. The generated frame keys are consistent with the corresponding frame serial numbers, which ensures frame synchronization in the decrypting process when frame loss occurs. This property makes our scheme secure against some special attacks on video, such as the frame regrouping attack and the frame erasure attack. Our method not only avoids the distribution of encryption keys, but also increases security. Experimental results show that the proposed scheme is computationally efficient, the encryption process does not greatly affect the compression ratio, and the encryption/decryption process hardly affects the video quality.
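The frame-key idea above, in which keys are derived with a cryptographic hash so that each key is tied to its frame serial number, can be sketched as follows; the exact key hierarchy and cipher used in the paper are not reproduced here.

```python
import hashlib

def frame_key(master_key: bytes, frame_number: int, length: int = 16) -> bytes:
    """Derive a per-frame key from a master key and the frame serial number
    using SHA-256, so that a lost frame does not desynchronise later keys."""
    digest = hashlib.sha256(master_key + frame_number.to_bytes(8, "big")).digest()
    return digest[:length]

master = b"session master key"
keys = {n: frame_key(master, n) for n in (0, 1, 2, 5)}   # frames 3-4 lost in transit
print(keys[5].hex())                                     # still computable without keys 3 and 4
```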

Journal ArticleDOI
TL;DR: This paper proposes a novel Wyner-Ziv successive refinement approach to improve the motion compensation accuracy and the overall compression efficiency of Wyner -Ziv video coding.
Abstract: Wyner-Ziv coding enables low complexity video encoding with the motion estimation procedure shifted to the decoder. However, the accuracy of decoder motion estimation is often low, due to the absence of the input source frame (at the decoder). In this paper, we propose a novel Wyner-Ziv successive refinement approach to improve the motion compensation accuracy and the overall compression efficiency of Wyner-Ziv video coding. Our approach encodes each frame by multiple Wyner-Ziv coding layers and uses the progressively refined reconstruction frame to guide the motion estimation for progressively improved accuracy. The proposed approach yields competitive results against state-of-the-art low complexity Wyner-Ziv video coding approaches, and can gain up to 3.8dB over the conventional Wyner-Ziv video coding approach and up to 1.5dB over the previous bitplane-based refinement approach. Furthermore, this paper also presents the rate distortion analysis and the performance comparison of the proposed approach and conventional approaches. The rate distortion performance loss (due to performing decoder motion estimation) is at most 2.17dB (or equivalently 14nats/pixel) in our scheme according to our analysis, but can be more than 6dB in the conventional approach according to previous research. For the simplified two-layers case of our approach, we derive the optimal subsampling ratio in the sense of rate distortion performance. We also extend our analysis and conclusions from P frame to B frame. Finally, we verify our analysis by experimental results.

Journal ArticleDOI
TL;DR: A scalable indexing of video content by objects is proposed based on the wavelet decomposition used in the JPEG2000 standard; it relies on the combination of robust global motion estimation with morphological colour segmentation at a low spatial resolution.
Abstract: With the exponentially growing quantity of video content in various formats, including the popularisation of HD (High Definition) video and cinematographic content, the problem of efficient indexing and retrieval in video databases becomes crucial. Although efficient methods have been designed for frame-based queries on video with local features, object-based indexing and retrieval attracts the attention of the research community with the appealing possibility of formulating meaningful queries on semantic objects. In the case of HD video, the principle of scalability addressed by current compression standards is of great importance. It allows for indexing and retrieval at the lower resolutions available in the compressed bit-stream. The wavelet decomposition used in the JPEG2000 standard provides this property. In this paper, we propose a scalable indexing of video content by objects. First, a method for scalable moving object extraction is designed. Using the wavelet data, it relies on the combination of robust global motion estimation with morphological colour segmentation at a low spatial resolution. It is then refined using the scalable order of the data. Second, a descriptor is built only on the extracted objects. This descriptor is based on multi-scale histograms of the wavelet coefficients of the objects. Comparison with SIFT features extracted on segmented object masks gives promising results.

Journal ArticleDOI
TL;DR: Considering the statistical characteristics of residual data in lossless video coding, this work redesigns each entropy coding method based on the conventional entropy coders in H.264/AVC, with the result that the proposed method provides not only a bit saving of 8% but also reduced computational complexity compared to the current H.264/AVC lossless coding mode.
Abstract: Context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC) are entropy coding methods employed in the H.264/AVC standard. Since these entropy coders are originally designed for encoding residual data that are zigzag-scanned and quantized transform coefficients, they cannot provide adequate coding performance for lossless video coding, where the residual data are not quantized transform coefficients but the differential pixel values between the original and predicted pixel values. Therefore, considering the statistical characteristics of residual data in lossless video coding, we redesign each entropy coding method based on the conventional entropy coders in H.264/AVC. From the experimental results, we have verified that the proposed method provides not only a bit saving of 8% but also reduced computational complexity compared to the current H.264/AVC lossless coding mode.

Journal ArticleDOI
TL;DR: This paper surveys the CQ problem and provides a detailed analytical formulation of it, shedding light on some details of the optimization process and finding that state-of-the-art algorithms have a suboptimal step.
Abstract: Context-based lossless coding suffers in many cases from the so-called context dilution problem, which arises when, in order to model high-order statistical dependencies among the data, a large number of contexts is used. In this case the learning process cannot be fed with enough data, and so the probability estimation is not reliable. To avoid this problem, state-of-the-art algorithms for lossless image coding resort to context quantization (CQ) into a few conditioning states, whose statistics are easier to estimate in a reliable way. It was recognized early on that, in order to achieve the best compression ratio, contexts have to be grouped according to a maximal mutual information criterion. This leads to quantization algorithms which are able to determine a local minimum of the coding cost in the general case, and even the global minimum in the case of binary-valued input. This paper surveys the CQ problem and provides a detailed analytical formulation of it, shedding light on some details of the optimization process. As a consequence, we find that state-of-the-art algorithms have a suboptimal step. The proposed approach allows a steeper path toward the cost function minimum. Moreover, some sufficient conditions are found that allow a globally optimal solution to be found even when the input alphabet is not binary. Even though the paper mainly focuses on the theoretical aspects of CQ, a number of experiments to validate the proposed method have been performed (for the special case of lossless coding of segmentation maps), and encouraging results have been recorded.
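As a small worked example of the cost function that drives context quantization: the expected code length obtained when raw contexts are merged into conditioning states is the conditional entropy of the symbol given the grouping. The counts below are synthetic, and the comparison of two groupings only illustrates why merging statistically similar contexts is preferred.

```python
import numpy as np

def grouping_cost_bits(counts, grouping):
    """Expected bits/symbol when raw contexts are merged into groups.

    counts   : array of shape (n_contexts, n_symbols) with symbol counts
    grouping : list of lists; each inner list holds the raw contexts merged
               into one conditioning state
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    cost = 0.0
    for group in grouping:
        c = counts[group].sum(axis=0)                   # pooled symbol counts for the state
        p = c / c.sum()
        h = -np.sum(p[p > 0] * np.log2(p[p > 0]))       # entropy of the pooled state
        cost += (c.sum() / total) * h
    return cost

counts = [[90, 10], [80, 20], [15, 85], [5, 95]]        # 4 binary-symbol contexts
print(grouping_cost_bits(counts, [[0, 1], [2, 3]]))     # merge statistically similar contexts
print(grouping_cost_bits(counts, [[0, 2], [1, 3]]))     # a clearly worse grouping
```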

Journal ArticleDOI
TL;DR: A semantic framework for jointly performing weakly supervised video genre classification and event analysis using probabilistic models for MPEG video streams is presented, and several computable semantic features that can accurately reflect the event attributes are derived.
Abstract: Semantic video analysis is a key issue in digital video applications, including video retrieval, annotation, and management. Most existing work on semantic video analysis is mainly focused on event detection for specific video genres, while genre classification is treated as an independent issue. In this paper, we present a semantic framework for weakly supervised video genre classification and event analysis jointly, using probabilistic models for MPEG video streams. Several computable semantic features that can accurately reflect the event attributes are derived. Based on an intensive analysis of the connection between video genres and the contextual relationship among events, as well as the statistical characteristics of the dominant event, an analysis algorithm based on a hidden Markov model (HMM) and a naive Bayesian classifier (NBC) is proposed for video genre classification. Another Gaussian mixture model (GMM) is built to detect the contained events using the same semantic features, whilst an event adjustment strategy is proposed according to an analysis of the GMM structure and the pre-definition of video events. Subsequently, a special event is recognized based on the detected events by another HMM. Simulation experiments on video genre classification and event analysis using a large number of video data sets demonstrate the promising performance of the proposed framework for semantic video analysis.

Journal ArticleDOI
TL;DR: A rate-distortion model is derived that takes into account the position of the side information in the quantization bin and is used to perform mode decision at the coefficient level and bitplane level.
Abstract: Distributed video coding (DVC) features simple encoders but complex decoders, which lies in contrast to conventional video compression solutions such as H.264/AVC. This shift in complexity is realized by performing motion estimation at the decoder side instead of at the encoder, which brings a number of problems that need to be dealt with. One of these problems is that, while employing different coding modes yields significant coding gains in classical video compression systems, it is still difficult to fully exploit this in DVC without increasing the complexity at the encoder side. Therefore, in this paper, instead of using an encoder-side approach, techniques for decoder-side mode decision are proposed. A rate-distortion model is derived that takes into account the position of the side information in the quantization bin. This model is then used to perform mode decision at the coefficient level and bitplane level. Average rate gains of 13-28% over the state-of-the-art DISCOVER codec are reported, for a GOP of size four, for several test sequences.

Journal ArticleDOI
TL;DR: An objective model is proposed to predict overall video quality by integrating the contributions of a spatial quality and a temporal quality, and the non-linear model shows a very high linear correlation with subjective data.
Abstract: Video services have appeared in recent years due to advances in video coding and convergence to IP networks. As these emerging services mature, the ability to deliver adequate quality to end-users becomes increasingly important. However, the transmission of digital video over error-prone and bandwidth-limited networks may produce spatial and temporal visual distortions in the decoded video. Both types of impairments affect the perceived video quality. In this paper, we examine the impact of spatio-temporal artefacts in video and especially how both types of errors interact to affect the overall perceived video quality. We show that the impact of the spatial quality on overall video quality is dependent on the temporal quality and vice-versa. We observe that the introduction of a degradation in one modality affects the quality perception in the other modality, and this change is larger for high-quality conditions than for low-quality conditions. The contribution of the spatial quality to the overall quality is found to be greater than the contribution of the temporal quality. Our results also indicate that low-motion talking-head content can be more negatively affected by temporal frame-freezing artefacts than other, more general types of content with higher motion. Based on the results of a subjective experiment, we propose an objective model to predict overall video quality by integrating the contributions of a spatial quality and a temporal quality. The non-linear model shows a very high linear correlation with the subjective data.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed adaptive morphological dilation image coding with context weights prediction outperforms state-of-the-art image coding algorithms.
Abstract: This paper proposes an adaptive morphological dilation image coding scheme with context weights prediction. The new dilation method does not use fixed models; instead, it decides whether a coefficient needs to be dilated according to the coefficient's predicted significance degree. It includes two key dilation technologies: (1) controlling the dilation process with context weights to reduce the output of insignificant coefficients, and (2) using variable-length group test coding with context weights to adjust the coding order and spend as few bits as possible on representing high-probability events. Moreover, we also propose a novel context weight strategy to predict a coefficient's significance degree more accurately, which can be used for both dilation technologies. Experimental results show that our proposed method outperforms state-of-the-art image coding algorithms.

Journal ArticleDOI
TL;DR: Overall architectures are obtained that provide a trade-off between computational complexity and rate-distortion performance and whose complexity is significantly reduced when compared to cascaded decoder-encoder solutions, which are typically used for H.264/AVC transcoding.
Abstract: In this paper, efficient solutions for requantization transcoding in H.264/AVC are presented. By requantizing residual coefficients in the bitstream, different error components can appear in the transcoded video stream. Firstly, a requantization error is present due to successive quantization in encoder and transcoder. In addition to the requantization error, the loss of information caused by coarser quantization will propagate due to dependencies in the bitstream. Because of the use of intra prediction and motion-compensated prediction in H.264/AVC, both spatial and temporal drift propagation arise in transcoded H.264/AVC video streams. The spatial drift in intra-predicted blocks results from mismatches in the surrounding prediction pixels as a consequence of requantization. In this paper, both spatial and temporal drift components are analyzed. As is shown, spatial drift has a determining impact on the visual quality of transcoded video streams in H.264/AVC. In particular, this type of drift results in serious distortion and disturbing artifacts in the transcoded video stream. In order to avoid the spatially propagating distortion, we introduce transcoding architectures based on spatial compensation techniques. By combining the individual temporal and spatial compensation approaches and applying different techniques based on the picture and/or macroblock type, overall architectures are obtained that provide a trade-off between computational complexity and rate-distortion performance. The complexity of the presented architectures is significantly reduced when compared to cascaded decoder-encoder solutions, which are typically used for H.264/AVC transcoding. The reduction in complexity is particularly large for the solution which uses spatial compensation only. When compared to traditional solutions without spatial compensation, both visual and objective quality results are highly improved.

Journal ArticleDOI
TL;DR: This paper focuses on developing a fast and well-performing video scene detection method that uses multiple features extracted from the video and determines the video scene boundaries through an unsupervised clustering procedure.
Abstract: One of the fundamental steps in organizing videos is to parse them into smaller descriptive parts. One way of realizing this step is to obtain shot or scene information. One or more consecutive, semantically correlated shots sharing the same content make up a video scene. On the other hand, video scenes differ from shots in their boundary definitions: video scenes have semantic boundaries, whereas shots are defined by physical boundaries. In this paper, we concentrate on developing a fast and well-performing video scene detection method. Our graph-partition-based video scene boundary detection approach uses multiple features extracted from the video and determines the video scene boundaries through an unsupervised clustering procedure. For each shot-to-shot comparison feature, a one-dimensional signal is constructed from graph partitions obtained from the similarity matrix in a temporal interval. After each one-dimensional signal is filtered, unsupervised clustering is conducted to find the video scene boundaries. We adopt two different graph-based approaches in a single framework in order to find video scene boundaries. The proposed graph-based video scene boundary detection method is evaluated and compared with the graph-based video scene detection method presented in the literature.
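One hedged reading of the "one-dimensional signal from graph partitions in a temporal interval" step is sketched below: within a sliding window over the shots, a normalised cross-similarity (cut-like) value is computed between the shots before and after each candidate boundary, and minima of the resulting signal suggest scene boundaries. The window size and normalisation are illustrative choices, not the paper's.

```python
import numpy as np

def partition_signal(similarity, half_window=3):
    """1-D boundary signal from a shot-to-shot similarity matrix: for each
    candidate boundary, the normalised cross-similarity between the shots
    just before and just after it inside a temporal window."""
    n = similarity.shape[0]
    signal = np.full(n, np.nan)
    for t in range(half_window, n - half_window):
        past = range(t - half_window, t)
        future = range(t, t + half_window)
        cut = similarity[np.ix_(past, future)].sum()
        volume = (similarity[np.ix_(past, past)].sum()
                  + similarity[np.ix_(future, future)].sum())
        signal[t] = cut / (volume + 1e-12)        # low values -> likely scene boundary
    return signal

sim = np.ones((12, 12)) * 0.1
sim[:6, :6] += 0.8                                # two blocks of mutually similar shots
sim[6:, 6:] += 0.8
print(np.nanargmin(partition_signal(sim)))        # 6: the block boundary
```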

Journal ArticleDOI
TL;DR: By combining the above two aspects, a new distance measure is defined, and a novel approach to automatically determine the edge weights in graph-based semi-supervised learning is proposed.
Abstract: We address the task of view-based 3D object retrieval, in which each object is represented by a set of views taken from different positions, rather than by a geometrical model based on polygonal meshes. As the number of views and the viewpoint setting cannot always be the same for different objects, the retrieval task is more challenging and the existing methods for 3D model retrieval are infeasible. In this paper, the information in the sets of views is exploited from two aspects. On the one hand, the histogram representation is converted from a vector into a state sequence, and a Markov chain (MC) is utilized to model the statistical characteristics of all the views representing the same object. On the other hand, the earth mover's distance (EMD) is employed to achieve many-to-many matching between two sets of views. For 3D object retrieval, by combining the above two aspects, a new distance measure is defined, and a novel approach to automatically determine the edge weights in graph-based semi-supervised learning is proposed. Experimental results on different databases demonstrate the effectiveness of our proposal.
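To make the Markov-chain side concrete, a sketch is given below in which each view's histogram is quantized to a discrete state, the state sequence over an object's views yields a transition matrix, and objects are compared through these matrices. The quantizer and the Frobenius distance are deliberately simple placeholders for the paper's actual modeling and matching.

```python
import numpy as np

def view_states(view_histograms, n_states=4):
    """Quantize each view histogram to a state: here simply the index of its
    dominant bin, folded into n_states (a placeholder quantizer)."""
    return [int(np.argmax(h)) % n_states for h in view_histograms]

def transition_matrix(states, n_states=4):
    """Row-normalised Markov transition matrix estimated from a state sequence."""
    m = np.full((n_states, n_states), 1e-6)       # small prior to avoid zero rows
    for a, b in zip(states[:-1], states[1:]):
        m[a, b] += 1.0
    return m / m.sum(axis=1, keepdims=True)

def mc_distance(m1, m2):
    """A simple Frobenius distance between transition matrices (placeholder)."""
    return float(np.linalg.norm(m1 - m2))

rng = np.random.default_rng(4)
obj_a = [rng.dirichlet(np.ones(8)) for _ in range(20)]   # 20 views per object
obj_b = [rng.dirichlet(np.ones(8)) for _ in range(20)]
ma = transition_matrix(view_states(obj_a))
mb = transition_matrix(view_states(obj_b))
print(round(mc_distance(ma, mb), 3))
```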

Journal ArticleDOI
TL;DR: The proposed VBDF can reduce the blocking artifacts, prevent excessive blurring effects, and achieve about 30-40% computational speedup at about the same PSNR compared with the existing methods.
Abstract: H.264/AVC supports variable block motion compensation, multiple reference frames, 1/4-pixel motion vector accuracy, and an in-loop deblocking filter, compared with previous video coding standards. While these coding techniques provide major gains in video compression, they also lead to high computational complexity. For the H.264 video coding techniques to be applied more extensively on low-end/low-bit-rate terminals, it is essential to improve the coding efficiency. Currently, the H.264 deblocking filter, which can improve the subjective quality of video, is hardly used on low-end terminals due to its computational complexity. In this paper, we propose an enhanced deblocking filter method that efficiently reduces the blocking artifacts occurring in low-bit-rate video coding. In the 'variable block-based deblocking filter (VBDF)' proposed in this paper, the temporal and spatial characteristics of moving pictures are extracted using the variable block-size information of motion compensation, the filter mode is classified into four different modes according to the moving-picture characteristics, and adaptive filtering is executed separately for each mode. The proposed VBDF can reduce the blocking artifacts, prevent excessive blurring effects, and achieve about 30-40% computational speedup at about the same PSNR compared with the existing methods.

Journal ArticleDOI
TL;DR: A correlation model able to adapt to changes in the content and the coding parameters by exploiting the spatial correlation of the video signal and the quantization distortion is developed and experiments suggest that the performance of distributed coders can be significantly improved by taking video content and coding parameters into account.
Abstract: Aiming for low-complexity encoding, video coders based on Wyner-Ziv theory are still unsuccessfully trying to match the performance of predictive video coders. One of the most important factors concerning the coding performance of distributed coders is modeling and estimating the correlation between the original video signal and its temporal prediction generated at the decoder. One of the problems of the state-of-the-art correlation estimators is that their performance is not consistent across a wide range of video content and different coding settings. To address this problem we have developed a correlation model able to adapt to changes in the content and the coding parameters by exploiting the spatial correlation of the video signal and the quantization distortion. In this paper we describe our model and present experiments showing that our model provides average bit rate gains of up to 12% and average PSNR gains of up to 0.5dB when compared to the state-of-the-art models. The experiments suggest that the performance of distributed coders can be significantly improved by taking video content and coding parameters into account.

Journal ArticleDOI
Lei Wang1, Licheng Jiao1, Jiaji Wu1, Guangming Shi1, Yanjun Gong1 
TL;DR: Simulation results show that the RTDLT-based compression system obtains a lossless compression ratio comparable to, or even higher than, that of JPEG2000 and JPEG-LS, as well as gratifying rate-distortion performance in lossy compression.
Abstract: In this paper, a reversible integer-to-integer time domain lapped transform (RTDLT) is introduced. TDLT can be taken as a combination of time domain pre- and post-filter modules with the discrete cosine transform (DCT). Different from TDLT, the filters and DCT in our proposed RTDLT are realized from integer to integer by multi-lifting implementations after factorizing the filtering and transform matrices into triangular elementary reversible matrices (TERMs). The lifting steps are realized using only shifts and additions, without any floating-point multiplications, to reduce complexity. The proposed method can realize progressive lossy-to-lossless image compression with a single bit-stream. Simulation results show that the RTDLT-based compression system obtains a lossless compression ratio comparable to, or even higher than, that of JPEG2000 and JPEG-LS, as well as gratifying rate-distortion performance in lossy compression. In addition, RTDLT keeps the hardware realization low in complexity because it can be implemented in parallel at the block level.
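The integer-to-integer, lifting-style reversibility that RTDLT builds on can be illustrated with the simplest possible case, the integer Haar (S) transform, which uses only additions and a floor division by two (an arithmetic shift) and is exactly invertible; this sketch illustrates the principle only, not the RTDLT factorization itself.

```python
import numpy as np

def integer_haar_forward(x):
    """Integer-to-integer Haar (S) transform of an even-length 1-D signal:
    only additions and a floor division by two, exactly reversible."""
    x = np.asarray(x, dtype=np.int64)
    a, b = x[0::2], x[1::2]
    d = a - b                        # detail (high-pass)
    s = b + d // 2                   # approximation (low-pass), floor((a + b) / 2)
    return s, d

def integer_haar_inverse(s, d):
    b = s - d // 2                   # exactly undoes the forward rounding
    a = d + b
    out = np.empty(2 * s.size, dtype=np.int64)
    out[0::2], out[1::2] = a, b
    return out

x = np.array([12, 7, 200, 199, 0, 255, 3, 4])
s, d = integer_haar_forward(x)
assert np.array_equal(integer_haar_inverse(s, d), x)   # lossless round trip
```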