
Showing papers in "IEEE Transactions on Circuits and Systems for Video Technology in 2010"


Journal ArticleDOI
TL;DR: A fully pipelined stereo vision system providing a dense disparity image with additional sub-pixel accuracy in real time on a single field programmable gate array (FPGA), without the need for any external devices, is proposed.
Abstract: Stereo vision is a well-known ranging method because it resembles the basic mechanism of the human eye. However, the computational complexity and large amount of data access make real-time processing of stereo vision challenging because of the inherent instruction cycle delay within conventional computers. In order to solve this problem, the past 20 years of research have focused on the use of dedicated hardware architecture for stereo vision. This paper proposes a fully pipelined stereo vision system providing a dense disparity image with additional sub-pixel accuracy in real time. The entire stereo vision process, including rectification, stereo matching, and post-processing, is realized using a single field programmable gate array (FPGA) without the need for any external devices. The hardware implementation is more than 230 times faster than a software program operating on a conventional computer, and shows better performance than previous hardware implementations.
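The stereo-matching stage of such a pipeline reduces, in its simplest software form, to window-based block matching. The sketch below is a plain sum-of-absolute-differences matcher with an assumed window size and disparity range; it omits the paper's rectification, sub-pixel refinement, and post-processing, and only illustrates the computation the FPGA pipelines.

```python
import numpy as np

def sad_disparity(left, right, max_disp=16, win=3):
    """Dense disparity map by sum-of-absolute-differences block matching.

    A minimal software sketch of the matching stage only; window size and
    disparity range are illustrative assumptions.
    """
    h, w = left.shape
    pad = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(pad, h - pad):
        for x in range(pad + max_disp, w - pad):
            patch = left[y - pad:y + pad + 1, x - pad:x + pad + 1].astype(np.int32)
            best, best_d = None, 0
            for d in range(max_disp):
                cand = right[y - pad:y + pad + 1,
                             x - d - pad:x - d + pad + 1].astype(np.int32)
                cost = np.abs(patch - cand).sum()  # SAD matching cost
                if best is None or cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

In hardware, the inner cost loop is what gets unrolled into a fully pipelined array of absolute-difference and compare units.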

292 citations


Journal ArticleDOI
TL;DR: In this letter, an enhanced pixel domain JND model with a new algorithm for CM estimation is proposed; compared with existing models, the proposed one shows advantages brought by better EM and TM estimation.
Abstract: In just noticeable difference (JND) models, evaluation of contrast masking (CM) is a crucial step. More specifically, CM due to edge masking (EM) and texture masking (TM) needs to be distinguished because of the entropy masking property of the human visual system. However, TM is not estimated accurately in the existing JND models since they fail to distinguish TM from EM. In this letter, we propose an enhanced pixel domain JND model with a new algorithm for CM estimation. In our model, total-variation-based image decomposition is used to decompose an image into a structural image (i.e., cartoon-like, piecewise smooth regions with sharp edges) and a textural image for estimation of EM and TM, respectively. Compared with the existing models, the proposed one shows advantages brought by the better EM and TM estimation. It has also been applied to noise shaping and visual distortion gauging, and favorable results are demonstrated by experiments on different images.

218 citations


Journal ArticleDOI
TL;DR: A foveation model as well as a foveated JND (FJND) model in which the spatial and temporal JND models are enhanced to account for the relationship between visibility and eccentricity is described.
Abstract: Traditional video compression methods remove spatial and temporal redundancy based on the signal statistical correlation. However, to reach higher compression ratios without perceptually degrading the reconstructed signal, the properties of the human visual system (HVS) need to be better exploited. Research effort has been dedicated to modeling the spatial and temporal just-noticeable-distortion (JND) based on the sensitivity of the HVS to luminance contrast, and accounting for spatial and temporal masking effects. This paper describes a foveation model as well as a foveated JND (FJND) model in which the spatial and temporal JND models are enhanced to account for the relationship between visibility and eccentricity. Since the visual acuity decreases when the distance from the fovea increases, the visibility threshold increases with increased eccentricity. The proposed FJND model is then used for macroblock (MB) quantization adjustment in H.264/advanced video coding (AVC). For each MB, the quantization parameter is optimized based on its FJND information. The Lagrange multiplier in the rate-distortion optimization is adapted so that the MB noticeable distortion is minimized. The performance of the FJND model has been assessed with various comparisons and subjective visual tests. It has been shown that the proposed FJND model can increase the visual quality versus rate performance of the H.264/AVC video coding scheme.

194 citations


Journal ArticleDOI
TL;DR: This paper proposes and analyzes a new method for identifying fire in videos that analyzes the frame-to-frame changes of specific low-level features describing potential fire regions and combines them using a Bayes classifier for robust fire recognition.
Abstract: Automated fire detection is an active research topic in computer vision. In this paper, we propose and analyze a new method for identifying fire in videos. Computer vision-based fire detection algorithms are usually applied in closed-circuit television surveillance scenarios with controlled background. In contrast, the proposed method can be applied not only to surveillance but also to automatic video classification for retrieval of fire catastrophes in databases of newscast content. In the latter case, there are large variations in fire and background characteristics depending on the video instance. The proposed method analyzes the frame-to-frame changes of specific low-level features describing potential fire regions. These features are color, area size, surface coarseness, boundary roughness, and skewness within estimated fire regions. Because of flickering and random characteristics of fire, these features are powerful discriminants. The behavioral change of each one of these features is evaluated, and the results are then combined according to the Bayes classifier for robust fire recognition. In addition, a priori knowledge of fire events captured in videos is used to significantly improve the classification results. For edited newscast videos, the fire region is usually located in the center of the frames. This fact is used to model the probability of occurrence of fire as a function of the position. Experiments illustrated the applicability of the method.
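The final combination step described above can be illustrated with a naive Bayes fusion of per-feature likelihoods. The feature models themselves (for color, area size, coarseness, roughness, and skewness) are assumed to be given; the function below is a generic sketch, not the paper's exact classifier.

```python
import math

def bayes_fire_score(feature_probs_fire, feature_probs_nofire, prior_fire=0.5):
    """Combine independent per-feature likelihoods with a Bayes classifier.

    feature_probs_fire[i]  : P(feature_i | fire)
    feature_probs_nofire[i]: P(feature_i | no fire)   (assumed nonzero)
    Returns the posterior P(fire | features). The prior can encode the
    paper's position-dependent probability of fire in edited newscasts.
    """
    log_odds = math.log(prior_fire / (1 - prior_fire))
    for pf, pn in zip(feature_probs_fire, feature_probs_nofire):
        log_odds += math.log(pf / pn)  # accumulate per-feature evidence
    return 1.0 / (1.0 + math.exp(-log_odds))
```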

194 citations


Journal ArticleDOI
TL;DR: A novel video compression scheme based on a highly flexible hierarchy of unit representation which includes three block concepts: coding unit (CU), prediction unit (PU), and transform unit (TU), which was a candidate in the competitive phase of the high-efficiency video coding (HEVC) standardization work.
Abstract: This paper proposes a novel video compression scheme based on a highly flexible hierarchy of unit representation which includes three block concepts: coding unit (CU), prediction unit (PU), and transform unit (TU). This separation of the block structure into three different concepts allows each to be optimized according to its role; the CU is a macroblock-like unit which supports region splitting in a manner similar to a conventional quadtree, the PU supports nonsquare motion partition shapes for motion compensation, while the TU allows the transform size to be defined independently from the PU. Several other coding tools are extended to arbitrary unit size to maintain consistency with the proposed design, e.g., transform size is extended up to 64 × 64 and intra prediction is designed to support an arbitrary number of angles for variable block sizes. Other novel techniques, such as a new noncascading interpolation filter design allowing arbitrary motion accuracy and a leaky prediction technique using both open-loop and closed-loop predictors, are also introduced. The video codec described in this paper was a candidate in the competitive phase of the high-efficiency video coding (HEVC) standardization work. Compared to H.264/AVC, it demonstrated bit rate reductions of around 40% based on objective measures and around 60% based on subjective testing with 1080p sequences. It has been partially adopted into the first standardization model of the collaborative phase of the HEVC effort.

193 citations


Journal ArticleDOI
TL;DR: A novel readable data-hiding algorithm, which can embed data into the quantized discrete cosine transform (DCT) coefficients of I frames without bringing any intra-frame distortion drift into the H.264/AVC video host, is presented.
Abstract: Intra-frame distortion drift is a big problem of data hiding in H.264/AVC video streams. Based on a thorough investigation of this problem, a novel readable data-hiding algorithm, which can embed data into the quantized discrete cosine transform (DCT) coefficients of I frames without bringing any intra-frame distortion drift into the H.264/advanced video coding (AVC) video host, is presented in this paper. We exploit several paired-coefficients of a 4 × 4 DCT block to accumulate the embedding induced distortion. The directions of intra-frame prediction are utilized to avert the distortion drift. It is proved analytically and shown experimentally that the proposed algorithm can achieve high embedding capacity and low visual distortion. Performance comparisons with other existing schemes are provided to demonstrate the superiority of the proposed scheme.
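The paired-coefficient idea can be sketched as follows: a bit is embedded by adjusting one quantized coefficient, while its paired coefficient absorbs an opposite adjustment so that the combined change accumulates inside the block rather than drifting into intra-predicted neighbors. The parity rule and the fixed pair below are hypothetical toy choices; the paper selects its pairs according to the intra-prediction directions.

```python
def embed_bit_paired(coeffs, i, j, bit):
    """Embed one bit into quantized DCT coefficient i, compensating on its
    paired coefficient j (a toy parity rule, not the paper's exact scheme).

    Forces coeffs[i] to have the parity of `bit`; any change is mirrored on
    coeffs[j] so that coeffs[i] + coeffs[j] is preserved, which is the
    drift-cancellation idea in miniature.
    """
    c = list(coeffs)
    if c[i] % 2 != bit:
        c[i] += 1
        c[j] -= 1  # compensation keeps c[i] + c[j] unchanged
    return c
```

Extraction then only needs the parity of the chosen coefficient, which is why the hidden data remain readable without full decoding.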

181 citations


Journal ArticleDOI
TL;DR: Experimental results on face and other datasets demonstrate the advantages of the proposed L1-norm-based tensor analysis (TPCA-L1), which is robust to outliers.
Abstract: Tensor analysis plays an important role in modern image and vision computing problems. Most of the existing tensor analysis approaches are based on the Frobenius norm, which makes them sensitive to outliers. In this paper, we propose L1-norm-based tensor analysis (TPCA-L1), which is robust to outliers. Experimental results on face and other datasets demonstrate the advantages of the proposed approach.
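For intuition, a generic (non-tensor) L1-PCA iteration illustrates the outlier robustness being claimed: the standard sign-flipping fixed-point scheme below finds the first component maximizing the L1 dispersion sum |Xw|. It is a sketch of the general technique, not the paper's TPCA-L1 algorithm.

```python
import numpy as np

def l1_pca_component(X, iters=100):
    """First principal direction maximizing sum_i |x_i . w| (L1 dispersion),
    via sign-flipping fixed-point iteration; robust to outliers compared
    with Frobenius/L2-based PCA. Generic sketch, not the tensor version."""
    w = X[np.argmax(np.abs(X).sum(axis=1))].astype(float)
    w /= np.linalg.norm(w)
    for _ in range(iters):
        s = np.sign(X @ w)          # which side of the hyperplane each sample is on
        s[s == 0] = 1
        w_new = X.T @ s             # re-aim the component at the signed mean
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w
```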

181 citations


Journal ArticleDOI
TL;DR: A video coding architecture is described that is based on nested and pre-configurable quadtree structures for flexible and signal-adaptive picture partitioning that was ranked among the five best performing proposals, both in terms of subjective and objective quality.
Abstract: A video coding architecture is described that is based on nested and pre-configurable quadtree structures for flexible and signal-adaptive picture partitioning. The primary goal of this partitioning concept is to provide a high degree of adaptability for both temporal and spatial prediction as well as for the purpose of space-frequency representation of prediction residuals. At the same time, a leaf merging mechanism is included in order to prevent excessive partitioning of a picture into prediction blocks and to reduce the amount of bits for signaling the prediction signal. For fractional-sample motion-compensated prediction, a fixed-point implementation of the maximal-order minimum-support algorithm is presented that uses a combination of infinite impulse response (IIR) and finite impulse response (FIR) filtering. Entropy coding utilizes the concept of probability interval partitioning entropy codes, which offers new ways for parallelization and enhanced throughput. The presented video coding scheme was submitted to a joint call for proposals of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group and was ranked among the five best performing proposals, both in terms of subjective and objective quality.

171 citations


Journal ArticleDOI
TL;DR: Coding efficiency improvements are achieved with lower complexity than the H.264/AVC Baseline Profile, particularly suiting the proposal for high resolution, high quality applications in resource-constrained environments.
Abstract: This paper describes a low complexity video codec with high coding efficiency. It was proposed to the high efficiency video coding (HEVC) standardization effort of moving picture experts group and video coding experts group, and has been partially adopted into the initial HEVC test model under consideration design. The proposal utilizes a quadtree-based coding structure with support for macroblocks of size 64 × 64, 32 × 32, and 16 × 16 pixels. Entropy coding is performed using a low complexity variable length coding scheme with improved context adaptation compared to the context adaptive variable length coding design in H.264/AVC. The proposal's interpolation and deblocking filter designs improve coding efficiency, yet have low complexity. Finally, intra-picture coding methods have been improved to provide better subjective quality than H.264/AVC. The subjective quality of the proposed codec has been evaluated extensively within the HEVC project, with results indicating that similar visual quality to H.264/AVC High Profile anchors is achieved, measured by mean opinion score, using significantly fewer bits. Coding efficiency improvements are achieved with lower complexity than the H.264/AVC Baseline Profile, particularly suiting the proposal for high resolution, high quality applications in resource-constrained environments.

156 citations


Journal ArticleDOI
TL;DR: A novel no-reference metric that can automatically quantify ringing annoyance in compressed images is presented and is shown to be highly consistent with subjective data.
Abstract: A novel no-reference metric that can automatically quantify ringing annoyance in compressed images is presented. In the first step, a recently proposed ringing region detection method extracts the regions which are likely to be impaired by ringing artifacts. To quantify ringing annoyance in these detected regions, the visibility of ringing artifacts is estimated and compared to the activity of the corresponding local background. The local annoyance score calculated for each individual ringing region is averaged over all ringing regions to yield a ringing annoyance score for the whole image. A psychovisual experiment is carried out to measure ringing annoyance subjectively and to validate the proposed metric. The performance of our metric is compared to existing alternatives in the literature and is shown to be highly consistent with subjective data.

150 citations


Journal ArticleDOI
TL;DR: To evaluate the performance of VQA algorithms for the specific task of H.264 advanced video coding compressed video transmission over wireless networks, a subjective study involving 160 distorted videos is conducted.
Abstract: Evaluating the perceptual quality of video is of tremendous importance in the design and optimization of wireless video processing and transmission systems. In an endeavor to emulate human perception of quality, various objective video quality assessment (VQA) algorithms have been developed. However, the only subjective video quality database that exists on which these algorithms can be tested is dated and does not accurately reflect distortions introduced by present generation encoders and/or wireless channels. In order to evaluate the performance of VQA algorithms for the specific task of H.264 advanced video coding compressed video transmission over wireless networks, we conducted a subjective study involving 160 distorted videos. Various leading full reference VQA algorithms were tested for their correlation with human perception. The data from the paper has been made available to the research community, so that further research on new VQA algorithms and on the general area of VQA may be carried out.

Journal ArticleDOI
TL;DR: A high-performance hardware-friendly disparity estimation algorithm called mini-census adaptive support weight (MCADSW) is proposed and its corresponding real-time very large scale integration (VLSI) architecture is proposed.
Abstract: A high-performance real-time stereo vision system is crucial to various stereo vision applications, such as robotics, autonomous vehicles, multiview video coding, freeview TV, and 3-D video conferencing. In this paper, we propose a high-performance hardware-friendly disparity estimation algorithm called mini-census adaptive support weight (MCADSW), together with its corresponding real-time very large scale integration (VLSI) architecture. To make the proposed MCADSW algorithm hardware-friendly, we apply simplification techniques such as using the mini-census transform, removing the proximity weight, using the YUV color representation, using the Manhattan color distance, and using a scale-and-truncate weight approximation. After applying these simplifications, the MCADSW algorithm is not only hardware-friendly but also 1.63 times faster. In the corresponding real-time VLSI architecture, we propose partial column reuse and access reduction with an expanded window to significantly reduce the bandwidth requirement. The proposed architecture was implemented using United Microelectronics Corporation (UMC) 90 nm complementary metal-oxide-semiconductor technology and can achieve a disparity estimation frame rate of 42 frames/s for common intermediate format size images when clocked at 95 MHz. The synthesized gate count and memory size are 563 k and 21.3 kB, respectively.
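The mini-census cost computation at the heart of such an algorithm can be sketched as a compact transform plus a Hamming-distance match. The 6-neighbor sampling pattern below is an assumption for illustration; the paper defines its own pattern.

```python
def mini_census(img, y, x):
    """6-bit mini-census signature of the pixel at (y, x): each bit records
    whether a sampled neighbor is darker than the center pixel. The
    sampling offsets here are an illustrative choice, not the paper's."""
    offsets = [(-2, 0), (-1, -1), (-1, 1), (1, -1), (1, 1), (2, 0)]
    c = img[y][x]
    sig = 0
    for dy, dx in offsets:
        sig = (sig << 1) | (1 if img[y + dy][x + dx] < c else 0)
    return sig

def hamming_cost(a, b):
    """Matching cost between two census signatures (count of differing bits)."""
    return bin(a ^ b).count("1")
```

Because the cost is a bit-count over XORed signatures, it maps directly onto small combinational logic, which is what makes the transform hardware-friendly.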

Journal ArticleDOI
TL;DR: A predictive Lagrange multiplier estimation method is developed to resolve the chicken and egg dilemma of perceptual-based RDO and apply it to H.264 intra and inter mode decision.
Abstract: The rate-distortion optimization (RDO) framework for video coding achieves a tradeoff between bit-rate and quality. However, objective distortion metrics such as mean squared error traditionally used in this framework are poorly correlated with perceptual quality. We address this issue by proposing an approach that incorporates the structural similarity index as a quality metric into the framework. In particular, we develop a predictive Lagrange multiplier estimation method to resolve the chicken-and-egg dilemma of perceptual-based RDO and apply it to H.264 intra and inter mode decision. Given a perceptual quality level, the resulting video encoder achieves on average a 9% bit-rate reduction for intra-frame coding and 11% for inter-frame coding over the JM reference software. Subjective tests further confirm that, at the same bit-rate, the proposed perceptual RDO indeed preserves image details and prevents block artifacts better than traditional RDO.
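The mode decision step can be sketched as a plain Lagrangian cost minimization J = D + λR. In the paper, D is an SSIM-derived distortion and λ is predicted per frame; here both are ordinary inputs to a generic sketch.

```python
def rd_cost(distortion, rate, lagrange_multiplier):
    """Rate-distortion cost J = D + lambda * R."""
    return distortion + lagrange_multiplier * rate

def best_mode(candidates, lam):
    """Pick the coding mode minimizing J over (mode, D, R) candidates."""
    return min(candidates, key=lambda m: rd_cost(m[1], m[2], lam))[0]
```

The "chicken-and-egg" dilemma arises because λ should depend on the perceptual distortion statistics of the frame being coded, which are only known after coding; the paper's contribution is predicting λ ahead of time.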

Journal ArticleDOI
TL;DR: An algorithm that uses probability-based background extraction to obtain initial color backgrounds from surveillance videos accurately and quickly, while using relatively little memory, is proposed.
Abstract: A video-based monitoring system must be capable of continuous operation under various weather and illumination conditions. Moreover, background subtraction is a very important part of surveillance applications for successful segmentation of objects from video sequences, and the accuracy, computational complexity, and memory requirements of the initial background extraction are crucial in any background subtraction method. This paper proposes an algorithm to extract initial color backgrounds from surveillance videos using a probability-based background extraction algorithm. With the proposed algorithm, the initial background can be extracted accurately and quickly, while using relatively little memory. The intrusive objects can then be segmented quickly and correctly by a robust object segmentation algorithm. The segmentation algorithm analyzes the threshold values of the background subtraction from the prior frame to obtain good quality while minimizing execution time and maximizing detection accuracy. The color background images can be extracted efficiently and quickly from color image sequences and updated in real time to overcome any variation in illumination conditions. Experimental results for various environmental sequences and a quantitative evaluation are provided to demonstrate the robustness, accuracy, effectiveness, and memory economy of the proposed algorithm.
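A minimal illustration of probability-based background extraction is a per-pixel mode over the training frames, on the premise that background values are observed more often than transient foreground. This toy estimator is far simpler than the paper's algorithm and is shown only to make the idea concrete.

```python
from collections import Counter

def extract_background(frames):
    """Toy initial-background estimator: for each pixel, keep the most
    frequently observed value across the training frames. `frames` is a
    list of 2-D lists (grayscale for simplicity; the paper works in color)."""
    h, w = len(frames[0]), len(frames[0][0])
    bg = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            bg[y][x] = Counter(f[y][x] for f in frames).most_common(1)[0][0]
    return bg
```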

Journal ArticleDOI
TL;DR: The results obtained by performance evaluations using MPEG-4 coded video streams have demonstrated the effectiveness of the proposed NR video quality metric.
Abstract: A no-reference (NR) quality measure for networked video is introduced using information extracted from the compressed bit stream without resorting to complete video decoding. This NR video quality assessment measure accounts for three key factors which affect the overall perceived picture quality of networked video, namely, picture distortion caused by quantization, quality degradation due to packet loss and error propagation, and temporal effects of the human visual system. First, the picture quality in the spatial domain is measured, for each frame, relative to quantization under an error-free transmission condition. Second, picture quality is evaluated with respect to packet loss and the subsequent error propagation. The video frame quality in the spatial domain is, therefore, jointly determined by coding distortion and packet loss. Third, a pooling scheme is devised as the last step of the proposed quality measure to capture the perceived quality degradation in the temporal domain. The results obtained by performance evaluations using MPEG-4 coded video streams have demonstrated the effectiveness of the proposed NR video quality metric.

Journal ArticleDOI
TL;DR: The proposed FRUC method not only reduces the computation for refining motion vectors, but also suppresses interpolation noise and misregistration errors, achieving an average peak signal-to-noise ratio improvement of about 3 dB.
Abstract: Frame rate up-conversion (FRUC) can enhance the visual quality of low frame rate video presented on liquid crystal displays. To minimize the difference between a reference block and an interpolated block, an effective FRUC algorithm partitions a large block into several sub-blocks of smaller size and estimates their motions. Motion estimation searches for the block which has the minimum difference (cost) with the processed block in terms of some block matching distortion and motion discontinuity. As convexity and convergence of the cost function are not guaranteed, the computational cost of such motion estimation is usually extensive or unpredictable. In our proposed FRUC method, the two predictions of a frame to be interpolated are generated by shifting its nearest neighbor frames in the previous and following directions with the motion vectors estimated between them. The initial interpolated frame and the reliability of its pixels are subsequently estimated from these two predictions. We then apply a trilateral filter on the initial prediction to correct the unreliable pixels and to restore the missing pixels. Our proposed method not only reduces the computation for refining motion vectors, but also suppresses interpolation noise and misregistration errors. We have conducted extensive experiments, and the results show that the proposed algorithm outperforms existing methods in both objective and subjective visual quality, achieving an average peak signal-to-noise ratio improvement of about 3 dB.
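The two-prediction construction can be sketched with a single global motion vector: the previous frame is shifted half way forward and the next frame half way backward, and the two predictions are averaged. The paper uses per-block vectors and adds the trilateral correction filter, both omitted here.

```python
import numpy as np

def interpolate_frame(prev, nxt, mv):
    """Form the intermediate frame from two half-motion predictions.

    mv is a single global (dy, dx) vector with even components (an
    illustrative simplification); np.roll stands in for motion-compensated
    shifting with wrap-around at the borders.
    """
    dy, dx = mv
    f_prev = np.roll(prev, (dy // 2, dx // 2), axis=(0, 1))       # forward half-shift
    f_next = np.roll(nxt, (-(dy // 2), -(dx // 2)), axis=(0, 1))  # backward half-shift
    return ((f_prev.astype(np.int32) + f_next.astype(np.int32)) // 2).astype(prev.dtype)
```

Where the two predictions disagree, a pixel is marked unreliable; that per-pixel reliability is what drives the paper's trilateral filtering step.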

Journal ArticleDOI
Jaemoon Kim, Chong-Min Kyung
TL;DR: A lossless EC algorithm for HD video sequences and a related hardware architecture are proposed, consisting of a hierarchical prediction method based on pixel averaging and copying, followed by significant bit truncation (SBT).
Abstract: Increasing the image size of a video sequence aggravates the memory bandwidth problem of a video coding system. Despite many embedded compression (EC) algorithms proposed to overcome this problem, no lossless EC algorithm able to handle high-definition (HD) size video sequences has been proposed thus far. In this paper, a lossless EC algorithm for HD video sequences and a related hardware architecture are proposed. The proposed algorithm consists of two steps. The first is a hierarchical prediction method based on pixel averaging and copying. The second step involves significant bit truncation (SBT), which encodes the prediction errors in a group with the same number of bits so that multiple prediction errors can be decoded in a clock cycle. The theoretical lower bound of the compression ratio of SBT coding is also derived. Experimental results have shown a 60% reduction of memory bandwidth on average. Hardware implementation results have shown that a throughput of 14.2 pixels/cycle can be achieved with 36 K gates, which is sufficient to handle HD-size video sequences in real time.
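The SBT step can be sketched directly: each group of prediction errors is coded with the bit-width of its largest magnitude, so every value in a group has the same fixed width and all of them decode in parallel. The 4-bit width header and the one sign bit per value below are illustrative assumptions, not the paper's exact bitstream layout.

```python
def sbt_encode(errors, group_size=4):
    """Significant bit truncation sketch.

    Returns (total_bits, groups), where each group is (width, values) and
    width is the magnitude bit-width shared by the whole group. Per group
    we assume a 4-bit width header plus (width + 1) bits per value
    (magnitude + sign) -- hypothetical framing for illustration.
    """
    groups, total_bits = [], 0
    for i in range(0, len(errors), group_size):
        g = errors[i:i + group_size]
        width = max(abs(v) for v in g).bit_length()  # bits for the largest magnitude
        groups.append((width, g))
        total_bits += 4 + len(g) * (width + 1)
    return total_bits, groups
```

A single large error inflates its whole group's width, which is why the hierarchical prediction step matters: good prediction keeps group magnitudes, and therefore widths, small.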

Journal ArticleDOI
TL;DR: A new motion estimation algorithm for frame rate up-conversion that enhances the estimation accuracy of motion vectors by using the unidirectional and bidirectional matching ratios of blocks in the previous and current frames, and uses motion vector validity to evaluate the accuracy of motion vectors, thereby avoiding false motion vectors.
Abstract: In this letter, we present a new motion estimation algorithm for frame rate up-conversion. The proposed dual motion estimation algorithm enhances the estimation accuracy of motion vectors by using the unidirectional and bidirectional matching ratios of blocks in the previous and current frames. In addition, the proposed motion estimation approach uses motion vector validity to evaluate the accuracy of motion vectors, thereby avoiding false motion vectors. In experiments using benchmark image sequences, the proposed motion estimation algorithm improved the average peak signal-to-noise ratio of interpolated frames by up to 2.272 dB compared to conventional motion estimation algorithms. In a comparison of perceptual image quality using the structural similarity index, the average value for the proposed dual motion estimation was up to 0.062 higher than those of the conventional algorithms.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that precise distortion estimation enables the proposed transmission system to achieve a significantly higher average video peak signal-to-noise ratio compared to a conventional content independent system.
Abstract: Efficient bit stream adaptation and resilience to packet losses are two critical requirements in scalable video coding for transmission over packet-lossy networks. Various scalable layers have highly distinct importance, measured by their contribution to the overall video quality. This distinction is especially more significant in the scalable H.264/advanced video coding (AVC) video, due to the employed prediction hierarchy and the drift propagation when quality refinements are missing. Therefore, efficient bit stream adaptation and unequal protection of these layers are of special interest in the scalable H.264/AVC video. This paper proposes an algorithm to accurately estimate the overall distortion of decoder reconstructed frames due to enhancement layer truncation, drift/error propagation, and error concealment in the scalable H.264/AVC video. The method recursively computes the total decoder expected distortion at the picture-level for each layer in the prediction hierarchy. This ensures low computational cost since it bypasses highly complex pixel-level motion compensation operations. Simulation results show an accurate distortion estimation at various channel loss rates. The estimate is further integrated into a cross-layer optimization framework for optimized bit extraction and content-aware channel rate allocation. Experimental results demonstrate that precise distortion estimation enables our proposed transmission system to achieve a significantly higher average video peak signal-to-noise ratio compared to a conventional content independent system.

Journal ArticleDOI
TL;DR: Experimental results show that the performance of the proposed approach is competitive when compared with state-of-the-art video denoising algorithms based on both peak signal-to-noise-ratio and structural similarity evaluations.
Abstract: We propose a video denoising algorithm based on a spatiotemporal Gaussian scale mixture model in the wavelet transform domain. This model simultaneously captures the local correlations between the wavelet coefficients of natural video sequences across both space and time. Such correlations are further strengthened with a motion compensation process, for which a Fourier domain noise-robust cross correlation algorithm is proposed for motion estimation. Bayesian least square estimation is used to recover the original video signal from the noisy observation. Experimental results show that the performance of the proposed approach is competitive when compared with state-of-the-art video denoising algorithms based on both peak signal-to-noise-ratio and structural similarity evaluations.
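The Fourier-domain noise-robust cross correlation can be illustrated with classic phase correlation, which discards the spectrum's magnitude and keeps only its phase before inverting; this is a generic sketch of the technique, not the paper's exact estimator.

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the integer translation (dy, dx) such that
    b ~= np.roll(a, (dy, dx), axis=(0, 1)), via phase correlation.

    Normalizing out the magnitude spectrum is what makes the correlation
    peak sharp and robust to broadband noise.
    """
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    R = np.conj(A) * B
    R /= np.abs(R) + 1e-12           # keep phase only
    corr = np.fft.ifft2(R).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    if dy > h // 2: dy -= h          # map wrapped indices to signed shifts
    if dx > w // 2: dx -= w
    return int(dy), int(dx)
```

In a denoising pipeline, the recovered shift aligns neighboring frames so that wavelet coefficients can be modeled jointly across time.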

Journal ArticleDOI
TL;DR: Results show that the quality scores computed by the proposed no-reference quality assessment metric are well correlated with the mean opinion scores obtained from subjective assessment.
Abstract: This paper proposes a no-reference quality assessment metric for digital video subject to H.264/advanced video coding encoding. The proposed metric comprises two main steps: coding error estimation and perceptual weighting of this error. Error estimates are computed in the transform domain, assuming that discrete cosine transform (DCT) coefficients are corrupted by quantization noise. The DCT coefficient distributions are modeled using Cauchy or Laplace probability density functions, whose parameterization is performed using the quantized coefficient data and quantization steps. Parameter estimation is based on a maximum-likelihood estimation method combined with linear prediction. The linear prediction scheme takes advantage of the correlation between parameter values at neighboring DCT spatial frequencies. As for the perceptual weighting module, it is based on a spatiotemporal contrast sensitivity function applied to the DCT domain that compensates for image plane movement by considering the movements of the human eye, namely smooth pursuit, natural drift, and saccadic movements. The video-related inputs for the perceptual model are the motion vectors and the frame rate, which are also extracted from the encoded video. Subjective video quality assessment tests have been carried out in order to validate the results of the metric. A set of 11 video sequences, spanning a wide range of content, were encoded at different bitrates and the outcome was subjected to quality evaluation. Results show that the quality scores computed by the proposed algorithm are well correlated with the mean opinion scores obtained from the subjective assessment.

Journal ArticleDOI
TL;DR: This paper proposes a variance-based algorithm for block size decision, an improved filter-based algorithm for prediction mode decision using contextual information, and a selection algorithm for intra block decision that exploits the relation between the rate-distortion characteristic and the best coding type.
Abstract: The spatial-domain intra prediction scheme of H.264 has high computational complexity, especially for the High Profile as it incorporates the additional intra 8 × 8 prediction mode. To address this issue, we explore the hierarchy of the H.264 mode decision process in this paper and adopt an approach that is in synchrony with the mode decision hierarchy. In particular, we propose a variance-based algorithm for block size decision, an improved filter-based algorithm for prediction mode decision using contextual information, and a selection algorithm for intra block decision that exploits the relation between the rate-distortion characteristic and the best coding type. Performance comparisons are provided to show the improvement of the proposed algorithms over previous methods.
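The variance-based block size decision can be sketched in a few lines: smooth macroblocks (low variance) take the large intra mode, detailed ones fall back to smaller blocks. The threshold below is illustrative, not the paper's tuned value.

```python
def intra_block_size(block16, threshold=100.0):
    """Toy variance-based block size decision for a 16x16 luma block
    (list of 16 rows of 16 pixel values). Low variance suggests a smooth
    area best served by intra 16x16; high variance suggests finer modes."""
    n = len(block16) * len(block16[0])
    mean = sum(sum(row) for row in block16) / n
    var = sum((p - mean) ** 2 for row in block16 for p in row) / n
    return "intra16x16" if var < threshold else "intra4x4"
```

Skipping the full rate-distortion search for blocks the variance test classifies confidently is where the complexity saving comes from.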

Journal ArticleDOI
TL;DR: This work proposes a species-based particle swarm optimization algorithm for multiple object tracking, in which the global swarm is divided into many species according to the number of objects, and each species searches for its object and maintains track of it.
Abstract: Multiple object tracking is particularly challenging when many objects with similar appearances occlude one another. Most existing approaches concatenate the states of different objects, view the multi-object tracking as a joint motion estimation problem and search for the best state of the joint motion in a rather high dimensional space. However, this centralized framework suffers from a high computational load. We bring a new view to the tracking problem from a swarm intelligence perspective. In analogy with the foraging behavior of bird flocks, we propose a species-based particle swarm optimization algorithm for multiple object tracking, in which the global swarm is divided into many species according to the number of objects, and each species searches for its object and maintains track of it. The interaction between different objects is modeled as species competition and repulsion, and the occlusion relationship is implicitly deduced from the “power” of each species, which is a function of the image observations. Therefore, our approach decentralizes the joint tracker to a set of individual trackers, each of which tries to maximize its visual evidence. Experimental results demonstrate the efficiency and effectiveness of our method.

Journal ArticleDOI
TL;DR: The five papers in this special section were among those submitted in response to the joint call for proposals on high efficiency video coding (HEVC) standardization and cover most of the promising tools and technologies that seem likely to be included in the standard.
Abstract: The five papers in this special section were among those submitted in response to the joint call for proposals on high efficiency video coding (HEVC) standardization. Although at this point of development it is still unclear which specific elements the final HEVC standard will contain, the selection of the papers was made such that together they would cover most of the promising tools and technologies that seem likely to be included in the standard.

Journal ArticleDOI
TL;DR: Experimental results showed that the proposed framework can systematically determine the vacant space number, efficiently label ground and car regions, precisely locate the shadowed regions, and effectively tackle the problem of luminance variations.
Abstract: In this paper, from the viewpoint of scene understanding, a three-layer Bayesian hierarchical framework (BHF) is proposed for robust vacant parking space detection. In practice, the challenges of vacant parking space inference come from dramatic luminance variations, shadow effect, perspective distortion, and the inter-occlusion among vehicles. By using a hidden labeling layer between an observation layer and a scene layer, the BHF provides a systematic generative structure to model these variations. In the proposed BHF, the problem of luminance variations is treated as a color classification problem and is tackled via a classification process from the observation layer to the labeling layer, while the occlusion pattern, perspective distortion, and shadow effect are well modeled by the relationships between the scene layer and the labeling layer. With the BHF scheme, the detection of vacant parking spaces and the labeling of scene status are regarded as a unified Bayesian optimization problem subject to a shadow generation model, an occlusion generation model, and an object classification model. The system accuracy was evaluated by using outdoor parking lot videos captured from morning to evening. Experimental results showed that the proposed framework can systematically determine the vacant space number, efficiently label ground and car regions, precisely locate the shadowed regions, and effectively tackle the problem of luminance variations.

Journal ArticleDOI
TL;DR: Compared with the plain MS tracker, it is now much easier to incorporate online template adaptation to cope with inherent changes during the course of tracking, and a sophisticated online support vector machine is used.
Abstract: Kernel-based mean shift (MS) trackers have proven to be a promising alternative to stochastic particle filtering trackers. Despite their popularity, MS trackers have two fundamental drawbacks: 1) the template model can only be built from a single image, and 2) it is difficult to adaptively update the template model. In this paper, we generalize the plain MS trackers and attempt to overcome these two limitations. It is well known that modeling and maintaining a representation of a target object is an important component of a successful visual tracker. However, little work has been done on building a robust template model for kernel-based MS tracking. In contrast to building a template from a single frame, we train a robust object representation model from a large amount of data. Tracking is viewed as a binary classification problem, and a discriminative classification rule is learned to distinguish between the object and background. We adopt a support vector machine for training. The tracker is then implemented by maximizing the classification score. An iterative optimization scheme very similar to MS is derived for this purpose. Compared with the plain MS tracker, it is now much easier to incorporate online template adaptation to cope with inherent changes during the course of tracking. To this end, a sophisticated online support vector machine is used. We demonstrate successful localization and tracking on various data sets.
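The MS-style iteration behind score maximization can be sketched in 1-D: the window center moves to a kernel-weighted average of candidate positions, weighted by a per-position score. In the paper that score comes from an SVM; the stand-in score, bandwidth, and positions below are purely illustrative.

```python
def mean_shift_step(positions, scores, center, bandwidth=4.0):
    """One MS iteration: shift the window center toward high-score mass,
    using the Epanechnikov kernel profile max(0, 1 - u^2)."""
    num = den = 0.0
    for x, s in zip(positions, scores):
        u = (x - center) / bandwidth
        w = s * max(0.0, 1.0 - u * u)
        num += w * x
        den += w
    return num / den if den > 0 else center

positions = list(range(20))
# Stand-in "classification scores": high near the object around x = 13.
scores = [1.0 if 11 <= x <= 15 else 0.1 for x in positions]
c = 9.0
for _ in range(30):
    c = mean_shift_step(positions, scores, c)
print(c)
```

The iterate climbs monotonically toward the score mode, which is why the same machinery that maximizes a color-histogram similarity can maximize a classifier score instead.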

Journal ArticleDOI
Liquan Shen1, Zhi Liu1, Tao Yan1, Zhaoyang Zhang1, Ping An1 
TL;DR: A fast ME and DE algorithm that adaptively utilizes the inter-view correlation is proposed that can save 85% computational complexity on average, with negligible loss of coding efficiency.
Abstract: The emerging international standard for multiview video coding (MVC) is an extension of H.264/advanced video coding. In the joint mode of MVC, both motion estimation (ME) and disparity estimation (DE) are included in the encoding process. This achieves the highest coding efficiency but requires a very high computational complexity. In this letter, we propose a fast ME and DE algorithm that adaptively utilizes the inter-view correlation. The coding mode complexity and the motion homogeneity of a macroblock (MB) are first analyzed according to the coding modes and motion vectors from the corresponding MBs in the neighbor views, which are located by means of a global disparity vector. According to the coding mode complexity and the motion homogeneity, the proposed algorithm adjusts the search strategies for different types of MBs in order to perform a precise search according to video content. Experimental results demonstrate that the proposed algorithm can save 85% computational complexity on average, with negligible loss of coding efficiency.
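The global-disparity lookup step can be sketched as follows: a global disparity vector (GDV), in pixels, shifts a macroblock address into the neighbor view, rounded to the 16-pixel MB grid and clamped to the picture. The GDV value and helper name are illustrative assumptions, not the paper's exact procedure.

```python
MB = 16  # macroblock size in pixels (standard in H.264/AVC)

def corresponding_mb(mb_x, mb_y, gdv, width_mbs, height_mbs):
    """Map an MB address in the current view to the corresponding MB
    in a neighbor view via a global disparity vector (in pixels)."""
    x = mb_x + round(gdv[0] / MB)
    y = mb_y + round(gdv[1] / MB)
    return (min(max(x, 0), width_mbs - 1),
            min(max(y, 0), height_mbs - 1))

# A hypothetical GDV of (-48, 0) pixels maps MB (10, 5) to MB (7, 5).
print(corresponding_mb(10, 5, (-48, 0), 22, 18))
```

The coding mode and motion vectors of the MB found this way are what drive the adaptive search-range decision described in the abstract.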

Journal ArticleDOI
TL;DR: A reversible DH-based approach for intra-frame error concealment is proposed, exploiting the observation that the quantized discrete cosine transform coefficients of an H.264/AVC sequence tend to form a Laplace distribution; the approach achieves no quality degradation.
Abstract: Error concealment plays an important role in robust video transmission. Recently, Chen and Leung presented an efficient data hiding-based (DH-based) approach to recover corrupted macroblocks from the intra-frame of an H.264/AVC sequence, but it suffers from the quality degradation problem. Since the quantized discrete cosine transform coefficients of an H.264/AVC sequence tend to form a Laplace distribution, we therefore propose a reversible DH-based approach for intra-frame error concealment based on this characteristic. Our design achieves no quality degradation. Experimental results demonstrate that the quality of recovered video sequences obtained by our approach is indeed superior to that of the DH-based method. In addition, the quality advantage of our approach is illustrated when compared with five previous related methods.
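The Laplace characteristic the approach relies on is easy to check empirically: the maximum-likelihood Laplace fit uses the sample median as the location and the mean absolute deviation from it as the scale. The coefficient samples below are synthetic, for illustration only.

```python
import statistics

def laplace_fit(samples):
    """ML estimates of the Laplace location mu and scale b."""
    mu = statistics.median(samples)
    b = sum(abs(x - mu) for x in samples) / len(samples)
    return mu, b

# Synthetic quantized DCT coefficients: sharply peaked at zero,
# as the abstract describes for H.264/AVC intra frames.
coeffs = [0, 0, 1, -1, 0, 2, -2, 0, 0, 1, -1, 0, 3, -3, 0, 0]
mu, b = laplace_fit(coeffs)
print(mu, b)
```

The tight concentration around zero (small scale b) is what leaves room for embedding recovery data reversibly, i.e., in a way that can be undone exactly after extraction.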

Journal ArticleDOI
TL;DR: Simulation shows outstanding robustness of the proposed scheme against common attacks, especially additive white noise and JPEG compression.
Abstract: A robust image watermarking scheme in the ridgelet transform domain is proposed in this paper. Due to the use of the ridgelet domain, a sparse representation of an image which deals with line singularities is obtained. In order to achieve more robustness and transparency, the watermark data is embedded in selected blocks of the host image by modifying the amplitude of the ridgelet coefficients which represent the most energetic direction. Since the probability distribution function of the ridgelet coefficients is not known, we propose a universally optimum decoder to perform the watermark extraction in a distribution-independent fashion. The decoder extracts the watermark data using the variance of the ridgelet coefficients of the most energetic direction in each block. Furthermore, since the decoder needs the noise variance to perform decoding, a robust noise estimation scheme is proposed. Moreover, the implementation of error correction codes in the proposed method is investigated. Analytical derivation of the bit error probability is also carried out, and experimental results prove its accuracy. Simulation also shows outstanding robustness of the proposed scheme against common attacks, especially additive white noise and JPEG compression.
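A toy sketch of variance-based decoding for multiplicative embedding: bit 1 scales a block's coefficients by (1 + a) and bit 0 by (1 − a), so the received block's variance separates the two bits. The scaling model, strength a, and threshold are illustrative assumptions, not the paper's exact detector.

```python
def embed(block, bit, a=0.3):
    """Multiplicatively embed one bit into a block of coefficients."""
    s = (1 + a) if bit else (1 - a)
    return [s * c for c in block]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def decode(block, var0, a=0.3):
    """Decide the bit from the block variance; var0 is the (known or
    estimated) variance of the unmarked host block."""
    thresh = var0 * ((1 + a) ** 2 + (1 - a) ** 2) / 2
    return 1 if variance(block) > thresh else 0

host = [3.0, -1.0, 4.0, -1.5, 5.0, -9.0, 2.0, -6.0]
v0 = variance(host)
print(decode(embed(host, 1), v0), decode(embed(host, 0), v0))
```

Since scaling by s multiplies the variance by s², the two hypotheses are separable from second-order statistics alone, which is why no exact coefficient distribution is needed.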

Journal ArticleDOI
TL;DR: A novel adaptive region-based image preprocessing scheme that enhances face images and facilitates the illumination invariant face recognition task, and is shown to be more suitable for dealing with uneven illuminations in face images.
Abstract: Variable illumination conditions, especially the side lighting effects in face images, form a main obstacle in face recognition systems. To deal with this problem, this paper presents a novel adaptive region-based image preprocessing scheme that enhances face images and facilitates the illumination invariant face recognition task. The proposed method first segments an image into different regions according to its different local illumination conditions, then both the contrast and the edges are enhanced regionally so as to alleviate the side lighting effect. Different from existing contrast enhancement methods, we apply the proposed adaptive region-based histogram equalization on the low-frequency coefficients to minimize illumination variations under different lighting conditions. Besides contrast enhancement, by observing that under poor illuminations the high-frequency features become more important in recognition, we propose enlarging the high-frequency coefficients to make face images more distinguishable. This procedure is called edge enhancement (EdgeE). The EdgeE is also region-based. Compared with existing image preprocessing methods, our method is shown to be more suitable for dealing with uneven illuminations in face images. Experimental results on the representative databases, the Yale B + Extended Yale B database and the Carnegie Mellon University Pose, Illumination, and Expression database, show that the proposed method significantly improves the performance of face images with illumination variations. The proposed method does not require any modeling and model fitting steps and can be implemented easily. Moreover, it can be applied directly to any single image without using any lighting assumption or any prior information on 3-D face geometry.
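The region-based equalization idea can be sketched minimally: each region of a grayscale image is equalized with its own cumulative histogram, so a dark region and a bright region are stretched to the full range independently. The two-region split is an illustrative assumption; the paper segments by local illumination and operates on low-frequency coefficients rather than raw pixels.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize one region of 8-bit pixel values."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)          # cumulative count up to each level
    n = len(pixels)
    return [round((levels - 1) * cdf[p] / n) for p in pixels]

dark = [10, 12, 14, 16, 18, 20]             # a poorly lit face region
bright = [200, 210, 220, 230, 240, 250]     # a well lit face region
print(equalize(dark))
print(equalize(bright))
```

Equalizing the whole image at once would leave the dark region compressed at the low end; per-region equalization is what counteracts side lighting.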