
Showing papers in "IEEE Transactions on Circuits and Systems for Video Technology in 2009"


Journal ArticleDOI
TL;DR: This paper presents a reversible or lossless watermarking algorithm for images that employs prediction errors to embed data and, in most cases, requires no location map.
Abstract: This paper presents a reversible or lossless watermarking algorithm for images that, in most cases, does not use a location map. This algorithm employs prediction errors to embed data into an image. A sorting technique is used to order the prediction errors based on the magnitude of their local variance. Using the sorted prediction errors and, if needed (though rarely), a reduced-size location map allows us to embed more data into the image with less distortion. The performance of the proposed reversible watermarking scheme is evaluated on different images and compared with the methods of Kamstra and Heijmans, Thodi and Rodriguez, and Lee et al. The results clearly indicate that the proposed scheme can embed more data with less distortion.
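
The embedding idea — prediction errors expanded in order of local variance, smooth regions first — can be sketched in a few lines. Below is a minimal 1-D toy, not the authors' implementation: only odd positions carry data so the predictor always sees unmodified neighbours, a neighbour difference stands in for local variance, and overflow handling (the rarely needed reduced-size location map) is omitted.

```python
import numpy as np

def pe_embed(pixels, bits):
    """Toy sorted prediction-error expansion (1-D sketch).
    Only odd positions carry data; even neighbours stay untouched, so
    the decoder can rebuild the same predictions. Overflow handling
    (the paper's reduced-size location map) is omitted for brevity."""
    p = pixels.astype(np.int64).copy()
    cand = list(range(1, len(p) - 1, 2))
    # Sort candidates by a local-variance proxy: smooth regions first.
    cand.sort(key=lambda i: abs(int(p[i - 1]) - int(p[i + 1])))
    used = cand[:len(bits)]
    for bit, i in zip(bits, used):
        pred = (p[i - 1] + p[i + 1]) // 2
        e = p[i] - pred
        p[i] = pred + 2 * e + bit      # expand the error, append the bit
    return p, used

def pe_extract(marked, used):
    """Invert pe_embed: recover the bits and the original pixels."""
    p = marked.astype(np.int64).copy()
    bits = []
    for i in used:
        pred = (p[i - 1] + p[i + 1]) // 2
        e2 = p[i] - pred
        bit = int(e2 & 1)              # LSB of the expanded error
        bits.append(bit)
        p[i] = pred + (e2 - bit) // 2  # undo the expansion exactly
    return bits, p
```

Because expansion is invertible and the predictor inputs are never altered, extraction returns both the payload and the bit-exact original signal.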

773 citations


Journal ArticleDOI
TL;DR: A binary tree structure is exploited to solve the problem of communicating pairs of peak points and distribution of pixel differences is used to achieve large hiding capacity while keeping the distortion low.
Abstract: In this letter, we present a reversible data hiding scheme based on histogram modification. We exploit a binary tree structure to solve the problem of communicating pairs of peak points. Distribution of pixel differences is used to achieve large hiding capacity while keeping the distortion low. We also adopt a histogram shifting technique to prevent overflow and underflow. Performance comparisons with other existing schemes are provided to demonstrate the superiority of the proposed scheme.
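
The core histogram-shifting step is compact enough to sketch. The toy below (illustrative function names, operating directly on a difference sequence) shifts everything above the peak bin up by one and lets peak-valued differences carry one bit each; the binary-tree signalling of peak pairs and the underflow/overflow handling from the paper are omitted.

```python
def hs_embed(diffs, bits, peak):
    """Toy histogram-shifting embed on pixel differences.
    `bits` must supply one bit per occurrence of `peak` in `diffs`."""
    out, it = [], iter(bits)
    for d in diffs:
        if d > peak:
            out.append(d + 1)          # shift up: empties the bin peak+1
        elif d == peak:
            out.append(d + next(it))   # embed 0 (stay) or 1 (move up)
        else:
            out.append(d)
    return out

def hs_extract(marked, peak):
    """Invert hs_embed: recover the bits and the original differences."""
    bits, rec = [], []
    for d in marked:
        if d == peak:
            bits.append(0); rec.append(peak)
        elif d == peak + 1:
            bits.append(1); rec.append(peak)
        elif d > peak + 1:
            rec.append(d - 1)          # undo the shift
        else:
            rec.append(d)
    return bits, rec
```

The capacity equals the height of the peak bin, which is why working on pixel differences (a sharply peaked distribution) yields a large hiding capacity at low distortion.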

550 citations


Journal ArticleDOI
TL;DR: An area-based local stereo matching algorithm for accurate disparity estimation across all image regions; it is among the best-performing local stereo methods according to the benchmark Middlebury stereo evaluation.
Abstract: We propose an area-based local stereo matching algorithm for accurate disparity estimation across all image regions. A well-known challenge to local stereo methods is to decide an appropriate support window for the pixel under consideration, adapting the window shape or the pixelwise support weight to the underlying scene structures. Our stereo method tackles this problem with two key contributions. First, for each anchor pixel an upright cross local support skeleton is adaptively constructed, with four varying arm lengths decided by color similarity and connectivity constraints. Second, given the local cross-decision results, we dynamically construct a shape-adaptive full support region on the fly, merging horizontal segments of the crosses in the vertical neighborhood. Approximating image structures accurately, the proposed method is among the best-performing local stereo methods according to the benchmark Middlebury stereo evaluation. Additionally, it reduces memory consumption significantly thanks to our compact local cross representation. To accelerate matching cost aggregation performed in an arbitrarily shaped 2-D region, we also propose an orthogonal integral image technique, yielding a speedup factor of 5-15 over the straightforward integration.
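
The aggregation trick — treating the shape-adaptive region as a union of horizontal segments, each summed in O(1) from row prefix sums — can be sketched as follows. This is a simplified stand-in for the paper's orthogonal integral image (which adds a vertical pass); names are illustrative.

```python
import numpy as np

def row_prefix_sums(cost):
    """Per-row prefix sums with a leading zero column, so any horizontal
    run cost[r, c0:c1] costs a single subtraction."""
    ps = np.zeros((cost.shape[0], cost.shape[1] + 1), dtype=np.float64)
    ps[:, 1:] = np.cumsum(cost, axis=1)
    return ps

def region_cost(ps, segments):
    """Aggregate matching cost over an arbitrarily shaped support region
    given as horizontal segments (row, c0, c1), with c1 exclusive --
    O(1) per segment instead of O(segment width)."""
    return sum(float(ps[r, c1] - ps[r, c0]) for r, c0, c1 in segments)
```

Since a cross-based support region is exactly a stack of such horizontal runs, the per-pixel aggregation cost depends only on the number of rows spanned, not on the region's area.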

511 citations


Journal ArticleDOI
TL;DR: A new embedding scheme is designed that helps to construct an efficient payload-dependent overflow location map that has good compressibility and accurate capacity control capability and reduces unnecessary alteration to the image.
Abstract: For difference-expansion (DE)-based reversible data hiding, the embedded bit-stream mainly consists of two parts: one part that conveys the secret message and the other part that contains embedding information, including the 2-D binary (overflow) location map and the header file. The first part is the payload while the second part is the auxiliary information package for blind detection. To increase embedding capacity, we have to make the size of the second part as small as possible. Tian's classical DE method has a large auxiliary information package. Thodi mitigated the problem by using a payload-independent overflow location map. However, the compressibility of the overflow location map is still undesirable in some image types. In this paper, we focus on improving the overflow location map. We design a new embedding scheme that helps us construct an efficient payload-dependent overflow location map. Such an overflow location map has good compressibility. Our accurate capacity control capability also reduces unnecessary alteration to the image. Under the same image quality, the proposed algorithm often has larger embedding capacity. It performs well in different types of images, including those where other algorithms often have difficulty in acquiring good embedding capacity and high image quality.

479 citations


Journal ArticleDOI
TL;DR: This paper shows that various crucial factors in video annotation, including multiple modalities, multiple distance functions, and temporal consistency, all correspond to different relationships among video units, and hence they can be represented by different graphs, and proposes optimized multigraph-based semi-supervised learning (OMG-SSL), which aims to simultaneously tackle these difficulties in a unified scheme.
Abstract: Learning-based video annotation is a promising approach to facilitating video retrieval and it can avoid the intensive labor costs of pure manual annotation. But it frequently encounters several difficulties, such as insufficiency of training data and the curse of dimensionality. In this paper, we propose a method named optimized multigraph-based semi-supervised learning (OMG-SSL), which aims to simultaneously tackle these difficulties in a unified scheme. We show that various crucial factors in video annotation, including multiple modalities, multiple distance functions, and temporal consistency, all correspond to different relationships among video units, and hence they can be represented by different graphs. Therefore, these factors can be simultaneously dealt with by learning with multiple graphs, namely, the proposed OMG-SSL approach. Different from the existing graph-based semi-supervised learning methods that only utilize one graph, OMG-SSL integrates multiple graphs into a regularization framework in order to sufficiently explore their complementation. We show that this scheme is equivalent to first fusing multiple graphs and then conducting semi-supervised learning on the fused graph. Through an optimization approach, it is able to assign suitable weights to the graphs. Furthermore, we show that the proposed method can be implemented through a computationally efficient iterative process. Extensive experiments on the TREC video retrieval evaluation (TRECVID) benchmark have demonstrated the effectiveness and efficiency of our proposed approach.
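
The paper's equivalence — fusing multiple graphs and then doing semi-supervised learning on the fused graph — can be sketched with fixed fusion weights (OMG-SSL additionally optimizes these weights; here they are given, and the function name is illustrative):

```python
import numpy as np

def fuse_and_propagate(graphs, weights, labels, alpha=0.9, iters=200):
    """Fuse several affinity graphs with fixed weights, then run standard
    graph-based label propagation on the fused graph. Each graph could
    encode one modality, distance function, or temporal-consistency
    relation among video units."""
    W = sum(w * G for w, G in zip(weights, graphs))
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))           # symmetric normalisation
    F = labels.astype(np.float64).copy()
    for _ in range(iters):                    # F <- alpha*S*F + (1-alpha)*Y
        F = alpha * (S @ F) + (1 - alpha) * labels
    return F
```

With alpha < 1 the iteration converges, and unlabeled units closer (in the fused graph) to labeled positives receive higher scores.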

453 citations


Journal ArticleDOI
TL;DR: A DCT-based JND model for monochrome pictures is proposed that incorporates the spatial contrast sensitivity function (CSF), the luminance adaptation effect, and the contrast masking effect based on block classification, and is consistent with the human visual system.
Abstract: In the image and video processing field, an effective compression algorithm should remove not only statistically redundant information but also perceptually insignificant components from pictures. The just-noticeable distortion (JND) profile is an efficient model to represent those perceptual redundancies. Human eyes are usually not sensitive to distortion below the JND threshold. In this paper, a DCT-based JND model for monochrome pictures is proposed. This model incorporates the spatial contrast sensitivity function (CSF), the luminance adaptation effect, and the contrast masking effect based on block classification. Gamma correction is also considered to compensate for the original luminance adaptation effect, which gives more accurate results. In order to extend the proposed JND profile to video, a temporal modulation factor is included by incorporating the temporal CSF and eye movement compensation. Moreover, a psychophysical experiment was designed to parameterize the proposed model. Experimental results show that the proposed model is consistent with the human visual system (HVS). Compared with other JND profiles, the proposed model can tolerate more distortion and has much better perceptual quality. This model can be easily applied in many related areas, such as compression, watermarking, error protection, and perceptual distortion metrics.

257 citations


Journal ArticleDOI
TL;DR: The proposed Lap-lambda adaptively optimizes for the input sequences so that the overall coding efficiency is improved and, compared with HR-lambda, shows better or similar performance in all scenarios.
Abstract: In today's hybrid video coding, rate-distortion optimization (RDO) plays a critical role. It aims at minimizing the distortion under a constraint on the rate. Currently, the most popular RDO algorithm for one-pass coding is the one recommended in the H.264/AVC reference software. This algorithm, denoted HR-lambda for convenience, is a universal method that performs the optimization only according to the quantization process while ignoring the properties of the input sequences. Intuitively, it is not always efficient, and an adaptive scheme should be better. Therefore, a new algorithm, Lap-lambda, is presented in this paper. Based on the Laplace distribution of transformed residuals, the proposed Lap-lambda adapts the optimization to the input sequences so that the overall coding efficiency is improved. Cases which cannot be well captured by the proposed models are handled via escape methods. Comprehensive simulations verify that, compared with HR-lambda, Lap-lambda shows better or similar performance in all scenarios. In particular, significant gains of 1.79 dB and 1.60 dB in PSNR are obtained for slow sequences and B-frames, respectively.
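
The core of any such RDO is the Lagrangian mode decision J = D + lambda*R; Lap-lambda's contribution is deriving lambda from the Laplace statistics of the transformed residuals rather than from the quantizer alone. A minimal sketch with illustrative numbers (not the paper's actual lambda model):

```python
import numpy as np

def laplace_scale(residuals):
    """Maximum-likelihood scale b of a zero-mean Laplace fit to the
    transformed residuals -- the statistic an adaptive lambda builds on."""
    return float(np.mean(np.abs(residuals)))

def best_mode(candidates, lam):
    """Pick the mode minimising J = D + lambda * R, where `candidates`
    maps mode -> (distortion, rate). A larger lambda favours low-rate
    modes, so adapting lambda to the content changes the decisions."""
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])
```

The toy below shows why lambda matters: the same two candidates flip between a high-quality and a low-rate choice as lambda grows.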

162 citations


Journal ArticleDOI
TL;DR: A novel technique for video stabilization based on the particle filtering framework that extends the traditional use of particle filters in object tracking to tracking of the projected affine model of the camera motions and relies on the inverse of the resulting image transform to obtain a stable video sequence.
Abstract: Video stabilization is an important technique in digital cameras. Its impact increases rapidly with the rising popularity of handheld cameras and cameras mounted on moving platforms (e.g., cars). Stabilization of two images can be viewed as an image registration problem. However, to ensure the visual quality of the whole video, video stabilization places a particular emphasis on accuracy and robustness over long image sequences. In this paper, we propose a novel technique for video stabilization based on the particle filtering framework. We extend the traditional use of particle filters in object tracking to tracking of the projected affine model of the camera motions. We rely on the inverse of the resulting image transform to obtain a stable video sequence. The correspondence between scale-invariant feature transform points is used to obtain a crude estimate of the projected camera motion. We subsequently postprocess the crude estimate with particle filters to obtain a smooth estimate. It is shown both theoretically and experimentally that particle filtering can reduce the error variance compared to estimation without particle filtering. The superior performance of our algorithm over other methods for video stabilization is demonstrated through computer-simulated experiments.
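
The smoothing step can be illustrated with a bootstrap particle filter on a single motion parameter — a stand-in for one coefficient of the projected affine model; the noise levels and function name are illustrative, not the paper's settings.

```python
import numpy as np

def particle_smooth(observations, n=500, q=0.5, r=2.0, seed=0):
    """Bootstrap particle filter smoothing a noisy 1-D motion parameter.
    q: random-walk process-noise std; r: observation-noise std of the
    crude SIFT-correspondence-based motion estimate."""
    rng = np.random.default_rng(seed)
    parts = np.full(n, observations[0], dtype=float)
    out = []
    for z in observations:
        parts = parts + rng.normal(0.0, q, n)        # predict (random walk)
        w = np.exp(-0.5 * ((z - parts) / r) ** 2)    # weight by likelihood
        w /= w.sum()
        out.append(float(np.dot(w, parts)))          # posterior-mean estimate
        idx = rng.choice(n, size=n, p=w)             # resample
        parts = parts[idx]
    return out
```

Run on a sequence of noisy motion estimates, the filtered track has visibly lower variance than the raw observations, mirroring the variance-reduction result shown in the paper.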

155 citations


Journal ArticleDOI
TL;DR: A fast mode decision algorithm is proposed to speed up the encoding process by reducing, in a hierarchical manner, the number of modes required to be checked; experimental results show that the proposed MAMD algorithm reduces computational complexity by 62.96% on average.
Abstract: The intra-mode and inter-mode predictions have been made available in H.264/AVC for effectively improving coding efficiency. However, exhaustively checking all the prediction modes to identify the best one (commonly referred to as exhaustive mode decision) greatly increases computational complexity. In this paper, a fast mode decision algorithm, called the motion activity-based mode decision (MAMD), is proposed to speed up the encoding process by reducing, in a hierarchical manner, the number of modes required to be checked, as follows. For each macroblock, the proposed MAMD algorithm always starts by checking the rate-distortion (RD) cost computed for the SKIP mode for a possible early termination, once the RD cost value is below a predetermined "low" threshold. On the other hand, if the RD cost exceeds another "high" threshold, this indicates that only the intra modes are worth checking. If the computed RD cost falls between the above-mentioned two thresholds, the remaining seven modes, which are classified into three motion activity classes in our work, will be examined, and only one of the three classes will be chosen for further mode checking. The motion activity can be quantitatively measured as the maximum city-block length of the motion vectors taken from a set of adjacent macroblocks (i.e., the region of support, ROS). This measurement is then used to determine the most probable motion-activity class for the current macroblock. Experimental results have shown that, on average, the proposed MAMD algorithm reduces the computational complexity by 62.96%, while incurring only a 0.059 dB loss in PSNR (peak signal-to-noise ratio) and a 0.19% increase in total bit rate compared to exhaustive mode decision, which is the default approach in the JM reference software.
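
The three-way control flow reads naturally as code. The sketch below is illustrative only: the thresholds, class names, and activity-class boundaries are placeholders, not the paper's tuned values.

```python
def mamd_decision(rd_skip_cost, t_low, t_high, neighbor_mvs):
    """Hierarchical MAMD-style decision for one macroblock.
    neighbor_mvs: motion vectors (mx, my) from the region of support.
    Class boundaries (0 / <=4 / >4) are illustrative placeholders."""
    if rd_skip_cost < t_low:
        return "SKIP"                       # early termination
    if rd_skip_cost > t_high:
        return "INTRA"                      # check only intra modes
    # Motion activity: maximum city-block length over the ROS.
    activity = max(abs(mx) + abs(my) for mx, my in neighbor_mvs)
    if activity == 0:
        return "LOW_ACTIVITY_CLASS"
    if activity <= 4:
        return "MEDIUM_ACTIVITY_CLASS"
    return "HIGH_ACTIVITY_CLASS"
```

Only one branch of mode checking runs per macroblock, which is where the reported complexity reduction comes from.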

151 citations


Journal ArticleDOI
TL;DR: An automatic key frame extraction method dedicated to summarizing consumer video clips acquired from digital cameras and demonstrates the effectiveness of the method by comparing the results with two alternative methods against the ground truth agreed by multiple judges.
Abstract: Extracting key frames from video is of great interest in many applications, such as video summary, video organization, video compression, and prints from video. Key frame extraction is not a new problem but existing literature has focused primarily on sports or news video. In the personal or consumer video space, the biggest challenges for key frame selection are the unconstrained content and lack of any pre-imposed structures. First, in a psychovisual study, we conduct ground truth collection of key frames from video clips taken by digital cameras (as opposed to camcorders) using both first- and third-party judges. The goals of this study are to: 1) create a reference database of video clips reasonably representative of the consumer video space; 2) identify consensus key frames by which automated algorithms can be compared and judged for effectiveness, i.e., ground truth; and 3) uncover the criteria used by both first- and third-party human judges so these criteria can influence algorithm design. Next, we develop an automatic key frame extraction method dedicated to summarizing consumer video clips acquired from digital cameras. Analysis of spatio-temporal changes over time provides semantically meaningful information about the scene and the camera operator's general intents. In particular, camera and object motion are estimated and used to derive motion descriptors. A video clip is segmented into homogeneous parts based on major types of camera motion (e.g., pan, zoom, pause, steady). Dedicated rules are used to extract candidate key frames from each segment. In addition, confidence measures are computed for the candidates to enable ranking in semantic relevance. This method is scalable so that one can produce any desired number of key frames from the candidates. Finally, we demonstrate the effectiveness of our method by comparing the results with two alternative methods against the ground truth agreed by multiple judges.

127 citations


Journal ArticleDOI
TL;DR: A novel side information refinement (SIR) algorithm is proposed for a transform domain WZ video codec based on a learning approach where the side information is successively improved as the decoding proceeds, showing significant and consistent performance improvements regarding state-of-the-art WZ and standard video codecs.
Abstract: Wyner-Ziv (WZ) video coding is a particular case of distributed video coding, which is a recent video coding paradigm based on the Slepian-Wolf and WZ theorems. Contrary to available prediction-based standard video codecs, WZ video coding exploits the source statistics at the decoder, allowing the development of simpler encoders. Until now, WZ video coding did not reach the compression efficiency performance of conventional video coding solutions, mainly due to the poor quality of the side information, which is an estimate of the original frame created at the decoder in the most popular WZ video codecs. In this context, this paper proposes a novel side information refinement (SIR) algorithm for a transform domain WZ video codec based on a learning approach where the side information is successively improved as the decoding proceeds. The results show significant and consistent performance improvements regarding state-of-the-art WZ and standard video codecs, especially under critical conditions such as high motion content and long group of pictures sizes.

Journal ArticleDOI
TL;DR: This paper proposes a hardware architecture for object detection based on an AdaBoost learning algorithm with Haar-like features as weak classifiers and proposes a partially parallel execution model suitable for hardware implementation that dramatically improves the total processing speed.
Abstract: This paper proposes a hardware architecture for object detection based on an AdaBoost learning algorithm with Haar-like features as weak classifiers. We analyze and discuss the parallelism in this detection algorithm and propose a partially parallel execution model suitable for hardware implementation. This parallel execution model exploits the cascade structure of classifiers, in which classifiers located near the beginning of the cascade are used more frequently than subsequent classifiers. We assign more resources to these earlier classifiers to execute in parallel than to subsequent classifiers. This dramatically improves the total processing speed without a great increase in circuit area. Moreover, the partially parallel execution model achieves flexible processing performance by adjusting the balance of parallel processing. In addition, we implement the proposed architecture on a Virtex-5 FPGA to show that it achieves real-time object detection at 30 fps on VGA video without candidate extraction.
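
The cascade structure that motivates the resource allocation can be sketched as plain code: every window passes through stage 1, and only survivors reach later stages, so the earliest classifiers dominate the workload and deserve the most parallel hardware. A toy software sketch (feature functions and thresholds are illustrative):

```python
def run_cascade(stages, window):
    """Evaluate a cascade of boosted stages on one detection window.
    stages: list of (weak_classifiers, stage_threshold); each weak
    classifier is (feature_fn, threshold, vote), where feature_fn would
    compute a Haar-like feature from an integral image in a real detector."""
    for weaks, stage_thr in stages:
        score = sum(vote for feat, thr, vote in weaks if feat(window) > thr)
        if score < stage_thr:
            return False        # rejected early: later stages never execute
    return True                 # survived every stage: detection
```

Most windows are rejected in the first stage or two, which is exactly the access pattern the partially parallel execution model exploits.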

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a method to enhance the image quality for a given backlight intensity by performing brightness compensation and local contrast enhancement, where global image statistics and backlight level are considered to maintain the overall brightness of the image.
Abstract: One common way to extend the battery life of a portable device is to reduce the LCD backlight intensity. In contrast to previous approaches that minimize the power consumption by adjusting the backlight intensity frame by frame to reach a specified image quality, the proposed method optimizes the image quality for a given backlight intensity. The image is enhanced by performing brightness compensation and local contrast enhancement. For brightness compensation, global image statistics and backlight level are considered to maintain the overall brightness of the image. For contrast enhancement, the local contrast property of the human visual system (HVS) is exploited to enhance the local image details. In addition, a brightness prediction scheme is proposed to speed up the algorithm for display of video sequences. Experimental results are presented to show the performance of the algorithm.

Journal ArticleDOI
TL;DR: An exemplar-based image inpainting algorithm is extended by incorporating an improved patch matching strategy for video inpainting; it produces very few "ghost shadows," which are produced by most image inpainting algorithms directly applied to video.
Abstract: Image inpainting or image completion is the technique that automatically restores/completes removed areas in an image. When dealing with a similar problem in video, not only should a robust tracking algorithm be used, but the temporal continuity among video frames also needs to be taken into account, especially when the video has camera motions such as zooming and tilting. In this paper, we extend an exemplar-based image inpainting algorithm by incorporating an improved patch matching strategy for video inpainting. In our proposed algorithm, different motion segments with different temporal continuity call for different candidate patches, which are used to inpaint holes after a selected video object is tracked and removed. The proposed new video inpainting algorithm produces very few "ghost shadows," which were produced by most image inpainting algorithms directly applied to video. Our experiments use different types of videos, including cartoons, video from games, and video from digital cameras with different camera motions. Our demonstration at http://member.mine.tku.edu.tw/www/T_CSVT/web/ shows the promising results.

Journal ArticleDOI
TL;DR: The experimental results demonstrate the superiority of the proposed reversible visible watermarking scheme, which adopts data compression for further reduction in the recovery packet size and improvement in embedding capacity, compared to existing methods.
Abstract: A reversible (also called lossless, distortion-free, or invertible) visible watermarking scheme is proposed to satisfy the applications, in which the visible watermark is expected to combat copyright piracy but can be removed to losslessly recover the original image. We transparently reveal the watermark image by overlapping it on a user-specified region of the host image through adaptively adjusting the pixel values beneath the watermark, depending on the human visual system-based scaling factors. In order to achieve reversibility, a reconstruction/recovery packet, which is utilized to restore the watermarked area, is reversibly inserted into non-visibly-watermarked region. The packet is established according to the difference image between the original image and its approximate version instead of its visibly watermarked version so as to alleviate its overhead. For the generation of the approximation, we develop a simple prediction technique that makes use of the unaltered neighboring pixels as auxiliary information. The recovery packet is uniquely encoded before hiding so that the original watermark pattern can be reconstructed based on the encoded packet. In this way, the image recovery process is carried out without needing the availability of the watermark. In addition, our method adopts data compression for further reduction in the recovery packet size and improvement in embedding capacity. The experimental results demonstrate the superiority of the proposed scheme compared to the existing methods.

Journal ArticleDOI
TL;DR: A gradient-based scheduling and resource allocation algorithm is proposed, which prioritizes the transmissions of different users by considering video contents, deadline requirements, and transmission history.
Abstract: We consider the problem of scheduling and resource allocation for multiuser video streaming over downlink orthogonal frequency division multiplexing (OFDM) channels. The video streams are precoded using the scalable video coding (SVC) scheme that offers both quality and temporal scalabilities. The OFDM technology provides the flexibility of resource allocation in terms of time, frequency, and power. We propose a gradient-based scheduling and resource allocation algorithm, which prioritizes the transmissions of different users by considering video contents, deadline requirements, and transmission history. Simulation results show that the proposed algorithm outperforms the content-blind and deadline-blind algorithms with a gain of as much as 6 dB in terms of average PSNR when the network is congested.

Journal ArticleDOI
TL;DR: A key frame selection algorithm based on three iso-content principles (iso-content distance, iso-content error, and iso-content distortion) is presented, so that the selected key frames are equidistant in video content according to the principle used.
Abstract: We present a key frame selection algorithm based on three iso-content principles (iso-content distance, iso-content error, and iso-content distortion), so that the selected key frames are equidistant in video content according to the principle used. Two automatic approaches for defining the most appropriate number of key frames are proposed by exploiting supervised and unsupervised content criteria. Experimental results and comparisons with existing methods from the literature on a large dataset of real-life video sequences illustrate the high performance of the proposed schemes.
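
The iso-content-distance principle has a compact form: accumulate the content distance between consecutive frames and pick frames at equal increments of the total. A minimal sketch (the content-distance measure itself is abstracted away; the function name is illustrative):

```python
import numpy as np

def iso_distance_keyframes(frame_dists, k):
    """Select k key frames equidistant in cumulative content distance.
    frame_dists[i] is the content distance between frames i and i+1,
    so n distances describe n+1 frames."""
    cum = np.concatenate([[0.0], np.cumsum(frame_dists)])
    targets = np.linspace(0.0, cum[-1], k)
    # For each target, take the frame whose cumulative distance is closest.
    return [int(np.argmin(np.abs(cum - t))) for t in targets]
```

Note how a static stretch (zero distances) is skipped over while fast-changing stretches attract key frames, which is the intended behaviour of content-equidistant selection.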

Journal ArticleDOI
TL;DR: This paper examines a strategy for maximizing the network lifetime in wireless visual sensor networks by jointly optimizing the source rates, the encoding powers, and the routing scheme and demonstrates that the proposed algorithm can achieve a much longer network lifetime compared to the scheme optimized for the conventional wireless sensor networks.
Abstract: Network lifetime maximization is a critical issue in wireless sensor networks since each sensor has a limited energy supply. In contrast with conventional sensor networks, video sensor nodes compress the video before transmission. The encoding process demands a high power consumption, and thus raises a great challenge to the maintenance of a long network lifetime. In this paper, we examine a strategy for maximizing the network lifetime in wireless visual sensor networks by jointly optimizing the source rates, the encoding powers, and the routing scheme. Fully distributed algorithms are developed using the Lagrangian duality to solve the lifetime maximization problem. We also examine the relationship between the collected video quality and the maximal network lifetime. Through extensive numerical simulations, we demonstrate that the proposed algorithm can achieve a much longer network lifetime compared to the scheme optimized for the conventional wireless sensor networks.

Journal ArticleDOI
TL;DR: A novel data hiding method in the compressed video domain that completely preserves the image quality of the host video while embedding information into it and is also reversible, where the embedded information could be removed to obtain the original video.
Abstract: Although many data hiding methods are proposed in the literature, all of them distort the quality of the host content during data embedding. In this paper, we propose a novel data hiding method in the compressed video domain that completely preserves the image quality of the host video while embedding information into it. Information is embedded into a compressed video by simultaneously manipulating Mquant and quantized discrete cosine transform coefficients, which are the significant parts of MPEG and H.26x-based compression standards. To the best of our knowledge, this data hiding method is the first attempt of its kind. When fed into an ordinary video decoder, the modified video completely reconstructs the original video even compared at the bit-to-bit level. Our method is also reversible, where the embedded information could be removed to obtain the original video. A new data representation scheme called reverse zerorun length (RZL) is proposed to exploit the statistics of macroblock for achieving high embedding efficiency while trading off with payload. It is theoretically and experimentally verified that RZL outperforms matrix encoding in terms of payload and embedding efficiency for this particular data hiding method. The problem of video bitstream size increment caused by data embedding is also addressed, and two independent solutions are proposed to suppress this increment. Basic performance of this data hiding method is verified through experiments on various existing MPEG-1 encoded videos. In the best case scenario, an average increase of four bits in the video bitstream size is observed for every message bit embedded.

Journal ArticleDOI
TL;DR: It is shown that the analytical calculation of an optimal interpolation filter at particular constraints is possible, resulting in total coding improvements of 20% at broadcast quality compared to the H.264/AVC High Profile.
Abstract: In order to reduce the bit-rate of video signals, current coding standards apply hybrid coding with motion-compensated prediction and transform coding of the prediction error. In previous publications, it has been shown that aliasing components contained in an image signal, as well as motion blur, limit the prediction efficiency obtained by motion compensation. In this paper, we show that the analytical calculation of an optimal interpolation filter under particular constraints is possible, resulting in total coding improvements of 20% at broadcast quality compared to the H.264/AVC High Profile. Furthermore, the spatial adaptation to local image characteristics enables further improvements of 0.15 dB for CIF sequences compared to a globally adaptive filter, or up to 0.6 dB compared to standard H.264/AVC. Additionally, we show that the presented approach is generally applicable, i.e., motion blur can also be exactly compensated if particular constraints are fulfilled.

Journal ArticleDOI
Li Su1, Yan Lu2, Feng Wu2, Shipeng Li2, Wen Gao3 
TL;DR: A joint complexity-distortion optimization approach is proposed for real-time H.264 video encoding under the power-constrained environment and the adaptive allocation of computational resources and the fine scalability of complexity control can be achieved.
Abstract: In this paper, a joint complexity-distortion optimization approach is proposed for real-time H.264 video encoding under the power-constrained environment. The power consumption is first translated to the encoding computation costs measured by the number of scaled computation units consumed by basic operations. The solved problem is then specified to be the allocation and utilization of the computational resources. A computation allocation model (CAM) with virtual computation buffers is proposed to optimally allocate the computational resources to each video frame. In particular, the proposed CAM and the traditional hypothetical reference decoder model have the same temporal phase in operations. Further, to fully utilize the allocated computational resources, complexity-configurable motion estimation (CAME) and complexity-configurable mode decision (CAMD) algorithms are proposed for H.264 video encoding. In particular, the CAME is performed to select the path of motion search at the frame level, and the CAMD is performed to select the order of mode search at the macroblock level. Based on the hierarchical adjusting approach, the adaptive allocation of computational resources and the fine scalability of complexity control can be achieved.

Journal ArticleDOI
TL;DR: The results demonstrate that PSNR gains can be achieved for the conventional inter prediction (IPPP) coding structure or the hierarchical bi-predictive (B) picture coding structure with large group of pictures size, for all the tested sequences and under various combinations of packet loss rates.
Abstract: Scalable video coding (SVC), which is the scalable extension of the H.264/AVC standard, was developed by the Joint Video Team (JVT) of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T VCEG (Video Coding Experts Group). SVC is designed to provide adaptation capability for heterogeneous network structures and different receiving devices with the help of temporal, spatial, and quality scalabilities. It is challenging to achieve graceful quality degradation in an error-prone environment, since channel errors can drastically deteriorate the quality of the video. Error resilient coding and error concealment techniques have been introduced into SVC to reduce the quality degradation impact of transmission errors. Some of the techniques are inherited from or applicable also to H.264/AVC, while some of them take advantage of the SVC coding structure and coding tools. In this paper, the error resilient coding and error concealment tools in SVC are first reviewed. Then, several important tools such as loss-aware rate-distortion optimized macroblock mode decision algorithm and error concealment methods in SVC are discussed and experimental results are provided to show the benefits from them. The results demonstrate that PSNR gains can be achieved for the conventional inter prediction (IPPP) coding structure or the hierarchical bi-predictive (B) picture coding structure with large group of pictures size, for all the tested sequences and under various combinations of packet loss rates, compared with the basic joint scalable video model (JSVM) design applying no error resilient tools at the encoder and only picture copy error concealment method at the decoder.
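A loss-aware mode decision of the kind discussed can be sketched as minimizing an expected rate-distortion cost, J = (1 − p)·D_received + p·D_concealed + λ·R: as the loss rate rises, loss-robust intra modes win over more efficient inter modes. The mode table and numbers below are invented for illustration, not taken from the paper:

```python
def loss_aware_mode_decision(modes, p_loss, lam):
    """Pick the macroblock mode with the lowest expected RD cost under a
    packet-loss probability p_loss:
        J = (1 - p)*D_received + p*D_concealed + lam*R."""
    best, best_cost = None, float("inf")
    for name, (d_rec, d_conc, rate) in modes.items():
        j = (1.0 - p_loss) * d_rec + p_loss * d_conc + lam * rate
        if j < best_cost:
            best, best_cost = name, j
    return best

modes = {
    # mode: (distortion if received, distortion if lost+concealed, bits)
    "inter": (10.0, 400.0, 20.0),   # efficient, but drifts badly when lost
    "intra": (12.0, 60.0, 100.0),   # costly, but stops error propagation
}
```

With these illustrative numbers, "inter" wins at zero loss while "intra" wins at a 30% loss rate, reproducing the qualitative behavior of loss-aware RDO.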

Journal ArticleDOI
TL;DR: This paper presents an HD1080p 30-frames/s H.264 intra encoder operated at 140 MHz with just 94 K gate count and a 0.72-mm² core area; to achieve high throughput and low area cost for high-definition video, it adopts a modified three-step fast intra prediction technique to reduce the cycle count.
Abstract: This paper presents an HD1080p 30-frames/s H.264 intra encoder operated at 140 MHz with just 94 K gate count and a 0.72-mm² core area for digital video recorder and digital still camera applications. To achieve high throughput and low area cost for high-definition video, we apply a modified three-step fast intra prediction technique to reduce the cycle count while keeping the quality close to that of full search. In the architecture scheduling, we further adopt variable pixel parallelism instead of constant four-pixel parallelism to speed up the critical intra prediction part while keeping other parts unchanged for low area cost. The achieved design needs only half the working frequency and reduces the gate count by 23.5% compared with the previous design meeting the same HD720p 30-frames/s requirement. Besides, our design at 140 MHz can support HD1080p at 30 frames/s for digital video recording, or 4096 × 2304 images at 6.78 frames/s for digital still camera applications.

Journal ArticleDOI
TL;DR: The proposed denoising scheme gives better performance than several state-of-the-art DDWT-based schemes for images with rich directional features and shows promising results without using motion estimation in video denoising.
Abstract: We investigate image and video denoising using adaptive dual-tree discrete wavelet packets (ADDWP), which is extended from the dual-tree discrete wavelet transform (DDWT). With ADDWP, DDWT subbands are further decomposed into wavelet packets with anisotropic decomposition, so that the resulting wavelets have elongated support regions and more orientations than DDWT wavelets. To determine the decomposition structure, we develop a greedy basis selection algorithm for ADDWP, which has significantly lower computational complexity than a previously developed optimal basis selection algorithm, with only slight performance loss. For denoising the ADDWP coefficients, a statistical model is used to exploit the dependency between the real and imaginary parts of the coefficients. The proposed denoising scheme gives better performance than several state-of-the-art DDWT-based schemes for images with rich directional features. Moreover, our scheme shows promising results without using motion estimation in video denoising. The visual quality of images and videos denoised by the proposed scheme is also superior.
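A greedy basis selection of the general flavor described can be sketched with a plain Haar wavelet and an l1 sparsity cost: split a band only if its children are jointly cheaper than the parent. This is a simplified stand-in, not the authors' ADDWP algorithm (which operates on dual-tree complex subbands with anisotropic splits):

```python
import math

def haar_split(band):
    """One Haar analysis step: (approximation, detail) half-bands."""
    lo = [(band[2*i] + band[2*i+1]) / math.sqrt(2) for i in range(len(band) // 2)]
    hi = [(band[2*i] - band[2*i+1]) / math.sqrt(2) for i in range(len(band) // 2)]
    return lo, hi

def l1_cost(band):
    """Sparsity cost: smaller means the band compresses/denoises better."""
    return sum(abs(v) for v in band)

def greedy_basis(band, depth):
    """Accept a split only if it lowers the cost, then recurse on the
    children (one-pass greedy, cheaper than full best-basis search)."""
    if depth == 0 or len(band) < 2:
        return [band]
    lo, hi = haar_split(band)
    if l1_cost(lo) + l1_cost(hi) < l1_cost(band):
        return greedy_basis(lo, depth - 1) + greedy_basis(hi, depth - 1)
    return [band]

# A smooth ramp: splitting pays off because the detail band is tiny.
bands = greedy_basis([float(i) for i in range(16)], 3)
```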

Journal ArticleDOI
TL;DR: Through both simulations and actual experiments, it is shown that the performance of the proposed protocol is close to that of the optimal solution, and is better than that of other heuristic protocols.
Abstract: In this paper, we propose a novel multipath selection framework for video streaming over wireless ad hoc networks. We propose a heuristic interference-aware multipath routing protocol based on the estimation of concurrent packet drop probability of two paths, taking into account interference between links. Through both simulations and actual experiments, we show that the performance of the proposed protocol is close to that of the optimal solution, and is better than that of other heuristic protocols.
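The path-selection criterion can be sketched as follows: estimate the probability that two paths drop the same packet, inflated by an interference-correlation term, and pick the pair that minimizes it. The correlation model below is an illustrative assumption, not the protocol's actual estimator:

```python
from itertools import combinations

def concurrent_drop(p1, p2, corr):
    """Probability that both paths drop the same packet. corr in [0, 1]
    interpolates between independent losses (p1*p2) and fully shared
    losses (min(p1, p2)), modeling inter-path interference."""
    independent = p1 * p2
    return independent + corr * (min(p1, p2) - independent)

def best_path_pair(drop_probs, corr_map):
    """Choose the two paths minimizing the concurrent drop probability."""
    return min(
        combinations(drop_probs, 2),
        key=lambda pr: concurrent_drop(
            drop_probs[pr[0]], drop_probs[pr[1]],
            corr_map.get(frozenset(pr), 0.0)))

drop_probs = {"A": 0.05, "B": 0.08, "C": 0.06}
corr_map = {frozenset(("A", "B")): 0.8}   # A and B share interfering links
pair = best_path_pair(drop_probs, corr_map)
```

Even though A and B have the two lowest individual loss rates, their shared interference makes A and C the better pair, which is the essence of interference-aware multipath selection.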

Journal ArticleDOI
TL;DR: An algorithm and a hardware architecture of a new type of EC codec engine with multiple modes are presented, and the proposed four-tree pipelining scheme can reduce latency by 83% and the buffer size between transform and entropy coding by 67%.
Abstract: In a typical portable multimedia system, external access, which is usually dominated by block-based video content, induces more than half of the total system power. Embedded compression (EC) effectively reduces external access caused by video content by reducing the data size. In this paper, an algorithm and a hardware architecture of a new type of EC codec engine with multiple modes are presented. Lossless mode, and lossy modes with rate control and quality control, are all supported by a single algorithm. The proposed four-tree pipelining scheme can reduce latency by 83% and the buffer size between transform and entropy coding by 67%. The proposed EC codec engine can save 62%, 66%, and 77% of external access at lossless mode, half-size mode, and quarter-size mode, respectively, and can be used in various system power conditions. With a TSMC 0.18-μm 1P6M CMOS logic process, the proposed EC codec engine can encode or decode CIF 30-frames/s video data and achieve a power saving of more than 109 mW, while the EC codec engine itself consumes only 2 mW.
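A lossless EC mode of the general kind described can be sketched as left-neighbor DPCM followed by Golomb-Rice coding of the residuals; this is a generic illustration of how EC shrinks external memory traffic, not the paper's four-tree algorithm:

```python
def rice_bits(v, k):
    """Bits needed to code nonnegative v with a Golomb-Rice code: a unary
    quotient (v >> k, plus a terminator bit) and k raw remainder bits."""
    return (v >> k) + 1 + k

def dpcm_rice_size(pixels, k=2):
    """Estimated compressed size in bits of one pixel row: left-neighbor
    DPCM, zigzag mapping of the signed residual, then Rice coding."""
    bits, prev = 8, pixels[0]            # first pixel is sent raw (8 bits)
    for p in pixels[1:]:
        r = p - prev
        m = (r << 1) ^ (r >> 31)         # 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
        bits += rice_bits(m, k)
        prev = p
    return bits

row = [100, 101, 102, 102, 101, 100, 99, 100]
size = dpcm_rice_size(row)               # vs. 64 bits uncompressed
```

On this smooth row the residuals are all in {-1, 0, 1}, so the row fits in 29 bits instead of 64; every saved bit is one less bit of external bus traffic.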

Journal ArticleDOI
TL;DR: A novel framework to enable personalized image recommendation via exploratory search from large-scale collections of Flickr images, in which a small set of the most representative images is recommended for a given image topic according to representativeness scores, and kernel principal component analysis and hyperbolic visualization are seamlessly integrated.
Abstract: In this paper, we have developed a novel framework called JustClick to enable personalized image recommendation via exploratory search from large-scale collections of Flickr images. First, a topic network is automatically generated to summarize large-scale collections of Flickr images at a semantic level. Hyperbolic visualization is further used to enable interactive navigation and exploration of the topic network, so that users can gain insight into large-scale image collections at first glance, build up their mental query models interactively, and specify their queries (i.e., image needs) more precisely by selecting the image topics on the topic network directly. Thus, our personalized query recommendation framework can effectively address both the problem of query formulation and the problem of vocabulary discrepancy and null returns. Second, a small set of the most representative images is recommended for the given image topic according to their representativeness scores. Kernel principal component analysis and hyperbolic visualization are seamlessly integrated to organize and lay out the recommended images (i.e., the most representative images) according to their nonlinear visual similarity contexts, so that users can assess the relevance between the recommended images and their real query intentions interactively. An interactive interface is implemented to allow users to express their time-varying query intentions precisely and to direct our JustClick system to more relevant images according to their personal preferences. Our experiments on large-scale collections of Flickr images show very positive results.

Journal ArticleDOI
TL;DR: An efficient intermode decision algorithm based on motion homogeneity, evaluated on a normalized motion vector (MV) field generated using MVs from motion estimation on the 4 × 4 block size.
Abstract: The latest video coding standard H.264/AVC significantly outperforms previous standards in terms of coding efficiency. H.264/AVC adopts variable block sizes ranging from 4 × 4 to 16 × 16 in inter-frame coding and achieves significant gains in coding efficiency compared to coding a macroblock (MB) with a single regular block size. However, this new feature causes extremely high computational complexity when rate-distortion optimization (RDO) is performed using the scheme of full mode decision. This paper presents an efficient intermode decision algorithm based on motion homogeneity evaluated on a normalized motion vector (MV) field, which is generated using MVs from motion estimation on the 4 × 4 block size. Three directional motion homogeneity measures derived from the normalized MV field are exploited to determine a subset of candidate intermodes for each MB, so that unnecessary RDO calculations on the other intermodes can be skipped. Experimental results demonstrate that our algorithm reduces the entire encoding time by about 40% on average, without any noticeable loss of coding efficiency.
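The homogeneity test can be sketched as a dispersion measure over the sixteen 4 × 4 MVs of a macroblock, used to prune the candidate mode list before RDO. The threshold and mode subsets below are illustrative, not the paper's three directional measures:

```python
def mv_dispersion(mv_field):
    """Mean absolute deviation of the sixteen 4x4-block MVs in one MB."""
    n = len(mv_field)
    mx = sum(v[0] for v in mv_field) / n
    my = sum(v[1] for v in mv_field) / n
    return sum(abs(v[0] - mx) + abs(v[1] - my) for v in mv_field) / n

def candidate_modes(mv_field, thresh=0.5):
    """Homogeneous motion -> only large partitions need an RDO check;
    heterogeneous motion -> go straight to the smaller partitions."""
    if mv_dispersion(mv_field) < thresh:
        return ["SKIP", "16x16"]
    return ["16x8", "8x16", "8x8"]

uniform = [(2, 1)] * 16                    # all 4x4 MVs agree
mixed = [(2, 1)] * 8 + [(-3, 4)] * 8       # two motion regions in the MB
```

Skipping RDO on the excluded modes is where the reported ~40% encoding-time saving comes from.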

Journal ArticleDOI
TL;DR: In this paper, the effect of different constraints on the multiple-camera system in terms of geometric accuracy and the requirement for high-quality view synthesis is evaluated, and two prototype studios are contrasted and state-of-the-art techniques for 3D content production demonstrated.
Abstract: Multiple-camera systems are currently widely used in research and development as a means of capturing and synthesizing realistic 3-D video content. Studio systems for 3-D production of human performance are reviewed from the literature, and the practical experience gained in developing prototype studios is reported across two research laboratories. System design should consider the studio backdrop for foreground matting, lighting for ambient illumination, camera acquisition hardware, the camera configuration for scene capture, and accurate geometric and photometric camera calibration. A ground-truth evaluation is performed to quantify the effect of different constraints on the multiple-camera system in terms of geometric accuracy and the requirement for high-quality view synthesis. As changing camera height has only a limited influence on surface visibility, multiple camera sets or an active vision system may be required for wide-area capture; accurate reconstruction requires a camera baseline of 25°, and the achievable accuracy is 5–10 mm at current camera resolutions. Accuracy is inherently limited, and view-dependent rendering is required for view synthesis with sub-pixel accuracy where display resolutions match camera resolutions. The two prototype studios are contrasted and state-of-the-art techniques for 3-D content production demonstrated.

Journal ArticleDOI
TL;DR: The whole system has been validated using real-time images acquired during official soccer matches, and quantitative results on the system accuracy were obtained comparing the system responses with the ground truth data generated manually on a number of extracted significant sequences.
Abstract: In this paper, we investigate the feasibility of a multiple-camera system for automatic offside detection. We propose six fixed cameras, properly placed on the two sides of the soccer field (three per side), to reduce perspective and occlusion errors. The images acquired by the synchronized cameras are processed to detect the players' positions and the ball position in real time; a multiple-view analysis is carried out to evaluate the offside event, considering the positions of all the players in the field, determining the player who passed the ball, and determining whether an active offside condition occurred. The whole system has been validated using images acquired during official soccer matches, and quantitative results on the system accuracy were obtained by comparing the system responses with ground-truth data generated manually on a number of extracted significant sequences.
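Once player and ball positions are available in a common pitch coordinate system, the geometric core of the offside test is simple. A minimal sketch using 1-D pitch-axis positions (illustrative only; it ignores the paper's multi-view fusion and the active/passive distinction):

```python
def is_offside(attacker_x, ball_x, defender_xs, attacking_right=True):
    """Offside at the moment the ball is played: the receiver is offside if
    nearer the goal line than both the ball and the second-last defender.
    Positions are distances along the pitch axis; defender_xs should
    include the goalkeeper."""
    ordered = sorted(defender_xs, reverse=attacking_right)
    second_last = ordered[1]
    if attacking_right:
        return attacker_x > ball_x and attacker_x > second_last
    return attacker_x < ball_x and attacker_x < second_last

# Keeper at 95 m, defenders at 80 m and 60 m; ball played from 70 m.
offside = is_offside(85.0, 70.0, [95.0, 80.0, 60.0])   # beyond both
onside = is_offside(75.0, 70.0, [95.0, 80.0, 60.0])    # behind the defensive line
```

The hard part the paper addresses is upstream of this test: detecting and localizing the players and the ball accurately enough, at frame rate, from synchronized views.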