
Showing papers in "IEEE Transactions on Circuits and Systems for Video Technology in 2007"


Journal ArticleDOI
TL;DR: An overview of the basic concepts for extending H.264/AVC towards SVC is provided, and the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed with regard to their efficiency and complexity.
Abstract: With the introduction of the H.264/AVC video coding standard, significant improvements have recently been demonstrated in video compression capability. The Joint Video Team of the ITU-T VCEG and the ISO/IEC MPEG has now also standardized a Scalable Video Coding (SVC) extension of the H.264/AVC standard. SVC enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams. Hence, SVC provides functionalities such as graceful degradation in lossy transmission environments as well as bit rate, format, and power adaptation. These functionalities provide enhancements to transmission and storage applications. SVC has achieved significant improvements in coding efficiency with an increased degree of supported scalability relative to the scalable profiles of prior video coding standards. This paper provides an overview of the basic concepts for extending H.264/AVC towards SVC. Moreover, the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed regarding their efficiency and complexity.
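The operating-point idea behind partial bit streams can be sketched in a few lines of Python (a toy packet model; the `tid`/`qid` fields and the `extract_substream` helper are simplifications invented here, not SVC NAL-unit syntax):

```python
# Toy sketch of SVC substream extraction: a scalable stream is modeled as
# packets tagged with temporal (tid) and quality (qid) layer ids, and
# extracting a substream simply drops packets above the target operating point.

def extract_substream(packets, max_temporal_id, max_quality_id):
    """Keep only packets at or below the requested operating point."""
    return [p for p in packets
            if p["tid"] <= max_temporal_id and p["qid"] <= max_quality_id]

# Full stream: base layer (tid=0, qid=0) plus temporal and quality enhancements.
stream = [
    {"frame": 0, "tid": 0, "qid": 0},
    {"frame": 0, "tid": 0, "qid": 1},   # quality refinement
    {"frame": 1, "tid": 2, "qid": 0},   # highest temporal level
    {"frame": 2, "tid": 1, "qid": 0},
    {"frame": 2, "tid": 1, "qid": 1},
    {"frame": 3, "tid": 2, "qid": 0},
    {"frame": 4, "tid": 0, "qid": 0},
]

# Half the frame rate at base quality: drop tid=2 packets and all qid>0 packets.
sub = extract_substream(stream, max_temporal_id=1, max_quality_id=0)
```

Dropping the highest temporal layer halves the frame rate and dropping quality refinements lowers fidelity, yet every retained packet still decodes — which is the "graceful degradation" the abstract describes.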

3,592 citations


Journal ArticleDOI
TL;DR: An experimental analysis of multiview video coding (MVC) for various temporal and inter-view prediction structures is presented, showing that prediction with temporal reference pictures is highly efficient, but that for 20% of a picture's blocks on average, prediction with reference pictures from adjacent views is more efficient.
Abstract: An experimental analysis of multiview video coding (MVC) for various temporal and inter-view prediction structures is presented. The compression method is based on the multiple reference picture technique in the H.264/AVC video coding standard. The idea is to exploit the statistical dependencies from both temporal and inter-view reference pictures for motion-compensated prediction. The effectiveness of this approach is demonstrated by an experimental analysis of temporal versus inter-view prediction in terms of the Lagrange cost function. The results show that prediction with temporal reference pictures is highly efficient, but for 20% of a picture's blocks on average, prediction with reference pictures from adjacent views is more efficient. Hierarchical B pictures are used as the basic structure for temporal prediction. Their advantages are combined with inter-view prediction for different temporal hierarchy levels, ranging from simulcast coding with no inter-view prediction up to full-level inter-view prediction. When using inter-view prediction at key picture temporal levels, average gains of 1.4-dB peak signal-to-noise ratio (PSNR) are reported, while additionally using inter-view prediction at nonkey picture temporal levels yields average gains of 1.6-dB PSNR. For some cases, gains of more than 3 dB, corresponding to bit-rate savings of up to 50%, are obtained.
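The temporal-versus-inter-view comparison rests on the Lagrangian cost J = D + λR; a minimal per-block selection sketch (all numbers illustrative, not measured data from the paper):

```python
# Sketch of Lagrangian reference selection: for each block, pick the
# reference type (temporal or inter-view) minimizing J = D + lambda * R.

def pick_reference(candidates, lam):
    """candidates: list of (name, distortion, rate_bits). Returns best name."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

lam = 0.85  # hypothetical Lagrange multiplier

# Most blocks: the temporal reference wins ...
blk_a = [("temporal", 120.0, 40), ("inter-view", 150.0, 55)]
# ... but for some blocks (~20% on average, per the paper) the adjacent
# view is the closer match, e.g. around disocclusions.
blk_b = [("temporal", 300.0, 70), ("inter-view", 180.0, 60)]

choice_a = pick_reference(blk_a, lam)
choice_b = pick_reference(blk_b, lam)
```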

645 citations


Journal ArticleDOI
TL;DR: A unified shot boundary detection system based on a graph partition model is presented, and it is shown that the proposed approach is among the best in the TRECVID 2005 evaluation.
Abstract: This paper conducts a formal study of the shot boundary detection problem. First, a general formal framework of shot boundary detection techniques is proposed. Three critical techniques, i.e., the representation of visual content, the construction of the continuity signal and the classification of continuity values, are identified and formulated from the perspective of pattern recognition. Meanwhile, the major challenges to the framework are identified. Second, a comprehensive review of the existing approaches is conducted. The representative approaches are categorized and compared according to their roles in the formal framework. Based on the comparison of the existing approaches, optimal criteria for each module of the framework are discussed, which provide a practical guide for developing novel methods. Third, with all the above issues considered, we present a unified shot boundary detection system based on a graph partition model. Extensive experiments are carried out on the platform of TRECVID. The experiments not only verify the optimal criteria discussed above, but also show that the proposed approach is among the best in the TRECVID 2005 evaluation. Finally, we conclude the paper and present some further discussions on what shot boundary detection can learn from other related fields.
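The three stages of the framework — content representation, continuity signal, classification — can be illustrated with a deliberately tiny sketch (hypothetical frames and threshold; the paper's actual system uses a graph partition model rather than simple thresholding):

```python
# Stage 1: represent each frame by a gray-level histogram.
# Stage 2: build a continuity signal from histogram intersection.
# Stage 3: classify low-continuity points as cut candidates.

def histogram(frame, bins=4, levels=256):
    h = [0] * bins
    for px in frame:
        h[px * bins // levels] += 1
    n = len(frame)
    return [c / n for c in h]

def continuity(h1, h2):
    """Histogram intersection in [0, 1]; 1 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

frames = [
    [10, 20, 30, 40] * 4,      # shot 1
    [12, 22, 28, 41] * 4,      # shot 1, slight change
    [200, 210, 220, 230] * 4,  # shot 2: abrupt cut
]
hists = [histogram(f) for f in frames]
signal = [continuity(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]
cuts = [i + 1 for i, c in enumerate(signal) if c < 0.5]  # threshold classifier
```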

357 citations


Journal ArticleDOI
TL;DR: A new motion-compensated (MC) interpolation algorithm to enhance the temporal resolution of video sequences is presented; its adaptive overlapped block MC can overcome the limitations of the conventional OBMC, such as over-smoothing and poor de-blocking.
Abstract: In this work, we develop a new motion-compensated (MC) interpolation algorithm to enhance the temporal resolution of video sequences. First, we propose the bilateral motion estimation scheme to obtain the motion field of an interpolated frame without yielding the hole and overlapping problems. Then, we partition a frame into several object regions by clustering motion vectors. We apply the variable-size block MC (VS-BMC) algorithm to object boundaries in order to reconstruct edge information with a higher quality. Finally, we use the adaptive overlapped block MC (OBMC), which adjusts the coefficients of overlapped windows based on the reliabilities of neighboring motion vectors. The adaptive OBMC (AOBMC) can overcome the limitations of the conventional OBMC, such as over-smoothing and poor de-blocking. Experimental results show that the proposed algorithm provides a better image quality than conventional methods, both objectively and subjectively.
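A 1-D toy of bilateral motion estimation (an illustrative reduction of the paper's 2-D block-based scheme) shows why holes and overlaps cannot occur: the motion field is defined on the interpolated frame itself, so every sample receives exactly one symmetric vector:

```python
# For each sample x of the frame to interpolate, search a symmetric
# displacement v minimizing the mismatch between prev[x - v] and nxt[x + v],
# then average the two matched samples.

def bilateral_interpolate(prev, nxt, search=2):
    n = len(prev)
    out = []
    for x in range(n):
        best_v, best_cost = 0, float("inf")
        for v in range(-search, search + 1):
            if 0 <= x - v < n and 0 <= x + v < n:
                cost = abs(prev[x - v] - nxt[x + v])
                if cost < best_cost:
                    best_v, best_cost = v, cost
        out.append((prev[x - best_v] + nxt[x + best_v]) // 2)
    return out

# An edge moving right by 2 between frames; the true middle frame shifts by 1.
prev = [0, 0, 0, 0, 100, 100, 100, 100]
nxt  = [0, 0, 0, 0, 0, 0, 100, 100]
mid = bilateral_interpolate(prev, nxt)
```

The interpolated frame places the edge exactly halfway between its positions in the two source frames.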

348 citations


Journal ArticleDOI
TL;DR: 3DTV coding technology is maturing; however, the research area is relatively young compared to coding of other types of media, and there is still a lot of room for improvement and new development of algorithms.
Abstract: Research efforts on 3DTV technology have been strengthened worldwide recently, covering the whole media processing chain from capture to display. Different 3DTV systems rely on different 3D scene representations that integrate various types of data. Efficient coding of these data is crucial for the success of 3DTV. Compression of pixel-type data including stereo video, multiview video, and associated depth or disparity maps extends available principles of classical video coding. Powerful algorithms and open international standards for multiview video coding and coding of video plus depth data are available and under development, which will provide the basis for introduction of various 3DTV systems and services in the near future. Compression of 3D mesh models has also reached a high level of maturity. For static geometry, a variety of powerful algorithms are available to efficiently compress vertices and connectivity. Compression of dynamic 3D geometry is currently a more active field of research. Temporal prediction is an important mechanism to remove redundancy from animated 3D mesh sequences. Error resilience is important for transmission of data over error-prone channels, and multiple description coding (MDC) is a suitable way to protect data. MDC of still images and 2D video has already been widely studied, whereas multiview video and 3D meshes have been addressed only recently. Intellectual property protection of 3D data by watermarking is a pioneering research area as well. The 3D watermarking methods in the literature are classified into three groups, considering the dimensions of the main components of scene representations and the resulting components after applying the algorithm. In general, 3DTV coding technology is maturing. Systems and services may enter the market in the near future. However, the research area is relatively young compared to coding of other types of media. Therefore, there is still a lot of room for improvement and new development of algorithms.

326 citations


Journal ArticleDOI
TL;DR: The scalable video coding (SVC) standard as an extension of H.264/AVC allows efficient, standard-based temporal, spatial, and quality scalability of video bit streams.
Abstract: The scalable video coding (SVC) standard, as an extension of H.264/AVC, allows efficient, standard-based temporal, spatial, and quality scalability of video bit streams. Scalability of a video bit stream allows for media bit rate as well as device capability adaptation. Moreover, adaptation of the bit rate of a video signal is a desirable key feature if limitations in network resources, mostly characterized by throughput variations, varying delay or transmission errors, need to be considered. Typically, in mobile networks the throughput, delay and errors of a connection (link) depend on the current reception conditions, which are largely influenced by a number of physical factors. In order to cope with the typically varying characteristics of mobile communication channels in unicast, multicast, or broadcast services, different methods for increasing robustness and achieving quality of service are desirable. We give an overview of SVC and its relation to mobile delivery methods. Furthermore, innovative use cases are introduced which apply SVC in mobile networks.

319 citations


Journal ArticleDOI
TL;DR: The results show that the performance gap between single layer coding and scalable video coding can be very small and that SVC clearly outperforms previous video coding technology such as MPEG-4 ASP.
Abstract: This paper provides a performance analysis of the scalable video coding (SVC) extension of H.264/AVC. A short overview presenting the main functionalities of SVC is given and main issues in encoder control and bit stream extraction are outlined. Some aspects of rate-distortion optimization in the context of SVC are discussed and strategies for derivation of optimized configurations relative to the investigated scalability scenarios are presented. Based on these methods, rate-distortion results for several SVC configurations are presented and compared to rate-distortion optimized H.264/AVC single layer coding. For reference, a comparison to rate-distortion optimized MPEG-4 visual (advanced simple profile) coding results is provided. The results show that the performance gap between single layer coding and scalable video coding can be very small and that SVC clearly outperforms previous video coding technology such as MPEG-4 ASP.

313 citations


Journal ArticleDOI
TL;DR: The spatially scalable extension within the resulting scalable video coding standard is introduced and the high-level design is described and individual coding tools are explained.
Abstract: A scalable extension to the H.264/AVC video coding standard has been developed within the joint video team (JVT), a joint organization of the ITU-T video coding group (VCEG) and the ISO/IEC moving picture experts group (MPEG). The extension allows multiple resolutions of an image sequence to be contained in a single bit stream. In this paper, we introduce the spatially scalable extension within the resulting scalable video coding standard. The high-level design is described and individual coding tools are explained. Additionally, encoder issues are identified. Finally, the performance of the design is reported.

273 citations


Journal ArticleDOI
TL;DR: This scheme embeds the watermark without exposing the video content's confidentiality and provides a solution for signal processing in the encrypted domain; it also increases operational efficiency, since the encrypted video can be watermarked without decryption.
Abstract: A scheme is proposed to implement commutative video encryption and watermarking during the advanced video coding process. In H.264/AVC compression, the intra-prediction mode, motion vector difference and discrete cosine transform (DCT) coefficients' signs are encrypted, while DCT coefficients' amplitudes are watermarked adaptively. To prevent the watermarking operation from affecting the decryption operation, a traditional watermarking algorithm is modified. The encryption and watermarking operations are commutative. Thus, the watermark can be extracted from the encrypted videos, and the encrypted videos can be re-watermarked. This scheme embeds the watermark without exposing the video content's confidentiality, and provides a solution for signal processing in the encrypted domain. Additionally, it increases operational efficiency, since the encrypted video can be watermarked without decryption. These properties make the scheme a good choice for secure media transmission or distribution.
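Why the two operations commute can be demonstrated on a toy coefficient model (hypothetical; the real scheme operates on H.264/AVC syntax elements): encryption touches only the signs, watermarking only the magnitude LSBs, so the operations act on disjoint parts of each coefficient:

```python
# Encryption: flip coefficient signs under a keystream.
def encrypt(coeffs, keystream):
    return [-c if k else c for c, k in zip(coeffs, keystream)]

# Watermarking: embed one bit per coefficient in the magnitude's LSB.
def watermark(coeffs, bits):
    out = []
    for c, b in zip(coeffs, bits):
        mag, sign = abs(c), (1 if c >= 0 else -1)
        out.append(sign * ((mag & ~1) | b))
    return out

# Extraction reads magnitude LSBs, so it works on encrypted data too.
def extract(coeffs):
    return [abs(c) & 1 for c in coeffs]

coeffs = [14, -7, 22, -9]
key    = [1, 0, 1, 1]
wm     = [1, 1, 0, 1]

a = watermark(encrypt(coeffs, key), wm)   # encrypt, then watermark
b = encrypt(watermark(coeffs, wm), key)   # watermark, then encrypt
```

Both orders yield the same result, and the watermark remains extractable from the encrypted coefficients.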

256 citations


Journal ArticleDOI
TL;DR: A new rate control scheme for H.264 video encoding with enhanced rate and distortion models is proposed, and experimental results show that the new algorithm controls bit rates accurately, with R-D performance significantly better than that of the rate control algorithm implemented in the H.264 software encoder JM8.1a.
Abstract: A new rate control scheme for H.264 video encoding with enhanced rate and distortion models is proposed in this work. Compared with existing H.264 rate control schemes, our scheme offers several new features. First, the inter-dependency between rate-distortion optimization (RDO) and rate control in H.264 is resolved via quantization parameter estimation and update. Second, since the bits of the header information may occupy a larger portion of the total bit budget, especially when coding at low bit rates, a rate model for the header information is developed to estimate header bits more accurately. The number of header bits is modeled as a function of the number of nonzero motion vector (MV) elements and the number of MVs. Third, a new source rate model and a distortion model are proposed. For this purpose, coded 4×4 blocks are identified, and the number of source bits and the distortion are modeled as functions of the quantization stepsize and the complexity of coded 4×4 blocks. Finally, an R-D optimized bit allocation scheme among macroblocks (MBs) is proposed to improve picture quality. Built upon the above ideas, a rate control algorithm is developed for the H.264 baseline-profile encoder under the constant bit rate constraint. Experimental results show that the new algorithm can control bit rates accurately, with R-D performance significantly better than that of the rate control algorithm implemented in the H.264 software encoder JM8.1a.
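The bit-budget split can be sketched as follows (all model coefficients are made-up placeholders; the quadratic source-rate form is a classic rate-control model used here for illustration, not taken verbatim from the paper):

```python
import math

# Header bits modeled linearly in the number of nonzero MV elements and MVs.
def header_bits(n_nonzero_mv_elems, n_mvs, a=6.0, b=12.0):
    return a * n_nonzero_mv_elems + b * n_mvs

# Classic quadratic R-Q model for the remaining source bits:
#   R_src(Q) = c1 * X / Q + c2 * X / Q**2, X = complexity (e.g. predicted MAD).
def qstep_for_budget(r_src, x, c1=1.0, c2=8.0):
    """Solve c1*x/Q + c2*x/Q**2 = r_src for the positive root Q."""
    # Multiply by Q**2:  r_src*Q**2 - c1*x*Q - c2*x = 0
    disc = (c1 * x) ** 2 + 4.0 * r_src * c2 * x
    return (c1 * x + math.sqrt(disc)) / (2.0 * r_src)

target = 4000.0                       # bit budget for this frame
hdr = header_bits(n_nonzero_mv_elems=150, n_mvs=99)
q = qstep_for_budget(target - hdr, x=2500.0)   # texture bits -> Q step
```

Subtracting a separately modeled header-bit estimate before inverting the source model is the key point: otherwise the header bits, dominant at low rates, would corrupt the Q-step choice.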

255 citations


Journal ArticleDOI
TL;DR: Holography enables 3-D scenes to be encoded into an interference pattern; however, this places constraints on the display resolution necessary to reconstruct a scene. Although holography may ultimately offer the solution for 3DTV, the problem of capturing naturally lit scenes will first have to be solved, and holography is unlikely to provide a short-term solution due to limitations in current enabling technologies.
Abstract: The display is the last component in a chain of activity from image acquisition, compression, coding, transmission and reproduction of 3-D images through to the display itself. There are various schemes for 3-D display taxonomy; the basic categories adopted for this paper are: holography, where the image is produced by wavefront reconstruction; volumetric, where the image is produced within a volume of space; and multiple image displays, where two or more images are seen across the viewing field. In an ideal world a stereoscopic display would produce images in real time that exhibit all the characteristics of the original scene. This would require the wavefront to be reproduced accurately, but currently this can only be achieved using holographic techniques. Volumetric displays provide both vertical and horizontal parallax so that several viewers can see 3-D images that exhibit no accommodation/convergence rivalry. Multiple image displays fall within three fundamental types: holoform, in which a large number of views give smooth motion parallax and hence a hologram-like appearance; multiview, where a series of discrete views are presented across the viewing field; and binocular, where only two views are presented in regions that may occupy fixed positions or follow viewers' eye positions by employing head tracking. Holography enables 3-D scenes to be encoded into an interference pattern; however, this places constraints on the display resolution necessary to reconstruct a scene. Although holography may ultimately offer the solution for 3DTV, the problem of capturing naturally lit scenes will first have to be solved, and holography is unlikely to provide a short-term solution due to limitations in current enabling technologies. Liquid crystal, digital micromirror, optically addressed liquid crystal and acousto-optic spatial light modulators (SLMs) have been employed as suitable spatial light modulation devices in holography. Liquid crystal SLMs are generally favored owing to the commercial availability of high-fill-factor, high-resolution addressable devices. Volumetric displays provide both vertical and horizontal parallax and several viewers are able to see a 3-D image that exhibits no accommodation/convergence rivalry. However, the principal disadvantages of these displays are: the images are generally transparent, the hardware tends to be complex, and non-Lambertian intensity distributions cannot be displayed. Multiple image displays take many forms and it is likely that one or more of these will provide the solution(s) for the first generation of 3DTV displays.

Journal ArticleDOI
TL;DR: This paper proposes an image compression framework oriented towards visual quality rather than pixel-wise fidelity, and constructs a practical system to verify the effectiveness of the approach, in which an edge map serves as assistant information and the edge extraction and region removal approaches are developed accordingly.
Abstract: In this paper, image compression utilizing visual redundancy is investigated. Inspired by recent advancements in image inpainting techniques, we propose an image compression framework oriented towards visual quality rather than pixel-wise fidelity. In this framework, an original image is analyzed at the encoder side so that portions of the image are intentionally and automatically skipped. Instead, some information is extracted from these skipped regions and delivered to the decoder as assistant information in compressed form. The delivered assistant information plays a key role in the proposed framework because it guides image inpainting to accurately restore these regions at the decoder side. Moreover, to fully take advantage of the assistant information, a compression-oriented edge-based inpainting algorithm is proposed for image restoration, integrating pixel-wise structure propagation and patch-wise texture synthesis. We also construct a practical system to verify the effectiveness of the compression approach, in which an edge map serves as assistant information and the edge extraction and region removal approaches are developed accordingly. Evaluations have been made in comparison with baseline JPEG and standard MPEG-4 AVC/H.264 intra-picture coding. Experimental results show that our system achieves up to 44% and 33% bit savings, respectively, at similar visual quality levels. Our proposed framework is a promising exploration towards future image and video compression.
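The skip-and-restore idea can be mimicked with plain diffusion inpainting standing in for the paper's edge-guided algorithm (a toy; real assistant information would include the extracted edge map):

```python
# The "encoder" drops a region; the "decoder" restores it by repeatedly
# averaging the known 4-neighbors of each unknown pixel (diffusion fill).

def inpaint(img, mask, iters=50):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                if mask[y][x]:          # unknown pixel: average its neighbors
                    nb = [out[y + dy][x + dx]
                          for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                          if 0 <= y + dy < h and 0 <= x + dx < w]
                    out[y][x] = sum(nb) / len(nb)
    return out

img = [[50.0] * 5 for _ in range(5)]
mask = [[False] * 5 for _ in range(5)]
mask[2][2] = True
img[2][2] = 0.0                        # region skipped by the encoder
rest = inpaint(img, mask)
```

On a smooth region the fill converges to the surrounding value, which is exactly the class of regions an encoder can afford to skip.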

Journal ArticleDOI
TL;DR: An accurate linear rate-quantization (R-Q) model is formulated to describe the relationship between the total amount of bits for both texture and nontexture information and the quantization parameter (QP), so that the negative effect caused by the inaccurate estimation of nontexture bits is removed.
Abstract: This paper presents a novel rate control scheme for low-delay video communication under the H.264/AVC standard. A switched mean-absolute-difference (MAD) prediction scheme is introduced to enhance the traditional temporal MAD prediction model, which is not suitable for predicting abrupt MAD fluctuations. Our new model can reduce the MAD prediction error by up to 69%. Furthermore, an accurate linear rate-quantization (R-Q) model is formulated to describe the relationship between the total amount of bits for both texture and nontexture information and the quantization parameter (QP), so that the negative effect caused by the inaccurate estimation of nontexture bits is removed. By exploring the relationship between peak signal-to-noise ratio and QP value, the proposed linear R-Q model can further optimize QP calculation at the macroblock level. When compared with the rate control scheme JVT-G012, which is adopted by the latest JVT H.264/AVC reference model JM9.8, the proposed rate control algorithm reduces the mismatch between actual and target bits by up to 75%. To meet the low-delay requirement, the buffer is better controlled to prevent overflow and underflow. The average luminance PSNR of the reconstructed video is increased by up to 1.13 dB at low bit rates, and the subjective video quality is also improved.

Journal ArticleDOI
TL;DR: A simpler and more effective design is suggested, which selectively encrypts fixed-length codewords in MPEG-video bit streams under the control of three perceptibility factors, which can work with any stream cipher or block cipher.
Abstract: In this paper, some existing perceptual encryption algorithms for MPEG videos are reviewed and some problems, especially security defects of two recently proposed MPEG-video perceptual encryption schemes, are pointed out. Then, a simpler and more effective design is suggested, which selectively encrypts fixed-length codewords in MPEG-video bit streams under the control of three perceptibility factors. The proposed design is actually an encryption configuration that can work with any stream cipher or block cipher. Compared with the previously proposed schemes, the new design provides more useful features, such as strict size-preservation, on-the-fly encryption and multiple perceptibility, which make it possible to support more applications with different requirements. In addition, four different measures are suggested to provide better security against known/chosen-plaintext attacks.
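A size-preserving selective-encryption toy (the XOR keystream stands in for any stream or block cipher, and the deterministic position rule is a simplistic stand-in for the perceptibility-factor control):

```python
def selective_encrypt(codewords, keystream, factor):
    """XOR-encrypt a deterministic subset (~`factor`) of fixed-length codewords."""
    period = max(1, round(1 / factor))
    return [cw ^ k if i % period == 0 else cw
            for i, (cw, k) in enumerate(zip(codewords, keystream))]

codewords = [0x3A, 0x7F, 0x12, 0x55]    # 8-bit fixed-length codewords
keystream = [0xA5, 0x5A, 0xFF, 0x0F]    # from any stream/block cipher

enc = selective_encrypt(codewords, keystream, factor=0.5)
dec = selective_encrypt(enc, keystream, factor=0.5)   # XOR is an involution
```

Because each codeword is replaced by another codeword of the same length, the bit stream size is strictly preserved, and raising the factor degrades perceptual quality further without any reformatting of the stream.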

Journal ArticleDOI
Peter Amon, T. Rathgen, D. Singer
TL;DR: This paper describes the file format defined for scalable video coding, which enables rapid extraction of scalable data, corresponding to the desired operating point, in a variety of usages and application scenarios.
Abstract: This paper describes the file format defined for scalable video coding. Techniques in the file format enable rapid extraction of scalable data, corresponding to the desired operating point. Significant assistance to file readers can be provided, and there is also great flexibility in the ways that the techniques can be used and combined, corresponding to different usages and application scenarios.

Journal ArticleDOI
TL;DR: This paper proposes a method to build a dynamic representation of the semantic context of ongoing retrieval tasks, which is used to activate different subsets of user interests at runtime so that out-of-context preferences are discarded.
Abstract: Personalized content retrieval aims at improving the retrieval process by taking into account the particular interests of individual users. However, not all user preferences are relevant in all situations. It is well known that human preferences are complex, multiple, heterogeneous, changing, even contradictory, and should be understood in context with the user goals and tasks at hand. In this paper, we propose a method to build a dynamic representation of the semantic context of ongoing retrieval tasks, which is used to activate different subsets of user interests at runtime, so that out-of-context preferences are discarded. Our approach is based on an ontology-driven representation of the domain of discourse, providing enriched descriptions of the semantics involved in retrieval actions and preferences, and enabling the definition of effective means to relate preferences and context.
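One simplified reading of "activating in-context preferences" uses plain vector similarity in place of the paper's ontology-based machinery (all vectors and the threshold are hypothetical):

```python
# Score each stored preference against the live context vector and
# discard those that fall below a similarity threshold.

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

prefs = {
    "jazz":     [1.0, 0.0, 0.2],
    "football": [0.0, 1.0, 0.0],
    "cinema":   [0.8, 0.1, 0.6],
}
context = [0.9, 0.05, 0.4]      # current retrieval task, same feature space

scores = {name: cos(vec, context) for name, vec in prefs.items()}
active = {name for name, s in scores.items() if s >= 0.5}
```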

Journal ArticleDOI
TL;DR: A novel client-driven multiview video streaming system that allows a user to watch 3D video interactively with significantly reduced bandwidth requirements by transmitting a small number of views selected according to his/her head position is presented.
Abstract: We present a novel client-driven multiview video streaming system that allows a user to watch 3D video interactively with significantly reduced bandwidth requirements by transmitting a small number of views selected according to his/her head position. The user's head position is tracked and predicted into the future to select the views that best match the user's current viewing angle dynamically. Prediction of future head positions is needed so that views matching the predicted head positions can be prefetched in order to account for delays due to network transport and stream switching. The system allocates more bandwidth to the selected views in order to render the current viewing angle. Highly compressed, lower quality versions of some other views are also prefetched for concealment if the current user viewpoint differs from the predicted viewpoint. An objective measure based on the abruptness of the head movements and delays in the system is introduced to determine the number of additional lower quality views to be prefetched. The proposed system makes use of multiview coding (MVC) and scalable video coding (SVC) concepts together to obtain improved compression efficiency while providing flexibility in bandwidth allocation to the selected views. Rate-distortion performance of the proposed system is demonstrated under different experimental conditions.
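The prefetch logic can be reduced to two steps (constant-velocity prediction and a uniform camera arc are simplifying assumptions made here; all constants are hypothetical):

```python
# Predict the head angle at now + delay, pick the matching view to fetch at
# high quality, and prefetch low-quality neighbors for concealment.

def predict_angle(angle, velocity, delay):
    return angle + velocity * delay     # constant-velocity extrapolation

def nearest_view(angle, n_views, fov=90.0):
    """Map a head angle in [0, fov] onto one of n_views equally spaced views."""
    idx = round(angle / fov * (n_views - 1))
    return max(0, min(n_views - 1, idx))

n_views = 8
pred = predict_angle(angle=30.0, velocity=40.0, delay=0.3)  # deg, deg/s, s
main = nearest_view(pred, n_views)
backups = [v for v in (main - 1, main + 1) if 0 <= v < n_views]
```

The delay argument absorbs network transport and stream-switching latency; the faster and more abrupt the head motion, the more backup views would be worth prefetching, which mirrors the objective measure the paper introduces.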

Journal ArticleDOI
TL;DR: An efficient adaptation framework using SVC and MPEG-21 digital item adaptation (DIA) is integrated and it is shown that SVC can seamlessly be adapted using DIA.
Abstract: This paper presents the integration of scalable video coding (SVC) into a generic platform for multimedia adaptation. The platform provides a full MPEG-21 chain including server, adaptation nodes, and clients. An efficient adaptation framework using SVC and MPEG-21 digital item adaptation (DIA) is integrated, and it is shown that SVC can seamlessly be adapted using DIA. For protection against packet losses in an error-prone environment, an unequal erasure protection scheme for SVC is provided. The platform includes a real-time SVC encoder capable of encoding CIF video with a QCIF base layer and fine-grain scalable quality refinement at 12.5 fps on off-the-shelf high-end PCs. The reported quality degradation due to the optimization of the encoding algorithm is below 0.6 dB for the tested sequences.

Journal ArticleDOI
TL;DR: A framework for simultaneous image segmentation and object labeling leading to automatic image annotation is presented; focusing on semantic analysis of images, it contributes to knowledge-assisted multimedia analysis and to bridging the gap between semantics and low-level visual features.
Abstract: In this paper, we present a framework for simultaneous image segmentation and object labeling leading to automatic image annotation. Focusing on semantic analysis of images, it contributes to knowledge-assisted multimedia analysis and bridging the gap between semantics and low-level visual features. The proposed framework operates at the semantic level using possible semantic labels, formally represented as fuzzy sets, to make decisions on handling image regions instead of the visual features used traditionally. In order to stress its independence of a specific image segmentation approach, we have modified two well-known region growing algorithms, i.e., watershed and recursive shortest spanning tree, and compared them to their traditional counterparts. Additionally, a visual context representation and analysis approach is presented, blending global knowledge in interpreting each object locally. Contextual information is based on a novel semantic processing methodology, employing fuzzy algebra and ontological taxonomic knowledge representation. In this process, utilization of contextual knowledge re-adjusts the labeling results of semantic region growing by fine-tuning the membership degrees of detected concepts. The performance of the overall methodology is evaluated on a real-life still image dataset from two popular domains.

Journal ArticleDOI
Sangkeun Lee
TL;DR: The main advantage of the proposed algorithm is that it enhances the details in the dark and the bright areas at low computational cost, without boosting noise information or affecting the compressibility of the original image, since it operates on images in the compressed domain.
Abstract: The objective of this paper is to present a simple and efficient algorithm for dynamic range compression and contrast enhancement of digital images under noisy conditions in the compressed domain. First, an image is separated into illumination and reflectance components. Next, the illumination component is manipulated adaptively for image dynamics by using a new content measure. Then, the reflectance component, based on the measure of the spectral contents of the image, is manipulated for image contrast. The spectral content measure is computed from the energy distribution across different spectral bands in a discrete cosine transform (DCT) block. The proposed approach also introduces a simple scheme for estimating and reducing noise information directly in the DCT domain. The main advantage of the proposed algorithm is that it enhances the details in the dark and the bright areas at low computational cost, without boosting noise information or affecting the compressibility of the original image, since it operates on images in the compressed domain. In order to evaluate the proposed scheme, several baseline approaches are described and compared using enhancement quality measures.
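At block level, the separation can be caricatured as follows (toy constants, not the paper's content measures): the DC coefficient stands in for illumination and the AC coefficients for reflectance:

```python
# Dynamic range compression: power law on the DC term (brightens dark blocks).
# Contrast enhancement: scale the ACs with a gain tied to their energy,
# giving flat (low-energy) blocks more boost than already-busy ones.

def enhance_dct_block(block, gamma=0.8, max_gain=1.5):
    dc, ac = block[0], block[1:]
    new_dc = (dc / 255.0) ** gamma * 255.0              # illumination mapping
    energy = sum(a * a for a in ac)                     # spectral content proxy
    gain = min(max_gain, 1.0 + 100.0 / (energy + 1.0))  # capped contrast gain
    return [new_dc] + [a * gain for a in ac]

dark_flat   = [40.0, 1.0, -1.0, 0.5]      # dark, low-contrast block
bright_busy = [200.0, 30.0, -25.0, 18.0]  # bright block with strong detail

out1 = enhance_dct_block(dark_flat)
out2 = enhance_dct_block(bright_busy)
```

Because everything happens on DCT coefficients, no inverse transform is needed, which is where the low computational cost and preserved compressibility come from.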

Journal ArticleDOI
TL;DR: A novel joint application physical-layer design (JAPLD) strategy to cost-effectively transmit scalable H.264/AVC video over multi-input multi-output (MIMO) wireless systems is presented; it implicitly achieves automatic unequal error protection (UEP) for layered SVC transmission over a MIMO system without power control at the transmitter.
Abstract: In this paper, we present a novel joint application physical-layer design (JAPLD) strategy to cost-effectively transmit scalable H.264/AVC video over multi-input multi-output (MIMO) wireless systems. With this approach, the application layer cooperates with the physical layer to maximize the system performance. First, in the physical layer, we propose a new layered video transmission scheme over MIMO: adaptive channel selection (ACS). ACS-MIMO is fundamentally different from parallel transmission MIMO (PT-MIMO). While each bit stream is continuously transmitted through a fixed antenna in PT-MIMO, ACS-MIMO is able to periodically switch each bit stream among multiple antennas. In the application layer, scalable video coding (SVC) generates layered bit streams that need prioritized delivery. Then, we obtain the ordering of each subchannel's SNR strength as partial channel information (CI) at the receiver. The partial CI is acquired via the estimated channel state information based on training sequences. The JAPLD strategy developed in this research switches each bit stream automatically to match the ordering of SNR strength of the subchannels. Essentially, we launch a higher priority layer bit stream into a higher SNR strength subchannel via the proposed JAPLD algorithm. In this fashion, we can implicitly achieve automatic unequal error protection (UEP) for layered SVC transmission over a MIMO system without power control at the transmitter. Experimental results show that the proposed ACS-MIMO system is able to achieve UEP with the obtained partial CI, and the reconstructed video peak signal-to-noise ratio demonstrates the performance improvement of the proposed system as compared with the open-loop PT-MIMO system.
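The core of ACS fits in a few lines (hypothetical SNR values): only the ordering of the per-subchannel SNRs, i.e., the partial CI, is needed, not the full channel state:

```python
# Sort subchannels by estimated SNR and send the highest-priority SVC
# layer on the strongest one; re-running this per period switches the
# streams among antennas as the channel varies.

def assign_layers(layer_names, snrs_db):
    """layer_names in priority order; returns {layer: antenna_index}."""
    order = sorted(range(len(snrs_db)), key=lambda i: snrs_db[i], reverse=True)
    return {layer: order[i] for i, layer in enumerate(layer_names)}

layers = ["base", "enh1", "enh2"]           # base layer = highest priority
snrs = [11.5, 18.2, 7.9]                    # per-antenna SNR estimates (dB)
mapping = assign_layers(layers, snrs)
```

Because the base layer always rides the strongest subchannel, unequal error protection falls out automatically, with no transmitter power control.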

Journal ArticleDOI
TL;DR: A new spatially adaptive wavelet-based Bayesian method for despeckling SAR images, modeling the wavelet coefficients with zero-location Cauchy and zero-mean Gaussian distributions and incorporating their spatial dependency into the Bayesian estimation process.
Abstract: Speckle noise complicates the human and automatic interpretation of synthetic aperture radar (SAR) images. Thus, the reduction of speckle is critical in various SAR image processing tasks. In this paper, we introduce a new spatially adaptive wavelet-based Bayesian method for despeckling SAR images. The wavelet coefficients of the logarithmically transformed reflectance and speckle noise are modeled using the zero-location Cauchy and zero-mean Gaussian distributions, respectively. These prior distributions are then exploited to develop a Bayesian minimum mean absolute error estimator as well as a maximum a posteriori estimator. A new context-based technique with reduced complexity is proposed for incorporating the spatial dependency of the wavelet coefficients into the Bayesian estimation processes. Experiments are carried out using typical noise-free images corrupted with simulated speckle noise as well as real SAR images, and the results show that the proposed method performs favorably in comparison to some of the existing methods in terms of the peak signal-to-noise ratio, speckle statistics, and structural similarity index, and in its ability to suppress the speckle in homogeneous regions.
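The MAP estimator under this model has a concrete shape worth seeing: for one wavelet coefficient, the observation is y = x + n with a Cauchy(0, γ) prior on the signal x and Gaussian N(0, σ²) noise n. The sketch below maximizes the log posterior by grid search; the paper derives proper estimators, and the grid search and parameter values here are only illustrative.

```python
# Illustrative numerical MAP shrinkage for a single wavelet coefficient:
# maximize  -(y - x)^2 / (2 sigma^2) - log(1 + (x / gamma)^2)
# (Gaussian log-likelihood plus Cauchy log-prior, constants dropped).
import math

def cauchy_map_estimate(y, gamma, sigma, grid_steps=20001):
    lo, hi = -abs(y) - 3 * sigma, abs(y) + 3 * sigma
    best_x, best_lp = 0.0, -float("inf")
    for i in range(grid_steps):
        x = lo + (hi - lo) * i / (grid_steps - 1)
        lp = -(y - x) ** 2 / (2 * sigma ** 2) - math.log(1 + (x / gamma) ** 2)
        if lp > best_lp:
            best_lp, best_x = lp, x
    return best_x
```

The characteristic behavior of this heavy-tailed prior: small observed coefficients (likely pure speckle) are shrunk strongly toward zero, while large coefficients (likely true structure) are left nearly untouched.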

Journal ArticleDOI
TL;DR: Two approaches to improving compression efficiency are introduced by synthesizing pictures at a given time and a given position by using view interpolation and using them as reference pictures (view-interpolation prediction) and to correct the luminance and chrominance of other views by using lookup tables to compensate for photoelectric variations in individual cameras.
Abstract: Neighboring views are highly correlated in multiview video systems, so various neighboring views should be exploited to compress videos efficiently. There are many approaches to doing this. However, most of these treat pictures of other views in the same way as pictures of the current view, i.e., pictures of other views are used as reference pictures (inter-view prediction). In this paper, we introduce two approaches to improving compression efficiency. The first synthesizes pictures at a given time and a given position by using view interpolation and uses them as reference pictures (view-interpolation prediction); in other words, we compensate for geometry to obtain precise predictions. The second approach corrects the luminance and chrominance of other views by using lookup tables to compensate for photoelectric variations in individual cameras. We implemented these ideas in H.264/AVC with inter-view prediction and confirmed that they worked well. The experimental results revealed that these ideas can reduce the number of generated bits by approximately 15% without loss of PSNR.
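The lookup-table correction in the second approach can be illustrated with histogram matching: build a table mapping each luminance level of one camera's view to the level with the same cumulative frequency in the reference view. This is a minimal sketch of table-based photometric compensation; the paper's actual table-construction procedure may differ.

```python
# Minimal sketch: build a 256-entry LUT that matches the cumulative
# luminance histogram of a source view to that of a reference view,
# compensating a constant photometric offset between cameras.

def build_lut(src_hist, ref_hist):
    """src_hist, ref_hist: 256-bin luminance histograms.
    Returns a 256-entry LUT mapping source levels to matched levels."""
    def cdf(h):
        total, acc, out = float(sum(h)), 0.0, []
        for v in h:
            acc += v
            out.append(acc / total)
        return out
    src_cdf, ref_cdf = cdf(src_hist), cdf(ref_hist)
    lut, j = [], 0
    for s in range(256):
        # Smallest reference level whose CDF reaches the source CDF.
        while j < 255 and ref_cdf[j] < src_cdf[s]:
            j += 1
        lut.append(j)
    return lut
```

For a view that is uniformly brighter than the reference by a fixed amount, the resulting table simply subtracts that offset, which is exactly the kind of per-camera variation the correction targets.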

Journal ArticleDOI
TL;DR: An overview of the system interface features defined in the SVC specification, covering, amongst other features, the bit stream structure, the extended network abstraction layer (NAL) unit header, and supplemental enhancement information (SEI) messages related to scalability information.
Abstract: Scalable video coding (SVC) and transmission has been a research topic for many years. Among other objectives, it aims to support different receiving devices, perhaps connected through a heterogeneous network structure, using a single bit stream. Earlier attempts at standardized scalable video coding, for example in MPEG-2, H.263, or MPEG-4 Visual, have not been commercially successful. Nevertheless, the Joint Video Team has recently focused on the development of the scalable video extensions of H.264/AVC, known as SVC. Some of the key problems of older scalable compression techniques have been solved in SVC and, at the same time, new and compelling use cases for SVC have been identified. While it is certainly important to develop coding tools targeted at high coding efficiency, the design of the interface between the core coding technologies and the system and transport is also of vital importance for the success of SVC. Only through this interface, and the novel mechanisms defined therein, can applications take advantage of the scalability features of the coded video signal. This paper provides an overview of the system interface features defined in the SVC specification. We discuss, amongst other features, bit stream structure, the extended network abstraction layer (NAL) unit header, and supplemental enhancement information (SEI) messages related to scalability information.

Journal ArticleDOI
TL;DR: Using the Quality Layers post-processing to evaluate and signal the impact on rate and distortion of the various enhancement information pieces, a significant gain is achieved: quality layers significantly outperform the basic standard extractor that was initially proposed in SVC.
Abstract: The concept of quality layers introduced in the scalable video coding (SVC) amendment of MPEG-4 AVC is presented. By using the quality-layers post-processing to evaluate and signal the impact on rate and distortion of the various pieces of enhancement information, a significant gain is achieved: quality layers significantly outperform the basic standard extractor that was initially proposed in SVC. For the standard set of test sequences, in a range of acceptable video quality, an average quality gain of up to 0.5 dB is achieved. Furthermore, the technique can be used for combined (spatial, temporal, and quality) scalability. Thanks to the signaling of this information in the header of the network abstraction layer units or in a supplemental enhancement information message, the adaptation can be performed with a simple parser, e.g., at the decoder side or in an intelligent network node designed for rate adaptation.
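The benefit of signaling per-unit rate-distortion impact is that a simple parser can keep the most useful enhancement packets first under a rate budget. The greedy sketch below illustrates that extraction principle; the tuple fields and the exact ordering rule are assumptions, not the standard's syntax.

```python
# Sketch of rate-distortion-ordered extraction enabled by quality-layer
# signaling: keep enhancement NAL units in order of distortion reduction
# per bit until the rate budget is exhausted.

def extract(units, rate_budget):
    """units: list of (name, bits, distortion_reduction) tuples for
    enhancement NAL units; base-layer units are assumed already included.
    Returns the names kept under the budget, most beneficial-per-bit first."""
    ordered = sorted(units, key=lambda u: u[2] / u[1], reverse=True)
    kept, used = [], 0
    for name, bits, _gain in ordered:
        if used + bits <= rate_budget:
            kept.append(name)
            used += bits
    return kept

selected = extract([("A", 100, 5.0), ("B", 50, 4.0), ("C", 200, 6.0)], 160)
```

Unit B (0.08 quality units per bit) is taken before A (0.05), and C (0.03) no longer fits, so the extractor emits B then A.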

Journal ArticleDOI
TL;DR: High-resolution digital holography and pattern projection techniques such as coded light or fringe projection for real-time extraction of 3D object positions and color information could manifest themselves as an alternative to traditional camera-based methods.
Abstract: Advances in image sensors and the evolution of digital computation are a strong stimulus for the development and implementation of sophisticated methods for capturing, processing and analysis of 3D data from dynamic scenes. Research on prospective time-varying 3D scene capture technologies is important for the upcoming 3DTV displays. Methods such as shape-from-texture, shape-from-shading, shape-from-focus, and shape-from-motion extraction can restore 3D shape information from single-camera data. The existing techniques for 3D extraction from single-camera video sequences are especially useful for conversion of the already available vast mono-view content to 3DTV systems. Scene-oriented single-camera methods such as human face reconstruction and facial motion analysis, body modeling and body motion tracking, and motion recognition solve a variety of tasks efficiently. 3D multicamera dynamic acquisition and reconstruction, their hardware specifics including calibration and synchronization, and software demands form another area of intensive research. Different classes of multiview stereo algorithms, such as those based on cost-function computing and optimization, fusing of multiple views, and feature-point reconstruction, are possible candidates for dynamic 3D reconstruction. High-resolution digital holography and pattern projection techniques such as coded light or fringe projection for real-time extraction of 3D object positions and color information could manifest themselves as an alternative to traditional camera-based methods. Apart from all of these approaches, there are also active imaging devices capable of 3D extraction, such as the 3D time-of-flight camera, which provides 3D image data of its environment by means of a modulated infrared light source.

Journal ArticleDOI
TL;DR: A regular spatial-domain filtering technique is proposed to compute the dominant edge strength (DES) and thereby reduce the number of candidate prediction modes; the proposed fast intra-prediction algorithm reduces computation by 40% with slight peak signal-to-noise ratio (PSNR) degradation.
Abstract: In this paper, we present a fast mode decision algorithm and design its VLSI architecture for H.264 intra-prediction. A regular spatial domain filtering technique is proposed to compute the dominant edge strength (DES) to reduce the possible predictive modes. Experimental results revealed that the proposed fast intra-algorithm reduces computation by 40% with slight peak signal-to-noise ratio (PSNR) degradation. The designed DES VLSI engine comprises a zigzag converter, a DES finite-state machine (FSM), and a DES core. The former two units handle memory allocation and control flow while the last performs pseudoblock computation, edge filtering, and dominant edge strength extraction. With a semicustom design fabricated in 0.18 μm CMOS single-poly-six-metal technology, the realized die size is roughly 0.15 × 0.15 mm² and can be operated at 66 MHz.
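The mode-reduction principle behind DES-style fast intra decision can be sketched as follows: estimate a block's dominant gradient with simple spatial filters and keep only the prediction directions aligned with it (plus DC). The filters, thresholds, and mode subsets below are hypothetical stand-ins, not the paper's exact DES computation.

```python
# Hypothetical sketch of edge-direction-based candidate reduction for
# H.264 4x4 intra prediction. A block whose samples change mostly across
# columns (vertical edge structure) is well served by Vertical prediction;
# one changing across rows by Horizontal prediction; a flat block by DC.

def candidate_modes(block):
    """block: 4x4 list of lists of luma samples. Returns a reduced set of
    candidate intra modes for rate-distortion testing."""
    gx = gy = 0.0
    for r in range(4):                      # horizontal intensity changes
        for c in range(3):
            gx += abs(block[r][c + 1] - block[r][c])
    for r in range(3):                      # vertical intensity changes
        for c in range(4):
            gy += abs(block[r + 1][c] - block[r][c])
    if gx + gy < 8:                         # nearly flat: DC suffices
        return {"DC"}
    if gx > 2 * gy:                         # columns differ, rows uniform
        return {"Vertical", "DC"}
    if gy > 2 * gx:                         # rows differ, columns uniform
        return {"Horizontal", "DC"}
    return {"Vertical", "Horizontal", "Diagonal", "DC"}
```

Pruning the nine 4x4 modes down to two or three candidates before rate-distortion testing is where the reported ~40% computation saving would come from in a scheme of this kind.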

Journal ArticleDOI
TL;DR: A new multiview video coding scheme that can improve the compression efficiency under such a limited inter-view prediction structure and adopt the following three modifications: object-based interpolation on 3-D warping; depth estimation with consideration of rate-distortion costs; and quarter-pel accuracy depth representation.
Abstract: Multiview video coding demands high compression rates as well as view scalability, which enables the video to be displayed on a multitude of different terminals. In order to achieve view scalability, it is necessary to limit the inter-view prediction structure. In this paper, we propose a new multiview video coding scheme that can improve the compression efficiency under such a limited inter-view prediction structure. All views are divided into two groups in the proposed scheme: base view and enhancement views. The proposed scheme first estimates a view-dependent geometry of the base view. It then uses a video encoder to encode the video of base view. The view-dependent geometry is also encoded by the video encoder. The scheme then generates prediction images of enhancement views from the decoded video and the view-dependent geometry by using image-based rendering techniques, and it makes residual signals for each enhancement view. Finally, it encodes residual signals by the conventional video encoder as if they were regular video signals. We implement one encoder that employs this scheme by using a depth map as the view-dependent geometry and 3-D warping as the view generation method. In order to increase the coding efficiency, we adopt the following three modifications: (1) object-based interpolation on 3-D warping; (2) depth estimation with consideration of rate-distortion costs; and (3) quarter-pel accuracy depth representation. Experiments show that the proposed scheme offers about 30% higher compression efficiency than the conventional scheme, even though one depth map video is added to the original multiview video.
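The 3-D warping step that generates enhancement-view predictions can be illustrated in one dimension: for a horizontally aligned camera pair, a base-view pixel at column x with disparity d (proportional to focal length times baseline over depth) appears in the neighboring view at column x − d. This toy sketch takes the disparity directly as input and omits the paper's object-based interpolation, rate-distortion-aware depth estimation, and quarter-pel accuracy.

```python
# Toy 1-D sketch of depth-based view prediction: warp one scanline of the
# base view into the enhancement view's geometry using per-pixel disparity.

def warp_row(base_row, disparity_row):
    """Predict an enhancement-view scanline from a base-view scanline.
    Positions no base pixel maps to (disocclusions) stay None; a real
    warper would interpolate or signal these as residual."""
    pred = [None] * len(base_row)
    for x, (v, d) in enumerate(zip(base_row, disparity_row)):
        tx = x - d
        if 0 <= tx < len(pred):
            pred[tx] = v   # later samples may overwrite earlier ones; a
                           # real warper resolves overlaps with a z-buffer
    return pred
```

The encoder would then code only the residual between this prediction and the true enhancement view, which is how one transmitted depth map can stand in for inter-view reference pictures.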

Journal ArticleDOI
TL;DR: Experimental results over a set of real-world sequences show that the proposed feature weighting procedure outperforms state-of-the-art solutions and that the proposed adaptive multifeature tracker improves the reliability of the target estimate while eliminating the need to manually select each feature's relevance.
Abstract: In this paper, we propose a tracking algorithm based on an adaptive multifeature statistical target model. The features are combined in a single particle filter by weighting their contributions using a novel reliability measure derived from the particle distribution in the state space. This measure estimates the reliability of the information by measuring the spatial uncertainty of features. A modified resampling strategy is also devised to account for the needs of the feature reliability estimation. We demonstrate the algorithm using color and orientation features. Color is described with partwise normalized histograms. Orientation is described with histograms of the gradient directions that represent the shape and the internal edges of a target. A feedback from the state estimation is used to align the orientation histograms as well as to adapt the scales of the filters to compute the gradient. Experimental results over a set of real-world sequences show that the proposed feature weighting procedure outperforms state-of-the-art solutions and that the proposed adaptive multifeature tracker improves the reliability of the target estimate while eliminating the need of manually selecting each feature's relevance.
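The fusion principle can be sketched numerically: score each feature's likelihood over the particle set, measure the spatial spread of the particles under that feature alone, and trust concentrated (low-spread) features more when combining. This only illustrates the idea of spatial-uncertainty-based reliability; the paper's actual reliability measure and resampling modifications differ.

```python
# Illustrative reliability-weighted fusion of two feature likelihoods in a
# particle filter. A feature whose likelihood concentrates the particles in
# a small spatial region is treated as more reliable.

def spatial_spread(positions, likelihoods):
    """Weighted variance of 1-D particle positions under one feature."""
    total = sum(likelihoods)
    w = [l / total for l in likelihoods]
    mean = sum(wi * p for wi, p in zip(w, positions))
    return sum(wi * (p - mean) ** 2 for wi, p in zip(w, positions))

def fused_weights(positions, color_lik, orient_lik):
    """Combine per-particle color and orientation likelihoods, weighting
    each feature by the inverse of its spatial uncertainty."""
    r_color = 1.0 / (1e-9 + spatial_spread(positions, color_lik))
    r_orient = 1.0 / (1e-9 + spatial_spread(positions, orient_lik))
    a = r_color / (r_color + r_orient)   # relative reliability of color
    return [c ** a * o ** (1 - a) for c, o in zip(color_lik, orient_lik)]
```

With a sharply peaked color likelihood and a flat orientation likelihood, the exponent a approaches 1 and the fused weights follow the color cue, which is the behavior that removes the need to hand-tune each feature's relevance.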

Journal ArticleDOI
TL;DR: This letter proposes a spatial variation to the traditional temporal framework that allows statistical motion detection with methods trained on one background frame instead of a series of frames as is usually the case.
Abstract: Most statistical background subtraction techniques are based on the analysis of temporal color/intensity distribution. However, learning statistics on a series of time frames can be problematic, especially when no frame absent of moving objects is available or when the available memory is not sufficient to store the series of frames needed for learning. In this letter, we propose a spatial variation to the traditional temporal framework. The proposed framework allows statistical motion detection with methods trained on one background frame instead of a series of frames as is usually the case. Our framework includes two spatial background subtraction approaches suitable for different applications. The first approach is meant for scenes having a nonstatic background due to noise, camera jitter or animation in the scene (e.g., waving trees, fluttering leaves). This approach models each pixel with two PDFs: one unimodal PDF and one multimodal PDF, both trained on one background frame. In this way, the method can handle backgrounds with static and nonstatic areas. The second spatial approach is designed to use as little processing time and memory as possible. Based on the assumption that neighboring pixels often share similar temporal distribution, this second approach models the background with one global mixture of Gaussians.
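The spatial substitution for temporal learning can be illustrated with a single-Gaussian version of the idea: estimate each pixel's mean and deviation from its spatial neighborhood in the one available background frame, then threshold new frames against those local statistics. Window size, threshold, and the variance floor below are illustrative choices, and the paper's unimodal/multimodal PDF pair is reduced here to the unimodal case.

```python
# Sketch of spatially trained background subtraction: per-pixel statistics
# come from a neighborhood of ONE background frame rather than from a
# temporal history of frames.

def foreground_mask(background, frame, half=1, k=2.5, min_std=2.0):
    """background, frame: 2-D lists of gray levels, same size.
    Returns a 2-D list of booleans (True = foreground pixel)."""
    h, w = len(background), len(background[0])
    mask = []
    for r in range(h):
        row = []
        for c in range(w):
            # Spatial neighborhood of the single background frame.
            vals = [background[rr][cc]
                    for rr in range(max(0, r - half), min(h, r + half + 1))
                    for cc in range(max(0, c - half), min(w, c + half + 1))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            std = max(var ** 0.5, min_std)   # floor for flat regions
            row.append(abs(frame[r][c] - mean) > k * std)
        mask.append(row)
    return mask
```

Because the statistics never reference past frames, this style of detector needs neither an object-free training sequence nor memory for a frame history, which is precisely the motivation stated in the abstract.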