
Showing papers in "Signal Processing-image Communication in 2005"


Journal ArticleDOI
TL;DR: A new perceptually-adaptive video coding (PVC) scheme for hybrid video compression is explored, in order to achieve better perceptual coding quality and operational efficiency and to integrate spatial masking factors with the nonlinear additivity model for masking (NAMM).
Abstract: We explore a new perceptually-adaptive video coding (PVC) scheme for hybrid video compression, in order to achieve better perceptual coding quality and operational efficiency. A new just noticeable distortion (JND) estimator for color video is first devised in the image domain. How to efficiently integrate masking effects together is a key issue of JND modelling. We integrate spatial masking factors with the nonlinear additivity model for masking (NAMM). The JND estimator applies to all color components and accounts for the compound impact of luminance masking, texture masking and temporal masking. Extensive subjective viewing confirms that it is capable of determining a more accurate visibility threshold that is close to the actual JND bound in human eyes. Secondly, the image-domain JND profile is incorporated into hybrid video encoding via a JND-adaptive motion estimation and residue filtering process. The scheme works with any prevalent video coding standard and various motion estimation strategies. To demonstrate the effectiveness of the proposed scheme, it has been implemented in the MPEG-2 TM5 coder and shown to achieve an average improvement of over 18% in motion estimation efficiency, 0.6 dB in average peak signal-to-perceptual-noise ratio (PSPNR) and, most remarkably, 0.17 dB on average in the objective coding quality measure (PSNR). A theoretical explanation is presented for the improvement in the objective coding quality measure. With the JND-based motion estimation and residue filtering process, hybrid video encoding becomes more efficient and the use of bits is optimized for visual quality.
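
As a rough illustration of the PSPNR idea mentioned in this abstract, the sketch below treats per-pixel error below a JND threshold as invisible and computes a peak ratio only from the supra-threshold error. This is a minimal numpy sketch under that assumption; the paper's exact JND model and weighting are not reproduced, and the flat JND map used here is hypothetical.

```python
import numpy as np

def pspnr(original, decoded, jnd, peak=255.0):
    """Toy peak signal-to-perceptual-noise ratio: per-pixel error below the
    JND threshold is treated as invisible and does not contribute."""
    err = np.abs(original.astype(float) - decoded.astype(float))
    perceptual_err = np.maximum(err - jnd, 0.0)   # only supra-threshold error counts
    mse = np.mean(perceptual_err ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Example with synthetic data
rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (64, 64)).astype(float)
deco = orig + rng.normal(0, 3, orig.shape)        # mild coding noise
jnd_map = np.full(orig.shape, 2.0)                # hypothetical flat JND of 2 grey levels
print("toy PSPNR:", round(pspnr(orig, deco, jnd_map), 2), "dB")
```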

305 citations


Journal ArticleDOI
TL;DR: The general estimation rule in the wavelet domain is derived to obtain the denoised coefficients from the noisy image based on the multivariate statistical theory and a parametric multivariate generalized Gaussian distribution model is defined which closely fits the sample distribution.
Abstract: Recently, a variety of efficient image denoising methods using wavelet transforms have been proposed by many researchers. In this paper, we derive the general estimation rule in the wavelet domain to obtain the denoised coefficients from the noisy image based on multivariate statistical theory. The multivariate distributions of the original clean image can be estimated empirically from a sample image set. We define a parametric multivariate generalized Gaussian distribution (MGGD) model which closely fits the sample distribution. The multivariate model makes it possible to exploit the dependency between the estimated wavelet coefficients and their neighbours or other coefficients in different subbands. Also, it can be shown that some of the existing methods based on statistical modeling are subsets of our multivariate approach. Our method achieves high-quality image denoising. Among the existing image denoising methods using the same type of wavelet (Daubechies 8) filter, our results produce the highest peak signal-to-noise ratio (PSNR).
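
One concrete instance of multivariate wavelet estimation of this kind is the well-known bivariate shrinkage rule of Şendur and Selesnick, which exploits the dependency between a coefficient and its parent. The sketch below applies that rule to synthetic sparse coefficients; it is only an illustration of the multivariate idea, not the authors' MGGD estimator, and the synthetic data and parameters are invented.

```python
import numpy as np

def bivariate_shrink(child, parent, sigma_n, sigma):
    """Bivariate shrinkage (Sendur-Selesnick): shrink a child coefficient using
    the joint magnitude of (child, parent), for additive Gaussian noise with
    std sigma_n and marginal signal std sigma."""
    mag = np.sqrt(child ** 2 + parent ** 2)
    gain = np.maximum(mag - np.sqrt(3.0) * sigma_n ** 2 / sigma, 0.0) / np.maximum(mag, 1e-12)
    return gain * child

# Synthetic example: sparse "clean" coefficients plus Gaussian noise
rng = np.random.default_rng(1)
clean = np.zeros(2000)
spikes = rng.random(2000) < 0.05
clean[spikes] = rng.normal(0, 30, spikes.sum())
parent = 0.7 * clean + rng.normal(0, 1, clean.shape)   # correlated coarser-scale coefficients
noisy = clean + rng.normal(0, 2, clean.shape)
den = bivariate_shrink(noisy, parent, sigma_n=2.0, sigma=clean.std())  # sigma known for this demo
print("noisy MSE:", round(float(np.mean((noisy - clean) ** 2)), 2),
      "denoised MSE:", round(float(np.mean((den - clean) ** 2)), 2))
```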

149 citations


Journal ArticleDOI
TL;DR: This work develops models for MD streaming over multiple paths and proposes a multi-path selection method that chooses a set of paths maximizing the overall quality at the client under various constraints, and considers the architecture and mechanisms by which multi-path streaming can be accomplished over a conventional IP network.
Abstract: Real-time media distribution over the Internet poses several challenging problems due to its stringent delay/loss requirements and complex network dynamics. A promising approach to alleviate the severe impacts of these dynamics is to transmit the media over diverse paths. For such an environment, multiple description (MD) coding has been previously proposed to produce multiple independently decodable streams that are routed over partially link–disjoint (non-shared) paths for combatting bursty packet losses and error propagation. However, selecting these paths appropriately is fundamental to the success of MD streaming and path diversity. Hence, in this paper we develop models for MD streaming over multiple paths and based on these models we propose a multi-path selection method that chooses a set of paths maximizing the overall quality at the client under various constraints. The simulation results with MPEG-2 show that sizeable average peak signal-to-noise ratio (PSNR) improvements (ranging from 0.73 to 6.07 dB) can be achieved when the source video is streamed over intelligently selected multiple paths as opposed to over the shortest path or maximally link–disjoint paths. In addition to the PSNR improvement, end-users experience a more continual, i.e., uninterrupted, streaming quality. Our work also considers the architecture and mechanisms by which multi-path streaming can be accomplished over a conventional IP network.
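
As a toy illustration of selecting a path set that maximizes expected quality, the sketch below scores every pair of candidate paths with a simple proxy: both descriptions are lost only when both paths drop a packet, and shared links are penalized because they correlate the losses. The candidate paths, loss rates and penalty are invented; the paper's actual model and constraints are more detailed.

```python
import itertools

# Hypothetical candidate paths: (id, loss_rate, set_of_links)
paths = [
    ("P1", 0.02, {"a", "b", "c"}),
    ("P2", 0.05, {"d", "e"}),
    ("P3", 0.03, {"a", "f", "g"}),
]

def pair_score(p, q):
    """Toy quality proxy: probability that both descriptions survive,
    with an ad hoc penalty for links shared by the two paths."""
    shared = len(p[2] & q[2])
    correlation_penalty = 0.01 * shared
    both_lost = p[1] * q[1] + correlation_penalty
    return 1.0 - both_lost          # higher is better

best = max(itertools.combinations(paths, 2), key=lambda pq: pair_score(*pq))
print("selected pair:", best[0][0], best[1][0], "score:", round(pair_score(*best), 4))
```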

86 citations


Journal ArticleDOI
TL;DR: A model is developed which captures both the impact of encoder quantization and of packet loss due to network congestion on the overall video quality and is confirmed by network simulations performed with different routing algorithms, latency requirements and encoding structures.
Abstract: The performance of low-latency video streaming with multipath routing over ad hoc networks is studied. As the available transmission rate of individual links in an ad hoc network is typically limited due to power and bandwidth constraints, a single node transmitting multimedia data may impact the overall network congestion and may therefore need to limit its rate while striving for the highest sustainable video quality. For this purpose, optimal routing algorithms which seek to minimize congestion by optimally distributing traffic over multiple paths are attractive. To predict the end-to-end rate-distortion tradeoff, we develop a model which captures both the impact of encoder quantization and of packet loss due to network congestion on the overall video quality. The validity of the model is confirmed by network simulations performed with different routing algorithms, latency requirements and encoding structures.
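
Models of this kind are often written as the sum of an encoder term that decays with rate and a channel term that grows with the packet-loss probability. The sketch below uses the common parametric form D(R) = D0 + theta/(R - R0) plus a linear loss term; this specific form and all numbers are assumptions for illustration, not the paper's fitted model.

```python
def end_to_end_distortion(rate_kbps, loss_prob,
                          d0=1.0, theta=1500.0, r0=20.0, kappa=60.0):
    """Toy end-to-end distortion (MSE): encoder term D0 + theta/(R - R0)
    plus a loss-induced term kappa * p."""
    assert rate_kbps > r0
    return d0 + theta / (rate_kbps - r0) + kappa * loss_prob

# Trade-off: pushing more traffic lowers quantization error but raises congestion loss
for rate, p in [(200, 0.01), (400, 0.03), (800, 0.10)]:
    print(f"rate={rate} kbps, loss={p:.2f} -> D={end_to_end_distortion(rate, p):.2f}")
```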

83 citations


Journal ArticleDOI
TL;DR: An algorithm for moving object and region detection in video compressed using a wavelet transform (WT) is developed, leading to a method and system that are computationally more efficient than existing motion estimation methods.
Abstract: In many surveillance systems the video is stored in wavelet compressed form. In this paper, an algorithm for moving object and region detection in video which is compressed using a wavelet transform (WT) is developed. The algorithm estimates the WT of the background scene from the WTs of the past image frames of the video. The WT of the current image is compared with the WT of the background, and the moving objects are determined from the difference. The algorithm does not perform an inverse WT to obtain the actual pixels of either the current image or the estimated background. This leads to a method and system that are computationally more efficient than existing motion estimation methods.
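
A minimal sketch of the principle of staying in the transform domain: a one-level Haar transform is computed per frame, the background's Haar coefficients are updated recursively, and motion is flagged where the current coefficients deviate from the background ones. The Haar filter, adaptation rate and threshold below are assumptions for illustration, not the authors' estimator.

```python
import numpy as np

def haar2d(img):
    """One-level 2D Haar transform (orthonormal) of an even-sized image."""
    a = (img[0::2, :] + img[1::2, :]) / np.sqrt(2)
    d = (img[0::2, :] - img[1::2, :]) / np.sqrt(2)
    ll = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    lh = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    hl = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2)
    hh = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

rng = np.random.default_rng(2)
scene = rng.normal(128, 5, (64, 64))
bg_coeffs = None
alpha = 0.95                                    # background adaptation rate
for t in range(10):
    frame = scene + rng.normal(0, 2, scene.shape)
    if t > 5:
        frame[20:30, 20:30] += 80.0             # a "moving object" appears
    coeffs = np.stack(haar2d(frame))
    bg_coeffs = coeffs if bg_coeffs is None else alpha * bg_coeffs + (1 - alpha) * coeffs
    motion = np.abs(coeffs - bg_coeffs).sum(axis=0) > 30.0   # ad hoc threshold
    print(f"frame {t}: flagged coefficient positions = {int(motion.sum())}")
```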

71 citations


Journal ArticleDOI
TL;DR: A new approach to color image denoising that takes an HVS model into consideration is presented, and significant improvement is reported in the experimental results in terms of perceptual error metrics and visual effect.
Abstract: Recent research in transform-based image denoising has focused on the wavelet transform due to its superior performance over other transforms. Performance is often measured solely in terms of PSNR, and denoising algorithms are optimized for this quantitative metric. The performance in terms of subjective quality is typically not evaluated. Moreover, the human visual system (HVS) is often not incorporated into the denoising algorithm. This paper presents a new approach to color image denoising that takes an HVS model into consideration. The denoising process takes place in the wavelet transform domain. A contrast sensitivity function (CSF) implementation is employed in the wavelet-domain subbands based on an invariant single-factor weighting, and noise masking is adopted in succession. Significant improvement is reported in the experimental results in terms of perceptual error metrics and visual effect.

56 citations


Journal ArticleDOI
TL;DR: This work considers a quantization approach to Gauss mixture design based on the information theoretic view of Gaussian sources as a “worst case” for robust signal compression and describes the quantizer mismatch distortion and its relation to other distortion measures including the traditional squared error, the Kullback–Leibler and minimum discrimination information, and the log-likelihood distortions.
Abstract: Gauss mixtures have gained popularity in statistics and statistical signal processing applications for a variety of reasons, including their ability to well approximate a large class of interesting densities and the availability of algorithms such as the Baum–Welch or expectation-maximization (EM) algorithm for constructing the models based on observed data. We here consider a quantization approach to Gauss mixture design based on the information theoretic view of Gaussian sources as a “worst case” for robust signal compression. Results in high-rate quantization theory suggest distortion measures suitable for Lloyd clustering of Gaussian components based on a training set of data. The approach provides a Gauss mixture model and an associated Gauss mixture vector quantizer which is locally robust. We describe the quantizer mismatch distortion and its relation to other distortion measures including the traditional squared error, the Kullback–Leibler (relative entropy) and minimum discrimination information, and the log-likelihood distortions. The resulting Lloyd clustering algorithm is demonstrated by applications to image vector quantization, texture classification, and North Atlantic pipeline image classification.
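
A minimal sketch of Lloyd-style clustering of Gaussian components under a log-likelihood-type distortion: each training vector is assigned to the codebook Gaussian with the smallest negative log-likelihood, and the component means and covariances are then refit on their cells. The distortion measures analyzed in the paper (quantizer mismatch, minimum discrimination information) are more refined than this toy version, and the synthetic data are invented.

```python
import numpy as np

def neg_log_likelihood(x, mu, cov):
    """Per-vector negative log-likelihood under a Gaussian (up to a constant)."""
    diff = x - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (logdet + np.einsum("ij,jk,ik->i", diff, inv, diff))

def lloyd_gauss_mixture(data, k=3, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    mus = data[rng.choice(len(data), k, replace=False)].copy()
    covs = np.stack([np.cov(data.T) for _ in range(k)])
    for _ in range(iters):
        # nearest-neighbour step under the log-likelihood distortion
        d = np.stack([neg_log_likelihood(data, mus[j], covs[j]) for j in range(k)])
        labels = d.argmin(axis=0)
        # centroid step: refit each Gaussian on its cell
        for j in range(k):
            cell = data[labels == j]
            if len(cell) > 2:
                mus[j] = cell.mean(axis=0)
                covs[j] = np.cov(cell.T) + 1e-6 * np.eye(data.shape[1])
    return mus, covs, labels

rng = np.random.default_rng(3)
blocks = np.vstack([rng.normal(m, 1.0, (200, 2)) for m in (0, 5, 10)])
mus, covs, labels = lloyd_gauss_mixture(blocks, k=3)
print("component means:\n", np.round(mus, 2))
```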

56 citations


Journal ArticleDOI
TL;DR: The system presented in this paper applies concepts derived from computational intelligence, and supports an objective quality-assessment method based on a circular back-propagation (CBP) neural model, which renders quite accurately the image quality perceived by human assessors.
Abstract: Considerable research effort is being devoted to the development of image-enhancement algorithms, which improve the quality of displayed digital pictures. Reliable methods for measuring perceived image quality are needed to evaluate the performances of those algorithms, and such measurements require a univariant (i.e., no-reference) approach. The system presented in this paper applies concepts derived from computational intelligence, and supports an objective quality-assessment method based on a circular back-propagation (CBP) neural model. The network is trained to predict quality ratings, as scored by human assessors, from numerical features that characterize images. As such, the method aims at reproducing perceived image quality, rather than defining a comprehensive model of the human visual system. The connectionist approach allows one to decouple the task of feature selection from the consequent mapping of features into an objective quality score. Experimental results on the perceptual effects of a family of contrast-enhancement algorithms confirm the method's effectiveness, as the system renders quite accurately the image quality perceived by human assessors.

52 citations


Journal ArticleDOI
TL;DR: The video frame-dependent watermark (VFDW) is presented, and extensive experimental results verify the excellent performance of the proposed compressed video watermarking system in addressing the issues of real-time detection, bit-rate control, and resistance to watermark estimation attacks.
Abstract: Digital watermarking is a helpful technology for providing copyright protection for valuable multimedia data. In particular, video watermarking deals with several issues, which are unique to various types of media watermarking. In this paper, these issues, including compressed domain watermarking, real-time detection, bit-rate control, and resistance to watermark estimation attacks, will be addressed. Since video sequences are usually compressed before they are transmitted over networks, we first describe how watermark signals can be embedded into compressed video while keeping the desired bit-rate nearly unchanged. In the embedding process, our algorithm is designed to operate directly in the variable length codeword (VLC) domain to satisfy the requirement of real-time detection. We describe how suitable positions in the VLC domain can be selected for embedding transparent watermarks. Second, in addition to typical attacks, the peculiar attacks that video sequences encounter are investigated. In particular, in order to deal with both collusion and copy attacks that are fatal to video watermarking, the video frame-dependent watermark (VFDW) is presented. Extensive experimental results verify the excellent performance of the proposed compressed video watermarking system in addressing the aforementioned issues.

49 citations


Journal ArticleDOI
TL;DR: It is shown that global, constant-velocity, translational motion in an image sequence induces spectral occupancy planes in the DCT domain, similarly to the FT domain; however, these planes are subject to spectral folding.
Abstract: Global, constant-velocity, translational motion in an image sequence induces a characteristic energy footprint in the Fourier-transform (FT) domain; the spectrum is limited to a plane with orientation defined by the direction of motion. By detecting these spectral occupancy planes, methods have been proposed to estimate such global motion. Since the discrete cosine transform (DCT) is a ubiquitous tool of all video compression standards to date, we investigate in this paper properties of motion in the DCT domain. We show that global, constant-velocity, translational motion in an image sequence induces spectral occupancy planes in the DCT domain, similarly to the FT domain. Unlike in the FT case, however, these planes are subject to spectral folding. Based on this analysis, we propose a motion estimation method in the DCT domain, and we show that results comparable to standard block matching can be obtained. Moreover, by realizing that significant energy in the DCT domain concentrates around a folded plane, we propose a new approach to video compression. The approach is based on a 3D DCT applied to a group of frames, followed by motion-adaptive scanning of DCT coefficients (akin to “zig-zag” scanning in MPEG coders), their adaptive quantization, and final entropy coding. We discuss the design of the complete 3D DCT coder and carry out a performance comparison of the new coder with ubiquitous hybrid coders.

48 citations


Journal ArticleDOI
TL;DR: A 3D reconstruction algorithm for a stereo image pair is proposed for realizing mutual occlusion and interaction between the real and virtual worlds in image synthesis, and the reconstructed 3D model produces a natural space in which the real world and virtual objects interact with each other as if they were in the same world.
Abstract: A 3D reconstruction algorithm for a stereo image pair is proposed for realizing mutual occlusion and interaction between the real and virtual worlds in image synthesis. A two-stage algorithm, consisting of disparity estimation and regularization, is used to locate a smooth and precise disparity vector field. The hierarchical disparity estimation technique increases the efficiency and reliability of the estimation process, and edge-preserving disparity field regularization produces smooth disparity fields while preserving discontinuities that result from object boundaries. Depth information concerning the real scene is then recovered from the estimated disparity fields by stereo camera geometry. Simulation results show that the proposed algorithm provides accurate and spatially correlated disparity vector fields in various types of images, and the reconstructed 3D model produces a natural space in which the real world and virtual objects interact with each other as if they were in the same world.

Journal ArticleDOI
TL;DR: The proposed framework enables the adaptation of the coding process to the video content, network and end-device characteristics, allows for enhanced scalability, content-adaptivity and reduced delay, while improving the coding efficiency as compared to state-of-the-art motion-compensated wavelet video coders.
Abstract: We introduce an efficient and flexible framework for temporal filtering in wavelet-based scalable video codecs called unconstrained motion compensated temporal filtering (UMCTF). UMCTF allows for the use of different filters and temporal decomposition structures through a set of controlling parameters that may be easily modified during the coding process, at different granularities and levels. The proposed framework enables the adaptation of the coding process to the video content, network and end-device characteristics, allows for enhanced scalability, content-adaptivity and reduced delay, while improving the coding efficiency as compared to state-of-the-art motion-compensated wavelet video coders. Additionally, a mechanism for the control of the distortion variation in video coding based on UMCTF employing only the predict step is proposed. The control mechanism is formulated by expressing the distortion in an arbitrary decoded frame, at any temporal level in the pyramid, as a function of the distortions in the reference frames at the same temporal level. All the different scenarios proposed in the paper are experimentally validated through a coding scheme that incorporates advanced features (such as rate-distortion optimized variable block-size multihypothesis prediction and overlapped block motion compensation). Experiments are carried out to determine the relative efficiency of different UMCTF instantiations, as well as to compare against the current state-of-the-art in video coding.

Journal ArticleDOI
TL;DR: A framework of joint source-channel coding and power adaptation is presented, where error resilient source coding, channel coding, and transmission power adaptation are jointly designed to optimize video quality given constraints on the total transmission energy and delay for each video frame.
Abstract: We consider efficiently transmitting video over a hybrid wireless/wire-line network by optimally allocating resources across multiple protocol layers. Specifically, we present a framework of joint source-channel coding and power adaptation, where error resilient source coding, channel coding, and transmission power adaptation are jointly designed to optimize video quality given constraints on the total transmission energy and delay for each video frame. In particular, we consider the combination of two types of channel coding—inter-packet coding (at the transport layer) to provide protection against packet dropping in the wire-line network and intra-packet coding (at the link layer) to provide protection against bit errors in the wireless link. In both cases, we allow the coding rate to be adaptive to provide unequal error protection at both the packet and frame level. In addition to both types of channel coding, we also compensate for channel errors by adapting the transmission power used to send each packet. An efficient algorithm based on Lagrangian relaxation and the method of alternating variables is proposed to solve the resulting optimization problem. Simulation results are shown to illustrate the advantages of joint optimization across multiple layers.
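
The Lagrangian idea behind such joint allocation can be illustrated with a toy per-packet choice: each combination of channel-code strength and transmit power has an expected distortion and an energy cost, and for a given multiplier lambda the option minimizing D + lambda*E is picked; lambda is then adjusted (here by bisection) until the energy budget is met. The option labels and all numbers below are invented for illustration and are not the paper's formulation.

```python
# Hypothetical per-packet options: (label, expected_distortion, energy_cost)
options = [("weak FEC / low power", 40.0, 1.0),
           ("weak FEC / high power", 25.0, 2.5),
           ("strong FEC / low power", 22.0, 2.0),
           ("strong FEC / high power", 15.0, 4.0)]

def pick(lmbda):
    """Choose the option minimizing the Lagrangian cost D + lambda * E."""
    return min(options, key=lambda o: o[1] + lmbda * o[2])

def allocate(energy_budget, lo=0.0, hi=100.0, iters=50):
    """Bisect on lambda until the chosen option fits the energy budget."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pick(mid)[2] > energy_budget:
            lo = mid          # too expensive: penalise energy more
        else:
            hi = mid
    return pick(hi)

print("budget 2.2 ->", allocate(2.2)[0])
print("budget 5.0 ->", allocate(5.0)[0])
```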

Journal ArticleDOI
TL;DR: A system for the three-dimensional (3D) reconstruction of an underwater environment on the basis of multiple range views from an acoustical camera to improve the understanding of a human operator driving an underwater Remotely Operated Vehicle.
Abstract: This paper presents a system for the three-dimensional (3D) reconstruction of an underwater environment on the basis of multiple range views from an acoustical camera. The challenge is to provide the reconstruction on-line, as the range views are obtained from the sensor. The final target of the work is to improve the understanding of a human operator driving an underwater Remotely Operated Vehicle. The acoustic camera provides a sequence of 3D images in real time. Data must be registered and fused to generate a unique 3D mosaic in the form of a triangle mesh, which is rendered through a graphical interface. Available technologies for registration and meshing have been modified and extended to match time constraints. Some experiments on real data are reported.

Journal ArticleDOI
TL;DR: This paper analyzes and optimize the impact of network-embedded FEC (NEF) in overlay and p2p multimedia multicast networks and develops an optimization algorithm for the placement of NEF codecs within random multicast trees.
Abstract: Forward error correction (FEC) schemes have been proposed and used successfully for multicasting realtime video content to groups of users. Under traditional IP multicast, application-level FEC can only be implemented on an end-to-end basis between the sender and the clients. Emerging overlay and peer-to-peer (p2p) networks open the door for new paradigms of network FEC. The deployment of FEC within these emerging networks has received very little attention (if any). In this paper, we analyze and optimize the impact of network-embedded FEC (NEF) in overlay and p2p multimedia multicast networks. Under NEF, we place FEC codecs in selected intermediate nodes of a multicast tree. The NEF codecs detect and recover lost packets within FEC blocks at earlier stages before these blocks arrive at deeper intermediate nodes or at the final leaf nodes. This approach significantly reduces the probability of receiving undecodable FEC blocks. In essence, the proposed NEF codecs work as signal regenerators in a communication system and can reconstruct most of the lost data packets without requiring retransmission. We develop an optimization algorithm for the placement of NEF codecs within random multicast trees. Based on extensive H.264 video simulations, we show that this approach provides significant improvements in video quality, both visually and in terms of PSNR values.
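
The benefit of placing an FEC codec at an intermediate node can be illustrated with a simple two-hop loss model for an (n, k) block code: end-to-end, a block is undecodable if more than n-k packets are lost over the combined path, whereas with regeneration at the midpoint each hop only has to stay within the n-k budget on its own. The code parameters and loss rates below are purely illustrative, and independent per-packet loss is assumed.

```python
from math import comb

def block_loss_prob(n, k, p):
    """Probability that an (n, k) FEC block cannot be decoded when each packet
    is lost independently with probability p (i.e., more than n-k losses)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n - k + 1, n + 1))

n, k = 20, 16            # hypothetical code: 16 data + 4 parity packets
p_hop = 0.03             # per-hop packet loss probability
p_two_hops = 1 - (1 - p_hop) ** 2        # loss seen end-to-end over two hops

end_to_end = block_loss_prob(n, k, p_two_hops)
with_nef = 1 - (1 - block_loss_prob(n, k, p_hop)) ** 2   # decode/re-encode at the midpoint
print(f"undecodable blocks, end-to-end FEC : {end_to_end:.4%}")
print(f"undecodable blocks, FEC regenerated at midpoint: {with_nef:.4%}")
```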

Journal ArticleDOI
TL;DR: A separate edge-preserving regularization scheme to calculate disparity fields for a stereoscopic image pair and a joint disparity and motion estimation algorithm for stereoscopic video sequences are presented; results are compared with existing algorithms and the superior performance of the proposed methods is confirmed.
Abstract: In this paper, we present a separate edge-preserving regularization scheme to calculate disparity fields for a stereoscopic image pair and a joint disparity and motion estimation algorithm for stereoscopic video sequences. We aim at using the block-based joint estimation algorithm to calculate the displacement fields for stereoscopic and multiview video coding. In the proposed separate regularization scheme, an edge-preserving cost function is proposed for matching, in which the Sobel edge values are incorporated as edge-preserving weights. The optimal Lagrange multiplier is determined using the convex hull bisection algorithm under rate-distortion theory. A fast algorithm is then proposed in which the textured regions and the homogeneous regions of the images are identified and regularized differently. In the joint regularization scheme, we calculate the two motion fields and the two disparity fields for two successive image pairs simultaneously. The four fields are regularized iteratively under the stereo-motion consistency constraint. Results are compared with existing algorithms and the superior performance of the proposed methods is confirmed.

Journal ArticleDOI
TL;DR: A new knowledge-based predictive approach based on estimating the Mahalanobis distance between test sample feature values and the corresponding probability distribution function from training data that selectively triggers classifiers is proposed.
Abstract: In this paper we propose a ‘bank of classifiers’ approach to image region labelling and evaluate dynamic classifier selection and classifier combination approaches against a baseline approach that works with a single best classifier chosen using a validation set. In this analysis, image segmentation, feature extraction, and classification are treated as three separate steps of analysis. The classifiers used are each trained with a different texture feature representation of the training images. The paper proposes a new knowledge-based predictive approach based on estimating the Mahalanobis distance between test sample feature values and the corresponding probability distribution function from training data that selectively triggers classifiers. This approach is shown to perform better than both probability-based classifier combination (all classifiers are triggered but their decisions are fused with combination rules) and the single-classifier baseline, based on classification rates and confusion matrices. The experiments are performed on a natural scene analysis application.
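
A minimal sketch of the triggering idea: for each classifier's feature representation, the Mahalanobis distance between the test sample and the training distribution is computed, and only classifiers whose distance falls below a threshold are triggered. The Gaussian summary of the training data, the feature-set names ("gabor", "glcm") and the threshold are hypothetical choices for illustration.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

rng = np.random.default_rng(4)
# Hypothetical training feature sets, one per texture-feature representation
train_sets = {"gabor": rng.normal(0, 1, (500, 3)),
              "glcm": rng.normal(6, 1, (500, 3))}
stats = {name: (f.mean(axis=0), np.cov(f.T)) for name, f in train_sets.items()}

test_sample = np.array([0.2, -0.1, 0.4])
threshold = 3.0                                    # ad hoc triggering threshold
triggered = [name for name, (mu, cov) in stats.items()
             if mahalanobis(test_sample, mu, cov) < threshold]
print("classifiers triggered for this sample:", triggered)
```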

Journal ArticleDOI
TL;DR: A protocol for a completely distributed implementation has been developed and tested on a prototype system extending a point-to-point video phone to a multipoint one and several performance results are presented.
Abstract: A peer-to-peer architecture for multipoint videoconferencing is presented. Each conference participant may have asymmetric and dissimilar bandwidth connections to the Internet. The solution does not require additional hardware, as in multipoint control units, or network infrastructure support such as multicast. Without creating any additional demand on the networking and computing resources needed for a point-to-point videoconference, this architecture can extend it into a multipoint one. A protocol for a completely distributed implementation has been developed and tested on a prototype system extending a point-to-point video phone to a multipoint one. The architecture of the prototype system along with the details of the protocol optimization is discussed. Several performance results are presented.

Journal ArticleDOI
TL;DR: This paper compares different continuous isotropic nonlinear and anisotropic diffusion processes from the literature with a process especially designed for image sequence denoising for motion estimation, and shows the superior behavior of this process.
Abstract: In this paper, we combine 3D anisotropic diffusion and motion estimation for image denoising and improvement of motion estimation. We compare different continuous isotropic nonlinear and anisotropic diffusion processes from the literature with a process especially designed for image sequence denoising for motion estimation. All of these processes initially improve motion estimation due to the reduction of noise and high frequencies. But while the well-known processes rapidly destroy or hallucinate motion information, the process brought forward here shows considerably less information loss or violation, even at motion boundaries. We show the superior behavior of this process. Further, we compare the performance of a standard finite difference diffusion scheme with several schemes using derivative filters optimized for rotation invariance. Using the discrete scheme with the least smoothing artifacts, we demonstrate the denoising capabilities of this approach. We exploit the motion estimation to derive an automatic stopping criterion.
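
As a point of reference for the isotropic nonlinear baselines compared in this paper, the sketch below runs a few explicit Perona-Malik diffusion steps on a noisy 2D frame with a step edge; the sequence-adapted anisotropic process proposed by the authors is not reproduced here, and the diffusivity parameter and step count are illustrative.

```python
import numpy as np

def perona_malik_step(u, kappa=10.0, dt=0.2):
    """One explicit Perona-Malik diffusion step with the exponential
    diffusivity g(d) = exp(-(d/kappa)^2), periodic boundaries via np.roll."""
    dn = np.roll(u, -1, 0) - u
    ds = np.roll(u, 1, 0) - u
    de = np.roll(u, -1, 1) - u
    dw = np.roll(u, 1, 1) - u
    g = lambda d: np.exp(-(d / kappa) ** 2)
    return u + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)

rng = np.random.default_rng(5)
frame = np.zeros((64, 64)); frame[:, 32:] = 100.0          # a step edge
noisy = frame + rng.normal(0, 10, frame.shape)
u = noisy.copy()
for _ in range(20):
    u = perona_malik_step(u)
print("residual std before:", round((noisy - frame).std(), 2),
      "after diffusion:", round((u - frame).std(), 2))
```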

Journal ArticleDOI
TL;DR: It is shown how the proposed transform can be applied to the problems of image coding, noise reduction and image fusion; among the advantages of this scheme is that both analysis and synthesis operators are Gaussian derivatives.
Abstract: In this work, a multi-channel model for image representation is derived based on scale-space theory. This model is inspired by biological insights and includes some important properties of human vision such as the Gaussian derivative model for early vision proposed by Young [The Gaussian derivative theory of spatial vision: analysis of cortical cell receptive field line-weighting profiles, General Motors Res. Labs. Rep. 4920, 1986]. The image transform that we propose in this work uses analysis operators similar to those of the Hermite transform at multiple scales, but the synthesis scheme of our approach integrates the responses of all channels at different scales. The advantages of this scheme are: (1) Both analysis and synthesis operators are Gaussian derivatives. This allows for simplicity during implementation. (2) The operator functions possess better space-frequency localization, and it is possible to separate adjacent scales one octave apart, according to Wilson's results on human vision channels [H.R. Wilson, J.R. Bergen, A four mechanism model for spatial vision. Vision Res. 19 (1979) 19–32]. (3) In the case of two-dimensional (2-D) signals, it is easy to analyze local orientations at different scales. A discrete approximation is also derived from an asymptotic relation between the Gaussian derivatives and the discrete binomial filters. We show in this work how the proposed transform can be applied to the problems of image coding, noise reduction and image fusion. Practical considerations are also of concern.

Journal ArticleDOI
TL;DR: An efficient algorithm of node selecting in the binary partition tree is proposed for the final face segmentation, which can exactly segment the faces without any underlying assumption.
Abstract: This paper presents an efficient face segmentation algorithm based on binary partition tree. Skin-like regions are first obtained by integrating the results of pixel classification and watershed segmentation. Facial features are extracted by the techniques of valley detection and entropic thresholding, and are used to refine the skin-like regions. In order to segment the facial regions from the skin-like regions, a novel region merging algorithm is proposed by considering the impact of the common border ratio between adjacent regions, and the binary partition tree is used to represent the whole region merging process. Then the facial likeness of each node in the binary partition tree is evaluated using a set of fuzzy membership functions devised for a number of facial primitives of geometrical, elliptical and facial features. Finally, an efficient algorithm of node selecting in the binary partition tree is proposed for the final face segmentation, which can exactly segment the faces without any underlying assumption. The performance of the proposed face segmentation algorithm is demonstrated by experimental results carried out on a variety of images in different scenarios.

Journal ArticleDOI
TL;DR: This paper presents the construction of adaptive wavelets by means of an extension of the lifting scheme and shows that these adaptive schemes yield lower entropies than schemes with fixed update filters, a property that is highly relevant in the context of compression.
Abstract: Over the past few years, wavelets have become extremely popular in signal and image processing applications. The classical linear wavelet transform, however, performs a homogeneous smoothing of the signal contents which, in some cases, is not desirable. This has led to a growing interest in (nonlinear) wavelet representations that can preserve discontinuities, such as transitions and edges. In this paper, we present the construction of adaptive wavelets by means of an extension of the lifting scheme. The basic idea is to choose the update filters according to some decision criterion which depends on the local characteristics of the input signal. We show that these adaptive schemes yield lower entropies than schemes with fixed update filters, a property that is highly relevant in the context of compression. Moreover, we analyze the effect of a scalar uniform quantization and the stability in such adaptive wavelet decompositions.
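
A toy one-dimensional version of the adaptive-update idea: the signal is split into even and odd samples, a Haar-like detail is computed, and the update filter is applied only where the local detail is small, so that large discontinuities are not smoothed into the coarse band. The decision criterion and filters in the paper are more general; here the decision map is simply passed to the synthesis side, whereas real adaptive lifting schemes recover it from the transformed data.

```python
import numpy as np

def adaptive_lifting_analysis(x, threshold=10.0):
    """One level of a toy adaptive lifting transform.
    Predict step: Haar detail d = odd - even.
    Update step: applied only where |d| is small (smooth region)."""
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    d = odd - even                              # predict (fixed)
    smooth = np.abs(d) < threshold              # decision map
    s = even + np.where(smooth, d / 2.0, 0.0)   # adaptive update
    return s, d, smooth

def adaptive_lifting_synthesis(s, d, smooth):
    even = s - np.where(smooth, d / 2.0, 0.0)
    odd = even + d
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = even, odd
    return x

x = np.concatenate([np.full(15, 10.0), np.full(17, 200.0)])   # piecewise-constant signal
s, d, m = adaptive_lifting_analysis(x)
print("perfect reconstruction:", np.allclose(adaptive_lifting_synthesis(s, d, m), x))
```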

Journal ArticleDOI
Ju-Hyun Cho, Seong-Dae Kim
TL;DR: A unified framework for background subtraction is proposed, together with an algorithm using spatio-temporal thresholding and a truncated variable adaptation rate (TVAR) for object detection and background adaptation, respectively.
Abstract: Object detection in image sequences has a very important role in many applications such as surveillance systems, tracking and recognition systems, coding systems and so on. This paper proposes a unified framework for background subtraction, which is a very popular approach for object detection in image sequences. We also propose an algorithm using spatio-temporal thresholding and a truncated variable adaptation rate (TVAR) for object detection and background adaptation, respectively. In particular, when the camera moves and zooms in on something to track the target, we generate a multi-resolution mosaic made up of many background mosaics with different resolutions, and use it for object detection. Experimental results in various environments show that the average performance of the proposed algorithm is good.
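
A minimal sketch of running-average background adaptation with a truncated (clipped) adaptation step, so that large frame differences, which likely belong to foreground objects, are absorbed into the background only slowly. The clipping constant, base rate and detection threshold are assumptions for illustration; the paper's TVAR rule and spatio-temporal thresholding are not reproduced.

```python
import numpy as np

def update_background(bg, frame, base_rate=0.05, max_step=2.0):
    """Running-average background update with a truncated adaptation step:
    the per-pixel change of the background model is clipped to +/- max_step."""
    step = base_rate * (frame - bg)
    return bg + np.clip(step, -max_step, max_step)

rng = np.random.default_rng(6)
bg = np.full((48, 48), 100.0)
for t in range(30):
    frame = 100.0 + rng.normal(0, 2, bg.shape)
    frame[10:20, 10:20] = 250.0                  # bright foreground object
    detected = np.abs(frame - bg) > 25.0         # simple per-pixel threshold
    bg = update_background(bg, frame)
print("foreground pixels still detected after 30 frames:", int(detected.sum()))
```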

Journal ArticleDOI
TL;DR: The evaluation supports the claim that loglets are preferable to other designs and it is demonstrated that the loglet approach outperforms a Gaussian derivative approach in resolution and robustness.
Abstract: The paper discusses which properties of filter sets used in local structure estimation are the most important. Answers are provided via the introduction of a number of fundamental invariances. Mathematical formulations corresponding to the required invariances lead to the introduction of a new class of filter sets termed loglets. Loglets are polar separable and have excellent uncertainty properties. The directional part uses a spherical harmonics basis. Using loglets, it is shown how the concepts of quadrature and phase can be defined in n dimensions. It is also shown how a reliable measure of the certainty of the estimate can be obtained by finding the deviation from the signal model manifold. Local structure analysis algorithms are quite complex and involve a lot more than the filters used. This makes comparisons difficult to interpret from a filter point of view. To reduce the number of ‘free’ parameters and target the filter design aspects, a number of simple 2D experiments have been carried out. The evaluation supports the claim that loglets are preferable to other designs. In particular, it is demonstrated that the loglet approach outperforms a Gaussian derivative approach in resolution and robustness.

Journal ArticleDOI
TL;DR: Experiments confirm that by using a scalable MVC, lower bit-rates can be attained without sacrificing motion-estimation efficiency and that the overall coding performance at low rates is significantly improved by a better distribution of the available rate between texture and motion information.
Abstract: Modern video coding applications require data transmission over variable-bandwidth wired and wireless network channels to a variety of terminals, possibly having different screen resolutions and available computing power. Scalable video coding technology is needed to optimally support these applications. Recently proposed wavelet-based video codecs employing spatial-domain motion-compensated temporal filtering (SDMCTF) provide quality, resolution and frame-rate scalability while delivering compression performance comparable to that of H.264, the state-of-the-art in single-layer video coding. These codecs require quality-scalable coding of the motion vectors to support a large range of bit-rates with optimal compression efficiency. In this paper, the practical use of prediction-based scalable motion-vector coding in the context of scalable SDMCTF-based video coding is investigated. Extensive experimental results demonstrate that, irrespective of the employed motion model, our prediction-based scalable motion-vector codec (MVC) systematically outperforms state-of-the-art wavelet-based solutions for both lossy and lossless compression. A new rate-distortion optimized rate-allocation strategy is proposed, capable of optimally distributing the available bit-budget between the different frames and between the texture and motion information, making the integration of the scalable MVC into a scalable video codec possible. This rate-allocation scheme systematically outperforms heuristic approaches previously employed in the literature. Experiments confirm that by using a scalable MVC, lower bit-rates can be attained without sacrificing motion-estimation efficiency and that the overall coding performance at low rates is significantly improved by a better distribution of the available rate between texture and motion information. The only downside of scalable motion-vector coding is a slight performance loss incurred at high bit-rates.

Journal ArticleDOI
TL;DR: A novel overlay multi-hop forward error correction (OM-FEC) scheme that provides FEC encoding/decoding capabilities at intermediate nodes in the overlay path that can outperform a pure end-to-end strategy, and can be much more efficient than a heavyweight hop-by-hop strategy.
Abstract: Overlay networks offer promising capabilities for video streaming, due to their support for application-layer processing at the overlay forwarding nodes. In this paper, we focus on the problem of providing lightweight support at selected intermediate overlay forwarding nodes to achieve increased error resilience on a single overlay path for video streaming. We propose a novel overlay multi-hop forward error correction (OM-FEC) scheme that provides FEC encoding/decoding capabilities at intermediate nodes in the overlay path. Based on the network conditions, the end-to-end overlay path is partitioned into segments, and appropriate FEC codes are applied over those segments. Architecturally, this flexible design lies between the end-to-end and hop-by-hop paradigms, and we argue that it is well suited to peer-based overlay networks. We evaluate our work by both simulations and controlled PlanetLab network experiments. These evaluations show that OM-FEC can outperform a pure end-to-end strategy by up to 10–15 dB in terms of video peak signal-to-noise ratio (PSNR), and can be much more efficient than a heavyweight hop-by-hop strategy.

Journal ArticleDOI
TL;DR: An on-going VideoGIS project is presented, in which scalable geo-referenced video and geographic information (GI) are transmitted to GPS-guided vehicles, and the hypermedia, which contains cross-referenced video and GI, is organized in a scalable (layered) fashion.
Abstract: A VideoGIS system aims at combining geo-referenced video information with traditional geographic information in order to provide a more comprehensive understanding of a spatial location. Video data have been used with geographic information in some projects to facilitate a better understanding of the spatial objects of interest. This paper presents an on-going VideoGIS project, in which scalable geo-referenced video and geographic information (GI) are transmitted to GPS-guided vehicles. The hypermedia, which contains cross-referenced video and GI, is organized in a scalable (layered) fashion. The remote users can request, through 3G mobile devices, the abundant information related to the objects of interest, while adapting to heterogeneous network conditions and local CPU usage. An available-bandwidth estimation technique is used in the adaptive video transmission.

Journal ArticleDOI
TL;DR: An efficient algorithm is proposed to reduce the computational complexity of variable-size block-matching motion estimation, which uses multiple candidate motion vectors obtained from different block-sizes to avoid being trapped in local minima.
Abstract: In this paper, an efficient algorithm is proposed to reduce the computational complexity of variable-size block-matching motion estimation. We first investigate features of multiple candidate search centers, adaptive initial block-sizes, search patterns, and search step-sizes, to match different motion characteristics and block-sizes. To avoid being trapped in local minima, the proposed algorithm uses multiple candidate motion vectors, which are obtained from different block-sizes. To further reduce the computation cost, a threshold-based early-stop strategy according to the quantization parameter is suggested. With adaptive initial block-sizes, a merge-or-skip strategy is also proposed to reduce the computation for the final block-size decision. For the H.264/AVC encoder, simulations show that the proposed algorithms can speed up the original JM v6.1d encoder, which uses fast full search for all block-sizes, by about 2.6–3.9 times, while maintaining comparable rate-distortion performance.
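
A toy illustration of the threshold-based early stop: block-matching SAD is computed over candidate displacements, and the search terminates as soon as a candidate falls below a threshold (in the paper the threshold depends on the quantization parameter). Candidate ordering, block-size adaptation and the merge-or-skip logic are omitted, and the threshold and test data below are invented.

```python
import numpy as np

def early_stop_search(cur_block, ref, top, left, candidates, threshold):
    """Return the first candidate motion vector whose SAD drops below the
    threshold; otherwise return the best one examined."""
    best_mv, best_sad = (0, 0), float("inf")
    h, w = cur_block.shape
    for dy, dx in candidates:
        y, x = top + dy, left + dx
        if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
            continue
        sad = np.abs(cur_block - ref[y:y + h, x:x + w]).sum()
        if sad < best_sad:
            best_mv, best_sad = (dy, dx), sad
        if sad < threshold:                       # early stop
            break
    return best_mv, best_sad

rng = np.random.default_rng(7)
ref = rng.integers(0, 256, (64, 64)).astype(float)
cur_block = ref[18:34, 19:35]                     # true motion is (+2, +3) from (16, 16)
candidates = [(dy, dx) for dy in range(-4, 5) for dx in range(-4, 5)]
mv, sad = early_stop_search(cur_block, ref, 16, 16, candidates, threshold=50.0)
print("estimated MV:", mv, "SAD:", sad)
```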

Journal ArticleDOI
TL;DR: This paper proposes a methodology for designing filters based on image content that minimize the estimator bias inherent to gradient-based image registration and shows that minimizing such bias improves the overall estimator performance in terms of mean square error for high signal-to-noise ratio (SNR) scenarios.
Abstract: Gradient-based image registration techniques represent a very popular class of approaches to registering pairs or sets of images. As the name suggests, these methods rely on image gradients to perform the task of registration. Very often, little attention is paid to the filters used to estimate image gradients. In this paper, we explore the relationship between such gradient filters and their effect on overall estimation performance in registering translated images. We propose a methodology for designing filters based on image content that minimize the estimator bias inherent to gradient-based image registration. We show that minimizing such bias improves the overall estimator performance in terms of mean square error (MSE) for high signal-to-noise ratio (SNR) scenarios. Finally, we propose a technique for designing such optimal gradient filters in the context of iterative multiscale image registration and verify their further improved performance.
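
The core of gradient-based registration for a pure translation is a small least-squares solve built from image gradients, so the choice of gradient filter directly shapes the estimate. The sketch below estimates a sub-pixel global shift using plain central-difference gradients (np.gradient); the paper's contribution, designing better content-adapted filters, is not reproduced, and the synthetic image and shift are invented.

```python
import numpy as np

def estimate_shift(ref, moved):
    """Single-step gradient-based (Lucas-Kanade style) estimate of a small
    global translation, using central-difference gradients."""
    gy, gx = np.gradient(ref)                     # the gradient filter under study
    gt = moved - ref
    A = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    b = -np.array([np.sum(gx * gt), np.sum(gy * gt)])
    return np.linalg.solve(A, b)                  # (dx, dy) estimate

# Synthetic test: a smooth image shifted by a known sub-pixel amount
y, x = np.mgrid[0:128, 0:128]
ref = np.sin(x / 7.0) * np.cos(y / 9.0)
true_shift = (0.4, -0.3)                          # (dx, dy)
moved = np.sin((x - true_shift[0]) / 7.0) * np.cos((y - true_shift[1]) / 9.0)
print("true shift:", true_shift, "estimated:", np.round(estimate_shift(ref, moved), 3))
```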

Journal ArticleDOI
TL;DR: The main result is an analytical description of the motions and the distortions that occur at the occluding boundary, and the exact expression for the distortion term is derived for the case of straight boundaries.
Abstract: We present a spatio-temporal analysis of motion at occluding boundaries. The main result is an analytical description of the motions and the distortions that occur at the occluding boundary. Based on this result, we analyze occluding motions in the Fourier domain and show that the distortion term has a hyperbolic decay independent of the shape of the occluding boundary. Moreover, we derive the exact expression for the distortion term for the case of straight boundaries. The results are illustrated using simulations with synthetic movies.