Showing papers in "Signal Processing: Image Communication" in 2001


Journal ArticleDOI
TL;DR: The performance, relative merits and limitations of each of the approaches are comprehensively discussed and contrasted, and the related topic of camera operation recognition is also reviewed.
Abstract: Temporal video segmentation is the first step towards automatic annotation of digital video for browsing and retrieval. This article gives an overview of existing techniques for video segmentation that operate on both uncompressed and compressed video streams. The performance, relative merits and limitations of each of the approaches are comprehensively discussed and contrasted. The gradual development of the techniques and how the uncompressed-domain methods were tailored and applied to the compressed domain are considered. In addition to the algorithms for shot boundary detection, the related topic of camera operation recognition is also reviewed.
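
As a concrete illustration of the simplest family of uncompressed-domain techniques surveyed here, the sketch below detects cuts by thresholding grey-level histogram differences between consecutive frames. It is not taken from the article; the bin count and threshold are arbitrary assumptions.

```python
import numpy as np

def histogram_difference(frame_a, frame_b, bins=64):
    """Sum of absolute differences between normalized grey-level histograms (range 0..2)."""
    h_a = np.histogram(frame_a, bins=bins, range=(0, 256))[0].astype(float)
    h_b = np.histogram(frame_b, bins=bins, range=(0, 256))[0].astype(float)
    return float(np.abs(h_a / h_a.sum() - h_b / h_b.sum()).sum())

def detect_cuts(frames, threshold=0.5):
    """Declare a shot boundary wherever consecutive frames differ sharply."""
    return [i + 1 for i in range(len(frames) - 1)
            if histogram_difference(frames[i], frames[i + 1]) > threshold]
```

A single global threshold only catches hard cuts; gradual transitions (fades, dissolves) need the twin-threshold or model-based variants the survey discusses.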

447 citations


Journal ArticleDOI
TL;DR: This paper is an overview of the work done for protecting content owners' investment in intellectual property through encryption and watermarking.
Abstract: A digital home network is a cluster of digital audio/visual (A/V) devices including set-top boxes, TVs, VCRs, DVD players, and general-purpose computing devices such as personal computers. The network may receive copyrighted digital multimedia content from a number of sources. This content may be broadcast via satellite or terrestrial systems, transmitted by cable operators, or made available as prepackaged media (e.g., a digital tape or a digital video disc). Before releasing their content for distribution, the content owners may require protection by specifying access conditions. Once the content is delivered to the consumer, it moves across the home network until it reaches its destination where it is stored or displayed. A copy protection system is needed to prevent unauthorized access to bit streams in transmission from one A/V device to another or while it is in storage on magnetic or optical media. Recently, two fundamental groups of technologies, encryption and watermarking, have been identified for protecting copyrighted digital multimedia content. This paper is an overview of the work done for protecting content owners' investment in intellectual property.

226 citations


Journal ArticleDOI
TL;DR: A new embedded video coding system with local motion compensation is presented, in which an invertible motion compensated 3-D subband filter bank is utilized for video analysis/synthesis, meeting the increasing need for fine granular scalability in video streaming applications.
Abstract: In this paper, we present a new embedded video coding system with local motion compensation. An invertible motion compensated 3-D subband filter bank is utilized for video analysis/synthesis. The efficient embedded image coding scheme EZBC is extended to 3-D coding of video subbands. The coded bitstream is rate scalable and fully embedded, thus meeting the increasing need for fine granular scalability in video streaming applications. Our experimental results show the new algorithm outperforms nonscalable standard MPEG-2 by 0.8–3.3 dB (luminance component) over a broad range of bitrates, while providing an embedded video bitstream. Unlike conventional SNR hybrid coders, this multirate property is achieved without a loss in compression efficiency. Comparisons are also made to two other competing embedded video coders, LZC and 3D-SPIHT, both likewise based on 3-D subband coding. The new coder demonstrates a significant advantage for encoding video with high local motion content, thanks to the efficient inclusion of local motion information in the temporal filter bank of our proposed system.
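
The invertibility of the motion-compensated temporal transform is the property the coder rests on. The following is a generic lifting-style sketch of that idea (a Haar-like predict/update pair), not the authors' filter bank; `warp` and `warp_back` stand in for motion-compensation operators supplied by the caller.

```python
def mc_haar_lifting(frame_a, frame_b, warp, warp_back):
    """Motion-compensated Haar-like lifting: predict frame_b along the motion
    trajectory, then update frame_a with the warped-back residual. Each step
    adds only a function of the *other* signal, so the transform inverts exactly."""
    high = frame_b - warp(frame_a)          # predict step (high band)
    low = frame_a + warp_back(high) // 2    # update step (low band)
    return low, high

def mc_haar_inverse(low, high, warp, warp_back):
    """Undo the lifting steps in reverse order with the same operators."""
    frame_a = low - warp_back(high) // 2
    frame_b = high + warp(frame_a)
    return frame_a, frame_b
```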

216 citations


Journal ArticleDOI
TL;DR: This work describes the application of generalized multiple description (MD) source coding architected through the use of forward error correction (FEC) codes to achieve robust and efficient Internet video streaming and multicast consistent with the existing Internet architecture.
Abstract: In this work, we describe the application of generalized multiple description (MD) source coding architected through the use of forward error correction (FEC) codes to achieve robust and efficient Internet video streaming and multicast consistent with the existing Internet architecture. We highlight the synergistic interaction between the robust source coding and packetization strategy and an efficient and responsive TCP-friendly congestion control protocol, linear increase multiplicative decrease with history (LIMD/H), to achieve robust streaming over the Internet. The proposed source packetization scheme transforms a loss-sensitive layered video bitstream into a robust packet stream so as to provide graceful resilience to network packet drops. The proposed congestion control mechanism provides low variation in transmission rate in steady state and at the same time is reactive and provably TCP friendly. Furthermore, exploiting the fundamental property of a multicast tree, namely, the observation that the multicast tree is a degraded channel, we propose an elegant application-level architecture for efficient video multicast. We exploit the amenability of the MD coding strategy to fast transcoding. Our extensive simulation studies for the streaming problem and preliminary results for the multicast case bear out the efficacy of the proposed setup.
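
A minimal sketch of the MD-FEC packetization property such schemes exploit (the erasure coder itself is omitted, and the segment labels are purely illustrative): if layer i is striped across i source segments plus n − i parity segments of an (n, i) erasure code, one segment per packet, then any m received packets recover layers 1 through min(m, L).

```python
def decodable_layers(n_received, n_layers):
    """With layer i protected by an (n, i) erasure code, any m received packets
    out of n recover exactly layers 1..min(m, L): the graceful-degradation
    property that also matches the degraded-channel view of a multicast tree."""
    return list(range(1, min(n_received, n_layers) + 1))

def mdfec_packet_rows(n_packets, n_layers):
    """Illustrative segment grid: row i carries layer i split into i source
    segments plus (n - i) parity segments; column j of every row goes into packet j."""
    return [[f"L{i}:src{j}" for j in range(i)] +
            [f"L{i}:fec{j}" for j in range(n_packets - i)]
            for i in range(1, n_layers + 1)]
```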

92 citations


Journal ArticleDOI
TL;DR: Generic models, corresponding to current approaches for content integrity evaluation – labeling and watermarking – are described, providing a common basis to compare existing techniques.
Abstract: This paper is focused on digital image and video authentication, considered as the process of evaluating the integrity of image contents relative to the original picture and of being able to detect, in an automatic way, malevolent content modifications. Generic models, corresponding to current approaches for content integrity evaluation – labeling and watermarking – are described, providing a common basis to compare existing techniques. Two labeling implementations are put forward. The first one is based on second-order image moments. It can be thought of as a low-level approach, where the emphasis is put on computational simplicity. The second one relies on image edges and tackles the problem of image/video integrity evaluation from a semantic, higher-level point of view. The viability of both methods, as a means of protecting the content, is assessed under JPEG and MPEG2 compression and non-authorized image modifications.
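
For the first labeling implementation, here is a toy sketch of a second-order-moment content label (per-block mean and variance) and a matching verifier. The block size and tolerance are assumptions, and the paper's actual moment definitions may differ.

```python
import numpy as np

def block_moment_label(img, block=16):
    """Compact content label: first- and second-order moments (mean, variance)
    of each non-overlapping block of a grey-level image."""
    h, w = img.shape
    feats = []
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            b = img[y:y + block, x:x + block].astype(float)
            feats.append((b.mean(), b.var()))
    return np.array(feats)

def flag_tampered_blocks(img, label, tol=4.0):
    """Indices of blocks whose moments drift beyond the tolerance: mild drift is
    expected under JPEG/MPEG2 recompression, large drift suggests tampering."""
    return np.where(np.abs(block_moment_label(img) - label).max(axis=1) > tol)[0]
```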

70 citations


Journal ArticleDOI
Yung-Lyul Lee1, HyunWook Park1
TL;DR: The comparison results show that the post-filtering is slightly better than or similar to the loop filtering with respect to peak signal-to-noise ratio (PSNR), whereas the subjective image qualities of both methods are quite similar.
Abstract: When an image is highly compressed using the current coding standards, the decompressed image has noticeable degradations such as blocking artifacts near the block boundaries, corner outliers at cross points of blocks, and ringing noise near image edges. These degradations are caused by the quantization of the 8×8 DCT coefficients. In order to restore the decompressed image, a loop-filtering algorithm and a post-filtering algorithm have been developed. The developed methods perform adaptive filtering on the decompressed image according to blocking and ringing flags that are defined to reduce computational complexity. The performance of both algorithms is compared with respect to image quality and computational complexity. The comparison results show that the post-filtering is slightly better than or similar to the loop filtering with respect to peak signal-to-noise ratio (PSNR), whereas the subjective image qualities of both methods are quite similar. However, the computational complexity of the loop filtering is much less than that of the post-filtering.
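
In the spirit of the flag-based adaptive filtering described above (though not the authors' filter), the sketch below flags a horizontal 8×8 boundary as blocky when the cross-boundary step exceeds the local activity yet stays within roughly one quantizer step, and smooths only the flagged pixels; the q_step value is an assumption.

```python
import numpy as np

def deblock_horizontal(img, q_step=8.0):
    """Smooth horizontal 8x8 boundaries flagged as blocky: the step across the
    boundary exceeds the activity on either side (so it is not texture) yet is
    small enough (< q_step) that a true image edge is unlikely."""
    out = img.astype(float).copy()
    h, _ = out.shape
    for y in range(8, h - 1, 8):
        above, below = out[y - 1], out[y]            # rows flanking the boundary
        act = np.maximum(np.abs(above - out[y - 2]), np.abs(out[y + 1] - below))
        step = np.abs(below - above)
        flag = (step > act) & (step < q_step)        # blocking flag per column
        avg = (above + below) / 2.0
        out[y - 1, flag] = (above[flag] + avg[flag]) / 2.0
        out[y, flag] = (below[flag] + avg[flag]) / 2.0
    return out.clip(0, 255).astype(np.uint8)
```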

68 citations


Journal ArticleDOI
TL;DR: Experimental results indicate that the derived HVS-based quantization table can achieve better performance in the rate-distortion sense than the JPEG default quantization table.
Abstract: In this paper, we propose a systematic procedure to design a quantization table based on a human visual system model for the baseline JPEG coder. By incorporating the human visual system model with a uniform quantizer, a perceptual quantization table is derived. The quantization table can be easily adapted to the specified resolution for viewing and printing. Experimental results indicate that the derived HVS-based quantization table achieves better performance in the rate-distortion sense than the JPEG default quantization table.
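
A toy stand-in for the idea of an HVS-derived table (not the paper's model): weight each DCT frequency by a decaying contrast-sensitivity curve and invert it to obtain quantizer step sizes. Both parameters are arbitrary assumptions.

```python
import numpy as np

def hvs_quant_table(base_step=16.0, decay=0.25):
    """Step size grows as contrast sensitivity falls with radial DCT frequency,
    so high-frequency coefficients are quantized more coarsely."""
    u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
    sensitivity = np.exp(-decay * np.sqrt(u**2 + v**2))
    return np.round(base_step / sensitivity).astype(int)
```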

64 citations


Journal ArticleDOI
TL;DR: Two novel algorithms, namely intensity selection (IS) and connection selection (CS), that can be applied to the existing halftone image data hiding algorithms DHSPT, DHPT and DHST to achieve improved visual quality are proposed.
Abstract: In this paper, we propose two novel algorithms, namely intensity selection (IS) and connection selection (CS), that can be applied to the existing halftone image data hiding algorithms DHSPT, DHPT and DHST to achieve improved visual quality. The proposed algorithms generalize the hidden data representation and select the best location out of a set of candidate locations for the application of DHSPT, DHPT or DHST. The two algorithms provide a trade-off between visual quality and computational complexity. IS yields higher visual quality but requires either the original multi-tone image or the inverse-halftoned image, which implies a high computation requirement. CS gives lower visual quality than IS but requires neither the original nor the inverse-halftoned image. Some objective visual quality measures are defined. Our experiments suggest that significant improvement in visual quality can be achieved, especially when the number of candidate locations is large.

50 citations


Journal ArticleDOI
TL;DR: A new multi-label fast marching algorithm for expanding competitive regions is introduced; localisation is then based on the previously extracted map of changed pixels and on a level set algorithm initialised by two curves evolving in converging opposite directions.
Abstract: In this paper, we address two problems crucial to motion analysis: the detection of moving objects and their localisation. Statistical and level set approaches are adopted in formulating these problems. For the change detection problem, the inter-frame difference is modelled by a mixture of two zero-mean Laplacian distributions. At first, statistical tests using criteria with negligible error probability are used to label as many sites as possible as changed or unchanged. All the connected components of the labelled sites are then used as region seeds, which give the initial level sets for which velocity fields for label propagation are provided. We introduce a new multi-label fast marching algorithm for expanding competitive regions. The solution of the localisation problem is based on the map of changed pixels previously extracted. The boundary of the moving object is determined by a level set algorithm, which is initialised by two curves evolving in converging opposite directions. The sites of curve contact determine the position of the object boundary. Experimental results using real video sequences are presented, illustrating the efficiency of the proposed approach.
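
The confident-labelling step can be sketched as a likelihood-ratio test under the two-Laplacian mixture. The mixture parameters are assumed already estimated (e.g. by EM), and the dominance ratio below is an arbitrary stand-in for the paper's negligible-error criteria.

```python
import numpy as np

def laplacian_pdf(d, b):
    """Zero-mean Laplacian density with scale b."""
    return np.exp(-np.abs(d) / b) / (2.0 * b)

def label_confident_sites(diff, b_unchanged, b_changed, w_unchanged=0.5, ratio=100.0):
    """Label a site only when one mixture component dominates the other by a
    large factor; the undecided sites (label 0) are left for region growing."""
    p_u = w_unchanged * laplacian_pdf(diff, b_unchanged)
    p_c = (1.0 - w_unchanged) * laplacian_pdf(diff, b_changed)
    labels = np.zeros(diff.shape, dtype=np.int8)
    labels[p_c > ratio * p_u] = 1      # confidently changed
    labels[p_u > ratio * p_c] = -1     # confidently unchanged
    return labels
```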

43 citations


Journal ArticleDOI
Jun-Seon Kim1, HyunWook Park1
TL;DR: An adaptive 3-D median filter, which achieves optimal image quality as well as fast computing time, is proposed to remove impulse noise from a highly corrupted image sequence; the restoration filtering is applied adaptively based on the estimated impulse noise ratio.
Abstract: An adaptive 3-D median filter, which achieves optimal image quality as well as fast computing time, is proposed to remove impulse noise from a highly corrupted image sequence. The proposed algorithm is compared with widely used impulse noise removal algorithms with respect to the peak signal-to-noise ratio and the number of computations. The proposed algorithm preserves image details that are not expected to be corrupted by impulse noise, so that the number of computations can be minimized. It has good restoration performance whether the number of pixels corrupted by impulse noise is large or small. In the proposed algorithm, the impulse noise ratio, which is the ratio of the number of pixels corrupted by impulse noise to the total number of pixels, is estimated, and the restoration filtering is applied adaptively based on the estimated ratio.
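
A rough sketch of the detect-then-filter idea under simple assumptions of our own (impulses are full salt-and-pepper extremes; the paper's detector and decision rule are more refined): estimate the impulse ratio, then median-filter only the candidate pixels, widening to the 3×3×3 spatiotemporal window when corruption is heavy.

```python
import numpy as np

def estimate_impulse_ratio(frame):
    """Crude estimate: fraction of pixels sitting at the extreme values."""
    return float(np.mean((frame == 0) | (frame == 255)))

def adaptive_median_3d(prev, cur, nxt, heavy_ratio=0.1):
    """Median-filter only the impulse candidates, which preserves uncorrupted
    detail and keeps the computation count low; switch from the 3x3 spatial
    window to the 3x3x3 spatiotemporal one when corruption is heavy."""
    out = cur.copy()
    heavy = estimate_impulse_ratio(cur) > heavy_ratio
    h, w = cur.shape
    for y, x in zip(*np.nonzero((cur == 0) | (cur == 255))):
        y0, y1, x0, x1 = max(y - 1, 0), min(y + 2, h), max(x - 1, 0), min(x + 2, w)
        if heavy:
            window = np.concatenate([f[y0:y1, x0:x1].ravel() for f in (prev, cur, nxt)])
        else:
            window = cur[y0:y1, x0:x1].ravel()
        out[y, x] = int(np.median(window))
    return out
```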

42 citations


Journal ArticleDOI
TL;DR: A novel approach for face tracking, resulting in a visual feedback loop, is proposed, which constructs from precise range data a specific texture and wireframe face model, whose realism allows the analysis and synthesis modules to visually cooperate in the image plane, by directly using 2D patterns synthesized by the face model.
Abstract: We propose a novel approach for face tracking, resulting in a visual feedback loop: instead of trying to adapt a more or less realistic artificial face model to an individual, we construct from precise range data a specific texture and wireframe face model, whose realism allows the analysis and synthesis modules to visually cooperate in the image plane, by directly using 2D patterns synthesized by the face model. Unlike other feedback loops found in the literature, we do not explicitly handle the complex 3D geometric data of the face model, to make real-time manipulations possible. Our main contribution is a complete face tracking and pose estimation framework, with few assumptions about the face's rigid motion (allowing large rotations out of the image plane), and without marks or makeup on the user's face. Our framework feeds the feature-tracking procedure with synthesized facial patterns, controlled by an extended Kalman filter. Within this framework, we present original and efficient geometric and photometric modelling techniques, and a reformulation of a block-matching algorithm to make it match synthesized patterns with real images, and avoid background areas during the matching. We also offer some numerical evaluations, assessing the validity of our algorithms, and new developments in the context of facial animation. Our face-tracking algorithm may be used to recover the 3D position and orientation of a real face and generate an MPEG-4 animation stream to reproduce the rigid motion of the face with a synthetic face model. It may also serve as a pre-processing step for further facial expression analysis algorithms, since it locates the position of the facial features in the image plane, and gives precise 3D information to take into account the possible coupling between pose and expressions of the analysed facial images.

Journal ArticleDOI
TL;DR: A novel hierarchical object-oriented video segmentation and representation algorithm is proposed that may be useful for MPEG-4 systems that generate the video object plane (VOP) automatically.
Abstract: In this paper, a novel hierarchical object-oriented video segmentation and representation algorithm is proposed. The local variance contrast and the frame difference contrast are jointly exploited for structural spatiotemporal video segmentation, because these two visual features can efficiently indicate the spatial homogeneity of the grey levels and the temporal coherence of the motion fields; a two-dimensional (2D) spatiotemporal entropic technique is further used to generate the 2D thresholding vectors adaptively according to the variations of the video components. After the region growing and edge simplification procedures, the accurate boundaries among the different video components are further extracted by an intra-block edge extraction procedure. Moreover, the relationships of the video components among frames are exploited by a temporal tracking procedure. This proposed object-oriented spatiotemporal video segmentation algorithm may be useful for MPEG-4 systems that generate the video object plane (VOP) automatically.

Journal ArticleDOI
TL;DR: A novel video object segmentation method that is posed as a constrained maximum contrast path search problem along the edges of a 2-D triangular mesh, and a 2-D mesh-based uncovered region detection method along the object boundary as well as within the object.
Abstract: This paper integrates fully automatic video object segmentation and tracking including detection and assignment of uncovered regions in a 2-D mesh-based framework. Particular contributions of this work are (i) a novel video object segmentation method that is posed as a constrained maximum contrast path search problem along the edges of a 2-D triangular mesh, and (ii) a 2-D mesh-based uncovered region detection method along the object boundary as well as within the object. At the first frame, an optimal number of feature points are selected as nodes of a 2-D content-based mesh. These points are classified as moving (foreground) and stationary nodes based on multi-frame node motion analysis, yielding a coarse estimate of the foreground object boundary. Color differences across triangles near the coarse boundary are employed for a maximum contrast path search along the edges of the 2-D mesh to refine the boundary of the video object. Next, we propagate the refined boundary to the subsequent frame by using motion vectors of the node points to form the coarse boundary at the next frame. We detect occluded regions by using motion-compensated frame differences and range filtered edge maps. The boundaries of detected uncovered regions are then refined by using the search procedure. These regions are either appended to the foreground object or tracked as new objects. The segmentation procedure is re-initialized when the number of unreliable motion vectors exceeds a certain threshold. The proposed scheme is demonstrated on several video sequences.

Journal ArticleDOI
Hyung-Sun Kim1, HyunWook Park1
TL;DR: Both schemes using the low-band-shift method outperform conventional motion estimation methods in the spatial or wavelet domain with respect to peak signal-to-noise ratio (PSNR).
Abstract: The shift-variant property of the discrete wavelet transform (DWT) makes motion estimation and compensation inefficient in the wavelet domain. In order to overcome the shift-variant property of the DWT, a low-band-shift (LBS) method has been developed. Using the LBS method in the wavelet domain, two motion estimation and compensation schemes are developed and evaluated: one performs motion estimation and compensation with the LBS method on a wavelet-block basis, the other on a band-by-band basis. Both schemes using the LBS method perform better than conventional motion estimation methods in the spatial or wavelet domain with respect to peak signal-to-noise ratio (PSNR). The experimental results show that the proposed schemes achieve PSNR improvements of more than 1.0 dB over the full-search method in the spatial domain.
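
A 1-D, one-level Haar illustration of the LBS workaround for shift variance (the paper works with 2-D multi-level transforms): keep the subbands of every down-sampling phase, so the coefficients of any integer shift are available when matching.

```python
import numpy as np

def haar_analysis(x):
    """One-level orthonormal Haar DWT of an even-length 1-D signal."""
    return (x[0::2] + x[1::2]) / np.sqrt(2.0), (x[0::2] - x[1::2]) / np.sqrt(2.0)

def low_band_shift(x):
    """Store the subbands of both down-sampling phases; the coefficients of any
    integer shift of x are then available, so wavelet-domain block matching is
    no longer defeated by the transform's shift variance."""
    return {phase: haar_analysis(np.roll(x, -phase)) for phase in (0, 1)}
```

In the multi-level case only the low band needs to be re-decomposed at each shifted phase, which is where the method's name comes from.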

Journal ArticleDOI
TL;DR: This paper describes a semi-automatic method for moving object segmentation and tracking suitable when a few objects have to be tracked, while the camera moves and fixates on them.
Abstract: This paper describes a semi-automatic method for moving object segmentation and tracking. This method is suitable when a few objects have to be tracked, while the camera moves and fixates on them. The user delineates approximately the initial locations in a selected frame and specifies the depth ordering of the objects to be tracked. First, motion-based segmentation is obtained through an initial application of a region growing algorithm. The partition map is then sequentially tracked from frame to frame using motion compensation and location prediction, and the segmentation map at each frame is obtained by the same region growing algorithm. Translational motion is assumed for the moving objects, and local intensity or color averages may be used as additional features. A post-processing procedure regularizes the object boundaries over time.

Journal ArticleDOI
TL;DR: The search for eigenvectors of a Toeplitz matrix shows that complex or real orthogonal mappings such as the discrete Fourier transform and its decompositions approximate the Karhunen–Loeve transformation in the case of first-order Markov processes.
Abstract: Analysis of colour images in the Red, Green and Blue acquisition space and in the intensity and chrominance spaces shows that colour components are closely correlated (Carron, Ph.D. Thesis, Univ. Savoie, France, 1995; Ocadis, Ph.D. Thesis, Univ. Grenoble, France, 1985). These have to be decorrelated so that each component of the colour image can be studied separately. The Karhunen–Loeve transformation provides optimal decorrelation of these colour data. However, this transformation is related to the colour distribution in the image, i.e. to the statistical properties of the colour image and is therefore dependent on the image under analysis. In order to enjoy the advantages of direct, independent and rapid transformation and the advantages of the Karhunen–Loeve properties, this paper presents the study of the approximation of the Karhunen–Loeve transformation. The approximation is arrived at through exploitation of the properties of Toeplitz matrices. The search for eigenvectors of a Toeplitz matrix shows that complex or real orthogonal mappings such as the discrete Fourier transform and its decompositions approximate the Karhunen–Loeve transformation in the case of first-order Markov processes.
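
For reference, the exact colour KLT that the paper seeks to approximate is small enough to compute directly, as sketched below: project onto the eigenvectors of the 3×3 inter-channel covariance. Fixed transforms such as the DFT/DCT approximate these eigenvectors when the channels behave like a first-order Markov process, which is the paper's point.

```python
import numpy as np

def colour_klt(img_rgb):
    """Exact KLT of the colour components: project the centred RGB samples onto
    the eigenvectors of the 3x3 inter-channel covariance matrix, sorted by
    decreasing eigenvalue (maximal decorrelation)."""
    X = img_rgb.reshape(-1, 3).astype(float)
    X -= X.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(w)[::-1]
    return X @ V[:, order], V[:, order]
```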

Journal ArticleDOI
TL;DR: This paper presents a new method that significantly reduces the computational load of ITT-based image coding by transforming both domain and range blocks of the image into the frequency domain (which has proven to be more appropriate for ITT coding).
Abstract: Iterated transformation theory (ITT) coding, also known as fractal coding, in its original form allows fast decoding but suffers from long encoding times. During the encoding step, a large number of block best-matching searches have to be performed, which makes the process computationally expensive. Because of that, most of the research efforts in this field are focused on speeding up the encoding algorithm. Many different methods and algorithms have been proposed, from simple classifying methods to multi-dimensional nearest key search. We present in this paper a new method that significantly reduces the computational load of ITT-based image coding. Both domain and range blocks of the image are transformed into the frequency domain (which has proven to be more appropriate for ITT coding). Domain blocks are then used to train a two-dimensional Kohonen neural network (KNN), forming a codebook similar to vector quantization coding. The property of KNN (and self-organizing feature maps in general) of preserving the topology of the input space (the transformed domain blocks) allows a neighboring search to be performed to find the piecewise transformation between domain and range blocks.

Journal ArticleDOI
TL;DR: An algorithm for optimal coding mode selection for the base and enhancement layers of scalable video coding is developed, which limits error propagation due to packet loss while retaining compression efficiency.
Abstract: We propose to improve the packet loss resilience of scalable video coding. An algorithm for optimal coding mode selection for the base and enhancement layers is developed, which limits error propagation due to packet loss, while retaining compression efficiency. We first derive a method to estimate the overall decoder distortion, which includes the effects of quantization, packet loss and error concealment employed at the decoder. The estimate accounts for temporal and spatial error propagation due to motion compensated prediction, and computes the expected distortion precisely per pixel. The distortion estimate is incorporated within a rate-distortion framework to optimally select the coding mode as well as quantization step size for the macroblocks in each layer. Simulation results show substantial performance gains for both base and enhancement layers.
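
The paper computes the expected distortion precisely per pixel; the following deliberately collapses that idea to a frame-level recursion just to show the structure of the estimate. All parameters (quantization distortion, concealment distortion, loss rate, propagation leak) are assumed inputs, not values from the paper.

```python
def expected_distortion_trace(d_quant, d_conceal, p, leak, n_frames):
    """Frame-level caricature of the estimate: a received frame contributes its
    quantization error plus an attenuated share of inherited error (leak models
    attenuation through intra updates and filtering); a lost frame contributes
    concealment error and inherits the full propagated error."""
    d_prev, trace = 0.0, []
    for _ in range(n_frames):
        d_prev = (1 - p) * (d_quant + leak * d_prev) + p * (d_conceal + d_prev)
        trace.append(d_prev)
    return trace
```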

Journal ArticleDOI
TL;DR: Simulation results confirm that the BNC 17/11 and BNC 16/8 wavelet bases are outstanding for compression of natural and medical images, and particularly for images with significant high-frequency detail, such as fingerprints.
Abstract: Filter bank design for wavelet compression is crucial; careful design enables superior quality for broad classes of images. The Bernstein basis for frequency-domain construction of biorthogonal nearly coiflet (BNC) wavelet bases forms a unified design framework for high-performance medium-length filters. A common filter bandwidth is characteristic of widely favoured BNC filter pairs: the classical CDF 9/7, the Villasenor 6/10, and the Villasenor 10/18. Based on this observation, we construct previously unknown BNC 17/11 and BNC 16/8 wavelet filters. Key filter-quality evaluation metrics, due to Villasenor, demonstrate these filters to be well suited for image compression. Also studied are the biorthogonal coiflet 17/11 (half-band), 18/10 and 10/6 filter pairs, which have not previously been formally evaluated for image coding. Simulation results confirm that the BNC 17/11 and BNC 16/8 wavelet bases are outstanding for compression of natural and medical images, and particularly for images with significant high-frequency detail, such as fingerprints. The BNC 17/11 pair recommends itself for international standardization for the compression of still images; the BNC 16/8 pair for high-quality compression of production quality video. Experimental evidence suggests biorthogonal filters achieve good compression if, subject to a filter bandwidth constraint, maximum vanishing moments are obtained for a given filter support.

Journal ArticleDOI
TL;DR: Based on the experimental results obtained in this study, the interpolation results of the proposed approach are always better than those from the three existing approaches used for comparison, which shows the feasibility of the proposed approach.
Abstract: Image sequence interpolation, i.e. obtaining an up-sampled image sequence from a corresponding low-resolution image sequence, is an ill-posed inverse problem. In this study, three processing steps, namely, regularization, discretization and optimization, are used to convert the image sequence interpolation problem into a solvable optimization problem. In regularization, a fitness function combining a set of spatial and temporal performance measures for rating the quality of the interpolated (up-sampled) images is defined, which is used to convert the original ill-posed interpolation problem into a well-posed optimization problem. Discretization transforms the well-posed problem into a discrete one so that it can be solved numerically. Genetic algorithms (GAs) are used to optimize the solution in the discrete solution space using three basic operations, namely, reproduction, crossover and mutation. In the proposed approach, instead of only the spatial information within the current image frame employed in most existing methods, both the spatial and temporal information within the image sequence can be employed. Based on the experimental results obtained in this study, the interpolation results of the proposed approach are always better than those from the three existing approaches used for comparison. This shows the feasibility of the proposed approach.

Journal ArticleDOI
TL;DR: Two ways of minimising the temporal propagation of errors that packet losses cause through the motion compensation process are proposed: the selective use of FEC on the motion information and the use of periodic reference frames, protected with FEC as well.
Abstract: The Internet was designed mainly for non-real-time data, and the existing error recovery mechanisms cannot be used for time critical applications because of their latency. Therefore, any real-time interactive application will need to be robust to packet losses caused mainly by congestion at routers, and this is especially true for video transmissions. In this paper, the problem of robust H.263+ video over the Internet is addressed. The effect of packet loss on H.263 (version 2) video transmitted over the Internet using the real-time transport protocol (RTP) is assessed and different packetisation strategies are also compared. The main problem is the temporal propagation of errors resulting from packet losses because of the motion compensation process. Two ways of minimising this propagation are proposed: the selective use of FEC on the motion information and the use of periodic reference frames, protected with FEC as well. The main advantage of these two techniques is that they do not introduce more than one frame delay, and they do not rely on retransmissions. When combined together, these are shown to perform better than using periodic intraframes at minimising error propagation. Our robust H.263+ encoder has been integrated into vic, a videoconferencing tool widely used over the multicast backbone (MBone) of the Internet.
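
A minimal stand-in for the FEC applied to the motion information and the periodic reference frames (the paper's actual code may differ): a single XOR parity packet per group, which can rebuild any one lost packet in that group.

```python
def xor_parity(packets):
    """Build one parity packet over a group of packets (shorter ones are treated
    as zero-padded). XOR of the parity with all surviving packets reconstructs
    any single lost packet in the group."""
    parity = bytearray(max(len(p) for p in packets))
    for p in packets:
        for i, byte in enumerate(p):
            parity[i] ^= byte
    return bytes(parity)
```

Recovery reuses the same routine: `xor_parity(survivors + [parity])` returns the missing packet, zero-padded to the group's maximum length.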

Journal ArticleDOI
TL;DR: In this integrated multi-level BTC scheme, the visual quality is substantially improved, the bit-rate is maintained the same as standard BTC and the computational demand is still kept low.
Abstract: Block truncation coding (BTC) is very attractive for real-time image coding at moderate bit-rates due to its low computation and storage demands. One major artefact of this image compression technique is edge raggedness. Modifications are mainly in two areas: bit-rate reduction and improvement of the quality of the reconstructed image. Our investigation is in the latter area. In our integrated multi-level BTC scheme, the visual quality is substantially improved, the bit-rate is maintained the same as standard BTC, and the computational demand is still kept low. We exploit various algorithms and derive new techniques to optimise the performance of multi-level BTC. Dynamic range tuning (DRT) is used to tackle 2-level BTC. Middle group settlement is used to carry out binary classification for tri-modal distributions. For 4-level BTC, an iterative DRT is derived. Based on our experience with 2- and 4-level BTCs, an optimised 3-level BTC is derived. Extensive testing has been carried out on 30 images. The resultant visual quality is nearly perfect and the PSNR may be boosted by up to 7 dB.
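
For context, here is the standard 2-level BTC baseline that the scheme improves on; the moment-preserving levels below are the classical ones, while DRT, middle group settlement and the 3/4-level extensions are beyond this sketch.

```python
import numpy as np

def btc_encode(block):
    """Standard two-level BTC: a one-bit-per-pixel bitmap plus two reconstruction
    levels chosen to preserve the block's mean and standard deviation."""
    mu, sigma = block.mean(), block.std()
    bitmap = block >= mu
    q, m = int(bitmap.sum()), block.size
    if q in (0, m):                               # flat block: one level suffices
        return bitmap, mu, mu
    a = mu - sigma * np.sqrt(q / (m - q))         # level for pixels below the mean
    b = mu + sigma * np.sqrt((m - q) / q)         # level for pixels at/above the mean
    return bitmap, a, b

def btc_decode(bitmap, a, b):
    return np.where(bitmap, b, a)
```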

Journal ArticleDOI
TL;DR: In this article, the authors evaluate the quality of full-motion video transmitted over IP via a two-layer codec and CBQ routers in the presence of TCP traffic and derive guidelines for the cost-effective divisions of bandwidth between base and enhancement layers.
Abstract: We evaluate the quality of full-motion video, when transmitted over IP via a two-layer codec and CBQ routers in the presence of TCP traffic. We investigate the effects of packet size and distribution of I-frames. We derive guidelines for the cost-effective divisions of bandwidth between base and enhancement layers, so as to maintain perceived quality in the presence of known amounts of packet loss.

Journal ArticleDOI
TL;DR: A novel content-adaptive enhancement filter is described, which aims at reducing compression artifacts in MPEG-coded video streams by selecting the most appropriate kernel among a set of pre-defined masks, based upon a classification of the pixels to be processed.
Abstract: A novel content-adaptive enhancement filter is described, which aims at reducing compression artifacts in MPEG-coded video streams. The filter locally selects the most appropriate kernel among a set of pre-defined masks, based upon a classification of the pixels to be processed. The features used in the classification phase take into account the distribution of transform coefficients and the presence of nearby contour pixels, previously detected by an edge extractor. An important aspect of the proposed approach is the low computational complexity, very appealing in the scenario of a typical low-cost consumer application (video communication over the Internet, set-top-box DVB receiver, etc.). Experimental results show that the proposed algorithm outperforms existing approaches with similar level of complexity.

Journal ArticleDOI
TL;DR: A new transport framework for complex multimedia applications over the next generation Internet, which provides differentiation functionality within one IP session as well as among different IP sessions, and a new bitstream classification, prioritization and packetization scheme.
Abstract: In this paper, we present a new user-aware adaptive object-based video transmission approach to heterogeneous users over the next generation Internet. Firstly, we describe a new transport framework for complex multimedia applications over the next generation Internet, which provides differentiation functionality within one IP session as well as among different IP sessions. It includes application-aware intelligent resource control at the edge of the network, fast transcoding and signaling in the network. Secondly, we propose a new bitstream classification, prioritization and packetization scheme in which different types of data such as shape, motion and texture are reassembled, assigned to different priority classes, and packetized separately based on their priorities. Thirdly, we present a simple but effective mechanism of object-based dynamic rate control and adaptation by selectively dropping packets in conjunction with differentiated services (Diffserv) to minimize the end-to-end quality distortion. Finally, we perform the queuing analysis for our mechanism and explore how to extend our approach to the multicast case. Experimental results demonstrate the effectiveness of our proposed approach.

Journal ArticleDOI
TL;DR: Different filter settings and combinations are evaluated with respect to bandwidth and processing time and it is shown that filters can be used to efficiently support real time conferencing scenarios including mobile and low power receivers.
Abstract: In this paper, we classify and evaluate filter algorithms developed for a wavelet-based video coder. Filter operations convert one representation of a video stream into a different one in which the client is more interested. Based on our previous work, we have improved the frame rate filter and the dynamic rate shaping filter with preferences (DRS-P), which allows bandwidth consumption to be traded off against visual quality under given user constraints. The user may specify a target bandwidth and his/her preferences, and the DRS-P tunes its internal filter operations to match the bandwidth contract and the user preferences based on bandwidth estimates and packet statistics gathered during filter operation. We evaluate different filter settings and combinations with respect to bandwidth and processing time and show that filters can be used to efficiently support real-time conferencing scenarios including mobile and low-power receivers.

Journal ArticleDOI
TL;DR: An error resilience technique for MPEG2 compressed video is introduced that is comparable to a standard Reed–Solomon (RS) code for BERs up to 10⁻⁵.
Abstract: An error resilience technique for MPEG2 compressed video is introduced. Errors are located in the compressed domain using a parity check, and then are corrected using residual redundancy in the compressed video stream. This redundancy is in the form of the MPEG syntax and in spatial smoothness. Performance is comparable to a standard Reed–Solomon (RS) code for BERs up to 10⁻⁵. Complexity is estimated to be better for these BERs.

Journal ArticleDOI
TL;DR: It is shown through end-to-end bit-level simulation that highly reliable transmission of 24 and 64 kbps video can be realized at 15 and 40.5 kBd, respectively, with low delay, low power and modest overall system complexity.
Abstract: This paper addresses the transmission of low bit-rate video image sequences through mobile satellite channels to provide portable communications services to remote areas. The particularly challenging aspects of this transmission channel include (1) rapid fading and log-normal shadowing, (2) low signal-to-noise ratio (SNR) due to the noise-limited channel, (3) the satellite's geostationary orbit which incurs a large 250 ms roundtrip propagation delay, (4) limited existing bandwidth near 2 GHz; the video service is to overlay existing Mobile Satellite (MSAT) voice service using a minimum number of 6 kHz (analog bandwidth) channels, and (5) the use of travelling-wave-tube amplifiers which preclude the bandwidth-efficient quadrature amplitude modulation (QAM) proposed for terrestrial high-definition TV (HDTV) broadcast. In the proposed concatenated system, the inner codec is compatible with both voice as well as future-oriented error-resilient, scalable video compression schemes. The key issues are the joint design of on-line channel estimation, soft-decision decoding, trellis-coded modulation (TCM), interleaving depths, and error correcting codes. We have shown through end-to-end bit-level simulation that highly reliable transmission of 24 and 64 kbps video (H.263) can be realized at 15 and 40.5 kBd, respectively, with low delay, low power and modest overall system complexity.

Journal ArticleDOI
TL;DR: MPEG-2 packet loss effect on video quality is quantitatively investigated, a temporal layered signal model is described and evaluated, and a quality measure for reconstructed pictures called macroblock impairment ratio is suggested and defined.
Abstract: To efficiently combat the signal loss of MPEG-2 transmission over unreliable networks, priority encoding transmission, unequal packet loss protection and priority dropping techniques have been studied in many papers. Those studies are based on the qualitative analysis of different importance of signals, without quantitative investigation of signal loss effect on video quality. In this paper, MPEG-2 packet loss effect on video quality is quantitatively investigated, a temporal layered signal model is described and evaluated, a quality measure for reconstructed pictures called macroblock impairment ratio is suggested and defined. The investigation and the model are specified for MPEG-2, but the principles and the methods are suitable for any layered video. These are useful for the development of efficient schemes and protocols for packet video transmission over unreliable networks.
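
The impairment measure can be sketched as follows, under our own simplifying assumption (not necessarily the paper's exact definition) that a macroblock is impaired if its own data is lost or if it predicts from an impaired macroblock in the previous picture.

```python
def propagate_impairment(lost, mv_refs, prev_impaired):
    """One picture's impairment map: macroblock k is impaired if its own data
    was lost, or if it predicts from an impaired macroblock (index mv_refs[k])
    in the previous picture."""
    return [l or prev_impaired[r] for l, r in zip(lost, mv_refs)]

def macroblock_impairment_ratio(impaired):
    """MIR of one picture: impaired macroblocks as a fraction of all macroblocks."""
    return sum(impaired) / len(impaired)
```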

Journal ArticleDOI
TL;DR: Tree-structured product-codebook VQ is proposed to carry out low-complexity, low-memory-storage VQ with progressive transmission; applied to the compression of gray-scale images and of multispectral images by means of the SPIHT algorithm, it provides satisfactory experimental results.
Abstract: To carry out vector quantization (VQ) on large vectors, and hence obtain a good performance, it is necessary to introduce some structural constraint in the encoder. Product-codebook VQ reduces memory storage and encoding complexity. Tree-structured VQ reduces encoding complexity as well, and allows for progressive transmission. In this paper tree-structured product-codebook VQ is proposed to carry out low-complexity, low-memory-storage VQ with progressive transmission. The joint design of the tree-structured component codebooks is analyzed and a low-complexity greedy procedure is devised. The proposed approach has been implemented for two applications: the compression of gray-scale images, and the compression of multispectral images by means of the SPIHT algorithm, providing in both cases satisfactory experimental results.
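
A sketch of the tree-structured search that yields the low complexity and the progressive bitstream (the product-codebook structure and the joint design procedure are omitted); the dict-based node layout is an assumption for illustration.

```python
import numpy as np

def tsvq_encode(x, root):
    """Greedy descent through a tree-structured codebook: at each node choose
    the child whose centroid is nearest, emitting the branch index. Cost is
    O(depth) distance computations instead of a full codebook search, and the
    emitted path is a progressively refinable description of x."""
    node, path = root, []
    while node.get("children"):
        dists = [np.linalg.norm(x - c["centroid"]) for c in node["children"]]
        best = int(np.argmin(dists))
        path.append(best)
        node = node["children"][best]
    return path, node["centroid"]
```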