
Showing papers in "Signal Processing: Image Communication" in 1999


Journal ArticleDOI
TL;DR: A theoretical framework is derived by which the Internet packet loss behavior can be directly related to the picture quality perceived at the receiver and it is demonstrated how this framework can be used to select appropriate parameter values for the overall system design.
Abstract: In this article we describe and investigate an Internet video streaming system based on a scalable video coder combined with unequal error protection that maintains acceptable picture quality over a wide range of connection qualities. The proposed approach does not require any specific support from the network layer and is especially suited for Internet multicast applications, where different users experience different transmission conditions and no feedback channel can be employed. We derive a theoretical framework for the overall system by which the Internet packet loss behavior can be directly related to the picture quality perceived at the receiver. We demonstrate how this framework can be used to select appropriate parameter values for the overall system design. Experimental results show how the presented system achieves gracefully degrading picture quality for packet losses of up to 30%.

296 citations
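
The kind of relation the framework captures can be illustrated with a small calculation: if a layer is protected by an (n, k) erasure code and packets are lost independently with probability p, the layer decodes whenever at least k of its n packets arrive. A minimal sketch in Python; the independence assumption and the code parameters are illustrative, not the paper's model:

    from math import comb

    def layer_decode_prob(n, k, p):
        """Probability that an (n, k) erasure-coded layer is decodable
        when each packet is lost independently with probability p:
        at least k of the n packets must arrive."""
        return sum(comb(n, i) * (1 - p)**i * p**(n - i) for i in range(k, n + 1))

    # Stronger protection (lower k/n) keeps the base layer decodable at
    # loss rates where a lightly protected enhancement layer already fails.
    for p in (0.05, 0.10, 0.30):
        print(p, layer_decode_prob(10, 6, p), layer_decode_prob(10, 9, p))

Assigning stronger codes to the layers that matter most for picture quality is what produces the graceful degradation reported above.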


Journal ArticleDOI
TL;DR: This paper presents an extensive survey of the development of neural networks for image compression, covering three categories: direct image compression by neural networks; neural network implementations of existing techniques; and neural network based technology that provides improvements over traditional algorithms.
Abstract: Apart from the existing technology on image compression represented by the series of JPEG, MPEG and H.26x standards, new technologies such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks to vector quantization have now become well established, and other aspects of neural network involvement in this area are stepping up to play significant roles in assisting those traditional technologies. This paper presents an extensive survey of the development of neural networks for image compression, covering three categories: direct image compression by neural networks; neural network implementations of existing techniques; and neural network based technology that provides improvements over traditional algorithms.

187 citations


Journal ArticleDOI
TL;DR: This paper presents a rate-distortion optimized mode selection method for packet lossy environments that takes into account the network conditions and the error concealment method used at the decoder.
Abstract: Reliable transmission of compressed video in a packet lossy environment cannot be achieved without error recovery mechanisms. We describe an effective method for increasing the error resilience of video transmission over packet lossy networks such as the Internet. Intra coding (without reference to a previous picture) is a well-known technique for eliminating temporal error propagation in a predictive video coding system. Randomly intra coding blocks increases error resilience to packet loss. However, when the error concealment used by the decoder is known, intra encoding following a method that optimizes the tradeoff between compression efficiency and error resilience is a better alternative. In this paper, we present a rate-distortion optimized mode selection method for packet lossy environments that takes into account the network conditions and the error concealment method used at the decoder. We present results for different packet loss rates and typical Internet packet sizes that illustrate the advantages of the proposed method.

156 citations
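
A minimal sketch of loss-aware mode selection in this general spirit (not the authors' exact formulation): the expected distortion of each macroblock mode weighs the decoded distortion by the arrival probability and the concealment distortion by the loss probability, and the mode minimizing the Lagrangian cost J = E[D] + lambda*R is chosen. The two-term loss model and all numbers are illustrative assumptions:

    def select_mode(modes, p_loss, lam):
        """Pick the coding mode minimizing expected Lagrangian cost
        J = E[D] + lambda * R, where E[D] mixes the decoded distortion
        (packet arrives) with the concealment distortion (packet lost)."""
        best = None
        for m in modes:
            expected_d = (1 - p_loss) * m["d_dec"] + p_loss * m["d_conc"]
            j = expected_d + lam * m["rate"]
            if best is None or j < best[0]:
                best = (j, m["name"])
        return best[1]

    # Intra costs more bits but conceals better and stops propagation;
    # as the loss rate grows, the expected-distortion term favours it.
    modes = [
        {"name": "inter", "d_dec": 20.0, "d_conc": 400.0, "rate": 80},
        {"name": "intra", "d_dec": 25.0, "d_conc": 150.0, "rate": 220},
    ]
    print(select_mode(modes, p_loss=0.02, lam=0.1))  # inter
    print(select_mode(modes, p_loss=0.20, lam=0.1))  # intra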


Journal ArticleDOI
TL;DR: A video-based, contact-free measurement system which allows combined tracking of the subject's eye positions and the gaze direction in near real time will be integrated into a new kind of autostereoscopic display which reproduces the limited depth of focus of human vision and shows comfortable, hologram-like scenes with motion parallax to subjects.
Abstract: We present a video-based, contact-free measurement system which allows combined tracking of the subject's eye positions and the gaze direction in near real time. This system will be integrated into a new kind of autostereoscopic display which supports the natural link between accommodation and convergence of human vision, reproduces the limited depth of focus of human vision and shows comfortable, hologram-like scenes with motion parallax to subjects.

126 citations


Journal ArticleDOI
TL;DR: This paper describes a real-time streaming solution suitable for non-delay-sensitive video applications such as video-on-demand and live TV viewing, and gives an overview of a recent activity within MPEG-4 video on the development of a fine-granular-scalability coding tool for streaming applications.
Abstract: Real-time streaming of audio-visual content over Internet Protocol (IP) based networks has enabled a wide range of multimedia applications. An Internet streaming solution has to provide real-time delivery and presentation of a continuous media content while compensating for the lack of Quality-of-Service (QoS) guarantees over the Internet. Due to the variation and unpredictability of bandwidth and other performance parameters (e.g. packet loss rate) over IP networks, in general, most of the proposed streaming solutions are based on some type of data loss handling method and a layered video coding scheme. In this paper, we describe a real-time streaming solution suitable for non-delay-sensitive video applications such as video-on-demand and live TV viewing. The main aspects of our proposed streaming solution are: 1. An MPEG-4 based scalable video coding method using both a prediction-based base layer and a fine-granular enhancement layer; 2. An integrated transport-decoder buffer model with priority re-transmission for the recovery of lost packets, and continuous decoding and presentation of video. In addition to describing the above two aspects of our system, we also give an overview of a recent activity within MPEG-4 video on the development of a fine-granular-scalability coding tool for streaming applications. Results for the performance of our scalable video coding scheme and the re-transmission mechanism are also presented. The latter results are based on actual testing conducted over Internet sessions used for streaming MPEG-4 video in real time.

112 citations


Journal ArticleDOI
TL;DR: The received video quality, as measured by PSNR and the Negsob measure, was significantly improved when the HiPP method was applied, and it is well suited for interactive and multicast applications.
Abstract: A method is proposed to protect MPEG video quality from packet loss for real-time transmission over the Internet. Because MPEG uses inter-frame coding, relatively small packet loss rates in IP transmission can dramatically reduce the quality of the received MPEG video. In the proposed high-priority protection (HiPP) method, the MPEG video stream is split into high- and low-priority partitions, using a technique similar to MPEG-2 data partitioning. Overhead resilient data for the MPEG video stream is created by applying forward error correction coding to only the high-priority portion of the video stream. The high- and low-priority data, and resilient data, are sent over a single channel, using a packetization method that maximizes resistance to burst losses, while minimizing delay and overhead. Because the proposed method has low delay and does not require re-transmission, it is well suited for interactive and multicast applications. Simulations were performed comparing the improvement in video quality using the HiPP method, using experimental Internet packet loss traces with loss rates in the range of 0–8.5%. Overhead resiliency data rates of 0%, 12.5%, 25%, and 37.5% were studied, with different compositions of the overhead data for the 25% and 37.5% overhead rates, in an attempt to find the “best” composition of the overhead data. In the presence of packet loss, the received video quality, as measured by PSNR and the Negsob measure, was significantly improved when the HiPP method was applied.

83 citations
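
A toy illustration of the HiPP idea of spending redundancy only on the high-priority partition, with a single XOR parity packet standing in for a real forward error correction code (so one lost HP packet per group can be rebuilt); the packet contents and grouping are invented for the example:

    def xor_parity(packets):
        """One XOR parity packet over equal-length packets: any single
        erasure in the group can be rebuilt from the survivors."""
        parity = bytearray(len(packets[0]))
        for pkt in packets:
            for i, b in enumerate(pkt):
                parity[i] ^= b
        return bytes(parity)

    def recover(received, parity):
        """received: the group with exactly one None (the erased packet)."""
        missing = received.index(None)
        rebuilt = bytearray(parity)
        for j, pkt in enumerate(received):
            if j != missing:
                for i, b in enumerate(pkt):
                    rebuilt[i] ^= b
        return bytes(rebuilt)

    hp = [b"hdr0", b"hdr1", b"hdr2"]   # high-priority partition packets
    par = xor_parity(hp)
    assert recover([hp[0], None, hp[2]], par) == hp[1]

The low-priority partition is sent unprotected, so the overhead budget is concentrated where a loss would be most damaging.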


Journal ArticleDOI
TL;DR: This paper addresses two important problems in motion analysis: the detection of moving objects and their localization and the labeling problem, which is solved using either iterated conditional modes (ICM) or highest confidence first (HCF) algorithms.
Abstract: In this paper we address two important problems in motion analysis: the detection of moving objects and their localization. A statistical approach is adopted in order to formulate these problems. For the first, the inter-frame difference is modelled by a mixture of two zero-mean generalized Gaussian distributions, and a Gibbs random field is used for describing the label set. A new method to determine the regularization parameter is proposed, based on a voting technique. This method is also modelled using a statistical framework. The solution of the second problem is based on the observation of only two successive frames. Using the results of change detection, an adaptive statistical model for the couple of image intensities is identified. For each problem two different multiscale algorithms are evaluated, and the labeling problem is solved using either the iterated conditional modes (ICM) or the highest confidence first (HCF) algorithm. To illustrate the efficiency of the proposed approach, experimental results are presented using synthetic and real video sequences.

73 citations
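
The first stage of the statistical model can be sketched as follows: a frame-difference value is labeled by comparing its posterior under the two zero-mean generalized Gaussian components, before any Gibbs/ICM spatial regularization is applied. All parameter values here are illustrative, not the paper's estimates:

    import math

    def gg_logpdf(x, alpha, beta):
        """Zero-mean generalized Gaussian log-density with scale alpha and
        shape beta (beta = 2 gives a Gaussian, beta = 1 a Laplacian)."""
        c = math.log(beta) - math.log(2 * alpha) - math.lgamma(1.0 / beta)
        return c - (abs(x) / alpha) ** beta

    def label_pixel(d, static=(2.0, 2.0), moving=(15.0, 1.0), prior_static=0.9):
        """Maximum a posteriori label for one frame-difference value d."""
        ls = math.log(prior_static) + gg_logpdf(d, *static)
        lm = math.log(1 - prior_static) + gg_logpdf(d, *moving)
        return "static" if ls >= lm else "moving"

    print(label_pixel(1.5), label_pixel(30.0))   # static moving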


Journal ArticleDOI
TL;DR: A novel technique is introduced to locate and track the facial area in videophone-type sequences and is robust with regard to different skin types, and various types of object or background motion within the scene.
Abstract: A novel technique is introduced to locate and track the facial area in videophone-type sequences. The proposed method essentially consists of two components: (i) a color processing unit, and (ii) a knowledge-based shape and color analysis module. The color processing component utilizes the distribution of skin-tones in the HSV color space to obtain an initial set of candidate regions or objects. The second component in the segmentation scheme, that is, the shape and color analysis module, is used to correctly identify and select the facial region in the case where more than one object has been extracted. A number of fuzzy membership functions are devised to provide information about each object's shape, orientation, location and average hue. An aggregation operator finally combines these measures and correctly selects the facial area. The suggested approach is robust with regard to different skin types, and various types of object or background motion within the scene. Furthermore, the algorithm can be implemented at a low computational complexity due to the binary nature of the operations involved. Experimental results are presented for a series of CIF and QCIF video sequences.

72 citations
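
The two-stage structure can be sketched roughly as below: an HSV skin-tone gate produces candidate pixels, and per-object fuzzy memberships are aggregated into a single face score. The thresholds and the plain product aggregation are stand-ins for the paper's membership functions and aggregation operator:

    def skin_candidate(h, s, v):
        """Crude HSV skin-tone gate (illustrative thresholds): skin hues
        cluster near red/orange with moderate saturation and sufficient
        brightness. h in [0, 360), s and v in [0, 1]."""
        hue_ok = h < 50 or h > 340
        return hue_ok and 0.1 < s < 0.7 and v > 0.3

    def face_score(memberships):
        """Aggregate fuzzy memberships (shape, orientation, location,
        average hue) into one score; a plain product is used here."""
        score = 1.0
        for m in memberships:
            score *= m
        return score

    objects = {"A": [0.9, 0.8, 0.7, 0.9], "B": [0.4, 0.9, 0.3, 0.5]}
    print(max(objects, key=lambda k: face_score(objects[k])))   # A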


Journal ArticleDOI
TL;DR: Two spatially adaptive error concealment algorithms for compressed video sequences are proposed, an iterative and a recursive one, together with an analysis of the need for an oriented high-pass operator and for changing the initial condition in the iterative regularized recovery algorithm.
Abstract: In this paper we propose an error concealment algorithm for compressed video sequences. For packetization and transmission, a two-layer ATM scheme is utilized so that the location of information loss is easily detected. The coded image can be degraded by channel errors, network congestion, and switching system problems. Seriously degraded images may therefore result from the loss of information represented by DCT coefficients and motion vectors, and from the inter-dependency of information in predictive coding. In order to solve the error concealment problem for intra frames, two spatially adaptive algorithms are introduced: an iterative and a recursive one. We analyze the necessity of the oriented high-pass operator we introduce, and the requirement of changing the initial condition in the iterative regularized recovery algorithm. The convergence of the iteration is also analyzed. In the recursive interpolation algorithm, the edge direction of the missing areas is estimated from the neighbors, and the estimated edge direction is used to steer the direction of interpolation. For recovery of lost motion vectors, an overlapped region matching algorithm is introduced. Several experimental results are presented.

63 citations
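
The motion vector recovery step can be sketched as follows: candidate vectors (taken, say, from neighbouring blocks) are scored by how well the motion-compensated region from the previous frame matches the correctly received pixels bordering the lost block. This one-sided simplification matches only the row just above the block; the paper's overlapped region matching is more elaborate:

    import numpy as np

    def recover_mv(prev, top, lx, ly, bs, candidates):
        """Pick the candidate motion vector (dx, dy) whose reference
        region in the previous frame best matches `top`, the received
        row of pixels just above the lost bs x bs block at (lx, ly)."""
        best_mv, best_cost = (0, 0), float("inf")
        for dx, dy in candidates:
            ref_row = prev[ly - 1 + dy, lx + dx: lx + dx + bs]
            cost = np.abs(ref_row - top).sum()
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
        return best_mv

    rng = np.random.default_rng(0)
    prev = rng.integers(0, 256, (64, 64)).astype(np.int32)
    lx, ly, bs = 10, 10, 8
    top = prev[ly + 2 - 1, lx + 2: lx + 2 + bs]   # consistent with mv (2, 2)
    print(recover_mv(prev, top, lx, ly, bs, [(0, 0), (2, 2), (-1, 3)]))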


Journal ArticleDOI
TL;DR: This work proposes a new H.263+ rate control scheme which supports the variable bit rate (VBR) channel through the adjustment of the encoding frame rate and quantization parameter and develops a fast algorithm based on the inherent motion information within a sliding window in the underlying video.
Abstract: Most existing H.263+ rate control algorithms, e.g. the one adopted in the test model of the near-term (TMN8), focus on the macroblock layer rate control and low latency under the assumptions of a constant frame rate and through a constant bit rate (CBR) channel. These algorithms do not accommodate the transmission bandwidth fluctuation efficiently, and the resulting video quality can be degraded. In this work, we propose a new H.263+ rate control scheme which supports the variable bit rate (VBR) channel through the adjustment of the encoding frame rate and quantization parameter. A fast algorithm for the encoding frame rate control based on the inherent motion information within a sliding window in the underlying video is developed to efficiently pursue a good tradeoff between spatial and temporal quality. The proposed rate control algorithm also takes the time-varying bandwidth characteristic of the Internet into account and is able to accommodate the change accordingly. Experimental results are provided to demonstrate the superior performance of the proposed scheme.

45 citations
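
A toy version of the sliding-window idea: average motion activity over the last few frames steers the encoding frame rate, trading temporal resolution against spatial quality. The window size, thresholds and frame rates are illustrative assumptions, not the paper's algorithm:

    from collections import deque

    class FrameRateController:
        """Sliding-window frame rate control: high average motion keeps
        the full frame rate (at a coarser quantizer), low motion drops
        frames and spends the bits on spatial quality instead."""
        def __init__(self, window=10):
            self.acts = deque(maxlen=window)

        def update(self, motion_activity):
            self.acts.append(motion_activity)
            avg = sum(self.acts) / len(self.acts)
            if avg > 8.0:
                return 30.0     # fast motion: preserve temporal quality
            if avg > 3.0:
                return 15.0
            return 7.5          # near-static: preserve spatial quality

    ctl = FrameRateController()
    for act in [1.0, 1.2, 9.0, 30.0, 31.0]:
        print(ctl.update(act))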


Journal ArticleDOI
TL;DR: This paper describes a procedure for model-based analysis and coding of both left and right channels of a stereoscopic image sequence by starting with a hierarchical dynamic programming technique for matching across the epipolar line for efficient disparity/depth estimation.
Abstract: This paper describes a procedure for model-based analysis and coding of both left and right channels of a stereoscopic image sequence. The proposed scheme starts with a hierarchical dynamic programming technique for matching across the epipolar line for efficient disparity/depth estimation. Foreground/background segmentation is initially based on depth estimation and is improved using motion and luminance information. The model is initialised by the adaptation of a wireframe model to the consistent depth information. Robust classification techniques are then used to obtain an articulated description of the foreground of the scene (head, neck, shoulders). The object articulation procedure is based on a novel scheme for the segmentation of the rigid 3D motion fields of the triangle patches of the 3D model object. Spatial neighbourhood constraints are used to improve the reliability of the original triangle motion estimation. The motion estimation and motion field segmentation procedures are repeated iteratively until a satisfactory object articulation emerges. The rigid 3D motion is then re-computed for each sub-object and finally, a novel technique is used to estimate the flexible motion of the nodes of the wireframe from the rigid 3D motion vectors computed for the wireframe triangles containing each specific node. The performance of the resulting analysis and compression method is evaluated experimentally.

Journal ArticleDOI
TL;DR: A new concealment technique is presented, which aims at restoring the lost visual information by using a synthetic reconstruction of the high-frequency content of the damaged blocks by using the spatial correlation of the available surrounding blocks.
Abstract: The transmission of coded visual information over packet networks introduces fidelity problems related to the loss of frames during transmission. In standard block-based coding, such losses result in an erroneous reconstruction of long block sequences, also due to the use of predictive and variable length source coding techniques. In video transmission, artifacts are even more visible due to the temporal propagation caused by prediction and interpolation schemes. In order to reduce the impact of these errors on visual quality, appropriate concealment algorithms should be applied, aimed at minimizing the appearance of block artifacts due to transmission errors. In this paper, a new concealment technique is presented, which aims at restoring the lost visual information by using a synthetic reconstruction of the high-frequency content of the damaged blocks. The method is founded on the theory of sketch-based encoders: for each block to be interpolated, the sketch information of the available surrounding blocks is extracted and propagated to the missing area. Then, the low-pass content is easily interpolated from the sketch. The proposed method uses only spatial correlation, and has been applied with good results in the transmission of video data over non-reliable packet networks.

Journal ArticleDOI
TL;DR: The performance of the MPEG-4 error resilient frame-based syntax, which addresses simple mobile communications, is studied with a meaningful set of test conditions and three different concealment schemes.
Abstract: This paper addresses the general problem of error resilience and concealment in the context of content-based video coding schemes, such as the new MPEG-4 video coding standard. In the first part of the paper, the most important error resilience tools available in the literature are studied, organized and classified according to three main classes, notably: error resilient source coding, channel coding and decoding, and error concealment. In the second part, the performance of the MPEG-4 error resilient frame-based syntax, which addresses simple mobile communications, is studied with a meaningful set of test conditions and three different concealment schemes.

Journal ArticleDOI
TL;DR: Simulations showed that motion compensated temporal error concealment provides the best results for P- and B-pictures, whereas a combination of temporal error concealment and re-synchronization or the use of shorter slices provides the best results for I-pictures.
Abstract: In terrestrial broadcasting the received power fluctuation might be on the order of 10 dB, so a non-hierarchical TV-transmission scheme may fail under bad reception conditions. A two-layer hierarchical transmission system, also specified by the European digital terrestrial TV-broadcasting standard, may help to achieve better reception, especially for fixed receivers. However, a portable/mobile receiver that only needs the base layer of this hierarchical signal may risk a total loss of picture. Since no time interleaving is foreseen in the standard, the situation is more critical in the case of shadowing, high Doppler (mobile reception), or impulsive noise, whereas even for fixed receivers with two hierarchical levels a good quality of service cannot be guaranteed. Therefore, in order to ensure a continuous service with higher quality, post-processing techniques such as error concealment at the receiver side are desirable. Even in the case of hierarchical transmission, these techniques may offer better results by concealing the errors instead of using only the most robust stream. The aim of this article is to study different error concealment techniques for the European digital terrestrial TV-broadcasting standard, where the source coding is based on MPEG-2. These techniques are based on temporal, spatial and frequency error concealment and on combinations of them. Simulations showed that motion compensated temporal error concealment provides the best results for P- and B-pictures, whereas a combination of temporal error concealment and re-synchronization or the use of shorter slices provides the best results for I-pictures.

Journal ArticleDOI
TL;DR: The technique is shown to provide superior decoded video quality over a wide variety of channel errors, bit-rates and test sequences, and an efficient search algorithm is presented to identify a motion marker with good error resilience properties.
Abstract: This paper presents an error concealment strategy for improving the quality of compressed video data transmitted over noisy communication channels. Data partitioning is used to enable the recovery of motion information when the compressed bit stream is corrupted by channel errors. At low bit-rates the motion data is a significant part of the entire video stream, and its recovery enables the decoder to perform motion compensated error concealment and hence maintain adequate video quality. The technique is shown to provide superior decoded video quality over a wide variety of channel errors, bit-rates and test sequences. In order to enable effective error concealment, the partitioned motion and texture data are separated by a motion marker. The motion marker needs to be distinguishable from any valid combination of the motion VLC data. In this paper, we also present an efficient search algorithm to identify such a motion marker with good error resilience properties. This error concealment technique was proposed to the ISO MPEG-4 standard and, based on its performance, it has been accepted as part of the evolving standard definition.

Journal ArticleDOI
TL;DR: A scalable and recursive video coding scheme is outlined which is compared successfully to a hybrid coding scheme based on block matching and it is shown experimentally that the motion can be compensated almost as well as in the original fields.
Abstract: In this correspondence, the problem of recursive motion estimation and compensation in image subbands is considered. A pel recursive algorithm is presented for this purpose and it is shown experimentally that the motion can be compensated almost as well as in the original fields. Based on this algorithm, a scalable and recursive video coding scheme is outlined which is compared successfully to a hybrid coding scheme based on block matching.

Journal ArticleDOI
TL;DR: This paper discusses the implementation of a client/server system for the real-time delivery of MPEG-2 encoded audio/video streams over IP networks based on the real-time transport protocol (RTP), designed using off-the-shelf hardware and commercial operating systems.
Abstract: Current Internet and intranet development is focusing attention on networked multimedia services involving the transport of real-time multimedia streams over IP. Several important networked applications and services such as Internet TV and high quality video conferencing are based on MPEG-2 audio and video streaming. In this paper, we discuss our implementation of a client/server system for the real-time delivery of MPEG-2 encoded audio/video streams over IP networks based on the real-time transport protocol (RTP). This system was designed using off-the-shelf hardware and commercial operating systems. We emphasized a cost-effective design that delivers acceptable quality audio and video under packet loss rates as high as 10⁻¹.
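
For concreteness, the fixed 12-byte RTP header that such a system prepends to each packet can be packed as below (per the RTP specification, RFC 1889; payload type 33 is the static assignment for MPEG-2 transport streams in RFC 2250). The payload bytes are placeholders:

    import struct

    def rtp_header(seq, timestamp, ssrc, payload_type=33, marker=0):
        """Fixed 12-byte RTP header: version 2, no padding, no extension,
        no CSRC entries. Payload type 33 = MPEG-2 transport stream."""
        byte0 = 2 << 6                      # V=2, P=0, X=0, CC=0
        byte1 = (marker << 7) | payload_type
        return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

    pkt = rtp_header(seq=1, timestamp=90000, ssrc=0x12345678) + b"<TS cells>"
    print(len(pkt))   # 12-byte header + placeholder payload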

Journal ArticleDOI
TL;DR: A new vector-based nonlinear conversion algorithm has been developed which applies nonlinear center weighted median (CWM) filters and yields very good interpolation quality; its vector error tolerance is derived and its advantages are shown.
Abstract: The conversion of one video standard into another with different field and scan rates is a key feature for modern TV receivers and multimedia video equipment. Therefore, a new vector-based nonlinear conversion algorithm has been developed which applies nonlinear center weighted median (CWM) filters and yields very good interpolation quality. One of the main properties of this algorithm is vector error tolerance. This property is derived in this paper and its advantages are shown. Assuming a 2-channel model of the human visual system with different spatio-temporal characteristics, there are contrary demands on the CWM filters. These demands can be met by a vertical band separation and the application of so-called temporally and spatially dominated CWMs. In this way, interpolation errors in the separated channels can be compensated by an adequate splitting of the spectrum. This yields a very robust, vector error tolerant upconversion method which significantly improves the interpolation quality. With an appropriate choice of the CWM filter root structures, the main picture elements are interpolated correctly even if faulty vector fields occur. To demonstrate the correctness of the derived interpolation scheme, picture content is classified into classes distinguished by correct or incorrect vector assignment and correlated or noncorrelated picture content. The mode of operation of the new algorithm is portrayed for each class: for correlated picture content it can be shown by object models, whereas for noncorrelated picture content it is shown by the output probability distribution function of the applied CWM filters. The new algorithm has been verified for 100 Hz upconversion by objective evaluation methods and by comprehensive subjective test series. Within these tests, for critical test sequences a gain of about 2 dB PSNR in the objective tests and about 0.4 evaluation grades in the subjective test series was achieved.
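
The CWM filter at the heart of the scheme is easy to state: the centre sample of the window is given extra weight by duplicating it before the median is taken, so the filter smooths less aggressively than a plain median and preserves detail the vector field got right. A minimal 1-D sketch:

    from statistics import median

    def cwm_filter(window, center_weight):
        """Center weighted median over an odd-length window: the centre
        sample is duplicated center_weight times before the median is
        taken. Weight 1 gives the plain median; large weights approach
        the identity (centre sample always wins)."""
        c = len(window) // 2
        return median(list(window) + [window[c]] * (center_weight - 1))

    print(cwm_filter([10, 200, 12], center_weight=1))   # 12: median smooths
    print(cwm_filter([10, 200, 12], center_weight=3))   # 200: centre kept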

Journal ArticleDOI
TL;DR: The use of various techniques for reconstructing regularly sampled images from irregularly spaced samples which result from the use of spatial transformations with forward mapping for motion compensation are discussed.
Abstract: Recently, there has been an increasing interest in the use of spatial transformations for motion-compensated prediction. A spatial transformation is a process by which the coordinates of pixels of one image are mapped to new coordinates to form another image. There are two methods by which a spatial transformation can be performed: forward or backward mapping. In most applications, backward mapping is preferred due to its simplicity and fast computation. Forward mapping, on the other hand, is generally avoided since it results in irregularly spaced samples which are difficult to use to reconstruct the desired regularly sampled image. The use of forward mapping spatial transformations, however, has the potential advantage of allowing adaptive motion compensation with no overhead compared to backward mapping. In this paper we discuss the use of various techniques for reconstructing regularly sampled images from the irregularly spaced samples which result from the use of spatial transformations with forward mapping for motion compensation. A more general problem in the use of spatial transformations for motion compensation is the huge computational load required to find the optimal spatial transformation motion parameters. We also present a new fast search algorithm for refining initial estimates of the spatial transformation motion parameters. Finally, we evaluate the performance of all the techniques and compare them against the conventional motion compensation algorithm: the block matching algorithm.
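
One simple member of the family of reconstruction techniques compared here is bilinear "splatting" with normalization: each forward-mapped sample deposits weighted intensity on its four neighbouring grid points, and the accumulators are normalized at the end. A hedged sketch of that accumulate-then-normalize structure (not the paper's preferred method):

    import numpy as np

    def splat_reconstruct(xs, ys, vals, h, w):
        """Rebuild an h x w image from irregularly placed samples by
        bilinear splatting; grid points reached by no sample keep
        weight 0 and would need separate hole filling."""
        acc = np.zeros((h, w))
        wgt = np.zeros((h, w))
        for x, y, v in zip(xs, ys, vals):
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            fx, fy = x - x0, y - y0
            for dy, dx, wt in ((0, 0, (1-fx)*(1-fy)), (0, 1, fx*(1-fy)),
                               (1, 0, (1-fx)*fy),     (1, 1, fx*fy)):
                yy, xx = y0 + dy, x0 + dx
                if 0 <= yy < h and 0 <= xx < w:
                    acc[yy, xx] += wt * v
                    wgt[yy, xx] += wt
        return np.divide(acc, wgt, out=np.zeros_like(acc), where=wgt > 0)

    img = splat_reconstruct([1.3, 2.7], [1.5, 1.5], [100.0, 50.0], 4, 4)
    print(img[1:3, 1:4].round(1))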

Journal ArticleDOI
TL;DR: A soft centroids method is proposed for binary vector quantizer design, in which the codevectors can take any real value between zero and one during the codebook generation process.
Abstract: A soft centroids method is proposed for binary vector quantizer design. Instead of using binary centroids, the codevectors can take any real value between zero and one during the codebook generation process. Binarization is performed only for the final codebook. The proposed method is successfully applied to three existing codebook generation algorithms: GLA, SA and PNN.
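
The core idea fits in a few lines: run ordinary GLA (k-means) on the binary training vectors, but let the codevectors stay real-valued in [0, 1] during training (cluster means of binary data naturally are), thresholding only when the final codebook is emitted. A minimal sketch with invented data:

    import numpy as np

    def soft_centroid_gla(data, k, iters=20, seed=0):
        """GLA for binary vector quantization with soft (real-valued)
        centroids; binarization happens only on the final codebook."""
        rng = np.random.default_rng(seed)
        cb = data[rng.choice(len(data), k, replace=False)].astype(float)
        for _ in range(iters):
            d = ((data[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
            part = d.argmin(axis=1)               # nearest-centroid partition
            for j in range(k):
                if (part == j).any():
                    cb[j] = data[part == j].mean(axis=0)   # stays in [0, 1]
        return (cb >= 0.5).astype(int)            # binarize last

    data = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]])
    print(soft_centroid_gla(data, k=2))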

Journal ArticleDOI
TL;DR: It is demonstrated that the quality of the lower resolution service when this approach is used is fundamentally limited by drift and it is also shown that a relatively low data rate correction signal can provide a significant increase in the quality.
Abstract: Data partitioning has been incorporated within the MPEG-2 video coding standard as one means of achieving error resilience in transmission systems where more than one transmission priority level is available. However, data partitioning also includes many ideas from the study of frequency scalability. In particular, the base partition information could be used to form a lower spatial resolution reconstruction of the original service, thus permitting service interworking. In this paper, we investigate service interworking using data partitioning and demonstrate that the quality of the lower resolution service obtained with this approach is fundamentally limited by drift. It is also shown that a relatively low data rate correction signal can provide a significant increase in the quality of this lower resolution service.

Journal ArticleDOI
TL;DR: Results indicate the range of cell loss rates over which each error resilience tool is useful and how the tools provided in MPEG-2 relate to other approaches suggested in the literature are discussed.
Abstract: The MPEG-2 video coding standard is being extensively used worldwide for the provision of digital video services. Many of these applications involve the transport of MPEG-2 video over cell-based (or packet) networks. Examples include the broadband integrated services digital network (B-ISDN) and wireless local area networks (W-LAN). All of these networks can suffer from the problem of cell loss. For example, in the B-ISDN cells are discarded at times of network congestion. In the case of the W-LAN, transmission effects can lead to high error rates within a cell which in turn can lead to cell loss. It was realised during the development of MPEG-2 that resilience to cell loss was an important issue and a number of techniques were suggested to minimise its impact. In this paper, we briefly review these techniques, demonstrate how each of the tools can be best utilised and provide results which indicate the range of cell loss rates over which each error resilience tool is useful. We also discuss briefly how the tools provided in MPEG-2 relate to other approaches suggested in the literature.

Journal ArticleDOI
TL;DR: The dual forward prediction is effective against bursty errors and improves the coding efficiency for some sequences, and the adaptive refreshing improves both PSNR and subjective quality.
Abstract: In this paper, we present self error resilient video coding schemes for low-bitrate mobile communication: dual forward prediction and adaptive refreshing. If a coded bit stream suffers from transmission errors, the reconstructed image is degraded and the degradation propagates temporally. Dual forward prediction minimizes the degradation by selectively using two prediction images. Adaptive refreshing quickly eliminates the degradation by concentrating the cyclic intra refreshing on the moving areas of the image. Simulations are conducted on MPEG-4 test sequences at 24 and 48 kbps. Experimental results show that dual forward prediction is effective against bursty errors and improves the coding efficiency for some sequences. Concerning adaptive refreshing, it is shown that both PSNR and subjective quality are improved. In conclusion, the proposed schemes are highly error resilient and suitable for low-bitrate mobile communication.

Journal ArticleDOI
TL;DR: Targeting multimedia communications over the Internet, a set of complementary techniques is described in the direction of both improved packet loss resiliency of video-compressed streams and efficient usage of available network resources.
Abstract: Targeting multimedia communications over the Internet, this paper describes a set of complementary techniques in the direction of both improved packet loss resiliency of video-compressed streams and efficient usage of available network resources. Aiming first at the best trade-off between compression efficiency and packet loss resiliency, a procedure for adapting the video coding modes to varying network characteristics is introduced. The coding mode selection is based on a rate-distortion procedure with global distortion metrics incorporating channel characteristics under the form of a two-state Markov model. This procedure has been incorporated in an MPEG-4 video encoder. It has been observed that, in error-free environments, the channel-adaptive mode selection technique does not bring any penalty in terms of compression with respect to the initial MPEG-4 encoder, while allowing a significant gain with respect to simple conditional replenishment. On the other hand, under the same loss conditions, it is shown that this procedure significantly improves the encoder's performance with respect to the original MPEG-4 encoder, approaching the robustness of conditional replenishment mechanisms. This intrinsic robustification of the encoder minimizes the effects of packet losses on the visual quality of the received video; however, it does not avoid losses. A rate-based flow control mechanism is then developed and introduced into the encoder, in order to match the bandwidth requirements of the source to the bandwidth available over the path of the connection, for both 'social' and 'individual' benefits. The control mechanism combines an RTT-based control loop allowing early reaction to congestion and a TCP-friendly rate prediction model that comes into play under lossy conditions. This hybrid control mechanism allows full rate control (even in loss-free conditions) and smooth rate variations together with high responsiveness. The introduction of rate control into the MPEG-4 compliant encoder maintains a stable PSNR and visual quality while significantly decreasing the source throughput, hence reducing the congestion and loss that the same video source would provoke at a constant bit-rate.
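
The two-state Markov channel used in the distortion metric is the classic Gilbert model; a small loss-trace generator under it looks as follows (the transition probabilities here are illustrative):

    import random

    def gilbert_losses(n, p_gb, p_bg, seed=1):
        """Two-state Markov (Gilbert) packet-loss trace: packets arrive
        in the Good state and are lost in the Bad state. p_gb and p_bg
        are the Good->Bad and Bad->Good transition probabilities; the
        mean loss rate is p_gb / (p_gb + p_bg), mean burst length 1 / p_bg."""
        rng = random.Random(seed)
        state, trace = "G", []
        for _ in range(n):
            trace.append(state == "B")          # True = packet lost
            if state == "G" and rng.random() < p_gb:
                state = "B"
            elif state == "B" and rng.random() < p_bg:
                state = "G"
        return trace

    trace = gilbert_losses(10000, p_gb=0.02, p_bg=0.4)
    print(sum(trace) / len(trace))   # close to 0.02 / 0.42, about 0.048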


Journal ArticleDOI
TL;DR: The design issues of an embedded SNR scalable MPEG-2 video encoder and also its associated error resilience decoding procedures are presented and a codeword divider is proposed to properly split quantization coefficients into two different layers.
Abstract: With the rapid development of digital video, noisy channels such as wireless and satellite links are able to transmit digital video for many applications. SNR scalable codec systems provide robust error resilience for these channels. This paper presents the design issues of an embedded SNR scalable MPEG-2 video encoder and its associated error resilience decoding procedures. Two existing cascade quantization designs are reviewed before we present further analysis of their performance and possible losses. Based on our observations on the losses of cascade quantization, a codeword divider is proposed to properly split the quantization coefficients into two different layers. The proposed encoder improves on the coding efficiency obtained by the cascade quantizers. Error resilience procedures are developed at the decoder side which fully utilize the properties of a scalable system. Simulation results demonstrate that the proposed codec achieves superior quality in decoded pictures, regardless of the bit-rate ratio between the layers, even during error bursts. Furthermore, since only one quantization device is needed, the encoder also has reduced hardware complexity.

Journal ArticleDOI
TL;DR: The proposed solution does not take into account the different delays that may be introduced between the different communication points, which may eventually result in jitter between different receivers in the reconstruction of the audio-visual streams.
Abstract: This paper deals with audio and video synchronization issues for real-time audio-visual communication over IP-based networks. Starting from the real-time transport protocol (RTP) specifications (Schulzrinne, 1995), it provides an accurate description of how to recover a reliable absolute time reference for audio and video signals from header information in RTP and RTP control protocol packets. This timing information allows both media to be synchronized within acceptable perceptual bounds for reconstruction at any receiver end, in a possibly multi-point videoconference. This holds independently of whether all (audio/video) packets reach the destination, or whether multiple replications of such packets arrive. The proposed solution does not take into account the different delays that may be introduced between the different communication points, which may eventually result in jitter between different receivers in the reconstruction of the audio-visual streams. Each receiver handles its reconstruction independently of any transmission/processing delay. In order to ensure a better quality of the reconstructed material, priority is given to audio information: if the audio stream is ahead of the video stream, the receiver simply discards video packets; conversely, when video is ahead of audio, the video decoding stage is paused until audio information arrives. Experimental simulations over a LAN have demonstrated the validity of the proposed approach.
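
The timestamp recovery described above rests on the (NTP, RTP) timestamp pair carried in each stream's RTCP sender reports: any RTP timestamp then maps to absolute time by linear extrapolation at the media clock rate, which puts audio and video on one time axis. A minimal sketch (32-bit timestamp wraparound is ignored for brevity):

    def rtp_to_wallclock(rtp_ts, sr_ntp, sr_rtp, clock_rate):
        """Map an RTP timestamp to absolute (NTP) time using the
        (NTP, RTP) pair from the stream's last RTCP sender report:
        wall = sr_ntp + (rtp_ts - sr_rtp) / clock_rate."""
        return sr_ntp + (rtp_ts - sr_rtp) / clock_rate

    # 90 kHz video and 8 kHz audio clocks are the typical RTP rates.
    v = rtp_to_wallclock(123000, sr_ntp=1000.0, sr_rtp=120000, clock_rate=90000)
    a = rtp_to_wallclock(8264, sr_ntp=1000.0, sr_rtp=8000, clock_rate=8000)
    print(v - a)   # skew in seconds; if video is ahead, pause video decoding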

Journal ArticleDOI
TL;DR: The proposed rate–quality tradeoff scheme has a higher, and also a more consistent, quality measure than the MPEG-2 Test Model 5 scheme, and opens a different video encoding arena compared to the traditional rate–distortion encoding scheme.
Abstract: This paper addresses a video coding scheme which controls the subjective video quality under a bit-rate constraint. First, a video quality measure based on block-based features, which has a high correlation with subjective test experiments, is presented. Four block-based features (the average of DFT differences, the standard deviation of DFT differences, the mean absolute deviation of cepstrum differences, and the variance of UVW differences) are extracted from the input and output video sequences and fed into a four-layer backpropagation neural network that has been trained by subjective testing. These features were selected out of a much larger set of candidate features, based on their combined ability to predict observer assessments of image quality. The quality measure is then introduced into the design of a rate–quality tradeoff MPEG encoder which is based on the non-convex quality and bitcount functions at the macroblock level. The relationships between one macroblock's quality, a frame's quality and the overall quality are also addressed. The proposed rate–quality tradeoff scheme has a higher, and also more consistent, quality measure than the MPEG-2 Test Model 5 scheme. The objective quality, as measured by the signal-to-noise ratio, is also improved, and the variations in signal-to-noise ratio and bit-rate from frame to frame are reduced. The results have also been validated by a subjective test experiment. The proposed rate–quality tradeoff scheme opens a different video encoding arena compared to the traditional rate–distortion encoding scheme.

Journal ArticleDOI
TL;DR: An error resilient video coding system is proposed which dynamically replaces reference pictures in inter-frame coding according to backward channel signaling; its performance is compared with the previous version of ITU-T H.263 by computer simulation.
Abstract: We have proposed an error resilient video coding system which dynamically replaces reference pictures in inter-frame coding according to backward channel signaling. This system can prevent temporal error propagation, because the encoder does not use an erroneous picture as the reference picture. The system has two modes: ACK mode, which is effective under more erroneous conditions, and NACK mode, which is effective under less erroneous conditions. In this paper, we explain both modes and focus on the mechanism for switching between them according to the error condition on the network, so that the proposed system achieves optimal performance under any error conditions. We also show, by computer simulation, the performance of our proposed system compared with the previous version of ITU-T H.263, which did not have a reference picture selection mode.

Journal ArticleDOI
TL;DR: Simulations of the different predictors for a variety of real world and medical images, evaluated both numerically and graphically, show the superiority of median based prediction over this proposed implementation of context model based prediction, for all resolutions.
Abstract: This paper presents a study of lossless image compression of fullband and subband images using predictive coding. The performance of a number of different fixed and adaptive predictors are evaluated to establish the relative performance of different predictors at various resolutions and to give an indication of the achievable image resolution for given bit rates. In particular, the median adaptive predictor is compared with two new classes of predictors proposed in this paper. One is based on the weighted median filter, while the other uses context modelling to select the optimum from a set of predictors. A graphical tool is also proposed to analyse the prediction methods. Simulations of the different predictors for a variety of real world and medical images, evaluated both numerically and graphically, show the superiority of median based prediction over this proposed implementation of context model based prediction, for all resolutions. The effects of different subband decomposition techniques are also explored.
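
The median adaptive predictor evaluated here is compactly expressible: predict the median of the west neighbour, the north neighbour, and the planar estimate w + n - nw. Near a strong vertical or horizontal edge the median falls back to w or n, and in smooth regions it tracks the local gradient. A short sketch:

    def median_adaptive_predict(w, n, nw):
        """Median adaptive prediction from the west (w), north (n) and
        north-west (nw) neighbours: the median of w, n and the planar
        estimate w + n - nw (the MED predictor of LOCO-I/JPEG-LS)."""
        return sorted([w, n, w + n - nw])[1]

    print(median_adaptive_predict(w=50, n=200, nw=200))    # 50: edge, keep w
    print(median_adaptive_predict(w=100, n=120, nw=105))   # 115: planar ramp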