
Showing papers in "Signal Processing-image Communication in 1997"


Journal ArticleDOI
TL;DR: The developments that make MPEG-4 necessary, what the standard does to address the new needs, and an overview of how the standard is developed are described.
Abstract: The MPEG-4 audiovisual representation standard, currently under development, addresses the needs and requirements arising from the increasing availability of audiovisual content in digital form. It goes beyond digitizing linear audio and video, specifying a description of digital material in the form of ‘objects’ that can be flexibly and interactively used and re-used. This paper describes the developments that make MPEG-4 necessary (‘Why’), what MPEG-4 does to address the new needs (‘What’), and gives an overview of how the standard is developed (‘How’).

171 citations


Journal ArticleDOI
TL;DR: The objective of this synthetic/natural hybrid coding (SNHC) is to facilitate content-based manipulation, interoperability, and wider user access in the delivery of animated mixed media, which will support non-real-time and passive media delivery, as well as more interactive, real-time applications.
Abstract: MPEG-4 addresses coding of digital hybrids of natural and synthetic, aural and visual (A/V) information. The objective of this synthetic/natural hybrid coding (SNHC) is to facilitate content-based manipulation, interoperability, and wider user access in the delivery of animated mixed media. SNHC will support non-real-time and passive media delivery, as well as more interactive, real-time applications. Integrated spatial-temporal coding is sought for audio, video, and 2D/3D computer graphics as standardized A/V objects. Targets of standardization include mesh-segmented video coding, compression of geometry, synchronization between A/V objects, multiplexing of streamed A/V objects, and spatial-temporal integration of mixed media types. Composition, interactivity, and scripting of A/V objects can thus be supported in client terminals, as well as in content production for servers, also more effectively enabling terminals as servers. Such A/V objects can exhibit high efficiency in transmission and storage, plus content-based interactivity, spatial-temporal scalability, and combinations of transient dynamic data and persistent downloaded data. This approach can lower bandwidth of mixed media, offer tradeoffs in quality versus update for specific terminals, and foster varied distribution methods for content that exploit spatial and temporal coherence over buses and networks. MPEG-4 responds to trends at home and work to move beyond the paradigm of audio/video as a passive experience to more flexible A/V objects which combine audio/video with synthetic 2D/3D graphics and audio.

135 citations


Journal ArticleDOI
TL;DR: The proposed scheme performs data encryption and compression simultaneously, and its security relies on the computational infeasibility of an exhaustive search approach.
Abstract: A private key encryption scheme for two-dimensional image data is proposed in this work. This scheme is designed on the basis of the lossless data compression principle. The proposed scheme performs data encryption and compression simultaneously. For the lossless data compression effect, the quadtree data structure is used to represent the image; for the encryption purpose, various scanning sequences of image data are provided. The scanning sequences comprise a private key for encryption. Twenty-four possible combinations of scanning sequences are defined for accessing the four quadrants, thereby making available 24^n × 4^(n(n−1)/2) possibilities to encode an image of resolution 2^n × 2^n. The security of the proposed encryption scheme therefore relies on the computational infeasibility of an exhaustive search approach. Three images of 512 × 512 pixels are used to verify the feasibility of the proposed scheme. The testing results and analysis demonstrate the characteristics of the proposed scheme. This scheme can be applied to problems of data storage or transmission in a public network.
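The size of the scanning-sequence key space can be checked directly. The sketch below assumes the reconstructed formula 24^n × 4^(n(n−1)/2) keys for a 2^n × 2^n image; the function name is illustrative, not from the paper.

```python
import math

def key_space(n: int) -> int:
    """Number of distinct scanning-sequence keys for a 2^n x 2^n image,
    assuming 24^n * 4^(n(n-1)/2) combinations as reconstructed above."""
    return 24 ** n * 4 ** (n * (n - 1) // 2)

# For the 512 x 512 test images (n = 9), exhaustive key search is infeasible:
print(f"512x512 key space is about 2^{math.log2(key_space(9)):.0f}")
```

Even at n = 9 the key space exceeds 2^113, which is the basis of the scheme's claimed security.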

120 citations


Journal ArticleDOI
TL;DR: The state of standardization of the Systems part of ISO/IEC JTC1/SC29/WG11 (MPEG-4 Systems), as specified in early 1997, is presented, and the language adopted by MPEG-4 for syntax description is fully detailed.
Abstract: The state of the standardization of the Systems part of ISO/IEC JTC1/SC29/WG11 (MPEG-4 Systems), as specified in early 1997, is presented. First, a rationale and the architecture of the complete Systems are described. Building on rapid technological advances in software and hardware, MPEG-4 Systems provides a framework for the integration of natural and synthetic streamed and synchronised media. The different fields of interest in Systems are then developed, from the description of audiovisual scenes to the definition of the multiplex. The programmability of the standard is described for composition and decompression, and the language adopted by MPEG-4 for syntax description is fully detailed. Finally, a conclusion and the future evolution of the specifications are presented.

97 citations


Journal ArticleDOI
TL;DR: An overview of version 4.0 of the Video VM in MPEG-4 is given; issues, algorithms, and major tools used in the development of this future video standard are discussed.
Abstract: MPEG-4 video aims at providing standardized core technologies allowing efficient storage, transmission and manipulation of video data in multimedia environments. This is a challenging task given the broad spectrum of requirements and applications in multimedia. In order to achieve this broad goal, rather than a solution for a narrow set of applications, functionalities common to clusters of applications are under consideration. Therefore, video group activities in MPEG-4 aim at providing solutions in the form of tools and algorithms enabling functionalities such as efficient compression, object scalability, spatial and temporal scalability, and error resilience. The standardized MPEG-4 video will provide a toolbox containing tools and algorithms bringing solutions to the above-mentioned functionalities and more. The current focus of the MPEG-4 video group is the development of the Video Verification Models. A Verification Model (VM) is a common platform with a precise definition of encoding and decoding algorithms which can be presented as tools addressing specific functionalities. It evolves through time by means of core experiments. New algorithms/tools are added to the VM and old algorithms/tools are replaced in the VM by successful core experiments. Until October 1996, the MPEG-4 video group focused its efforts on a single VM which gradually evolved from version 1.0 to version 4.0, and in the process addressed an increasing number of desired functionalities, namely, content-based object and temporal scalabilities, spatial scalability, error resilience, and compression efficiency. This paper gives an overview of version 4.0 of the Video VM in MPEG-4. In doing so, issues, algorithms, and major tools used in the development of this future video standard are discussed.

83 citations


Journal ArticleDOI
TL;DR: Results indicate that compression of both views of stereoscopic video of normal TV resolution appears feasible with good quality in a total of 6–8 Mbit/s and the combined disparity and motion-compensated prediction is found to offer the best performance among combinations tested.
Abstract: Many current as well as emerging applications in areas of entertainment, remote operations, manufacturing industry and medicine can benefit from the depth perception offered by stereoscopic video systems, which employ two views of a scene imaged under the constraints imposed by the human visual system. Among the many challenges to be overcome for practical realization and widespread use of 3D/stereoscopic systems are good systems for 3D video capture and display, and efficient techniques for digital compression of enormous amounts of data while maintaining compatibility with normal video decoding and display systems. After a brief introduction to the basics of 3D/stereo, including issues of depth perception, imaging and display, we present a brief overview of the portions of the MPEG-2 video standard that are relevant to our discussion on compression of stereoscopic video. Next, we outline the various approaches for compression of stereoscopic video and then focus on compatible stereoscopic video coding using MPEG-2 Temporal scalability concepts. Two types of prediction structure become possible in compatible coding: disparity-compensated prediction, and combined disparity- and motion-compensated prediction. To further improve coding performance and display quality, gain and offset preprocessing for reducing mismatch between the two views forming stereoscopic video is considered. We then introduce the various considerations in coding of stereoscopic video at lower bit-rates for the ongoing MPEG-4 standard. A method is proposed that builds on the proven framework of MPEG-2-like coding but introduces additional coding flexibilities to achieve reasonable performance at lower bit-rates for MPEG-4. Next, results of experiments are presented for a variety of combinations of MPEG-2 based coding methods for the left and the right views while employing TV resolution video for a number of sequences and for various bit-rates.
The combined disparity and motion-compensated prediction is found to offer the best performance among combinations tested. These results indicate that compression of both views of stereoscopic video of normal TV resolution appears feasible with good quality in a total of 6–8 Mbit/s. Further, results are presented at much lower bit-rates based on the coding method proposed for MPEG-4 on two long test sequences. We then discuss multi-viewpoint video applications, the ongoing efforts towards a multi-viewpoint profile in MPEG-2 and expected direction of multi-viewpoint video coding in MPEG-4.

66 citations


Journal ArticleDOI
TL;DR: Results demonstrate that the proposed class of error concealment algorithms provides significant robustness for MPEG video delivery in the presence of channel impairments, permitting useful operation at ATM cell-loss rates in the region of 10⁻⁴ to 10⁻³ and 10⁻² to 10⁻¹ for one- and two-tier transmission scenarios, respectively.
Abstract: This paper provides some research results on the topic of error resilience for robust decoding of MPEG (Moving Picture Experts Group) compressed video. It introduces and characterizes the performance of a general class of error concealment algorithms. Such receiver-based error concealment techniques are essential for many practical video transmission scenarios such as terrestrial HDTV broadcasting, packet network based teleconferencing/multimedia, and digital SDTV/HDTV delivery via ATM (asynchronous transfer mode). Error concealment is intended to ameliorate the impact of channel impairments (i.e., bit-errors in noisy channels, or cell-loss in ATM networks) by utilizing available picture redundancy to provide a subjectively acceptable rendition of affected picture regions. The concealment process must be supported by an appropriate transport format which helps to identify image pixel regions which correspond to lost or damaged data. Once the image regions (i.e., macroblocks, slices, etc.) to be concealed are identified, a combination of spatial and temporal replacement techniques may be applied to fill in lost picture elements. A specific class of spatio-temporal error concealment algorithms for MPEG video is described and alternative realizations are compared via detailed end-to-end simulations for both one- and two-tier transmission media. Several algorithm enhancements based on directional interpolation, ‘I-picture motion vectors’, and use of MPEG-2 ‘scalability’ features are also presented. In each case, achievable performance improvements are estimated via simulation. Overall, these results demonstrate that the proposed class of error concealment algorithms provides significant robustness for MPEG video delivery in the presence of channel impairments, permitting useful operation at ATM cell-loss rates in the region of 10⁻⁴ to 10⁻³ and 10⁻² to 10⁻¹ for one- and two-tier transmission scenarios, respectively.
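The spatial and temporal replacement techniques mentioned above can be sketched for a single lost macroblock. This is an illustrative simplification, not the paper's exact algorithm: temporal concealment copies the (optionally motion-shifted) co-located block from the previous decoded frame, while spatial concealment interpolates between the pixel rows bordering the lost block.

```python
import numpy as np

MB = 16  # macroblock size in pixels

def conceal_temporal(frame, prev_frame, mb_row, mb_col, mv=(0, 0)):
    """Replace the lost macroblock with the motion-shifted previous-frame block."""
    y, x = mb_row * MB + mv[0], mb_col * MB + mv[1]
    frame[mb_row * MB:(mb_row + 1) * MB,
          mb_col * MB:(mb_col + 1) * MB] = prev_frame[y:y + MB, x:x + MB]

def conceal_spatial(frame, mb_row, mb_col):
    """Linearly interpolate the lost macroblock from the rows above and below it."""
    cols = slice(mb_col * MB, (mb_col + 1) * MB)
    top_row = mb_row * MB - 1 if mb_row > 0 else (mb_row + 1) * MB
    bot_row = (mb_row + 1) * MB if (mb_row + 1) * MB < frame.shape[0] else top_row
    top, bot = frame[top_row, cols], frame[bot_row, cols]
    for i in range(MB):
        w = (i + 1) / (MB + 1)
        frame[mb_row * MB + i, cols] = (1 - w) * top + w * bot
```

A real decoder would choose between the two per macroblock, e.g. preferring temporal replacement except after a scene change, which is the kind of spatio-temporal combination the paper evaluates.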

62 citations


Journal ArticleDOI
TL;DR: Novel algorithms enable low-complexity encoding of polynomial motion vector fields that are optimal in the least-mean-square-error sense, simultaneously providing high coding efficiency and the ability to perform content-based encoding of video.
Abstract: This paper describes a method for motion-compensated compression of video sequences. The proposed method enables very efficient encoding of video frames partitioned into arbitrarily shaped regions. The implementation described in this paper employs simple quadtree-based segmentation of video frames and a polynomial model of the motion vector field. Polynomial motion vector fields are estimated using an iterative minimization method. Application of complex polynomial motion vector fields to coding of video requires their compact encoding, especially for low bit-rate applications. The paper proposes novel algorithms enabling low-complexity encoding of polynomial motion vector fields which are optimal in the least-mean-square-error sense. Specifically, an efficient method for merging of regions resulting from quadtree segmentation is presented. The advantage of this method is its ability to achieve a large reduction in the number of regions in typical coded sequences with only a minor increase of prediction error. An algorithm for reduction of the number of polynomial motion vector field coefficients is also presented. This algorithm enables adaptation of the motion vector field model to the complexity of motion in the coded scene. The problem of quantization of polynomial motion coefficients is also addressed. Performance of the proposed algorithms has been evaluated in a simple motion-compensated DCT codec and compared to an ITU-T H.263 codec. Simulations show that a consistent reduction of bit-rate in excess of 25% can be achieved over a wide range of sequences. Depending on the envisaged application, the described algorithms can be used together with segmentation of the desired accuracy and therefore simultaneously provide high coding efficiency and the ability to perform content-based encoding of video.
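The affine special case of such a polynomial motion vector field has the form dx = a0 + a1·x + a2·y, dy = a3 + a4·x + a5·y. The sketch below fits these six parameters to sparse displacement samples by least squares; this is an illustrative stand-in for the paper's iterative minimization method, and the function names are assumptions.

```python
import numpy as np

def fit_affine_motion(xs, ys, dxs, dys):
    """Least-squares affine motion parameters from sparse displacement samples."""
    A = np.column_stack([np.ones_like(xs), xs, ys])
    px, *_ = np.linalg.lstsq(A, dxs, rcond=None)  # [a0, a1, a2]
    py, *_ = np.linalg.lstsq(A, dys, rcond=None)  # [a3, a4, a5]
    return px, py

def motion_at(px, py, x, y):
    """Evaluate the fitted motion vector field at pixel (x, y)."""
    return px[0] + px[1] * x + px[2] * y, py[0] + py[1] * x + py[2] * y
```

Reducing the number of coefficients, as the paper's adaptation algorithm does, amounts to dropping columns of the design matrix (e.g. keeping only the constant term gives pure translation).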

51 citations


Journal ArticleDOI
TL;DR: The proposed methodology constitutes a unifying and powerful framework for multichannel signal processing, using fuzzy membership functions based on different distance measures among the image vectors to adapt to local data in the image.
Abstract: New filters for multichannel image processing are introduced and analysed. The proposed methodology constitutes a unifying and powerful framework for multichannel signal processing. The new filters use fuzzy membership functions based on different distance measures among the image vectors to adapt to local data in the image. Fuzzy aggregators are utilized to determine the weights in the proposed filter structure. The special case of colour image processing is studied as an important example of multichannel signal processing. Simulation results indicate that the new filters are computationally attractive and have excellent performance.
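One way such a filter can work is sketched below: each colour vector in the filter window receives a fuzzy membership weight that decays with its aggregate distance to the other vectors, so outliers (e.g. impulsive noise) contribute little to the weighted average. The exponential membership function and the beta parameter are illustrative choices, not the paper's specific aggregators.

```python
import numpy as np

def fuzzy_vector_filter(window, beta=1.0):
    """window: (N, C) array of colour vectors; returns the filtered vector."""
    diffs = window[:, None, :] - window[None, :, :]
    d = np.linalg.norm(diffs, axis=2).sum(axis=1)  # aggregate distance per vector
    mu = np.exp(-beta * d / (d.max() + 1e-12))     # fuzzy membership in (0, 1]
    w = mu / mu.sum()                              # normalised filter weights
    return w @ window
```

With a window of mostly similar vectors plus one outlier, the output stays close to the majority colour, unlike a plain arithmetic mean.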

48 citations


Journal ArticleDOI
TL;DR: Simulations show that the proposed motion estimation tool provides higher PSNR than the classical block matching algorithm and may achieve the optimal sharing between motion and error information encoding.
Abstract: This paper introduces a motion estimation tool based on triangular active mesh. This tool can be used to model the deformation of various kinds of objects, especially frames and arbitrarily shaped regions known as video object planes (VOPs) in the MPEG-4 context. In the latter case, a polygon approximation of the region is performed in order to define border nodes and to triangulate the whole considered domain. Object motion is represented by a piecewise affine transformation whose coefficients are estimated by means of motion estimation of triangle vertices. Within the context of very low bit-rate coding, this tool appears to be useful for image prediction, temporal interpolation and may achieve the optimal sharing between motion and error information encoding. Simulations show that the proposed motion estimation tool provides higher PSNR than the classical block matching algorithm.
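The piecewise affine transformation of one mesh triangle is fully determined by the estimated displacements of its three vertices. The sketch below recovers the 2×2 matrix and translation from the vertex correspondences; variable names are illustrative.

```python
import numpy as np

def affine_from_triangle(src, dst):
    """src, dst: (3, 2) arrays of triangle vertex positions before/after motion.
    Returns A (2x2) and t (2,) such that dst_i = A @ src_i + t for each vertex."""
    M = np.column_stack([src, np.ones(3)])  # one 3x3 linear system per coordinate
    coef = np.linalg.solve(M, dst)          # rows 0-1 hold A^T, row 2 holds t
    return coef[:2].T, coef[2]

def warp_point(A, t, p):
    """Apply the triangle's affine transform to a point inside the triangle."""
    return A @ np.asarray(p) + t
```

Every pixel inside the triangle is then warped with the same (A, t), which is what makes the motion field piecewise affine and continuous across shared mesh edges.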

37 citations


Journal ArticleDOI
TL;DR: A generic object-based analysis-synthesis coder using the source model of ‘moving flexible 3D objects’ for encoding moving images at very low data rates is extended to a knowledge-based coder for cases where scene-specific knowledge can be acquired.
Abstract: This paper investigates the extension of a generic object-based analysis-synthesis coder (OBASC), which uses the source model of ‘moving flexible 3D objects’ for encoding moving images at very low data rates, to a knowledge-based coder for cases where scene-specific knowledge can be acquired. According to the coding concept, the OBASC describes and encodes each moving object of an image sequence by three parameter sets defining its motion, shape and surface color. The parameter sets of each object are obtained by image analysis. Using the coded parameter sets, an image can be synthesized by model-based image synthesis. The coder switches to a knowledge-based coder as soon as scene-specific knowledge can be acquired. In the example given in this paper, an algorithm tries to detect faces within moving objects using template matching and feature extraction techniques. If a face is detected, the face model Candide is adapted to the image sequence and integrated into the 3D model object describing the moving object. Due to this knowledge of the scene, head and shoulders scenes can be more efficiently modeled and encoded. Furthermore, the knowledge of the scene contents is used for controlling the coder. Compared to OBASC, the knowledge-based coder reduces the bit-rate required for encoding head and shoulder scenes by 17%.

Journal ArticleDOI
TL;DR: A hybrid scheme is proposed which utilizes both SIC and block transform coding (BTC) methods, and has 35% less computational complexity than the full SIC approach, while it retains all the edge information produced by the segmentation algorithm.
Abstract: This paper experimentally compares several variants of segmented image coding (SIC), some of which are new. First it compares two methods for approximating the image intensity, with respect to quality of the reconstructed image, computational complexity and memory requirements. An extension of an existing segmentation algorithm (edgmentation) is introduced and compared with two other segmentation algorithms suitable for SIC. Finally, a hybrid scheme is proposed which utilizes both SIC and block transform coding (BTC) methods. The hybrid scheme has 35% less computational complexity than the full SIC approach, while it retains all the edge information produced by the segmentation algorithm.

Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed algorithm significantly outperforms Telenor's H.263 video coder both in terms of compression performance and computational complexity, while still producing a more error resilient bit stream.
Abstract: This paper introduces a new H.263-based video coding algorithm for operation at very low bit-rates. Although the algorithm is also based on block-based motion estimation/compensation and DCT coding, it is very different from conventional H.263 algorithms. Our algorithm employs (1) a rate-distortion-based mechanism to select amongst the H.263 macroblock coding types, (2) a fast median-based predictive motion searching technique, (3) a Lagrangian minimization for estimating the motion vectors, and (4) semi-fixed-length coders for coding the motion vectors and the DCT coefficients of the 8 × 8 motion-compensated prediction difference blocks. Experimental results indicate that the proposed algorithm significantly outperforms Telenor's H.263 video coder both in terms of compression performance and computational complexity, while still producing a more error resilient bit stream. Another important advantage of our algorithm is that the bit-rate, quality, and number of computations can be controlled through manipulating the Lagrangian and threshold parameters. This feature is usually desired in many very low bit-rate video communication applications due to power and mobility constraints.
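The rate-distortion-based mode selection in item (1) can be sketched as a Lagrangian minimization: among the candidate macroblock coding types, pick the one minimising J = D + λ·R. The mode names and cost values below are illustrative, not H.263 measurements.

```python
def rd_mode_decision(candidates, lam):
    """candidates: iterable of (mode, distortion, rate_bits) tuples.
    Returns the mode with minimal Lagrangian cost D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

# A larger lambda penalises rate more heavily, steering the coder toward
# cheaper modes: this is the lever behind the bit-rate/quality control
# mentioned in the abstract.
modes = [("INTRA", 40.0, 120), ("INTER", 55.0, 30), ("SKIP", 90.0, 1)]
```

For example, with λ = 0 the decision is distortion-only (INTRA here), while a very large λ forces the near-zero-rate SKIP mode.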

Journal ArticleDOI
TL;DR: The MPEG-4 video subjective tests were successful, providing the MPEG community with critical information to guide in the selection of technologies for inclusion in the video part of the MPEG- 4 standard.
Abstract: A new audio-visual coding standard, MPEG-4, is currently under development. MPEG-4 will address not only compression, but also completely new audio-video coding functionalities related to content-based interactivity and universal access. As part of the MPEG-4 standardization process, in November, 1995 assessments were performed on technologies proposed for incorporation in the standard. These assessments included formal subjective tests, as well as expert panel evaluations. This paper describes the MPEG-4 video formal subjective tests. Since MPEG-4 addresses new coding functionalities, and also operates at bit-rates lower than ever subjectively tested before on a large scale, standard ITU test methods were not directly applicable. These methods had to be adapted, and even new test methods devised, for the MPEG-4 video subjective tests. We describe here the test methods used in the MPEG-4 video subjective tests, how the tests were carried out, and how the test results were interpreted. We also evaluate the successes and shortcomings of the MPEG-4 video subjective tests, and suggest possible improvements for future tests. The MPEG-4 video subjective tests were successful, providing the MPEG community with critical information to guide in the selection of technologies for inclusion in the video part of the MPEG-4 standard.

Journal ArticleDOI
TL;DR: The proposed object-based stereo image coding algorithm is very efficient for applications like stereoscopic video transmission, and is especially suited to advanced applications such as generation and transmission of intermediate views for multiview receiver systems.
Abstract: In this paper we propose an object-based stereo image coding algorithm. The algorithm relies on modeling of the object structure using 3D wire-frame models, and motion estimation using globally rigid and locally deformable motion models. Algorithms for the estimation of motion and structure parameters from stereo images are described. Motion parameters are used to construct predicted images at subsequent time instances by mapping the image texture on the object surface. Coding of object parameters, appearing background regions and prediction errors is investigated and experimental results with video-conference scenes are presented. The proposed algorithm is very efficient for applications like stereoscopic video transmission, and is especially suited to advanced applications such as generation and transmission of intermediate views for multiview receiver systems, as well as applications in which an object-wise editing of the bit-stream is required, such as video-production using preanalysed scenes or virtual reality applications.

Journal ArticleDOI
TL;DR: This paper investigates shape estimation of articulated 3D objects for object-based analysis-synthesis coding based on the source model of ‘moving articulated 3D objects’ and introduces a new algorithm for object-articulation, which subdivides a rigid model object represented by a mesh of triangles into flexibly connected model object-components.
Abstract: This paper investigates shape estimation of articulated 3D objects for object-based analysis-synthesis coding based on the source model of ‘moving articulated 3D objects’. For shape estimation three steps are applied: shape-initialization, object-articulation and shape-adaptation. Here, a new algorithm for object-articulation is introduced. Object-articulation subdivides a rigid model object represented by a mesh of triangles into flexibly connected model object-components. For object articulation, neighboring triangles which exhibit similar 3D motion during the image sequence are clustered into patches. These patches are considered to be the model object-components. For 3D motion estimation of a single triangle, a more reliable algorithm is proposed. The reliability is measured by the probability of convergence to the correct parameters. To improve reliability, a more robust technique is applied and both the triangle and its neighborhood are evaluated by the estimation algorithm. For clustering, a frame-to-frame clustering method which considers clustering results obtained in previous frames is presented. The developed algorithm for object-articulation is incorporated in OBASC. Typical videophone test sequences were used. Compared to OBASC based on the source model of ‘moving rigid 3D objects’, the transmission rate decreases from 63.5 to 53 kbit/s at a fixed image quality measured by SNR. Furthermore, a realistic articulation of a model object ‘body’ into object-components ‘head’ and ‘shoulders’ can be achieved without a priori knowledge about the scene content.

Journal ArticleDOI
TL;DR: An analytical description of displacement estimation is presented that allows an objective evaluation and comparison of various displacement estimation techniques, and allows one to optimise a displacement estimation technique for motion-compensated image sequence coding.
Abstract: An analytical description of displacement estimation which allows an objective evaluation and comparison of various displacement estimation techniques is presented. For evaluation and comparison, rate-distortion functions are calculated; in this way, the impact of the 2D motion model of the displacement estimator and of the amplitude and spatial resolution of the estimated sparse displacement vector field can be quantified. The rate-distortion function describes the relationship between the encoding bit-rate required for transmission of the measured displacement vector field and the variance of the displacement error. The impact of the spatial and amplitude resolution of the sparse displacement vector field is described by sampling and quantisation of an exact displacement signal. The 2D motion model is accounted for by low-pass filtering the exact displacement signal before sampling and subsequent spatial interpolation. The analytical description is verified by simulations of various displacement estimation techniques, the 2D motion models of which can be described by nearest-neighbour, affine and bilinear displacement vector interpolation, at different amplitude and spatial resolutions. The analytical description allows one to optimise a displacement estimation technique for motion-compensated image sequence coding.

Journal ArticleDOI
TL;DR: Two hybrid codec systems to encode MPEG-II-based packet video are proposed: the first offers better concealment of impaired areas, making it more tolerant of higher cell-loss ratios, whereas the second generates fewer bits but suffers from temporal error propagation.
Abstract: This paper proposes two hybrid codec systems to encode MPEG-II-based packet video. Two-layer coding for ATM transmission has two primary disadvantages: (1) the total bit-rate is increased compared with one-layer coding at the same picture quality; (2) the cell loss caused by network traffic needs to be dealt with. To reduce the total bit-rate and avoid error propagation, we develop two different layer coding schemes and concealment methods for intracoded macroblocks (MBs) and intercoded MBs. The first proposed system has better concealment of impaired areas, which makes it more tolerant of higher cell-loss ratios, whereas the second generates fewer bits but has the disadvantage of temporal error propagation. In the experiments, we compare our method with previous ones (Ghanbari and Seferidis, 1993; Wada, 1989; Lee et al., 1993; Zhu et al., 1993) and illustrate that our two-layer codec gives a lower bit-rate and produces better error concealment.

Journal ArticleDOI
TL;DR: A combined source-channel coding scheme using multicarrier modulation achieves more than a 9 dB improvement over single-carrier systems in terms of the image signal-to-distortion ratio on very noisy channels.
Abstract: We have proposed and analyzed a combined source-channel coding scheme using multicarrier modulation. By changing the power and modulation of subchannels carrying different bits of the compressed signal, the channel-induced distortion can be minimized. An algorithm for the subchannel power allocation is derived. As an example, for DCT- and subband-coded images, multicarrier systems using uncoded BPSK achieve more than a 9 dB improvement over single-carrier systems in terms of the image signal-to-distortion ratio on very noisy channels.
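The idea of unequal power allocation across subchannels can be sketched as follows: bits of higher significance receive more transmit power so their error probability is lower. The log-proportional rule below is an illustrative heuristic standing in for the paper's derived optimal allocation.

```python
import math

def allocate_power(n_bits, total_power):
    """Per-bit subchannel powers, MSB first, summing to total_power."""
    # An error in bit k (MSB = bit 0) scales the squared distortion roughly
    # as 4^(n_bits-1-k); weight each bit by the log of that factor, floored
    # at 1 so the LSB still gets nonzero power. (Heuristic, not the paper's.)
    weights = [max(1.0, math.log(4.0 ** (n_bits - 1 - k))) for k in range(n_bits)]
    s = sum(weights)
    return [total_power * w / s for w in weights]
```

The monotone allocation (MSB strongest, LSB weakest) is what lets a multicarrier BPSK system keep the most distortion-critical bits reliable on very noisy channels.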

Journal ArticleDOI
TL;DR: A spatio-temporal simplification algorithm and a segmentation algorithm are presented for segmenting video images into a small number of regions; the simplification removes image details that are perceptually less significant to the human visual system and simplifies still objects while preserving sensitive moving objects.
Abstract: In the segmentation-based video compression approach, the number of regions is a basic constraint. This paper presents a spatio-temporal simplification algorithm and a segmentation algorithm in order to segment video images into a small number of regions. The simplification algorithm is based on mathematical morphology. It removes image details that are perceptually less significant to the human visual system and simplifies still objects (null motion) while preserving sensitive moving objects. The segmentation algorithm is based on region growing and region merging, which does not produce artificial boundaries. Experimental results show that, due to the simplification, several still objects may be segmented into one region so that the number of regions is greatly reduced. The segmentation results are well adapted to segmentation-based motion compensation for video compression.
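The region-merging step can be sketched as a one-pass union of 4-adjacent regions whose mean intensities differ by less than a threshold, so no artificial boundary is introduced inside a homogeneous area. The data structures are illustrative, and means are not re-estimated after a merge as a full implementation would do.

```python
import numpy as np

def merge_regions(labels, image, threshold):
    """labels: 2D int array of region ids; merges 4-adjacent similar regions.
    Returns a new label map (one-pass sketch; means not updated after merges)."""
    means = {r: image[labels == r].mean() for r in np.unique(labels)}
    parent = {r: r for r in means}

    def find(r):  # union-find with path halving
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    h, w = labels.shape
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w:
                    a, b = find(labels[y, x]), find(labels[ny, nx])
                    if a != b and abs(means[a] - means[b]) < threshold:
                        parent[b] = a
    return np.vectorize(find)(labels)
```

Raising the threshold merges more aggressively, directly trading boundary fidelity against the small region count that segmentation-based compression needs.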

Journal ArticleDOI
TL;DR: A new algorithm for tracking a face combining global head motion compensation and the update of the face model Candide during the sequence is proposed and the experimental results show that the proposed algorithm reduces the average position errors for the eyes and the mouth by 48% and 53%, respectively, compared to face tracking by global headmotion compensation only.
Abstract: Tracking a face is one of the important topics for knowledge-based coding of videophone sequences and also for the representation of 3D objects within MPEG-4 Synthetic/Natural Hybrid Coding (SNHC). Up to now, the face model has been tracked by global head motion compensation. Because the 3D head model shape affects the accuracy of motion estimation, an inaccurate head model shape reduces the accuracy of face tracking. In this paper, a new algorithm for tracking a face combining global head motion compensation and the update of the face model Candide during the sequence is proposed. As a first stage of the proposed algorithm, face tracking only by global head motion compensation is used. After that, the 2D center positions of the eyes and the mouth of a person in the image sequence are estimated using template matching and feature point extraction techniques. Then, the shape of the face model Candide is updated during the sequence using these estimated 2D center positions. This proposed algorithm has been applied to typical videophone sequences with a spatial resolution corresponding to CIF and a frame rate of 10 Hz. For evaluation, error criteria have been introduced which give position errors of the eyes and the mouth averaged over a whole sequence. The experimental results show that the proposed algorithm reduces the average position errors for the eyes and the mouth by 48% and 53%, respectively, compared to face tracking by global head motion compensation only.

Journal ArticleDOI
TL;DR: The results presented show that there exists a perceptually optimum set of traffic statistics which provide a consistent perceived decoded image quality for a given probability of cell loss.
Abstract: This paper describes a variable bit-rate (VBR) rate control algorithm for the MPEG-2 video coder. The algorithm described uses perceptually adaptive quantisation by varying the quantiser step-size of the coder according to the spatial correlation, temporal correlation and a prediction of the quantisation distortion of each macroblock. The algorithm also shapes the traffic characteristics of the output bit-stream of the coder by varying the quantiser step-size to ensure that the traffic statistics of the output bit-rate of the coder comply with a predetermined traffic descriptor. The results presented show that there exists a perceptually optimum set of traffic statistics which provide a consistent perceived decoded image quality for a given probability of cell loss.
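The perceptually adaptive quantisation described above — a finer quantiser where distortion is most visible — resembles the activity-based modulation of MPEG-2 Test Model 5. Below is a sketch of that TM5-style normalisation, not the paper's exact algorithm, which additionally uses temporal correlation and a prediction of the quantisation distortion.

```python
def adaptive_quant(base_q, activity, avg_activity):
    """TM5-style activity normalisation: macroblocks with low spatial activity
    (smooth areas, where artefacts are most visible) get a finer quantiser,
    busy areas a coarser one. The modulation factor is bounded in [0.5, 2]."""
    n_act = (2.0 * activity + avg_activity) / (activity + 2.0 * avg_activity)
    return base_q * n_act
```

A macroblock of average activity keeps the base step-size; a completely flat one has its step-size halved, and a very busy one approaches twice the base.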

Journal ArticleDOI
Jörn Ostermann1
TL;DR: This paper reports on the evaluation of tools submitted for evaluation in November 1995 and January 1996 and suggests areas of core experiments to improve a video verification model (VM) as soon as the VM becomes available.
Abstract: MPEG-4 issued two calls for proposals requesting submission of algorithms and tools relevant to the standardization of MPEG-4. This paper reports on the evaluation of tools submitted in November 1995 and January 1996; complete video coding schemes submitted in January 1996 are also covered. The goal of the evaluation was to cluster the tools according to the technical areas they address, to evaluate them according to the issues relevant to the standardization process, and finally to suggest areas of core experiments to improve a video verification model (VM) as soon as the VM became available. Altogether, MPEG evaluated 87 tools and 19 complete coding algorithms, most of them highlighted in this paper. During the evaluation, 19 areas for core experiments were identified. Each core experiment targets a different functionality, such as compression efficiency, content-based coding, error resilience or scalability. This definition of core experiments fostered close collaboration and cross-fertilization between organizations working on similar tools, which allowed the VM to progress much faster than expected.

Journal ArticleDOI
TL;DR: It turns out that for compression ratios on the order of 25:1 the reconstructed images are almost indistinguishable from the original ones, and that a good image quality is still achieved for ratios as high as 40:1.
Abstract: Multispectral images are formed by a large number of component images of a single subject taken in different spectral windows. They are often represented by tens or even hundreds of Mbits of data, and huge resources are required to transmit and store them, making some form of data compression necessary. To obtain a high compression efficiency, exploiting both the spatial and the spectral dependency, we propose two coding schemes based on vector quantization and address prediction, one more suited to the case of strong spectral dependence, and the other preferable in the case of strong spatial dependence. The performances of the proposed techniques are assessed by means of numerical experiments and compared to those of other techniques known in the literature. It turns out that for compression ratios on the order of 25:1 the reconstructed images are almost indistinguishable from the original ones, and that a good image quality is still achieved for ratios as high as 40:1.
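The vector-quantization core common to both proposed schemes can be sketched as nearest-codeword encoding: each input vector (e.g. a spatial or spectral block of pixels) is replaced by the index of its closest codebook entry. Codebook design and the address-prediction stage are omitted here.

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Map each input vector to the index of its nearest codeword (squared
    Euclidean distance). vectors: (n, d) array; codebook: (k, d) array."""
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def vq_decode(indices, codebook):
    """Reconstruct by looking up the codewords; only the indices need to be
    transmitted, which is where the compression comes from."""
    return codebook[indices]
```

Exploiting spectral dependency would mean forming the vectors across bands; exploiting spatial dependency, across neighbouring pixels within a band.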

Journal ArticleDOI
TL;DR: This paper describes how two different evaluation methods were used and adjusted to fit the different testing requirements of the new MPEG-4 standard, and how this first major effort to test coding schemes at low bit-rates proved successful.
Abstract: During December 1995, subjective tests were carried out by members of the Moving Picture Experts Group (MPEG, ISO/JTC1/SC29/WG11) to select the proposed technology for inclusion in the audio part of the new MPEG-4 standard. The new standard addresses coding for more than just the functionality of data rate compression. Material coded at very low bit-rates is also included. Thus, different testing methodologies were applied, according to ITU-R Rec. BS 1116 for a bit-rate of 64 kbit/s per channel and according to ITU-T Rec. P.80 for lower bit-rates or functionalities other than data rate compression. Proposals were subjectively tested for coding efficiency, error resilience, scalability and speed change: a subset of the MPEG-4 ‘functionalities’. This paper describes how two different evaluation methods were used and adjusted to fit the different testing requirements. This first major effort to test coding schemes at low bit-rates proved successful. Based on the test results, decisions for MPEG-4 technology were made. This was the first opportunity for MPEG members to carry out tests on the submitted functionalities. In the process, much was learnt. As a result, some suggestions are made to improve the way new functionalities can be subjectively evaluated.


Journal ArticleDOI
TL;DR: Implementation results show that by introducing three image classes and using a fuzzy classifier optimized by a genetic algorithm, the encoding process can be sped up by about 40% relative to an unclassified encoding system.
Abstract: This paper presents a fractal image compression scheme incorporating a fuzzy classifier that is optimized by a genetic algorithm. Fractal image compression requires finding, for each range block, a matching domain block among all the possible divisions of an image into subblocks. With suitable classification of the subblocks by a fuzzy classifier, we can reduce the search time for this matching process and thereby speed up encoding. Implementation results show that by introducing three image classes and using a fuzzy classifier optimized by a genetic algorithm, the encoding process can be sped up by about 40% relative to an unclassified encoding system.
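The search-reduction idea — compare a range block only against domain blocks of the same class — can be illustrated with a crisp variance-based classifier standing in for the paper's fuzzy, GA-optimised one. The three class names and the thresholds below are assumptions for illustration.

```python
import numpy as np

def classify_block(block, smooth_thresh=10.0, edge_thresh=1000.0):
    """Assign a block to one of three assumed classes by its variance:
    'smooth' (near-flat), 'texture' (moderate detail), 'edge' (high contrast)."""
    v = float(block.var())
    if v < smooth_thresh:
        return 'smooth'
    if v < edge_thresh:
        return 'texture'
    return 'edge'

def candidate_domains(range_block, domain_blocks):
    """Restrict the fractal matching search to domain blocks that share the
    range block's class, shrinking the search space."""
    c = classify_block(range_block)
    return [d for d in domain_blocks if classify_block(d) == c]
```

In the paper's scheme the class boundaries are fuzzy and tuned by a genetic algorithm rather than hard thresholds like these.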

Journal ArticleDOI
TL;DR: It is shown that the compression efficiency of a neural network depends on the normalization function used and that the new normalization functions consistently outperform the traditionalnormalization functions.
Abstract: Recently multilayer neural networks have been used for still picture compression. In these networks it is necessary to normalize the gray levels in the input picture before they are fed into the neural network. In this paper we investigate six different normalization functions, of which four are new and appear for the first time in this paper. We show that the compression efficiency of a neural network depends on the normalization function used and that the new normalization functions consistently outperform the traditional normalization functions.
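The paper's six normalisation functions are not reproduced in the abstract, but the general idea — mapping gray levels into the network's input range — can be sketched with the traditional linear map plus one illustrative nonlinear alternative. The tanh-based function below is an assumption for illustration, not one of the paper's functions.

```python
import math

def normalize_linear(gray, max_level=255.0):
    """Traditional linear normalisation of a gray level to [0, 1]."""
    return gray / max_level

def normalize_tanh(gray, max_level=255.0, gain=2.0):
    """Illustrative nonlinear normalisation (hypothetical): expands contrast
    around mid-gray and compresses the extremes, while still mapping the full
    range [0, max_level] onto [0, 1]."""
    x = gray / max_level - 0.5
    return math.tanh(gain * x) / (2.0 * math.tanh(gain / 2.0)) + 0.5
```

Both functions map 0 to 0, mid-gray to 0.5 and the maximum level to 1; they differ only in how they distribute the intermediate levels, which is exactly the degree of freedom the paper studies.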

Journal ArticleDOI
TL;DR: The use of Fuzzy Logic is used in order to model human subjective knowledge about the attribution of priorities to picture regions and an original bit-rate allocation method is proposed as a solution to the optimization problem posed by the rate-distortion theory.
Abstract: When video coders operate at very low bit-rates, it is often difficult for them to preserve an acceptable global quality of the images. Selecting key regions according to their semantic relevance may therefore help improve quality, as it permits an adaptive bit allocation, which is important for good subjective quality at (very) low bit-rates: the essential features can be extracted and coded with good quality, while the remaining portions of the image are coarsely transmitted. This paper describes tools to classify regions according to their subjective priority, and a generic algorithm to optimally share out the bit-rate. Among others, an important contribution of this paper is the use of Fuzzy Logic to model human subjective knowledge about the attribution of priorities to picture regions. An original bit-rate allocation method is also proposed as a solution to the optimization problem posed by rate-distortion theory. The paper concludes by presenting some preliminary results. (C) 1997 Elsevier Science B.V.
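A minimal sketch of priority-driven bit allocation: regions get shares of the budget in proportion to their priority weights. This simple proportional scheme is hypothetical and much cruder than the rate-distortion optimisation the paper actually proposes.

```python
def allocate_bits(total_bits, priorities, floor=0):
    """Give every region a floor of bits, then split the remainder in
    proportion to its priority weight. Returns integer bit counts whose
    sum never exceeds total_bits."""
    n = len(priorities)
    remainder = total_bits - floor * n
    s = float(sum(priorities))
    return [floor + int(remainder * p / s) for p in priorities]
```

Here a region judged three times as important as each of two others receives three times their bit share; in the paper those priorities would come from the fuzzy classification of regions.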

Journal ArticleDOI
TL;DR: A new hair model named the ‘fractional hair model’ is proposed to reduce the total quantity of hair data, and many human-hair CG images generated by this model are presented.
Abstract: In the fields of multimedia communication and very low bit-rate communication, natural and synthetic images must be generated by computer, and CG technology plays an important part in realizing such communication. Using a realistic human model built with CG technology, we can construct a new man-machine interface that is easy to use. Moreover, a CG image requires far less communication data than video image data; this reduction enables very low bit-rate and real-time communication. Recently, many kinds of objects have come to be generated realistically as CG technology has rapidly improved. However, human hair remains one of the most difficult objects to generate, because the number of hairs is enormous, each hair is extremely thin, and hairstyles differ greatly between individuals and are easily changed by external forces. In this paper, a new hair model named the ‘fractional hair model’ is proposed to reduce the total quantity of hair data, and many human-hair CG images generated by this model are also presented.