
Showing papers in "IEEE Transactions on Multimedia in 1999"


Journal ArticleDOI
TL;DR: This work presents a system that adapts multimedia Web documents to optimally match the capabilities of the client device requesting them, using a representation scheme called the InfoPyramid that provides a multimodal, multiresolution representation hierarchy for multimedia.
Abstract: Content delivery over the Internet needs to address both the multimedia nature of the content and the capabilities of the diverse client platforms the content is being delivered to. We present a system that adapts multimedia Web documents to optimally match the capabilities of the client device requesting them. This system has two key components: 1) a representation scheme called the InfoPyramid that provides a multimodal, multiresolution representation hierarchy for multimedia, and 2) a customizer that selects the best content representation to meet the client capabilities while delivering the most value. We model the selection process as a resource allocation problem in a generalized rate-distortion framework. In this framework, we address the issue of both multiple media types in a Web document and multiple resource types at the client. We extend this framework to allow prioritization of the content items in a Web document. We illustrate our content adaptation technique with a Web server that adapts multimedia news stories to clients as diverse as workstations, PDAs, and cellular phones.
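
As a rough, hypothetical illustration of the customizer's selection step (not the authors' algorithm), the sketch below picks one representation per content item from an InfoPyramid-style set of (resource cost, value) alternatives so as to maximize total value within a single client resource budget; the paper itself treats multiple resource types and content priorities in a generalized rate-distortion framework.

    from itertools import product

    def select_versions(items, budget):
        # items: one list of (cost, value) alternatives per content item
        best_value, best_choice = -1, None
        for choice in product(*items):          # brute force; fine for small documents
            cost = sum(c for c, _ in choice)
            value = sum(v for _, v in choice)
            if cost <= budget and value > best_value:
                best_value, best_choice = value, choice
        return best_choice, best_value

    # e.g. an image item (full / thumbnail / text caption) and a video item (clip / key frame)
    image = [(120, 10), (30, 6), (2, 2)]
    video = [(900, 20), (40, 8)]
    print(select_versions([image, video], budget=200))  # richest combination that fits the budget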

652 citations


Journal ArticleDOI
TL;DR: An efficient and reliable probabilistic metric derived from the Bhattacharyya distance is used to classify the extracted feature vectors into face or nonface areas, using prototype face area vectors acquired in a previous training stage.
Abstract: Detecting and recognizing human faces automatically in digital images strongly enhances content-based video indexing systems. In this paper, a novel scheme for human face detection in color images under nonconstrained scene conditions, such as the presence of a complex background and uncontrolled illumination, is presented. Color clustering and filtering using approximations of the YCbCr and HSV skin color subspaces are applied to the original image, providing quantized skin color regions. A merging stage is then iteratively performed on the set of homogeneous skin color regions in the color-quantized image in order to provide a set of potential face areas. Constraints related to the shape and size of faces are applied, and face intensity texture is analyzed by performing a wavelet packet decomposition on each face area candidate in order to detect human faces. The wavelet coefficients of the band-filtered images characterize the face texture, and a set of simple statistical deviations is extracted in order to form compact and meaningful feature vectors. Then, an efficient and reliable probabilistic metric derived from the Bhattacharyya distance is used to classify the extracted feature vectors into face or nonface areas, using prototype face area vectors acquired in a previous training stage.
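
A minimal sketch of the final classification idea, under my own simplification: feature vectors are normalized and compared to stored prototypes with a Bhattacharyya-style distance, and an area is declared a face if it lies close enough to some prototype. The exact metric, thresholds, and prototype vectors used in the paper may differ.

    import numpy as np

    def bhattacharyya_distance(p, q, eps=1e-12):
        p = p / (p.sum() + eps)
        q = q / (q.sum() + eps)
        bc = np.sum(np.sqrt(p * q))        # Bhattacharyya coefficient in [0, 1]
        return -np.log(bc + eps)

    def is_face(feature, face_prototypes, threshold=0.1):
        # feature, face_prototypes[i]: nonnegative wavelet-statistics vectors (hypothetical)
        d = min(bhattacharyya_distance(feature, proto) for proto in face_prototypes)
        return d < threshold               # close enough to some training prototype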

641 citations


Journal ArticleDOI
TL;DR: Results obtained with MPEG-4 test sequences and additional sequences show that the accuracy of object segmentation is substantially improved in the presence of moving cast shadows.
Abstract: To prevent moving shadows from being misclassified as moving objects or parts of moving objects, this paper presents an explicit method for detecting moving cast shadows on a dominating scene background. These shadows are generated by objects moving between a light source and the background. Moving cast shadows cause a frame difference between two succeeding images of a monocular video image sequence. For shadow detection, these frame differences are detected and classified into regions covered and regions uncovered by a moving shadow. The detection and classification assume a planar background and a nonnegligible size and intensity of the light sources. A cast shadow is detected by temporal integration of the covered background regions while subtracting the uncovered background regions. The shadow detection method is integrated into an algorithm for two-dimensional (2-D) shape estimation of moving objects from the informative part of the description of the international standard ISO/MPEG-4. The extended segmentation algorithm first compensates for apparent camera motion. Then, a spatially adaptive relaxation scheme estimates a change detection mask for two consecutive images. An object mask is derived from the change detection mask by eliminating changes due to background uncovered by moving objects and changes due to background covered or uncovered by moving cast shadows. Results obtained with MPEG-4 test sequences and additional sequences show that the accuracy of object segmentation is substantially improved in the presence of moving cast shadows. Objects and shadows are detected and tracked separately.
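
The temporal-integration step can be pictured with the simplified sketch below (thresholds and the planar-background assumption are glossed over, and this is not the paper's full relaxation scheme): regions that darken between frames are accumulated as newly shadow-covered, regions that brighten are subtracted as uncovered, and the running accumulator marks the current cast-shadow support.

    import numpy as np

    def update_shadow_mask(prev_frame, cur_frame, accumulator, t=15):
        # prev_frame, cur_frame: grayscale frames; accumulator: int32 array of the same shape
        diff = cur_frame.astype(np.int32) - prev_frame.astype(np.int32)
        covered = diff < -t           # luminance drops under a newly cast shadow
        uncovered = diff > t          # luminance recovers where the shadow moves on
        accumulator += covered.astype(np.int32)
        accumulator -= uncovered.astype(np.int32)
        np.clip(accumulator, 0, None, out=accumulator)
        return accumulator > 0        # boolean cast-shadow mask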

354 citations


Journal ArticleDOI
TL;DR: A point-to-point real-time video transmission scheme over the Internet is introduced, combining a low-delay TCP-friendly transport protocol with a novel compression method that is error resilient and bandwidth-scalable.
Abstract: We introduce a point-to-point real-time video transmission scheme over the Internet that combines a low-delay TCP-friendly transport protocol with a novel compression method that is error resilient and bandwidth-scalable. Compressed video is packetized into individually decodable packets of equal expected visual importance. Consequently, relatively constant video quality can be achieved at the receiver under lossy conditions. Furthermore, the packets can be truncated to instantaneously meet the time-varying bandwidth imposed by a TCP-friendly transport protocol. As a result, adaptive flows that are friendly to other Internet traffic are produced. Actual Internet experiments, together with simulations, are used to evaluate the performance of the compression, transport, and combined schemes.

318 citations


Journal ArticleDOI
TL;DR: This paper shows that incoming motion vectors become nonoptimal due to reconstruction errors, and proposes a fast-search adaptive motion vector refinement scheme that provides video quality comparable to that achieved by performing a new full-scale motion estimation, but with much less computation.
Abstract: In transcoding, simply reusing the motion vectors extracted from an incoming video bit stream may not result in the best quality. In this paper, we show that the incoming motion vectors become nonoptimal due to reconstruction errors. To achieve the best possible video quality, a new motion estimation should be performed in the transcoder. We propose a fast-search adaptive motion vector refinement scheme that is capable of providing video quality comparable to that achieved by performing a new full-scale motion estimation, but with much less computation. We also discuss the case in which some incoming frames are dropped for frame-rate conversion, and propose a motion vector composition method to compose a motion vector from the incoming motion vectors. The composed motion vector can also be refined using the proposed motion vector refinement scheme to achieve better results.
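
The refinement idea can be sketched as follows (a hedged illustration, not the paper's adaptive scheme): instead of a full search, only a small window around the incoming motion vector is examined, and the candidate with the lowest SAD against the reconstructed reference is kept.

    import numpy as np

    def sad(block, ref, x, y):
        h, w = block.shape
        return np.abs(block.astype(np.int32) - ref[y:y+h, x:x+w].astype(np.int32)).sum()

    def refine_mv(block, ref, x0, y0, mv, radius=2):
        # block: current macroblock at (x0, y0); mv: incoming (dx, dy) from the bitstream
        best_mv, best_cost = None, np.inf
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                x, y = x0 + mv[0] + dx, y0 + mv[1] + dy
                if (0 <= x and 0 <= y and
                        x + block.shape[1] <= ref.shape[1] and
                        y + block.shape[0] <= ref.shape[0]):
                    cost = sad(block, ref, x, y)
                    if cost < best_cost:
                        best_mv, best_cost = (mv[0] + dx, mv[1] + dy), cost
        return best_mv    # refined motion vector (None if every candidate falls outside ref)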

261 citations


Journal ArticleDOI
TL;DR: A new representation of audio noise signals is proposed, based on symmetric α-stable (SαS) distributions, in order to better model the outliers that exist in real signals.
Abstract: A new representation of audio noise signals is proposed, based on symmetric α-stable (SαS) distributions, in order to better model the outliers that exist in real signals. This representation addresses a shortcoming of the Gaussian model, namely, the fact that it is not well suited for describing signals with impulsive behavior. The α-stable and Gaussian methods are used to model measured noise signals. It is demonstrated that the α-stable distribution, which has heavier tails than the Gaussian distribution, gives a much better approximation to real-world audio signals. The significance of these results is shown by considering the time delay estimation (TDE) problem for source localization in teleimmersion applications. In order to achieve robust sound source localization, a novel time delay estimation approach is proposed. It is based on fractional lower order statistics (FLOS), which mitigate the effects of heavy-tailed noise. An improvement in TDE performance is demonstrated using FLOS that is up to a factor of four better than what can be achieved with second-order statistics.
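
A rough sketch of the FLOS idea (my own simplification, not the authors' estimator): both microphone signals are passed through a signed fractional-power nonlinearity with exponent p below the stable index, which tames heavy-tailed outliers, before an ordinary cross-correlation is used to pick the delay.

    import numpy as np

    def flos_delay(x, y, max_lag, p=0.8):
        # x, y: equal-length 1-D signals; returns the lag (in samples) maximizing the FLOS correlation
        fx = np.sign(x) * np.abs(x) ** p
        fy = np.sign(y) * np.abs(y) ** p
        lags = list(range(-max_lag, max_lag + 1))
        corr = [np.sum(fx[max(0, -l):len(fx) - max(0, l)] * fy[max(0, l):len(fy) - max(0, -l)])
                for l in lags]
        return lags[int(np.argmax(corr))]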

213 citations


Journal ArticleDOI
TL;DR: This work develops two techniques, an estimate approach and a learning approach, which are designed to optimize accurate recognition during the multimodal integration process, and evaluates these methods using Quickset, a speech/gesture multimodal system, reporting evaluation results based on an empirical corpus collected with Quickset.
Abstract: We present a statistical approach to developing multimodal recognition systems and, in particular, to integrating the posterior probabilities of parallel input signals involved in the multimodal system. We first identify the primary factors that influence multimodal recognition performance by evaluating the multimodal recognition probabilities. We then develop two techniques, an estimate approach and a learning approach, which are designed to optimize accurate recognition during the multimodal integration process. We evaluate these methods using Quickset, a speech/gesture multimodal system, and report evaluation results based on an empirical corpus collected with Quickset. From an architectural perspective, the integration technique presented offers enhanced robustness. It is also premised on more realistic assumptions than previous multimodal systems using semantic fusion. From a methodological standpoint, the evaluation techniques that we describe provide a valuable tool for evaluating multimodal systems.
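
One simple instance of posterior integration, offered only as an illustration (the paper's estimate and learning approaches determine how such combinations are weighted and trained): a weighted log-linear fusion over the candidate interpretations shared by the parallel recognizers. The command names and weights below are hypothetical.

    import math

    def fuse(speech_posteriors, gesture_posteriors, w_speech=0.6, w_gesture=0.4):
        # each argument: dict mapping a candidate interpretation to a nonzero posterior probability
        scores = {}
        for cand in set(speech_posteriors) & set(gesture_posteriors):
            scores[cand] = (w_speech * math.log(speech_posteriors[cand]) +
                            w_gesture * math.log(gesture_posteriors[cand]))
        return max(scores, key=scores.get) if scores else None

    print(fuse({"pan left": 0.7, "zoom": 0.3}, {"pan left": 0.4, "zoom": 0.6}))  # -> "pan left"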

197 citations


Journal ArticleDOI
TL;DR: This work develops robust computer vision methods to detect and track natural features in video images, representing a step toward integrating vision with graphics to produce robust wide-area augmented realities.
Abstract: Natural scene features stabilize and extend the tracking range of augmented reality (AR) pose-tracking systems. We develop robust computer vision methods to detect and track natural features in video images. Point and region features are automatically and adaptively selected for properties that lead to robust tracking. A multistage tracking algorithm produces accurate motion estimates, and the entire system operates in a closed-loop that stabilizes its performance and accuracy. We present demonstrations of the benefits of using tracked natural features for AR applications that illustrate direct scene annotation, pose stabilization, and extendible tracking range. Our system represents a step toward integrating vision with graphics to produce robust wide-area augmented realities.
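
The detect-then-track loop can be approximated with off-the-shelf tools; the sketch below is an OpenCV-based analogue (not the authors' feature selector or multistage tracker): corners that are good to track are selected adaptively, then followed frame to frame with pyramidal Lucas-Kanade optical flow, and the surviving matches feed the pose estimator.

    import cv2

    def track_features(prev_gray, cur_gray, prev_pts=None):
        if prev_pts is None or len(prev_pts) < 50:      # (re)select robust corner features
            prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                               qualityLevel=0.01, minDistance=10)
        cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, prev_pts, None)
        good = status.ravel() == 1
        return prev_pts[good], cur_pts[good]            # matched point pairs for pose estimation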

183 citations


Journal ArticleDOI
TL;DR: This paper compares the transmission schedules generated by the various smoothing algorithms, based on a collection of metrics that relate directly to the server, network, and client resources necessary for the transmission, transport, and playback of prerecorded video.
Abstract: The transfer of prerecorded, compressed variable-bit-rate video requires multimedia services to support large fluctuations in bandwidth requirements on multiple time scales. Bandwidth smoothing techniques can reduce the burstiness of a variable-bit-rate stream by transmitting data at a series of fixed rates, simplifying the allocation of resources in video servers and the communication network. This paper compares the transmission schedules generated by the various smoothing algorithms, based on a collection of metrics that relate directly to the server, network, and client resources necessary for the transmission, transport, and playback of prerecorded video. Using MPEG-1 and MJPEG video data and a range of client buffer sizes, we investigate the interplay between the performance metrics and the smoothing algorithms. The results highlight the unique strengths and weaknesses of each bandwidth smoothing algorithm, as well as the characteristics of a diverse set of video clips.
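
All of the compared algorithms must respect the same feasibility band; the hedged helper below states that constraint in code (using per-frame units and ignoring startup latency): the cumulative amount sent must stay at or above the cumulative playback curve (no underflow) and at or below that curve plus the client buffer (no overflow). The algorithms differ in how they choose constant-rate runs inside this band.

    def schedule_is_feasible(frame_sizes, rates, buffer_size):
        # frame_sizes[i]: bits consumed at frame i; rates[i]: bits sent during slot i
        sent = consumed = 0
        for size, rate in zip(frame_sizes, rates):
            sent += rate
            consumed += size
            if sent < consumed or sent > consumed + buffer_size:
                return False      # playback underflow or client buffer overflow
        return True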

137 citations


Journal ArticleDOI
TL;DR: A novel method for video analysis using the macroblock (MB) type information of MPEG compressed video bitstreams is developed, which exploits the comparison operations performed in the motion estimation procedure to achieve very fast scene change, gradual transition, flashlight, and caption detection.
Abstract: Efficient indexing methods are required to handle the rapidly increasing amount of visual information within video databases. Video analysis that partitions the video into clips or extracts interesting frames is an important preprocessing step for video indexing. We develop a novel method for video analysis using the macroblock (MB) type information of MPEG compressed video bitstreams. This method exploits the comparison operations performed in the motion estimation procedure, which result in specific characteristics of the MB type information when scene changes occur or special effects are applied. Only a simple analysis of the MB types in each frame is needed to achieve very fast scene change, gradual transition, flashlight, and caption detection. The advantages of this novel approach are its direct extraction from the MPEG bitstream after VLC decoding, very low-complexity analysis, frame-based detection accuracy, and high sensitivity.
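
A minimal sketch of the underlying intuition for cut detection (the paper's detector additionally uses skipped and forward/backward-predicted MB patterns to catch gradual transitions, flashlights, and captions): when a scene change hits a predicted frame, motion estimation fails for most macroblocks and the encoder falls back to intra coding, so a high intra-MB ratio flags a cut.

    def is_scene_change(mb_types, threshold=0.7):
        # mb_types: MB type labels for one P-frame, e.g. "intra", "inter", "skipped" (hypothetical labels)
        intra = sum(1 for t in mb_types if t == "intra")
        return intra / max(len(mb_types), 1) >= threshold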

94 citations


Journal ArticleDOI
TL;DR: It is argued that, in many applications, the MINMAX criterion is more appropriate than the more popular MINAVE criterion, and the way both criteria can be applied simultaneously within the same optimization framework is discussed.
Abstract: In this paper, we review a general framework for the optimal bit allocation among dependent quantizers based on the minimum maximum (MINMAX) distortion criterion. The pros and cons of this optimization criterion are discussed and compared to the well-known Lagrange multiplier method for the minimum average (MINAVE) distortion criterion. We argue that, in many applications, the MINMAX criterion is more appropriate than the more popular MINAVE criterion. We discuss the algorithms for solving the optimal bit allocation problem among dependent quantizers for both criteria and highlight the similarities and differences. We point out that any problem which can be solved with the MINAVE criterion can also be solved with the MINMAX criterion, since both approaches are based on the same assumptions. We discuss uniqueness of the MINMAX solution and the way both criteria can be applied simultaneously within the same optimization framework. Furthermore, we show how the discussed MINMAX approach can be directly extended to result in the lexicographically optimal solution. Finally, we apply the discussed MINMAX solution methods to still image compression, intermode frame compression of H.263, and shape coding applications.
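
For intuition only, the sketch below solves the MINMAX allocation for independent quantizers (the dependent case treated in the paper requires trellis/dynamic-programming machinery): bisect on a distortion ceiling and, for each candidate ceiling, give every quantizer the cheapest operating point meeting it; the smallest feasible ceiling under the bit budget is the MINMAX optimum. A budget sufficient for the coarsest operating points is assumed.

    def minmax_allocate(rd_points, bit_budget, iters=40):
        # rd_points: per quantizer, a list of (rate, distortion) operating points
        def min_bits_for(d_max):
            total = 0
            for points in rd_points:
                feasible = [r for r, d in points if d <= d_max]
                if not feasible:
                    return float("inf")
                total += min(feasible)
            return total

        lo, hi = 0.0, max(d for points in rd_points for _, d in points)
        for _ in range(iters):
            mid = (lo + hi) / 2
            if min_bits_for(mid) <= bit_budget:
                hi = mid          # feasible: try to lower the distortion ceiling
            else:
                lo = mid
        return hi                 # approximate MINMAX distortion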

Journal ArticleDOI
TL;DR: An overview of the AudioBIFS system, part of the Binary Format for Scene Description (BIFS) tool in the MPEG-4 International Standard, and the current state of implementations of the standard are described.
Abstract: We present an overview of the AudioBIFS system, part of the Binary Format for Scene Description (BIFS) tool in the MPEG-4 International Standard. AudioBIFS is the tool that integrates the synthetic and natural sound coding functions in MPEG-4. It allows the flexible construction of soundtracks and sound scenes using compressed sound, sound synthesis, streaming audio, interactive and terminal-dependent presentation, three-dimensional (3-D) spatialization, environmental auralization, and dynamic download of custom signal-processing effects algorithms. MPEG-4 sound scenes are based on a model that is a superset of the model in VRML 2.0, and we describe how MPEG-4 is built upon VRML and the new capabilities provided by MPEG-4. We discuss the use of structured audio orchestra language, the MPEG-4 SAOL, for writing downloadable effects, present an example sound scene built with AudioBIFS, and describe the current state of implementations of the standard.

Journal ArticleDOI
TL;DR: A new kind of human-computer interface allowing three-dimensional (3-D) visualization of multimedia objects and eye-controlled interaction is proposed; preliminary results show that most users are impressed by a 3-D graphical user interface and the possibility of communicating with a computer simply by looking at the object of interest.
Abstract: In this paper, a new kind of human-computer interface allowing three-dimensional (3-D) visualization of multimedia objects and eye-controlled interaction is proposed. In order to explore the advantages and limitations of the concept, a prototype system has been set up. The testbed includes a visual operating system for integrating novel forms of interaction with a 3-D graphical user interface, autostereoscopic (free-viewing) 3-D displays with close adaptation to the mechanisms of binocular vision, and solutions for nonintrusive eye-controlled interaction (video-based head and gaze tracking). The paper reviews the system's key components and outlines various applications implemented for user testing. Preliminary results show that most users are impressed by a 3-D graphical user interface and the possibility of communicating with a computer simply by looking at the object of interest. On the other hand, the results emphasize the need for a more intelligent interface agent to avoid misinterpretation of the user's eye-controlled input and to reset undesired activities.

Journal ArticleDOI
TL;DR: A video model that generates VBR MPEG video traffic based on a scene content description and can produce traffic for any type of video scene, ranging from low-complexity video conferencing to a highly active sports program.
Abstract: In this paper, we propose a video model that generates VBR MPEG video traffic based on a scene content description. Long sessions of nonhomogeneous video clips are decomposed into homogeneous video shots. The shots are then classified into different classes in terms of their texture and motion complexity. Each shot class is uniquely described by an autoregressive model. Transitions between the shots and their durations have been analyzed. Unlike many classical video source models, this model may be used to generate traffic for any type of video scene, ranging from low-complexity video conferencing to a highly active sports program. The performance of the model is evaluated by measuring the mean cell delay when the generated video traffic is fed to an ATM multiplexer buffer.
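
For illustration only (the parameters below are invented, and the paper fits its models to real shots of each texture/motion class), a per-class AR(1) generator of frame bit rates looks like this; a full traffic trace would concatenate shots whose classes and durations follow the measured transition statistics.

    import random

    def generate_shot(n_frames, mean, rho, sigma, seed=None):
        # mean/rho/sigma: class-dependent AR(1) mean, correlation, and innovation scale
        rng = random.Random(seed)
        rate, trace = mean, []
        for _ in range(n_frames):
            rate = mean + rho * (rate - mean) + rng.gauss(0, sigma)
            trace.append(max(rate, 0.0))    # bit rates cannot go negative
        return trace

    # e.g. a low-activity videoconference shot vs. a high-activity sports shot (made-up parameters)
    print(generate_shot(5, mean=1.0, rho=0.9, sigma=0.05, seed=1))
    print(generate_shot(5, mean=4.0, rho=0.6, sigma=0.8, seed=1))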

Journal ArticleDOI
TL;DR: This paper proposes a new scheduling algorithm for EC-MAC, a low-power medium access control protocol for wireless and mobile ATM networks, that can deal with burst errors and location-dependent channel errors.
Abstract: This paper describes the design and analysis of the scheduling algorithm for energy-conserving medium access control (EC-MAC), which is a low-power medium access control (MAC) protocol for wireless and mobile ATM networks. We evaluate the scheduling algorithms that have been proposed for traditional ATM networks. Based on the structure of EC-MAC and the characteristics of the wireless channel, we propose a new algorithm that can deal with burst errors and location-dependent errors. Most scheduling algorithms proposed for either wired or wireless networks have been analyzed with homogeneous traffic or with multimedia services under simplified traffic models. We analyze our scheduling algorithm with more realistic multimedia traffic models based on H.263 video traces and self-similar data traffic. One of the key goals of the scheduling algorithm is simplicity and fast implementation. Unlike timestamp-based algorithms, our algorithm does not need to sort virtual times, and thus its complexity is reduced significantly.

Journal ArticleDOI
TL;DR: A new parallel transmission framework for reliable multimedia data transmission over spectrally shaped channels using multicarrier modulation to transmit source data layers of different perceptual importance in parallel, each occupying a number of subchannels.
Abstract: This paper presents a new parallel transmission framework for reliable multimedia data transmission over spectrally shaped channels using multicarrier modulation. We propose to transmit source data layers of different perceptual importance in parallel, each occupying a number of subchannels. New loading algorithms are developed to efficiently allocate the available resources, e.g., transmitted power and bit rate, to the subchannels according to the source layers they transmit. Instead of making the bit error rate of all the subchannels equal, as in most existing loading algorithms, the proposed algorithm assigns different error performance to the subchannels to achieve unequal error protection for different layers. The channel-induced distortion is minimized in the mean-square sense. We show that the proposed system applies equally well to fixed-length and variable-length coding. Asymptotic gains with respect to channel distortion are also derived. Numerical examples show that the proposed algorithm achieves significant performance improvement compared to existing work, especially for the spectrally shaped channels commonly encountered in ADSL systems.
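
A hedged sketch of greedy bit loading with unequal protection under the usual SNR-gap approximation (not the paper's exact algorithms): each subchannel has a gain-to-noise ratio and a gap that is made larger for subchannels carrying important layers, forcing a lower error rate there, and bits are assigned one at a time wherever the incremental power cost is smallest.

    def greedy_loading(gains, gaps, total_bits):
        # gains[i]: gain-to-noise ratio of subchannel i; gaps[i]: SNR gap (linear), larger for important layers
        bits = [0] * len(gains)
        power = [0.0] * len(gains)
        for _ in range(total_bits):
            # power to carry b bits in the gap approximation: (2**b - 1) * gap / gain,
            # so adding one bit to subchannel i costs an extra (2**bits[i]) * gaps[i] / gains[i]
            inc = [(2 ** bits[i]) * gaps[i] / gains[i] for i in range(len(gains))]
            i = min(range(len(gains)), key=lambda k: inc[k])
            bits[i] += 1
            power[i] += inc[i]
        return bits, power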

Journal ArticleDOI
TL;DR: A new divide-and-conquer technique for disparity estimation is proposed, which performs feature matching following the high confidence first principle, starting with the strongest feature point in the stereo pair of scanlines.
Abstract: A new divide-and-conquer technique for disparity estimation is proposed in this paper. This technique performs feature matching following the high-confidence-first principle, starting with the strongest feature point in the stereo pair of scanlines. Once the first matching pair is established, the ordering constraint in disparity estimation allows the original intra-scanline matching problem to be divided into two smaller subproblems. Each subproblem can then be solved recursively until there is no reliable feature point within the subintervals. This technique is very efficient for dense disparity map estimation for stereo images with rich features. For general scenes, this technique can be paired with the disparity-space image (DSI) technique to compute dense disparity maps with integrated occlusion detection. In this approach, the divide-and-conquer part of the algorithm handles the matching of stronger features, while the DSI-based technique handles the matching of pixels in between feature points and the detection of occlusions. An extension to the standard disparity-space technique is also presented to complement the divide-and-conquer algorithm. Experiments demonstrate the effectiveness of the proposed divide-and-conquer DSI algorithm.
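
A rough sketch of the recursion on one scanline pair, assuming precomputed feature positions with strength scores and a black-box match() cost (both hypothetical here); the real algorithm adds ordering-constraint checks and hands featureless intervals to the DSI stage.

    def match_interval(left_feats, right_feats, match, min_strength=10.0):
        # left_feats, right_feats: lists of (position, strength), sorted by position
        if not left_feats or not right_feats:
            return []
        # highest confidence first: pick the strongest left feature in this interval
        i, (lpos, strength) = max(enumerate(left_feats), key=lambda e: e[1][1])
        if strength < min_strength:
            return []                      # too weak: leave this interval to the DSI stage
        j = min(range(len(right_feats)), key=lambda k: match(lpos, right_feats[k][0]))
        rpos = right_feats[j][0]
        # the ordering constraint splits the problem into two independent subintervals
        return (match_interval(left_feats[:i], right_feats[:j], match, min_strength)
                + [(lpos, rpos)]
                + match_interval(left_feats[i + 1:], right_feats[j + 1:], match, min_strength))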

Journal ArticleDOI
John R. Smith
TL;DR: VideoZoom provides a new and useful system for accessing video over the Internet in applications where streaming methods provide insufficient video quality, video downloading introduces large latencies, and generating video summaries is difficult or not well integrated with video retrieval tasks.
Abstract: We describe a system for browsing and interactively retrieving video over the Internet at multiple spatial and temporal resolutions. The VideoZoom system enables users to start with coarse, low-resolution views of the sequences and selectively zoom in, in space and time. VideoZoom decomposes the video sequences into a hierarchy of view elements, which are retrieved in a progressive fashion. The client browser incrementally builds the views by retrieving, caching, and assembling the view elements, as needed. By integrating browsing and retrieval into a single progressive retrieval paradigm, VideoZoom provides a new and useful system for accessing video over the Internet. VideoZoom is suitable for digital video libraries and a number of other applications in which streaming methods provide insufficient video quality, video downloading introduces large latencies, and generating video summaries is difficult or not well integrated with video retrieval tasks.

Journal ArticleDOI
TL;DR: This paper presents a knowledge-based framework to capture metarepresentations for real-life video with human walkers that models the human body as an articulated object and the human walking as a cyclic activity with highly correlated temporal patterns.
Abstract: Extracting human representations from video has vast applications. In this paper, we present a knowledge-based framework to capture metarepresentations for real-life video with human walkers. The system models the human body as an articulated object and the human walking as a cyclic activity with highly correlated temporal patterns. We extract for each of the body parts its motion, shape, and texture. Once available, this structural information can be used to manipulate or synthesize the original video sequence, or animate the walker with a different motion in a new synthesized video.

Journal ArticleDOI
TL;DR: This paper estimates that a software reference implementation of an MPEG-4 video encoder typically requires five Gtransfers/s to main memory for simple profile, level L2, and applies the ACROPOLIS methodology to relieve this data access bottleneck, arriving at an implementation that needs a factor of 65 fewer background accesses.
Abstract: Data transfers and storage are crucial cost factors in multimedia systems. Systematic methodologies are needed to obtain dramatic reductions in terms of power, area, and cycle count. Upcoming multimedia processing applications will require high memory bandwidth. In this paper, we estimate that a software reference implementation of an MPEG-4 video encoder typically requires five Gtransfers/s to main memory for simple profile, level L2. This shows a clear need for optimization and for the use of intermediate memory stages. By applying our ACROPOLIS methodology, developed mainly to relieve this data access bottleneck, we have arrived at an implementation that needs a factor of 65 fewer background accesses. In addition, we show that memory transfers can be reduced substantially without sacrificing speed (even gaining about 10% in cache misses and cycles on a DEC Alpha) through aggressive source code transformations.

Journal ArticleDOI
TL;DR: The loss behavior encountered in transmitting real-time voice over the Internet is explored, and a new loss-concealment scheme is proposed to improve the received quality; the scheme can be extended to various interleaving factors and interpolation-based reconstruction methods.
Abstract: We explore the loss behavior encountered in transmitting real-time voice over the Internet and propose a new loss-concealment scheme to improve its received quality. One known technique to conceal loss is to send interleaved streams of voice samples and reconstruct missing or late samples by interpolation at the receiver. Based on this method, we propose a new transformation-based reconstruction algorithm. Its basic idea is for the sender to transform an input voice stream, according to the interpolation method used at the receiver and the predicted loss behavior, before interleaving the stream. The transformation is derived by minimizing reconstruction error in case of loss. We show that our method is computationally efficient and can be extended to various interleaving factors and interpolation-based reconstruction methods. Finally, we show performance improvements of our method by testing it over the Internet.
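
A minimal sketch of the interleaving-plus-interpolation baseline that the transformation-based scheme builds on (the packet layout and the simple neighbour averaging below are my own simplifications): interleaving spreads consecutive samples over k packets so that a lost packet removes only every k-th sample, and the receiver fills the gaps from the neighbours that did arrive.

    def interleave(samples, k):
        return [samples[i::k] for i in range(k)]       # packet i carries samples i, i+k, i+2k, ...

    def reconstruct(packets, k, n):
        # packets: output of interleave() with lost packets replaced by None; n = len(original samples)
        out = [None] * n
        for i, pkt in enumerate(packets):
            if pkt is not None:
                out[i::k] = pkt
        for idx, v in enumerate(out):                  # fill gaps by averaging the nearest neighbours
            if v is None:
                prev = next((out[j] for j in range(idx - 1, -1, -1) if out[j] is not None), 0)
                nxt = next((out[j] for j in range(idx + 1, n) if out[j] is not None), prev)
                out[idx] = (prev + nxt) / 2
        return out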

Journal ArticleDOI
TL;DR: Experimental results with a moving virtual object mixed into real video telephone sequences show that the virtual object appears natural, with the same shading and shadows as the real objects.
Abstract: In applications of augmented reality such as virtual studio TV production, multisite video conferencing using a virtual meeting room, and synthetic/natural hybrid coding according to the new ISO/MPEG-4 standard, a synthetic scene is mixed into a natural scene to generate a synthetic/natural hybrid image sequence. For realism, the illumination in both scenes should be identical. In this paper, the illumination of the natural scene is estimated automatically and applied to the synthetic scene. The natural scenes are restricted to scenes with nonoccluding, simple, moving, mainly rigid objects. For illumination estimation, these natural objects are automatically segmented in the natural image sequence and three-dimensionally (3-D) modeled using ellipsoid-like models. The 3-D shape, 3-D motion, and the displaced frame difference between two succeeding images are evaluated to estimate three illumination parameters. The parameters describe a distant point light source and ambient light. Using the estimated illumination parameters, the synthetic scene is rendered and mixed into the natural image sequence. Experimental results with a moving virtual object mixed into real video telephone sequences show that the virtual object appears natural, with the same shading and shadows as the real objects. Further, the shading and shadows allow the viewer to understand the motion trajectories of the objects much better.

Journal ArticleDOI
TL;DR: This paper reports on an ARAM chip that has been designed and fabricated in a 0.5-μm CMOS technology and features an equivalent resolution of up to 7 bits, measured by comparing the reconstructed waveform with the original input signal.
Abstract: Data compression, data coding, and communications in object-oriented multimedia applications like telepresence, computer-aided medical diagnosis, or telesurgery require an enormous computing power, on the order of trillions of operations per second (TeraOPS). Compared with conventional digital technology, cellular neural/nonlinear network (CNN)-based computing is capable of realizing these TeraOPS-range image processing tasks in a cost-effective implementation. To exploit the computing power of the CNN Universal Machine (CNN-UM), the CNN chipset architecture has been developed: a mixed-signal hardware platform for CNN-based image processing. One of the nonstandard components of the chipset is the cache memory of the analog array processor, the analog random access memory (ARAM). This paper reports on an ARAM chip that has been designed and fabricated in a 0.5-μm CMOS technology. The chip consists of a fully addressable array of 32×256 analog memory registers and has a packing density of 637 analog memory cells/mm². Random and nondestructive access to the memory contents is available. Bottom-plate sampling techniques have been employed to eliminate harmonic distortion introduced by signal-dependent feedthrough. Signal coupling and interaction have been minimized by proper layout measures, including the use of protection rings and separate power supplies for the analog and digital circuitry. This prototype features an equivalent resolution of up to 7 bits, measured by comparing the reconstructed waveform with the original input signal. Measured access times for writing to and reading from the memory registers are 200 ns. I/O rates via the 16-line-wide I/O bus exceed 10 Msamples/s. Storage time at room temperature is in the 80 to 100 ms range, without accuracy loss.

Journal ArticleDOI
TL;DR: The dynamic nature of rate reduction is investigated, since any prolonged impairment is likely to be noticeable, and the utility of the proposed scheme is measured by its ability to multiplex a large number of streams effectively.
Abstract: We have proposed a smoothing and rate adaptation algorithm, SAVE (Smoothed Adaptive Video over Explicit rate networks), for the transport of compressed video over rate-controlled networks. SAVE attempts to preserve quality as much as possible and exercises control over the source rate only when essential to prevent unacceptable delay. In order to understand the impact of rate adaptation on quality, we have evolved the quality metrics typically used to evaluate the efficacy of mechanisms for transporting video. We investigate the dynamic nature of rate reduction: any prolonged impairment is likely to be noticeable. We study the sensitivity of SAVE to its parameters and to network characteristics. Finally, the utility of the proposed scheme is measured by its ability to multiplex a large number of streams effectively. Our evaluations are based on experiments with 20 traces of entertainment videos using different compression algorithms.

Journal ArticleDOI
TL;DR: A modeling methodology that captures the spatio-temporal relationships between objects and user interaction is proposed, along with a number of scheduling algorithms that periodically allocate MPEG-4 objects to multiple workstations while ensuring load balancing and synchronization among the objects.
Abstract: MPEG-4, currently being finalized by the Moving Picture Experts Group of the ISO, is a multimedia standard that aims to support content-based coding of audio, text, image, and video (synthetic and natural) data, multiplexing of coded data, and composition and representation of audiovisual scenes. One of the most critical components of an MPEG-4 environment is the system encoder. An MPEG-4 scene may contain several audio and video objects, images, and text, each of which must be encoded individually and then multiplexed to form the system bitstream. Due to its flexible features, object-based nature, and provision for user interaction, the MPEG-4 encoder is highly suitable for a software-based implementation. A full-scale software-based MPEG-4 system encoder with real-time encoding speed is a nontrivial task and requires massive computation. We have built such an encoder using a cluster of workstations collectively working as a virtual parallel machine. Parallel processing of an MPEG-4 encoder needs to be carried out carefully, as objects may appear or disappear dynamically in a scene. In addition, objects may be synchronized with each other. User interactions may also prohibit a straightforward parallelization. We propose a modeling methodology that captures the spatio-temporal relationships between the various objects and user interaction. We then propose a number of scheduling algorithms that periodically allocate MPEG-4 objects to multiple workstations while ensuring load balancing and synchronization requirements among multiple objects. Each scheduling algorithm has its own performance and complexity characteristics. The experimental results, while showing real-time encoding rates, exhibit tradeoffs between load balancing, scheduling overhead cost, and global performance.
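
As a toy illustration of the per-period allocation problem (ignoring the synchronization constraints and object dynamics the paper's schedulers handle), a longest-processing-time-first greedy assigns this period's objects, with estimated encoding costs, to workstations so that the most loaded machine stays as light as possible. The object names and costs below are made up.

    import heapq

    def assign_objects(object_costs, n_workers):
        # object_costs: dict of object name -> estimated encoding cost (hypothetical units)
        loads = [(0.0, w, []) for w in range(n_workers)]    # (load, worker id, assigned objects)
        heapq.heapify(loads)
        for obj, cost in sorted(object_costs.items(), key=lambda kv: -kv[1]):
            load, w, objs = heapq.heappop(loads)            # currently least-loaded workstation
            heapq.heappush(loads, (load + cost, w, objs + [obj]))
        return {w: objs for _, w, objs in loads}

    print(assign_objects({"video_obj": 9.0, "audio_obj": 2.0, "text_obj": 1.0, "background": 5.0}, 2))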

Journal ArticleDOI
TL;DR: A random timestamping procedure based on a random telegraph process is analyzed and lower bounds on the rate of PCR polarity changes are obtained such that the recovered clock does not violate the PAL/NTSC clock specifications.
Abstract: We propose several strategies for performing timestamping of an MPEG-2 Transport Stream transmitted over a packet-switched network using the PCR-unaware encapsulation scheme, and analyze their effect on the quality of the recovered clock at the MPEG-2 Systems decoder. When the timestamping scheme is based on a timer with a fixed period, the PCR values in the packet stream may switch polarity deterministically, at a frequency determined by the timer period and the transport rate of the MPEG signal. This, in turn, can degrade the quality of the recovered clock at the receiver beyond acceptable limits. We consider three timestamping schemes for solving this problem: (1) selecting a deterministic timer period to avoid the phase difference in PCR values altogether, (2) fine-tuning the deterministic timer period to maximize the frequency of PCR polarity changes, and (3) selecting the timer period randomly to eliminate the deterministic PCR polarity changes. For the case of a deterministic timer period, we derive the frequency of the PCR polarity changes as a function of the timer period and the transport rate, and use it to find ranges of the timer period for acceptable quality of the recovered clock. We also analyze a random timestamping procedure based on a random telegraph process and obtain lower bounds on the rate of PCR polarity changes such that the recovered clock does not violate the PAL/NTSC clock specifications. The analytical results are verified by simulations with both synthetic and actual MPEG-2 Transport Streams sent to a simulation model of an MPEG-2 Systems decoder.

Journal ArticleDOI
TL;DR: A "selective packet retransmission" scheme for improving HTTP/TCP performance when transmitting through ATM networks, which takes advantage of the property of humans' perception tolerance for errors to determine whether to retransmit a corrupted TCP segment or not.
Abstract: Transmission control protocol/Internet protocol (TCP/IP) is the de facto standard of the networking world. It dynamically adjusts routing of packets to accommodate failures in channels and allows construction of very large networks with little central management. But IP packets are based on the datagram model and are not really suited to real-time traffic. In order to overcome the drawbacks, a new network technology, ATM, is proposed. ATM provides quality of service (QOS) guarantees for various classes of applications and in-order delivery of packets via connection oriented virtual circuits. Unfortunately, when ATM is to be internetworked with the existing network infrastructure, some special signaling, addressing and routing protocols are needed. IP over ATM is one of the methods proposed by IETF. It allows existing TCP/IP applications to run on ATM end-stations and ATM networks to interconnect with legacy LAN/WAN technologies. But the performance of TCP/IP over ATM leaves something to be desired. Partial packet discard (PPD) and early packet discard (EPD) are two schemes to improve its performance. This paper proposes a "selective packet retransmission" scheme for improving HTTP/TCP performance when transmitting through ATM networks. In selective packet retransmission, we take advantage of the property of humans' perception tolerance for errors to determine whether to retransmit a corrupted TCP segment or not. For lossable data, such as images, when an error occurs because of cell losses, it will not be retransmitted. The simulations show that, for the same buffer size and traffic load, selective packet retransmission performs better than PPD, EPD, and plain TCP over ATM.