Michael G. Perkins
Other affiliations: Stanford University
Bio: Michael G. Perkins is an academic researcher from Scientific Atlanta. The author has contributed to research in topics: Encoder & Discrete cosine transform. The author has an h-index of 9 and has co-authored 9 publications receiving 512 citations. Previous affiliations of Michael G. Perkins include Stanford University.
TL;DR: It is proved that the rate distortion limit for coding stereopairs cannot in general be achieved by a coder that first codes and decodes the right picture sequence independently of the left picture sequence, and then codes and decodes the left picture sequence given the decoded right picture sequence.
Abstract: Two fundamentally different techniques for compressing stereopairs are discussed. The first technique, called disparity-compensated transform-domain predictive coding, attempts to minimize the mean-square error between the original stereopair and the compressed stereopair. The second technique, called mixed-resolution coding, is a psychophysically justified technique that exploits known facts about human stereovision to code stereopairs in a subjectively acceptable manner. A method for assessing the quality of compressed stereopairs is also presented. It involves measuring the ability of an observer to perceive depth in coded stereopairs. It was found that observers generally perceived objects to be further away in compressed stereopairs than they did in originals. It is proved that the rate distortion limit for coding stereopairs cannot in general be achieved by a coder that first codes and decodes the right picture sequence independently of the left picture sequence, and then codes and decodes the left picture sequence given the decoded right picture sequence.
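Mixed-resolution coding, as described above, keeps one view at full resolution and codes the other at reduced resolution, relying on the dominance of the sharper eye's view. A minimal sketch of the idea follows; the decimation factor, the nearest-neighbour upsampler, and the function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def mixed_resolution_encode(left, right, factor=2):
    # Keep the left view at full resolution; decimate the right view,
    # exploiting the viewer's tolerance for one blurred view.
    right_low = right[::factor, ::factor]
    return left, right_low

def mixed_resolution_decode(left, right_low, factor=2):
    # Crude nearest-neighbour upsampling back to full size.
    right_up = np.repeat(np.repeat(right_low, factor, axis=0), factor, axis=1)
    return left, right_up

left = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
right = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
l, r_low = mixed_resolution_encode(left, right)
_, r_hat = mixed_resolution_decode(l, r_low)
# Raw sample count drops from 2*64*64 to 64*64 + 32*32.
```

A factor of 2 in each dimension already cuts the second view's raw samples by 75%, which is why even this simple scheme yields a large rate saving before any transform coding is applied.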
10 Nov 1997
TL;DR: In this article, a method for dynamically allocating bandwidth to each encoder in an ensemble of video encoders whose output bit streams share a single communications channel is proposed. The allocation reduces differences in a quality measure that combines a PSNR-like term with a visual-masking term.
Abstract: The present invention relates to a method for dynamically allocating bandwidth to each encoder in an ensemble of video encoders whose output bit streams share a single communications channel. In accordance with the present invention, the channel bandwidth is allocated to the individual encoders in the ensemble in such a way that differences in a quality measure among the decoders are reduced. The quality measure includes a term that behaves like a peak-signal-to-noise ratio (PSNR) and a term that measures the "masking effect" in a video signal. The "masking effect" results because an encoded frame with a high visual complexity masks coding artifacts from the viewer when it is decoded and displayed.
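The abstract describes allocating a shared channel so that differences in a quality measure among the encoders are reduced. One way to realize this is a greedy equal-quality allocation: repeatedly hand a small slice of the channel to the encoder whose modelled quality is currently lowest. The quality model below, `10*log10(rate/complexity)`, is a stand-in assumption, since the patent's exact PSNR-plus-masking measure is not fully specified in the abstract.

```python
import math

def allocate(channel_rate, complexities, step=1e-3):
    n = len(complexities)
    rates = [step] * n                      # seed each encoder to avoid log(0)
    increments = round((channel_rate - n * step) / step)
    for _ in range(increments):
        # Modelled per-encoder quality (illustrative stand-in for the
        # patent's PSNR-plus-masking measure).
        q = [10 * math.log10(r / c) for r, c in zip(rates, complexities)]
        rates[q.index(min(q))] += step      # feed the worst-off encoder
    return rates

rates = allocate(1.0, [1.0, 2.0])
```

Under this model, equalizing quality drives each rate proportional to its complexity, so the encoder carrying twice the complexity ends up with roughly two thirds of the channel.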
TL;DR: This paper discusses rate control for statistically multiplexed (stat mux) systems, derives the fundamental equations governing encoder and decoder buffer fullness, and examines two fundamental approaches to stat mux rate control.
Abstract: This paper discusses rate control for statistically multiplexed (stat mux) systems and derives the fundamental equations governing encoder and decoder buffer fullness. An example of how these equations can be used to determine the sizes of the encoder and decoder buffers is presented. Two fundamental approaches to stat mux rate control are also discussed: the look-ahead approach and the feedback approach. In the look-ahead approach, statistics computed by a preprocessor are used to adjust the bit rate prior to coding the frames in question. In the feedback approach, statistics generated by the encoder as a by-product of the compression process are used to control the future bit-rate allocation. Finally, simulation results for one stat mux implementation are presented.
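The buffer-fullness equations the abstract refers to follow a standard CBR recursion: each frame interval, the channel drains a fixed number of bits from the encoder buffer while each coded frame deposits its bits. The sketch below uses that generic recursion; the paper's exact notation and the decoder-side mirror equation are not reproduced here.

```python
def simulate_encoder_buffer(frame_bits, channel_rate, frame_rate, init=0.0):
    # Per frame interval T = 1/frame_rate, the channel removes
    # channel_rate * T bits; each coded frame adds its bits.
    drain = channel_rate / frame_rate
    fullness, trace = init, []
    for b in frame_bits:
        fullness = max(0.0, fullness + b - drain)
        trace.append(fullness)
    return trace

# A source that exactly matches the channel never backs up:
steady = simulate_encoder_buffer([5000] * 10, 150000, 30)
# A burst above the per-frame drain builds up, then clears:
bursty = simulate_encoder_buffer([8000, 2000], 150000, 30)
```

Running the recursion forward over a candidate bit allocation is precisely what lets a look-ahead or feedback controller check that neither buffer overflows or underflows before committing to a rate.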
23 Feb 1996
TL;DR: In this article, a method and apparatus for reducing program clock reference jitter in transport packets of a transport stream compliant with MPEG-2 or another suitable audio-video encoding standard is presented.
Abstract: A method and apparatus for reducing program clock reference (PCR) jitter in transport packets of a transport stream compliant with MPEG-2 or another suitable audio-video encoding standard. The PCRs from a given single program transport stream (SPTS) of a multi-program transport stream are processed in a phase-locked loop (PLL) to generate dejittered PCRs for that SPTS. The PLL for a given SPTS receives as inputs the PCRs from that SPTS and a cycle count for each PCR indicative of the number of asynchronous clock cycles counted since the previous PCR. The PLL generates a given dejittered PCR as a function of the previous dejittered PCR, the cycle count for the given PCR, and a clock frequency mismatch estimate for the given program clock. The clock frequency mismatch estimate is generated by filtering a sequence of jitter estimates, each corresponding to the difference between a previous PCR and its corresponding dejittered PCR. The SPTS transport packets may then be restamped with the dejittered PCRs from the PLL to provide a dejittered multi-program transport stream.
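The PLL described above predicts each dejittered PCR from the previous one, the asynchronous cycle count, and a clock-mismatch estimate that is itself a filtered function of past jitter. A first-order sketch of that loop follows; the filter gain `alpha`, the bootstrap from the first PCR, and the unit conventions are assumptions for illustration only.

```python
def dejitter_pcrs(pcrs, cycle_counts, alpha=0.01):
    # First-order PLL sketch: advance the previous dejittered PCR by the
    # local cycle count, scaled by a running clock-mismatch estimate.
    mismatch = 0.0                     # fractional frequency-offset estimate
    dejittered = [pcrs[0]]             # bootstrap from the first received PCR
    for pcr, cycles in zip(pcrs[1:], cycle_counts[1:]):
        pred = dejittered[-1] + cycles * (1.0 + mismatch)
        jitter = pcr - pred            # difference driving the loop filter
        mismatch += alpha * jitter / max(cycles, 1)   # low-pass update
        dejittered.append(pred)
    return dejittered

# With matched clocks and no jitter the loop passes PCRs through unchanged:
out = dejitter_pcrs([0, 1000, 2000, 3000], [0, 1000, 1000, 1000])
```

The restamping step then simply overwrites each transport packet's PCR field with the corresponding dejittered value before the packet is forwarded.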
01 Apr 1993
TL;DR: Methods for compressing data in a system employing vector quantization (VQ) and Huffman coding are described in this paper.
Abstract: Methods for compressing data in a system employing vector quantization (VQ) and Huffman coding comprise: First, quantizing an input vector by representing the input vector with a VQ codevector selected from a VQ codebook partitioned into subsets, wherein each subset comprises codevectors and each codevector is stored at a corresponding address in the VQ codebook. Next, generating a rate dependent Huffman codeword for the selected codevector, wherein the rate dependent Huffman codeword identifies the subset of the VQ codebook in which the selected codevector is stored. And finally, generating a substantially rate independent Huffman codeword for the selected codevector, wherein the substantially rate independent Huffman codeword identifies a particular VQ codevector within the subset identified by the rate dependent Huffman codeword.
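The two-stage codeword structure in the abstract can be sketched with a toy codebook: a VQ search picks the nearest codevector, then two codewords are emitted, one naming the subset (rate dependent) and one naming the vector inside that subset (rate independent). The codebook, the partition, and both code tables below are illustrative assumptions; the patent's actual tables would be Huffman-optimized for the codevector statistics.

```python
import numpy as np

# Toy codebook of four 2-D codevectors, partitioned into two subsets.
codebook = np.array([[0., 0.], [1., 1.], [4., 4.], [5., 5.]])
subsets = {0: [0, 1], 1: [2, 3]}       # codevector addresses per subset
subset_code = {0: "0", 1: "1"}         # rate-dependent codeword (subset id)
within_code = {0: "0", 1: "1"}         # rate-independent codeword (index in subset)

def encode(vec):
    # VQ step: nearest codevector under Euclidean distance.
    addr = int(np.argmin(np.linalg.norm(codebook - vec, axis=1)))
    # Two-stage entropy coding: subset codeword, then within-subset codeword.
    for sid, addrs in subsets.items():
        if addr in addrs:
            return subset_code[sid] + within_code[addrs.index(addr)]
```

Because the within-subset codewords do not change when the rate (i.e., the active subset structure) changes, only the small subset-identifying table needs to be swapped as the rate varies.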
31 Jan 2011
TL;DR: An overview of the algorithmic design used for extending H.264/MPEG-4 AVC towards MVC is provided and a summary of the coding performance achieved by MVC for both stereo- and multiview video is provided.
Abstract: Significant improvements in video compression capability have been demonstrated with the introduction of the H.264/MPEG-4 advanced video coding (AVC) standard. Since developing this standard, the Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) has also standardized an extension of that technology that is referred to as multiview video coding (MVC). MVC provides a compact representation for multiple views of a video scene, such as multiple synchronized video cameras. Stereo-paired video for 3-D viewing is an important special case of MVC. The standard enables inter-view prediction to improve compression capability, as well as supporting ordinary temporal and spatial prediction. It also supports backward compatibility with existing legacy systems by structuring the MVC bitstream to include a compatible “base view.” Each other view is encoded at the same picture resolution as the base view. In recognition of its high-quality encoding capability and support for backward compatibility, the stereo high profile of the MVC extension was selected by the Blu-Ray Disc Association as the coding format for 3-D video with high-definition resolution. This paper provides an overview of the algorithmic design used for extending H.264/MPEG-4 AVC towards MVC. The basic approach of MVC for enabling inter-view prediction and view scalability in the context of H.264/MPEG-4 AVC is reviewed. Related supplemental enhancement information (SEI) metadata is also described. Various “frame compatible” approaches for support of stereo-view video as an alternative to MVC are also discussed. A summary of the coding performance achieved by MVC for both stereo- and multiview video is also provided. Future directions and challenges related to 3-D video are also briefly discussed.
01 Jan 1993
TL;DR: For a wide class of distortion measures and discrete sources of information there exists a function R(d) (depending on the particular distortion measure and source) which measures the equivalent rate R of the source (in bits per letter produced) when d is the allowed distortion level.
Abstract: Consider a discrete source producing a sequence of message letters from a finite alphabet. A single-letter distortion measure is given by a non-negative matrix (d_ij). The entry d_ij measures the "cost" or "distortion" if letter i is reproduced at the receiver as letter j. The average distortion of a communications system (source-coder-noisy channel-decoder) is taken to be d = Σ_{i,j} P_ij d_ij, where P_ij is the probability of i being reproduced as j. It is shown that there is a function R(d) that measures the "equivalent rate" of the source for a given level of distortion. For coding purposes where a level d of distortion can be tolerated, the source acts like one with information rate R(d). Methods are given for calculating R(d), and various properties are discussed. Finally, generalizations to ergodic sources, to continuous sources, and to distortion measures involving blocks of letters are developed. In this paper a study is made of the problem of coding a discrete source of information, given a fidelity criterion or a measure of the distortion of the final recovered message at the receiving point relative to the actual transmitted message. In a particular case there might be a certain tolerable level of distortion as determined by this measure. It is desired to so encode the information that the maximum possible signaling rate is obtained without exceeding the tolerable distortion level. This work is an expansion and detailed elaboration of ideas presented earlier, with particular reference to the discrete case. We shall show that for a wide class of distortion measures and discrete sources of information there exists a function R(d) (depending on the particular distortion measure and source) which measures, in a sense, the equivalent rate R of the source (in bits per letter produced) when d is the allowed distortion level. Methods will be given for evaluating R(d) explicitly in certain simple cases and for evaluating R(d) by a limiting process in more complex cases.
The basic results are roughly that it is impossible to signal at a rate faster than C/R(d) (source letters per second) over a memoryless channel of capacity C (bits per second) with a distortion measure less than or equal to d. On the other hand, by sufficiently long block codes it is possible to approach as closely as desired the rate C/R(d) with distortion level d. Finally, some particular examples, using error probability per letter of message and other simple distortion measures, are worked out in detail.
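One of the simple cases where R(d) can be evaluated explicitly is a fair binary source under Hamming (error-probability-per-letter) distortion, where the standard closed form is R(d) = 1 - H(d) for 0 ≤ d ≤ 1/2, with H the binary entropy. The small sketch below evaluates that worked example; it illustrates the general theory in the abstract rather than reproducing the paper's own derivation.

```python
import math

def H(p):
    # Binary entropy in bits.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_distortion_binary(d):
    # R(d) for a Bernoulli(1/2) source under Hamming distortion:
    # R(d) = 1 - H(d) for 0 <= d <= 1/2, and 0 for larger d.
    if d >= 0.5:
        return 0.0
    return 1.0 - H(d)
```

At d = 0 the source demands its full entropy of 1 bit per letter, and by d = 1/2 random guessing already achieves the distortion target, so no rate is needed.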
20 Apr 2000
TL;DR: In this article, a system for managing advertisements in a digital video environment, including methods for selecting suitable advertising based on subscriber profiles and substituting advertisements in a program stream, is presented.
Abstract: A system for managing advertisements in a digital video environment, including methods for selecting suitable advertising based on subscriber profiles, and substituting advertisements in a program stream. The ad management system (100) of the present invention manages the sale and insertion of digital video ads in cable TV, switched digital video, and streaming Internet-based environments.
16 May 1996
TL;DR: In this paper, an architecture for distributing digital information to subscriber units is proposed, where selection from among multiple digital services is accomplished by transmitting a tuning command from a subscriber unit to an intermediate interface.
Abstract: An architecture (200) for distributing digital information to subscriber units (202) wherein selection from among multiple digital services is accomplished by transmitting a tuning command from a subscriber unit to an intermediate interface (206). The intermediate interface (206) selects the desired service from a broadband network and transmits it to the subscriber unit (202) over a bandwidth-constrained access line. The bandwidth-constrained access line may be implemented with existing infrastructure, yet the subscriber unit (202) may access a wide variety of digital information available on the broadband network. Universal broadband access is thus provided at low cost. Output bandwidth of broadcast equipment may also be optimized.
TL;DR: This paper presents several bitstream scaling methods for the purpose of reducing the rate of constant bit rate (CBR) encoded bitstreams and shows typical performance trade-offs of the methods.
Abstract: The idea of Moving Picture Experts Group (MPEG) bitstream scaling relates to altering or scaling the amount of data in a previously compressed MPEG bitstream. The new scaled bitstream conforms to constraints that were neither known nor considered when the original precoded bitstream was constructed. Numerous applications for video transmission and storage are being developed based on the MPEG video coding standard. Applications such as video on demand, trick-play track on digital video tape recorders (VTRs), and extended-play recording on VTRs motivate the idea of bitstream scaling. In this paper, we present several bitstream scaling methods for the purpose of reducing the rate of constant-bit-rate (CBR) encoded bitstreams. The different methods have varying hardware implementation complexity and associated trade-offs in resulting image quality. Simulation results on MPEG test sequences demonstrate the typical performance trade-offs of the methods.