
Showing papers by "Dolby Laboratories published in 2008"


Patent
08 Sep 2008
TL;DR: In this article, a family of rate allocation and rate control methods that utilize advanced processing of past and future frame/field picture statistics and are designed to operate with one or more coding passes are described.
Abstract: Embodiments feature families of rate allocation and rate control methods that utilize advanced processing of past and future frame/field picture statistics and are designed to operate with one or more coding passes. At least two method families are included: a family of rate allocation methods with picture look-ahead, and a family of average bit rate (ABR) control methods. At least two methods are described within each family. In the first family, some methods may involve intra rate control; in the second, some methods may involve high-complexity and/or low-complexity ABR control. These and other embodiments can involve any of the following: spatial coding parameter adaptation, coding prediction, complexity processing, complexity estimation, complexity filtering, bit rate considerations, quality considerations, coding parameter allocation, and/or hierarchical prediction structures, among others.

134 citations
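
The abstract above describes look-ahead rate allocation and average bit rate (ABR) control driven by picture-complexity statistics. As a rough illustration of how an ABR controller can fold a running bit budget and a look-ahead complexity estimate into a quantizer adjustment, here is a minimal sketch; the complexity model, the constants, and the function `next_qp` are assumptions for illustration, not the patented methods.

```python
# Minimal, hypothetical ABR rate-control sketch (not the patented algorithm).
# It adjusts a base quantizer from the ratio of bits spent to bits budgeted,
# plus a simple look-ahead complexity weight for the next picture.

def next_qp(base_qp, bits_spent, frames_coded, target_bitrate, frame_rate,
            lookahead_complexities, qp_min=10, qp_max=51):
    budget_so_far = target_bitrate / frame_rate * max(frames_coded, 1)
    overshoot = bits_spent / max(budget_so_far, 1.0)            # >1 means over budget
    avg_c = sum(lookahead_complexities) / max(len(lookahead_complexities), 1)
    next_c = lookahead_complexities[0] if lookahead_complexities else avg_c
    weight = next_c / max(avg_c, 1e-9)                           # harder picture -> higher QP
    qp = base_qp + 6.0 * (overshoot - 1.0) + 2.0 * (weight - 1.0)
    return int(min(max(round(qp), qp_min), qp_max))

# Example: 2 Mbit/s target at 25 fps, slightly over budget, easy upcoming picture.
print(next_qp(26, bits_spent=9.0e6, frames_coded=100, target_bitrate=2.0e6,
              frame_rate=25.0, lookahead_complexities=[0.8, 1.1, 1.0]))
```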



Patent
08 Sep 2008
TL;DR: In this paper, a method for treating video information is described, which includes receiving video information, classifying one or more frames in the received video information as a scene, adjusting one or several coding parameters based on the classification of the frames, and coding the video information in accordance with the adjusted coding parameters.
Abstract: Systems, methods, and techniques for treating video information are described. In one implementation, a method includes receiving video information, classifying one or more frames in the received video information as a scene, adjusting one or more coding parameters based on the classification of the frames, and coding the video information in accordance with the adjusted coding parameters.

92 citations


Patent
09 May 2008
TL;DR: In this article, the shape glasses and projection filters are used together as a system for projecting and viewing 3D images, and complementary images are projected for viewing through projection filters having passbands that pre-shift to compensate for subsequent wavelength shifts.
Abstract: Shaped glasses have curved surface lenses with spectrally complementary filters disposed thereon. The filters curved surface lenses are configured to compensate for wavelength shifts occurring due to viewing angles and other sources. Complementary images are projected for viewing through projection filters having passbands that pre-shift to compensate for subsequent wavelength shifts. At least one filter may have more than 3 primary passbands. For example, two filters include a first filter having passbands of low blue, high blue, low green, high green, and red, and a second filter having passbands of blue, green, and red. The additional passbands may be utilized to more closely match a color space and white point of a projector in which the filters are used. The shaped glasses and projection filters together may be utilized as a system for projecting and viewing 3D images.

86 citations
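
Spectral-separation 3D systems of the kind described above rely on interference filters whose passbands shift toward shorter wavelengths at off-normal incidence. The sketch below applies the standard thin-film interference-filter approximation to pre-shift a projection passband so that, after the angle-induced shift at the eyewear, it lands back on the target wavelength; the effective index `n_eff` and both functions are assumptions for illustration, not values from the patent.

```python
# Illustrative pre-shift of a projection-filter passband centre wavelength so that,
# after the blue shift caused by an off-normal viewing angle at the glasses,
# the light still falls inside the eyewear passband. Uses the standard thin-film
# interference-filter approximation; n_eff = 2.0 is an assumed effective index.
import math

def shifted_wavelength(lambda_normal_nm, angle_deg, n_eff=2.0):
    """Centre wavelength of an interference filter at off-normal incidence."""
    return lambda_normal_nm * math.sqrt(1.0 - (math.sin(math.radians(angle_deg)) / n_eff) ** 2)

def preshifted_projection_band(target_nm, expected_angle_deg, n_eff=2.0):
    """Choose a projection passband centre so the angle-shifted band at the
    glasses lands on the target wavelength (simple inversion of the model)."""
    return target_nm / math.sqrt(1.0 - (math.sin(math.radians(expected_angle_deg)) / n_eff) ** 2)

print(shifted_wavelength(550.0, 20.0))          # band seen through the glasses at 20 degrees
print(preshifted_projection_band(550.0, 20.0))  # projector band chosen to compensate
```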


Patent
30 Jul 2008
TL;DR: In this paper, the authors take as input image data in a lower-dynamic-range (LDR) format and produce as output enhanced image data having a dynamic range greater than that of the input data.
Abstract: Methods and apparatus according to various aspects take as input image data in a lower-dynamic-range (LDR) format and produce as output enhanced image data having a dynamic range greater than that of the input image data (i.e., higher-dynamic-range (HDR) image data). In some embodiments, the methods are applied to video data and are performed in real time (i.e., processing of video frames to enhance their dynamic range is completed, at least on average, at the frame rate of the video signal).

69 citations
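
As a rough illustration of the LDR-to-HDR idea in the abstract above (not the patented enhancement method), the sketch below linearises an 8-bit frame and applies a simple highlight-weighted expansion toward an assumed HDR peak luminance; the gamma, peak values, and boost law are all assumptions.

```python
# A minimal, hypothetical dynamic-range expansion: linearise an 8-bit LDR frame,
# then boost bright regions so peak luminance reaches an assumed HDR peak.
# This is only an illustration of the LDR->HDR idea, not the patented method.
import numpy as np

def expand_ldr_to_hdr(ldr_u8, gamma=2.2, hdr_peak_nits=1000.0, ldr_peak_nits=100.0):
    x = ldr_u8.astype(np.float64) / 255.0           # normalised code values
    linear = x ** gamma                              # undo display gamma
    boost = 1.0 + (hdr_peak_nits / ldr_peak_nits - 1.0) * linear ** 2
    return ldr_peak_nits * linear * boost            # luminance in nits (cd/m^2)

frame = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
print(expand_ldr_to_hdr(frame).max())
```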


Patent
27 Aug 2008
TL;DR: In this paper, a media fingerprint is derived from a portion of media content, information is associated with that portion based on the derived fingerprint, and the portion is then linked to the associated information.
Abstract: A media fingerprint is derived from a portion of media content. Information is associated with the media content portion based on the derived media fingerprint. Upon linking to the associated information, the associated content is presented with the media content portion. The media fingerprint includes a unique representation of the media content portion that is derived from a characteristic component of the media content portion. The media content may comprise an original instance of content or a derivative instance of the original content.

65 citations


Patent
06 Oct 2008
TL;DR: In this article, quantized energy values are accessed to form an initial representation of a temporally related group of content elements in a media sequence, and that initial representation is then transformed into a subsequent representation in another dimensional space.
Abstract: Quantized energy values are accessed to initially represent a temporally related group of content elements in a media sequence. The values are accessed over a matrix of regions into which the initial representation is partitioned. The initial representation may be downsampled and/or cropped from the content. A basis vector set is estimated in a dimensional space from the values. The initial representation is transformed into a subsequent representation, which is in another dimensional space. The subsequent representation projects the initial representation, based on the basis vectors. The subsequent representation reliably corresponds to the media content portion over a change in a geometric orientation thereof. Repeated for other media content portions of the group, subsequent representations of the first and other portions are averaged or transformed over time. The averaged/transformed values reliably correspond to the content portion over speed changes. The initial representation may include spatial or transform related information.

58 citations
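
The abstract describes quantised region energies as an initial representation and a basis-vector projection into another dimensional space as the subsequent representation. A minimal sketch of that pipeline is shown below, using an SVD as a stand-in basis estimator; the grid size, quantisation, and basis count are assumptions rather than the patented choices.

```python
# Minimal sketch: represent each frame of a content group as a matrix of
# quantised region energies, estimate a basis with an SVD, and project each
# frame onto the leading basis vectors. Illustrative only.
import numpy as np

def region_energies(frame, grid=(8, 8), levels=16):
    h, w = frame.shape
    gh, gw = grid
    regions = frame[:h - h % gh, :w - w % gw].reshape(gh, h // gh, gw, w // gw)
    energy = (regions.astype(np.float64) ** 2).mean(axis=(1, 3))
    q = np.floor(energy / (energy.max() + 1e-12) * (levels - 1))
    return q.ravel()                                     # initial representation

def project_group(frames, n_basis=8):
    x = np.stack([region_energies(f) for f in frames])  # one row per frame
    x -= x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x, full_matrices=False)     # estimated basis vectors
    return x @ vt[:n_basis].T                             # subsequent representation

group = [np.random.rand(120, 160) for _ in range(10)]
print(project_group(group).shape)                         # (10, 8)
```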


Patent
Hannes Muesch1
20 Feb 2008
TL;DR: In this paper, the authors proposed a method for enhancing entertainment audio, such as television audio, to improve the clarity and intelligibility of speech such as dialog and narrative audio, using audio signal processing.
Abstract: The invention relates to audio signal processing. More specifically, the invention relates to enhancing entertainment audio, such as television audio, to improve the clarity and intelligibility of speech, such as dialog and narrative audio. The invention relates to methods, apparatus for performing such methods, and to software stored on a computer-readable medium for causing a computer to perform such methods.

56 citations


Patent
Rongshan Yu1
14 Mar 2008
TL;DR: This paper describes speech enhancement based on a psycho-acoustic model that is capable of preserving the fidelity of speech while sufficiently suppressing noise, including the processing artifact known as "musical noise".
Abstract: Speech enhancement based on a psycho-acoustic model is disclosed that is capable of preserving the fidelity of speech while sufficiently suppressing noise, including the processing artifact known as "musical noise".

46 citations
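
One common way to tie suppression to a psycho-acoustic model is to limit attenuation so that residual noise sits near a masking threshold derived from the speech estimate, which curbs the over-suppression that produces musical noise. The sketch below illustrates that general idea with a crude masking proxy; the masking model, offsets, and floor are assumptions, not the disclosed method.

```python
# Sketch of psychoacoustically motivated suppression: attenuate each frequency
# bin only as far as needed to push the noise under a crude masking-threshold
# proxy, limiting the over-suppression that produces "musical noise".
# The masking model here (smoothed speech spectrum minus 10 dB) is an assumption.
import numpy as np

def enhance_frame(noisy_mag, noise_mag, mask_offset_db=10.0, floor_db=-25.0):
    speech_est = np.maximum(noisy_mag**2 - noise_mag**2, 1e-12)        # spectral subtraction
    kernel = np.ones(5) / 5.0
    masking = np.convolve(speech_est, kernel, mode="same") * 10 ** (-mask_offset_db / 10)
    needed = np.sqrt(masking / np.maximum(noise_mag**2, 1e-12))         # gain putting noise at the mask
    wiener = speech_est / (speech_est + noise_mag**2)                   # conventional suppression gain
    gain = np.clip(np.maximum(wiener, needed), 10 ** (floor_db / 20), 1.0)
    return gain * noisy_mag

noisy = np.abs(np.random.randn(257)) + 1.0
noise = np.full(257, 0.8)
print(enhance_frame(noisy, noise).shape)
```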


Patent
06 Jun 2008
TL;DR: In this paper, ambience signal components are obtained by applying decorrelation filter sequences to the source audio signals; the same decorrelation filter sequence may be applied to each input audio signal or, alternatively, a different decorrelation filter sequence may be applied to each.
Abstract: Ambience signal components are obtained from source audio signals, matrix-decoded signal components are obtained from the source audio signals, and the ambience signal components are controllably combined with the matrix-decoded signal components. Obtaining ambience signal components may include applying at least one decorrelation filter sequence. The same decorrelation filter sequence may be applied to each of the input audio signals or, alternatively, a different decorrelation filter sequence may be applied to each of the input audio signals.

38 citations
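
A minimal sketch of the signal flow named in the abstract: ambience components obtained by applying a (different) decorrelation filter to each input signal, combined controllably with components from a simple passive matrix decode. The decorrelators and the matrix used here are illustrative assumptions, not the disclosed processing.

```python
# Illustrative only: derive ambience by decorrelation-filtering each input signal
# (a different random-phase FIR per input) and blend it controllably with one
# channel of a simple passive matrix decode of Lt/Rt.
import numpy as np

def decorrelation_fir(length, seed):
    """Random-phase, roughly flat-magnitude FIR used as a simple decorrelator."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-np.pi, np.pi, length // 2 + 1)
    phase[0] = 0.0
    return np.fft.irfft(np.exp(1j * phase), n=length)

def decode_with_ambience(lt, rt, ambience_gain=0.5, fir_len=512):
    decoded = {"L": lt, "R": rt,                          # assumed passive matrix decode
               "C": 0.5 * (lt + rt), "S": 0.5 * (lt - rt)}
    amb_l = np.convolve(lt, decorrelation_fir(fir_len, seed=1))[: len(lt)]
    amb_r = np.convolve(rt, decorrelation_fir(fir_len, seed=2))[: len(rt)]
    ambience = 0.5 * (amb_l + amb_r)
    decoded["S"] = (1.0 - ambience_gain) * decoded["S"] + ambience_gain * ambience
    return decoded

lt, rt = np.random.randn(48000), np.random.randn(48000)
print({name: ch.shape for name, ch in decode_with_ambience(lt, rt).items()})
```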


Patent
04 Jun 2008
TL;DR: In this paper, a method for use in identifying a segment of audio and/or video information comprises obtaining a query fingerprint at each of a plurality of spaced-apart time locations in the segment, searching fingerprints in a database for a potential match to each such query fingerprint, obtaining a confidence level for each potential match, and combining the results of the searches, wherein each potential match result is weighted by its respective confidence level.
Abstract: A method for use in identifying a segment of audio and/or video information comprises obtaining a query fingerprint at each of a plurality of spaced-apart time locations in said segment, searching fingerprints in a database for a potential match for each such query fingerprint, obtaining a confidence level of a potential match to a found fingerprint in the database for each such query fingerprint, and combining the results of searching for potential matches, wherein each potential match result is weighted by its respective confidence level. A confidence level may be a function of one or both of (1) a measure of difference between a query fingerprint and a found fingerprint and (2) the relative timing relationship between the time location of a query fingerprint and the time location of a found fingerprint.
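
A minimal sketch of the combining step: each query fingerprint's potential match casts a vote weighted by a confidence built from (1) the fingerprint distance and (2) the consistency of its query/found timing offset. The specific weighting formulas and field names are assumptions for illustration.

```python
# Sketch of combining per-query-fingerprint matches by confidence-weighted voting.
# Confidence here mixes (1) fingerprint distance and (2) how consistent the timing
# offset between query and found fingerprint is; the exact weights are assumptions.
from collections import defaultdict

def identify_segment(matches):
    """matches: list of dicts with keys
       'content_id', 'distance' (0..1), 'query_time', 'found_time' (seconds)."""
    offsets = [m["found_time"] - m["query_time"] for m in matches]
    median_offset = sorted(offsets)[len(offsets) // 2] if offsets else 0.0
    votes = defaultdict(float)
    for m, off in zip(matches, offsets):
        dist_conf = max(0.0, 1.0 - m["distance"])             # closer fingerprint -> higher
        time_conf = 1.0 / (1.0 + abs(off - median_offset))    # consistent timing -> higher
        votes[m["content_id"]] += dist_conf * time_conf
    return max(votes, key=votes.get) if votes else None

example = [
    {"content_id": "A", "distance": 0.10, "query_time": 0.0, "found_time": 12.0},
    {"content_id": "A", "distance": 0.20, "query_time": 2.0, "found_time": 14.1},
    {"content_id": "B", "distance": 0.05, "query_time": 4.0, "found_time": 90.0},
]
print(identify_segment(example))   # "A": two consistent, confident matches outweigh one outlier
```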

Journal ArticleDOI
TL;DR: A flexible and low-complexity entropy-constrained vector quantizer (ECVQ) scheme based on Gaussian mixture models, lattice quantization, and arithmetic coding is presented; it offers performance comparable to the GMM-based ECVQ of Gardner, Subramaniam, and Rao at rates relevant for speech coding, with lower computational complexity.
Abstract: A flexible and low-complexity entropy-constrained vector quantizer (ECVQ) scheme based on Gaussian mixture models (GMMs), lattice quantization, and arithmetic coding is presented. The source is assumed to have a probability density function of a GMM. An input vector is first classified to one of the mixture components, and the Karhunen-Loeve transform of the selected mixture component is applied to the vector, followed by quantization using a lattice-structured codebook. Finally, the scalar elements of the quantized vector are entropy coded sequentially using a specially designed arithmetic coder. The computational complexity of the proposed scheme is low, and independent of the coding rate in both the encoder and the decoder. Therefore, the proposed scheme serves as a lower-complexity alternative to the GMM-based ECVQ proposed by Gardner, Subramaniam, and Rao. The performance of the proposed scheme is analyzed under a high-rate assumption, and quantified for a given GMM. The practical performance of the scheme was evaluated through simulations on both synthetic and speech line spectral frequency (LSF) vectors. For LSF quantization, the proposed scheme has performance comparable to the GMM-based ECVQ at rates relevant for speech coding (20-28 bits per vector), with lower computational complexity.
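
The encoder-side steps named in the abstract (classify the input vector to a GMM component, apply that component's Karhunen-Loeve transform, quantise on a lattice) can be sketched as below. A plain Z^n lattice with a uniform step stands in for the structured codebook, and the arithmetic coder is omitted; all parameters are assumptions.

```python
# Encoder-side sketch of the pipeline described in the abstract: classify the
# input vector to a GMM component, apply that component's Karhunen-Loeve
# transform, and quantise on a lattice. A plain Z^n lattice with a scalar step
# stands in for the structured codebook, and the arithmetic coder is omitted.
import numpy as np

def encode_vector(x, weights, means, covs, step=0.05):
    # Classify: pick the mixture component with the highest log-likelihood.
    best, best_ll = 0, -np.inf
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        diff = x - mu
        ll = (np.log(w) - 0.5 * np.log(np.linalg.det(cov))
              - 0.5 * diff @ np.linalg.solve(cov, diff))
        if ll > best_ll:
            best, best_ll = k, ll
    # KLT of the selected component: eigenvectors of its covariance.
    eigvals, eigvecs = np.linalg.eigh(covs[best])
    y = eigvecs.T @ (x - means[best])             # decorrelated coefficients
    indices = np.round(y / step).astype(int)      # Z^n lattice quantisation
    return best, indices                           # these would be arithmetic-coded

rng = np.random.default_rng(0)
dim, K = 10, 4
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, dim))
covs = np.stack([np.eye(dim) * (0.1 + 0.05 * k) for k in range(K)])
print(encode_vector(rng.normal(size=dim), weights, means, covs))
```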

Patent
11 Apr 2008
TL;DR: In this paper, the authors present a method for accessing a video picture that includes multiple pictures combined into a single picture (826), accessing information indicating how the multiple pictures in the accessed video picture are combined (806, 808, 822), decoding the video picture to provide a decoded representation of at least one of the multiple pictures (824, 826), and providing the accessed information and the decoded video picture as output.
Abstract: Implementations are provided that relate, for example, to view tiling in video encoding and decoding. A particular method includes accessing a video picture that includes multiple pictures combined into a single picture (826), accessing information indicating how the multiple pictures in the accessed video picture are combined (806, 808, 822), decoding the video picture to provide a decoded representation of at least one of the multiple pictures (824, 826), and providing the accessed information and the decoded video picture as output (824, 826). Some other implementations format or process the information that indicates how multiple pictures included in a single video picture are combined into the single video picture, and format or process an encoded representation of the combined multiple pictures.
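
As a small illustration of consuming such tiling information, the sketch below extracts one view from a decoded picture that packs several pictures into one; the `tiling_info` field names are assumptions for illustration, not the signalled syntax.

```python
# Sketch of using signalled tiling information to extract one view from a
# decoded picture that combines multiple pictures into a single picture.
import numpy as np

def extract_view(decoded_picture, tiling_info, view_index):
    rows, cols = tiling_info["grid_rows"], tiling_info["grid_cols"]
    h, w = decoded_picture.shape[:2]
    th, tw = h // rows, w // cols
    r, c = divmod(view_index, cols)
    return decoded_picture[r * th:(r + 1) * th, c * tw:(c + 1) * tw]

side_by_side = np.zeros((1080, 3840, 3), dtype=np.uint8)        # two 1080p views packed
info = {"grid_rows": 1, "grid_cols": 2}                          # assumed field names
left = extract_view(side_by_side, info, view_index=0)
right = extract_view(side_by_side, info, view_index=1)
print(left.shape, right.shape)                                   # (1080, 1920, 3) each
```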

Patent
18 Jun 2008
TL;DR: In this article, the perceived loudness of an audio signal is measured by modifying a spectral representation of the audio signal as a function of a reference spectral shape so that the spectral representation conforms more closely to the reference spectral shape.
Abstract: The perceived loudness of an audio signal is measured by modifying a spectral representation of an audio signal as a function of a reference spectral shape so that the spectral representation of the audio signal conforms more closely to the reference spectral shape, and determining the perceived loudness of the modified spectral representation of the audio signal.
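
A minimal sketch of the measurement idea, under assumptions: pull the band spectrum partway toward a level-matched reference spectral shape, then reduce the modified spectrum to a single loudness figure with a simple power-law sum. The blend factor and exponent are not from the patent.

```python
# Sketch: move the signal's band spectrum toward a reference spectral shape
# (level-matched), then compute a single loudness figure from the modified
# spectrum. The blend factor and power-law exponent are assumptions.
import numpy as np

def perceived_loudness(band_powers, reference_shape, blend=0.5, exponent=0.3):
    ref = np.asarray(reference_shape, dtype=np.float64)
    ref = ref / ref.sum() * band_powers.sum()           # match overall level
    modified = (1.0 - blend) * band_powers + blend * ref
    specific = modified ** exponent                      # crude specific-loudness law
    return specific.sum()

bands = np.abs(np.random.randn(40)) ** 2
reference = np.linspace(1.0, 0.2, 40)                    # assumed reference shape
print(perceived_loudness(bands, reference))
```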

Patent
20 Jun 2008
TL;DR: In this article, search tree structures whose nodes represent signatures derived from segments of video and audio content are used to identify content and to re-establish the correct alignment between video and audio that have become disassociated from one another; partial tree structures improve search efficiency.
Abstract: Search tree structures with nodes that represent signatures derived from segments of video and audio content are used by systems to identify content and re-establish the correct alignment between video and audio content that have become disassociated from one another. The amount of storage needed to record data representing the tree structure can be reduced by replacing stored signature sets with signature pointers. The efficiency of searches in the tree structure can be improved by constructing and using partial tree structures.
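
The storage-reduction idea (nodes hold signature pointers rather than signature copies) can be sketched as below; the tree layout and keying are assumptions for illustration only.

```python
# Sketch of the storage idea in the abstract: tree nodes keep an integer index
# ("signature pointer") into one shared signature table rather than a copy of
# the signature itself. The tree layout shown here is an assumption.
class SignatureNode:
    def __init__(self, signature_index):
        self.signature_index = signature_index    # pointer into the shared table
        self.children = {}                        # keyed by, e.g., a signature prefix

    def add_child(self, key, node):
        self.children[key] = node

signature_table = [b"\x1f\x8a", b"\x03\x77", b"\xd4\x10"]   # each signature stored once
root = SignatureNode(0)
root.add_child(b"\x03", SignatureNode(1))
root.add_child(b"\xd4", SignatureNode(2))

def lookup(node, prefix):
    child = node.children.get(prefix)
    return signature_table[child.signature_index] if child else None

print(lookup(root, b"\x03"))
```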

Patent
Rongshan Yu1
14 Mar 2008
TL;DR: In this article, a speech enhancement method for devices having limited available memory is described, which is appropriate for very noisy environments and is capable of estimating the relative strengths of speech and noise components during both the presence as well as the absence of speech.
Abstract: A speech enhancement method operative for devices having limited available memory is described. The method is appropriate for very noisy environments and is capable of estimating the relative strengths of speech and noise components during both the presence as well as the absence of speech.

Patent
01 May 2008
TL;DR: In this paper, a signature that can be used to identify video content in a series of video frames is generated by first calculating the average and variance of picture elements in a low-resolution composite image that represents a temporal and spatial composite of the video contents in the series of frames.
Abstract: A signature that can be used to identify video content in a series of video frames is generated by first calculating the average and variance of picture elements in a low-resolution composite image that represents a temporal and spatial composite of the video content in the series of frames. The signature is generated by applying a hash function to values derived from the average and variance composite representations. The video content of a signal can be represented by a set of signatures that are generated for multiple series of frames within the signal. A set of signatures can provide reliable identifications despite intentional and unintentional modifications to the content.
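
A minimal sketch of the signature construction described above: form per-pixel mean and variance composites over the frame series, reduce each to a low-resolution grid, and hash thresholded grid values. The grid size, thresholding, and the use of SHA-1 are assumptions, not the patented hash function.

```python
# Sketch: form a temporal composite of a frame series (per-pixel mean and
# variance), reduce it to a low-resolution grid, and hash thresholded values
# into a signature.
import hashlib
import numpy as np

def frame_series_signature(frames, grid=(8, 8)):
    stack = np.stack([f.astype(np.float64) for f in frames])
    composites = [stack.mean(axis=0), stack.var(axis=0)]       # average and variance composites
    bits = []
    for comp in composites:
        h, w = comp.shape
        gh, gw = grid
        small = comp[:h - h % gh, :w - w % gw].reshape(gh, h // gh, gw, w // gw).mean(axis=(1, 3))
        bits.append((small > np.median(small)).astype(np.uint8).ravel())
    return hashlib.sha1(np.concatenate(bits).tobytes()).hexdigest()

frames = [np.random.randint(0, 256, (120, 160), dtype=np.uint8) for _ in range(12)]
print(frame_series_signature(frames))
```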

Patent
12 Sep 2008
TL;DR: In this article, a hybrid stereophonic/monophonic audio signal encoding (and a corresponding decoding) is proposed, in which the audio signal is a discrete two-channel stereophonic signal below a frequency f_m and a single-channel monophonic signal above f_m.
Abstract: A hybrid stereophonic/monophonic audio signal encoding comprises generating, in response to a discrete two-channel stereophonic audio signal, an encoded hybrid stereophonic/monophonic audio signal that is a discrete two-channel audio signal below a frequency f_m and a single-channel monophonic audio signal above f_m; generating, in response to the discrete two-channel stereophonic audio signal, spatial parameter information characterizing that signal above f_m; and combining the hybrid stereophonic/monophonic audio signal with the spatial parameter information in such a manner that the resulting signal is decodable both by a decoder configured to decode a discrete two-channel stereophonic audio signal encoded with the same encoding as the hybrid signal and by a decoder configured to decode the hybrid stereophonic/monophonic audio signal with the use of the spatial parameter information. A hybrid stereophonic/monophonic audio signal decoding is also provided.
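
A frequency-domain sketch of the hybrid signal: discrete L/R kept below f_m, only a mono downmix kept above f_m, plus a coarse spatial parameter (here a single inter-channel level difference) for the upper band. The hard band split and the single-parameter spatial description are simplifying assumptions about what such an encoder might carry.

```python
# Frequency-domain sketch: keep discrete L/R below f_m, keep only the mono
# downmix above f_m, and compute a per-frame inter-channel level difference
# for the upper band as the spatial parameter. All choices are assumptions.
import numpy as np

def hybrid_encode_frame(left, right, fs=48000, f_m=6000.0):
    L, R = np.fft.rfft(left), np.fft.rfft(right)
    freqs = np.fft.rfftfreq(len(left), 1.0 / fs)
    lo, hi = freqs < f_m, freqs >= f_m
    mono_hi = 0.5 * (L[hi] + R[hi])                                   # single channel above f_m
    ild_db = 10.0 * np.log10(((np.abs(L[hi]) ** 2).sum() + 1e-12)
                             / ((np.abs(R[hi]) ** 2).sum() + 1e-12))  # spatial parameter
    return {"L_lo": L[lo], "R_lo": R[lo], "mono_hi": mono_hi, "ild_db": ild_db}

n = 1024
t = np.arange(n) / 48000.0
frame = hybrid_encode_frame(np.sin(2 * np.pi * 440 * t),             # energy below f_m
                            0.5 * np.sin(2 * np.pi * 8000 * t))      # energy above f_m
print(len(frame["L_lo"]), len(frame["mono_hi"]), round(frame["ild_db"], 1))
```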

Proceedings ArticleDOI
26 Aug 2008
TL;DR: A framework based on signatures extracted from audio and video streams is proposed for automatically measuring and maintaining synchronization between the two streams; it can achieve > 93.0% accuracy in synchronization.
Abstract: We propose a framework based on signatures extracted from audio and video streams for automatically measuring and maintaining synchronization between the two streams. The audio signature is based on projections of a coarse representation of the spectrogram onto random vectors. The video signature is based on projections of a coarse representation of the difference image between two consecutive frames onto random vectors. The time alignment present at the signature generator between the two streams is recorded by combining the audio and video signatures into a combined synchronization signature. At the detector, after the video and audio streams have gone through different processing operations, we extract the signatures again. The signatures extracted before and after processing from the audio and the video are compared independently using a Hamming-distance-based correlator to estimate the relative misalignment introduced by the processing in each of the streams. The estimated relative misalignment between the audio and video streams is then used to preserve the same alignment between the streams that was present before processing. Our experimental results show that we can achieve > 93.0% accuracy in synchronization.
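
A compact sketch of the paper's two ingredients: binary signatures formed by random projections of a coarse per-frame feature, and a Hamming-distance correlator that estimates the offset introduced by processing. The feature, signature length, and search range used here are assumptions for illustration.

```python
# Sketch: binary signatures from random projections of a coarse per-frame feature,
# and a Hamming-distance correlator that finds the frame offset introduced by
# processing. Feature choice, signature length, and search range are assumptions.
import numpy as np

RNG = np.random.default_rng(7)
PROJ = RNG.standard_normal((32, 64))                  # 32 random projection vectors

def signature(feature_vec):
    """Binary signature: signs of projections of a coarse feature onto random vectors."""
    return (PROJ @ feature_vec > 0).astype(np.uint8)

def estimate_offset(ref_sigs, proc_sigs, max_offset=10):
    """Hamming-distance correlator: offset minimising the mean bit disagreement."""
    best_off, best_dist = 0, np.inf
    for off in range(-max_offset, max_offset + 1):
        dists = [np.count_nonzero(ref_sigs[i] ^ proc_sigs[i + off])
                 for i in range(len(ref_sigs)) if 0 <= i + off < len(proc_sigs)]
        if dists and np.mean(dists) < best_dist:
            best_off, best_dist = off, np.mean(dists)
    return best_off

features = [RNG.standard_normal(64) for _ in range(100)]                       # stand-in coarse features
ref = [signature(f) for f in features]                                          # signatures before processing
proc = [signature(f + 0.1 * RNG.standard_normal(64)) for f in features[3:]]     # first 3 frames lost
print(estimate_offset(ref, proc))                                               # -3: processed stream lost 3 frames
```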

Patent
08 Aug 2008
TL;DR: In this paper, a multimedia coding and decoding system and method is presented that uses the specific prediction mode to signal supplemental information, e.g., metadata, while considering and providing trade offs between coding performance and metadata capacity.
Abstract: A multimedia coding and decoding system and method is presented that uses the specific prediction mode to signal supplemental information, e.g., metadata, while considering and providing trade offs between coding performance and metadata capacity. The prediction mode can be encoded according to a mode table that relates mode to bits and by considering coding impact. Start and stop codes can be used to signal the message, while various techniques of how to properly design the mode to bits tables are presented.
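
A toy sketch of signalling a message through prediction-mode choices: an assumed mode-to-bits table maps each chosen mode to a short bit string, and a start/stop pattern frames the message (assuming the message itself does not contain the framing pattern). None of the codes below come from the patent.

```python
# Sketch of signalling metadata through prediction-mode choices: a mode-to-bits
# table maps each chosen mode to a short bit string, and a start/stop pattern
# frames the message. The table, codes, and framing are illustrative assumptions.
MODE_TO_BITS = {0: "0", 1: "10", 2: "110", 3: "111"}      # assumed mode-to-bits table (prefix-free)
BITS_TO_MODE = {bits: mode for mode, bits in MODE_TO_BITS.items()}
START = STOP = "01111110"                                  # assumed framing pattern

def embed(message_bits):
    """Turn a framed bit string into a sequence of prediction-mode choices."""
    payload, modes = START + message_bits + STOP, []
    codewords = sorted(BITS_TO_MODE, key=len, reverse=True)
    while payload:
        entry = next(cw for cw in codewords if payload.startswith(cw))
        modes.append(BITS_TO_MODE[entry])
        payload = payload[len(entry):]
    return modes

def extract(modes):
    bits = "".join(MODE_TO_BITS[m] for m in modes)
    start = bits.index(START) + len(START)
    return bits[start:bits.index(STOP, start)]

modes = embed("1011001")
print(modes)
print(extract(modes))          # "1011001"
```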

Proceedings ArticleDOI
12 Dec 2008
TL;DR: A novel block-based intra-prediction scheme is proposed for efficient image (or intra-frame) coding, in which various 2D geometrical manipulations are applied to reference image blocks to enrich the pool of prediction blocks for a given target block.
Abstract: A novel block-based intra-prediction scheme is proposed for efficient image (or intra-frame) coding, where we apply various 2D geometrical manipulations to reference image blocks to enrich the pool of prediction blocks for a given target block. Compared with the traditional line-based intra prediction in H.264/AVC, the new scheme offers a significant coding gain (about 0.24-1.23 dB in PSNR at the same bit rate) at the cost of higher complexity. Several techniques to reduce the search complexity are also discussed.
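
A minimal sketch of the core idea: enrich the prediction pool with 2D geometric manipulations of a reference block (rotations and mirrorings here) and pick the candidate with the smallest SAD against the target. The manipulation set and search are illustrative, not the scheme evaluated in the paper.

```python
# Sketch: generate candidate predictions by geometrically manipulating a
# reference block and choose the candidate with the smallest SAD.
import numpy as np

def candidate_predictions(ref_block):
    cands = []
    for k in range(4):
        r = np.rot90(ref_block, k)
        cands.extend([r, np.fliplr(r)])
    return cands                                     # 8 manipulated versions

def best_prediction(target, ref_block):
    cands = candidate_predictions(ref_block)
    sads = [np.abs(target.astype(int) - c.astype(int)).sum() for c in cands]
    i = int(np.argmin(sads))
    return cands[i], i, sads[i]                      # prediction, manipulation index, SAD

target = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
reference = np.fliplr(target)                         # target is a mirrored copy of the reference
_, mode, sad = best_prediction(target, reference)
print(mode, sad)                                      # the mirrored candidate gives SAD 0
```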

Patent
Rongshan Yu1
10 Sep 2008
TL;DR: In this paper, the level of estimated noise components in a subband is determined at least in part by comparing the estimated noise level with the level of the audio signal in that subband, and increasing the estimated noise level by a predetermined amount when the signal exceeds it by a limit for more than a defined time.
Abstract: Enhancing speech components of an audio signal composed of speech and noise components includes controlling the gain of the audio signal in ones of its subbands, wherein the gain in a subband is reduced as the level of estimated noise components increases with respect to the level of speech components, wherein the level of estimated noise components is determined at least in part by (1) comparing an estimated noise components level with the level of the audio signal in the subband and increasing the estimated noise components level in the subband by a predetermined amount when the input signal level in the subband exceeds the estimated noise components level in the subband by a limit for more than a defined time, or (2) obtaining and monitoring the signal-to-noise ratio in the subband and increasing the estimated noise components level in the subband by a predetermined amount when the signal-to-noise ratio in the subband exceeds a limit for more than a defined time.
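
Rule (2) of the abstract can be sketched directly: per subband, if the signal-to-noise ratio stays above a limit for more than a defined time, raise the estimated noise level by a predetermined amount. The limit, hold time, and step size below are assumptions, not the claimed values.

```python
# Sketch of rule (2): per subband, if the measured SNR stays above a limit for
# more than a defined time, raise the estimated noise level by a fixed amount.
import numpy as np

class NoiseTracker:
    def __init__(self, n_bands, snr_limit_db=12.0, hold_frames=50, step_db=1.0):
        self.noise_db = np.full(n_bands, -60.0)          # initial noise estimate
        self.over_count = np.zeros(n_bands, dtype=int)
        self.snr_limit_db, self.hold_frames, self.step_db = snr_limit_db, hold_frames, step_db

    def update(self, signal_db):
        snr = signal_db - self.noise_db
        over = snr > self.snr_limit_db
        self.over_count = np.where(over, self.over_count + 1, 0)
        bump = self.over_count > self.hold_frames
        self.noise_db[bump] += self.step_db              # predetermined increase
        self.over_count[bump] = 0
        return self.noise_db

tracker = NoiseTracker(n_bands=16)
for _ in range(200):                                      # steady loud input in every band
    noise = tracker.update(np.full(16, -20.0))
print(noise[:4])                                          # estimates creep upward
```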

Patent
30 Dec 2008
TL;DR: In this paper, a linear prediction unit for filtering an input signal based on an adaptive filter was proposed, and a transformation unit for transforming a frame of the filtered input signal into a transform domain.
Abstract: The present invention teaches a new audio coding system that can code both general audio and speech signals well at low bit rates. A proposed audio coding system comprises a linear prediction unit for filtering an input signal based on an adaptive filter; a transformation unit for transforming a frame of the filtered input signal into a transform domain; a quantization unit for quantizing a transform domain signal; a long term prediction unit for determining an estimation of the frame of the filtered input signal based on a reconstruction of a previous segment of the filtered input signal; and a transform domain signal combination unit for combining, in the transform domain, the long term prediction estimation and the transformed input signal to generate the transform domain signal.
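
A highly simplified sketch of the encoder path named in the abstract: a linear-prediction (whitening) filter, a frame transform, a long-term-prediction estimate taken from the previous segment and combined in the transform domain, then quantisation. The fixed first-order LP filter, the FFT as the transform, and the frame-length LTP lag are simplifying assumptions, not the proposed codec.

```python
# Encoder-path sketch: LP (whitening) filter, frame transform, LTP estimate from
# the previous reconstructed segment combined in the transform domain, quantise.
import numpy as np

LP = np.array([1.0, -0.9])                       # assumed first-order whitening filter

def lp_filter(x):
    return np.convolve(x, LP)[: len(x)]

def encode_frame(frame, prev_reconstruction, q_step=0.02):
    residual = lp_filter(frame)                   # LP-filtered input frame
    ltp_est = lp_filter(prev_reconstruction)      # LTP estimate: previous segment, lag = frame length
    X = np.fft.rfft(residual)                     # transform-domain signal
    P = np.fft.rfft(ltp_est)
    q = np.round((X - P) / q_step)                # combine LTP in the transform domain, then quantise
    return q, P

def decode_frame(q, P, q_step=0.02):
    return np.fft.irfft(q * q_step + P)           # reconstructed LP-filtered frame

n = 256
prev = np.sin(2 * np.pi * np.arange(n) / 32)      # previous reconstructed segment
frame = np.sin(2 * np.pi * (np.arange(n) + n) / 32)
q, P = encode_frame(frame, prev)
print(np.allclose(decode_frame(q, P), lp_filter(frame), atol=0.05))   # True
```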

Patent
11 Jul 2008
TL;DR: In this paper, the authors propose a method for controlling the loudness of auditory events in an audio signal. In an embodiment, the method includes weighting the auditory events (each having a spectrum and a loudness) using skewness in their spectra, and controlling the loudness of the auditory events using the weights.
Abstract: A method for controlling the loudness of auditory events in an audio signal. In an embodiment, the method includes weighting the auditory events (each having a spectrum and a loudness) using skewness in their spectra, and controlling the loudness of the auditory events using the weights. In various embodiments: the weighting is proportionate to the measure of skewness in the spectra; the measure of skewness is a smoothed skewness; the weighting is insensitive to the amplitude, power, or loudness of the audio signal, and any relationship between signal measure and absolute reproduction level need not be known at the time of weighting; and the weighting includes weighting auditory-event-boundary importance using skewness in the spectra.
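
A minimal sketch of the weighting idea: compute a (smoothed) skewness of each event's spectrum from log band levels, which makes the weight insensitive to overall amplitude; event segmentation is omitted and the smoothing constant and use of log levels are assumptions.

```python
# Sketch: weight each auditory event by a smoothed skewness of its spectrum.
# Skewness of log band levels is invariant to an overall gain, echoing the
# "insensitive to amplitude/power/loudness" property in the abstract.
import numpy as np

def spectral_skewness(band_powers):
    x = 10.0 * np.log10(np.asarray(band_powers) + 1e-12)     # log levels: gain-invariant shape
    mu, sigma = x.mean(), x.std()
    return float(((x - mu) ** 3).mean() / (sigma ** 3 + 1e-12))

def event_weights(event_spectra, smooth=0.8):
    weights, state = [], 0.0
    for spec in event_spectra:
        state = smooth * state + (1.0 - smooth) * spectral_skewness(spec)
        weights.append(max(state, 0.0))                       # weight proportional to smoothed skewness
    return weights

events = [np.abs(np.random.randn(40)) ** 2 for _ in range(5)]
print(event_weights(events))
```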

Patent
Hannes Muesch1
12 Feb 2008
TL;DR: In this article, a high-quality audio program that is a mix of speech and non-speech audio is combined with a lower-quality copy of the speech components contained in the program, for the purpose of generating a high-quality audio program with an increased ratio of speech to non-speech audio, such as may benefit elderly, hearing-impaired, or other listeners.
Abstract: The invention relates to audio signal processing and speech enhancement. In accordance with one aspect, the invention combines a high-quality audio program that is a mix of speech and non-speech audio with a lower-quality copy of the speech components contained in the audio program, for the purpose of generating a high-quality audio program with an increased ratio of speech to non-speech audio, such as may benefit elderly, hearing-impaired, or other listeners. Aspects of the invention are particularly useful for television and home theater sound, although they may be applicable to other audio and sound applications. The invention relates to methods, apparatus for performing such methods, and to software stored on a computer-readable medium for causing a computer to perform such methods.
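
The mixing idea can be sketched in a few lines: add the (lower-quality) speech-only copy to the main program, scaled so the speech component rises by a requested number of decibels relative to the non-speech audio. The simple gain law below is an assumption, not the disclosed processing.

```python
# Sketch: reinforce the dialog by adding a scaled speech-only copy to the main
# program so the speech component rises by roughly boost_db relative to the rest.
import numpy as np

def enhance_dialog(main_mix, speech_copy, boost_db=6.0):
    g_speech = 10 ** (boost_db / 20.0) - 1.0        # extra speech amplitude to add
    return main_mix + g_speech * speech_copy         # speech copy reinforces dialog only

fs = 48000
t = np.arange(fs) / fs
speech = 0.1 * np.sin(2 * np.pi * 220 * t)            # stand-in for the speech stem
background = 0.1 * np.random.randn(fs)
program = speech + background                          # broadcast mix
out = enhance_dialog(program, speech, boost_db=6.0)
print(out.shape)
```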

Patent
10 Sep 2008
TL;DR: In this article, a method for enhancing speech components of an audio signal composed of speech and noise components processes subbands of the audio signal, the processing including controlling the gain of audio signal in ones of the subbands, wherein the gain in a subband is controlled at least by processes that convey either additive/subtractive differences in gain or multiplicative ratios of gain.
Abstract: A method for enhancing speech components of an audio signal composed of speech and noise components processes subbands of the audio signal, the processing including controlling the gain of the audio signal in ones of the subbands, wherein the gain in a subband is controlled at least by processes that convey either additive/subtractive differences in gain or multiplicative ratios of gain so as to reduce gain in a subband as the level of noise components increases with respect to the level of speech components in the subband and increase gain in a subband when speech components are present in subbands of the audio signal, the processes each responding to subbands of the audio signal and controlling gain independently of each other to provide a processed subband audio signal.

Patent
28 Mar 2008
TL;DR: In this paper, a watermarked transport stream is created by interleaving selected processed content packets and associated carrier packets according to a watermark message, where each associated carrier packet is paired with a processed content packet.
Abstract: Methods and apparatuses for processing and watermarking a transport stream with a message. A processed transport stream that includes processed content packets, associated carrier packets, and a watermark descriptor for a group of the associated carrier packets is created from the transport stream. The processed content data represent a first watermark value and are bounded by transport sector boundaries. The associated carrier packets include replacement watermark data that represent a second watermark value and are bounded by transport sector boundaries. These associated carrier packets are paired with processed content packets. The watermark descriptor includes synchronization data. A watermarked transport stream is created by interleaving selected processed content packets and associated carrier packets according to a watermark message.
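
A toy sketch of assembling the watermarked stream: for each message bit, emit either the processed content packet or its paired carrier packet carrying the replacement watermark data. The one-bit-per-pair convention is an assumption for illustration.

```python
# Sketch of creating a watermarked transport stream: for each watermark message
# bit, emit either the processed content packet (bit 0) or its paired carrier
# packet holding the replacement watermark data (bit 1).
def watermark_stream(content_packets, carrier_packets, message_bits):
    assert len(content_packets) == len(carrier_packets) >= len(message_bits)
    out = []
    for i, (content, carrier) in enumerate(zip(content_packets, carrier_packets)):
        bit = message_bits[i] if i < len(message_bits) else 0
        out.append(carrier if bit else content)
    return out

content = [f"content_{i}".encode() for i in range(8)]
carrier = [f"carrier_{i}".encode() for i in range(8)]
print(watermark_stream(content, carrier, [1, 0, 1, 1, 0]))
```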

Patent
21 Nov 2008
TL;DR: In this article, a motion estimate describing a change to a region of a reference frame relative to other frames is smoothed over a temporal window, which facilitates aligning at least part of an image feature contained in those regions across the set of frames.
Abstract: For a frame set of a moving image sequence, a motion estimate is accessed. The motion estimate describes a change to a region of a reference frame with respect to at least one other frame. The reference frame and the other frames are displaced from each other within the frame set over a temporal window. The regions of these frames contain at least a portion of an image feature. The motion estimate is smoothed over the temporal window. The smoothing may facilitate aligning, at least in part, the image feature within the set of frames.
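
A minimal sketch of smoothing translational motion estimates over a temporal window with a centred moving average; the smoothed vectors would then drive the alignment. The box filter and window length are assumed choices, not the patented smoothing.

```python
# Sketch: smooth per-frame translational motion vectors over a temporal window
# with a centred moving average (box filter).
import numpy as np

def smooth_motion(motion_vectors, window=5):
    mv = np.asarray(motion_vectors, dtype=np.float64)            # shape (n_frames, 2)
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(mv, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([np.convolve(padded[:, k], kernel, mode="valid") for k in range(2)], axis=1)

raw = [(1.0, 0.0), (1.2, 0.1), (5.0, -2.0), (1.1, 0.0), (0.9, 0.1), (1.0, 0.0)]  # one outlier
print(smooth_motion(raw, window=3))
```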

Patent
19 Dec 2008
TL;DR: In this paper, an initial motion estimation using an initial error metric function can be performed, and if the initial metric function is not the optimal error function, then a final motion estimation is performed using a selected optimal metric function.
Abstract: Optimal error metric function for motion estimation is determined and used for video coding and/or video processing of images. To do so, an initial motion estimation using an initial error metric function can be performed. This can produce motion prediction errors. If the initial error metric function is not the optimal error function, then a final motion estimation is performed using a selected optimal error metric function. In some embodiments, a shape of error distribution can be used to determine the optimal error metric function. Some example systems or devices for this motion estimation can include systems or devices for compression, temporal interpolation, and/or super-resolution processing.
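
One plausible reading of using the error-distribution shape is sketched below: near-Gaussian motion-prediction residuals favour a squared-error metric (SSD), while heavy-tailed residuals favour an absolute-error metric (SAD). The kurtosis test and threshold are assumptions, not the patented rule.

```python
# Sketch of picking an error metric from the shape of the motion-prediction
# error distribution: near-Gaussian residuals -> SSD, heavy-tailed -> SAD.
import numpy as np

def sad(a, b):
    return np.abs(a - b).sum()

def ssd(a, b):
    return ((a - b) ** 2).sum()

def choose_error_metric(prediction_errors, kurtosis_threshold=1.0):
    e = np.asarray(prediction_errors, dtype=np.float64).ravel()
    z = (e - e.mean()) / (e.std() + 1e-12)
    excess_kurtosis = (z ** 4).mean() - 3.0
    return sad if excess_kurtosis > kurtosis_threshold else ssd

errors = np.random.laplace(scale=2.0, size=10000)      # heavy-tailed residuals
metric = choose_error_metric(errors)
print(metric.__name__)                                  # expected: "sad"
```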

Patent
26 Nov 2008
TL;DR: In this paper, a video processor for replacement-based watermarking may include a video content input channel, a preprocessor, a replacement-based watermarking metadata creator, and a video encoder.
Abstract: A video processor for replacement-based watermarking may include a video content input channel; a video content preprocessor; a replacement-based watermarking (RBW) metadata creator; and a video content encoder. To output the encoded video content, the video processor may have a dual or single output channel. For a dual output channel, the video processor includes an encoded video content output channel and an RBW metadata output channel for outputting encoded video content and RBW metadata as separate streams for further processing and distribution. For a single output channel, the video processor includes a video content output channel for outputting encoded video content combined with RBW metadata as a single output stream for further processing and distribution.