Patent

Source coding enhancement using spectral-band replication

TL;DR: A method and apparatus for enhancing source coding systems that employs bandwidth reduction (101) prior to or in the encoder, followed by spectral-band replication (105) at the decoder.
Abstract: The present invention proposes a new method and apparatus for the enhancement of source coding systems. The invention employs bandwidth reduction (101) prior to or in the encoder (103), followed by spectral-band replication (105) at the decoder (107). This is accomplished by the use of new transposition methods, in combination with spectral envelope adjustments. Reduced bitrate at a given perceptual quality or an improved perceptual quality at a given bitrate is offered. The invention is preferably integrated in a hardware or software codec, but can also be implemented as a separate processor in combination with a codec. The invention offers substantial improvements practically independent of codec type and technological progress.
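The scheme in the abstract (discard the upper band at the encoder, then regenerate it at the decoder and adjust its spectral envelope) can be illustrated with a minimal sketch. This is not the patent's method: the transposition here is a plain copy of lowband bins into the highband, used only as a stand-in, and the function names, the per-band mean-energy envelope, and the `crossover` and `band_size` parameters are all assumptions made for illustration. The sketch also assumes the number of highband bins is a multiple of `band_size`.

```python
import math


def band_energies(spectrum, start, band_size):
    """Mean energy of each band_size-wide band from bin `start` to the end."""
    return [
        sum(x * x for x in spectrum[i:i + band_size]) / band_size
        for i in range(start, len(spectrum), band_size)
    ]


def encode(spectrum, crossover, band_size):
    """Bandwidth reduction: keep only the bins below `crossover` and
    describe the discarded highband by a coarse energy envelope."""
    lowband = spectrum[:crossover]
    envelope = band_energies(spectrum, crossover, band_size)
    return lowband, envelope


def decode(lowband, envelope, band_size):
    """Replication: fill the highband with copies of lowband bins
    (hypothetical stand-in for transposition), then scale each band
    so its mean energy matches the transmitted envelope."""
    crossover = len(lowband)
    n_high = len(envelope) * band_size
    high = [lowband[i % crossover] for i in range(n_high)]
    for b, target in enumerate(envelope):
        lo, hi = b * band_size, (b + 1) * band_size
        current = sum(x * x for x in high[lo:hi]) / band_size
        # Envelope adjustment: gain that maps current energy to target energy.
        gain = math.sqrt(target / current) if current > 0 else 0.0
        high[lo:hi] = [x * gain for x in high[lo:hi]]
    return lowband + high


if __name__ == "__main__":
    magnitudes = [8.0, 6.0, 4.0, 2.0, 1.0, 1.0, 0.5, 0.5]
    low, env = encode(magnitudes, crossover=4, band_size=2)
    rebuilt = decode(low, env, band_size=2)
    print(env)
    print(band_energies(rebuilt, 4, 2))
```

In this toy setup only the four lowband values and two envelope values would need to be transmitted. The rebuilt highband matches the original in per-band energy but not in fine structure, which is the trade-off the abstract describes as reduced bitrate at a given perceptual quality.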
Citations
Patent
20 Jun 2007
TL;DR: 3D pointing devices that enhance usability by transforming sensed motion data from a first frame of reference (e.g., the body of the pointing device) into a second frame of reference (e.g., a user's frame of reference).
Abstract: Systems and methods according to the present invention describe 3D pointing devices which enhance usability by transforming sensed motion data from a first frame of reference (e.g., the body of the 3D pointing device) into a second frame of reference (e.g., a user's frame of reference). One exemplary embodiment of the present invention removes effects associated with a tilt orientation in which the 3D pointing device is held by a user.

331 citations

Patent
07 Nov 2002
TL;DR: An encoding device (200) includes an MDCT unit (202) that transforms an input signal in a time domain into a frequency spectrum including a lower frequency spectrum, a BWE encoding unit (204) that generates extension data which specifies a higher frequency spectrum at a higher frequency than the lower frequency spectrum, and an encoded data stream generating unit (205) that encodes to output the lower frequency spectrum obtained by the MDCT unit and the extension data obtained by the BWE encoding unit.
Abstract: An encoding device (200) includes an MDCT unit (202) that transforms an input signal in a time domain into a frequency spectrum including a lower frequency spectrum, a BWE encoding unit (204) that generates extension data which specifies a higher frequency spectrum at a higher frequency than the lower frequency spectrum, and an encoded data stream generating unit (205) that encodes to output the lower frequency spectrum obtained by the MDCT unit (202) and the extension data obtained by the BWE encoding unit (204). The BWE encoding unit (204) generates as the extension data (i) a first parameter which specifies a lower subband which is to be copied as the higher frequency spectrum from among a plurality of the lower subbands which form the lower frequency spectrum obtained by the MDCT unit (202) and (ii) a second parameter which specifies a gain of the lower subband after being copied.

232 citations

Patent
20 Jun 2007
TL;DR: A handheld device, e.g., a 3D pointing device, uses at least one sensor to detect motion of the handheld device; the detected motion can then be mapped into a desired output, e.g., cursor movement.
Abstract: Systems and methods according to the present invention address these needs and others by providing a handheld device, e.g., a 3D pointing device, which uses at least one sensor to detect motion of the handheld device. The detected motion can then be mapped into a desired output, e.g., cursor movement.

218 citations

Patent
29 Feb 2008
TL;DR: The speech encoding apparatus (220) has a first layer encoding section (2201), a first layer decoding section (2202), a delay section (2203), a subtracting section (104), a frequency domain transforming section (101), a second layer encoding section (105) and a multiplexing section (106).
Abstract: The speech encoding apparatus (220) has a first layer encoding section (2201), a first layer decoding section (2202), a delay section (2203), a subtracting section (104), a frequency domain transforming section (101), a second layer encoding section (105) and a multiplexing section (106).

186 citations

Patent
30 Dec 2008
TL;DR: An audio coding system comprising a linear prediction unit for filtering an input signal based on an adaptive filter; a transformation unit for transforming a frame of the filtered input signal into a transform domain; and a quantization unit for quantizing the transform domain signal.
Abstract: The present invention teaches a new audio coding system that can code both general audio and speech signals well at low bit rates. A proposed audio coding system comprises a linear prediction unit for filtering an input signal based on an adaptive filter; a transformation unit for transforming a frame of the filtered input signal into a transform domain; and a quantization unit for quantizing the transform domain signal. The quantization unit decides, based on input signal characteristics, to encode the transform domain signal with a model-based quantizer or a non-model-based quantizer. Preferably, the decision is based on the frame size applied by the transformation unit.

170 citations

References
Journal ArticleDOI
TL;DR: Two lapped transforms for subband/transform coding of signals are introduced: a version of the lapped orthogonal transform (LOT), which can be efficiently computed for any transform length; and the modulated lapped transform (MLT), which is based on a modulated quadrature mirror filter (QMF) bank.
Abstract: Two lapped transforms for subband/transform coding of signals are introduced: a version of the lapped orthogonal transform (LOT), which can be efficiently computed for any transform length; and the modulated lapped transform (MLT), which is based on a modulated quadrature mirror filter (QMF) bank. The MLT can also be efficiently computed by means of a type-IV discrete sine transform (DST-IV). The LOT and the MLT are both asymptotically optimal lapped transforms for coding an AR(1) signal with a high intersample correlation coefficient. The coding gains of the LOT and MLT of length M are higher than that of the discrete cosine transform (DCT) of the same length; they are actually close to the coding gains obtained with a DCT of length 2M. An MLT-based adaptive transform coder (ATC) for speech signals is simulated; the coder is essentially free from frame rate noise and has a better spectral resolution than DCT-based ATC systems.

513 citations

Patent
02 May 1996
TL;DR: A subband audio coder employs perfect/non-perfect reconstruction filters, predictive/non-predictive subband encoding, transient analysis, and psycho-acoustic/minimum mean-square-error (mmse) bit allocation over time, frequency and the multiple audio channels to encode/decode a data stream to generate high fidelity reconstructed audio.
Abstract: A subband audio coder employs perfect/non-perfect reconstruction filters, predictive/non-predictive subband encoding, transient analysis, and psycho-acoustic/minimum mean-square-error (mmse) bit allocation over time, frequency and the multiple audio channels to encode/decode a data stream to generate high fidelity reconstructed audio. The audio coder windows the multi-channel audio signal such that the frame size, i.e. number of bytes, is constrained to lie in a desired range, and formats the encoded data so that the individual subframes can be played back as they are received thereby reducing latency. Furthermore, the audio coder processes the baseband portion (0-24 kHz) of the audio bandwidth for sampling frequencies of 48 kHz and higher with the same encoding/decoding algorithm so that audio coder architecture is future compatible.

274 citations

Patent
11 Oct 1994
TL;DR: A method is disclosed for determining estimates of the perceived noise masking level of audio signals as a function of frequency; the noise spectrum is shaped based on a noise threshold and a tonality measure for each critical frequency band (Bark).
Abstract: A method is disclosed for determining estimates of the perceived noise masking level of audio signals as a function of frequency. By developing a randomness metric related to the Euclidean distance between (i) actual frequency component amplitude and phase for each block of sampled values of the signal and (ii) predicted values for these components based on values in prior blocks, it is possible to form a tonality index which provides more detailed information useful in forming the noise masking function. Application of these techniques is illustrated in a coding and decoding context for audio recording or transmission. The noise spectrum is shaped based on a noise threshold and a tonality measure for each critical frequency band (Bark).

258 citations

Journal ArticleDOI
TL;DR: This paper discusses a digital formulation of the phase vocoder, an analysis-synthesis system providing a parametric representation of a speech waveform by its short-time Fourier transform, designed to be an identity system in the absence of any parameter modifications.
Abstract: This paper discusses a digital formulation of the phase vocoder, an analysis-synthesis system providing a parametric representation of a speech waveform by its short-time Fourier transform. Such a system is of interest both for data-rate reduction and for manipulating basic speech parameters. The system is designed to be an identity system in the absence of any parameter modifications. Computational efficiency is achieved by employing the fast Fourier transform (FFT) algorithm to perform the bulk of the computation in both the analysis and synthesis procedures, thereby making the formulation attractive for implementation on a minicomputer.

240 citations

Patent
05 Dec 1996
TL;DR: Auxiliary data subband samples representing an auxiliary data signal are transported in a subband-coded compressed digital audio signal without decompressing the data, which is carried substantially inaudibly in the audio signal.
Abstract: Auxiliary data subband samples representing an auxiliary data signal (315) are transported in a subband-coded compressed digital audio signal without decompressing the data. A pre-existing packetized data stream (305) is provided to an input of an encoder (310). Subband audio samples (406) are extracted from the packet stream and normalized (408). The data to be transported modulates data carrier subbands (SPD0, SPD1, ---SPDN-1) including a pseudo-noise (PN) spread spectrum signal, each subband of which has a bandwidth corresponding to those of the digital audio signal. The modulated data carrier sequence is combined with the audio subband samples (SS1, SS2---SSN-1) to form a combined signal (452), then multiplexed (460) into the pre-existing packet stream (407). In the decoder (368), the combined signal is demodulated to recover the auxiliary data signal (672). The recovered auxiliary data signal is carried substantially inaudibly in the audio signal and is spectrally shaped according to the audio signal to enhance concealment.

220 citations