Showing papers on "Speech coding" published in 1990



Proceedings ArticleDOI
I.A. Gerson, M.A. Jasiuk
03 Apr 1990
TL;DR: The vector sum excited linear prediction speech coder is presented, and it utilizes a codebook with a structure that allows for a very efficient search procedure.
Abstract: The vector sum excited linear prediction (VSELP) speech coder is presented. It utilizes a codebook with a structure that allows for a very efficient search procedure. Other advantages of the VSELP codebook structure are discussed, and a detailed description of an 8-kb/s VSELP coder is given. This coder was selected by the Telecommunications Industry Association (TIA) as the standard for use in North American digital cellular telephone systems. The coder uses two VSELP excitation codebooks, a gain quantizer which is robust to channel errors, and a novel adaptive pre/postfilter arrangement.

288 citations


Journal ArticleDOI
Kuldip K. Paliwal, B. Atal
TL;DR: A split vector quantization approach is used to overcome the complexity problem of LPC vector quantization: the LPC vector is divided into two parts and each part is vector-quantized separately.
Abstract: Linear prediction coding (LPC) parameters are widely used in various speech processing applications for representing the spectral envelope information of speech. For low-bit-rate speech coding applications, it is important to quantize these parameters accurately using as few bits as possible without sacrificing speech quality. Though vector quantizers are more efficient than scalar quantizers, their use for fine quantization of LPC information (using 24–26 bits/frame) is impeded by their prohibitively high complexity. In this paper, a split vector quantization approach is used to overcome the complexity problem. Here, the LPC vector is divided into two parts and each part is vector-quantized separately. The splitting of the LPC vector is studied in the following three domains: (1) line spectral-pair frequency (LSF), (2) arc-sine reflection coefficient, and (3) log area ratio. Splitting in the LSF domain is found to be the best. Using the localized spectral properties of the LSF parameters, a weigh...

211 citations
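
The split vector quantization idea in the abstract above can be illustrated with a minimal sketch. It assumes a plain squared-error distance (the abstract is truncated before its distance measure is described, so none is reproduced here), a hypothetical split point, and toy codebooks. The names `nearest_index`, `split_vq_encode`, and `split_vq_decode` are illustrative and do not come from the paper.

```python
def nearest_index(vector, codebook):
    """Index of the codeword with the smallest squared error to `vector`."""
    def sq_err(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))


def split_vq_encode(lsf, split_at, low_codebook, high_codebook):
    """Quantize the two parts of the vector independently of each other."""
    low, high = lsf[:split_at], lsf[split_at:]
    return nearest_index(low, low_codebook), nearest_index(high, high_codebook)


def split_vq_decode(indices, low_codebook, high_codebook):
    """Concatenate the two selected codewords to rebuild the vector."""
    i, j = indices
    return list(low_codebook[i]) + list(high_codebook[j])


# Toy 4-dimensional example with a hypothetical split after 2 coefficients.
low_cb = [[0.1, 0.2], [0.3, 0.5]]
high_cb = [[0.6, 0.8], [0.7, 0.9]]
indices = split_vq_encode([0.28, 0.52, 0.61, 0.79], 2, low_cb, high_cb)
reconstructed = split_vq_decode(indices, low_cb, high_cb)
```

The complexity saving comes from the search size: if 24 bits were split evenly (an assumed 12 + 12 allocation), the encoder would search 2 x 4,096 = 8,192 codewords instead of 2^24 = 16,777,216.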


Proceedings ArticleDOI
P. Kroon, Bishnu S. Atal
03 Apr 1990
TL;DR: A first-order pitch predictor is described whose delay is specified as an integer number of samples plus a fraction of a sample at the current sampling rate, which has a better performance than conventional multiple coefficient predictors and leads to more efficient coding of the predictor parameters.
Abstract: A first-order pitch predictor is described whose delay is specified as an integer number of samples plus a fraction of a sample at the current sampling rate. This realization has a better performance than conventional multiple coefficient predictors and leads to more efficient coding of the predictor parameters. Also discussed is the application of noninteger delay pitch predictors to low-bit-rate speech coding.

208 citations
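
A first-order predictor with a noninteger delay, as described in the abstract above, can be sketched as follows. Linear interpolation between neighbouring samples is an assumption made for brevity and is not claimed to be the interpolation used in the paper. The function names are illustrative.

```python
import math


def delayed_sample(signal, n, delay):
    """Approximate signal[n - delay] for a delay of whole + fractional samples.

    Assumption: linear interpolation between the two neighbouring samples.
    For simplicity one extra past sample is always required.
    """
    whole = math.floor(delay)
    frac = delay - whole
    newer_idx = n - whole        # sample just after the target instant
    older_idx = newer_idx - 1    # sample just before the target instant
    if older_idx < 0:
        raise ValueError("not enough past samples for this delay")
    return (1.0 - frac) * signal[newer_idx] + frac * signal[older_idx]


def predict(signal, n, delay, gain):
    """First-order pitch prediction: one gain times one delayed sample."""
    return gain * delayed_sample(signal, n, delay)


# On a linear ramp, linear interpolation is exact, which makes checking easy.
ramp = [float(i) for i in range(20)]
value = predict(ramp, n=10, delay=2.5, gain=1.0)
```

Only one gain and one (fractional) delay need to be transmitted per update, in contrast to the several coefficients of a multi-tap predictor.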


PatentDOI
Kazunori Ozawa
TL;DR: A speech coding method is presented in which spectrum parameters representing a spectrum envelope and a pitch parameter representing a pitch are obtained from an input discrete speech signal, and a frame interval is divided into subintervals in accordance with the pitch parameter.
Abstract: A speech coding method in which a spectrum parameter representing a spectrum envelope and a pitch parameter representing a pitch are obtained from an input discrete speech signal. A frame interval is divided into subintervals in accordance with the pitch parameter. A sound source signal in one of the subintervals is obtained by obtaining a multipulse with respect to a difference signal obtained by performing prediction on the basis of a past sound source signal. Correction information for correcting at least one of the amplitude and the phase of the sound source signal is obtained and output in other pitch intervals in the frame.

183 citations


PatentDOI
TL;DR: In this paper, a speech coding system is presented which recursively executes a filter-applied "Toeplitz characteristic" by causing a drive signal (i.e., an excitation signal) to be converted into a "Toeplitz matrix" when detecting a pitch period in which distortion of the input vector and the vector subsequent to the application of filter-applied computation to the drive signal vector in the pitch forecast called either "closed loop" or "compatible code book" is minimized.
Abstract: This invention provides a novel speech coding system which recursively executes a filter-applied "Toeplitz characteristic" by causing a drive signal (i.e., an excitation signal) to be converted into a "Toeplitz matrix" when detecting a pitch period in which distortion of the input vector and the vector subsequent to the application of filter-applied computation to the drive signal vector in the pitch forecast called either "closed loop" or "compatible code book" is minimized. The vector quantization method substantially making up the speech coding system of the invention is characteristically used by the system.

181 citations


Proceedings ArticleDOI
03 Apr 1990
TL;DR: A pitch estimation criterion is derived that is inherently unambiguous, uses pitch-adaptive resolution, uses small-signal suppression to provide enhanced discrimination, and uses amplitude compression to eliminate the effects of pitch-formant interaction.
Abstract: A technique for estimating the pitch of a speech waveform is developed. It fits a harmonic set of sine waves to the input data using a mean-squared-error (MSE) criterion. By exploiting a sinusoidal model for the input speech waveform, a pitch estimation criterion is derived that is inherently unambiguous, uses pitch-adaptive resolution, uses small-signal suppression to provide enhanced discrimination, and uses amplitude compression to eliminate the effects of pitch-formant interaction. The normalized minimum mean squared error proves to be a powerful discriminant for estimating the likelihood that a given frame of speech is voiced.

145 citations


Journal ArticleDOI
TL;DR: Special fast procedures for the code excited linear predictive coding (CELP) algorithm have been developed to make implementation on modest hardware possible and their storage requirement and numerical accuracy are discussed.
Abstract: Special fast procedures for the code excited linear predictive coding (CELP) algorithm have been developed to make implementation on modest hardware possible. The advantages, as well as the disadvantages, of the various fast procedures are discussed. A general formalism for the algorithm is developed, followed by the discussion of the individual procedures which are grouped according to their features. Along with the computational complexity of each procedure, its storage requirement and numerical accuracy are discussed. A large number of the fast procedures are designed to search through a particular type of codebook (most of the codebooks are stochastic in character, while a few are deterministic). Other fast procedures can be used for arbitrary codebooks and are thus also applicable to trained codebooks. Some of the fast procedures designed for stochastic codebooks can also be used for the computation of the closed pitch loop parameters, which can be interpreted as a search through a time-dependent codebook.

112 citations


PatentDOI
TL;DR: A speech coder employs vector quantization of LPC parameters, interpolation, and trellis coding for improved speech coding at low bit rates (400 bps).
Abstract: A speech coder employs vector quantization of LPC parameters, interpolation, and trellis coding for improved speech coding at low bit rates (400 bps). The speech coder has an LPC analysis module for converting input speech to LPC parameters, an LSP conversion module for converting LPC parameters into line spectrum frequencies (LSP) data, and a vector quantization and interpolation (VQ/I) module for encoding the LSP data into vector indexes for transmission by applying LPC spectral amplitude as weighting coefficients to the LSP data. The VQ/I module outputs one vector index for every two LPC frames in order to reduce the transmission bit rate, and the omitted frames are interpolated on the receiving end. A decoder correspondingly decodes incoming indexes to LPC parameters and synthesizes them into output speech. Trellis coders with an adaptive tracking function encode the pitch and gain parameters of the LPC frames. A universal codebook stores codewords according to a plurality of accents. The speech coder automatically identifies a speaker's accent and selects the corresponding vocabulary of codewords in order to more intelligibly encode and decode the speaker's speech.

101 citations
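
The abstract above states that one vector index is sent for every two LPC frames and that the omitted frames are interpolated at the receiving end. A minimal sketch of that receiver-side step follows. It assumes simple midpoint (linear) interpolation between neighbouring decoded LSP vectors, which the abstract does not specify, and `fill_omitted_frames` is a hypothetical name.

```python
def fill_omitted_frames(received):
    """Rebuild a full frame sequence from every-other-frame LSP vectors.

    Assumption: each omitted frame is the midpoint of its two decoded
    neighbours. The patent's actual interpolation rule is not given here.
    """
    if not received:
        return []
    frames = []
    for prev, nxt in zip(received, received[1:]):
        frames.append(list(prev))
        frames.append([(a + b) / 2.0 for a, b in zip(prev, nxt)])
    frames.append(list(received[-1]))
    return frames


full = fill_omitted_frames([[0.0, 1.0], [2.0, 3.0]])
```

Sending every other frame halves the number of spectral indexes on the channel, at the cost of smoothing over any rapid spectral change that falls in an omitted frame.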


Journal ArticleDOI
TL;DR: In this article, five approaches that can be used to control and simplify the speech recognition task are examined: isolated words, speaker-dependent systems, limited vocabulary size, a tightly constrained grammar, and quiet and controlled environmental conditions.
Abstract: Five approaches that can be used to control and simplify the speech recognition task are examined. They entail the use of isolated words, speaker-dependent systems, limited vocabulary size, a tightly constrained grammar, and quiet and controlled environmental conditions. The five components of a speech recognition system are described: a speech capture device, a digital signal processing module, preprocessed signal storage, reference speech patterns, and a pattern-matching algorithm. Current speech recognition systems are reviewed and categorized. Speaker recognition approaches and systems are also discussed.

87 citations


PatentDOI
Danny Thomas Pinckley
TL;DR: An automatic gain control circuit uses a speech recognizer to obtain smooth automatic gain control, and AGC is not used until it is required (i.e., when speech is present).
Abstract: An automatic gain control circuit uses a speech recognizer to obtain smooth automatic gain control. An analog audio input signal is converted to a digital signal by an analog-to-digital converter and delayed by a delay circuit. A frame power (or alternatively, rectified peak amplitude) detector determines the power of each frame (or alternatively, the rectified peak amplitude) of the audio input signal, after it has been applied to the A/D converter. A linear-to-log converter converts those values to a logarithmic form (for gain control over a broad range of values). A detected speech smoothing circuit smooths the variation in the values determined by the frame power (or peak amplitude) detector. A summer subtracts the output of the detected speech smoothing means from a fixed reference level, and thus obtains an error signal from the desired reference. A gain smoothing circuit smooths the resulting error signal (which is the logarithmically-shaped gain signal). A logarithm-to-linear converter converts the logarithmic gain signal to a linear form; and a multiplier multiplies the input signal by this smoothed gain. In accordance with the invention, a speech recognizer determines whether the audio input signal represents speech. An output of the speech recognizer is used to enable the detected speech smoothing circuit and the gain smoothing means when the audio input signal represents speech. Thus AGC is not used until it is required (i.e., when speech is present).
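
The signal chain in the abstract above (frame power, linear-to-log conversion, level smoothing, subtraction from a reference, gain smoothing, log-to-linear conversion, all gated by a speech decision) can be sketched as below. This is a rough illustration under stated assumptions: both smoothers are first-order recursive filters with a shared coefficient `alpha`, the speech decision is supplied as a list of booleans, and the reference level and initial values are arbitrary. None of these specifics come from the patent.

```python
import math


def agc_gains(frame_powers, is_speech, ref_db=-20.0, alpha=0.5,
              initial_level_db=-20.0):
    """Return one linear amplitude gain per frame.

    Both smoothers update only on frames flagged as speech, so the gain
    is held constant during non-speech frames.
    """
    level_db = initial_level_db   # smoothed detected speech level (log domain)
    gain_db = 0.0                 # smoothed gain (log domain)
    gains = []
    for power, speech in zip(frame_powers, is_speech):
        if speech:
            frame_db = 10.0 * math.log10(max(power, 1e-12))  # linear-to-log
            level_db += alpha * (frame_db - level_db)         # level smoothing
            error_db = ref_db - level_db                      # summer
            gain_db += alpha * (error_db - gain_db)           # gain smoothing
        gains.append(10.0 ** (gain_db / 20.0))                # log-to-linear
    return gains


# Loud speech (0 dB frame power) followed by non-speech frames.
gains = agc_gains([1.0] * 60 + [1e-6] * 5, [True] * 60 + [False] * 5)
```

Each input sample in a frame would then be multiplied by that frame's gain. Because updates are gated, quiet background frames do not drive the gain upward between utterances.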

PatentDOI
TL;DR: In this article, the search complexity in finding the best codeword is greatly reduced by bringing the search back to the algebraic code domain, thereby allowing the sparsity of the codebook to speed up the necessary computations.
Abstract: A method of encoding a speech signal is presented. This method improves the excitation codebook and search procedure of the conventional Code Excited Linear Prediction (CELP) speech encoders. Use is made of a dynamic codebook (201, 202) based on the combination of two modules: a sparse algebraic code generator (201) associated to a filter (202) having a transfer function varying in time. The generator (201) is a structured codebook with codewords having very few non zero components. The filter (202) shapes the spectral characteristics whereby the resulting excitation codebook (201, 202) exhibits favorable perceptual properties. The search complexity in finding the best codeword is greatly reduced by bringing the search back to the algebraic code domain thereby allowing the sparsity of the algebraic code to speed up the necessary computations.

Proceedings ArticleDOI
Juin-Hwey Chen
03 Apr 1990
TL;DR: A high-quality 16-kb/s speech coder which has a one-way coding delay of less than 2 ms is presented and formal subjective tests indicate that this coder produces high-quality speech comparable to that of the CCITT G.721 32-kb/s ADPCM standard.
Abstract: A high-quality 16-kb/s speech coder which has a one-way coding delay of less than 2 ms is presented. The coder is basically a backward-adaptive version of the code-excited linear prediction (CELP) coder. The low coding delay is achieved by using backward-adaptive predictor and gain and by using an excitation vector size as small as five samples. The pitch predictor in conventional CELP coders is eliminated, and the linear predictive coding (LPC) predictor order is increased from 10 to 50. The excitation gain is updated by a tenth-order adaptive logarithmic gain predictor. This log-gain predictor and the LPC predictor are updated by performing LPC analysis on previous log-gain and coded speech, respectively. The excitation codebook is closed-loop optimized and the codebook index is Gray-coded to improve the robustness against channel errors. Formal subjective tests indicate that this coder produces high-quality speech comparable to that of the CCITT G.721 32-kb/s ADPCM standard.
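
Two figures in the abstract above lend themselves to a short worked example: the time spanned by a five-sample excitation vector, and the Gray coding of the codebook index. The sketch assumes an 8-kHz sampling rate, which the abstract does not state, and uses the standard binary-reflected Gray code as a stand-in for whatever index assignment the coder actually uses.

```python
SAMPLE_RATE_HZ = 8000   # assumption: telephone-band sampling, not in the abstract
BIT_RATE_BPS = 16000    # from the abstract
VECTOR_SIZE = 5         # samples per excitation vector, from the abstract

# Time spanned by one excitation vector, and the bits available to code it.
vector_duration_ms = 1000.0 * VECTOR_SIZE / SAMPLE_RATE_HZ
bits_per_vector = BIT_RATE_BPS * VECTOR_SIZE // SAMPLE_RATE_HZ


def to_gray(index):
    """Binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def from_gray(code):
    """Inverse of to_gray."""
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index
```

Under the 8-kHz assumption, each vector spans 0.625 ms and is coded with 10 bits, which is consistent with the sub-2-ms delay figure quoted in the abstract. With a binary-reflected Gray code, consecutive indexes differ in exactly one bit, so if neighbouring indexes are assigned to similar codevectors, some single-bit channel errors select a nearby codevector rather than an unrelated one.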

Journal ArticleDOI
TL;DR: The performance of predictive TCQ (PTCQ) is compared to that of other waveform coders, and the effects of channel errors on PTCQ performance are discussed.
Abstract: Trellis-coded quantization (TCQ) is incorporated into a predictive coding structure for encoding sampled speech. The modest complexity of the resulting structure is seen to be a direct consequence of the TCQ formulation. Simulation results are presented for systems using fixed-prediction/fixed-residual encoding, fixed-prediction/adaptive-residual encoding, and adaptive-prediction/adaptive-residual encoding. The performance of predictive TCQ (PTCQ) is compared to that of other waveform coders, and the effects of channel errors on PTCQ performance are discussed. For a fully adaptive 16-kb/s speech coding system, segmental signal-to-noise ratios in the range of 19.1-21.9 dB are obtained for a variety of speakers and test sentences. Reconstructed speech obtained from this system is of excellent communication quality.
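
The segmental signal-to-noise ratio quoted in the abstract above is an average of per-frame SNRs expressed in dB. A minimal sketch of one common way to compute it follows. The frame length and the handling of silent or error-free frames are assumptions, since the abstract does not give the exact measurement procedure.

```python
import math


def segmental_snr_db(reference, decoded, frame_len):
    """Mean of per-frame SNRs in dB over whole frames of `frame_len` samples."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal == 0.0 or noise == 0.0:
            continue  # skip silent or perfectly coded frames (a simplification)
        snrs.append(10.0 * math.log10(signal / noise))
    if not snrs:
        raise ValueError("no frame with measurable signal and noise")
    return sum(snrs) / len(snrs)


# Two 4-sample frames, each with a 10% amplitude error.
snr = segmental_snr_db([1.0] * 8, [0.9] * 8, frame_len=4)
```

Averaging in the log domain gives quiet frames the same weight as loud ones, which is why segmental SNR is often preferred to a single overall SNR for speech.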

Proceedings ArticleDOI
03 Apr 1990
TL;DR: In this paper, a scheme for long-term prediction in CELP (code-excited linear predictive) coding using fractional delay prediction is discussed, which permits a more accurate representation of voiced speech and achieves an improvement of synthetic quality for female speakers.
Abstract: A scheme is discussed for long-term prediction in CELP (code-excited linear predictive) coding using fractional delay prediction. This technique permits a more accurate representation of voiced speech and achieves an improvement of synthetic quality for female speakers. The higher complexity of this type of predictor relative to the classical one is its major disadvantage. Suboptimal schemes in which the search for the fractional pitch delay is restricted to a neighborhood of an integer pitch estimate can be envisaged to decrease the computational load.

Journal ArticleDOI
TL;DR: A combined subband speech coding (SBC), Bose-Chaudhuri-Hocquenghem (BCH) error-correction coding, and 16-level quadrature amplitude modulation (16-QAM) scheme with switched diversity and speech postenhancement is proposed.
Abstract: A combined subband speech coding (SBC), Bose-Chaudhuri-Hocquenghem (BCH) error-correction coding, and 16-level quadrature amplitude modulation (16-QAM) scheme with switched diversity and speech postenhancement is proposed. The system's performance is dramatically improved by deploying some degree of fade tracking capability over fading channels. Further quality enhancement accrues by using appropriate mapping between the SBC speech codec and the Gray coded QAM words. Various BCH codes are utilized to adequately match the error-correcting power to the perceptual importance of the SBC bits. One of the proposed systems operates at 7 kBd and yields good communications-quality speech for channel signal-to-noise ratios (SNRs) in excess of 20 dB and encounters a maximum overall system delay of 55.125 ms. A more complex arrangement uses second-order switched diversity to reduce the channel SNR required to around 16 dB and the transmission rate to 5 kBd when the vehicular speed is 30 mph while the system delay is unchanged at 55.125 ms.

Patent
05 Nov 1990
TL;DR: In this article, a signal coding apparatus coupled to a receiver having a receiver buffer and a decoder, includes a coding unit for coding a signal and outputting information generated in a frame unit, the information being a coded signal.
Abstract: A signal coding apparatus, which is coupled, via a transmission path, to a receiver having a receiver buffer and a decoder, includes a coding unit for coding a signal and outputting information generated in a frame unit, the information being a coded signal. The apparatus also includes a transmitter buffer for temporarily storing the information, and a controller for controlling an amount of the information on the basis of a storage capacity of the receiver buffer and an amount of the information which is contained in a frame per a unit time. There is also provided a method used in the above coding apparatus, and a signal coding transmission system employing the signal coding apparatus.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: The hierarchical lapped transform has a much lower computational complexity than a tree-structured QMF filter bank, and so with HLTs a much larger number of bands can be used in practice.
Abstract: The hierarchical lapped transform (HLT) is defined. It is based on the modulated lapped transform (MLT). The HLT has a much lower computational complexity than a tree-structured QMF filter bank, and so with HLTs a much larger number of bands can be used in practice. The coding gain of HLTs is close to that of a full-length MLT, for the same number of bands, and therefore there is no significant loss of coding efficiency. With HLTs transient signals can be better reconstructed than with nonhierarchical transforms, as demonstrated by speech and image coding examples. In image coding applications, the HLT can also be used for progressive transmission.

PatentDOI
Fumio Amano, Tomohiko Taniguchi, Yoshinori Tanaka, Yasuji Ota, Shigeyuki Unagami
TL;DR: In this paper, a speech coding apparatus which selects an optimum code from a code book is presented, the optimum code giving the minimum magnitude of error signal between the input signal and the reproduced signal obtained by a filter calculation using a linear prediction parameter from a linear predictive analysis unit with respect to the codes of the code data, wherein use is made, as the codes, of a code formed by thinning to 1/M (M being an integer of two or more) the plurality of sampling values constituting the codes.
Abstract: A speech coding apparatus which selects an optimum code from a code book (21), the optimum code giving the minimum magnitude of error signal between the input signal and the reproduced signal obtained by a filter calculation using a linear prediction parameter from a linear predictive analysis unit (10) with respect to the codes of the code data, wherein use is made, as the codes, of a code formed by thinning to 1/M (M being an integer of two or more) the plurality of sampling values constituting the codes. To compensate for the deterioration of the quality of the reproduced signal caused by thinning the sampling values in this way, an additional linear predictive analysis unit (20) is further introduced and use made of an amended linear prediction parameter instead of the linear prediction parameter.

PatentDOI
TL;DR: In this article, a speech coding apparatus coupled to a transmission channel includes m (m is an integer greater than 1) coders, m decoders and m or (m-1) error-correcting coders.
Abstract: A speech coding apparatus coupled to a transmission channel includes m (m is an integer greater than 1) coders, m decoders and m or (m-1) error correcting coders. The apparatus also includes an evaluation unit which evaluates a quality of each of reproduced speech signals from the input speech signal and the reproduced speech signals and which outputs an evaluated quality of each of the reproduced speech signals. The quality of each of the reproduced speech signals is evaluated in a state having no transmission error. A decision unit identifies one of the m coders which provides the reproduced speech signal having a smallest distortion on the basis of the evaluated quality of each of the reproduced speech signals, a current error rate of the transmission channel and error correcting abilities of the error correcting coders, and generates a coder identification number representative of a selected one of the m coders. An output part outputs a multiplexed transmission signal including the coded speech signal generated by the one of the m coders identified by the decision unit and the error correcting code generated by a corresponding one of the m error correcting coders.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: It is observed that recognition scores cannot necessarily be improved by reducing the variability for one specific parameter, and recognition scores are not directly related to the increase of the vocal effort and cannot be predicted from speech variability.
Abstract: The purpose of this study was (1) to determine what are the acoustic-phonetic differences between speech produced in quiet and speech produced in noise (Lombard speech) and (2) to evaluate the influence of these differences on human listeners and automatic speech recognizers. The acoustical analyses, done at the phonetic level on about 40 parameters, showed significant differences in variability for male and female speakers. In addition to replicating previous studies, the authors investigated more parameters, and examined the influence of the Lombard effect on female speakers. Perceptual experiments, run for foreign listeners, exhibited a decrease of the intelligibility for some confusable subsets of the vocabulary studied. The findings are correlated with the performance of a DTW-based recognizer, and it is observed that recognition scores cannot necessarily be improved by reducing the variability for one specific parameter. It is also found that recognition scores are not directly related to the increase of the vocal effort and cannot be predicted from speech variability.

Proceedings Article
01 Jan 1990
TL;DR: In this paper, the authors report on the use of the codebook-excited linear-predictive (CELP) algorithm for 32 kb/s low-delay (LD) coding of wideband speech.
Abstract: The authors report on the use of the codebook-excited linear-predictive (CELP) algorithm for 32 kb/s low-delay (LD-CELP) coding of wideband speech. The main problem associated with wideband coding, namely, spectral noise weighting, is discussed. The authors propose an enhanced noise weighting technique and demonstrate its efficiency via subjective listening tests. In these tests, involving 20 listeners and 8 test sentences, the average rating for the proposed 32 kb/s LD-CELP was essentially equal to that of the 64 kb/s standard (G.722) CCITT wideband coder.

Journal ArticleDOI
TL;DR: Digital speech technology is reviewed, with the emphasis on applications demanding high-quality reproduction of the speech signal, which include the important subclass of wideband speech.
Abstract: Digital speech technology is reviewed, with the emphasis on applications demanding high-quality reproduction of the speech signal. Examples of such applications are network telephony, ISDN terminals for audio teleconferencing, and systems for the storage of audio signals, which include the important subclass of wideband speech. Depending on the application, the bandwidth of input speech can vary from about 3 kHz to nearly 20 kHz. Coding for digital telephony at 4 and 8 kb/s, network quality coding at 16 kb/s, and coding for audio at 7 and 20 kHz are examined. Future directions in the field are discussed with respect to anticipated technology applications and the algorithms needed to support these technologies.

Proceedings ArticleDOI
C.-E.W. Sundberg, N. Seshadri
02 Dec 1990
TL;DR: The North American system is compared to the pan-European digital GSM (Groupe Speciale Mobile) system, and techniques that may be used to further improve the system capacity of future digital cellular systems beyond the current standard are discussed.
Abstract: Standards for a new cellular mobile radio system for North America are currently being defined. The system will use digital transmission in contrast to the present analog system. Capacity is increased by means of three techniques. These are: sending three digital voice channels in one current analog FM channel (maintaining spectral compatibility), increased trunking efficiency, and exploiting improved frequency reuse offered by robust digital transmission techniques. The main elements of the system, such as multiple access digital modulation, speech coding, channel coding, and equalization, are briefly discussed. The North American system is compared to the pan-European digital GSM (Groupe Speciale Mobile) system, and techniques that may be used to further improve the system capacity of future digital cellular systems beyond the current standard are discussed.

Proceedings ArticleDOI
Y. Mahieux, J.P. Petit
02 Dec 1990
TL;DR: The coding of high-quality sound at 64 kb/s is of interest for applications such as ISDN, and the algorithm described allows the reduction to such a bit rate while maintaining the original quality.
Abstract: The coding of high-quality sound at 64 kb/s is of interest for applications such as ISDN. The algorithm described allows the reduction to such a bit rate while maintaining the original quality. It is based on transform coding, and uses a time-domain aliasing cancellation (TDAC) transformation. Perceptual properties and the interblock redundancy of the spectrum are involved when coding the transform coefficients. The complexity of the algorithm allows its real-time implementation on a single floating-point digital signal processor, such as the AT&T DSP 32C. The performance and subjective results of the coding system are discussed.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: The proposed coding method significantly increases the quality of the 4.8-kb/s CELP coder at the cost of an additional 5-ms coding delay, and the optimum combined parameter sequences are selected to minimize global quantization distortion over the coding frame.
Abstract: A 4.8-kb/s delayed decision code excited linear prediction (CELP) coder that uses tree coding is described. In conventional CELP coding, short-term and long-term prediction parameters as well as excitation parameters are sequentially determined. In the proposed delayed decision CELP coding, a tree coding method is utilized. The long-term prediction and excitation parameter candidates obtained in each subframe are listed as a tree and the optimum combined parameter sequences are selected to minimize global quantization distortion over the coding frame. The proposed coding method significantly increases the quality of the 4.8-kb/s CELP coder at the cost of an additional 5-ms coding delay.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A real-time, single digital signal processing (DSP) chip implementation of a 2.4, 4.8, and 8.0-kb/s improved multiband excitation (IMBE) vocoder is presented, and it is shown to generate high-quality speech under both clean and noisy conditions.
Abstract: A real-time, single digital signal processing (DSP) chip implementation of a 2.4-, 4.8-, and 8.0-kb/s improved multiband excitation (IMBE) vocoder is presented. The IMBE vocoder is based on the MBE speech model, and it is shown to generate high-quality speech under both clean and noisy conditions. In addition, the IMBE vocoder is well suited for real-time implementation since it does not require excessive computation or storage. Full-duplex operation is demonstrated using a single AT&T WE DSP 32. Aspects of the hardware architecture, algorithm implementation, and system performance are addressed.

Proceedings ArticleDOI
02 Dec 1990
TL;DR: Concepts for improvement of the coding algorithms are discussed which might be the basis for future ISO activities aiming at a bit rate of only 2*64 kb/s for a stereo sound signal.
Abstract: An ISO audio coding standard is being developed that will provide an audio quality comparable to that of a compact disc using a reduced bit rate of about 2*128 kb/s for a stereo sound signal instead of 2*706 kb/s. Four coding algorithms have been considered in order to develop the audio coding standard. Two of these coding algorithms have been tested and are outlined. The ASPEC algorithm uses a modified discrete cosine transform with overlapping blocks and dynamic windowing in order to map the input samples into frequency coefficients. The MUSICAM algorithm uses a subband analysis filter bank with 32 equally spaced subbands to map the input samples into frequency coefficients. Concepts for improvement of the coding algorithms are discussed which might be the basis for future ISO activities aiming at a bit rate of only 2*64 kb/s for a stereo sound signal.
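
The bit rates in the abstract above can be checked with a little arithmetic. The sketch below uses the compact disc format's 44.1-kHz sampling rate and 16-bit samples, which are not stated in the abstract but account for the rounded 706 kb/s per-channel figure. The variable and function names are illustrative.

```python
CD_SAMPLE_RATE_HZ = 44100   # compact disc sampling rate
CD_BITS_PER_SAMPLE = 16     # compact disc word length

# Uncompressed rate per channel in kb/s (the abstract rounds this to 706).
cd_rate_kbps = CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE / 1000.0


def compression_ratio(target_kbps):
    """Per-channel compression factor relative to uncompressed CD audio."""
    return cd_rate_kbps / target_kbps


ratio_128 = compression_ratio(128.0)   # target described in the abstract
ratio_64 = compression_ratio(64.0)     # future target mentioned in the abstract
```

The 2*128 kb/s target therefore corresponds to roughly a 5.5:1 reduction per channel, and the 2*64 kb/s goal to roughly 11:1.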

Journal ArticleDOI
TL;DR: A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general.
Abstract: The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: An approach to wideband digital audio compression of CD-quality signals at data rates of 128 kb/s per channel and below is presented, a form of adaptive transform coding that features a nonuniform frequency division and coding scheme to exploit known characteristics of human perception.
Abstract: An approach to wideband digital audio compression of CD-quality signals at data rates of 128 kb/s per channel and below is presented. A form of adaptive transform coding, this technique features a nonuniform frequency division and coding scheme to exploit known characteristics of human perception. The algorithm has low computational complexity and can be adapted for use at other bit rates. A windowed overlap-add process is used with the forward/inverse transforms, which have been efficiently implemented using FFTs. Transform coefficients are converted into a subband block-companded format consisting of exponent words and associated mantissas, which are then coded with an adaptive quantizer. A real-time, single-chip programmable digital signal processing (DSP) implementation encodes 48-kHz-sampled stereo audio signals at a variety of bit rates. At 128 kb/s, the coder's subjective performance is appropriate for highest-quality 15-kHz professional audio applications.