Showing papers on "Speech coding" published in 1988

Journal Article•DOI•

Efficient bit allocation for an arbitrary set of quantizers (speech coding)

Y. Shoham¹, Allen Gersho²•Institutions (2)

Bell Labs¹, University of California, Santa Barbara²

01 Sep 1988-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: In this article, a bit allocation algorithm that is capable of efficiently allocating a given quota of bits to an arbitrary set of different quantizers is proposed, which produces an optimal or very nearly optimal allocation, while allowing the set of admissible bit allocation values to be constrained to nonnegative integers.

Abstract: A bit allocation algorithm that is capable of efficiently allocating a given quota of bits to an arbitrary set of different quantizers is proposed. This algorithm is useful in any coding scheme which uses bit allocation or, more generally, codebook allocation. It produces an optimal or very nearly optimal allocation, while allowing the set of admissible bit allocation values to be constrained to nonnegative integers. It is particularly useful in cases where the quantizer performance versus rate is irregular and changing in time, a situation that cannot be handled by conventional allocation algorithms.

822 citations
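The allocation problem described in the entry above can be illustrated with a small Lagrangian sketch. This is not the authors' published procedure: the function name `allocate_bits`, the table layout `rd_tables`, and the bisection on the multiplier are assumptions chosen for illustration. Each quantizer independently picks the integer bit count that minimizes its distortion plus a per-bit price, and the price is adjusted until the total fits the budget.

```python
def allocate_bits(rd_tables, budget, iters=60):
    """Integer bit allocation by bisection on a Lagrange multiplier.

    rd_tables[i][b] is the (nonnegative) distortion of quantizer i
    when it is given b bits, for b = 0 .. len(rd_tables[i]) - 1.
    Returns one bit count per quantizer with sum <= budget.
    """
    def pick(lam):
        # Each quantizer minimizes D_i(b) + lam * b on its own.
        return [min(range(len(t)), key=lambda b: t[b] + lam * b)
                for t in rd_tables]

    lo = 0.0
    hi = max(t[0] for t in rd_tables) + 1.0  # this price forces b = 0 everywhere
    best = pick(hi)
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        alloc = pick(lam)
        if sum(alloc) > budget:
            lo = lam                 # too many bits: raise the price per bit
        else:
            hi, best = lam, alloc    # feasible: keep it and try a lower price
    return best


# Hypothetical rate-distortion tables for two quantizers, 0 to 3 bits each.
tables = [[16.0, 4.0, 1.0, 0.25], [4.0, 1.0, 0.25, 0.0625]]
print(allocate_bits(tables, budget=3))
```

A pure multiplier sweep only reaches allocations on the convex hull of the rate-distortion points, so it may leave part of the budget unused when the tables are irregular; handling that case is where a more careful algorithm is needed.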

Journal Article•DOI•

LPC speech coding based on variable-length segment quantization

Y. Shiraki, M. Honda

01 Sep 1988-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A low-bit-rate linear predictive coder (LPC) based on variable-length segment quantization is presented, and its performance for voice coding is compared to that of fixed-length segment quantization and vector quantization.

Abstract: A low-bit-rate linear predictive coder (LPC) that is based on variable-length segment quantization is presented. In this vocoder, the speech spectral-parameter sequence is represented as the concatenation of variable-length spectral segments generated by linearly time-warping fixed-length code segments. Both the sequence of code segments and the segment lengths are efficiently determined using a dynamic programming procedure. This procedure minimizes the spectral distance measured between the original and the coded spectral sequence in a given interval. An iterative algorithm is developed for designing fixed-length code segments for the training spectral sequence. It updates the segment boundaries of the training spectral sequence using an a priori codebook and updates the codebook using these segment sequences. The convergence of this algorithm is discussed theoretically and experimentally. In experiments, the performance of variable-length segment quantization for voice coding is compared to that of fixed-length segment quantization and vector quantization.

209 citations
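The dynamic programming step described in the entry above can be sketched in a few lines. This is a simplified illustration rather than the authors' coder: scalar values stand in for spectral vectors, squared error stands in for the spectral distance, the warp uses nearest-neighbour sampling, and the names `warp` and `segment_quantize` are hypothetical.

```python
def warp(code, n):
    """Linearly time-warp a fixed-length code segment to n frames
    (nearest-neighbour sampling along the time axis)."""
    last = len(code) - 1
    if n == 1:
        return [code[0]]
    return [code[round(j * last / (n - 1))] for j in range(n)]


def segment_quantize(x, codebook, min_len, max_len):
    """Jointly choose segment lengths and code indices minimizing total
    squared error. Assumes len(x) can be tiled by the allowed lengths."""
    T = len(x)
    INF = float("inf")
    best = [0.0] + [INF] * T   # best[t]: minimum distortion for x[:t]
    back = [None] * (T + 1)    # back[t]: (segment length, code index)
    for t in range(1, T + 1):
        for n in range(min_len, min(max_len, t) + 1):
            if best[t - n] == INF:
                continue
            seg = x[t - n:t]
            for ci, code in enumerate(codebook):
                d = sum((a - b) ** 2 for a, b in zip(seg, warp(code, n)))
                if best[t - n] + d < best[t]:
                    best[t] = best[t - n] + d
                    back[t] = (n, ci)
    # Trace the chosen segments back from the end of the sequence.
    segments, t = [], T
    while t > 0:
        n, ci = back[t]
        segments.append((n, ci))
        t -= n
    return best[T], segments[::-1]


def decode(segments, codebook):
    """Rebuild the coded sequence from (length, code index) pairs."""
    out = []
    for n, ci in segments:
        out.extend(warp(codebook[ci], n))
    return out
```

The codebook design loop mentioned in the abstract would alternate this segmentation with a codebook update on training data; that part is not shown here.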

Proceedings Article•DOI•

Speech recognition with continuous-parameter hidden Markov models

Lalit R. Bahl¹, Peter Fitzhugh Brown¹, P.V. de Souza¹, Robert Leroy Mercer¹•Institutions (1)

IBM¹

11 Apr 1988

TL;DR: The authors explore the trade-off between packing information into sequences of feature vectors and being able to model them accurately and investigate a method of parameter estimation which is designed to cope with inaccurate modeling assumptions.

Abstract: The acoustic-modelling problem in automatic speech recognition is examined from an information theoretic point of view. This problem is to design a speech-recognition system which can extract from the speech waveform as much information as possible about the corresponding word sequence. The information extraction process is factored into two steps: a signal-processing step which converts a speech waveform into a sequence of informative acoustic feature vectors, and a step which models such a sequence. The authors are primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space. They explore the trade-off between packing information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous-parameter sequences is addressed by investigating a method of parameter estimation which is designed to cope with inaccurate modeling assumptions.

207 citations

Journal Article•DOI•

Subband coding of images using vector quantization

P.H. Westerink¹, D.E. Boekee¹, Jan Biemond¹, John W. Woods•Institutions (1)

Delft University of Technology¹

01 Jun 1988-IEEE Transactions on Communications

TL;DR: A novel two-dimensional subband coding technique is presented that can be applied to images as well as speech and has a performance that is comparable to that of more complex coding techniques.

Abstract: A novel two-dimensional subband coding technique is presented that can be applied to images as well as speech. A frequency-band decomposition of the image is carried out by means of 2D separable quadrature mirror filters, which split the image spectrum into 16 equal-rate subbands. These 16 parallel subband signals are regarded as a 16-dimensional vector source and coded as such using vector quantization. In the asymptotic case of high bit rates, a theoretical analysis yields a lower bound on the gain attainable by choosing this approach over scalar quantization of each subband with an optimal bit allocation. It is shown that vector quantization in this scheme has several advantages over coding the subbands separately. Experimental results are given, and it is shown that the scheme has a performance that is comparable to that of more complex coding techniques.

196 citations

Proceedings Article•DOI•

Estimation of perceptual entropy using noise masking criteria

James D. Johnston¹•Institutions (1)

Bell Labs¹

11 Apr 1988

TL;DR: The perceptual entropy of each short-term section of the audio stimuli is estimated as the number of bits required to encode the short-term spectrum of the signal to the resolution measured by this process.

Abstract: The perceptual entropy of each short-term section of the audio stimuli is estimated as the number of bits required to encode the short-term spectrum of the signal to the resolution measured by this process. The results give an entropy estimate, for transparent coding, of 1.4 (mean) or 2.1 (peak) bits/sample for telephone speech (200-3200-Hz bandwidth sampled at 8 kHz). The entropy measures for audio signals of other bandwidths and sampling rates are also reported.

167 citations

Journal Article•DOI•

Objective quality evaluation for low-bit-rate speech coding systems

Nobuhiko Kitawaki, H. Nagabuchi, Kenzo Itoh

01 Feb 1988-IEEE Journal on Selected Areas in Communications

TL;DR: An LPC (linear predictive coding) cepstrum distance measure (CD) is introduced as an objective measure for estimating the subjective quality of speech signals, and good correspondence between LPC CD and the subjective quality, expressed in terms of both opinion equivalent Q and mean opinion score, is shown.

Abstract: An LPC (linear predictive coding) cepstrum distance measure (CD) is introduced as an objective measure for estimating the subjective quality of speech signals. Good correspondence between LPC CD and the subjective quality, expressed in terms of both opinion equivalent Q and mean opinion score, is shown. Good repeatability of objective quality evaluation using LPC CD is also shown. A method for generating an artificial voice signal that reflects the characteristics of real speech signals is described. The LPC CD values calculated using this artificial voice are almost the same as those calculated using real speech signals. The speaker-dependency of the coded-speech quality is shown to be an important factor in low-bit-rate speech coding. Even taking this factor into consideration, LPC CD is shown to be effective for estimating the subjective quality.

151 citations
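As a rough illustration of the cepstrum distance in the entry above, the sketch below converts LPC coefficients to cepstral coefficients with the standard recursion and then applies a commonly used dB form of the distance. The sign convention H(z) = 1 / (1 - sum_k a_k z^-k), the scaling constant, and the function names are assumptions; the abstract does not state the paper's exact definition.

```python
import math


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n of the all-pole model
    H(z) = 1 / (1 - sum_k a_k z^-k), with a[k - 1] holding a_k."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)          # c[0] is unused (gain term)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def cepstrum_distance_db(c_ref, c_test):
    """Cepstrum distance in dB between two equal-length cepstra."""
    s = sum((x - y) ** 2 for x, y in zip(c_ref, c_test))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * s)


# Hypothetical first-order predictors for an original and a coded frame.
ref = lpc_to_cepstrum([0.5], 8)
coded = lpc_to_cepstrum([0.45], 8)
print(cepstrum_distance_db(ref, coded))
```

In practice the distance would be averaged over many frames of original and coded speech before being related to a subjective score.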

Journal Article•DOI•

A class of analysis-by-synthesis predictive coders for high quality speech coding at rates between 4.8 and 16 kbit/s

P. Kroon¹, E.F. Deprettere²•Institutions (2)

Bell Labs¹, Delft University of Technology²

01 Feb 1988-IEEE Journal on Selected Areas in Communications

TL;DR: The general structure of this class of coders is reviewed, and the particulars of its members are discussed and the contributions of the various coder parameters to the performance of the coder are examined.

Abstract: The general structure of this class of coders is reviewed, and the particulars of its members are discussed. The different analysis procedures are described, and the contributions of the various coder parameters to the performance of the coder are examined. Quantization procedures for each transmitted parameter are given along with examples of bit allocations. The speech quality produced by these coders is high at 16 kb/s and good at 8 kb/s, but only fair at 4.8 kb/s. The use of postprocessing techniques changes the performance at lower rates, but more research is needed to further improve the coders.

151 citations

Patent•DOI•

A method for indicating the presence of speech in an audio signal

Yoram Stettiner¹, Shabtai Adlersberg¹, Mendel Aizner¹•Institutions (1)

DSP Group¹

03 Feb 1988-Journal of the Acoustical Society of America

TL;DR: In this paper, the authors employ a multiple-stage, delayed-decision adaptive digital signal processing algorithm implemented through the use of commonly available electronic circuit components to examine audio signal frames having harmonic content to identify voiced phonemes and determine whether the signal frame contains primarily speech or noise.

Abstract: A voice operated switch employs digital signal processing techniques to examine audio signal frames having harmonic content to identify voiced phonemes and to determine whether the signal frame contains primarily speech or noise. The method and apparatus employ a multiple-stage, delayed-decision adaptive digital signal processing algorithm implemented through the use of commonly available electronic circuit components. Specifically the method and apparatus comprise a plurality of stages, including (1) a low-pass filter to limit examination of input signals to below about one kHz, (2) a digital center-clipped autocorrelation processor which recognizes that the presence of periodic components of the input signal below and above a peak-related threshold identifies a frame as containing speech or noise, and (3) a nonlinear filtering processor which includes nonlinear smoothing of the frame-level decisions and incorporates a delay, and further incorporates a forward and backward decision extension at the speech-segment level of several tenths of milliseconds to determine whether adjacent frames are primarily speech or primarily noise.

142 citations
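Stage (2) of the entry above, the center-clipped autocorrelation test, can be sketched roughly as follows. This is a generic illustration and not the patented method: the clipping ratio, the decision threshold, the lag range, and the function names are all assumed values, and the low-pass filtering and decision smoothing stages are omitted.

```python
def center_clip(frame, ratio=0.3):
    """Zero samples within +/- ratio * peak and shift the rest toward zero."""
    thr = ratio * max(abs(x) for x in frame)
    return [x - thr if x > thr else x + thr if x < -thr else 0.0
            for x in frame]


def is_voiced(frame, min_lag, max_lag, peak_ratio=0.3):
    """Flag a frame as periodic (speech-like) when the largest
    autocorrelation value in the lag range is a sizeable fraction
    of the zero-lag energy of the clipped frame."""
    c = center_clip(frame)
    r0 = sum(x * x for x in c)
    if r0 == 0.0:
        return False                  # silent or fully clipped frame
    peak = max(
        sum(c[i] * c[i - lag] for i in range(lag, len(c)))
        for lag in range(min_lag, max_lag + 1)
    )
    return peak / r0 > peak_ratio


# Hypothetical frames: a strictly periodic one and a single click.
periodic = [0, 1, 2, 3, 2, 1] * 20
click = [0.0] * 100
click[50] = 1.0
print(is_voiced(periodic, 4, 10), is_voiced(click, 4, 10))
```

Center clipping removes low-level structure so that the autocorrelation peak reflects pitch periodicity rather than formant ripple, which is why it is a common front end for this kind of decision.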

Journal Article•DOI•

The effect of waveform substitution on the quality of PCM packet communications

O.J. Wasem¹, D.J. Goodman², C.A. Dvorak², H.G. Page²•Institutions (2)

Massachusetts Institute of Technology¹, AT&T²

01 Mar 1988-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The most effective estimation technique for packets containing 16 ms of speech in a pulse-code-modulation format is pitch waveform replication, which extends the acceptable ratio of missing packets to 10%.

Abstract: Missing packets are a major cause of impairment in packet voice networks. While it is easiest to allow these gaps in received speech to appear as silent intervals in reconstructed speech, speech quality is improved by filling the gaps with estimates of the transmitted waveform. Several estimation techniques have been investigated for packets containing 16 ms of speech in a pulse-code-modulation format. The simplest method, packet repetition, extends the acceptable ratio of missing packets from 2% to 5%. Here, acceptability is defined as a mean opinion score midway between fair and good on a five-point opinion scale. The most effective estimation technique (although not the most complex) is pitch waveform replication. It extends the acceptable ratio of missing packets to 10%.

131 citations
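The two gap-filling strategies named in the entry above can be sketched as follows. This is a simplified illustration, not the procedure evaluated in the paper: the pitch estimate here is a plain correlation search over an assumed lag range, and the function names are hypothetical.

```python
def repeat_packet(prev_packet):
    """Packet repetition: fill the gap with a copy of the last good packet."""
    return list(prev_packet)


def estimate_period(history, min_lag, max_lag):
    """Lag (in samples) at which the most recent samples best match the
    earlier signal. Assumes len(history) >= 2 * max_lag."""
    window = max_lag
    recent = history[-window:]
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        past = history[-window - lag:-lag]
        score = sum(r * p for r, p in zip(recent, past))
        if score > best_score:        # strict: ties keep the shorter lag
            best_lag, best_score = lag, score
    return best_lag


def replicate_pitch(history, packet_len, min_lag, max_lag):
    """Pitch waveform replication: repeat the last pitch cycle of the
    received speech until the missing packet is filled."""
    period = estimate_period(history, min_lag, max_lag)
    cycle = history[-period:]
    return [cycle[i % period] for i in range(packet_len)]


# Hypothetical received samples with a six-sample pitch period.
received = [0, 1, 2, 3, 2, 1] * 8
print(replicate_pitch(received, 12, 2, 10))
```

At an 8 kHz sampling rate a 16 ms packet holds 128 samples, so a real lag range would span typical pitch periods rather than the toy values used here.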

Journal Article•DOI•

Statistical model-based speech enhancement systems

Yariv Ephraim

01 Nov 1988-Journal of the Acoustical Society of America

TL;DR: This paper deals with the problem of enhancing speech signals that have been degraded by statistically independent quasistationary noise and proposes maximum likelihood estimation solutions that are based upon the E–M algorithm and its derivatives.

Abstract: This paper deals with the problem of enhancing speech signals that have been degraded by statistically independent quasistationary noise. The estimation of the clean speech waveform, and of the parameters of autoregressive (AR) models for the clean speech, given the noisy speech, is considered. The two problems are demonstrated to be closely related in the sense that a good solution to one of them can be used for achieving a satisfactory solution for the other. The difficulties in solving these estimation problems are mainly due to the lack of explicit knowledge of the statistics of the clean speech signal and of the noise process. Maximum likelihood estimation solutions that are based upon the E–M algorithm and its derivatives are proposed. For estimating the speech waveform, the statistics of the clean speech signal and of the noise process are first estimated by training a pair of Gaussian AR hidden Markov models, one for the clean speech and the other for the noise, using long training sequences from the two sources. Then, the speech waveform is reestimated by applying the E–M algorithm to the estimated statistics. An approximation to the E–M algorithm is interpreted as being an iterative procedure in which Wiener filtering and AR modeling are alternately applied. The different algorithms considered here will be compared and demonstrated.

115 citations

Journal Article•DOI•

On robust linear prediction of speech

Chin-Hui Lee¹•Institutions (1)

Bell Labs¹

01 May 1988-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A robust linear prediction (LP) algorithm is proposed that minimizes the sum of appropriately weighted residuals, takes into account the non-Gaussian nature of the excitations for voiced speech, and gives a more efficient and less biased estimate for the prediction coefficients than conventional methods.

Abstract: A robust linear prediction (LP) algorithm is proposed that minimizes the sum of appropriately weighted residuals. The weight is a function of the prediction residual, and the cost function is selected to give more weight to the bulk of small residuals while deemphasizing the small portion of large residuals. In contrast, the conventional LP procedure weights all prediction residuals equally. The robust algorithm takes into account the non-Gaussian nature of the excitations for voiced speech and gives a more efficient (less variance) and less biased estimate for the prediction coefficients than conventional methods. The algorithm can be used in the front-end features extractor for a speech recognition system and as an analyzer for a speech coding system. Testing on synthetic vowel data demonstrates that the robust LP procedure is able to reduce the formant and bandwidth error rate by more than an order of magnitude compared to the conventional LP procedures and is relatively insensitive to the placement of the LPC (LP coding) analysis window and to the value of the pitch period, for a given section of speech signal.

Proceedings Article•DOI•

Speech codec for the European mobile radio system

Peter Vary¹, Karl Hellwig¹, Rudolf Hofmann¹, R.J. Sluyter, C. Galand, M. Rosso•Institutions (1)

Philips¹

11 Apr 1988

TL;DR: The coding scheme which has been selected by the CEPT Groupe-Speciale-Mobile (GSM) as a result of formal subjective listening tests, is based on the regular-pulse excitation LPC technique (RPE-LPC) combined with long-term prediction (LTP).

Abstract: In 1991 a digital mobile radio system will be introduced in Europe. The speech codec to be used as the standard is presented. The coding scheme which has been selected by the CEPT Groupe-Speciale-Mobile (GSM) as a result of formal subjective listening tests, is based on the regular-pulse excitation LPC technique (RPE-LPC) combined with long-term prediction (LTP). The so-called RPE-LTP codec has a net bit rate of 13 kbit/s. The algorithm and the experimental implementations based on different VLSI signal processors are described and demonstrated.

Journal Article•DOI•

G.722: a new CCITT coding standard for digital transmission of wideband audio signals

P. Mermelstein¹•Institutions (1)

McGill University¹

01 Jan 1988-IEEE Communications Magazine

TL;DR: A tutorial discussion is provided of the adaptive differential PCM (pulse-code modulation) coding method recommended by the group, which covers the subjective performance tests performed, mode initialization and mode switching, data-speed multiplexing, and communication between narrowband and wideband terminals.

Abstract: CCITT Study Group XVIII recognized the need for a new international coding standard on high-quality audio to allow interconnection of diverse switching, transmission, and terminal equipment and organized an expert group in 1983 to recommend an appropriate coding technique. A tutorial discussion is provided of the adaptive differential PCM (pulse-code modulation) coding method recommended by the group. The discussion covers the subjective performance tests performed, mode initialization and mode switching, data-speed multiplexing, and communication between narrowband and wideband terminals.

Journal Article•DOI•

Enhancement of ADPCM speech coding with backward-adaptive algorithms for postfiltering and noise feedback

V. Ramamoorthy¹, Nikil S. Jayant¹, Richard V. Cox¹, M.M. Sondhi¹•Institutions (1)

Bell Labs¹

01 Feb 1988-IEEE Journal on Selected Areas in Communications

TL;DR: It is shown that postfilters based on higher order LPC (linear predictive coding) models can provide very low distortion in terms of spectral tilt and can provide better speech enhancement than circuits based on the backward-adaptive pole-zero predictor in ADPCM (adaptive differential pulse code modulation).

Abstract: It is shown that postfiltering circuits based on higher order LPC (linear predictive coding) models can provide very low distortion in terms of spectral tilt. Thus, they can provide better speech enhancement than circuits based on the backward-adaptive pole-zero predictor in ADPCM (adaptive differential pulse code modulation). Quantitative criteria for designing postfiltering circuits based on higher-order LPC models are discussed. These postfilters are particularly attractive for systems where high-order LPC analysis is an integral part of the coding algorithm. In a subjective test that used a computer-simulated version of these circuits, enhanced ADPCM obtained a mean opinion score of 3.6 at 16 kb/s.

Patent•DOI•

Audio pre-processing methods and apparatus

Thomas F. Quatieri¹, R.J. McAulay¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Mar 1988-Journal of the Acoustical Society of America

TL;DR: In this paper, a sinusoidal speech representation system is applied to the problem of speech dispersion by pre-processing the waveform prior to transmission to reduce the peak-to-RMS ratio.

Abstract: A lower threshold for dynamic range compression and clipping is allowed by sinusoidal estimation and phase adjustment of the original speech signal to obtain a lower peak-to-RMS ratio. A sinusoidal speech representation system is applied to the problem of speech dispersion by pre-processing the waveform prior to transmission to reduce the peak-to-RMS ratio of the waveform. The sinusoidal system first estimates and then removes the natural phase dispersion in the frequency components of the speech signal. Artificial dispersion based on pulse compression techniques is then introduced with little change in speech quality. The new phase dispersion allocation serves to preprocess the waveform prior to dynamic range compression and clipping, allowing considerably deeper thresholding than can be tolerated on the original waveform.

Patent•DOI•

Harmonic speech coding arrangement where a set of parameters for a continuous magnitude spectrum is determined by a speech analyzer and the parameters are used by a synthesizer to determine a spectrum which is used to determine sinusoids for synthesis

David L. Thomson¹•Institutions (1)

Bell Labs¹

08 Apr 1988-Journal of the Acoustical Society of America

TL;DR: A harmonic coding arrangement is described in which the magnitude spectrum of the input speech is modeled at the analyzer by a relatively small set of parameters and, significantly, as a continuous rather than only a line magnitude spectrum.

Abstract: A harmonic coding arrangement is described in which the magnitude spectrum of the input speech is modeled at the analyzer by a relatively small set of parameters and, significantly, as a continuous rather than only a line magnitude spectrum. The synthesizer, rather than the analyzer, determines the magnitude, frequency, and phase of a large number of sinusoids which are summed to generate synthetic speech. Rather than receiving information explicitly defining the sinusoids from the analyzer, the synthesizer receives the small set of parameters and uses those parameters to determine a spectrum, which, in turn, is used by the synthesizer to determine the sinusoids for synthesis.

Proceedings Article•DOI•

Computationally efficient sine-wave synthesis and its application to sinusoidal transform coding

R.J. McAulay¹, Thomas F. Quatieri¹•Institutions (1)

Massachusetts Institute of Technology¹

11 Apr 1988

TL;DR: The synthesizer, in conjunction with straightforward noise suppression, greatly improves the quality of the synthetic speech, rendering the sinusoidal transform coder (STC) algorithm a truly robust system.

Abstract: A technique for sine-wave synthesis is described that uses the fast Fourier transform overlap-add method at a 100 Hz rate based on sine-wave parameters coded at a 50 Hz rate. This technique leads to an implementation requiring less than one-half the computational power of a digital-signal-processor chip. The synthesis method implicitly introduces a frequency jitter which renders the encoded synthetic speech more natural. For speech corrupted by additive acoustic noise, the synthesizer, in conjunction with straightforward noise suppression, greatly improves the quality of the synthetic speech, rendering the sinusoidal transform coder (STC) algorithm a truly robust system. More recent architecture studies of the STC algorithm suggest that an entire implementation requires no more than two ADSP-2100 chips.

Proceedings Article•DOI•

Strategies for improving the performance of CELP coders at low bit rates (speech analysis)

P. Kroon¹, B. Atal¹•Institutions (1)

Bell Labs¹

11 Apr 1988

TL;DR: It is found that the coder does not reproduce high frequencies well and that rapid changes in the speech signal are not adequately tracked.

Abstract: Some of the distortions produced by CELP (code-excited linear prediction) coders are characterized. It is found that the coder does not reproduce high frequencies well and that rapid changes in the speech signal are not adequately tracked. Within the framework of the current CELP concept, strategies are discussed that can reduce these distortions. Nonstationarities in the speech signal can be better followed by allowing a flexible allocation of the bits used for the excitation. However, the bit allocation procedures and the way the bits are used need further improvement. The reproduction of higher frequencies can be improved by changing the error-weighting procedure or by shaping the code-book excitation functions.

Proceedings Article•DOI•

Encoding of LPC spectral parameters using switched-adaptive interframe vector prediction (speech coding)

M. Yong¹, G. Davidson¹, Allen Gersho¹•Institutions (1)

University of California, Santa Barbara¹

11 Apr 1988

TL;DR: When SIVP is combined with scalar quantization, the LPC parameter bit rate required to achieve high-quality synthetic speech is only 1300 b/s; with vector quantization, the bit rate can be reduced even further without introducing any perceivable quantization noise in the reconstructed speech.

Abstract: An efficient, low-complexity method called switched-adaptive interframe vector prediction (SIVP) has been developed for encoding linear predictive coding (LPC) spectral parameters in low-bit-rate speech coding systems. SIVP utilizes vector linear prediction to exploit the high frame-to-frame redundancy present in the successive frames of LPC parameters. When SIVP is combined with scalar quantization, it has been found that the LPC parameter bit rate required to achieve high-quality synthetic speech is only 1300 b/s. With vector quantization, the bit rate can be reduced even further (to 1000 b/s) without introducing any perceivable quantization noise in the reconstructed speech.

Proceedings Article•DOI•

A 4.8 kbps multi-band excitation speech coder

John C. Hardwick¹, Jae Lim¹•Institutions (1)

Massachusetts Institute of Technology¹

11 Apr 1988

TL;DR: The results of informal listening tests indicate that this system can achieve high quality for both clean and noisy speech, as the MBE speech is extremely robust to the presence of background noise in speech.

Abstract: A speech model, referred to as the multiband excitation (MBE) speech model, has been shown to be capable of synthesizing speech without the artifacts common to model-based speech systems and has been used to develop a 4.8 kb/s speech coder. This system was developed using several new approaches to quantize the MBE model parameters. These techniques were designed to utilize additional redundancy amongst these parameters, thereby permitting more efficient quantization. The results of informal listening tests indicate that this system can achieve high quality for both clean and noisy speech, as the MBE speech is extremely robust to the presence of background noise in speech.

Journal Article•DOI•

An efficient stochastically excited linear predictive algorithm for high quality low bit rate transmission of speech

Willem Bastiaan Kleijn¹, Daniel John Krasinski¹, Richard Harry Ketchum¹•Institutions (1)

Bell Labs¹

01 Oct 1988-Speech Communication

TL;DR: Improvements to the SELP algorithm are described that result in better speech quality and higher computational efficiency, along with a new recursive algorithm that performs a very fast search through the adaptive codebook.

Patent•

Improving sub-band coding of speech at low bit rates by adding residual speech energy signals to sub-bands

Kumar Swaminathan¹•Institutions (1)

Bell Labs¹

30 Sep 1988

TL;DR: A sub-band speech coding arrangement divides the speech spectrum into sub-bands and allocates bits to encode the time frame interval samples of each sub-band responsive to the speech energies of the sub-bands.

Abstract: A sub-band speech coding arrangement divides the speech spectrum into sub-bands and allocates bits to encode the time frame interval samples of each sub-band responsive to the speech energies of the sub-bands. The sub-band samples are quantized according to the sub-band energy bit allocation and the time frame quantized samples and speech energy signals are coded. A signal representative of the residual difference between each time frame interval speech sample of the sub-band and the corresponding quantized speech sample of the sub-band is generated. The quality of the sub-band coded signal is improved by selecting the sub-bands with the largest residual differences, producing a vector signal from the sequence of residual difference signals of each selected sub-band, and matching the sub-band vector signal to one of a set of stored Gaussian codebook entries to generate a reduced bit code for the selected vector signal. The coded time frame interval quantized signals, speech energy signals and reduced bit codes for the selected residual differences are combined to form a multiplexed stream for the speech pattern of the time frame interval.

Journal Article•DOI•

Transform coding of speech using a weighted vector quantizer

Takehiro Moriya, M. Honda

01 Feb 1988-IEEE Journal on Selected Areas in Communications

TL;DR: A medium-band speech coder is proposed that uses a weighted vector quantization scheme in the transformed domain. Adaptively weighted matching is used instead of conventional adaptive bit allocation, so the residual signal can be reconstructed by the decoder even if the spectral envelope parameters are destroyed by transmission errors.

Abstract: A medium-band speech coder is proposed that uses a weighted vector quantization scheme in the transformed domain. The linear prediction residue is transformed and vector-quantized. In order to control the quantization errors in the transformed domain, adaptively weighted matching is used instead of conventional adaptive bit allocation. Therefore, the residual signal can be reconstructed by the decoder, even if the spectral envelope parameters are destroyed due to transmission errors. This coder is also capable of maintaining higher SNR (signal-to-noise ratio) performance than time-domain vector quantization coders for a wide range of computation complexities and bit rates. Coded speech is natural and unaffected by background noise. The mean opinion score for this coder at 7.2 kb/s is comparable to that of 5.5-bit log PCM coded speech sampled at 6.4 kHz.

Proceedings Article•DOI•

A low delay 16 kbits/sec speech coder

V. Iyengar¹, P. Kabal•Institutions (1)

McGill University¹

11 Apr 1988

TL;DR: The authors study a speech coder using a tree code generated by a stochastic innovations tree and backward adaptive synthesis filters that has low delay and has been evaluated through formal subjective testing to have speech quality that is equivalent to that for 7-bit log-PCM.

Abstract: The authors study a speech coder using a tree code generated by a stochastic innovations tree and backward adaptive synthesis filters. The synthesis configuration uses a cascade of two all-pole filters-a pitch (long time delay) filter followed by a formant (short time delay) filter. Both filters are updated using backward adaptation. The formant predictor is updated using an adaptive lattice algorithm. The multipath (M,L) search algorithm is used to encode the speech. A frequency weighted error measure is used to reduce the perceptual loudness of the quantization noise. The speech coder has low delay (1 ms) and has been evaluated through formal subjective testing to have speech quality that is equivalent to that for 7-bit log-PCM. >

Encoding of LPC spectral parameters using switched-adaptive interframe vector prediction

Mei Yong, Grant Davidson, Allen Gersho

01 Jan 1988

TL;DR: When SIVP is combined with scalar quantization, it is found that the LPC parameter bit-rate required to achieve high-quality synthetic speech is only 1300 bits per second.

Abstract: LPC spectral parameter encoding is often a challenging task in the development of low bit-rate speech coding systems. An efficient, low complexity method called Switched-Adaptive Interframe Vector Prediction (SIVP) has been developed for this purpose. SIVP utilizes vector linear prediction to exploit the high frame-to-frame redundancy present in successive frames of LPC parameters. When SIVP is combined with scalar quantization, we have found that the LPC parameter bit-rate required to achieve high-quality synthetic speech is only 1300 bits per second. With vector quantization, the bit-rate can be reduced even further (to 1000 bits per second) without introducing any perceivable quantization noise in the reconstructed

Journal Article•DOI•

Quality assessment of speech coding and speech synthesis systems

Nobuhiko Kitawaki¹, H. Nagabuchi¹•Institutions (1)

Nippon Telegraph and Telephone¹

01 Oct 1988-IEEE Communications Magazine

TL;DR: Quality assessment methodologies for speech waveform coding, source coding, and speech synthesis by rule from the viewpoints of naturalness and intelligibility are reviewed.

Abstract: The concept of speech quality assessment is examined. Quality assessment methodologies for speech waveform coding, source coding, and speech synthesis by rule from the viewpoints of naturalness and intelligibility are reviewed. Both subjective and objective measures are considered.

Patent•

Speech coding transmission equipment

Tomohiko Taniguchi¹, Kohei Iseda¹, Koji Okazaki¹, Fumio Amano¹, Shigeyuki Unagami¹•Institutions (1)

Fujitsu¹

19 Feb 1988

TL;DR: Speech coding transmission equipment is presented that includes a voiced/unvoiced detection unit (10) for distinguishing whether a speech signal is in a voiced or an unvoiced speech period.

Abstract: A speech coding transmission equipment including a voiced/unvoiced detection unit (10) for distinguishing whether a speech signal is in a voiced speech or an unvoiced speech period, time domain harmonic compression and expansion units (1, 6) which, when the speech signal is in a voiced speech period, compress and expand, in the time domain, the speech signal using a pitch period of the speech signal extracted by a pitch period extraction unit (2); and decimation and interpolation units (7, 8) which, when the speech signal is in an unvoiced speech period, compress and expand, in the time domain, the speech signal using a sample period in the equipment, thus carrying out an appropriate time domain harmonic compression and expansion of the speech signal and increasing the clarity of the reproduced speech as a whole.

Proceedings Article•DOI•

Multiple-stage vector excitation coding of speech waveforms

G. Davidson¹, Allen Gersho¹•Institutions (1)

University of California, Santa Barbara¹

11 Apr 1988

TL;DR: If the number of VQ stages is increased sufficiently, MSVXC can be expressed as a form of transform coding, in which the computationally intensive excitation codebook search is completely eliminated.

Abstract: An approach to vector-excitation-coding (VXC) speech compression utilizing multiple-stage vector quantization (VQ) is considered. Called multiple-stage VXC (MSVXC), this technique facilitates the use of high-dimensional excitation vectors at medium-band rates without substantially increasing computation. The basic approach consists of successively approximating the input speech vector in several cascaded VQ stages, where the input vector for each stage is the quantization error vector from the preceding stage. It is shown that if the number of VQ stages is increased sufficiently, MSVXC can be expressed as a form of transform coding, in which the computationally intensive excitation codebook search is completely eliminated.
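
The cascade described in this abstract, where each stage quantizes the error left by the preceding stage, can be illustrated with a minimal sketch of generic multiple-stage VQ. This is not the authors' MSVXC coder: it omits the synthesis filter and perceptual weighting of a real VXC system, and the function names, toy codebooks, and input vector are all assumptions made for illustration.

```python
def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def msvq_encode(stages, vec):
    """Quantize vec stage by stage; each stage codes the previous stage's error."""
    indices = []
    residual = list(vec)
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        # The quantization error of this stage is the input to the next stage.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices, residual


def msvq_decode(stages, indices):
    """Reconstruct the vector by summing the selected codeword of every stage."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
stages = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]
x = [1.2, 0.9]
indices, residual = msvq_encode(stages, x)
x_hat = msvq_decode(stages, indices)
```

Each stage searches only its own small codebook, so the search cost grows with the sum of the codebook sizes rather than with their product, which is the reason such a cascade keeps computation low for high-dimensional vectors.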

Proceedings Article•DOI•

Temporal decomposition and acoustic-phonetic decoding of speech

Frédéric Bimbot¹, Gérard Chollet¹, P. Deleglise¹, C. Montacie¹•Institutions (1)

Centre national de la recherche scientifique¹

11 Apr 1988

TL;DR: The possibility of undoing the effects of coarticulation is the major contribution of this work, and the identification of corrected targets is therefore possible with no further contextual rules.

Abstract: The automatic recognition of continuous speech may use a symbolic representation of the acoustic signal in order to facilitate lexical access. The allophones of the language form a practical set of symbols. A major issue is the reliable localisation of these units in the speech stream and their identification. Localisation is obtained using a robust implementation of temporal decomposition, a technique originally proposed by Atal (1983) for speech coding. Speech is decomposed in terms of overlapping events characterized by both a spectral target and a time-limited interpolation function. An undershot target may be reestimated using neighbours and the associated functions. The possibility of undoing the effects of coarticulation is the major contribution of this work. The identification of these corrected targets is therefore possible with no further contextual rules. The recognition of spelled surnames (letters of the alphabet) is used for evaluation. A phone accuracy of 76% yields a letter accuracy of 70%.
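
The decomposition into overlapping events can be pictured as rebuilding each spectral frame from a weighted sum of targets, with the weights given by the interpolation functions. The sketch below shows only that reconstruction step. The triangular interpolation functions, the helper names, and the toy two-dimensional targets are assumptions for illustration; the abstract says only that the functions are time-limited, and estimating the targets and functions from speech is not shown.

```python
def triangle(center, half_width):
    """Build a time-limited interpolation function (assumed triangular here)."""
    def phi(n):
        return max(0.0, 1.0 - abs(n - center) / half_width)
    return phi


def reconstruct(targets, functions, num_frames):
    """Rebuild each frame as the sum over events of phi_k(n) times target_k."""
    dim = len(targets[0])
    frames = []
    for n in range(num_frames):
        frame = [0.0] * dim
        for target, phi in zip(targets, functions):
            weight = phi(n)
            frame = [f + weight * t for f, t in zip(frame, target)]
        frames.append(frame)
    return frames


# Two hypothetical events whose interpolation functions overlap in time.
targets = [[1.0, 0.0], [0.0, 2.0]]
functions = [triangle(0, 4), triangle(4, 4)]
frames = reconstruct(targets, functions, 5)
```

In this toy case the frame midway between the two event centers is an equal blend of both targets, which is the kind of transition frame in which an undershot target would be observed.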

Proceedings Article•DOI•

Modulation techniques for digital cellular systems

J.A. Tarallo¹, G.I. Zysman¹•Institutions (1)

Bell Labs¹

15 Jun 1988

TL;DR: A modulation technique is described that uses the properties of linear modulation techniques to achieve high spectral efficiency and a QDPSK system provides high-quality speech using a 10-kHz bandwidth, with better S/I protection ratios than the current analog system.

Abstract: A modulation technique is described that uses the properties of linear modulation techniques to achieve high spectral efficiency. When combined with high-quality low-bit-rate speech coding and channel coding, linear modulation provides a substantial increase in system capacity. A QDPSK system provides high-quality speech using a 10-kHz bandwidth, with better S/I protection ratios than the current analog system. A digital cellular system which uses QDPSK is discussed. Bit error rate and spectral efficiency data are presented.
