
Showing papers on "Linear predictive coding published in 1988"


Journal ArticleDOI
TL;DR: The basic principles of linear predictive coding (LPC) are presented and least-squares methods for obtaining the LPC coefficients characterizing the all-pole filter are described.
Abstract: The basic principles of linear predictive coding (LPC) are presented. Least-squares methods for obtaining the LPC coefficients characterizing the all-pole filter are described. Computational factors, instantaneous updating, and spectral estimation are discussed.

224 citations
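The least-squares analysis described above can be illustrated with the classic autocorrelation method, where the Levinson-Durbin recursion solves the Toeplitz normal equations for the all-pole coefficients. This is a generic sketch of the standard technique, not code from the paper; function names and parameters are ours.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations for A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    Returns ([1, a1, ..., ap], final prediction-error power)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]  # order-update of the predictor
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # error power shrinks each order
    return a, err
```

Fitting an order-2 model to data generated by a known second-order all-pole filter recovers that filter's coefficients, which is a convenient sanity check.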


Journal ArticleDOI
TL;DR: A low-bit-rate linear predictive coder (LPC) based on variable-length segment quantization is presented, and its performance for voice coding is compared to that of fixed-length segment quantization and vector quantization.
Abstract: A low-bit-rate linear predictive coder (LPC) that is based on variable-length segment quantization is presented. In this vocoder, the speech spectral-parameter sequence is represented as the concatenation of variable-length spectral segments generated by linearly time-warping fixed-length code segments. Both the sequence of code segments and the segment lengths are efficiently determined using a dynamic programming procedure. This procedure minimizes the spectral distance measured between the original and the coded spectral sequence in a given interval. An iterative algorithm is developed for designing fixed-length code segments for the training spectral sequence. It updates the segment boundaries of the training spectral sequence using an a priori codebook and updates the codebook using these segment sequences. The convergence of this algorithm is discussed theoretically and experimentally. In experiments, the performance of variable-length segment quantization for voice coding is compared to that of fixed-length segment quantization and vector quantization.

209 citations
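The linear time-warping step above, which stretches a fixed-length code segment to a variable-length spectral segment, can be sketched as linear interpolation along the time axis (an illustrative sketch only; the paper embeds this in a dynamic-programming search over segment boundaries, which is not shown here):

```python
def time_warp(segment, out_len):
    """Linearly time-warp a fixed-length segment to out_len samples
    by linear interpolation along the time axis."""
    n = len(segment)
    if out_len == 1:
        return [segment[0]]
    out = []
    for i in range(out_len):
        t = i * (n - 1) / (out_len - 1)   # fractional source position
        j = int(t)
        frac = t - j
        if j >= n - 1:
            out.append(segment[-1])
        else:
            out.append(segment[j] * (1 - frac) + segment[j + 1] * frac)
    return out
```

Warping to the original length reproduces the segment, and endpoints are always preserved, so concatenated segments stay anchored at their boundaries.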


Proceedings Article
John Moody1
01 Jan 1988
TL;DR: A class of fast, supervised learning algorithms inspired by Albus's CMAC model that use local representations, hashing, and multiple scales of resolution to approximate functions which are piece-wise continuous are presented.
Abstract: A class of fast, supervised learning algorithms is presented. They use local representations, hashing, and multiple scales of resolution to approximate functions which are piece-wise continuous. Inspired by Albus's CMAC model, the algorithms learn orders of magnitude more rapidly than typical implementations of back propagation, while often achieving comparable qualities of generalization. Furthermore, unlike most traditional function approximation methods, the algorithms are well suited for use in real time adaptive signal processing. Unlike simpler adaptive systems, such as linear predictive coding, the adaptive linear combiner, and the Kalman filter, the new algorithms are capable of efficiently capturing the structure of complicated non-linear systems. As an illustration, the algorithm is applied to the prediction of a chaotic time series.

171 citations


Journal ArticleDOI
TL;DR: An LPC (linear predictive coding) cepstrum distance measure (CD) is introduced as an objective measure for estimating the subjective quality of speech signals; good correspondence between LPC CD and the subjective quality, expressed in terms of both opinion equivalent Q and mean opinion score, is shown.
Abstract: An LPC (linear predictive coding) cepstrum distance measure (CD) is introduced as an objective measure for estimating the subjective quality of speech signals. Good correspondence between LPC CD and the subjective quality, expressed in terms of both opinion equivalent Q and mean opinion score, is shown. Good repeatability of objective quality evaluation using LPC CD is also shown. A method for generating an artificial voice signal that reflects the characteristics of real speech signals is described. The LPC CD values calculated using this artificial voice are almost the same as those calculated using real speech signals. The speaker-dependency of the coded-speech quality is shown to be an important factor in low-bit-rate speech coding. Even taking this factor into consideration, LPC CD is shown to be effective for estimating the subjective quality.

151 citations
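The LPC cepstrum distance can be sketched as follows: the cepstrum of the all-pole model is obtained recursively from the predictor coefficients, and CD is then a root-mean-square log-spectral difference expressed in dB. This is the standard recursion for an all-pole model, offered as an illustration; it is not code from the paper.

```python
import math

def lpc_to_cepstrum(a, n_cep):
    """Cepstrum of the all-pole model 1/A(z), with
    A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p (gain term c0 omitted)."""
    p = len(a) - 1
    c = [0.0] * (n_cep + 1)
    for n in range(1, n_cep + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]

def cepstral_distance_db(a1, a2, n_cep=16):
    """LPC cepstrum distance (CD) in dB between two all-pole models."""
    c1 = lpc_to_cepstrum(a1, n_cep)
    c2 = lpc_to_cepstrum(a2, n_cep)
    s = sum((u - v) ** 2 for u, v in zip(c1, c2))
    return (10.0 / math.log(10)) * math.sqrt(2.0 * s)
```

For a first-order model the recursion matches the power series of -ln(1 + a1 z^-1) term by term, which makes it easy to verify.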


PatentDOI
TL;DR: A speech synthesizer is described that synthesizes speech by actuating a voice source and a filter that processes the voice-source output in each successive short interval of time according to feature vectors including formant frequencies, formant bandwidths, speech rate, and so on.
Abstract: A speech synthesizer that synthesizes speech by actuating a voice source and a filter which processes output of the voice source according to speech parameters in each successive short interval of time according to feature vectors which include formant frequencies, formant bandwidth, speech rate and so on. Each feature vector, or speech parameter, is defined by two target points (r1, r2), a value at each target point, and a connection curve between target points. A speech rate is defined by a speech rate curve which defines elongation or shortening of the speech rate, by a start point (d1) of elongation (or shortening), an end point (d2), and the elongation ratio between d1 and d2. The ratios between the relative time of each speech parameter and absolute time are preliminarily calculated according to the speech rate table in each predetermined short interval.

137 citations


Journal ArticleDOI
TL;DR: The most effective estimation technique for packets containing 16 ms of speech in a pulse-code-modulation format is pitch waveform replication, which extends the acceptable ratio of missing packets to 10%.
Abstract: Missing packets are a major cause of impairment in packet voice networks. While it is easiest to allow these gaps in received speech to appear as silent intervals in reconstructed speech, speech quality is improved by filling the gaps with estimates of the transmitted waveform. Several estimation techniques have been investigated for packets containing 16 ms of speech in a pulse-code-modulation format. The simplest method, packet repetition, extends the acceptable ratio of missing packets from 2% to 5%. Here, acceptability is defined as a mean opinion score midway between fair and good on a five-point opinion scale. The most effective estimation technique (although not the most complex) is pitch waveform replication. It extends the acceptable ratio of missing packets to 10%.

131 citations
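Pitch waveform replication can be sketched as: estimate the pitch period from the last correctly received samples by autocorrelation, then fill the gap by cyclically repeating the final pitch period. This is a generic sketch of the idea, not the paper's implementation; the lag range, window length, and function names are our assumptions.

```python
def conceal_gap(history, gap_len, min_lag=20, max_lag=120, win=200):
    """Fill a lost packet by repeating the last pitch period of `history`.
    The pitch lag maximizes the autocorrelation of the recent samples."""
    n = len(history)
    best_lag, best_c = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        c = sum(history[n - 1 - i] * history[n - 1 - i - lag]
                for i in range(win))
        if c > best_c:
            best_c, best_lag = c, lag
    period = history[-best_lag:]              # last full pitch period
    return [period[i % best_lag] for i in range(gap_len)]
```

On a purely periodic signal the fill is an exact continuation of the waveform, which is the ideal case the technique approximates for voiced speech.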


Journal ArticleDOI
TL;DR: This paper deals with the problem of enhancing speech signals that have been degraded by statistically independent quasistationary noise and proposes maximum likelihood estimation solutions based upon the EM algorithm and its derivatives.
Abstract: This paper deals with the problem of enhancing speech signals that have been degraded by statistically independent quasistationary noise. The estimation of the clean speech waveform, and of the parameters of autoregressive (AR) models for the clean speech, given the noisy speech, is considered. The two problems are demonstrated to be closely related in the sense that a good solution to one of them can be used for achieving a satisfactory solution for the other. The difficulties in solving these estimation problems are mainly due to the lack of explicit knowledge of the statistics of the clean speech signal and of the noise process. Maximum likelihood estimation solutions that are based upon the EM algorithm and its derivatives are proposed. For estimating the speech waveform, the statistics of the clean speech signal and of the noise process are first estimated by training a pair of Gaussian AR hidden Markov models, one for the clean speech and the other for the noise, using long training sequences from the two sources. Then, the speech waveform is reestimated by applying the EM algorithm to the estimated statistics. An approximation to the EM algorithm is interpreted as being an iterative procedure in which Wiener filtering and AR modeling are alternately applied. The different algorithms considered here will be compared and demonstrated.

115 citations


Journal ArticleDOI
Chin-Hui Lee1
TL;DR: A robust linear prediction (LP) algorithm is proposed that minimizes the sum of appropriately weighted residuals; it takes into account the non-Gaussian nature of the excitation for voiced speech and gives a more efficient and less biased estimate of the prediction coefficients than conventional methods.
Abstract: A robust linear prediction (LP) algorithm is proposed that minimizes the sum of appropriately weighted residuals. The weight is a function of the prediction residual, and the cost function is selected to give more weight to the bulk of small residuals while deemphasizing the small portion of large residuals. In contrast, the conventional LP procedure weights all prediction residuals equally. The robust algorithm takes into account the non-Gaussian nature of the excitation for voiced speech and gives a more efficient (lower-variance) and less biased estimate of the prediction coefficients than conventional methods. The algorithm can be used in the front-end feature extractor for a speech recognition system and as an analyzer for a speech coding system. Testing on synthetic vowel data demonstrates that the robust LP procedure is able to reduce the formant and bandwidth error rate by more than an order of magnitude compared to the conventional LP procedures and is relatively insensitive to the placement of the LPC (LP coding) analysis window and to the value of the pitch period, for a given section of speech signal.

112 citations
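The weighted-residual idea can be sketched with iteratively reweighted least squares (IRLS) using a Huber-style weight: residuals below a threshold get full weight, larger ones are downweighted. The weight function, threshold, and iteration count here are our illustrative choices, not the paper's cost function.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for small systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def robust_lp(x, p, delta=1.0, iters=3):
    """IRLS estimate of coefficients b minimizing weighted squared
    residuals of x[n] - sum_k b[k] * x[n-1-k]."""
    rows = [[x[n - 1 - k] for k in range(p)] for n in range(p, len(x))]
    y = x[p:]
    w = [1.0] * len(y)
    b = [0.0] * p
    for _ in range(iters):
        # weighted normal equations
        A = [[sum(w[i] * rows[i][j] * rows[i][k] for i in range(len(y)))
              for k in range(p)] for j in range(p)]
        rhs = [sum(w[i] * rows[i][j] * y[i] for i in range(len(y)))
               for j in range(p)]
        b = solve(A, rhs)
        res = [y[i] - sum(b[k] * rows[i][k] for k in range(p))
               for i in range(len(y))]
        # Huber-style weights: downweight the small portion of large residuals
        w = [1.0 if abs(r) <= delta else delta / abs(r) for r in res]
    return b
```

The first pass is ordinary least squares; subsequent passes reduce the influence of large residuals, which for voiced speech correspond to the pitch-pulse instants.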


Journal ArticleDOI
TL;DR: It is shown that postfilters based on higher-order LPC (linear predictive coding) models can provide very low distortion in terms of spectral tilt and can provide better speech enhancement than circuits based on the backward-adaptive pole-zero predictor in ADPCM (adaptive differential pulse-code modulation).
Abstract: It is shown that postfiltering circuits based on higher-order LPC (linear predictive coding) models can provide very low distortion in terms of spectral tilt. Thus, they can provide better speech enhancement than circuits based on the backward-adaptive pole-zero predictor in ADPCM (adaptive differential pulse-code modulation). Quantitative criteria for designing postfiltering circuits based on higher-order LPC models are discussed. These postfilters are particularly attractive for systems where high-order LPC analysis is an integral part of the coding algorithm. In a subjective test that used a computer-simulated version of these circuits, enhanced ADPCM obtained a mean opinion score of 3.6 at 16 kb/s.

65 citations
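An LPC-based postfilter is commonly given the pole-zero form H(z) = A(z/beta) / A(z/alpha) with 0 < beta < alpha < 1, which emphasizes formant regions while limiting spectral tilt. The sketch below implements that widely used form as a direct-form IIR filter; the paper's exact design criteria differ, and the default alpha/beta values are our assumptions.

```python
def postfilter(x, a, alpha=0.8, beta=0.5):
    """Apply H(z) = A(z/beta) / A(z/alpha) to signal x, where
    A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p are the LPC coefficients."""
    p = len(a) - 1
    num = [a[k] * beta ** k for k in range(p + 1)]   # zeros: A(z/beta)
    den = [a[k] * alpha ** k for k in range(p + 1)]  # poles: 1/A(z/alpha)
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(min(n, p) + 1))
        acc -= sum(den[k] * y[n - k] for k in range(1, min(n, p) + 1))
        y.append(acc)                                 # den[0] = a[0] = 1
    return y
```

Setting beta equal to alpha makes H(z) = 1, so the filter passes the signal through unchanged; unequal values reshape it. Practical designs often add a first-order tilt-compensation term as well.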


Proceedings ArticleDOI
11 Apr 1988
TL;DR: The synthesizer, in conjunction with straightforward noise suppression, greatly improves the quality of the synthetic speech, rendering the sinusoidal transform coder (STC) algorithm a truly robust system.
Abstract: A technique for sine-wave synthesis is described that uses the fast Fourier transform overlap-add method at a 100 Hz rate based on sine-wave parameters coded at a 50 Hz rate. This technique leads to an implementation requiring less than one-half the computational power of a digital-signal-processor chip. The synthesis method implicitly introduces a frequency jitter which renders the encoded synthetic speech more natural. For speech corrupted by additive acoustic noise, the synthesizer, in conjunction with straightforward noise suppression, greatly improves the quality of the synthetic speech, rendering the sinusoidal transform coder (STC) algorithm a truly robust system. More recent architecture studies of the STC algorithm suggest that an entire implementation requires no more than two ADSP-2100 chips.

62 citations


Proceedings ArticleDOI
P. Kroon1, B. Atal1
11 Apr 1988
TL;DR: It is found that the coder does not reproduce high frequencies well and that rapid changes in the speech signal are not adequately tracked.
Abstract: Some of the distortions produced by CELP (code-excited linear prediction) coders are characterized. It is found that the coder does not reproduce high frequencies well and that rapid changes in the speech signal are not adequately tracked. Within the framework of the current CELP concept, strategies are discussed that can reduce these distortions. Nonstationarities in the speech signal can be better followed by allowing a flexible allocation of the bits used for the excitation. However, the bit allocation procedures and the way the bits are used need further improvement. The reproduction of higher frequencies can be improved by changing the error-weighting procedure or by shaping the code-book excitation functions.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: It has been found that the LPC parameter bit rate required to achieve high-quality synthetic speech is only 1300 b/s, and when SIVP is combined with scalar quantization, the bit rate can be reduced even further without introducing any perceivable quantization noise in the reconstructed speech.
Abstract: An efficient, low-complexity method called switched-adaptive interframe vector prediction (SIVP) has been developed for linear predictive coding (LPC) of spectral parameters in the development of low-bit-rate speech coding systems. SIVP utilizes vector linear prediction to exploit the high frame-to-frame redundancy present in the successive frames of LPC parameters. When SIVP is combined with scalar quantization, it has been found that the LPC parameter bit rate required to achieve high-quality synthetic speech is only 1300 b/s. With vector quantization, the bit rate can be reduced even further (to 1000 b/s) without introducing any perceivable quantization noise in the reconstructed speech.
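Interframe vector prediction of LPC parameters can be sketched as a per-component first-order linear predictor trained by least squares across successive frames, so that only the prediction residual needs to be quantized. This is a minimal single-predictor sketch of our own; the paper's SIVP switches among several predictors adaptively.

```python
def train_predictor(frames):
    """Per-component first-order predictor frame_t[i] ~ m[i]*frame_{t-1}[i],
    with m fit by least squares over the training frames."""
    dim = len(frames[0])
    m = []
    for i in range(dim):
        num = sum(frames[t][i] * frames[t - 1][i] for t in range(1, len(frames)))
        den = sum(frames[t - 1][i] ** 2 for t in range(1, len(frames)))
        m.append(num / den if den else 0.0)
    return m

def residual_energy(frames, m):
    """Mean squared interframe prediction residual (what gets quantized)."""
    dim = len(frames[0])
    total = count = 0
    for t in range(1, len(frames)):
        for i in range(dim):
            total += (frames[t][i] - m[i] * frames[t - 1][i]) ** 2
            count += 1
    return total / count
```

Because spectral parameters evolve slowly, the residual energy is well below the raw parameter energy, which is exactly the redundancy SIVP exploits.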

Proceedings ArticleDOI
11 Apr 1988
TL;DR: The results of informal listening tests indicate that this system can achieve high quality for both clean and noisy speech, as the MBE model is extremely robust to the presence of background noise in speech.
Abstract: A speech model, referred to as the multiband excitation (MBE) speech model, has been shown to be capable of synthesizing speech without the artifacts common to model-based speech systems and has been used to develop a 4.8 kb/s speech coder. This system was developed using several new approaches to quantize the MBE model parameters. These techniques were designed to utilize additional redundancy amongst these parameters, thereby permitting more efficient quantization. The results of informal listening tests indicate that this system can achieve high quality for both clean and noisy speech, as the MBE model is extremely robust to the presence of background noise in speech.

Journal ArticleDOI
TL;DR: Improvements to the SELP algorithm are described which result in better speech quality and higher computational efficiency, including a new recursive algorithm that performs a very fast search through the adaptive codebook.

PatentDOI
TL;DR: A speech recognizer is described which utilizes hypothesis testing to determine formant frequencies for use in speech recognition; an optimum formant selector operates with a comparator to select from the formant candidates those formants which best match stored reference formants.
Abstract: A speech recognizer which utilizes hypothesis testing to determine formant frequencies for use in speech recognition. A pre-processor (36) receives speech signal frames and utilizes linear predictive coding to generate all formant frequency candidates. An optimum formant selector (38) operates with a comparator (40) to select from the formant candidates those formants which best match stored reference formants. A dynamic time warper (42) and high level recognition logic (44) operate to determine whether or not to declare a recognized word.

Patent
Kumar Swaminathan1
30 Sep 1988
TL;DR: In this article, a sub-band speech coding arrangement was proposed, which divides the speech spectrum into sub-bands and allocates bits to encode the time frame interval samples of each subband responsive to the speech energies of the subbands.
Abstract: A sub-band speech coding arrangement divides the speech spectrum into sub-bands and allocates bits to encode the time frame interval samples of each sub-band responsive to the speech energies of the sub-bands. The sub-band samples are quantized according to the sub-band energy bit allocation and the time frame quantized samples and speech energy signals are coded. A signal representative of the residual difference between the each time frame interval speech sample of the sub-band and the corresponding quantized speech sample of the sub-band is generated. The quality of the sub-band coded signal is improved by selecting the sub-bands with the largest residual differences, producing a vector signal from the sequence of residual difference signals of each selected sub-band, and matching the sub-band vector signal to one of a set of stored Gaussian codebook entries to generate a reduced bit code for the selected vector signal. The coded time frame interval quantized signals, speech energy signals and reduced bit codes for the selected residual differences are combined to form a multiplexed stream for the speech pattern of the time frame interval.
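Energy-driven bit allocation across sub-bands can be sketched greedily: repeatedly grant one bit to the band with the largest remaining energy per quantizer level, E_i / 4^{b_i}, since each added bit reduces quantizer noise power by a factor of about 4 (~6 dB). This is a generic textbook scheme, not the patent's exact procedure; the cap on bits per band is our assumption.

```python
def allocate_bits(energies, total_bits, max_bits_per_band=8):
    """Greedy bit allocation: each bit goes to the sub-band with the
    largest remaining energy-to-quantizer-noise ratio E / 4**bits."""
    bits = [0] * len(energies)
    for _ in range(total_bits):
        candidates = [i for i in range(len(energies))
                      if bits[i] < max_bits_per_band]
        i = max(candidates, key=lambda j: energies[j] / 4 ** bits[j])
        bits[i] += 1
    return bits
```

Higher-energy bands receive more bits, and the total always matches the frame budget exactly, which a closed-form log-ratio allocation only achieves after rounding adjustments.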

Journal ArticleDOI
TL;DR: Quality assessment methodologies for speech waveform coding, source coding, and speech synthesis by rule from the viewpoints of naturalness and intelligibility are reviewed.
Abstract: The concept of speech quality assessment is examined. Quality assessment methodologies for speech waveform coding, source coding, and speech synthesis by rule from the viewpoints of naturalness and intelligibility are reviewed. Both subjective and objective measures are considered.

Journal ArticleDOI
F.K. Soong1, Man Mohan Sondhi1
TL;DR: The authors propose an adaptively weighted Itakura distortion measure and study its effects on the performance of a conventional dynamic time-warping (DTW)-based speech recognizer in a series of speaker-independent, isolated-digit-recognition experiments.
Abstract: The authors propose an adaptively weighted Itakura distortion measure. They studied its effects on the performance of a conventional dynamic time-warping (DTW)-based speech recognizer in a series of speaker-independent, isolated-digit-recognition experiments. The equivalent SNR improvement achieved by using the proposed weighted Itakura distortion at low SNRs is about 5-7 dB.

Proceedings ArticleDOI
D. Mansour1, Biing-Hwang Juang1
11 Apr 1988
TL;DR: Experimental results show that the new measures cause no degradation in recognition accuracy at high SNR, but perform significantly better when tested under noisy conditions using only clean reference templates.
Abstract: The authors aim at the formulation of similarity measures for robust speech recognition. Their consideration focuses on the speech cepstrum derived from linear prediction coefficients (the LPC cepstrum). By using common models for noisy speech, they analytically and empirically show how the ambient noise can affect some important attributes of the LPC cepstrum such as the vector norm, coefficient order, and the direction perturbation. The new findings led them to propose a family of distortion measures based on the projection between two cepstral vectors. Performance evaluation of these measures has been conducted in both speaker-dependent and speaker-independent isolated word recognition tasks. Experimental results show that the new measures cause no degradation in recognition accuracy at high SNR, but perform significantly better when tested under noisy conditions using only clean reference templates. At an SNR of 5 dB, the new measures are shown to be able to achieve a recognition rate equivalent to that obtained by the filtered cepstral measure at 20 dB SNR, demonstrating a gain of 15 dB.
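A projection-style measure between cepstral vectors can be sketched as one minus the normalized inner product (the cosine of the angle between them), which is insensitive to the vector-norm shrinkage that additive noise induces. This is our generic formulation of the idea, not necessarily the exact family of measures in the paper.

```python
import math

def projection_distortion(c1, c2):
    """1 - cos(angle) between two cepstral vectors; depends only on
    their directions, not on noise-induced norm shrinkage."""
    dot = sum(a * b for a, b in zip(c1, c2))
    n1 = math.sqrt(sum(a * a for a in c1))
    n2 = math.sqrt(sum(b * b for b in c2))
    return 1.0 - dot / (n1 * n2)
```

Scaling either vector leaves the measure unchanged, while a Euclidean cepstral distance would grow with any norm mismatch between clean templates and noisy input.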

Proceedings ArticleDOI
11 Apr 1988
TL;DR: If the number of VQ stages is increased sufficiently, MSVXC can be expressed as a form of transform coding in which the computationally intensive excitation codebook search is completely eliminated.
Abstract: An approach to vector-excitation-coding (VXC) speech compression utilizing multiple-stage vector quantization (VQ) is considered. Called multiple-stage VXC (MSVXC), this technique facilitates the use of high-dimensional excitation vectors at medium-band rates without substantially increasing computation. The basic approach consists of successively approximating the input speech vector in several cascaded VQ stages, where the input vector for each stage is the quantization error vector from the preceding stage. It is shown that if the number of VQ stages is increased sufficiently, MSVXC can be expressed as a form of transform coding, in which the computationally intensive excitation codebook search is completely eliminated.
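Multiple-stage VQ can be sketched as cascaded nearest-neighbor quantizers, each operating on the error vector left by the previous stage; the reconstruction is the sum of the selected codewords. This is a generic sketch with toy codebooks of our own; the paper's contribution is the limiting transform-coding interpretation, which is not reproduced here.

```python
def nearest(v, codebook):
    """Nearest codeword under squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(v, c)))

def msvq_encode(v, stages):
    """Quantize v through cascaded stages; each stage quantizes the
    residual left by the previous one. Returns the chosen codewords."""
    residual = list(v)
    chosen = []
    for codebook in stages:
        c = nearest(residual, codebook)
        chosen.append(c)
        residual = [a - b for a, b in zip(residual, c)]
    return chosen

def msvq_decode(chosen):
    """Reconstruction is the sum of the per-stage codewords."""
    return [sum(parts) for parts in zip(*chosen)]
```

Including the zero vector in each stage's codebook guarantees that every added stage can only reduce (never increase) the reconstruction error.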

Proceedings ArticleDOI
D. Mansour1, Biing-Hwang Juang1
11 Apr 1988
TL;DR: The short-time modified coherence (SMC) representation, proposed here, is an all-pole modeling of the autocorrelation sequence followed by a spectral shaper, essentially a square-root operator in the frequency domain that compensates for the inherent spectral distortion introduced by the autocorrelation operation.
Abstract: A technique for robust spectral representation of all-pole sequences is proposed. It is shown that the autocorrelation of an all-pole sequence, obtained by passing white noise through an all-pole filter 1/A(z), is an all-pole sequence of the form 1/A^2(z). The short-time modified coherence (SMC) representation, proposed here, is an all-pole modeling of the autocorrelation sequence followed by a spectral shaper. The spectral shaper, essentially a square-root operator in the frequency domain, compensates for the inherent spectral distortion introduced by the autocorrelation operation. The properties of the SMC representation, especially its robustness to additive white noise, are analyzed. Initial implementation of the SMC in a speaker-dependent isolated-word recognizer shows a considerable improvement over the standard linear predictive coding (LPC) representation. The SMC recognizer achieved an improvement in recognition accuracy equivalent to an increase in input SNR of approximately 13 dB, as compared to the LPC recognizer.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: Analysis parameters and various distance measures are investigated for a template matching scheme for speaker identity verification (SIV) and performance varies significantly across vocabulary, and average performance is approximately 5% EER for the better algorithms on telephone speech.
Abstract: Analysis parameters and various distance measures are investigated for a template matching scheme for speaker identity verification (SIV). Two parameters are systematically varied: the length of the signal analysis window, and the order of the linear predictive coding/cepstrum analysis. Computational costs associated with the choice of parameters are also considered. The distance measures tested are the Euclidean, inverse variance weighting, differential mean weighting, Kahn's simplified weighting, the Mahalanobis distance, and the Fisher linear discriminant. Using the equal error rate (EER) of pairwise utterance dissimilarity distributions, performance is estimated for prespecified and (a simulation of) user-determined input vocabulary. Performance varies significantly across vocabulary, and average performance is approximately 5% EER for the better algorithms on telephone speech.

Journal ArticleDOI
TL;DR: Two algorithms for the estimation of these time-varying log area ratios are proposed; the first one is an approximation using a lattice filter, while the second one minimizes a least-squares criterion.
Abstract: A large class of nonstationary signals, including speech signals but not restricted to them, can be represented by time-varying models, the coefficients of which are finite linear combinations of known time functions. Such models have been found useful for speech recognition and speech synthesis, but they suffer in this last application from a lack of stability. A time-varying autoregressive (AR) model, into which the time-dependency is coded through log-area ratios to ensure stability, is described. Two algorithms for the estimation of these time-varying log-area ratios are proposed; the first one is an approximation using a lattice filter, while the second one minimizes a least-squares criterion. The evaluation of their performance is obtained by a set of simulations. An example of a speech signal modeled with these time-varying log-area ratios shows the usefulness of this approach for speech synthesis and recognition.
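Coding through log-area ratios guarantees stability because any real-valued LAR maps back to a reflection coefficient strictly inside (-1, 1), which is the lattice-filter stability condition. A minimal sketch of the standard conversion (not code from the paper):

```python
import math

def k_to_lar(k):
    """Reflection coefficient (|k| < 1) -> log-area ratio."""
    return math.log((1.0 - k) / (1.0 + k))

def lar_to_k(g):
    """Log-area ratio -> reflection coefficient; always in (-1, 1),
    so any interpolated or time-varying LAR yields a stable filter."""
    e = math.exp(g)
    return (1.0 - e) / (1.0 + e)
```

This is why a linear combination of time functions in the LAR domain can vary freely over time without ever producing an unstable synthesis filter.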

01 Jan 1988
TL;DR: In this paper, a set of iterative speech enhancement techniques employing spectral constraints is extended and evaluated; the authors apply inter- and intraframe spectral constraints to ensure optimum speech quality across all classes of speech.
Abstract: A set of iterative speech enhancement techniques employing spectral constraints is extended and evaluated in this paper. The original unconstrained technique attempts to solve for the maximum likelihood estimate of a speech waveform in additive noise. The new approaches (presented in ICASSP-87 [3]) apply inter- and intraframe spectral constraints to ensure optimum speech quality across all classes of speech. Constraints are applied based on the presence of perceptually important speech characteristics found during the enhancement procedure. The techniques are evaluated in white and colored noise to determine their performance in extremely noisy environments.

Journal ArticleDOI
C.K. Gan1, R.W. Donaldson
TL;DR: An algorithm that uses two adaptive-amplitude thresholds and zero-crossing rate was used to delete nonspeech material from speech waveforms which have been digitally encoded and then decoded using PCM, adaptive-differential PCM and adaptive-delta-modulation.
Abstract: An algorithm that uses two adaptive-amplitude thresholds and the zero-crossing rate was used to delete nonspeech material from speech waveforms which had been digitally encoded and then decoded using PCM, adaptive-differential PCM, and adaptive delta modulation. Typically, compression rates of 35% resulted. Subjective evaluations were used to assess reconstructed speech quality, which improves significantly when absolute silence on playback is replaced with prerecorded background noise.
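The two-threshold-plus-zero-crossing idea can be sketched per frame: a frame is kept as speech if its peak magnitude exceeds a high threshold, or exceeds a low threshold while its zero-crossing rate is high (suggesting low-energy unvoiced sounds such as fricatives). The thresholds here are fixed constants of our choosing for clarity; the paper adapts them to the signal.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def is_speech(frame, t_high=0.2, t_low=0.05, zcr_min=0.25):
    """Two-threshold speech/nonspeech decision for one frame."""
    peak = max(abs(s) for s in frame)
    if peak >= t_high:                      # loud enough: clearly speech
        return True
    zcr = zero_crossings(frame) / (len(frame) - 1)
    return peak >= t_low and zcr >= zcr_min  # quiet but noisy: fricative-like
```

Frames failing both tests are the nonspeech material the algorithm deletes, yielding the roughly 35% compression reported above.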

Proceedings ArticleDOI
11 Apr 1988
TL;DR: A system to improve the intelligibility of noisy speech through the use of vector quantization of linear predictive coding (LPC) spectra and a distance measure involving formants is described, appearing to be a promising way to transform noisy speech into more intelligible signals.
Abstract: A system to improve the intelligibility of noisy speech through the use of vector quantization of linear predictive coding (LPC) spectra and a distance measure involving formants is described. Based on experiments using the system on natural speech degraded by additive white noise, the approach appears to be a promising way to transform noisy speech into more intelligible signals. As the noise corruption increases (and SNR decreases), the output speech becomes more distorted in terms of spectral jumps and mismatches, but remains free of noise. Good intelligibility remains at SNRs as low as 9 dB, although the speech is unnatural due to the LPC synthesis.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: Experimental results indicate that high-quality synthesized speech can be obtained using the LSP parameters at relatively low rates.
Abstract: The performance of several algorithms for the quantization of the line spectrum pair (LSP) parameters is studied. An adaptive method which utilizes the ordering property of the LSP parameters is presented. The performance of the different quantization schemes is studied on a long sequence of speech samples. For the spectral distortion measure, appropriate performance comparisons between the different quantization schemes are rendered. Experimental results indicate that high-quality synthesized speech can be obtained using the LSP parameters at relatively low rates.
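The ordering property (0 < w1 < w2 < ... < pi) suggests quantizing the positive differences between adjacent LSP frequencies; forcing each quantized difference to stay positive then preserves the ordering, and hence filter stability, after reconstruction. A minimal sketch of this idea; the uniform step size and codeword convention are our choices, not the paper's scheme.

```python
def quantize_lsp(lsp, step=0.01):
    """Encode LSP frequencies as quantized positive differences."""
    codes = []
    prev = 0.0
    for w in lsp:
        d = w - prev
        codes.append(max(1, round(d / step)))  # >= 1 step: keeps ordering
        prev = w
    return codes

def dequantize_lsp(codes, step=0.01):
    """Rebuild LSPs by accumulating the quantized differences."""
    out, acc = [], 0.0
    for q in codes:
        acc += q * step
        out.append(acc)
    return out
```

Because every decoded difference is at least one step, the reconstructed frequencies are strictly increasing no matter how coarse the quantizer.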

Proceedings ArticleDOI
F. Charpentier1, Eric Moulines1
11 Apr 1988
TL;DR: The authors present FFT synthesis algorithms for a French text-to-speech system based on diphone concatenation, and an experiment to reduce the computational cost by performing all the FFTs off-line is described.
Abstract: The authors present FFT synthesis algorithms for a French text-to-speech system based on diphone concatenation. FFT synthesis techniques are capable of producing high-quality prosodic modifications of natural speech. Several approaches are presented to reduce the distortions due to diphone concatenation. They are based on appropriate manipulations of the phase spectrum, either by phase equalization across all the diphones or by phase smoothing between successive diphones. The resulting speech is of significantly better quality than with conventional LPC synthesis. An experiment to reduce the computational cost by performing all the FFTs off-line is described. The resulting speech is slightly degraded with respect to 'full' FFT synthesized speech, but it remains more natural in comparison with the LPC speech.

Proceedings ArticleDOI
A. Nadas1, David Nahamoo1, Michael Picheny1
11 Apr 1988
TL;DR: A general technique termed adaptive labeling is presented for the normalization of the speech signal that combines the familiar labeling process executed by a vector quantizer with an adaptive renormalization transformation of the feature vectors proposed here.
Abstract: A general technique termed adaptive labeling is presented for the normalization of the speech signal. In principle, adaptive labeling is applicable to any sequence of feature vectors of a given dimension. It combines the familiar labeling process executed by a vector quantizer with an adaptive renormalization transformation of the feature vectors proposed here. Adaptive labeling is applied to speech recognition, where the particular interest lies in diminishing the degradation of performance that occurs as a result of changes in the signal characteristics following changes in ambient noise and other recording environment conditions or in response to a change in the characteristics of the talker. Results are presented for a series of experiments using soft and loud noises as well as environments in which microphone-to-speaker distances were allowed to vary. A 5000-word vocabulary with isolated word input was used.

Patent
27 Sep 1988
TL;DR: In this paper, a line connection switching apparatus performs connection switching of a communication line used both for speech communication performed by a telephone set and data communication by a facsimile apparatus.
Abstract: A line connection switching apparatus performs connection switching of a communication line used both for speech communication performed by a telephone set and for data communication performed by a facsimile apparatus, and is inserted between the telephone set and the facsimile apparatus, and the communication line. When predetermined speech command information is input from a caller side through the communication line, a speech signal is detected by a switching unit. When a voice/silence discriminator discriminates that the speech signal represents voice, a voice interval monitor monitors the duration of the speech signal. If the duration of the speech signal falls within a predetermined range, the speech signal is stored in a speech signal storage unit. A pattern matching unit verifies whether a standard speech pattern registered in a standard pattern dictionary unit matches the speech pattern of the input speech signal stored in the speech signal storage unit. In accordance with the verification result, if the speech pattern of the input speech signal coincides with the standard pattern, the switching unit switches connection of the communication line to the facsimile apparatus.