Showing papers on "Linear predictive coding" published in 1982

Proceedings Article•DOI•

A new model of LPC excitation for producing natural-sounding speech at low bit rates

B. Atal¹, J. Remde¹•Institutions (1)

03 May 1982

TL;DR: This paper describes a new approach to the excitation problem that does not require a priori knowledge of either the voiced-unvoiced decision or the pitch period, and minimizes a perceptual-distance metric representing subjectively-important differences between the waveforms of the original and the synthetic speech signals.

Abstract: The excitation for LPC speech synthesis usually consists of two separate signals - a delta-function pulse once every pitch period for voiced speech and white noise for unvoiced speech. This manner of representing excitation requires that speech segments be classified accurately into voiced and unvoiced categories and the pitch period of voiced segments be known. It is now well recognized that such a rigid idealization of the vocal excitation is often responsible for the unnatural quality associated with synthesized speech. This paper describes a new approach to the excitation problem that does not require a priori knowledge of either the voiced-unvoiced decision or the pitch period. All classes of sounds are generated by exciting the LPC filter with a sequence of pulses; the amplitudes and locations of the pulses are determined using a non-iterative analysis-by-synthesis procedure. This procedure minimizes a perceptual-distance metric representing subjectively-important differences between the waveforms of the original and the synthetic speech signals. The distance metric takes account of the finite-frequency resolution as well as the differential sensitivity of the human ear to errors in the formant and inter-formant regions of the speech spectrum.
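
The pulse-placement idea in this abstract can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: pulses are chosen one at a time, the error is plain squared error rather than the perceptual-distance metric, and the names `impulse_response` and `place_pulses` are hypothetical.

```python
def impulse_response(a, length):
    """Impulse response of the all-pole filter 1 / (1 - sum_k a[k] z^-(k+1))."""
    h = []
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        y = x + sum(a[k] * h[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        h.append(y)
    return h


def place_pulses(target, h, num_pulses):
    """Pick pulse locations and amplitudes one at a time (illustrative only).

    Each step tries every location, computes the amplitude that minimizes the
    squared error at that location, and keeps the location that removes the
    most error energy. `h` must be at least as long as `target`.
    """
    n = len(target)
    residual = list(target)
    pulses = []
    for _ in range(num_pulses):
        best = None
        for m in range(n):
            corr = sum(residual[i] * h[i - m] for i in range(m, n))
            energy = sum(h[i - m] ** 2 for i in range(m, n))
            reduction = corr * corr / energy  # error energy removed by this pulse
            if best is None or reduction > best[0]:
                best = (reduction, m, corr / energy)
        _, m, amp = best
        pulses.append((m, amp))
        # Subtract the contribution of the chosen pulse from the residual.
        for i in range(m, n):
            residual[i] -= amp * h[i - m]
    return pulses, residual
```

No voiced-unvoiced decision or pitch estimate appears anywhere in the search, which is the property the abstract emphasizes.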

600 citations

Proceedings Article•DOI•

Multiple stage vector quantization for speech coding

Biing-Hwang Juang, A. Gray

03 May 1982

TL;DR: Experimental results show that the quantizer performance is very close to a theoretically predicted asymptotically optimal rate distortion relationship for Euclidean distance measures.

Abstract: In this paper, we present a multiple stage vector quantization technique which allows easy expansion of the original vector quantizer design to operate at higher bit rates for lower distortion. The computation and storage reduction is achieved by the fact that the overall requirements are the sum of the requirements of each stage instead of an exponentially increasing function of the bit rate as in the original one stage design. In the case of Euclidean distance measures such as the log area ratio measure, experimental results show that the quantizer performance is very close to a theoretically predicted asymptotically optimal rate distortion relationship.
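
A minimal sketch of the multiple-stage idea, assuming squared Euclidean distance and tiny hand-made codebooks; the function names are hypothetical and the codebook design procedure itself is not shown.

```python
def nearest(codebook, vector):
    """Index of the codeword with the smallest squared Euclidean distance."""
    def dist(c):
        return sum((x - y) ** 2 for x, y in zip(c, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def msvq_encode(stages, vector):
    """Quantize `vector` stage by stage; each stage codes the previous residual."""
    indices = []
    residual = list(vector)
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(stages, indices):
    """Reconstruct the vector as the sum of the selected codewords."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


def codewords_stored(bits_per_stage):
    """Storage grows as a sum over stages, not as 2 ** (total bits)."""
    return sum(2 ** b for b in bits_per_stage)
```

For example, two 5-bit stages store 64 codewords in total, whereas a single 10-bit stage would need 1024.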

317 citations

Journal Article•DOI•

Predictive Coding of Speech at Low Bit Rates

Bishnu S. Atal¹•Institutions (1)

Bell Labs¹

01 Apr 1982-IEEE Transactions on Communications

TL;DR: A new class of speech coders is described that not only allows one to realize the precise optimum noise spectrum, which is crucial to achieving very low bit rates, but also represents an important first step in bridging the gap between waveform coders and vocoders without suffering from their limitations.

Abstract: Predictive coding is a promising approach for speech coding. In this paper, we review the recent work on adaptive predictive coding of speech signals, with particular emphasis on achieving high speech quality at low bit rates (less than 10 kbits/s). Efficient prediction of the redundant structure in speech signals is obviously important for proper functioning of a predictive coder. It is equally important to ensure that the distortion in the coded speech signal be perceptually small. The subjective loudness of quantization noise depends both on the short-time spectrum of the noise and its relation to the short-time spectrum of the speech signal. The noise in the formant regions is partially masked by the speech signal itself. This masking of quantization noise by the speech signal allows one to use low bit rates while maintaining high speech quality. This paper will present generalizations of predictive coding for minimizing subjective distortion in the reconstructed speech signal at the receiver. The quantizer in predictive coders quantizes its input on a sample-by-sample basis. Such sample-by-sample (instantaneous) quantization creates difficulty in realizing an arbitrary noise spectrum, particularly at low bit rates. We will describe a new class of speech coders in this paper which could be considered to be a generalization of the predictive coder. These new coders not only allow one to realize the precise optimum noise spectrum which is crucial to achieving very low bit rates, but also represent the important first step in bridging the gap between waveform coders and vocoders without suffering from their limitations.

316 citations

The government standard linear predictive coding algorithm: lpc10

T Tremain

01 Jan 1982

258 citations

Journal Article•DOI•

An 800 bit/s vector quantization LPC vocoder

D. Wong, Biing-Hwang Juang, A. Gray

01 Oct 1982-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: An 800 bit/s vector quantization linear predictive coding (LPC) vocoder has been developed that preserves most of the intelligibility of an LPC system and compatibility with any LPC-10 vocoder is guaranteed.

Abstract: An 800 bit/s vector quantization linear predictive coding (LPC) vocoder has been developed. The recently developed LPC vector quantization theory is applied to reduce the bit rate for LPC coefficients coding by a factor of four. Branch search techniques and separation of voiced and unvoiced codebooks are applied for better algorithm efficiency. Differential coding is applied to reduce the bit rate for the pitch and gain parameters by one third. Formal subjective evaluation shows that the 800 bit/s vocoder preserves most of the intelligibility of an LPC system. It is also robust under different transmission error and acoustic conditions. Informal listening comparisons show the quality to be acceptable and sometimes very close to 2400 bit/s LPC speech. The computational cost of the 800 bit/s vocoder is equivalent to or even lower than the 2400 bit/s LPC-10. Compatibility with any LPC-10 vocoder is guaranteed because the 800 bit/s design only differs in the quantization and encoding algorithms. Further bit rate reduction can be achieved by removing frame to frame redundancy in the code.

187 citations

Patent•

Transmitting data on the phase of speech

Raymond Steele¹, Wai C. Wong¹, Costas Xydeas¹•Institutions (1)

Bell Labs¹

05 Aug 1982

TL;DR: In this article, the authors proposed a means for simultaneous transmission of data and speech with only a minimal expansion of the bandwidth of the speech signal, where a Fourier transform is performed on the speech signals and a predetermined number of phase components are replaced with data (d(n)) in an appropriate form.

Abstract: The present invention relates to a means for achieving simultaneous transmission of data and speech with only a minimal expansion of the bandwidth of the speech signal. A Fourier transform (14) is performed on the speech signal and a predetermined number of phase components are replaced with data (d(n)) in an appropriate form. The number of phase components replaced with data is determined by approximately classifying the speech (16) as either "silence", no data inserted; "unvoiced" speech, M phase components convey data; and "voiced" speech, J phase components convey data; where J is less than M, and M is not greater than the number of phase components in the message band of the speech signal. An inverse Fourier transform (22) is subsequently performed on the combined data and speech signal. The combined message signal (G(t)) will comprise approximately the same bandwidth as the original speech signal, by virtue of the frequency domain insertion of the data into the speech. At the receiver the signal is inspected and a classifier (38) determines if data is embedded in the received signal. If data is deemed embedded, a Fourier transformation is performed, the data carrying phase components are inspected, and the data signal regenerated in an appropriate form. The phase components used for the conveyance of data are replaced by random phase components, and the inverse Fourier transformation performed. Median filtering is employed to mitigate the effects of end-of-block distortion and yield the recovered speech signal.
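
The frequency-domain insertion described here can be sketched roughly as follows. The bit-to-phase mapping (+π/2 for 1, −π/2 for 0), the direct DFT, and the function names are assumptions made for illustration; the patent only says the data are inserted "in an appropriate form". The speech classifier, the random-phase replacement, and the median filtering at the receiver are omitted.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (slow, but dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of `dft`."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def embed_bits(block, bins, bits):
    """Replace the phase of selected bins with data, keeping each magnitude.

    `bins` must satisfy 0 < k < len(block) / 2 so that the mirrored bin can be
    set to the complex conjugate and the output block stays real.
    """
    spec = dft(block)
    n = len(block)
    for k, bit in zip(bins, bits):
        phase = math.pi / 2 if bit else -math.pi / 2  # assumed mapping
        spec[k] = cmath.rect(abs(spec[k]), phase)
        spec[n - k] = spec[k].conjugate()
    return [v.real for v in idft(spec)]


def extract_bits(block, bins):
    """Read the data back from the sign of the phase in each data bin."""
    spec = dft(block)
    return [1 if spec[k].imag > 0 else 0 for k in bins]
```

Because only phases are altered, the magnitude spectrum of the block, and hence its bandwidth, is unchanged, which matches the claim that the combined signal occupies approximately the original bandwidth.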

121 citations

Journal Article•DOI•

Distortion performance of vector quantization for LPC voice coding

Biing-Hwang Juang, D. Wong, A. Gray

01 Apr 1982-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The distortion performance of the vector quantization approach for LPC voice coding is examined both analytically and experimentally to show its relationship with the residual minimization process in LPC analysis.

Abstract: The distortion performance of the vector quantization approach for LPC voice coding is examined both analytically and experimentally. Analytically, interpretations of the interparameter coupling effects of a distortion measure and the clustering nature of the algorithm for LPC vector quantization are obtained to show its relationship with the residual minimization process in LPC analysis. Experimentally, a large database of speech is used to compare its performance and properties to scalar quantization. The results lend further insight into the superior performance of vector quantization.

104 citations

Proceedings Article•DOI•

Embedding data in speech using scrambling techniques

R. Steele¹, D. Vitello•Institutions (1)

Bell Labs¹

01 May 1982

TL;DR: It is found that 126 b/s can be transmitted without error over a channel whose additive noise is only 10 dB below the mean square value of the speech signal.

Abstract: A method of embedding data into speech signals is proposed. The speech signal is scrambled using the data as the scrambling key, while the receiver adopts the role of a code breaker. By judicious choice of scrambling algorithm the receiver can be made to break the code at every attempt. We found that 126 b/s can be transmitted without error over a channel whose additive noise is only 10 dB below the mean square value of the speech signal.

60 citations

Proceedings Article•DOI•

Segment quantization for very-low-rate speech coding

S. Roucos¹, Richard Schwartz, J. Makhoul•Institutions (1)

BBN Technologies¹

01 May 1982

TL;DR: A new method for very-low-rate vocoding that represents the input speech as a sequence of variable-length segments is introduced, using an automatic segmentation algorithm to obtain segments with an average duration comparable to that of a phoneme.

Abstract: We introduce a new method for very-low-rate vocoding that represents the input speech as a sequence of variable-length segments. A segment is represented by a sequence of frames, where each frame is represented by a spectrum, pitch and gain. We use an automatic segmentation algorithm to obtain segments with an average duration comparable to that of a phoneme. A segment is quantized as a single block. The distance measure used for quantization incorporates the appropriate time alignment of two segments. We employ a computationally efficient metric that does not use the usual dynamic programming time warping. Two basic vocoders using the above approach of block quantization have been used to transmit intelligible speech at 200 b/s.

55 citations

Proceedings Article•DOI•

Applications of the short time Fourier transform to speech processing and spectral analysis

Jont B. Allen¹•Institutions (1)

Bell Labs¹

03 May 1982

54 citations

Proceedings Article•DOI•

Harmonic coding: A low bit-rate, good-quality speech coding technique

Luís B. Almeida, José Tribolet

01 May 1982

TL;DR: A new coding scheme is presented, which is based on a recently developed spectral model for nonstationary voiced speech, and it forms the basis of a waveform coder and a vocoder which are introduced in this paper, and which share the same basic structure.

Abstract: Low bit-rate, good-quality speech coding is one of the fundamental goals of today's speech processing research. Present-day coding techniques, like APC and ATC, are able to achieve good-quality transmission only down to about 12 kb/s. Below this rate, their quality degrades rapidly. On the other hand, the various kinds of vocoders, which operate up to about 5 kb/s, have inherent quality limitations which cannot be overcome by an increase of the bit rate. In this paper, a new coding scheme is presented, which is based on a recently developed spectral model for nonstationary voiced speech, and it forms the basis of a waveform coder and a vocoder which are introduced in this paper, and which share the same basic structure. Experimental results are presented, which show that both systems yield significant bit-rate reductions relative to present-day schemes of equivalent quality.

Journal Article•DOI•

System to independently modify excitation and/or spectrum of speech waveform without explicit pitch extraction

S. Seneff¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Aug 1982-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: In this paper, a speech analysis/synthesis system is described which is capable of independent manipulation of the fundamental frequency and spectral envelope of a speech waveform, which has applications in the areas of voice modification, baseband-excited vocoders, time-scale modification, and frequency compression as an aid to the partially deaf.

Abstract: A new speech analysis/synthesis system is described which is capable of independent manipulation of the fundamental frequency and spectral envelope of a speech waveform. The system deconvolves the original speech with the spectral envelope estimate to obtain a model for the excitation. Hence, explicit pitch extraction is not required. As a consequence, the transformed speech is more natural sounding than would be the case if the excitation were modeled as a sequence of pulses during voiced segments or pseudorandom noise during unvoiced segments. The system has applications in the areas of voice modification, baseband-excited vocoders, time-scale modification, and frequency compression as an aid to the partially deaf.

Journal Article•DOI•

Variable Frame Rate Transmission: A Review of Methodology and Application to Narrow-Band LPC Speech Coding

V. Viswanathan¹, J. Makhoul, Richard Schwartz, A. Huggins•Institutions (1)

BBN Technologies¹

01 Apr 1982-IEEE Transactions on Communications

TL;DR: The variable frame rate (VFR) transmission methodology that was developed, implemented, and tested during the period 1973-1978 for efficiently transmitting LPC vocoder parameters extracted from the input speech at a fixed frame rate is reviewed.

Abstract: We review the variable frame rate (VFR) transmission methodology that we developed, implemented, and tested during the period 1973-1978 for efficiently transmitting LPC vocoder parameters extracted from the input speech at a fixed frame rate. In the VFR method, parameters are transmitted only when their values have changed sufficiently over the interval since their preceding transmission. We explored two distinct approaches to automatic implementation of the VFR method. The first approach bases the transmission decisions on comparisons of the parameter values of the present frame and the last transmitted frame. The second approach, which is based on a functional perceptual model of speech, compares the parameter values of all the frames that lie in the interval between the present frame and the last transmitted frame against a linear model of parameter variation over that interval. The application of VFR transmission to the design of narrow-band LPC speech coders with average bit rates of 2000-2400 bits/s is also considered. The transmission decisions are made separately for the three sets of LPC parameters, pitch, gain, and spectral parameters, using separate VFR schemes. A formal subjective speech quality test of six selected LPC coders is described, and the results are presented and analyzed in detail. It is shown that a 2075 bit/s VFR coder produces speech quality equal to or better than that of a 5700 bit/s fixed frame rate coder.
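
The first of the two approaches, comparing the present frame with the last transmitted frame, might look like the sketch below. The change measure (largest absolute parameter difference) and the name `vfr_select` are assumptions, since the abstract does not specify either.

```python
def vfr_select(frames, threshold):
    """Return the indices of the frames that would be transmitted.

    The first frame is always sent. A later frame is sent only when some
    parameter differs from the last *transmitted* frame by more than
    `threshold` (an assumed change measure).
    """
    sent = [0]
    last = frames[0]
    for i in range(1, len(frames)):
        change = max(abs(a - b) for a, b in zip(frames[i], last))
        if change > threshold:
            sent.append(i)
            last = frames[i]
    return sent
```

Running a separate instance per parameter set (pitch, gain, spectrum) would mirror the separate VFR schemes mentioned in the abstract.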

Patent•

Speech intelligibility enhancement system and method

James M. Kates

04 Oct 1982

TL;DR: In this paper, a short-time estimate of the relative spectral shape of an input speech signal is determined by envelope detectors (24) operating on the outputs of band pass filters (20).

Abstract: To enhance the intelligibility of speech, the consonant sounds are intensified and, in effect, their intensity equalised to that of the vowel sounds in a speech waveform. A short-time estimate of the relative spectral shape of an input speech signal is determined by envelope detectors (24) operating on the outputs of band pass filters (20). Control means are provided to respond to such relative spectral shape estimate by dynamically controlling a modification of the spectral shape of the actual speech signal so as to produce a modified output speech signal, the control means comprising a combination matrix (28) operating on the outputs of the envelope detectors (24) with a matrix of coefficients and producing weighted signals (29) as control signals. The control signals (29) act on gain selecting logic (30) to determine the gains of multipliers (31) through which respective different portions of the frequency spectrum of the input speech are coupled to a summation circuit (32) producing the consonant - enhanced output speech signal, the respective different portions of the frequency spectrum of the input speech being produced by a bank of filters (20) supplying the envelope detectors (24) or by a set of different filters (26).

Proceedings Article•DOI•

Time encoding of LPC roots

P. Papamichalis¹, G. Doddington¹•Institutions (1)

Texas Instruments¹

03 May 1982

TL;DR: A time encoding scheme of the center frequencies of the LPC inverse filter roots is presented, and without quantization, the processed speech is subjectively indistinguishable from the original synthetic speech.

Abstract: A time encoding scheme of the center frequencies of the LPC inverse filter roots is presented. The roots are represented in terms of center frequency (CF) and bandwidth (BW), and the continuity of the CF's in time is established by a dynamic programming scheme which has a cost function depending on both CFs and BWs. Segmentation points are set at the beginning or end of root tracks, at the voiced-unvoiced transitions and at the peaks of a function measuring the dissimilarity of adjacent frames. The tracks within segments are then fitted by linear combinations of orthogonal polynomials. Without quantization, the processed speech is subjectively indistinguishable from the original synthetic speech. Good quality is achieved even below 1000 bps.
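
For reference, a complex root of the LPC inverse filter is commonly converted to a center frequency and bandwidth with the textbook relations below. This is a sketch of that standard conversion, not code from the paper; the function name and the sampling rate used in the example are assumptions.

```python
import cmath
import math


def root_to_cf_bw(root, fs):
    """Convert a z-plane root to (center frequency, bandwidth) in Hz.

    Uses the usual relations CF = angle * fs / (2*pi) and
    BW = -ln(|root|) * fs / pi for a root inside the unit circle.
    """
    cf = cmath.phase(root) * fs / (2 * math.pi)
    bw = -math.log(abs(root)) * fs / math.pi
    return cf, bw


# Example: a root placed to give a 1000 Hz resonance with 100 Hz bandwidth
# at an assumed 8000 Hz sampling rate.
example_root = cmath.rect(math.exp(-math.pi * 100 / 8000), 2 * math.pi * 1000 / 8000)
```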

Patent•DOI•

Speech synthesis system utilizing variable frame rate

Alva E. Henderson¹, Richard H. Wiggins¹•Institutions (1)

Texas Instruments¹

25 Jan 1982-Journal of the Acoustical Society of America

TL;DR: A frame control circuit accomplishes the foregoing utilization of speech data at a variable frame rate by the speech synthesizer by providing for a variable number of interpolation calculations between adjacent speech frames from last implemented speech data.

Abstract: Speech synthesis system implementable in an integrated circuit device capable of converting frames of speech data at a variable frame rate into analog signals representative of human speech. The frames of speech data comprise digital representations of values of pitch, energy, filter coefficients and coded frame rate data. The speech synthesis system includes a linear predictive coding filter as a speech synthesizer which utilizes the speech data at a varying frame rate to produce digital speech signals representative of human speech. Frames of digital speech data including coded frame rate data are received by an input, with the frame rate data being decoded to control both the rate at which the incoming variable-length frames of speech data are accepted by the speech synthesizer and the number of interpolation calculations required to define interpolated speech values between adjacent incoming frames of speech data. A frame control circuit accomplishes the foregoing utilization of speech data at a variable frame rate by the speech synthesizer by providing for a variable number of interpolation calculations between adjacent speech frames from last implemented speech data in which the number of interpolation calculations in a given instance is determined by the frame rate data. A microprocessor controls the access of selected speech data which is stored in a memory. The system also includes a digital-to-analog converter for converting the digital speech signals produced by the filter into analog signals and a speaker for generating audible sounds in the form of synthesized human speech from the analog signals provided by the digital-to-analog converter.
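
The role of the coded frame rate can be pictured with a small sketch in which the decoded value sets how many interpolated parameter sets are generated between two adjacent frames. Linear interpolation and the name `interpolate_frames` are assumptions; the patent abstract only speaks of "interpolation calculations".

```python
def interpolate_frames(prev, nxt, steps):
    """Generate `steps` parameter sets moving from `prev` to `nxt`.

    `steps` plays the role of the decoded frame rate data: a longer frame
    gets more interpolation steps. The last set equals `nxt`.
    """
    out = []
    for s in range(1, steps + 1):
        w = s / steps
        out.append([p + w * (q - p) for p, q in zip(prev, nxt)])
    return out
```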

Patent•DOI•

Digital speech processing system having reduced encoding bit requirements

Stephan Horvath, Carlo Bernasconi

23 Sep 1982-Journal of the Acoustical Society of America

TL;DR: In this paper, a digitized speech signal is divided into sections and each section is analyzed by the linear prediction method to determine the coefficients of a sound formation model, a sound volume parameter, information concerning voiced or unvoiced excitation and the period of the vocal band base frequency.

Abstract: A digitized speech signal is divided into sections and each section is analyzed by the linear prediction method to determine the coefficients of a sound formation model, a sound volume parameter, information concerning voiced or unvoiced excitation and the period of the vocal band base frequency. In order to improve the quality of speech without increasing the data rate, redundance reducing coding of the speech parameters is effected. The coding of the speech parameters is performed in blocks of two or three adjacent speech sections. The parameters of the first speech section are coded in a complete form, and those of the other speech sections in a differential form or in part not at all. The average number of bits required per speech section is reduced to compensate for the increased section rate, so that the overall data rate is not increased.

Proceedings Article•DOI•

A hardware implementation of a new narrow to medium band speech coding

F. Itakura, T. Kobayashi¹, Masaaki Honda¹•Institutions (1)

Nippon Telegraph and Telephone¹

03 May 1982

TL;DR: A digital speech signal processor designed for a narrow band and a medium band speech coding system, both of them recently developed at the E.C.L., N.T.T., is described.

Abstract: This paper describes a digital speech signal processor (DSSP) designed for a narrow band and a medium band speech coding system, both recently developed at the E.C.L., N.T.T. The narrow band system is an LSP (Line Spectrum Pair) vocoder, which is based on the frequency domain representation of LPC parameters. The LSP parameters are superior to conventional LPC parameters in their quantization and interpolation properties. The medium band system is a split band adaptive predictive coding with adaptive bit-allocation, APC-AB. This system can achieve toll quality coding at 16 kbit/s. We have made an extensive software simulation of LSP and APC-AB, and decided the specification of the DSSP. The DSSP can handle arithmetic and shifting operations in both 12 and 24 bit mode. Multiplications are done by a 12-by-12 parallel multiplier. Multiplier, ALU and shifters can operate concurrently in a pipeline mode at the minimum interval of 0.5 µs per multiplication-accumulation.

Proceedings Article•DOI•

Comparison of objective speech quality measures for voiceband CODECs

Nobuhiko Kitawaki, Kenzo Itoh¹, Masaaki Honda¹, K. Kakehi¹•Institutions (1)

Nippon Telegraph and Telephone¹

01 Jan 1982

TL;DR: Speech quality for voiceband CODECs was evaluated by subjective and objective quality measures and it was concluded that the LPC Cepstrum Distance measure had best correspondence to Mean Opinion Score, among the objective measures studied.

Abstract: This paper describes objective quality measures to evaluate speech quality for various kinds of voiceband CODECs in common. The voiceband CODECs studied were PCM, ADM, ADPCM, ATC (Adaptive Transform Coding) and APC-AB (Adaptive Predictive Coding with Adaptive Bit Allocation). First, several objective quality measures in time and frequency domain were defined. They were SNR, Segmental SNR, Spectral Distortion, LPC Cepstrum Distance, COSH, Likelihood Ratio and Weighted Likelihood Ratio. Second, speech quality for voiceband CODECs was evaluated by subjective and objective quality measures. The subjective measures used were based on opinion test and articulation test. Finally, the relationship between objective measures and subjectively evaluated values was studied. It was concluded that the LPC Cepstrum Distance measure had the best correspondence to Mean Opinion Score among the objective measures studied. It was also concluded that the Weighted Likelihood Ratio measure had the best correspondence to Articulation Score.
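
Two of the time-domain measures listed, SNR and Segmental SNR, can be sketched with their usual definitions. The frame length, the handling of a trailing partial frame (dropped), and the function names are assumptions; silent or error-free frames would need special handling that is left out here.

```python
import math


def snr_db(ref, test):
    """Signal-to-noise ratio in dB over the whole span."""
    signal = sum(r * r for r in ref)
    noise = sum((r - t) ** 2 for r, t in zip(ref, test))
    return 10 * math.log10(signal / noise)


def segmental_snr_db(ref, test, frame_len):
    """Mean of the per-frame SNR values in dB (trailing partial frame dropped)."""
    values = []
    for start in range(0, len(ref) - frame_len + 1, frame_len):
        values.append(snr_db(ref[start:start + frame_len],
                             test[start:start + frame_len]))
    return sum(values) / len(values)
```

Averaging in the dB domain gives quiet frames the same weight as loud ones, so the segmental figure can differ noticeably from the overall SNR when the signal level varies.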

Journal Article•DOI•

A speech analysis algorithm which eliminates the influence of pitch using the model reference adaptive system

Yoshikazu Miyanaga¹, Nobuhiro Miki¹, Nobuo Nagai¹, Kozo Hatori¹•Institutions (1)

Hokkaido University¹

01 Feb 1982-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A new adaptive algorithm based upon a least square criterion with a weighting factor is presented and shown to be quite useful for estimating ARMA parameters together with input in speech analysis.

Abstract: A new adaptive algorithm based upon a least square criterion with a weighting factor is presented and shown to be quite useful for estimating ARMA parameters together with input in speech analysis. The estimators of both the input pulse train for voiced speech and the input white noise for unvoiced speech are easily obtained from the prediction errors by using this new adaptive algorithm. When these estimated inputs are used as the input of the model to be estimated, the influence of the pitch can be eliminated from the estimated ARMA parameters. By using this method the accuracy of formant and antiformant estimators is shown experimentally in comparison with LPC and cepstrum estimators.

Proceedings Article•DOI•

Adaptive bit allocation scheme in predictive coding of speech

Masaaki Honda¹, Nobuhiko Kitawaki¹, F. Itakura¹•Institutions (1)

Nippon Telegraph and Telephone¹

01 Jan 1982

TL;DR: The result shows that the APC-AB system has advantage over the conventional full-band APC system in the segmental SNR and the stability of the prediction loop.

Abstract: An adaptive predictive coding with adaptive bit allocation (APC-AB) is presented for speech encoding at low to medium bit rates (6.4kb/s-24kb/s). In this system, a split-band predictive coding scheme and a bit allocation scheme are employed in order to remove the redundancies due to a periodic concentration of the prediction residual energy as well as nonuniform nature of the speech spectrum. Quantization bits are dynamically allocated both over the sub-bands(frequency domain) and over the subintervals(time domain) in accordance with the distribution of the residual energies in the time-frequency domain. Optimum bit allocation is derived based on the mean square error criterion on the speech waveform, and the SNR gain is presented in relation to the prediction gain of the full-band signal. This system is evaluated in terms of the segmental SNR and speech quality. The result shows that the APC-AB system has advantage over the conventional full-band APC system in the segmental SNR and the stability of the prediction loop. It was also shown that this system can provide speech quality subjectively equivalent to 7 bit Log-PCM at 16 kb/s, and to 6 bit Log-PCM at 9.6 kb/s.
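
Allocating bits according to the distribution of residual energy is often illustrated with the classical mean-square-error rule, in which each band (or subinterval) receives the average bit budget plus half the base-2 logarithm of its energy relative to the geometric mean. The sketch below uses that textbook rule as an assumption; it is not claimed to be the exact allocation derived in the paper, and it ignores rounding to integers and clipping of negative allocations.

```python
import math


def allocate_bits(energies, avg_bits):
    """Textbook MSE bit allocation: b_i = avg + 0.5 * log2(e_i / geometric mean)."""
    log_mean = sum(math.log2(e) for e in energies) / len(energies)
    return [avg_bits + 0.5 * (math.log2(e) - log_mean) for e in energies]
```

With energies of 16, 4, 1 and 0.25 and an average of 2 bits, the rule gives 3.5, 2.5, 1.5 and 0.5 bits, which still sum to the same total of 8 bits.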

Patent•

Three-party conference circuit for digital time-division-multiplex communication systems

Barry D. Lubin¹•Institutions (1)

Motorola¹

28 Jun 1982

TL;DR: In this paper, a three-party conference circuit is proposed for three-party conference calls in time-division-multiplex communication systems utilizing pulse-code modulation (PCM), where speech signals from a selected group of channels are received from an incoming PCM highway, expanded from PCM coding to linear coding and consecutively stored in groups of three in three registers.

Abstract: A three-party conference circuit provides for three-party conference calls in time-division-multiplex communication systems utilizing pulse-code modulation. The digitized speech signals from a selected group of channels are received from an incoming PCM highway, expanded from PCM coding to linear coding and consecutively stored in groups of three in three registers. Different pairs of the registers are selected by a multiplexer, after which the speech signals from the selected pairs of registers are added to provide a combined speech signal. The combined speech signals are compressed from linear coding to PCM coding and applied to an outgoing PCM highway for transmission to the respective parties. Since the speech signals are combined and retransmitted essentially within the same frame, time delay distortion of the audio signals is minimized. In addition, the three-party conference circuit provides for a diagnostic mode of operation where incoming speech signals are retransmitted and a broadcast mode of operation where the speech signals from one channel are combined with each of the other channels of the selected group of channels to be transmitted.
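
The expand, add, compress sequence can be sketched for one sample per party as follows. The companding law is an assumption (continuous μ-law with μ = 255 on samples normalized to [-1, 1]); the patent abstract only says "PCM coding". The clipping of the sum and the function names are likewise illustrative.

```python
import math

MU = 255.0  # assumed mu-law constant


def compress(x):
    """Linear sample in [-1, 1] to companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Companded value in [-1, 1] back to a linear sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def conference(a, b, c):
    """Return the companded samples sent to parties A, B and C.

    Each party receives the sum of the other two parties' linear samples,
    clipped to the valid range before compression.
    """
    la, lb, lc = expand(a), expand(b), expand(c)

    def clip(v):
        return max(-1.0, min(1.0, v))

    return (compress(clip(lb + lc)),
            compress(clip(la + lc)),
            compress(clip(la + lb)))
```

Adding in the linear domain matters because companded values are not additive; summing them directly would distort the mix.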

Journal Article•DOI•

A Multirate Voice Digitizer Based Upon Vector Quantization

G. Rebolledo¹, Robert M. Gray², J. Burg•Institutions (2)

National Autonomous University of Mexico¹, Stanford University²

01 Apr 1982-IEEE Transactions on Communications

TL;DR: The design and simulation of a multirate voice digitizer (MRVD) that switches between two speech compression systems, each based on a recently developed vector quantization (VQ) coding technique, which is shown to have a simpler architecture and to provide comparable speech quality.

Abstract: The importance of integrating voice and data over digital networks has increased during the last few years primarily because of the growing popularity of such networks. Of particular interest are efficient voice digitizing terminals, capable of operating at various data rates in both circuit-switched and packet-switched data networks. Several such terminals, including two or more speech compression algorithms, have been proposed and implemented. Typically the terminal switches between a low-rate (500 - 4000 bits/s) vocoding scheme and a medium-rate (7000 - 16000 bits/s) waveform coding algorithm, depending on, among other things, the network congestion and on the desired voice quality and robustness. We here describe the design and simulation of a multirate voice digitizer (MRVD) that switches between two speech compression systems, each based on a recently developed vector quantization (VQ) coding technique. This technique consists of the off-line interactive design of a codebook minimizing an average distortion measure, followed by the use of the codebook in an on-line nearest neighbor encoding scheme. One of the two systems is a rate-distortion speech coder that resembles a linear predictive coding (LPC) speech compression system but has a much lower rate (800 bits/s and below). We call this the LPC-VQ system, and it is similar to other previously reported systems [15],[19],[21]. The only difference is that the LPC parameters are extracted using the Burg method instead of the autocorrelation method. We here show that this provides both qualitative and quantitative improvements. The other system of our MRVD is a residual-excited linear predictive (RELP) speech compression system using VQ in both model selection and residual digitization. The residual waveform is digitized at 1 or 2 bits/sample, resulting in rates of 7300 and 13800 bits/s, respectively. We call this the RELP-VQ system. When compared to other RELP systems [6]-[8], it is shown to have a simpler architecture and to provide comparable speech quality. In a direct comparison with an APC scheme, our RELP-VQ system was determined to provide a more natural speech sound. Another interesting result presented is the quantitative comparison of the application of the VQ algorithm to the original speech waveform and its residuals.
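The on-line nearest-neighbor encoding step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the squared-error distortion, the function names, and the toy codebook are all assumptions made for illustration (the abstract does not state which distortion measure the coders use), and the off-line codebook design is not shown.

```python
def squared_error(x, y):
    """Squared-error distortion between two equal-length vectors (assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def vq_encode(vector, codebook):
    """Return the index of the codeword with the smallest distortion."""
    return min(range(len(codebook)),
               key=lambda i: squared_error(vector, codebook[i]))


def vq_decode(index, codebook):
    """Reproduce the vector by looking up the transmitted index."""
    return list(codebook[index])


# Hypothetical 4-entry codebook: each vector costs log2(4) = 2 bits to transmit.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [0.0, -2.0]]
index = vq_encode([0.9, 1.2], codebook)
reconstruction = vq_decode(index, codebook)
```

Only the index is transmitted, so the bit rate is set by the codebook size and the vector rate rather than by the precision of the individual parameters.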


Patent•DOI•

Speech processing system including an amplitude level control circuit for digital processing


Hiroyuki Kaneda (c/o Nippon Electric Co. Ltd.)

04 Mar 1982-Journal of the Acoustical Society of America

TL;DR: A speech processor with microprocessor control of the amplitude level of input speech signals: the input is applied to a digitally controlled level regulator, the output of which is converted into a digital speech signal for further speech processing.


Abstract: A speech processor having microprocessor control of the amplitude level of input speech signals. Input speech signals are applied to a digitally controlled level regulator, the output of which is converted into a digital speech signal for further speech processing. The peak level of the digital speech signals over a frame period is compared in the microprocessor with a preset optimum range. If the peak level falls outside the optimum range, control signals for the level regulator are adjusted in a direction to change the amplification/attenuation amount of the level regulator to bring the peak level within the optimum range.
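The per-frame control decision described in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the integer gain steps, the one-step-per-frame adjustment, and the threshold values are all hypothetical and are not taken from the patent.

```python
def adjust_gain_step(frame, gain_step, low, high, min_step=0, max_step=15):
    """Return the regulator setting to use after one frame.

    frame     : non-empty sequence of digital speech samples for one frame
    gain_step : current regulator setting (higher means more amplification)
    low, high : bounds of the preset optimum range for the frame peak
    """
    peak = max(abs(s) for s in frame)
    if peak > high and gain_step > min_step:
        return gain_step - 1   # too loud: attenuate by one step (assumed step size)
    if peak < low and gain_step < max_step:
        return gain_step + 1   # too quiet: amplify by one step
    return gain_step           # peak already inside the optimum range, or at a limit


# Hypothetical 16-bit frames and thresholds.
louder = adjust_gain_step([10, -20, 15], gain_step=8, low=8000, high=28000)
quieter = adjust_gain_step([100, -30000, 50], gain_step=8, low=8000, high=28000)
```

Repeating this decision once per frame moves the peak level toward the optimum range over successive frames.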


Proceedings Article•DOI•

Discrete utterance recognition based upon source coding techniques


A. Buzo, H. Martinez, C. Rivera

03 May 1982

TL;DR: A speaker-independent isolated word recognition system is described which is based on techniques and results from rate-distortion speech coders; the Itakura-Saito distortion measure is used both to design the system (that is, to select the patterns) and in the decision step.


Abstract: A speaker-independent isolated word recognition system is described which is based on some techniques and results from rate-distortion speech coders. The recognition system can be viewed as a minimum distortion or nearest-neighbor system where the distortion measure is defined between an observed sequence of frames of speech and a reference pattern. The patterns are sequences of sets of LPC models. Each set in a pattern consists of a collection of LPC models that "best" reproduces a given frame of a word from a training sequence. The Itakura-Saito distortion measure is used both to design the system (that is, to select the patterns) and in the decision step.
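The minimum-distortion decision rule described in this abstract can be sketched in Python as below. This is a simplified illustration, not the system from the paper: it assumes each frame and each model is represented by a sampled power spectrum, that the observed word and every reference pattern have the same number of frames (no time alignment), and all names and toy values are hypothetical.

```python
import math


def itakura_saito(p, q):
    """Itakura-Saito distortion between two sampled power spectra (p observed, q model)."""
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(p, q)) / len(p)


def pattern_distortion(frames, pattern):
    """Sum over frames of the distortion to the best model in that frame's set."""
    return sum(min(itakura_saito(frame, model) for model in models)
               for frame, models in zip(frames, pattern))


def recognize(frames, patterns):
    """Return the word whose reference pattern gives the minimum distortion."""
    return min(patterns, key=lambda word: pattern_distortion(frames, patterns[word]))


# Hypothetical two-frame patterns; each frame position holds a set of model spectra.
patterns = {
    "yes": [[[1.0, 4.0]], [[2.0, 1.0]]],
    "no": [[[4.0, 1.0]], [[1.0, 1.0], [0.5, 3.0]]],
}
word = recognize([[1.1, 3.8], [2.0, 1.1]], patterns)
```

Note that the Itakura-Saito measure is not symmetric, so the order of the observed and model spectra matters.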


Proceedings Article•DOI•

A systolic processing element for speech recognition


Neil Weste¹, D. Burr¹, Bryan D. Ackland¹•Institutions (1)

Bell Labs¹

01 Jan 1982

TL;DR: An integrated 16b CMOS processor designed for systolic array processing, with programmable processors, capable of performing the pattern matching required for speech recognition of up to 25,000 words per second will be described.


Abstract: An integrated 16b CMOS processor designed for systolic array processing, with programmable processors, capable of performing the pattern matching required for speech recognition of up to 25,000 words per second will be described.


Journal Article•DOI•

Linear prediction, extremal entropy and prior information in speech signal analysis and synthesis


Manfred R. Schroeder¹,²•Institutions (2)

Bell Labs¹, University of Göttingen²

01 May 1982-Speech Communication

TL;DR: The fundamental concepts of Linear Prediction and Maximum Entropy spectral analysis are reviewed, and the powerful principle of Minimum Cross-Entropy (MCE) spectral analysis is introduced, allowing the incorporation of prior information into signal analysis.


Journal Article•DOI•

Feature Extraction by System Identification


Bruce A. Eisenstein, Richard J. Vaccaro

01 Jan 1982

TL;DR: The perturbation analyses done in this research verify the viability of using the parameters of a process model as a feature vector in a pattern recognition scheme.


Abstract: A method for the extraction of features for pattern recognition by system identification is presented. A test waveform is associated with a parameterized process model (PM) which is an inverse filter. The structure of the PM corresponds to the redundant information in a waveform, and the parameter values correspond to the discriminatory information. The PM used in this research is a linear predictive system whose parameters are the linear predictive coefficients (LPC's). This technique is applied to feature extraction of electrocardiograms (ECG's) for differential diagnosis. The LPC's are calculated for each ECG and used as a feature vector in a hypergeometric affine N-space spanned by the LPC's. The efficacy of this feature extraction technique is tested by three different perturbation methods, namely noise, matrix distortion, and a newly developed method called directed distortion. Both the Euclidean and Itakura distances between feature vectors in N-space are shown to increase with increasing perturbation of the template waveform. The monotonic behavior of a distance measure is a necessary attribute of a valid feature space. Thus the perturbation analyses done in this research verify the viability of using the parameters of a process model as a feature vector in a pattern recognition scheme.
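Using linear predictive coefficients as a feature vector can be illustrated with the Python sketch below. It assumes the autocorrelation method with the Levinson-Durbin recursion, which the abstract does not specify, and the sign convention A(z) = 1 + a1 z^-1 + ... + aN z^-N; the function names and the synthetic test signal are hypothetical, and the input is assumed to be non-zero.

```python
import math


def autocorrelation(x, order):
    """Autocorrelation values r[0..order] of a finite sequence."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(order + 1)]


def lpc(x, order):
    """Linear predictive coefficients a1..aN via the Levinson-Durbin recursion."""
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]                       # prediction error energy; assumes x is not all zero
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err               # reflection coefficient of stage m
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a[1:]


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


# Synthetic first-order decaying signal standing in for a waveform.
template = [0.9 ** n for n in range(200)]
features = lpc(template, 2)
```

Feature vectors computed this way for a template and for its perturbed versions can then be compared with `euclidean` to check how the distance behaves as the perturbation grows.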


Journal Article•DOI•

A Time Warping Approach to Fundamental Period Estimation


Hermann Ney

01 May 1982

TL;DR: A time warping approach for estimating the fundamental period of a speech signal is described, and a computationally simple recursive algorithm is obtained which has the capability of tracking the time-varying period.


Abstract: A time warping approach for estimating the fundamental period of a speech signal is described. The problem of estimating the period of a signal corrupted by background noise is formulated in terms of a minimum error function which defines a matching score between two signal segments and, depending on the error criterion, leads to well-known estimation schemes. To allow for the fact that the speech signal is only a quasi-periodic waveform with time-varying structure from period to period, a nonlinear time warping function is introduced. Using a dynamic programming technique, a computationally simple recursive algorithm is obtained which has the capability of tracking the time-varying period.
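The idea of a minimum error function that scores the match between two signal segments can be sketched in Python as follows. This is a deliberately simplified illustration: it assumes a mean squared-error criterion and a fixed period over the whole segment, and it omits the nonlinear time warping and the dynamic programming recursion that are the paper's actual contribution. The function name, lag range, and test signal are hypothetical.

```python
import math


def estimate_period(signal, min_lag, max_lag):
    """Return the lag (in samples) minimizing the mean squared error between
    the signal and a copy of itself shifted by that lag.

    Assumes max_lag is smaller than len(signal).
    """
    best_lag, best_err = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(signal) - lag
        err = sum((signal[i] - signal[i + lag]) ** 2 for i in range(n)) / n
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag


# Synthetic periodic signal with a period of 40 samples.
signal = [math.sin(2 * math.pi * i / 40) for i in range(400)]
period = estimate_period(signal, 20, 60)
```

A fixed-lag search like this cannot follow a period that changes from cycle to cycle, which is the limitation the time warping function in the abstract is meant to address.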


Patent•DOI•

Speech analysis circuits using an inverse lattice network


Richard H. Wiggins¹•Institutions (1)

Texas Instruments¹

29 Apr 1982-Journal of the Acoustical Society of America

TL;DR: An adaptive filter for speech analysis uses a single multiplier (26) and a single adder (27) in series, together with appropriate registers or memories (28, 29, 30), so that the output of the adder is either looped back to the adder itself, performing the operations of an all-zero digital lattice filter (moving average, linear predictive coding), or is looped, after a delay, to the multiplier.


Abstract: An adaptive filter suitable for speech analysis which is constructable on a silicon chip. The filter has a single multiplier (26) and a single adder (27) in series together with appropriate registers or memories (28, 29, 30) so that the output of the adder is either looped back to the adder itself so as to perform the operations of an all-zero digital lattice filter (moving average, linear predictive coding), or alternatively is looped, after a delay, to the multiplier. Another register (25) provides the multiplier with the constant values typically associated with lattice filter coefficients. Preferably the multiplier (26) is an M-stage pipeline multiplier so as to reduce the area necessary for its incorporation on a silicon chip. The filter accepts digital samples of the voice input (102) and outputs (17) a compressed data representation of the input.
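The arithmetic of an all-zero lattice filter can be sketched in software as follows. This Python example only illustrates the standard lattice recursion, in which every update is one multiplication followed by one addition; it does not model the patent's time-multiplexed hardware, registers, or pipeline. The function name and the sign convention for the coefficients are assumptions.

```python
def lattice_inverse_filter(samples, k):
    """All-zero lattice filter with stage coefficients k[0..M-1].

    Per stage m:  f_m[n] = f_{m-1}[n] + k_m * b_{m-1}[n-1]
                  b_m[n] = k_m * f_{m-1}[n] + b_{m-1}[n-1]
    The output is the final forward error f_M[n].
    """
    delayed = [0.0] * len(k)      # b_{m-1}[n-1] held for each stage
    out = []
    for x in samples:
        f = b = float(x)          # stage 0: f_0[n] = b_0[n] = x[n]
        for m, km in enumerate(k):
            b_prev = delayed[m]
            delayed[m] = b        # keep b_{m-1}[n] for the next sample
            f, b = f + km * b_prev, km * f + b_prev
        out.append(f)
    return out


# Impulse response for two hypothetical stage coefficients.
response = lattice_inverse_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.25])
```

For two stages the impulse response equals the coefficients of 1 + (k1 + k1 k2) z^-1 + k2 z^-2, which gives a quick check on the recursion.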
