Showing papers on "Speech coding" published in 1977

Journal Article•DOI•

Adaptive transform coding of speech signals

[...]

01 Aug 1977-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The main result is that this adaptive transform coder performs better than all known nonpitch-tracking coding schemes; it extends the range of speech waveform coding to lower bit rates and closes the gap between vocoders and predictive waveform coders.

Abstract: This paper discusses speech coding systems based upon transform coding (TC). It compares several transforms and shows that the cosine transform leads to a nearly optimum performance for almost all speech sounds. Various adaptive coding strategies are then investigated, and a coding scheme is proposed that is based on a nonadaptive discrete cosine transform (DCT), on an adaptive bit assignment, and on adaptive quantization. The adaptation is controlled by a short-term basis spectrum that is derived from the transform coefficients prior to coding and transmission and that is transmitted as side information to the receiver. The main result is that this adaptive transform coder performs better than all known nonpitch-tracking coding schemes; it extends the range of speech waveform coding to lower bit rates and closes the gap between vocoders and predictive waveform coders.

340 citations
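
The pipeline named in the abstract above (a fixed DCT followed by adaptive bit assignment) can be illustrated with a small sketch. It assumes the textbook log-variance allocation rule, which is not necessarily the rule used in the paper; the function names `dct_ii` and `assign_bits` are hypothetical, and the per-coefficient variances stand in for the short-term basis spectrum that the coder sends as side information.

```python
import math

def dct_ii(block):
    """Orthonormal DCT-II of a list of samples."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, x in enumerate(block))
        out.append(scale * acc)
    return out

def assign_bits(variances, avg_bits):
    """Assumed textbook rule: b_k = avg_bits + 0.5 * log2(var_k / geometric mean).

    Negative allocations are clamped to zero, so the total can exceed
    avg_bits * len(variances); a real coder would redistribute the excess.
    """
    n = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / n
    raw = [avg_bits + 0.5 * (math.log2(v) - log_gm) for v in variances]
    return [max(0, round(b)) for b in raw]
```

Coefficients with a large estimated variance receive more bits, and coefficients far below the geometric mean receive none, which is the sense in which the bit assignment adapts to the short-term spectrum.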

Journal Article•DOI•

Detecting and locating key words in continuous speech using linear predictive coding

[...]

R. Christiansen, C. Rushforth¹•Institutions (1)

University of Utah¹

01 Oct 1977-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A template-matching procedure which uses as its basic waveform features a set of linear prediction coefficients and is used in conjunction with a dynamic-programming time-warp algorithm developed by Bridle and a novel method for using multiple templates.

Abstract: This paper considers the problem of automatically detecting and locating key words in a stream of continuous speech. The system described here is a template-matching procedure which uses as its basic waveform features a set of linear prediction coefficients. The similarity measure between a segment of the template and a segment of the incoming speech stream is taken to be a ratio of minimum prediction residuals. This similarity measure is used in conjunction with a dynamic-programming time-warp algorithm developed by Bridle and a novel method for using multiple templates. Using templates and incoming speech spoken by the same person in a quiet room, an accuracy in excess of 99 percent was obtained. Further experiments are described which explore cross-speaker word spotting and the effects of noise on system performance. The results of these experiments suggest that the technique described in this paper could well form the basis for a practical system.

63 citations
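
The dynamic-programming time warp mentioned in the abstract above can be sketched generically. This is a plain dynamic time warping recursion, not Bridle's specific algorithm, and the default local distance (absolute difference between scalar features) is a stand-in for the prediction-residual ratio the paper uses; `dtw_distance` is a hypothetical name.

```python
def dtw_distance(template, segment, dist=lambda a, b: abs(a - b)):
    """Accumulated cost of the best monotonic alignment of two sequences."""
    inf = float("inf")
    n, m = len(template), len(segment)
    # cost[i][j] = best cost of aligning template[:i] with segment[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(template[i - 1], segment[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch the segment
                                 cost[i][j - 1],      # stretch the template
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

A word spotter along these lines would slide the template over the incoming stream and flag positions where the accumulated cost falls below a threshold; passing a different `dist` function swaps in another frame-level similarity measure.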

Journal Article•DOI•

The use of time-domain selection for improved linear prediction

[...]

Kenneth Steiglitz¹, Bradley W. Dickinson•Institutions (1)

Princeton University¹

01 Feb 1977-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: It is shown by theoretical argument and by experiment that selection of an undriven segment of voiced speech for analysis by linear predictive coding (LPC) gives more accurate estimates of the poles of the vocal-tract model.

Abstract: We show by theoretical argument and by experiment with both synthetic and real data that selection of an undriven segment of voiced speech for analysis by linear predictive coding (LPC) gives more accurate estimates of the poles of the vocal-tract model. In the case of voiced nasal phonemes, this technique provides a simple algorithm for separately determining the poles and the zeros in the model and illustrates the desirability of identifying the portions of the speech wave during which there is a significant driving input. A key problem which remains is the development of a practical algorithm for selecting such segments for analysis.

40 citations

Journal Article•DOI•

A subjective evaluation of pitch detection methods using LPC synthesized speech

[...]

C. A. McGonegal¹, Lawrence R. Rabiner¹, Aaron E. Rosenberg¹•Institutions (1)

Bell Labs¹

01 Jun 1977-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The quality of LPC (linear predictive coding) analyzed and synthesized speech was evaluated and subject preference as a function of the pitch range of the speaker and the transmission environment used in the recording is discussed.

Abstract: A subjective evaluation of seven pitch detectors has been carried out using synthetic speech. The evaluation is intended to complement the objective performance evaluation of the same pitch detection algorithms in the investigation of Rabiner et al. [1]. In the earlier study, each of the seven algorithms was evaluated on the basis of its performance with respect to four different types of errors. The standard of comparison was a semiautomatically determined pitch contour of each utterance in the experimental corpus. In the present study, the quality of LPC (linear predictive coding) analyzed and synthesized speech was evaluated. The pitch contour used in the synthesis was obtained either from one of the seven pitch detectors or from the semiautomatic pitch analysis. Using a computer-controlled sort board, an experiment was run in which each of eight listeners was asked to rank the nine versions of each utterance (the natural version was included to provide a stable anchor point). Results are presented on the overall preference for each pitch detector. In addition, subject preference as a function of the pitch range of the speaker and the transmission environment used in the recording is discussed. The present results are compared to those obtained in the earlier objective performance study.

36 citations

Journal Article•DOI•

Statistical tests and distance measures for LPC coefficients

[...]

P. de Souza¹•Institutions (1)

University of Otago¹

01 Dec 1977-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: It is shown that Itakura's prediction-residual ratio is intuitively unsatisfactory and theoretically misleading as a distance measure, and two slower, but more accurate statistical means of comparison are suggested.

Abstract: This paper considers the problem of comparing two sets of (LPC) coefficients or, more generally, that of comparing two short segments of speech via LPC techniques. It is shown that Itakura's prediction-residual ratio is intuitively unsatisfactory and theoretically misleading as a distance measure. Two slower, but more accurate statistical means of comparison are suggested, and these are supported by evidence from a simulation study.

34 citations

Journal Article•DOI•

Microprocessor realization of a linear predictive vocoder

[...]

E.M. Hofstetter¹, J. Tierney, O. Wheeler•Institutions (1)

Massachusetts Institute of Technology¹

01 Oct 1977-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The goal was a low-power, low-cost, compact special-purpose realization of a narrow-band speech terminal, and the resultant design is a general-purpose two-bus structure running at a 150 ns cycle time.

Abstract: A microprocessor realization for a linear predictive vocoder is presented. The goal was a low-power, low-cost, compact special-purpose realization of a narrow-band speech terminal. The resultant design is a general-purpose two-bus structure running at a 150 ns cycle time, using as the basic signal processing element, four of the AMD 2901 CPE chips. This basic structure is augmented by a four-cycle multiplier to allow for sufficient signal processing power. The design concessions that mark the linear predictive coding microprocessor (LPCM) as a special-purpose machine designed to be a speech terminal are: limited I/O and limited memory. The present design requires 162 dual-in-line packages, dissipates less than 45 W and occupies about 1/3 ft³.

32 citations

Journal Article•DOI•

The 1976 modular acoustic processor (MAP)

[...]

N. Dixon¹, Harvey F. Silverman•Institutions (1)

IBM¹

01 Oct 1977-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The modular acoustic processor (MAP), a complex experimental system for automatic derivation of phonemic string output for continuous speech, has stages dedicated to signal analysis, spectral classification, phonemic segmentation, phonemic (steady state) classification, phoneme boundary placement, dyadic (transitional) classification, and final phoneme string consolidation.

Abstract: The modular acoustic processor (MAP), a complex experimental system for automatic derivation of phonemic string output for continuous speech, has stages dedicated to signal analysis, spectral classification, phonemic segmentation, phonemic (steady state) classification, phoneme boundary placement, dyadic (transitional) classification, and final phoneme string consolidation. This paper presents the concepts of and some details concerning these five stages. Results on a large body of continuous speech data, prepared by an automatic evaluation system, will also be presented.

28 citations

Journal Article•DOI•

Linear prediction with a variable analysis frame size

[...]

S. Chandra, Wen Lin

01 Aug 1977-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: This paper describes a speech analysis-synthesis system based on stationary linear prediction formulation that uses a variable analysis frame size concept and the k-parameters are used to represent the spectral information in the speech.

Abstract: This paper describes a speech analysis-synthesis system based on stationary linear prediction formulation. This system uses a variable analysis frame size concept. The k-parameters are used to represent the spectral information in the speech. The statistical and quantization properties of k-parameters are studied in detail. A method for calculating the analysis frame size based on energy and pitch period variations within a speech waveform has been developed. The speech analysis-synthesis system has been implemented on the computing facility of the Signal Processing Laboratory at Case Western Reserve University. Average data rates of 4800, 3600, and 2400 bits/s have been achieved on a limited speech data base of male speakers.

22 citations
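
The k-parameters (reflection coefficients) mentioned in the abstract above are commonly obtained from a frame's autocorrelation values with the Levinson-Durbin recursion. The sketch below is a generic version of that recursion, not the paper's implementation; `reflection_coefficients` is a hypothetical name, and the sign convention (k = -acc / err) is an assumption, since conventions differ between texts.

```python
def reflection_coefficients(r, order):
    """Levinson-Durbin: autocorrelation r[0..order] -> (k-parameters, residual energy)."""
    a = [1.0] + [0.0] * order   # predictor polynomial, a[0] fixed at 1
    err = r[0]                  # prediction error energy at order 0
    ks = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)    # error energy shrinks at every order
    return ks, err
```

For a valid autocorrelation sequence every k lies in [-1, 1], a bounded range that makes the k-parameters convenient to quantize.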

Patent•

Subscription TV audio carrier recovery system

[...]

Stanley E Guif, Terry L Nimmer, Donald A Weigt, Glenn Wolenec, Richard C Gall

11 May 1977

TL;DR: In this paper, a subscription TV decoder for processing encoded audio and video signals in which the TV program audio information is received as an audio subcarrier includes means for decoding the video signal, a filter for separating the decoded video and audio signals, and a filter that separates the audio signal from the audio information sub-carrier.

Abstract: A subscription TV decoder for processing encoded audio and video signals in which the TV program audio information is received as an audio subcarrier includes means for decoding the video signal, a filter for separating the decoded video and audio signals, and a filter for separating the audio signal from the audio information subcarrier. The audio information subcarrier is multiplied to raise it to the frequency of the audio signal and is then recombined with the decoded video signal.

18 citations

Journal Article•DOI•

Source coding algorithms for fast data compression (Ph.D. Thesis abstr.)

[...]

R. Pasco

01 Jul 1977-IEEE Transactions on Information Theory

14 citations

Proceedings Article•DOI•

PCM speech compression via ADPCM/TASI

[...]

T. McPherson¹, J. O'Neal, R. Stroh•Institutions (1)

North Carolina State University¹

09 May 1977

TL;DR: A data compression technique, Adaptive Differential PCM with Time Assignment Speech Interpolation (TASI), that is capable of reducing the bit rate required for PCM encoded speech is described and evaluated using computer simulation.

Abstract: A data compression technique, Adaptive Differential PCM (ADPCM) with Time Assignment Speech Interpolation (TASI), that is capable of reducing the bit rate required for PCM encoded speech is described. The particular case of 2:1 compression in a T1 system environment is described in detail and evaluated using computer simulation. The ADPCM/TASI system has wide dynamic range, little degradation under loading, than standard PCM. Signal-to-noise ratios provide an objective metric. An audio tape containing computer processed speech for various ADPCM/TASI systems in various environments accompanies this presentation.

Journal Article•DOI•

Architecture and Construction of a Hardware Sequential Encoder for Speech

[...]

John B. Anderson¹, C.-W. Ho•Institutions (1)

McMaster University¹

01 Jul 1977-IEEE Transactions on Communications

TL;DR: This work reports on construction of a TTL hardware multi-path sequential encoder which uses the so-called M algorithm search procedure, and hardware peculiar to this type of encoder is discussed, including architecture of the search algorithm sorter, the squared error calculator, and the code generator.

Abstract: Recent work has shown the usefulness of sequential or "tree" encoding of speech. We report on construction of a TTL hardware multi-path sequential encoder which uses the so-called M algorithm search procedure. The device attains a signal-to-noise ratio of about 20 dB at 16 kbits/s. Hardware peculiar to this type of encoder is discussed, including architecture of the search algorithm sorter, the squared error calculator, and the code generator.

Proceedings Article•DOI•

A variable band coding scheme for speech encoding at 4.8 kb/s

[...]

R. Crochiere¹, M. R. Sambur•Institutions (1)

Bell Labs¹

09 May 1977

TL;DR: The standard fixed sub-band coding scheme has been modified to allow the center frequency of the two upper bands to vary in accordance with the dynamic movement of the vocal tract resonances F2 and F3 to produce moderate-quality, intelligible speech.

Abstract: The standard fixed sub-band coding scheme has been modified to allow the center frequency of the two upper bands to vary in accordance with the dynamic movement of the vocal tract resonances F2 and F3. A relatively simple zero-crossing technique is used to measure the formants F2 and F3. Through the use of this variable band coder, it is possible to produce moderate-quality, intelligible speech at 4.8 kb/s (quality is slightly less than that of a 7.2-kb/s fixed sub-band coder and equal to that of about a 16-kb/s ADM coder). The reasonably good intelligibility of the 4.8-kb/s variable-band coded speech can be attributed to the coder's attempt to capture and encode those spectral components of the signal that are perceptually most significant (the region around the formants). The major advantage of the variable-band scheme is that its implementation is considerably less complex than other waveform coding schemes or vocoder systems that can produce intelligible, narrowband speech.

Proceedings Article•DOI•

Piecewise linear quantization of LPC reflection coefficients

[...]

C. Un¹, S. Yang•Institutions (1)

SRI International¹

01 May 1977

TL;DR: A new quantization method for coding reflection coefficients in linear predictive coding (LPC) of speech employs piece-wise linear quantization and requires statistical properties of the LPC reflection coefficients.

Abstract: We present a new quantization method for coding reflection coefficients in linear predictive coding (LPC) of speech. It employs piece-wise linear quantization and requires statistical properties of the LPC reflection coefficients. Although the quantization scheme is based on the density of the frequencies of the coefficient values, it does not neglect the importance of spectral sensitivity. In our informal subjective listening tests it was observed that the quality of synthetic speech with the transmission rate of 2.4 kbits/s coded by the piecewise linear quantization method was equivalent to the quality with the rate of 3 kbits/s coded by a linear quantization method.

Journal Article•DOI•

Waveform quantization and coding

[...]

J.B. O'Neal, Jr.¹•Institutions (1)

North Carolina State University¹

01 Feb 1977-IEEE Transactions on Acoustics, Speech, and Signal Processing

Journal Article•DOI•

A variable-band coding scheme for speech encoding at 4.8 kb/s

[...]

R. Crochiere¹, M. R. Sambur•Institutions (1)

Bell Labs¹

06 May 1977-Bell System Technical Journal

TL;DR: In this article, a variable band coding scheme was proposed to allow the center frequency of the two upper bands to vary in accordance with the dynamic movement of the vocal tract resonances F2 and F3.

Proceedings Article•DOI•

Variable-to-fixed rate conversion of narrowband LPC speech

[...]

E. Blackman¹, R. Viswanathan, J. Makhoul•Institutions (1)

BBN Technologies¹

01 May 1977

TL;DR: Variable data rate LPC speech compression schemes are employed to transmit LPC parameters only when speech characteristics have changed sufficiently since the last transmission, yielding improved speech quality relative to fixed-rate schemes for a given average transmission rate.

Abstract: Variable data rate LPC speech compression schemes are employed to transmit LPC parameters only when speech characteristics have changed sufficiently since the last transmission, yielding improved speech quality relative to fixed-rate schemes for a given average transmission rate. Transmission of variable-rate LPC speech over fixed-rate channels is accomplished using transmit and receive buffers, with resulting transmission delays. Development of proper buffer control strategy is essential to minimize losses caused by exhausting either buffer, or by corrective actions, namely, forced or suppressed transmission. Certain aspects of such strategy and their impact on speech quality and data rate are discussed for a narrowband (2400 bps) speech transmission system.

Journal Article•DOI•

Hadamard-transformation technique of speech coding: some further results

[...]

E. Frangoulis¹, L.F. Turner¹•Institutions (1)

Imperial College London¹

01 Oct 1977

TL;DR: The results of an extensive investigation of the properties of 64-point Hadamard transformed speech are presented in this article, where detailed information is given about the probability density functions of the Hadamard coefficients, the average power-density spectrum in the Hadamard domain and the logical-autocorrelation function.

Abstract: The results of an extensive investigation of the properties of 64-point Hadamard transformed speech are presented. Detailed information is given about the probability density functions of the Hadamard coefficients, the average power-density spectrum in the Hadamard domain and the logical-autocorrelation function. The results indicate that good-quality speech can be reconstructed from 6 to 8 dominant Hadamard coefficients, but that the use of fewer coefficients is unlikely to lead to the reconstruction of speech of acceptable quality. The results of a preliminary series of listening tests are presented and these confirm conclusions drawn from the statistical properties of the transformed speech. It is shown that the number of bits needed for coefficient labelling constitutes a significant proportion of the total number of bits needed to represent Hadamard transformed speech. A technique is presented for reducing by more than 50% the number of labelling bits needed, and it is explained how, by using this technique, it should be possible to obtain good quality speech when using a transmission bit rate of 8 kbits/s.
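
The idea of reconstructing a block from a few dominant Hadamard coefficients can be sketched as follows. This is a generic fast Walsh-Hadamard transform in natural (Hadamard) ordering with a simple largest-magnitude selection; the function names are hypothetical, and the paper's actual coefficient ordering and selection rule are not given in this listing.

```python
def fwht(x):
    """Unnormalised fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = list(x)
    h = 1
    while h < len(y):
        for start in range(0, len(y), 2 * h):
            for i in range(start, start + h):
                # butterfly: sum and difference of paired entries
                y[i], y[i + h] = y[i] + y[i + h], y[i] - y[i + h]
        h *= 2
    return y

def keep_dominant(coeffs, m):
    """Zero all but the m largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:m])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

def reconstruct(coeffs):
    """Inverse transform: the Hadamard matrix is its own inverse up to a factor 1/N."""
    n = len(coeffs)
    return [v / n for v in fwht(coeffs)]
```

Because the receiver must know which coefficients were kept, each retained coefficient in a 64-point block needs a 6-bit label (log2 64) when labels are sent naively, which is why the labelling overhead discussed in the abstract matters.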

Journal Article•DOI•

Adaptive Delta Modulation for Companded PCM Coding and Decoding

[...]

D. Song

01 May 1977-IEEE Transactions on Communications

TL;DR: The strategy is a shift of circuit emphasis from analog to digital in order to take full advantage of low-cost, low-speed digital processing technology such as MOS/LSI to achieve the desired objectives.

Abstract: An Adaptive Delta Modulator and demodulator are used as the first and last stages in a system for coding and decoding telephone signals into \mu = 255 Companded Pulse Code Modulation. The system objectives are to devise an economic coder and decoder that is reliable, free of potentiometer adjustments, and convenient for automated manufacturing for large quantity production. The strategy is a shift of circuit emphasis from analog to digital in order to take full advantage of low-cost, low-speed digital processing technology such as MOS/LSI to achieve the desired objectives. The system structure, digital signal processing, system implementation, and performance of the prototype are discussed in this paper.
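
For reference, the μ = 255 companding characteristic named in the abstract above can be written as F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ) for |x| ≤ 1. The sketch below implements this continuous law and its inverse only; it does not model the adaptive delta modulator front end or the segmented approximation used in practical PCM codecs, and the function names are hypothetical.

```python
import math

MU = 255.0

def mu_compress(x):
    """Continuous mu-law compression of a sample normalised to [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Small amplitudes are expanded before uniform quantization and large ones compressed, so the quantization noise scales roughly with signal level.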

The Development of a Computer Speech Processing System and Its Use for the Study and Development of Processing Methods for Enhancing the Intelligibility of Speech in Noise.

[...]

Russell J Niederjohn, Robert A Curtis

01 Oct 1977

TL;DR: Results of an examination of four methods for processing speech so as to enhance its intelligibility in the presence of wideband random noise at the source are described.

Abstract: This report describes results of an examination of four methods for processing speech so as to enhance its intelligibility in the presence of wideband random noise at the source. The four methods were: (1) INTEL, a method which involves processing in both the first and second order spectral domains; (2) Spectral subtraction, which involves a simple subtraction of the average noise spectrum from the first-order spectrum; (3) Minimum mean square error filtering, which involves filtering speech in such a way as to minimize the mean square error between a signal and its expected value in noise; and (4) Methods based upon suppressing the frequency content of a speech plus noise signal between pitch harmonics of the speech signal. To carry out a study of methods to enhance speech intelligibility in noise, two general-purpose computer processing systems were implemented. The first, a terminal interactive system for generation, analysis, and graphic display of synthetic voiced speech sounds, provided considerable insight into the effect of various processing algorithms upon speech and upon speech in noise. The second computer processing system has been developed for the processing of real speech. It involves use of a DDP-116 data converter and a Honeywell 6000 Computer.

Journal Article•DOI•

Voiced-unvoiced discrimination for efficient compression in multichannel telephony

[...]

J.-P. Adoul¹, F. Dumont¹, F. Daaboul¹, M. Rudko¹•Institutions (1)

Université de Sherbrooke¹

01 Jul 1977-Canadian Electrical Engineering Journal

TL;DR: It is shown that in addition to speech interpolation, substantial savings can be achieved through the use of redundancy-reducing transcoding schemes, one for voiced and one for unvoiced sounds.

Abstract: This paper focuses on the efficient transmission of digital telephone-quality speech in a multichannel situation. It is shown that in addition to speech interpolation, substantial savings can be achieved through the use of redundancy-reducing transcoding schemes, one for voiced and one for unvoiced sounds. Specific waveform properties and coding schemes for both types of sounds, as well as the means for the discrimination, are discussed. The performance of two experimental systems which, respectively, triple and quadruple digital carrier capacity are presented.

Proceedings Article•DOI•

Polar plane blockquantization of speech signals using bit-pattern matching techniques

[...]

H. Gethoffer¹•Institutions (1)

Darmstadt University of Applied Sciences¹

01 May 1977

TL;DR: The theoretical and practical limits for voiced and unvoiced segmentation using two distinct bit patterns in blockquantization are outlined, and useful constraints for implementing a transform encoder on a fast digital signal processor are derived with computer simulations.

Abstract: Fourier Transformation and blockquantization offer a good tool for decorrelation and data compression of speech signals as well as for correlated random processes. Polar plane quantization versus cartesian quantization is presented and the objective and subjective performance is discussed according to the effects of introducing phase errors. The theoretical and practical limits for voiced and unvoiced segmentation using two distinct bit patterns in blockquantization are outlined. Useful constraints for implementing a transform encoder on a fast digital signal processor are derived with computer simulations, demonstrating high speech qualities for data rates down to 8-16 kbit/s.

Journal Article•DOI•

Some displays for computer-analysed speech

[...]

F. Fallside¹, S. Brooks¹•Institutions (1)

University of Cambridge¹

01 Dec 1977

TL;DR: The paper describes the extension of the area function to an areagraph display for continuous speech developed for the training of continuous speech, and various forms of the areagraph are described and compared with the spectrograph.

Abstract: The classical displays used in speech analysis are the spectrum for single sounds and the spectrograph for continuous speech. Recent work using linear prediction analysis has led to the display of the vocal tract area function and this has been found useful in the speech training of single sounds for the deaf. The paper describes the extension of the area function to an areagraph display for continuous speech. This has been developed for the training of continuous speech, and various forms of the areagraph are described and compared with the spectrograph. Areagraph displays are also thought to be potentially useful in applications other than speech training.

Journal Article•

Automatic Digital Audio Processing-ADAP

[...]

James Paul

01 Nov 1977-Journal of The Audio Engineering Society

Proceedings Article•DOI•

A low-bit-rate speech coding system using a low cost stack architecture microprocessor

[...]

Subhro Das¹, Charles C. Tappert•Institutions (1)

IBM¹

01 May 1977

TL;DR: A real-time Adaptive Differential Pulse Code Modulation system is described which employs an inexpensive, currently about $25, stack-architecture microprocessor which performs all processing between the taking of input speech samples.

Abstract: A real-time Adaptive Differential Pulse Code Modulation (ADPCM) system is described which employs an inexpensive, currently about $25, stack-architecture microprocessor. The coder operates at 3 bits/sample and a 10 kHz rate and performs all processing between the taking of input speech samples. The ADPCM tables are stored in the stack for rapid manipulation of table column and row pointers allowing implementation of real-time operation.
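
A 3-bit ADPCM loop of the general kind described above can be sketched in a few lines. Everything specific here is an assumption for illustration: the step-size multipliers, the step limits, the initial step, and the unit-gain first-order predictor are not taken from the paper, and the paper's table-driven stack implementation is not reproduced.

```python
# Illustrative step-size multipliers indexed by code magnitude (assumed values).
MULTIPLIERS = [0.9, 0.9, 1.25, 1.75]
STEP_MIN, STEP_MAX = 1.0, 512.0

def _update(pred, step, code):
    """Shared by encoder and decoder so both track the same state."""
    sign = -1.0 if code & 4 else 1.0      # bit 2 carries the sign
    mag = code & 3                        # bits 0-1 carry the magnitude level
    pred = pred + sign * (mag + 0.5) * step
    step = min(STEP_MAX, max(STEP_MIN, step * MULTIPLIERS[mag]))
    return pred, step

def adpcm_encode(samples, step=4.0):
    """Return (3-bit codes, encoder-side reconstruction)."""
    pred, codes, recon = 0.0, [], []
    for s in samples:
        diff = s - pred
        mag = min(3, int(abs(diff) / step))
        code = mag | (4 if diff < 0 else 0)
        pred, step = _update(pred, step, code)
        codes.append(code)
        recon.append(pred)
    return codes, recon

def adpcm_decode(codes, step=4.0):
    """Rebuild the signal from codes alone, mirroring the encoder state."""
    pred, out = 0.0, []
    for code in codes:
        pred, step = _update(pred, step, code)
        out.append(pred)
    return out
```

At 3 bits per sample and a 10 kHz sampling rate, as stated in the abstract, the resulting bit rate is 3 × 10,000 = 30 kbit/s.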

Proceedings Article•DOI•

An application hierarchy for heuristic rules in automatic phonemic segmentation of continuous speech

[...]

N. Dixon¹•Institutions (1)

IBM¹

01 May 1977

TL;DR: The rationale for and some examples from an application hierarchy and a recognition-then-segmentation approach will be presented; this approach has been used fairly successfully in phonemic segmentation of continuous speech.

Abstract: In Automatic Recognition of Continuous Speech (ARCS), one approach is to segment the speech continuum approximately at the phoneme level as an initial step in abstracting lexical and/or semantic content. If heuristic rules are used for this segmentation, the order of rule application and the character of the data to be used by the rules become important considerations. The rationale for and some examples from an application hierarchy and a recognition-then-segmentation approach will be presented; this approach has been used fairly successfully in phonemic segmentation of continuous speech.

Proceedings Article•DOI•

Waveform Analysis Using Zero-detection With Applications To Speech Processing

[...]

G.H. Hostetter, M.E. Valdez

07 Nov 1977

TL;DR: A single data line zero-crossing detector input to a microprocessor or other computer is evaluated for use in lieu of analog/digital conversion in the processing of complex continuous-time waveforms.

Abstract: A single data line zero-crossing detector input to a microprocessor or other computer is evaluated for use in lieu of analog/digital conversion in the processing of complex continuous-time waveforms.

Proceedings Article•DOI•

On line speech/Data-modem identifier for telephone network

[...]

J.-P. Adoul¹, D. Pradelles•Institutions (1)

Université de Sherbrooke¹

01 May 1977

TL;DR: An efficient yet simple way to resolve the problem of on-line speech/data-modem identification for a class of FSK and PSK modems by using a pattern classifier technique based on four parameters extracted from 8-ms block analysis.

Abstract: The application of efficient speech compression schemes to digital telephony is hindered by the potential transit of non-speech signal generated by data modems. This obstacle can be overcome through on-line speech/data-modem identification. This paper presents an efficient yet simple way to resolve this problem for a class of FSK and PSK modems. The approach uses a pattern classifier technique based on four parameters extracted from 8-ms block analysis. These are the average log magnitude, the extremum count, the zero-crossing count and max-to-mean amplitude deviation. The speech/data-modem decision is obtained through a piecewise-linear partitioning of the parameter space.
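
The four block parameters listed in the abstract above could be computed along the following lines. The exact definitions are not given in this listing, so each formula below is an assumed, plausible reading rather than the authors' definition, and `block_features` is a hypothetical name. At an assumed 8 kHz telephone sampling rate, an 8-ms block would hold 64 samples.

```python
import math

def block_features(block, eps=1e-9):
    """Return (avg log magnitude, extremum count, zero-crossing count, max-to-mean deviation)."""
    mags = [abs(s) for s in block]
    # Average log magnitude; eps guards against log(0) on silent samples.
    avg_log_mag = sum(math.log(m + eps) for m in mags) / len(block)
    # Zero crossings: sign changes between consecutive samples.
    zero_crossings = sum(1 for a, b in zip(block, block[1:])
                         if (a >= 0) != (b >= 0))
    # Extrema: interior points where the slope changes sign.
    extrema = sum(1 for a, b, c in zip(block, block[1:], block[2:])
                  if (b - a) * (c - b) < 0)
    # Assumed reading of "max-to-mean amplitude deviation": peak minus mean magnitude.
    mean_mag = sum(mags) / len(block)
    max_to_mean = max(mags) - mean_mag
    return avg_log_mag, extrema, zero_crossings, max_to_mean
```

A classifier in the spirit of the abstract would then compare this four-element vector against linear decision boundaries to label the block as speech or modem data.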

Proceedings Article•DOI•

Properties of LPC estimates on short speech segments

[...]

John C. Turner¹, Bradley W. Dickinson¹•Institutions (1)

Princeton University¹

01 Dec 1977

TL;DR: The covariance lattice linear prediction method is shown to have some advantages over the other methods for segmenting speech into regions where the spectrum is approximately stationary.

Abstract: Linear predictive analysis/synthesis methods offer an efficient means of low bit rate encoding of speech signals. This method involves the time segmentation of the speech into regions where the spectrum is approximately stationary. The analysis/synthesis technique is discussed as well as four methods of determining linear predictive approximations. A comparison of these methods on a synthetic speech-like signal and on real speech is presented. The covariance lattice linear prediction method is shown to have some advantages over the other methods for segmenting speech into regions where the spectrum is approximately stationary.

DOI•

Use of the analysis by synthesis model of speech perception by children acquiring the sound system of language

[...]

Christine Ann Reddy

01 Jan 1977

TL;DR: This thesis reviews recent research into the speech perception process and revises the analysis by synthesis model of speech perception, revealing that the human auditory system is innately equipped to divide stimuli (both speech and non-speech) that vary along certain acoustic dimensions into discrete classes.

Abstract: During the time when a child learns the sound system of his language, there is much evidence that the child can perceive phonological distinctions and therefore detect phonetic differences before he can produce these distinctions. This evidence is often provided to disprove the hypothesis that the child could be using an "active" model of speech perception. One such model, the analysis by synthesis model of speech perception, supposes that decoding of the acoustic signal employs the articulatory representation that would be required to produce the hypothesized identity of the incoming signal. The model proposes that while the human auditory system is innately equipped to handle the segments contained in speech, that the correlations between the acoustic information and articulation are learned with experience and form the basis for the division of the continuous acoustic signal into discrete categories of speech sounds. This thesis reviews recent research into the speech perception process and revises the analysis by synthesis model. It reveals that the human auditory system is innately equipped to divide stimuli (both speech and non-speech) that vary along certain acoustic dimensions into discrete classes. The unique processing that results for speech stimuli, occurs when the stimuli is recognized as having a function in the system of language. Hence the requirements for phonetic processing involve the psychological realization that stimulus originated in the human vocal tract. This investigation then reviewed the available literature on the perception and production of children acquiring language to determine
