
Showing papers on "Speech coding published in 1976"


Journal ArticleDOI
TL;DR: A pattern recognition approach for deciding whether a given segment of a speech signal should be classified as voiced speech, unvoiced speech, or silence, based on measurements made on the signal, which has been found to provide reliable classification with speech segments as short as 10 ms.
Abstract: In speech analysis, the voiced-unvoiced decision is usually performed in conjunction with pitch analysis. The linking of the voiced-unvoiced (V-UV) decision to pitch analysis not only results in unnecessary complexity, but makes it difficult to classify short speech segments which are less than a few pitch periods in duration. In this paper, we describe a pattern recognition approach for deciding whether a given segment of a speech signal should be classified as voiced speech, unvoiced speech, or silence, based on measurements made on the signal. In this method, five different measurements are made on the speech segment to be classified. The measured parameters are the zero-crossing rate, the speech energy, the correlation between adjacent speech samples, the first predictor coefficient from a 12-pole linear predictive coding (LPC) analysis, and the energy in the prediction error. The speech segment is assigned to a particular class based on a minimum-distance rule obtained under the assumption that the measured parameters are distributed according to the multidimensional Gaussian probability density function. The means and covariances for the Gaussian distribution are determined from manually classified speech data included in a training set. The method has been found to provide reliable classification with speech segments as short as 10 ms and has been used for both speech analysis-synthesis and recognition applications. A simple nonlinear smoothing algorithm is described to provide a smooth 3-level contour of an utterance for use in speech recognition applications. Quantitative results and several examples illustrating the performance of the method are included in the paper.
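The minimum-distance rule described above is easy to sketch in code. The fragment below is a minimal illustration under stated assumptions, not the authors' implementation: equal class priors, a per-class covariance estimate, and placeholder training vectors (in practice each row would hold the five measured parameters of one hand-labeled segment).

```python
import numpy as np

def train_class_stats(features_by_class):
    """Estimate a mean vector and covariance matrix for each class
    from hand-labeled training vectors (rows = segments)."""
    stats = {}
    for label, X in features_by_class.items():
        X = np.asarray(X, dtype=float)
        stats[label] = (X.mean(axis=0), np.cov(X, rowvar=False))
    return stats

def classify_segment(x, stats):
    """Assign x to the class minimizing the Mahalanobis-style
    distance (x - m)^T C^{-1} (x - m), i.e., a Gaussian
    minimum-distance rule with equal priors assumed."""
    best, best_d = None, np.inf
    for label, (m, C) in stats.items():
        d = (x - m) @ np.linalg.inv(C) @ (x - m)
        if d < best_d:
            best, best_d = label, d
    return best
```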

479 citations


Proceedings ArticleDOI
12 Apr 1976
TL;DR: A rationale is advanced for digitally coding speech signals in terms of sub-bands of the total spectrum, which provides a means for controlling and reducing quantizing noise in the coding.
Abstract: A rationale is advanced for digitally coding speech signals in terms of sub-bands of the total spectrum. The approach provides a means for controlling and reducing quantizing noise in the coding. Each sub-band is quantized with an accuracy (bit allocation) based upon perceptual criteria. As a result, the quality of the coded signal is improved over that obtained from a single full-band coding of the total spectrum. In one implementation, the individual sub-bands are low-pass translated before coding. In another, "integer-band" sampling is employed to alias the signal in an advantageous way before coding. Other possibilities extend to complex demodulation of the sub-bands, and to representing the subband signals in terms of envelopes and phase-derivatives. In all techniques, adaptive quantization is used for the coding, and a parsimonious allocation of bits is made across the bands. Computer simulations are made to demonstrate the signal qualities obtained for codings at 16 and 9.6 Kbits/sec.
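The central idea, giving each sub-band its own bit allocation, can be illustrated with a toy two-band coder. This is a sketch under assumptions, not the paper's system: the band split is done by FFT masking rather than low-pass translation or integer-band sampling, the quantizer is a plain uniform one rather than an adaptive one, and all parameter values are invented.

```python
import numpy as np

def uniform_quantize(x, bits, full_scale):
    """Uniform midtread quantizer with roughly 2**bits levels
    spanning +/- full_scale."""
    step = 2.0 * full_scale / (2 ** bits)
    return np.clip(np.round(x / step) * step, -full_scale, full_scale)

def two_band_code(x, fs, split_hz, bits_low, bits_high):
    """Split x at split_hz via FFT masking, quantize each band with
    its own bit budget, and sum the decoded bands."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low = np.fft.irfft(np.where(f <= split_hz, X, 0), n=len(x))
    high = np.fft.irfft(np.where(f > split_hz, X, 0), n=len(x))
    amp = np.max(np.abs(x)) + 1e-12
    return (uniform_quantize(low, bits_low, amp)
            + uniform_quantize(high, bits_high, amp))
```

For speech-like signals whose energy is concentrated in the low band, spending more bits there reduces the total quantizing noise, which is the effect the paper exploits perceptually.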

276 citations


Journal ArticleDOI
TL;DR: A rationale is advanced for digitally coding speech signals in terms of sub-bands of the total spectrum, which provides a means for controlling and reducing quantizing noise in the coding.
Abstract: A rationale is advanced for digitally coding speech signals in terms of sub-bands of the total spectrum. The approach provides a means for controlling and reducing quantizing noise in the coding. Each sub-band is quantized with an accuracy (bit allocation) based upon perceptual criteria. As a result, the quality of the coded signal is improved over that obtained from a single full-band coding of the total spectrum. In one implementation, the individual sub-bands are low-pass translated before coding. In another, “integer-band” sampling is employed to alias the signal in an advantageous way before coding. Other possibilities extend to complex demodulation of the sub-bands, and to representing the sub-band signals in terms of envelopes and phase-derivatives. In all techniques, adaptive quantization is used for the coding, and a parsimonious allocation of bits is made across the bands. Computer simulations are made to demonstrate the signal qualities obtained for codings at 16 and 9.6 kb/s.

252 citations



Journal ArticleDOI
TL;DR: It is shown that this new method results in a substantial improvement in the intelligibility of speech in white noise over normal speech and over previously implemented methods.
Abstract: This paper presents the results of an examination of rapid amplitude compression following high-pass filtering as a method for processing speech, prior to reception by the listener, as a means of enhancing the intelligibility of speech in high noise levels. Arguments supporting this particular signal processing method are based on the results of previous perceptual studies of speech in noise. In these previous studies, it has been shown that high-pass filtered/clipped speech offers a significant gain in the intelligibility of speech in white noise over that for unprocessed speech at the same signal-to-noise ratios. Similar results have also been obtained for speech processed by high-pass filtering alone. The present paper explores these effects and it proposes the use of high-pass filtering followed by rapid amplitude compression as a signal processing method for enhancing the intelligibility of speech in noise. It is shown that this new method results in a substantial improvement in the intelligibility of speech in white noise over normal speech and over previously implemented methods.

131 citations


Journal ArticleDOI
01 Apr 1976
TL;DR: The resulting system serves as a model for the cognitive process of reading aloud, and also as a stable practical means for providing speech output in a broad class of computer-based systems.
Abstract: For many applications, it is desirable to be able to convert arbitrary English text to natural and intelligible sounding speech. This transformation between two surface forms is facilitated by first obtaining the common underlying abstract linguistic representation which relates to both text and speech surface representations. Calculation of these abstract bases then permits proper selection of phonetic segments, lexical stress, juncture, and sentence-level stress and intonation. The resulting system serves as a model for the cognitive process of reading aloud, and also as a stable practical means for providing speech output in a broad class of computer-based systems.

116 citations


PatentDOI
TL;DR: In this paper, formant frequencies determined at successive intervals in the speech signal are each divided by a fixed value, greater than 1, and another fixed value is added, to obtain what are called transposed formant frequencies.
Abstract: A hearing aid system and method includes apparatus for receiving a spoken speech signal, apparatus coupled to the receiving apparatus for determining at successive intervals in the speech signal the frequency and amplitude of the largest formants, apparatus for determining at successive intervals the fundamental frequency of the speech signal, and apparatus for determining at successive intervals whether or not the speech signal is voiced or unvoiced. Each successively determined formant frequency is divided by a fixed value, greater than 1, and added thereto is another fixed value, to obtain what are called transposed formant frequencies. The fundamental frequency is also divided by a fixed value, greater than 1, to obtain a transposed fundamental frequency. At the successive intervals, sine waves having frequencies corresponding to the transposed formant frequencies and the transposed fundamental frequency are generated, and these sine waves are combined to obtain an output signal which is applied to a transducer for producing an auditory signal. The amplitudes of the sine waves are functions of the amplitudes of corresponding formants. If it is determined that the speech signal is unvoiced, then no sine wave corresponding to the transposed fundamental frequency is produced and the other sine waves are noise modulated. The auditory signal produced by the transducer in effect constitutes a coded signal occupying a frequency range lower than the frequency range of normal speech and yet which is in the residual-hearing range of many hearing-impaired persons.

74 citations


Journal ArticleDOI
TL;DR: An alternate approach is described that uses the least mean square (LMS) gradient, stochastic-approximation algorithm commonly used in many other adaptive systems; a complete 8-coefficient hardware system based on this approach has been designed, constructed, and is described in this paper.
Abstract: Adaptive linear prediction (ALP) recently has received a great deal of attention for spectral analysis, system modeling, and speech encoding. The conventional approach used to implement ALP involves the computation of a sample covariance matrix for a block of data and solution of an associated set of simultaneous equations to obtain the predictor coefficients. This paper describes an alternate approach that uses the least mean square (LMS) gradient, stochastic-approximation algorithm, commonly used in many other adaptive systems. A complete 8-coefficient hardware system based on this approach has been designed and constructed and is described in this paper. The system consists of an analyzer that computes the eight ALP coefficients in real time and a reconstructor that forms an all-pole model filter using the computed coefficients. Several examples are presented to illustrate the concepts introduced. Each example includes an analytical discussion followed by experimental verification. Applications of ALP for spectral analysis, instantaneous frequency measurement, and speech encoding are discussed and experimental results obtained with the real-time hardware are presented.
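The LMS alternative to the block-covariance solution can be written in a few lines. The sketch below is a generic LMS linear predictor, not the hardware system's exact arithmetic; the order, step size `mu`, and floating-point math are illustrative assumptions (the real system used integer scaling).

```python
import numpy as np

def lms_predictor(x, order=8, mu=0.05):
    """Adapt an `order`-tap linear predictor with the LMS
    stochastic-gradient rule; returns the prediction-error signal
    and the final coefficient vector."""
    w = np.zeros(order)
    e = np.zeros(len(x))
    for n in range(order, len(x)):
        past = x[n - order:n][::-1]   # most recent sample first
        e[n] = x[n] - w @ past        # prediction error
        w += 2 * mu * e[n] * past     # LMS coefficient update
    return e, w
```

A reconstructor of the kind described would then run an all-pole filter built from `w` to resynthesize the signal from the error.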

63 citations


Journal ArticleDOI
TL;DR: The system described in this paper is subdivided into three main steps: pitch extraction, segmentation, and formant analysis, which uses an adaptive digital filter in time-domain transforming the speech signal into a signal similar to the glottal waveform.
Abstract: The system described in this paper is subdivided into three main steps: pitch extraction, segmentation, and formant analysis. The pitch extractor uses an adaptive digital filter in the time domain, transforming the speech signal into a signal similar to the glottal waveform. Using the levels of the speech signal and the differenced signal as parameters in the time domain, the subsequent segmentation algorithm derives a signal parameter which describes the speed of articulatory movement. From this, the signal is divided into "stationary" and "transitional" segments; one stationary segment is associated with one phoneme. For the formant tracking procedure, a subset of the pitch periods is selected by the segmentation algorithm and is transformed into the frequency domain. The formant tracking algorithm uses a maximum-detection strategy and continuity criteria for adjacent spectra. After this step, the total parameter set is offered to an adaptive universal pattern classifier which is trained on selected material beforehand. For stationary phonemes, the recognition rate is about 85 percent when training material and test material are uttered by the same speaker. The recognition rate is increased to about 90 percent when segmentation results are used.

47 citations


Journal ArticleDOI
Harvey F. Silverman1, N. Dixon
TL;DR: Of those evaluated, a linearly mean-corrected minimum distance measure, on a 40-point spectral representation with a square (or cube) norm was consistently superior to the other methods.
Abstract: An important consideration in speech processing involves classification of speech spectra. Several methods for performing this classification are discussed. A number of these were selected for comparative evaluation. Two measures of performance, accuracy and stability, were derived through the use of an automatic performance evaluation system. Over 3000 hand-labeled spectra were used. Of those evaluated, a linearly mean-corrected minimum distance measure, on a 40-point spectral representation with a square (or cube) norm, was consistently superior to the other methods.

39 citations


Journal ArticleDOI
TL;DR: It is found that delayed encoding allows a fairly general predictor to be used without causing instability problems, and simulations indicate that considerable improvement can be achieved by matching the feedback filter to the input process.
Abstract: This concise paper is concerned with the problem of improved delta-coding by using delayed decision instead of bit-by-bit decision. It is found that delayed encoding allows a fairly general predictor to be used without causing instability problems. Simulations indicate that considerable improvement can be achieved by matching the feedback filter to the input process. Delayed encoding requires a search algorithm for making the decisions. Some proposals of algorithms that are efficient from a computational point of view are presented. Particular interest is attached to a highly truncated version of the Viterbi algorithm, which seems very promising.

Journal ArticleDOI
TL;DR: In this paper, three methods of extracting resonance information from predictor-coefficient coded speech are compared: finding roots of the polynomial in the denominator of the transfer function using Newton iteration, picking peaks in the spectrum of the transferred function, and picking peaks on the negative of the second derivative of the spectrum.
Abstract: Three methods of extracting resonance information from predictor-coefficient coded speech are compared. The methods are finding roots of the polynomial in the denominator of the transfer function using Newton iteration, picking peaks in the spectrum of the transfer function, and picking peaks in the negative of the second derivative of the spectrum. A relationship was found between the bandwidth of a resonance and the magnitude of the second derivative peak. Data, accumulated from a total of about two minutes of running speech from both female and male talkers, are presented illustrating the relative effectiveness of each method in locating resonances. The second-derivative method was shown to locate about 98 percent of the significant resonances while the simple peak-picking method located about 85 percent.
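The first method, root-finding on the denominator polynomial, is the easiest to sketch today. The fragment below is a generic illustration (polynomial roots via `np.roots` rather than the paper's Newton iteration); the sign convention for the predictor coefficients and the standard pole-radius bandwidth formula are noted in comments.

```python
import numpy as np

def resonances_from_lpc(a, fs):
    """Given predictor coefficients a[1..p] of A(z) = 1 - sum a_k z^-k,
    return (frequency_Hz, bandwidth_Hz) pairs from the complex roots
    of the denominator polynomial."""
    # np.roots expects highest power first: 1, -a1, ..., -ap
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
    out = []
    for r in roots:
        if np.imag(r) > 0:                        # one of each conjugate pair
            f = np.angle(r) * fs / (2 * np.pi)    # resonance frequency
            bw = -np.log(np.abs(r)) * fs / np.pi  # bandwidth from pole radius
            out.append((f, bw))
    return sorted(out)
```

The bandwidth-from-radius relation above is the same kind of link the paper observes between resonance bandwidth and the magnitude of the second-derivative peak.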

Proceedings ArticleDOI
01 Apr 1976
TL;DR: A speech processing system named SPAC (SPlicing of AutoCorrelation function) is proposed in order to compress or expand the speech spectrum, to prolong or shorten the duration of an utterance, and to reduce the noise level in a speech signal.
Abstract: A speech processing system named SPAC (SPlicing of AutoCorrelation function) is proposed in order to compress or expand the speech spectrum, to prolong or shorten the duration of an utterance, and to reduce the noise level in a speech signal. A period of the short-time autocorrelation function is sampled and spliced after a change of the time scale. Transformed speech is quite natural and free from distortion. Applications of SPAC are expected in many fields, such as improvement of speech quality, narrow-band transmission, communication aids for the hard of hearing, information services for the blind, unscrambling of helium speech, stenography, and so on.

Proceedings ArticleDOI
01 Apr 1976
TL;DR: The real time implementation of a Linear Predictive Coding algorithm that has been developed over the past five years is described, using a modification of the Covariance Method for the analyzer and the system for pitch extraction and smoothing.
Abstract: This paper describes the real time implementation of a Linear Predictive Coding algorithm that has been developed over the past five years. The algorithm chosen for the analyzer is a modification of the Covariance Method introduced by B. S. Atal [1],[2] of Bell Labs. The system for pitch extraction uses a minimum distance function correlation technique. A dynamic programming algorithm [3] is used for pitch smoothing and correction of isolated pitch errors. The synthesizer uses a transversal filter. Considerable time has been devoted to optimizing the running time and integer scaling of the different algorithms for real time implementation on a 16 bit mini-computer.

Proceedings ArticleDOI
01 Apr 1976
TL;DR: Subjective quality ratings of PCM coded speech were obtained with the aims of determining the effects of certain coder parameters and their interactions on speech quality, finding objective measures for predicting perceived distortions, and providing guidelines for optimizing coder design.
Abstract: An experiment was performed to investigate: (a) the influence of PCM code parameters on subjective speech quality, (b) objective measures for predicting perceived distortions and (c) optimum combinations of code parameters. The results indicate that listener opinions depended strongly on coder clipping level and step size but only weakly on bandwidth. Clipping noise power proved a poor predictor of perceived overload distortion; clipping percentage was more useful. Granular noise power was a good predictor of granular distortion. For a given bit rate, the coder with the highest quality rating was not the one with minimum total (clipping + granular) noise power, contrary to traditional wisdom.

Proceedings ArticleDOI
12 Apr 1976
TL;DR: The voice-operated question-answering system for seat reservation is constructed by computer simulation technique and the promising results are obtained.
Abstract: The speech recognition system composing part of a question-answering system operated by conversational speech is described. The recognition system consists of two stages: an acoustic processing stage and a linguistic processing stage. In the acoustic processing stage, input speech is analyzed and transformed into a phoneme sequence which usually contains ambiguities and errors caused in segmentation and phoneme recognition. In the linguistic processing stage, the phoneme sequence containing ambiguities and errors is converted into the correct word sequence by the use of linguistic knowledge such as phoneme rewriting rules, a lexicon, syntax, semantics, and pragmatics. A voice-operated question-answering system for seat reservation has been constructed by computer simulation, and promising results have been obtained.

Book ChapterDOI
01 Jan 1976
TL;DR: The fundamental frequency (F0) is the rate at which glottal volume velocity pulses are applied to the vocal tract, i.e., the driving function to the model is periodic with a period of 1/F0.
Abstract: The fundamental frequency (F0) is a basic parameter in acoustical studies of speech. It is also a necessary parameter for low bit rate speech coding systems. It is generally considered to be one of the acoustical correlates to the perceived intonation pattern of speech. If the fundamental frequency of a speaker is constant, the speech would be perceived as being machine-like or monotone. If the speaker is excited, the fundamental frequency generally increases. It is the acoustical correlate to the rate at which the vocal folds open and close (or vibrate). If the folds are vibrating rapidly, a high fundamental frequency will be measured. In the linear speech production model, the fundamental frequency is the rate at which glottal volume velocity pulses are applied to the vocal tract, i.e., the driving function to the model is periodic with a period of 1/F0.
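Since the model's excitation is periodic with period 1/F0, F0 can be estimated by measuring that period. The sketch below uses a short-time autocorrelation peak pick, one common approach rather than anything prescribed by this chapter; the search-range bounds are illustrative.

```python
import numpy as np

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Pick the autocorrelation peak within the plausible pitch-period
    range and return fs / lag as the F0 estimate in Hz."""
    frame = frame - np.mean(frame)
    # one-sided autocorrelation, r[0] at zero lag
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(fs / f0_max)          # shortest allowed pitch period
    hi = int(fs / f0_min)          # longest allowed pitch period
    lag = lo + int(np.argmax(r[lo:hi]))
    return fs / lag
```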

Proceedings ArticleDOI
John Makhoul1
01 Apr 1976
TL;DR: This paper presents a general analysis-synthesis scheme for the arbitrary spectral distortion of speech signals without the need for pitch extraction; linear predictive warping, cepstral warping, and autocorrelation warping are given as examples of the general scheme.
Abstract: The spectral distortion of speech signals, without affecting the pitch or the speed of the signal, has met with some difficulty due to the need for pitch extraction. This paper presents a general analysis-synthesis scheme for the arbitrary spectral distortion of speech signals without the need for pitch extraction. Linear predictive warping, cepstral warping, and autocorrelation warping are given as examples of the general scheme. Applications include the unscrambling of helium speech, spectral compression for the hard of hearing, bit rate reduction in speech compression systems, and efficiency of spectral representation for speech recognition systems.

Journal ArticleDOI
TL;DR: A simple algorithm for locating the beginning and end of a speech utterance has been developed that has been tested in computer simulations and has been constructed with standard integrated circuit technology.
Abstract: When speech is coded using a differential pulse-code modulation system with an adaptive quantizer, the digital code words exhibit considerable variation among all quantization levels during both voiced and unvoiced speech intervals. However, because of limits on the range of step sizes, during silent intervals the code words vary only slightly among the smallest quantization steps. Based on this principle, a simple algorithm for locating the beginning and end of a speech utterance has been developed. This algorithm has been tested in computer simulations and has been constructed with standard integrated circuit technology.
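The principle, codewords confined to the smallest quantizer levels during silence, suggests a simple frame-based detector. The sketch below is an illustration with invented parameters (frame length, "small codeword" threshold, activity fraction), not the paper's algorithm or hardware.

```python
def find_endpoints(code_mags, frame_len=80, small=1, active_frac=0.2):
    """Label a frame as speech when more than `active_frac` of its
    codeword magnitudes exceed the smallest quantizer levels, then
    return (first, last) speech frame indices, or None if no speech."""
    n_frames = len(code_mags) // frame_len
    active = []
    for i in range(n_frames):
        frame = code_mags[i * frame_len:(i + 1) * frame_len]
        big = sum(1 for c in frame if c > small)
        active.append(big > active_frac * frame_len)
    speech = [i for i, a in enumerate(active) if a]
    return (speech[0], speech[-1]) if speech else None
```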

Proceedings ArticleDOI
01 Apr 1976
TL;DR: A technique that splits the spectrum into two equal halves and performs a piecewise LPC approximation to each half is described, and the fidelity is expected to be higher than standard LPC.
Abstract: A great deal of current research in the area of narrowband digital speech compression makes use of the Linear Prediction Coding (LPC) algorithm to extract the vocal tract spectrum. This paper describes a technique that splits the spectrum into two equal halves and performs a piecewise LPC approximation to each half. By taking advantage of the classical benefits of piecewise approximation, the fidelity is expected to be higher than standard LPC. In addition, by making use of under-sampling and spectrum folding, computational requirements are reduced by about 40%. PLPC has been implemented in real time on the CSP-30 computer at the Speech Research and Development Facility of the Communications Security Engineering Office (DCW) at ESD.

Journal ArticleDOI
TL;DR: Spelled Speech can be used as feedback by a blind typist to monitor her typing and correct her typing mistakes.
Abstract: Spelled Speech can be used as feedback by a blind typist to monitor and correct her typing. The speech can be produced using a computer or a small portable digital apparatus.

Proceedings ArticleDOI
01 Apr 1976
TL;DR: Characteristics of common sources of noise and distortion are described in this paper and their effect in shaping the spectrum of speech is discussed.
Abstract: Parameter or feature extraction from the speech signal forms the basis for systems designed for speech recognition, speaker verification, speech bandwidth compression, etc. The parameters in general are critically dependent upon the short-time spectrum of speech. The input speech waveform is, however, subjected to several types of noise and distortion due to background noise sources, reverberation, close speaking into a microphone, telephone system imperfections, etc. These factors modify the spectrum of the speech signal and hence the parameters extracted. Characteristics of common sources of noise and distortion are described in this paper and their effect in shaping the spectrum of speech is discussed. Steps to reduce the influence of some noises while producing speech input to a system are suggested. Methods of normalization of spectral distortions due to noise and the effect of such normalization on parameter extraction are also discussed.

Proceedings ArticleDOI
01 Apr 1976
TL;DR: This research has resulted in the development of a new pitch-synchronous analysis technique for the extraction of accurate formant information from speech signals that is an improvement over current methods of analysis in terms of accuracy and temporal resolution.
Abstract: This research has resulted in the development of a new pitch-synchronous analysis technique for the extraction of accurate formant information from speech signals. The method is an improvement over current methods of analysis in terms of accuracy and temporal resolution. This is achieved by extension of the signal from one pitch period into the next, using a speech production model based on linear prediction. The result is higher accuracy in the determination of formant frequencies, bandwidths and amplitudes, and the ability to follow rapid formant transitions. The method performs equally well with nasal and high pitched sounds. The method is applied to the speech recognition and the speaker identification problems.

Proceedings ArticleDOI
01 Apr 1976
TL;DR: A new adaptive quantizer for speech digitization has been derived that adjusts its dynamic range to match that of the speech waveform and further adjusts its range to compensate for the increased signal strength that follows a pitch pulse.
Abstract: A new adaptive quantizer for speech digitization has been derived. It is similar to known adaptive quantizers in that it adjusts its dynamic range to match that of the speech waveform. In addition, it further adjusts its range to compensate for the increased signal strength that follows a pitch pulse. The new quantizer bases its adaptation on its own output and no side information is required. When combined with a variable length source coding scheme, the new quantizer offers a significant improvement in signal-to-noise ratio and in subjective speech quality. The technique is applicable to a broad range of digitization methods including adaptive delta modulation and various forms of ADPCM.
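A quantizer that adapts its range from its own output alone, with no side information, can be sketched in the style of Jayant's one-word-memory adaptation. This is a generic backward-adaptive quantizer, not the paper's pitch-compensating design; the bit width, multipliers, and step limits are illustrative assumptions.

```python
import numpy as np

def jayant_quantize(x, bits=3, step0=0.1, m_inner=0.9, m_outer=1.6,
                    step_min=1e-4, step_max=10.0):
    """Backward-adaptive quantizer: the step size is rescaled after
    each sample from the previous output code alone, so a decoder
    can track it with no side information."""
    levels = 2 ** (bits - 1)          # magnitude levels (sign is extra)
    step = step0
    codes, decoded = [], []
    for s in x:
        c = int(np.clip(np.floor(abs(s) / step), 0, levels - 1))
        q = np.sign(s) * (c + 0.5) * step
        codes.append(c)
        decoded.append(q)
        # expand the range after large codes, shrink it after small ones
        step = np.clip(step * (m_outer if c >= levels - 1 else m_inner),
                       step_min, step_max)
    return codes, np.array(decoded)
```

Because the step-size update depends only on the transmitted code, a matching decoder reproduces `step` exactly and inverts the quantization without extra bits.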

Proceedings ArticleDOI
12 Apr 1976
TL;DR: Semantic and syntactic information is used to resolve ambiguities and to yield higher-order decisions in automatic speech recognition and understanding systems.
Abstract: Automatic speech recognition and understanding are currently receiving considerable attention. Most approaches to problems in these areas involve rather complicated systems. Typically, the acoustic waveform is first segmented into units such as phonemes or syllables. Semantic and syntactic information is then used to resolve ambiguities and to yield higher-order decisions. This complexity is probably necessary if the most general speech-recognition problems are to be solved.

Proceedings ArticleDOI
12 Apr 1976
TL;DR: The quantitative rules obtained for generating the SSRU's are expected to be useful, at least as a preliminary investigation tool, for synthesis-by-rule.
Abstract: Summary form only given, as follows. The paper deals with the application of the linear prediction technique to speech synthesis of both the Italian and German languages by Standard Speech Reproducing Units (SSRU), i.e., by combining elementary speech segments of standardized characteristics extracted from utterances of native speakers. The main feature of the method presented is the possibility of synthesizing in a highly intelligible form any message of such languages with a very limited amount of data. So far, the use of linear predictive coding of the previously realized SSRU sets allowed a memory occupation of less than 16 kbytes for the synthesis of Italian and less than 32 kbytes for the combined synthesis of Italian and German. The data flow rate is about 1 kb/s. A key property of the code with respect to methods previously used (i.e., simple concatenation of original segments) lies in the possibility of greatly enhancing the naturalness of the synthesized speech by varying the pitch, amplitude, and duration of the synthetic segments. Further, the quantitative rules obtained for generating the SSRUs are expected to be useful, at least as a preliminary investigation tool, for synthesis-by-rule.


Journal ArticleDOI
TL;DR: As an alternative to the spectrograph technique for speech analysis, an areagraph technique is presented in which the instantaneous vocal-tract area function is plotted against time with distance along the tract as the y-ordinate and area denoted by intensity modulation.
Abstract: As an alternative to the spectrograph technique for speech analysis, an areagraph technique is presented in which the instantaneous vocal-tract area function (derived from linear prediction analysis) is plotted against time with distance along the tract as the y-ordinate and area denoted by intensity modulation. Since the display is related to a physical quantity, it has a number of advantages over the spectrograph. An application to speech training is described.

Proceedings ArticleDOI
12 Apr 1976
TL;DR: It proves necessary to find a simple set of intonation patterns without taking into account the complexity of syntactic sentence structure and the many derived rules.
Abstract: Speech synthesis by dyad concatenation produces intelligible speech, but the lack of prosodic features like rhythm and intonation gives the speech an unnatural and unpleasant sound. Given the short-term applied objectives, it proves necessary to find a simple set of intonation patterns without taking into account the complexity of syntactic sentence structure and the many derived rules. Intrinsic characteristics of each dyad are stored, and a very simplified grammar is used to automatically superimpose on them a pitch pattern that is a function of the following parameters: the type of sentence, the end of each kind of syntagm, word boundaries, and word position within the sentence.

Journal ArticleDOI
TL;DR: A series of listening and communicability tests has been undertaken using speech in which anomaly effects have been introduced by simulation techniques, introduced at controlled rates in simulated networks using a variety of speech encoding techniques and packetization strategies.
Abstract: When speech is transmitted in a packet‐switched network the variability in packet delays inherent in such a net tends to produce occasional anomalies or “glitches” in the output speech when packets fail to arrive at the destination in a timely fashion. While the frequency of occurrence of these anomalies can be minimized at the expense of buffering and increased overall speech delay, it is likely that a practical network design would represent a compromise which allowed some degradation of the output speech under worst case load conditions. To provide some basic data on the subjective effects of such anomalies a series of listening and communicability tests has been undertaken using speech in which anomaly effects have been introduced by simulation techniques. Anomalies resulting from packet losses due to delay dispersion as well as variation in average delay are introduced at controlled rates in simulated networks using a variety of speech encoding techniques and packetization strategies. Preliminary tes...