
Showing papers on "Voice activity detection published in 1976"


Journal ArticleDOI
Frederick Jelinek1
01 Apr 1976
TL;DR: Experimental results are presented that indicate the power of the methods and concern modeling of a speaker and of an acoustic processor, extraction of the models' statistical parameters and hypothesis search procedures and likelihood computations of linguistic decoding.
Abstract: Statistical methods useful in automatic recognition of continuous speech are described. They concern modeling of a speaker and of an acoustic processor, extraction of the models' statistical parameters and hypothesis search procedures and likelihood computations of linguistic decoding. Experimental results are presented that indicate the power of the methods.

1,024 citations


Journal ArticleDOI
B.S. Atal1
01 Apr 1976
TL;DR: The paper includes a discussion of the speaker-dependent properties of the speech signal, methods for selecting an efficient set of speech measurements, results of experimental studies illustrating the performance of various methods of speaker recognition, and a comparison of the performance of automatic methods with that of human listeners.
Abstract: This paper presents a survey of automatic speaker recognition techniques. The paper includes a discussion of the speaker-dependent properties of the speech signal, methods for selecting an efficient set of speech measurements, results of experimental studies illustrating the performance of various methods of speaker recognition, and a comparison of the performance of automatic methods with that of human listeners. Both text-dependent and text-independent speaker-recognition techniques are discussed.

420 citations


Journal ArticleDOI
TL;DR: The harmonics of the desired voice are selected in the Fourier transform of the input to separate two competing voices; the authors focus on the principal subproblem, the separation of vocalic speech.
Abstract: A common type of interference in speech transmission is that caused by the speech of a competing talker. Although the brain is adept at clarifying such speech, it relies heavily on binaural data. When voices interfere over a single channel, separation is much more difficult and intelligibility suffers. Clarifying such speech is a complex and varied problem whose nature changes with the moment‐to‐moment variation in the types of sound which interfere. This paper describes an attack on the principal subproblem, the separation of vocalic speech. Separation is done by selecting the harmonics of the desired voice in the Fourier transform of the input. In implementing this process, techniques have been developed for resolving overlapping spectrum components, for determining pitches of both talkers, and for assuring consistent separation. These techniques are described, their performance on test utterances is summarized, and the possibility of using this process as a basis for the solution of the general two-talker problem is considered.
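The core separation step, keeping only the spectrum bins near the harmonics of the desired voice, can be sketched as follows. This is a minimal illustration, not the paper's system: the function name, the fixed tolerance band, and the assumption that the desired pitch f0 is already known are all mine.

```python
import numpy as np

def select_harmonics(frame, f0, sr, tol=0.04):
    """Keep spectrum bins within tol*f0 Hz of integer multiples of f0
    and zero everything else (illustrative parameters)."""
    n = len(frame)
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    mask = np.zeros_like(spec, dtype=bool)
    k = 1
    while k * f0 <= freqs[-1]:
        mask |= np.abs(freqs - k * f0) <= tol * f0  # band around k-th harmonic
        k += 1
    return np.fft.irfft(spec * mask, n)
```

A real system, as the abstract notes, must also resolve overlapping components and track both talkers' pitches; this sketch assumes non-overlapping harmonics.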

294 citations


Journal ArticleDOI
TL;DR: When trained to the voice of a particular speaker, the decoder recognized seven‐digit telephone numbers correctly 96% of the time, with a better than 99% per‐digit accuracy.
Abstract: Continuous speech was treated as if produced by a finite‐state machine making a transition every centisecond. The observable output from state transitions was considered to be a power spectrum—a probabilistic function of the target state of each transition. Using this model, observed sequences of power spectra from real speech were decoded as sequences of acoustic states by means of the Viterbi trellis algorithm. The finite‐state machine used as a representation of the speech source was composed of machines representing words, combined according to a “language model.” When trained to the voice of a particular speaker, the decoder recognized seven‐digit telephone numbers correctly 96% of the time, with a better than 99% per‐digit accuracy. Results for other tests of the system, including syllable and phoneme recognition, will also be given.
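The decoding step described above, finding the most likely state sequence through a finite-state machine given per-transition observation probabilities, is the classic Viterbi trellis search. A generic sketch in log-probability form (my own minimal formulation, not the paper's implementation):

```python
import numpy as np

def viterbi(log_trans, log_emit, log_init):
    """Most likely state sequence through a trellis.
    log_trans[i, j]: log P(state j | state i)
    log_emit[t, j]:  log P(observation at time t | state j)
    log_init[j]:     log P(initial state j)"""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]          # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans    # S x S: predecessor x successor
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In the paper's setting, the states would be centisecond acoustic states of word machines composed according to the language model, and the observations would be power spectra.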

208 citations


Journal ArticleDOI
01 Apr 1976
TL;DR: Future developments in both new applications and increased capability voice input systems can be expected to considerably expand the usage of this form of man-machine communications.
Abstract: Voice input to machine is the most natural form of man-machine communications. In this type of system the machine responds to the mode of communications preferred by the user, rather than vice versa. Many practical applications exist today for limited capability voice input systems. The first operational voice input systems have taken place with limited vocabulary, isolated word voice input systems. Most of these initial systems were for industrial applications in which the users' hands or eyes were already busy with their normal work requirements. Future developments in both new applications and increased capability voice input systems can be expected to considerably expand the usage of this form of man-machine communications.

133 citations


Journal ArticleDOI
TL;DR: It is shown that this new method results in a substantial improvement in the intelligibility of speech in white noise over normal speech and over previously implemented methods.
Abstract: This paper presents the results of an examination of rapid amplitude compression following high-pass filtering as a method for processing speech, prior to reception by the listener, as a means of enhancing the intelligibility of speech in high noise levels. Arguments supporting this particular signal processing method are based on the results of previous perceptual studies of speech in noise. In these previous studies, it has been shown that high-pass filtered/clipped speech offers a significant gain in the intelligibility of speech in white noise over that for unprocessed speech at the same signal-to-noise ratios. Similar results have also been obtained for speech processed by high-pass filtering alone. The present paper explores these effects and it proposes the use of high-pass filtering followed by rapid amplitude compression as a signal processing method for enhancing the intelligibility of speech in noise. It is shown that this new method results in a substantial improvement in the intelligibility of speech in white noise over normal speech and over previously implemented methods.
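The processing chain, high-pass filtering followed by rapid amplitude compression, can be sketched roughly as below. All coefficients (the pre-emphasis factor, the attack constant) are illustrative assumptions of mine, not values from the paper, and a one-pole pre-emphasis stands in for whatever filter the authors used.

```python
import numpy as np

def hp_compress(x, alpha=0.95, attack=0.99, eps=1e-6):
    """High-pass pre-emphasis, then fast peak-tracking amplitude
    compression that drives the output toward constant level."""
    y = np.empty_like(x, dtype=float)
    hp = np.append(x[0], x[1:] - alpha * x[:-1])  # crude one-pole high-pass
    env = eps
    for i, s in enumerate(hp):
        env = max(abs(s), attack * env)  # fast attack, slow release envelope
        y[i] = s / (env + eps)           # normalize: rapid compression
    return y
```

The effect is that weak and strong stretches of the filtered signal come out at comparable amplitude, which is the property the perceptual studies above exploit.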

131 citations


Journal ArticleDOI
01 Apr 1976
TL;DR: The resulting system serves as a model for the cognitive process of reading aloud, and also as a stable practical means for providing speech output in a broad class of computer-based systems.
Abstract: For many applications, it is desirable to be able to convert arbitrary English text to natural and intelligible sounding speech. This transformation between two surface forms is facilitated by first obtaining the common underlying abstract linguistic representation which relates to both text and speech surface representations. Calculation of these abstract bases then permits proper selection of phonetic segments, lexical stress, juncture, and sentence-level stress and intonation. The resulting system serves as a model for the cognitive process of reading aloud, and also as a stable practical means for providing speech output in a broad class of computer-based systems.

116 citations


Patent
17 Aug 1976
TL;DR: In this paper, the authors improved the detection sensitivity and noise rejection of an arrangement for detecting speech in the presence of noise by accumulating the weighted differences between input signal samples and their short-term running average.
Abstract: The detection sensitivity and noise rejection of an arrangement for detecting speech in the presence of noise is improved by accumulating the weighted differences between input signal samples and their short-term running average. The detector thus tracks ambient noise, providing an adaptive detection threshold such that detection sensitivity is increased in low noise environments without excessive false operation on high level noise. The peak average attained during an interval of speech is used to provide variable hangover upon cessation of speech, yielding greater hangover for weak talkers than for loud talkers. In an illustrative embodiment of the speech detector used in a speech interpolation system, protection is afforded also against false transmission path operation due to detection of speech echo.
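The behavior described in this patent, a noise-tracking adaptive threshold plus a hangover that is longer for weak talkers than for loud ones, can be sketched frame-wise as follows. The frame size, threshold multiplier, and hangover constants are illustrative assumptions, not the patent's values.

```python
import numpy as np

def detect_speech(x, win=80, k=3.0, hang_base=5, hang_per_peak=20):
    """Frame-wise speech/noise decision with an adaptive threshold that
    tracks ambient noise, and peak-dependent hangover at speech offset."""
    frames = x[: len(x) // win * win].reshape(-1, win)
    level = np.abs(frames).mean(axis=1)      # short-term running average
    noise = level[0]                          # ambient noise estimate
    active = np.zeros(len(level), dtype=bool)
    hang, peak = 0, 0.0
    for i, lv in enumerate(level):
        if lv > k * noise:                    # above adaptive threshold
            active[i] = True
            peak = max(peak, lv)
            # weak talkers (peak near noise) get a longer hangover
            hang = hang_base + int(hang_per_peak * noise / (peak + 1e-9))
        elif hang > 0:
            active[i] = True                  # hangover after speech ends
            hang -= 1
        else:
            noise = 0.99 * noise + 0.01 * lv  # track ambient noise level
            peak = 0.0
    return active
```

Freezing the noise estimate during active frames, as here, is what lets sensitivity rise in quiet environments without false triggering on loud noise.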

67 citations


Journal ArticleDOI
TL;DR: The system described in this paper is subdivided into three main steps: pitch extraction, segmentation, and formant analysis. The pitch extractor uses an adaptive time-domain digital filter that transforms the speech signal into a signal similar to the glottal waveform.
Abstract: The system described in this paper is subdivided into three main steps: pitch extraction, segmentation, and formant analysis. The pitch extractor uses an adaptive time-domain digital filter that transforms the speech signal into a signal similar to the glottal waveform. Using the levels of the speech signal and the differenced signal as parameters in the time domain, the subsequent segmentation algorithm derives a signal parameter which describes the speed of articulatory movement. From this, the signal is divided into "stationary" and "transitional" segments; one stationary segment is associated with one phoneme. For the formant tracking procedure, a subset of the pitch periods is selected by the segmentation algorithm and is transformed into the frequency domain. The formant tracking algorithm uses a maximum detection strategy and continuity criteria for adjacent spectra. After this step, the total parameter set is offered to an adaptive universal pattern classifier which is trained on selected material before operation. For stationary phonemes, the recognition rate is about 85 percent when training material and test material are uttered by the same speaker. The recognition rate is increased to about 90 percent when segmentation results are used.

47 citations


Proceedings ArticleDOI
01 Apr 1976
TL;DR: This report presents results obtained in some experiments on the computer recognition of continuous speech with two simple languages having vocabularies of 11 and 250 words.
Abstract: This report presents results obtained in some experiments on the computer recognition of continuous speech. The experiments deal with two simple languages having vocabularies of 11 and 250 words.

36 citations


Proceedings ArticleDOI
01 Apr 1976
TL;DR: A speech processing system named SPAC (SPlicing of AutoCorrelation function) is proposed in order to compress or expand the speech spectrum, to prolong or shorten the duration of an utterance, and to reduce the noise level in the speech signal.
Abstract: A speech processing system named SPAC (SPlicing of AutoCorrelation function) is proposed in order to compress or expand the speech spectrum, to prolong or shorten the duration of an utterance, and to reduce the noise level in the speech signal. A period of the short-time autocorrelation function is sampled and spliced after a change of the time scale. Transformed speech is quite natural and free from distortion. Applications of SPAC are expected in many fields, such as improvement of speech quality, narrow-band transmission, communication aids for the hard of hearing, information services for the blind, unscrambling of helium speech, stenography, and so on.
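One of SPAC's stated goals, prolonging an utterance without changing its pitch, can be loosely sketched by estimating the pitch period from the autocorrelation peak and splicing whole periods. This is a rough analogue of the idea, not the published system; the search range and integer repeat factor are my simplifications.

```python
import numpy as np

def pitch_period(x, sr, fmin=60, fmax=400):
    """Pitch period in samples from the short-time autocorrelation peak."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    lo, hi = int(sr / fmax), int(sr / fmin)            # plausible lag range
    return lo + int(np.argmax(ac[lo:hi]))

def stretch(x, sr, factor=2.0):
    """Prolong voiced speech by repeating (splicing) whole pitch periods,
    preserving the pitch while lengthening the duration."""
    p = pitch_period(x, sr)
    periods = [x[i:i + p] for i in range(0, len(x) - p + 1, p)]
    out = []
    for seg in periods:
        out.extend([seg] * int(round(factor)))  # splice repeated periods
    return np.concatenate(out)
```

Because each spliced segment is one full period, the output stays periodic at the original pitch; compressing duration would drop periods instead of repeating them.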

Journal ArticleDOI
TL;DR: How a speech synthesizer can be controlled by a small computer in real time and the properties of the synthesizer and the control program are described along with an example of the speech synthesis.
Abstract: This paper describes how a speech synthesizer can be controlled by a small computer in real time. The synthesizer allows the precise control of the speech output that is necessary for experimental purposes. The control information is computed in real time during synthesis in order to reduce data storage. The properties of the synthesizer and the control program are presented along with an example of the speech synthesis.

Patent
24 Jun 1976
TL;DR: In this article, a system and method for detecting the presence of useful speech information in telephone voice channels capable of containing noise as well as such useful information for optimizing the telephone transmission of such speech information is presented.
Abstract: A system and method for detecting the presence of useful speech information in telephone voice channels capable of containing noise as well as such useful speech information for optimizing the telephone transmission of such speech information. Two segments of the envelope of a given voice channel are compared against each other over two different time domains in order to determine if a predetermined magnitude of difference exists between these envelopes. The presence of such magnitude of difference is indicative of the presence of such useful speech information in the voice channel thereby enabling transmission thereof by the system, whereas the absence of such magnitude of difference is indicative of the presence of solely noise thereby preventing the transmission thereof by the system.
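The envelope comparison this patent describes, two envelope estimates of the same channel taken over different time domains, diverging when speech is present and agreeing on steady noise, can be sketched with two exponential averagers. The time constants and the divergence ratio are illustrative assumptions, not the patent's values.

```python
import numpy as np

def speech_present(x, fast=0.9, slow=0.999, ratio=2.0):
    """Track a fast and a slow envelope of the channel signal; a large
    divergence between them indicates bursty speech rather than
    stationary noise (illustrative constants)."""
    e_fast = e_slow = abs(x[0])
    diverged = False
    for s in np.abs(x[1:]):
        e_fast = fast * e_fast + (1 - fast) * s      # short time constant
        e_slow = slow * e_slow + (1 - slow) * s      # long time constant
        if e_fast > ratio * e_slow + 1e-9:           # envelopes diverge
            diverged = True
    return diverged
```

On stationary noise both envelopes settle to the same value, so no divergence is detected and transmission would be suppressed.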

Proceedings ArticleDOI
01 Apr 1976
TL;DR: This paper describes a connected speech understanding system being implemented in Nancy, made up of an acoustic recognizer which gives a string of phoneme-like segments from a spoken sentence, a syntactic parser which controls the recognition process, a word recognizer working on words predicted by the parser, and a dialog procedure which takes into account semantic constraints in order to avoid some of the errors and ambiguities.
Abstract: This paper describes a connected speech understanding system being implemented in Nancy, thanks to the work done in automatic speech recognition since 1968. This system is made up of four parts: an acoustic recognizer which gives a string of phoneme-like segments from a spoken sentence, a syntactic parser which controls the recognition process, a word recognizer working on words predicted by the parser, and a dialog procedure which takes into account semantic constraints in order to avoid some of the errors and ambiguities. Some original features of the system are pointed out: modularity (e.g. the language used is considered as a parameter), the possibility of processing slightly syntactically incorrect sentences, ... The application both in data management and in oral control of a telephone center has given very promising results. Work is in progress on generalizing our model: extension of the vocabulary and of the grammar, multi-speaker operation, etc.

Patent
10 May 1976
TL;DR: In this paper, a method and a system for speech detection on PCM multiplexed voice channels is described, where a decision is reached every M samples regarding the channel activity.
Abstract: The disclosure herein describes a method and a system for speech detection on PCM multiplexed voice channels; for each channel, a decision is reached every M samples regarding the channel activity; in addition, the nature of speech is detected as: voiced (compact or non-compact) or unvoiced (fricative or non-fricative) when the channel is active; pure silence, white noise or echo when the channel is inactive. The decision is based on the joint value of the amplitude, zero crossing of the signal and zero crossing of the signal derivative.
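The per-channel decision described here rests on three time-domain measurements: amplitude, zero crossings of the signal, and zero crossings of its derivative. A coarse sketch of such a classifier is below; the thresholds and the reduced label set (voiced/unvoiced/inactive, without the patent's finer subclasses) are my simplifications.

```python
import numpy as np

def classify_frame(x, amp_thresh=0.05, zc_thresh=0.25):
    """Classify one frame of samples as 'voiced', 'unvoiced', or
    'inactive' from mean amplitude and zero-crossing rates of the
    signal and its derivative (illustrative thresholds)."""
    amp = np.abs(x).mean()
    zc = np.mean(np.sign(x[1:]) != np.sign(x[:-1]))   # signal ZCR
    d = np.diff(x)
    zcd = np.mean(np.sign(d[1:]) != np.sign(d[:-1]))  # derivative ZCR
    if amp < amp_thresh:
        return "inactive"                  # silence / low-level noise
    return "unvoiced" if max(zc, zcd) > zc_thresh else "voiced"
```

The intuition matches the abstract: voiced speech is high-amplitude and low in zero crossings, fricative-like sounds cross zero often, and inactive channels are low-amplitude.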

Patent
20 May 1976
TL;DR: In this paper, a method and an installation for masked or scrambled speech transmission utilize a time-scrambling unit for dividing the speech band into at least two sub-bands, for delaying the one sub-band with respect to the other, and for forming an aggregate signal, and a frequency-scambling unit is used to divide the aggregate signal into two second subbands of variable bandwidth, for their cyclic interchanging, for forming a transmission signal capable of being transmitted over a transmission channel, in order to mask not only the sound character of the speech signals but
Abstract: A method and an installation for masked or scrambled speech transmission utilize a time-scrambling unit for dividing the speech band into at least two sub-bands, for delaying the one sub-band with respect to the other, and for forming an aggregate signal, and a frequency-scrambling unit for dividing the aggregate signal into at least two second sub-bands of variable band-width, for their cyclic interchanging, and for forming a transmission signal capable of being transmitted over a transmission channel, in order to mask not only the sound character of the speech signals but also the speech rhythm, thus ensuring increased privacy of transmission with high code-changing speed and low sensitivity to distortion.

Proceedings ArticleDOI
C. Cook1
01 Apr 1976
TL;DR: Verification offers an alternative strategy by doing a top-down parametric word match independent of segmentation and labeling, which results in a distance measure between the reference parameterization of a hypothesized word and the computed parameterization of the real speech.
Abstract: If, in a speech understanding system, word matching is performed at the phonetic level, then the accurate determination of the locations and identities of words present in an unknown utterance is necessarily limited by the phonetic segmentation and labeling. Verification offers an alternative strategy by doing a top-down parametric word match independent of segmentation and labeling. The result is a distance measure between the reference parameterization of a hypothesized word and the computed parameterization of the real speech. This distance is interpreted as the likelihood of that word having actually occurred over a given portion of the utterance.

27 Jan 1976
TL;DR: Relatively little effort has been expended toward designing low data rate speech processing devices which can operate in difficult environments; the problems addressed include that of good behavior for a wide variety of speakers.
Abstract: Relatively little effort has been expended toward designing low data rate speech processing devices which can operate in difficult environments. The particular problems addressed include that of good behavior for a wide variety of speakers, with tandeming and conferencing configurations, in the presence of jamming and/or background noise, and with telephone speech as input.

Proceedings ArticleDOI
12 Apr 1976
TL;DR: The voice-operated question-answering system for seat reservation is constructed by a computer simulation technique, and promising results are obtained.
Abstract: The speech recognition system composing a part of the question-answering system operated by conversational speech is described. The recognition system consists of two processing stages: an acoustic processing stage and a linguistic processing stage. In the acoustic processing stage, input speech is analyzed and transformed into a phoneme sequence which usually contains ambiguities and errors caused in the segmentation and phoneme recognition. In the linguistic processing stage, the phoneme sequence containing ambiguities and errors is converted into the correct word sequence by the use of linguistic knowledge such as phoneme rewriting rules, lexicon, syntax, semantics, and pragmatics. The voice-operated question-answering system for seat reservation is constructed by a computer simulation technique, and promising results are obtained.

Proceedings ArticleDOI
12 Apr 1976
TL;DR: An analysis/synthesis method whereby speech may be transmitted at 600 bps, a data rate which is less than 1 percent of the PCM transmission rate for original speech sounds, which is enough to permit the use of the system in certain specialized military applications.
Abstract: This paper presents an analysis/synthesis method whereby speech may be transmitted at 600 bps, a data rate which is less than 1 percent of the PCM transmission rate for original speech sounds. This R&D effort was motivated by the pressing need for very-low-data rate (VLDR) voice digitizers to meet some of the current military voice communication requirements. The use of a VLDR voice digitizer makes it possible to transmit speech signals over adverse channels which support data rates of only a few hundred bps, or to transmit speech signals over more favorable channels with redundancies for error protection and other useful applications. The 600 bps synthesized speech loses some of its original speech quality, but the intelligibility is sufficiently high to permit the use of the system in certain specialized military applications. One of the most attractive features of the 600 bps voice digitizer is that it is a simple extension of the 2400 bps linear predictive encoder (LPE) which has been under intensive investigation by various government agencies, including the Navy, and is presently entering advanced development. In essence, the 600 bps voice digitizer is a combination of an LPE and a formant vocoder, which is realized by adding a processor to the existing 2400 bps LPE. This add-on processor converts the 2400 bps speech data to 600 bps speech data at the transmitter, and reconverts the data to 2400 bps at the receiver.

Journal ArticleDOI
TL;DR: A speech processing system has been developed in which the unvoiced portion of speech is bandwidth compressed from an original bandwidth of 4000 Hz into a low-frequency band not exceeding 1000 Hz, in which hearing impaired subjects with severe high-frequency hearing losses still possess some residual speech perception.
Abstract: A speech processing system has been developed in which the unvoiced portion of speech is bandwidth compressed from an original bandwidth of 4000 Hz into a low-frequency band not exceeding 1000 Hz, in which hearing impaired subjects with severe high-frequency hearing losses still possess some residual speech perception. The basic compression operation is based upon a time-domain time expansion technique, and the resulting reduction in bandwidth is accomplished without relinquishing the essential information contained in unvoiced speech. Thus, subjects are able again to perceive unvoiced speech of fair intelligibility where conventional hearing aids normally fail to be of any assistance. The imposition of stringent operating requirements such as portability, real-time operation, and functionality in a real-listening environment composed of many competing speech and noise sources, eliminated numerous elegant speech processing approaches.

30 Sep 1976
TL;DR: Test results indicate that packet-system speech quality varies from essentially perfect to unusable, and guidelines are provided for an acceptable packetized speech communication system.
Abstract: This paper reports on the effects of speech packetization and its transmission through a packet-switched network on overall voice quality, acceptability, and communicability, examined in parametric fashion. Speech processed through a number of real-time simulation programs, developed to create anticipated anomalies (glitches) in packet speech systems, was evaluated by informal acceptability testing. Depending on system design parameters, test results indicate that packet-system speech quality varies from essentially perfect (no packet-related anomalies) to unusable. Guidelines are provided for an acceptable packetized speech communication system.

Journal ArticleDOI
TL;DR: A simple algorithm for locating the beginning and end of a speech utterance has been developed that has been tested in computer simulations and has been constructed with standard integrated circuit technology.
Abstract: When speech is coded using a differential pulse-code modulation system with an adaptive quantizer, the digital code words exhibit considerable variation among all quantization levels during both voiced and unvoiced speech intervals. However, because of limits on the range of step sizes, during silent intervals the code words vary only slightly among the smallest quantization steps. Based on this principle, a simple algorithm for locating the beginning and end of a speech utterance has been developed. This algorithm has been tested in computer simulations and has been constructed with standard integrated circuit technology.
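The principle stated above, that an adaptive-quantizer DPCM coder emits code words spread across all levels during speech but hovers among the smallest steps during silence, suggests a simple endpoint detector over the code-word stream. A sketch follows; the frame size, magnitude cutoff, and fraction threshold are illustrative assumptions, and the input is taken to be signed quantizer indices.

```python
import numpy as np

def find_endpoints(codes, win=100, big=2, frac=0.2):
    """Locate the start and end of an utterance from DPCM code words:
    frames where many code words have large magnitude are speech;
    frames stuck on the smallest steps are silence."""
    codes = np.asarray(codes)
    n = len(codes) // win
    speech = [bool(np.mean(np.abs(codes[i*win:(i+1)*win]) >= big) > frac)
              for i in range(n)]
    if not any(speech):
        return None                       # no utterance found
    first = speech.index(True)
    last = n - 1 - speech[::-1].index(True)
    return first * win, (last + 1) * win  # sample range of the utterance
```

The appeal of the original algorithm is that this decision needs only the coder's digital output, no separate energy measurement, which is why it mapped cleanly onto standard integrated circuits.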

Proceedings ArticleDOI
Harvey F. Silverman1, N. Dixon
01 Apr 1976
TL;DR: The problems concerning the diadic segment classification and final string estimation are discussed, and the current solutions given.
Abstract: The Modular Acoustic Processor (MAP), a complex experimental system for automatic derivation of phonemic string output for continuous speech, was first described in April 1974. Many of the new concepts currently in MAP are described. In particular, the problems concerning the diadic segment classification and final string estimation are discussed, and the current solutions given. Results on a large body of continuous speech data, prepared by an automatic evaluation system, will also be presented.

Proceedings ArticleDOI
01 Apr 1976
TL;DR: This research has resulted in the development of a new pitch-synchronous analysis technique for the extraction of accurate formant information from speech signals that is an improvement over current methods of analysis in terms of accuracy and temporal resolution.
Abstract: This research has resulted in the development of a new pitch-synchronous analysis technique for the extraction of accurate formant information from speech signals. The method is an improvement over current methods of analysis in terms of accuracy and temporal resolution. This is achieved by extension of the signal from one pitch period into the next, using a speech production model based on linear prediction. The result is higher accuracy in the determination of formant frequencies, bandwidths and amplitudes, and the ability to follow rapid formant transitions. The method performs equally well with nasal and high pitched sounds. The method is applied to the speech recognition and the speaker identification problems.

Proceedings ArticleDOI
12 Apr 1976
TL;DR: Semantic and syntactic information is used to resolve ambiguities and to yield higher-order decisions in automatic speech recognition and understanding systems.
Abstract: Automatic speech recognition and understanding are currently receiving considerable attention. Most approaches to problems in these areas involve rather complicated systems. Typically, the acoustic waveform is first segmented into units such as phonemes or syllables. Semantic and syntactic information is then used to resolve ambiguities and to yield higher-order decisions. This complexity is probably necessary if the most general speech-recognition problems are to be solved.

Proceedings ArticleDOI
12 Apr 1976
TL;DR: The quantitative rules obtained for generating the SSRU's are expected to be useful, at least as a preliminary investigation tool, for synthesis-by-rule.
Abstract: Summary form only given, as follows. The paper deals with the application of the linear prediction technique to the speech synthesis of both the Italian and German languages by Standard Speech Reproducing Units (SSRU), that is, by combining elementary speech segments of standardized characteristics extracted from utterances of native speakers. The main feature of the method presented is the possibility of synthesizing in a highly intelligible form any message of such languages with a very limited amount of data. So far, the use of linear predictive coding of the previously realized SSRU sets allowed a memory occupation of less than 16 kbytes for the synthesis of Italian and less than 32 kbytes for the combined synthesis of Italian and German. The data flow rate is about 1 kbit/s. A key property of the method with respect to methods previously used (i.e., simple concatenation of original segments) lies in the possibility of greatly enhancing the naturalness of the synthesized speech by varying the pitch, amplitude, and duration of the synthetic segments. Further, the quantitative rules obtained for generating the SSRUs are expected to be useful, at least as a preliminary investigation tool, for synthesis-by-rule.


Proceedings ArticleDOI
12 Apr 1976
TL;DR: Algorithms for segmenting speech sounds into vowel-like and nonvowel-like segments, and then for identifying vowels and detecting nasal segments, turbulence noise segments, etc., are described, together with an algorithm for feature normalization.
Abstract: This paper presents a new approach to automatic segmentation and feature normalization of connected speech based on area functions. Algorithms for segmenting speech sounds into vowel-like and nonvowel-like segments, and then for identifying vowels and detecting nasal segments, turbulence noise segments, etc., are described, together with an algorithm for feature normalization. Fairly reasonable results were obtained with seven sentences spoken by two male speakers and a female speaker.

Journal ArticleDOI
TL;DR: Results are presented of experiments with a recognition scheme intended for continuous speech that utilizes information about interphoneme contextual effects contained in formant transitions and employs internal trial synthesis and feedback comparison as a means for recognition.
Abstract: Preliminary results are presented of experiments with a recognition scheme intended for continuous speech. The scheme utilizes information about interphoneme contextual effects contained in formant transitions and employs internal trial synthesis and feedback comparison as a means for recognition. The aim is to achieve minimal sensitivity to the appreciable variability which occurs in the speech signal, even for utterances of a single speaker. While the approach outlined here is quite general, it has initially been tried out on vowel-stop-vowel utterances. Recognition scores obtained are encouraging and demonstrate the viability of the approach.