
Showing papers on "Voice activity detection published in 1992"


Journal ArticleDOI
Yariv Ephraim
01 Oct 1992
TL;DR: A unified statistical approach for the three basic problems of speech enhancement is developed, using composite source models for the signal and noise and a fairly large set of distortion measures.
Abstract: Since the statistics of the speech signal as well as of the noise are not explicitly available, and the most perceptually meaningful distortion measure is not known, model-based approaches have recently been extensively studied and applied to the three basic problems of speech enhancement: signal estimation from a given sample function of noisy speech, signal coding when only noisy speech is available, and recognition of noisy speech signals in man-machine communication. Research on the model-based approach is integrated and put into perspective with other more traditional approaches for speech enhancement. A unified statistical approach for the three basic problems of speech enhancement is developed, using composite source models for the signal and noise and a fairly large set of distortion measures.

383 citations


PatentDOI
TL;DR: This paper discloses a plurality of linearly arrayed sensors to detect spoken input and to output signals in response thereto, a beamformer connected to the sensors to cancel a preselected noise portion of the signals and thereby produce a processed signal, and a speech recognition system to recognize the processed signal and to respond thereto.
Abstract: Systems and methods for improved speech acquisition are disclosed including a plurality of linearly arrayed sensors to detect spoken input and to output signals in response thereto, a beamformer connected to the sensors to cancel a preselected noise portion of the signals to thereby produce a processed signal, and a speech recognition system to recognize the processed signal and to respond thereto. The beamformer may also include an adaptive filter with enable/disable circuitry for selectively training the adaptive filter for a predetermined period of time. A highpass filter may also be used to filter a preselected noise portion of the sensed signals before the signals are forwarded to the beamformer. The speech recognition system may include a speaker-independent base which is able to be adapted by a predetermined amount of training by a speaker, and which system includes a voice dialer or a speech coder for telecommunication.

214 citations


Journal ArticleDOI
01 Aug 1992
TL;DR: A voice activity detector (VAD) that can operate reliably in SNRs down to 0 dB and detect most speech at −5 dB is described, and how robustness to these signals can be achieved with suitable preprocessing and postprocessing is shown.
Abstract: The paper describes a voice activity detector (VAD) that can operate reliably in SNRs down to 0 dB and detect most speech at −5 dB. The detector applies a least-squares periodicity estimator to the input signal, and triggers when a significant amount of periodicity is found. It does not aim to find the exact talkspurt boundaries and, consequently, is most suited to speech-logging applications where it is easy to include a small margin to allow for any missed speech. The paper discusses the problem of false triggering on nonspeech periodic signals and shows how robustness to these signals can be achieved with suitable preprocessing and postprocessing.

205 citations
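The periodicity-triggered VAD described above can be sketched with a normalized-autocorrelation periodicity score. This is an illustrative stand-in, not the paper's least-squares estimator, and the 0.5 trigger threshold and lag range are assumptions:

```python
import math
import random

def periodicity_score(frame, min_lag=20, max_lag=160):
    """Peak normalized autocorrelation over candidate pitch lags
    (at 8 kHz, lags 20-160 cover roughly 50-400 Hz)."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        best = max(best, r / energy)
    return best

def is_speech(frame, threshold=0.5):
    """Trigger when a significant amount of periodicity is found."""
    return periodicity_score(frame) >= threshold

# Voiced-speech surrogate (100 Hz tone) vs. aperiodic noise.
random.seed(0)
fs = 8000
voiced = [math.sin(2 * math.pi * 100 * n / fs) for n in range(400)]
noise = [random.uniform(-1.0, 1.0) for _ in range(400)]
print(is_speech(voiced), is_speech(noise))  # periodic frame triggers, noise does not
```

A real detector would, as the paper notes, still need preprocessing and postprocessing to avoid false triggers on non-speech periodic signals such as tones or music.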



BookDOI
01 Jan 1992

152 citations


Patent
12 Feb 1992
TL;DR: In this article, a speaker voice verification system uses temporal decorrelation linear transformation and includes a collector for receiving speech inputs from an unknown speaker claiming a specific identity, a word-level speech features calculator operable to use a temporal decorrelation linear transformation for generating word-level speech feature vectors from such speech inputs, and word-level speech feature storage for storing word-level feature vectors known to belong to a speaker with the specific identity.
Abstract: A speaker voice verification system uses temporal decorrelation linear transformation and includes a collector for receiving speech inputs from an unknown speaker claiming a specific identity, a word-level speech features calculator operable to use a temporal decorrelation linear transformation for generating word-level speech feature vectors from such speech inputs, word-level speech feature storage for storing word-level speech feature vectors known to belong to a speaker with the specific identity, a word-level vector scorer for comparing the word-level speech feature vectors received from the unknown speaker with those retrieved from the word-level speech feature storage to produce a similarity score, and speaker verification decision circuitry for determining, based on the similarity score, whether the unknown speaker's identity is the same as that claimed. The word-level vector scorer further includes concatenation circuitry as well as a word-specific orthogonalizing linear transformer. Other systems and methods are also disclosed.

143 citations


Proceedings ArticleDOI
07 Jun 1992
TL;DR: A modified time-delay neural network (TDNN) has been designed to perform automatic lipreading (speechreading) in conjunction with acoustic speech recognition in order to improve recognition both in silent environments and in the presence of acoustic noise.
Abstract: A modified time-delay neural network (TDNN) has been designed to perform automatic lipreading (speechreading) in conjunction with acoustic speech recognition in order to improve recognition both in silent environments and in the presence of acoustic noise. The system is far more robust to acoustic noise and verbal distractors than is a system not incorporating visual information. Specifically, in the presence of high-amplitude pink noise, the low recognition rate of the acoustic-only system (43%) is raised to 75% by the incorporation of visual information. The system responds to (artificial) conflicting cross-modal patterns in a way closely analogous to the McGurk effect in humans. The power of neural techniques is demonstrated in several difficult domains: pattern recognition; sensory integration; and distributed approaches toward 'rule-based' (linguistic-phonological) processing.

129 citations


PatentDOI
TL;DR: In this paper, a method and an apparatus for hearing assistance, capable of compensating for the lowering of speech recognition ability related to deterioration of the auditory sense center, is presented.
Abstract: A method and an apparatus for hearing assistance, capable of compensating for the lowering of speech recognition ability related to deterioration of the auditory sense center. The input speech is divided into voiced speech sections, unvoiced speech sections, and silent sections, of which the voiced speech sections and the silent sections are appropriately extended or contracted while the unvoiced speech sections are left unchanged, and then these sections are combined in the same order as in the input speech, so as to obtain output speech that is easier to listen to for a listener with impaired hearing. Also, only the silent sections other than the punctuation silent sections (pauses due to punctuation between sentences) can be contracted and the speech speed for each of the voiced speech sections can be adjusted; the adjusted voiced speech sections, the unvoiced speech sections, the punctuation silent sections, and the contracted silent sections are then combined in the same order as in the input speech, in order to realize real-time hearing assistance without extending the speech utterance period.

128 citations


Patent
13 Aug 1992
TL;DR: In this paper, a speech input uttered by a human is received by a microphone which outputs microphone output signals, and the speech input received by the microphone is then recognized by a speech recognition unit, and a synthetic speech response appropriate for the input recognized by the speech recognizer is generated and outputted from a loudspeaker to the human.
Abstract: In the system, a speech input uttered by a human is received by a microphone which outputs microphone output signals. The speech input received by the microphone is then recognized by a speech recognition unit, and a synthetic speech response appropriate for the speech input recognized by the speech recognition unit is generated and outputted from a loudspeaker to the human. In recognizing the speech input, the speech recognition unit receives input signals in which the synthetic speech response, outputted from the loudspeaker and then received by the microphone, is cancelled from the microphone output signals.

89 citations
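The cancellation step described above, removing the loudspeaker's synthetic response from the microphone output, is commonly done with an adaptive filter. A minimal normalized-LMS (NLMS) sketch; the tap count, step size, and simulated echo path below are illustrative assumptions, not details from the patent:

```python
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5):
    """Adaptively estimate the loudspeaker-to-microphone echo path from the
    reference (synthesized) signal and subtract the predicted echo."""
    w = [0.0] * taps          # adaptive echo-path estimate
    residual = []
    for n in range(len(mic)):
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = mic[n] - echo_est  # echo-cancelled output sample
        residual.append(e)
        norm = sum(xi * xi for xi in x) + 1e-8
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
    return residual

# Simulation: the mic picks up only the loudspeaker output (no near-end talk).
random.seed(1)
ref = [random.uniform(-1.0, 1.0) for _ in range(2000)]
h = [0.5, -0.3, 0.1]  # hypothetical room/echo impulse response
mic = [sum(h[k] * ref[n - k] for k in range(len(h)) if n - k >= 0)
       for n in range(len(ref))]
residual = nlms_echo_cancel(mic, ref)
```

After adaptation the residual is close to zero, so the recognizer sees input from which the system's own synthetic response has been cancelled.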


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors present a method for segmenting speech waveforms containing several speakers into utterances, each from one individual, and then identifying each utterance as coming from a specific individual or group of individuals.
Abstract: The authors present a method for segmenting speech waveforms containing several speakers into utterances, each from one individual, and then identifying each utterance as coming from a specific individual or group of individuals. The procedure is unsupervised in that there is no training set, and sequential in that information obtained in early stages of the process is utilized in later stages.

77 citations


Journal ArticleDOI
I.A. Gerson, M.A. Jasiuk
TL;DR: Techniques for improving the performance of CELP (code excited linear prediction)-type speech coders while maintaining reasonable computational complexity are explored and a harmonic noise weighting function is introduced.
Abstract: Techniques for improving the performance of CELP (code excited linear prediction)-type speech coders while maintaining reasonable computational complexity are explored. A harmonic noise weighting function, which enhances the perceptual quality of the processed speech, is introduced. The combination of harmonic noise weighting and subsample pitch lag resolution significantly improves the coder performance for voiced speech. Strategies for reducing the speech coder's data rate, while maintaining speech quality, are presented. These include a method for efficient encoding of the long-term predictor lags, utilization of multiple gain vector quantizers, and a multimode definition of the speech coder frame. A 5.9-kb/s VSELP speech coder that incorporates these features is described. Complexity reduction techniques which allow the coder to be implemented using a single fixed-point DSP (digital signal processor) are discussed.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors present the result of their research on developing a hands-free voice communication system with a microphone array for use in an automobile environment, showing that the microphone array is superior to a single microphone.
Abstract: The authors present the results of their research on developing a hands-free voice communication system with a microphone array for use in an automobile environment. The goal of this research is to develop a speech acquisition and enhancement system so that a speech recognizer can reliably be used inside a noisy automobile environment, for digital cellular phone applications. Speech data have been collected using a microphone array and a digital audio tape (DAT) recorder inside a real car for several idling and driving conditions, and processed using delay-and-sum and adaptive beamforming algorithms. Performance criteria including signal-to-noise ratio and speech recognition error rate have been evaluated for the processed data. Detailed performance results presented show that the microphone array is superior to a single microphone.
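The delay-and-sum processing named above can be sketched under simplifying assumptions (integer-sample steering delays, known array geometry): aligned channels add the speech coherently while independent noise averages down by roughly the number of sensors.

```python
import math
import random

def delay_and_sum(channels, delays):
    """Advance each channel by its steering delay and average across sensors."""
    n = len(channels[0])
    out = []
    for i in range(n):
        acc, cnt = 0.0, 0
        for ch, d in zip(channels, delays):
            if 0 <= i + d < n:
                acc += ch[i + d]
                cnt += 1
        out.append(acc / cnt if cnt else 0.0)
    return out

# Four sensors hear the same 200 Hz source at staggered delays, plus independent noise.
random.seed(2)
fs, n = 8000, 512
source = [math.sin(2 * math.pi * 200 * i / fs) for i in range(n)]
delays = [0, 2, 4, 6]
channels = [[(source[i - d] if i >= d else 0.0) + random.gauss(0.0, 0.5)
             for i in range(n)] for d in delays]
beam = delay_and_sum(channels, delays)

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

print(mse(beam, source) < mse(channels[0], source))  # array beats a single sensor
```

With four sensors the residual noise power drops to roughly a quarter of a single channel's, which is the kind of SNR gain the paper measures against a single microphone.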

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors discuss the application of generalized analysis-by-synthesis coding to the pitch predictor of a code excited linear predictor (CELP) coder, which makes it possible to transmit the pitch prediction parameters at a much lower rate than conventional approaches, without compromising speech quality.
Abstract: Many modifications can be applied to a speech signal without changing its perceptual quality. For a particular speech coder, the coding efficiency will differ for distinct modifications. To exploit this, the authors introduced a generalized analysis-by-synthesis procedure. In this procedure, a search is performed over a multitude of modified original signals (on a blockwise basis), and the signal which can be encoded with the least distortion is selected for transmission. At the receiver, a quantized version of this modified original signal is constructed. The authors discuss the application of generalized analysis-by-synthesis coding to the pitch predictor of a code excited linear predictor (CELP) coder. The use of this technique makes it possible to transmit the pitch predictor parameters at a much lower rate than conventional approaches, without compromising speech quality.

Proceedings ArticleDOI
23 Feb 1992
TL;DR: It is concluded that word accuracy can be improved by explicitly modeling spontaneous effects in the recognizer, and by using as much spontaneous speech training data as possible.
Abstract: We describe three analyses on the effects of spontaneous speech on continuous speech recognition performance. We have found that: (1) spontaneous speech effects significantly degrade recognition performance, (2) fluent spontaneous speech yields word accuracies equivalent to read speech, and (3) using spontaneous speech training data can significantly improve performance for recognizing spontaneous speech. We conclude that word accuracy can be improved by explicitly modeling spontaneous effects in the recognizer, and by using as much spontaneous speech training data as possible. Inclusion of read speech training data, even within the task domain, does not significantly improve performance.

Proceedings ArticleDOI
14 Jun 1992
TL;DR: An overview of speech recognition systems and design strategies for their use in portable communications are given and types of speech recognizers are discussed.
Abstract: The authors give an overview of speech recognition systems and discuss design strategies for their use in portable communications. State-of-the-art speech recognition systems can recognize continuously spoken speech from a large vocabulary in real time. In the future, portable speech recognition systems will be made possible by advances in integrated circuit technology, by optimizing system architectures, and by exploiting the special features of personal communications systems. Types of speech recognizers are discussed. Current speech recognition systems are outlined. Personal communication systems with speech recognition are discussed.

PatentDOI
TL;DR: In this paper, a method and system are provided for alleviating the harmful effects of convolutional distortions of speech, such as the effect of a telecommunication channel, on the performance of an automatic speech recognizer (ASR).
Abstract: A method and system are provided for alleviating the harmful effects of convolutional distortions of speech, such as the effect of a telecommunication channel, on the performance of an automatic speech recognizer (ASR). The technique is based on the filtering of time trajectories of an auditory-like spectrum derived from the Perceptual Linear Predictive (PLP) method of speech parameter estimation.
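The key idea above is that a convolutional channel becomes a nearly constant additive offset in the log-spectral domain, so filtering each band's time trajectory removes it. A toy sketch; the one-pole high-pass filter and its coefficient are illustrative, not the patented filter:

```python
import math

def highpass_trajectory(traj, alpha=0.9):
    """y[t] = x[t] - x[t-1] + alpha*y[t-1]: suppresses the constant (slowly
    varying) component of one spectral band's log-energy trajectory."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in traj:
        y = x - prev_x + alpha * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out

# Same log-spectral trajectory, with and without a fixed channel offset.
clean = [math.sin(0.1 * t) for t in range(100)]
channel_offset = 3.0   # log-domain effect of a hypothetical telephone channel
distorted = [x + channel_offset for x in clean]
y_clean = highpass_trajectory(clean)
y_dist = highpass_trajectory(distorted)
print(abs(y_dist[-1] - y_clean[-1]))  # offset is filtered out over time
```

The filtered trajectories of the clean and channel-distorted signals converge, which is why features processed this way are less sensitive to the telecommunication channel.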

ReportDOI
22 May 1992
TL;DR: Standard tests reliably measure speech intelligibility and subjective acceptability tests can evaluate voice quality, but when the speech signal is severely degraded or highly processed, a more complete evaluation is needed: one that takes into account the many different sources of information that contribute to how we understand speech.
Abstract: : The evaluation of speech intelligibility and acceptability is an important aspect of the use, development, and selection of voice communication devices-telephone systems, digital voice systems, speech synthesis by rule, speech in noise, and the effects of noise stripping. Standard test procedures can provide highly reliable measures of speech intelligibility, and subjective acceptability tests can be used to evaluate voice quality. These tests are often highly correlated with other measures of communication performance and can be used to predict performance in many situations. However, when the speech signal is severely degraded or highly processed. a more complete evaluation of speech quality is needed-one that takes into account the many different sources of information that contribute to how we understand speech.

Proceedings ArticleDOI
11 Oct 1992
TL;DR: A novel voice compression method which provides significant improvement in transmission efficiency and flexibility for communications systems is described, and the basic scheme involves the use of split vector quantized transform coding in conjunction with pitch prediction to achieve excellent voice quality.
Abstract: The authors discuss a variable bit rate voice coding system in digital communications networks. A novel voice compression method which provides significant improvement in transmission efficiency and flexibility for communications systems is described. The basic scheme used for the investigations involves the use of split vector quantized (SVQ) transform coding (TC) in conjunction with pitch prediction (PP) to achieve excellent voice quality at rates of 4.8 kb/s and below. The authors describe the algorithm and its implementation for a variable bit rate voice coding system operating from 4.8 kb/s down to 2.4 kb/s.


Proceedings ArticleDOI
23 Mar 1992
TL;DR: Using the original method developed by Laforia, a series of text-independent speaker recognition experiments, characterized by long-term multivariate auto-regressive modeling, gives first-rate results without using more than one sentence.
Abstract: Two models, the temporal decomposition and the multivariate linear prediction, of the spectral evolution of speech signals capable of processing some aspects of the speech variability are presented. A series of acoustic-phonetic decoding experiments, characterized by the use of spectral targets of the temporal decomposition techniques and a speaker-dependent mode, gives good results compared to a reference system (i.e., 70% vs. 60% for the first choice). Using the original method developed by Laforia, a series of text-independent speaker recognition experiments, characterized by long-term multivariate auto-regressive modeling, gives first-rate results (i.e., 98.4% recognition rate for 420 speakers) without using more than one sentence. Taking into account the interpretation of the models, these results show how interesting the cinematic models are for obtaining a reduced variability of the speech signal representation.

PatentDOI
Motoaki Koyama
TL;DR: In this paper, a speech segment detector is used to detect speech segments and a reference pattern memory for storing reference patterns, and a speech recognition section for comparing the detected speech segment detected by the detector with the reference patterns stored in the Reference Pattern Memory and selecting the reference pattern most similar to that of the speech segment.
Abstract: A speech recognition LSI system comprises a speech segment detector for detecting a speech segment from an input speech signal, a reference pattern memory for storing reference patterns, and a speech recognition section for comparing the speech segment detected by the detector with the reference patterns stored in the reference pattern memory and selecting the reference pattern most similar to the speech segment. The system further comprises a recording/reproduction device for recording the speech signal and for reproducing only the speech segment the speech segment detector has detected, so that an operator can hear the speech segment.

Proceedings ArticleDOI
25 Jun 1992
TL;DR: Variable rate speech coding is a critical system component for achieving very high capacity in future generation multiple access systems for cellular networks and TDMA can also be designed to benefit from voice activity patterns.
Abstract: Variable rate speech coding is a critical system component for achieving very high capacity in future generation multiple access systems for cellular networks. A significant capacity gain comes from exploitation of the large fraction of the time during which a speaker is idle in a two-way conversation. Additional capacity gain can also be achieved by exploiting the time-varying entropy of active speech. While CDMA and packet-based multiple access systems, e.g. PRMA, are naturally suited for variable rate coding, TDMA can also be designed to benefit from voice activity patterns.
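The capacity argument above can be made concrete with illustrative numbers; the 40% activity factor and the bit rates below are assumptions for the sketch, not figures from the paper:

```python
# Average rate when a variable-rate coder tracks voice activity.
active_fraction = 0.4        # assumed fraction of a two-way call a speaker talks
active_rate_kbps = 8.0       # assumed full rate during talkspurts
idle_rate_kbps = 1.0         # assumed background/comfort-noise rate when idle

average_rate = (active_fraction * active_rate_kbps
                + (1.0 - active_fraction) * idle_rate_kbps)
capacity_gain = active_rate_kbps / average_rate
print(average_rate, round(capacity_gain, 2))  # 3.8 kb/s average, ~2.11x capacity
```

Under these assumptions the average rate falls from 8.0 to 3.8 kb/s, so a system dimensioned on average rate carries roughly twice as many conversations.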

PatentDOI
Benjamin K. Reaves
TL;DR: The device detects the beginning and ending portions of speech contained within an input signal based on the variance of frequency band limited energy within the signal.
Abstract: The device detects the beginning and ending portions of speech contained within an input signal based on the variance of frequency-band-limited energy within the signal. The use of the variance allows detection which is relatively independent of the absolute signal-to-noise ratio of the signal, and allows accurate detection within a wide variety of backgrounds such as music, motor noise, and other speakers. The device can be easily implemented using off-the-shelf hardware along with a high-speed special-purpose digital signal processor integrated circuit.
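The variance idea above can be sketched by tracking short-time energy and using its normalized variance (variance over squared mean), which is unchanged when the signal is scaled and hence independent of absolute level. The frame size and signals here are illustrative assumptions, and full-band energy stands in for the patent's band-limited energies:

```python
import math
import random

def energy_track(signal, frame=80):
    """Short-time energy per non-overlapping frame."""
    return [sum(x * x for x in signal[i:i + frame]) / frame
            for i in range(0, len(signal) - frame + 1, frame)]

def normalized_variance(xs):
    """Variance / mean^2: invariant to scaling the input signal."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return var / (m * m) if m else 0.0

random.seed(3)
fs = 8000
# Speech-like signal: a tone that switches on and off in bursts.
speech = [math.sin(2 * math.pi * 150 * i / fs) * (1.0 if (i // 800) % 2 == 0 else 0.05)
          for i in range(4000)]
steady_noise = [random.gauss(0.0, 1.0) for _ in range(4000)]

v_speech = normalized_variance(energy_track(speech))
v_noise = normalized_variance(energy_track(steady_noise))
print(v_speech > v_noise)  # bursty speech energy fluctuates far more
```

Because the statistic is scale-invariant, a single threshold on it separates bursty speech from steady backgrounds without knowing the absolute SNR.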

Journal ArticleDOI
TL;DR: These results, which resemble earlier findings obtained with orthographic visual input, indicate that the mapping from sight to sound is lexically mediated even when, as in the case of the articulatory-phonetic correspondence, the cross-modal relationship is non-arbitrary.
Abstract: In two experiments, we investigated whether simultaneous speech reading can influence the detection of speech in envelope-matched noise. Subjects attempted to detect the presence of a disyllabic utterance in noise while watching a speaker articulate a matching or a non-matching utterance. Speech detection was not facilitated by an audio-visual match, which suggests that listeners relied on low-level auditory cues whose perception was immune to cross-modal top-down influences. However, when the stimuli were words (Experiment 1), there was a (predicted) relative shift in bias, suggesting that the masking noise itself was perceived as more speechlike when its envelope corresponded to the visual information. This bias shift was absent, however, with non-word materials (Experiment 2). These results, which resemble earlier findings obtained with orthographic visual input, indicate that the mapping from sight to sound is lexically mediated even when, as in the case of the articulatory-phonetic correspondence, the cross-modal relationship is non-arbitrary.

Proceedings ArticleDOI
Brian Mak, J.-C. Junqua, B. Reaves
23 Mar 1992
TL;DR: A new algorithm is proposed that identifies islands of reliability (essentially the portion of speech contained between the first and last vowel) using time- and frequency-based features and then applies a noise adaptive procedure to refine the endpoints.
Abstract: The authors address the problem of automatic endpoint detection in normal and adverse conditions. Attention has been given to automatic endpoint detection for both additive noise and noise-induced changes in the talker's speech production (Lombard reflex). After a comparison of several automatic endpoint detection algorithms in different noisy-Lombard conditions, the authors propose a new algorithm. This algorithm identifies islands of reliability (essentially the portion of speech contained between the first and last vowel) using time- and frequency-based features and then applies a noise adaptive procedure to refine the endpoints. It is shown that this algorithm outperforms the commonly used algorithm developed by Lamel et al. (1981), and several other recently developed methods.
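The island-of-reliability idea above can be sketched on a frame-energy track: first find a high-confidence core, then grow the endpoints outward against a lower, noise-adaptive threshold. The thresholds and the toy energy profile are illustrative assumptions, not the paper's time- and frequency-based features:

```python
def detect_endpoints(energies, noise_frames=10, core_factor=8.0, edge_factor=2.0):
    """Return (start, end) frame indices, or None if no speech core is found."""
    noise_level = sum(energies[:noise_frames]) / noise_frames  # leading frames = noise
    core = [i for i, e in enumerate(energies) if e > core_factor * noise_level]
    if not core:
        return None
    start, end = min(core), max(core)
    # Grow outward with a lower threshold to recover weak onsets and offsets.
    while start > 0 and energies[start - 1] > edge_factor * noise_level:
        start -= 1
    while end < len(energies) - 1 and energies[end + 1] > edge_factor * noise_level:
        end += 1
    return start, end

# Toy track: noise floor, weak consonant onset, vowel core, weak offset, noise.
energies = [0.01] * 10 + [0.03] + [0.2] * 5 + [0.03] * 2 + [0.01] * 5
print(detect_endpoints(energies))  # → (10, 17): weak edges are recovered
```

Anchoring on the loud vowel core first is what keeps the weak low-threshold pass from triggering on isolated noise frames far from any speech.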


PatentDOI
TL;DR: In this article, a speech coding circuit with a speech coder and a power comparator is described, consisting of a PCM encoder for converting an analog input into a digital output and a speech coder with a voice activity detector that detects whether the analog input is voice active or non-active.
Abstract: A speech coding circuit is disclosed, which comprises a PCM encoder for converting an analog input into a digital output, and a speech coder with voice activity detector which encodes the digital output from the PCM encoder into speech coding data and detects whether the analog input is voice active or non-active, for each period, and then outputs a speech detection flag indicating whether the analog input is voice active or non-active. A power comparator compares the power of the analog input with a predetermined power threshold value and outputs a level detection flag indicating voice activity or non-activity, depending on whether the power of the analog input is greater or smaller than the power threshold value. A mode switch receives the level detection flag indicating voice activity or non-activity and applies to the PCM encoder and the speech coder a mode control signal which puts them into an activated mode or a sleep mode.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: Four voice packet reconstruction methods used for speech coded by code excited linear prediction (CELP)-type speech coders are described and their performance is discussed.
Abstract: Four voice packet reconstruction methods used for speech coded by code excited linear prediction (CELP)-type speech coders are described. In the first method, the authors generalize the waveform substitution technique originally developed for the PCM coded speech to the CELP speech coding. In the second method, a priority level is assigned to each speech frame to protect against those perceptually important and hard-to-reconstruct speech frames being lost. The third and fourth methods both split the information bits in a frame into two groups of different levels of importance. In method three, the bits for representing the filter parameters are given high priority and bits for representing the excitation signals are given low priority. Method four is an embedded coding technique based on two-stage CELP. The four methods were tested in combination with a simulated voice activity and queuing model and their performance is discussed.

PatentDOI
Heidi Hackbarth
TL;DR: Recognition of speech with successive expansion of a reference vocabulary can be used for automatic telephone dialing by voice input.
Abstract: Recognition of speech with successive expansion of a reference vocabulary can be used for automatic telephone dialing by voice input. Neural and conventional recognition methods are performed in parallel so that during training and configuration of the neural network, a conventional recognizer operating according to the dynamic programming principle has newly added word patterns available as references for immediate use in recognition. Upon completion of the training and configuration, the neural network takes over recognition of the now-expanded vocabulary.

Book
01 Jan 1992
TL;DR: The book surveys voice communications and speech processing, including quality evaluation of speech processing systems and the application of audio/speech recognition for military requirements.
Abstract: 1: Overview of Voice Communications and Speech Processing.- 2: The Speech Signal.- 3: Speech Coding.- 4: Voice Interactive Information Systems.- 5: Speech Recognition Based on Pattern Recognition Approaches.- 6: Quality Evaluation of Speech Processing Systems.- 7: Speech Processing Standards.- 8: Application of Audio/Speech Recognition for Military Requirements.- Selective Bibliography with Abstract.