
Showing papers on "Voice activity detection published in 1998"


Patent
19 Oct 1998
TL;DR: In this paper, a speech unit (2) is proposed that enables all devices (11) connected to the bus system (31) to be controlled by a single speech recognition device.
Abstract: For home networks, low-cost digital interfaces are introduced that integrate entertainment, communication and computing electronics into consumer multimedia. These are normally low-cost, easy-to-use systems, since they allow the user to add or remove any kind of network device while the bus is active. To improve the user interface, a speech unit (2) is proposed that enables all devices (11) connected to the bus system (31) to be controlled by a single speech recognition device. The properties of this device, e.g. the vocabulary, can be dynamically and actively extended by the consumer devices (11) connected to the bus system (31). The proposed technology is independent of a specific bus standard, e.g. the IEEE 1394 standard, and is well suited for all kinds of wired and wireless home networks. The speech unit (2) receives data and messages from the devices, recognizes speaker-dependent commands, and includes a speech synthesizer for synthesizing messages. A remotely controllable device (11) has access to a medium, which may be a CD-ROM, and may ask for a logical name or identifier.

301 citations


Journal ArticleDOI
TL;DR: Using the modulation spectrogram as a front end for ASR provides a significant improvement in performance on highly reverberant speech; when it is used in combination with log-RASTA-PLP, performance over a range of noisy and reverberant conditions is significantly improved, suggesting that the use of multiple representations is another promising method for improving the robustness of ASR systems.

279 citations


Journal ArticleDOI
TL;DR: A class of Kalman filter-based algorithms with some extensions, modifications, and improvements of previous work is presented, including an estimate-maximize (EM) method to iteratively estimate the spectral parameters of the speech and the noise.
Abstract: Speech quality and intelligibility might significantly deteriorate in the presence of background noise, especially when the speech signal is subject to subsequent processing. In particular, speech coders and automatic speech recognition (ASR) systems that were designed or trained to act on clean speech signals might be rendered useless in the presence of background noise. Speech enhancement algorithms have therefore attracted a great deal of interest. In this paper, we present a class of Kalman filter-based algorithms with some extensions, modifications, and improvements of previous work. The first algorithm employs the estimate-maximize (EM) method to iteratively estimate the spectral parameters of the speech and the noise. The enhanced speech signal is obtained as a byproduct of the parameter estimation algorithm. The second algorithm is a sequential, computationally efficient, gradient-descent algorithm. We discuss various topics concerning the practical implementation of these algorithms. An extensive experimental study using real speech and noise signals is provided to compare these algorithms with alternative speech enhancement algorithms, and to compare the performance of the iterative and sequential algorithms.
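
As a rough illustration of the filtering half of the first algorithm, the sketch below runs a Kalman filter over noisy samples with the clean speech modeled as an AR(p) process in companion state-space form. The AR coefficients and noise variances are assumed known here; in the paper they would be re-estimated iteratively by the EM loop, and the state-space construction is the standard textbook one rather than the authors' exact formulation.

```python
# Minimal Kalman-filter speech enhancer (illustrative): clean speech is an
# AR(p) process in companion state-space form, observed in additive white
# noise. AR coefficients `a`, driving-noise variance `q`, and observation-
# noise variance `r` are assumed known; the paper re-estimates them via EM.
import numpy as np

def kalman_enhance(y, a, q, r):
    p = len(a)
    F = np.zeros((p, p))            # companion-form transition matrix
    F[0, :] = a
    F[1:, :-1] = np.eye(p - 1)
    H = np.zeros((1, p))
    H[0, 0] = 1.0                   # we observe only the newest sample
    Q = np.zeros((p, p))
    Q[0, 0] = q                     # process noise drives the newest state
    x = np.zeros((p, 1))
    P = np.eye(p)
    out = np.empty(len(y))
    for t, yt in enumerate(y):
        x = F @ x                   # predict
        P = F @ P @ F.T + Q
        S = (H @ P @ H.T).item() + r
        K = P @ H.T / S             # Kalman gain
        x = x + K * (yt - (H @ x).item())
        P = (np.eye(p) - K @ H) @ P
        out[t] = x[0, 0]            # filtered estimate of the clean sample
    return out
```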

276 citations


PatentDOI
TL;DR: A system and method of operating an automatic speech recognition (ASR) service using a client-server architecture makes ASR services accessible at a client location remote from the location of the main ASR engine.
Abstract: A system and method of operating an automatic speech recognition service using a client-server architecture makes ASR services accessible at a client location remote from the location of the main ASR engine. The present invention utilizes client-server communications over a packet network, such as the Internet, where the ASR server receives a grammar from the client, receives information representing speech from the client, performs speech recognition, and returns information based upon the recognized speech to the client.
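
A minimal sketch of the client side of such an exchange is given below. The wire format (JSON lines over a TCP socket) and the field names are invented for illustration; the patent specifies only the overall flow, namely that a grammar and speech data go to the server and recognition results come back.

```python
# Hypothetical client for a grammar-constrained ASR server. The JSON-lines
# framing and field names ("grammar", "audio", "text") are invented; only
# the grammar -> speech -> result flow comes from the abstract.
import json
import socket

def recognize(host, port, grammar, audio_bytes):
    with socket.create_connection((host, port)) as s:
        # 1. Register the task grammar so the server constrains recognition.
        s.sendall(json.dumps({"type": "grammar", "body": grammar}).encode() + b"\n")
        # 2. Send the speech data (already captured/encoded by the client).
        s.sendall(json.dumps({"type": "audio", "length": len(audio_bytes)}).encode() + b"\n")
        s.sendall(audio_bytes)
        # 3. Read back the recognition result.
        reply = s.makefile("r").readline()
        return json.loads(reply)["text"]
```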

227 citations


PatentDOI
Hsiao-Wuen Hon, Dong Li, Xuedong Huang, Yun-Chen Ju, Xianghui Sean Zhang
TL;DR: A computer-implemented system and method of proofreading text includes receiving text from a user into a text editing module; at least a portion of the text is converted to an audio signal upon detection of an indicator, which defines a boundary in the text either by being embodied therein or by delays in receiving text.
Abstract: A computer implemented system and method of proofreading text in a computer system includes receiving text from a user into a text editing module. At least a portion of the text is converted to an audio signal upon the detection of an indicator, the indicator defining a boundary in the text by either being embodied therein or comprising delays in receiving text. The audio signal is played through a speaker to the user to provide feedback.

224 citations


01 Jan 1998
TL;DR: It is argued and demonstrated empirically that the articulatory feature approach can lead to greater robustness in adverse acoustic environments by enhancing the accuracy of the bottom-up acoustic modeling component in a speech recognition system.
Abstract: Current automatic speech recognition systems make use of a single source of information about their input, viz. a preprocessed form of the acoustic speech signal, which encodes the time-frequency distribution of signal energy. The goal of this thesis is to investigate the benefits of integrating articulatory information into state-of-the-art speech recognizers, either as a genuine alternative to standard acoustic representations, or as an additional source of information. Articulatory information is represented in terms of abstract articulatory classes or "features", which are extracted from the speech signal by means of statistical classifiers. A higher-level classifier then combines the scores for these features and maps them to standard subword unit probabilities. The main motivation for this approach is to improve the robustness of speech recognition systems in adverse acoustic environments, such as background noise. Typically, recognition systems show a sharp decline of performance under these conditions. We argue and demonstrate empirically that the articulatory feature approach can lead to greater robustness by enhancing the accuracy of the bottom-up acoustic modeling component in a speech recognition system. The second focus point of this thesis is to provide detailed analyses of the different types of information provided by the acoustic and the articulatory representations, respectively, and to develop strategies to optimally combine them. To this effect we investigate combination methods at the levels of feature extraction, subword unit probability estimation, and word recognition. The feasibility of this approach is demonstrated with respect to two different speech recognition tasks. The first of these is an American English corpus of telephone-bandwidth speech; the recognition domain is continuous numbers. The second is a German database of studio-quality speech consisting of spontaneous dialogues. In both cases recognition performance will be tested not only under clean acoustic conditions but also under deteriorated conditions.

221 citations


Proceedings Article
01 Jan 1998
TL;DR: This paper presents an entropy-based algorithm for accurate and robust endpoint detection for speech recognition under noisy environments that uses the spectral entropy to identify the speech segments accurately.
Abstract: This paper presents an entropy-based algorithm for accurate and robust endpoint detection for speech recognition under noisy environments. Instead of using the conventional energy-based features, the spectral entropy is developed to identify the speech segments accurately. Experimental results show that this algorithm outperforms the energy-based algorithms in both detection accuracy and recognition performance under noisy environments, with an average error rate reduction of more than 16%.
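
The core idea is easy to sketch: speech spectra are peaky, so their normalized spectral entropy is low, while broadband noise is spectrally flat and scores high. Below is a minimal illustration; the frame size, hop, and adaptive threshold are assumptions, not the paper's tuned values.

```python
# Spectral-entropy endpoint detection (illustrative parameters). Frames with
# low normalized spectral entropy are flagged as speech-like.
import numpy as np

def spectral_entropy(frame, eps=1e-12):
    spec = np.abs(np.fft.rfft(frame)) ** 2
    p = spec / (spec.sum() + eps)          # spectrum as a probability mass
    return -np.sum(p * np.log(p + eps))

def detect_speech(signal, fs, frame_ms=32, hop_ms=16, threshold=None):
    n = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    ent = np.array([spectral_entropy(signal[i:i + n])
                    for i in range(0, len(signal) - n, hop)])
    if threshold is None:                  # crude adaptive threshold (assumed)
        threshold = ent.mean() - 0.5 * ent.std()
    return ent < threshold                 # True where a frame looks like speech
```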

221 citations


Patent
06 Oct 1998
TL;DR: In this paper, a speech synthesizer generates speech which characterizes the structure and content of a web page retrieved over the network, and a grammar generator utilizes textual information parsed from the retrieved web page to produce a grammar.
Abstract: A platform for implementing interactive voice response (IVR) applications over the Internet or other type of network includes a speech synthesizer, a grammar generator and a speech recognizer. The speech synthesizer generates speech which characterizes the structure and content of a web page retrieved over the network. The speech is delivered to a user via a telephone or other type of audio interface device. The grammar generator utilizes textual information parsed from the retrieved web page to produce a grammar. The grammar is supplied to the speech recognizer and used to interpret voice commands and other speech input generated by the user. The platform may also include a voice processor which determines which of a number of predefined models best characterizes a given retrieved page, such that the process of generating an appropriate verbal description of the page is considerably simplified. The speech synthesizer, grammar generator, speech recognizer and other elements of the IVR platform may be operated by an Internet Service Provider (ISP), thereby allowing the general Internet population to create interactive voice response applications without acquiring their own IVR equipment.
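
One plausible reading of the grammar-generation step is that the link texts of a retrieved page become the phrases a caller may speak. The sketch below builds such a grammar with Python's standard HTML parser; the output format and the restriction to anchor text are assumptions for illustration, not the patent's actual procedure.

```python
# Hypothetical grammar generation from a web page: anchor texts become the
# phrases a caller may speak. The output is a made-up ABNF-like rule.
from html.parser import HTMLParser

class LinkTextParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_link = False
        self.phrases = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
            text = " ".join("".join(self._buf).split())
            if text:
                self.phrases.append(text.lower())

    def handle_data(self, data):
        if self.in_link:
            self._buf.append(data)

def page_grammar(html):
    parser = LinkTextParser()
    parser.feed(html)
    # One speakable alternative per link on the page.
    return "$command = " + " | ".join(parser.phrases) + " ;"
```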

217 citations


PatentDOI
TL;DR: This invention includes the steps of providing input to be acoustically produced, and comparing the input to training data or application-specific splice files to identify words or word sequences corresponding to the input for constructing a phone sequence.
Abstract: In accordance with the present invention, a method for providing generation of speech includes the steps of providing input to be acoustically produced; comparing the input to training data or application-specific splice files to identify words or word sequences corresponding to the input for constructing a phone sequence; using a search algorithm to identify a segment sequence to construct output speech according to the phone sequence; and concatenating segments and modifying characteristics of the segments to be substantially equal to requested characteristics. Application-specific data is advantageously used to make pertinent information available to synthesize both the phone sequence and the output speech. Also described is a system for performing operations in accordance with the disclosure.

207 citations


Patent
Shichiro Miyashita, Takashi Saito
16 Jun 1998
TL;DR: In this patent, the reception part 30 outputs the text and the header of the received electronic mail to the voice synthesizing part 32, and the voice font switching part 326 outputs the sender ID contained in the header to the voice font searching part 328.
Abstract: The reception part 30 outputs the text and the header of the received electronic mail to the voice synthesizing part 32. The voice font switching part 326 outputs the sender ID contained in the header to the voice font searching part 328. The voice font searching part 328 searches the voice font database part 330 for voice feature data whose user ID matches the sender ID, and sends it to the rhythm control part 322 and the voice generating part 324. The rhythm control part 322, the voice generating part 324 and the voice output part 38 read the content of the text in a voice characterized by the phonemes contained in the supplied voice feature data, and generate a voice signal characterized by the rhythm data contained in the header of the voice feature data for output.

182 citations


Patent
31 Mar 1998
TL;DR: Two different kinds of features in a speech signal are analyzed for classification purposes: one set of features is based on pitch information obtained from the speech signal, and the other set of features is based on changes in the spectral shape of the speech signal over time, which may indicate the emotional state of the speaker.
Abstract: The classification of speech according to emotional content employs acoustic measures in addition to pitch as classification input. In one embodiment, two different kinds of features in a speech signal are analyzed for classification purposes. One set of features is based on pitch information that is obtained from a speech signal, and the other set of features is based on changes in the spectral shape of the speech signal over time. This latter feature is used to distinguish long, smoothly varying sounds from quickly changing sounds, which may indicate the emotional state of the speaker. These changes are determined by means of a low-dimensional representation of the speech signal, such as MFCC or LPC. Additional features of the speech signal, such as energy, can also be employed for classification purposes. Different variations of pitch and spectral shape features can be measured and analyzed to assist in the classification of individual utterances. In one implementation, the features are measured individually for each of the first, middle and last thirds of an utterance, as well as for the utterance as a whole, to generate multiple sets of data for each utterance.
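
A minimal sketch of this utterance-partitioned feature scheme follows: pitch and frame-to-frame spectral-change statistics are computed for each third of the utterance and for the whole, assuming the utterance spans at least a few frames. The particular statistics, and the use of MFCC differences as the spectral-change measure, are illustrative choices; pitch_track and mfcc stand in for whatever front end is available.

```python
# Utterance-partitioned emotion features (illustrative): pitch statistics and
# frame-to-frame spectral change, each summarized over the first, middle and
# last thirds plus the whole utterance. pitch_track: 1-D array of pitch
# values; mfcc: (frames, coeffs) array from any MFCC front end.
import numpy as np

def segment_stats(track):
    return [track.mean(), track.std(), track.max() - track.min()]

def emotion_features(pitch_track, mfcc):
    # Frame-to-frame MFCC distance as a proxy for spectral-shape change rate.
    spec_change = np.linalg.norm(np.diff(mfcc, axis=0), axis=1)
    feats = []
    for track in (pitch_track, spec_change):
        for seg in (*np.array_split(track, 3), track):   # thirds, then whole
            feats.extend(segment_stats(seg))
    return np.array(feats)
```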

Patent
Dimitri Kanevsky, Wlodek Zadrozny
14 May 1998
TL;DR: An automatic dialog system capable of keeping a driver awake while driving during a long trip, or one that extends into the late evening, is presented; it carries on a conversation with the driver on various topics utilizing a natural dialog car system.
Abstract: An automatic dialog system capable of keeping a driver awake while driving during a long trip or one that extends into the late evening. The system carries on a conversation with the driver on various topics utilizing a natural dialog car system. The system includes an automatic speech recognition module, a speech generation module which includes speech synthesis or recorded speech, and possibly a dynamically combined speech synthesizer and recorded speech, and a natural language processing module. The natural dialog car system analyzes a driver's answer and the contents of the answer together with his voice patterns to determine if he is alert while driving. The system warns the driver or changes the topic of conversation if it determines that the driver is about to fall asleep. The system may also detect whether a driver is affected by alcohol or drugs.

Patent
28 Aug 1998
TL;DR: In this article, a method and apparatus for encoding speech for communication to a decoder for reproduction of the speech where the speech signal is classified into steady state voiced (harmonic), stationary unvoiced, and "transitory" or "transition" speech.
Abstract: A method and apparatus for encoding speech for communication to a decoder for reproduction of the speech, where the speech signal is classified into steady state voiced (harmonic), stationary unvoiced, and "transitory" or "transition" speech, and a particular type of coding scheme is used for each class. Harmonic coding is used for steady state voiced speech, "noise-like" coding is used for stationary unvoiced speech, and a special coding mode is used for transition speech, designed to capture the location, the structure, and the strength of the local time events that characterize the transition portions of the speech. The compression schemes can be applied to the speech signal or to the LP residual signal.

Patent
Geoffrey W. Peters
29 Dec 1998
TL;DR: In this article, a method and apparatus for using video input to control speech recognition systems is described, where gestures of a user of a speech recognition system are detected from a video input, and are used to turn a speech unit on and off.
Abstract: A method and apparatus for using video input to control speech recognition systems is disclosed. In one embodiment, gestures of a user of a speech recognition system are detected from a video input and are used to turn a speech recognition unit on and off. In another embodiment, the position of a user is detected from a video input, and the position information is supplied to a microphone array point-of-source filter to aid the filter in selecting the voice of a user who is moving about in the field of the camera supplying the video input.

Proceedings ArticleDOI
12 May 1998
TL;DR: An alternative multi-band method, feature recombination (FC), is proposed; experimental results show that the FC system can yield better performance than both the conventional ASR and the likelihood recombination (LC) strategy for noisy speech.
Abstract: This paper presents a new approach for multi-band based automatic speech recognition (ASR). Previous work by Bourlard et al. (see Proc. Int. Conf. on Spoken Language Processing, Philadelphia, p.426-9, 1996) and Hermansky et al. (see Proc. Int. Conf. on Spoken Language Processing, Philadelphia, p.1579-82, 1996) suggests that multi-band ASR gives a more accurate recognition, especially in noisy acoustic environments, by combining the likelihoods of different frequency bands. Here we evaluate this likelihood recombination (LC) approach to multi-band ASR, and propose an alternative method, namely feature recombination (FC). In the FC system, after different acoustic analyzers are applied to each sub-band individually, a vector is composed by combining the sub-band features. The speech classifier then calculates the likelihood from the single vector. Thus, band-limited noise affects only a few of the feature components, as in the multi-band LC system, but, at the same time, all feature components are jointly modeled, as in conventional ASR. The experimental results show that the FC system can yield better performance than both the conventional ASR and the LC strategy for noisy speech.
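
The contrast between the two recombination strategies can be stated compactly, as below: FC concatenates per-band features into one vector for a single classifier, while LC scores each band separately and merges log-likelihoods. The band analyzer and classifier objects are placeholders, not the authors' actual models.

```python
# Feature recombination (FC) vs. likelihood recombination (LC), schematically.
# band_features maps a sub-band signal to a feature vector; classifier objects
# are assumed to expose a log_likelihood method. All are placeholders.
import numpy as np

def fc_score(subband_signals, band_features, classifier):
    # One joint vector: band-limited noise corrupts only some components,
    # yet all components are modeled together by a single classifier.
    x = np.concatenate([band_features(s) for s in subband_signals])
    return classifier.log_likelihood(x)

def lc_score(subband_signals, band_features, band_classifiers, weights):
    # Independent per-band scores, merged as a weighted sum of log-likelihoods.
    return sum(w * c.log_likelihood(band_features(s))
               for s, c, w in zip(subband_signals, band_classifiers, weights))
```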

Journal ArticleDOI
TL;DR: This paper presents a voice detection algorithm which is robust to noisy environments, thanks to a new methodology adopted for the matching process, based on a pattern recognition approach in which the matching phase is performed by a set of six fuzzy rules, trained by means of a new hybrid learning tool.
Abstract: Discontinuous transmission based on speech/pause detection represents a valid solution to improve the spectral efficiency of new generation wireless communication systems. In this context, robust voice activity detection (VAD) algorithms are required, as traditional solutions present a high misclassification rate in the presence of the background noise typical of mobile environments. This paper presents a voice detection algorithm which is robust to noisy environments, thanks to a new methodology adopted for the matching process. More specifically, the VAD proposed is based on a pattern recognition approach in which the matching phase is performed by a set of six fuzzy rules, trained by means of a new hybrid learning tool. A series of objective tests performed on a large speech database, varying the signal-to-noise ratio (SNR), the types of background noise, and the input signal level, showed that, as compared with the VAD standardized by ITU-T in Recommendation G.729 annex B, the fuzzy VAD, on average, achieves an improvement in reduction both of the activity factor of about 25% and of the clipping introduced of about 43%. Informal listening tests also confirm an improvement in the perceived speech quality.
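
As a toy illustration of rule-based fuzzy matching of this general kind, the sketch below aggregates the firing strengths of a few rules over named input features into a single activity decision. The membership shapes, rule format, and defuzzification are generic choices; the six actual rules in the paper are learned by the hybrid training tool, not hand-written like this.

```python
# Toy fuzzy-rule VAD matcher. Each rule pairs membership functions over named
# features with a crisp consequent (1 = speech, 0 = pause); firing strengths
# are combined by weighted-average defuzzification. Rules are invented.
import numpy as np

def trap(x, a, b, c, d):
    """Trapezoidal membership function on [a, d] with plateau [b, c]."""
    return float(np.clip(min((x - a) / (b - a + 1e-9),
                             (d - x) / (d - c + 1e-9)), 0.0, 1.0))

def fuzzy_vad(features, rules, threshold=0.5):
    num = den = 0.0
    for antecedents, consequent in rules:
        # Rule firing strength: AND of memberships (min operator).
        strength = min(mf(features[name]) for name, mf in antecedents.items())
        num += strength * consequent
        den += strength
    return (num / (den + 1e-9)) > threshold

# Example (hypothetical): high energy and low spectral flatness imply speech.
rules = [({"energy": lambda e: trap(e, 0.2, 0.4, 1.0, 1.2)}, 1),
         ({"flatness": lambda f: trap(f, 0.6, 0.8, 1.0, 1.2)}, 0)]
```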

Patent
03 Nov 1998
TL;DR: A messaging system is described that includes a first server for receiving speech input by a user, a speech recognition system for converting the speech to text, a speech synthesizer for playing back the synthesized speech for correction by the user, and a correction mechanism for enabling the user to correct the speech such that the corrected speech is provided as text for transmittal over a communication system.
Abstract: A messaging system for receiving speech over a telephone and converting the speech to text includes a first server for receiving speech input by a user, a speech recognition system for converting the speech to text, a speech synthesizer for converting the text to speech and playing back the synthesized speech for correction by the user, and a correction mechanism for enabling the user to correct the speech such that the corrected speech is provided as text for transmittal over a communication system.

Patent
21 Apr 1998
TL;DR: A method is presented for dynamic adjustment of audio prompts and speech prompts by switching from a foreground state to a background state of a speech interface in response to a user's current interaction modality, by selecting alternative states for speech and audio interfaces that represent users' needs for speech prompts.
Abstract: Management of speech and audio prompts, and interface presence, in multimodal user interfaces is provided. A communications device having a multimodal user interface including a speech interface and a non-speech interface, e.g. a graphical or tactile user interface, comprises means for dynamically switching between a background state and a foreground state of the speech interface in accordance with a user's input modality choice. Preferably, in the foreground state speech prompts and speech-based error recovery are fully implemented, and in the background state speech prompts are replaced by earcons and no speech-based error recovery is implemented. Thus there is provided a device which automatically subdues the speech prompts when a user selects a non-speech input/output mechanism. Also provided is a method for dynamic adjustment of audio prompts and speech prompts by switching from a foreground state to a background state of a speech interface in response to a user's current interaction modality, by selecting alternative states for speech and audio interfaces that represent users' needs for speech prompts. This type of system and method is particularly useful for and applicable to handheld Internet access communication devices.

Patent
Jennifer Lai, John Vergo
23 Nov 1998
TL;DR: In this article, a speech recognition computer system and method indicate the level of confidence that a speech recognizer has in it recognition of one or more displayed words, and a plurality of confidence levels of individual recognized words may be visually indicated.
Abstract: A speech recognition computer system and method indicates the level of confidence that a speech recognizer has in it recognition of one or more displayed words. The system and method allow for the rapid identification of speech recognition errors. A plurality of confidence levels of individual recognized words may be visually indicated. Additionally, the system and method allow the user of the system to select threshold levels to determine when the visual indication occurs.

Patent
22 Sep 1998
TL;DR: In this paper, a method and apparatus are provided for improving the performance of an interactive speech application, in which the application stores, in a log, event information that describes each task carried out by the Interactive Speech application in response to interaction with the one or more callers.
Abstract: A method and apparatus are provided for improving the performance of an interactive speech application. The interactive speech application is developed and deployed for use by one or more callers. During execution, the interactive speech application stores, in a log, event information that describes each task carried out by the interactive speech application in response to interaction with the one or more callers. The application also stores one or more sets of audio information, in which each of the sets of audio information is associated with one or more utterances by one of the callers. Each of the sets of audio information is associated with one of the tasks represented in the log. After the log is established, an analytical report is displayed. The report describes selective actions taken by the interactive speech application while executing, and selective actions taken by one or more callers while interacting with the interactive speech application. Information in the analytical report is selected so as to identify one or more potential performance problems in the interactive speech application. While the analytical report is displayed, when the analytical report reaches a point at which the audio information was previously recorded and stored, the audio information may be replayed and analyzed. The interactive speech application is modified based on the analytical report. Accordingly, the interactive speech application may be improved based upon its actual performance, and its actual performance may be evaluated in detail based on specific call events and caller responses to application actions.

Patent
31 Mar 1998
TL;DR: In this article, a speech sample is received and speech recognition is performed on the speech sample to produce recognition results, and the recognition results are evaluated in view of the training data and the identification of the speech elements to which the portions of training data are related.
Abstract: A speech sample is evaluated using a computer. Training data that include samples of speech are received and stored along with identification of speech elements to which portions of the training data are related. A speech sample is received and speech recognition is performed on the speech sample to produce recognition results. Finally, the recognition results are evaluated in view of the training data and the identification of the speech elements to which the portions of the training data are related. The technique may be used to perform tasks such as speech recognition, speaker identification, and language identification.

Proceedings ArticleDOI
12 May 1998
TL;DR: The proposed compression algorithm uses a combination of simple techniques, such as linear prediction and multi-stage vector quantization, and the current version of the algorithm encodes the acoustic features at a fixed rate of 4.0 kbit/s.
Abstract: In this paper, we describe a new compression algorithm for encoding acoustic features used in typical speech recognition systems. The proposed algorithm uses a combination of simple techniques, such as linear prediction and multi-stage vector quantization, and the current version of the algorithm encodes the acoustic features at a fixed rate of 4.0 kbit/s. The compression algorithm can be used very effectively for speech recognition in network environments, such as those employing a client-server model, or to reduce storage in general speech recognition applications. The algorithm has also been tuned for practical implementations, so that the computational complexity and memory requirements are modest. We have successfully tested the compression algorithm against many test sets from several different languages, and the algorithm performed very well, with no significant change in the recognition accuracy due to compression.
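
The two named ingredients are straightforward to sketch: each feature vector is predicted from the previous quantized vector, and the prediction residual is coded by multi-stage vector quantization, where each stage quantizes what the previous stage missed. The scalar prediction coefficient and the codebooks below are invented placeholders, not the tuned 4.0 kbit/s configuration.

```python
# Inter-frame linear prediction plus multi-stage VQ (illustrative). Each
# stage's codebook quantizes the residual left by the previous stage; the
# predictor coefficient `a` and the codebooks are placeholders.
import numpy as np

def msvq_encode(residual, codebooks):
    indices, r = [], residual.copy()
    for cb in codebooks:                       # cb: (K, dim) array
        i = int(np.argmin(np.sum((cb - r) ** 2, axis=1)))
        indices.append(i)
        r = r - cb[i]                          # pass on what this stage missed
    return indices

def encode_features(frames, codebooks, a=0.9):
    prev_hat = np.zeros(frames.shape[1])       # previous *quantized* vector
    stream = []
    for x in frames:                           # frames: (T, dim) features
        residual = x - a * prev_hat            # predict from the last frame
        idx = msvq_encode(residual, codebooks)
        # Track the decoder's reconstruction so both sides stay in sync.
        prev_hat = a * prev_hat + sum(cb[i] for cb, i in zip(codebooks, idx))
        stream.append(idx)
    return stream
```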

Patent
30 Jun 1998
TL;DR: A method is described for performing a voice recognition function on a voice telephone conversation to convert the conversation into text data using a voice processing system, allowing entire days, weeks, or even months of conversation to be stored and accessed.
Abstract: Voice data requires large storage resources even when compressed and takes a long time to retrieve. Further, the required information cannot normally be directly located, and it is difficult to analyze the voice data for statistical information. A method is described for performing a voice recognition function on a voice telephone conversation to convert the conversation into text data using a voice processing system. The method comprises receiving voice data representing the telephone conversation, comprising a first series of speech data from an agent interspersed with a second series of speech data from a client, and storing the first and second series of speech data as a single body of voice data for later retrieval. A voice recognition function is then performed on the voice data to convert it into text data representing the telephone conversation, and the text data is stored for later retrieval. Such a solution allows entire days, weeks, or even months of conversation to be stored and accessed. Since the memory space required for text storage is considerably smaller, it is possible to keep many days of conversation in directly accessible memory that may be searched by a computer. Furthermore, it is possible to search for keywords typed from the keyboard, and it is not necessary to manually scan the entire conversation for the desired topic.

Proceedings ArticleDOI
Subhro Das, D. Nix, M. Picheny
12 May 1998
TL;DR: Comparative studies are described demonstrating the performance gain realized by adapting to children's acoustic and language model data to construct a children's speech recognition system.
Abstract: There are several reasons why conventional speech recognition systems modeled on adult data fail to perform satisfactorily on children's speech input. For instance, children's vocal characteristics differ significantly from those of adults. In addition, their choices of vocabulary and sentence construction modalities usually do not conform to adult patterns. We describe comparative studies demonstrating the performance gain realized by adapting to children's acoustic and language model data to construct a children's speech recognition system.

Patent
TL;DR: Speaker-dependent and speaker-independent speech recognition in a voice-controlled multi-station network are discussed; a fallback procedure is maintained for any particular station in order to cater for failure of the speaker-dependent recognition, whilst allowing reverting to the improvement procedure.
Abstract: A voice-controlled multi-station network has both speaker-dependent and speaker-independent speech recognition. Conditionally on recognizing items of an applicable vocabulary, the network executes a particular function. The method receives a call from a particular origin and executes speaker-independent speech recognition on the call. In an improvement procedure, in case of successful determination of what has been said, a template associated with the recognized speech items is stored and assigned to the origin. Next, speaker-dependent recognition is applied, if feasible, to speech received from the same origin, using one or more templates associated with that station. Further, a fallback procedure to speaker-independent recognition is maintained for any particular station in order to cater for failure of the speaker-dependent recognition, whilst allowing reverting to the improvement procedure.
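
The improvement/fallback control flow described here can be sketched as follows; the recognizer objects, their confidence test, and make_template are hypothetical, since the abstract does not specify how templates are derived or matched.

```python
# Control-flow sketch of the improvement/fallback scheme. The recognizer
# objects, their .confident flag, and make_template are all placeholders.
def recognize_call(origin, speech, sd, si, templates):
    if origin in templates:                    # speaker-dependent attempt
        result = sd.recognize(speech, templates[origin])
        if result.confident:
            return result.text
    result = si.recognize(speech)              # speaker-independent fallback
    if result.confident:
        # Improvement step: store a template for this origin for next time.
        templates.setdefault(origin, []).append(
            sd.make_template(speech, result.text))
        return result.text
    return None                                # neither recognizer was sure
```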

Patent
Kenneth Jong
TL;DR: In this article, a speech-text-transmit communication over data networks includes speech recognition devices and text to speech conversion devices that translate speech signals input to the terminal into text and text data received from a data network into speech output signals.
Abstract: An apparatus and method for speech-text-transmit communication over data networks includes speech recognition devices and text to speech conversion devices that translate speech signals input to the terminal into text and text data received from a data network into speech output signals. The speech input signals are translated into text based on phonemes obtained from a spectral analysis of the speech input signals. The text data is transmitted to a receiving party over the data network as a plurality of text data packets such that a continuous stream of text data is obtained. The receiving party's terminal receives the text data and may immediately display the text data and/or translate it into speech output signals using the text to speech conversion device. The text to speech conversion device uses speech pattern data stored in a speech pattern database for synthesizing a human voice for playing of the speech output signals using a speech output device.

Journal ArticleDOI
TL;DR: The research areas covered are speech analysis and synthesis, speech coding, speech enhancement, speech recognition, spoken language understanding, speaker identification and verification, and multimodal communication.
Abstract: This article provides a succinct review of speech research, in particular its history, current trends, and prospects for the future. The research areas covered are speech analysis and synthesis, speech coding, speech enhancement, speech recognition, spoken language understanding, speaker identification and verification, and multimodal communication.

Proceedings ArticleDOI
S.A. Ramprashad
12 May 1998
TL;DR: A two-stage hybrid embedded speech/audio coding structure is proposed, which uses a speech coder as a core to provide the minimal bitrate and acceptable performance on speech inputs, plus a transform coder using a modified discrete cosine transform and perceptual coding principles.
Abstract: A two-stage hybrid embedded speech/audio coding structure is proposed. The structure uses a speech coder as a core to provide the minimal bitrate and an acceptable performance on speech inputs. The second stage is a transform coder using a modified discrete cosine transform (MDCT) and perceptual coding principles. This stage is itself embedded both in complexity and bitrate, and provides various levels of enhancement of the core output, particularly for general audio signals like music. Informal A-B comparison tests show that the performance of the structure at 16 kb/s is between that of the GSM enhanced full rate coder at 12.2 kb/s and the G.728 LD-CELP coder at 16 kb/s.
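
A sketch of the embedded two-stage idea: the core codec produces the base layer, and an MDCT of the residual between the input and the core reconstruction supplies enhancement information. The direct-form MDCT, sine window, frame length, and uniform quantizer below are generic illustrations, not the coder's actual perceptual bit allocation.

```python
# Two-stage embedded coding, schematically: core codec output forms the base
# layer; an MDCT of the residual supplies enhancement coefficients. Assumes
# len(frame) >= 2*N. core_codec with encode/decode is a placeholder.
import numpy as np

def mdct(x, N):
    """MDCT of one 2N-sample block (direct form, for illustration only)."""
    n = np.arange(2 * N)
    k = np.arange(N)
    window = np.sin(np.pi * (n + 0.5) / (2 * N))
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (window * x)

def enhance_layer(frame, core_codec, N=128, step=0.05):
    base = core_codec.decode(core_codec.encode(frame))   # core (first) stage
    residual = frame[:2 * N] - base[:2 * N]              # what the core missed
    coeffs = mdct(residual, N)
    return np.round(coeffs / step).astype(int)           # crude uniform quantizer
```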

Book
01 Jan 1998
TL;DR: This work presents perceptually inspired signal-processing strategies for robust speech recognition in reverberant environments.
Abstract: Perceptually Inspired Signal-processing Strategies for Robust Speech Recognition in Reverberant Environments

Patent
31 Jul 1998
TL;DR: The rapidly available speech recognition results are used to provide intelligent barge-in for voice-response systems, and to count words and output sub-sequences to provide parallelization and/or pipelining of tasks related to the entire word sequence, increasing processing throughput.
Abstract: Speech recognition technology has attained maturity such that the most likely speech recognition result has been reached and is available before an energy-based termination of speech has been made. The present invention innovatively uses the rapidly available speech recognition results to provide intelligent barge-in for voice-response systems, and to count words and output sub-sequences to provide parallelization and/or pipelining of tasks related to the entire word sequence, increasing processing throughput.