
Showing papers on "Voice activity detection published in 1998"


Patent
19 Oct 1998
TL;DR: In this paper, a speech unit (2) is proposed that enables all devices (11) connected to the bus system (31) to be controlled by a single speech recognition device.
Abstract: For home networks, low-cost digital interfaces are introduced that integrate entertainment, communication and computing electronics into consumer multimedia. These are normally low-cost, easy-to-use systems, since they allow the user to add or remove any kind of network device while the bus is active. To improve the user interface, a speech unit (2) is proposed that enables all devices (11) connected to the bus system (31) to be controlled by a single speech recognition device. The properties of this device, e.g. the vocabulary, can be dynamically and actively extended by the consumer devices (11) connected to the bus system (31). The proposed technology is independent of a specific bus standard, e.g. the IEEE 1394 standard, and is well suited for all kinds of wired and wireless home networks. The speech unit (2) receives data and messages from the devices, recognizes speaker-dependent commands, and includes a speech synthesizer for synthesizing messages. A remotely controllable device (11) has access to a medium, which may be a CD-ROM, and may ask for a logical name or identifier.

301 citations


Journal ArticleDOI
TL;DR: Using the modulation spectrogram as a front end for ASR provides a significant improvement in performance on highly reverberant speech; when it is used in combination with log-RASTA-PLP, performance over a range of noisy and reverberant conditions is significantly improved, suggesting that the use of multiple representations is another promising method for improving the robustness of ASR systems.

279 citations


Journal ArticleDOI
TL;DR: A class of Kalman filter-based algorithms with some extensions, modifications, and improvements of previous work is presented, including an estimate-maximize (EM) method to iteratively estimate the spectral parameters of the speech and the noise.
Abstract: Speech quality and intelligibility might significantly deteriorate in the presence of background noise, especially when the speech signal is subject to subsequent processing. In particular, speech coders and automatic speech recognition (ASR) systems that were designed or trained to act on clean speech signals might be rendered useless in the presence of background noise. Speech enhancement algorithms have therefore attracted a great deal of interest. In this paper, we present a class of Kalman filter-based algorithms with some extensions, modifications, and improvements of previous work. The first algorithm employs the estimate-maximize (EM) method to iteratively estimate the spectral parameters of the speech and the noise. The enhanced speech signal is obtained as a byproduct of the parameter estimation algorithm. The second algorithm is a sequential, computationally efficient, gradient-descent algorithm. We discuss various topics concerning the practical implementation of these algorithms. An extensive experimental study using real speech and noise signals is provided to compare these algorithms with alternative speech enhancement algorithms, and to compare the performance of the iterative and sequential algorithms.
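
As a rough illustration of the filtering half of the first algorithm, the sketch below runs a Kalman filter over noisy samples with the clean speech modeled as an AR(p) process in companion state-space form. The AR coefficients and noise variances are assumed known here; in the paper they would be re-estimated iteratively by the EM loop, and the state-space construction is the standard textbook one rather than the authors' exact formulation.

```python
# Minimal Kalman-filter speech enhancer (illustrative): clean speech is an
# AR(p) process in companion state-space form, observed in additive white
# noise. AR coefficients `a`, driving-noise variance `q`, and observation-
# noise variance `r` are assumed known; the paper re-estimates them via EM.
import numpy as np

def kalman_enhance(y, a, q, r):
    p = len(a)
    F = np.zeros((p, p))            # companion-form transition matrix
    F[0, :] = a
    F[1:, :-1] = np.eye(p - 1)
    H = np.zeros((1, p))
    H[0, 0] = 1.0                   # we observe only the newest sample
    Q = np.zeros((p, p))
    Q[0, 0] = q                     # process noise drives the newest state
    x = np.zeros((p, 1))
    P = np.eye(p)
    out = np.empty(len(y))
    for t, yt in enumerate(y):
        x = F @ x                   # predict
        P = F @ P @ F.T + Q
        S = (H @ P @ H.T).item() + r
        K = P @ H.T / S             # Kalman gain
        x = x + K * (yt - (H @ x).item())
        P = (np.eye(p) - K @ H) @ P
        out[t] = x[0, 0]            # filtered estimate of the clean sample
    return out
```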

276 citations


PatentDOI
TL;DR: A system and method of operating an automatic speech recognition (ASR) service using a client-server architecture makes ASR services accessible at a client location remote from the location of the main ASR engine.
Abstract: A system and method of operating an automatic speech recognition service using a client-server architecture makes ASR services accessible at a client location remote from the location of the main ASR engine. The present invention utilizes client-server communications over a packet network, such as the Internet, where the ASR server receives a grammar from the client, receives information representing speech from the client, performs speech recognition, and returns information based upon the recognized speech to the client.
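
A minimal sketch of the client side of such an exchange is given below. The wire format (JSON lines over a TCP socket) and the field names are invented for illustration; the patent specifies only the overall flow, namely that a grammar and speech data go to the server and recognition results come back.

```python
# Hypothetical client for a grammar-constrained ASR server. The JSON-lines
# framing and field names ("grammar", "audio", "text") are invented; only
# the grammar -> speech -> result flow comes from the abstract.
import json
import socket

def recognize(host, port, grammar, audio_bytes):
    with socket.create_connection((host, port)) as s:
        # 1. Register the task grammar so the server constrains recognition.
        s.sendall(json.dumps({"type": "grammar", "body": grammar}).encode() + b"\n")
        # 2. Send the speech data (already captured/encoded by the client).
        s.sendall(json.dumps({"type": "audio", "length": len(audio_bytes)}).encode() + b"\n")
        s.sendall(audio_bytes)
        # 3. Read back the recognition result.
        reply = s.makefile("r").readline()
        return json.loads(reply)["text"]
```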

227 citations


PatentDOI
Hsiao-Wuen Hon, Dong Li, Xuedong Huang, Yun-Chen Ju, Xianghui Sean Zhang
TL;DR: A computer-implemented system and method of proofreading text includes receiving text from a user into a text editing module; at least a portion of the text is converted to an audio signal upon detection of an indicator, which defines a boundary in the text either by being embodied therein or by delays in receiving text.
Abstract: A computer implemented system and method of proofreading text in a computer system includes receiving text from a user into a text editing module. At least a portion of the text is converted to an audio signal upon the detection of an indicator, the indicator defining a boundary in the text by either being embodied therein or comprising delays in receiving text. The audio signal is played through a speaker to the user to provide feedback.

224 citations


01 Jan 1998
TL;DR: It is argued and demonstrated empirically that the articulatory feature approach can lead to greater robustness in adverse acoustic environments by enhancing the accuracy of the bottom-up acoustic modeling component in a speech recognition system.
Abstract: Current automatic speech recognition systems make use of a single source of information about their input, viz. a preprocessed form of the acoustic speech signal, which encodes the time-frequency distribution of signal energy. The goal of this thesis is to investigate the benefits of integrating articulatory information into state-of-the-art speech recognizers, either as a genuine alternative to standard acoustic representations, or as an additional source of information. Articulatory information is represented in terms of abstract articulatory classes or "features", which are extracted from the speech signal by means of statistical classifiers. A higher-level classifier then combines the scores for these features and maps them to standard subword unit probabilities. The main motivation for this approach is to improve the robustness of speech recognition systems in adverse acoustic environments, such as background noise. Typically, recognition systems show a sharp decline of performance under these conditions. We argue and demonstrate empirically that the articulatory feature approach can lead to greater robustness by enhancing the accuracy of the bottom-up acoustic modeling component in a speech recognition system. The second focus point of this thesis is to provide detailed analyses of the different types of information provided by the acoustic and the articulatory representations, respectively, and to develop strategies to optimally combine them. To this effect we investigate combination methods at the levels of feature extraction, subword unit probability estimation, and word recognition. The feasibility of this approach is demonstrated with respect to two different speech recognition tasks. The first of these is an American English corpus of telephone-bandwidth speech; the recognition domain is continuous numbers. The second is a German database of studio-quality speech consisting of spontaneous dialogues. In both cases recognition performance will be tested not only under clean acoustic conditions but also under deteriorated conditions.

221 citations


Proceedings Article
01 Jan 1998
TL;DR: This paper presents an entropy-based algorithm for accurate and robust endpoint detection for speech recognition under noisy environments that uses the spectral entropy to identify the speech segments accurately.
Abstract: This paper presents an entropy-based algorithm for accurate and robust endpoint detection for speech recognition under noisy environments. Instead of using the conventional energy-based features, the spectral entropy is developed to identify the speech segments accurately. Experimental results show that this algorithm outperforms the energy-based algorithms in both detection accuracy and recognition performance under noisy environments, with an average error rate reduction of more than 16%.
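
The core idea is easy to sketch: speech spectra are peaky, so their normalized spectral entropy is low, while broadband noise is spectrally flat and scores high. Below is a minimal illustration; the frame size, hop, and adaptive threshold are assumptions, not the paper's tuned values.

```python
# Spectral-entropy endpoint detection (illustrative parameters). Frames with
# low normalized spectral entropy are flagged as speech-like.
import numpy as np

def spectral_entropy(frame, eps=1e-12):
    spec = np.abs(np.fft.rfft(frame)) ** 2
    p = spec / (spec.sum() + eps)          # spectrum as a probability mass
    return -np.sum(p * np.log(p + eps))

def detect_speech(signal, fs, frame_ms=32, hop_ms=16, threshold=None):
    n = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    ent = np.array([spectral_entropy(signal[i:i + n])
                    for i in range(0, len(signal) - n, hop)])
    if threshold is None:                  # crude adaptive threshold (assumed)
        threshold = ent.mean() - 0.5 * ent.std()
    return ent < threshold                 # True where a frame looks like speech
```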

221 citations


Patent
06 Oct 1998
TL;DR: In this paper, a speech synthesizer generates speech which characterizes the structure and content of a web page retrieved over the network, and a grammar generator utilizes textual information parsed from the retrieved web page to produce a grammar.
Abstract: A platform for implementing interactive voice response (IVR) applications over the Internet or other type of network includes a speech synthesizer, a grammar generator and a speech recognizer. The speech synthesizer generates speech which characterizes the structure and content of a web page retrieved over the network. The speech is delivered to a user via a telephone or other type of audio interface device. The grammar generator utilizes textual information parsed from the retrieved web page to produce a grammar. The grammar is supplied to the speech recognizer and used to interpret voice commands and other speech input generated by the user. The platform may also include a voice processor which determines which of a number of predefined models best characterizes a given retrieved page, such that the process of generating an appropriate verbal description of the page is considerably simplified. The speech synthesizer, grammar generator, speech recognizer and other elements of the IVR platform may be operated by an Internet Service Provider (ISP), thereby allowing the general Internet population to create interactive voice response applications without acquiring their own IVR equipment.
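
One plausible reading of the grammar-generation step is that the link texts of a retrieved page become the phrases a caller may speak. The sketch below builds such a grammar with Python's standard HTML parser; the output format and the restriction to anchor text are assumptions for illustration, not the patent's actual procedure.

```python
# Hypothetical grammar generation from a web page: anchor texts become the
# phrases a caller may speak. The output is a made-up ABNF-like rule.
from html.parser import HTMLParser

class LinkTextParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_link = False
        self.phrases = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
            text = " ".join("".join(self._buf).split())
            if text:
                self.phrases.append(text.lower())

    def handle_data(self, data):
        if self.in_link:
            self._buf.append(data)

def page_grammar(html):
    parser = LinkTextParser()
    parser.feed(html)
    # One speakable alternative per link on the page.
    return "$command = " + " | ".join(parser.phrases) + " ;"
```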

217 citations


PatentDOI
TL;DR: This invention includes the steps of providing input to be acoustically produced, and comparing the input to training data or application-specific splice files to identify words or word sequences corresponding to the input for constructing a phone sequence.
Abstract: In accordance with the present invention, a method for providing generation of speech includes the steps of providing input to be acoustically produced; comparing the input to training data or application-specific splice files to identify words or word sequences corresponding to the input for constructing a phone sequence; using a search algorithm to identify a segment sequence to construct output speech according to the phone sequence; and concatenating segments and modifying characteristics of the segments to be substantially equal to requested characteristics. Application-specific data is advantageously used to make pertinent information available to synthesize both the phone sequence and the output speech. Also described is a system for performing operations in accordance with the disclosure.

207 citations


Patent
Shichiro Miyashita, Takashi Saito
16 Jun 1998
TL;DR: In this patent, the reception part 30 outputs the text and the header of the received electronic mail to the voice synthesizing part 32, and the voice font switching part 326 outputs the sender ID contained in the header to the voice font searching part 328.
Abstract: The reception part 30 outputs the text and the header of the received electronic mail to the voice synthesizing part 32. The voice font switching part 326 outputs the sender ID contained in the header to the voice font searching part 328. The voice font searching part 328 searches the voice font database part 330 for voice feature data whose user ID matches the sender ID, and sends it to the rhythm control part 322 and the voice generating part 324. The rhythm control part 322, the voice generating part 324 and the voice output part 38 read the content of the text in a voice characterized by the phonemes contained in the supplied voice feature data, and generate a voice signal characterized by the rhythm data contained in the header of the voice feature data for output.

182 citations


Patent
31 Mar 1998
TL;DR: Two different kinds of features in a speech signal are analyzed for classification purposes: one set of features is based on pitch information obtained from the speech signal, and the other set of features is based on changes in the spectral shape of the speech signal over time, which may indicate the emotional state of the speaker.
Abstract: The classification of speech according to emotional content employs acoustic measures in addition to pitch as classification input. In one embodiment, two different kinds of features in a speech signal are analyzed for classification purposes. One set of features is based on pitch information that is obtained from a speech signal, and the other set of features is based on changes in the spectral shape of the speech signal over time. This latter feature is used to distinguish long, smoothly varying sounds from quickly changing sounds, which may indicate the emotional state of the speaker. These changes are determined by means of a low-dimensional representation of the speech signal, such as MFCC or LPC. Additional features of the speech signal, such as energy, can also be employed for classification purposes. Different variations of pitch and spectral shape features can be measured and analyzed to assist in the classification of individual utterances. In one implementation, the features are measured individually for each of the first, middle and last thirds of an utterance, as well as for the utterance as a whole, to generate multiple sets of data for each utterance.
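
A minimal sketch of this utterance-partitioned feature scheme follows: pitch and frame-to-frame spectral-change statistics are computed for each third of the utterance and for the whole, assuming the utterance spans at least a few frames. The particular statistics, and the use of MFCC differences as the spectral-change measure, are illustrative choices; pitch_track and mfcc stand in for whatever front end is available.

```python
# Utterance-partitioned emotion features (illustrative): pitch statistics and
# frame-to-frame spectral change, each summarized over the first, middle and
# last thirds plus the whole utterance. pitch_track: 1-D array of pitch
# values; mfcc: (frames, coeffs) array from any MFCC front end.
import numpy as np

def segment_stats(track):
    return [track.mean(), track.std(), track.max() - track.min()]

def emotion_features(pitch_track, mfcc):
    # Frame-to-frame MFCC distance as a proxy for spectral-shape change rate.
    spec_change = np.linalg.norm(np.diff(mfcc, axis=0), axis=1)
    feats = []
    for track in (pitch_track, spec_change):
        for seg in (*np.array_split(track, 3), track):   # thirds, then whole
            feats.extend(segment_stats(seg))
    return np.array(feats)
```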

Patent
Dimitri Kanevsky, Wlodek Zadrozny
14 May 1998
TL;DR: An automatic dialog system capable of keeping a driver awake while driving during a long trip, or one that extends into the late evening, is presented; it carries on a conversation with the driver on various topics utilizing a natural dialog car system.
Abstract: An automatic dialog system capable of keeping a driver awake while driving during a long trip or one that extends into the late evening. The system carries on a conversation with the driver on various topics utilizing a natural dialog car system. The system includes an automatic speech recognition module, a speech generation module which includes speech synthesis or recorded speech, and possibly a dynamically combined speech synthesizer and recorded speech, and a natural language processing module. The natural dialog car system analyzes a driver's answer and the contents of the answer together with his voice patterns to determine if he is alert while driving. The system warns the driver or changes the topic of conversation if it determines that the driver is about to fall asleep. The system may also detect whether a driver is affected by alcohol or drugs.

Patent
28 Aug 1998
TL;DR: In this article, a method and apparatus for encoding speech for communication to a decoder for reproduction of the speech where the speech signal is classified into steady state voiced (harmonic), stationary unvoiced, and "transitory" or "transition" speech.
Abstract: A method and apparatus for encoding speech for communication to a decoder for reproduction of the speech, where the speech signal is classified into steady state voiced (harmonic), stationary unvoiced, and "transitory" or "transition" speech, and a particular type of coding scheme is used for each class. Harmonic coding is used for steady state voiced speech, "noise-like" coding is used for stationary unvoiced speech, and a special coding mode is used for transition speech, designed to capture the location, the structure, and the strength of the local time events that characterize the transition portions of the speech. The compression schemes can be applied to the speech signal or to the LP residual signal.

Patent
Geoffrey W. Peters
29 Dec 1998
TL;DR: In this article, a method and apparatus for using video input to control speech recognition systems is described, where gestures of a user of a speech recognition system are detected from a video input, and are used to turn a speech unit on and off.
Abstract: A method and apparatus for using video input to control speech recognition systems is disclosed. In one embodiment, gestures of a user of a speech recognition system are detected from a video input and are used to turn a speech recognition unit on and off. In another embodiment, the position of a user is detected from a video input, and the position information is supplied to a microphone array point-of-source filter to aid the filter in selecting the voice of a user who is moving about in the field of the camera supplying the video input.

Proceedings ArticleDOI
12 May 1998
TL;DR: An alternative multi-band method, feature recombination (FC), is proposed; experimental results show that the FC system can yield better performance than both the conventional ASR and the likelihood recombination (LC) strategy for noisy speech.
Abstract: This paper presents a new approach for multi-band based automatic speech recognition (ASR). Previous work by Bourlard et al. (see Proc. Int. Conf. on Spoken Language Processing, Philadelphia, p.426-9, 1996) and Hermansky et al. (see Proc. Int. Conf. on Spoken Language Processing, Philadelphia, p.1579-82, 1996) suggests that multi-band ASR gives a more accurate recognition, especially in noisy acoustic environments, by combining the likelihoods of different frequency bands. Here we evaluate this likelihood recombination (LC) approach to multi-band ASR, and propose an alternative method, namely feature recombination (FC). In the FC system, after different acoustic analyzers are applied to each sub-band individually, a vector is composed by combining the sub-band features. The speech classifier then calculates the likelihood from the single vector. Thus, band-limited noise affects only a few of the feature components, as in the multi-band LC system, but, at the same time, all feature components are jointly modeled, as in conventional ASR. The experimental results show that the FC system can yield better performance than both the conventional ASR and the LC strategy for noisy speech.
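
The contrast between the two recombination strategies can be stated compactly, as below: FC concatenates per-band features into one vector for a single classifier, while LC scores each band separately and merges log-likelihoods. The band analyzer and classifier objects are placeholders, not the authors' actual models.

```python
# Feature recombination (FC) vs. likelihood recombination (LC), schematically.
# band_features maps a sub-band signal to a feature vector; classifier objects
# are assumed to expose a log_likelihood method. All are placeholders.
import numpy as np

def fc_score(subband_signals, band_features, classifier):
    # One joint vector: band-limited noise corrupts only some components,
    # yet all components are modeled together by a single classifier.
    x = np.concatenate([band_features(s) for s in subband_signals])
    return classifier.log_likelihood(x)

def lc_score(subband_signals, band_features, band_classifiers, weights):
    # Independent per-band scores, merged as a weighted sum of log-likelihoods.
    return sum(w * c.log_likelihood(band_features(s))
               for s, c, w in zip(subband_signals, band_classifiers, weights))
```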

Journal ArticleDOI
TL;DR: This paper presents a voice detection algorithm which is robust to noisy environments, thanks to a new methodology adopted for the matching process, based on a pattern recognition approach in which the matching phase is performed by a set of six fuzzy rules, trained by means of a new hybrid learning tool.
Abstract: Discontinuous transmission based on speech/pause detection represents a valid solution to improve the spectral efficiency of new generation wireless communication systems. In this context, robust voice activity detection (VAD) algorithms are required, as traditional solutions present a high misclassification rate in the presence of the background noise typical of mobile environments. This paper presents a voice detection algorithm which is robust to noisy environments, thanks to a new methodology adopted for the matching process. More specifically, the VAD proposed is based on a pattern recognition approach in which the matching phase is performed by a set of six fuzzy rules, trained by means of a new hybrid learning tool. A series of objective tests performed on a large speech database, varying the signal-to-noise ratio (SNR), the types of background noise, and the input signal level, showed that, as compared with the VAD standardized by ITU-T in Recommendation G.729 annex B, the fuzzy VAD, on average, achieves an improvement in reduction both of the activity factor of about 25% and of the clipping introduced of about 43%. Informal listening tests also confirm an improvement in the perceived speech quality.
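
As a toy illustration of rule-based fuzzy matching of this general kind, the sketch below aggregates the firing strengths of a few rules over named input features into a single activity decision. The membership shapes, rule format, and defuzzification are generic choices; the six actual rules in the paper are learned by the hybrid training tool, not hand-written like this.

```python
# Toy fuzzy-rule VAD matcher. Each rule pairs membership functions over named
# features with a crisp consequent (1 = speech, 0 = pause); firing strengths
# are combined by weighted-average defuzzification. Rules are invented.
import numpy as np

def trap(x, a, b, c, d):
    """Trapezoidal membership function on [a, d] with plateau [b, c]."""
    return float(np.clip(min((x - a) / (b - a + 1e-9),
                             (d - x) / (d - c + 1e-9)), 0.0, 1.0))

def fuzzy_vad(features, rules, threshold=0.5):
    num = den = 0.0
    for antecedents, consequent in rules:
        # Rule firing strength: AND of memberships (min operator).
        strength = min(mf(features[name]) for name, mf in antecedents.items())
        num += strength * consequent
        den += strength
    return (num / (den + 1e-9)) > threshold

# Example (hypothetical): high energy and low spectral flatness imply speech.
rules = [({"energy": lambda e: trap(e, 0.2, 0.4, 1.0, 1.2)}, 1),
         ({"flatness": lambda f: trap(f, 0.6, 0.8, 1.0, 1.2)}, 0)]
```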

Patent
03 Nov 1998
TL;DR: A messaging system is described that includes a first server for receiving speech input by a user, a speech recognition system for converting the speech to text, a speech synthesizer for playing back the synthesized speech for correction by the user, and a correction mechanism for enabling the user to correct the speech such that the corrected speech is provided as text for transmittal over a communication system.
Abstract: A messaging system for receiving speech over a telephone and converting the speech to text includes a first server for receiving speech input by a user, a speech recognition system for converting the speech to text, a speech synthesizer for converting the text to speech and playing back the synthesized speech for correction by the user, and a correction mechanism for enabling the user to correct the speech such that the corrected speech is provided as text for transmittal over a communication system.

Patent
21 Apr 1998
TL;DR: A method is presented for dynamic adjustment of audio prompts and speech prompts by switching from a foreground state to a background state of a speech interface in response to a user's current interaction modality, by selecting alternative states for speech and audio interfaces that represent users' needs for speech prompts.
Abstract: Management of speech and audio prompts, and interface presence, in multimodal user interfaces is provided. A communications device having a multimodal user interface including a speech interface and a non-speech interface, e.g. a graphical or tactile user interface, comprises means for dynamically switching between a background state and a foreground state of the speech interface in accordance with a user's input modality choice. Preferably, in the foreground state speech prompts and speech-based error recovery are fully implemented, and in the background state speech prompts are replaced by earcons and no speech-based error recovery is implemented. Thus there is provided a device which automatically subdues the speech prompts when a user selects a non-speech input/output mechanism. Also provided is a method for dynamic adjustment of audio prompts and speech prompts by switching from a foreground state to a background state of a speech interface in response to a user's current interaction modality, by selecting alternative states for speech and audio interfaces that represent users' needs for speech prompts. This type of system and method is particularly useful for and applicable to handheld Internet access communication devices.

Patent
Jennifer Lai, John Vergo
23 Nov 1998
TL;DR: In this article, a speech recognition computer system and method indicate the level of confidence that a speech recognizer has in it recognition of one or more displayed words, and a plurality of confidence levels of individual recognized words may be visually indicated.
Abstract: A speech recognition computer system and method indicates the level of confidence that a speech recognizer has in it recognition of one or more displayed words. The system and method allow for the rapid identification of speech recognition errors. A plurality of confidence levels of individual recognized words may be visually indicated. Additionally, the system and method allow the user of the system to select threshold levels to determine when the visual indication occurs.

Patent
22 Sep 1998
TL;DR: In this paper, a method and apparatus are provided for improving the performance of an interactive speech application, in which the application stores, in a log, event information that describes each task carried out by the Interactive Speech application in response to interaction with the one or more callers.
Abstract: A method and apparatus are provided for improving the performance of an interactive speech application. The interactive speech application is developed and deployed for use by one or more callers. During execution, the interactive speech application stores, in a log, event information that describes each task carried out by the interactive speech application in response to interaction with the one or more callers. The application also stores one or more sets of audio information, in which each of the sets of audio information is associated with one or more utterances by one of the callers. Each of the sets of audio information is associated with one of the tasks represented in the log. After the log is established, an analytical report is displayed. The report describes selective actions taken by the interactive speech application while executing, and selective actions taken by one or more callers while interacting with the interactive speech application. Information in the analytical report is selected so as to identify one or more potential performance problems in the interactive speech application. While the analytical report is displayed, when the analytical report reaches a point at which the audio information was previously recorded and stored, the audio information may be replayed and analyzed. The interactive speech application is modified based on the analytical report. Accordingly, the interactive speech application may be improved based upon its actual performance, and its actual performance may be evaluated in detail based on specific call events and caller responses to application actions.

Patent
31 Mar 1998
TL;DR: In this article, a speech sample is received and speech recognition is performed on the speech sample to produce recognition results, and the recognition results are evaluated in view of the training data and the identification of the speech elements to which the portions of training data are related.
Abstract: A speech sample is evaluated using a computer. Training data that include samples of speech are received and stored along with identification of speech elements to which portions of the training data are related. A speech sample is received and speech recognition is performed on the speech sample to produce recognition results. Finally, the recognition results are evaluated in view of the training data and the identification of the speech elements to which the portions of the training data are related. The technique may be used to perform tasks such as speech recognition, speaker identification, and language identification.

Proceedings ArticleDOI
12 May 1998
TL;DR: The proposed compression algorithm uses a combination of simple techniques, such as linear prediction and multi-stage vector quantization, and the current version of the algorithm encodes the acoustic features at a fixed rate of 4.0 kbit/s.
Abstract: In this paper, we describe a new compression algorithm for encoding acoustic features used in typical speech recognition systems. The proposed algorithm uses a combination of simple techniques, such as linear prediction and multi-stage vector quantization, and the current version of the algorithm encodes the acoustic features at a fixed rate of 4.0 kbit/s. The compression algorithm can be used very effectively for speech recognition in network environments, such as those employing a client-server model, or to reduce storage in general speech recognition applications. The algorithm has also been tuned for practical implementations, so that the computational complexity and memory requirements are modest. We have successfully tested the compression algorithm against many test sets from several different languages, and the algorithm performed very well, with no significant change in the recognition accuracy due to compression.
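
The two named ingredients are straightforward to sketch: each feature vector is predicted from the previous quantized vector, and the prediction residual is coded by multi-stage vector quantization, where each stage quantizes what the previous stage missed. The scalar prediction coefficient and the codebooks below are invented placeholders, not the tuned 4.0 kbit/s configuration.

```python
# Inter-frame linear prediction plus multi-stage VQ (illustrative). Each
# stage's codebook quantizes the residual left by the previous stage; the
# predictor coefficient `a` and the codebooks are placeholders.
import numpy as np

def msvq_encode(residual, codebooks):
    indices, r = [], residual.copy()
    for cb in codebooks:                       # cb: (K, dim) array
        i = int(np.argmin(np.sum((cb - r) ** 2, axis=1)))
        indices.append(i)
        r = r - cb[i]                          # pass on what this stage missed
    return indices

def encode_features(frames, codebooks, a=0.9):
    prev_hat = np.zeros(frames.shape[1])       # previous *quantized* vector
    stream = []
    for x in frames:                           # frames: (T, dim) features
        residual = x - a * prev_hat            # predict from the last frame
        idx = msvq_encode(residual, codebooks)
        # Track the decoder's reconstruction so both sides stay in sync.
        prev_hat = a * prev_hat + sum(cb[i] for cb, i in zip(codebooks, idx))
        stream.append(idx)
    return stream
```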

Patent
30 Jun 1998
TL;DR: A method is described for performing a voice recognition function on a voice telephone conversation to convert the conversation into text data using a voice processing system, allowing entire days, weeks, or even months of conversation to be stored and accessed.
Abstract: Voice data requires large storage resources even when compressed and takes a long time to retrieve. Further, the required information cannot normally be directly located, and it is difficult to analyze the voice data for statistical information. A method is described for performing a voice recognition function on a voice telephone conversation to convert the conversation into text data using a voice processing system. The method comprises receiving voice data representing the telephone conversation, comprising a first series of speech data from an agent interspersed with a second series of speech data from a client, and storing the first and second series of speech data as a single body of voice data for later retrieval. A voice recognition function is then performed on the voice data to convert it into text data representing the telephone conversation, and the text data is stored for later retrieval. Such a solution allows entire days, weeks, or even months of conversation to be stored and accessed. Since the memory space required for text storage is considerably smaller, it is possible to keep many days of conversation in directly accessible memory that may be searched by a computer. Furthermore, it is possible to search for keywords typed from the keyboard, and it is not necessary to manually scan the entire conversation for the desired topic.

Proceedings ArticleDOI
Subhro Das, D. Nix, M. Picheny
12 May 1998
TL;DR: Comparative studies are described demonstrating the performance gain realized by adapting to children's acoustic and language model data to construct a children's speech recognition system.
Abstract: There are several reasons why conventional speech recognition systems modeled on adult data fail to perform satisfactorily on children's speech input. For instance, children's vocal characteristics differ significantly from those of adults. In addition, their choices of vocabulary and sentence construction modalities usually do not conform to adult patterns. We describe comparative studies demonstrating the performance gain realized by adapting to children's acoustic and language model data to construct a children's speech recognition system.

Patent
TL;DR: Speaker-dependent and speaker-independent speech recognition in a voice-controlled multi-station network are discussed; a fallback procedure is maintained for any particular station in order to cater for failure of the speaker-dependent recognition, whilst allowing reverting to the improvement procedure.
Abstract: A voice-controlled multi-station network has both speaker-dependent and speaker-independent speech recognition. Conditionally on recognizing items of an applicable vocabulary, the network executes a particular function. The method receives a call from a particular origin and executes speaker-independent speech recognition on the call. In an improvement procedure, in case of successful determination of what has been said, a template associated with the recognized speech items is stored and assigned to the origin. Next, speaker-dependent recognition is applied, if feasible, to speech received from the same origin, using one or more templates associated with that station. Further, a fallback procedure to speaker-independent recognition is maintained for any particular station in order to cater for failure of the speaker-dependent recognition, whilst allowing reverting to the improvement procedure.
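
The improvement/fallback control flow described here can be sketched as follows; the recognizer objects, their confidence test, and make_template are hypothetical, since the abstract does not specify how templates are derived or matched.

```python
# Control-flow sketch of the improvement/fallback scheme. The recognizer
# objects, their .confident flag, and make_template are all placeholders.
def recognize_call(origin, speech, sd, si, templates):
    if origin in templates:                    # speaker-dependent attempt
        result = sd.recognize(speech, templates[origin])
        if result.confident:
            return result.text
    result = si.recognize(speech)              # speaker-independent fallback
    if result.confident:
        # Improvement step: store a template for this origin for next time.
        templates.setdefault(origin, []).append(
            sd.make_template(speech, result.text))
        return result.text
    return None                                # neither recognizer was sure
```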

Patent
Kenneth Jong
TL;DR: In this article, a speech-text-transmit communication over data networks includes speech recognition devices and text to speech conversion devices that translate speech signals input to the terminal into text and text data received from a data network into speech output signals.
Abstract: An apparatus and method for speech-text-transmit communication over data networks includes speech recognition devices and text to speech conversion devices that translate speech signals input to the terminal into text and text data received from a data network into speech output signals. The speech input signals are translated into text based on phonemes obtained from a spectral analysis of the speech input signals. The text data is transmitted to a receiving party over the data network as a plurality of text data packets such that a continuous stream of text data is obtained. The receiving party's terminal receives the text data and may immediately display the text data and/or translate it into speech output signals using the text to speech conversion device. The text to speech conversion device uses speech pattern data stored in a speech pattern database for synthesizing a human voice for playing of the speech output signals using a speech output device.

Journal ArticleDOI
TL;DR: The research areas covered are speech analysis and synthesis, speech coding, speech enhancement, speech recognition, spoken language understanding, speaker identification and verification, and multimodal communication.
Abstract: This article provides a succinct review of speech research, in particular its history, current trends, and prospects for the future. The research areas covered are speech analysis and synthesis, speech coding, speech enhancement, speech recognition, spoken language understanding, speaker identification and verification, and multimodal communication.

Proceedings ArticleDOI
S.A. Ramprashad
12 May 1998
TL;DR: A two-stage hybrid embedded speech/audio coding structure is proposed, which uses a speech coder as a core to provide the minimal bitrate and acceptable performance on speech inputs, plus a transform coder using a modified discrete cosine transform and perceptual coding principles.
Abstract: A two-stage hybrid embedded speech/audio coding structure is proposed. The structure uses a speech coder as a core to provide the minimal bitrate and an acceptable performance on speech inputs. The second stage is a transform coder using a modified discrete cosine transform (MDCT) and perceptual coding principles. This stage is itself embedded both in complexity and bitrate, and provides various levels of enhancement of the core output, particularly for general audio signals like music. Informal A-B comparison tests show that the performance of the structure at 16 kb/s is between that of the GSM enhanced full rate coder at 12.2 kb/s and the G.728 LD-CELP coder at 16 kb/s.
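
A sketch of the embedded two-stage idea: the core codec produces the base layer, and an MDCT of the residual between the input and the core reconstruction supplies enhancement information. The direct-form MDCT, sine window, frame length, and uniform quantizer below are generic illustrations, not the coder's actual perceptual bit allocation.

```python
# Two-stage embedded coding, schematically: core codec output forms the base
# layer; an MDCT of the residual supplies enhancement coefficients. Assumes
# len(frame) >= 2*N. core_codec with encode/decode is a placeholder.
import numpy as np

def mdct(x, N):
    """MDCT of one 2N-sample block (direct form, for illustration only)."""
    n = np.arange(2 * N)
    k = np.arange(N)
    window = np.sin(np.pi * (n + 0.5) / (2 * N))
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (window * x)

def enhance_layer(frame, core_codec, N=128, step=0.05):
    base = core_codec.decode(core_codec.encode(frame))   # core (first) stage
    residual = frame[:2 * N] - base[:2 * N]              # what the core missed
    coeffs = mdct(residual, N)
    return np.round(coeffs / step).astype(int)           # crude uniform quantizer
```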

Book
01 Jan 1998
TL;DR: This work presents perceptually inspired signal-processing strategies for robust speech recognition in reverberant environments.
Abstract: Perceptually Inspired Signal-processing Strategies for Robust Speech Recognition in Reverberant Environments

Patent
31 Jul 1998
TL;DR: The rapidly available speech recognition results are used to provide intelligent barge-in for voice-response systems, and to count words and output sub-sequences to provide parallelization and/or pipelining of tasks related to the entire word sequence, increasing processing throughput.
Abstract: Speech recognition technology has attained maturity such that the most likely speech recognition result has been reached and is available before an energy-based termination of speech has been made. The present invention innovatively uses the rapidly available speech recognition results to provide intelligent barge-in for voice-response systems, and to count words and output sub-sequences to provide parallelization and/or pipelining of tasks related to the entire word sequence, increasing processing throughput.