Institution
Nuance Communications
Company • Vienna, Austria
About: Nuance Communications is a company based in Vienna, Austria. It is known for research contributions in the topics: Speech processing & Voice activity detection. The organization has 1518 authors who have published 1701 publications receiving 54891 citations. The organization is also known as: ScanSoft & ScanSoft Inc.
Papers published on a yearly basis
Papers
31 Mar 2009
TL;DR: In this paper, a method for detecting barge-in in a speech dialogue system is presented. The method determines whether a speech prompt is being output by the dialogue system, and detects whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or on speaker information.
Abstract: A method for detecting barge-in in a speech dialogue system comprising determining whether a speech prompt is output by the speech dialogue system, and detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or based on speaker information, where the sensitivity threshold is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined. If speech activity is detected in the input signal, the speech prompt may be interrupted or faded out. A speech dialogue system configured to detect barge-in is also disclosed.
20 citations
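The prompt-dependent sensitivity threshold described in the abstract can be sketched in a few lines. This is a minimal illustration of the idea, not the patented implementation; the class name, threshold values, and the notion of a scalar activity score are all invented for the example.

```python
class BargeInDetector:
    """Sketch of barge-in detection with a time-varying sensitivity
    threshold: the threshold is raised while a prompt is playing and
    lowered otherwise (all constants are illustrative)."""

    def __init__(self, base_threshold=0.3, prompt_offset=0.2):
        self.base_threshold = base_threshold
        self.prompt_offset = prompt_offset

    def threshold(self, prompt_active):
        # Increase sensitivity threshold during prompt output, so that
        # prompt echo is less likely to be mistaken for user speech.
        if prompt_active:
            return self.base_threshold + self.prompt_offset
        return self.base_threshold

    def detect(self, activity_score, prompt_active):
        # Speech activity is flagged when the score exceeds the
        # current (prompt-dependent) threshold.
        return activity_score > self.threshold(prompt_active)
```

With these example constants, a borderline activity score of 0.4 is ignored while the prompt plays but counts as barge-in once the prompt ends, which is exactly the asymmetry the method aims for.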
22 May 2011
TL;DR: The subspace based Gaussian mixture model (SGMM) is shown to provide an 18% reduction in word error rate (WER) for speaker independent ASR relative to the continuous density HMM (CDHMM) in the resource management CSR domain.
Abstract: This paper investigates the impact of subspace based techniques for acoustic modeling in automatic speech recognition (ASR). There are many well known approaches to subspace based speaker adaptation which represent sources of variability as a projection within a low dimensional subspace. A new approach to acoustic modeling in ASR, referred to as the subspace based Gaussian mixture model (SGMM), represents phonetic variability as a set of projections applied at the state level in a hidden Markov model (HMM) based acoustic model. The impact of the SGMM in modeling these intrinsic sources of variability is evaluated for a continuous speech recognition (CSR) task. The SGMM is shown to provide an 18% reduction in word error rate (WER) for speaker independent (SI) ASR relative to the continuous density HMM (CDHMM) in the resource management CSR domain. The SI performance obtained from SGMM also represents a 5% reduction in WER relative to subspace based speaker adaptation in an unsupervised speaker adaptation scenario.
20 citations
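The relative WER reductions quoted in this abstract follow the standard formula, which is worth making explicit since relative and absolute reductions are often confused. The 10.0% baseline figure below is invented purely for illustration and is not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative word error rate reduction, as a percentage of the
    baseline WER (not an absolute difference in percentage points)."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# An 18% relative reduction means e.g. a (hypothetical) baseline WER
# of 10.0% dropping to 8.2%, not dropping by 18 percentage points.
```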
26 May 2015
TL;DR: In this article, the authors propose a method for reducing latency in speech recognition applications. The method comprises receiving first audio comprising speech from a user of a computing device, detecting an end of speech in the first audio, and generating an ASR result based, at least in part, on a portion of the first audio prior to the detected end of speech.
Abstract: Methods and apparatus for reducing latency in speech recognition applications. The method comprises receiving first audio comprising speech from a user of a computing device, detecting an end of speech in the first audio, generating an ASR result based, at least in part, on a portion of the first audio prior to the detected end of speech, determining whether a valid action can be performed by a speech-enabled application installed on the computing device using the ASR result, and processing second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the ASR result.
20 citations
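The control flow in this abstract (recognize early, act if the result is usable, otherwise keep listening) can be sketched as follows. `recognize` and `can_perform_action` stand in for the ASR engine and the speech-enabled application's validity check; both callables, and the string-based audio stand-in, are hypothetical.

```python
def handle_audio(first_audio, second_audio, recognize, can_perform_action):
    """Sketch of the latency-reduction flow: produce an ASR result
    from audio up to the detected end of speech; if the application
    can act on it, return immediately, otherwise fall back to
    processing the additional audio as well."""
    result = recognize(first_audio)
    if can_perform_action(result):
        # Early result is actionable: no need to wait for more audio.
        return result
    # No valid action yet: process the second audio segment too.
    return recognize(first_audio + second_audio)
```

The latency win comes from the first branch: when the early result already supports a valid action, the system responds without waiting for trailing audio.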
30 Aug 2011
TL;DR: In this article, a user may issue a search query, and the search engine or engines to which that query is provided may be determined dynamically based on any of a variety of factors.
Abstract: Some embodiments relate to techniques for performing a search for content, in which a user may issue a search query, and the search engine or engines to which that query is provided may be determined dynamically based on any of a variety of factors. For example, in some embodiments, the search engine or engines to which the query is provided may be determined based on the content of the search query, and/or auxiliary information such as the user's location, demographics, query history and/or browsing history.
20 citations
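The dynamic engine selection described above amounts to a routing function over the query content and auxiliary signals. A toy sketch, in which the engine names and keyword rules are entirely invented for illustration:

```python
def select_engines(query, location=None):
    """Toy dynamic search-engine selection: route a query to one or
    more engines based on its content; auxiliary signals such as the
    user's location could refine the choice further."""
    q = query.lower()
    engines = []
    if any(word in q for word in ("restaurant", "cafe", "shop")):
        # Content suggests local intent; location (if given) would
        # be passed along to scope the local search.
        engines.append("local_search")
    if any(word in q for word in ("weather", "forecast")):
        engines.append("weather_service")
    if not engines:
        # No specialized match: fall back to general web search.
        engines.append("web_search")
    return engines
```

A fuller version would also weigh demographics, query history, and browsing history, as the abstract lists; the point here is only that the engine set is computed per query rather than fixed.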
24 Mar 2006
TL;DR: In this article, a caption correction system for providing real-time captions to a presentation or the like is proposed, in which the voice recognition result is judged manually, on the basis of the processed voice, when processing time permits.
Abstract: Manual provision of real-time captions for a presentation or the like is too costly to be widely adopted, while an automatic voice recognition apparatus alone cannot achieve a high recognition rate and is prone to incorrect transcription; the aim is therefore an inexpensive correction apparatus. The proposed caption correction apparatus obtains character strings and a confidence score from the voice recognition result. A time monitor tracks elapsed time and judges, from the confidence score and timing status, whether processing is delayed. When processing is not delayed, a human checker is asked for a manual judgment: the voice is processed, and the recognition result is judged against the processed voice. When processing is delayed, an automatic judgment is made from the confidence score alone. If the recognition result is judged valid, the character strings are displayed as confirmed text. If it is judged invalid, the result is corrected automatically by matching against subsequent recognition candidates, the text and attributes of the presentation, the text of a script, and so on, and the automatically corrected character strings are displayed as provisional text.
20 citations
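The routing logic in this abstract (manual judgment when time permits, automatic judgment under delay) reduces to a small decision function. The confidence threshold and the return labels below are illustrative, not taken from the patent.

```python
def route_judgment(confidence, delayed, threshold=0.8):
    """Decide how a recognition result is judged: a human checker
    when processing is on time; otherwise an automatic decision
    based only on the recognizer's confidence score."""
    if not delayed:
        return "manual"        # time permits: ask the human checker
    if confidence >= threshold:
        return "accept"        # confident enough to display as-is
    return "auto_correct"      # match against candidates/script text
```

The design trades accuracy for timeliness: the human checker is only consulted when captions can still be delivered without falling behind the speaker.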
Authors
Showing all 1521 results
Name | H-index | Papers | Citations |
---|---|---|---|
Vinayak P. Dravid | 103 | 817 | 43612 |
Mehryar Mohri | 75 | 320 | 22868 |
Jinsong Wu | 70 | 566 | 16282 |
Horacio D. Espinosa | 67 | 315 | 16270 |
Shumin Zhai | 67 | 200 | 13447 |
Shang-Hua Teng | 66 | 265 | 16647 |
Dimitri Kanevsky | 62 | 362 | 14072 |
Marilyn A. Walker | 62 | 309 | 13429 |
Tara N. Sainath | 61 | 274 | 25183 |
Kenneth Church | 61 | 295 | 21179 |
John B Ketterson | 60 | 814 | 16929 |
Pascal Frossard | 59 | 637 | 22749 |
Michael Picheny | 57 | 244 | 11759 |
G. R. Scott Budinger | 56 | 196 | 12063 |
Jun Wu | 53 | 359 | 12110 |