Institution

Nuance Communications

Company · Vienna, Austria
About: Nuance Communications is a company based in Vienna, Austria. It is known for its research contributions in the topics of Speech processing and Voice activity detection. The organization has 1518 authors who have published 1701 publications receiving 54891 citations. The organization is also known as ScanSoft and ScanSoft Inc.


Papers
Patent
31 Mar 2009
TL;DR: A method for detecting barge-in in a speech dialogue system is presented: the method determines whether a speech prompt is being output by the speech dialogue system and detects whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or on speaker information.
Abstract: A method for detecting barge-in in a speech dialogue system comprising determining whether a speech prompt is output by the speech dialogue system, and detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or based on speaker information, where the sensitivity threshold is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined. If speech activity is detected in the input signal, the speech prompt may be interrupted or faded out. A speech dialogue system configured to detect barge-in is also disclosed.
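The following is a minimal sketch of the barge-in logic described in this abstract: the sensitivity threshold of the speech activity detector is raised while a prompt is playing and lowered otherwise, and a detection during prompt output triggers a fade-out. Class and parameter names (and the concrete threshold values) are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch of barge-in detection with a time-varying threshold.
class BargeInDetector:
    def __init__(self, base_threshold=0.3, prompt_offset=0.2):
        self.base_threshold = base_threshold   # sensitivity when no prompt is playing
        self.prompt_offset = prompt_offset     # extra margin while a prompt is output

    def threshold(self, prompt_active: bool) -> float:
        """Time-varying sensitivity threshold of the speech activity detector."""
        return self.base_threshold + (self.prompt_offset if prompt_active else 0.0)

    def detect(self, activity_score: float, prompt_active: bool) -> bool:
        """Return True if speech activity exceeds the current threshold."""
        return activity_score > self.threshold(prompt_active)


def dialogue_step(detector, activity_score, prompt_active, fade_out_prompt):
    # If the caller starts speaking while a prompt is playing, barge-in is
    # detected and the prompt is faded out (or interrupted).
    if detector.detect(activity_score, prompt_active):
        if prompt_active:
            fade_out_prompt()
        return True
    return False
```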

20 citations

Proceedings ArticleDOI
22 May 2011
TL;DR: The subspace based Gaussian mixture model (SGMM) is shown to provide an 18% reduction in word error rate (WER) for speaker independent ASR relative to the continuous density HMM (CDHMM) in the resource management CSR domain.
Abstract: This paper investigates the impact of subspace based techniques for acoustic modeling in automatic speech recognition (ASR). There are many well known approaches to subspace based speaker adaptation which represent sources of variability as a projection within a low dimensional subspace. A new approach to acoustic modeling in ASR, referred to as the subspace based Gaussian mixture model (SGMM), represents phonetic variability as a set of projections applied at the state level in a hidden Markov model (HMM) based acoustic model. The impact of the SGMM in modeling these intrinsic sources of variability is evaluated for a continuous speech recognition (CSR) task. The SGMM is shown to provide an 18% reduction in word error rate (WER) for speaker independent (SI) ASR relative to the continuous density HMM (CDHMM) in the resource management CSR domain. The SI performance obtained from SGMM also represents a 5% reduction in WER relative to subspace based speaker adaptation in an unsupervised speaker adaptation scenario.
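Below is a small numerical sketch, under assumptions, of the core SGMM idea referenced here: each HMM state is represented by a low-dimensional vector, and the state's Gaussian means and mixture weights are derived from globally shared projection matrices applied to that vector. The dimensions, names, and random parameters are illustrative only and are not taken from the paper.

```python
import numpy as np

feat_dim = 39        # acoustic feature dimension (e.g., MFCCs with deltas)
subspace_dim = 10    # dimension of the per-state subspace vector
num_gaussians = 4    # shared Gaussians per state (tiny, for illustration)
num_states = 3

rng = np.random.default_rng(0)

# Globally shared parameters, tied across all states.
M = rng.standard_normal((num_gaussians, feat_dim, subspace_dim))  # mean projections
w = rng.standard_normal((num_gaussians, subspace_dim))            # weight projections

# Per-state parameters: a single low-dimensional vector per state.
V = rng.standard_normal((num_states, subspace_dim))

def state_means(j):
    """Gaussian means of state j obtained by projecting its subspace vector."""
    return np.einsum("ifs,s->if", M, V[j])        # shape: (num_gaussians, feat_dim)

def state_weights(j):
    """Mixture weights of state j via a softmax over the projected vector."""
    logits = w @ V[j]
    e = np.exp(logits - logits.max())
    return e / e.sum()

for j in range(num_states):
    print(j, state_means(j).shape, state_weights(j).round(3))
```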

20 citations

Patent
Mark Fanty1
26 May 2015
TL;DR: In this article, the authors propose a method for reducing latency in speech recognition applications, which comprises receiving first audio comprising speech from a user of a computing device, detecting an end-of-speech in the first audio, generating an ASR result based, at least in part, on a portion of audio prior to the detected end of speech.
Abstract: Methods and apparatus for reducing latency in speech recognition applications. The method comprises receiving first audio comprising speech from a user of a computing device, detecting an end of speech in the first audio, generating an ASR result based, at least in part, on a portion of the first audio prior to the detected end of speech, determining whether a valid action can be performed by a speech-enabled application installed on the computing device using the ASR result, and processing second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the ASR result.
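The sketch below illustrates the flow described in this abstract under assumptions: recognize the audio received before the detected end of speech, check whether any installed speech-enabled application can act on the result, and only capture and process further audio when no valid action is possible. The recognizer, end-of-speech detector, and application interface are hypothetical placeholders.

```python
# Hedged sketch of the latency-reduction flow; all callables are assumed.
def handle_utterance(first_audio, apps, recognize, detect_end_of_speech, capture_more_audio):
    end_idx = detect_end_of_speech(first_audio)      # index of the detected end of speech
    asr_result = recognize(first_audio[:end_idx])    # hypothesis from audio before that point

    # Determine whether any speech-enabled application can perform a valid
    # action using this (possibly early) ASR result.
    for app in apps:
        if app.can_handle(asr_result):
            return app.perform(asr_result)

    # No valid action: fall back to processing additional (second) audio.
    second_audio = capture_more_audio()
    asr_result = recognize(first_audio + second_audio)
    for app in apps:
        if app.can_handle(asr_result):
            return app.perform(asr_result)
    return None
```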

20 citations

Patent
30 Aug 2011
TL;DR: In this article, a user may issue a search query, and the search engine or engines to which that query is provided may be determined dynamically based on any of a variety of factors.
Abstract: Some embodiments relate to techniques for performing a search for content, in which a user may issue a search query, and the search engine or engines to which that query is provided may be determined dynamically based on any of a variety of factors. For example, in some embodiments, the search engine or engines to which the query is provided may be determined based on the content of the search query, and/or auxiliary information such as the user's location, demographics, query history and/or browsing history.
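As a rough illustration of the dynamic routing idea in this abstract, the sketch below selects target engines from the query text and auxiliary signals such as location and history. The engine names and routing rules are assumptions for demonstration, not the patented implementation.

```python
# Illustrative query-routing sketch; all rules and engine names are assumed.
def select_engines(query, location=None, history=()):
    engines = set()
    q = query.lower()

    if any(term in q for term in ("restaurant", "near me", "open now")):
        engines.add("local_search")                  # location-sensitive queries
    if any(term in q for term in ("weather", "forecast")):
        engines.add("weather_service")
    if any(q.startswith(w) for w in ("who", "what", "when", "where", "why", "how")):
        engines.add("web_search")

    # Auxiliary information: user context and prior behavior.
    if "shopping_search" in history:
        engines.add("shopping_search")
    if location is not None:
        engines.add("local_search")

    return engines or {"web_search"}                 # sensible default

print(select_engines("best restaurant near me", location=(48.2, 16.4)))
```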

20 citations

Patent
24 Mar 2006
TL;DR: A caption correction system is proposed for providing real-time captions to a presentation or the like, in which manual judgment of the voice recognition result is performed on the basis of the processed voice when processing is not delayed, and automatic judgment based on a confidence score is used when it is.
Abstract: PROBLEM TO BE SOLVED: Manual provision of real-time captions for a presentation or the like is too costly to be widely adopted, an automatic voice recognition apparatus alone cannot be expected to achieve a high recognition rate, and incorrect translation remains a problem; the aim is to provide an inexpensive apparatus that addresses these issues. SOLUTION: The caption correction apparatus obtains the character strings and a confidence score of a voice recognition result. A time monitor tracks elapsed time and judges, on the basis of the confidence score and the time status, whether processing is delayed. When processing is not delayed, manual judgment is requested from a checker; the voice is processed and the manual judgment of the voice recognition result is performed on the basis of the processed voice. When processing is delayed, automatic judgment is performed on the basis of the confidence score. When the voice recognition result is judged valid by either manual or automatic judgment, the character strings are displayed as determined character strings. When it is judged invalid, the result is automatically corrected by matching against next-best voice recognition candidates, the text and attributes of the presentation, the text of a script, and so on. Automatically corrected character strings are displayed as indefinite character strings.
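A minimal sketch of the routing logic in this abstract, under assumptions: when processing is not delayed a human checker judges the recognition result, otherwise the decision falls back to the confidence score, and invalid results are auto-corrected by matching against other sources before being shown as indefinite strings. All function names and thresholds are hypothetical.

```python
# Hypothetical caption-correction routing; checker_judges and auto_correct are assumed callables.
def process_caption(result_text, confidence, elapsed_s, checker_judges,
                    auto_correct, max_delay_s=3.0, conf_threshold=0.8):
    delayed = elapsed_s > max_delay_s

    if not delayed:
        valid = checker_judges(result_text)          # manual judgment by a checker
    else:
        valid = confidence >= conf_threshold         # automatic judgment by confidence

    if valid:
        return result_text, "determined"             # shown as a determined character string
    # Invalid result: correct by matching against next-best recognition
    # candidates, the presentation text/attributes, the script text, etc.
    corrected = auto_correct(result_text)
    return corrected, "indefinite"                   # shown as an indefinite character string
```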

20 citations


Authors


Name | H-index | Papers | Citations
Vinayak P. Dravid | 103 | 817 | 43612
Mehryar Mohri | 75 | 320 | 22868
Jinsong Wu | 70 | 566 | 16282
Horacio D. Espinosa | 67 | 315 | 16270
Shumin Zhai | 67 | 200 | 13447
Shang-Hua Teng | 66 | 265 | 16647
Dimitri Kanevsky | 62 | 362 | 14072
Marilyn A. Walker | 62 | 309 | 13429
Tara N. Sainath | 61 | 274 | 25183
Kenneth Church | 61 | 295 | 21179
John B Ketterson | 60 | 814 | 16929
Pascal Frossard | 59 | 637 | 22749
Michael Picheny | 57 | 244 | 11759
G. R. Scott Budinger | 56 | 196 | 12063
Jun Wu | 53 | 359 | 12110
Network Information
Related Institutions (5)
Google: 39.8K papers, 2.1M citations (82% related)
Microsoft: 86.9K papers, 4.1M citations (82% related)
Carnegie Mellon University: 104.3K papers, 5.9M citations (80% related)
Nokia: 28.3K papers, 695.7K citations (79% related)

Performance
Metrics
No. of papers from the Institution in previous years
Year | Papers
2022 | 3
2021 | 24
2020 | 42
2019 | 55
2018 | 41
2017 | 53