Institution
Nuance Communications
Company • Vienna, Austria
About: Nuance Communications is a company based in Vienna, Austria. It is known for research contributions in the topics of speech processing and voice activity detection. The organization has 1518 authors who have published 1701 publications receiving 54891 citations. The organization is also known as ScanSoft and ScanSoft Inc.
Papers
19 Nov 2012 TL;DR: In this paper, a speech output is generated from a text input written in a first language and containing inclusions in a second language, where words in the native language are pronounced with a native pronunciation and words in a foreign language are spoken with a proficient foreign pronunciation.
Abstract: A speech output is generated from a text input written in a first language and containing inclusions in a second language. Words in the native language are pronounced with a native pronunciation and words in the foreign language are pronounced with a proficient foreign pronunciation. Language-dependent phoneme symbols generated for words of the second language are replaced with language-dependent phoneme symbols of the first language. The replacing includes the steps of assigning to each language-dependent phoneme symbol of the second language a language-independent target phoneme symbol; mapping each language-independent target phoneme symbol to a language-independent substitute phoneme symbol that is assignable to a language-dependent substitute phoneme symbol of the first language; and substituting the language-dependent phoneme symbols of the second language with the language-dependent substitute phoneme symbols of the first language.
27 citations
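The three-step substitution described in the abstract above can be sketched as a chain of lookups. This is only an illustration under stated assumptions: the symbol names, the `en:`/`de:` prefixes, the table contents, and the function `substitute_phonemes` are all hypothetical and do not come from the publication.

```python
# Toy, hypothetical tables for illustration only (not taken from the publication).

# Step 1: language-dependent symbol of the second language
#         -> language-independent target symbol.
L2_TO_TARGET = {"en:TH": "theta", "en:AE": "ae", "en:T": "t"}

# Step 2: language-independent target symbol
#         -> language-independent substitute symbol that the first language can realize.
TARGET_TO_SUBSTITUTE = {"theta": "s", "ae": "E", "t": "t"}

# Step 3: language-independent substitute symbol
#         -> language-dependent symbol of the first language.
SUBSTITUTE_TO_L1 = {"s": "de:s", "E": "de:E", "t": "de:t"}


def substitute_phonemes(l2_symbols):
    """Replace second-language phoneme symbols with first-language ones."""
    result = []
    for symbol in l2_symbols:
        target = L2_TO_TARGET[symbol]              # step 1: assign target
        substitute = TARGET_TO_SUBSTITUTE[target]  # step 2: map to substitute
        result.append(SUBSTITUTE_TO_L1[substitute])  # step 3: first-language symbol
    return result


print(substitute_phonemes(["en:TH", "en:AE", "en:T"]))
```

Routing every symbol through a language-independent layer means that, in this sketch, adding a new language pair only requires new edge tables rather than a direct table for each pair of languages.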
08 Sep 2010 TL;DR: In this article, a carousel having a plurality of slots may be displayed in a first portion of a display of a display device, and in response to user selection of one of the slots, content that is dynamically generated based on user input may be shown in a second portion of the display, separate from the first portion.
Abstract: Some embodiments relate to using a carousel to display content. In some embodiments, a carousel having a plurality of slots may be displayed in a first portion of a display of a display device, and in response to user selection of one of the plurality of slots, content that is dynamically generated based on user input may be displayed in a second portion of the display, separate from the first portion.
27 citations
09 Mar 2012 TL;DR: In this article, the authors present an interface that integrates the multiple channels of the customer service provider and recommends a channel based on an identification of a customer service need of the customer.
Abstract: Customer service and/or care providers generally have multiple communications channels (i.e., modes of communication, such as an Internet webpage, live agent telephones, or an Interactive Voice Response (IVR) system) through which a customer may interact with the customer service provider. Currently, customers must select the communications channel by guessing which one would best accommodate their purpose/need for communicating with the customer service provider. In some scenarios, the customer may select the wrong communications channel because the selected channel is not able to service the customer's need. In another scenario, the customer may select a channel that is more cumbersome to service the customer's particular need than another channel of the customer service provider. Embodiments of the present invention provide an interface that integrates the multiple channels of the customer service provider and recommends a channel based on an identification of a customer service need of the customer.
27 citations
28 Jan 2007 TL;DR: The architecture of a modern OCR system is described with an emphasis on the adaptation process, where systems try to adapt themselves to the actual features of the image or document to be recognized.
Abstract: Optical Character Recognition is much more than character classification. An industrial OCR application combines algorithms studied in detail by different researchers in the area of image processing, pattern recognition, machine learning, language analysis, document understanding, data mining, and other artificial intelligence domains. There is no single perfect algorithm for any of the OCR problems, so modern systems try to adapt themselves to the actual features of the image or document to be recognized. This paper describes the architecture of a modern OCR system with an emphasis on this adaptation process.
26 citations
29 Feb 2000 TL;DR: In this article, a method and apparatus for generating a noise-reduced feature vector representing human speech is presented, where speech data representing an input speech waveform are first input and filtered, and a noise reduction process is then performed.
Abstract: A method and apparatus for generating a noise-reduced feature vector representing human speech are provided. Speech data representing an input speech waveform are first input and filtered. Spectral energies of the filtered speech data are determined, and a noise reduction process is then performed. In the noise reduction process, a spectral magnitude is computed for a frequency index of multiple frequency indexes. A noise magnitude estimate is then determined for the frequency index by updating a histogram of spectral magnitude, and then determining the noise magnitude estimate as a predetermined percentile of the histogram. A signal-to-noise ratio is then determined for the frequency index. A scale factor is computed for the frequency index, as a function of the signal-to-noise ratio and the noise magnitude estimate. The noise magnitude estimate is then scaled by the scale factor. The scaled noise magnitude estimate is subtracted from the spectral magnitudes of the filtered speech data, to produce cleaned speech data, based on which a feature vector is generated.
26 citations
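The per-frequency noise reduction loop described in the abstract above (histogram update, percentile noise estimate, signal-to-noise ratio, scale factor, subtraction) can be sketched roughly as follows. This is a minimal sketch, not the patented method: the class `HistogramNoiseEstimator`, the bin layout, the default percentile, the `scale_factor` rule, and the clamping at zero are all assumptions made for illustration.

```python
class HistogramNoiseEstimator:
    """Tracks one frequency index. Bin layout and percentile are assumed values."""

    def __init__(self, num_bins=50, max_magnitude=100.0, percentile=0.2):
        self.counts = [0] * num_bins
        self.bin_width = max_magnitude / num_bins
        self.percentile = percentile
        self.total = 0

    def update(self, magnitude):
        """Add a non-negative magnitude to the histogram; return the noise estimate."""
        index = min(int(magnitude / self.bin_width), len(self.counts) - 1)
        self.counts[index] += 1
        self.total += 1
        # The estimate is the centre of the bin where the cumulative count
        # first reaches the chosen percentile of all observations.
        threshold = self.percentile * self.total
        cumulative = 0
        for i, count in enumerate(self.counts):
            cumulative += count
            if cumulative >= threshold:
                return (i + 0.5) * self.bin_width
        return (len(self.counts) - 0.5) * self.bin_width


def scale_factor(snr, noise):
    """Hypothetical rule: subtract more noise when the SNR is low.

    The abstract says the factor depends on the SNR and the noise estimate;
    this toy rule uses only the SNR and accepts `noise` for signature parity.
    """
    return 1.0 + 1.0 / (1.0 + snr)


def denoise_frame(magnitudes, estimators):
    """Apply the sketch to one frame; one estimator per frequency index."""
    cleaned = []
    for magnitude, estimator in zip(magnitudes, estimators):
        noise = estimator.update(magnitude)  # always > 0 (a bin centre)
        snr = magnitude / noise
        scaled_noise = scale_factor(snr, noise) * noise
        # Clamping at zero is an added assumption to avoid negative magnitudes.
        cleaned.append(max(magnitude - scaled_noise, 0.0))
    return cleaned
```

A low percentile is a common choice for this kind of estimator because, across many frames, the quietest observations at a frequency index are more likely to be noise than speech. The feature vector computation that follows in the abstract is not covered here.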
Authors
Showing all 1521 results
Name | H-index | Papers | Citations |
---|---|---|---|
Vinayak P. Dravid | 103 | 817 | 43612 |
Mehryar Mohri | 75 | 320 | 22868 |
Jinsong Wu | 70 | 566 | 16282 |
Horacio D. Espinosa | 67 | 315 | 16270 |
Shumin Zhai | 67 | 200 | 13447 |
Shang-Hua Teng | 66 | 265 | 16647 |
Dimitri Kanevsky | 62 | 362 | 14072 |
Marilyn A. Walker | 62 | 309 | 13429 |
Tara N. Sainath | 61 | 274 | 25183 |
Kenneth Church | 61 | 295 | 21179 |
John B Ketterson | 60 | 814 | 16929 |
Pascal Frossard | 59 | 637 | 22749 |
Michael Picheny | 57 | 244 | 11759 |
G. R. Scott Budinger | 56 | 196 | 12063 |
Jun Wu | 53 | 359 | 12110 |