Dynamic switching between local and remote speech rendering

Patent•

Charles W. Cross¹, David Jaramillo¹, Gerald M. McCobb¹•Institutions (1)

08 Dec 2004

TL;DR: In this paper, a multimodal browser for rendering a multimodal document on an end system defining a host can include a visual browser component for rendering visual content, if any, of the multimodal document, and a voice browser component for rendering voice-based content, if any, which can determine which of a plurality of speech processing configurations is used by the host in rendering the voice-based content.

Abstract: A multimodal browser for rendering a multimodal document on an end system defining a host can include a visual browser component for rendering visual content, if any, of the multimodal document, and a voice browser component for rendering voice-based content, if any, of the multimodal document. The voice browser component can determine which of a plurality of speech processing configurations is used by the host in rendering the voice-based content. The determination can be based upon the resources of the host running the application. The determination also can be based upon a processing instruction contained in the application.
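
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `SpeechConfig`, `HostResources`, `choose_speech_config`, the memory threshold, and the rule that a recognized processing instruction takes priority over the resource check are all assumptions made for illustration. The abstract only states that the choice can depend on host resources and on a processing instruction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SpeechConfig(Enum):
    """Hypothetical set of speech processing configurations."""
    LOCAL = "local"    # speech engines run on the host itself
    REMOTE = "remote"  # speech engines run on a server over the network


@dataclass
class HostResources:
    """Hypothetical summary of what the host can offer."""
    free_memory_mb: int
    has_local_engine: bool


# Assumed threshold; the abstract gives no concrete figures.
MIN_LOCAL_MEMORY_MB = 64


def choose_speech_config(host: HostResources,
                         processing_instruction: Optional[str] = None) -> SpeechConfig:
    """Pick a configuration for rendering voice-based content."""
    # Assumption: a recognized instruction in the document wins.
    if processing_instruction is not None:
        try:
            return SpeechConfig(processing_instruction.strip().lower())
        except ValueError:
            pass  # unrecognized instruction: fall back to the resource check
    # Otherwise decide from the resources of the host.
    if host.has_local_engine and host.free_memory_mb >= MIN_LOCAL_MEMORY_MB:
        return SpeechConfig.LOCAL
    return SpeechConfig.REMOTE
```

In this sketch a well-provisioned host renders speech locally unless the document asks for remote rendering, and a constrained host falls back to remote rendering.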

Citations

Patent•

Intelligent Automated Assistant

Thomas R. Gruber¹, Adam Cheyer¹, Dag Kittlaus¹, Didier Rene Guzzoni¹, Christopher Dean Brigham¹, Richard Donald Giuli¹, Marcello Bastea-Forte¹, Harry J. Saddler¹•Institutions (1)

Apple Inc.¹

11 Jan 2011

TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.

Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system can interact.

1,462 citations

Patent•

Using context information to facilitate processing of commands in a virtual assistant

Thomas R. Gruber¹, Christopher Dean Brigham¹, Daniel S. Keen¹, Gregory Novick¹, Benjamin S. Phipps¹•Institutions (1)

Apple Inc.¹

28 Sep 2012

TL;DR: In this article, a virtual assistant uses context information to supplement natural language or gestural input from a user. This helps to clarify the user's intent, reduces the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input.

Abstract: A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.

593 citations

Patent•

Method and apparatus for building an intelligent automated assistant

Adam Cheyer¹, Didier Rene Guzzoni¹•Institutions (1)

Apple Inc.¹

08 Sep 2006

TL;DR: In this paper, a method for building an automated assistant includes interfacing a service-oriented architecture that includes a plurality of remote services to an active ontology, where the active ontology includes at least one active processing element that models a domain.

Abstract: A method and apparatus are provided for building an intelligent automated assistant. Embodiments of the present invention rely on the concept of “active ontologies” (e.g., execution environments constructed in an ontology-like manner) to build and run applications for use by intelligent automated assistants. In one specific embodiment, a method for building an automated assistant includes interfacing a service-oriented architecture that includes a plurality of remote services to an active ontology, where the active ontology includes at least one active processing element that models a domain. At least one of the remote services is then registered for use in the domain.

389 citations

Patent•

Automatically adapting user interfaces for hands-free interaction

Thomas R. Gruber¹, Harry J. Saddler¹•Institutions (1)

Apple Inc.¹

30 Sep 2011

TL;DR: In this article, a method automatically determines, without user input and without regard to whether a digital assistant application has been separately invoked by a user, that an electronic device is in a vehicle and, in response, invokes a listening mode of a virtual assistant.

Abstract: The method includes automatically, without user input and without regard to whether a digital assistant application has been separately invoked by a user, determining that the electronic device is in a vehicle. In some implementations, determining that the electronic device is in a vehicle comprises detecting that the electronic device is in communication with the vehicle (e.g., via wired or wireless communication techniques and/or protocols). The method also includes, responsive to the determining, invoking a listening mode of a virtual assistant implemented by the electronic device. In some implementations, the method also includes limiting the ability of a user to view visual output presented by the electronic device, provide typed input to the electronic device, and the like.

367 citations

Patent•

Voice trigger for a digital assistant

Justin G. Binder¹, Onur E. Tackin¹, Samuel D. Post¹, Thomas R. Gruber¹•Institutions (1)

Apple Inc.¹

07 Feb 2014

TL;DR: In this paper, a method for operating a voice trigger is presented, which includes determining whether at least a portion of the sound input corresponds to a predetermined type of sound, such as a human voice.

Abstract: A method for operating a voice trigger is provided. In some implementations, the method is performed at an electronic device including one or more processors and memory storing instructions for execution by the one or more processors. The method includes receiving a sound input. The sound input may correspond to a spoken word or phrase, or a portion thereof. The method includes determining whether at least a portion of the sound input corresponds to a predetermined type of sound, such as a human voice. The method includes, upon a determination that at least a portion of the sound input corresponds to the predetermined type, determining whether the sound input includes predetermined content, such as a predetermined trigger word or phrase. The method also includes, upon a determination that the sound input includes the predetermined content, initiating a speech-based service, such as a voice-based digital assistant.

365 citations

References

Patent•

Universal IP-based and scalable architectures across conversational applications using web services for speech and audio processing resources

Stephane H. Maes¹, David M. Lubensky¹, Andrzej Sakrajda¹•Institutions (1)

IBM¹

25 Jun 2002

TL;DR: In this paper, the authors present systems and methods for building distributed conversational applications using a Web services-based model where speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways.

Abstract: Systems and methods for conversational computing and, in particular, systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), Wireless, Internet, and VoIP (voice over IP)). Systems and methods are further provided for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.

619 citations

Patent•DOI•

Distributed voice user interface

George M. White, James J. Buteau, Glen E. Shires, Kevin J. Surace, Steven Markman

22 Jan 2002 - Journal of the Acoustical Society of America

TL;DR: In this article, a distributed voice user interface system includes a local device which receives speech input issued from a user; such speech input may specify a command or a request by the user. The local device performs preliminary processing of the speech input and determines whether it is able to respond to the command or request by itself.

Abstract: A distributed voice user interface system includes a local device which receives speech input issued from a user. Such speech input may specify a command or a request by the user. The local device performs preliminary processing of the speech input and determines whether it is able to respond to the command or request by itself. If not, the local device initiates communication with a remote system for further processing of the speech input.

441 citations

Patent•

System and method for providing network coordinated conversational services

Stephane H. Maes¹, Ponani S. Gopalakrishnan²•Institutions (2)

IBM¹, Nuance Communications²

01 Oct 1999

TL;DR: In this paper, a system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications is presented.

Abstract: A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.

361 citations

Patent•DOI•

Distributed voice recognition system

Paul E. Jacobs¹, Chienchung Chang¹•Institutions (1)

Qualcomm¹

20 Dec 1994 - Journal of the Acoustical Society of America

TL;DR: In this article, a distributed voice recognition system includes a digital signal processor (DSP) (104), a nonvolatile storage medium (108), and a microprocessor (106); the DSP is configured to extract parameters from digitized input speech samples and provide the extracted parameters to the microprocessor.

Abstract: A distributed voice recognition system includes a digital signal processor (DSP) (104), a nonvolatile storage medium (108), and a microprocessor (106). The DSP (104) is configured to extract parameters from digitized input speech samples and provide the extracted parameters to the microprocessor (106). The nonvolatile storage medium contains a database of speech templates. The microprocessor is configured to read the contents of the nonvolatile storage medium (108), compare the parameters with the contents, and select a speech template based upon the comparison. The nonvolatile storage medium may be a flash memory. The DSP (104) may be a vocoder. If the DSP (104) is a vocoder, the parameters may be diagnostic data generated by the vocoder. The distributed voice recognition system may reside on an application specific integrated circuit (ASIC).

361 citations

Patent•

Systems and methods for implementing modular DOM (Document Object Model)-based multi-modal browsers

David Boloker, Rafah A. Hosn, Photina Jaeyun Jang, Jan Kleindienst, Tomas Macek, Stephane H. Maes, T. V. Raman, Ladislav Seredi

04 Dec 2001

TL;DR: In this article, the authors present a framework for building modular multi-modal browsers using a DOM (Document Object Model) and MVC (Model-View-Controller) framework that enables a user to interact in parallel with the same information via a multiplicity of channels, devices, and/or user interfaces.

Abstract: Systems and methods for building multi-modal browser applications and, in particular, systems and methods for building modular multi-modal browsers using a DOM (Document Object Model) and MVC (Model-View-Controller) framework that enables a user to interact in parallel with the same information via a multiplicity of channels, devices, and/or user interfaces, while presenting a unified, synchronized view of such information across the various channels, devices and/or user interfaces supported by the multi-modal browser. The use of a DOM framework (or specifications similar to DOM) allows existing browsers to be extended without modification of the underlying browser code. A multi-modal browser framework is modular and flexible to allow various fat client and thin (distributed) client approaches.

342 citations