Patent

Acoustic sensory network

TL;DR: In this article, a system is provided for determining which controllable device an audible command is directed towards, based on a central processor's comparison of the time and date stamps reported by each listening device.
Abstract: A system is provided for determining which controllable device an audible command is directed towards. The system comprises two or more controllable devices; two or more electronic devices, each of which is adapted to receive the audible command, add a respective electronic device identifier to the received audible command, time and date stamp the received audible command, and transmit the respective time and date stamped versions of the audible command, wherein each of the two or more electronic devices is further adapted to control respective ones of the two or more controllable devices; and a central processor adapted to receive each of the transmitted time and date stamped versions of the audible command and perform processing based on the time and date stamps. The electronic device that reports the earliest time and date stamp, as ascertained by the respective electronic device identifier and the central processor's comparison of the time and date stamps, is the electronic device towards which the audible command is directed.
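The arbitration rule above (earliest stamp wins) can be sketched in a few lines of Python; the class, field, and function names here are illustrative, not from the patent:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class StampedCommand:
    device_id: str        # electronic device identifier added to the audio
    timestamp: datetime   # time-and-date stamp applied on receipt
    command: str          # the audible command as heard by this device

def resolve_target(reports: list) -> str:
    """Return the identifier of the device that heard the command first.

    The earliest stamp implies the speaker was closest to (and therefore
    addressing) that device, per the patent's arbitration rule.
    """
    earliest = min(reports, key=lambda r: r.timestamp)
    return earliest.device_id
```

In practice the devices' clocks would need to be synchronized for the stamp comparison to be meaningful; the patent text above does not elaborate on that.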
Citations
Patent
28 May 2015
TL;DR: In this paper, the authors describe a system for handling multi-part voice commands for a virtual assistant: a single utterance containing multiple actionable commands is transcribed to a text string, which is then parsed into multiple candidate substrings based on domain keywords, imperative verbs, predetermined substring lengths, or the like.
Abstract: Systems and processes are disclosed for handling a multi-part voice command for a virtual assistant. Speech input can be received from a user that includes multiple actionable commands within a single utterance. A text string can be generated from the speech input using a speech transcription process. The text string can be parsed into multiple candidate substrings based on domain keywords, imperative verbs, predetermined substring lengths, or the like. For each candidate substring, a probability can be determined indicating whether the candidate substring corresponds to an actionable command. Such probabilities can be determined based on semantic coherence, similarity to user request templates, querying services to determine manageability, or the like. If the probabilities exceed a threshold, the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user.
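A minimal sketch of the substring parsing and probability gating described above; the verb list, the toy scoring function, and the threshold are invented for this example, not taken from the patent:

```python
# Invented for illustration; the patent mentions imperative verbs as one
# possible segmentation cue among several.
IMPERATIVE_VERBS = {"play", "set", "send", "call", "remind"}

def split_candidates(text: str) -> list:
    """Parse a transcribed utterance into candidate command substrings,
    starting a new substring at each imperative verb."""
    parts, current = [], []
    for word in text.lower().split():
        if word in IMPERATIVE_VERBS and current:
            parts.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        parts.append(" ".join(current))
    return parts

def actionable(substring: str) -> float:
    """Toy probability that a substring is an actionable command:
    1.0 if it begins with a known imperative verb, else 0.0."""
    return 1.0 if substring.split()[0] in IMPERATIVE_VERBS else 0.0

def handle(text: str, threshold: float = 0.5) -> list:
    """Keep only the candidate substrings whose probability clears the
    threshold, mirroring the gating step in the abstract."""
    return [s for s in split_candidates(text) if actionable(s) >= threshold]
```

The abstract's real scoring step (semantic coherence, template similarity, service queries) is far richer than this single-verb heuristic; the sketch only shows the parse-score-gate shape of the pipeline.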

225 citations

Patent
Allen P. Haughay1
13 Nov 2015
TL;DR: In this paper, a voice input is processed using a subset of words from a library used to identify the words or phrases of the voice input, the subset being selected such that voice inputs provided by the user are more likely to include words from it.
Abstract: This is directed to processing voice inputs received by an electronic device. In particular, this is directed to receiving a voice input and identifying the user providing the voice input. The voice input can be processed using a subset of words from a library used to identify the words or phrases of the voice input. The particular subset can be selected such that voice inputs provided by the user are more likely to include words from the subset. The subset of the library can be selected using any suitable approach, including for example based on the user's interests and words that relate to those interests. For example, the subset can include one or more words related to media items selected by the user for storage on the electronic device, names of the user's contacts, applications or processes used by the user, or any other words relating to the user's interactions with the device.
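The subset-selection idea can be illustrated as follows; the word sources match those listed in the abstract (contacts, stored media, applications), but the function names and the coverage-based scoring are assumptions for this sketch:

```python
def build_user_subset(contacts, media_titles, app_names):
    """Assemble a personalized recognition vocabulary from words tied to
    the user's interactions with the device, per the abstract."""
    subset = set()
    for source in (contacts, media_titles, app_names):
        for phrase in source:
            subset.update(phrase.lower().split())
    return subset

def restrict_hypotheses(candidates, subset):
    """Prefer the transcription hypothesis whose words best overlap the
    user's subset (a toy stand-in for biasing recognition)."""
    def coverage(text):
        words = text.lower().split()
        return sum(w in subset for w in words) / len(words)
    return max(candidates, key=coverage)
```

A real recognizer would bias its language model rather than rescore finished hypotheses, but the effect sketched here is the same: inputs from this user are more likely to resolve to words from the subset.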

185 citations

Patent
31 Mar 2015
TL;DR: In this paper, the authors describe a system for using a virtual assistant to control electronic devices: a user speaks a natural-language input to a user device, a server converts the speech to text and identifies the target devices and commands, and the user device forwards the commands to the appropriate one or more electronic devices for execution.
Abstract: This relates to systems and processes for using a virtual assistant to control electronic devices. In one example process, a user can speak an input in natural language form to a user device to control one or more electronic devices. The user device can transmit the user speech to a server to be converted into a textual representation. The server can identify the one or more electronic devices and appropriate commands to be performed by the one or more electronic devices based on the textual representation. The identified one or more devices and commands to be performed can be transmitted back to the user device, which can forward the commands to the appropriate one or more electronic devices for execution. In response to receiving the commands, the one or more electronic devices can perform the commands and transmit their current states to the user device.

162 citations

Patent
27 Aug 2015
TL;DR: In this paper, a speaker identification system for virtual assistants is presented: user speech is matched against a speaker profile for a predetermined user, operation of the assistant is triggered only on a match, and contextual information can be used to verify the results produced by the speaker identification process.
Abstract: Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process.
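A toy sketch of the profile-gated triggering described above; the patent does not specify a model, so this assumes profiles are feature vectors matched by a simple Euclidean-distance threshold:

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors (a stand-in for a
    real speaker-similarity score)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class SpeakerGate:
    def __init__(self, profile, threshold=1.0):
        self.profile = list(profile)  # predetermined user's profile
        self.alternate = []           # speech banked from other speakers
        self.threshold = threshold    # assumed acceptance radius

    def process(self, features):
        """Return True (trigger the assistant) if the speech matches the
        predetermined user's profile; otherwise add it to the alternate
        speaker profile and do not trigger, as in the abstract."""
        if distance(features, self.profile) <= self.threshold:
            return True
        self.alternate.append(features)
        return False
```

The abstract also adds matched speech back into the predetermined user's profile and consults contextual information to verify matches; both are omitted here to keep the gate itself visible.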

142 citations

Patent
08 Mar 2016
Abstract: At a first electronic device with a display and a microphone: sampling audio input using the first microphone; in accordance with the sampling of audio input using the first microphone, sending stop instructions to a second electronic device with a second microphone, the second electronic device external to the first electronic device, wherein the second electronic device is configured to respond to audio input received using the second microphone, and wherein the stop instructions instruct the second electronic device to forgo responding to audio input received using the second microphone, wherein responding to audio input received using the second microphone comprises providing perceptible output.
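The stop-instruction protocol in this abstract (first device samples audio, peers forgo perceptible output) can be sketched as below; the class and method names are invented for illustration:

```python
class AssistantDevice:
    def __init__(self, name):
        self.name = name
        self.suppressed = False  # set once stop instructions arrive
        self.peers = []          # external devices with their own microphones

    def receive_stop(self):
        """Handle stop instructions: forgo responding to audio input."""
        self.suppressed = True

    def sample_audio(self, utterance):
        """On sampling audio, send stop instructions to every external
        peer, then respond (produce perceptible output) only if this
        device has not itself been suppressed."""
        for peer in self.peers:
            peer.receive_stop()
        if self.suppressed:
            return None
        return f"{self.name}: handling {utterance!r}"
```

The ordering matters: whichever device samples the utterance first silences the others, so at most one device ever produces perceptible output.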

137 citations

References
PatentDOI
TL;DR: In this article, the authors propose a method for recognizing an audio sample by locating, in a database that indexes a large set of original recordings, the audio file that most closely matches the sample; each indexed file is represented by a set of landmark timepoints and associated fingerprints.
Abstract: A method for recognizing an audio sample locates an audio file that most closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample. The method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artifacts, or transmission dropouts. The sample can be identified in a time proportional to the logarithm of the number of entries in the database; given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.
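The "linearly related landmarks" test above can be illustrated with a toy offset histogram: a file matches the sample when many shared fingerprints agree on a single time offset (file time minus sample time). This sketch assumes integer timepoints and exact fingerprint matches, which is a simplification of the method described:

```python
from collections import Counter

def match_score(sample, db_file):
    """Count the largest group of shared fingerprints whose
    (file_time - sample_time) offsets coincide. A high score means the
    sample's fingerprints line up with the file's at a constant shift,
    i.e. the landmarks are linearly related with slope 1."""
    index = {}
    for fp, t in db_file:
        index.setdefault(fp, []).append(t)
    offsets = Counter()
    for fp, t_sample in sample:
        for t_file in index.get(fp, []):
            offsets[t_file - t_sample] += 1
    return max(offsets.values()) if offsets else 0

def identify(sample, database):
    """Return the name of the database file with the strongest offset vote."""
    return max(database, key=lambda name: match_score(sample, database[name]))
```

Because votes are accumulated per offset rather than per fingerprint, spurious matches at inconsistent offsets do not add up, which is what gives the method its robustness to noise and dropouts.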

774 citations

Patent
09 Jul 2013
TL;DR: In this article, a home automation system and method are disclosed for configuring a device state: a server receives an input from a client device, configures the device state in its database accordingly, and sends the configured device state back to the client device.
Abstract: A home automation system and method are disclosed for configuring a device state including but not limited to receiving an input from the client device at the server, configuring the device state in the database at the server in accordance with the input, and sending the configured device state from the server to the client device.

331 citations

Patent
14 Feb 2006
TL;DR: In this article, a method is described for interacting with a controllable device in an internet protocol television (IPTV) system: a control server receives device state data from a client device, reflects that state in user interface (UI) data accessed from a database, and sends the updated UI data back to the first client device.
Abstract: In one embodiment a method is disclosed for interacting with a controllable device in an internet protocol television (IPTV) system. The method receives at a control server, device state data for the controllable device from a first client device in the IPTV network; accesses user interface (UI) data from a database accessible to the control server; reflects the device state data in the UI data at the control server; and sends the UI data from an IPTV server to the first client device. In another embodiment a system is disclosed for interacting with a controllable device in an internet protocol television (IPTV) system. The system receives at a control server, device state data for the controllable device from a first client device in the IPTV network; accesses user interface (UI) data from a database at the control server; reflects the device state data in the UI at the control server; and sends the UI from an IPTV server to the first client device.

327 citations

Patent
04 Oct 2011
TL;DR: A self-contained wireless interactive speech recognition control device and system that integrates with automated systems and appliances to provide totally hands-free speech control capabilities for a given space is described in this article.
Abstract: A self-contained wireless interactive speech recognition control device and system that integrates with automated systems and appliances to provide totally hands-free speech control capabilities for a given space. Preferably, each device comprises a programmable microcontroller having embedded speech recognition and audio output capabilities, a microphone, a speaker and a wireless communication system through which a plurality of devices can communicate with each other and with one or more system controllers or automated mechanisms. The device may be enclosed in a stand-alone housing or within a standard electrical wall box. Several devices may be installed in close proximity to one another to ensure hands-free coverage throughout the space. When two or more devices are triggered simultaneously by the same speech command, real time coordination ensures that only one device will respond to the command.

303 citations

Patent
Matthew Sharifi1
29 Sep 2015
TL;DR: In this paper, methods, systems and apparatus, including a computer program encoded on a computer storage medium, for hotword detection on multiple devices are disclosed: a first computing device receives audio data corresponding to an utterance, determines a first value corresponding to the likelihood that the utterance includes a hotword, receives a second such value determined by a second computing device, and initiates speech recognition based on a comparison of the two values.
Abstract: A method, system and apparatus, including a computer program encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes receiving, by a first computing device, audio data corresponding to an utterance. The method also includes determining a first value corresponding to a likelihood that the utterance includes a hotword. The method further includes receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The method further includes comparing the first value and the second value. The method further includes initiating speech recognition processing on the audio data based on the comparison between the first value and the second value. (Selected drawing: Figure 1)
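The cross-device comparison could look like the following minimal sketch; the threshold parameter and the strict "beats every peer" rule are assumptions, since the abstract only says recognition is initiated "based on the comparison":

```python
def should_run_asr(own_score, peer_scores, threshold=0.5):
    """Decide whether this device proceeds with speech recognition:
    its hotword likelihood must clear the (assumed) threshold and
    exceed every value reported by the other devices that heard the
    same utterance."""
    return own_score >= threshold and all(own_score > s for s in peer_scores)
```

Under this rule exactly one of the devices that heard the utterance, the one with the highest hotword likelihood, goes on to run speech recognition, which is the coordination problem the patent family addresses.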

180 citations