Author

Kalyan Ganesan

Bio: Kalyan Ganesan is an academic researcher from ExxonMobil. The author has contributed to research in the topics of Code-excited linear prediction and Voice activity detection. The author has an h-index of 5 and has co-authored 6 publications receiving 420 citations.

Papers
PatentDOI
TL;DR: A method for encoding a signal that includes a speech component is described; each frame is classified into one of at least two modes based, for example, on pitch stationarity, short-term level gradient, or zero-crossing rate.
Abstract: A method for encoding a signal that includes a speech component is described. First and second linear prediction windows of a frame are analyzed to generate sets of filter coefficients. First and second pitch analysis windows of the frame are analyzed to generate pitch estimates. The frame is classified in one of at least two modes, e.g., voiced, unvoiced, and noise modes, based, for example, on pitch stationarity, short-term level gradient, or zero-crossing rate. Then the frame is encoded using the filter coefficients and pitch estimates in a particular manner depending upon the mode determination for the frame, preferably employing CELP-based encoding algorithms.
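A minimal sketch of the kind of mode decision the abstract describes, using the features it names (pitch stationarity across the two pitch windows, short-term level gradient, zero-crossing rate); the feature definitions and all thresholds are illustrative assumptions, not the patented algorithm.

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent samples whose signs differ."""
    signs = np.signbit(frame).astype(int)
    return float(np.mean(np.abs(np.diff(signs))))

def classify_frame(frame: np.ndarray,
                   pitch_estimates: tuple,
                   prev_level_db: float) -> str:
    """Very rough voiced / unvoiced-or-transient / noise decision."""
    level_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)   # short-term level
    level_gradient = level_db - prev_level_db                 # change vs. previous frame
    zcr = zero_crossing_rate(frame)
    p1, p2 = pitch_estimates                                  # from the two pitch analysis windows
    pitch_stationary = abs(p1 - p2) / max(p1, p2) < 0.15      # illustrative threshold

    if level_db < -55.0:                                      # very low energy -> background noise
        return "noise"
    if pitch_stationary and zcr < 0.15 and abs(level_gradient) < 6.0:
        return "voiced"
    return "unvoiced_or_transient"
```

In a coder like the one described, the mode returned here would then select which CELP encoding path is applied to the frame.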

282 citations

PatentDOI
TL;DR: In this patent, a bit rate Codebook Excited Linear Predictor (CELP) communication system is proposed, which includes a transmitter that organizes a signal containing speech into frames of 40 millisecond duration and classifies each frame as one of three modes: voiced and stationary, unvoiced or transient, and background noise.
Abstract: A bit rate Codebook Excited Linear Predictor (CELP) communication system which includes a transmitter that organizes a signal containing speech into frames of 40 millisecond duration, and classifies each frame as one of three modes: voiced and stationary, unvoiced or transient, and background noise.

57 citations

PatentDOI
TL;DR: In a speech recognition system, the beginning of speech is distinguished from non-speech (a cough or noise) by reverting to a non-speech decision process whenever the likelihood cost of the template (vocabulary) patterns, including silence, is worse than a predetermined threshold established by a Joker Word, which represents a non-vocabulary word score and path in the grammar graph, as discussed by the authors.
Abstract: In a speech recognition system, the beginning of speech versus non-speech (a cough or noise) is distinguished by reverting to a non-speech decision process whenever the likelihood cost of template (vocabulary) patterns, including silence, is worse than a predetermined threshold, established by a Joker Word which represents a non-vocabulary word score and path in the grammar graph.
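A hedged sketch of the rejection idea described above, assuming the recognizer exposes per-template likelihood costs and a Joker Word (non-vocabulary) cost; the function name and the cost values are made up for illustration and are not the patented implementation.

```python
def is_speech(template_costs: dict, joker_cost: float) -> bool:
    """template_costs maps each vocabulary word (and 'silence') to its
    accumulated likelihood cost; lower is better.  If even the best
    vocabulary/silence hypothesis is costlier than the joker-word path,
    revert to the non-speech decision process."""
    best_vocab_cost = min(template_costs.values())
    return best_vocab_cost <= joker_cost

# Illustrative usage: a cough scores badly against every vocabulary
# template, so the joker path wins and the input is rejected as non-speech.
costs = {"yes": 61.0, "no": 67.5, "silence": 72.0}
print(is_speech(costs, joker_cost=45.0))   # False -> treat as non-speech
```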

29 citations

PatentDOI
TL;DR: In this article, a speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters.
Abstract: A speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters. The acoustic parameters represent the speech input signal for a frame time. A plurality of template matching and cost processing circuitries are connected to a system bus, along with the speech processing circuitry, for determining, or identifying, the speech units in the input speech, by comparing the acoustic parameters with stored template patterns. The apparatus can be expanded by adding more template matching and cost processing circuitry to the bus thereby increasing the speech recognition capacity of the apparatus. Template pattern generation is advantageously aided by using a "joker" word to specify the time boundaries of utterances spoken in isolation, by finding the beginning and ending of an utterance surrounded by silence.
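To illustrate the data flow the abstract describes (per-frame acoustic parameters compared against stored template patterns by accumulating a cost), here is a deliberately simplified sketch; the log band-energy features and the frame-by-frame distance are assumptions, and real systems of this kind used dynamic time warping or similar alignment rather than this linear comparison.

```python
import numpy as np

def frame_parameters(frame: np.ndarray, n_bands: int = 8) -> np.ndarray:
    """Crude log band-energy features standing in for the patent's
    'plurality of acoustic parameters' derived each frame."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spectrum, n_bands)
    return np.log(np.array([b.sum() for b in bands]) + 1e-12)

def template_cost(utterance, template) -> float:
    """Accumulate a Euclidean distance over the overlapping frames
    (a stand-in for the patent's cost processing)."""
    n = min(len(utterance), len(template))
    return float(sum(np.linalg.norm(u - t) for u, t in zip(utterance[:n], template[:n])))

def recognize(utterance, templates: dict) -> str:
    """Return the vocabulary word whose stored template pattern is cheapest."""
    return min(templates, key=lambda word: template_cost(utterance, templates[word]))
```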

25 citations

Patent
17 Apr 1995
TL;DR: In this patent, a method of encoding a signal containing speech is employed in a bit rate Codebook Excited Linear Predictor (CELP) communication system, which includes a transmitter that organizes a signal containing speech into frames of 40 millisecond duration and classifies each frame as one of three modes: voiced and stationary, unvoiced or transient, and background noise.
Abstract: A method of encoding a signal containing speech is employed in a bit rate Codebook Excited Linear Predictor (CELP) communication system. The system includes a transmitter that organizes a signal containing speech into frames of 40 millisecond duration, and classifies each frame as one of three modes: voiced and stationary, unvoiced or transient, and background noise.

22 citations


Cited by
Patent
11 Jan 2011
TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system can interact.

1,462 citations

Patent
19 Oct 2007
TL;DR: The methods described in this patent relate to devices which, in at least certain embodiments, may include one or more sensors for providing data relating to user activity and at least one processor for causing the device to respond based on the user activity determined, at least in part, through the sensors.
Abstract: The various methods and devices described herein relate to devices which, in at least certain embodiments, may include one or more sensors for providing data relating to user activity and at least one processor for causing the device to respond based on the user activity which was determined, at least in part, through the sensors. The response by the device may include a change of state of the device, and the response may be automatically performed after the user activity is determined.
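A toy sketch of that sensor-to-response idea; the sensor types (proximity, ambient light) and the display-blanking response are assumptions chosen for illustration, not details taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class SensorReadings:
    proximity_near: bool      # e.g. device held against the ear (assumed sensor)
    ambient_light_lux: float  # assumed sensor

class Device:
    def __init__(self):
        self.display_on = True

    def respond_to_activity(self, readings: SensorReadings) -> None:
        """Determine user activity from the sensor data, then respond
        automatically with a change of device state."""
        if readings.proximity_near:
            self.display_on = False          # blank the display during a call
        elif readings.ambient_light_lux > 10.0:
            self.display_on = True

device = Device()
device.respond_to_activity(SensorReadings(proximity_near=True, ambient_light_lux=300.0))
print(device.display_on)   # False -- state changed without explicit user input
```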

844 citations

Patent
28 Sep 2012
TL;DR: In this article, a virtual assistant uses context information to supplement natural language or gestural input from a user, which helps to clarify the user's intent and reduce the number of candidate interpretations of user's input, and reduces the need for the user to provide excessive clarification input.
Abstract: A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.
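A simplified sketch of one of the phases mentioned above, using context to rerank candidate interpretations of a spoken input; the context source (a set of recent contacts) and the scoring bonus are illustrative assumptions, not the patented design.

```python
def rerank(candidates, recent_contacts):
    """candidates: (interpretation, acoustic_score) pairs from speech
    recognition; boost any interpretation mentioning a name present in
    the current context, then return the best one."""
    def combined_score(item):
        text, score = item
        bonus = 0.3 if any(name in text for name in recent_contacts) else 0.0
        return score + bonus
    return max(candidates, key=combined_score)

# "Jon" vs. "John" is acoustically ambiguous; context breaks the tie
# without asking the user for clarification.
candidates = [("call Jon", 0.55), ("call John", 0.52)]
print(rerank(candidates, recent_contacts={"John"}))   # ('call John', 0.52)
```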

593 citations

Patent
08 Sep 2006
TL;DR: In this patent, a method for building an automated assistant includes interfacing a service-oriented architecture that includes a plurality of remote services to an active ontology, where the active ontology includes at least one active processing element that models a domain.
Abstract: A method and apparatus are provided for building an intelligent automated assistant. Embodiments of the present invention rely on the concept of “active ontologies” (e.g., execution environments constructed in an ontology-like manner) to build and run applications for use by intelligent automated assistants. In one specific embodiment, a method for building an automated assistant includes interfacing a service-oriented architecture that includes a plurality of remote services to an active ontology, where the active ontology includes at least one active processing element that models a domain. At least one of the remote services is then registered for use in the domain.
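A rough sketch of the registration step described above, with invented class and method names (ActiveProcessingElement, register_service); the patent's active processing elements are execution environments far richer than this stand-in.

```python
class ActiveProcessingElement:
    """Stand-in for an active processing element that models one domain
    (e.g. 'restaurants') inside an active ontology."""
    def __init__(self, domain: str):
        self.domain = domain
        self.services = {}          # remote services registered for this domain

    def register_service(self, name: str, handler) -> None:
        """Register a remote service for use in the domain."""
        self.services[name] = handler

    def invoke(self, name: str, **kwargs):
        return self.services[name](**kwargs)

restaurants = ActiveProcessingElement("restaurants")
restaurants.register_service("find_table", lambda city, time: f"table in {city} at {time}")
print(restaurants.invoke("find_table", city="Boston", time="7pm"))
```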

389 citations

Patent
05 Jun 2009
TL;DR: In this patent, techniques and systems for implementing contextual voice commands are described: a physical input selecting a displayed data item in a first context is received, a voice input that relates the selected data item to an operation in a second context is received, and the operation is performed on the selected data item in the second context.
Abstract: Among other things, techniques and systems are disclosed for implementing contextual voice commands. On a device, a data item in a first context is displayed. On the device, a physical input selecting the displayed data item in the first context is received. On the device, a voice input that relates the selected data item to an operation in a second context is received. The operation is performed on the selected data item in the second context.
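A minimal illustrative flow for the select-then-speak pattern described above; the command strings and target contexts are hypothetical, not taken from the patent.

```python
def handle_contextual_command(selected_item: str, voice_input: str) -> str:
    """A data item selected by physical input in a first context plus a
    voice input naming an operation; the operation is applied to the
    selected item in a second context."""
    operations = {                      # hypothetical command -> target context
        "email this": "mail",
        "map this": "maps",
        "translate this": "translator",
    }
    target_context = operations.get(voice_input.lower())
    if target_context is None:
        return f"no operation matched {voice_input!r}"
    return f"performed {voice_input!r} on {selected_item!r} in the {target_context} context"

# Usage: an address is selected on a web page (first context), then spoken to.
print(handle_contextual_command("1 Infinite Loop", "map this"))
```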

385 citations