An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionally powered by external services with which the system can interact.

Intelligent Automated Assistant

Media objects are transformed into active, connected objects via identifiers embedded into them or their containers. In the context of a user's playback experience, a decoding process extracts the identifier from a media object and possibly additional context information and forwards it to a server. The server, in turn, maps the identifier to an action, such as returning metadata, re-directing the request to one or more other servers, requesting information from another server to identify the media object, etc. The linking process applies to broadcast objects as well as objects transmitted over networks in streaming and compressed file formats.

Connected audio and other media objects

A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.

Using context information to facilitate processing of commands in a virtual assistant

A system, method, and computer program product discover relationships among items and recommend items based on the discovered relationships. The recommendations provided by the present invention are based on user profiles that take into account actual preferences of users, without requiring users to complete questionnaires. An improved binomial log likelihood ratio analysis technique is applied, to reduce adverse effects of overstatement of coincidence and predominance of best sellers. The invention may be used, for example, to generate track lists for a personalized radio station.

Relationship discovery engine

The presently claimed invention generally relates to deriving and/or utilizing content signatures (e.g., so-called “fingerprints”). One claim recites a method of generating a fingerprint associated with a content item including: pseudo-randomly selecting a segment of the content item; and utilizing a processor or electronic processing circuitry, fingerprinting the selected segment of content item as at least an identifier of the content item. Of course, other claims and combination are provided as well.

Methods, Apparatus and Programs for Generating and Utilizing Content Signatures

The present invention provides a method and system for computerized synchronization of an audio stream having a first synchronized textual representation usable for subtitling of the audio stream in the original language, with a second synchronized textual representation which can be used as an alternative subtitle including a transcription of the original language into another language. Time synchronous links can be built between the audio stream and, for instance, the textual representations of the words spoken in the audio stream. More particularly, the second representation can inherit the synchronization between the audio stream and the first representation using structure association information determined between the first and the second representation.

Method and system for the automatic generation of multi-lingual synchronized sub-titles for audiovisual data

A method and apparatus for creating links between a representation, (e.g. text data,) and a realization, (e.g. corresponding audio data,) is provided. According to the invention the realization is structured by combining a time-stamped version of the representation generated from the realization with structural information from the representation. Thereby so called hyper links between representation and realization are created. These hyper links are used for performing search operations in realization data equivalent to those which are possible in representation data, enabling an improved access to the realization (e.g. via audio databases).

Method and apparatus for linking representation and realization data

A characteristic identifier for digital data is generated. Thereby, the information contained in a digital data set is reduced such that the resulting identifier is made comparable to another identifier made in the same manner. The generated identifiers are used for detecting identical digital data or to determine inexact copies of digital data. In one embodiment of the invention, the digital data is a digital audio signal and the characteristic identifier is called an audio signature. The comparison of identical audio data according to the invention can be carried out without a person actually listening to the audio data. The present invention can be used to establish automated processes to find potential unauthorized copies of audio data, e.g., music recordings, and therefore enables a better enforcement of copyrights in the audio industry.

Method and system for generating a characteristic identifier for digital data and for detecting identical digital data

Disclosed are a computerized method and system for the identification of identical or similar audio recordings or segments of audio recordings. Identity or similarity between a first audio segment of a first audio stream and at least a second audio segment of an at least second audio stream is determined by digitizing at least the first audio segment and the at least second audio segment of said audio streams, calculating characteristic signatures from at least one local feature of the first audio segment and the at least second audio segment, aligning the at least two characteristic signatures, comparing the at least two aligned characteristic signatures and calculating a distance between the aligned characteristic signatures and determining identity or similarity between the at least two audio segments based on the determined distance.

A method and system for the automatic detection of similar or identical segments in audio recordings

A digitized speech signal ( 600 ) is input to an F0 (fundamental frequency) processor that computes ( 610 ) a continuous F0 data from the speech signal. By the criterion voicing state transition (voiced/unvoiced transitions) the speech signal is presegmented ( 620 ) into segments. For each segment ( 630 ) it is evaluated ( 640 ) whether F0 is defined or not defined i.e. whether F0 is ON or OFF. In case of F0=OFF a candidate segment boundary is assumed as described above and, starting from that boundary, prosodic features are computed ( 650 ). The feature values are input into a classification tree and each candidate segment is classified thereby revealing, as a result, the existence or non-existence of a semantic or syntactic speech unit.

Gerhard Stenzel

Papers

Method and system for the automatic generation of multi-lingual synchronized sub-titles for audiovisual data

Method and apparatus for linking representation and realization data

Method and system for generating a characteristic identifier for digital data and for detecting identical digital data

A method and system for the automatic detection of similar or identical segments in audio recordings

Method and system for the automatic segmentation of an audio stream into semantic or syntactic units