Patent

System and methods for recognizing sound and music signals in high noise and distortion

TL;DR: The authors propose a method for recognizing an audio sample by locating the most closely matching audio file in a database that indexes a large set of original recordings, where each indexed file is represented by a set of landmark timepoints and associated fingerprints.
Abstract: A method for recognizing an audio sample locates an audio file that most closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample. The method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artifacts, or transmission dropouts. The sample can be identified in a time proportional to the logarithm of the number of entries in the database; given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.
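The matching step the abstract describes can be sketched in a few lines: fingerprints retrieve candidate (file, landmark) pairs, and a histogram of landmark-time differences reveals whether the sample's fingerprints have the same time evolution as one indexed file. This is a minimal illustration with toy integer fingerprints, not the patent's actual implementation; all names and data are hypothetical.

```python
# Sketch of landmark/fingerprint matching via a time-offset histogram.
# A large vote count at a single (track, offset) pair corresponds to the
# "linearly related landmarks" test in the abstract (slope 1 for
# unstretched audio). Fingerprints here are toy integers.
from collections import defaultdict

def build_index(tracks):
    """Map fingerprint -> list of (track_id, landmark_time)."""
    index = defaultdict(list)
    for track_id, pairs in tracks.items():
        for landmark_time, fingerprint in pairs:
            index[fingerprint].append((track_id, landmark_time))
    return index

def identify(sample_pairs, index, min_votes=3):
    """Vote on (track, time offset); return the winner if it clears a threshold."""
    votes = defaultdict(int)
    for sample_time, fingerprint in sample_pairs:
        for track_id, track_time in index.get(fingerprint, []):
            votes[(track_id, track_time - sample_time)] += 1
    if not votes:
        return None
    (track_id, offset), count = max(votes.items(), key=lambda kv: kv[1])
    return track_id if count >= min_votes else None

# Toy database: (landmark_time, fingerprint) pairs per track.
db = {
    "song_a": [(0, 11), (1, 42), (2, 7), (3, 99), (4, 23)],
    "song_b": [(0, 55), (1, 42), (2, 13), (3, 8), (4, 77)],
}
index = build_index(db)
# A sample taken from song_a starting one landmark in:
sample = [(0, 42), (1, 7), (2, 99), (3, 23)]
print(identify(sample, index))  # song_a
```

Because lookup is a hash retrieval followed by a histogram peak search, matching scales with the number of retrieved fingerprint hits rather than a linear scan of the database, consistent with the abstract's logarithmic-time claim.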
Citations
Patent
02 Sep 2009
TL;DR: This patent presents systems and methods for navigating hypermedia using multiple coordinated input/output device sets, allowing a user and/or an author to control which resources are presented on which device sets (whether integrated or not), and coordinating browsing activities so that such a user interface can be employed across multiple independent systems.
Abstract: Systems and methods for navigating hypermedia using multiple coordinated input/output device sets. Disclosed systems and methods allow a user and/or an author to control what resources are presented on which device sets (whether they are integrated or not), and provide for coordinating browsing activities to enable such a user interface to be employed across multiple independent systems. Disclosed systems and methods also support new and enriched aspects and applications of hypermedia browsing and related business activities.

1,974 citations

Patent
06 Jan 2014
TL;DR: This patent presents systems and methods for navigating hypermedia using multiple coordinated input/output device sets, allowing a user and/or an author to control which resources are presented on which device sets (whether integrated or not), and coordinating browsing activities so that such a user interface can be employed across multiple independent systems.
Abstract: Systems and methods for navigating hypermedia using multiple coordinated input/output device sets. Disclosed systems and methods allow a user and/or an author to control what resources are presented on which device sets (whether they are integrated or not), and provide for coordinating browsing activities to enable such a user interface to be employed across multiple independent systems. Disclosed systems and methods also support new and enriched aspects and applications of hypermedia browsing and related business activities.

1,344 citations

Patent
23 Feb 2011
TL;DR: A smart phone senses audio, imagery, and/or other stimulus from a user's environment and acts autonomously to fulfill inferred or anticipated user desires; it can apply more or fewer resources to an image processing task depending on how successfully the task is proceeding or on the user's apparent interest in the task.
Abstract: A smart phone senses audio, imagery, and/or other stimulus from a user's environment, and acts autonomously to fulfill inferred or anticipated user desires. In one aspect, the detailed technology concerns phone-based cognition of a scene viewed by the phone's camera. The image processing tasks applied to the scene can be selected from among various alternatives by reference to resource costs, resource constraints, other stimulus information (e.g., audio), task substitutability, etc. The phone can apply more or less resources to an image processing task depending on how successfully the task is proceeding, or based on the user's apparent interest in the task. In some arrangements, data may be referred to the cloud for analysis, or for gleaning. Cognition, and identification of appropriate device response(s), can be aided by collateral information, such as context. A great number of other features and arrangements are also detailed.

1,056 citations

Patent
25 Jan 2001
TL;DR: This patent describes a decoding process that extracts an identifier (and possibly additional context information) from a media object and forwards it to a server, which in turn maps the identifier to an action, such as returning metadata, redirecting the request to one or more other servers, or requesting information from another server to identify the media object.
Abstract: Media objects are transformed into active, connected objects via identifiers embedded into them or their containers. In the context of a user's playback experience, a decoding process extracts the identifier from a media object and possibly additional context information and forwards it to a server. The server, in turn, maps the identifier to an action, such as returning metadata, re-directing the request to one or more other servers, requesting information from another server to identify the media object, etc. The linking process applies to broadcast objects as well as objects transmitted over networks in streaming and compressed file formats.

1,026 citations

Patent
04 Nov 2011
TL;DR: This patent discusses the use of portable devices (e.g., smartphones and tablet computers) in a variety of applications, such as shopping, text entry, sign language interpretation, and vision-based discovery.
Abstract: Arrangements involving portable devices (e.g., smartphones and tablet computers) are disclosed. One arrangement enables a content creator to select software with which that creator's content should be rendered—assuring continuity between artistic intention and delivery. Another utilizes a device camera to identify nearby subjects, and take actions based thereon. Others rely on near field chip (RFID) identification of objects, or on identification of audio streams (e.g., music, voice). Some technologies concern improvements to the user interfaces associated with such devices. Others involve use of these devices in connection with shopping, text entry, sign language interpretation, and vision-based discovery. Still other improvements are architectural in nature, e.g., relating to evidence-based state machines, and blackboard systems. Yet other technologies concern use of linked data in portable devices—some of which exploit GPU capabilities. Still other technologies concern computational photography. A great variety of other features and arrangements are also detailed.

679 citations

References
Journal ArticleDOI
TL;DR: The audio analysis, search, and classification engine described here reduces sounds to perceptual and acoustical features, letting users search for or retrieve sounds by any one feature or a combination of them, or by specifying previously learned classes based on these features.
Abstract: Many audio and multimedia applications would benefit from the ability to classify and search for audio based on its characteristics. The audio analysis, search, and classification engine described here reduces sounds to perceptual and acoustical features. This lets users search or retrieve sounds by any one feature or a combination of them, by specifying previously learned classes based on these features, or by selecting or entering reference sounds and asking the engine to retrieve similar or dissimilar sounds.

1,147 citations

Proceedings Article
Udi Manber1
17 Jan 1994
TL;DR: Applications of sif can be found in file management, information collecting, program reuse, file synchronization, data compression, and maybe even plagiarism detection.
Abstract: We present a tool, called sif, for finding all similar files in a large file system. Files are considered similar if they have a significant number of common pieces, even if they are very different otherwise. For example, one file may be contained, possibly with some changes, in another file, or a file may be a reorganization of another file. Finding all groups of similar files, even for as little as 25% similarity, proceeds at a rate on the order of 500 MB to 1 GB per hour. The amount of similarity and several other customized parameters can be determined by the user at a post-processing stage, which is very fast. Sif can also be used to very quickly identify all files similar to a query file using a preprocessed index. Applications of sif can be found in file management, information collecting (to remove duplicates), program reuse, file synchronization, data compression, and maybe even plagiarism detection.

821 citations
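The core idea behind sif, as the abstract describes it, is that files are similar when they share a significant number of common pieces. A minimal sketch of that idea, assuming overlapping character shingles as the "pieces" and Jaccard overlap as the similarity score (the real tool samples fingerprints for speed, which this sketch omits):

```python
# Piece-based file similarity: two texts are compared by the overlap of
# their sets of overlapping k-character shingles (Jaccard ratio).
def shingles(text, k=8):
    """All overlapping k-character substrings of text, as a set."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def similarity(a, b, k=8):
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "the quick brown fox jumps over the lazy dog " * 3
edited = original.replace("lazy", "idle")   # the same file with some changes
unrelated = "completely different content with no overlap at all"

print(round(similarity(original, edited), 2))    # high
print(round(similarity(original, unrelated), 2)) # near zero
```

An edited copy still shares most of its pieces with the original, so it scores high, while unrelated content scores near zero, matching the abstract's notion of similarity that survives reorganization and local changes.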

Patent
21 Jul 1997
TL;DR: A system analyzes and compares audio data files based on their content, producing a set of numeric values (a feature vector) that can be used to classify and rank the similarity between individual audio files, typically stored in a multimedia database or on the Web.
Abstract: A system that performs analysis and comparison of audio data files based upon the content of the data files is presented. The analysis of the audio data produces a set of numeric values (a feature vector) that can be used to classify and rank the similarity between individual audio files typically stored in a multimedia database or on the World Wide Web. The analysis also facilitates the description of user-defined classes of audio files, based on an analysis of a set of audio files that are members of a user-defined class. The system can find sounds within a longer sound, allowing an audio recording to be automatically segmented into a series of shorter audio segments.

726 citations
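The ranking step the abstract describes reduces each file to a feature vector and orders candidates by distance to the query's vector. A minimal sketch under that assumption, with illustrative file names and made-up feature values standing in for measured acoustic features:

```python
# Feature-vector similarity ranking: files are ordered by Euclidean
# distance between their feature vectors and the query's vector.
# The vectors here are toy values, not real acoustic measurements.
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_by_similarity(query, library):
    """Return file names ordered from most to least similar to the query."""
    return sorted(library, key=lambda name: euclidean(query, library[name]))

library = {
    "door_slam.wav":  [0.9, 0.1, 0.2],
    "door_close.wav": [0.8, 0.2, 0.3],
    "birdsong.wav":   [0.1, 0.9, 0.8],
}
query = [0.88, 0.12, 0.22]  # feature vector computed from the query sound
print(rank_by_similarity(query, library))  # door_slam first, birdsong last
```

The same distance can drive the abstract's other uses: a user-defined class becomes a region of feature space fit to member files, and segmentation becomes a scan for where a window's features stop matching.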

Proceedings ArticleDOI
06 Oct 1997
TL;DR: A system retrieves audio documents by acoustic similarity, with a similarity measure based on statistics derived from a supervised vector quantizer rather than on matching simple pitch or spectral characteristics, so that it learns distinguishing audio features from corpora of simple sounds and musical excerpts.
Abstract: Though many systems exist for content-based retrieval of images, little work has been done on the audio portion of the multimedia stream. This paper presents a system to retrieve audio documents by acoustic similarity. The similarity measure is based on statistics derived from a supervised vector quantizer, rather than matching simple pitch or spectral characteristics. The system is thus able to learn distinguishing audio features while ignoring unimportant variation. Both theoretical and experimental results are presented, including quantitative measures of retrieval performance. Retrieval was tested on a corpus of simple sounds as well as a corpus of musical excerpts. The system is purely data-driven and does not depend on particular audio characteristics. Given a suitable parameterization, this method may thus be applicable to image retrieval as well.

487 citations
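The quantizer-based similarity the abstract describes can be sketched as: map each frame's feature vector to its nearest codeword, summarize a document by its codeword-usage histogram, and compare documents by histogram similarity. The codebook and frame data below are toy values, and the supervised training of the quantizer (the paper's key contribution) is omitted:

```python
# Quantizer-histogram similarity: frames -> nearest codeword ->
# normalized usage histogram -> cosine similarity between histograms.
import math

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # toy 3-codeword codebook

def nearest(frame):
    """Index of the codeword closest to this frame vector."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(frame, codebook[i])))

def histogram(frames):
    counts = [0] * len(codebook)
    for f in frames:
        counts[nearest(f)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

doc_a = [(0.1, 0.0), (0.9, 0.1), (1.1, 0.0)]   # uses codewords 0 and 1
doc_b = [(0.0, 0.1), (1.0, 0.1), (0.9, 0.0)]   # similar usage pattern
doc_c = [(0.1, 0.9), (0.0, 1.1), (0.1, 1.0)]   # mostly codeword 2
print(cosine(histogram(doc_a), histogram(doc_b)))  # high
print(cosine(histogram(doc_a), histogram(doc_c)))  # low
```

Because documents are compared via codeword statistics rather than raw pitch or spectra, the measure reflects whatever distinctions the (trained) quantizer has learned to encode, which is the paper's route to ignoring unimportant variation.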

Journal ArticleDOI
TL;DR: The state of the art in audio information retrieval is reviewed, and recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity are presented with a view towards making audio less “opaque”.
Abstract: The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an "AltaVista" for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less "opaque". A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.

450 citations