Patent

System and methods for recognizing sound and music signals in high noise and distortion

TL;DR: The authors propose a method for recognizing an audio sample by locating the most closely matching audio file in a database that indexes a large set of original recordings, where each indexed file is represented by a set of landmark timepoints and associated fingerprints.
Abstract: A method for recognizing an audio sample locates an audio file that most closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample. The method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artifacts, or transmission dropouts. The sample can be identified in a time proportional to the logarithm of the number of entries in the database; given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.
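The matching step the abstract describes can be sketched in a few lines: fingerprints retrieve candidate (file, landmark) pairs, and a histogram of landmark-time differences reveals whether the sample's fingerprints have the same time evolution as one indexed file. This is a minimal illustration with toy integer fingerprints, not the patent's actual implementation; all names and data are hypothetical.

```python
# Sketch of landmark/fingerprint matching via a time-offset histogram.
# A large vote count at a single (track, offset) pair corresponds to the
# "linearly related landmarks" test in the abstract (slope 1 for
# unstretched audio). Fingerprints here are toy integers.
from collections import defaultdict

def build_index(tracks):
    """Map fingerprint -> list of (track_id, landmark_time)."""
    index = defaultdict(list)
    for track_id, pairs in tracks.items():
        for landmark_time, fingerprint in pairs:
            index[fingerprint].append((track_id, landmark_time))
    return index

def identify(sample_pairs, index, min_votes=3):
    """Vote on (track, time offset); return the winner if it clears a threshold."""
    votes = defaultdict(int)
    for sample_time, fingerprint in sample_pairs:
        for track_id, track_time in index.get(fingerprint, []):
            votes[(track_id, track_time - sample_time)] += 1
    if not votes:
        return None
    (track_id, offset), count = max(votes.items(), key=lambda kv: kv[1])
    return track_id if count >= min_votes else None

# Toy database: (landmark_time, fingerprint) pairs per track.
db = {
    "song_a": [(0, 11), (1, 42), (2, 7), (3, 99), (4, 23)],
    "song_b": [(0, 55), (1, 42), (2, 13), (3, 8), (4, 77)],
}
index = build_index(db)
# A sample taken from song_a starting one landmark in:
sample = [(0, 42), (1, 7), (2, 99), (3, 23)]
print(identify(sample, index))  # song_a
```

Because lookup is a hash retrieval followed by a histogram peak search, matching scales with the number of retrieved fingerprint hits rather than a linear scan of the database, consistent with the abstract's logarithmic-time claim.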
Citations
Patent
02 Sep 2009
TL;DR: This patent presents systems and methods for navigating hypermedia using multiple coordinated input/output device sets, allowing a user and/or an author to control which resources are presented on which device sets (whether integrated or not), and coordinating browsing activities so that such a user interface can be employed across multiple independent systems.
Abstract: Systems and methods for navigating hypermedia using multiple coordinated input/output device sets. Disclosed systems and methods allow a user and/or an author to control what resources are presented on which device sets (whether they are integrated or not), and provide for coordinating browsing activities to enable such a user interface to be employed across multiple independent systems. Disclosed systems and methods also support new and enriched aspects and applications of hypermedia browsing and related business activities.

1,974 citations

Patent
06 Jan 2014
TL;DR: This patent presents systems and methods for navigating hypermedia using multiple coordinated input/output device sets, allowing a user and/or an author to control which resources are presented on which device sets (whether integrated or not), and coordinating browsing activities so that such a user interface can be employed across multiple independent systems.
Abstract: Systems and methods for navigating hypermedia using multiple coordinated input/output device sets. Disclosed systems and methods allow a user and/or an author to control what resources are presented on which device sets (whether they are integrated or not), and provide for coordinating browsing activities to enable such a user interface to be employed across multiple independent systems. Disclosed systems and methods also support new and enriched aspects and applications of hypermedia browsing and related business activities.

1,344 citations

Patent
23 Feb 2011
TL;DR: A smart phone senses audio, imagery, and/or other stimulus from a user's environment and acts autonomously to fulfill inferred or anticipated user desires; it can apply more or fewer resources to an image processing task depending on how successfully the task is proceeding or on the user's apparent interest in the task.
Abstract: A smart phone senses audio, imagery, and/or other stimulus from a user's environment, and acts autonomously to fulfill inferred or anticipated user desires. In one aspect, the detailed technology concerns phone-based cognition of a scene viewed by the phone's camera. The image processing tasks applied to the scene can be selected from among various alternatives by reference to resource costs, resource constraints, other stimulus information (e.g., audio), task substitutability, etc. The phone can apply more or less resources to an image processing task depending on how successfully the task is proceeding, or based on the user's apparent interest in the task. In some arrangements, data may be referred to the cloud for analysis, or for gleaning. Cognition, and identification of appropriate device response(s), can be aided by collateral information, such as context. A great number of other features and arrangements are also detailed.

1,056 citations

Patent
25 Jan 2001
TL;DR: This patent describes a decoding process that extracts an identifier (and possibly additional context information) from a media object and forwards it to a server, which in turn maps the identifier to an action, such as returning metadata, redirecting the request to one or more other servers, or requesting information from another server to identify the media object.
Abstract: Media objects are transformed into active, connected objects via identifiers embedded into them or their containers. In the context of a user's playback experience, a decoding process extracts the identifier from a media object and possibly additional context information and forwards it to a server. The server, in turn, maps the identifier to an action, such as returning metadata, re-directing the request to one or more other servers, requesting information from another server to identify the media object, etc. The linking process applies to broadcast objects as well as objects transmitted over networks in streaming and compressed file formats.

1,026 citations

Patent
04 Nov 2011
TL;DR: This patent discusses the use of portable devices (e.g., smartphones and tablet computers) in a variety of applications, such as shopping, text entry, sign language interpretation, and vision-based discovery.
Abstract: Arrangements involving portable devices (e.g., smartphones and tablet computers) are disclosed. One arrangement enables a content creator to select software with which that creator's content should be rendered—assuring continuity between artistic intention and delivery. Another utilizes a device camera to identify nearby subjects, and take actions based thereon. Others rely on near field chip (RFID) identification of objects, or on identification of audio streams (e.g., music, voice). Some technologies concern improvements to the user interfaces associated with such devices. Others involve use of these devices in connection with shopping, text entry, sign language interpretation, and vision-based discovery. Still other improvements are architectural in nature, e.g., relating to evidence-based state machines, and blackboard systems. Yet other technologies concern use of linked data in portable devices—some of which exploit GPU capabilities. Still other technologies concern computational photography. A great variety of other features and arrangements are also detailed.

679 citations

References
Journal ArticleDOI
TL;DR: The audio analysis, search, and classification engine described here reduces sounds to perceptual and acoustical features, letting users search for or retrieve sounds by any one feature or a combination of them, or by specifying previously learned classes based on these features.
Abstract: Many audio and multimedia applications would benefit from the ability to classify and search for audio based on its characteristics. The audio analysis, search, and classification engine described here reduces sounds to perceptual and acoustical features. This lets users search or retrieve sounds by any one feature or a combination of them, by specifying previously learned classes based on these features, or by selecting or entering reference sounds and asking the engine to retrieve similar or dissimilar sounds.

1,147 citations

Proceedings Article
Udi Manber1
17 Jan 1994
TL;DR: Applications of sif can be found in file management, information collecting, program reuse, file synchronization, data compression, and maybe even plagiarism detection.
Abstract: We present a tool, called sif, for finding all similar files in a large file system. Files are considered similar if they have a significant number of common pieces, even if they are very different otherwise. For example, one file may be contained, possibly with some changes, in another file, or a file may be a reorganization of another file. Finding all groups of similar files, even for as little as 25% similarity, proceeds at a rate on the order of 500 MB to 1 GB per hour. The amount of similarity and several other customized parameters can be determined by the user at a post-processing stage, which is very fast. Sif can also be used to very quickly identify all files similar to a query file using a preprocessed index. Applications of sif can be found in file management, information collecting (to remove duplicates), program reuse, file synchronization, data compression, and maybe even plagiarism detection.

821 citations
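The core idea behind sif, as the abstract describes it, is that files are similar when they share a significant number of common pieces. A minimal sketch of that idea, assuming overlapping character shingles as the "pieces" and Jaccard overlap as the similarity score (the real tool samples fingerprints for speed, which this sketch omits):

```python
# Piece-based file similarity: two texts are compared by the overlap of
# their sets of overlapping k-character shingles (Jaccard ratio).
def shingles(text, k=8):
    """All overlapping k-character substrings of text, as a set."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def similarity(a, b, k=8):
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "the quick brown fox jumps over the lazy dog " * 3
edited = original.replace("lazy", "idle")   # the same file with some changes
unrelated = "completely different content with no overlap at all"

print(round(similarity(original, edited), 2))    # high
print(round(similarity(original, unrelated), 2)) # near zero
```

An edited copy still shares most of its pieces with the original, so it scores high, while unrelated content scores near zero, matching the abstract's notion of similarity that survives reorganization and local changes.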

Patent
21 Jul 1997
TL;DR: A system analyzes and compares audio data files based on their content, producing a set of numeric values (a feature vector) that can be used to classify and rank the similarity between individual audio files, typically stored in a multimedia database or on the Web.
Abstract: A system that performs analysis and comparison of audio data files based upon the content of the data files is presented. The analysis of the audio data produces a set of numeric values (a feature vector) that can be used to classify and rank the similarity between individual audio files typically stored in a multimedia database or on the World Wide Web. The analysis also facilitates the description of user-defined classes of audio files, based on an analysis of a set of audio files that are members of a user-defined class. The system can find sounds within a longer sound, allowing an audio recording to be automatically segmented into a series of shorter audio segments.

726 citations
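The ranking step the abstract describes reduces each file to a feature vector and orders candidates by distance to the query's vector. A minimal sketch under that assumption, with illustrative file names and made-up feature values standing in for measured acoustic features:

```python
# Feature-vector similarity ranking: files are ordered by Euclidean
# distance between their feature vectors and the query's vector.
# The vectors here are toy values, not real acoustic measurements.
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_by_similarity(query, library):
    """Return file names ordered from most to least similar to the query."""
    return sorted(library, key=lambda name: euclidean(query, library[name]))

library = {
    "door_slam.wav":  [0.9, 0.1, 0.2],
    "door_close.wav": [0.8, 0.2, 0.3],
    "birdsong.wav":   [0.1, 0.9, 0.8],
}
query = [0.88, 0.12, 0.22]  # feature vector computed from the query sound
print(rank_by_similarity(query, library))  # door_slam first, birdsong last
```

The same distance can drive the abstract's other uses: a user-defined class becomes a region of feature space fit to member files, and segmentation becomes a scan for where a window's features stop matching.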

Proceedings ArticleDOI
06 Oct 1997
TL;DR: A system retrieves audio documents by acoustic similarity, with a similarity measure based on statistics derived from a supervised vector quantizer rather than on matching simple pitch or spectral characteristics, so that it learns distinguishing audio features from corpora of simple sounds and musical excerpts.
Abstract: Though many systems exist for content-based retrieval of images, little work has been done on the audio portion of the multimedia stream. This paper presents a system to retrieve audio documents by acoustic similarity. The similarity measure is based on statistics derived from a supervised vector quantizer, rather than matching simple pitch or spectral characteristics. The system is thus able to learn distinguishing audio features while ignoring unimportant variation. Both theoretical and experimental results are presented, including quantitative measures of retrieval performance. Retrieval was tested on a corpus of simple sounds as well as a corpus of musical excerpts. The system is purely data-driven and does not depend on particular audio characteristics. Given a suitable parameterization, this method may thus be applicable to image retrieval as well.

487 citations
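The quantizer-based similarity the abstract describes can be sketched as: map each frame's feature vector to its nearest codeword, summarize a document by its codeword-usage histogram, and compare documents by histogram similarity. The codebook and frame data below are toy values, and the supervised training of the quantizer (the paper's key contribution) is omitted:

```python
# Quantizer-histogram similarity: frames -> nearest codeword ->
# normalized usage histogram -> cosine similarity between histograms.
import math

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # toy 3-codeword codebook

def nearest(frame):
    """Index of the codeword closest to this frame vector."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(frame, codebook[i])))

def histogram(frames):
    counts = [0] * len(codebook)
    for f in frames:
        counts[nearest(f)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

doc_a = [(0.1, 0.0), (0.9, 0.1), (1.1, 0.0)]   # uses codewords 0 and 1
doc_b = [(0.0, 0.1), (1.0, 0.1), (0.9, 0.0)]   # similar usage pattern
doc_c = [(0.1, 0.9), (0.0, 1.1), (0.1, 1.0)]   # mostly codeword 2
print(cosine(histogram(doc_a), histogram(doc_b)))  # high
print(cosine(histogram(doc_a), histogram(doc_c)))  # low
```

Because documents are compared via codeword statistics rather than raw pitch or spectra, the measure reflects whatever distinctions the (trained) quantizer has learned to encode, which is the paper's route to ignoring unimportant variation.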

Journal ArticleDOI
TL;DR: The state of the art in audio information retrieval is reviewed, and recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity are presented with a view towards making audio less “opaque”.
Abstract: The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an "AltaVista" for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less "opaque". A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.

450 citations