An industrial-strength audio search algorithm

01 Jan 2004, pp. 582-588
TL;DR: The authors developed and commercially deployed a flexible audio search engine that is noise and distortion resistant, computationally efficient, and massively scalable. It can quickly identify a short segment of music captured through a cellphone microphone, in the presence of foreground voices and other dominant noise and through voice codec compression.
Abstract: We have developed and commercially deployed a flexible audio search engine. The algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a cellphone microphone in the presence of foreground voices and other dominant noise, and through voice codec compression, out of a database of over a million tracks. The algorithm uses a combinatorially hashed time-frequency constellation analysis of the audio, yielding unusual properties such as transparency, in which multiple tracks mixed together may each be identified. Furthermore, for applications such as radio monitoring, search times on the order of a few milliseconds per query are attained, even on a massive music database.
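
The core idea in the abstract, hashing combinations of time-frequency peaks and then checking that the matching hashes agree on a single time offset, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the spectrogram peak picker is omitted, and the names pair_hashes, build_index and identify, as well as the parameters FAN_OUT and MAX_DT, are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical parameters, chosen for illustration only.
FAN_OUT = 3   # max number of target peaks paired with each anchor peak
MAX_DT = 50   # max time gap (in frames) between anchor and target


def pair_hashes(peaks):
    """Yield ((f1, f2, dt), t1) for anchor/target peak pairs.

    `peaks` is a list of (time_frame, freq_bin) tuples, assumed to come
    from some spectrogram peak picker that is not shown here.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > MAX_DT:
                break          # peaks are time-sorted, so stop early
            if dt == 0:
                continue       # skip peaks in the same frame
            yield (f1, f2, dt), t1
            paired += 1
            if paired == FAN_OUT:
                break


def build_index(tracks):
    """Map each hash to the (track_id, anchor_time) pairs where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in pair_hashes(peaks):
            index[h].append((track_id, t))
    return index


def identify(index, query_peaks):
    """Vote on (track, time offset); a true match piles votes on one offset."""
    votes = Counter()
    for h, t_query in pair_hashes(query_peaks):
        for track_id, t_track in index.get(h, ()):
            votes[(track_id, t_track - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), score = votes.most_common(1)[0]
    return track_id, offset, score
```

Because each hash depends only on two frequencies and a time difference, a query excerpt produces the same hashes wherever it starts in the track; only the stored anchor times differ, and they differ by a constant offset. Counting votes per (track, offset) pair is one simple way to exploit that, and it also suggests why a mixture of tracks could still yield a separate vote peak for each one.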

Citations
Patent
09 May 2005
TL;DR: In this article, the identification results for successive fingerprints are prepared while using a series of fingerprints for the series of blocks, whereby an identification result depicts an association of a block of information units with a predetermined information entity.
Abstract: In order to analyze an information signal, which has a series of blocks of information units, whereby a number of successive blocks of the series of blocks depicts an information entity, identification results for successive fingerprints are prepared (12) while using a series of fingerprints for the series of blocks, whereby an identification result depicts an association of a block of information units with a predetermined information entity. After this, at least two hypotheses are formed (14) from the identification results for the successive fingerprints. A first hypothesis is an assumption for the association of the series of blocks with a first information entity, and the second hypothesis is an assumption for the association of the series of blocks with the second information entity. Afterwards, different hypotheses are tested (16) in order to obtain a test result on the basis of which an assertion concerning the information signal is made (20). This results in obtaining a meaningful and reliable continuous-time analysis of an information signal.

94 citations

Patent
04 May 2010
TL;DR: Systems and methods for recognizing sounds are provided. User input relating to one or more sounds is received from a computing device, and a processor executes instructions to discriminate the one or more sounds, extract music features from them, analyze the features using one or more databases, and obtain information regarding the features based on the analysis.
Abstract: Systems and methods for recognizing sounds are provided herein. User input relating to one or more sounds is received from a computing device. Instructions, which are stored in memory, are executed by a processor to discriminate the one or more sounds, extract music features from the one or more sounds, analyze the music features using one or more databases, and obtain information regarding the music features based on the analysis. Further, information regarding the music features of the one or more sounds may be transmitted to display on the computing device.

74 citations

Patent
21 May 2013
TL;DR: Systems and methods for searching databases by sound data input are described, providing a search technology that furnishes search results in a fast and accurate manner.
Abstract: Systems and methods for searching databases by sound data input are provided herein. A service provider may have a need to make their database(s) searchable through search technology. However, the service provider may not have the resources to implement such search technology. The search technology may allow for search queries using sound data input. The technology described herein provides a solution addressing the service provider's need, by giving a search technology that furnishes search results in a fast, accurate manner. In further embodiments, systems and methods to monetize those search results are also described herein.

72 citations

Proceedings ArticleDOI
25 Jun 2013
TL;DR: A user study is presented to demonstrate that novice programmers can implement the core logic of interesting apps with Auditeur in less than 30 minutes, using only 15 - 20 lines of Java code.
Abstract: Auditeur is a general-purpose, energy-efficient, and context-aware acoustic event detection platform for smartphones. It enables app developers to have their app register for and get notified on a wide variety of acoustic events. Auditeur is backed by a cloud service to store user contributed sound clips and to generate an energy-efficient and context-aware classification plan for the phone. When an acoustic event type has been registered, the smartphone instantiates the necessary acoustic processing modules and wires them together to execute the plan. The phone then captures, processes, and classifies acoustic events locally and efficiently. Our analysis on user-contributed empirical data shows that Auditeur's energy-aware acoustic feature selection algorithm is capable of increasing the device lifetime by 33.4%, sacrificing less than 2% of the maximum achievable accuracy. We implement seven apps with Auditeur, and deploy them in real-world scenarios to demonstrate that Auditeur is versatile, 11.04% - 441.42% less power hungry, and 10.71% - 13.86% more accurate in detecting acoustic events, compared to state-of-the-art techniques. We present a user study to demonstrate that novice programmers can implement the core logic of interesting apps with Auditeur in less than 30 minutes, using only 15 - 20 lines of Java code.

67 citations

Proceedings ArticleDOI
01 Nov 2015
TL;DR: Small UAV (Unmanned Aerial Vehicle) systems are now very popular and their price is falling quickly; they can be misused for smuggling, observation, violation of privacy, etc.
Abstract: Nowadays small size UAV (Unmanned Aerial Vehicle) systems are really popular. We can find many of them on video sharing portals, and there are many articles in the media about their advantages and disadvantages. We can use "drones" for many useful tasks, but on the other hand the risk is increasing with their growing popularity. We can find articles about unethical and unlawful uses of them. They can be used for smuggling, observation, violation of privacy, etc. As a consequence of the growing popularity, the price of these devices is quickly becoming lower. High quality cameras are also widespread, and it is easy to install one onto a commercially available drone. We have to admit that these factors are becoming increasingly critical.

47 citations

References
Journal ArticleDOI
TL;DR: The audio analysis, search, and classification engine described here reduces sounds to perceptual and acoustical features, which lets users search or retrieve sounds by any one feature or a combination of them, by specifying previously learned classes based on these features.
Abstract: Many audio and multimedia applications would benefit from the ability to classify and search for audio based on its characteristics. The audio analysis, search, and classification engine described here reduces sounds to perceptual and acoustical features. This lets users search or retrieve sounds by any one feature or a combination of them, by specifying previously learned classes based on these features, or by selecting or entering reference sounds and asking the engine to retrieve similar or dissimilar sounds.

1,147 citations

Proceedings Article
01 Jan 2002
TL;DR: An audio fingerprinting system is presented in which the fingerprint of an unknown audio clip is used as a query on a fingerprint database containing the fingerprints of a large library of songs, so that the audio clip can be identified.
Abstract: Imagine the following situation. You’re in your car, listening to the radio and suddenly you hear a song that catches your attention. It’s the best new song you have heard for a long time, but you missed the announcement and don’t recognize the artist. Still, you would like to know more about this music. What should you do? You could call the radio station, but that’s too cumbersome. Wouldn’t it be nice if you could push a few buttons on your mobile phone and a few seconds later the phone would respond with the name of the artist and the title of the music you’re listening to? Perhaps even sending an email to your default email address with some supplemental information. In this paper we present an audio fingerprinting system, which makes the above scenario possible. By using the fingerprint of an unknown audio clip as a query on a fingerprint database, which contains the fingerprints of a large library of songs, the audio clip can be identified. At the core of the presented system are a highly robust fingerprint extraction method and a very efficient fingerprint search strategy, which enables searching a large fingerprint database with only limited computing resources.

911 citations

Proceedings ArticleDOI
Cheng Yang
21 Oct 2001
TL;DR: The algorithm tries to capture the intuitive notion of similarity perceived by humans: two pieces are similar if they are fully or partially based on the same score, even if they were performed by different people or at different speed.
Abstract: We present a prototype method of indexing raw-audio music files in a way that facilitates content-based similarity retrieval. The algorithm tries to capture the intuitive notion of similarity perceived by humans: two pieces are similar if they are fully or partially based on the same score, even if they are performed by different people or at different speed. Local peaks in signal power are identified in each audio file, and a spectral vector is extracted near each peak. Nearby peaks are selectively grouped together to form "characteristic sequences" which are used as the basis for indexing. A hashing scheme known as "locality-sensitive hashing" is employed to index the high-dimensional vectors. Retrieval results are ranked based on the number of final matches filtered by some linearity criteria.
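
To make the indexing step in this abstract more concrete, here is a minimal Python sketch of one common locality-sensitive hashing family, random-hyperplane (sign) hashing. The abstract does not say which LSH family the paper uses, so this choice is an assumption, and the names make_hasher and build_buckets are hypothetical.

```python
import random
from collections import defaultdict


def make_hasher(dim, n_bits, seed=0):
    """Return a function mapping a `dim`-dimensional vector to an n_bits key.

    Each bit records which side of a random hyperplane the vector lies on,
    so vectors pointing in similar directions tend to share a key.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def lsh_key(vec):
        return tuple(
            1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in planes
        )

    return lsh_key


def build_buckets(vectors, lsh_key):
    """Group named vectors by LSH key; a bucket holds candidate neighbours."""
    buckets = defaultdict(list)
    for name, vec in vectors.items():
        buckets[lsh_key(vec)].append(name)
    return buckets
```

With this family, scaling a vector by a positive constant never changes its key, so a louder copy of the same spectral vector lands in the same bucket. Candidates retrieved from a bucket would still need a further check, such as the linearity criteria the abstract mentions, before being counted as matches.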

75 citations