Proceedings ArticleDOI

Robust audio fingerprint extraction algorithm based on 2-D chroma

16 Jul 2012-pp 763-767
TL;DR: An improved version of the audio fingerprint extraction algorithm originally proposed by the Shazam company is presented; it uses a combinatorial hashed time-frequency analysis of the audio, yielding unusual properties by which multiple tracks mixed together may each be identified.
Abstract: Audio fingerprinting, like a human fingerprint, successfully identifies audio clips in large databases, even when the audio signals are slightly or seriously distorted. In this paper we propose an improvement, based on 2-D chroma, of the audio fingerprint extraction algorithm originally proposed by the Shazam company. The algorithm uses a combinatorial hashed time-frequency analysis of the audio, yielding unusual properties by which multiple tracks mixed together may each be identified. Experimental results verify the improvement in retrieval speed and accuracy.
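The abstract does not spell out the 2-D chroma feature; the citing survey below describes it as local maximum chroma energy (LMCE) over the time-frequency plane. A minimal sketch of that idea, folding STFT bins onto 12 pitch classes and keeping each frame's maximal class (all parameter values and function names here are assumed, not taken from the paper):

```python
import numpy as np

def chroma_peaks(signal, sr=8000, n_fft=1024, hop=512, fmin=55.0):
    """Map STFT magnitude bins to 12 chroma classes per frame, then keep
    each frame's locally maximal chroma bin (an LMCE-style sketch;
    parameter values are illustrative, not from the paper)."""
    # Frame the signal and take magnitude spectra.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(np.abs(np.fft.rfft(signal[start:start + n_fft])))
    spec = np.array(frames)                      # (n_frames, n_fft//2 + 1)

    # Fold each frequency bin onto a pitch class (0..11).
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    chroma = np.zeros((spec.shape[0], 12))
    for b, f in enumerate(freqs):
        if f < fmin:
            continue
        pc = int(np.round(12 * np.log2(f / fmin))) % 12
        chroma[:, pc] += spec[:, b]

    # Keep, per frame, only the chroma class with maximal energy.
    peaks = np.argmax(chroma, axis=1)
    return chroma, peaks
```

For a fingerprint, the per-frame peak classes (or pairs of peaks across frames) would then be hashed, as in the Shazam-style scheme the paper builds on.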
Citations
Journal ArticleDOI
TL;DR: An inclusive survey on key indoor technologies and techniques is carried out with a view to exploring their various benefits, limitations, and areas for improvement; it advocates hybridization of technologies as an effective approach to achieving reliable IoT-based indoor systems.

88 citations


Cites methods from "Robust audio fingerprint extraction..."

  • ...Thus, efforts are made in developing a more robust algorithm for easy implementation of the technique as found in [209-218]....


Journal ArticleDOI
09 Jan 2018-Sensors
TL;DR: Reviews audio fingerprinting techniques applicable to acoustic data acquired from mobile devices, covering works published between 2002 and 2017.
Abstract: An increase in the accuracy of identification of Activities of Daily Living (ADL) is very important for different goals of Enhanced Living Environments and for Ambient Assisted Living (AAL) tasks. This increase may be achieved through identification of the surrounding environment. Although this is usually used to identify the location, ADL recognition can be improved with the identification of the sound in that particular environment. This paper reviews audio fingerprinting techniques that can be used with the acoustic data acquired from mobile devices. A comprehensive literature search was conducted in order to identify relevant English language works aimed at the identification of the environment of ADLs using data acquired with mobile devices, published between 2002 and 2017. In total, 40 studies were analyzed and selected from 115 citations. The results highlight several audio fingerprinting techniques, including Modified discrete cosine transform (MDCT), Mel-frequency cepstrum coefficients (MFCC), Principal Component Analysis (PCA), Fast Fourier Transform (FFT), Gaussian mixture models (GMM), likelihood estimation, logarithmic moduled complex lapped transform (LMCLT), support vector machine (SVM), constant Q transform (CQT), symmetric pairwise boosting (SPB), Philips robust hash (PRH), linear discriminant analysis (LDA) and discrete cosine transform (DCT).

26 citations


Cites background or methods from "Robust audio fingerprint extraction..."

  • ...The authors of [34] proposed an audio fingerprinting system with several characteristics, including robustness, granularity, and retrieval speed, reporting an accuracy between 88% and 99%....


  • ...[34] The authors present an audio fingerprinting algorithm, where the audio fingerprints are produced based on a 2-D image, reporting an accuracy between 88% and 99%....


  • ...The structure of the audio fingerprinting implemented is the same as all other algorithms presented in [34], applying the FFT and a high-pass filter....


  • ...The authors used the local maximum chroma energy (LMCE) to extract the perception features of Tempo-Frequency domain [34]....


  • ...[34] 2012 20 music clips with 5 s Proposes an audio fingerprinting algorithm for recognition of some clips Not mentioned No No...


Journal ArticleDOI
TL;DR: Combined with linear prediction-minimum mean squared error (LP-MMSE), an efficient perceptual hashing algorithm based on improved spectral entropy is proposed for speech authentication; experimental results show that it outperforms existing methods in compactness.
Abstract: Combined with the linear prediction-minimum mean squared error (LP-MMSE), an efficient perceptual hashing algorithm based on improved spectral entropy for speech authentication is proposed in this paper. Linear prediction analysis is conducted on the speech signal after preprocessing, framing, and windowing, yielding the minimum mean squared error coefficient matrix. The spectral entropy parameter matrix of each frame is then calculated using the improved spectral entropy method. The final binary perceptual hash sequence is generated from these two matrices, completing the speech authentication. Comparing experimental results against the Teager energy operator (TEO) combined with linear predictive coefficients (LPC), LP-MMSE, and line spectrum pair (LSP) coefficients respectively shows that the proposed algorithm achieves a good compromise between robustness, discrimination, and authentication efficiency, and that it can meet the requirement of real-time speech authentication in speech communication. Experimental results also show that the proposed algorithm is more compact than other existing methods.
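As an illustration of the general idea (not the paper's LP-MMSE pipeline), a toy perceptual hash can be built by thresholding per-frame spectral entropy and comparing clips by Hamming distance; all names and parameters below are assumed:

```python
import numpy as np

def spectral_entropy_hash(signal, n_fft=256, hop=128):
    """Toy perceptual hash: one bit per frame, set when the frame's
    spectral entropy exceeds the clip median (illustrative only; the
    paper combines entropy with an LP-MMSE coefficient matrix)."""
    entropies = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        mag = np.abs(np.fft.rfft(signal[start:start + n_fft])) ** 2
        p = mag / (mag.sum() + 1e-12)            # normalise to a distribution
        entropies.append(-(p * np.log2(p + 1e-12)).sum())
    entropies = np.array(entropies)
    return (entropies > np.median(entropies)).astype(int)

def hash_distance(h1, h2):
    """Normalised Hamming distance used to compare two clips' hashes."""
    return np.mean(h1 != h2)
```

Authentication would then accept a clip when its hash distance to the reference falls below a threshold chosen for the desired robustness/discrimination trade-off.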

16 citations

Patent
12 Jan 2018
TL;DR: An audio matching method, device, and electronic equipment are described, in which audio matching results of individual audio segments are merged to obtain a matching result for the to-be-matched audio data.
Abstract: The invention discloses an audio matching method and device and electronic equipment. The method includes the steps that firstly, to-be-matched audio data is acquired; secondly, the to-be-matched audio data is segmented to obtain multiple to-be-matched audio segments after segmentation; thirdly, audio fingerprint features of each to-be-matched audio segment are extracted, according to the extracted audio fingerprint features, audio matching is conducted on each to-be-matched audio segment by using a pre-built audio matching library, and audio matching results of the to-be-matched audio segments are obtained; fourthly, the audio matching results of the to-be-matched audio segments are merged to obtain a matching result of the to-be-matched audio data. By means of the method, the audio retrieval efficiency can be improved.
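The four steps can be sketched as follows, where `fingerprint` and `library` stand in for the unspecified fingerprint extractor and pre-built matching library (both assumed interfaces):

```python
from collections import Counter

def match_audio(samples, fingerprint, library, seg_len=1000):
    """Sketch of the patent's pipeline: split the query into segments,
    fingerprint and look up each one, then merge the per-segment
    results by majority vote."""
    votes = Counter()
    for start in range(0, len(samples), seg_len):
        seg = samples[start:start + seg_len]
        fp = fingerprint(seg)
        hit = library.get(fp)            # per-segment match (may miss)
        if hit is not None:
            votes[hit] += 1
    # Merge: the track matched by the most segments wins.
    return votes.most_common(1)[0][0] if votes else None
```

Segmenting the query lets individual noisy segments miss without sinking the whole match, which is one plausible reason for the claimed efficiency gain.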

11 citations

Proceedings ArticleDOI
24 Oct 2013
TL;DR: A fast algorithm for audio content-based retrieval with the fingerprint technique is proposed, based on extracting the frequency features of the audio and a hash function; it achieves a high success rate and a lower response time than other techniques.
Abstract: Fingerprinting is one of the most used techniques for searching for and identifying audio, with a wide spectrum of applications. Different algorithms define different fingerprint extraction and matching techniques, with different efficiency, computational load, robustness, response time and search location. Nowadays music audio retrieval faces two main challenges in order to be efficient: robustness and speed. This article proposes a fast algorithm for audio content-based retrieval with the fingerprint technique, based on the extraction of the frequency features of the audio and a hash function. Experiments determined a high success rate and a response time lower than other techniques, optimal for real-time applications like monitoring radio stations or identifying songs.
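A hedged sketch of such frequency-feature hashing with constant-time lookup; the paper's actual feature and hash function are not given here, so the dominant-bin pairing below is purely illustrative:

```python
import numpy as np

def frame_hashes(signal, n_fft=512, hop=256):
    """Hash pairs of consecutive dominant-frequency bins into integer
    keys (an assumed feature, chosen only to illustrate the scheme)."""
    peaks = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        mag = np.abs(np.fft.rfft(signal[start:start + n_fft]))
        peaks.append(int(np.argmax(mag)))
    # Combine adjacent peaks into one hash key.
    return [(p1 << 10) | p2 for p1, p2 in zip(peaks, peaks[1:])]

def build_index(tracks):
    """Map hash -> list of (track_id, position) for fast retrieval."""
    index = {}
    for tid, sig in tracks.items():
        for pos, h in enumerate(frame_hashes(sig)):
            index.setdefault(h, []).append((tid, pos))
    return index
```

The hash table turns each query frame into a constant-time lookup, which is what makes this family of methods fast enough for radio monitoring.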

5 citations


Cites methods from "Robust audio fingerprint extraction..."

  • ...In this article, a fingerprint algorithm for audio content retrieval is proposed, based on the Shazam technique, showing high reliability and low-cost implementation [16]....


References
Proceedings Article
01 Jan 2002
TL;DR: An audio fingerprinting system that uses the fingerprint of an unknown audio clip as a query on a fingerprint database, which contains the fingerprints of a large library of songs, the audio clip can be identified.
Abstract: Imagine the following situation. You’re in your car, listening to the radio and suddenly you hear a song that catches your attention. It’s the best new song you have heard for a long time, but you missed the announcement and don’t recognize the artist. Still, you would like to know more about this music. What should you do? You could call the radio station, but that’s too cumbersome. Wouldn’t it be nice if you could push a few buttons on your mobile phone and a few seconds later the phone would respond with the name of the artist and the title of the music you’re listening to? Perhaps even sending an email to your default email address with some supplemental information. In this paper we present an audio fingerprinting system, which makes the above scenario possible. By using the fingerprint of an unknown audio clip as a query on a fingerprint database, which contains the fingerprints of a large library of songs, the audio clip can be identified. At the core of the presented system are a highly robust fingerprint extraction method and a very efficient fingerprint search strategy, which enables searching a large fingerprint database with only limited computing resources.
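The fingerprint extraction this system is known for (Haitsma & Kalker) derives one bit per band from energy differences across adjacent bands and adjacent frames; a minimal sketch, assuming the band energies have already been computed:

```python
import numpy as np

def sub_fingerprints(energies):
    """Philips-style sub-fingerprints: bit m of frame n is 1 when the
    energy difference between bands m and m+1 increases relative to the
    previous frame.  `energies` is an (n_frames, n_bands) array of band
    energies; the band filterbank itself is omitted here."""
    d = energies[:, :-1] - energies[:, 1:]      # adjacent-band differences
    bits = (d[1:] > d[:-1]).astype(int)         # compare with previous frame
    return bits                                  # (n_frames-1, n_bands-1)
```

With 33 bands this yields the 32-bit sub-fingerprints the system stores; matching counts bit errors between query and database sub-fingerprint blocks.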

911 citations

Proceedings Article
01 Jan 2003
TL;DR: The algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a cellphone microphone in the presence of foreground voices and other dominant noise, out of a database of over a million tracks.
Abstract: We have developed and commercially deployed a flexible audio search engine. The algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a cellphone microphone in the presence of foreground voices and other dominant noise, and through voice codec compression, out of a database of over a million tracks. The algorithm uses a combinatorially hashed time-frequency constellation analysis of the audio, yielding unusual properties such as transparency, in which multiple tracks mixed together may each be identified. Furthermore, for applications such as radio monitoring, search times on the order of a few milliseconds per query are attained, even on a massive music database.
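The combinatorial hashing the abstract describes can be sketched as pairing each constellation peak with a few later peaks; peak picking is assumed already done, and `fan_out` is an illustrative parameter:

```python
def constellation_hashes(peaks, fan_out=3):
    """Combinatorial hashing over a peak constellation: each anchor
    peak (t, f) is paired with a few later peaks, and each pair becomes
    a hash key stored together with the anchor time."""
    hashes = []
    peaks = sorted(peaks)                        # sort by time
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            hashes.append(((f1, f2, dt), t1))    # (hash key, time offset)
    return hashes
```

At query time, matching hashes whose database-minus-query time offsets cluster at one value indicate a hit; two mixed tracks each keep their own consistent offset, which is the "transparency" property the abstract mentions.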

683 citations

Journal ArticleDOI
TL;DR: The state of the art in audio information retrieval is reviewed, and recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity are presented with a view towards making audio less “opaque”.
Abstract: The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an "AltaVista" for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less "opaque". A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.

450 citations


Additional excerpts

  • ...978-1-4673-0174-9/12/$31.00 ©2012 IEEE ICALIP2012 763...


Journal ArticleDOI
TL;DR: Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.
Abstract: This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.
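For context, such estimators plug into a common analysis / spectral-gain / synthesis framework. The sketch below uses a simple Wiener-style gain, NOT the paper's super-Gaussian MAP estimator, purely to illustrate where an amplitude estimator sits in that framework:

```python
import numpy as np

def suppress_noise(noisy, noise_psd, n_fft=256):
    """Generic spectral-gain noise suppression on one frame: estimate a
    clean amplitude via a Wiener-style gain and keep the noisy phase.
    A MAP amplitude estimator would replace the `gain` line."""
    spec = np.fft.rfft(noisy[:n_fft])
    # Maximum-likelihood a priori SNR estimate, floored at zero.
    snr = np.maximum(np.abs(spec) ** 2 / (noise_psd + 1e-12) - 1.0, 0.0)
    gain = snr / (snr + 1.0)                    # Wiener gain per DFT bin
    return np.fft.irfft(gain * spec, n=n_fft)
```

Since the gain never exceeds one, each frame's energy can only shrink; the estimator's job is to choose gains that shrink noise much more than speech.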

343 citations

Journal ArticleDOI
TL;DR: Guided by a user's query-by-example music sample, the system delivers the matching song, as well as related music information of immediate interest to the user.
Abstract: Guided by a user's query-by-example music sample, it delivers the matching song, as well as related music information, of immediate interest to the user.

263 citations

