Proceedings ArticleDOI

Scalable and robust audio fingerprinting method tolerable to time-stretching

21 Jul 2015, pp. 436-440

TL;DR: The experimental results show the method is more tolerant to time-stretching than the state-of-the-art Shazam audio fingerprinting, and it remains scalable.

Abstract: A time-stretching-invariant, robust audio fingerprinting method based on landmarks in the audio spectrogram is proposed in this paper. Time-stretching of audio clips or songs is done to evade copyright detection, as most fingerprinting techniques are time dependent. Time-stretching is also used in the music industry to produce remixes and song mash-ups, and in multimedia broadcasting to fit content within the required duration. The proposed algorithm is based on audio hashing of frequency peaks in the spectrogram. It is scalable and tolerant to time-stretching. The experimental results show the method is more tolerant to time-stretching than the state-of-the-art Shazam audio fingerprinting.
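The paper itself contains no code, but the two ingredients named in the abstract, spectrogram peak picking and hashing of the resulting peaks, can be sketched briefly. The sketch below is a generic illustration in Python; the parameter values, function names, and the frequency-only hash are our assumptions, not the authors' exact design.

```python
import numpy as np
from scipy import signal
from scipy.ndimage import maximum_filter

def spectral_peaks(audio, sr, n_fft=1024, hop=512, neighborhood=(15, 15)):
    """Pick prominent local maxima ("landmarks") from a magnitude spectrogram.

    All parameter values are illustrative. Returns a time-ordered list of
    (frame_index, frequency_bin) pairs.
    """
    _, _, z = signal.stft(audio, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(z)
    # A bin is a peak if it equals the maximum of its local neighborhood and
    # stands clearly above the average energy (so silence yields no landmarks).
    is_peak = maximum_filter(mag, size=neighborhood) == mag
    is_strong = mag > mag.mean() + 2 * mag.std()
    freq_bins, frames = np.nonzero(is_peak & is_strong)
    order = np.argsort(frames)
    return list(zip(frames[order], freq_bins[order]))

def frequency_hash_sequence(peaks, n_bins=512):
    """Reduce landmarks to a time-ordered sequence of frequency-only hashes.

    Hashing only frequency (not inter-peak time) keeps the sequence usable
    under time-stretching; comparing such order-preserving sequences is a
    separate step (e.g. the LCS computation cited by the paper).
    """
    return [int(f) % n_bins for _, f in peaks]
```

Under a uniform time-stretch the peak frequencies and their order are largely preserved while the frame spacing changes, which is why an order-based comparison of such sequences can tolerate stretching better than hashes that encode absolute time differences.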



Citations
Journal ArticleDOI
09 Jan 2018, Sensors
TL;DR: Audio fingerprinting techniques that can be used with acoustic data acquired from mobile devices are reviewed, covering works published between 2002 and 2017.
Abstract: An increase in the accuracy of identification of Activities of Daily Living (ADL) is very important for different goals of Enhanced Living Environments and for Ambient Assisted Living (AAL) tasks. This increase may be achieved through identification of the surrounding environment. Although this is usually used to identify the location, ADL recognition can be improved with the identification of the sound in that particular environment. This paper reviews audio fingerprinting techniques that can be used with the acoustic data acquired from mobile devices. A comprehensive literature search was conducted in order to identify relevant English language works aimed at the identification of the environment of ADLs using data acquired with mobile devices, published between 2002 and 2017. In total, 40 studies were analyzed and selected from 115 citations. The results highlight several audio fingerprinting techniques, including Modified discrete cosine transform (MDCT), Mel-frequency cepstrum coefficients (MFCC), Principal Component Analysis (PCA), Fast Fourier Transform (FFT), Gaussian mixture models (GMM), likelihood estimation, logarithmic moduled complex lapped transform (LMCLT), support vector machine (SVM), constant Q transform (CQT), symmetric pairwise boosting (SPB), Philips robust hash (PRH), linear discriminant analysis (LDA) and discrete cosine transform (DCT).
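The review lists these techniques by name without code; as a small orientation example, MFCC extraction (one of the features named above) takes only a few lines with librosa. The library choice, file name, and coefficient count below are ours, not the review's.

```python
import numpy as np
import librosa

# MFCCs are one of the fingerprinting features listed in the review above;
# librosa is simply a convenient implementation choice.
y, sr = librosa.load("example.wav", sr=None, mono=True)  # placeholder file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)       # shape: (13, n_frames)
clip_descriptor = np.mean(mfcc, axis=1)                  # crude clip-level summary
```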

18 citations


Cites background or methods from "Scalable and robust audio fingerpri..."

  • ...The performance of the algorithm decreases at higher additive noise in comparison with other algorithms [19], reporting an accuracy around 96....


  • ...In [19], the authors proposed an audio fingerprinting method, based on landmarks in the audio spectrogram....


  • ...[19] 2015 1500 audio files Proposes an audio fingerprinting method based on landmarks in the audio spectrogram Computer No No...


  • ...[19] The authors propose an audio fingerprinting method that is tolerant to time-stretching and is scalable....


  • ...The algorithm is based on the audio hashing of frequency peaks in the spectrogram [19]....


Journal ArticleDOI
TL;DR: This paper proposes a way to generate a single high-quality audio recording of a meeting using no equipment other than participants' personal devices, based on an adaptive audio fingerprint built from spectrotemporal eigenfilters, where the fingerprint design is learned on the fly in a totally unsupervised way to perform well on the data at hand.
Abstract: This paper proposes a way to generate a single high-quality audio recording of a meeting using no equipment other than participants’ personal devices. Each participant in the meeting uses their mobile device as a local recording node, and they begin recording whenever they arrive in an unsynchronized fashion. The main problem in generating a single summary recording is to temporally align the various audio recordings in a robust and efficient manner. We propose a way to do this using an adaptive audio fingerprint based on spectrotemporal eigenfilters, where the fingerprint design is learned on-the-fly in a totally unsupervised way to perform well on the data at hand. The adaptive fingerprints require only a few seconds of data to learn a robust design, and they require no tuning. Our method uses an iterative, greedy two-stage alignment algorithm which finds a rough alignment using indexing techniques, and then performs a more fine-grained alignment based on Hamming distance. Our proposed system achieves > 99% alignment accuracy on challenging alignment scenarios extracted from the ICSI meeting corpus, and it outperforms five other well-known and state-of-the-art fingerprint designs. We conduct extensive analyses of the factors that affect the robustness of the adaptive fingerprints, and we provide a simple heuristic that can be used to adjust the fingerprint’s robustness according to the amount of computation we are willing to perform.
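The fine-alignment stage described above amounts to comparing packed binary fingerprints at candidate offsets by Hamming distance. Below is a minimal sketch of that step alone, assuming fingerprints are already packed as one 32-bit word per frame and that candidate offsets come from the coarse, index-based pass; both are our simplifications, and the eigenfilter learning is omitted entirely.

```python
import numpy as np

def hamming_align(fp_a, fp_b, candidate_offsets):
    """Pick the offset of fp_b relative to fp_a with the lowest bit-error rate.

    fp_a, fp_b: 1-D np.uint32 arrays, one packed fingerprint word per frame.
    candidate_offsets: iterable of frame offsets, e.g. from a coarse index lookup.
    """
    best_offset, best_ber = None, float("inf")
    for off in candidate_offsets:
        # Overlapping region when fp_b is shifted by `off` frames.
        start_a, start_b = max(0, off), max(0, -off)
        n = min(len(fp_a) - start_a, len(fp_b) - start_b)
        if n <= 0:
            continue
        diff = np.bitwise_xor(fp_a[start_a:start_a + n], fp_b[start_b:start_b + n])
        ber = np.unpackbits(diff.view(np.uint8)).mean()  # fraction of differing bits
        if ber < best_ber:
            best_offset, best_ber = off, ber
    return best_offset, best_ber
```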

8 citations


Cites background from "Scalable and robust audio fingerpri..."

  • ...Several works extend this approach to allow for tempo changes [10], pitch shifts [11], or both [12], [13]....


Journal ArticleDOI
Jyotismita Chaki1
TL;DR: The aim of this state-of-the-art survey is to produce a summary of and guidelines for the broadly used methods, and to identify the challenges as well as future research directions of acoustic signal processing.
Abstract: Audio signal processing is the most challenging field in the current era for the analysis of an audio signal. Audio signal classification (ASC) comprises generating appropriate features from a sound and utilizing these features to distinguish the class the sound is most likely to fit. Based on the application's classification domain, the feature extraction and classification/clustering algorithms used may be quite diverse. The paper provides a survey of the state of the art for understanding ASC's general research scope, including different types of audio; representations of audio such as acoustic and spectrogram; audio feature extraction techniques such as physical, perceptual, static, and dynamic; audio pattern matching approaches such as pattern matching, acoustic phonetic, and artificial intelligence; and classification and clustering techniques. The aim of this state-of-the-art paper is to produce a summary of and guidelines for the broadly used methods, and to identify the challenges as well as future research directions of acoustic signal processing.

5 citations

Journal ArticleDOI
TL;DR: This paper proposes a multistep approach to address the problem of live song identification for popular bands by representing the audio as a sequence of binary codes called hashprints, derived from a set of spectrotemporal filters that are learned in an unsupervised artist-specific manner.
Abstract: The goal of live song identification is to allow concertgoers to identify a live performance by recording a few seconds of the performance on their cell phone. This paper proposes a multistep approach to address this problem for popular bands. In the first step, GPS data are used to associate the audio query with a concert in order to infer who the musical artist is. This reduces the search space to a dataset containing the artist's studio recordings. In the next step, the known-artist search is solved by representing the audio as a sequence of binary codes called hashprints, which can be efficiently matched against the database using a two-stage cross-correlation approach. The hashprint representation is derived from a set of spectrotemporal filters that are learned in an unsupervised artist-specific manner. On the Gracenote live song identification benchmark, the proposed system outperforms five other baseline systems and improves the mean reciprocal rank of the previous state of the art from 0.68 to 0.79, while simultaneously reducing the average runtime per query from 10 to 0.9 s. We conduct extensive analyses of major factors affecting system performance.
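The known-artist search described above ultimately compares binary hashprint sequences by cross-correlation. The sketch below is a hedged illustration of that idea, simplified to one bit per frame and plain np.correlate; the learned artist-specific filters and the two-stage search of the paper are not reproduced here.

```python
import numpy as np

def best_alignment_score(query_bits, ref_bits):
    """Match a short binary query against a longer reference by cross-correlation.

    Both inputs are 1-D arrays of 0/1 values (simplified to one bit per frame).
    Mapping bits to +/-1 turns bit agreement into a dot product, so the best
    offset is simply the argmax of the cross-correlation.
    """
    q = 2.0 * np.asarray(query_bits, dtype=float) - 1.0
    r = 2.0 * np.asarray(ref_bits, dtype=float) - 1.0
    corr = np.correlate(r, q, mode="valid")   # one score per candidate offset
    best = int(np.argmax(corr))
    return best, corr[best] / len(q)          # offset and normalized similarity
```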

5 citations

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This study suggests a set of parameterization steps which could be successfully implemented in other related audio applications; the algorithm shows an accuracy of 99% using three-second granularity samples and achieves the best compromise between processing time and performance.
Abstract: This article follows, step by step, a general framework for fingerprint extraction in order to develop a system for advertisement monitoring. The parameterization process uses spatial and spectral characteristics measured over 600 advertisements that contain various types of sounds. Key factors such as accuracy, processing time, and granularity are analyzed together in order to enhance system performance. In the end, the algorithm shows an accuracy of 99% using three-second granularity samples, and the best compromise between processing time and performance is achieved. This study suggests a set of parameterization steps which could be successfully implemented in other related audio applications.

4 citations


Cites background or methods from "Scalable and robust audio fingerpri..."

  • ...In [10], fingerprint modeling is related to robustness, avoiding audio degradation caused by time-stretching....


  • ...For example, in [8], [10], [15] and [12] a hash table is used to look for candidates, then a similarity method is applied to reject candidates....



References
Journal ArticleDOI
TL;DR: The algorithm is applicable in the general case and requires O(pn + n log n) time for any input strings of lengths m and n, even though the lower bound on time of O(mn) need not apply to all inputs.
Abstract: We start by defining conventions and terminology that will be used throughout this paper. String C = c1c2 ... cp is a subsequence of string A = a1a2 ... am if there is a mapping F: {1, 2, ..., p} -> {1, 2, ..., m} such that F(i) = k only if ci = ak and F is a monotone strictly increasing function (i.e. F(i) = u, F(j) = v, and i < j imply that u < v). C can be formed by deleting m - p (not necessarily adjacent) symbols from A. For example, "course" is a subsequence of "computer science." String C is a common subsequence of strings A and B if C is a subsequence of A and also a subsequence of B. String C is a longest common subsequence (abbreviated LCS) of strings A and B if C is a common subsequence of A and B of maximal length, i.e. there is no common subsequence of A and B that has greater length. Throughout this paper, we assume that A and B are strings of lengths m and n, m <= n, that have an LCS C of (unknown) length p. We assume that the symbols that may appear in these strings come from some alphabet of size t. A symbol can be stored in memory by using log t bits, which we assume will fit in one word of memory. Symbols can be compared (a <= b?) in one time unit. The number of different symbols that actually appear in string B is defined to be s (which must be less than n and t). The longest common subsequence problem has been solved by using a recursion relationship on the length of the solution [7, 12, 16, 21]. These are generally applicable algorithms that take O(mn) time for any input strings of lengths m and n, even though the lower bound on time of O(mn) need not apply to all inputs [2]. We present algorithms that, depending on the nature of the input, may not require quadratic time to recover an LCS. The first algorithm is applicable in the general case and requires O(pn + n log n) time. The second algorithm requires time bounded by O((m + 1 - p)p log n). In the common special case where p is close to m, this algorithm takes time ...
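For context, the baseline these algorithms improve on is the textbook O(mn) dynamic program, and an LCS computation of some kind is what gives the fingerprinting paper above its tolerance to time-stretching (see the quoted passages below). A minimal sketch of the standard DP, using the abstract's own "course"/"computer science" example:

```python
def lcs_length(a, b):
    """Standard O(m*n) dynamic-programming LCS length.

    The reference above gives faster algorithms, O(pn + n log n) and
    O((m + 1 - p) p log n); this is only the textbook baseline.
    """
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# "course" is a subsequence of "computer science", so the LCS length is 6.
assert lcs_length("course", "computer science") == 6
```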

730 citations


"Scalable and robust audio fingerpri..." refers methods in this paper

  • ...There are different algorithms to compute the LCS [12]....


  • ...It is the implementation of LCS that gives the proposed method the tolerance against timestretching....


  • ...The longest common subsequence (LCS) between two sequences is the longest possible combination of the elements common to both the sequences such that the order of the elements in both sequences is preserved....


Proceedings Article
01 Jan 2003
TL;DR: The algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a cellphone microphone in the presence of foreground voices and other dominant noise, out of a database of over a million tracks.
Abstract: We have developed and commercially deployed a flexible audio search engine. The algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a cellphone microphone in the presence of foreground voices and other dominant noise, and through voice codec compression, out of a database of over a million tracks. The algorithm uses a combinatorially hashed time-frequency constellation analysis of the audio, yielding unusual properties such as transparency, in which multiple tracks mixed together may each be identified. Furthermore, for applications such as radio monitoring, search times on the order of a few milliseconds per query are attained, even on a massive music database.
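The combinatorially hashed constellation analysis summarized above can be sketched as pairing each anchor peak with a few nearby peaks, hashing (f1, f2, dt), and then voting over (track, time-offset) pairs at query time. The constants and data layout below are illustrative assumptions rather than the deployed system's.

```python
from collections import defaultdict

def constellation_hashes(peaks, fan_out=5, max_dt=64):
    """Shazam-style combinatorial hashing of a peak constellation.

    `peaks` is a time-sorted list of (frame, freq_bin) pairs. Each anchor is
    paired with up to `fan_out` later peaks and the pair is hashed as
    (f1, f2, dt). Constants are illustrative.
    """
    out = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                out.append(((int(f1), int(f2), int(dt)), int(t1)))
    return out

def lookup(query_hashes, index):
    """Vote for (track, offset) pairs; a tall histogram bin indicates a match.

    `index` maps hash -> list of (track_id, track_time), built offline from
    the reference database.
    """
    votes = defaultdict(int)
    for h, q_time in query_hashes:
        for track_id, db_time in index.get(h, ()):
            votes[(track_id, db_time - q_time)] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None
```

Because dt enters the hash, a time-stretched query shifts every dt and breaks these hashes, which is exactly the weakness the proposed method in the main paper targets.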

648 citations


"Scalable and robust audio fingerpri..." refers background or methods in this paper

  • ...The samples were tested on the proposed method as well as on Shazam....


  • ...in [3], the peaks of the spectrogram of a digital audio signal are less susceptible to noise, and features derived from those peaks provide a robust solution to the audio fingerprinting problems....


  • ...Wang et al.[3] proposed a highly robust audio fingerprinting technique based on landmarks of peaks in the spectrogram, which is, perhaps, the most popular audio fingerprinting technique and is commercially available as shazam....


  • ...It clearly marks out that the proposed method is highly tolerable to time-stretching compared to Shazam....


  • ...The results obtained by querying the second set of test samples on both the proposed method and Shazam are displayed in Table II....


Journal ArticleDOI
TL;DR: This article attempts to explain the operation of the phase vocoder in terms accessible to musicians, relying heavily on the familiar concepts of sine waves, filters, and additive synthesis, and employing a minimum of mathematics.
Abstract: For composers interested in the modification of natural sounds, the phase vocoder is a digital signal processing technique of potentially great significance. By itself, the phase vocoder can perform very high fidelity time-scale modification or pitch transposition of a wide variety of sounds. In conjunction with a standard software synthesis program, the phase vocoder can provide the composer with arbitrary control of individual harmonics. But use of the phase vocoder to date has been limited primarily to experts in digital signal processing. Consequently, its musical potential has remained largely untapped. In this article, I attempt to explain the operation of the phase vocoder in terms accessible to musicians. I rely heavily on the familiar concepts of sine waves, filters, and additive synthesis, and I employ a minimum of mathematics. My hope is that this tutorial will lay the groundwork for widespread use of the phase vocoder, both as a tool for sound analysis and modification, and as a catalyst for continued musical exploration.
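As a usage-level illustration of the time-scale modification the tutorial describes, librosa exposes a phase-vocoder-based stretcher; the library, file name, and rates below are our choices, not the article's.

```python
import librosa

# Phase-vocoder time-stretching: duration changes while pitch is preserved.
# "song.wav" and the stretch rates are placeholder values.
y, sr = librosa.load("song.wav", sr=None)
y_faster = librosa.effects.time_stretch(y, rate=1.1)  # rate > 1: shorter clip
y_slower = librosa.effects.time_stretch(y, rate=0.9)  # rate < 1: longer clip
```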

341 citations


"Scalable and robust audio fingerpri..." refers methods in this paper

  • ...It is commonly implemented using phase-vocoder method [6][7]....


Journal ArticleDOI
TL;DR: This paper examines the problem of phasiness in the context of time-scale modification and provides new insights into its causes, and two extensions to the standard phase vocoder algorithm are introduced, and the resulting sound quality is shown to be significantly improved.
Abstract: The phase vocoder is a well established tool for time scaling and pitch shifting speech and audio signals via modification of their short-time Fourier transforms (STFTs). In contrast to time-domain time-scaling and pitch-shifting techniques, the phase vocoder is generally considered to yield high quality results, especially for large modification factors and/or polyphonic signals. However, the phase vocoder is also known for introducing a characteristic perceptual artifact, often described as "phasiness", "reverberation", or "loss of presence". This paper examines the problem of phasiness in the context of time-scale modification and provides new insights into its causes. Two extensions to the standard phase vocoder algorithm are introduced, and the resulting sound quality is shown to be significantly improved. Moreover, the modified phase vocoder is shown to provide a factor-of-two decrease in computational cost.

329 citations


"Scalable and robust audio fingerpri..." refers methods in this paper

  • ...It is commonly implemented using phase-vocoder method [6][7]....


Proceedings ArticleDOI
Shumeet Baluja1, Michele Covell1
01 Jan 2006
TL;DR: Waveprint uses a combination of computer-vision techniques and large-scale-data-stream processing algorithms to create compact fingerprints of audio data that can be efficiently matched, and explicitly measures the tradeoffs between performance, memory usage, and computation.
Abstract: In this paper, we introduce Waveprint, a novel method for audio identification. Waveprint uses a combination of computer-vision techniques and large-scale-data-stream processing algorithms to create compact fingerprints of audio data that can be efficiently matched. The resulting system has excellent identification capabilities for small snippets of audio that have been degraded in a variety of manners, including competing noise, poor recording quality, and cell-phone playback. We explicitly measure the tradeoffs between performance, memory usage, and computation through extensive experimentation.
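One building block in this kind of system is turning a sparse binary spectrogram descriptor into a short min-hash signature so that similar descriptors collide in an index. The sketch below shows plain min-hashing only; the wavelet front end and the retrieval tables are omitted, and all parameters are illustrative.

```python
import numpy as np

def minhash_signature(binary_vec, n_perm=20, seed=0):
    """Min-hash signature of a 0/1 descriptor vector.

    Two descriptors agree at each signature position with probability equal
    to their Jaccard overlap, which is what makes banding the signature into
    a locality-sensitive index effective.
    """
    rng = np.random.default_rng(seed)
    on = np.flatnonzero(binary_vec)           # indices of the "on" bits
    sig = np.empty(n_perm, dtype=np.int64)
    for k in range(n_perm):
        perm = rng.permutation(len(binary_vec))
        # Smallest permuted index among the on-bits (sentinel if vector is empty).
        sig[k] = perm[on].min() if on.size else len(binary_vec)
    return sig
```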

109 citations


"Scalable and robust audio fingerpri..." refers methods in this paper

  • ...The application of audio fingerprinting has been a significant solution to this problem and enables automatic identification and monitoring of copyrighted audio content [1]....
