Scalable and robust audio fingerprinting method tolerable to time-stretching

doi:10.1109/ICDSP.2015.7251909

Proceedings Article•DOI•

Jacob George¹, Ashok Jhunjhunwala¹•Institutions (1)

Indian Institute of Technology Madras¹

21 Jul 2015-pp 436-440

TL;DR: The proposed method is scalable, and experimental results show that it tolerates time-stretching better than Shazam's state-of-the-art audio fingerprinting.

Abstract: A time-stretching invariant, robust audio fingerprinting method based on landmarks in the audio spectrogram is proposed in this paper. Audio clips or songs are time-stretched to evade copyright detection, as most fingerprinting techniques are time dependent. Time-stretching is also used in the music industry to produce remixes and song mash-ups, and in multimedia broadcasting to fit content within the required duration. The proposed algorithm is based on the audio hashing of frequency peaks in the spectrogram. It is scalable and tolerant to time-stretching. The experimental results show that the method is more tolerant to time-stretching than the state-of-the-art Shazam audio fingerprinting.
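
The abstract does not spell out how the hash is built, so the following is only a minimal sketch of one way a peak-based hash can be made insensitive to uniform time-stretching. It assumes spectrogram peaks are already available as (time, frequency bin) pairs and that stretching preserves pitch. The names `pair_hashes` and `triplet_hashes`, the `fan_out` and `ratio_bins` parameters, and the triplet-ratio construction are all illustrative assumptions, not the authors' algorithm and not Shazam's actual hash format.

```python
from itertools import combinations


def pair_hashes(peaks, fan_out=3):
    """Time-dependent stand-in: hash = (f1, f2, absolute time gap in 10 ms units)."""
    peaks = sorted(peaks)
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            hashes.add((f1, f2, round((t2 - t1) * 100)))
    return hashes


def triplet_hashes(peaks, fan_out=3, ratio_bins=16):
    """Hypothetical stretch-tolerant hash: three frequencies plus a quantized
    ratio of time gaps. A uniform stretch scales both gaps by the same factor,
    so the ratio is unchanged."""
    peaks = sorted(peaks)
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        neighbours = peaks[i + 1 : i + 1 + fan_out]
        for (t2, f2), (t3, f3) in combinations(neighbours, 2):
            if t3 <= t1:
                continue  # degenerate triplet, no usable time gap
            ratio = (t2 - t1) / (t3 - t1)  # lies in [0, 1]
            q = min(int(ratio * ratio_bins), ratio_bins - 1)
            hashes.add((f1, f2, f3, q))
    return hashes


# Toy peaks: (time in seconds, frequency bin).
peaks = [(0.0, 40), (0.25, 52), (0.75, 31), (1.0, 64), (1.5, 47), (2.25, 58)]
stretched = [(t * 1.25, f) for t, f in peaks]  # 25% slower, same pitch

print(pair_hashes(peaks) == pair_hashes(stretched))
print(triplet_hashes(peaks) == triplet_hashes(stretched))
```

With real audio, peak positions jitter and some peaks disappear after stretching, so a practical system would match on partial hash overlap rather than exact set equality.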

Citations

Journal Article•DOI•

Recognition of Activities of Daily Living Based on Environmental Analyses Using Audio Fingerprinting Techniques: A Systematic Review.

Ivan Miguel Pires¹, Rui Santos¹, Nuno Pombo¹,², Nuno M. Garcia¹,², Francisco Flórez-Revuelta³, Susanna Spinsante⁴, Rossitza Goleva⁵, Eftim Zdravevski⁶•Institutions (6)

University of Beira Interior¹, Universidade Lusófona², University of Alicante³, Marche Polytechnic University⁴, New Bulgarian University⁵, Saints Cyril and Methodius University of Skopje⁶

09 Jan 2018-Sensors

TL;DR: This paper reviews audio fingerprinting techniques, published between 2002 and 2017, that can be used with acoustic data acquired from mobile devices.

Abstract: An increase in the accuracy of identification of Activities of Daily Living (ADL) is very important for different goals of Enhanced Living Environments and for Ambient Assisted Living (AAL) tasks. This increase may be achieved through identification of the surrounding environment. Although this is usually used to identify the location, ADL recognition can be improved with the identification of the sound in that particular environment. This paper reviews audio fingerprinting techniques that can be used with the acoustic data acquired from mobile devices. A comprehensive literature search was conducted in order to identify relevant English language works aimed at the identification of the environment of ADLs using data acquired with mobile devices, published between 2002 and 2017. In total, 40 studies were analyzed and selected from 115 citations. The results highlight several audio fingerprinting techniques, including Modified discrete cosine transform (MDCT), Mel-frequency cepstrum coefficients (MFCC), Principal Component Analysis (PCA), Fast Fourier Transform (FFT), Gaussian mixture models (GMM), likelihood estimation, logarithmic moduled complex lapped transform (LMCLT), support vector machine (SVM), constant Q transform (CQT), symmetric pairwise boosting (SPB), Philips robust hash (PRH), linear discriminant analysis (LDA) and discrete cosine transform (DCT).

26 citations

Cites background or methods from "Scalable and robust audio fingerpri..."

...The performance of the algorithm decreases at higher additive noise in comparison with other algorithms [19], reporting an accuracy around 96....
[...]
...In [19], the authors proposed an audio fingerprinting method, based on landmarks in the audio spectrogram....
[...]
...[19] 2015 1500 audio files Proposes an audio fingerprinting method based on landmarks in the audio spectrogram Computer No No...
[...]
...[19] The authors propose an audio fingerprinting method that is tolerant to time-stretching and is scalable....
[...]
...The algorithm is based on the audio hashing of frequency peaks in the spectrogram [19]....
[...]

Journal Article•DOI•

Pattern analysis based acoustic signal processing: a survey of the state-of-art

Jyotismita Chaki¹•Institutions (1)

VIT University¹

03 Feb 2020-International Journal of Speech Technology

TL;DR: The aim of this state-of-art paper is to produce a summary and guidelines for using the broadly used methods, to identify the challenges as well as future research directions of acoustic signal processing.

Abstract: Audio signal processing is the most challenging field in the current era for an analysis of an audio signal. Audio signal classification (ASC) consists of generating appropriate features from a sound and utilizing these features to distinguish the class the sound is most likely to fit. Based on the application’s classification domain, the characteristics extraction and classification/clustering algorithms used may be quite diverse. The paper provides a survey of the state of the art for understanding ASC’s general research scope, including different types of audio; representation of audio like acoustic, spectrogram; audio feature extraction techniques like physical, perceptual, static, dynamic; audio pattern matching approaches like pattern matching, acoustic phonetic, artificial intelligence; classification, and clustering techniques. The aim of this state-of-the-art paper is to produce a summary and guidelines for using the broadly used methods, to identify the challenges as well as future research directions of acoustic signal processing.

9 citations

Journal Article•DOI•

Robust and efficient multiple alignment of unsynchronized meeting recordings

T. J. Tsai¹, Andreas Stolcke²•Institutions (2)

University of California, Berkeley¹, Microsoft²

01 May 2016-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: This paper proposes a way to generate a single high-quality audio recording of a meeting using no equipment other than participants' personal devices using an adaptive audio fingerprint based on spectrotemporal eigenfilters, where the fingerprint design is learned on-the-fly in a totally unsupervised way to perform well on the data at hand.

Abstract: This paper proposes a way to generate a single high-quality audio recording of a meeting using no equipment other than participants’ personal devices. Each participant in the meeting uses their mobile device as a local recording node, and they begin recording whenever they arrive in an unsynchronized fashion. The main problem in generating a single summary recording is to temporally align the various audio recordings in a robust and efficient manner. We propose a way to do this using an adaptive audio fingerprint based on spectrotemporal eigenfilters, where the fingerprint design is learned on-the-fly in a totally unsupervised way to perform well on the data at hand. The adaptive fingerprints require only a few seconds of data to learn a robust design, and they require no tuning. Our method uses an iterative, greedy two-stage alignment algorithm which finds a rough alignment using indexing techniques, and then performs a more fine-grained alignment based on Hamming distance. Our proposed system achieves >99% alignment accuracy on challenging alignment scenarios extracted from the ICSI meeting corpus, and it outperforms five other well-known and state-of-the-art fingerprint designs. We conduct extensive analyses of the factors that affect the robustness of the adaptive fingerprints, and we provide a simple heuristic that can be used to adjust the fingerprint’s robustness according to the amount of computation we are willing to perform.
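
The fine-grained stage mentioned in this abstract compares binary fingerprints by Hamming distance. As a rough illustration only, the sketch below slides a short query over a reference sequence and keeps the offset with the smallest total bit difference. It assumes each frame's fingerprint is already packed into a Python integer; `hamming` and `best_offset` are hypothetical helper names, and the exhaustive search stands in for the paper's index-based rough alignment, which is not reproduced here.

```python
def hamming(a, b):
    """Number of differing bits between two integer-coded fingerprints."""
    return bin(a ^ b).count("1")


def best_offset(reference, query):
    """Return (offset, total_distance) of the best alignment, or None if the
    query is longer than the reference."""
    best = None
    for offset in range(len(reference) - len(query) + 1):
        window = reference[offset : offset + len(query)]
        dist = sum(hamming(r, q) for r, q in zip(window, query))
        if best is None or dist < best[1]:
            best = (offset, dist)
    return best


# Toy 4-bit fingerprints; the query matches frames 2..4 with one flipped bit.
reference = [0b1010, 0b1100, 0b0111, 0b0001, 0b1111, 0b0000]
query = [0b0111, 0b0011, 0b1111]
print(best_offset(reference, query))  # (2, 1)
```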

8 citations

Cites background from "Scalable and robust audio fingerpri..."

...Several works extend this approach to allow for tempo changes [10], pitch shifts [11], or both [12], [13]....
[...]

Journal Article•DOI•

Known-Artist Live Song Identification Using Audio Hashprints

TJ Tsai¹, Thomas Prätzlich, Meinard Müller•Institutions (1)

Harvey Mudd College¹

15 Feb 2017-IEEE Transactions on Multimedia

TL;DR: This paper proposes a multistep approach to address the problem of live song identification for popular bands by representing the audio as a sequence of binary codes called hashprints, derived from a set of spectrotemporal filters that are learned in an unsupervised artist-specific manner.

Abstract: The goal of live song identification is to allow concertgoers to identify a live performance by recording a few seconds of the performance on their cell phone. This paper proposes a multistep approach to address this problem for popular bands. In the first step, GPS data are used to associate the audio query with a concert in order to infer who the musical artist is. This reduces the search space to a dataset containing the artist's studio recordings. In the next step, the known-artist search is solved by representing the audio as a sequence of binary codes called hashprints, which can be efficiently matched against the database using a two-stage cross-correlation approach. The hashprint representation is derived from a set of spectrotemporal filters that are learned in an unsupervised artist-specific manner. On the Gracenote live song identification benchmark, the proposed system outperforms five other baseline systems and improves the mean reciprocal rank of the previous state of the art from 0.68 to 0.79, while simultaneously reducing the average runtime per query from 10 to 0.9 s. We conduct extensive analyses of major factors affecting system performance.
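
For readers unfamiliar with the metric quoted in this abstract, mean reciprocal rank averages 1/rank of the correct item over all queries, so a value of 1.0 means the right song is always ranked first. The snippet below is a generic sketch of that definition; the function name and the convention of using `None` for a query whose correct item was not retrieved are assumptions made for illustration, not details of the benchmark.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct item for each query,
    or None when the correct item was not retrieved (scored as 0)."""
    if not ranks:
        raise ValueError("at least one query is required")
    return sum(1.0 / r if r else 0.0 for r in ranks) / len(ranks)


# Three queries: correct song ranked 1st, 2nd, and 4th.
print(mean_reciprocal_rank([1, 2, 4]))
```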

7 citations

Proceedings Article•DOI•

Audio fingerprint parameterization for multimedia advertising identification

Jose Medina¹, Francisco Vega¹, Daniel Mendoza¹, Victor Saquicela¹, Mauricio Espinoza¹•Institutions (1)

University of Cuenca¹

01 Oct 2017

TL;DR: This study suggests a set of parameterization steps which could be successfully implemented in other related audio applications and shows an accuracy of 99% using three seconds of granularity samples, and also the best compromise between processing time and performance is achieved.

Abstract: This article follows step by step a general framework for fingerprint extraction in order to develop a system for advertisements' monitoring. The parameterization process uses some spatial and spectral characteristics measured over 600 advertisements that contain various types of sounds. Key factors such as accuracy, process time, and granularity are analyzed together in order to enhance the system performance. At the end, the algorithm shows an accuracy of 99% using three seconds of granularity samples, and also the best compromise between processing time and performance is achieved. This study suggests a set of parameterization steps which could be successfully implemented in other related audio applications.

4 citations

Cites background or methods from "Scalable and robust audio fingerpri..."

...In [10] fingerprint modeling is related to robustness, avoiding audio degradation causes for time-stretching....
[...]
...For example, in [8], [10], [15] and [12] a hash table is used for looking for candidates, then a method of similarity is applied for reject candidates....
[...]

References

Proceedings Article•

Survey and evaluation of audio fingerprinting schemes for mobile query-by-example applications

Vijay Chandrasekhar¹, Matthew Sharifi², David A. Ross²•Institutions (2)

Stanford University¹, Google²

01 Jan 2011

TL;DR: This work surveys and evaluates popular audio fingerprinting schemes in a common framework with short query probes captured from cell phones, and reports results important for mobile applications: Receiver Operating Characteristic (ROC) performance, size of fingerprints generated compared to size of audio probe, and transmission delay if the fingerprint data were to be transmitted over a wireless link.

Abstract: We survey and evaluate popular audio fingerprinting schemes in a common framework with short query probes captured from cell phones. We report and discuss results important for mobile applications: Receiver Operating Characteristic (ROC) performance, size of fingerprints generated compared to size of audio probe, and transmission delay if the fingerprint data were to be transmitted over a wireless link. We hope that the evaluation in this work will guide work towards reducing latency in practical mobile audio retrieval applications.

69 citations

"Scalable and robust audio fingerpri..." refers background in this paper

...Another potential application of audio fingerprinting is query-by-example (QbE) [2], wherein a small snippet of the audio is used to search for the original track from the database....
[...]

Proceedings Article•

A scalable audio fingerprint method with robustness to pitch-shifting

Sébastien Fenet¹, Gael Richard², Yves Grenier³•Institutions (3)

Institut Mines-Télécom¹, Grenoble Institute of Technology², Télécom ParisTech³

01 Oct 2011

TL;DR: A novel fingerprint technique is proposed that couples hashing with a CQT-based fingerprint, is strongly robust to pitch-shifting, and includes an efficient post-processing step for the removal of false alarms.

Abstract: Audio fingerprint techniques should be robust to a variety of distortions due to noisy transmission channels or specific sound processing. Although most of nowadays techniques are robust to the majority of them, the quasi-systematic use of a spectral representation makes them possibly sensitive to pitch-shifting. This distortion indeed induces a modification of the spectral content of the signal. In this paper, we propose a novel fingerprint technique, relying on a hashing technique coupled with a CQT-based fingerprint, with a strong robustness to pitch-shifting. Furthermore, we have associated this method with an efficient post-processing for the removal of false alarms. We also present the adaptation of a database pruning technique to our specific context. We have evaluated our approach on a real-life broadcast monitoring scenario. The analyzed data consisted of 120 hours of real radio broadcast (thus containing all the distortions that would be found in an industrial context). The reference database consisted of 30.000 songs. Our method, thanks to its increased robustness to pitch-shifting, shows an excellent detection score.

52 citations

"Scalable and robust audio fingerpri..." refers methods in this paper

...[11] describes an audio hashing technique based on landmarks from Constant Q Transform, instead of audio spectrogram, which is slightly tolerable to pitch-shifting....
[...]
...An algorithm to prune the database for redundant feature codes is described in [11]....
[...]

Proceedings Article•DOI•

A novel audio fingerprinting method robust to time scale modification and pitch shifting

Bilei Zhu¹, Wei Li¹, Zhurong Wang¹, Xiangyang Xue¹•Institutions (1)

Fudan University¹

25 Oct 2010

TL;DR: A novel audio fingerprinting method that is highly robust to Time Scale Modification (TSM) and pitch shifting is proposed. It is based on computer-vision techniques: each 1-D audio signal is transformed into a 2-D image, and TSM and pitch shifting of the audio signal are treated as stretch and translation of the corresponding image.

Abstract: A novel audio fingerprinting method that is highly robust to Time Scale Modification (TSM) and pitch shifting is proposed. Instead of simply employing spectral or tempo-related features, our system is based on computer-vision techniques. We transform each 1-D audio signal into a 2-D image and treat TSM and pitch shifting of the audio signal as stretch and translation of the corresponding image. Robust local descriptors are extracted from the image and matched against those of the reference audio signals. Experimental results show that our system is highly robust to various audio distortions, including the challenging TSM and pitch shifting.

40 citations

"Scalable and robust audio fingerpri..." refers methods in this paper

...[9] used SIFT features on audio spectrogram images to achieve a fingerprinting that is tolerant to time-stretching and pitch shifting....
[...]

Robust Identification of Time-Scaled Audio

Rolf Bardeli

01 Jun 2004

TL;DR: This paper proposes a robust fingerprinting technique designed for the identification of time-scaled audio data that uses extracted fingerprints as an input to an algebraic indexing technique that has already been successfully applied to the task of audio identification.

Abstract: Automatic identification of audio titles on radio broadcasts is a first step towards automatic annotation of radio programmes. Systems designed for the purpose of identification have to deal with a variety of postprocessing potentially imposed on audio material at the radio stations. One of the more difficult techniques to be handled is time-scaling, i.e., the variation of playback speed. In this paper we propose a robust fingerprinting technique designed for the identification of time-scaled audio data. To allow for fast timescale invariant audio identification, the extracted fingerprints are used as an input to an algebraic indexing technique that has already been successfully applied to the task of audio identification.

18 citations

Additional excerpts

...[10] obtained a robust identification method...
[...]

Proceedings Article•DOI•

A novel local audio fingerprinting algorithm

Mani Malekesmaeili¹, Rabab K. Ward¹•Institutions (1)

University of British Columbia¹

12 Nov 2012

TL;DR: A local fingerprinting algorithm is proposed for the purpose of audio copy detection that outperforms the state-of-the-art and is robust to noise as well as tempo and pitch modifications of the audio signal.

Abstract: A local fingerprinting algorithm is proposed for the purpose of audio copy detection. The proposed algorithm is robust to noise as well as tempo and pitch modifications of the audio signal. The fingerprints are extracted from adaptively scaled patches of the time-chroma representation of the audio signal. The proposed time-chroma representation converts tempo change and pitch shift attacks on an audio signal to scaling and circular shift attacks on images, respectively. The proposed algorithm is shown to outperform the state-of-the-art.
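
The claim that a pitch shift becomes a circular shift in a chroma representation can be checked on a toy example. The sketch below works on symbolic MIDI note numbers rather than audio, which is a deliberate simplification: it folds notes into 12 pitch classes and shows that transposing every note by k semitones rotates the resulting histogram by k positions. `chroma_histogram` and `rotate` are hypothetical helpers and do not reproduce the paper's time-chroma image or its patch extraction.

```python
def chroma_histogram(midi_pitches):
    """Count notes per pitch class (0 = C, 1 = C#, ..., 11 = B)."""
    chroma = [0] * 12
    for p in midi_pitches:
        chroma[p % 12] += 1
    return chroma


def rotate(vec, k):
    """Circularly shift vec to the right by k positions."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


notes = [60, 64, 67, 72]             # C major chord plus the octave
shifted = [p + 3 for p in notes]     # transposed up three semitones

print(chroma_histogram(shifted) == rotate(chroma_histogram(notes), 3))  # True
```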

14 citations

"Scalable and robust audio fingerpri..." refers background or methods in this paper

...[5] have proposed a local feature extraction method based on time-chroma analysis of audio....
[...]
...As pointed out in [5], most of the existing audio fingerprinting implementations are not invariant to time-stretching, and therefore it has become the most common attack to avoid copyright infringement detection....
[...]