Proceedings Article

Scalable and robust audio fingerprinting method tolerable to time-stretching

21 Jul 2015, pp. 436-440
TL;DR: Experimental results show that the method is scalable and more tolerant of time-stretching than Shazam's state-of-the-art audio fingerprinting.
Abstract: This paper proposes a robust, time-stretching-invariant audio fingerprinting method based on landmarks in the audio spectrogram. Audio clips and songs are time-stretched to evade copyright detection, since most fingerprinting techniques are time dependent. Time-stretching is also used in the music industry to produce remixes and song mash-ups, and in multimedia broadcasting to fit content within a required duration. The proposed algorithm hashes frequency peaks in the spectrogram. It is scalable and tolerant of time-stretching. Experimental results show that the method is more tolerant of time-stretching than Shazam's state-of-the-art audio fingerprinting.
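To make the time-dependence problem concrete, the following Python sketch contrasts a conventional pairwise landmark hash with a hash built from the ratio of time gaps among three peaks. The function names, the triplet-ratio construction, the bin count, and the toy peak list are all assumptions chosen for illustration; the abstract does not specify the paper's hash, and this is not a reproduction of it.

```python
def pair_hashes(peaks, fan_out=3):
    """Pairwise landmark hashes (f1, f2, dt); dt is an absolute time gap."""
    peaks = sorted(peaks)
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            hashes.add((f1, f2, t2 - t1))
    return hashes


def triplet_hashes(peaks, ratio_bins=16):
    """Hypothetical stretch-tolerant hash: quantised ratio of two time gaps.

    A uniform time-stretch scales both gaps by the same factor, so the
    ratio (t2 - t1) / (t3 - t1) is unchanged.
    """
    peaks = sorted(peaks)
    hashes = set()
    for (t1, f1), (t2, f2), (t3, f3) in zip(peaks, peaks[1:], peaks[2:]):
        if t3 == t1:
            continue  # avoid division by zero for simultaneous peaks
        ratio = (t2 - t1) / (t3 - t1)
        hashes.add((f1, f2, f3, int(ratio * ratio_bins)))
    return hashes


def stretch(peaks, factor):
    """Simulate time-stretching: scale peak times, keep frequencies."""
    return [(round(t * factor), f) for t, f in peaks]


# Toy (time_frame, frequency_bin) peaks; not data from the paper.
peaks = [(0, 40), (10, 55), (30, 32), (40, 70), (70, 48)]
stretched = stretch(peaks, 1.2)

pair_overlap = pair_hashes(peaks) & pair_hashes(stretched)
triplet_overlap = triplet_hashes(peaks) & triplet_hashes(stretched)
print(len(pair_overlap), len(triplet_overlap))
```

On this toy input the pairwise hashes of the original and the stretched clip share no entries, while the ratio-based hashes coincide. Real audio would also need peak picking, quantisation tolerance, and an index for lookup.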
Citations
Journal Article
09 Jan 2018 - Sensors
TL;DR: This paper reviews audio fingerprinting techniques, from works published between 2002 and 2017, that can be used with acoustic data acquired from mobile devices.
Abstract: Increasing the accuracy with which Activities of Daily Living (ADL) are identified is very important for several goals of Enhanced Living Environments and for Ambient Assisted Living (AAL) tasks. This increase may be achieved through identification of the surrounding environment. Although this is usually used to identify the location, ADL recognition can also be improved by identifying the sound in that particular environment. This paper reviews audio fingerprinting techniques that can be used with acoustic data acquired from mobile devices. A comprehensive literature search was conducted to identify relevant English-language works, published between 2002 and 2017, aimed at identifying the environment of ADLs using data acquired with mobile devices. In total, 40 studies were selected from 115 citations and analyzed. The results highlight several audio fingerprinting techniques, including modified discrete cosine transform (MDCT), Mel-frequency cepstrum coefficients (MFCC), principal component analysis (PCA), fast Fourier transform (FFT), Gaussian mixture models (GMM), likelihood estimation, logarithmic modulated complex lapped transform (LMCLT), support vector machine (SVM), constant Q transform (CQT), symmetric pairwise boosting (SPB), Philips robust hash (PRH), linear discriminant analysis (LDA) and discrete cosine transform (DCT).

26 citations


Cites background or methods from "Scalable and robust audio fingerpri..."

  • ...The performance of the algorithm decreases at higher additive noise in comparison with other algorithms [19], reporting an accuracy around 96....

    [...]

  • ...In [19], the authors proposed an audio fingerprinting method, based on landmarks in the audio spectrogram....

    [...]

  • ...[19] 2015 1500 audio files Proposes an audio fingerprinting method based on landmarks in the audio spectrogram Computer No No...

    [...]

  • ...[19] The authors propose an audio fingerprinting method that is tolerant to time-stretching and is scalable....

    [...]

  • ...The algorithm is based on the audio hashing of frequency peaks in the spectrogram [19]....

    [...]

Journal Article
Jyotismita Chaki
TL;DR: This state-of-the-art survey aims to summarize the widely used methods and give guidelines for using them, and to identify the challenges and future research directions of acoustic signal processing.
Abstract: Audio signal processing remains one of the most challenging fields for the analysis of audio signals. Audio signal classification (ASC) consists of generating appropriate features from a sound and using these features to determine the class to which the sound most likely belongs. Depending on the application's classification domain, the feature extraction and classification/clustering algorithms used may differ considerably. The paper surveys the state of the art to convey the general research scope of ASC, including different types of audio; representations of audio, such as acoustic and spectrogram; audio feature extraction techniques, such as physical, perceptual, static, and dynamic; audio pattern matching approaches, such as pattern matching, acoustic-phonetic, and artificial intelligence; and classification and clustering techniques. The aim of this paper is to summarize the widely used methods and give guidelines for using them, and to identify the challenges and future research directions of acoustic signal processing.

9 citations

Journal Article
TL;DR: This paper proposes a way to generate a single high-quality audio recording of a meeting using no equipment other than participants' personal devices using an adaptive audio fingerprint based on spectrotemporal eigenfilters, where the fingerprint design is learned on-the-fly in a totally unsupervised way to perform well on the data at hand.
Abstract: This paper proposes a way to generate a single high-quality audio recording of a meeting using no equipment other than participants’ personal devices. Each participant in the meeting uses their mobile device as a local recording node, and they begin recording whenever they arrive in an unsynchronized fashion. The main problem in generating a single summary recording is to temporally align the various audio recordings in a robust and efficient manner. We propose a way to do this using an adaptive audio fingerprint based on spectrotemporal eigenfilters, where the fingerprint design is learned on-the-fly in a totally unsupervised way to perform well on the data at hand. The adaptive fingerprints require only a few seconds of data to learn a robust design, and they require no tuning. Our method uses an iterative, greedy two-stage alignment algorithm which finds a rough alignment using indexing techniques, and then performs a more fine-grained alignment based on Hamming distance. Our proposed system achieves >99% alignment accuracy on challenging alignment scenarios extracted from the ICSI meeting corpus, and it outperforms five other well-known and state-of-the-art fingerprint designs. We conduct extensive analyses of the factors that affect the robustness of the adaptive fingerprints, and we provide a simple heuristic that can be used to adjust the fingerprint’s robustness according to the amount of computation we are willing to perform.
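The fine-grained stage described above, which picks an alignment by Hamming distance between binary fingerprints, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: fingerprints are modelled as small Python integers, the candidate offsets are assumed to come from some earlier coarse indexing step, and the names `hamming` and `best_offset` are hypothetical rather than taken from the paper.

```python
def hamming(a, b):
    """Number of differing bits between two integer-coded fingerprints."""
    return bin(a ^ b).count("1")


def best_offset(ref, query, candidates):
    """Return (offset, mean Hamming distance per frame) of the best candidate.

    `candidates` stands in for offsets proposed by a coarse indexing stage
    (an assumption here); offsets that do not fit inside `ref` are skipped.
    """
    best = None
    for off in candidates:
        if off < 0 or off + len(query) > len(ref):
            continue
        window = ref[off : off + len(query)]
        score = sum(hamming(r, q) for r, q in zip(window, query)) / len(query)
        if best is None or score < best[1]:
            best = (off, score)
    return best


# Toy 4-bit fingerprints; the query is ref[2:5] with one bit flipped.
ref = [0b1010, 0b1100, 0b0111, 0b0001, 0b1111, 0b0000]
query = [0b0111, 0b0011, 0b1111]

offset, score = best_offset(ref, query, range(len(ref)))
print(offset, score)
```

Normalising by the query length keeps scores comparable across queries of different durations, which is a design choice of this sketch rather than something stated in the abstract.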

8 citations


Cites background from "Scalable and robust audio fingerpri..."

  • ...Several works extend this approach to allow for tempo changes [10], pitch shifts [11], or both [12], [13]....

    [...]

Journal Article
TL;DR: This paper proposes a multistep approach to address the problem of live song identification for popular bands by representing the audio as a sequence of binary codes called hashprints, derived from a set of spectrotemporal filters that are learned in an unsupervised artist-specific manner.
Abstract: The goal of live song identification is to allow concertgoers to identify a live performance by recording a few seconds of the performance on their cell phone. This paper proposes a multistep approach to address this problem for popular bands. In the first step, GPS data are used to associate the audio query with a concert in order to infer who the musical artist is. This reduces the search space to a dataset containing the artist's studio recordings. In the next step, the known-artist search is solved by representing the audio as a sequence of binary codes called hashprints, which can be efficiently matched against the database using a two-stage cross-correlation approach. The hashprint representation is derived from a set of spectrotemporal filters that are learned in an unsupervised artist-specific manner. On the Gracenote live song identification benchmark, the proposed system outperforms five other baseline systems and improves the mean reciprocal rank of the previous state of the art from 0.68 to 0.79, while simultaneously reducing the average runtime per query from 10 to 0.9 s. We conduct extensive analyses of major factors affecting system performance.
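The mean reciprocal rank (MRR) quoted above averages, over all queries, the reciprocal of the rank at which the correct item is returned. A minimal sketch, assuming a query whose correct item is never returned contributes zero; the ranks below are made up for illustration and are not benchmark data.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over queries; None means the item was not found."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct song for four queries.
example_ranks = [1, 2, None, 4]
mrr = mean_reciprocal_rank(example_ranks)
print(mrr)  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```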

7 citations

Proceedings Article
01 Oct 2017
TL;DR: This study suggests a set of parameterization steps that could be applied in other related audio applications; the algorithm reaches 99% accuracy with three-second granularity samples while achieving the best compromise between processing time and performance.
Abstract: This article follows step by step a general framework for fingerprint extraction in order to develop a system for advertisements' monitoring. The parameterization process uses some spatial and spectral characteristics measured over 600 advertisements that contain various types of sounds. Key factors such as accuracy, process time, and granularity are analyzed together in order to enhance the system performance. At the end, the algorithm shows an accuracy of 99% using three seconds of granularity samples, and also the best compromise between processing time and performance is achieved. This study suggests a set of parameterization steps which could be successfully implemented in other related audio applications.

4 citations


Cites background or methods from "Scalable and robust audio fingerpri..."

  • ...In [10] fingerprint modeling is related to robustness, avoiding audio degradation causes for time-stretching....

    [...]

  • ...For example, in [8], [10], [15] and [12] a hash table is used for looking for candidates, then a method of similarity is applied for reject candidates....

    [...]

References
Proceedings Article
01 Jan 2011
TL;DR: This work surveys and evaluates popular audio fingerprinting schemes in a common framework with short query probes captured from cell phones, and reports results important for mobile applications: Receiver Operating Characteristic (ROC) performance, size of fingerprints generated compared to size of audio probe, and transmission delay if the fingerprint data were to be transmitted over a wireless link.
Abstract: We survey and evaluate popular audio fingerprinting schemes in a common framework with short query probes captured from cell phones. We report and discuss results important for mobile applications: Receiver Operating Characteristic (ROC) performance, size of fingerprints generated compared to size of audio probe, and transmission delay if the fingerprint data were to be transmitted over a wireless link. We hope that the evaluation in this work will guide work towards reducing latency in practical mobile audio retrieval applications.

69 citations


"Scalable and robust audio fingerpri..." refers background in this paper

  • ...Another potential application of audio fingerprinting is query-by-example (QbE) [2], wherein a small snippet of the audio is used to search for the original track from the database....

    [...]

Proceedings Article
01 Oct 2011
TL;DR: A novel fingerprinting technique is proposed that couples hashing with a CQT-based fingerprint, is strongly robust to pitch-shifting, and includes efficient post-processing to remove false alarms.
Abstract: Audio fingerprint techniques should be robust to a variety of distortions due to noisy transmission channels or specific sound processing. Although most current techniques are robust to the majority of them, the quasi-systematic use of a spectral representation makes them possibly sensitive to pitch-shifting. This distortion indeed induces a modification of the spectral content of the signal. In this paper, we propose a novel fingerprint technique, relying on a hashing technique coupled with a CQT-based fingerprint, with a strong robustness to pitch-shifting. Furthermore, we have associated this method with an efficient post-processing for the removal of false alarms. We also present the adaptation of a database pruning technique to our specific context. We have evaluated our approach on a real-life broadcast monitoring scenario. The analyzed data consisted of 120 hours of real radio broadcast (thus containing all the distortions that would be found in an industrial context). The reference database consisted of 30,000 songs. Our method, thanks to its increased robustness to pitch-shifting, shows an excellent detection score.

52 citations


"Scalable and robust audio fingerpri..." refers methods in this paper

  • ...[11] describes an audio hashing technique based on landmarks from Constant Q Transform, instead of audio spectrogram, which is slightly tolerable to pitch-shifting....

    [...]

  • ...An algorithm to prune the database for redundant feature codes is described in [11]....

    [...]

Proceedings Article
25 Oct 2010
TL;DR: A novel audio fingerprinting method that is highly robust to Time Scale Modification (TSM) and pitch shifting is proposed; it is based on computer-vision techniques, transforming each 1-D audio signal into a 2-D image and treating TSM and pitch shifting of the audio signal as stretch and translation of the corresponding image.
Abstract: A novel audio fingerprinting method that is highly robust to Time Scale Modification (TSM) and pitch shifting is proposed. Instead of simply employing spectral or tempo-related features, our system is based on computer-vision techniques. We transform each 1-D audio signal into a 2-D image and treat TSM and pitch shifting of the audio signal as stretch and translation of the corresponding image. Robust local descriptors are extracted from the image and matched against those of the reference audio signals. Experimental results show that our system is highly robust to various audio distortions, including the challenging TSM and pitch shifting.

40 citations


"Scalable and robust audio fingerpri..." refers methods in this paper

  • ...[9] used SIFT features on audio spectrogram images to achieve a fingerprinting that is tolerant to time-stretching and pitch shifting....

    [...]

01 Jun 2004
TL;DR: This paper proposes a robust fingerprinting technique designed for the identification of time-scaled audio data that uses extracted fingerprints as an input to an algebraic indexing technique that has already been successfully applied to the task of audio identification.
Abstract: Automatic identification of audio titles on radio broadcasts is a first step towards automatic annotation of radio programmes. Systems designed for the purpose of identification have to deal with a variety of postprocessing potentially imposed on audio material at the radio stations. One of the more difficult techniques to be handled is time-scaling, i.e., the variation of playback speed. In this paper we propose a robust fingerprinting technique designed for the identification of time-scaled audio data. To allow for fast timescale invariant audio identification, the extracted fingerprints are used as an input to an algebraic indexing technique that has already been successfully applied to the task of audio identification.

18 citations


Additional excerpts

  • ...[10] obtained a robust identification method...

    [...]

Proceedings Article
12 Nov 2012
TL;DR: A local fingerprinting algorithm is proposed for the purpose of audio copy detection that outperforms the state-of-the-art and is robust to noise as well as tempo and pitch modifications of the audio signal.
Abstract: A local fingerprinting algorithm is proposed for the purpose of audio copy detection. The proposed algorithm is robust to noise as well as tempo and pitch modifications of the audio signal. The fingerprints are extracted from adaptively scaled patches of the time-chroma representation of the audio signal. The proposed time-chroma representation converts tempo change and pitch shift attacks on an audio signal to scaling and circular shift attacks on images, respectively. The proposed algorithm is shown to outperform the state-of-the-art.
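The pitch-shift half of that statement can be illustrated with a single 12-bin chroma vector: transposing by k semitones rotates the bins circularly, so a matcher can recover k by trying every rotation. The sketch below is an assumption-laden toy (binary chroma vectors, squared-error matching, hypothetical function names `rotate` and `best_shift`) and does not cover the tempo-to-scaling part or the paper's patch-based fingerprints.

```python
def rotate(chroma, k):
    """Circularly shift a chroma vector up by k semitone bins."""
    k %= len(chroma)
    return chroma[-k:] + chroma[:-k]


def best_shift(ref, query):
    """Rotation of `ref` (in semitones) that best matches `query`."""

    def dist(k):
        return sum((a - b) ** 2 for a, b in zip(rotate(ref, k), query))

    return min(range(len(ref)), key=dist)


# Toy chroma for a C major triad (bins C, E, G set); not real audio features.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
d_major = rotate(c_major, 2)  # the same chord transposed up two semitones

shift = best_shift(c_major, d_major)
print(shift)
```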

14 citations


"Scalable and robust audio fingerpri..." refers background or methods in this paper

  • ...[5] have proposed a local feature extraction method based on time-chroma analysis of audio....

    [...]

  • ...As pointed out in [5], most of the existing audio fingerprinting implementations are not invariant to time-stretching, and therefore it has become the most common attack to avoid copyright infringement detection....

    [...]