Proceedings Article
Robust audio identification for MP3 popular music
Wei Li,Yaduo Liu,Xiangyang Xue +2 more
- pp 627-634
TLDR
A novel audio fingerprinting algorithm that uses compressed-domain spectral entropy as the audio feature is shown experimentally to be strongly robust against various audio signal distortions, such as recompression, noise interference, echo addition, equalization, band-pass filtering, pitch shifting, and slight time-scale modification.
Abstract:
Audio identification via fingerprinting has been an active research field with wide applications for years. Many technical papers have been published, and commercial software systems have been put into use. However, most previously reported methods work on raw audio, even though compressed audio, especially MP3 music, has become the dominant format for storing music on personal computers and transmitting it over the Internet. It would be useful if an unknown compressed audio fragment could be recognized directly from the database, without the cumbersome and time-consuming decompression-identification-recompression procedure. So far, very few music information retrieval algorithms run directly in the compressed domain, and most of them rely on MDCT coefficients or derived energy-type features. As a first attempt, we propose using compressed-domain spectral entropy as the audio feature for a novel audio fingerprinting algorithm. The compressed songs stored in a music database and the possibly distorted compressed query excerpts are first partially decompressed to obtain the MDCT coefficients as an intermediate result. Then, by grouping granules into longer blocks, remapping the MDCT coefficients onto 192 new frequency lines to unify the frequency distribution of long and short windows, and defining 9 new subbands that cover the main frequency bandwidth of popular songs in accordance with the scale-factor bands of short windows, we calculate the spectral entropy of all consecutive blocks and derive the final fingerprint sequence by modeling magnitude relationships. Experiments show that such fingerprints are strongly robust against various audio signal distortions, such as recompression, noise interference, echo addition, equalization, band-pass filtering, pitch shifting, and slight time-scale modification. For 5 s query excerpts that may be severely degraded, an average top-five retrieval precision of more than 90% is obtained on our test data set of 1822 popular songs.
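The entropy and fingerprint steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each block is already available as a list of remapped MDCT coefficients. The function names, the `band_edges` parameter, and the bit rule (1 when a subband's entropy rises from one block to the next) are hypothetical choices made for illustration, since the abstract gives neither the subband boundaries nor the exact magnitude relationship model.

```python
import math


def spectral_entropy(coeffs):
    """Shannon entropy (in bits) of the normalized power distribution."""
    powers = [c * c for c in coeffs]
    total = sum(powers)
    if total == 0.0:
        return 0.0  # silent band: entropy is defined as 0 in this sketch
    entropy = 0.0
    for p in powers:
        if p > 0.0:
            q = p / total
            entropy -= q * math.log2(q)
    return entropy


def block_entropies(block, band_edges):
    """Entropy per subband; subband i spans band_edges[i]:band_edges[i + 1]."""
    return [
        spectral_entropy(block[lo:hi])
        for lo, hi in zip(band_edges, band_edges[1:])
    ]


def fingerprint(blocks, band_edges):
    """One bit per subband for each pair of consecutive blocks.

    Assumed rule (not taken from the paper): the bit is 1 if the subband
    entropy increased from the previous block to the current one, else 0.
    """
    ents = [block_entropies(b, band_edges) for b in blocks]
    return [
        [1 if cur > prev else 0 for prev, cur in zip(e0, e1)]
        for e0, e1 in zip(ents, ents[1:])
    ]


def hamming_distance(fp_a, fp_b):
    """Number of differing bits between two equally shaped fingerprints."""
    return sum(
        a != b
        for row_a, row_b in zip(fp_a, fp_b)
        for a, b in zip(row_a, row_b)
    )


# Toy input: two blocks of 4 coefficients split into 2 subbands.
# A real input would have 192 lines per block and 9 subbands.
toy_blocks = [[1, 0, 1, 1], [1, 1, 1, 0]]
toy_edges = [0, 2, 4]
toy_fp = fingerprint(toy_blocks, toy_edges)
```

Under these assumptions, a query fingerprint would be matched against database fingerprints by a bit-error measure such as `hamming_distance`, with the closest entries returned as candidates.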
Citations
Patent
Method for Embedding Voice Mail in a Spoken Utterance Using a Natural Language Processing Computer System
TL;DR: In this article, a method for processing a voice message in a computerized system is presented, which receives and records a speech utterance including a message portion and a communication portion.
Patent
Systems and methods for sound recognition
TL;DR: In this paper, a system and methods for recognizing sounds are provided, where user input relating to one or more sounds is received from a computing device, and instructions are executed by a processor to discriminate the one or more sounds, extract music features from the sounds, analyze the music features using databases, and obtain information regarding the features based on the analysis.
Patent
Systems and methods for enabling natural language processing
TL;DR: In this paper, the authors present a system and methods for searching databases by sound data input, and present a search technology that furnishes search results in a fast and accurate manner.
Journal Article
SIFT-based local spectrogram image descriptor: a novel feature for robust music identification
TL;DR: In this article, scale invariant feature transform (SIFT) local descriptors computed from a spectrogram image were used as sub-fingerprints for music identification. However, their robustness is limited by the time-frequency misalignments caused by time stretching and pitch shifting.
Journal Article
An analysis of content-based classification of audio signals using a fuzzy c-means algorithm
Mohammad A. Haque,Jong-Myon Kim +1 more
TL;DR: An efficient content-based audio classification approach to classify audio signals into broad genres using a fuzzy c-means (FCM) algorithm that outperforms the existing state-of-the-art audio classification systems by more than 11% in classification performance.
References
Book
Digital Watermarking and Steganography
TL;DR: This new edition now contains essential information on steganalysis and steganography, and digital watermark embedding is given a complete update with new processes and applications.
Proceedings Article
A Highly Robust Audio Fingerprinting System.
Jaap A. Haitsma,Ton Kalker +1 more
TL;DR: An audio fingerprinting system in which the fingerprint of an unknown audio clip is used as a query on a fingerprint database containing the fingerprints of a large library of songs, so that the audio clip can be identified.
Journal Article
A Review of Audio Fingerprinting
TL;DR: Different audio fingerprinting techniques are reviewed, with their functional blocks described as parts of a common, unified framework.
Journal ArticleDOI
A feature-based robust digital image watermarking scheme
Chih-Wei Tang,Hsueh-Ming Hang +1 more
TL;DR: A robust digital image watermarking scheme that combines image feature extraction and image normalization is proposed to resist both geometric distortion and signal processing attacks.
Proceedings Article
A feature-based robust digital image watermarking scheme
M. Hemahlathaa,C. Chellppan +1 more
TL;DR: The overall architecture for a feature-based robust digital image watermarking scheme is designed and a simulated attacking procedure is performed using predefined attacks to evaluate the robustness of every candidate feature region selected.