Author
Hongxue Wang
Bio: Hongxue Wang is an academic researcher from Shanghai University. The author has contributed to research in topics: Noise & Speech coding. The author has an h-index of 1, having co-authored 1 publication that has received 12 citations.
Papers
01 Jan 2011
TL;DR: A robust audio feature, the local energy centroid (LEC), which represents the degree of energy conglomeration in a relatively small region of the spectrum, is introduced; the audio fingerprint is generated from the LEC feature, which helps enhance the robustness of the system.
Abstract: An audio fingerprint is an effective representation of an audio signal using low-level features and can be used to identify unlabeled audio based on its content. In this paper, we introduce a robust audio feature, the local energy centroid (LEC), which represents the degree of energy conglomeration in a relatively small region of the spectrum. Our audio fingerprint is generated from the LEC feature, which helps enhance the robustness of the system. In the audio retrieval process, an improved scoring strategy is proposed to resist linear speed changes. Experimental results show that the new fingerprinting system is quite robust in the presence of noise and that the proposed method achieves satisfactory recognition accuracy.
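The abstract does not give the exact LEC formula, but an energy-weighted centroid over a small spectrogram region is a natural reading of "energy conglomeration degree". A minimal pure-Python sketch, with the patch layout and normalization as illustrative assumptions (not the paper's actual definition):

```python
def local_energy_centroid(patch):
    """Energy-weighted centroid of a small spectrogram patch.

    `patch` is a 2-D list: patch[t][f] is the spectral magnitude at
    time frame t and frequency bin f. Returns (time, freq) centroid
    coordinates normalized to [0, 1], a rough measure of where the
    energy of the region is concentrated.
    """
    energy = [[v * v for v in row] for row in patch]   # magnitude -> energy
    total = sum(sum(row) for row in energy)
    if total == 0.0:
        return (0.5, 0.5)                              # silent patch: centre
    rows, cols = len(energy), len(energy[0])
    t_c = sum(t * e for t, row in enumerate(energy) for e in row) / total
    f_c = sum(f * e for row in energy for f, e in enumerate(row)) / total
    return (t_c / max(rows - 1, 1), f_c / max(cols - 1, 1))
```

A fingerprint would then be built from a grid of such centroids rather than from raw spectral energies, which is what makes the feature less sensitive to additive noise.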
13 citations
Cited by
TL;DR: Reviews audio fingerprinting techniques that can be used with acoustic data acquired from mobile devices, covering works published between 2002 and 2017.
Abstract: An increase in the accuracy of identification of Activities of Daily Living (ADL) is very important for different goals of Enhanced Living Environments and for Ambient Assisted Living (AAL) tasks. This increase may be achieved through identification of the surrounding environment. Although this is usually used to identify the location, ADL recognition can be improved with the identification of the sound in that particular environment. This paper reviews audio fingerprinting techniques that can be used with the acoustic data acquired from mobile devices. A comprehensive literature search was conducted in order to identify relevant English-language works aimed at the identification of the environment of ADLs using data acquired with mobile devices, published between 2002 and 2017. In total, 40 studies were analyzed and selected from 115 citations. The results highlight several audio fingerprinting techniques, including the modified discrete cosine transform (MDCT), Mel-frequency cepstral coefficients (MFCC), principal component analysis (PCA), the fast Fourier transform (FFT), Gaussian mixture models (GMM), likelihood estimation, the logarithmic modulated complex lapped transform (LMCLT), support vector machines (SVM), the constant-Q transform (CQT), symmetric pairwise boosting (SPB), the Philips robust hash (PRH), linear discriminant analysis (LDA), and the discrete cosine transform (DCT).
26 citations
TL;DR: Proposes a high-performance audio fingerprinting system for real-world query-by-example applications in acoustic audio-based content identification, especially for use in heterogeneous portable consumer devices or online distributed audio systems.
Abstract: In this paper, we propose a high-performance audio fingerprinting system for real-world query-by-example applications in acoustic audio-based content identification, especially for use in heterogeneous portable consumer devices or online distributed audio systems. In the proposed method, audio fingerprints are generated using modulated complex lapped transform-based non-repeating foreground audio extraction and an adaptive thresholding method for prominent peak detection. Effective matching is performed using a robust peak-pair-based hash function of the non-repeating foreground audio to protect against noise, echo, and artifacts from pitch-shifting, time-stretching, resampling, equalization, or compression. Experimental results confirm that the proposed method is quite robust under various distortion conditions and achieves promising preliminary accuracy results.
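The abstract does not specify the exact peak-pair hash function; the following is a generic sketch of the widely used landmark (peak-pair) hashing idea, with the bit packing, fan-out, and time-window parameters as illustrative assumptions rather than the authors' actual values:

```python
def peak_pair_hashes(peaks, fan_out=3, max_dt=64):
    """Build landmark hashes from spectral peaks.

    `peaks` is a list of (frame, bin) tuples sorted by frame. Each
    anchor peak is paired with up to `fan_out` later peaks within
    `max_dt` frames; the triple (anchor bin, target bin, time delta)
    is packed into one integer key, which is what gets looked up in
    the fingerprint database.
    """
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break                              # peaks are time-sorted
            key = (f1 << 20) | (f2 << 8) | dt      # pack the triple
            hashes.append((key, t1))               # store with anchor time
            paired += 1
            if paired == fan_out:
                break
    return hashes
```

Pairing peaks rather than hashing single peaks is what makes the lookup robust: a pair survives as long as both peaks survive the distortion, and the time delta disambiguates collisions.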
15 citations
TL;DR: A salient audio peak‐pair fingerprint, based on a modulated complex lapped transform, improves the accuracy of the audio fingerprinting system in actual noisy environments with low computational complexity.
Abstract: The robustness of an audio fingerprinting system in an actual noisy environment is a major challenge for audio-based content identification. This paper proposes a high-performance audio fingerprint extraction method for use in portable consumer devices. In the proposed method, a salient audio peak-pair fingerprint, based on a modulated complex lapped transform, improves the accuracy of the audio fingerprinting system in actual noisy environments with low computational complexity. Experimental results confirm that the proposed method is quite robust in different noise conditions and achieves promising preliminary accuracy results.
6 citations
TL;DR: The authors present a novel framework for content-based audio retrieval based on an audio fingerprinting scheme that is robust against large linear speed changes, whereas existing schemes are robust to several signal processing attacks and manipulations but not to linear speed changes.
Abstract: Audio fingerprinting is the process of obtaining a compact content-based signature that summarizes the essence of an audio clip. In general, existing audio fingerprinting schemes based on wavelet transforms are not robust against large linear speed changes. The authors present a novel framework for content-based audio retrieval based on an audio fingerprinting scheme that is robust against large linear speed changes. In the proposed scheme, an 8-level Daubechies wavelet decomposition is adopted for extracting time-frequency features, and two fingerprint extraction algorithms are designed. The experimental results from this study are discussed further in the article. DOI: 10.4018/jdcf.2012040104 (International Journal of Digital Crime and Forensics, 4(2), 49-69, April-June 2012).
The local energy centroid (LEC) was proposed to represent the degree of energy conglomeration in a relatively small region of the spectrum (Pan et al., 2011), while a robust audio fingerprinting algorithm in the MP3 compressed domain was proposed with high robustness to time-scale modification (Zhou & Zhu, 2011). Among existing transform-based audio fingerprinting schemes, those based on the wavelet transform are very popular, since the wavelet transform, or more particularly the discrete wavelet transform, is a relatively recent and computationally efficient technique for extracting information from non-stationary signals such as audio. The wavelet transform is a local transformation of a signal in the time and frequency domains; it can effectively extract information from the signal, perform multi-scale detailed analysis of a function or signal through operations such as scaling and translation, and can thereby solve many difficult problems that the Fourier transform cannot. Therefore, our paper focuses on wavelet transform-based schemes.
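As a concrete illustration of the multi-scale analysis described above, here is a minimal one-dimensional Haar decomposition in pure Python; Daubechies wavelets such as Daub8 follow the same recursive averaging/differencing structure, only with longer filter taps (and orthonormal scaling by √2 rather than the plain averages used in this sketch):

```python
def haar_decompose(signal, levels):
    """Multi-level 1-D Haar wavelet decomposition (didactic sketch).

    At each level the signal is split into pairwise averages
    (the approximation) and pairwise differences (the detail); the
    averages feed the next level. Returns the final approximation
    and one detail list per level, coarsest last.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        avg = [(approx[i] + approx[i + 1]) / 2
               for i in range(0, len(approx) - 1, 2)]
        dif = [(approx[i] - approx[i + 1]) / 2
               for i in range(0, len(approx) - 1, 2)]
        details.append(dif)
        approx = avg
    return approx, details
```

Each additional level halves the time resolution and doubles the scale, which is exactly the multi-resolution view that a fingerprinting scheme exploits when it selects features per sub-band.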
The existing works based on wavelet transforms can be classified into the following two categories. The first type of fingerprinting scheme performs the wavelet transform on each audio frame directly to extract time-frequency features for audio fingerprinting. In Lu (2002), the one-dimensional continuous Morlet wavelet transform is adopted to extract two fingerprints, for authentication and recognition purposes respectively. In Ghouti and Bouridane (2006), a robust perceptual audio hashing scheme using balanced multiwavelets (BMW) is proposed. They first perform a 5-level wavelet decomposition on each audio frame and divide the coefficients of the 5 decomposition sub-bands into 32 different frequency bands. Then estimation quantization (EQ) with a window of 5 audio samples is adopted. Finally, a 32-bit subfingerprint is extracted for each audio frame according to the relationship between the log variance of each sub-band's coefficients and the mean of all the log variances. Several experiments demonstrate that their scheme is robust to several signal processing attacks and manipulations, except for linear speed changes. The other type of fingerprinting scheme introduces computer vision techniques: the audio clip is converted into a 2-D spectrogram before the wavelet transform is applied. In Ke et al. (2005), the spectrogram of each audio snippet is viewed as a 2-D image and the wavelet transform is used to extract 860 descriptors for a 10-second audio clip. A pairwise boosting scheme is then applied to learn compact, discriminative, local descriptors that are efficient for audio retrieval. This algorithm retrieves quickly and accurately in practical systems with poor recording quality or significant ambient noise. In Baluja and Covell (2006, 2007), the so-called Waveprint, which combines computer vision and data-stream processing, was proposed. The Haar wavelet is used to extract the t top-magnitude wavelets for each spectral image.
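The bit-extraction rule attributed above to Ghouti and Bouridane (one bit per band, comparing each band's log variance against the mean log variance) can be sketched as follows; the variance floor and the bit-packing order are implementation assumptions, not details from the paper:

```python
import math

def subfingerprint(subbands):
    """One subfingerprint per frame from wavelet sub-band coefficients.

    `subbands` is a list of coefficient lists, one per frequency band.
    Each band contributes one bit: 1 if the log variance of its
    coefficients exceeds the mean log variance across all bands,
    else 0. Bits are packed most-significant-band first.
    """
    def log_var(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / len(xs)
        return math.log(v + 1e-12)      # floor guards against zero variance
    lvs = [log_var(b) for b in subbands]
    mean_lv = sum(lvs) / len(lvs)
    bits = 0
    for lv in lvs:
        bits = (bits << 1) | (1 if lv > mean_lv else 0)
    return bits
```

With 32 frequency bands this yields the 32-bit subfingerprint per frame described above; matching then reduces to Hamming-distance comparisons between bit strings.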
The selected features are then modeled using the Min-Hash technique. In the retrieval step, the locality-sensitive hashing (LSH) technique is introduced. This algorithm exhibits an excellent identification rate against content-preserving degradations, except for linear speed changes. Furthermore, the tradeoffs between performance, memory usage, and computation are analyzed through extensive experiments. As an extension, the parameters of the system are analyzed and verified in Baluja and Covell (2008). This system shows superiority in terms of memory usage and computation, while being more accurate than Ke et al. (2005). From the above, it is clear that existing works based on wavelet transforms are generally not robust against large linear speed changes. Based on this, a novel framework for audio fingerprinting that is robust against large linear speed changes is proposed in this paper. We adopt the Daubechies wavelet transform Daub8 in our framework. Compared with the Haar wavelet transform, the scaling signals and wavelets of the Daubechies wavelet transforms have slightly longer supports, i.e., averages and differences are produced using just a few more values from the signal.
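Min-Hash, as used in Waveprint to model the selected wavelet features, can be illustrated with a generic sketch; the salting scheme and signature length below are illustrative choices, not Waveprint's actual parameters:

```python
import random

def minhash_signature(feature_set, num_hashes=16, seed=42):
    """Min-Hash signature of a set of (hashable) feature IDs.

    Each of `num_hashes` random permutations is simulated by hashing
    every feature together with a fixed salt; the signature keeps the
    minimum value per salt. Two sets with high Jaccard similarity
    agree on many signature slots, so signatures can stand in for the
    full feature sets during retrieval (e.g. under LSH).
    """
    rng = random.Random(seed)                     # fixed seed: reproducible salts
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, f)) & 0xFFFFFFFF for f in feature_set)
            for salt in salts]
```

In the retrieval step, LSH then bands these signature slots so that only candidates sharing a band with the query are compared in full.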
5 citations
TL;DR: The proposed fingerprint method in the fundamental frequency band compensates for the weaknesses of existing methods generated from the frequency domain, and achieves a precision rate of 95 to 100% depending on the degree of manipulation, which compares favorably with the precision rates obtained by traditional approaches.
Abstract: To enhance the tracking of illegal audio copies, we introduce a robust audio fingerprinting method that withstands various attacks. Most audio fingerprints consist of information from the frequency band of the audio. Such fingerprinting methods may lose the uniqueness of the audio fingerprint under irregular movement, such as an attack that changes pitch values. The proposed fingerprint method in the fundamental frequency band compensates for this weakness of existing methods generated from the frequency domain. Using the geometrical properties of the proposed method, a new hashing method is employed in the similarity calculation process to compare audio contents. To prove the validity of the proposed algorithm, we run experiments in six environments: tempo modification, pitch modification, speed modification, noise addition, low-pass filtering, and high-pass filtering. The proposed method shows the highest level of performance in most experimental environments. In particular, with respect to the tempo, pitch, and speed manipulation experiments, the proposed method achieves a precision rate of 95 to 100% depending on the degree of manipulation, which compares favorably with the precision rates obtained by traditional approaches, and yields a precision rate between 85 and 100% in the noise addition and filtering experiments.
5 citations