
Showing papers on "Spectrogram" published in 2012


Journal Article
TL;DR: Direct brain recordings from neurosurgical patients listening to speech reveal that the acoustic speech signals can be reconstructed from neural activity in auditory cortex.
Abstract: How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex.
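
As a rough illustration of this kind of linear decoding, the sketch below fits a regularized linear map from (synthetic) multichannel neural activity back to a spectrogram and scores reconstruction by per-band correlation. All data, dimensions, and the Ridge regularizer are illustrative assumptions, not the authors' model or recordings.

```python
# Synthetic illustration of linear stimulus reconstruction: decode a
# spectrogram from multichannel "neural" responses with ridge regression.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
T, n_electrodes, n_freq = 2000, 64, 32          # time bins, channels, bands

S = rng.random((T, n_freq))                     # "true" auditory spectrogram
W = rng.standard_normal((n_freq, n_electrodes))
X = S @ W + 0.1 * rng.standard_normal((T, n_electrodes))  # neural activity

model = Ridge(alpha=1.0).fit(X[:1500], S[:1500])   # train on first 1500 bins
S_hat = model.predict(X[1500:])                    # decode held-out bins

# Reconstruction accuracy per spectrogram band (correlation).
r = [np.corrcoef(S[1500:, f], S_hat[:, f])[0, 1] for f in range(n_freq)]
print("mean reconstruction r =", float(np.mean(r)))
```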

547 citations


Journal Article
TL;DR: This paper focuses on discriminating between subjects' electroencephalogram (EEG) responses to self-assessed liked or disliked music by evaluating different feature extraction approaches and classifiers, providing early evidence for and paving the way toward a generalized brain-computer interface for music preference recognition.
Abstract: Affective phenomena, as reflected through brain activity, could constitute an effective index for the detection of music preference. In this vein, this paper focuses on the discrimination between subjects' electroencephalogram (EEG) responses to self-assessed liked or disliked music, acquired during an experimental procedure, by evaluating different feature extraction approaches and classifiers to this end. Feature extraction is based on time-frequency (TF) analysis by implementing three TF techniques, i.e., spectrogram, Zhao-Atlas-Marks distribution and Hilbert-Huang spectrum (HHS). Feature estimation also accounts for physiological parameters that relate to EEG frequency bands, reference states, time intervals, and hemispheric asymmetries. Classification is performed by employing four classifiers, i.e., support vector machines, k-nearest neighbors (k-NN), quadratic and Mahalanobis distance-based discriminant analyses. According to the experimental results across nine subjects, the best classification accuracy (86.52 ± 0.76%) was achieved using k-NN and HHS-based feature vectors (FVs) representing a bilateral average activity, referred to a resting period, in the β (13-30 Hz) and γ (30-49 Hz) bands. Activity in these bands may point to a connection between music preference and emotional arousal phenomena. Furthermore, HHS-based FVs were found to be robust against noise corruption. The outcomes of this study provide early evidence and pave the way for the development of a generalized brain computer interface for music preference recognition.
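
A minimal sketch of the general pipeline, spectrogram band-power features plus a k-NN classifier, on synthetic EEG. The paper's HHS features, reference-state normalization, and hemispheric asymmetries are not reproduced; all data and parameters here are stand-ins.

```python
# Toy pipeline: spectrogram -> beta/gamma band power -> k-NN classifier.
import numpy as np
from scipy.signal import spectrogram
from sklearn.neighbors import KNeighborsClassifier

fs = 128  # assumed EEG sampling rate

def band_power_features(eeg):
    f, t, Sxx = spectrogram(eeg, fs=fs, nperseg=fs)
    beta = Sxx[(f >= 13) & (f < 30)].mean()    # beta band power
    gamma = Sxx[(f >= 30) & (f < 49)].mean()   # gamma band power
    return [beta, gamma]

rng = np.random.default_rng(8)
X = np.array([band_power_features(rng.standard_normal(fs * 10))
              for _ in range(60)])
y = rng.integers(0, 2, size=60)                # stand-in liked/disliked labels
clf = KNeighborsClassifier(n_neighbors=5).fit(X[:45], y[:45])
print("toy accuracy:", clf.score(X[45:], y[45:]))
```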

215 citations


Book
06 Dec 2012
TL;DR: A book on animal bioacoustics covering the recording, field measurement, analysis, and description of animal acoustic signals; digital signal acquisition, representation, analysis, editing, and synthesis; and the application of filters in bioacoustics, including the properties of various analog filters and antialiasing and antiimaging filters.
Abstract (table of contents):

Chapter 1: Acoustic Signals of Animals: Recording, Field Measurements, Analysis and Description (H. C. Gerhardt). 1 Introduction. 2 Field Recordings and Measurements: 2.1 Equipment; 2.2 On-Site Measurements; 2.3 Signal Amplitude, Directionality, and Background Noise Levels; 2.4 Patterns of Sound Propagation in Natural Habitats. 3 Laboratory Analysis of Animal Sounds: 3.1 Terminology; 3.2 Temporal and Spectral Analysis: Some General Principles. 4 Examples of Descriptions and Analyses: 4.1 Temporal Properties of Pulsatile Calls; 4.2 Amplitude-Time Envelopes; 4.3 Relationships between Fine-Scale Temporal and Spectral Properties; 4.4 Spectrally Complex Calls. 5 Summary. References.

Chapter 2: Digital Signal Acquisition and Representation (M. Clements). 1 Introduction. 2 Digital Signal Processing: 2.1 Major Applications of DSP; 2.2 Definition of Digital Systems; 2.3 Difference Equations. 3 Digital Filter Frequency Response: 3.1 Unit-Sample Response Characterization; 3.2 Frequency-Domain Interpretation of Systems; 3.3 Frequency-Domain Interpretation of Signals. 4 Conversion Between Analog and Digital Data Forms: 4.1 The Sampling Theorem; 4.2 Signal Recovery by Filtering; 4.3 Fourier Transform Relations; 4.4 Effects of Sampling Rates; 4.5 Reconstruction. 5 Fundamental Digital Processing Techniques: 5.1 Power Spectra; 5.2 Time and Frequency Resolution; 5.3 Windows; 5.4 Spectral Smoothing; 5.5 The Discrete Fourier Transform; 5.6 Correlation; 5.7 Autocorrelation; 5.8 Cross-correlation; 5.9 Spectrograms. 6 An Introduction to Some Advanced Topics: 6.1 Digital Filtering; 6.2 Linear Prediction; 6.3 Homomorphic Analysis. 7 Summary.

Chapter 3: Digital Signal Analysis, Editing, and Synthesis (K. Beeman). 1 Introduction. 2 Temporal and Spectral Measurements. 3 Time-Varying Amplitude Analysis: 3.1 Amplitude Envelopes; 3.2 Gate Functions. 4 Spectral Analysis: 4.1 Power Spectrum Features; 4.2 Measuring Similarity Among Power Spectra; 4.3 Other Spectral Analysis Techniques. 5 Spectrographic Analysis: 5.1 Spectrogram Generation; 5.2 Spectrogram Display; 5.3 Spectrogram Parameter Measurements. 6 Classification of Naturally Occurring Animal Sounds: 6.1 Properties of Ideal Signals (6.1.1 Periodicity; 6.1.2 Amplitude Modulation; 6.1.3 Frequency Modulation; 6.1.4 Biologically Relevant Sound Types). 7 Time-Varying Frequency Analysis: 7.1 Deriving Spectral Contours; 7.2 Sound-Similarity Comparison. 8 Digital Sound Synthesis: 8.1 Editing; 8.2 Arithmetic Manipulation and Generation of Sound; 8.3 Synthesis Models (8.3.1 Tonal Model); 8.4 Sources of F and A Functions (8.4.1 Mathematically Based Functions; 8.4.2 Functions Derived from Natural Sounds). 9 Sound Manipulation and Generation Techniques: 9.1 Duration Scaling; 9.2 Amplitude-Envelope Manipulations; 9.3 Spectral Manipulations (9.3.1 Frequency Shifting and Scaling; 9.3.2 Frequency Modulation); 9.4 Synthesis of Biological Sound Types (9.4.1 Tonal and Polytonal Signals; 9.4.2 Pulse-Repetition Signals; 9.4.3 Harmonic Signals; 9.4.4 Noisy Signals); 9.5 Miscellaneous Synthesis Topics (9.5.1 Template Sounds; 9.5.2 Noise Removal). 10 Summary. References.

Chapter 4: Application of Filters in Bioacoustics (P. K. Stoddard). 1 Introduction. 2 General Uses of Filters and Some Cautions. 3 Anatomy and Performance of a Filter. 4 Properties of Various Analog Filters. 5 Antialiasing and Antiimaging Filters: 5.1 A/D Conversion Requires an Analog Lowpass Filter; 5.2 Choosing an Antialiasing Filter; 5.3 D/A Conversion also Requires an Analog Lowpass Filter; 5.4 Analog Filters: Passive Versus Active Components. 6 Analog Versus Digital Filters.

98 citations


Journal Article
TL;DR: This work addresses the issue of underdetermined source separation in a particular informed configuration where both the sources and the mixtures are known during a so-called encoding stage, and allows reliable estimation of the sources through generalized Wiener filtering, provided their spectrograms are known.

90 citations


Proceedings Article
01 Dec 2012
TL;DR: Evaluation on a data set of 14 full-track real-world pop songs showed that using a similarity matrix can improve overall separation performance compared with a previous repetition-based source separation method and a recent competitive music/voice separation method, while remaining computationally efficient.
Abstract: Repetition is a fundamental element in generating and perceiving structure in music. Recent work has applied this principle to separate the musical background from the vocal foreground in a mixture, by simply extracting the underlying repeating structure. While existing methods are effective, they depend on an assumption of periodically repeating patterns. In this work, we generalize the repetition-based source separation approach to handle cases where repetitions also happen intermittently or without a fixed period, thus allowing the processing of music pieces with fast-varying repeating structures and isolated repeating elements. Instead of looking for periodicities, the proposed method uses a similarity matrix to identify the repeating elements. It then calculates a repeating spectrogram model using the median and extracts the repeating patterns using time-frequency masking. Evaluation on a data set of 14 full-track real-world pop songs showed that use of a similarity matrix can overall improve the separation performance compared with a previous repetition-based source separation method and a recent competitive music/voice separation method, while still being computationally efficient.
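
A simplified sketch of the similarity-matrix and median idea, not the authors' exact implementation: for each spectrogram frame, take the median over its most similar frames as the repeating (background) model and derive a soft time-frequency mask. The librosa example clip (downloaded on first use) and the choice of k are assumptions.

```python
# Simplified REPET-SIM-style separation via frame similarity and medians.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))       # stand-in mixture signal
X = librosa.stft(y)                               # complex spectrogram
V = np.abs(X)

# Cosine similarity between magnitude frames identifies repeating frames.
Vn = V / (np.linalg.norm(V, axis=0, keepdims=True) + 1e-12)
sim = Vn.T @ Vn                                   # frame-by-frame similarity

k = 10                                            # similar frames per frame
repeating = np.empty_like(V)
for j in range(V.shape[1]):
    nearest = np.argsort(sim[j])[-k:]             # k most similar frames
    repeating[:, j] = np.median(V[:, nearest], axis=1)

# Soft time-frequency mask for the repeating (background) part.
mask = np.minimum(repeating, V) / (V + 1e-12)
background = librosa.istft(mask * X)
foreground = librosa.istft((1 - mask) * X)
```

Taking the median (rather than the mean) makes the background model robust to the non-repeating foreground that appears in only a few of the similar frames.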

87 citations


Proceedings Article
09 Mar 2012
TL;DR: A novel technique that is effective at low sampling rates is introduced, making RF fingerprinting more practical for resource-constrained devices such as mobile transceivers.
Abstract: RF fingerprinting is a technique where a transmitter is identified from its electromagnetic emission. Most existing RF fingerprinting techniques require high sampling rates. This paper introduces a novel technique that is effective at low sampling rates, making RF fingerprinting more practical for resource-constrained devices such as mobile transceivers. The technique is demonstrated with Bluetooth transceivers. A data acquisition system is designed to capture the Bluetooth signals in the 2.4 GHz ISM band. A spectrogram based on the short-time Fourier transform is used to obtain the energy envelope of the instantaneous transient signal, and unique features are extracted from the envelope. The technique adopted for identification of the Bluetooth transmitters has shown promising results compared to techniques reported in the literature, and has accurately classified the Bluetooth transmitters at low sampling rates.
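
A hedged sketch of the envelope-extraction step: compute an STFT spectrogram of a (synthetic) transient, sum energy across frequency to get its envelope, and derive simple features. The sampling rate, window sizes, and feature set are illustrative, not the paper's.

```python
# Energy envelope of a transient from its spectrogram (synthetic burst).
import numpy as np
from scipy.signal import spectrogram

fs = 1_000_000                                  # 1 MS/s stand-in capture rate
t = np.arange(4096) / fs
burst = np.sin(2 * np.pi * 100e3 * t) * np.exp(-((t - 1e-3) / 3e-4) ** 2)

f, tt, Sxx = spectrogram(burst, fs=fs, nperseg=128, noverlap=96)
envelope = Sxx.sum(axis=0)                      # energy envelope over time

# Simple envelope features one might feed to a classifier.
features = [envelope.max(),                     # peak energy
            int(envelope.argmax()),             # frame index of the peak
            envelope.sum(),                      # total energy
            int((envelope > 0.1 * envelope.max()).sum())]  # duration above threshold
print(features)
```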

78 citations


Journal Article
TL;DR: Results demonstrate that the proposed probabilistic model for multiple-instrument automatic music transcription outperforms leading approaches from the transcription literature, using several error metrics.
Abstract: In this work, a probabilistic model for multiple-instrument automatic music transcription is proposed. The model extends the shift-invariant probabilistic latent component analysis method, which is used for spectrogram factorization. Proposed extensions support the use of multiple spectral templates per pitch and per instrument source, as well as a time-varying pitch contribution for each source. Thus, this method can effectively be used for multiple-instrument automatic transcription. In addition, the shift-invariant aspect of the method can be exploited for detecting tuning changes and frequency modulations, as well as for visualizing pitch content. For note tracking and smoothing, pitch-wise hidden Markov models are used. For training, pitch templates from eight orchestral instruments were extracted, covering their complete note range. The transcription system was tested on multiple-instrument polyphonic recordings from the RWC database, a Disklavier data set, and the MIREX 2007 multi-F0 data set. Results demonstrate that the proposed method outperforms leading approaches from the transcription literature, using several error metrics.

76 citations


Book Chapter
12 Mar 2012
TL;DR: An online approach is proposed to adaptively learn a dictionary for the source lacking training data during the separation process and to separate the mixture over time, enabling online semi-supervised separation for real-time applications.
Abstract: Non-negative spectrogram factorization algorithms such as probabilistic latent component analysis (PLCA) have been shown to be quite powerful for source separation. When training data for all of the sources are available, it is trivial to learn their dictionaries beforehand and perform supervised source separation in an online fashion. However, in many real-world scenarios (e.g. speech denoising), training data for one of the sources can be hard to obtain beforehand (e.g. speech). In these cases, we need to perform semi-supervised source separation and learn a dictionary for that source during the separation process. Existing semi-supervised separation approaches are generally offline, i.e. they need to access the entire mixture when updating the dictionary. In this paper, we propose an online approach to adaptively learn this dictionary and separate the mixture over time. This enables us to perform online semi-supervised separation for real-time applications. We demonstrate this approach on real-time speech denoising.

71 citations


Proceedings Article
09 Sep 2012
TL;DR: A novel algorithm is presented that combines the advantages of both classical algorithms and non-negative spectrogram decomposition algorithms and significantly outperforms four categories of classical algorithms in non-stationary noise environments.
Abstract: Classical single-channel speech enhancement algorithms have two convenient properties: they require pre-learning the noise model but not the speech model, and they work online. However, they often have difficulties in dealing with non-stationary noise sources. Source separation algorithms based on nonnegative spectrogram decompositions are capable of dealing with non-stationary noise, but do not possess the aforementioned properties. In this paper we present a novel algorithm that combines the advantages of both classical algorithms and non-negative spectrogram decomposition algorithms. Experiments show that it significantly outperforms four categories of classical algorithms in non-stationary noise environments.

65 citations


Journal Article
TL;DR: The detection algorithm is general enough to detect all types of humpback vocalizations and outperforms energy detection techniques, providing a probability of detection P(D) = 95% for P( FA) < 5% for three acoustic deployments, compared to P(FA) > 40% for two energy-based techniques.
Abstract: Conventional detection of humpback vocalizations is often based on frequency summation of band-limited spectrograms under the assumption that energy (square of the Fourier amplitude) is the appropriate metric. Power-law detectors allow for a higher power of the Fourier amplitude, appropriate when the signal occupies a limited but unknown subset of these frequencies. Shipping noise is non-stationary and colored and problematic for many marine mammal detection algorithms. Modifications to the standard power-law form are introduced to minimize the effects of this noise. These same modifications also allow for a fixed detection threshold, applicable to broadly varying ocean acoustic environments. The detection algorithm is general enough to detect all types of humpback vocalizations. Tests presented in this paper show this algorithm matches human detection performance with an acceptably small probability of false alarms (P(FA) < 5%), compared to P(FA) > 40% for two energy-based techniques. The generalized power-law detector also can be used for basic parameter estimation and can be adapted for other types of transient sounds.
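
A toy version of a power-law detection statistic, using per-bin median normalization as a crude stand-in for the paper's noise modifications: raise the (already squared) spectrogram magnitudes to a power and sum across frequency. Setting nu = 1 reduces to an energy detector; the value 2.5 below is arbitrary.

```python
# Generalized power-law detection statistic over a spectrogram.
import numpy as np
from scipy.signal import spectrogram

def power_law_statistic(x, fs, nu=2.5, nperseg=512):
    f, t, Sxx = spectrogram(x, fs=fs, nperseg=nperseg)   # Sxx ~ |X|^2
    # Normalize each frequency bin by its median over time (crude whitening),
    # then sum |X|^(2*nu) over frequency for each time frame.
    Sn = Sxx / (np.median(Sxx, axis=1, keepdims=True) + 1e-12)
    return t, (Sn ** nu).sum(axis=0)

rng = np.random.default_rng(1)
fs = 4000
x = rng.standard_normal(fs * 10)                          # noise background
x[20000:21000] += np.sin(2 * np.pi * 300 * np.arange(1000) / fs)  # tonal call
t, stat = power_law_statistic(x, fs)
print("peak statistic at t =", t[np.argmax(stat)], "s")   # ~5 s
```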

55 citations


Journal Article
TL;DR: This paper first establishes the radar signal model of the spinning missile during flight, and then extracts the micro-Doppler modulation frequency through analysis of the periodic structure of the resulting spectrogram (short-time Fourier transform, STFT), i.e., the time-frequency distribution (TFD).

Journal Article
TL;DR: In this paper, a multi-scale autocorrelation via morphological wavelet slices (MAMWS) approach is proposed to detect bearing fault signatures, where the vibration measurement of a bearing is decomposed using a morphological stationary wavelet with different resolutions of structuring elements, and the extracted temporal components are then transformed by the Fourier transform to form a frequency-domain view of morphological slices.

Proceedings Article
09 Jul 2012
TL;DR: A novel local audio fingerprint called MASK (Masked Audio Spectral Keypoints) that can effectively encode the acoustic information existent in audio documents and discriminate between transformed versions of the same acoustic documents and other unrelated documents is presented.
Abstract: This paper presents a novel local audio fingerprint called MASK (Masked Audio Spectral Keypoints) that can effectively encode the acoustic information present in audio documents and discriminate between transformed versions of the same acoustic documents and other unrelated documents. The fingerprint has been designed to be resilient to strong transformations of the original signal and to be usable for generic audio, including music and speech. Its main characteristics are its locality, binary encoding, robustness and compactness. The proposed audio fingerprint encodes the local spectral energies around salient points selected among the main spectral peaks in a given signal. Such encoding is done by centering on each point a carefully designed mask defining regions of the spectrogram whose average energies are compared with each other. From each comparison we obtain a single bit depending on which region has more energy, and group all bits into a final binary fingerprint. In addition, the fingerprint also stores the frequency of each peak, quantized using a Mel filter bank. The length of the fingerprint is solely defined by the number of compared regions being used, and can be adapted to the requirements of any particular application. In addition, the number of salient points encoded per second can also be easily modified. In the experimental section we show the suitability of this fingerprint for finding matching segments using the NIST-TRECVID benchmarking evaluation datasets, comparing it with a well-known fingerprint and obtaining up to 26% relative improvement in NDCR score.
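
A toy illustration of the region-comparison encoding: average energies in a few regions around a salient spectrogram point and emit one bit per pairwise comparison. The quadrant geometry here is invented for illustration; MASK's actual mask layout and peak selection differ.

```python
# Toy binary keypoint descriptor from region-energy comparisons.
import numpy as np

def keypoint_bits(S, ti, fi, half=4):
    """S: magnitude spectrogram (freq x time); (ti, fi): a salient peak."""
    patch = S[fi - half:fi + half, ti - half:ti + half]
    q = [patch[:half, :half].mean(), patch[:half, half:].mean(),
         patch[half:, :half].mean(), patch[half:, half:].mean()]
    # One bit per ordered pair of region energies.
    return [int(q[a] > q[b]) for a in range(4) for b in range(a + 1, 4)]

rng = np.random.default_rng(2)
S = rng.random((128, 256))                     # stand-in spectrogram
print(keypoint_bits(S, ti=100, fi=64))         # 6-bit local fingerprint
```

Because each bit only records which region is louder, the descriptor is invariant to overall gain changes, which is one reason such comparisons are robust to signal transformations.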

Proceedings Article
10 Jun 2012
TL;DR: This paper presents a new approach to classify human motions using a Doppler radar for applications in security and surveillance, and it is shown that this approach is more computationally efficient than the traditional principal component analysis.
Abstract: This paper presents a new approach to classifying human motions using a Doppler radar for applications in security and surveillance. Traditionally, the Doppler radar is an effective tool for detecting the position and velocity of a moving target, even in adverse weather conditions and from long range. In this paper, we are interested in using the Doppler radar to recognize the micro-motions exhibited by people. In the proposed approach, a frequency-modulated continuous-wave radar is applied to scan the target, and the short-time Fourier transform is used to convert the radar signal into a spectrogram. Then, the new two-directional, two-dimensional principal component analysis and linear discriminant analysis are performed to obtain the feature vectors. This approach is more computationally efficient than traditional principal component analysis. Finally, support vector machines are applied to classify the feature vectors into different human motions. Evaluated on a radar data set with three types of motions, the proposed approach achieves a classification rate of 91.9%.

Proceedings Article
25 Mar 2012
TL;DR: This paper presents a new time-frequency reassignment process for the spectrogram, called the Levenberg-Marquardt reassignment, which uses the second-order derivatives of the phase of the short-time Fourier transform, and provides the user with a setting parameter.
Abstract: This paper presents a new time-frequency reassignment process for the spectrogram, called the Levenberg-Marquardt reassignment. Compared to the classical one, this new reassignment process uses the second-order derivatives of the phase of the short-time Fourier transform, and provides the user with a setting parameter. This parameter allows the user to produce either a weaker or a stronger localization of the signal components in the time-frequency plane.

Patent
27 Jun 2012
TL;DR: In this paper, a double-threshold algorithm and a Welch power-spectrum estimate are used to judge whether abnormal sound exists in public places; a traditional harmonic balance (HB) weight function is modified, and weight changes caused by low signal-to-noise ratio are reduced.
Abstract: The invention relates to a method for recognizing and locating abnormal sound in public places, belonging to the technical field of audio signal processing. The method uses a double-threshold algorithm and a Welch power-spectrum estimate to judge whether abnormal sound exists in the public places. Feature sequence signals of the abnormal sound are converted into a time-frequency spectrogram, and the problems of feature extraction and classification of the abnormal sound are solved using an auditory-inspired sparse-coding sound recognition technique. To suppress the effects of impulsive background noise on abnormal sound localization, a non-linear transformation is introduced, improving the peak of the cross-correlation function of the abnormal sound. A traditional harmonic balance (HB) weight function is modified, reducing weight changes caused by low signal-to-noise ratio. Multi-frame data weighting is introduced, making the resulting HB-weighted generalized cross-correlation algorithm suitable for abnormal sound localization in the complex acoustic environments of public places. Because the method combines hearing-inspired sparse-coding sound recognition with an improved time-difference-of-arrival sound localization technique, the sound information accompanying abnormal events can be better utilized, and the intelligence of public-place monitoring systems can be improved.
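
The localization step builds on weighted generalized cross-correlation (GCC) for time difference of arrival. Below is a standard GCC-PHAT sketch for reference; it uses the common PHAT weighting rather than the patent's modified HB weighting, and the synthetic two-channel delay is an assumption.

```python
# GCC-PHAT time-difference-of-arrival estimation between two channels.
import numpy as np

def gcc_phat(x1, x2, fs):
    """Estimate the delay of x2 relative to x1 via PHAT-weighted GCC."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    G = X2 * np.conj(X1)
    cc = np.fft.irfft(G / (np.abs(G) + 1e-12), n)     # PHAT: whiten to unit magnitude
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))  # center zero lag
    lag = np.argmax(np.abs(cc)) - n // 2
    return lag / fs                                    # TDOA in seconds

fs = 16000
rng = np.random.default_rng(3)
s = rng.standard_normal(fs)
delay = 40                                             # samples
x1, x2 = s, np.concatenate((np.zeros(delay), s[:-delay]))
print("estimated TDOA:", gcc_phat(x1, x2, fs), "s")    # ~40/16000 = 2.5 ms
```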

Proceedings Article
25 Mar 2012
TL;DR: New variants of the non-negative matrix factorization concept are introduced that incorporate music-specific constraints, exploiting the structural regularities of music spectrograms.
Abstract: Music spectrograms typically have many structural regularities that can be exploited to help solve the problem of decomposing a given spectrogram into distinct musically meaningful components. In this paper, we introduce new variants of the non-negative matrix factorization concept that incorporate music-specific constraints.
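
For reference, here is a baseline unconstrained NMF of a magnitude spectrogram with Lee-Seung multiplicative updates (Euclidean objective); the paper's music-specific constraints would be imposed on top of updates like these. The rank, iteration count, and stand-in spectrogram are arbitrary.

```python
# Baseline NMF with multiplicative updates: V (freq x time) ~ W @ H.
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-12):
    """Factor nonnegative V into W (freq x r) and H (r x time)."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral templates
    return W, H

V = np.random.default_rng(4).random((257, 400))   # stand-in magnitude spectrogram
W, H = nmf(V, r=8)
print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```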

Posted Content
TL;DR: This paper presents a technique for Informed Source Separation of a single-channel mixture, based on the Multiple Input Spectrogram Inversion (MISI) phase estimation method, which outperforms both a state-of-the-art Wiener-based reference technique and the oracle Wiener filter.
Abstract: This paper presents a technique for Informed Source Separation (ISS) of a single channel mixture, based on the Multiple Input Spectrogram Inversion method. The reconstruction of the source signals is iterative, alternating between a time-frequency consistency enforcement and a re-mixing constraint. A dual resolution technique is also proposed, for sharper reconstruction of transients. The two algorithms are compared to a state-of-the-art Wiener-based ISS technique, on a database of fourteen monophonic mixtures, with standard source separation objective measures. Experimental results show that the proposed algorithms outperform both this reference technique and the oracle Wiener filter by up to 3 dB in distortion, at the cost of a significantly heavier computation.

Proceedings Article
28 Jun 2012
TL;DR: A novel vocal separator, inspired by single-channel vocal separation algorithms, which finds the k nearest neighbours to each frame of the mixture spectrogram; the median of these frames is used as the estimate of the background music at the current frame.
Abstract: Recently, single channel vocal separation algorithms have been proposed which exploit the fact that most popular music can be regarded as a repeating musical background over which a locally non-repeating vocal signal is superimposed. In this paper we describe a novel vocal separator inspired by these approaches which finds the k nearest neighbours to each frame of a spectrogram of the mixture signal. The median value of these frames is then used as the estimate of the background music at the current frame. This is then used to generate a mask on the original complex-valued spectrogram before inversion to the time domain. The effectiveness of the approach is demonstrated on a number of real-world signals.

Posted Content
TL;DR: In this paper, a tractable convex program is proposed to recover a signal from the magnitude of its short-time Fourier transform (STFT) in audio signal processing.
Abstract: The problem of recovering a signal from the magnitude of its short-time Fourier transform (STFT) is a longstanding one in audio signal processing. Existing approaches rely on heuristics that often perform poorly because of the nonconvexity of the problem. We introduce a formulation of the problem that lends itself to a tractable convex program. We observe that our method yields better reconstructions than the standard Griffin-Lim algorithm. We provide an algorithm and discuss practical implementation details, including how the method can be scaled up to larger examples.
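
For context, the Griffin-Lim baseline the paper compares against can be run directly via librosa; the convex formulation itself is not sketched here. The example clip (downloaded on first use) and iteration count are assumptions.

```python
# Griffin-Lim phase recovery from an STFT magnitude.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))
S = np.abs(librosa.stft(y))               # discard phase, keep magnitude
y_rec = librosa.griffinlim(S, n_iter=64)  # iterative phase recovery

# Consistency check: the reconstruction's magnitude should approximate S
# (Griffin-Lim is approximate, so the error will not be zero).
S_rec = np.abs(librosa.stft(y_rec))
m = min(S.shape[1], S_rec.shape[1])
print(np.linalg.norm(S_rec[:, :m] - S[:, :m]) / np.linalg.norm(S[:, :m]))
```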

Proceedings Article
06 Sep 2012
TL;DR: By applying a sparse-representation based classifier to the device RSFs, state-of-the-art identification accuracy of 95.55% has been obtained on a set of 8 telephone handsets from the Lincoln-Labs Handset Database (LLHDB).
Abstract: Speech signals convey information not only about speakers' identity and the spoken language, but also about the acquisition devices used during their recording. Therefore, it is reasonable to perform acquisition device identification by analyzing the recorded speech signal. To this end, the random spectral features (RSFs) are proposed as an intrinsic fingerprint suitable for device identification. The RSFs are extracted from each speech signal by first averaging its spectrogram along the time axis and then projecting the resulting mean spectrogram onto a Gaussian random matrix of compatible dimensions. By applying a sparse-representation based classifier to the device RSFs, state-of-the-art identification accuracy of 95.55% has been obtained on a set of 8 telephone handsets from the Lincoln-Labs Handset Database (LLHDB).
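
A minimal sketch of the RSF construction as described: average the spectrogram along time, then project the mean spectrum onto a Gaussian random matrix. The dimensions, STFT settings, and stand-in signal are assumptions.

```python
# Random spectral features: mean spectrum projected by a random matrix.
import numpy as np
from scipy.signal import spectrogram

def random_spectral_features(x, fs, dim=50, seed=0):
    f, t, Sxx = spectrogram(x, fs=fs, nperseg=512)
    mean_spectrum = Sxx.mean(axis=1)                  # average along time
    R = np.random.default_rng(seed).standard_normal((dim, len(mean_spectrum)))
    return R @ mean_spectrum                          # low-dimensional fingerprint

fs = 8000
x = np.random.default_rng(5).standard_normal(fs * 3)  # stand-in recording
print(random_spectral_features(x, fs).shape)          # (50,)
```

Note that the same seed must be reused across recordings so all fingerprints live in the same random projection space.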

Journal Article
TL;DR: In this paper, the Hilbert-Huang transform (HHT) was combined with a fourth-order spectral analysis tool named the Kurtogram, where the Kurtogram was applied to locate the non-stationary intra- and inter-wave modulation components in the original signals and produce more monochromatic IMFs.

Proceedings Article
01 Dec 2012
TL;DR: The random spectral features and the labeled spectral features are proposed as intrinsic fingerprints suitable for device identification and are extracted by applying unsupervised and supervised feature selection, respectively, to the mean spectrogram of each speech signal.
Abstract: Speech signals convey information not only about the speakers' identity and the spoken language, but also about the acquisition devices used during their recording. Therefore, it is reasonable to perform acquisition device identification by analyzing the recorded speech signal. To this end, the random spectral features (RSFs) and the labeled spectral features (LSFs) are proposed as intrinsic fingerprints suitable for device identification. The RSFs and the LSFs are extracted by applying unsupervised and supervised feature selection to the mean spectrogram of each speech signal, respectively. State-of-the-art identification accuracy of 97.58% has been obtained by employing LSFs on a set of 8 telephone handsets from the Lincoln-Labs Handset Database (LLHDB).

Proceedings Article
25 Mar 2012
TL;DR: This paper deals with phase estimation in the framework of underdetermined blind source separation, using an estimated spectrogram of the source and its associated Wiener filter, and shows that this technique brings significant improvements over the classical Wiener filter, while being much faster than other iterative methods.
Abstract: This paper deals with phase estimation in the framework of underdetermined blind source separation, using an estimated spectrogram of the source and its associated Wiener filter. By thresholding the Wiener mask, two domains are defined on the spectrogram: a confidence domain where the phase is kept as the phase of the mixture, and its complement where the phase is updated with a projection similar to the widely used Griffin and Lim technique. We show that with this simple technique, the choice of parameters results in a simple trade-off between distortion and interference. Experiments show that this technique brings significant improvements over the classical Wiener filter, while being much faster than other iterative methods.
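
A simplified sketch of the confidence-domain idea: threshold a Wiener-like mask, keep the mixture phase in confident bins, and update the rest with Griffin-Lim-style consistency projections. The mask construction, threshold, iteration count, and demo magnitude estimate are assumptions, not the paper's exact procedure.

```python
# Phase recovery with a confidence domain defined by a thresholded mask.
import numpy as np
import librosa

def phase_recover(X_mix, V_source, threshold=0.6, n_iter=30):
    """X_mix: complex mixture STFT; V_source: estimated source magnitude."""
    mask = V_source / (np.abs(X_mix) + 1e-12)          # crude Wiener-like mask
    confident = mask > threshold                       # keep mixture phase here
    X = V_source * np.exp(1j * np.angle(X_mix))        # init with mixture phase
    for _ in range(n_iter):
        X = librosa.stft(librosa.istft(X))             # consistency projection
        X = V_source * np.exp(1j * np.angle(X))        # restore target magnitude
        X[confident] = (V_source * np.exp(1j * np.angle(X_mix)))[confident]
    return librosa.istft(X)

y, sr = librosa.load(librosa.ex("trumpet"))
X_mix = librosa.stft(y)
V_est = np.abs(X_mix) * np.random.default_rng(0).uniform(0.2, 1.0, X_mix.shape)
source = phase_recover(X_mix, V_est)                   # stand-in source estimate
```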

Journal Article
30 Apr 2012
TL;DR: The results show that the KNN and ANN were able to classify the spectrogram image with 87.5% to 90% accuracy for the brain balancing application.
Abstract: In this paper, the comparison between K-Nearest Neighbor (KNN) and Artificial Neural Network (ANN) algorithms for classifying spectrogram images in brain balancing is presented. After producing spectrogram images from Electroencephalogram (EEG) signals, Gray Level Co-occurrence Matrix (GLCM) texture features were extracted. These features produce large matrices; therefore, Principal Component Analysis (PCA) is applied to reduce their size. The results show that KNN and ANN were able to classify the spectrogram images with 87.5% to 90% accuracy for the brain balancing application.
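
A minimal sketch of the GLCM feature step on a stand-in spectrogram image, using skimage's graycomatrix/graycoprops; the distances, angles, and chosen properties are illustrative.

```python
# GLCM texture features from a quantized spectrogram image.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(6)
img = (rng.random((64, 64)) * 255).astype(np.uint8)   # stand-in spectrogram image

glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("contrast", "homogeneity", "energy", "correlation")])
print(features.shape)                                  # 4 properties x 2 angles = (8,)
```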

Proceedings Article
01 Jan 2012
TL;DR: Using group sparsity to restrict simultaneous activation of sources is proposed, allowing us to discover the identity of an unknown speaker from multiple candidates, and further to recognise the phonetic content more reliably with a narrowed down subset of atoms belonging to the most likely speakers.
Abstract: Spectrogram factorisation using a dictionary of spectrotemporal atoms has been successfully employed to separate a mixed audio signal into its source components. When atoms from multiple sources are included in a combined dictionary, the relative weights of activated atoms reveal likely sources as well as the content of each source. Enforcing sparsity on the activation weights produces solutions, where only a small number of atoms are active at a time. In this paper we propose using group sparsity to restrict simultaneous activation of sources, allowing us to discover the identity of an unknown speaker from multiple candidates, and further to recognise the phonetic content more reliably with a narrowed down subset of atoms belonging to the most likely speakers. An evaluation on the CHiME corpus shows that the use of group sparsity improves the results of noise robust speaker identification and speech recognition using speaker-dependent models.

Proceedings Article
01 Jan 2012
TL;DR: This study computes the local representation of the speech spectrogram as the raw “signal” and uses it as the local sparse code to perform a standard phone classification task, demonstrating meaningful acoustic-phonetic properties captured by a collection of the dictionary entries.
Abstract: We propose a novel approach to acoustic modeling based on recent advances in sparse representations. The key idea in sparse coding is to compute a compressed local representation of a signal via an over-complete basis or dictionary that is learned in an unsupervised way. In this study, we compute the local representation on the speech spectrogram as the raw “signal” and use it as the local sparse code to perform a standard phone classification task. A linear classifier is used that directly receives the coding space for making the classification decision. The simplicity of the linear classifier allows us to assess whether the sparse representations are sufficiently rich to serve as effective acoustic features for discriminating speech classes. Our experiments demonstrate competitive error rates when compared to other shallow approaches. An examination of the dictionary learned in sparse feature extraction demonstrates meaningful acoustic-phonetic properties that are captured by a collection of the dictionary entries.

Journal Article
28 Mar 2012 - Chaos
TL;DR: This paper presents an alternative pragmatic approach to identifying chaos using response frequency characteristics and extending the concept of the spectrogram; it is shown to work well on both experimental and simulated time series.
Abstract: The sign of the largest Lyapunov exponent is the fundamental indicator of chaos in a dynamical system. However, although the extraction of Lyapunov exponents can be accomplished with (necessarily noisy) experimental data, this is still a relatively data-intensive and sensitive endeavor. This paper presents an alternative pragmatic approach to identifying chaos using response frequency characteristics and extending the concept of the spectrogram. The method is shown to work well on both experimental and simulated time series.

Journal Article
Mian Pan, Lan Du, Penghui Wang, Hongwei Liu, Zheng Bao
TL;DR: Experimental results for measured data show that the spectrogram feature of HRRP data has significant advantages over the time domain sample in both the recognition and rejection performance, and MTL provides a better recognition performance.
Abstract: In radar high-resolution range profile (HRRP)-based statistical target recognition, one of the most challenging tasks is feature extraction. This article utilizes the spectrogram feature of HRRP data to improve recognition performance; the spectrogram is a two-dimensional feature describing the variation of the frequency-domain feature with time. A new radar HRRP target recognition method is then presented via a truncated stick-breaking hidden Markov model (TSB-HMM). Moreover, multi-task learning (MTL) is employed, from which a full posterior distribution on the numbers of states associated with the targets can be inferred, and the target-dependent state information is shared among multiple target-aspect frames of each target. The framework of TSB-HMM allows efficient variational Bayesian inference, of interest for large-scale problems. Experimental results for measured data show that the spectrogram feature has significant advantages over the time-domain sample in both recognition and rejection performance, and that MTL provides better recognition performance.

Proceedings Article
10 Jun 2012
TL;DR: After a comprehensive series of experiments, it is shown that the SVM classifier trained with LBP is able to achieve a recognition rate of 80%.
Abstract: In this paper we compare two different textural feature sets for automatic music genre classification. The idea is to convert the audio signal into spectrograms and then extract features from this visual representation. Two textural descriptors are explored in this work: the Gray Level Co-Occurrence Matrix (GLCM) and Local Binary Patterns (LBP). In addition, two different strategies for extracting features are considered: a global approach, where features are extracted from the entire spectrogram image and classified by a single classifier; and a local approach, where the spectrogram image is split into several zones that are classified independently, with the final decision obtained by combining all the partial results. The database used in our experiments was the Latin Music Database, which contains music pieces categorized into 10 musical genres and has been used for MIREX (Music Information Retrieval Evaluation eXchange) competitions. After a comprehensive series of experiments, we show that the SVM classifier trained with LBP is able to achieve a recognition rate of 80%. This rate not only outperforms the GLCM by a fair margin but is also slightly better than the results reported in the literature.
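
A hedged sketch of the LBP variant of this pipeline: uniform LBP histograms per spectrogram image fed to a linear SVM. The random stand-in images and labels replace the Latin Music Database, and the values of P, R, and the train/test split are arbitrary.

```python
# LBP histogram features from spectrogram images, classified with an SVM.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(img, P=8, R=1):
    codes = local_binary_pattern(img, P, R, method="uniform")
    # "uniform" LBP yields integer codes in [0, P+1], hence P+2 bins.
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

rng = np.random.default_rng(7)
X = np.array([lbp_histogram((rng.random((64, 64)) * 255).astype(np.uint8))
              for _ in range(40)])
y = rng.integers(0, 2, size=40)                       # stand-in genre labels
clf = SVC(kernel="linear").fit(X[:30], y[:30])
print("toy accuracy:", clf.score(X[30:], y[30:]))     # ~chance on random data
```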