Showing papers on "Spectrogram" published in 2010

Open Access

Journal Article•DOI•

Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation

Alexey Ozerov¹, Cédric Févotte¹•Institutions (1)

01 Mar 2010-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: In this article, a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals, is considered.

Abstract: We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. We work in the short-time Fourier transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired from nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first one consists of maximizing the exact joint likelihood of the multichannel data using an expectation-maximization (EM) algorithm. The second method consists of maximizing the sum of individual likelihoods of all channels using a multiplicative update algorithm inspired from NMF methodology. Our decomposition algorithms are applied to stereo audio source separation in various settings, covering blind and supervised separation, music and speech sources, synthetic instantaneous and convolutive mixtures, as well as professionally produced music recordings. Our EM method produces competitive results with respect to state-of-the-art as illustrated on two tasks from the international Signal Separation Evaluation Campaign (SiSEC 2008).
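As a rough illustration of the single-channel building block named in the abstract (NMF under the Itakura-Saito divergence, fitted with multiplicative updates), the NumPy sketch below factorizes a power spectrogram as V ≈ WH. It is not the authors' multichannel EM or multiplicative algorithm. The function names, the rank, the iteration count, the toy data and the square-root exponent on the update ratio (a variant chosen here because it is known to keep the divergence non-increasing) are all assumptions made for illustration.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all time-frequency bins."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf(V, rank, n_iter=100, seed=0, eps=1e-12):
    """Hypothetical single-channel IS-NMF sketch: V (freq x frames) ~= W @ H."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + 0.1   # spectral templates
    H = rng.random((rank, n_frames)) + 0.1 # temporal activations
    history = []
    for _ in range(n_iter):
        V_hat = W @ H + eps
        # Multiplicative update of the activations (square-root variant).
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        V_hat = W @ H + eps
        # Multiplicative update of the templates (square-root variant).
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        history.append(is_divergence(V, W @ H + eps))
    return W, H, history

# Toy power spectrogram built from three random nonnegative components.
rng = np.random.default_rng(1)
V = (rng.random((20, 3)) + 0.1) @ (rng.random((3, 40)) + 0.1)
W, H, history = is_nmf(V, rank=3)
```

The list `history` records the divergence after each iteration, which gives a simple way to check that the fit improves.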

636 citations

Proceedings Article•

Binary Coding of Speech Spectrograms Using a Deep Auto-encoder

Li Deng¹, Michael L. Seltzer¹, Dong Yu¹, Alex Acero¹, Abdelrahman Mohamed², Geoffrey E. Hinton²•Institutions (2)

Microsoft¹, University of Toronto²

01 Sep 2010

TL;DR: This paper reports the recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms and shows that the binary codes learned produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech.

Abstract: This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-by-layer pre-training we “unroll” the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlap-and-add method. Experimental results on speech spectrogram coding demonstrate that the binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech. Index Terms: deep learning, speech feature extraction, neural networks, auto-encoder, binary codes, Boltzmann machine

372 citations

Journal Article•DOI•

Model-Based Expectation-Maximization Source Separation and Localization

Michael I. Mandel¹, Ron Weiss¹, Daniel P. W. Ellis¹•Institutions (1)

Columbia University¹

01 Feb 2010-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: This paper describes a model-based expectation-maximization source separation and localization system for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording, and creates probabilistic spectrogram masks that can be used for source separation.

Abstract: This paper describes a system, referred to as model-based expectation-maximization source separation and localization (MESSL), for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording. By clustering individual spectrogram points based on their interaural phase and level differences, MESSL generates masks that can be used to isolate individual sound sources. We first describe a probabilistic model of interaural parameters that can be evaluated at individual spectrogram points. By creating a mixture of these models over sources and delays, the multi-source localization problem is reduced to a collection of single source problems. We derive an expectation-maximization algorithm for computing the maximum-likelihood parameters of this mixture model, and show that these parameters correspond well with interaural parameters measured in isolation. As a byproduct of fitting this mixture model, the algorithm creates probabilistic spectrogram masks that can be used for source separation. In simulated anechoic and reverberant environments, separations using MESSL produced on average a signal-to-distortion ratio 1.6 dB greater and perceptual evaluation of speech quality (PESQ) results 0.27 mean opinion score units greater than four comparable algorithms.
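The interaural cues that the abstract refers to can be computed per spectrogram point roughly as follows. This is only a sketch of the feature extraction step, assuming left and right STFT arrays of the same shape; it does not implement the mixture model or the EM clustering of MESSL. The function name, the `eps` guard and the toy data are illustrative assumptions.

```python
import numpy as np

def interaural_cues(left, right, eps=1e-12):
    """Interaural level difference (dB) and phase difference (rad) per time-frequency point.

    `left` and `right` are complex STFT arrays of identical shape (freq x frames).
    """
    ild_db = 20.0 * np.log10((np.abs(left) + eps) / (np.abs(right) + eps))
    ipd = np.angle(left * np.conj(right))  # wrapped to (-pi, pi]
    return ild_db, ipd

# Toy check: the left channel is the right channel scaled by 2 and advanced by 0.5 rad.
rng = np.random.default_rng(0)
right = rng.standard_normal((129, 50)) + 1j * rng.standard_normal((129, 50))
left = 2.0 * np.exp(1j * 0.5) * right
ild_db, ipd = interaural_cues(left, right)
```

A clustering stage would then group the points by these two cues to form one mask per source.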

317 citations

Harmonic/Percussive Separation Using Median Filtering

Derry Fitzgerald¹•Institutions (1)

Dublin Institute of Technology¹

01 Jan 2010

TL;DR: In this paper, median filtering is used to separate the harmonic and percussive parts of a monaural audio signal, and the two resulting median filtered spectrograms are then used to generate masks which are then applied to the original spectrogram.

Abstract: In this paper, we present a fast, simple and effective method to separate the harmonic and percussive parts of a monaural audio signal. The technique involves the use of median filtering on a spectrogram of the audio signal, with median filtering performed across successive frames to suppress percussive events and enhance harmonic components, while median filtering is also performed across frequency bins to enhance percussive events and suppress harmonic components. The two resulting median filtered spectrograms are then used to generate masks which are then applied to the original spectrogram to separate the harmonic and percussive parts of the signal. We illustrate the use of the algorithm in the context of remixing audio material from commercial recordings.
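The two median filters and the resulting masks can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's reference code: the filter length of 17, the edge padding, the squared (Wiener-style) soft mask and all function names are choices made here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_1d(S, size, axis):
    """Running median of odd length `size` along `axis`, using edge padding."""
    pad = size // 2
    widths = [(0, 0)] * S.ndim
    widths[axis] = (pad, pad)
    padded = np.pad(S, widths, mode="edge")
    windows = sliding_window_view(padded, size, axis=axis)  # window axis is appended last
    return np.median(windows, axis=-1)

def hpss_masks(S, size=17, eps=1e-12):
    """Soft harmonic and percussive masks from a magnitude spectrogram S (freq x frames)."""
    harm = median_filter_1d(S, size, axis=1)  # across frames: keeps horizontal ridges
    perc = median_filter_1d(S, size, axis=0)  # across frequency bins: keeps vertical ridges
    total = harm**2 + perc**2 + eps
    return harm**2 / total, perc**2 / total

# Toy spectrogram: one steady tone (row 20) and one click (column 30) over a low floor.
S = np.full((64, 64), 0.01)
S[20, :] = 1.0
S[:, 30] = 1.0
mask_h, mask_p = hpss_masks(S)
```

Multiplying the original complex spectrogram by `mask_h` or `mask_p` and inverting the STFT would give the harmonic or percussive signal.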

240 citations

Proceedings Article•

Bayesian Nonparametric Matrix Factorization for Recorded Music

David M. Blei¹, Perry R. Cook¹, Matthew D. Hoffman¹•Institutions (1)

Princeton University¹

21 Jun 2010

TL;DR: This work develops Gamma Process Nonnegative Matrix Factorization (GaP-NMF), a Bayesian nonparametric approach to decomposing spectrograms, derives a mean-field variational inference algorithm, and evaluates GaP-NMF on both synthetic data and recorded music.

Abstract: Recent research in machine learning has focused on breaking audio spectrograms into separate sources of sound using latent variable decompositions. These methods require that the number of sources be specified in advance, which is not always possible. To address this problem, we develop Gamma Process Nonnegative Matrix Factorization (GaP-NMF), a Bayesian nonparametric approach to decomposing spectrograms. The assumptions behind GaP-NMF are based on research in signal processing regarding the expected distributions of spectrogram data, and GaP-NMF automatically discovers the number of latent sources. We derive a mean-field variational inference algorithm and evaluate GaP-NMF on both synthetic data and recorded music.

160 citations

Fast signal reconstruction from magnitude STFT spectrogram based on spectrogram consistency

Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono¹, Shigeki Sagayama¹•Institutions (1)

University of Tokyo¹

01 Jan 2010

TL;DR: In this paper, the authors present recent theoretical and experimental developments on the application to signal reconstruction from a modified magnitude spectrogram of the constraints that an array of complex numbers must verify to be a consistent short-time Fourier transform (STFT) spectrogram.

Abstract: The modification of magnitude spectrograms is at the core of many audio signal processing methods, from source separation to sound modification or noise canceling, and reconstructing a natural sounding signal in such situations is thus a very important issue. This article presents recent theoretical and experimental developments on the application to signal reconstruction from a modified magnitude spectrogram of the constraints that an array of complex numbers must verify to be a consistent short-time Fourier transform (STFT) spectrogram, i.e., to be the STFT spectrogram of an actual real-valued signal. We give here further theoretical insights, present several potential variations on our previously introduced algorithm, investigate various techniques to speed up the signal reconstruction process, and present a thorough experimental comparison of the performance of all the considered algorithms.
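The notion of consistency used in the abstract can be made concrete with a small numerical check: an array is consistent when taking its inverse STFT and then the STFT again returns the same array. The sketch below uses SciPy's `stft` and `istft` and is not the authors' fast reconstruction algorithm. The sampling rate, window length, overlap, function name and test signal are illustrative assumptions, picked so that the round trip keeps the array shape unchanged.

```python
import numpy as np
from scipy.signal import istft, stft

FS, NPERSEG, NOVERLAP = 16000, 256, 128  # illustrative analysis settings

def inconsistency(X):
    """Relative energy of X - STFT(iSTFT(X)); close to zero for a consistent array."""
    _, x = istft(X, fs=FS, nperseg=NPERSEG, noverlap=NOVERLAP)
    _, _, X_proj = stft(x, fs=FS, nperseg=NPERSEG, noverlap=NOVERLAP)
    return float(np.sum(np.abs(X - X_proj) ** 2) / np.sum(np.abs(X) ** 2))

rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)

# The STFT of an actual real-valued signal is consistent by construction.
_, _, X_true = stft(signal, fs=FS, nperseg=NPERSEG, noverlap=NOVERLAP)

# An arbitrary complex array of the same shape generally is not.
X_rand = rng.standard_normal(X_true.shape) + 1j * rng.standard_normal(X_true.shape)

r_true = inconsistency(X_true)
r_rand = inconsistency(X_rand)
```

Reconstruction methods of this family search for a phase that, combined with the given magnitude, makes this residual small.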

80 citations

Journal Article•DOI•

A survey of spectrogram track detection algorithms

Thomas Lampert¹, Simon O'Keefe¹•Institutions (1)

University of York¹

01 Feb 2010-Applied Acoustics

TL;DR: An extensive survey and an algorithm taxonomy is presented and each algorithm is reviewed according to a set of criteria relating to their success in application, concluding that none of these algorithms fully meets these criteria.

76 citations

Journal Article•DOI•

Estimating Multiple Frequency-Hopping Signal Parameters via Sparse Linear Regression

Daniele Angelosante¹, Georgios B. Giannakis¹, Nicholas D. Sidiropoulos²•Institutions (2)

University of Minnesota¹, University of Crete²

01 Oct 2010-IEEE Transactions on Signal Processing

TL;DR: A novel approach based on sparse linear regression (SLR) is developed, formulated as one of under-determined linear regression with a dual sparsity penalty, and its exact solution is obtained using the alternating direction method of multipliers (ADMoM).

Abstract: Frequency hopping (FH) signals have well-documented merits for commercial and military applications due to their near-far resistance and robustness to jamming. Estimating FH signal parameters (e.g., hopping instants, carriers, and amplitudes) is an important and challenging task, but optimum estimation incurs an unrealistic computational burden. The spectrogram has long been the starting non-parametric estimator in this context, followed by line spectra refinements. The problem is that hop timing estimates derived from the spectrogram are coarse and unreliable, thus severely limiting performance. A novel approach is developed in this paper, based on sparse linear regression (SLR). Using a dense frequency grid, the problem is formulated as one of under-determined linear regression with a dual sparsity penalty, and its exact solution is obtained using the alternating direction method of multipliers (ADMoM). The SLR-based approach is further broadened to encompass polynomial-phase hopping (PPH) signals, encountered in chirp spread spectrum modulation. Simulations demonstrate that the developed estimator outperforms spectrogram-based alternatives, especially with regard to hop timing estimation, which is the crux of the problem.

75 citations

Journal Article•DOI•

Ultrafast insulator-to-metal phase transition as a switch to measure the spectrogram of a supercontinuum light pulse

Federico Cilento¹,², Claudio Giannetti³, Gabriele Ferrini³, Stefano Dal Conte⁴, Tommaso Sala³, Giacomo Coslovich¹,², M. Rini⁵, Andrea Cavalleri⁶,⁷, Fulvio Parmigiani²•Institutions (7)

AREA Science Park¹, University of Trieste², Catholic University of the Sacred Heart³, University of Pavia⁴, Lawrence Berkeley National Laboratory⁵, University of Hamburg⁶, University of Oxford⁷

11 Jan 2010-Applied Physics Letters

TL;DR: This letter demonstrates that the temporal and spectral structure (spectrogram) of a complex light pulse can be determined by exploiting the ultrafast switching character of a nonthermal photoinduced phase transition, here in a VO2 multifilm excited by femtosecond near-infrared laser pulses.

Abstract: In this letter we demonstrate the possibility to determine the temporal and spectral structure (spectrogram) of a complex light pulse exploiting the ultrafast switching character of a nonthermal photoinduced phase transition. As a proof, we use a VO2 multifilm, undergoing an ultrafast insulator-to-metal phase transition when excited by femtosecond near-infrared laser pulses. The abrupt variation in the multifilm optical properties, over a broad infrared/visible frequency range, is exploited to determine, in situ and in a simple way, the spectrogram of a supercontinuum pulse produced by a photonic crystal fiber. The determination of the structure of the pulse is mandatory to develop pump-probe experiments with frequency resolution over a broad spectral range (700–1100 nm).

66 citations

Book Chapter•DOI•

Notes on nonnegative tensor factorization of the spectrogram for audio source separation: statistical insights and towards self-clustering of the spatial cues

Cédric Févotte¹, Alexey Ozerov²•Institutions (2)

Télécom ParisTech¹, French Institute for Research in Computer Science and Automation²

21 Jun 2010

TL;DR: It is shown that nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure implicitly assumes a nonpoint-source model, in contrast with usual BSS assumptions, and the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources are clarified.

Abstract: Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by Fitzgerald et al. as a means of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model contrasting with usual BSS assumptions and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of Fitzgerald et al. requires a posterior clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying nonpoint-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.

60 citations

Journal Article•DOI•

Sensitive White Space Detection with Spectral Covariance Sensing

Jaeweon Kim¹, Jeffrey G. Andrews¹•Institutions (1)

University of Texas at Austin¹

01 Sep 2010-IEEE Transactions on Wireless Communications

TL;DR: It is shown that SCS is highly robust to noise uncertainty, whereas many other spectrum sensors are not, and that it improves sensitivity by 3 dB for the same dwell time, which is a very significant improvement for this application.

Abstract: This paper proposes a novel, highly effective spectrum sensing algorithm for cognitive radio and white space applications. The proposed spectral covariance sensing (SCS) algorithm exploits the different statistical correlations of the received signal and noise in the frequency domain. Test statistics are computed from the covariance matrix of a partial spectrogram and compared with a decision threshold to determine whether a primary signal of arbitrary type is present or not. This detector is analyzed theoretically and verified through realistic open-source simulations using actual digital television signals captured in the US. Compared to the state of the art in the literature, SCS improves sensitivity by 3 dB for the same dwell time, which is a very significant improvement for this application. Further, it is shown that SCS is highly robust to noise uncertainty, whereas many other spectrum sensors are not.

Proceedings Article•DOI•

NMF with time-frequency activations to model non stationary audio events

Romain Hennequin¹, Roland Badeau¹, Bertrand David¹•Institutions (1)

Télécom ParisTech¹

14 Mar 2010

TL;DR: An extension of non-negative matrix factorization where the temporal activations become frequency dependent and follow a time-varying autoregressive moving average (ARMA) modeling leads to an efficient single-atom decomposition for a single audio event with strong spectral variation (but with constant pitch).

Abstract: Real world sounds often exhibit non-stationary spectral characteristics such as those produced by a harpsichord or a guitar. The classical Non-negative Matrix Factorization (NMF) needs a number of atoms to accurately decompose the spectrogram of such sounds. An extension of NMF is proposed hereafter which includes time-frequency activations based on ARMA modeling. This leads to an efficient single-atom decomposition for a single audio event. The new algorithm is tested on real audio data and shows promising results.

Book Chapter•DOI•

Nonnegative matrix factorization with Markov-Chained bases for modeling time-varying patterns in music spectrograms

Masahiro Nakano¹, Jonathan Le Roux², Hirokazu Kameoka², Yu Kitano¹, Nobutaka Ono¹, Shigeki Sagayama¹•Institutions (2)

University of Tokyo¹, Nippon Telegraph and Telephone²

27 Sep 2010

TL;DR: In this paper, a sparse representation for polyphonic music signals is presented, which is an extension of nonnegative matrix factorization (NMF) for learning the time-varying spectral patterns of musical instruments, such as attack of the piano or vibrato of the violin, without any prior information.

Abstract: This paper presents a new sparse representation for polyphonic music signals. The goal is to learn the time-varying spectral patterns of musical instruments, such as attack of the piano or vibrato of the violin in polyphonic music signals without any prior information. We model the spectrogram of music signals under the assumption that they are composed of a limited number of components which are composed of Markov-chained spectral patterns. The proposed model is an extension of nonnegative matrix factorization (NMF). An efficient algorithm is derived based on the auxiliary function method.

Book Chapter•DOI•

Consistent wiener filtering: generalized time-frequency masking respecting spectrogram consistency

Jonathan Le Roux¹, Emmanuel Vincent², Yuu Mizuno³, Hirokazu Kameoka¹, Nobutaka Ono³, Shigeki Sagayama³•Institutions (3)

Nippon Telegraph and Telephone¹, French Institute for Research in Computer Science and Automation², University of Tokyo³

27 Sep 2010

TL;DR: In this article, the authors generalize the concept of Wiener filtering to time-frequency masks which can involve manipulation of the phase as well by formulating the problem as a consistency-constrained Maximum-Likelihood one.

Abstract: Wiener filtering is one of the most widely used methods in audio source separation. It is often applied on time-frequency representations of signals, such as the short-time Fourier transform (STFT), to exploit their short-term stationarity, but so far the design of the Wiener time-frequency mask did not take into account the necessity for the output spectrograms to be consistent, i.e., to correspond to the STFT of a time-domain signal. In this paper, we generalize the concept of Wiener filtering to time-frequency masks which can involve manipulation of the phase as well by formulating the problem as a consistency-constrained Maximum-Likelihood one. We present two methods to solve the problem, one looking for the optimal time-domain signal, the other promoting consistency through a penalty function directly in the time-frequency domain. We show through experimental evaluation that, both in oracle conditions and combined with spectral subtraction, our method outperforms classical Wiener filtering.

Journal Article•DOI•

Performance Analysis of different Filters for Power Line Interface Reduction in ECG Signal

Yatindra Kumar, Gorav Kumar Malik

06 Oct 2010-International Journal of Computer Applications

TL;DR: The results clearly indicate that the reduction in power line noise in the ECG signal depends on the filter, and that the adaptive filter gives the best result.

Abstract: Computer-aided analysis of the ECG signal has been gaining ground over the years, with a tremendous amount of work being carried out all over the world. This paper is a small step in that direction. The electrocardiogram (ECG) is the most commonly known, recognized and used biomedical signal. It is very sensitive in nature: even a small amount of noise mixed with the original signal changes its characteristics. Data corrupted with noise must be either filtered or discarded, so filtering is an important design consideration for real-time heart monitoring systems. The purpose of this paper is to quantify the relative performance of different filtering methods for power line interference reduction. The database for the performance analysis is created by simulating an ECG signal, since an ideal ECG signal is best suited to performance analysis, and is then corrupted with 50 Hz power line interference. The ability of different filters (IIR notch, Wiener and adaptive filters) is assessed through changes in the filtered signal, the signal-to-noise ratio, the power of the signal, the power spectral density and the spectrogram of the signal. The locations and amplitudes of the peaks are also measured with the Pan-Tompkins algorithm. The results clearly indicate that the reduction in power line noise in the ECG signal depends on the filter, and that the adaptive filter gives the best result, which is easy to see in the spectrogram. The results were obtained using MATLAB and a simulated ECG database.

Proceedings Article•DOI•

Synthesizing speech from Doppler signals

Arthur R. Toth¹, Kaustubh Kalgaonkar², Bhiksha Raj¹, Tony Ezzat²•Institutions (2)

Carnegie Mellon University¹, Georgia Institute of Technology²

14 Mar 2010

TL;DR: A new device for synthesizing speech from characterizations of facial motion associated with speech - a Doppler sonar that is able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs.

Abstract: It has long been considered a desirable goal to be able to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past methods at performing this have been based on camera-based observations of the talker's face, combined with statistical methods that infer the speech signal from the facial motion captured by the camera. Other methods have included synthesis of speech from measurements taken by electro-myelo graphs and other devices that are tethered to the talker - an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of facial motion associated with speech - a Doppler sonar. Facial movement is characterized through Doppler frequency shifts in a tone that is incident on the talker's face. These frequency shifts are used to infer the underlying speech signal. The setup is farfield and untethered, with the sonar acting from the distance of a regular desktop microphone. Preliminary experimental evaluations show that the mechanism is very promising - we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs.

Journal Article•DOI•

A Quantitative Technique to Compare and Classify Humpback Whale (Megaptera novaeangliae) Sounds

Denis Chabot¹•Institutions (1)

Memorial University of Newfoundland¹

26 Apr 2010-Ethology

TL;DR: In an attempt to minimize observer bias, numerical taxonomy methods were used to describe and classify humpback whale sounds to make studies of animal communication performed by different researchers or on different species more easily comparable.

Abstract: In an attempt to minimize observer bias, numerical taxonomy methods were used to describe and classify humpback whale sounds. The spectrograms (N = 1255) were digitized into a 16 × 21 binary matrix. The rows were 16 frequencies selected on a logarithmic scale (0.12–8 kHz). The columns were 21 time samples taken every 0.1 s. Each point of the matrix was coded 1 if it lay over part of the sound. Other binary variables were added to code for relative intensity within a sound, frequency modulation and amplitude modulation. The sounds were then compared using the Jaccard similarity coefficient for binary data, and classified with average linkage cluster analysis. This technique produced 115 clusters, which were compared with my aural and visual impressions of the sounds. I agreed with most major categories identified by cluster analysis, but many small clusters had to be fused to other categories. This was partially due to the technique used, and to the complexity of the repertoire under study. Improvements are proposed to further reduce observer bias in classification of sounds, and thus make studies of animal communication performed by different researchers or on different species more easily comparable.
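The comparison and clustering steps described in the abstract can be sketched with SciPy as below. The 16 × 21 grid size comes from the abstract; everything else is an assumption for illustration, including the function names, the synthetic toy sounds and the choice to cut the tree at a fixed number of clusters. The extra binary variables for intensity and modulation are left out.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_binary_sounds(matrices, n_clusters):
    """Average-linkage clustering of binary spectrogram grids by Jaccard distance."""
    X = np.array([np.asarray(m, dtype=bool).ravel() for m in matrices])
    d = pdist(X, metric="jaccard")     # 1 - |A and B| / |A or B| for each pair
    Z = linkage(d, method="average")   # average linkage on the condensed distances
    return fcluster(Z, t=n_clusters, criterion="maxclust")

def toy_sound(row_lo, row_hi, gap_col):
    """Hypothetical 16 x 21 grid: a band of active frequency rows with one missing cell."""
    grid = np.zeros((16, 21), dtype=bool)
    grid[row_lo:row_hi, :] = True
    grid[row_lo, gap_col] = False
    return grid

# Two low-band sounds and two high-band sounds.
sounds = [toy_sound(2, 5, 0), toy_sound(2, 5, 7),
          toy_sound(10, 13, 3), toy_sound(10, 13, 9)]
labels = cluster_binary_sounds(sounds, n_clusters=2)
```

In this toy case the two low-band grids share one label and the two high-band grids share the other.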

Book Chapter•DOI•

Harmonic and Percussive Sound Separation and Its Application to MIR-Related Tasks

Nobutaka Ono¹, Kenichi Miyamoto¹, Hirokazu Kameoka¹, Jonathan Le Roux¹, Yuuki Uchiyama¹, Emiru Tsunoo¹, Takuya Nishimoto¹, Shigeki Sagayama¹•Institutions (1)

University of Tokyo¹

01 Jan 2010

TL;DR: A simple and fast method to separate a monaural audio signal into harmonic and percussive components, which leads to a useful pre-processing for MIR-related tasks and the application of the proposed technique to automatic chord recognition and rhythm-pattern extraction.

Abstract: In this chapter, we present a simple and fast method to separate a monaural audio signal into harmonic and percussive components, which leads to a useful pre-processing for MIR-related tasks. Exploiting the anisotropies of the power spectrograms of harmonic and percussive components, we define objective functions based on spectrogram gradients, and, applying to them the auxiliary function approach, we derive simple and fast update equations which guarantee the decrease of the objective function at each iteration. We show experimental results for sound separation on popular and jazz music pieces, and also present the application of the proposed technique to automatic chord recognition and rhythm-pattern extraction.

Journal Article•DOI•

Combining Auditory Preprocessing and Bayesian Estimation for Robust Formant Tracking

Claudius Gläser¹, Martin Heckmann¹, Frank Joublin¹, Christian Goerick¹•Institutions (1)

Honda¹

01 Feb 2010-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: This approach combines a preprocessing based on functional principles of the human auditory system and a probabilistic tracking scheme with an algorithm for adaptive frequency range segmentation as well as Bayesian smoothing to derive an efficient framework for estimating formant trajectories.

Abstract: We present a framework for estimating formant trajectories. Its focus is to achieve high robustness in noisy environments. Our approach combines a preprocessing based on functional principles of the human auditory system and a probabilistic tracking scheme. For enhancing the formant structure in spectrograms we use a Gammatone filterbank, a spectral preemphasis, as well as a spectral filtering using difference-of-Gaussians (DoG) operators. Finally, a contrast enhancement mimicking a competition between filter responses is applied. The probabilistic tracking scheme adopts the mixture modeling technique for estimating the joint distribution of formants. In conjunction with an algorithm for adaptive frequency range segmentation as well as Bayesian smoothing an efficient framework for estimating formant trajectories is derived. Comprehensive evaluations of our method on the VTR-formant database emphasize its high precision and robustness. We obtained superior performance compared to existing approaches for clean as well as echoic noisy speech. Finally, an implementation of the framework within the scope of an online system using instantaneous feature-based resynthesis demonstrates its applicability to real-world scenarios.

Proceedings Article•DOI•

Classification of EEG correlates on emotion using features from Gaussian mixtures of EEG spectrogram

Reza Khosrowabadi¹, Abdul Wahab Abdul Rahman²•Institutions (2)

Nanyang Technological University¹, International Islamic University, Islamabad²

01 Dec 2010

TL;DR: The results showed that the proposed feature extraction using Gaussian mixtures of EEG spectrogram yielded better classification results using the KNN classifier.

Abstract: This paper presents the classification of EEG correlates on emotion using features extracted by Gaussian mixtures of EEG spectrogram. This method is compared with three feature extraction methods based on fractal dimension of EEG signal including Higuchi, Minkowski Bouligand, and Fractional Brownian motion. The K nearest neighbor and Support Vector Machine are applied to classify extracted features. The 4 emotional states investigated in this paper are defined using the valence-arousal plane: two valence states (positive and negative) and two arousal states (calm, excited). The accuracy of system to classify 4 emotional states is investigated on EEG collected from 26 subjects (20 to 32 years old) while exposed to emotionally-related visual and audio stimuli. The results showed that the proposed feature extraction using Gaussian mixtures of EEG spectrogram yielded better classification results using the KNN classifier.

Journal Article•DOI•

Modeling complex phenotypes: generalized linear models using spectrogram predictors of animal communication signals.

Scott H. Holan¹, Christopher K. Wikle¹, Laura Sullivan-Beckers², Reginald B. Cocroft¹•Institutions (2)

University of Missouri¹, University of Nebraska–Lincoln²

01 Sep 2010-Biometrics

TL;DR: The model developed characterizes key aspects of the acoustic signal that influence sexual selection while alleviating the need to extract higher‐level signal traits a priori.

Abstract: A major goal of evolutionary biology is to understand the dynamics of natural selection within populations. The strength and direction of selection can be described by regressing relative fitness measurements on organismal traits of ecological significance. However, many important evolutionary characteristics of organisms are complex, and have correspondingly complex relationships to fitness. Secondary sexual characteristics such as mating displays are prime examples of complex traits with important consequences for reproductive success. Typically, researchers atomize sexual traits such as mating signals into a set of measurements including pitch and duration, in order to include them in a statistical analysis. However, these researcher-defined measurements are unlikely to capture all of the relevant phenotypic variation, especially when the sources of selection are incompletely known. In order to accommodate this complexity we propose a Bayesian dimension-reduced spectrogram generalized linear model that directly incorporates representations of the entire phenotype (one-dimensional acoustic signal) into the model as a predictor while accounting for multiple sources of uncertainty. The first stage of dimension reduction is achieved by treating the spectrogram as an "image" and finding its corresponding empirical orthogonal functions. Subsequently, further dimension reduction is accomplished through model selection using stochastic search variable selection. Thus, the model we develop characterizes key aspects of the acoustic signal that influence sexual selection while alleviating the need to extract higher-level signal traits a priori. This facet of our approach is fundamental and has the potential to provide additional biological insight, as is illustrated in our analysis.
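
The first dimension-reduction stage described in this abstract (treating each spectrogram as an "image" and finding empirical orthogonal functions) can be sketched as a principal component decomposition. This is a hedged illustration, not the authors' code: it assumes EOFs are computed by an SVD of the mean-centered, vectorized spectrograms, the data are synthetic, and the function name `spectrogram_eofs` is hypothetical. The Bayesian GLM and stochastic search variable selection stages are not shown.

```python
import numpy as np

def spectrogram_eofs(spectrograms, n_components):
    """Return leading EOFs and per-signal scores for a stack of spectrograms.

    spectrograms: array of shape (n_signals, n_freq, n_time).
    """
    n_signals = spectrograms.shape[0]
    X = spectrograms.reshape(n_signals, -1)       # one vectorized "image" per row
    Xc = X - X.mean(axis=0)                       # center across signals
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eofs = Vt[:n_components]                      # orthonormal spatial patterns
    scores = Xc @ eofs.T                          # low-dimensional predictors
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return eofs, scores, explained

# Synthetic stand-in data: each "spectrogram" mixes two fixed patterns plus noise.
rng = np.random.default_rng(0)
n_signals, n_freq, n_time = 30, 16, 20
pattern_a = rng.random((n_freq, n_time))
pattern_b = rng.random((n_freq, n_time))
weights = rng.random((n_signals, 2))
specs = (
    weights[:, 0, None, None] * pattern_a
    + weights[:, 1, None, None] * pattern_b
    + 0.01 * rng.normal(size=(n_signals, n_freq, n_time))
)

eofs, scores, explained = spectrogram_eofs(specs, n_components=2)
```

The `scores` matrix (one row per signal, one column per retained EOF) is what would enter a regression as predictors in place of researcher-defined traits such as pitch or duration.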

Proceedings Article•DOI•

2D THz signature for substance identification

Vyacheslav A. Trofimov¹, Svetlana A. Varentsova¹•Institutions (1)

Moscow State University¹

23 Apr 2010-Proceedings of SPIE

TL;DR: The method of identification is based on the analysis of spectrum dynamics of medium response and has the ability not only to detect the presence of the substance in the sample but to identify it by its 2D signature, which is unique for each investigated substance.

Abstract: A method is developed for obtaining a unique 2D signature of a substance for its identification in the THz frequency range. It is applied to signals passed through ordinary materials or selected explosives, including those hidden under opaque simulant covers. The method of identification is based on the analysis of the spectrum dynamics (spectrogram) of the medium response and can not only detect the presence of a substance in the sample but also identify it by its 2D signature, which is unique for each investigated substance. It allows the dynamics of many spectral lines to be traced simultaneously in one set of measurements, yielding full information about the spectrum dynamics of the measured signal. We show that spectrograms of THz pulses passed through explosives hidden under simulant covers differ widely from spectrograms of the simulants themselves, despite only small differences in their Fourier spectra. Therefore, the method allows hidden substances to be detected and identified with high probability and can be very effective for defense and security applications. The problem of detecting a noisy regular acoustic signal with linear frequency modulation is also examined.

A procedure of vibration analysis from planetary gearbox under non-stationary cyclic operations by instantaneous frequency estimation in time-frequency domain

Radoslaw Zimroz¹, Fabien Millioz, Nadine Martin•Institutions (1)

Wrocław University of Technology¹

22 Jun 2010

TL;DR: In this article, a new approach is proposed and exploited for complex, multistage gearboxes with a planetary stage; to extract information related to cyclic load variation, an instantaneous speed obtained via a time-frequency spectrogram is used.

Abstract: Condition monitoring of gearboxes via vibration analysis is a well-recognized approach in the scientific literature and in engineering practice. However, in many cases the machine works under non-stationary operating conditions (load and speed variation), which often requires special signal processing and pattern recognition suitable for time-varying systems. One of the key problems is to identify the variation of external load or speed. Measuring the current consumed by the electric motor, or the instantaneous speed obtained by processing a tachometer signal, may be difficult or impossible in many practical (industrial) situations. In such cases, non-stationary load variation may be identified by extracting information hidden in the vibration signal, for example by amplitude or frequency demodulation. Unfortunately, both approaches are difficult (or even impossible) for our machines due to the complexity of the design and the wide range of load/speed variation. To avoid these constraints, this paper proposes and exploits a new approach for complex, multistage gearboxes with a planetary stage. To extract information related to cyclic load variation, an instantaneous speed obtained via a time-frequency spectrogram is used. Algorithms for Instantaneous Frequency (IF) estimation via T-F maps were initially developed by Millioz and Martin. This paper proposes a novel procedure for Instantaneous Speed estimation (based on IF identification by the mentioned automatic algorithm) and then applies it to vibration signals from planetary gearboxes.

Proceedings Article•DOI•

Latent-variable decomposition based dereverberation of monaural and multi-channel signals

Rita Singh¹, Bhiksha Raj², Paris Smaragdis³•Institutions (3)

Carnegie Mellon University¹, Disney Research², Adobe Systems³

14 Mar 2010

TL;DR: Experimental evaluations show that the proposed algorithm is able to greatly reduce the reverberation effects in even highly reverberant signals captured in auditoria and other open spaces.

Abstract: We present an algorithm to dereverberate single- and multi-channel audio recordings. The proposed algorithm models the magnitude spectrograms of clean audio signals as histograms drawn from a multinomial process. Spectrograms of reverberated signals are obtained as histograms of draws from the PDF of the sum of two random variables, one representing the spectrogram of clean speech and the second the frequency decomposition of the room response. The spectrogram of the clean signal is computed as a maximum-likelihood estimate from the spectrogram of reverberant speech using an EM algorithm. Experimental evaluations show that the proposed algorithm is able to greatly reduce the reverberation effects in even highly reverberant signals captured in auditoria and other open spaces.

Journal Article•DOI•

Vector Brillouin optical time-domain analyzer for high-order acoustic modes.

Michel Dossou¹, Denis Bacquet¹, Pascal Szriftgiser¹•Institutions (1)

University of Lille¹

15 Nov 2010-Optics Letters

TL;DR: A vector Brillouin optical time-domain analyzer that has a high immunity level to noise, and it features a phase spectrogram capability, well suited for complex situations involving several acoustic resonances, such as high-order longitudinal modes.

Abstract: Thanks to a double-frequency phase modulation scheme, we report a vector Brillouin optical time-domain analyzer (BOTDA). This BOTDA has a high immunity level to noise, and it features a phase spectrogram capability. It is well suited for complex situations involving several acoustic resonances, such as high-order longitudinal modes. It has notably been used to characterize a dispersion-shifted fiber, allowing us to report spectrograms with multiple acoustic resonances. A very high 57 dB dynamic range is also reported for 100-ns-long pulses simultaneously with a 16 cm numerical resolution.

Journal Article•

Object-Based Audio Coding Using Non-Negative Matrix Factorization for the Spectrogram Representation

Joonas Nikunen, Tuomas Virtanen

01 May 2010-Journal of the Audio Engineering Society

Journal Article•DOI•

Time-frequency imaging algorithm for high-speed spinning targets in two dimensions

Li Jinwei¹, Cheng-Wei Qiu², Lei Zhang¹, Mengdao Xing³, Zheng Bao¹, Tat Soon Yeo²•Institutions (3)

Xidian University¹, National University of Singapore², Chinese Academy of Sciences³

09 Sep 2010-IET Radar, Sonar & Navigation

TL;DR: A novel coherent spectrogram redistribution method, coherent single range Doppler interferometry (CSRDI), is proposed, which is capable of generating high-resolution imagery by applying a phase matched processing and performs well at low signal-to-noise ratio.

Abstract: This study focuses on the narrow-band radar imaging for high-speed spinning targets. Based on the time-frequency characteristic of the echoed signal, a novel coherent spectrogram redistribution method, coherent single range Doppler interferometry (CSRDI), is proposed, which is capable of generating high-resolution imagery by applying a phase matched processing. Furthermore, the approach performs well at low signal-to-noise ratio. The spinning rate error is also taken into consideration and an estimation approach based on the focal entropy is proposed. The validity is confirmed by real data and numerical simulations.

Journal Article•DOI•

High resolution spectrograms using a component optimized short-term fractional Fourier transform

Aled T. Catherall¹, Duncan P. Williams¹•Institutions (1)

Defence Science and Technology Laboratory¹

01 May 2010-Signal Processing

TL;DR: An algorithm based on a short-term representation of the fractional Fourier transform is presented; it is highly suited to signals that contain multiple non-stationary components, such as a synthetic signal and a bat echolocation signal.

Journal Article•DOI•

Development of QRS Detection using Short-time Fourier Transform based Technique

Nopadol Uchaipichat, Sakonthawat Inban

20 Aug 2010-International Journal of Computer Applications

TL;DR: The short-time Fourier transform (STFT) was employed in the ECG filtering stage, and a narrow rectangular window was used to transform ECG signals into the time-frequency domain for QRS complex detection.

Abstract: This paper reports our study of QRS complex detection. The short-time Fourier transform (STFT) was employed in the ECG filtering stage. A narrow rectangular window was used to transform ECG signals into the time-frequency domain. The temporal information at 45 Hz from the spectrogram was analyzed to detect QRS locations. An automated thresholding method combined with local maxima finding was modified to find the QRS locations. The data used in this study are from the MIT-BIH Arrhythmia database. The proposed technique achieved a detection rate better than 99%, and the fail ratio was 1.3%.
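
The pipeline outlined in this abstract (rectangular-window STFT, the 45 Hz track, thresholding plus local maxima) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: the window length, hop, fixed relative threshold, and refractory period are invented for the example, the input is a synthetic spike train rather than MIT-BIH data, and the names `band_energy_track` and `detect_peaks` are hypothetical.

```python
import numpy as np

def band_energy_track(x, fs, target_hz=45.0, win_len=32, hop=4):
    """Magnitude over time of the STFT bin nearest target_hz (rectangular window)."""
    n_frames = 1 + (len(x) - win_len) // hop
    bin_idx = int(round(target_hz * win_len / fs))
    track = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + win_len]    # rectangular window: no tapering
        track[i] = np.abs(np.fft.rfft(frame)[bin_idx])
    times = (np.arange(n_frames) * hop + win_len / 2) / fs   # frame centers in seconds
    return times, track

def detect_peaks(times, track, rel_threshold=0.5, refractory_s=0.2):
    """Local maxima above a relative threshold, at most one per refractory period.

    rel_threshold and refractory_s are assumptions of this sketch.
    """
    thr = rel_threshold * track.max()
    peaks = []
    for i in range(1, len(track) - 1):
        is_local_max = track[i] >= track[i - 1] and track[i] > track[i + 1]
        if is_local_max and track[i] > thr:
            if not peaks or times[i] - peaks[-1] > refractory_s:
                peaks.append(times[i])
    return np.array(peaks)

# Synthetic stand-in for an ECG: narrow pulses every 0.8 s plus slow baseline wander.
fs = 360                                   # MIT-BIH Arrhythmia records are sampled at 360 Hz
t = np.arange(5 * fs) / fs
beat_times = np.arange(0.5, 5.0, 0.8)
ecg = sum(np.exp(-0.5 * ((t - b) / 0.005) ** 2) for b in beat_times)
ecg = ecg + 0.2 * np.sin(2 * np.pi * 0.3 * t)

times, track = band_energy_track(ecg, fs)
detected = detect_peaks(times, track)
```

With a 32-sample window at 360 Hz the bins are 11.25 Hz apart, so 45 Hz falls exactly on bin 4. The slow baseline component contributes little at that bin, which is why a single high-frequency track can act as the filtering stage.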

Proceedings Article•

A super-resolution spectrogram using coupled PLCA.

Juhan Nam¹, Gautham J. Mysore¹, Joachim Ganseman², Kyogu Lee³, Jonathan S. Abel¹•Institutions (3)

Stanford University¹, University of Antwerp², Seoul National University³

26 Sep 2010

TL;DR: A novel method is presented that achieves high resolution simultaneously in both time and frequency, the “super-resolution spectrogram”, which can be particularly useful for speech as it can simultaneously resolve both glottal pulses and individual harmonics.

Abstract: The short-time Fourier transform (STFT) based spectrogram is commonly used to analyze the time-frequency content of a signal. Depending on window size, the STFT provides a trade-off between time and frequency resolutions. This paper presents a novel method that achieves high resolution simultaneously in both time and frequency. We extend Probabilistic Latent Component Analysis (PLCA) to jointly decompose two spectrograms, one with a high time resolution and one with a high frequency resolution. Using this decomposition, a new spectrogram, maintaining high resolution in both time and frequency, is constructed. Termed the “super-resolution spectrogram”, it can be particularly useful for speech as it can simultaneously resolve both glottal pulses and individual harmonics.
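
The window-size trade-off that motivates this paper can be made concrete with a small sketch. It only computes the two input spectrograms (one with a short window, one with a long window); the coupled PLCA decomposition itself is not shown. The sampling rate, window lengths, hop, Hann window, and test signal are assumptions for illustration, and `stft_magnitude` is a hypothetical helper.

```python
import numpy as np

def stft_magnitude(x, win_len, hop):
    """Magnitude STFT with a Hann window; returns shape (win_len // 2 + 1, n_frames)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 8000
t = np.arange(fs) / fs                                   # 1 second of signal
# Two tones only 40 Hz apart: resolvable only with fine frequency resolution.
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1040 * t)

short = stft_magnitude(x, win_len=64, hop=16)     # 8 ms window, 125 Hz bin spacing
long_ = stft_magnitude(x, win_len=1024, hop=16)   # 128 ms window, about 7.8 Hz bin spacing

for name, spec, n in (("short", short, 64), ("long", long_, 1024)):
    print(name, "bin spacing (Hz):", fs / n, "window (ms):", 1000 * n / fs, "shape:", spec.shape)
```

In the long-window spectrogram the two tones appear as separate peaks, while the short-window one merges them into a single bin but localizes events to within a few milliseconds. A pair of spectrograms like these is what the abstract describes decomposing jointly.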
