
Showing papers on "Spectrogram" published in 2008


Journal ArticleDOI
TL;DR: The authors focus on a fast feature-based approach to estimating human motion parameters for real-time applications, producing an animation that is a realistic look-alike of the person's real motion.
Abstract: Radar can be an extremely useful sensing technique to observe persons. It perceives persons behind walls or at great distances and in situations where persons have no or poor visibility. Human motion modulates the radar signal, which can be observed in the spectrogram of the received signal. Extraction of these movements enables the animation of a person in virtual reality. The authors focus on a fast feature-based approach to estimating human motion features for real-time applications. The human walking model of Boulic is used, which describes human motion with three parameters. Personification information is obtained by estimating the individual leg and torso parameters. These motion parameters can be estimated from the temporal maximum, minimum and centre velocity of the human motion distribution. Three methods are presented to extract these velocities. Additionally, we extract an independent human motion repetition frequency estimate based on velocity slices in the spectrogram. Kalman filters smooth the parameters and estimate the global Boulic parameters. These estimated parameters are input to the human model of Boulic, which forms the basis for animation. The methods are applied to real radar measurements. The animated person generated with the extracted parameters provides a realistic look-alike of the real motion of the person.
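The centre, maximum and minimum velocities referred to above can be illustrated with a short sketch. This is a hedged example, not the authors' implementation: the threshold, the STFT settings and the assumption of complex baseband CW-radar samples with a known carrier frequency fc are all illustrative.

```python
import numpy as np
from scipy.signal import spectrogram

C = 3e8  # speed of light, m/s

def doppler_velocity_envelopes(x, fs, fc, thresh_db=-30.0):
    """Estimate max, min and centre velocity per time frame from a CW-radar return.

    x         : complex baseband radar samples (assumed)
    fs        : sampling rate in Hz
    fc        : radar carrier frequency in Hz (assumed known)
    thresh_db : per-frame threshold below the frame maximum, in dB
    """
    f, t, S = spectrogram(x, fs=fs, nperseg=256, noverlap=192,
                          return_onesided=False, mode='magnitude')
    f = np.fft.fftshift(f)          # order Doppler bins from negative to positive
    S = np.fft.fftshift(S, axes=0)
    v = f * C / (2.0 * fc)          # Doppler frequency -> radial velocity

    v_max, v_min, v_ctr = [], [], []
    for k in range(S.shape[1]):
        col = S[:, k]
        mask = col >= col.max() * 10 ** (thresh_db / 20.0)   # bins above threshold
        v_max.append(v[mask].max())
        v_min.append(v[mask].min())
        v_ctr.append(np.sum(v * col) / np.sum(col))          # power-weighted centroid
    return t, np.array(v_max), np.array(v_min), np.array(v_ctr)
```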

178 citations


Journal ArticleDOI
TL;DR: A block thresholding estimation procedure is introduced, which adjusts all parameters adaptively to the signal properties by minimizing a Stein estimate of the risk.
Abstract: Removing noise from audio signals requires a nondiagonal processing of time-frequency coefficients to avoid producing "musical noise." State-of-the-art algorithms perform a parameterized filtering of spectrogram coefficients with empirically fixed parameters. A block thresholding estimation procedure is introduced, which adjusts all parameters adaptively to the signal properties by minimizing a Stein estimate of the risk. Numerical experiments demonstrate the performance and robustness of this procedure through objective and subjective evaluations.
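As a rough illustration of block-wise processing of spectrogram coefficients, the sketch below attenuates fixed-size blocks with an empirical Wiener-like gain. It is not the paper's method: the block size and threshold are fixed here and the noise level is assumed known, whereas the paper chooses these adaptively by minimizing a Stein estimate of the risk.

```python
import numpy as np
from scipy.signal import stft, istft

def block_threshold_denoise(x, fs, sigma, block=(2, 8), lam=2.0):
    """Attenuate STFT coefficients block by block (fixed block size, known noise std).

    Simplified illustration only: the paper instead selects block sizes and
    thresholds adaptively via a Stein risk estimate.
    """
    f, t, X = stft(x, fs=fs, nperseg=512)
    nf, nt = X.shape
    G = np.ones_like(X, dtype=float)
    bf, bt = block
    for i in range(0, nf, bf):
        for j in range(0, nt, bt):
            B = X[i:i+bf, j:j+bt]
            energy = np.mean(np.abs(B) ** 2)
            # empirical Wiener-like block gain: shrink blocks whose energy is
            # close to the assumed noise floor lam * sigma**2
            gain = max(0.0, 1.0 - lam * sigma ** 2 / (energy + 1e-12))
            G[i:i+bf, j:j+bt] = gain
    _, x_hat = istft(G * X, fs=fs, nperseg=512)
    return x_hat
```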

161 citations


Proceedings Article
01 Aug 2008
TL;DR: A simple and fast method to separate a monaural audio signal into harmonic and percussive components, which is useful for multi-pitch analysis, automatic music transcription, drum detection, modification of music, and so on.
Abstract: In this paper, we present a simple and fast method to separate a monaural audio signal into harmonic and percussive components, which is useful for multi-pitch analysis, automatic music transcription, drum detection, modification of music, and so on. Exploiting the differences between the spectrograms of harmonic and percussive components, the objective function is defined as a quadratic form of the spectrogram gradients. Applying the auxiliary function approach to it, simple and fast update equations are derived which guarantee a decrease of the objective function at each iteration. We show experimental results obtained by applying our method to popular and jazz songs.
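The anisotropy the abstract exploits (harmonic energy is smooth along time, percussive energy is smooth along frequency) can be illustrated with the widely used median-filtering variant below. This is a hedged sketch of the same idea, not the auxiliary-function update equations derived in the paper.

```python
import numpy as np
from scipy.signal import stft, istft
from scipy.ndimage import median_filter

def hpss_median(x, fs, kernel=17, power=2.0):
    """Split a mono signal into harmonic and percussive parts via median filtering.

    Note: this is the common median-filter approximation of the idea, not the
    iterative updates derived in the paper.
    """
    f, t, X = stft(x, fs=fs, nperseg=1024)
    S = np.abs(X)
    H = median_filter(S, size=(1, kernel))   # smooth along time -> harmonic
    P = median_filter(S, size=(kernel, 1))   # smooth along freq -> percussive
    # soft (Wiener-like) masks built from the two smoothed spectrograms
    mask_h = H ** power / (H ** power + P ** power + 1e-12)
    mask_p = 1.0 - mask_h
    _, xh = istft(mask_h * X, fs=fs, nperseg=1024)
    _, xp = istft(mask_p * X, fs=fs, nperseg=1024)
    return xh, xp
```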

136 citations


Journal ArticleDOI
TL;DR: Nonnegative matrix factorization is used to derive a novel description of the timbre of musical sounds: a spectrogram is factorized to provide a characteristic spectral basis, and the compression is shown to reduce the noise present in the data set, resulting in more stable classification models.
Abstract: Nonnegative matrix factorization (NMF) is used to derive a novel description of the timbre of musical sounds. Using NMF, a spectrogram is factorized to provide a characteristic spectral basis. Given a set of spectrograms for a musical genre, the space spanned by the vectors of the obtained spectral bases is modeled statistically using mixtures of Gaussians, resulting in a description of the spectral basis for this musical genre. This description is shown to improve classification results by up to 23.3% compared to MFCC-based models, while the compression performed by the factorization decreases training time significantly. Using a distance-based stability measure, this compression is shown to reduce the noise present in the data set, resulting in more stable classification models. In addition, we compare the mean squared errors of the approximation to a spectrogram using independent component analysis and nonnegative matrix factorization, showing the superiority of the latter approach.
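A minimal sketch of the factorization step, assuming scikit-learn and an arbitrary component count; the statistical modeling of the basis vectors with Gaussian mixtures is not shown.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import NMF

def spectral_basis(x, fs, n_components=16):
    """Factorize a magnitude spectrogram S ~ W H and return the spectral basis W.

    Columns of W are characteristic spectra; rows of H are their activations in time.
    """
    _, _, X = stft(x, fs=fs, nperseg=1024)
    S = np.abs(X)                                   # (freq, time), nonnegative
    model = NMF(n_components=n_components, init='nndsvda', max_iter=400)
    W = model.fit_transform(S)                      # (freq, n_components)
    H = model.components_                           # (n_components, time)
    return W, H
```

In this orientation the columns of W play the role of the characteristic spectral basis described in the abstract; modeling their distribution per genre is the subsequent step in the paper.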

116 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: The results show that it is quite feasible to recognize different human activities using micro-Doppler information, and that the trained ANN could produce high error rates when used to classify data measured with another sensor.
Abstract: An ANN has been proposed to classify human activities from their micro-Doppler signatures. Data were collected using a Doppler radar for 12 human subjects performing seven activities to construct the training data set. Six features from the Doppler signatures were captured in the spectrogram. Validation tests based on the features resulted in 82.7% and 87.8% classification accuracy for two different validation scenarios. This result shows that it is quite feasible to recognize different human activities using micro-Doppler information. Several issues still need to be addressed. In this study, we used measurement data for the training process. The features can be affected by the characteristics of the particular radar used, such as I-Q imbalance, polarization and Rx-Tx locations. Therefore, the trained ANN could produce high error rates when used to classify data measured with another sensor. Our study is only applicable when the human approaches the radar head-on; data from other aspect angles should be included in the testing. Also, we used a 3-second time window for feature extraction. If the human activity changes during the window duration, the classification error may increase. A method to extract features within a shorter time duration needs further research.

90 citations


Proceedings ArticleDOI
22 Jul 2008
TL;DR: Evaluation results indicate the high accuracy and effectiveness of the proposed implementation of a patient monitoring system that may be used for patient activity recognition and emergency treatment in case a patient or an elderly person falls.
Abstract: The paper presents an initial implementation of a patient monitoring system that may be used for patient activity recognition and emergency treatment in case a patient or an elderly person falls. Sensors equipped with accelerometers and microphones are attached to the body of the patients and transmit patient movement and sound data wirelessly to the monitoring unit. By applying the short-time Fourier transform (STFT) and spectrogram analysis to the sounds, detection of fall incidents is possible. The classification of the sound and movement data is performed using Support Vector Machines. Evaluation results indicate the high accuracy and effectiveness of the proposed implementation.
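A hedged sketch of the described chain (spectrogram features from sound clips, SVM classification); the feature set, STFT settings and labeling convention here are hypothetical, not those of the paper.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def clip_features(x, fs):
    """Summarize a short sound clip by simple spectrogram statistics (hypothetical feature set)."""
    f, t, S = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
    S = np.log(S + 1e-10)
    return np.concatenate([S.mean(axis=1), S.std(axis=1)])  # per-band level and spread

def train_fall_classifier(clips, labels, fs):
    """clips: list of 1-D sound arrays; labels: 1 for 'fall', 0 otherwise (assumed labeling)."""
    X = np.vstack([clip_features(c, fs) for c in clips])
    clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=10.0))
    clf.fit(X, labels)
    return clf
```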

90 citations


Journal ArticleDOI
TL;DR: In this paper, a general spectrogram/sonogram inversion algorithm called principal components generalized projections (PCGP) was proposed for frequency-resolved optical gating (FROG) measurements.
Abstract: Frequency-resolved optical gating (FROG) is a technique used to measure ultrafast laser pulses by optically producing a spectrogram, or FROG trace, of the measured pulse. While a great deal of information about the pulse can be gleaned from its FROG trace, quantitative pulse information must be obtained using an iterative two-dimensional phase retrieval algorithm. A general spectrogram/sonogram inversion algorithm called principal components generalized projections (PCGP) that can be applied to pulse measurement schemes, such as FROG, is reviewed. The algorithm is fast, robust, and can invert FROG traces in real time, making commercial pulse measurement systems based on FROG a reality. Measurement rates are no longer algorithm limited; they are data-acquisition limited. Also, because of some of its unique properties, the PCGP algorithm has found applications in measuring attosecond pulses and measuring telecommunications pulses. In addition, the PCGP structures the inversion and measurement process in a way that can allow new insights into convergence properties of spectrogram and sonogram inversion algorithms.

89 citations


Proceedings Article
01 Jan 2008
TL;DR: The constraints which a set of complex numbers must satisfy to be a consistent STFT spectrogram are derived and described, and it is shown how inconsistency can be used to develop a spectrogram-based audio encryption scheme.
Abstract: As many acoustic signal processing methods, for example for source separation or noise cancellation, operate in the magnitude spectrogram domain, the problem of reconstructing a perceptually good-sounding signal from a modified magnitude spectrogram, and more generally of understanding what makes a spectrogram consistent, is very important. In this article, we derive the constraints which a set of complex numbers must satisfy to be a consistent STFT spectrogram, i.e. the STFT spectrogram of a real signal, and describe how they lead to an objective function measuring the consistency of a set of complex numbers as a spectrogram. We then present a flexible phase reconstruction algorithm based on a local approximation of the consistency constraints, explain its relation to phase-coherence conditions devised as necessary for good perceptual sound quality, and derive a real-time time-scale modification algorithm based on sliding-block analysis. Finally, we show how inconsistency can be used to develop a spectrogram-based audio encryption scheme.
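The consistency idea can be illustrated compactly: a complex array is a consistent STFT spectrogram exactly when it is unchanged by inverse STFT followed by STFT. The sketch below measures the residual of that round trip; the STFT parameters are arbitrary and this is not the paper's objective function in its exact form.

```python
import numpy as np
from scipy.signal import stft, istft

def inconsistency(X, fs=16000, nperseg=512, noverlap=384):
    """Measure how far a complex array X is from being a consistent STFT spectrogram.

    A consistent spectrogram is a fixed point of ISTFT followed by STFT;
    the relative residual below is zero exactly in that case.
    """
    _, x = istft(X, fs=fs, nperseg=nperseg, noverlap=noverlap)
    _, _, Y = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    # trim to a common shape in case the round trip changes the frame count slightly
    nf = min(X.shape[0], Y.shape[0])
    nt = min(X.shape[1], Y.shape[1])
    diff = X[:nf, :nt] - Y[:nf, :nt]
    return np.linalg.norm(diff) / (np.linalg.norm(X[:nf, :nt]) + 1e-12)
```

Griffin-Lim-style phase reconstruction can be viewed as iteratively reducing exactly this kind of residual while keeping the target magnitude.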

87 citations


Proceedings Article
01 Jan 2008
TL;DR: A novel algorithm based on pitch estimation and nonnegative matrix factorization (NMF) that predicts the amount of noise in the vocal segments, which allows separating vocals and noise even when they overlap in time and frequency is proposed.
Abstract: This paper proposes a novel algorithm for separating vocals from polyphonic music accompaniment. Based on pitch estimation, the method first creates a binary mask indicating time-frequency segments in the magnitude spectrogram where harmonic content of the vocal signal is present. Second, nonnegative matrix factorization (NMF) is applied to the non-vocal segments of the spectrogram in order to learn a model for the accompaniment. NMF predicts the amount of noise in the vocal segments, which allows separating vocals and noise even when they overlap in time and frequency. Simulations with commercial and synthesized acoustic material show an average improvement of 1.3 dB and 1.8 dB, respectively, in comparison with a reference algorithm based on sinusoidal modeling; the perceptual quality of the separated vocals is also clearly improved. The method was also tested in aligning separated vocals with textual lyrics, where it produced better results than the reference method.

85 citations


Proceedings Article
01 Jan 2008
TL;DR: A real-time equalizer to control the volume balance of harmonic and percussive components in music signals without a priori knowledge of scores or included instruments is presented.
Abstract: In this paper, we present a real-time equalizer to control the volume balance of harmonic and percussive components in music signals without a priori knowledge of scores or included instruments. The harmonic and percussive components of music signals have very different structures in the power spectrogram domain: the former is horizontal, while the latter is vertical. Exploiting this anisotropy, our method separates input music signals into the two components based on a MAP estimation framework. We derive two kinds of algorithms, based on an I-divergence-based mixing model and a hard mixing model. Although they involve iterative update equations, we achieve real-time processing through a sliding analysis technique. The separated harmonic and percussive components are finally remixed at an arbitrary volume balance and played back. We show a prototype system implemented in a Windows environment.

81 citations


Posted Content
TL;DR: A simple audio classification algorithm based on treating sound spectrograms as texture images, inspired by an earlier visual classification scheme particularly efficient at classifying textures, achieves surprisingly good performance in musical instrument classification experiments.
Abstract: Time-frequency representations of audio signals often resemble texture images. This paper derives a simple audio classification algorithm based on treating sound spectrograms as texture images. The algorithm is inspired by an earlier visual classification scheme particularly efficient at classifying textures. While solely based on time-frequency texture features, the algorithm achieves surprisingly good performance in musical instrument classification experiments.

Proceedings ArticleDOI
18 May 2008
TL;DR: Results from the RS-AIC hardware implementation demonstrate successful reconstruction of signals that are sampled at half the Nyquist rate while maintaining up to a 51 dB signal-to-noise ratio (SNR), which is equivalent to an 8.5-bit-resolution analog-to-digital converter.
Abstract: In this paper, we successfully demonstrate the feasibility of a hardware implementation of a sub-Nyquist random-sampling-based analog-to-information converter (RS-AIC). The RS-AIC is based on the theory of information recovery from random samples, using an efficient information recovery algorithm to compute the spectrogram of the signal. Our RS-AIC enables sub-Nyquist acquisition and processing of wideband signals that are sparse in a local Fourier representation. Results from our RS-AIC hardware implementation demonstrate successful reconstruction of signals that are sampled at half the Nyquist rate while maintaining up to a 51 dB signal-to-noise ratio (SNR), which is equivalent to an 8.5-bit-resolution analog-to-digital converter.
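A hedged sketch of the recovery principle (not the paper's algorithm or hardware): a signal that is sparse in a Fourier representation is reconstructed from half of its Nyquist-rate samples, taken at random instants, using orthogonal matching pursuit over a cosine/sine dictionary. The tone frequencies, dictionary and sparsity level are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)

# A signal that is sparse in frequency: three on-grid tones.
N, fs = 1024, 1024.0
n = np.arange(N)
x = (np.cos(2 * np.pi * 50 * n / fs + 0.3) +
     0.7 * np.cos(2 * np.pi * 120 * n / fs + 0.4) +
     0.5 * np.cos(2 * np.pi * 333 * n / fs + 1.1))

# Keep only half of the Nyquist-rate samples, at random time instants.
keep = np.sort(rng.choice(N, size=N // 2, replace=False))
y = x[keep]

# Dictionary: cosines and sines at the DFT frequencies, evaluated at the kept instants.
k = np.arange(1, N // 2)
A = np.hstack([np.cos(2 * np.pi * np.outer(keep, k) / N),
               np.sin(2 * np.pi * np.outer(keep, k) / N)])

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=6, fit_intercept=False).fit(A, y)
coef = omp.coef_

# Reconstruct the full-rate signal from the recovered sparse coefficients.
A_full = np.hstack([np.cos(2 * np.pi * np.outer(n, k) / N),
                    np.sin(2 * np.pi * np.outer(n, k) / N)])
x_hat = A_full @ coef
snr_db = 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))
print(f"reconstruction SNR: {snr_db:.1f} dB")
```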

Journal ArticleDOI
TL;DR: This work proposes a novel approach where the noisy magnitude spectrum is recombined with a changed phase spectrum to produce a modified complex spectrum, which results in improved speech quality.
Abstract: Typical speech enhancement algorithms operate on the short-time magnitude spectrum, while keeping the short-time phase spectrum unchanged for synthesis. We propose a novel approach where the noisy magnitude spectrum is recombined with a changed phase spectrum to produce a modified complex spectrum. During synthesis, the low energy components of the modified complex spectrum cancel out more than the high energy components, thus reducing background noise. Using objective speech quality measures, informal subjective listening tests and spectrogram analysis, we show that the proposed method results in improved speech quality.
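A minimal sketch of the recombination step the abstract describes: keep the noisy magnitude, add an offset to the phase, and resynthesize. The particular phase change used in the paper (chosen so that low-energy components cancel during overlap-add) is not reproduced; phase_offset is a hypothetical input.

```python
import numpy as np
from scipy.signal import stft, istft

def recombine_magnitude_phase(x_noisy, fs, phase_offset):
    """Recombine the noisy magnitude spectrum with a modified phase spectrum and resynthesize.

    phase_offset: array broadcastable to the STFT shape, added to the noisy phase.
    Only the magnitude/phase recombination step is illustrated here; the specific
    offset design from the paper is not reproduced.
    """
    f, t, X = stft(x_noisy, fs=fs, nperseg=512)
    mag, phase = np.abs(X), np.angle(X)
    X_mod = mag * np.exp(1j * (phase + phase_offset))   # modified complex spectrum
    _, x_hat = istft(X_mod, fs=fs, nperseg=512)
    return x_hat
```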

Journal ArticleDOI
TL;DR: Results showed that the scalogram with the Morlet wavelet exhibited good performance for the sample linear FM signal and the sample hyperbolic FM signal in comparison to the spectrogram.
Abstract: Instantaneous frequency (IF) estimation through the estimation of peak locations in the time-frequency plane is an important approach for signals contaminated with additive white Gaussian noise. In this paper, the aforementioned analysis is carried out for the continuous wavelet transform. The analysis of the scalogram as an instantaneous frequency estimator is performed for any FM signal regardless of the mother wavelet. Accurate expressions for the bias and the variance of the estimator are derived and reveal that both are signal dependent. The results are statistically confirmed through numerical analysis for several mother wavelets; among the considered wavelets, the Morlet wavelet produces the smallest estimation error. Furthermore, the performance of the IF estimators based on the scalogram and the spectrogram was compared through analysis of the mean square error. These results showed that the scalogram with the Morlet wavelet performed well for the sample linear FM signal and the sample hyperbolic FM signal in comparison to the spectrogram.

Proceedings ArticleDOI
12 May 2008
TL;DR: A novel speech feature analysis technique based on localized spectro-temporal cepstral analysis of speech is presented that is more robust to noise and better captures the temporal modulations important for recognizing plosive sounds.
Abstract: Drawing on recent progress in auditory neuroscience, we present a novel speech feature analysis technique based on localized spectro-temporal cepstral analysis of speech. We proceed by extracting localized 2D patches from the spectrogram and projecting them onto a 2D discrete cosine transform (2D-DCT) basis. For each time frame, a speech feature vector is then formed by concatenating low-order 2D-DCT coefficients from the set of corresponding patches. We argue that our framework has significant advantages over standard one-dimensional MFCC features. In particular, we find that our features are more robust to noise and better capture the temporal modulations important for recognizing plosive sounds. We evaluate the performance of the proposed features on a TIMIT classification task in clean, pink, and babble noise conditions, and show that our feature analysis outperforms traditional features based on MFCCs.
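A hedged sketch of the feature extraction described above: local spectrogram patches are projected onto a 2-D DCT basis and low-order coefficients are concatenated per frame. Patch size, hop and the number of retained coefficients are illustrative, not the paper's values.

```python
import numpy as np
from scipy.fft import dctn
from scipy.signal import stft

def patch_dct_features(x, fs, patch=(16, 8), freq_hop=8, n_keep=3):
    """Per-frame features from low-order 2-D DCT coefficients of local spectrogram patches.

    Patch size, frequency hop and the number of retained coefficients are
    illustrative choices, not the values from the paper.
    """
    _, _, X = stft(x, fs=fs, nperseg=256)
    S = np.log(np.abs(X) + 1e-10)                   # (freq, time) log-magnitude
    pf, pt = patch
    n_frames = S.shape[1] - pt + 1
    feats = np.empty((n_frames, 0))
    for i in range(0, S.shape[0] - pf + 1, freq_hop):   # slide patches over frequency
        band_feats = []
        for t0 in range(n_frames):
            c = dctn(S[i:i+pf, t0:t0+pt], norm='ortho')
            band_feats.append(c[:n_keep, :n_keep].ravel())   # keep low-order coefficients
        feats = np.hstack([feats, np.array(band_feats)])
    return feats                                    # (n_frames, n_bands * n_keep**2)
```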

Journal ArticleDOI
TL;DR: A real-time automatic detection system for regional phase arrivals on the NORSAR array is outlined, and it is demonstrated how stable and accurate slowness and azimuth estimates can be obtained for quite marginal signals.
Abstract: Seismic arrays are employed in the global monitoring of earthquakes and explosions because of their superior ability to detect and estimate the direction of incident seismic arrivals. Traditional beamforming and f-k analysis require waveform semblance over the full array aperture and cannot be applied in many situations where signals are incoherent between sensors. The NORSAR and MJAR arrays are two primary IMS stations where this is the case for high-frequency regional phases. Large intersite distances and significant geological heterogeneity at these arrays result in waveform dissimilarity which precludes coherent array processing in the frequency bands with optimal SNR. Multitaper methods provide low-variance spectral estimates over short time-windows, and seismic arrivals can be detected on single channels using a non-linear spectrogram transformation which attains local maxima at times and frequencies characterized by an energy increase. This detection procedure requires very little a priori knowledge of the spectral content of the signal. The transformed spectrograms can be beamformed over large-aperture arrays or networks according to theoretical time-delays, resulting in an incoherent detection system which does not require waveform semblance at any frequencies. We outline a real-time automatic detection system for regional phase arrivals on the NORSAR array and demonstrate how stable and accurate slowness and azimuth estimates can be obtained for quite marginal signals. In the case of partially coherent arrays, the procedure described may provide stable, if low resolution, estimates which can subsequently be refined using coherent processing over subsets of sensors. In particular, we illustrate how the spectrogram beamforming method facilitates a stable and accurate slowness estimate for the incoherent high-frequency Pn arrival at the MJAR array in Japan from the 2006 October 9 underground nuclear test in North Korea.

Journal ArticleDOI
T.S. Brandes
TL;DR: This paper describes an effective process for automated detection and classification of frequency-modulated sounds from birds, crickets, and frogs that have a narrow short-time frequency bandwidth using a frequency band threshold filter on spectrograms.
Abstract: This paper describes an effective process for automated detection and classification of frequency-modulated sounds from birds, crickets, and frogs that have a narrow short-time frequency bandwidth. An algorithm is provided for extracting these signals from background noise using a frequency band threshold filter on spectrograms. Feature vectors are introduced and demonstrated to accurately model the resultant bioacoustic signals with hidden Markov models. Additionally, sequences of sounds are successfully modeled with composite hidden Markov models, allowing for a wider range of automated species recognition.
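A minimal sketch of a frequency band threshold filter of the kind described, assuming hypothetical band limits and a median-based noise floor; the paper's exact filter and the HMM classification stage are not reproduced.

```python
import numpy as np
from scipy.signal import spectrogram

def band_threshold_filter(x, fs, f_lo, f_hi, thresh_db=12.0):
    """Flag spectrogram cells inside [f_lo, f_hi] that rise above the band's noise floor.

    The noise floor is taken as the median level in the band; cells more than
    thresh_db above it are marked as likely signal.
    """
    f, t, S = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
    S_db = 10 * np.log10(S + 1e-12)
    band = (f >= f_lo) & (f <= f_hi)
    floor = np.median(S_db[band])                         # crude per-band noise estimate
    mask = band[:, None] & (S_db > floor + thresh_db)
    return f, t, mask            # True where a narrowband call is likely present
```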

Journal ArticleDOI
TL;DR: The method is shown to perform well on the challenging problem of denoising the broadband transients commonly encountered in warm shallow waters inhabited by snapping shrimp, and would also be useful with other types of broadband transient noise.
Abstract: Marine mammal vocalizations are often analyzed using time-frequency representations (TFRs) which highlight their nonstationarities. One commonly used TFR is the spectrogram. The characteristic spectrogram time-frequency (TF) contours of marine mammal vocalizations play a significant role in whistle classification and individual or group identification. A major hurdle in the robust automated extraction of TF contours from spectrograms is underwater noise. An image-based algorithm has been developed for denoising and extraction of TF contours from noisy underwater recordings. An objective procedure for measuring the accuracy of extracted spectrogram contours is also proposed. This method is shown to perform well when dealing with the challenging problem of denoising broadband transients commonly encountered in warm shallow waters inhabited by snapping shrimp. It would also be useful with other types of broadband transient noise.

Journal Article
TL;DR: The problem of detection and recognition of contact calls produced by North Atlantic right whales, Eubalaena glacialis, is considered, and a solution is proposed based on a multiple-stage hypothesis-testing technique involving a spectrogram-based detector, spectrogram testing, and feature vector testing algorithms.
Abstract: The problem of detection and recognition of contact calls produced by North Atlantic right whales, Eubalaena glacialis, is considered. The proposed solution is based on a multiple-stage hypothesis-testing technique involving a spectrogram-based detector, spectrogram testing, and feature vector testing algorithms. Results show that the proposed technique is able to detect over 80% of the contact calls detected by a human operator while producing about 26 false alarms per 24 h of observation.

Proceedings ArticleDOI
01 Jan 2008
TL;DR: A novel imputation technique working on entire words that achieves recognition accuracies of 92% at SNR -5 dB using oracle masks on AURORA-2 as compared to 61% using a conventional frame-based approach.
Abstract: Noise robustness of automatic speech recognition benefits from using missing data imputation: prior to recognition, the parts of the spectrogram dominated by noise are replaced by clean speech estimates. Especially at low SNRs, each frame contains at best only a few uncorrupted coefficients. This makes frame-by-frame restoration of corrupted feature vectors error-prone, and recognition accuracy will mostly be sub-optimal. In this paper we present a novel imputation technique working on entire words. A word is sparsely represented in an overcomplete basis of exemplar (clean) speech signals using only the uncorrupted time-frequency elements of the word. The corrupted elements are replaced by estimates obtained by projecting the sparse representation in the basis. We achieve recognition accuracies of 92% at SNR −5 dB using oracle masks on AURORA-2 as compared to 61% using a conventional frame-based approach. The performance obtained with estimated masks can be directly related to the proportion of correctly identified uncorrupted coefficients.
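A minimal sketch of the imputation idea: fit the reliable elements of a word's spectrographic features with a dictionary of clean exemplars and read the missing elements off the reconstruction. Non-negative least squares stands in here for the sparse solver used in the paper, and the exemplar dictionary A is assumed given.

```python
import numpy as np
from scipy.optimize import nnls

def impute_missing(y_noisy, mask, A):
    """Impute unreliable spectrogram elements using a dictionary of clean exemplars.

    y_noisy : observed feature vector (e.g. a stacked word spectrogram), shape (d,)
    mask    : boolean array, True where the element is judged uncorrupted
    A       : dictionary of clean exemplars, shape (d, n_exemplars) (assumed given)
    """
    weights, _ = nnls(A[mask], y_noisy[mask])   # fit using reliable elements only
    y_hat = A @ weights                         # reconstruct the full vector
    return np.where(mask, y_noisy, y_hat)       # keep reliable values, fill the rest
```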

01 Jan 2008
TL;DR: A low-dimensional feature is defined which captures the shape of the modulation spectra, improving the equal error rate from the previous result of EER = 25.1% to EER = 17.4% on the NIST 2001 speaker recognition task.
Abstract: A so-called modulation spectrogram is obtained from the conventional speech spectrogram by short-term spectral analysis along the temporal trajectories of the frequency bins. In its original definition, the modulation spectrogram is a high-dimensional representation, and it is not clear how to extract features from it. In this paper, we define a low-dimensional feature which captures the shape of the modulation spectra. The recognition accuracy of the modulation-spectrogram-based classifier is improved from our previous result of EER = 25.1% to EER = 17.4% on the NIST 2001 speaker recognition task.
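A hedged sketch of the representation itself: a second short-term spectral analysis along each frequency bin's magnitude trajectory. Window lengths and hop sizes are illustrative; the paper's low-dimensional shape feature is not reproduced.

```python
import numpy as np
from scipy.signal import stft

def modulation_spectrogram(x, fs, nperseg=256, mod_win=32, mod_hop=16):
    """Second-stage spectral analysis along each frequency bin's temporal trajectory.

    Returns acoustic frequencies, modulation frequencies, modulation times and an
    array of shape (acoustic_freq, modulation_freq, modulation_time).
    """
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    env = np.abs(X)                              # magnitude trajectory per frequency bin
    frame_rate = 1.0 / (t[1] - t[0])             # frames per second of the first STFT
    mods = []
    for b in range(env.shape[0]):
        mf, mt, M = stft(env[b] - env[b].mean(), fs=frame_rate,
                         nperseg=mod_win, noverlap=mod_win - mod_hop)
        mods.append(np.abs(M))
    return f, mf, mt, np.array(mods)
```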

Journal ArticleDOI
TL;DR: The RID gave a detailed representation of the TMJ signals' relative energy distribution in the time and frequency domains, with a great reduction in the interference or cross terms, and appears to be most useful in the application of time-frequency distributions in classification of TMJ sounds.
Abstract: For the analysis of time-varying signals such as TMJ sounds, it is often desirable to know how the frequency components change with time, using methods of time-frequency analysis. The aim of this study was to compare two of the most familiar methods for energy density representation with a newly developed technique. The sounds were recorded with a microphone fastened to the subject's forehead, transformed to the time-frequency domain and displayed as 3D and contour plots using the spectrogram, the Wigner distribution (WD), and the reduced interference distribution (RID) to display their time-frequency energy distributions. The spectrogram resolved only the low-frequency components. The WD provided higher resolution but also exhibited strong interference between components. The RID gave a detailed representation of the TMJ signals' relative energy distribution in the time and frequency domains, with a great reduction in the interference or cross terms. The RID therefore appears to be most useful in the application of time-frequency distributions to the classification of TMJ sounds.

Journal ArticleDOI
TL;DR: In this article, the authors developed a method that uses the dynamic Allan variance and the spectrogram to detect and to identify the typical anomalies of an atomic clock, and applied the method to simulated data.
Abstract: When an anomaly occurs in an atomic clock, its stability and frequency spectrum change with time. The variation with time of the stability can be evaluated with the dynamic Allan variance. The variation with time of the frequency spectrum can be described with the spectrogram, a time–frequency distribution. We develop a method that uses the dynamic Allan variance and the spectrogram to detect and to identify the typical anomalies of an atomic clock. We apply the method to simulated data.
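A minimal numpy sketch of the dynamic Allan variance half of the method: the Allan variance evaluated in a sliding window, so that a stability change shows up as a change along the time axis. Window length and averaging factors are illustrative; the spectrogram half is the usual STFT and is omitted.

```python
import numpy as np

def allan_variance(y, m):
    """Non-overlapping Allan variance of fractional-frequency data y at averaging factor m."""
    n = len(y) // m
    yb = y[:n * m].reshape(n, m).mean(axis=1)     # averages over blocks of length m
    return 0.5 * np.mean(np.diff(yb) ** 2)

def dynamic_allan_variance(y, window, step, m_values=(1, 2, 4, 8, 16)):
    """Allan variance computed in a sliding window: rows index time, columns index m.

    A sudden change along a column of the returned array flags a clock anomaly at that epoch.
    """
    rows = []
    for start in range(0, len(y) - window + 1, step):
        seg = y[start:start + window]
        rows.append([allan_variance(seg, m) for m in m_values])
    return np.array(rows)
```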

Journal ArticleDOI
TL;DR: The links detector was validated using an artificial recording environment, including synthetic calls, atmospheric absorption, and echoes, which provided control of the signal-to-noise ratio and an absolute ground truth.
Abstract: The links detector combines a model-based spectral peak tracker with an echo filter to detect the echolocation calls of bats. By processing calls in the spectrogram domain, the links detector separates calls that overlap in time, including call harmonics and echoes. The links detector was validated using an artificial recording environment, including synthetic calls, atmospheric absorption, and echoes, which provided control of the signal-to-noise ratio and an absolute ground truth. The maximum hit rate (at a 2% false positive rate) for the links detector was 87%, compared to 1.5% for a spectral peak detector. The difference in performance was due to the ability of the links detector to filter out echoes. Detection range varied across species from 13 to more than 20 m due to call bandwidth and frequency range. Global features of calls detected by the links detector were compared to those of the synthetic calls. The error in all estimates increased as the range increased, and estimates of minimum frequency and frequency of most energy were more accurate than those of maximum frequency. The links detector combines local and global features to automatically detect calls within the machine learning paradigm and detects overlapping calls and call harmonics in a unified framework.

Journal ArticleDOI
TL;DR: In this paper, it was shown that even during one gait cycle the velocity of the torso, which constitutes the major part of the reflection, is not constant, and that a smaller portion of the signal is reflected from the legs.
Abstract: Human locomotion consists of a complex movement of various parts of the body. The reflections generated by body parts with different relative velocities result in different Doppler shifts, which can be detected as a superposition with a continuous-wave (CW) radar. A time-frequency transform such as the short-time Fourier transform (STFT) of the radar signal allows a representation of the signal in both the time and frequency domains (spectrogram). It can be shown that even during one gait cycle the velocity of the torso, which constitutes the major part of the reflection, is not constant. Further, a smaller portion of the signal is reflected from the legs. The velocity of the legs varies over a wide range, from zero (foot on the ground) to a velocity higher than that of the torso. The two dominant parameters which characterise the human gait are the step rate and the mean velocity. Both parameters can be deduced from suitable portions of the spectrogram. The statistical evaluation of the two parameters has the potential to be used for discrimination, either between different persons or between humans and other moving objects.
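A hedged sketch of how the two parameters might be read from a spectrogram: mean velocity from the power-weighted Doppler centroid, step rate from the periodicity of the energy above the torso line. The carrier frequency fc, the thresholds and the STFT settings are assumptions, not the authors' processing.

```python
import numpy as np
from scipy.signal import spectrogram

C = 3e8  # speed of light, m/s

def gait_parameters(x, fs, fc):
    """Estimate mean torso velocity and step rate from a CW-radar return (illustrative only)."""
    f, t, S = spectrogram(x, fs=fs, nperseg=256, noverlap=224,
                          return_onesided=False, mode='magnitude')
    f = np.fft.fftshift(f)
    S = np.fft.fftshift(S, axes=0)
    v = f * C / (2.0 * fc)                                  # Doppler bin -> radial velocity

    # torso: power-weighted velocity centroid of each frame, averaged over time
    centroid = (v[:, None] * S).sum(axis=0) / S.sum(axis=0)
    mean_velocity = centroid.mean()

    # legs: the energy above the torso velocity fluctuates roughly once per step
    leg_energy = S[v > 1.5 * abs(mean_velocity), :].sum(axis=0)
    leg_energy = leg_energy - leg_energy.mean()
    spec = np.abs(np.fft.rfft(leg_energy))
    frame_rate = 1.0 / (t[1] - t[0])
    freqs = np.fft.rfftfreq(len(leg_energy), d=1.0 / frame_rate)
    step_rate = freqs[1:][np.argmax(spec[1:])]              # skip the DC bin
    return mean_velocity, step_rate
```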

Proceedings ArticleDOI
12 May 2008
TL;DR: An auditory-inspired feed-forward architecture which achieves good performance in noisy conditions on a segmented word recognition task is presented; combining its features with MFCCs or RASTA features yields improved recognition scores in noise.
Abstract: Previously, we presented an auditory-inspired feed-forward architecture which achieves good performance in noisy conditions on a segmented word recognition task. In this paper we propose to use a modified version of this hierarchical model to generate features for standard hidden Markov models. To obtain these features, we first compute spectrograms using a Gammatone filterbank. Filtering over the channels enhances the formant frequencies, which are afterwards detected using Gabor-like receptive fields. The responses of the receptive fields are then combined into complex features which span the whole frequency range and extend over three different time windows. The features have been evaluated on a single-digit recognition task. The results show that their combination with MFCCs or RASTA features yields improved recognition scores in noise.

Patent
16 Apr 2008
TL;DR: An FPGA-based random signal generator is presented, comprising a PC, a USB controller, an MCU3, an MCU interface module, a crystal resonator, an EPC2, a time controller, a dual-channel DA output circuit, a frequency controller, a register matrix unit, a keyboard, a keyboard scanning module, a FLASH, a Flash control module, a TFT display, a TFT control module, a DDS signal generator and a waveform synthesis module.
Abstract: The invention discloses an FPGA-based random signal generator comprising a PC, a USB controller, an MCU3, an MCU interface module, a crystal resonator, an EPC2, a time controller, a dual-channel DA output circuit, a frequency controller, a register matrix unit, a keyboard, a keyboard scanning module, a FLASH, a Flash control module, a TFT display, a TFT control module, a DDS signal generator, a waveform synthesis module and other waveform generators. In use, once a frequency spectrogram and phase spectrogram parameters are input into the software control interface, the software automatically identifies the frequency spectrum information and obtains the amplitude and phase parameters of each frequency point; a time-domain information table is then obtained after the sampling values are quantized and encoded, and the table is downloaded to the RAM of a DDS generating circuit to realize periodic or non-periodic time-domain signal output; furthermore, the waveform amplitude is adjustable online in steps, thereby achieving both frequency-domain and time-domain output.

Journal ArticleDOI
TL;DR: A chi-squared description of the spectrogram distribution appears accurate when the analysis window used to construct the spectrogram decreases to zero at its boundaries, regardless of the level of correlation contained in the signal.
Abstract: Given a correlated Gaussian signal, may a chi-squared law of probability always be used to describe a spectrogram coefficient distribution? If not, would a "chi-squared description" lead to an acceptable amount of error when detection problems are to be faced in the time-frequency domain? These two questions prompted the study reported in this paper. After deriving the probability distribution of spectrogram coefficients in the context of a noncentered, correlated Gaussian signal, the Kullback-Leibler divergence is first used to evaluate to what extent the nonwhiteness of the signal and the Fourier analysis window impact the probability distribution of the spectrogram. To complete the analysis, a detection task formulated as a binary hypothesis test is considered. We evaluate the error in the probability of false alarm when the likelihood ratio test is expressed with chi-squared laws. From these results, a chi-squared description of the spectrogram distribution appears accurate when the analysis window used to construct the spectrogram decreases to zero at its boundaries, regardless of the level of correlation contained in the signal. When other analysis windows are used, the length of the window and the correlation contained in the analyzed signal affect the validity of the chi-squared description.
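The easy case of this conclusion can be checked with a small simulation, a hedged illustration rather than the paper's derivation: for white Gaussian noise and a Hann window (which decreases to zero at its boundaries), spectrogram coefficients at an interior frequency bin should follow a scaled chi-squared law with two degrees of freedom, i.e. an exponential distribution after normalizing to unit mean.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.stats import kstest

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)                       # white Gaussian signal

f, t, S = spectrogram(x, fs=1.0, window='hann', nperseg=256, noverlap=0)
coeffs = S[64, :]                                      # one interior frequency bin
coeffs = coeffs / coeffs.mean()                        # scaled chi2(2) -> Exponential(1)

stat, pvalue = kstest(coeffs, 'expon')
print(f"KS p-value against Exponential(1): {pvalue:.3f}")  # large p-value: consistent
```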

Patent
05 May 2008
TL;DR: A beat extractor extracts the beat component of a sound signal based on a spectrogram and generates a beat waveform carrying beat timing and beat intensity information.
Abstract: In a sound output device, a sound input unit acquires a sound signal reproduced by a reproduction device. A beat extractor extracts the beat component of the sound signal based on a spectrogram and generates a beat waveform carrying beat timing and beat intensity information. An output signal generator amplifies the sound signal using the beat waveform as a gain, based on the beat timing and beat intensity carried by the beat waveform. A sound output unit performs D/A conversion on the beat-enhanced sound signal and outputs it as sound.
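A generic sketch of beat extraction from a spectrogram (not the patented method): an onset-strength envelope from positive spectral flux and a beat-period estimate from its autocorrelation; such an envelope could then serve as a time-varying gain.

```python
import numpy as np
from scipy.signal import stft

def beat_envelope(x, fs, nperseg=1024, hop=512):
    """Onset-strength (spectral flux) envelope and a beat-period estimate from a spectrogram.

    A generic illustration of beat extraction from a spectrogram, not the method
    claimed in the patent.
    """
    _, t, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    S = np.abs(X)
    flux = np.maximum(np.diff(S, axis=1), 0.0).sum(axis=0)   # positive spectral flux
    flux = flux - flux.mean()

    # beat period from the autocorrelation peak in a plausible tempo range (40-200 BPM)
    frame_rate = fs / hop
    ac = np.correlate(flux, flux, mode='full')[len(flux) - 1:]
    lo, hi = int(frame_rate * 60 / 200), int(frame_rate * 60 / 40)
    lag = lo + np.argmax(ac[lo:hi])
    bpm = 60.0 * frame_rate / lag
    return flux, bpm
```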

Patent
14 Apr 2008
TL;DR: In this article, an audio signal produced by playing a plurality of musical instruments is separated into sound sources according to respective instrument sounds, and each time a separation process is performed, the updated model parameter estimation/storage section 114 estimates parameters respectively contained in updated model parameters.
Abstract: An audio signal produced by playing a plurality of musical instruments is separated into sound sources according to the respective instrument sounds. Each time a separation process is performed, the updated model parameter estimation/storage section 114 estimates the parameters contained in the updated model parameters such that the updated power spectrograms gradually change from a state close to the initial power spectrograms to a state close to the plurality of power spectrograms most recently stored in the power spectrogram separation/storage section. The respective sections, including the power spectrogram separation/storage section 112 and an updated distribution function computation/storage section 118, repeatedly perform these operations until the updated power spectrograms change from the state close to the initial power spectrograms to the state close to the plurality of power spectrograms most recently stored in the power spectrogram separation/storage section 112. The final updated power spectrograms are close to the power spectrograms of single tones of one musical instrument contained in the input audio signal, which is modeled using harmonic and inharmonic models.